Sep 24, 2020
Hi Josh, thanks a lot! Yes, of course. Depending on how large the entries are, I'd decide whether to extract and store the data or to examine the content on the fly.
Also, depending on the complexity of the data (length, total number of unique words, etc.), you might want to skip the tf-idf part and only use cosine similarity (for example, on plain word counts). That depends on the data and on how much weight you would like to place on the importance of the individual words :)
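To make the trade-off concrete, here is a minimal sketch in plain Python that compares cosine similarity on raw word counts with cosine similarity on tf-idf weighted vectors. Everything in it is an assumption for illustration: the toy `docs`, the whitespace `tokenize` helper, and the smoothed idf formula `log((1 + n) / (1 + df)) + 1` are my own choices, not necessarily what the article's pipeline uses.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def count_vectors(docs):
    # Raw term counts per document (no tf-idf weighting).
    return [Counter(tokenize(d)) for d in docs]


def tfidf_vectors(docs):
    # Term counts scaled by a smoothed inverse document frequency,
    # so words that appear in many documents carry less weight.
    counts = count_vectors(docs)
    n = len(docs)
    df = Counter(term for c in counts for term in c)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in df}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


# Toy data, made up for this example.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the stock market fell sharply",
]

raw = count_vectors(docs)
weighted = tfidf_vectors(docs)

raw_sim = cosine(raw[0], raw[1])
tfidf_sim = cosine(weighted[0], weighted[1])

print(f"raw counts: {raw_sim:.3f}")
print(f"tf-idf:     {tfidf_sim:.3f}")
```

On this toy input the first two sentences score lower with tf-idf than with raw counts, because the words they share ("the", "sat", "on") also appear in other documents and get down-weighted. If your entries are short and share few common words, the two variants may end up very close, in which case the simpler one is probably enough.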