Günter Röhrich
1 min read · Sep 24, 2020


Hi Josh, thanks a lot! Yes, of course: depending on how large the entries are, I'd decide whether to extract and store the data up front or examine the content on the fly.

Also, depending on the complexity of the data (text length, total number of unique words, etc.), you might want to skip the tf-idf weighting and apply cosine similarity directly to the raw word counts. This really depends on the data and on how much weight you want to place on the importance of individual words :)
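To make the trade-off concrete, here is a minimal sketch (standard library only, not the code from the original article) that computes cosine similarity once on raw word counts and once on tf-idf weighted vectors. The whitespace tokenizer and the smoothed idf formula `log((1 + n) / (1 + df)) + 1` are assumptions chosen for illustration; the idf form mirrors scikit-learn's default, but any idf variant would show the same effect.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase + whitespace split, no stemming or stop words.
    return text.lower().split()


def cosine(a, b):
    # a, b: sparse vectors as dicts mapping term -> weight.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def count_vectors(docs):
    # Plain bag-of-words: every occurrence counts equally.
    return [Counter(tokenize(d)) for d in docs]


def tfidf_vectors(docs):
    counts = count_vectors(docs)
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for c in counts:
        df.update(c.keys())
    # Assumed smoothed idf; terms shared by many documents get a smaller weight.
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


docs = ["the cat sat on the mat", "the dog sat on the log", "cats and dogs"]
raw = count_vectors(docs)
weighted = tfidf_vectors(docs)
print("raw counts:", round(cosine(raw[0], raw[1]), 3))
print("tf-idf:    ", round(cosine(weighted[0], weighted[1]), 3))
```

On raw counts the first two sentences score 0.75, largely because of the shared "the", "sat" and "on". With tf-idf those common words are down-weighted, so the score drops. For short texts with a small vocabulary the difference may not be worth the extra step; for longer, more varied texts the weighting usually matters more.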


Written by Günter Röhrich

Data Scientist @ Mondi Group • Georgia Tech 🎓 • linkedin.com/in/groehrich
