
We use TextBlob for breaking up the text into words and getting the word counts.

n_containing(word, bloblist) returns the number of documents containing word. A generator expression is passed to the sum() function.

idf(word, bloblist) computes "inverse document frequency", which measures how common a word is among all documents in bloblist. The more common a word is, the lower its idf. We take the ratio of the total number of documents to the number of documents containing word, then take the log of that. We add 1 to the divisor to prevent division by zero.

tfidf(word, blob, bloblist) computes the TF-IDF score, which is the product of tf and idf.
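Written out as formulas, the functions described above amount to roughly the following. This is a sketch under two assumptions: tf is taken to be the usual raw count divided by document length, and the log is the natural logarithm (the default for Python's math.log). Here w is a word, d a document, and D the list of documents.

```latex
\begin{align*}
\mathrm{tf}(w, d) &= \frac{\mathrm{count}(w, d)}{|d|} \\
\mathrm{idf}(w, D) &= \ln\frac{|D|}{1 + |\{\, d \in D : w \in d \,\}|} \\
\mathrm{tfidf}(w, d, D) &= \mathrm{tf}(w, d) \cdot \mathrm{idf}(w, D)
\end{align*}
```

For example, with 3 documents, a word found in only one of them gets idf = ln(3/2) ≈ 0.405. A word found in all three gets ln(3/4) ≈ -0.288, so with the added 1 in the divisor a word that appears everywhere ends up with a slightly negative idf.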


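import math
from textblob import TextBlob as tb

def tf(word, blob):
    # Term frequency: occurrences of word divided by the number of words in blob.
    return blob.words.count(word) / len(blob.words)

def n_containing(word, bloblist):
    # Number of documents in bloblist that contain word.
    return sum(1 for blob in bloblist if word in blob.words)

def idf(word, bloblist):
    # Log of (total documents / (1 + documents containing word)).
    return math.log(len(bloblist) / (1 + n_containing(word, bloblist)))

def tfidf(word, blob, bloblist):
    # TF-IDF score: product of term frequency and inverse document frequency.
    return tf(word, blob) * idf(word, bloblist)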
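To see the scores on real input, here is a minimal sketch that runs the same four functions over three tiny documents. It is an illustration, not the original example: to keep it free of third-party packages, each document is a plain list of tokens produced by a hypothetical tokenize() helper (str.split) instead of a TextBlob, and top_words() and docs are likewise names made up for this sketch.

```python
import math

# Assumption: a plain list of lowercase tokens stands in for blob.words.
def tokenize(text):
    return text.lower().split()

def tf(word, words):
    # Occurrences of word divided by the total number of tokens.
    return words.count(word) / len(words)

def n_containing(word, doclist):
    # Number of documents that contain word.
    return sum(1 for words in doclist if word in words)

def idf(word, doclist):
    # The 1 added to the divisor avoids division by zero for unseen words.
    return math.log(len(doclist) / (1 + n_containing(word, doclist)))

def tfidf(word, words, doclist):
    return tf(word, words) * idf(word, doclist)

def top_words(words, doclist, n=3):
    # Hypothetical helper: score each distinct word, highest score first.
    scores = {word: tfidf(word, words, doclist) for word in set(words)}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n]

docs = [
    tokenize("python is a programming language"),
    tokenize("a python is a large snake"),
    tokenize("a colobus is a monkey"),
]

for i, words in enumerate(docs, start=1):
    print(f"Top words in document {i}")
    for word, score in top_words(words, docs):
        print(f"  {word}: {score:.5f}")
```

Words that occur in only one document ("snake", "colobus", "monkey") come out on top, while "a" and "is", which occur in all three, receive negative scores because of the added 1 in the divisor.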
