Graph-based methods generate a graph of related terms from the documents. In contrast to TF-IDF, YAKE extracts keywords on a single-document basis and does not need a large corpus. Its advantage is that it does not depend on an external corpus, the length of the text document, the language, or the domain. In the end, the list of keywords is sorted by score, and near-duplicate keywords are removed. The similarity between two keywords is computed with the Levenshtein similarity, the Jaro-Winkler similarity, or the sequence matcher. When two keywords are too similar, YAKE keeps the more relevant one (the one with the lower score).
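The final step, keeping only the more relevant of two similar keywords, can be sketched roughly as follows. This is a minimal illustration rather than YAKE's actual code: it uses `difflib.SequenceMatcher` from the Python standard library as the similarity measure, and the function name `deduplicate_keywords`, the `(keyword, score)` input format, and the `0.9` threshold are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def deduplicate_keywords(scored_keywords, threshold=0.9):
    """Drop near-duplicate keywords, keeping the lower-scored (more relevant) one.

    scored_keywords: iterable of (keyword, score) pairs, lower score = more relevant.
    threshold: hypothetical similarity cutoff in [0, 1]; pairs at or above it
               are treated as duplicates.
    """
    # Sort ascending so the most relevant candidates are considered first.
    ranked = sorted(scored_keywords, key=lambda pair: pair[1])

    kept = []
    for keyword, score in ranked:
        # A candidate is a duplicate if it is too similar to any keyword
        # already kept (which, by construction, has a lower or equal score).
        is_duplicate = any(
            SequenceMatcher(None, keyword.lower(), kept_kw.lower()).ratio() >= threshold
            for kept_kw, _ in kept
        )
        if not is_duplicate:
            kept.append((keyword, score))
    return kept


candidates = [
    ("keywords extraction", 0.05),
    ("graph", 0.10),
    ("keyword extraction", 0.02),
]
print(deduplicate_keywords(candidates))
# [('keyword extraction', 0.02), ('graph', 0.1)]
```

Because the candidates are processed in order of increasing score, whenever two keywords exceed the similarity threshold the one that survives is always the one with the lower score. Swapping in a Levenshtein or Jaro-Winkler similarity would only change the expression inside `any(...)`.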