Hi, I'm trying to set up a workflow to solve the following scenario:
"Given a document (e.g. a Curriculum Vitae) and a set of other documents (e.g. job descriptions), find the best-matching documents in the set, along with a distance or similarity score."
I've tried a few options, but I'm new to KNIME, so I'd like to ask the experts how to structure the problem, or at least which components I should focus on.
Any help is highly appreciated!
Thanks in advance!
The described scenario is a similarity search. To do a similarity search on documents in KNIME, you need the Textprocessing extension and the Distance Matrix extension. First, transform the documents into document vectors to get a numerical representation. On these vectors you can then apply a similarity or distance measure, e.g. the cosine similarity. The Distance Matrix extension provides the "Similarity Search" node, in which you can specify the similarity measure to use. The node searches for and returns the top n data points most similar to a reference data point.
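KNIME is a visual tool, so the following is not how the nodes are implemented; it is only a minimal Python sketch, under simplifying assumptions, of what the pipeline computes conceptually: bag-of-words document vectors, cosine similarity, and a top-n search. The crude regex tokenizer, the raw term-frequency weighting, and all function names (`tokenize`, `document_vector`, `cosine_similarity`, `similarity_search`) are hypothetical stand-ins for the preprocessing and vectorization nodes.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical preprocessing: lowercase and split on non-alphanumeric
    # characters (no stop-word removal or stemming).
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def document_vector(tokens, vocabulary):
    # Bag-of-words vector: raw term frequency for each vocabulary term.
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def cosine_similarity(a, b):
    # Dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_search(query, candidates, top_n=3):
    # Vectorize the query and all candidates over a shared vocabulary,
    # then return the top_n (index, similarity) pairs, best first.
    query_tokens = tokenize(query)
    candidate_tokens = [tokenize(c) for c in candidates]
    vocabulary = sorted(set(query_tokens).union(*candidate_tokens))
    q_vec = document_vector(query_tokens, vocabulary)
    scored = [
        (i, cosine_similarity(q_vec, document_vector(toks, vocabulary)))
        for i, toks in enumerate(candidate_tokens)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    cv = "Python developer with machine learning and data analysis experience"
    jobs = [
        "Data analyst with Python and SQL",
        "Frontend developer, JavaScript and CSS",
        "Machine learning engineer, Python, deep learning",
    ]
    for index, score in similarity_search(cv, jobs, top_n=2):
        print(index, round(score, 3), jobs[index])
```

In practice you would likely weight terms (e.g. TF-IDF) and filter stop words before vectorizing, since common words such as "and" or "with" otherwise inflate the similarity.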
Attached you'll find a small example workflow showing how the Similarity Search node can be used on documents.
Alternatively, you could use the Indexing extension. This extension uses Lucene to create indexes (of documents, for example) that can be queried later on.
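Lucene is built around an inverted index, which maps each term to the documents that contain it, so a query only has to look at documents sharing at least one term with it. The Python sketch below illustrates that idea only; it is not Lucene's API or the Indexing extension's behavior. The function names are hypothetical, and the ranking (count of distinct matching query terms) is a deliberately crude assumption, whereas Lucene uses a far more elaborate relevance score.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Hypothetical preprocessing: lowercase and split on non-alphanumerics.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_index(documents):
    # Inverted index: term -> set of ids of documents containing that term.
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def query_index(index, query):
    # Count how many distinct query terms each document contains, then
    # sort by that count (descending), breaking ties by document id.
    hits = defaultdict(int)
    for term in set(tokenize(query)):
        for doc_id in index.get(term, ()):
            hits[doc_id] += 1
    return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    jobs = [
        "Data analyst with Python and SQL",
        "Frontend developer, JavaScript and CSS",
        "Machine learning engineer, Python, deep learning",
    ]
    index = build_index(jobs)
    print(query_index(index, "python machine learning"))
```

The index is built once and reused for every query, which is what makes this approach attractive when the set of job descriptions is large and queried repeatedly.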