Dear KNIME Community,
which are the best ways or nodes to minimize the vector space of documents?
And why is it so important to minimize the space? Which advantages does it have?
Thanks,
Canan
Hi Canan,
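When you convert documents into document vectors, you end up with a matrix in which each row is a document and each column is a term from the collection's vocabulary. Such a matrix is typically very high-dimensional (one column per distinct term) and sparse (most documents contain only a small fraction of the vocabulary).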
Dimensionality reduction in this case lets you discard very infrequent and very frequent terms, so that you can focus on the terms that actually help to distinguish documents. In this way you reduce the feature sparseness of your matrix.
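To make the idea concrete outside of KNIME, here is a minimal Python sketch of pruning a vocabulary by document frequency before building the vectors. The function names (`prune_vocabulary`, `to_vectors`), the thresholds (`min_df`, `max_df_ratio`) and the toy documents are all assumptions made for illustration; this is not how any particular KNIME node is implemented.

```python
from collections import Counter


def prune_vocabulary(docs, min_df=2, max_df_ratio=0.8):
    """Keep terms found in at least `min_df` documents and in at most
    `max_df_ratio` of all documents (hypothetical thresholds)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    return sorted(
        term
        for term, count in doc_freq.items()
        if count >= min_df and count / n_docs <= max_df_ratio
    )


def to_vectors(docs, vocabulary):
    """Build one term-count vector per document over the given vocabulary."""
    index = {term: i for i, term in enumerate(vocabulary)}
    vectors = []
    for tokens in docs:
        row = [0] * len(vocabulary)
        for term in tokens:
            if term in index:
                row[index[term]] += 1
        vectors.append(row)
    return vectors


# Toy collection, already tokenized (made up for this example).
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
    "the bird sang".split(),
]

full_vocab = sorted({term for tokens in docs for term in tokens})
small_vocab = prune_vocabulary(docs, min_df=2, max_df_ratio=0.8)
vectors = to_vectors(docs, small_vocab)

print(len(full_vocab), "terms before pruning,", len(small_vocab), "after")
print(small_vocab)
```

On this toy collection, "the" is dropped because it appears in every document, and terms such as "mat" or "bird" are dropped because they appear in only one. The right thresholds depend on your data, so treat the values above as placeholders.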
How can you reduce the collection's vocabulary? Keyword extraction can help here. For more details, please have a look at one of our latest blog posts: https://www.knime.com/blog/keyword-extraction-for-understanding.
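As a rough illustration of keyword extraction, the sketch below ranks the terms of each document by a simple TF-IDF score and keeps the top few. TF-IDF is used here only as one common scoring scheme and is not necessarily the method described in the blog post; the function name `top_keywords`, the parameter `k` and the toy documents are assumptions.

```python
import math
from collections import Counter


def top_keywords(docs, k=2):
    """Return the `k` highest-scoring terms per document using
    relative term frequency times log inverse document frequency."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document

    keywords = []
    for tokens in docs:
        term_freq = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        keywords.append(ranked[:k])
    return keywords


# Toy collection, already tokenized (made up for this example).
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
    "the bird sang".split(),
]

keywords = top_keywords(docs, k=2)
for tokens, kws in zip(docs, keywords):
    print(" ".join(tokens), "->", kws)
```

The union of the keywords extracted from all documents can then serve as the reduced vocabulary for the document vectors.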
Hope that helps!
Best,
Vincenzo