I would like to use text processing to detect relevant keywords within a set of documents.
I have a list of documents with assigned categories and would like to find the right keywords, which we can then use to build a search query against a larger database of documents within these categories.
I tried the Keygraph keyword extractor and grouped by category to get the most frequently occurring keywords per category.
I get a result, but it is not really good. I suppose there is a better way to detect the relevant keywords.
Do you want to find keywords that are good at distinguishing between documents of the different classes (categories)? If so, you can first create a bag of words (BoW), then document vectors, and use classifier nodes to find those words / features, e.g. Tree Ensembles or Naive Bayes.
Attached is a workflow that uses Tree Ensembles and Naive Bayes to compute a score for how distinguishing terms are w.r.t. the classes.
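
For readers without access to the attached workflow, here is a minimal Python sketch of the same general idea. It is not the workflow itself and does not reproduce how the KNIME nodes compute their scores. It builds binary bag-of-words document vectors and ranks each term per category by a Naive Bayes style log-ratio of smoothed document frequencies (inside the category vs. outside it). The function names `tokenize` and `distinguishing_terms`, the simple regex tokenizer, and the sample documents are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Very simple tokenizer (an assumption): lowercase alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


def distinguishing_terms(docs, top_n=3):
    """Rank terms per category by how strongly they separate it from the rest.

    docs: list of (text, category) pairs.
    Returns {category: [(term, score), ...]} with the top_n terms per category.
    """
    # Binary document vectors: each document becomes the set of its terms.
    doc_terms = [(set(tokenize(text)), cat) for text, cat in docs]
    vocab = set().union(*(terms for terms, _ in doc_terms))
    docs_per_cat = Counter(cat for _, cat in doc_terms)

    # Document frequency of every term within every category.
    df = defaultdict(Counter)
    for terms, cat in doc_terms:
        df[cat].update(terms)

    total_docs = len(doc_terms)
    result = {}
    for cat, n_in in docs_per_cat.items():
        n_out = total_docs - n_in
        scores = []
        for term in vocab:
            in_count = df[cat][term]
            out_count = sum(df[c][term] for c in docs_per_cat if c != cat)
            # Laplace-smoothed estimates of P(term present | in / not in category).
            p_in = (in_count + 1) / (n_in + 2)
            p_out = (out_count + 1) / (n_out + 2)
            scores.append((term, math.log(p_in / p_out)))
        # Highest score first; ties broken alphabetically for stable output.
        scores.sort(key=lambda ts: (-ts[1], ts[0]))
        result[cat] = scores[:top_n]
    return result


if __name__ == "__main__":
    sample = [  # made-up example documents
        ("The engine and brakes of the car were repaired", "cars"),
        ("A new car engine was installed", "cars"),
        ("The chef cooked the pasta with sauce", "food"),
        ("Fresh pasta and the tomato sauce", "food"),
    ]
    for category, terms in distinguishing_terms(sample, top_n=2).items():
        print(category, [term for term, _ in terms])
```

Unlike a plain "most frequent keywords per category" grouping, a ratio like this pushes down words that are common in every category (such as "the"), which is probably why frequency counts alone gave unsatisfying results.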
Thanks a lot for your help.