Get relevant keywords per document

Hello,

I would like to use text processing to detect relevant keywords within a set of documents.

I have a list of documents with assigned categories and would like to find the right keywords, which we can then use to search (build a query) for documents within these categories in a larger database of documents.

I tried the Keygraph keyword extractor and grouped the results by category to get the most frequently occurring keywords per category.

I get a result, but it is not really good. I suppose there is a better way to detect the relevant keywords.

Thanks,

Bernd

Hi Bernd,

Do you want to find keywords that distinguish well between documents of the different classes (categories)? If so, you can first create a bag of words (BoW), then a document vector, and use classifier nodes to find those words / features, e.g. Tree Ensembles or Naive Bayes.

Attached is a workflow that uses Tree Ensembles and Naive Bayes to compute a score for how well terms distinguish between the classes.
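
For readers without the attachment, the general idea can be sketched in a few lines of plain Python. This is not the attached workflow, only a rough illustration under some assumptions: it builds a bag of words per category and scores every term with a Naive Bayes style smoothed log-odds ratio (how much more likely the term is inside a category than outside it). The names `tokenize` and `keyword_scores`, the smoothing parameter `alpha`, and the toy documents are all made up for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and keep runs of letters; a stand-in for real preprocessing."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_scores(docs, alpha=1.0):
    """Score terms per category with a smoothed log-odds ratio.

    docs: list of (text, category) pairs (hypothetical input format).
    Returns {category: [(term, score), ...]}, best-scoring terms first.
    """
    # Bag of words: term counts collected per category.
    counts = defaultdict(Counter)
    for text, category in docs:
        counts[category].update(tokenize(text))

    # Term counts over all categories, and the shared vocabulary size.
    total = Counter()
    for in_class in counts.values():
        total.update(in_class)
    vocab_size = len(total)
    n_all = sum(total.values())

    result = {}
    for category, in_class in counts.items():
        n_in = sum(in_class.values())
        n_out = n_all - n_in
        scored = []
        for term in total:
            # Laplace-smoothed term probabilities inside and outside the category.
            p_in = (in_class[term] + alpha) / (n_in + alpha * vocab_size)
            p_out = (total[term] - in_class[term] + alpha) / (n_out + alpha * vocab_size)
            scored.append((term, math.log(p_in / p_out)))
        # Highest score first; ties are broken alphabetically.
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        result[category] = scored
    return result


# Toy data, invented for illustration only.
docs = [
    ("the engine and the brakes of the car", "cars"),
    ("the car needs engine oil", "cars"),
    ("the recipe needs flour and sugar", "cooking"),
    ("bake the flour and sugar mix", "cooking"),
]
top = {cat: [term for term, _ in scored[:3]]
       for cat, scored in keyword_scores(docs).items()}
print(top)
```

Terms that appear in every category (such as "the") end up with a low score, while terms concentrated in one category rise to the top, which is what makes them useful as query keywords. On real data you would probably also want stop word filtering and stemming before counting.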

Cheers, Kilian

 

Hello Kilian,

thanks a lot for your help.

Bernd