Get relevant keywords per document

Dnreb · July 24, 2015, 2:50pm

Hello,

I would like to use the text processing nodes to detect relevant keywords within a set of documents.

I have a list of documents with assigned categories and would like to find the right keywords, which we can then use to search (build a query) a larger database for documents within these categories.

I tried the Keygraph keyword extractor and grouped by category to get the most frequently occurring keywords per category.

I get a result, but it is not really good. I suppose there is a better way to detect the relevant keywords.

Thanks

Bernd

kilian.thiel · July 28, 2015, 4:00pm

Hi Bernd,

Do you want to find keywords that distinguish well between documents of the different classes (categories)? If so, you can first create a bag of words (BoW), then document vectors, and use classifier nodes to find those words / features, e.g. Tree Ensembles or Naive Bayes.

Attached is a workflow that uses Tree Ensembles and Naive Bayes to compute a score for how well terms distinguish between the classes.
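For readers without the attachment, here is a rough sketch of the general idea outside KNIME. It is not the attached workflow: it builds a binary bag of words per document and scores each term per class with a smoothed log-odds ratio, in the spirit of a Bernoulli Naive Bayes model. The function names, the toy documents, the simple tokenizer and the smoothing value `alpha` are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on non-letters (a stand-in for real preprocessing)."""
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


def distinguishing_terms(docs, labels, top_n=3, alpha=1.0):
    """Rank terms per class by smoothed log-odds of appearing in a document
    of that class versus a document of any other class."""
    docs_per_class = Counter(labels)
    # doc_freq[label][term] = number of documents of that class containing the term
    doc_freq = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        terms = set(tokenize(text))  # binary bag of words for one document
        vocab |= terms
        doc_freq[label].update(terms)

    total_docs = len(docs)
    ranking = {}
    for label, n_in in docs_per_class.items():
        n_out = total_docs - n_in
        scores = {}
        for term in vocab:
            in_count = doc_freq[label][term]
            out_count = sum(
                doc_freq[other][term] for other in docs_per_class if other != label
            )
            # Laplace-smoothed probability that a document contains the term
            p_in = (in_count + alpha) / (n_in + 2 * alpha)
            p_out = (out_count + alpha) / (n_out + 2 * alpha)
            scores[term] = math.log(p_in / p_out)
        # Highest score first; ties broken alphabetically for stable output
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        ranking[label] = ranked[:top_n]
    return ranking


# Hypothetical toy data, only to show the call
docs = [
    "the engine and brakes of the car",
    "car engine repair",
    "the soup recipe with garlic",
    "garlic bread recipe",
]
labels = ["auto", "auto", "food", "food"]

result = distinguishing_terms(docs, labels, top_n=2)
for label, terms in result.items():
    print(label, [term for term, _ in terms])
```

A term such as "the" that occurs equally often in both classes gets a score of zero, while terms concentrated in one class rise to the top. Terms ranked this way could then be used to build the search query per category.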

Cheers, Kilian

finddistinguishingterms.zip

Dnreb · August 25, 2015, 4:48pm

Hello Kilian,

Thanks a lot for your help.

Bernd

system · June 2, 2023, 9:49pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.