Noun Phrase Extract for Product Descriptions

I have an Excel file that contains a list of items purchased from various vendors. The list consists mainly of product descriptions or hand-typed descriptions of what was purchased. For example, you may see “1/2" Stainless Steel Crescent Wrench”, “Dinner”, or “Package of Erasable Markers for conference room”. I’d like to extract the noun phrases from these so that I can start grouping similar items (something like “Crescent Wrench” and “Erasable Markers”). I can pull the data in and tag the documents with POS tags. Is there a way to extract just the noun phrases? So far I have only been able to filter out verbs and the like, but not to isolate noun phrases. I’ve done this in Python fairly easily but would like to learn how to do it in KNIME. Additionally, is there a preferred tagger for something like this? I’ve tried a few but keep getting terms such as ‘dia’ tagged as NNS. Maybe I need to come up with a dictionary of abbreviated measures to remove.

Hi @UtilityHawk,

I’d try the OpenNLP NE Tagger node. The node recognizes standard named entities in English based on the Apache OpenNLP models and assigns them the corresponding tag. It uses built-in OpenNLP models for the selected entity type. Starting with KNIME Analytics Platform 3.7.0, you can also pass a model as input to the OpenNLP NE Tagger node.

Alternatively, you could come up with a dictionary and tag those names with the Dictionary Tagger node. Later on, you can keep only the tagged names with the Tag Filter node (this also applies to the OpenNLP NE Tagger node). The Dictionary Tagger (Multi Column) node recognizes named entities specified in one or more dictionary columns and assigns a specified tag value and type.
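To make the tag-then-filter idea concrete outside of KNIME, here is a minimal Python sketch. It is not how the Dictionary Tagger or Tag Filter nodes are implemented. The dictionary contents, the `MEASURE` tag name, and the helper names are all assumptions for illustration.

```python
import re

# Hypothetical dictionary of abbreviated measures; extend it for your data.
MEASURE_ABBREVIATIONS = {"dia", "in", "ft", "mm", "cm", "pkg", "qty"}

def tokenize(description):
    """Split a description into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", description.lower())

def tag_tokens(tokens, dictionary, tag="MEASURE"):
    """Pair each token with `tag` if it is in the dictionary, else None."""
    return [(tok, tag if tok in dictionary else None) for tok in tokens]

def filter_by_tag(tagged, tag, keep=True):
    """Keep (or drop, if keep=False) the tokens carrying `tag`."""
    return [tok for tok, t in tagged if (t == tag) == keep]

# Made-up description in the style of the purchase list.
tagged = tag_tokens(tokenize("2 in dia Stainless Steel Pipe"), MEASURE_ABBREVIATIONS)
remaining = filter_by_tag(tagged, "MEASURE", keep=False)
print(remaining)
```

With `keep=True` the same call returns only the dictionary hits, which mirrors keeping the tagged terms. Note that a short entry such as "in" will also match the preposition, so the dictionary probably needs some care.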

KNIME also lets you train a model for named-entity recognition. For that you can use the StanfordNLP NE Learner node. It creates a conditional random field model based on documents and a dictionary of entities that occur in the documents. After that you can use the StanfordNLP NE Tagger node to tag new documents and validate the model. If this sounds interesting, you can check the example workflow available on the EXAMPLES Server at the following path: 08_Other_Analytics_Types/01_Text_Processing/14_NER_Tagger_Model_Training. More info available here:

Hope that helps.

Hi @UtilityHawk, we don’t have a noun phrase tagger model integrated so far. Tagger models include POS and NER but no phrases.

I suggest, as @Vincenzo already mentioned, using the POS tagger and filtering for nouns, or using a dictionary tagger.
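Since there is no phrase tagger, one possible workaround is to approximate noun phrases from the POS tags by grouping adjacent adjective and noun tokens. The Python sketch below illustrates that heuristic only. It assumes Penn Treebank tags (`NN*`, `JJ*`), the function name is made up, and the tags in the example are hand-assigned rather than real tagger output.

```python
def noun_phrases(tagged_tokens):
    """Group maximal runs of adjective/noun tokens into candidate phrases.

    `tagged_tokens` is a list of (token, Penn Treebank tag) pairs.
    Each run is trimmed to end on a noun and is dropped if it has no noun.
    """
    phrases, run = [], []

    def flush():
        # Drop trailing adjectives so the phrase ends on a noun.
        while run and not run[-1][1].startswith("NN"):
            run.pop()
        if run:
            phrases.append(" ".join(tok for tok, _ in run))
        run.clear()

    for token, tag in tagged_tokens:
        if tag.startswith("NN") or tag.startswith("JJ"):
            run.append((token, tag))
        else:
            flush()  # any other tag ends the current run
    flush()
    return phrases

# Hand-tagged example (tags assigned by hand for illustration).
example = [
    ("Package", "NN"), ("of", "IN"), ("Erasable", "JJ"), ("Markers", "NNS"),
    ("for", "IN"), ("conference", "NN"), ("room", "NN"),
]
print(noun_phrases(example))
```

The quality of the phrases depends entirely on the tagger: a mis-tagged abbreviation such as ‘dia’ marked as NNS would end up inside a phrase, so removing such terms first is likely to help.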

Cheers, Kilian
