I have a column of item descriptions and another column of the suppliers that supply those items. The descriptions are entered manually, so the level of detail depends on who entered them: I can have "black bag", "bag", "sml bag", "bag 984/12", etc., each with suppliers matched to it. I have over 10 million rows. Is there a node, or a set of nodes, that I can use to intelligently group/categorize every description that contains "bag" as a bag? Similarly, I have descriptions that contain the word "tube", and I want to group/categorize those too, so I can see which suppliers provide the bulk of each category of products. I've read the data in and run Strings to Document. I don't know whether I need to do tagging. Frequencies only give me the number of times a term appears per document (mostly 1 or 2), but I can't see how often each term in my item description column appears across the whole dataset. Any pointers would be helpful. Again, the goal is to group/categorize the items based on the dominant keywords that appear in the whole dataset. Thanks.
See http://tech.knime.org/forum/knime-textprocessing/dictionary-tagging-help for a possible approach. You don't necessarily need the Textprocessing nodes; it can also be done with regular string manipulation and matching nodes.
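To make the plain string-matching idea concrete, here is a rough sketch of the logic in Python rather than as a KNIME workflow. It is only an illustration under some assumptions: the sample rows, the supplier names, the MIN_ROWS threshold and the tokenize/categorize helpers are all made up for the example, and "dominant keyword" is taken to mean "a word that occurs in at least MIN_ROWS descriptions". The same three steps (count terms over the whole column, keep the frequent ones as a dictionary, match and group by supplier) should map onto string manipulation, matching and grouping nodes.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample rows: (item description, supplier).
rows = [
    ("black bag", "Acme"),
    ("bag", "Acme"),
    ("sml bag", "Bolt"),
    ("bag 984/12", "Acme"),
    ("steel tube", "Core"),
    ("tube 10mm", "Core"),
    ("Tube, clear", "Bolt"),
]


def tokenize(description):
    """Lowercase the text and keep alphabetic runs only (drops codes like 984/12)."""
    return re.findall(r"[a-z]+", description.lower())


# Step 1: count in how many descriptions each term occurs, across the whole dataset.
term_counts = Counter()
for description, _supplier in rows:
    term_counts.update(set(tokenize(description)))

# Step 2: keep terms that occur in at least MIN_ROWS descriptions as category keywords.
# MIN_ROWS is an assumed threshold; tune it (or hand-edit the list) for real data.
MIN_ROWS = 3
keywords = [term for term, n in term_counts.most_common() if n >= MIN_ROWS]


def categorize(description):
    """Return the first (most frequent) keyword found in the description."""
    tokens = set(tokenize(description))
    for keyword in keywords:
        if keyword in tokens:
            return keyword
    return "uncategorized"


# Step 3: count rows per supplier within each category.
suppliers_by_category = defaultdict(Counter)
for description, supplier in rows:
    suppliers_by_category[categorize(description)][supplier] += 1

for category, suppliers in suppliers_by_category.items():
    print(category, suppliers.most_common())
```

Each pass reads the rows one at a time and keeps only the term and supplier counters in memory, so the same approach should in principle work for a large table read in chunks. A description that contains two keywords is assigned to the more frequent one, which is a simplification you may want to change.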