I have a training dataset for Search Data that looks something like:
Phrase | Value1 | Value 2 | Value 3 | Classification |
---|---|---|---|---|
abc | 100 | 50 | 17 | Brand Term |
abc def | 99 | 99 | 10 | Category Term |
There are a mix of numerical values that are associated with the delivery of the Search phrase (cost, position, click rate, etc) as well as the phrase itself. I have manually classified 2% of terms into either Brand Term or Category Term buckets and want to build a Learner to classify the other 98%.
I would like to use a combination of terms in Phrase as well as Values to build a classifier to most accurately predict outcome. For instance when Brand name is in Phrase it is always classified as Brand Term so seems this would be valuable feature to carry over.
My problem is I am not sure how to conver the "Phrase" column into a feature set of the Bayesian Learner. I would assume I need to create either a Bit or Term Vector or an individual column per unique term within Phrase (example below):
Phrase | Value1 | Value 2 | Value 3 | abc | def | Classification |
---|---|---|---|---|---|---|
abc | 100 | 50 | 17 | 1 | 0 | Brand Term |
abc def | 99 | 99 | 10 | 1 | 1 | Category Term |
Any good examples of doing something like this either via Text Processing nodes or other KNIME nodes?