I have some data from a raw data file that is organized into five columns.
1. How do I go about creating a classification tool that would classify each line of my data as either an inclusion ("1") or an exclusion ("0")? I am guessing that I will have to tell my model what qualifies as an inclusion or an exclusion based on specific keywords or chains of words?
2. How do I assign weights to some of the keywords? Some words, if present in a line of data, should outweigh other keywords and tip the result toward inclusion over exclusion, or vice versa.
3. Finally, is there a way to manually train the model when it is classifying incorrectly?
Welcome to the forum. It sounds like you are trying to build a classification model that would incorporate some text processing features. We have several workflows on the KNIME Hub that could help you get started.
Here’s an example workflow that classifies the sentiment of IMDB movie reviews. It includes all the usual steps of a text analysis problem: preprocessing, transformation, and classification. If you want to apply weights to certain keywords, you could adjust their term frequencies as needed before the classification step.
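To make the keyword-weighting idea a bit more concrete, here is a minimal sketch in plain Python rather than KNIME nodes. It is not taken from the example workflow. The keyword list, the weight values, the sample rows, and the function names are all made up for illustration, and it swaps in a simple threshold on a weighted sum where the workflow would use a trained classifier.

```python
import re
from collections import Counter

# Hypothetical keyword weights (assumed values, not from any real data set):
# positive weights push a row toward inclusion (1), negative toward exclusion (0).
KEYWORD_WEIGHTS = {
    "approved": 2.0,
    "eligible": 1.0,
    "rejected": -3.0,
    "duplicate": -1.5,
}


def term_frequencies(text):
    """Lowercase the text, split it into word tokens, and count each token."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def weighted_score(row, weights=KEYWORD_WEIGHTS):
    """Join the row's columns into one string and sum weight * count per keyword."""
    counts = term_frequencies(" ".join(str(cell) for cell in row))
    return sum(weights.get(term, 0.0) * n for term, n in counts.items())


def classify(row, weights=KEYWORD_WEIGHTS, threshold=0.0):
    """Return 1 (inclusion) if the weighted score exceeds the threshold, else 0."""
    return 1 if weighted_score(row, weights) > threshold else 0


# Two made-up rows with five columns each.
rows = [
    ("A-17", "eligible", "approved by reviewer", "2021", "ok"),
    ("A-18", "eligible", "rejected as duplicate", "2021", "ok"),
]
labels = [classify(r) for r in rows]
```

In the second row, "rejected" and "duplicate" outweigh "eligible", so the row ends up as an exclusion. In a real workflow you would more likely feed the reweighted term frequencies into a learner than apply a fixed threshold like this.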
If you want to manually provide training labels for your data in an iterative way, we have a workflow for that too. We also have a detailed blog post that describes the workflow.
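As a rough picture of what iterative manual correction can look like, here is a small plain-Python sketch. It is not what the linked workflow does internally. It assumes a perceptron-style update as a stand-in, and the class name, method names, and sample row are hypothetical.

```python
import re
from collections import Counter


def tokens(row):
    """Join a row's columns and count its lowercase word tokens."""
    text = " ".join(str(cell) for cell in row).lower()
    return Counter(re.findall(r"[a-z0-9]+", text))


class CorrectableClassifier:
    """Perceptron-style linear classifier over word counts (illustrative only)."""

    def __init__(self, learning_rate=1.0):
        self.weights = {}  # token -> learned weight
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, row):
        """Return 1 (inclusion) if the linear score is positive, else 0."""
        score = self.bias + sum(
            self.weights.get(term, 0.0) * n for term, n in tokens(row).items()
        )
        return 1 if score > 0 else 0

    def correct(self, row, true_label):
        """Apply a manual label; weights change only if the prediction was wrong."""
        error = true_label - self.predict(row)  # +1, 0, or -1
        if error:
            for term, n in tokens(row).items():
                old = self.weights.get(term, 0.0)
                self.weights[term] = old + self.learning_rate * error * n
            self.bias += self.learning_rate * error
        return error != 0


model = CorrectableClassifier()
row = ("A-17", "eligible", "approved by reviewer", "2021", "ok")
before = model.predict(row)      # an untrained model has score 0, so it predicts 0
changed = model.correct(row, 1)  # a reviewer marks this row as an inclusion
after = model.predict(row)       # the same row is now classified as 1
```

Each call to `correct` nudges the weights of the words in the misclassified row toward the label you supplied, so repeated corrections gradually shape the model. Rows that were already classified correctly leave the weights unchanged.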
Take a look through these resources and see what you think!