Sentiment Analysis (Classification) of Documents with NGram Features

The workflow reads textual data from a CSV file and converts the strings into documents. The documents are then preprocessed, i.e. filtered and stemmed, in the Preprocessing metanode. In the Feature Creation metanode, two feature sets and their corresponding document vectors are created: the top set of vectors contains only single-word features, while the bottom set contains both single-word and 2-gram features. After the document vectors have been created, the sentiment class is extracted and two predictive models are built and scored: one based only on single-word features, the other on single-word and 2-gram features. Both models are compared in the ROC curve node.
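To make the difference between the two feature sets concrete, here is a minimal Python sketch of the feature creation step. It is not what the KNIME nodes do internally: the function names (tokenize, ngram_features, document_vectors) and the two sample sentences are hypothetical, filtering and stemming are skipped, and binary (present/absent) document vectors are an assumption, since the description does not say which vector type the workflow uses.

```python
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    # Lowercase and split on whitespace; a simplified stand-in for the
    # filtering and stemming done in the Preprocessing metanode.
    return text.lower().split()


def ngram_features(tokens: List[str], max_n: int) -> List[str]:
    # Collect every n-gram of length 1..max_n, joined with a single space.
    features = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            features.append(" ".join(tokens[i:i + n]))
    return features


def document_vectors(docs: List[str], max_n: int) -> Tuple[List[str], List[List[int]]]:
    # Build a sorted vocabulary over all documents, then one binary vector
    # per document (1 = feature occurs in the document, 0 = it does not).
    feature_sets = [set(ngram_features(tokenize(d), max_n)) for d in docs]
    vocab = sorted(set().union(*feature_sets))
    vectors = [[1 if term in fs else 0 for term in vocab] for fs in feature_sets]
    return vocab, vectors


# Hypothetical sample documents, for illustration only.
docs = ["not good at all", "very good movie"]

# Top branch: single-word features only.
vocab_1, vectors_1 = document_vectors(docs, max_n=1)

# Bottom branch: single-word and 2-gram features.
vocab_2, vectors_2 = document_vectors(docs, max_n=2)
```

With single-word features, both sample documents share the feature "good" even though their sentiment differs. The 2-gram set additionally contains "not good" and "very good", which is the kind of context the second model can pick up on.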

This is a companion discussion topic for the original entry at