Lexicon Based Approach for Sentiment Analysis

This workflow shows how to apply a lexicon-based approach to sentiment analysis of the IMDB reviews dataset. The dataset contains movie reviews previously labelled as positive or negative. The lexicon-based approach assigns a sentiment to each word in a text based on dictionaries of positive and negative words. A sentiment score is then calculated for each document as: (number of positive words - number of negative words) / total number of words.
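
For readers who want to see the score formula outside of KNIME, here is a minimal Python sketch of the same calculation. It is not the workflow's actual implementation: the function name `sentiment_score`, the simple regex tokenizer, and the tiny example word sets are all assumptions made for illustration.

```python
import re


def sentiment_score(text, positive_words, negative_words):
    """Return (positive count - negative count) / total word count.

    Hypothetical helper: lowercases the text and splits it on anything
    that is not a letter or apostrophe. The workflow's own preprocessing
    may differ.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        # Avoid division by zero for empty documents.
        return 0.0
    pos = sum(1 for t in tokens if t in positive_words)
    neg = sum(1 for t in tokens if t in negative_words)
    return (pos - neg) / len(tokens)


# Toy dictionaries, made up for this example only.
positive_words = {"good", "great", "fun"}
negative_words = {"bad", "boring"}

score = sentiment_score(
    "A great cast, but a boring and bad plot.",
    positive_words,
    negative_words,
)
print(score)  # one positive word, two negative words, nine words in total
```

In this sketch a score below zero suggests a negative document and a score above zero a positive one. A word listed in both sets is counted once as positive and once as negative, so it cancels out in the numerator.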


This is a companion discussion topic for the original entry at https://kni.me/w/zp_hhUROHNXToZHX

Hello, please note that each dictionary contains some duplicate entries. Additionally, the words boast, dig, excuse, fine, fun and keen appear in both dictionaries.

Somehow my document type is “undefined” when I use this workflow with my own data. Do you know what might be the problem?