Hi, I am looking at the example workflow for lexicon-based sentiment analysis. I noticed that the cut-off used in the Rule Engine node to classify a score as positive is the average sentiment score rather than zero. Why is the mean used instead of zero? Also, since I have only limited knowledge of sentiment analysis, is there a beginner-friendly reference on this classification rule that explains the basics?
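To make the first question concrete, here is a minimal Python sketch of the two rules I am comparing. It is not the KNIME workflow itself: the scores are made up, `classify` is a hypothetical helper, and the strict "greater than" comparison is my assumption about how the rule is written.

```python
from statistics import mean

# Hypothetical per-document sentiment scores (made up for illustration).
scores = [0.8, 0.5, 0.1, -0.2, 0.4]

def classify(scores, threshold):
    """Label a score POS if it is strictly above the threshold, else NEG."""
    return ["POS" if s > threshold else "NEG" for s in scores]

# Cut-off at the mean score, as I understand the example workflow.
by_mean = classify(scores, mean(scores))

# Cut-off at zero, which is what I expected.
by_zero = classify(scores, 0.0)

# Documents whose label depends on which cut-off is used.
changed = [s for s, a, b in zip(scores, by_mean, by_zero) if a != b]
print(by_mean, by_zero, changed)
```

With these made-up numbers, a mildly positive score between zero and the mean is labelled negative by the mean rule but positive by the zero rule, which is the difference I am asking about.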
Secondly, in the KNIME example workflow, I believe the actual class of each document is already known, and that this class column is extracted with the Category To Class node (if I understand it correctly). If I build the confusion matrix from my own data, would it suffice for my manually classified dataset to cover only a portion of the documents? For example, I want to study 1,000 online reviews, but only 100 of them (10%) would be manually classified and used in the confusion matrix.
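In other words, I would be doing something like the following sketch, where only the manually labelled subset contributes to the confusion matrix. Again this is plain Python with made-up document IDs and labels, and `confusion_counts` is a hypothetical helper, not a KNIME node.

```python
from collections import Counter

# Hypothetical data: a predicted class for every document,
# but a manual (actual) class for only a subset of them.
predicted = {1: "POS", 2: "NEG", 3: "POS", 4: "NEG", 5: "POS", 6: "NEG"}
manual = {2: "NEG", 3: "NEG", 5: "POS"}

def confusion_counts(predicted, manual):
    """Count (actual, predicted) pairs over the manually labelled documents only."""
    return Counter((manual[d], predicted[d]) for d in manual if d in predicted)

counts = confusion_counts(predicted, manual)

# Accuracy estimated from the labelled subset, not from all documents.
accuracy = sum(n for (a, p), n in counts.items() if a == p) / sum(counts.values())
print(counts, accuracy)
```

The accuracy here is only an estimate based on the labelled sample, so my question is whether a 10% sample is large enough for that estimate to be trusted.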
Thank you in advance!