Category to class = "undefined"

Hi everyone,

I am performing dictionary-based sentiment analysis on the KNIME analytics platform. However, when I run the Category to Class node, it returns ‘undefined’. I have been unable to solve this problem. Below, I am providing the workflow and the data I used. Can you please help me?
Thanks everyone in advance.

final2.knwf (64.9 KB)

Since my data file is quite large, I’m providing the Kaggle link. I used the Books_rating.csv file.

Hi @eceknl,

The explanation might be simple once you check the node description:

The value of the class is the document’s category as string.

Since you converted a string to a document, it has no category, hence the class is empty. The data lacks that information so you might want to enrich it with other sources.
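To make the point above concrete, here is a minimal Python sketch of the behavior. It is a toy model only, not KNIME's actual implementation: the `Document` class and the `category_to_class` function are hypothetical names, and the `"undefined"` placeholder is an assumption that mirrors the output reported in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Document:
    """Toy stand-in for a text-processing document (not KNIME's actual class)."""
    text: str
    category: Optional[str] = None  # stays None when no category is assigned


def category_to_class(doc: Document) -> str:
    # Assumption: an unset category surfaces as the placeholder "undefined",
    # mirroring the output reported in this thread.
    return doc.category if doc.category else "undefined"


plain = Document("Great book, loved it.")
labelled = Document("Great book, loved it.", category="positive")

print(category_to_class(plain))     # no category was ever set
print(category_to_class(labelled))  # category carried over as the class
```

In other words, the class can only be as informative as the category that was attached when the document was created.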

On another note, some easily overlooked default node settings can make your analysis inaccurate. For example, if you pay attention to the console you will see:

WARN Row Sampling 3:5 Class column contains more classes (2062649) than sampled rows (100000)

However, I was not able to execute the entire workflow up to the metanode in question because other files are missing.

Here are a few tips you can use to optimize your workflow and reduce compute time and memory usage:

  1. CSV Reader: You filter for one column after reading the huge file. You can configure the CSV Reader to read only that column.
  2. Dimensional reduction: Prior to sampling, remove duplicates and empty rows. If you have only one column, it is also much faster to use a GroupBy to get the unique values than to use a Duplicate Row Filter.
  3. Regarding #2: when the GroupBy aggregates to a set without missing values, it both returns the unique values and removes the missing ones in one go, improving performance by 300 to 400 % :wink:
  4. Memory usage: Since you convert 3 million long strings to documents, lots of cells and tables are cached. I have a system with 64 GB of low-latency memory, a beefy CPU and a fast SSD. Yet, it started to sweat, which is unusual. You might want to consider using the Don’t Save nodes to discard interim data.
  5. Your metanode has a loose end (unconnected port).
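Outside of KNIME, the idea behind tips 1 to 3 (keep only the one column you need, then drop missing values and duplicates in a single pass) can be sketched in plain Python. This is only an illustration under assumptions: the column name `text` and the sample rows are made up, and the function name `unique_non_missing` is hypothetical.

```python
import csv
import io


def unique_non_missing(csv_file, column):
    """Return the distinct, non-empty values of one column, in first-seen order."""
    seen = set()
    result = []
    for row in csv.DictReader(csv_file):
        value = (row.get(column) or "").strip()
        if not value or value in seen:
            continue  # skip missing values and duplicates in the same pass
        seen.add(value)
        result.append(value)
    return result


# Made-up sample data; the real file and its column names may differ.
sample = io.StringIO(
    "id,text\n"
    "1,Loved it\n"
    "2,\n"
    "3,Loved it\n"
    "4,Too long\n"
)
print(unique_non_missing(sample, "text"))
```

Doing both steps before sampling means every later node sees fewer and cleaner rows.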

With the performance optimizations in place, the saving operation was sped up substantially, from a few minutes to mere seconds. Here is the improved workflow. You might want to search for sources about the document author and its category to enrich your data and accomplish your goal.

Best
Mike

Hi @mwiegand ,

I forgot to add the positive and negative dictionaries. The information you provided is very valuable, and I will take it into consideration. I have placed the missing files below. Could you please review the workflow again? I still haven’t been able to solve the problem. Thank you for your attention!

MPQA-OpinionCorpus-NegativeList.csv (59.3 KB)
MPQA-OpinionCorpus-PositiveList.csv (32.8 KB)

I am on it, but KNIME crashed … I was working on many workflows in parallel xD

Update: Here you go … as predicted, when the category is set, the document class is too.

Hi @mwiegand ,

Actually, when I run the Category to Class node, I want it to give me the values ‘positive’, ‘negative’, and ‘neutral’. Because after this node, I need to run the Score node to obtain a confusion matrix. So unfortunately, I cannot use the last suggestion you provided.

After making the last few corrections, I’ll upload the workflow again:
final2.knwf (59.6 KB)

I’ve also included the workflow I’ve taken as an example:
26_Sentiment_Analysis_Lexicon_Based_Approach.knwf (1.2 MB)

So did you manage to resolve your issue, or does it remain unresolved?

I couldn’t solve the problem. :confused:

No offense, but as you might imagine, it is a bit challenging when support is given but nothing, not even the ordering, is adopted. I am also just guessing, since there are no English comments or a screenshot to guide me to the point in your workflow where you try to execute the sentiment analysis.

Since you have provided a reference workflow I assume this is the location, correct?

Comparing the example workflow with yours and the one I provided, I do see vast differences, especially with regard to the “Calculate Score” metanode.

You might want to check this section out.

PS: I just copied the Metanode “Calculate Score” over, executed it and compared it to the one you had present. The results are pretty much identical and both contain a sentiment analysis column.

The Color Manager and Scorer nodes are both producing the same results. I don’t seem to get what the issue is.

Best
Mike

I apologize for my incomplete explanation. I have detailed the problem further. I hope this explanation is more helpful.

#1
The image below is the output table of the Category to Class node from the example workflow I followed:

#2
The image below is the confusion matrix I obtained as a result of the Scorer node from the example workflow I followed.

Now I am moving on to my own workflow:

#3
The image below is the output table of the Category to Class node from my workflow. The values are coming in as undefined:

#4
The image below is the confusion matrix I obtained as a result of the Scorer node from my workflow. However, since the values in the Category to Class node came as ‘undefined’, I did not get a correct output:

That’s my problem. I want to obtain a confusion matrix as in the example workflow, but I can’t.
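For illustration, the dependency described in #3 and #4 can be sketched in a few lines of Python. This is a toy example with made-up labels, not the KNIME Scorer itself; the function name `confusion_matrix` is hypothetical. It only shows that a confusion matrix cannot be meaningful when every true class is `undefined`.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs."""
    return Counter(zip(actual, predicted))


# Hypothetical predictions from a lexicon-based scorer.
predicted = ["positive", "negative", "neutral", "positive"]

# Case 1: true classes are available (made-up labels).
actual_ok = ["positive", "negative", "positive", "positive"]
# Case 2: every true class is "undefined", as in the output described above.
actual_undefined = ["undefined"] * len(predicted)

for name, actual in [("with categories", actual_ok), ("undefined", actual_undefined)]:
    matrix = confusion_matrix(actual, predicted)
    # Only pairs where the true and predicted labels agree count as correct.
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    print(name, dict(matrix), "accuracy:", correct / len(predicted))
```

In the second case no predicted label can ever match the true class, so the matrix has a single `undefined` row and no correct predictions.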

Thanks for sharing. I am currently trying to finish my workflow for the current KNIME Data Challenge while also trying to enjoy a vacation with my family. I will pause my work on this solution for the moment and resume later this week or next, and I hope for your understanding. Though, chances are someone else will pick this up too :wink:

I understand you. Nevertheless, thank you very much. You’ve put in a lot of effort. I wish you a wonderful vacation!!! :smiley:
