Sentiment Analysis Training with Integrated Deployment

You need to consider carefully the function of each node and what each one is doing.

In the case of the Cell Splitter, it doesn’t matter that it has a column dimension of 2405, since that node is performing as intended. The important bit happens in the Column Filter, where extraneous columns (like the text and sentiment columns) should be removed, as I showed above. You only want the columns with titles like Text_Arr* coming out of this node, so that the proper dimensionality of 2400 is fed to the neural network.
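Outside KNIME, the effect of that Column Filter can be sketched in a few lines of Python (the column names below are illustrative stand-ins; only the `Text_Arr*` prefix matters):

```python
# Column names as they come out of the Cell Splitter: a couple of
# extraneous columns plus the 2400 Text_Arr* feature columns.
columns = ["text", "sentiment"] + [f"Text_Arr{i}" for i in range(2400)]

# Keep only the Text_Arr* columns, mirroring what the Column Filter
# node should do before the data reaches the neural network.
kept = [c for c in columns if c.startswith("Text_Arr")]

print(len(kept))  # 2400 -- the dimensionality the Keras network expects
```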

Between the Strings to Document node and the output of the Text Preprocessing node, the number of columns changes from 3 to 4 because an additional column is added - “Preprocessed Document” - as a result of all the things that are happening inside the component (filtering, stemming, and so on). This is working as designed.
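As a loose analogy for what happens inside that component (the stopword list and the crude suffix-stripping "stemmer" below are simplified stand-ins, not KNIME's actual preprocessing algorithms):

```python
STOPWORDS = {"the", "a", "is"}

def stem(word):
    # Very crude stemmer: strip one common suffix.
    # Real stemmers (e.g. Porter) are considerably smarter.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    tokens = [t.lower() for t in text.split()]          # case conversion
    tokens = [t for t in tokens if t not in STOPWORDS]  # stopword filtering
    return [stem(t) for t in tokens]                    # stemming

print(preprocess("The service is amazing"))  # ['service', 'amaz']
```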

The number of columns after Zero Pad is much larger because the words in the document have been converted into a numeric representation, with one column for each word (or token).
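As a rough sketch of the index-and-pad idea (the vocabulary and the fixed length of 6 are made up for illustration; the real workflow pads to 2400 columns):

```python
# Map each token to an integer index, then zero-pad every document
# to the same fixed length so each position becomes one input column.
vocab = {"great": 1, "product": 2, "poor": 3, "quality": 4}
max_len = 6  # fixed input width; the real workflow uses 2400

def encode(tokens):
    indices = [vocab.get(t, 0) for t in tokens]      # 0 = unknown/padding
    return indices + [0] * (max_len - len(indices))  # zero-pad to max_len

print(encode(["great", "product"]))  # [1, 2, 0, 0, 0, 0]
print(encode(["poor", "quality"]))   # [3, 4, 0, 0, 0, 0]
```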

You are getting 2402 columns after the Zero Pad for reasons I explained above. You can fix that in the Column Filter node.

Also, I just wanted to step back for a moment and mention that starting your KNIME journey by combining both deep learning and text analytics is very advanced. If you’re getting frustrated, it might be beneficial to start with some of our L1 / L2 training courses and work up to this point.

You can find more information about our course offerings here:

Thank you for your suggestion.

Many thanks for your help with this workflow; it now executes almost to the end.

Unfortunately the last node (Scorer) does not have the data columns to compare.

I have tried to find where this occurred, but without success. Please help.

I have uploaded the workflow.

Hi @stockdaa -

Oddly, I can no longer see your workflow on the Hub when I search for your username. Did you delete it by accident?

Do you have a direct link for it?

I just checked and, as you say, the workflow was not on the Hub. I must have deleted it by accident. Sorry about that.

I have now uploaded it.

Please let me know if you can see it.

I added a Joiner node near the end to address this problem. Workflow here:

Having said that, all of the predictions from your Keras node have the same probability (0.521), so all of the predictions themselves are the same (1). It’s not immediately obvious to me why that is, but as I mentioned before, training a neural network on so few records (40) probably contributes to the low quality of the predictions… there’s just not enough for the model to work with yet.

Thank you for finishing off the final stage of the workflow. I would not have thought of this solution.

There is a lot of research that only has a relatively small data set, which is still essential for the development of new theory. Are there any activation functions, for example, that can deal with small data sets, or are there any in development?

Would you please explain how I can use this workflow to find the sentiment of new data, so that I can see how well it works? I will be using an Excel file for the new data.

@stockdaa Let me answer the second question first.

Pages 316-317 of the Codeless Deep Learning with KNIME book describe how you can use the automatically created deployment workflow (generated using Integrated Deployment functionality) to predict sentiment of unseen data. In the book, the example given is the Sentiment Analysis with Call Workflow - Deployment – KNIME Hub workflow. The key to this workflow is the Call Workflow (Table Based) node.

Getting back to your first question: there are definitely other ways you can approach sentiment analysis that are less complex than deep learning, and may work better with smaller datasets anyway. Some examples are available here as part of our From Words to Wisdom book: vincenzo/Public – Chapter7 – KNIME Hub

That is very helpful, thank you.

I have added some more data to the input training set file but the workflow is not using it. I can see the new data in the configuration window of the Excel reader node but not in the preprocessing training data.
Please suggest how I can fix this, so that the new data is processed along with the previous data.

A review of the dictionary created by KNIME shows that it includes numbers, despite the workflow including a number-removing node. Some numbers are preceded by a dollar or pound sign, or have a letter at the end. Also, place names and company names, which do not affect sentiment, have been included in the dictionary.
Any ideas for a workaround, please?

Hi @stockdaa -

Can you upload the latest version of your workflow to the Hub and link it here? Without that, it’s hard to understand your approach.

The latest workflow version has now been uploaded; attached is the updated Excel data file that I would like to use.

I can see the recent WF you uploaded to the Hub - thanks. A few comments -

  1. I think you forgot to attach the updated Excel file you mention in your previous post.

  2. It’s not clear to me what your “new data” is. I see a single Excel spreadsheet being input with 58 rows: 40 are being used to train the model, and 18 to test the model. Do you expect something different? If so, did you reset and re-execute the Excel Reader to make sure the new data is being read into KNIME?

  3. As for the dictionary, it’s not uncommon that you would have to customize this a bit further. I’d suggest sorting alphabetically and then using a Row Filter node to get rid of “words” starting with numbers or symbols (which will show up at the beginning and end of a sorted dictionary).
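The sort-then-filter idea in point 3 can be sketched in plain Python with a regular expression (the sample dictionary entries below are invented for illustration):

```python
import re

# Sample dictionary entries, including the kinds of junk tokens described:
# currency-prefixed numbers, plain numbers, and numbers with a trailing letter.
dictionary = ["$500", "£20", "2021", "20a", "apple", "banana", "cherry"]

# Keep only entries that start with a letter, dropping "words" that begin
# with digits or symbols (they cluster at the ends of a sorted list).
words = [w for w in sorted(dictionary) if re.match(r"^[A-Za-z]", w)]

print(words)  # ['apple', 'banana', 'cherry']
```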

I reset and re-executed the Excel Reader, but the Keras Network Learner node says that the input shape does not match the tensor shape [2329] vs [2400], when all I have done is increase the number of rows of data in the Excel spreadsheet.

Have you uploaded an updated workflow to the Hub?

EDIT: I see one dated Jan 27, let me check.

Yes, that is the one. I uploaded it.