You need to carefully consider the function of each node, and what it's actually doing.
In the case of the Cell Splitter, it doesn’t matter that it has a column dimension of 2405, since that node is performing as intended. The important bit happens in the Column Filter, where extraneous columns (text, sentiment, and so on) should be removed, as I showed above. You only want the columns with titles like Text_Arr* coming out of this node, so that the proper dimensionality of 2400 is fed to the neural network.
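If it helps to see the idea outside of KNIME, here’s a minimal pandas sketch of what that Column Filter step amounts to. The DataFrame and its contents are made up for illustration; only the Text_Arr* prefix and the 2400 width come from your workflow:

```python
import pandas as pd

# Tiny stand-in for the Cell Splitter output: two vector columns plus
# the extraneous text/sentiment columns that must not reach the network.
df = pd.DataFrame({
    "Text": ["great product", "terrible"],
    "Sentiment": [1, 0],
    "Text_Arr0": [12, 7],
    "Text_Arr1": [45, 0],
})

# Keep only the Text_Arr* columns - exactly what the Column Filter should pass on.
features = df[[c for c in df.columns if c.startswith("Text_Arr")]]
print(features.shape)  # (2, 2) in this toy example; (n_rows, 2400) in the real workflow
```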
Between the Strings to Document node and the output of the Text Preprocessing node, the number of columns changes from 3 to 4 because an additional column is added - “Preprocessed Document” - as a result of all the things that are happening inside the component (filtering, stemming, and so on). This is working as designed.
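For intuition, here’s a rough pure-Python sketch of why exactly one column is added: the preprocessing derives a new “Preprocessed Document” column from the existing text. The column names, the stopword list, and the crude suffix stripping below are simplified stand-ins, not what the component actually does internally:

```python
# Simplified stand-ins for the component's real stopword filtering and stemming.
STOPWORDS = {"the", "a", "is", "are"}

def preprocess(text: str) -> str:
    tokens = [t.lower() for t in text.split() if t.lower() not in STOPWORDS]
    # crude suffix stripping, standing in for a real stemmer
    return " ".join(t[:-1] if t.endswith("s") else t for t in tokens)

# A row with three hypothetical original columns...
row = {"RowID": 0, "Text": "The movies are great", "Sentiment": "positive"}
# ...gains a fourth, derived column - hence 3 columns becoming 4.
row["Preprocessed Document"] = preprocess(row["Text"])
print(row)
```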
The number of columns after Zero Pad is much larger because the words in the document have been converted into a numeric representation, with one column for each word (or token).
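Conceptually, that step works something like this sketch. Only the fixed width of 2400 comes from your workflow; the vocabulary here is a toy stand-in:

```python
# Each document becomes a fixed-width row of token indices, padded with zeros.
WIDTH = 2400
vocab = {"movie": 1, "great": 2, "bad": 3}

def zero_pad(tokens, width=WIDTH):
    indices = [vocab.get(t, 0) for t in tokens]  # one index per token
    return (indices + [0] * width)[:width]       # pad with zeros to the fixed width

row = zero_pad(["great", "movie"])
print(len(row), row[:5])  # 2400 [2, 1, 0, 0, 0]
```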
You are getting 2402 columns after the Zero Pad for reasons I explained above. You can fix that in the Column Filter node.
Also, I just wanted to step back for a moment and mention that starting your KNIME journey by combining both deep learning and text analytics is very advanced. If you’re getting frustrated, it might be beneficial to start with some of our L1 / L2 training courses and work up to this point.
You can find more information about our course offerings here:
I added a Joiner node near the end to address this problem. Workflow here:
Having said that, all of the predictions from your Keras node have the same probability (0.521), so all of the predictions themselves are the same (1). It’s not immediately obvious to me why that is, but as I mentioned before, training a neural network on so few records (40) probably contributes to the low quality of the predictions… there’s just not enough for the model to work with yet.
Thank you for finishing off the final stage of the workflow. I would not have thought of this solution.
A lot of research works with relatively small data sets, which are still essential for the development of new theory. Are there any activation functions, for example, that can deal with small data sets, or are there any in development?
Would you please explain how I can use this workflow to find the sentiment of new data, so that I can see how well it works? I will be using an Excel file for the new data.
@stockdaa Let me answer the second question first.
Pages 316-317 of the Codeless Deep Learning with KNIME book describe how you can use the automatically created deployment workflow (generated using Integrated Deployment functionality) to predict sentiment of unseen data. In the book, the example given is the Sentiment Analysis with Call Workflow - Deployment – KNIME Hub workflow. The key to this workflow is the Call Workflow (Table Based) node.
Getting back to your first question, there are definitely other ways you can approach sentiment analysis that are less complex than using deep learning, but may work better with smaller datasets anyway. Some examples are available here as part of our From Words to Wisdom book: vincenzo/Public – Chapter7 – KNIME Hub
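To give a flavor of the lexicon-based approach those examples take, here’s a toy sketch: no model is trained, so the size of the dataset matters much less. The word lists are invented for illustration and are not the dictionaries used in the book:

```python
# Toy sentiment lexicons - stand-ins, not the book's actual dictionaries.
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "terrible", "awful", "hate"}

def lexicon_sentiment(text: str) -> str:
    tokens = text.lower().split()
    # Score = positive hits minus negative hits; no training involved.
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(lexicon_sentiment("I love this great product"))   # positive
print(lexicon_sentiment("this was a terrible purchase")) # negative
```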
I have added some more data to the input training set file, but the workflow is not using it. I can see the new data in the configuration window of the Excel Reader node, but not in the preprocessed training data.
Suggestions please on how to fix this so that the new data is processed along with the previous data.
A review of the dictionary created by KNIME shows that it includes numbers, despite the workflow containing a number-removing node. Some numbers are preceded by a dollar or pound sign, or have a letter at the end. Also, place names and company names have been included in the dictionary, even though they have no impact on sentiment.
Any ideas for a workaround, please?
I can see the recent WF you uploaded to the Hub - thanks. A few comments -
I think you forgot to attach the updated Excel file you mention in your previous post.
It’s not clear to me what your “new data” is. I see a single Excel spreadsheet being input with 58 rows: 40 are being used to train the model, and 18 to test the model. Do you expect something different? If so, did you reset and re-execute the Excel Reader to make sure the new data is being read into KNIME?
As for the dictionary, it’s not uncommon that you would have to customize this a bit further. I’d suggest sorting alphabetically and then using a Row Filter node to get rid of “words” starting with numbers or symbols (which will show up at the beginning and end of a sorted dictionary).
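If you’d rather express that filter as a pattern, a regex along the lines of `^[A-Za-z]` keeps only entries that start with a letter. Here’s a small Python sketch of the same idea (the sample terms are made up):

```python
import re

# Made-up sample dictionary entries illustrating the problem described above.
terms = ["$100", "3rd", "acme", "brilliant", "£5", "terrible"]
keep = re.compile(r"^[A-Za-z]")  # keep only entries that start with a letter

clean = [t for t in terms if keep.match(t)]
print(clean)  # ['acme', 'brilliant', 'terrible']
```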
I reset and re-executed the Excel Reader, but the Keras Network Learner node says that the input shape does not match the tensor shape ([2329] vs [2400]), when all I have done is increase the number of rows of data in the Excel spreadsheet.