Sentiment Analysis Training with Integrated Deployment

I want to run the Sentiment Analysis Training with Integrated Deployment use case from the KNIME book Codeless Deep Learning with KNIME, but with some new data from an Excel spreadsheet rather than the .h5 files currently used.

Can anyone tell me which nodes to change, and why, please?

Hi @stockdaa -

The H5 files in that workflow are for reading and writing the model created as part of the training process, and don’t have anything to do with the data used to train the model. You don’t need to worry about the H5 files at all; they are solely a mechanism for creating the model in the training workflow, which can then be used in the deployment workflow.

If you want to use your own data, open the “Read and partition data” metanode and swap out the Table Reader node with one pointing to your own data. This can be an Excel Reader, or whatever is appropriate for your dataset.

You may also want to take a look inside the “Preprocess training set” node and check the parameters for the dictionary size (Create Dictionaries metanode) and the number of words per document (Truncate metanode) to make sure these are appropriate for your dataset.
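
To make those two parameters concrete, here is a minimal Python sketch of what a dictionary size and a words-per-document limit do to the text. It is only an illustration, not the logic inside the KNIME metanodes: the function names, the index 0 for padding, and the single shared index for out-of-dictionary words are all assumptions.

```python
from collections import Counter


def build_dictionary(documents, dictionary_size):
    """Map the dictionary_size most frequent words to indices 1..dictionary_size."""
    counts = Counter(word for doc in documents for word in doc.lower().split())
    frequent_words = [word for word, _ in counts.most_common(dictionary_size)]
    return {word: index for index, word in enumerate(frequent_words, start=1)}


def encode_and_truncate(document, dictionary, max_words):
    """Encode one document as exactly max_words integers."""
    # Assumption: every word outside the dictionary shares one extra index.
    unknown_index = len(dictionary) + 1
    indices = [dictionary.get(word, unknown_index) for word in document.lower().split()]
    indices = indices[:max_words]  # cut documents that are too long
    # Assumption: documents that are too short are padded with 0.
    return indices + [0] * (max_words - len(indices))


reviews = ["good good movie", "bad movie", "good plot"]
dictionary = build_dictionary(reviews, dictionary_size=2)
encoded = [encode_and_truncate(review, dictionary, max_words=4) for review in reviews]
print(dictionary)
print(encoded)
```

A small dictionary pushes many words into the "unknown" index, and a small word limit cuts off the end of long texts, so both values are worth checking against the vocabulary and typical document length of the new data.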

Hi Scott

That is great.

Thank you for your prompt response.

What would I need to change if I wanted to train (retrain) the workflow on some new data using Excel based data?

There were some issues with the accuracy (reproducibility) of the original workflow.

I’m afraid I don’t understand the question. Each time you run the training workflow with new data, you are training a new model. Maybe you mean something else by “training” than I do…?

Hi Scott

I think I may have misunderstood your comment ‘The H5 files in that workflow are for reading and writing the model created as part of the training process’. I assumed that they would still be present after changing to an Excel source.

Hi @stockdaa

Your .H5 files with the model information will still exist, as long as you don’t overwrite them.

gr. Hans

Thank you for the screen shot.

I want to replace all the .h5 files with Excel spreadsheet data and run the model. Which nodes do I need to remove and replace?

When you retrain the model using your own data as described above, the H5 files will be overwritten, and the new neural network that’s created will reflect your data.

You don’t need to do anything manually to the H5 files yourself.

Thank you for explaining this.

If I use a different format from .h5, such as Excel, to retrain the model, will I need a new node?

Hi @stockdaa

The .h5 file is an object created by the Keras Network Learner. The h5 file stores the model information (e.g. weights) from the trained model. If you want to score a new dataset, you can load the .h5 (model information) with the Keras Network Reader and score your (pre-processed) data (your Excel file…?) with the Keras Network Executor.
You can’t store the model created by the Keras Network Learner in an Excel file.
gr. Hans

Thank you for explaining how this works.

If I want to train the model with new data from an Excel file what nodes do I need to change?

Hi @stockdaa

In the workflow there is a metanode “Read and partition data”. In there you will find a Table Reader node. Replace the Table Reader node with an Excel Reader node pointing to your Excel file. Be sure your input from Excel matches the original column names and column formats (use e.g. the Transformation tab in the Excel Reader node, or the Table Manipulator node). After that you are ready to train the model on your new data…
But you have to pay attention to the way the dictionary is created (metanode “Preprocess training set”). Or, to put it a different way, think about which dictionary is most helpful in your case. It all depends on your input data and the (“business”) question you want to answer. Training a model is not just following the “IKEA manual”.
gr. Hans
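
The advice about matching column names and formats can be sketched as a quick check on the rows read from Excel. This is only an illustration in Python, not something the workflow contains: the column names `Text` and `Sentiment`, their types, and the function name are hypothetical placeholders, so substitute whatever the original Table Reader output actually uses.

```python
# Hypothetical schema: replace with the column names and types
# that the original Table Reader node produces.
EXPECTED_COLUMNS = {"Text": str, "Sentiment": str}


def find_schema_problems(rows, expected_columns):
    """Return a list of human-readable problems; an empty list means the rows match."""
    problems = []
    for row_number, row in enumerate(rows, start=1):
        for name, expected_type in expected_columns.items():
            if name not in row:
                problems.append(f"row {row_number}: missing column '{name}'")
            elif not isinstance(row[name], expected_type):
                problems.append(
                    f"row {row_number}: column '{name}' should be {expected_type.__name__}"
                )
    return problems


# Rows as they might come out of a spreadsheet (made-up example values).
rows = [
    {"Text": "great film", "Sentiment": "positive"},
    {"Text": "dull plot", "sentiment": "negative"},  # wrong capitalisation
    {"Text": 5, "Sentiment": "positive"},  # number where text is expected
]
for problem in find_schema_problems(rows, EXPECTED_COLUMNS):
    print(problem)
```

Column names are compared exactly here, so a difference in capitalisation counts as a missing column, which is the kind of mismatch that renaming in the Transformation tab is meant to fix.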

Thank you for your detailed explanation and tips.

I have replaced the Table Reader node with the Excel Reader node and I need to increase the shape size to enable more data (text) to be read, from 80 to 3000.

I have changed the shape size in the Keras nodes but I am getting an error and I cannot find where to make adjustments.

Please let me have any suggestions. Work Flow below:

Hi @stockdaa -

I believe this is happening because you changed the dimensions of the Keras Input Layer - but you also need to configure the Truncate component inside the Preprocess Training Set metanode to match.

Can you try that?
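
As a rough illustration of the suspected mismatch (a Python sketch under assumptions, not the actual KNIME or Keras check): if the Truncate step still produces 80 indices per document while the input layer now expects 3000, every row has the wrong length. The function name and the example values are hypothetical.

```python
def find_length_mismatches(encoded_documents, input_layer_length):
    """Return (row_number, actual_length) for each row whose length differs."""
    return [
        (row_number, len(document))
        for row_number, document in enumerate(encoded_documents, start=1)
        if len(document) != input_layer_length
    ]


input_layer_length = 3000  # new shape set in the Keras Input Layer
truncate_length = 80  # hypothetical: Truncate component left at the old value

# Three dummy documents as the preprocessing would emit them.
encoded_documents = [[0] * truncate_length for _ in range(3)]

mismatches = find_length_mismatches(encoded_documents, input_layer_length)
print(mismatches)
```

Once both settings use the same number, the list of mismatches is empty.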

Hi Scott

Thank you for your feedback.

I amended the Truncate node to match the new shape size and initially the run timed out, so I increased the timeout limit. When I executed it again, it came back with an error indicating that the shape size was not consistent throughout the workflow. I have looked everywhere I can think of but cannot find where to adjust it.

Please let me have your suggestions.

Best wishes

Alan

Are you able to upload the workflow itself instead of screenshots, possibly with some dummy data? Without an actual workflow, at best we are just speculating about what the problem might be. I was able to recreate your error message on my end before, but that was just a lucky guess about the inputs.

The error message you posted is the same as before, so is it possible that when you re-saved the workflow you had the Keras Input Layer set to 80 again, instead of 3000?

knime://LOCAL/Codeless%20Deep%20Learning%20with%20KNIME/Chapter_10/Integrated_Deployment/Sentiment_Analysis_Training_With_Integrated_Deployment%201%20ExcelV01

Unfortunately that is a link to a workflow in your local repository. Maybe you could try uploading it to the KNIME Hub? Scroll down a little on that page for some gifs showing you how that can be done.

I have uploaded the workflow to the KNIME Hub. Are you able to see it?