Is it feasible to perform multi-label image classification in KNIME?
I want to create a supervised model workflow based on ~2500 TIF images (with 2 labels in a CSV), using an 80/20 train/test split and 20-30 epochs. Once I have tuned the model, I would like to separately validate about 200 images for which I have no labels, to see if I can score them as part of an assignment I am working on for school.
How do I perform multi-label image classification in KNIME? Thanks.
Just to be sure: when you talk about “multi-label” image classification, do you mean you want to classify images (or objects in images) into one of two classes? Or do you mean that images or objects can have multiple labels?
Thank you for your reply @christian.dietz. To clarify, the images can have multiple labels (2 exactly); I’m looking at medical pathology slides:
- The 1st label refers to the anatomic region where the tissue was sampled
- The 2nd label refers to the level of toxic cells present in that image
I then want to pass unlabeled images through the trained model, to see if they can be classified by both labels correctly.
You probably want to use two different models, each solving one of your sub-tasks, and combine the results afterwards. That is possible in KNIME.
So you basically have a trained model for multi-label classification? If you say you have 200 images without labels: how do you want to “validate” them? Without ground truth you can basically only do this a) by hand or b) with some kind of uncertainty scoring.
What exactly do you want to do?
Thank you for your message @mereep
I am trying to build an automatic scoring model trained on labeled images (but used on unlabeled images) that can tell me 2 things about the unlabeled images: a) how toxic the cells are (numerical, 0.01-1.0) and b) what anatomic tissue region the cells came from (numerical, 1 to 80).
I will train the model using an 80/20 train/test split with over 2K labeled images.
I then have a distinct validation set of images, which are unlabeled for this assignment; I won’t know those labels until the end of the month, when my instructor reveals them to our teams. The goal is to try to score these unlabeled images to predict toxicity level (label 1) and anatomic region (label 2), all in the SAME workflow, so that I have a single scoring table to review the results.
BTW - I am very new to KNIME, but I recognize the platform’s potential for structuring such research.
Ok, so you basically just want to apply a trained model to unseen data? Do you have a model already?
If not, basically do the following: load the data, then create one regression model for the toxicity level and one classification model with 80 classes for the region. Train them separately on the labeled data and apply both models to the unseen data. You probably want to join the results afterwards and write them out as CSV.
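Outside KNIME, the final “join the results and write them out” step could be illustrated in plain Python like this; all column names, file names, and values here are made up for the sketch, and in KNIME you would use the Joiner and CSV Writer nodes instead:

```python
import pandas as pd

# Hypothetical prediction tables produced by the two separate models
# (column and file names are illustrative, not from this thread).
region_preds = pd.DataFrame({
    "image": ["img_001.tif", "img_002.tif"],
    "region": [12, 57],            # classifier output, 1..80
})
toxicity_preds = pd.DataFrame({
    "image": ["img_001.tif", "img_002.tif"],
    "toxicity": [0.31, 0.88],      # regressor output, 0.01..1.0
})

# Join the two result tables on the image id and write a single CSV,
# mirroring the Joiner + CSV Writer nodes in KNIME.
scores = region_preds.merge(toxicity_preds, on="image")
scores.to_csv("scores.csv", index=False)
print(scores)
```

The result is one scoring table with both predictions per image, which matches the “single scoring table” goal described above.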
Thank you @mereep. I don’t have the model yet; I was looking at existing workflows like “Supervised Image Segmentation” or even the Cell Profiler plugin. Again, being new to KNIME, I’m still trying to figure out the approach for using multi-label outputs in a scoring neural network.
I was thinking to simply: 1) import the images into a table, add 2 columns to the table and populate the respective attributes; 2) partition the data 80/20; 3) train a 2-label-output neural network to build the scoring model that I will use on unseen data.
Would you be able to point me to a workflow example for your suggestion regarding the regression and classification models?
Since you don’t have a model yet, I would suggest first going for an easier-to-handle model like Random Forests. They can do both classification and regression, and they happily work without burdens like standardization/normalization. Later on you can replace them with DL if you want and try to improve the performance. That is easy once you have your workflow.
You will find the models (regression + classification) ready-made in the node repository. Just search for “Random Forest”. You can see examples to get you started in:
There you will also see a way to split your data rows into training and test sets.
Of course you can exchange models.
I would also suggest treating your two problems as separate. First try to solve them independently of each other.
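To make the two-Random-Forest idea concrete, here is a minimal scikit-learn sketch on synthetic data. Everything here is illustrative: real inputs would be feature vectors extracted from the images, and in KNIME you would use the corresponding Learner/Predictor nodes rather than Python code:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)

# Synthetic stand-in for per-image feature vectors (500 images, 16 features).
X = rng.rand(500, 16)
region = rng.randint(1, 81, size=500)   # anatomic region label, 1..80
toxicity = rng.rand(500)                # toxicity score, ~0..1

# One 80/20 split reused for both targets, as suggested above.
X_tr, X_te, r_tr, r_te, t_tr, t_te = train_test_split(
    X, region, toxicity, test_size=0.2, random_state=0)

# Two independent models: classification for the region, regression
# for the toxicity level.
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, r_tr)
reg = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_tr, t_tr)

region_pred = clf.predict(X_te)     # one class per test image
toxicity_pred = reg.predict(X_te)   # one continuous score per test image
print(region_pred.shape, toxicity_pred.shape)
```

The two problems stay fully separate here; only their predictions would be joined at the end.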
While I agree with mereep for the general case, this is actually a very nice use case for deep learning.
Essentially you would have to modify this example workflow to fit your setting.
The mentioned workflow shows how to perform transfer learning i.e. how to adapt an already trained model to a different task than it was originally trained on.
Alternatively, you could train a new model from scratch, but 2000 images are probably too few for that to work very well.
Anyway, I have to add the disclaimer that a DL based approach is unlikely to work out of the box and will require you to invest some time into learning a bit about DL.
Hi @mereep. I tried the RF example you suggested, but the Tree Learner only accepts ordinary columns in the table (e.g. String, Double, Integer, etc.) as attributes to learn the model on. So this approach is not going to work for me as-is, since I need the model to learn on images as well as numerical values.
I have my ~2K labeled training images in a joined table (joined with their respective attributes RID and Score). I need to build a model, trained on those images, that predicts the RID and Score of 200 similar unlabeled images. The goal is a model that can automatically score this type of biological image with no labels.
Row ID contains my image file name, RID is the anatomic region, and Score is the toxicity level.
Does this help clarify things now?
Any guidance is greatly appreciated, thanks.
Technically it is possible to train an RF-based model, but you would have to extract features from the images that an RF understands, e.g. using the feature extraction nodes from the KNIME Image Processing extension.
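For intuition, a hand-rolled stand-in for such feature extraction might look like the following in plain numpy. The KNIME Image Processing nodes provide far richer descriptors (texture, shape, etc.); the feature choice, bin count, and image size here are all illustrative:

```python
import numpy as np

def extract_features(img):
    """Tiny hand-crafted feature vector for one 8-bit grayscale image.

    A toy stand-in for KNIME Image Processing feature extraction nodes.
    """
    img = img.astype(float)
    # 8-bin intensity histogram over the full 8-bit range.
    hist, _ = np.histogram(img, bins=8, range=(0, 256))
    return np.concatenate([
        [img.mean(), img.std(), img.min(), img.max()],
        hist / img.size,          # normalized intensity histogram
    ])

# One fake 64x64 8-bit image -> a 12-dimensional row an RF can learn on.
rng = np.random.RandomState(0)
img = rng.randint(0, 256, size=(64, 64), dtype=np.uint8)
features = extract_features(img)
print(features.shape)
```

Each image becomes one ordinary numeric row, which is exactly the kind of input the Tree Learner accepts.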
However, as I explained in my previous post, your task is well suited for a deep learning based approach.
My first approach would be based on transfer learning because 2000 images are too few to train a model from scratch.
Your images look large enough to easily work with models such as VGG, Inception or ResNet which makes it even easier to adapt the example workflow I mentioned in my first post to your needs.
You only have to replace the single output layer in the example with two output layers: one with 80 units and a softmax activation to predict the anatomic region, and one with a single unit and a sigmoid activation to predict the toxicity.
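As a numeric sketch of what those two output heads compute (not the actual Keras workflow; the 512-dimensional backbone features, the batch size, and the random weights are stand-ins for illustration):

```python
import numpy as np

rng = np.random.RandomState(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Shared features from the pretrained backbone for a batch of 4 images.
feats = rng.randn(4, 512)

# Head 1: 80-way softmax -> a probability per anatomic region class.
W_region, b_region = rng.randn(512, 80) * 0.01, np.zeros(80)
region_probs = softmax(feats @ W_region + b_region)

# Head 2: single sigmoid unit -> one toxicity score in (0, 1) per image.
W_tox, b_tox = rng.randn(512, 1) * 0.01, np.zeros(1)
tox = sigmoid(feats @ W_tox + b_tox)

print(region_probs.shape, tox.shape)
```

In the workflow these would simply be two Keras dense output layers attached to the same backbone, trained jointly with a classification loss on the first head and a regression-style loss on the second.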
See the attached workflow for an example of how you could tackle the model training.
TransferLearning.knwf (15.5 KB)
Ideally you only need to replace the Data Generator node with your input pipeline.
Note that it is crucial to normalize your images in the way the pretrained model expects.
For this, you can replicate the input pipeline of the workflow I linked in my first post.
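One common convention is scaling pixel values to [0, 1], which is just a per-pixel division, e.g. in numpy terms (batch shape and sizes here are illustrative; some pretrained networks instead expect mean-subtracted or [-1, 1] inputs, so match whatever the linked pipeline does):

```python
import numpy as np

# Fake batch of two 8-bit RGB images; in the real workflow this
# normalization is done by KNIME nodes, not by hand per image.
rng = np.random.RandomState(0)
batch = rng.randint(0, 256, size=(2, 224, 224, 3), dtype=np.uint8)

# Scale 0..255 integer pixels to floats in [0, 1].
batch01 = batch.astype(np.float32) / 255.0

print(batch01.min(), batch01.max())
```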
Thank you for this @nemad - I really appreciate it. I would like to attempt it over the weekend; I am still having an issue getting the Python extension to work on my Windows machine, so I cannot start until I fix this. Any chance you could assist with my open post on this? Setting up the KNIME Python extension on Windows with Anaconda Installed
Also, @nemad - can you please elaborate on “Your images have to be normalized between [0, 1]”? I have 2K+ images; is this performed by hand? Any detail you can provide is appreciated, thanks.
Here is an updated version of the workflow that includes the normalization as well as some additional preprocessing for deep learning:
TransferLearning.knwf (38.2 KB)
Thank you for this @nemad. I will try as soon as I can get my Python extension working.
Hi @nemad - I finally integrated Python with KNIME.
I was trying to initialize the DL Python Network Creator, but I get this error:
ImportError: No module named 'keras'

ERROR DL Python Network Creator 0:2 Execute failed: Traceback (most recent call last):
  File "C:\Program Files\KNIME\plugins\org.knime.python2_3.6.2.v201811051558\py\PythonKernelBase.py", line 278, in execute
    exec(source_code, self._exec_env, self._exec_env)
  File "", line 2, in
ImportError: No module named 'keras'
I believe I installed Keras over the weekend; I forgot how to check in cmd.
@asrichardson please check knime.com/deeplearning for instructions. Keras has to be installed in version 2.1.6 in your py35_knime environment.
Hi @nemad. I made some great progress - I was able to fully establish my Python and Keras environment per the help of @christian.dietz.
First: I’m having some issues with how I “read” and label my images via the Joiner node. I want to create a table with 4 columns: Image Name, Image, RID, Score. Currently I have 1 table and a set of images I want to join; they are in the screenshots below. My join is adding “RowID” after the image name, which is not what I want.
Second: the Image Calculator node turned the images 100% grey; they are all the same. I’m not sure what function needs to happen here.
Third: I was able to convert RID to a String, but I’m at a loss for how to configure the Domain Calculator node.
I’ve attached my workflow for reference.
Keras_Transfer_Learning.knwf (32.0 KB)
Any thoughts are appreciated, thanks.