Machine learning: recognizing a yellow or orange jacket on a person

Hello,

I am trying to build a classifier that can recognize whether or not a person is wearing a yellow or orange jacket. I have a sample of pictures tagged “pos” or “neg” to train the classifier. At the moment the performance is very disappointing: 35% of the images are misclassified, and I cannot find any pattern when I look at the misclassified images. Evidently the learner does not know what exactly it is supposed to look for.
I suppose my main problem comes from the image processing. I cannot find the right pre-processing and processing steps to extract the relevant information (the color and shape data of the points of interest).

Could someone here suggest some ways to resolve this? I have attached my workflow with some images to illustrate.

Thank you in advance for the time you spend on my issue.

Amaury D.

test_gilet.knwf (28.6 KB)
pics_sample.zip (631.2 KB)

Hi @AmauryD,
In your case the best approach is not to classify the images as a whole, but to work with patches of the image.
This requires labeled image patches, though, so you will unfortunately need to do some manual work. I marked the part of the workflow where this is necessary.

I have attached a modified workflow (workflow download) to give you an idea of how to achieve your goal. Here are some notes on my approach and on what you should try in order to get better results:

  • Try playing around with the patch size and the overlap, so that the patches are adequately sized.
  • You need to over-sample the patches that contain people wearing a vest, as they are much rarer. I just over-sampled the minority class by a factor of 3 using SMOTE, but you should investigate a more sophisticated approach.
  • The Random Forest model cannot use the RGB values directly, so I calculated some basic features for each channel separately and appended them. If the results are not good enough on your data, you should try some additional features.
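
Outside KNIME, the patching and per-channel features from the notes above could look roughly like the following Python sketch. It is only an illustration under assumptions: the function names `extract_patches` and `channel_features`, the tiny test image, and the choice of min, max, mean and standard deviation are made up for this example and are not taken from the workflow.

```python
from statistics import mean, pstdev

def extract_patches(image, size, overlap):
    """Cut an image (rows of (r, g, b) pixels) into square, overlapping patches.

    Returns (top, left, patch) tuples. Patches that would run past the
    image border are skipped in this sketch.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - size + 1, step):
        for left in range(0, width - size + 1, step):
            patch = [row[left:left + size] for row in image[top:top + size]]
            patches.append((top, left, patch))
    return patches

def channel_features(patch):
    """Min, max, mean and standard deviation per RGB channel, appended into one vector."""
    features = []
    for channel in range(3):
        values = [pixel[channel] for row in patch for pixel in row]
        features += [min(values), max(values), mean(values), pstdev(values)]
    return features

# Hypothetical 4x6 test image: left half dark grey, right half orange.
image = [[(40, 40, 40)] * 3 + [(255, 140, 0)] * 3 for _ in range(4)]
patches = extract_patches(image, size=4, overlap=2)
for top, left, patch in patches:
    print(top, left, channel_features(patch))
```

Changing `size` and `overlap` here corresponds to experimenting with the patch size and overlap; additional features would simply be appended to the same vector.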

best,
Gabriel

Hi @gab1one

Thank you for your very helpful answer! A lot of things are much clearer now.

I have already made some attempts, but only with a very small sample (manually annotating all the positive patches of the full sample will take a little time).
There is a lot in your workflow (thanks again for that), so in order to give constructive feedback I am studying each part closely to understand exactly what it does.

However, I have a first question about the partitioning: should I partition before the patching step? Partitioning afterwards implies that patches from the same picture could end up in both the training part AND the test part.

Thanks for your help.

best regards,
Amaury

In the workflow you are training a model that learns to differentiate between image patches that contain people wearing a vest and patches that do not. The model does not know about the whole image, as it only ever sees the patches. For this reason, including patches from the same image in both the training and the test set is not a problem at all.

best,
Gabriel

Hello Gabriel,

Yes, but it could become a problem if I want to build a model that gives a prediction like “there is or there is not someone wearing a vest in this image”. To do so, I am thinking of partitioning my sample first and applying the same processing to the test and training samples separately. What is your opinion on this?

And about the Group By node: how can I assign the value “pos” when at least one patch of the image has been predicted “pos”?

Thank you

AD

I think the best approach is to split the problem into two steps:

  1. Train a model that can detect whether there are people wearing a vest in a small image (patch).
  2. Slice your larger image into patches and apply the model from step 1 to each of them. If any of the patches is predicted positive, we label the whole image as positive. This happens in the Transfer patch prediction to whole image node.

Done this way, the individual problems are much easier for the models to solve, and combining them achieves your original goal.

I just took another look at the workflow and noticed I did not do this nicely. You can, for example, use the Unique concatenate aggregation method on the patch classes to get a list of all classes that occur in the image. Follow it up with a Rule Engine node that contains logic like the following:

$Prediction (class)$  MATCHES ".*pos.*" => "pos"
TRUE => "neg"

This transforms that string into a prediction for the image.
See the attached updated workflow for reference: workflow download.
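
For readers who want to check the aggregation logic outside KNIME, here is a minimal Python sketch of the same idea: collect the unique predicted classes per image, then label the image “pos” if “pos” is among them. The function name `image_level_predictions` and the sample rows are hypothetical. Note that the sketch tests exact membership in a set, whereas the rule above matches “pos” anywhere in the concatenated string.

```python
from collections import defaultdict

def image_level_predictions(patch_predictions):
    """Label an image "pos" if at least one of its patches was predicted "pos".

    patch_predictions: iterable of (image_id, predicted_class) pairs.
    """
    classes_per_image = defaultdict(set)
    for image_id, predicted_class in patch_predictions:
        # Plays the role of the "Unique concatenate" aggregation.
        classes_per_image[image_id].add(predicted_class)
    return {
        # Plays the role of the two rules: "pos" if present, otherwise "neg".
        image_id: "pos" if "pos" in classes else "neg"
        for image_id, classes in classes_per_image.items()
    }

# Hypothetical patch-level output for two images.
rows = [("img_01", "neg"), ("img_01", "pos"), ("img_01", "neg"),
        ("img_02", "neg"), ("img_02", "neg")]
result = image_level_predictions(rows)
print(result)
```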

best,
Gabriel

Hello Gabriel,

Thank you for all your precious advice.

This is what I tried:

  1. partitioned (stratified)
  2. created patches for the two samples
  3. annotated the class label on the training sample + calculated features on the test sample
  4. calculated features on the training sample + oversampled
  5. trained the random forest model
  6. applied this model to the test sample

Result: between 40 and 45% misclassified.

I tried different tile sizes and overlaps for the patches, less oversampling and no oversampling, and a randomly drawn partition; nothing improved the model.

When I annotate the class label, do I have to be very selective? I mean annotating “pos” only when the vest is clearly visible to the naked eye, and not when it is cut off or far away in the background. To be honest I do not think so, because when I did the partitioning after the patching and annotation (as in this workflow west_detection-V1.knwf (75.7 KB)) the model’s performance was not too bad, with a 20% error. So this model detects the vests relatively well, and I do not understand what goes wrong with this workflow west_detection-V2.knwf (112.1 KB).

Should I remove pictures like these some_complicated_image.zip (35.5 KB) from my project and focus only on the simplest ones to finally find an efficient model?

Thank you for the time you spend helping me, I am really grateful to you.

Best regards,
AD

Because the patches overlap, you can assume that an image with a cut-off vest will most of the time also contain a patch where the vest is more or less “whole”. Annotating only those patches as positive could result in your classifier learning to detect only patches with whole vests. I would not do that, though, as it complicates your learning goal. Instead I would annotate as pos all patches where you can clearly see that they contain a vest.

You can take a look at the wrongly classified image patches to see whether you need to adjust your annotation. If your model misses many vests, you need to be less strict; if your model detects a lot of vests where there are none, you need to be stricter.
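
As a rough aid for that inspection, the two kinds of error can be counted separately. The Python sketch below assumes you have exported the true and predicted patch labels as two lists of equal length; the function name `annotation_hint` and the wording of the hints are made up for this example, and the counts are only a starting point for looking at the actual patches.

```python
def annotation_hint(true_labels, predicted_labels, positive="pos"):
    """Count missed vests (false negatives) and false alarms (false positives)."""
    pairs = list(zip(true_labels, predicted_labels))
    missed = sum(1 for t, p in pairs if t == positive and p != positive)
    false_alarms = sum(1 for t, p in pairs if t != positive and p == positive)
    if missed > false_alarms:
        hint = "many missed vests: consider annotating less strictly"
    elif false_alarms > missed:
        hint = "many false alarms: consider annotating more strictly"
    else:
        hint = "errors are balanced: inspect the individual patches"
    return missed, false_alarms, hint

# Hypothetical labels for six patches.
truth = ["pos", "pos", "pos", "neg", "neg", "neg"]
predicted = ["neg", "neg", "pos", "neg", "pos", "neg"]
print(annotation_hint(truth, predicted))
```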

When you split on the patches, your model is trained on a more diverse dataset and learns to detect the vests in a larger number of environments. It also sees a larger number of different patches, which is especially important when using an oversampling strategy. Depending on your number of images, you can try the following:

  1. Take a validation dataset from your data (about 10% of the images) which you never show the model during training (partitioning on the whole images).
  2. Cut the remaining images into patches and annotate.
  3. Now partition the patches on their class.
  4. Calculate features and optimize sample sizes.
  5. Train the model on these patches and try to find the best parameters for your model (employing some form of cross-validation is a good idea here, to ensure these parameters are stable).
  6. Once you are happy with the performance of the model (or you cannot improve it any more), you can test the performance using the validation dataset from step 1.
  7. If the performance is unsatisfactory, start again with a freshly drawn validation set.
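
The data-splitting part of these steps (1, 3 and 4) can be sketched in plain Python as follows. This is an illustration under assumptions, not the workflow itself: the function names, the 70% training fraction, the synthetic image IDs and labels are all hypothetical, and the oversampling shown is naive duplication rather than SMOTE.

```python
import random
from collections import defaultdict

def hold_out_images(image_ids, fraction=0.1, seed=0):
    """Step 1: set aside whole images that the model never sees during training."""
    ids = sorted(set(image_ids))
    random.Random(seed).shuffle(ids)
    n_validation = max(1, round(len(ids) * fraction))
    return ids[n_validation:], ids[:n_validation]   # (development, validation)

def stratified_partition(patches, train_fraction=0.7, seed=0):
    """Step 3: split (image_id, label) patches so both parts keep the class ratio."""
    by_class = defaultdict(list)
    for patch in patches:
        by_class[patch[1]].append(patch)
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train += members[:cut]
        test += members[cut:]
    return train, test

def oversample(train, minority="pos", factor=3):
    """Step 4: naive oversampling by repeating minority patches (not SMOTE).

    Apply this to the training part only, never to the test or validation data.
    """
    extra = [p for p in train if p[1] == minority] * (factor - 1)
    return train + extra

# Hypothetical data: 20 images, 10 patches each, one positive patch per image.
image_ids = [f"img_{i:02d}" for i in range(20)]
development, validation = hold_out_images(image_ids)
patches = [(image_id, "pos" if k == 0 else "neg")
           for image_id in development for k in range(10)]
train, test = stratified_partition(patches)
train = oversample(train)
print(len(development), len(validation), len(train), len(test))
```

Because the validation images are removed before any patch is cut, none of their patches can leak into training or into the patch-level test set.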

These pictures have a very low resolution and the vests are hard to detect, even for a human. I would try to build a model for the simpler images first, then either build a special model for such images afterwards, or discard them completely.

best,
Gabriel

Hi Gabriel,

So I excluded the more complicated pictures to focus on the simplest ones, and I was stricter during the annotation step. That improved the model: the error rate dropped by 15 percentage points.

I wondered whether I could use Haralick features. They are intended for gray-level images, but after splitting the RGB channels, can we consider each channel as a single gray-level image and extract Haralick features from it?

Regarding the procedure you proposed: I am not sure I understand on which sample I apply the prediction model, and how I can evaluate the performance if the validation dataset is only used at the end of the process. Do you mean that I make a first partition on the whole images, e.g. 10% for validation, and apply the following process to the remaining 90%:
Create patches > annotate patches from pos images > extract features > partition into training set (70%) and test set (30%) > oversample the pos class in the training set > train the model > apply the prediction model to the test set
And I loop over this process until I find the best parameters, and only then apply the prediction model to the validation set.
Do I understand correctly?

Thank you for your answer.

Best regards,
AD

Yes, this is something you should try out. You can later append them all together, just like I did with the simple features that were calculated per channel (min, max, etc.).
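
To make the per-channel idea concrete, here is a small pure-Python sketch that builds a gray-level co-occurrence matrix for each RGB channel and derives three Haralick-style statistics from it. It is a simplification under stated assumptions: only horizontally adjacent pixels are counted, values are quantised to 8 levels, only contrast, angular second moment and homogeneity are computed, and the patch must be at least two pixels wide. All function names are made up for this example; they are not the node used in the workflow.

```python
def glcm(channel, levels=8, max_value=255):
    """Normalised gray-level co-occurrence matrix for horizontally adjacent pixels."""
    quantised = [[value * levels // (max_value + 1) for value in row]
                 for row in channel]
    counts = [[0] * levels for _ in range(levels)]
    for row in quantised:
        for a, b in zip(row, row[1:]):
            counts[a][b] += 1
            counts[b][a] += 1          # symmetric: count both directions
    total = sum(map(sum, counts))      # assumes width >= 2, so total > 0
    return [[c / total for c in row] for row in counts]

def texture_features(p):
    """Contrast, angular second moment and homogeneity of a co-occurrence matrix."""
    cells = [(i, j, p[i][j]) for i in range(len(p)) for j in range(len(p))]
    contrast = sum((i - j) ** 2 * v for i, j, v in cells)
    angular_second_moment = sum(v * v for _, _, v in cells)
    homogeneity = sum(v / (1 + (i - j) ** 2) for i, j, v in cells)
    return [contrast, angular_second_moment, homogeneity]

def haralick_per_channel(patch):
    """Treat each RGB channel as a gray-level image and append its features."""
    features = []
    for ch in range(3):
        channel = [[pixel[ch] for pixel in row] for row in patch]
        features += texture_features(glcm(channel))
    return features

# Hypothetical patches: one uniform orange, one with black/white stripes.
uniform = [[(255, 140, 0)] * 4 for _ in range(2)]
striped = [[(0, 0, 0), (255, 255, 255)] * 2 for _ in range(2)]
print(haralick_per_channel(uniform))
print(haralick_per_channel(striped))
```

The resulting vector has three values per channel and can be appended to the simple per-channel statistics in the same way.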

Yes. On that training set you can also experiment with different partition sizes; the 30/70 split is not necessarily the best one, so you need to figure out what works well for you.
Doing it this way reduces overfitting and shows you how your modeling process performs on data it has not seen at all before.
best,
Gabriel

OK, thank you very much Gabriel. You gave me everything I needed to stand on my own two feet and complete my project successfully.

I am very happy I could help out :)
