Help with Text Analytics - How to automatically group skills into broader skill categories

Dear community,

I hope you are doing well!
I am trying to find a way to automatically group skills into broader categories. I have a database where students enter their skills as free text, and there is no way to change the collection method or system. As a result, I get many related entries. For example, one student might enter “data visualization”, another “data chart visualizations”, another “statistics visualization”, and another “data charts”. Is there a way for KNIME nodes to automatically rename all of these skills to “Data Visualization”, for example, with KNIME deciding on the category itself?

Many thanks!

Best regards,

Could you share some anonymized data? It's difficult to offer advice without knowing the data format.

Hi rfeigel,

Thanks for wanting to help me with this. I can share the image below. As you will see, there are different but related skills. The objective is to automatically bucket such related skills into a broader category that captures their central commonality, without having to create a data dictionary manually. For example, everything containing “Python” would fall under an automatically created category called “Python”, and everything containing “QA” would fall under an automatically created category called “QA”.

Hope this helps clarify. Thanks!

image
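
For illustration only, the "shared keyword" idea described in this post could be sketched in plain Python, outside KNIME, roughly as follows. It assumes that the word shared by the most entries is a usable category name. The function names `tokenize` and `categorize_by_common_token` and the small stop-word list are hypothetical and not part of KNIME or of the original data.

```python
import re
from collections import Counter


def tokenize(skill):
    """Lowercase a skill string and split it into simple word tokens."""
    # '+' and '#' are kept so that entries such as "c++" or "c#" stay intact.
    return re.findall(r"[a-z0-9+#]+", skill.lower())


def categorize_by_common_token(skills, stopwords=frozenset({"and", "of", "in", "for", "with"})):
    """Map each skill to the token it shares with the most other entries."""
    # Count, for every token, how many skill entries contain it.
    doc_freq = Counter()
    for skill in skills:
        doc_freq.update(set(tokenize(skill)) - stopwords)

    categories = {}
    for skill in skills:
        tokens = [t for t in tokenize(skill) if t not in stopwords]
        # Pick the most widely shared token; ties go to the earliest token.
        categories[skill] = max(tokens, key=lambda t: doc_freq[t]) if tokens else ""
    return categories


if __name__ == "__main__":
    example = ["Python scripting", "Python for data analysis",
               "QA testing", "Manual QA", "QA automation"]
    for skill, category in categorize_by_common_token(example).items():
        print(f"{skill!r} -> {category!r}")
```

This is only a rough heuristic: a generic word such as "data" can easily dominate and produce categories that are too broad, so the result would likely still need a manual review.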

You might want to check out the String Similarity node for that.
br
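
To illustrate the general idea of grouping by string similarity, here is a minimal Python sketch using the standard library's `difflib.SequenceMatcher`. It is not a description of how the KNIME node works internally. The helper names `similarity` and `group_similar`, the greedy grouping strategy, and the `0.6` threshold are assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a case-insensitive similarity score between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def group_similar(skills, threshold=0.6):
    """Greedily group skills whose similarity to a group's first entry
    meets the threshold. The first entry acts as the group's label."""
    groups = []
    for skill in skills:
        for group in groups:
            if similarity(skill, group[0]) >= threshold:
                group.append(skill)
                break
        else:
            # No existing group was similar enough, so start a new one.
            groups.append([skill])
    return groups


if __name__ == "__main__":
    example = ["data visualization", "data visualizations", "python"]
    for group in group_similar(example):
        print(group[0], "<-", group)
```

The threshold decides how aggressively entries are merged, and the result depends on the order of the input, so it would need tuning against the real skill list.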