Hi KNIME - I am totally new to KNIME, and the most advanced tool I have used so far for analytics is Excel…
I parsed a job board for Job Titles (not unique) and the corresponding Skills. What I want to identify are the Skills (keywords and keyword combinations) that describe each Job Title best, or that are the most common across all job advertisements for each Job Title.
I was tinkering around with preprocessing and bag of words, but I am kind of stuck when it comes to relating the terms back to the Job Titles. It feels like I am already doing something wrong when setting up the data.
I would be glad if somebody could give me some advice here or point me in the right direction. Any help is appreciated.
Welcome to the KNIME community (and sorry for the late response).
Your initial approach seems to be right!
Could you explain a bit what you mean by relating the terms back to the job titles?
My colleague Vincenzo wrote a blog post about keyword extraction which could be of interest to you. The corresponding workflow can be found here.
Additionally, there is also a component which already does all of that in one go.
You can find it here. You can simply drag and drop the component icon from the website into your KNIME Analytics Platform.
If you want to find out more about the concept of components, please have a look at this documentation.
Basically, a component encapsulates a set of native KNIME nodes to hide complexity, and it also lets you create a new “node” that can be configured and/or provide interactive views.
I hope this helps. If you are stuck at a specific step, please let me know and we can have a look together (either post a screenshot or the workflow, if possible).
Thank you for getting back to me and providing the blog link. I only grasped half of the math, but it was still useful to me. I will try the Chi-Square Keyword Extraction on my pitiful workflow and see what the results look like.
What I want to achieve is to describe job titles with keywords taken from their job descriptions. This of course already works very well for each job title individually. I want to find a set of skills and experiences (keywords) that “describe” a certain group of job titles. Say, for example, I have 100 different job titles for accountants. Now I want to extract the n terms that are common to the job descriptions of all 100 accountants.
But 20 of these accountants have the term treasury in their job title, so a narrower title, and I want to know what these 20 have in common, too. I think what I am talking about here is clusters or clustering, but I am not really sure if I am using the term correctly.
I am looking forward to any help you can provide.
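To make the question concrete, here is a rough Python sketch (outside KNIME) of the idea described above: pick a group of ads by a word in the job title, then rank terms by the share of ads in that group that mention them. The sample data, the function name `common_terms`, and the substring-based title filter are all assumptions made up for illustration, not part of any KNIME workflow.

```python
from collections import Counter

# Hypothetical sample data: (job title, skill keywords parsed from the ad)
ads = [
    ("Accountant", ["excel", "sap", "reporting"]),
    ("Treasury Accountant", ["excel", "cash management", "sap"]),
    ("Senior Treasury Accountant", ["excel", "cash management", "reporting"]),
    ("Accountant", ["excel", "sap", "tax"]),
]

def common_terms(ads, title_filter, n=3):
    """Return the n terms found in the largest share of ads
    whose job title contains title_filter (case-insensitive)."""
    # Use a set per ad so a term counts at most once per ad
    group = [set(terms) for title, terms in ads
             if title_filter.lower() in title.lower()]
    if not group:
        return []
    # Document frequency: number of ads in the group mentioning each term
    doc_freq = Counter(term for terms in group for term in terms)
    # Sort by frequency (descending), then alphabetically to break ties
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(term, count / len(group)) for term, count in ranked[:n]]

all_accountants = common_terms(ads, "accountant")
treasury_only = common_terms(ads, "treasury")
print(all_accountants)
print(treasury_only)
```

Running the same function with a broader or narrower title filter gives the "all accountants" view and the "treasury only" view from the same table, which is the kind of comparison I am after.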
Sorry for the delayed response.
I think calculating term frequencies is a good approach to see which terms are mentioned frequently.
This is basically the “TF-IDF Index for Keyword Extraction” part of the blog post which I linked in my initial post.
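As a rough illustration of the TF-IDF idea, here is a minimal Python sketch that treats each job-title group as one "document" and scores terms as relative term frequency times log(number of groups / number of groups containing the term). The data and the function name `tf_idf` are hypothetical, and the exact normalization used by the KNIME nodes or the blog post workflow may differ from this variant.

```python
import math
from collections import Counter

# Hypothetical data: each job-title group with all skill terms from its ads
groups = {
    "accountant": ["excel", "sap", "reporting", "excel", "tax"],
    "treasury accountant": ["excel", "cash management", "cash management", "sap"],
    "developer": ["python", "sql", "excel"],
}

def tf_idf(groups):
    """Score each term per group: relative frequency * log(N / document frequency)."""
    n_groups = len(groups)
    # Document frequency: in how many groups does each term occur?
    doc_freq = Counter(term for terms in groups.values() for term in set(terms))
    scores = {}
    for name, terms in groups.items():
        counts = Counter(terms)
        total = len(terms)
        scores[name] = {
            term: (count / total) * math.log(n_groups / doc_freq[term])
            for term, count in counts.items()
        }
    return scores

scores = tf_idf(groups)
# Term with the highest score in the (assumed) treasury group
top_treasury = max(scores["treasury accountant"], key=scores["treasury accountant"].get)
print(top_treasury)
```

A term such as "excel" that occurs in every group gets a score of zero, while a term concentrated in one group scores high. That is why TF-IDF tends to surface what is specific to a narrower group like the treasury accountants rather than what all job titles share.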
Another option would be the Topic Extractor (Parallel LDA) node. This extracts the main topics for a set of documents.
Document Clustering would also be helpful. A few examples can be found on the KNIME Hub: