Column Matching with a user set input string from different sheet or criteria

beloshi · April 26, 2018, 11:36pm

Hello everyone,

I am new to this platform and after extensive searching for the solution to my problem in the different threads using the search and different, I did not come across any solution which I am looking for.

I want to explain my case (this particular case revolves around HR Analytics) here;
I have two excel files

1. Selection criteria which contain the columns such as Name of the position (for which an internal candidate is required), name, education, language skills, skill sets etc.
2. The second file contains all the candidate data which are fetched internally to find out the best matching candidate.

the problem is in education, language, and skills column there are more than 1 language (e.g English, German etc.), education (eg. Diploma. BS, MS, etc.)

So, for example, I want the result where I define what I want in the Selection criteria sheet for the language (as I know it can be done by Array with a collection of strings and Regex but I don’t know how) same goes for education and skills.

Following is the illustration of the sheets and the results at the end needed;

Selection Criteria Excel File

Name of Position Qualification Skills Language
xyz MS abc German

Data File

EMP_ID F/Name L/Name Qualification Skills Language

123 hihi kssfd BS/MS/PHD hhh/jjj/kkk English/German/French
453 asfd sdfsf BS/MS jhk/kkk/hhh English/French
888 opio uiop BBA/MBA jjj/lio/kkl English/Cantonese
676 yuit tyui Diploma/BBA/MS sfd/jki/oiu German/English/Spanish

Result file (Excel output) which shows matches column-wise according to my criteria define in selection criteria column

Results as shown above whatever matches it adds the new column to right side naming Qualification match, Language match etch which 1 (Match) and 0 (Not match).

I hope I have explained the case with the best of my ability, if still there is any information unclear kindly let me know of that.

Picture illustration above

Awaiting reply

Beloshi

adaptagis · April 27, 2018, 9:14am

good morning Beloshi

I found your task quite interesting and gave it a go…
the attached workflow might not be the sexiest solution, but it was fun anyway… inputstring.knwf (30.6 KB)

izaychik63 · April 27, 2018, 2:05pm

I would recommend to replace RowID manipulations with Rank node with Ordinal mode. Also,
for KNIME people, in node description possible ranking mode the word Ordinal : is omitted.

beloshi · April 28, 2018, 2:09am

Good Morning adaptagis,

Thank you very much for the efforts, I like the solution you provided to an extent it serves my purpose and I also like the approach as of my understanding (please bear my nuisance) as I am not that of an expert of this platform but eventually learning it. However there are some points which I gather from your solution which I am expressing below;

It provides me the results when there is a language present in the Data sheet if it is not there than it doesn’t show any results. For my case if there is not a single match there should be an indication of it in the final result (As I mentioned above it should add a column Qualification_Match with 0 (No match) and for language do the same). In the end I want all the candidates in the data sheet in the end results with the 1 (Match) or 0 (No match).Hope I am clear with my requirement.
It is not working for the lower case requirements as for input if someone types english instead of English. Which I tried in your solution to check if it works and it gives me the same it creates an empty result table with no records or matches. Same goes for the qualification.
The resulting table is also calculating the occurrences of language and qualification. If one of the ingredient (i.e. language for example is missing or is Cantonese which is not present in the data table) than again it does not provide the result even if one condition is matched (i.e. Qualification for example is MS) so it is limiting the results, it only gives result when both the conditions fulfilled. I want to see both the results whether it has a match as I see you have appended a column with lang_prediction and qual_prediction with an aggregate final score column which is very good addition and provides a clear analysis with scoring.
The reason for opting out from another analytics tool (Rapidminer) for me was same it was not providing me those functionalities as I have worked on my project at first on rapidminer and after inquiring with my peers they suggested to go for KNIME and for rapidminer I need to create an operator with REGEX for this particular solution which got things complicated for me and the same peers also suggested that it is available in KNIME as I don’t need to create anything new for my solution.

So I am trying it for the sake of educating myself with KNIME. I am still trying to learn this tool to understand it and build my solution on it. Both of your and this platforms efforts are highly regarded in this sense.

I am looking forward to the your and this platforms support and help.

Kind regards

Beloshi

adaptagis · April 28, 2018, 7:23am

Hi Beloshi
I agree that my worklow is more a concept than a solution. But there are many options to improove it.
and as it is quite unstrucured, your data really needs some preprocessing.

here some hints:

regarding the table, you can have a look at the table created from the joiner node for more details on the matched conditions…the math formula and the filtering is just going a bit further
regarding the small letters you can put the "case converter " node after the table reader to convert all text into capital letters.
reagarding Zero for missing values, you can use the “misssing value” node and set it to 0

keep on digging into KNIME it is really worth it… using the Examples and the forum and all the other sources for learning, you will soon get familiar to this wonderful tool!

beloshi · April 28, 2018, 10:55pm

Hi adaptagis,

Thank you once again for the directions, its really helping me out and building my interest in this tool more and more. And also your hints worked in the direction I have eventually been able to achieve some of the requirements and also coming across something new to experiment with it (sometimes breaking and then fixing something) and learning my way forward with KNIME eventually.

Just to let you know as well that I am doing all this with my predefined excel requirements and also the actual data at hand now.

So now out of curiosity and more learning, I want to ask if there is a way that we can achieve this with a different approach. Like for example;

I want to know if there is a way that we can use wildcard searches for different columns and get the results out of it because I’ve been thinking of the different ways to encapsulate this like taking the request as the input criteria, bind it with the columns which are already there and instead of the score putting a relevance scale for each search criteria/columns. Eliminating Uppercase or lowercase rule, it just makes sense of the word or phrase if it matches.
Appending the new relevance columns (percentile relevance) after every criteria/column, it would help in ranking the candidate.
Or can we assign weightage to each and every column so that we calculate the overall relevance score at the end which would serve better in ranking.

Constructive suggestions and hints are welcome in this regard I am still experimenting things into it and by the help of this community I am building a very keen interest in this tools and its functionalities as compared to the ones I used before.

With best regards

Beloshi