I have two columns that contain the same kind of information, i.e. names of famous brands. I want to split every row of both columns into words, perform an exact match on those words, and count the number of matching words. To do that I did a cross join, and I have a flag that counts the matching words. I have written the splitting and matching part in Python, but the process is taking too long because of the size of my data.
P.S.: I have a 128 GB machine to work with.
My sample data:
ID | Col A       | Match_With
1  | Tata Motors | Tata Motors
2  | Pepsi Co    | Pepsi Co
3  | Tata Cola   | Tata Cola
Output data after splitting and matching:
ID | Col A | Match With | Col A Words | Matchwith Words | TotalMatchingWord
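For reference, the splitting and matching step described above can be sketched in plain Python roughly as follows. This is only a minimal illustration, not my actual code: the function names and the dictionary layout are made up for the example, and matching is case-sensitive and split on whitespace, which is an assumption about what "exact match" means here. Instead of cross joining every word with every other word, it builds an index from each word to the rows that contain it, so only pairs that share at least one word are ever touched.

```python
from collections import defaultdict


def split_words(name):
    """Split a brand name on whitespace into a set of words (case-sensitive)."""
    return set(name.split())


def count_matches(col_a, match_with):
    """Return (Col A ID, Match_With ID, shared word count) for pairs sharing a word."""
    # Inverted index: word -> IDs of the Match_With rows that contain it.
    index = defaultdict(set)
    for match_id, name in match_with.items():
        for word in split_words(name):
            index[word].add(match_id)

    results = []
    for a_id, name in col_a.items():
        counts = defaultdict(int)
        for word in split_words(name):
            # Only rows that actually contain this word are visited.
            for match_id in index.get(word, ()):
                counts[match_id] += 1
        for match_id, total in sorted(counts.items()):
            results.append((a_id, match_id, total))
    return results


# Hypothetical in-memory version of the sample data (ID -> brand name).
col_a = {1: "Tata Motors", 2: "Pepsi Co", 3: "Tata Cola"}
match_with = {1: "Tata Motors", 2: "Pepsi Co", 3: "Tata Cola"}

matches = count_matches(col_a, match_with)
for row in matches:
    print(row)
```

Pairs with no word in common simply do not appear in the result, which is what keeps the work well below a full cross join when most brands share nothing.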
The workflow I have attached is based only on “standard” KNIME nodes (no Python code),
without even using any node from the Text Processing repository. Try to adapt it to your needs.
However, the “Cross Joiner” node is still a critical point as far as memory consumption is concerned.
Hi Martin,
The workflow is doing wonders. Thanks so much for the help, I really appreciate it. I did tweak the workflow a bit, though.
In the Cell Splitter node I have now used the option to output the result as a set, so when I export the data to CSV I get an error saying that the input table should be int or double. Is there any workaround for this?
Thank you very much. I suppose you have used the CSV Writer node. Insert a “Split Collection Column”
node between the node that produces the Set-type column and the CSV Writer, and break the Set column into columns
of a primitive data type such as string, double, or int. Writing the CSV file should then work.
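To illustrate the idea outside of KNIME, here is a small Python sketch of the same flattening step. It is not what the “Split Collection Column” node does internally, just an analogy: the function name `flatten_set_column` and the column names are hypothetical, and the words are sorted only to make the output order predictable.

```python
import csv
import io


def flatten_set_column(rows, set_key, prefix):
    """Replace the set-valued field `set_key` with numbered string columns."""
    if not rows:
        return [], []
    # The widest set decides how many new columns are needed.
    width = max(len(row[set_key]) for row in rows)
    other_keys = [key for key in rows[0] if key != set_key]
    header = other_keys + [f"{prefix}_{i + 1}" for i in range(width)]
    flat = []
    for row in rows:
        words = sorted(row[set_key])
        # Pad shorter sets with empty strings so every row has the same width.
        padded = words + [""] * (width - len(words))
        flat.append([row[key] for key in other_keys] + padded)
    return header, flat


# Hypothetical rows with a set-typed column, as after a split with set output.
rows = [
    {"ID": 1, "Words": {"Tata", "Motors"}},
    {"ID": 2, "Words": {"Pepsi"}},
]
header, flat = flatten_set_column(rows, "Words", "Word")

# Once every cell is a primitive value, writing CSV is straightforward.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(flat)
csv_text = buffer.getvalue()
print(csv_text)
```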
Now the situation is that I have 128 GB of RAM and I want KNIME to use at least 100 GB, but when I run heavy jobs it only uses a maximum of 2 GB.
I have tried two options, -Xmx and cellsinmemory, but still no luck.
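For reference, this is roughly what the two settings look like in the knime.ini file in the KNIME installation folder. The values are only illustrative (100g and 1000000 are assumptions for a 128 GB machine, not recommendations). Both lines have to come after the -vmargs line, and there should be only one -Xmx entry, since a leftover default such as -Xmx2048m would explain a 2 GB ceiling.

```
-vmargs
-Xmx100g
-Dorg.knime.container.cellsinmemory=1000000
```

KNIME reads this file only at startup, so it needs a restart before the new heap size shows up.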