Hi all. This forum has supported me for a long time, and this is the first time I couldn’t find what I’m looking for.
I need to remove certain words from my rows, and I could (partially) do it with String Manipulation, using variables read from a second file. In the example I have…
…the “SOURCE” file with: LIST
NAME 1 PERSON
NAME 2 PERSON VEHICLE
COMPANY 2 ORGANIZATION
COMPANY 1 ORGANIZATION PERSON
CAR 3 VEHICLE
BUS 1 VEHICLE ORGANIZATION
and the replace file that contains the words I need to remove from the source: REPLACEVALUE
The expected result is that every word in the second table is removed from the first table, giving:
Can you please upload your workflow as a .knwf file instead? I’m not sure how to import a workflow that’s in JSON format.
Just right-click your workflow in the KNIME Explorer and choose Export KNIME Workflow... It will export the workflow as a .knwf file, which you can then upload here.
Based on the results you showed, it looks like you are running a loop but feeding the original LIST into every iteration, so the final result only has one of the REPLACEVALUE entries replaced. You can see that in the first iteration (0) it replaced PERSON in the original LIST, in the second iteration (1) it replaced ORGANIZATION in the original LIST, and so on.
You need to pass the modified string from the previous iteration back into the next iteration.
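Outside KNIME, the difference between the two loop patterns can be sketched in a few lines of Python. This is only an illustration of the logic, not KNIME code: the function names are hypothetical, and REPLACEVALUE is assumed to hold PERSON, ORGANIZATION and VEHICLE (inferred from the iteration output and later posts, since the table itself is not reproduced here). The first function restarts from the original rows in every iteration, which matches the behaviour in the posted results; the second feeds each iteration’s output into the next.

```python
# Sample data: LIST from the thread; REPLACEVALUE is assumed to be PERSON, ORGANIZATION, VEHICLE.
rows = [
    "NAME 1 PERSON",
    "NAME 2 PERSON VEHICLE",
    "COMPANY 2 ORGANIZATION",
    "COMPANY 1 ORGANIZATION PERSON",
    "CAR 3 VEHICLE",
    "BUS 1 VEHICLE ORGANIZATION",
]
replace_values = ["PERSON", "ORGANIZATION", "VEHICLE"]


def strip_value(row, value):
    """Remove one value from a row and collapse the leftover whitespace."""
    return " ".join(row.replace(value, "").split())


def loop_from_original(rows, replace_values):
    """Buggy pattern: every iteration starts again from the original rows."""
    result = list(rows)
    for value in replace_values:
        result = [strip_value(row, value) for row in rows]  # reads `rows`, not `result`
    return result  # only the last value ends up removed


def loop_with_feedback(rows, replace_values):
    """Fixed pattern: each iteration works on the previous iteration's output."""
    result = list(rows)
    for value in replace_values:
        result = [strip_value(row, value) for row in result]
    return result


print(loop_from_original(rows, replace_values))
print(loop_with_feedback(rows, replace_values))
```

Note that the plain substring replace used here would also cut a value out of a longer word (for example PERSON inside PERSONAL), just like a simple text replace would.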
But there is a better way to do this without any loop.
You can use the Dictionary Replacer to do this in one shot.
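As a rough, non-KNIME illustration of why a dictionary approach needs no loop over the replace values, the hypothetical function below puts all entries in a set and scans each row once, dropping any word found in it. It models the idea only; it is not how the Dictionary Replacer node is implemented, and the row “NAME 4 PEOPLE VEHICLE” is made-up sample data.

```python
def remove_terms(rows, replace_values):
    """Drop every word found in the dictionary, scanning each row only once."""
    dictionary = set(replace_values)  # constant-time membership test per word
    # Each row is split into single words (terms); there is no loop over the dictionary.
    return [" ".join(word for word in row.split() if word not in dictionary) for row in rows]


rows = ["NAME 2 PERSON VEHICLE", "BUS 1 VEHICLE ORGANIZATION", "NAME 4 PEOPLE VEHICLE"]
print(remove_terms(rows, ["PERSON", "ORGANIZATION", "VEHICLE", "PEOPLE VEHICLE"]))
```

Because each row is split into single words before the lookup, an entry such as “PEOPLE VEHICLE” can never match in this sketch; only the standalone “VEHICLE” entry removes anything from the last row.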
Bruno, thank you so much for your answer. I have attached the workflow in the format you requested: KNIME_project 1.knwf (19.0 KB)
In addition, I tried your workflow, and it seems to work fine when REPLACEVALUE contains single words (like “PEOPLE” or “VEHICLE”), but for some reason it doesn’t work when REPLACEVALUE contains multi-word strings (like “PEOPLE VEHICLE”). Can you elaborate?
PS: I didn’t have the nodes you used. I had to update and add an extension.
Hi @ToniMeneses, I have already explained this. This was my explanation:
Basically, this means that each time you do the replace in the loop, you are giving it the original data again, so it only replaces the word for that iteration.
In the first iteration, you can see that the string PERSON was replaced in the original data:
“ITERACTION 0”;“NAME 1 ”
“ITERACTION 0”;“NAME 2 VEHICLE ”
“ITERACTION 0”;“COMPANY 2 ORGANIZATION ”
“ITERACTION 0”;“COMPANY 1 ORGANIZATION ”
“ITERACTION 0”;“CAR 3 VEHICLE ”
“ITERACTION 0”;“BUS 1 VEHICLEORGANIZATION ”
And in the second iteration, the string “ORGANIZATION” was replaced in the original data:
“ITERACTION 1”;“NAME 1 PERSON ”
“ITERACTION 1”;“NAME 2 PERSONVEHICLE ”
“ITERACTION 1”;“COMPANY 2 ”
“ITERACTION 1”;“COMPANY 1 PERSON ”
“ITERACTION 1”;“CAR 3 VEHICLE ”
“ITERACTION 1”;“BUS 1 VEHICLE ”
So the problem is that you have to pass the modified data from the previous iteration instead of the original data, as I mentioned in my previous post.
As for the nodes I am using, you just need the Text Processing extension, which is easy to get. When you import the workflow, KNIME should ask whether you want to install the extension for your missing nodes, and if you agree, it will automatically install the proper extension for you.
Note: doing this with a loop is very inefficient and resource-intensive compared to what I suggested. A loop goes through the REPLACEVALUE entries one at a time, while the Dictionary Replacer handles all of them in one shot. Moreover, each iteration of the loop has to bring back the whole table and scan it all over again.
Sorry Bruno, I was not clear in my statement. Perhaps that’s because my English is not the best, although it’s better than Google Translate. Using your workflow with the dictionary, it works fine when REPLACEVALUE contains single words (like “PEOPLE” or “VEHICLE”), but for some reason it doesn’t work when REPLACEVALUE contains a multi-word string (like “PEOPLE VEHICLE”). Can you elaborate?
I have values (in the dictionary table) that are composed of two or three words.
Hi @ToniMeneses, it is I who should apologize; it seems I’m the one who did not read what you wrote properly and misunderstood. Sorry about that.
The reason it does not work for multiple words is that the Dictionary Replacer can only be used for single terms. There is a workaround for multiple words using the Dictionary Tagger: by tagging them, each dictionary line (1 word, 2 words, 3 words, whatever you want) becomes a single term.
This should do it:
Since your original LIST contained neither “PEOPLE” nor “PEOPLE VEHICLE”, I added a few entries.
I also added “PEOPLE VEHICLE” as a new entry in REPLACEVALUE:
As you can see, “PEOPLE” is not removed unless it appears as “PEOPLE VEHICLE”, which is the expected behaviour (“VEHICLE” still gets removed because “VEHICLE” on its own is an entry in REPLACEVALUE).
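The tagging idea can be modelled in Python as a greedy longest-match pass: consecutive words that form a dictionary entry are grouped into one term, and tagged terms are then dropped. This is a hypothetical sketch, not the Dictionary Tagger’s actual algorithm, and the sample rows are made up, since the entries added in the workflow are not reproduced here.

```python
def tag_phrases(tokens, phrases):
    """Group tokens into terms, greedily matching the longest dictionary phrase at each position."""
    phrase_set = {tuple(p.split()) for p in phrases}
    max_len = max((len(p) for p in phrase_set), default=1)
    terms, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in phrase_set:
                terms.append((" ".join(candidate), True))  # tagged: one term, possibly several words
                i += n
                break
        else:
            terms.append((tokens[i], False))  # no dictionary match: keep the single word
            i += 1
    return terms


def remove_tagged(rows, phrases):
    """Drop every tagged term and keep the remaining words."""
    return [
        " ".join(text for text, tagged in tag_phrases(row.split(), phrases) if not tagged)
        for row in rows
    ]


replace_values = ["PERSON", "ORGANIZATION", "VEHICLE", "PEOPLE VEHICLE"]
rows = ["NAME 3 PEOPLE", "NAME 4 PEOPLE VEHICLE", "BUS 2 VEHICLE PEOPLE"]
print(remove_tagged(rows, replace_values))
```

Greedy longest-match is one simple design choice: a longer entry such as “PEOPLE VEHICLE” is tried before any shorter entry starting at the same word, so “PEOPLE” survives on its own or after “VEHICLE”, but not when followed by it.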
Bruno, I’m glad you could support me on this. I understand the limitation, and this already helps a lot. For my case, I’m considering applying two different dictionaries in separate steps: the first with the complex (multi-word) values and the second with the single-word values.
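A minimal sketch of this two-dictionary plan in Python, assuming whitespace-separated words and made-up sample data (all function names are hypothetical, and this runs outside KNIME). The multi-word pass has to run first: if “VEHICLE” were removed first, “PEOPLE VEHICLE” would no longer exist to be matched and “PEOPLE” would be left behind.

```python
import re


def remove_phrases(rows, phrases):
    """Pass 1: remove multi-word entries, matched only between whitespace boundaries."""
    if not phrases:
        return [" ".join(row.split()) for row in rows]
    ordered = sorted(phrases, key=len, reverse=True)  # try longer phrases first
    # (?<!\S) and (?!\S) require whitespace or the string edge on both sides of the match.
    pattern = re.compile(r"(?<!\S)(?:" + "|".join(re.escape(p) for p in ordered) + r")(?!\S)")
    # Normalise spacing first so "PEOPLE  VEHICLE" (two spaces) still matches.
    return [" ".join(pattern.sub(" ", " ".join(row.split())).split()) for row in rows]


def remove_words(rows, words):
    """Pass 2: remove single-word entries by exact word match."""
    dictionary = set(words)
    return [" ".join(w for w in row.split() if w not in dictionary) for row in rows]


def clean(rows, multi_word, single_word):
    # The multi-word pass must come first; removing "VEHICLE" first would break "PEOPLE VEHICLE".
    return remove_words(remove_phrases(rows, multi_word), single_word)


rows = ["NAME 4 PEOPLE VEHICLE", "NAME 3 PEOPLE", "BUS 2 VEHICLE PEOPLE"]
print(clean(rows, ["PEOPLE VEHICLE"], ["PERSON", "ORGANIZATION", "VEHICLE"]))
```

Matching on whitespace boundaries rather than `\b` keeps entries that end in punctuation, such as an abbreviation with a trailing period, matchable.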
Just for your information, and not as a support request: every month I receive a database with records that my customer partners fill in themselves. One specific field should contain only the person’s full name, but they mix it with unwanted content. For example:
“Bruno 29” fills in his subscription with the name “Mr. Bruno 29” or even “Dr. Bruno 29 system engineer”. I’m not the owner of the application, so I can’t fix it at the source.