and I would like to:
a) keep all strings between semicolons ‘;’ that contain the word ‘KEEP’;
b) remove (make blank) all other strings that don’t contain the word ‘KEEP’.
The expected outcome for the above example would be:
There might be dozens of semicolons in my real data.
I’m applying the ‘Cell Splitter’ node (works OK) and trying ‘String Manipulation (Multi Column)’; however, I’m not sure how to build an expression with the following meaning:
if CURRENTCOLUMN contains KEEP then do nothing,
else remove CURRENTCOLUMN.
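To make the intended logic concrete, here is a minimal Python sketch of the same split-then-filter idea. It is not a KNIME expression; the function name `blank_non_matching` and the sample string are made up for illustration, and the check is a plain substring test.

```python
def blank_non_matching(text: str, phrase: str = "KEEP", sep: str = ";") -> str:
    """Blank out every sep-delimited segment that does not contain phrase.

    Hypothetical helper for illustration only; it mirrors splitting the
    cell on ';' and then testing each resulting piece on its own.
    """
    segments = text.split(sep)
    # "if the segment contains KEEP then do nothing, else make it blank"
    kept = [seg if phrase in seg else "" for seg in segments]
    # Re-join with the separator so the number of semicolons is preserved.
    return sep.join(kept)


# Made-up sample input, not the poster's actual data.
sample = "aaa KEEP 1;bbb;ccc KEEP 2;ddd"
result = blank_non_matching(sample)
print(result)
```

Because the blanked segments are replaced with empty strings rather than dropped, the positions of the remaining segments stay the same.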
@badger101, @Kazimierz, regex is probably the better choice here if the requirement is to look for the word “KEEP”. A LIKE "*KEEP*" condition does not look for the word “KEEP”, but rather for any string containing “KEEP”. For example, “KEEPING” or “KEEPER” (BOOKKEEPING, BOOKKEEPER, GOALKEEPER, HOUSEKEEPING, HOUSEKEEPER, PEACEKEEPING, PEACEKEEPER, etc.) would qualify for LIKE "*KEEP*", but they are not supposed to qualify as the word “KEEP” as per the request.
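The difference between a substring match and a whole-word match can be illustrated with a short Python sketch. The word list is made up, and `\b` is the word-boundary anchor of Python's `re` module; other tools may treat boundaries slightly differently.

```python
import re

# Made-up test values: two contain KEEP as a standalone word, three do not.
words = ["KEEP", "KEEPING", "GOALKEEPER", "please KEEP this", "HOUSEKEEPING"]

# Substring test, comparable in spirit to a *KEEP* wildcard match.
substring_hits = [w for w in words if "KEEP" in w]

# Whole-word test: \b requires a word boundary on both sides of KEEP.
word_pattern = re.compile(r"\bKEEP\b")
word_hits = [w for w in words if word_pattern.search(w)]

print(substring_hits)  # all five values
print(word_hits)       # only "KEEP" and "please KEEP this"
```

One caveat: `\b` counts digits and underscores as word characters, so a value such as "KEEP_1" would not match the whole-word pattern.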
And if you are going to use regex, you may not need to split the strings into cells; you can come up with a regex that looks for the word “KEEP” between the semicolons “;”.
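One possible shape for such a regex, sketched in Python: it matches every semicolon-delimited segment that does not contain the whole word and replaces it with an empty string. The helper name and sample string are hypothetical, and the pattern is written for Python's `re` module; whether it carries over unchanged to another regex engine is an assumption that would need checking.

```python
import re


def blank_segments_without_word(text: str, word: str = "KEEP") -> str:
    """Blank every ';'-delimited segment lacking `word` as a whole word.

    Hypothetical helper for illustration; not a KNIME function.
    """
    # re.escape lets the search word come from a variable safely.
    target = rf"\b{re.escape(word)}\b"
    pattern = re.compile(
        r"(?:(?<=;)|^)"               # segment start: after ';' or at string start
        rf"(?:(?!{target})[^;])+"     # characters that never begin the whole word
        r"(?=;|$)"                    # segment end: before ';' or at string end
    )
    return pattern.sub("", text)


# Made-up sample input, not the poster's actual data.
sample = "aaa KEEP 1;bbb;ccc KEEPER;ddd KEEP"
result = blank_segments_without_word(sample)
print(result)
```

A segment containing the word can never be matched in full, because the negative lookahead stops the match at the start of the word and the end-of-segment check then fails. Segments such as "ccc KEEPER" are blanked, since KEEPER is not the word KEEP.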
That’s correct! The exact solution will depend on what the real phrase is, and what word morphology exists in the corpus for that phrase. I suspect that @Kazimierz is using a dummy phrase KEEP just as an example.
Hi @badger101, indeed you have a very valid point there. Since this is dummy data (as it appears to be, now that you mention it), it might not show all the cases, such as whether a match is restricted to a whole word or may be a substring of a word.
Hence I suggested a simple script that doesn’t need the considerations one would normally think of in a typical English-language text processing task (e.g. spaces, decimals, morphology). I was expecting them to try it out on the real data and provide feedback, so that I could learn more about the ACTUAL data and provide a better solution if needed.
Thank you very much for your responses and suggestions.
You are right: the data and the ‘KEEP’ word are dummy values. In fact, the real data behind what is shown is more complex, and there the search string is controlled by a variable.
I will play with your suggestions and come back with my findings.