Delete List of Words in a String - Possible Pattern

Hello everyone, I may have a problem for you! :slightly_smiling_face:

I am running a text similarity comparison between two files, but one of them is “dirty” and I need to clean it of unnecessary information.

The file is as follows:
Book1.xlsx (13.9 KB)

My goal is simply to eliminate all product weights (500g/500g/250g/1kg/etc).

I tried doing this with string manipulation, but I would need a node for each weight. I also tried applying a variable table with a “dictionary” inside, but I can’t get it to work.

Looking closer, I think a pattern could definitely be applied to do a good cleanup, and this is where I need your super expertise!

How would you deal with this problem?

Thank you very much in advance to everyone for your help! :slight_smile:

Hi @takeAfew

I managed to clean almost all of the records, so take a good look at the results, especially when you run the workflow on new data. some_data_cleaning.knwf (67.0 KB)
What helped me solve this problem was reversing your input column, which made it easier to eliminate the gr/kg etc.


gr. Hans


A one-node alternative could be to do it via regex, using the regexReplace() function:

regexReplace($Product$,"[0-9]{1,4}((g)|( g)|( kg)|(kg)|( Kg)|(Kg))","")

The weights are written quite inconsistently in terms of spaces and capital letters, so some pre-processing (for example, normalizing case and whitespace) would make the pattern a bit more straightforward.
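
For anyone who wants to try the pattern outside KNIME first, here is a minimal Python sketch of the same idea. It is not the workflow from this thread: the product names are made up (the real ones are in Book1.xlsx), the function name strip_weight is hypothetical, and the \b word boundaries are an extra assumption added to avoid matching inside longer tokens. Instead of pre-processing the column, it handles the inconsistent spacing and capitalization with an optional space and a case-insensitive match.

```python
import re

# 1-4 digits, an optional space, then "g" or "kg" in any capitalization.
# The \b anchors are an added assumption, not part of the original expression.
WEIGHT = re.compile(r"\b[0-9]{1,4}\s?k?g\b", flags=re.IGNORECASE)


def strip_weight(name: str) -> str:
    """Remove weight tokens such as '500g', '500 g' or '1 Kg' from a product name."""
    without_weight = WEIGHT.sub("", name)
    # Collapse the whitespace left behind by the removed token.
    return re.sub(r"\s+", " ", without_weight).strip()


# Hypothetical sample values, only for illustration.
samples = ["Pasta 500g", "Rice 1 Kg bag", "Flour 250 G", "Sugar 1kg"]
cleaned = [strip_weight(s) for s in samples]
print(cleaned)
```

Note that this sketch, like the expression above, only covers whole-number weights in g or kg; other spellings such as "gr" or decimal weights would need the pattern extended.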

