How to more elegantly replace strings in bulk using wildcards?

Hi everyone,

I need some help finding a better way to search a cell using a string plus wildcards and, when it matches, replace the entire entry with what I want.

I have a table with a list of disease names that I need to clean up. Below is an example of the sort of thing I am doing.

Current method:
String Replacer node with a wildcard pattern starting with “huntington disease”

Input:
huntington disease
huntington disease HD
huntington disease and associated diseases

Output:
huntington disease
huntington disease
huntington disease

Right now I have 60 “String Replacer” nodes, each coded to a different disease. There's gotta be a better way to do this, right? It's really cumbersome to manage all of these separate nodes.

If your current rules are pretty simple, say LIKE or MATCHES, you can put the rules in a file and use


node


You could also find the minimal string needed to identify each disease and then use a wildcard pattern in the “String Replacer” node. Hard to give a better solution without knowing more about the dataset.
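As a rough illustration of the "minimal string plus wildcard" idea outside of KNIME, here is a minimal Python sketch that keeps all patterns in one lookup table instead of one node per disease. The table contents, the `clean_name` helper, and the "parkinson" entry are hypothetical; only the huntington examples come from the question.

```python
from fnmatch import fnmatchcase

# Hypothetical lookup table: (wildcard pattern, clean name).
# Only the huntington pattern is based on the thread; the rest is illustrative.
PATTERNS = [
    ("huntington disease*", "huntington disease"),
    ("*parkinson*", "parkinson disease"),
]


def clean_name(entry, patterns=PATTERNS):
    """Return the clean name of the first matching pattern, else the entry unchanged."""
    lowered = entry.strip().lower()  # match case-insensitively
    for pattern, clean in patterns:
        if fnmatchcase(lowered, pattern):
            return clean
    return entry  # no pattern matched: keep the original value


entries = [
    "huntington disease",
    "huntington disease HD",
    "huntington disease and associated diseases",
    "cystic fibrosis",
]
cleaned = [clean_name(e) for e in entries]
```

Adding a disease then means adding a row to the table rather than another node. Patterns are tried in order, so more specific ones should come first.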


Have you thought about a similarity-based approach? For example:
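A minimal sketch of what a similarity-based cleanup could look like, assuming plain Python with the standard library's `difflib` rather than any particular KNIME node. The `CANONICAL` list, the `closest_name` helper, and the 0.6 threshold are hypothetical choices for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical list of clean disease names to map entries onto.
CANONICAL = ["huntington disease", "cystic fibrosis"]


def closest_name(entry, canonical=CANONICAL, threshold=0.6):
    """Map an entry to its most similar canonical name, or keep it if nothing is close."""
    lowered = entry.strip().lower()
    best_name, best_score = entry, 0.0
    for name in canonical:
        # ratio() is a similarity score between 0.0 and 1.0
        score = SequenceMatcher(None, lowered, name).ratio()
        if score > best_score:
            best_name, best_score = name, score
    # Below the threshold the entry is left untouched instead of being forced to a match.
    return best_name if best_score >= threshold else entry
```

This tolerates typos such as "huntingtons disease", which a fixed wildcard pattern might miss. Long trailing text lowers the score, so the threshold would need tuning against real data.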

izaychik63 - I tried the rule engine but it deletes entries that don’t need cleaning. So my column I’m trying to clean ends up depopulated.

scapuzzi - That’s the method I have been using. The problem is I have ~60 individual nodes (one for each search term). Is there a way to use a table of search terms with wildcards?

scottF - Thanks for the suggestion. I must admit I am not sure how to implement this to do what I need but I can see how something like this would be powerful.

@josephelsbernd, to keep the records that match none of your rules, add one more rule as the last one:
TRUE => $Input$
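To show why the catch-all rule matters, here is a small Python sketch of first-match-wins rule evaluation. It is not KNIME's implementation; it assumes LIKE-style wildcards where `*` matches any run of characters and `?` a single character. `RULES`, `like_to_regex`, and `apply_rules` are hypothetical names.

```python
import re

# Hypothetical rule list, analogous to a rule mapping a LIKE pattern to an outcome.
RULES = [
    ("huntington disease*", "huntington disease"),
]


def like_to_regex(pattern):
    """Translate a wildcard pattern (* and ?) into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def apply_rules(value, rules, keep_unmatched=True):
    """Return the outcome of the first matching rule.

    keep_unmatched=True plays the role of the final TRUE => $Input$ rule;
    with False, unmatched values become None (a missing cell).
    """
    for pattern, outcome in rules:
        if like_to_regex(pattern).fullmatch(value):
            return outcome
    return value if keep_unmatched else None
```

With `keep_unmatched=False`, an entry like "cystic fibrosis" comes back as `None`, which mirrors the depopulated column described above. With the default, it passes through unchanged.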


@izaychik63 That is great! It's working now. Thanks a ton!

