List of strings to be removed from a specific column

gopal1 · October 8, 2021, 6:40pm

I am trying to eliminate all the Titles such as Dr, Mr, HR, Prof, FRCS, etc., from a specific column containing names.

I currently use one String Manipulation node (regexReplace, to be specific) per title, but I would like to know if it's possible to remove all the titles from the column data at once.

Thank you

bruno29a · October 8, 2021, 7:23pm

Hi @gopal1, and welcome to the KNIME Community.

If you want to replace a single one, you don’t need regexReplace; plain replace is enough.

However, you can replace all of them with a single regexReplace by joining them in your expression with OR (|). So you’d be looking for "Dr ", or "Mr ", or "HR ", etc., and replacing each match with nothing.

You can also use replace(); you just need to nest multiple calls, for example:
replace(replace(replace($your_column$, "Dr ", ""), "Mr ", ""), "HR ", "")

or in a more readable way:

replace(replace(replace($your_column$
  , "Dr ", "")
  , "Mr ", "")
  , "HR ", "")
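To see the two approaches side by side outside KNIME, here is a minimal Python sketch. Python's `re` module stands in for the String Manipulation node, and the sample name is made up. For this sample, chaining literal replacements and using one pattern with OR give the same result.

```python
import re

name = "Dr John Smith"  # made-up sample value

# Chained literal replacements, one per title (mirrors the nested replace() calls).
chained = name.replace("Dr ", "").replace("Mr ", "").replace("HR ", "")

# One regex with alternation: "Dr " OR "Mr " OR "HR ".
alternation = re.sub(r"Dr |Mr |HR ", "", name)

print(chained)      # John Smith
print(alternation)  # John Smith
```

The alternation version grows by one `|title ` per title, while the chained version grows by one nested call, so the regex tends to stay more readable as the list gets longer.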

Can you share what you have done so far?

ipazin · October 12, 2021, 11:39am

Hello @gopal1,

I built a workflow some time ago that removes a list of words from multiple columns. Maybe you can use it. Here it is:

Welcome to KNIME Community!

Br,
Ivan

Daniel_Weikert · October 12, 2021, 5:39pm

Your current regex solution also supports OR criteria using “|”.
br

gopal1 · October 29, 2021, 12:39pm

Hi @bruno29a ,

Thank you so much for your inputs.

I am getting an error, as attached, if I try using OR. I might be using the wrong syntax.

Note: I have tried every combination using OR in the String Manipulation node and ended up with the error.

Thank you

bruno29a · October 29, 2021, 12:40pm

Hi @gopal1, try it like this instead:
regexReplace($LAST NAME$, "Dr|MR|MD", "")

EDIT: You probably want to remove any space that’s between these and the name.
For example, you may have “Dr Smith”. If you remove “Dr”, you will end up with “ Smith”, with a leading space. Similarly, “MD”, which is usually at the end, will leave you with a trailing space. You can use the strip() function, which strips whitespace at the beginning and at the end. Just wrap the whole expression in it:
strip(regexReplace($LAST NAME$, "Dr|MR|MD", ""))
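The effect of strip() is easy to check outside KNIME. This is a small Python sketch with a made-up sample value; Python's `re.sub` and `str.strip` are used here as stand-ins for regexReplace and strip, and for a simple pattern like this they should behave the same way.

```python
import re

raw = "Dr Smith MD"  # made-up sample value

# Removing the titles alone leaves the surrounding spaces behind.
without_strip = re.sub(r"Dr|MR|MD", "", raw)

# Stripping afterwards removes the leading and trailing whitespace.
with_strip = without_strip.strip()

print(repr(without_strip))  # ' Smith '
print(repr(with_strip))     # 'Smith'
```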

gopal1 · October 29, 2021, 1:14pm

Thank you so much Ivan.

I have a set of ~230 titles to be removed from the column, and I couldn’t get it right, as the titles are not getting removed.

I might be wrong.

Do you have any other option to remove the whole set of titles in one go?

Thank you,
Sriram

bruno29a · October 29, 2021, 2:08pm

Hi @gopal1, in my last post I answered your latest question, where you were trying to do a regexReplace with OR, and I corrected that expression.

However, after re-reading the thread (it’s been a while since you asked your original question), your expression will not do exactly what my original suggestion does. Notice that in my search string, I’m looking for "Dr<space>", "Mr<space>" and "HR<space>", while your most recent expression is not including that space.

In the case of “Dr Drew”, my expression will return “Drew”, which is the result you want, while your expression will return “<space>ew”. In other words, your expression also removes these strings wherever they occur inside the name itself.
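Here is a quick way to reproduce the “Dr Drew” case, as a Python sketch rather than a KNIME expression. The two patterns are the ones from this thread, and Python's `re.sub` is only a stand-in for regexReplace.

```python
import re

name = "Dr Drew"

# Pattern with the trailing space: only the title followed by a space matches.
with_space = re.sub(r"Dr |Mr |HR ", "", name)

# Pattern without the space: the "Dr" inside "Drew" matches as well.
without_space = re.sub(r"Dr|MR|MD", "", name)

print(repr(with_space))     # 'Drew'
print(repr(without_space))  # ' ew'
```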

Since the search is case-sensitive, this is more of an issue with Capitalized strings such as Dr, Mr, Prof, etc., but for uppercase titles such as HR, MR, DR, it should not be an issue, unless the last name is also in uppercase.

You can add a space with your search string, although you need to know if the title is a prefix or a suffix (I think MD is usually a suffix, that is it comes after the name).

For prefix titles, you want to search for “title<space>” and for suffix titles, you want to search for “<space>title”:
Mr: "Mr "
MD: " MD"

I’ve put something together where you can add your 250 titles in a table, and the workflow will replace them in 1 shot.

Workflow looks like this:

Sample title list (that’s where you would add your 250 titles):

Results:

You can see that Dr Drew MD came out as Drew

Here’s the workflow: List of strings to be removed from a specific column.knwf (14.7 KB)
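For readers without KNIME at hand, here is a rough Python sketch of the general idea of driving the replacement from a list of titles. It is not a description of what the nodes in the .knwf do. The lists `prefix_titles` and `suffix_titles` and the helpers `build_pattern` and `strip_titles` are hypothetical names, and anchoring the pattern to the start and end of the value is an extra assumption on top of the “title<space>” / “<space>title” rule above.

```python
import re

# Hypothetical title lists; in practice these would come from your title table.
prefix_titles = ["Dr", "Mr", "HR", "Prof"]
suffix_titles = ["MD", "FRCS"]


def build_pattern(prefixes, suffixes):
    """Build one regex for leading 'title ' runs and trailing ' title' runs."""
    # re.escape keeps titles containing regex metacharacters (e.g. a dot) literal.
    pre = "|".join(re.escape(t) for t in prefixes)
    suf = "|".join(re.escape(t) for t in suffixes)
    # First branch: one or more prefix titles, each followed by a space, at the start.
    # Second branch: one or more suffix titles, each preceded by a space, at the end.
    return re.compile(rf"^(?:(?:{pre}) )+|(?: (?:{suf}))+$")


TITLE_RE = build_pattern(prefix_titles, suffix_titles)


def strip_titles(name):
    """Remove leading and trailing titles, then trim leftover whitespace."""
    return TITLE_RE.sub("", name).strip()


for raw in ["Dr Drew MD", "Prof Dr Jane Doe", "John Smith FRCS", "Drew"]:
    print(repr(raw), "->", repr(strip_titles(raw)))
```

Keeping the titles in a list means that adding title number 251 is a data change rather than an edit to the expression.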

gopal1 · October 29, 2021, 4:31pm

Hi @bruno29a,

Thank you so much.

This worked.

bruno29a · October 29, 2021, 5:27pm

No problem, @gopal1.

FYI, you should mark the post containing the solution as the Solution, not the post where you acknowledge that the solution works.

ipazin · November 2, 2021, 9:08pm

Hello @gopal1,

In general, it’s always recommended to share the input data set and the expected outcome. (If the data is confidential, you can use dummy data, as the actual values don’t matter.) That is usually the fastest and least painful way, as we don’t have to guess your data set, the expected outcome, and the logic behind it. Although you have found a solution, this might be useful to know for next time.

Br,
Ivan

Daniel_Weikert · November 3, 2021, 5:48pm

I assume there is an internal ticket for this as part of the forum improvements?

ipazin · November 4, 2021, 11:35am

I don’t think there is, but it’s not a bad idea.
Ivan

gopal1 · November 10, 2021, 4:04pm

Sure, I will do that. Thanks.
