Remove special characters

Dear KNIME Community (this is my first post, so I hope I am in the right place),

I want to replace special characters like ';$%&/(/)(][" within a huge Excel file with nearly 100,000 lines of text. The Excel file is very simple:

Source Excel file:
Column A: Number (not relevant)
Column B: Text

Support Excel file:
In addition to this, I have another Excel file with 1 column and 100 rows where the special characters are stored separately, one per line.

Challenge: I need a workflow which removes ALL the special characters from the source file. How can I realise this?

I tried the special char remover dictionary node, but it removes nothing. Maybe a wrong configuration?

I appreciate your help

BR
M

Hi @minan ,

welcome to the forum!
Here is a small workflow for you to try.

Steps followed:

  1. Take the column with all chars and combine them into a regex pattern
  2. Make the regex pattern into a variable
  3. Apply the pattern and clean the text from the special characters you have
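For readers who think more easily in code, the three steps above can be sketched in Python. This is only an illustration of the idea, not the attached workflow; the function names `build_pattern` and `clean_text` and the sample data are made up for this example, and the character list is assumed to be non-empty.

```python
import re

def build_pattern(chars):
    """Step 1 and 2: combine single characters into one regex character class."""
    # re.escape protects characters such as ']' or '\' that are special inside [...]
    escaped = "".join(re.escape(c) for c in chars)
    return re.compile("[" + escaped + "]")

def clean_text(text, pattern):
    """Step 3: remove every character matched by the pattern."""
    return pattern.sub("", text)

# Hypothetical sample data standing in for the support file and one text cell
special_chars = [";", "$", "%", "&", "/", "(", ")", "]", "[", '"']
pattern = build_pattern(special_chars)
cleaned = clean_text('Price: 50% (approx.) [net]; "ok"', pattern)
print(cleaned)  # Price: 50 approx. net ok
```

Escaping each character before joining is the important part: without it, a `]` or `\` in the list would break the character class.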

Hope it helps!
Have a nice day,
Raffaello Barri

@minan you can use RegEx to declare which characters are allowed and the rest will be removed.
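As a rough sketch of this allow-list idea in Python (outside KNIME, for illustration only): the allowed set below, ASCII letters, digits, whitespace, dot and comma, is an assumption and would need to be adapted to the real data, and `keep_allowed` is a made-up name.

```python
import re

# Assumption: only ASCII letters, digits, whitespace, '.' and ',' are allowed.
# The leading ^ negates the class, so everything else is matched and removed.
NOT_ALLOWED = re.compile(r"[^A-Za-z0-9\s.,]")

def keep_allowed(text):
    """Drop every character that is not in the allowed set."""
    return NOT_ALLOWED.sub("", text)

result = keep_allowed("Total: 100% (net), ok!")
print(result)  # Total 100 net, ok
```

Note that an ASCII-only allow-list like this one also strips accented letters and umlauts, so the allowed set should be extended if the texts contain them.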

Thanks Ielloba,

but I am not quite sure how to do this. I am a KNIME beginner.

Can I ask you for some details on how to do these 3 points? I appreciate your advice.

Thanks
Muersel

Hi @minan,

sure, if you have questions about the three points in my last message you can ask! They follow what I’ve done inside the attached workflow. What is unclear?

Raffaello

@minan
if you are not into Regex the KNIME team has introduced a (at least for me) new node called String Cleaner which you could try as well.
br

Nice. Good to know it does exist now

Thanks to all for your feedback and your support!

I have 100 special characters in one “support file” (an Excel file with one column and 100 rows). It looks like:
Column A
line 2: ‘;’
line 3: ‘/’
line 4: ‘%’
and so on.

Can I use this as a kind of “input file” instead of writing down all 100 chars in one row?

KNIME should loop over the ‘support file’: take the first special character and remove it from all texts within the source file (100k rows of text), then take the next special character in the support file and remove it from the texts within the source file, and so on.

Not sure if I managed to explain exactly what I need.
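To make the idea concrete, the loop described above could look roughly like this in Python. It is only an illustration of the logic, not a KNIME workflow; the function name `remove_chars_one_by_one` and the sample rows are made up.

```python
def remove_chars_one_by_one(texts, special_chars):
    """For each special character in turn, strip it from every text."""
    cleaned = list(texts)
    for ch in special_chars:
        # One full pass over all texts per special character
        cleaned = [t.replace(ch, "") for t in cleaned]
    return cleaned

# Hypothetical stand-ins for the source file (texts) and the support file
rows = ["a;b/c", "100%", "plain"]
support = [";", "/", "%"]
result = remove_chars_one_by_one(rows, support)
print(result)  # ['abc', '100', 'plain']
```

This does one pass over all rows per character, so 100 characters mean 100 passes; a single combined pattern would do the same job in one pass.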

Thanks

@minan I have built a workflow where you would load your data and you have a table with a list of special characters to remove. You can edit that.

The loop will automatically create a RegEx pattern that removes the special characters in a String Manipulation (Multi Column) node. In this case:

regexReplace($$CURRENTCOLUMN$$,"[%§/\\)]","")

The result shows the original table next to the changed one. But you can of course just replace the table once you are confident your setup does what you want:

RegEx remove Collection of Characters from multiple Columns.knwf (124.5 KB)
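For comparison, here is a small Python sketch of the same idea, one pattern applied to several string columns. It is not taken from the workflow: the table, the column names and the helper `clean_columns` are hypothetical, and it assumes the doubled backslash in the KNIME expression is string-literal escaping for a single regex backslash, so the class matches `%`, `§`, `/` and `)`.

```python
import re

# Assumed Python equivalent of the character class in the expression above
PATTERN = re.compile(r"[%§/\)]")

def clean_columns(rows, columns):
    """Apply the pattern to the chosen string columns of each row (a dict)."""
    return [
        {key: PATTERN.sub("", value) if key in columns else value
         for key, value in row.items()}
        for row in rows
    ]

# Hypothetical table; the "id" column is not selected and stays untouched
table = [{"id": "1/a", "text": "50% (net)", "note": "§12/3"}]
result = clean_columns(table, {"text", "note"})
print(result)  # [{'id': '1/a', 'text': '50 (net', 'note': '123'}]
```

The opening bracket in "(net" survives because `(` is not part of this particular pattern; any character that should go must be in the list.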
