delete unnecessary characters in data

Hi,
I want to delete unnecessary characters from my data. I tried several methods, but they either delete the words after the character or break the text from that character onward. What I want is to delete the special characters shown in the picture without losing any of the words. How can I do it?

[Screenshot: sample data with the unwanted characters highlighted in red]

Hi @umutcankurt

I guess you want to remove all possible occurrences of “\.\.\...”, not only those highlighted in red?

Could you please copy and paste this piece of data as text (instead of a screenshot)? I’ll be glad to have a go at it and provide a possible solution.

Best

Ael

Hi @aworker
Thanks for the answer. I want to remove the characters, but the words that come before and after them should not be lost.
Example (original):
. Gemeente Oostkamp-PPP05R… Herinrichting Moerbrugsestraa…

Desired result:
. Gemeente Oostkamp-PPP05R Herinrichting Moerbrugsestraa

Data:

. DG&P-SMA_lot 6 … Nieuwbouw en verbouwing school Lot 6.Omgevingsaanl…
. Vlaamse Maatschappij voor Sociaal Wonen-WI 0EDAAE05 … Wegen-, riolerings- en omgevingswerken project Kunstberg te LUMMEN .
. Gemeente Oostkamp-PPP05R… Herinrichting Moerbrugsestraa…
. Gemeente Oostkamp-PPP05R… Renoveren van een woning met herinrichting tot doorgangswoning .
. Gemeente Heist-op-den-Be… 2018-02.04 - rotonde Broekstraat .
. Wommelgem-WOM/2… Heraanleg Ternesselei-Das…
. Universiteit Gent-21VOM006 … Externe preventieadvise…
. ANB-NISIP-SIGMA… Blue Deal - cluster vallei van de Grote Nete - Zammelsbroek: RAAMOVEREENKOMS…
. inBW-PPP0HS-885… Invitation à présenter une offre - Court-Saint-Eti…
. Etalle-PPP0XS-2… Réfection de voiries à Mortinsart PIC 2019-2021 .

Hi @umutcankurt

Thanks for posting the data. Unfortunately, copying and pasting into the editor suppressed the special characters. To know exactly what these characters are, could you please post the text in a file, making sure they are not transformed or suppressed? Alternatively, you could paste it here again, but using the </> option, which should preserve the special characters.

Thanks and looking forward.

Best

What methods did you try? From your screenshot it looks like the substrings to be removed all have the same length, so I can’t see why the “String Manipulation” node with the replace() function would not work.

Hi @aworker, I attached the file.
Microsoft Excel __.xlsx (13.0 KB)

Hi @Experimenter,
My attempts either delete the text after the special character or cut it off. I only want the unwanted characters deleted and the rest of the text to remain intact. I couldn’t achieve that in my experiments.

Thanks @umutcankurt

I’m going into a meeting now, but I’ll be able to have a look afterwards, in about an hour.

Hi @umutcankurt

Please find below a solution based on a String Manipulation node using nested replace() functions:

replace( replace( replace( $Data$, "\\...", ""), "\\..", ""), "\\.", "")

Part of the difficulty here is knowing that the replace() calls can be nested. More importantly, one needs, first, to escape the \ sign (written \\) because it is interpreted as an escape character, and second, to remove "\\...", then "\\..", then "\\." in that order. If this is not done in that order, the sequences may not be removed entirely or properly, leaving trailing orphan dots “.”.
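For readers outside KNIME, the ordering argument can be sketched in Python. This is a minimal sketch, not the workflow attached here: the sample string is a hypothetical reconstruction (the real characters were lost when the data was pasted into the post), and the function names are made up for illustration. It assumes, as the thread’s result suggests, that replace() in String Manipulation does plain, non-regex substitution, which is also what Python’s str.replace() does.

```python
# Hypothetical sample: the real characters were lost when the data was pasted
# into the post, so this assumes the "\.\.\..." sequence discussed in the thread.
sample = "Gemeente Oostkamp-PPP05R\\.\\.\\... Herinrichting Moerbrugsestraa"


def strip_longest_first(text: str) -> str:
    """Mirror the nested replace() calls: longest pattern first."""
    for pattern in ("\\...", "\\..", "\\."):
        # str.replace is a plain (non-regex) substitution of every occurrence.
        text = text.replace(pattern, "")
    return text


def strip_shortest_first(text: str) -> str:
    """Same patterns in the opposite order, to show why the order matters."""
    for pattern in ("\\.", "\\..", "\\..."):
        text = text.replace(pattern, "")
    return text


print(strip_longest_first(sample))   # Gemeente Oostkamp-PPP05R Herinrichting Moerbrugsestraa
print(strip_shortest_first(sample))  # Gemeente Oostkamp-PPP05R.. Herinrichting Moerbrugsestraa
```

Removing "\." first consumes the backslash of every sequence, so the two extra dots of "\.\.\..." are left behind as orphan dots.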

The workflow:

20210920 Pikairos delete unnecessary characters in data.knwf (37.8 KB)

Hope this helps.

Best,

Christophe

Hi @umutcankurt, you could also show us what you have attempted so we can point you in the right direction.

I have a feeling that you did not escape your string to be replaced.

For example, if you want to look for:
\.\.\...
then you need to escape the backslashes and look for:
\\.\\.\\...
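Python string literals also treat \ as an escape character, so they make a convenient analogy for what escaping changes. This small sketch checks that the escaped literal and the intended text are the same eight characters, and shows that if the search string is used as a regular expression instead of plain text, the dots must be escaped too, because "." matches any character. The sample text is hypothetical.

```python
import re

# In a normal string literal "\\" is ONE backslash, so the escaped literal and
# the raw string below spell the same eight characters.
escaped = "\\.\\.\\..."
raw = r"\.\.\..."
print(escaped == raw, len(escaped))  # True 8

# If the search string is used as a regular expression instead of plain text,
# "." means "any character" and "\" starts an escape, so escape it as a whole.
pattern = re.escape(raw)
text = "PPP05R\\.\\.\\... Herinrichting"  # hypothetical sample
cleaned = re.sub(pattern, "", text)
print(cleaned)  # PPP05R Herinrichting
```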

One of the reasons I usually do not take part in this type of discussion is that it is hard to define replacement rules when the rules are not given. It tends to end in a long back-and-forth: we give a solution for the sample data provided, then the author comes back with other rules that conflict with the proposed solution, and later with exceptions to be added.

In your case, which string or strings need to be removed? I see both “\.\.\..” and “\.\.\...”. Should both strings be removed? Or is “\.\.\...” simply “\.\.\..” followed by a period that needs to be kept (in which case only “\.\.\..” needs to be replaced)?
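The two readings of the question give different results, which is why the rule needs to be stated. A minimal Python sketch on a hypothetical string (the real data’s characters were not preserved in the post):

```python
# Hypothetical sample ending in the 8-character sequence "\.\.\...".
sample = "Herinrichting Moerbrugsestraa\\.\\.\\..."

# Reading A: "\.\.\..." is one unwanted token and goes away entirely.
reading_a = sample.replace("\\.\\.\\...", "")
# Reading B: only "\.\.\.." is unwanted; the final "." is real punctuation.
reading_b = sample.replace("\\.\\.\\..", "")

print(reading_a)  # Herinrichting Moerbrugsestraa
print(reading_b)  # Herinrichting Moerbrugsestraa.
```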

Thank you so much. You found the solution I was looking for. :+1:

@aworker Thank you so much. You found the solution I was looking for. :+1:

Thanks @umutcankurt. My pleasure :blush: !

You could also try the String Replacer node with a regex replacement.
I used [^a-zA-Z0-9 ] to keep only alphanumeric characters and spaces.
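The same character class can be tried outside KNIME with Python’s re module; this sketch assumes every match is replaced with an empty string, and the helper name is hypothetical. Note that it is broader than the nested replace() approach: besides the backslashes and dots, it also drops hyphens, “&”, “_”, and accented letters such as “é” and “à” that appear in the French rows of the data above, which may or may not be acceptable.

```python
import re

# Same character class as the String Replacer suggestion: anything that is not
# an ASCII letter, an ASCII digit, or a space is removed.
NOT_ALNUM_OR_SPACE = re.compile(r"[^a-zA-Z0-9 ]")


def keep_alnum_and_space(text: str) -> str:
    """Hypothetical helper: delete every character outside the class."""
    return NOT_ALNUM_OR_SPACE.sub("", text)


print(keep_alnum_and_space("Gemeente Oostkamp-PPP05R\\.\\.\\... Herinrichting"))
# Gemeente OostkampPPP05R Herinrichting  (the hyphen is gone too)
print(keep_alnum_and_space("Réfection de voiries à Mortinsart"))
# Rfection de voiries  Mortinsart  (accented letters are dropped)
```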

:+1: thanks for sharing
