Weird behaviour (doubling of a string) when replacing a regular expression

Hello,

I have the following SQL Statement (a Little bit simplified):

select regexp_replace(my_table.my_column, “.*” , “YES”) as wurde_kontaktiert
from my_table limit 100

When I am executing this from Impala (outside of Knime), it replaces every entry in the “my_column”-Column with “YES”. But when I use the exact same Statement from within Knime, using the Impala-Connector and database reader, every string is replaced with “YESYES”. Why is this, how can I Change that?

Best regards

Interesting - I saw similar behavior a few days back with the Column Rename (Regex) node, where the replacement text was doubled when a column name match using ‘.*’ as the regular expression was supplied. In that case, I fixed it by changing to ‘.+’, but I don’t think that is the expected behaviour either.

Steve

“.*” has the side effect that is also matches the empty string. Therefore if the used replace function applies the pattern repeatedly until no more matches are found (which regexp_replace seems to do) it first matches the full string and afterwards the remaining empty string.

2 Likes

Hello,

thank you, using “.+” worked.

@thor: Ah, thank you for the explanation. That still does not seem to explain why the regexp_replace function behaves differently in Knime and outside, except if the implementations are different, right?

No it doesn’t. But the replacement isn’t done by KNIME. It either happens directly in the database or in the driver. The latter could explain it if different drivers are used.