I have read a text file with addresses (creating unique row IDs). House number and extension are combined in one column, and I split these using Regex Split with the pattern (^\d+)[\ ]*(.*$)
I can then use the Value Counter on all the columns except the split-off extension column. For that column it does not execute: "encountered duplicate ROW id "?" ".
Any ideas on this?
PS: Does anybody know of a library of useful regex solutions for known challenges?
You could use R's stringi or stringr, but with non-ASCII characters you will run into problems if you reference them in the R code via the code window of KNIME's R node...
Ok, thanks. But nevertheless, why does this happen in the first place?
What does the data structure look like?
Which node are you using? The KNIME Value Counter only takes one column and does the counting. But you wrote that you applied the Value Counter to all columns?
What information are you looking for? I have a string variable with values like "88A", "4 II", and these I split.
Anything in particular I can add?
I use KNIME Value Counter separately for every column.
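For anyone who wants to try the split outside KNIME, here is a minimal Python sketch of the same pattern applied to the example values from this thread. It is only an illustration, not what the Regex Split node does internally; the function name split_house_number is made up for this example.

```python
import re

# Same pattern as in the question: leading digits, optional spaces, the rest.
PATTERN = re.compile(r"(^\d+)[\ ]*(.*$)")

def split_house_number(value):
    """Return (number, extension), or None if the value has no leading digits."""
    match = PATTERN.match(value)
    if match is None:
        return None
    return match.group(1), match.group(2)

for raw in ["88A", "4 II", "50?", "12"]:
    print(raw, "->", split_house_number(raw))
```

Note that a value without an extension gives an empty string here, while a value that does not match at all gives None in this sketch.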
The problem is as follows: your column contains missing values and the value "?". The Value Counter knows those are different, but their string representations are the same. And then the node fails because there are two rows with the RowID "?".
What you can do: you can use a Missing Value node before the Value Counter node and replace the missing values with some constant string like "MissingValue".
Or you can use the GroupBy node.
I will open a bug report for this one. Thank you for detecting and reporting it!
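The collision described above can be sketched in plain Python. This is only a rough analogy, not the node's actual code: it assumes that a missing cell is rendered as "?" when its value is turned into a row ID, and it uses None as a stand-in for a missing cell. All function names are made up for this example.

```python
from collections import Counter

MISSING = None  # stand-in for a KNIME missing cell (assumption of this sketch)

def row_id(value):
    """Render a counted value as a row ID; a missing cell is shown as "?"."""
    return "?" if value is MISSING else str(value)

def count_with_row_ids(values):
    """Count distinct values, then key the result table by row ID."""
    counts = Counter(values)  # None and "?" are still distinct keys here
    table = {}
    for value, n in counts.items():
        key = row_id(value)
        if key in table:
            # Two different values ended up with the same row ID.
            raise ValueError(f'encountered duplicate row ID "{key}"')
        table[key] = n
    return table

def replace_missing(values, constant="MissingValue"):
    """Workaround: swap missing cells for a constant string before counting."""
    return [constant if v is MISSING else v for v in values]

extensions = ["A", "II", MISSING, "?", "A"]
print(count_with_row_ids(replace_missing(extensions)))
```

Calling count_with_row_ids(extensions) directly raises the error in this sketch, because the missing cell and the literal "?" both map to the row ID "?".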
Indeed, the single occurrence of "50?" which was split into number "50" and extension "?" caused the problem.
A simple replace($extension$,"?","") helped too.
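In Python terms, that clean-up step would look roughly like the sketch below, assuming a plain literal replacement of every "?" character. The function name clean_extension is hypothetical.

```python
def clean_extension(extension: str) -> str:
    """Remove every literal "?" from an extension; other characters are kept."""
    return extension.replace("?", "")

for ext in ["?", "A", "II"]:
    print(repr(ext), "->", repr(clean_extension(ext)))
```

The extension "?" becomes an empty string, which no longer shares a row ID with a missing value.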