Cell Splitter misbehaving adding characters to values

Hello all,

I have seen a strange behavior using the ‘Cell Splitter’ module today:

When using the ‘joiner’ node prior to the ‘Cell Splitter’:

I see the following data which I am about to split in columns:

When using the ‘Cell Splitter’ node:

I get the following result:

I find it strange because when splitting different type values, strings for example, I did not see this behavior. I did ensure it was set encoding to UTF-8 when reading the .csv node as well.

Any ideas?

Thanks,
J.

@jarviscampbell you could try and change the scientific notation:

1 Like

Hi @mlauber71 ,

I did that,

Applied and Closed it. Reset Nodes. Ran them again.
No luck.

Restarted Knime, reset nodes again, ran it. No luck.

Same output, with the E10s at the end.

BR,
J.

Hi @jarviscampbell , basically, the original values with the “;” were string, and when you split them using “;” as delimiter, you are left with numbers and therefore Knime tries to guess the type, and that’s how Knime eventually represents them.

You could try converting them back to string after the split. You can use the String Manipulation (Multi Column) node for this.

  1. Select the columns you want to convert
  2. Use the following expression:
    string($$CURRENTCOLUMN$$)
  3. Select “Replace selected input columns”

You should get the values as they were before, splitted of course.

1 Like

Hi @bruno29a ,

tried all sorts of variations in the hopes you had meant ‘bring back to string’ “before” the split. But still. Didn’t work.

Please see attached.
ps: sorry, had to change .csv to .txt to be able to upload here

Thanks.
BR
J.
Cell Splitter misbehavior-fileReader.knwf (315.3 KB)
small-sample-data - small-sample-data.csv.txt (43.0 KB)

As your small sample, you have some rows with any and others strange situations my friend… A question: Are all csv files that have been input correctly? headers, columns, rows? Are they at the same format?

The “join node” have a first condition that is: the same “ID” column to match values… if start with some mistakes, all the process will fail. If possible, check the files consistence first…

After the join, check the output of this node before any other action.

If you have the same kind of files, you can use just one node of “File reader” and just configure the option “Files in folder”, set up the file name example and extensions as print below. Just click at “Files in folder”, “Filter options” and then, the options of the files.

Be careful with de delimiters and quotes, they can make a mistake for the information.

Which delimiter do you use in your files? “,” “;” …? Sometimes i set to use pipe “|” because no one use it normally… below this windows you’ll see a example of the data from the files.

After that, try to look what happens at the Join node output. In the Include

At the output, if just set match option, just the top arrow will be actived.

Check the result for erros… if exists, something with the files or the join column exists…you can don’t merge de columns … you’ll have a couple of columns, but you can see and compare both together.

If all is correctly, then try the splitt node.It has a lot of options to join or not the information…

Here you can output as a list (array), as set but will remove duplicated or create a new one. Try it and tell us the result ok?

Tks

3 Likes

Hi @denisfi , I appreciate the effort in searching where I could have dropped the ball.

Thank you for your solution. That’s exactly the route I took. Transformed it to a list than reverted. I hope it won’t give me trouble right after that… we’ll see.

Just wished the ‘String Manipulation’ with the “string” expression would have worked as expected. Anyways, Thanks.

J.
Cell Splitter misbehavior-Fix.knwf (329.5 KB)

1 Like

Hi @jarviscampbell , I’m not sure you understood what I was trying to do if you said that you hoped I meant bring back to string “before” the split. Before the split, the values in your column target are already strings. It’s only after the split that they do not seem to be anymore.

In any case, I got curious to see what my manipulation gave, and it looks like the split itself gives you the values in Double:

You can see the [D] in the column names that indicates a type Double. The first column stays as String ([S]) because there is the string “any” in it (came with your file).

When I apply the solution I suggested, that’s when it actually gives the same results as what you originally had:

I’m not sure how you got your results, but it would have been helpful to show all these details (that the columns were already of String type, etc).

I always say this “help us help you. The more details/info you give, the more precise the solution will be”.

Nonetheless, the idea is still good, and since it looks like you really wanted the suggestion with the String Manipulation to work, I’ll just make it work with a little hack. In order to keep the columns as String, we just need to make sure that the content being splitted will be string.

To do this, I’ll just add a dummy character within each content, for example, I’ll add the “#” sign.


join("#", replace($target$, ";", ";#"), "#")

After that, you just do your Cell Splitter as you did:

At this point, the cell will be splitted into String columns, with the dummy value added:

Now we can use the solution with the String Manipulation to remove the dummy value, and to keep the columns as String:


string(replace($$CURRENTCOLUMN$$, "#", ""))

And there’s your result:


You basically just need to add these 2 nodes:
image

3 Likes