Moving data between columns

Hi,

in a csv-file the column delimiter has also been used as a value in column 9. This results in an additional column. Correcting this in python/pandas is quite simple. The following code shifts the data in these rows back to the right columns:

mask = df.iloc[:, -1].notna()
df.loc[mask, df.columns[8:]] = (df.loc[mask, df.columns[8:]].shift(-1, axis=1).values)

I could not think of a simple solution for this in KNIME. Isn’t there any or did I miss something?

Regards
Dirk

1 Like

a)
no, knime does not support this with default nodes. especially if we talk about non-static structures with varying number of columns or changing column names (Knime just plain sucks for index based operations)

b)
the closest you may get to is using the Column List Loop Node.
Alternatively, you can reach the same result using using Java Snippet or Python snippet nodes.

c)
the more important question is why your csv file essentially is “corrupt”. Whatever tool is creating those files, should either " those values or be configured to use a different delimiter if you cannot guarantee that your delimiter is part of the dataset. In the best case, you switch to a modern format like parquet instead.

4 Likes

Hello @dirkschumacher,

if it’s always Column 9 then use Column Combiner node to join column 9 and column 10. After that you can filter these column out. But I agree with what @fe145f9fb2a1f6b mentioned under c) - values should be quoted or another delimiter should be used.

Br,
Ivan

3 Likes

Hello @ipazin,

when this happens, data in all columns after the one with the error must be shifted. Of course, the root cause should be resolved, but I’m focusing on how to handle the data shift in KNIME, which doesn’t seem straightforward.

Regards
Dirk

1 Like

Hi @dirkschumacher , to be honest with you, if it’s simple to do in python/pandas then one of the great things about KNIME is that you can just do it in python/pandas :wink:

I’ve put together a workflow here that I think could do what you are asking, using KNIME nodes, but it’s the kind of thing you would only do to attempt to prove it can be done :rofl: . The Table Cropper node, which I’ve hardly ever used in a production workflow, allows us to remove the last column from the table without knowing its name, or index number.

To ensure that the table ends up in the same row sequence that it started before manipulation, I added a Counter. However, because we want to subsequently delete the final column using the Table Cropper, the Counter cannot remain on the end of the table, so there’s an extra complication at the start just to put the Counter into column 1 instead of column n.

You’ll also see I added a component of mine as a convenience to re-align the column names, but internally it just uses a standard pattern for doing this (that I can never remember… primarily because over the years, the KNIME team have changed the names of the nodes… hence the component :wink: )

In the workflow, I also included an adapted version of your python code and included it in a python script node as follows :

import knime.scripting.io as knio

# data from KNIME to Python (pandas)
df = knio.input_tables[0].to_pandas()
mask = df.iloc[:, -1].notna()
df.loc[mask, df.columns[8:]] = (df.loc[mask, df.columns[8:]].shift(-1, axis=1).values)

# Remove the last column (your "na" column)
df = df.iloc[:, :-1]

# data from python (pandas) to KNIME
knio.output_tables[0] = knio.Table.from_pandas(df) 

shifting columns in some rows to the left.knwf (78.8 KB)

I’m not 100% that I fully understood what your csv data would actually look like, but regardless, I think it demonstrates that your python script, adapted to work with KNIME is the clear winner here.

4 Likes