Help needed: Source column header has line break within.

knimedt · July 11, 2022, 8:34am

Hi everyone!

I am having trouble with the syntax/expression in the Rule-based Row Filter node. A column header in my source file (Excel) contains a line break, so I get a syntax/expression error when referring to that column.

See screenshot line 6&7.

Thank you so much!

mlauber71 · July 11, 2022, 9:10am

@knimedt welcome to the KNIME forum. You could try to clean your column names before using them. You can use a regex with ‘allowed’ characters (you might have to add special cases like “/” if you want to keep them in your case) and remove all others:

ArjenEX · July 11, 2022, 9:15am

Hi @knimedt

Welcome to the KNIME forum.

You can achieve this by using a Column Rename (Regex) node. Simply use [\n\r\t] as the search string and it will clean up all your columns (keep the Replacement field empty).
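
For readers outside KNIME, the effect of this pattern can be sketched in a few lines of Python. This is only an illustration of what the character class does to a header, not the node's implementation; the sample header names and the helper `strip_line_breaks` are made up for the example.

```python
import re

# Same character class as in the node: newline, carriage return, tab.
LINE_BREAKS = re.compile(r"[\n\r\t]")


def strip_line_breaks(name: str) -> str:
    """Remove every match; an empty replacement simply deletes it."""
    return LINE_BREAKS.sub("", name)


# Hypothetical headers as they might arrive from an Excel sheet.
headers = ["Order\nDate", "Customer\r\nName", "Amount"]
cleaned = [strip_line_breaks(h) for h in headers]
print(cleaned)  # ['OrderDate', 'CustomerName', 'Amount']
```

Note that deleting the line break joins the two words; if a separator is wanted, a space can be used as the replacement instead of an empty string.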

Before:

After:

aworker · July 11, 2022, 9:27am

Thanks @knimedt for posting this interesting question and welcome to the KNIME forum community.

Thanks @mlauber71 & @ArjenEX for your neat and complementary solutions!

@ArjenEX this is the first time I have seen a -Column Rename (Regex)- node where the “Replacement” field is left empty (instead of using $1 $2 …), and it works!

Is this option documented anywhere? Is it something inherent to regex, or rather a KNIME feature of the -Column Rename (Regex)- node?

Thanks in advance for your answer.

Best
Ael

ArjenEX · July 11, 2022, 9:37am

To my knowledge, this node has full Regex capability.

Practically speaking, I believe it’s basically doing regexReplace($Column Name$,"[\\n\\r\\t]","") as @mlauber71 also showed in his solution but without requiring any additional nodes.

I have been using it for exactly this cleansing purpose, or to add prefixes to identically named columns in different data streams so that I can recognize the source more easily after Joiner operations (using $1 etc.).
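
The two uses mentioned here, deleting matches and prefixing names via a captured group, can be illustrated with Python's `re.sub`. Some assumptions to keep in mind: Python writes the group reference as `\1` where the node's replacement field uses `$1`, and the column names and the `left_` prefix are invented for the example.

```python
import re

columns = ["ID", "Order\nDate"]  # hypothetical column names

# 1) Cleansing: an empty replacement deletes every matched character.
cleaned = [re.sub(r"[\n\r\t]", "", c) for c in columns]

# 2) Prefixing: capture the whole name and reuse it in the replacement.
#    Python's \1 plays the role of $1 in Java-style replacements.
prefixed = [re.sub(r"^(.+)$", r"left_\1", c) for c in cleaned]

print(cleaned)   # ['ID', 'OrderDate']
print(prefixed)  # ['left_ID', 'left_OrderDate']
```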

aworker · July 11, 2022, 9:42am

Thanks @ArjenEX for the complementary information and great to know about this resemblance with regexReplace().

knimedt · July 11, 2022, 9:45am

Thanks everyone for the quick answers!
Works like a charm
Have a great day!

ArjenEX · July 11, 2022, 10:06am

Great to hear! Please mark it as the solution so that other users can also benefit from this in the future.

@aworker If you are really tired of your job, you can always check the node's source code to see what is going on and reverse engineer the requirements a bit, assuming you can navigate your way through Java.

github.com

knime/knime-base/blob/f82ed81e4e15b92324b238297ca1fcb4f7da05e2/org.knime.base/src/org/knime/base/node/preproc/columnrenameregex/ColumnRenameRegexConfiguration.java

Note: the source can also be reached via the Find Source button at the bottom of the node's NodePit page; from there, navigate to the correct .java file.

In this case, the only requirement I see for the replacement is that the new column name cannot be empty after the search string has been processed, apart from the usual IndexOutOfBoundsException catch.
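
That constraint can be mirrored in a small sketch: apply the search-and-replace, then reject a result that would leave a column without a name. This is an illustration in Python under stated assumptions, not a port of the node's Java code; `rename_column` and its error message are hypothetical.

```python
import re


def rename_column(name: str, pattern: str, replacement: str = "") -> str:
    """Apply a regex replacement and refuse an empty result."""
    new_name = re.sub(pattern, replacement, name)
    if not new_name:
        # Hypothetical message; the real node reports its own error.
        raise ValueError(f"Replacing in {name!r} yields an empty column name")
    return new_name


print(rename_column("Order\nDate", r"[\n\r\t]"))  # OrderDate

try:
    # A header made only of line breaks and tabs would end up empty.
    rename_column("\n\t", r"[\n\r\t]")
except ValueError as err:
    print("rejected:", err)
```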

aworker · July 11, 2022, 10:17am

Hi @ArjenEX

Yes, I’m aware of it and I do it from time to time, but I hadn’t for the -Column Rename (Regex)- node.

Great to share this information with all the KNIME forum community

mlauber71 · July 11, 2022, 10:19am

@ArjenEX your solution is very elegant, thank you for that. I would like to point out one characteristic of my solution that might be interesting if you often have to deal with very messy data (headers): it lets you define which characters are allowed and throws out all the rest. So @knimedt, if your source came up with other funny characters, they would also get removed.
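
A minimal sketch of this allow-list idea, written in Python for illustration. The set of allowed characters (letters, digits, underscore, space and "/") is an assumption for the example, since the exact pattern from the workflow is not shown in the thread; `keep_allowed` and the sample headers are likewise hypothetical.

```python
import re

# Anything NOT in this set is removed. The set itself is an assumption:
# letters, digits, underscore, space and "/".
NOT_ALLOWED = re.compile(r"[^A-Za-z0-9_ /]")


def keep_allowed(name: str) -> str:
    """Drop every character outside the allow-list, then trim."""
    return NOT_ALLOWED.sub("", name).strip()


headers = ["Order\nDate", "Price ($)", "In/Out\t#"]  # hypothetical
result = [keep_allowed(h) for h in headers]
print(result)  # ['OrderDate', 'Price', 'In/Out']
```

Compared with removing only `[\n\r\t]`, the negated class also catches characters nobody anticipated, at the cost of having to list every character that should survive.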

system · July 18, 2022, 10:19am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.