How to pad a string?

Thiemo.Kellner · December 1, 2017, 10:05am

Hi

I need to left-pad a string so that I can use it as a join criterion, but I was not able to find such functionality in KNIME. I know the String Manipulation node, but it does not support padding. Do I need to write a Java snippet?

Kind regards

Thiemo

Thiemo.Kellner · December 1, 2017, 11:13am

I could create the functionality with a Java snippet, but I am still interested in knowing whether there is an out-of-the-box solution. I am rather concerned that if such a common transformation is missing, many more are missing and KNIME is not quite up to the task of ETL, although admittedly it is not advertised as an ETL tool. My code is (derived from https://stackoverflow.com/questions/13475388/generate-fixed-length-strings-filled-with-whitespaces):

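int width = 10;
char padding = '0';

// Guard: a value longer than width would make the array size negative.
if ($PID$.length() >= width) return $PID$;
// new char[n] is filled with '\0', which is then replaced by the padding character.
return new String(new char[width - $PID$.length()]).replace('\0', padding) + $PID$;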
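For anyone who wants to try the same trick outside of KNIME, here is a minimal standalone sketch. The class and method names (LeftPadSketch, leftPad) are made up for illustration, and returning over-long values unchanged is an assumption about the desired behaviour, not something KNIME prescribes.

```java
class LeftPadSketch {
    // Left-pads s with `padding` up to `width` characters.
    // Values that are already at least `width` long are returned unchanged (assumption).
    static String leftPad(String s, int width, char padding) {
        int missing = width - s.length();
        if (missing <= 0) {
            return s;
        }
        // A fresh char[] is filled with '\0'; replace() swaps those for the padding character.
        return new String(new char[missing]).replace('\0', padding) + s;
    }

    public static void main(String[] args) {
        System.out.println(leftPad("4711", 10, '0'));        // prints 0000004711
        System.out.println(leftPad("12345678901", 10, '0')); // prints 12345678901
    }
}
```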

ferry.abt · December 12, 2017, 3:41pm

Hey Thiemo,

I'm not aware of a direct function that does this; however, a Java Snippet (simple) node with the code return String.format("%010d", $PID$); is hardly any more complicated than a dedicated node. Note that %d expects an integer column; for a string column the value has to be parsed first, e.g. with Long.parseLong($PID$). Solutions in R, Python and the other integrations KNIME offers are available on the internet as well.

Personally, I agree with you. That's something that might be worth adding. However, there are thousands or even millions of possible ETL steps out there, and although there are so many different nodes in the Analytics Platform and even more in the community extensions, there will always be an ETL step that is not there as a dedicated node. It might also be in there and we both just haven't found it yet.

The forum is always a good place to look for some pointers or neat solutions to a certain issue. Feel free to ask whenever you're looking for a certain functionality or elegant solution, or just send me a message.

Best regards,

Ferry

imagejan · June 13, 2018, 8:58am

While that’s certainly correct for a single string padding operation, it would still be useful to have padLeft and padRight functions (similar to the ones available in Groovy, for example) available in the String Manipulation nodes, for combining them with other string operations. This would avoid the need of having an additional node just for the padding, or moving the entire string manipulations into a Java snippet.

So how about adding the PadLeftManipulator and PadRightManipulator classes to the org.knime.base.node.preproc.stringmanipulation.manipulator package here:

https://github.com/knime/knime-core/tree/master/org.knime.jsnippets/src/org/knime/base/node/preproc/stringmanipulation/manipulator

I’d suggest the following function signatures (with default padding with spaces for the ones not defining a padding char):

padLeft(str, length)
padLeft(str, length, padding)
padRight(str, length)
padRight(str, length, padding)
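
To make the proposed semantics concrete, here is a rough plain-Java sketch of the four signatures. It is not KNIME's actual implementation. The class name is hypothetical, and several details are my own assumptions: a single char as padding, null input giving null, and strings longer than length being returned unchanged.

```java
class PadFunctionsSketch {
    // Variants without a padding argument default to spaces.
    static String padLeft(String str, int length) {
        return padLeft(str, length, ' ');
    }

    static String padRight(String str, int length) {
        return padRight(str, length, ' ');
    }

    static String padLeft(String str, int length, char padding) {
        if (str == null) {
            return null; // assumption: null in, null out
        }
        return fill(length - str.length(), padding) + str;
    }

    static String padRight(String str, int length, char padding) {
        if (str == null) {
            return null; // assumption: null in, null out
        }
        return str + fill(length - str.length(), padding);
    }

    // Builds a run of `count` padding characters; empty if count <= 0.
    private static String fill(int count, char padding) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append(padding);
        }
        return sb.toString();
    }
}
```

With these definitions, padLeft("7", 4, '0') gives "0007" and padRight("ab", 4) gives "ab" followed by two spaces.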

Of course, already now you can do some padding in the String Manipulation node, without the need for a Java Snippet node, but it’s a bit cumbersome and not very readable, e.g. to pad an integer to four digits with 0s:

join(substr("0000", length(string($${INumber}$$))), string($${INumber}$$))
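
The same idea in plain Java, as a sketch for readers who find the expression hard to parse: cut as many zeros off a "0000" template as the number has digits, then prepend what is left. The class and method names are hypothetical. The Math.min guard is my addition so that numbers with more than four digits pass through unchanged, and the sketch assumes non-negative numbers.

```java
class SubstrJoinSketch {
    // Pads a non-negative integer to four digits using a template of zeros.
    static String padToFourDigits(int number) {
        String digits = String.valueOf(number);
        String zeros = "0000";
        // Skip one zero per digit already present; cap the index for longer numbers.
        int start = Math.min(digits.length(), zeros.length());
        return zeros.substring(start) + digits;
    }

    public static void main(String[] args) {
        System.out.println(padToFourDigits(42));    // prints 0042
        System.out.println(padToFourDigits(12345)); // prints 12345
    }
}
```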

moritz.heine · July 6, 2018, 7:40am

Hi,

I’m happy to announce that string padding will be added for the upcoming release!

Cheers,
Moritz

DemandEngineer · April 9, 2021, 3:50pm

Am I misunderstanding the examples in the String Manipulation node? One example is
padLeft(null, *, *)
which should resolve to null, but this causes an error.

What I’m most interested in is using the * wildcard for the size of the string.

Daniel_Weikert · April 9, 2021, 4:15pm

Sometimes the wildcard needs to be put in quotes. Not sure whether that’s the case here as well.

ipazin · April 12, 2021, 10:11pm

Hello @DemandEngineer,

* in these examples represents any size or any character, not a wildcard pattern. So the example above means that padding a null string to any size with any character will give you a null string. Can you give us an example with data so we can better understand what you are trying to do?

Br,
Ivan

DemandEngineer · April 15, 2021, 7:08pm

I think I’m using the wrong function… I was tired and not reading, or rather not comprehending… I was looking for a concatenate function and missed join, so I thought I could use padLeft if its length could vary based on the original length. But then that would not be a pad… that would be a concatenate.

when you are tired your ability to get stuff done has diminishing returns

DemandEngineer · April 15, 2021, 8:05pm

Feature request. I realize this is a lot of overhead, but it would be a very powerful selling point… an interface language that goes beyond spoken language… English (SQL user) or English (Excel user) or English (Marketing User) or English (Data Scientist)

Background: I worked at a company that acquired another company whose native language was not English. Its product supported a number of languages, but the interface always felt like it had been poorly Google-translated to English, using words not commonly used in competitor apps. It felt not only translated from another language but also full of engineering terms, even though the target audience was marketers. This made onboarding new users especially tough.

How it relates to KNIME:
For instance, using “join” for concatenation in the String Manipulation node is a little odd, as SQL, Excel, Google Sheets and other competitors all use some form of “concatenate” as the wording.

This would be a big undertaking and maintenance effort, but perhaps it could be crowdsourced: if you build the framework into the app, the community can create custom translations for other users to download and use. I know there is little likelihood this will get traction, but I thought I would throw it out there. It could really help expand the user base.

izaychik63 · April 15, 2021, 8:23pm

KNIME is talking with Java and ML accents. It is not an end-user tool, and recently it has even moved toward the programming-tool area rather than in the direction of end-user tools.
Ex.

DemandEngineer · April 15, 2021, 8:45pm

If the target niche is programmers… that makes sense. I’ve observed that most programmers prefer to stay in code for production and use other tools for rapid prototyping… I have seen other apps that aid programmers by giving them an interface to build up complex logic and then letting them see the code generated behind the scenes, which they can examine and optimize.

But if the niche is data scientists… the question to ask is whether their primary language is the programming language (not sure if they use consistent terms) or that of some other set of apps they use to manipulate data. I don’t know the answer to this.

There is a shortage of data scientists compared to open and anticipated roles at companies, so focusing on programmers is a smart way to tap into a pool of technical talent. Selfishly, being a citizen data scientist, I would love some attention for those of us who have a business / engineering data analyst background but have long forgotten how to code (#FORTRAN but I heard it is coming back)

jarviscampbell · July 18, 2022, 4:55pm

Hello there,

I would like to loop over several columns and rows automatically. I may have 250 columns and 7000 rows… some of the cells do have data, some don’t.

I read the post, Replace - PadLeft - KNIME Analytics Platform - KNIME Community Forum but I couldn’t tell how to loop it.

I was able to shorten the expression to padLeft(column("Col01"),12,"0"), meaning I want all cells to have 12 digits, and the ones that do not should get zeros added as a prefix.

Any help is appreciated. Thanks!
J.

Here’s what I am doing, but since I have 200+ columns, I am looking for an easier way

Daniel_Weikert · July 18, 2022, 5:24pm

Hi,
if you provide a sample file it is easier for members to help you here
br

jarviscampbell · July 18, 2022, 7:31pm

Sure, it would be something like this:

I would like to add 0s to the left of the strings that do not have 12 characters. Some do, as shown towards the left. However, the way I am doing it, I have to create a line with an output column for each column, and since I have a couple of hundred of them… it becomes hard to do manually.

ipazin · July 18, 2022, 10:01pm

Hello @jarviscampbell,

there is a String Manipulation (Multi Column) node to transform (pad) multiple columns at once.
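
In the node itself you do not write a loop; as far as I know the expression is applied to every selected column through a $$CURRENTCOLUMN$$ placeholder, but please check the node description. Conceptually, what happens is roughly the following plain-Java sketch. The table is modelled as a String[][], all names are hypothetical, and leaving missing (null) cells untouched is an assumption.

```java
class PadAllColumnsSketch {
    // Left-pads s to `width` characters; longer values are returned unchanged.
    static String padLeft(String s, int width, char padding) {
        StringBuilder sb = new StringBuilder();
        for (int i = s.length(); i < width; i++) {
            sb.append(padding);
        }
        return sb.append(s).toString();
    }

    // Applies the same padding to every cell of every column.
    // Missing cells (null) are left as they are (assumption).
    static String[][] padAll(String[][] table, int width, char padding) {
        String[][] result = new String[table.length][];
        for (int row = 0; row < table.length; row++) {
            result[row] = new String[table[row].length];
            for (int col = 0; col < table[row].length; col++) {
                String cell = table[row][col];
                result[row][col] = (cell == null) ? null : padLeft(cell, width, padding);
            }
        }
        return result;
    }
}
```

Calling padAll(table, 12, '0') on a table of strings would then correspond to padding all columns to 12 digits at once.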

Br,
Ivan

jarviscampbell · July 19, 2022, 2:19pm

Thanks @ipazin

Can you tell me what portion of the expression will tell KNIME to loop through the cells?

I’m getting the following error:

Thanks!
Regards,
J.

jarviscampbell · July 20, 2022, 1:41pm

Hi @Daniel_Weikert, would you know how I can use this repository to loop repeatedly over the columns and/or rows, not in any specific order? Thanks

jarviscampbell · July 20, 2022, 3:55pm

I think I got it. I used the workflow “Looping over Every Column”, with some modifications to fit my needs.