Easiest way to convert multiple columns to duration

Hello,

I am trying to convert multiple normal duration columns (“01:23:45”) to KNIME duration columns (“PT1H12m45s”). What is the easiest way to do it?

A solution working for one column is: split values by “:” to columns x_Arr[0] … x_Arr[2], combine them with string manipulation node to the desired format and then convert with string to duration node. I tried to make a loop based on the same workflow by using a Column list loop start node, in which I define the columns I want to convert. But I couldn’t find a solution how to tell in string manipulation node the column names. String manipulation doesn’t recognize for example: ${string($${ScurrentColumnName}$$+“_Arr[0]”)}$ or similar. I couldn’t find any reference guide for the syntax the string manipulation expression is accepting.

Any ideas?

Hello @J_C,

my usual approach when you need to do same operation on multiple columns (if there is no dedicated multi column node) is to unpivot, apply logic and then pivot data back. See here example:

Br,
Ivan

2 Likes

Hi @J_C ,

Re your thoughts on being able to do

${string($${ScurrentColumnName}$$+“_Arr[0]”)}$
or similar

You are not the first to want to be able to do something like this, but unfortunately there is no syntax for such substitution. It would be nice if there were such a facility. The String Manipulation (Multi Column) node, does however allow you to simply reference the $$CURRENTCOLUMN$$, and if you have a set of columns that you could easily select using a regex or wildcard pattern then this may also be a suitable option. I have on occasion combined using the String Manipulation (Multi Column) node with the Column List Loop, so that it is passed via a flow variable the current column to include, and then this one column can be referenced using $$CURRENTCOLUMN$$ in the expression. Much of the time that might seem overkill, and loops are to be avoided for performance reasons if possible.

In the single column String Manipulation node, it would require that the entire “expression” be passed in using a Flow Variable. So you could hand-craft your expression as a String in a String Manipulation (Variable) node, and then pass that into the String Manipulation node, but that can be quite hard work (you really have to want to do it!), and is prone to lots of trial-and-error+frustration in getting the whole syntax built correctly.

Using @ipazin’s pivot idea (which I think is brilliant) , or in some other way transposing the table so that columns temporarily become rows can often be a good solution as most of the transformation nodes are designed to work quickly across multiple rows but are not so useful with multiple columns.

4 Likes

This looks great solution for cases where you just replace single values to other single values, but the problem with my case is that I would need to compose the values from three different columns per each column to be converted.

Let’s say we have a column named “Running time” having values like “01:23:45”. As far as I know, I just can’t convert this simply to duration data type, but I need to manipulate the format for KNIME to different and, as far as I know, there is not other easy way of doing it than splitting the values to different columns like one having 01, the other having 23 and the third having 45. Then I need to concatenate them to the desired format like “PT”+ $Running time_Arr[0]$ + “H” + $Running time_Arr[1]$ + “m” + $Running time_Arr[2]$ + “s” and then I can convert the values to duration data type. Let’s then say that we also have couple of other columns called “Best lap”, “Worst lap”, “Average lap”, etc. To be able to convert them to a duration as well, I need again to make the splitting, concatenation and then conversion. One or two columns is ok to “hardcode” the columns names to the string manipulation node, but having ten or more columns it is not anymore meaningful.

I don’t understand how I could make this kind of processing with the unpivoting and pivoting approach. Of course, if there are other solutions for converting the normal duration format to KNIME duration format, I gladly hear about them.

Hi @J_C,

I think String Manipulation with a regexReplace should be able to handle it.

Take a look at this using the “@ipazin method” :wink:

Manipulating duration strings on multiple columns.knwf (19.4 KB)

Sample data:

Transformed data:

If that doesn’t fit with the way your data is held, please can you upload a sample (ensuring nothing sensitive is included) so we can come up with other ideas.

1 Like

I thought I’d also demo how it could be achieved with a loop construct, so here is an additional workflow containing both methods:

The Column List Loop Start iterates across the chosen “duration” columns. This loop generates a flow variable “currentColumnName” and this is then used by the nodes that follow.

Inside a Column List Loop, I find it much easier to use the Multi Column version of String Manipulation because then we don’t need to know the column name (which I believe is the issue you were facing)

It is configured as follows:


In the expression you can use $$CURRENTCOLUMN$$ instead of a hard-coded column name, and it will then apply the expression to whatever column is configured.

I’ve also set it to replace the selected input column.

Note that in the “include” panel on top right, you don’t include any columns and you set it to “Enforce inclusion”. This is important as the column we want to apply the expression to will be set by a flow variable:

Likewise on the String to Duration node, no columns are selected and “Enforce inclusion” is selected

Once again, the included column is specified at run time by the flow variable from the loop

And in the Column Filter, I think you will see a now-familiar pattern: :wink:

image

If we didn’t filter the columns, we would get the duration columns plus all other columns from the table repeated on each iteration.

After that, outside the loop the new duration columns are united by a joiner with their respective rows, and the joiner is configured to exclude the original duration columns.

I hope that helps :slight_smile:

Manipulating duration strings on multiple columns - unpivot or loop.knwf (43.2 KB)

6 Likes

Beauty of simplicity :slight_smile: Funny that I didn’t invent this regex by myself even if I have developer background :sweat_smile:

Thank you so much for the help! Time to time I think I am starting to learn knime way of thinking more and more :slight_smile:

You’re welcome @J_C

An alternative in String Manipulation to using regexReplace would have been to use the substring (substr) command, and pull out each of the digit pairs at positions 0, 3 and 6 of the string as follows, and have them concatenated with the additional string components “PT”, “H”, “m” and “s” as follows:

join("PT",
	substr($ColumnValues$,0,2),"H",
	substr($ColumnValues$,3,2),"m",
	substr($ColumnValues$,6,2),"s"
    )

(and thanks for marking the solution! :slight_smile: )

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.