Convert data input level (level 1, 2, 3) into tree (1, 1.1, 1.1.1, 1.1.2...)

christoph_knime · July 6, 2024, 8:21pm

Hello KNIME community,

I have a list of input data in which every line is assigned a level. If the level value increases, the line sits one level further down in the tree.

The order of the data is extremely important, as it defines, for example, which level 4 entry a given level 5 item belongs to. As this is extremely volatile and cannot be handled well with ordinary data operations, I want to convert this information into an unambiguous tree value.

So I start by giving every level 1 line its own number (1, 2, 3…); then every level 2 line gets the corresponding level 1 value plus its own level 2 value (1.1, 1.2… 2.1, 2.2…).

While the values only go one step at a time into higher levels (so if line 10 is level 4, line 11 can only increase to level 5, not to level 6 or higher), a decrease can skip multiple levels (so if line 14 is level 6, line 15 can be level 4). At the moment the levels go from 1 to 17, but this could change in the future, so ideally the solution would not depend on a hard-coded number of levels.

I have a few ideas that this should be possible with a for loop, but I am not getting there. Could you help me, please?

To make it 100% clear, I defined a table of example data - “Line” and “Level” as input - together with “Treeposition” as the desired Output.

Line (Input)	Level (Input)	Treeposition (Output)
1	1	1
2	2	1.1
3	3	1.1.1
4	3	1.1.2
5	4	1.1.2.1
6	3	1.1.3
7	3	1.1.4
8	2	1.2
9	3	1.2.1
10	4	1.2.1.1
11	5	1.2.1.1.1
12	5	1.2.1.1.2
13	5	1.2.1.1.3
14	6	1.2.1.1.3.1
15	4	1.2.1.2
16	2	1.3
17	2	1.4
18	3	1.4.1
19	4	1.4.1.1
20	1	2
21	2	2.1
22	3	2.1.1

Thank you very much in advance.
Christoph

knimediger · July 7, 2024, 12:58pm

@christoph_knime

First of all: Welcome to the world of KNIME.

From my point of view, I would simply compare the level column of each line with that of the previous line. Depending on the result, you know whether to increase the number at the current level or add the next level(s).

The important node for this task is the Lag Column node (see its NodePit page).
This node copies the content of the previous line into your current line. Now it’s quite simple to compare the two columns and act accordingly.
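
A rough sketch of that comparison in plain JavaScript, outside KNIME, may help show the cases. It assumes levels start at 1 and never rise by more than one step per row. The names `treePositions`, `counters` and `previousLevel` are illustrative only, not KNIME functions; `previousLevel` plays the role of the value the Lag Column node would supply.

```javascript
// Hypothetical helper: derive tree positions by comparing each level
// with the level of the previous row (the value a lagged column would hold).
function treePositions(levels) {
  const counters = [];   // counters[i] = current number at depth i + 1
  const result = [];
  let previousLevel = 0; // no previous row yet

  for (const level of levels) {
    if (level > previousLevel) {
      // One step deeper: open a new branch starting at 1.
      // Assumption: the level rises by exactly 1.
      counters.push(1);
    } else {
      // Same level or shallower: drop the deeper counters (none if the
      // level is unchanged), then move to the next sibling at this level.
      counters.length = level;
      counters[level - 1] += 1;
    }
    result.push(counters.join("."));
    previousLevel = level;
  }
  return result;
}

console.log(treePositions([1, 2, 3, 3, 4, 3, 2, 1]));
```

A drop of several levels needs no extra handling in this sketch: truncating the counter list to the new level discards everything deeper, however many levels that is.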

HTH

takbb · July 7, 2024, 3:06pm

Hi @christoph_knime , welcome to the KNIME community

If you are happy to write some code, the Column Expressions node (in the KNIME Expressions extension) can derive this by cumulatively calculating the “tree”.

var treeOutput  // variable for cumulative tree calculation

// define the column representing the current Level
currentLevel=column("Level (Input)")

/* define some support functions */
function getTreeToLevel(tree,level)
{
    if (tree==null || tree=="")
    {
        return "0"
    }
    
    /* evaluate the current tree up to the Nth period */        
    n=level
    l=length(tree)
    outTree=""
    for(var i=0;i<l && n>0 ;i++)
    {
        c=tree[i]
        if (c=="." ){n--;}
        
        if (n>0)  outTree=outTree+ tree[i]
    
    }
    if (n>1)
    {
        // add a new level as .0 (this will be incremented by caller)
        outTree=outTree+".0"
    }
        
    return string(outTree) // make sure the output is a string or problems can occur
}

function getTreeLevelCount(tree)
{
    return toInt(countChars(tree,".")+1)
}

function incrementTree(tree)
{
    // use local variables so the global currentLevel (the input column) is not overwritten
    var levelCount = getTreeLevelCount(tree)
    
    var lastNumber =  substr(tree,lastIndexOfChar(tree,'.')+1 )
    var parentLevel = levelCount - 1
    if (parentLevel==0)
    {
        return toInt(toInt(tree)+1)
    }

    lastNumber ++
    
    return getTreeToLevel(tree,parentLevel)+"."+toInt(lastNumber)
}

// special case for initial "unknown" tree
if (treeOutput == null)
{
    treeOutput="1"
}
else
{
    x=getTreeToLevel(treeOutput, currentLevel)
    treeOutput=string(incrementTree(x))
}

treeOutput // explicitly output the current value of variable

I’ve uploaded a demo workflow using your sample data here:

takbb · July 7, 2024, 4:08pm

An alternative approach would be to use a recursive loop and to calculate each row using the values calculated for the previous row. This can become a little involved, as you need to create a copy of the “tree” value from the previous row.

Last year I built a set of components, which I called the “Cumulative Framework Components”, that can be used as a template for this kind of processing.

These components automatically provide you with the output for the previous row within a recursive loop. The following workflow uses them to solve the problem you have posed without Column Expressions or Java Snippet, which are the two scripting nodes that can perform cumulative calculations.

For more info on my cumulative framework components, see here:

Background notes: In both the Column Expressions workflow and the Cumulative Framework workflow, the same set of rules for deriving the tree applies:

Rules:

rule	description
1(a)	If there is no previous row, set the “current working tree” to “0”
1(b)	otherwise set the “current working tree” to the tree from the previous row
	(the “current working tree” is the tree value that we are working on for the current row)
2	Taking the level value as N, truncate the “current working tree” to its first N blocks (blocks are separated by “.”; treat the final block as if it were followed by an imaginary “.”). Remove surplus numeric blocks (if any)
3	If the current working tree has fewer blocks than required for the current level (N), append “.0” to it.
4	Increment the final numeric block in current working tree by 1. This becomes the new tree for this row.

In this way given the levels:

level	becomes	according to rule	transformations
1	1	1(a),2,4	nothing → 0 → 0 → 1
2	1.1	1(b),2,3,4	1 → 1 → 1.0 → 1.1
2	1.2	1(b),2,4	1.1 → 1.1 → 1.2
2	1.3	1(b),2,4	1.2 → 1.2 → 1.3
3	1.3.1	1(b),2,3,4	1.3 → 1.3 → 1.3.0 → 1.3.1
4	1.3.1.1	1(b),2,3,4	1.3.1 → 1.3.1 → 1.3.1.0 → 1.3.1.1
2	1.4	1(b),2,4	1.3.1.1 → 1.3 → 1.4
1	2	1(b),2,4	1.4 → 1 → 2
2	2.1	1(b),2,3,4	2 → 2 → 2.0 → 2.1

and so on.
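
For readers who want to try the rules outside KNIME, here is a minimal sketch in plain JavaScript. It is not the code from either workflow: `nextTree` is a hypothetical helper that works on an array of blocks rather than scanning characters, and it assumes the level never rises by more than one step at a time.

```javascript
// Hypothetical helper implementing rules 1-4 on a dotted tree string.
// previousTree is null for the first row; level is the current row's level.
function nextTree(previousTree, level) {
  // Rule 1: start from "0" or from the previous row's tree.
  let blocks = (previousTree === null ? "0" : previousTree).split(".");
  // Rule 2: keep at most `level` blocks, dropping any surplus.
  blocks = blocks.slice(0, level);
  // Rule 3: if a block is missing, append "0" as the new level.
  // Assumption: at most one block is missing.
  if (blocks.length < level) {
    blocks.push("0");
  }
  // Rule 4: increment the final block.
  const last = blocks.length - 1;
  blocks[last] = String(Number(blocks[last]) + 1);
  return blocks.join(".");
}

// Replay the level sequence from the table above.
let tree = null;
const trees = [];
for (const level of [1, 2, 2, 2, 3, 4, 2, 1, 2]) {
  tree = nextTree(tree, level);
  trees.push(tree);
}
console.log(trees.join(" | "));
```

Because rule 3 appends “0” and rule 4 always increments, the function contains no separate case for “new level”, “same level” or “back up”.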

christoph_knime · July 8, 2024, 7:29am

Thanks for the welcome! I already love this community

This seems like the easiest approach, but I have one question. When the level goes from 3 to 4 it is easy, as just one tree level is added; it is also easy if the level stays the same, as only the last number in the tree has to be increased. But if it goes from 6 to 2, the change depends on how far the level falls. So how would you go about this?

Nevertheless, while I am 99% sure that the data will continue to go up only one level at a time in the future, I do not have any guarantee of that. As the other solutions have no fallback for that case, I will use this node to check the data quality (a level jump would be an error, and the data would have to be checked). For this, your suggested node is perfect.
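
The kind of check described here can be sketched in plain JavaScript as follows. This is only an illustration of the logic, not a KNIME node configuration: `findLevelJumps` is a hypothetical name, and the sketch assumes the first row should be level 1.

```javascript
// Hypothetical check: list the rows whose level rises by more than one
// step compared with the previous row. Rows are numbered from 1.
function findLevelJumps(levels) {
  const jumps = [];
  let previousLevel = 0; // so a first row above level 1 is also reported
  levels.forEach((level, index) => {
    if (level - previousLevel > 1) {
      jumps.push({ line: index + 1, from: previousLevel, to: level });
    }
    previousLevel = level;
  });
  return jumps;
}

console.log(findLevelJumps([1, 2, 4, 3, 1, 3]));
```

For this input the sketch reports lines 3 (level 2 to 4) and 6 (level 1 to 3). Decreases of any size are not reported, since they are valid.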

Thank you very much.

christoph_knime · July 8, 2024, 7:56am

Hi @takbb,
thank you very much - the Column Expressions node with the code works perfectly!

Normally I am hesitant to use code, as I am not all too familiar with coding. But I know enough to understand (at least 90% of) what is happening, and I think it is a very clean solution for my problem that integrates nicely with my current workflow. So I will for sure give this a try!

I have to look into the workflow implementation. It looks intriguing, but I have to get deeper into the provided workflow to understand it and to be able to incorporate it. The explanation is quite clear; I just have to map it to the workflow.

One question I have is why you decided to always start a new branch with 0 and then increase it to 1, instead of going directly to 1.

takbb · July 9, 2024, 6:18am

Hi @christoph_knime , I’m glad you found it to be useful.

The reason for initially setting the branch to zero was so that the remainder of the code always works the same way, without having to add further conditions or logic.

If you look at the set of descriptive “rules” (that I wrote in my second post above), the final step (4) is to increment the final number in the tree. So regardless of whether it adds a new level, or stays at the current level, or has returned to a higher level, when it gets to this point it increments the final number value.

So when adding a new level, the earlier code puts .0 as the new level in the knowledge that the final rule will then increment it, so it becomes .1. In that way this part of the process doesn’t need to know whether the level has changed as it works the same way each time.

I hope that makes sense but if not I’ll try to explain further.
