Building rows based on other rows

hmd_pouya · October 8, 2021, 12:53pm

Hi,

I have a table in the following format
col 1|col 2|col 3
a | b | x
b | c | x
c | d | x
e | f | y
f | g | y

Now, I want to build a new table by merging the rows with the same value in col 3 like
col 1|col 2|col 3
a | d | x
e | g | y

I appreciate any help or suggestions.

aworker · October 8, 2021, 1:30pm

Hi @hmd_pouya

Your problem is the same as the one solved in this other thread:

Your data can be considered as a table of edges of a relational graph. A relational graph is made of nodes and edges. Your letters from “a” to “g” are the nodes. Your Table rows are the edges between nodes: for instance first row says “a is connected to b”. What you want to find is the beginning and end node of a set of connected nodes, i.e. a path leading from “a” to “d”. Btw, your 3rd column is the name of the path: for instance path x is “a-b-c-d”.

The thread given above should thus be a solution to your question.

Hope this helps. Otherwise please let us know and we will be happy to help.

Best

Ael

hmd_pouya · October 8, 2021, 2:35pm

Thanks, @aworker .

I checked the other solution and I think I need something very similar. I had a similar idea to what @ipazin proposed by using cell replacer. The issue I have is that I don’t know how to take the 3rd column into account. Let me use another example here.

col 1|col 2|col 3
a | b | x
b | c | x
c | d | x
d | e | y
e| f | y

The expected outcome is the following

col 1|col 2|col 3
a | d | x
d | f | y

As you see, the third column defines the start and end of the path. Without it, there is only one path going from a to f.

aworker · October 8, 2021, 2:40pm

Hi @hmd_pouya

Interesting ! It happens that your label is already splitting your relational graph into subgraphs, so if instead of giving all the rows to the workflow that I previously suggested, you give them to the workflow by chunks, then the problem should be solved. Every chunk should only contain the rows of the same label in col 3. For instance:

1st chunk → Label x
2nd chunk → Label y
…
and so on so forth. Thus the solution here is to use an extra chunk loop to deal with separate subgraphs.

Hope this is clear enough

Otherwise just let me know.

best

Ael

hmd_pouya · October 19, 2021, 1:45pm

Thanks @aworker . Sorry I could not get back sooner. That was a great idea and worked well. I get the rows with same values in col3 in the outer loop and then go through the targeted rows with the inner loop.

I just use “Table Row to Variable” node for defining the number of loop iterations but it has become deprecated. I think the substitute for this node is “Java Edit Variable”. Could you please explain how it works? Or guide me to the right place (preferably with an example)?

Thanks again for the help.

ScottF · October 19, 2021, 2:49pm

Deprecated nodes in KNIME still work just fine - it’s just an indication that there’s a newer version of the same node that you may want to try out (maybe it has additional features, or executes more quickly).

It doesn’t mean that you should switch to an entirely different node, that’s for sure

aworker · October 19, 2021, 3:07pm

Hi @hmd_pouya

I’m glad the solution worked as required for you.

@ScottF gave you explanations on how to consider deprecated nodes, so I think the question is solved. Otherwise please let us know.

Let us know too if you need any further help on this. We will be happy to help.

Best

Ael

hmd_pouya · October 19, 2021, 3:23pm

Thank you both @ScottF @aworker . As I mentioned and you highlighted, the code works just fine with the deprecated node too.

Daniel_Weikert · October 19, 2021, 5:38pm

Just thinking couldn’t this also be solves with groupby and then use first and last as options?

aworker · October 20, 2021, 12:55pm

Hi @Daniel_Weikert

I do not think so …

Best

Ael

hmd_pouya · October 20, 2021, 1:25pm

What as elegant solution … I just checked and it worked like a charm. Thanks @Daniel_Weikert

bruno29a · October 20, 2021, 1:26pm

Hi @aworker , Can you explain why do you think the groupby suggested by @Daniel_Weikert would not work?

I had the same suggestion too, and I thought it would work, so I tested it, and it gives the expected results:

Input data (same as what you guys have):

Results after the groupby:

Groupby config:

EDIT: I’m attaching the workflow in case you want to look at it: get first and last of groupby.knwf (6.7 KB)

aworker · October 20, 2021, 1:47pm

Hi @bruno29a

It works because the relationships are between letters that are sorted:

a → b
b → c
etc …

It would not work if the relationship were between not sorted letters, for instance:

a → c
c → b
b → e
… etc.

I thought the problem was generic, as in the latest example, and not just for the case of relations between sorted letters as in the former example.

So definitely the -groupby- node solves the problem in this particular case but (i still believe) it doesn’t solve the generic one of -relations between non-sorted letters-.

In other words, my solution was more generic. But if @hmd_pouya’s problem is only for -relationships within sorted letters- then @Daniel_Weikert solution is perfect.

Hope this clarifies my previous sincere comment to @Daniel_Weikert.

Best

Ael

bruno29a · October 20, 2021, 1:53pm

Hi @aworker , right, that was the only “weakness” I could think of, but I think first and last does not rely or sorting the data alphabetically, but rather in which order the data is.

For example, look at the following input data (I swap a and c in column1):

If I run the same groupby, I get this result:

c was indeed the first record. It’s not sorting the column to detect that a was first.

aworker · October 20, 2021, 1:56pm

Still your example is again a special case. To put it simple: If in every chunk, first letter in first column is the first expected letter and last letter in the 2nd column is the 2nd expected letter, then the groupby works. The solution I provided is generic: it works regardless of the order of the letters or/and the rows.

bruno29a · October 20, 2021, 1:57pm

Hi @aworker ,

I think I missed what you were saying, and also what has been said through the thread.
To your point, you are seeing a relationship a->b->c->d, etc… I don’t think there is any relationship. I think that was more a coincidence in the sample (@hmd_pouya should confirm if there is any link/relationship there). To me, I thought @hmd_pouya only wanted to know the first occurence from col1 and last occurence from col2 for the values of col3. I never thought that the data was linked that way.

IF the data is linked to each other, then I full agree with you, that the groupby would not work, at least it would not be guaranteed to work in all cases.

aworker · October 20, 2021, 2:01pm

Thanks @bruno29a for your last email and understanding of the point. It is exactly what I thought and that’s why I suggested this generic solution for perhaps a very simple -groupby- problem.

Definitely only @hmd_pouya can tell us what it is really his need. It seems that the -groupby- fullfills his need so the simplest the best !

Still I think the generic solution will help other people to solve more generic problems if they land on this thread in future …

Best

Ael

hmd_pouya · October 20, 2021, 2:02pm

Row0, Row1 and Row2 cannot have the same value in column 3 as row 2 is not connected to Row0 or Row1 (“a” and “d” do not appear in Row0 or Row1).

hmd_pouya · October 20, 2021, 2:12pm

The problem is simple. There are some paths and the links of each path has the same key. The letters show start and end for each link. I just want to find the start and end of every path (every unique key). The links are in order.

col 1|col 2|col 3
a | b | x
b | e | x
e | d | x
e | m | y
m | c | y

In this example, I look for a->d with key “x” and e->c with key “y”. Your solution was flawless and what @Daniel_Weikert suggested was simple and right to the point needed for my use case.

Regards,
H. Pouya

aworker · October 20, 2021, 2:15pm

Thanks @hmd_pouya for clarifying your need. I was definitely biased thinking of a more complicated need. Definitely as I said, the simplest the best

Best

Ael