Problem while concatenating via the GroupBy node Value 0

nooby · November 11, 2019, 1:59pm

Hi there fellow Knimers,

I am struggling to understand, what went wrong with the concatenation in a GroupBy node.

GroupBy set up:

group by city
aggregate by street(concatenate city)

The result is similar to the following:
City A | street 1, street 12, street 123
City B | street 234, street 245, street 267
City C | street 345, street 367, street 389

So far so good. However, there are like 15 streets, where the Concatenate lists them as empty. Funnily enough, when I filter on the city with those 15 streets in a Rule-Based Row Filter I can see the streets in the Spec - Columns in a column named “Value 0” and the Concatenate street column is empty.

I’ve already tried running the same action in a different ETL program and it works fine.

My question is, if it is possible to move the values from this column “Value 0” to the concatenated column.

Cheers,
nooby

ipazin · November 11, 2019, 2:08pm

Hi there @nooby,

Can you share workflow example where this can be observed?

Br,
Ivan

nooby · November 11, 2019, 2:19pm

Hi @ipazin,

not really, it’s sensitive data from an intern school project with more than 10 million data sets.

The node seems to work with smaller data, but when it hits the millions, I get the 15 missing rows.

Hope you understand.

Cheers,
nooby

ipazin · November 11, 2019, 3:40pm

Hi there @nooby,

I see. I tried to reproduce it with 10 million rows, 500 clusters (cities) and 50 items (streets) but everything was ok.

Can you maybe try with dummy data using this nodes/configuration? Maybe I missed something…

BigGroupBy

Br,
Ivan

mlauber71 · November 12, 2019, 5:51am

Is it a matter of the size of the data if so could you filter just to the cities with the left out streets and try again to see if this is the issue.

The you could check the writing of there are some strange characters there that might influence the process although they should not.

If it is an issue of data size might want to cocider looping thru a list of cities so the concatenation process would have to deal with fewer lines at a time.

Also could you tell us wich other ETL tool you used and which function.

And then it is easier to discuss such an issue with a small KNIME workflow.

Vernalis · November 12, 2019, 9:54am

It might be that those groups are bigger than the maximum group size if you have large dataset as you describe, under the ‘Advanced settings’? Try increasing this setting to a much bigger value:

Steve

ipazin · November 12, 2019, 11:49am

Hi there,

If the case is what @Vernalis thinks it is exactly what was mentioned here

Nice one Steve

Br,
Ivan

nooby · November 12, 2019, 12:35pm

Hi there,

thank you all for the replies!

@ipazin, I had difficulties creating matching data and caused my PC to crash, silly me. I need around 10 Million rows and round about 50k of unique values. Null values, should also occur - City with new streets, that aren’t noted, yet. For example. What do I need to enter in the Data Generator to get such data?

@mlauber71, I have similar thoughts, that the problem may lie on the memory. I’ve ran the nodes with a different memory policy, but got the same result. Furthermore, I’ve tested your approach of running some Loops. It did got quicker, but I’m still missing those damned 15 rows.
I’ve ran a similar node in Alteryx and it returned no errors or missing values.

@vernalis, good point. I’ve already done that by increasing the bounds up to 50k, since I get around 46k of unique values. However, I still get the “Value 0” column.

The needed info is snown in the “Value 0”

But not in the concatenate

Hope this brings some clarity,
nooby

mlauber71 · November 12, 2019, 12:43pm

Is it an accident that the misisngs are in the largest state (California). You try to make unique and (transpose) up to 50k streets into one single cell. I am not really comfortable with that.

Maybe you think about other approaches to get and store such a list (JSON). Also it might be that the missing streets may contain some strange characters, line breaks that come into conflict with KNIME structure.

Maybe you explain what you want to so and waht your data structure is so we could think again.

Vernalis · November 12, 2019, 1:43pm

Can you try following the table filtered for only the missing vales in the concatenation with a Extract Missing Value Cause node, and see if that sheds any further light on the situation?

There’s something really quite strange here - the fact that the concatenated value shows in the ‘Value_0’ column suggests it is being created correctly, so maybe there is something strange happening with the renderer

Steve

nooby · November 12, 2019, 3:49pm

@Vernalis, I’ve cleared all whitespaces, brackets, etc. and inspected the flow with the Missing Value Cause. So far, nothing new.

However, I’ve changed the aggregation from Concatenate to Set and it works - the full data is shown! I want to have the set as a normal string, so that’s why I’ve used a Column Rename node and chanded the value from SetDataValue to StringValue. This is where it gets magical… miraculously, the values disappear, when I open the finished result after the column rename O.o

I’ve never felt more confused.

Cheers,
nooby

s.roughley · November 12, 2019, 7:29pm

The column Rename node doesnt really do type conversions despite appearences. could you try using the Collection Size node (from the Vernalis community contribution, latest version on the nightly build update site) to check that there are actually the number of members in the Set that you expect, and then maybe try the Collection to String node (again from the Vernalis community contribution,but also in the stable release) and see if that works?

https://hub.knime.com/Vernalis/extensions/com.vernalis.knime.feature/latest/com.vernalis.nodes.collection2string.ColToStringNodeFactory

Steve

(yes, that’s the same ‘Steve’ as @Vernalis!)

temesgen-dadi · November 14, 2019, 6:13pm

Hi there @nooby ,

I was able to create this workflow (street_names_concat.knwf (17.0 KB)
) simulating your situation. This seems to be just a rendering problem as @Vernalis indicated earlier. On my MacBook Pro, everything looks fine. The problem seems specific to Linux. The cells are not actually empty, the concatenated string is there. You just can’t see them You can verify this by copying and pasting the content of the cell to a text editor. You can even write it to a CSV file. Nevertheless the developers will be looking in to it.

Best,
Temesgen

nooby · November 15, 2019, 10:00am

Morning @temesgen-dadi,

This is some next level David Copperfield trick. Is there a reason, why Knime dislikes Linux?

Cheers,
nooby

system · May 15, 2020, 10:00pm

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.