Hi there fellow Knimers,
I am struggling to understand, what went wrong with the concatenation in a GroupBy node.
City A | street 1
City A | street 12
City A | street 123
City B | street 234
City B | street 245
City B | street 267
City C | street 345
City C | street 367
City C | street 389
GroupBy set up:
- group by city
- aggregate by street(concatenate city)
The result is similar to the following:
City A | street 1, street 12, street 123
City B | street 234, street 245, street 267
City C | street 345, street 367, street 389
So far so good. However, there are like 15 streets, where the Concatenate lists them as empty. Funnily enough, when I filter on the city with those 15 streets in a Rule-Based Row Filter I can see the streets in the Spec - Columns in a column named “Value 0” and the Concatenate street column is empty.
I’ve already tried running the same action in a different ETL program and it works fine.
My question is, if it is possible to move the values from this column “Value 0” to the concatenated column.
Hi there @nooby,
Can you share workflow example where this can be observed?
not really, it’s sensitive data from an intern school project with more than 10 million data sets.
The node seems to work with smaller data, but when it hits the millions, I get the 15 missing rows.
Hope you understand.
Hi there @nooby,
I see. I tried to reproduce it with 10 million rows, 500 clusters (cities) and 50 items (streets) but everything was ok.
Can you maybe try with dummy data using this nodes/configuration? Maybe I missed something…
Is it a matter of the size of the data if so could you filter just to the cities with the left out streets and try again to see if this is the issue.
The you could check the writing of there are some strange characters there that might influence the process although they should not.
If it is an issue of data size might want to cocider looping thru a list of cities so the concatenation process would have to deal with fewer lines at a time.
Also could you tell us wich other ETL tool you used and which function.
And then it is easier to discuss such an issue with a small KNIME workflow.
It might be that those groups are bigger than the maximum group size if you have large dataset as you describe, under the ‘Advanced settings’? Try increasing this setting to a much bigger value:
If the case is what @Vernalis thinks it is exactly what was mentioned here
Nice one Steve
thank you all for the replies!
@ipazin, I had difficulties creating matching data and caused my PC to crash, silly me. I need around 10 Million rows and round about 50k of unique values. Null values, should also occur - City with new streets, that aren’t noted, yet. For example. What do I need to enter in the Data Generator to get such data?
@mlauber71, I have similar thoughts, that the problem may lie on the memory. I’ve ran the nodes with a different memory policy, but got the same result. Furthermore, I’ve tested your approach of running some Loops. It did got quicker, but I’m still missing those damned 15 rows.
I’ve ran a similar node in Alteryx and it returned no errors or missing values.
@vernalis, good point. I’ve already done that by increasing the bounds up to 50k, since I get around 46k of unique values. However, I still get the “Value 0” column.
The needed info is snown in the “Value 0”
But not in the concatenate
Hope this brings some clarity,
Is it an accident that the misisngs are in the largest state (California). You try to make unique and (transpose) up to 50k streets into one single cell. I am not really comfortable with that.
Maybe you think about other approaches to get and store such a list (JSON). Also it might be that the missing streets may contain some strange characters, line breaks that come into conflict with KNIME structure.
Maybe you explain what you want to so and waht your data structure is so we could think again.
Can you try following the table filtered for only the missing vales in the concatenation with a
Extract Missing Value Cause node, and see if that sheds any further light on the situation?
There’s something really quite strange here - the fact that the concatenated value shows in the ‘Value_0’ column suggests it is being created correctly, so maybe there is something strange happening with the renderer
@Vernalis, I’ve cleared all whitespaces, brackets, etc. and inspected the flow with the Missing Value Cause. So far, nothing new.
However, I’ve changed the aggregation from Concatenate to Set and it works - the full data is shown! I want to have the set as a normal string, so that’s why I’ve used a Column Rename node and chanded the value from SetDataValue to StringValue. This is where it gets magical… miraculously, the values disappear, when I open the finished result after the column rename O.o
I’ve never felt more confused.
The column Rename node doesnt really do type conversions despite appearences. could you try using the
Collection Size node (from the Vernalis community contribution, latest version on the nightly build update site) to check that there are actually the number of members in the Set that you expect, and then maybe try the
Collection to String node (again from the Vernalis community contribution,but also in the stable release) and see if that works?
(yes, that’s the same ‘Steve’ as @Vernalis!)
Hi there @nooby ,
I was able to create this workflow (street_names_concat.knwf (17.0 KB)
) simulating your situation. This seems to be just a rendering problem as @Vernalis indicated earlier. On my MacBook Pro, everything looks fine. The problem seems specific to Linux. The cells are not actually empty, the concatenated string is there. You just can’t see them You can verify this by copying and pasting the content of the cell to a text editor. You can even write it to a CSV file. Nevertheless the developers will be looking in to it.
This is some next level David Copperfield trick. Is there a reason, why Knime dislikes Linux?
This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.