Hi everyone,
unfortunately, I have quite a complex dataset that I want to work on with KNIME. Here's my problem:
One observation is not represented by a single row, because the data comes from two different sources. The variables from the first data source are repeated in as many rows as are needed to display all the data for this observation from the second source (e.g. one product with 123 sales is repeated in 10 rows because 10 components are involved).
If I use GroupBy, I drop a lot of the information coming from the second data source. If I leave the data untouched, I can't sum the sales, for example, because they are then counted multiple times. Unfortunately, I can't change the structure of the original dataset.
My idea now was to insert a missing cell for the duplicate sales values (without removing the row). Then the sum would be correct and I wouldn't lose the information in the other cells. I couldn't find any solution that does not involve removing rows, so maybe you can help me out?
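To make the idea concrete, here is a minimal sketch in Python with pandas (not KNIME nodes, though something similar might be possible in a Python Script node). The column names `product`, `sales` and `component` and all values are made up for illustration. It also assumes the product column identifies an observation, so that every row after the first one for a product counts as a repeat.

```python
import pandas as pd

# Hypothetical toy data: column names and values are invented for illustration.
# "sales" comes from the first source and is repeated once per component.
df = pd.DataFrame({
    "product": ["A", "A", "A", "B", "B"],
    "sales": [123, 123, 123, 40, 40],
    "component": ["c1", "c2", "c3", "c1", "c4"],
})

# True for every repeated row of a product, False for its first row.
is_repeat = df.duplicated(subset="product", keep="first")

# Replace the repeated sales values with a missing value; no rows are dropped.
df["sales"] = df["sales"].mask(is_repeat)

# sum() skips missing values, so each product's sales are counted once.
total_sales = df["sales"].sum()
print(df)
print(total_sales)
```

All five rows and the component column stay in place; only the repeated sales cells become missing.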
Thanks in advance!
KR Sabine