Enter Missing Value for Duplicate based on condition

Hi everyone,

Unfortunately, I have quite a complex dataset I want to work on with KNIME. Here’s my problem:

One observation is not represented by a single row, because the data comes from two different sources. The variables from the first data source are copied into as many rows as are needed to display all data for that observation from the second source (e.g. one product with 123 sales is copied into 10 rows because 10 components are involved).

If I use GroupBy, I drop a lot of the information coming from the second data source. If I leave the data untouched, I can’t sum, for example, the sales, because they are counted multiple times. Unfortunately, I can’t change the structure of the original dataset.

My idea now is to enter a missing cell for the duplicate sales values (without removing the row). Then the sum would be correct and I wouldn’t lose the information in the other cells. I couldn’t find any solution that does not involve removing the rows, so maybe you can help me out?

Thanks in advance!

KR Sabine

Hi Sabine,

I’m sure there are a couple of ways to deal with your challenge, but rather than guessing at your exact structure and desired output, and therefore at the best approach, an example with data would really help :wink:
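
In the meantime, here is a minimal sketch of the logic described in the question, written in plain Python rather than as a KNIME workflow. The column names `product`, `sales` and `component` and the sample values are assumptions made up for illustration, since no example data was posted. The sketch keeps every row, leaves the sales value in the first row of each product and sets it to missing (`None`) in the repeated rows, so that a sum counts each product once.

```python
def blank_duplicate_sales(rows, key="product", value="sales"):
    """Return copies of the rows where `value` is kept only in the first
    row seen for each `key` and set to None (missing) in later rows."""
    seen = set()
    result = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows stay unchanged
        if new_row[key] in seen:
            new_row[value] = None  # duplicate: blank the value, keep the row
        else:
            seen.add(new_row[key])  # first occurrence: keep the value
        result.append(new_row)
    return result


# Hypothetical sample data: product A is repeated once per component.
rows = [
    {"product": "A", "sales": 123, "component": "c1"},
    {"product": "A", "sales": 123, "component": "c2"},
    {"product": "B", "sales": 40, "component": "c3"},
]

cleaned = blank_duplicate_sales(rows)

# Sum while skipping the missing cells.
total = sum(r["sales"] for r in cleaned if r["sales"] is not None)
print(total)
```

All three rows and their component information are still there afterwards; only the repeated sales values are blanked. This assumes that the sales value is identical in every row of the same product, which would need to be checked against the real data.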


