How to find data loss from the sample data

Hi,

Need quick help in the scenario:sample.xlsx (8.5 KB)

I have a sample data (Attached is the file). which contains id and nmssn
id: is the unique identifier.
nmssn: it increments by +1. if the nmssn number is not in sequence that number should be recorded as the missing data/data loss.

I want to list down the id’s with its missing nmssn numbers.
for example: for the id: 000A1 43 and 44 are the missing nmssn numbers.

Have to work on the data file with 10,000 unique id’s. How to create knime workflow for this/which node works best in this?

You can use Groupby to find min and max value in the nms subgroup.
Then use this numbers to generate sn using Counter Generator in the group Loop
left joining generated sequence with sn and filtering the missed values.

2 Likes

Hi @dgambhir_1 This is annother approach:

Missing.knwf (32.3 KB)

1 Like

Thanks for the suggestion. I tried as per your recommendation.
Attached is the workflow.
But using the counter generator I am getting just the min value. how can i get all the other values. I know i might be missing something. Can you please look into it.sample.knwf (1.0 KB) Need help.

thanks @iperez for the solution. I looked into it. But when i see results there is something which seems to be incorrect. when i see in the id 000A1 has 2 missing sequence (missing nmssn) and id 000A2 is having 3 missing nmssn. But as per your results its displaying 3 missing for id 000A1 and 2 missing values for id 000A2.

Somehow your WF is empty.

@izaychik63 Try it nowsample.knwf (13.6 KB)

Hi. Two things: I posted the corrected flow, excuse me there was a -1 missing. The final table is:

imagen

The results show that for 000A1 you have two missing vlaues between 42 and 45, for 000A2 you have one missing value between 22 and 24, one between 28 and 30 and one between 33 and 35. Do you need that detail or just the total count?

Missing 2.knwf (32.5 KB)

@iperez you got it, Thanks.
Yes i need that detail too. one missing value between 22 and 24, one between 28 and 30 and one between 33 and 35 + the total count

Sorry, it looks somewhat more complex.


sample Missed values.knwf (31.8 KB)

4 Likes

@izaychik63 Thanks a lot for your help. It worked :+1:

1 Like

Hello hello!

Here is a way without loop (might notice I’m not a fan of loops :smile:)
sample Missed values_ipazin.knwf (35.4 KB)

Br,
Ivan

2 Likes

@ipazin Masterly, thanks!

1 Like

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.