Some rows in my data extract share the same UCN, but not all duplicates have to be removed. Removal depends on 3 rules, so a unique function cannot be used.

| UCN | Attribute |
|---|---|
| 12345678 | A |
| 12345678 | B |
| 12345678 | C |
| 12345678 | D |
| 12345678 | E |
| 12345678 | F |
| 22222222 | D |
| 33333333 | E |
| 44444444 | F |

I want a Java snippet that extracts the UCNs that meet the following conditions:
If the same UCN has attributes A and C, the row with attribute A is selected.
If the same UCN has attributes B and D, the row with attribute B is selected.
If the same UCN has attributes C and E, the row with attribute C is selected.
Otherwise, if the UCN is not duplicated, the row is selected as is.
So a row with attribute C is selected only when its UCN does not also appear with another attribute; the same goes for attributes D and E.

For the example above, the output will be:
12345678 A
12345678 B
12345678 C
22222222 D
33333333 E
44444444 F

Could this be done with a Java Snippet? As far as I can tell, there is no function like “IN” in KNIME.

@AlexanderFillbrunn can you help?

@AnotherFraudUser @Marlin Can you guys help with some java snippet code for this?

Hi @hansa,
Is it correct that, for every row, you basically look at the row 2 steps ahead? In that case, you can sort your table in reverse so that F is first and A is last (see the Sorter node). Then you can use the Lag Column node with a lag step of 2 and a count of 1 to get the row 2 steps behind. Then you can use the Rule Engine to determine what to output, and finally you can sort the table back to the original order. Does that make sense?
Kind regards,
Alexander

I am sorry, I can’t understand what you said.
I think you didn’t understand the question!
I need all the ucn in the given data set, only when there is a comparison between 2 similar types of ucn with attribute a and f I will keep the row with ucn with attribute a
and similarly, if the same UCN has attributes b and e, I will keep the row with attribute b; likewise, if the same UCN has attributes c and f, I will keep only the row with attribute c. But I will not completely ignore UCNs with attributes d, e, f! They will be in the output table if their UCNs are not duplicated with the UCNs of the attributes above (a, b, c), in that order.
The condition applies only when the UCN is the same; otherwise all UCNs are extracted.
With an “IN” function this could be achieved easily, e.g.: If UCN IN Attribute(A) = UCN IN Attribute(C) then select UCN with Attribute(A).

Is there a similar function in KNIME, or a Java Snippet that might help?
I tried the Column Expressions node, but it does not meet the exact criteria, and pivoting didn’t help much either.

Hi,
I don’t get it. A, B, and C also have the same UCN, so why are they all included in the output? Is there a difference between the groups A, B, C and D, E, F? You also write

I need all the ucn in the given data set, only when there is a comparison between 2 similar types of ucn with attribute a and f I will keep the row with ucn with attribute a

but below in your example with IN, you compare A and C. In your example in your first post, A and C are still both there, though.

Or can it be formulated like this: “Every attribute must occur in the output exactly once. If there are multiple occurrences, keep the one that has no duplicate UCN.”? I have added a workflow that outputs what you need based on your example, but I am not sure if this is the logic you want.
Alexander

UCNs.knwf (10.9 KB)

I am sorry!
I made a mistake in the first example. The conditions are:

  1. A UCN with attribute A should not also have attribute D
  2. A UCN with attribute B should not also have attribute E
  3. A UCN with attribute C should not also have attribute F

Apart from the conditions above, duplication (of UCNs) is allowed. For example, if the same UCN has attributes A and B, both rows should be kept; the comparison is between A and D, not A and B, in that case.
I hope this is clearer now.
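
To make the three conditions concrete, here is a minimal standalone Java sketch. It is not tied to any KNIME API: the class name `UcnFilter`, the `filter` method, and the representation of a row as a two-element `{ucn, attribute}` array are all assumptions made for illustration. It also assumes that when both attributes of a pair are present, the A/B/C row is kept and the D/E/F row is dropped, as in the example output of the first post.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class UcnFilter {

    // Hypothetical rule table: a row whose attribute is the key is dropped
    // when the same UCN also has the attribute given as the value.
    private static final Map<String, String> SUPPRESSED_BY = new HashMap<>();
    static {
        SUPPRESSED_BY.put("D", "A"); // rule 1: A wins over D
        SUPPRESSED_BY.put("E", "B"); // rule 2: B wins over E
        SUPPRESSED_BY.put("F", "C"); // rule 3: C wins over F
    }

    /** Each row is assumed to be a two-element array: {ucn, attribute}. */
    static List<String[]> filter(List<String[]> rows) {
        // Pass 1: collect the set of attributes seen for every UCN.
        Map<String, Set<String>> attributesByUcn = new HashMap<>();
        for (String[] row : rows) {
            attributesByUcn.computeIfAbsent(row[0], k -> new HashSet<>()).add(row[1]);
        }

        // Pass 2: keep a row unless its "winning" partner exists for the same UCN.
        List<String[]> kept = new ArrayList<>();
        for (String[] row : rows) {
            String winner = SUPPRESSED_BY.get(row[1]);
            boolean dropped = winner != null && attributesByUcn.get(row[0]).contains(winner);
            if (!dropped) {
                kept.add(row);
            }
        }
        return kept;
    }
}
```

Calling `UcnFilter.filter(rows)` on the sample table from the first post keeps A, B and C for 12345678 and the single rows for 22222222, 33333333 and 44444444, in their original order. The set lookup in the second pass plays the role of the “IN” check, and the logic needs two passes because a row can only be judged once all attributes of its UCN are known.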

I am attaching my workflow, in which I tried a Java Snippet for this condition.
testucn.knwf (10.4 KB)

@AlexanderFillbrunn
Thank you for your time and effort!
I have posted my workflow and I hope I am more clear now!

Does this look like the output you were targeting?

testucn.knwf (69.0 KB)

testucn.knwf (72.4 KB)

I added a sort before the GroupBy node to make sure that the Attributes always fall in alphabetical order for the later formulas. It wasn’t necessary for your sample data, but it might be for your real-life use case.
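
As a rough illustration of why the ordering matters, here is a small standalone Java sketch of a "sort, then group" step. It assumes the grouping concatenates the attributes of each UCN into one string, which is only a guess at what the attached workflow does; the class name `UcnGrouping`, the method `concatSorted`, and the `{ucn, attribute}` row layout are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

final class UcnGrouping {

    /**
     * Groups rows ({ucn, attribute}) by UCN and joins the attributes of each
     * UCN in alphabetical order, regardless of the input row order.
     */
    static Map<String, String> concatSorted(List<String[]> rows) {
        // A TreeSet keeps the attributes of each UCN sorted alphabetically.
        Map<String, TreeSet<String>> grouped = new TreeMap<>();
        for (String[] row : rows) {
            grouped.computeIfAbsent(row[0], k -> new TreeSet<>()).add(row[1]);
        }

        // Join each sorted set into a single comma-separated string.
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, TreeSet<String>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), String.join(",", entry.getValue()));
        }
        return result;
    }
}
```

With the attributes sorted, a UCN that has A and D always yields a string in which A comes before D, so any later rule that inspects the concatenated value does not depend on the row order of the input.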

Does this mean that you would be able to solve this without a Java Snippet if you found “IN” in KNIME?

Here it is:

I’m just not sure how you would use it though. Unless your “IN” is something different? (The one I showed you is quite commonly used as it’s explained in the description. Most DB systems use IN like this also).
