Strategic approach on how to get matching and differing data with a limited source in KNIME

Hello,

I have a question about a case where the amount is the only data source. This is an example of what I currently do manually:

Actual Amount data in one column:
1
2
3
4
5
6
7
-1
-2.04
-3
-4.18
-5

Then we remove the negative sign and sort ascending, which looks like this:

1
1
2
2.04
3
3
4
4.18
5
5
6
7
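As a minimal sketch of this manual step (for example inside a KNIME Python Script node, or as plain Python), the sign can be dropped with `abs()` before sorting. The list name `amounts` is hypothetical, the values are read as strings so that no decimal digit is lost, and the entry that sorts to 4.18 is assumed to be -4.18.

```python
from decimal import Decimal

# Amounts from the example above, kept as strings so no digit is lost.
# Assumption: the entry that sorts to 4.18 was meant to be -4.18.
amounts = ["1", "2", "3", "4", "5", "6", "7",
           "-1", "-2.04", "-3", "-4.18", "-5"]

# Remove the sign with abs() and sort ascending.
sorted_abs = sorted(abs(Decimal(a)) for a in amounts)

print([str(x) for x in sorted_abs])
```

Using `Decimal` instead of `float` is a design choice here: it keeps values such as 2.04 exact, which matters once amounts are compared for equality.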

So with this, we will now identify the matched, unmatched and unique.

1 - matched
1 - matched
2 - unmatched
2.04 - unmatched
3 - matched
3 - matched
4 - unmatched
4.18 - unmatched
5 - matched
5 - matched
6 - unique
7 - unique

Is this possible to incorporate in KNIME? Thank you!

Hi @trafalgarlaw , the duplicate row filter can be configured to mark rows rather than remove them, but doesn’t quite get what you want here.

What is the difference between “unmatched” and “unique” in your example?

It looks like you chose “unmatched” if there is a non-integer version of a given number eg 2 and 2.04. Is that the rule?

Also, is “matched” where there is exactly one other row with the same value, or where there are one or more rows with the same value?

Hello @takbb,

Thanks for responding…

Basically, what we are trying to achieve is this: if a positive and a negative amount add up to zero, the pair is tagged “MATCHED”; if they do not, it is tagged “UNMATCHED”; and if an amount has no corresponding partner, it is tagged “UNIQUE”, like 6 and 7.

Here is another sample data set, all in one column:

0.01
0.02
0.03
0.04
0.05
0.06
0.07
-0.01
-0.0204
-0.0305
-0.0412
-0.05

If we separate this data set into two columns, it looks like this:

| Positive | Negative | Difference | Tagging |
|---|---|---|---|
| 0.01 | -0.01 | 0 | MATCHED |
| 0.02 | -0.0204 | -0.0004 | UNMATCHED |
| 0.03 | -0.0305 | -0.0005 | UNMATCHED |
| 0.04 | -0.0412 | -0.0012 | UNMATCHED |
| 0.05 | -0.05 | 0 | MATCHED |
| 0.06 | | 0.06 | UNIQUE |
| 0.07 | | 0.07 | UNIQUE |
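One way to sketch the table above in Python is to split the single column by sign, sort both sides by absolute value, and pair them by position. Pairing by rank is an assumption that happens to fit this sample and may not generalize; the names `positives`, `negatives` and `rows` are hypothetical.

```python
from decimal import Decimal
from itertools import zip_longest

amounts = ["0.01", "0.02", "0.03", "0.04", "0.05", "0.06", "0.07",
           "-0.01", "-0.0204", "-0.0305", "-0.0412", "-0.05"]
values = [Decimal(a) for a in amounts]

# Split the single column by sign and sort both sides by absolute value.
positives = sorted(v for v in values if v > 0)
negatives = sorted((v for v in values if v < 0), key=abs)

rows = []
# Assumption: the i-th positive is compared with the i-th negative.
for pos, neg in zip_longest(positives, negatives):
    if pos is None or neg is None:
        # No partner on the other side.
        lone = pos if pos is not None else neg
        rows.append((pos, neg, lone, "UNIQUE"))
    else:
        diff = pos + neg
        rows.append((pos, neg, diff, "MATCHED" if diff == 0 else "UNMATCHED"))

for row in rows:
    print(row)
```

This sketch uses only the rule stated in this post (zero difference is MATCHED, anything else with a partner is UNMATCHED); it does not decide how far apart two amounts may be and still count as partners.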

In a nutshell, this is what we are aiming to get in our output. I am not sure whether this can be implemented in KNIME.

This workflow works with your data sample, but may not generalize well.

Hello @rfeigel, may I ask how I can download the workflow?

Right click on the title in the box and then click on the download icon.

Hello @rfeigel, already got it! Thanks for your response, but I think it will not cover every case, because all of the data are in one column. That is why, in my first post, we sort the amounts and then work out which amount is the partner, or whether an amount is possibly unique.

I’m not sure I understand. My data is all in one column in the Table Creator node.

Apologies for the confusion. I did not provide the other case, with sample data that looks like this:

82
82
82
82
82
85
87
-82
-82
-82.05
-85

Ideally, it should be like this.

| Positive | Negative | Difference | Tagging |
|---|---|---|---|
| 82.00 | -82.00 | 0.00 | MATCHED |
| 82.00 | -82.00 | 0.00 | MATCHED |
| 82.00 | -82.05 | -0.05 | UNMATCHED |
| 82.00 | | 82.00 | UNMATCHED |
| 82.00 | | 82.00 | UNMATCHED |
| 85.00 | -85.00 | 0.00 | MATCHED |
| 87.00 | | 87.00 | UNIQUE |

I would appreciate it if you could help, @rfeigel @takbb @bruno29a

Hi @trafalgarlaw , I’m still not getting it. You appear to want a different output now to the one you initially showed. You also haven’t explained to me what you consider a “number partner” to be. Maybe it’s obvious to you so could you please state exactly the requirement.

I also don’t see the purpose of getting rid of the negative signs as you did in the first example if you then need to produce a table containing negative signs in your later example. I guess that was you describing your method rather than part of the requirement.

Better to describe your input data, your required output data, and the transformation rules rather than trying to suggest an approach as it gets confusing to people about what is the requirement and what isn’t.

I can make assumptions but I’m not going to because that increases the iterations required to get to the solution. So instead I have more questions.

I have questions such as:

  1. Is -0.09 considered a partner for 0.00?

  2. Is 0.09 considered a partner for 0.00?

  3. Is 82.02 considered a partner for -82.01?

  4. Is 82.02 considered a partner for -82.03?

  5. Is 82.9 considered a partner for -82?

  6. Is 79.99 considered a partner for -80.01?

  7. Is 80.00 considered a partner for -79.99?

  8. Is 80.01 considered a partner for -79.99?

Can you explain in each case why it is or isn’t a “number partner”.
Thanks.

I’m with @takbb, i.e. thoroughly confused. I think I provided a solution for your first data set since it had an order which could be subjected to rules. The second data set has no apparent order. For example, the first three “82.00” are compared to the first three “-82.x”. The next two “82.00” have nothing to match. I’m struggling with how to write rules that differentiate such data in a single list.

Really sorry for the confusion and for not providing the complete set of complex sample data.

To answer your questions: I put the data in a table for better presentation, so please disregard my first approach of getting rid of the negative signs, to avoid more confusion. We can stick to the table presentation for better understanding.

Now, let’s go through the questions. It seems all the number pairs you’ve provided are ‘UNMATCHED’, given that the requirement for ‘MATCHED’ is no discrepancy at all between the two amounts, meaning even a .01 difference is tagged as ‘UNMATCHED’. So it is critical not to round off any decimals, to ensure the accuracy of the number matching.

Just to simplify, we may tag everything as ‘UNIQUE’ when there is no number partner, to avoid the complexity of the tagging I showed here:

| Positive | Negative | Difference | Tagging |
|---|---|---|---|
| 82.00 | | 82.00 | UNMATCHED |
| 82.00 | | 82.00 | UNMATCHED |
| 87.00 | | 87.00 | UNIQUE |

We may go like this:

| Positive | Negative | Difference | Tagging |
|---|---|---|---|
| 82.00 | | 82.00 | UNIQUE |
| 82.00 | | 82.00 | UNIQUE |
| 87.00 | | 87.00 | UNIQUE |

I am not sure if this clears up the confusion and your queries, but I am happy to provide further information if needed. Thank you so much for giving this some attention, it will really help a lot!

How many rows can an actual data set have?

Hello @rfeigel, it could be a lot, like 500, 1k or 2k rows, depending on the data.

Hi @trafalgarlaw

I still don’t have a clear confirmation of the difference between unmatched and unique. What is the cutoff or rule that determines this choice?

You said that all of the pairs in the list I gave are unmatched, so is a pair unmatched if one number is positive and the other is negative, and the absolute difference is less than 1.0 but greater than 0.0?

I believe it’s clearly defined rules like this that are needed if you are going to get a solution outside of machine learning :wink:

I agree with @takbb that clear definitions of the categories are critical. Let me reiterate that the order of the list is equally important so we can do a proper comparison of entries. That’s where I’m stuck.

Hello @takbb

I missed an important piece of information that will answer all of your questions. Two numbers are partners if their difference is at most ‘.03’. For example, 80.01 and -79.99 are partners because the difference is 0.02, and the pair is tagged ‘UNMATCHED’. If the sample amounts are -0.09 and 0.00, the difference is -0.09, so they fall under ‘UNIQUE’.

In summary:

If there is no difference, it is tagged ‘MATCHED’
If the difference is nonzero but less than or equal to .03, it is tagged ‘UNMATCHED’
Everything above a .03 difference is tagged ‘UNIQUE’
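The three rules above can be sketched as a small Python function. The names `tag_pair` and `THRESHOLD` are hypothetical, and the sketch assumes one positive and one negative amount are passed in as strings so that `Decimal` compares them without rounding.

```python
from decimal import Decimal

THRESHOLD = Decimal("0.03")  # tolerance taken from the rules above


def tag_pair(positive: str, negative: str) -> str:
    """Tag one positive/negative pair using the three rules above."""
    # Adding the two amounts gives the signed difference; abs() drops the sign.
    diff = abs(Decimal(positive) + Decimal(negative))
    if diff == 0:
        return "MATCHED"
    if diff <= THRESHOLD:
        return "UNMATCHED"
    return "UNIQUE"


print(tag_pair("80.01", "-79.99"))  # difference 0.02
print(tag_pair("0.00", "-0.09"))    # difference 0.09
print(tag_pair("82.00", "-82.00"))  # difference 0
```

This only tags a pair that has already been chosen; which positive goes with which negative is a separate question.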

Feel free to ask more questions for clarity. Thank you!

@rfeigel

This still doesn’t solve the problem of how to pair data elements.

Hello @rfeigel, what exactly do you mean?

I have questions such as:

  1. Is -0.09 considered a partner for 0.00? - Unique, as the difference is more than .03
  2. Is 0.09 considered a partner for 0.00? - Partnering is only between a positive and a negative amount
  3. Is 82.02 considered a partner for -82.01? - Unmatched, as the difference falls within the .03 threshold
  4. Is 82.02 considered a partner for -82.03? - Unmatched, as the difference falls within the .03 threshold
  5. Is 82.9 considered a partner for -82? - Unique, as the difference is more than .03
  6. Is 79.99 considered a partner for -80.01? - Unmatched, as the difference falls within the .03 threshold
  7. Is 80.00 considered a partner for -79.99? - Unmatched, as the difference falls within the .03 threshold
  8. Is 80.01 considered a partner for -79.99? - Unmatched, as the difference falls within the .03 threshold

I have also answered all the questions. I hope you can help, @takbb. Thank you!

Adding some details. @takbb

I have questions such as:

  1. Is -0.09 considered a partner for 0.00? - Not a partner; tagged as Unique, as the difference is more than .03
  2. Is 0.09 considered a partner for 0.00? - Partnering is only between a positive and a negative amount
  3. Is 82.02 considered a partner for -82.01? - Yes, considered a partner, but tagged as Unmatched, as the difference falls within the .03 threshold
  4. Is 82.02 considered a partner for -82.03? - Yes, considered a partner, but tagged as Unmatched, as the difference falls within the .03 threshold
  5. Is 82.9 considered a partner for -82? - Not a partner; tagged as Unique, as the difference is more than .03
  6. Is 79.99 considered a partner for -80.01? - Yes, considered a partner, but tagged as Unmatched, as the difference falls within the .03 threshold
  7. Is 80.00 considered a partner for -79.99? - Yes, considered a partner, but tagged as Unmatched, as the difference falls within the .03 threshold
  8. Is 80.01 considered a partner for -79.99? - Yes, considered a partner, but tagged as Unmatched, as the difference falls within the .03 threshold
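
The open question in the thread is how to pair entries that all sit in one column. The following is only a sketch under a stated assumption, not a confirmed solution: each positive amount greedily takes the closest unused negative amount whose difference is within .03, and anything left over is tagged Unique. The function name `tag_amounts` and the greedy rule are hypothetical; the 82/85/87 sample from earlier in the thread is used as input.

```python
from decimal import Decimal

THRESHOLD = Decimal("0.03")  # tolerance taken from the rules in this thread


def tag_amounts(amounts):
    """Greedy pairing sketch: returns (positive, negative, tag) tuples."""
    values = [Decimal(a) for a in amounts]
    positives = sorted(v for v in values if v > 0)
    negatives = sorted((v for v in values if v < 0), key=abs)
    used = [False] * len(negatives)
    rows = []
    for pos in positives:
        best = None  # (index of negative, absolute difference)
        for i, neg in enumerate(negatives):
            if used[i]:
                continue
            diff = abs(pos + neg)
            # Only negatives within the threshold are candidate partners.
            if diff <= THRESHOLD and (best is None or diff < best[1]):
                best = (i, diff)
        if best is None:
            rows.append((pos, None, "UNIQUE"))
        else:
            used[best[0]] = True
            tag = "MATCHED" if best[1] == 0 else "UNMATCHED"
            rows.append((pos, negatives[best[0]], tag))
    # Negatives that no positive claimed have no partner either.
    rows += [(None, n, "UNIQUE") for i, n in enumerate(negatives) if not used[i]]
    return rows


sample = ["82", "82", "82", "82", "82", "85", "87",
          "-82", "-82", "-82.05", "-85"]
for row in tag_amounts(sample):
    print(row)
```

Note that with the .03 threshold, -82.05 is no longer a partner for 82 (the difference is 0.05), so this sketch tags both as Unique rather than Unmatched as in the earlier table. The nested loop is quadratic, which should be acceptable for the 500 to 2k rows mentioned above.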