Data exploration

Hello,

Before joining dataset A and dataset B on columns C and D respectively,
I like to check which values are:

  • only in C
  • only in D
  • in both

Any recommendations on how to do this in an efficient and user-friendly way?

Thank you!

PS: Knime is great!

Hi there!

For values that are only in C or only in D, you can use the Reference Row Filter node. To see values that are in both, you can use the Joiner node with Inner Join as the join mode.
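
For anyone who wants to cross-check the three groups outside KNIME, here is a minimal pandas sketch of the same idea. It is not the KNIME nodes themselves, only an approximation: the sample frames `A` and `B` and their values are made up for illustration, and only the column names C and D come from the question.

```python
import pandas as pd

# Hypothetical sample data; A/B and C/D stand in for the datasets and
# join columns named in the question.
A = pd.DataFrame({"C": ["x", "x", "y", "z"]})
B = pd.DataFrame({"D": ["y", "w", "w", "z", "z"]})

# Rows of A whose key has no match in B
# (roughly a Reference Row Filter that excludes matching rows).
only_in_c = A[~A["C"].isin(B["D"])]

# Rows of B whose key has no match in A.
only_in_d = B[~B["D"].isin(A["C"])]

# Rows matched on both sides (roughly a Joiner in inner-join mode).
in_both = A.merge(B, left_on="C", right_on="D", how="inner")

print(only_in_c, only_in_d, in_both, sep="\n\n")
```

Note that the inner join repeats a key once per matching pair of rows, so its row count is not the same thing as the number of distinct keys.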

Br,
Ivan

Thank you for your quick answer.
Indeed what you suggest is correct.

Do you have an idea of how to present this information in a user-friendly way?
For instance, seeing only the unique values, with counts next to them, for each of the 3 groups?

Hi!

I don’t quite understand your question about presenting this in a user-friendly way… Maybe try explaining a bit more about what you are doing :slight_smile:

Br,
Ivan

Sure. Thank you for your help.
The context is joining two datasets, and I want to see what is lost in the join. (I currently do it manually by comparing the output of Value Counter nodes for columns C and D.)

The solution you suggested is correct, but it does not give a data analyst an overview of the answer.
The result is scattered across 3 different nodes, while it would be fantastic to have a single table
summarizing the 3 cases.
It would be even better if that table listed each distinct key value together with the associated count of records.

Bernard

Hi Bernard!

Sorry for missing this one. I see now. I have put together a workflow that might do the trick. It is a starting point, and you can modify it according to your needs. First you perform a Full Outer Join with the Joiner node; afterwards, by filtering, grouping, and combining, you can get everything into one table. Here is a picture of the workflow:

Also, here is the workflow so you can check it out:
DataExploration.knwf (28.8 KB)
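
For readers who cannot open the attachment, the full-outer-join idea can be sketched in pandas as below. This is not a transcription of the attached workflow, only one possible way to get a single summary table under some assumptions: the sample frames `A` and `B`, the output column names `key`, `count_in_A`, `count_in_B` and `group` are all made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names C and D come from the thread.
A = pd.DataFrame({"C": ["x", "x", "y", "z"]})
B = pd.DataFrame({"D": ["y", "w", "w", "z", "z"]})

# Count records per distinct key on each side
# (groupby drops missing keys by default).
count_a = A.groupby("C").size().rename("count_in_A").rename_axis("key").reset_index()
count_b = B.groupby("D").size().rename("count_in_B").rename_axis("key").reset_index()

# Full outer join of the two count tables; indicator=True adds a "_merge"
# column saying which side(s) each key came from.
summary = count_a.merge(count_b, on="key", how="outer", indicator=True)

# Translate the indicator into the three groups from the question.
labels = {"left_only": "only in C", "right_only": "only in D", "both": "in both"}
summary["group"] = summary["_merge"].astype(str).map(labels)

# A key missing on one side has a count of 0 there.
summary = summary.drop(columns="_merge").fillna({"count_in_A": 0, "count_in_B": 0})
summary[["count_in_A", "count_in_B"]] = summary[["count_in_A", "count_in_B"]].astype(int)

summary = summary.sort_values(["group", "key"]).reset_index(drop=True)
print(summary)
```

Joining the per-key counts instead of the raw rows keeps one row per distinct key, which is what the single overview table asks for; a key with a count of 0 on one side is exactly what an inner join would drop.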

Regarding data exploration, I would encourage you to check out the Statistics node and the new Data Explorer (JavaScript) node :wink: