Check that item data is consistently categorised

Andy_D · June 21, 2024, 8:03am

Hi all

I have a data set comprising a couple of hundred thousand lines of picking transactions. Each of the items picked is identified by a distinct, individual SKU number and is categorised according to its characteristics - meat, fish, produce, homeware etc. There may be several hundred transactions picking the same SKU.

I have noticed that a few transactions have an anomalous category allocation - e.g. an item that is usually categorised as, say, meat comes up as fish. I’ve spotted a couple in the first couple of hundred lines, but obviously can’t go through 200k by eye.

Is there a way of comparing the SKU column with the category column and returning a count of the anomalies and/or the row numbers where the category does not match the mode for that SKU?

Cheers
A

yogesh_nawale · June 21, 2024, 8:51am

Hello @Andy_D ,
Welcome to KNIME Community.

To solve this problem you can use a combination of the GroupBy, Joiner, Rule Engine and Row Filter nodes.

I have uploaded a demo workflow for better understanding

QnA7.knwf (80.9 KB)
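For readers who think more easily in code, the node chain named above can be read roughly as follows. This is only a minimal Python sketch of the idea, not the contents of the uploaded workflow: the sample rows, the function name `find_anomalies` and the mapping of each step to a node are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical sample data: one (SKU, category) tuple per picking transaction.
rows = [
    ("1001", "meat"),
    ("1001", "meat"),
    ("1001", "fish"),      # anomalous: SKU 1001 is usually meat
    ("2002", "produce"),
    ("2002", "produce"),
    ("3003", "homeware"),
]

def find_anomalies(rows):
    """Return (row_number, sku, category, modal_category) for mismatching rows."""
    # Roughly the GroupBy step: count how often each category occurs per SKU.
    counts = defaultdict(Counter)
    for sku, category in rows:
        counts[sku][category] += 1

    # The modal (most frequent) category per SKU. On a tie, the category
    # that was seen first for that SKU wins.
    mode = {sku: c.most_common(1)[0][0] for sku, c in counts.items()}

    # Roughly Joiner + Rule Engine + Row Filter: attach the modal category
    # to every row and keep only the rows that disagree with it.
    return [
        (row_number, sku, category, mode[sku])
        for row_number, (sku, category) in enumerate(rows, start=1)
        if category != mode[sku]
    ]

anomalies = find_anomalies(rows)
print(len(anomalies), "anomalous row(s):", anomalies)
```

The result gives both things asked for in the original question: `len(anomalies)` is the count, and the first element of each tuple is the 1-based row number. A SKU whose categories are evenly split has no clear mode, so such cases would still need checking by hand.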

Regards,
Yogesh

MartinDDDD · June 21, 2024, 8:51am

Don’t think there is anything that does that job out of the box.

The way that I’d start tackling this is checking the “unique” SKU => Category pairs in your data. The simplest way to do that is to use a GroupBy node and select the SKU and Category columns as group columns. It might be worthwhile to add any column as an aggregation column under the Manual Aggregation tab with the aggregation method Count.

Then add a Sorter node and sort by the count column, ascending. That way the SKU => Category pairs with the lowest counts come out at the top, and those are the likely candidates for wrong allocation…
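The same count-and-sort idea can be sketched outside KNIME in a few lines of Python. The data and the variable names (`transactions`, `pair_counts`, `rarest_first`) are hypothetical and only stand in for the GroupBy and Sorter nodes described above.

```python
from collections import Counter

# Hypothetical sample data: one (SKU, category) tuple per picking transaction.
transactions = [
    ("1001", "meat"),
    ("1001", "meat"),
    ("1001", "meat"),
    ("1001", "fish"),
    ("2002", "produce"),
    ("2002", "produce"),
]

# GroupBy on SKU and Category with a Count aggregation:
# how many rows carry each distinct (SKU, category) pair.
pair_counts = Counter(transactions)

# Sorter, ascending by count: the rarest pairs come first.
rarest_first = sorted(pair_counts.items(), key=lambda item: item[1])

for (sku, category), n in rarest_first:
    print(sku, category, n)
```

Pairs that appear only once or twice next to a far more frequent pair for the same SKU are the ones worth inspecting first. A low count alone does not prove an error, since a rarely picked SKU will also have a low count.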

If you can get your hands on “Master Data” that maps any SKU to a valid Category that’d be even better and give more options.
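With such a master table the check becomes a plain lookup rather than a guess based on frequency. A minimal Python sketch, assuming a hypothetical `master` mapping from SKU to its valid category (all names and values here are made up for illustration):

```python
# Hypothetical master data: the one valid category for each SKU.
master = {
    "1001": "meat",
    "2002": "produce",
}

# Hypothetical picking transactions as (SKU, category) tuples.
picks = [
    ("1001", "meat"),
    ("1001", "fish"),       # disagrees with the master data
    ("2002", "produce"),
    ("9999", "homeware"),   # SKU missing from the master data
]

# Keep every row whose category differs from the master category.
# A SKU that is absent from the master data is reported with an
# expected category of None so that it is not silently skipped.
mismatches = [
    (row_number, sku, category, master.get(sku))
    for row_number, (sku, category) in enumerate(picks, start=1)
    if master.get(sku) != category
]

print(mismatches)
```

Unlike the count-based approach, this also catches a SKU that is miscategorised in every one of its transactions, which a mode comparison cannot detect.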

Andy_D · June 21, 2024, 9:14am

Thanks so much Yogesh!

I’ve downloaded your workflow, but can’t seem to open it. I can build the workflow from your diagram; did the demo contain the suggested configuration?

Andy_D · June 21, 2024, 9:15am

Thank you Martin, I really appreciate that, I’ll give it a go!

yogesh_nawale · June 21, 2024, 9:24am

Hi @Andy_D,

Are you using the latest version of KNIME Analytics Platform?
You just have to go to Space Explorer ----> click on the three vertical dots ----> Import workflow
and then open it.

Yes, the workflow contains the same configuration as the image.

Andy_D · June 21, 2024, 9:36am

Ah, I can’t see that option on the three dots.

Looks like v5 of the platform

yogesh_nawale · June 21, 2024, 9:41am

It’s here

Andy_D · June 21, 2024, 9:43am

Eh, THAT three dots

Thanks so much for your patience Yogesh - I’m relatively new to KNIME, and this is the first workflow that I’m using in the field

Andy_D · June 21, 2024, 10:00am

I’ve managed to import now - that’s pretty much working as I was hoping

Thanks so much again Yogesh!

yogesh_nawale · June 21, 2024, 10:06am

Hi @Andy_D ,

Happy to help.
If you are satisfied with the solution, you can mark the reply as the solution, because this helps people with the same or a similar question get directly to the answer.

Regards,
Yogesh

system · June 28, 2024, 10:06am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.