Joining datas

ConstantD · December 10, 2020, 12:07pm

Hi everyone,

I have an issue. I have 2 file that needs to be joined.
In my case, I have a file i which I have products, their size and unit sold (sales)
On the other one, I have all the products the shop has, and its stock.
Basically, I want to join the files so to have all the products by size, their sales and stock.
(I want to have the products with sales and stock, but also the ones with no sales but have stock)

Do you know how I can do? Ive been using joiner node but it seems more complicated than that

Kathrin · December 10, 2020, 12:53pm

Hi @ConstantD,

if I understand your issue correctly, the Joiner node is the correct node
The only thing you probably need to change is the join mode.

Let’s say

Table 1 is the table with the columns products, size and units sold
Table 2 is the table with the columns products and stock
and you connect Table 1 with the first input port and Table 2 with the second input port of the Joiner node.

If you want to have all rows from Table 2 and you want to add information in case the product is available in Table 1 you would need the join mode right outer join.

Best,
Kathrin

PS: This video might be helpful to understand the different join modes: https://www.youtube.com/watch?v=6BigLM6vbhs&t=57s

ConstantD · December 10, 2020, 4:48pm

mhmm i still struggle with it.
The thing is, if i make a joiner, i also want to have the products which have no sales
no sales but still stock

mehrdad_bgh · December 10, 2020, 5:06pm

Hi @ConstantD

Can you send your workflow or data and result table that you want? It will help to understand your problem.

morpheus · December 11, 2020, 5:31am

Hi @ConstantD,

i would also use the Joiner Node as @Kathrin suggested but joining the data by switching the table order and with left outer join mode setting.
The reason is that your stock table contains all products and therefore it should be the master table. As resutl you get a table containg all products with their stocks, sizes and unit solds.
Based on this table you can now apply different Filters e.g. Rule-base Row Filter, Java Snippet Row Filter or a set of Row Filters to get your final outcome.

BR
Hermann

ConstantD · December 11, 2020, 8:12am

W49.xls (376.5 KB)
Here it is !

morpheus · December 11, 2020, 8:59am

Hi @ConstantD

i quickly checked your file.
The first issue i found is, that your STOCK-Table contains multiple entries for same reference and size.

e.g.
Reference: 013240/41
size: TU
Records found: 23
stock values: 64, 29, 1, 17, 1, 1, 24, 4, 136, 105, 82, 49, 58, 58, 67, 62, 165, 46, 225, 73, 12, 44, 38

The same happens in your sales Table
Reference: 013240/41
size: TU
Records found: 8
Sales values: 1, 1, 1, 1, 2, 1, 2, 1

The problem is, if you join this data for this example you will endup with 8x23 replicated records. This data are not further processable.

The first step i think is to aggregate the data (e.g. summarize the stock and sales) to have unique records for both tables.

A further question is what das a sales value of -1 mean?

BR

ConstantD · December 11, 2020, 9:18am

Ok I see what you mean. What we can do is to make a sum of the sales, and stock, and then, i won’t need the size actually. So make another sum of sales, based on the reference, (not the size anymore)

-1 sales is a return from a client, no importance on that

ConstantD · December 11, 2020, 9:21am

the thing i forgot to do is that i have to make a vloop up for the reference. These are wrong and some of them change depending on the size. HERE would be a joiner node, based on two criteria, size and reference right? left outer join?

morpheus · December 11, 2020, 5:16pm

Yes this are the settings for the joiner if the stock table is the ‘master table’

Selster · December 13, 2020, 7:52am

Hi:

It would be helpful if you could provide a mock table(s) showing the result(s) you would like to see(?)

I think in your tables that the products are identified by the columns called “reference”. One table has a column
that shows the # of sales, the other a column showing stock (on hand?).

It would be easy to join the two tables if they had a consistent in-common id #. But, I am seeing that there are a number of products (reference ID’s) that turn up in the left table that are not in the right table and vice versa, although you have said that this could be the case. I think needs to be addressed first, possibly?

Sure, some items appears more than once, but is the date of sale column (?) missing, possibly the duplicates are the results of sales on different days?

Sure, you could variously count things such as the number of sales. For that, a pivot table in excel will give you a quick result.

Copy of W49 no revisions.xlsx (111.2 KB)

I am sure it could be done in knime, but I would be reluctant to move forward unless more consistency between the product ID’s was in place. Does that make sense?

Even without and changes, pictures/reports of you data could still be painted, although it would be best to know what you would like to see. For instance, 015760 J
appears in the left table (with sales) but there is one 015760 J’s, size 30, in the left table; In the right table, we find: (The items appear somewhat unordered because sometimes, J or J. or J… are used)
015760 J, size 32
015760 J size 30
015760 J size 29
015760 J size 26
015760 J. size 30
015760 J. size 22
015760 J… size 29
015760 J… size 29
015760 J… size 29
I suspect that more data cleaning would reveal why size 29 appears 3 times.

Perhaps it would be useful to see which items were selling and which are not? But a lot of this falls into the the area of data cleanup and, sure, queries or joins or various other kinds of reports can help you to discover which portions of your data need cleaning up.

But using queries and the like to find out which parts of the data need to be cleaned up is quite different than doing joins to really find matches between two lists.

Thank you,
Steve Elster

system · June 13, 2021, 7:52pm

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.