Find Past Months Value with Criteria

Hi,

So I have a data such as this:
image

My goal is to find the past months qty value from this set of data with a criteria of ID and store name.
Here’s the overview of the expected output I wanted to achieve:

I guess this scenario need some looping but I don’t understand the logic behind it.

I also attached the sample data in Knime-based.
Sample Data.knwf (13.9 KB)

Thanks in advance.

Hi @Kemuel , you can use the Group Loop, grouping on ID and Store (and possibly Year if you have more than a year of data), and within the Loop, you can use the Lag Column node to the Past 3 and 4 Months Qty

Hi @bruno29a ,

Sorry for the late reply. Thanks a lot for your help. I’ ve tried it and it works fine.

Sorry I’d like to ask one more question, if there are some gap with the month data, for example after month 1 there’a a gap in data so the next available month will be month 6. If we use lag column, the gap between each months must be equal right.

Is there anything I can do to solve this issue? Thanks in advance

Hi @Kemuel , yes the gap would be an issue for the Lag Column since it would lag on the existing rows only.

There are a few ways to solve this issue. If you want to keep using the Lag Column, you can temporarily add the missing months with value 0. This can be done during the Loop.

Basically:

  1. Create a table with Month with values from 1 to 12.
  2. Then, inside the Loop, do a Left Join of that table with your data table joining on Month
  3. This will give you 12 Months, with some Qty missing. Those with missing Qty would be the Months that do not exist in your dataset
  4. Convert all missing Qty values to zero using the Missing Value node
  5. Apply the Lag Column as you have already and do your sum
  6. Join (Inner) back with your original dataset on Month to keep only the Months from your original dataset
  7. Close the Loop

So, Step 5 should already be part of your existing workflow. The rest you add before and after.

If you need help, please you share your workflow and give some sample data with gaps in month (preferable a mix of both with and without gaps so we can show that it works for both cases), I can modify your workflow to add the changes above.

3 Likes

Here’s the current workflow including the data with gaps in months:
Sample Data2.knwf (34.9 KB)

Thanks.

Hi @Kemuel ,

These are the nodes that I added to your workflow:

The Table Creator simply contains 12 rows, with values from 1 to 12 (each month number)
image

I added the Extract Date&Time Fields in order to be able to separate the months (and year) so that it’s the same as what you showed in your original screenshot - it’s needed to add the missing months.

So, in the Group Loop Start, I also added the Year as a criteria:

The first iteration of the Loop starts with this:
image

When I run the Left Join, I get this:
image

So we can see that we now have the rows for all 12 months.

After we run the 2 Lag Columns node, we can see the correct result for month 6 and 8 for Qty(-3) and Qty(-4):

We then run an inner join back with the original dataset that had only months 6 and 8, so the inner join would return the rows only for months 6 and 8:
image

And here’s the final results:

Here’s the workflow: Sample Data_Bruno.knwf (31.0 KB)

EDIT: Just to add a note: These take care of the gaps as you mentioned gaps. But looking at your dataset, you also have the contrary, that is you have duplicate months for the same year and same ID. So this creates the reverse issue, which was not part of the question :stuck_out_tongue:

If that is an issue, then you need to define what needs to be done in these cases. One option is to probably do a groupby first and aggregate the Qty per date.

Here’s how I’m addressing this issue by doing the aggregation. I’m replacing the sort node by a groupby, and then moving the sorter after the join:

Here’s the new result:

Here’s the updated workflow: Sample Data_Bruno.knwf (32.9 KB)

3 Likes

Hi @bruno29a ,

Thanks a lot man, I really appreciated all of your help and clear guidance on how to managed to do this.
Kudos to you :grin:

2 Likes

No problem @Kemuel , happy to help

Hi @bruno29a ,

Sorry another quick one. Even after the Year is looped, why it doesn’t read the qty value with different year?

Here I attached your workflow with the additional data.
Sample Data_Bruno Updated.knwf (67.1 KB)

Take note on the value here:
image

The value of the first “?” should have been “1”, because the past 3 months of Jan-2022 should read the value of Oct-2021.

Thanks

Hi @Kemuel , indeed since the group by is being grouped by the year also, it will not see the data from the previous year, so if your 3 or 4 months span over 2 years, they will not qualify in this type of group by.

So, given your missing dates, it would be too complicated (compared to the new solution) to have to fill up all the missing dates of every year and then do a Lag Column and then remove what replaced the missing dates.

I’m going to take a different approach and not use the Lag Column in the end, but I will salvage the same approach that I used with the Join.

In this case, the approach will be to basically get that the date of what 3 and 4 months ago of the Dates in each row:

We can then left join with the original dataset on these dates to get the Qty of each of these dates.

The new workflow looks like this:
image

Results look like this:

Comparing:

Here’s the new workflow: Sample Data_Bruno_2.knwf (19.7 KB)

2 Likes

Hi @bruno29a ,

Thanks a lot for your help. I can learn so much from this.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.