I am new to knime and find this software very fascinating.
What i am after?
need to compare Month On Month for 12 Month Data sets.
For Eg Base Month July 21 against August 21 and July 21 against September 21 and so on.
Comparison for Sales by customer.
i have csv data sets for each month with around 200000 rows. (I am using Csv Reader node 12 times)
I have used Group by node on each Csv reader node ( to group each Month by customer, State etc)
Now i am not sure what to do next.
What i want is to line up sales for EACH MONTH next to July 21 customers.
In excel one can use vlookup to line up sales for each month and create a new column.
You may not have to use 12 csv reader nodes if you select “files in folders” instead of “file” (warning: your files must be standardized for this to work).
With files in folder you would stack all your 12 months vertically in one gib table. With lag column you could potentially compare your month values
If you use a loop with column append end node you have them stacked horizontally (similar to when you use joiner node for all your 12 files and one or more join criteria, e.g. customer name). Then you can calculate difference between the columns
as already pointed out by victor_palacios data sample is certainly helpful here.
br
Just trying to focus a bit more in the challenge (once I saw your example). We had a post in the forum happened last December month. The challenge was also focused in retail data and difference calculations.
There were some differences with your described challenge:
the requested granularity was based on a month by month basis (instead of a pinned reference month)
the target field was the product instead the client
the differentials were presented as quantities instead of percentages
However I think that there are few similarities with that challenge, and maybe you can get some insights from it, aiming to plan your next steps.
You can see a simple example here where I used your excel sheet to make sure my Quantity YOY matched yours with 3 nodes (I did something silly with reading in the data, but you can ignore that and just check out the 3 nodes Sorter, Lag Column, and Math Formula which will replicate what you did in Excel.)
This is a starting point and other forum members have pointed you to other threads with similar topics.
The Transpose node may also help when you read several files at once but the format can be adjusted with other nodes as well to replicate this simple example.
Hi,
The lag column gives you the lag specified. If you want a different lag from -1 then change the setting in the node. If you want to fix it to a base you can simply add this base number as a constant column and then divide by this column value.
br