Calculation on changing Column Names

sven-abx · December 9, 2021, 12:13pm

I’m accessing a table with all sales on every product. Now I like to generate a list of top selling articles and the changes in the last 30, 60 and 90 days.
Example:
Article 105822
Units sold (total): 16
Last month: 1
Delta 30 days: -5
Delta 60 days: -8
Delta 90 days: -14

Something like this.

BR,
Sven

ipazin · December 9, 2021, 12:34pm

Hello @sven-abx,

I see. But still don’t understand/know how your data structure looks like and column names are changed. Can you explain it a bit more with data input and output?

Br,
Ivan

sven-abx · December 9, 2021, 12:40pm

Hi @ipazin,

the column names change because they are based on month by number.
so in january the table will look like 9/10/11/12
in february 10/11/12/1 (i hope this will work)

BR,
Sven

bruno29a · December 9, 2021, 2:39pm

Hi @sven-abx , like @ipazin , I also do not understand the use case.

If you want to rely on column position, then you can use the column index via the Column Expressions as @ipazin suggested.

Basically, you would access the columns as column(0), column(1), etc…, column(n).

You can check this thread to see how it works:

morpheus · December 9, 2021, 5:27pm

Hi @sven-abx,
the math formula (multi column) node has also the option to use static and variable ($current column$) columns. With the enforce exclution option you are able to keep the node flexible.

BR

bruno29a · December 10, 2021, 5:48am

Hi @sven-abx , I put something in relation to your last comment

Since I don’t have a good sample data from you and I don’t know how you are doing your calculation, I put something together that shows how to access the columns dynamically. You can adapt it to your use case. The workflow looks like this:

My sample data is this:

Testing with an input of a January date:

I get this result:

As you can see, it’s accessing the columns from 9 to 12.

And testing with an input of February date:

It will access the columns 10/11/12/1:

So, you can add your calculation in the Column Expressions.

Here’s the workflow: Calculation on changing Column Names.knwf (35.1 KB)

kienerj · December 10, 2021, 6:48am

How do you get the data?

I’m not 100% sure but my intuition tells me it will be much easier if you have the data in “rows instead of columns” eg. duplicate aritklenr. and then a sales column and months column (=constant column names). Then you sort on artikelnr and months ascending and use 3 lag columns to calculate differences.

sven-abx · December 10, 2021, 8:10am

hi @kienerj,

thanks for your reply. first i was thinking to do it the way you describe, but then i fiured out, that it is possible no sales happend in a month and this month is then missing.

hi @bruno29a,
thanks for your reply, i will give it a try.

@ipazin maybe this helps to clearify what i like to do:
i want to calculate the changes (delta) in sales from 30, 60 and 90 days to today.
example: in november (11) 1 item of this article was sold and in october (10) 6. so sales decreased about 5 items.
the difficult thing is: the column names can change, because the workflow is executed once a month and loads the data of the previouse months. on the first of december i fetch tha data of november, october, september and august. in january december, november, october and september and so on.

BR,
Sven

ipazin · December 10, 2021, 10:24am

Helllo @sven-abx,

I do understand the logic but don’t know how your input data looks like, whats output format you want and are only column names changing or also number of columns is changing. Without that we can only guess…

Br,
Ivan

sven-abx · December 10, 2021, 10:33am

hello ivan @ipazin,

the values will stay the same. and maybe better the position of the column will change, but not the name.
in the end i like to write the data to a csv-file.

br,
sven

bruno29a · December 10, 2021, 1:41pm

Hi @kienerj , the lag option is a good idea, but the sorting is an issue, especially if you are doing end of last year + beginning of this year.

As per @sven-abx 's example, for February, you would be taking these months: 10/11/12/1. So technically speaking, it should be sorted as 10, 11, 12, and then start over with 1, 2, 3, etc… If you sort, it would actually be 1, 2, 3, …, 10, 11, 12. We don’t see the data to know if there is any value that indicates which year is the months for. If there is some sort of year, then yes, you can sort properly. If not, then the sort would not work.

mlauber71 · December 11, 2021, 5:52am

@sven-abx I think it would be best if you could provide us with an example of the changing names. It might be an artificial one that would demonstrate missing names also.

In any case there will have to be some indication, maybe information in a data row that would give you an indication of a starting or an endpoint and even then if a missing February could either be 6+ or 7+ I guess we will have a hard time coming up with a solution.

KNIME can do all sorts of logics and tricks but if there is no consistent system (how strange it might seem) and no indication it will not be possible.

You might have to talk to your data source since representing important information in such a way carries severe risks of misinterpretation.

knimediger · December 11, 2021, 7:52pm

Some time ago I created a workflow which is running on a regular base on SAP data (xls export).
After some months I had the problem that the structure of the data changed in terms of columns (same data in other cols).
I solved that by renaming the columns after reading the xls data file.
To allow a type of configuration I’m using a (xls) config file. That file defines the columns to be renamed.

Using that approach I have no need to touch the workflow itself. It is operating on the renamed columns only.

In this case I would rename like

in_xls_file | in_knime_WF
January | month
December | month-1
November | month-2

kienerj · December 13, 2021, 9:31am

You can add the year to the sort as well:

artikelnr
year
month

well if the data is missing, you will need to have checks for this either way and fix it. I don’t see here a difference between the approaches

Daniel_Weikert · December 13, 2021, 6:12pm

The moving aggregation allows to calculate absolute differences but you need to unpivot the data first.
br

gonhaddock · December 14, 2021, 11:53pm

Hi @sven-abx
Is this topic still alive? I agree with @Daniel_Weikert
I solved recently a related topic, and the solution came in Daniel’s pointed direction.

BR

sven-abx · December 16, 2021, 9:24am

hi all,

sorry for my late reply. the topic is still on my todo list @gonhaddock.
i will check your solution.

thanks and br,
sven

Francescots · December 16, 2021, 10:24am

I think you pivoted the values. You do not need it because you can use “the lag column” node.
This node picks value from a previous row and pull over the actual row.
Just take care about the order…your values should be ordered by date or by an index, hence I suggest to use a “sorter” node before lagging the column.
Once you did it use a math formula or a column expression to get delta.

I hope it will help.

gonhaddock · December 17, 2021, 11:27pm

Hi @sven-abx
I’ve anticipated a possible solution that covers all the mentioned bullets in your posts. As I interpreted, these are:

Customized reporting column names

Nulls handling

Calculate differentials on a monthly basis (or any other customised granularity)

Comments:
I’ve divided the workflow in two parts: The upper section brings some raw retail transaction data from the last past 12 months. And also a design of your desired report granularity, numbered in months to reporting date.

There’s a small intermediate section that cares about report sorting for rows and columns.

The lower section iterates over the report item: populating the missed, and calculating the differentials. The applied method to calculate differences is based in ‘Moving Aggregated’ node functions as anticipated in my previous post.

All relevant nodes are highlighted in shaded grey. The workflow ends with two reports allowing you to compare the results. If you inspect on LOREM item row, you will realize how Null Handling is achieved.

Count Items Report:

Differentials Report:

BR

system · January 20, 2022, 8:38am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.