Feel free to link your solution from KNIME Hub as well!
Have an idea for a challenge? We’d love to hear it! Please write it here.
And remember: the more you participate, the more participation badges you may end up getting. Fancy, huh? Just remember to correctly mark your solution in the Hub with tag justknimeit-21.
The top 10 is actually a top 12: if we take the money spent as the ranking criterion (and also consider the time spent), we have multiple #1s, #6s and #10s. For the #10s there is, as far as I can see, no further criterion that would qualify any one of the three tied #10s over the others, hence it is only “fair” to include all of them.
Lessons learned for me: this is something I would discuss with the stakeholder once I had a first solution → is there any other criterion that could limit this list to a “true” top 10?
Participants who spend the most money are NOT the same as those who spend the most time. I noticed this because #3 in money spent is only #13 in time spent.
One additional lesson learned (thank you KNIME): instead of using a String to Date&Time node, I used the CSV Reader node’s Transformation tab → one node less.
I also wanted to work with the Column Expressions node, but couldn’t find a way to calculate the time difference with the formulas it offers (that would have saved another node). Curious how the official solution will handle this.
In my opinion, the data set is difficult for a “standard” correlation analysis. Such an analysis presupposes that the individual observations are statistically independent of each other, which is not the case here, since participants appear several times in the data set. Among others, this paper deals with correlation analysis for repeated measures; a mixed-model approach seems to perform best there. However, that is a method that goes well beyond pure KNIME capabilities.
Grouping participants, as seen in many justknimeit solutions, would be one way to deal with this, but it is not optimal, as information is lost in the process.
An example: a dataset with only one participant who eats twice in total, once for 5 minutes for 10 Euro and once for 20 minutes for 20 Euro, i.e. a positive correlation. In the aggregated (averaged) analysis you would see only a single point of 12.5 minutes and 15 Euro.
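To make the point concrete, here is a minimal Python sketch (the extra participants B and C and all their values are made up for illustration; the per-participant centering is a simple repeated-measures-style approach, not the full mixed model mentioned above). It contrasts the naive pooled correlation, a within-participant correlation, and the correlation on per-participant averages:

```python
from collections import defaultdict
from statistics import mean

def pearson(xs, ys):
    # Plain Pearson correlation coefficient
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

# (participant, minutes, euros) -- participant A is the example from the post,
# B and C are hypothetical so the correlations are computable
records = [
    ("A", 5, 10), ("A", 20, 20),
    ("B", 10, 12), ("B", 30, 25),
    ("C", 8, 30), ("C", 25, 45),
]

# Naive pooled correlation: treats all rows as independent observations
pooled = pearson([t for _, t, _ in records], [m for _, _, m in records])

# Within-participant correlation: center both variables per participant
# first, then correlate the centered values
groups = defaultdict(list)
for p, t, m in records:
    groups[p].append((t, m))

ct, cm = [], []
for rows in groups.values():
    mt = mean(t for t, _ in rows)
    mm = mean(m for _, m in rows)
    for t, m in rows:
        ct.append(t - mt)
        cm.append(m - mm)
within = pearson(ct, cm)

# Aggregated analysis: participant A collapses to a single point
# (12.5 min, 15 EUR), so the within-person relationship disappears
agg_t = [mean(t for t, _ in rows) for rows in groups.values()]
agg_m = [mean(m for _, m in rows) for rows in groups.values()]
aggregated = pearson(agg_t, agg_m)
```

With this toy data, every participant spends more money the longer they eat, so the within-participant correlation is strongly positive, while the aggregated version is computed on just one point per person and can look completely different.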
What do you think? Do you have a suggestion? Or do I see the whole thing totally wrong?
I produced two outcomes for the challenge: (1) using total amounts regardless of the participation rate, and (2) using average amounts, i.e. total amounts divided by the participation rate.
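The two ranking variants can be sketched in a few lines of Python (the participant names and amounts here are made up; in KNIME this would correspond to a GroupBy followed by a ranking/sorting step):

```python
from collections import defaultdict

# Hypothetical per-meal records: (participant, euros spent)
meals = [("A", 10), ("A", 20), ("B", 40), ("C", 12), ("C", 12), ("C", 12)]

totals = defaultdict(float)   # total amount per participant
counts = defaultdict(int)     # participation rate (number of meals)
for p, euros in meals:
    totals[p] += euros
    counts[p] += 1

# Outcome 1: rank by total amount spent
by_total = sorted(totals, key=totals.get, reverse=True)

# Outcome 2: rank by average amount = total / participation rate
by_average = sorted(totals, key=lambda p: totals[p] / counts[p], reverse=True)
```

Note that the two rankings can disagree: a participant with one expensive meal ranks higher on average spend, while a frequent participant with cheap meals can rank higher on total spend.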