Solutions to “Just KNIME It!” Challenge 23 - Season 3

:sun_with_face: Happy Wednesday, folks! As usual, a new Just KNIME It! challenge has just been posted. :boom:

:chart_with_upwards_trend: Imagine that you want to investigate a few hypotheses in a population and need control data to assess your results. Often, you will have to create synthetic data to this end – which is your task this week with this statistical data puzzle! :bar_chart:

Here is the challenge. Let’s use this thread to post our solutions to it, which should be uploaded to your public KNIME Hub spaces with the tag JKISeason3-23.

:sos: Need help with tags? To add tag JKISeason3-23 to your workflow, go to the description panel in KNIME Analytics Platform, click the pencil to edit it, and you will see the option for adding tags right there. :blush: Let us know if you have any problems!

3 Likes

My solution to the challenge:

I relied heavily on the Random Number Assigner (Apache) node for this task, as it also offers normal, beta and gamma distributions. (I didn’t know about this node until now… I’m constantly amazed by how many KNIME nodes I haven’t explored yet!)

About my approach:

I generated ages using a normal distribution, heights using a beta distribution, and weights using a gamma distribution. To make the data more realistic, I added some random noise (+/- 5 * rand()) to smooth out the categorical appearance of the data and introduce more natural variation.
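For anyone who wants to try the same idea outside KNIME, here is a minimal Python sketch of the approach described above. All distribution parameters (means, shapes, ranges) are assumptions picked for illustration, not the values used in the workflow, and "+/- 5 * rand()" is interpreted here as uniform noise in [-5, 5].

```python
import random


def synthetic_person(rng):
    """Draw one synthetic person; every parameter below is an illustrative assumption."""
    age = rng.gauss(40, 15)                     # normal: mean 40, sd 15 (years)
    height = 140 + 60 * rng.betavariate(5, 5)   # beta on [0, 1], rescaled to 140..200 cm
    weight = rng.gammavariate(20, 3.5)          # gamma: shape 20, scale 3.5 (mean 70 kg)

    # Uniform noise in [-5, 5], one reading of "+/- 5 * rand()".
    def noise():
        return rng.uniform(-5, 5)

    return {
        "age": max(0.0, age + noise()),         # clamp so no age is negative
        "height_cm": height + noise(),
        "weight_kg": weight + noise(),
    }


rng = random.Random(23)                         # fixed seed for reproducibility
people = [synthetic_person(rng) for _ in range(1000)]
print(people[0])
```

Note that the three attributes are drawn independently here, so the sketch produces no correlation between age, height and weight.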

While I’m sure one could create a more scientifically accurate population model, this is my initial attempt, and I’d love to hear your thoughts! I’m especially curious if there are ways to make my workflow more precise or if anyone has ideas for improving the realism of the data :slight_smile:

5 Likes

Here’s my solution. The component has a data panel for each of the age groups; “Adult” is shown below as an example. Since the challenge didn’t specify a gender, I tried to develop height and weight data that includes both females and males.



Note: 10/18/24 Added table with age group counts.

4 Likes

My submission for the challenge - Views with Bins.



3 Likes

Here is my solution.

I used the dedicated nodes for the different distributions, and that was quite a good experience.

I think someone may want to review the Description of the Gamma Distributed Assigner:

(image of the node description)

This made it sound as if, for a peak of 70, the scaling should be something like 75 or 65. However, that led to weird results, including the node ignoring the min/max boundaries. It seems to work somewhat like the Beta Distributed Assigner (where p is ~1)…
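A possible explanation, sketched in Python: for a standard gamma distribution with shape k and scale θ, the mean is kθ and the mode (peak) is (k - 1)θ for k > 1. If the node's "scaling" is this standard scale parameter (an assumption, I have not checked the node's implementation), then a scale close to the desired peak produces values far above it. The helper name `scale_for_peak` is hypothetical.

```python
import random
import statistics


def scale_for_peak(peak, shape):
    """Scale that puts the mode of Gamma(shape, scale) at `peak`; requires shape > 1."""
    if shape <= 1:
        raise ValueError("the mode formula (shape - 1) * scale needs shape > 1")
    return peak / (shape - 1)


rng = random.Random(0)
shape = 50.0                                   # assumed shape, for illustration only
scale = scale_for_peak(70.0, shape)            # 70 / 49, roughly 1.43

# Scale derived from the peak: samples cluster around 70.
derived = [rng.gammavariate(shape, scale) for _ in range(10_000)]
# Scale set to the peak itself: theoretical mean is shape * 70 = 3500.
naive = [rng.gammavariate(shape, 70.0) for _ in range(10_000)]

print("mean with derived scale:", statistics.mean(derived))
print("mean with scale = peak: ", statistics.mean(naive))
```

Under this reading, the scale has to be much smaller than the peak unless the shape is close to 2.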

Anyways here’s what my component looks like - just a histogram that can be filtered by the “binned” age and height dimensions.

Here’s the link to my solution:

5 Likes

I agree - the Gamma Distributed Assigner doesn’t seem to work as described.

2 Likes

Hi all,
Here is my solution.

The height of children (defined as those under 18 years) was not accurately modeled due to an insufficient sample size.

3 Likes

Hello all,
Here is my solution.

As @MartinDDDD said, the “Gamma Distributed Assigner” node did not work as expected. In my case, I set min and max values for weight, but neither of them was applied.
So my workflow contains a person with a very high weight (more than 500 kg… :joy:)
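One generic way to enforce such bounds yourself is rejection sampling: draw from the gamma distribution and discard anything outside the range. Below is a minimal Python sketch; the function name `truncated_gamma`, the shape/scale values and the 40..150 kg range are all assumptions for illustration.

```python
import random


def truncated_gamma(rng, shape, scale, lo, hi, max_tries=10_000):
    """Sample Gamma(shape, scale) restricted to [lo, hi] by rejection."""
    for _ in range(max_tries):
        x = rng.gammavariate(shape, scale)
        if lo <= x <= hi:
            return x                            # first draw inside the bounds wins
    raise RuntimeError("no sample fell inside the bounds; check the parameters")


rng = random.Random(1)
# Assumed parameters: shape 20, scale 3.5 (mean 70 kg), bounds 40..150 kg.
weights = [truncated_gamma(rng, 20.0, 3.5, 40.0, 150.0) for _ in range(1000)]
print(min(weights), max(weights))
```

In a workflow, the same effect could be approximated by filtering out rows that fall outside the range after generation, at the cost of ending up with fewer rows than requested.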

3 Likes

Hi all,
This is my solution. It was very difficult for me to create data for children effectively. As a last resort, I categorized the data further based on age and height :sweat_smile:

3 Likes

Added a gender categorization. I also had trouble using the Gamma Distributed Assigner to generate the weights directly, so I just used it to generate a modifier for a weight calculation based on BMI.
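The BMI route can be sketched in Python as follows. It relies on the definition BMI = weight (kg) / height (m)², so weight = BMI × height². How the gamma-distributed modifier enters the calculation is an assumption here (a baseline BMI of 18.5 plus a right-skewed offset); the actual workflow may combine them differently.

```python
import random


def weight_from_bmi(height_cm, bmi):
    """Invert BMI = kg / m^2 to get a weight in kg from a height in cm."""
    height_m = height_cm / 100.0
    return bmi * height_m ** 2


rng = random.Random(7)
rows = []
for _ in range(5):
    height_cm = rng.gauss(170, 10)               # assumed adult height distribution
    # Assumed modifier: baseline BMI 18.5 plus a gamma-distributed offset.
    bmi = 18.5 + rng.gammavariate(2.0, 2.5)
    rows.append((round(height_cm, 1), round(weight_from_bmi(height_cm, bmi), 1)))

for height_cm, weight_kg in rows:
    print(height_cm, "cm ->", weight_kg, "kg")
```

Because weight is derived from height, this approach ties the two attributes together, which independent draws do not.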

2 Likes

Hi, here is my solution to this challenge.

I additionally used BMI for classification

2 Likes

:sun_with_face: Hi, folks! Happy Tuesday!

:bar_chart: Here’s our solution to our data generation challenge.

:eyes: It’s very interesting to notice how different parameter values change the generated data, and even their correlations, substantially. Since in this challenge’s story this data will be used as control (baseline) for scientific experiments, this gives the scientists great control.

:handshake: Thanks for your very nice contributions, some of which even included other attributes, and see you all tomorrow for a new challenge on cohort analysis!

2 Likes

Hi All,
I am absolutely throwing in the towel on this one. I could not get the distribution nodes to generate distributions that made any sense to me… Here is what my data app looked like; maybe someone will spot my errors:

Nothing fancy with the workflow itself:

In the end, I will do what I have done for almost two decades… I’ll head over and see how @jproudfoot111 solved the problem.

Thank you for the challenge!
L