Essentials on the Building of Workflows for Statistical Analysis

Hello!

I am new here (1st post) and am in dire need to learn how to:

  • Build the right workflows to structure a machine learning model.
  • Deploy the results.

I watched some tutorials but didn’t find the answers I was looking for. I would be very grateful to anyone willing to help me!

Below is the narrative describing the uploaded Excel file.

Example for KNIME.xlsx (19.0 KB)

OBJECTIVE
We have decided to sell only one of the five products we currently sell. We therefore want to establish which product is predicted to sell the most, according to the highest POISSON probability value.

This is a small, simple example. The idea is to establish the mechanics for defining the workflows needed; the same procedure would then be followed for an example involving thousands of rows and many columns. It will also help me devise additional workflows when creating other models. Note that, while the statistical tools were actually run to obtain the values shown (there may be some differences due to rounding), the choice of tools and the results obtained are irrelevant.

THE MODEL
We will work with a 10-day observations analysis.

We will calculate the Poisson value, factoring in what is also referred to as Lambda, specifically as Poisson probability: P(X = x)

STATISTICAL TOOLS
The different POISSON values will depend on the variations of Lambda (the same as the mean). These variations will be established by the forecast and dispersion tools we will use in addition to Lambda, namely EXPONENTIAL SMOOTHING, MAD (Mean Absolute Deviation) and STDEV (Standard Deviation).

Each column of values will be followed by its corresponding POISSON value. We will arbitrarily set the POISSON random variable to 1 and calculate the Poisson probability P(X = x) in all cases, to compare values. Thus, the product we will select to sell from now on will be the one with the highest Poisson value. This highest value will appear under the SORT column.

Each statistical tool will have the corresponding POISSON value and in the last column we will then choose the product with the highest value.

AI-MACHINE LEARNING
I need to build workflows which will calculate:

  • Lambda
  • Exponential Smoothing
  • MAD-Mean Absolute Deviation
  • Standard Deviation
  • The Poisson Value of each of the mentioned tools above
  • SORT. The model will sort the Poisson values, from A to Z and the highest will correspond to the product selected to be sold from now on.

End of the procedure.

Hi Etzel,

Welcome to the KNIME forum!

If you are in general looking for example workflows about specific topics, please check the KNIME Hub: https://hub.knime.com/
If you type in a topic of interest, such as machine learning, you will get many workflows that show how to create machine learning models using different methods for different use cases. Of course, you can also search for specific methods.

With respect to deployment, this depends on the method you are using for the model and on how you want to deploy it. Are you planning to deploy it as a web application, for example? Then the KNIME Server would be helpful here.

With respect to the use case you described, I would recommend asking specific questions here, which makes it easier to point you in the right direction.

Best,
Martyna


Yes, Martyna, you are right. It was probably a stretch to ask someone to help me build the model. If I had someone sitting by my side, I would likely solve the problem in less than an hour.
I am currently trying to find an expert freelancer in KNIME who will build this very small and simple model for me and show me how to build the workflows.
Thanks

You could find a list of trusted KNIME partners on the website. I would have to think about your task and see if I can come back with an idea. In general KNIME should be well suited to do any sort of statistical calculation and data processing with various forms of data (that is what it is made for :slight_smile: )

https://www.knime.com/partners/finder


I am currently considering 4 freelancers from an unrelated freelancer board, to whom I have offered a $30 fee if they can build the very small and simple model (4 rows, 9 columns), show me how they do it and answer specific questions. They are all qualified AI/ML experts and are ready to accept my offer, but none of them is an expert in KNIME…

Hello!

I have been looking into the list of companies you sent me. The problem I see is that these companies expect big projects, and I believe they would very likely not be interested in mine. QUESTION: Is there any possibility of getting a person from the forum, or from somewhere else, to help me with my tiny project? I would much rather pay him/her the $30 I am offering to other freelancers. The problem with the latter is that, while they told me they can do it, they are not that experienced with KNIME.

Otherwise I think I will have to hire one of these guys.

Please let me know what you think, so I can move forward.

Thanks!

Would some KNIME expert in this forum help me create the AI model for a small incentive of $30?

Hi there @Etzel,

I suggest you either try doing it yourself and ask questions here when you get stuck, or try to find freelancers with KNIME expertise on a freelancing platform.

Br,
Ivan


Hi!

Reading my posts, you will notice that:

  1. What I am looking for is a KNIME expert to show me how to build the very small model I uploaded to the forum, as this procedure is light years ahead of trying to do it myself using tutorials.

  2. I have already discussed the issue with Martyna who provided valuable comments, in view of which I decided to look for freelancers to help me.

  3. As I mentioned, I have found freelancers interested in doing this for me for $30. However, the problem is that none of them is familiar with KNIME.

  4. Ideally, I envisioned that some expert in the forum could help me with this and receive the small incentive offered. As this sounded a bit unusual, I first contacted KNIME support, who gave me the green light to post it, which I did.

I hope this clarifies the issue. Furthermore, if you wish to help me, or know someone in KNIME who would, please let me know.
Thanks!

I started investigating your task but, to be honest, I have not completely understood what you want to do. And, at least according to the Excel formula, your example seems to be wrong.

I also could not immediately find the necessary formulas you want to use in Java or the Math nodes of KNIME (I would have to investigate further).

Is there any chance you could define the job in more detail, or point us towards an R package that could do the calculations, or a formula and package in Java (I did not do a full Google search), or maybe Python?

Example for KNIME.xlsx (22.0 KB)


First of all, allow me to thank you very much for taking the time to view this project. Regarding the Excel model: although the example per se is fictitious, all the statistical components were run and the results are real and correct. However, even if they weren’t, it would be entirely irrelevant.

As explained in the narrative, what I want to do is to create a ML model that runs the statistical tools and chooses the one product (out of the 5) with the highest Poisson value.

You mention Java, R and Python. I am afraid I know nothing about coding and the whole point of using KNIME was precisely the fact that no coding is necessary to create ML models. Or am I wrong?

I had an exchange with Scott Fincher (KNIME Support), who had seen the Excel file and the narrative of the model. From his comments, I got the impression that KNIME has the statistical tools for the kind of work I want to do.

Thanks again for your comments and any other you wish to make in the future.

OK I put together a workflow trying to follow your description.

For the statistical tasks I use R functions and the package purrr to manipulate all rows at once. Not sure if there might be a more elegant way. The workflow assumes there is an ID column in the first column and the following ones are named M…

The intermediate results are still there to demonstrate and check how it is done. A final workflow might bring a lot of that together in one step. This is just to show the approach.
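For readers without R, the steps can be sketched roughly in plain Python. This is not the code from the workflow, only an illustration under several assumptions: the sales figures are invented, MAD is taken as the mean absolute deviation around the mean, exponential smoothing uses a hypothetical smoothing factor `alpha = 0.3`, and each product is scored by the highest of its four Poisson values.

```python
import math
import statistics


def poisson_pmf(x, lam):
    """Poisson probability P(X = x) for a distribution with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def exp_smooth(values, alpha=0.3):
    """Simple exponential smoothing; returns the last smoothed value.

    alpha is a hypothetical choice, not taken from the thread.
    """
    smoothed = values[0]
    for v in values[1:]:
        smoothed = alpha * v + (1 - alpha) * smoothed
    return smoothed


def mean_abs_dev(values):
    """Mean absolute deviation around the mean (assumed definition of MAD)."""
    m = statistics.mean(values)
    return statistics.mean([abs(v - m) for v in values])


def poisson_scores(sales, x=1):
    """Poisson value P(X = x) for each statistic of one product."""
    stats = {
        "lambda": statistics.mean(sales),
        "exp_smoothing": exp_smooth(sales),
        "mad": mean_abs_dev(sales),
        "stdev": statistics.stdev(sales),  # sample standard deviation
    }
    return {name: poisson_pmf(x, value) for name, value in stats.items()}


def rank_products(table, x=1):
    """Sort products by their highest Poisson value, in ascending order."""
    best = {pid: max(poisson_scores(s, x).values()) for pid, s in table.items()}
    return sorted(best.items(), key=lambda item: item[1])


# Invented 10-day observations for five products (not the Excel data).
SALES = {
    "P1": [2, 3, 1, 4, 2, 3, 2, 1, 3, 2],
    "P2": [5, 6, 4, 7, 5, 6, 5, 4, 6, 5],
    "P3": [1, 0, 2, 1, 1, 0, 1, 2, 1, 1],
    "P4": [3, 3, 4, 2, 3, 5, 3, 2, 4, 3],
    "P5": [8, 7, 9, 8, 10, 7, 8, 9, 8, 7],
}

ranking = rank_products(SALES)
winner = ranking[-1][0]  # ascending sort, so the last entry is the highest
print(ranking)
print("selected product:", winner)
```

Each function corresponds to one intermediate step, so the numbers can be checked by hand or against Excel one column at a time.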

I used your instructions for calculating the Poisson value (dividing by 1.000 and using 1 as standard value). Following this, your best product would be No 4. However, I would check whether this approach really does what you want it to do.
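One possible reason the scaling matters can be seen in a small Python sketch (again an illustration only, not the R code in the workflow). It assumes that "dividing by 1.000" means dividing by one thousand, and the raw values are invented. With x = 1 the Poisson probability is lambda * exp(-lambda), which is effectively zero when lambda is in the hundreds or thousands, so unscaled values cannot be told apart.

```python
import math


def poisson_pmf(x, lam):
    """Poisson probability P(X = x) for a distribution with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


SCALE = 1000  # assumed reading of "dividing by 1.000"

# Invented raw statistics for three hypothetical products.
raw_values = {"A": 850.0, "B": 1200.0, "C": 2300.0}

# Without scaling, exp(-lam) underflows and every value collapses to 0.0.
unscaled = {k: poisson_pmf(1, v) for k, v in raw_values.items()}

# After scaling, lambda is of order 1 and the values become comparable.
scaled = {k: poisson_pmf(1, v / SCALE) for k, v in raw_values.items()}

best = max(scaled, key=scaled.get)
print(unscaled)
print(scaled)
print("highest Poisson value:", best)
```

Note that for x = 1 the value lambda * exp(-lambda) peaks at lambda = 1, so the product with the largest scaled statistic does not necessarily get the highest Poisson value. That is one thing worth checking against what the selection is meant to express.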

If this demonstrated approach entitles me to the $ 30 I would gladly point you to a local person rescuing squirrel babies. The money would be well placed there :slight_smile: :

https://www.facebook.com/Eichhörnchenhilfe-Bergisch-Gladbach-329224957473534/


Thank you for your interest in helping.

The problem is that you are building the model using computer code, something about which I know nothing and which I would be unable to evaluate, replicate or use in any way. That is the reason I was interested in KNIME in the first place, namely that no coding is needed.

I am right now about to hire a freelancer but your post worries me very much, in the sense that it may after all not be possible to use KNIME for statistical analysis leading to the creation of a ML model. It may also be the case that KNIME does not have the statistical tools to be used to build the model…otherwise you would have resorted to it instead of writing code.

I would have to discuss this with the freelancer I am about to hire. If KNIME cannot be used to build statistical ML models, there would not be any reason to hire anyone. This would be rather disappointing, but I must find out whether KNIME’s capabilities enable me to do what I need.
Thanks!

My reason for using R is that it provides me with the necessary statistical tools to do what you described in your post. I was not able to find the methods within the native KNIME nodes. It might be possible to include some external Java packages, but that again would involve coding, although most of the data processing would then be done within KNIME instead of R.

KNIME is also a platform that happily connects to other tools and does not claim to bring every solution on its own. If an external tool like R and Python offers a solution KNIME will integrate it.

I would advise that you think about whether this Poisson statistic will give you the information you need, given the other assumptions. And indeed you might want to consult someone who can check the results.

This is also why I would have liked to get real numbers that you would expect instead of a mixture between made up and real ones. It makes it much harder for a developer to get an idea if this is what you actually want. So my recommendation would be to create accurate numbers that one could check against. If these were the real numbers Excel might have a problem or the code would have to be tweaked. But you could investigate that from the version on the hub.

This is also why I left the intermediate steps in the workflow, so you can check (by hand or in Excel) whether the R code does what you expect it to do. You might want to tweak it to your needs. I wanted to demonstrate that KNIME could do your task - and hope for some nuts for the squirrels :slight_smile: :chipmunk:


We may not be on the same page; allow me to clarify.

You write: “This is also why I would have liked to get real numbers that you would expect instead of a mixture between made up and real ones”. As I mentioned a couple of times, I don’t think this is relevant, but I can tell you that the POISSON result depends on the value of the Poisson random variable and on whether you use the Poisson probability or a cumulative probability, namely P(X = x), P(X < x), P(X > x), P(X ≤ x) or P(X ≥ x). All of these produce different values. But again, it is not important. What is really important is the “mechanics” of the structure, and whether KNIME can build a ML model that runs the statistical tools, compares the results obtained and selects the ideal result on that basis. That’s all.

Regrettably, after reading your comments, I am more and more getting the impression that ML structures dealing with statistical calculations cannot be built without writing code, or at least some of it. Should this be true, then I don’t think I would be able to use KNIME for my purposes.
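To illustrate how far these five probabilities can differ for the same inputs, here is a minimal Python sketch. The mean of 2.0 and x = 1 are arbitrary values chosen for the example, not numbers from the Excel file.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability P(X = k) for a distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def poisson_cdf(x, lam):
    """Cumulative probability P(X <= x): sum of P(X = k) for k = 0..x."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))


def poisson_variants(x, lam):
    """The five probabilities named in the post, for one x and one mean."""
    p_eq = poisson_pmf(x, lam)
    p_le = poisson_cdf(x, lam)
    return {
        "P(X = x)": p_eq,
        "P(X < x)": p_le - p_eq,
        "P(X <= x)": p_le,
        "P(X > x)": 1.0 - p_le,
        "P(X >= x)": 1.0 - p_le + p_eq,
    }


# Arbitrary example inputs.
variants = poisson_variants(x=1, lam=2.0)
for name, value in variants.items():
    print(f"{name}: {value:.4f}")
```

Whichever variant is chosen, it has to be the same one for every product and every statistic, otherwise the values are not comparable.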

Thanks

You are free to check out the example and adapt the formulas to your needs. Concerning an ML workflow without code: I think it could be a challenge in itself to find a tool that offers these statistical functions without code.

In my opinion, KNIME offers the most comfort in combining code and workflows, where the code performs limited tasks that you can easily judge without scrolling through hundreds of lines. But since it is your project, you will have to see what suits your needs.

If you find something, maybe you could tell us about it so it can serve as additional inspiration for KNIME. From my point of view, I would have liked some of the statistical concepts to be available in the Math node(s) of KNIME instead of having to resort to R.


Hi there!

One of the key strengths of KNIME is to make everything possible. We do not have every statistical model implemented. Because of this, and to give everyone the possibility to code, we provide a platform that enriches GUI-based elements with extensions, integrations and scripting methodologies.

We have nodes for calculating Exponential Smoothing (Moving Average), and MAD and StdDev are possible with the GroupBy node. However, those do not give you the Poisson values or Lambda.

If the external consultant provides you with a script, you can still reuse it inside the respective KNIME nodes and combine it afterwards with GUI KNIME nodes.

@mlauber71 great workflow! Thank you for providing this. Independent of this, the squirrels and the eager human saving them definitely deserve money for some nuts, so I took care of this. In case someone else wants to join me, you need to write her a Facebook message to get her PayPal name.
:knime::chipmunk:

