I am brand new to KNIME and installed it mainly for one purpose.
I would like to train a tool on certain strings such as material names, material types and features (all in string format) for which I already know the prices and for which I have set a threshold for expensive / non-expensive.
The input format is Excel, with the strings in various columns, one price column, and one column for expensive / non-expensive.
The tool should create a table with the strings for expensive items and then identify "similar" strings in new tables so that they can be marked as expensive as well.
Are there any projects with similar tasks that I could study?
Thanks in advance.
Hi there @Labancz,
welcome to KNIME Community and KNIME itself!
Not sure I totally understood your data structure. So you have one price column and one Expensive / Non Expensive label column, where the label is assigned based on a threshold you defined, right? In what format will new data be coming? Will the price come with it as well?
Maybe it would be best if you could share part of your data, or create some dummy data that represents it well enough.
I have uploaded a sample table which is priced, to "train" the processor.
A table in the same format, but without prices and with slightly different names in the first columns, would then be judged (expensive 1/0) based on the learning data.
Example data in for training.xlsx (13.2 KB)
Columns A-G contain the strings that should form a kind of pattern from which it is possible to predict whether the line is expensive or not.
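To make the idea of "similar strings get the same mark" concrete, here is a minimal Python sketch (not a KNIME workflow) of a nearest-neighbour lookup: each new row receives the label of the most similar priced row. The rows, the three string columns and the use of `difflib` for similarity are all assumptions for illustration; none of them come from the attached file.

```python
from difflib import SequenceMatcher

# Made-up training rows (NOT from the attached file):
# (material name, material type, feature) -> expensive flag (1/0).
TRAINING = [
    (("stainless steel pipe", "metal", "polished"), 1),
    (("titanium rod", "metal", "aerospace grade"), 1),
    (("pvc pipe", "plastic", "standard"), 0),
    (("cardboard sheet", "paper", "recycled"), 0),
]


def row_text(columns):
    """Join the string columns of one row into a single lowercase string."""
    return " ".join(c.strip().lower() for c in columns)


def similarity(a, b):
    """Character-based similarity between 0 and 1, as computed by difflib."""
    return SequenceMatcher(None, a, b).ratio()


def predict_expensive(columns, training=TRAINING):
    """Return the label and text of the most similar training row."""
    query = row_text(columns)
    best_columns, best_label = max(
        training, key=lambda item: similarity(query, row_text(item[0]))
    )
    return best_label, row_text(best_columns)


if __name__ == "__main__":
    # A new row with a slightly different name and no price.
    print(predict_expensive(("Stainless steel tube", "metal", "polished")))
```

A lookup like this only copies the label of one neighbour, so it is a way to see the idea rather than a recommendation for a particular model.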
What is your data science/analytics background?
This sounds like a supervised machine learning problem; you need to think about what machine learning algorithm/technique is best for your application. Any Data Science course/website/textbook will cover these.
Only then should you start on implementation using KNIME. There are lots of options that can get you where you need to go.
If you’re looking for KNIME-specific introductory materials:
Here is the KNIME Machine Learning Cheat Sheet.
Here is the KNIME introductory course chapter on predictive analytics.
Here is a series of books. The one on Practicing Data Science is probably going to be the most instructive for you.
Thanks for the data. I have checked it, and it seems to me you can try two approaches:
- a classification approach, where you predict 1/0, i.e. Expensive / Non Expensive (a decision tree, for example)
- or a numeric prediction approach, where you predict the value from which your label is determined (that would be column K divided by column J in the Excel example). After the prediction you can use the Rule Engine node to assign the label depending on the predicted value
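The second approach can be sketched in a few lines of Python: predict a number first, then apply a rule. This is only an illustration under stated assumptions. The thread does not say what columns K and J hold, so the sketch assumes a total price and a quantity. The rows, the threshold of 25 and the k-nearest-neighbour averaging are made up, and the final `assign_label` function merely stands in for what a Rule Engine step would do.

```python
from difflib import SequenceMatcher

# Made-up training rows (NOT from the Excel example):
# (joined description, total price, quantity). The ratio price / quantity
# plays the role of "column K divided by column J" (an assumption).
TRAINING = [
    ("stainless steel pipe polished", 500.0, 10),  # unit price 50.0
    ("stainless steel tube brushed", 450.0, 10),   # unit price 45.0
    ("pvc pipe standard", 20.0, 10),               # unit price 2.0
    ("pvc tube standard", 30.0, 10),               # unit price 3.0
]

THRESHOLD = 25.0  # hypothetical cut-off between expensive and non-expensive


def predict_unit_price(text, training=TRAINING, k=2):
    """Average the unit price of the k most similar training descriptions."""
    query = text.lower()
    ranked = sorted(
        training,
        key=lambda row: SequenceMatcher(None, query, row[0]).ratio(),
        reverse=True,
    )
    nearest = ranked[:k]
    return sum(total / qty for _, total, qty in nearest) / len(nearest)


def assign_label(predicted, threshold=THRESHOLD):
    """Rule step: 1 (expensive) if the prediction exceeds the threshold."""
    return 1 if predicted > threshold else 0


if __name__ == "__main__":
    for item in ("stainless steel pipe brushed", "pvc hose standard"):
        price = predict_unit_price(item)
        print(item, price, assign_label(price))
```

Compared with predicting 1/0 directly, this keeps the threshold outside the model, so the cut-off can be changed later without retraining.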
In any case, I agree with @elsamuel that it is worth learning a bit more about data science and KNIME if you are just starting out. For workflow examples, you can explore the KNIME Hub.
Thank you for your suggestions. As elsamuel correctly predicted, my background in data science is next to zero.
I will therefore brush up my background knowledge with the given links and only then step into the modeling of my problem.
If you have any questions, feel free to ask.