Data Analytics for my data

Z1745566 · February 19, 2016, 8:42am

Hello,

I am very new to analytics and KNIME, so please forgive me if I am asking basic questions.

I have data with 9 parameters from some vehicles [VIN, timestamp, latitude, longitude, speed, etc.] and over 20 million rows. (The VIN is a unique number for each vehicle.)

My requirement is to build a model from these 20 million rows so that, when I give the speed, latitude and longitude of a new vehicle, it predicts the other parameters and which vehicle the data belongs to, or at least reports the similarities with the vehicle data we already have.

Please help me implement this in KNIME. I appreciate your help.

Thank you

Gabriel_Cornejo · February 20, 2016, 3:24am

Hello Z1745566:

There are several methods you can apply to your data to achieve your goal. It sounds like you need a predictive model: for example, you can use linear regression, k-nearest neighbors (kNN), M5P (Weka extension), or even an artificial neural network.

Good luck.

Gabriel.

mauuuuu5 · February 21, 2016, 12:28am

Hi, you can also check the KNIME video called "Building a basic Model for Churn Prediction with KNIME". But what exactly do you want to predict? In other words, what are your independent and dependent variables?

Cheers

Z1745566 · February 22, 2016, 3:21am

Hello Gabriel,

Thanks for your help & suggestion.

I am completely new to KNIME. If possible, could you upload an example that does the same, so that I can try to follow it? I really appreciate your help.

Z1745566

Z1745566 · February 22, 2016, 3:35am

Hello mauuuuu5,

Thanks for your reply. I have seen that video, but it only predicts yes or no. Is there a way to predict multiple parameters at a time?

In the churn prediction video, the presenter gave only one row to predict at deployment, but in my case I will give the data of a vehicle as multiple rows of (speed, latitude and longitude). From this new data set, my model should predict which vehicle (VIN) in my old data set it is most similar to.

For my work,

1) I need to predict all the other parameters when I give only the speed, latitude and longitude of a new vehicle.

2) I need to find out similarities between the new data set and old data set.

mauuuuu5 · February 22, 2016, 4:50am

Hi, I have experience with decision trees, and they predict just one categorical variable, such as A, B, C or D. But you can "build" that categorical variable from a quantitative one by using the Auto-Binner node, which "allows to group numeric data in intervals - called bins". For instance, you take a quantitative variable such as the market value of the vehicle and divide it into 4 or 5 categories such as:

A: 0 USD to 15,000 USD

B: 15,001 USD to 30,000 USD

C: 30,001 USD to 45,000 USD

D: 45,001 USD to 60,000 USD

and use the independent variables (speed, latitude and longitude) to predict which bin (A, B, C or D) the car will fall into. In other words, you can predict the categorical market value of the vehicle from the independent variables.
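For readers who want to see the binning idea outside KNIME, here is a minimal Python sketch. It is not what the Auto-Binner node does internally; it only assigns a market value to one of the four example bins above. The helper name `market_value_bin` and the sample values are made up for illustration.

```python
from bisect import bisect_left

# Upper edges (in USD) of bins A-D, taken from the example bins above.
UPPER_EDGES = [15_000, 30_000, 45_000, 60_000]
LABELS = ["A", "B", "C", "D"]


def market_value_bin(value_usd):
    """Return the bin label for a market value (hypothetical helper)."""
    if value_usd < 0 or value_usd > UPPER_EDGES[-1]:
        raise ValueError(f"value {value_usd} is outside the 0 to 60,000 USD range")
    # bisect_left returns the index of the first upper edge >= value_usd,
    # so 15,000 lands in A and 15,001 lands in B.
    return LABELS[bisect_left(UPPER_EDGES, value_usd)]


# Made-up sample values, only to show the call.
values = [0, 15_000, 15_001, 44_999.5, 60_000]
print([market_value_bin(v) for v in values])
```

The resulting label column would then be the single categorical target that a decision tree learns to predict from speed, latitude and longitude.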

If you want to see how a decision tree works, please google "r2 d3 A Visual Introduction to Machine Learning".

Regarding your second point, maybe you can check whether there is a correlation between the variables, for instance old speed vs. new speed or old latitude vs. new latitude. You can use the Linear Correlation node or the HeatMap (JFreeChart) node; keep in mind which variables are numeric and which are nominal.

I recommend you watch the KNIME video "Strategies for building Predictive Models", as it uses a data mining methodology called CRISP-DM.
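As a rough illustration of what a linear correlation between two numeric columns measures, here is a small Python sketch of Pearson's correlation coefficient. The function name `pearson` and the speed values are invented for the example, and the two columns are assumed to be paired row by row, which real old and new data sets may not be.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / sqrt(var_x * var_y)


# Made-up paired samples, only to show the call.
old_speed = [50, 55, 60, 65, 70]
new_speed = [52, 54, 63, 64, 71]
print(round(pearson(old_speed, new_speed), 3))
```

A value near 1 or -1 indicates a strong linear relationship, and a value near 0 indicates little or none. This only applies to numeric columns, which is why it matters which variables are numeric and which are nominal.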

Let me know if you need more help

Cheers

Z1745566 · February 23, 2016, 12:49am

Thank you mauuuuu5, for your help. I will start doing something and will let you know if I have any further questions.

doloop · May 1, 2016, 5:45pm

I realize that the above question is a few months old, but just in case Z1745566 is watching, the problem sounds like a "nearest neighbor" problem to me. Check out Analytics > Mining > Misc Classifiers > K Nearest Neighbor.
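To make the nearest-neighbor idea concrete, here is a minimal brute-force sketch in Python. It is not the KNIME node's implementation. It assumes min-max scaling of the three features and a simple majority vote over the rows of the new vehicle; the VINs, the data rows and the function names are all invented for illustration.

```python
from collections import Counter
from math import sqrt

# Made-up historical rows: (vin, speed, latitude, longitude).
history = [
    ("VIN_A", 62.0, 48.10, 11.50),
    ("VIN_A", 58.0, 48.12, 11.52),
    ("VIN_A", 65.0, 48.11, 11.51),
    ("VIN_B", 30.0, 52.50, 13.40),
    ("VIN_B", 28.0, 52.52, 13.41),
    ("VIN_B", 35.0, 52.51, 13.39),
]


def column_ranges(rows):
    """Minimum and span of each feature column, for min-max scaling."""
    columns = list(zip(*(row[1:] for row in rows)))
    # A zero span (constant column) is replaced by 1.0 to avoid dividing by zero.
    return [(min(c), (max(c) - min(c)) or 1.0) for c in columns]


def scale(features, ranges):
    """Scale raw features so that no single column dominates the distance."""
    return tuple((v - low) / span for v, (low, span) in zip(features, ranges))


def euclidean(p, q):
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def most_similar_vin(new_rows, history, k=3):
    """Vote over the k nearest historical rows of every new row."""
    ranges = column_ranges(history)
    scaled = [(row[0], scale(row[1:], ranges)) for row in history]
    votes = Counter()
    for features in new_rows:
        query = scale(features, ranges)
        nearest = sorted(scaled, key=lambda item: euclidean(query, item[1]))[:k]
        votes.update(vin for vin, _ in nearest)
    return votes.most_common(1)[0][0], votes


# Several (speed, latitude, longitude) rows from one new vehicle.
new_rows = [(60.0, 48.11, 11.50), (63.0, 48.10, 11.52)]
best_vin, votes = most_similar_vin(new_rows, history)
print(best_vin, dict(votes))
```

Sorting every historical row for every query is fine for a toy example, but with 20 million rows a spatial index or sampling would be needed. The vote counts also give a rough similarity score between the new vehicle and each known VIN.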