Dear KNIMErs,
I'm going to create a model that predicts which driver produced a given trajectory in a set of GPS data. The data set contains the daily driving trajectories of five taxi drivers over a period of six months. Each trajectory to be classified contains all GPS records of one driver for a single day.
I use data collected from five drivers over five days, resulting in 25 records, which I feed into a neural network. I have implemented the code in PyTorch and obtained results. In parallel, I want to build the same model in KNIME and compare the results to find out which model is better.
Dataset Description
| plate | longitute | latitude | time | status |
|---|---|---|---|---|
| 4 | 114.10437 | 22.573433 | 2016-07-02 0:08:45 | 1 |
| 1 | 114.179665 | 22.558701 | 2016-07-02 0:08:52 | 1 |
| 0 | 114.120682 | 22.543751 | 2016-07-02 0:08:51 | 0 |
| 3 | 113.93055 | 22.545834 | 2016-07-02 0:08:55 | 0 |
| 4 | 114.102051 | 22.571966 | 2016-07-02 0:09:01 | 1 |
| 0 | 114.12072 | 22.543716 | 2016-07-02 0:09:01 | 0 |
Now I need some tips for getting started on this task with KNIME. I would appreciate it if somebody could give me a pointer.
That’s a great implementation. KNIME currently has deeper integration with Keras than with PyTorch. To get started, you can view the documentation for Keras in KNIME here. For a list of nodes and sample workflows, you can click here.
Dear @jinwei_sun, thanks for your reply. To do this, I need to complete the following tasks:
1. Merge the CSV files to create a single dataset. (Done with a KNIME loop and file reader.)
2. Preprocess the dataset by dividing the GPS locations into grid cells. (How?)
3. Further preprocess the dataset by dividing each trajectory into two sets of sub-trajectories, seeking and service, based on the status column, where 1 means the taxi is occupied and 0 means it is vacant. (How?)
Could you please give me some advice on how to implement steps 2 and 3?
If you need to convert GPS locations to grid cells, you can use the Column to Grid node. By selecting both longitude and latitude columns, this node will create grid cells according to your specifications. However, some modifications may be necessary depending on your specific use case.
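For reference, the grid assignment itself is simple arithmetic, so it can also be sketched in a few lines of Python, for example inside a Python Script node. This is only a minimal sketch under stated assumptions: the origin `MIN_LON`/`MIN_LAT`, the cell size `CELL_DEG`, and the function name `to_grid_cell` are illustrative choices, not values from your dataset or part of any KNIME node. The sample points are taken from the table in the first post.

```python
import math

# Assumed grid origin (south-west corner) and cell size in degrees.
# These values are hypothetical; pick them to fit your own data.
MIN_LON, MIN_LAT = 113.0, 22.0
CELL_DEG = 0.01


def to_grid_cell(lon, lat, min_lon=MIN_LON, min_lat=MIN_LAT, cell_deg=CELL_DEG):
    """Map a GPS point to integer (column, row) indices of a regular grid."""
    col = math.floor((lon - min_lon) / cell_deg)
    row = math.floor((lat - min_lat) / cell_deg)
    return col, row


# (longitude, latitude) pairs copied from the sample rows in the first post.
points = [
    (114.10437, 22.573433),
    (114.102051, 22.571966),
    (113.93055, 22.545834),
]
cells = [to_grid_cell(lon, lat) for lon, lat in points]
print(cells)
```

With these assumed settings, the first two points (both from plate 4) fall into the same cell, while the third lands in a different one. A smaller `CELL_DEG` gives a finer grid at the cost of more distinct cells.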
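To split your dataset based on a specific column, you can use the Row Splitter node: select the “status” column as the basis for the split and specify “0” as the pattern to match. Rows with status 0 (vacant, i.e. seeking) then go to one output, and the remaining rows with status 1 (occupied, i.e. service) go to the other.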
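If by sub-trajectories you mean consecutive runs of records with the same status, rather than just two tables of rows, the logic could look roughly like the Python sketch below. It is an illustration under assumptions, not a KNIME feature: the function names `parse_time` and `split_sub_trajectories` are hypothetical, the rows are the six sample rows from the first post, and a trajectory is taken to be all records of one plate on one calendar day.

```python
from datetime import datetime
from itertools import groupby

# Rows copied from the sample table: (plate, longitute, latitude, time, status)
ROWS = [
    (4, 114.10437, 22.573433, "2016-07-02 0:08:45", 1),
    (1, 114.179665, 22.558701, "2016-07-02 0:08:52", 1),
    (0, 114.120682, 22.543751, "2016-07-02 0:08:51", 0),
    (3, 113.93055, 22.545834, "2016-07-02 0:08:55", 0),
    (4, 114.102051, 22.571966, "2016-07-02 0:09:01", 1),
    (0, 114.12072, 22.543716, "2016-07-02 0:09:01", 0),
]


def parse_time(text):
    """Parse the timestamp format used in the sample table."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


def split_sub_trajectories(rows):
    """Group rows by (plate, day), then cut each group into runs of equal status.

    Returns {(plate, "YYYY-MM-DD"): {"seeking": [run, ...], "service": [run, ...]}},
    where each run is a list of consecutive rows sharing the same status.
    """
    def day_key(row):
        return (row[0], parse_time(row[3]).date().isoformat())

    # Sort by plate and timestamp so each trajectory is in time order.
    ordered = sorted(rows, key=lambda r: (r[0], parse_time(r[3])))
    result = {}
    for key, day_rows in groupby(ordered, key=day_key):
        parts = {"seeking": [], "service": []}
        for status, run in groupby(day_rows, key=lambda r: r[4]):
            # status 1 = occupied (service), status 0 = vacant (seeking)
            parts["service" if status == 1 else "seeking"].append(list(run))
        result[key] = parts
    return result


subs = split_sub_trajectories(ROWS)
for key, parts in subs.items():
    print(key, len(parts["seeking"]), "seeking run(s),", len(parts["service"]), "service run(s)")
```

On the six sample rows every plate has only one status, so each trajectory yields a single run. On a full day of data, a driver alternating between vacant and occupied would produce several seeking and service runs.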
Feel free to leave a reply if you have any further questions.