Sequence classification by deep learning to predict taxi driver path

Dear KNIMErs,
I’m going to create a model that predicts which driver is responsible for a given trajectory in a set of GPS data. The data set includes the daily driving trajectories of five taxi drivers over a period of six months. Each trajectory to be classified contains all GPS records of one driver on a single day.
I use data collected from five drivers over five days, which gives 25 records to feed into the neural network. I have implemented the code in PyTorch and obtained results. In parallel, I want to build the model in KNIME as well and compare the results to find out which model is better.

Dataset Description

plate longitude latitude time status
4 114.10437 22.573433 2016-07-02 0:08:45 1
1 114.179665 22.558701 2016-07-02 0:08:52 1
0 114.120682 22.543751 2016-07-02 0:08:51 0
3 113.93055 22.545834 2016-07-02 0:08:55 0
4 114.102051 22.571966 2016-07-02 0:09:01 1
0 114.12072 22.543716 2016-07-02 0:09:01 0
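For reference, here is a minimal pandas sketch of how rows like these could be grouped into one trajectory per driver per day, which is the unit being classified. The column names (with `longitude` spelled out), the comma-separated layout, and the variable names are assumptions made for illustration; the real CSV files may differ.

```python
import io
from datetime import datetime

import pandas as pd

# The six sample rows from the post, rewritten as CSV (assumed layout).
SAMPLE = """plate,longitude,latitude,time,status
4,114.10437,22.573433,2016-07-02 0:08:45,1
1,114.179665,22.558701,2016-07-02 0:08:52,1
0,114.120682,22.543751,2016-07-02 0:08:51,0
3,113.93055,22.545834,2016-07-02 0:08:55,0
4,114.102051,22.571966,2016-07-02 0:09:01,1
0,114.12072,22.543716,2016-07-02 0:09:01,0
"""

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

df = pd.read_csv(io.StringIO(SAMPLE))

# Parse timestamps row by row; the hour in the data is not zero-padded.
df["time"] = pd.to_datetime(
    df["time"].map(lambda s: datetime.strptime(s, TIME_FORMAT))
)
df["day"] = df["time"].dt.date

# Order each driver's records in time, then build one array per (plate, day).
df = df.sort_values(["plate", "time"])
trajectories = {
    key: group[["longitude", "latitude", "status"]].to_numpy()
    for key, group in df.groupby(["plate", "day"])
}

for (plate, day), points in trajectories.items():
    print(plate, day, points.shape)
```

With five drivers and five days of data, a dictionary built this way would hold 25 entries, one per driver and day.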

Now I need some tips to get started on this in KNIME. I would appreciate it if somebody could give me a pointer.

Hey natnazi,

That’s a great implementation. KNIME currently offers deeper integration with Keras than with PyTorch. To get started, you can view some documentation on Keras in KNIME here. To view a list of nodes and sample workflows, you can click here.

Hope this helps.



Dear @jinwei_sun, thanks for your reply. To do this, I need to complete the tasks below:

  1. Merge the CSV files to create a single dataset. (Done with the KNIME loop file reader.)

  2. Preprocess the dataset by dividing the GPS locations into grid cells. (How?)

  3. Further preprocess the dataset by dividing each trajectory into two sets of sub-trajectories, seeking and service, based on the status column (1 means the taxi is occupied, 0 means it is vacant). (How?)
    Could you please give me some advice on how to implement 2 and 3?


Hey Milad,

  1. If you need to convert GPS locations to grid cells, you can use the Column to Grid node. By selecting both longitude and latitude columns, this node will create grid cells according to your specifications. However, some modifications may be necessary depending on your specific use case.

  2. To split your dataset based on a specific column, you can use the Row Splitter node. To use this node, select the “status” column as the basis for the split and specify “0” as the pattern to match.
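As a point of comparison with the PyTorch pipeline, the two preprocessing steps can also be sketched in plain Python with pandas. This is only an illustration under stated assumptions: the function names `add_grid_cells` and `split_sub_trajectories`, the 0.01 degree cell size, and the small example trajectory are all hypothetical, and the input is assumed to be a single driver's day already sorted by time.

```python
import numpy as np
import pandas as pd


def add_grid_cells(df, cell_deg=0.01, lon_col="longitude", lat_col="latitude"):
    """Assign each GPS point to an integer grid cell.

    cell_deg is the cell edge length in degrees (an assumed value);
    the grid origin is the minimum longitude/latitude in the data.
    """
    out = df.copy()
    lon0 = out[lon_col].min()
    lat0 = out[lat_col].min()
    out["cell_x"] = np.floor((out[lon_col] - lon0) / cell_deg).astype(int)
    out["cell_y"] = np.floor((out[lat_col] - lat0) / cell_deg).astype(int)
    return out


def split_sub_trajectories(df, status_col="status"):
    """Split one time-ordered trajectory into seeking and service parts.

    A new segment starts whenever the status value changes.
    Returns (seeking, service): lists of DataFrames with status 0 and 1.
    """
    out = df.copy()
    out["segment"] = (out[status_col] != out[status_col].shift()).cumsum()
    seeking, service = [], []
    for _, part in out.groupby("segment"):
        if part[status_col].iloc[0] == 1:
            service.append(part)
        else:
            seeking.append(part)
    return seeking, service


# Hypothetical five-point trajectory for one driver on one day.
points = pd.DataFrame(
    {
        "longitude": [114.100, 114.105, 114.112, 114.125, 114.131],
        "latitude": [22.540, 22.541, 22.552, 22.553, 22.561],
        "status": [0, 0, 1, 1, 0],
    }
)

gridded = add_grid_cells(points)
seeking, service = split_sub_trajectories(gridded)
print(gridded[["cell_x", "cell_y", "status"]])
print(len(seeking), "seeking segments,", len(service), "service segments")
```

Unlike a plain split on the status column, this keeps each uninterrupted run of vacant or occupied records as its own sub-trajectory, which may matter if the sequence model should see trips rather than one pooled table per status.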

Feel free to leave a reply if you have any further questions.

Best regards,

