I am stuck with my data analysis on following problem:
I have one variable “car” and another called StartLongitude (and latitude). I now want to calculate the EndLatitude (and Latitude) for the car variables, at least for as many as possible.
An example of an extract of the variables would look like this:
Car StartLongitude
1 48.3678
1 48.377798
1 48.1009
2 48.000099
2 48.98220
2 48.876232756453
2 48.5367
2 48.729486182375
These are already sorted according to start time. That means that car 1’s second position of StartLongitude is its first position of EndLongitude. This means that for EndLongitude, it is the data of StartLongitude just one row up, expect the last value of each car.
That means that I can calculate (in theory) all EndLongitude values, except the last end-position of all cars.
I know this is a quite specific problem. I have no idea which nodes to use, and since I am not familiar with java, would have also no idea how to put that into a java code.
I would very much appreciate it, if there is anyone that can help. Thank you so so much to all of you in advance!!
Hi,
if your records are previously sorted, you can use a LagColumn Node to create a new column with same data shifted by one row.
You can repeat two times (lat and long)
Bye!
I don’t know much about JavaSnippets.But this is a way to solve your problem. calc_difference.knwf (31.1 KB) . It uses a Lag Column node inside a Group Loop (car). But to have the Lag Node work properly you need to sort your file in descending order.
@HansS: I used your workflow and it worked perfectly, thank you!!
Maybe, you could also help me with an additional question:
I tried to calculate the endLat because the endLat column I have has lots of missings. The endLat I calculated now - thanks to your workflow - does also have some missings (1 for each car). If I now want to insert the value from the endLat column I had in the first place into the column that I have just created for the values that are missing, which node would you recommend me to take?
In cases where I don’t know how to proceed, I always think of a javasnippet, but maybe there is another way?
check here options how to simulate lag column with negative value in order to avoid using Sorter node twice (on the larger data set it can take a while): Lag Column with Negative value?
To deal with missing values I would recommend Rule Engine node. Fast and simple to use