how to solve this problem?

Hello, i am new here.

i have a table like this: i need to find the time difference between per user(ID)

ID Date
A4 22-10-2014
A4 22-11-2014
A55 7-7-2014
A55 7-9-2014
A55 7-10-2014

my first plan is to get the output like this:

ID Date1 Date2 Date3 ...
A4 22-10-2014 22-11-2014    
A5 7-7-2014 7-9-2014 7-10-2014  

and then i subtract the columns for each user ( date2 - date 1), how to generate this table?

the problem is maximum number for date is 500 times, seem like it is weird to have column until date500.

can anyone help me?

 

Hi Xting, and welcome.

You can use the Pivot node to achieve such an intermediate table. But as you suspected, there are simpler ways. Like in most instances when dealing with time, it's easiest to convert your time values to numerical values first. There are a lot of ways for that, depending on the type of your input values and the expected result. After that, you can use a GroupBy node to calculate the difference. Just use the ID as a grouping column and range as the aggregation method.

Hi Xting,

attached you will find a small example workflow. I would not pivot your table if it is not necessary. I used a group by to select the user and the Time Difference node to calculate the difference between two consequent rows (option previous)

I sorted the subtable in  the group loop start by time.

Is this the result you wanted to achieve?

Cheers, Iris

 

Thanks all for helping me, appreciated.

hi Marlin, sorry for this novice question,  issit you mean i need to use "string to date" > "groupby" node to do that? issit groupby can calculate the  time difference?

hi Iris, sorry for extra question, i got around 5 million of records (rows), and the looping 1 by 1 seem too slow, is there got any faster method?

thanks.

did pivot help in this case? 

Hi Xting,

A GroupBy (alone) can not achieve the same result as Iris' suggestion. I understood your question in the sense that you wanted the total range per group, not individual differences per row. If that is what you want, yes, it can calculate differences between time values, either in days or in milliseconds (the aggregation methods are called "Date range").

The Group Loop Iris used does not iterate over single rows, but over groups. So you'll probably end up with only a few ten thousand iterations. And you are right that loops have a performance overhead, but in this case the alternatives would have to iterate more often instead. Also, even on a desktop machine, a few ten thousand rows should be done in a matter of hours, even more so with such a simple calculation. Just be sure to store the results in case you accidently disconnect a node.

Hi Marlin,

Thanks for your help, appreciate alot.