hi,
i have a dataset with ip, date, and time, I want to consolidate all the records grouping by ip and date having time between each of them less than 1 hours.
Here I would convert your dates and times to a timestamp using the String to Date&Time node. Then, you can use a Date&Time Difference node to figure out the length of time between each record. As needed, you could do some consolidation with a GroupBy node (by IP address, for example).
have checked your data but have a question about desired output. Should it be additional column stating unique group identifier or some aggregation/list? Also in this case how would you group?
ok let me explain.
I have a dataset and I would like to keep only the records having the same IP address, same date and time between thems are less than 1 hour
ok. So you want to filter but I’m still not getting your logic 100%. In output you both included duplicates and excluded them. Also what to do in case I mentioned in my first reply? I guess when you have only one record in day you leave him?
when you have one record in day you can leave or remove it’s not a problem.
but when you have for exemple 3 event’s like
11.11.133.130 12/03/2005 18:11:55
11.11.133.130 12/03/2005 18:12:00
11.11.133.130 12/03/2005 21:22:32
you remove the last one because it has more than 1 hours between him and the previous one.
note that the ip may be dupllicated on output depending on the date .
maybe something like this. Combine your date and time. Convert it to Date and for each group calculate time difference in seconds against previous row. Then using new Row Filter node exlude rows with difference greater than 3600
you are missing Column Combiner node in which you should combine date and time columns. If your delimiter is space you can use following syntax in String to Date&Time node: dd-MM-yyyy HH:mm:ss to get Date&Time column.