set interval time with knime

willson · January 13, 2020, 8:01pm

hi,
i have a dataset with ip, date, and time, I want to consolidate all the records grouping by ip and date having time between each of them less than 1 hours.

exemple:
|10.80.64.44 |15-mars-2005|16:28:42|
|10.80.64.44 |15-mars-2005|16:28:42|
|11.11.133.130 |07-mars-2009|19:38:32|
|11.11.133.130 |08-mars-2009 |12:39:20|
|11.11.133.130 |08-mars-2009 |17:55:13|
|11.11.133.130 |08-mars-2009 |17:55:13|
|11.11.133.130 |12-mars-2009 |18:11:55|
|11.11.133.130 |12-mars-2009 |18:12:00|
|11.11.133.130 |12-mars-2009 |21:22:32|
|11.11.133.130 |15-mars-2009 |13:56:27|
|11.11.133.130 |15-mars-2009 |13:56:27|

ScottF · January 13, 2020, 8:28pm

Hi @willson and welcome to the forum.

Here I would convert your dates and times to a timestamp using the String to Date&Time node. Then, you can use a Date&Time Difference node to figure out the length of time between each record. As needed, you could do some consolidation with a GroupBy node (by IP address, for example).

Give those a try and see what happens!

willson · January 13, 2020, 11:11pm

thanks can I give u the data to make it, it’s dosent work for me;data.xlsx (718.5 KB)

ipazin · January 14, 2020, 8:52am

Hi there @willson,

have checked your data but have a question about desired output. Should it be additional column stating unique group identifier or some aggregation/list? Also in this case how would you group?

|11.11.133.130 |12-mars-2009 |18:11:55|
|11.11.133.130 |12-mars-2009 |18:31:00|
|11.11.133.130 |12-mars-2009 |19:21:55|

Br,
Ivan

willson · January 14, 2020, 9:48am

ok let me explain.
I have a dataset and I would like to keep only the records having the same IP address, same date and time between thems are less than 1 hour

exemple
input
source_ip date time
10.80.64.44 15/03/2005 16:28:42
10.80.64.44 15/03/2005 16:28:42
11.11.133.130 07/03/2005 19:38:32
11.11.133.130 08/03/2005 12:39:20
11.11.133.130 08/03/2005 17:55:13
11.11.133.130 08/03/2005 17:55:13
11.11.133.130 12/03/2005 18:11:55
11.11.133.130 12/03/2005 18:12:00
11.11.133.130 12/03/2005 21:22:32

output
source_ip date time
10.80.64.44 15/03/2005 16:28:42
10.80.64.44 15/03/2005 16:28:42
11.11.133.130 07/03/2005 19:38:32
11.11.133.130 08/03/2005 17:55:13
11.11.133.130 12/03/2005 18:11:55
11.11.133.130 12/03/2005 18:12:00

thank you for your help

ipazin · January 14, 2020, 10:47am

Hi @willson,

ok. So you want to filter but I’m still not getting your logic 100%. In output you both included duplicates and excluded them. Also what to do in case I mentioned in my first reply? I guess when you have only one record in day you leave him?

Br,
Ivan

willson · January 14, 2020, 11:10am

when you have one record in day you can leave or remove it’s not a problem.
but when you have for exemple 3 event’s like
11.11.133.130 12/03/2005 18:11:55
11.11.133.130 12/03/2005 18:12:00
11.11.133.130 12/03/2005 21:22:32
you remove the last one because it has more than 1 hours between him and the previous one.
note that the ip may be dupllicated on output depending on the date .

Thank’s

willson · January 14, 2020, 2:33pm

@ipazin have you got any idea to resolve my problem. thks

ipazin · January 14, 2020, 2:58pm

Hi @willson,

maybe something like this. Combine your date and time. Convert it to Date and for each group calculate time difference in seconds against previous row. Then using new Row Filter node exlude rows with difference greater than 3600

Br,
Ivan

willson · January 14, 2020, 4:25pm

it dosen’t work here is the knime project data.knwf (15.1 KB)

willson · January 14, 2020, 5:52pm

@ipazin can you take a look on the project I send u.

ipazin · January 15, 2020, 10:43am

Hi there @willson,

you are missing Column Combiner node in which you should combine date and time columns. If your delimiter is space you can use following syntax in String to Date&Time node: dd-MM-yyyy HH:mm:ss to get Date&Time column.

Br,
Ivan

system · July 15, 2020, 10:43pm

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.