set interval time with knime

#1

hi,
i have a dataset with ip, date, and time, I want to consolidate all the records grouping by ip and date having time between each of them less than 1 hours.

exemple:
|10.80.64.44 |15-mars-2005|16:28:42|
|10.80.64.44 |15-mars-2005|16:28:42|
|11.11.133.130 |07-mars-2009|19:38:32|
|11.11.133.130 |08-mars-2009 |12:39:20|
|11.11.133.130 |08-mars-2009 |17:55:13|
|11.11.133.130 |08-mars-2009 |17:55:13|
|11.11.133.130 |12-mars-2009 |18:11:55|
|11.11.133.130 |12-mars-2009 |18:12:00|
|11.11.133.130 |12-mars-2009 |21:22:32|
|11.11.133.130 |15-mars-2009 |13:56:27|
|11.11.133.130 |15-mars-2009 |13:56:27|

0 Likes

#2

Hi @willson and welcome to the forum.

Here I would convert your dates and times to a timestamp using the String to Date&Time node. Then, you can use a Date&Time Difference node to figure out the length of time between each record. As needed, you could do some consolidation with a GroupBy node (by IP address, for example).

Give those a try and see what happens!

2 Likes

#3

thanks can I give u the data to make it, it’s dosent work for me;data.xlsx (718.5 KB)

0 Likes

#4

Hi there @willson,

have checked your data but have a question about desired output. Should it be additional column stating unique group identifier or some aggregation/list? Also in this case how would you group?

|11.11.133.130 |12-mars-2009 |18:11:55|
|11.11.133.130 |12-mars-2009 |18:31:00|
|11.11.133.130 |12-mars-2009 |19:21:55|

Br,
Ivan

1 Like

#5

ok let me explain.
I have a dataset and I would like to keep only the records having the same IP address, same date and time between thems are less than 1 hour

exemple
input
source_ip date time
10.80.64.44 15/03/2005 16:28:42
10.80.64.44 15/03/2005 16:28:42
11.11.133.130 07/03/2005 19:38:32
11.11.133.130 08/03/2005 12:39:20
11.11.133.130 08/03/2005 17:55:13
11.11.133.130 08/03/2005 17:55:13
11.11.133.130 12/03/2005 18:11:55
11.11.133.130 12/03/2005 18:12:00
11.11.133.130 12/03/2005 21:22:32

output
source_ip date time
10.80.64.44 15/03/2005 16:28:42
10.80.64.44 15/03/2005 16:28:42
11.11.133.130 07/03/2005 19:38:32
11.11.133.130 08/03/2005 17:55:13
11.11.133.130 12/03/2005 18:11:55
11.11.133.130 12/03/2005 18:12:00

thank you for your help

0 Likes

#6

Hi @willson,

ok. So you want to filter but I’m still not getting your logic 100%. In output you both included duplicates and excluded them. Also what to do in case I mentioned in my first reply? I guess when you have only one record in day you leave him?

Br,
Ivan

0 Likes

#7

when you have one record in day you can leave or remove it’s not a problem.
but when you have for exemple 3 event’s like
11.11.133.130 12/03/2005 18:11:55
11.11.133.130 12/03/2005 18:12:00
11.11.133.130 12/03/2005 21:22:32
you remove the last one because it has more than 1 hours between him and the previous one.
note that the ip may be dupllicated on output depending on the date .

Thank’s

0 Likes

#8

@ipazin have you got any idea to resolve my problem. thks

0 Likes

#9

Hi @willson,

maybe something like this. Combine your date and time. Convert it to Date and for each group calculate time difference in seconds against previous row. Then using new Row Filter node exlude rows with difference greater than 3600 :wink:

Br,
Ivan

1 Like

#10

it dosen’t work here is the knime project data.knwf (15.1 KB)

0 Likes

#11

@ipazin can you take a look on the project I send u.

0 Likes

#12

Hi there @willson,

you are missing Column Combiner node in which you should combine date and time columns. If your delimiter is space you can use following syntax in String to Date&Time node: dd-MM-yyyy HH:mm:ss to get Date&Time column.

Br,
Ivan

0 Likes