Splitting table into sections using timestamp

eu91 · April 21, 2017, 10:12am

Hello everyone,

I am very new to KNIME, and I would like to ask for some help for my problem:

Let's say I have a table with a list of simulated transactions with timestamp, and with the Time Difference Node I could calculate the time difference between each row (i.e. Row 2-Row 1, Row 3-Row 2 and so on)

Transaction	Timestamp	Time Difference in second smallest unit is also set to second
A	21.04.2017 10:04:59.929	? (because there is no Row 0)
B	21.04.2017 10:04:59.931	1
C	21.04.2017 10:04:59.933	1
D	21.04.2017 10:06:06.497	67
A	21.04.2017 10:07:14.652	69
B	21.04.2017 10:07:14.653	1
C	21.04.2017 10:07:14.656	1

Based on the time difference, I would like to split the table into sections / subtables to isolate the transactions which occur almost at the same time from those which occur later. The table should be splitted where the time difference is more than 60 seconds. The above table would be splitted in the following 3 sections:

A	21.04.2017 10:04:59.929	?
B	21.04.2017 10:04:59.931	1
C	21.04.2017 10:04:59.933	1

D

21.04.2017 10:06:06.497

67

A	21.04.2017 10:07:14.652	69
B	21.04.2017 10:07:14.653	1
C	21.04.2017 10:07:14.656	1

Afterwards, each section would be further labeled with different events:
(Since the transactions are simulated, I know which section belongs to which event. For e.g., the first 12 sections are Event 1 and 2 occuring alternatively, i.e. Event 1, Event 2, Event 1, Event 2 and so on. Afterwards, Event 3 and 4 would occur alternatively, each for 6 times. A loop node might be useful here to assign the event to the sections?)

A	21.04.2017 10:04:59.929	?	Event 1
B	21.04.2017 10:04:59.931	1	Event 1
C	21.04.2017 10:04:59.933	1	Event 1

D

21.04.2017 10:06:06.497

67

Event 2

A	21.04.2017 10:07:14.652	69	Event 1
B	21.04.2017 10:07:14.653	1	Event 1
C	21.04.2017 10:07:14.656	1	Event 1

The transactions will be converted into a string sequence based on the timestamp: (I would say that the GroupBy Node would be one of the suitable approaches)

A B C

Event 1

D

Event 2

A B C

Event 1

Sequences which belong to the same event be combined into a table in the next step:

Event 1

Sequence
A B C
A B C

Event 2

Sequence
D

The question is: How can I implement this approach? How can I split the table based on the time difference? It seems like I should use a loop node, and the main table will be further splitted down loop after loop. Could someone please assist me? Sorry for the long post, but thanks a lot :)

Iris · May 1, 2017, 1:37pm

Hi,

I would suggest using a Java Snippet. At least for your first part which generates the Events.

Afterwards, you can use a groupby node to get the Sequences.

If you provide me with a minimal start workflow, I can make you the example.

Best, Iris

eu91 · May 9, 2017, 2:56pm

Hi Iris,

thank you for your reply. I have attached a sample workflow with this reply.

Since I am reading my data from a databank, I created some sample data for you. There are three tables in the workflow:

1. "Sample Raw Data" is the data to be processed

2. "Sample Raw Data with Events assigned" is a table with Events assigned manually, just to be used as a comparison / for checking

3. "Table at goal" is the table to be achieved

In real case I have more than 26 transaction types, that's why I have changed the transaction types to numbers (Transaction 1, Transaction 2, Transaction 3...), and the events to alphabets (Event A, Event B, Event C...)

I don't really have any experience in Java, if you could help me out, I would be very thankful :)

Thanks!

Regards,

Eugene

example_forum.zip