I have two txt files which I read in, format and also convert string to datetime for one column each table. Everything works so far.
In the end the tables look like this:
table1
row no
column11
column12
column13
datetime1
1
1
Test
7
2016-07-13 12:48:20.417
2
5
TestValue
8
2016-07-13 19:07:37.133
3
2
rudi
1
2016-07-14 03:29:14.450
4
3
No Text
8
2016-07-14 05:57:04.371
table2
row no
column21
column22
column23
datetime2
1
Text1
-1.5
2.5
2016-07-12 23:40:44.124
2
sample text
1.6
2.6
2016-07-13 05:00:00.158
3
Yes
0.4
3.2
2016-07-13 12:48:20.417
4
No
-1.8
2.2
2016-07-13 17:57:34.536
5
Other
-1.2
2.7
2016-07-13 19:07:38.356
6
mine
-0.7
2.8
2016-07-13 21:29:07.290
7
day
0.4
3.3
2016-07-14 00:50:56.570
8
text3
0.9
2.3
2016-07-14 03:50:58.737
9
hope
-1.0
2.9
2016-07-14 04:26:18.601
10
no
-0.2
2.1
2016-07-14 04:58:59.615
11
Yes
0.6
3.4
2016-07-13 07:44:58.933
12
False
1.0
2.4
2016-07-13 09:52:07.715
Now I want to use a filter function for the rows of table2 in a way that only rows pass to the output that have either exactly the same datetime stamp as the row in table1 or (if no exact match exists) that is the nearest datetime stamp in positive direction (next). In other words: I want to cycle through all rows of table1, get the datetime1 stamp, look for the datetime2 stamps of table2 to find the equal or nearest next one and pass the result row to the output. So the output should contain a filter set with n rows of table2 (where n is the row count of table1).
might I ask you one question. From where is this question triggered? I got so many very similiar questions during the last days and I am only curious if this is a big coincident?
I am sorry I am a complete newbie to Knime and to this forum as well and so I obviously made some mistake.
I tried to post this question the first time just after I registered here. Then I saw that my post stayed in the "my contents" box and was not visible in the general forum as I defined it to be. I looked for it a few times again and the post did not appear in the forum.
So I decided to have made something wrong and posted the question again. But also the new post stayed in the my contents box and was not published in the next hours.
Today when I looked into this forum again I saw that there were both posts shortly one after another.
I am not familiar with this forum technique and made other experience in other forums so I must excuse for that.
If you (or anyone else) is able to delete one of the two posts it would be nice. I also wanted to delete one but unfortunately did not find the way to do that.
However it would be nice if you or anyone else has an idea for an answer to my question;-)) I tried to find a solution prior to my posts but did not find any hint to my problem.
Regards, Juppes
--------
Just a short update:
Now that the posts are published in this forum I was able to delete the first (unanswered) one myself:-)
thanks for the quick answer, it looks like it should be doing exactly all that I want ;-))
But I need a little help on the GroupBy node, I am not so familiar with that one (like most of the others). I assume that I have to configure here (4 tabs):
- Group Column: "Timediff"
- Manual Aggregation: ?? (RowIDs of both tables?)
- Pattern Based Aggregation: ?? (I assume nothing)
- Type Based Aggregation: Data Type: Number (long) (because I calculated the Timediff in ms); Aggregation: Minimum; (any further setting?)
- Any further setting in the "Advanced settings" at the bottom?
Sorry, through your question I realised that I missed out on one: you need to set the "Group" tab to the "RowId1" column. "Timediff" gets manually aggregated to "Minumum". Make sure to leave the other aggregation option unfilled though, as they will otherwise generate aggregations of their own.
So you are basically looking for one "Minimum timediff" per "RowId1" group field - that's how the GroupBy logic works. It's a database construct which takes some getting used to coming from Excel - been there, done that. :)
thanks a lot for your valueable hints! There was one point left out (glad I found it myself ;-) ): The 'time diff' values had to be absoluted by the Math Formula node additionally after calculation because otherwise the Group By node always takes the minumum (far away minus) value.