how to merge tables, one with single instance and the other with multiple instances for a unique ID

Hello,

I am new to Knime but currently learning quick

I have a table with web visitor instances almost 100,000 rows . Each instance has a session ID, this unique ID is assigned to a user so that to understand user behavior. The user may visit the website in different times but the ID will be kept the same. We have a data and hour and minute of a specific user visiting a site at different times. Please keep in mind that the user may visits the website several times in one hour and still have same ID ( ID X instances)

I have another table (Weather table) that shows the weather condition of a city. The unique ID for this table is the date and hour (a weather condition at a specific day and specific hour)

My task is to understand  the influence of weather on web visitors (visitor  beaver)  so I wanted to merge this two tables using the date and hour column. But my problem is that a single visitor may have several instances (rows) in a single hour While the weather table can only have one single value per hour. 

Example Visitor Table

Session ID

Date and Time

Hour

Minute

Page path

Distance from platform

2324

2015-11-23 12:00

12

00

xyz.com/products/electronics

locall

2324

2015-11-23 12:30

12

30

xyz.com/products/books

locall

4547

2015-11-23 13:00

13

00

Xyz.com/products/pets

distance

6784

2015-11-23 14:00

14

00

Xyz.com/products/tv

locall

6784

2015-11-23 14:30

14

30

Xyz.com/products/electronics

locall

6784

2015-11-23 14:55

14

55

Xyz.com/products/computers

locall

 

 Example of Weahter table

Hour of the Day

Houre

City

Temprature

Wind

Rain

Season

Weather category

2015-11-23

12

ABC

12.6

4.2

1.2

Winter

moderate

2015-11-23

13

ABC

3.6

3.6

5.2

Winter

hgjgjh

2015-11-23

14

ABC

6.8

4.2

2.9

Winter

hgjgjh

2015-11-23

15

ABC

17.4

0.4

0

Winter

hgjgjh

2015-11-23

16

ABC

17.2

0.5

0

Winter

hgjgjh

2015-11-23

17

ABC

16.9

2.6

1.1

Winter

hgjgjh

 

So far have merged the date and hour columns in to one and then formatted the dates on the two tables  using the “String to Date“ node and after that I try to merge them but the number of rows after merge are way less than I have expected and I suspect in the visitor table, the existence of multiple lines in a same hour might have affected .

Can you please advise on how to merge this two tables. And any further recommendations appreciated.

Thank you

1 Like

hi there

a little sample worklfow would be helpful. but as far as I understand you might want to use a group by node for your visitors. I would first agregate the date, hour and the visitor ID into one combination column, and do the group by on this cobination, and as Manipulaton do a count by ID and a  First by the Date

then you can do a join of the two tables.   good luck!

Hi Kibrom, I have a similar case where there are multiple rows for the same request number. Each row provides a sequence step and time processed. I would like to combine all of the sequences and times processed into a single row (by request number).

I have seen a workflow that uses unpivot and then pivot as @Iris suggested in this thread (Transpose row to column and column to row) but haven’t managed to make it work, any help will be highly appreciated !!!

Hi @B074534,

If you want to have the sequence IDs in the same row as each unique request number, you can use the GroupBy node and the the “List” aggregation method and then use Split Collection Column node:
to_collection_split.knwf (17.4 KB)

If you want to have the sequence IDs as column headers and have e.g. the time for each sequence, Pivoting is the node to use:
pivot.knwf (18.1 KB)

:blush:

P.S. Regarding @Kibrom’s issue, the Joiner node does the task as shown in this workflow:
join_weather.knwf (23.3 KB)

5 Likes

Hi @armingrudd, I have got a more complex data set I tried applying the principles of your instructions but couldn’t progress far… I am trying to transpose all the rows by each particular Route_ID:

Example input:

Example output I am trying to create:

Here is a workflow I created with first to tables showing the input ant output and below the masterdata that I need to transform:
KNIME_TransposeByRoute_ProcessAndTime.knwf (30.4 KB)

Thanks in advance!

In this case, I think the easiest way is to use 2 loops, one to loop over rout ID groups and the other puts all rows into one:
KNIME_TransposeByRoute_ProcessAndTime.knwf (91.6 KB)

:blush:

2 Likes

Hi @armingrudd, I tried the loop in a larger data set but got this error:

Any clue what has gone wrong? Is your solution only an option for small data sets?

Try this one which is without loops:
KNIME_TransposeByRoute_ProcessAndTime.knwf (65.5 KB)

You may need to modify the workflow.

:blush:

2 Likes

Hi Armin,
It doesn’t work because I have different formats as column headers, in your example it works because all are strings…

Any other possible simple solution to my problem? Thanks in advance…

Hi again @B074534,
Actually the example data set is yours. I just checked the whole topic and the solution again.
In your example data set, all columns are string because you have column headers in the first row.
If in your current input all rows are real data (no column headers in data rows), you can use the last 3 nodes (Column Combiner, GroupBy and Cell Splitter).

:blush:

2 Likes