I am new to Knime so am still feeling my way around - any help would be appreciated
I have multiple xls files (although in the future i think they could be any format) that i want to join together. I have one file as my base and then would like to join (left outer) other files as needed.
I want to join reference tables which wouldnt have the same columns as the core set of data (although of course there will be links to each set)
The most pertinent example i can find is the joiner but is limited to 2 joins. Ideally i dont want to have to do join on join so am looking for an alternative solution.
I am exploring using a loop to bring in all the sets in one area but not sure if i am heading the right direction
For your logic it is better to load files to DB and use SQL. Say MS-SQL Express is free and supports 10 GB database.
welcome to the community! Actually, there are two options: (a) upload the data into a SQL database and read the data using a single select statement with multiple joins or (b) have consecutive join nodes in KNIME.
AFAIK, there are no EXCEL-2-DB mapper nodes in KNIME (i.e., addressing an EXCEL sheet as a database table.
Keep up and kind regards,
I set up a demo how this could be done in KNIME although you could do that in consecutive steps or directly in a SQL database (as has been mentioned) but it is possible. Here I use SQLite which I like because it is a ‘real’ SQL database but you can just have it within one single file (you could switch that out for any SQL database).
The Join itself is done in the SQL Executor node. Plus: you have it in one step. The downside could be it might get a little bit confusing with all the steps, but you have it in one step (sort of ).
DROP TABLE IF EXISTS data2;
CREATE TABLE data2 AS
FROM data1 d1
LEFT JOIN dimension1 t1
ON d1.ID1 = t1.ID2
LEFT JOIN dimension2 t2
ON d1.ID1 = t2.ID3
kn_example_sqlite_multiple_joins.knwf (63.8 KB)
Nice approach! Quick question: Does one need to install some SQLite-driver or is this already all-in into KNIME?
SQLite should just work like that with KNIME. You just ‘adress’ a file on your computer and if it is not there it gets created. I like SQLite because:
- it is a ‘real’ SQL-DB
- you can use all the SQL stuff with it
- it keeps your formats and variable types intact (mostly)
- you can use it to transfer data ‘intact’ to other programs like R or Python
- no fixed limitations like 65k lines
- you can store multiple databases in one file
- you might switch out the SQLite with any other SQL-DB and keep the structure of your KNIME workflow (if you need more serious power)
Terms and conditions apply limitations would be the power of your system.
That’s good news! Thank you!
Thanks, this and your prevoius post is great. Much appreciated
I am just not sure what this means please “You just ‘adress’ a file on your computer and if it is not there it gets created”
I am at the point where i am configuring the my database writer and am not sure what driver and URL i should i use.
Do i need to create an empty database and reference it for the writer?
Let’s do this by the book.
@mlauber71 showed quite nicely the 5 steps necessary to do the trick:
- Set up the SQLite connector (the orange node without a box around).
- Read the EXCEL files (the orange nodes in the yellow box).
- Store them into SQLite data base (red nodes with yellow box around).
- Join the data (yellow node followed by orange node in a yellow box).
- Store the data (two red nodes without a box around).
To what step do you refer to when asking about a path / URL for a database writer?
You can just use the SQLite connector and enter a URL like:
Then the database would sit right in the folder where the workflow is like in my example.
I like to have my files stored in a separate folder /data that sits on the same level as my workflow. Then the reference would look something like this:
(you could use any name “db” is just my idea. You just have to constantly reference the same file each time you want to use the data)
And yes. For SQLite in KNIME you do not need a special driver or anything you just use the “SQLite Connector” node. It is as simple as that (I was confused about this also first because I toyed around with MySQL and had to install a lot of things - but SQLite you just have a file).
As I said: another benefit of SQLite is that there are drivers and connectors for R or Python so you could transfer your data (even multiple databases) if you have to.
I am using a path to my local drive for the SQLite connector - will that work?
I am getting an error msg “ERROR SQLite Connector 0:1 Execute failed: Could not create connection to database: [SQLITE_CANTOPEN] Unable to open the database file (out of memory)”
Hi @Patrick1974 - I am referring to step 1. How was that db.sqlite created?
as far as I understood @mlauber71, the file is created by typing the name into the DB-connector node. Please read the build in help to the SQLite connector node.
Hmm out of memory sounds like there is not enough space left. Could you check the following:
- make sure there is enough space on your disk
- check you knime.ini file if you have specified enough RAM for your files
- try to give the db.sqlite table a new name like db1.sqlite
- open the SQLite connector and see if it says anything about the location
- try to specify a different location (you would later want to have them all in a meaningful structure, but that is another story)
I found on my Mac that when the db.sqlite was originally there within one session and I would deliberately delete it from the file system it would not recreate the file - even if I had run it again and the node is green. Simple solution was to give it a new name. I have not tested what would happen if I restarted KNIME.
Could be some sort of bug or a feature to be added in a future KNIME version but not a big deal.
Thanks @mlauber71 - I think I must of had a bad connection for that error message but I am all sorted now and is working as expected. appreciate the assist.