Joining tables from two different postgresql

@bilallodhi welcome to the KNIME forum.

Question is what does that mean. If you can access them both by the same DB connector you might just use the DB Joiner – KNIME Community Hub node which in effect will create a SQL view. You can combine several DB (database) nodes so that the operation will be done on the database.

I do not think that a join using streaming is possible since the machine would not know which other possible values would be there. But that might depend on your actual data and what kind of databases we are talking about. Performance might also be influenced by things like indices (if they are being used).

In general it most often does make sense to do the larger database tasks on the DB. With big data operations my experience is that sometimes it can make sense to use temporary tables and do more complex operations in steps - while other databases (think Oracle) might have the option to optimise (nested) queries by themselves.

I think you might have to provide us with some more informations and maybe also talk to someone in or organisation who would know about performance tweaking in your database system.

If 5 or 7 million lines are large very much depends on the nature of the data :slight_smile: just considering the number of lines a strong machine might be able to do that with a KNIME installation :slight_smile: - depending on the complexity of the join.

Another thing to think about might be to do it in chunks and store the intermediate results (on a local machine Parquet would be a format that would be able to easily append intermediate results - also on a big data system). That again very much depends on the scenario.

4 Likes