Pass data table to join in query

I have a table that gets generated from a select statement, based on certain criteria, on a server/database we will call S1DB1. I now need to use the columns and all the possible rows coming through there to select data from another server/database we will call S2DB2. The two servers are separate and the database types are different as well. The userid I have on S2DB2 does not have permission to create temporary tables or the like.
I have to run the query say
SELECT FROM TABLE(S) IN S2DB2
JOIN TABLE FROM S1DB1
WHERE S2DB2.FIELD1 = S1DB1.FIELD1
AND … SO ON.

Is it possible to do this in KNIME, where the S1DB1 selected table is passed in its entirety to the query?

Hi @ak2020,

Unfortunately, I would say the answer is generally no.

KNIME passes SQL queries to the database, so unless you have a database with some special extensions to SQL of which I'm not aware (and drivers to match), KNIME is going to be restricted to that syntax.

In terms of blending or joining data from different data sources, of course KNIME can do that, but the data from each source needs to be brought down to KNIME and then joined. It’s a good tool but it cannot change the inherent behaviour of your database servers.

The only way I can think of achieving what you need with a database is if one of your database servers has provision for remote access to tables on the other server. If, for example, both databases were Oracle, then a database link could be created and tables in the SQL query could be referenced using the @remotedb syntax, as in the following pseudocode:

SELECT FROM TABLE(S) IN S2DB2
JOIN TABLE FROM S1DB1@otherdatabase
WHERE S2DB2.FIELD1 = S1DB1.FIELD1
AND … SO ON.

Other database vendors will have alternative mechanisms for remote tables. But that obviously requires that your database servers support this or a similar feature.

Maybe if we knew the types of databases that you are using, people may have some alternative ideas.

But how large is the table you would want to “upload”?

If it were relatively small (such as a very small lookup table, or a short series of IDs), it would be possible (but most likely still impractical) to build a SQL query in which the rows from your S1DB1 table are reconstructed as a UNIONed set of SELECT statements, e.g.

In SQL Server, where you can get away without supplying a table name in a select statement, something like this format:

SELECT t1.* FROM SDB2_TABLE t1
inner join
(select '1' as field1, 'A' as field2, 'B' as field3
union
select '2' as field1, 'X' as field2, 'Y' as field3
union
select '3' as field1, 'M' as field2, 'N' as field3) X
on X.field1=t1.field1
AND … SO ON.

or with Oracle, where you need to select from an Oracle-supplied one-row table called "dual" to achieve the same result, this format:

SELECT t1.* FROM SDB2_TABLE t1
inner join
(select '1' as field1, 'A' as field2, 'B' as field3 from dual
union
select '2' as field1, 'X' as field2, 'Y' as field3 from dual
union
select '3' as field1, 'M' as field2, 'N' as field3 from dual) X
on X.field1=t1.field1

In this case, the “pseudo table” X is constructed within the query as the following table:

field1 field2 field3
------ ------ ------
1      A      B
2      X      Y
3      M      N

and joined to the other table. However, unless your constructed dataset is small, I cannot imagine this being a good solution, and I wouldn't recommend it except perhaps as a last resort.
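If you wanted to generate such a query from the rows of your S1DB1 result automatically, a rough sketch in Python might look like the following. (The column names and data are made up for illustration, and the values are quoted naively; real code would need proper escaping or parameterisation.)

```python
# Sketch: reconstruct in-memory rows as a UNIONed set of SELECT statements
# (hypothetical table/column names; naive quoting for illustration only).

def build_union_query(rows, columns, from_clause=""):
    """Turn a list of row tuples into a UNIONed series of SELECTs."""
    selects = []
    for row in rows:
        parts = ", ".join(
            "'{}' as {}".format(value, col) for value, col in zip(row, columns)
        )
        selects.append("select " + parts + from_clause)
    return "\nunion\n".join(selects)

rows = [("1", "A", "B"), ("2", "X", "Y"), ("3", "M", "N")]
columns = ["field1", "field2", "field3"]

# SQL Server style (no FROM clause needed):
inline_table = build_union_query(rows, columns)

# Oracle style (each SELECT reads from the one-row table "dual"):
inline_table_oracle = build_union_query(rows, columns, from_clause=" from dual")

query = (
    "SELECT t1.* FROM SDB2_TABLE t1\n"
    "inner join\n(" + inline_table + ") X\n"
    "on X.field1 = t1.field1"
)
print(query)
```

The resulting string could then be fed into a DB Query node, but the same caveat applies: this only makes sense for a handful of rows.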


Yeah, we have two different database servers of the destination database type; one allows temporary tables to be created while this one specifically does not. Not sure why. And there is no way we can write a query that reaches out to another server and pulls data from there, due to security rules.
I guess the current option is to pull the subset from the second database, then use KNIME to join/filter the results of the first and second queries.

Thank you for your suggestion though, it will be handy where such an option is allowed.


Hi @ak2020, this is exactly the kind of situation I find KNIME very useful for: joining tables, or rather data, from different db servers.

You can do all this in KNIME instead of on the db servers. Basically, do the select and filter as much as you can on each db server separately (apply WHERE conditions such as a date range where possible), which will give you two KNIME tables, and then do your join via the Joiner node.

The only downside of doing it this way is that you will require some RAM.
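To make the idea concrete, here is a minimal sketch of what this client-side join effectively does, using two in-memory SQLite databases as stand-ins for the two servers (all table and column names here are made up):

```python
import sqlite3

# Two separate in-memory databases standing in for S1DB1 and S2DB2.
db1 = sqlite3.connect(":memory:")
db1.execute("CREATE TABLE t1 (field1 TEXT, descr TEXT)")
db1.executemany("INSERT INTO t1 VALUES (?, ?)",
                [("1", "alpha"), ("2", "beta"), ("3", "gamma")])

db2 = sqlite3.connect(":memory:")
db2.execute("CREATE TABLE t2 (field1 TEXT, amount INTEGER)")
db2.executemany("INSERT INTO t2 VALUES (?, ?)",
                [("1", 10), ("3", 30), ("4", 40)])

# Pull each result set down separately (apply WHERE filters server-side
# where possible), then inner-join in client memory on field1.
rows1 = {f1: descr for f1, descr in db1.execute("SELECT field1, descr FROM t1")}
joined = [(f1, rows1[f1], amount)
          for f1, amount in db2.execute("SELECT field1, amount FROM t2")
          if f1 in rows1]

print(joined)  # → [('1', 'alpha', 10), ('3', 'gamma', 30)]
```

The Joiner node does this for you (and more efficiently), but the principle is the same, which is also why the RAM requirement scales with the size of the two result sets you bring down.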


There is this example showing how to use KNIME streaming to transfer data between two different databases without having to store the data in KNIME.

The question would be whether you could construct something around that. One possibility would be to fast-transfer the table temporarily to the DB where you have write permission. If the list of IDs is not too large, you might pass them on as flow variables, or use them as rules in the streaming (only stream the ones you want). If there are too many, you could try using chunks.
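As a sketch of the chunking idea: split the list of IDs into batches and issue one IN-clause query per batch against the second server. (Table and column names below are hypothetical, and the quoting is naive; in practice you would use bind parameters.)

```python
# Sketch: split a long ID list into chunks and build one IN-clause query
# per chunk (hypothetical table/column names; naive quoting for illustration).

def chunked_in_queries(ids, chunk_size, table="S2DB2_TABLE", column="FIELD1"):
    """Yield one SELECT ... WHERE column IN (...) statement per chunk of ids."""
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        in_list = ", ".join("'{}'".format(i) for i in chunk)
        yield "SELECT * FROM {} WHERE {} IN ({})".format(table, column, in_list)

queries = list(chunked_in_queries(["1", "2", "3", "4", "5"], chunk_size=2))
for q in queries:
    print(q)
# Three queries: IN ('1', '2'), then IN ('3', '4'), then IN ('5')
```

In a KNIME workflow the equivalent would be a Chunk Loop over the ID table, feeding each chunk into the query via flow variables, then concatenating the results.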

Maybe you could set up an example with H2 and SQLite that represents your challenge, and we can check whether anything would work.

Of course if the data is not that large you could just do it all in KNIME as has been mentioned.

