Help with streaming

Hello Knime World,
I am very much a basic-level user and (guess what?) I need your help.

I need to filter a DB2 table with data coming from an Oracle table.
Apparently a very “classic” join, but I understand KNIME will not join tables that live in different databases.
I have read/write permissions in Oracle and only read permission in DB2.

KNIME lets me extract data from DB2 and write a table in Oracle, but the DB2 table has dozens of millions of rows and KNIME is installed on a machine with limited resources: my workflow stops with an out-of-space error. In order to test my workflow, I artificially filtered my query in DB2 and everything went fine - see picture Knime_1.

Digging a little deeper, I discovered the streaming executor, so the natural next step was to create a component with “simple streaming” as a way to avoid writing any data inside KNIME - see picture Knime_2.

And here is the problem: while the “no-stream” workflow works as expected, its “streaming” sibling stops with a “DB Writer failed: data” error - see picture Knime_3 and picture Knime_Log.
Still, the desired table is created in Oracle (and it is populated as expected!!), but my workflow stops and lets me down there.
So, same workflow: no-streaming is OK and streaming is KO… is this even possible?

Can you help me understand what I’m doing wrong? Because I know it’s my mistake…
Or maybe you could suggest different approaches?

Thanks everyone!
C.

[Screenshot: KNIME_1]
[Screenshot: KNIME_2]
[Screenshot: KNIME_3]

Hi @carl8 and welcome to the KNIME Community.

You can check this demo workflow by @tobias.koetter:

Alternatively, if you had to do data manipulation, you could read and write in batches, since you have a limited amount of resources and cannot do the whole operation at once.

You can FETCH with OFFSET in DB2, do your data manipulation, and write the manipulated data to Oracle. Put all this in a loop (see the sketch below).
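For instance, here is a minimal sketch of the DB2 side of one loop iteration, assuming a Db2 version that supports OFFSET (e.g. Db2 for LUW 11.1 or later) and a hypothetical table MYSCHEMA.BIG_TABLE with a unique key ID; in KNIME the offset would typically come from a loop-counter flow variable:

```sql
-- One batch of the paginated read (table and column names are placeholders).
-- ORDER BY a unique key so consecutive batches never overlap or skip rows.
SELECT *
FROM MYSCHEMA.BIG_TABLE
ORDER BY ID
OFFSET 0 ROWS               -- replace 0 with iteration * batch_size on each pass
FETCH FIRST 50000 ROWS ONLY;
```

Each pass then writes its 50,000-row chunk to Oracle before the loop advances, so only one batch has to fit in KNIME's memory at a time.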

In your example, you’re not doing any manipulation, so streaming is probably the best way to go.

1 Like

Depending on your use case, is it possible to extract your filter criteria as variables from DB1 and use them to filter your DB2 table?
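For example (a purely hypothetical sketch, assuming the filter criteria are a list of IDs sitting in an Oracle table called FILTER_IDS and that the DB2 table has a matching ID column):

```sql
-- Step 1 (Oracle): read the filter values, e.g. into a KNIME flow variable.
SELECT DISTINCT ID FROM FILTER_IDS;

-- Step 2 (DB2): filter the big table with those values; the literal list
-- would be substituted from the variable built in step 1.
SELECT *
FROM MYSCHEMA.BIG_TABLE
WHERE ID IN (101, 102, 103);
```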
br

Hi @Daniel_Weikert, just a small clarification: DB2 here is the name of the database system (IBM Db2), so it might get confusing if you use DB1 and DB2 to denote the first and second systems :slight_smile:

From what I understand, @carl8 is trying to read some data from a DB2 system and then write this data to an Oracle system. There is no filter to apply for the writing. The filter is applied at reading only.

1 Like

That’s correct, @bruno29a.
Thank you.

Hi @carl8, did you try the demo that I suggested? If it works, can you compare it with your workflow?

EDIT: It’s hard to troubleshoot when we don’t have access to the workflow.

One thing I wanted to confirm, is it breaking because of the streaming, or is it breaking because something is missing in the component? Have you tried to run it as a component with no streaming?

It’s a common mistake where we don’t make sure that ports flow through inside the components - by default they do NOT. For example, are you sure you are including the connection ports inside the component? You have to manually make sure that the Component Input (Node 99) is passing the 2 connections along inside the component.

EDIT 2: Nevermind the above, it looks like this is applicable only for Flow Variables inside a component. DB connections seem to be fine.

I re-created your workflow on my end (with a different DB system, though), and it worked without any issue:
[Screenshots of the recreated workflow]

Is it normal that your DB Table Selector and DB Reader did not execute? Maybe that’s why the DB Writer failed, because it did not get any data.

Hello again and thanks for your answers.

I made a copy of the suggested demo workflow and edited it where required: same results, same error, same logs.

@bruno29a I am willing to share my workflow privately, if possible.

Moving on, I successfully ran the component without streaming: it goes through with no errors, just like the plain nodes outside a component. So apparently the streaming is causing trouble (even though the Oracle table is there and it’s correct!).

I am not sure I understand what you mean by “we don’t make sure that ports flow through inside the components”; I opened the component and checked the Component Input inner ports configuration: that seems OK. Is this what you mean?

Finally, the DB Table Selector and DB Reader were pictured after the workflow was stopped by the error; they seem to reset and go back to the yellow light, but they had executed correctly.

One difference I noticed between Tobias’ workflow and ours is that both of our Component Outputs lack a connection to the DB Writer node (and also an input port!).


PS: I can’t seem to be able to edit my first message and correct typos… Am I not allowed?

Hi @carl8

As I clarified in my EDIT 2, this is applicable only for Flow Variables. Flow Variables do not flow through components by default. You have to manually include them. This is not applicable in your case, and you additionally confirmed that the component works well when not streaming, so there’s no problem on that side.

Regarding the missing Component Output connection: it should not make a difference. Besides, it works for me.

I’m not sure why it’s not working for you.

Hello @carl8 ,
I was able to reproduce the problem and created a bug ticket for it (internal reference AP-18648). As a workaround you can select the “Disable the DB Data output port” option in the node dialog.
Bye
Tobias

5 Likes

Hello @tobias.koetter.

The workaround you suggested worked perfectly: the table is created in Oracle and there is no workflow error.
Thank you very much (and thanks again to @bruno29a, too)!!

Bye,
c.

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.