Hello everyone. I’m very new to KNIME and trying to learn about loops to decrease memory usage while writing my query results to a database. The flow below works like this:
- “query for variable” pulls the distinct “year” values from my table, because I want to split the query results by year and write one year’s data at a time to decrease memory usage, as I mentioned above.
- I used a Column List Loop Start (I also tried a Generic Loop Start, with the same result imo), and it gets all the distinct year values as I want.
- Then I used a Table Row to Variable node to pass the value as a flow variable to the “query”.
- The Loop End collects all the data (that’s not what I want) and then writes it to DB2.
My problem is with step 4. I don’t want all the data to be collected in the Loop End node at once. My workflow should, for example, collect the data for year 2012, write it to the DB table, then collect the 2013 data and append it to the table, and so on.
I’m totally aware that this is a funny question, but still…
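The year-by-year append idea itself is sound. As a hedged sketch of what the workflow is going for (outside KNIME, with hypothetical in-memory SQLite databases standing in for the Oracle source and DB2 target so it is runnable):

```python
import sqlite3

# Hypothetical stand-ins for the Oracle source and DB2 target tables.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE sales (year INTEGER, amount REAL)")
src.executemany("INSERT INTO sales VALUES (?, ?)",
                [(2012, 10.0), (2012, 20.0), (2013, 5.0)])

dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE sales (year INTEGER, amount REAL)")

# One "loop iteration" per distinct year: fetch only that year's rows
# and append them to the target before the next year is queried.
years = [r[0] for r in
         src.execute("SELECT DISTINCT year FROM sales ORDER BY year")]
for y in years:
    rows = src.execute(
        "SELECT year, amount FROM sales WHERE year = ?", (y,)).fetchall()
    dst.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    dst.commit()  # each year's chunk is persisted before the next is fetched
```

Only one year’s rows are held in memory at a time; the table names and data here are made up for illustration.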
You can probably remove the ‘Column List Loop Start’ node, as it is redundant; besides that, it would need its own loop end to make the nested loop work.
The second variable port connection is redundant as well.
and welcome to the KNIME Community!
To my knowledge, using loops actually increases memory usage, so you want to avoid them as much as possible. Additionally, writing into a database depends on your database, node configuration (mostly batch size), data size and type, and the connection between KNIME (your machine) and the DB (server machine). So simply try an Oracle Connector - DB Query Reader - DB Writer flow. If you already did that and it’s slow, you can try increasing the batch size.
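To illustrate what a larger batch size does, here is a minimal sketch (not KNIME itself; a hypothetical helper over an in-memory SQLite table, standing in for the DB Writer’s batching behavior):

```python
import sqlite3
from itertools import islice

def write_in_batches(conn, rows, batch_size=5000):
    """Insert rows in fixed-size batches, committing after each batch.
    Roughly what raising the 'batch size' in a DB writer amounts to:
    fewer round trips and commits per row written."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        conn.executemany("INSERT INTO t VALUES (?)", batch)
        conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v INTEGER)")
# 12 rows written in batches of 5 (5 + 5 + 2).
write_in_batches(conn, ((i,) for i in range(12)), batch_size=5)
```

The table name, helper function, and numbers are assumptions for the sketch; the real tuning knob is the batch-size setting in the DB Writer node’s configuration.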
To speed up database to database data transfer/migration check this workflow example that utilizes streaming option within KNIME:
Hope this helps!
Hi there. Ty for your reply. I have updated my flow, but still… it collects all the data in the Loop End node, and I have to figure that out.
Hi Ivan. I wasn’t aware that loops can increase memory usage. I thought that running a query for one block of dates, writing it to the DB, and then running the query again for a different block might reduce the data held in memory. If it’s as you say, then yes, increasing the batch size is a viable option (I am doing that for now, but some queries give me over 500m rows, so it takes some time).
My KNIME version (company app) doesn’t have streaming, but I will check the flow anyway.
Wow! 500m is quite a lot, and you should definitely try streaming, as it will avoid writing data to your disk. I hope you are not planning to do this on a daily basis. In case you are, doesn’t Oracle have some option/functionality/tool for such data migration?
Regarding memory usage: I’m not a huge expert, and maybe something has changed here in the last year or two, but loops did increase RAM usage. In some cases, dividing the data into smaller sets using a loop can help, as it lets you avoid writing data to disk. Check this part of the documentation on optimizing KNIME and workflow running time:
Ivan, yes, that’s a lot, but at least it’s not on a daily basis.
I should go talk to my supervisor or division head about acquiring a version with the streaming option, as I understand it.
Ty for the comments.
You don’t need to acquire a “KNIME version with streaming”. It’s just an extension that you install, and that’s it. See here for more on installing extensions and integrations:
And a little bit about streaming (old, but I believe it’s still valid):
Oh, btw, this connection solved my problem imo. At least it does not store the data fully in the Loop End node and instead writes it to the DB2 table in parts.
But I will still keep in mind that loops increase RAM usage rather than decreasing it.
@efegurtunca you do not have to collect the data in the end; you can just close the loop and maybe collect some meta information with:
This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.