I am trying to pull data from Oracle and write to SQL Server. It's about 4 million rows running on my laptop and it takes 3-4 hours. What's the best way to speed this up?
Hi there!
3-4 hours seems really long. Do you perform some transformations on the data in between, or do you only read from Oracle and write to SQL Server? For reading data from databases there is a fetch size parameter, and for writing data to databases there is a batch size parameter. Both of them can be configured in the KNIME configuration file knime.ini, located in the KNIME installation directory. To speed up data processing in KNIME, check this blog page.
knime.database.fetchsize=<number>
- Sets the fetch size for retrieving data from a database. The default value depends on the JDBC driver used.
knime.database.batch_write_size=<number>
- Sets the batch write size for writing data rows into a database. The default value is 1, i.e. one row at a time.
The batch size parameter can also be configured directly in the Database Writer/Update nodes.
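For example, a minimal knime.ini sketch with both parameters set could look like the lines below (the values are only examples, not recommendations; I am assuming they are added as -D system properties after the -vmargs line, like other KNIME options):
-Dknime.database.fetchsize=10000
-Dknime.database.batch_write_size=10000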
Additionally, I recommend trying the new Database Integration nodes (I haven't compared them to the old ones, so I can't say whether they are faster) which are currently in KNIME Labs.
Check database nodes from new integration here: https://hub.knime.com/search?q=db%20labs
Br,
Ivan
Thank you Ivan. I am not doing any transformation. My flow is as follows (unless I am doing something incorrectly):
Database Connector (Oracle) --> Database Table Selector --> Database Reader --> Database Writer (SQL Server)
Batch size to write is 1000
Try batch size 10000
Will do
Hi @rameveer007!
If there is no transformation, then 3-4 hours is really slow. I think your flow is correct. Check which part is taking most of the time: the reading part from Oracle or the writing part to SQL Server.
You can use the Timer Info node for this, as it will give you the execution duration of every node.
@izaychik63 In my case I had very similar performance with batch size 1000 or 10000…
Br,
Ivan
It seems both read and write are taking 2-3 hours each. I will use the Timer Info node to check.
Thank you for your suggestion
Both? That can be both good and bad news. Check it and then we will try to get to the bottom of it.
Br,
Ivan
Thank you very much Ivan and Ipazin. I increased the writer batch size to 10K and it took significantly less time.
The read is still slow. Our data pipe is about 10 Gb/s from our data center, but it's taking the same amount of time, 3-4 hours. I changed the fetch size configuration to 1000 but did not see an improvement.
I wanted to update you so that you are aware.
Hi there!
You are welcome.
I see. You can increase it to 20,000, but I'm not sure that is the problem. What kind of data is in this table and how many columns does it have? If it is just standard data then OK; if not, maybe that is where the issue lies.
How much RAM have you assigned to KNIME? It is controlled by the -Xmx parameter in the knime.ini file. The -Dorg.knime.container.cellsinmemory parameter can also be increased accordingly.
Additionally, KNIME does in-memory processing if possible. If not (in the case of bigger tables), KNIME writes data to disk and compresses it, which you can switch off entirely by adding -Dknime.compress.io=false to the knime.ini file, at the expense of temporary disk space.
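As a rough sketch, the relevant part of knime.ini could then look like the lines below (the values are only examples; adjust -Xmx to your laptop's available RAM and edit the existing -Xmx entry rather than adding a second one):
-vmargs
-Xmx8g
-Dorg.knime.container.cellsinmemory=1000000
-Dknime.compress.io=false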
Here is a link I already provided, but check it for reference to the above-mentioned things:
https://www.knime.com/blog/optimizing-knime-workflows-for-performance
Additionally, anti-virus programs can be aggressive and slow down the process a lot, so check that as well.
Finally, I would suggest trying with 1,000, 5,000, 10,000 records (and so on) and measuring each run to see when the performance degradation comes into play. Maybe you can figure something out this way. At least that is what I would try.
Good luck!
Br,
Ivan
That is the real question. In my experience, reading LOB columns from Oracle is terribly slow and would explain the long run time.