I have been struggling with getting both a small memory footprint and parallel SQL-statement execution. Using KNIME to orchestrate ETL for many sources, I need to process in parallel to stay inside the time window. At the same time keep the memory usage of KNIME (instances) down – memory used by KNIME/Eclipse cannot be used for caching/efficiency.
Using the jTDS driver to connect to SQL Server there is a hard-coded limitation: ‘one user’ equals ‘one statement’ at the time. There seems to be no way around this limitation (please let me know if there is a way). Meaning, if you want to process in parallel you need to start multiple instances of KNIME (java VM) – one parallel statement per instance. This brings up the memory footprint, especially for large workflows.
For connecting to SQL Server, the Microsoft JDBC drivers (sqljdbc_6.0) allow a work-around. Also this driver pools connection to the same server, but does it comparing more than just the host and username. Simply adding a dummy parameter with unique value to the connection string, will allow multiple connections and therefor multiple parallel statements to be executed. Now one workflow running in a single KNIME instance can execute statements within the time windows and with ‘low’ memory consumption.
Connection string: jdbc:sqlserver://<host>\<instance>:<port>;<other options>;x=<chunck-id>
Where <chunck-id> is a unique value to trigger a new connection to be made.
Please do reply to this post with other solutions if available.