KNIME Database Writer very slow, and batch size option is skipping rows

Dears – I’m facing an issue when attempting to write to an Azure SQL Database. I’m using the Database Writer (legacy) node and trying to write almost 500K rows to a new DB table. With a batch size of 1 the speed is extremely slow; it would take almost 5 days to upload this relatively small amount of data. I then tried a higher batch size of 100K, and the writer skipped 100K rows every time, so in the end only 5 rows were uploaded…
Am I doing something wrong there? Is there any special setup I need to do on the writer to make it faster?

Hi @bebeid,
1 and 100K seem to be two extreme cases for the batch size. This parameter determines how many records at a time are sent to the database. Have you tried setting it to 1000? This seems to be a good compromise.
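To illustrate what this parameter does, here is a minimal JDBC sketch (this is not KNIME’s actual code; the table, column, and method names are made up) of how rows are queued with addBatch() and flushed to the server in groups of batchSize via executeBatch():

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    class BatchInsertSketch {
        // Insert all rows, sending them to the server in groups of batchSize.
        static void insertRows(Connection con, List<String[]> rows, int batchSize) throws SQLException {
            String sql = "INSERT INTO demo_table (col_a, col_b) VALUES (?, ?)"; // hypothetical table
            try (PreparedStatement ps = con.prepareStatement(sql)) {
                int pending = 0;
                for (String[] row : rows) {
                    ps.setString(1, row[0]);
                    ps.setString(2, row[1]);
                    ps.addBatch();             // queue the row locally
                    if (++pending == batchSize) {
                        ps.executeBatch();     // one round trip sends batchSize rows
                        pending = 0;
                    }
                }
                if (pending > 0) {
                    ps.executeBatch();         // flush the remaining rows
                }
            }
        }
    }

With a batch size of 1, every row is its own round trip to the database, which is why writing 500K rows becomes extremely slow.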
Kind regards
Alexander

Hi @AlexanderFillbrunn – Thanks for getting back to me! Yes, in fact I did try multiple settings between 1 and 100K, including 1000, and still got the same result: the writer skips rows in increments of the batch size. So it writes row 1, then row 1 + BatchSize, then row 1 + BatchSize + BatchSize, and so on…

Hi,
that’s strange. I was told that the new DB nodes have been improved in terms of performance. Can you try those out?
Kind regards
Alexander

Hi @AlexanderFillbrunn, I did try the new DB nodes; however, they fail to write to the Azure DB because they generate incorrect syntax: the CREATE statement is produced with quotes around the table name and column names, resulting in the error below. If you’re familiar with this issue or know how to avoid it, that would be great, so I can try my luck with the performance of those new nodes.

ERROR DB Writer 0:202 Execute failed: [Simba]SparkJDBCDriver ERROR processing query/statement. Error Code: 0, SQL state: Error running query: org.apache.spark.sql.catalyst.parser.ParseException:
no viable alternative at input 'CREATE TABLE "userdb_ebeid_be.mapexAllDetails"'(line 1, pos 13)

== SQL ==
CREATE TABLE "userdb_ebeid_be.mapexAllDetails" ("Exclusion_Count_Cat" varchar(255), "Exclusion" varchar(255), "Scorecard_Category" varchar(255), "SMO" varchar(255), "Horizon" integer, "frcst_genertn_date" varchar(255), "Cluster" varchar(255), "horizon_days_val" integer, "frcst_date" varchar(255), "lag_days_val" integer, "intrvl_days_val" integer, "product_categ_name" varchar(255), "business_unit_name" varchar(255), "region_code" varchar(255), "uom" varchar(255), "subregion_id" varchar(255), "load_date" varchar(255), "loc_id" varchar(255), "Suffix" varchar(255), "AddRemove" varchar(255), "ds_horizon_qty" decimal, "ds_lag_qty" decimal, "dp_horizon_qty" decimal, "dp_lag_qty" decimal, "order_horizon_qty" decimal, "order_lag_qty" decimal, "shpmt_horizon_qty" decimal, "shmpt_lag_qty" decimal, "TAE" decimal, "Scaled_Actuals" decimal, "Scaled_TAE" decimal)
-------------^^^
, Query: CREATE TABLE "userdb_ebeid_be.mapexAllDetails" ("Exclusion_Count_Cat" varchar(255), "Exclusion" varchar(255), "Scorecard_Category" varchar(255), "SMO" varchar(255), "Horizon" integer, "frcst_genertn_date" varchar(255), "Cluster" varchar(255), "horizon_days_val" integer, "frcst_date" varchar(255), "lag_days_val" integer, "intrvl_days_val" integer, "product_categ_name" varchar(255), "business_unit_name" varchar(255), "region_code" varchar(255), "uom" varchar(255), "subregion_id" varchar(255), "load_date" varchar(255), "loc_id" varchar(255), "Suffix" varchar(255), "AddRemove" varchar(255), "ds_horizon_qty" decimal, "ds_lag_qty" decimal, "dp_horizon_qty" decimal, "dp_lag_qty" decimal, "order_horizon_qty" decimal, "order_lag_qty" decimal, "shpmt_horizon_qty" decimal, "shmpt_lag_qty" decimal, "TAE" decimal, "Scaled_Actuals" decimal, "Scaled_TAE" decimal).

Hi,
thanks for bringing this to my attention. The developer for this is currently on vacation, but I will let him know about it! I hope he can respond here once he is back.
Kind regards
Alexander

Hello bebeid,
it seems that for your database the quoted identifier setting is turned off, since double quotes are not allowed in the statement. You can either enable it as described in the documentation, using the DB SQL Executor node in KNIME, or you can change the identifier delimiter in the “Advanced” tab of the Microsoft SQL Server Connector node.
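If the target really is SQL Server / Azure SQL, the statement to run via the DB SQL Executor node could be the T-SQL session option SET QUOTED_IDENTIFIER ON, shown here as a small JDBC sketch only for illustration (the Connection object is assumed to already exist; this is not the node’s internals):

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;

    class QuotedIdentifierSketch {
        // Enable double-quoted identifiers for the current session (T-SQL).
        static void enableQuotedIdentifiers(Connection con) throws SQLException {
            try (Statement st = con.createStatement()) {
                st.execute("SET QUOTED_IDENTIFIER ON");
            }
        }
    }

Note that Spark SQL / Databricks does not accept double-quoted identifiers at all; there the identifier delimiter would have to be changed to backticks, which is what the “Advanced” tab option mentioned above is for.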

I’m puzzled why the database drops rows when using batch insertion. Did you find any errors in the database log file or in the KNIME log? KNIME performs batch insertion as described here.

Bye
Tobias

One more thing I noticed: are you using a Spark JDBC driver from Simba to connect to an Azure DB? If this is the case, please use a JDBC driver for Microsoft SQL Server, since Azure DB is very similar to a SQL Server database. For further details on how to register such a driver, please see this section.
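For reference, and assuming the target is indeed Azure SQL Database, a connection through Microsoft’s mssql-jdbc driver looks roughly like the sketch below (server, database, and credentials are placeholders; in KNIME the driver is registered in the preferences as described in the linked section rather than loaded in code):

    import java.sql.Connection;
    import java.sql.DriverManager;

    class AzureSqlConnectSketch {
        public static void main(String[] args) throws Exception {
            // Driver class shipped with Microsoft's JDBC driver for SQL Server
            Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver");
            String url = "jdbc:sqlserver://<server>.database.windows.net:1433;"
                    + "databaseName=<database>;encrypt=true";
            try (Connection con = DriverManager.getConnection(url, "<user>", "<password>")) {
                System.out.println("Connected to: " + con.getMetaData().getDatabaseProductName());
            }
        }
    }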

Hello. I have exactly the same error with batching. If the batch size is set to 10, I get row 1, then 11, then 21, etc… always row 1, then 1 + BatchSize, then 1 + BatchSize + BatchSize. I’m using the Apache Spark JDBC driver to connect to my Azure Databricks environment. @tobias.koetter, do you know any solution for this issue?

Hello,
with KNIME 4.1 we introduced the new Databricks integration, which allows you to use the existing database and big data nodes with your Databricks environment. We also have an example workflow that demonstrates the usage of the different nodes, including the DB Loader node, which should be used to upload data into the Databricks database. For more details on how to register the Databricks JDBC driver, see this section of the database documentation.
Bye
Tobias
