KNIME direct connect with Delta Lake in Azure Cloud

Hello, Community,
I have a Delta Lake table stored in an Azure Storage Account (Data Lake Storage Gen2). A Synapse Serverless SQL pool serves this Delta table. We can successfully load the data through Serverless SQL and perform the operations we need in KNIME.
Unfortunately, we run into blockers when writing the processed result back as a Delta table, either through Synapse Serverless SQL or directly to the storage account. Does anyone have experience with the following?

  1. Writing a Delta table ([Home | Delta Lake](https://delta.io/)) to an Azure Storage Account.
  2. Merging into the Delta table instead of overwriting or appending.
  3. Even better, reading directly from the storage account in Delta table format.

Kind regards
Loujiang

Hi @Loujiang,
Can you tell us a bit more about the concrete blockers you are facing? I do not have experience with Delta Lake specifically, but I have debugged many database connections in KNIME before. So if you describe what is going wrong, I might be able to help.
Kind regards,
Alexander

Hello, Alexander,
I want to write the table back to Azure Storage Account Gen2 in Delta format (Parquet files plus the special metadata that marks them as a Delta table). Currently, I can only write regular Parquet files. Is there a possible solution for this?
Best wishes
Loujiang

Hi,
Can you read the files with our Parquet Reader and the Azure Data Lake Storage Gen2 Connector instead of SQL? Or is the metadata so special that it cannot be read?
Kind regards,
Alexander

Hello,
I can read the Parquet files with the Parquet Reader, as follows:
image

However, the data read in this way is not what we want.
The folder contains multiple Parquet files, generated by the following (Delta) logic:

  1. Each version of the Delta table refers to one or more Parquet files.
  2. A default read uses the metadata to read only the Parquet files that make up the latest version.
  3. Writing back also requires metadata management. In our normal process, the Delta merge maintains it automatically.

With KNIME, however, the Parquet Reader reads all Parquet files and ignores the metadata, so the workflow reads the results of every transaction.

best wishes
loujiang

Hi,
In what format do you have the metadata? Could you read that as well and use it to control the Parquet Reader?
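To illustrate the idea, here is a minimal Python sketch. It assumes the table follows the open Delta Lake layout, where a `_delta_log` folder next to the Parquet files holds one JSON commit file per table version, and each line of a commit is an action such as `add` or `remove` with a `path` to a data file. The function names (`active_parquet_files`, `build_demo_table`) and the demo file names are made up for this example. The sketch also assumes the log is available on a local path and that no checkpoint files have replaced older JSON commits, which a real table may not guarantee.

```python
import json
import tempfile
from pathlib import Path


def active_parquet_files(table_dir):
    """Replay the JSON commits of a Delta log in version order and return
    the data files that make up the latest table version.

    Assumption: only JSON commits are considered; checkpoint Parquet files
    in _delta_log are ignored by this sketch.
    """
    log_dir = Path(table_dir) / "_delta_log"
    active = set()
    # Commit files are named with a zero-padded version number,
    # so sorting by file name replays them in version order.
    for commit in sorted(log_dir.glob("*.json")):
        with commit.open(encoding="utf-8") as fh:
            for line in fh:
                if not line.strip():
                    continue
                action = json.loads(line)
                if "add" in action:
                    active.add(action["add"]["path"])
                elif "remove" in action:
                    active.discard(action["remove"]["path"])
    return sorted(active)


def _write_commit(log_dir, version, actions):
    # One JSON action per line, file name padded to 20 digits.
    path = log_dir / f"{version:020d}.json"
    path.write_text("\n".join(json.dumps(a) for a in actions), encoding="utf-8")


def build_demo_table(root):
    """Create a tiny, hypothetical Delta log with two versions (no data files)."""
    log_dir = Path(root) / "_delta_log"
    log_dir.mkdir(parents=True)
    # Version 0 adds two files.
    _write_commit(log_dir, 0, [
        {"add": {"path": "part-0000.parquet"}},
        {"add": {"path": "part-0001.parquet"}},
    ])
    # Version 1 rewrites the first file, as a merge might do.
    _write_commit(log_dir, 1, [
        {"remove": {"path": "part-0000.parquet"}},
        {"add": {"path": "part-0002.parquet"}},
    ])
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_table(tmp)
        print(active_parquet_files(tmp))
```

The resulting list contains only the files of the latest version, so it could in principle be used to restrict which Parquet files are read instead of reading the whole folder. Paths in a real log are relative to the table root and may be URL-encoded, so they might need decoding before use.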
Kind regards,
Alexander
