ETL Help: Loading SAS Datasets to MS SQL Server

I am fairly new to KNIME and currently taking courses on KNIME via Udemy. However, I have an urgent ETL project and need help. The project is to read more than 100 SAS datasets (*.sas7bdat), clean them a bit in terms of renaming columns, etc., and then load them to MS SQL Server on a weekly basis, without creating the corresponding tables on the server by hand. Each dataset has its own unique structure. So far I have managed to read one SAS dataset with the SAS7BDAT Reader node and was able to connect to SQL Server via the MS SQL Server Connector node. But that’s it, and I believe I am missing many steps in between to be able to write even one SAS dataset to SQL Server, let alone to automate the process to read and load 100+ datasets. I appreciate any guidance on what steps to follow to accomplish this task.

Thank you,

1 Like

For a case like this, you can find some information here

1 Like

Hi @helalmob and welcome to the forum -

Here’s an example workflow to get you started. You’ll need to adapt it to include whatever cleanup you want to do on your data, and of course this isn’t a one-size-fits-all solution - but maybe it will give you an idea of how you can use the DB nodes, along with flow variables, to govern database table names.

I’m using an H2 Connector here, but of course you can swap this out for a Microsoft SQL Server Connector with appropriate credentials.

sas7bdatToDatabaseExample.knwf (27.1 KB)

I didn’t post this example to the Hub since it’s fairly specific to your problem, but please do check the KNIME Hub for other example workflows on working with DB nodes, loops, and flow variables.
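
If it helps to see the loop logic spelled out in plain code, here’s a rough Python sketch of what the workflow does: read each .sas7bdat file, apply some light cleanup, and write it to a table named after the file. The folder name and connection string are placeholders, not part of the workflow itself.

```python
import pathlib

import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string -- adjust server, database, and driver.
engine = create_engine(
    "mssql+pyodbc://user:secret@myserver/mydb"
    "?driver=ODBC+Driver+17+for+SQL+Server"
)

for path in sorted(pathlib.Path("sas_files").glob("*.sas7bdat")):
    df = pd.read_sas(path, format="sas7bdat", encoding="latin-1")

    # Light cleanup, e.g. renaming columns, goes here.
    df.columns = [c.strip().lower() for c in df.columns]

    # The file name becomes the table name, just as the flow variable
    # does in the workflow; to_sql creates the table if it is missing.
    df.to_sql(path.stem, engine, if_exists="append", index=False)
```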

5 Likes

Thank you so much for sharing the PDF with me. It’s been very helpful.

2 Likes

Scott, this is awesome and thank you for sharing the workflow with me. I will certainly check the Hub for additional DB nodes.

Helal

3 Likes

Hi Scott,
I modified the workflow you shared with me and replaced H2 with MS SQL. For some SAS datasets, the flow works perfectly. For others, I get this error:
Error while adding rows #-1000 - #0, reason: String or binary data would be truncated.
I did a lot of searching but couldn’t find a viable solution for where and how to address this issue. Any thoughts/suggestions?

Thank You,

Helal

Hi @helalmob -

In which node does this error happen? Is it the SAS7BDAT Reader node? If so, does it happen when you read a single file outside of the loop structure entirely?

Scott,
It happened at the DB Writer node, and as I mentioned, it happens with some SAS files but not all. I am able to read and write multiple SAS files, which is great, and it’s very fast. However, I believe some SAS files have long character values or perhaps hidden characters.

Helal

Hello helalmob,
the DB Writer node in KNIME uses the default character length of the database, which for MS SQL Server is 255 characters. If you want to influence the character size, you can replace the DB Writer with the DB Table Creator node and modify the default string mapping via the “Dynamic Type Settings -> KNIME-Based SQL Type Mapping” tab as shown below:
[screenshot: Dynamic Type Settings -> KNIME-Based SQL Type Mapping tab of the DB Table Creator]
Further information about the DB Table Creator can be found in the DB Documentation.
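
If you are not sure how long your string values actually are, you can scan the files outside of KNIME first to decide on a suitable size. Here is a minimal Python sketch, where the folder name is a placeholder:

```python
import pathlib

import pandas as pd

# Report the longest string per column in each file, so you know what
# character size to map in the DB Table Creator.
for path in sorted(pathlib.Path("sas_files").glob("*.sas7bdat")):
    df = pd.read_sas(path, format="sas7bdat", encoding="latin-1")
    for col in df.select_dtypes(include="object").columns:
        max_len = df[col].astype(str).str.len().max()
        if max_len > 255:  # would not fit the 255-character default
            print(f"{path.name}: {col} needs at least {max_len} characters")
```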

You only need to execute this node once to create the table in the database. Once the database table is available, you can use the DB Insert node to insert the data.
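
In plain SQL terms, the two nodes roughly correspond to the following sketch (shown here with pyodbc, hypothetical table and column names, and the string mapping widened from the 255-character default to 1024):

```python
import pyodbc

# Placeholder connection string -- adjust to your SQL Server setup.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver;DATABASE=mydb;UID=user;PWD=secret"
)
cur = conn.cursor()

# DB Table Creator: run once, with strings mapped to 1024 characters.
cur.execute(
    "CREATE TABLE dbo.my_sas_table ("
    "  subject_id NVARCHAR(1024),"
    "  visit_note NVARCHAR(1024),"
    "  visit_count INT)"
)

# DB Insert: run on every load, inserting into the existing table.
cur.executemany(
    "INSERT INTO dbo.my_sas_table (subject_id, visit_note, visit_count) "
    "VALUES (?, ?, ?)",
    [("S001", "first visit", 1), ("S002", "follow-up", 2)],
)
conn.commit()
```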
Bye
Tobias

2 Likes

Hi Tobias,
Thank you for the detailed information. In Scott’s workflow that I borrowed and modified, I am using both the DB Table Creator and the DB Writer (please see the attached screen capture). What you are saying is that I don’t need the DB Writer if I am using the DB Table Creator, where I can modify the default string character length. Since the workflow reads multiple SAS datasets and imports them to SQL, I assume removing the DB Writer from the flow and using only the DB Table Creator and DB Insert will not impact reading and importing multiple files. Any other suggestions regarding the workflow?

Thank you,
Helal

Tobias,
The DB Insert node requires selecting a table, while I am importing multiple tables from SAS. Any suggestions?

Thank you,

Helal

Hi Helal,
the DB Writer node is a combination of the DB Table Creator and DB Insert nodes, intended to make it easier to write data into a new table in the database. However, usually one creates a DB table once and then continues to insert records into the already created table, which is why we split up the two operations. The DB Table Creator also allows you to specify indices, e.g. primary/unique keys, which you should consider creating to speed up querying or updating the data in the db later on.
You can use flow variables to automatically use the same schema and table name in the DB Insert node as in the DB Table Creator node. In fact, the DB Table Creator node automatically exposes the table name and schema as flow variables (they are called tableName and schema). In the DB Insert node you can use these flow variables via the two buttons next to the schema and table name input fields as shown below:
[screenshot: DB Insert configuration with the flow variable buttons next to the schema and table name fields]
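
In code terms, this wiring simply means one pair of variables feeds both statements. A minimal sketch with hypothetical names, including a primary key as suggested above:

```python
import pyodbc

conn = pyodbc.connect("DSN=mydb")  # placeholder DSN
cur = conn.cursor()

# One pair of variables drives both statements, which is exactly what
# passing the tableName/schema flow variables into DB Insert achieves.
schema, table = "dbo", "btact"  # hypothetical names

cur.execute(
    f"CREATE TABLE {schema}.{table} ("
    "  record_id INT PRIMARY KEY,"  # a key speeds up later queries/updates
    "  note NVARCHAR(1024))"
)
cur.execute(
    f"INSERT INTO {schema}.{table} (record_id, note) VALUES (?, ?)",
    (1, "first record"),
)
conn.commit()
```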
Bye
Tobias

1 Like

Thank you again, Tobias. I followed your instructions and the DB Insert node has a green light, but no tables are written to SQL. Here is the new workflow:
[screenshot: updated workflow]
Also, I can’t make a connection between the DB Insert node and the Variable Loop End node. Do I need the latter at all? Is the loop node the reason the tables are not in SQL?

Helal

Hi Helal,
you need to connect the DB Insert node to the Variable Loop End node to close the loop; otherwise the loop body will only be executed once. You can make the connection using the flow variable ports that each node has but that are hidden by default.
For more details on flow variables and loops in KNIME, have a look at our online course.
Bye
Tobias

1 Like

Thank you again, Tobias. I will do that.

Helal

Hi Tobias,
I followed the instructions and was able to get it to work yesterday by importing 10 SAS data files to SQL. Today, I received the “String or binary data would be truncated” error again, even though I set the string size in the DB Table Creator SQL type mapping to 1024, and one time I set it to MAX. I looked at the log to see which file(s) are causing it but didn’t find specific file or field information. Any thoughts?

Thank You,

Helal

Hi Scott,
With Tobias’s help, I am getting very close to completing my ETL process of reading multiple SAS datasets and importing them to SQL. However, after troubleshooting a few issues, another one emerged. This one is at the DB Insert node, and the error says: cannot insert Null into column on table. I made sure that the “Not Null” option in the DB Table Creator node is unchecked for all columns in the datasets. Attached is the new KNIME flow. Any suggestions on how to resolve the issue?

Thank you,
Helal

Not offhand, but I am not a DB expert, unfortunately. Do you have any additional information from the log, perhaps? Let me ask internally.

Thank you. Attached is the log.
Knime.log (41.0 KB)

Hello helalmob,
is it possible that the table dbtest02.dbo.btact already existed in the DB with the not null option enabled? By default the “Fail if table exists” option in the DB Table Creator node is disabled, which is why you might not have noticed it. In order to recreate the table, simply add a DB Table Remover node before the DB Table Creator node.
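
In SQL terms, putting a DB Table Remover in front of the DB Table Creator amounts to something like this sketch (connection details are placeholders):

```python
import pyodbc

conn = pyodbc.connect("DSN=dbtest02")  # placeholder DSN
cur = conn.cursor()

# DB Table Remover: drop any stale table first, so constraints from an
# earlier run (such as NOT NULL) cannot linger. IF EXISTS requires
# SQL Server 2016+; on older versions check sys.tables first.
cur.execute("DROP TABLE IF EXISTS dbo.btact")

# DB Table Creator: recreate the table with nullable columns.
cur.execute("CREATE TABLE dbo.btact (note NVARCHAR(1024) NULL)")
conn.commit()
```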
Bye
Tobias

2 Likes