Uploading CSV with Different Schema Using a Loop

Hi,

I am trying to read 1000 CSV files with different schemas and upload them to Google BigQuery using a loop, but I get an error:

ERROR File Reader 0:5 Execute failed: For input string: "9.77" In line 2 (Row0) at column #10 ('Ethnic Group: White and Black African; measures: Value').

I think this happens because the schemas of the CSV files are different. How can I solve this issue?

Thanks,

Hi @datazoom, I don't have KNIME in front of me right now, but the CSV Reader has an option in the Advanced tab for "Support different schemas" or something like that.

Also, the CSV Reader has the option of reading multiple files on its own without having to run it through a loop.
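If it helps to see the idea outside of KNIME, the same "read many files, each with its own schema" logic looks roughly like this in pandas (the folder path here is made up):

```python
# Read every CSV in a folder, letting each file keep its own schema.
# The folder path is a placeholder.
import pathlib
import pandas as pd

for path in pathlib.Path("data/csv_files").glob("*.csv"):
    df = pd.read_csv(path)  # column types are inferred per file
    print(path.name, df.shape)
```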


Hi @bruno29a,

Thank you for your support.

I think you are referring to the CSV Reader (Labs) node. When I try to read all the files in a folder, it again gives an error saying that the number of columns differs.

I tried to get the URL from another file using a flow variable, but it does not accept the variable, and I do not know why.

[screenshot of the failing configuration]

I am not sure what I am doing wrong.

Thanks again.

Hi @datazoom, if you are seeing the "Files in folder" option only in the Labs version, I'm assuming that you are running a quite old version of KNIME.

The regular CSV Reader node in newer KNIME versions offers this option:

[screenshot: CSV Reader "Files in folder" option]

And this is the option for supporting different structures across files:

[screenshot: CSV Reader option for differing file schemas]


@datazoom And if you want to do it with a loop, the configuration setting is in the Loop End node.
br

Thank you again. You are right! I updated the version, but I cannot use the folder-read option because it tries to combine the files. I managed to create a loop, but it still gives an error while reading each table:

ERROR Variable Condition Loop End 0:51 Execute failed: Input table's structure differs from reference (first iteration) table: different column counts 64 vs. 15.

When I click execute again, it continues. Still, this is easier than uploading the files manually.

Thank you for your help!

Hi @datazoom ,

Wouldn't reading the files via a loop do the same, unless you are processing the data and writing it somewhere within the loop?

Can you show us what your workflow looks like, so we can understand what you are doing? It's best if you can share your workflow; if you can't, can you at least show a screenshot of it?

I'm also trying to understand why you are using a Generic Loop Start and a Variable Condition Loop End.

I have 1400 CSV files in 140 different schemas. Each file contains a dataset about a different property of a portfolio; there are 10 portfolios with 140 properties each, which adds up to 1400 files. I am trying to create 140 tables in BigQuery (one table per property) and append the 10 portfolios' data to these tables.

So, I do not want to combine the data included in the CSV files.
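To make the intended logic concrete, here is a rough Python sketch of what I want to happen; the file-naming rule and the dataset name are placeholders I made up, not the real ones:

```python
# Group the 1400 files into 140 per-property buckets, then append each
# bucket's 10 portfolio files to one table per property.
# Assumes (hypothetically) files named <portfolio>_<property>.csv.
import pathlib
from collections import defaultdict

files_by_property = defaultdict(list)
for path in pathlib.Path("data/csv_files").glob("*.csv"):
    portfolio, prop = path.stem.split("_", 1)
    files_by_property[prop].append(path)

for prop, paths in files_by_property.items():
    print(f"table my_dataset.{prop}: append {len(paths)} portfolio files")
```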

BIgQueryDataUpload 1.knwf (31.4 KB)

I created a simplified version of my workflow. I am doing this for the first time, so if there is something else I should upload, please let me know.

And after creating the tables, I change the DB Table Creator to a DB Loader and run the loop again.

I tried different loops, and the combination of Table Row To Variable Loop Start and Variable Condition Loop End worked.

Thank you again!

Hi @datazoom, thanks for sharing your workflow. As I guessed, you are writing the data you read to somewhere, in this case to Google BigQuery, so the use of the loop is justified.

In your case, you can end the loop with a Variable Loop End attached to your DB Table Creator, instead of the Variable Condition Loop End.

You should not get any error after that.

Also, if you are going to write to different tables, you need to create the table names dynamically via a variable and pass that variable to your DB Table Creator.
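Conceptually, that means deriving the table name from the current file inside the loop. A tiny sketch of the idea (the naming rule, project, and dataset are invented):

```python
# Derive the target table name from the file being processed.
# The naming rule and the project/dataset are placeholders.
import pathlib

path = pathlib.Path("data/csv_files/portfolio01_rentals.csv")
table_name = path.stem.split("_", 1)[1]           # -> "rentals"
table_id = f"my-project.my_dataset.{table_name}"  # full BigQuery table id
```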

Are you trying to write to the table though? Or just create the table?


Hi @bruno29a,

Thank you very much for your help. I changed the loop end to a Loop End node and ticked "Allow changing table specifications" and "Allow variable column types"; now it works without errors. :)

I created the tables first and then loaded the data into the tables.

Thank you again for your support! Much appreciated!


No problem @datazoom .

The DB Writer node will create the table for you if it does not exist, and write the data to it.

So the table creation (if it does not exist) and the data write are done in the same node.

There is no need to create the tables separately.
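For comparison, the same create-if-missing-and-append behaviour with the BigQuery Python client library looks like this (project, dataset, and file path are placeholders):

```python
# Load one CSV into a BigQuery table, creating the table if it is missing
# and appending if it already exists.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,  # skip the header row
    autodetect=True,      # infer the schema from the file
    create_disposition=bigquery.CreateDisposition.CREATE_IF_NEEDED,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)
with open("data/csv_files/portfolio01_rentals.csv", "rb") as f:
    job = client.load_table_from_file(
        f, "my-project.my_dataset.rentals", job_config=job_config
    )
job.result()  # wait for the load job to finish
```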


Thank you, I’ll use that next time.
