CSV Writer Node chicken-and-egg problem

Hello everyone,

I have a problem with the CSV Writer node. I created a workflow that reads text files and processes them according to certain criteria. From these text files I would like to write certain sections into separate CSV files. The filtering works, the variable writing of the CSV files works, and the whole thing runs in several loops. So far, so good. My problem is that the CSV Writer node does not take care of the number or position of the columns. In the forum I have already found solutions that help, using the Column Appender node, if you have an existing CSV file. And that is exactly my chicken-and-egg problem.
The names of the CSV files to be written are not known during the first run of the workflow, and new CSV files can appear during a cyclical procedure (e.g. via batch mode). The whole thing is of course variable, which means that, for example, 20 CSV files can result from one read-in text file, none from the next, or perhaps only one. What must my workflow look like at this point so that a first empty CSV file is written with the correct name, then this empty CSV file is loaded again, compared with the data in the workflow, and saved again?
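To make the column problem concrete outside of KNIME, here is a minimal Python sketch (the signal names are invented for illustration) of the difference between a positional CSV append and a name-based merge:

```python
import pandas as pd

# Two "loop iterations" with different column sets (hypothetical names).
run1 = pd.DataFrame({"Sig_A": ["Off"], "Sig_B": ["Off"]})
run2 = pd.DataFrame({"Sig_B": ["On"], "Sig_C": ["no Error"]})

# A raw CSV append is purely positional: run2's "Sig_B" value would land
# under run1's "Sig_A" header -- exactly the misalignment described above.

# A name-based concatenation aligns columns by header and fills the gaps,
# which is what Concatenate / Column Appender do inside KNIME.
merged = pd.concat([run1, run2], ignore_index=True)
print(merged)
#   Sig_A Sig_B     Sig_C
# 0   Off   Off       NaN
# 1   NaN    On  no Error
```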
Attached is a screenshot of my workflow section.

The CSV Writer node (Node 142) fails when the file does not exist yet.

Does anybody have an idea how to solve that problem?

Thanks, Brotfahrer

Hi @Brotfahrer,

I do not really understand your problem :thinking:
Do you maybe have an example workflow that shows your problem?

I cannot quite grasp what you want to say… :see_no_evil:

2 Likes

@Brotfahrer I also do not fully understand your problem. But if you have to operate with CSV files, you could maybe use a standard name in the first iteration and then copy or rename the file to its final name. You might have to construct URI paths and names for that.
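A minimal sketch of that idea in Python (the file names are assumptions): write the first iteration under a fixed standard name, then move the file to its final name once that name is known.

```python
import os

TEMP_NAME = "first_iteration.csv"  # fixed standard name (assumption)

# First iteration: write under the standard name...
with open(TEMP_NAME, "w") as f:
    f.write("date,ident\n")  # hypothetical header

# ...then move it to its final name once that name is known.
# os.replace also overwrites an existing target file.
os.replace(TEMP_NAME, "P0001.csv")  # hypothetical final name
```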

If everything else fails, you could employ R to write your CSV file. It would not check whether a file is there or not. Though that might be somewhat over-engineered …
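The same point, sketched in Python instead of R for consistency with the other snippets here: pandas' to_csv likewise just creates or overwrites the target file without any existence check.

```python
import pandas as pd

df = pd.DataFrame({"date": ["2020-01-01"], "ident": [42]})  # dummy data
# No existence check: the file is created or overwritten either way.
df.to_csv("some_result.csv", index=False)
```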

2 Likes

Hello,

thanks for your answers and links!

What I want to say is that the CSV Writer node can't sort the columns that have to be written in each loop. Unfortunately, my data is sometimes incomplete, or I have a file with extended data. With the option “Don’t write column headers if file exists” I end up, in the case of extended data, with columns in my CSV files that have no column names. The Concatenate node and the Column Appender node sort the columns automatically and append new columns if necessary.

For a better understanding I have copied a clip from one of my resulting CSV files. In row 2 there are different strings than in rows 1, 3 and 4. These strings in row 2 belong in other columns, but the CSV Writer node can't sort them.

AC_Err_PtMngmnt_Rq_AR2_1 AC_Err_PtMngmnt_Stat_AR2_1 AC_Err_StWhl_Angl_Stat_AR2_1 AC_Err_TxFWD_Rq_PtMngmnt_ECM_VAN_AR2_1 AC_Err_Tx_Rq_PtMngmnt_ECM_AR2_1 AC_Err_VehDyn_Stat1_AR2_1 AC_Err_VehDyn_Stat2_AR2_1 AC_Err_Whl_Lt_Stat_AR2_1
Off Off Off Off Off Off Off Off
no Error end position not reached engaged On no Error N-P P-N D TCC open 509.804
Off Off Off Off Off Off Off Off
Off Off Off Off Off Off Off Off

I know I can solve this, for example, with the Column Appender node, but in the initial first loop that creates a new CSV file, I have no CSV file to compare with or append to.
My real problem is the inconsistency of the data to be processed. Unfortunately, I cannot resolve this inconsistency, because I do not generate the data at this point. So I have to sort the data correctly at the end.
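Sorting at write time can also be expressed directly: first collect the union of all column names, then let the writer place every value under its header and leave missing fields empty. A minimal Python sketch with invented signal names:

```python
import csv

# Rows from different loop iterations with inconsistent fields.
rows = [
    {"Sig_A": "Off", "Sig_B": "Off"},
    {"Sig_B": "no Error", "Sig_C": "509.804"},  # "extended" data
]

# Union of all column names, in order of first appearance.
fieldnames = []
for row in rows:
    for key in row:
        if key not in fieldnames:
            fieldnames.append(key)

with open("sorted_result.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)  # each value lands under the correct header
```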

Maybe the CSV Writer node is the wrong node for my problem. I tried the DB Writer node, but I have no idea about databases or their structure. :pensive:

Brotfahrer

Another piece of additional info: as already written in the first post, multiple CSV files may be written from one text file. The screenshot shows such a partial result. The “DTC” column contains the file names for the CSV files. Now, of course, one could come up with the idea of performing the whole initialization by hand for the 14 files to be created here. In my case, however, up to 5000 CSV files can occur, and of course I don't want to do that by hand :wink:

[screenshot: partial result table whose “DTC” column holds the target file names]

Greetings,
Brotfahrer

I do not really understand what you want to do; maybe you should come up with some sort of concept of what should happen. In general, I do not see any possibility to reshuffle columns in CSV files after they have been written; that would have to happen somewhere else, in KNIME or in a database.

What you can do is, e.g., use a Group Loop node to split a table into several parts based on a column and store the results in separate CSV files.
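That suggestion as a Python sketch (the column name "DTC" is taken from the screenshot description above; the rows are dummy data): group the table by the file-name column and write one CSV per group.

```python
import pandas as pd

table = pd.DataFrame({
    "DTC":   ["P0001", "P0001", "P0002"],  # per-row target file name
    "value": ["Off", "On", "no Error"],
})

# One CSV file per distinct DTC value -- the Group Loop idea.
for dtc, group in table.groupby("DTC"):
    group.to_csv(f"{dtc}.csv", index=False)
```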

2 Likes

Hi @Brotfahrer,

I think I am starting to understand your problem.
Can you maybe create a small example workflow with 2-3 CSV files (and dummy data)?
I think the solution would be a combination of a group loop (like @mlauber71 suggested) and a CSV Reader/Concatenate inside the loop to sort per CSV file :thinking:
Then you should be able to create new columns in existing CSV files and write the correct columns from KNIME to the corresponding columns in the CSV.

2 Likes

Hello @mlauber71 and @AnotherFraudUser

thanks for your support!
I have researched in my company, in parallel to this thread, which possibilities there are to create reference tables, and such possibilities exist. Here, too, a lot of work is needed, and I will also need KNIME for the creation of the reference tables :wink:. Therefore, I don't want more loops in my existing workflow while knowing that it would process only incomplete data. With the reference tables I will be able to discover data that may only be generated sporadically or not at all. For me, this is an important realization as well.

Greetings and nice weekend
Brotfahrer

1 Like

Hi @Brotfahrer,

not sure if I understand the problem correctly, but would it be a solution to check whether the file has been written at least once and exists (List Files node) and, depending on the result of this check (IF or Switch Case nodes), either write out a first empty file or not? In doing so, you would have an empty file in the first iteration to which you can then append.
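That check-then-branch logic as a minimal Python sketch (file and column names are assumptions): if the file does not exist yet, write an empty CSV containing only the header row, so there is always something to append to afterwards.

```python
import os
import pandas as pd

def ensure_file(path, columns):
    """First iteration: create an empty CSV with headers only."""
    if not os.path.exists(path):
        pd.DataFrame(columns=columns).to_csv(path, index=False)

ensure_file("P0001.csv", ["date", "ident"])  # hypothetical names
# From here on, the file exists and can be read back and appended to.
```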

Best,
Simon

3 Likes

Hi @SimonS,

I got the egg :hugs:!!!

My modified Metanode now looks like this. It includes another Metanode called “1st Iteration”

Now the important part is the new Metanode. It looks like this:

The ultimate trick is the mixture of data flow and variable flow with the Row Filter node and the CASE Switch. Because of the Create File Name node in the upper Metanode, I know the file path. In case the CSV file doesn't exist, I use the Table Creator node to create a dummy table with two columns that I know are in every log file I read in (date and ident number). That's the egg. :egg:
If the CSV file exists, the CSV Reader node reads the CSV file and I can proceed to juggle my data. That's the hen :chicken:
Now I'm able to sort my data correctly. I still have some conversion problems (I write out a string, but the hen :chicken: CSV Reader reads it back in as double or integer values), but this is tedious yet simple.
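The whole egg-or-hen branch, condensed into a Python sketch (the two known columns are taken from the description above; path and data are assumptions): if the file is missing, start from a dummy table with the two columns every log file contains; otherwise read the existing CSV back as strings to sidestep the type-conversion issue, then merge by column name and write.

```python
import os
import pandas as pd

def merge_and_write(path, new_data):
    if os.path.exists(path):
        # The hen: read the existing file back; dtype=str avoids the
        # double/integer re-typing mentioned above.
        existing = pd.read_csv(path, dtype=str)
    else:
        # The egg: a dummy table with the two columns that are present
        # in every log file.
        existing = pd.DataFrame(columns=["date", "ident number"])

    # Align by column name, appending new columns where necessary,
    # then write the result back out.
    merged = pd.concat([existing, new_data], ignore_index=True)
    merged.to_csv(path, index=False)

merge_and_write("P0001.csv", pd.DataFrame({"date": ["2020-01-01"]}))
```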

Thanks also to @mlauber71 and @AnotherFraudUser for their time and ideas!

Greetings
Brotfahrer

6 Likes

Hi Brotfahrer,

glad to hear you got it to work! :slight_smile:

Best,
Simon

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.