Split a composite csv file

szawadski · December 11, 2019, 1:33am

I have a strangely formatted file:

***SCENARIO***;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; rtsc_id;rtsc_version;rtsc_description;rtsc_code_postal;rtsc_statut;rtsc_owner;rtsc_user_stamp;rtsc_date_stamp;rtsc_time_stamp;p_TodayDate; p_UserName;demg_bpg_rtsc_id;;;;;;;;;;;;;;;;;;;; D9_4730_15;1;Code 9 D Mail;4730;9;PROD ;MARTENS;02-10-15;14:36;02-10-15;DLBROUJ;322183;;;;;;;;;;;;;;;;;;;; ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ***SERVICE***;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; rdu_id;rdt_id;rdu_duration;;;;;;;;;;;;;;;;;;;;;;;;;;;;; J0451;MZ_____2; 8:43:48;;;;;;;;;;;;;;;;;;;;;;;;;;;;; J0452;MV_____3; 8:06:00;;;;;;;;;;;;;;;;;;;;;;;;;;;;; J0453;MZ_____2; 8:38:43;;;;;;;;;;;;;;;;;;;;;;;;;;;;; J0454;MV_____3; 8:16:17;;;;;;;;;;;;;;;;;;;;;;;;;;;;; JS455;ZMZ____1; 3:48:00;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ***SERVICE_ACTIVITY***;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; rdu_id;rdca_actv_type;roua_arrival_time;roua_departure_time;roua_duration;;;;;;;;;;;;;;;;;;;;;;;;;;; J0451;KW_Deure; 4:51; 4:52; 0:01:00;;;;;;;;;;;;;;;;;;;;;;;;;;; J0451;VS_VerKr; 4:52; 5:00; 0:08:00;;;;;;;;;;;;;;;;;;;;;;;;;;; J0451;BV_Terri; 5:00; 5:03; 0:03:00;;;;;;;;;;;;;;;;;;;;;;;;;;; J0451;DV_AfhVo; 5:03; 5:06; 0:03:00;;;;;;;;;;;;;;;;;;;;;;;;;;;

How can I split it easily into 3 files based on the lines where there is a title like ***SERVICE ACTIVITY***, each section having different columns to import into a diffrerent table in Knime?

mlauber71 · December 11, 2019, 6:24am

In short:

read the whole file with file reader with one large column
identify the lines where the separate files start “****…”
import with file reader or CSV reader starting 2 lines below the separation
get rid of the empty lines

armingrudd · December 11, 2019, 9:55am

Hi @szawadski,

Here is a workflow which splits a txt file with the given format into multiple Excel files (no limits in number of files):

split_files.knwf (103.3 KB)

P.S. The example file and the output Excel files are in the “data” folder in the workflow directory.

szawadski · December 13, 2019, 1:21pm

Hello,
Thanks for your proposals. This is finally what I did, inspired by @mlauber71 's proposal.

I read the whole file line by line
For the lines starting by “***”, i duplicate the value in a new column
I fill down the missing value of this column so that each line is tagged properly
Then, I group loop based on the tag after having removed the title lines
and generate new csv files based on the name of the section that I append to the original name

Split_concatenated_csv.knwf (28.7 KB)

Again, thanks to this wonderful community, for its responsiveness and the quality of the discussions.
Although @armingrudd 's solution was a based on a slightly different format than what I had effectively (not all text on one line), I discovered new useful nodes thanks to him.

armingrudd · December 13, 2019, 2:18pm

I am glad you have found the solution. Just to complete my work, I have modified my workflow to work with the multi-line format:

Split_concatenated_csv.knwf (127.4 KB)

system · December 20, 2019, 2:18pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.