Split a composite csv file

I have a strangely formatted file:

***SCENARIO***;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; rtsc_id;rtsc_version;rtsc_description;rtsc_code_postal;rtsc_statut;rtsc_owner;rtsc_user_stamp;rtsc_date_stamp;rtsc_time_stamp;p_TodayDate; p_UserName;demg_bpg_rtsc_id;;;;;;;;;;;;;;;;;;;; D9_4730_15;1;Code 9 D Mail;4730;9;PROD ;MARTENS;02-10-15;14:36;02-10-15;DLBROUJ;322183;;;;;;;;;;;;;;;;;;;; ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ***SERVICE***;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; rdu_id;rdt_id;rdu_duration;;;;;;;;;;;;;;;;;;;;;;;;;;;;; J0451;MZ_____2; 8:43:48;;;;;;;;;;;;;;;;;;;;;;;;;;;;; J0452;MV_____3; 8:06:00;;;;;;;;;;;;;;;;;;;;;;;;;;;;; J0453;MZ_____2; 8:38:43;;;;;;;;;;;;;;;;;;;;;;;;;;;;; J0454;MV_____3; 8:16:17;;;;;;;;;;;;;;;;;;;;;;;;;;;;; JS455;ZMZ____1; 3:48:00;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ***SERVICE_ACTIVITY***;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; rdu_id;rdca_actv_type;roua_arrival_time;roua_departure_time;roua_duration;;;;;;;;;;;;;;;;;;;;;;;;;;; J0451;KW_Deure; 4:51; 4:52; 0:01:00;;;;;;;;;;;;;;;;;;;;;;;;;;; J0451;VS_VerKr; 4:52; 5:00; 0:08:00;;;;;;;;;;;;;;;;;;;;;;;;;;; J0451;BV_Terri; 5:00; 5:03; 0:03:00;;;;;;;;;;;;;;;;;;;;;;;;;;; J0451;DV_AfhVo; 5:03; 5:06; 0:03:00;;;;;;;;;;;;;;;;;;;;;;;;;;;

How can I split it easily into 3 files based on the lines where there is a title like ***SERVICE ACTIVITY***, each section having different columns to import into a diffrerent table in Knime?

In short:

  • read the whole file with file reader with one large column
  • identify the lines where the separate files start β€œ****…”
  • import with file reader or CSV reader starting 2 lines below the separation
  • get rid of the empty lines

Hi @szawadski,

Here is a workflow which splits a txt file with the given format into multiple Excel files (no limits in number of files):

split_files.knwf (103.3 KB)


P.S. The example file and the output Excel files are in the β€œdata” folder in the workflow directory.


Thanks for your proposals. This is finally what I did, inspired by @mlauber71 's proposal.

  • I read the whole file line by line
  • For the lines starting by β€œ***”, i duplicate the value in a new column
  • I fill down the missing value of this column so that each line is tagged properly
  • Then, I group loop based on the tag after having removed the title lines
  • and generate new csv files based on the name of the section that I append to the original name

Split_concatenated_csv.knwf (28.7 KB)

Again, thanks to this wonderful community, for its responsiveness and the quality of the discussions.
Although @armingrudd 's solution was a based on a slightly different format than what I had effectively (not all text on one line), I discovered new useful nodes thanks to him.


I am glad you have found the solution. Just to complete my work, I have modified my workflow to work with the multi-line format:

Split_concatenated_csv.knwf (127.4 KB)



This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.