Need help with a non-comma-separated file (leading spaces being trimmed, grouping successive lines)

Greetings all!

I have to deal with an annoying file that I sadly cannot convert to CSV without manual labor (at least to my knowledge).

The file looks like this: each group starts with a "dn:" line and ends with an empty line.

For each line starting with "dn:" and the lines that follow it (up to the next "dn:"), I would like to gather all of that data into columns.

Bonus question: the tool that sends me the data wraps long lines, so my URLs are split across two lines, the second one starting with a space. When I use the File Reader node, the leading spaces seem to be trimmed, so I can't regex my way out of this little problem. Any ideas?

dn: cn=file transfer preferences,cn=file transfer,c=fr,o=ABCDEF
description: this is another description
cn: File Transfer Preferences
dcspref_ces: DCSApp.Class*$*String-$-dcs.prefs.DCSTDFPrefs
dcspref_ces: TDF.appList*$*List-$-stuff
dcspref_ces: DCSApp.Desc*$*String-$-File Transfer
dcspref_ces: DCSApp.URL*$*String-$-https://dcs.ABCDEF.com/rne_tdf/tdfprefs/j
 sp/tdfprefs.jsp

dn: cn=bcc preferences,cn=bcc,cn=dis,c=fr,o=ABCDEF
description: This is a description
cn: more cn stuff
dcspref_ces: ST.appList*$*List-$-cn=BCC Preferences, cn=BCC, cn=DIS
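
The sample looks like LDIF, where (per RFC 2849) a line that begins with a single space is a continuation of the previous line. Assuming that is what the tool produces here, once the leading space is preserved the split URLs can be rejoined with a few lines of Python, for example in a Python Script node. `unfold_lines` is just an illustrative name:

```python
def unfold_lines(lines):
    """Join folded lines: a line starting with a space continues the previous one.

    Assumes LDIF-style folding: drop exactly one leading space and append
    the rest to the previous logical line.
    """
    logical = []
    for line in lines:
        if line.startswith(" ") and logical:
            # Continuation line: strip only the single fold-marker space
            logical[-1] += line[1:]
        else:
            logical.append(line)
    return logical


raw = [
    "dcspref_ces: DCSApp.URL*$*String-$-https://dcs.ABCDEF.com/rne_tdf/tdfprefs/j",
    " sp/tdfprefs.jsp",
    "",
]
print(unfold_lines(raw)[0])
```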

Thanks in advance

1. Use a Line Reader to load it.
2. Use a Rule Engine to put a 1 in each row that starts with "dn:" and a 0 in all the others.
3. Use a Moving Aggregation to compute the cumulative sum of that column. Every row of a group then shares the same number.
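
The Rule Engine + Moving Aggregation idea boils down to a 0/1 flag and a running sum. Here is the same logic in plain Python (a sketch, not KNIME itself; `group_ids` is a made-up name), assuming records start with a lowercase "dn:":

```python
from itertools import accumulate


def group_ids(lines):
    """flag = 1 for rows starting with "dn:", else 0; the running sum is the group id."""
    flags = [1 if line.startswith("dn:") else 0 for line in lines]
    return list(accumulate(flags))


lines = [
    "dn: cn=file transfer preferences,cn=file transfer,c=fr,o=ABCDEF",
    "description: this is another description",
    "",
    "dn: cn=bcc preferences,cn=bcc,cn=dis,c=fr,o=ABCDEF",
    "cn: more cn stuff",
]
print(group_ids(lines))  # [1, 1, 1, 2, 2]
```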

The rest is mostly string manipulation, grouping and pivoting.
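
For the grouping and pivoting part, a minimal Python sketch, assuming continuation lines are already joined and every line is a plain `attribute: value` pair (no base64 `::` values). Repeated attributes such as `dcspref_ces` are collected in a list and joined with " | " in the final cell; the function names and the separator are hypothetical choices:

```python
from collections import defaultdict


def records_to_rows(lines):
    """One dict per dn: record, mapping attribute -> list of values."""
    rows = []
    for line in lines:
        if not line.strip():
            continue  # empty line: end of a group
        key, _, value = line.partition(": ")
        if key == "dn":
            rows.append(defaultdict(list))  # a new record starts here
        if rows:  # lines before the first dn: are ignored
            rows[-1][key].append(value)
    return [dict(r) for r in rows]


def to_table(rows):
    """Pivot: one column per attribute; missing attributes become empty cells."""
    columns = sorted({k for r in rows for k in r})
    return columns, [[" | ".join(r.get(c, [])) for c in columns] for r in rows]


lines = [
    "dn: cn=bcc preferences,cn=bcc,cn=dis,c=fr,o=ABCDEF",
    "description: This is a description",
    "cn: more cn stuff",
    "dcspref_ces: ST.appList*$*List-$-cn=BCC Preferences, cn=BCC, cn=DIS",
    "",
]
columns, table = to_table(records_to_rows(lines))
print(columns)  # ['cn', 'dcspref_ces', 'description', 'dn']
```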

When using the Line Reader, the preview does keep the leading space (see row 7):

But the output preview seems to have trimmed that leading space. How do I prevent this?

Edit:
"Seems to have": maybe just check it? Your screenshot looks fine. Also, don't trust the new "Modern UI": in certain versions it does funky things like trimming whitespace for display purposes, but the leading whitespace is still present.
You can, for example, compute the string length and compare it against a manual count.
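
A quick way to do that length check outside the UI, sketched in Python: compare the length of the value with and without its leading whitespace (`leading_ws` is just an illustrative helper):

```python
def leading_ws(s):
    """Return the leading whitespace of s (spaces and tabs included)."""
    return s[: len(s) - len(s.lstrip())]


line = " sp/tdfprefs.jsp"
print(len(line), len(line.lstrip()))  # 16 15
print(repr(leading_ws(line)))         # ' '
```

If the two lengths differ, the leading character is still there and only the display is hiding it.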

For me, the Line Reader does not trim whitespace. If you have a tab instead of a space, it will not be removed either. KNIME just can't display those properly if the encoding is wrong.

So if you have a tab rather than spaces, get the encoding right or just add a replacement step. If you use the old interface, a tab is displayed as "_" in the node preview, but not in the table view you get by right-clicking the node.
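
To tell a tab from a space, print the value's `repr`. A replacing step can then turn a leading tab into the single space that the continuation logic expects. A Python sketch; replacing a run of leading tabs with exactly one space is an assumption about what you want, not something KNIME does for you, and `normalize_leading_tabs` is a hypothetical name:

```python
import re


def normalize_leading_tabs(s):
    """Replace a run of leading tabs with a single space; other tabs are untouched."""
    return re.sub(r"^\t+", " ", s)


line = "\tsp/tdfprefs.jsp"
print(repr(line))                          # '\tsp/tdfprefs.jsp'
print(repr(normalize_leading_tabs(line)))  # ' sp/tdfprefs.jsp'
```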

I tried several File Reader nodes, and unfortunately it doesn't work :frowning:

Hi @Startide

See this workflow: Reading Comma Separated File.knwf (100.4 KB). Check the Support Short Data Rows option in the File Reader node.
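
I haven't reproduced Hans's workflow here, but as far as I understand it, Support Short Data Rows lets the File Reader accept rows with fewer columns than the others and fill the gaps with missing values. The same idea in Python, splitting on the first ": " only so the URLs stay intact (`split_padded` is a hypothetical name, and the ": " delimiter is an assumption):

```python
def split_padded(lines, sep=": "):
    """Split each line once on sep; pad rows shorter than the widest with None."""
    rows = [line.split(sep, 1) for line in lines]
    width = max((len(r) for r in rows), default=0)
    return [r + [None] * (width - len(r)) for r in rows]


lines = [
    "dn: cn=bcc preferences,cn=bcc,cn=dis,c=fr,o=ABCDEF",
    " sp/tdfprefs.jsp",
    "",
]
for row in split_padded(lines):
    print(row)
```

The continuation line and the empty line come back as short rows padded with `None`, much like missing cells in a KNIME table.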

Regards, Hans

@HansS OMG, thank you so much! I need to dig deeper into your solution, but it's saving my sanity at least. Thank you very much for your help!
