I have to deal with an annoying file that I sadly cannot convert to CSV without manual labor (at least to my knowledge).
The file looks like this: each group starts with a “dn:” line and ends with an empty line.
I would like to group all of that data into columns, one group per line starting with “dn:” together with its subsequent lines (up until the next “dn:”).
Bonus question: the tool that sends me the data cuts long lines short, so my URLs are split across two lines, the second one starting with a space. When I use the “File Reader” node, it seems leading spaces are trimmed, so I can’t regex my way out of this little problem. Any ideas?
dn: cn=file transfer preferences,cn=file transfer,c=fr,o=ABCDEF
description: this is another description
cn: File Transfer Preferences
dcspref_ces: DCSApp.Class*$*String-$-dcs.prefs.DCSTDFPrefs
dcspref_ces: TDF.appList*$*List-$-stuff
dcspref_ces: DCSApp.Desc*$*String-$-File Transfer
dcspref_ces: DCSApp.URL*$*String-$-https://dcs.ABCDEF.com/rne_tdf/tdfprefs/j
sp/tdfprefs.jsp
dn: cn=bcc preferences,cn=bcc,cn=dis,c=fr,o=ABCDEF
description: This is a description
cn: more cn stuff
dcspref_ces: ST.appList*$*List-$-cn=BCC Preferences, cn=BCC, cn=DIS
Use a Line Reader node to load it.
Use a Rule Engine to put a 1 into each row that starts with “dn:” and a 0 into all the others.
Use Moving Aggregation to build a cumulative sum of that column. Every row then carries the number of the group it belongs to.
The rest is mostly string manipulation, grouping and pivoting.
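In case it helps to see the logic outside of KNIME, here is a minimal Python sketch of the same idea (flag, cumulative sum, group, pivot). It is not the KNIME workflow itself. The function name `group_records` and the " | " separator for repeated keys are my own choices for illustration, and it assumes split lines have already been joined back together.

```python
from itertools import accumulate


def group_records(lines):
    """Group LDIF-style lines into one dict per 'dn:' record."""
    # Drop the empty separator lines between groups.
    lines = [ln for ln in lines if ln.strip()]
    # Rule Engine step: 1 for rows starting with "dn:", 0 for the others.
    flags = [1 if ln.lower().startswith("dn:") else 0 for ln in lines]
    # Moving Aggregation step: the cumulative sum gives each row a group id.
    group_ids = list(accumulate(flags))

    records = {}
    for gid, line in zip(group_ids, lines):
        if gid == 0:
            continue  # rows before the first "dn:" belong to no group
        # Split on the first colon only, so URLs in the value stay intact.
        key, _, value = line.partition(":")
        records.setdefault(gid, {}).setdefault(key.strip(), []).append(value.strip())

    # Pivot step: one row per group; repeated keys are joined (assumed " | ").
    return [{k: " | ".join(v) for k, v in rec.items()} for rec in records.values()]
```

Calling `group_records(open("export.ldif").read().splitlines())` (the file name is a placeholder) would give one dictionary per “dn:” block, which maps directly onto CSV rows.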
Edit:
“seems to have”. Maybe just check it? Your screenshot looks fine. Also, don’t trust the new “modern UI”: in certain versions it does funky things like trimming whitespace for display purposes, but the leading whitespace is still present in the data.
You can, for example, count the length of the string and compare it against a manual count.
For me, the Line Reader does not trim whitespace. If you have a tab instead of a space, it will not be removed either. KNIME is just not capable of displaying tabs properly if the encoding is wrong.
Hence, if you have a tab and not spaces, get the encoding right or just run a replacing step. If you use the old interface, you will see a tab displayed as “_” in the node preview, but not in the table view that you get from right-clicking the node.
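To make the whitespace check and the replacing step concrete, here is a small Python sketch under a few assumptions. The helper names `leading_whitespace` and `unfold` are made up for illustration. The sketch treats a line starting with a single space or tab as the continuation of the previous line, which matches the split URL in the sample but may not hold for every export.

```python
def leading_whitespace(line):
    """Return the leading spaces/tabs of a line so they can be inspected."""
    return line[: len(line) - len(line.lstrip(" \t"))]


def unfold(raw_lines):
    """Join continuation lines (leading space or tab) onto the previous line."""
    unfolded = []
    for line in raw_lines:
        line = line.rstrip("\r\n")
        if line[:1] in (" ", "\t") and unfolded:
            # Drop exactly one leading whitespace character, then append.
            unfolded[-1] += line[1:]
        else:
            unfolded.append(line)
    return unfolded
```

Printing `repr(leading_whitespace(line))` shows `' '` for a space and `'\t'` for a tab, which is the programmatic version of counting the length by hand. After `unfold`, the two halves of the URL end up on one line again and the grouping can run on clean rows.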