XML to CSV / Table conversion without using XPath

Hi,

I am not sure if there is a possibility to transform XML to CSV without using XPATH as the XML is pretty large and I don’t like to define each of hundreds XML child. I know that there is a way to convert XML to JSON and JSON to Table but the issue with this is often the names of the columns are not specific to one child rather accumulative. Is there a way around ?
It is pretty easy in Excel (just import XML) but I am not aware of any feature in KNIME!

Hi @mateenraj ,

When you say “transform XML to CSV”, does this mean that your XML is quite a basic format, with no nested data structures, and that the data will easily fit into a tabular format?

Do you have an XSD for it? I was wondering how many fields you would have overall.

1 Like

Hi @takbb ,

Yes, XML is nested a bit. I don’t mind flattening it into a tabular format. Actually Excel does it when I open. XSD is not available but I created one yesterday. There are 192 columns.

One quick solution: XML to JSON → JSON to Table

–P

I already mentioned that above. The problem with this approach is that the names of the columns are not identical to XML tags.

Not sure I understand the issue correctly. If you need unique column names, the option “Use path with separator” might help, which will give you the entire path of the property instead of just the root name:

No on the contrary I want root / leaf name as otherwise the name would be too large but the leaf name option is somehow not working. I tried it in two different version of KNIME. Please see attached.
LeafNames

If you have text content, you’ll always have field names which are named “text” (or ending with “text” if you use option described above), as these are text() nodes per XPath/DOM terminology.

I’d suggest to use the “Use path with separator” and then try a Column Rename (Regex) to transform the column names into your desired names (i.e. trim trailing .text at the end, remove common suffix, etc.), e.g. from path.to.my.columnName.textcolumnName.

3 Likes

Ok, I got it fixed with Regex (^.*?)(\w+\Z) and replaced the column name with $2. Now comes the problem of performance. It has been already 10 minutes and the JSON To Table Node is stuck with 0%. The file size is 1.90 MB!!
WF

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.