Creating column names with XPATH Node

I would like to read the data from an entire XML file that contains about 1000 elements, using the element names as the column names. The XPATH value query of //* with Multiple tag options of Multiple Columns works great for capturing the data into columns; however I cannot figure out how to get the column names to be the element names. I have tried name and name() in the XPath Query for column name: but the (relative to value query) results in Value of //*name which upon execution results in

ERROR XPath 0:7 Execute failed: javax.xml.xpath.XPathExpressionException: Number of values differs from number of column names.
WARN XPath 0:7 Number of values differs from number of column names.

Searching the forums it seems like others have had similar issue in the past so I am not sure if there is a way to get this to work. Any suggestions are appreciated. Thanks.

Hi,

Unfortunately the supported version in XPath node is XPath 1.0 which does not support this format:
/root/item/*/name()
Which in XPath 2.0 will output the sequence of the names of the elements following the “item” element.

In XPath 1.0 you can get the name of 1 element using this:
name(/root/item/*[1])

I have already requested the feature to have the XPath node with version 2.0.

Or maybe I’m missing something at all?!

Best,
Armin

P.S. Resources:

1 Like

Here I have an alternative solution which is a bit tricky:
First, use a String Manipulation node to modify your XML file to make the element names be values.
E.g.:
If this is the XML:

<?xml version='1.0' encoding='UTF-8'?>
<root>
    <item>
        <id>0001</id>
        <idBoard>0001</idBoard>
    </item>
    <item>
        <id>0001</id>
        <idBoard>0001</idBoard>
    </item>
    <item>
        <id>0001</id>
        <idBoard>0001</idBoard>
    </item>
</root>

Use this expression:
regexReplace($json$, "(?:<)(.*)(?:>.*<\\/.*>*?)", "<item>$1</item>")
And then convert the output of the String Manipulation node to XML, so you will have this:

<?xml version='1.0' encoding='UTF-8'?>
<root>
    <item>
        <item>id</item>
        <item>idBoard</item>
    </item>
    <item>
        <item>id</item>
        <item>idBoard</item>
    </item>
    <item>
        <item>id</item>
        <item>idBoard</item>
    </item>
</root>

Now use a XPath node to extract the new values which are indeed the element names of the original XML file. Use Multiple rows option in XPath and use a RowID node after this XPath node to reset the Row IDs.
Now let’s get back to the original XML file. Use a XPath node to extract the values. The column names do not matter. Use Multiple columns option. Then use Extract Column Header node and uncheck the “Use new output column names”. Then use a Transpose node and connect the first output port of the Extract Column Header node the this node and execute the node. Then use another RowID node to reset the Row IDs.
Now, join the output of the two RowID nodes based on the Row IDs. Use a Insert Column Header node and pass the output of the joiner node to the second input port and the second output port of the Extract Column Header node to the first input port of the Insert Column Header node. Now in the configuration window of the Insert Column Header node assign the column containing the old column names to the “Lookup Column” and the column containing the new names to the “Value Column”.

That’s it.
Here is an example workflow for you:

XPath_Column.knwf (50.8 KB)

Best,
Armin

2 Likes

Armin,

Thanks for your very clever solution. It does work. However it is a bit messy since the XML document I am working with has namespace prefixes for each element as well as several attributes describing the element which make for a very long column name.

The ideal solution would be support for XPath 2.0, so hopefully the ticket mentioned by Ivan will be implemented soon.

Thanks again,

Jeff

2 Likes

Of course!
However, if you upload your XML or a sample here, I can provide you with the required regex and workflow.

Best,
Armin

1 Like

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.