[BUG] XPath "Return missing cell" option not working properly

Hi,

it seems I’ve discovered a bug in the XPath node when using the option “Return missing cell on empty string”. The option does not seem to work: whenever a value is empty or the targeted element is missing altogether, the table structure becomes invalid, because values end up in the wrong rows.

Example XML with missing values

<root>
	<childs>
		<child ID="Classification root" ParentID="">
			<Name>Classification root</Name>
		</child>
		<child ID="child1" ParentID="">
			<Name>Inactive child</Name>
			<data>
				<value AttributeID="PartOfContext"/>
			</data>
		</child>
		<child ID="grand-child1-1" ParentID="child1">
			<Name>Inactive grand-child 1.1</Name>
			<data>
				<value AttributeID="PartOfContext"/>
			</data>
		</child>
		<child ID="child2" ParentID="Classification root">
			<Name>Active child 2</Name>
			<data>
				<value AttributeID="PartOfContext">Yes</value>
			</data>
		</child>
		<child ID="grand-child2-1" ParentID="child2">
			<Name>Inactive grand-child 2.1</Name>
			<data>
				<value AttributeID="PartOfContext">Yes</value>
			</data>
		</child>
	</childs>
</root>

Expected results

| ID | Parent ID | PartOfContext |
|---|---|---|
| Classification root | ? | ? |
| child1 | Classification root | ? |
| grand-child1-1 | child1 | ? |
| child2 | Classification root | Yes |
| grand-child2-1 | child2 | Yes |

Given result

| ID | Parent ID | PartOfContext |
|---|---|---|
| Classification root | child1 | Yes |
| child1 | Classification root | Yes |
| grand-child1-1 | child2 | ? |
| child2 | ? | ? |
| grand-child2-1 | ? | ? |

https://hub.knime.com/mw/space/BUG%20Xpath%20return%20missing

Hi @stelfrich,

apologies for bringing this directly to you, but I saw your post about how to raise bugs properly. Did I do something wrong here? The bug I seem to have discovered might have a serious impact on everyone using the XPath node.

Kind regards
Mike

I’m not commenting on whether this is a bug, or on how to properly report KNIME bugs here, but:

The issue you’re describing is more a characteristic of XPath: if you write a query such as //child/@ParentID, the XPath engine will simply give you a list of matching nodes; it will not “care” whether there are parent elements without the child or attribute. What the XPath node effectively does is process each query in isolation and then append a new column with the matching results, starting from the top and filling up the remaining rows with “?” cells.
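To make that concrete, here is a minimal Python sketch of the effect, using a shortened version of the XML from the first post. It is only an illustration and not how the KNIME node is implemented: it uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath, and it assumes that empty values do not occupy a position in the result list (which is what the “Given result” above suggests). The names `ids`, `values` and `flat_table` are made up for this example.

```python
import xml.etree.ElementTree as ET

# Shortened version of the XML from the first post.
XML = """
<root>
  <childs>
    <child ID="Classification root"><Name>Classification root</Name></child>
    <child ID="child1"><data><value AttributeID="PartOfContext"/></data></child>
    <child ID="child2"><data><value AttributeID="PartOfContext">Yes</value></data></child>
  </childs>
</root>
"""

root = ET.fromstring(XML)

# Query 1: one result per <child> element.
ids = [child.get("ID") for child in root.findall(".//child")]

# Query 2, evaluated on its own over the whole document.
# Assumption: only non-empty values end up in the result list, and the
# list no longer knows which <child> each value came from.
values = [v.text for v in root.findall(".//child/data/value") if v.text]

# Append the second list as a column from the top and pad with "?".
padded = values + ["?"] * (len(ids) - len(values))
flat_table = list(zip(ids, padded))

for row in flat_table:
    print(row)
```

The single “Yes” lands in the first row, although it belongs to child2, which is the same kind of shift as in the “Given result” table.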

Simple solution:

Use two XPath nodes. First query for the common parent element (e.g. //child) and output a “Node cell” (instead of a string). Then append a second XPath node where you query within the previously extracted node. This way you’ll maintain the structure, and the “missing values” will appear in the correct row.
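The same two-step idea, sketched in Python on a shortened version of the XML from the first post. Again, this is only an illustration under assumptions (standard library `xml.etree.ElementTree`, made-up variable names), not the KNIME node itself: first select every `child` element, then run the second query relative to each of them, so that a missing or empty value stays in its own row.

```python
import xml.etree.ElementTree as ET

# Shortened version of the XML from the first post.
XML = """
<root>
  <childs>
    <child ID="Classification root"><Name>Classification root</Name></child>
    <child ID="child1"><data><value AttributeID="PartOfContext"/></data></child>
    <child ID="child2"><data><value AttributeID="PartOfContext">Yes</value></data></child>
  </childs>
</root>
"""

root = ET.fromstring(XML)

rows = []
# Step 1: select the common parent element (one "node cell" per row).
for child in root.findall(".//child"):
    # Step 2: query relative to this one node only.
    value = child.find("./data/value")
    text = value.text if value is not None else None
    # Both an absent element and an empty element become "?" in this row.
    rows.append((child.get("ID"), text if text else "?"))

for row in rows:
    print(row)
```

Because the second lookup is scoped to one `child` at a time, the row count is fixed by the first query and the “Yes” stays with child2.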

Hope this helps.

– Philipp



No worries at all, @mw! This topic has somehow just slipped through the cracks, nothing more.

Thanks for chiming in, @qqilihq! :+1:

In addition to what @qqilihq has already mentioned, I think that the option actually does what it is supposed to do: for an entry that exists but is empty, it returns a missing value instead of an empty string.

Best,
Stefan


Good morning @qqilihq and @stelfrich,

thanks to both of you for your replies.

Philipp, that surprises me. I tested the XPath expression thoroughly beforehand and validated the results many times. I started off with an absolute path and then made it work as a relative one, always getting the same results. The XPath has been in use for more than a year and it worked flawlessly.

The point is, even if the XPath node processes the data node by node and query by query, the number of referenced child nodes to extract data from does not change. So the “Return missing” option should ensure that a corresponding result is returned for each referenced child XML node, shouldn’t it?

It seems to me that “Return missing cell” is not working. The solution you suggested does work, but it simulates what the XPath node should do in the first place when referencing “//child”.

Kind regards
Mike
