XPath and flow variables - set cell types via flow variable

Hi there,

I have a problem: I need to pass XPath flow variables into the XPath node from a .csv file. I have set up the following workflow:
image

I have set the following flow variables in the XPath node:
image

where fieldname contains the column name, xpath contains the XPath query, and celltype contains the cell type (String or Collection in this case).

However, I get the following error message:

WARN XPath 0:558 Errors loading flow variables into node : Coding issue: No enum constant org.knime.xml.node.xpath2.XPathNodeSettings.XPathOutput.Collection

It would also be helpful if you could explain what the different flow variables do in the node descriptions, as I am pretty certain I haven’t got this right.

Thanks in advance for your help,

Rich H

Hi @stanage

A very quick answer in case this is the issue: I notice in your screenshot that there are special characters in the name of your variable, “ï»¿fieldname”. Could this be the problem? KNIME interprets many things in the background, and special characters in variable or column names are sometimes an issue.

Hope this helps.

Best

Hi, I did change the field name using the Column Rename node, as the CSV file reader adds these characters (or reads them from the Excel-generated CSV file), but it did not have any effect.

Rich

I will remove all ‘_’ from the column names and see if this has an effect.

Hi, I still get the No enum constant error:

WARN Row Filter 0:84 Column range filter: Input table doesn’t contain specified column name
WARN Row Filter 0:81 Column range filter: Input table doesn’t contain specified column name
WARN Row Filter 0:84 Column range filter: Input table doesn’t contain specified column name
WARN Row Filter 0:81 Column range filter: Input table doesn’t contain specified column name
WARN Variable Loop End 0:164 No variables selected
WARN Variable Loop End 0:164 No variables selected
WARN Variable Loop End 0:164 No variables selected
WARN Variable Loop End 0:164 No variables selected
WARN Variable Loop End 0:164 No variables selected
WARN Variable Loop End 0:164 No variables selected
WARN Variable Loop End 0:164 No variables selected
WARN XPath 0:558 Errors loading flow variables into node : Coding issue: No enum constant org.knime.xml.node.xpath2.XPathNodeSettings.XPathOutput.Collection
WARN Row Filter 0:84 Column range filter: Input table doesn’t contain specified column name
WARN Row Filter 0:81 Column range filter: Input table doesn’t contain specified column name
WARN XPath 0:558 Errors loading flow variables into node : Coding issue: No enum constant org.knime.xml.node.xpath2.XPathNodeSettings.XPathOutput.missing
WARN Row Filter 0:84 Column range filter: Input table doesn’t contain specified column name
WARN Row Filter 0:81 Column range filter: Input table doesn’t contain specified column name
WARN Row Filter 0:84 Column range filter: Input table doesn’t contain specified column name
WARN Row Filter 0:81 Column range filter: Input table doesn’t contain specified column name
WARN Row Filter 0:84 Column range filter: Input table doesn’t contain specified column name
WARN Row Filter 0:81 Column range filter: Input table doesn’t contain specified column name
WARN XPath 0:558 Errors loading flow variables into node : Coding issue: No enum constant org.knime.xml.node.xpath2.XPathNodeSettings.XPathOutput.Collection

I confirm that ‘_’ is not a problem with column names, but other special characters could be.

If you have removed all the special characters from the column names and you still have this problem, then the cause is somewhere else.

Could you please upload your workflow here, with at least some mock data?

Best

Ael

Hi,
Attached is an example workflow and test data set. xpathListTest is the .csv file containing the XPaths, and test.xml is a set of test data. The workflow executes without error, but fails to parse the XML into rows as expected. Any help would be much appreciated.

Regards,

Rich H

XML Parser.knwf (31.9 KB)
xpathListTest.txt (296 Bytes)

books.xml (4.4 KB)

Hello @stanage,

What you are seeing is just a warning: the flow variables you are using inside the XPath node have not been created yet at that point. Your workflow seems to execute just fine, so there is no need to worry about it.
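
For context on the wording of the warning: "No enum constant ..." is the message Java's `Enum.valueOf` produces when it is given a string that does not exactly match one of the enum's constant names. Here that would mean the values arriving through the flow variable (`Collection`, and `missing` while the variable does not exist yet) are not names that the setting accepts. The Python sketch below is only an analogy: `ReturnType`, its two members and `parse_return_type` are hypothetical and are not the real constants of `XPathOutput`, which this thread does not list.

```python
from enum import Enum


class ReturnType(Enum):
    # Hypothetical member names, for illustration only.
    # These are NOT the real constants of KNIME's XPathOutput enum.
    String = "String"
    Integer = "Integer"


def parse_return_type(name: str) -> ReturnType:
    """Look up an enum member by its exact (case-sensitive) name."""
    try:
        return ReturnType[name]
    except KeyError:
        allowed = ", ".join(member.name for member in ReturnType)
        # Mimic the wording of the Java message and list what would be accepted.
        raise ValueError(
            f"No enum constant ReturnType.{name} (allowed: {allowed})"
        ) from None
```

Checking the celltype column of the CSV against the names the node actually accepts, before the loop starts, would surface this kind of mismatch early.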

Br,
Ivan

Hi @aworker @stanage ,

Regarding ï»¿fieldname

No, the CSV reader does not add them. They are part of your CSV file. They are what we call a BOM (Byte Order Mark). It gets added to your CSV file when you “save as csv”. This used to drive me crazy when I received CSV files to process.

I’m not sure how KNIME handles the BOM, but @stanage, at least you were able to apply a column rename to it.

EDIT: I think if you change the encoding to UTF-8 in the CSV reader, it should not read the BOM.
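
To illustrate the effect outside KNIME, here is a minimal Python sketch. It assumes a hypothetical file whose header row is `fieldname,xpath,celltype`, preceded by a UTF-8 BOM (the bytes EF BB BF); the data row and the helper name `header` are made up for the example. Note that in Python it is the `utf-8-sig` codec that drops the BOM, which is a detail of Python rather than of the CSV reader node.

```python
import csv
import io

# Hypothetical file content: UTF-8 BOM, then a header row and one data row.
raw = b"\xef\xbb\xbffieldname,xpath,celltype\r\ntitle,/catalog/book/title,String\r\n"


def header(data: bytes, encoding: str) -> list:
    """Decode the bytes with the given encoding and return the CSV header row."""
    text = data.decode(encoding)
    return next(csv.reader(io.StringIO(text, newline="")))


# Decoded as a single-byte Western encoding, the three BOM bytes become
# three visible characters glued to the first column name.
wrong = header(raw, "cp1252")      # wrong[0] is "ï»¿fieldname"

# Decoded as UTF-8 with BOM handling, the mark is dropped.
right = header(raw, "utf-8-sig")   # right[0] is "fieldname"
```

This is why a lookup by the name fieldname fails on the wrongly decoded header even though the column looks almost right on screen.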

Hi,

The workflow isn’t right, as it fails to split the XML into new columns and rows. I am doing something wrong, I am just not sure what.

I expected the output to be something like:

I also expected it to cycle through the XML document, extracting all elements. I am also not sure whether the celltype value is correct for setting the cell type in the table.

Thanks,

Rich H

Cheers for the information, changing the encoding to UTF-8 removed the BOM. I had been scratching my head over that; it’s always a good day when I learn something new :slight_smile: Thanks for the update. Rich H

No problem @stanage, and I can confirm that setting the encoding to UTF-8 actually takes care of it:
Without UTF-8:

With UTF-8:

Regarding your issue now, there are a few things that are wrong:
First of all, I don’t think the loop is a good approach. Done this way, it runs the XPath node multiple times, which creates duplicates each time it runs. You want to run all the rules at once in the same XPath node.

Secondly, I am guessing that you want the information for each book on a separate row, like this:
image

If that is the case, then you have to use the type String(Multiple Rows) instead of String(SingleCell).

Now, putting everything together… The thing is, while the approach with a separate file that defines the title, path, etc. is great, it does not look to be efficient with the XPath node. Let me explain:
Let’s first look at how all of these paths would be defined manually:

It’s basically 1 XPath node only, with all the queries defined in the XPath summary.

Now, to make use of flow variables here: unfortunately, the node does not take arrays/lists/collections of data that you can pass dynamically. Instead, it creates one set of the following settings per query. Since you are defining 5 queries, it will create 5 sets of these (0 to 4):

This means that you have to add your variables manually 5 times. At that point, you might as well add the queries directly in the XPath summary:

EDIT: Because you are using a Variable Loop End, you are not collecting the data during the loop, which is why you see only the last column, price.

If you use a Loop End instead, it will then give you the data, however, you will get duplicate entries as I mentioned. There are ways to then process the data to remove these duplicates.

However, there is also the Loop End (Column Append), which will append each new column to the table, without duplicating anything. And this works perfectly with your approach.
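
As a rough analogy outside KNIME (not how the nodes are implemented), the column-append idea can be sketched in Python: each query from the definition list yields one column, and the columns are placed side by side instead of being stacked as extra rows. The XML snippet, the query list and the function name `column_append` are made up for the illustration, and the sketch assumes every query returns the same number of matches.

```python
import xml.etree.ElementTree as ET

# Made-up sample data, loosely shaped like a book catalogue.
XML = """<catalog>
  <book id="bk101"><author>A. Author</author><title>First</title><price>44.95</price></book>
  <book id="bk102"><author>B. Writer</author><title>Second</title><price>5.95</price></book>
</catalog>"""

# (column name, path) pairs, as they might come from an external definition file.
QUERIES = [
    ("author", "./book/author"),
    ("title", "./book/title"),
    ("price", "./book/price"),
]


def column_append(xml_text: str, queries: list) -> list:
    """Run each query once and append its matches as a new column."""
    root = ET.fromstring(xml_text)  # parse the document a single time
    columns = {}
    for name, path in queries:
        # One loop iteration produces one complete column.
        columns[name] = [element.text for element in root.findall(path)]
    names = list(columns)
    # Place the columns side by side: one output row per book.
    return [dict(zip(names, values)) for values in zip(*columns.values())]


rows = column_append(XML, QUERIES)
```

Each iteration contributes a column to the same set of rows, so nothing is duplicated, which is the behaviour described for Loop End (Column Append) above.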

Workflow looks like this:
image

Results:
image

Here’s the workflow: XML Parser - Bruno.knwf (17.8 KB)

Thank you so much for this. I learnt a bit more today and have been able to progress the workflow I am working on. Much appreciated. As you rightly point out, the XPath node has limitations when used the way I am working with it. It would be good to be able to load an XPath list from an external source into the node: as I found out to my cost, when I changed the source of my XML file, it lost the 60+ XPaths I had already defined :frowning:

Again many thanks for all the suggestions and solutions.

Rich H

No problem @stanage , happy to help.

I think your approach is quite a good alternative (defining the XPath queries separately). We saw that it’s doable with the latest workflow via the loop. So, the approach would be to have your XML along with that definition of XPath queries. Each XML file would need its own definition of queries, since the structure might differ. So, it’s just a matter of “plugging” the correct definition into the XML file.

But I agree, if the XPath node itself could accept an external source for the definition of all the queries, that would be great. With the loop, it can end up being slow if you have a lot of XML data and a lot of queries, since it has to run the queries one by one.

Hi Bruno,

Yes, it’s slow. I have 48K XML documents with 60+ queries to extract, and it’s taking its time, but while it’s running I can have a coffee!

Nice explanation, solution and improvement suggestion @bruno29a :+1:
Ivan
