Xpath and flow variables - set cell types via flow variable

stanage · October 19, 2021, 7:23am

Hi there,

I have a problem: I need to pass XPath flow variables from a .csv file into the XPath node. I have set up the following workflow:

I have set the following flow variables in the xpath node;

where fieldname contains the column name, xpath contains the XPath query, and celltype contains the cell type (String or Collection in this case).

However, I get the following error message;

WARN XPath 0:558 Errors loading flow variables into node : Coding issue: No enum constant org.knime.xml.node.xpath2.XPathNodeSettings.XPathOutput.Collection

It would also be helpful if you could explain what the different flow variables do in the node descriptions, as I am pretty certain I haven’t got this right.

Thanks in advance for your help,

Rich H

aworker · October 19, 2021, 7:28am

Hi @stanage

A very quick answer in case this could be the issue. I notice in your snapshot that there are special characters in the name of your variable “ï»¿fieldname”. Could this be the problem? KNIME interprets many things in the background, and special characters in variable or column names are sometimes an issue.

Hope this helps.

Best

stanage · October 19, 2021, 7:30am

Hi, I did change the fieldname using the Column Rename node, as the CSV file reader adds these characters (or reads them from the Excel-generated CSV file), and it did not have any effect.

Rich

stanage · October 19, 2021, 7:34am

I will remove all ‘_’ from the column names and see if this has an effect.

stanage · October 19, 2021, 8:10am

Hi, I still get the No enum constant error:

WARN Row Filter 0:84 Column range filter: Input table doesn’t contain specified column name
WARN Row Filter 0:81 Column range filter: Input table doesn’t contain specified column name
WARN Row Filter 0:84 Column range filter: Input table doesn’t contain specified column name
WARN Row Filter 0:81 Column range filter: Input table doesn’t contain specified column name
WARN Variable Loop End 0:164 No variables selected
WARN Variable Loop End 0:164 No variables selected
WARN Variable Loop End 0:164 No variables selected
WARN Variable Loop End 0:164 No variables selected
WARN Variable Loop End 0:164 No variables selected
WARN Variable Loop End 0:164 No variables selected
WARN Variable Loop End 0:164 No variables selected
WARN XPath 0:558 Errors loading flow variables into node : Coding issue: No enum constant org.knime.xml.node.xpath2.XPathNodeSettings.XPathOutput.Collection
WARN Row Filter 0:84 Column range filter: Input table doesn’t contain specified column name
WARN Row Filter 0:81 Column range filter: Input table doesn’t contain specified column name
WARN XPath 0:558 Errors loading flow variables into node : Coding issue: No enum constant org.knime.xml.node.xpath2.XPathNodeSettings.XPathOutput.missing
WARN Row Filter 0:84 Column range filter: Input table doesn’t contain specified column name
WARN Row Filter 0:81 Column range filter: Input table doesn’t contain specified column name
WARN Row Filter 0:84 Column range filter: Input table doesn’t contain specified column name
WARN Row Filter 0:81 Column range filter: Input table doesn’t contain specified column name
WARN Row Filter 0:84 Column range filter: Input table doesn’t contain specified column name
WARN Row Filter 0:81 Column range filter: Input table doesn’t contain specified column name
WARN XPath 0:558 Errors loading flow variables into node : Coding issue: No enum constant org.knime.xml.node.xpath2.XPathNodeSettings.XPathOutput.Collection

aworker · October 19, 2021, 9:18am

I confirm that ‘_’ is not a problem with column names, but other special characters could be.

If you have removed all the special characters from the column names and you still have this problem, then the cause is somewhere else.

Could you please upload here your workflow with at least some mock data ?

Best

Ael

stanage · October 19, 2021, 12:16pm

Hi,
Attached is an example workflow and test data set. The xpathlistTest is the .csv file containing the xpaths, the test.xml a set of test data. The workflow executes without error, but fails to parse the XML into rows as expected. Any help would be much appreciated.

Regards,

Rich H

XML Parser.knwf (31.9 KB)
xpathListTest.txt (296 Bytes)

books.xml (4.4 KB)

ipazin · October 19, 2021, 1:04pm

Hello @stanage,

What you are seeing is just a warning: the flow variables you are using inside the XPath node have not been created yet at that point. Your workflow seems to execute just fine, so there is no need to worry about it.

Br,
Ivan

bruno29a · October 19, 2021, 1:04pm

Hi @aworker @stanage ,

Regarding ï»¿fieldname

No, the CSV reader does not add them. They are part of your CSV file. They are what we call a BOM (Byte Order Mark). It gets added to your CSV file when you “save as csv”. This used to drive me crazy when I received CSV files to process.

I’m not sure how KNIME handles the BOM, but @stanage at least you were able to apply Column Rename to it.

EDIT: I think if you change the encoding to UTF-8 in the csv reader, it should not read the BOM:
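As a rough illustration outside KNIME, the following Python sketch shows why the BOM appears as “ï»¿” when a UTF-8 file is read with a single-byte encoding. The header row reuses the column names from this thread; the choice of cp1252 as the "wrong" encoding is an assumption, and the sketch shows only how Python's codecs behave, not how the KNIME CSV reader works internally.

```python
import codecs

# Bytes as a "save as CSV (UTF-8)" export might write them:
# the 3-byte UTF-8 BOM (EF BB BF) followed by the header row.
raw = codecs.BOM_UTF8 + b"fieldname,xpath,celltype\n"

# Read with a single-byte encoding (cp1252 is an assumption here),
# each BOM byte becomes its own visible character.
as_cp1252 = raw.decode("cp1252").split(",")[0]

# Python's plain "utf-8" codec decodes the BOM to the invisible character U+FEFF.
as_utf8 = raw.decode("utf-8").split(",")[0]

# Python's "utf-8-sig" codec strips a leading BOM entirely.
as_utf8_sig = raw.decode("utf-8-sig").split(",")[0]

print(repr(as_cp1252))
print(repr(as_utf8))
print(repr(as_utf8_sig))
```

In Python the plain `utf-8` codec still leaves an invisible U+FEFF in front of the first column name, so `utf-8-sig` is the safer choice there when a file may carry a BOM.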

stanage · October 19, 2021, 1:11pm

Hi,

The workflow isn’t right, as it fails to split the XML into new columns and rows. I am doing something wrong, I’m just not sure what.

I expected the output to be something like;

And to cycle through the xml document extracting all elements, also not sure if the celltype value is correct to set the cell type in the table.

Thanks,

Rich H

stanage · October 19, 2021, 1:37pm

Cheers for the information, changing the encoding to UTF-8 removed the BOM. I had been scratching my head over that. It’s always a good day when I learn something new. Thanks for the update. Rich H

bruno29a · October 19, 2021, 1:59pm

No problem @stanage , and I can confirm that setting the encoding to UTF-8 actually takes care of it:
Without UTF-8:

With UTF-8:

Regarding your issue now, there are a few things that are wrong:
First of all, I don’t think the loop is a good approach. If you do it this way, it will actually run the XPath multiple times, which will create duplicates each time the XPath runs. You want to run all the rules at once in the same XPath.

Secondly, I am guessing that you want the information of each book on a separate line, correct, like this:

If that is the case, then you have to use type String(Multiple Rows) instead of String(SingleCell)
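To make the difference concrete, here is a minimal Python sketch of the two output shapes. It is only an analogy: the XML is made-up sample data in the spirit of books.xml (not the attached file), and taking the first match for the single-cell case is an assumption about how a single value would be chosen.

```python
import xml.etree.ElementTree as ET

# Made-up sample data; element names are assumptions, not the attached test file.
XML = """
<catalog>
  <book><title>First Book</title><price>10.00</price></book>
  <book><title>Second Book</title><price>20.00</price></book>
</catalog>
"""

root = ET.fromstring(XML)

# All matches of one query against the document.
titles = [el.text for el in root.findall("./book/title")]

# "Single cell" shape: one value per input XML document
# (here the first match, which is an assumption).
single_cell = [{"title": titles[0]}]

# "Multiple rows" shape: one output row per match, i.e. one row per book.
multiple_rows = [{"title": t} for t in titles]

print(single_cell)
print(multiple_rows)
```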

Now, putting everything together… The thing is, while the approach with the separate file that defines the title, path, etc. is great, it does not look to be efficient with XPath. Let me explain:
Let’s first look at how all of these paths should be defined manually:

It’s basically 1 XPath node only, with all the queries defined in the XPath summary.

Now, to make use of flow variables here: unfortunately the node does not take arrays/lists/collections of data that you can pass dynamically. Instead, it creates one set of the following settings per query. Since you are defining 5 queries, it creates 5 of these sets (0 to 4):

This means that you have to manually add your variables 5 times. Then you might as well just add them directly manually into the XPath Summary:

EDIT: So, because you are using a Variable Loop End, you are not collecting the data during the loop, so that is why you see only the last column, which is price.

If you use a Loop End instead, it will then give you the data, however, you will get duplicate entries as I mentioned. There are ways to then process the data to remove these duplicates.

However, there is also the Loop End (Column Append), which will append each new column to the table, without duplicating anything. And this works perfectly with your approach.

Workflow looks like this:

Results:

Here’s the workflow: XML Parser - Bruno.knwf (17.8 KB)
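For readers without KNIME at hand, the difference between stacking loop results row-wise and appending them column-wise can be sketched in Python. This is a loose analogy, not the behaviour of the KNIME nodes themselves: the query definitions and the XML are made-up stand-ins for the attached files, and only the fieldname and xpath columns from the thread are used.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical query definitions, using the fieldname/xpath columns from the thread.
DEFINITIONS = """fieldname,xpath
title,./book/title
price,./book/price
"""

# Made-up sample data, not the attached books.xml.
XML = """<catalog>
  <book><title>First Book</title><price>10.00</price></book>
  <book><title>Second Book</title><price>20.00</price></book>
</catalog>"""

root = ET.fromstring(XML)
queries = list(csv.DictReader(io.StringIO(DEFINITIONS)))

# One loop iteration per query; each iteration produces one column of values.
columns = {
    q["fieldname"]: [el.text for el in root.findall(q["xpath"])]
    for q in queries
}

# Row-wise collection: iterations are stacked under each other,
# so every book shows up once per query.
stacked = [{name: value} for name, values in columns.items() for value in values]

# Column-wise collection: iterations are placed side by side,
# giving one row per book with one column per query.
appended = [dict(zip(columns, row)) for row in zip(*columns.values())]

print(len(stacked))
print(appended)
```

The column-wise variant only lines up correctly when every query returns the same number of matches in the same order, which holds for this sample but is an assumption for real data.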

stanage · October 19, 2021, 2:51pm

Thank you so much for this. I learnt a bit more today and have been able to progress the workflow I am working on. Much appreciated. As you rightly point out, the way I am working with the XPath node has limitations. It would be good to be able to load an XPath list from an external source into the node: when I changed the source of my XML file, I found it had lost the 60+ XPaths I had already defined.

Again many thanks for all the suggestions and solutions.

Rich H

bruno29a · October 19, 2021, 2:59pm

No problem @stanage , happy to help.

I think your approach is quite a good alternative (defining the XPath queries separately). We saw that it’s doable with the latest workflow via the loop. So, the approach would be to have your xml along with that definition of Xpath queries. Each xml file would need to have its own definition of queries, since structure might differ. So, it’s just a matter of “plugging” the correct definition with the xml file.

But I agree, if the XPath node itself could accept an external source for the definition of all the queries, that would be great. With the loop, it can end up being slow if you have big XML data and a lot of queries. It would have to run the queries one by one.

stanage · October 19, 2021, 3:06pm

Hi Bruno,

Yes, it’s slow. I have 48K XML documents with 60+ queries to extract, and it’s taking its time, but while it’s running I can have a coffee!

ipazin · October 20, 2021, 10:31am

Nice explanation, solution and improvement suggestion @bruno29a
Ivan
