Problem with parsing Swissprot XML files

andreas_bergner · December 13, 2012, 4:04pm

Dear KNIME Users,

I am trying to use the "XML Reader" and "XPath" components to parse Swissprot XML files. After failing for a while I found that, strangely, the original XML files, whose top-level tags contain attributes, cannot be parsed, but after removing the attributes they can. Does anyone know why this happens and how it could be overcome?

Any hints & help would be greatly appreciated!

Cheers, Andreas

==========

The original file with attributes in the top level tags cannot be parsed: (XPath: /uniprot/entry/accession; XPath query: accession)

<uniprot xmlns="http://uniprot.org/uniprot" xmlns:xsi="... " ...>
<entry dataset="Swiss-Prot" created="1996-10-01" modified="2012-11-28" version="147">
<accession>P53779</accession> ...

After removing the attributes, it works: (XPath: /uniprot/entry/accession; XPath query: accession)

<uniprot>
<entry>
<accession>P53779</accession> ...

thor · December 13, 2012, 5:37pm

Could you post or send me two small files demonstrating the problem, or even a workflow? This very likely has to do with the namespace declarations in the first file, which require the XPath expressions to be namespace-aware.

andreas_bergner · December 14, 2012, 9:22am

Hi thor,

Many thanks for looking into my problem!

The original file that does not work with the attached workflow is P21524.xml, the one that works is test.xml. The only differences between the two files are in the <uniprot> and <entry> tags in the first couple of lines.

Thanks & best wishes, Andreas

thor · December 14, 2012, 12:30pm

It's as I suspected. The xmlns attribute in the root element declares the so-called default namespace for the document (see http://en.wikipedia.org/wiki/XML_namespace if you are interested in details). Each element then resides in the namespace "http://uniprot.org/uniprot", and if you want to access certain elements you also have to address them with this namespace. The XML Reader and XPath nodes have an option at the bottom to assign a so-called "prefix" to the root namespace ("dns" by default). You now need to use this prefix in all XPath expressions, e.g. /dns:uniprot/dns:entry/dns:accession.

If you delete the attribute the namespace is also removed and XPath expressions without prefixes work (again) because everything belongs to the default (empty) namespace then.
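
The same behaviour can be reproduced outside KNIME. The following is only a minimal sketch using Python's standard xml.etree.ElementTree module, not the KNIME nodes themselves; the two documents are trimmed-down stand-ins for the files discussed above, and the helper name accessions is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-ins for the two files in this thread (assumption:
# only the elements quoted in the posts are reproduced here).
NAMESPACED = """<uniprot xmlns="http://uniprot.org/uniprot">
  <entry dataset="Swiss-Prot">
    <accession>P53779</accession>
  </entry>
</uniprot>"""

PLAIN = """<uniprot>
  <entry>
    <accession>P53779</accession>
  </entry>
</uniprot>"""

# "dns" is just a local label bound to the namespace URI.
NS = {"dns": "http://uniprot.org/uniprot"}


def accessions(xml_text, path, namespaces=None):
    """Return the text of every element matched by `path` below the root."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall(path, namespaces)]


# Paths are relative to the root element <uniprot>.
unprefixed_on_namespaced = accessions(NAMESPACED, "entry/accession")
prefixed_on_namespaced = accessions(NAMESPACED, "dns:entry/dns:accession", NS)
unprefixed_on_plain = accessions(PLAIN, "entry/accession")

print(unprefixed_on_namespaced)  # empty: the elements live in the UniProt namespace
print(prefixed_on_namespaced)    # the accession is found once every step is prefixed
print(unprefixed_on_plain)       # without a namespace, plain paths are enough
```

Only the namespace URI has to match the document; the prefix itself exists purely inside the query.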

andreas_bergner · December 14, 2012, 2:45pm

Hi again,

Great, it works now. Many thanks again for your help, you saved my sanity ... :-)

Andreas

karelman · April 16, 2014, 2:42am

I found thor's post very useful; however, I'm trying to do something similar with an SBML file (SBML is an XML format for systems biology).

I have to parse the whole KEGG database to obtain reaction data, which I have to extract from the XML text.

Each file corresponds to a metabolic pathway from a specific organism.

I load the file list from a "List Files" node, and the idea is to build a loop that creates a table with the reaction info.

So I use a "TableRow To Variable Loop Start" node connected to an XML Reader and a Variable Based File Reader. The XML Reader is then connected to the Loop End node.

I use "/sbml/model/listOfreactions/reactions" as the XPath query and "/dns:sbml/dns:model/dns:listOfreactions/dns:reactions" as the prefix of the root's namespace.

[Image: XML Reader node configuration]

The problem is that I get a "WARN XML Reader Node created an empty data table." message for each file that goes into the loop.

Workflow File:

thor · April 16, 2014, 9:31am

The prefix of the root's namespace is not a path; it is a simple identifier that is then used in the XPath expression. So just enter dns (for example) as the prefix, and prefix every path segment in your expression with "dns:" (what you entered as the prefix is in fact the correct XPath expression). If the XML document does not have any namespace, you can ignore the dns setting and the prefixes altogether.
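
To illustrate that the prefix is only a short label, here is a small sketch in Python's standard xml.etree.ElementTree (again not KNIME itself). The SBML-like document, its namespace URI, its element names, and the helper reaction_ids are assumptions made up for this example and are not taken from a KEGG file.

```python
import xml.etree.ElementTree as ET

# Hypothetical SBML-like document; the namespace URI and element names are
# assumptions for illustration only.
URI = "http://www.sbml.org/sbml/level2/version4"

SBML = f"""<sbml xmlns="{URI}">
  <model id="demo">
    <listOfReactions>
      <reaction id="R1"/>
      <reaction id="R2"/>
    </listOfReactions>
  </model>
</sbml>"""


def reaction_ids(prefix):
    """Prefix every path step with `prefix` and return the matched reaction ids."""
    steps = ("model", "listOfReactions", "reaction")
    path = "/".join(f"{prefix}:{step}" for step in steps)
    root = ET.fromstring(SBML)
    # The prefix is bound to the namespace URI only for this query.
    return [el.get("id") for el in root.findall(path, {prefix: URI})]


with_dns = reaction_ids("dns")
with_s = reaction_ids("s")

print(with_dns)
print(with_s)
```

Both calls select the same elements, because the prefix is only a stand-in for the namespace URI; the path itself belongs in the query field.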

lovemmz · June 8, 2017, 12:56pm

Thanks a lot, thor.

A question for you: does the XPath query in the KNIME XML Reader support "[]" filters?

I have an XML file with

<?xml version='1.0' encoding='UTF-8'?>
<users>
<user uid="2">
<id>2</id>
<name>Mike[masked]</name>
<profession>Worker</profession>
</user>
<user>
<id>3</id>
<name>Jason</name>
<profession>Student</profession>
</user>

</users>

I used the XPath query field of the KNIME XML Reader and did not check "Incorporate namespace of the root element":

/users/user[@uid="2"] or /users/user[1]

but the result of executing the XML Reader node is:

Node created an empty data table.

ferry.abt · July 8, 2017, 11:31am

Hey lovemmz,

I quickly tested your example. According to the node description of the XML Reader, the XPath it uses offers only limited functionality.
The source code of the XPath implementation in the XML Reader doesn't seem to handle brackets at all. Therefore, for cases like this, I suggest reading the XML with the XML Reader and then applying the filter in the XPath node.

Best,
Ferry
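
For reference, this is what the two bracket filters from the question are expected to select when evaluated by an engine that supports predicates. The sketch below uses Python's standard xml.etree.ElementTree rather than KNIME, so it says nothing about how either KNIME node behaves. The sample is the XML from the question, made well-formed with a closing users tag and with the XML declaration left out.

```python
import xml.etree.ElementTree as ET

# Sample from the question, made well-formed (closing </users> tag).
USERS = """<users>
  <user uid="2">
    <id>2</id>
    <name>Mike[masked]</name>
    <profession>Worker</profession>
  </user>
  <user>
    <id>3</id>
    <name>Jason</name>
    <profession>Student</profession>
  </user>
</users>"""

root = ET.fromstring(USERS)

# Attribute predicate: only <user> elements whose uid attribute equals "2".
by_attribute = [u.findtext("name") for u in root.findall("user[@uid='2']")]

# Positional predicate: the first <user> child (positions start at 1).
by_position = [u.findtext("name") for u in root.findall("user[1]")]

print(by_attribute)
print(by_position)
```

In this sample both filters select the first user element, so the two lists are identical.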