Read XML with DTD definition

I have an XML file that does not reference an XSD schema definition but contains this header:

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE collection SYSTEM 'BioC.dtd'>
<collection>
...
</collection>

I am used to XSD schema definitions and have not seen a DTD before. Also, the DTD is not referenced by a proper URI, only by a relative file name. When I try to read this file with the XML Reader node I get an error:

ERROR XML Reader  3:6  Execute failed: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,78]

I get an additional error message in German (not sure if that’s because my system language is German):

Message: Externe DTD: Lesen von externer DTD "" nicht erfolgreich, da "file"-Zugriff wegen der von der Eigenschaft "accessExternalDTD" festgelegten Einschränkung nicht zulässig ist.

This roughly translates to “External DTD: failed to read external DTD "" because "file" access is not allowed due to the restriction set by the "accessExternalDTD" property”.

A simple solution is to remove <!DOCTYPE collection SYSTEM 'BioC.dtd'> from the XML file and read without specified namespaces. However, I have several million of those files and would prefer not to change them.
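For reference, here is a minimal Python sketch of that workaround applied in memory, so the files on disk stay untouched. It runs outside KNIME, uses only the standard library, and assumes the DOCTYPE declaration has no internal subset (no `[...]` part). The sample content and the `document/id` path are made up for illustration and are not taken from the attached file.

```python
import re
import xml.etree.ElementTree as ET

# Matches a DOCTYPE declaration without an internal subset, e.g.
# <!DOCTYPE collection SYSTEM 'BioC.dtd'>
# Working on bytes lets the parser honour the encoding declared in the XML header.
DOCTYPE_RE = re.compile(rb"<!DOCTYPE[^>\[]*>")


def strip_doctype(xml_bytes: bytes) -> bytes:
    """Return the document with its first DOCTYPE declaration removed."""
    return DOCTYPE_RE.sub(b"", xml_bytes, count=1)


def parse_without_doctype(xml_bytes: bytes) -> ET.Element:
    """Parse XML after dropping the DOCTYPE in memory; the source is not modified."""
    return ET.fromstring(strip_doctype(xml_bytes))


# Hypothetical sample that mimics the header shown above.
sample = (
    b"<?xml version='1.0' encoding='UTF-8'?>\n"
    b"<!DOCTYPE collection SYSTEM 'BioC.dtd'>\n"
    b"<collection><document><id>9950</id></document></collection>"
)

root = parse_without_doctype(sample)
print(root.tag)
print(root.findtext("document/id"))
```

For real files, the bytes would come from something like `pathlib.Path(path).read_bytes()` instead of the inline sample.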

Is there a way to ignore this error in the XML Reader node or supply a DTD file to the reader?

The full XML file is attached:
9950.BioC.XML (2.8 MB)

That’s odd. On my system there are no errors.

What version of KNIME are you using?

You can read the file using the File Reader node, concatenate the rows using GroupBy, and then convert the result to XML using String to XML. You may also need to edit the first line after reading the file.

Thanks @elsamuel! You are right, I can read the file.

To be more specific: I get the error when I try to run an XPath query in the XML Reader node!

Thanks @armingrudd!

I think the best solution is a combination of both of your comments: Read the full XML with XML Reader but without an XPath in the Reader. This can be followed by an XPath node to do the XPath query.

The error only occurred when the XPath query was done in the XML Reader node.
