I have an XML file that does not link to an XSD namespace definition but contains this header:
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE collection SYSTEM 'BioC.dtd'>
<collection>
...
</collection>
I am used to XSD schema definitions and have not seen a DTD before. Also, the DTD is not referenced with a proper URI, only with a relative file name. When I try to read this file with the XML Reader node I get an error:
ERROR XML Reader 3:6 Execute failed: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,78]
I get an additional error message in German (not sure if that’s because my system language is German):
Message: Externe DTD: Lesen von externer DTD "" nicht erfolgreich, da "file"-Zugriff wegen der von der Eigenschaft "accessExternalDTD" festgelegten Einschränkung nicht zulässig ist.
This roughly translates to “reading of external DTD "" not successful, because 'file' access is not allowed due to the restriction set by the accessExternalDTD property”.
A simple solution is to remove <!DOCTYPE collection SYSTEM 'BioC.dtd'> from the XML file and read it without specified namespaces. However, I have several million of these files and would prefer not to change them.
Is there a way to ignore this error in the XML Reader node or supply a DTD file to the reader?
The full XML file is attached:
9950.BioC.XML (2.8 MB)
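For context, this is what I would expect to work in plain Java outside of KNIME. It is only a minimal sketch, assuming the node uses the JDK's built-in StAX parser (which the javax.xml.stream exception suggests). The class and method names are made up for illustration, and I do not know whether the XML Reader node exposes an equivalent setting:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of KNIME.
class DtdSkippingReader {

    // Counts start elements while leaving the external DTD unread.
    static int countStartElements(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Keep the parser from loading the DTD named in the DOCTYPE line.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        // Also do not resolve external entities.
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            int count = 0;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    count++;
                }
            }
            reader.close();
            return count;
        } catch (XMLStreamException e) {
            // Wrapped so callers do not need to handle a checked exception.
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Same header as in my files, with a tiny stand-in body.
        String xml = "<?xml version='1.0' encoding='UTF-8'?>\n"
                + "<!DOCTYPE collection SYSTEM 'BioC.dtd'>\n"
                + "<collection><document/></collection>";
        System.out.println(countStartElements(xml));
    }
}
```

With SUPPORT_DTD switched off the DOCTYPE line can stay in the file, so the files themselves would not need to be modified. The trade-off is that any entities declared only in BioC.dtd would not be resolved.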