I am trying to import selected tag/attribute values from OpenStreetMap data (Geofabrik GmbH) into a PostgreSQL database and am already stumbling at the very beginning.
I use the XML Reader node with its XPath filter. I am aware that only XPath 1.0 is supported, and I tested my XPath expression
/osm/node/tag[@k='place' and (@v='city' or @v='town' or @v='village' or @v='hamlet' or @v='suburb' or @v='locality' or @v='country' or @v='island' or @v='islet')]/…
against it. XMLBlueprint 16 tells me that the XPath is correct and that for my Malta test file it should return 11 node tags.
I have no clue what I am doing wrong, or even where to start digging to fix it. Does anybody have experience with this?
Maybe I am getting somewhere. I just found the following in the XML Reader description.
A limited XPath syntax is supported.
I could not find anything more about the XML Reader node in the help. Am I now supposed to find out by trial and error what is supported and what is not? I strongly feel the description/documentation needs improvement at this point.
For now, I am concentrating on an external XSLT to produce a CSV from the XML.
Hi,
I suggest you read in the whole file using the XML Reader (without any filter) and then use the XPath node to extract the desired nodes. I have attached a workflow as reference. Hope it helps!
Kind regards,
Alexander
Hi,
Okay, that’s not very feasible then. Maybe you can preprocess the XML with a Line Reader and a CSV Writer in streaming mode? In the Line Reader you can use the following regex to keep only the relevant tags: ^\s*((.*[^\/]>)|(<tag.*))$. You have to enable the “Match input against regex” option. In the quote options of the CSV Writer, you have to select to never add quotes. Instead of CSV, just select XML as the output file extension.
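To see what the regex keeps and drops before wiring it into a workflow, it can be tried outside KNIME. The following is only a minimal Python sketch, assuming the file has one XML element per line as in the Geofabrik extracts; the function name filter_lines and the sample lines in the usage note are made up for illustration and are not part of KNIME.

```python
import re

# Regex quoted from the post above: keep any line whose final ">" is not
# preceded by "/" (i.e. not self-closing), plus any line starting with "<tag".
KEEP = re.compile(r'^\s*((.*[^\/]>)|(<tag.*))$')


def filter_lines(lines):
    """Yield only the lines that match the regex, one at a time.

    `lines` can be any iterable of strings, e.g. an open file object,
    so the whole file never has to be held in memory.
    """
    for line in lines:
        stripped = line.rstrip("\r\n")
        if KEEP.match(stripped):
            yield stripped
```

Fed the lines `<node id="1" lat="35.9" lon="14.5"/>` and `<tag k="place" v="city"/>`, the generator drops the first (a self-closing element that is not a tag) and keeps the second. Opening and closing lines such as `<node ...>` and `</node>` are kept as well, so the reduced output is still well-formed XML.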
Kind regards,
Alexander
Thanks for your suggestion. I have actually created an XSLT, but on my hardware it fails because it runs out of memory. So I shall experiment with a file reader of some sort. The problem is that, if I want to avoid reading the entire file into KNIME, I need to capture the entire desired tag together with its content at the same time. I will see.
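For reference, outside KNIME the same idea (capture each node with its tags, without loading the whole file) could be sketched with a streaming parser. This is only an illustration using Python's standard xml.etree.ElementTree.iterparse, not something KNIME does internally; the function name iter_place_nodes is hypothetical, and the set of values is copied from the XPath expression above.

```python
import xml.etree.ElementTree as ET

# Values copied from the XPath expression in the first post.
PLACE_VALUES = {
    "city", "town", "village", "hamlet", "suburb",
    "locality", "country", "island", "islet",
}


def iter_place_nodes(source):
    """Yield (node_attributes, tags) for every <node> with a matching place tag.

    `source` is a file name or a binary file object. Elements are discarded
    as soon as they have been inspected, so memory use stays small even for
    large extracts.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the <osm> root
    for event, elem in context:
        if event != "end" or elem.tag not in ("node", "way", "relation"):
            continue
        if elem.tag == "node":
            tags = {t.get("k"): t.get("v") for t in elem.findall("tag")}
            if tags.get("place") in PLACE_VALUES:
                yield dict(elem.attrib), tags
        # Drop all finished children of the root to free memory.
        root.clear()
```

Each yielded pair holds the node's own attributes (id, lat, lon, ...) and a dictionary of its tag keys and values, which could then be written out as one CSV row per node for the PostgreSQL import.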
Hi,
Based on your example file, the line-by-line processing with KNIME streaming should work. If an XML element closes on the same line it opens, that line is dropped, unless it is a <tag> element. All other lines are kept.
Kind regards,
Alexander