I am trying to import select tag/attribute values from OpenStreetMap data (Geofabrik GmbH) into a PostgreSQL database and already stumble at the very beginning.
I use the XML Reader node with its XPath filter. I am aware that only XPath 1 is supported, and I tested my XPath expression
/osm/node/tag[@k='place' and (@v='city' or @v='town' or @v='village' or @v='hamlet' or @v='suburb' or @v='locality' or @v='country' or @v='island' or @v='islet')]/…
against it. XMLBlueprint 16 tells me that the XPath is correct and that, for my Malta test file, it should return 11 node tags.
I have no clue what I am doing wrong, nor even where to start digging to fix it. Does anybody have experience with this?
I use KNIME 4.1.2 on Windows 10: Load_OSM_Data.knwf (6.9 KB)
My test file can be found at Malta test file.
Oh, well, stupid of me not to describe the symptoms. The XML node keeps telling me that the result set produced when executing was an empty one.
I could narrow it down a bit. It appears to fail when the expression contains a condition on the attributes, i.e.
does not work whereas
produces a result set. I am rather puzzled about this.
Maybe I am getting somewhere. I just found the following in the XML Reader description.
A limited XPath syntax is supported.
I could not find anything about the XML Reader node in the help. Am I now to test out what is supported and what is not? I strongly feel description/documentation needs improvement at this point.
As it is now, I am concentrating on an external XSLT to produce a CSV from the XML.
I suggest you read in the whole file using the XML Reader (without any filter) and then use the XPath node to extract the desired nodes. I have attached a workflow as reference. Hope it helps!
Load_OSM_Data.knwf (6.5 KB)
Thank you for your suggestion. I am afraid this is not feasible due to the file size - around 9 GB (the development file is significantly smaller).
Okay, that’s not very feasible then. Maybe you can preprocess the XML with a Line Reader and a CSV Writer in streaming mode? In the Line Reader you can use the following Regex to keep only relevant tags:
^\s*((.*[^\/]>)|(<tag.*))$. You have to enable the “Match input against regex” option. In the CSV Writer, in the quote options, you have to select never to add quotes. Instead of CSV, just select XML as the output file extension.
Thanks for your suggestion. I have actually created an XSLT, but on my hardware it fails because it runs out of memory. So I shall experiment with a file reader of some sort. The problem is that, if I want to avoid reading the entire file into KNIME, I need to catch the entire desired tag together with its content at the same time. I will see.
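For what it is worth, catching a whole node together with its child tags without loading the full file can be sketched outside KNIME with Python's standard `xml.etree.ElementTree.iterparse`. This is only a rough sketch under assumptions: the function name `iter_place_nodes` is made up for illustration, the sample data is invented, and it presumes the usual OSM layout of `node` elements carrying `id`, `lat`, `lon` attributes and `tag` children with `k`/`v` attributes.

```python
import io
import xml.etree.ElementTree as ET

# Place values taken from the XPath expression in the first post.
PLACE_VALUES = {
    "city", "town", "village", "hamlet", "suburb",
    "locality", "country", "island", "islet",
}


def iter_place_nodes(source):
    """Yield one dict per <node> whose place tag has a wanted value.

    `source` is a file name or a binary file object. The document is
    parsed incrementally, so memory use stays roughly constant.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the <osm> root
    for event, elem in context:
        # Only act once a top-level OSM element is complete.
        if event != "end" or elem.tag not in ("node", "way", "relation"):
            continue
        if elem.tag == "node":
            tags = {t.get("k"): t.get("v") for t in elem.findall("tag")}
            if tags.get("place") in PLACE_VALUES:
                yield {
                    "id": elem.get("id"),
                    "lat": elem.get("lat"),
                    "lon": elem.get("lon"),
                    "place": tags["place"],
                }
        # Detach finished elements from the root so they can be freed.
        root.clear()


# Invented sample data, just to show the call.
sample = io.BytesIO(b"""<?xml version='1.0' encoding='UTF-8'?>
<osm version="0.6">
 <node id="1" lat="35.9" lon="14.5">
  <tag k="place" v="city"/>
 </node>
 <node id="2" lat="35.8" lon="14.4"/>
 <node id="3" lat="35.7" lon="14.3">
  <tag k="place" v="farm"/>
 </node>
</osm>""")
rows = list(iter_place_nodes(sample))
```

The resulting rows could then be written to a CSV file or inserted into PostgreSQL; that part is left out here.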
Based on your example file, line-by-line processing with KNIME streaming should work. If an XML element closes on the same line, it is ignored, unless it is a tag element. All other elements are kept.
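To see what the regex keeps and drops, here is a small sketch in Python rather than KNIME. It assumes, as in the example file, that every element sits on its own line; the helper names `keep_relevant_lines` and `filter_file` and the sample lines are invented for illustration.

```python
import re
from typing import Iterable, Iterator

# The regex from the post above: keep lines that do not end in "/>",
# plus every line that starts with a <tag element.
KEEP = re.compile(r'^\s*((.*[^\/]>)|(<tag.*))$')


def keep_relevant_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines the regex matches, without line endings."""
    for line in lines:
        stripped = line.rstrip("\r\n")
        if KEEP.match(stripped):
            yield stripped


def filter_file(src_path: str, dst_path: str) -> None:
    """Stream src_path line by line and write the kept lines to dst_path."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in keep_relevant_lines(src):
            dst.write(line + "\n")


# Invented sample lines in the shape of an OSM file.
sample = [
    '<osm version="0.6">',
    ' <node id="1" lat="35.9" lon="14.5">',
    '  <tag k="place" v="city"/>',
    ' </node>',
    ' <node id="2" lat="35.8" lon="14.4"/>',
    ' <way id="3">',
    '  <nd ref="1"/>',
    ' </way>',
    '</osm>',
]
kept = list(keep_relevant_lines(sample))
```

In this sample the self-closing `node` and the `nd` line are dropped, while the opening and closing elements and the `tag` line survive, so the output is still well-formed XML but smaller.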