Import XML data

Hi everyone,

I'm looking for a way to import an XML file of about 880 MB into KNIME. (http://download.swissmedicinfo.ch/?Lang=DE)

It is the official description of the medicines approved in Switzerland, and I would like to extract information from it (text mining). I got the following message: Execute failed: Java heap space

If that's not possible, I thought of another way to get the information. (My KNIME knowledge is unfortunately based on two days of experience... :) ) The structure of each article is always the same, so maybe there is a way in KNIME to import only part of the document (with a filter on column A of the following list: https://www.swissmedic.ch/arzneimittel/00156/00221/00222/00230/index.html?lang=fr&download=NHzLpZeg7t,lnp6I0NTU042l2Z6ln1ae2IZn4Z2qZpnO2Yuq2Z6gpJCDdHx7hGym162epYbg2c_JjKbNoKSn6A--) and analyse each article separately?

I'm looking forward to your answers!

Cheers

 

You cannot read such a large file into memory as a single item: processing XML typically requires several times the file size in memory. However, you can split the file while reading it and create much smaller documents, e.g. one per medicalInformation element. This results in ~22,000 rows, which are much easier to process further down the workflow anyway. Simply specify /medicalInformations/medicalInformation as the XPath filter in the XML Reader's dialog.
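
To illustrate the idea of splitting while reading, here is a minimal Python sketch of the same streaming approach outside KNIME. It is not how the XML Reader node is implemented, only an example of the technique. It assumes the element names from the XPath above (medicalInformations as the root, medicalInformation as its direct children) and no XML namespace on those elements; the function name iter_records is made up for this example.

```python
import xml.etree.ElementTree as ET


def iter_records(source, record_tag="medicalInformation"):
    """Yield each direct child <record_tag> of the root as its own XML string.

    `source` is a file path or a binary file-like object. The file is parsed
    incrementally, so only one record is held in memory at a time.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event: start of the root element
    depth = 1                # we are now inside the root
    for event, elem in context:
        if event == "start":
            depth += 1
            continue
        depth -= 1
        # depth == 1 means a direct child of the root has just been closed
        if depth == 1 and elem.tag == record_tag:
            yield ET.tostring(elem, encoding="unicode")
            # Drop the finished record from the tree so memory stays flat
            root.clear()
```

Looping over `iter_records(path_to_the_xml_file)` then gives one small document per medicine, which can be filtered or analysed one at a time instead of loading all 880 MB at once. If the real file declares a namespace, the tag comparison would need the `{namespace}medicalInformation` form.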

Thanks a lot Thor for your quick answer: first step is done :)