Problem with XML Reader

Martin2024 · December 30, 2024, 11:09am

Hello everyone,
I’m trying to read a 1.4 GB XML file with the XML Reader node. Unfortunately it doesn’t work; I get the error message “Execute failed: Java heap space”.

My laptop only has 16 GB RAM, but it doesn’t work for a colleague with 32 GB RAM either.

Reading the file with the File Reader (Complex Format) node does work, but then I can’t use an XPath node afterwards.

Is there a bug in the XML Reader? Does anyone have a solution for me?

Thank you for your support.

Best regards
Martin

mlauber71 · December 30, 2024, 12:09pm

@Martin2024 welcome to the KNIME forum. Can you tell us how much RAM you have allocated to KNIME?

You could also try switching the storage of internal data to the Columnar Backend.

Martin2024 · December 30, 2024, 12:33pm

Hello @mlauber71 ,

I made the adjustment - unfortunately the same result.

I have 10 GB of RAM allocated for KNIME in the ini file.
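
For readers who want to double-check this setting: the heap limit is the JVM `-Xmx` line in `knime.ini`. The following is only a minimal Python sketch for reading that value from the file's text. The helper name `read_max_heap` and the sample excerpt are assumptions made for illustration, and a real `knime.ini` contains many more options.

```python
import re


def read_max_heap(ini_text):
    """Return the value of the JVM -Xmx option (e.g. '10g'), or None if absent."""
    for line in ini_text.splitlines():
        match = re.fullmatch(r"-Xmx(\d+[kKmMgG]?)", line.strip())
        if match:
            return match.group(1)
    return None


# Hypothetical excerpt of a knime.ini; the option on the last line is made up.
sample_ini = """-vmargs
-Xmx10g
-Dsome.hypothetical.option=true
"""

print(read_max_heap(sample_ini))
```

To inspect a real installation, pass the contents of the actual `knime.ini` (its location depends on the installation) instead of `sample_ini`.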

McReady · January 1, 2025, 9:53am

Have you tried it on your colleague’s 32 GB laptop with more RAM allocated (e.g. 28 GB in the ini)?

And if it works with the “File Reader (Complex Format)”: do you get an XML or a string result? If it is a string, you can try the “String to XML” or “Column to XML” node and use the XPath node afterwards.

qqilihq · January 1, 2025, 7:49pm

If possible, use the integrated “XPath query” option in the XML Reader node to split your document into smaller chunks - you can then process them row-wise with subsequent nodes. This is different from reading one huge document into a single data cell and then chunking it with a following XPath node, as each document cell’s DOM must fit into memory.

-Philipp
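
The difference between streaming a document in chunks and loading it as one DOM can be illustrated outside KNIME. The following is a minimal Python sketch of the general idea, not the XML Reader's implementation: it walks the document incrementally and emits each record element as its own small XML string, so only one record needs to be in memory at a time. The element names `orders` and `order`, the helper `iter_records`, and the sample data are all assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, tag):
    """Yield each <tag> element as its own small XML string, one at a time."""
    # iterparse reads the input incrementally instead of building one big tree.
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield ET.tostring(elem, encoding="unicode")
            root.clear()  # drop already processed children to keep memory low


# Hypothetical sample data standing in for the large file.
sample = (
    b"<orders>"
    b"<order id='1'><item>A</item></order>"
    b"<order id='2'><item>B</item></order>"
    b"</orders>"
)

chunks = list(iter_records(io.BytesIO(sample), "order"))
for chunk in chunks:
    print(chunk)
```

For a real file, a path or an open binary file object can be passed instead of the `BytesIO` wrapper. Each yielded chunk corresponds roughly to one output row in the row-wise approach described above.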

Martin2024 · January 2, 2025, 7:12am

Yes, with the same result.

I can no longer recover the original XML structure this way, so that approach doesn’t work for me.

Martin2024 · January 2, 2025, 8:21am

Hello Philipp,
I wanted to try that, but I don’t know exactly how to use the XPath query here; I don’t have the necessary knowledge.

Can you perhaps explain this to me with an example?

mlauber71 · January 3, 2025, 9:46am

@Martin2024 maybe you can give this example a try. The XML is initially 500 MB and does work on my M1 machine with 15 GB assigned to KNIME 5.4 and Columnar Backend activated for the workflow.

Handling paths for XML or JSON takes some trial and error unless you are very familiar with the structures (in the case of JSON, some ungrouping might also be necessary; cf. the additional links).

Martin2024 · January 6, 2025, 10:11am

Hello,
I discovered an error in my XML file. After removing the faulty line, I was able to use the XPath filter in the XML Reader node. This means I can now read the large file and process it further.
Thanks again to everyone for their support.
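
The thread does not say how the faulty line was found. One possible way to locate such a problem, assuming it is a well-formedness error (for example a mismatched tag), is to stream through the file with a parser and report where it stops. This is only a Python sketch; the helper name `find_xml_error` and the sample document are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def find_xml_error(source):
    """Return (line, column, message) of the first parse error, or None."""
    try:
        # Stream through the document and free each element's content
        # once it has been read, so a large file is not held in memory.
        for _, elem in ET.iterparse(source, events=("end",)):
            elem.clear()
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


# Hypothetical broken document: <b> is closed with </a> on line 3.
broken = b"<root>\n<a>ok</a>\n<b>broken</a>\n</root>"

print(find_xml_error(io.BytesIO(broken)))
```

A result of `None` means the parser reached the end without complaint. It does not rule out problems that are not about well-formedness, such as unexpected content in otherwise valid XML.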

system · January 13, 2025, 10:11am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.