XPath performance

I have been able to parse XML embedded in text files from an SEC form. However, as I ramp up the number of forms to be processed, the XPath node's performance degrades sharply. Once the batch size goes above 100, XPath processing gets stuck for hours without making progress. In the attached workflow I have 3000+ forms, and the XPath node doesn’t move forward at all. Any ideas on how this could be done better?

NCEN_Meta_full.xlsx (880.9 KB)
N_CEN_Parsing.knwf (32.7 KB)

Hi @leahxu -

I downloaded and ran your workflow on my system, which runs KNIME 4.0 on Windows 10 with 16 GB of RAM. What version of KNIME are you running, and on which OS?

The XPath node ran pretty quickly - by far the bottleneck for me was the GET Request node. Here are the abbreviated results from the Timer Info node:

[Screenshot: Timer Info node output table]

As for tweaks that might help you - how much RAM do you have available to KNIME right now? You can increase this value by editing the knime.ini file as described in our FAQ.
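
To make the knime.ini suggestion a bit more concrete: the memory limit is the JVM's -Xmx option, which sits on its own line in knime.ini (for example -Xmx6g). Here is a minimal Python sketch for checking or changing that line. The sample file contents and the helper names read_max_heap and set_max_heap are made up for illustration; editing the file by hand in a text editor works just as well, and the change only takes effect after KNIME is restarted.

```python
import re

# Hypothetical knime.ini contents, for illustration only.
# A real knime.ini contains more lines than this.
SAMPLE_INI = "-vmargs\n-Xmx6g\n"

# Matches a whole line such as -Xmx6g or -Xmx2048m
XMX_PATTERN = re.compile(r"-Xmx(\d+[kKmMgG]?)")


def read_max_heap(ini_text):
    """Return the value of the -Xmx option (e.g. '6g'), or None if absent."""
    for line in ini_text.splitlines():
        match = XMX_PATTERN.fullmatch(line.strip())
        if match:
            return match.group(1)
    return None


def set_max_heap(ini_text, new_value):
    """Return ini_text with any -Xmx line replaced by -Xmx<new_value>."""
    lines = [
        f"-Xmx{new_value}" if XMX_PATTERN.fullmatch(line.strip()) else line
        for line in ini_text.splitlines()
    ]
    return "\n".join(lines) + "\n"


print(read_max_heap(SAMPLE_INI))
print(set_max_heap(SAMPLE_INI, "10g"))
```

Keep a backup copy of knime.ini before changing it, and leave enough memory for the operating system when choosing the new value.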

Another thing you might consider is wrapping the XPath node in a Chunk Loop, as shown on the Hub here.
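
In case the idea behind chunking is unclear, here is a rough Python sketch of it outside of KNIME. This is not how the Chunk Loop or the XPath node is implemented; it only illustrates handling a fixed number of documents at a time so that the working set stays bounded. The element names (form, fund, name) and the function names are invented for the example, and note that Python's ElementTree supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def extract_in_chunks(xml_docs, path, chunk_size=100):
    """Yield one list of results per chunk of documents.

    Each result is the list of text values matched by `path` in one document.
    Only the current chunk's results are held by this generator at a time.
    """
    for chunk in chunked(xml_docs, chunk_size):
        chunk_results = []
        for doc in chunk:
            root = ET.fromstring(doc)
            chunk_results.append([el.text for el in root.findall(path)])
        yield chunk_results


# Hypothetical documents; real N-CEN filings look different.
docs = [
    "<form><fund><name>A</name></fund></form>",
    "<form><fund><name>B</name></fund><fund><name>C</name></fund></form>",
    "<form><fund><name>D</name></fund></form>",
]

for index, results in enumerate(extract_in_chunks(docs, "./fund/name", chunk_size=2)):
    print(index, results)
```

Because results come back one chunk at a time, each chunk can be written out or aggregated before the next one is parsed.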

If neither of those options works for you, we could also try streaming. Let me know what you think.

Thanks for the quick response. I’m running KNIME 3.7 on Windows 7 with 12 GB of RAM. I checked the knime.ini file; is RAM configured by Xmx…? If so, it was around 6g. I’m aware of the GET Request execution time, but XPath on my instance took a lot longer (if it ever finishes). I also had issues with the Data Explorer node, with an error complaining about heap size. I’m going to increase Xmx to 10g and see how that goes first.
Thanks.

Scott, unrelated to the question itself: we’ve been trying to contact the KNIME U.S. team about KNIME Server but haven’t heard back. Since you work at the Austin location, could you please provide an email address or phone number that I can pass to our procurement department?

Sorry about that - several of our business development folks are out this week. I’ve passed on your message to a few folks internally. If you don’t hear from someone soon, please contact me at scott.fincher@knime.com, and I’ll get you connected with someone who can help.

Hi @leahxu,

I’m sorry you’ve had a hard time contacting someone on the US team. Please send me an email at jim.falgout@knime.com and we can set up a time to talk.

Cheers,
Jim
