“Execute failed: ("StackOverflowError"): null” on the HTML parser

Thanks for that explanation @qqilihq! That makes sense, and now I know in which kinds of situations I should be more careful.

I’m getting a new error, “Execute failed: ("StackOverflowError"): null”, on the HTML parser. Do you have any idea how I can fix it? Or where I can find out what the different error messages mean? Also, I’m not sure about thread etiquette. Perhaps I should start a new thread for this, as it’s fairly unrelated to my original question.

I’ve isolated the problem to 4 URLs in my current list. They are all PDF documents, but none of them has “pdf” in the URL. They all have “View” or “Preview” in the URL, so I could filter on that, but I suspect that would also exclude valid pages. Do you know of a more elegant way to exclude these kinds of results in the future, before they reach the HTML parser?

http://www.pilotpointlibrary.org/DocumentCenter/View/2281/2017-2018-CAFR
https://neptunebeachfl.civicclerk.com/web/UserControls/DocPreview.aspx?p=1&aoid=33
http://www.garlandtx.gov/DocumentCenter/View/5526/0724-Fiirefighter-Recruit?bidId=
http://bonnieandclydedays.org/AgendaCenter/ViewFile/Agenda/_04082019-764

Moved to a new topic for the reason you stated yourself 🙂
Br,
Ivan

Hi stevelp,

I suggest the following combination for that:

  1. Set a maximum download size in the HTTP Retriever. The download stops once that size has been reached. Set it to, e.g., 0.5 MB (or even less, depending on your dataset).

  2. Use the Content-Type HTTP header to filter out non-HTML files (PDFs, for example, are typically served as application/pdf). You can extract this header using the HTTP Result Data Extractor node.
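For readers working outside KNIME, the same two safeguards can be sketched in Python with the standard library’s urllib. This is only an illustration of the idea, not how the HTTP Retriever works internally. The names is_html, read_limited and fetch_html, the 512 KiB limit and the allow-list of media types are all assumptions made for this sketch.

```python
import urllib.request

# Hypothetical size limit, roughly the 0.5 MB suggested above.
MAX_BYTES = 512 * 1024

# Assumed allow-list of media types treated as HTML; everything else
# (application/pdf, application/octet-stream, ...) is skipped.
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def is_html(content_type):
    """Return True if a Content-Type header value denotes an HTML document.

    Parameters such as "; charset=utf-8" are ignored and the comparison is
    case-insensitive. A missing or empty header counts as "not HTML".
    """
    if not content_type:
        return False
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in HTML_TYPES


def read_limited(stream, max_bytes=MAX_BYTES):
    """Read at most max_bytes from a file-like object and stop there."""
    return stream.read(max_bytes)


def fetch_html(url, max_bytes=MAX_BYTES, timeout=10):
    """Return the (possibly truncated) body of url if it is HTML, else None.

    The Content-Type header is checked before the body is read, so a PDF
    behind a URL such as ".../DocumentCenter/View/..." is never passed on.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        if not is_html(resp.headers.get("Content-Type")):
            return None
        return read_limited(resp, max_bytes)
```

Checking against an allow-list of HTML types, rather than looking for “pdf” in the URL, also catches documents served under neutral paths or generic types such as application/octet-stream.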

I have prepared an example for you which you can find on my personal NodePit Space:

https://nodepit.com/workflow/com.nodepit.space%2Fqqilihq%2Fpublic%2Fforum%2Fexecute-failed-stackoverflowerror-null-on-the-html-parser-23567.knwf

Hope this helps!
Philipp

Awesome, thanks Philipp! I also lowered the default socket timeout from 60 to 10 seconds. I tried it on a few different pages, and it seemed to work fine with that limit.
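As a rough Python analogue of that setting, urllib’s timeout argument is a socket timeout. It applies to each blocking socket operation (connecting, each read), not to the request as a whole. The sketch below assumes a batch loop that records None for any URL that times out or fails, so one silent server does not stall the whole list. The name fetch_all and the default values are hypothetical.

```python
import urllib.request

# Assumed value, matching the 10 s socket timeout mentioned above.
SOCKET_TIMEOUT_S = 10


def fetch_all(urls, timeout=SOCKET_TIMEOUT_S, max_bytes=512 * 1024):
    """Map each URL to its first max_bytes of body, or to None on failure.

    Because the timeout bounds each socket operation, a server that stops
    responding is abandoned after about `timeout` seconds of silence
    instead of blocking the remaining URLs.
    """
    results = {}
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                results[url] = resp.read(max_bytes)
        except OSError as exc:
            # socket.timeout and urllib.error.URLError (refused connection,
            # DNS failure, HTTP error status) are both OSError subclasses.
            print(f"skipping {url}: {exc}")
            results[url] = None
    return results
```

Skipping instead of retrying keeps the sketch short. A real workflow might log these URLs and retry them later with a longer timeout.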
