Handling HttpRetriever ConcurrentModificationException Error

Edlueze · March 22, 2016, 3:25am

Two questions:

1. How do I prevent the ConcurrentModificationException error when using the Palladian HttpRetriever Node?

2. If I can't prevent it, how do I retry the HttpRetriever Node by wrapping it in a Try-Catch block?

Some background: I have a very large workflow for crawling website data. It takes about 6 hours to run overnight, but soon I'll want it to run continuously. Any problem that stops the workflow is fatal, so I am trying desperately to fix even the smallest issues.

Last week I finally upgraded from KNIME 2.12.1 to KNIME 3.1.1, and I've started to notice the workflow occasionally failing at the Palladian HttpRetriever node, throwing a ConcurrentModificationException. When I crawl a webpage I do it in parallel, with several HttpRetriever nodes issuing GET requests at the same time (just like a regular browser loads a webpage from many different sources).

In the past I've wrapped problematic nodes with a looping Try-Catch block to have the workflow retry rather than stop. This is how I handled an intermittent problem writing to an overworked database:

Retry Database Writer Until Success (*)
https://tech.knime.org/forum/knime-users/how-to-handle-the-15-minutes-disconnect-from-twitter-with-knime-twitter-workflow

(*) I posted my workflow at the bottom of the thread "How to handle the 15 minutes disconnect from twitter with knime twitter workflow"

Basically what I do is wrap the problematic node in a Try-Catch and then wrap that in a Recursive-Loop that loops until the node successfully completes what it was supposed to do. Rather complicated, but more on that later.
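In plain Java, the "retry until success" logic that the Try-Catch plus Recursive Loop construction emulates is just a bounded loop around a call. This is only an illustration of that logic: `Retry.untilSuccess` is a hypothetical helper, not a KNIME or Palladian API, and the attempt cap and fixed delay are assumptions (the Recursive Loop described above keeps looping until success, with no cap).

```java
import java.util.concurrent.Callable;

// Hypothetical helper illustrating "On Error Try Again"; not a KNIME API.
final class Retry {
    // Runs the task until it succeeds or maxAttempts is reached,
    // sleeping delayMillis between failed attempts.
    static <T> T untilSuccess(Callable<T> task, int maxAttempts, long delayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();          // success: leave the loop immediately
            } catch (Exception e) {
                last = e;                    // remember the failure, like the Catch branch
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        throw last;                          // give up and surface the last error
    }
}
```

Capping the attempts is a deliberate choice: an unbounded retry turns a permanent failure (for example a bad URL) into a workflow that never finishes.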

Unfortunately this technique doesn't work with the HttpRetriever Node because (I think) it is a two-port node - not a one-port node like the Database Writer. Consequently my other downstream nodes now complain that "Loop start and end nodes are not in the same workflow".

This is my theory about what is happening. The Try node pushes a call-block onto the FlowVariable stack. The Catch node then pops that call-block off the top of the stack and lets the workflow carry on. BUT the FlowVariables coming out of the second data port of the HttpRetriever node (the Cookies port) are still infected with the Try call-block and never pass through a Catch. This confuses the downstream Outer Loop End, which complains that "Loop start and end nodes are not in the same workflow".
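The theory can be made concrete with a toy model of scope markers. This only illustrates the reasoning in the paragraph above, under the assumption that each output port carries its own copy of the flow-variable stack; `ScopeModel` and its methods are hypothetical and do not reflect how KNIME's flow object stack is actually implemented.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of the theory above; NOT KNIME's actual implementation.
final class ScopeModel {
    // Returns a copy of the stack with a scope marker pushed (e.g. by a Try node).
    static Deque<String> push(Deque<String> stack, String scope) {
        Deque<String> copy = new ArrayDeque<>(stack);
        copy.push(scope);
        return copy;
    }

    // Returns a copy with the given scope popped (e.g. a Catch closing a Try).
    static Deque<String> pop(Deque<String> stack, String scope) {
        Deque<String> copy = new ArrayDeque<>(stack);
        if (!scope.equals(copy.peek())) {
            throw new IllegalStateException("unbalanced scope: " + scope);
        }
        copy.pop();
        return copy;
    }

    // A loop end only accepts input whose innermost open scope is its loop start.
    static boolean loopEndAccepts(Deque<String> stack) {
        return "loop".equals(stack.peek());
    }
}
```

With `loop` and then `try` pushed, the data port that goes through the Catch is back at `loop` and satisfies the loop end, while the cookie port still has `try` on top and is rejected, which is the mismatch the error message describes.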

It's complicated so I've attached some pictures. You can see the Try-Catch block around the HttpRetriever in the Inner Loop. And you can see the "Loop start and end nodes are not in the same workflow" error in the Outer Loop.

To test my theory I tried adding another Catch to the second port of the HttpRetriever node. This is a bit weird, but I've had some luck with this sort of thing in the past. No luck this time. Note, though, that the infection might still be leaking through the Recursive Loop End port.

But looking at the big picture, this is all way too complicated for what I'm trying to do. All I want is an "On Error Try Again" loop which should be two nodes. Right now I'm up to 18 nodes and a very complicated mess!

Suggestions?

qqilihq · March 22, 2016, 9:30am

That exception should not happen in the first place. Please let me know:

1. The exact Palladian version you are using, as shown in the "About KNIME" menu.
2. The stack trace when the exception occurs (as shown when DEBUG logging is enabled).

Edlueze · March 22, 2016, 2:58pm

1. The version of Palladian is:

Palladian for KNIME
1.6.100.v201512091814
ws.palladian.nodes.feature.feature.group
palladian.ws; Philipp Katz, Klemens Muthmann, David Urbansky.

2. DEBUG logging is on. I'll let you know when I next catch the error.

Thanks!

qqilihq · March 22, 2016, 4:45pm

Hi Edlueze,

thank you for the info. That version looks quite old. We have had a similar bug report about a ConcurrentModificationException in the meantime, and there should already be a fix for it. Could you please update the Palladian Nodes to a recent version and see whether this solves your issue?

In case you're still having the problem, the log output would be superb!

Best,
Philipp

Edlueze · March 23, 2016, 1:11am

Fantastic! My KNIME 3.1.1 was a clean download-and-install so it never occurred to me that the Palladian Nodes had already been updated. But I'll continue to keep a lookout for the error.

BTW, did you have any thoughts on my proposal for a general-purpose "On Error Try Again" pair of loop nodes in KNIME? There seem to be several use cases: (a) handling a 15-minute disconnect from Twitter, (b) writing to an overworked database, (c) handling nodes that intermittently fail (I had trouble with the Vernalis Wait-for-time node, which occasionally failed if it missed the precise timeslice it was waiting for, and that makes the node unusable). In short, any time the user would prefer the workflow to keep going rather than stop for no good reason.

If you think it's a fair idea I'll track down the KNIME feature request list and post it there (or do the KNIME guys read the Palladian thread?).

Edlueze · March 23, 2016, 7:01am

My crawler was still running when I got your suggestion to update - so I just left it running in case I hit the ConcurrentModificationException. And I did!

The crawler had been running for about 14 hours when it suddenly stopped (extra slow because of all the logging).

I will send the entire log file via this forum to your private email (it's rather large). But here are some highlights from the knime.log:

2016-03-23 12:40:43,014 : DEBUG : KNIME-Worker-71 : DefaultHttpClient : HttpRetriever : 0:1409:1115:395 : Socket Closed
java.net.SocketException: Socket Closed

Exception java.net.SocketException: Socket Closed for URL ...

ws.palladian.retrieval.HttpException: Exception java.net.SocketException: Socket Closed for URL ...

Caused by: java.net.SocketException: Socket Closed
	at java.net.SocketInputStream.socketRead0(Native Method)

HttpRetrieverCellFactory : HttpRetriever : 0:1409:1115:395 : Error retrieving:
Exception java.lang.IllegalStateException: Connection is not open for URL: Connection is not open

ws.palladian.retrieval.HttpException: Exception java.lang.IllegalStateException: Connection is not open for URL: Connection is not open
at ws.palladian.retrieval.HttpRetriever.execute(HttpRetriever.java:419)

Caused by: java.lang.IllegalStateException: Connection is not open
	at org.apache.http.impl.SocketHttpClientConnection.assertOpen(SocketHttpClientConnection.java:84)

2016-03-23 12:40:43,014 : ERROR : KNIME-Worker-64 : HttpRetriever : HttpRetriever : 0:1409:1115:395 : Execute failed: ("ConcurrentModificationException"): null
2016-03-23 12:40:43,014 : DEBUG : KNIME-Worker-64 : HttpRetriever : HttpRetriever : 0:1409:1115:395 : Execute failed: ("ConcurrentModificationException"): null
java.util.ConcurrentModificationException
	at java.util.HashMap$HashIterator.nextNode(HashMap.java:1429)
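The last frame, `HashMap$HashIterator.nextNode`, is where `java.util.HashMap`'s fail-fast iterator checks whether the map was structurally modified after iteration started. The following minimal, single-threaded sketch only illustrates that mechanism; it is not Palladian's code, and the actual cause inside the HttpRetriever is not visible in this log. `CmeDemo` is a hypothetical class.

```java
import java.util.ConcurrentModificationException;
import java.util.Map;

// Illustration of a fail-fast iterator; NOT Palladian's code.
final class CmeDemo {
    // Puts two entries in the map, then adds a third while iterating.
    // Returns true if the iterator detected the modification.
    static boolean throwsCme(Map<String, String> map) {
        map.put("a", "1");
        map.put("b", "2");
        boolean first = true;
        try {
            for (String key : map.keySet()) {
                if (first) {
                    map.put("c", "3"); // structural change during iteration
                    first = false;
                }
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }
}
```

`CmeDemo.throwsCme(new HashMap<>())` returns `true`, while `CmeDemo.throwsCme(new ConcurrentHashMap<>())` returns `false`, because `ConcurrentHashMap` iterators are weakly consistent and never throw this exception. When several worker threads share one plain `HashMap`, the same check can fire nondeterministically, which would fit an error that appears only occasionally during parallel retrieval; the check is best-effort, though, so unsafe sharing can also corrupt data without any exception.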

qqilihq · March 23, 2016, 11:24am

Hi there,

thanks a bunch for the detailed error report. It looks pretty much like the issue which we recently fixed, so it definitely makes sense to give it another try with the updated Palladian nodes.

About the "try and wait" issue: for API requests, I could imagine simply checking the HTTP status codes (or, even better, parsing the quota HTTP headers, which show the remaining quota, if they are present; Twitter, for example, provides such header info, as documented in Twitter's API documentation) and looping and waiting until the request succeeds or new requests are available in the quota.
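A minimal sketch of the header-based waiting idea, assuming Twitter-style rate-limit headers named `x-rate-limit-remaining` and `x-rate-limit-reset` (the reset time given as Unix epoch seconds), as used by Twitter's REST API v1.1. `RateLimitWait` and the 15-minute fallback when no quota headers are present are assumptions for illustration, not part of the HttpRetriever.

```java
import java.util.Map;

// Hypothetical helper: how long to wait before the next API request.
final class RateLimitWait {
    static final String REMAINING = "x-rate-limit-remaining"; // assumed header name
    static final String RESET = "x-rate-limit-reset";         // assumed: epoch seconds
    static final long FALLBACK_MILLIS = 15 * 60 * 1000L;     // assumed: one 15-minute window

    static long millisToWait(Map<String, String> headers, long nowMillis) {
        String remaining = header(headers, REMAINING);
        String reset = header(headers, RESET);
        if (remaining == null || reset == null) {
            return FALLBACK_MILLIS;                      // no quota info: back off conservatively
        }
        if (Long.parseLong(remaining.trim()) > 0) {
            return 0L;                                   // quota left: request again right away
        }
        long resetMillis = Long.parseLong(reset.trim()) * 1000L;
        return Math.max(0L, resetMillis - nowMillis);   // wait until the window resets
    }

    // HTTP header names are case-insensitive, so compare ignoring case.
    private static String header(Map<String, String> headers, String name) {
        for (Map.Entry<String, String> e : headers.entrySet()) {
            if (e.getKey() != null && e.getKey().equalsIgnoreCase(name)) {
                return e.getValue();
            }
        }
        return null;
    }
}
```

With `x-rate-limit-remaining: 0` and a reset time 60 seconds in the future, `millisToWait` returns 60000; with quota left it returns 0. A loop would call it after each response and sleep for the returned time before the next request.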

In general, the HttpRetriever's philosophy is not to fail execution at all. Instead, the node outputs a "Missing cell" in case a network error occurred, or outputs the result with a 4xx or 5xx status code. To summarize, there are the following potential outputs:

2xx or 3xx status code ---> success
4xx or 5xx status code ---> retrieval worked (i.e. request was executed), but server returned an error
missing cell ---> usually network-specific error (i.e. request was not executed at all or failed in between)
node fails with red X ---> bug in the node, which should usually not happen :)
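The four cases can be written down as a small decision function that a downstream retry loop could use instead of a Try-Catch. The enum and `shouldRetry` are hypothetical; a missing cell is modelled as a `null` status code, and the retry policy (retry network errors, 429 Too Many Requests, and 5xx, but not other 4xx) is an assumption, not HttpRetriever behaviour. The fourth case, the node failing, is a bug and is deliberately not modelled.

```java
// Hypothetical mapping of HttpRetriever outputs to the cases listed above.
enum RetrievalOutcome {
    SUCCESS, HTTP_ERROR, NETWORK_ERROR;

    // A missing cell is represented as a null status code.
    static RetrievalOutcome of(Integer statusCode) {
        if (statusCode == null) return NETWORK_ERROR;
        if (statusCode >= 200 && statusCode < 400) return SUCCESS;
        if (statusCode >= 400 && statusCode < 600) return HTTP_ERROR;
        throw new IllegalArgumentException("unexpected status " + statusCode);
    }

    // Assumed retry policy: retry network errors, 429 and 5xx;
    // treat other 4xx (e.g. 404) as permanent.
    static boolean shouldRetry(Integer statusCode) {
        RetrievalOutcome outcome = of(statusCode);
        if (outcome == NETWORK_ERROR) return true;
        return outcome == HTTP_ERROR && (statusCode == 429 || statusCode >= 500);
    }
}
```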

So for usual circumstances, the try-and-catch error nodes should not be necessary when working with the HttpRetriever.

I like your idea about the "keep trying" loop node, but as I outlined, the HttpRetriever's behaviour would have to be adapted to cover all potential cases. This is something I will definitely keep in mind for future improvements. I would recommend posting it to the "General" forum, as I'm not sure the official KNIME guys look into the Palladian forum regularly. I'll also keep an eye on the topic and consider improving the HttpRetriever to integrate with such a node once it's available.

Best,
Philipp

qqilihq · March 23, 2016, 11:26am

PS: In case the error still occurs after updating, please get back to me. When I fixed it, I was not really able to reproduce the problem setting, and unfortunately I never heard back from the person who reported the issue.

Thanks!

Edlueze · March 24, 2016, 3:26am

Turns out I haven't completely escaped from the ConcurrentModificationException - but this time I think it is a KNIME issue and not a Palladian issue. You can follow my new thread here:

https://tech.knime.org/forum/knime-general/unresponsive-death-of-recursive-loop-end-2-ports


system · April 21, 2023, 9:40pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.