Unable to run Spark Predictor

Hi,

I am trying to run a Spark Predictor using a trained Spark Linear Regression Learner, but the Spark Predictor node fails every time I execute it.
This is the error printed in the KNIME console:
Execute failed: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://JobServer/user/data-manager#1019598049]] after [3000 ms]. Sender[null] sent message of type “spark.jobserver.DataManagerActor$StoreData”. (RestoredThrowable)

After inspecting /var/log/spark2-job-server/spark-job-server.log, I found this exception:
ERROR ka.actor.OneForOneStrategy [] [akka://JobServer/user/data-manager] - /tmp/spark2-job-server/upload/knime-sparkModel7134388612124129131.tmp-2018-07-12T14_46_39.707+07_00.dat (No such file or directory)
java.io.FileNotFoundException: /tmp/spark2-job-server/upload/knime-sparkModel7134388612124129131.tmp-2018-07-12T14_46_39.707+07_00.dat (No such file or directory)

If it helps, I also tried looking at /var/log/spark2-job-server//spark-job-server.log, but there was nothing in the log file relating to the exception (the log just ends at “Linear Regression With SGD done” and “Job finished OK”).

Hi @veee

in /var/log/spark2-job-server/ are there any folders called “jobserver-~”?

If yes: They contain the actual logs for the Spark contexts (one context = one folder). Please locate the one for the context with the failed job and send it to me via private message, so I can take a look.

If not, then Spark Jobserver is not correctly set up. Have you already taken a look at the installation guide?
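For anyone following along, here is a minimal Python sketch of how the per-context log folders could be listed, newest first, so the one belonging to the failed job is easier to spot. The helper name `list_context_log_dirs` and the `jobserver-` prefix match are assumptions for illustration, not part of Spark Jobserver or KNIME.

```python
import os

def list_context_log_dirs(log_root, prefix="jobserver-"):
    """Return sub-directories of log_root whose names start with prefix, newest first.

    Hypothetical helper: the prefix is an assumption based on the
    "jobserver-~" folder naming mentioned in this thread.
    """
    if not os.path.isdir(log_root):
        return []
    dirs = [
        os.path.join(log_root, name)
        for name in os.listdir(log_root)
        if name.startswith(prefix) and os.path.isdir(os.path.join(log_root, name))
    ]
    # Sort by modification time so the most recently touched context comes first.
    return sorted(dirs, key=os.path.getmtime, reverse=True)

if __name__ == "__main__":
    for path in list_context_log_dirs("/var/log/spark2-job-server"):
        print(path)
```

An empty result would correspond to the "If not" case above.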

Best,
Björn

Hi,

Yes, there is a jobserver-~ folder. What I meant to say in my previous post was that I looked at /var/log/spark2-job-server/jobserver-~/spark-job-server.log, but nothing in that log file relates to the exception.
This time, however, there are several additional messages containing “removed broadcast”.

How do I send a private message by the way?

Hi @veee

sorry for the long break on this.

How do I send a private message by the way?

It seems I have to send you a message first that you can reply to. Sorry for the confusion, I thought this was possible.

I wasn’t quite sure what to do with this problem, but now someone else has run into a similar issue. After doing a bit of digging through the jobserver code, I would say that the directory

/tmp/spark2-job-server/upload

does not exist yet, but the directory

/tmp/spark2-job-server/

exists but belongs to the “wrong” Linux user. This would explain the FileNotFoundException with “No such file or directory”.

As per the installation guide, jobserver should run as its own Linux user, e.g. “spark-job-server”, and all the directories accessed by jobserver need to belong to that user.
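To check this before changing anything, a small Python sketch like the following could report whether each directory exists and which Linux user owns it. The function name `describe_dir` is hypothetical, and “spark-job-server” is only the example user name from above; substitute whatever user your jobserver actually runs as.

```python
import os
import pwd

def describe_dir(path, expected_user):
    """Report whether path exists and whether it is owned by expected_user.

    Hypothetical helper for illustration; returns "missing", "ok",
    or "wrong owner: <name>".
    """
    if not os.path.exists(path):
        return "missing"
    uid = os.stat(path).st_uid
    try:
        owner = pwd.getpwuid(uid).pw_name
    except KeyError:
        # The uid has no passwd entry; fall back to the numeric id.
        owner = str(uid)
    return "ok" if owner == expected_user else "wrong owner: " + owner

if __name__ == "__main__":
    # Assumption: jobserver runs as the example user "spark-job-server".
    for d in ("/tmp/spark2-job-server", "/tmp/spark2-job-server/upload"):
        print(d, "->", describe_dir(d, "spark-job-server"))
```

If the parent directory shows a wrong owner and the upload directory is missing, that would match the explanation above.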

I would propose to stop jobserver, then delete the /tmp/spark2-job-server/ directory, and then to start jobserver again. Afterwards run the predictor job again to see whether that fixes the issue.

Björn

Hi Björn,

Sorry for the long break as well.
I finally solved the issue by restarting the Spark jobserver, without deleting the “/tmp/spark2-job-server/” directory. This issue has happened several times, but restarting the jobserver always did the trick.

Thanks a lot for your help!
