Run on AWS Workspace

Good morning,

Has anyone tried running KNIME on an AWS WorkSpace, a virtual machine in the cloud? Instance sizes range from 1 vCPU, 2 GiB Memory to 4 vCPU, 16 GiB Memory or a Graphics Instance with 8 vCPU, 15 GiB Memory, 1 GPU, 4 GiB Video Memory.

The purpose is to avoid maintaining a physical workstation in a local environment. Users would be able to access it much more easily, without local bandwidth limitations.

Thanks a lot for your thoughts
Mike

Hi Mike,

We haven’t tested that yet. We did put together the KNIME Analytics Platform AMI, available through the AWS Marketplace, which allows RDP access to a Windows Server 2016 instance. That does improve the bandwidth issues, but of course it results in a desktop that needs managing. https://aws.amazon.com/marketplace/pp/B071ZNNLC6

I’ll let you know if we plan something more in the direction of AWS WorkSpace in the future.

Best,

Jon

Hi Jon,

Thanks a lot for your reply, and apologies for my late one. I’d be keen to test some scenarios to find out:

  1. whether KNIME can leverage the GPU on EC2 (e.g. p2.xlarge), on the Graphics WorkSpace, or on both
  2. whether it performs better on a Windows or a Linux WorkSpace
  3. which configuration (vCPU to RAM ratio) is sufficient

The interesting part with WorkSpaces is the hourly billing: assuming a KNIME workflow runs every few hours on business days and not 24/7, WorkSpace costs are only incurred for the hours the instance runs.

Automating this for EC2 can get quite complex, with Auto Scheduler and, e.g., EC2 Spot Instances.

Cheers
Mike

Hi Mike,

Thanks for the feedback, that certainly helps me understand your thinking a bit more. In the case of workflow execution, are those workflows that would/could be executed via a REST API, the WebPortal, or on a schedule? I’m wondering whether KNIME Server with distributed executors that can scale elastically would be an alternative solution. In that case, we’d be able to push out a CloudFormation template that has almost everything pre-configured, which should make the deployment quite a lot simpler.

Best,

Jon

Hi Jon,

It’s on a schedule. The scenario is as follows:

  1. Several exports, running every 3 hours from 9 to 9, provide files to S3
  2. KNIME fetches, processes and updates them

So out of 168 hours a week, assuming the workflow runs for up to two hours per batch, it would be just 50 hours a week.
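The 50-hour figure can be checked with a few lines of Python. This is only a sketch under two assumptions that the post does not spell out: "9 to 9" is read as 09:00 to 21:00 with a batch starting every 3 hours (both ends included), and a week has 5 business days. The function names are made up for illustration.

```python
# Assumption: "9 to 9" means 09:00 to 21:00, one batch starting every 3 hours.
def batch_start_hours(first_hour=9, last_hour=21, interval_hours=3):
    """Return the hours of the day at which a batch starts (inclusive range)."""
    return list(range(first_hour, last_hour + 1, interval_hours))


# Assumption: 5 business days per week, up to 2 hours of runtime per batch.
def weekly_runtime_hours(batches_per_day, hours_per_batch=2, business_days=5):
    """Upper bound on the hours per week the instance has to be running."""
    return batches_per_day * hours_per_batch * business_days


starts = batch_start_hours()                # 5 batches per day
weekly = weekly_runtime_hours(len(starts))  # 5 * 2 * 5
share = weekly / 168                        # fraction of a full week

print(starts)
print(weekly)
print(f"{share:.0%}")
```

With these assumptions there are 5 batches per day, so the instance would need to run for at most 50 of the 168 hours in a week, which is roughly 30%.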

I once started playing with KNIME Server, but the annual price tag for the license is not cost-efficient for a workflow that only runs once in a while.

The CloudFormation template you mention sounds interesting, but it may not resolve the cost bottleneck because of the annual KNIME Server license.

Update
AWS WorkSpaces made a nice impression. I chose a comparable Linux instance, “Power with Amazon Linux 2”, with 16 GB RAM but only 4 vCPUs, compared to my current MacBook Pro with 8 cores. The workflow executed much faster, and KNIME updates installed much faster too. Not to mention the blazing fast connection when it comes to receiving and sending huge amounts of data. The WorkSpace client is also very convenient.

But there’s a catch. WorkSpaces require a user to log in to resume a session that was paused (to save AWS costs).

So I am now experimenting with the AWS Marketplace AMI you shared, @jonfuller. Unfortunately, I immediately ran into trouble: the KNIME workspace, whilst available, does not seem to be accessible by KNIME.

In addition, the Remote Desktop Client seems a bit slow. Opening KNIME, and even mouse clicks, don’t feel as snappy as in the AWS WorkSpace app. Updating KNIME took longer than it did in the AWS WorkSpace and felt even slower than on my MacBook.

Cheers
Mike

Hi Mike,

Really interesting to hear your feedback on WorkSpaces. It certainly looks interesting. I’d be interested to know which instance type you used for the Marketplace AMI.

I think the issue you see is due to the fact that we run a PowerShell script to launch KNIME Analytics Platform automatically at login. It waits a few seconds before launching. If you also double-clicked the KNIME Analytics Platform icon on the desktop, then both instances will try to launch and use the same workspace, which isn’t possible, hence the error message.

Thanks,

Jon

Hey Mike,

I read through the above posts and a few things came to mind.

  • I’m not surprised that the “user experience” feels more responsive with WorkSpaces. AWS markets this service as a replacement for developer workspaces, so I’m sure they’ve put quite a bit of effort into giving the best experience possible. For example, they use PCoIP (see bottom of this doc) for the client connections (hence the need for a separate client to connect to them) rather than older connection approaches like RDP or VNC.

  • Also, AWS is pretty hush-hush about what they do under the hood for services like this. You mention that data transfer is faster with WorkSpaces than with EC2. Are you reading/writing to S3 solely, or also transferring data in other ways? I ask because my educated guess is that WorkSpaces heavily leverage S3 to persist and retrieve images, session state, user data, etc. That would likely translate to pretty fast throughput when you are also working with S3, since they need the performance as well. By comparison, AWS limits bandwidth to EC2 instances based on instance size, so even if an EC2 instance has comparable CPU/RAM, it’s very hard to assess whether the potential bandwidth is comparable.

  • Based on the above, could you elaborate on what your EC2-based setup looks like?
    ** What is the EC2 instance size? (To gauge potential network throughput)
    ** Is the instance located in the same region as your S3 bucket?
    ** Is the EC2 instance in a VPC? If so, are you using VPC endpoints for S3 or is traffic routed through a NAT?

Thanks!

-Jason

I chose a “Power with Amazon Linux 2” WorkSpace with 16 GB RAM and 4 vCPUs. Since the WorkSpace had to run permanently, it did not help reduce costs. So although the performance was great, I canceled it.

For the AMI I chose the default instance type in order to get comparable results. The explanation of the error message is plausible. However, I terminated the instance because of its reduced performance.

At present I am using a Windows 10 machine via RDP, provided by our IT at no cost, which gives similar results to the AMI. I can freely add more resources to scale up if necessary, but that approach is not available and does not match the initial purpose of this request.

If I had to sum up, I’d recommend using the Linux WorkSpace with an annual subscription. The right size depends on the workflows that should run. Mine, for example, was parallelized to use each available core (so max. 4 parallel executions), and I chose the RAM based on the actual consumption while running the workflow.
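Outside KNIME, the same sizing rule (one parallel execution per available core) can be sketched in Python. This is only an analogy, not how KNIME schedules its nodes; `worker_count`, `process` and `run_all` are hypothetical names, and `process` is a placeholder for the real per-file work.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def worker_count(available_cores=None):
    """One parallel execution per core; falls back to the machine's core count."""
    return available_cores or os.cpu_count() or 1


def process(chunk):
    # Placeholder for the real per-file work (hypothetical).
    return sum(chunk)


def run_all(chunks, available_cores=None):
    """Process all chunks with at most one worker per available core."""
    with ThreadPoolExecutor(max_workers=worker_count(available_cores)) as pool:
        # map() keeps the results in the same order as the input chunks.
        return list(pool.map(process, chunks))


# On a 4 vCPU WorkSpace this gives at most 4 parallel executions.
results = run_all([[1, 2], [3, 4], [5], [6, 7, 8], [9]], available_cores=4)
print(results)
```

Going beyond the core count rarely helps CPU-bound work, which is why the vCPU count of the chosen bundle effectively sets the upper limit.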

One last word about EC2. There are several instance types available. Some have bottlenecks in EBS (IOPS), bandwidth, or elsewhere. It’s very complex to track down possible bottlenecks; I’ve encountered performance regressions (not with KNIME, btw) without hitting any EC2 / AWS limits at all. Hence, I’d recommend EC2 only for really sophisticated KNIMErs or those with plenty of time and money to play with.

For everyone else, WorkSpaces offers great performance for a small fee.