Knime on AWS Running Costs

I am finding Knime to be a very powerful tool. Love it. The community is great. Love you all.
In the last few weeks, I have been able to successfully set up 3 processes.

As we are getting better at it, I decided to install Knime on AWS. Doing it literally today.

With AWS in other contexts, I’ve had a few unforeseen expenses. Their costing method leaves me a bit confused and quite literally out of pocket!
We are a tiny business and want to keep the ongoing costs to a minimum.

All my processes are simple data processing steps that need to be executed automatically once every 24 hours. They are quite literally a series of SQL-based selects, followed by some filtering/data manipulation and finally update commands. Classic batch processing steps.

My question to you is…

  1. Does AWS charge you for ‘availability time’ or ‘usage time’?
    So if I have 4 workflows that execute in 10 minutes every night at 12:01am, and I use them only that once every 24 hours, will AWS charge me for 24 hours or for the 10 minutes the processes actually ran?

  2. How do I reduce the ‘availability time’?
    I hope not, but if it is ‘availability time’, meaning I get charged whether I use it or not, then I would like to know how, if possible, to start the EC2 instance of Knime, execute my workflows and stop it again automatically.

Your suggestions, directions, tips are all welcome.
Please comment if you have worked with Knime with AWS.

Keen to know as well.

I’m new to AWS and have been playing around with it today as well, but can’t seem to get it working… Is there a guide on the Knime website that you used to configure and connect to the server?

When carrying out the AWS installation steps, one of the screens directs you to the right material.
Yes, there is a lot of jargon along the way. Here is my short version of it.

Go to your Knime Analytics Platform > File > Preferences > KNIME > KNIME Explorer > New

Server Address: http://ec2-nn.nn.nn.nn-ap-myregion-2.compute.amazonaws.com
(Your AWS public DNS)

Username = knimeadmin
Password = AWS instance id
Mount Id = knime-server

Once you set this up, your Knime Analytics Platform will show your server as a KNIME Explorer item.

Wondering if anyone has any information on this.
It’s an important decision for us.

Hi @Shai,

We have had a couple of company-managed KNIME servers up and running in AWS for quite some time and have been searching for similar ways to keep costs under control.

AWS charges us for uptime, meaning that as long as the EC2 instance is running, you pay instance fees plus KNIME license costs, billed by the minute. This is totally independent of when and how long your workflows actually run. So it’s about availability, not usage.

There are a couple of options to reduce availability time. If you do not need the server available 24/7 (e.g., for workflow storage), you can spin up and tear down the underlying EC2 instance on a time schedule using external tools (e.g., GitLab, Jenkins, …). Tear-down might also work from within your workflow (e.g., shut the instance down via the AWS CLI with a suitable delay to ensure the workflow finishes first).
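The start, run, stop pattern described above could look roughly like the following sketch. It assumes an object that behaves like a boto3 EC2 client; the instance ID and the `job` callable (whatever triggers the workflows) are hypothetical placeholders, and this is not the setup used by any poster in this thread.

```python
from typing import Any, Callable


def run_then_stop(ec2: Any, instance_id: str, job: Callable[[], None]) -> None:
    """Start the instance, run the job, and always stop the instance afterwards.

    `ec2` is assumed to behave like boto3.client("ec2").
    """
    ec2.start_instances(InstanceIds=[instance_id])
    try:
        # Block until AWS reports the instance as running.
        ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
        job()
    finally:
        # Stop (not terminate), so the disk and the workflows on it are kept.
        # This runs even if the job raises, so a failed run does not keep billing.
        ec2.stop_instances(InstanceIds=[instance_id])


# Hypothetical usage from an external scheduler (Jenkins, cron, ...):
#   import boto3
#   run_then_stop(boto3.client("ec2"), "i-0123456789abcdef0", trigger_workflows)
```

Two things to keep in mind: a stopped instance no longer incurs instance-hour charges, but its EBS storage is still billed, and the public DNS name usually changes after a stop/start unless an Elastic IP is attached.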

Happy to give you some insights into how we do it, but I would rather not make this info publicly available in a forum, so just drop me a line if you are interested: mail@danielesser.me

Best regards,
Daniel

P.S.: PMs would be a great feature for this forum…

It works, and I agree that there is way too much jargon… when it could have been done in a simple manner.

I’m probably poorer than most of you, and I know that what I’m going to say probably doesn’t fit your use case. Since I can’t pay for a Knime server, and I’m using Knime intensively for a personal project, what I did is go to eBay, buy a beefy dual Xeon with 32GB RAM for <300€ and use Knime through RDP & VPN if I’m away from home.

It’s probably more powerful than a z1d.xlarge instance. I have 200 Mbps down/up and it runs well IMO. Just finished the setup a few days ago.

I eyed AWS a bit but man, those prices are scary. Also, they don’t mention ingress and egress costs, but I guess you’ll have to download the data from somewhere, and I do use S3 for other stuff, and it doesn’t come for free.

I don’t work with ML (although I’m planning to) aside from some Palladian nodes, but I’m happy with it. It’s easy to use and I can have an almost clean installation on it.

@iagovar
Thanks for the suggestion. Dual Xeon 32GB machine sounds like a viable option.
What is Ingress/Egress with Amazon?

My current costs are coming to $2.6 per hour. And that is even before it has started working correctly. AWS is not a viable option for me. To be fair, I’d be prepared to pay up to $60 per month. Above that, bring in the Xeon beast!
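To put those numbers side by side, here is a small worked calculation. It uses only the $2.6 hourly rate and the $60 monthly budget quoted above; the 30-day month and the assumption that billing stops exactly when the instance stops (ignoring boot time, storage and data transfer) are simplifications.

```python
HOURLY_RATE_USD = 2.6       # rate quoted in the post above
MONTHLY_BUDGET_USD = 60.0   # budget quoted in the post above
DAYS_PER_MONTH = 30         # assumption: a 30-day month


def monthly_cost(hours_per_day: float, rate: float = HOURLY_RATE_USD) -> float:
    """Cost of running the instance for `hours_per_day` hours every day of the month."""
    return hours_per_day * DAYS_PER_MONTH * rate


def hours_within_budget(budget: float = MONTHLY_BUDGET_USD,
                        rate: float = HOURLY_RATE_USD) -> float:
    """Total running hours per month that fit within the budget."""
    return budget / rate


print(round(monthly_cost(24), 2))        # always on
print(round(monthly_cost(1), 2))         # one hour per night
print(round(monthly_cost(10 / 60), 2))   # ten minutes per night
print(round(hours_within_budget(), 2))   # running hours the budget allows per month
```

Under these assumptions an always-on instance comes to roughly $1,872 per month, one hour per night to about $78, and the $60 budget covers about 23 running hours per month.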

@danielesser
Please check your inbox. It would be great to find a solution.

There has to be an easier way.

I was wrong apparently. Ingress from outside is mostly free for EC2 instances, but egress isn’t. The prices are here: https://aws.amazon.com/en/ec2/pricing/on-demand/

IDK man, I just bought a second-hand workstation, one generation behind the current one, on eBay. I can pay for that, but I can’t pay those AWS prices, not to mention the Knime server prices, unless I want to pay them my whole annual income, eat rocks and live under a cardboard box.

Please get in touch on ShaiBedarkar@LGACloud.com

For what purpose?

Obligatory 20 characters.

There is actually a reason for this: it keeps the forum clean and tidy…

@Shai, looking through this thread, it looks like your questions are answered. Is there anything you need from us?

  • Does AWS charge you for ‘availability time’ or ‘usage time’?
    So if I have 4 workflows that execute in 10 minutes every night at 12:01am, and I use them only that once every 24 hours, will AWS charge me for 24 hours or for the 10 minutes the processes actually ran?

AWS charges for uptime. However, you can flexibly start and shut down the server to reduce costs.

@Iris,
I believe so.
It’s a very resourceful community.

@iagovar
No reason as such to get in touch. Just to get more info on any automation that you may have tried or any other specifics worth discussing.

My direct questions have been answered but I am fishing for specifics of reducing ongoing costs.

@Shai To mitigate cost, I believe the easiest way to automate turning the EC2 instance for KNIME Server on and off would be to create an Auto Scaling group and use “scheduled actions” to increase or decrease the number of running EC2 instances in the group on a schedule.
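As a rough sketch of the scheduled-actions idea, the helper below builds the arguments for two scheduled actions: one that scales the group to a single instance shortly before the nightly run and one that scales it back to zero afterwards. The group name, action names and cron times are hypothetical; the argument names follow boto3’s `put_scheduled_update_group_action`, and the `Recurrence` cron expression is interpreted in UTC unless a time zone is given.

```python
from typing import Any, Dict, List


def nightly_window_actions(group_name: str, start_cron: str,
                           stop_cron: str) -> List[Dict[str, Any]]:
    """Build keyword arguments for a scale-to-1 and a scale-to-0 scheduled action."""

    def action(name: str, cron: str, size: int) -> Dict[str, Any]:
        return {
            "AutoScalingGroupName": group_name,
            "ScheduledActionName": name,   # hypothetical action name
            "Recurrence": cron,            # cron format, UTC by default
            "MinSize": size,
            "MaxSize": size,
            "DesiredCapacity": size,
        }

    return [
        action("knime-start", start_cron, 1),
        action("knime-stop", stop_cron, 0),
    ]


# Hypothetical usage with boto3:
#   import boto3
#   client = boto3.client("autoscaling")
#   for kwargs in nightly_window_actions("knime-server-asg", "55 23 * * *", "30 0 * * *"):
#       client.put_scheduled_update_group_action(**kwargs)
```

One design caveat: by default an Auto Scaling group terminates instances when it scales in rather than stopping them, so anything stored only on the instance’s disk would need to live elsewhere (for example in the machine image or on external storage).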

Alternatively, you could look into using the AWS Instance Scheduler.

Amazon charged me USD$14 for some weird reason on the first day.
Each time I was running it, every hour cost USD$2.65…
I believe there is a separate cost if your database is on a server outside of the AWS framework.
I could afford all this if my workflow, which executes perfectly on my Windows 10 laptop, worked on the server too. But it doesn’t.

It appears we have to tweak the workflow to suit the server environment.
So for my trial and error, I’d be investing $2.65 per hour… I cannot afford that at this stage.

There are a few different types of processes I need to run. Some workflows need to be executed more often than others. There is no interaction, no end users. It’s pure data processing.

My workflows are expected to execute on these schedules:

  1. Some every 5 minutes
  2. Some at 1am every night
  3. Some every week
  4. Some on the first of every month
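The four schedules above can be expressed as a small check that a loop calls once per minute. This is only a sketch: the schedule names are made up, and it assumes the weekly and monthly runs also happen at 1am, with the weekly run on Mondays.

```python
from datetime import datetime
from typing import List


def due_schedules(now: datetime) -> List[str]:
    """Return the names of the schedules that are due at this minute."""
    due = []
    if now.minute % 5 == 0:
        due.append("every_5_minutes")
    if now.hour == 1 and now.minute == 0:
        due.append("nightly_1am")
        if now.weekday() == 0:   # assumption: weekly runs happen on Monday
            due.append("weekly")
        if now.day == 1:         # first day of the month
            due.append("monthly")
    return due


print(due_schedules(datetime(2024, 1, 2, 1, 5)))
```

A caller would map each returned name to the workflows belonging to that schedule and run them in turn.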

So currently I have decided to invest in a tiny Windows PC.
An 11 cm × 11 cm refurbished desktop, AUD$200 on eBay.
I will install Knime Analytics Platform on it.
Put my workflows on a time loop.
Leave the computer on forever.
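For a desktop setup like this, one possible way to run a workflow without opening the GUI is KNIME’s batch application, which an external scheduler (for example Windows Task Scheduler) can call. The sketch below only builds and runs that command line; both paths are hypothetical, and the options should be checked against the installed KNIME version.

```python
import subprocess
from typing import List


def knime_batch_command(knime_exe: str, workflow_dir: str) -> List[str]:
    """Build the command line for running one workflow in KNIME batch mode."""
    return [
        knime_exe,
        "-nosplash",
        "-consoleLog",
        "-reset",  # reset the workflow first so every node is executed again
        "-application", "org.knime.product.KNIME_BATCH_APPLICATION",
        f"-workflowDir={workflow_dir}",
    ]


def run_workflow(knime_exe: str, workflow_dir: str) -> int:
    """Run the workflow and return the process exit code (0 means success)."""
    return subprocess.run(knime_batch_command(knime_exe, workflow_dir)).returncode


# Hypothetical usage:
#   run_workflow(r"C:\Program Files\KNIME\knime.exe", r"C:\knime-workspace\nightly_update")
```

Passing the arguments as a list avoids quoting problems with paths that contain spaces.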

I do feel my desktop-based solution is not the best… it will work though.
Once this street-smart solution works, I will have the luxury of looking for a better one.
At that point, I’d love to talk to someone who is happy to convert these Windows 10 workflows of mine into a server-executable version.

Please keep those suggestions coming in.
I have to make this work.
So keep watching this space.

If you are really on a tight budget, this makes sense. If the budget is less tight and compute needs increase, I would go with iagovar’s suggestion of buying used server hardware, which is usually dirt cheap compared to its performance.

I would make a plan for regular backups and try out the restore procedure at least once. Simply creating a full disk image and then restoring from it should do the trick, and having that in place can get you up and running again quickly. Consumer hardware like this will very likely suffer a hardware failure at some point. I would also reboot it regularly and ensure security updates are installed. So “leaving it on forever” and just forgetting about it is not a good idea.

Just because the cloud is all the hype doesn’t mean it is the right tool for you. In fact, I’m rather skeptical about the cloud for compute needs (too costly). Even the possible solutions presented here have a huge management overhead to keep everything running and costs under control. Plus, access to the actual data / database also becomes an issue. If you need to expose your database to the internet just so AWS can access it? Not a good idea at all IMHO.

The cloud is great for web apps that need to be constantly available but have huge access spikes (time of day, weekend/weekday etc.), or if you are big enough to afford a virtual private cloud + VPN connection to it, so the cloud becomes part of your network, which solves the hassle with data access.