Server - Hardware recommendations

Hello Server experts,

We are finally moving to KNIME Server and I am looking at getting an AWS box. As there is not much documentation on the technical specs, what kind of hardware matters for data manipulation processes? We will be doing mostly simple calculations, aggregations, and fetching / parsing of remote content.

As an example, we will be getting daily data from over 100 Google Profiles, Adwords Accounts, Twitter data etc., then manipulating it and storing it in a separate DB. We will also collect 10k worth of HTML pages and extract content on a monthly basis.

What matters here? CPU, Disk Speed, Disk Size or Memory?

Thank you

Hi nxfxcom,

Great that you are moving to the KNIME Server. Crawling ~10k websites and extracting the content does not require big hardware, especially if you are running it in batch jobs once a month. Scanning ~100 Google and Twitter profiles is also no big deal.

More cores are nice of course. You can easily parallelize crawling and extraction processes and make use of more cores using the Parallel Chunk Start node. Workflow branches are executed in parallel as well, and many nodes make use of multiple cores. However, if you are running batch jobs monthly, four cores should be enough.
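To illustrate the chunking idea outside of KNIME, here is a minimal Python sketch. It is not how the Parallel Chunk Start node is implemented, just the same pattern: split the input into chunks, process each chunk on its own worker, and concatenate the results in order. The function names and the placeholder `extract_content` are made up for the example, and no real HTTP requests are made.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split items into at most n_chunks contiguous, near-equal chunks."""
    n_chunks = max(1, min(n_chunks, len(items)))
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks get one additional item.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def extract_content(url):
    """Hypothetical stand-in for 'fetch the page and parse it'."""
    return url.rsplit("/", 1)[-1].upper()


def process_chunk(chunk):
    """Process one chunk sequentially, like one parallel branch would."""
    return [extract_content(url) for url in chunk]


def process_in_parallel(urls, workers=4):
    """Run one chunk per worker and merge the results in input order."""
    chunks = split_into_chunks(urls, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the order the chunks were submitted.
        results = list(pool.map(process_chunk, chunks))
    return [row for chunk_result in results for row in chunk_result]


if __name__ == "__main__":
    urls = [f"https://example.com/page{i}" for i in range(10)]
    print(process_in_parallel(urls, workers=4))
```

Threads fit this sketch because crawling mostly waits on the network. For CPU-heavy extraction in plain Python, a `ProcessPoolExecutor` would be the closer equivalent to using more cores.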

The more memory you have, the less data KNIME has to buffer to hard disk (less IO) and the faster workflows are processed. 16GB RAM should be enough for this amount of data. However, if the amount grows in the future, more RAM is always better.
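If you want a rough feel for whether your data fits into memory, a back-of-the-envelope estimate like the following Python sketch can help. All numbers in it are assumptions for illustration, not KNIME figures: the 100 KB average page size, the overhead factor for parsed in-memory data, and the fraction of RAM actually available to the workflow should be replaced with your own measurements.

```python
def estimate_raw_size_gb(n_items, avg_item_kb):
    """Raw data volume in GB for n_items of avg_item_kb kilobytes each."""
    return n_items * avg_item_kb / (1024 * 1024)


def fits_in_memory(raw_gb, ram_gb, overhead_factor=3.0, usable_fraction=0.5):
    """Rough check: does the data fit without buffering to disk?

    overhead_factor (assumed): in-memory size relative to raw size.
    usable_fraction (assumed): share of RAM available for the data.
    """
    return raw_gb * overhead_factor <= ram_gb * usable_fraction


if __name__ == "__main__":
    # Assumption: 10,000 HTML pages at about 100 KB each.
    raw_gb = estimate_raw_size_gb(10_000, 100)
    print(f"raw data: {raw_gb:.2f} GB")
    print("fits in 16 GB:", fits_in_memory(raw_gb, ram_gb=16))
```

If the check fails for your real numbers, that is the case where disk speed starts to matter, because the data will be buffered to disk.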

Disk speed is important if the data does not fit into memory and is buffered to disk. RAID with fast disks makes sense here. SSDs are of course nice but not really a must-have.

Cheers, Kilian

Thank you,

I appreciate the input. IT just approved my request. More to come shortly ;)