Hardware -- advice needed


One of our clients is setting up an on-premises space for us for a big data (2TB) analytics project. The project is a proof of concept. I need to provide the specifications for the machine. We plan to work with KNIME Analytics Platform, Anaconda and Spark, and to access this dedicated space via remote desktop.
Could you please advise?

Thanks a lot,

Dear @ribizli,

This strongly depends on your use case, so we need quite a bit more information before we can give an estimate.

Where is the data processed? For example, work done on a Spark cluster doesn’t impact KNIME; the same goes for processing done inside a database, such as joins.

How is it processed? Is this mainly ETL? Do you train (ML) models? Do you use trained models for predictions?

What results are persisted to disk?


Hello Marvin,

Yes, I am aware that the specification is incomplete, sorry. Our plan is to use a machine with 50GB of memory and put a couple of VMs on it to set up the environment. I have no idea which tasks will come with this dataset (anything from ETL to ML), as I haven’t seen the data. I just want to be sure that there is enough memory to do whatever needs to be done.

Hi Ribizli,

thanks for following up. In my experience 50GB of memory will get you quite far, though I have also seen setups with 100+GB.
In any case, this can partially be accommodated in workflow design. For example, one can use chunked loops to process very large datasets in pieces, which reduces the memory load at any single point in time. Additionally, demanding workflows can be scheduled at quiet or differing times.
Workflow designers should also be informed about the server’s capacity so that they can test (with a subset of the data) on their own machine to see how costly their workflow is.
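
To illustrate the chunking idea outside of KNIME, here is a minimal Python sketch (Python being available through your Anaconda setup). It is not how KNIME's chunk loop nodes are implemented; it only shows the general pattern of holding one piece of the data in memory at a time. The function names, the `amount` column and the tiny in-memory CSV are made up for the example.

```python
import csv
import io
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size rows from any iterable of rows."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def chunked_column_sum(csv_file, column, chunk_size=100_000):
    """Sum one numeric column while keeping only one chunk in memory."""
    reader = csv.DictReader(csv_file)
    total = 0.0
    for chunk in iter_chunks(reader, chunk_size):
        # Only the rows of the current chunk are held in memory here.
        total += sum(float(row[column]) for row in chunk)
    return total


# Hypothetical stand-in for a large file; a real run would pass an open file.
sample = io.StringIO("id,amount\n1,10.5\n2,4.5\n3,5.0\n")
result = chunked_column_sum(sample, "amount", chunk_size=2)
print(result)
```

The chunk size is the knob to tune: smaller chunks lower the peak memory use at the cost of more iterations.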

Should you ever run into memory issues, please also check the hard disk – if a partition we write to is full, jobs and data may be kept in memory.
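
A quick way to check how full a partition is, sketched in Python with the standard library only. The path and the 90% threshold are assumptions for the example; point the path at whichever partition your jobs and temporary data are actually written to.

```python
import shutil


def partition_usage(path="."):
    """Return total/free space in GB and the used fraction for the partition holding path."""
    usage = shutil.disk_usage(path)
    used_fraction = usage.used / usage.total if usage.total else 0.0
    return {
        "total_gb": usage.total / 1e9,
        "free_gb": usage.free / 1e9,
        "used_fraction": used_fraction,
    }


def is_nearly_full(path=".", threshold=0.9):
    """True if the used fraction of the partition is at or above threshold."""
    return partition_usage(path)["used_fraction"] >= threshold


info = partition_usage(".")
print(f"{info['free_gb']:.1f} GB free of {info['total_gb']:.1f} GB")
if is_nearly_full("."):
    print("Warning: partition is nearly full")
```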

Kind regards


Thank you! I’ll keep this in mind!

