One of our clients is setting up an on-premises environment for us for a big data (2 TB) analytics project. The project is a proof of concept. I need to provide the specifications for the machine. We are planning to work with KNIME Analytics Platform, Anaconda and Spark, and to access this dedicated space via remote desktop.
Could you please advise?
Yes, I am aware that this isn’t much of a specification, sorry. We figured we’d use a machine with 50 GB of memory, put a couple of VMs on it and set up the environment that way. I have no idea what tasks will come with this dataset – from ETL to ML – as I haven’t seen the data yet. I just want to be sure there is enough memory to do whatever needs to be done.
Thanks,
Ribizli
Thanks for following up. In my experience, 50 GB of memory will get you quite far, though I have also seen setups with 100+ GB.
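To make the sizing a bit more concrete, here is a minimal sketch of how Spark’s share of such a box might be configured when running in local mode. The figures, app name and core count are purely illustrative assumptions, not a recommendation for your workload:

```python
from pyspark.sql import SparkSession

# Illustrative sizing sketch only -- the figures below are assumptions, not a recommendation.
# In local mode the driver JVM does all the work, so its memory setting is the one that matters;
# leave headroom for the OS, KNIME and Anaconda running on the same VM.
spark = (
    SparkSession.builder
    .appName("poc-sizing-sketch")              # hypothetical app name
    .master("local[*]")                        # use all cores of the VM
    .config("spark.driver.memory", "32g")      # e.g. ~32 GB of a 50 GB box given to Spark
    .getOrCreate()
)

print(spark.sparkContext.getConf().get("spark.driver.memory"))
```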
In any case, memory demand is something that can partly be addressed in workflow design. For example, you can use chunked loops to process very large datasets in pieces, which reduces the memory load at any single point in time (a minimal sketch of the idea follows below). Additionally, demanding workflows can be scheduled at quiet or staggered times.
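To illustrate the chunking idea outside of KNIME, here is a minimal pandas sketch that processes a file in fixed-size pieces; the file name, column names and chunk size are all hypothetical:

```python
import pandas as pd

# Hypothetical input file, chunk size and column names -- adjust to your data and available RAM.
INPUT_CSV = "transactions.csv"
CHUNK_ROWS = 1_000_000

running_totals = {}

# Read and process the file in fixed-size pieces so that only one chunk
# is held in memory at any point in time.
for chunk in pd.read_csv(INPUT_CSV, chunksize=CHUNK_ROWS):
    # Example per-chunk work: aggregate a numeric column by a key column.
    grouped = chunk.groupby("customer_id")["amount"].sum()
    for key, value in grouped.items():
        running_totals[key] = running_totals.get(key, 0.0) + value

print(f"Aggregated {len(running_totals)} keys without loading the full file at once.")
```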
Workflow designers should also be informed about the server’s capacity so that they can test with a subset of the data on their own machines and see how costly their workflow is.
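As a rough sketch of how such a test subset could be pulled without loading the full file, again with purely hypothetical file names and fractions:

```python
import pandas as pd

# Hypothetical file names and sampling fraction -- adjust as needed.
FULL_DATA = "full_dataset.csv"
SAMPLE_OUT = "sample_for_local_testing.csv"

pieces = []
# Sample ~1% of each chunk so the full file never has to fit in memory at once.
for chunk in pd.read_csv(FULL_DATA, chunksize=500_000):
    pieces.append(chunk.sample(frac=0.01, random_state=42))

sample = pd.concat(pieces)
sample.to_csv(SAMPLE_OUT, index=False)
print(f"Wrote {len(sample)} sampled rows to {SAMPLE_OUT} for local testing.")
```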
Should you ever run into memory issues, please also check the hard disk – if the partition being written to is full, jobs and data may be kept in memory instead.
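A quick way to keep an eye on that, sketched here with Python’s standard library and a hypothetical mount point:

```python
import shutil

# Hypothetical mount point of the partition KNIME/Spark write to -- adjust to your setup.
PARTITION = "/data"

usage = shutil.disk_usage(PARTITION)
free_gb = usage.free / 1024**3
print(f"{PARTITION}: {free_gb:.1f} GB free of {usage.total / 1024**3:.1f} GB")

# A nearly full partition is a warning sign: jobs and intermediate data
# may end up being held in memory instead of being written out to disk.
if free_gb < 50:
    print("Warning: less than 50 GB free -- consider freeing space before running large workflows.")
```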