NGram node options


The NGram node has two options. The first is “Number of maximal parallel processes”, which also appears in some other nodes. By what criteria should I choose its value for the machine I am running on? Is this option related to a parallel environment such as Hadoop or a data center, or is it related to the cores of an individual CPU?

The second is “Number of documents per process”. By what criteria should I choose that number?



Hi @ahmed_gomaa -

“Number of maximal parallel processes” is the number of threads you allow your local system to create during processing. It isn’t related to Hadoop.

For both this parameter and the number of documents per process, performance will vary according to your available RAM and cores. You can tweak them if you feel like things are running more slowly than you’d like.
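
To make the trade-off concrete, here is a rough Python sketch of how two settings like these typically interact. It is not the node's actual implementation: `run`, `chunk`, `count_bigrams`, and the default of 100 documents per batch are hypothetical names and values chosen for illustration, and using the core count as the starting thread count is an assumption, not a documented rule.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def chunk(docs, docs_per_process):
    """Split the documents into batches of at most docs_per_process items."""
    return [docs[i:i + docs_per_process]
            for i in range(0, len(docs), docs_per_process)]


def count_bigrams(batch):
    """Count word bigrams in one batch of documents (the per-thread work)."""
    counts = {}
    for doc in batch:
        words = doc.split()
        for pair in zip(words, words[1:]):
            counts[pair] = counts.get(pair, 0) + 1
    return counts


def run(docs, max_parallel=None, docs_per_process=100):
    """Process batches on a local thread pool and merge the partial counts."""
    # Assumption: start from the number of local cores when no value is given.
    max_parallel = max_parallel or os.cpu_count() or 1
    total = {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        for partial in pool.map(count_bigrams, chunk(docs, docs_per_process)):
            for pair, n in partial.items():
                total[pair] = total.get(pair, 0) + n
    return total


if __name__ == "__main__":
    print(run(["a b c", "a b"], max_parallel=2, docs_per_process=1))
```

In a setup like this, more threads than cores rarely helps, and larger batches mean fewer hand-offs but more documents held in memory at once, which is why available RAM and cores are the things to watch while tuning.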
