Parallel Chunk Start

Hi all,

I have used the “Parallel Chunk Start” node to distribute my data and improve performance, but the runtime didn’t decrease. Let me give some details: I have 600 stores and split them into 6 chunks to execute in parallel. However, while one branch reads data from the db, the other branches just wait.

Is there any solution to improve performance? Otherwise I cannot execute my flow daily, because it takes too long.

Please help…

Hi there @yunusEG ,

I have seen a couple of your posts on the Forum recently regarding offset, parallel execution, and scheduling on KNIME Server. Is this all connected to one particular use case you are working on?

Is every flow reading data from the db? Can you share a screenshot of your workflow? Which KNIME db nodes are you using?

Br,
Ivan

Thanks for your attention, first of all. I know I have posted a lot recently because of my first project in KNIME, and I think the posts will keep coming :) Below is a flow that covers 600 stores of a retailer and almost 70 million records. At first I designed the flow with only a “Table Row to Variable Loop Start” node, so the flow executed store by store; it took about 4 hours. There could be different reasons for that, such as server performance, but I thought the time could be reduced if I used Parallel Chunk Start and Parallel Chunk End. So I split my stores into ten pieces (10 * 60) and executed them in parallel, but the runtime did not decrease. Is that normal? Do I need a different solution?

Maybe you will say 4 hours is normal, but there are 6 retail flows like this one, and they all have to run daily.

A picture of the flow is below.
Thank you.

Hi @ipazin,

Do you have any recommendations about the problem above?

I would be glad to get this performance problem solved.

Do you know what the slowest part of each loop iteration is?

It could also be worth investigating whether it is faster to read all of the data in a single SQL query and then start your loop at the Lag Column node.
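To sketch the difference outside of KNIME, here is a minimal Python example. The sales(store_id, sale_date, amount) table and the local SQLite file are made up for illustration; your schema and database will differ:

```python
import sqlite3
from itertools import groupby
from operator import itemgetter

conn = sqlite3.connect("retail.db")  # placeholder database file

# Variant A: one query per store -- 600 round trips, each paying
# query overhead before any rows arrive.
def read_per_store(store_ids):
    for sid in store_ids:
        yield sid, conn.execute(
            "SELECT sale_date, amount FROM sales WHERE store_id = ?",
            (sid,),
        ).fetchall()

# Variant B: one query for everything, then split by store in memory.
# ORDER BY guarantees each store's rows are contiguous for groupby.
def read_all_then_split():
    rows = conn.execute(
        "SELECT store_id, sale_date, amount FROM sales ORDER BY store_id"
    ).fetchall()
    for sid, group in groupby(rows, key=itemgetter(0)):
        yield sid, list(group)
```

With 70 million rows, variant B trades 600 separate queries for one sequential scan, which databases usually handle much better.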

I couldn’t understand why the other branches wait while one of them is reading data; I think that is the most important point of my flow. To select the stores, I have to put the SQL reader node inside the loop, because my aim is to split the stores into different chunks, as I said above.

Forget Parallel Chunk Start. For me it never really worked nicely, as a lot of data copying happens in the background (IO-heavy, slow).

Also, at least with the old nodes, the db connection was limited to 1 concurrent connection in my experience. I think that is what you are seeing: as long as one reader is active, the other ones wait. You could try adding a second connector inside the parallel chunk loop, so each branch gets its own connection.
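To illustrate that idea outside of KNIME, here is a rough Python sketch (the retail.db file and the sales table are placeholders, and I am using 6 chunks of 100 stores to match your setup): each worker opens its own connection instead of sharing one, which is the analogue of a connector inside the parallel chunk loop.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

DB_PATH = "retail.db"  # placeholder database file

def read_chunk(store_ids):
    # Each worker opens its OWN connection, so readers do not queue
    # behind a single shared connection.
    conn = sqlite3.connect(DB_PATH)
    placeholders = ",".join("?" * len(store_ids))
    rows = conn.execute(
        "SELECT store_id, sale_date, amount FROM sales "
        f"WHERE store_id IN ({placeholders})",
        store_ids,
    ).fetchall()
    conn.close()
    return rows

store_ids = list(range(1, 601))               # 600 stores
chunks = [store_ids[i::6] for i in range(6)]  # 6 chunks of 100 stores

with ThreadPoolExecutor(max_workers=6) as pool:
    results = list(pool.map(read_chunk, chunks))
```

Whether this actually speeds things up depends on your database: if the reads are bottlenecked on the server’s disk rather than on the single connection, parallel readers won’t gain much.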

Still, looping is slow. Can’t you adjust the workflow so that looping isn’t needed at all? We can’t really help you further with the information available. Could you make an example workflow that reads some data from an Excel file or a Table Creator node? Something like 100 rows for each of 3 shops, or similar?


Hi there @yunusEG,

sorry for the somewhat late response to this topic. Hope your first project with KNIME went well, or is at least still going ok :slight_smile:

If you are still interested in improving performance, you should think about streaming. Here is a blog post about it; a bit older, but still valid. This is especially handy in your use case because you are reading huge amounts of data from a database, and the new database framework supports streaming. For example, you can take a look at this example workflow from the KNIME Hub.
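The core idea of streaming, sketched in plain Python (same made-up sales table and SQLite file as in the examples above): process rows in fixed-size batches as they arrive instead of materialising the whole 70-million-row result first.

```python
import sqlite3

def stream_batches(conn, batch_size=50_000):
    # Pull rows from the cursor in fixed-size batches so downstream
    # work starts before the full result set is loaded into memory.
    cur = conn.execute("SELECT store_id, sale_date, amount FROM sales")
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        yield batch

conn = sqlite3.connect("retail.db")  # placeholder database file
total = 0.0
for batch in stream_batches(conn):
    # Memory stays bounded by batch_size, not by the table size.
    total += sum(amount for _, _, amount in batch)
print(total)
```

KNIME’s streaming executor does this kind of batching between nodes for you, so you get the same bounded-memory behaviour without writing code.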

And here is a link to a general post regarding workflow optimization:
https://www.knime.com/blog/optimizing-knime-workflows-for-performance

Anyway, if you still need any help, feel free to ask.

Br,
Ivan

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.