I am trying to read in 10 million rows. However, the process is really slow, and I was wondering whether I could change some settings in the Impala connector, such as the advanced settings or something else.
Any pointers?
Thanks
R
Also, is there a way to see the progress in % terms? I ran a GROUP BY query, so I don’t know how many rows are going to be fetched. Is it possible to see how long the entire process is going to take, or how far it has progressed and how much is still pending, in terms of row count and time?
@r_jain Big data systems are a special case, and most of the power must come from the big data system itself.
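To illustrate the idea, and the progress question, here is a minimal Python sketch. It is not Impala or KNIME specific: it assumes an in-memory sqlite3 database as a stand-in, and the table `sales` and the helper `fetch_with_progress` are made up for the example. The aggregation runs inside the database, the number of result rows is counted first, and the result is then fetched in batches so a percentage can be shown.

```python
import sqlite3


def fetch_with_progress(conn, grouped_sql, batch_size=1000):
    """Yield (rows_so_far, total_rows, batch) for a grouped query.

    Hypothetical helper: counts the result rows first, then fetches in batches.
    """
    cur = conn.cursor()
    # Let the database count how many rows the grouped query returns.
    total = cur.execute(f"SELECT COUNT(*) FROM ({grouped_sql}) AS g").fetchone()[0]
    cur.execute(grouped_sql)
    done = 0
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        done += len(batch)
        yield done, total, batch


# Stand-in data: sqlite3 replaces the real big data system in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(f"region_{i % 50}", float(i)) for i in range(10_000)],
)

# The GROUP BY is pushed to the database; only 50 aggregated rows come back.
query = "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
progress = []
for done, total, batch in fetch_with_progress(conn, query, batch_size=20):
    progress.append((done, total))
    print(f"{done}/{total} rows ({100 * done / total:.0f}%)")
```

Note that the count step makes the database run the aggregation a second time, so on a large table it is a trade-off between knowing the total and the extra query cost.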
Then … this is not relevant when retrieving the data, but it helps when exploring or writing code: you might try deactivating “retrieve in configuration” (Microsoft Access Connector Java Heap Space - #3 by mlauber71) in the Impala connector to speed up working with the DB nodes. The downside is that you might not always see the latest columns, so you would have to know the structure or retrieve it in advance.
Not sure how familiar you are with big data concepts. I have collected a few functions here to play around with (thanks to Create Local Big Data Environment – KNIME Hub, it will also work if someone does not have a big data system at hand).