I am attempting to process approximately 870MB of Excel data across four files. When I try to load them (the file I am trying to load at the moment is about 220MB), it takes hours at the "refreshing preview table" step.
I've seen it choke on this step even with a 50MB file, even when it has no trouble with that file later on.
Also, is there any way to suppress the preview of the old file when configuring? When loading updated data, I've usually had to delete the read nodes and use new ones, since that seems faster.
Is there any way to shorten or avoid the preview process by using some set schema? It's going to be the same every time.
Finally, is there another file format reader that would be much faster? The place we get our data from unfortunately produces overly messy .csv files, but we could convert the Excel files separately if need be.
I am running it on a MacBook Pro Retina with a 2.6GHz i7, 16GB RAM (12GB for heap space), and an SSD. While it is previewing (as it is at the moment), RAM usage has plateaued well below the heap limit and CPU usage has fluctuated within a range, so it is definitely working, hasn't maxed out my resources, and definitely isn't hanging.