Large XLS File Reader Configure/Preview Times: What to expect?

I am attempting to process approximately 870MB of Excel data across four files.  When I try to load them (the file I am trying to load at the moment is about 220MB), the "refreshing preview table" step takes hours.

Even on a 50MB file, I've seen it choke on this step, even when the actual read later on runs fine.

Also, is there any way to suppress previewing the old file when configuring?  I've usually had to delete the read nodes and use new ones when loading updated data, since that seems faster.

Is there any way to shorten or avoid the preview process with some set schema?  It's going to be the same every time.

Finally, is there another type of file format reader that would be much faster?  The place we get our data from unfortunately has overly messy .csv's, but we could convert the Excel files separately if need be.

I am running it on a MacBook Pro Retina (2.6GHz i7, 16GB RAM with 12GB for heap space, and an SSD).  While it is previewing (at the moment), RAM usage has plateaued well below the heap limit, while CPU usage has fluctuated within a range, so it is definitely operating, hasn't maxed out my resources, and definitely isn't hanging.



One idea to avoid the preview problem during configuration: use a much smaller xls file with the same structure you expect to configure the XLS Reader node, then use a QuickForms node to select the larger file and pass that path to the XLS Reader node through its Flow Variables configuration. Once you are able to read the data into KNIME, I think things will be faster.

Cheers, gabor

Interesting.  I'll look at that.

What would happen if I replaced the files in the directory with new ones of the same name?  Would it load the new ones, or does it need to go through the preview process?

You are right. Using files with the same name seems to be a simpler solution. :)

I think it will just load the new ones unless you open the configuration dialog.
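
For what it's worth, the "same name" swap can be scripted outside KNIME. Below is a minimal Python sketch, assuming the reader node points at one fixed path and each new export arrives under a different name. The function name and the paths in the usage note are hypothetical, and nothing here calls KNIME itself. The new file is copied to a temporary name in the same folder first and then renamed over the old one, so the reader never sees a half-written file.

```python
import os
import shutil
import tempfile
from pathlib import Path


def swap_in_new_file(new_file: Path, target: Path) -> Path:
    """Replace `target` with a copy of `new_file`, keeping the target's name.

    `target` is the fixed path the reader node is configured with
    (a hypothetical location, chosen by you).
    """
    target.parent.mkdir(parents=True, exist_ok=True)

    # Create the temporary file next to the target so the final rename
    # stays on the same filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=target.suffix)
    os.close(fd)
    try:
        shutil.copyfile(new_file, tmp_name)
        # os.replace overwrites an existing target in a single step.
        os.replace(tmp_name, target)
    except BaseException:
        # Clean up the temporary copy if anything went wrong.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return target
```

Usage would look something like swap_in_new_file(Path("incoming/export_march.xls"), Path("knime_input/data.xls")), followed by re-running the workflow without opening the configuration dialog.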

I've had the "pleasure" of parsing such files as well... Maybe a DBReader-like "execute without configure" option would be of interest.

KNIMErs? *cough* feature request *cough* :)

-- E