Access to shared folder from the Business Hub

Hi Experts,

We currently use Box as our cloud file system, but the workflow sometimes takes a very long time to run because of the number of files it reads. The connector and the number of folders also often cause it to stop responding.

We also have a Windows server that is accessible from the Hub, but from what I read, a shared folder there wouldn’t be accessible from the Hub.

I’m looking for options and suggestions to make this process easier, such as a local file system inside the container (I don’t even know if this is possible) or something else.

Anyway, I would appreciate any input on the topic.

Thank you!

Fernando

There is no “local filesystem inside the container” you can rely on for large, shared datasets: the container's disk disappears between runs, between pods, and when the executor scales.

Because of that, Box plus lots of small files is unfortunately a very tough combination for good performance. Every small file can mean an extra network call, and those calls add up very quickly, even if you give the job much more CPU.

What usually works much better:

  1. Combine the many small files into fewer, bigger files:

    • Parquet files (very efficient for data workflows)

    • ORC files

    • One big ZIP or TAR archive

    • Or load everything into a database table

    Reading a few large files is faster than reading thousands of tiny ones.

  2. Move the data from Box to cloud object storage:

    • Amazon S3 (or any S3-compatible storage)

    • Azure Blob Storage

    This would be much more performant and stable than Box for this kind of workload.
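To illustrate option 1, here is a minimal sketch in Python (standard library only) that packs many small files into one TAR archive before they are uploaded. The function names, the folder layout, and the `*.csv` pattern are hypothetical placeholders, and whether your workflow can read a TAR archive directly is something you would need to check on your side.

```python
import tarfile
from pathlib import Path
from typing import List


def bundle_small_files(source_dir: str, archive_path: str, pattern: str = "*.csv") -> int:
    """Pack every file under source_dir that matches pattern into one
    gzip-compressed TAR archive and return the number of files added."""
    source = Path(source_dir)
    count = 0
    with tarfile.open(archive_path, "w:gz") as archive:
        # Sort so the archive content is in a stable, predictable order.
        for path in sorted(source.rglob(pattern)):
            if path.is_file():
                # Store paths relative to source_dir so the archive is portable.
                archive.add(path, arcname=path.relative_to(source).as_posix())
                count += 1
    return count


def list_bundle(archive_path: str) -> List[str]:
    """Return the member names stored in the archive, in archive order."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return archive.getnames()
```

With something like this, the workflow would fetch one archive instead of making one request per small file. The same idea applies if you write Parquet or load a database table instead.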
