Dear KNIMErs,
I am trying to build a workflow to loop through multiple AWS S3 folders daily and extract CSV files. The folder structure has both fixed and dynamic components, which I’ll explain in detail below. Despite my efforts, I am struggling to dynamically feed the file path into the CSV Reader node.
Here’s the folder structure inside the S3 bucket:
a/b/c/product-name-a/MONTHLY/2024-10-01/2024-10-31/xycabz345aaa-2024-10-31.csv
Breakdown of the Folder and File Structure:
- a/b/c: Fixed path defining the main subfolder for daily snapshot files.
- product-name-a: Semi-dynamic. This represents one of ~20-30 products. The product names are fixed for now but may change over time.
- MONTHLY: A subfolder present in each product folder.
- 2024-10-01: The month-level subfolder. It reflects the date the folder was created (e.g., October 1st) and contains all files for that month.
- 2024-10-31: The daily snapshot subfolder, labeled with the creation date (e.g., October 31st).
- xycabz345aaa-2024-10-31.csv: The dynamic CSV file name, combining a random hash and the creation date.
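To make the structure concrete, here is the path logic I need the workflow to reproduce, sketched in Python (the function name is just for illustration; the folder names come from the example above):

```python
from datetime import date

# Illustrative helper: builds the S3 key prefix for one product's daily
# snapshot folder. Only the prefix is predictable -- the CSV inside it is
# "<random-hash>-<snapshot-date>.csv", so the hash part cannot be constructed.
def build_snapshot_prefix(product: str, month_start: date, snapshot: date) -> str:
    return (
        f"a/b/c/{product}/MONTHLY/"
        f"{month_start.isoformat()}/{snapshot.isoformat()}/"
    )

prefix = build_snapshot_prefix("product-name-a", date(2024, 10, 1), date(2024, 10, 31))
# -> "a/b/c/product-name-a/MONTHLY/2024-10-01/2024-10-31/"
```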
What I Have Tried:
To handle the dynamic parts, I planned to:
- Use loops and String Manipulation nodes to create the required folder structure.
- Dynamically generate the file paths using List Files/Folders, String Manipulation, and String to Path nodes.
- Feed the resulting file path as a flow variable into the CSV Reader node.
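Because the hash part of the file name is random, my plan was to list the snapshot folder (List Files/Folders) and then keep only files whose name ends with the snapshot date. In Python terms, the filtering step I am trying to replicate looks roughly like this (function and variable names are mine, for illustration only):

```python
import fnmatch
from datetime import date

# Illustrative filter mirroring the List Files/Folders + String Manipulation
# steps: keep only CSV files whose name ends with the given snapshot date.
def match_snapshot_csvs(filenames: list[str], snapshot: date) -> list[str]:
    pattern = f"*-{snapshot.isoformat()}.csv"
    return [f for f in filenames if fnmatch.fnmatch(f, pattern)]

listed = ["xycabz345aaa-2024-10-31.csv", "notes.txt"]
files = match_snapshot_csvs(listed, date(2024, 10, 31))
# -> ["xycabz345aaa-2024-10-31.csv"]
```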
The Issue:
When I pass the dynamically created path as a flow variable to the CSV Reader, I get an error saying the file does not exist. This happens even though I have manually verified that the path is correct.
My Question:
Is there a way to dynamically pass a path flow variable into the CSV Reader node to successfully read data from AWS S3 buckets? If not, is there an alternative approach for handling such dynamic S3 folder structures in KNIME?
I appreciate any guidance or ideas to resolve this issue. Thank you in advance for your help!