I am really impressed with the new Microsoft Authentication and Sharepoint Online Connection nodes; they make dealing with the Sharepoint API a breeze. I’m exploring the use of these nodes to access a document library and aggregate multiple similar *.xlsx files, using the wildcard option in the file filter together with the ‘Files in folder’ option. So far, success (Excel Reader config view):
My issue is collecting metadata from the files (e.g., a file name or file path) so that I can tell which rows in the final aggregated data frame come from which source file. Is there a feature I might be missing, or a workflow adaptation I could use to address this?
I partially answered my own question. I’m using the ‘List Files/Folders (Labs)’ node (not the typical ‘List Files’ node), and configuring the input port to accept the Sharepoint Connector. Now I can list files and folders, to be used as paths or metadata for the aggregated reads!
Click on the breadcrumbs (outlined in red) on either of the nodes and select ‘Add File System Connection Port’. After linking to the Sharepoint connector node, you will be able to see the folder/files within the connection.
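For anyone who wants to see the idea outside of the nodes, here is a rough Python sketch of the same pattern: list the files that match a wildcard, read each one, and tag every row with the path it came from. This is only an illustration, not how KNIME does it internally. The function names (`list_files`, `read_csv_rows`, `aggregate_with_source`) and the `Path` column name are made up for this example, and CSV files on a local folder stand in for the *.xlsx files on Sharepoint so that the sketch runs with the standard library alone.

```python
import csv
import fnmatch
from pathlib import Path
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]


def list_files(folder: Path, pattern: str) -> List[Path]:
    """Return the files directly inside `folder` whose names match the wildcard."""
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and fnmatch.fnmatch(p.name, pattern)
    )


def read_csv_rows(path: Path) -> List[Row]:
    """Read one CSV file into a list of dicts (hypothetical stand-in for an Excel read)."""
    with path.open(newline="") as handle:
        return list(csv.DictReader(handle))


def aggregate_with_source(
    paths: Iterable[Path],
    reader: Callable[[Path], List[Row]] = read_csv_rows,
    path_column: str = "Path",
) -> List[Row]:
    """Concatenate rows from all files, adding a column with each row's source path."""
    combined: List[Row] = []
    for path in paths:
        for row in reader(path):
            tagged = dict(row)               # copy so the reader's row is not mutated
            tagged[path_column] = str(path)  # record which file this row came from
            combined.append(tagged)
    return combined
```

Calling `aggregate_with_source(list_files(Path("some_folder"), "*.csv"))` would give one combined list of rows, each carrying its source path. The reader is passed in as a parameter so that a different file format could be swapped in without touching the tagging logic.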
Just dropping this here in case you haven’t seen it: with the new KNIME version 4.4.0, you can distinguish which rows in the final aggregated data frame come from which source file by adding a path column. See here for more: