Data aggregation across multiple SharePoint files in KNIME 4.2

I am really impressed with the new Microsoft Authentication and SharePoint Online Connection nodes; they make dealing with the SharePoint API a breeze. I’m exploring these nodes to access a document library and aggregate across multiple similar *.xlsx files, using the wildcard option in the file filter together with the ‘Files in folder’ option. So far, so good (Excel Reader configuration view):


My issue is collecting metadata from the files (e.g., the file name or file path) so that I can tell which rows in the final aggregated data frame come from which source file. Is there a feature I might be missing, or a workflow adaptation I could use to address this?

I partially answered my own question. I’m using the ‘List Files/Folders (Labs)’ node (not the usual ‘List Files’ node) and configuring its input port to accept the SharePoint connection. Now I can list files and folders and use them as paths or metadata for the aggregated reads!
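For readers who want to see the same pattern outside KNIME, here is a minimal Python sketch of the idea: list the matching files, read each one, tag every row with the path it came from, and concatenate the results. It is not KNIME code and not how the nodes work internally; `aggregate_with_source` is a hypothetical helper, and it reads CSV instead of .xlsx so that it needs only the Python standard library.

```python
import csv
import tempfile
from pathlib import Path


def aggregate_with_source(folder, pattern="*.csv"):
    """Concatenate the rows of every file in `folder` matching `pattern`,
    adding a 'Path' column that records which file each row came from."""
    rows = []
    # sorted() gives a deterministic file order, similar to a listing step
    for path in sorted(Path(folder).glob(pattern)):
        with path.open(newline="") as fh:
            for record in csv.DictReader(fh):
                record["Path"] = str(path)  # per-row source metadata
                rows.append(record)
    return rows


if __name__ == "__main__":
    # Small demo with two throwaway files in a temporary folder
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "a.csv").write_text("id,value\n1,10\n2,20\n")
        Path(tmp, "b.csv").write_text("id,value\n3,30\n")
        for row in aggregate_with_source(tmp):
            print(row)
```

The wildcard `pattern` plays the role of the file filter, and the extra `Path` column is the metadata that lets you trace each aggregated row back to its source file.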


Hi @longoka,

Glad you found a way. In any case, this is on our list. (Internal reference: AP-13949)

Br,
Ivan


Hi @longoka,

Thanks for sharing your knowledge with others. Everything works for me up to the SharePoint Online Connector (Labs) node.

When I use the Excel Reader or List Files/Folders (Labs) nodes, I don't see a connection port like yours, so I connected them through flow variables as shown below.




I tried to provide the path by browsing, but found no folders to select.

Any input on this issue?

BR,
Pavan

Click the breadcrumbs (outlined in red) on either node and select ‘Add File System Connection Port’. After linking it to the SharePoint connector node, you will be able to see the folders and files within the connection.


Hi @longoka,

Thanks for your speedy response; it worked. I now realize this is also explained in the node description, which I missed earlier. Sorry for the trouble.

Regards,
Pavan


Hello @longoka,

Just to let you know, in case you haven’t seen it: with the new KNIME version 4.4.0 you can distinguish which rows in the final aggregated data frame come from which source file by adding a path column. See here for more:

Br,
Ivan
