How to Automatically Refresh Data in KNIME Workflow with CSV/Excel/Table Readers

Hello KNIME community,

I am currently working on a KNIME workflow where I need to continuously read data from external files (such as CSV, Excel, or tables) that are being updated iteratively. Essentially, data is added to the same file or table after each iteration, and I need to automatically refresh the data being read without manually reloading the file every time new data is appended.

The issue I’m facing is that nodes like CSV Reader, Excel Reader, or Table Reader do not seem to refresh their data automatically when new rows are added. Currently, the only way I can refresh the data is by manually reloading the table, which is not very practical for continuous workflows.

To give you a specific example, I’m working on a process that scrapes job listings from different pages and appends new URLs to a table (let’s say from pages 2, 3, 4, etc.). After each iteration, new data is added to the file, and I need to update my workflow to pick up these new rows automatically.

Has anyone here encountered this issue and found a way to automate the refresh of these data reading nodes so that they can handle data that is being added in real-time during each iteration?

Any insights or solutions would be greatly appreciated!

Thanks in advance!

Hi @Mohaed_Yahiaoui, welcome to the KNIME community forum.

Can you perhaps show a screenshot of where your CSV Reader (or other reader node) sits in your workflow?

It is true that a reader node will not detect file changes and refresh by itself, but if you place it inside a loop and connect its flow variable input port to an upstream node in that loop, it will be forced to re-read the file on each iteration.
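To illustrate the idea outside KNIME, here is a minimal Python sketch of "re-read inside the loop body". It is not how the reader nodes are implemented; the file name `urls.csv`, the `url` column, and the example.com addresses are made up for the demonstration.

```python
import csv
import os
import tempfile


def read_rows(path):
    """Open the file afresh and return every data row (header skipped)."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row, if there is one
        return list(reader)


def demo():
    # Hypothetical file and column name, used only for this illustration.
    path = os.path.join(tempfile.mkdtemp(), "urls.csv")
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerow(["url"])

    counts = []
    for page in (2, 3, 4):
        # Stand-in for the scraper appending one row per iteration.
        with open(path, "a", newline="") as handle:
            csv.writer(handle).writerow([f"https://example.com/jobs?page={page}"])
        # Because the read happens inside the loop, it sees the new row.
        counts.append(len(read_rows(path)))
    return counts
```

The point is only that the read has to be part of what gets executed per iteration; a read done once before the loop would never see the appended rows.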

@Mohaed_Yahiaoui Maybe you can try “use new schema”.

@Mohaed_Yahiaoui you can use the Wait node to (well) wait for the creation, deletion, or change of a file.
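For anyone who wants to see the waiting logic spelled out, below is a rough Python sketch of polling a file until it changes. It only mimics the behaviour described above and is not the Wait node's actual implementation; the function name `wait_for_change` and its parameters are my own assumptions.

```python
import os
import time


def wait_for_change(path, poll_seconds=1.0, timeout_seconds=60.0):
    """Block until the file's (mtime, size) signature changes, or time out.

    Returns True if a change (including creation or deletion) was seen,
    False if the timeout expired first.
    """
    def signature():
        try:
            stat = os.stat(path)
            return (stat.st_mtime_ns, stat.st_size)
        except FileNotFoundError:
            return None  # a missing file is a valid state to compare against

    baseline = signature()
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if signature() != baseline:
            return True
        time.sleep(poll_seconds)
    return False
```

Comparing size as well as modification time is a deliberate choice here, since some file systems report timestamps too coarsely to catch two quick appends.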

Thank you all for the proposed solutions! They were very insightful and helped me better understand how to approach this problem.

I also managed to come up with a slightly unconventional workaround. Here’s how I implemented it:

  1. I used a Table Creator node to generate fictitious rows, providing a certain number of iterations.
  2. Then, I connected this to a Chunk Loop Start node to iterate over each row.
  3. Inside the loop, I added a Table Row to Variable node, which is connected to my actual data table (e.g., a CSV Reader or Excel Reader).
  4. This approach forces the data table to refresh at each iteration.

While this works, the main drawback is determining the exact number of iterations upfront. Since new rows might be added to the table dynamically, I can’t know in advance how many rows to create in the Table Creator.

I was wondering if there’s a way to incorporate a conditional check during the process to dynamically adjust the number of iterations. However, this still runs into the initial issue of needing to define a starting number of rows in the Table Creator.

Any suggestions on how to refine this solution or make it more dynamic would be greatly appreciated!

@Mohaed_Yahiaoui I just remembered that I once built something like this: a folder gets checked every few minutes, any changes are recorded, and if there is a change a process can be started.
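The folder check can be sketched in a few lines of Python. This is only an illustration of the "snapshot, compare, react" idea, not the workflow mentioned above; the helper names `snapshot` and `diff_snapshots` are hypothetical.

```python
import os


def snapshot(folder):
    """Map each file name in the folder to its (mtime, size) signature."""
    result = {}
    with os.scandir(folder) as entries:
        for entry in entries:
            if entry.is_file():
                stat = entry.stat()
                result[entry.name] = (stat.st_mtime_ns, stat.st_size)
    return result


def diff_snapshots(old, new):
    """Return the file names added, removed, or modified between snapshots."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    modified = sorted(
        name for name in set(old) & set(new) if old[name] != new[name]
    )
    return {"added": added, "removed": removed, "modified": modified}
```

A scheduler would take a snapshot every few minutes, compare it with the previous one, and start the downstream process only when one of the three lists is non-empty.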

Key to such things is planning. And with a proper KNIME hub this would be much easier.

Hi @Mohaed_Yahiaoui, I was wondering what the idea behind having two loops is.

If the Chunk Loop is merely serving n rows at a time and the Table Row to Variable Loop then processes those n rows one at a time, this is no different from dropping the Chunk Loop altogether and having the Table Row to Variable Loop do its thing for each row, unless you actually have some other processing (not shown) that is performed on a “per-chunk” basis.

i.e.

To your question of not knowing the number of iterations required, and being able to dynamically adjust them, it is possible that either a Recursive Loop or more likely the following loop might better suit your needs:

Generic Loop Start ---- Variable Condition Loop End

but what I don’t understand from what you have said is what your termination condition is.

You want the loop to continue until what happens?
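Purely as an illustration of a condition-terminated loop, here is a Python sketch that assumes the answer is "until a re-read of the file yields no new rows". That stop condition is my assumption, not something stated in this thread, and the names `read_new_rows`, `process_until_idle`, and `max_iterations` are hypothetical.

```python
import csv


def read_new_rows(path, already_seen):
    """Re-read the whole file and return only the rows past `already_seen`."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row
        rows = list(reader)
    return rows[already_seen:]


def process_until_idle(path, handle_row, max_iterations=1000):
    """Loop until a read yields no new rows (the assumed stop condition)."""
    seen = 0
    for _ in range(max_iterations):  # safety cap so the loop always ends
        new_rows = read_new_rows(path, seen)
        if not new_rows:
            break  # nothing was appended since the last pass
        for row in new_rows:
            handle_row(row)
        seen += len(new_rows)
    return seen
```

The number of iterations is never fixed up front: each pass decides whether another one is needed, which is the role the end-of-loop condition would play in the workflow.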

Also, do the variables created by the Table Row to Variable loop actually “feed” the Table Reader in any way, or is this simply a mechanism to execute the statically configured Table Reader n times?

What you have shown may be a mock-up, but pretty much the only thing you can configure on a Table Reader relates to “path”, and you cannot create a path variable from a Table Creator and Table Row to Variable without also including a String to Path node, so I’m guessing there is no dynamic configuration in what you are demonstrating here.
