I have an Excel data extract dropped daily from a database, and I would like to create a workflow that pulls each day's updates into an existing version of the Excel file. The existing file has the same columns plus additional columns at the end for tracking purposes. Column 1 of each row holds a unique number identifier. Basically, I would like to compare the new data extract against the old file using that unique row identifier, take updates from the new data into any of the shared columns (without overriding any of the “tracking” columns that aren’t in the dropped data), and if there are any new rows in the new data (based on the unique number identifier), add those to the data as well.
Could anyone help me with this? I’ve tried using the “joiner” node but I’m having trouble.
Can you post an example with some dummy data?
Unfortunately my company policy does not allow me to upload files to this website. But basically, the columns that both docs share would be along the lines of: Request ID, Client name, Client ID, Date of approval 1, Date of approval 2, and Status. The document that I’d like to maintain as a tracker will also have “Week of prioritization” and “Comments” fields.
Hope this helps!
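To make the column layout concrete, here is a hypothetical sketch of the two files as pandas DataFrames. All names, IDs, dates, and statuses are made up; only the column names come from the description above:

```python
import pandas as pd

# Daily extract dropped from the database (shared columns only)
new_extract = pd.DataFrame({
    "Request ID": [101, 102, 103],
    "Client name": ["Acme", "Globex", "Initech"],
    "Client ID": ["A1", "G1", "I1"],
    "Date of approval 1": ["2023-01-02", "2023-01-03", None],
    "Date of approval 2": [None, "2023-01-10", None],
    "Status": ["Approved", "Approved", "Pending"],
})

# Existing tracker: same columns plus two tracking columns at the end
tracker = pd.DataFrame({
    "Request ID": [101, 102],
    "Client name": ["Acme", "Globex"],
    "Client ID": ["A1", "G1"],
    "Date of approval 1": ["2023-01-02", None],
    "Date of approval 2": [None, None],
    "Status": ["Pending", "Pending"],
    "Week of prioritization": ["2023-W01", "2023-W02"],
    "Comments": ["waiting on client", ""],
})
```

Here Request ID 103 is a new row that exists only in the extract, and 102 has a Status change that should flow into the tracker.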
You could create some dummy data that would represent your problem. From your description it is not entirely clear what you want to do. Update something and track something. I think you have to come up with some sort of plan and an idea to store your status. And then you have to decide how to handle the results (write them back to Excel?).
KNIME provides a lot of tools you could combine. Here are some points you would want to think about:
- do you want to store the tracking information in a separate table and use that or would you use the result table to decide which rows to update
- would you update all rows that match or only some columns
- what do you mean by additional columns that help you track?
- do you have a unique ID or can you create one from existing columns
- do you need to be able to have a rollback option in case something goes wrong
- would it help to use a local database like H2 or SQLite or is it best to store everything in KNIME tables
- how large is the data likely to become (if it is not that big it could make sense to store intermediate steps)
- about the export: KNIME cannot easily ‘update’ an existing Excel file (only through Python code). So would you want to overwrite the results every time, or maybe create a new Excel file (with the latest timestamp)?
As you can see - planning will go a long way toward creating the system you want. The plus side is: once you have set it up, your task will only require a click (or you could even automate or schedule it).
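Since KNIME can only update an existing Excel file through Python anyway, the “update shared columns, keep tracking columns, append new rows” logic from the original question can be sketched with pandas. The function name and the key column default are illustrative; note that `DataFrame.update` ignores NaN values in the extract, so empty cells in the new data will not blank out existing tracker values:

```python
import pandas as pd

def upsert_tracker(tracker: pd.DataFrame, extract: pd.DataFrame,
                   key: str = "Request ID") -> pd.DataFrame:
    """Update shared columns from the extract, preserve tracking
    columns, and append rows whose key is new."""
    # columns present in both files, excluding the key itself
    shared = [c for c in extract.columns if c in tracker.columns and c != key]

    merged = tracker.set_index(key)
    incoming = extract.set_index(key)

    # overwrite shared columns for keys that already exist in the tracker
    merged.update(incoming[shared])

    # append rows present only in the new extract;
    # their tracking columns start out empty
    new_rows = incoming.loc[~incoming.index.isin(merged.index)]
    merged = pd.concat([merged, new_rows.reindex(columns=merged.columns)])

    return merged.reset_index()
```

In KNIME this could live in a Python Script node between the two readers and the writer; outside KNIME it would run as a scheduled script.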
Just to add to @mlauber71’s points - the update of existing files can also be done through the Java Snippet node and Java (however, the point stands that there is no native solution in KNIME).
See the attached workflow
Basically, you read in the new and the master file.
Add whatever tracking system you are using (I assumed the load-in date).
Append the rows using the Concatenate node (basically a union).
Then remove any duplicates (in case you are loading in the same data by mistake).
Finally, overwrite the previous master.
For additional redundancy the flow also makes a dated backup of the master prior to rewriting, for those days when things go wrong.
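The concatenate-and-deduplicate steps above can also be sketched as a small pandas function, for anyone who wants the same behaviour outside the attached workflow. `refresh_master` is an illustrative name, and the unique key is assumed to be Request ID:

```python
import pandas as pd

def refresh_master(master: pd.DataFrame, new: pd.DataFrame,
                   key: str = "Request ID") -> pd.DataFrame:
    """Union the two tables and keep the newest row per key,
    mirroring Concatenate + duplicate removal in the workflow."""
    combined = pd.concat([master, new], ignore_index=True)
    # keep="last" keeps the row from the new extract when a key repeats
    return combined.drop_duplicates(subset=key, keep="last")
```

For the dated backup, something like `shutil.copy("master.xlsx", f"master_{date.today()}.xlsx")` before writing the refreshed table back would do (file names are placeholders). Note that, unlike a column-wise update, this union approach keeps the whole new row, so any tracking columns must already be present (or re-added) on the incoming data.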
append to excel.knar