Creating a separate daily report file (b.xls) from the new data added to a constantly growing Excel file (a.xls)

Hi,
I hope I can accurately explain what I want to do.

1- There is a fixed Excel file, for example a.xls. New data is added to this file every day, so the number of records keeps growing (nothing is deleted).

  • For example, when new data arrives today to be added to a.xls:
    • Rule 1. Check for repeated records by comparing the new data against the reference column of the existing data. No duplicate entries are allowed.
    • Rule 2. Write only the new, non-repeating data to a.xls (without deleting the data already in the file).

2- Determine which data was newly added to a.xls and write it to b.xls, which serves as the daily report. (This file is overwritten every day; old data is not kept.)

An example of this would be a great help. I hope I have explained it clearly.

Many thanks for any opinions and help.

Note: A sample workflow would be very instructive for me.

Hi,

Try the Duplicate Row Filter node: https://nodepit.com/node/org.knime.base.node.preproc.duplicates.DuplicateRowFilterNodeFactory
This node will tell you which rows are unique (b.xls, new data), chosen (a.xls, old data), and duplicate (rows to delete).

Hi @andrejz,

I made a draft workflow, but all I need is to compare the new records with the archive file,

and then add only the new records to the archive file and to the daily report file.

How can I export just the new records?

The solution therefore reads an archive file, to which data is transferred every day. New incoming data is parsed and checked before being transferred to the archive file and the daily report file. Finally, the new unique records are written to both the archive file and the report file.

Concatenate the archive and the new data, then use the Duplicate Row Filter, which will find the new data (the node adds a classification column).

Select “keep duplicate rows” and then filter for the rows classified as unique.
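
For readers who want to see the idea outside of KNIME, here is a minimal Python sketch of the concatenate-then-classify approach. It is not the node's implementation, only an imitation of the unique / chosen / duplicate labels described above. The function name `classify_rows` and the columns `ref` and `source` are hypothetical, made up for this illustration.

```python
from collections import Counter

def classify_rows(rows, key):
    """Label each row 'unique', 'chosen', or 'duplicate' by its key value.

    'unique'    : the key occurs only once in the whole table
    'chosen'    : first row of a group that shares the same key
    'duplicate' : every later row of such a group
    """
    counts = Counter(row[key] for row in rows)
    seen = set()
    labelled = []
    for row in rows:
        value = row[key]
        if counts[value] == 1:
            label = "unique"
        elif value not in seen:
            label = "chosen"
        else:
            label = "duplicate"
        seen.add(value)
        labelled.append({**row, "classification": label})
    return labelled

# Hypothetical sample data: 'ref' is the reference column, 'source' marks the origin.
archive = [{"ref": "A1", "source": "archive"}, {"ref": "A2", "source": "archive"}]
incoming = [{"ref": "A2", "source": "new"}, {"ref": "A3", "source": "new"}]

# Archive first, so archive rows win when a key appears in both tables.
labelled = classify_rows(archive + incoming, key="ref")

# Archive rows missing from today's batch are also 'unique',
# so the origin flag is needed to keep only the genuinely new rows.
new_only = [r for r in labelled
            if r["classification"] == "unique" and r["source"] == "new"]
```

Note that in this sketch the archive row A1 is also labelled unique because it has no counterpart in the incoming batch, which is why the extra `source` column is used before filtering.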

Update: I am sharing the latest version of the workflow for anyone looking for a solution to a similar problem.

As a result:
1- The archive file is read. If there are records to add, the archive file is updated with them, so the new records end up in the Excel file. (It does not literally append, i.e. archive + new records = new archive; instead it combines both tables, compares them, and rewrites the file.)

2- New records are identified by comparing their URL addresses against the archive file; records not found in the archive go into the report of new records.
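
The two results above can be sketched in plain Python as a comparison on the URL column. This is only an illustration of the logic under stated assumptions, not the shared workflow itself: the function name `split_new_records`, the `url` and `title` column names, and the sample rows are all hypothetical.

```python
def split_new_records(archive, incoming, key="url"):
    """Return (updated_archive, report) for one daily run.

    report          : incoming rows whose key is not yet in the archive
    updated_archive : the old archive followed by those new rows
    """
    known = {row[key] for row in archive}
    report = []
    for row in incoming:
        if row[key] not in known:
            known.add(row[key])  # also drops repeats within the incoming batch
            report.append(row)
    return archive + report, report

# Hypothetical sample data keyed by URL.
archive = [
    {"url": "https://example.com/1", "title": "first"},
    {"url": "https://example.com/2", "title": "second"},
]
incoming = [
    {"url": "https://example.com/2", "title": "second"},      # already archived
    {"url": "https://example.com/3", "title": "third"},       # new
    {"url": "https://example.com/3", "title": "third again"}, # repeat in same batch
]

updated_archive, report = split_new_records(archive, incoming)
```

In a real run, `updated_archive` would be written back to the archive file and `report` would overwrite the daily report file, so the report only ever holds the latest day's new records.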
