Append Data Node

Hi Friends,

I have a question with regards to my scenario.

Here’s the problem.

Is there a node that can identify new data in existing data tables? Thank you!

Respectfully,
Sean

Hi @seanmanzanilla , you can use the Table Difference Finder node

Just make sure that both tables have the same structure and are sorted the same way.

EDIT: Another option is, if your ID column is somewhat unique, you can do a Left (or Right, depending on which table is on the Left (first) and which one is on the Right (second)) Join between the 2 tables. It will give you which records from the Left table that are not in the Right table

3 Likes

Hi @bruno29a

Sorry, where do I find the Another Option is?

Thank you!

Hi @seanmanzanilla , the other option was to use the Joiner node if your ID column is unique. You can do a join on the ID between the 2 tables. A Left (or right, depending on which table is left or right) Join will show you the records that are in the Left table but not in the right table.

1 Like

Depending on your table size and whether the column are unique you could concatenate the 2 tables and then remove the duplicates using duplicate row filter node
br

1 Like

Hi @bruno29a @Daniel_Weikert, your ideas is great, however it will not work to my current workflow.

Here’s the sample workflow

My 1st Execute/New Data will create new Excel File. Please note that First Execute file will be my Reference in 2nd Execute, if there will be New ID that will be added.

1st Day Execute
ID Version Product BU/Product
1 1 SFT 11225 SFT
2 1 SFT 11225 SFT
3 1 SFT 11231 SFT

In 2nd Day Execute Data column ID 4 will append in Existing File(1st Execute)

2nd Day Execute
ID Version Product BU/Product
1 2 SFT 11225 SFT
2 2 SFT 11225 SFT
3 2 SFT 11231 SFT
4 2 SFT 11624 SFT

In 3rd Day Execute column ID 5 will append in existing file combination of Day 1&2.

3rd Day Execute
ID Version Product BU/Product
1 3 SFT 11225 SFT
2 3 SFT 11225 SFT
3 3 SFT 11231 SFT
4 3 SFT 11624 SFT
5 3 SFT 11625 SFT

It’s a daily task I need to identify if there are new ID to append, however there are Day that has NO new id to append. thank you

Why is it that people come with a set of requirements, to which we offer a proper solution, to then come back and finally give the full details. Why not give the full details since the beginning? :slight_smile: That’s quite common unfortunately. It serves no purpose when crucial information are missing in the initial data. I always say it, the more accurate you are in your details, the more accurate the solution will be :slight_smile:

@seanmanzanilla indeed if you have the version value changing all the time, you cannot use the Table Difference Finder in this case, but the Join option I mentioned will work since it will rely on the IDs. Have you tried it?

I put a quick simple demo for you to demonstrate:
image

Day 1 (same as what you have):
image

Day 2 (same as what you have):
image

Day 3 (same as what you have):
image

Check if there is any new data between Day 1 and Day 2:
image

Check if there is any new data between Day 2 and Day 3:
image

Obviously you have to implement something that will read the files dynamically (previous day file and current day file). I can help with this too if I fully understand your use case (may be we can use something in the file name to identify the file, or may be you will always have only 2 same files all the time because they are replaced daily)

Here’s the demo: Check for new data.knwf (41.4 KB)

Note: In my demo, I have 2 workflows, which are identical except for the input files. This is just to show you the results of the 2 comparisons. You only need one of them.

Also, you can check the difference between any 2 days. You can even try to check the difference between Day 1 and Day 3. You will see both ID 4 and 5.

duplicate row filter allows to specifiy which columns to include in check.
br

Hi @bruno29a, thanks for this,

BTW Is there a way to maintain the First Execute Data table because I will use it as a database(excel sheet) and aggregate some column. So Every time I will execute the daily data it will only add the new unique ID in my First Execute Data?

Thank you and Regards,
Sean

Hi @seanmanzanilla , no problem.

I need to understand your operation. Where are you reading from, the first time, and the subsequent times? Are you reading from a file? From a db table?

Hi @Daniel_Weikert thanks for this,

However I need to use the first execute data because I will use it as a database and aggregate some columns and only add unique ID from Daily.

Thanks and regards,
Sean

Hi @bruno29a

The First Execute Day(excel file) Data, will be my first database output (excel file), After that I need to add columns and information to add in my 1st Database(excel file).

Second Execute Data(excel file) data, I want to add the Unique ID information to my (1st database) so that I can maintain the tagging of my Prev working files. and tag the new ID from second execute.

Third Execute Data (excel file data) Again I only want the new ID to add in my Prev working files
and tag the new ID from 3rd execute.

Goal of the Workflow
I need to maintain the 1st execute(data) and manually add additional information and columns.

So every time I have to run daily data it will only add new ID and I will add manual information and columns depends on how many new id will append from daily data.

I hope it enlighten you @bruno29a,
Thanks and Regards,
Sean

Hi @seanmanzanilla , I would recommend using Parquet instead of Excel as database.

However, if you insist on using Excel, I would suggest the following:
First Execution:

  1. Just save the data as your permanent file (let’s say you call it Database.xlsx)

Next Executions:

  1. Load Database.xlsx via Excel Reader
  2. Load today’s file via Excel Reader
  3. Do the comparison with the Joiner - configure Joiner to return only the ID
  4. Concatenate the ID to the Excel Reader of Database.xlsx
  5. Write to Database.xlsx via Excel Writer - configure so that data is appended:

This should add the new ID(s) to your Database.xlsx. As per your request, it will add the new ID(s) only, where you want to then manually add information.

NOTE: Another option for getting the new IDs can be that, if your ID is incremental, then you can check what’s the max(ID) in your Database.xlsx, and then just do a row filter on your daily file where ID > max(ID) of your Database.xlsx. As your data grow, this option might be faster than the Join

3 Likes

Hi @bruno29a ,
Thanks a lot will try this case.

Respectfully,
Sean

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.