I have a MongoDB with a a set of documents based upon when they were updated in the origin system. So, for example, a sales invoice with number 1044 may be received twice, the first time when the transaction_status is "UNPAID" and again 28 days later when the transaction_status is "PAID".
Using MongoChef, I use a SORT by transaction_reference (ascending) and object_creation_date (decending) followed by a $group and select the $first of each set to have the latest value.
$sort : { "party_uuid":1, "connection_uuid":1, "transaction_reference":1, "object_creation_date": -1 } $group : { _id: "$transaction_reference", "party_uuid" : {$first: "$party_uuid"}, "transaction_date" : {$first: "$transaction_date"}, "transaction_net_value" : {$first: "$transaction_net_value"}, "object_category" : {$first: "$object_category"}, "transaction_status" : {$first: "$transaction_status"}, "object_origin_category" : {$first: "$object_origin_category"}, "object_origin" : {$first: "$object_origin"}, "goods_and_services" : {$first: "$goods_and_services"} }
I want to do a similar thing in a workflow in KNIME and I've done the following so far:
I've added a MongoDB Reader which I've configured and it is working. I have the following configuration in it:
Query : {object_class : "Goods & Services Transaction", object_origin_category : "Bookkeeping"} Projection : {_id:0, transaction_reference :1, object_creation_date:1, object_category:1, object_type : 1, object_origin: 1, transaction_date : 1, transaction_status :1, transaction_gross_value : 1, transaction_net_value :1, transaction_tax_value :1, party.party_name : 1, party_uuid : 1} Sort : {party_uuid : 1, transaction_reference : 1, object_creation_date : -1}
I've then added a JSON to Table to make it easy to read, which has worked (with exception to insisting on renaming my two date variables which is very annoying).
What I need help with is the following
From the table, I want to divide the set into two outputs - 1) the first row for each transaction_reference based upon the sort order above. 2) the last row for each transaction_reference based upon the sort order above.
I've tried groupBy, I've looked at Row Filter and others and cannot find one where I can add in the criteria of either MIN or MAX of object_creation_date for each transaction_reference.
Thanks, Matt