I have a MongoDB with a a set of documents based upon when they were updated in the origin system. So, for example, a sales invoice with number 1044 may be received twice, the first time when the transaction_status is "UNPAID" and again 28 days later when the transaction_status is "PAID".
Using MongoChef, I use a SORT by transaction_reference (ascending) and object_creation_date (decending) followed by a $group and select the $first of each set to have the latest value.
$sort :
{
"party_uuid":1,
"connection_uuid":1,
"transaction_reference":1,
"object_creation_date": -1
}
$group :
{
_id: "$transaction_reference",
"party_uuid" : {$first: "$party_uuid"},
"transaction_date" : {$first: "$transaction_date"},
"transaction_net_value" : {$first: "$transaction_net_value"},
"object_category" : {$first: "$object_category"},
"transaction_status" : {$first: "$transaction_status"},
"object_origin_category" : {$first: "$object_origin_category"},
"object_origin" : {$first: "$object_origin"},
"goods_and_services" : {$first: "$goods_and_services"}
}
I want to do a similar thing in a workflow in KNIME and I've done the following so far:
I've added a MongoDB Reader which I've configured and it is working. I have the following configuration in it:
Query : {object_class : "Goods & Services Transaction",
object_origin_category : "Bookkeeping"}
Projection : {_id:0,
transaction_reference :1,
object_creation_date:1,
object_category:1,
object_type : 1,
object_origin: 1,
transaction_date : 1,
transaction_status :1,
transaction_gross_value : 1,
transaction_net_value :1,
transaction_tax_value :1,
party.party_name : 1,
party_uuid : 1}
Sort : {party_uuid : 1,
transaction_reference : 1,
object_creation_date : -1}
I've then added a JSON to Table to make it easy to read, which has worked (with exception to insisting on renaming my two date variables which is very annoying).
What I need help with is the following
From the table, I want to divide the set into two outputs - 1) the first row for each transaction_reference based upon the sort order above. 2) the last row for each transaction_reference based upon the sort order above.
I've tried groupBy, I've looked at Row Filter and others and cannot find one where I can add in the criteria of either MIN or MAX of object_creation_date for each transaction_reference.
Thanks, Matt