Hi,
I'm trying to understand how the groupby node decides which entry is first and which entry is last when aggregating grouped rows.
I have a situation in a workflow where I'm bringing 10 or so SDF files together (around 10k rows total after concatenation), grouping by an InChiKey column, and aggregating about 4 columns with the "First" selection and another column with the "List" selection.
I've used this approach for the past two years without issue. I've always assumed that the row order going into the groupby node dictates what is first, what the list order will be when aggregating, etc. However, I now have a repeatable example where the post-grouped table classifies as the *First* entry in an aggregated group the row that is actually second in the relative row order of the table going into the groupby node. All other groups are behaving "normally", in that the first entry in the pre-groupby table is classified as the first entry.
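To make the behaviour I expected concrete, here is a minimal Python sketch. It is not KNIME code and says nothing about how the groupby node works internally; it only illustrates my assumption that "First" and "List" follow the input row order. The function name, the "Name" and "Source" columns, and the sample values are all made up for illustration.

```python
def group_first_and_list(rows, key, first_cols, list_col):
    """Group rows by `key`, keeping the first value seen for each column in
    `first_cols` and collecting every value of `list_col` in input order."""
    groups = {}  # dicts preserve insertion order, so groups appear in first-seen order
    for row in rows:
        k = row[key]
        if k not in groups:
            # First row seen for this key: it supplies the "First" values.
            groups[k] = {key: k, **{c: row[c] for c in first_cols}, list_col: []}
        # Every row contributes to the "List" aggregation, in input order.
        groups[k][list_col].append(row[list_col])
    return list(groups.values())


# Hypothetical rows standing in for the concatenated SDF tables.
rows = [
    {"InChiKey": "AAA", "Name": "a1", "Source": "file1.sdf"},
    {"InChiKey": "BBB", "Name": "b1", "Source": "file1.sdf"},
    {"InChiKey": "AAA", "Name": "a2", "Source": "file2.sdf"},
    {"InChiKey": "BBB", "Name": "b2", "Source": "file3.sdf"},
]

result = group_first_and_list(rows, "InChiKey", ["Name"], "Source")
for group in result:
    print(group)
```

Under that assumption, the group for "AAA" should take its Name from the file1.sdf row and list file1.sdf before file2.sdf. In my problem case, the equivalent of the second row is being picked as "First" instead.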
I've made many attempts to re-assign the row order etc., to no avail. The only way I could get the groupby node to give me the desired list order and correctly identify the "First" entry was to delete the input SDF reader with the offending row, create a new SDF reader, and point it at the same file. I couldn't believe this worked, so I tried it a second time, and this time I got the same problem.
What am I not seeing? Thanks for any help :-)
tl;dr: What dictates the entry order in the groupby node for assigning First, Last, and collating lists? I have a repeatable workflow issue that shows it's got nothing to do with the row order of the input table.