I noticed that the Sentence Extractor node has no option to propagate fields that would let the user group by the original document ID.
For example, after sentence extraction and various manipulations, I want to be able to retrieve the RowID of the original document to which the extracted words or sentences belong.
Would it be possible to add this in the next release?
Row IDs from the input table cannot be reused, since the node creates one row for each sentence of the input documents. However, the document itself is also part of the output table. You can use the document column to group rows, or to join information from the original table back to the output. Additionally, you can store meta information such as category or source in the documents and extract it later on with the Document Data Extractor.
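To make the data flow concrete, here is a minimal sketch in plain Python. It is not KNIME's API, just an illustration of the idea: each sentence row carries its source document, so the document can serve as the grouping and join key. The table contents, the `extract_sentences` helper, and the naive regex splitter are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical input table: RowID -> (document text, category).
documents = {
    "Row0": ("KNIME is open source. It has a text processing extension.", "software"),
    "Row1": ("Sentences are split per document.", "nlp"),
}


def extract_sentences(docs):
    """Return one output row per sentence, each carrying its source document."""
    rows = []
    for _row_id, (text, _category) in docs.items():
        # Naive split after ., ! or ? followed by whitespace.
        # This is only a stand-in for a real sentence tokenizer.
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            if sentence:
                rows.append({"document": text, "sentence": sentence})
    return rows


sentence_rows = extract_sentences(documents)

# Group the sentence rows by the document they came from.
sentences_per_document = defaultdict(list)
for row in sentence_rows:
    sentences_per_document[row["document"]].append(row["sentence"])

# Join back to the original table, using the document as the key,
# to recover the original RowID for every sentence.
row_id_by_document = {text: row_id for row_id, (text, _cat) in documents.items()}
joined = [
    {"row_id": row_id_by_document[row["document"]], "sentence": row["sentence"]}
    for row in sentence_rows
]

for row in joined:
    print(row["row_id"], "|", row["sentence"])
```

In this sketch the lookup relies on every document being distinct; if two input rows held identical text, the join key would be ambiguous and the RowID could not be recovered this way.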
Does this help?