I am trying to perform Named Entity Recognition on a set of documents, but after the tagging I lose my row ID, which means I can't join the results back.
The data comes in as rows of strings, which I convert to documents using the 'String To Doc' node and then assign a row ID. Next I use the 'StanfordNLP NE tagger' to perform the named entity recognition, create a bag of words, and use a splitter to extract the useful information. I want to join the results back to the original input, using the document column as the join key. But I get back an empty table, even though it is the exact same set of documents.
You are trying to join the original documents to the tagged documents. Since the tagged documents also include the tagging information, they can't be matched to the original documents.
Therefore, you should connect the bottom input of your Joiner to the StanfordNLP Tagger rather than to the RowID node. If you want to continue working with the documents without tags, you can use a Tag Stripper node after joining.
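To illustrate why the join on the document column comes back empty, here is a minimal sketch in plain Python. It is not KNIME code: the `Doc` class, the `tag` function, and the `join` helper are all hypothetical stand-ins, and the sketch assumes that a document's tags take part in the equality comparison, as described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """Toy stand-in for a document cell: text plus optional tags."""
    text: str
    tags: tuple = ()  # (term, tag) pairs; empty for untagged documents


def tag(doc):
    # Hypothetical tagger: marks every capitalised word as PERSON.
    found = tuple((w, "PERSON") for w in doc.text.split() if w.istitle())
    return Doc(doc.text, found)


def join(left, right, key):
    # Inner join of two lists of (row_id, doc) pairs on the given key.
    index = {key(rid, d): d for rid, d in right}
    return [
        (rid, d, index[key(rid, d)])
        for rid, d in left
        if key(rid, d) in index
    ]


originals = [
    ("Row0", Doc("Alice met Bob")),
    ("Row1", Doc("Carol stayed home")),
]
tagged = [(rid, tag(d)) for rid, d in originals]

# Joining on the document itself: tagged documents are no longer equal
# to the untagged originals, so nothing matches.
by_document = join(originals, tagged, key=lambda rid, d: d)

# Joining on a key that tagging does not touch (here the row ID) works.
by_row_id = join(originals, tagged, key=lambda rid, d: rid)

print(len(by_document), len(by_row_id))
```

In this toy model the text of each document is unchanged, but the comparison still fails because the tags differ. Joining on any column that the tagger leaves untouched avoids the problem.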
Can you please try again without the Tag Stripper that connects directly to the RowID node? It's possible that this changes something in the documents so that they don't match afterwards.
If that doesn't help, can you please post your workflow here with a small sample of your data? I can then take a closer look at what is happening.