After trying to diagnose some strange tokenising behaviour, I've determined that the cause is the N Chars Filter removing punctuation marks together with the spaces next to them, so that the string "one, two" becomes "onetwo". At least, this is how it looks in the Document Viewer. When you create a bag of words, the terms are still separate. But if you then use the Document Data Extractor to turn the documents into strings, the words are run together in the output strings, which can be a real nuisance.
Presumably this filter is intended to be used after punctuation has been removed, but in this case I ran it beforehand because I wanted to retain punctuation for the NGram Creator (which seems to take punctuation into account, though I could be mistaken).
Anyway, this is easy enough to work around. Still, it does not seem like the most logical behaviour to expect from the N Chars Filter. Wouldn't it make more sense for this node to filter only strings of non-whitespace characters, and leave the whitespace itself alone?
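To illustrate what I mean, here is a rough Python sketch. It is not how the node is implemented (I don't know that); it only reproduces the result I see and the result I would expect. The function names, the tokenising regex and the threshold of 3 characters are all made up for the example.

```python
import re

# Hypothetical tokeniser: runs of word characters, runs of punctuation,
# and runs of whitespace each become a separate piece.
PIECES = re.compile(r"\w+|[^\w\s]+|\s+")

def drop_short_terms(text, min_chars=3):
    """Reproduces the observed result: every piece shorter than
    min_chars is dropped, whitespace included."""
    return "".join(p for p in PIECES.findall(text) if len(p) >= min_chars)

def drop_short_terms_keep_whitespace(text, min_chars=3):
    """Suggested behaviour: the length test applies only to
    non-whitespace pieces, so separators between words survive."""
    return "".join(
        p for p in PIECES.findall(text)
        if p.isspace() or len(p) >= min_chars
    )

print(drop_short_terms("one, two"))                  # onetwo
print(drop_short_terms_keep_whitespace("one, two"))  # one two
```

With the second version the comma is still filtered out, but the words stay separate when the document is turned back into a string.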
The attached workflow replicates the behaviour I have described, both in relation to the N Chars Filter and the Document Data Extractor.