Sentence extraction/generation from unstructured data


Currently I have a txt file with unstructured text that doesn't have any punctuation. Is there a node available that can parse my text and, for example using NLP, extract/generate separate sentences? For example:

Input:

the world is big my name is Alice

Output:

the world is big

my name is Alice

Hi Myla,

Sounds doable, but not with any single node I know of. Part-of-speech (POS) tagging should give you hints about where to split the sentences, though obviously it won't be possible to reproduce nested structures this way.
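To illustrate the idea outside of any particular node, here is a minimal Python sketch of splitting on POS hints. Everything in it is an assumption made for illustration: `TOY_LEXICON`, `toy_tag`, and `split_on_pos_hints` are hypothetical names, the lexicon is a hand-made stand-in for a real trained tagger, and the rule (start a new sentence at a determiner or pronoun once the current clause already has a verb followed by at least one word) is a crude heuristic that only handles flat sentences like the example.

```python
# Hypothetical toy lexicon standing in for a real POS tagger.
# Words not listed here are treated as nouns.
TOY_LEXICON = {
    "the": "DET", "a": "DET", "my": "DET",
    "i": "PRON", "it": "PRON",
    "is": "VERB", "am": "VERB", "are": "VERB",
    "big": "ADJ",
    "world": "NOUN", "name": "NOUN",
}

# Tags that typically open a new clause in this toy setup.
CLAUSE_STARTERS = {"DET", "PRON"}


def toy_tag(token):
    """Look up a coarse POS tag; unknown words default to NOUN."""
    return TOY_LEXICON.get(token.lower(), "NOUN")


def split_on_pos_hints(text):
    """Split unpunctuated text into sentences using a simple POS heuristic."""
    sentences = []
    current = []
    verb_seen = False
    words_after_verb = 0

    for token in text.split():
        tag = toy_tag(token)
        clause_complete = verb_seen and words_after_verb > 0

        # A clause starter after a complete clause marks a boundary.
        if tag in CLAUSE_STARTERS and clause_complete:
            sentences.append(" ".join(current))
            current = []
            verb_seen = False
            words_after_verb = 0

        current.append(token)
        if tag == "VERB":
            verb_seen = True
            words_after_verb = 0
        elif verb_seen:
            words_after_verb += 1

    if current:
        sentences.append(" ".join(current))
    return sentences


print(split_on_pos_hints("the world is big my name is Alice"))
# → ['the world is big', 'my name is Alice']
```

On real text this rule would misfire often (for instance on "the world is the biggest"), so it is only meant to show what kind of hint the tags can give.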


Moving into the text processing forum.

Hi Myla,

You can try the Strings to Document node on these strings, followed by the Sentence Extractor, but I doubt this will give reasonable results: sentence tokenization relies mostly on punctuation marks.
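A small Python sketch shows why punctuation matters so much here. It is not the Sentence Extractor's implementation, just a generic regex-based splitter (the name `split_on_punctuation` is hypothetical) that breaks after `.`, `!`, or `?`. With no punctuation in the input, it has nothing to split on and returns the whole text as one sentence.

```python
import re


def split_on_punctuation(text):
    """Split after sentence-final punctuation followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


# Punctuated text splits as expected.
print(split_on_punctuation("The world is big. My name is Alice."))
# → ['The world is big.', 'My name is Alice.']

# Unpunctuated text comes back as a single "sentence".
print(split_on_punctuation("the world is big my name is Alice"))
# → ['the world is big my name is Alice']
```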

Cheers, Kilian
