A friend and I have to do a text mining project, the deadline is in one week, and we're so stuck. We're analyzing books and trying to find relationships between the people in each book.
We managed to make word clouds by extracting all named entities for locations and persons (using dictionaries, because the OpenNLP tagger did not manage to find all the different persons that well...). Now we want to find sentences which contain, for example, 'pipo is the son of ajax' or 'venus daughter of pluto'. We currently have a nested loop which loads 'firstnames', 'relations' and 'secondnames', and we are trying to use the RegEx node to build a regular expression which somehow combines these variables (their entries). However, we cannot get this to work.
Does anybody have a good idea on how to do this? Which regular expression should we use, or is there a smarter way to do this altogether? We would be sooooo grateful!
(Attached the png of our workflow so far...)
Would it be easier to use the dictionary tagger instead of the sentence extractor, and list the terms you are searching for in a second table that you connect to the dictionary tagger? You would then use the BoW node to get these terms out.
Simon is right: using the Dictionary tagger to find relations (or terms) like "pipo is the son of ajax" or "venus daughter of pluto" is easier than looping and searching with regex. The terms of the dictionary can consist of multiple words, like "pipo is the son of ajax". This allows for searching and tagging multi-word terms and expressions like that.
To find the sentences with these expressions, extract the sentences of the documents. Convert the sentences back to documents, one for each sentence. Then apply the dictionary tagger with your expressions as dictionary to use. Filter out all terms that have not been tagged. What remains is a data table of tagged terms and the documents (sentences) they are contained in. And finally extract the text, to get the complete sentence as string.
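Outside of KNIME, the same idea can be sketched in a few lines of Python. This is only an illustration of the logic, not what the nodes do internally: the naive sentence splitter, the case-insensitive substring match, and the names `split_sentences` and `find_tagged_sentences` are all assumptions made for this example.

```python
import re

def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def find_tagged_sentences(text, expressions):
    """Return (expression, sentence) pairs for every dictionary expression found."""
    hits = []
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        for expression in expressions:
            # Plain substring match, case-insensitive; not word-boundary aware.
            if expression.lower() in lowered:
                hits.append((expression, sentence))
    return hits

book = ("Pipo is the son of Ajax. They lived in Rome. "
        "Some say Venus daughter of Pluto ruled there!")
dictionary = ["pipo is the son of ajax", "venus daughter of pluto"]

hits = find_tagged_sentences(book, dictionary)
for expression, sentence in hits:
    print(expression, "->", sentence)
```

Sentences without a dictionary hit (here "They lived in Rome.") simply drop out, which corresponds to filtering out the untagged terms.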
The laborious part is creating the dictionary of expressions like "pipo is the son of ajax" and "venus daughter of pluto".
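If you already have lists of names and relation phrases, the dictionary can be generated rather than typed by hand. A minimal Python sketch, assuming every name can appear on either side of every relation; the function name `build_expressions` and the sample lists are made up for illustration:

```python
from itertools import product

def build_expressions(first_names, relations, second_names):
    """Combine every first name, relation phrase and second name into one expression."""
    expressions = []
    for first, relation, second in product(first_names, relations, second_names):
        # Skip nonsensical pairs such as "pipo is the son of pipo".
        if first != second:
            expressions.append(f"{first} {relation} {second}")
    return expressions

names = ["pipo", "ajax", "venus", "pluto"]
relations = ["is the son of", "daughter of"]

expressions = build_expressions(names, relations, names)
print(len(expressions))
print(expressions[:3])
```

The list grows with the product of the three input sizes, so for a long name list it may be worth restricting it to the persons that actually occur in the book before building the dictionary table.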
Attached is an example workflow of the method described above.