I want to extract job names from a large number of job-offer documents with the help of a local grammar, so I have created a local grammar with more than 100 rules. I wonder what the best way is to apply these rules in KNIME. At first I thought about using the "Java Snippet (simple)" node with several regular expressions, but I don't think this is a good approach because of the large number of regular expressions.
Does anybody know an easier way to apply these rules to the job offers?
The text processing nodes in the KNIME Labs section may be the way to go, in particular the RegEx Filter node. As you have 100 or so rules, a Table Row To Variable Loop Start node together with a Loop End node would be useful.
Thank you for your response!
I had hoped there would be a way to avoid regular expressions. I think the number of regular expressions would be too big to handle.
This is my grammar: http://share-me.de/grammar.png
If there is no way, I think I will have to make the grammar much smaller.
I have three ideas for solving the problem of the large number of rules.
Is it possible in KNIME to pack local alternatives like [wir suchen|suchen wir|sucht] together as one tag like "SEARCH", or [ab sofort|schnellstmöglich] as "FAST", and then combine these tags like this to simplify applying the rules:
- SEARCH "zur verstärkung unseres Teams" FAST
- SEARCH FAST
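The tag idea above can be sketched outside KNIME to see whether it keeps the rule set manageable. This is only a minimal illustration in plain Python, not a KNIME node configuration: the names TAGS, RULES, compile_rule and find_rule_matches are hypothetical, and the two rules are the ones listed above. Each tag is expanded into a group of alternatives, and each rule becomes one compiled regular expression, so the rules themselves stay short and readable.

```python
import re

# Hypothetical tag dictionary: each tag bundles local alternatives of the grammar.
TAGS = {
    "SEARCH": ["wir suchen", "suchen wir", "sucht"],
    "FAST": ["ab sofort", "schnellstmöglich"],
}

# Rules written with tag names; parts in double quotes are literal text.
RULES = [
    'SEARCH "zur verstärkung unseres Teams" FAST',
    "SEARCH FAST",
]


def compile_rule(rule):
    """Expand tags and quoted literals into one compiled regular expression."""
    parts = []
    # Each token is either a "quoted literal" or a bare tag name.
    for literal, tag in re.findall(r'"([^"]*)"|(\S+)', rule):
        if tag:
            # Try longer alternatives first so they are not shadowed by shorter ones.
            alternatives = sorted(TAGS[tag], key=len, reverse=True)
            parts.append("(?:" + "|".join(re.escape(a) for a in alternatives) + ")")
        else:
            # Allow any run of whitespace between the words of a literal.
            parts.append(r"\s+".join(re.escape(w) for w in literal.split()))
    # Rule elements are separated by whitespace; matching ignores case.
    return re.compile(r"\s+".join(parts), re.IGNORECASE)


COMPILED = [compile_rule(rule) for rule in RULES]


def find_rule_matches(text):
    """Return every substring of text that is matched by one of the rules."""
    return [m.group(0) for rule in COMPILED for m in rule.finditer(text)]
```

With this layout, adding a new phrasing means adding one entry to a tag, not one more regular expression per rule that uses it.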
Or is it possible in KNIME to pass a document from one regular expression to the next until the job name is recognized?
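The cascade described here can be sketched as an ordered list of patterns that is tried one by one until a pattern captures a job name. Again this is only an illustration in plain Python under assumptions: the two patterns, the group name "job" and the function extract_job_name are hypothetical and far simpler than the real grammar. The assumption that a job name starts with an upper-case letter is also only a placeholder.

```python
import re

# Hypothetical patterns, ordered from most specific to most general.
# The keyword parts ignore case via (?i:...); the job name itself must
# start with an upper-case letter, so it is matched case-sensitively.
PATTERNS = [
    re.compile(
        r"(?i:wir suchen|suchen wir|sucht)\s+(?i:ab sofort|schnellstmöglich)\s+"
        r"(?:eine[nr]?\s+)?(?P<job>[A-ZÄÖÜ][\w/-]+)"
    ),
    re.compile(
        r"(?i:wir suchen|suchen wir|sucht)\s+"
        r"(?:eine[nr]?\s+)?(?P<job>[A-ZÄÖÜ][\w/-]+)"
    ),
]


def extract_job_name(text):
    """Try each pattern in order and return the first captured job name."""
    for pattern in PATTERNS:
        match = pattern.search(text)
        if match:
            return match.group("job")
    # No pattern recognized a job name in this document.
    return None
```

The order of the list matters: a specific pattern placed after a general one would never be reached for documents that the general one already matches.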
Or is it possible in KNIME to apply a type-2 (context-free) grammar to the job offers by using a specific parser?
Thank you for any advice!
As Simon suggests, you may be able to use the text processing extension in KNIME Labs. Specifically, the Dict Replacer might help you parse your grammar, but I personally don't have any experience with this. If you post some example data, I'd be willing to give it a try.
As far as I know there isn't a grammar parser, but if you have one (NooJ?) we may be able to wire it up to KNIME using a Java Snippet node.
Please keep us posted on your progress.
Hello Mr. Hart,
thank you for your response!
I have prepared and uploaded some job offers in German: http://share-me.de/job-offers.zip
It would be great if you could give it a try with the Dict Replacer and show me an example of how to use it.
NooJ lets you model a grammar in a graphical user interface and then generate the language with all its rules. Unfortunately, NooJ crashes after generating 5,000 rules when I try to do this with my grammar (http://share-me.de/grammar.png, http://share-me.de/grammar-nooj.nog). 5,000 rules are also too many to maintain, which is why I am looking for another way to use the grammar in KNIME.