Hello all. I have a function written in Python which is basically a Hebrew stemmer, and I want to take the terms or documents coming out of the "Bag of Words" node and apply the stemmer to those input features.
I have the function, which calls a local server for the stemming process:
import urllib
import pandas

def lemmatize(text):
    text_encoded = text.encode("utf8")
    params = urllib.urlencode({'text' : text_encoded})#.encode("utf8")})
    didnt_got_tags=True
    while didnt_got_tags:
        try:
            f = urllib.urlopen("HTTP://127.0.0.1:8086", params)
            words = f.readlines()
            for word in words:
                if 'HTTP ERROR' in word:
                    print 'error'
                    continue
            didnt_got_tags = False
        except:
            print 'error :('
    words = f.readlines()
    lemmas = []
    for word in words:
        if not word == '\n':
            splt = word.split('\t')
            lemmatized = remove_nums(splt[2].split('^')[-1])
            pos = tostring1(int(splt[1])).split(',')[0].split(':')[1].split('-')[0]
            lemmas.append((lemmatized, pos))
    return lemmas
I know that I need to create an additional column in the input_table, which will contain the stemmed document or term:
input_table['lemmatize'] = lemmatize(input_table['Term'])
However, I keep getting errors. Can anyone see why?
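
One possible cause, offered as a guess rather than a confirmed diagnosis: lemmatize expects a single string, while input_table['Term'] is a whole pandas column (a Series), and a Series has no .encode method. Below is a minimal sketch of calling a function once per row with Series.apply instead. The names fake_lemmatize and add_lemma_column are hypothetical, and fake_lemmatize is only a stand-in for the real server call so that the example runs without the stemming server.

```python
import pandas as pd


def fake_lemmatize(text):
    """Hypothetical stand-in for lemmatize(): takes ONE string and
    returns a list of (lemma, pos) tuples, like the real function."""
    return [(token.lower(), "UNK") for token in text.split()]


def add_lemma_column(table, source="Term", target="lemmatize"):
    """Return a copy of `table` with a new `target` column, filled by
    calling the function once for every cell of the `source` column."""
    result = table.copy()
    # Series.apply passes each cell to the function separately, so the
    # function receives a plain string rather than the whole column.
    result[target] = result[source].apply(fake_lemmatize)
    return result


# Toy data; in the real workflow input_table comes from the upstream node.
input_table = pd.DataFrame({"Term": ["Hello World", "Bag of Words"]})
output_table = add_lemma_column(input_table)
print(output_table["lemmatize"].tolist())
```

If this is the cause, swapping fake_lemmatize for the real lemmatize should be the only change needed, assuming every cell in the Term column is a plain string.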