I'm a novice at Python and am trying to migrate external scripts into my KNIME workflows.
In this specific use case I have a table of poorly standardized names along with 7 other columns containing categorical data. I wrote a Python script using the name_cleaver package that cleans up the names and stores the parsed first, middle, and last names in variables.
In my external script I simply use xlrd and xlwt to grab the name column, iterate through it cleaning up each name, and then append the data row by row to a new processed file.
The problem I'm running into in KNIME is writing the new variables to the output_table. I've attached my script so far. I'm trying to figure out how to append the new variables (firstname, lastname, middlename, mr, ms, mrs) to my output_table. I have a column with an ID variable. The names variable is stored as a Series, but my new variables are stored as single-value strings. Any help would be GREATLY appreciated!
from name_cleaver import IndividualNameCleaver
names = input_table['Name']
for x in names:
    if "Mr." in x:
        mr = 1
    else: mr = 0
    if "Ms." in x:
        ms = 1
    else: ms = 0
    if "Mrs." in x:
        mrs = 1
    else: mrs = 0
    newname = IndividualNameCleaver(str(x)).parse()
    lastname = newname.last
    middlename = newname.middle
    firstname = newname.first
I'm not familiar with the name_cleaver module, but you are not creating a new DataFrame or dictionary in which to place your new names. On each iteration you need to append the lastname, middlename and firstname values to their own columns instead of overwriting a single variable. At the end you assign the result to pyOut and it will be returned to KNIME.
Quick follow-up: What would that look like inside the iteration? Specifically, how would I populate the values/columns? The current result from the iterator is just a single string value.
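# personList, lastnameList, middlenameList and firstnameList start out as empty lists
for x in names:
    x = str(x)
    # Append exactly one title (or '') per row so every list keeps the same length
    if "Mr." in x:
        personList.append('mr')
    elif "Ms." in x:
        personList.append('ms')
    elif "Mrs." in x:
        personList.append('mrs')
    else:
        personList.append('')
    newname = IndividualNameCleaver(x).parse()
    lastnameList.append(newname.last)
    middlenameList.append(newname.middle)
    firstnameList.append(newname.first)
result = {'person': personList, 'lastname': lastnameList, 'middlename': middlenameList, 'firstname': firstnameList}
# Each dictionary key becomes a column name, each list a column
pyOut = pd.DataFrame(result)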
A few tweaks for the Python Script (Local Installation) and it worked perfectly. This gives me the roadmap I need to figure out the bigger picture. From my understanding, the script
created empty lists,
filled them with our new data, and
then fed the lists into a dictionary.
From there the dictionary was converted to a DataFrame using pandas to create the output type I needed to pass it on as a table in KNIME.
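The steps described above can be tried outside KNIME with a minimal sketch like the one below. It assumes pandas is installed; `split_name` is a hypothetical stand-in for the name_cleaver parser and the sample names are made up, so treat it as an illustration of the pattern rather than the actual script.

```python
import pandas as pd


def split_name(raw):
    """Hypothetical stand-in for the name_cleaver parser: naive whitespace split."""
    parts = raw.split()
    first = parts[0] if parts else ""
    last = parts[-1] if len(parts) > 1 else ""
    middle = " ".join(parts[1:-1])
    return first, middle, last


names = ["John Quincy Adams", "Ada Lovelace"]  # made-up sample data

# Step 1: create empty lists, one per output column.
first_names, middle_names, last_names = [], [], []

# Step 2: fill them, appending exactly one value per input row.
for raw in names:
    first, middle, last = split_name(raw)
    first_names.append(first)
    middle_names.append(middle)
    last_names.append(last)

# Step 3: feed the lists into a dictionary keyed by column name.
result = {
    "firstname": first_names,
    "middlename": middle_names,
    "lastname": last_names,
}

# Step 4: convert the dictionary to a DataFrame (keys become columns).
table = pd.DataFrame(result)
print(table)
```

The one thing to watch is that every list receives one entry per row. If a list ends up shorter than the others, the values no longer line up with the rows they came from.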
Marc, THANK YOU! You have been a tremendous help in guiding me in the right direction. Without people like you responding to content in the forums it'd be incredibly difficult to learn.
from name_cleaver import IndividualNameCleaver
import pandas as pd
# One list per output column; each gets exactly one entry per input row
personList = []
lastnameList = []
middlenameList = []
firstnameList = []
names = input_table['Name']
for x in names:
    x = str(x)
    # Append exactly one title (or '') per row so every list keeps the same length
    if "Mr." in x:
        personList.append('mr')
    elif "Ms." in x:
        personList.append('ms')
    elif "Mrs." in x:
        personList.append('mrs')
    else:
        personList.append('')
    newname = IndividualNameCleaver(x).parse()
    lastnameList.append(newname.last)
    middlenameList.append(newname.middle)
    firstnameList.append(newname.first)
result = {'person': personList, 'lastname': lastnameList, 'middlename': middlenameList, 'firstname': firstnameList}
# Each dictionary key becomes a column name, each list a column
pyOut = pd.DataFrame(result)
output_table = pyOut
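The original question also asked for the new values to be appended to the existing table, which has an ID column, with separate mr, ms and mrs flags. One possible way to do that, sketched here under assumptions, is to start from a copy of the input table and add columns to it. The small `input_table` below is made-up sample data standing in for the table KNIME provides.

```python
import pandas as pd

# Made-up stand-in for the table KNIME passes to the script as input_table.
input_table = pd.DataFrame({
    "ID": [101, 102, 103],
    "Name": ["Mr. John Smith", "Mrs. Jane Doe", "Alex Jones"],
})

# Start from a copy so the ID and every other original column are kept.
output_table = input_table.copy()

# Add one 0/1 flag column per title, built with one value per row.
for title in ("Mr.", "Ms.", "Mrs."):
    column = title.rstrip(".").lower()  # "Mr." -> "mr"
    output_table[column] = [int(title in str(name)) for name in input_table["Name"]]

print(output_table)
```

Lists such as lastnameList, middlenameList and firstnameList could be attached the same way, for example `output_table['lastname'] = lastnameList`, as long as each list has one entry per row. A plain list is assigned by position, so the row IDs of the table do not matter.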