Python Scripting Output Table

I'm a novice at Python and am trying to migrate external scripts into my KNIME workflows.

In this specific use case I have a table of poorly standardized names along with 7 other columns containing categorical data. I wrote a Python script using the name_cleaver package that cleans up the names and stores the parsed first, middle, and last names in variables.

In my external script I simply use xlrd and xlwt to grab the name column, iterate through it cleaning up each name, and then append the data row by row to a new processed file.

The problem I'm running into in KNIME is writing the new variables to the output_table. I've attached my script so far. I'm trying to figure out how to append the new variables (firstname, lastname, middlename, mr, ms, mrs) to my output_table. I have a column with an ID variable. The names variable is stored as a Series, but my new variables are stored as single-value strings. Any help would be GREATLY appreciated!

 

from name_cleaver import IndividualNameCleaver
names = input_table['Name']

for x in names:
    if "Mr." in x:
        mr=1
    else: mr=0

    if "Ms." in x:
        ms=1
    else: ms=0

    if "Mrs." in x :
        mrs=1
    else: mrs=0
    newname = IndividualNameCleaver(str(x)).parse()
    lastname= newname.last
    middlename= newname.middle
    firstname= newname.first

 

Hi,

I'm not familiar with the name_cleaver module, but you are not creating a new DataFrame or dictionary to hold your new names. At each iteration you need to add the lastname, middlename and firstname values to it, so that the columns build up row by row. At the end, assign the result to pyOut and it will be returned to KNIME.
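As a rough sketch of that idea, not tied to name_cleaver: each pass through the loop can build one dictionary of new values, and the collected rows become a DataFrame at the end. The split_name helper and the sample names below are made up for illustration only; in KNIME the resulting frame would be assigned to the node's output variable.

```python
import pandas as pd

def split_name(raw):
    # Hypothetical stand-in for a real name parser:
    # first token is treated as the first name, last token as the last name.
    parts = raw.split()
    return {"firstname": parts[0], "lastname": parts[-1]}

names = pd.Series(["John Smith", "Mary Ann Jones"])  # made-up sample data

rows = []
for raw in names:
    # One dictionary per input row, so the output has one row per name.
    rows.append(split_name(raw))

# A list of dictionaries becomes a table: keys are columns, each dict is a row.
new_columns = pd.DataFrame(rows, index=names.index)
```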

 

good luck,

Marc

Thank you for a fast response. 

Quick follow-up: What would that look like inside the iteration? Specifically, how would I populate the values/columns? The current result from the iterator is just a single string value.

Please post your table, so I don't have to guess and generate it.

Marc

Dataset is attached. 

This works for me, please try.

Bye

from name_cleaver import IndividualNameCleaver
import pandas as pd


personList = []
lastnameList = []
middlenameList = []
firstnameList = []
result = {}

names = kIn['Name']

for x in names:
    # Append exactly one entry per row so every list ends up the same length
    if "Mr." in x:
        personList.append('mr')
    elif "Ms." in x:
        personList.append('ms')
    elif "Mrs." in x:
        personList.append('mrs')
    else:
        personList.append('')
    newname = IndividualNameCleaver(str(x)).parse()
    lastnameList.append(newname.last)
    middlenameList.append(newname.middle)
    firstnameList.append(newname.first)
result = {'person': personList, 'lastname': lastnameList, 'middlename': middlenameList, 'firstname': firstnameList}
# Each dictionary key becomes a column; list the columns explicitly to fix their order
pyOut = pd.DataFrame(result, columns=['person', 'lastname', 'middlename', 'firstname'])

A few tweaks for the Python Script (Local Installation) node and it worked perfectly. This gives me the roadmap I need to figure out the bigger picture. As I understand it, the script

  1. created empty lists,
  2. filled them with our new data, and
  3. then fed the lists into a dictionary.
  4. From there the dictionary was converted to a DataFrame using pandas to create the output type I needed to pass it on as a table in Knime.
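
For anyone following along, those four steps can be sketched on a toy example that runs outside KNIME. This is only an illustration: the sample names and the column names (title, last_word) are made up, and a plain split() stands in for the name_cleaver parsing.

```python
import pandas as pd

raw_names = ["Mr. John Smith", "Ms. Jane Doe", "Pat Lee"]  # made-up sample data

# 1. Create empty lists, one per output column.
title_list = []
last_word_list = []

# 2. Fill them, appending exactly one value per input row.
for raw in raw_names:
    if "Mr." in raw:
        title_list.append("mr")
    elif "Ms." in raw:
        title_list.append("ms")
    else:
        title_list.append("")
    last_word_list.append(raw.split()[-1])

# 3. Feed the lists into a dictionary keyed by column name.
result = {"title": title_list, "last_word": last_word_list}

# 4. Convert the dictionary to a DataFrame (one column per key).
table = pd.DataFrame(result)
```

Note that building a DataFrame from a dictionary of lists requires all lists to have the same length, which is why step 2 appends a value on every pass, even when no title is found.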

Looks like I need to read up on the pandas docs, specifically the parts on DataFrames.

Marc, THANK YOU! You have been a tremendous help in guiding me in the right direction. Without people like you responding to content in the forums it'd be incredibly difficult to learn. 

 

from name_cleaver import IndividualNameCleaver
import pandas as pd

personList = []
lastnameList = []
middlenameList = []
firstnameList = []
result = {}

names = input_table['Name']

for x in names:
    # Append exactly one entry per row so every list ends up the same length
    if "Mr." in x:
        personList.append('mr')
    elif "Ms." in x:
        personList.append('ms')
    elif "Mrs." in x:
        personList.append('mrs')
    else:
        personList.append('')
    newname = IndividualNameCleaver(str(x)).parse()
    lastnameList.append(newname.last)
    middlenameList.append(newname.middle)
    firstnameList.append(newname.first)
result = {'person': personList, 'lastname': lastnameList, 'middlename': middlenameList, 'firstname': firstnameList}
# Each dictionary key becomes a column; list the columns explicitly to fix their order
pyOut = pd.DataFrame(result, columns=['person', 'lastname', 'middlename', 'firstname'])
output_table = pyOut
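
If the ID and the other original columns should be carried through to the output, one possible variation is to copy the input table and add the new values as extra columns, with 0/1 flags for the titles as in the first script. This is only a sketch under assumptions: the small input_table below is made-up stand-in data, and a plain split() stands in for the name_cleaver parsing.

```python
import pandas as pd

# Made-up stand-in for the table KNIME passes in as input_table.
input_table = pd.DataFrame({
    "ID": [101, 102, 103],
    "Name": ["Mr. John Smith", "Mrs. Ann Jones", "Pat Lee"],
})

# Start from a copy so the ID and the other original columns are kept.
output_table = input_table.copy()

# One 0/1 indicator column per title; regex=False matches the dot literally.
for title in ("Mr.", "Ms.", "Mrs."):
    column = title.rstrip(".").lower()  # 'mr', 'ms', 'mrs'
    output_table[column] = (
        output_table["Name"].astype(str).str.contains(title, regex=False).astype(int)
    )

# Placeholder for the parsed last names; a list with one entry per row
# can be assigned directly as a new column.
lastname_list = [str(n).split()[-1] for n in input_table["Name"]]
output_table["lastname"] = lastname_list
```

Because the new columns are added to a copy of the input, each parsed value stays on the same row as its ID.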