Splitting an address

Reg · September 21, 2020, 8:08pm

Hi,

I’m attempting to split a cell with US mailing addresses into 3 columns: City, Street, State, Zip. Every single item in the string is separated by a space.

All states are 2 letter abbreviations (NJ, CA, ND), and all zips are 5 digits (10001, etc.). They always represent the final 8 digits of the text (state, space, zip).

I’m able to split the first cell of the address with the space (between number and street) using the cell splitter, but the variability in the second group (which ranges from 12 characters to 31) causes issues on the State/Zip extraction.

Example (12 characters vs. 31)

Ada OK 74820
to
Washington Court House OH 43160

The result should be 3 columns, City/State/Zip.

Thanks in advance for your help!

j_ochoada · September 21, 2020, 9:15pm

Hi @Reg,

Welcome to the forum!

I put this together based on your assumptions of the Zip and State being standard. Basically it splits those off and then treats the rest as the address. It’s very rudimentary… I’m sure there is someone who can chime in with a clever Regular expression which will do what you need in one node. I searched and tried some but couldn’t get it.

Hope this helps
Jasonparse_address.knwf (37.9 KB)

Reg · September 21, 2020, 9:20pm

Thanks. I tried to view the file, but I’m getting some errors and it loads blank. The error message =

parse_address-2.knwf 4 loaded with errors
parse_address-2.knwf 4
Column Auto Type Cast 4:401
Loading model settings failed: Config for key “column-filter” not found.
State has changed from CONFIGURED to IDLE
State has changed from CONFIGURED to IDLE

j_ochoada · September 21, 2020, 9:23pm

Hi @Reg,

Sorry about that. I put it together in another workflow I was working in and then copy pasted into new workflow. I think if you change your view to 50% you will be able to see it. Then you can ctl A to select all and you can move it to the upper left corner of the workbench.

Hope this helps
Jason

Reg · September 21, 2020, 9:29pm

Got it! Thanks again!

ipazin · September 22, 2020, 8:46am

Hello @Reg,

welcome to KNIME Community!

You can try following approach. Use reverse() function from String Manipulation node followed by Cell Splitter By Position as state and zip are of fixed length. Then use String Manipulation (Multi Column) node to reverse and strip newly created columns. Here is workflow example:
2020_09_22_Address_Manipulation.knwf (14.5 KB)

Regex can be probably used as well…

Br,
Ivan

Reg · September 22, 2020, 1:10pm

Ipanzin…Thank you for the workflow. I had considered the reversal but was stuck on the management of the city names (from 1 to 4 names). Your flow addressed that. I had “solved” the issue thanks to j_ochoada, but the reversal eliminates 2/3rds of the nodes and is definitely much faster. Cheers!

Daniel_Weikert · September 22, 2020, 6:05pm

Here is the regex way
regex

Reg · September 22, 2020, 7:56pm

I’m going to have to dig into the regex capabilities. Thanks!

j_ochoada · September 28, 2020, 8:07pm

@Daniel_Weikert very cool! Thanks !!

system · October 5, 2020, 8:07pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.