Python output to a dataframe

Hi,

Let me caveat this post by saying that, while I know Python exists and there are many things I would like to do with Python in Knime (especially now that most of my projects include a big geospatial component), my Python expertise ends there. :confused:

I have a Python script that does a spatial join between two shapefiles and works in Jupyter Notebook. I think I have most of it working in the Knime Python Script node, but I keep getting this error:

ERROR Python Script 3:2 Execute failed: No serializer extension having the id or processing python type “shapely.geometry.point.Point” could be found. Unsupported column type in column: “geometry”, column type: “<class ‘shapely.geometry.point.Point’>”.

This is the code from the node:

```python
# Imports
import pandas as pd
import geopandas as gpd

# Points Shapefile filepath
traders_fp = "D:/Users/TC/Projects/Flash/Trader Profile/Active Trader Lists/2022-06/Test/Traders_2022-06_PyTest.shp"

# Read points shapefile
traders = gpd.read_file(traders_fp)

# Set CRS (overrides the CRS label only; it does not reproject the coordinates)
traders = traders.set_crs('epsg:4326', allow_override=True)

# Polygon Shapefile filepath
areas_fp = "D:/Users/TC/Projects/Geo Data/Areas_20220608/Areas_20220608.shp"

# Read polygon shapefile
areas = gpd.read_file(areas_fp)

# Set CRS (overrides the CRS label only; it does not reproject the coordinates)
areas = areas.set_crs('epsg:4326', allow_override=True)

# Spatial join
join = gpd.sjoin(traders, areas, how="inner", predicate="intersects")

# Output to dataframe
output_table_1 = pd.DataFrame(join)
```

When I execute the script in the node's configuration dialog I get no errors, I think because it does not need to create the output table there, but as soon as I run it as part of the workflow I get the error. Could it be happening when the node creates the data frame for the output table? It works in Jupyter Notebook!
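To see what the node actually receives, here is a minimal sketch with made-up in-memory data standing in for the two shapefiles (the trader/area names and coordinates are hypothetical). It runs the same `gpd.sjoin` call and shows that after `pd.DataFrame(join)` the `geometry` column still holds shapely `Point` objects, which is the type named in the error message. Note that `predicate=` assumes geopandas 0.10 or later; older releases called this argument `op=`.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point, Polygon

# Hypothetical stand-ins for the points and polygon shapefiles
traders = gpd.GeoDataFrame(
    {"trader": ["A", "B", "C"]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(5, 5)],
    crs="EPSG:4326",
)
areas = gpd.GeoDataFrame(
    {"area": ["west", "east"]},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(1, 0), (2, 0), (2, 1), (1, 1)]),
    ],
    crs="EPSG:4326",
)

# Same call as in the node: trader C lies in no area, so the inner join drops it
join = gpd.sjoin(traders, areas, how="inner", predicate="intersects")

# Wrapping in pd.DataFrame keeps the geometry column and its shapely objects
df = pd.DataFrame(join)
print(df["geometry"].dtype)          # the geopandas "geometry" dtype
print(type(df["geometry"].iloc[0]))  # a shapely Point class, as in the error
```

So the conversion to a KNIME table fails on those cell values, not on the join itself.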

I would really appreciate any help getting this to work. It will save hours going backwards and forwards between Knime and QGIS doing spatial joins.

Thanks…

tC/.

@TigerCole I think KNIME does not have a data type for these geometry objects. If you do not need the geo data *within* KNIME, would it be an option to just store the file (maybe save the path/name) and then reuse it later in a Python node?

Two additional remarks. First, you can use Jupyter notebooks with KNIME and also transfer data between them.

(Also check out the additional links in the sample workflow.)

Second, keep in mind that the new Python nodes make use of the primary keys in a special way.

Hi @mlauber71 …

I will try your suggestion of writing the data frame out to a file from the node and then reading it back into the workflow.

The geodata is WKT, like “POINT (12.1233 34.1234)”, which I use often in Knime and which can be written out as text, so I don't see why creating a data frame would fail (if that is indeed the problem). It seems strange that the data cannot be put into a Python data frame and used as the node output.
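For reference, shapely already produces WKT when a geometry is turned into a string, and `shapely.wkt.loads` parses that text back into a geometry. In standard WKT the x and y of a point are separated by a space, not a comma. A small sketch with the coordinates mentioned above:

```python
from shapely import wkt
from shapely.geometry import Point

pt = Point(12.1233, 34.1234)

# str() and .wkt both give the WKT text of the geometry
text = pt.wkt
print(text)
print(str(pt) == text)

# Parse the WKT text back into a shapely geometry
back = wkt.loads(text)
print(back.equals_exact(pt, 1e-9))
```

The text form is something KNIME can hold; the shapely object itself is what the node cannot serialize.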

Maybe KNIME 4.6, which is due out tomorrow, will handle this better.

tC/.

@TigerCole you could always convert your data to a string and then it should work in KNIME. You would have to do that in Python.
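A minimal sketch of doing that conversion in Python, assuming geopandas 0.9 or later (which added `GeoDataFrame.to_wkt` and `GeoSeries.from_wkt`). `to_wkt()` returns a plain pandas DataFrame with every geometry column encoded as WKT text, and a later Python node can rebuild the geometries from that text. The two-row table is made up. The CRS is not part of the WKT strings, so it has to be set again on the way back (EPSG:4326 here, as in the original script).

```python
import geopandas as gpd
from shapely.geometry import Point

gdf = gpd.GeoDataFrame(
    {"trader": ["A", "B"]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5)],
    crs="EPSG:4326",
)

# Out of the node: geometry columns become WKT strings in a plain DataFrame
out_df = gdf.to_wkt()

# In a later Python node: rebuild the geometries from the WKT strings
restored = gpd.GeoDataFrame(
    out_df.drop(columns="geometry"),
    geometry=gpd.GeoSeries.from_wkt(out_df["geometry"]),
    crs="EPSG:4326",  # the CRS is not stored in WKT, so set it again
)
print(out_df["geometry"].tolist())
```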


Hi @mlauber71 …

It seems Knime minds think alike… that is exactly what I just did.

The problem was the column with dtype=geometry, so I converted it to a string and then I could output the data frame.

Here is the updated python:

```python
import pandas as pd # Need this for df
import geopandas as gpd # Need this for the spatial join

# Point shapefile filepath
traders_fp = "D:/Users/Russell/Projects/Trader Lists/2022-06/Test/Traders_2022-06_PyTest.shp"

# Read point shapefile and set CRS
traders = gpd.read_file(traders_fp)
traders = traders.set_crs('epsg:4326', allow_override=True)

# Areas shapefile filepath
areas_fp = "D:/Users/Russell/Projects/Areas_20220608/Areas_20220608.shp"

# Read areas shapefile and set CRS
areas = gpd.read_file(areas_fp)
areas = areas.set_crs('epsg:4326', allow_override=True)

# Spatial join
# how: "inner", ...
# predicate: "intersects", "within", "contains"
join = gpd.sjoin(traders, areas, how="inner", predicate="intersects")

# Create Dataframe and convert the geometry column to (WKT) strings
dfjoin = pd.DataFrame(join)
dfjoin['geometry'] = dfjoin['geometry'].astype(str)

# Output to dataframe
output_table_1 = pd.DataFrame(dfjoin)
```
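The `astype(str)` line handles the single `geometry` column. A slightly more general sketch (the helper name `geometries_to_wkt` is hypothetical, not a KNIME or geopandas API) converts every column that holds shapely geometries, whether it still has the geopandas `geometry` dtype or is a plain object column, and leaves missing geometries as missing values.

```python
import pandas as pd
from geopandas.array import GeometryDtype
from shapely.geometry.base import BaseGeometry


def _to_wkt_or_none(value):
    # Shapely geometries become WKT text; anything else (None/NaN) becomes None
    return value.wkt if isinstance(value, BaseGeometry) else None


def geometries_to_wkt(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: copy of df with every geometry column as WKT strings."""
    out = pd.DataFrame(df).copy()  # plain pandas DataFrame, even for a GeoDataFrame
    for col in out.columns:
        values = out[col]
        if isinstance(values.dtype, GeometryDtype):
            is_geometry = True
        elif pd.api.types.is_object_dtype(values.dtype):
            # Object columns can also carry shapely objects
            is_geometry = values.map(lambda v: isinstance(v, BaseGeometry)).any()
        else:
            is_geometry = False
        if is_geometry:
            out[col] = values.astype(object).map(_to_wkt_or_none)
    return out
```

In the node this would replace the last three lines, for example `output_table_1 = geometries_to_wkt(join)`.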

As always, thank you for your comments and suggestions. They are really appreciated.

tC/.


Hi @TigerCole,
Thanks for trying the Python Scripting nodes with geospatial data! And thanks @mlauber71 for the good suggestions :slight_smile: !

You are right, we currently do not support the shapely data type (or WKT in general) in KNIME. So right now the workaround is to convert this data to a string, so that when we parse the pandas DataFrame and convert it to a KNIME table we can use a type known to KNIME.

That being said, just a brief teaser here: a geospatial extension is in the making, and support will be coming to KNIME and to Python in KNIME soon :wink:
Best, Carsten


Hi @carstenhaubold …

The promise of a geospatial extension is very exciting… a lot of the work I am doing in Knime now has a geospatial component which often means flipping data between Knime and QGIS.

Awesome!

tC/.

