Assign different location information to a location

Hello,

Is it possible for KNIME to recognize different location designations and assign them to the respective countries/states/cities? I have a dataset that contains more than 50,000 different location names. For further analysis, I would like to shrink the dataset and, for example, only look at the data coming from the USA.

Currently, I could use the Rule Engine to filter out the typical spellings of the states and the country. However, covering all the states manually would be too much work.
The new Geospatial Analytics features have not helped me yet either, as I have neither longitude nor latitude data.

Locations excel 2.xlsx (1.8 MB)

Best,
Hendrik

Hello @hennemac ,

I have opened your file and noticed there’s a lot of varied data. One suggestion I could come up with is using API services. Are you familiar with them?

Here is a small workflow to give you an idea of what I had in mind:

Once you get the standardised address, you’ll find a column called “country” you could use for filtering.

Hope it helps.

Have a nice day,
Raffaello

Hey @lelloba ,

thanks for the reply! I’m not really familiar with API services yet, but I’ll try my best with the provided workflow :slight_smile:

Thanks again

Hendrik

Let me know if you need some help! :slight_smile:

Have a nice evening,
Raffaello

Hey @lelloba ,

hope you had a great weekend. Unfortunately I couldn’t make any progress with my data using the API services. If you could give me some help I’d really appreciate it :slight_smile:

Have a nice day,

Hendrik

You could give the “Main Location Extractor” from Palladian a try:

Alternatively there’s also a couple of geocoders available. Check the “Geo” section in Palladian:

All of these will map the textual information to a latitude/longitude pair.

Let me know whether that works!

Hello @hennemac ,

here is a little workflow that could help you.

I used Here services. Create an account and an API key, then paste it into the Variable Expressions node.


https://platform.here.com/sign-up?step=verify-identity

Make sure to remove the Partitioning node (it selects 100 random rows just for testing).

The output looks like this:

Not perfect, but might be a good starting point.
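
For readers who want to see the idea outside KNIME, here is a minimal Python sketch of the same approach: build one geocoding request per location string, read the country from the response, and filter on it. The endpoint and the response fields (`items`, `address`, `countryCode`) are assumptions based on the public HERE Geocoding & Search API v1 and may not match the service the workflow actually calls; the function names are made up for illustration.

```python
from typing import List, Optional
from urllib.parse import urlencode

# Assumption: HERE Geocoding & Search API v1 endpoint. The workflow in this
# thread may use a different HERE service.
GEOCODE_URL = "https://geocode.search.hereapi.com/v1/geocode"


def build_geocode_url(query: str, api_key: str) -> str:
    """Return the request URL for one free-text location string."""
    return f"{GEOCODE_URL}?{urlencode({'q': query, 'apiKey': api_key})}"


def extract_country(response: dict) -> Optional[str]:
    """Return the country code of the first match, or None if nothing matched.

    Assumes the response looks like {"items": [{"address": {"countryCode": ...}}]}.
    """
    items = response.get("items", [])
    if not items:
        return None
    return items[0].get("address", {}).get("countryCode")


def keep_country(rows: List[dict], country_code: str) -> List[dict]:
    """Keep only the rows whose 'country' field equals the given code."""
    return [row for row in rows if row.get("country") == country_code]
```

The actual HTTP call is left out on purpose; in the workflow it is done by the request node, and the filter step corresponds to a Row Filter on the “country” column.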

If you need help, tell me :slight_smile:

Have a nice day,
Raffaello

Thank you so much! I will try it out right away :smiley:

How do I get the API Key? I signed up on Here… Do I need to select a specific service on their site?

@qqilihq Thanks for your solution. I would love to try this solution, but since installing NodePit, Knime crashes right after startup. Can I uninstall the extension somehow? :sweat_smile:

Looks like there’s an issue on some installations between the recently released KNIME 4.7 and the NodePit view. We’ll look into that and provide a fix. Apologies for the inconvenience!

To better reproduce this, may I ask what OS and version you’re running?

In the meantime, you can disable the NodePit view:

  1. In your KNIME workspace directory, create a file at the following location: knime-workspace/.metadata/.plugins/org.eclipse.core.runtime/.settings/com.nodepit.knime.plugin.prefs

  2. Put the following content into the file:

    eclipse.preferences.version=1
    lastNodePitVersionActivated=2.7.0.202112111436
    nodePitUrl=http\://localhost
    
    
  3. Start KNIME and after KNIME has started, just close the NodePit tab for now.

You can still install plugins and extensions from NodePit (such as Palladian) without the “NodePit View” by adding the following update site URL to your preferences: https://download.nodepit.com/4.7

Hope that helps!

I’m running macOS 13.0.1.

Already managed to reopen Knime by deleting the “com.nodepit.knime.plugin_2.7.0.202112111436”-file. The extensions are supposed to be installed, but Palladian for example does not show up :thinking:

Are these steps the same on macOS?

  1. In your KNIME workspace directory, create a file at the following location: knime-workspace/.metadata/.plugins/org.eclipse.core.runtime/.settings/com.nodepit.knime.plugin.prefs
  2. Put the following content into the file:

    eclipse.preferences.version=1
    lastNodePitVersionActivated=2.7.0.202112111436
    nodePitUrl=http\://localhost

I’d suggest running KNIME with the “-clean” parameter once. See here for the steps:

This only needs to be done once, and hopefully Palladian will then show up in the NodePit category. Fingers crossed :crossed_fingers:

@qqilihq Unfortunately it’s not showing up. Should I reinstall Knime completely?

Edit: Suddenly the Nodes show up. Fingers crossed that the Nodes can help :slight_smile:

@qqilihq The Nodes all rely on the GeoBase Source or a Local Location Source… With the GeoBase Source my quota was used up within seconds. My whole dataset contains about 1.4 million location entries. With the daily limit of 20,000 requests this will probably take forever :sweat_smile: Or did I overlook something?

If you have that much data, the Local Location Source definitely makes sense; it allows you to self-host the database on your machine:

This functionality is part of the commercial offering. You can get in touch at mail@palladian.ai for more information.

Follow this link with all instructions: :slight_smile:

Keep in mind that this solution has limits: 5 requests per second and something like 250k requests per month.
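
To put those numbers in perspective: 1.4 million requests at roughly 250k per month would take about 5.6 months, and at 5 requests per second the calls alone would run for about 78 hours. If many rows repeat the same location string (an assumption about the data, not something confirmed in this thread), it pays to look up each distinct string only once and join the result back. A minimal Python sketch of that idea, with a simple throttle; `geocode_unique` and its parameters are made-up names, and `geocode` stands for whatever function performs the actual request:

```python
import time
from typing import Callable, Dict, Iterable, Optional


def geocode_unique(
    locations: Iterable[str],
    geocode: Callable[[str], Optional[str]],
    max_per_second: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Optional[str]]:
    """Call `geocode` once per distinct location, spaced to respect the rate limit."""
    min_interval = 1.0 / max_per_second
    results: Dict[str, Optional[str]] = {}
    last_call: Optional[float] = None
    for name in locations:
        key = name.strip()
        if key in results:
            continue  # already looked up, no request needed
        if last_call is not None:
            # Wait until at least min_interval has passed since the last request.
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        results[key] = geocode(key)
    return results
```

The returned dictionary can then be used as a lookup table for the full dataset. In KNIME terms this corresponds to a GroupBy on the location column before the request loop and a Joiner afterwards.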

Raffaello

Thanks!

I’ll see how far I can get before I hit the limit :smiling_face:

I get the following error: ERROR Loop End 3:548 Execute failed: Input table’s structure differs from reference (first iteration) table: Column 29 [body (Binary object)] vs. [body (JSON)]. This always happens after a few minutes of runtime. I can’t imagine that it’s already due to reaching the limit of 250k.
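
The message says that in some later iteration the body column arrived as a binary object, while the first iteration delivered JSON. One possible cause, not confirmed in this thread, is that a request returned something other than JSON (for example an error reply), so the column type changed. The usual remedy is to give every iteration the same shape regardless of what came back. A minimal Python sketch of that idea, where `parse_body` is a made-up helper name:

```python
import json
from typing import Any, Optional, Tuple


def parse_body(status: int, body: bytes) -> Tuple[Optional[Any], Optional[str]]:
    """Return (parsed_json, None) on success, or (None, error_text) otherwise.

    Every call yields the same two fields, so downstream code never sees a
    different structure for failed requests.
    """
    if status != 200:
        return None, f"HTTP {status}"
    try:
        return json.loads(body.decode("utf-8")), None
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        return None, f"not JSON: {exc.__class__.__name__}"
```

Rows with an error text can be collected and retried later instead of stopping the whole loop.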

Back to this: Thanks for the bug report :pray: We fixed this in the NodePit for KNIME release 2.7.1 available since this afternoon.

–Philipp
