Can anyone explain how to use a GET request to obtain data from the web? I am trying to obtain local postal codes by street names. I have two different large data sets (over a million rows). One data set already has a postal code column, the other has street names.
How do I use a GET request to find the postal code for street names in a column?
Is it smart to break up the data to a smaller size through sampling or partitioning? Or can I get the information for the entire dataset, and how quickly can it be processed?
Should I use any specific request/response headers?
Below is the workflow that I am currently working through.
hi
You enter the URL into the GET Request node and it returns the content of the page if the request was successful (status code 200).
For geo data you probably want to query an API (e.g. from Google Maps). You might want to have a look at that to get started.
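For readers who want to see the same idea outside of KNIME, here is a minimal Python sketch of what a GET request does: send a URL, get back a status code and the page content. The function name `get_page` and the `opener` parameter are made up for illustration (the parameter only exists so the function can be tried without a real network call); this is not how the KNIME node is implemented.

```python
import urllib.error
import urllib.request


def get_page(url, opener=urllib.request.urlopen, timeout=10):
    """Send a GET request and return (status_code, body_text).

    `opener` is a hypothetical hook for this sketch; by default it is
    the standard library's urlopen.
    """
    try:
        with opener(url, timeout=timeout) as response:
            # Decode the raw bytes; replace undecodable bytes instead of failing.
            body = response.read().decode("utf-8", errors="replace")
            return response.status, body
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx answers; report the code.
        return err.code, ""
```

Calling `get_page("https://example.com")` should return 200 and the HTML of the page when the request succeeds. An API is queried the same way; only the URL and the format of the returned content differ.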
br
Hi @MichaelEkine and welcome to the Knime Community.
Based on your questions, it is not clear if you fully understand what you are asking, and that makes it hard to answer them.
Take this question for example:
That’s like asking “How do I use the street to get to the supermarket?”. Well, you can go on foot or in a vehicle.
So in your case, you need either an api or a website and you enter their url in the Get Request node like @Daniel_Weikert mentioned.
It depends on how many customers the supermarket can take and how fast they can serve the customers.
In your case, it depends on how many requests the api/website allows you to send in a period of time, and how fast it can process your request. Generally speaking, you should send in batches (Chunk Loop) with some delays (5 seconds) in between (Wait node).
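As a rough illustration of the "batches with a delay" idea (what a Chunk Loop plus a Wait node does in a workflow), here is a small Python sketch. The names `chunked`, `process_in_batches` and `lookup` are hypothetical, and the batch size and 5 second delay are only example values; the right numbers depend on what the API allows.

```python
import time


def chunked(rows, size):
    """Yield consecutive slices of `rows`, each with at most `size` items."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def process_in_batches(rows, lookup, batch_size=100, delay_seconds=5.0,
                       sleep=time.sleep):
    """Apply `lookup` to every row, pausing between batches.

    `lookup` stands in for whatever sends one request per row.
    `sleep` is injectable so the pause can be skipped when experimenting.
    """
    results = []
    for index, batch in enumerate(chunked(rows, batch_size)):
        if index > 0:
            # Wait between batches, but not before the first one.
            sleep(delay_seconds)
        results.extend(lookup(row) for row in batch)
    return results
```

With 5 rows and `batch_size=2`, this makes 3 batches and pauses twice.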
This is specific to the supermarket. It’s like asking “can I wear sandals?” or “can I bring my pet?”
So in your case, it depends on what the api says. They would specify this in their documentation.
@bruno29a I really like your metaphor. Well explained! @MichaelEkine Bruno mentioned a great tip at the end. The best way to figure out what you need is to go to the API documentation directly. The doc usually tells you exactly what you need (headers, params,…)
br and take care
As you noted, I am new to the KNIME community. I found out about KNIME through a school course. I have a project and I am trying to build capacity and understand how to use KNIME to perform analysis. I have no technical background from a coding or programmer perspective. I do have some academic experience with statistics. So when I ask a question, it's because I have an idea of what I am trying to do.
I tried @Daniel_Weikert suggestion with regards to using google API’s and i have spent time trying to learn and understand the technicalities behind using API’s to get my answer. I have two sets of data:
a) covid data that has locations based on “FSA Column” see below and take note of row size
b) parking ticket data based on “location2 column” see below and again take note of row size
So I needed to find a way for the locations to have postal codes, so I can have a source of commonality between the two datasets for analysis.
With regards to this, my initial attempts to use the APIs for the parking tags data set seemed to take over 8hrs. I literally saw the percentage bar not move in the space of 8hrs (I started in the evening and by next morning there were errors). Hence my questions.
used another one to replace the constant value with the values in the "FSA" column and get rid of the spaces
This is the error when i run the get request
This is the sample of the internet and result i am trying to get
I am honestly trying to learn. I enjoy the learning but i do have a deadline on the project and i want to do a good job.
I would appreciate any tips on getting a successful result.
Thanks
The questions that you asked were not exclusively related to Knime, so whether you are new to Knime or not, they are more related to web requests, and it sounded like you were not asking the right questions. That is why I tried to come up with an analogy where you could relate to the kind of questions you were asking, thinking that it could help you understand.
I tried using the analogy, and also answer your questions at the same time, hence the “So in your case” parts.
I am sorry if they were not helpful, and that you felt they were heavy handed.
If you don’t mind one last piece of advice from me (last one because I don’t want conflicts), do not share your api key. It appears in the screenshots. Any requests that use that api key will be attached to your account, and if someone has bad intentions, they can abuse google using that key, and YOU will get in trouble, not them.
While you did try to obfuscate the key, it’s visible in the Browser Result.
EDIT: I went back and read carefully all that you wrote. Your overall approach is very good actually.
Just some tips:
1. Using generic address and Replace: Good approach, but instead of using a generic address (3040 silverthorn…), you can use a placeholder, like “##ADDRESS##”, so your URL template would be something like https://maps.googleapis.com/maps/api/geocode/json?address=##ADDRESS##&region=CO&key=<your_key>, and then use the placeholder for the replace: replace($FSA$, "##ADDRESS##", $location2$)
2. Replacing space with % in the url: There is a urlEncode() function that you can use that should do the trick for you. You can even do it in the same replace statement above from #1: replace($FSA$, "##ADDRESS##", urlEncode($location2$))
3. Processing the result: The result that you are getting is in JSON format. You can look into JSON Path, which can help you extract specific values from JSON data.
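The three tips above can be sketched in Python as follows, purely as an illustration of the logic (in KNIME you would do this with the replace/urlEncode expression and a JSON Path node). The helper names `build_url` and `extract_postal_code`, the `##KEY##` placeholder, and the sample response are made up for this sketch; the sample only mimics the general shape of a geocoding result (a `results` list whose entries contain `address_components` with `types` and `long_name`), so check the API documentation for the real format.

```python
import json
from urllib.parse import quote_plus

# URL template with placeholders, following the pattern from the post.
URL_TEMPLATE = (
    "https://maps.googleapis.com/maps/api/geocode/json"
    "?address=##ADDRESS##&region=CO&key=##KEY##"
)

# Made-up response, for illustration only (not real data).
SAMPLE_RESPONSE = """
{"results": [{"address_components": [
    {"long_name": "3040", "short_name": "3040", "types": ["street_number"]},
    {"long_name": "A1A 1A1", "short_name": "A1A 1A1", "types": ["postal_code"]}
]}], "status": "OK"}
"""


def build_url(address, api_key):
    """Fill the placeholders, URL-encoding each inserted value."""
    return (URL_TEMPLATE
            .replace("##ADDRESS##", quote_plus(address))
            .replace("##KEY##", quote_plus(api_key)))


def extract_postal_code(response_text):
    """Return the first postal_code component found, or None."""
    data = json.loads(response_text)
    for result in data.get("results", []):
        for component in result.get("address_components", []):
            if "postal_code" in component.get("types", []):
                return component.get("long_name")
    return None
```

Here `build_url("3040 silverthorn", "K")` gives a URL containing `address=3040+silverthorn`, and `extract_postal_code(SAMPLE_RESPONSE)` returns the made-up value `"A1A 1A1"`.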
Hi @bruno29a
I appreciate the apology. I didn't take offense. Thank you for helping out.
I was using the GET Request node and I didn't know if I was using the node correctly.
I honestly tried to see it from your perspective and hence I provided as much detail as I could. It made me review the youtube video multiple times in order to attempt to understand the node and provide details to the exam.
Thanks for the advice on the key. It's not my key, it's an example I saw online.
I will attempt your suggestions and let you know as soon as possible.
FYI, the encoding for a space is actually %20, not just %, so you should replace by “%20”.
urlencoding would actually replace with a “+”, which should also work. Both “+” and “%20” are acceptable.
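To see the two encodings side by side, here is a small Python sketch using the standard library (this is not the KNIME urlEncode() function, just the same idea). The address is the example value from earlier in the thread.

```python
from urllib.parse import quote, quote_plus, unquote_plus

address = "3040 silverthorn"

percent_form = quote(address)    # space becomes %20
plus_form = quote_plus(address)  # space becomes +

print(percent_form)  # 3040%20silverthorn
print(plus_form)     # 3040+silverthorn

# Both forms decode back to the same text when read as a query value.
print(unquote_plus(percent_form) == unquote_plus(plus_form) == address)  # True
```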
One last thing that I forgot to include in my previous post is about “break up data to smaller size”. As I “explained”, this is more about how many requests the Google API allows you to send per “X” amount of seconds/minutes.
The reason why they put a limit is because you could overload them, and also for them to protect themselves against such attacks. They would usually specify how many requests you can send for a period of time.
The good thing here is that Knime’s Get Request has the options to set that up, so you don’t have to manually implement that (though you still could, through a Chunk Loop).
That’s where the Delay and Concurrency come in:
You can specify how much delay you want between the requests. You specify how many milliseconds (1 second = 1000 milliseconds, so 5 seconds = 5000 milliseconds) you want to wait, and Concurrency is how many requests you want to send at a time.
So, let’s say google says you can send 1000 requests at a time, and you have to wait 3 seconds in between, you would set the options like this: Delay 3000 (milliseconds) and Concurrency 1000.
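To get a feel for what such numbers mean for a million rows, here is a back-of-the-envelope estimate in Python. It assumes a simplified model (each batch of `concurrency` requests is sent at once, followed by one delay before the next batch) and ignores how long the API takes to answer, so treat the result as a lower bound. The function name `estimated_wait_seconds` is made up for this sketch, and the real limits are whatever the API documentation says.

```python
import math


def estimated_wait_seconds(total_rows, concurrency, delay_ms):
    """Total time spent waiting between batches, in seconds.

    Simplified model: one delay between consecutive batches,
    no delay after the last one, response time not included.
    """
    batches = math.ceil(total_rows / concurrency)
    return max(batches - 1, 0) * delay_ms / 1000


# Example numbers from the post: 1000 requests at a time, 3 seconds apart.
wait = estimated_wait_seconds(1_000_000, 1000, 3000)
print(wait)       # 2997.0 seconds
print(wait / 60)  # roughly 50 minutes
```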