I am trying to extract store information from a store locator so that I don’t have to look up each store manually. I searched the forums for clues and played with the Palladian extension, but I can’t figure out a solution for my own case. The closest thing I could find was the example on extracting the tips and tricks, but that relies on the URL changing from page to page, so you can build a loop over it. A store locator doesn’t have a URL that changes for each store you select, so I’m stuck. For the same reason, I haven’t had any luck finding XPaths for every store.
For example, I want to get the address and the store number for every Sprint store from their website https://storelocator.sprint.com/locator/.
I appreciate all the help I can get!
I guess the easiest solution would be to check what the POST request for sending the ZIP codes looks like (e.g. with the Firefox network monitor).
Then loop through all US ZIP codes, send a POST request for each one (Palladian or just a POST node), and parse the response… However, that will be quite a few requests.
Maybe a small example for your store locator:
Go to your target page and open the network monitor (“Netzwerkmonitor” in German).
Do your search and try to find the relevant GET/POST request (usually one of the first).
There we see that the
will be requested (which includes our input location).
It is a GET request, so we only have to check the target URL and whether the headers include information we have to provide:
The cookie header includes some information which might be relevant, e.g. userLocation, referer and host.
Check what is set in the initial page request:
We see that userLocation should probably be added, so try whether that alone is enough.
Now put this information into a requestUrl and a request cookie:
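For anyone who wants to try the same idea outside KNIME, here is a minimal Python sketch of building the request URL and the cookie header. The endpoint, the parameter names `q` and `r`, and the cookie value format are all placeholders I made up for illustration (the real ones only appeared in the screenshots), so copy the actual values from your network monitor.

```python
from urllib.parse import urlencode, quote

# Hypothetical endpoint: replace with the request URL seen in the network monitor.
BASE_URL = "https://example.com/locator/search"


def build_request(location, radius=10, base_url=BASE_URL):
    """Return (url, headers) for one store search.

    The parameter names "q" and "r" are assumptions, not the site's real ones.
    """
    url = base_url + "?" + urlencode({"q": location, "r": radius})
    headers = {
        # userLocation is the cookie entry named above; its value format is assumed.
        "Cookie": "userLocation=" + quote(location),
        "Accept": "application/json",
    }
    return url, headers


url, headers = build_request("90210", radius=25)
print(url)
print(headers["Cookie"])
```

The returned pair can then be handed to whatever HTTP client you use, which mirrors what the GET node does with the URL column and the request header setting.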
And it seems in your case that is enough:
Response from KNIME:
Response if doing it manually in firefox:
Now you just have to parse the JSON responses.
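As a rough illustration of the parsing step, the sketch below pulls a store number and address out of a JSON body. The sample document and every key in it (`response`, `entities`, `id`, `address`, `line1` and so on) are invented for the example; the real layout has to be read off the response you get back.

```python
import json

# Made-up sample body: the real response layout will differ.
SAMPLE = """
{"response": {"entities": [
  {"id": "1234", "address": {"line1": "1 Main St", "city": "Dallas",
                             "region": "TX", "postalCode": "75201"}},
  {"id": "5678", "address": {"line1": "9 Elm Ave", "city": "Plano",
                             "region": "TX"}}
]}}
"""


def parse_stores(body):
    """Turn one JSON response body into a list of flat store rows."""
    data = json.loads(body)
    rows = []
    for store in data.get("response", {}).get("entities", []):
        address = store.get("address", {})
        rows.append({
            "store_number": store.get("id"),
            "street": address.get("line1"),
            "city": address.get("city"),
            "state": address.get("region"),
            # .get returns None when a store has no postal code in the response
            "zip": address.get("postalCode"),
        })
    return rows


rows = parse_stores(SAMPLE)
for row in rows:
    print(row)
```

Using `.get` everywhere keeps the loop from failing when a single store is missing a field, which tends to happen with scraped data.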
*And if you’d like to test what the Sprint site allows, check how far you can “enhance” the distance modifier:
9 returned objects
98 returned objects
store locator.knwf (15.7 KB)
The next step would be to get a list of ZIP codes to check, and above all to reduce the number of requests you have to send for the given search radius.
And maybe set a sensible delay between requests so you don’t get blocked by their page.
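A small Python sketch of that loop, assuming each store in the response carries some unique `id` (a hypothetical key): it pauses between requests and drops stores that were already returned for an earlier ZIP code, since neighbouring search areas overlap. The `fetch` function is passed in, and a fake one stands in here so the example runs without touching the site.

```python
import time


def crawl(zip_codes, fetch, delay_seconds=2.0, sleep=time.sleep):
    """Call fetch(zip_code) for each ZIP code, pausing between calls.

    fetch is any function returning a list of store dicts with an "id" key
    (an assumed shape). Stores already seen under an earlier ZIP are skipped.
    """
    stores = {}
    for index, zip_code in enumerate(zip_codes):
        if index > 0:
            sleep(delay_seconds)  # pause before every request except the first
        for store in fetch(zip_code):
            stores.setdefault(store["id"], store)
    return list(stores.values())


def fake_fetch(zip_code):
    """Stand-in for the real HTTP call so the sketch runs offline."""
    data = {
        "75201": [{"id": "A"}, {"id": "B"}],
        "75202": [{"id": "B"}, {"id": "C"}],
    }
    return data.get(zip_code, [])


pauses = []  # record the pauses instead of actually sleeping
result = crawl(["75201", "75202"], fake_fetch, delay_seconds=2.0, sleep=pauses.append)
print([store["id"] for store in result])
```

The larger the distance modifier the site accepts, the fewer ZIP codes you need in the list, at the cost of more duplicates to filter out.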
Nice one @AnotherFraudUser
Sorry about the late reply! This definitely looks like what I’m looking for, but I guess the store locator has changed since your last post because now it is a T-Mobile store locator which seems to have a different format from the original Sprint Store Locator.
I noticed that the URL changes to reflect a search by state and city. Would it be possible (and maybe simpler) to extract the store information by looping the URL through all the states and cities?
Thanks for the help!!
Well, sure, that could be possible.
Actually, it should not change much from the base workflow idea - looping through cities instead of ZIP codes should be the same process.
Maybe you won’t have to forge cookies etc.
Basically you’ll have to check what is returned if you open the pages.
If you check by state, it seems to return all cities with stores, so you could do it in two steps (to reduce the number of requests).
First search by each state and extract all the cities with stores - then search by each city.
At first glance (without looking into detail), a simple GET node should do the trick.
Searching by ZIP code should work as well but would require a lot more requests.
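The two-step idea can be sketched in Python like this. The two fetch functions are placeholders for the real GET requests, and the sample states, cities and store numbers are invented so the example runs on its own.

```python
def collect_stores(states, fetch_cities, fetch_stores):
    """Step 1: ask each state for its cities. Step 2: ask each city for its stores."""
    results = []
    for state in states:
        for city in fetch_cities(state):             # one request per state
            for store in fetch_stores(state, city):  # one request per city
                results.append({"state": state, "city": city, **store})
    return results


# Invented sample data standing in for the two kinds of responses.
CITIES = {"AL": ["Birmingham", "Mobile"], "AK": []}
STORES = {
    ("AL", "Birmingham"): [{"store_number": "1"}, {"store_number": "2"}],
    ("AL", "Mobile"): [{"store_number": "3"}],
}


def fake_fetch_cities(state):
    return CITIES.get(state, [])


def fake_fetch_stores(state, city):
    return STORES.get((state, city), [])


all_stores = collect_stores(["AL", "AK"], fake_fetch_cities, fake_fetch_stores)
for store in all_stores:
    print(store)
```

The request count is the number of states plus the number of cities that actually have stores, which should be far below one request per ZIP code.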
If you get it to work - make sure to share it with everyone
My current workflow is just trying to pull the cities in each state that have a store and it’s not working for some reason. I can see that the Response contains the information I’m looking for, but it doesn’t show up in the body column from my GET result.
If I can understand how to get this part to work, I think I’ll be able to figure out how to get the store information from the cities. Could you check out my workflow to see where I went wrong? As far as I can tell, I don’t need a cookie for this particular GET request.
store locator.knwf (16.6 KB)
Here are some screenshots of the information I’m using from searching Alabama:
Actually, I was able to get it to work now. I realized that my request string was supposed to be the GET from the Headers section and not the URL. I’ll try to see if I can get the store information. If it works out, I’ll post my solution!
Here’s the workflow with my solution:
store locator.knwf (34.6 KB)
It follows the process outlined by @AnotherFraudUser but done twice. The first iteration extracts all the cities from the store locator and the second iteration extracts all the stores in each city. The process is similar between the two with slight differences in the request string used and maybe some of the request headers.
There might be a simpler way to clean the extracted JSON data, but I did what I could with my limited knowledge of it. I welcome any suggestions for improving the workflow.
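For comparison, and not as a description of what the workflow does, one generic way to clean nested JSON is to flatten it into dotted column names. Here is a small Python sketch of that technique; the sample record is made up.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts and lists into one dict with dotted keys."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}{key}."))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}{index}."))
    else:
        flat[prefix[:-1]] = value  # drop the trailing dot from the key
    return flat


# Invented store record, only to show the shape of the result.
record = {
    "id": "1234",
    "address": {"city": "Dallas", "region": "TX"},
    "hours": ["9-5", "10-4"],
}
print(flatten(record))
```

Each flattened record then maps directly onto one table row, with the dotted keys as column names.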
Thanks for all the help! Doing this little project helped me to understand more about how web scraping works.
Great! Thanks for the final solution.
Will check out how you solved it!
You can use KNIME Hub to share your solution. It will be more visible and easier to locate.