I am trying to extract store information from a store locator so that I don’t have to manually look up each store. I looked around the forums for clues and played with the Palladian extension, but I can’t figure out a solution for my case. The closest thing I could find was the example on extracting the tips and tricks, but that relies on the fact that the URL changes, so you can create a loop over it. A store locator doesn’t have a URL that changes for each store you select, so I’m stuck. For the same reason, I haven’t had any luck finding a way to get XPaths for every store.
I guess the easiest solution would be to check what the POST request for sending the ZIP codes looks like (e.g. with the Firefox network monitor)
Then loop through all US ZIP codes, sending a POST request for each one (Palladian or just the POST node) and parsing the response… However, that will be quite a few requests
The next step would be to build a list of ZIP codes to check, mainly to reduce the number of requests you have to send for the given area range
And maybe set a good delay between requests so as not to get blocked by their page
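Outside of KNIME, the loop described above could be sketched in Python roughly like this. It is only a sketch under assumptions: the endpoint URL, the form field name "zip" and the response shape ({"stores": [...]} with an "id" per store) are all made up for illustration, so the real values have to be read from the network monitor first.

```python
import json
import time
import urllib.parse
import urllib.request

# Hypothetical endpoint and form field; read the real ones from the network monitor.
LOCATOR_URL = "https://example.com/store-locator/search"
ZIP_FIELD = "zip"


def post_zip(zip_code, timeout=30):
    """Send one POST request for a ZIP code and return the response body as text."""
    data = urllib.parse.urlencode({ZIP_FIELD: zip_code}).encode("utf-8")
    request = urllib.request.Request(LOCATOR_URL, data=data, method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def collect_stores(zip_codes, fetch=post_zip, delay_seconds=2.0, sleep=time.sleep):
    """Query each ZIP code once, pause between requests, de-duplicate stores by id."""
    stores = {}
    for index, zip_code in enumerate(zip_codes):
        if index > 0:
            sleep(delay_seconds)  # wait between requests to reduce the risk of being blocked
        payload = json.loads(fetch(zip_code))
        # Assumed response shape: {"stores": [{"id": ..., "name": ...}, ...]}
        for store in payload.get("stores", []):
            stores[store["id"]] = store  # neighbouring ZIP codes often return the same store
    return list(stores.values())
```

Calling collect_stores(["10001", "10002"]) would send two requests two seconds apart. The fetch and sleep parameters are only there so the loop can be tried with canned responses before pointing it at the real site.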
Sorry about the late reply! This definitely looks like what I’m looking for, but I guess the store locator has changed since your last post because now it is a T-Mobile store locator which seems to have a different format from the original Sprint Store Locator.
I noticed that the URL changes to reflect a search by state and city. Would it be possible (and maybe simpler) to extract the store information by looping the URL through all the states and cities?
Actually, that should not change much from the base workflow idea - looping through cities instead of ZIP codes should be the same process.
Maybe you won’t have to forge cookies etc.
Basically you’ll have to check what is returned if you open the pages.
If you check by state it seems to return all cities with stores, so you could do it in two steps (to reduce the number of requests)
First search by each state, extract all the cities with stores - then search by each city.
At first glance (without looking into detail) a simple GET node should do the trick
Search by zip code should work as well but would require a lot more requests
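For reference, the two-step idea (state first, then city) could look roughly like this as a Python sketch. The URL pattern and the JSON keys "cities" and "stores" are assumptions for illustration only. What the locator really returns has to be checked by opening the pages, as mentioned above.

```python
import json
import time
import urllib.parse
import urllib.request

# Hypothetical URL pattern; the real one comes from watching the locator in the browser.
BASE_URL = "https://example.com/stores"


def build_url(state, city=None):
    """Build a state URL, or a state/city URL when a city is given."""
    parts = [state] if city is None else [state, city]
    path = "/".join(urllib.parse.quote(p.lower().replace(" ", "-")) for p in parts)
    return f"{BASE_URL}/{path}"


def get_json(url, timeout=30):
    """Send a GET request and parse the response body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def crawl(states, fetch=get_json, delay_seconds=1.0, sleep=time.sleep):
    """Step 1: ask each state for its cities. Step 2: ask each city for its stores."""
    stores = []
    for state in states:
        # Assumed shape of the state response: {"cities": ["Seattle", ...]}
        cities = fetch(build_url(state)).get("cities", [])
        sleep(delay_seconds)
        for city in cities:
            # Assumed shape of the city response: {"stores": [{...}, ...]}
            for store in fetch(build_url(state, city)).get("stores", []):
                stores.append({"state": state, "city": city, **store})
            sleep(delay_seconds)
    return stores
```

This way only cities that actually have stores are requested, which is where the saving over a full ZIP code loop comes from.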
If you get it to work - make sure to share it with everyone
My current workflow is just trying to pull the cities in each state that have a store and it’s not working for some reason. I can see that the Response contains the information I’m looking for, but it doesn’t show up in the body column from my GET result.
If I can understand how to get this part to work, I think that I’ll be able to figure out how to get the store information from the cities. Could you check out my workflow to see where I’m wrong? As far as I can tell, it doesn’t look like I need a cookie for this particular get request.
Update:
Actually, I was able to get it to work now. I realized that my request string was supposed to be the GET from the Headers section and not the URL. I’ll try to see if I can get the store information. If it works out, I’ll post my solution!
It follows the process outlined by @AnotherFraudUser but done twice. The first iteration extracts all the cities from the store locator and the second iteration extracts all the stores in each city. The process is similar between the two with slight differences in the request string used and maybe some of the request headers.
There might be a simpler way to clean the JSON data that’s extracted, but I just did what I could with the limited knowledge I have of it. I welcome any suggestions for improving the workflow.
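One possible way to simplify the JSON cleanup, shown here as a Python sketch rather than as KNIME nodes, is to flatten each store record into a single row with dotted column names. The field names in the example ("stores", "address", "phone") are invented for illustration and will not match the real response.

```python
import json


def flatten(record, prefix=""):
    """Flatten nested dicts into one level, joining nested keys with dots."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value  # lists and plain values are kept as they are
    return flat


def stores_to_rows(body):
    """Parse a response body and return one flat dict per store."""
    payload = json.loads(body)
    # Assumed response shape: {"stores": [{...}, ...]}
    return [flatten(store) for store in payload.get("stores", [])]
```

A record like {"name": "Main St", "address": {"city": "Austin"}} would then become a row with the columns "name" and "address.city", which is easy to write out as a table.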
Thanks for all the help! Doing this little project helped me to understand more about how web scraping works.