Collect Data from LinkedIn

Ralph2605 · November 5, 2020, 7:30pm

Hi,

I am trying to collect some data from LinkedIn for further analysis, but LinkedIn requires a login. Any clue how I can set up a workflow in KNIME with a login procedure?

Thanks!
Ralph

ipazin · November 6, 2020, 1:42pm

Hello @Ralph2605,

what data are you collecting from LinkedIn? Here is a workflow that gets data from LinkedIn using the GET Request node. In this case authentication is not needed, but the node has a tab where you can enter your credentials:

Additionally, here is a bit more about crawling LinkedIn data. It does not seem to be a trivial task.

Br,
Ivan

Aymen · December 8, 2020, 1:52pm

Hi,
I’m trying to use the GET Request node to collect data from LinkedIn.
I used a simple request: https://api.linkedin.com/v2/ugcPosts?q=authors&authors=List(urn%3Ali%3Aorganization%3A1415)&start=0&count=100 but it didn’t work. In my request headers I set Authorization (with my Bearer token as the value) and X-Restli-Protocol-Version.
Is something missing?
Regards,
Aymen
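
For comparison outside KNIME, the request described above can be sketched in Python with only the standard library. This is a minimal sketch, not a confirmed fix: the function name build_ugc_posts_request, the placeholder token, and the value 2.0.0 for X-Restli-Protocol-Version are assumptions, since the post names the header but not the value that was used.

```python
from urllib.parse import quote
from urllib.request import Request

API_URL = "https://api.linkedin.com/v2/ugcPosts"


def build_ugc_posts_request(token, org_id, start=0, count=100):
    """Build (but do not send) the GET request described in the post."""
    # Percent-encode the URN so ':' becomes %3A, as in the original URL.
    urn = quote(f"urn:li:organization:{org_id}", safe="")
    url = f"{API_URL}?q=authors&authors=List({urn})&start={start}&count={count}"
    headers = {
        "Authorization": f"Bearer {token}",
        # Assumed value; the post names the header but not its value.
        "X-Restli-Protocol-Version": "2.0.0",
    }
    return Request(url, headers=headers, method="GET")


req = build_ugc_posts_request("YOUR_TOKEN", 1415)  # placeholder token
print(req.full_url)
```

Passing req to urllib.request.urlopen would send it. Checking the printed URL and the headers against what is configured in the GET Request node is one way to spot a difference between the two setups.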

ipazin · December 8, 2020, 3:18pm

Hello @Aymen,

I don’t think I can help much, as I have never tried collecting data from LinkedIn. However, what response, error code, and status are you getting from the GET Request node?

Br,
Ivan

Aymen · December 8, 2020, 3:30pm

Thank you, I hope you can help me.
ERROR GET Request 3:2 Execute failed: Wrong status: 403 Forbidden
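
The status line alone says little; a 403 response may also carry a body with more detail about why access was refused. Below is a minimal Python sketch for reading it, using only the standard library. The helper name describe_http_error and the simulated response (URL and message) are made up so the example runs offline; they are not taken from LinkedIn.

```python
import io
import json
from urllib.error import HTTPError


def describe_http_error(err):
    """Return status, reason, and decoded body of an HTTPError."""
    raw = err.read().decode("utf-8", errors="replace")
    try:
        body = json.loads(raw)
    except ValueError:
        body = raw  # not JSON, keep the raw text
    return {"status": err.code, "reason": err.reason, "body": body}


# Simulated 403 so the sketch runs offline; URL and message are made up.
fake = HTTPError(
    "https://example.invalid/resource", 403, "Forbidden", {},
    io.BytesIO(b'{"message": "example only"}'),
)
info = describe_http_error(fake)
print(info)
```

In a real call, the same helper would go inside an except HTTPError as err: block around urllib.request.urlopen.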

ipazin · December 8, 2020, 4:01pm

Hello @Aymen,

that means access denied. See here what 403 means and what to check:

Also see some comments/solutions here:

Br,
Ivan

Aymen · December 8, 2020, 4:14pm

I have already seen these websites. I checked everything, but I still have the same problem.
I can make the same request with Postman and it works…
Is there something that I should configure in the node?

ipazin · December 8, 2020, 4:25pm

Hello @Aymen,

unfortunately I can’t help you more. Hopefully someone else knows more and will join the topic. You may also get help by following this suggestion from the docs above:

“If you continue to see the error, reach out to your partner technical support channel or https://developer.linkedin.com/support.”

Br,
Ivan

Aymen · December 9, 2020, 11:59am

Hi,
the strange thing is that other API requests work… (for example, get Author)

Tyler · December 11, 2020, 1:54am

Howdy partner, quick question: what are you trying to analyze, and what’s the link? Then let’s go from there. If you need to use Selenium to walk through a login process, be sure to check out the Selenium nodes. IMO you can do a lot of that stuff with 2-5 lines of Python, and I think it’s important to keep those things where they are easiest. Maybe what you’re doing will require exploring this direction further.

There are two different kinds of HTML requests.

Sometimes the requests library works, or in KNIME the GET Request node works too.
Other times you have to wait for JavaScript to load, or you’re hitting the wrong URL, or a bunch of other smart stuff I don’t really understand.

The next layer of scraping could be Selenium, which has a really awesome HTML grab built into the library. So maybe that’s what you need: if an extra pause is necessary before the information is there, that would explain why the GET request you’re trying in KNIME is failing.

You may want to explore the Selenium method of grabbing ALL the HTML. I would suggest not fiddling further with BeautifulSoup or the requests library if the GET request isn’t getting you the HTML you want.

Example:

The requests library call is requests.get(etc). I just dump the result into a text file on my desktop, because I know I can parse over a directory of files with KNIME in various ways.

Below that is driver.page_source, which is the Selenium library grabbing the same data and writing a text file too. Notice that I’m not stressing about putting this in a “super cool database”, which I hope makes this easier to adopt if you decide to go this route.
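import random
import time
import requests

# url3, x (a keyword label) and driver (a Selenium WebDriver) are defined earlier in the script.
# Plain HTTP fetch: save the raw HTML to a timestamped text file.
r1 = requests.get(url3)
h1 = r1.text
t1 = time.strftime("%Y%m%d-%H%M%S")
with open(r'C:\Users\tyler\Desktop\scrape\keyword-' + t1 + '-' + x + '.txt', 'w', encoding='utf8') as f1:
    f1.write(h1)

# Selenium fetch: save the page HTML as the browser currently sees it, then pause 9-10 seconds.
y2 = driver.page_source
time.sleep(random.randint(9, 10))
t2 = time.strftime("%Y%m%d-%H%M%S")
with open(r'C:\Users\tyler\Desktop\scrape\keyword2-' + t2 + '-' + x + '.txt', 'w', encoding='utf8') as f2:
    f2.write(y2)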

There are several really good Selenium users in this forum, and I’m sure you will find their posts about using Selenium in KNIME. I personally don’t do any of that in KNIME because I think it’s easy to do in Python, and I can now use KNIME as a tool to maybe… write my Python vs. doing it myself, which helps me orchestrate these pieces at a deeper level.

Good luck, hope this helps.
T