I am starting a research project about online conversations in social media, and I would like to use KNIME to extract and process information from a social media platform. I am familiar with the text-processing part of KNIME, but now I would like to scrape the data I need using the Palladian nodes.
I am trying to parse a Facebook page and extract two columns: the first with the original posts by the Facebook page or by other users/friends, and the second with the text replies from the company or from users/friends to the original post. I have checked the online manual/white paper that explains how to use Palladian's XPath node (http://www.knime.org/files/knime_web_knowledge_extraction.pdf), but I don't know which XPath queries, names, or prefixes I could use to parse the Facebook page and obtain the content that I want.
It would be great if somebody could help me with some advice on how to write an XPath query. A similar example with a Twitter page would also help me.
The general approach for using XPath is to open the page in question in your web browser and use DOM inspection tools (WebKit-based browsers such as Chrome or Safari already include them) to find the XPath, which you can then insert into the XPath node.
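To illustrate the idea outside of KNIME, here is a minimal Python sketch of how XPath-style queries can turn posts and replies into two columns. The markup, the class names (`post`, `message`, `reply`) and the function `extract_rows` are all hypothetical; the real Facebook or Twitter DOM is different and changes often, so the actual paths have to be read off with the browser's inspector. Note also that Python's `xml.etree.ElementTree` only supports a subset of XPath, so this is not a drop-in for the Palladian XPath node.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup. Real pages use different
# element names and class attributes; inspect the DOM to find them.
PAGE = """
<html><body>
  <div class="post">
    <p class="message">Is the store open on Sunday?</p>
    <div class="reply"><p>Yes, from 10 to 16.</p></div>
    <div class="reply"><p>Thanks!</p></div>
  </div>
  <div class="post">
    <p class="message">Great service last week.</p>
  </div>
</body></html>
"""


def extract_rows(markup):
    """Return (original post, reply) pairs, one row per reply."""
    root = ET.fromstring(markup.strip())
    rows = []
    # Select every <div class="post"> anywhere below the root.
    for post in root.findall(".//div[@class='post']"):
        # Paths are relative to the current post element.
        message = post.findtext("p[@class='message']", default="").strip()
        replies = [
            p.text.strip()
            for p in post.findall("div[@class='reply']/p")
            if p.text
        ]
        if not replies:
            # Keep posts without replies, with an empty second column.
            rows.append((message, ""))
        for reply in replies:
            rows.append((message, reply))
    return rows


if __name__ == "__main__":
    for original, reply in extract_rows(PAGE):
        print(original, "|", reply)
```

The same pattern (one query that selects each post container, then relative queries for the post text and its replies) should carry over to the XPath node once the real element names are known.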
Depending on your actual use case, using the WebSearcher node might simplify things. It's able to search Twitter and Facebook (and many more).
Hi Philipp, thanks a lot, I will try what you suggest!
The WebSearcher node doesn't display Facebook as a search engine option.
Is Social Mention the only possible option?
Thank you Marcus
Apologies, I overlooked your post. Unfortunately, Facebook removed search from their Graph API, which is why it's no longer available in the WebSearcher node. Social Mention would be an alternative, though I haven't used it for a while and cannot tell you how "complete" the results are.
Another, more recent option would be to build a workflow using our new Selenium nodes, perform a search via a simulated browser, and extract the desired results.
Hope that helps,
Thank you very much, Philipp, I will try.