Selenium Nodes

Sezgin · January 2, 2023, 5:24pm

I’m a QA engineer and a rookie with KNIME.
I have a task that I need to finish as soon as possible.

The task is:
"Access the link with Selenium Nodes and collect the data table into KNIME as in the following table.

Query the first 1000 cars between $10,000 and $15,000, sorted by price."

How can I do this?
Can you help me, please?

Thanks everyone

qqilihq · January 2, 2023, 6:36pm

Hi Sezgin,

we’ve been in touch via email this afternoon – I’ll answer this here.

To familiarize yourself with the Selenium Nodes, I suggest having a look at these entertaining and educational videos, which I cannot recommend enough, by two power users of the nodes, @kowisoft and @armingrudd:

Then, here’s a more in-depth article by @armingrudd where he explains different aspects in detail:

https://blog.intellacct.com/rule-the-web-with-selenium-nodes-in-knime-eceda86f9a99

The existing threads in the Selenium Nodes subforum, which you find here, are generally helpful as well. They’ll lead you to some hidden gems like this one, which you can surely adapt to your specific problems:

Several example workflows using the Selenium Nodes are available on NodePit, the most comprehensive search engine for KNIME nodes and workflows: selenium — NodePit

I hope this helps you get started. Keep me posted on how it goes! If you require more in-depth support, please post a workflow which shows your current progress and describe where you’re having trouble.

Best,
Philipp

kowisoft · January 2, 2023, 11:36pm

Hi @Sezgin and welcome to the forums,

in general I would approach this by first fetching the full list of entries that match your search criteria and then processing it “offline” within KNIME.

By that I mean downloading the whole page source of your search results and then converting the markup that makes up the page into a KNIME table.

If you could provide a link to the specific website you’d like to query, I am sure we can come up with something.

@qqilihq thanks for the kind words. Really appreciate that.
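As a rough illustration of the "page source to table" idea outside of KNIME, here is a minimal Python sketch. The markup, the class names (`vehicle-card`, `title`, `primary-price`) and the sample listings are all made up for the example and are not taken from any real site; inside KNIME the same job is done by parser and XPath nodes.

```python
from html.parser import HTMLParser

# Hypothetical markup: each listing is a <div class="vehicle-card"> holding an
# <h2 class="title"> and a <span class="primary-price">. These class names and
# the sample rows are assumptions for illustration only.
SAMPLE_PAGE = """
<div class="vehicle-card">
  <h2 class="title">2015 Toyota Corolla LE</h2>
  <span class="primary-price">$12,500</span>
</div>
<div class="vehicle-card">
  <h2 class="title">2014 Honda Civic EX</h2>
  <span class="primary-price">$11,900</span>
</div>
"""


class ListingParser(HTMLParser):
    """Collects one {'title': ..., 'price': ...} dict per listing card."""

    # Maps a (hypothetical) CSS class to the table column it fills.
    FIELDS = {"title": "title", "primary-price": "price"}

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # column currently being read, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "vehicle-card" in classes:
            self.rows.append({})  # a new card starts a new table row
        for cls in classes:
            if cls in self.FIELDS and self.rows:
                self._field = self.FIELDS[cls]

    def handle_data(self, data):
        # Store the first non-empty text found inside a tracked element.
        if self._field and data.strip():
            self.rows[-1][self._field] = data.strip()
            self._field = None

    def handle_endtag(self, tag):
        self._field = None


def page_to_rows(html):
    """Turns a page source string into a list of row dicts."""
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    return parser.rows


rows = page_to_rows(SAMPLE_PAGE)
for row in rows:
    print(row)
```

The point is only that the whole page is fetched once and everything after that is plain data processing, with no further browser interaction.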

Sezgin · January 3, 2023, 9:40am

Thank you so much @kowisoft.
This is the task I am trying to do:

https://www.cars.com

Please access the link above with Selenium Nodes and collect the data table into KNIME as in the following table.

Query the first 1000 cars between $10,000 and $15,000, sorted by price.

Populate the result in a KNIME node.

An additional note: use CefSharp.OffScreen.

kowisoft · January 3, 2023, 7:13pm

Here’s my take on it.

I used the configuration nodes to allow the user to configure the input into the component (e.g. min price, max price and number of entries to be returned).

Note: the SE Nodes run inside the component. There may be a better way to structure that, but this was just for testing purposes. As a result, the component takes a while to finish. If you want more control, run the loop within the component manually.

Interesting that all search parameters are “hard coded” into the result URL. With that, all you have to do is make a clever combination of your parameters with the String Manipulation (Variable) node.
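To make the URL idea concrete, here is a small Python sketch of what the string combination amounts to. The path `/shopping/results/` and the parameter names (`list_price_min`, `list_price_max`, `sort`, `page_size`, `page`) are assumptions for illustration; read the real ones off the address bar after running a search on the site.

```python
from urllib.parse import urlencode

# Assumed endpoint and parameter names, for illustration only.
BASE_URL = "https://www.cars.com/shopping/results/"


def build_search_url(min_price, max_price, page=1, page_size=100):
    """Builds one result-page URL from the user-supplied search settings."""
    params = {
        "list_price_min": min_price,
        "list_price_max": max_price,
        "sort": "list_price",  # assumed value for "sorted by price"
        "page_size": page_size,
        "page": page,
    }
    return BASE_URL + "?" + urlencode(params)


def build_all_urls(min_price, max_price, total=1000, page_size=100):
    """Returns one URL per result page needed to cover `total` entries."""
    pages = -(-total // page_size)  # ceiling division
    return [
        build_search_url(min_price, max_price, page, page_size)
        for page in range(1, pages + 1)
    ]


urls = build_all_urls(10000, 15000)
print(len(urls))
print(urls[0])
```

With 1000 entries and 100 entries per page, this yields ten URLs that a loop can then visit one after another.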

I also split the resulting item titles by space, as cars.com seems to have a clear “titling policy”, as in “YEAR [space] Brand [space] everything else”.
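In Python terms, the title split looks roughly like the sketch below. The function name and the column names are made up for the example, and the approach assumes single-word brands; a two-word brand such as "Land Rover" would end up partly in the last column.

```python
def split_title(title):
    """Splits 'YEAR Brand everything else' into three columns."""
    # Split on whitespace at most twice so the remainder stays in one piece.
    parts = title.split(maxsplit=2)
    # Pad with empty strings if the title has fewer than three parts.
    parts += [""] * (3 - len(parts))
    year, brand, rest = parts
    return {"year": year, "brand": brand, "model": rest}


print(split_title("2015 Toyota Corolla LE"))
```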

I also couldn’t locate any ID, at least not in the overview pages which my workflow scrapes. I don’t have the time to dive deep into every results page, so I return the detail listing page for every line item, which enables you to go deeper if you want.

Let me know if that helps.

Sezgin · January 4, 2023, 5:43pm

Hi @kowisoft,

Thank you so much for your support.

I’ve downloaded and inspected your workflow.
When I ran it on my PC, I got this warning: “WARN MISSING HTML Parser”

How can I solve this problem?
And could you explain the details of this workflow to me?

As I said before, I’m a rookie with KNIME.
I have to finish my task by Friday (06/01/2023).

Thank you so much.

qqilihq · January 4, 2023, 6:31pm

Jumping in for Phil here. You’ll need the Palladian plugin, which contains the HTML Parser (and several other web scraping related nodes).

You can find more information and download Palladian from the following link:

Best regards,
Philipp

kowisoft · January 4, 2023, 6:53pm

And I’d like to add it is sooooo much worth it @Sezgin

It contains other amazing nodes like the Regex Extractor Node, which is amazing

Sezgin · January 4, 2023, 8:35pm

Guys, you’re perfect. Well done!

As you know, this is a task my prospective company gave me. If I complete it, I may get an offer. I need an offer and a job.

But I don’t understand the inner workings of this workflow.
I must learn this.
Could you teach me this workflow tomorrow, please?

I’m really curious about KNIME, the Selenium Nodes and data scraping.
I’m excited to learn these.

Thank you so much @kowisoft , @qqilihq

kowisoft · January 5, 2023, 8:08am

@Sezgin

I am pretty busy today, but if a “rough video” is good enough for you, I can do one tonight (Central European time zone).

Sezgin · January 5, 2023, 11:44am

Yeah, I see.
We can have a 1:1 video call tonight on Zoom or similar.
In the meantime, there are some points that I didn’t understand (like where Selenium is used in this workflow).
Do you have any guide for quickly learning how to use KNIME and the Selenium Nodes? I can watch it and learn something before our meeting.

Thanks so much.

Sezgin · January 5, 2023, 4:36pm

Oh, I’m so sorry.
You have a YouTube channel, and “rough video” means that you will record one and upload it there.

I misunderstood you.

I will be waiting for the video explaining this workflow.
Thanks a lot.
Finally, I’ve subscribed to your channel.

kowisoft · January 5, 2023, 9:22pm

OK, so here’s my explanation video for this specific workflow.

Please excuse my slightly “off” hairstyle, it is pretty late here and I had a busy day.

Let me know if you have further questions.

Sezgin · January 6, 2023, 7:41am

Thank you so much @kowisoft, your video is very easy to understand.
As you know, in my case I have to use the Selenium Nodes, and I have set them up. Now I have two problems:

First, I can’t scroll down in the popup page to click the “SIGN IN” button. (Normally the window.scrollTo JS method is used for this, but it did not work on the popup page.)

cars.com_extract_list_58073.knwf (45.5 KB)

The second problem is how to connect the two workflows and run them together:

cars.com_extract_list_58073.knwf (45.5 KB)
57997-cars-dot-com-scraper.knwf (33.3 KB)

This is because my task requires scraping the data with the Selenium Nodes (I’m a QA tester).

Thanks a lot.
Today is my deadline.

wonyoung11 · January 6, 2023, 7:52am

[Posted in Thai; translated:]
Watch cartoons online
Watch Netflix movies

kowisoft · January 6, 2023, 8:02am

Why do you need to sign in? The search results are available without the need to sign in, at least on cars.com

Besides that, if you set the max number of results per page to 100 in the URL, why do you need to scroll down? The shared workflow extracts the whole (!) page so I don’t see a need for scrolling.

The JS you shared is usually used for endlessly scrolling pages like Reddit, Facebook, etc.
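For reference, the usual endless-scroll pattern looks roughly like this in Python. It is a sketch, not part of the shared workflow: `driver` is assumed to be a Selenium WebDriver (anything with an `execute_script` method works), and the function name, `pause` and `max_rounds` are made up for the example.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Scrolls until the page height stops growing; returns the rounds used."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        # Jump to the current bottom so the page loads its next batch.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to append new items
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # nothing new was loaded, so this is the end
        last_height = new_height
    return max_rounds
```

On a paginated results page none of this is needed, because every page is complete as soon as it has loaded.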

Also, I do not understand why you want to connect 2 workflows.

kowisoft · January 6, 2023, 8:03am

Hello @wonyoung11 and welcome to the forums.

Could you translate that into English please?

Sezgin · January 6, 2023, 8:06am

Because my task owner wants it done this way. He wants me to “SIGN IN” as a user and then scrape the results (as you did).

As QA testers we use the Selenium tool, so my task owner (the company) wants Selenium to be used in my task.
If I can do it, he may give me an offer.

kowisoft · January 6, 2023, 1:31pm

To be honest, I have no clue right now.

As a generic recommendation, I would suggest looking into the page’s source code and trying to locate the following 3 items:

User name
Password
Submit button

Then interact with those elements through the SE Nodes, mainly the “Send Keys” node and the “Click” node.
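Expressed as plain Selenium calls in Python, the same three steps could look like the sketch below. The three default selectors are placeholders, not taken from the site; replace them with whatever the page’s source actually uses. `driver` is assumed to be a Selenium WebDriver, and the function name is made up for the example.

```python
def sign_in(driver, username, password,
            user_selector="input[name='email']",
            password_selector="input[name='password']",
            submit_selector="button[type='submit']"):
    """Fills the login form and submits it (selectors are placeholders)."""
    by = "css selector"  # the string value of selenium's By.CSS_SELECTOR
    # "Send Keys" step: type into the user name and password fields.
    driver.find_element(by, user_selector).send_keys(username)
    driver.find_element(by, password_selector).send_keys(password)
    # "Click" step: press the submit button.
    driver.find_element(by, submit_selector).click()
```

If the form only appears after a click on a “Sign in” link or inside a popup, that link has to be located and clicked first, in the same way.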

system · April 6, 2023, 1:31pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.