Selenium Nodes

I’m a QA engineer and also a rookie with KNIME.
I have a task that I need to complete as soon as possible.

The task is:
" Access the link with Selenium Nodes and collect the data table to KNIME as the following table

Query first 1000 CARS between $10.000.- and $15.000.- sorted by price"

How can I do this?
Can you please help me?

Thanks everyone


Hi Sezgin,

we’ve been in touch via email this afternoon – I’ll answer this here.

To familiarize yourself with the Selenium Nodes, I suggest having a look at these entertaining and educational videos, which I cannot recommend enough, by two power users of the nodes, @kowisoft and @armingrudd:

  1. KNIME Workflow Tutorial - The KNIME Knight in Limelight - Episode 001 - Selenium Nodes - YouTube
  2. KNIME Workflow Tutorial - The KNIME Knight in Limelight - Episode 002 - Selenium Nodes - YouTube
  3. Login and upload files to a website using Selenium nodes in KNIME - YouTube

Then, here’s a more in-depth article by @armingrudd where he explains different aspects in detail:

https://blog.intellacct.com/rule-the-web-with-selenium-nodes-in-knime-eceda86f9a99

What’s also generally helpful are the existing threads in the Selenium Nodes subforum, which you can find here – they’ll lead you to some hidden gems like this one, which you can surely adapt to your specific problem:

Several example workflows using the Selenium Nodes are available on NodePit, the most comprehensive search engine for KNIME nodes and workflows: selenium — NodePit

I hope this helps to get started – keep me posted on how it goes! If you require more in-depth support, please post a workflow that shows your current progress and describe where you’re having trouble.

Best,
Philipp


Hi @Sezgin and welcome to the forums,

In general, I would approach this by getting the maximum list of all entries that match your search criteria and then processing them “offline” within KNIME.

What I mean by that is to download the whole page source of your search results and then convert the HTML/XML data that makes up the website into a KNIME table.
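
Outside KNIME, the same “download the page source, then parse it offline” idea can be sketched in Python with the standard-library `html.parser`. This is a minimal sketch, assuming every listing sits in an element with the class `vehicle-card`, with the title in an element with the class `title` and the price in `primary-price`. Those class names and the sample markup are hypothetical, not confirmed cars.com markup, so check the real page source before relying on them.

```python
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collects title, price and detail link for each listing card.

    The class names "vehicle-card", "title" and "primary-price" are
    hypothetical placeholders, not confirmed cars.com markup.
    """

    def __init__(self):
        super().__init__()
        self.rows = []        # one dict per listing card
        self._current = None  # row currently being filled
        self._field = None    # field the next text chunk belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "vehicle-card" in classes:
            # Every card opens a new row; later fields are written into it.
            self._current = {"title": "", "price": "", "link": ""}
            self.rows.append(self._current)
        elif self._current is not None:
            if "title" in classes:
                self._field = "title"
            elif "primary-price" in classes:
                self._field = "price"
            # Keep the first link inside the card as the detail page URL.
            if tag == "a" and attrs.get("href") and not self._current["link"]:
                self._current["link"] = attrs["href"]

    def handle_endtag(self, tag):
        # Stop collecting text as soon as any element closes.
        self._field = None

    def handle_data(self, data):
        if self._current is not None and self._field:
            self._current[self._field] += data.strip()


# Made-up sample markup standing in for a downloaded page source.
SAMPLE_PAGE = """
<div class="vehicle-card">
  <a href="/vehicledetail/111/"><h2 class="title">2014 Honda Accord LX</h2></a>
  <span class="primary-price">$10,995</span>
</div>
<div class="vehicle-card">
  <a href="/vehicledetail/222/"><h2 class="title">2016 Ford Focus SE</h2></a>
  <span class="primary-price">$11,450</span>
</div>
"""

parser = ListingParser()
parser.feed(SAMPLE_PAGE)
parser.close()
rows = parser.rows
```

Each dict in `rows` corresponds to one row of the KNIME table; in the workflow itself, this step is handled by nodes rather than code.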

If you could provide a link to the specific website you’d like to query, I am sure we can come up with something.

@qqilihq thanks for the kind words :slight_smile: Really appreciate that :slight_smile: :+1:


Thank you so much, @kowisoft.
I’m trying to do this task.

https://www.cars.com

Please access the link above with Selenium Nodes and collect the data table to KNIME as the following table

Query first 1000 CARS between $10.000.- and $15.000.- sorted by price.

| ID | Brand | Make | Model Year | Color | Price |

Populate it in a KNIME node.

Also, as an additional note: use CefSharp.OffScreen.


Here’s my take on it.

I used the configuration nodes to allow the user to configure the input into the component (e.g., min price, max price, and the number of entries to be returned).

Note: the SE Nodes run inside the component. There may be a better way to structure that, but it was just for testing purposes. As a result, the component takes a while to finish. If you want more control, run the loop within the component manually.
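
The loop inside the component boils down to simple arithmetic: with 100 results per page, the first 1000 cars need ceil(1000 / 100) = 10 pages. Here is a minimal Python sketch of that loop. It assumes a hypothetical `fetch_page` callback that stands in for one scraping iteration; that callback is not part of the shared workflow.

```python
import math


def pages_needed(total_results: int, per_page: int) -> int:
    """How many result pages must be visited to reach `total_results` rows."""
    return math.ceil(total_results / per_page)


def collect(total_results, per_page, fetch_page):
    """Visit pages 1..N and gather rows until `total_results` are collected.

    `fetch_page(page_number)` is a hypothetical callback returning the rows
    scraped from one page; in KNIME, this corresponds to one loop iteration.
    """
    rows = []
    for page in range(1, pages_needed(total_results, per_page) + 1):
        page_rows = fetch_page(page)
        if not page_rows:  # the site ran out of results early
            break
        rows.extend(page_rows)
    return rows[:total_results]  # trim any surplus from the last page


def fake_fetch(page):
    """Offline stand-in for a scraped page: 100 labelled dummy rows."""
    return [f"car {page}-{i}" for i in range(100)]


result = collect(1000, 100, fake_fetch)
```

The early `break` covers searches that return fewer than 1000 matches, and the final slice keeps the output at exactly the requested size even when the last page overshoots.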

Interestingly, all search parameters are “hard coded” into the result URL. With that, all you have to do is make a clever combination of your parameters with the String Manipulation (Variable) node.
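
Building that URL is plain string assembly. Here is a Python sketch of what the String Manipulation (Variable) step does. The base path and the parameter names (`list_price_min`, `list_price_max`, `sort`, `page_size`, `page`) are assumptions for illustration only. Copy the real ones from the address bar after running a search on the site by hand.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical base URL and parameter names: copy the real ones from the
# address bar after running a search on the site by hand.
BASE_URL = "https://www.cars.com/shopping/results/"


def build_search_url(min_price, max_price, page=1, page_size=100):
    """Join the search criteria into one results URL, escaping values safely."""
    params = {
        "list_price_min": min_price,
        "list_price_max": max_price,
        "sort": "list_price",  # assumed name of the "sort by price" option
        "page_size": page_size,
        "page": page,
    }
    return f"{BASE_URL}?{urlencode(params)}"


example_url = build_search_url(10000, 15000, page=2)
example_query = parse_qs(urlparse(example_url).query)
```

Using `urlencode` instead of manual concatenation means values containing spaces or `&` are escaped correctly.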

I also split the resulting items by space, as cars.com seems to have a clear “titling policy”, as in “YEAR [space] Brand [space] everything else”.
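
That split only needs the first two separators. Here is a Python sketch of the same rule, assuming titles follow the “YEAR Brand everything else” pattern described above:

```python
def split_title(title: str) -> tuple[str, str, str]:
    """Split "YEAR Brand everything else" into (year, brand, rest).

    Only the first two runs of whitespace act as separators, so the rest of
    the title (model, trim, ...) stays together. Missing parts become "".
    """
    parts = title.split(maxsplit=2)
    parts += [""] * (3 - len(parts))  # pad short titles to exactly 3 fields
    year, brand, rest = parts
    return year, brand, rest
```

One caveat of this design is that two-word brands such as “Land Rover” end up with “Land” as the brand. For those, a lookup list of known brands may be needed.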

I also couldn’t locate any ID, at least not in the overview pages that my workflow scrapes. I don’t have the time to dive deep into every results page, so I return the link to the detail listing page for every line item, which enables you to go deeper if you want.

Let me know, if that helps :slight_smile:


Hi @kowisoft,

Thank you so much for your support.

I’ve inspected and downloaded your workflow.
When I ran it on my PC, I got this warning: “WARN MISSING HTML Parser”

How can I solve this problem? And could you walk me through the details of this workflow?

As I said before, I’m a rookie with KNIME.
I have to finish my task by Friday (06/01/2023).

Thank you so much.

Jumping in for Phil here. You’ll need the Palladian plugin, which contains the HTML Parser node (and several other web-scraping-related nodes).

You can find more information and download Palladian from the following link:

Best regards,
Philipp


And I’d like to add it is sooooo much worth it @Sezgin

It contains other great nodes, like the Regex Extractor node, which is amazing.


Guys, you’re perfect. Well done :grinning: :clap: :clap: :clap:

As you know, this task was given to me by a company, and if I complete it, I can get an OFFER. I need an offer and a job :wink:

But I don’t understand the inner workings of this workflow.
I must learn them.
Could you walk me through this workflow tomorrow, please?

I’m really curious about KNIME, Selenium Nodes, and data scraping.
I’m excited to learn these.

Thank you so much, @kowisoft and @qqilihq

@Sezgin

I am pretty busy today, but if a “rough video” is good enough for you, I can make one tonight (Central European Time).


Yes, I see.
We could have a 1:1 video call tonight on Zoom or similar.
In the meantime, there are some points I didn’t understand (like where Selenium is used in this workflow).
Do you have any guide for quickly learning KNIME and how to use the Selenium nodes? I can watch it and learn something before our meeting.

Thanks so much

Oh, I’m so sorry.
You have a YouTube channel, and “rough video” means that you will record a video and upload it there.

I misunderstood you.

I will be waiting for the video explaining this workflow.
Thanks a lot!
Finally, I’ve subscribed to your channel :blush:

Ok, so here’s my explanation video for this specific workflow.

Please excuse my slightly “off” hairstyle, it is pretty late here and I had a busy day :wink:

Let me know if you have further questions.


Thank you so much, @kowisoft, your video is very easy to understand.
As you know, in my case I have to use the Selenium Nodes.
I have set up the Selenium Nodes, and now I have two problems:

  1. I can’t scroll down the popup page to click the “SIGN IN” button. (Normally I would use the window.scrollTo JS method, but it doesn’t reach the popup page.)




cars.com_extract_list_58073.knwf (45.5 KB)

  2. The second problem is how to connect the two workflows and run them together.

cars.com_extract_list_58073.knwf (45.5 KB)
57997-cars-dot-com-scraper.knwf (33.3 KB)

This is because my task requires scraping the data with the Selenium Nodes (because I’m a QA TESTER :smile:).

Thanks a lot!
Today is my deadline.

Watch cartoons online
Watch Netflix movies

Why do you need to sign in? The search results are available without the need to sign in, at least on cars.com

Besides that, if you set the max number of results per page to 100 in the URL, why do you need to scroll down? The shared workflow extracts the whole (!) page so I don’t see a need for scrolling.

The JS you shared is usually used for endlessly scrolling pages like Reddit, Facebook, etc.
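
For reference, the endless-scrolling pattern that the window.scrollTo trick is meant for looks roughly like this in Python Selenium: scroll, wait, and stop once the page height no longer grows. `execute_script` is a real WebDriver method. The `FakeEndlessPage` class is a hypothetical stand-in so the loop can be tried without a browser, and the pause and round limits are arbitrary assumptions.

```python
import time

SCROLL_JS = "window.scrollTo(0, document.body.scrollHeight);"
HEIGHT_JS = "return document.body.scrollHeight"


def scroll_until_stable(driver, pause=1.0, max_rounds=20):
    """Scroll to the bottom repeatedly until no new content is loaded.

    `driver` is anything with execute_script(), e.g. a Selenium WebDriver.
    Returns the final page height.
    """
    last_height = driver.execute_script(HEIGHT_JS)
    for _ in range(max_rounds):  # hard cap so a truly endless feed still ends
        driver.execute_script(SCROLL_JS)
        time.sleep(pause)  # give the page time to load the next batch
        new_height = driver.execute_script(HEIGHT_JS)
        if new_height == last_height:
            break  # nothing new appeared: we reached the end
        last_height = new_height
    return last_height


class FakeEndlessPage:
    """Hypothetical test double: the page grows by 1000 px on each of the
    first `loads` scrolls, then stops growing."""

    def __init__(self, loads):
        self.height = 1000
        self.loads_left = loads
        self.scrolls = 0

    def execute_script(self, script):
        if script == SCROLL_JS:
            self.scrolls += 1
            if self.loads_left > 0:
                self.height += 1000
                self.loads_left -= 1
            return None
        return self.height


demo_page = FakeEndlessPage(loads=3)
final_height = scroll_until_stable(demo_page, pause=0)
```

A paginated results page with 100 items per page never triggers this loop in a useful way, which is why it is not needed for the cars.com search.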

Also, I do not understand why you want to connect 2 workflows.

Hello @wonyoung11 and welcome to the forums.

Could you translate that into English please? :blush:


Because my task owner wants it done this way. He wants to “SIGN IN” as a user and then scrape the results (as you did).

As QA testers, we use the Selenium tool, so my task owner (the company) wants me to use Selenium for this task.
If I can do it, he may give me an offer.

To be honest, I have no clue right now.

As a generic recommendation, I would suggest looking into the page’s source code and trying to locate the following three items:

  • User name
  • Password
  • Submit button

Then interact with those elements through the SE Nodes, mainly the “Send Keys” node and the “Click” node.
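
In Python Selenium, the same steps (locate the user name field, the password field and the submit button, then send keys and click) could look like the sketch below. `find_element`, `send_keys` and `click` are real Selenium calls, and the string "css selector" is the value of `By.CSS_SELECTOR`. The three CSS selectors are hypothetical and must be replaced with whatever the login form's source actually uses. The `FakeDriver` class is only a stand-in so the function can be tried without a browser.

```python
CSS = "css selector"  # same string as selenium's By.CSS_SELECTOR

# Hypothetical selectors: replace them with the ones from the page source.
USERNAME_SELECTOR = "input[name='email']"
PASSWORD_SELECTOR = "input[type='password']"
SUBMIT_SELECTOR = "button[type='submit']"


def sign_in(driver, username, password):
    """Fill in the login form and submit it (Send Keys, Send Keys, Click)."""
    driver.find_element(CSS, USERNAME_SELECTOR).send_keys(username)
    driver.find_element(CSS, PASSWORD_SELECTOR).send_keys(password)
    driver.find_element(CSS, SUBMIT_SELECTOR).click()


class FakeElement:
    """Test double that records the actions performed on it."""

    def __init__(self, selector, log):
        self.selector = selector
        self.log = log

    def send_keys(self, text):
        self.log.append(("send_keys", self.selector, text))

    def click(self):
        self.log.append(("click", self.selector))


class FakeDriver:
    """Test double for a WebDriver: hands out recording elements."""

    def __init__(self):
        self.log = []

    def find_element(self, by, value):
        if by != CSS:
            raise ValueError(f"unexpected locator strategy: {by}")
        return FakeElement(value, self.log)


demo_driver = FakeDriver()
sign_in(demo_driver, "qa@example.com", "not-a-real-password")
```

With a real browser, you would create `driver = webdriver.Chrome()`, call `driver.get(...)` with the login page URL, and then call `sign_in(driver, ...)`. If the form sits in a popup that loads late, an explicit wait (`WebDriverWait` with `expected_conditions.element_to_be_clickable`) before each lookup is usually needed.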