Table extractor - extraction settings

Hello everyone,

I am trying to read tables from several websites and have built a loop for this. The tables have different names, but I am able to identify them. The problem is that I have to manually switch the extraction setting “Treat rowspan and colspan as missing cells” on or off after every loop iteration, and I don’t understand why or how I can automate it. Is it possible to create a variable that switches it on and off automatically?


Thank you!
Max

The table extractor expects #table_2, but has already selected the right input (#table_1). How can this happen?

Now it gets even stranger. I only have to open the table extractor node and close it again, and then I can continue with my loop…

Hi @Max98,

it is somewhat hard to see what exactly is happening… Could you create an example workflow (if the page is public)?
If it is not public, can you provide the HTML of the page (if it contains no confidential information)?
As far as I can see, you use the Palladian nodes for the extraction :thinking:
Maybe @qqilihq knows more about this.

Sales_Data 1.knwf (104.3 KB)

Of course, it is public, I am just playing around :slight_smile:

Great! Will look into it later today :partying_face:

Thank you very much!

Did you find anything yet? :slight_smile:

Hi @Max98,

I looked into it but could not figure out your problem.
(But the Excel information is missing, so apart from the Audi link, which is visible in the node, I have no other input. I am not sure whether the problem only happens in later loop iterations.)

But basically you want to extract the two tables from the site for multiple companies, right? :thinking:

Yes, exactly. I want two tables with different names for the carmakers defined in the Excel Reader.

At the moment, the sheets in the file writer get overwritten.

But the main problem occurs in the second loop iteration. There you can see that the table extractor expects #table_2 (which it got in the first iteration), although it already has the correct input (#table_1) selected.

And you just have to open the configuration dialog of the table extractors and click OK, and the loop continues without errors…

Hi @Max98,

could you provide two links for the loop or save your workflow so that at least 2 iterations are possible? :thinking:

But basically I think the solution is to parse the HTML before you get to the table selector and check which table comes first in the HTML (this is most likely what happens in the background when you open the dialog and confirm it).
Then pass that on to the selector as a flow variable.
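
To make the idea concrete outside of KNIME, here is a minimal Python sketch of "find the first table in the HTML and turn it into a selector". It is only an illustration of the logic, not what the Palladian node does internally; the names `TableIdCollector` and `first_table_selector` and the sample HTML are made up for this example. In the workflow itself, the resulting string would be the value you hand over as a flow variable.

```python
from html.parser import HTMLParser


class TableIdCollector(HTMLParser):
    """Collects the id attribute of every <table> tag in document order."""

    def __init__(self):
        super().__init__()
        self.table_ids = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased
        if tag == "table":
            table_id = dict(attrs).get("id")
            if table_id:
                self.table_ids.append(table_id)


def first_table_selector(html):
    """Return a CSS id selector for the first table that has an id, or None."""
    collector = TableIdCollector()
    collector.feed(html)
    if not collector.table_ids:
        return None
    return "#" + collector.table_ids[0]


# Hypothetical page where the first table happens to be called table_2
sample_html = """
<html><body>
  <table id="table_2"><tr><td>a</td></tr></table>
  <table id="table_3"><tr><td>b</td></tr></table>
</body></html>
"""

selector = first_table_selector(sample_html)
print(selector)
```

The point of computing the selector on every iteration is that nothing is left over from the previous page, which seems to be what goes wrong in the loop.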

Okay, got another link :slight_smile:
But as I said, I think you have to set it yourself using a flow variable.

E.g. extract all id=“table_” occurrences from the HTML and then decide which one to use
(example of the extraction below)
[image]
[image]
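
For readers who cannot see the screenshots, the same extraction step can be sketched in plain Python with a regular expression. This is only a sketch under assumptions: it presumes the ids look like `table_` followed by digits (as in #table_1 and #table_2 from this thread), and the helper names `find_table_ids` and `pick_table_id` are invented for the example. In KNIME you would build the equivalent with nodes and pass the chosen id on as a flow variable.

```python
import re

# Assumption: table ids have the form table_<number>, quoted with " or '
TABLE_ID_PATTERN = re.compile(r"""id\s*=\s*["'](table_\d+)["']""")


def find_table_ids(html):
    """Return all table_<n> ids in the order they appear in the HTML."""
    return TABLE_ID_PATTERN.findall(html)


def pick_table_id(table_ids, position=0):
    """Pick the id at the given position, or None if there are too few."""
    if position < len(table_ids):
        return table_ids[position]
    return None


# Hypothetical HTML snippet for illustration
html = (
    '<table id="table_1"><tr><td>x</td></tr></table>'
    '<div id="other"></div>'
    "<table id='table_2'><tr><td>y</td></tr></table>"
)

ids = find_table_ids(html)
first_id = pick_table_id(ids, 0)
second_id = pick_table_id(ids, 1)
print(ids, first_id, second_id)
```

A regular expression is enough here because only the id strings are needed; for anything beyond that, a real HTML parser is the safer choice.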

Here is another possibility for doing it without the Selenium nodes (not really necessary, but I wanted to see if it can be done without them :slight_smile: ).
It seems to work for multiple companies as well.


Sales Data.knar (432.9 KB)

I cannot thank you enough for all your effort!! The option without Selenium is even better! :heart_eyes: :partying_face:
