I am trying to read tables from a couple of websites, and I created a loop for this. The tables have different names, but I am able to identify them. The problem is that I have to manually switch the extraction setting “Treat rowspan and colspan as missing cells” on or off after every loop iteration, and I don’t understand why, or how I can automate it. Is it possible to create a variable that switches it on and off automatically?
It is somewhat hard to see what exactly is happening… Could you create an example workflow (if the page is public)?
If it is not public, can you provide the HTML from the page (if there is no confidential information in it)?
As far as I can see, you use the Palladian Nodes for the extraction.
Maybe @qqilihq knows more about this.
I looked into it but could not figure out your problem.
(The Excel information is missing, though, so apart from the Audi link that can be seen in the node, I am not sure whether the problem only happens in later loops?)
But basically you want to extract the two tables from the site for multiple companies, right?
Yes, exactly, I want two tables with different names for each of the carmakers defined in the Excel Reader.
At the moment, the sheets in the file writer get overwritten.
But the main problem comes in the second loop iteration. There you can see that the table extractor expects #table_2 (which it got in the first loop), although it already has the correct input (#table_1) selected.
And you just have to open the configuration dialogs of the table extractors and click OK, and the loop continues without fault…
But basically I think the solution is to parse the HTML before it reaches the table selector and check which table comes first in the HTML (which is most likely what happens in the background when you open the dialog and confirm it).
Then pass that table ID on to the selector as a flow variable.
Okay, got another link.
But as I said, I think you have to set it yourself using a flow variable,
e.g. by extracting all IDs starting with "table_" from the HTML and then deciding which one to use.
(example for extracting below)
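For readers without the workflow at hand, the idea of collecting the table IDs first can be sketched in plain Python. This is only an illustration, not the workflow from this thread: the function names `find_table_ids` and `first_table_selector` are made up, and it assumes the IDs sit on `<table>` elements and start with `table_`.

```python
from html.parser import HTMLParser


class TableIdCollector(HTMLParser):
    """Collect the id of every <table> whose id starts with a given prefix."""

    def __init__(self, prefix="table_"):
        super().__init__()
        self.prefix = prefix
        self.table_ids = []

    def handle_starttag(self, tag, attrs):
        # Assumption: the ids of interest are on <table> elements only.
        if tag != "table":
            return
        table_id = dict(attrs).get("id")
        if table_id and table_id.startswith(self.prefix):
            self.table_ids.append(table_id)


def find_table_ids(html, prefix="table_"):
    """Return all matching table ids in document order (hypothetical helper)."""
    collector = TableIdCollector(prefix)
    collector.feed(html)
    return collector.table_ids


def first_table_selector(html, prefix="table_"):
    """Return a CSS selector such as '#table_1' for the first match, or None."""
    ids = find_table_ids(html, prefix)
    return "#" + ids[0] if ids else None


if __name__ == "__main__":
    # Made-up page fragment, only for demonstration.
    sample = '<table id="table_2"></table><table id="table_5"></table>'
    print(find_table_ids(sample))
    print(first_table_selector(sample))
```

The returned selector string is the kind of value that would then be handed to the table selector as a flow variable in each loop iteration, instead of relying on whatever was saved in the node dialog.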
Here is another possibility for doing it without the Selenium Nodes (not really needed, but I wanted to see if it can be done without them).
It seems to work for multiple companies as well.