I am trying to read tables from several websites and created a loop for this. The tables have different names, but I am able to identify them. The problem is that I have to manually switch the extraction setting “Treat rowspan and colspan as missing cells” on or off after every loop iteration, and I don’t understand why, or how I can automate it. Is it possible to create a variable that switches it on and off automatically?
The table extractor expects #table_2, but has already selected the right input (#table_1). How can this happen?
Now it gets even stranger. I only have to open the table extractor node and close it again, and then I can go on with my loop…
It is somewhat hard to see what exactly is happening… Could you create an example workflow (if the page is public)?
If it is not public, can you provide the HTML from the page (if there is no confidential information in it)?
For the extraction you use the Palladian nodes, as far as I can see.
Maybe @qqilihq knows more about this
Sales_Data 1.knwf (104.3 KB)
Of course, it is public, I am just playing around
Great! Will look into it later today
Did you already find something?
I looked into it but could not reproduce your problem.
(But the Excel information is missing, so apart from the Audi link, which could be seen in the node, I have no other inputs. Not sure if the problem only happens in later loops?)
But basically you want to extract the two tables from the site for multiple companies, right?
Yes, exactly, I want two tables with different names for each of the carmakers defined in the Excel Reader.
At the moment, the sheets in the file writer get overwritten.
But the main problem comes in the second loop. There you can see that the table extractor expects #table_2 (which it got in the first loop), but already has the correct input (#table_1) selected.
And you just have to open the configuration window of the table extractor and click OK, and the loop continues without fault…
Could you provide two links for the loop, or save your workflow so that at least two iterations are possible?
But basically I think the solution is to parse the HTML before you get to the table selector and check which table comes first in the HTML (which is most likely what happens in the background when you open the configuration and confirm it).
Then pass it on as a flow variable to the selector.
Okay, got another link.
But as I said, I think you have to set it yourself using a flow variable.
E.g. extracting all id=“table_” from the HTML and then deciding which one to use
(example for extracting below)
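The idea of collecting every id="table_…" from the HTML and then choosing one can be sketched in Python, for instance inside a Python scripting node. This is only an illustration under assumptions: the names `find_table_ids` and `pick_selector` are hypothetical, the sample HTML is made up, and it assumes the ids follow the pattern table_<number>. The returned selector string is what would then be handed to the extractor as a flow variable.

```python
import re

# Matches id="table_1", id='table_2', ... but not data-id="table_1".
# Assumption: the ids on the page follow the pattern table_<number>.
TABLE_ID_PATTERN = re.compile(r'(?<![\w-])id\s*=\s*["\'](table_\d+)["\']')

def find_table_ids(html):
    """Return all table_<n> ids in document order, without duplicates."""
    seen = []
    for table_id in TABLE_ID_PATTERN.findall(html):
        if table_id not in seen:
            seen.append(table_id)
    return seen

def pick_selector(html, position=0):
    """Return a CSS selector such as '#table_1' for the id at the given
    position, or None if the page has fewer matching ids."""
    ids = find_table_ids(html)
    if position >= len(ids):
        return None
    return "#" + ids[position]

# Made-up sample page with two tables.
sample_html = """
<div><table id="table_1"><tr><td>2019</td></tr></table></div>
<div><table id="table_2"><tr><td>2020</td></tr></table></div>
"""

print(find_table_ids(sample_html))
print(pick_selector(sample_html))
```

Because the selector is recomputed from the HTML of each iteration, it does not depend on what the previous iteration happened to find.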
Here is another possibility for doing it without the Selenium nodes (not really needed, but I wanted to see if it can be done without them).
It seems to work for multiple companies as well.
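As a rough illustration of the general idea of reading a table from static HTML without driving a browser, here is a sketch that uses only Python's standard library. It is not what the attached workflow does; the class `TableById`, the function `read_table`, and the sample HTML are hypothetical, and it assumes the table is present in the page source rather than rendered by JavaScript. Rowspan and colspan are not expanded.

```python
from html.parser import HTMLParser

class TableById(HTMLParser):
    """Collects the cell texts of the table whose id equals table_id.
    Cells of nested tables are skipped."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.depth = 0        # 1 inside the wanted table, >1 inside nested tables
        self.in_cell = False
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self.depth > 0:
                self.depth += 1
            elif dict(attrs).get("id") == self.table_id:
                self.depth = 1
        elif self.depth == 1:
            if tag == "tr":
                self.rows.append([])
            elif tag in ("td", "th") and self.rows:
                self.rows[-1].append("")
                self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "table" and self.depth > 0:
            self.depth -= 1
        elif tag in ("td", "th") and self.depth == 1:
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell and self.depth == 1:
            self.rows[-1][-1] += data

def read_table(html, table_id):
    """Return the table with the given id as a list of rows of cell strings."""
    parser = TableById(table_id)
    parser.feed(html)
    parser.close()
    return [[cell.strip() for cell in row] for row in parser.rows]

# Made-up sample page with two tables.
sample_html = """
<table id="table_1"><tr><th>Year</th><th>Units</th></tr><tr><td>2019</td><td>100</td></tr></table>
<table id="table_2"><tr><th>Model</th></tr><tr><td>A4</td></tr></table>
"""

print(read_table(sample_html, "table_1"))
print(read_table(sample_html, "table_2"))
```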
Sales Data.knar (432.9 KB)
I cannot thank you enough for all your effort!! The option without Selenium is even better!