Web Interactions - loop - relative paths - browser settings

While testing the Web Interaction nodes I ran into some difficulties. This could be a misunderstanding on my side or a bug in the extension.

If I use the Web Interaction nodes in a loop, there is no way to end the interaction. If I connect the Web Interaction End node, the loop fails.

In this case I have a list of files to download from the same page. It didn’t make sense to me to start the interaction, navigate to the page, send the click, and end the interaction once per file. (Maybe that is the only way.)

If I don’t run the Web Interaction End node, the process never ends and keeps running until resources are exhausted.

Leaking-1

If I use a headless browser

Leaking-2

Question: What is the correct way to use the Web Interaction nodes in a loop?

Question: How can I change the download directory in the advanced browser settings, if that is possible?

Question: Is it possible to change the behavior of opening the pdf file on the page to “download”?

Is this a bug, or do I need to adjust the column results?

Content Retriever - Works great with static paths.

With relative paths I’m getting these results in the link column, and the process fails. (I will try a try-catch construct so that the web interaction does not keep running after a failure.)

The base URL appears to be calculated by taking everything to the left of the first / after the scheme.

href="./encuestas/ensanutnl2022/informes.php"

"<?xml version=""1.0"" encoding=""UTF-8""?>
<a class=""on-iframe"" href=""./encuestas/ensanutnl2022/informes.php"">
 Informes
</a>"
https://ensanut.insp.mx./encuestas/ensanutnl2022/informes.php

Not a problem

To download the file, I retrieved the links on the page.

Content Retriever returns

"<?xml version=""1.0"" encoding=""UTF-8""?>
<a href=""../../encuestas/ensanutnl2022/doctos/informes/NvoLeon22_Ensanut.pdf"" target=""_blank"">
    <img alt=""InformeFinal2022"" src=""../../encuestas/ensanutnl2022/img/NvoLeon22_Ensanut.png"">
    </img>
</a>"

This Link fails: …/…/

https://ensanut.insp.mx.../../encuestas/ensanutnl2022/doctos/informes/NvoLeon22_Ensanut.pdf

Given where the page sits on the site, I guess the link should be resolved relative to the active page:

https://ensanut.insp.mx./encuestas/ensanutnl2022/../../encuestas/ensanutnl2022/doctos/informes/NvoLeon22_Ensanut.pdf

Which the server transforms to:

https://ensanut.insp.mx./encuestas/ensanutnl2022/doctos/informes/NvoLeon22_Ensanut.pdf
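For reference, here is a small sketch of how these two hrefs resolve under the standard relative-reference rules (RFC 3986), using Python's `urllib.parse.urljoin`. The two page URLs used as bases are assumptions inferred from the links above, and this only shows the expected result, not how the Content Retriever node computes it.

```python
from urllib.parse import urljoin

# Assumed URL of the page that contains the "Informes" link (not stated in the post).
start_page = "https://ensanut.insp.mx/"
# Assumed URL of the active page that contains the PDF link.
active_page = "https://ensanut.insp.mx/encuestas/ensanutnl2022/informes.php"

# urljoin resolves the href against the full page URL and removes ./ and ../ segments.
informes_url = urljoin(start_page, "./encuestas/ensanutnl2022/informes.php")
pdf_url = urljoin(
    active_page,
    "../../encuestas/ensanutnl2022/doctos/informes/NvoLeon22_Ensanut.pdf",
)

print(informes_url)
print(pdf_url)
```

The key point is that the base is the URL of the page the link was found on, not just the host name, so the resolved URL contains no dot segments at all.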

For now I’m killing the tasks and adjusting the paths. It is still easier than downloading the files manually.

Thanks

Hello @ricknime,

I’ll do my best to answer your questions.

Currently, the web interaction nodes don’t support URLs containing three dots. I’ve created a bug ticket for this issue (internal reference: AP-23330).

  • For the loop setup, the problem is that the Web Interaction End (Labs) node is being treated as part of the loop. However, the flow variable connected to it is set to run after the loop finishes. To fix this, connect the Web Interaction End node to the Web Interaction Start node (a node outside of the loop), while keeping the flow variable connection unchanged.

  • I’m not sure how to change the download directory or how to make the node download PDF files instead of opening them. However, as a workaround, you can use the String Manipulation node (or the new Expression node if you’re using AP 5.3 or newer) to fix the relative URLs and generate a valid URL without dots. Then you can use the GET Request node to download the file and save it in your desired location using the Binary Objects to Files node.
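Outside of KNIME, the same workaround (resolve the relative link, request the file, write the bytes to a chosen folder) could be sketched in Python with only the standard library. The names `download_to`, `_http_get` and the `fetch` parameter are illustrative assumptions, not part of any KNIME node; in the workflow itself the GET Request and Binary Objects to Files nodes do this job.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urljoin, urlsplit
from urllib.request import urlopen


def _http_get(url: str) -> bytes:
    """Fetch the raw response body with a plain GET request."""
    with urlopen(url) as response:
        return response.read()


def download_to(page_url: str, href: str, directory: str, fetch=_http_get) -> Path:
    """Resolve href against page_url, fetch it, and save it under directory.

    `fetch` is a hypothetical hook so the HTTP call can be swapped out.
    """
    # Resolve the relative link against the page it came from (drops ./ and ../).
    absolute_url = urljoin(page_url, href)
    # Use the last path segment of the URL as the local file name.
    file_name = PurePosixPath(urlsplit(absolute_url).path).name
    target = Path(directory) / file_name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(fetch(absolute_url))
    return target
```

Calling `download_to(page_url, href, "downloads")` with the URL of the page and the href found on it would save the file under a `downloads` folder (an arbitrary example location) and return its path.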

Hope this helps.

Best,
Keerthan
