I would like some help with what I think is a simple task.
I want to read some PDFs and extract only two pieces of information to Excel.
I’ll explain with screenshots.
First
I have two PDFs (in truth, I have 800) and I want only those two pieces of information concatenated into a single line, separated by an underscore, like below:
CNPJ: 43.708.379/0003-63_Referência: 11/2022 (then replace "/" with another character so the result can be used to rename the file)
If I get to step 2, I will export the results to an Excel file and then use Power Automate to rename the files. I found a YouTube video showing that this can be done with only two columns, the original name and the new name.
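For illustration, here is a minimal Python sketch of the naming step described above. The function name `build_new_name`, the "-" replacement character, and the ".pdf" suffix are assumptions, not part of the original post. It also drops the colons, since Windows does not allow ":" in file names.

```python
def build_new_name(cnpj: str, referencia: str, slash_replacement: str = "-") -> str:
    """Join the two extracted fields with an underscore and make the result file-name safe.

    Hypothetical helper: the replacement character and the ".pdf" suffix are assumptions.
    """
    label = f"CNPJ: {cnpj}_Referência: {referencia}"
    # "/" and ":" are not allowed in Windows file names.
    safe = label.replace("/", slash_replacement).replace(":", "")
    return safe + ".pdf"


print(build_new_name("43.708.379/0003-63", "11/2022"))
# CNPJ 43.708.379-0003-63_Referência 11-2022.pdf
```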
@Felipereis50 I might be able to take a look at your example later. In the meantime I could point you to this solution using Rstats to extract text from a PDF and search for a special term:
Hello @Felipereis50
Following the KNIME Hub link, you can find a possible solution to your challenge. I tried to build the whole workflow without resorting to scripting nodes. Transferring the files with the Transfer Files node was easy to configure within a loop; however, I couldn’t complete the file rename task.
As you can see, coding transfer + rename in R is very simple, so you can avoid the extra Power Automate tasks.
P.S. The PDF files aren’t included in the workflow. The workflow is saved with data. To re-run it, you will have to point to the source folders on your system and edit the reference folders within the R Snippet code.
Wow, great to see people committed to helping.
I’m still new to Knime, but I hope I can help too.
It’s great when you send me the code, so I can analyze it.
About the R language, I took a small course on Google Analytics, but I found the language difficult. I didn’t adapt.
I will download your code and study it step by step.
I managed to do it using Power Query and Power Automate.
In Power Query I read a folder and did the ETL (easy), then I used Power Automate Desktop (following a YouTube video, also very easy).
But I want to learn KNIME. It is a magnificent tool, the best I’ve ever come across. Incredible.
I’ll try to update the workflow by the end of the day to include this function based on the Transfer Files node (embedded in a loop). I expect that it won’t be as efficient as the R version.
Regex is a powerful tool for dealing with text. Like any code, it can solve many problems. The explanation of the code is as follows:
Therefore, the pattern matches the whole multiline text within the cell. The first capturing group is the target row, and it is referenced by “$1”. We then replace the whole text with only the capturing group.
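The same idea can be sketched in Python. This is only an illustration: the sample text and the pattern below are assumptions, not the expression used in the workflow. Note that Python writes the group reference as `\1`, whereas Java-style regex (as in KNIME) uses `$1`.

```python
import re

# Assumed sample of the multiline text extracted from one PDF.
text = "Some header\nCNPJ: 43.708.379/0003-63\nReferência: 11/2022\nFooter"

# (?s) lets "." match newlines, so the pattern covers the whole multiline text.
# The parentheses form capturing group 1 around the target row.
pattern = r"(?s).*?(CNPJ: [\d./-]+).*"

# Replace the whole text with only the first capturing group.
cnpj_row = re.sub(pattern, r"\1", text)
print(cnpj_row)  # CNPJ: 43.708.379/0003-63
```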
Wow, thank you very much.
If you could build the loop so I can understand it, it would be a great learning experience.
I don’t want to bother you.
Thanks for explaining the rules. Yesterday I was watching videos to learn regex. I still think it’s difficult; it has many rules. Regex really is powerful. I didn’t know.
I had thought of using an “IF” formula with “search”, and then “left” and “right”, to capture the numbers after CNPJ.
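That search-then-slice idea also works without regex. Below is a minimal Python sketch, assuming each label and its value sit on the same line of the extracted text; the helper name `value_after` and the sample text are hypothetical.

```python
from typing import Optional


def value_after(text: str, label: str) -> Optional[str]:
    """Return the rest of the line that follows `label`, or None if the label is absent."""
    start = text.find(label)           # like a spreadsheet "search"
    if start == -1:
        return None
    start += len(label)
    end = text.find("\n", start)       # the value is assumed to end at the line break
    if end == -1:
        end = len(text)
    return text[start:end].strip()     # like taking "left"/"right" of the found position


sample = "Header\nCNPJ: 43.708.379/0003-63\nReferência: 11/2022"
print(value_after(sample, "CNPJ:"))        # 43.708.379/0003-63
print(value_after(sample, "Referência:"))  # 11/2022
```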
I looked at your history on the forum and saw that you know a lot.
Hello @Felipereis50
I’ve just updated the workflow on the KNIME Hub referenced in my previous post. As you can see, the ‘Transfer Files’ node can only copy or move (using the delete-source-files option) to the target folder…
In my experience, the best option for copy_and_rename() is to code it in R. It would be interesting to know whether the rename function can be achieved with KNIME base nodes.
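For readers who prefer Python to R, a copy-and-rename step might look like the sketch below. It is not the R Snippet from the workflow; the function body, its parameters, and the refusal to overwrite existing files are assumptions made for illustration, using only the standard library.

```python
import shutil
from pathlib import Path


def copy_and_rename(source: str, target_dir: str, new_name: str) -> Path:
    """Copy `source` into `target_dir` under `new_name` and return the new path.

    Hypothetical helper: refusing to overwrite an existing file is a design
    assumption, chosen so two PDFs with the same CNPJ and reference are noticed.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)   # create the target folder if needed
    destination = target / new_name
    if destination.exists():
        raise FileExistsError(f"{destination} already exists")
    shutil.copy2(source, destination)           # copies content and file metadata
    return destination
```

Calling it once per row of a two-column table (original name, new name) would reproduce the rename step for all files.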
Thanks @mlauber71
So ‘Transfer Files (Table)’ can do the trick
I will try to upgrade the workflow by adding a copy_and_rename() ‘option’ built on KNIME base nodes, inspired by @mlauber71’s suggestion, and maybe a Py option as well, to close the gap…
I’m studying your code.
And as for @mlauber71, that is a very creative way of renaming.
This is the first time I have seen a loop example. Very interesting.
Well, there could be a single Transfer node (move, copy and rename). I think from this point on it comes down to being creative with other nodes to do the rename.
As I don’t know anything about R or Python, it would be difficult for me to reach the “GOAL” alone.
In any case, I am very grateful for what has already been achieved.
I found some other threads about renaming files. I haven’t been able to check yet. But I will. Who knows, maybe I can help you too.
But I managed to rename the 800 files in 3 minutes.
The workflow now achieves the Copy_Rename function using only KNIME base nodes. It relies on the ‘Transfer Files (Table)’ node, incorporating @mlauber71’s suggestions.