How to import tables from .docx documents via R snippet

mlauber71 · November 12, 2019, 7:14pm

You can still try to install RServe anyway; the .Renviron file is just an additional help. Otherwise you would have to install R in a folder of your choice and tell it explicitly where the library is.

By default, .Renviron sits in the home directory.

The next possibility would be to compile an R version on a different machine and copy it over.
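
To see which library folders a given R installation actually searches, and whether a .Renviron file is present, a minimal base-R sketch like the following can be run inside that R. The printed values depend entirely on the machine, so none are shown here.

```r
# Where does this R look for packages, and where would .Renviron live?
home_dir  <- path.expand("~")                  # R's idea of the user's home directory
renviron  <- file.path(home_dir, ".Renviron")  # default per-user location
lib_paths <- .libPaths()                       # library folders searched, in order

cat("Home directory:  ", home_dir, "\n")
cat(".Renviron exists:", file.exists(renviron), "\n")
cat("R_LIBS_USER:     ", Sys.getenv("R_LIBS_USER", unset = "<not set>"), "\n")
cat("Library paths:\n")
cat(paste0("  ", lib_paths, collapse = "\n"), "\n")
```

If the folder where docxtractr was installed does not appear under "Library paths", that R session will not find the package.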

jjdata · November 13, 2019, 11:34am

I just installed RServe in R 3.6.1 and reinstalled docxtractr, but KNIME’s R Source (Table) node still doesn’t see docxtractr.

I still can’t use .Renviron, and I can’t use another machine. By Friday this one will also be erased and formatted, and by then that will be the end of the topic…

mlauber71 · November 13, 2019, 11:56am

Surrender is not an option. Could you share your KNIME settings regarding R? And could you maybe share the log file? As I understand it, you now have the latest version of RServe (1.8.3) running. The rest should be simple …

A good part of analytics is not to give up and try to get to the point by small changes.

jjdata · November 13, 2019, 1:55pm

Thank you, Mlauber. I think the same way you do; my problem is only lack of knowledge and lack of time… so:

How can I do this?

mlauber71 · November 13, 2019, 2:01pm

For the settings, you could just post a screenshot of them.

Then in the KNIME folder you should have the knime.log file. You could delete it first, then start KNIME again, run your workflow and send us the fresh file.

Then you could use this workflow to check some R settings (I might extend that further):
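
The linked workflow is not reproduced here. As a rough stand-in, the sketch below shows the kind of check such a workflow might run inside an R Source (Table) node: it reports which R is executing the script and whether RServe and docxtractr are visible to it. The helper `pkg_version` and the table name `r_info` are made up for this illustration.

```r
# Collect basic facts about the R session that is executing this script.
pkg_version <- function(pkg) {
  # Return the installed version as text, or NA if the package is not found.
  # system.file() returns "" for a missing package and does not load it.
  if (nzchar(system.file(package = pkg))) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

r_info <- data.frame(
  setting = c("R version", "R home", "architecture", "Rserve", "docxtractr"),
  value   = c(R.version.string, R.home(), R.version$arch,
              pkg_version("Rserve"), pkg_version("docxtractr")),
  stringsAsFactors = FALSE
)

# In a KNIME R Source (Table) node, knime.out is the table sent to the output port.
knime.out <- r_info
print(r_info)
```

An NA next to docxtractr would mean that the R instance KNIME is using cannot find the package, even if another R installation on the same machine has it.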

jjdata · November 13, 2019, 2:25pm

Here are the settings:

R

R-scripting

And this is the RServe version according to your workflow:

mlauber71 · November 13, 2019, 2:42pm

OK, I think I know what is going on. You have two R versions on your system: the ‘integrated’ one (32-bit, deep within KNIME) and your local one (the one you should use).

Just change the setting in the first screen to:

C:\Program Files\R\R-3.6.1

then it should see your R version.
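
One way to confirm which installation holds the package is to look into each R home's main library folder. This is only a sketch: the helper `package_in_r_home` is hypothetical, and it checks only `<R home>/library`, not a per-user library.

```r
# Does a given R installation folder contain a given package in its main library?
package_in_r_home <- function(r_home, pkg) {
  file.exists(file.path(r_home, "library", pkg, "DESCRIPTION"))
}

# The R that is running this script:
cat("Running R home:", R.home(), "\n")
cat("docxtractr in this R's main library:",
    package_in_r_home(R.home(), "docxtractr"), "\n")

# The path suggested above (only meaningful on the Windows machine in question):
cat("docxtractr under R-3.6.1:",
    package_in_r_home("C:/Program Files/R/R-3.6.1", "docxtractr"), "\n")
```

Run from inside KNIME, the first line shows which of the two R versions the node is really using.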

The explanation is in the longer article:

jjdata · November 13, 2019, 2:54pm

HOOOOOOORRAAAYYYYYYY!!!

Thank You Mlauber, that was the last problem, now the workflow works!

Thank you so much for your patience, your competence and resilience, I would not have succeeded alone.

mlauber71 · November 13, 2019, 2:56pm

Glad it worked out in the end. Sometimes KNIME and R (not to mention Python) can be somewhat tricky, but the reward is that you gain an ocean of new possibilities.

jjdata · November 13, 2019, 3:42pm

Excuse me, Mlauber, I have one more question about the tuning of the workflow:

I’m going to use it to extract all 7 tables from hundreds of .docx files stored together in a specific directory.

I tried to adjust the nodes to do this, and I thought I had to change the settings of the String Widget.

I suppose the Default Value needs to be changed in order to extract tables from all the documents I put in a specific directory. So far I haven’t got anything, neither by inserting the path of the directory nor by adding *.docx or ?.docx to it.

Do I have to manage the flow variables? Or do I need to change something in the R code in the R Source (Table) node? Perhaps by iterating the whole script?

I hope this will be the last difficulty with this workflow…!

Thank you again!

ipazin · November 14, 2019, 11:38am

Hi there,

nice one @mlauber71 for not giving up

Br,
Ivan

jjdata · November 14, 2019, 11:47am

Great, really effective!

mlauber71 · November 14, 2019, 5:03pm

Just a quick hint: the variable node is just there to simulate the usage of a variable in R.

In practice you might adapt something like this workflow:

Instead of Excel files, list the .docx files, loop through them and collect the results of the tables. Maybe I can create an example later.
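
For readers who want to try the listing step in plain R rather than with the List Files node, here is a minimal sketch. It assumes docxtractr's `read_docx()` and `docx_tbl_count()`; the folder `C:/my_docs` and the helpers `list_docx` and `count_tables` are made up for illustration. A wildcard in the path is not needed, because `list.files()` filters by pattern.

```r
# List every .docx file in a folder, skipping Word's "~$" lock files.
list_docx <- function(dir) {
  files <- list.files(dir, pattern = "\\.docx$", full.names = TRUE,
                      ignore.case = TRUE)
  files[!startsWith(basename(files), "~$")]
}

# Count the tables in each file (needs the docxtractr package).
count_tables <- function(files) {
  data.frame(
    file   = basename(files),
    tables = vapply(files, function(f) {
      as.integer(docxtractr::docx_tbl_count(docxtractr::read_docx(f)))
    }, integer(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

docx_dir <- "C:/my_docs"   # hypothetical folder, adjust to your own
files <- list_docx(docx_dir)

if (length(files) > 0 && requireNamespace("docxtractr", quietly = TRUE)) {
  knime.out <- count_tables(files)
} else {
  # Nothing found (or package missing): return an empty table instead of failing.
  knime.out <- data.frame(file = character(0), tables = integer(0))
}
```

In a KNIME loop the listing is done by the List Files node instead, and the R code only has to handle the single file of the current iteration.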

jjdata · November 14, 2019, 5:51pm

I think this is the solution!

I linked List Files to a directory with a number of my .docx files, and it works.

In the Snippet I tried to substitute:

library(docxtractr)

path <- knime.flow.in[["Location"]]

v_docs <- word_docs(path=path)

knime.out <- as.data.frame(v_docs)

But this didn’t work. I’m missing something (as usual!), but I think this could be the solution (if it’s possible to replace .xls with .docx).

mlauber71 · November 17, 2019, 10:35am

I think you still need both R instances since you have changing structures. In this workflow all .docx files get listed and then imported via R.
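
Because the tables have changing structures, they cannot simply be row-bound into one output table. One possible approach, which is not necessarily what the linked workflow does, is to stack every table in long format with one row per cell. The helper `tables_to_long` and the demo data below are invented for this sketch.

```r
# Stack tables with different column layouts into one long table:
# one output row per cell, identified by file, table number, row and column.
tables_to_long <- function(tbls, file) {
  pieces <- lapply(seq_along(tbls), function(i) {
    tbl <- as.data.frame(tbls[[i]], stringsAsFactors = FALSE)
    if (nrow(tbl) == 0 || ncol(tbl) == 0) return(NULL)  # skip empty tables
    data.frame(
      file   = file,
      table  = i,
      row    = rep(seq_len(nrow(tbl)), times = ncol(tbl)),
      column = rep(names(tbl), each = nrow(tbl)),
      value  = unlist(lapply(tbl, as.character), use.names = FALSE),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, Filter(Negate(is.null), pieces))
}

# Stand-in data with two different structures (made up for illustration).
demo <- list(
  data.frame(name = c("a", "b"), score = c(1, 2)),
  data.frame(id = 1:3)
)
long <- tables_to_long(demo, "example.docx")
print(long)
```

Inside a loop iteration, the list of tables would presumably come from `docxtractr::docx_extract_all_tbls(docxtractr::read_docx(path))`, with `path` taken from the `Location` flow variable, and the result would be assigned to `knime.out`.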
