External Tool "issues" and high difficulty to use

mwiegand · January 4, 2023, 3:26pm

Hi,

Preamble
I apologize for the lengthy post. I have been working on it for quite some time, reading through many Knime community posts, working through Hub examples, and going through a lot of trial and error, in order to provide a comprehensive picture.

Actual Post
I am trying to replace the Bash node from NGS, which was removed entirely with Knime 4.7 (corresponding posts here and here), with one of the external tool nodes such as External Tool (Labs) – KNIME Community Hub.

However, the input and output files, as well as other issues, make even the simplest commands virtually impossible to execute.

Simplest approach: Just External Tool + manually created shell script
First off, when using “Generate unique file name on execute” the following error is thrown:

ERROR External Tool (Labs) 4:3 Execute failed: Can't access 'file:/var/folders/jg/zcb9yqn90w54f3szwrsnqrl80000gn/T/bash_20230104_23758/output/port0.csv'. (/var/folders/jg/zcb9yqn90w54f3szwrsnqrl80000gn/T/bash_20230104_23758/output/port0.csv (No such file or directory))

This applies to both External Tool nodes, the regular one and the Labs one.

More complex approach: Dynamically create SH file and try to set permissions
The Set Files/Folders Permissions – KNIME Community Hub Node fails to set permissions.

ERROR Set Files/Folders Permissions 4:6 Configure failed (IllegalStateException): The file system Current workflow data area does not support setting POSIX permissions

It is worth noting that Knime is executed with my OS user (not root, www-data, eclipse or whatever), which apparently has the necessary permissions (confirmed recursively).

Additional issues
In- and Output
About the high difficulty of using the external tools: it relates to the confusing handling of the input and output files. Why bother ingesting a table and offering options to loop over the entire table or individual lines when input and output files must be provided as well? Maybe something about this is not well explained, which causes the confusion?

The External Tool (Labs) node breaks with the following error if files are defined manually:

ERROR External Tool (Labs) 4:10 Execute failed: Index 0 out of bounds for length 0

File Path and Flow-Variables
The Path variables introduced some time ago are not supported, and flow variables cannot be used to set the location of any file, be it the input, the output or the executable. Actually, (almost) nothing is manageable via flow variables.

Complex Examples
Many community examples lack documentation and are not really working (anymore). The two created by Knime however do (partially) work but are quite complex.

Actually, only the first one does, and of that only the branch with the data generator.

Unaddressed Quotation Issue?
Please see External tool node - a problem with double quotes

Manually choosing files
Usually, when manually choosing a file, one can provide a file name and the file is then created upon execution. However, as seen below, one must manually create the input and output files first.

In- and Output Files getting deleted
Whilst it’s good to prevent data garbage from accumulating unexpectedly, I noticed that the manually created, empty CSV files were deleted without any notice and without any corresponding setting being visible in the nodes.

Questions

The NGS Tools were decommissioned without any advance warning and “against” the usual process of marking them as deprecated. As mentioned in this and the following posts, this revealed an insufficient process that frustrated a couple of users (latest here). Is any process improvement being considered for the future?
How does one work with the External Tool node in general, and in detail with regard to the aforementioned issues?
How can the POSIX permission issue be handled?
Would it be possible to provide a really simple example workflow that showcases how to run a simple shell command, e.g. “printenv”?

Here is the workflow displaying the many issues mentioned above:

Best
Mike

iCFO · January 4, 2023, 4:25pm

The loss of the Bash node has also cost me a few days of diving back into several components to attempt rebuilds. There are a few heavy use components that I have been able to duplicate via Java, otherwise I am in the same spot as you.

mwiegand · January 4, 2023, 6:26pm

Just noticed in the initial comment that the second workflow example was missing. Here is the one I wanted to link:

@iCFO I too had some attempts in Java, predominantly in this workflow:

Unfortunately, the overarching issue which I could not fix, and which presumably led to the decommissioning of the NGS Bash node (I kind of feel at fault), is summarized here:

Did you manage to execute shell commands like ping, or more advanced ones like nmap, or even 3rd party tools / commands, e.g. related to OSINT (to automate IT security)?

Best
Mike

iCFO · January 4, 2023, 6:38pm

Unfortunately I am no help there. I have only managed to pull off a few java based Windows OS hacks at this point…

AnotherFraudUser · January 4, 2023, 7:42pm

Hi @iCFO, Hi @mwiegand,

is it just the BASH node you guys need?
From the layout the node does not look too complex - I could try to add a node which executes in a similar way through the Java ProcessBuilder tomorrow evening

iCFO · January 4, 2023, 7:56pm

Bash was the only NGS node on my end… It would be a great Node resurrection if you can pull it off @AnotherFraudUser

Many thanks for the help and effort either way!

mwiegand · January 4, 2023, 8:22pm

Hi @AnotherFraudUser,

Your offer would be most welcome. However, I fear we’d run into the same issue I discovered before, namely that even simple commands were not found. Quite extensive research was done, and it was concluded that the environment variables, regardless of the circumstances (please see my linked post), were not properly read by Knime.

Anyways, I’d not hinder your motivation. If you could pull that off, you’d be our early crowned hero of the year.

Cheers
Mike

carstenhaubold · January 5, 2023, 10:11am

Hi @mwiegand,

Sorry to hear that you ran into these problems with the External Tool node and thank you for the detailed report. I’m no expert on it, but I agree that it is not straightforward to use.

Would it be a workaround for you to use a Python Script node and call external processes from Python using the subprocess module? Then you have full control over the input and output files for the external process.

Best,
Carsten

mwiegand · January 5, 2023, 10:41am

Hi @carstenhaubold,

Many thanks for your suggestion and kind words. Unfortunately, I am not sufficiently proficient in Python. Going “nuts” by manually coding complex tasks kind of goes, from my understanding, against the “low to no code” nature of Knime. An island / non-scalable “solution” is doomed to fail sooner rather than later.

I can see why the Bash node in particular got decommissioned. Besides being redundant with the external tool nodes, it had some major drawbacks as well. Therefore, improving the External Tool nodes would, I believe, be the best approach, wouldn’t it?

Best
Mike

carstenhaubold · January 5, 2023, 11:02am

Well… in my personal opinion each usage of an External Tool node is an island solution because it needs a handcrafted path to a locally installed executable, and if you call some bash/shell script it can never work across operating systems. Python scripts however are portable, and give you way more flexibility in controlling external processes.

But yes, I fully agree that using the Python Script node obviously requires you to know some Python, which goes against the “no-code” mentality.

And the issues you mentioned regarding file permissions, quotation marks, and the disappearance of the NGS extension are all valid points!

mwiegand · January 5, 2023, 11:36am

You are right that every externally executed tool would be an island solution. That’s why I attempted to automate it at least to the extent of doing some basic / rudimentary tasks, e.g. pinging (reachability), nmap port scans to detect misconfigurations, or more.

It could even get so far, albeit being a totally different example, as to execute an Adobe Photoshop Droplet to process images.

The opportunities are countless and would also help to conduct rather quick feasibility tests before going to the lengths of coding something non-native (difficult to phrase but I hope you might get the point) in Python, Java or R or even developing a dedicated node.

No complaints, though. Just trying to share some thoughts and eventually facilitate a solution.

Best
Mike

carstenhaubold · January 5, 2023, 12:15pm

Totally agree, we should make the External Tool node more usable.

Still, to give you a possible workaround right now, I just rebuilt the NGS Bash node’s functionality using the Python Script node in KNIME 4.7, and using the bundled Python environment (which it’ll use by default if you freshly install the Python Integration in KNIME). So you don’t even need to deal with setting up a Python installation.

Place a Python Script node in your workflow, click the three dots to remove the input table and add a second output table and then paste the code below.

You only have to adjust the two lines after the comment to make the node do what you did with the NGS Bash node:

import knime.scripting.io as knio
import subprocess as sp
import pandas as pd

# equivalent to the input fields of the NGS bash node:
cmd = 'ls -lah'
working_dir = '/Users/chaubold'

proc = sp.run(cmd.split(" "), cwd=working_dir, capture_output=True)

def get_lines_not_empty(stream):
    lines = stream.decode().splitlines()
    if len(lines) == 0:
        return [""] # because KNIME doesn't like a completely empty data frame
    else:
        return lines

df_stdout = pd.DataFrame({"StdOut": get_lines_not_empty(proc.stdout)})
df_stderr = pd.DataFrame({"StdErr": get_lines_not_empty(proc.stderr)})

knio.output_tables[0] = knio.Table.from_pandas(df_stdout)
knio.output_tables[1] = knio.Table.from_pandas(df_stderr)

Hope that helps
Carsten

EDIT: this solution will also have problems with quotation marks in program arguments but that can be fixed rather easily.
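
A minimal sketch of one such fix, assuming the command string follows POSIX shell quoting: Python's standard shlex.split can replace cmd.split(" ") so that quoted arguments stay together. The grep command below is only an illustrative assumption and does not come from the thread.

```python
import shlex

# Illustrative command with a quoted argument containing a space (an assumption).
cmd = 'grep -r "two words" .'

# Splitting on spaces tears the quoted argument apart and keeps the quotes:
naive_args = cmd.split(" ")
print(naive_args)   # ['grep', '-r', '"two', 'words"', '.']

# shlex.split applies POSIX shell quoting rules and yields one argument:
quoted_args = shlex.split(cmd)
print(quoted_args)  # ['grep', '-r', 'two words', '.']
```

In the script above, this would amount to adding import shlex and calling sp.run(shlex.split(cmd), cwd=working_dir, capture_output=True).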

mwiegand · January 7, 2023, 11:31am

Hi @carstenhaubold,

Apologies for the delayed reply. I had to finish off two other tasks first. Your solution, which is awesome, partially works but also reinforces the findings from:

The regular commands like ping or nc do work, however nmap fails with the error:

ERROR Python Script 4:852 Execute failed: FileNotFoundError: [Errno 2] No such file or directory: 'nmap'

The reason seems to be that Knime does not use the user’s PATH variable, which is baffling but might have a reasonable / simple explanation. Do you happen to know anything or have an idea which could resolve this mystery?

Best
Mike

AnotherFraudUser · January 10, 2023, 9:08pm

Hi @mwiegand, @iCFO,

It took a bit longer due to issues with the node deployment.
But I added a script execution node to the AF Utilities which hopefully covers at least the basic use cases of the Bash node
[Image]

I added environment preferences (env-prefs) to maybe also cover @mwiegand’s problem,
where you can change or add the environment settings used by the node (by default it reads all environment settings that are used by default)

But I have to say I did not use the old Bash node at all - so things could still function slightly differently…
(I also could not test whether the node functions correctly on macOS)

But in the long term I think it would be great if the External Tool node just covers all the requirements

The core is just simple java code with a few lines - so if you want I can also try to provide you with a snippet instead which you can tweak yourself

mwiegand · January 11, 2023, 8:04am

Dude, I have fallen madly in love with it at first sight! The exciting news is that defining the correct PATH variable works like a charm, as the commands formerly not found are now found (albeit not producing a result but responding with “wrong command format”).

I am extensively testing it at the moment but already have a little feedback:

A reset button for the Environment Preferences would be great. It would allow reverting undesirable settings (in case the workflow is shared) or bad mistakes (in case the user messed it up) without the need to start from scratch
I was in the 2nd tab and got this error which could be improved with a reference to the input field (in this case the 1st tab)
[Screenshot]
I tried an ls -la . but got this as a result. Not quite certain what the default directory is but there is also a presumably hard coded path. When the path was set to / it produced the expected output

[Screenshot]
When defining a relative path ~/ I got an error ERROR Execute Shell Script 3:855 Execute failed: Error while executing script:Cannot run program "sh" (in directory "~"): error=2, No such file or directory
Executing a working command ping -c 5 google.com (via other nodes) triggered a STOut error

[Screenshot]
printenv added a strange looking row to the output
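
On the ~/ error in the list above: the tilde is expanded by the shell, not by the operating system, so a process started with the literal working directory "~" fails. For the Python Script workaround posted earlier in the thread, a sketch of expanding it first could look like this. resolve_working_dir is a hypothetical helper name, and whether the Execute Shell Script node could do something similar internally is not known here.

```python
import os

def resolve_working_dir(path):
    """Expand a leading "~" and return an absolute path (hypothetical helper)."""
    return os.path.abspath(os.path.expanduser(path))

working_dir = resolve_working_dir("~/")
print(working_dir)  # normally the current user's home directory

# In the earlier script, the call itself would then stay unchanged:
# proc = sp.run(cmd.split(" "), cwd=working_dir, capture_output=True)
```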

[Screenshot]

Workflow was updated

Many many thanks & kind regards
Mike

carstenhaubold · January 11, 2023, 10:11am

Hi Mike,

Quick status update on the environment variables: I asked around and had a brief look, but so far I don’t know yet why processes started by KNIME do not inherit the environment variables that KNIME was started with.

One could obviously also specify environment variables for the process in the Python script that I provided, by passing the env parameter to the sp.run() call. But that would be adding another workaround. I’m trying to get to the root of the problem.
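
As a sketch of that workaround, assuming the missing tools live in directories such as /usr/local/bin or /opt/homebrew/bin (both are assumptions; adjust them to your system), the environment can be copied, PATH extended, and the result passed through the env parameter of subprocess.run:

```python
import os
import subprocess as sp
import sys

# Directories to prepend to PATH. These are assumptions, not values from the thread.
extra_dirs = ["/usr/local/bin", "/opt/homebrew/bin"]

# Start from the environment of the current process and extend PATH.
env = os.environ.copy()
inherited = [p for p in env.get("PATH", "").split(os.pathsep) if p]
env["PATH"] = os.pathsep.join(extra_dirs + inherited)

# Child process that only reports the PATH it sees; sys.executable is used
# so the sketch does not depend on any particular external tool.
proc = sp.run(
    [sys.executable, "-c", "import os; print(os.environ['PATH'])"],
    env=env,
    capture_output=True,
)
child_path = proc.stdout.decode().strip()
print(child_path)
```

In the script posted earlier, the equivalent change would be passing env=env to the existing sp.run call.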

Best,
Carsten

mwiegand · January 11, 2023, 12:18pm

You are already close to becoming the hero of the year for quite a few Knimers

mwiegand · January 11, 2023, 4:04pm

Hmm, no command seems to get accepted. Although valid, each command with arguments – I assume the space is causing some issues (dropped arguments?) – triggers the usual response for an incomplete command, as if just executing “ping” without anything else. The system is OSX.

Update: This raises the more generic question of whether Knime can be started in some sort of verbose mode to get more details, e.g. about each node execution?

carstenhaubold · January 11, 2023, 5:09pm

Aaah. Learn something new every day, they say…

This seems to be a security feature of OSX. When a program is started normally, meaning by double clicking the application or calling it via Spotlight, the PATH variable is cleared out by OSX.

However, if you start KNIME from a terminal, you can make it inherit the full environment like so:

navigate into your KNIME application’s Contents/MacOS folder with the terminal, from your output above this seems to be /Applications/KNIME.app/Contents/MacOS
run ./knime
try any of the solutions provided here

I tried it by running the env command using my Python script and via the terminal I did get my full PATH variable.
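
A quick way to check which situation applies, sketched with the Python standard library only: list the PATH entries the running process sees and ask shutil.which whether a tool can be found on them. The tool name nmap is taken from the error earlier in the thread; find_tool is a hypothetical helper name.

```python
import os
import shutil

def find_tool(tool):
    """Return the PATH entries of this process and the tool's location (or None)."""
    path_dirs = [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]
    return path_dirs, shutil.which(tool)

path_dirs, location = find_tool("nmap")
for directory in path_dirs:
    print("PATH entry:", directory)
print("nmap found at:", location)  # None means the tool is not on this PATH
```

Run from a Python Script node, the output can be compared between a KNIME started by double-click and one started from the terminal.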

mwiegand · January 12, 2023, 9:50am

That is hilarious. I don’t know if I should swear, just scream or laugh … maybe all at the same time. Anyways, how did you find that out? It somehow reminds me of the procedure Apple requires for unsigned applications, where, in order to open them the first time, you have to right-click on the app and choose Open on purpose.

Otherwise OSX will complain that the app is “broken”.

While navigating to the path I happened to notice the reason for that odd string found when executing the ls command. If no execution path is provided, Knime automatically assumes the path of its “runtime?”, which in my case is: “/Applications/KNIME.app/Contents/MacOS/”

To your solution: I can verify it is working. The PATH variable of my user is utilized! Hallelujah!

This also sheds some light on the unusual row (at the very end) being appended to the printenv output, but I still struggle to fully comprehend it. The printenv command formerly listed “_=/usr/bin/printenv” but now, when opening Knime via the terminal, it lists “_=/Applications/KNIME.app/Contents/MacOS/./knime”

Maybe our conversation is drifting too far away from the solution, your awesome new node. Shall we focus only on this, and if so, what would you consider necessary to close this off?

Again many thanks & kind regards
Mike