@ScottF / @SimonS - following up on our discussion here. I would like to address the H2O/Windows problem in a more systematic way.
I wrote a wrapper for H2O.ai’s AutoML to be used with KNIME. On macOS and Linux it seems to work. Under Windows I encounter a strange problem that I would like the community and the KNIME developers to know about.
The problem appears when one tries to load a Pandas data frame into an H2O data frame. It does work under:
but not wrapped in KNIME. And although the error message is related to H2O, the cause seems to be some implementation detail or version issue in KNIME Python scripting.
The workflow below illustrates this with a simple example that loads the iris dataset into Pandas and then further into H2O, and that also loads parquet files from the local disk directly into H2O (which should also be possible).
As stated, it does work with ‘pure’ Jupyter notebooks but not under KNIME. And yes, I have tried to set the encoding to UTF-8 in the KNIME ini.
The subfolder /script/ contains the Jupyter notebooks as well as a knime.log and debug log.
Looking at posts such as this or this suggests that this is because KNIME uses Windows cmd/batch files to launch Python. Apparently, the Windows command line does not fully support UTF-8 encoding by default.
In the above posts, there are some suggestions for enabling this support. These two seem most promising:
Switch the Windows console to UTF-8 mode via the command chcp 65001. If you are using the “Manual” mode on the KNIME Python preference page (i.e., you are using a start script to launch Python), you could try to prepend this line to the start script.
I have not tested any of the suggestions and would appreciate it if someone could try them out. Especially the first solution would be interesting since it requires no additional Python packages, and it is something that KNIME could do internally when the “Auto” option is selected on the KNIME Python preference page, so no user action would be required in the scripting nodes.
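To check whether a change such as the code page switch actually reaches the Python process, a small diagnostic along these lines could be run once in a plain Python session and once inside a KNIME Python scripting node. This is only a sketch: the helper name describe_encodings is made up for illustration, and it uses nothing beyond the Python standard library.

```python
import locale
import sys


def describe_encodings() -> dict:
    """Collect the encodings this Python process uses for text output."""
    return {
        # Encoding used when print() writes to standard output / error.
        "stdout": getattr(sys.stdout, "encoding", None),
        "stderr": getattr(sys.stderr, "encoding", None),
        # Default encoding Python derives from the OS locale settings.
        "preferred": locale.getpreferredencoding(False),
        # Encoding used for file names.
        "filesystem": sys.getfilesystemencoding(),
    }


for name, value in describe_encodings().items():
    print(f"{name}: {value}")
```

If the values reported inside KNIME differ from those in the plain session (for example a Windows code page instead of UTF-8), that would support the encoding explanation.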
@MarcelW thank you for your effort and the suggestions. I think they put me on the right track by making me actually study the error message and think about what is going on.
It was the progress bar in H2O that uses characters the KNIME/Windows console does not like, and your suggestions did not fix that. In the end I just introduced:
h2o.no_progress()
So no more progress bars inside the wrapper (they would not be of any real help there) and now it does work. It still might be a point for KNIME or H2O to make sure that these progress bars with their (block?) characters do not interfere with scripts run from within KNIME.
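To illustrate the kind of failure described here, the following minimal sketch checks whether a progress-bar-like string can be written in a given encoding. It rests on two assumptions that are not confirmed for H2O or KNIME: that the bar is drawn with the full-block character U+2588, and that the output stream uses the Windows code page cp1252. The helper can_encode is hypothetical and only uses the standard library.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` can be written using `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Assumed shape of a progress bar line; the real H2O output may differ.
progress_bar = "Parse progress: |" + "\u2588" * 10 + "| 100%"

print(can_encode(progress_bar, "utf-8"))             # True
print(can_encode(progress_bar, "cp1252"))            # False
print(can_encode("Parse progress: 100%", "cp1252"))  # True
```

Under these assumptions, only the block characters are the problem, which would be consistent with the error disappearing once the progress bar is switched off.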
I will mark this as solved when I have done some more tests.
@ScottF I tried several things along the lines of @MarcelW's ideas, but to no avail. I think it would be good if the Windows console (call?) could handle UTF-8 characters, but I am not sure how to achieve that. If KNIME could come up with a standard solution that would be good - but maybe not the highest priority.
Having said that - I see an increased interest in ‘wrapped’ solutions with KNIME, where additional functions / packages in R and Python are made available through KNIME, so stability in these wrapped nodes matters.
Admittedly using strange block characters as a progress bar is not your common problem with commands at the console.