Python Script and H2O Data Frames - Error under Windows

@ScottF / @SimonS - following up on our discussion here. I would like to address the H2O/Windows problem in a more systematic way.

I wrote a wrapper for H2O.ai’s AutoML to be used with KNIME. On MacOSX and Linux it seems to work. Under Windows I encounter a strange problem that I would like the community and the KNIME developers to know about.

The problem appears when one tries to load a Pandas data frame into an H2O data frame. It does work under:

  • MacOSX wrapped with KNIME Python Script
  • MacOSX in Jupyter notebook
  • Windows in Jupyter notebook

but not on Windows when wrapped in KNIME. And although the error message is related to H2O, the cause seems to be some implementation detail or version issue in KNIME Python scripting.

The workflow below illustrates this with a simple example that loads the iris dataset into Pandas and then into H2O, and that also loads parquet files from the local disk directly into H2O (which should also be possible).

As stated, it does work with ‘pure’ Jupyter notebooks but not under KNIME. And yes, I have tried to set the encoding to UTF-8 in the KNIME ini :slight_smile:

The subfolder /script/ contains the Jupyter notebooks as well as a knime.log and debug log.

Here is a reduced workflow to illustrate the problem:

Hi @mlauber71 -

Thanks for this. I’ll ask internally and see if we can’t figure out why this is happening. Or maybe @SimonS has an idea already. :slight_smile:

Hi guys,

Looking at posts such as this or this suggests that this is because KNIME uses Windows cmd/batch files to launch Python. Apparently, the Windows command line does not fully support UTF-8 encoding by default.

In the above posts, there are some suggestions for enabling this support. These two seem most promising:

  • Turn the Windows console into UTF-8 mode via the command chcp 65001. If you are using the “Manual” mode on the KNIME Python preference page (i.e., you are using a start script to launch Python), you could try to prepend this line to the start script.
  • Or, install win_unicode_console in your Python environment and prepend
    import win_unicode_console
    win_unicode_console.enable()
    
    to the Python script inside the Python node.

I have not tested any of the suggestions and would appreciate it if someone could try them out. Especially the first solution would be interesting since it requires no additional Python packages, and it is something that KNIME could do internally when the “Auto” option is selected on the KNIME Python preference page, so no user action would be required in the scripting nodes.
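To check whether either suggestion changes anything, it can help to print the encodings that the Python process itself reports. The following is only a minimal sketch; the helper name `describe_output_encoding` is made up for illustration, and `getattr` is used on the assumption that a host application may replace `sys.stdout` with an object that has no `encoding` attribute.

```python
import locale
import sys


def describe_output_encoding():
    """Collect the encodings Python reports for its output streams."""
    return {
        # getattr guards against replacement stream objects that do not
        # expose an 'encoding' attribute (an assumption about the host).
        "stdout": getattr(sys.stdout, "encoding", None),
        "stderr": getattr(sys.stderr, "encoding", None),
        # Encoding Python picks for text I/O when none is given explicitly.
        "preferred": locale.getpreferredencoding(False),
        "filesystem": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for name, value in describe_output_encoding().items():
        print(f"{name}: {value}")
```

Running this once in a Jupyter notebook and once inside the KNIME Python node on the same Windows machine should show whether the two environments really differ in their output encoding.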

Marcel

@MarcelW thank you for your effort and the suggestions. I think they put me on the right track by getting me to actually study the error message and think about what is going on.

It was the progress bar in H2O that used characters the KNIME/Windows console did not like, and I could not fix this using your suggestions. In the end I just introduced:

h2o.no_progress()

So there are no more progress bars inside the wrapper (they would not be of any real help anyway) and now it works. It still might be a point for KNIME or H2O to make sure that these progress bars with their (block?) characters do not interfere with scripts run from within KNIME.
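The failure mode can be reproduced without H2O or KNIME. The sketch below rests on two assumptions that are not confirmed by the logs in this thread: that the progress bar is drawn with the full block character U+2588, and that the stream receiving the output uses the Windows code page cp1252. The helper name `can_encode` is made up for illustration.

```python
# Assumption: the progress bar is drawn with the full block character.
BAR = "\u2588" * 10


def can_encode(text, encoding):
    """Return True if text can be written to a stream using this encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # cp1252 is an assumed stand-in for the encoding of the piped output.
    for encoding in ("utf-8", "cp1252"):
        print(encoding, can_encode(BAR, encoding))
    # With errors="replace" the write no longer fails; the blocks
    # are replaced by question marks instead.
    print(BAR.encode("cp1252", errors="replace"))
```

If these assumptions hold, it would explain why suppressing the progress bar is enough: once the block characters are never printed, nothing is left that the output stream cannot encode.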

I will mark this as solved when I have done some more tests.

@MarcelW - Just as a quick follow-up here, I gave both of these options a try. On my system:

  • using chcp 65001 seems to have no effect that I can tell
  • win_unicode_console.enable() fails with error '_Logger' object has no attribute 'encoding' before I even get to executing the rest of the script

I’m happy to follow up on either of these if you think it’s worth the time. But @mlauber71’s fix for the progress bar seems to work fine regardless.

@ScottF I tried several things along the lines of @MarcelW’s ideas, but to no avail. I think it would be good if the Windows console (call?) could handle UTF-8 characters, but I am not sure how to achieve that. If KNIME could come up with a standard solution that would be good - but maybe not the highest priority.

Having said that - I am seeing increased interest in ‘wrapped’ solutions with KNIME, where additional functions / packages in R and Python are made available through KNIME, so stability in these wrapped nodes matters.

Admittedly using strange block characters as a progress bar is not your common problem with commands at the console.
