Python Source node bugged

Niklas · July 25, 2018, 2:57pm

Hi,

I want to use the Python Source node to read .npz (numpy array) files.

My code is

import pandas as pd
import numpy as np
path = '…/test_batches_casefeat_punct_unk.npz'
# Newer NumPy versions require allow_pickle=True to load object arrays.
test_batches = np.load(path, encoding='latin1', allow_pickle=True)['arr_0']
output_table = pd.DataFrame(data=test_batches)

If I call

print(output_table)

within the Python Source node, I see the correct output:

                                                0                        ...                                                                          3
0 [[112], [2790], [3495], [7564], [549498], [217… … [[[4]], [[4]], [[4]], [[6]], [[2]], [[6]], [[4…
1 [[326226, 394315], [291987, 65], [9002, 183], … … [[[5], [1]], [[4], [4]], [[4], [4]], [[4], [4]…
2 [[531509, 65, 16583], [21, 2837, 65], [864, 75… … [[[4], [4], [4]], [[4], [4], [4]], [[6], [0], …
3 [[5821, 22986, 134597, 1], [66489, 65, 6100, 2… … [[[6], [4], [5], [1]], [[4], [4], [4], [4]], […
4 [[23548, 9, 4, 1732, 5240], [6517, 34, 33322, … … [[[4], [4], [4], [7], [8]], [[6], [4], [5], [1…
5 [[118079, 12205, 34, 926, 34, 15753], [8200, 9… … [[[6], [4], [0], [0], [0], [0]], [[5], [1], [4…
6 [[25385, 13577, 77510, 2307, 21, 2837, 65], [2… … [[[4], [4], [4], [4], [4], [4], [4]], [[5], [1…

These are 4 columns, 66 rows, and a lot of numbers.

Unfortunately, when I connect the node's output to another node, the output_table (which is now that node's input table, I guess) looks like this:

       0      1      2      3
Row0 True True True True
Row1 True True True True
Row2 True True True True
Row3 True True True True
…

So somehow the DataFrame gets “boolean-ized”.

Why is that and can I solve it somehow?

MarcelW · July 25, 2018, 8:45pm

Hi Niklas,

Could you upload the .npz file (or a representative subset of it if it’s too large)? Pandas sometimes does some implicit column type magic that is not expected by KNIME.
Thanks!

Marcel

Niklas · July 26, 2018, 6:45am

Hi Marcel,

sure, I can upload it. It is a zipped file because I was not allowed to upload the .npz file:

test_batches_casefeat_punct_unk.zip (453.1 KB)

MarcelW · July 30, 2018, 12:12pm

Hi Niklas,

The numpy array you’re using is deeply nested. This is not supported by our Python integration because KNIME largely works with flat tables (please also see beginner’s posts in your other thread: How to get an numpy array as output?).
I’m not quite sure why the output DataFrame you created is interpreted as a table full of booleans. Trying to transfer the table from Python to Java should actually result in a runtime error. So thanks for reporting that!
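
To check whether a DataFrame contains such nested cells before handing it to KNIME, something like the following could be used. This is only a minimal sketch: the helper name nested_columns and the toy table are made up for illustration and are not part of the KNIME Python integration.

```python
import numpy as np
import pandas as pd


def nested_columns(df):
    """Return the labels of columns whose cells hold lists, tuples or arrays."""
    nested = []
    for col in df.columns:
        # Only object-dtype columns can carry sequences in their cells.
        if df[col].dtype != object:
            continue
        if df[col].map(lambda v: isinstance(v, (list, tuple, np.ndarray))).any():
            nested.append(col)
    return nested


# Toy stand-in for the real data: one nested column and one flat column.
toy = pd.DataFrame({
    "tokens": [[[112], [2790]], [[326226, 394315]]],
    "length": [1, 2],
})
print(nested_columns(toy))
```

Any column reported by this check would need to be flattened before the table is passed on.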

To get your data to Java, you’d need to flatten the numpy array somehow. I’ll gladly help you with that.
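
One possible way to flatten, as a rough sketch rather than the definitive solution: turn the nested array into a "long" table with one row per scalar value, and keep the position inside the nesting as a plain string. The function names, the column names and the toy array below are assumptions for illustration; with the real data, the 2-D object array loaded from the .npz file would be passed in instead of toy.

```python
import numpy as np
import pandas as pd


def iter_scalars(value, path=()):
    """Yield (index_path, scalar) pairs for arbitrarily nested sequences."""
    if isinstance(value, np.ndarray) and value.ndim == 0:
        # A 0-d array wraps a single scalar and cannot be iterated.
        yield path, value.item()
    elif isinstance(value, (list, tuple, np.ndarray)):
        for i, item in enumerate(value):
            yield from iter_scalars(item, path + (i,))
    else:
        yield path, value


def flatten_to_long(nested):
    """Build a flat table with one row per scalar found in a 2-D object array."""
    records = []
    for (row, col), cell in np.ndenumerate(nested):
        for path, scalar in iter_scalars(cell):
            records.append({
                "row": row,
                "column": col,
                # Position inside the cell, e.g. "1/0" for cell[1][0].
                "position": "/".join(str(i) for i in path),
                "value": scalar,
            })
    return pd.DataFrame(records, columns=["row", "column", "position", "value"])


# Toy stand-in for the real array: 2 rows x 2 columns of nested lists.
toy = np.empty((2, 2), dtype=object)
toy[0, 0] = [[112], [2790]]
toy[0, 1] = [[[4]], [[6]]]
toy[1, 0] = [[326226, 394315]]
toy[1, 1] = [[[5], [1]]]

output_table = flatten_to_long(toy)
print(output_table)
```

The resulting table contains only integer and string columns, so it is flat in the sense described above. Whether the long format suits the downstream nodes depends on the workflow; the original nesting can be rebuilt from the row, column and position columns if needed.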

Marcel