Hi,
I want to use the Python Source node to read .npz (numpy array) files.
My code is:
import pandas as pd
import numpy as np
path = '…/test_batches_casefeat_punct_unk.npz'
test_batches = np.load(path, encoding='latin1')['arr_0']
output_table = pd.DataFrame(data=test_batches)
If I call
print(output_table)
within the Python Source node, I see the correct output:
0 ... 3
0 [[112], [2790], [3495], [7564], [549498], [217… … [[[4]], [[4]], [[4]], [[6]], [[2]], [[6]], [[4…
1 [[326226, 394315], [291987, 65], [9002, 183], … … [[[5], [1]], [[4], [4]], [[4], [4]], [[4], [4]…
2 [[531509, 65, 16583], [21, 2837, 65], [864, 75… … [[[4], [4], [4]], [[4], [4], [4]], [[6], [0], …
3 [[5821, 22986, 134597, 1], [66489, 65, 6100, 2… … [[[6], [4], [5], [1]], [[4], [4], [4], [4]], […
4 [[23548, 9, 4, 1732, 5240], [6517, 34, 33322, … … [[[4], [4], [4], [7], [8]], [[6], [4], [5], [1…
5 [[118079, 12205, 34, 926, 34, 15753], [8200, 9… … [[[6], [4], [0], [0], [0], [0]], [[5], [1], [4…
6 [[25385, 13577, 77510, 2307, 21, 2837, 65], [2… … [[[4], [4], [4], [4], [4], [4], [4]], [[5], [1…
The table has 4 columns and 66 rows, and each cell holds a nested list of numbers.
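To make the structure concrete, here is a minimal sketch that rebuilds a table of the same shape from made-up values and inspects it. The values and the names `rows`, `batches`, and `table` are placeholders, not the contents of the real file. The point is only that every column has dtype object and every cell is a Python list, which may be relevant to what happens downstream.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the real .npz contents: an object array whose
# cells are nested lists of varying length (values are hypothetical).
rows = [
    [[[112], [2790]], [[[4]], [[4]]]],
    [[[326226, 394315], [291987, 65]], [[[5], [1]], [[4], [4]]]],
]
batches = np.empty((len(rows), 2), dtype=object)
for i, row in enumerate(rows):
    for j, cell in enumerate(row):
        batches[i, j] = cell  # store each nested list as a single object

table = pd.DataFrame(data=batches)

# Every column is a generic "object" column, not a numeric one.
column_dtypes = table.dtypes.tolist()
# Collect the Python type names of the individual cells.
cell_types = {type(cell).__name__ for cell in table.to_numpy().ravel()}

print(column_dtypes)
print(cell_types)
```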
Unfortunately, if I connect the Python Source node to another node, the output_table (which is now that node's input table, I guess) looks like this:
0 1 2 3
Row0 True True True True
Row1 True True True True
Row2 True True True True
Row3 True True True True
…
So somehow the DataFrame gets “boolean-ized”.
Why is that, and can I solve it somehow?
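One workaround that might be worth trying, as an untested sketch: turn every cell into a JSON string before assigning `output_table`, so that each column holds plain strings instead of nested lists. This rests on the assumption that the conversion of object columns with nested lists is what goes wrong, which is only a guess. The helper names `to_plain` and `cells_to_json` and the demo values are hypothetical.

```python
import json

import numpy as np
import pandas as pd


def to_plain(value):
    """Recursively convert NumPy arrays/scalars into plain Python objects."""
    if isinstance(value, np.ndarray):
        return to_plain(value.tolist())
    if isinstance(value, (list, tuple)):
        return [to_plain(v) for v in value]
    if isinstance(value, np.generic):
        return value.item()
    return value


def cells_to_json(table):
    """Return a new DataFrame in which every cell is a JSON string."""
    return table.apply(lambda col: col.map(lambda c: json.dumps(to_plain(c))))


# Small made-up table with nested cells (hypothetical values).
cells = np.empty((2, 2), dtype=object)
cells[0, 0] = [[112], [2790]]
cells[0, 1] = np.array([[4], [4]])
cells[1, 0] = [[326226, 394315], [291987, 65]]
cells[1, 1] = [[5], [1]]
demo = pd.DataFrame(data=cells)

as_text = cells_to_json(demo)
print(as_text)

# The nested structure can be recovered later with json.loads.
restored = json.loads(as_text.iloc[0, 0])
```

In the Python Source node this would amount to something like `output_table = cells_to_json(pd.DataFrame(data=test_batches))`, with the strings parsed back in whichever later node needs the numbers.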