I have two workflows. In the first, a Container Input (Table) node receives a pandas DataFrame via knimepy; in the second, a CSV Reader node reads the same table stored as a CSV file. When I inspect the outputs of the two reader nodes, the tables look identical. However, in the first workflow the GroupBy node drops several important columns, whereas in the second workflow it retains them. If I swap the reader nodes between the workflows, I get the same results. I need to figure out how to avoid dropping columns when using the Container Input. Can somebody please help me?
Are you saying that the output tables from the CSV Reader and Container Input (Table) nodes are the same, yet an identically configured GroupBy node produces different output? Have you tried running the Table Difference Finder node to make sure the output from the two reader nodes is the same? Maybe the column types are different…
Yes, that is exactly what I am saying. The GroupBy node is the same node in both workflows, just with input changed between the two. I used the Table Difference Finder as you suggested, and the result contains about 900 rows. However, the ‘Value Compared Table’ and the ‘Value Reference Table’ columns are identical.
However, now that you mention types, I notice that the Container Input table uses longs where the CSV Reader table uses ints. This is strange, because the CSV Reader reads the file generated by a call to df.to_csv(), and df is the same DataFrame passed into the knimepy code.
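A possible explanation, offered as a sketch rather than a confirmed diagnosis: pandas stores integer columns as 64-bit `int64` by default, which would plausibly arrive in KNIME as Long, whereas the CSV Reader infers types from the text and picks Int when the values fit. The snippet below uses only pandas and NumPy to inspect the dtypes and downcast `int64` columns to `int32` before the DataFrame is handed to knimepy. The sample DataFrame, its column names, and the helper `downcast_int64_columns` are hypothetical, and whether knimepy maps `int32` to a KNIME Int column is an assumption to verify on the Container Input output.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real DataFrame passed to knimepy.
df = pd.DataFrame({
    "group": ["a", "b", "a"],
    "count": pd.Series([1, 2, 3], dtype="int64"),
    "big": pd.Series([1, 2, 2**40], dtype="int64"),
    "value": [0.5, 1.5, 2.5],
})

# Inspect the pandas dtypes; integer columns here are int64 (64-bit).
print(df.dtypes)


def downcast_int64_columns(frame):
    """Return a copy in which int64 columns are cast to int32,
    but only when every value fits in the 32-bit range."""
    out = frame.copy()
    limits = np.iinfo(np.int32)
    for name in out.select_dtypes(include=["int64"]).columns:
        col = out[name]
        if col.min() >= limits.min and col.max() <= limits.max:
            out[name] = col.astype("int32")
    return out


df32 = downcast_int64_columns(df)
print(df32.dtypes)

# Assumption: df32 would then be passed to knimepy in place of df,
# in the same way the original DataFrame was passed.
```

If the downcast columns then show up as Int in the Container Input output, rerunning the GroupBy node would show whether the type difference was the cause of the dropped columns.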