Force KNIME to Read CSV Columns As Strings

@mwiegand that should also be possible. It took ChatGPT only about 4 tries to put this together :slight_smile: and there may well be more data types out there that this does not cover, which also shows the limitations of this automatic programming approach …

import ast

import pandas as pd
import pyarrow.parquet as pq

# Create a sample CSV file
csv_data = '''
col1;col2;col3;col4;col5;col6;col7
1.0;true;2020-01-01;1:00:00;cat1;434;{'a', 'b', 'c'}
2.5;false;2020-02-01;2:00:00;cat2;554;{'d', 'e', 'f'}
3.0;true;2020-03-01;3:00:00;cat3;677;{'g', 'h', 'i'}
'''

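# var_csv_file is not defined in this snippet; in the KNIME workflow it is
# presumably set beforehand. The path below is only a placeholder.
var_csv_file = 'sample.csv'

with open(var_csv_file, 'w') as f:
    f.write(csv_data)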

# Read the CSV file and specify the data types of the columns
df = pd.read_csv(
    var_csv_file,
    sep=';',
    dtype={'col1': 'float', 'col2': 'bool', 'col5': 'category', 'col6': 'int64', 'col7': 'str'},
    parse_dates=['col3'],
)

# Convert the col4 column to a timedelta object
df['col4'] = pd.to_timedelta(df['col4'])

# Parse the 'col7' strings into Python sets
df['col7'] = df['col7'].apply(ast.literal_eval)

# View the data types of the columns
print(df.dtypes)

I put this in a KNIME workflow: "import complex CSV file and force all columns as strings using the bundled python version" (KNIME Community Hub).
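For the "force all columns as strings" part, here is a minimal sketch of how that can look in plain pandas. It is not taken from the workflow itself: the sample data and the names `csv_text` and `df_str` are made up for illustration, and it assumes a semicolon-separated file like the one above. Passing `dtype=str` tells `read_csv` not to infer any types, and `keep_default_na=False` keeps empty cells as empty strings instead of NaN.

```python
import io

import pandas as pd

# Hypothetical sample data, loosely mirroring the CSV layout used above.
csv_text = """col1;col2;col3
1.0;true;2020-01-01
2.5;false;
"""

# dtype=str disables type inference, so every column is read as text.
# keep_default_na=False keeps empty cells as '' instead of converting them to NaN.
df_str = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",
    dtype=str,
    keep_default_na=False,
)

print(df_str.dtypes)
print(df_str)
```

Any type conversion can then be done explicitly afterwards, column by column, which avoids surprises from automatic type guessing.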
