Change column data types with the Python Script node

alimly13 · January 13, 2024, 7:47pm

Hello, I used a CSV Reader as the first node and a Python Script node as the second node to change the data types of my columns. I typed the code below, but nothing changes in the output table. According to the example, the data type of the last column should change, but it doesn't. What should I do?

# Copy input to output
output_table_1 = input_table_1.copy()
import pandas as pd

def analyze_data(df):
    results = []
    for column in df.columns:
        int_count = 0
        float_count = 0
        string_count = 0
        for index, row in df.iterrows():
            value = row[column]
            if pd.notnull(value):
                try:
                    value = int(value)
                    int_count += 1
                except ValueError:
                    try:
                        value = float(value)
                        float_count += 1
                    except ValueError:
                        string_count += 1
        total = int_count + float_count + string_count
        if total != 0:
            int_percentage = int_count/total*100
            float_percentage = float_count/total*100
            string_percentage = string_count/total*100
            max_percentage = max(int_percentage, float_percentage, string_percentage)
            if max_percentage == int_percentage and max_percentage == float_percentage and max_percentage == string_percentage:
                results.append((column, "string", int_percentage, float_percentage, string_percentage))
            elif max_percentage == int_percentage and max_percentage == float_percentage:
                results.append((column, "float", int_percentage, float_percentage, string_percentage))
            elif max_percentage == string_percentage and max_percentage == float_percentage:
                results.append((column, "string", int_percentage, float_percentage, string_percentage))
            elif max_percentage == int_percentage and max_percentage == string_percentage:
                results.append((column, "string", int_percentage, float_percentage, string_percentage))
            elif max_percentage == int_percentage:
                results.append((column, "int", int_percentage, float_percentage, string_percentage))
            elif max_percentage == float_percentage:
                results.append((column, "float", int_percentage, float_percentage, string_percentage))
            else:
                results.append((column, "string", int_percentage, float_percentage, string_percentage))
        else:
            results.append((column, "No data", 0, 0, 0))
    return results

results = analyze_data(input_table_1.copy())
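A note on the counting loop above: `int(value)` does not raise for values that are already floats, it truncates them (`int(2.5)` is `2`). If the CSV Reader has already typed a column as numbers, those cells typically reach pandas as floats, so the loop counts them as int. A minimal stricter classifier might look like this; `classify_value` is a hypothetical helper, and the rule that whole numbers such as `7.0` count as int is an assumption:

```python
import math


def classify_value(value):
    """Classify one non-null cell value as 'int', 'float' or 'string'.

    Assumed rule: whole numbers (7, "7", 7.0) are 'int', other finite numbers
    are 'float', and everything else is 'string'.
    """
    # bool is a subclass of int in Python; treat True/False as non-numeric here
    if isinstance(value, bool):
        return "string"
    try:
        # float() accepts ints, floats and numeric text such as "4.2" or " 42 "
        number = float(value)
    except (TypeError, ValueError):
        return "string"
    # "nan" and "inf" parse as floats but are not useful numeric data
    if not math.isfinite(number):
        return "string"
    # unlike int(2.5), which silently returns 2, this keeps 2.5 as 'float'
    return "int" if number.is_integer() else "float"
```

Counting `classify_value(value)` per column could then replace the nested try/except blocks. The trade-off of routing everything through `float` is that very large integers (beyond 2**53) lose precision.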

mlauber71 · January 14, 2024, 7:21am

@alimly13 welcome to the KNIME forum. Could you check whether you actually pass the result table back to KNIME?

Maybe you could provide us with a sample that does not reveal any secrets.

Also, as a hint: you can use preformatted code blocks to make Python code easier to read.

alimly13 · January 15, 2024, 4:37pm

I came up with this code and tried what you suggested, but it gives me an error saying it doesn't recognize knime. I did the setup on my side and installed all the Python extensions. What should I do? Is there a problem with my code?

import pandas as pd
from dateutil.parser import parse
import knime.scripting.io as knio

file_path = r'C:\Users\Alireza\Desktop\Machine1.csv'
df = pd.read_csv(file_path)

def is_date(string, fuzzy=False):
    try:
        parse(string, fuzzy=fuzzy)
        return True
    except ValueError:
        return False

def analyze_data(df):
    results = []
    for column in df.columns:
        int_count = 0
        float_count = 0
        string_count = 0
        date_count = 0
        for index, row in df.iterrows():
            value = row[column]
            if pd.notnull(value):
                try:
                    value = int(value)
                    int_count += 1
                except ValueError:
                    try:
                        value = float(value)
                        float_count += 1
                    except ValueError:
                        if is_date(value):
                            date_count += 1
                        else:
                            string_count += 1
        total = int_count + float_count + string_count + date_count
        if total != 0:
            int_percentage = int_count/total*100
            float_percentage = float_count/total*100
            string_percentage = string_count/total*100
            date_percentage = date_count/total*100
            max_percentage = max(int_percentage, float_percentage, string_percentage, date_percentage)
            if max_percentage == int_percentage:
                df[column] = df[column].astype(int, errors='ignore')
                results.append((column, "int", int_percentage, float_percentage, string_percentage, date_percentage))
            elif max_percentage == float_percentage:
                df[column] = df[column].astype(float, errors='ignore')
                results.append((column, "float", int_percentage, float_percentage, string_percentage, date_percentage))
            elif max_percentage == date_percentage:
                df[column] = pd.to_datetime(df[column], errors='ignore')
                results.append((column, "date", int_percentage, float_percentage, string_percentage, date_percentage))
            else:
                df[column] = df[column].astype(str, errors='ignore')
                results.append((column, "string", int_percentage, float_percentage, string_percentage, date_percentage))
        else:
            results.append((column, "No data", 0, 0, 0, 0))
    df = knio.Table.from_pandas(df)
    df.to_csv(file_path, index=False)
    return df, results

df, df_results = analyze_data(df)
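In this version, `df` is replaced by `knio.Table.from_pandas(df)` before `df.to_csv(...)` is called. As far as I know, that returns a KNIME table wrapper rather than a pandas DataFrame, so `to_csv` (a DataFrame method) is likely to fail there; and `import knime.scripting.io` only works inside the KNIME Python Script node, which would fit the "doesn't recognize knime" error when the script runs elsewhere. A minimal pandas-only sketch, assuming the goal is to convert numeric-looking text columns and write a CSV; `convert_if_clean` and `convert_columns` are hypothetical helpers, and an in-memory buffer stands in for the real file path:

```python
import io

import pandas as pd


def convert_if_clean(series):
    """Return the column as numbers if every non-null value parses; otherwise unchanged."""
    converted = pd.to_numeric(series, errors="coerce")
    # cells that had a value before but are NaN now could not be parsed as numbers
    failed = series.notna() & converted.isna()
    return series if failed.any() else converted


def convert_columns(df):
    """Apply convert_if_clean to every column of a copy of df."""
    out = df.copy()
    for column in out.columns:
        out[column] = convert_if_clean(out[column])
    return out


# An in-memory CSV stands in for the real file path (hypothetical sample data)
source = io.StringIO("id,value,label\n1,2.5,a\n2,3.0,b\n")
df = pd.read_csv(source, dtype=str)  # read everything as text, then decide per column
converted = convert_columns(df)
print(converted.dtypes)

# to_csv belongs to the pandas DataFrame, so write the file from the DataFrame,
# not from the object returned by knio.Table.from_pandas
target = io.StringIO()
converted.to_csv(target, index=False)
```

Converting a column only when every value parses avoids silently turning the odd text cell into a missing value; the cost is that one typo keeps the whole column as text.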

mlauber71 · January 15, 2024, 5:26pm

It might be best to read about how to set up KNIME and Python. There is this guide:

https://docs.knime.com/latest/python_installation_guide/index.html#_introduction

Also, you might want to check the links I provided. Maybe start with a few examples from the KNIME Hub to see how KNIME and Python work together.

alimly13 · January 15, 2024, 11:57pm

Hi again, my friend. I have read the guide you mentioned completely and followed all of it after installing KNIME. What else can I do to solve my problem? Or could you give me your Telegram ID so I can message you there, and you could guide me using AnyDesk?

steffen_KNIME · January 16, 2024, 8:35am

Dear @alimly13,

Nice to see that you are working with the Python nodes.

@mlauber71 suggested several things that you could still try:

(1) provide us with a small workflow showing the error
(2) format the code in your posts (unfortunately, what you sent is unreadable).

Furthermore, (3) when you put a new Python Script node into your workflow, its default script should give you a working example, which you can then (4) extend to your own needs. (5) Then you can post a screenshot of the error.

I suggest having another look at steps (1)-(5); without them, it is difficult for us to help you.

Best regards
Steffen

alimly13 · January 16, 2024, 3:35pm

See, I set this mode in the Python settings in KNIME, and now when I run this code, it gives me this error.

I put my script code in the Python Script node and it gives me this error.

alimly13 · January 16, 2024, 9:22pm

Hi, in the first node I used a CSV Reader, and in the second node a Python Script node with the code below, but it doesn't change the data types. What should I do to get the changed data types in my output?

import pandas as pd
from dateutil.parser import parse
import knime.scripting.io as knio

df = knio.input_tables[0].to_pandas()

def is_date(string, fuzzy=False):
    try:
        parse(string, fuzzy=fuzzy)
        return True
    except ValueError:
        return False

def analyze_data(df):
    results = []
    for column in df.columns:
        int_count = 0
        float_count = 0
        string_count = 0
        date_count = 0
        for index, row in df.iterrows():
            value = row[column]
            if pd.notnull(value):
                try:
                    value = int(value)
                    int_count += 1
                except ValueError:
                    try:
                        value = float(value)
                        float_count += 1
                    except ValueError:
                        if is_date(value):
                            date_count += 1
                        else:
                            string_count += 1
        total = int_count + float_count + string_count + date_count
        if total != 0:
            int_percentage = int_count/total*100
            float_percentage = float_count/total*100
            string_percentage = string_count/total*100
            date_percentage = date_count/total*100
            max_percentage = max(int_percentage, float_percentage, string_percentage, date_percentage)
            if max_percentage == int_percentage:
                df[column] = df[column].astype(int, errors='ignore')
                results.append((column, "int", int_percentage, float_percentage, string_percentage, date_percentage))
            elif max_percentage == float_percentage:
                df[column] = df[column].astype(float, errors='ignore')
                results.append((column, "float", int_percentage, float_percentage, string_percentage, date_percentage))
            elif max_percentage == date_percentage:
                df[column] = pd.to_datetime(df[column], errors='ignore')
                results.append((column, "date", int_percentage, float_percentage, string_percentage, date_percentage))
            else:
                df[column] = df[column].astype(str, errors='ignore')
                results.append((column, "string", int_percentage, float_percentage, string_percentage, date_percentage))
        else:
            results.append((column, "No data", 0, 0, 0, 0))
    df = knio.Table.from_pandas(df)
    return df, results

df, df_results = analyze_data(df)
knio.output_tables[0] = df
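In this version the percentages are returned as `df_results`, which is never sent to an output port, and `df.iterrows()` runs once per column, which gets slow on larger tables. A hedged sketch of the same int/float/string analysis with one vectorized `pd.to_numeric(..., errors="coerce")` call per column follows; `type_shares` is a hypothetical name, dates are left out, and counting whole numbers such as `3.0` as int is an assumed rule:

```python
import pandas as pd


def type_shares(df):
    """Per column: percentage of non-null cells that are whole numbers, other numbers, or text."""
    rows = []
    for column in df.columns:
        values = df[column].dropna()
        total = len(values)
        if total == 0:
            rows.append((column, "No data", 0.0, 0.0, 0.0))
            continue
        # one vectorized parse per column instead of int()/float() per row via iterrows()
        numeric = pd.to_numeric(values, errors="coerce")
        is_number = numeric.notna()
        is_whole = is_number & (numeric % 1 == 0)
        int_pct = is_whole.sum() / total * 100
        float_pct = (is_number & ~is_whole).sum() / total * 100
        string_pct = (~is_number).sum() / total * 100
        # ties resolve in the order string > float > int, mirroring the original if/elif chain
        shares = {"string": string_pct, "float": float_pct, "int": int_pct}
        dominant = max(shares, key=shares.get)
        rows.append((column, dominant, int_pct, float_pct, string_pct))
    return pd.DataFrame(
        rows,
        columns=["Column", "Type", "Int Percentage", "Float Percentage", "String Percentage"],
    )
```

Its result is an ordinary DataFrame, so inside the node it could be sent to an output port via `knio.Table.from_pandas`.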

mlauber71 · January 16, 2024, 10:30pm

@alimly13 I think the suggestion from the other thread still stands, and I would encourage you to familiarise yourself with how KNIME and Python work together.

The first thing I notice is that you might want to check how data comes in and goes out. A simple example would look like this:

import knime.scripting.io as knio

import numpy as np
import pandas as pd

# data from KNIME to Python (pandas)
df = knio.input_tables[0].to_pandas()

# data from python (pandas) to KNIME
knio.output_tables[0] = knio.Table.from_pandas(df)

This will not work:

If you could provide some sample data, one could try to check the Python syntax itself. You could also use ChatGPT these days.
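Following the in/out pattern above, any conversion has to be applied to the pandas DataFrame between the input line and the output line, and that converted DataFrame is what gets wrapped with `knio.Table.from_pandas`. If the target types are known in advance, an explicit mapping is often simpler than inferring them. A sketch under that assumption; the column names in `TARGET_TYPES` and the helper `apply_types` are hypothetical, and the KNIME lines are comments because they only run inside the node (whether a pandas dtype such as `Int64` maps cleanly to a KNIME column type may depend on your KNIME version):

```python
import pandas as pd

# Hypothetical target types; replace the keys with your own column names
TARGET_TYPES = {"amount": "float", "count": "int", "when": "datetime"}


def apply_types(df, target_types):
    """Return a copy of df with the listed columns converted; other columns are untouched."""
    out = df.copy()
    for column, kind in target_types.items():
        if kind == "int":
            # nullable Int64 keeps missing values instead of failing on NaN
            out[column] = pd.to_numeric(out[column]).astype("Int64")
        elif kind == "float":
            out[column] = pd.to_numeric(out[column]).astype("float64")
        elif kind == "datetime":
            out[column] = pd.to_datetime(out[column])
        else:
            out[column] = out[column].astype("string")
    return out


# Inside the Python Script node (assumed, not run here):
# df = knio.input_tables[0].to_pandas()
# knio.output_tables[0] = knio.Table.from_pandas(apply_types(df, TARGET_TYPES))
```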

ScottF · January 17, 2024, 1:44am

Hi @alimly13 -

I merged your new thread back into the older one. Please help us keep the forum tidy, and don’t create multiple threads for the same issue.

rfeigel · January 17, 2024, 4:15pm

I’m curious about why you’re trying to use Python to change data types. There are native KNIME nodes that will do that.

alimly13 · January 27, 2024, 9:12pm

I also tried this code, but the data types of my columns did not change. What should I do?
import pandas as pd
import knime.scripting.io as knio
df = knio.input_tables[0].to_pandas()

def analyze_data(df):
    results = []
    for column in df.columns:
        int_count = 0
        float_count = 0
        string_count = 0
        for index, row in df.iterrows():
            value = row[column]
            if pd.notnull(value):
                try:
                    value = int(value)
                    int_count += 1
                except ValueError:
                    try:
                        value = float(value)
                        float_count += 1
                    except ValueError:
                        string_count += 1
        total = int_count + float_count + string_count
        if total != 0:
            int_percentage = int_count/total*100
            float_percentage = float_count/total*100
            string_percentage = string_count/total*100
            max_percentage = max(int_percentage, float_percentage, string_percentage)
            if max_percentage == int_percentage and max_percentage == float_percentage and max_percentage == string_percentage:
                results.append((column, "string", int_percentage, float_percentage, string_percentage))
            elif max_percentage == int_percentage and max_percentage == float_percentage:
                results.append((column, "float", int_percentage, float_percentage, string_percentage))
            elif max_percentage == string_percentage and max_percentage == float_percentage:
                results.append((column, "string", int_percentage, float_percentage, string_percentage))
            elif max_percentage == int_percentage and max_percentage == string_percentage:
                results.append((column, "string", int_percentage, float_percentage, string_percentage))
            elif max_percentage == int_percentage:
                results.append((column, "int", int_percentage, float_percentage, string_percentage))
            elif max_percentage == float_percentage:
                results.append((column, "float", int_percentage, float_percentage, string_percentage))
            else:
                results.append((column, "string", int_percentage, float_percentage, string_percentage))
        else:
            results.append((column, "No data", 0, 0, 0))
    return pd.DataFrame(results, columns=['Column', 'Type', 'Int Percentage', 'Float Percentage', 'String Percentage'])

df_results = analyze_data(df)
knio.output_tables[0] = knio.Table.from_pandas(df)

mlauber71 · January 27, 2024, 9:54pm

You might want to export df_results instead of df in the end …
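Exporting `df_results` shows the analysis, but the columns of `df` themselves only change if the chosen types are applied to `df` before it is output. A hedged sketch of that last step, assuming `df_results` has the `Column`/`Type` layout built in the last script; `apply_summary` is a hypothetical helper, and because of `errors="coerce"` the cells that do not fit the chosen type become missing values, which may or may not be what you want:

```python
import pandas as pd


def apply_summary(df, df_results):
    """Convert each column of df to the 'Type' recorded for it in df_results."""
    out = df.copy()
    for column, kind in zip(df_results["Column"], df_results["Type"]):
        if kind in ("int", "float"):
            # errors="coerce": cells that do not parse become missing values
            numeric = pd.to_numeric(out[column], errors="coerce")
            # use a nullable integer type only if every remaining value is whole
            if kind == "int" and (numeric.dropna() % 1 == 0).all():
                numeric = numeric.astype("Int64")
            out[column] = numeric
        elif kind == "string":
            out[column] = out[column].astype("string")
        # "No data" (and any other label) leaves the column unchanged
    return out


# Inside the Python Script node (assumed, not run here):
# knio.output_tables[0] = knio.Table.from_pandas(apply_summary(df, df_results))
```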

system · April 15, 2024, 11:15pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.