I'm a new KNIME user (v3.7.1) coming from a Python programming background. I have installed Python 3.6 using Anaconda on Windows 10. I am munging data using the Python Script (1=>1) node. I have tested my GradientBoostingRegressor script and it works. I have replicated and executed the script within the Python Script node's dialog and it runs fine. I added a print statement at the end of my script to check, and my code generates the output I am looking for. My problem comes when I close the node dialog and then try to execute the node: it remains at 30% and eventually fails. Can anyone explain why this is happening and how I can resolve it? It's incredibly frustrating.
import pandas as pd
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
# Lagged feature columns: TopAirPress(-1)..(-10), then BottomAirPress(-1)..(-10)
features = [
    f"{sensor}(-{lag})"
    for sensor in ("TopAirPress", "BottomAirPress")
    for lag in range(1, 11)
]
X = input_table[features].values
Y = input_table['TopAirPress']
#print(X)
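# Helpers that are never called below; `train` is not defined in this node,
# so calling them as written would raise a NameError.
def log_transform(feature):
    train[feature] = np.log1p(train[feature].values)

def quadratic(feature):
    train[feature + '2'] = train[feature] ** 2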
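# Note: alpha only affects the 'huber' and 'quantile' losses; the shrinkage
# parameter is learning_rate.
gbm = GradientBoostingRegressor(n_estimators=4000, alpha=0.01)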
y_train_log = np.log1p(Y)
gbm.fit(X, y_train_log)
preds = gbm.predict(X)
#print(preds)
# Undo the log1p transform with expm1 to get back to the original scale
actual_values = np.expm1(y_train_log)
predicted_values = np.expm1(preds)
predicted_values = np.array(predicted_values)
actual_values = np.array(actual_values)
log_predict = np.log(predicted_values + 1)
log_actual = np.log(actual_values + 1)
difference = log_predict - log_actual
difference = np.square(difference)
mean_difference = difference.mean()
score = np.sqrt(mean_difference)
#print ("RMSLE Value For Gradient Boost: ", score)
# expm1 is the inverse of the log1p applied to the target before fitting
result = pd.DataFrame(np.expm1(preds))
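
The scoring steps at the end of the script can be condensed into a small helper. The following is only a minimal sketch that runs outside KNIME: the function name `rmsle` and the sample values in `y` are assumptions made for illustration and are not part of the workflow above. It also shows why the choice between `np.exp` and `np.expm1` matters after a `np.log1p` transform.

```python
import numpy as np


def rmsle(actual, predicted):
    """Root mean squared logarithmic error (hypothetical helper name)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # log1p(x) = log(1 + x), the same quantity as np.log(values + 1)
    diff = np.log1p(predicted) - np.log1p(actual)
    return float(np.sqrt(np.mean(diff ** 2)))


# Made-up stand-in for the target column (assumed values, not real data)
y = np.array([0.0, 1.0, 10.0, 100.0])
y_log = np.log1p(y)

# expm1 undoes log1p; exp alone leaves every value too large by 1
roundtrip_expm1 = np.expm1(y_log)
roundtrip_exp = np.exp(y_log)

perfect_score = rmsle(y, roundtrip_expm1)  # effectively zero
shifted_score = rmsle(y, roundtrip_exp)    # non-zero because of the +1 offset
```

Passing the model's back-transformed predictions and the original target to such a helper would give the same kind of score that the commented-out print statement reports.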