Logistic Regression Predictor Node - "The table for prediction does not contain the column ______."

Greetings,

I am a beginner in utilizing KNIME as part of a class I am presently taking and would like to preface this by stating that I am in no way, shape, or form asking for someone to give me the answers or otherwise do my assignment for me!

I am seeking help with a recurring warning message that, after two days of exhausting every resource I could find, I still cannot figure out. It is preventing me from proceeding any further on this assignment.

The workflow is attached below. The assignment requires that I examine customer data to predict attrition for future customers. An Excel file was provided with the data, and I’ve been able to build the workflow per the instructions provided.

Data was imported via the Excel Reader, and a Logistic Regression Learner node was configured with “Churn?” as the target variable and “VMail Message”, “Day Mins”, “Eve Mins”, “Night Mins”, and “Intl Mins” as the predictor variables, all per the instructions in the assignment.

*Side Note: It does produce a warning that the algorithm did not reach convergence after the specified number of epochs, but I’ve adjusted that number from the default value of 100 all the way up to 500,000 in various increments and it never goes away; however, it is generating a model and calculating the coefficients, Std. Dev, Z-score, and P values which I can observe in the table viewer attached to it.

The problem that is frustrating me is a constant warning in the console that states: “The table for the prediction does not contain the column ‘VMail Message’.” This prevents me from progressing.

I have gone through every troubleshooting step I’ve been able to find in the forums, watched countless YouTube videos, and downloaded about 30 workflows from posts that even remotely relate to my issue. My hope was to find something in those workflows, whether nodes or their configurations, that explains why this warning appears and prevents me from executing the Predictor node.

I went as far as cleaning up the data in Excel: triple-checking the document for missing values or special characters, removing all formatting, making sure that numbers weren’t being stored as text, removing spaces from the column headers, and even deleting all columns save for the predictor and target variables. I’ve also experimented with deleting the “VMail Message” column, but all that does is produce the same warning for the next predictor variable column.

If anyone is willing to take a look and show me where I am messing this up, it would be immensely appreciated! Again, I am not asking for anyone to do my homework for me; I feel more than capable of finishing the rest on my own. I just desperately need help figuring out what I’m missing here.

I went ahead and redid the entire workflow from scratch, configured per my assignment instructions, and utilizing the original data provided by my instructor with no modifications. I unchecked the “reset workflows” option upon export:

KNIME_WK5_Logistic_Regression-Salvador_Rivera.knwf (698.2 KB)

But just in case the data is needed:
BIT-445-RS-Churn-2.xlsx (432.7 KB)

Thanks in advance!

-Sal



Try normalizing the data and then partition it to feed the learner and predictor.
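For anyone who wants to see what those two steps do outside of KNIME, here is a minimal plain-Python sketch of min-max normalization followed by a random train/test split. It is not KNIME's implementation; the function names, the 80/20 split, the seed, and the sample rows are all made up for illustration, and only the column names come from the original post.

```python
import random

def min_max_normalize(rows, columns):
    """Return a copy of rows with each listed column rescaled to [0, 1]."""
    scaled = [dict(row) for row in rows]
    for col in columns:
        values = [row[col] for row in rows]
        lo, hi = min(values), max(values)
        span = hi - lo
        for row in scaled:
            # A constant column has no spread, so map it to 0.0.
            row[col] = (row[col] - lo) / span if span else 0.0
    return scaled

def partition(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Made-up rows for illustration only (not taken from the churn spreadsheet).
rows = [
    {"Day Mins": 100.0, "Intl Mins": 5.0, "Churn?": False},
    {"Day Mins": 150.0, "Intl Mins": 10.0, "Churn?": False},
    {"Day Mins": 200.0, "Intl Mins": 15.0, "Churn?": True},
    {"Day Mins": 250.0, "Intl Mins": 20.0, "Churn?": False},
    {"Day Mins": 300.0, "Intl Mins": 25.0, "Churn?": True},
]

normalized = min_max_normalize(rows, ["Day Mins", "Intl Mins"])
train, test = partition(normalized, train_fraction=0.8)
print(len(train), len(test))  # 4 1
```

The training part would go to the learner and the test part to the predictor's data input. Putting the predictors on a common scale may also help with the convergence warning, although that is not guaranteed.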


@ElRivRav the reason for your immediate problem is that you are connecting the predictor's data input to the statistics output of the learner instead of to a data table. That table will never contain the necessary columns.
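To illustrate the kind of check that produces this warning, here is a small plain-Python sketch. It is only an assumption about the logic, not KNIME's actual code: the function name is hypothetical, and the header of the statistics table is a guess at what such a table might contain. The predictor column names are the ones from the original post.

```python
def missing_columns(required, table_columns):
    """Return the required columns absent from the table, in model order."""
    present = set(table_columns)
    return [name for name in required if name not in present]

# Predictor columns named in the original post.
PREDICTORS = ["VMail Message", "Day Mins", "Eve Mins", "Night Mins", "Intl Mins"]

# Hypothetical header of a coefficient statistics table (names are assumed).
stats_table = ["Variable", "Coeff.", "Std. Err.", "z-score", "P>|z|"]

# Header of a proper data table: the predictors plus the target.
data_table = PREDICTORS + ["Churn?"]

print(missing_columns(PREDICTORS, stats_table)[0])   # VMail Message
print(missing_columns(PREDICTORS, data_table))       # []

# Dropping the first predictor from the model only moves the complaint along.
print(missing_columns(PREDICTORS[1:], stats_table)[0])  # Day Mins
```

The last line mirrors what was described above: removing “VMail Message” just makes the next predictor the first missing column, because the table being scored contains none of them.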

Beyond that, you might want to learn more about the setup of machine learning workflows and machine learning in general. You would typically have at least a training dataset and a test dataset.
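As a rough illustration of the train/test idea, here is a self-contained Python sketch that fits a logistic regression on a training set and scores a separate test set. It uses plain batch gradient descent, which is a simple stand-in chosen for readability and not necessarily the solver KNIME uses. The single feature, the data values, the learning rate, and the epoch count are all invented for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, epochs=2000, lr=0.5):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this row: predicted probability minus label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict(w, b, X):
    """Return class 1 where the predicted probability is at least 0.5."""
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]

# Made-up, already normalized data with one feature; 1 stands for churn.
X_train = [[0.0], [0.1], [0.2], [0.3], [0.7], [0.8], [0.9], [1.0]]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]
X_test = [[0.1], [0.2], [0.8], [0.9]]
y_test = [0, 0, 1, 1]

w, b = fit_logistic(X_train, y_train)      # the learner sees only training rows
predictions = predict(w, b, X_test)        # the predictor scores unseen rows
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print(predictions, accuracy)
```

The point is the separation of roles: the model is fitted on one table and applied to another table that has the same predictor columns.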

A sample workflow can look like this


@rfeigel, @mlauber71

Thank you so much for the input and resources! I think I have this figured out. I had to use the Normalizer and Partitioning nodes in the previous assignment, and I did initially think to try to implement them here, but refrained because the instructions did not direct me to do so. I suppose it was implied that preprocessing needed to take place, which makes sense. I also experimented with the Normalizer (PMML) and X-Partitioner nodes, but saw very little difference in the results, so I chose the nodes from the previous assignment, per your advice.

This allowed me to overcome the immediate problem and move on to a different node. This is what I ended up with, workflow-wise:

Also, here is the updated export, in case anyone would like to offer further input as to how I can improve.

KNIME_WK5_Logistic_Regression-Salvador_Rivera.knwf (1.0 MB)

I think I can manage from here, so thank you again!

-Sal

