Tree Ensemble Learner does not recognize newlines in column names (include/exclude filter does)

tims · February 11, 2022, 6:40pm

I am trying to use the Tree Ensemble Learner node for feature extraction.

The “Target Column” dropdown does not show columns whose names contain newlines. Unfortunately, the data that I downloaded has newlines in many column names. The Include/Exclude filters, on the other hand, do show those columns.

(Side note: it would be nice to have a search/filter option for the target column, just like the Include/Exclude filters have. There are way too many columns in my dataset.)

mlauber71 · February 11, 2022, 8:00pm

@tims I do not fully understand what you mean by „newlines“. Maybe you could provide us with an example.

tims · February 11, 2022, 8:01pm

Here’s an example of a column name that does not show up in the “Target Column” dropdown:

“Length
(feet)”
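To see how many headers are affected in a large dataset, one could scan the names for line breaks and other control characters. A minimal Python sketch outside KNIME, assuming the column names are available as a list of strings; the helper `find_problem_names` and the sample header are hypothetical:

```python
import unicodedata

def find_problem_names(names: list[str]) -> list[str]:
    """Return the column names that contain control characters such as line breaks."""
    # Unicode category "Cc" covers control characters like "\n", "\r" and "\t".
    return [n for n in names if any(unicodedata.category(ch) == "Cc" for ch in n)]

# Hypothetical header, for illustration only.
columns = ["Length\n(feet)", "Width (feet)", "Weight\r\n(lbs)"]
for name in find_problem_names(columns):
    # repr() makes the hidden line break visible.
    print(repr(name))
```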

mlauber71 · February 11, 2022, 8:02pm

@tims OK. I would seriously recommend changing that column name. All sorts of problems will lie ahead if you don’t. Maybe just use the Column Rename node.

tims · February 11, 2022, 8:04pm

I have a large dataset with lots of such columns … could this be automated on the fly?

mlauber71 · February 11, 2022, 8:09pm

@tims I think it could, but I would have to try a few things. You might have to mask the line breaks and possibly other special characters as well.

mlauber71 · February 15, 2022, 5:11pm

@tims you could use a regex function and an “Insert Column Header” node to clean your column names. The logic is that you remove every character that is not explicitly permitted. In this case letters, digits and blanks are allowed. You can edit the allowed set to suit your needs.

regexReplace($Column Name$,"[^a-zA-Z0-9 ]","")
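For comparison, here is roughly the same cleanup with Python's `re` module, as a sketch outside KNIME that keeps the same allowed characters (letters, digits, blanks); `clean_name` is a hypothetical helper. One detail worth noting: the expression above deletes the newline outright, so “Length\n(feet)” would become “Lengthfeet”. The sketch therefore first turns every whitespace run, line breaks included, into a single blank:

```python
import re

def clean_name(name: str) -> str:
    # Turn every whitespace run (including "\n" and "\r\n") into one blank,
    # so words that were separated by a line break stay separated.
    name = re.sub(r"\s+", " ", name)
    # Same character class as the KNIME expression: drop anything that is
    # not an ASCII letter, digit or blank.
    name = re.sub(r"[^a-zA-Z0-9 ]", "", name)
    # Trim blanks left over at either end.
    return name.strip()

print(clean_name("Length\n(feet)"))  # Length feet
```

In KNIME, a similar effect could presumably be had by adding a whitespace-to-blank replacement before the character filter.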

tims · February 15, 2022, 6:10pm

Thank you … does this loop through all columns and do the replacement automatically?
Would this need a loop, with “Loop Start” and “Loop End” nodes?

mlauber71 · February 15, 2022, 6:12pm

@tims it handles every column at once, so no loop is needed. The Extract Table Specs node collects the column names, the String Replacer cleans them all in one go, and the Insert Column Header node writes them back as the new header.
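The same “all names at once” idea, sketched in Python as an assumption-laden illustration (the helpers `clean_name` and `clean_header` are hypothetical). Since column names in a table must be unique, cleaning can create collisions: “Length\n(feet)” and “Length (feet)” both end up as “Length feet”. The sketch resolves this by appending a numeric suffix, which is an arbitrary choice:

```python
import re

def clean_name(name: str) -> str:
    # Whitespace runs (including line breaks) become one blank, then
    # everything except ASCII letters, digits and blanks is dropped.
    name = re.sub(r"\s+", " ", name)
    return re.sub(r"[^a-zA-Z0-9 ]", "", name).strip()

def clean_header(names: list[str]) -> list[str]:
    """Clean every column name in one pass and keep the results unique."""
    used: set[str] = set()
    result: list[str] = []
    for name in names:
        base = clean_name(name) or "column"  # fallback if nothing survives
        new, n = base, 1
        # Append " 2", " 3", ... until the name is not taken yet.
        while new in used:
            n += 1
            new = f"{base} {n}"
        used.add(new)
        result.append(new)
    return result

header = ["Length\n(feet)", "Length (feet)", "Width\n(feet)"]
print(clean_header(header))  # ['Length feet', 'Length feet 2', 'Width feet']
```

The cleaned list would then replace the old header, which is the role the Insert Column Header node plays in the workflow.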

Daniel_Weikert · February 16, 2022, 6:00pm

Thanks for sharing @mlauber71. So far I have always used Transpose with column header replacement, but never the table specs directly. Great idea!
br

tims · February 16, 2022, 9:31pm

@mlauber71 your solution worked excellently for removing junk characters from the column names. However, I still don’t see my column of interest in the “Target Column” dropdown of the “Tree Ensemble Learner”.

It is of type double and is visible in the output of the “Insert Column Header” node, with all special characters gone and only letters and underscores in its name.

Is there any other reason why a column would not show up in the “Target Column” dropdown?

mlauber71 · February 16, 2022, 9:35pm

@tims if the target is a double, you will need the regression variant, “Tree Ensemble Learner (Regression)”. Otherwise, could you provide us with a small example?
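Why the column was hidden, as a toy model: the classification learner offers nominal (string) columns as targets, while the regression variant offers numeric ones. The Python sketch below only illustrates that filtering idea; the `ColType` labels and `target_candidates` function are hypothetical and are not how KNIME implements the dropdown:

```python
from enum import Enum

class ColType(Enum):
    STRING = "string"  # nominal values, e.g. class labels
    DOUBLE = "double"  # numeric values

def target_candidates(columns: dict[str, ColType], regression: bool) -> list[str]:
    """Return the columns a learner variant would offer as target (toy model)."""
    wanted = ColType.DOUBLE if regression else ColType.STRING
    return [name for name, t in columns.items() if t is wanted]

# Hypothetical table spec, for illustration only.
table = {"Length_feet": ColType.DOUBLE, "Category": ColType.STRING}
print(target_candidates(table, regression=False))  # ['Category']
print(target_candidates(table, regression=True))   # ['Length_feet']
```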

tims · February 16, 2022, 9:45pm

@mlauber71 … Thanks, the regression variant worked! I wish I could mark both the newline removal and the switch to the regression variant as solutions!

system · February 23, 2022, 9:46pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.