Tree Ensemble Learner does not recognize newlines in column names (include/exclude filter does)

I am trying to use the Tree Ensemble Learner node for feature extraction.

The Target Column dropdown does not list columns whose names contain newlines. Unfortunately, the data that I downloaded has newlines in its column names. The Include/Exclude filters, on the other hand, do show those columns.

(Side note: it would be nice to have a search/filter option for the target column, just like the Include/Exclude filters have. There are far too many columns in my dataset.)

@tims I do not fully understand what you mean by “newlines”. Could you provide us with an example?

Here’s an example of a column name that does not show up in the “Target” dropdown:

“Length
(feet)”

@tims OK. I would seriously recommend changing that column name. All sorts of problems will lie ahead if you don’t. Maybe just use the rename node.

I have a large dataset with lots of such columns … could this be automated on the fly?

@tims I think it could, but I would have to try a few things. You might have to mask the line breaks and maybe other elements.

@tims you could use a regex function and an “Insert Column Header” node to clean up your column names. The logic is to remove everything that is not explicitly permitted. In this case a blank is allowed. You can adjust the pattern to your needs.

regexReplace($Column Name$,"[^a-zA-Z0-9 ]","")
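
To check what that character class does outside KNIME, here is a minimal Python sketch of the same pattern. It is only an illustration of the regex, not of what the KNIME node runs internally, and `clean_name` is a hypothetical helper name.

```python
import re

def clean_name(name: str) -> str:
    """Drop every character that is not a letter, a digit, or a blank."""
    return re.sub(r"[^a-zA-Z0-9 ]", "", name)

# The line break and the parentheses are all outside the permitted set.
print(repr(clean_name("Length\n(feet)")))  # 'Lengthfeet'
```

Note that the line break is deleted rather than replaced, so the two words run together. Replacing line breaks with a blank before stripping the other characters would keep them apart.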

Thank you … does this loop through all columns and do the replacement automatically?
Would this need a “Loop Start/End” node?

@tims it handles every column at once. The Extract Table Specs node collects the column names and the String Replacer handles them all.
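
The "every column at once" idea can be sketched in Python as follows: treat the header as a plain list of names and clean each entry in one pass. This is a rough illustration of the logic, not of how the KNIME nodes are implemented. `clean_headers`, the sample names, and the numeric suffix used when two cleaned names collide are all assumptions that the thread does not cover.

```python
import re

def clean_headers(names):
    """Return one cleaned name per input name, keeping the results unique."""
    seen = {}
    cleaned = []
    for name in names:
        new = re.sub(r"[^a-zA-Z0-9 ]", "", name)
        count = seen.get(new, 0)
        seen[new] = count + 1
        # Hypothetical collision rule: add a numeric suffix to repeats.
        cleaned.append(new if count == 0 else f"{new} {count + 1}")
    return cleaned

headers = ["Length\n(feet)", "Width\n(feet)", "Length(feet)"]
print(clean_headers(headers))  # ['Lengthfeet', 'Widthfeet', 'Lengthfeet 2']
```

Stripping characters can make two different original names identical, which is why the sketch checks for repeats before handing the names back.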

Thanks for sharing, @mlauber71. So far I have always used transpose with column header replacement, but never the table specs directly. Great idea.
br

@mlauber71 your solution worked excellently in removing junk characters from column names. However, I still don’t see the column I am interested in in the “Target Column” dropdown of the “Tree Ensemble Learner”.

It is of double data type and is visible in the “Insert Column Header” node, with all special characters gone and only letters and underscores left in its name.

Is there any other reason why a column name would not show up in the “Target Column”?

@tims if the target is a double, you will need the regression variant of the node. Otherwise, could you provide us with a small example?
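
As a rough rule of thumb, that advice can be written as a small Python check: a numeric target points to the regression variant, anything else to the classification one. This is a sketch of the reasoning only. `learner_variant` is a hypothetical helper, and how KNIME itself decides which columns to offer is not stated in this thread.

```python
def learner_variant(target_values):
    """Suggest a learner variant from the Python type of the target values."""
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in target_values
    )
    # Numeric target -> regression; anything else is treated as class labels.
    return "regression" if is_numeric else "classification"

print(learner_variant([12.5, 30.0, 7.25]))  # regression
print(learner_variant(["short", "long"]))   # classification
```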

@mlauber71 … Thanks, the regression variant worked! I wish I could mark both the newline removal and the switch to the regression variant as solutions!
