I need to clean data with thousands abbreviated by "K"

haygoodbeau · September 14, 2023, 6:01pm

I have multiple columns from a web scrape with cells that have thousands abbreviated with a “K”.

If it were only values like “11k”, I could just replace “k” with “000” and call it done, but there is a mix of “11k”, “3.5k”, and normal cells below 1000 with no “k”.

How can I clean this data in KNIME?

“11k” would need the “k” replaced with “000”.
“3.5k” would need the “k” replaced with “00” (and the “.” dropped).
Anything below 1000 needs nothing changed.

haygoodbeau · September 14, 2023, 7:15pm

My (not so elegant) solution is to use the String Manipulation node to replace all instances of “k” with “000”, use the Cell Splitter node to split the cells with “.” as the delimiter, replace all instances of “000” with “00” in the new column, and then use the String Manipulation node again to join the two columns back together.
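For anyone who wants to check the logic of that node chain outside KNIME, here is a rough Python sketch of the same four steps. The function name `expand_k_by_splitting` is made up for illustration, and the sketch assumes a lowercase “k” and at most one digit after the decimal point, as in the examples in this thread.

```python
def expand_k_by_splitting(cell: str) -> str:
    """Mimic the chain: k -> 000, split on '.', 000 -> 00 in the second part, join."""
    # Step 1 (String Manipulation): "3.5k" -> "3.5000", "11k" -> "11000"
    expanded = cell.replace("k", "000")
    # Step 2 (Cell Splitter): split on the first "." into two parts
    head, sep, tail = expanded.partition(".")
    # Step 3: only the part after the "." has one zero too many
    if sep:
        tail = tail.replace("000", "00")
    # Step 4 (join): concatenate the parts without the "."
    return head + tail


if __name__ == "__main__":
    for value in ["11k", "3.5k", "750"]:
        print(value, "->", expand_k_by_splitting(value))
```

Cells without a “k” or a “.” pass through unchanged, which matches the requirement for values below 1000.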

ArjenEX · September 14, 2023, 7:31pm

If you are looking for a one-node solution, something like this should work:

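if (contains(column("column1"),".") == true) {
    // also drop the ".", so "3.5k" becomes "3500" rather than "3.500"
    // (assumes exactly one digit after the decimal point)
    replace(replace(column("column1"),".", ""),"k", "00")
} else {
    replace(column("column1"),"k", "000")
}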
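String replacement like the above depends on how many digits follow the decimal point, so a value such as “3.25k” would come out wrong. If that can occur in the scrape, a numeric approach may be more robust: strip the suffix, parse the number, and multiply by 1000. Below is a minimal Python sketch of that idea (for example, for a scripting node or a standalone script). The function name `parse_k` is hypothetical, and the sketch assumes the cells contain plain numbers with an optional trailing “k” or “K” and that values without a suffix are whole numbers.

```python
from decimal import Decimal


def parse_k(cell: str) -> int:
    """Convert strings like "11k", "3.5K" or "750" to an integer count."""
    text = cell.strip().lower()
    if text.endswith("k"):
        # Decimal avoids binary floating-point rounding when scaling by 1000
        return int(Decimal(text[:-1]) * 1000)
    # No suffix: the value is already in plain units
    return int(Decimal(text))


if __name__ == "__main__":
    for value in ["11k", "3.5k", "3.25K", "750"]:
        print(value, "->", parse_k(value))
```

The result is a number rather than a string, so no further type conversion is needed before doing arithmetic on the column.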
