HASH KEY - Character length

Hi everyone, I need to compress the values in a field to fit them into a CSV file. In short, I have a field with values longer than 30 characters, so I need to find a way to compress them, possibly using a hash key, without reducing the number of fields. The values in the field vary—some are shorter than 30 characters, some are exactly 30 characters, and some exceed 30 characters. How can I do this in KNIME? Is there a way to write Java code using the Java Snippet node or other methods?

I’ll add that the problem arises specifically because the data exceeds 30 characters, but I can’t simply use a substring since I need to preserve the original data in the fields.

Hi @BanS ,

If I understand you correctly, you need to recover the full, original string later. Correct? If so, you’re looking for lossless compression. A hash function computes a (hopefully) unique number for each value (e.g. to identify a location in an array → hash map), but it does not compress the value: the original cannot be reconstructed from the hash alone, so you would still need to store the original somewhere.
Compression is a general task with quite a few approaches; see e.g. the Stack Overflow question “Lossless compression method to shorten string before base64 encoding to make it shorter?”
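To illustrate the point about hashes, here is a minimal Java sketch. It assumes SHA-256 truncated to 30 hex characters as the key; the class name `HashKeyLookup`, the helper `shortKey`, and the sample value are made up for illustration. The key always fits the limit, but the original value only comes back through a separately stored lookup table, and truncating the digest makes collisions more likely (though still very unlikely at this length).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

class HashKeyLookup {

    // Hypothetical helper: first 30 hex characters of the SHA-256 digest.
    // This is one-way; the input cannot be rebuilt from the returned key.
    static String shortKey(String value) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.substring(0, 30);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The lookup table has to be stored somewhere (e.g. a second file),
        // otherwise the original values are lost.
        Map<String, String> lookup = new HashMap<>();

        String original = "an example value that is longer than thirty characters";
        String key = shortKey(original);
        lookup.put(key, original);

        System.out.println("key written to CSV: " + key + " (" + key.length() + " chars)");
        System.out.println("recovered via lookup: " + lookup.get(key));
    }
}
```

In other words, the CSV would carry only the key, and the key-to-value table would have to be kept alongside it.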

Yes, you can implement compression in Java or Python code in KNIME. However, if you know something about the structure of your input data, you might be able to come up with your own rules or functions (e.g. using an Expression node, Rule Engine, or even a full lookup table).
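As a rough sketch of what generic lossless compression could look like in plain Java, the following uses `java.util.zip.Deflater` plus Base64 so the result stays CSV-safe text. The class and method names are made up for illustration, and how this would be wired into a Java Snippet node is left out. Note that general-purpose compression has a fixed overhead and Base64 adds roughly a third on top, so for strings of only 30 to 50 characters the output is often no shorter than the input; it is worth printing the lengths for your own data before relying on it.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class DeflateRoundTrip {

    // Compress with DEFLATE, then Base64-encode so the result is printable text.
    static String compress(String text) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return Base64.getEncoder().encodeToString(out.toByteArray());
    }

    // Reverse of compress(): Base64-decode, then inflate back to the original string.
    static String decompress(String encoded) {
        Inflater inflater = new Inflater();
        inflater.setInput(Base64.getDecoder().decode(encoded));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or unexpected input; stop instead of looping forever
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid DEFLATE data", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "an example value that is longer than thirty characters";
        String packed = compress(original);
        System.out.println("original length: " + original.length());
        System.out.println("packed length:   " + packed.length());
        System.out.println("round trip ok:   " + decompress(packed).equals(original));
    }
}
```

If the packed length does not drop below the limit for typical values, a rule-based encoding or a lookup table tailored to the data is likely the more realistic route.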

If the task is not set in stone, I’d be interested in the overall use-case. Maybe there are alternatives to storing the data in a CSV file with the additional 30 character restriction…?

Kind regards
Marvin
