Using ASCII to validate / manipulate Strings

Good Morning / Afternoon / Evening fellow KNIMErs

A while ago I made a NuGet package in C# that validates and manipulates strings. I wanted it to be a strict process, so I used ASCII. Below is an example of one of the validation methods:

public static bool IsAllUKEULettersNumbersAndSpaces(string text)
{
    int strLen = text.Length;
    int counter = 0;
    foreach (char c in text)
    {
        int currentChar = Convert.ToInt32(c);
        // uppercase A - Z
        if (currentChar > 64 && currentChar < 91) { counter += 1; }
        // lowercase a - z
        else if (currentChar > 96 && currentChar < 123) { counter += 1; }
        // space
        else if (currentChar == 32) { counter += 1; }
        // numbers 0 - 9
        else if (currentChar > 47 && currentChar < 58) { counter += 1; }
        // À - Ö (192 - 214)
        else if (currentChar > 191 && currentChar < 215) { counter += 1; }
        // Ø - ö (216 - 246; skips × at 215 and ÷ at 247)
        else if (currentChar > 215 && currentChar < 247) { counter += 1; }
        // ø - ÿ (248 - 255)
        else if (currentChar > 247 && currentChar < 256) { counter += 1; }
    }
    // valid only if every character matched one of the ranges above
    return strLen == counter;
}

I am now making a generic StringCleaner Component for my co-workers and me. It will be configurable with options like:

  • Choose Output Case: All Upper Case, All Lower Case, Pascal Case, Case as Input

  • Choose Output: Output as Input, Keep All UKEULetters Numbers and Spaces, Keep All UKEULetters and Spaces, Keep All Numbers and Spaces, Keep All UKEULetters and Numbers, etc.

  • Remove Concurrent Spaces: false, true

  • And many more
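
As a rough illustration of how two of these options could behave, here is a minimal Java sketch (Java being the language of KNIME's Java Snippet node). Everything in it is hypothetical: the class, enum, and method names are made up for the example, "Pascal Case" is assumed to mean capitalising the first letter of each space-separated word while keeping the spaces, and "Remove Concurrent Spaces" is assumed to mean squeezing runs of spaces down to a single space.

```java
import java.util.Locale;

final class StringCleanerOptions {

    /** Hypothetical stand-in for the "Choose Output Case" option. */
    enum OutputCase { UPPER, LOWER, PASCAL, AS_INPUT }

    /** Applies the space and case options to one string. */
    static String clean(String text, OutputCase outputCase, boolean collapseSpaces) {
        String result = text;
        if (collapseSpaces) {
            // Assumption: runs of two or more spaces become one space.
            result = result.replaceAll(" {2,}", " ");
        }
        switch (outputCase) {
            case UPPER:
                return result.toUpperCase(Locale.ROOT);
            case LOWER:
                return result.toLowerCase(Locale.ROOT);
            case PASCAL:
                return capitalizeWords(result);
            default:
                // AS_INPUT: leave the case untouched.
                return result;
        }
    }

    /** Assumption: upper-case the first letter of each word, lower-case the rest. */
    static String capitalizeWords(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        boolean startOfWord = true;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == ' ') {
                startOfWord = true;
                sb.append(c);
            } else if (startOfWord) {
                sb.append(Character.toUpperCase(c));
                startOfWord = false;
            } else {
                sb.append(Character.toLowerCase(c));
            }
        }
        return sb.toString();
    }
}
```

Under these assumptions, `StringCleanerOptions.clean("hello   wORLD", StringCleanerOptions.OutputCase.PASCAL, true)` would first squeeze the spaces and then capitalise each word.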

I have looked at the String Manipulation, String Replacer, and Column Expressions nodes, among others, and searched the forums, but cannot find a way to use ASCII. I know I could put all of this in a Java Snippet.
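
For reference, a minimal sketch of how the C# check above might look when ported to Java for use in a Java Snippet. The class and method names are hypothetical, and the wiring to the snippet's input and output columns is left out. Note that the codes above 127 are strictly Latin-1 (Unicode) code points rather than ASCII.

```java
final class AsciiStyleValidator {

    /** Returns true if every character falls in one of the allowed ranges. */
    static boolean isAllUkEuLettersNumbersAndSpaces(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (!isAllowed(text.charAt(i))) {
                return false; // stop at the first character outside the ranges
            }
        }
        return true; // also true for the empty string, like the C# version
    }

    /** Same ranges as the C# method, written with inclusive bounds. */
    static boolean isAllowed(char c) {
        return (c >= 'A' && c <= 'Z')
            || (c >= 'a' && c <= 'z')
            || c == ' '
            || (c >= '0' && c <= '9')
            || (c >= 0x00C0 && c <= 0x00D6)   // À - Ö (192 - 214)
            || (c >= 0x00D8 && c <= 0x00F6)   // Ø - ö (216 - 246)
            || (c >= 0x00F8 && c <= 0x00FF);  // ø - ÿ (248 - 255)
    }
}
```

Returning early on the first disallowed character avoids the counter, but gives the same result as comparing the count with the string length.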

Is there a KNIME solution out there?

Frank

Hi @FrankColumbo,

Looks like your task is fairly complex for string cleaning. However, you can also use the Rule Engine node to replicate the code snippet in KNIME.
As a preliminary step, you would also need to provide a dictionary mapping to your component, or download one from somewhere, to keep things moving.

Best,
Ali

Hi @FrankColumbo, you may be interested in the following thread

As you have alluded to, this component makes use of a Java Snippet to perform elements of filtering of strings and may give some inspiration. I don’t think you’ll find a “standard nodes” solution to what you are trying to do, or if you do, it’ll probably be so convoluted you’ll wish you’d just written it in Java :wink:

By all means open it up and take a look inside! :slight_smile:

Using regex with Unicode character classes, rather than a specific ASCII character subset, is probably the way to go for things like “UKEULetters” and so forth.
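
A minimal sketch of that idea in Java, assuming it would run inside a Java Snippet; the class and method names are hypothetical. It uses `\p{L}` (any Unicode letter) from `java.util.regex`, which is deliberately broader than the Latin-1 ranges in the C# method: it also accepts letters such as Ł or Greek and Cyrillic characters, while still rejecting symbols like × and ÷.

```java
import java.util.regex.Pattern;

final class UnicodeClassCleaner {

    // Whole string must consist of Unicode letters, digits 0-9, and spaces.
    private static final Pattern ALLOWED = Pattern.compile("[\\p{L}0-9 ]*");

    // Any single character that is NOT a letter, digit 0-9, or space.
    private static final Pattern NOT_ALLOWED = Pattern.compile("[^\\p{L}0-9 ]");

    /** Validation: true if the text contains only allowed characters. */
    static boolean isAllLettersNumbersAndSpaces(String text) {
        return ALLOWED.matcher(text).matches();
    }

    /** Cleaning: drops every character that is not allowed. */
    static String keepLettersNumbersAndSpaces(String text) {
        return NOT_ALLOWED.matcher(text).replaceAll("");
    }
}
```

If the stricter Latin-only behaviour is wanted, the character class could be narrowed, for example by intersecting `\p{L}` with a script class; that choice is left open here.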

Hope that helps.

Hello,
thank you @aliasghar_marvi and @takbb for your replies.

@takbb, your component is great! What I would like to do (in the future, when I have time) is try to merge what I have done with yours.

I have just uploaded my component to the KNIME Hub. Here is a link

If anyone has any thoughts or improvements to suggest please let me know.

Frank

Hi @FrankColumbo, thank you for the compliment!

Yours is also great. I did find a small bug that crept in. Your String Manipulation nodes for lower case and Pascal case refer to the column name $output$, with a lower case “o”, instead of $Output$. How ironic that it should be the case that causes trouble. :wink:

But nice one, and feel free to tag me if you ever want to collaborate on building components.
