Hi @leonaa01 , “dirty” in this context would mean data that is in some way not quite correct for processing. In this case it could be that you have data containing slightly different spellings, contains different case (e.g. some upper case text, some lower case) or even data with additional space characters on the beginning or end.
For example you have “AKI” (without a question mark), “AKI ?” and “AKI?” (without a space) so these will be treated as different. Likewise you have “Abdominal pain”, “Abdo pain”, “Abdomen pain” (small “p”) and “Abdomen Pain” (capital “P”) which may all be the same thing to you (or maybe they aren’t?), but are all different in terms of data processing.
I gave your screenshot to chatgpt so that your example data is available for copy/paste by anybody wishing to have a crack at assisting here, but essentially what you need to do is “standardise” the Presenting Conditions column.
Count |
Presenting conditions |
2 |
AKI |
1 |
AKI |
3 |
AKI |
1 |
AKI |
1 |
AKI |
1 |
AKI |
1 |
AKI ? |
1 |
AKI? |
1 |
AKI_dup |
1 |
AMU seated syncope while visiting h... |
1 |
AIR Delirium |
1 |
Abdo pain |
1 |
Abdo pain |
1 |
Abdomen pain |
1 |
Abdomen pain |
1 |
Abdomen pain |
1 |
Abdominal Pain and Vomiting x 3/7 |
1 |
Abdominal pain |
1 |
Abdominal pain |
1 |
Abdominal pain |
1 |
Abdominal pain |
1 |
Abnormal LFT |
This workflow demonstrates one way of achieving this if you have a relatively small data set that lends itself to such standardisation. The exact mechanism you need to use would largely depend on just how much variation of spellings/terms you have, and the size of the data set.
Standardising text for processing with Value Lookup and String Cleaner.knwf (123.9 KB)
Here, I have just used the String Cleaner to remove any trailing spaces (in case there are any) but there are a whole load more things it can do, such as converting everything to upper/lower case, removing diacritics (e.g. accent/variation of characters), non-printable/symbols and so on.
The Value Lookup can be used to lookup a term based on patterns. In this simple example I’ve used wildcards, but you can also use regex. You just need to make the option available in that node by clicking the “show advanced options” in its config.
The Table Creator is a lookup table such as this:
The end result from the demo workflow is this: