Replacing multiple columns of text strings with new columns of text strings

Replacing multiple columns of text strings with new columns of text strings. Text strings vary from single word to six words for each string. There are hundreds of these text descriptions throughout 40 columns.

Obviously, Dictionary Replacer fails as it can only replace one column at a time, and I cannot make it work on multi-length word text descriptions.

I’ve tried many times using Unpivoting+Dictionary+Joiner model—then Pivoting to work (first graphic below). But I receive an empty set at the Joiner outcome. I believe it is because I am unable to place S:Value in the lower left box in the Joiner Column Section (second graphic below). The example I was mimicking is the third graphic below.

Four questions:

  1. How can I fix the above described failing?
  2. Do I also need to convert Strings to Terms for this endeavor?
  3. Some columns have a single series of these multi-word text descriptions—but other columns contain many of these multi-word text descriptions separated by commas (I download data from the source as CSV, but am converting it to Excel and using the Excel Reader node).
  4. After I replace the original text descriptions to the new text descriptions, I will need to complete an additional replacement of the new text descriptions to numeric codes. What additional nodes will I need to add to the model to complete this text to number task?

Thank You

Hi @smithcreed, welcome to the forum.

What is the logic that you’re using to make this replacement?

What is the reasoning behind not using the original CSV file?

It would help if you provided some data for the users here to work with.

Hi @smithcreed -

Maybe this will help you get started. I think the key here is to use a Cell Replacer inside an appropriate loop setup. This way you can avoid using the text processing nodes, as well as the pivoting/unpivoting approach you were thinking of.

2020-08-07 10_46_38-KNIME Analytics Platform

ColumnLoopReplaceAppendExample.knwf (14.0 KB)

This is just a simple example that I put together in five minutes, but see if you can adapt it to your real data. (Sounds like you may need to do some additional preprocessing on your lookup table first.)

1 Like

I’m brand new to using KNIME and simply found a quick excel solution that worked for reading the files. No further grand logic. This portion of the project is simply pulling the source data, converting the source text descriptions to my “standardized” text descriptions (several front end uses for this), then converting my standardized text descriptions to numeric tokens for clustering and regression analyses.

I am attaching some files for you here.

The first two related to “hilltop” are in CSV and XLS. These are my downloaded source files for now. One day, if I can get the full system working “manually”, this will all me move to an automated production mode. Also attached is my initial “dictionary” for the text-to-text conversion and my text to numeric conversion.

Thanks for your help while I begin to learn the many KNIME options available.
(Hmmm, a note from this chat does not allow CSV upload so I’m attaching the excel files only “Sorry, the file you are trying to upload is not authorized (authorized extensions: jpg, jpeg, png, gif, knar, knwf, zip, xls, tiff, tif, xlsx, txt, log, lsm, doc, docx, json, xml, pptx).”

Hilltop_QUAL_DSF_XLS.xlsx (73.1 KB) QUAL_DICTIONARY_wTOKENS_REcolorado_RESO_Fields_TEST.xlsx (25.0 KB)

Thank You Scott. Let me give your set up a try :grinning:

1 Like

Hi @smithcreed,

I have checked your data a bit and this seems like big proper project at first. There’s a lot of data and lot of mappings to do. You’ll probably need to do some data preparation before mapping starts at all. My recommendation is to start with smaller and simpler data that mimics real data in format and then scale it up once you got it running to some degree.

Good luck and if you’ll have any questions feel free to ask!

Br,
Ivan

thank you for the feedback

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.