Help Dealing With Special Characters

badger101 · May 12, 2022, 8:32pm

Hello,

My GetRequest body contains hyphen-separated tokens. Examples:

strawberry-cupcakes
adios-amigos
edward-scissorhands

Sometimes, longtail tokens will have special characters in the middle, such as when & and + are involved. Another example is estudios-de-espa%C3%B1ol.

I’d like to convert them to the original characters but without diacritics. How do I do that? I also want to keep it low to no code, avoiding Java Snippet Node.

Thanks in advance!

bruno29a · May 13, 2022, 3:48am

Hi @badger101 , first of all, you have to use the correct character encoding.

The %C3%B1 is the percent-encoded (URL-encoded) form of the two UTF-8 bytes for the Latin character ñ.

So estudios-de-espa%C3%B1ol should actually read estudios-de-español

After that, you can remove diacritics with the function removeDiacritic() (via String Manipulation or Column Expressions, etc.).
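
For anyone who wants to see the two steps spelled out outside of KNIME, here is a minimal sketch in plain Python (standard library only). It is not what the KNIME nodes do internally; the function names decode_token, strip_diacritics and clean_token are made up for illustration, and it assumes the tokens are percent-encoded UTF-8.

```python
import unicodedata
from urllib.parse import unquote


def decode_token(token: str) -> str:
    """Undo percent-encoding, assuming the escaped bytes are UTF-8."""
    # unquote() leaves a literal "+" alone; only %XX sequences are decoded.
    return unquote(token, encoding="utf-8")


def strip_diacritics(text: str) -> str:
    """Drop combining marks after canonical decomposition (NFD)."""
    # NFD splits "ñ" into "n" plus a combining tilde (U+0303).
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def clean_token(token: str) -> str:
    """Decode first, then remove diacritics (hypothetical helper)."""
    return strip_diacritics(decode_token(token))


print(clean_token("estudios-de-espa%C3%B1ol"))
```

Decoding has to come first: the diacritic only exists once %C3%B1 has been turned back into ñ. If & and + arrive escaped as %26 and %2B, the same decoding step restores them.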

badger101 · May 13, 2022, 12:10pm

Hi @bruno29a I’m not uploading files, I’m downloading data via the Get Request as I’ve mentioned in the post. Unless there’s a way to change the encoding via a node, I’m not sure that’s possible. (I want to do it without having to download the data to my desktop and having to reupload via the File Reader afterward.)

bruno29a · May 13, 2022, 12:16pm

Hi @badger101 , can you share the URL so we can see the data? If it’s already that way, then we may need to convert these manually, but that’s going to be hard if there is no pattern. We need to see the data to determine the best way to go about the conversion.

badger101 · May 13, 2022, 12:19pm

Hi @bruno29a thanks for the quick reply. Sure, I’m Get-requesting any profile on Pinterest. The format of the general URL is https://www.pinterest.com/insert_profile_name_here/

badger101 · May 13, 2022, 12:23pm

It’ll extract the first two rows of the collection items for a particular profile. The problem comes when some collections are given titles with non-English words (even for the English-based profiles) and/or “&” and “+” symbols.

It’s user-generated content, so I have to expect every possible special character.

badger101 · May 14, 2022, 8:34am

Found my solution! Just need to convert my strings to a dummy URL then make use of this node from the Vernalis Extension.
