The response body from my GET Request node contains hyphen-separated tokens. Examples:
Sometimes longer tokens contain encoded special characters in the middle, for example when & or + is involved. Another example is estudios-de-espa%C3%B1ol.
I'd like to convert them back to the original characters, but without diacritics. How do I do that? I'd also like to keep it low-code or no-code and avoid the Java Snippet node.
Thanks in advance!
Hi @badger101, first of all, you have to use the correct character encoding.
%C3%B1 is the percent-encoded UTF-8 form of the Latin character ñ, so
estudios-de-espa%C3%B1ol should actually read estudios-de-español.
After that, you can remove diacritics with the removeDiacritic() function (via String Manipulation, Column Expressions, etc.).
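For illustration only, here is a minimal Python sketch of the two steps described above: percent-decode the token as UTF-8, then strip diacritics. It is not how KNIME implements removeDiacritic(); it assumes the tokens are percent-encoded UTF-8, and it removes only combining marks left after Unicode NFD decomposition, so characters such as ø or ß (which do not decompose) pass through unchanged.

```python
import unicodedata
from urllib.parse import unquote


def decode_token(token: str) -> str:
    """Percent-decode a URL token, interpreting the decoded bytes as UTF-8."""
    return unquote(token, encoding="utf-8")


def remove_diacritics(text: str) -> str:
    """Decompose to NFD, then drop the combining marks (accents, tildes, ...)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    raw = "estudios-de-espa%C3%B1ol"
    decoded = decode_token(raw)
    print(decoded)                     # estudios-de-español
    print(remove_diacritics(decoded))  # estudios-de-espanol
```

The order matters: decoding must happen first, because the tilde only exists as a character after %C3%B1 has been turned back into ñ.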
Hi @bruno29a, I'm not uploading files; I'm downloading data via the GET Request node, as I mentioned in the post. Unless there's a node that can change the encoding, I'm not sure that's possible. (I want to do this without downloading the data to my desktop and re-uploading it via the File Reader afterward.)
Hi @badger101, can you share the URL so we can see the data? If the data already arrives that way, the values may need to be converted manually, but that will be hard if there is no pattern. We need to see the data to determine the best way to go about the conversion.
Hi @bruno29a, thanks for the quick reply. Sure: I'm sending GET requests to arbitrary Pinterest profiles. The general URL format is https://www.pinterest.com/insert_profile_name_here/
The request extracts the first two rows of collection items for a given profile. The problem arises when some collections are given titles with non-English words (even on English-language profiles) and/or "&" and "+" symbols.
It's user-generated content, so I have to expect every possible special character.
Found my solution! I just needed to convert my strings into a dummy URL and then use this node from the Vernalis extension.
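The dummy-URL workaround amounts to percent-decoding the token once it sits in a URL. The Python sketch below only illustrates that idea; it is not the Vernalis node. The base https://example.com/ and the sample tokens are hypothetical, and the sketch assumes tokens arrive already percent-encoded (a literal ? or # would end the URL path early).

```python
from urllib.parse import unquote, unquote_plus, urlsplit

# Hypothetical dummy base: any valid scheme + host works for this sketch.
DUMMY_BASE = "https://example.com/"


def decode_via_dummy_url(token: str) -> str:
    """Wrap the token in a dummy URL, then percent-decode its path component."""
    path = urlsplit(DUMMY_BASE + token).path  # "/" + token, still encoded
    return unquote(path[1:])                  # drop the leading "/" and decode


if __name__ == "__main__":
    print(decode_via_dummy_url("diy-%26-crafts"))  # diy-&-crafts
    # Caution for the "+" case: form-style decoding turns "+" into a space.
    print(unquote("a+b"), unquote_plus("a+b"))     # a+b a b
```

Whichever decoder is used, it is worth checking how it treats a literal "+", since form-style decoding (unquote_plus here) would silently turn titles containing "+" into titles containing spaces. How the Vernalis node handles this is not covered in the thread.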