HTML characters / Help converting HTML symbols within data to text

Hi,
There is already a topic on this subject, but it does not offer a complete solution. In my case, every HTML character and symbol that might appear in the data needs to be converted to plain text.

So I think there is a need for a node that converts every symbol and character it encounters. A fully comprehensive node like that would be useful to everyone who has to clean up data.

Thanks in advance for your help.

Here are some sample texts.

“Acquisition of one thousand (1000) reams of A4 paper <br>
Purpose of the contract: Acquisition of one thousand (1000) reams of A4 paper <br>
File number: EX-2023-134141167- -APN-DCClocation and sample delivery period<br>
Address:Avda. Eduardo Madero 1020 7th floor CP 1106, CABA<br>
More info: <br>
Place of receipt of physical documentation: Avda. Eduardo Madero 1020 7th floor CP 1106, CABA<br>
Executing Unit: 56 - Purchasing and Contracting - ENRE”

“AUTOCAD LT 2020 Software Subscription for 36 months<br>
Object of the contract: AUTOCAD LT 2020 Software Subscription for 36 months<br>
File number: EX-2023-126092627- -APN-SFNo. of physical documentation: Sarmiento NO 1,962 of the Autonomous City of BUENOS AIRES, (C1044AAD).<br>
Executing Unit: 64/000 - Department of Purchasing and Contracting- SRT”

“Acquisition of bakery inputs for córdoba garrison. <br>
Object of hiring: acquisition of bakery inputs for córdoba garrison. <br>
File number: EX-2023-131901844- -APN-CBAIV#EA <br>
Legal framework: Delegate Decree No. 1023/2001 Art. 25 <br>
Decree No. 1030/2016 Art.10 <br>
<br>
Mais Info: <br>
Place of reception of physical documentation: 1st floor-UOC-SAF-CDO BR AEROT IV-AVENUE ARGENTINE-CAMINO A LA CALERA KM 9.5 (CP 5023) CORDOBA CAPITAL <br>
Executing Unit: 84/39 - Parachute Brigade Command IV”

“Acquisition of cleaning items for the maintenance of offices <br>
Object of hiring: Acquisition of cleaning articles for the maintenance of cleaning and hygiene of the offices of the C.I. <br>
File number: EX-2023-133219107- -APN-DGIA#ara <br>
Legal framework: Delegate Decree No. 1023/2001 Art. 25 <br>
Decree No. 1030/2016 Art.14 <br>
<br>
Mais Info: <br>
Place of reception of physical documentation: av.comodoro py 2055 - CABA - 4th floor - office 4-36 <br>
Executing Unit: 38/31 - General Directorate of Intelligence of the Navy”

This answer did not give me the result I wanted. A fully comprehensive solution is required: a node that converts all characters to text, not just certain ones, would be the definitive solution for everyone.

Hi @umutcankurt

That sounds like a job for the Markup Tag Filter.

test data1.xlsx (38.7 KB)
Hi @ArjenEX
A sample data file is attached. Can you please make me an example?

There isn’t much to make, to be honest :wink:

Markup Tag Filter → default settings

The Strings To Document node is optional; it depends a bit on what you ultimately want to do with the text. The Markup Tag Filter will translate &lt;br&gt; to <br> in the first run, so just place two Markup Tag Filter nodes in sequence and you will have clean text.
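For anyone who wants to check the same two-step idea outside of KNIME, here is a minimal Python sketch. It is not how the Markup Tag Filter node is implemented; it only mirrors the logic described above: first decode the HTML entities, then remove the tags that the decoding exposes. The function name `clean_markup` and the choice to turn `<br>` into a line break are assumptions made for illustration.

```python
import html
import re

# Naive tag pattern (an assumption for this sketch): anything between < and >.
# It would also remove plain text such as "a <b> c", so use it with care.
TAG_RE = re.compile(r"<[^>]+>")
BR_RE = re.compile(r"<br\s*/?>", flags=re.IGNORECASE)


def clean_markup(text: str) -> str:
    """Hypothetical helper: decode HTML entities, then strip the tags."""
    # Pass 1: decode entities, e.g. "&lt;br&gt;" becomes "<br>", "&oacute;" becomes "ó".
    decoded = html.unescape(text)
    # Pass 2: turn line-break tags into newlines and drop any other tag.
    with_breaks = BR_RE.sub("\n", decoded)
    stripped = TAG_RE.sub("", with_breaks)
    # Trim each line and discard the empty ones left behind by the tags.
    lines = [line.strip() for line in stripped.split("\n")]
    return "\n".join(line for line in lines if line)


sample = (
    "Acquisition of bakery inputs for c&oacute;rdoba garrison. &lt;br&gt;"
    "File number: EX-2023-131901844- -APN-CBAIV#EA &lt;br&gt;"
)
print(clean_markup(sample))
```

A single pass that only decodes entities would leave a literal `<br>` in the text, which matches the observation that one Markup Tag Filter alone is not enough.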

:+1: :beer: @ArjenEX Thanks a lot. It definitely works, but only after adding the second one.

Final: [image]
