I have a project to do some analytics on customer surveys but in a number of the free-text columns, I am finding HTML tags like <br/>, <b> and <strong> that are causing all kinds of havoc. I have tried to remove the tags with the “String Manipulation” node but it is not working as I hoped.
I am sure that I am not the first person to encounter this problem. Does anyone have a solution?
<p>Hi,</p>
<p>I have a project to do some analytics on customer surveys but in a number of the free-text columns, I am finding HTML tags like <code><br/></code>, <code><b></code> and <code><strong></code> that are causing all kinds of havoc. I have tried to remove the tags with the “String Manipulation” node but it is not working as I hoped.</p>
<p>I am sure that I am not the first person to encounter this problem. Does anyone have a solution?</p>
<p>tC/.</p>
</div>
Output
Hi, I have a project to do some analytics on customer surveys but in a number of the free-text columns, I am finding HTML tags like <br/>, <b> and <strong> that are causing all kinds of havoc. I have tried to remove the tags with the “String Manipulation” node but it is not working as I hoped. I am sure that I am not the first person to encounter this problem. Does anyone have a solution? tC/.
If you want to make this more “readable”, I recommend the HTML Node to Text node from Palladian, which tries to preserve the HTML semantics (e.g. new lines after block elements, and filtering out comments as well as script and style tags).
Hi,
I have a project to do some analytics on customer surveys but in a number of the free-text columns, I am finding HTML tags like <br/>, <b> and <strong> that are causing all kinds of havoc. I have tried to remove the tags with the "String Manipulation" node but it is not working as I hoped.
I am sure that I am not the first person to encounter this problem. Does anyone have a solution?
tC/.
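In case it helps anyone who needs the same kind of result outside KNIME (or inside a Python scripting node), here is a minimal sketch of the idea using only Python's standard `html.parser` module. It is not how the Palladian node is implemented; it only approximates the behaviour described above: a new line after block elements, with comments, script, and style content dropped. The names `TextExtractor`, `html_to_text`, `BLOCK_TAGS`, and `SKIP_TAGS` are made up for this example, and the list of block elements is an assumption that you would likely need to extend.

```python
import re
from html.parser import HTMLParser

# Assumption: a small, illustrative set of block-level elements.
BLOCK_TAGS = {"p", "div", "li", "ul", "ol", "tr", "table", "blockquote",
              "h1", "h2", "h3", "h4", "h5", "h6"}
# Elements whose text content should be discarded entirely.
SKIP_TAGS = {"script", "style"}


class TextExtractor(HTMLParser):
    """Collects visible text and adds a new line after block elements."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &lt; into "<".
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        # Comments never reach this method, so they are dropped implicitly.
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Collapse runs of spaces and tabs, trim each line, drop empty lines.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


# Shortened, hypothetical sample in the style of the post above.
sample = (
    "<p>Hi,</p>\n"
    "<p>I am finding tags like <code>&lt;br/&gt;</code> and <code>&lt;b&gt;</code>.</p>\n"
    "<!-- a comment --><script>var x = 1;</script>\n"
    "<p>tC/.</p>"
)
print(html_to_text(sample))
```

A real parser is used here rather than a regular expression such as `<[^>]+>` because tags that are written as text (entity-encoded, like `&lt;br/&gt;`) should survive as visible text, while actual markup, comments, and script content should not.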