OIS - MOSAIQ - changing RTF to plain text

Once again I would appreciate help if one of the SQL gurus can assist.

In my MSSQL database (MOSAIQ_REPORT_SERVER) there is a table ({Notes}) which contains a field ({Notes.notes}) of the memo type. It holds long text entries and they are formatted using RTF. I am able to successfully identify the notes that I want.

So, what was entered as this in the GUI (which I have no influence over):

DOB: 8.7.1839 - 51yo
SOUTHERN IML: 31.12.2016
WOLLONGONG DIAGNOSTICS

Comes out as this:

{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Arial;}}\viewkind4\uc1\pard\fs18 DOB: 8.7.1839 - 51yo\par
\par
SOUTHERN IML: 31.12.2016\par
\par
WOLLONGONG DIAGNOSTICS\par }

I have found code that apparently will return the text to a plain format, but it defines a SQL function to be added to your database. The code is found here.

Obviously I don’t want to be playing around with my departmental database! So is there some way of using this SQL code inside a workflow object to allow identified RTF text inputs to be outputted as plain text?

Many thanks for any help.

A

Hi @AAM,

you could use the TiKa parser Node. It extracts text from rft documents.

Best regards
Andrew

2 Likes

Hi there @AAM,

when you say “comes out as this” where does it come out? In KNIME when you read it or?

Br,
Ivan

I mean that the output of the extraction looks like this:

While the view in the GUI shows:

DOB: 25.2.1962 - 50yo
SOUTHERN IML: CYTO - 5.10.2012

Hope this answers the question,

A

Dear Andrew,

Unfortunately I don’t seem to be able to put TiKa into the workflow.

The output from my database query has a field that contains text with RTF control characters. I think that I am looking for something like the String Manipulation node where a column is specified for the transformation to take place to return plain text.

There is an instance of the field in the post above this one.

Hi @AAM,

Not really :smiley: I’m wondering where is this output and how is it KNIME connected?

Br,
Ivan

It is the output from a DB Query node that SELECTs a field “Notes”.“notes” FROM “Notes” into the output.

It is the first step of the extraction process. No manipulation has occurred.

Do you need to see the workflow and output?

A

Hi @AAM,

oh yes, I’m sorry. The TiKa parser unfortunately works on file level. You would have to write the output from the database into a temporary file to read it in again later.

Best regards
Andrew

Hi @AAM,

you can try what @Andrew_Steel is proposing but I would rather prefer (if possible) to read it in KNIME as seen in GUI. Either by manipulating query or by choosing different Input Type Mapping option in connector node. So what is the database type in MySQL(haven’t found memo type to be honest)? And when you read it with DB Query Reader node you have String type I suppose?

Br,
Ivan

Hi @AAM,

I don’t think this is the best solution but it works

ParsRTFwithPython.knwf (9.9 KB)

Best Regards
Andrew

1 Like

Thanks Ivan, the ideas might bring a solution closer!

It’s a MS SQL database.
I’ll check later about it’s type. I know that I can select an option in CRW to deliver plain text, so I’ll have another look there also.

Once again, thank you for the time & effort.

A

Hello @Andrew_Steel

Thank you for the Python workflow. I’ll certainly give it a go soon.

Thanks for the time & effort in searching. Personally I find climbing this learning curve demoralising when alone & invigorating when there is collaboration! I can see my abilities and competence increasing by the day, and that is what keeps me working.

1 Like

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.