Hi guys!
So, I’ve got a bunch of Word documents that I need to process.
The documents contain job descriptions that I need to extract, transform, and write out as an import file for the target system.
The documents are formatted somewhat like this:
1.1 Job Title 1
Responsibilities
Bla bla bla bla…
[table]
Requirements
Bla bla bla…
1.2 Job Title 2
Responsibilities
Bla bla bla bla…
Requirements
Bla bla bla…
Some of the job descriptions have a table.
So far in KNIME, I’ve been able to set up some rules to identify the paragraphs and separate the job positions. I end up with a nice table with one row per paragraph, plus extra columns that carry the section number (e.g. “1.1”) down until the next job title appears. All this is working fine, but what would be the best way for me to also preserve the tables?
The end result will be a text file to be imported into the target system, which supports HTML.
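
To show the kind of output I'm after, here is a rough sketch in Python (which I assume could run in a Python Script node, though I haven't tried that). It is not what my workflow does today. It reads `word/document.xml` straight out of the .docx zip, walks the body in document order, and emits each paragraph as text and each table as an HTML `<table>` string. The function names (`body_blocks`, `table_to_html`, `read_docx`) are made up for illustration, and it ignores merged cells, nested tables, and content controls.

```python
import html
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by word/document.xml inside a .docx
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def text_of(elem):
    """Concatenate the text of all runs (w:t) below an element."""
    return "".join(t.text or "" for t in elem.iter(W + "t"))


def table_to_html(tbl):
    """Render a w:tbl element as a plain HTML table (no styling, no merged cells)."""
    rows = []
    for tr in tbl.findall(W + "tr"):
        cells = []
        for tc in tr.findall(W + "tc"):
            # A cell may hold several paragraphs; join them with line breaks.
            paras = [html.escape(text_of(p)) for p in tc.findall(W + "p")]
            cells.append("<td>" + "<br>".join(paras) + "</td>")
        rows.append("<tr>" + "".join(cells) + "</tr>")
    return "<table>" + "".join(rows) + "</table>"


def body_blocks(document_xml):
    """Yield ("paragraph", text) or ("table", html) in document order."""
    body = ET.fromstring(document_xml).find(W + "body")
    for child in body:
        if child.tag == W + "p":
            yield ("paragraph", text_of(child))
        elif child.tag == W + "tbl":
            yield ("table", table_to_html(child))


def read_docx(path):
    """Open a .docx file and return its blocks as a list."""
    with zipfile.ZipFile(path) as z:
        return list(body_blocks(z.read("word/document.xml")))
```

Each table would then arrive as one row (like a paragraph), so the same rules that assign the “1.1” number to paragraphs could assign it to the table as well.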
Thanks for any information that can point me in the right direction!