About developing my own text processing node

Hi.
I want to create a node that can be used in connection with KNIME’s “text processing” nodes.

To this end, I have downloaded (installed) the “textprocessing (org.knime.ext.textprocessing)” API from the plug-ins. However, there is no API documentation, which makes development difficult.

  1. I would like to use “DocumentCell” and “TermCell” as the DataCell types of a “DataBuffredOutputTable” in my code. How can I do this? Or could you tell me what I should read?

Hey @HyojungPark,

I guess you have already installed KNIME SDK and checked out the code of the text processing extension, right? (There are also some links regarding node development on the bitbucket page.)

If you want to access Documents and Terms that are coming from cells of an input table, you can use the DocumentValue / TermValue interface:

final DataCell cell = ...
final Document d = ((DocumentValue)cell).getDocument();
// or for TermCells
final Term t = ((TermValue)cell).getTermValue();

If you want to modify documents and terms and create an output table, have a look at a basic preprocessing node (e.g. the Number Filter node). All preprocessing nodes use a PreprocessingCellFactory, which is responsible for creating the new DocumentCells. It provides a method that takes the cells of a data row and transforms them (getCells(…)).
If you want to create a preprocessing node yourself, you can reuse the PreprocessingCellFactory. You only have to use StreamablePreprocessingNodeModel as the superclass of your own preprocessing NodeModel.
The CellFactory also shows how the DocumentCells are created:

[...]
// Creates a TextContainerDataCellFactory in the constructor of the CellFactory class...
m_documentCellFac = TextContainerDataCellFactoryBuilder.createDocumentCellFactory();
[...]

public DataCell getCell(final DataRow row) {
    final Document d = ((DocumentValue)row.getCell(colIndex)).getDocument();
    final Document preprocessedDocument = process(d);
    // The TextContainerDataCellFactory is then used to create a new DocumentCell within the getCell method.
    return m_documentCellFac.createDataCell(preprocessedDocument);
}
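
To make the pattern above easier to try outside of a KNIME SDK setup, here is a minimal, self-contained sketch of the same idea: read a cell through its value interface, process the content, and wrap the result in a new cell. Everything in it is a simplified stand-in and an assumption for illustration only: TextValue, TextCell, NumberFilterFactory and the process method are hypothetical names, not classes of org.knime.ext.textprocessing, and the number filtering is only loosely modelled on what a Number Filter node might do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CellFactorySketch {

    // Hypothetical stand-in for a value interface such as DocumentValue.
    interface TextValue {
        String getText();
    }

    // Hypothetical stand-in for a cell type such as DocumentCell.
    static final class TextCell implements TextValue {
        private final String text;

        TextCell(final String text) {
            this.text = text;
        }

        @Override
        public String getText() {
            return text;
        }
    }

    // Hypothetical stand-in for a single-column cell factory.
    static final class NumberFilterFactory {
        private final int colIndex;

        NumberFilterFactory(final int colIndex) {
            this.colIndex = colIndex;
        }

        // Same shape as getCell above: read via the value interface,
        // process the content, and return a newly created cell.
        Object getCell(final List<Object> row) {
            final Object cell = row.get(colIndex);
            if (!(cell instanceof TextValue)) {
                throw new IllegalArgumentException(
                    "Column " + colIndex + " does not hold a text value");
            }
            final String original = ((TextValue) cell).getText();
            return new TextCell(process(original));
        }

        // Assumed processing step: drop whitespace-separated tokens
        // that consist only of digits (optionally signed, with . or , groups).
        static String process(final String text) {
            final List<String> kept = new ArrayList<>();
            for (final String token : text.trim().split("\\s+")) {
                if (!token.isEmpty() && !token.matches("[+-]?\\d+([.,]\\d+)*")) {
                    kept.add(token);
                }
            }
            return String.join(" ", kept);
        }
    }

    public static void main(final String[] args) {
        final List<Object> row =
            Arrays.asList("row0", new TextCell("KNIME 4 has 42 nodes"));
        final TextValue out = (TextValue) new NumberFilterFactory(1).getCell(row);
        System.out.println(out.getText()); // prints "KNIME has nodes"
    }
}
```

In a real node, the stand-ins would be replaced by the actual DocumentValue and DocumentCell types and by the TextContainerDataCellFactory shown above. The instanceof check is there because a column may contain cells of another type.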

I hope this helps a bit. If you have more general questions on developing a node, or can share a few more details about the node you want to create, feel free to reply. :slight_smile:

Best,
Julian
