Guided Labeling for Document Classification

#1

This workflow defines a fully automated, web-based application that labels your data using active learning. It was designed so that business analysts can easily work through documents and assign them to any number of classes. In each iteration the user labels more documents, and the model is retrained on all instances labeled so far. With every new iteration, the model proposes the documents it is most uncertain about, as ranked by the entropy scorer node. Once the user is happy with the performance achieved with the available labels, they can exit the loop and export the model to label the remaining instances.
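
To make the uncertainty step concrete, here is a minimal sketch of entropy-based selection in plain Python. It is not the implementation of the KNIME node; the document ids, the probabilities, and the helper names `entropy` and `most_uncertain` are all made up for illustration. It only assumes that the model outputs one class-probability distribution per unlabeled document.

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of one predicted class distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def most_uncertain(predictions, k):
    """Return the ids of the k documents with the highest prediction entropy."""
    ranked = sorted(
        predictions,
        key=lambda doc_id: entropy(predictions[doc_id]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical class probabilities for three unlabeled documents
predictions = {
    "doc_a": [0.98, 0.02],  # model is confident
    "doc_b": [0.50, 0.50],  # model cannot decide, entropy is maximal
    "doc_c": [0.80, 0.20],
}

# Documents to show to the user in the next labeling iteration
to_label = most_uncertain(predictions, k=2)
print(to_label)  # ['doc_b', 'doc_c']
```

A uniform distribution gives the highest entropy, so documents the model cannot decide on are proposed first. The same function works unchanged for more than two classes.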


This is a companion discussion topic for the original entry at https://kni.me/w/y5nhpbd1PP5F4WKH

#2

Hello everyone,
I think it makes sense to post some screenshots of the views in this workflow.
As you can see they were taken from a web browser using the KNIME WebPortal:

You can use Tag Cloud terms to quickly filter documents that should receive the same label.
This screenshot is about labeling for sentiment analysis of movie reviews.

The workflow also works for multiclass classification, for example topic detection:

I am hiding a few words in the screenshots to be politically correct :wink:

If you have any questions let me know!
Cheers
Paolo


#3

Hi,
From the screenshots it looks like a really useful utility for labeling data, so I downloaded the workflow and tried to execute it, but I am running into problems. Maybe you could help me.
No dialog box or any of the views from your screenshots pops up, and I get warnings. I installed all the plugins that KNIME asked me to install to run the workflow. The execution ends with a red cross on the Deploy node, showing the error “XGBoost Linear Ensemble Learner 2:357:0:311 The selected target column is no longer valid. Please select a valid column in the dialog.”
Please let me know what I am doing wrong as I would really like to use this workflow.

Some of the initial warnings that I get are given below:

WARN Text Output 2:349:0:308 Errors overwriting node settings with flow variables: Unknown variable “css_title”
WARN File Upload 2:349:0:327 Errors overwriting node settings with flow variables: Unknown variable “css”
WARN Missing Value 2:349:0:1182:1165 The current settings use missing value handling methods that cannot be represented in PMML 4.2
WARN Table Creator 2:283 Node created an empty data table.
WARN GroupBy 2:371:324:287 No grouping column included. Aggregate complete table.
WARN XGBoost Linear Ensemble Learner 2:371:325:311 The selected target column is no longer valid. Please select a valid column in the dialog.
WARN XGBoost Linear Ensemble Learner 2:371:332:311 The selected target column is no longer valid. Please select a valid column in the dialog.
WARN GroupBy 2:371:324:287 No grouping column included. Aggregate complete table.
WARN XGBoost Linear Ensemble Learner 2:371:325:311 The selected target column is no longer valid. Please select a valid column in the dialog.
WARN XGBoost Linear Ensemble Learner 2:371:332:311 The selected target column is no longer valid. Please select a valid column in the dialog.
WARN GroupBy 2:371:324:287 No grouping column included. Aggregate complete table.
WARN XGBoost Linear Ensemble Learner 2:371:325:311 The selected target column is no longer valid. Please select a valid column in the dialog.
WARN XGBoost Linear Ensemble Learner 2:371:332:311 The selected target column is no longer valid. Please select a valid column in the dialog.
WARN XGBoost Linear Ensemble Learner 2:371:325:311 The selected target column is no longer valid. Please select a valid column in the dialog.
WARN XGBoost Linear Ensemble Learner 2:371:332:311 The selected target column is no longer valid. Please select a valid column in the dialog.
WARN IF Switch 2:371:341 Node created an empty data table.
WARN End IF 2:371:343 Node created an empty data table.
WARN Punctuation Erasure 2:365:1167:34 Node created an empty data table.
WARN Number Filter 2:365:1167:35 Node created an empty data table.
WARN N Chars Filter 2:365:1167:36 Node created an empty data table.
WARN Case Converter 2:365:1167:37 Node created an empty data table.
WARN Stop Word Filter 2:365:1167:372 Node created an empty data table.
WARN Bag Of Words Creator 2:365:1167:369 Node created an empty data table.
WARN TF 2:365:1167:367 Node created an empty data table.
WARN Group Loop Start 2:365:1167:436:411 Node created an empty data table.
WARN GroupBy 2:365:1167:436:412 Empty input table found
WARN Math Formula 2:365:1167:436:414 Node created an empty data table.
WARN Column Filter 2:365:1167:436:415 Node created an empty data table.
WARN Row Filter 2:365:1167:436:416 Node created an empty data table.
WARN Math Formula 2:365:1167:436:418 Node created an empty data table.
WARN Loop End 2:365:1167:436:419 Node created an empty data table.
WARN Pivoting 2:365:1167:436:421 Node created empty data tables on all out-ports.
WARN Missing Value 2:365:1167:436:422 Node created an empty data table.
WARN RowID 2:365:1167:436:425 Node created an empty data table.
WARN Transpose 2:365:1167:436:424 Node created an empty data table.
WARN RowID 2:365:1167:436:435 No row key column selected generate a new one
WARN RowID 2:365:1167:436:435 Node created an empty data table.
WARN Math Formula (Multi Column) 2:365:1167:436:423 Node created an empty data table.
WARN Row Filter 2:365:1167:436:426 Node created an empty data table.
WARN Transpose 2:365:1167:436:427 Node created an empty data table.
WARN Rule-based Row Filter 2:365:1167:436:432 Line: 1: Not a column: Row0
$Row0$ = $TF abs$ => TRUE
^
WARN RowID 2:365:1167:436:428 Node created an empty data table.


#4

Hi @junejo,
to use this workflow you need to execute it iteration by iteration, labeling more and more documents as you go. To get the full potential of this workflow, deploy it on the KNIME WebPortal, which unfortunately comes only with KNIME Server and is not yet integrated in the KNIME Analytics Platform.

To troubleshoot the workflow and label documents in the Analytics Platform, however, start by right-clicking the “Label” component, then execute it and open its view. Type in some label abbreviations and select “Apply and Close” at the bottom right corner of the view. Do not execute the Loop End node just yet. Instead, select the Loop End node and click twice on this button in the top toolbar of the KNIME Analytics Platform: [toolbar button image]

You should see the workflow perform one iteration, and you should then be able to open the view of the second iteration with the tag cloud present.

Read this article to know more:

https://www.knime.com/blog/labeling-with-active-learning

Cheers
Paolo
