Guided Labeling for Document Classification

This workflow defines a fully automated web-based application that labels your data using active learning. It was designed so that business analysts can easily go through documents and label them with any number of classes. In each iteration, the user labels more documents and the model is retrained on the instances labeled so far. With every new iteration, the model proposes the documents it is most uncertain about, as ranked by the Entropy Scorer node. Once the user is satisfied with the performance achieved with the available labels, they can exit the loop and export the model to label the remaining instances.
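For readers curious about what "most uncertain" means here, below is a minimal Python sketch of entropy-based uncertainty sampling. It is only an illustration of the idea, not the implementation of the Entropy Scorer node; the function names and example probabilities are hypothetical. Each document's predicted class distribution gets a Shannon entropy score, and the highest-scoring documents are the ones proposed for labeling next.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of one predicted class distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def most_uncertain(predictions, k):
    """Return the ids of the k documents whose predictions have the highest entropy.

    `predictions` maps a document id to the model's class probabilities
    for that document (one probability per class, summing to 1).
    """
    ranked = sorted(predictions, key=lambda doc_id: entropy(predictions[doc_id]), reverse=True)
    return ranked[:k]


# Hypothetical class probabilities for four unlabeled movie reviews
# (classes: "good", "bad").
predictions = {
    "review_1": [0.95, 0.05],  # model is confident
    "review_2": [0.55, 0.45],  # model is unsure
    "review_3": [0.50, 0.50],  # maximally unsure for two classes
    "review_4": [0.80, 0.20],
}

print(most_uncertain(predictions, k=2))  # ['review_3', 'review_2']
```

Because the score depends only on the probability vector, the same code works for two classes (sentiment) or more (topic detection); with n classes the maximum possible entropy is log2(n) bits.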


This is a companion discussion topic for the original entry at https://kni.me/w/y5nhpbd1PP5F4WKH

Hello everyone,
I think it makes sense to post screenshots of some of this workflow's views.
As you can see, they were taken in a web browser using the KNIME WebPortal:

You can use the terms in the Tag Cloud to quickly filter documents that should receive the same label.
This screenshot is about labeling for sentiment analysis of movie reviews.
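The idea behind filtering by a Tag Cloud term can be roughly sketched in Python: count how many unlabeled documents contain each word (the weights a tag cloud would display), then pull out every document containing a chosen word so those documents can be reviewed and labeled together. This is a simplified illustration assuming plain whitespace tokenization; the function names and sample reviews are hypothetical, and real text preprocessing (punctuation, stop words, stemming) is not reproduced here.

```python
from collections import Counter


def term_counts(documents):
    """Count in how many documents each lowercase word appears (tag-cloud weights)."""
    counts = Counter()
    for text in documents.values():
        # Use a set so each word is counted at most once per document.
        counts.update(set(text.lower().split()))
    return counts


def documents_with_term(documents, term):
    """Return ids of the documents that contain `term` as a whole word (case-insensitive)."""
    term = term.lower()
    return [doc_id for doc_id, text in documents.items() if term in text.lower().split()]


# Hypothetical unlabeled movie reviews
documents = {
    "r1": "A great movie with a great cast",
    "r2": "Boring plot and a boring ending",
    "r3": "Great soundtrack but weak acting",
}

print(term_counts(documents)["great"])           # 2
print(documents_with_term(documents, "Great"))   # ['r1', 'r3']
```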

The workflow also works for multiclass classification, for example topic detection:

I have hidden a few words in the screenshots to be politically correct :wink:

If you have any questions let me know!
Cheers
Paolo

Hi,
From the screenshots it looks like a really useful utility for labeling data, so I downloaded the workflow and tried to execute it, but I am running into problems. Maybe you could help me.
None of the dialog boxes or views shown in your screenshots pop up, and I get warnings. I installed all the plugins that KNIME asked me to install to run the workflow. Execution ends with a red cross on the Deploy node showing the error “XGBoost Linear Ensemble Learner 2:357:0:311 The selected target column is no longer valid. Please select a valid column in the dialog.”
Please let me know what I am doing wrong as I would really like to use this workflow.

Some of the initial warnings that I get are given below:

WARN Rule-based Row Filter 2:365:1167:436:432 Line: 1: Not a column: Row0
$Row0$ = $TF abs$ => TRUE
^

Hi @junejo,
to use this workflow you need to execute it iteration by iteration, labeling more and more documents each time. To get the full potential of this workflow, deploy it on the KNIME WebPortal, which unfortunately comes with KNIME Server and is not yet integrated into the KNIME Analytics Platform.

To troubleshoot the workflow and still label documents, start by right-clicking the “Label” Component, executing it and opening its view. Type in some label abbreviations and select “Apply and Close” in the bottom right corner of the view. Do not execute the Loop End node just yet. Instead, select the Loop End node and click this button on the top toolbar of the KNIME Analytics Platform twice.

You should see the workflow perform one iteration, and you should then be able to open the view of the second iteration with the tag cloud present.

To learn more, read this article:

https://www.knime.com/blog/labeling-with-active-learning

Cheers
Paolo


Hey everyone,
this workflow has just been updated to support both Active Learning and Weak Supervision with the new 4.1 extensions.

Highlights:

The new Active Learning Loop is quite similar to the good old Recursive Loop, just more intuitive to use.

Use Labeling Functions (or rules) to label documents (for example: if “great movie” appears in the document body, then the sentiment label is “good”).

Labeling View added: you now need only buttons to label your instances, with no need for the Table Editor anymore.

Visualize the Labeling Functions in a Network View to measure how well they correlate with the labels provided by the user.

Train an XGBoost model using the probabilistic output of the Weak Label Predictor node.
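To make the weak supervision idea more concrete, here is a minimal Python sketch, under clearly stated assumptions. Labeling functions are modeled as plain rules that either vote for a label or abstain; the first one mirrors the “great movie” example above, and the others are hypothetical. Their votes are combined by a simple majority vote into a probability per class. This majority vote is a stand-in chosen for simplicity, not the label model the Weak Label Predictor node actually uses. Finally, each labeling function is compared against user-provided labels, as a rough analogue of what the Network View lets you inspect.

```python
from collections import Counter

ABSTAIN = None  # returned by a labeling function when its rule does not fire


# Hypothetical labeling functions; the first mirrors the "great movie" example.
def lf_great_movie(text):
    return "good" if "great movie" in text.lower() else ABSTAIN


def lf_waste_of_time(text):
    return "bad" if "waste of time" in text.lower() else ABSTAIN


def lf_boring(text):
    return "bad" if "boring" in text.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_great_movie, lf_waste_of_time, lf_boring]


def weak_label(text, classes=("good", "bad")):
    """Turn the non-abstaining votes into a probability per class (simple majority vote).

    Returns None when every labeling function abstains.
    """
    votes = Counter(lf(text) for lf in LABELING_FUNCTIONS)
    votes.pop(ABSTAIN, None)  # abstentions carry no vote
    total = sum(votes.values())
    if total == 0:
        return None
    return {c: votes[c] / total for c in classes}


def lf_accuracy(lf, labeled_docs):
    """Among user-labeled documents on which `lf` fires, the fraction where it agrees with the user."""
    fired = [(lf(text), label) for text, label in labeled_docs if lf(text) is not ABSTAIN]
    if not fired:
        return None  # the rule never fired, so there is nothing to compare
    return sum(pred == label for pred, label in fired) / len(fired)


# Hypothetical documents labeled by the user
labeled = [
    ("A great movie", "good"),
    ("So boring", "bad"),
    ("Boring at first, then a great movie", "good"),
]

print(weak_label("Boring, a waste of time"))   # {'good': 0.0, 'bad': 1.0}
print(lf_accuracy(lf_boring, labeled))         # 0.5
```

The probabilistic labels returned by `weak_label` are the kind of output a downstream classifier (XGBoost in this workflow) could then be trained on; that training step is not shown here.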
