Using Jupyter from KNIME to embed documents

Hub · June 27, 2019, 7:27am

Uses functionality provided in a Jupyter notebook to embed documents from a topic-space representation into 2D Euclidean space. The embedding is done using scikit-learn's implementation of the t-SNE algorithm.
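
For readers who want a feel for the embedding step outside KNIME, here is a minimal sketch of projecting document-topic vectors into 2D with scikit-learn's `TSNE`. It is not the code from the notebook: the helper name `embed_topics_2d`, the perplexity value, and the synthetic Dirichlet data are all assumptions made for illustration.

```python
import numpy as np
from sklearn.manifold import TSNE


def embed_topics_2d(doc_topics, perplexity=15.0, seed=0):
    """Project an (n_docs, n_topics) matrix to (n_docs, 2) with t-SNE.

    Hypothetical helper; the parameter values are illustrative assumptions,
    not the settings used in the original notebook.
    """
    doc_topics = np.asarray(doc_topics, dtype=float)
    tsne = TSNE(
        n_components=2,         # embed into the 2D plane
        perplexity=perplexity,  # must be smaller than the number of documents
        init="pca",             # PCA initialisation for a more stable layout
        random_state=seed,      # fixed seed so repeated runs are comparable
    )
    return tsne.fit_transform(doc_topics)


# Synthetic stand-in for a topic model's output (an assumption, not real data):
# three groups of 20 documents, each group dominated by a different topic.
rng = np.random.default_rng(0)
n_topics = 10
groups = []
for dominant in (0, 4, 8):
    alpha = np.full(n_topics, 0.2)
    alpha[dominant] = 5.0
    groups.append(rng.dirichlet(alpha, size=20))
doc_topics = np.vstack(groups)  # each row sums to 1, like topic weights

coords = embed_topics_2d(doc_topics)
print(coords.shape)  # (60, 2): one 2D point per document
```

Plotting `coords` as a scatter plot, colored by the query that produced each document, is one way to check whether related documents end up close together.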

This is a companion discussion topic for the original entry at https://kni.me/w/cD6OEYJXZmkXuEtm

mlauber71 · March 23, 2019, 3:46pm

The Jupyter Notebook seemingly did not make it into the final workflow, but it should be this one:

https://gist.github.com/greglandrum/88d1739577c26b01e871d83c60c8898e

tSNE_for_text.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Working with t-SNE from scikit-learn\n",
    "\n",
    "I wanted to do an experiment and try embedding documents in 2D space so that the proximity can be used to identify clusters of related documents. I have a set of documents extracted from pubmed based on queries for disease names - Jeany described the construction of the dataset in her [Fun with Tags](https://www.knime.com/blog/fun-with-tags) blog post - and I've used those to build a topic model. I want to try the embedding using the projection of the documents into the topic space.\n",
    "\n",

This file has been truncated.

Also, if the notebook is placed in a subfolder of the workflow, the workflow-relative KNIME path should look something like this:
knime://knime.workflow/jupyter_notebooks/tSNE_for_text.ipynb

Thank you for this interesting workflow.

MarcelW · March 24, 2019, 5:45pm

Thanks, mlauber71. We will add the missing file ASAP.