Topic Detection on IMDB reviews - not removing stopwords

I’ve just run the example, but the stopword filter doesn’t appear to work: the topics include his, her, and, the, etc. The filter option is set to Stopword lists: English. All other options except Case sensitive are greyed out (Use built-in list is ticked and greyed out). There are no error messages in the Console. I’m a complete newbie, so I would appreciate being pointed in the right direction. Thank you.

Hi @StephenG and welcome to the forum -

I’m not sure exactly which workflow you’re describing, but I’ll attempt to answer anyway.

A common issue with the text processing nodes is selecting the wrong document column. You’ll notice that the first time you apply such a node in a workflow, its dialog offers an option to append the processed text as a new column for downstream nodes:

[Screenshot: Stop Word Filter dialog]

When doing any downstream processing, be careful to select the Preprocessed Document column in every subsequent node. For example, in the Document Viewer node:

[Screenshot: Document Viewer dialog]

If you’re inconsistent in your selections, then sometimes it will seem like the text processing nodes aren’t working.

Does that help?


Thanks Scott. The workflow comes with the initial download.
I assumed it would run without needing to configure since it’s a demo. But now I’ve learned some things!

If you run it as it is, the output includes stopwords. The Stop Word Filter is set to Append column: Preprocessed Document, but the Topic Extractor is set to read the Document column. Changing the Topic Extractor to read the Preprocessed Document column, as you indicated, results in output without stopwords. A colleague suggested changing the output of the Stop Word Filter to ‘replace column’ instead, and this also works.
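
For anyone who finds it easier to see this in code, here is a minimal plain-Python sketch of the same mismatch. It is an analogy, not KNIME code: the functions `stop_word_filter` and `top_terms`, the tiny stopword set, and the sample sentence are all invented for illustration. Only the column names come from the workflow.

```python
# Plain-Python analogy, not KNIME code: each row is a dict of columns.
STOPWORDS = {"his", "her", "and", "the"}  # tiny illustrative list, not KNIME's built-in one


def stop_word_filter(rows, source="Document", append_as="Preprocessed Document"):
    """Remove stopwords from the `source` column.

    If `append_as` names a column, the result is written there and `source`
    is left untouched. If `append_as` is None, `source` itself is replaced.
    """
    target = append_as if append_as is not None else source
    out = []
    for row in rows:
        kept = [w for w in row[source].split() if w.lower() not in STOPWORDS]
        new_row = dict(row)  # copy so the input rows are not modified
        new_row[target] = " ".join(kept)
        out.append(new_row)
    return out


def top_terms(rows, column):
    """Stand-in for a downstream node: the distinct lowercased terms in one column."""
    return {w.lower() for row in rows for w in row[column].split()}


rows = [{"Document": "The hero and his dog saved her town"}]

appended = stop_word_filter(rows)                  # append a new column
replaced = stop_word_filter(rows, append_as=None)  # replace the original column

leaky = top_terms(appended, "Document")               # wrong column: stopwords survive
fixed = top_terms(appended, "Preprocessed Document")  # fix 1: read the appended column
also_fixed = top_terms(replaced, "Document")          # fix 2: replace the column upstream
```

In this sketch the filter does its job either way. The stopwords only reappear in `leaky` because the downstream step reads the untouched original column, which is what happened with the Topic Extractor.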


Ah, the Example Workflows folder, of course. I was thinking about workflows on the Hub, and forgot about what was right under my nose. :)

Glad you got it to work. It sounds like we need to tweak the Example workflows a little bit, since I noticed a deprecated node in there as well. Thanks for pointing this workflow out!

