topic extraction

Hello,
I am trying to remove one word, “http”, from my documents because it interferes
with the topic extraction analysis. I tried the Stop Word Filter node and the
Dictionary Filter node, but neither worked. Any suggestions?

Best,
Begum

Are you sure that when you applied the nodes you mentioned, you applied them consistently to the Preprocessed Document column (as opposed to the Document column)?

Along those lines, are you sure you are reviewing the proper column after making the changes?

If you’re still having trouble, please post an example exported workflow and we’ll see if we can figure out the problem.

Hello,

I tried applying it in many different places, but it did not work. I have attached the workflow without the stop word node; it would be great if you could tell me where the right place to apply it is. I want to remove “http” from the topics, which is why I want to use this node.

topic extraction beg 1.knwf (327.1 KB)

I really appreciate your help!
Begum

This is where I applied it, and it seems to work fine for me:

2020-04-07 13_53_30-Window

2020-04-07 13_55_02-Dialog - 2_297_0_43 - Stop Word Filter (Custom Filter)

This is at the end of the first preprocessing metanode.

I really appreciate your help!

I tried to use the same path, but I still got an error. I could not run the analysis after the Case Converter node. I have attached screenshots and would be glad if you could tell me where the error might be.

Thank you!
Begum

But what is the error message? Check the log, or mouse over the yellow triangle in the node’s traffic light.

Could you check whether I have set up the Table Creator and Stop Word Filter nodes correctly?

I really appreciate your help!

At a glance, it looks correct. But since you’re still running into problems, I need you to relay the error message itself. Alternatively, you can upload your current workflow (along with some data - important!).

Here is my workflow and a sample of my data! Even though the workflow runs successfully, I still get “http” as one of my topic terms. Since it carries no meaning as a topic, I want to exclude it. It also shows up in the tag cloud I created.

Thank you very much.

topic extraction beg 1.knwf (329.5 KB)

tweets 03.11.20-03.12.20.xlsx (730.8 KB)

When I run your workflow, I don’t see http as one of the terms in the two topics:

2020-04-09 15_57_44-Topic terms - 0_296 - Topic Extractor (Parallel LDA) (Extract topics from)

And when I look at the sentences generated after postprocessing, I don’t see any that have " http " in them (note the spaces). There are strings that have http as a part of them, such as httpwwwcnncom, which are a result of stripping punctuation, but those aren’t frequent enough to show up in the topics.
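To illustrate why the fused strings survive, here is a rough sketch in plain Python. It is not how the KNIME nodes are implemented; the function names, the sample tweet, and the URL regex are all assumptions made up for the example. It mimics erasing punctuation and then filtering on an exact-match stop word list, and shows that removing URLs before the punctuation step is one possible way to avoid tokens like httpwwwcnncom.

```python
import re
import string

# Hypothetical URL pattern for illustration; real tweets may need a broader one.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def erase_punctuation(text):
    """Delete every ASCII punctuation character (no space is inserted)."""
    return text.translate(PUNCT_TABLE)


def filter_stop_words(tokens, stop_words):
    """Drop tokens that match a stop word exactly (whole-term match only)."""
    return [t for t in tokens if t not in stop_words]


def preprocess(text, stop_words, strip_urls_first):
    """Optionally strip URLs, then erase punctuation, lowercase, and filter."""
    if strip_urls_first:
        text = URL_PATTERN.sub(" ", text)
    tokens = erase_punctuation(text).lower().split()
    return filter_stop_words(tokens, stop_words)


tweet = "Breaking news http://www.cnn.com read more"  # made-up sample
stop_words = {"http"}

# The URL collapses into one token, which is not equal to "http", so it stays.
print(preprocess(tweet, stop_words, strip_urls_first=False))
# ['breaking', 'news', 'httpwwwcnncom', 'read', 'more']

# Removing the URL before punctuation erasure leaves nothing to fuse.
print(preprocess(tweet, stop_words, strip_urls_first=True))
# ['breaking', 'news', 'read', 'more']
```

A standalone "http" term is still removed by the exact-match filter in both cases; only the fused URL tokens slip through.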

So I’m a bit confused, because as far as I can tell the workflow is doing what you have asked it to do. What am I missing?