1. If a title could not be extracted from the PDF, the file path is used as the title. The title is treated like any other text in the document, so its words will appear as terms.
2. The number filter removes only terms that are purely numeric, e.g. 1234 will be filtered whereas abc123 will not.
3. You need to filter the data set before creating a tag cloud, e.g. using the Row Filter node. The tag cloud node itself does no filtering.
Tip: use the Bag of Words creator after the filtering and preprocessing, so that these operations are applied directly to the documents instead of the bag of words (BoW). This will increase speed.
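To illustrate the number filter behaviour from point 2, here is a minimal Python sketch. It is not the node's actual implementation: the pattern, `is_number_term` and `number_filter` are hypothetical names, and treating signs and decimal separators as part of a number is an assumption made for this example.

```python
import re

# Assumption: a term is "only a number" when it is made of digits,
# optionally with a leading sign and '.' or ',' separators.
# This pattern is illustrative and not taken from the node's source.
NUMBER_ONLY = re.compile(r"[+-]?\d+(?:[.,]\d+)*")


def is_number_term(term: str) -> bool:
    """Return True if the whole term matches the numeric pattern."""
    return NUMBER_ONLY.fullmatch(term) is not None


def number_filter(terms):
    """Keep every term that is not purely numeric."""
    return [t for t in terms if not is_number_term(t)]


terms = ["1234", "abc123", "3.14", "report", "2010"]
print(number_filter(terms))  # ['abc123', 'report']
```

Mixed terms such as abc123 survive because the pattern has to match the entire term, not just part of it.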
Actually, having the tag cloud filter words on a separate port based on a term weight would be a nice and intuitive extension, though completely redundant given the Row Filter. Still... :-)