Load Internal Node Dictionary as Text File within Plugin

How do I load a large custom dictionary into my node at runtime?

I am doing some natural language processing and I have a large Chinese dictionary that I want to load into my custom node. Here's my code:

URL urlChineseDictionary = this.getClass().getResource("../data/ChineseDictionary.txt");
try {
    URI uriChineseDictionary = urlChineseDictionary.toURI();
    File fileChineseDictionary = new File( uriChineseDictionary );
    BufferedReader fileBufferedReader = new BufferedReader(
        new InputStreamReader(
            new FileInputStream(fileChineseDictionary), "UTF8") );

    while( ( readLine = fileBufferedReader.readLine() ) != null ){
        //read the file line by line and load it into an internal data structure

While this seems to be the standard way Java reads in files, it doesn't seem to work with KNIME when the node is a plugin. The exception I get is:

"URI scheme is not file"

In fact, the URI says that it is a bundleresource (bundleresource://629.fwk551897230/com/...)

I found another note that says: "This is not possible using a file path, as all resources in a plugin are resolved using a bundle relative URL (and the bundle itself might be deployed as jar file, so there even isn't a normal file)".
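
The exception quoted above can be reproduced outside KNIME with plain Java, which may help show why the File approach cannot work. This is only a minimal sketch: the class name, the helper name, and the bundleresource URI (including its com/example path) are hypothetical stand-ins, not taken from the real plugin.

```java
import java.io.File;
import java.net.URI;

public class BundleUriDemo {

    /** Returns the exception message from new File(uri), or null if the conversion succeeds. */
    static String fileConversionError(String uriText) {
        try {
            // File(URI) only accepts hierarchical URIs whose scheme is "file"
            new File(URI.create(uriText));
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Hypothetical URI, shaped like the bundleresource URI in the question
        System.out.println(fileConversionError(
            "bundleresource://629.fwk551897230/com/example/data/ChineseDictionary.txt"));

        // An ordinary file URI converts without an exception (the file need not exist)
        System.out.println(fileConversionError("file:/tmp/ChineseDictionary.txt"));
    }
}
```

The first call reports the scheme problem and the second returns null, so the failure comes from the URI scheme rather than from the file contents or the encoding.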

I also found this other link (http://blog.vogella.com/2010/07/06/reading-resources-from-plugin/) that seems useful. It says:

Frequently you want to store static files in your bundle and load them from your bundle. For this you can use the following code:

URL url;
try {
    url = new URL("platform:/plugin/de.vogella.rcp.plugin.filereader/files/test.txt");
    InputStream inputStream = url.openConnection().getInputStream();
    BufferedReader in = new BufferedReader(new InputStreamReader(inputStream));
    String inputLine;

    while ((inputLine = in.readLine()) != null) {
        System.out.println(inputLine);
    }
 
    in.close();
 
} catch (IOException e) {
    e.printStackTrace();
}

So the trick seems to be getting the URL right. But what is a typical URL for KNIME nodes? How do I figure out the URL that will point to my text file from within the node plugin?

 

We usually do something like this to get a URL for a file within a plugin:

IPath path = new Path(inputPath);
URL bundleUrl = FileLocator.find(getBundle(), path, null);
if (bundleUrl == null) {
	LOGGER.debug("bundleUrl == null");
	return null;
}
URL fileUrl;
try {
	fileUrl = FileLocator.toFileURL(bundleUrl);
} catch (IOException e) {
	LOGGER.debug("could not get file url for " + bundleUrl, e);
	return null;
}
// and so on: open an InputStream, or convert to a file URL (only works when the plugin is uncompressed during installation).

getBundle() comes from your plugin's Activator class, so that class is a convenient place to implement this method. inputPath is an absolute path within your plugin, i.e. it starts at the plugin root (e.g. /data/dictionary.txt).

P.

...and if you only need an InputStream, you can call URL.openStream directly without having to resolve the physical path.

And instead of using the bundle activator (which in most cases does not exist), you can use FrameworkUtil.getBundle(getClass()) to get a reference to the bundle.
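
To illustrate the openStream suggestion, here is a minimal sketch that reads UTF-8 lines straight from a URL without ever converting it to a File. The class and method names are hypothetical, and a temporary file stands in for the bundle resource so the example runs outside KNIME. In a plugin, the URL would instead be the one returned by FileLocator.find.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class UrlLineReader {

    /** Reads all non-empty lines from the given URL as UTF-8 text. */
    static List<String> readNonEmptyLines(URL url) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader (and the underlying stream) even on error
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty()) {
                    lines.add(line);
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the bundle resource: a temporary file with made-up sample entries
        Path temp = Files.createTempFile("dictionary", ".txt");
        Files.write(temp, "\u4f60\u597d\thello\n\n\u8c22\u8c22\tthanks\n"
                .getBytes(StandardCharsets.UTF_8));

        // Any URL works here, whatever its scheme, as long as it can open a stream
        List<String> lines = readNonEmptyLines(temp.toUri().toURL());
        System.out.println(lines.size() + " lines read");

        Files.delete(temp);
    }
}
```

Because the helper only depends on URL.openStream, it should behave the same whether the plugin is installed as a directory or as a jar file.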

I finished developing my Chinese Natural Language Processing node. Everything worked fine in the development environment, but when I exported the node I ran into this problem again. Here is the code that was working in the development environment:

public static final String INTERNAL_CHINESE_DICTIONARY_CLASSPATH_LOCATION = "/com/scientificstrategy/marketmodel/common/data/ChineseDictionary.txt";
...

//Create a new Chinese Dictionary Map
m_mapChineseDictionary = new ChineseHashMap<ChineseWord, String>();

try {
	//Open the text file as an InputStream as part of the KNIME plugin bundle
	URL urlChineseDictionary = this.getClass().getResource(INTERNAL_CHINESE_DICTIONARY_CLASSPATH_LOCATION);
	if( urlChineseDictionary != null ) {
		logger.debug( "loadInternalChineseDictionary: urlChineseDictionary = " + urlChineseDictionary.toString());
	} else {
		logger.debug( "loadInternalChineseDictionary: urlChineseDictionary = null");
	}
	
	InputStream inputStream = urlChineseDictionary.openStream();
	InputStreamReader inputStreamReader = new InputStreamReader(inputStream, StandardCharsets.UTF_8);
	BufferedReader bufferedReader = new BufferedReader(inputStreamReader);
					
	//Read each line of the Chinese Dictionary input and split into Chinese Word versus English Translation
	String readLine;
	int whitespace = 0;
	int debugLineCount = 0;
	while( ( readLine = bufferedReader.readLine() ) != null ){
		
		//Skip to the next line if this line is empty
		if( readLine.length() == 0)
			continue;
			...

I opted for thor's first recommendation as it seemed easier and I thought I only needed an InputStream. But I may have misinterpreted what he was saying. When I export and then run this node, I get my debug message:

loadInternalChineseDictionary: urlChineseDictionary = null

So my getResource call couldn't find the text file.

Retrying, I modified the code to look like this, based on qqilihq's code but substituting thor's FrameworkUtil.getBundle for getBundle:

import org.eclipse.core.runtime.FileLocator;
import org.eclipse.core.runtime.IPath;
import org.eclipse.core.runtime.Path;
import org.knime.core.node.NodeLogger;
import org.osgi.framework.Bundle;
import org.osgi.framework.FrameworkUtil;
...

IPath pathChineseDictionary = new Path(INTERNAL_CHINESE_DICTIONARY_CLASSPATH_LOCATION);
Bundle pluginBundle = FrameworkUtil.getBundle( getClass() );

//Find the bundle URL of the Chinese Dictionary text file
URL urlPluginBundle = FileLocator.find(pluginBundle, pathChineseDictionary, null);
if (urlPluginBundle == null) {
	if( logger != null ) logger.debug( "loadInternalChineseDictionary: urlPluginBundle = null");
	return;
}

//Convert the bundle URL into a file URL
URL urlChineseDictionary;
try {
	urlChineseDictionary = FileLocator.toFileURL( urlPluginBundle );
} catch (IOException exception) {
	if( logger != null ) logger.debug( "loadInternalChineseDictionary: urlChineseDictionary = null", exception);
	return;
}

But now I can't get past the first step, as the URL returned by FileLocator.find (urlPluginBundle) is null.

Any ideas?

I assume you have exported your resources to the generated bundle (via the build.properties file). You can check whether they are present by looking inside the jar file.

If the resource is in a package, you should use a path relative to the package of getClass() when calling getResource(). I am not sure whether this absolute reference should work.
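
To make the relative versus absolute distinction concrete, here is a small sketch that mimics the name-resolution rule documented for Class.getResource: a leading "/" makes the name absolute, otherwise the package of the class is prepended. The class and method names are hypothetical and the code is an illustration of the documented rule, not the JDK's own implementation (it ignores array classes, for example).

```java
public class ResourceNameDemo {

    /** Mimics how Class.getResource turns a name into an absolute resource name. */
    static String resolveName(Class<?> anchor, String name) {
        if (name.startsWith("/")) {
            // Absolute name: strip the leading slash, ignore the package
            return name.substring(1);
        }
        String className = anchor.getName();
        int lastDot = className.lastIndexOf('.');
        if (lastDot < 0) {
            // Class in the default package: nothing to prepend
            return name;
        }
        // Relative name: prepend the package, with dots replaced by slashes
        return className.substring(0, lastDot).replace('.', '/') + "/" + name;
    }

    public static void main(String[] args) {
        // java.lang.String is used only as a convenient class that lives in a package
        System.out.println(resolveName(String.class, "data/ChineseDictionary.txt"));
        System.out.println(resolveName(String.class, "/data/ChineseDictionary.txt"));
        // Note that ".." is passed through unchanged, not collapsed
        System.out.println(resolveName(String.class, "../data/ChineseDictionary.txt"));
    }
}
```

Whether a name that still contains ".." is then found depends on the class loader that serves the request, so a path without ".." is probably the safer choice inside a plugin.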

A bit more fiddling and I got it. Here is my final code with very few changes:

public static final String INTERNAL_CHINESE_DICTIONARY_BUNDLE_LOCATION = "data/ChineseDictionary.txt";
...

//Create a new Chinese Dictionary Map
m_mapChineseDictionary = new ChineseHashMap<ChineseWord, String>();

try {
	//Open the text file as an InputStream as part of the KNIME plugin bundle
	IPath pathChineseDictionary = new Path(INTERNAL_CHINESE_DICTIONARY_BUNDLE_LOCATION);
	Bundle pluginBundle = FrameworkUtil.getBundle( getClass() );
	
	//Find the bundle URL of the Chinese Dictionary text file
	URL urlPluginBundle = FileLocator.find(pluginBundle, pathChineseDictionary, null);
	if (urlPluginBundle == null) {
		if( logger != null ) logger.debug( "loadInternalChineseDictionary: urlPluginBundle = null");
		return;
	}

	//Convert the bundle URL into a file URL
	URL urlChineseDictionary;
	try {
		urlChineseDictionary = FileLocator.toFileURL( urlPluginBundle );
	} catch (IOException exception) {
		if( logger != null ) logger.debug( "loadInternalChineseDictionary: urlChineseDictionary = null", exception);
		return;
	}
	
	//Open the Input Stream
	InputStream inputStream = urlChineseDictionary.openStream();
	InputStreamReader inputStreamReader = new InputStreamReader(inputStream, StandardCharsets.UTF_8);
	BufferedReader bufferedReader = new BufferedReader(inputStreamReader);
	
	//Read each line of the Chinese Dictionary input and split into Chinese Word versus English Translation
	String readLine;
	while( ( readLine = bufferedReader.readLine() ) != null ){

And for completeness here is my build.properties:

source.. = src/
output.. = bin/
bin.includes = META-INF/,\
               .,\
               plugin.xml,\
               icons/,\
               src/,\
               lib/apache/commons-collections4-4.0/commons-collections4-4.0.jar,\
               lib/apache/commons-math3-3.4.1/commons-math3-3.4.1.jar,\
               lib/apache/commons-lang3-3.4/commons-lang3-3.4.jar,\
               data/

This was a frustrating exercise, so I'll try to be verbose in my explanation. Interestingly, I also found this link listing the "top 53 voted examples" of how to use org.eclipse.core.runtime.FileLocator, so it's obviously an area that confuses a lot of people:

http://www.programcreek.com/java-api-examples/org.eclipse.core.runtime.FileLocator

My main problem was finding a suitable place to put the ChineseDictionary.txt file. Originally my package was com.scientificstrategy.marketmodel.common.language and I had the following file structure (trying to keep all of my future data files together):

/.
_/src
__/packages
___/common
____/language
_____/ChineseDictionary.java
____/data
_____/ChineseDictionary.txt

So my relative path from my Java class to the text file was: /../data/ChineseDictionary.txt

But the instruction:

IPath pathChineseDictionary = new Path(INTERNAL_CHINESE_DICTIONARY_CLASSPATH_LOCATION)

seems to drop the ".." part of the path.

So instead I created a top-level data directory:

/.
_/src
_/data
__/ChineseDictionary.txt

and ensured that the /data directory was included in the build.properties. That seemed to do the trick. Thanks for the help!

If you start the path with "/" it's an absolute path starting at the plug-in root.

Ah, yes, I could easily have made that mistake too. That's a great tip to remember for next time!