Advice for creating a node to read unusual data in chunks

toblatp · June 23, 2018, 1:28am

Hello there,

I am creating a node for reading an unusual file type, and the data in it can be very large, too large to store in memory. I am thinking it may be appropriate to read the data in chunks so that the chunks can be processed in parallel.

Do I need to create the node specifically to interact with loop nodes? (i.e. must it accept some input so that it keeps track of where it is in the loop, rather than reading from the start of the data on every iteration?)

I am new to using KNIME and am looking to see if anyone has any recommendations for this, so that the node can be made correctly.

Thanks so much for your help!

s.roughley · June 23, 2018, 10:59am

If I understand correctly, then it sounds like your safest bet would be to write the node as a loop start. Then each loop iteration can correctly read the next chunk of your data, process it downstream in the loop body in whatever way you want, and then move on to the next loop iteration.

You might find some useful utility methods in the Vernalis plugin, particularly in:

https://community.knime.org/svn/nodes4knime/trunk/com.vernalis/com.vernalis.knime.io/

and

https://community.knime.org/svn/nodes4knime/trunk/com.vernalis/com.vernalis.knime.core/src/com/vernalis/io/ (The FileHelpers class in particular)

The #execute() method is called on each iteration in a loop start, so you will probably want fields for e.g. a BufferedReader, a boolean for whether the end of the file has been reached, and ints for the number of rows to process per loop iteration and the current iteration. During #configure() you can reset the reader, counters and boolean. During #execute(), check whether this is the first iteration (in which case you need to create the reader), and then have something like

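int rowReadInBlock = 0;
String line = null;
// Test the row count first, so a line is only read when there is still room for it in this block
// (with the tests the other way round, the line read when the limit is hit would be thrown away)
while (rowReadInBlock < maxRowsPerBlock && (line = br.readLine()) != null) {
    rowReadInBlock++;
    //Do whatever you want with the line here...
}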

//Check if we hit the end
if(line==null){
    endOfFile=true;
    br.close();
}

Then you need to implement #terminateLoop() with something like

@Override
public boolean terminateLoop() {
    return endOfFile;
}
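
To see how the pieces above fit together outside KNIME, here is a minimal sketch in plain Java. The class name ChunkedLineReader and the method nextChunk() are hypothetical stand-ins for the node model and its #execute() method, not part of the KNIME API, and the chunk is simply collected into a list rather than written to an output table. It assumes maxRowsPerBlock is at least 1.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the loop start's state; not a KNIME class. */
class ChunkedLineReader {
    private final BufferedReader br;
    private final int maxRowsPerBlock;
    private boolean endOfFile = false;
    private int iteration = 0;

    ChunkedLineReader(Reader source, int maxRowsPerBlock) {
        if (maxRowsPerBlock < 1) {
            throw new IllegalArgumentException("maxRowsPerBlock must be at least 1");
        }
        this.br = new BufferedReader(source);
        this.maxRowsPerBlock = maxRowsPerBlock;
    }

    /** Plays the role of #execute(): reads the next block of at most maxRowsPerBlock lines. */
    List<String> nextChunk() throws IOException {
        List<String> chunk = new ArrayList<>();
        if (endOfFile) {
            return chunk; // reader is already closed, nothing more to read
        }
        String line = null;
        // Count is tested first, so no line is read once the block is full
        while (chunk.size() < maxRowsPerBlock && (line = br.readLine()) != null) {
            chunk.add(line);
        }
        // line is only null here if readLine() reported the end of the input
        if (line == null) {
            endOfFile = true;
            br.close();
        }
        iteration++;
        return chunk;
    }

    /** Same contract as the #terminateLoop() shown above. */
    boolean terminateLoop() {
        return endOfFile;
    }

    int getIteration() {
        return iteration;
    }

    public static void main(String[] args) throws IOException {
        ChunkedLineReader reader =
                new ChunkedLineReader(new StringReader("a\nb\nc\nd\ne\n"), 2);
        do {
            System.out.println(reader.nextChunk());
        } while (!reader.terminateLoop());
    }
}
```

One consequence of detecting the end of the file only when readLine() returns null: if the number of lines is an exact multiple of the block size, the final iteration produces an empty chunk. Whether that matters depends on what the loop body does with an empty table.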

There might also be some other pointers in https://community.knime.org/svn/nodes4knime/trunk/com.vernalis/com.vernalis.knime.flowcontrol/src/com/vernalis/knime/flowcontrol/nodes/timedloops/abstrct/loopstart/AbstractTimedLoopStartNodeModel.java

Steve

toblatp · June 23, 2018, 9:26pm

Hi Steve,

Thanks so much for your thorough reply. I had not thought of writing it as a loop start node. I am going to try to implement this tomorrow.

Really appreciate the help!

P

s.roughley · June 23, 2018, 9:44pm

Good luck! Make sure too that you close an open reader and reset loop counters and other fields in the NodeModel#reset() method.
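
A rough sketch of that cleanup in plain Java follows. LoopReaderState is a hypothetical holder for the fields discussed in this thread, not a KNIME class; in a real node the body of reset() would sit in your NodeModel#reset() override. The IOException from close() is swallowed here on the assumption that reset() cannot throw checked exceptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

/** Hypothetical holder for the loop start's fields; not a KNIME class. */
class LoopReaderState {
    BufferedReader br;
    boolean endOfFile;
    int iteration;

    /** Closes any open reader and returns all fields to their initial values. */
    void reset() {
        if (br != null) {
            try {
                br.close();
            } catch (IOException e) {
                // Nothing useful to do here; the reader is being discarded anyway
            } finally {
                br = null; // forces the first iteration to create a fresh reader
            }
        }
        endOfFile = false;
        iteration = 0;
    }

    public static void main(String[] args) {
        LoopReaderState state = new LoopReaderState();
        state.br = new BufferedReader(new StringReader("x\ny\n"));
        state.iteration = 3;
        state.reset();
        System.out.println(state.br == null && state.iteration == 0);
    }
}
```

Setting the reader to null means the "is this the first iteration?" check in #execute() can simply test for a null reader, and calling reset() twice in a row is harmless.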

Steve
