I am creating a node to read an unusual file type, and the data in it can be very large, too large to hold in memory. I am thinking it may be appropriate to read the data in chunks so that the chunks can be processed in parallel.
Do I need to write the node specifically so that it works with loop nodes? (i.e. must it accept some input so that it keeps track of where it is in the loop, rather than just reading from the start of the data each time?)
I am new to using KNIME and am looking to see if anyone has any recommendations for this, so that the node can be made correctly.
If I understand correctly, then it sounds like your safest bet would be to write the node as a loop start. Then each loop iteration can correctly read the next chunk of your data, process it downstream in the loop body in whatever way you want, and then move on to the next loop iteration.
You might find some useful utility methods in the Vernalis plugin particularly in:
The #execute() method is called on each iteration in a loop start, so you will probably want fields for e.g. a BufferedReader, a boolean for whether the end of the file has been reached, and ints for the number of rows to process per loop iteration and the current iteration. During #configure() you can reset the reader, counters and boolean. During #execute(), check whether this is the first iteration (in which case you need to create the reader) and then have something like
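int rowReadInBlock = 0;
String line = null;
// Check the row count before reading, so that a line is never read and then
// thrown away when the block is already full
while (rowReadInBlock < maxRowsPerBlock && (line = br.readLine()) != null) {
    rowReadInBlock++;
    // Do whatever you want with the line here...
}
// Check if we hit the end: the block is only short if readLine() returned null
if (rowReadInBlock < maxRowsPerBlock) {
    endOfFile = true;
    br.close();
}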
Then you need to implement #terminateLoop() with something like
@Override
public boolean terminateLoop() {
    return endOfFile;
}
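For what it's worth, here is a minimal sketch of that state handling in plain Java, outside KNIME, so the chunking logic can be tried on its own. The class name ChunkedLineReader, the Supplier used to open the data, and the readNextChunk() and reset() methods are made up for illustration and are not KNIME or Vernalis API. In a real node the same fields would live in your NodeModel and the body of readNextChunk() would sit in #execute().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper (not KNIME API): holds the state that a loop start
// node would keep between iterations.
class ChunkedLineReader {
    private final Supplier<BufferedReader> source; // opens the data from the start
    private final int maxRowsPerBlock;
    private BufferedReader br;
    private boolean endOfFile;
    private int iteration;

    ChunkedLineReader(Supplier<BufferedReader> source, int maxRowsPerBlock) {
        if (maxRowsPerBlock < 1) {
            throw new IllegalArgumentException("maxRowsPerBlock must be at least 1");
        }
        this.source = source;
        this.maxRowsPerBlock = maxRowsPerBlock;
    }

    // Forget all progress so that the next read starts from the first line again.
    void reset() {
        try {
            if (br != null) {
                br.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        br = null;
        endOfFile = false;
        iteration = 0;
    }

    // Equivalent of one loop iteration: returns at most maxRowsPerBlock lines.
    List<String> readNextChunk() {
        List<String> chunk = new ArrayList<>();
        if (endOfFile) {
            return chunk; // reader is already closed, nothing left to read
        }
        try {
            if (iteration == 0) {
                br = source.get(); // first iteration: create the reader
            }
            String line;
            // Test the size first so no line is read and then dropped
            while (chunk.size() < maxRowsPerBlock && (line = br.readLine()) != null) {
                chunk.add(line);
            }
            // A short chunk means readLine() returned null, i.e. end of data
            if (chunk.size() < maxRowsPerBlock) {
                endOfFile = true;
                br.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        iteration++;
        return chunk;
    }

    // Mirrors what #terminateLoop() would return in the loop start node.
    boolean terminateLoop() {
        return endOfFile;
    }
}
```

Calling readNextChunk() inside a do/while that repeats while terminateLoop() is false mimics the loop end asking the loop start whether to stop. Note that with this approach, when the number of rows is an exact multiple of the block size, the final chunk comes back empty, so the loop body should cope with an empty table.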