Create a node that outputs 2 tables with different schemas

Hi -

Apologies in advance for the very basic nature of these questions.

  1. Is there a way to search just the forums?

  2. I’m creating a custom node to read the text from a PDF document. (I don’t think I’m re-inventing the wheel.) I’m reading in the table of contents and each page of text. I’d like my node to have 2 outputs:
    a) table of the table of contents (section title[string], page[int])
    b) table of the pages (page number[int], page text[string])

For each table, I build the column specs, have exec.createDataContainer create a container, fill that container, close it, and then get the table from it.

However, the system falls down mysteriously when I’m partway through adding data to the second table. It suddenly doesn’t like the fact that the types of the columns don’t match up (the 1st column is a string for table 1 and an int for table 2). Reversing the order of the columns of one of the tables causes everything to work fine…

Is there a way to output 2 different schema’d tables? It appears that the basic statistics node does this - any pointers on how I can find the source code for that? My code is:

BufferedDataTable[] bufDataTable = new BufferedDataTable[2];
    //read the pdf document into memory
    String docFilePath = "C:/docs/buildDrugWiki/buildDrugWiki3241/sourceDocs/SPL-133 Study Report.pdf";
    File docFile = new File(docFilePath);
    PdfDocument pdfDoc = new PdfDocument(docFile);
    
    
    //build data table containing information about table of contents
    DataColumnSpec[] tocColSpecs = new DataColumnSpec[2];
    tocColSpecs[0] = 
        new DataColumnSpecCreator("Section Title", StringCell.TYPE).createSpec();
    tocColSpecs[1] = 
        new DataColumnSpecCreator("Page Number", IntCell.TYPE).createSpec();
    DataTableSpec tocOutputSpec = new DataTableSpec(tocColSpecs);
    // the execution context will provide us with storage capacity, in this
    // case a data container to which we will add rows sequentially
    // Note, this container can also handle arbitrary big data tables, it
    // will buffer to disc if necessary.
    BufferedDataContainer tocContainer = exec.createDataContainer(tocOutputSpec);
    
    for (String tocTitle : pdfDoc.getTableOfContents().keySet()) {
    	int pageNum = pdfDoc.getTableOfContents().get(tocTitle);
    	
    	RowKey key = new RowKey(tocTitle);
    	
    	DataCell[] cells = new DataCell[2];
    	cells[0] = new StringCell(tocTitle);
    	cells[1] = new IntCell(pageNum);
    	DataRow row = new DefaultRow(key, cells);
    	tocContainer.addRowToTable(row);
    }
    
    // once we are done, we close the container and return its table
    tocContainer.close();
    bufDataTable[0] = tocContainer.getTable();
    

    //build data table containing the text of each page
    DataColumnSpec[] pagesColSpecs = new DataColumnSpec[2];
    pagesColSpecs[0] = 
    	new DataColumnSpecCreator("Page Number", IntCell.TYPE).createSpec();
    pagesColSpecs[1] = 
    	new DataColumnSpecCreator("Page Text", StringCell.TYPE).createSpec();
    DataTableSpec pagesOutputSpec = new DataTableSpec(tocColSpecs);
    // the execution context will provide us with storage capacity, in this
    // case a data container to which we will add rows sequentially
    // Note, this container can also handle arbitrary big data tables, it
    // will buffer to disc if necessary.
    BufferedDataContainer pagesContainer = exec.createDataContainer(tocOutputSpec);

    int pageNum = 0;
    for (String curPage : pdfDoc.getPages()) {
    	RowKey key = new RowKey(Integer.toString(pageNum));
    	
    	DataCell[] cells = new DataCell[2];
    	cells[0] = new IntCell(pageNum);
    	cells[1] = new StringCell(curPage);
    	DataRow row = new DefaultRow(key, cells);
    	tocContainer.addRowToTable(row);
    	
    	pageNum++;
    }
    pagesContainer.close();
    bufDataTable[1] = pagesContainer.getTable();
    
    return bufDataTable;

Edit: changed subject to be something more meaningful for search/browsing

Why don’t you create the second BufferedDataContainer with the pagesOutputSpec instead of the tocOutputSpec? :wink:

It’s not only the BufferedDataContainer: the DataTableSpec is also initialized with “tocColSpecs”. So there are (at least) two errors.

Ahh, whoops - sorry! Wrong version of the code. I had fixed those errors but still had the problem. The “right” version of the code is below. Here’s the key point: for tocColSpecs, if I swap the indices, it works. Another indication of the problem is that the column headers (e.g. “Info A”) set for tocColSpecs are the ones that show up for the 2nd table (it should be “Info B”).

Here’s the error message:
ERROR ReadPdf Execute failed: Runtime class of object “1” (index 0) in row “ADDITIONAL RESPONSIBLE PERSON(S)” is IntCell and does not comply with its supposed superclass StringCell

Here’s the “right” code:

BufferedDataTable[] bufDataTable = new BufferedDataTable[2];
    //read the pdf document into memory
    String docFilePath = "C:/docs/buildDrugWiki/buildDrugWiki3241/sourceDocs/SPL-133 Study Report.pdf";
    File docFile = new File(docFilePath);
    PdfDocument pdfDoc = new PdfDocument(docFile);
    
    
    //build data table containing information about table of contents
    DataColumnSpec[] tocColSpecs = new DataColumnSpec[2];
    tocColSpecs[1] = 
        new DataColumnSpecCreator("Page Number", IntCell.TYPE).createSpec();
    tocColSpecs[0] = 
        new DataColumnSpecCreator("Info A", StringCell.TYPE).createSpec();
    DataTableSpec tocOutputSpec = new DataTableSpec(tocColSpecs);
    // the execution context will provide us with storage capacity, in this
    // case a data container to which we will add rows sequentially
    // Note, this container can also handle arbitrary big data tables, it
    // will buffer to disc if necessary.
    BufferedDataContainer tocContainer = exec.createDataContainer(tocOutputSpec);
    
    for (String tocTitle : pdfDoc.getTableOfContents().keySet()) {
    	int pageNum = pdfDoc.getTableOfContents().get(tocTitle);
    	
    	RowKey key = new RowKey(tocTitle);
    	
    	DataCell[] cells = new DataCell[2];
    	cells[1] = new IntCell(pageNum);
    	cells[0] = new StringCell(tocTitle);
    	DataRow row = new DefaultRow(key, cells);
    	tocContainer.addRowToTable(row);
    }
    
    // once we are done, we close the container and return its table
    tocContainer.close();
    bufDataTable[0] = tocContainer.getTable();
    

    //build data table containing the text of each page
    DataColumnSpec[] pagesColSpecs = new DataColumnSpec[2];
    pagesColSpecs[0] = 
    	new DataColumnSpecCreator("Page Number", IntCell.TYPE).createSpec();
    pagesColSpecs[1] = 
    	new DataColumnSpecCreator("Info B", StringCell.TYPE).createSpec();
    DataTableSpec pagesOutputSpec = new DataTableSpec(tocColSpecs);
    // the execution context will provide us with storage capacity, in this
    // case a data container to which we will add rows sequentially
    // Note, this container can also handle arbitrary big data tables, it
    // will buffer to disc if necessary.
    BufferedDataContainer pagesContainer = exec.createDataContainer(pagesOutputSpec);

    int pageNum = 0;
    for (String curPage : pdfDoc.getPages()) {
    	RowKey key = new RowKey(Integer.toString(pageNum));
    	
    	DataCell[] cells = new DataCell[2];
    	cells[0] = new IntCell(pageNum);
    	cells[1] = new StringCell(curPage);
    	DataRow row = new DefaultRow(key, cells);
    	pagesContainer.addRowToTable(row);
    	
    	pageNum++;
    }
    pagesContainer.close();
    bufDataTable[1] = pagesContainer.getTable();
    
    return bufDataTable;

Never mind. I see it now. Even in the “right” code I was still making the error of using tocColSpecs (instead of pagesColSpecs) to initialize pagesOutputSpec.

Thanks for looking over my code, I really appreciate it.
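
For anyone who lands here with the same error message, here is a minimal sketch of what went wrong. It does not use the KNIME API: ColumnSpec and Container are hypothetical stand-ins, written only for this illustration, that mimic how a container checks each cell against the spec it was created with. String and Integer stand in for StringCell and IntCell. The point is that the second table must be built from its own column array, otherwise the first row is rejected and the column names of the first table show up on the second.

```java
import java.util.ArrayList;
import java.util.List;

public class TwoTableSketch {

    /** Hypothetical stand-in for a column spec: a name plus the cell class it accepts. */
    static final class ColumnSpec {
        final String name;
        final Class<?> type;

        ColumnSpec(String name, Class<?> type) {
            this.name = name;
            this.type = type;
        }
    }

    /** Hypothetical stand-in for a data container that validates rows against its spec. */
    static final class Container {
        private final ColumnSpec[] spec;
        private final List<Object[]> rows = new ArrayList<>();

        Container(ColumnSpec... spec) {
            this.spec = spec.clone();
        }

        void addRow(String rowKey, Object... cells) {
            if (cells.length != spec.length) {
                throw new IllegalArgumentException("Row \"" + rowKey + "\" has "
                        + cells.length + " cells, expected " + spec.length);
            }
            for (int i = 0; i < cells.length; i++) {
                // Reject a cell whose runtime class does not match the column type.
                if (!spec[i].type.isInstance(cells[i])) {
                    throw new IllegalArgumentException("Cell at index " + i + " in row \""
                            + rowKey + "\" is " + cells[i].getClass().getSimpleName()
                            + " but column \"" + spec[i].name + "\" expects "
                            + spec[i].type.getSimpleName());
                }
            }
            rows.add(cells.clone());
        }

        int size() {
            return rows.size();
        }

        String columnName(int index) {
            return spec[index].name;
        }
    }

    public static void main(String[] args) {
        ColumnSpec[] tocCols = {
            new ColumnSpec("Section Title", String.class),
            new ColumnSpec("Page Number", Integer.class)
        };
        ColumnSpec[] pagesCols = {
            new ColumnSpec("Page Number", Integer.class),
            new ColumnSpec("Page Text", String.class)
        };

        // The mistake from this thread: the pages table is built from the TOC columns.
        Container wrongPages = new Container(tocCols);
        try {
            wrongPages.addRow("0", 0, "text of the first page");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }

        // The fix: each table gets a spec built from its own column array.
        Container pages = new Container(pagesCols);
        pages.addRow("0", 0, "text of the first page");
        System.out.println("Second column: " + pages.columnName(1)
                + ", rows: " + pages.size());
    }
}
```

In the real node the same idea applies to both lines that were wrong above: pagesOutputSpec should be created from pagesColSpecs, and pagesContainer should be created from pagesOutputSpec. Putting the spec creation, container creation, and fill loop for each table into its own small method would make this kind of copy-paste slip harder to make.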