Hi -
Apologies in advance for the very basic nature of these questions.
-
Is there a way to search just the forums?
-
I’m creating a custom node to read the text from a PDF document (I don’t think I’m re-inventing the wheel). I’m reading in the table of contents and the text of each page. I’d like my node to have 2 outputs:
a) table of the table of contents (section title[string], page[int])
b) table of the pages (page number[int], page text[string])
I try to do this by building the spec for each table, having exec.createDataContainer create a container, filling that container, closing it, and then getting the table.
However, the system falls down mysteriously when I’m part way through adding data to the second table. It suddenly doesn’t like the fact that the types of the columns don’t match up (1st column is string for table 1, int for table 2). Reversing the order of the columns of one of the tables causes everything to work fine…
Is there a way to output 2 tables with different schemas? It appears that the basic statistics node does this - any pointers on how I can find the source code for that? My code is:
BufferedDataTable[] bufDataTable = new BufferedDataTable[2];
//read the pdf document into memory
String docFilePath = "C:/docs/buildDrugWiki/buildDrugWiki3241/sourceDocs/SPL-133 Study Report.pdf";
File docFile = new File(docFilePath);
PdfDocument pdfDoc = new PdfDocument(docFile);
//build data table containing information about table of contents
DataColumnSpec[] tocColSpecs = new DataColumnSpec[2];
tocColSpecs[0] =
    new DataColumnSpecCreator("Section Title", StringCell.TYPE).createSpec();
tocColSpecs[1] =
    new DataColumnSpecCreator("Page Number", IntCell.TYPE).createSpec();
DataTableSpec tocOutputSpec = new DataTableSpec(tocColSpecs);
// the execution context will provide us with storage capacity, in this
// case a data container to which we will add rows sequentially
// Note, this container can also handle arbitrary big data tables, it
// will buffer to disc if necessary.
BufferedDataContainer tocContainer = exec.createDataContainer(tocOutputSpec);
for (String tocTitle : pdfDoc.getTableOfContents().keySet()) {
    int pageNum = pdfDoc.getTableOfContents().get(tocTitle);
    RowKey key = new RowKey(tocTitle);
    DataCell[] cells = new DataCell[2];
    cells[0] = new StringCell(tocTitle);
    cells[1] = new IntCell(pageNum);
    DataRow row = new DefaultRow(key, cells);
    tocContainer.addRowToTable(row);
}
// once we are done, we close the container and return its table
tocContainer.close();
bufDataTable[0] = tocContainer.getTable();
//build data table containing the text of each page
DataColumnSpec[] pagesColSpecs = new DataColumnSpec[2];
pagesColSpecs[0] =
    new DataColumnSpecCreator("Page Number", IntCell.TYPE).createSpec();
pagesColSpecs[1] =
    new DataColumnSpecCreator("Page Text", StringCell.TYPE).createSpec();
DataTableSpec pagesOutputSpec = new DataTableSpec(tocColSpecs);
// the execution context will provide us with storage capacity, in this
// case a data container to which we will add rows sequentially
// Note, this container can also handle arbitrary big data tables, it
// will buffer to disc if necessary.
BufferedDataContainer pagesContainer = exec.createDataContainer(tocOutputSpec);
int pageNum = 0;
for (String curPage : pdfDoc.getPages()) {
    RowKey key = new RowKey(Integer.toString(pageNum));
    DataCell[] cells = new DataCell[2];
    cells[0] = new IntCell(pageNum);
    cells[1] = new StringCell(curPage);
    DataRow row = new DefaultRow(key, cells);
    tocContainer.addRowToTable(row);
    pageNum++;
}
pagesContainer.close();
bufDataTable[1] = pagesContainer.getTable();
return bufDataTable;
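One thing that stands out in the code above, and it may well be the cause: the second half still refers to the first table's objects. pagesOutputSpec is built from tocColSpecs, pagesContainer is created from tocOutputSpec, and the second loop calls tocContainer.addRowToTable on a container that has already been closed. That would explain why an (int, string) row is rejected against a (string, int) spec, and why reordering the columns hides the problem. The sketch below shows the intended pattern (one spec and one container per output) in plain Java. The Spec and Container classes are hypothetical stand-ins written for this illustration, not the KNIME API, and the row values are made up.

```java
import java.util.ArrayList;
import java.util.List;

class TwoSchemaSketch {

    // Hypothetical stand-in for a table spec: column names plus expected cell types.
    static final class Spec {
        final String[] names;
        final Class<?>[] types;

        Spec(String[] names, Class<?>[] types) {
            this.names = names;
            this.types = types;
        }
    }

    // Hypothetical stand-in for a data container that validates rows against its spec.
    static final class Container {
        private final Spec spec;
        private final List<Object[]> rows = new ArrayList<>();
        private boolean closed = false;

        Container(Spec spec) {
            this.spec = spec;
        }

        // Rejects rows added after close() and rows whose cell types do not match the spec.
        void addRow(Object... cells) {
            if (closed) {
                throw new IllegalStateException("container already closed");
            }
            if (cells.length != spec.types.length) {
                throw new IllegalArgumentException("wrong number of cells");
            }
            for (int i = 0; i < cells.length; i++) {
                if (!spec.types[i].isInstance(cells[i])) {
                    throw new IllegalArgumentException(
                        "column '" + spec.names[i] + "' expects " + spec.types[i].getSimpleName());
                }
            }
            rows.add(cells);
        }

        void close() {
            closed = true;
        }

        List<Object[]> getTable() {
            if (!closed) {
                throw new IllegalStateException("close the container first");
            }
            return rows;
        }
    }

    static List<List<Object[]>> buildTables() {
        // Output 0: (String, Integer), using its own spec and its own container.
        Spec tocSpec = new Spec(
            new String[] {"Section Title", "Page Number"},
            new Class<?>[] {String.class, Integer.class});
        Container toc = new Container(tocSpec);
        toc.addRow("Introduction", 1);
        toc.close();

        // Output 1: (Integer, String). Note the separate spec, the separate container,
        // and that rows are added to 'pages', not to the already closed 'toc'.
        Spec pagesSpec = new Spec(
            new String[] {"Page Number", "Page Text"},
            new Class<?>[] {Integer.class, String.class});
        Container pages = new Container(pagesSpec);
        pages.addRow(0, "text of page 0");
        pages.addRow(1, "text of page 1");
        pages.close();

        List<List<Object[]>> out = new ArrayList<>();
        out.add(toc.getTable());
        out.add(pages.getTable());
        return out;
    }

    public static void main(String[] args) {
        List<List<Object[]>> tables = buildTables();
        System.out.println(tables.get(0).size() + " TOC rows, " + tables.get(1).size() + " page rows");
    }
}
```

Applied to the node code, this would mean building pagesOutputSpec from pagesColSpecs, passing pagesOutputSpec to exec.createDataContainer for pagesContainer, and calling pagesContainer.addRowToTable(row) in the second loop. This is untested against KNIME itself, so treat it as a guess at the cause, not a confirmed fix.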
Edit: changed subject to be something more meaningful for search/browsing