How can I extract data from multiple sheets if the table position is different sometimes, but I know that the column I need will contain a specific texts at the top and at the bottom, so I want to get the rows between them plus the data in next 3 columns.
For best results, please post an example of both your input data, and what you expect your output to look like. Your description is a bit vague. Also, if you have an example workflow showing what youâve already tried, you could upload that as well.
I will try to make it clear now with excel samples I uploaded in Excel (inputs/output). I have over 200 files and some files have multiple sheet from where I need to extract the data.
There is no standard format for those tables I am trying to consolidate . The only thing it might help is to identify some fixed titles (text) to use them as a reference to extract the data I need
I used the module below to get the data between the rows I need, but I still have add manually the column number where my reference is exist, but I want to automate the full process, so KNIME can find the tables I need to extract the data. Plus I need to know how to duplicate one cell in a new column.
Sorry for the complexity and if I am still not clear, but I can provide more details if required, and if there is no solution let me know please.
Based on a quick scan of your problem, it seems quite complex - mostly because you can never be certain of the format of the incoming table. I realize thatâs not your fault, but if possible I would encourage who ever is providing you the data to standardize their formatting.
If that canât be done, you are likely going to need to do a bit of scripting using something like the OpenPxyl package in Python to search for and identify rows and columns of interest. @mlauber71 has several sample workflows dealing with that package on the KNIME Community Hub that may be of use to you. (Maybe you are already doing this data search in the Java Snippet above, but since you only provided a screenshot, Iâm not sure.)
Sorry that I canât provide more useful feedback for you.
@aalnabelsi I think you could do this with KNIME nodes by identifying the âblocksâ in the Excel Sheet and then applying the rules you described. The secret is in marking the start and possible end of the table/bock and then filling the empty lines with the missing values node as described here:
Please also note the linked threads there.
The sheets themselves might be identified by reading the sheets and selecting the ones that have the name part you described.
Since these seem to be common questions I think about writing a blog about them.
Thanks @ScottF. I am working to create a new template with standard format to simplify the process. For now I am trying to find a way to consolidate the old files (over 200 files)
The start point will be finding the block contains âAA CONTRACT NO.â Java Snippet node I shared can find the first and last blocks but I still have to specify the column. I want it find the blocks without specifying the column. I will look at the details you shared and see if I can find a solution.