I am trying to read several text files from a directory.
Every document contains strings which I want to analyse.
Using the Flat File Document Parser node, reading the files in as documents works fine.
Is there a way to get the file names and creation dates, as well as meta information, into the documents, or can this be added in a later step?
With the Document Data Extractor node you can extract the file path from parsed documents.
There is no node to get the creation date, so you would need to use a Java Snippet node and extract the creation date via Java code. Maybe this can help: http://stackoverflow.com/questions/21033928/how-to-get-proper-file-creation-date-of-file.
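As a rough illustration of the Java route, the sketch below reads a file's creation time with the standard java.nio.file API. It is plain standalone Java and is not wired into the Java Snippet node; the class name CreationDate and the method creationTime are made up for this example. Note that on file systems which do not record a creation time, Java returns an implementation-specific fallback (typically the last-modified time).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.FileTime;

// Hypothetical helper class, named for illustration only.
class CreationDate {

    // Returns the creation time reported by the file system for the given file.
    static FileTime creationTime(Path file) throws IOException {
        BasicFileAttributes attrs = Files.readAttributes(file, BasicFileAttributes.class);
        return attrs.creationTime();
    }

    public static void main(String[] args) throws IOException {
        // Use the path given on the command line, or fall back to a temporary file.
        Path file = args.length > 0 ? Paths.get(args[0]) : Files.createTempFile("doc", ".txt");
        System.out.println(file + " created at " + creationTime(file));
    }
}
```

Inside a workflow, the path extracted by the Document Data Extractor would be the natural input for such a call.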
Just a small addition: one can also easily encode some file features in the file name itself, e.g. with ARen (Advanced Renamer, freeware). A wide range of file features is available, from a simple creation or modification date to numbering within folders, the device an image was recorded with, and all other types of EXIF information; these can be read out and, for example, appended to the file name.
In a second step, these features (written to the file name) can easily be extracted, provided consistent separators are used with ARen.
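To make the second step concrete, here is a minimal sketch of splitting such a file name back into its features. The naming pattern (name, date, and counter joined by a double underscore) and the class FileNameFeatures are assumptions for illustration; the actual pattern depends on how the renaming was set up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class, named for illustration only.
class FileNameFeatures {

    // Drops the extension and splits the remaining name at the given separator.
    static List<String> split(String fileName, String separator) {
        int dot = fileName.lastIndexOf('.');
        String stem = dot > 0 ? fileName.substring(0, dot) : fileName;
        // Pattern.quote treats the separator literally; -1 keeps empty fields.
        return Arrays.asList(stem.split(Pattern.quote(separator), -1));
    }

    public static void main(String[] args) {
        // Assumed example name: original name, creation date, and a counter.
        System.out.println(split("report__2014-01-03__0001.txt", "__"));
    }
}
```

A separator that cannot occur inside any of the features themselves (here the double underscore) keeps the split unambiguous.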
I have huge text files, all with the same format and without delimiters, like below.
4243919840103 00000001 000770600013RGT-WAY DED 00000000 CLARK FRANCIS
4243919850102 00000001 000804602044POWER ATTY 00000000 CHANCELLOR FANNIE
4243919890103 00000001 000947500944WARNTY DEED 00000000 MBANK MIDCITIES N
I built a workflow using the nodes "List Files", "Table Row To Variable Loop Start", "File Reader", and "Loop End", in that order, to pull in all the txt files. With this I am getting a large number of duplicate values. Please help me out.