Reading text files not delimited (mainframe)

EduardoRio · September 18, 2023, 6:53pm

Hi there!
I’m trying to figure out how to read data from a not delimited file. I can do it from SAS, R or even Oracle. But cannot see any Node to do this task.

Just to example: There are 2 files: 1) with the data itself and 2) with the metadata about how the data is arranged inside the data file.
Something like:
from column 1 to 4 seq;
from 5 to 8 year;
from 9 to 10 month
and so on!

Thanks in advance!!!
Eduardo

takbb · September 18, 2023, 7:40pm

Hi @EduardoRio , I’m presuming the file is still divided into “records”, delimited by a carriage return and/or line feed, but from what you are describing is fixed format (the type of thing that an old cobol system might produce perhaps?)

If that’s the case there is a Line Reader node

You could combine this with a Cell Splitter by Position.

If you have varying record types you would probably need to split these out based on a record type identifier first.i

EduardoRio · September 18, 2023, 8:17pm

Hello Takbb
Great, I think you’ve got the point precisely. The data file come from a mainframe, we’re used to call those type of datasets as “positional”. A bit hard to use, but very economic and fast on the other hand.

Yes, each line has a carriage return, so Line Node should be able to handle with it
And Cell Spliter should do the other task.
And, yes I have a file with record for types and columns names . In fact, created for data import for SAS. But I think that with some workaround I could use it.
But I’m sorry I can’t get it " probably need to split these out based on a record type identifier first.i"
But Thanks a lot I’ll have a lot to do from know!!
Regards!

EduardoRio · September 19, 2023, 1:31pm

Hi,
Another question:
Since the data file has no header line with the name of the columns and there’s another file with the names and types.
So, I would ask: is there some Node to this kind of job to use these information and put it on the data file structure?
Example: Using SAS The import feature allow to point to de data file, specify the type of data and use either a file with data structure information as types, names and sizes/positions on the process. They mix the information in the import procedure to create a structured file with the data.

Daniel_Weikert · September 19, 2023, 3:02pm

You could take a look at the variable configurations of the line reader. Depending on what it support you could create variables from the config and then use them in the line reader for configuration.
br

takbb · September 20, 2023, 11:45am

Sorry @EduardoRio , it looks like I didn’t quite finish the line! What I meant to say was that if you have multiple record types, then the Cell Splitter would need to be configured differently for each. So I’d therefore assume you’d need to split your flow (e.g. using a series of row splitters) so that there would be different branches for each record type.

You’d then have a Cell Splitter on each branch to create the required columns for the records contained within that branch. As per @Daniel_Weikert 's comment, you could probably configure each Cell Splitter using variable values derived from whatever configuration file you have.

If you can upload some sample (dummy) examples of the “data file” and “structure file”, we could possibly give some additional hints on how you might be able to define the config for the Cell Splitter.

Daniel_Weikert · September 20, 2023, 3:47pm

please tell me you do this for living @takbb
The time you invest here to help is simly awesome. kudos and br

EduardoRio · September 20, 2023, 6:07pm

I do agree!
BTW, Thank you guys for the support.
I’ll put an example of the data and the "structure created for SAS import procedure.
And let me explain why I’m doing it. I’m used to create information on Oracle (BI tools) and SAS, some little in Tableau as well.
About 1 year ago I’ve Discovered Knime and became very well impressed. So, I’m learning how it works. I can understand(I think) the workflow and the Nodes structure. Already create a complete flow reading Oracle database, several tables, joins, etc and, as output created some Excel tables and graphics.
The data I’m trying to manipulate came from the Our census bureau, we call it as PNAD that stands for “National Program for domiciliary Sample” This kind of research is done every year in opposition of the census done each 10 years. It is open data and very actual with a lot of good information about the families work, health, wealth, etc.
But you have 2 ways of getting that data: 1) gross data (my case) or, 2)Final data - spreadsheets, already filtered, aggregated. But the second doesn’t allow you to do many of the simulations we probably shall do to stress the data and come up with more interesting information.
So, that it !! The bad is the primary effort to convert the text file on something more friendly is huge.The good is one time done you’ll use it for each new set of data.!!

Regards,
Eduardo

EduardoRio · September 20, 2023, 6:30pm

Bellow the structure example. Please, note this one is used by SAS. With some work the it can be changed to allow Oracle importation.
Where:
“@0000 #” - is the initial position of the field;
" V0101" - is the column name on the file/database;
“$#” - is the type(string) and length - “#” - Type(number) and length;
“/Reference year/” - column name description
data.txt (1.5 KB)
srtucture.txt (1.9 KB)

hmfa · September 21, 2023, 10:07am

Hi @EduardoRio.

I have this wf to import a ACL FIL file.

Maybe you need to change it a little.
Give it a try.
Import_FIL.knwf (106.5 KB)

Regards
Hugo

hmfa · September 21, 2023, 11:16am

Made this one

Import_Records.knwf (54.8 KB)

Regards

EduardoRio · September 21, 2023, 1:24pm

HI Hugo,
I think you’ve made it. I’ll try it today later and back to keep you noticed.
But, Looks like perfect.
THANKS A LOT!!!
Eduardo

EduardoRio · September 21, 2023, 4:05pm

Hello Hugo!
Here to say the the wf works like a charm. Thanks again!!!.
Also, with your wf I’ve figured out how flow variables work.

You guys rocks!!!

I’m going to change the wf now to handle with another huge file and will put both in a database.
Regards!

EduardoRio · September 27, 2023, 5:29pm

Hello,
Half of the problem is already solved. The reading of the data.
Now I’m going to wonder how to put the data into a Oracle schema, with the same structure and columns names.
I know about the Oracle’ connector, DB Table Creator.
So, Just want to ask you guys: if it’s possible to use the same “String Widget” to “pass” the names and types (maybe size either) to the Table Creator??
That’s it
Thanks again for the support!
Eduardo

EduardoRio · September 28, 2023, 6:56pm

Here to say that I’ve done it. Successfully create the table and import the data.

Btw, the table has been created with whole columns as string with 255 positions. I’m now trying to figure out how to set the types and lengths before the table creation.

Any tip will be welcomed!!!

I’ll put the wf here!

hmfa · September 28, 2023, 9:05pm

Hi, @EduardoRio
Try the node auto type cast, before inserting the data into the DB table

Can you share a picture of the workflow?

Regards
Hugo

EduardoRio · September 28, 2023, 9:59pm

Hi Hugo,
I’ll try this node.
And, I will share the workflow, for sure!
Thanks again !!!
Eduado

EduardoRio · September 28, 2023, 10:36pm

Here the wf

EduardoRio · September 28, 2023, 10:38pm

And the result inside Oracle

EduardoRio · September 28, 2023, 10:39pm

The data