I’m trying to figure out how to read data from a non-delimited file. I can do it in SAS, R, or even Oracle, but I can’t find any node for this task.
Just as an example: there are 2 files: 1) the data itself and 2) metadata describing how the data is arranged inside the data file.
from column 1 to 4: seq;
from 5 to 8: year;
from 9 to 10: month;
and so on!
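To make the idea concrete: reading that kind of layout amounts to slicing each line by position. A minimal Python sketch, using dummy data and the example positions above:

```python
# Sketch: reading a fixed-width ("positional") file by slicing each line.
# Layout follows the example above: seq in columns 1-4, year in 5-8, month in 9-10.
sample = "0001202401\n0002202402\n"  # two dummy records

records = []
for line in sample.splitlines():
    records.append({
        "seq": line[0:4],     # columns 1-4 (slices are 0-based, end-exclusive)
        "year": line[4:8],    # columns 5-8
        "month": line[8:10],  # columns 9-10
    })
```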
Thanks in advance!!!
Hi @EduardoRio , I’m presuming the file is still divided into “records” delimited by a carriage return and/or line feed, but from what you are describing it is fixed format (the type of thing that an old COBOL system might produce, perhaps?)
If that’s the case, there is a Line Reader node.
You could combine this with a Cell Splitter by Position.
If you have varying record types, you would probably need to split these out based on a record type identifier first.
Great, I think you’ve got the point precisely. The data file comes from a mainframe; we usually call this type of dataset “positional”. A bit hard to use, but very economical and fast on the other hand.
Yes, each line has a carriage return, so the Line Reader node should be able to handle it.
And the Cell Splitter should do the other task.
And yes, I have a file with records for the types and column names. In fact, it was created for data import into SAS, but I think that with some workaround I could use it.
But I’m sorry, I don’t quite get “probably need to split these out based on a record type identifier first”.
But thanks a lot, I’ll have a lot to do from now on!!
Since the data file has no header line with the column names, and there’s another file with the names and types, I would ask: is there some node for this kind of job, to take that information and apply it to the data file’s structure?
Example: in SAS, the import feature allows you to point to the data file, specify the type of data, and use a file with data structure information such as types, names, and sizes/positions in the process. It mixes that information into the import procedure to create a structured file with the data.
You could take a look at the variable configurations of the Line Reader. Depending on what it supports, you could create variables from the config file and then use them in the Line Reader for configuration.
Sorry @EduardoRio , it looks like I didn’t quite finish the line! What I meant to say was that if you have multiple record types, then the Cell Splitter would need to be configured differently for each. So I’d therefore assume you’d need to split your flow (e.g. using a series of row splitters) so that there would be different branches for each record type.
You’d then have a Cell Splitter on each branch to create the required columns for the records contained within that branch. As per @Daniel_Weikert 's comment, you could probably configure each Cell Splitter using variable values derived from whatever configuration file you have.
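To illustrate the record-type idea outside KNIME: in plain Python the same logic would dispatch on an identifier character, with one field layout per type (the identifiers and layouts below are hypothetical, just to show the shape of it):

```python
# Sketch of the "split by record type" idea: the first character of each line
# identifies the record type, and each type has its own fixed-width layout.
# Both layouts here are made up for illustration.
layouts = {
    "H": [("type", 0, 1), ("household_id", 1, 7)],               # hypothetical household record
    "P": [("type", 0, 1), ("person_id", 1, 7), ("age", 7, 10)],  # hypothetical person record
}

def parse(line):
    """Slice one line according to the layout for its record type."""
    fields = layouts[line[0]]
    return {name: line[start:end] for name, start, end in fields}
```

In KNIME terms, each entry in `layouts` corresponds to one branch after the row splitters, with its own Cell Splitter configuration.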
If you can upload some sample (dummy) files for the “data file” and “structure file”, we could possibly give some additional hints on how you might be able to define the config for the Cell Splitter.
please tell me you do this for a living @takbb
The time you invest here to help is simply awesome. Kudos and br
I do agree!
BTW, Thank you guys for the support.
I’ll post an example of the data and the “structure” created for the SAS import procedure.
And let me explain why I’m doing it. I usually create information in Oracle (BI tools) and SAS, and a little in Tableau as well.
About a year ago I discovered KNIME and was very impressed. So I’m learning how it works. I understand (I think) the workflow and node structure. I’ve already created a complete flow reading from an Oracle database (several tables, joins, etc.) and, as output, created some Excel tables and graphics.
The data I’m trying to manipulate comes from our census bureau; we call it PNAD, which stands for “National Household Sample Survey”. This kind of research is done every year, as opposed to the census, which is done every 10 years. It is open data and very current, with a lot of good information about families’ work, health, wealth, etc.
But you have 2 ways of getting that data: 1) raw data (my case) or 2) final data: spreadsheets, already filtered and aggregated. The second doesn’t allow you to do many of the simulations we would probably want to run to stress the data and come up with more interesting information.
So, that’s it!! The bad part is that the initial effort to convert the text file into something more friendly is huge. The good part is that once it’s done, you can use it for each new set of data!!
Below is the structure example. Please note this one is used by SAS. With some work it can be changed to allow Oracle import.
“@0000 #” - the initial position of the field;
“V0101” - the column name in the file/database;
“$#” - type (string) and length; “#” - type (number) and length;
“/Reference year/” - the column description
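As a sketch of what a parser for that structure file could look like in Python (the exact line format in the real file may differ, so the regex and the sample line are assumptions to adjust):

```python
import re

# Sketch: parsing a SAS INPUT-style structure line such as
#   @0005 V0101 $4. /Reference year/
# into position, name, type, length, and description.
pattern = re.compile(
    r"@(\d+)"            # @0000 - initial position of the field
    r"\s+(\w+)"          # column name, e.g. V0101
    r"\s+(\$?)(\d+)\.?"  # $n = string of length n, n = number of length n
    r"\s*(?:/\s*(.*?)\s*/)?"  # optional /description/
)

def parse_structure_line(line):
    m = pattern.search(line)
    if not m:
        return None
    start, name, dollar, length, desc = m.groups()
    return {
        "start": int(start),                      # 1-based starting column
        "name": name,
        "type": "string" if dollar else "number",
        "length": int(length),
        "description": desc or "",
    }
```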
data.txt (1.5 KB)
srtucture.txt (1.9 KB)
I have this wf to import an ACL FIL file.
Maybe you need to change it a little.
Give it a try.
Import_FIL.knwf (106.5 KB)
Made this one
Import_Records.knwf (54.8 KB)
I think you’ve nailed it. I’ll try it later today and report back.
But it looks perfect.
THANKS A LOT!!!
Here to say that the wf works like a charm. Thanks again!!!
Also, with your wf I’ve figured out how flow variables work.
You guys rock!!!
I’m going to change the wf now to handle another huge file, and will put both in a database.
Half of the problem is already solved. The reading of the data.
Now I’m going to figure out how to put the data into an Oracle schema with the same structure and column names.
I know about the Oracle Connector and the DB Table Creator.
So, I just want to ask you guys: is it possible to use the same “String Widget” to “pass” the names and types (and maybe sizes) to the Table Creator?
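Just to sketch the idea: whatever the mechanism, the names and types from the structure file would ultimately have to become DDL for Oracle. A minimal Python sketch, with a hypothetical table name and column list:

```python
# Sketch: building an Oracle CREATE TABLE statement from parsed structure info.
# The table name and columns below are hypothetical examples.
columns = [
    ("V0101", "number", 4),
    ("V0102", "string", 8),
]

def oracle_type(kind, length):
    """Map the structure file's string/number types to Oracle datatypes."""
    return f"VARCHAR2({length})" if kind == "string" else f"NUMBER({length})"

ddl = "CREATE TABLE pnad_raw (\n  " + ",\n  ".join(
    f"{name} {oracle_type(kind, length)}" for name, kind, length in columns
) + "\n)"
```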
Thanks again for the support!
Here to say that I’ve done it. Successfully created the table and imported the data.
Btw, the table was created with all columns as strings of 255 positions. I’m now trying to figure out how to set the types and lengths before the table creation.
Any tip will be welcomed!!!
I’ll put the wf here!
Try the auto type cast node before inserting the data into the DB table.
Can you share a picture of the workflow?
I’ll try this node.
And, I will share the workflow, for sure!
Thanks again !!!
And the result inside Oracle