Number of lines read differs from what the txt file actually contains

I am reading a txt file, and the number of lines read is smaller than the number of lines the file actually contains. I tried the File Reader and the CSV Reader, and neither works. When I read the file with a Python Script node, everything is read. What am I doing wrong? I have read many files identical to this one, and only this one gives me this problem. Attached is a screenshot of the configuration I used in the File Reader.

Using python script:

Have you tried with "Support short data rows" turned off? To be honest, your Python script output looks quite weird. Is it omitting all the header information?

Your first two screenshots show everything being read, in accordance with your last screenshot, or am I reading it wrong? Is there a precise point where data starts to be omitted?

@Aldemir there seem to be several tables in the file. You might have to identify the blocks and then read them skipping lines you do not need.

Something like this:

A sample file and an example of an expected result might help.
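The block idea above could be sketched roughly as follows. This is only an illustration, not the workflow from the screenshot: the helper names `split_pipe_blocks` and `parse_row` are made up, and it assumes that every table line starts with "|", that blank lines and dash rulers can be ignored, and that any other text line (such as a repeated report header) ends the current block.

```python
def split_pipe_blocks(lines):
    """Group lines starting with '|' into blocks (hypothetical helper).

    Assumption: blank lines and rulers made only of '-' and '|' are skipped;
    any other non-table line closes the block that is currently open.
    """
    blocks, current = [], []
    for raw in lines:
        line = raw.strip()
        if not line or set(line) <= set("-|"):
            continue  # blank line or dash ruler: neither data nor a block end
        if line.startswith("|"):
            current.append(line)
        elif current:
            blocks.append(current)  # free text after table lines ends the block
            current = []
    if current:
        blocks.append(current)
    return blocks


def parse_row(line):
    """Split one '|a|b|' line into stripped cell values (hypothetical helper)."""
    body = line[1:]  # drop the leading separator
    if body.endswith("|"):
        body = body[:-1]  # drop the trailing separator
    return [cell.strip() for cell in body.split("|")]
```

If each block starts with its own header line, `parse_row(block[0])` gives the column names and the remaining lines give the rows, which can then be passed to `pd.DataFrame(rows, columns=header)`, provided every row has as many cells as the header.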

import knime.scripting.io as knio
import pandas as pd

arquivo = r"C:\Users\EFIM\OneDrive - PETROBRAS\Documents\Segment_RCE\Movimentação KE5Z 2020\1 TRI\estoque 0421.txt"

#caminho = r"C:\Users\EFIM\OneDrive - PETROBRAS\Documents\ateste\2021abr.parquet"

# set the type of every column to string
dtype = {col: str for col in range(0, 21)}

# skip the 9 header lines at the top of the file and split the columns on "|"
df = pd.read_csv(arquivo, sep='|', encoding='ISO-8859-1', skiprows=9, low_memory=False, dtype=dtype)

knio.output_tables[0] = knio.Table.from_pandas(df)

The beginning of the file looks like this:

16.04.2020 Saída dinâmica de lista 1

Ledger 8A
Área contab.custos ACPB
Empresa 1000
Período contábil 002
Exercício 2020
Versão 000

| Ano|Período|Cen.lucro |CnLcrParcs|Nº conta |Denominação |Cen.|Material |Tp.aval. |Texto | Quantidade|UMB| Em MCont.| Ano|Usuário |TD|Tipo|

|2020| 2|ATDTSEANGR| |1105100001|PETRÓLEO PRO PRÓPRIA|1055|PB.199 |PB19G |BESTD: débito/crédito material | 124,978-|M20| 13.866,97-|2020|Z550 | |ML |
|2020| 2|ATDTSEANGR| |1105100001|PETRÓLEO PRO PRÓPRIA|1055|PB.199 |PB1NG |BESTD: débito/crédito material | 154,188 |M20| 10.070,34 |2020|Z550 | |ML |
|2020| 2|GUPGNTECAB| |1105100001|PETRÓLEO PRO PRÓPRIA|0247|PB.199 |PB2CR |BESTD: débito/crédito material | 1.029,279-|M20| 281.647,22-|2020|Z550 | |ML |
|2020| 2|GUPGNTECAB| |1105100001|PETRÓLEO PRO PRÓPRIA|0247|PB.012 |PRODUZIDO |BESTD: débito/crédito material | 1.090,801-|M20| 275.344,63-|2020|Z550 | |ML |
|2020| 2|EE00000000| |1105100001|PETRÓLEO PRO PRÓPRIA|0630|PB.1ME |PRODUZIDO |BESTD: débito/crédito material | 1.355,539-|M20| 58.515,30-|2020|Z550 | |ML |
|2020| 2|ARREDUC000| |1105100001|PETRÓLEO PRO PRÓPRIA|1050|PB.199 |PB1QX |BESTD: débito/crédito material | 226,319 |M20| 0,27 |2020|Z550 | |ML |
|2020| 2|ARREDUC000| |1105100001|PETRÓLEO PRO PRÓPRIA|1050|PB.199 |PB06H |BESTD: débito/crédito material | 276,885 |M20| 0,32 |2020|Z550 | |ML |
|2020| 2|ARREDUC000| |1105100001|PETRÓLEO PRO PRÓPRIA|1050|PB.199 |PB15G |BESTD: débito/crédito material | 10.211,851-|M20| 1.406.487,71-|2020|Z550 | |ML |
|2020| 2|ARREDUC000| |1105100001|PETRÓLEO PRO PRÓPRIA|1050|PB.199 |PB29R |BESTD: débito/crédito material | 20.074,388 |M20| 1.376.827,28 |2020|Z550 | |ML |

I'm reading a folder containing several files with this structure. The columns are separated by "|". The first 9 lines are the beginning of the header. They contain the column names and a line full of dashes "-". If I don't use "Support short data rows", it causes a problem, I believe because of this line of dashes.

What's even stranger is that I'm using the CSV Reader in this process to read 10 years' worth of files, and only one of them is causing this issue.

I forgot to mention that I am also trying to read only the specific file for the month/year that is causing the problem. It was at this point that I identified that the difference was due to the number of lines read.
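One way to narrow down where the line counts diverge is to profile the raw lines before any reader parses them. The sketch below is only a diagnostic idea, not something confirmed in this thread: `profile_lines` is a made-up name, and the check for quote characters rests on the assumption that a stray `"` inside a field may be what makes a CSV reader merge several lines into one.

```python
from collections import Counter


def profile_lines(lines, sep="|", quote='"'):
    """Count raw lines and flag suspicious ones (hypothetical helper).

    Returns the total number of lines, the number of lines starting with
    `sep`, a Counter of separators per data line, and the 1-based numbers
    of data lines that contain the quote character.
    """
    total = 0
    data = 0
    sep_counts = Counter()
    quoted = []
    for number, raw in enumerate(lines, start=1):
        total += 1
        line = raw.rstrip("\r\n")
        if not line.startswith(sep):
            continue  # report header, blank line, or dash ruler
        data += 1
        sep_counts[line.count(sep)] += 1
        if quote in line:
            quoted.append(number)  # assumption: quotes may confuse the reader
    return {"total": total, "data": data, "sep_counts": sep_counts, "quoted": quoted}


# Usage sketch, with `arquivo` being the path from the script above:
# with open(arquivo, encoding="ISO-8859-1") as handle:
#     report = profile_lines(handle)
```

If `data` matches the row count from the Python Script but not the one from the CSV Reader, the lines listed in `quoted`, or those with an unusual separator count in `sep_counts`, would be the first places to look.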

@Aldemir I loaded your example. Maybe you can check it out:

You could also do this with the bundled Python version where you would determine the line where the header “| Ano|Período|Cen.lucro” starts …
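A minimal sketch of that idea, assuming the header line always begins with the same three column names. The function name `find_header_line` is hypothetical, and whitespace is removed before comparing because the padding inside the cells may vary between files.

```python
def find_header_line(lines, marker="|Ano|Período|Cen.lucro"):
    """Return the 0-based index of the first line that starts with `marker`
    once all whitespace is removed, or None if no line matches
    (hypothetical helper)."""
    for index, line in enumerate(lines):
        compact = "".join(line.split())  # drop padding such as "| Ano|"
        if compact.startswith(marker):
            return index
    return None


# Usage sketch, with `arquivo` being the path from the script above:
# with open(arquivo, encoding="ISO-8859-1") as handle:
#     header_index = find_header_line(handle)
# df = pd.read_csv(arquivo, sep="|", encoding="ISO-8859-1",
#                  skiprows=header_index, dtype=str)
```

With `skiprows=header_index`, pandas treats the matched line as the header instead of relying on a fixed count of 9 lines. The dash line that follows the header would still arrive as a data row and would have to be dropped afterwards.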

Reading the file with Python solved the problem, thank you.