Text Files

CarishmaM · August 17, 2021, 3:07pm

Hi,

I am trying to read and parse this text file through KNIME. Previously, I used Monarch.

It seems that there are multiple delimiters.

takbb · August 17, 2021, 4:10pm

Your post is quite general. Are you running into problems, or maybe wondering which nodes to use?

If you provide some specifics, and ideally maybe a sample of the text file, outlining your requirements, I’m sure somebody will be happy and able to assist.

CarishmaM · August 17, 2021, 5:39pm

I was just wondering which nodes I would be able to use. I tried using the / or , delimiter and I still cannot understand the data.

88, PAYMENTS 9758 EOS CCA
16,145,178482,S,119135,000,000,00000000000/
88, PAYMENTS 9758 EOS CCA

The data is somewhat like the above. When I tried converting it with Excel, it seemed like some of the “,” characters are actually decimal points.

ipazin · September 2, 2021, 10:10am

Hello @CarishmaM,

Have you made any progress reading the above data into KNIME? The data format does seem a bit confusing. Maybe you can try using the Line Reader node followed by a regex to separate the decimal commas from the separator commas, for example by changing the separator comma to another separator character that is not present in your data.

Br,
Ivan

bruno29a · September 2, 2021, 12:18pm

Hi @CarishmaM , I think you should reach out to the owner of the data and ask what the structure of the data is.

Like how many columns are there, and what the column headers would represent.

It also looks like there might be some sort of row delimiter, since the data “88, PAYMENTS 9758 EOS CCA” is repeated.

It would seem that, even if you do get the information about the columns, the structure is not that straightforward, so you might end up having to do what @ipazin said - use Line Reader to process one line at a time, apply some if conditions if necessary, and apply whatever regex might be needed.

CarishmaM · September 14, 2021, 5:37pm

Hi @bruno29a,

I am the owner of the data, so I know the columns and header information. I will see what I can do with Line Reader.

bruno29a · September 14, 2021, 6:50pm

Hi @CarishmaM , I’m not sure I understand… If you are the owner of the data, then you should know the structure of the data and of the file. But you seemed unsure based on your statement: “It seems that there are multiple delimiters”. That would be our line, as we don’t know your data.

You also mentioned “I tried using the / or , delimiter and I still cannot understand the data”. If the owner of the data cannot understand the data, there is a bigger chance that we won’t understand it either.

CarishmaM · September 14, 2021, 7:23pm

In my initial post, I mentioned using Monarch, which is data manipulation software. An SQL script that manipulates the data for me was provided, so I know how my data needs to look. However, because I am switching to KNIME, I am looking for nodes or suggestions that can help me manipulate the txt data in the same way.

Thank you for your suggestion.

bruno29a · September 14, 2021, 7:43pm

Hi @CarishmaM , if you can show us a sample of the original data and how you want it to look after processing, and explain the logic/rules you want to apply, we can suggest which nodes to use.

We’re happy to help if we can.

CarishmaM · September 15, 2021, 11:48am

This is the example of the data I previously posted:

88, PAYMENTS 9758 EOS CCA
16,145,178482,S,119135,000,000,00000000000/
88, PAYMENTS 9758 EOS CCA

There should be 4 columns. Date, Account Number, Trans Code, Amount. For example in the data above, the trans code is 145.

bruno29a · September 15, 2021, 12:04pm

Hi @CarishmaM , I think I figured it out. The first 2 lines are actually 1 line!

It’s your comment about 145 being the Trans Code that clarified it:
88, PAYMENTS 9758 EOS CCA 16,145,178482,S,119135,000,000,00000000000/

However, there seem to be commas within the data. Is it possible to enclose the data in quotes when generating this file? That’s usually the common practice when you are using comma as separator.

CarishmaM · September 30, 2021, 11:15am

No, this is the original data that the database uploads. Is there a node you would recommend that would combine lines automatically?

ipazin · October 1, 2021, 11:08am

Hello @CarishmaM,

If each record is split across two lines, then use the Math Formula node to add the same ID to every two rows (the expression for it is ($$ROWINDEX$$ + 1) / 2 with the Convert to Int option checked). Then use the GroupBy node with this new column as the grouping column, and apply the concatenate aggregation to your original column with an appropriate delimiter (it seems a space or even no delimiter is needed).
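For anyone who wants to check the pairing logic outside KNIME, here is a minimal Python sketch of the same idea: give every two consecutive lines the same ID, then concatenate within each ID. The function name `pair_lines` is hypothetical, and using the 0-based index `// 2` as the pair ID is an assumption of this sketch, not the exact KNIME expression.

```python
from itertools import groupby


def pair_lines(lines, delimiter=" "):
    """Join every two consecutive lines into one record.

    Hypothetical helper: the pair id is the 0-based line index
    integer-divided by 2, so lines 0 and 1 share id 0, lines 2 and 3
    share id 1, and so on. A trailing unpaired line is kept as-is.
    """
    indexed = enumerate(lines)
    return [
        delimiter.join(text for _, text in group)
        for _, group in groupby(indexed, key=lambda item: item[0] // 2)
    ]


# The two sample lines posted earlier in this thread.
sample = [
    "88, PAYMENTS 9758 EOS CCA",
    "16,145,178482,S,119135,000,000,00000000000/",
]

records = pair_lines(sample)
print(records)
```

With the sample above this yields a single record with the two lines joined by a space. It only works if every record really spans exactly two lines, so it is worth checking the full file for exceptions first.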

Br,
Ivan
