Concatenation based on date in file name

Pedro87 · November 25, 2022, 1:08am

Hello, good evening, I present the following query:

I have several files inside a folder called COLOC, where the format of the files is “.txt” and the name of each file is the following “10102_COLOC1_202201”, where 2022 represents the year and 01 the month of that file, then the other file is called “10102_COLOC1_202202”, where 2022 represents the year and 02 the month of that file, and so on.
All files have the same column structure, number of fields and data structure, finally these files do not have column names (image and files are attached)

10102_COLOC1_202208.txt (2.6 KB)
10102_COLOC1_202209.txt (2.8 KB)
10102_COLOC1_202210.txt (5.5 KB)

What I do is occupy the concatenate node to group the files in a single base. However, I need that at the moment that it is concatenated, I can identify the date expressed in YYYYMM for each identified and concatenated data (excel image of what I need is attached)

bruno29a · November 25, 2022, 3:40am

Hi @Pedro87 , the File Reader or CSV Reader nodes allows you to open several files into 1 table for you instead of you having to open each of them individually and concatenate them.

Choose the option “Files in folder”, and then browse to the folder you want the Reader to the read files from.

You also have the option to append the path of the file in a column, so with this option, you know which file the data is coming from.

FYI, you’ve not presented to us what the issue is though…

Pedro87 · November 25, 2022, 10:31pm

Hello @bruno29a, thanks for your comments, however I expressed myself badly from what I understand, so I present my problem:

Occupying the reader node in csv format to read a folder containing several files named as follows “10102_COLOC1_202201”, where 2022 represents the year and 01 the month of that file, then the other file is named “10102_COLOC1_202202”, where 2022 represents the year and 02 the month of that file, and so on.

The problem is that I don’t know how to add in a new column the periods of each file in YYYYMM format where YYYY is the year and MM is the month and that they correspond to the corresponding data

Example: for the file called 10102_COLOC1_202201 and 10102_COLOC1_202202 (an image of the expected result is attached)

KNIME_project.knwf (32.5 KB)

Col_1 Col_2 Col_3 Col_4
data_1 data_2 data_3 202201
data_4 data_5 data_6 202201
data_7 data_8 data_9 202202
data_10 data_11 data_12 202202

workflow attached

gonhaddock · November 25, 2022, 11:29pm

Hello @Pedro87

As @bruno29a already mentioned, you have the option to add the path from the file in a column…
You can convert the Path type column into a String with ‘Path to String’ node.

Then, as your file name is standardize, the position of characters in column $Column 58$ within your example, it is always starting 10 characters from the end of the path string and a length of 6 characters.

You can extract this character sequence from the string Path with the ‘String Manipulation’ node:

substr($Path$
	, length($Path$) - 10
	, 6)

BR

bruno29a · November 26, 2022, 4:30am

Thanks for clarifying @Pedro87 , then you can extract this info from the file path as @gonhaddock has suggested.

ArjenEX · November 26, 2022, 12:09pm

To add for future reference, KNIME has some direct functions related to Paths which takes away the requirement to convert it to a string first (available in Column Expression).

As such, you can also use:

right(replace(getFileName(column("Path")),getFileExtension(column("Path")),""),6)

Pedro87 · November 26, 2022, 1:23pm

@bruno29a , @gonhaddock and @ArjenEX

Thank you very much for your comments and help. I’ll start at the beginning to explain if what I’m doing is correct since I try to execute the workflow and I get an error

Step 1
In the attached image I have the csv reader node and I create a variable called “Path”

Step 2
I run the reader node with the created variable

Step 3
I connect the “Column Expression” node and execute the following codes

a) getFileName(variable(“Path”)) → correct result

b) getFileExtension(variable(“Path”)) → blank result (I don’t know why it doesn’t show me the path

c) right(replace(getFileName(column(“Path”)),getFileExtension(column(“Path”)),“”),6) → result error (I don’t know what I’m doing wrong)

ArjenEX · November 26, 2022, 1:56pm

Hi @Pedro87

We are all referring to this function under the advanced section of the CSV reader You should not go the flow variable route here since you are dealing with multiple files.

Pedro87 · November 26, 2022, 2:40pm

@ArjenEX , @gonhaddock and @gonhaddock

Dear, thank you very much for your support, I did not understand exactly what you were telling me since my language is not English and I have to translate the messages into Spanish and vice versa, therefore the accuracy of the messages is lost, since the google translate is very literal

ArjenEX · November 26, 2022, 7:04pm

Tip: there is a dedicated section on this forum for Spanish speaking persons. There are some people around that can help you out in your own language

mlauber71 · November 27, 2022, 8:25am

For translations you might want to try deepl.com. My impression is the translations are much better.
————-
Para las traducciones, puede probar deepl.com. Mi impresión es que las traducciones son mucho mejores.

Para traduções pode querer experimentar o deepl.com. A minha impressão é que as traduções são muito melhores.

Pour les traductions, vous pouvez essayer deepl.com. J’ai l’impression que les traductions sont bien meilleures.

Per le traduzioni potreste provare deepl.com. La mia impressione è che le traduzioni siano molto migliori.

关于翻译，你可能想试试 deepl.com。我的印象是翻译的效果要好得多。

A fordításokhoz érdemes kipróbálni a deepl.com-ot. Az a benyomásom, hogy a fordítások sokkal jobbak.

system · December 4, 2022, 8:26am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.