"List file" node - how to get only filenames without path?

Docminus · August 21, 2014, 10:56pm

I am using the "List Files" node to get, well, a list of files from a directory in a table.

With the help of "Regex Split" I was able to extract the path into a separate column, using this pattern (Windows):

(.*)\\.*$

see attached picture for the output.

What I can't get to work is extracting the file names (into a separate column). Any suggestions? Regex Split, Java snippet, whatever works is fine by me.
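One possible approach, sketched in plain Java as it might be adapted for a Java Snippet: add a second capture group that takes everything after the last backslash. The class and method names below are made up for illustration, and the sketch assumes Windows-style paths like the pattern above. The regex it compiles is (.*)\\([^\\]*)$, which could also be tried directly in Regex Split.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a KNIME class): splits a Windows path at its last backslash.
class WindowsPathSplit {
    // Group 1: directory part, group 2: file name (no backslashes allowed in it).
    private static final Pattern LAST_BACKSLASH =
            Pattern.compile("^(.*)\\\\([^\\\\]*)$");

    // Returns {directory, fileName}; if there is no backslash, the whole input is the file name.
    static String[] split(String path) {
        Matcher m = LAST_BACKSLASH.matcher(path);
        if (!m.matches()) {
            return new String[] {"", path};
        }
        return new String[] {m.group(1), m.group(2)};
    }

    public static void main(String[] args) {
        String[] parts = split("C:\\data\\reports\\summary.csv");
        System.out.println(parts[0]); // directory part
        System.out.println(parts[1]); // file name part
    }
}
```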

output.jpg

ImNotGoodSry · August 22, 2014, 9:07am

There's a URL to File Path node located under the Misc section in the node repository. This node will do the job.

urltofilepath.png

Docminus · August 22, 2014, 3:27pm

Oh, nice! Thank you so much!

joshuahoran · January 11, 2019, 3:43pm

I want to revisit this thread:

I often run these two nodes (“List Files” -> “URL to File Path”) to get a list of filenames in a folder. However, “URL to File Path” is SLOW when using a cloud drive because it verifies that each file exists before breaking the URL down into its path components. Since I just ran the List Files node, I already know that the files exist, and I don’t need to spend an extra 5-10 min waiting for “URL to File Path” to re-check everything. Does anyone know of an equivalent for “URL to File Path” that doesn’t perform the extra checking step?

Note: I am aware that I could use Regex Split to manually decompose the URL; however, I need a solution that is independent of the system (e.g. Windows, Linux), so hardcoding a specific path syntax (e.g. “\” vs. “/”) isn’t what I’m looking for.
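For reference, a URL always uses "/" as its separator, whatever the operating system, so the URL column can in principle be split without platform-specific syntax and without touching the file system. A minimal Java sketch of that idea follows, for example as the basis of a Java Snippet. The class and method names are hypothetical, and it assumes the URLs are well-formed and percent-encoded (e.g. %20 for spaces).

```java
import java.net.URI;

// Hypothetical helper (not a KNIME class): pure string handling, no file system access.
class UrlNameParts {
    // Decoded path component of the URL, e.g. "/C:/data/my file.csv".
    private static String pathOf(String url) {
        String path = URI.create(url).getPath();
        if (path == null) {
            throw new IllegalArgumentException("URL has no path: " + url);
        }
        return path;
    }

    // Everything after the last "/" of the URL path.
    static String fileName(String url) {
        String path = pathOf(url);
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Everything before the last "/" of the URL path ("" if there is none).
    static String parentFolder(String url) {
        String path = pathOf(url);
        int cut = path.lastIndexOf('/');
        return cut < 0 ? "" : path.substring(0, cut);
    }

    // Text after the last "." of a file name ("" if there is no extension).
    static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot <= 0 ? "" : fileName.substring(dot + 1);
    }

    public static void main(String[] args) {
        String url = "file:/C:/data/my%20file.csv";
        System.out.println(fileName(url));
        System.out.println(parentFolder(url));
        System.out.println(extension(fileName(url)));
    }
}
```

Note that the parent folder comes back in URL form (forward slashes, leading "/"), which may differ from what URL to File Path reports on Windows.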

izaychik63 · January 11, 2019, 4:40pm

Try Cache node after List Files.

joshuahoran · January 14, 2019, 4:43pm

Thanks for the suggestion. I put a cache node between the “List Files” and “URL to File Path” and noticed no difference in behavior. I think this is because the “URL to File Path” will still perform file system calls to confirm each file exists regardless of whether the input table is cached.

prashantk · July 10, 2019, 6:36am

Dear @joshuahoran,

Did you find a solution for this? I am also facing the same issue…

I want to read the file name along with the data in that file.

joshuahoran · July 10, 2019, 1:53pm

The only solution I found (but chose not to implement) was to roll my own RegEx Cell Split in order to manually parse the URLs into file paths and file names. The challenge here is that this is a platform-specific task and I need my workflows to run across platforms, so a lot of extra logic based on reflection is required, likely making for a very brittle end product.

ipazin · July 11, 2019, 12:42pm

Hi there,

Regarding the platform issues, there is the Extract System Properties node, which gives you platform information; based on it you can apply the right logic using Case or If nodes. It shouldn’t be too complicated.
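The same "pick the separator by platform" idea can be sketched in plain Java, e.g. for a Java Snippet. This is only an illustration and not how the Extract System Properties node works internally; the class and method names are hypothetical. It relies on java.io.File.separatorChar, which is "\" on Windows and "/" on Linux and macOS.

```java
import java.io.File;

// Hypothetical helper (not a KNIME class): chooses the path separator by platform.
class PlatformSeparator {
    // Returns everything after the last occurrence of the given separator.
    static String fileName(String path, char separator) {
        return path.substring(path.lastIndexOf(separator) + 1);
    }

    // Uses the separator of the platform the JVM is running on.
    static String fileNameForThisPlatform(String path) {
        return fileName(path, File.separatorChar);
    }

    public static void main(String[] args) {
        // Build a path with the local separator so the example works on any OS.
        String path = "data" + File.separator + "reports" + File.separator + "summary.csv";
        System.out.println(fileNameForThisPlatform(path));
    }
}
```

One caveat with this design: the separator reflects the machine running the workflow, which is not necessarily the convention used by paths listed from a remote location.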

Considering the speed of URL to File Path node I will check it and get back to you.

Br,
Ivan

ipazin · July 16, 2019, 12:32pm

Hi there,

to come back to this one. Seems URL to File Path node needs to perform those system calls. Just out of curiosity where are those files located?

Additional workaround for platform issues can be done in Column Expressions node as well. If you need an example I can make one

Br,
Ivan

joshuahoran · July 17, 2019, 3:17pm

Thanks ipazin. The slowness of URL to File Path comes into play under the following scenario:

You read a list of 2000 file URLs from a remote server (e.g. cloud drive or FTP); then you need to extract the file name, the parent directory name, file extension, etc… The URL to File Path node does a great, cross-platform-compatible job with this. However, this node has the side effect of going through each of those 2000 entries and first verifying that the file exists. This is a one-at-a-time loop and can take 10 minutes depending on the connection speed. This is 100% wasted time, since we already know that the files exist because we just obtained the URLs from the List Files node.

Thus having a version of URL to File Path where you could turn off the “verify file exists” part of the functionality would be very useful. Your suggestion that we can re-implement this node from scratch is appealing, as it would save hours of time over the course of a week, but I don’t know how to do this so that it works across all platforms (Mac, Linux, and Windows). An example workflow would be helpful!

Thanks.

armingrudd · July 17, 2019, 8:46pm

Hi @joshuahoran ,

Regex Split with this pattern will give you exactly the same output as the URL to File Path node on the “Location” column:
(.*)(?:[\\/])(.*)(?:\.)(.*)

Edited: It does not matter what OS you are using since both \ and / are accepted.
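To see how the pattern behaves on both path styles, here is a small Java check (Regex Split uses Java regular expressions, so the behaviour should carry over, though that is an assumption worth verifying in the node itself). The class name is made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical test harness (not a KNIME class) for the pattern above.
class CrossPlatformSplit {
    // Same pattern as in the post, escaped for a Java string literal.
    private static final Pattern PARTS =
            Pattern.compile("(.*)(?:[\\\\/])(.*)(?:\\.)(.*)");

    // Returns {folder, name, extension}, or null if the pattern does not match.
    static String[] split(String location) {
        Matcher m = PARTS.matcher(location);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        String[] samples = {
            "C:\\data\\reports\\summary.csv",
            "/home/user/archive.tar.gz",
            "/home/user/README"
        };
        for (String s : samples) {
            String[] parts = split(s);
            System.out.println(s + " -> "
                    + (parts == null ? "no match" : String.join(" | ", parts)));
        }
    }
}
```

One thing to watch: a file name without a dot does not match as intended. With no dot anywhere after a separator there is no match at all, and if a parent folder name contains a dot, the split happens there instead.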

What do you think @ipazin? Do you think this works fine?

ipazin · July 18, 2019, 11:55am

Hi there @joshuahoran,

I see. Seems this node indeed performs one IO call per row. I will report it and then maybe something can be improved in future releases.

I didn’t mean to re-implement this node from scratch. What I had in mind is to perform a Regex Split based on platform information. Here is a workflow example for you to check out:
https://kni.me/w/L0yXIdSKUMQREVbk

If you can apply the right logic based on platform in the Regex Split node, this should do the trick. It should also be faster.

Br,
Ivan

ipazin · July 18, 2019, 12:37pm

Hi there @armingrudd,

If it can work on all three platforms then great! I have never really worked on Mac or Linux so not familiar with paths and thus can’t verify it covers all three platforms…

Br,
Ivan

armingrudd · July 18, 2019, 12:41pm

Windows uses \ in file paths and everything else uses / as far as I know.

Even if there were an OS using * as a separator, it would be no problem at all: you would just have to add * inside the square brackets.

joshuahoran · July 18, 2019, 3:29pm

Thanks ipazin. That looks like a straightforward way to arrange the process. I will see about adopting this.

joshuahoran · July 18, 2019, 3:30pm

This looks like a nice, flexible bit of RegEx. If it can really handle all OS cases, then that would be great! I will try this one and ipazin’s solution and see what works best.

Thank you.

gbardon · September 30, 2021, 11:08am

for me this is the best solution! Thanks

tobias.koetter · September 30, 2021, 2:59pm

With the new file handling framework you can also use the new path expressions in the Column Expressions node to work with file paths. To get the file name simply use the getFileName() expression as shown in this example workflow:

Bye
Tobias

bruno29a · September 30, 2021, 3:18pm

Hi @tobias.koetter , thanks for sharing, it’s good to know these functions.

Alternatively, you can also get the filename using the URL to File Path node. Of course, we’d need to convert the Path to URI first using the Path to URI node: