Picking the file extensions from a list of files

Hi All,

I am quite new to KNIME and was wondering if someone could help me out.
I have a column with a list of file names, and I want to extract only the extensions of all the files into one column or multiple columns.

I am attaching a screenshot in which the highlighted column is the present data and the other column is the expected output.

Thanks in advance!

Hi @manish_mohan, welcome to the KNIME community.

From your image, it looks like the _AttachmentNames column is a String List, and that the individual items are enclosed in double quotes. If that’s the case, you could use the following nodes:

[image: screenshot of the node sequence]

For the String Replacer node, use the following regular expression:

.*(\..*?)"

exactly as written above, including the double quote. Because the leading .* is greedy, this skips all characters up to the last “.” and then “captures” everything from that “.” (including the “.”) up to, but not including, the closing double quote.

Enter the replacement string as:
$1
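
For anyone who wants to check the pattern outside KNIME, here is a minimal Python sketch of what it does. It assumes the node matches the pattern against the whole cell value and leaves non-matching values unchanged. The function name `extension_of` is made up for illustration, and Python reads the captured group directly instead of writing `$1`.

```python
import re

# Same pattern as above: greedy prefix, captured extension, closing double quote.
PATTERN = re.compile(r'.*(\..*?)"')

def extension_of(quoted_name: str) -> str:
    """Return the captured extension, or the input unchanged if the pattern does not match."""
    match = PATTERN.fullmatch(quoted_name)
    return match.group(1) if match else quoted_name

print(extension_of('"abc1.pdf"'))        # .pdf
print(extension_of('"archive.tar.gz"'))  # .gz
print(extension_of('"abc1.pdf","def1.pdf","xyz1.jpg"'))  # .jpg
```

The last call shows one thing to watch for: if the pattern is applied to a whole delimited string rather than to a single quoted item, the greedy `.*` leaves only the final extension.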

After that, the GroupBy node simply concatenates the results. Make sure you specify a space as the Value delimiter.

You can then rename the columns as required, using Column Renamer.

Hello @takbb, thank you for your kind support and reply.

When I tried the String Replacer, it picked up only one extension and not the others.

Is there some modification that would help?

Thanks in advance

Hi @manish_mohan, the more information you can provide, the easier it is to assist. It is difficult to say what modifications you need when I cannot see exactly what your data and workflow look like. The suggestion I gave does work, but only if my understanding of your data is correct, so I may have misunderstood what you said or misinterpreted the picture of your data.

For example, I made the assumption that your column “_AttachmentNames” is a String List, but maybe it isn’t. Can you confirm whether my assumption is correct?

Can you upload a sample workflow?

Hi @takbb, thank you for your reply, and sorry for the misunderstanding.

I am not able to share the KNIME workflow as it contains sensitive data, but I am sharing an image that might help you understand my requirement.

The _AttachmentNames column contains multiple file names; I would like to get only the extensions of those files and have them in a separate column.
For example, if _AttachmentNames contains “abc1.pdf”,“def1.pdf”,“xyz1.jpg”, I would like to have two new columns named ‘.pdf’ and ‘.jpg’, each holding the count of files with that extension, i.e. ‘.pdf’ → 2, ‘.jpg’ → 1.
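
To make the expected counts concrete, here is a small Python sketch of the same arithmetic for one cell. It assumes the cell is a single string of comma-separated names in straight double quotes (the forum may have rendered them as typographic quotes); `count_extensions` is a hypothetical helper name, not a KNIME function.

```python
import re
from collections import Counter

def count_extensions(attachment_names: str) -> Counter:
    """Count how often each extension occurs in one comma-separated, quoted cell."""
    # Pull out the text between each pair of straight double quotes.
    names = re.findall(r'"([^"]*)"', attachment_names)
    # Keep everything from the last "." onward; names without a "." are skipped.
    return Counter(name[name.rfind("."):] for name in names if "." in name)

counts = count_extensions('"abc1.pdf","def1.pdf","xyz1.jpg"')
print(dict(counts))  # {'.pdf': 2, '.jpg': 1}
```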

Hope this helps.
Thanks in advance.

Thank you @manish_mohan. OK, so my assumption that _AttachmentNames was a String List was incorrect, which may explain the problem. I also now see that you have multiple rows and additional columns.

That has implications for how to do this: my existing solution would work for a single row and column, but might have the effect of grouping multiple rows together.

Should I assume that you want the new column appended to the existing rows, retaining any other existing columns along with row order (and RowID)?

Attached is a demo workflow, which does more than my original suggestion, as it has to turn the delimited string into a String List and then ensure that the multiple rows and additional columns remain intact at the other end. I also added the count of unique extensions.

Split delimited string, replace and regroup.knwf (29.3 KB)
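
The workflow itself is not reproduced here, so the following Python sketch only illustrates the idea described above: split each row's delimited string, pull out the extensions, join them back with a space, and append a unique count while leaving the other columns and the row order alone. The column names `Extensions` and `UniqueExtensionCount`, the `id` column, and the choice to keep duplicates in the joined string are all assumptions made for the example.

```python
import re

ITEM_PATTERN = re.compile(r'"([^"]*)"')  # one file name between straight double quotes

def append_extension_columns(rows, source_column="_AttachmentNames"):
    """Return copies of the rows with two extra columns; input rows and their order are untouched."""
    result = []
    for row in rows:
        names = ITEM_PATTERN.findall(row[source_column])
        extensions = [name[name.rfind("."):] for name in names if "." in name]
        new_row = dict(row)                              # keep every existing column
        new_row["Extensions"] = " ".join(extensions)     # space as the value delimiter
        new_row["UniqueExtensionCount"] = len(set(extensions))
        result.append(new_row)
    return result

rows = [
    {"id": 1, "_AttachmentNames": '"abc1.pdf","def1.pdf","xyz1.jpg"'},
    {"id": 2, "_AttachmentNames": '"notes.txt"'},
]
for row in append_extension_columns(rows):
    print(row["id"], row["Extensions"], row["UniqueExtensionCount"])
```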

Input: [image not preserved]

Output: [image not preserved]

As an aside, I of course wasn’t expecting you to upload the actual flow containing sensitive info, but instead to upload a demo workflow using a minimal set of example data that demonstrates the problem.

This way, it helps everybody understand the specifics of what is being asked, and also it gives us some sample data to use in prototyping a solution.

As you can see, mocked-up images can be open to interpretation. Your subsequent screenshot of the data gave sufficient information for me to realise that my original understanding of your data was incorrect. :)

Anyway, I hope this now gives you what you need, and again, welcome to the KNIME forum. ;)

Hi @takbb, this is exactly what I was looking for. Thank you so much for your support and kindness.

I am sorry for the confusion on my side.

As an extra, I was curious whether it is possible to create columns named after the extensions, each holding the count of that extension.
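
This follow-up was not answered in the thread, so the Python sketch below is only an illustration of the output shape being asked for, not a KNIME solution. It assumes each cell is a comma-separated string of names in straight double quotes; `pivot_extension_counts` is a hypothetical name, and filling missing extensions with 0 is an assumption.

```python
import re
from collections import Counter

def pivot_extension_counts(cells):
    """For each cell, return a dict with one key per extension seen anywhere, holding that cell's count."""
    per_row = []
    for cell in cells:
        names = re.findall(r'"([^"]*)"', cell)
        per_row.append(Counter(n[n.rfind("."):] for n in names if "." in n))
    # One column per distinct extension across all rows, in sorted order.
    columns = sorted(set().union(*per_row))
    # Counter returns 0 for extensions a row does not contain.
    return [{col: counts[col] for col in columns} for counts in per_row]

table = pivot_extension_counts([
    '"abc1.pdf","def1.pdf","xyz1.jpg"',
    '"notes.txt"',
])
for row in table:
    print(row)
```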

Thanks for your support.

Hi @manish_mohan, I’m glad it worked, and thank you for marking the solution. Apologies that I never got round to responding to your follow-up question.

If you still need an answer, I’d suggest asking it as a new question on the forum, with a demo example of how your data looks and what output you need. Asking it as a new question will give it more visibility than asking here, and of course I’ll also assist if I can.
