Evaluate value in a column and use that value for new column

Pawel40Adi · July 22, 2022, 10:10am

Hi there,

I am trying to evaluate a value in my first column as per below:

If first column has #FileFormat line or #FundID then use that value so the whole expression ‘#FileFormat = ANNUITY’ and populate into a new column all the way to the last row in my dataset.

Any ideas (Column expressions? - I cannot see any If statement there)

ArjenEX · July 22, 2022, 11:05am

Hi @Pawel40Adi

Is exclusively #FileFormat or #FundID? In other words, is it always one of two options or are there cases where both are present? How should that be handled? Two new columns?

If it’s exclusive, a way to do it is something like this:

This:

Reads the data.
Applies some cleansing for convenience.
Checks whether column1 contains the text "#FileFormat" or #FundID. If so, output the value. If not, keep the output blank.
Populates all the rows of the output column with the found header value.

Note: I added two duplicate data flows, one where "#FileFormat" is present and one where #FundID is present so you can see how it behaves in both cases.

WF:
Evaluate value in a column and use that value for new column.knwf (46.4 KB)

Hope this helps!

Pawel40Adi · July 22, 2022, 11:52am

@ArjenEX - top man, looks like this is exactly what I am looking for. Will test and get back to you. I would need 2 columns for both elements: FundID and FileFormat.

My inept attempt included column expression and expression validating if column0 included string I am looking for:

Which generates this result:

But I was not sure how to attach that FundID variable and copy it down to the last row.

Thnnk you kindly again.

ArjenEX · July 22, 2022, 12:00pm

I see. The contains function is not bad though. It’s basically achieving the same as the IndexOf route.

You should be easily able to get if you apply the if else logic and create two entries in the column expression and subsequently also two entries in the Missing Values nodes for both options.

Pawel40Adi · July 22, 2022, 12:26pm

@ArjenEX

I could not see this syntax under column expressions node (Logical section). Where it can be found please?

Apologies , i am totally new to Knime.

ArjenEX · July 22, 2022, 12:45pm

You’re right that it is not there. If else is just a basic programming principle.

What I mean is: if you need two seperate columns, just create two entries in the column expression node that checks for FundID and FileFormat individually. Same for the Missing Values.

Pawel40Adi · July 24, 2022, 2:59pm

HI @ArjenEX , many thanks.

I forgot to mention that I wanted to split the newly created column and get only the values after “=” sign so for fund it will be ‘LLWP’ and for FileFormat it will be ‘ANNUITY’.

I took a liberty to modify your elegant solution and below is the result:

I am just not too sure how to remove the output_arr[0] column - would it be a row filter?

Also it the above the best solution to do so or is there something better available?

Pawel40Adi · July 24, 2022, 3:07pm

@ArjenEX , sorry to pester you but , if I only wanted to place the ‘#FileFormat = ANNUITY’ string from row 4 onwards , how would i need to do it?

You see in one of my later steps (after extracting required information for fund and file format) i promoting the values from the original value in row 5 (in my original files) which is row 3 in your mocked file and promote them to headers - then if i have only ANNUITY string in row 5 (or 3 respectively) it wont work well

Pawel40Adi · July 24, 2022, 3:55pm

Also I modified your original flow to get the row value to variable but it doesnt work as I am clearly doing something incorrect.

I want my variable to be ‘ANNUITY’ value from output_arr[0] column:

SORRY NOT SURE HOW TO ADD FLOW HERE SO SCREENSHOT ONLY PROVIDED

What am i doing wrong

ArjenEX · July 24, 2022, 6:23pm

Hi @Pawel40Adi

From first to last:

Column Filter
Add the Column Filter node and Enforce Exclusion of output_Fileformat_Arr[0] (String)

Tip: to keep your dataset more clean, you can also opt to remove input column in the Cell Splitter.

Header value only from row x

For this, you can work with the ROWINDEX column that is included by default in some nodes. Here, I use a Java Snippet to set the value to null if the index of the row is smaller than value x. You can compare it to a row number like you would have in Excel. Just adjust the number in the if-statement to your own situation).

ANNUITY to variable
Based on your screenshots I don’t really understand what you are trying to achieve here. I see you want to pass ANNUITY as variable but then you are adding a constant with that same value which does not make a lot of sense since you already have this in a column.
If you still want to use it some way, I would just filter out the header row where it is included with a Row Filter.

Then do Table Row to Variable and enforce the inclusion of all desired columns.

That will give you the value as flow variable.

How to upload workflows to the forum.

In the KNIME Explorer, right click on your workflow and click Export KNIME Workflow.

Select a save location and uncheck the option to Reset the Workflow.

Thereafter, drag and drop the workflow from your local file drive to the forum reply box. It gets uploaded automatically.

WF:
Evaluate value in a column and use that value for new column v2.knwf (70.0 KB)

Hope this helps!

Pawel40Adi · July 25, 2022, 7:57am

many thanks @ArjenEX for all your help - its much appreciated.

Especially Java snippet will give me something to work with in the future at it seems very powerful. I used to do quite a lot of scripting in VBA and some other programming platforms so those concepts are not foreign for me , however I have seen that the syntax has a specific requirements such as brackets location or command ‘||’ which i suppose stands for OR operator (as you would expect):

It would be good to read a bit more about the above syntax principles as you used it previously in column expression node - any good place to start?

Question: I have now created my flow and I am stuck on loop element:

I am aiming create a list of files (from a specific location) then use that list via Table Row to Variable Loop start and for each file do a specific actions (essentially its cleansing the text file , extracting information such as ANNUITY etc) and finally creating a table with 5 columns:
List files

And I want to pass a variable of:

Last row filter

From here I want to save a table but with name slightly different than the original file name (can be suffix 1-19 as i have originally 19 text files to work with) - should I add additional column to result in List files to serve as a new name?

However I am struggling to get the loop working:

Table Row to variable

Table Writer

I hope you can make sense of the above , apologies I am not able to share the flow due to sensitivity of the files I am using for this exercise.

In terms of your question from the previous post:

It was merely a question as I wanted to essentially find out if there is an easier or better solution to fill the column rows to the last row in table with the use of passed variable (that would contain string ‘ANNUITY’ etc.) rather missing value node (if such method is more efficient and exists at all). But its not a big deal as the current set up works great for the proposes of my analysis.

ArjenEX · July 25, 2022, 5:08pm

Hi @Pawel40Adi

Creating files in a loop is a pretty commonly used topic in here I created a reference workflow that gives you a few options to approach this.

Starting point:

I have a folder with two files:

With the List Files/Folders node I get those into the workflow and initiate the Table Row To Variable Loop Start.

Next step is the CSV reader whereby the file is associated to the flow variable Path created earlier.

For convenience, I opt to include the path location also here so I can work with it later on as well.

From here, you can take multiple directions. Option 1 could be to directly reconstruct the entire file and add a suffix to designate that it has been processed. In this example, I used _processed_option1. Using a Column Expression node, the new file path now includes the mentioned suffix.

Using the source_path created earlier:
replacePath(column("source_path"),join(replace(getPathString(column("source_path")),getFileName(column("source_path")) ,""),replace(getFileName(column("source_path")),getFileExtension(column("source_path")),""),"_processed_option1",getFileExtension(column("source_path"))) )

See the difference:

Convert the new output filepath to variable with Table Row to Variable

Pass this variable to the CSV writer subsequently:

// // // // //
Option2
If Path operations are not really your thing, you can simplify it slightly by just creating the new filename first with: join(replace(getFileName(column("source_path")),getFileExtension(column("source_path")),""),"_processed_option2")

Repeat the step of converting it to a flow variable. The next step is the Create File/Folder Variables node which allows to create path flow variables. Fill in your base folder location, the desired new variable name, value and file extension.

The value is the new actual filename that I created before. As such, replace this Value field with the flow variable created in the previous step.

Finally, pass the csv_path variable to the CSV writer.

Run the entire workflow. Result:

See WF:
Text file iteration.knwf (80.5 KB)

Note1: as mentioned, there are many ways to do this more which others can most likely highlight. Test for yourself which is the most efficient with your data set! Have look around in the forum and the hub for more inspiration.
Note2: If your workflow still does not work after applying something like this: replace your actual input with a Table Creator with dummy data like I have done in the beginning and upload your workflow here to proper troubleshooting can be done.

Hope this helps!

Pawel40Adi · July 27, 2022, 12:51pm

@ArjenEX , this is such a detailed and fantastically comprehensive explanation - I salute you ,Sir.

So easy to follow - I found a number of other loop examples on this forum but none even close to your quality of explanation.

Thank you very much for your time and effort to prepare the flow and screenshots - I owe you a beer (or 2…)

It seems though that I am doing something wrong as I am not able to pass my initial file name variable to CSV writer at the end of the flow.

Here is my flow:

In CSV reader I create my source file path:

Then under Column Expressions I use your path modification formula to append suffix :‘Original_text_file_processed’ to the file name I wish to use to save the output:

replacePath(column(“original_file_name”),join(replace(getPathString(column(“original_file_name”)),getFileName(column(“original_file_name”)) ,“”),replace(getFileName(column(“original_file_name”)),getFileExtension(column(“original_file_name”)),“”),“_Original_text_file_processed”,getFileExtension(column(“original_file_name”))) )

Then under Table Row to Variable node I specify which column to use as a variable for a new name:

But then under CSV Writer ( I will use table writer in the end as my files are quite huge) I am not able to select the variable output_file_name. There is only ‘Path’ option available:

I am not sure where I am going wrong.

ArjenEX · July 27, 2022, 2:41pm

Hi @Pawel40Adi

The error that you made is that your Column Expression output is set to type string. Change it to Path and it will work.

Left yours, right mine.

Pawel40Adi · July 27, 2022, 3:17pm

thanks @ArjenEX but still no luck.

I still dont have option to select my ‘output_file_name’ variable in final CSV Writer:

ArjenEX · July 27, 2022, 3:24pm

Oke, then I have to refer to my earlier remark because I’m out of options

Make a copy, replace your actual input with a Table Creator with dummy data and upload your workflow here so proper troubleshooting can be done.

If not possible, maybe someone else is willing to jump in and check your screenshots.

Pawel40Adi · July 27, 2022, 3:57pm

I disconnected last 2 connections between final shaped table (Column Expressions) and CSV Writer/End loop ,connected again and voila:

It works a charm. Thank you!

ArjenEX · July 27, 2022, 4:04pm

Glad to hear it’s running!

Pawel40Adi · July 28, 2022, 10:58am

@ArjenEX , did you consider creating a Knime course for intermediate/advanced users and potentially dropping it on Udemy or other learning platform?
I am saying that from the perspective of the quality & clarity of your instructions (and dedication which cannot go unnoticed).

Apologies if you have done that already.

It would be silly to waste your potential in this respect.

system · August 4, 2022, 10:59am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.