Complex columns append and rename based on cell

Hi all,

I’m looking for a faster way to combine rows and column values by date group while renaming the columns to include one of the cell values. For each Date, the Symbol value is appended to the Col1, Col2 names and then the columns are restructured. I’m sure this is easier to visualize… (FYI, the values in the sample columns below for Col1 and 2 are meaningless)…

Covert this table struture:

RowID	Date	Symbol	Col1	Col2
Row0	1/1/2000	XXX	sers	efecs
Row1	1/1/2000	YYY	sewe	efcz
Row3	2/1/2000	XXX	sef	ceser
Row4	2/1/2000	YYY	esgvg	wed

To this:

RowID	Date	Col1_XXX	Col2_XXX	Col1_YYY	Col2_YYY
Row0	1/1/2000	sers	efecs	sewe	efcz
Row1	2/1/2000	sef	ceser	esgvg	wed

I have a incredibly slow way to do this using a group loop for Date, group loop for Symbol, and then a Column List loop that does the rename. But the data covers 20 years, with hundreds of Symbols, and 100+ columns so running these will take me weeks based on the small proof-of-concept I did. I know there has to be a better way.

Thanks in advance!

Hi @cybrkup ,

From the sample data you have given, this initially looks like a simple join on the date column for the set of rows XXX and the set of rows YYY.

So your example output could be achieved using a row Splitter on the SYMBOL column (XXX goes to the top output and YYY to the bottom). you’d then rename the columns on each data set and then join those two data sets on DATE.

BUT…, in reality your symbols aren’t just XXX and YYY, as you say there are hundreds of different symbols. So are there more than two symbols per date? I feel that perhaps your sample data is slightly too simplistic to fully describe the problem.

Could you show what the expected output is if you have, say, four symbols for one date and maybe three symbols (some of which are different to the previous four) for a different date?

Are you expecting, for example, a tabulation of all the hundreds of symbols with the values populated on their respective columns if they are present for a given date?
Thanks.

1 Like

To your point, this should be much clearer:

  • Column data will be the same type (str, int, boolean, etc.) for all of the Symbols… Col1 will always be Str, Col2 always Boolean,etc. I used all boolean in this example, but that is only to simplify the example.

  • Columns layout will always be the same for each Symbol for when data is available for a given Date. E.g. if Symbol A has data for a given date, it will have all of the columns that any other Symbol would have, even if the data is “?”.

  • I mentioned having hundreds of Symbols, and that is true but misleading. I break up the Symbols into groups based on corporate industry, so each group will be under 100 symbols.

  • In this example:
    … A and B have matching Date rows, but B is missing data in Col3 and Col4.
    … C only matches A/B for one Date row, and also has one missing value in Col2.

Input:

Date	Symbol	Col1	Col2	Col3	Col4
1/1/2000	A	TRUE	TRUE	FALSE	FALSE
1/1/2001	A	TRUE	TRUE	TRUE	FALSE
1/1/2002	A	FALSE	TRUE	TRUE	TRUE
1/1/2000	B	TRUE	TRUE	TRUE	TRUE
1/1/2001	B	FALSE	TRUE	?	?
1/1/2002	B	TRUE	TRUE	FALSE	TRUE
1/1/2003	C	FALSE	?	FALSE	TRUE

Output:

Date	Col1_A	Col2_A	Col3_A	Col4_A	Col1_B	Col2_B	Col3_B	Col4_B	Col1_C	Col2_C	Col3_C	Col4_C
1/1/2000	TRUE	TRUE	FALSE	FALSE	TRUE	TRUE	TRUE	TRUE	?	?	?	?
1/1/2001	TRUE	TRUE	TRUE	FALSE	FALSE	TRUE	?	?	?	?	?	?
1/1/2002	FALSE	TRUE	TRUE	TRUE	TRUE	TRUE	FALSE	TRUE	FALSE	?	FALSE	TRUE


1 Like

Hi @cybrkup

See this wf complex_column_append.knwf (70.5 KB).


gr. Hans

2 Likes

Perfect, works great and puts my solution to shame. Thank you!

Couldn’t you just pivot the data?
br