I’m looking for a faster way to combine rows and column values by date group while renaming the columns to include one of the cell values. For each Date, the Symbol value is appended to the Col1, Col2 names and then the columns are restructured. I’m sure this is easier to visualize… (FYI, the values in the sample columns below for Col1 and 2 are meaningless)…
Covert this table struture:
RowID Date Symbol Col1 Col2
Row0 1/1/2000 XXX sers efecs
Row1 1/1/2000 YYY sewe efcz
Row3 2/1/2000 XXX sef ceser
Row4 2/1/2000 YYY esgvg wed
To this:
RowID Date Col1_XXX Col2_XXX Col1_YYY Col2_YYY
Row0 1/1/2000 sers efecs sewe efcz
Row1 2/1/2000 sef ceser esgvg wed
I have a incredibly slow way to do this using a group loop for Date, group loop for Symbol, and then a Column List loop that does the rename. But the data covers 20 years, with hundreds of Symbols, and 100+ columns so running these will take me weeks based on the small proof-of-concept I did. I know there has to be a better way.
From the sample data you have given, this initially looks like a simple join on the date column for the set of rows XXX and the set of rows YYY.
So your example output could be achieved using a row Splitter on the SYMBOL column (XXX goes to the top output and YYY to the bottom). you’d then rename the columns on each data set and then join those two data sets on DATE.
BUT…, in reality your symbols aren’t just XXX and YYY, as you say there are hundreds of different symbols. So are there more than two symbols per date? I feel that perhaps your sample data is slightly too simplistic to fully describe the problem.
Could you show what the expected output is if you have, say, four symbols for one date and maybe three symbols (some of which are different to the previous four) for a different date?
Are you expecting, for example, a tabulation of all the hundreds of symbols with the values populated on their respective columns if they are present for a given date?
Thanks.
Column data will be the same type (str, int, boolean, etc.) for all of the Symbols… Col1 will always be Str, Col2 always Boolean,etc. I used all boolean in this example, but that is only to simplify the example.
Columns layout will always be the same for each Symbol for when data is available for a given Date. E.g. if Symbol A has data for a given date, it will have all of the columns that any other Symbol would have, even if the data is “?”.
I mentioned having hundreds of Symbols, and that is true but misleading. I break up the Symbols into groups based on corporate industry, so each group will be under 100 symbols.
In this example:
… A and B have matching Date rows, but B is missing data in Col3 and Col4.
… C only matches A/B for one Date row, and also has one missing value in Col2.
Input:
Date Symbol Col1 Col2 Col3 Col4
1/1/2000 A TRUE TRUE FALSE FALSE
1/1/2001 A TRUE TRUE TRUE FALSE
1/1/2002 A FALSE TRUE TRUE TRUE
1/1/2000 B TRUE TRUE TRUE TRUE
1/1/2001 B FALSE TRUE ? ?
1/1/2002 B TRUE TRUE FALSE TRUE
1/1/2003 C FALSE ? FALSE TRUE