Data manipulation

widjajaf · February 11, 2015, 5:54am

Hi

I need a bit of guidance in transforming data which is in a little bit of structure (but not fully structured)

as below

but i need to tranform it into purely data without the blank cells as below

richards99 · February 11, 2015, 7:15am

I think the key is to get the unique identifier in the count column to be fully populated down the rows, I.e. So it is full of 1s and then 2s etc, that would be the first step.

to do this, use the Time Series Missing Value node on the Count column, choosing Last.

now you should be able to pull all the other data from this using the Cell Replacer node, using the Count column as the lookup and the AgentId, agent name, agentcat etc as the replacement. Choose to append the results into new columns You will need several Cell Replacer nodes.

Then us the Column Resorter node to put these new columns to the front of the table, so it mirrors your expected output

that sorts out the data in the yellow boxes, you can now apply the time series missing values node on the AgentId column, and continue on in the same way.

Finally to remove the header rows, I notice these don't have a number in the agent name column. So use a String To Number node on the agent name column, followed by a Missing Value node on this column, opting to remove row where value is missing.

Simon.

widjajaf · February 11, 2015, 7:18am

Thanks Simon

I will have a look at Time Series Missing Value node and Cell Replacer, thanks for the guidance.

Marlin · February 11, 2015, 7:21am

Hi,

I would suggest something like the following for every type of "header": When you import the data, there should be a lot of missing or empty cells on the left under the respective header, right? We can use this information. First, use a Row Filter, a Rule-Based Row Filter or the respective splitters to extract the headers by checking if the respective column is empty. Now we have headers and data, but not the relationship. We can get that from the original table in a second branch. First,.make sure the empty cells in the important column contain missing cells instead of empty strings. (If they are only "empty", a Rule Engine might help, those return missing values if no rule matches.) Then use a Time Series Missing Values from the Time Series extension with the replacement method set to "last" to fill those cells. Now you can remove the header rows with a Reference Row Filter by comparing RowIDs with the RowIDs of one of the tables we generated earlier. After following those steps for both "headers", a Joiner or a Cell Replacer could replace the numbers in the columns we worked on earlier.

Hope that helps.

If it's too confusing, please upload your example data in an interpretable format and I'll whip up a workflow. :)

widjajaf · February 12, 2015, 6:26am

Hi

below is my work in progress flow with the output table below , which is kind of okay so far ...

but:

1. I have got to manually created the "agent detail lookup table" (Table Creator node), so there a way to automate this? since I have to do about 200-300 entries each time .

2. I have also managed to remove/exclude the header row (circled in red) using Rule-based Row Filterbased on COL8, COL9 and COL10 being blanks, but now my problem is with COL6, because I have a mixed of String and Integer values in the same column , so any pointer for taking care such problem.

Regards

widjajaf · February 12, 2015, 6:33am

Hi

DONT worry about Q1 , I have manage to automate the list using Row Filter and Column Filter easily. Thanks

richards99 · February 12, 2015, 7:08am

Simply use a String To Number node on this column 6

then use a Missing Value node to remove the missing values where the text was.

simon

widjajaf · February 18, 2015, 2:03am

Got it , Thanks heaps Simon ...that helps with my automation