I successfully grouped data row-wise by “QValue” categories (Q1, Q2, Q3, etc.) including a text description and its associated percentage within each of these groups.
I then moved each of nine “QValue” groups to column headers instead of identifying a row, including pulling the field text and percentage data into each respective column. I accomplished this using the Transpose Node.
What I need help with is each QValue column should be two columns: 1) the text description, then 2) the percentage contribution directly beside that in a second related QValue column. So something like: 1) Q1 Text Column, the 2) Q1 percentage Column. I have not found a means for this kind of related separation of the columns when the data within them is alternating text then integer by row within that column.
Thanks again for your help. One day i hope to get smart enough to be someone helping instead of begging for answers and bothering you.
Providing a sample datafile and workflow could go some way to help understanding your problem and helping to solve it.
here’s the original dataset. thanksDataSet.xlsx (20.7 KB)
the workflow is quite huge and i have no discovered a way to save just a portion of it with full field data. sorry
What would a result look like moving on from this file. Would there be separate file for each Q value with the columns and the values from the row?
Maybe is is possible to have the start of the challenge and an explanation about the outcome.
The end result is 18 columns, 2 per QValue (Q1, Q1…Q2, Q2… through Q9). The left Column would contain the string (text description); the right would contain the String’s derived percentage.
My first graphic inserted in this topic chat shows I have the text and it’s associated percentage–but spread across many columns. My goal is to move the QValue categories to column headers, then spread the same field data down the series of much fewer columns. Transpose came close (my second graphic posted originally here), but it mixed in the same QValue column both String and Double data. I’m really just trying to keep the general column format but separate, yet associate the String and Double in two separate columns.
I’m sorry this is burning your time. I may just try to find a way to analyze the data in the initial formatting.
Often there is a way. My problem is to understand what is the technical as well as business question/goal and have a data set that fully represents the challenge. For example if it is always 9 Q and then one line a solution may be different from a scenario where that may change.
In general: if the structure is more steady and stable it is easier to handle - otherwise one would have to resort to more unconventional measures. But the good news is with KNIME even larger data sets might be looped or handled within a few minutes which often is good enough. But I have also set up ‘funny’ loops on Big Data systems to get a job done.
Concerning time burning. The forum is based on the voluntary contribution of it so don’t worry
i think that your initial data manipulation is creating more issues than solutions
can you feed us the very raw data, without aggregation and calculation?
The QValues can range from 3 to 9 (min/max). This is derived by the KMeans clustering node. So the actual range can vary.
The dataset is actually not big, but there’s as much detail as I can capture. For accuracy in my project a controlled dataset using GIS shapefiles is used to create a physical boundary and then the data is extracted of every house within that boundary. So the set is small (100-500 houses). But each house has about 12 different quantitative variables and 20-40 different qualitative variables. And each house varies in these variables’ values.
So, the KMeans clustering parses this set of homes into my “QValue” sub-sets. That is what I am trying to do with the problem we are working on this moment–simply present each variable and the percentage of homes within each QValue sub-set that has each of these variables.
Hi Luca, attached are three sets of data–after only enough processing to make them useful–but before the current challenge I have.
I combine the three sets using key: ListingId. Then each ListingId and it’s associated variables are assigned to a QValue Group. The QValue Groups were derived by KMeans clustering.
My current challenge is to align two columns (or your better solution) for each QValue group: 1) is the variable text description, 2) is the percentage of homes within each QValue Group the cumulative homes within that group has.
PS: for this current challenge you likely do not need the “Full_Quanitative_Data”. It is here because I used it to create the QValue Group clusters using KMeans. The “CompleteFreeFormVariables” and “CompletePreSetVariables” are then assigned to QValue Groups using their “ListingId” key. These last two groups are what I am using for the current challenge.
FULL_QUANTITATIVE_Data.xlsx (113.6 KB)
CompleteFreeFormVariables.xlsx (16.4 KB)
CompletePreSetVariables.xlsx (53.6 KB)
i’ve tried to do something, based on DataSet.xlsx.
i’ve used the unique values of description to join relative Q-values.
rearranging columns.knwf (415.9 KB)
Let me know how far i am
I will take a try at it. Thank you
Hi Luca: 1) whoa–your stuff is way over my skill level, 2) I have terrible guilt you worked this hard, 3) i fed the initial file reader my DataSet and ran everything down to the very last groupby node and need to define Aggregation before I can get beyond this point. Where we you going with this?–oh, and 4) THANK YOU–and I still feel terrible guilt you worked this hard.
that’s fine, you don’t really need an aggregation here. I’m just pulling out distinct values of descriptions, so i need only to group by this field.
Is the final table suitable for you?
As mlauber71 said, don’t worry, we are here to help and improve our skill in real business scenario
Yes, wonderful, very-very kind of you to do this!
The results then are the Loop End table, and I don’t need to do anything with the Gruopby or sophisticated Meta Node you supplied?
Thank You very much again!
It depend on how you want to see this data.
Using the groupby i’ve extracted all the possible descriptions you have and then joined with every Q (from 1 to 9), thus you have a description every row and several column with the Q value (the logic is inside the metanode).
Surely all that cascandin join can be managed much better, but i hope the result is what you are expecting to see.
YES, wonderful outcome. Again, thank you very much for your skills and effort.
This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.