@takbb I need some help in KNIME. I have got a spreadsheet and want to create a Star Schema for my Power Bi reporting. I need to create Dimensions and Fact from this. What would be the approach. I mean I need to select columns, deduplicate data to create Dimensions
Can anyone help with this please?
You might be able to create a temporary H2 database in your workflow, then execute a SQL statement to get your schema. Not DB admin / schema are not really my area of expertise, but this would be my first direction.
Hi @sudhirkumra , a good quick summary of of the process of ânormalisationâ is described here by @Daniel_Weikert
This aligns with your comment âI need to select columns, deduplicate data to create Dimensionsâ
Iâd imagine that if you had a spreadsheet of department and employee info in a de-normalised formâŚ
Dept_Name |
Dept_Budget |
Employee_Name |
Start_Date |
---|---|---|---|
Accounts |
1000000 |
Alan |
2021-02-05 |
Accounts |
1000000 |
Belinda |
2016-03-02 |
Sales |
2000000 |
Claire |
2020-12-03 |
Marketting |
3000000 |
David |
2017-06-01 |
IT |
50 |
Elgin |
2024-02-27 |
IT |
50 |
Frances |
2016-09-05 |
âŚYou could put together the column filters, duplicate row filters and value lookups something like this:
(Iâve not uploaded a workflow as this as it is just a non-functional mockup)
So youâd filter out the columns related to Employees, and separately the columns related to departments. Remove duplicate, and create an id for each. Then back on the âFactâ table, you can apply Value Lookups to get the IDs for each employee and department, and then use column filter to remove the columns relating to Employee and Department other than the IDs
Employee_Name |
Start_Date |
Emp_ID |
---|---|---|
Alan |
2021-02-05 |
0 |
Belinda |
2016-03-02 |
1 |
Claire |
2020-12-03 |
2 |
David |
2017-06-01 |
3 |
Elgin |
2024-02-27 |
4 |
Frances |
2016-09-05 |
5 |
Dept_Name |
Dept_Budget |
Dept_ID |
---|---|---|
Accounts |
1000000 |
0 |
Sales |
2000000 |
1 |
Marketting |
3000000 |
2 |
IT |
50 |
3 |
and the relationship between departments and employees, which in this case represents the âFactâ table, although youâd normally have some additional information/columns besides the IDs:
Dept_Emp_ID |
Dept_ID |
Emp_ID |
---|---|---|
0 |
0 |
0 |
1 |
0 |
1 |
2 |
1 |
2 |
3 |
2 |
3 |
4 |
3 |
4 |
5 |
3 |
5 |
I donât know what form you specifically need the data to be for working with PowerBI, but this would effectively turn your data into a normalised ârelationalâ structure.
Is Math Formula like a Surrogate Key? I want a Number to be generated like 100, 101, 102,103, 104, 105 for each dimension How can I use a node to do this for me?
Not sure if this is what you are after, but once you have your items in a table in the desired order, you can create a column with the number format you described using Column Expressions: 100 + rowIndex()
This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.