Hello, I’m looking for a way to reshape data that is stacked vertically. Luckily, the data follows a consistent pattern: it repeats every 4 rows, and each of those 4 rows corresponds to one field (a column header in the final table). The data is stacked vertically because it comes from web data (HTML). On the web it appears as nice tables, but we need it laid out horizontally. I’m familiar with pivoting; where I’m stuck is how to loop over only 4 rows, label those 4 rows 1, 2, 3, 4, and then move on to the next 4.
id, html
1124, Tom
1124, Supervisor
1124, N/A
1124, N/A
1124, Jeff
1124, CEO
1124, 5%
1124, TextInfo_Texty
1199, Jill
1199, Owner
1199, 2%
1199, RandomTextInfo
1199, Sally
1199, Co-owner
1199, 2%
1199, TextInfo51FakeData
3324, Ashley
3324, Field Rep
3324, 1%
3324, TextInfo
3324, Isabel
3324, Marketing
3324, 1%
3324, TextInfo
About: id is a foreign key, and html is the final parsed data from a web-parsing process.
How: I’m using Python Selenium to get the HTML files (the extract-and-load step), and KNIME is the ETL (Extract, Transform, Load) prototyping tool, which connects to the HTML files in a directory.
Goal: I want to repeat 1, 2, 3, 4 for every 4 values. I think this will give me an easier path to the next node(s), which will pivot everything into a normalized table.
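To make the goal concrete, here is a minimal Python sketch of the labelling and pivot I have in mind. It is not a KNIME workflow, only an illustration of the logic. The function names (label_rows, pivot) and the extra "record" counter are hypothetical, and it assumes every record really does occupy exactly 4 consecutive rows, with the counter restarting for each id.

```python
from collections import defaultdict

# A few sample rows copied from the data above: (id, html)
rows = [
    (1124, "Tom"), (1124, "Supervisor"), (1124, "N/A"), (1124, "N/A"),
    (1124, "Jeff"), (1124, "CEO"), (1124, "5%"), (1124, "TextInfo_Texty"),
    (1199, "Jill"), (1199, "Owner"), (1199, "2%"), (1199, "RandomTextInfo"),
]

GROUP_SIZE = 4  # assumption: the pattern repeats every 4 rows


def label_rows(rows, group_size=GROUP_SIZE):
    """Attach (record, position) to each row.

    position cycles 1..group_size within each id;
    record counts the blocks of group_size rows within each id.
    """
    seen = defaultdict(int)  # number of rows already seen per id
    labelled = []
    for row_id, value in rows:
        n = seen[row_id]
        record = n // group_size + 1    # 1 for the first block, 2 for the next, ...
        position = n % group_size + 1   # 1, 2, 3, 4, 1, 2, 3, 4, ...
        labelled.append((row_id, record, position, value))
        seen[row_id] += 1
    return labelled


def pivot(labelled, group_size=GROUP_SIZE):
    """Turn each block of rows into one wide row, keyed by (id, record)."""
    wide = {}
    for row_id, record, position, value in labelled:
        slot = wide.setdefault((row_id, record), [None] * group_size)
        slot[position - 1] = value
    return [(row_id, *values) for (row_id, _record), values in wide.items()]


labelled = label_rows(rows)
table = pivot(labelled)
for line in table:
    print(line)
```

The position column alone is not enough to pivot on, because every id would then collapse into a single row. That is why the sketch also keeps a record counter, so each block of 4 becomes its own row in the wide table.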
Thanks everyone for the help/patience. Excited to see how you solve the problem.
Best regards,
Tyler