Structuring data from web scraping

Hi,
as a result of web scraping I have a (simplified) dataset like this . . .

The problem is that the source data string does not always contain all the variables. So I’m looking for a way to structure the data as in the target picture . . .

I’ve looked at several nodes such as Lag Column, Chunk Loop Start/End, String Manipulation etc., but did not succeed.

Hopefully you can give me some hints . . . the simplified data is attached: KNIME_structuring_data.knwf (6.5 KB)

Hello @sanderlenselink,

and welcome to KNIME Community!

I would try the following approach:

  • use the Unpivoting node twice, in parallel: once to get all types into a single column, and once to get all values into a single column
  • then simply append those two tables and pivot
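
For readers who think in code rather than nodes, the same unpivot / append / pivot idea can be sketched in plain Python. This is only an illustration, not the attached workflow: the column names (`id`, `type1`, `value1`, ...) and the sample rows are made up, since the original screenshots are not reproduced here.

```python
# Hypothetical scraped rows: label/value pairs, and not every row has every variable.
rows = [
    {"id": 1, "type1": "Price", "value1": "100", "type2": "Color", "value2": "red"},
    {"id": 2, "type1": "Color", "value1": "blue"},
    {"id": 3, "type1": "Price", "value1": "80", "type2": "Size", "value2": "M"},
]


def unpivot(rows, prefix):
    """Return (id, position, cell) for every column whose name starts with prefix."""
    out = []
    for row in rows:
        for name, cell in row.items():
            if name.startswith(prefix):
                out.append((row["id"], name[len(prefix):], cell))
    return out


# Two unpivots in parallel: one for the types, one for the values.
types = unpivot(rows, "type")
values = unpivot(rows, "value")

# "Append" the two tables side by side, matching on (id, position).
value_lookup = {(rid, pos): cell for rid, pos, cell in values}
long_table = [(rid, t, value_lookup.get((rid, pos))) for rid, pos, t in types]

# Pivot: one row per id, one column per type, None where a variable is missing.
columns = sorted({t for _, t, _ in long_table})
pivoted = {}
for rid, t, v in long_table:
    pivoted.setdefault(rid, dict.fromkeys(columns))[t] = v

for rid, record in pivoted.items():
    print(rid, record)
```

Selecting columns by name prefix here plays roughly the role that a wildcard or regex column selection would play in the node configuration, so extra `typeN`/`valueN` columns are picked up without changing the code.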

I used regex, wildcard and type selection so that the workflow runs successfully regardless of the web scraping result. Here is an example workflow. Take a look, and if you have any questions feel free to ask.
KNIME_structuring_data_ipazin.knwf (14.2 KB)

Br,
Ivan


Hi Ivan,

this works really GREAT . . .
Of course my simplified example was too simple (as always), but you pushed me in the right direction.

Once again you confirmed that KNIME really is a fantastic tool with superb flexibility and features.

Again . . . THNX

-Sander


Hello Sander,

glad to hear that :slight_smile:

Br,
Ivan

Hi Ivan,

I have another problem that looks like the previous one, but I cannot find the right solution. Maybe someone has an idea.

See the example below . . . from web scraping I generate a string. Its structure is that the sequence of the years corresponds to the sequence of the values. Hopefully the example is clear.
My goal is a structured table . . . but how ???

Attached is my attempt at a workflow: SL_KNIME_structuring_data_making_table.knwf (16.0 KB)

Hello @sanderlenselink,

use Unpivoting to get everything into one column, then the Rule-based Row Splitter node to separate the rows with years from the rows with values, follow it up with an Appender, and finish with Pivoting.
SL_KNIME_structuring_data_making_table_ipazin.knwf (23.8 KB)
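
The same steps can be sketched in plain Python for anyone who wants to see the logic outside KNIME. This is only an illustration under assumptions: the sample cells are invented, and the regex used to decide "this cell is a year" is a guess at the kind of rule the splitter could use, not the rule from the attached workflow.

```python
import re

# Assumed rule: a cell is a year if it is exactly four digits starting with 19 or 20.
YEAR = re.compile(r"(19|20)\d{2}")


def structure(record_id, cells):
    """Split cells into years and values, pair them by position, return one table row."""
    years = [c for c in cells if YEAR.fullmatch(c)]
    values = [c for c in cells if not YEAR.fullmatch(c)]
    if len(years) != len(values):
        raise ValueError(f"{record_id}: {len(years)} years but {len(values)} values")
    # The n-th year belongs to the n-th value, so zip pairs them up.
    return {"id": record_id, **dict(zip(years, values))}


# Hypothetical scraped strings, already split into cells: years first, then values.
scraped = {
    "A": ["2018", "2019", "2020", "1.5", "2.0", "2.7"],
    "B": ["2019", "2020", "3.1", "3.4"],
}

table = [structure(rid, cells) for rid, cells in scraped.items()]
for row in table:
    print(row)
```

Note that a rule like this breaks if a value itself looks like a year (for example a value of 2000), so in that case splitting by position (first half years, second half values) would be the safer choice.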

Br,
Ivan


Hi Ivan,
your solution is exactly what I wanted. :+1: :wave:

I didn’t know the “Rule-based Row Splitter” node. Very powerful.

This forum is really excellent and helpful: smart solutions, good discussions and very fast responses.
1000x THNX

-Sander

