Problems using the GroupBy, Ungroup, and CSV Writer nodes in a loop

Hello,
I’m preprocessing a large dataset of hourly electricity consumption for many users. I want to group the data by user ID first, then sample 50 of the 13735 users, and finally write the data of those 50 users to 50 CSV files.

I have run into several problems. When I write the grouped data to CSV files, each output file contains only a single line with the user ID; none of the original electricity consumption data in the group is written. I tried to use the Ungroup node to get the original data back, but it doesn’t work.

In the loop, I tried to use a flow variable to control the file path of the CSV Writer. Do I need another loop for this, or can it all be done in one loop somehow? When I used two loops, the CSV Writer node showed the error “Can’t merge flow variable stacks! Likely a loop problem.” Please help. Thanks.

data_preprocess(2).knwf (33.3 KB)

Hi @jackcao53 ,

Welcome back!

You can use the Group Loop Start node to group by one or more columns. Each iteration of the loop then processes only the rows of one group, as defined in the node configuration.
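If it helps to think of it in code, the group loop idea can be sketched in plain Python: rows are bucketed by a key column, and the loop body runs once per bucket. This is only an analogy, not how KNIME implements the node; the tuple layout `(user_id, hour, kwh)`, the helper `group_rows`, and the sample rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical hourly readings: (user_id, hour, kwh)
rows = [
    ("u1", 0, 0.42), ("u2", 0, 1.10), ("u1", 1, 0.38),
    ("u3", 0, 0.95), ("u2", 1, 1.05), ("u1", 2, 0.40),
]

def group_rows(rows, key_index=0):
    """Bucket rows by the value in column key_index, like a group loop."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row)
    return dict(groups)

groups = group_rows(rows)

# Loop body: runs once per group and sees only that group's rows.
for user_id, user_rows in groups.items():
    print(user_id, len(user_rows))
```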

This is your workflow as it stands:

But you could set it up like this:


First load the data, then use a Group Loop Start node on a column such as “company name”.

Then, inside the loop, you can take a sample of that group’s data and pass it to a CSV Writer node.

In the end, you get a sample from every group you defined. Does that help? The workflow can be made as complex as you need.
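The per-group sampling described here can be sketched in plain Python as follows: within each group, take up to n random rows. A seeded `random.Random` keeps the result reproducible. The helper name `sample_within_groups` and the data are hypothetical. Note that every group appears in the result, which is why this approach samples rows from all users rather than selecting a subset of users.

```python
import random
from collections import defaultdict

# Hypothetical hourly readings: (user_id, hour, kwh)
rows = [
    ("u1", 0, 0.42), ("u2", 0, 1.10), ("u1", 1, 0.38),
    ("u3", 0, 0.95), ("u2", 1, 1.05), ("u1", 2, 0.40),
]

def sample_within_groups(rows, n, seed=0, key_index=0):
    """Return up to n random rows from every group, keyed by group value."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row)
    # Groups smaller than n are kept whole instead of raising an error.
    return {key: rng.sample(g, min(n, len(g))) for key, g in groups.items()}

samples = sample_within_groups(rows, n=2)
```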

Tks,

Denis


Dear @denisfi,
Thanks for your reply. Your suggestion works, and the original data is written to the CSV file. However, this is not exactly what I want. As you can see in Figure 1, there are 13735 groups, and each group contains the data for one customer, as shown in Figure 2. The Group Loop Start node puts each of the 13735 users into its own group, so this loop takes 50 data samples from each of the 13735 users. What I want instead is to take 50 random users out of the 13735 (50 groups out of 13735 groups) and keep all the data for each of them. How can I do that? Furthermore, how can I set the file paths so that each file is named after one of the 50 selected customer IDs?


Thanks,
Jack

Ah, OK @jackcao53.

So you need to split the workflow into two branches first and then join them again to keep only the sampled user IDs.

As you can see above, I made a working example with dummy data. The data source is a Table Creator node, and I sort it by user ID first, just so the output is easier to inspect later.

Then I created two branches. One of them extracts the unique user IDs and takes the sample you described: from more than 300 IDs, it randomly selects 50 and builds a file path string for each one. For context, the path has the form “knime workspace\filename.userid.csv” and is built with the join function of the String Manipulation node.
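The sampling branch can be sketched in plain Python: reduce the rows to distinct user IDs, draw k of them at random without replacement, and build one output path per chosen ID. The helper `sample_user_paths`, the `output` directory, and the `<user_id>.csv` file name pattern are hypothetical stand-ins, not the exact path used in the workflow.

```python
import os
import random

# Hypothetical user ID column, one entry per data row
user_ids = ["u1", "u2", "u1", "u3", "u2", "u4", "u5"]

def sample_user_paths(user_ids, k, out_dir="output", seed=0):
    """Pick k distinct user IDs at random and build one CSV path per ID."""
    # Distinct IDs (like a GroupBy on user ID); sorted so the seed gives
    # the same draw on every run regardless of set ordering.
    unique_ids = sorted(set(user_ids))
    rng = random.Random(seed)
    chosen = rng.sample(unique_ids, k)  # ValueError if k > number of IDs
    return {uid: os.path.join(out_dir, f"{uid}.csv") for uid in chosen}

paths = sample_user_paths(user_ids, k=3)
```

With the real data, `k` would be 50 and `user_ids` would hold the 13735 distinct users.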

After that, a Joiner node keeps only the rows whose user ID matches one of the sampled IDs. OK so far?
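The Joiner step behaves like an inner join on user ID: rows whose ID is not in the sample table are dropped, and matching rows pick up the columns of the sample table (here, the file path). A minimal Python sketch, assuming the sample table is a hypothetical dict from user ID to path:

```python
# Hypothetical readings (user_id, hour, kwh) and sampled-ID -> path table
rows = [
    ("u1", 0, 0.42), ("u2", 0, 1.10), ("u1", 1, 0.38),
    ("u3", 0, 0.95), ("u2", 1, 1.05), ("u1", 2, 0.40),
]
paths = {"u1": "output/u1.csv", "u3": "output/u3.csv"}

def inner_join_paths(rows, paths, key_index=0):
    """Keep rows whose user ID appears in paths and append the matching path."""
    return [row + (paths[row[key_index]],) for row in rows if row[key_index] in paths]

joined = inner_join_paths(rows, paths)
```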

From this point on, I have all the data for the sampled users only, and I use a Group Loop Start node to write one file per user ID.

The String to Path node sits inside the loop. It converts the path string into a path, which is then used as a flow variable while the loop processes each user ID.

The result is 50 CSV files, each containing all the data for one sampled user ID, which is what you wanted.
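The write loop can be sketched in plain Python too: group the filtered rows by user ID and, in each iteration, build the path from the current ID and write that user's rows. This plays the role of the per-iteration path flow variable, but it is an illustration rather than the KNIME mechanism; the helper `write_per_user`, the header names, and the temporary output directory are hypothetical.

```python
import csv
import os
import tempfile
from collections import defaultdict

# Hypothetical rows that survived the join: (user_id, hour, kwh)
rows = [
    ("u1", 0, 0.42), ("u3", 0, 0.95), ("u1", 1, 0.38), ("u1", 2, 0.40),
]

def write_per_user(rows, out_dir, key_index=0, header=("user_id", "hour", "kwh")):
    """Write each user's rows to <out_dir>/<user_id>.csv and return the paths."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row)
    written = []
    for user_id, user_rows in groups.items():  # one iteration per user
        path = os.path.join(out_dir, f"{user_id}.csv")  # path built from the ID
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(user_rows)
        written.append(path)
    return written

out_dir = tempfile.mkdtemp()
written = write_per_user(rows, out_dir)
```

Because the path is derived from the group key inside the same loop, only one loop is needed.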

File with this flow:

data_preprocess(2)-denis.knwf (234.8 KB)

I hope this helps. Good luck!

Tks,

Denis


Thank you @denisfi, this is exactly what I want.
Regards,
Jack

OK, I’m happy to hear that. If your question is resolved, could you mark the topic as solved?

Tks,

Denis
