Moving every second string in a column to new row

Hi all,

Just started using KNIME and managed to solve a lot of issues except for the below:

The data relates to football games and gives data for both teams, each in a separate row. Example in row 1: Kyoto Sanga played Albirex Niigata on 28 Oct 2023 and the game ended 0 - 1.

Now the issue is as follows: in the column “xG”, it shows the xG for Albirex (0.87) in row1 and the xG of Kyoto Sanga in row 2 (1.14). This repeats for all other games thereafter.

I would like to move these xG’s to one row and two columns: ‘Home xG’ and ‘Away xG’ so that I have all the relevant data for one football match in one row like this:

Row ID | Date | HomeTeam | AwayTeam | HomeGoal | AwayGoal | HomexG | AwayxG|
Row1 | 2023-10-28 | Kyoto Sanga | Albirex | 0 | 1 | 1.14 | 0.87

Cause of how the data is provided, the second row for a match does not necessarily correspond to the away team, as such I would have to, in Excel language, index match the xG to the respective team in column “Team”.

I hope the above is clear enough, can gladly share the file if needed.

Hi @eksldpf .

Welcome to the KNIME forum.

I think you can use a lag column

Followed by a duplicate row filter

Br

1 Like

Hi @eksldpf welcome to the KNIME community and it was good to see a problem clearly explained!

In effect, you can think of your one table consisting of data that is “uniquely keyed” by Date and Home Team, so any two rows with the same Date and Home Team are in fact one row that is split across two.

One way to approach this is as follows:

Use a Joiner node, and feed the table to both the upper and lower ports. Join the rows based on Date and Home Team. For every two rows in your original table you will end up with the product (2x2 = 4 rows).

Then use a rule based row filter or rule based row splitter to retain only those rows where the HomeTeam = Team and the Team<> Team (right):

NOT $Team$ = $Team (right)$ AND $HomeTeam$ = $Team$ => TRUE

this will then give you back one row out of each four (see example below)

You should then have single rows with all the columns of data you need. Use a column filter to keep the columns that you require. You could alternatively filter the required columns within the Joiner node.

Sample data

Effects of joiner,


Blue marks the rows that relate to a single game
Red marks the specific data that will be used by the rule based row splitter to identify the row that is to be kept:

Effects of Rule Based Row Filter

Here is an example:
Filtering and joining rows from table.knwf (12.4 KB)

3 Likes

If you do not require all columns then groupby, aggregate list and then split columns might be an option as well.
br

Great! This helped out a lot, thanks for that!

One final issue that I am dealing with is something I overlooked in my original post. The data I have comes from various identical Excel files, each for a different team in the league. The “Team” is therefore equal to the team of the Excel file. As such, for each match I’ll end up with two rows, one for each Team.

I managed to fix this, sort of, but now I have one final hurdle to cross:

For each match I have the data of both teams in one row, whereby all data items to the right of “Team” and “Team (Right”) relate to that specific team. However it is not ordered based on Home / Away, but instead it is based on which file was loaded earlier.

If I now want to create two new columns: xG Home and xG Away, what would be the easiest way to allocate the correct xG to each team? In Excel you could fix this with a SUMIF whereby Team should be equal to HomeTeam if not → xG (Right), but i’d rather do everything in KNIME.

You can create “SUMIF” in KNIME e.g. with Column Expression node. (Be aware of the java script syntax there)
br

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.