Failing inside a for loop with DataFrame

Atapalou · March 1, 2022, 1:14pm

Hello,
I create the DataFrame as a matrix:

import numpy as np
import pandas as pd

data_matrix = {'x0': [0, 2, 0],
               'x1': [2, 3, 1],
               'x3': [4, 4, 1],
               'x4': [6, 5, 1]}
data_mtrice = pd.DataFrame(data_matrix)
data_matrice = data_mtrice.T

Which gives:

data_matrice.head()
Out[30]: 
    0  1  2
x0  0  2  0
x1  2  3  1
x3  4  4  1
x4  6  5  1

Each column is a cumulative sum down the rows. For example, in column 0: 0, then 0+2 = 2, then 0+2+2 = 4, then 0+2+2+2 = 6.
I am looking for a function to de-cumulate, that is, to recover the increments.
I tried to write:

def decumule(tableau):
    decu_table = np.zeros(tableau.shape)
    for ligne, element in enumerate(tableau.iloc()):
        print("ligne = ",ligne)
        for colon, elem in enumerate(tableau.iloc[ligne]):
            if ligne > 0:
                print("colonn",colon)
                decu_table.iloc[[ligne, colon]] = tableau.iloc[[ligne, colon]] - tableau.iloc[[ligne - 1, colon]]
            else:
                 decu_table.iloc[[ligne, colon]] = tableau.iloc[[ligne, colon]]
    return decu_table

tentative = data_matrice.apply(lambda tableau: decumule(tableau))

which leads to

test10 = data_matrice.apply(lambda tableau: decumule(tableau))
ligne =  0
Traceback (most recent call last):

  File "C:\Users\David\AppData\Local\Temp/ipykernel_14608/2847870468.py", line 1, in <module>
    test10 = data_matrice.apply(lambda tableau: decumule(tableau))

  File "C:\Users\David\anaconda3\lib\site-packages\pandas\core\frame.py", line 8740, in apply
    return op.apply()

  File "C:\Users\David\anaconda3\lib\site-packages\pandas\core\apply.py", line 688, in apply
    return self.apply_standard()

  File "C:\Users\David\anaconda3\lib\site-packages\pandas\core\apply.py", line 812, in apply_standard
    results, res_index = self.apply_series_generator()

  File "C:\Users\David\anaconda3\lib\site-packages\pandas\core\apply.py", line 828, in apply_series_generator
    results[i] = self.f(v)

  File "C:\Users\David\AppData\Local\Temp/ipykernel_14608/2847870468.py", line 1, in <lambda>
    test10 = data_matrice.apply(lambda tableau: decumule(tableau))

  File "C:\Users\David\AppData\Local\Temp/ipykernel_14608/3921423307.py", line 5, in decumule
    for colon, elem in enumerate(tableau.iloc[ligne]):

TypeError: 'numpy.int64' object is not iterable

Do you have any idea what can go wrong?

Regards,
Atapalou

aworker · March 1, 2022, 1:25pm

Hello @Atapalou and welcome to the KNIME community forum!

A “cumulative” function is the same as a discrete integral function.
A “decumulative” function is the same as a discrete derivative function.
The pandas library has a discrete difference function for DataFrames:
DataFrame.diff(periods=1, axis=0)

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.diff.html

Have you considered using this function instead of implementing your own "decumulative" function?
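
A minimal sketch of that suggestion, assuming the data_matrice frame from the first post. Note that the first row of the result has no previous row to subtract, so pandas fills it with NaN:

```python
import pandas as pd

data_matrix = {'x0': [0, 2, 0],
               'x1': [2, 3, 1],
               'x3': [4, 4, 1],
               'x4': [6, 5, 1]}
data_matrice = pd.DataFrame(data_matrix).T

# Difference between each row and the previous one (axis=0 goes down the rows).
decumulated = data_matrice.diff(periods=1, axis=0)
print(decumulated)
```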

Maybe this helps.

See you soon,

Ael

Atapalou · March 8, 2022, 5:25pm

Careful: DataFrame.diff(periods=1, axis=0) creates NaN values in the first row.

One solution:
def decumule(tableau):
    decu = tableau.to_numpy().copy()
    print("range= ", range(decu.shape[0]-1, 0, -1))
    for ligne in range(decu.shape[0]-1, 0, -1):
        print("ligne=", ligne)
        decu[ligne] = decu[ligne] - decu[ligne - 1]
    return pd.DataFrame(decu)

tentative = decumule(data_matrice)
print(tentative)
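
The same result can probably be obtained without the explicit loop. The sketch below is an alternative, not the code above: it uses numpy.diff with prepend=0, and it also keeps the row and column labels, which is an added assumption about what is wanted.

```python
import numpy as np
import pandas as pd

data_matrice = pd.DataFrame({'x0': [0, 2, 0],
                             'x1': [2, 3, 1],
                             'x3': [4, 4, 1],
                             'x4': [6, 5, 1]}).T

# prepend=0 places a row of zeros before the first row, so the first
# difference is the first row itself and no NaN appears.
decu = np.diff(data_matrice.to_numpy(), axis=0, prepend=0)

# Keep the original labels (the loop version returns default 0..n-1 labels).
tentative = pd.DataFrame(decu, index=data_matrice.index, columns=data_matrice.columns)
print(tentative)
```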

aworker · March 8, 2022, 5:45pm

It is normal that it returns NaN, and that is how it should be implemented: it makes clear to the user that there are boundary effects when computing a discrete derivative. After that, you can handle it however you like, depending on your needs.
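
One way to handle it, sketched under the assumption that the first row should keep its original values (as in the loop solution above): fill the NaN row produced by diff from the original frame. The cast back to integers assumes integer input data.

```python
import pandas as pd

data_matrice = pd.DataFrame({'x0': [0, 2, 0],
                             'x1': [2, 3, 1],
                             'x3': [4, 4, 1],
                             'x4': [6, 5, 1]}).T

# diff() leaves NaN in the first row; fillna() with the original frame
# replaces only those NaN cells, matching on row and column labels.
restored = data_matrice.diff().fillna(data_matrice).astype(int)
print(restored)

# Sanity check: the cumulative sum of the result gives back the input.
print(restored.cumsum().equals(data_matrice.astype(int)))
```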

benjaminzx · July 13, 2022, 6:46am

First consider whether you really need to iterate over rows in a DataFrame. Iterating through pandas DataFrame objects is generally slow, and pandas iteration defeats the whole purpose of using a DataFrame. It is an anti-pattern and something you should do only when you have exhausted every other option. It is better to look for a vectorized solution, a list comprehension, or the DataFrame.apply() method to process a DataFrame.

Pandas DataFrame loop using list comprehension

result = [(x, y, z) for x, y, z in zip(df['Name'], df['Promoted'], df['Grade'])]

Pandas DataFrame loop using DataFrame.apply()

result = df.apply(lambda row: row["Name"] + " , " + str(row["TotalMarks"]) + " , " + row["Grade"], axis=1)
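
For comparison, here is a rough sketch of the vectorized route for the same string concatenation. The df below is hypothetical: its column names come from the example above, but the sample values are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data, only to make the example runnable.
df = pd.DataFrame({"Name": ["Ann", "Bob"],
                   "TotalMarks": [91, 78],
                   "Grade": ["A", "B"]})

# Row-wise version, as in the apply() example.
via_apply = df.apply(
    lambda row: row["Name"] + " , " + str(row["TotalMarks"]) + " , " + row["Grade"],
    axis=1,
)

# Vectorized version: operate on whole columns at once.
vectorized = df["Name"] + " , " + df["TotalMarks"].astype(str) + " , " + df["Grade"]

print(vectorized.tolist())
```

Both give the same strings; the column-wise form avoids calling a Python function once per row.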
