Strictly speaking, one should not normalize the data before splitting it into training and testing sets, but after splitting: normalizing on the full dataset leaks information from the test set into training and biases the results towards looking better than they are, especially with small datasets. The normalization parameters should therefore be computed on the training set and applied to the test set at every data partitioning.
The workflow hence needs to be rearranged so that a separate normalization is done at every step of the X-partitioning loop. The aim of X-partitioning, or Leave-K-Out (LKO), is to evaluate the quality of your machine learning approach as if K independent tests were run; the aim is not to generate a final model. It therefore does not matter that normalizations are computed separately, and hence differently, on the data partitions within the loop.
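To make the per-fold logic concrete, here is a minimal sketch in Python with scikit-learn (the data and model here are made up for illustration, not taken from your workflow): within each partitioning step, the scaler is fitted on the training fold only and then applied to the test fold.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Toy data, purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

fold_errors = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    # Fit the normalization on the training fold only...
    scaler = StandardScaler().fit(X[train_idx])
    model = LinearRegression().fit(scaler.transform(X[train_idx]), y[train_idx])
    # ...and apply the SAME training-fold statistics to the test fold
    pred = model.predict(scaler.transform(X[test_idx]))
    fold_errors.append(mean_squared_error(y[test_idx], pred))

print(np.mean(fold_errors))
```

Each fold gets its own normalization, which is fine for evaluation purposes, as explained above.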
If your model turns out to perform well on the LKO test, you can then train your final model on the whole dataset, with a single normalization computed on all the data, and share it or put it in production. But this is done only as the final step.
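For that final step, the normalization and the estimator are simply fitted together on all the data. A sketch of this, again in Python with scikit-learn and toy data (a Pipeline keeps the two fitted as one unit, which is convenient for deployment):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

# Toy data standing in for the full dataset
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5])

# Final model: normalization and estimator fitted together on ALL the data
final_model = make_pipeline(StandardScaler(), LinearRegression()).fit(X, y)
print(final_model.predict(X[:1]))
```

The fitted pipeline carries its own normalization statistics, so new samples are scaled consistently at prediction time.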
Concerning your second question: how did you set up your X-partitioning? If you are doing a Leave-One-Out partitioning (as the "Size of test set" column suggests), each test set contains a single row, so the total and the average squared errors are necessarily equal. I guess this is why.
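A tiny check of that point, sketched with made-up predictions: with one row per test set, the sum and the mean of the squared errors over that set are the same number.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut

# Hypothetical true values and predictions, one pair per sample
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.8, 3.3, 3.9])

for _, test_idx in LeaveOneOut().split(y_true.reshape(-1, 1)):
    sq = (y_true[test_idx] - y_pred[test_idx]) ** 2
    total, average = sq.sum(), sq.mean()
    # A single-row test set: summing and averaging give the same value
    print(total, average)
```

With K > 1 rows per test set the two would differ by a factor of K, so they would no longer coincide.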
Hope this helps,