I agree with most of the features of the proposed ABI.
Nevertheless, I believe that a modern ABI should take into account that C is no longer the only language it must serve well.
Some requirements of other languages are easy to accommodate. For example, languages that allow returning multiple values can return them in multiple registers starting with r0, exactly like the input arguments, instead of only in r0, like the single return value of C.
A much more important requirement of other languages is to allow the efficient implementation of procedures that make tail calls, e.g. tail-recursive procedures.
For this, the stack must be deallocated in the called procedure, not in the caller.
The one and only reason for the existence of the so-called C calling convention, where the caller deallocates the stack, is that it was a lazy solution to the (former) existence of lazy C programmers. They called vararg functions, e.g. printf, without including the header where the function's prototype was declared (or while including pre-standard headers, where vararg functions were not marked), so the compiler could never know whether an external function was vararg or not, and it had to assume that all of them were.
If such practices are prohibited, as they should be, then the right implementation of vararg functions is for the compiler to add an extra hidden parameter, e.g. the old value of the stack pointer, which allows the called procedure to deallocate the stack correctly.
In that case the ABI should specify that the callee must deallocate the stack. There is absolutely no advantage in deferring the deallocation until after the return.
Besides allowing efficient tail calls, this ABI rule would also reduce code size, because it replaces the multiple deallocation instructions in the callers with a single instruction in the callee.
I am aware that the defenders of the C calling convention have typically claimed that its code-size disadvantage is not so great, because the compiler may coalesce several deallocation instructions into a single one inside a caller that makes multiple procedure calls.
I do not agree with this claim. The main environment where C is still dominant, and where code size is also essential, is programs for embedded computers. However, that is also the environment where stack size is severely constrained, and deferring stack deallocation to reduce code size greatly increases the risk of stack overflow, so it is not an acceptable solution.