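The __fastcall calling convention specifies that arguments to functions are passed in registers when possible. On 32-bit x86 with the Microsoft compiler, the first two arguments that fit into a DWORD are passed in ecx and edx, and any remaining arguments are pushed onto the stack from right to left. The following listing shows a small test program that uses this calling convention.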
    int simpleFunction(int a, int b, int c, int d){
        return a+b+c+d;
    }

    int __fastcall simpleFastCallFunction(int a, int b, int c, int d){
        return a+b+c+d;
    }

    int main(int argc, char* argv[])
    {
        int resultNormal, resultFast;
        resultNormal = simpleFunction(5,6,7,8);
        resultFast = simpleFastCallFunction(5,6,7,8);
        return resultNormal+resultFast;
    }
The result in IDA Pro (non-optimized)
The code above was compiled with optimization switched off. For simpleFunction, the caller pushes all four arguments onto the stack before the call and removes them afterwards with add esp, 10h. For simpleFastCallFunction, by comparison, the caller pushes only the last two arguments (7 and 8) onto the stack and passes the first two (5 and 6) in the registers ecx and edx.
    ...
    .text:00401040                 push    ebp
    .text:00401041                 mov     ebp, esp
    .text:00401043                 sub     esp, 8
    .text:00401046                 push    8               ; int
    .text:00401048                 push    7               ; int
    .text:0040104A                 push    6               ; int
    .text:0040104C                 push    5               ; int
    .text:0040104E                 call    ?simpleFunction@@YAHHHHH@Z ; simpleFunction(int,int,int,int)
    .text:00401053                 add     esp, 10h
    .text:00401056                 mov     [ebp+var_4], eax
    .text:00401059                 push    8               ; int
    .text:0040105B                 push    7               ; int
    .text:0040105D                 mov     edx, 6          ; int
    .text:00401062                 mov     ecx, 5          ; int
    .text:00401067                 call    ?simpleFastCallFunction@@YIHHHHH@Z ; simpleFastCallFunction(int,int,int,int)
    .text:0040106C                 mov     [ebp+var_8], eax
    .text:0040106F                 mov     eax, [ebp+var_4]
    .text:00401072                 add     eax, [ebp+var_8]
    .text:00401075                 mov     esp, ebp
    .text:00401077                 pop     ebp
    .text:00401078                 retn
    .text:00401078 _main           endp
    ...
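The register/stack split visible in the listing can be written down as a tiny model. This is only a sketch, assuming 32-bit x86 and arguments that are all int-sized; the helper names fastcallPlacement and fastcallPushOrder are made up for illustration and are not part of any toolchain.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model, not a toolchain API: for a call with `argCount`
// int-sized arguments, report where an x86 __fastcall caller puts each one.
std::vector<std::string> fastcallPlacement(int argCount) {
    assert(argCount >= 0);  // a negative argument count makes no sense
    std::vector<std::string> where;
    for (int i = 0; i < argCount; ++i) {
        if (i == 0) {
            where.push_back("ecx");    // first argument
        } else if (i == 1) {
            where.push_back("edx");    // second argument
        } else {
            where.push_back("stack");  // everything else is pushed
        }
    }
    return where;
}

// Zero-based indices of the stack arguments in the order the caller pushes
// them (right to left, so the last argument is pushed first).
std::vector<int> fastcallPushOrder(int argCount) {
    assert(argCount >= 0);
    std::vector<int> order;
    for (int i = argCount - 1; i >= 2; --i) {
        order.push_back(i);
    }
    return order;
}
```

For the four-argument call above, fastcallPlacement(4) gives ecx, edx, stack, stack, and fastcallPushOrder(4) gives the indices 3 and 2, which corresponds to push 8 followed by push 7 in the disassembly.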
The result in IDA Pro (optimized compile, maximize speed)
To see how the MS compiler (I used Visual Studio 2008) optimizes code, we enable code optimization and set it to “maximize speed”. The following snippet shows how the compiler improves application speed: it first analyzes the functions and then compiles the optimized result! Because all arguments are constants, both calls are evaluated at compile time, so main simply returns 34h (52 = 26 + 26). Just compare the number of assembly lines above with the listing below and you can guess why the code is faster…
    ...
    .text:00401000 ; int __cdecl main(int argc, const char **argv, const char **envp)
    .text:00401000 _main           proc near
    .text:00401000                 mov     eax, 34h
    .text:00401005                 retn
    .text:00401005 _main           endp
    ...
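The constant 34h can be checked by hand with a modern compiler. The sketch below does not show what the optimizer does internally; it only uses C++11 constexpr (which Visual Studio 2008 does not support) to have the same arithmetic evaluated during compilation. The names sum4 and foldedResult are invented for this illustration.

```cpp
// Same arithmetic as simpleFunction, marked constexpr so that it may be
// evaluated during compilation (requires C++11 or later).
constexpr int sum4(int a, int b, int c, int d) {
    return a + b + c + d;
}

// Mirrors main(): two calls with the constants 5, 6, 7, 8, added together.
constexpr int foldedResult() {
    return sum4(5, 6, 7, 8) + sum4(5, 6, 7, 8);
}

// Compilation fails if the folded value is not 34h (26 + 26 = 52).
static_assert(foldedResult() == 0x34, "expected 26 + 26 = 52 = 34h");
```

If the static_assert compiles, the value was computed without running any code, which is the same end result the optimized main above shows with its single mov eax, 34h.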
Conclusion
The main thing to look for with __fastcall is use of the ecx register (or ecx and edx) inside a function without loading them with explicit values beforehand. This typically indicates that they are being used as argument registers, as with __fastcall. The caller does not clean any arguments off the stack (there is no add esp instruction after the call); with __fastcall, the callee always cleans the arguments. If there are two or fewer arguments that are pointer-sized or smaller, __fastcall has no stack arguments and the function ends with a plain ret instruction (no stack displacement argument). If there are three or more arguments, the stack arguments must be cleaned off by the callee, so the function ends with a retn (args-2)*4 instruction; for the four-argument simpleFastCallFunction above that would be retn 8.
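The cleanup rule can be turned into a quick calculation to check against a disassembly. This is a minimal sketch, assuming 32-bit x86 and that every argument is pointer-sized (4 bytes) or smaller; fastcallRetnBytes and cdeclCallerCleanupBytes are hypothetical helper names, not functions from any real tool.

```cpp
// Bytes the callee removes with `retn N` at the end of a __fastcall function.
// The first two arguments travel in ecx/edx, so only the rest occupy stack.
constexpr int fastcallRetnBytes(int argCount) {
    return argCount > 2 ? (argCount - 2) * 4 : 0;
}

// Bytes the caller removes with `add esp, N` after a __cdecl call,
// where every argument is pushed onto the stack.
constexpr int cdeclCallerCleanupBytes(int argCount) {
    return argCount * 4;
}

// Four int arguments: __fastcall callee pops 8 bytes, __cdecl caller pops 10h.
static_assert(fastcallRetnBytes(4) == 8, "retn 8");
static_assert(cdeclCallerCleanupBytes(4) == 0x10, "add esp, 10h");
// Two or fewer arguments: plain ret, nothing on the stack to clean.
static_assert(fastcallRetnBytes(2) == 0, "plain ret");
```

The 10h from the second helper matches the add esp, 10h after the call to simpleFunction in the non-optimized listing, while the call to simpleFastCallFunction has no such instruction.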
OK, but how is that optimized? It loses reusability and its usefulness. That's like cutting the arms and legs off someone to ‘roll’ them down a hill… And would it ‘optimize’ it like that if the parameters were not static?
Excellent work nonetheless :)
It is reusable. The compiler will only optimise using information it has available. Any parameter which cannot be resolved to a constant at compile time will result in a calculation taking place. The actual calculation though may not be a+b+c+d! It depends on how many parameters turn out to be const; maybe all, some or none.
The function call will only take place if it cannot be inlined (either because optimisation is off or for other reasons, such as a virtual function).
This is why it is often best these days to let the compiler optimise first before resorting to assembly.
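To make the point about non-constant parameters concrete, here is a small sketch in which only one argument is unknown at compile time. The function names are invented for this example, and what an optimizer actually emits depends on the compiler and its settings; the comments only describe what it is permitted to do.

```cpp
#include <cassert>

// Same arithmetic as simpleFunction from the article.
inline int sum4(int a, int b, int c, int d) {
    return a + b + c + d;
}

// Only `a` is unknown here. An optimizer may inline sum4 and fold the
// constant part (6 + 7 + 8 = 21), leaving something like a + 21, but the
// addition involving `a` still has to happen at run time.
int sumWithRuntimeArg(int a) {
    return sum4(a, 6, 7, 8);
}

// Quick self-check of the arithmetic; callable from any main().
void checkSums() {
    assert(sumWithRuntimeArg(5) == 26);
    assert(sumWithRuntimeArg(0) == 21);
}
```

The function stays fully reusable: calling sumWithRuntimeArg with a value read from the command line or a file still gives the right sum, it just cannot be reduced to a single constant the way the all-constant main in the article was.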