Simple function call versus __fastcall

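The __fastcall calling convention specifies that arguments to functions are passed in registers when possible. With the 32-bit Microsoft compiler, the first two arguments that fit into a register go into ecx and edx, and any remaining arguments are pushed onto the stack. The following listing compares an ordinary function with a __fastcall one.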

int simpleFunction(int a, int b, int c, int d){
	return a+b+c+d;
}
 
int __fastcall simpleFastCallFunction(int a, int b, int c, int d){
	return a+b+c+d;
}
 
int main(int argc, char* argv[])
{
	int resultNormal, resultFast;
	resultNormal = simpleFunction(5,6,7,8);
	resultFast = simpleFastCallFunction(5,6,7,8);
	return resultNormal+resultFast;
}

The result in IDA Pro (non-optimized)

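The code above was compiled with optimization switched off. For simpleFunction, the caller pushes all four arguments onto the stack before the call and removes them afterwards with add esp, 10h. For simpleFastCallFunction, by comparison, only the last two arguments (7 and 8) are pushed onto the stack; the first two (5 and 6) are passed in the registers ecx and edx, and there is no add esp after the call.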

...
.text:00401040 push    ebp
.text:00401041 mov     ebp, esp
.text:00401043 sub     esp, 8
.text:00401046 push    8               ; int
.text:00401048 push    7               ; int
.text:0040104A push    6               ; int
.text:0040104C push    5               ; int
.text:0040104E call    ?simpleFunction@@YAHHHHH@Z ; simpleFunction(int,int,int,int)
.text:00401053 add     esp, 10h
.text:00401056 mov     [ebp+var_4], eax
.text:00401059 push    8               ; int
.text:0040105B push    7               ; int
.text:0040105D mov     edx, 6          ; int
.text:00401062 mov     ecx, 5          ; int
.text:00401067 call    ?simpleFastCallFunction@@YIHHHHH@Z ; simpleFastCallFunction(int,int,int,int)
.text:0040106C mov     [ebp+var_8], eax
.text:0040106F mov     eax, [ebp+var_4]
.text:00401072 add     eax, [ebp+var_8]
.text:00401075 mov     esp, ebp
.text:00401077 pop     ebp
.text:00401078 retn
.text:00401078 _main endp
...
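
As a rough sketch of the argument placement seen in the listing above, the following C++ models where each argument of a 32-bit __fastcall call ends up and how many bytes the caller of an ordinary (cdecl) function has to clean up. It assumes every argument is int-sized. The names ArgLocation, fastcallLocation and cdeclCallerCleanupBytes are made up for this illustration and are not part of any compiler or IDA Pro API.

```cpp
#include <cstddef>

// Possible locations of an int-sized argument under 32-bit MSVC __fastcall.
enum class ArgLocation { Ecx, Edx, Stack };

// index is the zero-based position of the argument in the parameter list.
// Assumption: every argument is pointer-sized or smaller.
constexpr ArgLocation fastcallLocation(std::size_t index) {
    return index == 0 ? ArgLocation::Ecx
         : index == 1 ? ArgLocation::Edx
                      : ArgLocation::Stack;
}

// Bytes the caller removes from the stack after a cdecl call
// (the operand of "add esp, ..."), assuming 4-byte arguments.
constexpr std::size_t cdeclCallerCleanupBytes(std::size_t argCount) {
    return argCount * 4;
}

// Checks against the non-optimized listing:
// simpleFastCallFunction(5,6,7,8): 5 in ecx, 6 in edx, 7 and 8 pushed.
static_assert(fastcallLocation(0) == ArgLocation::Ecx, "first argument in ecx");
static_assert(fastcallLocation(1) == ArgLocation::Edx, "second argument in edx");
static_assert(fastcallLocation(2) == ArgLocation::Stack, "third argument pushed");
static_assert(fastcallLocation(3) == ArgLocation::Stack, "fourth argument pushed");
// simpleFunction(5,6,7,8): four pushes, then add esp, 10h.
static_assert(cdeclCallerCleanupBytes(4) == 0x10, "matches add esp, 10h");
```

The static_assert lines are evaluated at compile time, so the file only compiles if the model agrees with the listing.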

The result in IDA Pro (optimized compile, maximize speed)

To see how the Microsoft compiler (I used Visual Studio 2008) optimizes this code, we enable code optimization and set it to “maximize speed”. The following snippet shows the result. Because all arguments are constants known at compile time, the compiler evaluates both calls itself and reduces main to loading the final result, 34h (52 = 26 + 26), into eax. Compare the number of assembly lines above with the lines below and you can see why the code is faster.

...
.text:00401000 ; int __cdecl main(int argc, const char **argv, const char **envp)
.text:00401000 _main proc near
.text:00401000 mov     eax, 34h
.text:00401005 retn
.text:00401005 _main endp
...
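
The constant 34h can be checked by hand. The sketch below redoes the arithmetic at compile time with constexpr stand-ins for the two functions. This is only an illustration of the calculation, not a description of how the Visual Studio 2008 optimizer works internally. __fastcall is left out because the calling convention does not change the computed value, and foldedResult is a hypothetical helper name.

```cpp
// Portable stand-ins for the two functions from the source listing.
constexpr int simpleFunction(int a, int b, int c, int d) {
    return a + b + c + d;
}

constexpr int simpleFastCallFunction(int a, int b, int c, int d) {
    return a + b + c + d;
}

// The value main returns: resultNormal + resultFast.
constexpr int foldedResult() {
    return simpleFunction(5, 6, 7, 8) + simpleFastCallFunction(5, 6, 7, 8);
}

// 5+6+7+8 = 26 for each call, 26 + 26 = 52 = 34h,
// the immediate in "mov eax, 34h".
static_assert(simpleFunction(5, 6, 7, 8) == 26, "one call yields 26");
static_assert(foldedResult() == 0x34, "main returns 34h");
```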

Conclusion

There are a few things to look for when identifying __fastcall in a disassembly. The function uses the ecx (or ecx and edx) registers without loading them with explicit values beforehand. This typically indicates that they are being used as argument registers, as with __fastcall. The caller does not clean any arguments off the stack (there is no add esp instruction after the call), because with __fastcall the callee always cleans the arguments. If there are two or fewer arguments that are pointer-sized or smaller, __fastcall has no stack arguments and the function ends with a plain ret instruction (no stack displacement operand). If there are three or more such arguments, the function ends with a retn (args-2)*4 instruction, which cleans the stack arguments off the stack as it returns.
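
The return rule can be written down as a small formula. The following sketch computes the operand to expect on the callee's retn, assuming all arguments are pointer-sized (4 bytes) or smaller. fastcallRetImmediate is a hypothetical helper name. The value for simpleFastCallFunction is derived from the rule only, since the callee's disassembly is not shown in this post.

```cpp
#include <cstddef>

// Operand of the callee's "retn" under 32-bit __fastcall.
// Assumption: every argument occupies at most 4 bytes.
// The first two arguments travel in ecx/edx and need no cleanup;
// each further argument occupies one 4-byte stack slot.
constexpr std::size_t fastcallRetImmediate(std::size_t argCount) {
    return argCount > 2 ? (argCount - 2) * 4 : 0;
}

// Zero means a plain "ret" with no operand.
static_assert(fastcallRetImmediate(0) == 0, "no arguments: plain ret");
static_assert(fastcallRetImmediate(2) == 0, "two register arguments: plain ret");
static_assert(fastcallRetImmediate(3) == 4, "one stack argument: retn 4");
// Four int arguments, as in simpleFastCallFunction: retn 8 expected.
static_assert(fastcallRetImmediate(4) == 8, "two stack arguments: retn 8");
```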