Tag: ‘optimization’

Why does the dec/jne combo operate faster than the equivalent loopnz?

The dec/jnz pair (jne and jnz are two mnemonics for the same instruction) runs faster than loopnz for several reasons. First, dec and jnz are handled by different units of the NetBurst pipeline, so they can be executed simultaneously. On top of that, dec and jnz each take only a few cycles to execute, while loopnz (and all the loop instructions, for that matter) takes more cycles to complete. This is why good compilers rarely emit loop instructions.


Manual Optimization

The following lines of assembly code are not optimized, but they can be optimized very easily. Can you find a way to optimize these lines?


Duff’s device

The famous “Duff’s device” in C makes use of the fact that a case label is still legal inside a nested block of its enclosing switch statement. Tom Duff used this to write an optimized output loop. Duff’s device is an optimized implementation of a serial copy that applies loop unrolling (also called loop unwinding), a technique widely used in assembly language. It is perhaps the most dramatic use of case label fall-through in the C programming language to date.
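As a rough illustration of the idea, here is a minimal sketch of a Duff's-device-style copy. It is not Duff's original code: the function name `duff_copy` is made up for this example, and unlike the original output loop it advances the destination pointer so that it behaves like an ordinary array copy. The switch jumps into the middle of the unrolled do-while body to handle the leftover `count % 8` elements, then the loop runs full passes of eight copies each.

```c
#include <stddef.h>

/* Hypothetical helper (name chosen for this sketch): copy `count` shorts
   from `from` to `to`, with the loop body unrolled eight times. */
void duff_copy(short *to, const short *from, size_t count)
{
    if (count == 0)
        return;                  /* the do-while body would otherwise run once */

    size_t n = (count + 7) / 8;  /* number of passes through the do-while */

    switch (count % 8) {         /* jump into the loop body for the remainder */
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```

Calling `duff_copy(dst, src, 13)`, for instance, enters at `case 5`, copies five elements, and then makes one more full pass of eight. Whether this beats a plain loop on a modern compiler is something to measure rather than assume.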
