The dec/jnz pair operates faster then a loopsz for several reasons. First, dec and jnz pair up in the different modules of the netburst pipeline, so they can be executed simultaneously. Top that off with the fact that dec and jnz both require few cycles to execute, while the loopnz (and all the loop instructions, for that matter) instruction takes more cycles to complete. loop instructions are rarely seen output by good compilers.
Loop Unwinding
When a loop needs to run for a small, but definite number of iterations, it is often better to unwind the loop in order to reduce the number of jump instructions performed, and in many cases prevent the processor’s branch predictor from failing.