On 2013-08-15 14:01, Andrew Wu wrote:
> is 256-byte aligned; otherwise it falls back to a slower MOV loop. The MOV loop is much slower than REP MOVSL on the Vortex86EX, because reading instructions from ROM is slow.
Can't you cache the ROM area?
Other than that: optimizing the common code is better than adding a special case. memmove won't change much, so its interface is stable, but you never know what will happen to a special-case path around it (and few people will be able to test their changes on Vortex86).
Patrick
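The patch itself isn't shown here, and the quoted sentence is cut off before it says which address has to be 256-byte aligned. Below is a minimal sketch of the kind of dispatch being discussed. It assumes both source and destination are checked, and that the buffers don't overlap. The names `vortex_copy`, `is_aligned_256`, `copy_rep_movsl` and `copy_mov_loop` are hypothetical. The sketch doesn't model the ROM instruction-fetch cost, which is what makes the loop slow on the Vortex86EX.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if p sits on a 256-byte boundary. */
static int is_aligned_256(const void *p)
{
    return ((uintptr_t)p & 0xFF) == 0;
}

/* Slow path: one MOV per byte, executed as a loop of instructions. */
static void copy_mov_loop(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
}

/* Fast path: a single REP MOVSL copies n/4 dwords, then the tail bytes. */
static void copy_rep_movsl(void *dst, const void *src, size_t n)
{
#if defined(__i386__) || defined(__x86_64__)
    size_t dwords = n / 4;
    void *d = dst;
    const void *s = src;
    /* The ABI guarantees the direction flag is clear, so this copies forward. */
    __asm__ volatile ("rep movsl"
                      : "+D"(d), "+S"(s), "+c"(dwords)
                      :
                      : "memory");
    /* d and s now point just past the copied dwords. */
    copy_mov_loop(d, s, n % 4);
#else
    /* Not x86: no REP MOVSL, so use the plain loop. */
    copy_mov_loop(dst, src, n);
#endif
}

/* Hypothetical dispatch: REP MOVSL only when both buffers are 256-byte aligned. */
void vortex_copy(void *dst, const void *src, size_t n)
{
    if (is_aligned_256(dst) && is_aligned_256(src))
        copy_rep_movsl(dst, src, n);
    else
        copy_mov_loop(dst, src, n);
}
```

This sketch also shows Patrick's concern. The fast path only exists behind an alignment check. Any later change to the common copy code would have to keep that check and its fallback working, and that can only be tested on Vortex86 hardware.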