On Sun, Feb 21, 2010 at 8:48 PM, Bao, Zheng Zheng.Bao@amd.com wrote:
Index: src/lib/memcpy.c
--- src/lib/memcpy.c (revision 5133) +++ src/lib/memcpy.c (working copy) @@ -3,10 +3,14 @@ { const char *src = vsrc; char *dest = vdest;
int i;
for (i = 0; i < (int)bytes; i++)
dest[i] = src[i];
- __asm__ __volatile__ ( \
- "cld \n\t" \
- "rep \n\t" \
- "movsb" \
- : /* No output */ \
- : "S"(src), "D"(dest), "c"(bytes) \
- );
return vdest; } Index: src/lib/memset.c =================================================================== --- src/lib/memset.c (revision 5133) +++ src/lib/memset.c (working copy) @@ -2,11 +2,15 @@
void *memset(void *s, int c, size_t n) {
- int i;
char *ss = (char *) s;
- for (i = 0; i < (int)n; i++)
- ss[i] = c;
- __asm__ __volatile__ ( \
- "cld\n\t" \
- "rep\n\t" \
- "stosb" \
- : \
- : "a"(c), "D"(ss), "c"(n) \
- );
return s; }
I'm glad this works, but I am afraid I have a concern about it.
I've become opposed to inline assembly on several principles in the last few years:
- we've seen one problem after another as the manner in which __asm__ is supposed to be used has varied as gcc changes. we've also had code that had errors in __asm__ which nobody noticed.
- I'd like to understand why the C code is so much worse. I can't see a good reason for C to be so bad at this simple task. For example, with the memset, what if we just created a memzero function instead, which used a constant '0' instead of a parameter: does the code improve?
- I now feel a .c file should contain C. If we need a .s, let's create a .s.
So, first, can we please have a look at the .s produced by the C code and see if we can understand why it is so slow?
Thanks
Ron