[coreboot] [patch]: memcpy/memset inline asm & config_compress disabled when needed

Bao, Zheng Zheng.Bao at amd.com
Tue Feb 23 02:45:20 CET 2010


I think the memcpy/memset and decompressing are slow because of the Icache. Only one instruction executes repeatedly in the asm code. It doesn't have to access the instructions in the ROM every time. The memcpy/memset are easy to narrow down to a single instruction. But ulzma() can not. So I don't like my patch either. Does anybody have the idea to finally solve this problem?

Zheng

-----Original Message-----
From: ron minnich [mailto:rminnich at gmail.com] 
Sent: Tuesday, February 23, 2010 12:29 AM
To: Bao, Zheng
Cc: coreboot at coreboot.org
Subject: Re: [coreboot] [patch]: memcpy/memset inline asm & config_compress disabled when needed

On Sun, Feb 21, 2010 at 8:48 PM, Bao, Zheng <Zheng.Bao at amd.com> wrote:


>
> Index: src/lib/memcpy.c
> ===================================================================
> --- src/lib/memcpy.c    (revision 5133)
> +++ src/lib/memcpy.c    (working copy)
> @@ -3,10 +3,14 @@
>  {
>        const char *src = vsrc;
>        char *dest = vdest;
> -       int i;
>
> -       for (i = 0; i < (int)bytes; i++)
> -               dest[i] = src[i];
> +       __asm__ __volatile__ (                          \
> +               "cld \n\t"                              \
> +               "rep \n\t"                              \
> +               "movsb"                                 \
> +               :               /* No output */         \
> +               : "S"(src), "D"(dest), "c"(bytes)       \
> +               );
>
>        return vdest;
>  }
> Index: src/lib/memset.c
> ===================================================================
> --- src/lib/memset.c    (revision 5133)
> +++ src/lib/memset.c    (working copy)
> @@ -2,11 +2,15 @@
>
>  void *memset(void *s, int c, size_t n)
>  {
> -       int i;
>        char *ss = (char *) s;
>
> -       for (i = 0; i < (int)n; i++)
> -               ss[i] = c;
> +       __asm__ __volatile__ (                  \
> +               "cld\n\t"                       \
> +               "rep\n\t"                       \
> +               "stosb"                         \
> +               :                               \
> +               : "a"(c), "D"(ss), "c"(n)       \
> +               );
>
>        return s;
>  }

I'm glad this works, but I am afraid I have a concern about it.

I've become opposed to inline assembly on several principles in the
last few years:

- we've seen one problem after another as the manner in which __asm__
is supposed to be used has varied as gcc changes.
  we've also had code that had errors in __asm__ which nobody noticed.

- I'd like to understand why the C code is so much worse. I can't see
a good reason for C to be so bad at this simple task.
For example, with the memset, what if we just created a memzero
function instead, which used a constant '0' instead of a parameter:
does the code improve?

- I now feel a .c file should contain C. If we need a .s, let's create a .s.

So, first, can we please have a look at the .s produced by the C code
and see if we can understand why it is so slow?

Thanks

Ron






More information about the coreboot mailing list