[coreboot] global changes and checking programs

Mon Aug 11 17:18:42 CEST 2014

There's lots of really nice work going on with clang and other tools.
I just want to put out a word of caution, based on what I'm seeing.

A number of these tests are going in with global changes, no testing,
and in some examples, no clear improvement or bug fix, e.g. "remove
all blank lines at the end of a file", which can cause issues for
others to pick over later.

We all make mistakes. It is a given that some of these changes will
have errors. In many cases there is no testing because most people
only have one board, and they are changing many boards. Worse, they
are changing them at the behest of warnings from a new toolchain or
checker. I can't tell you how many issues we've had with toolchain
over the years; they've been frequent. Changes driven by toolchain
warnings worry me.

I was just involved in a port last week and one of the coreboot
utilities was always exiting via core dump. We don't know why yet as
it does not fail under gdb. But it's disconcerting to find a
formerly-working tool suddenly breaking.

So here are the things that can happen.

1. code change for N boards, test none, break 1 or more or N
2. code change for N boards, test one, break 2 or more or N
3. cosmetic change for N boards, create a bug due to a later merge issue
    (latest greatest example: changing 'haven't' to 'have not' caused
a bad merge)
4. code change for a tool, fixes no real problem, creates a future
problem due to incorrect
     coding -- but the audit tool reports no errors (this means that
in part the audit tool
     has errors)
      e.g.
      main(){
      char *s = malloc(x);
       lots of code some of which uses s
       point A:
        lots of code NONE of which uses s
       }
      cppcheck reports a non-freed variable. Somebody puts a free(s) at point A:
      But they don't set s to NULL -- a common mistake.
       Result: s is pointing to freed memory, which is unquestionably
far worse than the
       leak cppcheck claims is there. cppcheck and poor practice induced a bug.
5. Add broken cpp guards which cause more troubles than they might solve
    due to "best practices" -- which IMHO they are not.

These are five types of errors I've seen in the last week, there are
certainly more.

Just please be very careful. Code check tools are no replacement for
careful code changing.
We've had lots of situations where people broke boards or tools by
"improving" code.

The time delay from a coreboot change to someone testing a board can
be long. It's no fun to spend time debugging an ugly problem arising
from a misguided attempt to clean up code.

To give you some idea of what a reasonable test infrastructure looks
like, see this:
http://build.chromium.org/p/chromiumos/waterfall?reload=60

For every coreboot patch, these tests run on real hardware. Every one.
It's why I'm so willing to trust the chromeos upstreaming: if I'm
familiar with the CL, and it looks the same as it did when I wrote or
reviewed it, and jenkins is happy, it's been through a good testing
cycle.

It's very hard to do that for all coreboot boards, however. So,
please, take care.

ron