[flashrom] [PATCH 4/4] layout: Better file parsing.

Carl-Daniel Hailfinger c-d.hailfinger.devel.2006 at gmx.net
Tue Sep 17 11:08:07 CEST 2013

Am 16.09.2013 16:36 schrieb Stefan Tauner:
> On Mon, 16 Sep 2013 09:48:35 +0200
> Carl-Daniel Hailfinger <c-d.hailfinger.devel.2006 at gmx.net> wrote:
>
>> so I can decide how to proceed with the review.]
> It would have been very nice if we would have discussed the parts below
> before I did implement them. The only reason why I have been jumping
> from hoop to hoop was to make migration as easy as possible to reduce
> possible reasons to nak the whole thing. Stefan Reinauer actually
> said that he thinks we could just ignore the previous layout
> specification completely because no one uses it anyway... so I don't
> have problems reducing backward compatibility at all... it would just
> have been easier if I would have known your stance earlier.

I am very sorry for the misunderstanding.
For me "new layout file" always implied that we'd drop support for the
old style (hard switchover), like we did with UI changes in the past.

>> Am 12.09.2013 22:40 schrieb Stefan Tauner:
>>> A complete rewrite of the layout file parser:
>>>  - safe (unlike before)
>>>  - supports including/sourcing of other layout files (relative to the including
>>>    file as well as via absolute paths)
>>>  - checks for duplicate region names (breaks inclusion loops too)
>>>  - can handle region names and layout file names with spaces correctly
>>>  - way better diagnostics
>>>  - no more character arrays in struct romentry_t but pointers only
>>>
>>> Signed-off-by: Stefan Tauner <stefan.tauner at student.tuwien.ac.at>
>>
>> Can we change the first line slightly to
>>
>> # flashrom layout v2
>>
>> and also mandate that region names may neither contain whitespace nor
> Why no whitespace...?

Well, I'm absolutely OK with whitespace (but not linebreaking
whitespace), but I'd like to have a consistent way to specify region
names. Either always quoted (and then whitespace is not an issue) or
never quoted (then whitespace is impossible).

>> I'd also like to drop support for v1 layouts as part of the switchover.
>> There is a conversion script and thus no reason to support the old
>> interface any more. The command line invocation changed anyway, so
>> nobody will be able to reuse old commands, and then there's no point to
>> reuse old layout files. This should reduce the amount of code needed.
> You dont expect old layouts be used in new version, but you expect old
> versions be used with new layout files? That does not make a lot of
> sense. (Dropping v1 support is ok with me, although IIRC that is not so
> much of a problem... it is already there and we could drop it later
> too).

I expect people to use an old flashrom version _by accident_ (i.e. they
run the distro-shipped /usr/bin/flashrom instead of ./flashrom) and I'd
like to make sure new layout files are rejected by old code.

>>>  flashrom.8.tmpl |  50 +++++---
>>>  layout.c        | 348 +++++++++++++++++++++++++++++++++++++++++++++++---------
>>>  2 files changed, 331 insertions(+), 67 deletions(-)
>>>
>>> diff --git a/flashrom.8.tmpl b/flashrom.8.tmpl
>>> index b1371e4..8793e91 100644
>>> --- a/flashrom.8.tmpl
>>> +++ b/flashrom.8.tmpl
>>> @@ -106,23 +106,46 @@ size for the flash bus.
>>>  .BR <file> .
>>>  .sp
>>> -A layout entry describes an address region of the flash chip and gives it a name. This allows to access certain
>>> -parts of the flash chip only. A layout file can contain multiple entries one per line with the following syntax:
>>> +A layout file is useful for interacting with parts of a flash chip only. It can contain comments which are
>>> +started by a pound sign (#) and causes all following characters on the same line to be ignored with one
>>> +exception: Every layout file
>>> +.B must
>>> +start with a comment in the following form to define the version of the file. Example for the current version 2:
>>> +.sp
>>> +.B "  # flashrom layout 2"
>>> +.sp
>>> +Every other line may contain either a command to include another layout file, a layout entry or white space
>>> +(and an optional comment).
>>> +.sp
>>> +To include another file you can use the
>>> +.sp
>>> +.B "  source <path>"
>>> +.sp
>>> +syntax where
>>> +.B path
>>> +specifies the absolute or relative path to the file to be included. Relative paths are interpreted to be
>>> +relative to the file containing the include command. If the path contains spaces it has to be written in double
>>> +quotes, or else only the part before the first space will be used.
>> Ouch! Can we please either enforce double-quotes everywhere or nowhere?
> I don't see the problem... but again this was mainly driven to mimic
> the old behavior as much as possible, but nevertheless allow for new
> features.

This is our chance to kill all the features which make our layout
parsing code complicated/buggy. We're breaking backward compatibility
anyway, so why not make it easier for future maintainers?

>>> +.sp
>>> +Each layout entry describes an address region of the flash chip and gives it a name. One entry per line is
>>> +allowed with the following syntax:
>>>  .sp
>>>  .sp
>>> -are hexadecimal addresses within the ROM file representing the flash ROM contents.
>>> -numbers is not necessary, but you can't specify decimal/octal numbers.
>>> -.BR "regionname " "is an arbitrary name for the region from "
>>> +are addresses within the ROM file representing the flash ROM contents. They are interpreted in the 'usual' form
>>> +i.e.\ a leading 0 means octal, leading 0x or 0X means hexadecimal, everything else is just decimal.
>>
>> Octal support is a really bad idea IMHO. Especially when people try to
> Then they should use hex values. I dont see a reason to change this.
> The format is rather common and understood by most technical people.
> And this is not a feature for every user.

OK.

>>> +.BR "regionname " "is the name for the region from " "startaddr " "to " "endaddr " "(both addresses included)."
>>> +If the name contains spaces it has to be written in double quotes, or else only the part before the first space
>>> +will be used.
>>>  .sp
>>>  Example content of file rom.layout:
>>>  .sp
>>> -  00000000:00008fff gfxrom
>>> -  00009000:0003ffff normal
>>> -  00040000:0007ffff fallback
>>> +  # flashrom layout 2
>>> +  source /home/flashrom/include.layout
>>> +  0x00000000:0x00008fff "gfx rom"
>>> +  0x00009000:0x0003ffff normal
>>> +  0x00040000:0x0007ffff fallback
>>>  .sp
>>>  .TP
>>>  .B "\-i, \-\-image <name>[:<file>]"
>>> @@ -136,10 +159,11 @@ parameter specifies the name of an image file that is used to map the contents o
>>>  This is useful for example if you want to write only a part of the flash chip and leave everything else alone
>>>  without preparing an image of the complete chip manually:
>>>  .sp
>>> -If you only want to update the image named
>>> -.BR "normal " "and " "fallback " "in a ROM based on the layout mentioned above, run"
>>> +If you only want to update the regions named
>>> +.BR "normal " "and " "gfx rom " "in a ROM based on the layout mentioned above, run"
>>>  .sp
>>> -.B "  flashrom \-p prog \-l rom.layout \-i normal -i fallback \-w some.rom"
>>> +
>>> +.B "  flashrom \-p prog \-l rom.layout \-i normal -i ""gfx rom"" \-w some.rom"
>>>  .sp
>>>  .RB "Overlapping sections are resolved in an implementation-dependent manner - do " "not " "rely on it."
>>>  .TP
>>
>> BIG FAT NOTE:
>> The review below is of a patch which had the deleted lines stripped out
>> to improve readability (the patch was almost unreadable due to code moves).
>>
>>
>>> diff --git a/layout.c b/layout.c
>>> index 20fe303..5b4bee4 100644
>>> --- a/layout.c
>>> +++ b/layout.c
>>> @@ -22,9 +22,11 @@
>>>  #include <stdio.h>
>>>  #include <stdlib.h>
>>>  #include <string.h>
>>> +#include <ctype.h>
>>>  #include <limits.h>
>>>  #include <errno.h>
>>> +#include <libgen.h>
>> Not available on Windows AFAICS.
> We dont support windows. we support mingw ;) and that has it since 2009
> or so.

My bad, I retract my objection.

>>>  #include <fcntl.h>
>>>  #include <sys/stat.h>
>>>  #endif
>>> @@ -32,12 +34,15 @@
>>>  #include "programmer.h"
>>>
>>>  #define MAX_ROMLAYOUT	32
>>> +#define MAX_ENTRY_LEN	1024
>>> +#define WHITESPACE_CHARS " \t"
>>> +#define INCLUDE_INSTR "source"
>>>
>>>  typedef struct {
>>>  	unsigned int start;
>>>  	unsigned int end;
>>>  	unsigned int included;
>>> -	char name[256];
>>> +	char *name;
>>>  	char *file;
>>>  } romentry_t;
>>>
>>> @@ -50,59 +55,307 @@ static int num_rom_entries = 0; /* the number of valid rom_entries */
>>>  static char *include_args[MAX_ROMLAYOUT];
>>>  static int num_include_args = 0; /* the number of valid include_args. */
>>>
>>> +/* returns the index of the entry (or a negative value if it is not found) */
>>> +static int find_romentry(char *name)
>>>  {
>>>  	int i;
>>> +	msg_gspew("Looking for region \"%s\"... ", name);
>>> +	for (i = 0; i < num_rom_entries; i++) {
>>> +		if (strcmp(rom_entries[i].name, name) == 0) {
>>> +			msg_gspew("found.\n");
>>> +			return i;
>>> +		}
>>> +	}
>>> +	return -1;
>>> +}
>>>
>>> +/* FIXME: While libpayload has no file I/O itself, code using libflashrom could still provide layout information
>>> + * obtained by other means like user input or by fetching it from somewhere else. Therefore the parsing code
>>> + * should be separated from the file reading code eventually. */
>>> +/** Parse a \em possibly quoted string.
>>> + *
>>> + * The function expects \a startp to point to the string to be parsed without leading white space. If the
>>> + * string starts with a quotation mark it looks for the second one and removes both in-place, if not then it
>>> + * looks for the first white space character and uses that as delimiter of the wanted string.
>>> + * After returning \a startp will point to a string that is either the original string with the quotation marks
>>> + * removed, or the first word of that string before a white space.
>>> + * If \a endp is not NULL it will be set to point to the first character after the parsed string, which might
>>> + * well be the '\0' at the end of the string pointed to by \a startp if there are no more characters.
>>> + *
>>> + * @param start	Points to the input string.
>>> + * @param end	Is set to the first char following the input string.
>>> + * @return	0 on success, 1 if the parsed string is empty or a quotation mark is not matched.

Should we use -1 as error return here?

>>> + */
>>> +static int unquote_string(char **startp, char **endp)
>> If we disallow spaces in region names and if we make include file names
>> either always-quoted or never-quoted, this complete function can die.
> Sure, but if that's the only argument then I would rather keep it.
> What's so problematic about it?

AFAICS your code also handles fun stuff like
"foo bar"baz
I'm totally fine with requiring quotes at the beginning and the end and
nowhere else. That should simplyfy the code a bit.

> If it ever gets in our way just drop it
> in a possible version 3 :) (half-jokingly)

I do expect our understanding of layout files to change over time, and
maybe even change before the next release.

>>> + * @param	entry If not NULL fill it with the parsed data, else just detect errors and print diagnostics.
>>> + * @return	1 on error,
>>> + *		0 if the line could be parsed into a layout entry succesfully,
>>> + *		-1 if a file was successfully sourced.
>> Very odd choice of return codes. Negative values are usually associated
>> with errors.
> Would the reversed order be ok with you?

Yes, definitely.

Regards,
Carl-Daniel

--
http://www.hailfinger.org/