"Fixing" `1 << 31` (technically undefined behavior with known implementation-specific results)

List overview All Threads
Download

newer

older

Updated invitation: coreboot...

New Defects reported by Coverity...

Nico Huber

7 Jan 2021 7 Jan '21

10:42 p.m.

Hi coreboot fellows,

another patch that fixes a `1 << 31` to `1UL << 31` [1] reminded me that some people objected to such changes. I'm not sure if we ever draw a conclusion on the matter.

`1 << 31` is undefined behavior because the `1` can (and thus will) be represented as a (signed) `int` which is limited by 2^31-1 for all our targets. But we know very well that GCC (and I assume Clang too) do the right thing: Produce a value that when casted to an `unsigned int` is converted to 2^31.

So, it's wrong but not broken ;) and any suffix to the `1` makes it a bit harder to read.

What do you think? Should we allow such changes? Should we normalize on any style?

If we want to make it defined behavior, my personal preference would be `1u << 31`. Lower case because it's more distinct from the number part, and we actually don't need a long (that might even hide actual errors if we want a 32-bit limited value).

Cheers, Nico

PS. Happy new year btw. :D

[1] https://review.coreboot.org/c/coreboot/+/49076

Show replies by date

Matt DeVillier

7 Jan 7 Jan

11:19 p.m.

`1u << 31` is correct, clear, and easy to read/understand quickly. It has my vote.

On Thu, Jan 7, 2021 at 3:42 PM Nico Huber nico.h@gmx.de wrote:

...

Hi coreboot fellows,

another patch that fixes a `1 << 31` to `1UL << 31` [1] reminded me that some people objected to such changes. I'm not sure if we ever draw a conclusion on the matter.

`1 << 31` is undefined behavior because the `1` can (and thus will) be represented as a (signed) `int` which is limited by 2^31-1 for all our targets. But we know very well that GCC (and I assume Clang too) do the right thing: Produce a value that when casted to an `unsigned int` is converted to 2^31.

So, it's wrong but not broken ;) and any suffix to the `1` makes it a bit harder to read.

What do you think? Should we allow such changes? Should we normalize on any style?

If we want to make it defined behavior, my personal preference would be `1u << 31`. Lower case because it's more distinct from the number part, and we actually don't need a long (that might even hide actual errors if we want a 32-bit limited value).

Cheers, Nico

PS. Happy new year btw. :D

[1] https://review.coreboot.org/c/coreboot/+/49076 _______________________________________________ coreboot mailing list -- coreboot@coreboot.org To unsubscribe send an email to coreboot-leave@coreboot.org

Furquan Shaikh

11:24 p.m.

Would it make sense to use BIT(31)?

On Thu, Jan 7, 2021 at 2:20 PM Matt DeVillier matt.devillier@gmail.com wrote:

...

`1u << 31` is correct, clear, and easy to read/understand quickly. It has my vote.

On Thu, Jan 7, 2021 at 3:42 PM Nico Huber nico.h@gmx.de wrote:

...
Hi coreboot fellows,

another patch that fixes a `1 << 31` to `1UL << 31` [1] reminded me that some people objected to such changes. I'm not sure if we ever draw a conclusion on the matter.

`1 << 31` is undefined behavior because the `1` can (and thus will) be represented as a (signed) `int` which is limited by 2^31-1 for all our targets. But we know very well that GCC (and I assume Clang too) do the right thing: Produce a value that when casted to an `unsigned int` is converted to 2^31.

So, it's wrong but not broken ;) and any suffix to the `1` makes it a bit harder to read.

What do you think? Should we allow such changes? Should we normalize on any style?

If we want to make it defined behavior, my personal preference would be `1u << 31`. Lower case because it's more distinct from the number part, and we actually don't need a long (that might even hide actual errors if we want a 32-bit limited value).

Cheers, Nico

PS. Happy new year btw. :D

[1] https://review.coreboot.org/c/coreboot/+/49076 _______________________________________________ coreboot mailing list -- coreboot@coreboot.org To unsubscribe send an email to coreboot-leave@coreboot.org

coreboot mailing list -- coreboot@coreboot.org To unsubscribe send an email to coreboot-leave@coreboot.org

Tim Wawrzynczak

8 Jan 8 Jan

12:19 a.m.

I will admit that I am a big fan of the `BIT` macro as well, but that seems to not be as well-received, so if that's not an option, I would second `1u << 31`

On Thu, Jan 7, 2021 at 3:25 PM Furquan Shaikh furquan.m.shaikh@gmail.com wrote:

...

Would it make sense to use BIT(31)?

On Thu, Jan 7, 2021 at 2:20 PM Matt DeVillier matt.devillier@gmail.com wrote:

...
`1u << 31` is correct, clear, and easy to read/understand quickly. It has my vote.

On Thu, Jan 7, 2021 at 3:42 PM Nico Huber nico.h@gmx.de wrote:

...
Hi coreboot fellows,

another patch that fixes a `1 << 31` to `1UL << 31` [1] reminded me

that

...
...
some people objected to such changes. I'm not sure if we ever draw a conclusion on the matter.

`1 << 31` is undefined behavior because the `1` can (and thus will) be represented as a (signed) `int` which is limited by 2^31-1 for all our targets. But we know very well that GCC (and I assume Clang too) do the right thing: Produce a value that when casted to an `unsigned int` is converted to 2^31.

So, it's wrong but not broken ;) and any suffix to the `1` makes it a bit harder to read.

What do you think? Should we allow such changes? Should we normalize on any style?

If we want to make it defined behavior, my personal preference would be `1u << 31`. Lower case because it's more distinct from the number part, and we actually don't need a long (that might even hide actual errors if we want a 32-bit limited value).

Cheers, Nico

PS. Happy new year btw. :D

[1] https://review.coreboot.org/c/coreboot/+/49076 _______________________________________________ coreboot mailing list -- coreboot@coreboot.org To unsubscribe send an email to coreboot-leave@coreboot.org

coreboot mailing list -- coreboot@coreboot.org To unsubscribe send an email to coreboot-leave@coreboot.org

coreboot mailing list -- coreboot@coreboot.org To unsubscribe send an email to coreboot-leave@coreboot.org

Felix Held

12:51 a.m.

Hi,

While I find the BIT() macro to be much better than the BITx defines in edk2, I still prefer the non-macro form and at least in the subtrees I maintain I try to get rid of BIT() usage in new code that gets merged. Since BIT() can only be used for single bits and not for multiple bits and bitmasks, the header files become a mix of BIT() and more than one bit shifted by x bits, which i find inconsistent and try to avoid.

Regards, Felix

Peter Stuge

1:12 a.m.

Felix Held wrote:

...

While I find the BIT() macro to be much better than the BITx defines

Why?

I don't think it was invented by edk2, so edk2 using it shouldn't be held against the format. :)

...

header files become a mix of BIT() and more than one bit shifted by x bits, which i find inconsistent and try to avoid.

I don't mind those being "inconsistent" because they represent two different things; one is a single bit value, the other a multibit value.

There could of course be a multibit macro, but the benefit diminishes because the macro is neither shorter nor simpler than the expansion.

//Peter

Felix Held

1:27 a.m.

...

...
While I find the BIT() macro to be much better than the BITx defines

Why?

The BITx defines seem to be a rather redundant way of doing things to me.

...

I don't think it was invented by edk2, so edk2 using it shouldn't be held against the format. :)

Sure, but that's where I remember seeing those defines in coreboot [1].

...

I don't mind those being "inconsistent" because they represent two different things; one is a single bit value, the other a multibit value.

When you have a register with single bit values and multibit values which is a very common case, I prefer having both in the same style/format. At least for me this improves readability.

...

There could of course be a multibit macro, but the benefit diminishes because the macro is neither shorter nor simpler than the expansion.

Yeah, that would likely also be bad for readability.

Regards, Felix

[1] https://review.coreboot.org/plugins/gitiles/coreboot/+/refs/heads/master/src...

Harshit Sharma

1:35 a.m.

Hi,

Although, I agree that 1u << 31 is much better, I believe 1 << 31 is not wrong either as long as it is assigned to an 'unsigned int' as the compiler performs an implicit conversion from a lower data type to a higher data type ('int' to 'unsigned int' in this case). That's the reason the line "return *shadow_addr" at lib/asan.c:89 (that Shawn talked about in the previous thread) seems correct to me as 'unsigned char' is being implicitly converted to 'bool'.

Best, Harshit

On Thu, Jan 7, 2021 at 4:28 PM Felix Held felix-coreboot@felixheld.de wrote:

...

...
...
While I find the BIT() macro to be much better than the BITx defines

Why?

The BITx defines seem to be a rather redundant way of doing things to me.

...
I don't think it was invented by edk2, so edk2 using it shouldn't be held against the format. :)

Sure, but that's where I remember seeing those defines in coreboot [1].

...
I don't mind those being "inconsistent" because they represent two different things; one is a single bit value, the other a multibit value.

When you have a register with single bit values and multibit values which is a very common case, I prefer having both in the same style/format. At least for me this improves readability.

...
There could of course be a multibit macro, but the benefit diminishes because the macro is neither shorter nor simpler than the expansion.

Yeah, that would likely also be bad for readability.

Regards, Felix

[1]

https://review.coreboot.org/plugins/gitiles/coreboot/+/refs/heads/master/src... _______________________________________________ coreboot mailing list -- coreboot@coreboot.org To unsubscribe send an email to coreboot-leave@coreboot.org

Nico Huber

3:10 p.m.

Hi Harshit,

On 08.01.21 01:35, Harshit Sharma wrote:

...

Although, I agree that 1u << 31 is much better, I believe 1 << 31 is not wrong either as long as it is assigned to an 'unsigned int' as the compiler performs an implicit conversion from a lower data type to a higher data type ('int' to 'unsigned int' in this case).

you are right about the implicit conversion, but this happens too late. The actual error may happen before the conversion. I guess mostly because of the way signed numbers are handled in assembler, compilers usually produce code that results in -2^31 for `1 << 31`. This -2^31 then implicitly converted to an unsigned is 2^31 which is what we usually intend.

But the theoretical problem is what happens if the compiler produces code that does not result in -2^31 for the signed number. As the C standard implicitly says `1 << 31` is undefined, the compiler can basically do what it wants and would still be C compliant.

Nico

Harshit Sharma

9 Jan 9 Jan

9:12 p.m.

Thanks for your explanation. I do understand now how '1 << 31' could cause a potential problem.

On Fri, Jan 8, 2021 at 6:10 AM Nico Huber nico.h@gmx.de wrote:

...

Hi Harshit,

On 08.01.21 01:35, Harshit Sharma wrote:

...
Although, I agree that 1u << 31 is much better, I believe 1 << 31 is not wrong either as long as it is assigned to an 'unsigned int' as the

compiler

...
performs an implicit conversion from a lower data type to a higher data type ('int' to 'unsigned int' in this case).

you are right about the implicit conversion, but this happens too late. The actual error may happen before the conversion. I guess mostly because of the way signed numbers are handled in assembler, compilers usually produce code that results in -2^31 for `1 << 31`. This -2^31 then implicitly converted to an unsigned is 2^31 which is what we usually intend.

But the theoretical problem is what happens if the compiler produces code that does not result in -2^31 for the signed number. As the C standard implicitly says `1 << 31` is undefined, the compiler can basically do what it wants and would still be C compliant.

Nico

Nico Huber

8 Jan 8 Jan

2:40 p.m.

Hi Peter,

On 08.01.21 01:12, Peter Stuge wrote:

...

Felix Held wrote:

...
While I find the BIT() macro to be much better than the BITx defines

Why?

for me it's that the BITx defines provide no separation between the BIT and the number. Also, the all-caps BIT letters come close to decimal digits; e.g. try to read BIT118 without losing focus (yeah, that's an exaggerated example). BIT(118) would make the number more distinct and I believe 1 << 118 even more.

EDK code just makes this problem very visible with expressions like (BIT17|BIT8|BIT1) (yes, let's donate them some whitespace)

...

...
header files become a mix of BIT() and more than one bit shifted by x bits, which i find inconsistent and try to avoid.

I don't mind those being "inconsistent" because they represent two different things; one is a single bit value, the other a multibit value.

Just to clarify (I hope), I guess what Felix wants to avoid are single expressions that mix both, e.g. something like

something = BIT(13) | 3 << 3 | BIT(0);

And I agree, I would prefer to just use `1 << x` here.

Nico

Angel Pons

1:37 a.m.

Hi list,

Since register fields usually have more than one bit, I prefer to always use explicit shifts for consistency. The one case where I prefer the BIT() macro is to reduce the amount of nested braces when testing for individual bits in a mask. GCC encourages adding unnecessary braces for expressions such as `1 << x + 3`, so it's easy to end up with five levels of brace nesting in a single expression.

For silicon-specific code where portability isn't an issue (different hardware, different registers) and there's lots of hardware registers with multiple fields each, I prefer to use bitfield structs. The resulting code is rather verbose because it provides more information to both humans and compilers, so there can be less comments that can eventually become lies. This approach also prevents making certain kinds of mistakes, e.g. using the names from a different register or using too large constants in initializers, both of which will cause a build failure. This also avoids compiler woes when using bitwise negations in functions taking an AND-mask, where values get promoted to signed int and then need to be truncated to a smaller size unsigned type. Refer to https://review.coreboot.org/42134 for a more detailed explanation of the issue. When using bitfield structs, updating the value in a register field involves three discrete steps: read the register into a local variable, then assign the new value to the desired field (or fields), and finally write the updated value back. This is substantially more verbose, but it's also harder to make a dumb mistake such as forgetting to negate a mask.

Best regards, Angel

Peter Stuge

1:26 a.m.

Nico Huber wrote:

...

So, it's wrong but not broken

..yet

...

What do you think? Should we allow such changes?

I think yes.

...

Should we normalize on any style?

There is an argument to be made for allowing many different styles, but I think it would be wise to standardize.

...

my personal preference would be `1u << 31`.

That looks nicer, but based purely on my limited experience with C source code seems a lot less common than UL. Are there any relevant compatibility matters known, for the different suffixes?

I don't hate BIT(31) but I strongly prefer BIT31, which I use always because it's short and sweet; easy to type and read.

Macros also have the significant benefit of moving the definition into a single place, which for one prevents future omission mistakes.

//Peter

Nico Huber

2:58 p.m.

Hi,

On 08.01.21 01:26, Peter Stuge wrote:

...

Nico Huber wrote:

...
So, it's wrong but not broken

..yet

that's right. But I think chances are incredibly low that C compilers would change this behavior. Of all the things that I'm worried about that could change, this case -- shifting out of the range of a signed integer but still in range of an unsigned -- seems the least worrisome.

...

...
What do you think? Should we allow such changes?

I think yes.

...
Should we normalize on any style?

There is an argument to be made for allowing many different styles, but I think it would be wise to standardize.

...
my personal preference would be `1u << 31`.

That looks nicer, but based purely on my limited experience with C source code seems a lot less common than UL. Are there any relevant compatibility matters known, for the different suffixes?

Not for the suffixes but the types they represent. I would like to avoid `long` at all costs. Simply because we have LP64 and LP32 targets. That means `long` is not a single type for our code, it could be a 32 or a 64-bit type.

I think that `UL` is commonly used is just because it covers most cases of tools complaining (it's not necessarily the right choice in every instance, though).

Now that I looked, BIT() is defined using `ul`. If I'm not mistaken, this means you can use something like BIT(32) and the code would even pass all tools for an LP64 target. But if you use the same code for a 32-bit target later...

A quick fix could be to use `ull`. But it doesn't fix the underlying problem: The macro is hiding the type used. Encoding the type into the macro name could fix that, e.g. BIT32(), BIT64(), but I guess that would make it less useful for readability.

Nico

Julius Werner

3:20 a.m.

I'm someone who really doesn't like it. The U in 1U << 31 is unnecessary visual clutter that really makes the number harder to read (I guess 1u is slightly better but not by that much). I think we should define our code style first and foremost to make code easily written and easily read (with no ambiguity and no risk for issues). This isn't a "real" issue in any way -- both GCC and clang handle this the right way, and they even intentionally provide warning switches to allow this (-Wshift-overflow=1 vs -Wshift-overflow=2). So this is perfectly well supported and there's no indication it ever wouldn't in the future. The only thing that could be said against it is that it's not official standard C, and to that I would say that the official C standard is just dumb and incomplete in general. We already frequently use plenty of GCC extensions throughout coreboot, because the equivalent code can either not be written or not be written as cleanly or safely in fully standard-compliant C. In my book, cleanliness and readability always win against just doing what the standard says for the standard's sake.

I am okay with BIT(), I think that's a fair alternative and I wouldn't mind existing cases being rewritten if someone really wants to do that. But that still leaves the problem of 3 << 30 and things like that, so I don't think it's a full solution. (You can't really expand the macro to the multi-bit case without defining what's essentially just a wrapper macro for the shift operator, and at that point I don't think it makes sense anymore.)

One example of things that happen when you insist on 1U is that constants cannot be shared between C and assembly code anymore (because the assembler doesn't support the 1U syntax). If you wanna solve this you need to define a macro that either does or doesn't add the U based on whether it's being built for assembler or C (and I think even that doesn't work for inline assembly?), and then you have something like U(1) << 31 everywhere. You can see how that looks in the Trusted Firmware project which does it that way (e.g. https://review.trustedfirmware.org/plugins/gitiles/TF-A/trusted-firmware-a/+...)... if you ask me it's ugly, harder to read and unnecessarily confusing to new contributors.

(Also, slight tangent but maybe this is a good opportunity to shill the new DEFINE_BITFIELD/SET32_BITFIELDS() API we added a year ago: https://review.coreboot.org/plugins/gitiles/coreboot/+/refs/heads/master/src... ? By defining bit fields semantically you don't have to deal with raw shifts at all anymore, which is both safer and in my opinion more readable. It already has unsigned casts wrapped in the macros so there's no reason to add a U manually anywhere there anymore.)

1581

days inactive

1583

days old

coreboot@coreboot.org

14 comments

9 participants

tags (0)

participants (9)

Angel Pons
Felix Held
Furquan Shaikh
Harshit Sharma
Julius Werner
Matt DeVillier
Nico Huber
Peter Stuge
Tim Wawrzynczak