On 29.11.2008 11:53, Stefan Reinauer wrote:
Carl-Daniel Hailfinger wrote:
I think I can fix that serialization problem,
Aha? Mind sharing the insight?
Sure. You found the bug. Really. Remember what you wrote earlier? Let me quote you:
The problem is that we end up with a valid option_table.o file (not truncated) but it has no symbols and no data in there. The file is 900-something bytes, which is exactly the size of an ELF object created by: touch foo.c; gcc -c foo.c -o foo.o
And that's exactly what was happening. How?
Look at how option_table.c and option_table.o are generated. util/options/build_opt_tbl.c is doing that, and it seems to be doing it in a way that confuses make. How can creating a simple file confuse make? GCC does it all the time. The answer is that gcc does it differently.

Make depends entirely on timestamps. It also assumes that if a file is present, it is usable. (Reread the last sentence, it is important.) The only way to make sure that a file is usable the instant it is created (to avoid race conditions) is to require that creating and writing the whole file be one atomic operation. There is no way to do that directly with the standard fopen/fwrite/fclose.

There is one way out: use an atomic operation to make the whole file available. Rename is atomic. Create (open), write and close a file with a temporary unique name, and after that is done, rename it to the file you wanted to create in the first place. gcc does it that way and make is happy. build_opt_tbl creates the files directly and has a HUGE race between fopen and fclose. We're hitting that race condition.
How do we solve it? Two ways are possible:
1. Fix build_opt_tbl.c to use fopen/fwrite/fclose/rename.
2. Perform that logic in the makefile.
Fixing build_opt_tbl.c is IMHO the preferred course because it avoids hacks in makefiles.
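To illustrate the first option, here is a minimal sketch of the write-to-temporary-then-rename pattern. It is not taken from build_opt_tbl.c: the helper name write_file_atomically and the ".tmp.XXXXXX" suffix are made up for this example, and it assumes a POSIX system where mkstemp() is available and the temporary file lives in the same directory as the target (so rename() stays on one filesystem).

```c
#define _POSIX_C_SOURCE 200809L  /* for mkstemp() and fdopen() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from build_opt_tbl.c): write len bytes to a
 * uniquely named temporary file next to path, then rename() it into place.
 * Other processes see either no file (or the old one) or the complete new
 * one, never a half-written file. Returns 0 on success, -1 on failure.
 */
int write_file_atomically(const char *path, const void *data, size_t len)
{
    static const char suffix[] = ".tmp.XXXXXX"; /* assumed naming scheme */

    assert(path != NULL && data != NULL);

    /* sizeof(suffix) already counts the terminating NUL. */
    size_t tmplen = strlen(path) + sizeof(suffix);
    char *tmpname = malloc(tmplen);
    if (!tmpname)
        return -1;
    snprintf(tmpname, tmplen, "%s%s", path, suffix);

    /* mkstemp() creates and opens a unique file (mode 0600). */
    int fd = mkstemp(tmpname);
    if (fd < 0) {
        free(tmpname);
        return -1;
    }

    FILE *f = fdopen(fd, "wb");
    if (!f) {
        close(fd);
        unlink(tmpname);
        free(tmpname);
        return -1;
    }

    int ok = (fwrite(data, 1, len, f) == len);
    /* fclose() flushes buffered data, so its result must be checked too. */
    if (fclose(f) != 0)
        ok = 0;

    /* Only a completely written file is ever given the final name. */
    if (ok && rename(tmpname, path) != 0)
        ok = 0;

    /* On any failure, do not leave the temporary file behind. */
    if (!ok)
        unlink(tmpname);

    free(tmpname);
    return ok ? 0 : -1;
}
```

A caller would build the file contents in memory and then do something like write_file_atomically("option_table.c", buf, buflen). Note that mkstemp() creates the file with mode 0600, so a real tool would probably want to adjust the permissions before the rename.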
I'd provide a patch, but my right hand is injured and I am typing with my left hand only (no worries ;-)).
but it seems r3777 makes the situation a bit better than it was before. That alone is an improvement. Stefan, you are right about the time/result tradeoff. If the failure rate stays low enough, we might want to leave your fix in place and simply accept the occasional failure.
Oh, we do want to leave my fix in place, even if you come up with another fix on top of that.
Absolutely. (Technically, the problem will be fixed completely by fixing build_opt_tbl.c even if r3777 is reverted.) Improving make rules for better readability and less overhead is something I'd call a requirement as well, so r3777 is a keeper.
Having two rules for the same set of files is quite unhealthy.
Yes, that's true. We had a similar problem some time back in v2. v3 also has them, but they are not showing up yet. I should resend my v3 dependency fix RFC, maybe it gathers more interest this time.
Regards, Carl-Daniel