On Wed, Jan 27, 2016 at 06:37:28PM +0100, Paolo Bonzini wrote:
On 27/01/2016 18:30, Kevin O'Connor wrote:
On Wed, Jan 27, 2016 at 06:19:43PM +0100, Paolo Bonzini wrote:
On 27/01/2016 18:15, Kevin O'Connor wrote:
Oh, I understand and agree that recovery isn't worth while. My concern is that a hardware error here will appear as a silent hang. Breaking out of the loop eventually, calling warn_timeout(), and returning an error code has the benefit of some debugging from seabios and likely some strong error messages from the calling app.
As I'm more interested in the debugging then the recovery, a simple addition like this would be an improvement IMO:
u32 end = calc_timeout(30000); // 30 second max timeout for (;;) { ... if (timer_check(end)) { warn_timeout(); return DISK_RET_ECONTROLLER; } usleep(50); }
I understood this to be your proposal. The problem is that I wouldn't be able to send any further requests later, because of the way the HBA is programmed.
I'm a bit confused. Without my proposal, the machine will silently busy loop forever and thus also wont be able to send any further requests.
The only difference, as I see it, is whether or not we report the problem.
The difference is that with this proposal you would definitely not be able to recover; without it, you'll get an error (good) but you'll also get an error if the timeout was spurious. In that case, if you do not have a timeout the behavior is a little better.
I think the best would be to handle all drivers in the same way, and configure the timeout at Kconfig time. What do you think? Are there drivers that do read/write timeouts?
Most of the SeaBIOS drivers do implement timeouts. I think the only ones that don't are lsi-scsi, pvscsi, and virtio.
The timeout should come from the hardware spec - a lot of specs do list a maximum transaction time. (It's also possible the SCSI standards may specify a maximum time.)
If there is no maximum time known, then I think using 30 seconds (which comes from the old PATA specs) is sufficient. The transaction size in SeaBIOS is limited to 64K, and if a drive can't read/write that in 30 seconds then I think it's fair to assume something is seriously broken and that the drive wont ever respond.
-Kevin