Opened 8 years ago

Closed 8 years ago

Last modified 7 years ago

#204 closed defect (wontfix)

Illegal request CDBs submitted to some models of Fujitsu SCSI disks

Reported by: koitsu2009
Owned by: somebody
Priority: major
Milestone:
Component: all
Version: 5.41
Keywords: solaris scsi fujitsu
Cc: Doug Gilbert

Description

Since smartmontools 5.41, running "smartctl -a" against a Fujitsu SCSI disk on our Solaris 9/10 systems appears to submit an invalid CDB to the underlying drive, which rejects the command with ILLEGAL REQUEST. "smartctl -x" induces two rejections. smartmontools 5.40 does not cause this behaviour.

The problem with the rejection is that the Solaris kernel logs it in such a way that it looks like a disk failure to our NOC, which results in tickets being opened to have disks replaced when in fact there's nothing wrong with the disk at all.

Relevant details:

# iostat -E -n
c0t0d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: FUJITSU  Product: MAW3147NC        Revision: 0104 Serial No:
Size: 147.09GB <147086327296 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

# smartctl -x /dev/rdsk/c0t0d0s0
smartctl 5.42 2011-10-20 r3458 [i386-pc-solaris2.10] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

Vendor:               FUJITSU
Product:              MAW3147NC
Revision:             0104
User Capacity:        147,086,327,808 bytes [147 GB]
Logical block size:   512 bytes
Serial number:        DAA0P7B05H9C
Device type:          disk
Transport protocol:   Parallel SCSI (SPI-4)
Local Time is:        Wed Nov  9 01:49:21 2011 PST
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

Current Drive Temperature:     26 C
Drive Trip Temperature:        65 C
Manufactured in week 46 of year 2007
Specified cycle count over device lifetime:  10000
Accumulated start-stop cycles:  19
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0       6208.306           0
write:         0        9         0         0          0      24562.295           0

Non-medium error count:       53

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Self test in progress ...   -     NOW                 - [-   -    -]
# 2  Background long   Self test in progress ...   -     NOW                 - [-   -    -]

Long (extended) Self Test duration: 3432 seconds [57.2 minutes]
Device does not support Background scan results logging
scsiPrintSasPhy Log Sense Failed [unsupported field in scsi command]

# iostat -E -n
c0t0d0           Soft Errors: 2 Hard Errors: 0 Transport Errors: 0
Vendor: FUJITSU  Product: MAW3147NC        Revision: 0104 Serial No:
Size: 147.09GB <147086327296 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 2 Predictive Failure Analysis: 0
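
As a rough illustration of how a monitoring script could tell this case apart from a real failure, here is a minimal Python sketch that parses one device block of "iostat -E -n" output as pasted above. The function names and the choice of which counters count as "serious" are assumptions made for this example, not something Solaris or smartmontools defines.

```python
import re

# Counter names as they appear in the iostat -E output quoted in this ticket.
COUNTERS = (
    "Soft Errors", "Hard Errors", "Transport Errors", "Media Error",
    "Device Not Ready", "No Device", "Recoverable", "Illegal Request",
    "Predictive Failure Analysis",
)

def parse_iostat_counters(text):
    """Return {counter name: value} for a single device block."""
    counters = {}
    for name in COUNTERS:
        match = re.search(re.escape(name) + r":\s*(\d+)", text)
        if match:
            counters[name] = int(match.group(1))
    return counters

def only_illegal_requests(counters):
    """True if the only non-zero activity is rejected commands.

    Which counters are treated as 'serious' is an assumption of this sketch.
    """
    serious = ("Hard Errors", "Transport Errors", "Media Error",
               "Predictive Failure Analysis")
    return (counters.get("Illegal Request", 0) > 0
            and all(counters.get(name, 0) == 0 for name in serious))

SAMPLE = """\
c0t0d0           Soft Errors: 2 Hard Errors: 0 Transport Errors: 0
Vendor: FUJITSU  Product: MAW3147NC        Revision: 0104 Serial No:
Size: 147.09GB <147086327296 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 2 Predictive Failure Analysis: 0
"""

counters = parse_iostat_counters(SAMPLE)
print(counters["Soft Errors"], counters["Illegal Request"])
print(only_illegal_requests(counters))
```

With the second iostat output from this report, the soft-error count is fully accounted for by the two illegal requests, which is the pattern a NOC filter would want to recognise before opening a replacement ticket.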

Errors on console, which I imagine will greatly help since they include the request CDB:

Nov  9 09:49:21 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci1022,7450@a/pci9005,ffff@a/sd@0,0 (sd0):       Error for Command: inquiry                 Error Level: Informational
Nov  9 09:49:21 scsi: [ID 107833 kern.notice]      Requested Block: 0                         Error Block: 0
Nov  9 09:49:21 scsi: [ID 107833 kern.notice]      Vendor: FUJITSU                            Serial Number: DAA0P7B05H9C
Nov  9 09:49:21 scsi: [ID 107833 kern.notice]      Sense Key: Illegal Request
Nov  9 09:49:21 scsi: [ID 107833 kern.notice]      ASC: 0x24 (invalid field in cdb), ASCQ: 0x0, FRU: 0x0

Nov  9 09:49:22 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci1022,7450@a/pci9005,ffff@a/sd@0,0 (sd0):       Error for Command: log sense(10)           Error Level: Informational
Nov  9 09:49:22 scsi: [ID 107833 kern.notice]      Requested Block: 0                         Error Block: 0
Nov  9 09:49:22 scsi: [ID 107833 kern.notice]      Vendor: FUJITSU                            Serial Number: DAA0P7B05H9C
Nov  9 09:49:22 scsi: [ID 107833 kern.notice]      Sense Key: Illegal Request
Nov  9 09:49:22 scsi: [ID 107833 kern.notice]      ASC: 0x24 (invalid field in cdb), ASCQ: 0x0, FRU: 0x0

I believe the "scsiPrintSasPhy Log Sense Failed" error can explain one of the illegal requests, but I'm not sure where the other one comes from.
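
For readers less familiar with SCSI sense data, the following Python sketch decodes the sense key and ASC/ASCQ pair from the console messages above. The tables hold only a few well-known standard values and are far from complete; the helper names are made up for this example.

```python
# Small subset of the standard SCSI sense keys.
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
}

# Small subset of additional sense code / qualifier pairs.
ASC_ASCQ = {
    (0x20, 0x00): "INVALID COMMAND OPERATION CODE",
    (0x24, 0x00): "INVALID FIELD IN CDB",
    (0x25, 0x00): "LOGICAL UNIT NOT SUPPORTED",
}

def describe_sense(sense_key, asc, ascq):
    """Return a human-readable description of a sense triple."""
    key_text = SENSE_KEYS.get(sense_key, "sense key 0x%x" % sense_key)
    asc_text = ASC_ASCQ.get((asc, ascq), "asc=0x%02x ascq=0x%02x" % (asc, ascq))
    return "%s: %s" % (key_text, asc_text)

def is_media_or_hardware_error(sense_key):
    """True for MEDIUM ERROR and HARDWARE ERROR only (policy assumed here)."""
    return sense_key in (0x3, 0x4)

# Values taken from both console messages: Sense Key Illegal Request, ASC 0x24, ASCQ 0x0.
print(describe_sense(0x5, 0x24, 0x00))
print(is_media_or_hardware_error(0x5))
```

Both logged errors decode to the drive refusing a field in the command, not to the drive reporting a medium or hardware problem.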

Pretty much all our Fujitsu disks behave like this -- more than just the model shown above (smaller models, etc.). If you need me to make a list of them all (for drivedb exclusions/quirks) I can do so.

Let me know how to proceed, I have lots of systems to test with. :-)

Change History (9)

comment:1 Changed 8 years ago by Christian Franke

Keywords: scsi added

comment:2 Changed 8 years ago by koitsu2009

Keywords: fujitsu added

I've spent some time tracking this one down, or at least part of it. This comment is focusing on the INQUIRY failure.

This bug was introduced in r3302 by dpgilbert (support for VPD):

https://sourceforge.net/changeset/3302

Reverting that commit solves the problem. Alternatively, "-q noserial" stops the VPD LUID query as well as the serial number query.

The problem with the VPD LUID query is that Fujitsu drives don't like type 0x83 for LUID lookup. Type 0x80 (serial number lookup) works fine. LUID is attempted first, then SERNO.

By using "-r scsiioctl" we can see debug data. The LUID lookup (0x83) fails as shown here:

 [inquiry: 12 01 83 00 c8 00 ]
  status=2: sense_key=5 asc=24 ascq=0
  dxfer_len=200, resid=200
Vital Product Data (VPD) INQUIRY failed [3]
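
To make the bytes in that trace easier to read, here is a small Python sketch that builds the 6-byte INQUIRY CDB for a given VPD page. It only constructs the bytes and does not talk to a device. The function names are invented for this example, and the two-byte allocation length follows the SPC-3 layout (SPC-2 treats byte 3 as reserved, which gives the same bytes for lengths below 256).

```python
INQUIRY_OPCODE = 0x12
EVPD_BIT = 0x01  # byte 1, bit 0: request a Vital Product Data page

def build_vpd_inquiry_cdb(page_code, alloc_len):
    """Build an INQUIRY CDB asking for one VPD page."""
    if not 0 <= page_code <= 0xFF:
        raise ValueError("page_code must fit in one byte")
    if not 0 <= alloc_len <= 0xFFFF:
        raise ValueError("alloc_len must fit in two bytes")
    return bytes([
        INQUIRY_OPCODE,
        EVPD_BIT,
        page_code,
        (alloc_len >> 8) & 0xFF,  # allocation length, high byte
        alloc_len & 0xFF,         # allocation length, low byte
        0x00,                     # control byte
    ])

def hexdump(cdb):
    """Format bytes the way the -r scsiioctl trace prints them."""
    return " ".join("%02x" % b for b in cdb)

luid_cdb = build_vpd_inquiry_cdb(0x83, 200)   # device identification page
serno_cdb = build_vpd_inquiry_cdb(0x80, 64)   # unit serial number page
print(hexdump(luid_cdb))
print(hexdump(serno_cdb))
```

The first CDB reproduces the rejected request above; the second reproduces the serial number request that the drive accepts in the full output.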

Full output:

# smartctl -r scsiioctl -x /dev/rdsk/c0t0d0s0
smartctl 5.42 2011-10-20 r3458 [i386-pc-solaris2.10] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

 [inquiry: 12 00 00 00 24 00 ]
 [inquiry: 12 00 00 00 24 00 ]
Vendor:               FUJITSU
Product:              MAP3367NP
Revision:             0108
 [read capacity(10): 25 00 00 00 00 00 00 00 00 00 ]
User Capacity:        36,748,945,408 bytes [36.7 GB]
Logical block size:   512 bytes
 [mode sense(6): 1a 00 1c 00 40 00 ]
  dxfer_len=64, resid=40
 [mode sense(6): 1a 00 5c 00 40 00 ]
  dxfer_len=64, resid=40
 [inquiry: 12 01 83 00 c8 00 ]
  status=2: sense_key=5 asc=24 ascq=0
  dxfer_len=200, resid=200
Vital Product Data (VPD) INQUIRY failed [3]
 [inquiry: 12 01 80 00 40 00 ]
Serial number:        UPU0P3900BNE
Device type:          disk
 [mode sense(6): 1a 00 19 00 40 00 ]
  dxfer_len=64, resid=44
Transport protocol:   Parallel SCSI (SPI-4)
Local Time is:        Tue Dec 13 13:36:55 2011 PST
 [test unit ready: 00 00 00 00 00 00 ]
Device supports SMART and is Disabled
Temperature Warning Disabled or Not Supported
 [log sense: 4d 00 40 00 00 00 00 00 04 00 ]
 [log sense: 4d 00 40 00 00 00 00 00 10 00 ]
 [log sense: 4d 00 6f 00 00 00 00 00 04 00 ]
 [log sense: 4d 00 6f 00 00 00 00 00 0a 00 ]
 [request sense: 03 00 00 00 12 00 ]
 [log sense: 4d 00 4d 00 00 00 00 00 04 00 ]
 [log sense: 4d 00 4d 00 00 00 00 00 10 00 ]
SMART Health Status: OK

 [log sense: 4d 00 4d 00 00 00 00 00 04 00 ]
 [log sense: 4d 00 4d 00 00 00 00 00 10 00 ]
Current Drive Temperature:     22 C
Drive Trip Temperature:        65 C
 [log sense: 4d 00 4e 00 00 00 00 00 04 00 ]
 [log sense: 4d 00 4e 00 00 00 00 00 28 00 ]
Manufactured in week 38 of year 2003
Specified cycle count over device lifetime:  10000
Accumulated start-stop cycles:  20
 [read defect list(10): 37 00 0c 00 00 00 00 00 04 00 ]
Elements in grown defect list: 0
 [log sense: 4d 00 43 00 00 00 00 00 04 00 ]
 [log sense: 4d 00 43 00 00 00 00 00 3a 00 ]
 [log sense: 4d 00 42 00 00 00 00 00 04 00 ]
 [log sense: 4d 00 42 00 00 00 00 00 3a 00 ]
 [log sense: 4d 00 45 00 00 00 00 00 04 00 ]
 [log sense: 4d 00 45 00 00 00 00 00 3a 00 ]

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        2         0         0          0       6441.649           0
write:         0       13         0         0          0       8449.913           0
 [log sense: 4d 00 46 00 00 00 00 00 04 00 ]
 [log sense: 4d 00 46 00 00 00 00 00 0c 00 ]

Non-medium error count:      124
 [mode sense(6): 1a 00 0a 00 40 00 ]
  dxfer_len=64, resid=40
 [log sense: 4d 00 50 00 00 00 00 00 04 00 ]
 [log sense: 4d 00 50 00 00 00 00 01 94 00 ]
No self-tests have been logged
 [mode sense(6): 1a 00 0a 00 40 00 ]
  dxfer_len=64, resid=40
Long (extended) Self Test duration: 1479 seconds [24.6 minutes]
Device does not support Background scan results logging
 [log sense: 4d 00 58 00 00 00 00 00 04 00 ]
  status=2: sense_key=5 asc=24 ascq=0
  dxfer_len=4, resid=4
scsiPrintSasPhy Log Sense Failed [unsupported field in scsi command]
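
As a reading aid for the LOG SENSE lines in this trace, the sketch below decodes the page control and page code fields of a LOG SENSE(10) CDB in Python. The page-name table lists only the standard pages that appear in this trace, and the function name is an assumption for this example.

```python
LOG_SENSE_OPCODE = 0x4D

# Standard log page codes seen in the trace above (not a complete list).
LOG_PAGES = {
    0x00: "Supported log pages",
    0x02: "Write error counter",
    0x03: "Read error counter",
    0x05: "Verify error counter",
    0x06: "Non-medium error",
    0x0D: "Temperature",
    0x0E: "Start-stop cycle counter",
    0x10: "Self-test results",
    0x18: "Protocol specific port",
    0x2F: "Informational exceptions",
}

def decode_log_sense_cdb(cdb):
    """Split a LOG SENSE(10) CDB into its main fields."""
    if len(cdb) != 10 or cdb[0] != LOG_SENSE_OPCODE:
        raise ValueError("not a LOG SENSE(10) CDB")
    page_code = cdb[2] & 0x3F  # low six bits of byte 2
    return {
        "page_control": cdb[2] >> 6,  # top two bits of byte 2
        "page_code": page_code,
        "page_name": LOG_PAGES.get(page_code, "unknown"),
        "alloc_len": int.from_bytes(cdb[7:9], "big"),
    }

# The request that the drive rejected at the end of the trace.
rejected = decode_log_sense_cdb(bytes.fromhex("4d 00 58 00 00 00 00 00 04 00"))
print(rejected)
```

Byte 2 of the rejected CDB, 0x58, decodes to page control 1 and page code 0x18, the protocol specific port page, which matches the scsiPrintSasPhy message printed right after it.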

I'm aware the above information is for a different disk from the one in my original report -- like I said, this happens on many different models of Fujitsu drives and this is added validation. :-)

So to solve this long-term, I would recommend that quirks be added for Fujitsu disks to drivedb.h to limit the type of VPD queries being done. LUID lookup via VPD query does not work on these drives.
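
One alternative to per-model quirks, sketched here purely as an idea, would be to read the Supported VPD Pages page (0x00) first and request 0x83 only when the drive lists it. The Python below parses such a response and picks which identification pages to query. The response bytes are hypothetical, not captured from a Fujitsu drive; the function names are invented; and nothing here claims that smartmontools works this way or that these drives answer page 0x00.

```python
VPD_SUPPORTED_PAGES = 0x00
VPD_UNIT_SERIAL = 0x80
VPD_DEVICE_ID = 0x83

def parse_supported_vpd_pages(resp):
    """Return the page codes listed in a Supported VPD Pages response."""
    if len(resp) < 4 or resp[1] != VPD_SUPPORTED_PAGES:
        raise ValueError("not a Supported VPD Pages response")
    page_len = resp[3]  # number of page codes that follow the header
    return list(resp[4:4 + page_len])

def choose_id_pages(supported):
    """Pick identification pages to query, device ID first, as smartctl orders them."""
    return [page for page in (VPD_DEVICE_ID, VPD_UNIT_SERIAL) if page in supported]

# Hypothetical response from a drive that lists pages 0x00, 0x80 and 0xC0 only.
hypothetical_resp = bytes([0x00, 0x00, 0x00, 0x03, 0x00, 0x80, 0xC0])
supported = parse_supported_vpd_pages(hypothetical_resp)
print([hex(page) for page in supported])
print([hex(page) for page in choose_id_pages(supported)])
```

For a drive answering like the hypothetical one, only the serial number page would be requested, so the 0x83 request that triggers ILLEGAL REQUEST would never be sent.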

I'll see about making a patch for this, but it'll take some time...

I'm still working on the LOG SENSE error, but I'll figure that out.

comment:3 Changed 8 years ago by Christian Franke

Milestone: Release 5.43

comment:4 Changed 8 years ago by Christian Franke

Milestone: Release 5.43

comment:5 Changed 8 years ago by Christian Franke

Resolution: wontfix
Status: new → closed

This is a very old parallel SCSI disk model that does not support the device identification VPD page 0x83 which has been mandatory for many years.

The VPD page 0x80 is still optional and may not be supported by modern devices.

Workaround: smartctl -q noserial ...

comment:6 Changed 8 years ago by koitsu2009

  1. The disk was manufactured in 2007. That's 5 years old; that is in no way shape or form "very old" nor "many years". In fact, it's still under warranty.
  2. Please quote me where in the T10 specifications it states that VPD page 0x83 is required. Everything I have read states it's an optional page; one cannot travel back in time and implement page 0x83 on devices which were made prior to such a specification update.
  3. smartctl -q noserial is an ineffective workaround because it then does not perform *any* serial inquiries, which has obvious drawbacks (such as not printing serial number, which it absolutely can get without VPD page 0x83).
  4. Finally, and probably the most important thing: this change was introduced to smartmontools. It would be worthwhile to have something like -q novpd83 or something similar. Is this WONTFIX because you refuse to support devices that don't have VPD page 0x83, or is it WONTFIX because I didn't provide a patch? If the latter, if I submit a patch will it be considered/added?

comment:7 in reply to:  6 Changed 8 years ago by Christian Franke

Cc: Doug Gilbert added

A reasonably tested patch would be considered, of course.

comment:8 in reply to:  6 Changed 8 years ago by Doug Gilbert

Replying to koitsu2009:

  1. The disk was manufactured in 2007. That's 5 years old; that is in no way shape or form "very old" nor "many years". In fact, it's still under warranty.
  2. Please quote me where in the T10 specifications it states that VPD page 0x83 is required. Everything I have read states it's an optional page; one cannot travel back in time and implement page 0x83 on devices which were made prior to such a specification update.

SPC-2, which became a standard in 2001 (ANSI INCITS 351-2001), made the VPD Device identification page (0x83) "Mandatory". The same holds in SPC-3 (ANSI INCITS 408-2005), and it remains mandatory in SPC-4, which is still at the draft stage. So Fujitsu had 6 years to correct their firmware (2001 to 2007). You can fetch spc2r20.pdf from www.t10.org, then look at section 8.4.1.

comment:9 Changed 7 years ago by koitsu2009

Thank you dpgilbert -- that is exactly what I needed. The SPC-2 documentation I had on file was incomplete (and not to mention extremely old; revision 2!). I will take this up with Fujitsu, as I think that path is better overall compared to adding more quirk-nonsense to smartmontools.
