Opened 6 weeks ago

Last modified 5 weeks ago

#1614 assigned patch

Add more status strings for ASC 0xb

Reported by: asomers Owned by: Doug Gilbert
Priority: minor Milestone: undecided
Component: all Version:
Keywords: scsi Cc:

Description

This patch adds all currently defined status strings for ASC 0xb. In particular, I find that Seagate ST16000NM002G disks frequently return 0xb/0x14.

Attachments (1)

0001-Add-more-status-strings-for-ASC-0xb.patch (2.0 KB) - added by asomers 6 weeks ago.


Change History (8)

Changed 6 weeks ago by asomers

comment:1 Changed 5 weeks ago by Christian Franke

Component: smartctl → all
Keywords: scsi added
Milestone: undecided

comment:2 Changed 5 weeks ago by Doug Gilbert

A complete list of decoded SCSI ASC/ASCQ codes is pretty large and smartmontools doesn't have one (sg3_utils does in its library). That said, 0xb,0x14 is a new one and looks pretty important: "WARNING - PHYSICAL ELEMENT STATUS CHANGE". I only have recent WD SAS disks and they don't use physical elements; it looks like 16 TB Seagate SAS disks do have physical elements. According to sbc5r01.pdf section 4.36.2, that warning should prompt a GET PHYSICAL ELEMENT STATUS command with a filter value of 1. Could you try the sg_get_elem_status utility in sg3_utils (with --filter=1) and report whether it shows anything of note?
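As a rough illustration of what a lookup for this ASC could look like, here is a minimal Python sketch. It is not the attached patch (smartmontools is C++), and the table is deliberately partial: only the 0x14 string is quoted in this ticket, while the other three entries are recalled from the T10 ASC/ASCQ assignments and should be checked against the current list. The names `ASC_0B_STRINGS` and `describe_sense` are hypothetical.

```python
# Partial table of additional sense codes for ASC 0x0b, keyed by (ASC, ASCQ).
# Only the 0x14 entry is quoted in this ticket; the others are recalled from
# the T10 ASC/ASCQ assignments and are an assumption to verify.
ASC_0B_STRINGS = {
    (0x0B, 0x00): "WARNING",
    (0x0B, 0x01): "WARNING - SPECIFIED TEMPERATURE EXCEEDED",
    (0x0B, 0x02): "WARNING - ENCLOSURE DEGRADED",
    (0x0B, 0x14): "WARNING - PHYSICAL ELEMENT STATUS CHANGE",
}


def describe_sense(asc: int, ascq: int) -> str:
    """Return the status string for an ASC/ASCQ pair, or a generic fallback."""
    text = ASC_0B_STRINGS.get((asc, ascq))
    if text is None:
        # Unknown pairs are reported numerically rather than guessed at.
        return f"unknown sense code asc=0x{asc:02x} ascq=0x{ascq:02x}"
    return text
```

Calling `describe_sense(0xb, 0x14)` returns the string quoted above; any pair missing from the table falls back to the numeric form.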

comment:3 Changed 5 weeks ago by asomers

Oh, that's a good idea! Unfortunately I've already RMAed all of the drives that were reporting this error. But it happens fairly often. It'll probably happen again within a month, and then I'll run that command for you.

comment:4 Changed 5 weeks ago by asomers

We got lucky: I just got another error of that type. But the output isn't very interesting, I'm afraid:

$ sudo sg_get_elem_status --filter=1 -vv /dev/da557

Get physical element status cdb: [9e 17 00 00 00 00 00 00 00 00 00 00 00 20 40 00]

response length 32 bytes
Number of descriptors: 1
Number of descriptors returned: 0
Identifier of element being depopulated: 0
No complete physical element status descriptors available

For comparison, here is the same command run on a healthy drive:

$ sudo sg_get_elem_status --filter=1 -vv /dev/da556

Get physical element status cdb: [9e 17 00 00 00 00 00 00 00 00 00 00 00 20 40 00]

response length 32 bytes
Number of descriptors: 0
Number of descriptors returned: 0
Identifier of element being depopulated: 0
No complete physical element status descriptors available

And here is the output without the filter bit set. It's the same on both healthy and degraded drives.

$ sudo sg_get_elem_status --filter=0 -vv /dev/da557

Get physical element status cdb: [9e 17 00 00 00 00 00 00 00 00 00 00 00 20 00 00]

response length 32 bytes
Number of descriptors: 18
Number of descriptors returned: 0
Identifier of element being depopulated: 0
No complete physical element status descriptors available
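The three cdb lines above differ only in byte 14 (0x40 with --filter=1, 0x00 with --filter=0), and bytes 10 to 13 hold 0x20, which matches the 32-byte response length. The following Python sketch decodes just those fields. The field positions are inferred from these transcripts and from the usual SERVICE ACTION IN(16) layout, not copied from the standard, and `decode_gpes_cdb` is a hypothetical helper name.

```python
def decode_gpes_cdb(hex_bytes: str) -> dict:
    """Decode a few fields of a 16-byte GET PHYSICAL ELEMENT STATUS cdb.

    Field offsets are assumptions inferred from the transcripts in this
    ticket: allocation length in bytes 10..13, filter in the top two bits
    of byte 14.
    """
    cdb = bytes.fromhex(hex_bytes)  # whitespace between bytes is accepted
    if len(cdb) != 16:
        raise ValueError(f"expected a 16-byte cdb, got {len(cdb)} bytes")
    return {
        "opcode": cdb[0],                       # 0x9e in the transcripts
        "service_action": cdb[1] & 0x1F,        # 0x17 in the transcripts
        "allocation_length": int.from_bytes(cdb[10:14], "big"),
        "filter": cdb[14] >> 6,                 # 1 when byte 14 is 0x40
    }
```

For example, `decode_gpes_cdb("9e 17 00 00 00 00 00 00 00 00 00 00 00 20 40 00")` yields an allocation length of 32 and a filter value of 1.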

comment:5 Changed 5 weeks ago by Doug Gilbert

Thanks for that, as it's the first time I've seen a real response to that command. In the last case, adding the --maxlen=1k option should print out the 18 descriptors. I should change that. Anyway, 18 seems a bit strange as it's a 16 TB disk. I would like to see the full output; perhaps you could email it to me.

I would like to see the "0xb,0x14" sense data also include an INFO field that says _which_ physical element id it is reporting on. Would a physical element "coming good" qualify for this warning, since it is a change? I found a product manual for that disk family [100845788g.pdf] but it says virtually nothing about "physical elements" apart from saying that the Get physical element status and Remove element and truncate commands are supported.

There is a Physical element health field for each element where values 0x1 through 0x63 are okay, 0x64 is on the edge and >= 0x65 is kaput. T10 doesn't say whether that is a sliding scale (like endurance on an SSD).
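The ranges above can be written down as a small classifier. This is only a sketch of the three ranges as described in this comment; the function name `classify_health` and the returned labels are hypothetical, and the value 0 (along with any special high values the standard may define) is not covered by the description above, so it is reported separately rather than guessed.

```python
def classify_health(value: int) -> str:
    """Classify a one-byte physical element health value.

    Ranges follow the description in this ticket:
    0x01..0x63 okay, 0x64 on the edge, >= 0x65 outside limits.
    """
    if not 0 <= value <= 0xFF:
        raise ValueError(f"health value out of range: {value}")
    if 0x01 <= value <= 0x63:
        return "within limits"
    if value == 0x64:
        return "at limit"
    if value >= 0x65:
        return "outside limits"
    # Only 0 reaches this point; the ticket does not say what it means.
    return "not covered"
```

A health byte of 101 (0x65) is therefore the first value that lands in the "outside limits" range.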

comment:6 Changed 5 weeks ago by asomers

Seagate's datasheet doesn't say so, but other websites describe this disk as having 9 platters. So each one of those physical elements probably corresponds to a surface (9 platters x 2 surfaces = 18). Here's the command output with --maxlen=1k:

For the degraded disk:
$ sudo sg_get_elem_status --filter=0 --maxlen=1k /dev/da557
Number of descriptors: 18
Number of descriptors returned: 18
Identifier of element being depopulated: 0

Element descriptors:
[1] identifier: 0x000001 associated LBs: not specified health: within manufacturer's specification limits <1>
[2] identifier: 0x000002 associated LBs: not specified health: within manufacturer's specification limits <1>
[3] identifier: 0x000003 associated LBs: not specified health: within manufacturer's specification limits <1>
[4] identifier: 0x000004 associated LBs: not specified health: within manufacturer's specification limits <1>
[5] identifier: 0x000005 associated LBs: not specified health: within manufacturer's specification limits <1>
[6] identifier: 0x000006 associated LBs: not specified health: within manufacturer's specification limits <1>
[7] identifier: 0x000007 associated LBs: not specified health: within manufacturer's specification limits <1>
[8] identifier: 0x000008 associated LBs: not specified health: within manufacturer's specification limits <1>
[9] identifier: 0x000009 associated LBs: not specified health: within manufacturer's specification limits <1>
[10] identifier: 0x00000a associated LBs: not specified health: outside manufacturer's specification limits <101>
[11] identifier: 0x00000b associated LBs: not specified health: within manufacturer's specification limits <1>
[12] identifier: 0x00000c associated LBs: not specified health: within manufacturer's specification limits <1>
[13] identifier: 0x00000d associated LBs: not specified health: within manufacturer's specification limits <1>
[14] identifier: 0x00000e associated LBs: not specified health: within manufacturer's specification limits <1>
[15] identifier: 0x00000f associated LBs: not specified health: within manufacturer's specification limits <1>
[16] identifier: 0x000010 associated LBs: not specified health: within manufacturer's specification limits <1>
[17] identifier: 0x000011 associated LBs: not specified health: within manufacturer's specification limits <1>
[18] identifier: 0x000012 associated LBs: not specified health: within manufacturer's specification limits <1>

And for a healthy disk:
$ sudo sg_get_elem_status --filter=0 --maxlen=1k /dev/da556
Number of descriptors: 18
Number of descriptors returned: 18
Identifier of element being depopulated: 0

Element descriptors:
[1] identifier: 0x000001 associated LBs: not specified health: within manufacturer's specification limits <1>
[2] identifier: 0x000002 associated LBs: not specified health: within manufacturer's specification limits <1>
[3] identifier: 0x000003 associated LBs: not specified health: within manufacturer's specification limits <1>
[4] identifier: 0x000004 associated LBs: not specified health: within manufacturer's specification limits <1>
[5] identifier: 0x000005 associated LBs: not specified health: within manufacturer's specification limits <1>
[6] identifier: 0x000006 associated LBs: not specified health: within manufacturer's specification limits <1>
[7] identifier: 0x000007 associated LBs: not specified health: within manufacturer's specification limits <1>
[8] identifier: 0x000008 associated LBs: not specified health: within manufacturer's specification limits <1>
[9] identifier: 0x000009 associated LBs: not specified health: within manufacturer's specification limits <1>
[10] identifier: 0x00000a associated LBs: not specified health: within manufacturer's specification limits <1>
[11] identifier: 0x00000b associated LBs: not specified health: within manufacturer's specification limits <1>
[12] identifier: 0x00000c associated LBs: not specified health: within manufacturer's specification limits <1>
[13] identifier: 0x00000d associated LBs: not specified health: within manufacturer's specification limits <1>
[14] identifier: 0x00000e associated LBs: not specified health: within manufacturer's specification limits <1>
[15] identifier: 0x00000f associated LBs: not specified health: within manufacturer's specification limits <1>
[16] identifier: 0x000010 associated LBs: not specified health: within manufacturer's specification limits <1>
[17] identifier: 0x000011 associated LBs: not specified health: within manufacturer's specification limits <1>
[18] identifier: 0x000012 associated LBs: not specified health: within manufacturer's specification limits <1>
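The only difference between the two listings is element 0x00000a on the degraded disk, which is easy to miss by eye. Below is a minimal Python sketch that scans this output for elements at or above a health threshold. It assumes the line format shown above, with the health value printed in decimal inside angle brackets; that format is taken from these transcripts, not from any documented output contract of sg_get_elem_status. `failing_elements` is a hypothetical helper name.

```python
import re
from typing import List, Tuple

# Matches lines such as:
# [10] identifier: 0x00000a associated LBs: ... limits <101>
DESCRIPTOR_RE = re.compile(
    r"^\[(?P<index>\d+)\]\s+identifier:\s+0x(?P<ident>[0-9a-fA-F]+)\s+"
    r".*<(?P<health>\d+)>\s*$"
)


def failing_elements(output: str, threshold: int = 0x65) -> List[Tuple[int, int]]:
    """Return (element identifier, health) pairs with health >= threshold."""
    failing = []
    for line in output.splitlines():
        match = DESCRIPTOR_RE.match(line.strip())
        if match is None:
            continue  # header lines and blank lines are skipped
        ident = int(match.group("ident"), 16)
        health = int(match.group("health"))  # printed in decimal
        if health >= threshold:
            failing.append((ident, health))
    return failing
```

Fed the degraded disk's listing, this should report only identifier 0x0a with health 101; fed the healthy disk's listing, it should return an empty list.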

comment:7 Changed 5 weeks ago by Christian Franke

Owner: set to Doug Gilbert
Status: new → assigned