Opened 19 months ago

Closed 8 months ago

Last modified 3 months ago

#871 closed enhancement (fixed)

cciss: Add option to disable SAT auto detection

Reported by: Stanislav Brabec
Owned by: Christian Franke
Priority: major
Milestone: Release 7.0
Component: all
Version: 6.5
Keywords: cciss freebsd linux
Cc:

Description

Some newer HPSA devices reply to basic SAT commands and return INQUIRY data that contains "ATA ".

This causes the sat variable in sat_device::autodetect_open() to become true, so even if cciss is specified explicitly, as in
smartctl -d cciss,0 -H /dev/sda
smartctl switches to sat:
dev/sda [cciss_disk_00] [SAT]: Device open changed type from 'sat,auto' to 'sat'

As a result, the command fails:
SMART STATUS RETURN: incomplete response, ATA output registers missing
REPORT-IOCTL: Device=/dev/sda Command=SMART STATUS CHECK returned -1 errno=38 [Function not implemented]
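
The trigger can be sketched as follows, assuming (as the SAT standard specifies) that a SAT layer fills the vendor identification field of standard INQUIRY data with "ATA", space padded. The helper name inquiry_reports_sat() is hypothetical; the real check in sat_device::autodetect_open() may differ in detail. With -d cciss the INQUIRY is answered by the HPSA firmware, so a firmware that reports "ATA" makes such a check succeed even though SMART commands fail later.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper illustrating the trigger; not the actual code in
// sat_device::autodetect_open().
// In standard INQUIRY data the T10 vendor identification occupies bytes
// 8..15 (space padded), and a SAT layer reports "ATA" there.
bool inquiry_reports_sat(const unsigned char* inq, std::size_t len)
{
  if (inq == nullptr || len < 16)
    return false;                       // too short to hold the vendor field
  return std::memcmp(inq + 8, "ATA     ", 8) == 0;
}
```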

The attached patch disables the automatic switch to the "better" driver for cciss.

Note that I do not yet have a test report from the customer for this patch, but setting sat = 0 has already been confirmed to prevent this bug.

Note that smart_interface::autodetect_sat_device() contains similar code, but I am not sure whether it needs a fix as well.

Attachments (2)

smartmontools-cciss-not-sat.patch (487 bytes) - added by Stanislav Brabec 18 months ago.
New version of the patch. Confirmed to fix the issue.
scsiata-scsi_only.patch (3.5 KB) - added by Christian Franke 16 months ago.
Patch adds '-d scsi+TYPE' prefix to disable auto-detection of TYPE

Change History (14)

comment:1 Changed 19 months ago by Christian Franke

Keywords: cciss freebsd linux added
Milestone: undecided

SAT auto detection for '-d cciss' was added 5+ years ago as suggested by Don Brace, see ticket #202.

As a result, it causes failure:
SMART STATUS RETURN: incomplete response, ATA output registers missing
REPORT-IOCTL: Device=/dev/sda Command=SMART STATUS CHECK returned -1 errno=38 [Function not implemented]

This message does not indicate disk problems. It is the usual result from buggy/incomplete SAT layers which do not properly return ATA output registers in SCSI sense data (ATA Return Descriptor).
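
A minimal sketch of what "ATA output registers missing" refers to: SAT returns the ATA output registers in an "ATA Status Return" descriptor (code 09h) inside descriptor-format sense data. If the firmware omits this descriptor, the caller has nothing to decode. The names below are hypothetical and this is not the smartmontools parser; the byte offsets are assumed from the SAT descriptor layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// ATA output registers as returned in the SAT "ATA Status Return" sense data
// descriptor. Hypothetical names, for illustration only.
struct AtaOutRegs {
  std::uint8_t error, count, lba_low, lba_mid, lba_high, device, status;
};

// Looks for descriptor 09h in descriptor-format sense data (response code
// 72h/73h). std::nullopt means the output registers are missing.
std::optional<AtaOutRegs> find_ata_status_return(const std::uint8_t* sense,
                                                 std::size_t len)
{
  if (sense == nullptr || len < 8)
    return std::nullopt;
  const std::uint8_t resp = sense[0] & 0x7f;
  if (resp != 0x72 && resp != 0x73)
    return std::nullopt;                 // fixed-format sense has no descriptors
  std::size_t end = 8 + sense[7];        // byte 7: additional sense length
  if (end > len)
    end = len;
  // Each descriptor: byte 0 = code, byte 1 = additional length.
  for (std::size_t pos = 8; pos + 2 <= end; pos += 2 + sense[pos + 1]) {
    const std::uint8_t* d = sense + pos;
    if (d[0] == 0x09 && d[1] >= 0x0c && pos + 14 <= end)
      // Low-order register bytes; high-order (48-bit) bytes are ignored here.
      return AtaOutRegs{d[3], d[5], d[7], d[9], d[11], d[12], d[13]};
  }
  return std::nullopt;
}
```

For SMART RETURN STATUS, LBA mid/high values of 4Fh/C2h mean the drive passed, and F4h/2Ch mean a threshold was exceeded; without the descriptor, neither result can be determined.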

The attached patch probably does not work. It only changes the info texts. It does not change the actual ATA/SCSI interface selection.

To disable implicit SAT auto detection for -d cciss, simply revert the get_sat_device("sat,auto", ...) additions from r3564 and r3565.

comment:2 Changed 18 months ago by Stanislav Brabec

Yes, the drive is OK and works perfectly when connected directly. The HPSA array is also OK.

The problem is caused by the new HPSA firmware, which responds to the SAT/SCSI inquiry. Although it responds to the inquiry, it does not respond to SMART STATUS CHECK, SMART ATA attribute requests, or SCSI temperature queries. To get these values, the CCISS passthrough protocol has to be used.

This firmware behavior caused more problems than this one. For example: https://www.smartmontools.org/ticket/817

My intention was a fix that does the following: once -d cciss is specified, never fall back to the SAT/SCSI protocol. Only CCISS passthrough should be used.

Notes:

  • The device behind the HPSA array can still be SAS or SATA, so the code has to pick a correct CCISS passthrough protocol.

I just got a reply from the customer. The attached patch does not work; it still switches to sat later, producing the same error. I will post a new patch once it is confirmed.

I will try reverting the referenced patches and let you know the result.

Changed 18 months ago by Stanislav Brabec

New version of the patch. Confirmed to fix the issue.

comment:3 Changed 18 months ago by Stanislav Brabec

The new version of the patch disables the inquiry based switch from cciss to sat. Customer confirmed that it fixes the problem.

The customer also confirmed that reverting r3564 and r3565 fixes the problem too.

As I do not have full insight into the code, some points are unclear to me:

  • Is it correct to call hide_scsi() for cciss devices?
  • Should autodetect_sat_device() be modified in the same way?

comment:4 in reply to: 2; Changed 18 months ago by Christian Franke

Replying to sbrabec:

  • The device behind the HPSA array can still be SAS or SATA, so the code has to pick a correct CCISS passthrough protocol.

It already does. If SATA is detected, SAT ATA_PASS_THROUGH commands are issued via the CCISS passthrough protocol to address the SAT layer in the CCISS driver or firmware.
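
A minimal sketch of such a command, assuming the SAT ATA PASS-THROUGH (16) CDB layout: it builds the CDB for the ATA SMART RETURN STATUS command with CK_COND set, so that the drive's output registers are requested back in sense data. The function name is hypothetical, and wrapping the CDB into a CCISS passthrough request is not shown.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Builds an ATA PASS-THROUGH (16) CDB (SAT, opcode 85h) carrying the ATA
// SMART RETURN STATUS command. With -d cciss such a CDB would be sent inside
// a CCISS passthrough request rather than via SG_IO (not shown).
// Hypothetical sketch, not smartmontools code.
std::array<std::uint8_t, 16> build_smart_return_status_cdb()
{
  std::array<std::uint8_t, 16> cdb{};   // all fields start at zero
  cdb[0] = 0x85;                        // ATA PASS-THROUGH (16)
  cdb[1] = 3 << 1;                      // PROTOCOL = 3 (non-data), EXTEND = 0
  cdb[2] = 0x20;                        // CK_COND = 1: return registers in sense data
  cdb[4] = 0xda;                        // FEATURES (7:0): SMART RETURN STATUS
  cdb[10] = 0x4f;                       // LBA_MID (7:0): SMART signature
  cdb[12] = 0xc2;                       // LBA_HIGH (7:0): SMART signature
  cdb[14] = 0xb0;                       // COMMAND: SMART
  return cdb;
}
```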

New version of the patch. Confirmed to fix the issue.

Sorry, no. Disabling -d sat,auto for CCISS in the generic SAT code after it has been added in CCISS specific code does not make much sense. The correct way is to undo the latter (r3564, r3565).

There are three alternatives:

  1. Convince the customer that the "incomplete response, ATA output registers missing" message reports a driver/firmware limitation and not a disk problem.
  2. Undo r3564 and r3565 and require all other smartmontools users relying on this 5+ year old behavior to change -d cciss,N to -d sat,auto+cciss,N in all monitoring scripts and smartd.conf files.
  3. Add a new -d noauto[+TYPE] prefix which disables any controller/platform-specific auto-detection. Then your customer could change -d cciss,N to -d noauto+cciss,N. The customer may then realize that the smartctl output has limited value for SATA drives. The SAT layer typically translates very limited diagnostic info (temperature, health status) to the SCSI/SAS view of the drive. Other interesting parts are then no longer visible.

comment:5 in reply to: 4; Changed 18 months ago by Stanislav Brabec

There are three alternatives:

  1. Convince the customer that the "incomplete response, ATA output registers missing" message reports a driver/firmware limitation and not a disk problem.

In case of -d sat I would agree. If this happens with -d cciss, I do not agree. If -d cciss is used, the user explicitly requests the CCISS-pass-through protocol, so smartctl should never switch back to sat.

Additionally, a workaround has already been added for failing temperature readings after switching from -d cciss to sat.

  2. Undo r3564 and r3565 and require all other smartmontools users relying on this 5+ year old behavior to change -d cciss,N to -d sat,auto+cciss,N in all monitoring scripts and smartd.conf files.

Note that -d sat,auto+cciss,N will not work with these modern HPSA devices, as it will behave exactly like -d sat.

  3. Add a new -d noauto[+TYPE] prefix which disables any controller/platform-specific auto-detection. Then your customer could change -d cciss,N to -d noauto+cciss,N. The customer may then realize that the smartctl output has limited value for SATA drives. The SAT layer typically translates very limited diagnostic info (temperature, health status) to the SCSI/SAS view of the drive. Other interesting parts are then no longer visible.

Then -d cciss would be usable only with legacy CCISS and HPSA devices, not with the new ones that respond to the SAT inquiry.

I have two more ideas:

  • Do an extended inquiry check.

For example:
If the inquiry ID is ATA      EK000400GWEPE and version is HPG0, then never use sat.

  • In CCISS/auto mode, try a SAT command first. If it fails, fall back to CCISS-pass-through.
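
Both ideas can be combined in one selection function, sketched here under stated assumptions: standard INQUIRY field offsets (vendor 8..15, product 16..31, revision 32..35), a quirk table whose only entry is the example above, and a caller-supplied sat_probe_ok callback that stands for "send one SAT command and check the reply". All names are hypothetical; nothing here is smartmontools API. "scsi_via_cciss" means plain SCSI commands through the CCISS passthrough, without SAT.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical sketch; none of these names exist in smartmontools.
enum class Mode { sat_via_cciss, scsi_via_cciss };

// Returns INQUIRY bytes [off, off + n) with trailing spaces removed.
std::string inq_field(const unsigned char* inq, std::size_t off, std::size_t n)
{
  std::string s(reinterpret_cast<const char*>(inq + off), n);
  s.erase(s.find_last_not_of(' ') + 1);  // npos + 1 == 0 clears an all-space field
  return s;
}

struct SatQuirk { const char* product; const char* revision; };

// Idea 1: devices that report "ATA" but must not be driven via SAT.
const SatQuirk no_sat_quirks[] = {
  {"EK000400GWEPE", "HPG0"},
};

Mode select_mode(const unsigned char* inq, std::size_t len,
                 const std::function<bool()>& sat_probe_ok)
{
  if (len < 36 || inq_field(inq, 8, 8) != "ATA")
    return Mode::scsi_via_cciss;            // no SAT layer reported
  const std::string product = inq_field(inq, 16, 16);
  const std::string revision = inq_field(inq, 32, 4);
  for (const SatQuirk& q : no_sat_quirks)
    if (product == q.product && revision == q.revision)
      return Mode::scsi_via_cciss;          // idea 1: known quirk
  // Idea 2: use SAT only if a probe command actually succeeds.
  return sat_probe_ok() ? Mode::sat_via_cciss : Mode::scsi_via_cciss;
}
```

The quirk table avoids sending any SAT command to known-bad firmware, while the probe handles firmware not yet in the table at the cost of one extra command per open.
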
Last edited 18 months ago by Stanislav Brabec

Changed 16 months ago by Christian Franke

Attachment: scsiata-scsi_only.patch added

Patch adds '-d scsi+TYPE' prefix to disable auto-detection of TYPE

comment:6 Changed 16 months ago by Christian Franke

With the attached patch, smartctl -d scsi+cciss,0 ... should disable SAT auto-detection. Please test if possible.
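
How such a prefix could be handled is sketched below. This is a hypothetical illustration of the described behavior, not the code from scsiata-scsi_only.patch: a leading "scsi+" is stripped and recorded as a request to skip SAT auto-detection for the remaining TYPE.

```cpp
#include <cassert>
#include <string>

// Hypothetical result of parsing a "-d" argument.
struct DevTypeRequest {
  std::string type;      // remaining TYPE, e.g. "cciss,0"
  bool sat_autodetect;   // false if the "scsi+" prefix was given
};

DevTypeRequest parse_dev_type(const std::string& arg)
{
  const std::string prefix = "scsi+";
  // compare() clamps the length, so arguments shorter than the prefix are safe.
  if (arg.compare(0, prefix.size(), prefix) == 0)
    return {arg.substr(prefix.size()), false};
  return {arg, true};
}
```

With such handling, the command from the description becomes smartctl -d scsi+cciss,0 -H /dev/sda, while a plain -d cciss,0 keeps the existing auto-detection.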

comment:7 in reply to: 5; Changed 16 months ago by Christian Franke

Replying to comment 5:

In case of -d sat I would agree. If this happens with -d cciss, I do not agree. If -d cciss is used, the user explicitly requests the CCISS-pass-through protocol, so smartctl should never switch back to sat.

It doesn't switch back to SAT via SG_IO protocol. It still sends SCSI (in particular SAT) commands via CCISS-pass-through protocol.

comment:8 in reply to: 6; Changed 16 months ago by Stanislav Brabec

Replying to chrfranke:

With the attached patch, smartctl -d scsi+cciss,0 ... should disable SAT auto-detection. Please test if possible.

Thanks for the patch. I made a test package and sent it to the customer with the affected hardware.

comment:9 Changed 15 months ago by Stanislav Brabec

The customer just confirmed that your patch scsiata-scsi_only.patch works perfectly on the affected hardware with -d scsi+cciss,0. Thanks.

comment:10 Changed 15 months ago by Christian Franke

Milestone: undecided → Release 6.7
Owner: set to Christian Franke
Status: new → accepted
Summary: [PATCH] cciss: Never switch cciss device back to sat → cciss: Add option to disable SAT auto detection
Type: defect → enhancement

comment:11 Changed 8 months ago by Christian Franke

Resolution: fixed
Status: accepted → closed

comment:12 Changed 3 months ago by Christian Franke

Milestone: Release 6.7 → Release 7.0

Milestone renamed
