Opened 3 years ago

Last modified 3 years ago

#1527 new patch

linux megaraid: opening the device for ioctls with O_RDWR causes a partition rescan

Reported by: charlotte
Owned by:
Priority: major
Milestone: undecided
Component: all
Version: 7.2
Keywords: megaraid linux
Cc:

Description

When calling smartctl to get the health status, e.g.

sudo smartctl -H -d megaraid,0 /dev/sdb

smartctl opens /dev/sdb as O_RDWR before using the fd to do the SG_GET_SCSI_ID and SCSI_IOCTL_GET_BUS_NUMBER ioctls. This causes the kernel to rescan partitions when the fd closes, which can be disruptive. For example, other processes trying to open a partition like /dev/sdb1 can fail if their timing is unlucky.

For example, you can trigger a transient error by running stat on the partition in a loop and then running smartctl on the parent device.

$ while true; do stat /dev/sdb1 > /dev/null; done
[ ... no output before smartctl call ... ]
stat: cannot statx '/dev/sdb1': No such file or directory
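
The shell loop above can also be scripted so that the misses are counted rather than printed. This is a minimal sketch, not part of the attached patch; the function name `count_missing` and its parameters are made up for illustration.

```python
import os
import time


def count_missing(path, attempts, delay=0.0):
    """Stat `path` `attempts` times and count how often it is reported missing."""
    missing = 0
    for _ in range(attempts):
        try:
            os.stat(path)
        except FileNotFoundError:
            # The node was briefly absent, e.g. while partitions were rescanned.
            missing += 1
        if delay:
            time.sleep(delay)
    return missing
```

Calling `count_missing("/dev/sdb1", 100000)` while smartctl runs against /dev/sdb in another terminal should return a nonzero count if the rescan window is hit; the device names are the ones from this report and need adjusting for other systems.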

I've attached a patch changing O_RDWR to O_RDONLY for megaraid to match the convention in the rest of the file, though O_ACCMODE might be relevant here according to the man page for open:

Linux reserves the special, nonstandard access mode 3 (binary 11) in flags to mean: check for read and write permission on the file and return a file descriptor that can't be used for reading or writing. This nonstandard access mode is used by some Linux drivers to return a file descriptor that is to be used only for device-specific ioctl(2) operations.
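
To make the access-mode distinction concrete, here is a small sketch that classifies the access mode carried in a set of open flags. It assumes the Linux values (O_RDONLY=0, O_WRONLY=1, O_RDWR=2, mask 3); `describe_access_mode` and the two constants are hypothetical names. Whether mode 3 would avoid the rescan for this device is not established in this ticket.

```python
import os

# Assumption: Linux flag values, so the access-mode mask (O_ACCMODE in C) is 3.
ACCESS_MODE_MASK = os.O_RDONLY | os.O_WRONLY | os.O_RDWR
IOCTL_ONLY = 3  # the nonstandard mode described in open(2)


def describe_access_mode(flags):
    """Return a label for the access mode encoded in `flags`."""
    names = {
        os.O_RDONLY: "O_RDONLY",
        os.O_WRONLY: "O_WRONLY",
        os.O_RDWR: "O_RDWR",
        IOCTL_ONLY: "ioctl-only (3)",
    }
    # Other bits such as O_NONBLOCK are masked off before the lookup.
    return names[flags & ACCESS_MODE_MASK]
```

For instance, `describe_access_mode(os.O_RDWR | os.O_NONBLOCK)` yields the label for the mode smartctl currently uses for megaraid, while the patch would move it to the O_RDONLY case.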

We've worked around this in the meantime by passing the partition to smartctl (e.g. /dev/sdb1), but we'd want to avoid the rescan even if no partitions exist.

Attachments (1)

megaraid_rdonly.diff (517 bytes) - added by charlotte 3 years ago.

Change History (3)

by charlotte, 3 years ago

Attachment: megaraid_rdonly.diff added

comment:1 by Christian Franke, 3 years ago

Component: smartctl → all
Keywords: megaraid, linux → megaraid linux
Milestone: undecided

I do not remember a similar report since -d megaraid was added 13+ years ago (r2650).

Which kernel, driver and controller firmware version(s) did you use for testing?

Leaving ticket open as undecided until it can be confirmed that this change is compatible with a reasonable range of kernel versions.

comment:2 by charlotte, 3 years ago

> I do not remember a similar report since -d megaraid was added 13+ years ago (r2650).

Yes, it's unlikely to be a problem unless the actual usage of the partition coincides with the call to smartctl. Users are also unlikely to open a partition like /dev/sdb1 directly unless they're using it as a raw device. Unfortunately for us, we are using it as a raw device and we hit the timing reliably in our automated testing.

> Which kernel, driver and controller firmware version(s) did you use for testing?

For all intents and purposes, it's Ubuntu-4.15.0-147.151. We do have mods but they are minor and unrelated to fs, disk, block, or megaraid.

I don't think the driver and controller matter at all, since this is reproducible by doing close(open("/dev/sdb", O_RDWR|O_NONBLOCK)) without sending any ioctls. Note that the driver-specific ioctls use /dev/megaraid_sas_ioctl_node or /dev/megadev0 rather than the block device.
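
The open/close reproducer can be sketched as a small helper that also reports which access mode the kernel recorded for the descriptor. This is an illustrative sketch only; `open_and_close` is a hypothetical name, and the mask value 3 assumes Linux flag values.

```python
import fcntl
import os


def open_and_close(path, flags):
    """Open `path` with `flags`, return the recorded access mode, then close."""
    fd = os.open(path, flags)
    try:
        # F_GETFL returns the file status flags; on Linux the low two bits
        # hold the access mode (assumption: mask value 3).
        return fcntl.fcntl(fd, fcntl.F_GETFL) & 3
    finally:
        # Per the report, closing a descriptor opened with O_RDWR is what
        # triggers the partition rescan.
        os.close(fd)
```

Running `open_and_close("/dev/sdb", os.O_RDWR | os.O_NONBLOCK)` as root corresponds to the C expression above; swapping in `os.O_RDONLY` corresponds to the behavior proposed in the patch.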

In any case, the driver version is misleading because Ubuntu backports changes to megaraid, but it is defined in Ubuntu-4.15.0-147.151 as 07.703.05.00-rc1.

Here's the megacli output:

                    Versions
                ================
Product Name    : PERC H730P Mini
Serial No       : 7C801L0
FW Package Build: 25.5.8.0001
[...]
                Image Versions in Flash:
                ================
BIOS Version       : 6.33.01.0_4.19.08.00_0x06120304
Ctrl-R Version     : 5.18-0702
FW Version         : 4.300.00-8366
NVDATA Version     : 3.1511.00-0028
Boot Block Version : 3.07.00.00-0003

> Leaving ticket open as undecided until it can be confirmed that this change is compatible with a reasonable range of kernel versions.

I understand your perspective, but given that this doesn't involve the driver, it should be in the same situation as the other usages in os_linux.cpp that have been passing O_RDONLY to linux_smart_device and doing ioctls since ~2008, e.g. linux_marvell_device and linux_scsi_device.
