Opened 9 years ago

Closed 8 years ago

#78 closed defect (fixed)

Smartctl segmentation fault and crash followed by kernel invalid opcode trace

Reported by: mhlavink Owned by: Christian Franke
Priority: major Milestone: Release 5.41
Component: all Version: 5.39.1
Keywords: megaraid linux Cc:

Description

I got the following bug report from a Fedora user. Let me know if you need any other information.


Smartctl crashes with a segmentation fault when asked to run a SMART self-test on
a disk behind a DELL MegaRaid controller.

How reproducible:

Always reproducible

Steps to Reproduce:

  1. smartctl -t short /dev/sda -d megaraid,0
  2. segmentation fault and crash

Actual results:
smartctl 5.39.1 2010-01-28 r3054 [x86_64-redhat-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

Segmentation fault

Message from syslogd@webster at Mar 29 14:45:01 ...

kernel:invalid opcode: 0000 #8 SMP
kernel:last sysfs file: /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
kernel:Stack:
kernel:Call Trace:
kernel:Code: 00 08 00 00 49 c1 ef 0b 4c 8b 75 c0 49 81 c6 ff 07 00 00 49 c1 ee 0b 48 81 7d c0 01 10 00 00 45 19 ed 41 83 c5 02 45 85 f6 75 04 <0f> 0b eb fe 48 c7 c7 c0 7a a0 81 45 89 ec e8 a9 10 22 00 49 89

Additional info:
DELL PowerEdge R710 with 2 Xeon E5530 with 8GB, running F12 x86_64
LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04)


The stack trace of smartmontools at the system call that causes the problem is
a little hard to get: the crash happens in the kernel, so you can't just run the
debugger up to the error (the stack is gone by that time). It seems that the
problem is in an ioctl:

#0 os_linux::linux_megaraid_device::megasas_cmd (this=0x7ffff821a030, cdbLen=<value optimized out>, cdb=0x7fffffffc8f0, dataLen=-134715736, data=0x0) at os_linux.cpp:1112
#1 0x00007ffff7fd682f in os_linux::linux_megaraid_device::scsi_pass_through (this=<value optimized out>, iop=0x7fffffffc870) at os_linux.cpp:1076
#2 0x00007ffff7fcba52 in scsiSendDiagnostic (device=0x7ffff821a030, functioncode=<value optimized out>, pBuf=<value optimized out>, bufLen=<value optimized out>) at scsicmds.cpp:722
#3 0x00007ffff7fcbb9f in scsiSmartExtendSelfTest (device=<value optimized out>) at scsicmds.cpp:1699
#4 0x00007ffff7fd45ad in scsiPrintMain (device=<value optimized out>, options=<value optimized out>) at scsiprint.cpp:1703
#5 0x00007ffff7fbbcf2 in main_worker (argc=<value optimized out>, argv=<value optimized out>) at smartctl.cpp:951
#6 0x00007ffff7fbc049 in main (argc=<value optimized out>, argv=<value optimized out>) at smartctl.cpp:967

Line 1112 of os_linux.cpp is:

rc = ioctl(m_fd, MEGASAS_IOC_FIRMWARE, &uio);

where uio is:

{host_no = 2, pad1 = 0, sgl_off = 48, sge_count = 1, sense_off = 0, sense_len = 0, frame = {

raw = "\004\000\377\000\000\000\006\001\000\000\000\000\000\000\000\000\020", '\000'<repeats 15 times>, "\035@", '\000' <repeats 93 times>, hdr = {cmd = 4 '\004', sense_len = 0 '\000', cmd_status = 255 '\377', scsi_status = 0 '\000', target_id = 0 '\000', lun = 0 '\000', cdb_len = 6 '\006', sge_count = 1 '\001', context = 0, pad_0 = 0, flags = 16, timeout = 0, data_xferlen = 0}}, sgl = {{iov_base = 0x0, iov_len = 0} <repeats 16 times>}}

I don't know if it's useful, but the last non-hardware-specific call up the
stack is line 722 of scsicmds.cpp:

if (!device->scsi_pass_through(&io_hdr));

At that point, io_hdr is:
$1 = {cmnd = 0x7fffffffc8f0 "\035@", cmnd_len = 6, dxfer_dir = 0, dxferp = 0x0, dxfer_len = 0, sensep = 0x7fffffffc8d0 "HITACHI ", max_sense_len = 32, timeout = 18000, resp_sense_len = 0, scsi_status = 0 '\000', resid = 0}
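For readers decoding the dump: the command bytes "\035@" are 0x1d 0x40. In the SCSI command set, opcode 0x1d is SEND DIAGNOSTIC, and the SELF-TEST CODE field sits in bits 7..5 of byte 1, so 0x40 gives the value 2 (background extended self-test), which agrees with frame #3 of the backtrace. The command carries no data (dxferp = 0x0, dxfer_len = 0). A minimal sketch of that decoding follows; the `demo_*` helpers are hypothetical names for illustration, not smartmontools functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, for illustration only.

// True if the CDB opcode (byte 0) is SEND DIAGNOSTIC (0x1d).
inline bool demo_is_send_diagnostic(const uint8_t * cdb)
{
  return cdb[0] == 0x1d;
}

// SELF-TEST CODE is stored in bits 7..5 of byte 1 of the CDB.
inline unsigned demo_self_test_code(const uint8_t * cdb)
{
  return (cdb[1] >> 5) & 0x7;
}

// Decode the 6-byte CDB seen in this ticket ("\035@" followed by zeros).
inline void demo_check_ticket_cdb()
{
  const uint8_t cdb[6] = { 0x1d, 0x40, 0x00, 0x00, 0x00, 0x00 };
  assert(demo_is_send_diagnostic(cdb));
  assert(demo_self_test_code(cdb) == 2);  // 010b
}
```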

Change History (3)

comment:1 Changed 8 years ago by Christian Franke

Keywords: megaraid linux added

comment:2 Changed 8 years ago by Christian Franke

Milestone: Release 5.41
Owner: changed from somebody to Christian Franke
Status: changed from new to accepted

A similar problem was reported on the smartmontools-support list. Smartmontools passes sge_count=1 to the ioctl even if dataLen==0. The megaraid driver does not handle this case properly. This patch for os_linux.cpp should fix this.
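The patch itself is not reproduced in this ticket. As a rough sketch of the idea described in this comment (describe a scatter/gather entry only when there is data to transfer), something like the following could work. The `demo_*` names and the simplified struct are hypothetical stand-ins, not the real megaraid driver structures or the actual smartmontools change; only the field names `sge_count`, `data_xferlen`, `iov_base` and `iov_len` are taken from the uio dump in the description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified stand-ins for the ioctl packet. Field names
// follow the uio dump in this ticket; the layout is NOT the driver's.
struct demo_sge {
  void * iov_base;
  size_t iov_len;
};

struct demo_iocpacket {
  uint32_t sge_count;     // number of valid entries in sgl[]
  uint32_t data_xferlen;  // bytes to transfer
  demo_sge sgl[16];
};

// Fill the transfer fields so that sge_count is nonzero only when there
// is actually a buffer to describe.
inline void demo_set_transfer(demo_iocpacket & uio, void * data, size_t data_len)
{
  // A non-empty transfer must come with a buffer.
  assert(data_len == 0 || data != nullptr);

  uio = demo_iocpacket();  // value-initialize: every field zero

  if (data_len > 0) {
    uio.sge_count = 1;
    uio.data_xferlen = static_cast<uint32_t>(data_len);
    uio.sgl[0].iov_base = data;
    uio.sgl[0].iov_len = data_len;
  }
  // For data_len == 0 (e.g. the SEND DIAGNOSTIC in this report),
  // sge_count stays 0 and sgl[] stays empty.
}
```

With a helper like this, the no-data command from the report would be submitted with sge_count = 0 instead of sge_count = 1 alongside an empty iov entry.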

comment:3 Changed 8 years ago by Christian Franke

Resolution: fixed
Status: changed from accepted to closed