Opened 14 years ago
Closed 14 years ago
#78 closed defect (fixed)
Smartctl segmentation fault and crash followed by kernel invalid opcode trace
Reported by: | mhlavink | Owned by: | Christian Franke |
---|---|---|---|
Priority: | major | Milestone: | Release 5.41 |
Component: | all | Version: | 5.39.1 |
Keywords: | megaraid linux | Cc: |
Description
I got the following bug report from a Fedora user; let me know if you need any other information.
Smartctl segmentation fault and crash when asking for SMART test of
a disk on a DELL MegaRaid controller.
How reproducible:
Always reproducible
Steps to Reproduce:
- smartctl -t short /dev/sda -d megaraid,0
- segmentation fault and crash
Actual results:
smartctl 5.39.1 2010-01-28 r3054 [x86_64-redhat-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
Segmentation fault
Message from syslogd@webster at Mar 29 14:45:01 ...
kernel:invalid opcode: 0000 #8 SMP
kernel:last sysfs file:
/sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
kernel:Stack:
kernel:Call Trace:
kernel:Code: 00 08 00 00 49 c1 ef 0b 4c 8b 75 c0 49 81 c6 ff 07 00 00 49 c1 ee
0b 48 81 7d c0 01 10 00 00 45 19 ed 41 83 c5 02 45 85 f6 75 04 <0f> 0b eb fe 48
c7 c7 c0 7a a0 81 45 89 ec e8 a9 10 22 00 49 89
Additional info:
DELL PowerEdge R710 with 2 Xeon E5530 with 8GB, running F12 x86_64
LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04)
The stack trace of smartmontools at the system call that causes the problem is
a little hard to get because the crash happens in the kernel so you can't just
run the debugger to the error (stack is gone by that time), but it seems that
the problem is in an ioctl:
#0 os_linux::linux_megaraid_device::megasas_cmd (this=0x7ffff821a030,
cdbLen=<value optimized out>, cdb=0x7fffffffc8f0, dataLen=-134715736, data=0x0)
at os_linux.cpp:1112
#1 0x00007ffff7fd682f in os_linux::linux_megaraid_device::scsi_pass_through
(this=<value optimized out>, iop=0x7fffffffc870) at os_linux.cpp:1076
#2 0x00007ffff7fcba52 in scsiSendDiagnostic (device=0x7ffff821a030,
functioncode=<value optimized out>, pBuf=<value optimized out>, bufLen=<value
optimized out>) at scsicmds.cpp:722
#3 0x00007ffff7fcbb9f in scsiSmartExtendSelfTest (device=<value optimized
out>) at scsicmds.cpp:1699
#4 0x00007ffff7fd45ad in scsiPrintMain (device=<value optimized out>,
options=<value optimized out>) at scsiprint.cpp:1703
#5 0x00007ffff7fbbcf2 in main_worker (argc=<value optimized out>, argv=<value
optimized out>) at smartctl.cpp:951
#6 0x00007ffff7fbc049 in main (argc=<value optimized out>, argv=<value
optimized out>) at smartctl.cpp:967
line 1112 of os_linux.cpp is
rc = ioctl(m_fd, MEGASAS_IOC_FIRMWARE, &uio);
where uio is:
{host_no = 2, pad1 = 0, sgl_off = 48, sge_count = 1, sense_off = 0, sense_len = 0, frame = {
raw = "\004\000\377\000\000\000\006\001\000\000\000\000\000\000\000\000\020", '\000'<repeats 15 times>, "\035@", '\000' <repeats 93 times>, hdr = {cmd = 4 '\004', sense_len = 0 '\000', cmd_status = 255 '\377', scsi_status = 0 '\000', target_id = 0 '\000', lun = 0 '\000', cdb_len = 6 '\006', sge_count = 1 '\001', context = 0, pad_0 = 0, flags = 16, timeout = 0, data_xferlen = 0}}, sgl = {{iov_base = 0x0, iov_len = 0} <repeats 16 times>}}
I don't know if it's useful, but the last non-hardware-specific call up the
stack is line 722 of scsicmds.cpp:
if (!device->scsi_pass_through(&io_hdr));
at that point, io_hdr is
$1 = {cmnd = 0x7fffffffc8f0 "\035@", cmnd_len = 6, dxfer_dir = 0, dxferp = 0x0, dxfer_len = 0, sensep = 0x7fffffffc8d0 "HITACHI ", max_sense_len = 32, timeout = 18000, resp_sense_len = 0, scsi_status = 0 '\000', resid = 0}
Change History (3)
comment:1 by , 14 years ago
Keywords: | megaraid linux added |
---|
comment:2 by , 14 years ago
Milestone: | → Release 5.41 |
---|---|
Owner: | changed from | to
Status: | new → accepted |
A similar problem was reported on the smartmontools-support list. Smartmontools passes sge_count=1 to the ioctl even if dataLen==0. The megaraid driver does not handle this case properly. This patch for os_linux.cpp should fix this.
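For readers without the attachment, the idea can be sketched roughly as follows. This is not the attached patch: the `sketch_*` structs and the `sketch_set_sgl` helper are hypothetical, simplified stand-ins that model only a few of the fields visible in the uio dump above (sge_count, data_xferlen and the sgl array). The assumption is simply that a scatter/gather entry should be announced to the driver only when there is a data buffer to describe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified stand-ins for the ioctl structures.
// Only fields that appear in the uio dump in this ticket are modelled.
struct sketch_iovec {
  void * iov_base;
  std::size_t iov_len;
};

struct sketch_frame_hdr {
  std::uint8_t sge_count;      // per-frame scatter/gather element count
  std::uint32_t data_xferlen;  // number of bytes to transfer
};

struct sketch_iocpacket {
  std::uint32_t sge_count;     // packet-level scatter/gather element count
  sketch_frame_hdr hdr;
  sketch_iovec sgl[16];
};

// Describe the data buffer to the driver only if there is one.
// For commands without a data phase (dataLen == 0) every
// scatter/gather field is left at zero instead of sge_count = 1.
inline void sketch_set_sgl(sketch_iocpacket & uio, void * data, std::size_t dataLen)
{
  // A non-empty transfer needs a real buffer.
  assert(dataLen == 0 || data != nullptr);

  // Start from the "no data" state.
  uio.sge_count = 0;
  uio.hdr.sge_count = 0;
  uio.hdr.data_xferlen = 0;
  uio.sgl[0].iov_base = nullptr;
  uio.sgl[0].iov_len = 0;

  if (dataLen > 0) {
    uio.sge_count = 1;
    uio.hdr.sge_count = 1;
    uio.hdr.data_xferlen = static_cast<std::uint32_t>(dataLen);
    uio.sgl[0].iov_base = data;
    uio.sgl[0].iov_len = dataLen;
  }
}
```

With a helper like this, the self-test command from the report (dxferp = 0x0, dxfer_len = 0) would reach the ioctl with sge_count = 0 rather than one element pointing at a NULL, zero-length buffer.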