Opened 5 weeks ago

Last modified 5 weeks ago

#1247 new defect

LVM on M.2 errors in smartctl

Reported by: w1t3c Owned by:
Priority: minor Milestone: undecided
Component: smartctl Version: 6.6
Keywords: nvme Cc:

Description

Hello,
One of my friends (a Java developer) gets random OS freezes on his new workstation.
I guess it could be an SSD issue, but I'm not sure how to interpret the smartctl output.
Please check the outputs below.

root@Workstation-TT:~# smartctl -V

smartctl 6.6 2016-05-31 r4324 [x86_64-linux-5.0.0-31-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

smartctl comes with ABSOLUTELY NO WARRANTY. This is free
software, and you are welcome to redistribute it under
the terms of the GNU General Public License; either
version 2, or (at your option) any later version.
See http://www.gnu.org for further details.

smartmontools release 6.6 dated 2016-05-07 at 11:17:46 UTC
smartmontools SVN rev 4324 dated 2016-05-31 at 20:45:50
smartmontools build host: x86_64-pc-linux-gnu
smartmontools build with: C++98, GCC 5.4.0 20160609
smartmontools configure arguments: '--prefix=/usr' '--build=x86_64-linux-gnu' '--host=x86_64-linux-gnu' '--sysconfdir=/etc' '--mandir=/usr/share/man' '--with-initscriptdir=no' '--docdir=/usr/share/doc/smartmontools' '--with-savestates=/var/lib/smartmontools/smartd.' '--with-attributelog=/var/lib/smartmontools/attrlog.' '--with-exampledir=/usr/share/doc/smartmontools/examples/' '--with-drivedbdir=/var/lib/smartmontools/drivedb' '--with-systemdsystemunitdir=/lib/systemd/system' '--with-smartdscriptdir=/usr/share/smartmontools' '--with-smartdplugindir=/etc/smartmontools/smartd_warning.d' '--with-systemdenvfile=/etc/default/smartmontools' '--with-selinux' 'build_alias=x86_64-linux-gnu' 'host_alias=x86_64-linux-gnu' 'CXXFLAGS=-g -O2 -fPIE -fstack-protector-strong -Wformat -Werror=format-security -fsigned-char -Wall -O2' 'LDFLAGS=-Wl,-Bsymbolic-functions -fPIE -pie -Wl,-z,relro -Wl,-z,now' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2' 'CFLAGS=-g -O2 -fPIE -fstack-protector-strong -Wformat -Werror=format-security -fsigned-char -Wall -O2'

PC spec:
RAM ADATA 16GB DDR4 3200MHz CL16
CPU AMD RYZEN 5 3600
GPU PALIT CUDA GT710 2GB
SSD ADATA 256GB M.2 PCIe NVMe XPG GAMMIX S5
MB MSI B450M PRO-VDH PLUS
PSU SilentiumPC Zephyr 120

root@Workstation-TT:~# lsb_release -a

No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.3 LTS
Release:        18.04
Codename:       bionic

root@Workstation-TT:~# df -h

Filesystem                   Size  Used Avail Use% Mounted on
udev                         7,8G     0  7,8G   0% /dev
tmpfs                        1,6G  2,1M  1,6G   1% /run
/dev/mapper/ubuntu--vg-root  233G   30G  192G  14% /
tmpfs                        7,9G  127M  7,8G   2% /dev/shm
tmpfs                        5,0M  4,0K  5,0M   1% /run/lock
tmpfs                        7,9G     0  7,9G   0% /sys/fs/cgroup
/dev/loop1                    55M   55M     0 100% /snap/core18/1192
/dev/loop2                   150M  150M     0 100% /snap/gnome-3-28-1804/71
/dev/nvme0n1p1               511M  6,1M  505M   2% /boot/efi
/dev/loop3                    55M   55M     0 100% /snap/core18/1144
/dev/loop5                   4,2M  4,2M     0 100% /snap/gnome-calculator/406
/dev/loop4                   218M  218M     0 100% /snap/wine-platform-runtime/30
/dev/loop0                    74M   74M     0 100% /snap/wine-platform-3-stable/6
/dev/loop6                   3,8M  3,8M     0 100% /snap/gnome-system-monitor/100
/dev/loop7                   150M  150M     0 100% /snap/gnome-3-28-1804/67
/dev/loop8                    15M   15M     0 100% /snap/gnome-characters/296
/dev/loop10                   15M   15M     0 100% /snap/gnome-characters/317
/dev/loop9                   218M  218M     0 100% /snap/wine-platform-runtime/37
/dev/loop11                   90M   90M     0 100% /snap/core/7713
/dev/loop12                   89M   89M     0 100% /snap/core/7270
/dev/loop13                   43M   43M     0 100% /snap/gtk-common-themes/1313
/dev/loop14                  1,0M  1,0M     0 100% /snap/gnome-logs/81
/dev/loop15                  147M  147M     0 100% /snap/slack/17
/dev/loop16                  3,9M  3,9M     0 100% /snap/notepad-plus-plus/212
/dev/loop17                  4,3M  4,3M     0 100% /snap/gnome-calculator/501
/dev/loop18                  1,0M  1,0M     0 100% /snap/gnome-logs/73
tmpfs                        1,6G   12K  1,6G   1% /run/user/121
tmpfs                        1,6G   56K  1,6G   1% /run/user/1000
/dev/loop19                  147M  147M     0 100% /snap/slack/18
tmpfs                        1,6G     0  1,6G   0% /run/user/1001


root@Workstation-TT:~# smartctl -a /dev/nvme0n1

smartctl 6.6 2016-05-31 r4324 [x86_64-linux-5.0.0-31-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org


=== START OF INFORMATION SECTION ===
Model Number:                       XPG GAMMIX S5
Serial Number:                      2J2320002639
Firmware Version:                   V9001c19
PCI Vendor/Subsystem ID:            0x10ec
IEEE OUI Identifier:                0x00e04c
Controller ID:                      1
Number of Namespaces:               1
Namespace 1 Size/Capacity:          256 060 514 304 [256 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Thu Oct 10 14:06:09 2019 CEST
Firmware Updates (0x02):            1 Slot
Optional Admin Commands (0x0006):   Format Frmw_DL
Optional NVM Commands (0x0014):     DS_Mngmt Sav/Sel_Feat
Maximum Data Transfer Size:         32 Pages
Warning  Comp. Temp. Threshold:     118 Celsius
Critical Comp. Temp. Threshold:     150 Celsius


Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     8.00W       -        -    0  0  0  0        0       0
 1 +     4.00W       -        -    1  1  1  1        0       0
 2 +     3.00W       -        -    2  2  2  2        0       0
 3 -   0.0128W       -        -    3  3  3  3     4000    8000
 4 -   0.0080W       -        -    4  4  4  4     8000   30000


Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0


=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED


SMART/Health Information (NVMe Log 0x02, NSID 0x1)
Critical Warning:                   0x00
Temperature:                        36 Celsius
Available Spare:                    100%
Available Spare Threshold:          32%
Percentage Used:                    0%
Data Units Read:                    703 878 [360 GB]
Data Units Written:                 998 067 [511 GB]
Host Read Commands:                 4 588 622
Host Write Commands:                5 763 985
Controller Busy Time:               0
Power Cycles:                       23
Power On Hours:                     20
Unsafe Shutdowns:                   9
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0


Error Information (NVMe Log 0x01, max 8 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0          1     0  0x0000  0x0000  0x000            0     0     -
  6 1219368206019409473     0  0x0000  0x0000  0x000            0     0     -
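On reading the SMART/Health counters above: the bracketed sizes next to Data Units Read/Written follow from the NVMe definition of one data unit as 1,000 units of 512 bytes. A minimal Python sketch of that arithmetic, using the counters from this report (the function name `data_units_to_gb` is hypothetical, for illustration only):

```python
# One NVMe "data unit" = 1000 * 512 bytes (the SMART/Health log reports
# data read/written in thousands of 512-byte units).
DATA_UNIT_BYTES = 1000 * 512


def data_units_to_gb(units: int) -> int:
    """Convert an NVMe data-unit counter to decimal gigabytes, rounded."""
    return round(units * DATA_UNIT_BYTES / 10**9)


if __name__ == "__main__":
    # Counters copied from the smartctl -a output in this ticket.
    print("Data Units Read:   ", data_units_to_gb(703_878), "GB")  # 360 GB
    print("Data Units Written:", data_units_to_gb(998_067), "GB")  # 511 GB
```

For these two counters the sketch reproduces the 360 GB and 511 GB that smartctl shows in brackets.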

root@Workstation-TT:~# nvme --smart-log /dev/nvme0n1

Smart Log for NVME device:nvme0n1 namespace-id:ffffffff
critical_warning                    : 0
temperature                         : 36 C
available_spare                     : 100%
available_spare_threshold           : 32%
percentage_used                     : 0%
data_units_read                     : 703 883
data_units_written                  : 998 825
host_read_commands                  : 4 588 883
host_write_commands                 : 5 770 518
controller_busy_time                : 0
power_cycles                        : 23
power_on_hours                      : 20
unsafe_shutdowns                    : 9
media_errors                        : 0
num_err_log_entries                 : 0
Warning Temperature Time            : 0
Critical Composite Temperature Time : 0
Thermal Management T1 Trans Count   : 0
Thermal Management T2 Trans Count   : 0
Thermal Management T1 Total Time    : 0
Thermal Management T2 Total Time    : 0


root@Workstation-TT:~# nvme --error-log /dev/nvme0n1

Error Log Entries for device:nvme0n1 entries:8
.................
 Entry[ 0]
.................
error_count  : 1
sqid         : 0
cmdid        : 0
status_field : 0(SUCCESS)
parm_err_loc : 0
lba          : 0
nsid         : 0
vs           : 0
.................
 Entry[ 1]
.................
error_count  : 0
sqid         : 0
cmdid        : 0
status_field : 0(SUCCESS)
parm_err_loc : 0
lba          : 0
nsid         : 0
vs           : 0
.................
 Entry[ 2]
.................
error_count  : 0
sqid         : 0
cmdid        : 0
status_field : 0(SUCCESS)
parm_err_loc : 0
lba          : 0
nsid         : 0
vs           : 0
.................
 Entry[ 3]
.................
error_count  : 0
sqid         : 0
cmdid        : 0
status_field : 0(SUCCESS)
parm_err_loc : 0
lba          : 0
nsid         : 0
vs           : 0
.................
 Entry[ 4]
.................
error_count  : 0
sqid         : 0
cmdid        : 0
status_field : 0(SUCCESS)
parm_err_loc : 0
lba          : 0
nsid         : 0
vs           : 0
.................
 Entry[ 5]
.................
error_count  : 0
sqid         : 0
cmdid        : 0
status_field : 0(SUCCESS)
parm_err_loc : 0
lba          : 0
nsid         : 0
vs           : 0
.................
 Entry[ 6]
.................
error_count  : 1219368206019409473
sqid         : 0
cmdid        : 0
status_field : 0(SUCCESS)
parm_err_loc : 0
lba          : 0
nsid         : 0
vs           : 0
.................
 Entry[ 7]
.................
error_count  : 0
sqid         : 0
cmdid        : 0
status_field : 0(SUCCESS)
parm_err_loc : 0
lba          : 0
nsid         : 0
vs           : 0
.................
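Each of the eight entries above is a 64-byte record in NVMe log page 0x01. A rough Python sketch of decoding the fields that both smartctl and nvme-cli print, assuming the layout from the NVMe base specification (little-endian: Error Count in bytes 0-7, SQID 8-9, CID 10-11, Status Field 12-13, Parameter Error Location 14-15, LBA 16-23, NSID 24-27, vendor-specific flag byte 28). It also flags entries that disagree with the SMART/Health log's lifetime counter (`num_err_log_entries : 0` above), under the assumption that a valid Error Count should not exceed that counter. The names `parse_entries` and `suspicious_entries` are hypothetical; the input is assumed to be a raw dump of the log page, and the demo builds a synthetic one from the values shown above rather than reading the real drive.

```python
import struct
from typing import List, NamedTuple

ENTRY_SIZE = 64
# Little-endian: error_count(8) sqid(2) cmdid(2) status(2) parm_err_loc(2)
# lba(8) nsid(4) vs(1) = 29 bytes; the remaining 35 bytes are not decoded here.
ENTRY_FMT = "<QHHHHQIB"


class ErrorEntry(NamedTuple):
    error_count: int
    sqid: int
    cmdid: int
    status_field: int
    parm_err_loc: int
    lba: int
    nsid: int
    vs: int


def parse_entries(log: bytes) -> List[ErrorEntry]:
    """Split a raw Error Information log dump into decoded 64-byte entries."""
    if len(log) % ENTRY_SIZE:
        raise ValueError("log length is not a multiple of 64 bytes")
    return [ErrorEntry(*struct.unpack_from(ENTRY_FMT, log, off))
            for off in range(0, len(log), ENTRY_SIZE)]


def suspicious_entries(entries: List[ErrorEntry], lifetime_total: int) -> List[int]:
    """Indices of entries whose Error Count exceeds the SMART lifetime counter.

    Unused slots (Error Count 0) are never flagged, since the counter is >= 0.
    """
    return [i for i, e in enumerate(entries) if e.error_count > lifetime_total]


if __name__ == "__main__":
    # Synthetic dump mirroring this ticket: entry 0 has count 1, entry 6 has
    # the huge count, every other field is zero.
    counts = [1, 0, 0, 0, 0, 0, 1219368206019409473, 0]
    raw = b"".join(struct.pack("<Q", c) + bytes(ENTRY_SIZE - 8) for c in counts)
    entries = parse_entries(raw)
    print(entries[6])
    # num_err_log_entries from `nvme --smart-log` was 0.
    print(suspicious_entries(entries, lifetime_total=0))  # [0, 6]
```

With a lifetime counter of 0, both populated entries (0 and 6) are flagged, and neither carries a non-zero Status Field, which is consistent with the log content not being meaningful.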


Change History (2)

comment:1 Changed 5 weeks ago by w1t3c

Milestone: Release 7.1

comment:2 Changed 5 weeks ago by Christian Franke

Keywords: nvme added; lvme m.2 ssd error removed
Milestone: changed from Release 7.1 to undecided

> Error Information (NVMe Log 0x01, max 8 entries)
> Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
>   0          1     0  0x0000  0x0000  0x000            0     0     -
>   6 1219368206019409473     0  0x0000  0x0000  0x000            0     0     -
>
>  Entry[ 6]
> .................
> error_count  : 1219368206019409473

smartctl apparently prints the NVMe Error Information log as returned by the drive.

The Error Count 1219368206019409473 = 0x10ec10ec42313641 looks bogus. The Error Information log is completely useless because the Status and other fields are not set to meaningful values. This suggests that at least the drive firmware's implementation of this log is buggy.
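The decimal-to-hex conversion above is easy to reproduce. A short Python sketch splitting the reported Error Count into its two 32-bit halves; the observation that the upper half repeats 0x10ec, the same value smartctl printed as the PCI Vendor/Subsystem ID for this drive, is only a pattern in the numbers, not a diagnosis:

```python
# Error Count reported for entry 6 of the drive's Error Information log.
count = 1219368206019409473

high = count >> 32          # upper 32 bits
low = count & 0xFFFFFFFF    # lower 32 bits

print(hex(count))           # 0x10ec10ec42313641
print(hex(high), hex(low))  # 0x10ec10ec 0x42313641

# Both 16-bit halves of the upper word equal 0x10ec, which matches the
# "PCI Vendor/Subsystem ID: 0x10ec" line in the smartctl output.
print((high >> 16) == 0x10EC and (high & 0xFFFF) == 0x10EC)  # True
```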

I don't see any bug in smartctl.

PS: This bug tracker is not a support forum. For future support requests, please use the smartmontools-support mailing list instead. In future bug reports, please don't set a milestone.
