Opened 4 months ago
Last modified 2 months ago
#1835 accepted enhancement
Smartd should also ignore 'Set Feature' related errors from NVMe Error Information log
Reported by: | Christian Franke | Owned by: | Christian Franke |
---|---|---|---|
Priority: | minor | Milestone: | Release 7.5 |
Component: | smartd | Version: | 7.4 |
Keywords: | nvme | Cc: | BrianG |
Description
Recent comments from ticket #1222 show that the NVMe error "Feature Identifier Not Saveable" (SCT=0x1, SC=0x0d) may also appear in the error log after each reboot:
Num | ErrCount | SQId | CmdId | Status | PELoc | LBA | NSID | VS | Message |
---|---|---|---|---|---|---|---|---|---|
0 | 4813 | 0 | 0x2010 | 0x4004 | - | 0 | 1 | - | Invalid Field in Command |
1 | 4812 | 0 | 0x0010 | 0x4004 | - | 0 | 0 | - | Invalid Field in Command |
2 | 4811 | 0 | 0x001b | 0x421a | 0x028 | 0 | 0 | - | Feature Identifier Not Saveable |
3 | 4810 | 0 | 0x0012 | 0x4004 | - | 0 | 0 | - | Invalid Field in Command |
4 | 4809 | 0 | 0x3007 | 0x4004 | - | 0 | 1 | - | Invalid Field in Command |
5 | 4808 | 0 | 0x1003 | 0x4004 | - | 0 | 0 | - | Invalid Field in Command |
6 | 4807 | 0 | 0x001b | 0x421a | 0x028 | 0 | 0 | - | Feature Identifier Not Saveable |
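For readers who want to check the SCT/SC values quoted above against the Status column, here is a minimal Python sketch. It assumes the Status column is the raw 16-bit status field of the error log entry, laid out as in the NVMe base specification (bit 0 phase tag, bits 8:1 SC, bits 11:9 SCT, bit 14 More, bit 15 DNR). The names `NvmeStatus` and `decode_status` are made up for this illustration and are not part of smartmontools.

```python
from typing import NamedTuple


class NvmeStatus(NamedTuple):
    """Decoded parts of a 16-bit NVMe status field (illustrative type)."""
    sct: int    # Status Code Type
    sc: int     # Status Code
    more: bool  # More bit: further details are available in the error log
    dnr: bool   # Do Not Retry bit


def decode_status(status_field: int) -> NvmeStatus:
    """Split a raw status field, assuming bit 0 is the phase tag."""
    s = status_field >> 1  # drop the phase tag
    return NvmeStatus(
        sct=(s >> 8) & 0x7,
        sc=s & 0xFF,
        more=bool(s & 0x2000),
        dnr=bool(s & 0x4000),
    )


if __name__ == "__main__":
    for raw in (0x421A, 0x4004):
        st = decode_status(raw)
        print(f"0x{raw:04x}: SCT=0x{st.sct:x}, SC=0x{st.sc:02x}, "
              f"More={st.more}, DNR={st.dnr}")
```

Under that layout, 0x421a decodes to SCT=0x1, SC=0x0d, and 0x4004 decodes to SCT=0x0, SC=0x02, which agrees with the messages shown in the table.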
smartd syslog:
2024-05-25T14:44:49+0000 smartd[834]: Device: /dev/nvme0, Samsung SSD 960 PRO 512GB, S/N:***************, FW:2B6QCXP7, 512 GB
...
2024-05-25T14:44:49+0000 smartd[834]: Device: /dev/nvme0, NVMe error [1], count 4811, status 0x421a: Feature Identifier Not Saveable
2024-05-25T14:44:49+0000 smartd[834]: Device: /dev/nvme0, NVMe error count increased from 4808 to 4812 (1 new, 3 ignored, 0 unknown)
This suggests that the kernel (or another component run during boot) issues a Set Features NVMe command with SV (Save) bit set without a prior check whether this bit is supported. If the kernel does it, this is IMO a kernel bug.
Smartd should also ignore this error.
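To illustrate what this request would change, the following Python sketch reconciles the syslog summary with the log entries. It is not smartd's actual implementation: the function `summarize`, the `IGNORED` set, and the reading of "unknown" as entries implied by the count but missing from the log are all assumptions made for this example. The ignore set is inferred only from the output shown in this ticket.

```python
# Hypothetical ignore set of (SCT, SC) pairs, inferred from the log above:
# (0x0, 0x02) = Invalid Field in Command.
IGNORED = {(0x0, 0x02)}

# (SCT, SC) of "Feature Identifier Not Saveable", as quoted in the description.
FEATURE_NOT_SAVEABLE = (0x1, 0x0D)


def summarize(entries, prev_count, ignored_codes):
    """Return (new, ignored, unknown) for entries newer than prev_count.

    entries: iterable of (error_count, raw_status_field) pairs.
    Assumes bit 0 of the status field is the phase tag.
    """
    entries = list(entries)
    cur_count = max((count for count, _ in entries), default=prev_count)
    new = ignored = 0
    for count, status in entries:
        if count <= prev_count:
            continue  # already seen in an earlier check
        s = status >> 1
        key = ((s >> 8) & 0x7, s & 0xFF)
        if key in ignored_codes:
            ignored += 1
        else:
            new += 1
    # Assumption: "unknown" counts errors implied by the counter but not listed.
    unknown = max(cur_count - prev_count, 0) - new - ignored
    return new, ignored, unknown


# Entries visible at the time of the syslog excerpt (counts 4807..4812).
log = [
    (4812, 0x4004), (4811, 0x421A), (4810, 0x4004),
    (4809, 0x4004), (4808, 0x4004), (4807, 0x421A),
]

if __name__ == "__main__":
    print(summarize(log, 4808, IGNORED))
    print(summarize(log, 4808, IGNORED | {FEATURE_NOT_SAVEABLE}))
```

With only Invalid Field in Command in the set, the sketch gives the same split as the syslog line (1 new, 3 ignored, 0 unknown). Adding the Feature Identifier Not Saveable code moves the remaining entry into the ignored group.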
Change History (7)
comment:1 by , 4 months ago
Owner: | set to |
---|---|
Status: | new → accepted |
comment:2 by , 4 months ago
comment:3 by , 4 months ago
Christian, thanks a lot for the prompt reply!
Is it possible to identify the offending component? Let me know if I can help with any additional information.
comment:4 by , 4 months ago
If a restart of services without reboot (e.g. systemctl rescue and then ^D, or systemctl soft-reboot) results in new Feature Identifier Not Saveable log entries, some of the restarted services might be the root of the problem.
I could not reproduce the behavior on a system with Debian 12 (Console/SSH only, no GUI) using a Samsung SSD 970 EVO Plus 500GB. Only one Invalid Field in Command appears after each reboot.
AFAICS from the Linux kernel sources, there is support for NVMe Set Features commands, see core.c. This is (only?) used by pci.c to change the power state and by hwmon.c to change the temperature thresholds.
comment:5 by , 4 months ago
I am still following this ticket and #1222 because I still receive NVMe errors.
Now with another Linux build and a newer version of smartmontools:
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
The following warning/error was logged by the smartd daemon:
Device: /dev/nvme0, number of Error Log entries increased from 1098 to 1099
Device info:
SAMSUNG MZVLB256HAHQ-000H1, S/N:S425NX1M782076, FW:EXD70H1Q, 256 GB
comment:6 by , 4 months ago
@ThoughtPolice84, the new behavior was added in build [5472]. You need to update smartmontools to version 7.4.
Christian, let me know if I can help with any additional information.
comment:7 by , 2 months ago
FWIW, I moved my hard drive to a different machine, and just turning the system on registers new error entries, even without mounting any partition of the disk.
6.9.7-1~bpo12+1 (2024-07-03) x86_64 GNU/Linux
smartctl 7.4 2023-08-01 r5530
I take that back because this device indicates SV bit (Sav/Sel_Feat) support:
The individual Feature ID used may not support SV.