Opened 2 years ago

Closed 6 months ago

Last modified 6 months ago

#1521 closed defect (invalid)

Wearout not correct for Transcend TS512GMTE220S

Reported by: jongeren Owned by:
Priority: minor Milestone:
Component: smartctl Version:
Keywords: nvme Cc: slootnomtrams

Description

I use Proxmox in combination with the TS512GMTE220S.

The wearout is now 8%. But when I put this drive in a Windows machine, the wearout is 1%. So the information is incorrect.

The rated TBW for this drive series, by capacity, is:
Capacity  Endurance
2 TB      4,400 TBW
1 TB      2,200 TBW
512 GB    1,100 TBW
256 GB    550 TBW

Can you fix it?

Attachments (2)

wearoutimg.png (42.9 KB) - added by jongeren 2 years ago.
smartmontoollog.txt (21.3 KB) - added by jongeren 6 months ago.

Change History (11)

comment:1 Changed 2 years ago by Christian Franke

Milestone: undecided

Please provide more details including smartctl -x output(s) of this device.

comment:2 Changed 2 years ago by jongeren

root@server4:~# smartctl /dev/nvme2n1 -x
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.4.128-1-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       TS512GMTE220S
Serial Number:                      G35XXX0001
Firmware Version:                   42B4SAUA
PCI Vendor/Subsystem ID:            0x126f
IEEE OUI Identifier:                0x000000
Controller ID:                      1
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          512,110,190,592 [512 GB]
Namespace 1 Utilization:            493,156,945,920 [493 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Thu Sep 16 17:25:47 2021 CEST
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x0f):         S/H_per_NS Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size:         32 Pages
Warning  Comp. Temp. Threshold:     75 Celsius
Critical Comp. Temp. Threshold:     80 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     9.00W       -        -    0  0  0  0        0       0

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED


SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        45 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    8%
Data Units Read:                    16,764,860 [8.58 TB]
Data Units Written:                 28,891,904 [14.7 TB]
Host Read Commands:                 330,706,243
Host Write Commands:                709,229,081
Controller Busy Time:               5,083
Power Cycles:                       359
Power On Hours:                     4,300
Unsafe Shutdowns:                   23
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Thermal Temp. 1 Transition Count:   18
Thermal Temp. 1 Total Time:         97

Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged

Last edited 2 years ago by Christian Franke

comment:3 Changed 2 years ago by Christian Franke

The wearout is now 8%. But when I put this drive in a Windows machine, the wearout is 1%. So the information is incorrect.

Which tools display this information?

NVMe drives do not provide a wearout percentage.

Changed 2 years ago by jongeren

Attachment: wearoutimg.png added

comment:4 Changed 2 years ago by jongeren

That's strange. In the Proxmox GUI I get a wearout of 8% (see the attached image).
The Transcend program for Windows, https://www.transcend-info.com/Support/Software-10/#, gives a wearout of 1%.

comment:5 Changed 2 years ago by Christian Franke

Component: all → smartctl
Keywords: nvme added

Proxmox apparently prints the Percentage Used value from SMART/Health log as Wearout. This makes sense. Quote from NVMe standard:

Byte 5 | Percentage Used: Contains a vendor specific estimate of the percentage of NVM subsystem life used based on the actual usage and the manufacturer’s prediction of NVM life. A value of 100 indicates that the estimated endurance of the NVM in the NVM subsystem has been consumed, but may not indicate an NVM subsystem failure. The value is allowed to exceed 100. Percentages greater than 254 shall be represented as 255.

To make sure that smartctl prints this value correctly, please provide the output of

smartctl -r ioctl,2 -x ...

as a plaintext attachment.

comment:6 Changed 19 months ago by slootnomtrams

Cc: slootnomtrams added

comment:7 Changed 6 months ago by Christian Franke

Milestone: undecided
Resolution: invalid
Status: new → closed

Not a smartctl bug. No feedback from reporter.

Changed 6 months ago by jongeren

Attachment: smartmontoollog.txt added

comment:8 Changed 6 months ago by jongeren

The log is attached to this ticket. Apologies for the delay.

The wearout is now 105% (125 TB written), but the drive has a rated TBW of 1,100 TB, so the info is not correct.
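
For reference, the arithmetic behind this comparison can be sketched as follows. This is a minimal sketch, assuming the NVMe convention that one data unit is 1,000 blocks of 512 bytes and using the 1,100 TBW rating quoted in the description. The function names are hypothetical and not part of smartctl. It only shows the share of the rated TBW consumed by host writes, which is not necessarily how the firmware derives Percentage Used.

```python
# One NVMe "data unit" is 1,000 blocks of 512 bytes.
DATA_UNIT_BYTES = 1000 * 512

# Rated endurance of the 512 GB model, as quoted in the ticket description.
RATED_TBW_TB = 1100.0


def data_units_to_tb(units: int) -> float:
    """Convert an NVMe data unit count to decimal terabytes (hypothetical helper)."""
    return units * DATA_UNIT_BYTES / 1e12


def tbw_share_percent(written_tb: float, rated_tbw_tb: float) -> float:
    """Host writes as a percentage of the rated TBW (hypothetical helper)."""
    return 100.0 * written_tb / rated_tbw_tb


# "Data Units Written" from the first smartctl -x output in comment:2.
early_tb = data_units_to_tb(28_891_904)
early_share = tbw_share_percent(early_tb, RATED_TBW_TB)

# The 125 TB figure reported in comment:8.
late_share = tbw_share_percent(125.0, RATED_TBW_TB)

print(f"{early_tb:.2f} TB written is {early_share:.1f}% of the rated TBW")
print(f"125.00 TB written is {late_share:.1f}% of the rated TBW")
```

With these inputs, host writes come to roughly 1.3% and 11.4% of the rated TBW, while the drive itself reported a Percentage Used of 8% and 105% at those points.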

comment:9 Changed 6 months ago by Christian Franke

...
=== START OF SMART DATA SECTION ===
 [NVMe call: opcode=0x02, size=0x0200, nsid=0xffffffff, cdw10=0x007f0002]
  [Duration: 0.003s]
 [NVMe call succeeded: result=0x00000000]
 00     04 40 01 64 0a 69 00 00  00 00 00 00 00 00 00 00    .@.d.i..........
        ||             ^^ ------- Percentage Used 0x69 = 105
        ^^----------------------- Critical Warning 0x04 = "reliability ..."
...
SMART overall-health self-assessment test result: FAILED!
- NVM subsystem reliability has been degraded

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x04
...
Percentage Used:                    105%
...

Conclusion: smartctl prints the info as provided by the drive firmware. The firmware also sets Critical Warning bit 2, which is usually the case once Percentage Used has reached 100%.

Either the 105% is bogus information due to a firmware bug - or their marketing world does not match the real world :-)
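
The decoding of the raw log bytes shown above can be reproduced with a short sketch. It assumes the standard layout of the first six bytes of the NVMe SMART/Health log (Critical Warning, Composite Temperature in Kelvin, Available Spare, Available Spare Threshold, Percentage Used). The variable names are illustrative only, and this is not the code smartctl itself uses.

```python
import struct

# First 16 bytes of the SMART/Health log (NVMe Log 0x02), copied from the dump above.
raw = bytes.fromhex("04 40 01 64 0a 69 00 00 00 00 00 00 00 00 00 00")

# Little-endian layout of bytes 0..5:
#   byte 0     Critical Warning (bit field)
#   bytes 1-2  Composite Temperature in Kelvin
#   byte 3     Available Spare (%)
#   byte 4     Available Spare Threshold (%)
#   byte 5     Percentage Used (%), allowed to exceed 100
(critical_warning, temp_kelvin, spare,
 spare_threshold, percentage_used) = struct.unpack_from("<BHBBB", raw, 0)

# Bit 2 of Critical Warning: NVM subsystem reliability has been degraded.
reliability_degraded = bool(critical_warning & 0x04)

print(f"Critical Warning:          0x{critical_warning:02x}")
print(f"Reliability degraded:      {reliability_degraded}")
print(f"Temperature:               {temp_kelvin} Kelvin")
print(f"Available Spare:           {spare}%")
print(f"Available Spare Threshold: {spare_threshold}%")
print(f"Percentage Used:           {percentage_used}%")
```

The unpacked values match the annotated dump: byte 0 is 0x04 and byte 5 is 0x69, that is 105.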

Leaving ticket resolution as is.
