Custom Query (1418 matches)
Results (79 - 81 of 1418)
Ticket | Resolution | Summary | Owner | Reporter |
---|---|---|---|---|
#1663 | worksforme | Please fix the error_log on query for NVMe HPE drives (kioxia) | ||
Description |
Smartmontools is causing the error-log of HPE drives to increment constantly. Our drives are hitting 100k+ errors. Related to: https://www.smartmontools.org/ticket/1134 === START OF INFORMATION SECTION === Model Number: TCM615T4P5xnFTRI Serial Number: <removed> Firmware Version: 3P01 PCI Vendor ID: 0x1e0f PCI Vendor Subsystem ID: 0x1590 IEEE OUI Identifier: 0x8ce38e Total NVM Capacity: 15,360,950,534,144 [15.3 TB] Unallocated NVM Capacity: 312,458,870,784 [312 GB] Controller ID: 1 NVMe Version: 1.4 Number of Namespaces: 1 Namespace 1 Size/Capacity: 15,048,491,663,360 [15.0 TB] Namespace 1 Utilization: 2,165,793,648,640 [2.16 TB] Namespace 1 Formatted LBA Size: 4096 Namespace 1 IEEE EUI-64: 8ce38e e209a49501 Local Time is: Sun Oct 30 16:37:26 2022 PDT Firmware Updates (0x16): 3 Slots, no Reset required Optional Admin Commands (0x025f): Security Format Frmw_DL NS_Mngmt Self_Test MI_Snd/Rec Get_LBA_Sts Optional NVM Commands (0x00ff): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Resv Timestmp Verify Log Page Attributes (0x1e): Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg Maximum Data Transfer Size: 8192 Pages Warning Comp. Temp. Threshold: 76 Celsius Critical Comp. Temp. 
Threshold: 82 Celsius Namespace 1 Features (0x10): NP_Fields Supported Power States St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat 0 + 27.50W 25.00W - 0 0 0 0 500000 500000 1 + 19.80W 18.00W - 0 0 1 1 500000 500000 2 + 17.60W 16.00W - 0 0 2 2 500000 500000 3 + 15.40W 14.00W - 1 1 3 3 500000 500000 4 + 12.10W 11.00W - 2 2 4 4 500000 500000 5 + 9.90W 9.00W - 3 3 5 5 500000 500000 6 - 5.00W - - 6 6 6 6 500000 500000 Supported LBA Sizes (NSID 0x1) Id Fmt Data Metadt Rel_Perf 0 - 512 0 0 1 - 512 8 0 2 - 1 0 0 3 + 4096 0 0 4 - 4096 8 0 5 - 4096 64 0 === START OF SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED SMART/Health Information (NVMe Log 0x02) Critical Warning: 0x00 Temperature: 47 Celsius Available Spare: 100% Available Spare Threshold: 26% Percentage Used: 0% Data Units Read: 76,623,421 [39.2 TB] Data Units Written: 59,365,282 [30.3 TB] Host Read Commands: 614,079,901 Host Write Commands: 549,353,795 Controller Busy Time: 544 Power Cycles: 59 Power On Hours: 3,526 Unsafe Shutdowns: 24 Media and Data Integrity Errors: 25 Error Information Log Entries: 214,053 Warning Comp. Temperature Time: 0 Critical Comp. 
Temperature Time: 0 Temperature Sensor 1: 109 Celsius Temperature Sensor 2: 101 Celsius Error Information (NVMe Log 0x01, 16 of 256 entries) Num ErrCount SQId CmdId Status PELoc LBA NSID VS 0 214053 0 0xc00a 0xc004 0x02e - 0 - 1 214052 0 0x7019 0xc004 0x02e - 0 - 2 214051 0 0xc008 0xc004 0x02e - 0 - 3 214050 0 0xb00b 0xc004 0x02e - 0 - 4 214049 0 0x601b 0xc004 0x02e - 0 - 5 214048 0 0x601a 0xc004 0x02e - 0 - 6 214047 0 0x6018 0xc004 0x02e - 0 - 7 214046 0 0x501b 0xc004 0x02e - 0 - 8 214045 0 0x700c 0xc004 0x02e - 0 - 9 214044 0 0x600f 0xc004 0x02e - 0 - 10 214043 0 0x600d 0xc004 0x02e - 0 - 11 214042 0 0x600c 0xc004 0x02e - 0 - 12 214041 0 0x3013 0xc004 0x02e - 0 - 13 214040 0 0x3012 0xc004 0x02e - 0 - 14 214039 0 0x3010 0xc004 0x02e - 0 - 15 214038 0 0x2013 0xc004 0x02e - 0 - --- 2022 Oct 30 16:38:18 truenas Device: /dev/nvme4n1, number of Error Log entries increased from 230917 to 231033 2022 Oct 30 16:38:18 truenas Device: /dev/nvme1n1, number of Error Log entries increased from 214056 to 214172 2022 Oct 30 16:38:18 truenas Device: /dev/nvme0n1, number of Error Log entries increased from 213937 to 214053 2022 Oct 30 16:38:18 truenas Device: /dev/nvme5n2, number of Error Log entries increased from 36162 to 36278 2022 Oct 30 16:38:18 truenas Device: /dev/nvme7n1, number of Error Log entries increased from 27371 to 27487 --- smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.142+truenas] (local build) Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org smartctl comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it under the terms of the GNU General Public License; either version 2, or (at your option) any later version. See http://www.gnu.org for further details. 
smartmontools release 7.2 dated 2020-12-30 at 16:48:30 UTC smartmontools SVN rev 5155 dated 2020-12-30 at 16:49:18 smartmontools build host: x86_64-pc-linux-gnu smartmontools build with: C++14, GCC 10.2.1 20210110 smartmontools configure arguments: '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--localstatedir=/var' '--disable-option-checking' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--runstatedir=/run' '--disable-maintainer-mode' '--disable-dependency-tracking' '--build=x86_64-linux-gnu' '--host=x86_64-linux-gnu' '--prefix=/usr' '--sysconfdir=/etc' '--mandir=/usr/share/man' '--with-initscriptdir=no' '--docdir=/usr/share/doc/smartmontools' '--with-attributelog=/var/lib/smartmontools/attrlog.' '--with-drivedbdir=/var/lib/smartmontools/drivedb' '--with-exampledir=/usr/share/doc/smartmontools/examples/' '--with-savestates=/var/lib/smartmontools/smartd.' '--with-smartdplugindir=/etc/smartmontools/smartd_warning.d' '--with-smartdscriptdir=/usr/share/smartmontools' '--with-systemdenvfile=/etc/default/smartmontools' '--with-systemdsystemunitdir=/lib/systemd/system' '--with-libsystemd=auto' '--with-selinux' 'build_alias=x86_64-linux-gnu' 'host_alias=x86_64-linux-gnu' 'CXXFLAGS=-g -O2 -ffile-prefix-map=/dpkg-src=. -fstack-protector-strong -Wformat -Werror=format-security -fsigned-char -Wall' 'LDFLAGS=-Wl,-z,relro -Wl,-z,now' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2' 'CFLAGS=-g -O2 -ffile-prefix-map=/dpkg-src=. -fstack-protector-strong -Wformat -Werror=format-security -fsigned-char -Wall' |
|||
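The syslog excerpt in #1663 can be checked mechanically. The following is a minimal sketch, not part of the original report: the function name `error_log_deltas` and the regular expression are illustrative assumptions, and the sample lines are copied from the ticket. It extracts the per-device increase in the NVMe error log count from smartd's "number of Error Log entries increased" messages.

```python
import re

# Sample lines copied from the syslog excerpt in ticket #1663.
LOG = """\
2022 Oct 30 16:38:18 truenas Device: /dev/nvme4n1, number of Error Log entries increased from 230917 to 231033
2022 Oct 30 16:38:18 truenas Device: /dev/nvme1n1, number of Error Log entries increased from 214056 to 214172
2022 Oct 30 16:38:18 truenas Device: /dev/nvme0n1, number of Error Log entries increased from 213937 to 214053
2022 Oct 30 16:38:18 truenas Device: /dev/nvme5n2, number of Error Log entries increased from 36162 to 36278
2022 Oct 30 16:38:18 truenas Device: /dev/nvme7n1, number of Error Log entries increased from 27371 to 27487
"""

# Hypothetical pattern matching the message format shown above.
PATTERN = re.compile(
    r"Device: (\S+), number of Error Log entries increased from (\d+) to (\d+)"
)


def error_log_deltas(text):
    """Return {device: total increase in error log entries} for the given log text."""
    deltas = {}
    for match in PATTERN.finditer(text):
        device = match.group(1)
        old, new = int(match.group(2)), int(match.group(3))
        deltas[device] = deltas.get(device, 0) + (new - old)
    return deltas


if __name__ == "__main__":
    for device, delta in sorted(error_log_deltas(LOG).items()):
        print(f"{device}: +{delta}")
```

For the five lines quoted in the ticket, every device shows the same increase of 116 entries in that interval. An identical step across drives with very different absolute counts would be consistent with the errors being triggered by a common polling source rather than by per-drive media problems, though the excerpt alone does not prove this.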
#1677 | worksforme | Unable to Ignore 'unreadable (pending) sectors' Error | ||
Description |
I have many, many machines running SSDs and they periodically generate errors like this: Device: /dev/sdc [SAT], 1 Currently unreadable (pending) sectors The SSDs tend to remap these failures because they go away over time. But how do I stop smartd from e-mailing these alerts? Because of the number of machines we have, we can get up to 100 e-mails a day, which makes it difficult to weed out more serious errors. I have tried the following: DEVICESCAN -I 197 -m gcn-alerts@… -M exec /usr/libexec/smartmontools/smartdnotify -n standby,10,q and DEVICESCAN -m gcn-alerts@… -M exec /usr/libexec/smartmontools/smartdnotify -n standby,10,q -t -I 197 But the e-mails keep coming. Any idea how to suppress the alerts for attribute 197 failures? |
|||
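A possible configuration for #1677, offered as a sketch rather than as the resolution recorded on the ticket. It assumes the behaviour described in the smartd.conf(5) man page: `-I ID` only excludes an attribute from value-change tracking (`-t`, `-p`, `-u`), while the "Currently unreadable (pending) sectors" report is controlled by the separate `-C ID[+]` directive, which defaults to attribute 197. Under that assumption, `-C 197+` reports only when the count increases between check cycles, and `-C 0` turns the report off entirely. The truncated mail address is kept exactly as it appears in the ticket.

```
# Assumption: pending-sector warnings come from -C (default 197), not from -t,
# so -I 197 does not silence them.

# Option 1: warn only when the pending-sector count increases between checks.
DEVICESCAN -a -C 197+ -m gcn-alerts@… -M exec /usr/libexec/smartmontools/smartdnotify -n standby,10,q

# Option 2: disable the pending-sector report completely.
# DEVICESCAN -a -C 0 -m gcn-alerts@… -M exec /usr/libexec/smartmontools/smartdnotify -n standby,10,q
```

Option 1 keeps an alert for drives whose pending count keeps growing, which may be preferable to silencing attribute 197 altogether. smartd has to re-read its configuration (for example via a service restart) before either change takes effect.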
#1692 | worksforme | Seagate DKS2E-H4R0SS is not working correctly with smartctl | ||
Description |
~]$ sudo smartctl -a /dev/sdg smartctl 7.1 2020-04-05 r5049 [x86_64-linux-4.18.0-425.3.1.el8.x86_64] (local build) Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org === START OF INFORMATION SECTION === Vendor: SEAGATE Product: DKS2E-H4R0SS Revision: 7FA6 Compliance: SPC-3 User Capacity: 4,000,787,030,016 bytes [4.00 TB] Logical block size: 512 bytes LU is fully provisioned Rotation Rate: 7200 rpm Form Factor: 3.5 inches Logical Unit id: 0x5000c50085880b2f Serial number: Z1ZB2BGT0000R627VEN0 Device type: disk Transport protocol: SAS (SPL-3) Local Time is: Fri Feb 3 09:09:33 2023 CET SMART support is: Available - device has SMART capability. SMART support is: Enabled Temperature Warning: Enabled === START OF READ SMART DATA SECTION === SMART Health Status: OK Current Drive Temperature: 0 C Drive Trip Temperature: 0 C Elements in grown defect list: 2 Error Counter logging not supported [GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on'] Device does not support Self Test logging [bartosz@nas ~]$ scsi_temperature /dev/sdf sg_logs -t /dev/sdf error opening file: /dev/sdf: Permission denied sg_logs failed: Permission denied ~]$ sudo scsi_temperature /dev/sdg sg_logs -t /dev/sdg SEAGATE DKS2E-H4R0SS 7FA6 Current temperature = 43 C Reference temperature = 68 C ~]$ ~]$ lspci |grep -i sas 02:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03) This fault is similar to #1346. Which version works properly with my SAS disk? |