Custom Query (1111 matches)

Results (73 - 75 of 1111)

Ticket Resolution Summary Owner Reporter
#727 invalid smartctl hangs with Areca 1883 Iavor Stoev
Description

Hello,

We have hundreds of Areca RAID controllers model ARC1231/ARC1882 in production, and we use smartctl to poll the SMART parameters of the drives for our monitoring needs.

Several months ago we started using Areca 1883 controllers in our latest servers.

The new machines are with the following hardware & software setup:

SuperMicro X10DRI-T BIOS v2.0

SuperMicro CA-846BA9B chassis

2 x ARC-1883IX-24 Firmware Version : V1.52 2015-11-20

Each of the Areca controllers has 12 x INTEL P3500 1.2TB SSD drives.

Areca CLI, Version: 1.14.7, Arclib: 350, Date: May 19 2015( Linux )

Debian Jessie Stable with kernel 3.16.17 + latest driver from dkms arcmsr version v1.30.0X.23-20151225

The controllers work fine, but we experience the following issue: after several hours of uptime, the smartctl utility that we invoke every 5 minutes stalls and is unable to display any information. The same is true if we try to use the Areca CLI utility.

When the problem occurs, strace of the smartctl/CLI process shows that it is waiting for sg3 or sg1: open("/dev/sg3" (device or resource busy)

sg_map shows the following info

/dev/sg0 /dev/sda - Areca 1883 /dev/sg1 /dev/sg2 /dev/sdb - Areca 1883 /dev/sg3

If we execute sg_reset for the problem sg device, it sometimes solves the problem, but sometimes leaves the machine with bad I/O performance, and we need to reboot the machine in order to restore the performance.

If we stop using smartctl and switch to the Areca CLI to collect the SMART parameters, everything works fine. The drawback is that the CLI displays only a very limited set of SMART parameters.

We experience the issue with all of the following smartmontools versions: 6.3+svn4002-2+b2, 6.4+svn4214-1, and 6.5+svn4324.

Could you advise what could be done in order to solve this issue? If you need any other information or debug info we will be glad to provide it.

We haven't experienced this issue on any of our servers with Areca 1231 and 1882 controllers and the same hardware and software setup.

Thank you

Iavor Stoev Project Manager Head of System & Network Administration Department ICDSoft Ltd - http://icdsoft.com
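
For a monitoring job that invokes smartctl every 5 minutes, as described above, one possible mitigation is to bound each invocation with a timeout so that a stalled call does not pile up behind the next poll. The following is only a sketch under assumptions: `run_with_timeout` and the 60-second default are hypothetical names and values, not part of smartmontools, and this does not address the underlying hang. A process blocked in an uninterruptible kernel call may not be killable, in which case the wrapper itself can still block.

```python
import subprocess


def run_with_timeout(cmd, timeout_s=60):
    """Run cmd and return (exit_code, stdout).

    Returns (None, "") if the command did not finish within timeout_s
    seconds. subprocess.run() kills the child on timeout before raising.
    The helper name and the default timeout are illustrative assumptions.
    """
    try:
        done = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return None, ""
    return done.returncode, done.stdout
```

A poller could then call something like `run_with_timeout(["smartctl", "-a", "-d", "areca,1", "/dev/sg1"])` and treat a `None` exit code as "controller not responding". The slot number `1` here is an assumption; the actual value depends on the controller layout.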

#781 invalid smartctl for OCZ-AGILITY3 incorrectly reporting temperature of 128C Daniel
Description

When running smartctl in FreeNAS 10 on an OCZ-AGILITY3 drive, I get a Temperature_Celsius reading of 128. The other drives in my system (Western Digital Reds) report values between 25 and 27 C.

[root@freenas] ~# smartctl -a /dev/ada4
smartctl 6.4 2015-06-04 r4109 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     SandForce Driven SSDs
Device Model:     OCZ-AGILITY3
Serial Number:    OCZ-XXXXXXXXXXXXXXXX
LU WWN Device Id: 5 e83a97 f1fb643ef
Firmware Version: 2.15
User Capacity:    60,022,480,896 bytes [60.0 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS, ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Thu Dec 15 17:11:44 2016 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                ( 2097) seconds.
Offline data collection
capabilities:                    (0x7f) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Abort Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  48) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x0021) SCT Status supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   088   088   050    Pre-fail  Always       -       0/40025267
  5 Retired_Block_Count     0x0033   100   100   003    Pre-fail  Always       -       0
  9 Power_On_Hours_and_Msec 0x0032   098   098   000    Old_age   Always       -       2522h+33m+58.310s
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       640
171 Program_Fail_Count      0x0032   000   000   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
174 Unexpect_Power_Loss_Ct  0x0030   000   000   000    Old_age   Offline      -       15
177 Wear_Range_Delta        0x0000   000   000   000    Old_age   Offline      -       3
181 Program_Fail_Count      0x0032   000   000   000    Old_age   Always       -       0
182 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   128   129   000    Old_age   Always       -       128 (0 127 0 129 0)
195 ECC_Uncorr_Error_Count  0x001c   120   120   000    Old_age   Offline      -       0/40025267
196 Reallocated_Event_Count 0x0033   100   100   003    Pre-fail  Always       -       0
201 Unc_Soft_Read_Err_Rate  0x001c   120   120   000    Old_age   Offline      -       0/40025267
204 Soft_ECC_Correct_Rate   0x001c   120   120   000    Old_age   Offline      -       0/40025267
230 Life_Curve_Status       0x0013   100   100   000    Pre-fail  Always       -       100
231 SSD_Life_Left           0x0013   100   100   010    Pre-fail  Always       -       0
233 SandForce_Internal      0x0000   000   000   000    Old_age   Offline      -       2000
234 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       2733
241 Lifetime_Writes_GiB     0x0032   000   000   000    Old_age   Always       -       2733
242 Lifetime_Reads_GiB      0x0032   000   000   000    Old_age   Always       -       4477

SMART Error Log not supported

SMART Self-test Log not supported

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

I can include an actual Serial Number if that's necessary. If you care to view the ticket I originally filed with FreeNAS 10, it's included here.
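
To spot readings like the one above automatically, a script could parse each row of the attribute table and flag a temperature whose raw value is implausible. This is a minimal sketch, assuming the ten-column layout shown in the output above. The helper names and the 100 C cutoff are hypothetical choices for illustration, not smartmontools behavior.

```python
import re


def parse_attribute_row(line):
    """Split one row of the smartctl attribute table into a dict.

    Returns None for lines that are not attribute rows (for example
    the header line starting with "ID#").
    """
    parts = line.split(None, 9)  # the 10th field keeps the whole raw value
    if len(parts) < 10 or not parts[0].isdigit():
        return None
    return {
        "id": int(parts[0]),
        "name": parts[1],
        "value": int(parts[3]),
        "worst": int(parts[4]),
        "thresh": int(parts[5]),
        "raw": parts[9].strip(),
    }


def leading_int(raw):
    """Return the leading integer of a raw value string, or None."""
    match = re.match(r"\d+", raw)
    return int(match.group()) if match else None


def implausible_temperature(row, max_c=100):
    """True if row is attribute 194 and its raw reading exceeds max_c.

    max_c is an assumed sanity limit, not a value from any specification.
    """
    if row is None or row["id"] != 194:
        return False
    reading = leading_int(row["raw"])
    return reading is not None and reading > max_c
```

Applied to the `194 Temperature_Celsius` row above, the parsed raw value is `128 (0 127 0 129 0)`, its leading integer is 128, and the check flags it.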

#1233 invalid smartctl exit code 4. "scsiPrintFormatStatus: Failed" in -a output padner
Description

After upgrading smartmontools from 6.6-1.el7 to 7.0-1.el7, smartctl exits with code 4.

This line is present in the output: scsiPrintFormatStatus: Failed [Input/output error]

smartctl -a /dev/bus/0 -d megaraid,44
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-4.14.35-1818.1.6.el7uek.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HGST
Product:              HUH721212AL5204
Revision:             C3D0
Compliance:           SPC-4
User Capacity:        12,000,138,625,024 bytes [12.0 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000cca253084a6c
Serial number:        8DG4KATZ
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Fri Sep  6 11:27:00 2019 VLAT
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

scsiPrintFormatStatus: Failed [Input/output error]
Current Drive Temperature:     25 C
Drive Trip Temperature:        85 C

Manufactured in week 43 of year 2017
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  159
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  168
Vendor (Seagate Cache) information
  Blocks sent to initiator = 109731280584704

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0     120567      14366.075           0
write:         0        0         0         0      14168       5367.122           0
verify:        0        0         0         0        396     612007.092           0

Non-medium error count:        0

No Self-tests have been logged

But with another drive of the same model, everything is fine. The following text is printed instead of the error string:

Grown defects during certification <not available>
Total blocks reassigned during format <not available>
Total new blocks reassigned <not available>
Power on minutes since format <not available>
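
The smartctl exit status is a bit mask, so exit code 4 means that only bit 2 is set. The sketch below decodes a status value into its set bits. The descriptions are paraphrased from the EXIT STATUS section of the smartctl man page, and `decode_exit_status` is a hypothetical helper name, not part of smartmontools.

```python
# Bit meanings paraphrased from the smartctl man page (EXIT STATUS).
EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed, or device did not return an IDENTIFY structure",
    2: "a SMART or other command to the disk failed, "
       "or a SMART data structure had a checksum error",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes found at or below threshold",
    5: "SMART status OK, but attributes were at or below threshold in the past",
    6: "device error log contains records of errors",
    7: "device self-test log contains records of errors",
}


def decode_exit_status(status):
    """Return the descriptions of all bits set in a smartctl exit status."""
    return [text for bit, text in EXIT_BITS.items() if status & (1 << bit)]
```

For the case reported here, `decode_exit_status(4)` returns only the bit 2 entry (a command to the disk failed), which matches the `scsiPrintFormatStatus: Failed [Input/output error]` line in the output.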