Article ID: 000056596
Content Type: Product Information & Documentation
Last Reviewed: 07/26/2021

Common SMART Attributes for Client Intel® SSDs and Intel® Optane™ Technology Products

Summary

Explains how SMART attributes can be used to monitor the health of a storage device. This article describes common attributes supported on Client Intel® SSDs and Intel® Optane™ Technology products.

Description

What are SMART attributes and how can they be useful?

Resolution

Self-Monitoring, Analysis and Reporting Technology (SMART) is an open standard used by drives and hosts to monitor drive health and to report potential problems.

Each drive operates under a predefined set of SMART attributes and corresponding threshold values, which the drive should not pass during normal operation.
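As a rough illustration of how a monitoring tool might apply these thresholds, the Python sketch below flags an ATA SMART attribute once its normalized value has dropped to or below the threshold, which is the convention tools such as smartmontools follow. The `SmartAttribute` class, the `is_failing` function and the sample values are hypothetical; reading the values from a real drive is out of scope.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    """One ATA SMART attribute as reported by a drive (hypothetical container)."""

    attr_id: int     # attribute ID, e.g. 0xAA for Available Reserved Space
    name: str
    normalized: int  # normalized value; starts at 100 for many attributes in this article
    threshold: int   # failure threshold reported alongside the attribute


def is_failing(attr: SmartAttribute) -> bool:
    """True once the normalized value has dropped to or below the threshold."""
    return attr.normalized <= attr.threshold


# Illustrative values only; they were not read from a real drive.
spare = SmartAttribute(0xAA, "Available Reserved Space", normalized=97, threshold=10)
print(f"{spare.name}: {'FAILING' if is_failing(spare) else 'OK'}")
```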

Descriptions of some SMART Health Info attributes are shown in the following tables. Supported attributes vary by Intel SSD or other drive, so your SSD or drive may not support all of them.

SMART Attributes for SATA

ID (hex)

Attribute and Description (SATA)

05

Reallocated Sector Count

The raw value shows the number of retired blocks since leaving the factory (grown defect count).

09

Power-On Hours Count

The raw value reports the cumulative number of power-on hours over the life of the device.

Note: Whether the Device Initiated Power Management (DIPM) feature is turned on or off affects the number of hours reported.

  • If DIPM is turned on, the recorded value does not include the time that the device is in a slumber state.
  • If DIPM is turned off, the recorded value should match wall-clock time, because all three device states are counted: active, idle, and slumber.

0C

Power Cycle Count

The raw value reports the cumulative number of power-cycle events (power on/off cycles) over the life of the device.

AA

Available Reserved Space

Reports the number of reserve blocks remaining. The normalized value begins at 100 (64h), which corresponds to 100 percent availability of the reserved space. The threshold value for this attribute is 10 percent availability.

AB

Program Fail Count

The raw value shows the total count of program fails. The normalized value, beginning at 100, shows the percentage of allowable program fails remaining.

AC

Erase Fail Count

The raw value shows the total count of erase fails. The normalized value, beginning at 100, shows the percentage of allowable erase fails remaining.

AE

Unexpected Power Loss

Reports the number of unclean shutdowns, cumulative over the life of the SSD. An “unclean shutdown” is the removal of power without STANDBY IMMEDIATE as the last command (regardless of Power Loss Imminent (PLI) activity using capacitor power). Also known as “Power-off Retract Count” in magnetic-drive terminology.

B8

End-to-End Error Detection Count

Reports number of errors encountered during Logical Block Address (LBA) tag checks within the SSD data path. The normalized value begins at 100 and decrements by 1 for each LBA tag mismatch detected. The threshold value is 90.
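The B8 arithmetic can be written out directly. This sketch only restates the description above (start at 100, minus 1 per LBA tag mismatch, threshold 90); the function names are hypothetical, and it does not model what a drive reports once the value falls below the threshold.

```python
B8_START = 100     # normalized value before any mismatch, per the description
B8_THRESHOLD = 90  # threshold value stated for B8


def b8_normalized(tag_mismatches: int) -> int:
    """Normalized B8 value after a given number of LBA tag mismatches."""
    return B8_START - tag_mismatches


def b8_mismatches(normalized: int) -> int:
    """Number of LBA tag mismatches implied by a normalized B8 value."""
    return B8_START - normalized


def b8_reached_threshold(normalized: int) -> bool:
    """True once the normalized value has dropped to the threshold or below."""
    return normalized <= B8_THRESHOLD


print(b8_mismatches(97), b8_reached_threshold(b8_normalized(10)))  # 3 True
```

With these numbers, the threshold is reached after 10 mismatches.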

BB

Uncorrectable Error Count

The raw value shows the count of errors that could not be recovered using Error Correction Code (ECC).

BE

Temperature - Airflow (Case)

Reports the SSD case temperature in degrees Celsius. The raw value is as follows:

  • Byte 0 = Current case temperature (°C)
  • Byte 2 = Recent minimum case temperature (°C)
  • Byte 3 = Recent maximum case temperature (°C)

The normalized value is 100. Case temperature is calculated as an offset from the internal temperature sensor reading.
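A minimal decoding sketch for the BE raw value follows. It assumes the raw value is available as a single integer whose byte 0 is the least significant byte (the way many tools display the 6-byte raw field); that byte order, the function name and the sample value are assumptions. Byte 1, which is not described above, is ignored.

```python
def decode_case_temperature(raw: int) -> dict[str, int]:
    """Split the 48-bit BE raw value into its temperature bytes (°C).

    Assumption: byte 0 is the least significant byte of the raw value.
    Byte 1 is not documented here and is skipped.
    """
    raw_bytes = raw.to_bytes(6, "little")  # raises OverflowError if raw >= 2**48
    return {
        "current_c": raw_bytes[0],
        "recent_min_c": raw_bytes[2],
        "recent_max_c": raw_bytes[3],
    }


# Hypothetical raw value: current 35 °C, recent min 22 °C, recent max 48 °C.
raw = (48 << 24) | (22 << 16) | 35
print(decode_case_temperature(raw))
```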

C0

Unsafe Shutdown Count (Power-off Retract Count)

The raw value reports the cumulative number of unsafe (unclean) shutdown events over the life of the device. An unsafe shutdown occurs whenever the device is powered off without STANDBY IMMEDIATE being the last command.

C2

Temperature - Device Internal

Reports the internal temperature of the SSD. The temperature reading is taken directly from the internal sensor. The raw value is the current temperature, and the normalized value is min(150 - current temperature, 100).
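The C2 normalization formula, written as a small Python function (the function name is hypothetical):

```python
def c2_normalized(current_temp_c: int) -> int:
    """Normalized C2 value: min(150 - current temperature, 100)."""
    return min(150 - current_temp_c, 100)


for temp_c in (30, 50, 70):
    print(temp_c, c2_normalized(temp_c))
```

The normalized value therefore stays at 100 until the internal temperature exceeds 50 °C, then drops by 1 for each additional degree.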

C7

CRC Error Count

The total number of SATA interface Cyclic Redundancy Check (CRC) errors encountered.

E1

Host Writes

The raw value reports the total amount of data written by the host system, in units of 65,536 sectors: it increases by 1 for every 65,536 sectors written by the host (32 MiB, assuming 512-byte sectors).
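Converting the E1 raw value into bytes is a matter of multiplying by the sector count per increment and the sector size. This sketch assumes 512-byte logical sectors; the constant and function names are hypothetical.

```python
SECTORS_PER_INCREMENT = 65_536  # E1 raw value grows by 1 per 65,536 sectors
SECTOR_SIZE_BYTES = 512         # assumption: 512-byte logical sectors


def e1_bytes_written(raw: int) -> int:
    """Approximate host bytes written from the E1 raw value."""
    return raw * SECTORS_PER_INCREMENT * SECTOR_SIZE_BYTES


def e1_gib_written(raw: int) -> float:
    """Same quantity expressed in GiB (2**30 bytes)."""
    return e1_bytes_written(raw) / 2**30


print(e1_gib_written(32))  # 1.0
```

Because the counter only advances once per 65,536 sectors, the result has a resolution of 32 MiB.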

E2

Timed Workload, Media Wear

Measures the wear seen by the SSD since the Timed Workload Timer (attribute E4) was last reset, as a percentage of the maximum rated cycles.

E3

Timed Workload, Host Read/Write Ratio

The percentage of I/O operations that are read operations since the Timed Workload Timer (attribute E4) was last reset.

E4

Timed Workload Timer

Measures the elapsed time (number of minutes) since starting this workload timer.

E8

Available Reserved Space

Reports the number of reserve blocks remaining. The normalized value begins at 100 (64h), which corresponds to 100 percent availability of the reserved space. The threshold value for this attribute is 10 percent availability.

E9

Media Wearout Indicator

Reports the number of cycles the NAND media has undergone. The normalized value declines linearly from 100 to 1 as the average erase cycle count increases from 0 to the maximum rated cycles. Once the normalized value reaches 1, it does not decrease further, although the device can likely sustain significant additional wear.
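The linear relationship described for E9 can be modelled as below. This is an approximation under stated assumptions: the rounding the drive applies is not specified, so integer floor division is an arbitrary choice here, and `rated_cycles` stands for the drive's maximum rated cycle count, which varies by product. The function names and the 3,000-cycle rating in the example are hypothetical.

```python
def e9_normalized(avg_erase_cycles: int, rated_cycles: int) -> int:
    """Approximate E9 normalized value: 100 at 0 cycles, 1 at rated_cycles, never below 1."""
    if rated_cycles <= 0:
        raise ValueError("rated_cycles must be positive")
    decrement = (99 * avg_erase_cycles) // rated_cycles  # assumed rounding
    return max(1, 100 - decrement)


def e9_fraction_of_rated_cycles(normalized: int) -> float:
    """Invert the linear model: fraction of rated cycles consumed (tops out at 1.0)."""
    return (100 - normalized) / 99


print(e9_normalized(1500, 3000))  # 51, halfway through a hypothetical 3,000-cycle rating
```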

F1

Total LBAs Written

Counts sectors written by the host.

F2

Total LBAs Read

Counts sectors read by the host.

SMART Attributes for NVMe*

ID (byte offset in the SMART / Health Information log page)

Attribute and Description (NVMe)

0

Critical Warning

Each bit, if set, flags a different warning condition:

  • Bit 0: Available Spare is below Threshold
  • Bit 1: Temperature has exceeded Threshold
  • Bit 2: Reliability is degraded due to excessive media or internal errors
  • Bit 3: Media is placed in Read-Only Mode
  • Bit 4: Volatile Memory Backup System has failed (e.g., enhanced power loss capacitor test failure)
  • Bits 5-7: Reserved

Any of these critical warnings can be tied to an asynchronous event notification.
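The Critical Warning field is a bit mask, so decoding it is a matter of testing each defined bit. A sketch in Python (the names are hypothetical; reserved bits 5-7 are ignored):

```python
CRITICAL_WARNING_BITS = {
    0: "Available Spare is below Threshold",
    1: "Temperature has exceeded Threshold",
    2: "Reliability is degraded due to excessive media or internal errors",
    3: "Media is placed in Read-Only Mode",
    4: "Volatile Memory Backup System has failed",
}


def decode_critical_warning(value: int) -> list[str]:
    """Return the description of every defined warning bit that is set."""
    return [desc for bit, desc in CRITICAL_WARNING_BITS.items() if value & (1 << bit)]


print(decode_critical_warning(0x05))  # bits 0 and 2 set
```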

1

Temperature

Reports the current overall (composite) temperature of the device, in kelvins.
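In the NVMe SMART / Health Information log page, the ID column in this table corresponds to each field's byte offset, and Composite Temperature occupies the two bytes at offsets 1-2, stored little-endian. Assuming the 512-byte log page has already been read into a `bytes` object by some other means, a conversion to degrees Celsius could look like this (the function name is hypothetical, and subtracting 273 rounds to whole degrees instead of using 273.15):

```python
def composite_temperature_c(log_page: bytes) -> int:
    """Composite Temperature (bytes 1-2, little-endian, kelvins) in whole degrees Celsius."""
    kelvin = int.from_bytes(log_page[1:3], "little")
    return kelvin - 273  # integer approximation of kelvin - 273.15


# Hypothetical page: 310 K stored at offset 1.
page = bytearray(512)
page[1:3] = (310).to_bytes(2, "little")
print(composite_temperature_c(bytes(page)))  # 37
```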

3

Available Spare

Contains a normalized percentage (0 to 100%) of the remaining spare capacity available. Starts from 100 and decrements.

4

Available Spare Threshold

Threshold is set to 10%.
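Bit 0 of Critical Warning is set when Available Spare falls below Available Spare Threshold. The same comparison can be made directly on the log page bytes at offsets 3 and 4. This sketch assumes the page is already available as `bytes`; the function name is hypothetical.

```python
def spare_below_threshold(log_page: bytes) -> bool:
    """Mirror Critical Warning bit 0: Available Spare (offset 3) below its threshold (offset 4)."""
    available_spare = log_page[3]  # percentage, 0-100
    spare_threshold = log_page[4]  # percentage; 10% on the drives described here
    return available_spare < spare_threshold


page = bytearray(512)
page[3], page[4] = 9, 10  # hypothetical: 9% spare left, 10% threshold
print(spare_below_threshold(bytes(page)))  # True
```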

5

Percentage Used Estimate

A value of 100 indicates that the estimated endurance of the device has been consumed, but does not necessarily indicate a device failure. The value is allowed to exceed 100; percentages greater than 254 shall be represented as 255. This value shall be updated once per power-on hour (when the controller is not in a sleep state).

32

Data Units Read (in thousands of 512-byte units)

Contains the number of 512-byte data units the host has read from the controller; this value does not include metadata. This value is reported in thousands (i.e., a value of 1 corresponds to 1000 units of 512 bytes read) and is rounded up. When the LBA size is a value other than 512 bytes, the controller shall convert the amount of data read to 512-byte units.

48

Data Units Written (in thousands of 512-byte units)

Contains the number of 512-byte data units the host has written to the controller; this value does not include metadata. This value is reported in thousands (i.e., a value of 1 corresponds to 1000 units of 512 bytes written) and is rounded up. When the LBA size is a value other than 512 bytes, the controller shall convert the amount of data written to 512-byte units. For the NVM command set, logical blocks written as part of Write operations shall be included in this value. Write Uncorrectable commands shall not impact this value.
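Both data-unit counters use the same unit, one thousand 512-byte units (512,000 bytes), so converting a reported value to bytes is a single multiplication. A minimal sketch (the names are hypothetical):

```python
BYTES_PER_DATA_UNIT = 1000 * 512  # one reported unit = 1,000 x 512 bytes


def data_units_to_bytes(data_units: int) -> int:
    """Convert a Data Units Read/Written value to bytes."""
    return data_units * BYTES_PER_DATA_UNIT


# Hypothetical counter value of 2,000,000 data units.
print(data_units_to_bytes(2_000_000) / 1e12, "TB")  # 1.024 TB
```

Because the controller rounds up, the converted figure can overstate the actual amount of data by less than 512,000 bytes.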

64

Host Read Commands

Contains the number of read commands issued to the controller.

80

Host Write Commands

Contains the number of write commands issued to the controller.

96

Controller Busy Time (in minutes)

Contains the amount of time the controller is busy with I/O commands. The controller is busy when there is a command outstanding to an I/O Queue. (Specifically, a command was issued by way of an I/O Submission Queue Tail doorbell write and the corresponding completion queue entry has not been posted yet to the associated I/O Completion Queue.) This value is reported in minutes.

112

Power Cycles

Contains the number of power cycles.

128

Power-On Hours

Contains the number of power-on hours. This does not include time during which the controller was powered but in a low-power state.

144

Unsafe Shutdowns

Contains the number of unsafe shutdowns. This count is incremented when a shutdown notification (CC.SHN) is not received prior to loss of power.

160

Media Errors

Contains the number of occurrences where the controller detected an unrecovered data integrity error. Errors such as uncorrectable ECC, CRC checksum failure, or LBA tag mismatch are included in this field.

176

Number of Error Information Log Entries

Contains the number of Error Information log entries over the life of the controller.

192

Warning Composite Temperature Time

Contains the amount of time in minutes that the controller is operational and the Composite Temperature is greater than or equal to the Warning Composite Temperature Threshold (WCTEMP) field and less than the Critical Composite Temperature Threshold (CCTEMP) field in the Identify Controller data structure.

196

Critical Composite Temperature Time

Contains the amount of time in minutes that the controller is operational and the Composite Temperature is greater than or equal to the Critical Composite Temperature Threshold (CCTEMP) field in the Identify Controller data structure.
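Putting the offsets from this table together, the sketch below unpacks the main fields of a raw SMART / Health Information log page. The field sizes are not given in this article: they are taken from the NVMe base specification (and mostly match the gaps between the listed offsets). All values are read as little-endian unsigned integers, and the `FIELDS` table and `parse_smart_log` function are hypothetical names. Obtaining the 512-byte page from the device is left out.

```python
# (field name, byte offset, size in bytes); offsets follow the table above
FIELDS = [
    ("critical_warning", 0, 1),
    ("composite_temperature_k", 1, 2),
    ("available_spare_pct", 3, 1),
    ("available_spare_threshold_pct", 4, 1),
    ("percentage_used", 5, 1),
    ("data_units_read", 32, 16),
    ("data_units_written", 48, 16),
    ("host_read_commands", 64, 16),
    ("host_write_commands", 80, 16),
    ("controller_busy_time_min", 96, 16),
    ("power_cycles", 112, 16),
    ("power_on_hours", 128, 16),
    ("unsafe_shutdowns", 144, 16),
    ("media_errors", 160, 16),
    ("error_log_entries", 176, 16),
    ("warning_temp_time_min", 192, 4),
    ("critical_temp_time_min", 196, 4),
]


def parse_smart_log(log_page: bytes) -> dict[str, int]:
    """Decode each listed field as a little-endian unsigned integer."""
    if len(log_page) < 200:
        raise ValueError("SMART / Health Information log page is too short")
    return {
        name: int.from_bytes(log_page[offset:offset + size], "little")
        for name, offset, size in FIELDS
    }


# Hypothetical page with a few fields filled in.
page = bytearray(512)
page[0] = 0x01
page[128:144] = (1234).to_bytes(16, "little")
print(parse_smart_log(bytes(page))["power_on_hours"])  # 1234
```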

Disclaimer


All postings and use of the content on this site are subject to Intel.com Terms of Use.