In the world of data storage, hard drives play a crucial role in maintaining the integrity and accessibility of your data. However, hard drives are not immune to failures, and monitoring their health is essential to prevent data loss and ensure smooth operation.
The smartctl command, available on Linux systems, allows users to monitor and manage the "Self-Monitoring, Analysis and Reporting Technology (SMART)" configuration of hard drives.
Most modern storage devices, including hard drives, SATA SSDs, and NVMe drives, provide some kind of S.M.A.R.T. implementation that lets software read their values and make an informed judgement about the overall performance and health of the drive.
For large storage setups like data centers, this is an invaluable tool as it can help predict failures in advance and allow system admins to move data safely and avoid data loss.
In this article, we will explore the smartctl command with detailed examples. We shall run the command on local machines with SSDs and HDDs installed, and also on cloud servers hosted on Amazon Web Services (AWS).
Installing smartctl
Before we begin, ensure that the smartctl utility is installed on your Linux system. It ships in the smartmontools package; some distributions include it by default, and on others you can install it using the package manager.
For Debian-based systems
sudo apt-get install smartmontools
For CentOS-based systems
sudo yum install smartmontools
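Before installing anything, a script can check whether the tool is already on the PATH. The following is a minimal sketch; the function name have_command is a hypothetical helper, not part of smartmontools.

```shell
# have_command is a hypothetical helper: it succeeds if the named
# program can be found by the shell, and fails otherwise.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

# Typical use:
#   have_command smartctl || echo "smartctl not found, install smartmontools"
```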
1. List all devices on the system
The --scan option makes smartctl report all the available disk drives on the system along with their device paths and device types.
$ smartctl --scan
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/sdb -d scsi # /dev/sdb, SCSI device
/dev/sdc -d scsi # /dev/sdc, SCSI device
/dev/sdd -d sat # /dev/sdd [SAT], ATA device
$
In the above output, the first three drives are internal SSDs connected to the motherboard via SATA cables. The fourth is a portable Samsung SSD connected via USB.
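The scan output is regular enough to feed into other commands. Here is a small sketch that reduces each scan line to a device path and device type; scan_devices is a hypothetical helper name, and it assumes every line has the "path -d type" layout shown in the scan output.

```shell
# scan_devices is a hypothetical helper. It reads `smartctl --scan`
# output on stdin and prints "<device path> <device type>" per drive.
# Assumes lines shaped like: /dev/sda -d scsi # /dev/sda, SCSI device
scan_devices() {
    awk '$1 ~ /^\/dev\// && $2 == "-d" { print $1, $3 }'
}

# Typical use (needs smartmontools installed):
#   smartctl --scan | scan_devices
```

Each resulting pair can then be passed back to smartctl, with the type given through its -d option and the path as the device argument.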
2. Quick health checkup
With the -H option we can do a quick health check, and smartctl will tell us how the drive is doing at present.
$ sudo smartctl -H /dev/sda
[sudo] password for enlightened:
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.2.0-27-generic] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

$
If the result is PASSED, the drive is probably doing fine, though it's not guaranteed.
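For scripting, the verdict can be pulled out of the health report with a little text processing. This is a sketch only; health_verdict is a hypothetical helper, and it assumes the "SMART overall-health self-assessment test result:" line format seen in the sample outputs in this article. Drives that report health in a different format would show up as UNKNOWN.

```shell
# health_verdict is a hypothetical helper. It reads `smartctl -H`
# output on stdin and prints the word after the overall-health line
# (for example PASSED), or UNKNOWN if no such line is present.
health_verdict() {
    awk -F': *' '
        /overall-health self-assessment test result/ { verdict = $2 }
        END { print (verdict == "" ? "UNKNOWN" : verdict) }
    '
}

# Typical use (needs smartmontools installed):
#   sudo smartctl -H /dev/sda | health_verdict
```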
3. Full S.M.A.R.T information of disks
The SMART technology embedded in modern hard drives offers insights into their health, performance, and reliability. smartctl allows you to extract detailed information pertaining to these aspects.
To print all SMART information about a disk, the syntax is as follows:
smartctl -a /dev/sdX
Substitute "/dev/sdX" with the device identifier corresponding to your disk. For instance, to view SMART information for the drive at /dev/sdb, you would use:
$ sudo smartctl -a /dev/sdb
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.2.0-27-generic] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     Samsung SSD 850 EVO 120GB
Serial Number:    S21SNXAGC12532L
LU WWN Device Id: 5 002538 d408f4063
Firmware Version: EMT02B6Q
User Capacity:    120,034,123,776 bytes [120 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
TRIM Command:     Available
Device is:        In smartctl database 7.3/5319
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Sep 1 16:13:48 2023 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (    0) seconds.
Offline data collection
capabilities:                    (0x53) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  64) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       16838
 12 Power_Cycle_Count       0x0032   095   095   000    Old_age   Always       -       4523
177 Wear_Leveling_Count     0x0013   098   098   000    Pre-fail  Always       -       27
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   063   048   000    Old_age   Always       -       37
195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       49
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       1937655527

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Warning! SMART Selective Self-Test Log Structure error: invalid SMART checksum.
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
  255        0    65535  Read_scanning was never started
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

enlightened@enlightened:~$
Note that there are two sections. The first is the "INFORMATION SECTION", which reports details about the drive such as the manufacturer, model, and size. The second is the "SMART DATA" section, which reports SMART-related parameters and their corresponding values, along with a bunch of other details.
The command can also report details about cloud storage volumes such as Amazon Elastic Block Store (AWS).
linuxworld:~# smartctl -a /dev/nvme0n1p1
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.71.1.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Amazon Elastic Block Store
Serial Number:                      vol0bc76da967d23bf84
Firmware Version:                   1.0
PCI Vendor/Subsystem ID:            0x1d0f
IEEE OUI Identifier:                0xa002dc
Controller ID:                      0
Number of Namespaces:               1
Namespace 1 Size/Capacity:          107,374,182,400 [107 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Tue Aug 29 07:52:30 2023 CEST
Firmware Updates (0x03):            1 Slot, Slot 1 R/O
Maximum Data Transfer Size:         64 Pages
Warning Comp. Temp. Threshold:      70 Celsius
Namespace 1 Features (0x12):        NA_Fields *Other*

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     0.01W       -        -    0  0  0  0  1000000 1000000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        -
Available Spare:                    0%
Available Spare Threshold:          0%
Percentage Used:                    0%
Data Units Read:                    0
Data Units Written:                 0
Host Read Commands:                 0
Host Write Commands:                0
Controller Busy Time:               0
Power Cycles:                       0
Power On Hours:                     0
Unsafe Shutdowns:                   0
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning Comp. Temperature Time:     0

Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged
4. Checking drive information
To view general information about your hard drive, such as its model, serial number, and firmware version, use the following command. The "-i" option prints just basic information about the drive.
smartctl -i /dev/sdX
Replace "/dev/sdX" with your hard drive identifier. Here's an example:
The following is a Samsung 850 EVO 120GB SSD connected internally via SATA.
$ sudo smartctl -i /dev/sdb
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.2.0-27-generic] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     Samsung SSD 850 EVO 120GB
Serial Number:    S21SNXAGC12532L
LU WWN Device Id: 5 002538 d408f4063
Firmware Version: EMT02B6Q
User Capacity:    120,034,123,776 bytes [120 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
TRIM Command:     Available
Device is:        In smartctl database 7.3/5319
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Sep 1 16:12:00 2023 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

$
The smartctl command can also provide information about virtual disks on cloud servers, such as Amazon Elastic Block Store (AWS) volumes.
linuxworld:~# smartctl -i /dev/nvme0n1p1
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.71.1.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Amazon Elastic Block Store
Serial Number:                      vol0bc76da967d23bf84
Firmware Version:                   1.0
PCI Vendor/Subsystem ID:            0x1d0f
IEEE OUI Identifier:                0xa002dc
Controller ID:                      0
Number of Namespaces:               1
Namespace 1 Size/Capacity:          107,374,182,400 [107 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Tue Aug 29 08:19:12 2023 CEST
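The information section is a list of "label: value" lines, so a single field such as the model or serial number can be picked out for an inventory script. The sketch below is illustrative; info_field is a hypothetical helper, and the labels are the ones visible in the sample outputs ("Device Model" for the SATA drive, "Model Number" for the NVMe volume).

```shell
# info_field is a hypothetical helper. It reads `smartctl -i` output on
# stdin and prints the value of the first line whose label equals $1,
# with the padding after the colon removed. Prints nothing if absent.
info_field() {
    awk -v key="$1" '
        index($0, key ":") == 1 {
            value = substr($0, length(key) + 2)
            sub(/^[ \t]+/, "", value)
            print value
            exit
        }
    '
}

# Typical use (needs smartmontools installed):
#   sudo smartctl -i /dev/sdb | info_field "Serial Number"
```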
5. Checking SMART attributes
SMART attributes provide valuable information about the health (hardware condition) and performance parameters of the drive. To access the treasure trove of SMART attributes, use the "-A" or "--attributes" option.
This command provides a comprehensive list of attributes alongside their current, worst, and threshold values. You can list these attributes using:
sudo smartctl -A /dev/sdX
Sample output
Here is an example of the SMART data of an NVMe drive.
linuxworld:~# smartctl -A /dev/nvme0n1p1
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.71.1.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF SMART DATA SECTION ===
SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        -
Available Spare:                    0%
Available Spare Threshold:          0%
Percentage Used:                    0%
Data Units Read:                    0
Data Units Written:                 0
Host Read Commands:                 0
Host Write Commands:                0
Controller Busy Time:               0
Power Cycles:                       0
Power On Hours:                     0
Unsafe Shutdowns:                   0
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning Comp. Temperature Time:     0
Here is another drive and its SMART attribute information, which looks very different from the NVMe output above. This is the Samsung 850 EVO SATA SSD at /dev/sdb on my Ubuntu desktop machine, the same drive shown in the earlier examples.
$ sudo smartctl -A /dev/sdb
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.2.0-27-generic] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       16837
 12 Power_Cycle_Count       0x0032   095   095   000    Old_age   Always       -       4520
177 Wear_Leveling_Count     0x0013   098   098   000    Pre-fail  Always       -       27
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   061   048   000    Old_age   Always       -       39
195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       49
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       1936278631

$
There are many more SMART attributes and indicators for different parameters of a storage drive; a complete list can be found on the Wikipedia page on S.M.A.R.T.
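The ATA attribute table has fixed columns, which makes it easy to query from a script. The sketch below shows two hypothetical helpers (attr_raw and attrs_at_threshold are illustrative names). They assume the ten-column layout of the SATA sample output, where VALUE is column 4, THRESH is column 6, and RAW_VALUE is column 10; they do not apply to the NVMe format.

```shell
# attr_raw (hypothetical helper): print the RAW_VALUE column of the
# attribute named in $1, reading `smartctl -A` output on stdin.
attr_raw() {
    awk -v name="$1" '$2 == name { print $10; exit }'
}

# attrs_at_threshold (hypothetical helper): list attributes whose
# normalized VALUE is at or below THRESH. Rows with THRESH 000 are
# skipped, since a threshold of zero can never be reached.
attrs_at_threshold() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 && $6 + 0 > 0 && $4 + 0 <= $6 + 0 { print $2 }'
}

# Typical use (needs smartmontools installed):
#   sudo smartctl -A /dev/sdb | attr_raw Power_On_Hours
#   sudo smartctl -A /dev/sdb | attrs_at_threshold
```

On a healthy drive the second helper should print nothing, because every normalized value sits above its threshold.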
6. Estimating TBW (Terabytes written) for SSDs
For SSDs we can calculate the TBW figure from the values of other attributes and some math. There is a discussion on askubuntu.com about this.
Here is a quick example. The following command reports the total amount of data (in GB) written to the drive, assuming that Total_LBAs_Written counts 512-byte sectors. Just make sure to put in the correct device path. Here it is /dev/sdb.
echo "GB Written: $(echo "scale=3; $(sudo /usr/sbin/smartctl -A /dev/sdb | grep "Total_LBAs_Written" | awk '{print $10}') * 512 / 1073741824" | bc | sed ':a;s/\B[0-9]\{3\}\>/,&/;ta')"
Here is a shorter version of the same command:
sudo /usr/sbin/smartctl -A /dev/sdb | awk '$0~/LBAs/{ printf "TBW %.1f\n", $10 * 512 / 1024^4 }'
$ sudo /usr/sbin/smartctl -A /dev/sdb | awk '$0~/LBAs/{ printf "TBW %.1f\n", $10 * 512 / 1024^4 }'
TBW 0.9
$
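The arithmetic behind these one-liners is simply LBA count times bytes per LBA, divided by 1024^4. A small sketch that isolates the calculation is shown here; lbas_to_tib is a hypothetical helper name, and the 512-byte default is an assumption that matches the sector size reported for this drive. Other drives may count in different units, so check the vendor documentation before trusting the figure.

```shell
# lbas_to_tib is a hypothetical helper.
#   $1 = raw Total_LBAs_Written value
#   $2 = bytes per LBA (assumed to be 512 when omitted)
# Prints the amount written in units of 1024^4 bytes, one decimal place.
lbas_to_tib() {
    awk -v lbas="$1" -v size="${2:-512}" \
        'BEGIN { printf "%.1f\n", lbas * size / 1024 ^ 4 }'
}

# Using the raw value from the -a output earlier in this article:
#   lbas_to_tib 1937655527
```

With the raw value 1937655527 this works out to 1937655527 * 512 / 1024^4, roughly 0.90, which agrees with the "TBW 0.9" result above.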
7. Initiating tests
SMART-enabled drives offer self-testing capabilities. To initiate a test, use the "-t" option followed by a test type. For instance, to execute a short self-test, you can use the command below. The terminal output shows how long the test is expected to take.
sudo smartctl -t short /dev/sdX
Sample output
linuxworld:~# smartctl -t short /dev/nvme0n1p1
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.71.1.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

NVMe device successfully opened

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 1 minutes for test to complete.
Test will complete after Sun Nov 16 12:51:45 2014

Use smartctl -X to abort test.
After the test concludes (typically within minutes), the results can be scrutinized using the "-l selftest" option. See the command and terminal output below.
sudo smartctl -l selftest /dev/sda
Sample output
linuxworld:~# smartctl -l selftest /dev/nvme0n1p1
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.71.1.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

NVMe device successfully opened

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%       492         210841222
# 2  Extended offline    Completed: read failure       90%       492         210841222
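In the log above both tests ended with a read failure, which is the kind of result a monitoring script should flag. One way to sketch that check is shown here; failed_selftests is a hypothetical helper, and it assumes the ATA-style log layout in which each entry row begins with "#" and a clean run reads "Completed without error".

```shell
# failed_selftests is a hypothetical helper. It reads
# `smartctl -l selftest` output on stdin and prints how many log
# entries have any status other than "Completed without error"
# (this also counts aborted, interrupted, or in-progress tests).
failed_selftests() {
    awk '/^# *[0-9]+ / && !/Completed without error/ { n++ }
         END { print n + 0 }'
}

# Typical use (needs smartmontools installed):
#   sudo smartctl -l selftest /dev/sda | failed_selftests
```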
8. Accessing error logs
The "-l error" option grants access to the drive's error log, providing historical insights into past issues. The command and terminal output are as follows:
sudo smartctl -l error /dev/sda
linuxworld:~# smartctl -l error /dev/nvme0n1p1
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.71.1.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
Error Information (NVMe Log 0x01, max 64 entries)
SMART Error Log Version: 1
ATA Error Count: 5
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 da 08 e7 e5 a5 4c 00      00:30:44.515  READ DMA EXT
  25 da 08 df e5 a5 4c 00      00:30:44.514  READ DMA EXT
  25 da 80 5f e5 a5 4c 00      00:30:44.502  READ DMA EXT
  25 da f0 5f e6 a5 4c 00      00:30:44.496  READ DMA EXT
  25 da 10 4f e6 a5 4c 00      00:30:44.383  READ DMA EXT
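When only the number of logged errors matters, the "ATA Error Count" line can be extracted on its own. The following is a rough sketch; ata_error_count is a hypothetical helper, and it assumes the ATA-style log shown above, treating a log without that line (for example one that says "No Errors Logged") as zero errors.

```shell
# ata_error_count is a hypothetical helper. It reads
# `smartctl -l error` output on stdin and prints the number from the
# "ATA Error Count:" line, or 0 when that line is absent.
ata_error_count() {
    awk -F': *' '/^ATA Error Count:/ { n = $2 + 0 } END { print n + 0 }'
}

# Typical use (needs smartmontools installed):
#   sudo smartctl -l error /dev/sda | ata_error_count
```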
9. Automating SMART monitoring
Given the demands of large-scale systems, where numerous hard drives need constant monitoring and maintenance, manual intervention becomes not only cumbersome but also impractical. Checking the health status of each drive, running tests, and generating reports manually can consume an enormous amount of time and effort.
This is where the power of automation shines, and the smartctl command comes to the forefront as a valuable tool for seamless integration into scripts and automation tools.
smartctl can be harnessed within scripts to create automated workflows that handle the monitoring and management of hard drive health. By leveraging its capabilities, system administrators can streamline the process of ensuring the integrity and performance of drives, all while minimizing the need for manual oversight.
Here's a practical example of how smartctl can be seamlessly integrated into a Bash script for automated monitoring and reporting:
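#!/bin/bash
# Collect the SMART health verdict of every /dev/sd? drive and mail the report.
EMAIL="[email protected]"
LOGFILE="/var/log/smartctl.log"

# Start a fresh report with a title, the date, and a divider
echo "SMARTCTL Report" > "$LOGFILE"
date >> "$LOGFILE"
echo "===============================" >> "$LOGFILE"

# /dev/sd? matches single-letter disks such as /dev/sda, /dev/sdb
for DEV in /dev/sd?
do
    smartctl -H "$DEV" >> "$LOGFILE"
    echo "---------------------------" >> "$LOGFILE"
done

# Send the finished report by email
cat "$LOGFILE" | mail -s "SMARTCTL Report" "$EMAIL"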
In this script, several key components come together to automate the process of monitoring hard drive health using smartctl:
- Email Configuration: The script starts by setting the email address (EMAIL) where the SMART monitoring report will be sent. Replace [email protected] with the appropriate email address.
- Logfile Specification: The script defines a logfile (LOGFILE) where the SMART monitoring results will be recorded. The specified path /var/log/smartctl.log is just an example; you can adjust it to match your desired directory and naming conventions.
- Creating the Report: The script initiates the creation of the SMART monitoring report by echoing a title, date, and a divider into the logfile.
- Loop Through Drives: The script employs a loop to iterate through all drive devices (/dev/sd?), where the "?" represents a single character, such as a, b, c, etc. This loop ensures that the script examines all drives.
- Run smartctl Command: Inside the loop, the smartctl -H $DEV command is executed for each drive device. This command fetches the health status of the drive and appends the result to the logfile.
- Log Separators: After each drive's health status is recorded, a separator line is added to improve readability in the logfile.
- Email the Report: Once all drives have been processed, the script uses cat to read the content of the logfile and then pipes it to the mail command with the "-s" flag to send the email with the SMART monitoring report to the specified email address.
By running this script regularly, perhaps as a scheduled task using a tool like cron, system administrators can maintain a watchful eye over the health of their drives without manual intervention. If any issues arise, the automated report will promptly notify them, enabling swift response and resolution.
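A scheduled script does not have to rely on reading the report text alone. smartctl also returns an exit status that is a bit mask, documented in the EXIT STATUS section of its manual page, and a script can decode it to decide whether an alert is needed. The sketch below paraphrases the meaning of each bit; describe_smartctl_status is a hypothetical helper name.

```shell
# describe_smartctl_status is a hypothetical helper. It prints one line
# for every bit set in a smartctl exit status ($1), or "ok" for zero.
# The messages paraphrase the EXIT STATUS section of smartctl(8).
describe_smartctl_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "ok"
        return 0
    fi
    bit=0
    for msg in \
        "command line did not parse" \
        "device open failed" \
        "a SMART command failed or a checksum error was found" \
        "SMART status reports DISK FAILING" \
        "prefail attributes at or below threshold" \
        "attributes were at or below threshold in the past" \
        "device error log contains errors" \
        "self-test log contains errors"
    do
        # Test bit number $bit of the status value
        if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
            echo "$msg"
        fi
        bit=$((bit + 1))
    done
}

# Typical use (needs smartmontools installed):
#   smartctl -H /dev/sda >/dev/null; describe_smartctl_status $?
```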
The smartctl command's seamless integration into automation scripts empowers administrators to tackle the challenges posed by large-scale systems. By automating the monitoring and reporting of hard drive health, time and effort are saved, while the reliability and performance of the system are upheld.
This approach exemplifies the power of technology in easing the burdens of system management and ensuring the stability of complex environments.
Virtual machines
When run inside virtual machines, smartctl will likely not report any SMART parameters, as most of the time these are not available. For instance, I tried running it on an Ubuntu guest in VirtualBox and the output looked like this:
$ sudo smartctl -a /dev/sda
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-5.19.0-43-generic] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     VBOX HARDDISK
Serial Number:    VBa61165f2-20eea9f5
Firmware Version: 1.0
User Capacity:    53,687,091,200 bytes [53.6 GB]
Sector Size:      512 bytes logical/physical
Device is:        Not in smartctl database 7.3/5319
ATA Version is:   ATA/ATAPI-6 published, ANSI INCITS 361-2002
Local Time is:    Fri Sep 1 16:19:28 2023 IST
SMART support is: Unavailable - device lacks SMART capability.

A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
$
As can be seen above, the command could report the disk size and some limited information about the disk itself, but no SMART information is available. This makes sense, since SMART data relates to the actual physical hardware of the drive, and in a virtualised environment the disk is emulated.
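A script that runs on both physical and virtual hosts can use the "SMART support is:" line to skip disks that have nothing to report. Here is a minimal sketch; smart_available is a hypothetical helper, and it assumes the ATA-style wording shown in the sample outputs. The NVMe outputs in this article do not print that line, so such devices would need a different check.

```shell
# smart_available is a hypothetical helper. It reads `smartctl -i`
# output on stdin and succeeds only if a line says that SMART support
# is "Available" (the "Unavailable" wording does not match).
smart_available() {
    grep -q '^SMART support is: *Available'
}

# Typical use (needs smartmontools installed):
#   if sudo smartctl -i /dev/sda | smart_available; then
#       sudo smartctl -H /dev/sda
#   fi
```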
Conclusion
The smartctl command, nestled within the smartmontools package, emerges as an indispensable tool in the arsenal of system administrators. Its ability to uncover intricate SMART attributes, execute tests, and enable automation equips administrators to proactively safeguard against potential drive failures.
By harnessing its multifaceted capabilities, administrators can ensure data integrity, minimize downtime, and fortify their systems against the perils of hard drive failures. The practical examples and screenshots provided in this guide serve as a stepping stone towards mastering the art of smartctl on Linux systems.