Wget Command
'Wget' is developed as part of the GNU Project. You can use it to download data and content from web servers. Its name is a combination of "World Wide Web" and the word "get".
It supports downloading over multiple protocols, including HTTP, HTTPS, FTP and FTPS.
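For instance, the same command works across protocols (the URLs below are purely illustrative):

$ wget https://example.com/archive.tar.gz
$ wget ftp://ftp.example.com/pub/archive.tar.gz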
Wget is written in C and can be used on any Unix system. It can also be compiled for macOS, Windows, AmigaOS and other popular operating systems.
Installing Wget
Most Linux distributions today ship with the wget package pre-installed. You can check whether wget is installed on your system by asking for its version (or by simply running wget without any options).
$ wget --version
GNU Wget 1.21.2 built on linux-gnu.
If your Linux machine does not have wget installed yet, run the appropriate command below to install it:
On Ubuntu/Debian distros:
$ sudo apt-get install wget
On CentOS/RHEL/Fedora distros:
$ sudo yum install wget
On Arch Linux distros:
$ sudo pacman -S wget
Basic Command Syntax
To check the syntax of wget, try the '--help' option:
$ wget --help
Running the above command gives us the following result:
GNU Wget 1.21.2, a non-interactive network retriever.
Usage: wget [OPTION]... [URL]...
Basic Usage Examples of the 'wget' Command
Below are some examples of the wget command that you will probably use every day. It's worth noting that these examples can also be incorporated into shell scripts or scheduled as cron jobs, allowing you to automate and streamline your workflow, as the minimal sketch below illustrates.
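As a minimal sketch (the script name here is hypothetical), a wget call can simply be wrapped in a shell script and run unattended:

#!/bin/sh
# nightly-fetch.sh (hypothetical name): quietly fetch the latest wget2 tarball.
# -q suppresses wget's progress output, which is just noise in automated runs.
wget -q http://ftp.gnu.org/gnu/wget/wget2-latest.tar.lz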
1. Download single file
In its simplest form, when used without any options, wget downloads the resource at the specified URL into the current directory (type 'pwd' to check your current directory):
$ wget http://ftp.gnu.org/gnu/wget/wget2-latest.tar.lz
Running the above command gives us the following result:
--2023-04-21 09:24:27-- http://ftp.gnu.org/gnu/wget/wget2-latest.tar.lz
Resolving ftp.gnu.org (ftp.gnu.org)... 209.51.188.20, 2001:470:142:3::b
Connecting to ftp.gnu.org (ftp.gnu.org)|209.51.188.20|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2059922 (2.0M) [application/x-tar]
Saving to: 'wget2-latest.tar.lz'
...
During the download, a progress bar is displayed along with the file name, size, download speed, and estimated time to complete. Once the process is complete, you can find the downloaded file in your current directory.
If a file with the same name already exists, wget automatically appends a numeric suffix (such as '.1') to the new file's name.
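For example, downloading the same file twice keeps the first copy and saves the second with a '.1' suffix:

$ wget http://ftp.gnu.org/gnu/wget/wget2-latest.tar.lz
$ wget http://ftp.gnu.org/gnu/wget/wget2-latest.tar.lz
$ ls
wget2-latest.tar.lz  wget2-latest.tar.lz.1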
2. Download multiple files
With a single command, you can specify multiple file URLs separated by spaces, and wget will download them all.
$ wget https://download.fedoraproject.org/pub/fedora/linux/releases/38/Workstation/aarch64/images/Fedora-Workstation-38-1.6.aarch64.raw.xz https://cdn.kernel.org/pub/linux/kernel/v6.x/linux-6.2.tar.xz
Running the above command gives us the following result:
--2023-04-21 13:38:39-- https://download.fedoraproject.org/pub/fedora/linux/releases/38/Workstation/aarch64/images/Fedora-Workstation-38-1.6.aarch64.raw.xz
Resolving download.fedoraproject.org (download.fedoraproject.org)... 13.125.120.8, 38.145.60.21, 13.233.183.170, ...
Connecting to download.fedoraproject.org (download.fedoraproject.org)|13.125.120.8|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://mirrors.tuna.tsinghua.edu.cn/fedora/releases/38/Workstation/aarch64/images/Fedora-Workstation-38-1.6.aarch64.raw.xz [following]
--2023-04-22 23:38:40-- https://mirrors.tuna.tsinghua.edu.cn/fedora/releases/38/Workstation/aarch64/images/Fedora-Workstation-38-1.6.aarch64.raw.xz
Resolving mirrors.tuna.tsinghua.edu.cn (mirrors.tuna.tsinghua.edu.cn)... 101.6.15.130, 2402:f000:1:400::2
Connecting to mirrors.tuna.tsinghua.edu.cn (mirrors.tuna.tsinghua.edu.cn)|101.6.15.130|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4080790616 (3.8G) [application/octet-stream]
Saving to: 'Fedora-Workstation-38-1.6.aarch64.raw.xz'
...
Alternatively, we can create a text file and put the download URLs in it.
The following command creates a file named 'link_URL.txt' and opens it in a text editor:
$ vi link_URL.txt
Then paste the download URLs into it in plain text, one per line:
http://ftp.gnu.org/gnu/wget/wget2-latest.tar.lz
https://download.fedoraproject.org/pub/fedora/linux/releases/38/Workstation/aarch64/images/Fedora-Workstation-38-1.6.aarch64.raw.xz
https://cdn.kernel.org/pub/linux/kernel/v6.x/linux-6.2.tar.xz
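If you prefer not to open an editor, the same file can also be created non-interactively, for example with printf:

$ printf '%s\n' \
    'http://ftp.gnu.org/gnu/wget/wget2-latest.tar.lz' \
    'https://cdn.kernel.org/pub/linux/kernel/v6.x/linux-6.2.tar.xz' > link_URL.txt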
You can then use the '-i' option to download all the files listed in that file:
$ wget -i link_URL.txt
Running the above command gives us the following result:
--2023-04-21 15:45:08-- http://ftp.gnu.org/gnu/wget/wget2-latest.tar.lz
Resolving ftp.gnu.org (ftp.gnu.org)... 209.51.188.20, 2001:470:142:3::b
Connecting to ftp.gnu.org (ftp.gnu.org)|209.51.188.20|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2059922 (2.0M) [application/x-tar]
....
3. Download file with speed limit
With wget, you can limit the download speed. This is useful when you are downloading a large file but don't want it to consume your entire internet bandwidth.
$ wget --limit-rate=100k http://mirrors.vhost.vn/centos/7.9.2009/isos/x86_64/CentOS-7-x86_64-NetInstall-2009.iso
Running the above command gives us the following result:
--2023-04-21 23:53:52-- http://mirrors.vhost.vn/centos/7.9.2009/isos/x86_64/CentOS-7-x86_64-NetInstall-2009.iso
Resolving mirrors.vhost.vn (mirrors.vhost.vn)... 103.27.60.115
Connecting to mirrors.vhost.vn (mirrors.vhost.vn)|103.27.60.115|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 602931200 (575M) [application/octet-stream]
Saving to: 'CentOS-7-x86_64-NetInstall-2009.iso'
-7-x86_64-NetInstall-2009.iso   0%[ ]  502.97K  100KB/s  eta 98m 4s
As you can see in the example above, the '--limit-rate=100k' option caps the download speed at 100 KB/s. Append 'k' for kilobytes, 'm' for megabytes, or 'g' for gigabytes.
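For example, to cap the same download at 2 megabytes per second instead:

$ wget --limit-rate=2m http://mirrors.vhost.vn/centos/7.9.2009/isos/x86_64/CentOS-7-x86_64-NetInstall-2009.iso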
4. Download files in background
For extremely large files, you can use the '-b' option. It runs the download process in the background and writes progress and status data to a 'wget-log' file in the current directory.
You can watch the status of the download with the tail command.
$ wget -b https://releases.ubuntu.com/22.04.2/ubuntu-22.04.2-desktop-amd64.iso
Running the above command gives us the following result:
Continuing in background, pid 4521.
Output will be written to 'wget-log'.
Next, we check the status of the download in progress.
$ tail -f wget-log
Running the above command gives us the following result:
207300K .......... .......... .......... .......... ..........  4% 57.7M 7m5s
207350K .......... .......... .......... .......... ..........  4% 61.6M 7m5s
207400K .......... .......... .......... .......... ..........  4% 17.5M 7m5s
207450K .......... .......... .......... .......... ..........  4%  113M 7m5s
207500K .......... .......... .......... .......... ..........  4% 27.5M 7m5s
207550K .......... .......... .......... .......... ..........  4% 15.6M 7m5s
207600K .......... .......... .......... .......... ..........  4% 90.2M 7m5s
207650K .......... .......... .......... .......... ..........  4% 19.8M 7m5s
207700K .......... .......... .......... .......... ..........  4%  112M 7m4s
...
5. Pause/Resume download
Wget supports a very useful option that allows you to resume downloads that were interrupted for some reason. Instead of starting the whole download from the beginning, the '-c' option resumes it from where it was interrupted.
For example, while the terminal window is showing the download progress of your file, press the following keyboard shortcut to pause (cancel) the download:
Ctrl + c
Next, resume it with the following command:
$ wget -c http://mirrors.vhost.vn/centos/7.9.2009/isos/x86_64/CentOS-7-x86_64-NetInstall-2009.iso
Running the above command gives us the following result:
--2023-04-24 17:58:42-- http://mirrors.vhost.vn/centos/7.9.2009/isos/x86_64/CentOS-7-x86_64-NetInstall-2009.iso
Resolving mirrors.vhost.vn (mirrors.vhost.vn)... 103.27.60.115
Connecting to mirrors.vhost.vn (mirrors.vhost.vn)|103.27.60.115|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 602931200 (575M) [application/octet-stream]
Saving to: 'CentOS-7-x86_64-NetInstall-2009.iso'
CentOS-7-x86_64-NetInstall-20   6%[==>  ]  34.80M  11.5MB/s  eta 47s
6. Save downloaded file with specific name
By default, wget guesses the name of the file from the download URL, taking the part after the last forward slash '/' as the filename for saving.
Using the '-O' option, we can specify the filename we want to use instead.
For example, we can download from 'http://ftp.gnu.org/gnu/wget/wget2-latest.tar.lz' and save it under the name 'wget_01.tar.lz' with the following command:
$ wget -O wget_01.tar.lz http://ftp.gnu.org/gnu/wget/wget2-latest.tar.lz
Running the above command gives us the following result:
--2023-04-24 18:46:01-- http://ftp.gnu.org/gnu/wget/wget2-latest.tar.lz
Resolving ftp.gnu.org (ftp.gnu.org)... 209.51.188.20, 2001:470:142:3::b
Connecting to ftp.gnu.org (ftp.gnu.org)|209.51.188.20|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2059922 (2.0M) [application/x-tar]
Saving to: 'wget_01.tar.lz'
wget_01.tar.lz  100%[===================================>]  1.96M  1.02MB/s  in 1.9s
2023-04-24 18:46:04 (1.02 MB/s) - 'wget_01.tar.lz' saved [2059922/2059922]
7. Save to compressed file with tar
We can also chain wget with the 'tar' command to download a file and compress it in a single command line. Note that piping wget into tar does not work here, since tar would start before the download finishes; joining the two commands with '&&' runs tar only after wget completes successfully (the commands below are run from the /home/jayce/Downloads directory):
$ wget http://centos-hcm.viettelidc.com.vn/7.9.2009/isos/x86_64/0_README.txt && tar -czvf note_file.tar.gz 0_README.txt
Running the above command gives us the following result:
--2023-04-24 20:48:07-- http://centos-hcm.viettelidc.com.vn/7.9.2009/isos/x86_64/0_README.txt
Resolving centos-hcm.viettelidc.com.vn (centos-hcm.viettelidc.com.vn)... 115.84.182.155
Connecting to centos-hcm.viettelidc.com.vn (centos-hcm.viettelidc.com.vn)|115.84.182.155|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2740 (2.7K) [text/plain]
Saving to: '0_README.txt'
0_README.txt  100%[============================>]  2.68K  --.-KB/s  in 0s
2023-04-24 20:48:07 (342 MB/s) - '0_README.txt' saved [2740/2740]
0_README.txt
Next, we list files to verify the result.
$ ls /home/jayce/Downloads
0_README.txt  note_file.tar.gz
8. Download and save files to specific directory
By default, the downloaded file will be saved in the current working directory. To save a file to a specific location, use the '-P' option:
$ wget -P /home/jayce/Downloads http://ftp.gnu.org/gnu/wget/wget2-latest.tar.lz
Running the above command gives us the following result:
--2023-04-24 20:03:46-- http://ftp.gnu.org/gnu/wget/wget2-latest.tar.lz
Resolving ftp.gnu.org (ftp.gnu.org)... 209.51.188.20, 2001:470:142:3::b
Connecting to ftp.gnu.org (ftp.gnu.org)|209.51.188.20|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2059922 (2.0M) [application/x-tar]
Saving to: '/home/jayce/Downloads/wget2-latest.tar.lz'
wget2-latest.tar.lz  100%[===================================>]  1.96M  1.10MB/s  in 1.8s
2023-04-24 20:03:49 (1.10 MB/s) - '/home/jayce/Downloads/wget2-latest.tar.lz' saved [2059922/2059922]
9. Change the user agent
When your browser connects to a website, it includes a 'User-Agent' field in its HTTP request header. The contents of this field vary between browsers: each browser has its own User-Agent string. Essentially, the User-Agent is how a client reports its software name and version to the remote web server.
Some websites only accept requests from certain user agents, so you may need to change the user agent to download files from such a site, using the '--user-agent' option.
You can check the current 'User-Agent' field with the '-d' (debug) option:
$ wget -d http://google.com
Running the above command gives us the following result:
DEBUG output created by Wget 1.21.2 on linux-gnu.
Reading HSTS entries from /home/jayce/.wget-hsts
URI encoding = 'UTF-8'
Converted file name 'index.html' (UTF-8) -> 'index.html' (UTF-8)
--2023-04-24 21:17:20-- http://google.com/
Resolving google.com (google.com)... 142.251.130.14, 2404:6800:4005:814::200e
Caching google.com => 142.251.130.14 2404:6800:4005:814::200e
Connecting to google.com (google.com)|142.251.130.14|:80... connected.
Created socket 3.
Releasing 0x0000558851a6d760 (new refcount 1).
---request begin---
GET / HTTP/1.1
Host: google.com
User-Agent: Wget/1.21.2
Accept: */*
Accept-Encoding: identity
Connection: Keep-Alive
---request end---
As you can see, the default 'User-Agent' field is 'Wget/1.21.2'. If you want to change it to another agent, such as a Mozilla-based browser string, try this command:
$ wget -d --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36" "http://google.com"
Running the above command gives us the following result:
Setting --user-agent (useragent) to Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36
DEBUG output created by Wget 1.21.2 on linux-gnu.
Reading HSTS entries from /home/jayce/.wget-hsts
URI encoding = 'UTF-8'
Converted file name 'index.html' (UTF-8) -> 'index.html' (UTF-8)
--2023-04-24 21:53:16-- http://google.com/
Resolving google.com (google.com)... 142.250.204.142, 2404:6800:4005:80f::200e
Caching google.com => 142.250.204.142 2404:6800:4005:80f::200e
Connecting to google.com (google.com)|142.250.204.142|:80... connected.
Created socket 3.
Releasing 0x000055cde2db88b0 (new refcount 1).
---request begin---
GET / HTTP/1.1
Host: google.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36
Accept: */*
Accept-Encoding: identity
Connection: Keep-Alive
---request end---
Advanced Usage of the wget Command
Now let's take a look at some more examples of using wget in conjunction with other utilities to perform more complicated tasks.
1. Scheduling downloads at a specific time
Suppose you need to download files at 5:00 AM every day. Because wget itself doesn't have scheduling capabilities, in this example we combine it with crontab, a time-based job scheduler in Linux used to schedule jobs (commands or shell scripts) to run periodically at fixed times, dates, or intervals.
Let's take a look at the crontab command:
crontab -e    Edit your crontab file, or create one if it doesn't already exist.
crontab -l    List your cron jobs (display the crontab file contents).
crontab -r    Remove your crontab file.
To create a crontab entry, follow the file's format, which consists of these fields: minute (M), hour (H), day of month (DOM), month (MON), day of week (DOW), and the command (COMMAND) to execute.
M H DOM MON DOW COMMAND

Field    Description     Allowed values
M        Minute field    0-59, or '*' (for 'any')
H        Hour field      0-23, or '*' (for 'any')
DOM      Day of month    1-31, or '*' (for 'any')
MON      Month field     1-12, or '*' (for 'any')
DOW      Day of week     0-6, or '*' (for 'any')
COMMAND  Command         Any command to be executed
So, based on our example requirement, we add the crontab entry below:
0 5 * * * wget http://ftp.gnu.org/gnu/wget/wget2-latest.tar.lz
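In practice, you may also want the scheduled job to run quietly and keep its own log; one possible variant (the log path is illustrative) is:

0 5 * * * wget -q -P /home/jayce/Downloads http://ftp.gnu.org/gnu/wget/wget2-latest.tar.lz >> /var/log/wget-cron.log 2>&1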
Once you've added the entry, you can restart 'cron.service' and check its status to ensure that the new job has been picked up by the scheduler.
# systemctl restart cron.service
# systemctl status cron.service
Running the above command gives us the following result:
● cron.service - Regular background program processing daemon
     Loaded: loaded (/lib/systemd/system/cron.service; enabled; vendor preset: enabled)
     Active: active (running) since Sun 2023-04-21 13:34:32 +07; 3s ago
       Docs: man:cron(8)
   Main PID: 6485 (cron)
      Tasks: 1 (limit: 4568)
     Memory: 408.0K
        CPU: 3ms
     CGroup: /system.slice/cron.service
             └─6485 /usr/sbin/cron -f -P
Apr 21 13:34:32 UBUNTU-SRV01 systemd[1]: Started Regular background program processing daemon.
Apr 21 13:34:33 UBUNTU-SRV01 cron[6485]: (CRON) INFO (pidfile fd = 3)
Apr 21 13:34:33 UBUNTU-SRV01 cron[6485]: (CRON) INFO (Skipping @reboot jobs -- not system startup)
2. Monitoring website changes
You can also use wget to monitor websites for changes, either manually or as part of a script. In the example below, we use wget to download a page of a website and then compare the downloaded files to see whether anything has changed.
First, we download the web data using wget. We are going to watch for changes in the EUR-USD price on the Google Finance website:
# wget https://www.google.com/finance/quote/EUR-USD
Running the above command gives us the following result:
--2023-04-21 14:35:34-- https://www.google.com/finance/quote/EUR-USD
Resolving www.google.com (www.google.com)... 142.251.220.36, 2404:6800:4005:81c::2004
Connecting to www.google.com (www.google.com)|142.251.220.36|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: 'EUR-USD'
EUR-USD  [ ]  1.02M  2.26MB/s  in 0.5s
2023-04-21 14:35:35 (2.26 MB/s) - 'EUR-USD' saved [1070042]
Then, we rename the file with the prefix "Previous" as follows:
# mv EUR-USD Previous_EUR-USD
After downloading the file, we use the shell script below. It uses the 'cmp' command to compare the newly downloaded file with the previous version to check for any changes.
#!/bin/sh
# Download the latest copy of the page
wget https://www.google.com/finance/quote/EUR-USD
Log_File=/root/log-monitoring-web-changes/log.txt
# Compare the new download with the previous copy (cmp -s is silent)
if cmp -s Previous_EUR-USD EUR-USD; then
    echo "`date`" ": EUR-USD Price not changed." >> $Log_File
else
    echo "`date`" ": EUR-USD Price changed." >> $Log_File
fi
# Replace the previous copy with the new download
rm Previous_EUR-USD
mv EUR-USD Previous_EUR-USD
To verify that, we can run a test:
root@UBUNTU-SRV01:~# ./compare.sh
Running the above command gives us the following result:
--2023-04-21 15:11:15-- https://www.google.com/finance/quote/EUR-USD
Resolving www.google.com (www.google.com)... 142.250.66.100, 2404:6800:4005:813::2004
Connecting to www.google.com (www.google.com)|142.250.66.100|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: 'EUR-USD'
EUR-USD  [ ]  1.02M  1.82MB/s  in 0.6s
2023-04-21 15:11:16 (1.82 MB/s) - 'EUR-USD' saved [1071314]
Checking the log is an essential step to ensure the script is working correctly. It records the date/time and the result of each check:
root@UBUNTU-SRV01:~# cat log-monitoring-web-changes/log.txt
Sun Apr 21 03:05:18 PM +07 2023 : EUR-USD Price not changed.
Sun Apr 21 03:11:16 PM +07 2023 : EUR-USD Price changed.
To automate the script to run hourly or daily, you can add it to either the /etc/cron.hourly or /etc/cron.daily directory, depending on your desired frequency.
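For example, assuming the script above is saved at /root/compare.sh, it could be installed as a daily job like this (the destination filename is arbitrary; note that run-parts skips names containing dots):

# cp /root/compare.sh /etc/cron.daily/compare-eurusd
# chmod +x /etc/cron.daily/compare-eurusd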
Troubleshooting with wget
The wget command is not limited to downloading; it can also be used to troubleshoot issues in daily operations.
For instance, network or connection timeout problems can be diagnosed with the wget command. These are among the most common issues, caused by a variety of factors including network congestion, server problems, or site availability.
wget provides several options that help you retry connections refused by the server or adjust timeouts, such as '--retry-connrefused', '--timeout', '--tries', and '--waitretry'.
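A combined invocation might look like the following sketch, where '--tries' caps the number of attempts, '--waitretry' sets the pause between retries, and '--timeout' bounds each network operation (the URL and values are illustrative):

$ wget --retry-connrefused --tries=3 --waitretry=5 --timeout=15 http://example.com/file.iso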
For example, we test the HTTP connection to the website "medium.com":
# wget --retry-connrefused --timeout=15 http://medium.com/
Running the above command gives us the following result:
--2023-04-21 16:11:21-- http://medium.com/
Resolving medium.com (medium.com)... ::1, 127.0.0.1
Connecting to medium.com (medium.com)|::1|:80... failed: Connection refused.
Connecting to medium.com (medium.com)|127.0.0.1|:80... failed: Connection refused.
Retrying.
--2023-04-21 16:11:22-- (try: 2) http://medium.com/
Connecting to medium.com (medium.com)|::1|:80... failed: Connection refused.
Connecting to medium.com (medium.com)|127.0.0.1|:80... failed: Connection refused.
Retrying.
--2023-04-21 16:11:24-- (try: 3) http://medium.com/
Connecting to medium.com (medium.com)|::1|:80... failed: Connection refused.
Connecting to medium.com (medium.com)|127.0.0.1|:80... failed: Connection refused.
Retrying.
Another example:
# wget --retry-connrefused --timeout=15 http://google.com/
Running the above command gives us the following result:
--2023-04-21 16:13:10-- http://google.com/
Resolving google.com (google.com)... 142.250.199.78, 2404:6800:4005:804::200e
Connecting to google.com (google.com)|142.250.199.78|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://www.google.com/ [following]
--2023-04-21 16:13:11-- http://www.google.com/
Resolving www.google.com (www.google.com)... 142.250.207.68, 2404:6800:4005:80d::2004
Connecting to www.google.com (www.google.com)|142.250.207.68|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: 'index.html'
index.html  [ ]  15.98K  --.-KB/s  in 0.06s
2023-04-21 16:13:11 (290 KB/s) - 'index.html' saved [16363]
Conclusion
The wget command is a powerful tool for downloading files from the web, whether it's a single file or an entire website. Moreover, its ability to automate tasks and integrate with scripts makes it an indispensable tool for Linux users and administrators.
There are also other commands, like curl, that can be used to download web content on the command line.
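For instance, a roughly equivalent curl invocation for the single-file example above would be ('-O' tells curl to keep the remote filename):

$ curl -O http://ftp.gnu.org/gnu/wget/wget2-latest.tar.lz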
Let us know your comments below!