wget Tips and Tricks

Some useful wget features:

  1. Filter downloads by file extension using -A (accept) and -R (reject):
    wget -r -A.pdf http://example.com/
    wget -r -R.html http://example.com/

    Both commands recursively traverse http://example.com/. The first keeps only PDF files; the second keeps everything except HTML files.
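
    The accept list can also hold several comma-separated suffixes, and it is often combined with a depth limit. The following is a minimal sketch, not a definitive recipe: fetch_by_ext is a hypothetical helper name, the depth of 2 and the /docs/ path are assumed values, and setting DRY_RUN=1 only prints the command instead of contacting the site.

```shell
#!/bin/sh
# fetch_by_ext is a hypothetical helper, not part of wget.
# Usage: fetch_by_ext URL EXT[,EXT...]
fetch_by_ext() {
    url=$1
    exts=$2
    # -r    recurse through links
    # -l 2  limit recursion to two levels (assumed value)
    # -np   never ascend to the parent directory
    # -A    comma-separated list of accepted suffixes
    set -- wget -r -l 2 -np -A "$exts" "$url"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"    # show the command without touching the network
    else
        "$@"
    fi
}

DRY_RUN=1
fetch_by_ext http://example.com/docs/ pdf,zip
# prints: wget -r -l 2 -np -A pdf,zip http://example.com/docs/
```

    Unset DRY_RUN to run the download for real. During a recursive run wget still fetches HTML pages to discover links and deletes the ones that do not match the accept list.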

  2. Check for broken links on a site:
    wget --spider -r -o log.txt http://example.com

    This makes wget behave like a web spider: it recursively follows every link on the site without keeping the pages, and writes the results to log.txt.
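
    Once the run finishes, the broken URLs can be pulled out of the log. This sketch assumes the log ends with an English-locale summary of the form "Found N broken links." followed by one URL per line and then a "FINISHED" line; that layout is an assumption and may differ between wget versions and locales. list_broken is a hypothetical helper name.

```shell
#!/bin/sh
# list_broken is a hypothetical helper, not part of wget.
# Assumed log layout (English locale):
#   Found N broken links.
#   <blank line>
#   one URL per line
#   <blank line>
#   FINISHED --<timestamp>--
list_broken() {
    awk '
        /^Found [0-9]+ broken links?\.$/ { on = 1; next }  # summary starts
        /^FINISHED/                      { on = 0 }        # summary ends
        on && NF                         { print }         # non-empty URL lines
    ' "$1"
}

# Only parse the log if the spider run above has produced it.
if [ -f log.txt ]; then
    list_broken log.txt
fi
```

    If the summary is missing, the helper prints nothing, so an empty result is not proof that every link is valid; check the log itself in that case.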

  3. Download files that require cookies (e.g., Sun JDK) without a browser:
    wget --header='Cookie: gpw_e24=<VALUE_OF_COOKIE>' '<DOWNLOAD_LINK>'

    • <VALUE_OF_COOKIE>: the value of the gpw_e24 cookie, which confirms acceptance of the license agreement.
    • <DOWNLOAD_LINK>: the URL of the file required by emerge.

    To choose where the file is saved, pass a path and filename with -O, or change to the target directory first; in that case wget saves the file under the name taken from the URL.
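
    The cookie download and the choice of destination can be combined in a small script. This is a sketch under stated assumptions: cookie_header is a hypothetical helper, /tmp/downloads is an assumed directory, and the two placeholders are kept exactly as in the text, so the script refuses to call wget until they are replaced.

```shell
#!/bin/sh
# Placeholders from the text above; replace them before running.
cookie_value='<VALUE_OF_COOKIE>'
download_link='<DOWNLOAD_LINK>'

# cookie_header is a hypothetical helper that formats the request header.
cookie_header() {
    printf 'Cookie: gpw_e24=%s' "$1"
}

case "$cookie_value$download_link" in
    *'<'*)
        # Placeholders are still present, so do not contact the server.
        echo "fill in cookie_value and download_link first" >&2
        ;;
    *)
        # -P saves into the given directory and keeps the name from the URL.
        # /tmp/downloads is an assumed directory.
        wget --header="$(cookie_header "$cookie_value")" \
             -P /tmp/downloads "$download_link"
        # Alternative: replace -P with -O and a full path to pick the
        # file name yourself.
        ;;
esac
```

    -P keeps the name from the URL, which helps when another tool looks for the file under that exact name; -O suits cases where the URL ends in something unhelpful, such as a query string.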