Add .html extension to downloaded files
--html-extension
If a file of type text/html is downloaded and the URL does not end with the regexp \.[Hh][Tt][Mm][Ll]?, this option will cause the suffix .html to be appended to the local filename. This is useful, for instance, when you're mirroring a remote site that uses .asp pages, but you want the mirrored pages to be viewable on your stock Apache server. Another good use for this is when you're downloading the output of CGIs. A URL like http://site.com/article.cgi?25 will be saved as article.cgi?25.html. Note that filenames changed in this way will be re-downloaded every time you re-mirror a site, because Wget can't tell that the local X.html file corresponds to remote URL X (since it doesn't yet know that the URL produces output of type text/html). To prevent this re-downloading, you must use -k and -K so that the original version of the file will be saved as X.orig.

(** Not implemented in interface **)
--http-user=user
--http-passwd=password
Specify the username user and password password on an HTTP server. According to the type of the challenge, Wget will encode them using either the `basic' (insecure) or the `digest' authentication scheme. Another way to specify username and password is in the URL itself. For more information about security issues with Wget, see the ``Security Considerations'' section of the Wget manual.

Get files from remote site rather than site's cache
--cache=on/off
When set to off, disable server-side cache. In this case, Wget will send the remote server an appropriate directive (Pragma: no-cache) to get the file from the remote service, rather than returning the cached version. This is especially useful for retrieving and flushing out-of-date documents on proxy servers. Caching is allowed by default.

Disable the use of cookies
--cookies=on/off
When set to off, disable the use of cookies. Cookies are a mechanism for maintaining server-side state. The server sends the client a cookie using the `Set-Cookie' header, and the client responds with the same cookie upon further requests. Since cookies allow the server owners to keep track of visitors and for sites to exchange this information, some consider them a breach of privacy. The default is to use cookies; however, storing cookies is not on by default.

Load cookies from
--load-cookies file
Load cookies from file before the first HTTP retrieval. The format of file is the one used by Netscape and Mozilla, at least in their Unix versions.

Save cookies to
--save-cookies file
Save cookies to file at the end of the session. Cookies whose expiry time is not specified, or those that have already expired, are not saved. (A combined example of saving and loading cookies appears after the --header entry below.)

Ignore 'Content-Length' header field
--ignore-length
Unfortunately, some HTTP servers (CGI programs, to be more precise) send out bogus `Content-Length' headers, which makes Wget go wild, as it thinks not all the document was retrieved. You can spot this syndrome if Wget retries getting the same document again and again, each time claiming that the (otherwise normal) connection has closed on the very same byte. With this option, Wget will ignore the `Content-Length' header---as if it never existed.

Define additional headers
--header=additional-header
Define an additional-header to be passed to the HTTP servers. Headers must contain a `:' preceded by one or more non-blank characters, and must not contain newlines. You may define more than one additional header by specifying --header more than once.

    wget --header='Accept-Charset: iso-8859-2' \
         --header='Accept-Language: hr' \
         http://fly.srk.fer.hr/

Specification of an empty string as the header value will clear all previous user-defined headers.
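As a combined illustration of the cookie options described above, one run might save the cookies a site sets and a later run reuse them. The URLs and the cookies.txt file name below are placeholders, not taken from the manual:

    wget --save-cookies cookies.txt http://www.example.com/login.cgi
    wget --load-cookies cookies.txt http://www.example.com/members/archive.html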
(** Not implemented in interface **)
--proxy-user=user
--proxy-passwd=password
Specify the username user and password password for authentication on a proxy server. Wget will encode them using the `basic' authentication scheme.

Include 'Referer: url' header
--referer=url
Include `Referer: url' header in HTTP request. Useful for retrieving documents with server-side processing that assume they are always being retrieved by interactive web browsers and only come out properly when Referer is set to one of the pages that point to them.

Save headers
--save-headers
Save the headers sent by the HTTP server to the file, preceding the actual contents, with an empty line as the separator.

Identify as agent-string
--user-agent=agent-string
Identify as agent-string to the HTTP server. The HTTP protocol allows the clients to identify themselves using a `User-Agent' header field. This enables distinguishing the WWW software, usually for statistical purposes or for tracing of protocol violations. Wget normally identifies as Wget/version, version being the current version number of Wget. However, some sites have been known to impose the policy of tailoring the output according to the `User-Agent'-supplied information. While conceptually this is not such a bad idea, it has been abused by servers denying information to clients other than `Mozilla' or Microsoft `Internet Explorer'. This option allows you to change the `User-Agent' line issued by Wget. Use of this option is discouraged, unless you really know what you are doing.

FTP Options

Do not remove .listing files
--dont-remove-listing
Don't remove the temporary .listing files generated by FTP retrievals. Normally, these files contain the raw directory listings received from FTP servers. Not removing them can be useful for debugging purposes, or when you want to be able to easily check on the contents of remote server directories (e.g. to verify that a mirror you're running is complete).

Note that even though Wget writes to a known filename for this file, this is not a security hole in the scenario of a user making .listing a symbolic link to /etc/passwd or something and asking `root' to run Wget in his or her directory. Depending on the options used, either Wget will refuse to write to .listing, making the globbing/recursion/time-stamping operation fail, or the symbolic link will be deleted and replaced with the actual .listing file, or the listing will be written to a .listing.number file.

Even so, `root' should never run Wget in a non-trusted user's directory. A user could do something as simple as linking index.html to /etc/passwd and asking `root' to run Wget with -N or -r so the file will be overwritten.

Turn off wildcards in file names
--glob=on/off
Turn FTP globbing on or off. Globbing means you may use the shell-like special characters (wildcards), like *, ?, [ and ] to retrieve more than one file from the same directory at once, like:

    wget ftp://gnjilux.srk.fer.hr/*.msg

By default, globbing will be turned on if the URL contains a globbing character. This option may be used to turn globbing on or off permanently. You may have to quote the URL to protect it from being expanded by your shell. Globbing makes Wget look for a directory listing, which is system-specific. This is why it currently works only with Unix FTP servers (and the ones emulating Unix `ls' output).
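For instance, quoting the wildcard URL keeps the shell from expanding it before Wget sees it. The host name below is illustrative only:

    wget --glob=on 'ftp://ftp.example.com/pub/*.msg'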
Use passive FTP retrieval
--passive-ftp
Use the passive FTP retrieval scheme, in which the client initiates the data connection. This is sometimes required for FTP to work behind firewalls.

Retrieve files from symbolic links
--retr-symlinks
Usually, when retrieving FTP directories recursively and a symbolic link is encountered, the linked-to file is not downloaded. Instead, a matching symbolic link is created on the local filesystem. The pointed-to file will not be downloaded unless this recursive retrieval would have encountered it separately and downloaded it anyway.

When --retr-symlinks is specified, however, symbolic links are traversed and the pointed-to files are retrieved. At this time, this option does not cause Wget to traverse symlinks to directories and recurse through them, but in the future it should be enhanced to do this.

Note that when retrieving a file (not a directory) because it was specified on the command line, rather than because it was recursed to, this option has no effect. Symbolic links are always traversed in this case.
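As a sketch of how the two FTP options above might be combined for a recursive retrieval from behind a firewall, assuming the standard -r recursion flag and an illustrative host name:

    wget -r --passive-ftp --retr-symlinks ftp://ftp.example.com/pub/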