Recursive Retrieval Options


Turn on recursive retrieving
--recursive
	 Turn on recursive retrieving.

Recursion depth
--level=depth
	 Specify the maximum recursion depth, depth.  The
	 default maximum depth is 5.
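
	 As a purely illustrative example (the URL is only a
	 placeholder), the following descends at most three
	 levels from the starting page during a recursive
	 retrieval:

					 wget -r -l 3 http://<site>/index.html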

Delete files after download
--delete-after
	 This option tells Wget to delete every single file it
	 downloads, after having done so.  It is useful for
	 pre-fetching popular pages through a proxy, e.g.:

					 wget -r -nd --delete-after http://whatever.com/~popular/page/

	 The -r option is to retrieve recursively, and -nd to
	 not create directories.

	 Note that --delete-after deletes files on the local
	 machine.  It does not issue the DELE command to remote
	 FTP sites, for instance.  Also note that when
	 --delete-after is specified, --convert-links is
	 ignored, so .orig files are simply not created in the
	 first place.

Convert links to local paths
--convert-links
	 After the download is complete, convert the links in
	 the document to make them suitable for local viewing.
	 This affects not only the visible hyperlinks, but any
	 part of the document that links to external content,
	 such as embedded images, links to style sheets,
	 hyperlinks to non-HTML content, etc.

	 Each link will be changed in one of two ways:

	 o   The links to files that have been downloaded by
			 Wget will be changed to refer to the file they
			 point to as a relative link.

			 Example: if the downloaded file /foo/doc.html
			 links to /bar/img.gif, also downloaded, then the
			 link in doc.html will be modified to point to
			 ../bar/img.gif.  This kind of transformation works
			 reliably for arbitrary combinations of directories.

	 o   The links to files that have not been downloaded
			 by Wget will be changed to include host name and
			 absolute path of the location they point to.

			 Example: if the downloaded file /foo/doc.html
			 links to /bar/img.gif (or to ../bar/img.gif), then
			 the link in doc.html will be modified to point to
			 http://hostname/bar/img.gif.

	 Because of this, local browsing works reliably: if a
	 linked file was downloaded, the link will refer to its
	 local name; if it was not downloaded, the link will
	 refer to its full Internet address rather than
	 presenting a broken link.  The fact that the former links
	 are converted to relative links ensures that you can
	 move the downloaded hierarchy to another directory.

	 Note that only at the end of the download can Wget
	 know which links have been downloaded.  Because of
	 that, the work done by -k will be performed at the end
	 of all the downloads.
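
	 As a sketch of typical usage (placeholder URL), a site
	 can be retrieved recursively and have its links
	 rewritten for local viewing with:

					 wget -r -k http://<site>/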

Backup original file before conversion
--backup-converted
	 When converting a file, back up the original version
	 with a .orig suffix.  Affects the behavior of -N.
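
	 A common combination, shown here with a placeholder
	 URL, keeps the unconverted originals (-K) alongside
	 the converted copies (-k) so that later timestamped
	 runs (-N) still compare against the pristine files:

					 wget -r -N -k -K http://<site>/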

Update local site (mirror)
--mirror
	 Turn on options suitable for mirroring.  This option
	 turns on recursion and time-stamping, sets infinite
	 recursion depth and keeps FTP directory listings.  It
	 is currently equivalent to -r -N -l inf -nr.
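
	 In other words, the two commands below (with a
	 placeholder URL) should behave identically:

					 wget --mirror http://<site>/
					 wget -r -N -l inf -nr http://<site>/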

Get all files (images, sounds, etc.) 
--page-requisites
	 This option causes Wget to download all the files that
	 are necessary to properly display a given HTML page.
	 This includes such things as inlined images, sounds,
	 and referenced stylesheets.

	 Ordinarily, when downloading a single HTML page, any
	 requisite documents that may be needed to display it
	 properly are not downloaded.  Using -r together with
	 -l can help, but since Wget does not ordinarily
	 distinguish between external and inlined documents, one
	 is generally left with ``leaf documents'' that are
	 missing their requisites.

	 For instance, say document 1.html contains an `<IMG>'
	 tag referencing 1.gif and an `<A HREF>' tag pointing to
	 external document 2.html.  Say that 2.html is similar
	 but that its image is 2.gif and it links to 3.html.
	 Say this continues up to some arbitrarily high number.

	 If one executes the command:

					 wget -r -l 2 http://<site>/1.html

	 then 1.html, 1.gif, 2.html, 2.gif, and 3.html will be
	 downloaded.  As you can see, 3.html is without its
	 requisite 3.gif because Wget is simply counting the
	 number of hops (up to 2) away from 1.html in order to
	 determine where to stop the recursion.  However, with
	 this command:

					 wget -r -l 2 -p http://<site>/1.html

	 all the above files and 3.html's requisite 3.gif will
	 be downloaded.  Similarly,

					 wget -r -l 1 -p http://<site>/1.html

	 will cause 1.html, 1.gif, 2.html, and 2.gif to be
	 downloaded.  One might think that:

					 wget -r -l 0 -p http://<site>/1.html

	 would download just 1.html and 1.gif, but unfortunately
	 this is not the case, because -l 0 is equivalent to
	 -l inf---that is, infinite recursion.  To download a
	 single HTML page (or a handful of them, all specified
	 on the commandline or in a -i URL input file) and its
	 (or their) requisites, simply leave off -r and -l:

					 wget -p http://<site>/1.html

	 Note that Wget will behave as if -r had been specified,
	 but only that single page and its requisites
	 will be downloaded.  Links from that page to external
	 documents will not be followed.  Actually, to download
	 a single page and all its requisites (even if they
	 exist on separate websites), and make sure the lot
	 displays properly locally, this author likes to use a
	 few options in addition to -p:

					 wget -E -H -k -K -nh -p http://<site>/<document>

	 In one case you'll need to add a couple more options.
	 If the document is a `<FRAMESET>' page, the "one more
	 hop" that -p gives you won't be enough---you'll get the
	 `<FRAME>' pages that are referenced, but you won't get
	 their requisites.  Therefore, in this case you'll need
	 to add -r -l1 to the commandline.  The -r -l1 will
	 recurse from the `<FRAMESET>' page to the `<FRAME>'
	 pages, and the -p will get their requisites.  If
	 you're already using a recursion level of 1 or more,
	 you'll need to up it by one.  In the future, -p may be
	 made smarter so that it'll do "two more hops" in the
	 case of a `<FRAMESET>' page.

	 To finish off this topic, it's worth knowing that
	 Wget's idea of an external document link is any URL
	 specified in an `<A>' tag, an `<AREA>' tag, or a
	 `<LINK>' tag other than `<LINK REL="stylesheet">'.

Recursive Accept/Reject Options

Accept files with extensions
-A acclist --accept acclist

Do not accept files with extensions
-R rejlist --reject rejlist
	 Specify comma-separated lists of file name suffixes or
	 patterns to accept or reject.
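
	 For illustration (placeholder URL), the first command
	 below keeps only JPEG and PNG files, while the second
	 rejects archives during an otherwise normal recursive
	 retrieval:

					 wget -r -A jpg,jpeg,png http://<site>/gallery/
					 wget -r -R "*.tar.gz,*.zip" http://<site>/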

Accept files from domains
--domains=domain-list
	 Set domains to be accepted and DNS looked-up, where
	 domain-list is a comma-separated list.  Note that it
	 does not turn on -H.  This option speeds things up,
	 even if only one host is spanned.
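
	 Since -D does not imply -H, a sketch of typical usage
	 (placeholder domains and URL) combines the two, so
	 that the retrieval may span hosts but only within the
	 listed domains:

					 wget -r -H -Dfoo.com,bar.com http://<site>/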

Do not accept files from domains
--exclude-domains domain-list
	 Exclude the domains given in a comma-separated domain-
	 list from DNS-lookup.
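
	 For example (placeholder domains again), the following
	 spans hosts within foo.edu while skipping one server
	 there:

					 wget -r -H -Dfoo.edu --exclude-domains sunsite.foo.edu http://<site>/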

Follow FTP links in HTML documents
--follow-ftp
	 Follow FTP links from HTML documents.  Without this
	 option, Wget will ignore all the FTP links.
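
	 For example (placeholder URL), the following recursive
	 retrieval will also descend into FTP links found in
	 the HTML pages:

					 wget -r --follow-ftp http://<site>/downloads.html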

Follow HTML tags in HTML files
--follow-tags=list
	 Wget has an internal table of HTML tag / attribute
	 pairs that it considers when looking for linked
	 documents during a recursive retrieval.  If a user wants
	 only a subset of those tags to be considered, however,
	 he or she should specify such tags in a comma-separated
	 list with this option.
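
	 For instance, to consider only `<A>' and `<AREA>' tags
	 while recursing (placeholder URL):

					 wget -r --follow-tags=a,area http://<site>/<document>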

Do not follow HTML tags in HTML files
--ignore-tags=list
	 This is the opposite of the --follow-tags option.  To
	 skip certain HTML tags when recursively looking for
	 documents to download, specify them in a comma-separated
	 list.

	 In the past, the -G option was the best bet for
	 downloading a single page and its requisites, using a
	 commandline like:

					 wget -Ga,area -H -k -K -nh -r http://<site>/<document>

	 However, the author of this option came across a page
	 with tags like `<LINK REL="home" HREF="/">' and came
	 to the realization that -G was not enough.  One can't
	 just tell Wget to ignore `<LINK>', because then
	 stylesheets will not be downloaded.  Now the best bet
	 for downloading a single page and its requisites is
	 the dedicated --page-requisites option.

Go to foreign hosts for files 
--span-hosts
	 Enable spanning across hosts when doing recursive
	 retrieving.
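
	 For example (placeholder URL), a single page together
	 with its requisites can be fetched even when some of
	 them live on other hosts:

					 wget -p -H -k http://<site>/<document>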

Follow relative links only
--relative
	 Follow relative links only.  Useful for retrieving a
	 specific home page without any distractions, not even
	 those from the same hosts.
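
	 For instance (placeholder URL), the following follows
	 only relative links from the start page:

					 wget -r -L http://<site>/index.html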

Follow dir. when downloading
--include-directories=list
	 Specify a comma-separated list of directories you wish
	 to follow when downloading.  Elements of list may
	 contain wildcards.
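
	 For example (placeholder URL and directories, -I being
	 the short form of this option), the following restricts
	 a recursive retrieval to two directory trees:

					 wget -r -I /docs,/images http://<site>/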

Do not follow dir. when downloading
--exclude-directories=list
	 Specify a comma-separated list of directories you wish
	 to exclude from download.  Elements of list may
	 contain wildcards.
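
	 Conversely (placeholder URL and directories, -X being
	 the short form), the following skips two directory
	 trees:

					 wget -r -X /cgi-bin,/tmp http://<site>/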

No DNS lookup for hosts
--no-host-lookup
	 Disable the time-consuming DNS lookup of almost all
	 hosts.

Get files below the target dir.
--no-parent
	 Do not ever ascend to the parent directory when
	 retrieving recursively.  This is a useful option,
	 since it guarantees that only the files below a
	 certain hierarchy will be downloaded.
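
	 For example (placeholder URL, -np being the short
	 form), the following downloads everything below the
	 papers/ directory without ever wandering up into the
	 rest of the site:

					 wget -r -np http://<site>/~user/papers/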