How to Properly Mirror a Webpage (using wget)

2011.11.07 | Web Development

When you’re building a front-end component for a client’s site, it’s often useful to grab a working copy of the page in question so you can test whether your code will work within their site.

The way most people (try to) do it is by navigating to the page and choosing “File -> Save As…”. However, this rarely results in a complete, working page. Images referenced from CSS and JavaScript interactions frequently fail in the downloaded copy. Additionally, some browsers rewrite the HTML itself, producing a version of the page that doesn’t exactly match the original. Finally, saving often moves the page’s files to new locations, which can break scripts and some images that expect the original paths.

Solution? Use wget. After installing wget, use the following command:

$ wget -E -H -k -K -p http://example.com/path/to/page
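
For reference, here is the same command written with wget’s long option names, with a note on what each flag does. This is only a sketch: `mirror_page` is a hypothetical helper name, not part of wget, and older wget releases (before 1.12) spell `--adjust-extension` as `--html-extension`.

```shell
# Long-form equivalent of: wget -E -H -k -K -p URL
#   --adjust-extension  (-E) save HTML/CSS files with a .html/.css suffix if missing
#   --span-hosts        (-H) allow fetching requisites hosted on other domains
#   --convert-links     (-k) rewrite links in saved files to point at the local copies
#   --backup-converted  (-K) keep the original of each converted file with a .orig suffix
#   --page-requisites   (-p) download the images, stylesheets, and scripts the page needs

# Hypothetical helper: mirror a single page (URL in $1) into the current directory.
mirror_page() {
  wget --adjust-extension --span-hosts --convert-links \
       --backup-converted --page-requisites "$1"
}
```

Usage: `mirror_page http://example.com/path/to/page` runs exactly the short-flag command above.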
 

This creates a local mirror of the page plus all the external elements it needs (images, stylesheets, scripts), with links rewritten to point at the local copies. You’ll end up with a folder called “example.com”, and possibly others if the page references files on other domains. Drill down through the directory structure to your file (in this example, you’d find your page in the example.com/path/to/ directory; because of -E, it may be saved with an added .html extension).
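
If you mirror pages often, a small helper can work out where wget put a given page. This is a hypothetical sketch (`local_copy_path` is not part of wget) that assumes the one-folder-per-host layout described above, a URL without a query string, and wget’s default of saving directory URLs as `index.html`.

```shell
# Hypothetical helper: print the local path of a page mirrored with
#   wget -E -H -k -K -p URL
# when run from the directory wget was run in. Assumes no query string.
local_copy_path() {
  page_path=${1#*://}                       # drop the scheme: example.com/path/to/page
  case $page_path in
    */*) ;;                                 # URL already has a path component
    *) page_path="$page_path/" ;;           # bare host: treat as the site root
  esac
  case $page_path in
    */) page_path="${page_path}index.html" ;;  # directory URLs are saved as index.html
  esac
  # -E appends .html to HTML pages whose name lacks it
  if [ ! -e "$page_path" ] && [ -e "$page_path.html" ]; then
    page_path="$page_path.html"
  fi
  printf '%s\n' "$page_path"
}
```

For example, once wget has saved the page with the added suffix, `local_copy_path http://example.com/path/to/page` prints `example.com/path/to/page.html`, which you can then open in your browser.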
