Accessing Extinct Websites
A user clicks on a favorite or bookmark, and the dreaded “Error 404 Not Found” message appears in the browser window. If the URL is correct and the site truly existed in the past, there is hope: the data may still be viewable.
The web is always changing. Perhaps the desired webpage has changed since the last visit. Just three days ago the site contained vital information for completing a project, or a quote for a report. Now it has been replaced with new content, a new design, or a new webmaster. This site too may be recoverable.
Perhaps the problem is as simple as a server that has gone down, with service expected back soon. But the information from the site is needed now, not 3 hours or 3 days from now. The data from this site may also be retrievable before the server is up and running again.
Note: For the sake of brevity, the term “dead site” will be used to refer to sites not immediately available for any of the above reasons.
Two Methods for Recovering Lost Sites
There are two resources that can be used to attempt retrieval of dead websites. The differences between the two tend to complement, or supplement, each other.
- Search Engines
- Web Archive Sites
Using Search Engines to View Dead Sites
Many search engines keep a cached copy of the pages they index. The most popular search engines, as well as many others, have this ability, including:
- Ask
- Bing
- Yahoo!
To view the cached copy of a dead site:
- Type in search criteria.
- After discovering that the site is dead, return to the search citations.
- Near the end of the citation there will be a hyperlink marked “Cache”, “Cached page”, or a similar phrase.
- Click on this hyperlink.
- The page will show up as it appeared the last time it was cached. The cached view will usually be from sometime in the last 7 months.
Using a Web Archive Site to Retrieve Dead Websites
Web archive sites archive websites over time. When a change to a site is made, the new version is added to the archive. This can be useful not only for accessing dead sites but also for viewing the changes made to a website over the years and getting a historical overview of the site’s evolution.
Perhaps the best known and most popular of these archive sites is The Wayback Machine. It archives websites back to 1996. It boasts 150 billion archived pages that fill roughly 2 petabytes (1 petabyte equals 10 to the 15th power bytes). To access these archived pages:
- Open The Wayback Machine home page.
- Type the URL in the search box.
- If the URL is archived, The Wayback Machine will display a table of dates, separated by yearly columns, of links to the site. An asterisk (*) appears next to dates the site was updated.
- Click on the desired date and a view of the site, as it appeared on that date, will appear.
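The manual steps above can also be approximated in a script. The following is a minimal sketch, assuming the Internet Archive's public availability endpoint at archive.org/wayback/available and its JSON reply containing an "archived_snapshots" object. The function names are hypothetical and chosen only for illustration. The sketch builds the query for a URL and pulls the closest snapshot out of the reply.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Internet Archive availability service.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_query(url, timestamp=None):
    """Build the query URL; timestamp is an optional YYYYMMDD[hhmmss] string."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return (snapshot_url, timestamp) from the JSON reply, or None if not archived."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["url"], closest["timestamp"]


def find_snapshot(url, timestamp=None):
    """Ask the service over the network for the snapshot closest to timestamp."""
    with urlopen(build_availability_query(url, timestamp), timeout=10) as reply:
        return closest_snapshot(reply.read().decode("utf-8"))
```

Calling find_snapshot("example.com", "20060101") would request the snapshot nearest to January 1, 2006. A result of None corresponds to the case where the URL is not archived.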
Limitations When Viewing Cached or Archived Sites
Even with the vast resources of the most powerful search engines and 150 billion pages in the Wayback Machine, there are limitations. Here are a few:
- The site may not be cached or archived.
- The site may have been cached by the search engine before or after the desired data was on the site.
- Search engine caches hold only one view of a page: the most recently cached one.
- Links in archived files might be dead and also require an archive search of their own.
- Images may no longer be stored, causing empty graphic boxes to appear.
- Some sites carry a “robots.txt” file that prevents them from being indexed.
- Most search engine caches are less than seven months old.
- Archive results are usually at least six months old.
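The robots.txt limitation in the list above can be checked before searching for a cached or archived copy. The following is a minimal sketch using Python's standard urllib.robotparser module. The sample file contents and the helper name crawler_allowed are hypothetical. The user-agent name ia_archiver, historically associated with the Internet Archive's crawler, is used here only as an illustrative assumption.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one crawler entirely, and every other
# crawler from the /private/ directory only.
SAMPLE_ROBOTS = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def crawler_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ia_archiver", "SomeBot"):
        allowed = crawler_allowed(SAMPLE_ROBOTS, agent, "http://example.com/index.html")
        print(agent, "allowed" if allowed else "blocked")
```

If a crawler is blocked in this way, the site is unlikely to appear in that crawler's cache or archive, and another resource is worth trying.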