
Accessing Extinct Websites

A user clicks on a favorite or bookmark, and the dreaded “Error 404 Not Found” message appears in the browser window. If the URL is correct and the site truly existed in the past, there is hope: the data may still be viewable.

Extinct Websites
The web is always changing. Perhaps the desired webpage has changed since the last visit. Just three days ago the site contained information vital to completing a project, or a quote needed for a report. Now it has been replaced with new content, a new design, or a new webmaster. This site, too, may be recoverable.

Perhaps the problem is as simple as a server that has gone down, with service expected back soon. But the information from the site is needed now, not 3 hours or 3 days from now. The data from this site may also be retrievable before the server is up and running again.

Note: For the sake of brevity, the term “dead site” will be used to refer to sites not immediately available for any of the above reasons.

Two Methods for Recovering Lost Sites

There are two resources that can be used to attempt retrieval of dead websites. Their differences tend to complement each other.
  • Search Engines
  • Web Archive Sites

Using Search Engines to View Dead Sites

Many search engines keep a cached copy of the pages they index. The four most popular search engines, as well as many others, have this ability.
  • Ask
  • Bing
  • Google
  • Yahoo!
Each search engine has its own criteria and time limit for caching pages. The following procedure can be used to view search engine caches.
  1. Type in search criteria.
  2. After discovering that the site is dead, return to the search results.
  3. Near the end of the result for that site there will be a hyperlink marked “Cache,” “Cached page,” or a similar phrase.
  4. Click on this hyperlink.
  5. The page will show up as it appeared the last time it was cached. The cached view will usually be from sometime in the last 7 months.

Using a Web Archive Site to Retrieve Dead Websites

Web archive sites store snapshots of websites over time. When a site changes, the new version can be added to the archive alongside the older ones. This is useful not only for accessing dead sites but also for seeing how a website has changed over the years and getting a historical overview of its evolution.

Perhaps the best known and most popular of these archive sites is The Wayback Machine. It archives websites back to 1996. It boasts 150 billion archived pages that fill roughly 2 petabytes (1 petabyte equals 10 to the 15th power bytes). To access these archived pages:
  • Open The Wayback Machine home page.
  • Type the URL in the search box.
  • If the URL is archived, The Wayback Machine will display a table of dated links to the site, arranged in columns by year. An asterisk (*) appears next to dates on which the site was updated.
  • Click on the desired date and a view of the site, as it appeared on that date, will appear.
Image 1 is the top portion of the United States White House home page as it appeared on February 29, 2000. Image 2 is the top portion of the United States White House home page as it appeared on December 3, 2009. Image 1 was found using The Wayback Machine.
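
The lookup steps above can also be expressed as a URL rather than a series of clicks. The following is a minimal sketch, assuming the Wayback Machine's public address pattern of `https://web.archive.org/web/<timestamp>/<url>` and its availability endpoint at `https://archive.org/wayback/available`. The function names are hypothetical, and `whitehouse.gov` is an assumed address for the White House home page mentioned above. The code only builds the URLs; it does not contact the archive.

```python
from datetime import date
from urllib.parse import urlencode

# Assumed public address patterns for the Wayback Machine.
WAYBACK_PREFIX = "https://web.archive.org/web"
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def wayback_snapshot_url(page_url: str, when: date) -> str:
    """Build a URL requesting the archived copy closest to the given date."""
    timestamp = when.strftime("%Y%m%d")  # e.g. 20000229
    return f"{WAYBACK_PREFIX}/{timestamp}/{page_url}"


def availability_query_url(page_url: str, when: date) -> str:
    """Build a query URL asking whether any archived copy exists near the date."""
    query = urlencode({"url": page_url, "timestamp": when.strftime("%Y%m%d")})
    return f"{AVAILABILITY_ENDPOINT}?{query}"


if __name__ == "__main__":
    wanted = date(2000, 2, 29)  # the date of Image 1
    print(wayback_snapshot_url("whitehouse.gov", wanted))
    print(availability_query_url("whitehouse.gov", wanted))
```

Pasting the first printed URL into a browser should lead to the archived copy nearest the requested date, if one exists. The second URL is meant for scripts that want to check for a copy before trying to open it.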

Limitations When Viewing Cached or Archived Sites

Even with the vast resources of the most powerful search engines and 150 billion pages in the Wayback Machine, there are limitations. Here are a few:
  • The site may not be cached or archived.
  • The site may have been cached by the search engine before or after the desired data was on the site.
  • A search engine cache offers only one view: the page as it appeared the last time it was cached.
  • Links in archived files might be dead and also require an archive search of their own.
  • Images may no longer be stored, causing empty graphic boxes to appear.
  • Some sites carry a “robots.txt” file that asks crawlers not to index or archive them.
  • Most search engine caches are less than seven months old.
  • Archive results are usually at least six months old.
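
The “robots.txt” limitation above can be checked before searching a cache or archive. The following is a rough sketch using Python's standard `urllib.robotparser` module. The sample file contents, the `example.com` addresses, and the helper name `is_crawl_allowed` are made up for illustration. The crawler name `ia_archiver` is an assumption based on the name historically associated with the Wayback Machine; a given archive or search engine may use a different name or ignore the file entirely.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: block one archive crawler completely,
# and keep every other crawler out of /private/ only.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, page_url: str) -> bool:
    """Return True if the given crawler may fetch the page under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, page_url)


if __name__ == "__main__":
    page = "http://example.com/index.html"
    print(is_crawl_allowed(EXAMPLE_ROBOTS_TXT, "ia_archiver", page))
    print(is_crawl_allowed(EXAMPLE_ROBOTS_TXT, "Googlebot", page))
```

If the check reports that a crawler is blocked, a missing cache or archive entry for that site is probably not an accident, and the other retrieval method may be the better one to try.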
Recent snapshots generally retain more images and working links than snapshots that are several years old. Knowing how to use both the search engine cache and an archive website can make researching otherwise inaccessible sites more successful.