Saving a Geocities Site
Yahoo’s Geocities.com is being converted or taken offline; either way, it will no longer be the Geocities we’ve known since the mid-nineties.
A friend asked me how to save Geocities sites (she has 51 sites!). Do the following for each site.
Preparing for the Download
First, go into the GeoCities file manager and rename the site’s index.html file to index2.html. (Check the box beside index.html and press the “Rename” button at the top.) With no index.html present, the server will show the “Index Of” page when you go to the site: http://www.geocities.com/mysite
There should be no Yahoo sidebars, no banners, no pop-ups, nothing to confuse the downloading program.
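Before starting a long download, it may be worth confirming that the site really serves the listing page. Here is a minimal sketch: it reads HTML on standard input and succeeds if the title looks like a listing. The function name looks_like_index_listing is made up for illustration, and it assumes the listing’s title begins with “Index of”, which I have not verified for every Geocities page.

```shell
#!/bin/bash
# Sketch: succeed if the HTML on stdin looks like a server-generated listing.
# Assumption: the listing's <title> begins with "Index of" (case-insensitive).
looks_like_index_listing() {
    grep -qi '<title>[[:space:]]*index of'
}
```

It could be used like this: wget -qO- http://www.geocities.com/mysite/ | looks_like_index_listing && echo "listing is visible"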
Performing the Download
On Linux, it’s easy. First, be sure you have the “wget” utility installed. On Debian (lenny), I just typed:
apt-get install wget
Then, I changed to an empty directory and issued the wget command:
wget --continue --recursive --tries=inf --limit-rate=0.8k --convert-links --html-extension --no-clobber \
    -P ./mysite http://geocities.com/mysite
This will be agonizingly slow, but I find the transfer quota (typically about 4 MB per hour for a free user) can be prohibitive. At 0.8 KB/s, wget pulls roughly 2.8 MB per hour, which stays under that quota. Exceeding the hourly transfer limit causes wget to download the same error page for all subsequent files. Remove the “--limit-rate=0.8k” part if you pay for sufficient hourly transfer.
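Because every file fetched after the limit is hit is the same error page, one rough way to spot a spoiled run is to look for files with identical content. The sketch below assumes GNU coreutils (md5sum, plus uniq’s -w and --all-repeated options). The function name find_repeated_files is made up for illustration, and legitimately duplicated files (a reused image, say) will show up too.

```shell
#!/bin/bash
# Sketch: list groups of files under a directory that have identical content.
# Assumes GNU md5sum and GNU uniq (-w and --all-repeated are GNU options).
find_repeated_files() {
    local dir="$1"
    # md5sum prints "<32-char hash>  <path>"; after sorting, uniq compares
    # only the first 32 characters and prints every line in a repeated group.
    find "$dir" -type f -exec md5sum {} + \
        | sort \
        | uniq -w32 --all-repeated=separate
}
```

For example, find_repeated_files ./mysite prints groups of identical files separated by blank lines. A large group of .html files sharing one checksum probably means the quota was hit partway through.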
Automating the Download
I wrote a script that accepts a command-line argument specifying the name of the Geocities site.
Invocation for http://www.geocities.com/mysite:
./geodownload.sh mysite
Here is the script:
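#!/bin/bash
# Usage: ./geodownload.sh SITENAME
wdir=`pwd`
# Mirror the site into ./SITENAME, throttled to stay under the quota.
wget --continue --recursive --tries=inf --limit-rate=0.8k --convert-links --html-extension -P "./$1" "http://geocities.com/$1"
# Set aside the downloaded listing page, then restore the real home page.
mv "$wdir/$1/geocities.com/$1/index.html" "$wdir/$1/geocities.com/$1/index.html.bak"
mv "$wdir/$1/geocities.com/$1/index2.html" "$wdir/$1/geocities.com/$1/index.html"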
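With dozens of sites to save, it may help to drive the script from a list. The following is only a sketch: the function name download_all and the list file name sites.txt are made up for illustration, and it assumes one site name per line, with blank lines and lines starting with # ignored. The second argument (the command to run for each site) defaults to ./geodownload.sh.

```shell
#!/bin/bash
# Sketch: run a per-site download command for every name in a list file.
# Assumptions: one site name per line; blank lines and '#' comments skipped.
download_all() {
    local list_file="$1"
    local downloader="${2:-./geodownload.sh}"
    local site
    # The "|| [ -n ... ]" part handles a last line without a trailing newline.
    while IFS= read -r site || [ -n "$site" ]; do
        case "$site" in
            ''|'#'*) continue ;;
        esac
        echo "Downloading $site" >&2
        # Redirect stdin so the downloader cannot swallow the rest of the list.
        "$downloader" "$site" < /dev/null || echo "Failed: $site" >&2
    done < "$list_file"
}
```

Calling download_all sites.txt then runs the sites one after another, which also keeps the combined transfer rate at whatever the script’s rate limit allows.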
