Downloading the entire ss64 site for personal offline use

28 Aug 2018 01:24

Hello Forum,
I'll be going to a location where my internet connection is likely to go down for unknown periods of time, and I would like to have a copy of this site on an external drive (for personal use only, not for hosting anywhere else).

I've reviewed the site's copyright page, which explicitly states you can copy, share, and remix the pages hosted on SS64, but I wanted to verify whether it would be an issue for me to mirror the entire site using something like HTTrack.

Doing this creates many simultaneous connections, which could potentially strain weaker servers and look like malicious activity. (I'd hate to be auto-blocked from this site.)

My question: Are there any recommended guidelines for performing a site mirror? Or, better yet, are there any current copies of the site already floating around, perhaps as a torrent?

- Jay


#2 28 Aug 2018 19:02
Simon Sheppard

This shouldn't be a problem if you keep the number of simultaneous connections to about 5. I don't have any automatic IP blocking in place.

There are thousands of pages under the Oracle section, so if you don't need those you might want to exclude /ora/ just to speed things up.

In a typical week there are one or two people who scrape everything, in addition to all the usual search bot visits, and the server hasn't fallen over yet :)

Thanks for asking.
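
For anyone scripting this rather than using HTTrack, the two suggestions above (about 5 simultaneous connections, skip /ora/) could be sketched roughly as follows. This is only an illustration: the host names, the function names, and the caller-supplied `fetch` function are all assumptions, not anything the site provides.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse

MAX_CONNECTIONS = 5                       # "about 5" simultaneous connections
EXCLUDED_PREFIXES = ("/ora/",)            # skip the Oracle section
SITE_HOSTS = {"ss64.com", "www.ss64.com"}  # assumed host names


def should_fetch(url):
    """Return True for on-site URLs that are not under an excluded path."""
    parts = urlparse(url)
    if parts.hostname not in SITE_HOSTS:
        return False
    path = parts.path or "/"
    return not any(path.startswith(prefix) for prefix in EXCLUDED_PREFIXES)


def mirror(urls, fetch, max_connections=MAX_CONNECTIONS):
    """Fetch the wanted URLs with at most max_connections requests in flight.

    fetch is a hypothetical caller-supplied function taking a URL and
    returning the page body; it is where the actual download would happen.
    """
    wanted = [u for u in urls if should_fetch(u)]
    # The pool size is what caps the number of simultaneous connections.
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        return dict(zip(wanted, pool.map(fetch, wanted)))
```

The thread pool never runs more than `max_connections` fetches at once, and anything under /ora/ or on another host is dropped before a request is made.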


#3 30 Jun 2019 14:25
Simon Sheppard

I feel like I should clarify this a little: the site contains just over 4,000 pages. While I don't have any problem with people downloading one copy, if someone repeatedly downloads en masse, i.e. 100,000+ pages in a 24-hour period, then I will have to block the IP address.

I hate having to do this because I suspect it is more likely incompetence than malice, and obviously if I block a range of IP addresses then there's a risk a few other people could lose access. However, the alternative is running out of bandwidth and the site getting knocked offline for everyone.

I've had to block 2 people in the last 2 weeks; I don't know why this has suddenly become an issue after years with no problems.
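
To put the figures above in perspective, here is a rough back-of-the-envelope calculation. The page count is rounded to 4,000, and the one-second delay between requests is purely an illustrative assumption, not a limit stated in this thread.

```python
SITE_PAGES = 4_000            # "just over 4,000 pages", rounded
BLOCK_THRESHOLD = 100_000     # pages in a 24-hour period that triggers a block
SECONDS_PER_DAY = 24 * 60 * 60

# How many complete copies of the site the threshold corresponds to.
full_copies = BLOCK_THRESHOLD / SITE_PAGES

# Sustained request rate needed to reach the threshold in one day.
threshold_rate = BLOCK_THRESHOLD / SECONDS_PER_DAY


def copy_duration_hours(seconds_between_requests, pages=SITE_PAGES):
    """Hours needed for one sequential copy at a fixed delay per page."""
    return pages * seconds_between_requests / 3600


print(f"Threshold equals {full_copies:.0f} full copies per day")
print(f"Threshold rate is about {threshold_rate:.2f} pages per second, sustained")
print(f"One copy at 1 request per second takes about {copy_duration_hours(1.0):.1f} hours")
```

In other words, a single careful copy comes nowhere near the threshold; only a crawler that loops over the site again and again would reach it.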

