Title: Clustering Solutions and Zero Downtime Hosting Pitfalls
Author: Godfrey Heron
Email: info@irieisle-online.com
Word Count:1452
Copyright: © 2005 by Godfrey Heron
Article URL:www.irieisle-online.com/zero-downtime-hosting.htm
Publishing Guidelines: You may publish this article in your
newsletter, on your web site, or in your print publication
provided you include the resource box at the end. Notification
would be appreciated but is not required.
Clustering Solutions and Zero Downtime Hosting Pitfalls
There are a number of benchmarks that we may use to
evaluate hosting companies. One of these is reliability.
Like most things in this life, reliability in web hosting is
typically a function of how much we are willing to spend for
it. In essence, a “cost-effectiveness” equation needs to be
determined and solved.
Reliability can be measured in terms of percentage
availability. Industry personnel will talk of reliability
in terms of system availability with three (99.9%), four
(99.99%) or five nines (99.999%).
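To put those percentages in perspective, here is a minimal
Python sketch that converts an availability figure into the
downtime it allows per year. The function name and the
365-day year are our own assumptions for illustration.

```python
# Convert an availability percentage into the downtime it allows per year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent):
    """Return the minutes of downtime per year allowed at this availability."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for label, percent in [("three nines", 99.9), ("four nines", 99.99), ("five nines", 99.999)]:
    minutes = downtime_minutes_per_year(percent)
    print(f"{label} ({percent}%): about {minutes:.1f} minutes of downtime per year")
```

In round figures, three nines allows almost 9 hours of
downtime a year, four nines under an hour, and five nines
only about 5 minutes.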
Traditionally, web-hosting availability exceeding three nines
was the purview of extremely large companies with multiple
layers of redundancy built into their network and software
systems. However, technology has now brought
high-availability theory and cost-effective reality into
alignment.
High availability can be achieved by removing, as far as
possible, any “single points of failure” or, where this
is not altogether possible, by minimizing the time spent
in a “failure” situation.
One of the ways in which small businesses and ISPs can
reasonably avoid single points of failure is by employing
server farm clustering and load-balancing solutions.
Webopedia defines server farm clustering as follows:
“A server farm is a group of networked servers that are
housed in one location. A server farm streamlines
internal processes by distributing the workload between
the individual components of the farm and expedites
computing processes by harnessing the power of multiple
servers.
The farms rely on load-balancing software that
accomplishes such tasks as tracking demand for
processing power from different machines, prioritizing
the tasks and scheduling and rescheduling them
depending on priority and demand that users put on
the network. When one server in the farm fails,
another can step in as a backup.”
It is important to note that web servers load-balanced
in this manner typically display one external IP address
to the public Internet, while using internal network IP
addresses to communicate between the clustered servers
and the load balancer.
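As a rough illustration of that arrangement, the Python
sketch below models a load balancer with one public address
handing requests to internal servers in turn. The class
name and all IP addresses are hypothetical, and simple
rotation is only one possible policy; real load-balancing
software also weighs demand and priority.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands each incoming request to the next internal server in turn."""

    def __init__(self, public_ip, internal_ips):
        self.public_ip = public_ip            # the only address visitors see
        self._rotation = cycle(internal_ips)  # private addresses behind it

    def route(self, request_path):
        """Pick the internal server for this request (the path is ignored here)."""
        return next(self._rotation)


# Documentation and private-range addresses, used here as placeholders.
balancer = RoundRobinBalancer("203.0.113.10", ["10.0.0.11", "10.0.0.12", "10.0.0.13"])
assignments = [balancer.route("/index.html") for _ in range(6)]
print(assignments)
```

Six requests are spread evenly, two to each internal
server, while visitors only ever deal with the one
public address.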
Now, server clustering of this kind is indeed fantastic!
Not only do you receive scalability for peak web site
demand with web server clusters, but you also have the
built-in “high uptime availability” component which is so
important.
However, this is only half of the picture.
There are very important cautionary notes to keep in
mind.
Where web hosting is concerned, availability depends
on two things:
1. Hardware reliability (RAID drives, server
clustering, etc.) within the Data Center;
2. High-bandwidth Internet connectivity to the
Data Center / Network Operations Center (NOC).
Now, with all your well thought out server clustering
solutions, what would be the result if (as recently
occurred at a very high-profile web company) a fire in
the vicinity of the network caused the entire Data Center
to shut down power for hours? Or if a bandwidth provider
to the NOC had router problems? All your websites would
be showing the dreaded “Page Cannot be Displayed” page.
The ideal solution therefore would be to employ
clustering solutions with servers in entirely
different Data Centers with different bandwidth
providers. Redundant Data Centers eliminate the NOC
itself as a single point of failure. The scenario
becomes interesting at this point, because the
difficulty of addressing the potential problems
now increases considerably.
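The appeal of a second Data Center can be sketched with a
little arithmetic. The Python below assumes that failures
are independent and uses made-up figures of 99.9% for both
hardware and connectivity; it also ignores the time needed
to switch visitors over, so treat it as a best case rather
than a prediction.

```python
def series_availability(*parts):
    """All parts must work: multiply their availabilities."""
    result = 1.0
    for part in parts:
        result *= part
    return result


def parallel_availability(*sites):
    """Only one site must work: 1 minus the chance that all fail together."""
    all_fail = 1.0
    for site in sites:
        all_fail *= (1 - site)
    return 1 - all_fail


# Hypothetical figures: 99.9% hardware and 99.9% connectivity per data center.
one_data_center = series_availability(0.999, 0.999)
two_data_centers = parallel_availability(one_data_center, one_data_center)
print(f"one data center:  {one_data_center:.4%}")
print(f"two data centers: {two_data_centers:.4%}")
```

Under these assumptions a single Data Center manages about
99.8%, because hardware and connectivity must both be up,
while two independent Data Centers together reach roughly
99.9996%.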
We now have to deal with DNS caching, the concept of
failover, and how static and dynamic web
applications respond to failure events.
Failover and load balancing are terms frequently used
interchangeably; however, they are in fact quite
different.
·Load balancing refers to sharing the workload across
several servers, so that no one server is
overloaded and swamped with requests.
·Failover, however, is the process that manually
or automatically switches from a failed server or
bandwidth provider to a standby server or
network if the primary system fails or is
temporarily shut down for servicing.
As such, failover is an important function of
mission-critical systems that rely on constant
accessibility.
One of the inherent difficulties with failover for Web
Hosting companies operating on different networks
is the limitation imposed by the DNS caching system.
As DNS records are passed from the original DNS
servers (i.e., ns1/ns2.your-domain.com), they are
cached or stored at several different ISPs along the way,
which is one reason why it can take a while for a newly
registered domain name to resolve to its IP address.
Each DNS record has a TTL (time to live) setting assigned.
By manipulating this value, it is possible to alter how
long that particular pairing of domain name and IP address
is cached. If your site is on 2 different servers with 2
different IP addresses, you could set the ‘time to live’
to a value of, say, 2 minutes.
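To see what the TTL does in practice, here is a simplified
Python model of a caching resolver such as an ISP might
run. The class, the domain name and the addresses are all
hypothetical, and time is passed in by hand so that the
sketch runs instantly.

```python
class CachingResolver:
    """Keeps a DNS answer until its TTL expires, then asks again."""

    def __init__(self, lookup):
        self._lookup = lookup   # function: name -> (ip_address, ttl_seconds)
        self._cache = {}        # name -> (ip_address, expiry_time)

    def resolve(self, name, now):
        cached = self._cache.get(name)
        if cached and now < cached[1]:
            return cached[0]                     # still fresh: reuse old answer
        ip_address, ttl = self._lookup(name)     # expired or unknown: ask again
        self._cache[name] = (ip_address, now + ttl)
        return ip_address


# Hypothetical authoritative data with a 2-minute (120-second) TTL.
records = {"www.your-domain.com": ("203.0.113.10", 120)}
resolver = CachingResolver(lambda name: records[name])

first = resolver.resolve("www.your-domain.com", now=0)
records["www.your-domain.com"] = ("198.51.100.20", 120)   # the record changes
during_ttl = resolver.resolve("www.your-domain.com", now=60)
after_ttl = resolver.resolve("www.your-domain.com", now=121)
print(first, during_ttl, after_ttl)
```

The resolver keeps handing out the old address until the
2 minutes are up, and only then picks up the change.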
The failover software would check server availability
by “pinging” the web server every few minutes to
determine whether its IP address is responding
appropriately (perhaps by looking for a particular text
string in a web page).
If a failure is detected, then the software would pull
the non-working web server IP address out of the list
of IP addresses assigned to your web site’s domain
name. If/when your web server IP comes back online,
it would be restored to the list.
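The check-and-remove logic might look something like the
Python sketch below. It is not how any particular failover
product works: the function names are our own, the
addresses are placeholders, and the page fetch is a
stand-in so that the example runs without a network. In
practice it would wrap a real HTTP request, and the
resulting list would be pushed to your DNS provider.

```python
def is_healthy(fetch_page, ip_address, expected_text):
    """A server passes if its page can be fetched and contains the expected text."""
    try:
        return expected_text in fetch_page(ip_address)
    except OSError:
        return False    # connection refused, timeout, and similar errors


def addresses_to_publish(all_addresses, fetch_page, expected_text):
    """Return only the addresses that should stay in the domain's DNS records."""
    return [ip for ip in all_addresses if is_healthy(fetch_page, ip, expected_text)]


# Stand-in for a real HTTP request so the sketch runs without a network.
def fake_fetch(ip_address):
    pages = {"203.0.113.10": "<html>Welcome to our store</html>"}
    if ip_address not in pages:
        raise ConnectionError("server did not respond")
    return pages[ip_address]


servers = ["203.0.113.10", "198.51.100.20"]
print(addresses_to_publish(servers, fake_fetch, "Welcome"))
```

Running the check again later, once the second server
answers with the expected text, would put its address
back on the list.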
With a TTL setting of 2 minutes, your web site should
in theory be down for little more than 2 minutes (plus
the time taken to detect the failure) while DNS
information switches to the other web server.
The problem with this scenario is that, while some
ISPs’ caches might respect such low figures,
other ISPs may decide (to save on bandwidth
utilization) to ignore any TTLs below a certain
value, say, 60 minutes. So it is entirely possible
that some of your visitors would see your web site
while, for others, your site would be down for 1 hour
or more, even though one of your servers was
operating perfectly.
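The effect of such a floor is easy to work out. In the
Python sketch below, the ISP names and the 60-minute
minimum are hypothetical; the point is simply that a
resolver with a floor ends up caching for whichever of the
two values is longer.

```python
def effective_ttl(record_ttl_seconds, resolver_minimum_seconds):
    """A resolver that enforces a floor caches for whichever value is longer."""
    return max(record_ttl_seconds, resolver_minimum_seconds)


RECORD_TTL = 2 * 60   # the 2-minute TTL set on the DNS record

# Hypothetical ISPs: one honors low TTLs, one enforces a 60-minute minimum.
isp_minimums = {"ISP A (honors TTL)": 0, "ISP B (60-minute floor)": 60 * 60}

for isp, minimum in isp_minimums.items():
    stale_minutes = effective_ttl(RECORD_TTL, minimum) / 60
    print(f"{isp}: visitors may see the dead address for up to {stale_minutes:.0f} minutes")
```

Visitors behind the first ISP recover within about 2
minutes; those behind the second could be stuck with the
failed address for up to an hour.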
Static, non-interactive web sites are great candidates
for server clustering, but the wicket becomes a bit
sticky for dynamically generated sites. Most database
application software, although having some replication
capabilities, is not happy with multiple-server
master/slave relationships and real-time updating
between servers. The issue can become very problematic
if your site requires frequent updates.
Then there is the problem of how to keep your websites
synchronized. Unix/Linux servers have a built-in
synchronizing software tool called rsync. You can
also automate the synchronizing process by setting up
a cron job to run it periodically.
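As a sketch of what such a job might run, the Python below
builds an rsync command that mirrors a site directory onto
a second server. The paths and host name are hypothetical,
and it assumes rsync can already log in to the other
server over SSH without a password prompt. The example
only prints the command; sync_site() would actually run it.

```python
import subprocess


def build_rsync_command(source_dir, remote_host, remote_dir):
    """Build an rsync command that mirrors source_dir onto the second server."""
    return [
        "rsync",
        "-az",        # archive mode (keep permissions and times), compress in transit
        "--delete",   # remove files on the mirror that no longer exist on the source
        source_dir.rstrip("/") + "/",   # trailing slash: copy the contents, not the folder
        f"{remote_host}:{remote_dir}",
    ]


def sync_site(source_dir, remote_host, remote_dir):
    """Run the sync; raises CalledProcessError if rsync reports a failure."""
    subprocess.run(build_rsync_command(source_dir, remote_host, remote_dir), check=True)


# Hypothetical paths and host name, printed rather than executed.
print(" ".join(build_rsync_command("/var/www/site", "server2.example.net", "/var/www/site/")))
```

Saved as a script, this could then be scheduled from cron
at whatever interval suits how often the site changes.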
DNS caching and synchronizing issues can be so
problematic as to nullify the advantages of server
clustering. For example, a cron job that synchronizes
your servers every few minutes might very well use
up your server capacity.
Your customers will also have to contend with
configuring their desktop email client software with
two email addresses for each email account, one on each
web server, e.g. info@server1.net and info@server2.net.
It is important to realize that DNS operates by
default in a round-robin manner, so that, if you have
the same web site on 2 separate servers, it is very
likely that each server will get about 50% of all the
web traffic.
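A small simulation makes the point. The Python below is an
idealized sketch: the two addresses are placeholders, and
it assumes the answers rotate perfectly, whereas real
traffic is blurred by caching and so splits only roughly
evenly.

```python
from collections import Counter
from itertools import cycle

# Hypothetical A records for the same site on two servers.
a_records = ["203.0.113.10", "198.51.100.20"]


def simulate_round_robin(addresses, visitor_count):
    """Count how many visitors land on each address when answers rotate evenly."""
    rotation = cycle(addresses)
    return Counter(next(rotation) for _ in range(visitor_count))


traffic = simulate_round_robin(a_records, 1000)
for address, visits in traffic.items():
    print(f"{address}: {visits} visitors ({visits / 1000:.0%})")
```

Neither address sits idle waiting for the other to fail;
both are in regular use from the start.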
Now, this round-robin behavior is important for a number
of reasons, but one of the principal reasons to keep it
in mind is that you will not be able to effectively
keep a “back-up” site (as some providers would have
you believe) which will only be used when the primary
server goes down: for example, a site saying “we’re
sorry, our main server is down, but you may contact us
at www.yourdomain2.com”.
On a final note, hardware-based load balancing
solutions tend to be quite expensive and also
introduce a potential single point of failure
into the system: the load balancer itself. There
is a very prominent Data Center that began
offering load-balanced hosting solutions, where
the load balancer itself failed on several
occasions, although the web servers were operating
perfectly. The net effect to the public, however,
was that the sites were unavailable.
Reasonably cost-effective software-based solutions
may be obtained as a service model or by purchasing
the software yourself. Zoneedit is an example of a
service model, and Simplefailover is an example of
a software-based model which may be purchased on a
server license basis.
In conclusion, at this point in time, there are
several limiting factors to successfully implementing
a “true” high-availability, multiple-server web hosting
system. Depending on your clientele and the nature of
their web sites, such a system may nonetheless be a very
viable alternative. For others, simply setting up a
server with high-quality components, redundant RAID hard
drives and a good supply of server spare parts may be
the best way to ensure high availability.
About the Author
Godfrey Heron is the Website Manager of the Irieisle
Multiple Domain Hosting Services company. Sign up
for your free trial, and host multiple domains and
web sites on one account: http://www.irieisle-online.com