Exchange Server clustering has been with us since Exchange Server 5.5, but it left Exchange administrators needing to understand the complexities of both Microsoft Clustering and Microsoft Exchange in order to support it. With Exchange Server 2007, Microsoft introduced new clustering options that gave messaging architects more flexibility in deploying highly available systems, but there were still limitations, such as:
- Site failover was not seamless due to the DNS TTL.
- Failover could take minutes.
- There were scenarios where message loss could occur.
- The cluster had to be built before Exchange was installed. If a cluster was required after Exchange had been installed, another server had to be built.
- Failover was restricted to the server level so if one database failed then all databases failed over.
With Exchange Server 2010, Microsoft is acknowledging that configuring and supporting the Windows cluster required additional skill sets and added complexity to the server build and ongoing support. In Exchange Server 2010, Microsoft has removed Exchange clustering as we know it and replaced it with the Database Availability Group (DAG), which combines and improves on the best parts of the CCR and SCR technologies from Exchange Server 2007. The benefits of a DAG are as follows:
- Multiple copies of the same database can be held on different servers.
- Failover is at the database level.
- Failover time is around 30 seconds.
- Message delivery is tracked more closely, so only messages that have not yet been replicated to all database copies are retained in the Hub Transport Dumpster.
- High availability options can be deployed after initial install.
To assist with the creation of the Database Availability Group, there has been another fundamental change in Exchange Server 2010: the MAPI endpoint for Outlook access has moved onto the CAS role. This gives users a single access point for client requests no matter which server their mailbox is active on. For deployments with multiple CAS servers, a CAS array needs to be configured, which I have talked about in a previous post.
As previously mentioned, the DAG is configured after the initial install and still uses components from Windows failover clustering. The difference now is that Exchange Server configures the clustering options it needs for you as part of the DAG creation, so the Exchange engineer only needs to know how to configure the DAG.
A Database Availability Group can consist of up to 16 servers, and within each DAG there can be up to 16 copies of any given mailbox database. The DAG uses the log shipping technology introduced with Exchange Server 2007 to maintain copies of the database on multiple servers, and a different log replay lag can be configured for each copy (up to one week).
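To make those limits concrete, here is a minimal Python sketch that checks a planned layout against the figures quoted above. It is purely a planning aid of my own and not anything that ships with Exchange: the function name, the dictionary layout and the server names are all made up for illustration, and the limits are simply the ones mentioned in this post.

```python
from datetime import timedelta

# Limits as quoted in this post; check them against current Microsoft guidance.
MAX_DAG_SERVERS = 16
MAX_COPIES_PER_DATABASE = 16
MAX_REPLAY_LAG = timedelta(weeks=1)


def validate_dag_layout(servers, copies):
    """Return a list of problems found in a planned DAG layout.

    servers: list of server names in the DAG (hypothetical names).
    copies:  dict mapping database name -> {server name: replay lag}.
    """
    problems = []
    if len(servers) > MAX_DAG_SERVERS:
        problems.append(f"DAG has {len(servers)} servers, limit is {MAX_DAG_SERVERS}")
    for database, placement in copies.items():
        if len(placement) > MAX_COPIES_PER_DATABASE:
            problems.append(
                f"{database} has {len(placement)} copies, limit is {MAX_COPIES_PER_DATABASE}"
            )
        for server, lag in placement.items():
            # Every copy must live on a server that is a member of the DAG.
            if server not in servers:
                problems.append(f"{database} has a copy on {server}, which is not in the DAG")
            # Lagged copies may not exceed the maximum replay lag.
            if lag > MAX_REPLAY_LAG:
                problems.append(f"{database} on {server} has a replay lag of {lag}")
    return problems
```

Passing in a list of server names and a placement such as `{"DB1": {"Server1": timedelta(0), "Server4": timedelta(days=7)}}` returns an empty list when the plan fits within the limits.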
Each copy of the database is given a “Preferred List Sequence Number”. This essentially tells the CAS server the order of priority in which it should try to connect to the database, and also the order in which the DAG will try to fail over to the next copy of the database. So if the database copy with Sequence Number 1 fails, the DAG will bring the copy with Sequence Number 2 online, and so on. The CAS server will then route requests for that database to the server holding the online copy.
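The ordering described above can be sketched in a few lines of Python. This is a toy model under my own assumptions rather than how Exchange is actually implemented: the class and function names are hypothetical, and it models only the sequence number ordering, whereas the real selection logic most likely weighs other factors, such as the replication health of each copy.

```python
from dataclasses import dataclass


@dataclass
class DatabaseCopy:
    server: str
    sequence_number: int  # 1 = most preferred copy
    healthy: bool = True


def select_copy_to_activate(copies):
    """Return the healthy copy with the lowest sequence number, or None."""
    healthy = [copy for copy in copies if copy.healthy]
    return min(healthy, key=lambda copy: copy.sequence_number, default=None)


def route_client_request(copies):
    """Return the name of the server a CAS would send requests to."""
    active = select_copy_to_activate(copies)
    if active is None:
        raise RuntimeError("no healthy copy of the database is available")
    return active.server
```

With copies of DB1 on Server1, Server2 and Server4 numbered 1, 2 and 3, requests go to Server1. Mark the Server1 copy as unhealthy and the same call returns Server2, which is the behaviour the client sees through the CAS.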
So in the diagram above, if the database DB1 on Server 1 goes offline, the Database Availability Group will automatically bring the DB1 database copy on Server 2 online, and then the copy on Server 4 should the database on Server 2 not be available. A figure of around 30 seconds is being quoted for the DAG to bring a database copy online.
The diagram above shows a four-server DAG with copies of the database distributed across the servers. The general advice from Microsoft at present is to have a few large DAGs rather than a large number of smaller DAGs. A database does not have to be replicated to every server, or to any other server at all, depending on requirements. In distributed environments the placement of database copies could be quite important.
As the client now connects via the CAS server, it should not be disconnected from the server, but there will be a short outage while the database copy is brought online.
With respect to Public Folder databases, these are not replicated between servers by the Database Availability Group. Public Folder high availability still relies on Public Folder Replication between servers.
The Hub Transport Dumpster has also been optimized for the new failover model. Messages are only retained in the dumpster until it is recorded that they have been replicated to all database copies. This helps when bringing a database copy online, as only the messages that have not been replicated need to be requested from the dumpster.
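As a rough illustration of that retention rule, the Python sketch below keeps each message until every database copy has acknowledged it. It is a simplified model of the idea only: the class, its methods and the message identifiers are hypothetical and do not correspond to any Exchange API.

```python
class TransportDumpster:
    """Toy model: keep a message until every database copy has acknowledged it."""

    def __init__(self, copy_servers):
        self.copy_servers = set(copy_servers)
        self.pending = {}  # message id -> set of servers still to acknowledge

    def deliver(self, message_id):
        # A newly delivered message is owed to every copy of the database.
        self.pending[message_id] = set(self.copy_servers)

    def acknowledge(self, message_id, server):
        # Record that the copy on `server` now contains the message.
        waiting = self.pending.get(message_id)
        if waiting is None:
            return
        waiting.discard(server)
        if not waiting:
            del self.pending[message_id]  # replicated everywhere, safe to drop

    def messages_to_redeliver(self, server):
        """Messages a newly activated copy on `server` still needs."""
        return sorted(m for m, waiting in self.pending.items() if server in waiting)
```

If a message has been acknowledged by Server1 but not yet by Server2, and the copy on Server2 is then activated, `messages_to_redeliver("Server2")` returns just that message, while anything already replicated to all copies has been dropped from the dumpster.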

