Connect Time Failover & Transparent Application Failover for Data Guard
I was giving a 10g Data Guard course this week in Düsseldorf, demonstrating amongst others the possibility to configure Transparent Application Failover
(TAF) for Data Guard. I always try to keep things as simple as
seriously possible, in order to achieve an easy and good understanding
of what I like to explain. Later on, things are getting complex by
themselves soon enough :-)
In my simple scenario, I have one Primary
Database (prima) and one Physical Standby Database (physt). After a
switchover or after a failover, the primary is going to be physt. The
challenge is now to get the connect from the client side to the right
(primary) database. That is called Connect Time Failover and is achieved as follows:
First, we make sure that the client uses a tnsnames.ora with a connect descriptor that uses a SERVICE_NAME instead of a SIDMYAPP = (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = HostA)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = HostB)(PORT = 1521)) ) (CONNECT_DATA = (SERVICE_NAME = myapp) ) )
HostA is the host on which prima runs, HostB has physt running.
Second, we take care that the service myapp is offered only at the right database
– only on the primary.Notice that the PMON background processes of both
databases must be able to communicate with the (local) listeners in
order to register the service myapp.
If you don’t use the listener port 1521, they can’t. You have to point
to that listener port then with the initialization parameter
LOCAL_LISTENER.
We create and start now the service myapp manually on the primary:
begin dbms_service.create_service('myapp','myapp'); end; / begin DBMS_SERVICE.START_SERVICE('myapp'); end; /Then we create a trigger, that ensures that this service is only offered, if the database is in the primary role:
create trigger myapptrigg after startup on database declare v_role varchar(30); begin select database_role into v_role from v$database; if v_role = 'PRIMARY' then DBMS_SERVICE.START_SERVICE('myapp'); else DBMS_SERVICE.STOP_SERVICE('myapp'); end if; end; /
The event after startup is fired, if an
instance changes from status MOUNT to OPEN. If you use a logical
standby, it is not fired, because the logical standby remains in status
OPEN. You may use the event after db_role_change in this case.
The creation of the trigger and of the service is accompanied with redo
protocol (the Data Dictionary has changed) and therefore also present at
physt without any additional work to do there for the DBA. With the
present setup, we have already achieved Connect Time Failover: Clients can use the same connect descriptor (myapp) to get to the right (primary) database now, regardless of switchover or failover.
But sessions that are connected to prima
are disconnected if a switchover or failover to physt takes place. They
have got to connect again then. We can change that, so that a Runtime Failover is possible, under ideal circumstances, that failover is even completely transparent to
the client and proceeds without error messages. To achieve that, you
don’t have to touch the tnsnames.ora on the client side. Instead, you do
the following on the primary database:
begin dbms_service.modify_service ('myapp', FAILOVER_METHOD => 'BASIC', FAILOVER_TYPE => 'SELECT', FAILOVER_RETRIES => 200, FAILOVER_DELAY => 1); end; /
Connections to the service myapp are
now automatically failed over together with the service to the new
primary. Should they have done nothing during the time of the
failover/switchover, or even if they had run a select statement, they
will not receive any error but only notice a short interruption (about
20 seconds, in a typical case). Only if sessions have open transactions
during the failover/switchover, they will receive error messages
(“transaction must roll back”) after they try commit then.
I use to demonstrate that with a select
on a table with 100000 rows that starts on the primary. Then I kill the
SMON of that primary and the select stops at row 30000 something, waits a
couple of seconds (maximal 200,
with the above settings) and then continues on the new primary after the
failover, fetching exactly the 100000 rows! That is always quite
impressive and shows how robust Oracle Databases – especially combined
with Data Guard – are :-)
No comments:
Post a Comment