You can find the original post here:
http://dba.stackexchange.com/questions/34892/why-cant-rdbms-cluster-the-way-nosql-does
Or, Distributed Databases - what the FK does 'web scale' actually mean?
Distributed database systems are complex critters and come in a
number of different flavours. If I dig deep into the depths of my
dimly remembered distributed systems papers from university (roughly
15 years ago) I'll try to explain some of the key engineering problems
involved in building a distributed database system.
First, some terminology
ACID (Atomicity, Consistency, Isolation and Durability) properties:
These are the key invariants that have to be enforced for a transaction
to be reliably implemented without causing undesirable side effects.
Atomicity
requires that the transaction complete or rollback completely.
Partially finished transactions should never be visible, and the system
has to be built in a way that prevents this from happening.
Consistency
requires that a transaction should never violate any invariants (such
as declarative referential integrity) that are guaranteed by the
database schema. For example, if a foreign key exists it should be
impossible to insert a child record with a reference to a non-existent
parent.
Isolation
requires that transactions should not interfere with each other. The
system should guarantee the same results if the transactions are
executed in parallel or sequentially. In practice most RDBMS products
allow modes that trade off isolation against performance.
Durability
requires that once committed, the transaction remains in persistent
storage in a way that is robust to hardware or software failure.
I'll explain some of the technical hurdles these requirements present on distributed systems below.
Shared Disk Architecture:
An architecture in which all processing nodes in a cluster have access
to all of the storage. This can present a central bottleneck for data
access. An example of a shared-disk system is
Oracle RAC or
Exadata.
Shared Nothing Architecture:
An architecture in which processing nodes in a cluster have local
storage that is not visible to other cluster nodes. Examples of
shared-nothing systems are
Teradata and
Netezza.
Shared Memory Architecture:
An architecture in which multiple CPUs (or nodes) can access a shared
pool of memory. Most modern servers are of a shared memory type.
Shared memory facilitates things such as shared caches or atomic
synchronisation primitives that are much harder to implement on
distributed systems.
Synchronisation:
A generic term describing various methods for ensuring consistent
access to a shared resource by multiple processes or threads. This is
much harder to do on distributed systems than on shared memory systems,
although some network architectures (e.g. Teradata's BYNET) had
synchronisation primitives in the network protocol. Synchronisation can
also come with a significant amount of overhead.
Semi-Join:
A primitive used in joining data held in two different nodes of a
distributed system. Essentially it consists of enough information about
the rows to join being bundled up and passed by one node to the other
in order to resolve the join. On a large query this could involve
significant network traffic.
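To make the mechanics concrete, here is a minimal Python sketch of the idea
(the table and column names are purely illustrative): one node ships only
the distinct join-key values it needs, the other returns just the matching
rows, and the join is resolved locally.

    # Minimal semi-join sketch: node A ships only the join keys it needs,
    # node B returns just the matching rows, and node A finishes the join.
    orders_on_node_a = [
        {"order_id": 1, "customer_id": 10, "amount": 250},
        {"order_id": 2, "customer_id": 11, "amount": 75},
    ]

    customers_on_node_b = {
        10: {"customer_id": 10, "name": "Alice"},
        11: {"customer_id": 11, "name": "Bob"},
        12: {"customer_id": 12, "name": "Carol"},
    }

    def semi_join_request(join_keys):
        """Runs on node B: return only the rows whose key was requested.
        Shipping just the keys (and returning just the matches) is what
        keeps traffic lower than shipping either table wholesale."""
        return {k: customers_on_node_b[k]
                for k in join_keys if k in customers_on_node_b}

    # Node A bundles up the distinct join keys it actually needs...
    needed_keys = {row["customer_id"] for row in orders_on_node_a}
    # ...ships them across the network (simulated by a function call)...
    matches = semi_join_request(needed_keys)
    # ...and resolves the join locally.
    joined = [
        {**order, "name": matches[order["customer_id"]]["name"]}
        for order in orders_on_node_a
        if order["customer_id"] in matches
    ]
    print(joined)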
Eventual Consistency:
A term used to describe transaction semantics that trade off immediate
update (consistency on reads) on all nodes of a distributed system for
performance (and therefore higher transaction throughput) on writes.
Eventual consistency is a side effect of using
Quorum Replication
as a performance optimisation to speed up transaction commits in
distributed databases where multiple copies of data are held on separate
nodes.
Lamport's Algorithm:
An algorithm for implementing mutual exclusion (synchronisation) across
systems with no shared memory. Normally mutual exclusion within a
system requires an atomic read-compare-write or similar instruction of a
type normally only practical on a shared memory system. Other
distributed synchronisation algorithms exist, but Lamport's was one of
the first and is the best known. Like most distributed synchronisation
mechanisms, Lamport's algorithm depends on a consistent ordering of
events between cluster nodes, which it achieves with logical timestamps
rather than synchronised wall clocks.
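As a rough illustration, here is a minimal Python sketch of Lamport logical
clocks, the timestamp primitive the mutual-exclusion algorithm is built on.
The class and method names are illustrative; the full algorithm adds a
per-node request queue and acknowledgement messages on top of these
timestamps.

    class LamportClock:
        def __init__(self):
            self.time = 0

        def tick(self):
            """Local event: advance the logical clock."""
            self.time += 1
            return self.time

        def send(self):
            """Stamp an outgoing message with the current logical time."""
            return self.tick()

        def receive(self, msg_time):
            """On receipt, jump ahead of the sender's timestamp if needed.
            This rule guarantees that causally ordered events get ordered
            timestamps, which lets nodes agree on a total order of lock
            requests without shared memory."""
            self.time = max(self.time, msg_time) + 1
            return self.time

    # Two nodes exchanging a message: B's clock jumps past A's stamp.
    node_a, node_b = LamportClock(), LamportClock()
    node_a.tick()                 # local work on A  -> time 1
    stamp = node_a.send()         # A sends a request -> time 2
    node_b.tick()                 # local work on B  -> time 1
    print(node_b.receive(stamp))  # B receives: max(1, 2) + 1 = 3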
Two Phase Commit (2PC):
A family of protocols that ensure that database updates involving
multiple physical systems commit or roll back consistently. Whether 2PC
is used within a system or across multiple systems via a transaction
manager it carries a significant overhead.
In a two-phase commit protocol the transaction manager asks the
participating nodes to persist the transaction in such a way that they
can guarantee that it will commit, then signal this status. When all
nodes have returned a 'happy' status it then signals the nodes to
commit. The transaction is still regarded as open until all of the
nodes send a reply indicating the commit is complete. If a node goes
down before signalling the commit is complete the transaction manager
will re-query the node when it comes back up until it gets a positive
reply indicating the transaction has committed.
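A minimal Python sketch of that prepare/commit flow, with illustrative class
names rather than any particular product's API (a real coordinator also has
to persist its own state so it can recover after a crash):

    class Participant:
        def __init__(self, name):
            self.name = name
            self.state = "idle"

        def prepare(self, txn):
            # Phase 1: persist enough (e.g. a write-ahead log record) to
            # guarantee the transaction *can* commit, then vote.
            self.state = "prepared"
            return True  # vote "yes"; a failure here would vote "no"

        def commit(self, txn):
            # Phase 2: make the change durable and visible.
            self.state = "committed"
            return True

        def rollback(self, txn):
            self.state = "rolled_back"

    def two_phase_commit(coordinator_log, participants, txn):
        # Phase 1: ask every participant to prepare.
        votes = [p.prepare(txn) for p in participants]
        if not all(votes):
            for p in participants:
                p.rollback(txn)
            coordinator_log.append((txn, "aborted"))
            return False
        # Phase 2: all voted yes, so tell everyone to commit. The
        # transaction stays open until every participant acknowledges;
        # a real coordinator would retry a crashed node until it replies.
        coordinator_log.append((txn, "commit-decided"))
        acks = [p.commit(txn) for p in participants]
        assert all(acks)
        coordinator_log.append((txn, "committed"))
        return True

    log = []
    nodes = [Participant("node1"), Participant("node2")]
    print(two_phase_commit(log, nodes, txn="T1"), log)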
Multi-Version Concurrency Control (MVCC):
Managing contention by writing new versions of the data to a different
location and allowing other transactions to see the old version of the
data until the new version is committed. This reduces database
contention at the expense of some additional write traffic to write the
new version and then mark the old version as obsolete.
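A minimal Python sketch of the idea, with illustrative names: writers append
new versions rather than overwriting in place, and readers only see versions
committed before their snapshot was taken.

    class MVCCStore:
        def __init__(self):
            self.versions = {}        # key -> list of (txn_id, value)
            self.committed = set()    # txn ids that have committed
            self.next_txn = 1

        def begin(self):
            txn_id = self.next_txn
            self.next_txn += 1
            # The snapshot is the set of transactions committed so far.
            return txn_id, set(self.committed)

        def write(self, txn_id, key, value):
            # Write a *new* version instead of overwriting in place.
            self.versions.setdefault(key, []).append((txn_id, value))

        def read(self, snapshot, key):
            # Newest version written by a transaction in the snapshot;
            # uncommitted versions stay invisible to other transactions.
            for writer, value in reversed(self.versions.get(key, [])):
                if writer in snapshot:
                    return value
            return None

        def commit(self, txn_id):
            self.committed.add(txn_id)

    store = MVCCStore()
    t1, _ = store.begin()
    store.write(t1, "balance", 100)
    store.commit(t1)

    t2, snap2 = store.begin()             # t2's snapshot includes t1
    t3, _ = store.begin()
    store.write(t3, "balance", 50)        # new version, not yet committed
    print(store.read(snap2, "balance"))   # 100 -- old version still visible
    store.commit(t3)
    print(store.read(snap2, "balance"))   # still 100 under t2's snapshot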
Election Algorithm:
Distributed systems involving multiple nodes are inherently less
reliable than a single system as there are more failure modes. In many
cases some mechanism is needed for clustered systems to deal with
failure of a node. Election algorithms are a class of algorithms used
to select a leader to coordinate a distributed computation in situations
where the 'leader' node is not 100% determined or reliable.
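As a deliberately simplified sketch in the spirit of the bully algorithm
(node ids and the liveness check are illustrative), the live node with the
highest id wins; real implementations exchange election and coordinator
messages and handle timeouts.

    def elect_leader(nodes, alive):
        """nodes: iterable of node ids; alive: predicate saying which
        nodes currently respond. Returns the new coordinator, or None
        if no node is reachable."""
        candidates = [n for n in nodes if alive(n)]
        return max(candidates) if candidates else None

    cluster = [1, 2, 3, 4, 5]
    failed = {5}                   # the old leader has gone down
    print(elect_leader(cluster, lambda n: n not in failed))   # -> 4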
Horizontal Partitioning:
A table may be split across multiple nodes or storage volumes by its
key. This allows a large data volume to be split into smaller chunks
and distributed across storage nodes.
Sharding:
A data set may be horizontally partitioned across multiple physical
nodes in a shared-nothing architecture. Where this partitioning is not
transparent (i.e. the client must be aware of the partition scheme and
work out which node to query explicitly) this is known as sharding.
Some systems (e.g. Teradata) do split data across nodes but the location
is transparent to the client; the term is not normally used in
conjunction with this type of system.
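A minimal Python sketch of client-side sharding, with illustrative node
addresses and hashing scheme: the client, not the database, hashes the key
and decides which node to query.

    import hashlib

    SHARDS = ["db-node-0:5432", "db-node-1:5432", "db-node-2:5432"]

    def shard_for(key):
        # Stable hash of the key, mapped onto the fixed list of shards.
        digest = hashlib.md5(key.encode()).hexdigest()
        return SHARDS[int(digest, 16) % len(SHARDS)]

    # The client issues the query directly to this node.
    print(shard_for("customer:42"))

    # Drawback: adding or removing a shard changes len(SHARDS) and so
    # remaps most keys -- which is what consistent hashing avoids.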
Consistent Hashing:
An algorithm used to allocate data to partitions based on the key. It
is characterised by even distribution of the hash keys and the ability
to elastically expand or reduce the number of buckets efficiently.
These attributes make it useful for partitioning data or load across a
cluster of nodes where the size can change dynamically with nodes being
added or dropping off the cluster (perhaps due to failure).
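A minimal Python sketch of a consistent-hash ring, with illustrative node
names and virtual-node counts: each key goes to the first node clockwise
from its hash point, so adding a node only remaps the keys on the arc it
takes over.

    import bisect
    import hashlib

    class HashRing:
        def __init__(self, nodes=(), vnodes=8):
            self.vnodes = vnodes
            self.ring = []            # sorted list of (hash_point, node)
            for node in nodes:
                self.add_node(node)

        def _hash(self, value):
            return int(hashlib.md5(value.encode()).hexdigest(), 16)

        def add_node(self, node):
            # Several virtual nodes per physical node evens out the spread.
            for i in range(self.vnodes):
                bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

        def remove_node(self, node):
            self.ring = [(p, n) for p, n in self.ring if n != node]

        def node_for(self, key):
            idx = bisect.bisect(self.ring, (self._hash(key), ""))
            # Wrap around to the start of the ring if we fall off the end.
            return self.ring[idx % len(self.ring)][1]

    ring = HashRing(["node-a", "node-b", "node-c"])
    keys = [f"user:{i}" for i in range(10)]
    before = {k: ring.node_for(k) for k in keys}
    ring.add_node("node-d")
    after = {k: ring.node_for(k) for k in keys}
    moved = sum(before[k] != after[k] for k in keys)
    print(f"{moved} of {len(keys)} keys moved after adding a node")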
Multi-Master Replication:
A technique that allows writes across multiple nodes in a cluster to be
replicated to the other nodes. This technique facilitates scaling by
allowing some tables to be partitioned or sharded across servers and
others to be synchronised across the cluster. Writes must be replicated
to all nodes as opposed to a quorum, so transaction commits are more
expensive on a multi-master replicated architecture than on a quorum
replicated system.
Non-Blocking Switch:
A network switch that uses internal hardware parallelism to achieve
throughput that is proportional to the number of ports with no internal
bottlenecks. A naive implementation can use a crossbar mechanism, but
this has O(N^2) complexity for N ports, limiting it to smaller switches.
Larger switches can use a more complex internal topology called a
non-blocking minimal spanning switch to achieve linear throughput
scaling without needing O(N^2) hardware.
Making a distributed DBMS - how hard can it be?
Several technical challenges make this quite difficult to do in
practice. Apart from the added complexity of building a distributed
system the architect of a distributed DBMS has to overcome some tricky
engineering problems.
Atomicity on distributed systems: If the data
updated by a transaction is spread across multiple nodes the
commit/rollback of the nodes must be coordinated. This adds a
significant overhead on shared-nothing systems. On shared-disk systems
this is less of an issue as all of the storage can be seen by all of the
nodes so a single node can coordinate the commit.
Consistency on distributed systems: To take the
foreign key example cited above the system must be able to evaluate a
consistent state. For example, if the parent and child of a foreign key
relationship could reside on different nodes some sort of distributed
locking mechanism is needed to ensure that outdated information is not
used to validate the transaction. If this is not enforced you could
have (for example) a race condition where the parent is deleted after
its presence is verified but before the insert of the child is allowed.
Delayed enforcement of constraints (i.e. waiting until commit to
validate DRI) requires the lock to be held for the duration of the
transaction. This sort of distributed locking comes with a significant
overhead.
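A minimal Python sketch of that race, with an in-process lock manager
standing in for a distributed lock service (all names are illustrative): the
transaction inserting the child takes a lock on the parent row before
validating the foreign key and holds it until commit, so a concurrent delete
of the parent must wait.

    class LockManager:
        def __init__(self):
            self.locks = {}                      # resource -> holder

        def acquire(self, txn, resource):
            if self.locks.get(resource, txn) != txn:
                return False                     # someone else holds it
            self.locks[resource] = txn
            return True

        def release_all(self, txn):
            self.locks = {r: t for r, t in self.locks.items() if t != txn}

    parents = {100}                              # existing parent keys
    locks = LockManager()

    # T1 wants to insert a child referencing parent 100. It locks the
    # parent row *before* validating the FK and keeps the lock to commit.
    assert locks.acquire("T1", "parent:100")
    assert 100 in parents                        # FK check is now safe

    # T2 tries to delete the parent while T1 is still open. Without the
    # lock this delete could slip in between T1's check and T1's commit,
    # leaving an orphaned child.
    if locks.acquire("T2", "parent:100"):
        parents.discard(100)
    else:
        print("T2 must wait: parent row is locked by an open transaction")

    locks.release_all("T1")                      # T1 commits, lock released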
If multiple copies of data are held (this may be necessary on
shared-nothing systems to avoid unnecessary network traffic from
semi-joins) then all copies of the data must be updated.
Isolation on distributed systems: Where data
affected by a transaction resides on multiple system nodes, the locks
and versions (if MVCC is in use) must be synchronised across the nodes.
Guaranteeing serialisability of operations, particularly on
shared-nothing architectures where redundant copies of data may be
stored, requires a distributed synchronisation mechanism such as
Lamport's Algorithm, which also comes with a significant overhead in
network traffic.
Durability on distributed systems: On a shared disk
system the durability issue is essentially the same as a shared-memory
system, with the exception that distributed synchronisation protocols
are still required across nodes. The DBMS must journal writes to the
log and write the data out consistently. On a shared-nothing system
there may be multiple copies of the data or parts of the data stored on
different nodes. A two-phase commit protocol is needed to ensure that
the commit happens correctly across the nodes. This also incurs
significant overhead.
On a shared-nothing system the loss of a node can mean data is not
available to the system. To mitigate this data may be replicated across
more than one node. Consistency in this situation means that the data
must be replicated to all nodes where it normally resides. This can
incur substantial overhead on writes.
One common optimisation made in NoSQL systems is the use of quorum
replication and eventual consistency to allow the data to be replicated
lazily while guaranteeing a certain level of resiliency of the data by
writing to a quorum before reporting the transaction as committed. The
data is then replicated lazily to the other nodes where copies of the
data reside.
Note that 'eventual consistency' is a major trade-off on consistency
that may not be acceptable if the data must be viewed consistently as
soon as the transaction is committed. For example, on a financial
application an updated balance should be available immediately.
Shared-Disk systems
A shared-disk system is one where all of the nodes have access to all
of the storage. Thus, computation is independent of location. Many
DBMS platforms can also work in this mode - Oracle RAC is an example of
such an architecture.
Shared disk systems can scale substantially as they can support an M:M
relationship between storage nodes and processing nodes. A SAN can
have multiple controllers and multiple servers can run the database.
These architectures have a switch as a central bottleneck but crossbar
switches allow this switch to have a lot of bandwidth. Some processing
can be offloaded onto the storage nodes (as in the case of Oracle's
Exadata) which can reduce the traffic on the storage bandwidth.
Although the switch is theoretically a bottleneck the bandwidth
available means that shared-disk architectures will scale quite
effectively to large transaction volumes. Most mainstream DBMS
architectures take this approach because it affords 'good enough'
scalability and high reliability. With a redundant storage architecture
such as fibre channel there is no single point of failure as there are
at least two paths between any processing node and any storage node.
Shared-Nothing systems
Shared-nothing systems are systems where at least some of the data is
held locally to a node and is not directly visible to other nodes.
This removes the bottleneck of a central switch, allowing the database
to scale (at least in theory) with the number of nodes. Horizontal
partitioning allows the data to be split across nodes; this may be
transparent to the client or not (see Sharding above).
Because the data is inherently distributed a query may require data
from more than one node. If a join needs data from different nodes a
semi-join operation is used to transfer enough data to support the join
from one node to another. This can result in a large amount of network
traffic, so optimising the distribution of the data can make a big
difference to query performance.
Often, data is replicated across nodes of a shared-nothing system to
reduce the necessity for semi-joins. This works quite well on data
warehouse appliances as the dimensions are typically many orders of
magnitude smaller than the fact tables and can be easily replicated
across nodes. They are also typically loaded in batches so the
replication overhead is less of an issue than it would be on a
transactional application.
The inherent parallelism of shared-nothing architectures makes them
well suited to the sort of table-scan/aggregate queries characteristic
of a data warehouse. This sort of operation can scale almost linearly
with the number of processing nodes. Large joins across nodes tend to
incur more overhead as the semi-join operations can generate lots of
network traffic.
Moving large data volumes around is less useful for transaction
processing applications, where the overhead of multiple updates makes
this type of architecture less attractive than shared disk. Thus, this
type of architecture tends not to be used widely outside of data
warehouse applications.
Sharding, Quorum Replication and Eventual Consistency
Quorum Replication is a facility where a DBMS replicates data for
high availability. This is useful for systems intended to work on
cheaper commodity hardware that has no built-in high-availability
features like a SAN. In this type of system the data is replicated
across multiple storage nodes for read performance and redundant storage
to make the system resilient to hardware failure of a node.
However, replication of writes to all nodes is O(M x N) for M nodes
and N writes. This makes writes expensive if the write must be
replicated to all nodes before a transaction is allowed to commit.
Quorum replication is a compromise that allows writes to be replicated
to a subset of the nodes immediately and then lazily written out to the
other nodes by a background task. Writes can be committed more quickly,
while providing a certain degree of redundancy by ensuring that they
are replicated to a minimal subset (quorum) of nodes before the
transaction is reported as committed to the client.
This means that reads off nodes outside the quorum can see obsolete
versions of the data until the background process has finished writing
data to the rest of the nodes. These semantics are known as 'Eventual
Consistency'; they may or may not be acceptable depending on the
requirements of your application, but they mean that transaction
commits are closer to O(1) than O(N) in resource usage.
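A minimal Python sketch of the idea, with illustrative replica counts and
names: the write is acknowledged once a quorum of replicas has accepted it,
the rest are updated lazily, and a read that hits a straggler in the
meantime can see stale data.

    class Replica:
        def __init__(self, name):
            self.name = name
            self.data = {}

    N_REPLICAS = 3
    WRITE_QUORUM = 2                   # W replicas must ack each write
    replicas = [Replica(f"replica-{i}") for i in range(N_REPLICAS)]

    def quorum_write(key, value):
        # Synchronously write to a quorum, then report the commit...
        for replica in replicas[:WRITE_QUORUM]:
            replica.data[key] = value
        print(f"commit acknowledged by {WRITE_QUORUM}/{N_REPLICAS} replicas")
        # ...the remaining replicas get the value lazily (really a
        # background task; done explicitly below for simplicity).
        return replicas[WRITE_QUORUM:]

    def lazy_replicate(key, value, stragglers):
        for replica in stragglers:
            replica.data[key] = value

    stragglers = quorum_write("balance", 125)
    # A read hitting a straggler before lazy replication sees stale data:
    print(replicas[2].data.get("balance"))     # None (or an old version)
    lazy_replicate("balance", 125, stragglers)
    print(replicas[2].data.get("balance"))     # 125 once replication catches up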
Sharding requires the client to be aware of the partitioning of data
within the databases, often using a type of algorithm known as
'consistent hashing'. In a sharded database the client hashes the key
to determine which server in the cluster to issue the query to. As the
requests are distributed across nodes in the cluster there is no
bottleneck with a single query coordinator node.
These techniques allow a database to scale at a near-linear rate by
adding nodes to the cluster. Theoretically, quorum replication is only
necessary if the underlying storage medium is to be considered
unreliable. This is useful if commodity servers are to be used but is
of less value if the underlying storage mechanism has its own high
availability scheme (for example a SAN with mirrored controllers and
multi-path connectivity to the hosts).
For example, Google's BigTable does not implement Quorum Replication
by itself, although it does sit on GFS, a clustered file system that
does use quorum replication. BigTable (or any shared-nothing system)
could use a reliable storage system with multiple controllers and
partition the data among the controllers. Parallel access would then be
achieved through partitioning of the data.
Back to RDBMS platforms
There is no inherent reason that these techniques could not be used
with an RDBMS. However, lock and version management would be quite
complex on such a system and any market for such a system is likely to
be quite specialised. None of the mainstream RDBMS platforms use quorum
replication and I'm not specifically aware of any RDBMS product (at
least not one with any significant uptake) that does.
Shared-disk and shared-nothing systems can scale up to very large
workloads. For instance, Oracle RAC can support 63 processing nodes
(which could be large SMP machines in their own right) and an arbitrary
number of storage controllers on the SAN. An IBM Sysplex (a cluster of
zSeries mainframes) can support multiple mainframes (each with
substantial processing power and I/O bandwidth of their own) and
multiple SAN controllers. These architectures can support very large
transaction volumes with ACID semantics, although they do assume
reliable storage. Teradata, Netezza and other vendors make
high-performance analytic platforms based on shared-nothing designs that
scale to extremely large data volumes.
So far, the market for cheap but ultra-high volume fully ACID RDBMS
platforms is dominated by MySQL, which supports sharding and
multi-master replication. MySQL does not use quorum replication to
optimise write throughput, so transaction commits are more expensive
than on a NoSQL system. Sharding allows very high read throughputs (for
example Facebook uses MySQL extensively), so this type of architecture
scales well on read-heavy workloads.
An interesting debate
BigTable is a shared-nothing architecture (essentially a distributed key-value store), as pointed out by Michael Hausenblas
below.
My original evaluation of it included the MapReduce engine, which is
not a part of BigTable but would normally be used in conjunction with it
in its most common implementations (e.g. Hadoop/HBase and Google's
MapReduce framework).
Comparing this architecture with Teradata, which has physical
affinity between storage and processing (i.e. the nodes have local
storage rather than a shared SAN), you could argue that
BigTable/MapReduce is a shared-disk architecture by virtue of its
globally visible parallel storage system.
The processing throughput of a MapReduce style system such as Hadoop
is constrained by the bandwidth of a non-blocking network switch.[1]
Non-blocking
switches can, however, handle large bandwidth aggregates due to the
parallelism inherent in the design, so they are seldom a significant
practical constraint on performance. This means that a shared disk
architecture (perhaps better referred to as a shared-storage system) can
scale to large workloads even though the network switch is
theoretically a central bottleneck.
The original point was to note that although this central bottleneck
exists in shared-disk systems, a partitioned storage subsystem with
multiple storage nodes (e.g. BigTable tablet servers or SAN controllers)
can still scale up to large workloads. A non-blocking switch
architecture can (in theory) handle as many concurrent connections as it
has ports.
[1] Of course, the processing and I/O throughput available also
constitutes a limit on performance, but the network switch is a central
point through which all traffic passes.