Saturday, 21 January 2017

Funny "ORA-01017: invalid username/password; logon denied" during DataGuard switchover operation from DG broker

Recently one of my customers encountered a problem when they tried to perform a switchover from DataGuard broker command line interface.

$ dgmgrl /
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected as SYSDG.
DGMGRL> show configuration;

Configuration - db12c

  Protection Mode: MaxPerformance
  Members:
  db12c  - Primary database
    sdb12c - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 26 seconds ago)

DGMGRL> validate database sdb12c;

  Database Role:     Physical standby database
  Primary Database:  db12c

  Ready for Switchover:  Yes
  Ready for Failover:    Yes (Primary Running)

  Flashback Database Status:
    db12c:   On
    sdb12c:  Off

DGMGRL> switchover to sdb12c;
Performing switchover NOW, please wait...
Operation requires a connection to instance "sdb12c" on database "sdb12c"
Connecting to instance "sdb12c"...
ORA-01017: invalid username/password; logon denied

Warning: You are no longer connected to ORACLE.

        connect to instance "sdb12c" of database "sdb12c"



This output is from the test environment where we could easily reproduce the problem.
The installation is using single-tenant fashion of 12.1 database with ASM and Oracle Restart. Therefore we must take into consideration also the fact, that the databases are started/stopped through Oracle Clusterware.

Investigation:

1.) Check of if password files on primary and standby are the same - shutdown the standby database and copy password file prom primary.

Checking what we see from V$PWFILE_USERS on primary database:

SQL> select * from v$pwfile_users;

USERNAME                       SYSDB SYSOP SYSAS SYSBA SYSDG SYSKM     CON_ID
------------------------------ ----- ----- ----- ----- ----- ----- ----------
SYS                            TRUE  TRUE  FALSE FALSE FALSE FALSE          0

SYSDG                          FALSE FALSE FALSE FALSE TRUE  FALSE          1
SYSBACKUP                      FALSE FALSE FALSE TRUE  FALSE FALSE          1
SYSKM                          FALSE FALSE FALSE FALSE FALSE TRUE           1
C##_ADM_JSENEGACNIK            FALSE TRUE  FALSE FALSE FALSE FALSE          1
C##_SYS                        TRUE  FALSE FALSE FALSE FALSE FALSE          0
JOC                            TRUE  FALSE FALSE FALSE FALSE FALSE          3



After restarting standby database only to MOUNT state with refreshed password file we get the following:

SQL> select * from v$pwfile_users;

USERNAME                       SYSDB SYSOP SYSAS SYSBA SYSDG SYSKM     CON_ID
------------------------------ ----- ----- ----- ----- ----- ----- ----------
SYS                            TRUE  TRUE  FALSE FALSE FALSE FALSE          1
SYSDG                          FALSE FALSE FALSE FALSE TRUE  FALSE          1
SYSBACKUP                      FALSE FALSE FALSE TRUE  FALSE FALSE          1
SYSKM                          FALSE FALSE FALSE FALSE FALSE TRUE           1
C##_ADM_JSENEGACNIK            FALSE TRUE  FALSE FALSE FALSE FALSE          1
C##_SYS                        TRUE  FALSE FALSE FALSE FALSE FALSE          1


The standby database represents the password file users slightly different.
Important to interpret here is the CON_ID column which per Oracle documentation has the following meaning:


    The ID of the container to which the data pertains. Possible values include:

        0: This value is used for rows containing data that pertain to the entire CDB. This value is also used for rows in non-CDBs.

        1: This value is used for rows containing data that pertain to only the root

        n: Where n is the applicable container ID for the rows containing data

As the standby database is only in MOUNT state the value 1 in CON_ID is normal because the database is not opened yet in read write mode and therefore no operations on pluggable databases can be performed.

The displayed message when error occurred was:

Operation requires a connection to instance "sdb12c" on database "sdb12c"
Connecting to instance "sdb12c"...
ORA-01017: invalid username/password; logon denied


Analyzing this text means that we have a connection problem when the DG broker tries to connect to standby database.
We are connected as SYSDG user with SYSDG privileges in DG broker on primary site and we get the error when DG broker tries to connect to standby (remote) instance. As we have high level of privileges and the parameter remote_login_passwordfile is set to EXCLUSIVE it means that obviously we have a problem how we connect to the DGMGRL.

Therefore let us try to connect with username and password remotely from DGMGRL utility on primary database using connection string (alias) from standby database:

$ dgmgrl
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sysdg@sdb12c
Password:
ORA-01017: invalid username/password; logon denied

Warning: You are no longer connected to ORACLE.
 

Ooops, we get the same error. So when SYSDG user tries to connect to standby database via password file the username/password combination is wrong. We can connect as SYSDG user to DGMGRL utility using only "/" because we are authenticated via OSGROUP authorization. But for stopping/starting remote databases via Oracle Clusterware we need to be logged in via password file.

From the 12c manuals it looks like that the only way to change the password for SYSDG user in password file is by revoking and granting SYSDG privilege.


Find all users who have been granted the SYSDG privilege.

    SELECT USERNAME FROM V$PWFILE_USERS WHERE USERNAME != 'SYS' AND SYSDG='TRUE';

   

    Revoke and regrant the SYSDG privilege to these users.

    REVOKE SYSDG FROM non-SYS-user;
    GRANT SYSDG TO non-SYS-user;  



Actually it is enough that you change the password via "ALTER USER" command. The time  at the OS level of the password file will reflect this change so obviously the password file is updated with new password for SYSDG user.

SQL> alter user sysdg identified by Secret$124;


I have intentionally selected to put special character in password in order to show a problem when you try to use such password from the command line.


$ dgmgrl sys/Secret$124@db12c
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
ORA-01017: invalid username/password; logon denied

Warning: You are no longer connected to ORACLE.
DGMGRL>



Again we get error but this time due to "$" character in password. Escaping it does not resolve the problem either, also putting connection parameter into quotes.


$ dgmgrl "sys/Secret$124@db12c"
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
ORA-01017: invalid username/password; logon denied

Warning: You are no longer connected to ORACLE.



So the right way to test connection is:


$ dgmgrl
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sysdg@sdb12c
Password:
Connected as SYSDG.
DGMGRL>
DGMGRL> connect sysdg@db12c
Password:
Connected as SYSDG.



Now let us perform switchover to standby and afterwards back to old primary:


$ dgmgrl
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sysdg@db12c
Password:
Connected as SYSDG.
DGMGRL> validate database sdb12c;

  Database Role:     Physical standby database
  Primary Database:  db12c

  Ready for Switchover:  Yes
  Ready for Failover:    Yes (Primary Running)

  Flashback Database Status:
    db12c:   On
    sdb12c:  Off

DGMGRL> switchover to sdb12c;
Performing switchover NOW, please wait...
Operation requires a connection to instance "sdb12c" on database "sdb12c"
Connecting to instance "sdb12c"...
Connected as SYSDG.
New primary database "sdb12c" is opening...
Oracle Clusterware is restarting database "db12c" ...
Switchover succeeded, new primary is "sdb12c"


DGMGRL>  validate database db12c;

  Database Role:     Physical standby database
  Primary Database:  sdb12c

  Ready for Switchover:  Yes
  Ready for Failover:    Yes (Primary Running)

  Flashback Database Status:
    sdb12c:  Off
    db12c:   On

DGMGRL> switchover to db12c;
Performing switchover NOW, please wait...
Operation requires a connection to instance "db12c" on database "db12c"
Connecting to instance "db12c"...
Connected as SYSDG.
New primary database "db12c" is opening...
Oracle Clusterware is restarting database "sdb12c" ...
Switchover succeeded, new primary is "db12c"



On Oracle support site there are several notes regarding the ORA-1017 error. So it is a good idea to check them in case that your problem is not as described above.

It is mandatory to test also the static connection identifiers names "StaticConnectIdentifier" which are part of the DG configuration. Do this for primary and standby database.


DGMGRL> show database verbose db12c;

Database - db12c

  Role:               PRIMARY
  Intended State:     TRANSPORT-ON
  Instance(s):
    db12c

  Properties:
    DGConnectIdentifier             = 'db12c'
    ObserverConnectIdentifier       = ''
    LogXptMode                      = 'ASYNC'
    RedoRoutes                      = ''
    DelayMins                       = '0'
    Binding                         = 'optional'
    MaxFailure                      = '0'
    MaxConnections                  = '1'
    ReopenSecs                      = '300'
    NetTimeout                      = '30'
    RedoCompression                 = 'DISABLE'
    LogShipping                     = 'ON'
    PreferredApplyInstance          = ''
    ApplyInstanceTimeout            = '0'
    ApplyLagThreshold               = '0'
    TransportLagThreshold           = '0'
    TransportDisconnectedThreshold  = '30'
    ApplyParallel                   = 'AUTO'
    StandbyFileManagement           = 'MANUAL'
    ArchiveLagTarget                = '0'
    LogArchiveMaxProcesses          = '4'
    LogArchiveMinSucceedDest        = '1'
    DbFileNameConvert               = ''
    LogFileNameConvert              = ''
    FastStartFailoverTarget         = ''
    InconsistentProperties          = '(monitor)'
    InconsistentLogXptProps         = '(monitor)'
    SendQEntries                    = '(monitor)'
    LogXptStatus                    = '(monitor)'
    RecvQEntries                    = '(monitor)'
    StaticConnectIdentifier         = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.1.110)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=db12c_DGMGRL)(INSTANCE_NAME=db12c)(SERVER=DEDICATED)))'
    StandbyArchiveLocation          = 'USE_DB_RECOVERY_FILE_DEST'
    AlternateLocation               = ''
    LogArchiveTrace                 = '0'
    LogArchiveFormat                = '%t_%s_%r.dbf'
    TopWaitEvents                   = '(monitor)'

Database Status:
SUCCESS

DGMGRL> connect sysdg@'(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.1.110)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=db12c_DGMGRL)(INSTANCE_NAME=db12c)(SERVER=DEDICATED)))';
Password:
Connected as SYSDG.

DGMGRL> connect sysdg@"(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.1.110)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=db12c_DGMGRL)(INSTANCE_NAME=db12c)(SERVER=DEDICATED)))"
Password:
Connected as SYSDG.



It is mandatory to put the value of StaticConnectIdentifier in single or double quotes.

Of course you can use SYS user to perform switchover operation. Again, connect via SQL*Net (as being remote) to be authenticated by password file. Needless to say that you have to have the right setup in litener.ora and tnsnames.ora for both databases. I found that the easiest way is to have static definition in listener.ora file:


LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = olinux)(PORT = 1521))
      (ADDRESS = (PROTOCOL = IPC)(KEY = KEY1521))
    )
  )

SID_LIST_LISTENER =
 (SID_LIST =
  (SID_DESC =
    (GLOBAL_DBNAME = db12c_DGMGRL)
    (ORACLE_HOME = /u01/app/oracle/product/12.1.0/dbhome_1)
    (SID_NAME = db12c)
  )
  (SID_DESC =
    (GLOBAL_DBNAME = db12c)
    (ORACLE_HOME = /u01/app/oracle/product/12.1.0/dbhome_1)
    (SID_NAME = db12c)
  )
 )


Similar is for the standby database.

Conclusion:

Although one might think that the "validate database" command checks everything, it actually does not. Oracle could check also the connectivity during "validate database" command, but does not. So it is a big surprise when one encounters ORA-1017 error due to incomplete check performed by DG.


There is a document on Oracle support site "Oracle Data Guard Broker and Static Service Registration (Doc ID 1387859.1)" about configuring DGB and static connection identifiers.
As per this document the requirement for

Single Instance Database with Oracle Restart

Here there is no cluster, but clusterware has been installed to enable the Oracle Restart feature. The local listener LISTENER has its LISTENER.ORA located in the /network/admin directory of the Oracle Grid Infrastructure home. In this case the static service registration is:

SID_LIST_LISTENER=
  (SID_LIST=
    (SID_DESC=
     (GLOBAL_DBNAME=db_unique_name_DGMGRL.db_domain)
     (ORACLE_HOME=oracle_home)
     (SID_NAME=sid_name)
    )
  )

As with Single Instance databases, the SID_NAME value sid_name will default to the db_unique_name.



Updated on Jan 23nd 2017.

3 comments:

  1. Hey Jože,
    any reason why a static DGMGRL listener entry is still defined? I mean - http://docs.oracle.com/database/121/DGBKR/install.htm:

    "To enable DGMGRL to restart instances during the course of broker operations, a static service must be registered with the local listener of each instance. A static service registration is also required to enable the observer to restart instances as part of automatic reinstatement of the old primary database after a fast-start failover has occurred. A static service needs to be defined and registered only if Oracle Clusterware or Oracle Restart is not being used."

    Thanks.

    Regards
    Stefan

    ReplyDelete
    Replies
    1. Stefan,
      Thanks for the comment and I was expecting that it should work like described in the manual. Unfortunately it does not, at least with my installation.
      I removed the DG configuration, removed DG configuration files and created a brand new configuration.
      On primary site in the listener I see:
      Service "db12c" has 1 instance(s).
      Instance "db12c", status READY, has 1 handler(s) for this service...
      Handler(s):
      "DEDICATED" established:0 refused:0 state:ready
      LOCAL SERVER
      Service "db12c_DGB" has 1 instance(s).
      Instance "db12c", status READY, has 1 handler(s) for this service...
      Handler(s):
      "DEDICATED" established:0 refused:0 state:ready
      LOCAL SERVER

      On standby location I see only one entry:
      Service "sdb12c" has 1 instance(s).
      Instance "sdb12c", status READY, has 1 handler(s) for this service...
      Handler(s):
      "DEDICATED" established:2 refused:0 state:ready
      LOCAL SERVER

      So only on primary site I see a service with DGB.

      Checking the configuration on primary site:

      StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.1.110)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=db12c_DGMGRL)(INSTANCE_NAME=db12c)(SERVER=DEDICATED)))'

      Standby side:
      StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.1.107)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=sdb12c_DGMGRL)(INSTANCE_NAME=sdb12c)(SERVER=DEDICATED)))'

      Checking the connectivity:
      DGMGRL> connect sysdg@'(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.1.107)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=sdb12c_DGMGRL)(INSTANCE_NAME=sdb12c)(SERVER=DEDICATED)))'
      Password:
      Unable to connect to database using (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.1.107)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=sdb12c_DGMGRL)(INSTANCE_NAME=sdb12c)(SERVER=DEDICATED)))
      ORA-12514: TNS:listener does not currently know of service requested in connect descriptor

      So there is a DGMGRL suffix added to service name but service registered on primary database has DGB suffix. On standby side no dataguard service registered at all.

      This is on 12.1.0.2 version. Without static definitions it will not work. Besides that my client is using non default ports so we had to create static definitions in listener.ora.

      I never had enough time to investigate this problem, but on internet several people reported similar problems. I think this is a kind of a "feature" or maybe I am getting this completely wrong.

      Kind regards, Joze

      Delete
    2. Stefan,
      I retested again and I found that I forgot to enable configuration. After that I got service sdb12c_DGB on standby site in listener activated automatically, but the static connect identifiers are still wrong in DGB configuartion - they expect *_DGMGRL service.

      Delete