Changing the sys password broke Legato backups


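Today I was told that the Legato backup for one province could not run. The Legato monitor window showed no obvious error; the backup simply ended with a fail message.

The monitor said "……Hostname(s) Unresolved, 1 Failed, 1 Succeeded (xj_db Failed)", so at first I suspected a hostname problem. But pinging the client from the backup server worked without any trouble: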

C:\Documents and Settings\Administrator>ping xj_db
 
Pinging xj_db [10.203.102.11] with 32 bytes of data:
 
Reply from 10.203.102.11: bytes=32 time<1ms TTL=255
Reply from 10.203.102.11: bytes=32 time<1ms TTL=255
Reply from 10.203.102.11: bytes=32 time<1ms TTL=255
 
Ping statistics for 10.203.102.11:
    Packets: Sent = 3, Received = 3, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 0ms, Average = 0ms

Log in to the client, which is the db host, and check the relevant logs as root:

Go to the /nsr/applogs directory and open nsrnmostart.log with vi:

……
(20721) Legato NetWorker Module for Oracle v4.1
(20721) Tue Jan  6 17:12:20 2009
(20721) Entering Function nwora_process_calling_args
(20721) argc = 16
(20721) Calling Summary
(20721) argv[ 0] = nsrnmostart
(20721) argv[ 1] = -s
(20721) argv[ 2] = xj_bak01
(20721) argv[ 3] = -g
(20721) argv[ 4] = OracleArch
(20721) argv[ 5] = -LL
(20721) argv[ 6] = -m
(20721) argv[ 7] = xj_db
(20721) argv[ 8] = -l
(20721) argv[ 9] = full
(20721) argv[10] = -q
(20721) argv[11] = -W
(20721) argv[12] = 78
(20721) argv[13] = -N
(20721) argv[14] = /oracle/app/oracle/product/9.2.0/bin/OracleArch
(20721) argv[15] = /oracle/app/oracle/product/9.2.0/bin/OracleArch
(20721) Environment Read by nsrnmostart
(20721) ORACLE_SID           = xjmisc
(20721) ORACLE_HOME          = /oracle/app/oracle/product/9.2.0
(20721) PRECMD               =
(20721) POSTCMD              =
(20721) PATH                 = /bin:/usr/sbin:/usr/bin:/nsr/bin:/opt/networker/bin
(20721) NSR_RMAN_ARGUMENTS   = msglog '/nsr/applogs/msglog.log' append
(20721) NSR_RMAN_OUTPUT      = /nsr/applogs/msglog.log append
(20721) Leaving Function nwora_process_calling_args
 
(20721) Entering Function nwora_scan_rman_script
(20721) Checking rman script /oracle/app/oracle/product/9.2.0/bin/OracleArch for validity.
(20721) found connect catalog string.
(20721) found connect target string.
(20721) found allocate channel:   allocate channel t1 type 'sbt_tape'
(20721) found allocate channel:   allocate channel t2 type 'sbt_tape'
(20721) found allocate channel:   allocate channel t3 type 'sbt_tape'
(20721) Completed checking of rman script.
(20721) Leaving Function nwora_scan_rman_script
 
(20721) Entering Function nwora_nsrnmostart_rman
(20721) nwora_find_rman_version: file /oracle/app/oracle/product/9.2.0/bin/tmp000002 created
(20721) nwora_find_rman_version: RMAN version: major 9, minor 2
(20721) RMAN internal version 0 found after send command testing
(20721) savegrp information added to 3 channels
(20721) exepath = /oracle/app/oracle/product/9.2.0/bin/rman
(20721) cmd_args = msglog '/nsr/applogs/msglog.log' append
(20721) rman_script = /oracle/app/oracle/product/9.2.0/bin/nmosb000003
(20721) saveset_name = /oracle/app/oracle/product/9.2.0/bin/OracleArch
(20721) Launching backup process
(20721) Backup process failed: RMAN exited with return code '1'.
(20721) nwora_nsrnmostart_rman: RMAN script execution is not successful. RMAN exited with return code '1'.
(20721) Leaving Function nwora_nsrnmostart_rman

The log shows that the RMAN script did not run successfully: RMAN script execution is not successful.

Let's test the RMAN script. The save set of the group in the Legato GUI points to the script /oracle/app/oracle/product/9.2.0/bin/OracleArch:

xj_db01:[/nsr/applogs]#cat /oracle/app/oracle/product/9.2.0/bin/OracleArch
connect catalog rman/rman@xjrman;
connect target sys/pwd111;
run {
  allocate channel t1 type 'sbt_tape'
  parms 'ENV=(NSR_CLIENT=xj_db)';
  allocate channel t2 type 'sbt_tape'
  parms 'ENV=(NSR_CLIENT=xj_db)';
  allocate channel t3 type 'sbt_tape'
  parms 'ENV=(NSR_CLIENT=xj_db)';
  sql 'alter system archive log current';
  crosscheck archivelog all;
  backup
    format "arch_%d_t%t_s%s_p%p"
    (archivelog all delete input);
  release channel t1;
  release channel t2;
  release channel t3;
}

Run as the oracle user, the script backs up successfully!

Next, check msglog.log under /nsr/applogs (vi msglog.log):

Recovery Manager: Release 9.2.0.6.0 - 64bit Production
 
Copyright (c) 1995, 2002, Oracle Corporation.  All rights reserved.
 
RMAN> connect catalog rman/rman@xjrman;
2> connect target sys/pwd111;
3> run {
4>   allocate channel t1 type 'sbt_tape'
5>   parms 'ENV=(NSR_CLIENT=xj_db,
6> NSR_SERVER=xj_bak01,
7> NSR_GROUP=OracleArch,
8> NSR_SAVESET_NAME=/oracle/app/oracle/product/9.2.0/bin/OracleArch)';
9>   allocate channel t2 type 'sbt_tape'
10>   parms 'ENV=(NSR_CLIENT=xj_db,
11> NSR_SERVER=xj_bak01,
12> NSR_GROUP=OracleArch,
13> NSR_SAVESET_NAME=/oracle/app/oracle/product/9.2.0/bin/OracleArch)';
14>   allocate channel t3 type 'sbt_tape'
15>   parms 'ENV=(NSR_CLIENT=xj_db,
16> NSR_SERVER=xj_bak01,
17> NSR_GROUP=OracleArch,
18> NSR_SAVESET_NAME=/oracle/app/oracle/product/9.2.0/bin/OracleArch)';
19>   sql 'alter system archive log current';
20>   crosscheck archivelog all;
21>   backup
22>     format "arch_%d_t%t_s%s_p%p"
23>     (archivelog all delete input);
24>   release channel t1;
25>   release channel t2;
26>   release channel t3;
27> }
28>
connected to recovery catalog database
 
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
ORA-01031: insufficient privileges

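Here is the actual RMAN error: ORA-01031. Legato is installed as root and runs the backup as root, and the RMAN script now fails when root runs it. Could something be wrong with the way root picks up the oracle user's environment variables?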
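The ORA-01031 line is easy to miss at the bottom of a long msglog.log. As a rough sketch, a small shell function can pull out just the Oracle error codes. The name rman_log_errors is made up for this illustration and is not part of NetWorker or RMAN. The sketch assumes a POSIX awk and treats RMAN-00569 and RMAN-00571 as banner lines that only frame the error stack.

```shell
# Hypothetical helper (not a NetWorker or RMAN tool):
# print the distinct ORA-/RMAN- error codes found at the start of lines
# in an RMAN message log, skipping the RMAN-00569/RMAN-00571 banner lines.
rman_log_errors() {
    awk '
        match($0, /^(ORA|RMAN)-[0-9][0-9][0-9][0-9][0-9]/) {
            code = substr($0, RSTART, RLENGTH)
            if (code != "RMAN-00571" && code != "RMAN-00569") {
                print code
            }
        }
    ' "$1" | sort -u
}
```

It could be called as `rman_log_errors /nsr/applogs/msglog.log`. Because the log is opened in append mode here, the result covers every run recorded in the file, not only the latest one.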

Next, look for Legato's environment file.

The file name in the backup command field of the Legato GUI leads to /opt/networker/bin/nsrnmo1:

xj_db01:[/opt/networker/bin]#cat nsrnmo1
#!/bin/sh
#
# $Id: nsrnmo.template,v 1.3.52.4 2003/06/25 21:42:19 yozekinc Exp $ Copyright (c) 2003, Legato Systems, Inc.
#
# All rights reserved.
#
# nsrnmo.sh
#
# Legato Networker Module for Oracle 4.1
#
# This script is part of the Legato NetWorker Module for Oracle.
# Modification of this script should be done with care and only after reading
# the administration manual included with this product.
#
# This script should only be run as part of a scheduled savegroup.
#
# Returns 0 on success; 1 on failure.
#
 
#
# REQUIRED Variable: ORACLE_HOME
#
# Default value: NONE (site specific)
#
# Description: Specifies where the Oracle Server installation is located.
# It is a requirement that rman be located in ORACLE_HOME/bin.
#
# Samples:
#       ORACLE_HOME=/disk3/oracle/app/oracle/product/8.1.6
#
ORACLE_HOME=/oracle/app/oracle/product/9.2.0
 
 
# REQUIRED Variable: PATH
#
# Default value: NONE (site and platform specific)
#
# Description: Set up the PATH environment variable.
# This must be configured to include the path to "nsrnmostart"
#
# Samples:
#       PATH=/bin:/usr/sbin:/usr/bin:/nsr/bin:/opt/networker/bin
#
PATH=/bin:/usr/sbin:/usr/bin:/nsr/bin:/opt/networker/bin
 
#
# Optional Variable: ORACLE_SID
#
# Default value: NONE (site specific)
#
# Description: Specifies the SID of the Oracle database being backed up.
# It is required by proxy copy backups when catalog synchronization is
# enabled.
#
# Samples:
#       ORACLE_SID=orcl815
#
ORACLE_SID=xjmisc
 
#
# Optional Variable: NSR_RMAN_ARGUMENTS
#
# Default value: NONE (site specific)
#
# Description: Provide extra rman parameters.
# You must enclose the command in quotes or it will not be
# passed correctly to rman.
#
# Samples:
#       NSR_RMAN_ARGUMENTS="nocatalog msglog '/nsr/applogs/msglog.log' append"
#
#       NSR_RMAN_ARGUMENTS="nocatalog"
#
NSR_RMAN_ARGUMENTS="msglog '/nsr/applogs/msglog.log' append"
 
#
# Optional Variable: NSR_RMAN_OUTPUT
#
# Default value: NONE (site specific)
#
# Description: Provide option to capture the RMAN standard output
# if RMAN "msglog" or "log" command line option is not set.
# The connect strings will be hidden in this file.
#
# Samples:
#       NSR_RMAN_OUTPUT="/nsr/applogs/msglog.log append"
#
#       NSR_RMAN_OUTPUT="/nsr/applogs/msglog.log"
#
NSR_RMAN_OUTPUT="/nsr/applogs/msglog.log append"
 
#
# Optional Variable: NSR_SB_DEBUG_FILE
#
# Default value: NONE (site specific)
#
# Description:  To enable debugging output for NMO scheduled backups set
#                               the following to an appropriate path and file name.
#                               Set this variable for debugging purposes only
#
# Samples:
#       NSR_SB_DEBUG_FILE=/nsr/applogs/nsrnmostart.log
#
NSR_SB_DEBUG_FILE=
 
#
# Optional Variable: PRECMD
#
# Default value: NONE
#
# Description:  This variable can be used to run a command or command script
#                               before nsrnmostart. It will be launched once for every saveset
#                               entered in the client setup.
#
PRECMD=
 
#
# Optional Variable: POSTCMD
#
# Default value: NONE
#
# Description:  This variable can be used to run a command or command script
#                               after nsrnmostart has completed. It will be launched once for
#                               every saveset entered in the client setup.
#
POSTCMD=
 
#
# Optional Variable: SHLIB_PATH,LD_LIBRARY_PATH
#
# Default value: NONE
#
# Description:  These variables may have to be set on HP-UX 11.0 (64 bit) operating systems.
#                               We suggest leaving it unset unless you have a scheduled backup problem.
#                               If it is set you must also uncomment the export SHLIB_PATH and LD_LIBRARY_PATH
#                               in the function export_environment_variables below.
#
# Samples:
#       SHLIB_PATH=/disk3/oracle/app/oracle/product/8.1.6/lib
#       LD_LIBRARY_PATH=/disk3/oracle/app/oracle/product/8.1.6/lib64
#
 
#
# Optional Variable: TNS_ADMIN
#
# Default value: NONE
#
# Description:  This variable needs to be set if Oracle Net configuration
#                       files are not located in default locations.If it is set you must also uncomment
#                       the export TNS_ADMIN in the function export_environment_variables below.
#
# Samples:
#       TNS_ADMIN=/disk3/oracle/app/oracle/product/8.1.6/network/admin1
#
 
export_environment_variables()
{
 
export ORACLE_HOME
export ORACLE_SID
export NSR_RMAN_ARGUMENTS
export NSR_RMAN_OUTPUT
export PRECMD
export POSTCMD
export PATH
export NSR_SB_DEBUG_FILE
#export SHLIB_PATH
#export LD_LIBRARY_PATH
#export TNS_ADMIN
 
}
 
 
###########################################################################
# Do not edit anything below this line.
###########################################################################
 
 
Pid=0                   # process to kill if we are cancelled
nsrnmostart_status=0    # did it work?
 
 
#
# Handle cancel signals sent by savegrp when user stops the group.
#
handle_signal()
{
        if [ $Pid != 0 ]; then
                kill -2 $Pid
        fi
        exit 1
}
 
#
# The main portion of this shell.
#
 
#
# Make sure we respond to savegrp cancellations.
#
trap handle_signal 2 15
 
#
# Build the nsrnmostart command
#
 
opts=""
while [ $# -gt 0 ]; do
        case "$1" in
        -s )    # server name
                opts="$opts $1 '$2'"
                shift 2
                ;;
        -N )    # save set name
                opts="$opts $1 '$2'"
                shift 2
                ;;
        -e )    # expiration time
                opts="$opts $1 '$2'"
                shift 2
                ;;
        -b )    # Specify pool
                opts="$opts $1 '$2'"
                shift 2
                ;;
        -c )    # Specify the client name
                opts="$opts $1 '$2'"
                shift 2
                ;;
        -g )    # Specify group
                opts="$opts $1 '$2'"
                shift 2
                ;;
        -m )    # Specify masquerade
                opts="$opts $1 '$2'"
                shift 2
                ;;
        -A )    # Specify PowerSnap options
                opts="$opts $1 '$2'"
                shift 2
                ;;
        *)      # rest of options     
                opts="$opts $1"
                shift
                ;;
        esac
done
 
if [ "${BACKUP_OPT}" != "" ];
then
        BACKUP_COMMAND_LINE="nsrnmostart ""$BACKUP_OPT"" $opts"
else
        BACKUP_COMMAND_LINE="nsrnmostart $opts"
fi
 
#
# Export all necessary environment variables
#
export_environment_variables
 
#
# Call nsrnmostart to do the backups.
#
 
#print $BACKUP_COMMAND_LINE
eval ${BACKUP_COMMAND_LINE} &
 Pid=$!
 wait $Pid
 
 nsrnmostart_status=$?
 if [ $nsrnmostart_status != 0 ] ; then
        echo "nsrnmostart returned status of "$nsrnmostart_status
        echo  $0 "exiting."
        exit 1
 fi
 
exit 0

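The environment variables in it look fine: ORACLE_SID, ORACLE_HOME and PATH are all set correctly.

As root, I set the environment variables by hand and connected to the target database manually: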

xj_db01:[/opt/networker/bin]#export ORACLE_SID=xjmisc
xj_db01:[/opt/networker/bin]#export ORACLE_HOME=/oracle/app/oracle/product/9.2.0
xj_db01:[/opt/networker/bin]#export PATH=$ORACLE_HOME/bin
xj_db01:[/opt/networker/bin]#rman
 
Recovery Manager: Release 9.2.0.6.0 - 64bit Production
 
Copyright (c) 1995, 2002, Oracle Corporation.  All rights reserved.
 
RMAN> connect target sys/pwd111
 
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
ORA-01031: insufficient privileges
 
RMAN> exit

So root really cannot log in.

Check the database's login policy settings.

Switch to the oracle user and log in with sqlplus:

SQL> show parameter remote
 
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
remote_archive_enable                string      true
remote_dependencies_mode             string      TIMESTAMP
remote_listener                      string
remote_login_passwordfile            string      EXCLUSIVE
remote_os_authent                    boolean     FALSE
remote_os_roles                      boolean     FALSE
SQL>

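These settings mean that OS users in the dba group can connect through operating-system authentication, while every other user has to be verified against the password file. That would explain why the script works as oracle (a dba member) but fails as root.

Next, look at the timestamp of the password file: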
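To see which side of this rule an OS account falls on, one can check its group membership. This is a minimal sketch. It assumes the privileged group is named dba, as the file ownership under $ORACLE_HOME/dbs suggests. user_in_group is a hypothetical helper name, and it relies only on the standard `id -Gn` command.

```shell
# Hypothetical helper: succeed (exit 0) if the OS user belongs to the group,
# fail (exit 1) otherwise or if the user does not exist.
user_in_group() {
    user=$1
    group=$2
    # id -Gn prints all group names of the user, separated by spaces
    for g in $(id -Gn "$user" 2>/dev/null); do
        if [ "$g" = "$group" ]; then
            return 0
        fi
    done
    return 1
}
```

For example, `user_in_group root dba || echo "root is checked against the password file"` prints the message when root is not a dba member, which is the situation the backup ran into.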

oracle@xj_db01:/oracle/app/oracle/product/9.2.0 > cd dbs
oracle@xj_db01:/oracle/app/oracle/product/9.2.0/dbs > ll
total 27548
-rw-r--r--   1 oracle     dba           8385 Mar  9  2002 init.ora
-rw-r--r--   1 oracle     dba          12920 Mar  9  2002 initdw.ora
-rw-r--rw-   1 oracle     dba           1041 Jun 19  2005 initxjmisc.bak
-rw-rw-rw-   1 oracle     dba             70 Apr 28  2008 initxjmisc.ora
-rw-rw-rw-   1 oracle     dba             36 Dec 26  2005 initxjmisc.ora.20051226
-rw-rw----   1 oracle     dba             24 Dec  3 05:07 lkXJMISC
-rwSr-----   1 oracle     dba           3072 Jan  5 16:34 orapwxjmisc
-rw-rw----   1 oracle     dba        14065664 Jan  6 15:49 snapcf_xjmisc.f
oracle@xj_db01:/oracle/app/oracle/product/9.2.0/dbs >

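The password file has a very recent timestamp (Jan 5 16:34), so my guess was that someone had changed the sys password recently!!

The on-site staff confirmed that on the afternoon of the 5th someone had indeed changed the sys password to pwd222. That settles the cause of this failure.

With the cause known, the fix was easy: change the sys password back to pwd111 and test Legato's arch backup again. Success!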
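The same timestamp check can be scripted instead of reading the directory listing by eye. The sketch below is only an illustration. modified_within_days is a made-up helper name, and it relies on the standard `find -mtime` test, which counts age in whole 24-hour periods.

```shell
# Hypothetical helper: succeed if FILE was modified less than DAYS*24 hours ago
# (DAYS defaults to 2); fail if it is older or does not exist.
modified_within_days() {
    file=$1
    days=${2:-2}
    # -prune keeps find from descending if a directory is passed by mistake
    [ -n "$(find "$file" -prune -mtime "-$days" 2>/dev/null)" ]
}
```

On this host it could be called as `modified_within_days "$ORACLE_HOME/dbs/orapw$ORACLE_SID" 2 && echo "password file changed recently"`. A recent timestamp is only a hint that a privileged password was changed, so it still needs confirming with whoever administers the database.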

· [Post info] Posted: 2009-01-06 @ 19:07:43 · || Category: ..experience, Working case
