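Changing the sys password broke Legato backups
Today I was told that the Legato backup for one province was failing to run. The Legato monitor window showed no obvious error; it simply reported that the backup had failed.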

Because the monitor said "……Hostname(s) Unresolved,1 Failed,1 Succeeded(xj_db Failed)", I first suspected a hostname problem, but pinging the client from the backup server worked fine:
Pinging xj_db [10.203.102.11] with 32 bytes of data:
Reply from 10.203.102.11: bytes=32 time<1ms TTL=255
Reply from 10.203.102.11: bytes=32 time<1ms TTL=255
Reply from 10.203.102.11: bytes=32 time<1ms TTL=255
Ping statistics for 10.203.102.11:
Packets: Sent = 3, Received = 3, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 0ms, Average = 0ms
Log in to the client, which is the DB host, and check the relevant logs as root:
Go to the /nsr/applogs directory and run vi nsrnmostart.log
(20721) Legato NetWorker Module for Oracle v4.1
(20721) Tue Jan 6 17:12:20 2009
(20721) Entering Function nwora_process_calling_args
(20721) argc = 16
(20721) Calling Summary
(20721) argv[ 0] = nsrnmostart
(20721) argv[ 1] = -s
(20721) argv[ 2] = xj_bak01
(20721) argv[ 3] = -g
(20721) argv[ 4] = OracleArch
(20721) argv[ 5] = -LL
(20721) argv[ 6] = -m
(20721) argv[ 7] = xj_db
(20721) argv[ 8] = -l
(20721) argv[ 9] = full
(20721) argv[10] = -q
(20721) argv[11] = -W
(20721) argv[12] = 78
(20721) argv[13] = -N
(20721) argv[14] = /oracle/app/oracle/product/9.2.0/bin/OracleArch
(20721) argv[15] = /oracle/app/oracle/product/9.2.0/bin/OracleArch
(20721) Environment Read by nsrnmostart
(20721) ORACLE_SID = xjmisc
(20721) ORACLE_HOME = /oracle/app/oracle/product/9.2.0
(20721) PRECMD =
(20721) POSTCMD =
(20721) PATH = /bin:/usr/sbin:/usr/bin:/nsr/bin:/opt/networker/bin
(20721) NSR_RMAN_ARGUMENTS = msglog '/nsr/applogs/msglog.log' append
(20721) NSR_RMAN_OUTPUT = /nsr/applogs/msglog.log append
(20721) Leaving Function nwora_process_calling_args
(20721) Entering Function nwora_scan_rman_script
(20721) Checking rman script /oracle/app/oracle/product/9.2.0/bin/OracleArch for validity.
(20721) found connect catalog string.
(20721) found connect target string.
(20721) found allocate channel: allocate channel t1 type 'sbt_tape'
(20721) found allocate channel: allocate channel t2 type 'sbt_tape'
(20721) found allocate channel: allocate channel t3 type 'sbt_tape'
(20721) Completed checking of rman script.
(20721) Leaving Function nwora_scan_rman_script
(20721) Entering Function nwora_nsrnmostart_rman
(20721) nwora_find_rman_version: file /oracle/app/oracle/product/9.2.0/bin/tmp000002 created
(20721) nwora_find_rman_version: RMAN version: major 9, minor 2
(20721) RMAN internal version 0 found after send command testing
(20721) savegrp information added to 3 channels
(20721) exepath = /oracle/app/oracle/product/9.2.0/bin/rman
(20721) cmd_args = msglog '/nsr/applogs/msglog.log' append
(20721) rman_script = /oracle/app/oracle/product/9.2.0/bin/nmosb000003
(20721) saveset_name = /oracle/app/oracle/product/9.2.0/bin/OracleArch
(20721) Launching backup process
(20721) Backup process failed: RMAN exited with return code '1'.
(20721) nwora_nsrnmostart_rman: RMAN script execution is not successful. RMAN exited with return code '1'.
(20721) Leaving Function nwora_nsrnmostart_rman
The log shows that the RMAN script did not run successfully: RMAN script execution is not successful.
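As a rough sketch of the same check without paging through the whole file in vi, the failure lines can be pulled out with grep. The function name nmo_failures is hypothetical, and the patterns are taken only from the two failure lines in the log above; other NMO versions may word them differently.

```shell
#!/bin/sh
# Print the lines of an nsrnmostart debug log that report a failed RMAN run.
# nmo_failures is a hypothetical helper; the patterns come from the log above.
nmo_failures() {
    grep -E 'Backup process failed|execution is not successful' "$1"
}

# Example: nmo_failures /nsr/applogs/nsrnmostart.log
```

An empty result only means these two phrases were not found, not that the backup succeeded.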
Next, test the RMAN script itself. The save set of the group in the Legato interface points to the script /oracle/app/oracle/product/9.2.0/bin/OracleArch:
connect catalog rman/rman@xjrman;
connect target sys/pwd111;
run {
allocate channel t1 type 'sbt_tape'
parms 'ENV=(NSR_CLIENT=xj_db)';
allocate channel t2 type 'sbt_tape'
parms 'ENV=(NSR_CLIENT=xj_db)';
allocate channel t3 type 'sbt_tape'
parms 'ENV=(NSR_CLIENT=xj_db)';
sql 'alter system archive log current';
crosscheck archivelog all;
backup
format "arch_%d_t%t_s%s_p%p"
(archivelog all delete input);
release channel t1;
release channel t2;
release channel t3;
}
Run as the oracle user, the script backs up successfully!
Next, check msglog.log under /nsr/applogs: vi msglog.log
Copyright (c) 1995, 2002, Oracle Corporation. All rights reserved.
RMAN> connect catalog rman/rman@xjrman;
2> connect target sys/pwd111;
3> run {
4> allocate channel t1 type 'sbt_tape'
5> parms 'ENV=(NSR_CLIENT=xj_db,
6> NSR_SERVER=xj_bak01,
7> NSR_GROUP=OracleArch,
8> NSR_SAVESET_NAME=/oracle/app/oracle/product/9.2.0/bin/OracleArch)';
9> allocate channel t2 type 'sbt_tape'
10> parms 'ENV=(NSR_CLIENT=xj_db,
11> NSR_SERVER=xj_bak01,
12> NSR_GROUP=OracleArch,
13> NSR_SAVESET_NAME=/oracle/app/oracle/product/9.2.0/bin/OracleArch)';
14> allocate channel t3 type 'sbt_tape'
15> parms 'ENV=(NSR_CLIENT=xj_db,
16> NSR_SERVER=xj_bak01,
17> NSR_GROUP=OracleArch,
18> NSR_SAVESET_NAME=/oracle/app/oracle/product/9.2.0/bin/OracleArch)';
19> sql 'alter system archive log current';
20> crosscheck archivelog all;
21> backup
22> format "arch_%d_t%t_s%s_p%p"
23> (archivelog all delete input);
24> release channel t1;
25> release channel t2;
26> release channel t3;
27> }
28>
connected to recovery catalog database
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
ORA-01031: insufficient privileges
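Here is the RMAN error: ORA-01031. Legato is installed as root and runs as root, and now the RMAN script fails when root executes it. Could something be wrong with the Oracle environment variables that root picks up?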
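Since msglog.log is opened in append mode and grows with every run, it can help to list just the Oracle error codes it contains. The following is a minimal sketch; ora_errors is a hypothetical helper name, and it reports at most one ORA- code per line (the last one on that line).

```shell
#!/bin/sh
# List the distinct ORA- error codes found in an RMAN message log.
# ora_errors is a hypothetical helper name, not part of RMAN or NetWorker.
ora_errors() {
    # Keep only the ORA-nnnnn token of each matching line, then de-duplicate.
    sed -n 's/.*\(ORA-[0-9][0-9]*\).*/\1/p' "$1" | sort -u
}

# Example: ora_errors /nsr/applogs/msglog.log
```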
Next, find Legato's environment file:
The file name in the backup command field of the Legato interface leads to /opt/networker/bin/nsrnmo1

cat nsrnmo1
#!/bin/sh
#
# $Id: nsrnmo.template,v 1.3.52.4 2003/06/25 21:42:19 yozekinc Exp $ Copyright (c) 2003, Legato Systems, Inc.
#
# All rights reserved.
#
# nsrnmo.sh
#
# Legato Networker Module for Oracle 4.1
#
# This script is part of the Legato NetWorker Module for Oracle.
# Modification of this script should be done with care and only after reading
# the administration manual included with this product.
#
# This script should only be run as part of a scheduled savegroup.
#
# Returns 0 on success; 1 on failure.
#
#
# REQUIRED Variable: ORACLE_HOME
#
# Default value: NONE (site specific)
#
# Description: Specifies where the Oracle Server installation is located.
# It is a requirement that rman be located in ORACLE_HOME/bin.
#
# Samples:
# ORACLE_HOME=/disk3/oracle/app/oracle/product/8.1.6
#
ORACLE_HOME=/oracle/app/oracle/product/9.2.0
# REQUIRED Variable: PATH
#
# Default value: NONE (site and platform specific)
#
# Description: Set up the PATH environment variable.
# This must be configured to include the path to "nsrnmostart"
#
# Samples:
# PATH=/bin:/usr/sbin:/usr/bin:/nsr/bin:/opt/networker/bin
#
PATH=/bin:/usr/sbin:/usr/bin:/nsr/bin:/opt/networker/bin
#
# Optional Variable: ORACLE_SID
#
# Default value: NONE (site specific)
#
# Description: Specifies the SID of the Oracle database being backed up.
# It is required by proxy copy backups when catalog synchronization is
# enabled.
#
# Samples:
# ORACLE_SID=orcl815
#
ORACLE_SID=xjmisc
#
# Optional Variable: NSR_RMAN_ARGUMENTS
#
# Default value: NONE (site specific)
#
# Description: Provide extra rman parameters.
# You must enclose the command in quotes or it will not be
# passed correctly to rman.
#
# Samples:
# NSR_RMAN_ARGUMENTS="nocatalog msglog '/nsr/applogs/msglog.log' append"
#
# NSR_RMAN_ARGUMENTS="nocatalog"
#
NSR_RMAN_ARGUMENTS="msglog '/nsr/applogs/msglog.log' append"
#
# Optional Variable: NSR_RMAN_OUTPUT
#
# Default value: NONE (site specific)
#
# Description: Provide option to capture the RMAN standard output
# if RMAN "msglog" or "log" command line option is not set.
# The connect strings will be hidden in this file.
#
# Samples:
# NSR_RMAN_OUTPUT="/nsr/applogs/msglog.log append"
#
# NSR_RMAN_OUTPUT="/nsr/applogs/msglog.log"
#
NSR_RMAN_OUTPUT="/nsr/applogs/msglog.log append"
#
# Optional Variable: NSR_SB_DEBUG_FILE
#
# Default value: NONE (site specific)
#
# Description: To enable debugging output for NMO scheduled backups set
# the following to an appropriate path and file name.
# Set this variable for debugging purposes only
#
# Samples:
# NSR_SB_DEBUG_FILE=/nsr/applogs/nsrnmostart.log
#
NSR_SB_DEBUG_FILE=
#
# Optional Variable: PRECMD
#
# Default value: NONE
#
# Description: This variable can be used to run a command or command script
# before nsrnmostart. It will be launched once for every saveset
# entered in the client setup.
#
PRECMD=
#
# Optional Variable: POSTCMD
#
# Default value: NONE
#
# Description: This variable can be used to run a command or command script
# after nsrnmostart has completed. It will be launched once for
# every saveset entered in the client setup.
#
POSTCMD=
#
# Optional Variable: SHLIB_PATH,LD_LIBRARY_PATH
#
# Default value: NONE
#
# Description: These variables may have to be set on HP-UX 11.0 (64 bit) operating systems.
# We suggest leaving it unset unless you have a scheduled backup problem.
# If it is set you must also uncomment the export SHLIB_PATH and LD_LIBRARY_PATH
# in the function export_environment_variables below.
#
# Samples:
# SHLIB_PATH=/disk3/oracle/app/oracle/product/8.1.6/lib
# LD_LIBRARY_PATH=/disk3/oracle/app/oracle/product/8.1.6/lib64
#
#
# Optional Variable: TNS_ADMIN
#
# Default value: NONE
#
# Description: This variable needs to be set if Oracle Net configuration
# files are not located in default locations.If it is set you must also uncomment
# the export TNS_ADMIN in the function export_environment_variables below.
#
# Samples:
# TNS_ADMIN=/disk3/oracle/app/oracle/product/8.1.6/network/admin1
#
export_environment_variables()
{
export ORACLE_HOME
export ORACLE_SID
export NSR_RMAN_ARGUMENTS
export NSR_RMAN_OUTPUT
export PRECMD
export POSTCMD
export PATH
export NSR_SB_DEBUG_FILE
#export SHLIB_PATH
#export LD_LIBRARY_PATH
#export TNS_ADMIN
}
###########################################################################
# Do not edit anything below this line.
###########################################################################
Pid=0 # process to kill if we are cancelled
nsrnmostart_status=0 # did it work?
#
# Handle cancel signals sent by savegrp when user stops the group.
#
handle_signal()
{
if [ $Pid != 0 ]; then
kill -2 $Pid
fi
exit 1
}
#
# The main portion of this shell.
#
#
# Make sure we respond to savegrp cancellations.
#
trap handle_signal 2 15
#
# Build the nsrnmostart command
#
opts=""
while [ $# -gt 0 ]; do
case "$1" in
-s ) # server name
opts="$opts $1 '$2'"
shift 2
;;
-N ) # save set name
opts="$opts $1 '$2'"
shift 2
;;
-e ) # expiration time
opts="$opts $1 '$2'"
shift 2
;;
-b ) # Specify pool
opts="$opts $1 '$2'"
shift 2
;;
-c ) # Specify the client name
opts="$opts $1 '$2'"
shift 2
;;
-g ) # Specify group
opts="$opts $1 '$2'"
shift 2
;;
-m ) # Specify masquerade
opts="$opts $1 '$2'"
shift 2
;;
-A ) # Specify PowerSnap options
opts="$opts $1 '$2'"
shift 2
;;
*) # rest of options
opts="$opts $1"
shift
;;
esac
done
if [ "${BACKUP_OPT}" != "" ];
then
BACKUP_COMMAND_LINE="nsrnmostart ""$BACKUP_OPT"" $opts"
else
BACKUP_COMMAND_LINE="nsrnmostart $opts"
fi
#
# Export all necessary environment variables
#
export_environment_variables
#
# Call nsrnmostart to do the backups.
#
#print $BACKUP_COMMAND_LINE
eval ${BACKUP_COMMAND_LINE} &
Pid=$!
wait $Pid
nsrnmostart_status=$?
if [ $nsrnmostart_status != 0 ] ; then
echo "nsrnmostart returned status of "$nsrnmostart_status
echo $0 "exiting."
exit 1
fi
exit 0
The environment variables in it check out: ORACLE_SID, ORACLE_HOME and PATH are all set correctly.
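The script is mostly comments, so one quick way to see only the values that are actually in effect is to filter for uncommented assignments. This is a sketch under the assumption that each assignment starts at the beginning of a line, as it does in the listing above; show_nmo_env is a hypothetical helper name.

```shell
#!/bin/sh
# Print the active (uncommented) assignments of the three variables checked here.
# show_nmo_env is a hypothetical helper name.
show_nmo_env() {
    # The ^ anchor skips the commented "Samples" lines, which start with "#".
    grep -E '^(ORACLE_HOME|ORACLE_SID|PATH)=' "$1"
}

# Example: show_nmo_env /opt/networker/bin/nsrnmo1
```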
As root, I set the environment variables by hand and connected to the target database manually:
xj_db01:[/opt/networker/bin]#export ORACLE_HOME=/oracle/app/oracle/product/9.2.0
xj_db01:[/opt/networker/bin]#export PATH=$ORACLE_HOME/bin
xj_db01:[/opt/networker/bin]#rman
Recovery Manager: Release 9.2.0.6.0 - 64bit Production
Copyright (c) 1995, 2002, Oracle Corporation. All rights reserved.
RMAN> connect target sys/pwd111
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
ORA-01031: insufficient privileges
RMAN> exit
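So root really cannot log in.
Check the database's login policy settings:
Switch to the oracle user, log in with sqlplus, and list the remote_* parameters (presumably with show parameter remote; the command itself was not captured):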
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
remote_archive_enable string true
remote_dependencies_mode string TIMESTAMP
remote_listener string
remote_login_passwordfile string EXCLUSIVE
remote_os_authent boolean FALSE
remote_os_roles boolean FALSE
SQL>
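These settings mean that, apart from OS users in the dba group, everyone else must be authenticated against the password file to log in.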
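Whether a given OS user falls under the dba exemption can be checked from the shell. A minimal sketch, assuming the OSDBA group is really named dba on this host; in_group is a hypothetical helper name.

```shell
#!/bin/sh
# Succeed (exit status 0) if user $1 belongs to group $2.
# in_group is a hypothetical helper; "dba" as the OSDBA group is an assumption
# that may not hold on other installations.
in_group() {
    id -Gn "$1" 2>/dev/null | tr ' ' '\n' | grep -qx "$2"
}

# Example:
# in_group root dba || echo "root must authenticate via the password file"
```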
Next, look at the modification time of the password file:
oracle@xj_db01:/oracle/app/oracle/product/9.2.0/dbs > ll
total 27548
-rw-r--r-- 1 oracle dba 8385 Mar 9 2002 init.ora
-rw-r--r-- 1 oracle dba 12920 Mar 9 2002 initdw.ora
-rw-r--rw- 1 oracle dba 1041 Jun 19 2005 initxjmisc.bak
-rw-rw-rw- 1 oracle dba 70 Apr 28 2008 initxjmisc.ora
-rw-rw-rw- 1 oracle dba 36 Dec 26 2005 initxjmisc.ora.20051226
-rw-rw---- 1 oracle dba 24 Dec 3 05:07 lkXJMISC
-rwSr----- 1 oracle dba 3072 Jan 5 16:34 orapwxjmisc
-rw-rw---- 1 oracle dba 14065664 Jan 6 15:49 snapcf_xjmisc.f
oracle@xj_db01:/oracle/app/oracle/product/9.2.0/dbs >
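The password file orapwxjmisc was modified very recently (Jan 5 16:34), so someone must have changed the sys password lately!
The on-site staff confirmed that on the afternoon of the 5th someone had indeed changed the sys password to pwd222. That settles the cause: the RMAN script still connects with sys/pwd111, which the password file no longer accepts when root runs it, while the oracle user, being in the dba group, is authenticated by the operating system and so never noticed the stale password.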
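The timestamp comparison can also be scripted, for example to flag a password file that is newer than the RMAN script holding the connect string. This is only a sketch: is_newer is a hypothetical helper name, and a newer modification time is a hint that the password may have changed, not proof.

```shell
#!/bin/sh
# Succeed if file $1 was modified more recently than file $2.
# is_newer is a hypothetical helper name.
is_newer() {
    # find prints the path only when it is newer than the reference file.
    [ -n "$(find "$1" -prune -newer "$2" 2>/dev/null)" ]
}

# Example, using the paths from this post:
# is_newer "$ORACLE_HOME/dbs/orapwxjmisc" "$ORACLE_HOME/bin/OracleArch" &&
#     echo "password file changed after the RMAN script was last edited"
```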
Posted: 2009-01-06 @ 19:07:43 · Categories: experience, Working case



