修改hosts后,rac通讯失败
今天收到个告警某省的一个数据库的一个节点down了,重启后,只能到started状态,数据库无法open,登录上去后,看到alertlog中:
Mon Jun 15 15:38:28 2009
Errors in file /oracle/app/oracle/admin/fjmisc/udump/fjmisc2_ora_26950.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:gethostbyname failed with status: 3
ORA-27301: OS failure message: No such process
ORA-27302: failure occurred at: sskgxpmyip2
ORA-27303: additional information: nodename FJ_DB02
Mon Jun 15 15:41:19 2009
Errors in file /oracle/app/oracle/admin/fjmisc/udump/fjmisc2_ora_27062.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:gethostbyname failed with status: 3
ORA-27301: OS failure message: No such process
ORA-27302: failure occurred at: sskgxpmyip2
ORA-27303: additional information: nodename FJ_DB02
Mon Jun 15 15:42:37 2009
Errors in file /oracle/app/oracle/admin/fjmisc/udump/fjmisc2_ora_27147.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:gethostbyname failed with status: 3
ORA-27301: OS failure message: No such process
ORA-27302: failure occurred at: sskgxpmyip2
ORA-27303: additional information: nodename FJ_DB02
Mon Jun 15 15:46:19 2009
Errors in file /oracle/app/oracle/admin/fjmisc/udump/fjmisc2_ora_27362.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
Errors in file /oracle/app/oracle/admin/fjmisc/udump/fjmisc2_ora_26950.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:gethostbyname failed with status: 3
ORA-27301: OS failure message: No such process
ORA-27302: failure occurred at: sskgxpmyip2
ORA-27303: additional information: nodename FJ_DB02
Mon Jun 15 15:41:19 2009
Errors in file /oracle/app/oracle/admin/fjmisc/udump/fjmisc2_ora_27062.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:gethostbyname failed with status: 3
ORA-27301: OS failure message: No such process
ORA-27302: failure occurred at: sskgxpmyip2
ORA-27303: additional information: nodename FJ_DB02
Mon Jun 15 15:42:37 2009
Errors in file /oracle/app/oracle/admin/fjmisc/udump/fjmisc2_ora_27147.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:gethostbyname failed with status: 3
ORA-27301: OS failure message: No such process
ORA-27302: failure occurred at: sskgxpmyip2
ORA-27303: additional information: nodename FJ_DB02
Mon Jun 15 15:46:19 2009
Errors in file /oracle/app/oracle/admin/fjmisc/udump/fjmisc2_ora_27362.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
当时第一个反映是gethostbyname failed 应该是主机名解析有问题了。但是ping的时候发现还是能ping通:
oracle@FJ_DB02:/oracle/app/oracle/admin/fjmisc/bdump > ping FJ_DB01
PING FJ_DB01: 64 byte packets
64 bytes from 10.203.17.1: icmp_seq=0. time=0. ms
64 bytes from 10.203.17.1: icmp_seq=1. time=0. ms
64 bytes from 10.203.17.1: icmp_seq=2. time=1. ms
64 bytes from 10.203.17.1: icmp_seq=3. time=3. ms
64 bytes from 10.203.17.1: icmp_seq=4. time=0. ms
----FJ_DB01 PING Statistics----
5 packets transmitted, 5 packets received, 0% packet loss
round-trip (ms) min/avg/max = 0/1/3
oracle@FJ_DB02:/oracle/app/oracle/admin/fjmisc/bdump >
PING FJ_DB01: 64 byte packets
64 bytes from 10.203.17.1: icmp_seq=0. time=0. ms
64 bytes from 10.203.17.1: icmp_seq=1. time=0. ms
64 bytes from 10.203.17.1: icmp_seq=2. time=1. ms
64 bytes from 10.203.17.1: icmp_seq=3. time=3. ms
64 bytes from 10.203.17.1: icmp_seq=4. time=0. ms
----FJ_DB01 PING Statistics----
5 packets transmitted, 5 packets received, 0% packet loss
round-trip (ms) min/avg/max = 0/1/3
oracle@FJ_DB02:/oracle/app/oracle/admin/fjmisc/bdump >
奇怪了,但是能ping通为啥报这个错?
询问了当地的同事,说修改了hosts文件,但是现在已经改回去了。进一步检查hosts文件,发现内容虽然已经改回去,但是权限没改回去:
drwxr-xr-x 2 bin bin 96 Dec 26 2006 gss
-rw-r--r-- 1 bin bin 489 Dec 27 2006 hba.conf
-rw------- 1 bin bin 847 Jun 15 15:08 hosts
-r--r--r-- 1 root sys 847 Jun 15 14:57 hosts.bak
-r--r--r-- 1 root sys 626 Dec 28 2006 hosts.old
-rw-r--r-- 1 bin bin 489 Dec 27 2006 hba.conf
-rw------- 1 bin bin 847 Jun 15 15:08 hosts
-r--r--r-- 1 root sys 847 Jun 15 14:57 hosts.bak
-r--r--r-- 1 root sys 626 Dec 28 2006 hosts.old
oracle用户cat hosts文件是cat不出来的。
于是改回这个文件的权限为444,重启数据库(注意先用abort shutdown数据库),问题解决:
Mon Jun 15 15:52:09 2009
Shutting down instance (abort)
License high water mark = 2
Instance terminated by USER, pid = 28002
Mon Jun 15 15:52:34 2009
Starting ORACLE instance (normal)
Mon Jun 15 15:52:34 2009
Global Enqueue Service Resources = 36504, pool = 8
Mon Jun 15 15:52:34 2009
Global Enqueue Service Enqueues = 54768
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Mon Jun 15 15:52:50 2009
SCN scheme 1
Mon Jun 15 15:53:16 2009
Using log_archive_dest parameter default value
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up ORACLE RDBMS Version: 9.2.0.6.0.
System parameters with non-default values:
processes = 1500
timed_statistics = TRUE
shared_pool_size = 536870912
large_pool_size = 218103808
java_pool_size = 218103808
spfile = /dev/vg_rac3/rG3_spfile_128m_01
control_files = /dev/vg_rac1/rG1_ctrl_256m_01, /dev/vg_rac2/rG2_ctrl_256m_02, /dev/vg_rac3/rG3_ctrl_256m_03
db_block_size = 8192
db_cache_size = 8589934592
compatible = 9.2.0.0.0
log_archive_start = TRUE
log_archive_dest_1 = location=/archlog2
log_archive_format = fjmisc2_%t_%s.arc
log_buffer = 4194304
db_files = 2000
db_file_multiblock_read_count= 16
cluster_database = TRUE
cluster_database_instances= 2
thread = 2
fast_start_mttr_target = 800
instance_number = 2
undo_management = AUTO
undo_tablespace = UNDOTBS2
undo_suppress_errors = TRUE
undo_retention = 21600
remote_login_passwordfile= EXCLUSIVE
db_domain =
instance_name = fjmisc2
local_listener = fjmisc2
remote_listener = fjmisc
job_queue_processes = 10
hash_join_enabled = TRUE
background_dump_dest = /oracle/app/oracle/admin/fjmisc/bdump
user_dump_dest = /oracle/app/oracle/admin/fjmisc/udump
max_dump_file_size = 5120
core_dump_dest = /oracle/app/oracle/admin/fjmisc/cdump
sort_area_size = 524288
db_name = fjmisc
open_cursors = 400
star_transformation_enabled= FALSE
query_rewrite_enabled = TRUE
pga_aggregate_target = 838860800
Mon Jun 15 15:53:16 2009
cluster interconnect IPC version:Oracle UDP/IP
IPC Vendor 1 proto 2 Version 1.0
PMON started with pid=2
DIAG started with pid=3
LMON started with pid=4
LMD0 started with pid=5
LMS0 started with pid=6
LMS1 started with pid=7
DBW0 started with pid=8
LGWR started with pid=9
CKPT started with pid=10
SMON started with pid=11
RECO started with pid=12
CJQ0 started with pid=13
Mon Jun 15 15:53:20 2009
ARCH: STARTING ARCH PROCESSES
ARC0 started with pid=14
ARC0: Archival started
ARC1 started with pid=15
Mon Jun 15 15:53:20 2009
ARCH: STARTING ARCH PROCESSES COMPLETE
Mon Jun 15 15:53:20 2009
ARC1: Archival started
Mon Jun 15 15:53:20 2009
ARC1: Thread not mounted
Mon Jun 15 15:53:20 2009
ARC0: Thread not mounted
Mon Jun 15 15:53:20 2009
ALTER DATABASE MOUNT
Mon Jun 15 15:53:21 2009
lmon registered with NM - instance id 2 (internal mem no 1)
Mon Jun 15 15:53:21 2009
Reconfiguration started (old inc 0, new inc 4)
List of nodes:
0 1
Global Resource Directory frozen
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Resources and enqueues cleaned out
Resources remastered 0
0 GCS shadows traversed, 0 cancelled, 0 closed
0 GCS resources traversed, 0 cancelled
set master node info
Submitted all remote-enqueue requests
Update rdomain variables
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
0 GCS shadows traversed, 0 replayed, 0 unopened
Submitted all GCS remote-cache requests
0 write requests issued in 480118 GCS resources
116 PIs marked suspect, 0 flush PI msgs
Mon Jun 15 15:53:29 2009
Reconfiguration complete
LCK0 started with pid=17
Mon Jun 15 15:53:35 2009
Successful mount of redo thread 2, with mount id 248173721
Mon Jun 15 15:53:35 2009
Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE).
Completed: ALTER DATABASE MOUNT
Mon Jun 15 15:53:35 2009
ALTER DATABASE OPEN
Picked Lamport scheme to generate SCNs
Mon Jun 15 15:53:39 2009
LGWR: Primary database is in CLUSTER CONSISTENT mode
Thread 2 advanced to log sequence 18067
Thread 2 opened at log sequence 18067
Current log# 8 seq# 18067 mem# 0: /dev/vg_rac1/rG1_redo_512m_204
Current log# 8 seq# 18067 mem# 1: /dev/vg_rac2/rG2_redo_512m_214
Current log# 8 seq# 18067 mem# 2: /dev/vg_rac3/rG3_redo_512m_224
Successful open of redo thread 2
Mon Jun 15 15:53:39 2009
SMON: enabling cache recovery
Mon Jun 15 15:53:39 2009
ARC0: Evaluating archive log 6 thread 2 sequence 18066
ARC0: Beginning to archive log 6 thread 2 sequence 18066
Creating archive destination LOG_ARCHIVE_DEST_1: '/archlog2/fjmisc2_2_18066.arc'
Mon Jun 15 15:53:40 2009
Successfully onlined Undo Tablespace 3.
Mon Jun 15 15:53:40 2009
SMON: enabling tx recovery
Mon Jun 15 15:53:40 2009
Database Characterset is ZHS16GBK
replication_dependency_tracking turned off (no async multimaster replication found)
Completed: ALTER DATABASE OPEN
Shutting down instance (abort)
License high water mark = 2
Instance terminated by USER, pid = 28002
Mon Jun 15 15:52:34 2009
Starting ORACLE instance (normal)
Mon Jun 15 15:52:34 2009
Global Enqueue Service Resources = 36504, pool = 8
Mon Jun 15 15:52:34 2009
Global Enqueue Service Enqueues = 54768
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Mon Jun 15 15:52:50 2009
SCN scheme 1
Mon Jun 15 15:53:16 2009
Using log_archive_dest parameter default value
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up ORACLE RDBMS Version: 9.2.0.6.0.
System parameters with non-default values:
processes = 1500
timed_statistics = TRUE
shared_pool_size = 536870912
large_pool_size = 218103808
java_pool_size = 218103808
spfile = /dev/vg_rac3/rG3_spfile_128m_01
control_files = /dev/vg_rac1/rG1_ctrl_256m_01, /dev/vg_rac2/rG2_ctrl_256m_02, /dev/vg_rac3/rG3_ctrl_256m_03
db_block_size = 8192
db_cache_size = 8589934592
compatible = 9.2.0.0.0
log_archive_start = TRUE
log_archive_dest_1 = location=/archlog2
log_archive_format = fjmisc2_%t_%s.arc
log_buffer = 4194304
db_files = 2000
db_file_multiblock_read_count= 16
cluster_database = TRUE
cluster_database_instances= 2
thread = 2
fast_start_mttr_target = 800
instance_number = 2
undo_management = AUTO
undo_tablespace = UNDOTBS2
undo_suppress_errors = TRUE
undo_retention = 21600
remote_login_passwordfile= EXCLUSIVE
db_domain =
instance_name = fjmisc2
local_listener = fjmisc2
remote_listener = fjmisc
job_queue_processes = 10
hash_join_enabled = TRUE
background_dump_dest = /oracle/app/oracle/admin/fjmisc/bdump
user_dump_dest = /oracle/app/oracle/admin/fjmisc/udump
max_dump_file_size = 5120
core_dump_dest = /oracle/app/oracle/admin/fjmisc/cdump
sort_area_size = 524288
db_name = fjmisc
open_cursors = 400
star_transformation_enabled= FALSE
query_rewrite_enabled = TRUE
pga_aggregate_target = 838860800
Mon Jun 15 15:53:16 2009
cluster interconnect IPC version:Oracle UDP/IP
IPC Vendor 1 proto 2 Version 1.0
PMON started with pid=2
DIAG started with pid=3
LMON started with pid=4
LMD0 started with pid=5
LMS0 started with pid=6
LMS1 started with pid=7
DBW0 started with pid=8
LGWR started with pid=9
CKPT started with pid=10
SMON started with pid=11
RECO started with pid=12
CJQ0 started with pid=13
Mon Jun 15 15:53:20 2009
ARCH: STARTING ARCH PROCESSES
ARC0 started with pid=14
ARC0: Archival started
ARC1 started with pid=15
Mon Jun 15 15:53:20 2009
ARCH: STARTING ARCH PROCESSES COMPLETE
Mon Jun 15 15:53:20 2009
ARC1: Archival started
Mon Jun 15 15:53:20 2009
ARC1: Thread not mounted
Mon Jun 15 15:53:20 2009
ARC0: Thread not mounted
Mon Jun 15 15:53:20 2009
ALTER DATABASE MOUNT
Mon Jun 15 15:53:21 2009
lmon registered with NM - instance id 2 (internal mem no 1)
Mon Jun 15 15:53:21 2009
Reconfiguration started (old inc 0, new inc 4)
List of nodes:
0 1
Global Resource Directory frozen
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Resources and enqueues cleaned out
Resources remastered 0
0 GCS shadows traversed, 0 cancelled, 0 closed
0 GCS resources traversed, 0 cancelled
set master node info
Submitted all remote-enqueue requests
Update rdomain variables
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
0 GCS shadows traversed, 0 replayed, 0 unopened
Submitted all GCS remote-cache requests
0 write requests issued in 480118 GCS resources
116 PIs marked suspect, 0 flush PI msgs
Mon Jun 15 15:53:29 2009
Reconfiguration complete
LCK0 started with pid=17
Mon Jun 15 15:53:35 2009
Successful mount of redo thread 2, with mount id 248173721
Mon Jun 15 15:53:35 2009
Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE).
Completed: ALTER DATABASE MOUNT
Mon Jun 15 15:53:35 2009
ALTER DATABASE OPEN
Picked Lamport scheme to generate SCNs
Mon Jun 15 15:53:39 2009
LGWR: Primary database is in CLUSTER CONSISTENT mode
Thread 2 advanced to log sequence 18067
Thread 2 opened at log sequence 18067
Current log# 8 seq# 18067 mem# 0: /dev/vg_rac1/rG1_redo_512m_204
Current log# 8 seq# 18067 mem# 1: /dev/vg_rac2/rG2_redo_512m_214
Current log# 8 seq# 18067 mem# 2: /dev/vg_rac3/rG3_redo_512m_224
Successful open of redo thread 2
Mon Jun 15 15:53:39 2009
SMON: enabling cache recovery
Mon Jun 15 15:53:39 2009
ARC0: Evaluating archive log 6 thread 2 sequence 18066
ARC0: Beginning to archive log 6 thread 2 sequence 18066
Creating archive destination LOG_ARCHIVE_DEST_1: '/archlog2/fjmisc2_2_18066.arc'
Mon Jun 15 15:53:40 2009
Successfully onlined Undo Tablespace 3.
Mon Jun 15 15:53:40 2009
SMON: enabling tx recovery
Mon Jun 15 15:53:40 2009
Database Characterset is ZHS16GBK
replication_dependency_tracking turned off (no async multimaster replication found)
Completed: ALTER DATABASE OPEN
在这里,我们注意一下,在hpux的环境下,哪怕某用户没有cat /etc/hosts的权限,他也是能ping通主机名的,这个和linux还是有点区别的。但是rac是需要读取hosts文件的,因此oracle用户必须有读hosts的权限。
· 【文章发布信息】发表于: 2009-06-15 @ 16:35:47 · ||分类: ..experience, Working case



CopyRight ©