操作系统内核参数限制导致ora-4030


最近同事遇到一个问题,某个报表程序报错ora-4030的错误:

错误信息:[Error] **** Dbupdate.sqlUpdateSubs Function:begin ? := trt_subs_1_i_update_target (?, ?, ?, ?, ?); end;, Execute GD:1li run finished with error, errcode:-4030,errmsg:ORA-04030: out of process memory when trying to allocate 16408 bytes (koh-kghu call ,pmucalm coll),errlog:GD_1li TableA_S.(SCS or ADS) Update TableP_S, circle= 2.

通过trace,发现是以下语句的问题:

  1. declare
  2.   type subs_mobile_no_arr is table of trt_subs_GD_1li_wap_S.subs_mobile_no%type index by pls_integer;
  3.   type icp_id_arr is table of trt_subs_GD_1li_wap_S.icp_id%type index by pls_integer;
  4.   type icp_code_arr is table of trt_subs_GD_1li_wap_S.icp_code%type index by pls_integer;
  5.   type channel_id_arr is table of trt_subs_GD_1li_wap_S.channel_id%type index by pls_integer;
  6.   type serv_id_arr is table of trt_subs_GD_1li_wap_S.serv_id%type index by pls_integer;
  7.   type brand_id_arr is table of trt_subs_GD_1li_wap_S.brand_id%type index by pls_integer;
  8.   type fee_mobile_no_arr is table of trt_subs_GD_1li_wap_S.fee_mobile_no%type index by pls_integer;
  9.   type action_id_arr is table of trt_subs_GD_1li_wap_S.action_id%type index by pls_integer;
  10.   type crt_his_date_arr is table of trt_subs_GD_1li_wap_S.crt_his_date%type index by pls_integer;
  11.   type city_id_arr is table of trt_subs_GD_1li_wap_S.city_id%type index by pls_integer;
  12.   type attr_arr is table of trt_subs_GD_1li_wap_S.attr%type index by pls_integer;
  13.   type srh_channel_id_arr is table of trt_subs_GD_1li_wap_S.srh_channel_id%type index by pls_integer;
  14.   arr_subs_mobile_no subs_mobile_no_arr;
  15.   arr_icp_id         icp_id_arr;
  16.   arr_icp_code       icp_code_arr;
  17.   arr_channel_id     channel_id_arr;
  18.   arr_serv_id        serv_id_arr;
  19.   arr_brand_id       brand_id_arr;
  20.   arr_fee_mobile_no  fee_mobile_no_arr;
  21.   arr_action_id      action_id_arr;
  22.   arr_crt_his_date   crt_his_date_arr;
  23.   arr_city_id        city_id_arr;
  24.   arr_attr           attr_arr;
  25.   arr_srh_channel_id srh_channel_id_arr;
  26. begin
  27.   select subs_mobile_no,
  28.          icp_id,
  29.          icp_code,
  30.          channel_id,
  31.          serv_id,
  32.          brand_id,
  33.          fee_mobile_no,
  34.          decode(action_id, 'SCS', 'C', 'ADS', 'D') action_id,
  35.          crt_his_date,
  36.          city_id,
  37.          attr,
  38.          srh_channel_id bulk collect
  39.     into arr_subs_mobile_no,
  40.          arr_icp_id,
  41.          arr_icp_code,
  42.          arr_channel_id,
  43.          arr_serv_id,
  44.          arr_brand_id,
  45.          arr_fee_mobile_no,
  46.          arr_action_id,
  47.          arr_crt_his_date,
  48.          arr_city_id,
  49.          arr_attr,
  50.          arr_srh_channel_id
  51.     from trt_subs_GD_1li_wap_S
  52.    where action_id = 'SCS'
  53.       or action_id = 'ADS';
  54.   if arr_subs_mobile_no.last > 0 then
  55.     arr_srh_channel_id srh_channel_id_arr;
  56.     begin
  57.       select subs_mobile_no,
  58.              icp_id,
  59.              icp_code,
  60.              channel_id,
  61.              serv_id,
  62.              brand_id,
  63.              fee_mobile_no,
  64.              decode(action_id, 'SCS', 'C', 'ADS', 'D') action_id,
  65.              crt_his_date,
  66.              city_id,
  67.              attr,
  68.              srh_channel_id bulk collect
  69.         into arr_subs_mobile_no,
  70.              arr_icp_id,
  71.              arr_icp_code,
  72.              arr_channel_id,
  73.              arr_serv_id,
  74.              arr_brand_id,
  75.              arr_fee_mobile_no,
  76.              arr_action_id,
  77.              arr_crt_his_date,
  78.              arr_city_id,
  79.              arr_attr,
  80.              arr_srh_channel_id
  81.         from trt_subs_GD_1li_wap_S
  82.        where action_id = 'SCS'
  83.           or action_id = 'ADS';
  84.       if arr_subs_mobile_no.last > 0 then
  85.         forall i in arr_subs_mobile_no.first .. arr_subs_mobile_no.last
  86.           update f_wap_subs_GD_p_s c
  87.              set c.city_id                = arr_city_id(i),
  88.                  c.icp_id                 = arr_icp_id(i),
  89.                  c.icp_code               = arr_icp_code(i),
  90.                  c.subs_cancel_channel_id = arr_channel_id(i),
  91.                  c.brand_id               = arr_brand_id(i),
  92.                  c.subs_status_id         = arr_action_id(i),
  93.                  c.fee_mobile_no          = arr_fee_mobile_no(i),
  94.                  c.usubs_date             = arr_crt_his_date(i),
  95.                  c.usubs_day              = to_char(arr_crt_his_date(i),
  96.                                                     'YYYYMMDD'),
  97.                  c.idx_day                = to_char(arr_crt_his_date(i),
  98.                                                     'YYYYMMDD'),
  99.                  c.is_subs_valid          = 0,
  100.                  c.flow_time              = sysdate,
  101.                  c.srh_channel_id         = arr_srh_channel_id(i)
  102.            where subs_mobile_no = arr_subs_mobile_no(i)
  103.              and serv_id = arr_serv_id(i)
  104.              and is_subs_valid = 1;
  105.       end if;
  106.     end;

这个sql在执行的时候,pga的内存被迅速的消耗,通过v$sesstat看到这个session每次使用pga快到2G时,就报错了。而且每次都是这样。所以一开始,我们怀疑是不是pga不够,增加pga大小为3G继续观察程序的运行情况,结果发现还是一样的报错。

在metalink上发现类似的有个bug:Bug 4994859: ORA-04030 WHILE EXECUTING A PROCEDURE CONTAINING BULK COLLECT。于是打算分2步走:
1、向开发建议修改程序,不使用BULK COLLECT(上面sql的85行处)
2、由于bug是在Sun Solaris SPARC (64-bit)上的9207版本,我们的是HP UX的9208版本,可能不是这个bug。所以向oracle开SR进行确认。

oracle回复:

  1. Basically, an ORA-4030 indicates a limit has been reached with respect to process private memory allocation. An ORA-4030 error is an error in the PGA; Memory limitation related to an Oracle rdbms server process is reached.
  2.  
  3. Typical causes:
  4.  
  5. * OS Memory limit reached such as physical memory and/or swap/virtual paging
  6. * OS limits reached such as kernel or user shell limits that limit overall, user level or process level memory usage
  7. * OS limit on PGA memory size due to SGA attach address <<Note 262540.1>> Relocate SGABEG on 64-bit Oracle
  8. * Oracle internal limit example Bug 3130972
  9. * Application design causing limits to be reached
  10. * Bug – space leaks, heap leaks

于是用RDA收集主机信息,发现有一个参数设置:

Comments
--------
Ulimit
=====
time(seconds) unlimited
file(blocks) unlimited
data(kbytes) 2097152
<----
stack(kbytes) 131072
memory(kbytes) unlimited
coredump(blocks) 4194303
nofiles(descriptors) 4096

从这里我们看到ulimit -d参数被限制了,大小为2G,我们通过修改2个内核参数从而达到修改这个参数的目的:
1、将maxdsiz从2147483648 (2G) 调整到4187593114(4G)
2、将maxdsiz_64bit从2147483648 (2G)调整6442450944 (6G).
ulimit -d自动的变成:

data(kbytes) 4063168

再次运行原来的应用程序,发现不再报错。

· 【文章发布信息】发表于: 2010-02-28 @ 12:13:34 · ||分类: ..experience, Working case

1 条评论 »

  1. 匿名 于 2010-03-05 @ 11:22:44 留言

    最好将程序中的 “批量绑定” 修改为游标方式。
    曾经遇到类似问题, 将PGA设置非常大, ulimits改为无限大,都不能解决问题, 最后让项目组把代码改成游标,问题终于解决。

RSS 为此帖反馈评论 · 反向跟踪 网站

留条评论