본문 바로가기

Oracle/Oracle testing

백업꼭 합시다. 11g DB날리다.

시골에 가있는 동안 . 새벽에 11g에서 테스트를 하나 했다.
대단한 테스트는 아니고.

11g에서 트랜잭션을 일으켜서
주요 PROCESS 에서 발생하는 이벤트 구경하기를 했다.

그런데 ㅡ_ㅡ;;; 이짓을 하다가 db를 날렸다.
백업도 안했는데 DB를 날려서 다시 create db작업을 한 과정을 작성한다.

사실 아래 테스트는 내가 회사다니면서 
직접 일을 저지른 일이다. ㅡ_ㅡ;;;;

O/S : Linux 64Bit Centos5
Oracle Version : 11g R2

총 3개의 세션에서 일을 시키고 그 과정을 trace를 했다.
세션 1.  -- select 문 order 많이 한
begin
        for i in 1..10000000 loop
        execute immediate 'select *from sh.sales order by 1,2,3,4,5 desc';
        end loop;
end;
/

세션 2. -- update and rollback
begin
        for i in 1..1000000 loop
        execute immediate 'update sh.sales set amount_sold = '||i;
        rollback;
        end loop;
end;
/

세션 3. -- memory 초기화
begin
        for i in 1..100000 loop
        execute immediate 'alter system flush buffer_cache';
        execute immediate 'alter system flush shared_pool';
        end loop;
end;
/


3개의 세션에서 작업을 진행하면서
CKPT, LGWR , DBWR, SOMN, ARC 에서 이벤트를 구경했다.

많이 보여진 이벤트순으로 나열한것이다.

::: LGWR :::
1. log file parallel write
2. control file sequential read
3. log file single write
4. LGWR wait for redo copy
5. enq: CF - contention

::: CKPT :::
1. control file sequential read
2. control file parallel write
3. db file sequential read
4. db file single write
5. latch: cache buffers chains
6. latch: cache buffers lru chain
7. latch free

::: DBWR :::
1. control file sequential read
2. db file async I/O submit
3. Disk file operations I/O
4. latch free
5. latch: cache buffers lru chain
6. latch: object queue header operation
latch: cache buffers chains

::: SMON :::
1287049654689186  SYS_FBA_FA
1287049654689992  insert smon_scn_time
1287049654690709  cdef$
1287049654691248  hist_head$
1287049654693582  cdef$
histgrm$
lock table sys.mon_mods$ in exclusive mode nowait
update sys.mon_mods$
hist_head$
lock table sys.mon_mods$ in exclusive mode nowait
update sys.mon_mods$
lock table sys.mon_mods$ in exclusive mode nowait
update sys.mon_mods$
lock table sys.col_usage$ in exclusive mode nowait
update sys.col_usage$
insert into sys.col_usage$
obj$

::: ARC :::
1. latch free
2. Disk file operations I/O
3. control file sequential read
4. ARCH wait for archivelog lock
5. log file sequential read
6. control file parallel write
7. Log archive I/O

ORA-19815: WARNING: db_recovery_file_dest_size of 4070572032 bytes is 100.00% used, and has 0 remaining bytes available.
************************************************************************
You have following choices to free up space from recovery area:
1. Consider changing RMAN RETENTION POLICY. If you are using Data Guard,
   then consider changing RMAN ARCHIVELOG DELETION POLICY.

2. Back up files to tertiary device such as tape using RMAN
   BACKUP RECOVERY AREA command.

3. Add disk space and increase db_recovery_file_dest_size parameter to
   reflect the new space.

4. Delete unnecessary files using RMAN DELETE command. If an operating
   system command was used to delete files, then use RMAN CROSSCHECK and
   DELETE EXPIRED commands.
************************************************************************
ORA-19809: limit exceeded for recovery files
ORA-19804: cannot reclaim 47035392 bytes disk space from 4070572032 limit
*** 2010-10-14 19:30:08.446 4132 krsh.c
ARC0: Error 19809 Creating archive log file to '/u02/flash_recovery_area/ORCL/archivelog/2010_10_14/o1_mf_1_89_%u_.arc'
*** 2010-10-14 19:30:08.446 2747 krsi.c
krsi_dst_fail: dest:1 err:19809 force:0 blast:1


DDE: Problem Key 'ORA 312' was flood controlled (0x1) (no incident)
ORA-00312: online log 2 thread 1: '/u02/oradata/orcl/redo02.log'
*** 2010-10-14 19:30:08.453 4529 kcrr.c

DDE rules only execution for: ORA 312
----- START Event Driven Actions Dump ----
---- END Event Driven Actions Dump ----
----- START DDE Actions Dump -----
Executing SYNC actions
----- START DDE Action: 'DB_STRUCTURE_INTEGRITY_CHECK' (Async) -----
Successfully dispatched
----- END DDE Action: 'DB_STRUCTURE_INTEGRITY_CHECK' (SUCCESS, 0 csec) -----
Executing ASYNC actions
----- END DDE Actions Dump (total 0 csec) -----
*** 2010-10-14 19:32:11.774 4529 kcrr.c

#######################################################################################
위와 같은 이벤트들이 발생하고 에러가 떨어지는모습을 봤다.
순식간에 많은 트랜잭션이 발생하여 대량의  로그가 발생하여 풀발생.

::: 상황발생 :::
1. 아카이브 작업 실패!

짧은시간에 큰 트랜잭션을 시도했더니. 아카이브를 쓰면서 너무 많이쓰다가 쓰기를 포기했나보다.
죽지도 않는다 ㅡ_ㅡ;;;

2. shutdown abort 시도 완료.

예상대로

3. SQL> startup
ORACLE instance started.

Total System Global Area  839282688 bytes
Fixed Size                  2217992 bytes
Variable Size             478152696 bytes
Database Buffers          352321536 bytes
Redo Buffers                6590464 bytes
Database mounted.
ORA-03113: end-of-file on communication channel
Process ID: 20784
Session ID: 125 Serial number: 5


SQL> select status from v$instance;
ERROR:
ORA-03114: not connected to ORACLE


SQL>

실패;;;;


4. recover database until cancel;
alter database open resetlogs;

alert 로그를 보니.

Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_21014.trc:
ORA-19809: limit exceeded for recovery files
ORA-19804: cannot reclaim 47035392 bytes disk space from 4070572032 limit
ARCH: Error 19809 Creating archive log file to '/u02/flash_recovery_area/ORCL/archivelog/20
10_10_14/o1_mf_1_90_%u_.arc'
RESETLOGS after complete recovery through change 1360520
Resetting resetlogs activation ID 1260103569 (0x4b1ba791)
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_21014.trc:
ORA-00367: checksum error in log file header
ORA-00322: log 1 of thread 1 is not current copy
ORA-00312: online log 1 thread 1: '/u02/oradata/orcl/redo01.log'
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_21014.trc:
ORA-00367: checksum error in log file header
ORA-00322: log 2 of thread 1 is not current copy
ORA-00312: online log 2 thread 1: '/u02/oradata/orcl/redo02.log'
Thu Oct 14 20:41:32 2010
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_m000_21104.trc:
ORA-00316: log 1 of thread 1, type 0 in header is not log file
ORA-00312: online log 1 thread 1: '/u02/oradata/orcl/redo01.log'
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_21014.trc:
ORA-00367: checksum error in log file header
ORA-00322: log 3 of thread 1 is not current copy
ORA-00312: online log 3 thread 1: '/u02/oradata/orcl/redo03.log'
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_m000_21104.trc:
ORA-00316: log 2 of thread 1, type 0 in header is not log file
ORA-00312: online log 2 thread 1: '/u02/oradata/orcl/redo02.log'
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_m000_21104.trc:
ORA-00316: log 3 of thread 1, type 0 in header is not log file
ORA-00312: online log 3 thread 1: '/u02/oradata/orcl/redo03.log'
Checker run found 7 new persistent data failures
Thu Oct 14 20:41:33 2010
Setting recovery target incarnation to 3
Thu Oct 14 20:41:33 2010
Assigning activation ID 1260161751 (0x4b1c8ad7)
LGWR: STARTING ARCH PROCESSES
Thu Oct 14 20:41:33 2010
ARC0 started with pid=20, OS id=21108
ARC0: Archival started


open 은 되었고
5. 다시 shutdown immediate 를 시도.

헛~


SQL*Plus: Release 11.2.0.1.0 Production on Thu Oct 14 20:48:25 2010

Copyright (c) 1982, 2009, Oracle.  All rights reserved.

Enter user-name: /as sysdba
Connected.
SQL>

alert 로그에....
ORA-19815: WARNING: db_recovery_file_dest_size of 4070572032 bytes is 100.00% used, and has 0 remaining bytes available.

결국 또 먹통
6. 그래서 arc 를 전부 지웠다.

그리고나서 다시 shutdown immediate 시도 헛.
안된다....

7. 먹통 os상에서 oracle process 전부 종료.
8. 다시 sqlplus들어가니.
??? 아무메세지가 안나온다.

SQL*Plus: Release 11.2.0.1.0 Production on Thu Oct 14 20:48:25 2010

Copyright (c) 1982, 2009, Oracle.  All rights reserved.

Enter user-name: /as sysdba
Connected.
SQL>

ㅡ_ㅡ;;;;;

어떻게 할까 최고의 명답안 리부팅을 할것인가 말것인가????
흠.... 왜... process에 아무것도 없는데 저렇게 될까나....


9. 리부팅 시작..


답이 안나온다..

백업안하고 이짓저짓하다가 결국 날렸다 ㅡ_ㅡ;;;;

※ 물론 딱 이상황의 문제는 DISK FULL이었다.
개별로 다시 위와같은 상황을 만들어서 에러가 발생한 상황을 만들어서 alert_log에 FULL 메세지가 뜨면
다른세션에서 ALTER SYSTEM SET db_recovery_file_dest_size='5G'; 를 실행하면 해결은 된다.
그러면 다시 아카이브를 쓰고 작업을 하게된다.
( 물론 사이즈를 늘리고 바로 작업이 되는게 아니라 좀 대기시간이 발생하고 나서
계속 archive 작업을 한다. )

아래는 alter 작업을 한 결과 alert_log에 나온 메세지이다.

db_recovery_file_dest_size of 5120 MB is 75.35% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Mon Oct 18 19:57:10 2010
ALTER SYSTEM SET db_recovery_file_dest_size='5G' SCOPE=BOTH;

SQL> alter database open resetlogs;
alter database open resetlogs
*

ERROR at line 1:
ORA-19809: limit exceeded for recovery files
ORA-19804: cannot reclaim 52428800 bytes disk space from 1073741824 limit

SQL> alter system set db_recovery_file_dest_size='3G';

System altered.

SQL> alter database open resetlogs;

Database altered.

SQL>




아무튼.
백업의 중요성과 ㅡ_ㅡ;;;
짧은시간의 트랜잭션이 많을경우는 충분한 db_recovery_file_dest_size , 그리고
큰 redo 사이즈.... 가 중요하다는것을 뼈저리게 느꼈다.