For my current project I had to choose a compression codec and a storage format for Sqoop, so I ran a quick test.


The test covers importing Oracle data into Hadoop via Sqoop along two axes:

snappy compression / no compression

text plain / Parquet format

Crossing the two axes gives four test cases in total.
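
For reference, a minimal sketch of how the four runs could be scripted back to back (this simply mirrors the commands used later in this post; -P prompts for the password instead of passing it in plaintext, as the Sqoop warning in the logs below suggests):

#!/bin/bash
# Sketch only: run the four Sqoop import cases in sequence.
CONN="jdbc:oracle:thin:@192.168.0.117:1521:ORCL"
TABLE="HDFS_TEST5"
SNAPPY="--compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec"

# text, uncompressed
sqoop import --target-dir=/dev/test2/data_nc_txt --table "$TABLE" -direct --connect "$CONN" --username flashone -P --split-by ACOL1
# text, snappy
sqoop import --target-dir=/dev/test2/data_sn_txt $SNAPPY --table "$TABLE" -direct --connect "$CONN" --username flashone -P --split-by ACOL1
# parquet, uncompressed
sqoop import --target-dir=/dev/test2/data_nc_pq --as-parquetfile --table "$TABLE" -direct --connect "$CONN" --username flashone -P --split-by ACOL1
# parquet, snappy
sqoop import --target-dir=/dev/test2/data_sn_pq $SNAPPY --as-parquetfile --table "$TABLE" -direct --connect "$CONN" --username flashone -P --split-by ACOL1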




Quick specs of the test machine


Host machine:

CPU: 2 × Xeon E5-2620 v3 with HT (24 logical cores total)

RAM: 256GB

HDD: PCI-E (NVMe) for the VM OS, SATA for the hadoop and oracle data




Guest OS specs


HADOOP ECOSYSTEM


VM node spec

cores: 16 (4 sockets × 4 cores)

RAM: 16GB

Layout: 1 name node, 4 data nodes, 1 ambari/util server



ORACLE


VM node spec

cores: 8 (4 sockets × 2 cores)

RAM: 8GB

1 single node: Oracle DB (Enterprise Single Node) 12c



Source data description

As the earlier failed test showed, filling the table with random text produced a 0% compression ratio; the evidence is that the compressed and uncompressed transfer totals came out identical.

That test result is included below.


Because of that, I decided to build more typical data, i.e. rows containing some sentences and some numbers, and generated the test data from the comments in the Oracle dictionary.


Table structure: a plain heap table, not a partitioned table.


CREATE TABLE HDFS_TEST5

(

TABLE_NAME VARCHAR2(128),

LVL NUMBER,

COMMENTS VARCHAR(4000),

REG_DT VARCHAR2(19),

SEQ NUMBER 

)


Looking at the bulk-loaded rows, they actually look fairly data-like. :)


ROW COUNT: 11,620,800 rows
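
For reference, a minimal sketch of how this kind of data can be generated from the dictionary comments (this is my assumption of the approach, not the exact script used; the ×100 multiplier is arbitrary):

sqlplus flashone@ORCL <<'SQL'
-- Hypothetical sketch: cross-join the dictionary comments against a counter
-- to multiply them into bulk test rows.
INSERT /*+ APPEND */ INTO HDFS_TEST5 (TABLE_NAME, LVL, COMMENTS, REG_DT, SEQ)
SELECT c.table_name,
       l.lvl,
       c.comments,
       TO_CHAR(SYSDATE, 'YYYY-MM-DD HH24:MI:SS'),
       ROWNUM
  FROM all_tab_comments c
       CROSS JOIN (SELECT LEVEL AS lvl FROM dual CONNECT BY LEVEL <= 100) l
 WHERE c.comments IS NOT NULL;
COMMIT;
SQL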




Test 1.

========================= The test showing a 0% compression ratio ============================


CASE 1. File format: text plain, uncompressed


Command

sqoop import --target-dir=/dev/test/data_nc_txt --table HDFS_3_SUB -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1


Output

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/test/data_nc_txt --table HDFS_3_SUB -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1

16/08/10 17:46:04 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 17:46:04 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 17:46:04 INFO manager.SqlManager: Using default fetchSize of 1000

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 17:46:06 INFO oracle.OraOopManagerFactory: 

**************************************************

*** Using Data Connector for Oracle and Hadoop ***

**************************************************

16/08/10 17:46:06 INFO oracle.OraOopManagerFactory: Oracle Database version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

16/08/10 17:46:06 INFO oracle.OraOopManagerFactory: This Oracle database is not a RAC.

16/08/10 17:46:06 INFO tool.CodeGenTool: Beginning code generation

16/08/10 17:46:06 INFO manager.SqlManager: Executing SQL statement: SELECT "ACOL1","ACOL2","ACOL3","ACOL4","ACOL5","ACOL6","ACOL7","ACOL8","ACOL9","ACOL10" FROM HDFS_3_SUB WHERE 0=1

16/08/10 17:46:06 INFO manager.SqlManager: Executing SQL statement: SELECT "ACOL1","ACOL2","ACOL3","ACOL4","ACOL5","ACOL6","ACOL7","ACOL8","ACOL9","ACOL10" FROM "HDFS_3_SUB" WHERE 1=0

16/08/10 17:46:06 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/7e534716bf2036f166e1b14257055d00/HDFS_3_SUB.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 17:46:09 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/7e534716bf2036f166e1b14257055d00/HDFS_3_SUB.jar

16/08/10 17:46:09 INFO mapreduce.ImportJobBase: Beginning import of HDFS_3_SUB

16/08/10 17:46:11 INFO manager.SqlManager: Executing SQL statement: SELECT "ACOL1","ACOL2","ACOL3","ACOL4","ACOL5","ACOL6","ACOL7","ACOL8","ACOL9","ACOL10" FROM "HDFS_3_SUB" WHERE 1=0

16/08/10 17:46:12 INFO impl.TimelineClientImpl: Timeline service address: http://amb3.local:8188/ws/v1/timeline/

16/08/10 17:46:12 INFO client.RMProxy: Connecting to ResourceManager at amb3.local/192.168.0.143:8050

16/08/10 17:46:15 WARN oracle.OraOopUtilities: System property java.security.egd is not set to file:///dev/urandom - Oracle connections may time out.

16/08/10 17:46:15 INFO db.DBInputFormat: Using read commited transaction isolation

16/08/10 17:46:15 INFO oracle.OraOopOracleQueries: Session Time Zone set to GMT

16/08/10 17:46:15 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL :

begin 

  dbms_application_info.set_module(module_name => 'Data Connector for Oracle and Hadoop', action_name => 'import 20160810174606KST'); 

end;

16/08/10 17:46:15 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session disable parallel query

16/08/10 17:46:15 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set "_serial_direct_read"=true

16/08/10 17:46:15 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set tracefile_identifier=oraoop

16/08/10 17:46:17 INFO oracle.OraOopDataDrivenDBInputFormat: The table being imported by sqoop has 72704 blocks that have been divided into 128 chunks which will be processed in 4 splits. The chunks will be allocated to the splits using the method : ROUNDROBIN

16/08/10 17:46:17 INFO mapreduce.JobSubmitter: number of splits:4

16/08/10 17:46:17 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470815003334_0001

16/08/10 17:46:18 INFO impl.YarnClientImpl: Submitted application application_1470815003334_0001

16/08/10 17:46:18 INFO mapreduce.Job: The url to track the job: http://amb3.local:8088/proxy/application_1470815003334_0001/

16/08/10 17:46:18 INFO mapreduce.Job: Running job: job_1470815003334_0001

16/08/10 17:46:26 INFO mapreduce.Job: Job job_1470815003334_0001 running in uber mode : false

16/08/10 17:46:26 INFO mapreduce.Job:  map 0% reduce 0%

16/08/10 17:46:40 INFO mapreduce.Job:  map 7% reduce 0%

16/08/10 17:46:41 INFO mapreduce.Job:  map 32% reduce 0%

16/08/10 17:46:42 INFO mapreduce.Job:  map 57% reduce 0%

16/08/10 17:46:43 INFO mapreduce.Job:  map 80% reduce 0%

16/08/10 17:46:44 INFO mapreduce.Job:  map 87% reduce 0%

16/08/10 17:46:46 INFO mapreduce.Job:  map 100% reduce 0%

16/08/10 17:46:46 INFO mapreduce.Job: Job job_1470815003334_0001 completed successfully

16/08/10 17:46:46 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=614156

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=11629

HDFS: Number of bytes written=505000000

HDFS: Number of read operations=16

HDFS: Number of large read operations=0

HDFS: Number of write operations=8

Job Counters 

Launched map tasks=4

Other local map tasks=4

Total time spent by all maps in occupied slots (ms)=116770

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=58385

Total vcore-seconds taken by all map tasks=58385

Total megabyte-seconds taken by all map tasks=89679360

Map-Reduce Framework

Map input records=500000

Map output records=500000

Input split bytes=11629

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=1801

CPU time spent (ms)=75560

Physical memory (bytes) snapshot=1357881344

Virtual memory (bytes) snapshot=13260427264

Total committed heap usage (bytes)=709361664

File Input Format Counters 

Bytes Read=0

File Output Format Counters 

Bytes Written=505000000

16/08/10 17:46:46 INFO mapreduce.ImportJobBase: Transferred 481.6055 MB in 34.0606 seconds (14.1397 MB/sec)

16/08/10 17:46:46 INFO mapreduce.ImportJobBase: Retrieved 500000 records.

[hdfs@amb2 ~]$ 
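
After a run, the actual size on HDFS can be sanity-checked against the counters with standard commands:

hdfs dfs -du -s -h /dev/test/data_nc_txt    # total size of the import directory
hdfs dfs -ls /dev/test/data_nc_txt          # one part-m-* file per map task (4 here)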



CASE 2. File format: text plain, compression: snappy


Command

sqoop import --target-dir=/dev/test/data_sn_txt --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec --table HDFS_3_SUB -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1


Output

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/test/data_sn_txt --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec --table HDFS_3_SUB -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1

16/08/10 17:50:13 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 17:50:13 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 17:50:13 INFO manager.SqlManager: Using default fetchSize of 1000

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 17:50:14 INFO oracle.OraOopManagerFactory: 

**************************************************

*** Using Data Connector for Oracle and Hadoop ***

**************************************************

16/08/10 17:50:14 INFO oracle.OraOopManagerFactory: Oracle Database version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

16/08/10 17:50:14 INFO oracle.OraOopManagerFactory: This Oracle database is not a RAC.

16/08/10 17:50:14 INFO tool.CodeGenTool: Beginning code generation

16/08/10 17:50:14 INFO manager.SqlManager: Executing SQL statement: SELECT "ACOL1","ACOL2","ACOL3","ACOL4","ACOL5","ACOL6","ACOL7","ACOL8","ACOL9","ACOL10" FROM HDFS_3_SUB WHERE 0=1

16/08/10 17:50:14 INFO manager.SqlManager: Executing SQL statement: SELECT "ACOL1","ACOL2","ACOL3","ACOL4","ACOL5","ACOL6","ACOL7","ACOL8","ACOL9","ACOL10" FROM "HDFS_3_SUB" WHERE 1=0

16/08/10 17:50:14 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/175537b93a775793afb75735d65b176f/HDFS_3_SUB.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 17:50:16 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/175537b93a775793afb75735d65b176f/HDFS_3_SUB.jar

16/08/10 17:50:16 INFO mapreduce.ImportJobBase: Beginning import of HDFS_3_SUB

16/08/10 17:50:17 INFO manager.SqlManager: Executing SQL statement: SELECT "ACOL1","ACOL2","ACOL3","ACOL4","ACOL5","ACOL6","ACOL7","ACOL8","ACOL9","ACOL10" FROM "HDFS_3_SUB" WHERE 1=0

16/08/10 17:50:18 INFO impl.TimelineClientImpl: Timeline service address: http://amb3.local:8188/ws/v1/timeline/

16/08/10 17:50:18 INFO client.RMProxy: Connecting to ResourceManager at amb3.local/192.168.0.143:8050

16/08/10 17:50:21 WARN oracle.OraOopUtilities: System property java.security.egd is not set to file:///dev/urandom - Oracle connections may time out.

16/08/10 17:50:21 INFO db.DBInputFormat: Using read commited transaction isolation

16/08/10 17:50:21 INFO oracle.OraOopOracleQueries: Session Time Zone set to GMT

16/08/10 17:50:21 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL :

begin 

  dbms_application_info.set_module(module_name => 'Data Connector for Oracle and Hadoop', action_name => 'import 20160810175014KST'); 

end;

16/08/10 17:50:21 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session disable parallel query

16/08/10 17:50:21 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set "_serial_direct_read"=true

16/08/10 17:50:21 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set tracefile_identifier=oraoop

16/08/10 17:50:22 INFO oracle.OraOopDataDrivenDBInputFormat: The table being imported by sqoop has 72704 blocks that have been divided into 128 chunks which will be processed in 4 splits. The chunks will be allocated to the splits using the method : ROUNDROBIN

16/08/10 17:50:22 INFO mapreduce.JobSubmitter: number of splits:4

16/08/10 17:50:22 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470815003334_0002

16/08/10 17:50:23 INFO impl.YarnClientImpl: Submitted application application_1470815003334_0002

16/08/10 17:50:23 INFO mapreduce.Job: The url to track the job: http://amb3.local:8088/proxy/application_1470815003334_0002/

16/08/10 17:50:23 INFO mapreduce.Job: Running job: job_1470815003334_0002

16/08/10 17:50:31 INFO mapreduce.Job: Job job_1470815003334_0002 running in uber mode : false

16/08/10 17:50:31 INFO mapreduce.Job:  map 0% reduce 0%

16/08/10 17:50:41 INFO mapreduce.Job:  map 25% reduce 0%

16/08/10 17:50:42 INFO mapreduce.Job:  map 50% reduce 0%

16/08/10 17:50:44 INFO mapreduce.Job:  map 80% reduce 0%

16/08/10 17:50:46 INFO mapreduce.Job:  map 89% reduce 0%

16/08/10 17:50:47 INFO mapreduce.Job:  map 100% reduce 0%

16/08/10 17:50:47 INFO mapreduce.Job: Job job_1470815003334_0002 completed successfully

16/08/10 17:50:47 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=614136

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=11629

HDFS: Number of bytes written=505057648

HDFS: Number of read operations=16

HDFS: Number of large read operations=0

HDFS: Number of write operations=8

Job Counters 

Launched map tasks=4

Other local map tasks=4

Total time spent by all maps in occupied slots (ms)=86936

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=43468

Total vcore-seconds taken by all map tasks=43468

Total megabyte-seconds taken by all map tasks=66766848

Map-Reduce Framework

Map input records=500000

Map output records=500000

Input split bytes=11629

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=1239

CPU time spent (ms)=67760

Physical memory (bytes) snapshot=1410674688

Virtual memory (bytes) snapshot=13316395008

Total committed heap usage (bytes)=737673216

File Input Format Counters 

Bytes Read=0

File Output Format Counters 

Bytes Written=505057648

16/08/10 17:50:47 INFO mapreduce.ImportJobBase: Transferred 481.6605 MB in 29.1766 seconds (16.5084 MB/sec)

16/08/10 17:50:47 INFO mapreduce.ImportJobBase: Retrieved 500000 records.

[hdfs@amb2 ~]$
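
The totals above are essentially identical to the uncompressed run: this is the 0% result the test was meant to demonstrate, since random text leaves snappy no redundancy to exploit. When sizes look unexpectedly unchanged it is also worth ruling out a missing codec; two quick checks:

hadoop checknative -a               # confirms whether the native snappy library is loaded
hdfs dfs -ls /dev/test/data_sn_txt  # compressed text parts should carry the .snappy extension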





The main test starts below.


========================= The four test cases ============================




CASE 1. Text, uncompressed


Command

sqoop import --target-dir=/dev/test2/data_nc_txt --table HDFS_TEST5 -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1


Output

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/test2/data_nc_txt --table HDFS_TEST5 -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1

16/08/10 18:09:15 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 18:09:15 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 18:09:15 INFO manager.SqlManager: Using default fetchSize of 1000

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 18:09:17 INFO oracle.OraOopManagerFactory: 

**************************************************

*** Using Data Connector for Oracle and Hadoop ***

**************************************************

16/08/10 18:09:17 INFO oracle.OraOopManagerFactory: Oracle Database version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

16/08/10 18:09:17 INFO oracle.OraOopManagerFactory: This Oracle database is not a RAC.

16/08/10 18:09:17 INFO tool.CodeGenTool: Beginning code generation

16/08/10 18:09:17 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM HDFS_TEST5 WHERE 0=1

16/08/10 18:09:17 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM "HDFS_TEST5" WHERE 1=0

16/08/10 18:09:17 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/83d82e240522d651aa49f619fb1c723b/HDFS_TEST5.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 18:09:20 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/83d82e240522d651aa49f619fb1c723b/HDFS_TEST5.jar

16/08/10 18:09:20 INFO mapreduce.ImportJobBase: Beginning import of HDFS_TEST5

16/08/10 18:09:22 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM "HDFS_TEST5" WHERE 1=0

16/08/10 18:09:23 INFO impl.TimelineClientImpl: Timeline service address: http://amb3.local:8188/ws/v1/timeline/

16/08/10 18:09:23 INFO client.RMProxy: Connecting to ResourceManager at amb3.local/192.168.0.143:8050

16/08/10 18:09:26 WARN oracle.OraOopUtilities: System property java.security.egd is not set to file:///dev/urandom - Oracle connections may time out.

16/08/10 18:09:26 INFO db.DBInputFormat: Using read commited transaction isolation

16/08/10 18:09:26 INFO oracle.OraOopOracleQueries: Session Time Zone set to GMT

16/08/10 18:09:26 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL :

begin 

  dbms_application_info.set_module(module_name => 'Data Connector for Oracle and Hadoop', action_name => 'import 20160810180917KST'); 

end;

16/08/10 18:09:26 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session disable parallel query

16/08/10 18:09:26 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set "_serial_direct_read"=true

16/08/10 18:09:26 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set tracefile_identifier=oraoop

16/08/10 18:09:27 INFO oracle.OraOopDataDrivenDBInputFormat: The table being imported by sqoop has 180224 blocks that have been divided into 185 chunks which will be processed in 4 splits. The chunks will be allocated to the splits using the method : ROUNDROBIN

16/08/10 18:09:27 INFO mapreduce.JobSubmitter: number of splits:4

16/08/10 18:09:27 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470815003334_0004

16/08/10 18:09:28 INFO impl.YarnClientImpl: Submitted application application_1470815003334_0004

16/08/10 18:09:28 INFO mapreduce.Job: The url to track the job: http://amb3.local:8088/proxy/application_1470815003334_0004/

16/08/10 18:09:28 INFO mapreduce.Job: Running job: job_1470815003334_0004

16/08/10 18:09:36 INFO mapreduce.Job: Job job_1470815003334_0004 running in uber mode : false

16/08/10 18:09:36 INFO mapreduce.Job:  map 0% reduce 0%

16/08/10 18:09:47 INFO mapreduce.Job:  map 3% reduce 0%

16/08/10 18:09:49 INFO mapreduce.Job:  map 10% reduce 0%

16/08/10 18:09:50 INFO mapreduce.Job:  map 13% reduce 0%

16/08/10 18:09:52 INFO mapreduce.Job:  map 23% reduce 0%

16/08/10 18:09:54 INFO mapreduce.Job:  map 27% reduce 0%

16/08/10 18:09:55 INFO mapreduce.Job:  map 37% reduce 0%

16/08/10 18:09:57 INFO mapreduce.Job:  map 44% reduce 0%

16/08/10 18:09:58 INFO mapreduce.Job:  map 59% reduce 0%

16/08/10 18:10:00 INFO mapreduce.Job:  map 60% reduce 0%

16/08/10 18:10:01 INFO mapreduce.Job:  map 73% reduce 0%

16/08/10 18:10:03 INFO mapreduce.Job:  map 77% reduce 0%

16/08/10 18:10:04 INFO mapreduce.Job:  map 82% reduce 0%

16/08/10 18:10:06 INFO mapreduce.Job:  map 83% reduce 0%

16/08/10 18:10:08 INFO mapreduce.Job:  map 91% reduce 0%

16/08/10 18:10:10 INFO mapreduce.Job:  map 95% reduce 0%

16/08/10 18:10:11 INFO mapreduce.Job:  map 100% reduce 0%

16/08/10 18:10:12 INFO mapreduce.Job: Job job_1470815003334_0004 completed successfully

16/08/10 18:10:12 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=614016

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=16707

HDFS: Number of bytes written=1422465312

HDFS: Number of read operations=16

HDFS: Number of large read operations=0

HDFS: Number of write operations=8

Job Counters 

Launched map tasks=4

Other local map tasks=4

Total time spent by all maps in occupied slots (ms)=241392

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=120696

Total vcore-seconds taken by all map tasks=120696

Total megabyte-seconds taken by all map tasks=185389056

Map-Reduce Framework

Map input records=11620800

Map output records=11620800

Input split bytes=16707

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=2308

CPU time spent (ms)=178250

Physical memory (bytes) snapshot=1499193344

Virtual memory (bytes) snapshot=13390106624

Total committed heap usage (bytes)=746586112

File Input Format Counters 

Bytes Read=0

File Output Format Counters 

Bytes Written=1422465312

16/08/10 18:10:12 INFO mapreduce.ImportJobBase: Transferred 1.3248 GB in 48.8625 seconds (27.763 MB/sec)

16/08/10 18:10:12 INFO mapreduce.ImportJobBase: Retrieved 11620800 records.

[hdfs@amb2 ~]$ 




CASE 2. Text, snappy compression


Command

sqoop import --target-dir=/dev/test2/data_sn_txt --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec --table HDFS_TEST5 -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1


Output

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/test2/data_sn_txt --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec --table HDFS_TEST5 -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1

16/08/10 18:12:53 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 18:12:53 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 18:12:53 INFO manager.SqlManager: Using default fetchSize of 1000

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 18:12:54 INFO oracle.OraOopManagerFactory: 

**************************************************

*** Using Data Connector for Oracle and Hadoop ***

**************************************************

16/08/10 18:12:54 INFO oracle.OraOopManagerFactory: Oracle Database version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

16/08/10 18:12:54 INFO oracle.OraOopManagerFactory: This Oracle database is not a RAC.

16/08/10 18:12:54 INFO tool.CodeGenTool: Beginning code generation

16/08/10 18:12:54 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM HDFS_TEST5 WHERE 0=1

16/08/10 18:12:54 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM "HDFS_TEST5" WHERE 1=0

16/08/10 18:12:54 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/a40bc11d71b280f6f6f0be86d8987524/HDFS_TEST5.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 18:12:56 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/a40bc11d71b280f6f6f0be86d8987524/HDFS_TEST5.jar

16/08/10 18:12:56 INFO mapreduce.ImportJobBase: Beginning import of HDFS_TEST5

16/08/10 18:12:57 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM "HDFS_TEST5" WHERE 1=0

16/08/10 18:12:58 INFO impl.TimelineClientImpl: Timeline service address: http://amb3.local:8188/ws/v1/timeline/

16/08/10 18:12:58 INFO client.RMProxy: Connecting to ResourceManager at amb3.local/192.168.0.143:8050

16/08/10 18:13:01 WARN oracle.OraOopUtilities: System property java.security.egd is not set to file:///dev/urandom - Oracle connections may time out.

16/08/10 18:13:01 INFO db.DBInputFormat: Using read commited transaction isolation

16/08/10 18:13:01 INFO oracle.OraOopOracleQueries: Session Time Zone set to GMT

16/08/10 18:13:01 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL :

begin 

  dbms_application_info.set_module(module_name => 'Data Connector for Oracle and Hadoop', action_name => 'import 20160810181254KST'); 

end;

16/08/10 18:13:01 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session disable parallel query

16/08/10 18:13:01 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set "_serial_direct_read"=true

16/08/10 18:13:01 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set tracefile_identifier=oraoop

16/08/10 18:13:02 INFO oracle.OraOopDataDrivenDBInputFormat: The table being imported by sqoop has 180224 blocks that have been divided into 185 chunks which will be processed in 4 splits. The chunks will be allocated to the splits using the method : ROUNDROBIN

16/08/10 18:13:02 INFO mapreduce.JobSubmitter: number of splits:4

16/08/10 18:13:02 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470815003334_0005

16/08/10 18:13:02 INFO impl.YarnClientImpl: Submitted application application_1470815003334_0005

16/08/10 18:13:02 INFO mapreduce.Job: The url to track the job: http://amb3.local:8088/proxy/application_1470815003334_0005/

16/08/10 18:13:02 INFO mapreduce.Job: Running job: job_1470815003334_0005

16/08/10 18:13:10 INFO mapreduce.Job: Job job_1470815003334_0005 running in uber mode : false

16/08/10 18:13:10 INFO mapreduce.Job:  map 0% reduce 0%

16/08/10 18:13:22 INFO mapreduce.Job:  map 3% reduce 0%

16/08/10 18:13:23 INFO mapreduce.Job:  map 10% reduce 0%

16/08/10 18:13:25 INFO mapreduce.Job:  map 15% reduce 0%

16/08/10 18:13:26 INFO mapreduce.Job:  map 24% reduce 0%

16/08/10 18:13:28 INFO mapreduce.Job:  map 27% reduce 0%

16/08/10 18:13:29 INFO mapreduce.Job:  map 36% reduce 0%

16/08/10 18:13:31 INFO mapreduce.Job:  map 43% reduce 0%

16/08/10 18:13:32 INFO mapreduce.Job:  map 55% reduce 0%

16/08/10 18:13:34 INFO mapreduce.Job:  map 56% reduce 0%

16/08/10 18:13:35 INFO mapreduce.Job:  map 67% reduce 0%

16/08/10 18:13:37 INFO mapreduce.Job:  map 70% reduce 0%

16/08/10 18:13:38 INFO mapreduce.Job:  map 81% reduce 0%

16/08/10 18:13:40 INFO mapreduce.Job:  map 85% reduce 0%

16/08/10 18:13:41 INFO mapreduce.Job:  map 90% reduce 0%

16/08/10 18:13:43 INFO mapreduce.Job:  map 94% reduce 0%

16/08/10 18:13:44 INFO mapreduce.Job:  map 97% reduce 0%

16/08/10 18:13:47 INFO mapreduce.Job:  map 100% reduce 0%

16/08/10 18:13:47 INFO mapreduce.Job: Job job_1470815003334_0005 completed successfully

16/08/10 18:13:47 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=613996

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=16707

HDFS: Number of bytes written=809177318

HDFS: Number of read operations=16

HDFS: Number of large read operations=0

HDFS: Number of write operations=8

Job Counters 

Launched map tasks=4

Other local map tasks=4

Total time spent by all maps in occupied slots (ms)=244686

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=122343

Total vcore-seconds taken by all map tasks=122343

Total megabyte-seconds taken by all map tasks=187918848

Map-Reduce Framework

Map input records=11620800

Map output records=11620800

Input split bytes=16707

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=2268

CPU time spent (ms)=166060

Physical memory (bytes) snapshot=1489506304

Virtual memory (bytes) snapshot=13366169600

Total committed heap usage (bytes)=789577728

File Input Format Counters 

Bytes Read=0

File Output Format Counters 

Bytes Written=809177318

16/08/10 18:13:47 INFO mapreduce.ImportJobBase: Transferred 771.6916 MB in 49.3399 seconds (15.6403 MB/sec)

16/08/10 18:13:47 INFO mapreduce.ImportJobBase: Retrieved 11620800 records.

[hdfs@amb2 ~]$ 
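
From the counters, snappy brought 1,422,465,312 bytes of text down to 809,177,318 bytes, i.e. about 57% of the original size (roughly 43% saved). A quick check of that ratio:

awk 'BEGIN { printf "%.1f%% of the uncompressed size\n", 809177318 / 1422465312 * 100 }'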





CASE 3. Parquet format, uncompressed


Command

sqoop import --target-dir=/dev/test2/data_nc_pq --table HDFS_TEST5 --as-parquetfile -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1


Output

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/test2/data_nc_pq --table HDFS_TEST5 --as-parquetfile -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1

16/08/10 18:16:27 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 18:16:27 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 18:16:27 INFO manager.SqlManager: Using default fetchSize of 1000

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 18:16:28 INFO oracle.OraOopManagerFactory: 

**************************************************

*** Using Data Connector for Oracle and Hadoop ***

**************************************************

16/08/10 18:16:28 INFO oracle.OraOopManagerFactory: Oracle Database version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

16/08/10 18:16:28 INFO oracle.OraOopManagerFactory: This Oracle database is not a RAC.

16/08/10 18:16:28 INFO tool.CodeGenTool: Beginning code generation

16/08/10 18:16:28 INFO tool.CodeGenTool: Will generate java class as codegen_HDFS_TEST5

16/08/10 18:16:28 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM HDFS_TEST5 WHERE 0=1

16/08/10 18:16:28 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM "HDFS_TEST5" WHERE 1=0

16/08/10 18:16:28 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/cc39417a2380979acedad21e69766d18/codegen_HDFS_TEST5.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 18:16:31 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/cc39417a2380979acedad21e69766d18/codegen_HDFS_TEST5.jar

16/08/10 18:16:31 INFO mapreduce.ImportJobBase: Beginning import of HDFS_TEST5

16/08/10 18:16:31 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM "HDFS_TEST5" WHERE 1=0

16/08/10 18:16:32 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM "HDFS_TEST5" WHERE 1=0

16/08/10 18:16:34 INFO impl.TimelineClientImpl: Timeline service address: http://amb3.local:8188/ws/v1/timeline/

16/08/10 18:16:34 INFO client.RMProxy: Connecting to ResourceManager at amb3.local/192.168.0.143:8050

16/08/10 18:16:36 WARN oracle.OraOopUtilities: System property java.security.egd is not set to file:///dev/urandom - Oracle connections may time out.

16/08/10 18:16:36 INFO db.DBInputFormat: Using read commited transaction isolation

16/08/10 18:16:36 INFO oracle.OraOopOracleQueries: Session Time Zone set to GMT

16/08/10 18:16:36 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL :

begin 

  dbms_application_info.set_module(module_name => 'Data Connector for Oracle and Hadoop', action_name => 'import 20160810181628KST'); 

end;

16/08/10 18:16:36 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session disable parallel query

16/08/10 18:16:36 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set "_serial_direct_read"=true

16/08/10 18:16:36 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set tracefile_identifier=oraoop

16/08/10 18:16:37 INFO oracle.OraOopDataDrivenDBInputFormat: The table being imported by sqoop has 180224 blocks that have been divided into 185 chunks which will be processed in 4 splits. The chunks will be allocated to the splits using the method : ROUNDROBIN

16/08/10 18:16:37 INFO mapreduce.JobSubmitter: number of splits:4

16/08/10 18:16:37 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470815003334_0006

16/08/10 18:16:38 INFO impl.YarnClientImpl: Submitted application application_1470815003334_0006

16/08/10 18:16:38 INFO mapreduce.Job: The url to track the job: http://amb3.local:8088/proxy/application_1470815003334_0006/

16/08/10 18:16:38 INFO mapreduce.Job: Running job: job_1470815003334_0006

16/08/10 18:16:46 INFO mapreduce.Job: Job job_1470815003334_0006 running in uber mode : false

16/08/10 18:16:46 INFO mapreduce.Job:  map 0% reduce 0%

16/08/10 18:17:10 INFO mapreduce.Job:  map 2% reduce 0%

16/08/10 18:17:22 INFO mapreduce.Job:  map 9% reduce 0%

16/08/10 18:17:25 INFO mapreduce.Job:  map 16% reduce 0%

16/08/10 18:17:28 INFO mapreduce.Job:  map 21% reduce 0%

16/08/10 18:17:31 INFO mapreduce.Job:  map 28% reduce 0%

16/08/10 18:17:34 INFO mapreduce.Job:  map 36% reduce 0%

16/08/10 18:17:37 INFO mapreduce.Job:  map 40% reduce 0%

16/08/10 18:17:40 INFO mapreduce.Job:  map 49% reduce 0%

16/08/10 18:17:43 INFO mapreduce.Job:  map 60% reduce 0%

16/08/10 18:17:46 INFO mapreduce.Job:  map 65% reduce 0%

16/08/10 18:17:49 INFO mapreduce.Job:  map 72% reduce 0%

16/08/10 18:17:52 INFO mapreduce.Job:  map 78% reduce 0%

16/08/10 18:17:55 INFO mapreduce.Job:  map 82% reduce 0%

16/08/10 18:17:58 INFO mapreduce.Job:  map 87% reduce 0%

16/08/10 18:18:01 INFO mapreduce.Job:  map 93% reduce 0%

16/08/10 18:18:03 INFO mapreduce.Job:  map 94% reduce 0%

16/08/10 18:18:04 INFO mapreduce.Job:  map 97% reduce 0%

16/08/10 18:18:07 INFO mapreduce.Job:  map 99% reduce 0%

16/08/10 18:18:10 INFO mapreduce.Job:  map 100% reduce 0%

16/08/10 18:18:10 INFO mapreduce.Job: Job job_1470815003334_0006 completed successfully

16/08/10 18:18:10 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=618184

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=49495

HDFS: Number of bytes written=456758842

HDFS: Number of read operations=200

HDFS: Number of large read operations=0

HDFS: Number of write operations=36

Job Counters 

Launched map tasks=4

Other local map tasks=4

Total time spent by all maps in occupied slots (ms)=612860

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=306430

Total vcore-seconds taken by all map tasks=306430

Total megabyte-seconds taken by all map tasks=470676480

Map-Reduce Framework

Map input records=11620800

Map output records=11620800

Input split bytes=16707

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=31375

CPU time spent (ms)=363220

Physical memory (bytes) snapshot=2344189952

Virtual memory (bytes) snapshot=13274750976

Total committed heap usage (bytes)=1465909248

File Input Format Counters 

Bytes Read=0

File Output Format Counters 

Bytes Written=0

16/08/10 18:18:10 INFO mapreduce.ImportJobBase: Transferred 435.5992 MB in 97.1124 seconds (4.4855 MB/sec)

16/08/10 18:18:10 INFO mapreduce.ImportJobBase: Retrieved 11620800 records.

[hdfs@amb2 ~]$ 
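
Note that this "uncompressed" Parquet output (456,758,842 bytes) already comes out smaller than the snappy-compressed text. To see which codec the Parquet files actually carry, the footer can be inspected with parquet-tools (assuming the jar is available on the cluster; the file name below is a placeholder):

# Inspect the Parquet footer; the output shows the compression used per column chunk
hadoop jar parquet-tools.jar meta /dev/test2/data_nc_pq/some-part-file.parquet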




CASE 4. Parquet format, snappy compression


Command

sqoop import --target-dir=/dev/test2/data_sn_pq --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec --table HDFS_TEST5 -direct --as-parquetfile --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1


Output

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/test2/data_sn_pq --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec --table HDFS_TEST5 -direct --as-parquetfile --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1

16/08/10 18:21:21 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 18:21:21 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 18:21:21 INFO manager.SqlManager: Using default fetchSize of 1000

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 18:21:22 INFO oracle.OraOopManagerFactory: 

**************************************************

*** Using Data Connector for Oracle and Hadoop ***

**************************************************

16/08/10 18:21:22 INFO oracle.OraOopManagerFactory: Oracle Database version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

16/08/10 18:21:22 INFO oracle.OraOopManagerFactory: This Oracle database is not a RAC.

16/08/10 18:21:22 INFO tool.CodeGenTool: Beginning code generation

16/08/10 18:21:22 INFO tool.CodeGenTool: Will generate java class as codegen_HDFS_TEST5

16/08/10 18:21:22 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM HDFS_TEST5 WHERE 0=1

16/08/10 18:21:22 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM "HDFS_TEST5" WHERE 1=0

16/08/10 18:21:22 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/ffc7dd24a3b45d0a3b7dad6697d1826d/codegen_HDFS_TEST5.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 18:21:24 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/ffc7dd24a3b45d0a3b7dad6697d1826d/codegen_HDFS_TEST5.jar

16/08/10 18:21:24 INFO mapreduce.ImportJobBase: Beginning import of HDFS_TEST5

16/08/10 18:21:25 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM "HDFS_TEST5" WHERE 1=0

16/08/10 18:21:27 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM "HDFS_TEST5" WHERE 1=0

16/08/10 18:21:29 INFO impl.TimelineClientImpl: Timeline service address: http://amb3.local:8188/ws/v1/timeline/

16/08/10 18:21:29 INFO client.RMProxy: Connecting to ResourceManager at amb3.local/192.168.0.143:8050

16/08/10 18:21:32 WARN oracle.OraOopUtilities: System property java.security.egd is not set to file:///dev/urandom - Oracle connections may time out.

16/08/10 18:21:32 INFO db.DBInputFormat: Using read commited transaction isolation

16/08/10 18:21:32 INFO oracle.OraOopOracleQueries: Session Time Zone set to GMT

16/08/10 18:21:32 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL :

begin 

  dbms_application_info.set_module(module_name => 'Data Connector for Oracle and Hadoop', action_name => 'import 20160810182122KST'); 

end;

16/08/10 18:21:32 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session disable parallel query

16/08/10 18:21:32 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set "_serial_direct_read"=true

16/08/10 18:21:32 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set tracefile_identifier=oraoop

16/08/10 18:21:32 INFO oracle.OraOopDataDrivenDBInputFormat: The table being imported by sqoop has 180224 blocks that have been divided into 185 chunks which will be processed in 4 splits. The chunks will be allocated to the splits using the method : ROUNDROBIN

16/08/10 18:21:33 INFO mapreduce.JobSubmitter: number of splits:4

16/08/10 18:21:33 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470815003334_0007

16/08/10 18:21:33 INFO impl.YarnClientImpl: Submitted application application_1470815003334_0007

16/08/10 18:21:33 INFO mapreduce.Job: The url to track the job: http://amb3.local:8088/proxy/application_1470815003334_0007/

16/08/10 18:21:33 INFO mapreduce.Job: Running job: job_1470815003334_0007

16/08/10 18:21:41 INFO mapreduce.Job: Job job_1470815003334_0007 running in uber mode : false

16/08/10 18:21:41 INFO mapreduce.Job:  map 0% reduce 0%

16/08/10 18:21:57 INFO mapreduce.Job:  map 2% reduce 0%

16/08/10 18:21:58 INFO mapreduce.Job:  map 8% reduce 0%

16/08/10 18:22:00 INFO mapreduce.Job:  map 9% reduce 0%

16/08/10 18:22:01 INFO mapreduce.Job:  map 14% reduce 0%

16/08/10 18:22:03 INFO mapreduce.Job:  map 15% reduce 0%

16/08/10 18:22:04 INFO mapreduce.Job:  map 20% reduce 0%

16/08/10 18:22:06 INFO mapreduce.Job:  map 22% reduce 0%

16/08/10 18:22:07 INFO mapreduce.Job:  map 27% reduce 0%

16/08/10 18:22:09 INFO mapreduce.Job:  map 28% reduce 0%

16/08/10 18:22:10 INFO mapreduce.Job:  map 33% reduce 0%

16/08/10 18:22:12 INFO mapreduce.Job:  map 34% reduce 0%

16/08/10 18:22:13 INFO mapreduce.Job:  map 40% reduce 0%

16/08/10 18:22:15 INFO mapreduce.Job:  map 44% reduce 0%

16/08/10 18:22:16 INFO mapreduce.Job:  map 48% reduce 0%

16/08/10 18:22:19 INFO mapreduce.Job:  map 54% reduce 0%

16/08/10 18:22:20 INFO mapreduce.Job:  map 59% reduce 0%

16/08/10 18:22:22 INFO mapreduce.Job:  map 65% reduce 0%

16/08/10 18:22:24 INFO mapreduce.Job:  map 71% reduce 0%

16/08/10 18:22:25 INFO mapreduce.Job:  map 76% reduce 0%

16/08/10 18:22:26 INFO mapreduce.Job:  map 77% reduce 0%

16/08/10 18:22:29 INFO mapreduce.Job:  map 78% reduce 0%

16/08/10 18:22:32 INFO mapreduce.Job:  map 81% reduce 0%

16/08/10 18:22:33 INFO mapreduce.Job:  map 82% reduce 0%

16/08/10 18:22:34 INFO mapreduce.Job:  map 83% reduce 0%

16/08/10 18:22:35 INFO mapreduce.Job:  map 86% reduce 0%

16/08/10 18:22:36 INFO mapreduce.Job:  map 87% reduce 0%

16/08/10 18:22:38 INFO mapreduce.Job:  map 90% reduce 0%

16/08/10 18:22:39 INFO mapreduce.Job:  map 92% reduce 0%

16/08/10 18:22:40 INFO mapreduce.Job:  map 93% reduce 0%

16/08/10 18:22:41 INFO mapreduce.Job:  map 96% reduce 0%

16/08/10 18:22:42 INFO mapreduce.Job:  map 97% reduce 0%

16/08/10 18:22:44 INFO mapreduce.Job:  map 100% reduce 0%

16/08/10 18:22:46 INFO mapreduce.Job: Job job_1470815003334_0007 completed successfully

16/08/10 18:22:46 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=618704

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=49495

HDFS: Number of bytes written=456758940

HDFS: Number of read operations=200

HDFS: Number of large read operations=0

HDFS: Number of write operations=36

Job Counters 

Launched map tasks=4

Other local map tasks=4

Total time spent by all maps in occupied slots (ms)=466256

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=233128

Total vcore-seconds taken by all map tasks=233128

Total megabyte-seconds taken by all map tasks=358084608

Map-Reduce Framework

Map input records=11620800

Map output records=11620800

Input split bytes=16707

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=9158

CPU time spent (ms)=325900

Physical memory (bytes) snapshot=2222469120

Virtual memory (bytes) snapshot=13310775296

Total committed heap usage (bytes)=1477443584

File Input Format Counters 

Bytes Read=0

File Output Format Counters 

Bytes Written=0

16/08/10 18:22:46 INFO mapreduce.ImportJobBase: Transferred 435.5993 MB in 78.3224 seconds (5.5616 MB/sec)

16/08/10 18:22:46 INFO mapreduce.ImportJobBase: Retrieved 11620800 records.

[hdfs@amb2 ~]$ 




Results of the tests above.


                                  text, uncompressed   text, snappy    parquet, uncompressed   parquet, snappy
Total job + transfer time (sec)   48.8625              49.3399         97.1124                 78.3224
Total transferred                 1.3248 GB            771.6916 MB     435.5992 MB             435.5993 MB
Throughput (MB/sec)               27.763               15.6403         4.4855                  5.5616


A few observations. Snappy on text cut the output to roughly 57% of the uncompressed size at almost the same elapsed time. The two Parquet runs produced nearly identical byte counts (456,758,842 vs 456,758,940), which suggests --compression-codec had no effect on the Parquet path: in this Sqoop version Parquet is written through the Kite SDK, which applies its own Parquet compression setting (snappy by default, as far as I know), so the "uncompressed" Parquet run was most likely snappy-compressed internally as well. That would also explain why Parquet came out smallest in both cases, though at a much lower transfer rate.






========================= An earlier test run ============================


I decided to run a test covering sqoop, compress, parquet, and textplain.


I am writing this up to test snappy and Parquet, which the customer on my current project has decided to use.

To compare using them against not using them,

I test four cases in total.


MAIN server environment:

- CPU: 2 CPUs (2 sockets, 24 cores)

- RAM: 256GB

- HDD: a PCI-E SSD + SATA HDD combination (the hadoop and oracle data live on the SATA HDDs)

Disk I/O heavily affects this test, so recording the details:

- All HDDs are 5400RPM with a 64MB buffer, WD data-storage drives.

- The SSD is a PCI-E card, Intel 700 series.

- OS + applications + engines are on the SSD; the data is on the HDDs.

* Hadoop was installed with Ambari; 4 data nodes in total, replication factor 3.

* 2 namenodes in an HA configuration.

* VMware 10.

* The VMware network is a basic 100M line.

* Oracle is Enterprise single node 12.1 (Linux), also on VMware.



The Oracle table is as follows.

The table is about 80GB: 150 columns, each packed with a full 150 bytes, not leaving even a single byte free. Row count: 5,385,050.

To keep the runs as comparable as possible, I flushed the shared pool and the db block buffer between tests.
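
For reference, those flushes are standard Oracle commands; run as a privileged user, e.g.:

sqlplus / as sysdba <<'SQL'
-- Clear cached plans and cached data blocks so each run starts cold
ALTER SYSTEM FLUSH SHARED_POOL;
ALTER SYSTEM FLUSH BUFFER_CACHE;
SQL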




CASE 1. Uncompressed + text plain


Command

sqoop import --target-dir=/dev/data1_nc_txt --table HDFS_2 -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1


Output

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/data1_nc_txt --table HDFS_2 -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1

16/08/10 14:01:17 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 14:01:17 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 14:01:17 INFO manager.SqlManager: Using default fetchSize of 1000

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 14:01:18 INFO oracle.OraOopManagerFactory: 

**************************************************

*** Using Data Connector for Oracle and Hadoop ***

**************************************************

16/08/10 14:01:18 INFO oracle.OraOopManagerFactory: Oracle Database version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

16/08/10 14:01:18 INFO oracle.OraOopManagerFactory: This Oracle database is not a RAC.

16/08/10 14:01:18 INFO tool.CodeGenTool: Beginning code generation

16/08/10 14:01:18 INFO manager.SqlManager: Executing SQL statement: SELECT "ACOL1","ACOL2","ACOL3","ACOL4","ACOL5","ACOL6","ACOL7","ACOL8","ACOL9","ACOL10","ACOL11","ACOL12","ACOL13","ACOL14","ACOL15","ACOL16","ACOL17","ACOL18","ACOL19","ACOL20","ACOL21","ACOL22","ACOL23","ACOL24","ACOL25","ACOL26","ACOL27","ACOL28","ACOL29","ACOL30","ACOL31","ACOL32","ACOL33","ACOL34","ACOL35","ACOL36","ACOL37","ACOL38","ACOL39","ACOL40","ACOL41","ACOL42","ACOL43","ACOL44","ACOL45","ACOL46","ACOL47","ACOL48","ACOL49","ACOL50","ACOL51","ACOL52","ACOL53","ACOL54","ACOL55","ACOL56","ACOL57","ACOL58","ACOL59","ACOL60","ACOL61","ACOL62","ACOL63","ACOL64","ACOL65","ACOL66","ACOL67","ACOL68","ACOL69","ACOL70","ACOL71","ACOL72","ACOL73","ACOL74","ACOL75","BCOL1","BCOL2","BCOL3","BCOL4","BCOL5","BCOL6","BCOL7","BCOL8","BCOL9","BCOL10","BCOL11","BCOL12","BCOL13","BCOL14","BCOL15","BCOL16","BCOL17","BCOL18","BCOL19","BCOL20","BCOL21","BCOL22","BCOL23","BCOL24","BCOL25","BCOL26","BCOL27","BCOL28","BCOL29","BCOL30","BCOL31","BCOL32","BCOL33","BCOL34","BCOL35","BCOL36","BCOL37","BCOL38","BCOL39","BCOL40","BCOL41","BCOL42","BCOL43","BCOL44","BCOL45","BCOL46","BCOL47","BCOL48","BCOL49","BCOL50","BCOL51","BCOL52","BCOL53","BCOL54","BCOL55","BCOL56","BCOL57","BCOL58","BCOL59","BCOL60","BCOL61","BCOL62","BCOL63","BCOL64","BCOL65","BCOL66","BCOL67","BCOL68","BCOL69","BCOL70","BCOL71","BCOL72","BCOL73","BCOL74","BCOL75" FROM HDFS_2 WHERE 0=1

16/08/10 14:01:18 INFO manager.SqlManager: Executing SQL statement: SELECT "ACOL1","ACOL2","ACOL3","ACOL4","ACOL5","ACOL6","ACOL7","ACOL8","ACOL9","ACOL10","ACOL11","ACOL12","ACOL13","ACOL14","ACOL15","ACOL16","ACOL17","ACOL18","ACOL19","ACOL20","ACOL21","ACOL22","ACOL23","ACOL24","ACOL25","ACOL26","ACOL27","ACOL28","ACOL29","ACOL30","ACOL31","ACOL32","ACOL33","ACOL34","ACOL35","ACOL36","ACOL37","ACOL38","ACOL39","ACOL40","ACOL41","ACOL42","ACOL43","ACOL44","ACOL45","ACOL46","ACOL47","ACOL48","ACOL49","ACOL50","ACOL51","ACOL52","ACOL53","ACOL54","ACOL55","ACOL56","ACOL57","ACOL58","ACOL59","ACOL60","ACOL61","ACOL62","ACOL63","ACOL64","ACOL65","ACOL66","ACOL67","ACOL68","ACOL69","ACOL70","ACOL71","ACOL72","ACOL73","ACOL74","ACOL75","BCOL1","BCOL2","BCOL3","BCOL4","BCOL5","BCOL6","BCOL7","BCOL8","BCOL9","BCOL10","BCOL11","BCOL12","BCOL13","BCOL14","BCOL15","BCOL16","BCOL17","BCOL18","BCOL19","BCOL20","BCOL21","BCOL22","BCOL23","BCOL24","BCOL25","BCOL26","BCOL27","BCOL28","BCOL29","BCOL30","BCOL31","BCOL32","BCOL33","BCOL34","BCOL35","BCOL36","BCOL37","BCOL38","BCOL39","BCOL40","BCOL41","BCOL42","BCOL43","BCOL44","BCOL45","BCOL46","BCOL47","BCOL48","BCOL49","BCOL50","BCOL51","BCOL52","BCOL53","BCOL54","BCOL55","BCOL56","BCOL57","BCOL58","BCOL59","BCOL60","BCOL61","BCOL62","BCOL63","BCOL64","BCOL65","BCOL66","BCOL67","BCOL68","BCOL69","BCOL70","BCOL71","BCOL72","BCOL73","BCOL74","BCOL75" FROM "HDFS_2" WHERE 1=0

16/08/10 14:01:18 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/9152233b807a9011bb4c6752ec771805/HDFS_2.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 14:01:21 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/9152233b807a9011bb4c6752ec771805/HDFS_2.jar

16/08/10 14:01:21 INFO mapreduce.ImportJobBase: Beginning import of HDFS_2

16/08/10 14:01:22 INFO manager.SqlManager: Executing SQL statement: SELECT "ACOL1","ACOL2","ACOL3","ACOL4","ACOL5","ACOL6","ACOL7","ACOL8","ACOL9","ACOL10","ACOL11","ACOL12","ACOL13","ACOL14","ACOL15","ACOL16","ACOL17","ACOL18","ACOL19","ACOL20","ACOL21","ACOL22","ACOL23","ACOL24","ACOL25","ACOL26","ACOL27","ACOL28","ACOL29","ACOL30","ACOL31","ACOL32","ACOL33","ACOL34","ACOL35","ACOL36","ACOL37","ACOL38","ACOL39","ACOL40","ACOL41","ACOL42","ACOL43","ACOL44","ACOL45","ACOL46","ACOL47","ACOL48","ACOL49","ACOL50","ACOL51","ACOL52","ACOL53","ACOL54","ACOL55","ACOL56","ACOL57","ACOL58","ACOL59","ACOL60","ACOL61","ACOL62","ACOL63","ACOL64","ACOL65","ACOL66","ACOL67","ACOL68","ACOL69","ACOL70","ACOL71","ACOL72","ACOL73","ACOL74","ACOL75","BCOL1","BCOL2","BCOL3","BCOL4","BCOL5","BCOL6","BCOL7","BCOL8","BCOL9","BCOL10","BCOL11","BCOL12","BCOL13","BCOL14","BCOL15","BCOL16","BCOL17","BCOL18","BCOL19","BCOL20","BCOL21","BCOL22","BCOL23","BCOL24","BCOL25","BCOL26","BCOL27","BCOL28","BCOL29","BCOL30","BCOL31","BCOL32","BCOL33","BCOL34","BCOL35","BCOL36","BCOL37","BCOL38","BCOL39","BCOL40","BCOL41","BCOL42","BCOL43","BCOL44","BCOL45","BCOL46","BCOL47","BCOL48","BCOL49","BCOL50","BCOL51","BCOL52","BCOL53","BCOL54","BCOL55","BCOL56","BCOL57","BCOL58","BCOL59","BCOL60","BCOL61","BCOL62","BCOL63","BCOL64","BCOL65","BCOL66","BCOL67","BCOL68","BCOL69","BCOL70","BCOL71","BCOL72","BCOL73","BCOL74","BCOL75" FROM "HDFS_2" WHERE 1=0

16/08/10 14:01:23 INFO impl.TimelineClientImpl: Timeline service address: http://amb3.local:8188/ws/v1/timeline/

16/08/10 14:01:23 INFO client.RMProxy: Connecting to ResourceManager at amb3.local/192.168.0.143:8050

16/08/10 14:01:25 WARN oracle.OraOopUtilities: System property java.security.egd is not set to file:///dev/urandom - Oracle connections may time out.

16/08/10 14:01:25 INFO db.DBInputFormat: Using read commited transaction isolation

16/08/10 14:01:25 INFO oracle.OraOopOracleQueries: Session Time Zone set to GMT

16/08/10 14:01:25 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL :

begin 

  dbms_application_info.set_module(module_name => 'Data Connector for Oracle and Hadoop', action_name => 'import 20160810140118KST'); 

end;

16/08/10 14:01:25 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session disable parallel query

16/08/10 14:01:25 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set "_serial_direct_read"=true

16/08/10 14:01:25 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set tracefile_identifier=oraoop

16/08/10 14:01:26 INFO oracle.OraOopDataDrivenDBInputFormat: The table being imported by sqoop has 11335664 blocks that have been divided into 739 chunks which will be processed in 4 splits. The chunks will be allocated to the splits using the method : ROUNDROBIN

16/08/10 14:01:26 INFO mapreduce.JobSubmitter: number of splits:4

16/08/10 14:01:26 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470728284233_0015

16/08/10 14:01:27 INFO impl.YarnClientImpl: Submitted application application_1470728284233_0015

16/08/10 14:01:27 INFO mapreduce.Job: The url to track the job: http://amb3.local:8088/proxy/application_1470728284233_0015/

16/08/10 14:01:27 INFO mapreduce.Job: Running job: job_1470728284233_0015

16/08/10 14:01:34 INFO mapreduce.Job: Job job_1470728284233_0015 running in uber mode : false

16/08/10 14:01:34 INFO mapreduce.Job:  map 0% reduce 0%

16/08/10 14:01:55 INFO mapreduce.Job:  map 1% reduce 0%

16/08/10 14:02:10 INFO mapreduce.Job:  map 2% reduce 0%

16/08/10 14:02:29 INFO mapreduce.Job:  map 3% reduce 0%

16/08/10 14:02:46 INFO mapreduce.Job:  map 4% reduce 0%

16/08/10 14:03:01 INFO mapreduce.Job:  map 5% reduce 0%

16/08/10 14:03:19 INFO mapreduce.Job:  map 6% reduce 0%

16/08/10 14:03:39 INFO mapreduce.Job:  map 7% reduce 0%

16/08/10 14:03:57 INFO mapreduce.Job:  map 8% reduce 0%

16/08/10 14:04:17 INFO mapreduce.Job:  map 9% reduce 0%

16/08/10 14:04:39 INFO mapreduce.Job:  map 10% reduce 0%

16/08/10 14:04:56 INFO mapreduce.Job:  map 11% reduce 0%

16/08/10 14:05:15 INFO mapreduce.Job:  map 12% reduce 0%

16/08/10 14:05:32 INFO mapreduce.Job:  map 13% reduce 0%

16/08/10 14:05:53 INFO mapreduce.Job:  map 14% reduce 0%

16/08/10 14:06:10 INFO mapreduce.Job:  map 15% reduce 0%

16/08/10 14:06:29 INFO mapreduce.Job:  map 16% reduce 0%

16/08/10 14:06:54 INFO mapreduce.Job:  map 17% reduce 0%

16/08/10 14:07:12 INFO mapreduce.Job:  map 18% reduce 0%

16/08/10 14:07:30 INFO mapreduce.Job:  map 19% reduce 0%

16/08/10 14:07:51 INFO mapreduce.Job:  map 20% reduce 0%

16/08/10 14:08:21 INFO mapreduce.Job:  map 21% reduce 0%

16/08/10 14:08:36 INFO mapreduce.Job:  map 22% reduce 0%

16/08/10 14:09:01 INFO mapreduce.Job:  map 23% reduce 0%

16/08/10 14:09:20 INFO mapreduce.Job:  map 24% reduce 0%

16/08/10 14:09:37 INFO mapreduce.Job:  map 25% reduce 0%

16/08/10 14:09:50 INFO mapreduce.Job:  map 26% reduce 0%

16/08/10 14:10:15 INFO mapreduce.Job:  map 27% reduce 0%

16/08/10 14:10:30 INFO mapreduce.Job:  map 28% reduce 0%

16/08/10 14:10:55 INFO mapreduce.Job:  map 29% reduce 0%

16/08/10 14:11:18 INFO mapreduce.Job:  map 30% reduce 0%

16/08/10 14:11:33 INFO mapreduce.Job:  map 31% reduce 0%

16/08/10 14:11:54 INFO mapreduce.Job:  map 32% reduce 0%

16/08/10 14:12:09 INFO mapreduce.Job:  map 33% reduce 0%

16/08/10 14:12:27 INFO mapreduce.Job:  map 34% reduce 0%

16/08/10 14:12:49 INFO mapreduce.Job:  map 35% reduce 0%

16/08/10 14:13:02 INFO mapreduce.Job:  map 36% reduce 0%

16/08/10 14:13:32 INFO mapreduce.Job:  map 37% reduce 0%

16/08/10 14:13:53 INFO mapreduce.Job:  map 38% reduce 0%

16/08/10 14:14:12 INFO mapreduce.Job:  map 39% reduce 0%

16/08/10 14:14:29 INFO mapreduce.Job:  map 40% reduce 0%

16/08/10 14:14:48 INFO mapreduce.Job:  map 41% reduce 0%

16/08/10 14:14:54 INFO mapreduce.Job:  map 42% reduce 0%

16/08/10 14:15:15 INFO mapreduce.Job:  map 43% reduce 0%

16/08/10 14:15:31 INFO mapreduce.Job:  map 44% reduce 0%

16/08/10 14:15:43 INFO mapreduce.Job:  map 45% reduce 0%

16/08/10 14:16:06 INFO mapreduce.Job:  map 46% reduce 0%

16/08/10 14:16:13 INFO mapreduce.Job:  map 47% reduce 0%

16/08/10 14:16:37 INFO mapreduce.Job:  map 48% reduce 0%

16/08/10 14:16:43 INFO mapreduce.Job:  map 49% reduce 0%

16/08/10 14:17:11 INFO mapreduce.Job:  map 50% reduce 0%

16/08/10 14:17:27 INFO mapreduce.Job:  map 51% reduce 0%

16/08/10 14:17:49 INFO mapreduce.Job:  map 52% reduce 0%

16/08/10 14:17:55 INFO mapreduce.Job:  map 53% reduce 0%

16/08/10 14:18:25 INFO mapreduce.Job:  map 54% reduce 0%

16/08/10 14:18:43 INFO mapreduce.Job:  map 55% reduce 0%

16/08/10 14:18:53 INFO mapreduce.Job:  map 56% reduce 0%

16/08/10 14:19:16 INFO mapreduce.Job:  map 57% reduce 0%

16/08/10 14:19:40 INFO mapreduce.Job:  map 58% reduce 0%

16/08/10 14:19:47 INFO mapreduce.Job:  map 59% reduce 0%

16/08/10 14:20:14 INFO mapreduce.Job:  map 60% reduce 0%

16/08/10 14:20:22 INFO mapreduce.Job:  map 61% reduce 0%

16/08/10 14:20:42 INFO mapreduce.Job:  map 62% reduce 0%

16/08/10 14:20:55 INFO mapreduce.Job:  map 63% reduce 0%

16/08/10 14:21:15 INFO mapreduce.Job:  map 64% reduce 0%

16/08/10 14:21:29 INFO mapreduce.Job:  map 65% reduce 0%

16/08/10 14:21:53 INFO mapreduce.Job:  map 66% reduce 0%

16/08/10 14:22:23 INFO mapreduce.Job:  map 67% reduce 0%

16/08/10 14:22:35 INFO mapreduce.Job:  map 68% reduce 0%

16/08/10 14:23:00 INFO mapreduce.Job:  map 69% reduce 0%

16/08/10 14:23:18 INFO mapreduce.Job:  map 70% reduce 0%

16/08/10 14:23:34 INFO mapreduce.Job:  map 71% reduce 0%

16/08/10 14:23:55 INFO mapreduce.Job:  map 72% reduce 0%

16/08/10 14:24:15 INFO mapreduce.Job:  map 73% reduce 0%

16/08/10 14:24:31 INFO mapreduce.Job:  map 74% reduce 0%

16/08/10 14:24:54 INFO mapreduce.Job:  map 75% reduce 0%

16/08/10 14:25:14 INFO mapreduce.Job:  map 76% reduce 0%

16/08/10 14:25:56 INFO mapreduce.Job:  map 77% reduce 0%

16/08/10 14:26:17 INFO mapreduce.Job:  map 78% reduce 0%

16/08/10 14:26:37 INFO mapreduce.Job:  map 79% reduce 0%

16/08/10 14:27:25 INFO mapreduce.Job:  map 80% reduce 0%

16/08/10 14:27:39 INFO mapreduce.Job:  map 81% reduce 0%

16/08/10 14:28:30 INFO mapreduce.Job:  map 82% reduce 0%

16/08/10 14:28:58 INFO mapreduce.Job:  map 83% reduce 0%

16/08/10 14:29:41 INFO mapreduce.Job:  map 84% reduce 0%

16/08/10 14:29:59 INFO mapreduce.Job:  map 85% reduce 0%

16/08/10 14:31:00 INFO mapreduce.Job:  map 86% reduce 0%

16/08/10 14:31:39 INFO mapreduce.Job:  map 87% reduce 0%

16/08/10 14:32:11 INFO mapreduce.Job:  map 88% reduce 0%

16/08/10 14:32:35 INFO mapreduce.Job:  map 89% reduce 0%

16/08/10 14:33:04 INFO mapreduce.Job:  map 90% reduce 0%

16/08/10 14:33:51 INFO mapreduce.Job:  map 91% reduce 0%

16/08/10 14:34:45 INFO mapreduce.Job:  map 92% reduce 0%

16/08/10 14:35:14 INFO mapreduce.Job:  map 93% reduce 0%

16/08/10 14:35:44 INFO mapreduce.Job:  map 94% reduce 0%

16/08/10 14:36:16 INFO mapreduce.Job:  map 95% reduce 0%

16/08/10 14:37:08 INFO mapreduce.Job:  map 96% reduce 0%

16/08/10 14:37:44 INFO mapreduce.Job:  map 97% reduce 0%

16/08/10 14:38:15 INFO mapreduce.Job:  map 98% reduce 0%

16/08/10 14:38:45 INFO mapreduce.Job:  map 99% reduce 0%

16/08/10 14:39:41 INFO mapreduce.Job:  map 100% reduce 0%

16/08/10 14:40:05 INFO mapreduce.Job: Job job_1470728284233_0015 completed successfully

16/08/10 14:40:05 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=619112

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=66021

HDFS: Number of bytes written=81583507500

HDFS: Number of read operations=16

HDFS: Number of large read operations=0

HDFS: Number of write operations=8

Job Counters 

Launched map tasks=4

Other local map tasks=4

Total time spent by all maps in occupied slots (ms)=17659492

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=8829746

Total vcore-seconds taken by all map tasks=8829746

Total megabyte-seconds taken by all map tasks=13562489856

Map-Reduce Framework

Map input records=5385050

Map output records=5385050

Input split bytes=66021

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=52264

CPU time spent (ms)=2221300

Physical memory (bytes) snapshot=1561075712

Virtual memory (bytes) snapshot=13249417216

Total committed heap usage (bytes)=847249408

File Input Format Counters 

Bytes Read=0

File Output Format Counters 

Bytes Written=81583507500

16/08/10 14:40:05 INFO mapreduce.ImportJobBase: Transferred 75.9806 GB in 2,322.6017 seconds (33.4987 MB/sec)

16/08/10 14:40:05 INFO mapreduce.ImportJobBase: Retrieved 5385050 records.

[hdfs@amb2 ~]$ 




Case 2. Snappy compression + text plain

sqoop import --target-dir=/dev/data2_sn_txt --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec --table HDFS_2 -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1


Execution result

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/data2_sn_txt --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec --table HDFS_2 -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1

16/08/10 14:52:34 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 14:52:34 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 14:52:34 INFO manager.SqlManager: Using default fetchSize of 1000

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 14:52:35 INFO oracle.OraOopManagerFactory: 

**************************************************

*** Using Data Connector for Oracle and Hadoop ***

**************************************************

16/08/10 14:52:36 INFO oracle.OraOopManagerFactory: Oracle Database version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

16/08/10 14:52:36 INFO oracle.OraOopManagerFactory: This Oracle database is not a RAC.

16/08/10 14:52:36 INFO tool.CodeGenTool: Beginning code generation

16/08/10 14:52:36 INFO manager.SqlManager: Executing SQL statement: SELECT "ACOL1","ACOL2","ACOL3","ACOL4","ACOL5","ACOL6","ACOL7","ACOL8","ACOL9","ACOL10","ACOL11","ACOL12","ACOL13","ACOL14","ACOL15","ACOL16","ACOL17","ACOL18","ACOL19","ACOL20","ACOL21","ACOL22","ACOL23","ACOL24","ACOL25","ACOL26","ACOL27","ACOL28","ACOL29","ACOL30","ACOL31","ACOL32","ACOL33","ACOL34","ACOL35","ACOL36","ACOL37","ACOL38","ACOL39","ACOL40","ACOL41","ACOL42","ACOL43","ACOL44","ACOL45","ACOL46","ACOL47","ACOL48","ACOL49","ACOL50","ACOL51","ACOL52","ACOL53","ACOL54","ACOL55","ACOL56","ACOL57","ACOL58","ACOL59","ACOL60","ACOL61","ACOL62","ACOL63","ACOL64","ACOL65","ACOL66","ACOL67","ACOL68","ACOL69","ACOL70","ACOL71","ACOL72","ACOL73","ACOL74","ACOL75","BCOL1","BCOL2","BCOL3","BCOL4","BCOL5","BCOL6","BCOL7","BCOL8","BCOL9","BCOL10","BCOL11","BCOL12","BCOL13","BCOL14","BCOL15","BCOL16","BCOL17","BCOL18","BCOL19","BCOL20","BCOL21","BCOL22","BCOL23","BCOL24","BCOL25","BCOL26","BCOL27","BCOL28","BCOL29","BCOL30","BCOL31","BCOL32","BCOL33","BCOL34","BCOL35","BCOL36","BCOL37","BCOL38","BCOL39","BCOL40","BCOL41","BCOL42","BCOL43","BCOL44","BCOL45","BCOL46","BCOL47","BCOL48","BCOL49","BCOL50","BCOL51","BCOL52","BCOL53","BCOL54","BCOL55","BCOL56","BCOL57","BCOL58","BCOL59","BCOL60","BCOL61","BCOL62","BCOL63","BCOL64","BCOL65","BCOL66","BCOL67","BCOL68","BCOL69","BCOL70","BCOL71","BCOL72","BCOL73","BCOL74","BCOL75" FROM HDFS_2 WHERE 0=1

16/08/10 14:52:36 INFO manager.SqlManager: Executing SQL statement: SELECT "ACOL1","ACOL2","ACOL3","ACOL4","ACOL5","ACOL6","ACOL7","ACOL8","ACOL9","ACOL10","ACOL11","ACOL12","ACOL13","ACOL14","ACOL15","ACOL16","ACOL17","ACOL18","ACOL19","ACOL20","ACOL21","ACOL22","ACOL23","ACOL24","ACOL25","ACOL26","ACOL27","ACOL28","ACOL29","ACOL30","ACOL31","ACOL32","ACOL33","ACOL34","ACOL35","ACOL36","ACOL37","ACOL38","ACOL39","ACOL40","ACOL41","ACOL42","ACOL43","ACOL44","ACOL45","ACOL46","ACOL47","ACOL48","ACOL49","ACOL50","ACOL51","ACOL52","ACOL53","ACOL54","ACOL55","ACOL56","ACOL57","ACOL58","ACOL59","ACOL60","ACOL61","ACOL62","ACOL63","ACOL64","ACOL65","ACOL66","ACOL67","ACOL68","ACOL69","ACOL70","ACOL71","ACOL72","ACOL73","ACOL74","ACOL75","BCOL1","BCOL2","BCOL3","BCOL4","BCOL5","BCOL6","BCOL7","BCOL8","BCOL9","BCOL10","BCOL11","BCOL12","BCOL13","BCOL14","BCOL15","BCOL16","BCOL17","BCOL18","BCOL19","BCOL20","BCOL21","BCOL22","BCOL23","BCOL24","BCOL25","BCOL26","BCOL27","BCOL28","BCOL29","BCOL30","BCOL31","BCOL32","BCOL33","BCOL34","BCOL35","BCOL36","BCOL37","BCOL38","BCOL39","BCOL40","BCOL41","BCOL42","BCOL43","BCOL44","BCOL45","BCOL46","BCOL47","BCOL48","BCOL49","BCOL50","BCOL51","BCOL52","BCOL53","BCOL54","BCOL55","BCOL56","BCOL57","BCOL58","BCOL59","BCOL60","BCOL61","BCOL62","BCOL63","BCOL64","BCOL65","BCOL66","BCOL67","BCOL68","BCOL69","BCOL70","BCOL71","BCOL72","BCOL73","BCOL74","BCOL75" FROM "HDFS_2" WHERE 1=0

16/08/10 14:52:36 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/296e19dcfe9895e28bbc77da203ad202/HDFS_2.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 14:52:39 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/296e19dcfe9895e28bbc77da203ad202/HDFS_2.jar

16/08/10 14:52:39 INFO mapreduce.ImportJobBase: Beginning import of HDFS_2

16/08/10 14:52:40 INFO manager.SqlManager: Executing SQL statement: SELECT "ACOL1","ACOL2","ACOL3","ACOL4","ACOL5","ACOL6","ACOL7","ACOL8","ACOL9","ACOL10","ACOL11","ACOL12","ACOL13","ACOL14","ACOL15","ACOL16","ACOL17","ACOL18","ACOL19","ACOL20","ACOL21","ACOL22","ACOL23","ACOL24","ACOL25","ACOL26","ACOL27","ACOL28","ACOL29","ACOL30","ACOL31","ACOL32","ACOL33","ACOL34","ACOL35","ACOL36","ACOL37","ACOL38","ACOL39","ACOL40","ACOL41","ACOL42","ACOL43","ACOL44","ACOL45","ACOL46","ACOL47","ACOL48","ACOL49","ACOL50","ACOL51","ACOL52","ACOL53","ACOL54","ACOL55","ACOL56","ACOL57","ACOL58","ACOL59","ACOL60","ACOL61","ACOL62","ACOL63","ACOL64","ACOL65","ACOL66","ACOL67","ACOL68","ACOL69","ACOL70","ACOL71","ACOL72","ACOL73","ACOL74","ACOL75","BCOL1","BCOL2","BCOL3","BCOL4","BCOL5","BCOL6","BCOL7","BCOL8","BCOL9","BCOL10","BCOL11","BCOL12","BCOL13","BCOL14","BCOL15","BCOL16","BCOL17","BCOL18","BCOL19","BCOL20","BCOL21","BCOL22","BCOL23","BCOL24","BCOL25","BCOL26","BCOL27","BCOL28","BCOL29","BCOL30","BCOL31","BCOL32","BCOL33","BCOL34","BCOL35","BCOL36","BCOL37","BCOL38","BCOL39","BCOL40","BCOL41","BCOL42","BCOL43","BCOL44","BCOL45","BCOL46","BCOL47","BCOL48","BCOL49","BCOL50","BCOL51","BCOL52","BCOL53","BCOL54","BCOL55","BCOL56","BCOL57","BCOL58","BCOL59","BCOL60","BCOL61","BCOL62","BCOL63","BCOL64","BCOL65","BCOL66","BCOL67","BCOL68","BCOL69","BCOL70","BCOL71","BCOL72","BCOL73","BCOL74","BCOL75" FROM "HDFS_2" WHERE 1=0

16/08/10 14:52:42 INFO impl.TimelineClientImpl: Timeline service address: http://amb3.local:8188/ws/v1/timeline/

16/08/10 14:52:42 INFO client.RMProxy: Connecting to ResourceManager at amb3.local/192.168.0.143:8050

16/08/10 14:52:44 WARN oracle.OraOopUtilities: System property java.security.egd is not set to file:///dev/urandom - Oracle connections may time out.

16/08/10 14:52:44 INFO db.DBInputFormat: Using read commited transaction isolation

16/08/10 14:52:44 INFO oracle.OraOopOracleQueries: Session Time Zone set to GMT

16/08/10 14:52:44 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL :

begin 

  dbms_application_info.set_module(module_name => 'Data Connector for Oracle and Hadoop', action_name => 'import 20160810145235KST'); 

end;

16/08/10 14:52:44 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session disable parallel query

16/08/10 14:52:44 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set "_serial_direct_read"=true

16/08/10 14:52:44 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set tracefile_identifier=oraoop

16/08/10 14:52:45 INFO oracle.OraOopDataDrivenDBInputFormat: The table being imported by sqoop has 11335664 blocks that have been divided into 739 chunks which will be processed in 4 splits. The chunks will be allocated to the splits using the method : ROUNDROBIN

16/08/10 14:52:45 INFO mapreduce.JobSubmitter: number of splits:4

16/08/10 14:52:45 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470728284233_0017

16/08/10 14:52:46 INFO impl.YarnClientImpl: Submitted application application_1470728284233_0017

16/08/10 14:52:46 INFO mapreduce.Job: The url to track the job: http://amb3.local:8088/proxy/application_1470728284233_0017/

16/08/10 14:52:46 INFO mapreduce.Job: Running job: job_1470728284233_0017

16/08/10 14:52:54 INFO mapreduce.Job: Job job_1470728284233_0017 running in uber mode : false

16/08/10 14:52:54 INFO mapreduce.Job:  map 0% reduce 0%

16/08/10 14:53:16 INFO mapreduce.Job:  map 1% reduce 0%

16/08/10 14:53:28 INFO mapreduce.Job:  map 2% reduce 0%

16/08/10 14:53:43 INFO mapreduce.Job:  map 3% reduce 0%

16/08/10 14:53:58 INFO mapreduce.Job:  map 4% reduce 0%

16/08/10 14:54:15 INFO mapreduce.Job:  map 5% reduce 0%

16/08/10 14:54:30 INFO mapreduce.Job:  map 6% reduce 0%

16/08/10 14:54:58 INFO mapreduce.Job:  map 7% reduce 0%

16/08/10 14:55:44 INFO mapreduce.Job:  map 8% reduce 0%

16/08/10 14:56:31 INFO mapreduce.Job:  map 9% reduce 0%

16/08/10 14:57:03 INFO mapreduce.Job:  map 10% reduce 0%

16/08/10 14:57:31 INFO mapreduce.Job:  map 11% reduce 0%

16/08/10 14:58:06 INFO mapreduce.Job:  map 12% reduce 0%

16/08/10 14:58:45 INFO mapreduce.Job:  map 13% reduce 0%

16/08/10 14:59:17 INFO mapreduce.Job:  map 14% reduce 0%

16/08/10 14:59:57 INFO mapreduce.Job:  map 15% reduce 0%

16/08/10 15:00:32 INFO mapreduce.Job:  map 16% reduce 0%

16/08/10 15:01:24 INFO mapreduce.Job:  map 17% reduce 0%

16/08/10 15:02:05 INFO mapreduce.Job:  map 18% reduce 0%

16/08/10 15:02:36 INFO mapreduce.Job:  map 19% reduce 0%

16/08/10 15:03:16 INFO mapreduce.Job:  map 20% reduce 0%

16/08/10 15:03:45 INFO mapreduce.Job:  map 21% reduce 0%

16/08/10 15:04:30 INFO mapreduce.Job:  map 22% reduce 0%

16/08/10 15:04:58 INFO mapreduce.Job:  map 23% reduce 0%

16/08/10 15:05:30 INFO mapreduce.Job:  map 24% reduce 0%

16/08/10 15:06:05 INFO mapreduce.Job:  map 25% reduce 0%

16/08/10 15:06:54 INFO mapreduce.Job:  map 26% reduce 0%

16/08/10 15:07:24 INFO mapreduce.Job:  map 27% reduce 0%

16/08/10 15:08:00 INFO mapreduce.Job:  map 28% reduce 0%

16/08/10 15:08:44 INFO mapreduce.Job:  map 29% reduce 0%

16/08/10 15:09:11 INFO mapreduce.Job:  map 30% reduce 0%

16/08/10 15:09:55 INFO mapreduce.Job:  map 31% reduce 0%

16/08/10 15:10:25 INFO mapreduce.Job:  map 32% reduce 0%

16/08/10 15:11:40 INFO mapreduce.Job:  map 33% reduce 0%

16/08/10 15:12:08 INFO mapreduce.Job:  map 34% reduce 0%

16/08/10 15:12:42 INFO mapreduce.Job:  map 35% reduce 0%

16/08/10 15:13:30 INFO mapreduce.Job:  map 36% reduce 0%

16/08/10 15:14:05 INFO mapreduce.Job:  map 37% reduce 0%

16/08/10 15:14:29 INFO mapreduce.Job:  map 38% reduce 0%

16/08/10 15:14:59 INFO mapreduce.Job:  map 39% reduce 0%

16/08/10 15:15:44 INFO mapreduce.Job:  map 40% reduce 0%

16/08/10 15:16:12 INFO mapreduce.Job:  map 41% reduce 0%

16/08/10 15:17:00 INFO mapreduce.Job:  map 42% reduce 0%

16/08/10 15:17:35 INFO mapreduce.Job:  map 43% reduce 0%

16/08/10 15:18:00 INFO mapreduce.Job:  map 44% reduce 0%

16/08/10 15:18:45 INFO mapreduce.Job:  map 45% reduce 0%

16/08/10 15:19:26 INFO mapreduce.Job:  map 46% reduce 0%

16/08/10 15:19:52 INFO mapreduce.Job:  map 47% reduce 0%

16/08/10 15:20:42 INFO mapreduce.Job:  map 48% reduce 0%

16/08/10 15:21:11 INFO mapreduce.Job:  map 49% reduce 0%

16/08/10 15:21:52 INFO mapreduce.Job:  map 50% reduce 0%

16/08/10 15:22:37 INFO mapreduce.Job:  map 51% reduce 0%

16/08/10 15:23:21 INFO mapreduce.Job:  map 52% reduce 0%

16/08/10 15:23:56 INFO mapreduce.Job:  map 53% reduce 0%

16/08/10 15:24:33 INFO mapreduce.Job:  map 54% reduce 0%

16/08/10 15:25:10 INFO mapreduce.Job:  map 55% reduce 0%

16/08/10 15:25:36 INFO mapreduce.Job:  map 56% reduce 0%

16/08/10 15:26:14 INFO mapreduce.Job:  map 57% reduce 0%

16/08/10 15:26:58 INFO mapreduce.Job:  map 58% reduce 0%

16/08/10 15:27:40 INFO mapreduce.Job:  map 59% reduce 0%

16/08/10 15:28:18 INFO mapreduce.Job:  map 60% reduce 0%

16/08/10 15:28:49 INFO mapreduce.Job:  map 61% reduce 0%

16/08/10 15:29:36 INFO mapreduce.Job:  map 62% reduce 0%

16/08/10 15:30:08 INFO mapreduce.Job:  map 63% reduce 0%

16/08/10 15:30:37 INFO mapreduce.Job:  map 64% reduce 0%

16/08/10 15:31:06 INFO mapreduce.Job:  map 65% reduce 0%

16/08/10 15:32:05 INFO mapreduce.Job:  map 66% reduce 0%

16/08/10 15:32:39 INFO mapreduce.Job:  map 67% reduce 0%

16/08/10 15:33:16 INFO mapreduce.Job:  map 68% reduce 0%

16/08/10 15:33:53 INFO mapreduce.Job:  map 69% reduce 0%

16/08/10 15:34:26 INFO mapreduce.Job:  map 70% reduce 0%

16/08/10 15:34:59 INFO mapreduce.Job:  map 71% reduce 0%

16/08/10 15:35:32 INFO mapreduce.Job:  map 72% reduce 0%

16/08/10 15:36:26 INFO mapreduce.Job:  map 73% reduce 0%

16/08/10 15:36:59 INFO mapreduce.Job:  map 74% reduce 0%

16/08/10 15:37:40 INFO mapreduce.Job:  map 75% reduce 0%

16/08/10 15:37:47 INFO mapreduce.Job:  map 76% reduce 0%

16/08/10 15:38:28 INFO mapreduce.Job:  map 77% reduce 0%

16/08/10 15:39:16 INFO mapreduce.Job:  map 78% reduce 0%

16/08/10 15:39:43 INFO mapreduce.Job:  map 79% reduce 0%

16/08/10 15:40:12 INFO mapreduce.Job:  map 80% reduce 0%

16/08/10 15:40:43 INFO mapreduce.Job:  map 81% reduce 0%

16/08/10 15:41:14 INFO mapreduce.Job:  map 82% reduce 0%

16/08/10 15:42:01 INFO mapreduce.Job:  map 83% reduce 0%

16/08/10 15:42:30 INFO mapreduce.Job:  map 84% reduce 0%

16/08/10 15:43:24 INFO mapreduce.Job:  map 85% reduce 0%

16/08/10 15:43:51 INFO mapreduce.Job:  map 86% reduce 0%

16/08/10 15:44:27 INFO mapreduce.Job:  map 87% reduce 0%

16/08/10 15:44:54 INFO mapreduce.Job:  map 88% reduce 0%

16/08/10 15:45:31 INFO mapreduce.Job:  map 89% reduce 0%

16/08/10 15:45:53 INFO mapreduce.Job:  map 90% reduce 0%

16/08/10 15:46:48 INFO mapreduce.Job:  map 91% reduce 0%

16/08/10 15:47:23 INFO mapreduce.Job:  map 92% reduce 0%

16/08/10 15:48:02 INFO mapreduce.Job:  map 93% reduce 0%

16/08/10 15:48:36 INFO mapreduce.Job:  map 94% reduce 0%

16/08/10 15:49:07 INFO mapreduce.Job:  map 95% reduce 0%

16/08/10 15:49:43 INFO mapreduce.Job:  map 96% reduce 0%

16/08/10 15:50:10 INFO mapreduce.Job:  map 97% reduce 0%

16/08/10 15:51:10 INFO mapreduce.Job:  map 98% reduce 0%

16/08/10 15:51:54 INFO mapreduce.Job:  map 99% reduce 0%

16/08/10 15:52:49 INFO mapreduce.Job:  map 100% reduce 0%

16/08/10 15:53:13 INFO mapreduce.Job: Job job_1470728284233_0017 completed successfully

16/08/10 15:53:13 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=619092

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=66021

HDFS: Number of bytes written=81593058054

HDFS: Number of read operations=16

HDFS: Number of large read operations=0

HDFS: Number of write operations=8

Job Counters 

Launched map tasks=4

Other local map tasks=4

Total time spent by all maps in occupied slots (ms)=27988674

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=13994337

Total vcore-seconds taken by all map tasks=13994337

Total megabyte-seconds taken by all map tasks=21495301632

Map-Reduce Framework

Map input records=5385050

Map output records=5385050

Input split bytes=66021

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=174416

CPU time spent (ms)=3340330

Physical memory (bytes) snapshot=1647304704

Virtual memory (bytes) snapshot=13287915520

Total committed heap usage (bytes)=879755264

File Input Format Counters 

Bytes Read=0

File Output Format Counters 

Bytes Written=81593058054

16/08/10 15:53:13 INFO mapreduce.ImportJobBase: Transferred 75.9895 GB in 3,632.0301 seconds (21.4242 MB/sec)

16/08/10 15:53:13 INFO mapreduce.ImportJobBase: Retrieved 5385050 records.

[hdfs@amb2 ~]$
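
Assuming the job above it is the uncompressed text import of the same HDFS_2 table, comparing the two runs is telling: the uncompressed import wrote 81,583,507,500 bytes in about 2,322 seconds, while the Snappy run wrote 81,593,058,054 bytes in about 3,632 seconds - slightly more data, noticeably slower. A quick way to double-check the on-disk totals is hdfs dfs -du; the uncompressed directory name below is a hypothetical placeholder, since that command isn't shown here.

# Sketch: compare the total size of the two import directories
# (/dev/data2_nc_txt is an assumed name for the uncompressed target dir)
hdfs dfs -du -s -h /dev/data2_nc_txt /dev/data2_sn_txt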


Case 3. Snappy compression + parquet

sqoop import --target-dir=/dev/data4_sn_pq --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec --table HDFS_2 -direct --as-parquetfile --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1


Execution result

Failed. (Exactly what happened is something only I should know;;; it was horrific;;;)

- Decided to retry with 500,000 rows.
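
For that retry, a minimal sketch of how one might carve out a 500,000-row subset on the Oracle side (the table name HDFS_2_500K and the abbreviated connect string are illustrative, not from the original test):

# Build a 500k-row copy of HDFS_2 to re-run the parquet import against
sqlplus flashone@ORCL <<'EOF'
CREATE TABLE HDFS_2_500K AS
SELECT * FROM HDFS_2 WHERE ROWNUM <= 500000;
EOF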





The four file formats available when loading into hadoop with sqoop


* When moving data from Oracle into Hadoop, I had been using Flume because the loads had to happen in real time,

but batch-style jobs don't really need Flume.

On the project I'm on now, I used sqoop to load the data into hadoop.


Sqoop posed no real difficulty; driving the transfers with shell commands was, personally, very convenient to use. Whether it can be customized the way Flume can, I don't yet know.


The source is an ordinary HEAP TABLE in ORACLE.

After creating the test table, I inserted 1,000 rows of temporary data.


CREATE TABLE HDFS_4 

 (

  ID VARCHAR(100),

  NUM NUMBER(10),

  TEST VARCHAR(100),

  REG_DT DATE DEFAULT SYSDATE 

);


INSERT INTO HDFS_4

SELECT 'USR_'||Dbms_Random.string('A',5),

       Trunc(Dbms_Random.Value(10000,90000)),

       Dbms_Random.string('A',100),

       SYSDATE + To_Number(Dbms_Random.Value(1,30)) 

FROM DUAL

CONNECT BY LEVEL <= 1000;








1. text plain 


Command executed

sqoop import --target-dir=/dev/test2_text --query='select *from HDFS_4 where $CONDITIONS' -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ID


Execution result

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/test2_text --query='select *from HDFS_4 where $CONDITIONS' -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ID

16/08/10 09:09:28 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 09:09:28 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 09:09:28 INFO manager.SqlManager: Using default fetchSize of 1000

16/08/10 09:09:28 INFO tool.CodeGenTool: Beginning code generation

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 09:09:29 INFO manager.OracleManager: Time zone has been set to GMT

16/08/10 09:09:29 INFO manager.SqlManager: Executing SQL statement: select *from HDFS_4 where  (1 = 0) 

16/08/10 09:09:29 INFO manager.SqlManager: Executing SQL statement: select *from HDFS_4 where  (1 = 0) 

16/08/10 09:09:30 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/284d77af5917fa2113d961ae72341cc4/QueryResult.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 09:09:31 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/284d77af5917fa2113d961ae72341cc4/QueryResult.jar

16/08/10 09:09:31 INFO mapreduce.ImportJobBase: Beginning query import.

16/08/10 09:09:33 INFO impl.TimelineClientImpl: Timeline service address: http://amb3.local:8188/ws/v1/timeline/

16/08/10 09:09:33 INFO client.RMProxy: Connecting to ResourceManager at amb3.local/192.168.0.143:8050

16/08/10 09:09:35 INFO db.DBInputFormat: Using read commited transaction isolation

16/08/10 09:09:35 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(ID), MAX(ID) FROM (select *from HDFS_4 where  (1 = 1) ) t1

16/08/10 09:09:35 WARN db.TextSplitter: Generating splits for a textual index column.

16/08/10 09:09:35 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.

16/08/10 09:09:35 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.

16/08/10 09:09:35 INFO mapreduce.JobSubmitter: number of splits:4

16/08/10 09:09:35 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470728284233_0007

16/08/10 09:09:35 INFO impl.YarnClientImpl: Submitted application application_1470728284233_0007

16/08/10 09:09:35 INFO mapreduce.Job: The url to track the job: http://amb3.local:8088/proxy/application_1470728284233_0007/

16/08/10 09:09:35 INFO mapreduce.Job: Running job: job_1470728284233_0007

16/08/10 09:09:45 INFO mapreduce.Job: Job job_1470728284233_0007 running in uber mode : false

16/08/10 09:09:45 INFO mapreduce.Job:  map 0% reduce 0%

16/08/10 09:09:53 INFO mapreduce.Job:  map 25% reduce 0%

16/08/10 09:09:55 INFO mapreduce.Job:  map 75% reduce 0%

16/08/10 09:09:56 INFO mapreduce.Job:  map 100% reduce 0%

16/08/10 09:09:56 INFO mapreduce.Job: Job job_1470728284233_0007 completed successfully

16/08/10 09:09:56 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=606092

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=489

HDFS: Number of bytes written=139000

HDFS: Number of read operations=16

HDFS: Number of large read operations=0

HDFS: Number of write operations=8

Job Counters 

Launched map tasks=4

Other local map tasks=4

Total time spent by all maps in occupied slots (ms)=57580

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=28790

Total vcore-seconds taken by all map tasks=28790

Total megabyte-seconds taken by all map tasks=44221440

Map-Reduce Framework

Map input records=1000

Map output records=1000

Input split bytes=489

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=2674

CPU time spent (ms)=38860

Physical memory (bytes) snapshot=1292296192

Virtual memory (bytes) snapshot=13235789824

Total committed heap usage (bytes)=689438720

File Input Format Counters 

Bytes Read=0

File Output Format Counters 

Bytes Written=139000

16/08/10 09:09:56 INFO mapreduce.ImportJobBase: Transferred 135.7422 KB in 23.6533 seconds (5.7388 KB/sec)

16/08/10 09:09:56 INFO mapreduce.ImportJobBase: Retrieved 1000 records.

[hdfs@amb2 ~]$ 



Viewing it with cat, all of the content is readable.

You can see that the columns are separated by commas.

It is stored in the same form as CSV.
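
A quick way to peek at the result (minimal sketch; the part-m-00000 name assumes Sqoop's default one-file-per-map-task naming):

# Show the first few imported rows from the first map task's output file
hdfs dfs -cat /dev/test2_text/part-m-00000 | head -5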



2. AVRO 


Command executed

sqoop import --target-dir=/dev/test2_avro --query='select *from HDFS_4 where $CONDITIONS' --as-avrodatafile  -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ID


Execution result

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/test2_avro --query='select *from HDFS_4 where $CONDITIONS' --as-avrodatafile  -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ID

16/08/10 09:15:04 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 09:15:04 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 09:15:04 INFO manager.SqlManager: Using default fetchSize of 1000

16/08/10 09:15:04 INFO tool.CodeGenTool: Beginning code generation

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 09:15:05 INFO manager.OracleManager: Time zone has been set to GMT

16/08/10 09:15:05 INFO manager.SqlManager: Executing SQL statement: select *from HDFS_4 where  (1 = 0) 

16/08/10 09:15:05 INFO manager.SqlManager: Executing SQL statement: select *from HDFS_4 where  (1 = 0) 

16/08/10 09:15:05 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/837dbbb2e304900b1151d2fa6186b0b7/QueryResult.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 09:15:07 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/837dbbb2e304900b1151d2fa6186b0b7/QueryResult.jar

16/08/10 09:15:07 INFO mapreduce.ImportJobBase: Beginning query import.

16/08/10 09:15:08 INFO manager.OracleManager: Time zone has been set to GMT

16/08/10 09:15:08 INFO manager.SqlManager: Executing SQL statement: select *from HDFS_4 where  (1 = 0) 

16/08/10 09:15:08 INFO manager.SqlManager: Executing SQL statement: select *from HDFS_4 where  (1 = 0) 

16/08/10 09:15:08 INFO mapreduce.DataDrivenImportJob: Writing Avro schema file: /tmp/sqoop-hdfs/compile/837dbbb2e304900b1151d2fa6186b0b7/QueryResult.avsc

16/08/10 09:15:08 INFO impl.TimelineClientImpl: Timeline service address: http://amb3.local:8188/ws/v1/timeline/

16/08/10 09:15:08 INFO client.RMProxy: Connecting to ResourceManager at amb3.local/192.168.0.143:8050

16/08/10 09:15:10 INFO db.DBInputFormat: Using read commited transaction isolation

16/08/10 09:15:10 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(ID), MAX(ID) FROM (select *from HDFS_4 where  (1 = 1) ) t1

16/08/10 09:15:10 WARN db.TextSplitter: Generating splits for a textual index column.

16/08/10 09:15:10 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.

16/08/10 09:15:10 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.

16/08/10 09:15:10 INFO mapreduce.JobSubmitter: number of splits:4

16/08/10 09:15:10 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470728284233_0008

16/08/10 09:15:11 INFO impl.YarnClientImpl: Submitted application application_1470728284233_0008

16/08/10 09:15:11 INFO mapreduce.Job: The url to track the job: http://amb3.local:8088/proxy/application_1470728284233_0008/

16/08/10 09:15:11 INFO mapreduce.Job: Running job: job_1470728284233_0008

16/08/10 09:15:18 INFO mapreduce.Job: Job job_1470728284233_0008 running in uber mode : false

16/08/10 09:15:18 INFO mapreduce.Job:  map 0% reduce 0%

16/08/10 09:15:26 INFO mapreduce.Job:  map 25% reduce 0%

16/08/10 09:15:27 INFO mapreduce.Job:  map 100% reduce 0%

16/08/10 09:15:28 INFO mapreduce.Job: Job job_1470728284233_0008 completed successfully

16/08/10 09:15:28 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=607148

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=489

HDFS: Number of bytes written=130230

HDFS: Number of read operations=16

HDFS: Number of large read operations=0

HDFS: Number of write operations=8

Job Counters 

Launched map tasks=4

Other local map tasks=4

Total time spent by all maps in occupied slots (ms)=49984

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=24992

Total vcore-seconds taken by all map tasks=24992

Total megabyte-seconds taken by all map tasks=38387712

Map-Reduce Framework

Map input records=1000

Map output records=1000

Input split bytes=489

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=1696

CPU time spent (ms)=38110

Physical memory (bytes) snapshot=1327677440

Virtual memory (bytes) snapshot=13322395648

Total committed heap usage (bytes)=721420288

File Input Format Counters 

Bytes Read=0

File Output Format Counters 

Bytes Written=130230

16/08/10 09:15:28 INFO mapreduce.ImportJobBase: Transferred 127.1777 KB in 20.4259 seconds (6.2263 KB/sec)

16/08/10 09:15:28 INFO mapreduce.ImportJobBase: Retrieved 1000 records.

[hdfs@amb2 ~]$ 



Viewing it with cat, the area around the delimiters looks partially garbled.
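
That is expected - Avro is a binary container format, so cat only shows the embedded schema plus raw bytes. To actually read the records, avro-tools can decode them to JSON (a sketch; the jar version and the .avro file name are assumptions about what's on the machine):

# Pull one of the avro part files locally and dump its records as JSON
hdfs dfs -get /dev/test2_avro/part-m-00000.avro .
java -jar avro-tools-1.7.7.jar tojson part-m-00000.avro | head -5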



3. Sequence file


Command executed

sqoop import --target-dir=/dev/test2_seq --query='select *from HDFS_4 where $CONDITIONS' --as-sequencefile  -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ID


Execution result

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/test2_seq --query='select *from HDFS_4 where $CONDITIONS' --as-sequencefile  -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ID

16/08/10 09:22:31 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 09:22:31 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 09:22:31 INFO manager.SqlManager: Using default fetchSize of 1000

16/08/10 09:22:31 INFO tool.CodeGenTool: Beginning code generation

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 09:22:32 INFO manager.OracleManager: Time zone has been set to GMT

16/08/10 09:22:32 INFO manager.SqlManager: Executing SQL statement: select *from HDFS_4 where  (1 = 0) 

16/08/10 09:22:32 INFO manager.SqlManager: Executing SQL statement: select *from HDFS_4 where  (1 = 0) 

16/08/10 09:22:32 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/3a0d3133347de2dccea51b1c74f948bd/QueryResult.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 09:22:34 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/3a0d3133347de2dccea51b1c74f948bd/QueryResult.jar

16/08/10 09:22:34 INFO mapreduce.ImportJobBase: Beginning query import.

16/08/10 09:22:36 INFO impl.TimelineClientImpl: Timeline service address: http://amb3.local:8188/ws/v1/timeline/

16/08/10 09:22:36 INFO client.RMProxy: Connecting to ResourceManager at amb3.local/192.168.0.143:8050

16/08/10 09:22:38 INFO db.DBInputFormat: Using read commited transaction isolation

16/08/10 09:22:38 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(ID), MAX(ID) FROM (select *from HDFS_4 where  (1 = 1) ) t1

16/08/10 09:22:38 WARN db.TextSplitter: Generating splits for a textual index column.

16/08/10 09:22:38 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.

16/08/10 09:22:38 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.

16/08/10 09:22:38 INFO mapreduce.JobSubmitter: number of splits:4

16/08/10 09:22:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470728284233_0009

16/08/10 09:22:38 INFO impl.YarnClientImpl: Submitted application application_1470728284233_0009

16/08/10 09:22:39 INFO mapreduce.Job: The url to track the job: http://amb3.local:8088/proxy/application_1470728284233_0009/

16/08/10 09:22:39 INFO mapreduce.Job: Running job: job_1470728284233_0009

16/08/10 09:22:47 INFO mapreduce.Job: Job job_1470728284233_0009 running in uber mode : false

16/08/10 09:22:47 INFO mapreduce.Job:  map 0% reduce 0%

16/08/10 09:22:54 INFO mapreduce.Job:  map 25% reduce 0%

16/08/10 09:22:55 INFO mapreduce.Job:  map 100% reduce 0%

16/08/10 09:22:56 INFO mapreduce.Job: Job job_1470728284233_0009 completed successfully

16/08/10 09:22:56 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=605572

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=489

HDFS: Number of bytes written=157788

HDFS: Number of read operations=16

HDFS: Number of large read operations=0

HDFS: Number of write operations=8

Job Counters 

Launched map tasks=4

Other local map tasks=4

Total time spent by all maps in occupied slots (ms)=48586

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=24293

Total vcore-seconds taken by all map tasks=24293

Total megabyte-seconds taken by all map tasks=37314048

Map-Reduce Framework

Map input records=1000

Map output records=1000

Input split bytes=489

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=765

CPU time spent (ms)=27050

Physical memory (bytes) snapshot=1312333824

Virtual memory (bytes) snapshot=13253160960

Total committed heap usage (bytes)=707788800

File Input Format Counters 

Bytes Read=0

File Output Format Counters 

Bytes Written=157788

16/08/10 09:22:56 INFO mapreduce.ImportJobBase: Transferred 154.0898 KB in 20.6491 seconds (7.4623 KB/sec)

16/08/10 09:22:56 INFO mapreduce.ImportJobBase: Retrieved 1000 records.

[hdfs@amb2 ~]$ 


Viewing it with cat, the very top declares "this is a SequenceFile!" and the rest is a run of partially garbled-looking characters.
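
Unlike cat, hdfs dfs -text understands SequenceFiles and prints the decoded records (minimal sketch, assuming the default part-m-00000 file name):

# Decode the SequenceFile records instead of dumping raw bytes
hdfs dfs -text /dev/test2_seq/part-m-00000 | head -5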



4. Parquet (par-ket? par-kay?) file


Command executed

* The sqoop version under test is 1.4.6. If you look at the command, it differs a bit from the ones above.

The commands above all issued a query; this one was switched to specifying a table.

There is currently a bug, so I changed it as below; searching around at test time, a patch file does exist.

But since I installed the whole stack through Ambari, hunting files down and patching them one by one is a hassle, so I'll just wait for version 1.4.7. haha;

sqoop import --target-dir=/dev/test2_pq --table HDFS_4 --as-parquetfile -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ID


Below is the error that occurs when run with a query: a Java NullPointerException is thrown.

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/test2_pq --query='select *from HDFS_4 where $CONDITIONS' --as-parquetfile -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ID

16/08/10 09:29:29 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 09:29:29 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 09:29:29 INFO manager.SqlManager: Using default fetchSize of 1000

16/08/10 09:29:29 INFO tool.CodeGenTool: Beginning code generation

16/08/10 09:29:29 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException

java.lang.NullPointerException

at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:97)

at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)

at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)

at org.apache.sqoop.Sqoop.run(Sqoop.java:148)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)

at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184)

at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226)

at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235)

at org.apache.sqoop.Sqoop.main(Sqoop.java:244)

[hdfs@amb2 ~]$


Execution result

Running it as a table import, an error occurs mid-run as shown below.

Damn~ skimming the message, it looks like a data-type issue, so I'll create a view over the existing table that changes only the data type and try importing that view instead.

16/08/10 09:31:25 INFO mapreduce.Job: Task Id : attempt_1470728284233_0010_m_000002_1, Status : FAILED

Error: org.apache.avro.UnresolvedUnionException: Not in union ["null","long"]: 2016-08-29 21:09:28.0

at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:561)

at org.apache.avro.generic.GenericData.deepCopy(GenericData.java:941)

at org.apache.avro.generic.GenericData.deepCopy(GenericData.java:922)

at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat$DatasetRecordWriter.copy(DatasetKeyOutputFormat.java:327)

at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat$DatasetRecordWriter.write(DatasetKeyOutputFormat.java:321)

at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat$DatasetRecordWriter.write(DatasetKeyOutputFormat.java:300)

at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:658)

at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)

at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)

at org.apache.sqoop.mapreduce.ParquetImportMapper.map(ParquetImportMapper.java:70)

at org.apache.sqoop.mapreduce.ParquetImportMapper.map(ParquetImportMapper.java:39)

at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)

at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)

at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)

at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)

at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)

at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)


Container killed by the ApplicationMaster.

Container killed on request. Exit code is 143

Container exited with a non-zero exit code 143


Re-run after creating a view to work around the error above

CREATE OR REPLACE VIEW VW_HDFS_4

AS

SELECT ID,NUM,TEST,To_Char(REG_DT,'YYYY-MM-DD HH24:MI:SS') REG_DT

  FROM HDFS_4;


Execution result - of the two runs below, the first one errored out and the second ran normally.

The error says the directory and meta file had already been created in hadoop before the run, so they already exist.

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/test2_pq --table VW_HDFS_4 --as-parquetfile -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ID

16/08/10 09:36:33 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 09:36:33 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 09:36:33 INFO manager.SqlManager: Using default fetchSize of 1000

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 09:36:34 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop will not process this sqoop connection, as "FLASHONE"."VW_HDFS_4" is not an Oracle table, it's a VIEW.

16/08/10 09:36:34 INFO tool.CodeGenTool: Beginning code generation

16/08/10 09:36:34 INFO tool.CodeGenTool: Will generate java class as codegen_VW_HDFS_4

16/08/10 09:36:35 INFO manager.OracleManager: Time zone has been set to GMT

16/08/10 09:36:35 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "VW_HDFS_4" t WHERE 1=0

16/08/10 09:36:35 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/95f85eece904f411d74c356ac450d5b5/codegen_VW_HDFS_4.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 09:36:36 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/95f85eece904f411d74c356ac450d5b5/codegen_VW_HDFS_4.jar

16/08/10 09:36:36 INFO mapreduce.ImportJobBase: Beginning import of VW_HDFS_4

16/08/10 09:36:37 INFO manager.OracleManager: Time zone has been set to GMT

16/08/10 09:36:38 INFO manager.OracleManager: Time zone has been set to GMT

16/08/10 09:36:38 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "VW_HDFS_4" t WHERE 1=0

16/08/10 09:36:38 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetExistsException: Descriptor directory already exists: hdfs://amb2.local:8020/dev/test2_pq/.metadata

org.kitesdk.data.DatasetExistsException: Descriptor directory already exists: hdfs://amb2.local:8020/dev/test2_pq/.metadata

at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.create(FileSystemMetadataProvider.java:219)

at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.create(FileSystemDatasetRepository.java:137)

at org.kitesdk.data.Datasets.create(Datasets.java:239)

at org.kitesdk.data.Datasets.create(Datasets.java:307)

at org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:107)

at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:89)

at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:108)

at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260)

at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)

at org.apache.sqoop.manager.OracleManager.importTable(OracleManager.java:445)

at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)

at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)

at org.apache.sqoop.Sqoop.run(Sqoop.java:148)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)

at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184)

at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226)

at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235)

at org.apache.sqoop.Sqoop.main(Sqoop.java:244)
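
The re-run below simply points at a fresh target dir (/dev/test2_pq2). An alternative, if the first directory is disposable, would be to delete the half-created Kite dataset and reuse the same path - a sketch:

# Remove the failed dataset (including its .metadata dir) before re-running
hdfs dfs -rm -r /dev/test2_pq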



[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/test2_pq2 --table VW_HDFS_4 --as-parquetfile -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ID

16/08/10 09:36:56 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 09:36:56 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 09:36:56 INFO manager.SqlManager: Using default fetchSize of 1000

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 09:36:57 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop will not process this sqoop connection, as "FLASHONE"."VW_HDFS_4" is not an Oracle table, it's a VIEW.

16/08/10 09:36:57 INFO tool.CodeGenTool: Beginning code generation

16/08/10 09:36:57 INFO tool.CodeGenTool: Will generate java class as codegen_VW_HDFS_4

16/08/10 09:36:57 INFO manager.OracleManager: Time zone has been set to GMT

16/08/10 09:36:57 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "VW_HDFS_4" t WHERE 1=0

16/08/10 09:36:57 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/586ec1c10f04fc4c1fdd5986c9fae96b/codegen_VW_HDFS_4.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 09:36:59 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/586ec1c10f04fc4c1fdd5986c9fae96b/codegen_VW_HDFS_4.jar

16/08/10 09:36:59 INFO mapreduce.ImportJobBase: Beginning import of VW_HDFS_4

16/08/10 09:36:59 INFO manager.OracleManager: Time zone has been set to GMT

16/08/10 09:37:00 INFO manager.OracleManager: Time zone has been set to GMT

16/08/10 09:37:00 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "VW_HDFS_4" t WHERE 1=0

16/08/10 09:37:02 INFO impl.TimelineClientImpl: Timeline service address: http://amb3.local:8188/ws/v1/timeline/

16/08/10 09:37:02 INFO client.RMProxy: Connecting to ResourceManager at amb3.local/192.168.0.143:8050

16/08/10 09:37:04 INFO db.DBInputFormat: Using read commited transaction isolation

16/08/10 09:37:04 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN("ID"), MAX("ID") FROM "VW_HDFS_4"

16/08/10 09:37:04 WARN db.TextSplitter: Generating splits for a textual index column.

16/08/10 09:37:04 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.

16/08/10 09:37:04 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.

16/08/10 09:37:04 INFO mapreduce.JobSubmitter: number of splits:4

16/08/10 09:37:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470728284233_0011

16/08/10 09:37:04 INFO impl.YarnClientImpl: Submitted application application_1470728284233_0011

16/08/10 09:37:04 INFO mapreduce.Job: The url to track the job: http://amb3.local:8088/proxy/application_1470728284233_0011/

16/08/10 09:37:04 INFO mapreduce.Job: Running job: job_1470728284233_0011

16/08/10 09:37:11 INFO mapreduce.Job: Job job_1470728284233_0011 running in uber mode : false

16/08/10 09:37:11 INFO mapreduce.Job:  map 0% reduce 0%

16/08/10 09:37:20 INFO mapreduce.Job:  map 25% reduce 0%

16/08/10 09:37:21 INFO mapreduce.Job:  map 100% reduce 0%

16/08/10 09:37:22 INFO mapreduce.Job: Job job_1470728284233_0011 completed successfully

16/08/10 09:37:22 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=610560

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=28037

HDFS: Number of bytes written=137608

HDFS: Number of read operations=200

HDFS: Number of large read operations=0

HDFS: Number of write operations=36

Job Counters 

Launched map tasks=4

Other local map tasks=4

Total time spent by all maps in occupied slots (ms)=60614

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=30307

Total vcore-seconds taken by all map tasks=30307

Total megabyte-seconds taken by all map tasks=46551552

Map-Reduce Framework

Map input records=1000

Map output records=1000

Input split bytes=505

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=1345

CPU time spent (ms)=41390

Physical memory (bytes) snapshot=1502236672

Virtual memory (bytes) snapshot=13242277888

Total committed heap usage (bytes)=828375040

File Input Format Counters 

Bytes Read=0

File Output Format Counters 

Bytes Written=0

16/08/10 09:37:22 INFO mapreduce.ImportJobBase: Transferred 134.3828 KB in 21.319 seconds (6.3034 KB/sec)

16/08/10 09:37:22 INFO mapreduce.ImportJobBase: Retrieved 1000 records.

[hdfs@amb2 ~]$


Viewing it with cat: the first screenshot showed the start of the file, the second the very end.

The type information is written out as well, and looking a little further the string ~avro.schema appears.

As the search results suggested, the schema appears to be stored using Avro.
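
parquet-tools can confirm this by dumping the file footer, which includes the Avro schema the Kite/Sqoop writer embeds (a sketch; the jar version and the data-file placeholder are assumptions - Kite generates UUID-style file names under the target dir):

# Print the schema stored in one of the parquet data files
hadoop jar parquet-tools-1.6.0.jar schema /dev/test2_pq2/<data-file>.parquet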



For this project we intend to use text (CSV) and parquet.

Now I need to see whether the other tools can read the data back properly once it's stored as parquet. ㅠ_ㅜ;
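
As a first readability check, one could point a Hive external table at the import directory and query it (minimal sketch - the beeline URL, table name, and column types mirror VW_HDFS_4 but are illustrative, not from the original test):

# Map a Hive external table onto the parquet directory and sample it
beeline -u jdbc:hive2://localhost:10000 -e "
CREATE EXTERNAL TABLE test2_pq2 (id STRING, num BIGINT, test STRING, reg_dt STRING)
STORED AS PARQUET LOCATION '/dev/test2_pq2';
SELECT * FROM test2_pq2 LIMIT 5;"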
