Spark2.1 Hive Auth Custom Test


- As of 2017, I am operating a big-data project for a client.

- Requirement: Spark is in use, and reads through Spark should respect Hive authorization (grants).


Spark version: 2.1

Problem:

For all of Spark's powerful features, I could not find any Spark-side authorization that fits the Hortonworks big-data platform we currently run.

Since we already use the Hive Metastore, the plan was to customize the Spark source and recompile it. The test went fine, so I am recording the location of the change here so I don't forget it later.
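For reference, the rebuild itself is just a module-scoped Maven build of the patched source. A minimal sketch, not the exact command used for this post (profiles and flags depend on your checkout, so check the Spark build docs for your version):

# Rebuild only the sql/hive module of a Spark 2.1 checkout after editing TableReader.scala.
# -am also rebuilds the modules it depends on; -DskipTests keeps the turnaround short.
./build/mvn -DskipTests -pl sql/hive -am package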


- The Spark source is written in Scala.

- I first taught myself enough Scala to print Hello World, then added the feature shown below.

Location in the Spark source:

sql/hive/src/main/scala/org/apache/spark/sql/hive

File: TableReader.scala


The reader trait is referenced in two places, so be sure to modify both: the source splits the logic between plain (heap) tables and partitioned tables, and both code paths need the patch.

Since the use case is fixed anyway, I decided to reuse the grants already stored in Hive.

Below is part of what I modified (this much should keep me from forgetting).

def makeRDDForTable(
      hiveTable: HiveTable,
      deserializerClass: Class[_ <: Deserializer],
      filterOpt: Option[PathFilter]): RDD[InternalRow] = {


The modification starts from this point.

// Needs: import java.sql.{Connection, DriverManager, PreparedStatement, ResultSet}
val hive_table_nm = hiveTable.getTableName()

// Connect straight to the MySQL database backing the Hive Metastore.
val driver   = "com.mysql.jdbc.Driver"
val url      = "jdbc:mysql://<my-metastore-host>/hive?characterEncoding=utf8"
val username = "flashone"
val password = "1234"

// Does the current user hold any grant on this table? (TBL_PRIVS and TBLS are metastore tables.)
val check_query =
  "SELECT count(*) CNT FROM TBL_PRIVS A, TBLS B " +
  "WHERE A.TBL_ID = B.TBL_ID AND B.TBL_NAME = ? AND A.GRANTOR = ?"

var connection: Connection = null
var resultSet: ResultSet = null
var stat: PreparedStatement = null

try {
  Class.forName(driver)
  connection = DriverManager.getConnection(url, username, password)

  stat = connection.prepareStatement(check_query)
  stat.setString(1, hive_table_nm)
  stat.setString(2, hive_user_nm)   // hive_user_nm: the current user, resolved elsewhere in the patch
  resultSet = stat.executeQuery()

  while (resultSet.next()) {
    if (resultSet.getString("CNT") == "0") {
      // No grant found: log the real location, point the table at a dummy path so
      // nothing real is read, and raise an error.
      val Npath = new Path("hdfs path")
      logInfo(hiveTable.getDataLocation().toString())
      hiveTable.setDataLocation(Npath)
      throw new Exception("Access Denied")
    }
  }
} catch {
  case e: Exception => logInfo(e.getMessage)   // the original left this catch empty; at least log it
} finally {
  // Close in finally (with null checks) so a failed connection cannot leak resources.
  if (resultSet != null) resultSet.close()
  if (stat != null) stat.close()
  if (connection != null) connection.close()
}
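Since the exact same block has to be pasted into both makeRDDForTable and makeRDDForPartitionedTable, a cleaner variant is to pull the lookup into one private helper and call it from both places. This is only a sketch of that idea, not what the linked repo below does; checkHivePrivilege is an illustrative name and the connection settings are the same placeholders as above.

// Sketch only: one shared helper for the grant check, so the heap-table and
// partitioned-table paths stay in sync.
// Needs: import java.sql.{Connection, DriverManager, PreparedStatement, ResultSet}
private def checkHivePrivilege(tableName: String, userName: String): Boolean = {
  val url = "jdbc:mysql://<my-metastore-host>/hive?characterEncoding=utf8"
  var connection: Connection = null
  var stat: PreparedStatement = null
  var resultSet: ResultSet = null
  try {
    Class.forName("com.mysql.jdbc.Driver")
    connection = DriverManager.getConnection(url, "flashone", "1234")
    stat = connection.prepareStatement(
      "SELECT count(*) CNT FROM TBL_PRIVS A, TBLS B " +
      "WHERE A.TBL_ID = B.TBL_ID AND B.TBL_NAME = ? AND A.GRANTOR = ?")
    stat.setString(1, tableName)
    stat.setString(2, userName)
    resultSet = stat.executeQuery()
    resultSet.next() && resultSet.getInt("CNT") > 0
  } finally {
    if (resultSet != null) resultSet.close()
    if (stat != null) stat.close()
    if (connection != null) connection.close()
  }
}

Both methods could then start with a single call to this helper and throw the same "Access Denied" error whenever it returns false.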


https://github.com/ocmpromaster/sparkcustom21

This is where I keep the modifications. :)



What the authorization check looks like from RStudio:

Since the custom build is in place, auditing is no problem either.



Posted by ORACLE,DBA,BIG,DATA,JAVA 흑풍전설

On the current project I had to choose a compression codec and a storage format for SQOOP, so I ran a quick test.

The test loads Oracle data into Hadoop through sqoop and varies two things:

snappy compression / no compression

text plain / parquet format

Combining the two axes gives a total of 4 test cases.




Brief specs of the test environment

Host machine:

CPU: Xeon E5-2620 v3 × 2 with HT (24 cores total)

RAM: 256GB

Storage: PCI-E (NVMe) for the VM OS, SATA for the hadoop and oracle data

Guest OS specs

HADOOP ECOSYSTEM

VM node spec:

cores: 16 (4 sockets × 4 cores)

RAM: 16GB

Layout: 1 name node, 4 data nodes, 1 ambari/utility server

ORACLE

VM node spec:

cores: 8 (4 sockets × 2 cores)

RAM: 8GB

1 single node: Oracle DB 12c (Enterprise, single node)



Source data description

As the earlier failed test shows, filling the rows with random text gave a compression ratio of 0%: the total transferred volume was identical with and without compression. That test result is included further below.

Because of this, I decided to build more ordinary-looking data, rows that contain actual sentences plus some numbers, and generated the test data from the comments stored in the Oracle dictionary (a rough sketch of the idea follows the table definition below).


Table structure: a plain heap table, not a partitioned table.


CREATE TABLE HDFS_TEST5
(
  TABLE_NAME VARCHAR2(128),
  LVL        NUMBER,
  COMMENTS   VARCHAR(4000),
  REG_DT     VARCHAR2(19),
  SEQ        NUMBER
)
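The exact generation script isn't in this post, so the following is only a hypothetical sketch of the idea: take the COMMENTS text from the Oracle dictionary and fan it out into HDFS_TEST5. The DBA_TAB_COMMENTS source, the x100 multiplier, and the date format are assumptions of mine, not the script that was actually used.

-- Hypothetical sketch: multiply dictionary comment rows into HDFS_TEST5.
INSERT /*+ APPEND */ INTO HDFS_TEST5 (TABLE_NAME, LVL, COMMENTS, REG_DT, SEQ)
SELECT c.TABLE_NAME,
       m.LVL,
       c.COMMENTS,
       TO_CHAR(SYSDATE, 'YYYY-MM-DD HH24:MI:SS'),
       ROWNUM
FROM   DBA_TAB_COMMENTS c
       CROSS JOIN (SELECT LEVEL AS LVL FROM DUAL CONNECT BY LEVEL <= 100) m
WHERE  c.COMMENTS IS NOT NULL;
COMMIT;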


Looking at the loaded rows, they actually look a bit like real data. :)

ROW COUNT: 11,620,800 rows




Test 1.

========================= The test that showed a 0% compression ratio =========================

CASE 1. File format: text plain, uncompressed

Command

sqoop import --target-dir=/dev/test/data_nc_txt --table HDFS_3_SUB -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1


Output

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/test/data_nc_txt --table HDFS_3_SUB -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1

16/08/10 17:46:04 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 17:46:04 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 17:46:04 INFO manager.SqlManager: Using default fetchSize of 1000

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 17:46:06 INFO oracle.OraOopManagerFactory: 

**************************************************

*** Using Data Connector for Oracle and Hadoop ***

**************************************************

16/08/10 17:46:06 INFO oracle.OraOopManagerFactory: Oracle Database version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

16/08/10 17:46:06 INFO oracle.OraOopManagerFactory: This Oracle database is not a RAC.

16/08/10 17:46:06 INFO tool.CodeGenTool: Beginning code generation

16/08/10 17:46:06 INFO manager.SqlManager: Executing SQL statement: SELECT "ACOL1","ACOL2","ACOL3","ACOL4","ACOL5","ACOL6","ACOL7","ACOL8","ACOL9","ACOL10" FROM HDFS_3_SUB WHERE 0=1

16/08/10 17:46:06 INFO manager.SqlManager: Executing SQL statement: SELECT "ACOL1","ACOL2","ACOL3","ACOL4","ACOL5","ACOL6","ACOL7","ACOL8","ACOL9","ACOL10" FROM "HDFS_3_SUB" WHERE 1=0

16/08/10 17:46:06 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/7e534716bf2036f166e1b14257055d00/HDFS_3_SUB.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 17:46:09 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/7e534716bf2036f166e1b14257055d00/HDFS_3_SUB.jar

16/08/10 17:46:09 INFO mapreduce.ImportJobBase: Beginning import of HDFS_3_SUB

16/08/10 17:46:11 INFO manager.SqlManager: Executing SQL statement: SELECT "ACOL1","ACOL2","ACOL3","ACOL4","ACOL5","ACOL6","ACOL7","ACOL8","ACOL9","ACOL10" FROM "HDFS_3_SUB" WHERE 1=0

16/08/10 17:46:12 INFO impl.TimelineClientImpl: Timeline service address: http://amb3.local:8188/ws/v1/timeline/

16/08/10 17:46:12 INFO client.RMProxy: Connecting to ResourceManager at amb3.local/192.168.0.143:8050

16/08/10 17:46:15 WARN oracle.OraOopUtilities: System property java.security.egd is not set to file:///dev/urandom - Oracle connections may time out.

16/08/10 17:46:15 INFO db.DBInputFormat: Using read commited transaction isolation

16/08/10 17:46:15 INFO oracle.OraOopOracleQueries: Session Time Zone set to GMT

16/08/10 17:46:15 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL :

begin 

  dbms_application_info.set_module(module_name => 'Data Connector for Oracle and Hadoop', action_name => 'import 20160810174606KST'); 

end;

16/08/10 17:46:15 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session disable parallel query

16/08/10 17:46:15 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set "_serial_direct_read"=true

16/08/10 17:46:15 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set tracefile_identifier=oraoop

16/08/10 17:46:17 INFO oracle.OraOopDataDrivenDBInputFormat: The table being imported by sqoop has 72704 blocks that have been divided into 128 chunks which will be processed in 4 splits. The chunks will be allocated to the splits using the method : ROUNDROBIN

16/08/10 17:46:17 INFO mapreduce.JobSubmitter: number of splits:4

16/08/10 17:46:17 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470815003334_0001

16/08/10 17:46:18 INFO impl.YarnClientImpl: Submitted application application_1470815003334_0001

16/08/10 17:46:18 INFO mapreduce.Job: The url to track the job: http://amb3.local:8088/proxy/application_1470815003334_0001/

16/08/10 17:46:18 INFO mapreduce.Job: Running job: job_1470815003334_0001

16/08/10 17:46:26 INFO mapreduce.Job: Job job_1470815003334_0001 running in uber mode : false

16/08/10 17:46:26 INFO mapreduce.Job:  map 0% reduce 0%

16/08/10 17:46:40 INFO mapreduce.Job:  map 7% reduce 0%

16/08/10 17:46:41 INFO mapreduce.Job:  map 32% reduce 0%

16/08/10 17:46:42 INFO mapreduce.Job:  map 57% reduce 0%

16/08/10 17:46:43 INFO mapreduce.Job:  map 80% reduce 0%

16/08/10 17:46:44 INFO mapreduce.Job:  map 87% reduce 0%

16/08/10 17:46:46 INFO mapreduce.Job:  map 100% reduce 0%

16/08/10 17:46:46 INFO mapreduce.Job: Job job_1470815003334_0001 completed successfully

16/08/10 17:46:46 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=614156

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=11629

HDFS: Number of bytes written=505000000

HDFS: Number of read operations=16

HDFS: Number of large read operations=0

HDFS: Number of write operations=8

Job Counters 

Launched map tasks=4

Other local map tasks=4

Total time spent by all maps in occupied slots (ms)=116770

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=58385

Total vcore-seconds taken by all map tasks=58385

Total megabyte-seconds taken by all map tasks=89679360

Map-Reduce Framework

Map input records=500000

Map output records=500000

Input split bytes=11629

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=1801

CPU time spent (ms)=75560

Physical memory (bytes) snapshot=1357881344

Virtual memory (bytes) snapshot=13260427264

Total committed heap usage (bytes)=709361664

File Input Format Counters 

Bytes Read=0

File Output Format Counters 

Bytes Written=505000000

16/08/10 17:46:46 INFO mapreduce.ImportJobBase: Transferred 481.6055 MB in 34.0606 seconds (14.1397 MB/sec)

16/08/10 17:46:46 INFO mapreduce.ImportJobBase: Retrieved 500000 records.

[hdfs@amb2 ~]$ 



CASE 2. File format: text plain, compression: snappy

Command

sqoop import --target-dir=/dev/test/data_sn_txt --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec --table HDFS_3_SUB -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1


Output

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/test/data_sn_txt --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec --table HDFS_3_SUB -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1

16/08/10 17:50:13 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 17:50:13 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 17:50:13 INFO manager.SqlManager: Using default fetchSize of 1000

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 17:50:14 INFO oracle.OraOopManagerFactory: 

**************************************************

*** Using Data Connector for Oracle and Hadoop ***

**************************************************

16/08/10 17:50:14 INFO oracle.OraOopManagerFactory: Oracle Database version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

16/08/10 17:50:14 INFO oracle.OraOopManagerFactory: This Oracle database is not a RAC.

16/08/10 17:50:14 INFO tool.CodeGenTool: Beginning code generation

16/08/10 17:50:14 INFO manager.SqlManager: Executing SQL statement: SELECT "ACOL1","ACOL2","ACOL3","ACOL4","ACOL5","ACOL6","ACOL7","ACOL8","ACOL9","ACOL10" FROM HDFS_3_SUB WHERE 0=1

16/08/10 17:50:14 INFO manager.SqlManager: Executing SQL statement: SELECT "ACOL1","ACOL2","ACOL3","ACOL4","ACOL5","ACOL6","ACOL7","ACOL8","ACOL9","ACOL10" FROM "HDFS_3_SUB" WHERE 1=0

16/08/10 17:50:14 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/175537b93a775793afb75735d65b176f/HDFS_3_SUB.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 17:50:16 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/175537b93a775793afb75735d65b176f/HDFS_3_SUB.jar

16/08/10 17:50:16 INFO mapreduce.ImportJobBase: Beginning import of HDFS_3_SUB

16/08/10 17:50:17 INFO manager.SqlManager: Executing SQL statement: SELECT "ACOL1","ACOL2","ACOL3","ACOL4","ACOL5","ACOL6","ACOL7","ACOL8","ACOL9","ACOL10" FROM "HDFS_3_SUB" WHERE 1=0

16/08/10 17:50:18 INFO impl.TimelineClientImpl: Timeline service address: http://amb3.local:8188/ws/v1/timeline/

16/08/10 17:50:18 INFO client.RMProxy: Connecting to ResourceManager at amb3.local/192.168.0.143:8050

16/08/10 17:50:21 WARN oracle.OraOopUtilities: System property java.security.egd is not set to file:///dev/urandom - Oracle connections may time out.

16/08/10 17:50:21 INFO db.DBInputFormat: Using read commited transaction isolation

16/08/10 17:50:21 INFO oracle.OraOopOracleQueries: Session Time Zone set to GMT

16/08/10 17:50:21 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL :

begin 

  dbms_application_info.set_module(module_name => 'Data Connector for Oracle and Hadoop', action_name => 'import 20160810175014KST'); 

end;

16/08/10 17:50:21 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session disable parallel query

16/08/10 17:50:21 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set "_serial_direct_read"=true

16/08/10 17:50:21 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set tracefile_identifier=oraoop

16/08/10 17:50:22 INFO oracle.OraOopDataDrivenDBInputFormat: The table being imported by sqoop has 72704 blocks that have been divided into 128 chunks which will be processed in 4 splits. The chunks will be allocated to the splits using the method : ROUNDROBIN

16/08/10 17:50:22 INFO mapreduce.JobSubmitter: number of splits:4

16/08/10 17:50:22 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470815003334_0002

16/08/10 17:50:23 INFO impl.YarnClientImpl: Submitted application application_1470815003334_0002

16/08/10 17:50:23 INFO mapreduce.Job: The url to track the job: http://amb3.local:8088/proxy/application_1470815003334_0002/

16/08/10 17:50:23 INFO mapreduce.Job: Running job: job_1470815003334_0002

16/08/10 17:50:31 INFO mapreduce.Job: Job job_1470815003334_0002 running in uber mode : false

16/08/10 17:50:31 INFO mapreduce.Job:  map 0% reduce 0%

16/08/10 17:50:41 INFO mapreduce.Job:  map 25% reduce 0%

16/08/10 17:50:42 INFO mapreduce.Job:  map 50% reduce 0%

16/08/10 17:50:44 INFO mapreduce.Job:  map 80% reduce 0%

16/08/10 17:50:46 INFO mapreduce.Job:  map 89% reduce 0%

16/08/10 17:50:47 INFO mapreduce.Job:  map 100% reduce 0%

16/08/10 17:50:47 INFO mapreduce.Job: Job job_1470815003334_0002 completed successfully

16/08/10 17:50:47 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=614136

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=11629

HDFS: Number of bytes written=505057648

HDFS: Number of read operations=16

HDFS: Number of large read operations=0

HDFS: Number of write operations=8

Job Counters 

Launched map tasks=4

Other local map tasks=4

Total time spent by all maps in occupied slots (ms)=86936

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=43468

Total vcore-seconds taken by all map tasks=43468

Total megabyte-seconds taken by all map tasks=66766848

Map-Reduce Framework

Map input records=500000

Map output records=500000

Input split bytes=11629

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=1239

CPU time spent (ms)=67760

Physical memory (bytes) snapshot=1410674688

Virtual memory (bytes) snapshot=13316395008

Total committed heap usage (bytes)=737673216

File Input Format Counters 

Bytes Read=0

File Output Format Counters 

Bytes Written=505057648

16/08/10 17:50:47 INFO mapreduce.ImportJobBase: Transferred 481.6605 MB in 29.1766 seconds (16.5084 MB/sec)

16/08/10 17:50:47 INFO mapreduce.ImportJobBase: Retrieved 500000 records.

[hdfs@amb2 ~]$





The main tests start from here.

========================= The four test cases =========================

CASE 1. Text, uncompressed

Command

sqoop import --target-dir=/dev/test2/data_nc_txt --table HDFS_TEST5 -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1


Output

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/test2/data_nc_txt --table HDFS_TEST5 -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1

16/08/10 18:09:15 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 18:09:15 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 18:09:15 INFO manager.SqlManager: Using default fetchSize of 1000

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 18:09:17 INFO oracle.OraOopManagerFactory: 

**************************************************

*** Using Data Connector for Oracle and Hadoop ***

**************************************************

16/08/10 18:09:17 INFO oracle.OraOopManagerFactory: Oracle Database version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

16/08/10 18:09:17 INFO oracle.OraOopManagerFactory: This Oracle database is not a RAC.

16/08/10 18:09:17 INFO tool.CodeGenTool: Beginning code generation

16/08/10 18:09:17 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM HDFS_TEST5 WHERE 0=1

16/08/10 18:09:17 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM "HDFS_TEST5" WHERE 1=0

16/08/10 18:09:17 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/83d82e240522d651aa49f619fb1c723b/HDFS_TEST5.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 18:09:20 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/83d82e240522d651aa49f619fb1c723b/HDFS_TEST5.jar

16/08/10 18:09:20 INFO mapreduce.ImportJobBase: Beginning import of HDFS_TEST5

16/08/10 18:09:22 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM "HDFS_TEST5" WHERE 1=0

16/08/10 18:09:23 INFO impl.TimelineClientImpl: Timeline service address: http://amb3.local:8188/ws/v1/timeline/

16/08/10 18:09:23 INFO client.RMProxy: Connecting to ResourceManager at amb3.local/192.168.0.143:8050

16/08/10 18:09:26 WARN oracle.OraOopUtilities: System property java.security.egd is not set to file:///dev/urandom - Oracle connections may time out.

16/08/10 18:09:26 INFO db.DBInputFormat: Using read commited transaction isolation

16/08/10 18:09:26 INFO oracle.OraOopOracleQueries: Session Time Zone set to GMT

16/08/10 18:09:26 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL :

begin 

  dbms_application_info.set_module(module_name => 'Data Connector for Oracle and Hadoop', action_name => 'import 20160810180917KST'); 

end;

16/08/10 18:09:26 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session disable parallel query

16/08/10 18:09:26 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set "_serial_direct_read"=true

16/08/10 18:09:26 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set tracefile_identifier=oraoop

16/08/10 18:09:27 INFO oracle.OraOopDataDrivenDBInputFormat: The table being imported by sqoop has 180224 blocks that have been divided into 185 chunks which will be processed in 4 splits. The chunks will be allocated to the splits using the method : ROUNDROBIN

16/08/10 18:09:27 INFO mapreduce.JobSubmitter: number of splits:4

16/08/10 18:09:27 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470815003334_0004

16/08/10 18:09:28 INFO impl.YarnClientImpl: Submitted application application_1470815003334_0004

16/08/10 18:09:28 INFO mapreduce.Job: The url to track the job: http://amb3.local:8088/proxy/application_1470815003334_0004/

16/08/10 18:09:28 INFO mapreduce.Job: Running job: job_1470815003334_0004

16/08/10 18:09:36 INFO mapreduce.Job: Job job_1470815003334_0004 running in uber mode : false

16/08/10 18:09:36 INFO mapreduce.Job:  map 0% reduce 0%

16/08/10 18:09:47 INFO mapreduce.Job:  map 3% reduce 0%

16/08/10 18:09:49 INFO mapreduce.Job:  map 10% reduce 0%

16/08/10 18:09:50 INFO mapreduce.Job:  map 13% reduce 0%

16/08/10 18:09:52 INFO mapreduce.Job:  map 23% reduce 0%

16/08/10 18:09:54 INFO mapreduce.Job:  map 27% reduce 0%

16/08/10 18:09:55 INFO mapreduce.Job:  map 37% reduce 0%

16/08/10 18:09:57 INFO mapreduce.Job:  map 44% reduce 0%

16/08/10 18:09:58 INFO mapreduce.Job:  map 59% reduce 0%

16/08/10 18:10:00 INFO mapreduce.Job:  map 60% reduce 0%

16/08/10 18:10:01 INFO mapreduce.Job:  map 73% reduce 0%

16/08/10 18:10:03 INFO mapreduce.Job:  map 77% reduce 0%

16/08/10 18:10:04 INFO mapreduce.Job:  map 82% reduce 0%

16/08/10 18:10:06 INFO mapreduce.Job:  map 83% reduce 0%

16/08/10 18:10:08 INFO mapreduce.Job:  map 91% reduce 0%

16/08/10 18:10:10 INFO mapreduce.Job:  map 95% reduce 0%

16/08/10 18:10:11 INFO mapreduce.Job:  map 100% reduce 0%

16/08/10 18:10:12 INFO mapreduce.Job: Job job_1470815003334_0004 completed successfully

16/08/10 18:10:12 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=614016

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=16707

HDFS: Number of bytes written=1422465312

HDFS: Number of read operations=16

HDFS: Number of large read operations=0

HDFS: Number of write operations=8

Job Counters 

Launched map tasks=4

Other local map tasks=4

Total time spent by all maps in occupied slots (ms)=241392

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=120696

Total vcore-seconds taken by all map tasks=120696

Total megabyte-seconds taken by all map tasks=185389056

Map-Reduce Framework

Map input records=11620800

Map output records=11620800

Input split bytes=16707

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=2308

CPU time spent (ms)=178250

Physical memory (bytes) snapshot=1499193344

Virtual memory (bytes) snapshot=13390106624

Total committed heap usage (bytes)=746586112

File Input Format Counters 

Bytes Read=0

File Output Format Counters 

Bytes Written=1422465312

16/08/10 18:10:12 INFO mapreduce.ImportJobBase: Transferred 1.3248 GB in 48.8625 seconds (27.763 MB/sec)

16/08/10 18:10:12 INFO mapreduce.ImportJobBase: Retrieved 11620800 records.

[hdfs@amb2 ~]$ 




CASE 2. Text, snappy compression

Command

sqoop import --target-dir=/dev/test2/data_sn_txt --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec --table HDFS_TEST5 -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1


Output

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/test2/data_sn_txt --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec --table HDFS_TEST5 -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1

16/08/10 18:12:53 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 18:12:53 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 18:12:53 INFO manager.SqlManager: Using default fetchSize of 1000

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 18:12:54 INFO oracle.OraOopManagerFactory: 

**************************************************

*** Using Data Connector for Oracle and Hadoop ***

**************************************************

16/08/10 18:12:54 INFO oracle.OraOopManagerFactory: Oracle Database version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

16/08/10 18:12:54 INFO oracle.OraOopManagerFactory: This Oracle database is not a RAC.

16/08/10 18:12:54 INFO tool.CodeGenTool: Beginning code generation

16/08/10 18:12:54 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM HDFS_TEST5 WHERE 0=1

16/08/10 18:12:54 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM "HDFS_TEST5" WHERE 1=0

16/08/10 18:12:54 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/a40bc11d71b280f6f6f0be86d8987524/HDFS_TEST5.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 18:12:56 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/a40bc11d71b280f6f6f0be86d8987524/HDFS_TEST5.jar

16/08/10 18:12:56 INFO mapreduce.ImportJobBase: Beginning import of HDFS_TEST5

16/08/10 18:12:57 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM "HDFS_TEST5" WHERE 1=0

16/08/10 18:12:58 INFO impl.TimelineClientImpl: Timeline service address: http://amb3.local:8188/ws/v1/timeline/

16/08/10 18:12:58 INFO client.RMProxy: Connecting to ResourceManager at amb3.local/192.168.0.143:8050

16/08/10 18:13:01 WARN oracle.OraOopUtilities: System property java.security.egd is not set to file:///dev/urandom - Oracle connections may time out.

16/08/10 18:13:01 INFO db.DBInputFormat: Using read commited transaction isolation

16/08/10 18:13:01 INFO oracle.OraOopOracleQueries: Session Time Zone set to GMT

16/08/10 18:13:01 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL :

begin 

  dbms_application_info.set_module(module_name => 'Data Connector for Oracle and Hadoop', action_name => 'import 20160810181254KST'); 

end;

16/08/10 18:13:01 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session disable parallel query

16/08/10 18:13:01 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set "_serial_direct_read"=true

16/08/10 18:13:01 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set tracefile_identifier=oraoop

16/08/10 18:13:02 INFO oracle.OraOopDataDrivenDBInputFormat: The table being imported by sqoop has 180224 blocks that have been divided into 185 chunks which will be processed in 4 splits. The chunks will be allocated to the splits using the method : ROUNDROBIN

16/08/10 18:13:02 INFO mapreduce.JobSubmitter: number of splits:4

16/08/10 18:13:02 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470815003334_0005

16/08/10 18:13:02 INFO impl.YarnClientImpl: Submitted application application_1470815003334_0005

16/08/10 18:13:02 INFO mapreduce.Job: The url to track the job: http://amb3.local:8088/proxy/application_1470815003334_0005/

16/08/10 18:13:02 INFO mapreduce.Job: Running job: job_1470815003334_0005

16/08/10 18:13:10 INFO mapreduce.Job: Job job_1470815003334_0005 running in uber mode : false

16/08/10 18:13:10 INFO mapreduce.Job:  map 0% reduce 0%

16/08/10 18:13:22 INFO mapreduce.Job:  map 3% reduce 0%

16/08/10 18:13:23 INFO mapreduce.Job:  map 10% reduce 0%

16/08/10 18:13:25 INFO mapreduce.Job:  map 15% reduce 0%

16/08/10 18:13:26 INFO mapreduce.Job:  map 24% reduce 0%

16/08/10 18:13:28 INFO mapreduce.Job:  map 27% reduce 0%

16/08/10 18:13:29 INFO mapreduce.Job:  map 36% reduce 0%

16/08/10 18:13:31 INFO mapreduce.Job:  map 43% reduce 0%

16/08/10 18:13:32 INFO mapreduce.Job:  map 55% reduce 0%

16/08/10 18:13:34 INFO mapreduce.Job:  map 56% reduce 0%

16/08/10 18:13:35 INFO mapreduce.Job:  map 67% reduce 0%

16/08/10 18:13:37 INFO mapreduce.Job:  map 70% reduce 0%

16/08/10 18:13:38 INFO mapreduce.Job:  map 81% reduce 0%

16/08/10 18:13:40 INFO mapreduce.Job:  map 85% reduce 0%

16/08/10 18:13:41 INFO mapreduce.Job:  map 90% reduce 0%

16/08/10 18:13:43 INFO mapreduce.Job:  map 94% reduce 0%

16/08/10 18:13:44 INFO mapreduce.Job:  map 97% reduce 0%

16/08/10 18:13:47 INFO mapreduce.Job:  map 100% reduce 0%

16/08/10 18:13:47 INFO mapreduce.Job: Job job_1470815003334_0005 completed successfully

16/08/10 18:13:47 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=613996

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=16707

HDFS: Number of bytes written=809177318

HDFS: Number of read operations=16

HDFS: Number of large read operations=0

HDFS: Number of write operations=8

Job Counters 

Launched map tasks=4

Other local map tasks=4

Total time spent by all maps in occupied slots (ms)=244686

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=122343

Total vcore-seconds taken by all map tasks=122343

Total megabyte-seconds taken by all map tasks=187918848

Map-Reduce Framework

Map input records=11620800

Map output records=11620800

Input split bytes=16707

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=2268

CPU time spent (ms)=166060

Physical memory (bytes) snapshot=1489506304

Virtual memory (bytes) snapshot=13366169600

Total committed heap usage (bytes)=789577728

File Input Format Counters 

Bytes Read=0

File Output Format Counters 

Bytes Written=809177318

16/08/10 18:13:47 INFO mapreduce.ImportJobBase: Transferred 771.6916 MB in 49.3399 seconds (15.6403 MB/sec)

16/08/10 18:13:47 INFO mapreduce.ImportJobBase: Retrieved 11620800 records.

[hdfs@amb2 ~]$ 





CASE 3. Parquet format, uncompressed

Command

sqoop import --target-dir=/dev/test2/data_nc_pq --table HDFS_TEST5 --as-parquetfile -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1


Output

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/test2/data_nc_pq --table HDFS_TEST5 --as-parquetfile -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1

16/08/10 18:16:27 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 18:16:27 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 18:16:27 INFO manager.SqlManager: Using default fetchSize of 1000

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 18:16:28 INFO oracle.OraOopManagerFactory: 

**************************************************

*** Using Data Connector for Oracle and Hadoop ***

**************************************************

16/08/10 18:16:28 INFO oracle.OraOopManagerFactory: Oracle Database version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

16/08/10 18:16:28 INFO oracle.OraOopManagerFactory: This Oracle database is not a RAC.

16/08/10 18:16:28 INFO tool.CodeGenTool: Beginning code generation

16/08/10 18:16:28 INFO tool.CodeGenTool: Will generate java class as codegen_HDFS_TEST5

16/08/10 18:16:28 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM HDFS_TEST5 WHERE 0=1

16/08/10 18:16:28 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM "HDFS_TEST5" WHERE 1=0

16/08/10 18:16:28 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/cc39417a2380979acedad21e69766d18/codegen_HDFS_TEST5.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 18:16:31 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/cc39417a2380979acedad21e69766d18/codegen_HDFS_TEST5.jar

16/08/10 18:16:31 INFO mapreduce.ImportJobBase: Beginning import of HDFS_TEST5

16/08/10 18:16:31 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM "HDFS_TEST5" WHERE 1=0

16/08/10 18:16:32 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM "HDFS_TEST5" WHERE 1=0

16/08/10 18:16:34 INFO impl.TimelineClientImpl: Timeline service address: http://amb3.local:8188/ws/v1/timeline/

16/08/10 18:16:34 INFO client.RMProxy: Connecting to ResourceManager at amb3.local/192.168.0.143:8050

16/08/10 18:16:36 WARN oracle.OraOopUtilities: System property java.security.egd is not set to file:///dev/urandom - Oracle connections may time out.

16/08/10 18:16:36 INFO db.DBInputFormat: Using read commited transaction isolation

16/08/10 18:16:36 INFO oracle.OraOopOracleQueries: Session Time Zone set to GMT

16/08/10 18:16:36 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL :

begin 

  dbms_application_info.set_module(module_name => 'Data Connector for Oracle and Hadoop', action_name => 'import 20160810181628KST'); 

end;

16/08/10 18:16:36 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session disable parallel query

16/08/10 18:16:36 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set "_serial_direct_read"=true

16/08/10 18:16:36 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set tracefile_identifier=oraoop

16/08/10 18:16:37 INFO oracle.OraOopDataDrivenDBInputFormat: The table being imported by sqoop has 180224 blocks that have been divided into 185 chunks which will be processed in 4 splits. The chunks will be allocated to the splits using the method : ROUNDROBIN

16/08/10 18:16:37 INFO mapreduce.JobSubmitter: number of splits:4

16/08/10 18:16:37 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470815003334_0006

16/08/10 18:16:38 INFO impl.YarnClientImpl: Submitted application application_1470815003334_0006

16/08/10 18:16:38 INFO mapreduce.Job: The url to track the job: http://amb3.local:8088/proxy/application_1470815003334_0006/

16/08/10 18:16:38 INFO mapreduce.Job: Running job: job_1470815003334_0006

16/08/10 18:16:46 INFO mapreduce.Job: Job job_1470815003334_0006 running in uber mode : false

16/08/10 18:16:46 INFO mapreduce.Job:  map 0% reduce 0%

16/08/10 18:17:10 INFO mapreduce.Job:  map 2% reduce 0%

16/08/10 18:17:22 INFO mapreduce.Job:  map 9% reduce 0%

16/08/10 18:17:25 INFO mapreduce.Job:  map 16% reduce 0%

16/08/10 18:17:28 INFO mapreduce.Job:  map 21% reduce 0%

16/08/10 18:17:31 INFO mapreduce.Job:  map 28% reduce 0%

16/08/10 18:17:34 INFO mapreduce.Job:  map 36% reduce 0%

16/08/10 18:17:37 INFO mapreduce.Job:  map 40% reduce 0%

16/08/10 18:17:40 INFO mapreduce.Job:  map 49% reduce 0%

16/08/10 18:17:43 INFO mapreduce.Job:  map 60% reduce 0%

16/08/10 18:17:46 INFO mapreduce.Job:  map 65% reduce 0%

16/08/10 18:17:49 INFO mapreduce.Job:  map 72% reduce 0%

16/08/10 18:17:52 INFO mapreduce.Job:  map 78% reduce 0%

16/08/10 18:17:55 INFO mapreduce.Job:  map 82% reduce 0%

16/08/10 18:17:58 INFO mapreduce.Job:  map 87% reduce 0%

16/08/10 18:18:01 INFO mapreduce.Job:  map 93% reduce 0%

16/08/10 18:18:03 INFO mapreduce.Job:  map 94% reduce 0%

16/08/10 18:18:04 INFO mapreduce.Job:  map 97% reduce 0%

16/08/10 18:18:07 INFO mapreduce.Job:  map 99% reduce 0%

16/08/10 18:18:10 INFO mapreduce.Job:  map 100% reduce 0%

16/08/10 18:18:10 INFO mapreduce.Job: Job job_1470815003334_0006 completed successfully

16/08/10 18:18:10 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=618184

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=49495

HDFS: Number of bytes written=456758842

HDFS: Number of read operations=200

HDFS: Number of large read operations=0

HDFS: Number of write operations=36

Job Counters 

Launched map tasks=4

Other local map tasks=4

Total time spent by all maps in occupied slots (ms)=612860

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=306430

Total vcore-seconds taken by all map tasks=306430

Total megabyte-seconds taken by all map tasks=470676480

Map-Reduce Framework

Map input records=11620800

Map output records=11620800

Input split bytes=16707

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=31375

CPU time spent (ms)=363220

Physical memory (bytes) snapshot=2344189952

Virtual memory (bytes) snapshot=13274750976

Total committed heap usage (bytes)=1465909248

File Input Format Counters 

Bytes Read=0

File Output Format Counters 

Bytes Written=0

16/08/10 18:18:10 INFO mapreduce.ImportJobBase: Transferred 435.5992 MB in 97.1124 seconds (4.4855 MB/sec)

16/08/10 18:18:10 INFO mapreduce.ImportJobBase: Retrieved 11620800 records.

[hdfs@amb2 ~]$ 




CASE 4. Parquet format, snappy compression

Command

sqoop import --target-dir=/dev/test2/data_sn_pq --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec --table HDFS_TEST5 -direct --as-parquetfile --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1


Output

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/test2/data_sn_pq --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec --table HDFS_TEST5 -direct --as-parquetfile --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1

16/08/10 18:21:21 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 18:21:21 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 18:21:21 INFO manager.SqlManager: Using default fetchSize of 1000

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 18:21:22 INFO oracle.OraOopManagerFactory: 

**************************************************

*** Using Data Connector for Oracle and Hadoop ***

**************************************************

16/08/10 18:21:22 INFO oracle.OraOopManagerFactory: Oracle Database version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

16/08/10 18:21:22 INFO oracle.OraOopManagerFactory: This Oracle database is not a RAC.

16/08/10 18:21:22 INFO tool.CodeGenTool: Beginning code generation

16/08/10 18:21:22 INFO tool.CodeGenTool: Will generate java class as codegen_HDFS_TEST5

16/08/10 18:21:22 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM HDFS_TEST5 WHERE 0=1

16/08/10 18:21:22 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM "HDFS_TEST5" WHERE 1=0

16/08/10 18:21:22 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/ffc7dd24a3b45d0a3b7dad6697d1826d/codegen_HDFS_TEST5.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 18:21:24 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/ffc7dd24a3b45d0a3b7dad6697d1826d/codegen_HDFS_TEST5.jar

16/08/10 18:21:24 INFO mapreduce.ImportJobBase: Beginning import of HDFS_TEST5

16/08/10 18:21:25 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM "HDFS_TEST5" WHERE 1=0

16/08/10 18:21:27 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM "HDFS_TEST5" WHERE 1=0

16/08/10 18:21:29 INFO impl.TimelineClientImpl: Timeline service address: http://amb3.local:8188/ws/v1/timeline/

16/08/10 18:21:29 INFO client.RMProxy: Connecting to ResourceManager at amb3.local/192.168.0.143:8050

16/08/10 18:21:32 WARN oracle.OraOopUtilities: System property java.security.egd is not set to file:///dev/urandom - Oracle connections may time out.

16/08/10 18:21:32 INFO db.DBInputFormat: Using read commited transaction isolation

16/08/10 18:21:32 INFO oracle.OraOopOracleQueries: Session Time Zone set to GMT

16/08/10 18:21:32 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL :

begin 

  dbms_application_info.set_module(module_name => 'Data Connector for Oracle and Hadoop', action_name => 'import 20160810182122KST'); 

end;

16/08/10 18:21:32 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session disable parallel query

16/08/10 18:21:32 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set "_serial_direct_read"=true

16/08/10 18:21:32 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set tracefile_identifier=oraoop

16/08/10 18:21:32 INFO oracle.OraOopDataDrivenDBInputFormat: The table being imported by sqoop has 180224 blocks that have been divided into 185 chunks which will be processed in 4 splits. The chunks will be allocated to the splits using the method : ROUNDROBIN

16/08/10 18:21:33 INFO mapreduce.JobSubmitter: number of splits:4

16/08/10 18:21:33 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470815003334_0007

16/08/10 18:21:33 INFO impl.YarnClientImpl: Submitted application application_1470815003334_0007

16/08/10 18:21:33 INFO mapreduce.Job: The url to track the job: http://amb3.local:8088/proxy/application_1470815003334_0007/

16/08/10 18:21:33 INFO mapreduce.Job: Running job: job_1470815003334_0007

16/08/10 18:21:41 INFO mapreduce.Job: Job job_1470815003334_0007 running in uber mode : false

16/08/10 18:21:41 INFO mapreduce.Job:  map 0% reduce 0%

16/08/10 18:21:57 INFO mapreduce.Job:  map 2% reduce 0%

16/08/10 18:21:58 INFO mapreduce.Job:  map 8% reduce 0%

16/08/10 18:22:00 INFO mapreduce.Job:  map 9% reduce 0%

16/08/10 18:22:01 INFO mapreduce.Job:  map 14% reduce 0%

16/08/10 18:22:03 INFO mapreduce.Job:  map 15% reduce 0%

16/08/10 18:22:04 INFO mapreduce.Job:  map 20% reduce 0%

16/08/10 18:22:06 INFO mapreduce.Job:  map 22% reduce 0%

16/08/10 18:22:07 INFO mapreduce.Job:  map 27% reduce 0%

16/08/10 18:22:09 INFO mapreduce.Job:  map 28% reduce 0%

16/08/10 18:22:10 INFO mapreduce.Job:  map 33% reduce 0%

16/08/10 18:22:12 INFO mapreduce.Job:  map 34% reduce 0%

16/08/10 18:22:13 INFO mapreduce.Job:  map 40% reduce 0%

16/08/10 18:22:15 INFO mapreduce.Job:  map 44% reduce 0%

16/08/10 18:22:16 INFO mapreduce.Job:  map 48% reduce 0%

16/08/10 18:22:19 INFO mapreduce.Job:  map 54% reduce 0%

16/08/10 18:22:20 INFO mapreduce.Job:  map 59% reduce 0%

16/08/10 18:22:22 INFO mapreduce.Job:  map 65% reduce 0%

16/08/10 18:22:24 INFO mapreduce.Job:  map 71% reduce 0%

16/08/10 18:22:25 INFO mapreduce.Job:  map 76% reduce 0%

16/08/10 18:22:26 INFO mapreduce.Job:  map 77% reduce 0%

16/08/10 18:22:29 INFO mapreduce.Job:  map 78% reduce 0%

16/08/10 18:22:32 INFO mapreduce.Job:  map 81% reduce 0%

16/08/10 18:22:33 INFO mapreduce.Job:  map 82% reduce 0%

16/08/10 18:22:34 INFO mapreduce.Job:  map 83% reduce 0%

16/08/10 18:22:35 INFO mapreduce.Job:  map 86% reduce 0%

16/08/10 18:22:36 INFO mapreduce.Job:  map 87% reduce 0%

16/08/10 18:22:38 INFO mapreduce.Job:  map 90% reduce 0%

16/08/10 18:22:39 INFO mapreduce.Job:  map 92% reduce 0%

16/08/10 18:22:40 INFO mapreduce.Job:  map 93% reduce 0%

16/08/10 18:22:41 INFO mapreduce.Job:  map 96% reduce 0%

16/08/10 18:22:42 INFO mapreduce.Job:  map 97% reduce 0%

16/08/10 18:22:44 INFO mapreduce.Job:  map 100% reduce 0%

16/08/10 18:22:46 INFO mapreduce.Job: Job job_1470815003334_0007 completed successfully

16/08/10 18:22:46 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=618704

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=49495

HDFS: Number of bytes written=456758940

HDFS: Number of read operations=200

HDFS: Number of large read operations=0

HDFS: Number of write operations=36

Job Counters 

Launched map tasks=4

Other local map tasks=4

Total time spent by all maps in occupied slots (ms)=466256

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=233128

Total vcore-seconds taken by all map tasks=233128

Total megabyte-seconds taken by all map tasks=358084608

Map-Reduce Framework

Map input records=11620800

Map output records=11620800

Input split bytes=16707

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=9158

CPU time spent (ms)=325900

Physical memory (bytes) snapshot=2222469120

Virtual memory (bytes) snapshot=13310775296

Total committed heap usage (bytes)=1477443584

File Input Format Counters 

Bytes Read=0

File Output Format Counters 

Bytes Written=0

16/08/10 18:22:46 INFO mapreduce.ImportJobBase: Transferred 435.5993 MB in 78.3224 seconds (5.5616 MB/sec)

16/08/10 18:22:46 INFO mapreduce.ImportJobBase: Retrieved 11620800 records.

[hdfs@amb2 ~]$ 




Results of the tests above:

                                 Text, uncompressed   Text, snappy   Parquet, uncompressed   Parquet, snappy
Total job + transfer time (sec)  48.8625              49.3399        97.1124                 78.3224
Total transferred                1.3248 GB            771.6916 MB    435.5992 MB             435.5993 MB
Throughput (MB/sec)              27.763               15.6403        4.4855                  5.5616
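The sizes above come from the Sqoop/MapReduce counters. To cross-check what actually landed on HDFS, the summarized on-disk usage of the four target directories (the same paths used in the commands above) can be listed with:

# Human-readable, summarized size of each import target directory.
hdfs dfs -du -s -h /dev/test2/data_nc_txt /dev/test2/data_sn_txt /dev/test2/data_nc_pq /dev/test2/data_sn_pq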

       





Posted by ORACLE,DBA,BIG,DATA,JAVA 흑풍전설

I decided to run a test for using sqoop with compression, parquet, and text plain.

The client on my current project has decided to adopt snappy and Parquet, so this post documents the test of those choices.

To compare using them against not using them, a total of 4 cases are tested.


Main server environment:

- CPU: 2 CPUs (2 sockets, 24 cores)

- RAM: 256GB

- Storage: PCI-E SSD + SATA HDD combination (the hadoop and oracle data live on the SATA HDDs)

Disk I/O has a big impact on this test, so the disk details are recorded as well:

- The HDDs are all 5400 RPM, 64MB buffer, WD data-storage drives.

- The SSD is a PCI-E card, an Intel SSD 700 series.

- OS + applications + engines sit on the SSD; the data sits on the HDDs.

* Hadoop was installed with Ambari; 4 data nodes in total, replication factor 3.

* 2 namenodes in an HA configuration.

* VMware 10.

* The VMware network is a basic 100Mb line.

* Oracle is Enterprise single node, version 12.1 (Linux), also on VMware.



The Oracle table is as follows: about 80GB of data, with 150 columns each filled to a full 150 bytes (not a byte to spare). The row count is 5,385,050.

To keep the runs as comparable as possible, I flushed the shared pool and the DB block buffer cache while testing (the commands are shown below).
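By "flush" I mean the standard Oracle commands below, run as a privileged user between test cases; flushing the buffer cache like this only makes sense on a test box.

-- Run between test cases so each Sqoop import starts from a comparable, cold-ish cache.
ALTER SYSTEM FLUSH SHARED_POOL;
ALTER SYSTEM FLUSH BUFFER_CACHE;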




CASE 1. Uncompressed + text plain

Command

sqoop import --target-dir=/dev/data1_nc_txt --table HDFS_2 -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1


Output

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/data1_nc_txt --table HDFS_2 -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1

16/08/10 14:01:17 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 14:01:17 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 14:01:17 INFO manager.SqlManager: Using default fetchSize of 1000

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 14:01:18 INFO oracle.OraOopManagerFactory: 

**************************************************

*** Using Data Connector for Oracle and Hadoop ***

**************************************************

16/08/10 14:01:18 INFO oracle.OraOopManagerFactory: Oracle Database version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

16/08/10 14:01:18 INFO oracle.OraOopManagerFactory: This Oracle database is not a RAC.

16/08/10 14:01:18 INFO tool.CodeGenTool: Beginning code generation

16/08/10 14:01:18 INFO manager.SqlManager: Executing SQL statement: SELECT "ACOL1","ACOL2","ACOL3","ACOL4","ACOL5","ACOL6","ACOL7","ACOL8","ACOL9","ACOL10","ACOL11","ACOL12","ACOL13","ACOL14","ACOL15","ACOL16","ACOL17","ACOL18","ACOL19","ACOL20","ACOL21","ACOL22","ACOL23","ACOL24","ACOL25","ACOL26","ACOL27","ACOL28","ACOL29","ACOL30","ACOL31","ACOL32","ACOL33","ACOL34","ACOL35","ACOL36","ACOL37","ACOL38","ACOL39","ACOL40","ACOL41","ACOL42","ACOL43","ACOL44","ACOL45","ACOL46","ACOL47","ACOL48","ACOL49","ACOL50","ACOL51","ACOL52","ACOL53","ACOL54","ACOL55","ACOL56","ACOL57","ACOL58","ACOL59","ACOL60","ACOL61","ACOL62","ACOL63","ACOL64","ACOL65","ACOL66","ACOL67","ACOL68","ACOL69","ACOL70","ACOL71","ACOL72","ACOL73","ACOL74","ACOL75","BCOL1","BCOL2","BCOL3","BCOL4","BCOL5","BCOL6","BCOL7","BCOL8","BCOL9","BCOL10","BCOL11","BCOL12","BCOL13","BCOL14","BCOL15","BCOL16","BCOL17","BCOL18","BCOL19","BCOL20","BCOL21","BCOL22","BCOL23","BCOL24","BCOL25","BCOL26","BCOL27","BCOL28","BCOL29","BCOL30","BCOL31","BCOL32","BCOL33","BCOL34","BCOL35","BCOL36","BCOL37","BCOL38","BCOL39","BCOL40","BCOL41","BCOL42","BCOL43","BCOL44","BCOL45","BCOL46","BCOL47","BCOL48","BCOL49","BCOL50","BCOL51","BCOL52","BCOL53","BCOL54","BCOL55","BCOL56","BCOL57","BCOL58","BCOL59","BCOL60","BCOL61","BCOL62","BCOL63","BCOL64","BCOL65","BCOL66","BCOL67","BCOL68","BCOL69","BCOL70","BCOL71","BCOL72","BCOL73","BCOL74","BCOL75" FROM HDFS_2 WHERE 0=1

16/08/10 14:01:18 INFO manager.SqlManager: Executing SQL statement: SELECT "ACOL1","ACOL2","ACOL3","ACOL4","ACOL5","ACOL6","ACOL7","ACOL8","ACOL9","ACOL10","ACOL11","ACOL12","ACOL13","ACOL14","ACOL15","ACOL16","ACOL17","ACOL18","ACOL19","ACOL20","ACOL21","ACOL22","ACOL23","ACOL24","ACOL25","ACOL26","ACOL27","ACOL28","ACOL29","ACOL30","ACOL31","ACOL32","ACOL33","ACOL34","ACOL35","ACOL36","ACOL37","ACOL38","ACOL39","ACOL40","ACOL41","ACOL42","ACOL43","ACOL44","ACOL45","ACOL46","ACOL47","ACOL48","ACOL49","ACOL50","ACOL51","ACOL52","ACOL53","ACOL54","ACOL55","ACOL56","ACOL57","ACOL58","ACOL59","ACOL60","ACOL61","ACOL62","ACOL63","ACOL64","ACOL65","ACOL66","ACOL67","ACOL68","ACOL69","ACOL70","ACOL71","ACOL72","ACOL73","ACOL74","ACOL75","BCOL1","BCOL2","BCOL3","BCOL4","BCOL5","BCOL6","BCOL7","BCOL8","BCOL9","BCOL10","BCOL11","BCOL12","BCOL13","BCOL14","BCOL15","BCOL16","BCOL17","BCOL18","BCOL19","BCOL20","BCOL21","BCOL22","BCOL23","BCOL24","BCOL25","BCOL26","BCOL27","BCOL28","BCOL29","BCOL30","BCOL31","BCOL32","BCOL33","BCOL34","BCOL35","BCOL36","BCOL37","BCOL38","BCOL39","BCOL40","BCOL41","BCOL42","BCOL43","BCOL44","BCOL45","BCOL46","BCOL47","BCOL48","BCOL49","BCOL50","BCOL51","BCOL52","BCOL53","BCOL54","BCOL55","BCOL56","BCOL57","BCOL58","BCOL59","BCOL60","BCOL61","BCOL62","BCOL63","BCOL64","BCOL65","BCOL66","BCOL67","BCOL68","BCOL69","BCOL70","BCOL71","BCOL72","BCOL73","BCOL74","BCOL75" FROM "HDFS_2" WHERE 1=0

16/08/10 14:01:18 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/9152233b807a9011bb4c6752ec771805/HDFS_2.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 14:01:21 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/9152233b807a9011bb4c6752ec771805/HDFS_2.jar

16/08/10 14:01:21 INFO mapreduce.ImportJobBase: Beginning import of HDFS_2

16/08/10 14:01:22 INFO manager.SqlManager: Executing SQL statement: SELECT "ACOL1","ACOL2","ACOL3","ACOL4","ACOL5","ACOL6","ACOL7","ACOL8","ACOL9","ACOL10","ACOL11","ACOL12","ACOL13","ACOL14","ACOL15","ACOL16","ACOL17","ACOL18","ACOL19","ACOL20","ACOL21","ACOL22","ACOL23","ACOL24","ACOL25","ACOL26","ACOL27","ACOL28","ACOL29","ACOL30","ACOL31","ACOL32","ACOL33","ACOL34","ACOL35","ACOL36","ACOL37","ACOL38","ACOL39","ACOL40","ACOL41","ACOL42","ACOL43","ACOL44","ACOL45","ACOL46","ACOL47","ACOL48","ACOL49","ACOL50","ACOL51","ACOL52","ACOL53","ACOL54","ACOL55","ACOL56","ACOL57","ACOL58","ACOL59","ACOL60","ACOL61","ACOL62","ACOL63","ACOL64","ACOL65","ACOL66","ACOL67","ACOL68","ACOL69","ACOL70","ACOL71","ACOL72","ACOL73","ACOL74","ACOL75","BCOL1","BCOL2","BCOL3","BCOL4","BCOL5","BCOL6","BCOL7","BCOL8","BCOL9","BCOL10","BCOL11","BCOL12","BCOL13","BCOL14","BCOL15","BCOL16","BCOL17","BCOL18","BCOL19","BCOL20","BCOL21","BCOL22","BCOL23","BCOL24","BCOL25","BCOL26","BCOL27","BCOL28","BCOL29","BCOL30","BCOL31","BCOL32","BCOL33","BCOL34","BCOL35","BCOL36","BCOL37","BCOL38","BCOL39","BCOL40","BCOL41","BCOL42","BCOL43","BCOL44","BCOL45","BCOL46","BCOL47","BCOL48","BCOL49","BCOL50","BCOL51","BCOL52","BCOL53","BCOL54","BCOL55","BCOL56","BCOL57","BCOL58","BCOL59","BCOL60","BCOL61","BCOL62","BCOL63","BCOL64","BCOL65","BCOL66","BCOL67","BCOL68","BCOL69","BCOL70","BCOL71","BCOL72","BCOL73","BCOL74","BCOL75" FROM "HDFS_2" WHERE 1=0

16/08/10 14:01:23 INFO impl.TimelineClientImpl: Timeline service address: http://amb3.local:8188/ws/v1/timeline/

16/08/10 14:01:23 INFO client.RMProxy: Connecting to ResourceManager at amb3.local/192.168.0.143:8050

16/08/10 14:01:25 WARN oracle.OraOopUtilities: System property java.security.egd is not set to file:///dev/urandom - Oracle connections may time out.

16/08/10 14:01:25 INFO db.DBInputFormat: Using read commited transaction isolation

16/08/10 14:01:25 INFO oracle.OraOopOracleQueries: Session Time Zone set to GMT

16/08/10 14:01:25 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL :

begin 

  dbms_application_info.set_module(module_name => 'Data Connector for Oracle and Hadoop', action_name => 'import 20160810140118KST'); 

end;

16/08/10 14:01:25 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session disable parallel query

16/08/10 14:01:25 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set "_serial_direct_read"=true

16/08/10 14:01:25 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set tracefile_identifier=oraoop

16/08/10 14:01:26 INFO oracle.OraOopDataDrivenDBInputFormat: The table being imported by sqoop has 11335664 blocks that have been divided into 739 chunks which will be processed in 4 splits. The chunks will be allocated to the splits using the method : ROUNDROBIN

16/08/10 14:01:26 INFO mapreduce.JobSubmitter: number of splits:4

16/08/10 14:01:26 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470728284233_0015

16/08/10 14:01:27 INFO impl.YarnClientImpl: Submitted application application_1470728284233_0015

16/08/10 14:01:27 INFO mapreduce.Job: The url to track the job: http://amb3.local:8088/proxy/application_1470728284233_0015/

16/08/10 14:01:27 INFO mapreduce.Job: Running job: job_1470728284233_0015

16/08/10 14:01:34 INFO mapreduce.Job: Job job_1470728284233_0015 running in uber mode : false

16/08/10 14:01:34 INFO mapreduce.Job:  map 0% reduce 0%

16/08/10 14:01:55 INFO mapreduce.Job:  map 1% reduce 0%

16/08/10 14:02:10 INFO mapreduce.Job:  map 2% reduce 0%

16/08/10 14:02:29 INFO mapreduce.Job:  map 3% reduce 0%

16/08/10 14:02:46 INFO mapreduce.Job:  map 4% reduce 0%

16/08/10 14:03:01 INFO mapreduce.Job:  map 5% reduce 0%

16/08/10 14:03:19 INFO mapreduce.Job:  map 6% reduce 0%

16/08/10 14:03:39 INFO mapreduce.Job:  map 7% reduce 0%

16/08/10 14:03:57 INFO mapreduce.Job:  map 8% reduce 0%

16/08/10 14:04:17 INFO mapreduce.Job:  map 9% reduce 0%

16/08/10 14:04:39 INFO mapreduce.Job:  map 10% reduce 0%

16/08/10 14:04:56 INFO mapreduce.Job:  map 11% reduce 0%

16/08/10 14:05:15 INFO mapreduce.Job:  map 12% reduce 0%

16/08/10 14:05:32 INFO mapreduce.Job:  map 13% reduce 0%

16/08/10 14:05:53 INFO mapreduce.Job:  map 14% reduce 0%

16/08/10 14:06:10 INFO mapreduce.Job:  map 15% reduce 0%

16/08/10 14:06:29 INFO mapreduce.Job:  map 16% reduce 0%

16/08/10 14:06:54 INFO mapreduce.Job:  map 17% reduce 0%

16/08/10 14:07:12 INFO mapreduce.Job:  map 18% reduce 0%

16/08/10 14:07:30 INFO mapreduce.Job:  map 19% reduce 0%

16/08/10 14:07:51 INFO mapreduce.Job:  map 20% reduce 0%

16/08/10 14:08:21 INFO mapreduce.Job:  map 21% reduce 0%

16/08/10 14:08:36 INFO mapreduce.Job:  map 22% reduce 0%

16/08/10 14:09:01 INFO mapreduce.Job:  map 23% reduce 0%

16/08/10 14:09:20 INFO mapreduce.Job:  map 24% reduce 0%

16/08/10 14:09:37 INFO mapreduce.Job:  map 25% reduce 0%

16/08/10 14:09:50 INFO mapreduce.Job:  map 26% reduce 0%

16/08/10 14:10:15 INFO mapreduce.Job:  map 27% reduce 0%

16/08/10 14:10:30 INFO mapreduce.Job:  map 28% reduce 0%

16/08/10 14:10:55 INFO mapreduce.Job:  map 29% reduce 0%

16/08/10 14:11:18 INFO mapreduce.Job:  map 30% reduce 0%

16/08/10 14:11:33 INFO mapreduce.Job:  map 31% reduce 0%

16/08/10 14:11:54 INFO mapreduce.Job:  map 32% reduce 0%

16/08/10 14:12:09 INFO mapreduce.Job:  map 33% reduce 0%

16/08/10 14:12:27 INFO mapreduce.Job:  map 34% reduce 0%

16/08/10 14:12:49 INFO mapreduce.Job:  map 35% reduce 0%

16/08/10 14:13:02 INFO mapreduce.Job:  map 36% reduce 0%

16/08/10 14:13:32 INFO mapreduce.Job:  map 37% reduce 0%

16/08/10 14:13:53 INFO mapreduce.Job:  map 38% reduce 0%

16/08/10 14:14:12 INFO mapreduce.Job:  map 39% reduce 0%

16/08/10 14:14:29 INFO mapreduce.Job:  map 40% reduce 0%

16/08/10 14:14:48 INFO mapreduce.Job:  map 41% reduce 0%

16/08/10 14:14:54 INFO mapreduce.Job:  map 42% reduce 0%

16/08/10 14:15:15 INFO mapreduce.Job:  map 43% reduce 0%

16/08/10 14:15:31 INFO mapreduce.Job:  map 44% reduce 0%

16/08/10 14:15:43 INFO mapreduce.Job:  map 45% reduce 0%

16/08/10 14:16:06 INFO mapreduce.Job:  map 46% reduce 0%

16/08/10 14:16:13 INFO mapreduce.Job:  map 47% reduce 0%

16/08/10 14:16:37 INFO mapreduce.Job:  map 48% reduce 0%

16/08/10 14:16:43 INFO mapreduce.Job:  map 49% reduce 0%

16/08/10 14:17:11 INFO mapreduce.Job:  map 50% reduce 0%

16/08/10 14:17:27 INFO mapreduce.Job:  map 51% reduce 0%

16/08/10 14:17:49 INFO mapreduce.Job:  map 52% reduce 0%

16/08/10 14:17:55 INFO mapreduce.Job:  map 53% reduce 0%

16/08/10 14:18:25 INFO mapreduce.Job:  map 54% reduce 0%

16/08/10 14:18:43 INFO mapreduce.Job:  map 55% reduce 0%

16/08/10 14:18:53 INFO mapreduce.Job:  map 56% reduce 0%

16/08/10 14:19:16 INFO mapreduce.Job:  map 57% reduce 0%

16/08/10 14:19:40 INFO mapreduce.Job:  map 58% reduce 0%

16/08/10 14:19:47 INFO mapreduce.Job:  map 59% reduce 0%

16/08/10 14:20:14 INFO mapreduce.Job:  map 60% reduce 0%

16/08/10 14:20:22 INFO mapreduce.Job:  map 61% reduce 0%

16/08/10 14:20:42 INFO mapreduce.Job:  map 62% reduce 0%

16/08/10 14:20:55 INFO mapreduce.Job:  map 63% reduce 0%

16/08/10 14:21:15 INFO mapreduce.Job:  map 64% reduce 0%

16/08/10 14:21:29 INFO mapreduce.Job:  map 65% reduce 0%

16/08/10 14:21:53 INFO mapreduce.Job:  map 66% reduce 0%

16/08/10 14:22:23 INFO mapreduce.Job:  map 67% reduce 0%

16/08/10 14:22:35 INFO mapreduce.Job:  map 68% reduce 0%

16/08/10 14:23:00 INFO mapreduce.Job:  map 69% reduce 0%

16/08/10 14:23:18 INFO mapreduce.Job:  map 70% reduce 0%

16/08/10 14:23:34 INFO mapreduce.Job:  map 71% reduce 0%

16/08/10 14:23:55 INFO mapreduce.Job:  map 72% reduce 0%

16/08/10 14:24:15 INFO mapreduce.Job:  map 73% reduce 0%

16/08/10 14:24:31 INFO mapreduce.Job:  map 74% reduce 0%

16/08/10 14:24:54 INFO mapreduce.Job:  map 75% reduce 0%

16/08/10 14:25:14 INFO mapreduce.Job:  map 76% reduce 0%

16/08/10 14:25:56 INFO mapreduce.Job:  map 77% reduce 0%

16/08/10 14:26:17 INFO mapreduce.Job:  map 78% reduce 0%

16/08/10 14:26:37 INFO mapreduce.Job:  map 79% reduce 0%

16/08/10 14:27:25 INFO mapreduce.Job:  map 80% reduce 0%

16/08/10 14:27:39 INFO mapreduce.Job:  map 81% reduce 0%

16/08/10 14:28:30 INFO mapreduce.Job:  map 82% reduce 0%

16/08/10 14:28:58 INFO mapreduce.Job:  map 83% reduce 0%

16/08/10 14:29:41 INFO mapreduce.Job:  map 84% reduce 0%

16/08/10 14:29:59 INFO mapreduce.Job:  map 85% reduce 0%

16/08/10 14:31:00 INFO mapreduce.Job:  map 86% reduce 0%

16/08/10 14:31:39 INFO mapreduce.Job:  map 87% reduce 0%

16/08/10 14:32:11 INFO mapreduce.Job:  map 88% reduce 0%

16/08/10 14:32:35 INFO mapreduce.Job:  map 89% reduce 0%

16/08/10 14:33:04 INFO mapreduce.Job:  map 90% reduce 0%

16/08/10 14:33:51 INFO mapreduce.Job:  map 91% reduce 0%

16/08/10 14:34:45 INFO mapreduce.Job:  map 92% reduce 0%

16/08/10 14:35:14 INFO mapreduce.Job:  map 93% reduce 0%

16/08/10 14:35:44 INFO mapreduce.Job:  map 94% reduce 0%

16/08/10 14:36:16 INFO mapreduce.Job:  map 95% reduce 0%

16/08/10 14:37:08 INFO mapreduce.Job:  map 96% reduce 0%

16/08/10 14:37:44 INFO mapreduce.Job:  map 97% reduce 0%

16/08/10 14:38:15 INFO mapreduce.Job:  map 98% reduce 0%

16/08/10 14:38:45 INFO mapreduce.Job:  map 99% reduce 0%

16/08/10 14:39:41 INFO mapreduce.Job:  map 100% reduce 0%

16/08/10 14:40:05 INFO mapreduce.Job: Job job_1470728284233_0015 completed successfully

16/08/10 14:40:05 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=619112

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=66021

HDFS: Number of bytes written=81583507500

HDFS: Number of read operations=16

HDFS: Number of large read operations=0

HDFS: Number of write operations=8

Job Counters 

Launched map tasks=4

Other local map tasks=4

Total time spent by all maps in occupied slots (ms)=17659492

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=8829746

Total vcore-seconds taken by all map tasks=8829746

Total megabyte-seconds taken by all map tasks=13562489856

Map-Reduce Framework

Map input records=5385050

Map output records=5385050

Input split bytes=66021

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=52264

CPU time spent (ms)=2221300

Physical memory (bytes) snapshot=1561075712

Virtual memory (bytes) snapshot=13249417216

Total committed heap usage (bytes)=847249408

File Input Format Counters 

Bytes Read=0

File Output Format Counters 

Bytes Written=81583507500

16/08/10 14:40:05 INFO mapreduce.ImportJobBase: Transferred 75.9806 GB in 2,322.6017 seconds (33.4987 MB/sec)

16/08/10 14:40:05 INFO mapreduce.ImportJobBase: Retrieved 5385050 records.

[hdfs@amb2 ~]$ 




Case 2. Compression (Snappy) + text plain

sqoop import --target-dir=/dev/data2_sn_txt --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec --table HDFS_2 -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1


Execution result:

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/data2_sn_txt --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec --table HDFS_2 -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1

16/08/10 14:52:34 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 14:52:34 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 14:52:34 INFO manager.SqlManager: Using default fetchSize of 1000

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 14:52:35 INFO oracle.OraOopManagerFactory: 

**************************************************

*** Using Data Connector for Oracle and Hadoop ***

**************************************************

16/08/10 14:52:36 INFO oracle.OraOopManagerFactory: Oracle Database version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

16/08/10 14:52:36 INFO oracle.OraOopManagerFactory: This Oracle database is not a RAC.

16/08/10 14:52:36 INFO tool.CodeGenTool: Beginning code generation

16/08/10 14:52:36 INFO manager.SqlManager: Executing SQL statement: SELECT "ACOL1","ACOL2","ACOL3","ACOL4","ACOL5","ACOL6","ACOL7","ACOL8","ACOL9","ACOL10","ACOL11","ACOL12","ACOL13","ACOL14","ACOL15","ACOL16","ACOL17","ACOL18","ACOL19","ACOL20","ACOL21","ACOL22","ACOL23","ACOL24","ACOL25","ACOL26","ACOL27","ACOL28","ACOL29","ACOL30","ACOL31","ACOL32","ACOL33","ACOL34","ACOL35","ACOL36","ACOL37","ACOL38","ACOL39","ACOL40","ACOL41","ACOL42","ACOL43","ACOL44","ACOL45","ACOL46","ACOL47","ACOL48","ACOL49","ACOL50","ACOL51","ACOL52","ACOL53","ACOL54","ACOL55","ACOL56","ACOL57","ACOL58","ACOL59","ACOL60","ACOL61","ACOL62","ACOL63","ACOL64","ACOL65","ACOL66","ACOL67","ACOL68","ACOL69","ACOL70","ACOL71","ACOL72","ACOL73","ACOL74","ACOL75","BCOL1","BCOL2","BCOL3","BCOL4","BCOL5","BCOL6","BCOL7","BCOL8","BCOL9","BCOL10","BCOL11","BCOL12","BCOL13","BCOL14","BCOL15","BCOL16","BCOL17","BCOL18","BCOL19","BCOL20","BCOL21","BCOL22","BCOL23","BCOL24","BCOL25","BCOL26","BCOL27","BCOL28","BCOL29","BCOL30","BCOL31","BCOL32","BCOL33","BCOL34","BCOL35","BCOL36","BCOL37","BCOL38","BCOL39","BCOL40","BCOL41","BCOL42","BCOL43","BCOL44","BCOL45","BCOL46","BCOL47","BCOL48","BCOL49","BCOL50","BCOL51","BCOL52","BCOL53","BCOL54","BCOL55","BCOL56","BCOL57","BCOL58","BCOL59","BCOL60","BCOL61","BCOL62","BCOL63","BCOL64","BCOL65","BCOL66","BCOL67","BCOL68","BCOL69","BCOL70","BCOL71","BCOL72","BCOL73","BCOL74","BCOL75" FROM HDFS_2 WHERE 0=1

16/08/10 14:52:36 INFO manager.SqlManager: Executing SQL statement: SELECT "ACOL1","ACOL2","ACOL3","ACOL4","ACOL5","ACOL6","ACOL7","ACOL8","ACOL9","ACOL10","ACOL11","ACOL12","ACOL13","ACOL14","ACOL15","ACOL16","ACOL17","ACOL18","ACOL19","ACOL20","ACOL21","ACOL22","ACOL23","ACOL24","ACOL25","ACOL26","ACOL27","ACOL28","ACOL29","ACOL30","ACOL31","ACOL32","ACOL33","ACOL34","ACOL35","ACOL36","ACOL37","ACOL38","ACOL39","ACOL40","ACOL41","ACOL42","ACOL43","ACOL44","ACOL45","ACOL46","ACOL47","ACOL48","ACOL49","ACOL50","ACOL51","ACOL52","ACOL53","ACOL54","ACOL55","ACOL56","ACOL57","ACOL58","ACOL59","ACOL60","ACOL61","ACOL62","ACOL63","ACOL64","ACOL65","ACOL66","ACOL67","ACOL68","ACOL69","ACOL70","ACOL71","ACOL72","ACOL73","ACOL74","ACOL75","BCOL1","BCOL2","BCOL3","BCOL4","BCOL5","BCOL6","BCOL7","BCOL8","BCOL9","BCOL10","BCOL11","BCOL12","BCOL13","BCOL14","BCOL15","BCOL16","BCOL17","BCOL18","BCOL19","BCOL20","BCOL21","BCOL22","BCOL23","BCOL24","BCOL25","BCOL26","BCOL27","BCOL28","BCOL29","BCOL30","BCOL31","BCOL32","BCOL33","BCOL34","BCOL35","BCOL36","BCOL37","BCOL38","BCOL39","BCOL40","BCOL41","BCOL42","BCOL43","BCOL44","BCOL45","BCOL46","BCOL47","BCOL48","BCOL49","BCOL50","BCOL51","BCOL52","BCOL53","BCOL54","BCOL55","BCOL56","BCOL57","BCOL58","BCOL59","BCOL60","BCOL61","BCOL62","BCOL63","BCOL64","BCOL65","BCOL66","BCOL67","BCOL68","BCOL69","BCOL70","BCOL71","BCOL72","BCOL73","BCOL74","BCOL75" FROM "HDFS_2" WHERE 1=0

16/08/10 14:52:36 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/296e19dcfe9895e28bbc77da203ad202/HDFS_2.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 14:52:39 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/296e19dcfe9895e28bbc77da203ad202/HDFS_2.jar

16/08/10 14:52:39 INFO mapreduce.ImportJobBase: Beginning import of HDFS_2

16/08/10 14:52:40 INFO manager.SqlManager: Executing SQL statement: SELECT "ACOL1","ACOL2","ACOL3","ACOL4","ACOL5","ACOL6","ACOL7","ACOL8","ACOL9","ACOL10","ACOL11","ACOL12","ACOL13","ACOL14","ACOL15","ACOL16","ACOL17","ACOL18","ACOL19","ACOL20","ACOL21","ACOL22","ACOL23","ACOL24","ACOL25","ACOL26","ACOL27","ACOL28","ACOL29","ACOL30","ACOL31","ACOL32","ACOL33","ACOL34","ACOL35","ACOL36","ACOL37","ACOL38","ACOL39","ACOL40","ACOL41","ACOL42","ACOL43","ACOL44","ACOL45","ACOL46","ACOL47","ACOL48","ACOL49","ACOL50","ACOL51","ACOL52","ACOL53","ACOL54","ACOL55","ACOL56","ACOL57","ACOL58","ACOL59","ACOL60","ACOL61","ACOL62","ACOL63","ACOL64","ACOL65","ACOL66","ACOL67","ACOL68","ACOL69","ACOL70","ACOL71","ACOL72","ACOL73","ACOL74","ACOL75","BCOL1","BCOL2","BCOL3","BCOL4","BCOL5","BCOL6","BCOL7","BCOL8","BCOL9","BCOL10","BCOL11","BCOL12","BCOL13","BCOL14","BCOL15","BCOL16","BCOL17","BCOL18","BCOL19","BCOL20","BCOL21","BCOL22","BCOL23","BCOL24","BCOL25","BCOL26","BCOL27","BCOL28","BCOL29","BCOL30","BCOL31","BCOL32","BCOL33","BCOL34","BCOL35","BCOL36","BCOL37","BCOL38","BCOL39","BCOL40","BCOL41","BCOL42","BCOL43","BCOL44","BCOL45","BCOL46","BCOL47","BCOL48","BCOL49","BCOL50","BCOL51","BCOL52","BCOL53","BCOL54","BCOL55","BCOL56","BCOL57","BCOL58","BCOL59","BCOL60","BCOL61","BCOL62","BCOL63","BCOL64","BCOL65","BCOL66","BCOL67","BCOL68","BCOL69","BCOL70","BCOL71","BCOL72","BCOL73","BCOL74","BCOL75" FROM "HDFS_2" WHERE 1=0

16/08/10 14:52:42 INFO impl.TimelineClientImpl: Timeline service address: http://amb3.local:8188/ws/v1/timeline/

16/08/10 14:52:42 INFO client.RMProxy: Connecting to ResourceManager at amb3.local/192.168.0.143:8050

16/08/10 14:52:44 WARN oracle.OraOopUtilities: System property java.security.egd is not set to file:///dev/urandom - Oracle connections may time out.

16/08/10 14:52:44 INFO db.DBInputFormat: Using read commited transaction isolation

16/08/10 14:52:44 INFO oracle.OraOopOracleQueries: Session Time Zone set to GMT

16/08/10 14:52:44 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL :

begin 

  dbms_application_info.set_module(module_name => 'Data Connector for Oracle and Hadoop', action_name => 'import 20160810145235KST'); 

end;

16/08/10 14:52:44 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session disable parallel query

16/08/10 14:52:44 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set "_serial_direct_read"=true

16/08/10 14:52:44 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set tracefile_identifier=oraoop

16/08/10 14:52:45 INFO oracle.OraOopDataDrivenDBInputFormat: The table being imported by sqoop has 11335664 blocks that have been divided into 739 chunks which will be processed in 4 splits. The chunks will be allocated to the splits using the method : ROUNDROBIN

16/08/10 14:52:45 INFO mapreduce.JobSubmitter: number of splits:4

16/08/10 14:52:45 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470728284233_0017

16/08/10 14:52:46 INFO impl.YarnClientImpl: Submitted application application_1470728284233_0017

16/08/10 14:52:46 INFO mapreduce.Job: The url to track the job: http://amb3.local:8088/proxy/application_1470728284233_0017/

16/08/10 14:52:46 INFO mapreduce.Job: Running job: job_1470728284233_0017

16/08/10 14:52:54 INFO mapreduce.Job: Job job_1470728284233_0017 running in uber mode : false

16/08/10 14:52:54 INFO mapreduce.Job:  map 0% reduce 0%

16/08/10 14:53:16 INFO mapreduce.Job:  map 1% reduce 0%

16/08/10 14:53:28 INFO mapreduce.Job:  map 2% reduce 0%

16/08/10 14:53:43 INFO mapreduce.Job:  map 3% reduce 0%

16/08/10 14:53:58 INFO mapreduce.Job:  map 4% reduce 0%

16/08/10 14:54:15 INFO mapreduce.Job:  map 5% reduce 0%

16/08/10 14:54:30 INFO mapreduce.Job:  map 6% reduce 0%

16/08/10 14:54:58 INFO mapreduce.Job:  map 7% reduce 0%

16/08/10 14:55:44 INFO mapreduce.Job:  map 8% reduce 0%

16/08/10 14:56:31 INFO mapreduce.Job:  map 9% reduce 0%

16/08/10 14:57:03 INFO mapreduce.Job:  map 10% reduce 0%

16/08/10 14:57:31 INFO mapreduce.Job:  map 11% reduce 0%

16/08/10 14:58:06 INFO mapreduce.Job:  map 12% reduce 0%

16/08/10 14:58:45 INFO mapreduce.Job:  map 13% reduce 0%

16/08/10 14:59:17 INFO mapreduce.Job:  map 14% reduce 0%

16/08/10 14:59:57 INFO mapreduce.Job:  map 15% reduce 0%

16/08/10 15:00:32 INFO mapreduce.Job:  map 16% reduce 0%

16/08/10 15:01:24 INFO mapreduce.Job:  map 17% reduce 0%

16/08/10 15:02:05 INFO mapreduce.Job:  map 18% reduce 0%

16/08/10 15:02:36 INFO mapreduce.Job:  map 19% reduce 0%

16/08/10 15:03:16 INFO mapreduce.Job:  map 20% reduce 0%

16/08/10 15:03:45 INFO mapreduce.Job:  map 21% reduce 0%

16/08/10 15:04:30 INFO mapreduce.Job:  map 22% reduce 0%

16/08/10 15:04:58 INFO mapreduce.Job:  map 23% reduce 0%

16/08/10 15:05:30 INFO mapreduce.Job:  map 24% reduce 0%

16/08/10 15:06:05 INFO mapreduce.Job:  map 25% reduce 0%

16/08/10 15:06:54 INFO mapreduce.Job:  map 26% reduce 0%

16/08/10 15:07:24 INFO mapreduce.Job:  map 27% reduce 0%

16/08/10 15:08:00 INFO mapreduce.Job:  map 28% reduce 0%

16/08/10 15:08:44 INFO mapreduce.Job:  map 29% reduce 0%

16/08/10 15:09:11 INFO mapreduce.Job:  map 30% reduce 0%

16/08/10 15:09:55 INFO mapreduce.Job:  map 31% reduce 0%

16/08/10 15:10:25 INFO mapreduce.Job:  map 32% reduce 0%

16/08/10 15:11:40 INFO mapreduce.Job:  map 33% reduce 0%

16/08/10 15:12:08 INFO mapreduce.Job:  map 34% reduce 0%

16/08/10 15:12:42 INFO mapreduce.Job:  map 35% reduce 0%

16/08/10 15:13:30 INFO mapreduce.Job:  map 36% reduce 0%

16/08/10 15:14:05 INFO mapreduce.Job:  map 37% reduce 0%

16/08/10 15:14:29 INFO mapreduce.Job:  map 38% reduce 0%

16/08/10 15:14:59 INFO mapreduce.Job:  map 39% reduce 0%

16/08/10 15:15:44 INFO mapreduce.Job:  map 40% reduce 0%

16/08/10 15:16:12 INFO mapreduce.Job:  map 41% reduce 0%

16/08/10 15:17:00 INFO mapreduce.Job:  map 42% reduce 0%

16/08/10 15:17:35 INFO mapreduce.Job:  map 43% reduce 0%

16/08/10 15:18:00 INFO mapreduce.Job:  map 44% reduce 0%

16/08/10 15:18:45 INFO mapreduce.Job:  map 45% reduce 0%

16/08/10 15:19:26 INFO mapreduce.Job:  map 46% reduce 0%

16/08/10 15:19:52 INFO mapreduce.Job:  map 47% reduce 0%

16/08/10 15:20:42 INFO mapreduce.Job:  map 48% reduce 0%

16/08/10 15:21:11 INFO mapreduce.Job:  map 49% reduce 0%

16/08/10 15:21:52 INFO mapreduce.Job:  map 50% reduce 0%

16/08/10 15:22:37 INFO mapreduce.Job:  map 51% reduce 0%

16/08/10 15:23:21 INFO mapreduce.Job:  map 52% reduce 0%

16/08/10 15:23:56 INFO mapreduce.Job:  map 53% reduce 0%

16/08/10 15:24:33 INFO mapreduce.Job:  map 54% reduce 0%

16/08/10 15:25:10 INFO mapreduce.Job:  map 55% reduce 0%

16/08/10 15:25:36 INFO mapreduce.Job:  map 56% reduce 0%

16/08/10 15:26:14 INFO mapreduce.Job:  map 57% reduce 0%

16/08/10 15:26:58 INFO mapreduce.Job:  map 58% reduce 0%

16/08/10 15:27:40 INFO mapreduce.Job:  map 59% reduce 0%

16/08/10 15:28:18 INFO mapreduce.Job:  map 60% reduce 0%

16/08/10 15:28:49 INFO mapreduce.Job:  map 61% reduce 0%

16/08/10 15:29:36 INFO mapreduce.Job:  map 62% reduce 0%

16/08/10 15:30:08 INFO mapreduce.Job:  map 63% reduce 0%

16/08/10 15:30:37 INFO mapreduce.Job:  map 64% reduce 0%

16/08/10 15:31:06 INFO mapreduce.Job:  map 65% reduce 0%

16/08/10 15:32:05 INFO mapreduce.Job:  map 66% reduce 0%

16/08/10 15:32:39 INFO mapreduce.Job:  map 67% reduce 0%

16/08/10 15:33:16 INFO mapreduce.Job:  map 68% reduce 0%

16/08/10 15:33:53 INFO mapreduce.Job:  map 69% reduce 0%

16/08/10 15:34:26 INFO mapreduce.Job:  map 70% reduce 0%

16/08/10 15:34:59 INFO mapreduce.Job:  map 71% reduce 0%

16/08/10 15:35:32 INFO mapreduce.Job:  map 72% reduce 0%

16/08/10 15:36:26 INFO mapreduce.Job:  map 73% reduce 0%

16/08/10 15:36:59 INFO mapreduce.Job:  map 74% reduce 0%

16/08/10 15:37:40 INFO mapreduce.Job:  map 75% reduce 0%

16/08/10 15:37:47 INFO mapreduce.Job:  map 76% reduce 0%

16/08/10 15:38:28 INFO mapreduce.Job:  map 77% reduce 0%

16/08/10 15:39:16 INFO mapreduce.Job:  map 78% reduce 0%

16/08/10 15:39:43 INFO mapreduce.Job:  map 79% reduce 0%

16/08/10 15:40:12 INFO mapreduce.Job:  map 80% reduce 0%

16/08/10 15:40:43 INFO mapreduce.Job:  map 81% reduce 0%

16/08/10 15:41:14 INFO mapreduce.Job:  map 82% reduce 0%

16/08/10 15:42:01 INFO mapreduce.Job:  map 83% reduce 0%

16/08/10 15:42:30 INFO mapreduce.Job:  map 84% reduce 0%

16/08/10 15:43:24 INFO mapreduce.Job:  map 85% reduce 0%

16/08/10 15:43:51 INFO mapreduce.Job:  map 86% reduce 0%

16/08/10 15:44:27 INFO mapreduce.Job:  map 87% reduce 0%

16/08/10 15:44:54 INFO mapreduce.Job:  map 88% reduce 0%

16/08/10 15:45:31 INFO mapreduce.Job:  map 89% reduce 0%

16/08/10 15:45:53 INFO mapreduce.Job:  map 90% reduce 0%

16/08/10 15:46:48 INFO mapreduce.Job:  map 91% reduce 0%

16/08/10 15:47:23 INFO mapreduce.Job:  map 92% reduce 0%

16/08/10 15:48:02 INFO mapreduce.Job:  map 93% reduce 0%

16/08/10 15:48:36 INFO mapreduce.Job:  map 94% reduce 0%

16/08/10 15:49:07 INFO mapreduce.Job:  map 95% reduce 0%

16/08/10 15:49:43 INFO mapreduce.Job:  map 96% reduce 0%

16/08/10 15:50:10 INFO mapreduce.Job:  map 97% reduce 0%

16/08/10 15:51:10 INFO mapreduce.Job:  map 98% reduce 0%

16/08/10 15:51:54 INFO mapreduce.Job:  map 99% reduce 0%

16/08/10 15:52:49 INFO mapreduce.Job:  map 100% reduce 0%

16/08/10 15:53:13 INFO mapreduce.Job: Job job_1470728284233_0017 completed successfully

16/08/10 15:53:13 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=619092

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=66021

HDFS: Number of bytes written=81593058054

HDFS: Number of read operations=16

HDFS: Number of large read operations=0

HDFS: Number of write operations=8

Job Counters 

Launched map tasks=4

Other local map tasks=4

Total time spent by all maps in occupied slots (ms)=27988674

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=13994337

Total vcore-seconds taken by all map tasks=13994337

Total megabyte-seconds taken by all map tasks=21495301632

Map-Reduce Framework

Map input records=5385050

Map output records=5385050

Input split bytes=66021

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=174416

CPU time spent (ms)=3340330

Physical memory (bytes) snapshot=1647304704

Virtual memory (bytes) snapshot=13287915520

Total committed heap usage (bytes)=879755264

File Input Format Counters 

Bytes Read=0

File Output Format Counters 

Bytes Written=81593058054

16/08/10 15:53:13 INFO mapreduce.ImportJobBase: Transferred 75.9895 GB in 3,632.0301 seconds (21.4242 MB/sec)

16/08/10 15:53:13 INFO mapreduce.ImportJobBase: Retrieved 5385050 records.

[hdfs@amb2 ~]$


Case 3. Compression (Snappy) + Parquet

sqoop import --target-dir=/dev/data4_sn_pq --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec --table HDFS_2 -direct --as-parquetfile --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1


Execution result:

Failed. (What exactly happened is something I'll keep to myself;;; it was horrific;;;)

- Decided to retry with 500,000 rows.
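
For reference, the actual on-disk size of each import can be compared straight on HDFS to confirm the effective compression ratio. This is only a sketch: the first path below (the target dir of the uncompressed run) is an assumption - substitute whatever --target-dir was actually used for that case.

# compare total bytes written per target directory (uncompressed vs. Snappy text)
hdfs dfs -du -s -h /dev/data2_txt      # assumed target dir of the uncompressed run
hdfs dfs -du -s -h /dev/data2_sn_txt   # Snappy-compressed text run from case 2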




Posted by ORACLE,DBA,BIG,DATA,JAVA 흑풍전설

The four file formats available when loading data into Hadoop with Sqoop


* When moving data from Oracle into Hadoop I had been using Flume, because up to now it had to go in as close to real time as possible,

but batch-style jobs don't really need Flume.

On the project I'm currently on, the loading into Hadoop was done with Sqoop.


Sqoop was not particularly difficult: you push the data with commands from the shell, which I personally found very convenient. I just don't know yet whether it can be customized the way Flume can.


The source is

a plain heap table in Oracle.

After creating a test table, I inserted 1,000 rows of temporary data.


CREATE TABLE HDFS_4 

 (

  ID VARCHAR(100),

  NUM NUMBER(10),

  TEST VARCHAR(100),

  REG_DT DATE DEFAULT SYSDATE 

);


INSERT INTO HDFS_4

SELECT 'USR_'||Dbms_Random.string('A',5),

       Trunc(Dbms_Random.Value(10000,90000)),

       Dbms_Random.string('A',100),

       SYSDATE + To_Number(Dbms_Random.Value(1,30)) 

FROM DUAL

CONNECT BY LEVEL <= 1000;
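
Before running the imports, the row count can be double-checked from the Sqoop side as well; a minimal sketch using sqoop eval, assuming the same connection string and account as the import commands below.

sqoop eval --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 \
  --query "SELECT COUNT(*) FROM HDFS_4"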








1. text plain 


Command executed:

sqoop import --target-dir=/dev/test2_text --query='select *from HDFS_4 where $CONDITIONS' -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ID


Execution result:

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/test2_text --query='select *from HDFS_4 where $CONDITIONS' -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ID

16/08/10 09:09:28 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 09:09:28 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 09:09:28 INFO manager.SqlManager: Using default fetchSize of 1000

16/08/10 09:09:28 INFO tool.CodeGenTool: Beginning code generation

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 09:09:29 INFO manager.OracleManager: Time zone has been set to GMT

16/08/10 09:09:29 INFO manager.SqlManager: Executing SQL statement: select *from HDFS_4 where  (1 = 0) 

16/08/10 09:09:29 INFO manager.SqlManager: Executing SQL statement: select *from HDFS_4 where  (1 = 0) 

16/08/10 09:09:30 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/284d77af5917fa2113d961ae72341cc4/QueryResult.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 09:09:31 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/284d77af5917fa2113d961ae72341cc4/QueryResult.jar

16/08/10 09:09:31 INFO mapreduce.ImportJobBase: Beginning query import.

16/08/10 09:09:33 INFO impl.TimelineClientImpl: Timeline service address: http://amb3.local:8188/ws/v1/timeline/

16/08/10 09:09:33 INFO client.RMProxy: Connecting to ResourceManager at amb3.local/192.168.0.143:8050

16/08/10 09:09:35 INFO db.DBInputFormat: Using read commited transaction isolation

16/08/10 09:09:35 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(ID), MAX(ID) FROM (select *from HDFS_4 where  (1 = 1) ) t1

16/08/10 09:09:35 WARN db.TextSplitter: Generating splits for a textual index column.

16/08/10 09:09:35 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.

16/08/10 09:09:35 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.

16/08/10 09:09:35 INFO mapreduce.JobSubmitter: number of splits:4

16/08/10 09:09:35 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470728284233_0007

16/08/10 09:09:35 INFO impl.YarnClientImpl: Submitted application application_1470728284233_0007

16/08/10 09:09:35 INFO mapreduce.Job: The url to track the job: http://amb3.local:8088/proxy/application_1470728284233_0007/

16/08/10 09:09:35 INFO mapreduce.Job: Running job: job_1470728284233_0007

16/08/10 09:09:45 INFO mapreduce.Job: Job job_1470728284233_0007 running in uber mode : false

16/08/10 09:09:45 INFO mapreduce.Job:  map 0% reduce 0%

16/08/10 09:09:53 INFO mapreduce.Job:  map 25% reduce 0%

16/08/10 09:09:55 INFO mapreduce.Job:  map 75% reduce 0%

16/08/10 09:09:56 INFO mapreduce.Job:  map 100% reduce 0%

16/08/10 09:09:56 INFO mapreduce.Job: Job job_1470728284233_0007 completed successfully

16/08/10 09:09:56 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=606092

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=489

HDFS: Number of bytes written=139000

HDFS: Number of read operations=16

HDFS: Number of large read operations=0

HDFS: Number of write operations=8

Job Counters 

Launched map tasks=4

Other local map tasks=4

Total time spent by all maps in occupied slots (ms)=57580

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=28790

Total vcore-seconds taken by all map tasks=28790

Total megabyte-seconds taken by all map tasks=44221440

Map-Reduce Framework

Map input records=1000

Map output records=1000

Input split bytes=489

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=2674

CPU time spent (ms)=38860

Physical memory (bytes) snapshot=1292296192

Virtual memory (bytes) snapshot=13235789824

Total committed heap usage (bytes)=689438720

File Input Format Counters 

Bytes Read=0

File Output Format Counters 

Bytes Written=139000

16/08/10 09:09:56 INFO mapreduce.ImportJobBase: Transferred 135.7422 KB in 23.6533 seconds (5.7388 KB/sec)

16/08/10 09:09:56 INFO mapreduce.ImportJobBase: Retrieved 1000 records.

[hdfs@amb2 ~]$ 



Viewing it with cat, the contents are all readable.

You can see that the columns are separated by commas.

It is stored in the same form as a CSV.
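
For reference, the quickest way to peek at the imported text files is straight from the HDFS shell; a sketch, assuming the usual part-m-NNNNN file naming under the target directory.

hdfs dfs -ls /dev/test2_text
hdfs dfs -cat /dev/test2_text/part-m-00000 | head -n 5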



2. AVRO 


Command executed:

sqoop import --target-dir=/dev/test2_avro --query='select *from HDFS_4 where $CONDITIONS' --as-avrodatafile  -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ID


Execution result:

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/test2_avro --query='select *from HDFS_4 where $CONDITIONS' --as-avrodatafile  -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ID

16/08/10 09:15:04 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 09:15:04 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 09:15:04 INFO manager.SqlManager: Using default fetchSize of 1000

16/08/10 09:15:04 INFO tool.CodeGenTool: Beginning code generation

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 09:15:05 INFO manager.OracleManager: Time zone has been set to GMT

16/08/10 09:15:05 INFO manager.SqlManager: Executing SQL statement: select *from HDFS_4 where  (1 = 0) 

16/08/10 09:15:05 INFO manager.SqlManager: Executing SQL statement: select *from HDFS_4 where  (1 = 0) 

16/08/10 09:15:05 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/837dbbb2e304900b1151d2fa6186b0b7/QueryResult.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 09:15:07 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/837dbbb2e304900b1151d2fa6186b0b7/QueryResult.jar

16/08/10 09:15:07 INFO mapreduce.ImportJobBase: Beginning query import.

16/08/10 09:15:08 INFO manager.OracleManager: Time zone has been set to GMT

16/08/10 09:15:08 INFO manager.SqlManager: Executing SQL statement: select *from HDFS_4 where  (1 = 0) 

16/08/10 09:15:08 INFO manager.SqlManager: Executing SQL statement: select *from HDFS_4 where  (1 = 0) 

16/08/10 09:15:08 INFO mapreduce.DataDrivenImportJob: Writing Avro schema file: /tmp/sqoop-hdfs/compile/837dbbb2e304900b1151d2fa6186b0b7/QueryResult.avsc

16/08/10 09:15:08 INFO impl.TimelineClientImpl: Timeline service address: http://amb3.local:8188/ws/v1/timeline/

16/08/10 09:15:08 INFO client.RMProxy: Connecting to ResourceManager at amb3.local/192.168.0.143:8050

16/08/10 09:15:10 INFO db.DBInputFormat: Using read commited transaction isolation

16/08/10 09:15:10 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(ID), MAX(ID) FROM (select *from HDFS_4 where  (1 = 1) ) t1

16/08/10 09:15:10 WARN db.TextSplitter: Generating splits for a textual index column.

16/08/10 09:15:10 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.

16/08/10 09:15:10 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.

16/08/10 09:15:10 INFO mapreduce.JobSubmitter: number of splits:4

16/08/10 09:15:10 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470728284233_0008

16/08/10 09:15:11 INFO impl.YarnClientImpl: Submitted application application_1470728284233_0008

16/08/10 09:15:11 INFO mapreduce.Job: The url to track the job: http://amb3.local:8088/proxy/application_1470728284233_0008/

16/08/10 09:15:11 INFO mapreduce.Job: Running job: job_1470728284233_0008

16/08/10 09:15:18 INFO mapreduce.Job: Job job_1470728284233_0008 running in uber mode : false

16/08/10 09:15:18 INFO mapreduce.Job:  map 0% reduce 0%

16/08/10 09:15:26 INFO mapreduce.Job:  map 25% reduce 0%

16/08/10 09:15:27 INFO mapreduce.Job:  map 100% reduce 0%

16/08/10 09:15:28 INFO mapreduce.Job: Job job_1470728284233_0008 completed successfully

16/08/10 09:15:28 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=607148

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=489

HDFS: Number of bytes written=130230

HDFS: Number of read operations=16

HDFS: Number of large read operations=0

HDFS: Number of write operations=8

Job Counters 

Launched map tasks=4

Other local map tasks=4

Total time spent by all maps in occupied slots (ms)=49984

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=24992

Total vcore-seconds taken by all map tasks=24992

Total megabyte-seconds taken by all map tasks=38387712

Map-Reduce Framework

Map input records=1000

Map output records=1000

Input split bytes=489

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=1696

CPU time spent (ms)=38110

Physical memory (bytes) snapshot=1327677440

Virtual memory (bytes) snapshot=13322395648

Total committed heap usage (bytes)=721420288

File Input Format Counters 

Bytes Read=0

File Output Format Counters 

Bytes Written=130230

16/08/10 09:15:28 INFO mapreduce.ImportJobBase: Transferred 127.1777 KB in 20.4259 seconds (6.2263 KB/sec)

16/08/10 09:15:28 INFO mapreduce.ImportJobBase: Retrieved 1000 records.

[hdfs@amb2 ~]$ 



Viewing it with cat, the area around the field boundaries looks slightly garbled, since an Avro data file is a binary container.
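
To read the Avro records as text, the file can be dumped to JSON with avro-tools; only a sketch, assuming an avro-tools jar is available locally (the jar version/path and the part file name are assumptions).

hdfs dfs -get /dev/test2_avro/part-m-00000.avro .
java -jar avro-tools-1.7.7.jar tojson part-m-00000.avro | head -n 5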



3. Sequence file


Command executed:

sqoop import --target-dir=/dev/test2_seq --query='select *from HDFS_4 where $CONDITIONS' --as-sequencefile  -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ID


Execution result:

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/test2_seq --query='select *from HDFS_4 where $CONDITIONS' --as-sequencefile  -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ID

16/08/10 09:22:31 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 09:22:31 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 09:22:31 INFO manager.SqlManager: Using default fetchSize of 1000

16/08/10 09:22:31 INFO tool.CodeGenTool: Beginning code generation

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 09:22:32 INFO manager.OracleManager: Time zone has been set to GMT

16/08/10 09:22:32 INFO manager.SqlManager: Executing SQL statement: select *from HDFS_4 where  (1 = 0) 

16/08/10 09:22:32 INFO manager.SqlManager: Executing SQL statement: select *from HDFS_4 where  (1 = 0) 

16/08/10 09:22:32 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/3a0d3133347de2dccea51b1c74f948bd/QueryResult.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 09:22:34 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/3a0d3133347de2dccea51b1c74f948bd/QueryResult.jar

16/08/10 09:22:34 INFO mapreduce.ImportJobBase: Beginning query import.

16/08/10 09:22:36 INFO impl.TimelineClientImpl: Timeline service address: http://amb3.local:8188/ws/v1/timeline/

16/08/10 09:22:36 INFO client.RMProxy: Connecting to ResourceManager at amb3.local/192.168.0.143:8050

16/08/10 09:22:38 INFO db.DBInputFormat: Using read commited transaction isolation

16/08/10 09:22:38 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(ID), MAX(ID) FROM (select *from HDFS_4 where  (1 = 1) ) t1

16/08/10 09:22:38 WARN db.TextSplitter: Generating splits for a textual index column.

16/08/10 09:22:38 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.

16/08/10 09:22:38 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.

16/08/10 09:22:38 INFO mapreduce.JobSubmitter: number of splits:4

16/08/10 09:22:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470728284233_0009

16/08/10 09:22:38 INFO impl.YarnClientImpl: Submitted application application_1470728284233_0009

16/08/10 09:22:39 INFO mapreduce.Job: The url to track the job: http://amb3.local:8088/proxy/application_1470728284233_0009/

16/08/10 09:22:39 INFO mapreduce.Job: Running job: job_1470728284233_0009

16/08/10 09:22:47 INFO mapreduce.Job: Job job_1470728284233_0009 running in uber mode : false

16/08/10 09:22:47 INFO mapreduce.Job:  map 0% reduce 0%

16/08/10 09:22:54 INFO mapreduce.Job:  map 25% reduce 0%

16/08/10 09:22:55 INFO mapreduce.Job:  map 100% reduce 0%

16/08/10 09:22:56 INFO mapreduce.Job: Job job_1470728284233_0009 completed successfully

16/08/10 09:22:56 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=605572

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=489

HDFS: Number of bytes written=157788

HDFS: Number of read operations=16

HDFS: Number of large read operations=0

HDFS: Number of write operations=8

Job Counters 

Launched map tasks=4

Other local map tasks=4

Total time spent by all maps in occupied slots (ms)=48586

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=24293

Total vcore-seconds taken by all map tasks=24293

Total megabyte-seconds taken by all map tasks=37314048

Map-Reduce Framework

Map input records=1000

Map output records=1000

Input split bytes=489

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=765

CPU time spent (ms)=27050

Physical memory (bytes) snapshot=1312333824

Virtual memory (bytes) snapshot=13253160960

Total committed heap usage (bytes)=707788800

File Input Format Counters 

Bytes Read=0

File Output Format Counters 

Bytes Written=157788

16/08/10 09:22:56 INFO mapreduce.ImportJobBase: Transferred 154.0898 KB in 20.6491 seconds (7.4623 KB/sec)

16/08/10 09:22:56 INFO mapreduce.ImportJobBase: Retrieved 1000 records.

[hdfs@amb2 ~]$ 


Viewing it with cat, the file declares right at the top that it is a sequence file, followed by what looks like slightly garbled characters.
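
A sequence file can be decoded directly with the -text option of the HDFS shell, which understands the SequenceFile header; a sketch, with the part file name assumed.

hdfs dfs -text /dev/test2_seq/part-m-00000 | head -n 5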



4. Parquet files (however it's pronounced)


Command executed:

* The Sqoop version under test is 1.4.6. If you look at the command, it is a bit different from the ones above.

The commands above all issued a query, but this one was changed to specify a table.

There is currently a bug, so I changed it as shown below; searching at the time of the test, a patch file does exist.

But since I installed the whole stack through Ambari, tracking it down and patching it by hand is a hassle, so I'll just wait for 1.4.7. Haha;

sqoop import --target-dir=/dev/test2_pq --table HDFS_4 --as-parquetfile -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ID


Below is the error that occurs when running it with a query: a Java NullPointerException is thrown.

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/test2_pq --query='select *from HDFS_4 where $CONDITIONS' --as-parquetfile -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ID

16/08/10 09:29:29 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 09:29:29 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 09:29:29 INFO manager.SqlManager: Using default fetchSize of 1000

16/08/10 09:29:29 INFO tool.CodeGenTool: Beginning code generation

16/08/10 09:29:29 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException

java.lang.NullPointerException

at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:97)

at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)

at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)

at org.apache.sqoop.Sqoop.run(Sqoop.java:148)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)

at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184)

at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226)

at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235)

at org.apache.sqoop.Sqoop.main(Sqoop.java:244)

[hdfs@amb2 ~]$


Execution result:

When run as a table import, it errors out mid-run as shown below.

Damn~ skimming the message, it looks like a data type problem, so I decided to create a view over the existing table that only changes the data type and import that view instead.

16/08/10 09:31:25 INFO mapreduce.Job: Task Id : attempt_1470728284233_0010_m_000002_1, Status : FAILED

Error: org.apache.avro.UnresolvedUnionException: Not in union ["null","long"]: 2016-08-29 21:09:28.0

at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:561)

at org.apache.avro.generic.GenericData.deepCopy(GenericData.java:941)

at org.apache.avro.generic.GenericData.deepCopy(GenericData.java:922)

at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat$DatasetRecordWriter.copy(DatasetKeyOutputFormat.java:327)

at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat$DatasetRecordWriter.write(DatasetKeyOutputFormat.java:321)

at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat$DatasetRecordWriter.write(DatasetKeyOutputFormat.java:300)

at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:658)

at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)

at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)

at org.apache.sqoop.mapreduce.ParquetImportMapper.map(ParquetImportMapper.java:70)

at org.apache.sqoop.mapreduce.ParquetImportMapper.map(ParquetImportMapper.java:39)

at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)

at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)

at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)

at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)

at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)

at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)


Container killed by the ApplicationMaster.

Container killed on request. Exit code is 143

Container exited with a non-zero exit code 143


Re-run after creating a view to work around the error above

CREATE OR REPLACE VIEW VW_HDFS_4

AS

SELECT ID,NUM,TEST,To_Char(REG_DT,'YYYY-MM-DD HH24:MI:SS') REG_DT

  FROM HDFS_4;
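
Before re-running the import, the view can be checked quickly from the Sqoop side; a minimal sketch with sqoop eval (same connection string and account as above).

sqoop eval --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 \
  --query "SELECT * FROM VW_HDFS_4 WHERE ROWNUM <= 3"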


Execution result - in the box below, the first run is the one that errored (shown in black text on the original post) and the second, colored run is the one that completed normally.

The error says the directory and its .metadata had already been created on Hadoop before this run, so they already exist.

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/test2_pq --table VW_HDFS_4 --as-parquetfile -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ID

16/08/10 09:36:33 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 09:36:33 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 09:36:33 INFO manager.SqlManager: Using default fetchSize of 1000

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 09:36:34 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop will not process this sqoop connection, as "FLASHONE"."VW_HDFS_4" is not an Oracle table, it's a VIEW.

16/08/10 09:36:34 INFO tool.CodeGenTool: Beginning code generation

16/08/10 09:36:34 INFO tool.CodeGenTool: Will generate java class as codegen_VW_HDFS_4

16/08/10 09:36:35 INFO manager.OracleManager: Time zone has been set to GMT

16/08/10 09:36:35 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "VW_HDFS_4" t WHERE 1=0

16/08/10 09:36:35 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/95f85eece904f411d74c356ac450d5b5/codegen_VW_HDFS_4.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 09:36:36 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/95f85eece904f411d74c356ac450d5b5/codegen_VW_HDFS_4.jar

16/08/10 09:36:36 INFO mapreduce.ImportJobBase: Beginning import of VW_HDFS_4

16/08/10 09:36:37 INFO manager.OracleManager: Time zone has been set to GMT

16/08/10 09:36:38 INFO manager.OracleManager: Time zone has been set to GMT

16/08/10 09:36:38 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "VW_HDFS_4" t WHERE 1=0

16/08/10 09:36:38 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetExistsException: Descriptor directory already exists: hdfs://amb2.local:8020/dev/test2_pq/.metadata

org.kitesdk.data.DatasetExistsException: Descriptor directory already exists: hdfs://amb2.local:8020/dev/test2_pq/.metadata

at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.create(FileSystemMetadataProvider.java:219)

at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.create(FileSystemDatasetRepository.java:137)

at org.kitesdk.data.Datasets.create(Datasets.java:239)

at org.kitesdk.data.Datasets.create(Datasets.java:307)

at org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:107)

at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:89)

at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:108)

at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260)

at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)

at org.apache.sqoop.manager.OracleManager.importTable(OracleManager.java:445)

at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)

at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)

at org.apache.sqoop.Sqoop.run(Sqoop.java:148)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)

at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184)

at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226)

at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235)

at org.apache.sqoop.Sqoop.main(Sqoop.java:244)



[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/test2_pq2 --table VW_HDFS_4 --as-parquetfile -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ID

16/08/10 09:36:56 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 09:36:56 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 09:36:56 INFO manager.SqlManager: Using default fetchSize of 1000

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 09:36:57 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop will not process this sqoop connection, as "FLASHONE"."VW_HDFS_4" is not an Oracle table, it's a VIEW.

16/08/10 09:36:57 INFO tool.CodeGenTool: Beginning code generation

16/08/10 09:36:57 INFO tool.CodeGenTool: Will generate java class as codegen_VW_HDFS_4

16/08/10 09:36:57 INFO manager.OracleManager: Time zone has been set to GMT

16/08/10 09:36:57 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "VW_HDFS_4" t WHERE 1=0

16/08/10 09:36:57 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/586ec1c10f04fc4c1fdd5986c9fae96b/codegen_VW_HDFS_4.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 09:36:59 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/586ec1c10f04fc4c1fdd5986c9fae96b/codegen_VW_HDFS_4.jar

16/08/10 09:36:59 INFO mapreduce.ImportJobBase: Beginning import of VW_HDFS_4

16/08/10 09:36:59 INFO manager.OracleManager: Time zone has been set to GMT

16/08/10 09:37:00 INFO manager.OracleManager: Time zone has been set to GMT

16/08/10 09:37:00 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "VW_HDFS_4" t WHERE 1=0

16/08/10 09:37:02 INFO impl.TimelineClientImpl: Timeline service address: http://amb3.local:8188/ws/v1/timeline/

16/08/10 09:37:02 INFO client.RMProxy: Connecting to ResourceManager at amb3.local/192.168.0.143:8050

16/08/10 09:37:04 INFO db.DBInputFormat: Using read commited transaction isolation

16/08/10 09:37:04 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN("ID"), MAX("ID") FROM "VW_HDFS_4"

16/08/10 09:37:04 WARN db.TextSplitter: Generating splits for a textual index column.

16/08/10 09:37:04 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.

16/08/10 09:37:04 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.

16/08/10 09:37:04 INFO mapreduce.JobSubmitter: number of splits:4

16/08/10 09:37:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470728284233_0011

16/08/10 09:37:04 INFO impl.YarnClientImpl: Submitted application application_1470728284233_0011

16/08/10 09:37:04 INFO mapreduce.Job: The url to track the job: http://amb3.local:8088/proxy/application_1470728284233_0011/

16/08/10 09:37:04 INFO mapreduce.Job: Running job: job_1470728284233_0011

16/08/10 09:37:11 INFO mapreduce.Job: Job job_1470728284233_0011 running in uber mode : false

16/08/10 09:37:11 INFO mapreduce.Job:  map 0% reduce 0%

16/08/10 09:37:20 INFO mapreduce.Job:  map 25% reduce 0%

16/08/10 09:37:21 INFO mapreduce.Job:  map 100% reduce 0%

16/08/10 09:37:22 INFO mapreduce.Job: Job job_1470728284233_0011 completed successfully

16/08/10 09:37:22 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=610560

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=28037

HDFS: Number of bytes written=137608

HDFS: Number of read operations=200

HDFS: Number of large read operations=0

HDFS: Number of write operations=36

Job Counters 

Launched map tasks=4

Other local map tasks=4

Total time spent by all maps in occupied slots (ms)=60614

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=30307

Total vcore-seconds taken by all map tasks=30307

Total megabyte-seconds taken by all map tasks=46551552

Map-Reduce Framework

Map input records=1000

Map output records=1000

Input split bytes=505

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=1345

CPU time spent (ms)=41390

Physical memory (bytes) snapshot=1502236672

Virtual memory (bytes) snapshot=13242277888

Total committed heap usage (bytes)=828375040

File Input Format Counters 

Bytes Read=0

File Output Format Counters 

Bytes Written=0

16/08/10 09:37:22 INFO mapreduce.ImportJobBase: Transferred 134.3828 KB in 21.319 seconds (6.3034 KB/sec)

16/08/10 09:37:22 INFO mapreduce.ImportJobBase: Retrieved 1000 records.

[hdfs@amb2 ~]$


Viewing it with cat, the first screenshot is the start of the file and the second screenshot is the tail end.

The type information is also written out, and looking a bit further you can see the text ~avro.schema.

As the search results suggested, the schema appears to use Avro.
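
To inspect the Parquet output in a more readable way than cat, parquet-tools can print the schema and the first rows; only a sketch, assuming a parquet-tools jar is available locally (the jar version and the part file name under the target dir are assumptions).

hdfs dfs -ls /dev/test2_pq2
hdfs dfs -get /dev/test2_pq2/<part-file>.parquet .
java -jar parquet-tools-1.6.0.jar schema <part-file>.parquet
java -jar parquet-tools-1.6.0.jar head -n 5 <part-file>.parquet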



On this project we intend to use text (CSV) and Parquet.

Now I need to check whether the other tools can read the data back properly when it is stored as Parquet. T_T;
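
One quick way to check whether other tools can read the Parquet data back is to put a Hive external table over the target directory. The sketch below is only an illustration: the table name is made up, the column types are a guess and have to match the actual Parquet/Avro schema that Sqoop wrote, and STORED AS PARQUET assumes a Hive version that supports it (the one shipped with HDP 2.4 does).

hive -e "CREATE EXTERNAL TABLE test2_pq_ext (id string, num bigint, test string, reg_dt string)
         STORED AS PARQUET
         LOCATION '/dev/test2_pq2'"
hive -e "SELECT * FROM test2_pq_ext LIMIT 5"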

Posted by ORACLE,DBA,BIG,DATA,JAVA 흑풍전설

Articles introducing Storm usually describe it as being for real-time data processing.

Anyway, it looks like we'll end up using it one way or another, so for now I'll just print a Hello World and pick up the basic usage.


Since it is built entirely in Java,

it doesn't quite feel like installing an application the way other open source projects do.



Version used: 0.9.6

Number of nodes: 6 ( 1 nimbus, 5 supervisors )

ZooKeeper version: 3.4.6 (the one we were already using ~ so I'll skip that part here)


Configuration changes

I only set the ZooKeeper servers, the local directory, the nimbus host, and the slot ports.

Only the storm.yaml file in the conf directory was modified, and the same file was applied to every node.


########### These MUST be filled in for a storm configuration

storm.zookeeper.servers:

     - "192.168.0.112"

     - "192.168.0.113"

     - "192.168.0.114"

     - "192.168.0.115"

     - "192.168.0.116"



storm.local.dir: "/apps/strm/data"


nimbus.host: "192.168.0.111"

supervisor.slots.ports:

    - 6700

    - 6701

    - 6702

    - 6703


ui.port: 6799



After configuring, if you start everything in the background,

you will see a Java process named nimbus on the master node

and supervisor on the remaining five nodes.

And on the master node (the node where nimbus runs) I also brought up the UI.


Everything was run in the background.


UI 화면 ( http://192.168.0.111:6799/index.html )




Source for the Hello World (written while following a book: Acorn's "Distributed Real-time Big Data Processing with Apache Storm". Reasonably helpful ㅋ) 

Storm's Hello World is apparently word count. Word count is the classic example for big data MapReduce tests as well, so same idea here.

(The source below is effectively single-node. For a cluster you have to submit through a different class; see the StormSubmitter sketch after the topology source below, with details in a later post;;;)


Word array used for the test (supposedly the US national anthem, roughly copied from a search)


Spout part

public class TestSpout extends BaseRichSpout 

{

private SpoutOutputCollector collector;

private String[] test_str = 

{

"Oh, say can you see, by the dawn’s early light,",

"What so proudly we hailed at the twilight’s last gleaming?",

"Whose broad stripes and bri",

"ght stars, through the perilous fight,",

"O’er the ramparts we watched, were so gallantly streaming?",

"And the rockets’red glare, the bombs bursting in air,",

"Gave proof through the night that our flag was still there.",

"Oh say, does that star spangled banner yet wave",

"O’er the land of the free and the home of the brave?"

};

private int index = 0;

/*  

    public static void main( String[] args )

    {

        System.out.println( "Hello World!" );

    }

*/

@Override

public void declareOutputFields(OutputFieldsDeclarer arg0) {

// TODO Auto-generated method stub

System.out.println("sput declareoutputfields!!!");

arg0.declare(new Fields("sentence"));

}


@Override

public void open(Map arg0, TopologyContext arg1, SpoutOutputCollector arg2) {

// TODO Auto-generated method stub

System.out.println("sput open!!!");

this.collector = arg2;

}

@Override

public void nextTuple() {

// TODO Auto-generated method stub

System.out.println("sput nexttuple!!!");

this.collector.emit(new Values(test_str[index]));

index++;

System.out.println("sput nexttuple =======" + index);

System.out.println(index + "<<<<<<<<<<<<<<<<<<<=============================>>>>>>>>>>>>>>>>>>>>>" + test_str.length);

if ( index >= test_str.length)

{

index = 0;

System.out.println("sput nexttuple =======zero ===============" + 0);

}

waitForSeconds(1);

System.out.println("next tupple last");

}


public static void waitForSeconds(int seconds) {

        try {
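            // note: seconds * 1 sleeps for `seconds` milliseconds, not seconds, so the spout emits very quickly (which is why the word counts below get so large in 20 seconds)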

            Thread.sleep(seconds * 1);

        } catch (InterruptedException e) {

        }

    }


}


Bolt part (splitting the sentences into words)

public class TestBolt extends BaseRichBolt {


private OutputCollector collector;

@Override

public void declareOutputFields(OutputFieldsDeclarer arg0) {

// TODO Auto-generated method stub

System.out.println("==================== declare ====================");

arg0.declare(new Fields("word")); 

}

@Override

public void prepare(Map arg0, TopologyContext arg1, OutputCollector arg2) {

// TODO Auto-generated method stub

System.out.println("bolt prepare!!!");

this.collector = arg2;

}

@Override

public void execute(Tuple arg0) {

// TODO Auto-generated method stub

System.out.println("bolt execute!!!");

String sentence = arg0.getStringByField("sentence");

String[] words = sentence.split(" ");

for(String word: words)

{

this.collector.emit(new Values(word));

System.out.println("bolt execute!!! === >>>>>>>>>>>>>>>> " + word);

}

}

}


Second Bolt (counting the words coming from the split sentences)

public class Test2Bolt extends BaseRichBolt {


private OutputCollector collector;

private HashMap<String,Long> counts = null;

@Override

public void prepare(Map arg0, TopologyContext arg1, OutputCollector arg2) {

// TODO Auto-generated method stub

System.out.println("test2 prepare");

this.collector = arg2;

this.counts = new HashMap<String, Long>();

}

@Override

public void execute(Tuple arg0) {

// TODO Auto-generated method stub

System.out.println("test2 execute");

String word = arg0.getStringByField("word");

Long count = this.counts.get(word);

if(count == null)

{

count = 0L;

}

count++;

this.counts.put(word, count);

this.collector.emit(new Values(word,count));

}



@Override

public void declareOutputFields(OutputFieldsDeclarer arg0) {

// TODO Auto-generated method stub

System.out.println("test2 declareoutput");

arg0.declare( new Fields("word","count"));

}


}



Third Bolt (printing the result)

public class ResultBolt extends BaseRichBolt {


private HashMap<String,Long> counts = null;

@Override

public void execute(Tuple arg0) {

// TODO Auto-generated method stub

String word = arg0.getStringByField("word");

Long count = arg0.getLongByField("count");

this.counts.put(word, count);

}


@Override

public void prepare(Map arg0, TopologyContext arg1, OutputCollector arg2) {

// TODO Auto-generated method stub

this.counts = new HashMap<String, Long>();

}


@Override

public void declareOutputFields(OutputFieldsDeclarer arg0) {

// TODO Auto-generated method stub

System.out.println("====================== This is result declare ================");

}

public void cleanup()

{

System.out.println("====================== Result ========================");

List<String> keys = new ArrayList<String>();

keys.addAll(this.counts.keySet());

Collections.sort(keys);

for (String key : keys)

{

System.out.println("======> " + key + "=======>>>>>>" + this.counts.get(key));

}

System.out.println("====================== Result ========================");

}


}




Topology implementation

In Java terms, this seems to be the main.

Looking at the topology part: you register the spout instance, that feeds into the next bolt, which feeds into the next bolt, and that into the next one. That's how it appears to be wired.


If you have the book it probably says so there, but while the local test runs fine, in my case (servers built out like production and run as a cluster) it just kept running indefinitely.


So in the topology source below, the method that actually prints the result is ResultBolt's cleanup(),

and I call it directly as rs.cleanup(). Anyway, that seems to be the trick ㅋ


So if you have the book and are testing on an actual multi-node setup, you need to add that call or you won't see any output.


public class TestTolpo {


private static final String SENDENCE_SPOUT_ID = "test_spout_id";

private static final String SPLIT_BOLT_ID = "test_split_bolt_id";

private static final String COUNT_BOLT_ID = "test_count_bolt_id";

private static final String RESULT_BOLT_ID = "test_result_bolt_id";

private static final String TOPOL_NAME = "tpol_name";

public static void main(String[] args) throws Exception {

TestSpout ts = new TestSpout();

TestBolt tb1 = new TestBolt();

Test2Bolt tb2 = new Test2Bolt();

ResultBolt rs = new ResultBolt();

TopologyBuilder build = new TopologyBuilder();

System.out.println("======================== bolt set ========================== ");

build.setSpout(SENDENCE_SPOUT_ID, ts);

build.setBolt(SPLIT_BOLT_ID, tb1).shuffleGrouping(SENDENCE_SPOUT_ID);

build.setBolt(COUNT_BOLT_ID, tb2).fieldsGrouping(SPLIT_BOLT_ID, new Fields("word"));

build.setBolt(RESULT_BOLT_ID, rs).globalGrouping(COUNT_BOLT_ID);

Config cf = new Config();

System.out.println("======================== cluster regist ========================== ");

LocalCluster cluster = new LocalCluster();

cluster.submitTopology(TOPOL_NAME,cf,build.createTopology());

System.out.println("======================== submit ========================== ");

waitForSeconds(20);

System.out.println("======================== 10 s ========================== ");

cluster.killTopology(TOPOL_NAME);

rs.cleanup();

System.out.println("======================== kill ========================== ");

cluster.shutdown();

System.out.println("======================== shutdown ========================== ");

}

public static void waitForSeconds(int seconds) {

        try {

            Thread.sleep(seconds * 1000);

        } catch (InterruptedException e) {

        } 

    }

}
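
For reference, submitting to the actual cluster instead of LocalCluster goes through StormSubmitter. A minimal sketch reusing the same wiring as TestTolpo above; the worker count is an arbitrary choice, and the proper cluster run is for the next post:

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;

// Minimal sketch: same spout/bolt wiring as above, but registered with nimbus instead of a LocalCluster.
public class TestTolpoCluster {

    public static void main(String[] args) throws Exception {
        TopologyBuilder build = new TopologyBuilder();
        build.setSpout("test_spout_id", new TestSpout());
        build.setBolt("test_split_bolt_id", new TestBolt()).shuffleGrouping("test_spout_id");
        build.setBolt("test_count_bolt_id", new Test2Bolt()).fieldsGrouping("test_split_bolt_id", new Fields("word"));
        build.setBolt("test_result_bolt_id", new ResultBolt()).globalGrouping("test_count_bolt_id");

        Config cf = new Config();
        cf.setNumWorkers(2); // arbitrary: spread the topology over 2 worker processes

        // The topology keeps running until it is killed (./storm kill tpol_name),
        // which is also the point where ResultBolt.cleanup() finally fires.
        StormSubmitter.submitTopology("tpol_name", cf, build.createTopology());
    }
}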




Build the topology source above into a jar (I use Maven),

then register it with Storm and run it.

 ./storm jar /home/hadoop/chapter1-0.0.1-SNAPSHOT.jar storm.blueprint.chapter1.TestTolpo 


This then shows Java busily registering this and that and going through the whole startup process. ㅋ 

======================== bolt set ==========================

======================== cluster regist ==========================

2126 [main] INFO  org.apache.storm.zookeeper.ZooKeeper - Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT

2130 [main] INFO  org.apache.storm.zookeeper.ZooKeeper - Client environment:host.name=os1.local

2130 [main] INFO  org.apache.storm.zookeeper.ZooKeeper - Client environment:java.version=1.8.0_77

2130 [main] INFO  org.apache.storm.zookeeper.ZooKeeper - Client environment:java.vendor=Oracle Corporation

2131 [main] INFO  org.apache.storm.zookeeper.ZooKeeper - Client environment:java.home=/apps/j2se/jre

2131 [main] INFO  org.apache.storm.zookeeper.ZooKeeper - Client environment:java.class.path=/apps/strm/lib/storm-core-0.9.6.jar:/apps/strm/lib/ring-jetty-adapter-0.3.11.jar:/apps/strm/lib/clout-1.0.1.jar:/apps/strm/lib/joda-time-2.0.jar:/apps/strm/lib/

math.numeric-tower-0.0.1.jar:/apps/strm/lib/clojure-1.5.1.jar:/apps/strm/lib/jline-2.11.jar:/apps/strm/lib/compojure-1.1.3.jar:/apps/strm/lib/logback-core-1.0.13.jar:/apps/strm/lib/commons-codec-1.6.jar:/apps/strm/lib/jetty-6.1.26.jar:/apps/strm/lib/kr

yo-2.21.jar:/apps/strm/lib/ring-core-1.1.5.jar:/apps/strm/lib/disruptor-2.10.4.jar:/apps/strm/lib/commons-logging-1.1.3.jar:/apps/strm/lib/core.incubator-0.1.0.jar:/apps/strm/lib/logback-classic-1.0.13.jar:/apps/strm/lib/jetty-util-6.1.26.jar:/apps/str

m/lib/hiccup-0.3.6.jar:/apps/strm/lib/commons-fileupload-1.2.1.jar:/apps/strm/lib/objenesis-1.2.jar:/apps/strm/lib/commons-io-2.4.jar:/apps/strm/lib/json-simple-1.1.jar:/apps/strm/lib/commons-lang-2.5.jar:/apps/strm/lib/ring-devel-0.3.11.jar:/apps/strm

/lib/ring-servlet-0.3.11.jar:/apps/strm/lib/servlet-api-2.5.jar:/apps/strm/lib/tools.cli-0.2.4.jar:/apps/strm/lib/asm-4.0.jar:/apps/strm/lib/carbonite-1.4.0.jar:/apps/strm/lib/tools.logging-0.2.3.jar:/apps/strm/lib/tools.macro-0.1.0.jar:/apps/strm/lib/

slf4j-api-1.7.5.jar:/apps/strm/lib/snakeyaml-1.11.jar:/apps/strm/lib/clj-stacktrace-0.2.2.jar:/apps/strm/lib/clj-time-0.4.1.jar:/apps/strm/lib/chill-java-0.3.5.jar:/apps/strm/lib/reflectasm-1.07-shaded.jar:/apps/strm/lib/commons-exec-1.1.jar:/apps/strm

/lib/log4j-over-slf4j-1.6.6.jar:/apps/strm/lib/minlog-1.2.jar:/apps/strm/lib/jgrapht-core-0.9.0.jar:/home/hadoop/chapter1-0.0.1-SNAPSHOT.jar:/apps/strm/conf:/apps/strm/bin

2131 [main] INFO  org.apache.storm.zookeeper.ZooKeeper - Client environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib

2131 [main] INFO  org.apache.storm.zookeeper.ZooKeeper - Client environment:java.io.tmpdir=/tmp

2131 [main] INFO  org.apache.storm.zookeeper.ZooKeeper - Client environment:java.compiler=<NA>

2131 [main] INFO  org.apache.storm.zookeeper.ZooKeeper - Client environment:os.name=Linux

2131 [main] INFO  org.apache.storm.zookeeper.ZooKeeper - Client environment:os.arch=amd64

2131 [main] INFO  org.apache.storm.zookeeper.ZooKeeper - Client environment:os.version=3.10.0-327.10.1.el7.x86_64

2131 [main] INFO  org.apache.storm.zookeeper.ZooKeeper - Client environment:user.name=hadoop

2131 [main] INFO  org.apache.storm.zookeeper.ZooKeeper - Client environment:user.home=/home/hadoop

2131 [main] INFO  org.apache.storm.zookeeper.ZooKeeper - Client environment:user.dir=/apps/strm/bin

2144 [main] INFO  org.apache.storm.zookeeper.server.ZooKeeperServer - Server environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT

2145 [main] INFO  org.apache.storm.zookeeper.server.ZooKeeperServer - Server environment:host.name=os1.local

2145 [main] INFO  org.apache.storm.zookeeper.server.ZooKeeperServer - Server environment:java.version=1.8.0_77

2145 [main] INFO  org.apache.storm.zookeeper.server.ZooKeeperServer - Server environment:java.vendor=Oracle Corporation

2145 [main] INFO  org.apache.storm.zookeeper.server.ZooKeeperServer - Server environment:java.home=/apps/j2se/jre

2148 [main] INFO  org.apache.storm.zookeeper.server.ZooKeeperServer - Server environment:java.class.path=/apps/strm/lib/storm-core-0.9.6.jar:/apps/strm/lib/ring-jetty-adapter-0.3.11.jar:/apps/strm/lib/clout-1.0.1.jar:/apps/strm/lib/joda-time-2.0.jar:/a

pps/strm/lib/math.numeric-tower-0.0.1.jar:/apps/strm/lib/clojure-1.5.1.jar:/apps/strm/lib/jline-2.11.jar:/apps/strm/lib/compojure-1.1.3.jar:/apps/strm/lib/logback-core-1.0.13.jar:/apps/strm/lib/commons-codec-1.6.jar:/apps/strm/lib/jetty-6.1.26.jar:/app

s/strm/lib/kryo-2.21.jar:/apps/strm/lib/ring-core-1.1.5.jar:/apps/strm/lib/disruptor-2.10.4.jar:/apps/strm/lib/commons-logging-1.1.3.jar:/apps/strm/lib/core.incubator-0.1.0.jar:/apps/strm/lib/logback-classic-1.0.13.jar:/apps/strm/lib/jetty-util-6.1.26.

jar:/apps/strm/lib/hiccup-0.3.6.jar:/apps/strm/lib/commons-fileupload-1.2.1.jar:/apps/strm/lib/objenesis-1.2.jar:/apps/strm/lib/commons-io-2.4.jar:/apps/strm/lib/json-simple-1.1.jar:/apps/strm/lib/commons-lang-2.5.jar:/apps/strm/lib/ring-devel-0.3.11.j

ar:/apps/strm/lib/ring-servlet-0.3.11.jar:/apps/strm/lib/servlet-api-2.5.jar:/apps/strm/lib/tools.cli-0.2.4.jar:/apps/strm/lib/asm-4.0.jar:/apps/strm/lib/carbonite-1.4.0.jar:/apps/strm/lib/tools.logging-0.2.3.jar:/apps/strm/lib/tools.macro-0.1.0.jar:/a

pps/strm/lib/slf4j-api-1.7.5.jar:/apps/strm/lib/snakeyaml-1.11.jar:/apps/strm/lib/clj-stacktrace-0.2.2.jar:/apps/strm/lib/clj-time-0.4.1.jar:/apps/strm/lib/chill-java-0.3.5.jar:/apps/strm/lib/reflectasm-1.07-shaded.jar:/apps/strm/lib/commons-exec-1.1.j

ar:/apps/strm/lib/log4j-over-slf4j-1.6.6.jar:/apps/strm/lib/minlog-1.2.jar:/apps/strm/lib/jgrapht-core-0.9.0.jar:/home/hadoop/chapter1-0.0.1-SNAPSHOT.jar:/apps/strm/conf:/apps/strm/bin

2148 [main] INFO  org.apache.storm.zookeeper.server.ZooKeeperServer - Server environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib

2148 [main] INFO  org.apache.storm.zookeeper.server.ZooKeeperServer - Server environment:java.io.tmpdir=/tmp

2149 [main] INFO  org.apache.storm.zookeeper.server.ZooKeeperServer - Server environment:java.compiler=<NA>

2149 [main] INFO  org.apache.storm.zookeeper.server.ZooKeeperServer - Server environment:os.name=Linux

2149 [main] INFO  org.apache.storm.zookeeper.server.ZooKeeperServer - Server environment:os.arch=amd64

2149 [main] INFO  org.apache.storm.zookeeper.server.ZooKeeperServer - Server environment:os.version=3.10.0-327.10.1.el7.x86_64

2149 [main] INFO  org.apache.storm.zookeeper.server.ZooKeeperServer - Server environment:user.name=hadoop

2149 [main] INFO  org.apache.storm.zookeeper.server.ZooKeeperServer - Server environment:user.home=/home/hadoop

2149 [main] INFO  org.apache.storm.zookeeper.server.ZooKeeperServer - Server environment:user.dir=/apps/strm/bin

2993 [main] INFO  org.apache.storm.zookeeper.server.ZooKeeperServer - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /tmp/0ff07ea0-be81-44fb-a9e1-c813c71bce61/version-2 snapdir /tmp/0ff07ea0-be81-44fb-a9e1-c813

c71bce61/version-2

3007 [main] INFO  org.apache.storm.zookeeper.server.NIOServerCnxnFactory - binding to port 0.0.0.0/0.0.0.0:2000

3014 [main] INFO  backtype.storm.zookeeper - Starting inprocess zookeeper at port 2000 and dir /tmp/0ff07ea0-be81-44fb-a9e1-c813c71bce61

3284 [main] INFO  backtype.storm.daemon.nimbus - Starting Nimbus with conf {"dev.zookeeper.path" "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil, "topology.builtin.metrics.bucket.size.secs" 60, "topology.fall.back.on.java.serialization"

 true, "topology.max.error.report.per.interval" 5, "zmq.linger.millis" 0, "topology.skip.missing.kryo.registrations" true, "storm.messaging.netty.client_worker_threads" 1, "ui.childopts" "-Xmx768m", "storm.zookeeper.session.timeout" 20000, "nimbus.reas

sign" true, "topology.trident.batch.emit.interval.millis" 50, "storm.messaging.netty.flush.check.interval.ms" 10, "nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m", "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib", "topology.e

xecutor.send.buffer.size" 1024, "storm.local.dir" "/tmp/6ab89940-328b-4454-b360-4dbe4e7dabc3", "storm.messaging.netty.buffer_size" 5242880, "supervisor.worker.start.timeout.secs" 120, "topology.enable.message.timeouts" true, "nimbus.cleanup.inbox.freq.

secs" 600, "nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads" 64, "storm.meta.serialization.delegate" "backtype.storm.serialization.DefaultSerializationDelegate", "topology.worker.shared.thread.pool.size" 4, "nimbus.host" "192.168.0.111", "

storm.messaging.netty.min_wait_ms" 100, "storm.zookeeper.port" 2000, "transactional.zookeeper.port" nil, "topology.executor.receive.buffer.size" 1024, "transactional.zookeeper.servers" nil, "storm.zookeeper.root" "/storm", "storm.zookeeper.retry.interv

alceiling.millis" 30000, "supervisor.enable" true, "storm.messaging.netty.server_worker_threads" 1, "storm.zookeeper.servers" ["localhost"], "transactional.zookeeper.root" "/transactional", "topology.acker.executors" nil, "topology.transfer.buffer.size

" 1024, "topology.worker.childopts" nil, "drpc.queue.size" 128, "worker.childopts" "-Xmx768m", "supervisor.heartbeat.frequency.secs" 5, "topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, "drpc.port" 3772, "supervisor.monitor.frequency.secs" 3, "d

rpc.childopts" "-Xmx768m", "topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs" 3, "topology.tasks" nil, "storm.messaging.netty.



The middle part is really long, so skipping it ~ ㅡ_ㅡ;;;;;;



====================== Result ========================

======> And=======>>>>>>1213

======> Gave=======>>>>>>1213

======> Oh=======>>>>>>1212

======> Oh,=======>>>>>>1213

======> O’er=======>>>>>>2425

======> What=======>>>>>>1213

======> Whose=======>>>>>>1213

======> air,=======>>>>>>1213

======> and=======>>>>>>2425

======> at=======>>>>>>1213

======> banner=======>>>>>>1212

======> bombs=======>>>>>>1213

======> brave?=======>>>>>>1212

======> bri=======>>>>>>1213

======> broad=======>>>>>>1213

======> bursting=======>>>>>>1213

======> by=======>>>>>>1213

======> can=======>>>>>>1213

======> dawn’s=======>>>>>>1213

======> does=======>>>>>>1212

======> early=======>>>>>>1213

======> fight,=======>>>>>>1213

======> flag=======>>>>>>1213

======> free=======>>>>>>1212

======> gallantly=======>>>>>>1213

======> ght=======>>>>>>1213

======> glare,=======>>>>>>1213

======> gleaming?=======>>>>>>1213

======> hailed=======>>>>>>1213

======> home=======>>>>>>1212

======> in=======>>>>>>1213

======> land=======>>>>>>1212

======> last=======>>>>>>1213

======> light,=======>>>>>>1213

======> night=======>>>>>>1213

======> of=======>>>>>>2424

======> our=======>>>>>>1213

======> perilous=======>>>>>>1213

======> proof=======>>>>>>1213

======> proudly=======>>>>>>1213

======> ramparts=======>>>>>>1213

======> rockets’red=======>>>>>>1213

======> say=======>>>>>>1213

======> say,=======>>>>>>1212

======> see,=======>>>>>>1213

======> so=======>>>>>>2426

======> spangled=======>>>>>>1212

======> star=======>>>>>>1212

======> stars,=======>>>>>>1213

======> still=======>>>>>>1213

======> streaming?=======>>>>>>1213

======> stripes=======>>>>>>1213

======> that=======>>>>>>2425

======> the=======>>>>>>13339

======> there.=======>>>>>>1213

======> through=======>>>>>>2426

======> twilight’s=======>>>>>>1213

======> was=======>>>>>>1213

======> watched,=======>>>>>>1213

======> wave=======>>>>>>1212

======> we=======>>>>>>2426

======> were=======>>>>>>1213

======> yet=======>>>>>>1212

======> you=======>>>>>>1213

====================== Result ========================

55332 [Thread-4] INFO  backtype.storm.daemon.executor - Shut down executor test_result_bolt_id:[3 3]

55333 [Thread-4] INFO  backtype.storm.daemon.executor - Shutting down executor test_split_bolt_id:[4 4]

55334 [Thread-12-disruptor-executor[4 4]-send-queue] INFO  backtype.storm.util - Async loop interrupted!

55334 [Thread-13-test_split_bolt_id] INFO  backtype.storm.util - Async loop interrupted!

55335 [Thread-4] INFO  backtype.storm.daemon.executor - Shut down executor test_split_bolt_id:[4 4]

55336 [Thread-4] INFO  backtype.storm.daemon.executor - Shutting down executor test_spout_id:[5 5]

55339 [Thread-15-test_spout_id] INFO  backtype.storm.util - Async loop interrupted!

55339 [Thread-14-disruptor-executor[5 5]-send-queue] INFO  backtype.storm.util - Async loop interrupted!

55341 [Thread-4] INFO  backtype.storm.daemon.executor - Shut down executor test_spout_id:[5 5]

55342 [Thread-4] INFO  backtype.storm.daemon.executor - Shutting down executor __system:[-1 -1]

55342 [Thread-17-__system] INFO  backtype.storm.util - Async loop interrupted!

55343 [Thread-16-disruptor-executor[-1 -1]-send-queue] INFO  backtype.storm.util - Async loop interrupted!

55343 [Thread-4] INFO  backtype.storm.daemon.executor - Shut down executor __system:[-1 -1]

Storm seems hard to use until the concepts really click.

Posted by ORACLE,DBA,BIG,DATA,JAVA 흑풍전설

Damn Cassandra -- integrating it is annoying ㅋ

I tried HBase integration first, failed, got annoyed, tried Cassandra instead, and am leaving this record after getting it to work.


Spark version: 1.6.1 (using the build compiled for Hadoop, since I'm on Hadoop 2.7.2)

Cassandra version: 3.4 

Cassandra Spark Java connector version: 1.5

Number of nodes: 6


Cassandra installation is easy, so I'm only recording the bits I need to remember.

I'm still testing on 6 nodes~~~


For reference, every node has the spec below. (VMware spec ㅋ)


Cassandra configuration file changes.

Using the VMs above, each node got its own IP; only the settings that were changed are listed here.


Files you need to touch to set up Cassandra's distribution and replication


cassandra.yaml

* What to change: 

- listen_address : the node's own IP

- rpc_address : the node's own IP

- the seeds entry under seed_provider : the IPs of the participating nodes, e.g. "111.111.111.111,222.222.222.222"


cassandra-env.sh

* What to change:

- the hostname part of JVM_OPTS : the node's own IP


With just these changes, starting Cassandra gave me 6 nodes in 1 datacenter with 1 rack.


The two files below control things like how many datacenters and how many racks you want.


cassandra-rackdc.properties


cassandra-topology.properties


Checking that the Cassandra nodes are connected.

[hadoop@os1 bin]$ ./nodetool status

Datacenter: datacenter1

=======================

Status=Up/Down

|/ State=Normal/Leaving/Joining/Moving

--  Address        Load       Tokens       Owns (effective)  Host ID                               Rack

UN  192.168.0.111  201.03 KB  256          49.5%             e13656b7-1436-4011-bd1d-01042a2d75fc  rack1

UN  192.168.0.112  205.89 KB  256          47.7%             09ef262e-9013-4292-9ea8-98a32ebdadc1  rack1

UN  192.168.0.113  253.59 KB  256          47.7%             bad6d57a-3135-4dff-b546-21887d77aca6  rack1

UN  192.168.0.114  198.71 KB  256          49.3%             c2e5c7bc-b9a1-4751-9e6b-6179b20fc76f  rack1

UN  192.168.0.115  214 KB      256          53.8%             9f9d28e9-7e03-4653-a329-7aba65aaf5f0  rack1

UN  192.168.0.116  219.55 KB  256          52.1%             cffba9e0-19b3-4070-93ba-ecb4e6263e47  rack1


[hadoop@os1 bin]$ 


For reference, when a node goes down its status changes to DN;;;


Now to put some basic data into Cassandra.

* Since the config uses the node IP instead of localhost, just typing cqlsh can't find the server; you have to pass the IP ㅋ

* I only looked at the bold text in the manual, made something, and ran it ㅋ


[hadoop@os1 bin]$ ./cqlsh 192.168.0.111

Connected to CASDR at 192.168.0.111:9042.

[cqlsh 5.0.1 | Cassandra 3.4 | CQL spec 3.4.0 | Native protocol v4]

Use HELP for help.

cqlsh> create keyspace keytest with replication = {'class' :'SimpleStrategy','replication_factor' : 3};

cqlsh>

cqlsh> use keytest;

cqlsh:keytest> create table test_table ( id varchar,name varchar,primary key(id));          

cqlsh:keytest> insert into test_table (id,name) values('2','fasdlkffasdffajfkls');

cqlsh:keytest> select *from test_table;


 id | name

----+-------------------------------

  3 | fasdlkfasdfasdfaffasdffajfkls

  2 |           fasdlkffasdffajfkls

  1 |     fasdlkfjafkljfsdklfajfkls


(3 rows)

cqlsh:keytest> 



* Current setup: nodes 1 through 6, with Spark installed on node 1.

* The source is a combination of snippets floating around the internet, with a few modifications.

* Below is the source, thrown together at the console, that counts the rows of the Cassandra table.

* You'll see both the 1.6 and 1.5 connectors in the source because I was testing both; either one works on its own, so don't worry about it.

import org.apache.spark.SparkConf;

import org.apache.spark.api.java.JavaRDD;

import org.apache.spark.api.java.JavaSparkContext;


import com.datastax.spark.connector.japi.CassandraJavaUtil;

import com.datastax.spark.connector.japi.CassandraRow;




public class test3 {


        public static void main(String[] args)

        {


                SparkConf conf = new SparkConf(true)

                .set("spark.cassandra.connection.host","192.168.0.111")

                .setMaster("spark://192.168.0.111:7077")

                .setAppName("casr")

                .setJars(new String[]{"/apps/spark/lib/spark-cassandra-connector-java_2.10-1.6.0-M1.jar","/apps/spark/lib/spark-cassandra-connector_2.10-1.6.0-M1.jar"})

                .setSparkHome("/apps/spark");



                JavaSparkContext sc = new JavaSparkContext(conf);


                sc.addJar("/home/hadoop/classes/cassandra-driver-core-3.0.0.jar");

                sc.addJar("/home/hadoop/classes/guava-19.0.jar");

                JavaRDD<CassandraRow> cassandraRdd = CassandraJavaUtil.javaFunctions(sc)

                        .cassandraTable("keytest", "test_table")

                        .select("name");


                                        System.out.println("row count:"+ cassandraRdd.count()); // once it is in RDD, we can use RDD 



               }

}


Compile the above and run it, and it executes like this:

[hadoop@os1 ~]$ java test3

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/home/hadoop/classes/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/home/hadoop/classes/logback-classic-1.1.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/04/08 00:53:19 INFO SparkContext: Running Spark version 1.6.1

16/04/08 00:53:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

16/04/08 00:53:20 INFO SecurityManager: Changing view acls to: hadoop

16/04/08 00:53:20 INFO SecurityManager: Changing modify acls to: hadoop

16/04/08 00:53:20 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)

16/04/08 00:53:21 INFO Utils: Successfully started service 'sparkDriver' on port 54330.

16/04/08 00:53:22 INFO Slf4jLogger: Slf4jLogger started

16/04/08 00:53:22 INFO Remoting: Starting remoting

16/04/08 00:53:22 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.0.111:34441]

16/04/08 00:53:22 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 34441.

16/04/08 00:53:22 INFO SparkEnv: Registering MapOutputTracker

16/04/08 00:53:22 INFO SparkEnv: Registering BlockManagerMaster

16/04/08 00:53:22 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-e91fb4d5-8f1e-45b5-9631-2af967164aff

16/04/08 00:53:22 INFO MemoryStore: MemoryStore started with capacity 1077.8 MB

16/04/08 00:53:22 INFO SparkEnv: Registering OutputCommitCoordinator

16/04/08 00:53:23 INFO Utils: Successfully started service 'SparkUI' on port 4040.

16/04/08 00:53:23 INFO SparkUI: Started SparkUI at http://192.168.0.111:4040

16/04/08 00:53:23 INFO HttpFileServer: HTTP File server directory is /tmp/spark-d02efafe-ae00-4eaa-9cfc-0700e5848dd3/httpd-c8f30120-f92a-4ec9-922f-c201e94998af

16/04/08 00:53:23 INFO HttpServer: Starting HTTP Server

16/04/08 00:53:23 INFO Utils: Successfully started service 'HTTP file server' on port 48452.

16/04/08 00:53:23 INFO SparkContext: Added JAR /apps/spark/lib/spark-cassandra-connector-java_2.10-1.6.0-M1.jar at http://192.168.0.111:48452/jars/spark-cassandra-connector-java_2.10-1.6.0-M1.jar with timestamp 1460044403426

16/04/08 00:53:23 INFO SparkContext: Added JAR /apps/spark/lib/spark-cassandra-connector_2.10-1.6.0-M1.jar at http://192.168.0.111:48452/jars/spark-cassandra-connector_2.10-1.6.0-M1.jar with timestamp 1460044403435

16/04/08 00:53:23 INFO AppClient$ClientEndpoint: Connecting to master spark://192.168.0.111:7077...

16/04/08 00:53:23 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20160408005323-0013

16/04/08 00:53:23 INFO AppClient$ClientEndpoint: Executor added: app-20160408005323-0013/0 on worker-20160407175148-192.168.0.111-57563 (192.168.0.111:57563) with 8 cores

16/04/08 00:53:23 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160408005323-0013/0 on hostPort 192.168.0.111:57563 with 8 cores, 1024.0 MB RAM

16/04/08 00:53:23 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44940.

16/04/08 00:53:23 INFO NettyBlockTransferService: Server created on 44940

16/04/08 00:53:23 INFO BlockManagerMaster: Trying to register BlockManager

16/04/08 00:53:23 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.0.111:44940 with 1077.8 MB RAM, BlockManagerId(driver, 192.168.0.111, 44940)

16/04/08 00:53:23 INFO BlockManagerMaster: Registered BlockManager

16/04/08 00:53:23 INFO AppClient$ClientEndpoint: Executor updated: app-20160408005323-0013/0 is now RUNNING

16/04/08 00:53:24 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0

16/04/08 00:53:24 INFO SparkContext: Added JAR /home/hadoop/classes/cassandra-driver-core-3.0.0.jar at http://192.168.0.111:48452/jars/cassandra-driver-core-3.0.0.jar with timestamp 1460044404233

16/04/08 00:53:24 INFO SparkContext: Added JAR /home/hadoop/classes/guava-19.0.jar at http://192.168.0.111:48452/jars/guava-19.0.jar with timestamp 1460044404244

16/04/08 00:53:24 INFO NettyUtil: Found Netty's native epoll transport in the classpath, using it

16/04/08 00:53:25 INFO Cluster: New Cassandra host /192.168.0.111:9042 added

16/04/08 00:53:25 INFO Cluster: New Cassandra host /192.168.0.112:9042 added

16/04/08 00:53:25 INFO LocalNodeFirstLoadBalancingPolicy: Added host 192.168.0.112 (datacenter1)

16/04/08 00:53:25 INFO Cluster: New Cassandra host /192.168.0.113:9042 added

16/04/08 00:53:25 INFO LocalNodeFirstLoadBalancingPolicy: Added host 192.168.0.113 (datacenter1)

16/04/08 00:53:25 INFO Cluster: New Cassandra host /192.168.0.114:9042 added

16/04/08 00:53:25 INFO LocalNodeFirstLoadBalancingPolicy: Added host 192.168.0.114 (datacenter1)

16/04/08 00:53:25 INFO Cluster: New Cassandra host /192.168.0.115:9042 added

16/04/08 00:53:25 INFO LocalNodeFirstLoadBalancingPolicy: Added host 192.168.0.115 (datacenter1)

16/04/08 00:53:25 INFO Cluster: New Cassandra host /192.168.0.116:9042 added

16/04/08 00:53:25 INFO LocalNodeFirstLoadBalancingPolicy: Added host 192.168.0.116 (datacenter1)

16/04/08 00:53:25 INFO CassandraConnector: Connected to Cassandra cluster: CASDR

16/04/08 00:53:26 INFO SparkContext: Starting job: count at test3.java:44

16/04/08 00:53:26 INFO DAGScheduler: Got job 0 (count at test3.java:44) with 7 output partitions

16/04/08 00:53:26 INFO DAGScheduler: Final stage: ResultStage 0 (count at test3.java:44)

16/04/08 00:53:26 INFO DAGScheduler: Parents of final stage: List()

16/04/08 00:53:26 INFO DAGScheduler: Missing parents: List()

16/04/08 00:53:26 INFO DAGScheduler: Submitting ResultStage 0 (CassandraTableScanRDD[1] at RDD at CassandraRDD.scala:15), which has no missing parents

16/04/08 00:53:26 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 7.3 KB, free 7.3 KB)

16/04/08 00:53:26 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.8 KB, free 11.1 KB)

16/04/08 00:53:26 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.0.111:44940 (size: 3.8 KB, free: 1077.7 MB)

16/04/08 00:53:26 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006

16/04/08 00:53:26 INFO DAGScheduler: Submitting 7 missing tasks from ResultStage 0 (CassandraTableScanRDD[1] at RDD at CassandraRDD.scala:15)

16/04/08 00:53:26 INFO TaskSchedulerImpl: Adding task set 0.0 with 7 tasks

16/04/08 00:53:27 INFO SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (os1.local:56503) with ID 0

16/04/08 00:53:27 INFO BlockManagerMasterEndpoint: Registering block manager os1.local:53103 with 511.1 MB RAM, BlockManagerId(0, os1.local, 53103)

16/04/08 00:53:27 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, os1.local, partition 0,NODE_LOCAL, 26312 bytes)

16/04/08 00:53:27 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, os1.local, partition 1,NODE_LOCAL, 26222 bytes)

16/04/08 00:53:27 INFO TaskSetManager: Starting task 5.0 in stage 0.0 (TID 2, os1.local, partition 5,NODE_LOCAL, 26352 bytes)

16/04/08 00:53:28 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on os1.local:53103 (size: 3.8 KB, free: 511.1 MB)

16/04/08 00:53:31 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 3, os1.local, partition 2,ANY, 26312 bytes)

16/04/08 00:53:31 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 4, os1.local, partition 3,ANY, 21141 bytes)

16/04/08 00:53:31 INFO TaskSetManager: Starting task 4.0 in stage 0.0 (TID 5, os1.local, partition 4,ANY, 23882 bytes)

16/04/08 00:53:31 INFO TaskSetManager: Starting task 6.0 in stage 0.0 (TID 6, os1.local, partition 6,ANY, 10818 bytes)

16/04/08 00:53:32 INFO TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 1106 ms on os1.local (1/7)

16/04/08 00:53:33 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 5530 ms on os1.local (2/7)

16/04/08 00:53:33 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 4) in 1945 ms on os1.local (3/7)

16/04/08 00:53:33 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 6015 ms on os1.local (4/7)

16/04/08 00:53:33 INFO TaskSetManager: Finished task 5.0 in stage 0.0 (TID 2) in 5959 ms on os1.local (5/7)

16/04/08 00:53:33 INFO TaskSetManager: Finished task 4.0 in stage 0.0 (TID 5) in 2150 ms on os1.local (6/7)

16/04/08 00:53:33 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 3) in 2286 ms on os1.local (7/7)

16/04/08 00:53:33 INFO DAGScheduler: ResultStage 0 (count at test3.java:44) finished in 7.049 s

16/04/08 00:53:33 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 

16/04/08 00:53:33 INFO DAGScheduler: Job 0 finished: count at test3.java:44, took 7.435243 s

16/04/08 00:53:34 INFO CassandraConnector: Disconnected from Cassandra cluster: CASDR

16/04/08 00:53:35 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 192.168.0.111:44940 in memory (size: 3.8 KB, free: 1077.8 MB)

row count:3 <--------- here, this is the line we wanted ㅋ

16/04/08 00:53:35 INFO BlockManagerInfo: Removed broadcast_0_piece0 on os1.local:53103 in memory (size: 3.8 KB, free: 511.1 MB)

16/04/08 00:53:35 INFO SparkContext: Invoking stop() from shutdown hook

16/04/08 00:53:35 INFO ContextCleaner: Cleaned accumulator 1

16/04/08 00:53:35 INFO SparkUI: Stopped Spark web UI at http://192.168.0.111:4040

16/04/08 00:53:35 INFO SparkDeploySchedulerBackend: Shutting down all executors

16/04/08 00:53:35 INFO SparkDeploySchedulerBackend: Asking each executor to shut down

16/04/08 00:53:35 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!

16/04/08 00:53:35 INFO MemoryStore: MemoryStore cleared

16/04/08 00:53:35 INFO BlockManager: BlockManager stopped

16/04/08 00:53:35 INFO BlockManagerMaster: BlockManagerMaster stopped

16/04/08 00:53:35 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!

16/04/08 00:53:35 INFO SparkContext: Successfully stopped SparkContext

16/04/08 00:53:35 INFO ShutdownHookManager: Shutdown hook called

16/04/08 00:53:35 INFO ShutdownHookManager: Deleting directory /tmp/spark-d02efafe-ae00-4eaa-9cfc-0700e5848dd3/httpd-c8f30120-f92a-4ec9-922f-c201e94998af

16/04/08 00:53:35 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.

16/04/08 00:53:35 INFO ShutdownHookManager: Deleting directory /tmp/spark-d02efafe-ae00-4eaa-9cfc-0700e5848dd3

16/04/08 00:53:35 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.

[hadoop@os1 ~]$


Caveats

I tried everything to get that row count line above to print. ㅋ

The Spark and Cassandra used inside this class open their own connections to both and appear to run

with whatever configuration is set on the Java side.

In the Java source, the jar locations are added via setJars (plus addJar for the driver and guava jars).

Once I did that, the long stretch of flailing finally ended in success;;;; 

The kind of thing I keep forgetting no matter how long I've been doing server work. ㅎㅎ;;;
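
For reference, once the count works, pulling the actual rows back to the driver is the same pattern. A small sketch that continues from the cassandraRdd in test3 above (getString("name") assumes the column selected there):

// continues from test3 above: cassandraRdd already holds the selected "name" column
java.util.List<CassandraRow> rows = cassandraRdd.collect(); // fine for 3 rows; don't do this on a big table
for (CassandraRow row : rows) {
    System.out.println("name = " + row.getString("name"));
}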


Posted by ORACLE,DBA,BIG,DATA,JAVA 흑풍전설

An error that came up while developing/testing in Eclipse (run it and it fails immediately at SparkConf).

I actually dug through every jar and even decompiled them trying to find that method, then got fed up and blew a fuse;;


Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.$conforms()Lscala/Predef$$less$colon$less;

at org.apache.spark.util.Utils$.getSystemProperties(Utils.scala:1546)

at org.apache.spark.SparkConf.<init>(SparkConf.scala:59)

at spark1.test1.main(test1.java:20)


Solution: make the versions match ㅋㅋ

Reference: using spark-1.6.1 with scala-library 2.10.4 

The Maven artifacts were aligned as below (a quick version-check sketch follows the two dependency lists). 



Versions in use when the error occurred

<dependency>

  <groupId>org.apache.spark</groupId>

  <artifactId>spark-core_2.11</artifactId>

  <version>1.6.1</version>

  </dependency>

  <dependency>

  <groupId>org.apache.spark</groupId>

  <artifactId>spark-sql_2.11</artifactId>

  <version>1.6.1</version>

  </dependency>

  <dependency>

  <groupId>org.apache.spark</groupId>

  <artifactId>spark-streaming_2.11</artifactId>

  <version>1.6.1</version>

  </dependency>

  <dependency>

  <groupId>org.apache.spark</groupId>

  <artifactId>spark-mllib_2.11</artifactId>

  <version>1.6.1</version>

   </dependency>


After switching to these, the error disappeared 

 <dependency>

  <groupId>org.apache.spark</groupId>

  <artifactId>spark-core_2.10</artifactId>

  <version>1.6.1</version>

  </dependency>

  <dependency>

  <groupId>org.apache.spark</groupId>

  <artifactId>spark-sql_2.10</artifactId>

  <version>1.6.1</version>

  </dependency>

  <dependency>

  <groupId>org.apache.spark</groupId>

  <artifactId>spark-streaming_2.10</artifactId>

  <version>1.6.1</version>

  </dependency>

  <dependency>

  <groupId>org.apache.spark</groupId>

  <artifactId>spark-mllib_2.10</artifactId>

  <version>1.6.1</version>

  </dependency>
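
If you're not sure which Scala runtime is actually on the classpath, printing it from a tiny Java main and matching the artifact suffix (_2.10 vs _2.11) against it saves all the decompiling. A minimal sketch; versionString() is read from the scala-library jar itself:

// prints something like "version 2.10.4"; the spark-*_2.10 artifacts must match that 2.10.x line
public class ScalaVersionCheck {
    public static void main(String[] args) {
        System.out.println(scala.util.Properties.versionString());
    }
}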


Posted by ORACLE,DBA,BIG,DATA,JAVA 흑풍전설

Intro.

To get data out of HBase in SQL form, the first thing I used was Phoenix ~

===> http://phoenix.apache.org/

If you're interested, go have a look,

but I decided to switch. 

Digging around here and there, I found Spark.

People of a certain age will know there used to be a Korean adult magazine called SPARK.

Hmm~ -_-ㅋ;;; I did once buy it at a bookstore with a perfectly straight face.

Whether it still exists, I have neither the need nor the curiosity to find out;;;;;;;


Anyway, Spark supports Java, and also Python (which I've been meaning to study anyway), so I was eyeing it with interest,

and after seeing the researchers and developers fighting the good fight in this field rave about it, I started installing it going "what is this???".


Then for personal reasons I only got as far as setting it up, and am only now getting around to the Hello World.


Part 1

Prerequisite: Hadoop ~ ㅋㅋㅋ

For reference I'm on 2.7.2 (1 master + 5 data nodes)


Spark you obviously download from the Spark site.

At the time of writing, 1.6.1 is out (it was 1.5 when I first heard of it~).

Anyway ~ 

just download it and unpack it.

Since I'm attaching it straight to Hadoop, I grabbed the build that is already compiled for Hadoop.


1. Download

Spark site: http://spark.apache.org 

Go to the download menu~


and you'll see something like the screenshot below;


in item 2 I picked the Hadoop build as shown. (My Hadoop is 2.7.2; a newer version had just come out as I was writing this, so I upgraded right away.)



Part 2

1. Printing the Hello World.

If this were app development I'd just print something to the console, 

but this is data, so: put a sample file into Hadoop, check that file's row count, done.


2. Make any text file and put it into Hadoop.

Just in case, I'm recording how to put the file into Hadoop as well. Because I'm a blockhead ㅋ


$vi testman.txt 

blah blah~ 

blah blah~


After saving:

$hadoop fs -mkdir /test <-- create the directory

$hadoop fs -put testman.txt /test <-- push the local OS file into Hadoop



$cd <spark home directory>

$./bin/pyspark 

Python 2.7.5 (default, Nov 20 2015, 02:00:19) 

[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2

Type "help", "copyright", "credits" or "license" for more information.

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties

16/03/15 22:20:55 INFO SparkContext: Running Spark version 1.6.1

16/03/15 22:20:56 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

16/03/15 22:20:57 INFO SecurityManager: Changing view acls to: hadoop

16/03/15 22:20:57 INFO SecurityManager: Changing modify acls to: hadoop

16/03/15 22:20:57 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)

16/03/15 22:20:58 INFO Utils: Successfully started service 'sparkDriver' on port 49115.

16/03/15 22:20:59 INFO Slf4jLogger: Slf4jLogger started

16/03/15 22:20:59 INFO Remoting: Starting remoting

16/03/15 22:21:00 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.0.111:41015]

16/03/15 22:21:00 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 41015.

16/03/15 22:21:00 INFO SparkEnv: Registering MapOutputTracker

16/03/15 22:21:00 INFO SparkEnv: Registering BlockManagerMaster

16/03/15 22:21:00 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-b918f3eb-05d7-4f85-a0cb-678f3327e8b6

16/03/15 22:21:00 INFO MemoryStore: MemoryStore started with capacity 511.5 MB

16/03/15 22:21:00 INFO SparkEnv: Registering OutputCommitCoordinator

16/03/15 22:21:01 INFO Utils: Successfully started service 'SparkUI' on port 4040.

16/03/15 22:21:01 INFO SparkUI: Started SparkUI at http://192.168.0.111:4040

16/03/15 22:21:01 INFO Executor: Starting executor ID driver on host localhost

16/03/15 22:21:01 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 37043.

16/03/15 22:21:01 INFO NettyBlockTransferService: Server created on 37043

16/03/15 22:21:01 INFO BlockManagerMaster: Trying to register BlockManager

16/03/15 22:21:01 INFO BlockManagerMasterEndpoint: Registering block manager localhost:37043 with 511.5 MB RAM, BlockManagerId(driver, localhost, 37043)

16/03/15 22:21:01 INFO BlockManagerMaster: Registered BlockManager

Welcome to

      ____              __

     / __/__  ___ _____/ /__

    _\ \/ _ \/ _ `/ __/  '_/

   /__ / .__/\_,_/_/ /_/\_\   version 1.6.1

      /_/


Using Python version 2.7.5 (default, Nov 20 2015 02:00:19)

SparkContext available as sc, HiveContext available as sqlContext.

>>> xx = sc.hadoopFile("hdfs://192.168.0.111:9000/test/testman.txt","org.apache.hadoop.mapred.TextInputFormat","org.apache.hadoop.io.Text","org.apache.hadoop.io.LongWritable") <-- the docs say to write it like this ;;; so for now I just followed along (strictly, TextInputFormat's key class is LongWritable and its value class is Text, the reverse of what I passed, but the count below worked anyway)

16/03/15 22:30:19 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 153.6 KB, free 153.6 KB)

16/03/15 22:30:19 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 13.9 KB, free 167.5 KB)

16/03/15 22:30:19 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:37043 (size: 13.9 KB, free: 511.5 MB)

16/03/15 22:30:19 INFO SparkContext: Created broadcast 0 from hadoopFile at PythonRDD.scala:613

16/03/15 22:30:20 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 166.7 KB, free 334.2 KB)

16/03/15 22:30:20 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 13.0 KB, free 347.2 KB)

16/03/15 22:30:20 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:37043 (size: 13.0 KB, free: 511.5 MB)

16/03/15 22:30:20 INFO SparkContext: Created broadcast 1 from broadcast at PythonRDD.scala:570

16/03/15 22:30:21 INFO FileInputFormat: Total input paths to process : 1

16/03/15 22:30:21 INFO SparkContext: Starting job: take at SerDeUtil.scala:201

16/03/15 22:30:21 INFO DAGScheduler: Got job 0 (take at SerDeUtil.scala:201) with 1 output partitions

16/03/15 22:30:21 INFO DAGScheduler: Final stage: ResultStage 0 (take at SerDeUtil.scala:201)

16/03/15 22:30:21 INFO DAGScheduler: Parents of final stage: List()

16/03/15 22:30:21 INFO DAGScheduler: Missing parents: List()

16/03/15 22:30:21 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at PythonHadoopUtil.scala:181), which has no missing parents

16/03/15 22:30:21 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 3.3 KB, free 350.5 KB)

16/03/15 22:30:21 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1980.0 B, free 352.5 KB)

16/03/15 22:30:21 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:37043 (size: 1980.0 B, free: 511.5 MB)

16/03/15 22:30:21 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1006

16/03/15 22:30:21 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at PythonHadoopUtil.scala:181)

16/03/15 22:30:21 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks

16/03/15 22:30:21 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,ANY, 2144 bytes)

16/03/15 22:30:21 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)

16/03/15 22:30:21 INFO HadoopRDD: Input split: hdfs://192.168.0.111:9000/test/testman.txt:0+60

16/03/15 22:30:21 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id

16/03/15 22:30:21 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id

16/03/15 22:30:21 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap

16/03/15 22:30:21 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition

16/03/15 22:30:21 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id

16/03/15 22:30:22 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 2193 bytes result sent to driver

16/03/15 22:30:22 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 502 ms on localhost (1/1)

16/03/15 22:30:22 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 

16/03/15 22:30:22 INFO DAGScheduler: ResultStage 0 (take at SerDeUtil.scala:201) finished in 0.544 s

16/03/15 22:30:22 INFO DAGScheduler: Job 0 finished: take at SerDeUtil.scala:201, took 0.755415 s

>>> 16/03/15 22:30:45 INFO BlockManagerInfo: Removed broadcast_2_piece0 on localhost:37043 in memory (size: 1980.0 B, free: 511.5 MB)

16/03/15 22:30:45 INFO ContextCleaner: Cleaned accumulator 2


>>> xx.count() <-- print the count of what was just read

16/03/15 22:32:49 INFO SparkContext: Starting job: count at <stdin>:1

16/03/15 22:32:49 INFO DAGScheduler: Got job 1 (count at <stdin>:1) with 2 output partitions

16/03/15 22:32:49 INFO DAGScheduler: Final stage: ResultStage 1 (count at <stdin>:1)

16/03/15 22:32:49 INFO DAGScheduler: Parents of final stage: List()

16/03/15 22:32:49 INFO DAGScheduler: Missing parents: List()

16/03/15 22:32:49 INFO DAGScheduler: Submitting ResultStage 1 (PythonRDD[3] at count at <stdin>:1), which has no missing parents

16/03/15 22:32:49 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 6.0 KB, free 353.2 KB)

16/03/15 22:32:49 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 3.7 KB, free 356.9 KB)

16/03/15 22:32:49 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on localhost:37043 (size: 3.7 KB, free: 511.5 MB)

16/03/15 22:32:49 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1006

16/03/15 22:32:49 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 1 (PythonRDD[3] at count at <stdin>:1)

16/03/15 22:32:49 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks

16/03/15 22:32:49 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, partition 0,ANY, 2144 bytes)

16/03/15 22:32:49 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 2, localhost, partition 1,ANY, 2144 bytes)

16/03/15 22:32:49 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)

16/03/15 22:32:49 INFO Executor: Running task 1.0 in stage 1.0 (TID 2)

16/03/15 22:32:49 INFO HadoopRDD: Input split: hdfs://192.168.0.111:9000/test/testman.txt:0+60

16/03/15 22:32:49 INFO HadoopRDD: Input split: hdfs://192.168.0.111:9000/test/testman.txt:60+61

16/03/15 22:32:50 INFO PythonRunner: Times: total = 661, boot = 627, init = 33, finish = 1

16/03/15 22:32:50 INFO PythonRunner: Times: total = 633, boot = 616, init = 15, finish = 2

16/03/15 22:32:50 INFO Executor: Finished task 1.0 in stage 1.0 (TID 2). 2179 bytes result sent to driver

16/03/15 22:32:50 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 2) in 734 ms on localhost (1/2)

16/03/15 22:32:50 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 2179 bytes result sent to driver

16/03/15 22:32:50 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 750 ms on localhost (2/2)

16/03/15 22:32:50 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 

16/03/15 22:32:50 INFO DAGScheduler: ResultStage 1 (count at <stdin>:1) finished in 0.752 s

16/03/15 22:32:50 INFO DAGScheduler: Job 1 finished: count at <stdin>:1, took 0.778757 s

27  <--- the result is printed.

>>>


Conclusion 

Hello World done~ ㅋ


Note: 

In my case Spark sits on the Hadoop master, so the addresses are the same.

When you run pyspark above, you can see it mention a web admin(?) at http://192.168.0.111:4040/stages.

It's called SparkUI, a kind of monitoring tool.


In the clean UI below, the job that just read the file from Hadoop shows up. 

The count I ran shows up in the picture too~ 

Oh!!!!!!!!!! Nice~ ㅋㅋㅋㅋㅋㅋ 




Details of the count command I ran~ it even draws a pretty little diagram of "you did this, and this is what happened"~~~

The public write-ups floating around the internet, the "read tens of billions of rows on N nodes in so-many minutes" kind, present results that come out looking like this.

Not a bad design to drop straight into a report either ㅋ



TAG hadoop, spark
Posted by ORACLE,DBA,BIG,DATA,JAVA 흑풍전설

As the system grows, the number of servers grows......

and each server piles up logs on its own machine.

Then when something breaks, you end up digging through all of the logs because you can't be sure which server it was.


With many servers you can't check them one by one, so I set up collecting them in one place.


Since we already run a big data project, Hadoop was available, so I put it together as below.


Below is the log aggregation structure applied at the company. 



We have two servers; the logs coming out of both are collected and pushed into HDFS.



flume conf settings


Flume 1.6 was used, 


and the config is a mash-up of 2-3 examples found online, so if you look at it and think "this is identical!", you're right -_-;;;



flume.conf configured on the WAS side 

agent02.sources = execGenSrc

agent02.channels = memoryChannel

agent02.sinks = avroSink


# For each one of the sources, the type is defined

agent02.sources.execGenSrc.type = exec

agent02.sources.execGenSrc.command = tail -F Server.log

agent02.sources.execGenSrc.batchSize = 10


# The channel can be defined as follows.

agent02.sources.execGenSrc.channels = memoryChannel


# Each sink's type must be defined

agent02.sinks.avroSink.type = avro

agent02.sinks.avroSink.hostname = <aggregation server IP> 

agent02.sinks.avroSink.port = 33333

agent02.sinks.avroSink.batch-size = 10


#Specify the channel the sink should use

agent02.sinks.avroSink.channel = memoryChannel


# Each channel's type is defined.

agent02.channels.memoryChannel.type = memory


# Other config values specific to each type of channel(sink or source)

# can be defined as well

# In this case, it specifies the capacity of the memory channel

agent02.channels.memoryChannel.capacity = 100000

agent02.channels.memoryChannel.transactionCapacity = 10000


Run it like this:

$flume-ng agent --conf-file /usr/local/flume/conf/flume.conf --name agent02 &




flume.conf on the side that receives and aggregates the forwarded logs

(No spare server, so it just runs on the Hadoop master; once usage grows, this too should move to its own machine.)

* As you can see below, files are split into year/month/day directories

agent01.sources = avroGenSrc

agent01.channels = memoryChannel

agent01.sinks = HDFS


# For each one of the sources, the type is defined

agent01.sources.avroGenSrc.type = avro

agent01.sources.avroGenSrc.bind = <aggregation server IP>

agent01.sources.avroGenSrc.port = 33333


# The channel can be defined as follows.

agent01.sources.avroGenSrc.channels = memoryChannel


agent01.sinks.HDFS.type = HDFS

agent01.sinks.HDFS.hdfs.path = hdfs://<hadoop-master>:<port>/log/%Y/%m/%d 

agent01.sinks.HDFS.hdfs.fileType = DataStream

agent01.sinks.HDFS.hdfs.writeFormat = text

agent01.sinks.HDFS.hdfs.batchSize = 1000

agent01.sinks.HDFS.hdfs.rollSize = 0

agent01.sinks.HDFS.hdfs.rollCount = 10000

agent01.sinks.HDFS.hdfs.rollInterval = 600

agent01.sinks.HDFS.hdfs.useLocalTimeStamp = true


#Specify the channel the sink should use

agent01.sinks.HDFS.channel = memoryChannel


# Each channel's type is defined.

agent01.channels.memoryChannel.type = memory


# Other config values specific to each type of channel(sink or source)

# can be defined as well

# In this case, it specifies the capacity of the memory channel

agent01.channels.memoryChannel.capacity = 100000


Run it like this:

flume-ng agent --conf-file /usr/local/flume/conf/flume.conf --name agent01




Run it, wait a few minutes (there's some buffering ㅋ), and the files pile up. Looking at the configured directory for today's date, the list stacks up like below.
( Nothing special about them; just fetch one and open it and the log shows up as-is. )
[]$ hadoop fs -ls /log/2015/10/02
<date> WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 14 items
-rw-r--r--   2 hadoop supergroup       7886 System Date 10:32 /log/2015/10/02/FlumeData.1443749496016
-rw-r--r--   2 hadoop supergroup      84770 System Date 10:54 /log/2015/10/02/FlumeData.1443750268279
-rw-r--r--   2 hadoop supergroup      76584 System Date 11:04 /log/2015/10/02/FlumeData.1443750882785
-rw-r--r--   2 hadoop supergroup      80806 System Date 11:14 /log/2015/10/02/FlumeData.1443751485776
-rw-r--r--   2 hadoop supergroup      82399 System Date 11:24 /log/2015/10/02/FlumeData.1443752087823
-rw-r--r--   2 hadoop supergroup      93033 System Date 11:34 /log/2015/10/02/FlumeData.1443752688829
-rw-r--r--   2 hadoop supergroup      97378 System Date 11:45 /log/2015/10/02/FlumeData.1443753300847
-rw-r--r--   2 hadoop supergroup      62234 System Date 11:55 /log/2015/10/02/FlumeData.1443753907868
-rw-r--r--   2 hadoop supergroup      62485 System Date 12:05 /log/2015/10/02/FlumeData.1443754510881
-rw-r--r--   2 hadoop supergroup      62473 System Date 12:15 /log/2015/10/02/FlumeData.1443755148898
-rw-r--r--   2 hadoop supergroup      82722 System Date 12:25 /log/2015/10/02/FlumeData.1443755759911
-rw-r--r--   2 hadoop supergroup      67439 System Date 12:36 /log/2015/10/02/FlumeData.1443756361919
-rw-r--r--   2 hadoop supergroup       2087 System Date 12:49 /log/2015/10/02/FlumeData.1443757141933
-rw-r--r--   2 hadoop supergroup        172 System Date 12:51 /log/2015/10/02/FlumeData.1443757865435.tmp
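
Once everything lands in HDFS like this, one search covers every WAS at once. For example, a rough sketch with the Spark Java API (Spark setup as in the other posts; the namenode address and date path are placeholders):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

// Rough sketch: grep the aggregated logs of a given day for ERROR lines across every server at once.
public class LogGrep {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("log-grep").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> logs = sc.textFile("hdfs://namenode:9000/log/2015/10/02/*"); // placeholder path
        JavaRDD<String> errors = logs.filter(new Function<String, Boolean>() {
            public Boolean call(String line) {
                return line.contains("ERROR");
            }
        });

        System.out.println("error lines: " + errors.count());
        sc.stop();
    }
}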



Posted by ORACLE,DBA,BIG,DATA,JAVA 흑풍전설

RDBMS is nice~ because it's comfortable ㅎㅎ....

But for handling really large data, from a developer's point of view NoSQL can be the more convenient option.

Rather than racking your brain and stressing over SQL tuning, pulling data straight out of NoSQL is just easier.


Whether it's an RDBMS or big data, the essential thing is the same: bring the data into memory, process it, show the result.

However complex the internal architecture behind that is, the goal of fetching it fast and processing it fast is no different.


At work, too, after much deliberation over how to load and analyze data that is too burdensome to store in Oracle,


we chose Hadoop, as shown below.

Among its ecosystem we picked HBase, the most DB-like piece, to run for a while; if it proves usable, the less complicated data 

will be served with HBase in front, so the project starts, killing two birds with one stone.




The diagram shows the flow when an end customer hits the site: our servers load data into Hadoop and then extract the loaded data.

That is what the picture illustrates.



I decided to write up each of these steps separately and post them,

and first, here is the basic architecture as it runs.


Versions used: 

* Java 1.7 (latest)

* Hbase 1.1.0.1

* Hadoop 2.7.1 

* Zookeeper 3.4.6

* Phoenix 4.5.0


Posted by ORACLE,DBA,BIG,DATA,JAVA 흑풍전설