
* Goal: see the information produced by the many hadoop data nodes at a glance!

* Background: as the number of hadoop nodes grows, logging into each node to check its state is

          no longer realistic, so I decided to start funneling everything into one place.


Prerequisites


* Hardware: just servers (in my case, all of the roughly 50 datanode servers in production)

* Software:

flume 1.8



1. Since this runs as an agent on each server, I needed to pull in variables set in the shell.

It turns out the ${ } syntax below actually works.


One thing to know: declaring the variable inside the shell is not enough; it has to be exported before the agent picks it up.

So I added the export to flume-env.sh under conf (for example, export HOSTNAME=$(hostname)).


* In 1.6 there was no Kafka sink, so it had to be custom-built; from 1.7 on the Kafka sink is provided out of the box.

agent.sources = seqGenSrc

agent.channels = memoryChannel

agent.sinks = k1


agent.sources.seqGenSrc.type = exec

agent.sources.seqGenSrc.command = tail -F <path to the log being written>

agent.sources.seqGenSrc.channels = memoryChannel


agent.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
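# topic-${HOSTNAME} below picks up the HOSTNAME exported in conf/flume-env.sh, so each node publishes to its own topic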

agent.sinks.k1.kafka.topic = topic-${HOSTNAME}

agent.sinks.k1.kafka.bootstrap.servers = data1:9092,data2:9092,data3:9092,data4:9092

agent.sinks.k1.kafka.flumeBatchSize = 1000

agent.sinks.k1.kafka.producer.acks = 1

agent.sinks.k1.kafka.producer.linger.ms = 10

agent.sinks.k1.channel = memoryChannel 

agent.channels.memoryChannel.type = memory 


agent.channels.memoryChannel.capacity = 10000

agent.channels.memoryChannel.transactionCapacity = 1000



Posted by ORACLE,DBA,BIG,DATA,JAVA 흑풍전설

* Goal: see the information produced by the many hadoop data nodes at a glance!

* Background: as the number of hadoop nodes grows, logging into each node to check its state is

          no longer realistic, so I decided to start funneling everything into one place.

* Installation is well enough documented by now that it can be left out; it's easy, so I'll skip it.


Prerequisites


* Hardware: 6 servers

* Software:

1. spark 2.2.1 

2. kafka 0.10.x

3. zookeeper 3.4.11

4. flume 1.8


1. Layout

There are 6 servers provided for development purposes, so the setup is built on those,

which means several components are installed on the same machines.


The layout I am using is shown below. (This is about Spark Streaming processing, so only the parts related to it are described.)




2. SPARK STREAMING

* After several rounds of test cases this now works well. (Typing up the full source is tedious, so only the necessary parts are included.)


* Notes on problems and things to keep in mind from development:

1. I picked a recent Kafka version, which means the matching integration library is needed.

   Spark splits its Kafka integration into 0.8 and 0.10 lines; as the imports show, the 0.10 integration is used here.


   Reference: http://spark.apache.org/docs/latest/streaming-kafka-integration.html


2. Since this is run directly in spark-shell, sc already exists by default, and the StreamingContext (ssc) is created from it.


3. One Kafka topic was created per node, and the data for all of the topics

   is processed in the rdd.foreach block below.


4. The source below uses an accumulator.

This took a lot of thought.

There was a lot of data to process, so Spark was run as a cluster across several nodes, and that is where the problem showed up:

declaring an array inside dstream.foreachRDD (one level above rdd.foreach below) and collecting the processed values into it sometimes threw errors and sometimes silently did nothing.


Which makes sense: without some dedicated mechanism, values produced on the different worker nodes never make it back to the driver.


The Spark documentation provides accumulators for exactly this.


Reference: http://spark.apache.org/docs/latest/streaming-programming-guide.html#accumulators-broadcast-variables-and-checkpoints
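
A minimal sketch of the pattern (illustrative names, runnable in spark-shell): a local collection mutated inside rdd.foreach lives only in each executor JVM, while a collection accumulator is merged back to the driver.

val acc = sc.collectionAccumulator[String]("seen")   // registered on the driver
val demo = sc.parallelize(1 to 10, 4)                // stand-in for the streaming RDD
demo.foreach { n => acc.add(n.toString) }            // add() runs on the executors
println(acc.value.size)                              // prints 10, read back on the driver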



import org.apache.spark._

import org.apache.spark.streaming._

import org.apache.spark.streaming.StreamingContext._

import org.apache.kafka.clients.consumer.ConsumerRecord

import org.apache.kafka.common.serialization.StringDeserializer

import org.apache.spark.streaming.kafka010._

import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

import org.apache.kafka.common.TopicPartition


import scala.util.matching.Regex

import org.apache.spark.storage.StorageLevel

import scala.collection.JavaConversions.asScalaBuffer

import scala.collection.mutable.ListBuffer

import scala.collection.mutable.{Map,SynchronizedMap,HashMap}

import scala.util.control.Breaks._


val ssc = new StreamingContext(sc, Seconds(2)) 

val kafkaParams = Map[String, Object](

"bootstrap.servers" -> "data1:9092,data2:9092,data3:9092,data4:9092",

"session.timeout.ms" -> 100000,

"key.deserializer" -> classOf[StringDeserializer],

"value.deserializer" -> classOf[StringDeserializer],

"enable.auto.commit" -> (true: java.lang.Boolean), 

"group.id" -> "MERONG", 

"auto.offset.reset" -> "latest" , 

"enable.auto.commit" -> (false : java.lang.Boolean)

)


val topicList = new ListBuffer[String]()
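// populate with one topic per data node before subscribing, e.g. topicList += "topic-data1" (illustrative name)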

val topics = topicList.toList


val dstream = KafkaUtils.createDirectStream[String, String] (

ssc,

PreferConsistent,

ConsumerStrategies.Subscribe[String, String](topics, kafkaParams)

)


dstream.foreachRDD { rdd => 


      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

      

      val acObj = spark.sparkContext.collectionAccumulator[String]("acObj")


      rdd.foreach { res => 

       val o: OffsetRange = offsetRanges(TaskContext.get.partitionId)

 val vTopic = o.topic

 val vPartition = o.partition

 val vsoffset = o.fromOffset

 val veoffset = o.untilOffset 


 /* start of the data-processing section */

 acObj.add(vsoffset.toString)

 println(res.value)

 /* end of the data-processing section */


      }
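
      // (illustrative) the accumulated values can now be read back on the driver,
      // e.g. println(acObj.value.size), before committing the offsets below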


      dstream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)

}


ssc.start()

ssc.awaitTermination()


* Once the Scala file is done, run it on the node where the Spark master lives!

* Below is how spark-shell was launched:

spark-shell --master spark://spark-master:7077 -i scala-file --verbose --executor-cores 4 --total-executor-cores 32 --driver-memory 4G --executor-memory 4G --driver-java-options -Dcom.datastax.driver.FORCE_NIO=true


















Posted by ORACLE,DBA,BIG,DATA,JAVA 흑풍전설

1. elasticsearch configuration



- elasticsearch.yml


path.data: /data01/esdata, /data02/esdata


RHEL 6.5 is not supported because of a kernel issue, so the following was added:

bootstrap.system_call_filter: false

The number below was set to match the number of disks configured above:

node.max_local_storage_nodes: 10

network.bind_host: * * * * 

network.publish_host: * * * * 




2. logstash configuration


- logstash.yml


Settings added when installing X-Pack:

xpack.monitoring.elasticsearch.username: logstash_system

xpack.monitoring.elasticsearch.password: *******

xpack.monitoring.enabled: true

xpack.monitoring.elasticsearch.url: http://hostname:9200


- pipelines.yml


- pipeline.id: job1

   path.config: "path"

   pipeline.workers: 10

   queue.type: memory

- pipeline.id: job2

   path.config: "path"

   pipeline.workers: 10

   queue.type: memory



- Configuration for job1 (uses the elastic account since X-Pack is installed)

input { 

    file {

        close_older => 0

        sincedb_path => "/dev/null"

        path => "path to the log files"

        start_position => "beginning"

        exclude => "*.gz"

        max_open_files => "9999999"

        codec => multiline {

                             max_lines => 9000000

                             max_bytes => "100 MiB"

                             pattern => "^%{TIMESTAMP_ISO8601} %{LOGLEVEL} "

                             what => "previous"

                             negate => true
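                             # negate + previous: lines that do NOT start with "<timestamp> <loglevel> "
                             # (e.g. stack-trace lines) are appended to the previous event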

                       }

    }

}

filter {

        mutate { strip => "message" }

        mutate { gsub => ["message" , "\n" , "<<ENTER>>"] }

        mutate { gsub => ["message" , "\t" , "<<TAB>>"] }

        grok {

               match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:loglvl} %{GREEDYDATA:msg}"          }

        } 

    

}

output {

    elasticsearch

    {

        hosts => ["hostname","hostname"]

        index => "elasticsearch index"

        user => "elastic"

        password => "*****"

    }

}



3. kibana configuration


Nothing special here, so I'll skip it.


Posted by ORACLE,DBA,BIG,DATA,JAVA 흑풍전설

1. Query a MySQL table into a DataFrame and load it into Hive


import java.util.Properties

import org.apache.spark.sql.SQLContext


var driver = "com.mysql.jdbc.Driver"

var url = "jdbc:mysql://hostname"

var username = "user"

var password = "pass"


val conn = new Properties()

conn.put("user", username)

conn.put("password",password)


val sqlContext = new SQLContext(sc)

val df = sqlContext.read.jdbc(url, "TABLE_NAME", conn)

df.write.mode("overwrite").saveAsTable("hive.table")




2. Query ELASTICSEARCH into a Spark temp table (DataFrame) and load it into Hive


import org.apache.spark.{SparkConf, SparkContext}

import org.apache.spark.sql.SQLContext

// the elasticsearch-spark (elasticsearch-hadoop) connector jar must be on the classpath


val conf = new SparkConf().setAppName("ela")

conf.set("spark.sql.catalogImplementation","hive")

conf.set("spark.master","spark://hostname:7077")

conf.set("spark.driver.allowMultipleContexts","true")

conf.set("es.index.auto.create","false")

conf.set("es.node.discovery","true")

conf.set("es.nodes","host,host,host")


val es_query = """

{

    "query" : {

        "bool" : { 

            "must" : [

               {"range" : {

                  "@timestamp" : { "gte" : "2017-01-01T00:30:00" , "lte" : "2017-02-01T30:00" , "time_zone" : "+09:00"

               }

         }}

         ,

         { "bool" : {

                "should" : [

                      { "wildcard" : {

                             "msg.keyword" : { "value" : "message*" }

                      }}

             ]

     }}

     ]

  }

  }

}

"""


conf.set("es.query", es_query)


val sc = new SparkContext(conf)

val sqlContext = new SQLContext(sc)


val df = sqlContext.read.format("org.elasticsearch.spark.sql").load("elastic_index")

df.registerTempTable("SPARK_TEMP_TABLE")

val ndf = sqlContext.sql("SELECT * FROM SPARK_TEMP_TABLE")

ndf.write.mode("overwrite").saveAsTable("hive table")




3. Query with SPARK SQL (Hive) and load into HBASE via Phoenix


import org.apache.phoenix.spark._   // saveToPhoenix on RDDs comes from the phoenix-spark module


val res_sql = """

SELECT * FROM HIVE_TABLE

"""


val df = sqlContext.sql(res_sql)

val ar = df.collect.map( x => (x.getString(0) , x.getString(1))).toList

sc.parallelize(ar).saveToPhoenix(

     "PHOENIX_TABLE",

     Seq("col1","col2"),

     zkUrl = Some("zookeeper_host:2181")

)




Posted by ORACLE,DBA,BIG,DATA,JAVA 흑풍전설
Undercoating record (Hobby/My Car, 2017.08.28 22:37)


Had undercoating done.

Paid a premium so that soundproofing could be handled at the same time.

They say it costs a bit more because the body is so large...



The three photos below are from before the undercoating.

They said the 4WD drive shaft was masked off so the coating would not be applied to it.





Everything from here down is after the coating was applied.













Posted by ORACLE,DBA,BIG,DATA,JAVA 흑풍전설
My first car (Hobby/My Car, 2017.08.27 21:37)

My first car.

Below is the Explorer featured on the front page of Ford Korea ~

I don't know how popular Ford was in 2008, but I have had the same thought ever since then:

if I buy, it's a Ford ~ an Explorer ~


With that in mind I got my license last year, at a late age,

and this year I finally bought my first car.




I bought a red Explorer; parking is a bit of a problem, so for now it sits in a public parking lot.






Posted by ORACLE,DBA,BIG,DATA,JAVA 흑풍전설

Spark2.1 Hive Auth Custom Test


- As of 2017, I am operating a certain big data project.

- Requirement: Spark is in use, and Hive's permission grants should be enforced for it.


Spark version: 2.1

Problem:

For all of Spark's powerful features, I could not find any Spark authorization matching what we use on the current Hortonworks big data platform.


Since Spark uses the Hive Metastore, the plan was to customize it and recompile,

and the test went well.


I am posting just the location of the change so I don't forget it later.


- The Spark source is written in Scala.

- After getting to the point of printing Hello World in Scala on my own, I added the feature in the part below.

Location in the Spark source:

sql/hive/src/main/scala/org/apache/spark/sql/hive


File: TableReader.scala


The trait is referenced in two places, so be sure to modify both:

the source is split between plain heap tables and partitioned tables, so both paths need the change.


The intended use is fixed anyway, so I decided to rely on the grants already stored in Hive.


Below is part of what I modified (this should be enough to keep me from forgetting).

def makeRDDForTable(

      hiveTable: HiveTable,

      deserializerClass: Class[_ <: Deserializer],

      filterOpt: Option[PathFilter]): RDD[InternalRow] = {


The modification starts from here:

// java.sql.{Connection, DriverManager, PreparedStatement, ResultSet} must be imported at the top of TableReader.scala
var hive_table_nm = hiveTable.getTableName()

val driver = "com.mysql.jdbc.Driver"

val url = "jdbc:mysql://내메타스토어경로/hive?characterEncoding=utf8"

val username = "flashone"

val password = "1234"

val check_query = "SELECT count(*) CNT from TBL_PRIVS A , TBLS B WHERE A.TBL_ID = B.TBL_ID AND B.TBL_NAME = ? and A.GRANTOR = ? "

var connection:Connection = null

var resultSet:ResultSet = null

var stat:PreparedStatement = null

try {


     Class.forName(driver)

     connection = DriverManager.getConnection(url, username, password)


     stat = connection.prepareStatement(check_query)

     stat.setString(1,hive_table_nm)

     stat.setString(2,hive_user_nm)
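     // hive_user_nm (presumably the requesting user's name) is defined elsewhere in the patch, not in this excerpt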

     resultSet = stat.executeQuery()

     while ( resultSet.next() )

     {

            if ( resultSet.getString("CNT") == "0" ) {
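                   // no grant row found for this user/table: point the table at a dummy HDFS
                   // location and abort the read with an exception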


                   val Npath = new Path("hdfs path")


                   logInfo(hiveTable.getDataLocation().toString())


                   hiveTable.setDataLocation(Npath)


                   throw new Exception("Access Denied")

            }


     }


   } catch {


   }

   resultSet.close()

   stat.close()

   connection.close()


https://github.com/ocmpromaster/sparkcustom21

This is where I am making the changes, heh.



What the authorization check looks like from RStudio:

Now that the customization is done, auditing is no problem either.



Tags: 2, 2.1, auth, Grant, Hive, MySQL, spark
Posted by ORACLE,DBA,BIG,DATA,JAVA 흑풍전설

On the current project we need to choose between compression codecs and storage formats with SQOOP, so I ran a simple test.


The test loads oracle data into hadoop through sqoop, varying two factors:

snappy compression / no compression

text plain / parquet format

which gives a total of 4 test cases.




Brief specs of the test machines


Host machine:

CPU : Xeon CPU E5-2620 v3 * 2 HT ( Total 24 Core )

RAM : 256GB

HDD : PCI-E (NVMe) for the VM OS , SATA for hadoop and oracle data




Guest OS specs


HADOOP ECOSYSTEM


vm node spec

core : 16 cores ( 4 sockets x 4 cores )

ram : 16GB

layout : 1 name node , 4 data nodes , 1 ambari/util server



ORACLE


vm node spec

core : 8 cores ( 4 sockets x 2 cores )

ram : 8GB

1 single node : Oracle DB 12c (Enterprise, single node)



Description of the source data

As the earlier failed test shows, filling the rows with random text gave a compression ratio of 0%; the evidence is that

the total transferred bytes were identical with and without compression.

That test result is included below.


Because of that, I decided to build more ordinary data, i.e. rows containing some sentences and some numbers, and generated the test data from the comments in the oracle dictionary.


Table structure: a plain heap table, not a partitioned table.


CREATE TABLE HDFS_TEST5

(

TABLE_NAME VARCHAR2(128),

LVL NUMBER,

COMMENTS VARCHAR(4000),

REG_DT VARCHAR2(19),

SEQ NUMBER 

)


Looking at the bulk of the loaded data,

it actually looks a bit like real data, heh.


ROW COUNT : 11,620,800 rows




Test 1.

========================= Test showing the 0% compression result ============================



CASE 1. File format: text plain, uncompressed


Command

sqoop import --target-dir=/dev/test/data_nc_txt --table HDFS_3_SUB -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1


Output

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/test/data_nc_txt --table HDFS_3_SUB -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1

16/08/10 17:46:04 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 17:46:04 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 17:46:04 INFO manager.SqlManager: Using default fetchSize of 1000

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 17:46:06 INFO oracle.OraOopManagerFactory: 

**************************************************

*** Using Data Connector for Oracle and Hadoop ***

**************************************************

16/08/10 17:46:06 INFO oracle.OraOopManagerFactory: Oracle Database version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

16/08/10 17:46:06 INFO oracle.OraOopManagerFactory: This Oracle database is not a RAC.

16/08/10 17:46:06 INFO tool.CodeGenTool: Beginning code generation

16/08/10 17:46:06 INFO manager.SqlManager: Executing SQL statement: SELECT "ACOL1","ACOL2","ACOL3","ACOL4","ACOL5","ACOL6","ACOL7","ACOL8","ACOL9","ACOL10" FROM HDFS_3_SUB WHERE 0=1

16/08/10 17:46:06 INFO manager.SqlManager: Executing SQL statement: SELECT "ACOL1","ACOL2","ACOL3","ACOL4","ACOL5","ACOL6","ACOL7","ACOL8","ACOL9","ACOL10" FROM "HDFS_3_SUB" WHERE 1=0

16/08/10 17:46:06 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/7e534716bf2036f166e1b14257055d00/HDFS_3_SUB.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 17:46:09 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/7e534716bf2036f166e1b14257055d00/HDFS_3_SUB.jar

16/08/10 17:46:09 INFO mapreduce.ImportJobBase: Beginning import of HDFS_3_SUB

16/08/10 17:46:11 INFO manager.SqlManager: Executing SQL statement: SELECT "ACOL1","ACOL2","ACOL3","ACOL4","ACOL5","ACOL6","ACOL7","ACOL8","ACOL9","ACOL10" FROM "HDFS_3_SUB" WHERE 1=0

16/08/10 17:46:12 INFO impl.TimelineClientImpl: Timeline service address: http://amb3.local:8188/ws/v1/timeline/

16/08/10 17:46:12 INFO client.RMProxy: Connecting to ResourceManager at amb3.local/192.168.0.143:8050

16/08/10 17:46:15 WARN oracle.OraOopUtilities: System property java.security.egd is not set to file:///dev/urandom - Oracle connections may time out.

16/08/10 17:46:15 INFO db.DBInputFormat: Using read commited transaction isolation

16/08/10 17:46:15 INFO oracle.OraOopOracleQueries: Session Time Zone set to GMT

16/08/10 17:46:15 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL :

begin 

  dbms_application_info.set_module(module_name => 'Data Connector for Oracle and Hadoop', action_name => 'import 20160810174606KST'); 

end;

16/08/10 17:46:15 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session disable parallel query

16/08/10 17:46:15 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set "_serial_direct_read"=true

16/08/10 17:46:15 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set tracefile_identifier=oraoop

16/08/10 17:46:17 INFO oracle.OraOopDataDrivenDBInputFormat: The table being imported by sqoop has 72704 blocks that have been divided into 128 chunks which will be processed in 4 splits. The chunks will be allocated to the splits using the method : ROUNDROBIN

16/08/10 17:46:17 INFO mapreduce.JobSubmitter: number of splits:4

16/08/10 17:46:17 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470815003334_0001

16/08/10 17:46:18 INFO impl.YarnClientImpl: Submitted application application_1470815003334_0001

16/08/10 17:46:18 INFO mapreduce.Job: The url to track the job: http://amb3.local:8088/proxy/application_1470815003334_0001/

16/08/10 17:46:18 INFO mapreduce.Job: Running job: job_1470815003334_0001

16/08/10 17:46:26 INFO mapreduce.Job: Job job_1470815003334_0001 running in uber mode : false

16/08/10 17:46:26 INFO mapreduce.Job:  map 0% reduce 0%

16/08/10 17:46:40 INFO mapreduce.Job:  map 7% reduce 0%

16/08/10 17:46:41 INFO mapreduce.Job:  map 32% reduce 0%

16/08/10 17:46:42 INFO mapreduce.Job:  map 57% reduce 0%

16/08/10 17:46:43 INFO mapreduce.Job:  map 80% reduce 0%

16/08/10 17:46:44 INFO mapreduce.Job:  map 87% reduce 0%

16/08/10 17:46:46 INFO mapreduce.Job:  map 100% reduce 0%

16/08/10 17:46:46 INFO mapreduce.Job: Job job_1470815003334_0001 completed successfully

16/08/10 17:46:46 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=614156

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=11629

HDFS: Number of bytes written=505000000

HDFS: Number of read operations=16

HDFS: Number of large read operations=0

HDFS: Number of write operations=8

Job Counters 

Launched map tasks=4

Other local map tasks=4

Total time spent by all maps in occupied slots (ms)=116770

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=58385

Total vcore-seconds taken by all map tasks=58385

Total megabyte-seconds taken by all map tasks=89679360

Map-Reduce Framework

Map input records=500000

Map output records=500000

Input split bytes=11629

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=1801

CPU time spent (ms)=75560

Physical memory (bytes) snapshot=1357881344

Virtual memory (bytes) snapshot=13260427264

Total committed heap usage (bytes)=709361664

File Input Format Counters 

Bytes Read=0

File Output Format Counters 

Bytes Written=505000000

16/08/10 17:46:46 INFO mapreduce.ImportJobBase: Transferred 481.6055 MB in 34.0606 seconds (14.1397 MB/sec)

16/08/10 17:46:46 INFO mapreduce.ImportJobBase: Retrieved 500000 records.

[hdfs@amb2 ~]$ 



CASE 2. File format: text plain, compression: snappy


Command

sqoop import --target-dir=/dev/test/data_sn_txt --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec --table HDFS_3_SUB -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1


Output

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/test/data_sn_txt --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec --table HDFS_3_SUB -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1

16/08/10 17:50:13 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 17:50:13 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 17:50:13 INFO manager.SqlManager: Using default fetchSize of 1000

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 17:50:14 INFO oracle.OraOopManagerFactory: 

**************************************************

*** Using Data Connector for Oracle and Hadoop ***

**************************************************

16/08/10 17:50:14 INFO oracle.OraOopManagerFactory: Oracle Database version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

16/08/10 17:50:14 INFO oracle.OraOopManagerFactory: This Oracle database is not a RAC.

16/08/10 17:50:14 INFO tool.CodeGenTool: Beginning code generation

16/08/10 17:50:14 INFO manager.SqlManager: Executing SQL statement: SELECT "ACOL1","ACOL2","ACOL3","ACOL4","ACOL5","ACOL6","ACOL7","ACOL8","ACOL9","ACOL10" FROM HDFS_3_SUB WHERE 0=1

16/08/10 17:50:14 INFO manager.SqlManager: Executing SQL statement: SELECT "ACOL1","ACOL2","ACOL3","ACOL4","ACOL5","ACOL6","ACOL7","ACOL8","ACOL9","ACOL10" FROM "HDFS_3_SUB" WHERE 1=0

16/08/10 17:50:14 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/175537b93a775793afb75735d65b176f/HDFS_3_SUB.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 17:50:16 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/175537b93a775793afb75735d65b176f/HDFS_3_SUB.jar

16/08/10 17:50:16 INFO mapreduce.ImportJobBase: Beginning import of HDFS_3_SUB

16/08/10 17:50:17 INFO manager.SqlManager: Executing SQL statement: SELECT "ACOL1","ACOL2","ACOL3","ACOL4","ACOL5","ACOL6","ACOL7","ACOL8","ACOL9","ACOL10" FROM "HDFS_3_SUB" WHERE 1=0

16/08/10 17:50:18 INFO impl.TimelineClientImpl: Timeline service address: http://amb3.local:8188/ws/v1/timeline/

16/08/10 17:50:18 INFO client.RMProxy: Connecting to ResourceManager at amb3.local/192.168.0.143:8050

16/08/10 17:50:21 WARN oracle.OraOopUtilities: System property java.security.egd is not set to file:///dev/urandom - Oracle connections may time out.

16/08/10 17:50:21 INFO db.DBInputFormat: Using read commited transaction isolation

16/08/10 17:50:21 INFO oracle.OraOopOracleQueries: Session Time Zone set to GMT

16/08/10 17:50:21 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL :

begin 

  dbms_application_info.set_module(module_name => 'Data Connector for Oracle and Hadoop', action_name => 'import 20160810175014KST'); 

end;

16/08/10 17:50:21 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session disable parallel query

16/08/10 17:50:21 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set "_serial_direct_read"=true

16/08/10 17:50:21 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set tracefile_identifier=oraoop

16/08/10 17:50:22 INFO oracle.OraOopDataDrivenDBInputFormat: The table being imported by sqoop has 72704 blocks that have been divided into 128 chunks which will be processed in 4 splits. The chunks will be allocated to the splits using the method : ROUNDROBIN

16/08/10 17:50:22 INFO mapreduce.JobSubmitter: number of splits:4

16/08/10 17:50:22 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470815003334_0002

16/08/10 17:50:23 INFO impl.YarnClientImpl: Submitted application application_1470815003334_0002

16/08/10 17:50:23 INFO mapreduce.Job: The url to track the job: http://amb3.local:8088/proxy/application_1470815003334_0002/

16/08/10 17:50:23 INFO mapreduce.Job: Running job: job_1470815003334_0002

16/08/10 17:50:31 INFO mapreduce.Job: Job job_1470815003334_0002 running in uber mode : false

16/08/10 17:50:31 INFO mapreduce.Job:  map 0% reduce 0%

16/08/10 17:50:41 INFO mapreduce.Job:  map 25% reduce 0%

16/08/10 17:50:42 INFO mapreduce.Job:  map 50% reduce 0%

16/08/10 17:50:44 INFO mapreduce.Job:  map 80% reduce 0%

16/08/10 17:50:46 INFO mapreduce.Job:  map 89% reduce 0%

16/08/10 17:50:47 INFO mapreduce.Job:  map 100% reduce 0%

16/08/10 17:50:47 INFO mapreduce.Job: Job job_1470815003334_0002 completed successfully

16/08/10 17:50:47 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=614136

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=11629

HDFS: Number of bytes written=505057648

HDFS: Number of read operations=16

HDFS: Number of large read operations=0

HDFS: Number of write operations=8

Job Counters 

Launched map tasks=4

Other local map tasks=4

Total time spent by all maps in occupied slots (ms)=86936

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=43468

Total vcore-seconds taken by all map tasks=43468

Total megabyte-seconds taken by all map tasks=66766848

Map-Reduce Framework

Map input records=500000

Map output records=500000

Input split bytes=11629

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=1239

CPU time spent (ms)=67760

Physical memory (bytes) snapshot=1410674688

Virtual memory (bytes) snapshot=13316395008

Total committed heap usage (bytes)=737673216

File Input Format Counters 

Bytes Read=0

File Output Format Counters 

Bytes Written=505057648

16/08/10 17:50:47 INFO mapreduce.ImportJobBase: Transferred 481.6605 MB in 29.1766 seconds (16.5084 MB/sec)

16/08/10 17:50:47 INFO mapreduce.ImportJobBase: Retrieved 500000 records.

[hdfs@amb2 ~]$





The main test starts below.


========================= The four test cases ============================




CASE 1. Text, uncompressed


Command

sqoop import --target-dir=/dev/test2/data_nc_txt --table HDFS_TEST5 -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1


Result

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/test2/data_nc_txt --table HDFS_TEST5 -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1

16/08/10 18:09:15 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 18:09:15 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 18:09:15 INFO manager.SqlManager: Using default fetchSize of 1000

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 18:09:17 INFO oracle.OraOopManagerFactory: 

**************************************************

*** Using Data Connector for Oracle and Hadoop ***

**************************************************

16/08/10 18:09:17 INFO oracle.OraOopManagerFactory: Oracle Database version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

16/08/10 18:09:17 INFO oracle.OraOopManagerFactory: This Oracle database is not a RAC.

16/08/10 18:09:17 INFO tool.CodeGenTool: Beginning code generation

16/08/10 18:09:17 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM HDFS_TEST5 WHERE 0=1

16/08/10 18:09:17 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM "HDFS_TEST5" WHERE 1=0

16/08/10 18:09:17 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/83d82e240522d651aa49f619fb1c723b/HDFS_TEST5.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 18:09:20 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/83d82e240522d651aa49f619fb1c723b/HDFS_TEST5.jar

16/08/10 18:09:20 INFO mapreduce.ImportJobBase: Beginning import of HDFS_TEST5

16/08/10 18:09:22 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM "HDFS_TEST5" WHERE 1=0

16/08/10 18:09:23 INFO impl.TimelineClientImpl: Timeline service address: http://amb3.local:8188/ws/v1/timeline/

16/08/10 18:09:23 INFO client.RMProxy: Connecting to ResourceManager at amb3.local/192.168.0.143:8050

16/08/10 18:09:26 WARN oracle.OraOopUtilities: System property java.security.egd is not set to file:///dev/urandom - Oracle connections may time out.

16/08/10 18:09:26 INFO db.DBInputFormat: Using read commited transaction isolation

16/08/10 18:09:26 INFO oracle.OraOopOracleQueries: Session Time Zone set to GMT

16/08/10 18:09:26 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL :

begin 

  dbms_application_info.set_module(module_name => 'Data Connector for Oracle and Hadoop', action_name => 'import 20160810180917KST'); 

end;

16/08/10 18:09:26 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session disable parallel query

16/08/10 18:09:26 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set "_serial_direct_read"=true

16/08/10 18:09:26 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set tracefile_identifier=oraoop

16/08/10 18:09:27 INFO oracle.OraOopDataDrivenDBInputFormat: The table being imported by sqoop has 180224 blocks that have been divided into 185 chunks which will be processed in 4 splits. The chunks will be allocated to the splits using the method : ROUNDROBIN

16/08/10 18:09:27 INFO mapreduce.JobSubmitter: number of splits:4

16/08/10 18:09:27 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470815003334_0004

16/08/10 18:09:28 INFO impl.YarnClientImpl: Submitted application application_1470815003334_0004

16/08/10 18:09:28 INFO mapreduce.Job: The url to track the job: http://amb3.local:8088/proxy/application_1470815003334_0004/

16/08/10 18:09:28 INFO mapreduce.Job: Running job: job_1470815003334_0004

16/08/10 18:09:36 INFO mapreduce.Job: Job job_1470815003334_0004 running in uber mode : false

16/08/10 18:09:36 INFO mapreduce.Job:  map 0% reduce 0%

16/08/10 18:09:47 INFO mapreduce.Job:  map 3% reduce 0%

16/08/10 18:09:49 INFO mapreduce.Job:  map 10% reduce 0%

16/08/10 18:09:50 INFO mapreduce.Job:  map 13% reduce 0%

16/08/10 18:09:52 INFO mapreduce.Job:  map 23% reduce 0%

16/08/10 18:09:54 INFO mapreduce.Job:  map 27% reduce 0%

16/08/10 18:09:55 INFO mapreduce.Job:  map 37% reduce 0%

16/08/10 18:09:57 INFO mapreduce.Job:  map 44% reduce 0%

16/08/10 18:09:58 INFO mapreduce.Job:  map 59% reduce 0%

16/08/10 18:10:00 INFO mapreduce.Job:  map 60% reduce 0%

16/08/10 18:10:01 INFO mapreduce.Job:  map 73% reduce 0%

16/08/10 18:10:03 INFO mapreduce.Job:  map 77% reduce 0%

16/08/10 18:10:04 INFO mapreduce.Job:  map 82% reduce 0%

16/08/10 18:10:06 INFO mapreduce.Job:  map 83% reduce 0%

16/08/10 18:10:08 INFO mapreduce.Job:  map 91% reduce 0%

16/08/10 18:10:10 INFO mapreduce.Job:  map 95% reduce 0%

16/08/10 18:10:11 INFO mapreduce.Job:  map 100% reduce 0%

16/08/10 18:10:12 INFO mapreduce.Job: Job job_1470815003334_0004 completed successfully

16/08/10 18:10:12 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=614016

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=16707

HDFS: Number of bytes written=1422465312

HDFS: Number of read operations=16

HDFS: Number of large read operations=0

HDFS: Number of write operations=8

Job Counters 

Launched map tasks=4

Other local map tasks=4

Total time spent by all maps in occupied slots (ms)=241392

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=120696

Total vcore-seconds taken by all map tasks=120696

Total megabyte-seconds taken by all map tasks=185389056

Map-Reduce Framework

Map input records=11620800

Map output records=11620800

Input split bytes=16707

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=2308

CPU time spent (ms)=178250

Physical memory (bytes) snapshot=1499193344

Virtual memory (bytes) snapshot=13390106624

Total committed heap usage (bytes)=746586112

File Input Format Counters 

Bytes Read=0

File Output Format Counters 

Bytes Written=1422465312

16/08/10 18:10:12 INFO mapreduce.ImportJobBase: Transferred 1.3248 GB in 48.8625 seconds (27.763 MB/sec)

16/08/10 18:10:12 INFO mapreduce.ImportJobBase: Retrieved 11620800 records.

[hdfs@amb2 ~]$ 




CASE 2. Text, snappy compression


Command

sqoop import --target-dir=/dev/test2/data_sn_txt --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec --table HDFS_TEST5 -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1


Result

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/test2/data_sn_txt --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec --table HDFS_TEST5 -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1

16/08/10 18:12:53 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 18:12:53 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 18:12:53 INFO manager.SqlManager: Using default fetchSize of 1000

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 18:12:54 INFO oracle.OraOopManagerFactory: 

**************************************************

*** Using Data Connector for Oracle and Hadoop ***

**************************************************

16/08/10 18:12:54 INFO oracle.OraOopManagerFactory: Oracle Database version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

16/08/10 18:12:54 INFO oracle.OraOopManagerFactory: This Oracle database is not a RAC.

16/08/10 18:12:54 INFO tool.CodeGenTool: Beginning code generation

16/08/10 18:12:54 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM HDFS_TEST5 WHERE 0=1

16/08/10 18:12:54 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM "HDFS_TEST5" WHERE 1=0

16/08/10 18:12:54 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/a40bc11d71b280f6f6f0be86d8987524/HDFS_TEST5.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 18:12:56 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/a40bc11d71b280f6f6f0be86d8987524/HDFS_TEST5.jar

16/08/10 18:12:56 INFO mapreduce.ImportJobBase: Beginning import of HDFS_TEST5

16/08/10 18:12:57 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM "HDFS_TEST5" WHERE 1=0

16/08/10 18:12:58 INFO impl.TimelineClientImpl: Timeline service address: http://amb3.local:8188/ws/v1/timeline/

16/08/10 18:12:58 INFO client.RMProxy: Connecting to ResourceManager at amb3.local/192.168.0.143:8050

16/08/10 18:13:01 WARN oracle.OraOopUtilities: System property java.security.egd is not set to file:///dev/urandom - Oracle connections may time out.

16/08/10 18:13:01 INFO db.DBInputFormat: Using read commited transaction isolation

16/08/10 18:13:01 INFO oracle.OraOopOracleQueries: Session Time Zone set to GMT

16/08/10 18:13:01 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL :

begin 

  dbms_application_info.set_module(module_name => 'Data Connector for Oracle and Hadoop', action_name => 'import 20160810181254KST'); 

end;

16/08/10 18:13:01 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session disable parallel query

16/08/10 18:13:01 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set "_serial_direct_read"=true

16/08/10 18:13:01 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set tracefile_identifier=oraoop

16/08/10 18:13:02 INFO oracle.OraOopDataDrivenDBInputFormat: The table being imported by sqoop has 180224 blocks that have been divided into 185 chunks which will be processed in 4 splits. The chunks will be allocated to the splits using the method : ROUNDROBIN

16/08/10 18:13:02 INFO mapreduce.JobSubmitter: number of splits:4

16/08/10 18:13:02 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470815003334_0005

16/08/10 18:13:02 INFO impl.YarnClientImpl: Submitted application application_1470815003334_0005

16/08/10 18:13:02 INFO mapreduce.Job: The url to track the job: http://amb3.local:8088/proxy/application_1470815003334_0005/

16/08/10 18:13:02 INFO mapreduce.Job: Running job: job_1470815003334_0005

16/08/10 18:13:10 INFO mapreduce.Job: Job job_1470815003334_0005 running in uber mode : false

16/08/10 18:13:10 INFO mapreduce.Job:  map 0% reduce 0%

16/08/10 18:13:22 INFO mapreduce.Job:  map 3% reduce 0%

16/08/10 18:13:23 INFO mapreduce.Job:  map 10% reduce 0%

16/08/10 18:13:25 INFO mapreduce.Job:  map 15% reduce 0%

16/08/10 18:13:26 INFO mapreduce.Job:  map 24% reduce 0%

16/08/10 18:13:28 INFO mapreduce.Job:  map 27% reduce 0%

16/08/10 18:13:29 INFO mapreduce.Job:  map 36% reduce 0%

16/08/10 18:13:31 INFO mapreduce.Job:  map 43% reduce 0%

16/08/10 18:13:32 INFO mapreduce.Job:  map 55% reduce 0%

16/08/10 18:13:34 INFO mapreduce.Job:  map 56% reduce 0%

16/08/10 18:13:35 INFO mapreduce.Job:  map 67% reduce 0%

16/08/10 18:13:37 INFO mapreduce.Job:  map 70% reduce 0%

16/08/10 18:13:38 INFO mapreduce.Job:  map 81% reduce 0%

16/08/10 18:13:40 INFO mapreduce.Job:  map 85% reduce 0%

16/08/10 18:13:41 INFO mapreduce.Job:  map 90% reduce 0%

16/08/10 18:13:43 INFO mapreduce.Job:  map 94% reduce 0%

16/08/10 18:13:44 INFO mapreduce.Job:  map 97% reduce 0%

16/08/10 18:13:47 INFO mapreduce.Job:  map 100% reduce 0%

16/08/10 18:13:47 INFO mapreduce.Job: Job job_1470815003334_0005 completed successfully

16/08/10 18:13:47 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=613996

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=16707

HDFS: Number of bytes written=809177318

HDFS: Number of read operations=16

HDFS: Number of large read operations=0

HDFS: Number of write operations=8

Job Counters 

Launched map tasks=4

Other local map tasks=4

Total time spent by all maps in occupied slots (ms)=244686

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=122343

Total vcore-seconds taken by all map tasks=122343

Total megabyte-seconds taken by all map tasks=187918848

Map-Reduce Framework

Map input records=11620800

Map output records=11620800

Input split bytes=16707

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=2268

CPU time spent (ms)=166060

Physical memory (bytes) snapshot=1489506304

Virtual memory (bytes) snapshot=13366169600

Total committed heap usage (bytes)=789577728

File Input Format Counters 

Bytes Read=0

File Output Format Counters 

Bytes Written=809177318

16/08/10 18:13:47 INFO mapreduce.ImportJobBase: Transferred 771.6916 MB in 49.3399 seconds (15.6403 MB/sec)

16/08/10 18:13:47 INFO mapreduce.ImportJobBase: Retrieved 11620800 records.

[hdfs@amb2 ~]$ 





CASE 3. Parquet format, uncompressed


Command

sqoop import --target-dir=/dev/test2/data_nc_pq --table HDFS_TEST5 --as-parquetfile -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1


Result

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/test2/data_nc_pq --table HDFS_TEST5 --as-parquetfile -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1

16/08/10 18:16:27 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 18:16:27 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 18:16:27 INFO manager.SqlManager: Using default fetchSize of 1000

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 18:16:28 INFO oracle.OraOopManagerFactory: 

**************************************************

*** Using Data Connector for Oracle and Hadoop ***

**************************************************

16/08/10 18:16:28 INFO oracle.OraOopManagerFactory: Oracle Database version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

16/08/10 18:16:28 INFO oracle.OraOopManagerFactory: This Oracle database is not a RAC.

16/08/10 18:16:28 INFO tool.CodeGenTool: Beginning code generation

16/08/10 18:16:28 INFO tool.CodeGenTool: Will generate java class as codegen_HDFS_TEST5

16/08/10 18:16:28 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM HDFS_TEST5 WHERE 0=1

16/08/10 18:16:28 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM "HDFS_TEST5" WHERE 1=0

16/08/10 18:16:28 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/cc39417a2380979acedad21e69766d18/codegen_HDFS_TEST5.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 18:16:31 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/cc39417a2380979acedad21e69766d18/codegen_HDFS_TEST5.jar

16/08/10 18:16:31 INFO mapreduce.ImportJobBase: Beginning import of HDFS_TEST5

16/08/10 18:16:31 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM "HDFS_TEST5" WHERE 1=0

16/08/10 18:16:32 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM "HDFS_TEST5" WHERE 1=0

16/08/10 18:16:34 INFO impl.TimelineClientImpl: Timeline service address: http://amb3.local:8188/ws/v1/timeline/

16/08/10 18:16:34 INFO client.RMProxy: Connecting to ResourceManager at amb3.local/192.168.0.143:8050

16/08/10 18:16:36 WARN oracle.OraOopUtilities: System property java.security.egd is not set to file:///dev/urandom - Oracle connections may time out.

16/08/10 18:16:36 INFO db.DBInputFormat: Using read commited transaction isolation

16/08/10 18:16:36 INFO oracle.OraOopOracleQueries: Session Time Zone set to GMT

16/08/10 18:16:36 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL :

begin 

  dbms_application_info.set_module(module_name => 'Data Connector for Oracle and Hadoop', action_name => 'import 20160810181628KST'); 

end;

16/08/10 18:16:36 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session disable parallel query

16/08/10 18:16:36 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set "_serial_direct_read"=true

16/08/10 18:16:36 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set tracefile_identifier=oraoop

16/08/10 18:16:37 INFO oracle.OraOopDataDrivenDBInputFormat: The table being imported by sqoop has 180224 blocks that have been divided into 185 chunks which will be processed in 4 splits. The chunks will be allocated to the splits using the method : ROUNDROBIN

16/08/10 18:16:37 INFO mapreduce.JobSubmitter: number of splits:4

16/08/10 18:16:37 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470815003334_0006

16/08/10 18:16:38 INFO impl.YarnClientImpl: Submitted application application_1470815003334_0006

16/08/10 18:16:38 INFO mapreduce.Job: The url to track the job: http://amb3.local:8088/proxy/application_1470815003334_0006/

16/08/10 18:16:38 INFO mapreduce.Job: Running job: job_1470815003334_0006

16/08/10 18:16:46 INFO mapreduce.Job: Job job_1470815003334_0006 running in uber mode : false

16/08/10 18:16:46 INFO mapreduce.Job:  map 0% reduce 0%

16/08/10 18:17:10 INFO mapreduce.Job:  map 2% reduce 0%

16/08/10 18:17:22 INFO mapreduce.Job:  map 9% reduce 0%

16/08/10 18:17:25 INFO mapreduce.Job:  map 16% reduce 0%

16/08/10 18:17:28 INFO mapreduce.Job:  map 21% reduce 0%

16/08/10 18:17:31 INFO mapreduce.Job:  map 28% reduce 0%

16/08/10 18:17:34 INFO mapreduce.Job:  map 36% reduce 0%

16/08/10 18:17:37 INFO mapreduce.Job:  map 40% reduce 0%

16/08/10 18:17:40 INFO mapreduce.Job:  map 49% reduce 0%

16/08/10 18:17:43 INFO mapreduce.Job:  map 60% reduce 0%

16/08/10 18:17:46 INFO mapreduce.Job:  map 65% reduce 0%

16/08/10 18:17:49 INFO mapreduce.Job:  map 72% reduce 0%

16/08/10 18:17:52 INFO mapreduce.Job:  map 78% reduce 0%

16/08/10 18:17:55 INFO mapreduce.Job:  map 82% reduce 0%

16/08/10 18:17:58 INFO mapreduce.Job:  map 87% reduce 0%

16/08/10 18:18:01 INFO mapreduce.Job:  map 93% reduce 0%

16/08/10 18:18:03 INFO mapreduce.Job:  map 94% reduce 0%

16/08/10 18:18:04 INFO mapreduce.Job:  map 97% reduce 0%

16/08/10 18:18:07 INFO mapreduce.Job:  map 99% reduce 0%

16/08/10 18:18:10 INFO mapreduce.Job:  map 100% reduce 0%

16/08/10 18:18:10 INFO mapreduce.Job: Job job_1470815003334_0006 completed successfully

16/08/10 18:18:10 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=618184

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=49495

HDFS: Number of bytes written=456758842

HDFS: Number of read operations=200

HDFS: Number of large read operations=0

HDFS: Number of write operations=36

Job Counters 

Launched map tasks=4

Other local map tasks=4

Total time spent by all maps in occupied slots (ms)=612860

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=306430

Total vcore-seconds taken by all map tasks=306430

Total megabyte-seconds taken by all map tasks=470676480

Map-Reduce Framework

Map input records=11620800

Map output records=11620800

Input split bytes=16707

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=31375

CPU time spent (ms)=363220

Physical memory (bytes) snapshot=2344189952

Virtual memory (bytes) snapshot=13274750976

Total committed heap usage (bytes)=1465909248

File Input Format Counters 

Bytes Read=0

File Output Format Counters 

Bytes Written=0

16/08/10 18:18:10 INFO mapreduce.ImportJobBase: Transferred 435.5992 MB in 97.1124 seconds (4.4855 MB/sec)

16/08/10 18:18:10 INFO mapreduce.ImportJobBase: Retrieved 11620800 records.

[hdfs@amb2 ~]$ 




CASE 4. Parquet format, snappy compression


Command

sqoop import --target-dir=/dev/test2/data_sn_pq --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec --table HDFS_TEST5 -direct --as-parquetfile --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1


Result

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/test2/data_sn_pq --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec --table HDFS_TEST5 -direct --as-parquetfile --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1

16/08/10 18:21:21 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 18:21:21 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 18:21:21 INFO manager.SqlManager: Using default fetchSize of 1000

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 18:21:22 INFO oracle.OraOopManagerFactory: 

**************************************************

*** Using Data Connector for Oracle and Hadoop ***

**************************************************

16/08/10 18:21:22 INFO oracle.OraOopManagerFactory: Oracle Database version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

16/08/10 18:21:22 INFO oracle.OraOopManagerFactory: This Oracle database is not a RAC.

16/08/10 18:21:22 INFO tool.CodeGenTool: Beginning code generation

16/08/10 18:21:22 INFO tool.CodeGenTool: Will generate java class as codegen_HDFS_TEST5

16/08/10 18:21:22 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM HDFS_TEST5 WHERE 0=1

16/08/10 18:21:22 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM "HDFS_TEST5" WHERE 1=0

16/08/10 18:21:22 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/ffc7dd24a3b45d0a3b7dad6697d1826d/codegen_HDFS_TEST5.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 18:21:24 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/ffc7dd24a3b45d0a3b7dad6697d1826d/codegen_HDFS_TEST5.jar

16/08/10 18:21:24 INFO mapreduce.ImportJobBase: Beginning import of HDFS_TEST5

16/08/10 18:21:25 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM "HDFS_TEST5" WHERE 1=0

16/08/10 18:21:27 INFO manager.SqlManager: Executing SQL statement: SELECT "TABLE_NAME","LVL","COMMENTS","REG_DT","SEQ" FROM "HDFS_TEST5" WHERE 1=0

16/08/10 18:21:29 INFO impl.TimelineClientImpl: Timeline service address: http://amb3.local:8188/ws/v1/timeline/

16/08/10 18:21:29 INFO client.RMProxy: Connecting to ResourceManager at amb3.local/192.168.0.143:8050

16/08/10 18:21:32 WARN oracle.OraOopUtilities: System property java.security.egd is not set to file:///dev/urandom - Oracle connections may time out.

16/08/10 18:21:32 INFO db.DBInputFormat: Using read commited transaction isolation

16/08/10 18:21:32 INFO oracle.OraOopOracleQueries: Session Time Zone set to GMT

16/08/10 18:21:32 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL :

begin 

  dbms_application_info.set_module(module_name => 'Data Connector for Oracle and Hadoop', action_name => 'import 20160810182122KST'); 

end;

16/08/10 18:21:32 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session disable parallel query

16/08/10 18:21:32 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set "_serial_direct_read"=true

16/08/10 18:21:32 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set tracefile_identifier=oraoop

16/08/10 18:21:32 INFO oracle.OraOopDataDrivenDBInputFormat: The table being imported by sqoop has 180224 blocks that have been divided into 185 chunks which will be processed in 4 splits. The chunks will be allocated to the splits using the method : ROUNDROBIN

16/08/10 18:21:33 INFO mapreduce.JobSubmitter: number of splits:4

16/08/10 18:21:33 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470815003334_0007

16/08/10 18:21:33 INFO impl.YarnClientImpl: Submitted application application_1470815003334_0007

16/08/10 18:21:33 INFO mapreduce.Job: The url to track the job: http://amb3.local:8088/proxy/application_1470815003334_0007/

16/08/10 18:21:33 INFO mapreduce.Job: Running job: job_1470815003334_0007

16/08/10 18:21:41 INFO mapreduce.Job: Job job_1470815003334_0007 running in uber mode : false

16/08/10 18:21:41 INFO mapreduce.Job:  map 0% reduce 0%

16/08/10 18:21:57 INFO mapreduce.Job:  map 2% reduce 0%

16/08/10 18:21:58 INFO mapreduce.Job:  map 8% reduce 0%

16/08/10 18:22:00 INFO mapreduce.Job:  map 9% reduce 0%

16/08/10 18:22:01 INFO mapreduce.Job:  map 14% reduce 0%

16/08/10 18:22:03 INFO mapreduce.Job:  map 15% reduce 0%

16/08/10 18:22:04 INFO mapreduce.Job:  map 20% reduce 0%

16/08/10 18:22:06 INFO mapreduce.Job:  map 22% reduce 0%

16/08/10 18:22:07 INFO mapreduce.Job:  map 27% reduce 0%

16/08/10 18:22:09 INFO mapreduce.Job:  map 28% reduce 0%

16/08/10 18:22:10 INFO mapreduce.Job:  map 33% reduce 0%

16/08/10 18:22:12 INFO mapreduce.Job:  map 34% reduce 0%

16/08/10 18:22:13 INFO mapreduce.Job:  map 40% reduce 0%

16/08/10 18:22:15 INFO mapreduce.Job:  map 44% reduce 0%

16/08/10 18:22:16 INFO mapreduce.Job:  map 48% reduce 0%

16/08/10 18:22:19 INFO mapreduce.Job:  map 54% reduce 0%

16/08/10 18:22:20 INFO mapreduce.Job:  map 59% reduce 0%

16/08/10 18:22:22 INFO mapreduce.Job:  map 65% reduce 0%

16/08/10 18:22:24 INFO mapreduce.Job:  map 71% reduce 0%

16/08/10 18:22:25 INFO mapreduce.Job:  map 76% reduce 0%

16/08/10 18:22:26 INFO mapreduce.Job:  map 77% reduce 0%

16/08/10 18:22:29 INFO mapreduce.Job:  map 78% reduce 0%

16/08/10 18:22:32 INFO mapreduce.Job:  map 81% reduce 0%

16/08/10 18:22:33 INFO mapreduce.Job:  map 82% reduce 0%

16/08/10 18:22:34 INFO mapreduce.Job:  map 83% reduce 0%

16/08/10 18:22:35 INFO mapreduce.Job:  map 86% reduce 0%

16/08/10 18:22:36 INFO mapreduce.Job:  map 87% reduce 0%

16/08/10 18:22:38 INFO mapreduce.Job:  map 90% reduce 0%

16/08/10 18:22:39 INFO mapreduce.Job:  map 92% reduce 0%

16/08/10 18:22:40 INFO mapreduce.Job:  map 93% reduce 0%

16/08/10 18:22:41 INFO mapreduce.Job:  map 96% reduce 0%

16/08/10 18:22:42 INFO mapreduce.Job:  map 97% reduce 0%

16/08/10 18:22:44 INFO mapreduce.Job:  map 100% reduce 0%

16/08/10 18:22:46 INFO mapreduce.Job: Job job_1470815003334_0007 completed successfully

16/08/10 18:22:46 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=618704

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=49495

HDFS: Number of bytes written=456758940

HDFS: Number of read operations=200

HDFS: Number of large read operations=0

HDFS: Number of write operations=36

Job Counters 

Launched map tasks=4

Other local map tasks=4

Total time spent by all maps in occupied slots (ms)=466256

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=233128

Total vcore-seconds taken by all map tasks=233128

Total megabyte-seconds taken by all map tasks=358084608

Map-Reduce Framework

Map input records=11620800

Map output records=11620800

Input split bytes=16707

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=9158

CPU time spent (ms)=325900

Physical memory (bytes) snapshot=2222469120

Virtual memory (bytes) snapshot=13310775296

Total committed heap usage (bytes)=1477443584

File Input Format Counters 

Bytes Read=0

File Output Format Counters 

Bytes Written=0

16/08/10 18:22:46 INFO mapreduce.ImportJobBase: Transferred 435.5993 MB in 78.3224 seconds (5.5616 MB/sec)

16/08/10 18:22:46 INFO mapreduce.ImportJobBase: Retrieved 11620800 records.

[hdfs@amb2 ~]$ 




Results of the tests above.

Case                            Total job + transfer time (sec)   Total transferred   Throughput (MB/sec)
Text, uncompressed              48.8625                           1.3248 GB           27.763
Text, compressed                49.3399                           771.6916 MB         15.6403
Parquet, uncompressed           97.1124                           435.5992 MB         4.4855
Parquet, compressed             78.3224                           435.5993 MB         5.5616
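The throughput column is simply volume divided by elapsed time (for example, 435.5993 MB / 78.3224 s ≈ 5.56 MB/sec). To cross-check the transferred volume against what actually landed in HDFS, a check along these lines can be run per target directory (the path below is a placeholder, not one of the directories from this test):

# Compare the reported transfer volume with the on-HDFS size of an import
# target directory. TARGET_DIR is a placeholder; substitute the directory used
# by the case you want to check.
TARGET_DIR=/dev/some_import_dir
hdfs dfs -du -s -h "$TARGET_DIR"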

       





Posted by ORACLE,DBA,BIG,DATA,JAVA 흑풍전설

I decided to run a test covering sqoop with compression, Parquet, and plain text.

The customer on my current project has decided to use Snappy and Parquet, so this post documents a test of those options.

To compare what happens with and without them, four cases are tested in total; a combined sketch of the four commands follows.
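A minimal sketch of the four runs as shell commands, assuming the connection string, table, and split column used throughout this post. The target directory for the uncompressed Parquet case is my own guess, since that command does not appear here.

# Four sqoop import cases: {text, parquet} x {uncompressed, snappy}.
# sqoop warns that --password on the command line is insecure; -P can be used instead.
CONNECT="jdbc:oracle:thin:@192.168.0.117:1521:ORCL"
COMMON="--table HDFS_2 -direct --connect $CONNECT --username flashone --password 1234 --split-by ACOL1"
SNAPPY="--compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec"

sqoop import --target-dir=/dev/data1_nc_txt $COMMON                              # case 1: text, uncompressed
sqoop import --target-dir=/dev/data2_sn_txt $SNAPPY $COMMON                      # case 2: text, compressed
sqoop import --target-dir=/dev/data3_nc_pq --as-parquetfile $COMMON              # parquet, uncompressed (dir name assumed)
sqoop import --target-dir=/dev/data4_sn_pq $SNAPPY --as-parquetfile $COMMON      # case 3: parquet, compressed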


Main server environment:

- CPU: 2 CPUs (2 sockets, 24 cores)

- RAM: 256 GB

- Storage: PCIe SSD + SATA HDD combination (the hadoop and oracle data sit on the SATA HDDs)

Disk I/O has a big influence on this test, so those details are recorded as well:

- The HDDs are all 5400 RPM, 64 MB buffer, WD data-storage drives.

- The SSD is a PCI-E card, an Intel SSD 700-series unit.

- OS + applications + engines are on the SSD; data is on the HDDs.

* Hadoop was installed through ambari, with 4 data nodes in total and a replication factor of 3.

* 2 namenodes in an HA configuration.

* VMware 10

* The VMware network is a basic 100 Mbps line.

* Oracle is Enterprise Edition, single node, version 12.1 (Linux) - this too runs on VMware.



The oracle table is as follows.

The table holds roughly 80 GB of data; it has 150 columns, and each column was packed with 150-byte values so that not a byte was left to spare. The row count is 5,385,050.

To keep the runs as comparable as possible, the shared pool and DB block buffer were flushed while testing; a sketch of that step follows.
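The exact flush commands are not shown in this post, so the following is a minimal sketch of that step, assuming it is run as SYSDBA on the Oracle host before each case:

# Flush the shared pool and buffer cache so every case starts from a
# comparable cache state (requires SYSDBA privileges; this exact script is an
# assumption, not taken from the original test).
sqlplus -s / as sysdba <<'EOF'
ALTER SYSTEM FLUSH SHARED_POOL;
ALTER SYSTEM FLUSH BUFFER_CACHE;
EOF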




case 1. uncompressed + text plain


Command executed

sqoop import --target-dir=/dev/data1_nc_txt --table HDFS_2 -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1


Execution result

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/data1_nc_txt --table HDFS_2 -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1

16/08/10 14:01:17 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 14:01:17 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 14:01:17 INFO manager.SqlManager: Using default fetchSize of 1000

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 14:01:18 INFO oracle.OraOopManagerFactory: 

**************************************************

*** Using Data Connector for Oracle and Hadoop ***

**************************************************

16/08/10 14:01:18 INFO oracle.OraOopManagerFactory: Oracle Database version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

16/08/10 14:01:18 INFO oracle.OraOopManagerFactory: This Oracle database is not a RAC.

16/08/10 14:01:18 INFO tool.CodeGenTool: Beginning code generation

16/08/10 14:01:18 INFO manager.SqlManager: Executing SQL statement: SELECT "ACOL1","ACOL2","ACOL3","ACOL4","ACOL5","ACOL6","ACOL7","ACOL8","ACOL9","ACOL10","ACOL11","ACOL12","ACOL13","ACOL14","ACOL15","ACOL16","ACOL17","ACOL18","ACOL19","ACOL20","ACOL21","ACOL22","ACOL23","ACOL24","ACOL25","ACOL26","ACOL27","ACOL28","ACOL29","ACOL30","ACOL31","ACOL32","ACOL33","ACOL34","ACOL35","ACOL36","ACOL37","ACOL38","ACOL39","ACOL40","ACOL41","ACOL42","ACOL43","ACOL44","ACOL45","ACOL46","ACOL47","ACOL48","ACOL49","ACOL50","ACOL51","ACOL52","ACOL53","ACOL54","ACOL55","ACOL56","ACOL57","ACOL58","ACOL59","ACOL60","ACOL61","ACOL62","ACOL63","ACOL64","ACOL65","ACOL66","ACOL67","ACOL68","ACOL69","ACOL70","ACOL71","ACOL72","ACOL73","ACOL74","ACOL75","BCOL1","BCOL2","BCOL3","BCOL4","BCOL5","BCOL6","BCOL7","BCOL8","BCOL9","BCOL10","BCOL11","BCOL12","BCOL13","BCOL14","BCOL15","BCOL16","BCOL17","BCOL18","BCOL19","BCOL20","BCOL21","BCOL22","BCOL23","BCOL24","BCOL25","BCOL26","BCOL27","BCOL28","BCOL29","BCOL30","BCOL31","BCOL32","BCOL33","BCOL34","BCOL35","BCOL36","BCOL37","BCOL38","BCOL39","BCOL40","BCOL41","BCOL42","BCOL43","BCOL44","BCOL45","BCOL46","BCOL47","BCOL48","BCOL49","BCOL50","BCOL51","BCOL52","BCOL53","BCOL54","BCOL55","BCOL56","BCOL57","BCOL58","BCOL59","BCOL60","BCOL61","BCOL62","BCOL63","BCOL64","BCOL65","BCOL66","BCOL67","BCOL68","BCOL69","BCOL70","BCOL71","BCOL72","BCOL73","BCOL74","BCOL75" FROM HDFS_2 WHERE 0=1

16/08/10 14:01:18 INFO manager.SqlManager: Executing SQL statement: SELECT "ACOL1","ACOL2","ACOL3","ACOL4","ACOL5","ACOL6","ACOL7","ACOL8","ACOL9","ACOL10","ACOL11","ACOL12","ACOL13","ACOL14","ACOL15","ACOL16","ACOL17","ACOL18","ACOL19","ACOL20","ACOL21","ACOL22","ACOL23","ACOL24","ACOL25","ACOL26","ACOL27","ACOL28","ACOL29","ACOL30","ACOL31","ACOL32","ACOL33","ACOL34","ACOL35","ACOL36","ACOL37","ACOL38","ACOL39","ACOL40","ACOL41","ACOL42","ACOL43","ACOL44","ACOL45","ACOL46","ACOL47","ACOL48","ACOL49","ACOL50","ACOL51","ACOL52","ACOL53","ACOL54","ACOL55","ACOL56","ACOL57","ACOL58","ACOL59","ACOL60","ACOL61","ACOL62","ACOL63","ACOL64","ACOL65","ACOL66","ACOL67","ACOL68","ACOL69","ACOL70","ACOL71","ACOL72","ACOL73","ACOL74","ACOL75","BCOL1","BCOL2","BCOL3","BCOL4","BCOL5","BCOL6","BCOL7","BCOL8","BCOL9","BCOL10","BCOL11","BCOL12","BCOL13","BCOL14","BCOL15","BCOL16","BCOL17","BCOL18","BCOL19","BCOL20","BCOL21","BCOL22","BCOL23","BCOL24","BCOL25","BCOL26","BCOL27","BCOL28","BCOL29","BCOL30","BCOL31","BCOL32","BCOL33","BCOL34","BCOL35","BCOL36","BCOL37","BCOL38","BCOL39","BCOL40","BCOL41","BCOL42","BCOL43","BCOL44","BCOL45","BCOL46","BCOL47","BCOL48","BCOL49","BCOL50","BCOL51","BCOL52","BCOL53","BCOL54","BCOL55","BCOL56","BCOL57","BCOL58","BCOL59","BCOL60","BCOL61","BCOL62","BCOL63","BCOL64","BCOL65","BCOL66","BCOL67","BCOL68","BCOL69","BCOL70","BCOL71","BCOL72","BCOL73","BCOL74","BCOL75" FROM "HDFS_2" WHERE 1=0

16/08/10 14:01:18 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/9152233b807a9011bb4c6752ec771805/HDFS_2.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 14:01:21 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/9152233b807a9011bb4c6752ec771805/HDFS_2.jar

16/08/10 14:01:21 INFO mapreduce.ImportJobBase: Beginning import of HDFS_2

16/08/10 14:01:22 INFO manager.SqlManager: Executing SQL statement: SELECT "ACOL1","ACOL2","ACOL3","ACOL4","ACOL5","ACOL6","ACOL7","ACOL8","ACOL9","ACOL10","ACOL11","ACOL12","ACOL13","ACOL14","ACOL15","ACOL16","ACOL17","ACOL18","ACOL19","ACOL20","ACOL21","ACOL22","ACOL23","ACOL24","ACOL25","ACOL26","ACOL27","ACOL28","ACOL29","ACOL30","ACOL31","ACOL32","ACOL33","ACOL34","ACOL35","ACOL36","ACOL37","ACOL38","ACOL39","ACOL40","ACOL41","ACOL42","ACOL43","ACOL44","ACOL45","ACOL46","ACOL47","ACOL48","ACOL49","ACOL50","ACOL51","ACOL52","ACOL53","ACOL54","ACOL55","ACOL56","ACOL57","ACOL58","ACOL59","ACOL60","ACOL61","ACOL62","ACOL63","ACOL64","ACOL65","ACOL66","ACOL67","ACOL68","ACOL69","ACOL70","ACOL71","ACOL72","ACOL73","ACOL74","ACOL75","BCOL1","BCOL2","BCOL3","BCOL4","BCOL5","BCOL6","BCOL7","BCOL8","BCOL9","BCOL10","BCOL11","BCOL12","BCOL13","BCOL14","BCOL15","BCOL16","BCOL17","BCOL18","BCOL19","BCOL20","BCOL21","BCOL22","BCOL23","BCOL24","BCOL25","BCOL26","BCOL27","BCOL28","BCOL29","BCOL30","BCOL31","BCOL32","BCOL33","BCOL34","BCOL35","BCOL36","BCOL37","BCOL38","BCOL39","BCOL40","BCOL41","BCOL42","BCOL43","BCOL44","BCOL45","BCOL46","BCOL47","BCOL48","BCOL49","BCOL50","BCOL51","BCOL52","BCOL53","BCOL54","BCOL55","BCOL56","BCOL57","BCOL58","BCOL59","BCOL60","BCOL61","BCOL62","BCOL63","BCOL64","BCOL65","BCOL66","BCOL67","BCOL68","BCOL69","BCOL70","BCOL71","BCOL72","BCOL73","BCOL74","BCOL75" FROM "HDFS_2" WHERE 1=0

16/08/10 14:01:23 INFO impl.TimelineClientImpl: Timeline service address: http://amb3.local:8188/ws/v1/timeline/

16/08/10 14:01:23 INFO client.RMProxy: Connecting to ResourceManager at amb3.local/192.168.0.143:8050

16/08/10 14:01:25 WARN oracle.OraOopUtilities: System property java.security.egd is not set to file:///dev/urandom - Oracle connections may time out.

16/08/10 14:01:25 INFO db.DBInputFormat: Using read commited transaction isolation

16/08/10 14:01:25 INFO oracle.OraOopOracleQueries: Session Time Zone set to GMT

16/08/10 14:01:25 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL :

begin 

  dbms_application_info.set_module(module_name => 'Data Connector for Oracle and Hadoop', action_name => 'import 20160810140118KST'); 

end;

16/08/10 14:01:25 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session disable parallel query

16/08/10 14:01:25 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set "_serial_direct_read"=true

16/08/10 14:01:25 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set tracefile_identifier=oraoop

16/08/10 14:01:26 INFO oracle.OraOopDataDrivenDBInputFormat: The table being imported by sqoop has 11335664 blocks that have been divided into 739 chunks which will be processed in 4 splits. The chunks will be allocated to the splits using the method : ROUNDROBIN

16/08/10 14:01:26 INFO mapreduce.JobSubmitter: number of splits:4

16/08/10 14:01:26 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470728284233_0015

16/08/10 14:01:27 INFO impl.YarnClientImpl: Submitted application application_1470728284233_0015

16/08/10 14:01:27 INFO mapreduce.Job: The url to track the job: http://amb3.local:8088/proxy/application_1470728284233_0015/

16/08/10 14:01:27 INFO mapreduce.Job: Running job: job_1470728284233_0015

16/08/10 14:01:34 INFO mapreduce.Job: Job job_1470728284233_0015 running in uber mode : false

16/08/10 14:01:34 INFO mapreduce.Job:  map 0% reduce 0%

16/08/10 14:01:55 INFO mapreduce.Job:  map 1% reduce 0%

16/08/10 14:02:10 INFO mapreduce.Job:  map 2% reduce 0%

16/08/10 14:02:29 INFO mapreduce.Job:  map 3% reduce 0%

16/08/10 14:02:46 INFO mapreduce.Job:  map 4% reduce 0%

16/08/10 14:03:01 INFO mapreduce.Job:  map 5% reduce 0%

16/08/10 14:03:19 INFO mapreduce.Job:  map 6% reduce 0%

16/08/10 14:03:39 INFO mapreduce.Job:  map 7% reduce 0%

16/08/10 14:03:57 INFO mapreduce.Job:  map 8% reduce 0%

16/08/10 14:04:17 INFO mapreduce.Job:  map 9% reduce 0%

16/08/10 14:04:39 INFO mapreduce.Job:  map 10% reduce 0%

16/08/10 14:04:56 INFO mapreduce.Job:  map 11% reduce 0%

16/08/10 14:05:15 INFO mapreduce.Job:  map 12% reduce 0%

16/08/10 14:05:32 INFO mapreduce.Job:  map 13% reduce 0%

16/08/10 14:05:53 INFO mapreduce.Job:  map 14% reduce 0%

16/08/10 14:06:10 INFO mapreduce.Job:  map 15% reduce 0%

16/08/10 14:06:29 INFO mapreduce.Job:  map 16% reduce 0%

16/08/10 14:06:54 INFO mapreduce.Job:  map 17% reduce 0%

16/08/10 14:07:12 INFO mapreduce.Job:  map 18% reduce 0%

16/08/10 14:07:30 INFO mapreduce.Job:  map 19% reduce 0%

16/08/10 14:07:51 INFO mapreduce.Job:  map 20% reduce 0%

16/08/10 14:08:21 INFO mapreduce.Job:  map 21% reduce 0%

16/08/10 14:08:36 INFO mapreduce.Job:  map 22% reduce 0%

16/08/10 14:09:01 INFO mapreduce.Job:  map 23% reduce 0%

16/08/10 14:09:20 INFO mapreduce.Job:  map 24% reduce 0%

16/08/10 14:09:37 INFO mapreduce.Job:  map 25% reduce 0%

16/08/10 14:09:50 INFO mapreduce.Job:  map 26% reduce 0%

16/08/10 14:10:15 INFO mapreduce.Job:  map 27% reduce 0%

16/08/10 14:10:30 INFO mapreduce.Job:  map 28% reduce 0%

16/08/10 14:10:55 INFO mapreduce.Job:  map 29% reduce 0%

16/08/10 14:11:18 INFO mapreduce.Job:  map 30% reduce 0%

16/08/10 14:11:33 INFO mapreduce.Job:  map 31% reduce 0%

16/08/10 14:11:54 INFO mapreduce.Job:  map 32% reduce 0%

16/08/10 14:12:09 INFO mapreduce.Job:  map 33% reduce 0%

16/08/10 14:12:27 INFO mapreduce.Job:  map 34% reduce 0%

16/08/10 14:12:49 INFO mapreduce.Job:  map 35% reduce 0%

16/08/10 14:13:02 INFO mapreduce.Job:  map 36% reduce 0%

16/08/10 14:13:32 INFO mapreduce.Job:  map 37% reduce 0%

16/08/10 14:13:53 INFO mapreduce.Job:  map 38% reduce 0%

16/08/10 14:14:12 INFO mapreduce.Job:  map 39% reduce 0%

16/08/10 14:14:29 INFO mapreduce.Job:  map 40% reduce 0%

16/08/10 14:14:48 INFO mapreduce.Job:  map 41% reduce 0%

16/08/10 14:14:54 INFO mapreduce.Job:  map 42% reduce 0%

16/08/10 14:15:15 INFO mapreduce.Job:  map 43% reduce 0%

16/08/10 14:15:31 INFO mapreduce.Job:  map 44% reduce 0%

16/08/10 14:15:43 INFO mapreduce.Job:  map 45% reduce 0%

16/08/10 14:16:06 INFO mapreduce.Job:  map 46% reduce 0%

16/08/10 14:16:13 INFO mapreduce.Job:  map 47% reduce 0%

16/08/10 14:16:37 INFO mapreduce.Job:  map 48% reduce 0%

16/08/10 14:16:43 INFO mapreduce.Job:  map 49% reduce 0%

16/08/10 14:17:11 INFO mapreduce.Job:  map 50% reduce 0%

16/08/10 14:17:27 INFO mapreduce.Job:  map 51% reduce 0%

16/08/10 14:17:49 INFO mapreduce.Job:  map 52% reduce 0%

16/08/10 14:17:55 INFO mapreduce.Job:  map 53% reduce 0%

16/08/10 14:18:25 INFO mapreduce.Job:  map 54% reduce 0%

16/08/10 14:18:43 INFO mapreduce.Job:  map 55% reduce 0%

16/08/10 14:18:53 INFO mapreduce.Job:  map 56% reduce 0%

16/08/10 14:19:16 INFO mapreduce.Job:  map 57% reduce 0%

16/08/10 14:19:40 INFO mapreduce.Job:  map 58% reduce 0%

16/08/10 14:19:47 INFO mapreduce.Job:  map 59% reduce 0%

16/08/10 14:20:14 INFO mapreduce.Job:  map 60% reduce 0%

16/08/10 14:20:22 INFO mapreduce.Job:  map 61% reduce 0%

16/08/10 14:20:42 INFO mapreduce.Job:  map 62% reduce 0%

16/08/10 14:20:55 INFO mapreduce.Job:  map 63% reduce 0%

16/08/10 14:21:15 INFO mapreduce.Job:  map 64% reduce 0%

16/08/10 14:21:29 INFO mapreduce.Job:  map 65% reduce 0%

16/08/10 14:21:53 INFO mapreduce.Job:  map 66% reduce 0%

16/08/10 14:22:23 INFO mapreduce.Job:  map 67% reduce 0%

16/08/10 14:22:35 INFO mapreduce.Job:  map 68% reduce 0%

16/08/10 14:23:00 INFO mapreduce.Job:  map 69% reduce 0%

16/08/10 14:23:18 INFO mapreduce.Job:  map 70% reduce 0%

16/08/10 14:23:34 INFO mapreduce.Job:  map 71% reduce 0%

16/08/10 14:23:55 INFO mapreduce.Job:  map 72% reduce 0%

16/08/10 14:24:15 INFO mapreduce.Job:  map 73% reduce 0%

16/08/10 14:24:31 INFO mapreduce.Job:  map 74% reduce 0%

16/08/10 14:24:54 INFO mapreduce.Job:  map 75% reduce 0%

16/08/10 14:25:14 INFO mapreduce.Job:  map 76% reduce 0%

16/08/10 14:25:56 INFO mapreduce.Job:  map 77% reduce 0%

16/08/10 14:26:17 INFO mapreduce.Job:  map 78% reduce 0%

16/08/10 14:26:37 INFO mapreduce.Job:  map 79% reduce 0%

16/08/10 14:27:25 INFO mapreduce.Job:  map 80% reduce 0%

16/08/10 14:27:39 INFO mapreduce.Job:  map 81% reduce 0%

16/08/10 14:28:30 INFO mapreduce.Job:  map 82% reduce 0%

16/08/10 14:28:58 INFO mapreduce.Job:  map 83% reduce 0%

16/08/10 14:29:41 INFO mapreduce.Job:  map 84% reduce 0%

16/08/10 14:29:59 INFO mapreduce.Job:  map 85% reduce 0%

16/08/10 14:31:00 INFO mapreduce.Job:  map 86% reduce 0%

16/08/10 14:31:39 INFO mapreduce.Job:  map 87% reduce 0%

16/08/10 14:32:11 INFO mapreduce.Job:  map 88% reduce 0%

16/08/10 14:32:35 INFO mapreduce.Job:  map 89% reduce 0%

16/08/10 14:33:04 INFO mapreduce.Job:  map 90% reduce 0%

16/08/10 14:33:51 INFO mapreduce.Job:  map 91% reduce 0%

16/08/10 14:34:45 INFO mapreduce.Job:  map 92% reduce 0%

16/08/10 14:35:14 INFO mapreduce.Job:  map 93% reduce 0%

16/08/10 14:35:44 INFO mapreduce.Job:  map 94% reduce 0%

16/08/10 14:36:16 INFO mapreduce.Job:  map 95% reduce 0%

16/08/10 14:37:08 INFO mapreduce.Job:  map 96% reduce 0%

16/08/10 14:37:44 INFO mapreduce.Job:  map 97% reduce 0%

16/08/10 14:38:15 INFO mapreduce.Job:  map 98% reduce 0%

16/08/10 14:38:45 INFO mapreduce.Job:  map 99% reduce 0%

16/08/10 14:39:41 INFO mapreduce.Job:  map 100% reduce 0%

16/08/10 14:40:05 INFO mapreduce.Job: Job job_1470728284233_0015 completed successfully

16/08/10 14:40:05 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=619112

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=66021

HDFS: Number of bytes written=81583507500

HDFS: Number of read operations=16

HDFS: Number of large read operations=0

HDFS: Number of write operations=8

Job Counters 

Launched map tasks=4

Other local map tasks=4

Total time spent by all maps in occupied slots (ms)=17659492

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=8829746

Total vcore-seconds taken by all map tasks=8829746

Total megabyte-seconds taken by all map tasks=13562489856

Map-Reduce Framework

Map input records=5385050

Map output records=5385050

Input split bytes=66021

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=52264

CPU time spent (ms)=2221300

Physical memory (bytes) snapshot=1561075712

Virtual memory (bytes) snapshot=13249417216

Total committed heap usage (bytes)=847249408

File Input Format Counters 

Bytes Read=0

File Output Format Counters 

Bytes Written=81583507500

16/08/10 14:40:05 INFO mapreduce.ImportJobBase: Transferred 75.9806 GB in 2,322.6017 seconds (33.4987 MB/sec)

16/08/10 14:40:05 INFO mapreduce.ImportJobBase: Retrieved 5385050 records.

[hdfs@amb2 ~]$ 




case 2. compressed + text plain

sqoop import --target-dir=/dev/data2_sn_txt --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec --table HDFS_2 -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1


Execution result

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/data2_sn_txt --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec --table HDFS_2 -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1

16/08/10 14:52:34 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 14:52:34 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 14:52:34 INFO manager.SqlManager: Using default fetchSize of 1000

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 14:52:35 INFO oracle.OraOopManagerFactory: 

**************************************************

*** Using Data Connector for Oracle and Hadoop ***

**************************************************

16/08/10 14:52:36 INFO oracle.OraOopManagerFactory: Oracle Database version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

16/08/10 14:52:36 INFO oracle.OraOopManagerFactory: This Oracle database is not a RAC.

16/08/10 14:52:36 INFO tool.CodeGenTool: Beginning code generation

16/08/10 14:52:36 INFO manager.SqlManager: Executing SQL statement: SELECT "ACOL1","ACOL2","ACOL3","ACOL4","ACOL5","ACOL6","ACOL7","ACOL8","ACOL9","ACOL10","ACOL11","ACOL12","ACOL13","ACOL14","ACOL15","ACOL16","ACOL17","ACOL18","ACOL19","ACOL20","ACOL21","ACOL22","ACOL23","ACOL24","ACOL25","ACOL26","ACOL27","ACOL28","ACOL29","ACOL30","ACOL31","ACOL32","ACOL33","ACOL34","ACOL35","ACOL36","ACOL37","ACOL38","ACOL39","ACOL40","ACOL41","ACOL42","ACOL43","ACOL44","ACOL45","ACOL46","ACOL47","ACOL48","ACOL49","ACOL50","ACOL51","ACOL52","ACOL53","ACOL54","ACOL55","ACOL56","ACOL57","ACOL58","ACOL59","ACOL60","ACOL61","ACOL62","ACOL63","ACOL64","ACOL65","ACOL66","ACOL67","ACOL68","ACOL69","ACOL70","ACOL71","ACOL72","ACOL73","ACOL74","ACOL75","BCOL1","BCOL2","BCOL3","BCOL4","BCOL5","BCOL6","BCOL7","BCOL8","BCOL9","BCOL10","BCOL11","BCOL12","BCOL13","BCOL14","BCOL15","BCOL16","BCOL17","BCOL18","BCOL19","BCOL20","BCOL21","BCOL22","BCOL23","BCOL24","BCOL25","BCOL26","BCOL27","BCOL28","BCOL29","BCOL30","BCOL31","BCOL32","BCOL33","BCOL34","BCOL35","BCOL36","BCOL37","BCOL38","BCOL39","BCOL40","BCOL41","BCOL42","BCOL43","BCOL44","BCOL45","BCOL46","BCOL47","BCOL48","BCOL49","BCOL50","BCOL51","BCOL52","BCOL53","BCOL54","BCOL55","BCOL56","BCOL57","BCOL58","BCOL59","BCOL60","BCOL61","BCOL62","BCOL63","BCOL64","BCOL65","BCOL66","BCOL67","BCOL68","BCOL69","BCOL70","BCOL71","BCOL72","BCOL73","BCOL74","BCOL75" FROM HDFS_2 WHERE 0=1

16/08/10 14:52:36 INFO manager.SqlManager: Executing SQL statement: SELECT "ACOL1","ACOL2","ACOL3","ACOL4","ACOL5","ACOL6","ACOL7","ACOL8","ACOL9","ACOL10","ACOL11","ACOL12","ACOL13","ACOL14","ACOL15","ACOL16","ACOL17","ACOL18","ACOL19","ACOL20","ACOL21","ACOL22","ACOL23","ACOL24","ACOL25","ACOL26","ACOL27","ACOL28","ACOL29","ACOL30","ACOL31","ACOL32","ACOL33","ACOL34","ACOL35","ACOL36","ACOL37","ACOL38","ACOL39","ACOL40","ACOL41","ACOL42","ACOL43","ACOL44","ACOL45","ACOL46","ACOL47","ACOL48","ACOL49","ACOL50","ACOL51","ACOL52","ACOL53","ACOL54","ACOL55","ACOL56","ACOL57","ACOL58","ACOL59","ACOL60","ACOL61","ACOL62","ACOL63","ACOL64","ACOL65","ACOL66","ACOL67","ACOL68","ACOL69","ACOL70","ACOL71","ACOL72","ACOL73","ACOL74","ACOL75","BCOL1","BCOL2","BCOL3","BCOL4","BCOL5","BCOL6","BCOL7","BCOL8","BCOL9","BCOL10","BCOL11","BCOL12","BCOL13","BCOL14","BCOL15","BCOL16","BCOL17","BCOL18","BCOL19","BCOL20","BCOL21","BCOL22","BCOL23","BCOL24","BCOL25","BCOL26","BCOL27","BCOL28","BCOL29","BCOL30","BCOL31","BCOL32","BCOL33","BCOL34","BCOL35","BCOL36","BCOL37","BCOL38","BCOL39","BCOL40","BCOL41","BCOL42","BCOL43","BCOL44","BCOL45","BCOL46","BCOL47","BCOL48","BCOL49","BCOL50","BCOL51","BCOL52","BCOL53","BCOL54","BCOL55","BCOL56","BCOL57","BCOL58","BCOL59","BCOL60","BCOL61","BCOL62","BCOL63","BCOL64","BCOL65","BCOL66","BCOL67","BCOL68","BCOL69","BCOL70","BCOL71","BCOL72","BCOL73","BCOL74","BCOL75" FROM "HDFS_2" WHERE 1=0

16/08/10 14:52:36 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/296e19dcfe9895e28bbc77da203ad202/HDFS_2.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 14:52:39 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/296e19dcfe9895e28bbc77da203ad202/HDFS_2.jar

16/08/10 14:52:39 INFO mapreduce.ImportJobBase: Beginning import of HDFS_2

16/08/10 14:52:40 INFO manager.SqlManager: Executing SQL statement: SELECT "ACOL1","ACOL2","ACOL3","ACOL4","ACOL5","ACOL6","ACOL7","ACOL8","ACOL9","ACOL10","ACOL11","ACOL12","ACOL13","ACOL14","ACOL15","ACOL16","ACOL17","ACOL18","ACOL19","ACOL20","ACOL21","ACOL22","ACOL23","ACOL24","ACOL25","ACOL26","ACOL27","ACOL28","ACOL29","ACOL30","ACOL31","ACOL32","ACOL33","ACOL34","ACOL35","ACOL36","ACOL37","ACOL38","ACOL39","ACOL40","ACOL41","ACOL42","ACOL43","ACOL44","ACOL45","ACOL46","ACOL47","ACOL48","ACOL49","ACOL50","ACOL51","ACOL52","ACOL53","ACOL54","ACOL55","ACOL56","ACOL57","ACOL58","ACOL59","ACOL60","ACOL61","ACOL62","ACOL63","ACOL64","ACOL65","ACOL66","ACOL67","ACOL68","ACOL69","ACOL70","ACOL71","ACOL72","ACOL73","ACOL74","ACOL75","BCOL1","BCOL2","BCOL3","BCOL4","BCOL5","BCOL6","BCOL7","BCOL8","BCOL9","BCOL10","BCOL11","BCOL12","BCOL13","BCOL14","BCOL15","BCOL16","BCOL17","BCOL18","BCOL19","BCOL20","BCOL21","BCOL22","BCOL23","BCOL24","BCOL25","BCOL26","BCOL27","BCOL28","BCOL29","BCOL30","BCOL31","BCOL32","BCOL33","BCOL34","BCOL35","BCOL36","BCOL37","BCOL38","BCOL39","BCOL40","BCOL41","BCOL42","BCOL43","BCOL44","BCOL45","BCOL46","BCOL47","BCOL48","BCOL49","BCOL50","BCOL51","BCOL52","BCOL53","BCOL54","BCOL55","BCOL56","BCOL57","BCOL58","BCOL59","BCOL60","BCOL61","BCOL62","BCOL63","BCOL64","BCOL65","BCOL66","BCOL67","BCOL68","BCOL69","BCOL70","BCOL71","BCOL72","BCOL73","BCOL74","BCOL75" FROM "HDFS_2" WHERE 1=0

16/08/10 14:52:42 INFO impl.TimelineClientImpl: Timeline service address: http://amb3.local:8188/ws/v1/timeline/

16/08/10 14:52:42 INFO client.RMProxy: Connecting to ResourceManager at amb3.local/192.168.0.143:8050

16/08/10 14:52:44 WARN oracle.OraOopUtilities: System property java.security.egd is not set to file:///dev/urandom - Oracle connections may time out.

16/08/10 14:52:44 INFO db.DBInputFormat: Using read commited transaction isolation

16/08/10 14:52:44 INFO oracle.OraOopOracleQueries: Session Time Zone set to GMT

16/08/10 14:52:44 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL :

begin 

  dbms_application_info.set_module(module_name => 'Data Connector for Oracle and Hadoop', action_name => 'import 20160810145235KST'); 

end;

16/08/10 14:52:44 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session disable parallel query

16/08/10 14:52:44 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set "_serial_direct_read"=true

16/08/10 14:52:44 INFO oracle.OracleConnectionFactory: Initializing Oracle session with SQL : alter session set tracefile_identifier=oraoop

16/08/10 14:52:45 INFO oracle.OraOopDataDrivenDBInputFormat: The table being imported by sqoop has 11335664 blocks that have been divided into 739 chunks which will be processed in 4 splits. The chunks will be allocated to the splits using the method : ROUNDROBIN

16/08/10 14:52:45 INFO mapreduce.JobSubmitter: number of splits:4

16/08/10 14:52:45 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470728284233_0017

16/08/10 14:52:46 INFO impl.YarnClientImpl: Submitted application application_1470728284233_0017

16/08/10 14:52:46 INFO mapreduce.Job: The url to track the job: http://amb3.local:8088/proxy/application_1470728284233_0017/

16/08/10 14:52:46 INFO mapreduce.Job: Running job: job_1470728284233_0017

16/08/10 14:52:54 INFO mapreduce.Job: Job job_1470728284233_0017 running in uber mode : false

16/08/10 14:52:54 INFO mapreduce.Job:  map 0% reduce 0%

16/08/10 14:53:16 INFO mapreduce.Job:  map 1% reduce 0%

16/08/10 14:53:28 INFO mapreduce.Job:  map 2% reduce 0%

16/08/10 14:53:43 INFO mapreduce.Job:  map 3% reduce 0%

16/08/10 14:53:58 INFO mapreduce.Job:  map 4% reduce 0%

16/08/10 14:54:15 INFO mapreduce.Job:  map 5% reduce 0%

16/08/10 14:54:30 INFO mapreduce.Job:  map 6% reduce 0%

16/08/10 14:54:58 INFO mapreduce.Job:  map 7% reduce 0%

16/08/10 14:55:44 INFO mapreduce.Job:  map 8% reduce 0%

16/08/10 14:56:31 INFO mapreduce.Job:  map 9% reduce 0%

16/08/10 14:57:03 INFO mapreduce.Job:  map 10% reduce 0%

16/08/10 14:57:31 INFO mapreduce.Job:  map 11% reduce 0%

16/08/10 14:58:06 INFO mapreduce.Job:  map 12% reduce 0%

16/08/10 14:58:45 INFO mapreduce.Job:  map 13% reduce 0%

16/08/10 14:59:17 INFO mapreduce.Job:  map 14% reduce 0%

16/08/10 14:59:57 INFO mapreduce.Job:  map 15% reduce 0%

16/08/10 15:00:32 INFO mapreduce.Job:  map 16% reduce 0%

16/08/10 15:01:24 INFO mapreduce.Job:  map 17% reduce 0%

16/08/10 15:02:05 INFO mapreduce.Job:  map 18% reduce 0%

16/08/10 15:02:36 INFO mapreduce.Job:  map 19% reduce 0%

16/08/10 15:03:16 INFO mapreduce.Job:  map 20% reduce 0%

16/08/10 15:03:45 INFO mapreduce.Job:  map 21% reduce 0%

16/08/10 15:04:30 INFO mapreduce.Job:  map 22% reduce 0%

16/08/10 15:04:58 INFO mapreduce.Job:  map 23% reduce 0%

16/08/10 15:05:30 INFO mapreduce.Job:  map 24% reduce 0%

16/08/10 15:06:05 INFO mapreduce.Job:  map 25% reduce 0%

16/08/10 15:06:54 INFO mapreduce.Job:  map 26% reduce 0%

16/08/10 15:07:24 INFO mapreduce.Job:  map 27% reduce 0%

16/08/10 15:08:00 INFO mapreduce.Job:  map 28% reduce 0%

16/08/10 15:08:44 INFO mapreduce.Job:  map 29% reduce 0%

16/08/10 15:09:11 INFO mapreduce.Job:  map 30% reduce 0%

16/08/10 15:09:55 INFO mapreduce.Job:  map 31% reduce 0%

16/08/10 15:10:25 INFO mapreduce.Job:  map 32% reduce 0%

16/08/10 15:11:40 INFO mapreduce.Job:  map 33% reduce 0%

16/08/10 15:12:08 INFO mapreduce.Job:  map 34% reduce 0%

16/08/10 15:12:42 INFO mapreduce.Job:  map 35% reduce 0%

16/08/10 15:13:30 INFO mapreduce.Job:  map 36% reduce 0%

16/08/10 15:14:05 INFO mapreduce.Job:  map 37% reduce 0%

16/08/10 15:14:29 INFO mapreduce.Job:  map 38% reduce 0%

16/08/10 15:14:59 INFO mapreduce.Job:  map 39% reduce 0%

16/08/10 15:15:44 INFO mapreduce.Job:  map 40% reduce 0%

16/08/10 15:16:12 INFO mapreduce.Job:  map 41% reduce 0%

16/08/10 15:17:00 INFO mapreduce.Job:  map 42% reduce 0%

16/08/10 15:17:35 INFO mapreduce.Job:  map 43% reduce 0%

16/08/10 15:18:00 INFO mapreduce.Job:  map 44% reduce 0%

16/08/10 15:18:45 INFO mapreduce.Job:  map 45% reduce 0%

16/08/10 15:19:26 INFO mapreduce.Job:  map 46% reduce 0%

16/08/10 15:19:52 INFO mapreduce.Job:  map 47% reduce 0%

16/08/10 15:20:42 INFO mapreduce.Job:  map 48% reduce 0%

16/08/10 15:21:11 INFO mapreduce.Job:  map 49% reduce 0%

16/08/10 15:21:52 INFO mapreduce.Job:  map 50% reduce 0%

16/08/10 15:22:37 INFO mapreduce.Job:  map 51% reduce 0%

16/08/10 15:23:21 INFO mapreduce.Job:  map 52% reduce 0%

16/08/10 15:23:56 INFO mapreduce.Job:  map 53% reduce 0%

16/08/10 15:24:33 INFO mapreduce.Job:  map 54% reduce 0%

16/08/10 15:25:10 INFO mapreduce.Job:  map 55% reduce 0%

16/08/10 15:25:36 INFO mapreduce.Job:  map 56% reduce 0%

16/08/10 15:26:14 INFO mapreduce.Job:  map 57% reduce 0%

16/08/10 15:26:58 INFO mapreduce.Job:  map 58% reduce 0%

16/08/10 15:27:40 INFO mapreduce.Job:  map 59% reduce 0%

16/08/10 15:28:18 INFO mapreduce.Job:  map 60% reduce 0%

16/08/10 15:28:49 INFO mapreduce.Job:  map 61% reduce 0%

16/08/10 15:29:36 INFO mapreduce.Job:  map 62% reduce 0%

16/08/10 15:30:08 INFO mapreduce.Job:  map 63% reduce 0%

16/08/10 15:30:37 INFO mapreduce.Job:  map 64% reduce 0%

16/08/10 15:31:06 INFO mapreduce.Job:  map 65% reduce 0%

16/08/10 15:32:05 INFO mapreduce.Job:  map 66% reduce 0%

16/08/10 15:32:39 INFO mapreduce.Job:  map 67% reduce 0%

16/08/10 15:33:16 INFO mapreduce.Job:  map 68% reduce 0%

16/08/10 15:33:53 INFO mapreduce.Job:  map 69% reduce 0%

16/08/10 15:34:26 INFO mapreduce.Job:  map 70% reduce 0%

16/08/10 15:34:59 INFO mapreduce.Job:  map 71% reduce 0%

16/08/10 15:35:32 INFO mapreduce.Job:  map 72% reduce 0%

16/08/10 15:36:26 INFO mapreduce.Job:  map 73% reduce 0%

16/08/10 15:36:59 INFO mapreduce.Job:  map 74% reduce 0%

16/08/10 15:37:40 INFO mapreduce.Job:  map 75% reduce 0%

16/08/10 15:37:47 INFO mapreduce.Job:  map 76% reduce 0%

16/08/10 15:38:28 INFO mapreduce.Job:  map 77% reduce 0%

16/08/10 15:39:16 INFO mapreduce.Job:  map 78% reduce 0%

16/08/10 15:39:43 INFO mapreduce.Job:  map 79% reduce 0%

16/08/10 15:40:12 INFO mapreduce.Job:  map 80% reduce 0%

16/08/10 15:40:43 INFO mapreduce.Job:  map 81% reduce 0%

16/08/10 15:41:14 INFO mapreduce.Job:  map 82% reduce 0%

16/08/10 15:42:01 INFO mapreduce.Job:  map 83% reduce 0%

16/08/10 15:42:30 INFO mapreduce.Job:  map 84% reduce 0%

16/08/10 15:43:24 INFO mapreduce.Job:  map 85% reduce 0%

16/08/10 15:43:51 INFO mapreduce.Job:  map 86% reduce 0%

16/08/10 15:44:27 INFO mapreduce.Job:  map 87% reduce 0%

16/08/10 15:44:54 INFO mapreduce.Job:  map 88% reduce 0%

16/08/10 15:45:31 INFO mapreduce.Job:  map 89% reduce 0%

16/08/10 15:45:53 INFO mapreduce.Job:  map 90% reduce 0%

16/08/10 15:46:48 INFO mapreduce.Job:  map 91% reduce 0%

16/08/10 15:47:23 INFO mapreduce.Job:  map 92% reduce 0%

16/08/10 15:48:02 INFO mapreduce.Job:  map 93% reduce 0%

16/08/10 15:48:36 INFO mapreduce.Job:  map 94% reduce 0%

16/08/10 15:49:07 INFO mapreduce.Job:  map 95% reduce 0%

16/08/10 15:49:43 INFO mapreduce.Job:  map 96% reduce 0%

16/08/10 15:50:10 INFO mapreduce.Job:  map 97% reduce 0%

16/08/10 15:51:10 INFO mapreduce.Job:  map 98% reduce 0%

16/08/10 15:51:54 INFO mapreduce.Job:  map 99% reduce 0%

16/08/10 15:52:49 INFO mapreduce.Job:  map 100% reduce 0%

16/08/10 15:53:13 INFO mapreduce.Job: Job job_1470728284233_0017 completed successfully

16/08/10 15:53:13 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=619092

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=66021

HDFS: Number of bytes written=81593058054

HDFS: Number of read operations=16

HDFS: Number of large read operations=0

HDFS: Number of write operations=8

Job Counters 

Launched map tasks=4

Other local map tasks=4

Total time spent by all maps in occupied slots (ms)=27988674

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=13994337

Total vcore-seconds taken by all map tasks=13994337

Total megabyte-seconds taken by all map tasks=21495301632

Map-Reduce Framework

Map input records=5385050

Map output records=5385050

Input split bytes=66021

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=174416

CPU time spent (ms)=3340330

Physical memory (bytes) snapshot=1647304704

Virtual memory (bytes) snapshot=13287915520

Total committed heap usage (bytes)=879755264

File Input Format Counters 

Bytes Read=0

File Output Format Counters 

Bytes Written=81593058054

16/08/10 15:53:13 INFO mapreduce.ImportJobBase: Transferred 75.9895 GB in 3,632.0301 seconds (21.4242 MB/sec)

16/08/10 15:53:13 INFO mapreduce.ImportJobBase: Retrieved 5385050 records.

[hdfs@amb2 ~]$


case 3. compressed + parquet

sqoop import --target-dir=/dev/data4_sn_pq --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec --table HDFS_2 -direct --as-parquetfile --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ACOL1


Execution result

Failed. (Exactly what happened I will keep to myself... it was grim.)

- Decided to retry with 500,000 rows.




Posted by ORACLE,DBA,BIG,DATA,JAVA 흑풍전설

A list of the four file formats available when loading data into hadoop with sqoop.

* Until now, when moving data from Oracle into Hadoop, the loads had to be real-time, so I used flume; batch-style jobs, however, do not really need flume.

On the project I am currently on, the data was loaded into hadoop using sqoop.

Sqoop posed no great difficulty, and since the data is transferred with commands from the shell, I personally found it very convenient to use. I just do not know yet whether it can be customized the way flume can.

The source is an ordinary HEAP TABLE in ORACLE.

After creating a test table, I temporarily loaded 1,000 rows into it.


CREATE TABLE HDFS_4 

 (

  ID VARCHAR(100),

  NUM NUMBER(10),

  TEST VARCHAR(100),

  REG_DT DATE DEFAULT SYSDATE 

);


INSERT INTO HDFS_4

SELECT 'USR_'||Dbms_Random.string('A',5),

       Trunc(Dbms_Random.Value(10000,90000)),

       Dbms_Random.string('A',100),

       SYSDATE + To_Number(Dbms_Random.Value(1,30)) 

FROM DUAL

CONNECT BY LEVEL <= 1000;








1. text plain 


Command executed

sqoop import --target-dir=/dev/test2_text --query='select *from HDFS_4 where $CONDITIONS' -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ID


Execution result

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/test2_text --query='select *from HDFS_4 where $CONDITIONS' -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ID

16/08/10 09:09:28 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 09:09:28 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 09:09:28 INFO manager.SqlManager: Using default fetchSize of 1000

16/08/10 09:09:28 INFO tool.CodeGenTool: Beginning code generation

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 09:09:29 INFO manager.OracleManager: Time zone has been set to GMT

16/08/10 09:09:29 INFO manager.SqlManager: Executing SQL statement: select *from HDFS_4 where  (1 = 0) 

16/08/10 09:09:29 INFO manager.SqlManager: Executing SQL statement: select *from HDFS_4 where  (1 = 0) 

16/08/10 09:09:30 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/284d77af5917fa2113d961ae72341cc4/QueryResult.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 09:09:31 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/284d77af5917fa2113d961ae72341cc4/QueryResult.jar

16/08/10 09:09:31 INFO mapreduce.ImportJobBase: Beginning query import.

16/08/10 09:09:33 INFO impl.TimelineClientImpl: Timeline service address: http://amb3.local:8188/ws/v1/timeline/

16/08/10 09:09:33 INFO client.RMProxy: Connecting to ResourceManager at amb3.local/192.168.0.143:8050

16/08/10 09:09:35 INFO db.DBInputFormat: Using read commited transaction isolation

16/08/10 09:09:35 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(ID), MAX(ID) FROM (select *from HDFS_4 where  (1 = 1) ) t1

16/08/10 09:09:35 WARN db.TextSplitter: Generating splits for a textual index column.

16/08/10 09:09:35 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.

16/08/10 09:09:35 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.

16/08/10 09:09:35 INFO mapreduce.JobSubmitter: number of splits:4

16/08/10 09:09:35 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470728284233_0007

16/08/10 09:09:35 INFO impl.YarnClientImpl: Submitted application application_1470728284233_0007

16/08/10 09:09:35 INFO mapreduce.Job: The url to track the job: http://amb3.local:8088/proxy/application_1470728284233_0007/

16/08/10 09:09:35 INFO mapreduce.Job: Running job: job_1470728284233_0007

16/08/10 09:09:45 INFO mapreduce.Job: Job job_1470728284233_0007 running in uber mode : false

16/08/10 09:09:45 INFO mapreduce.Job:  map 0% reduce 0%

16/08/10 09:09:53 INFO mapreduce.Job:  map 25% reduce 0%

16/08/10 09:09:55 INFO mapreduce.Job:  map 75% reduce 0%

16/08/10 09:09:56 INFO mapreduce.Job:  map 100% reduce 0%

16/08/10 09:09:56 INFO mapreduce.Job: Job job_1470728284233_0007 completed successfully

16/08/10 09:09:56 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=606092

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=489

HDFS: Number of bytes written=139000

HDFS: Number of read operations=16

HDFS: Number of large read operations=0

HDFS: Number of write operations=8

Job Counters 

Launched map tasks=4

Other local map tasks=4

Total time spent by all maps in occupied slots (ms)=57580

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=28790

Total vcore-seconds taken by all map tasks=28790

Total megabyte-seconds taken by all map tasks=44221440

Map-Reduce Framework

Map input records=1000

Map output records=1000

Input split bytes=489

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=2674

CPU time spent (ms)=38860

Physical memory (bytes) snapshot=1292296192

Virtual memory (bytes) snapshot=13235789824

Total committed heap usage (bytes)=689438720

File Input Format Counters 

Bytes Read=0

File Output Format Counters 

Bytes Written=139000

16/08/10 09:09:56 INFO mapreduce.ImportJobBase: Transferred 135.7422 KB in 23.6533 seconds (5.7388 KB/sec)

16/08/10 09:09:56 INFO mapreduce.ImportJobBase: Retrieved 1000 records.

[hdfs@amb2 ~]$ 



With cat, the whole content is readable.

You can see that the columns are separated by commas.

It is stored in the same form as CSV; a delimiter variation is sketched below.
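If comma-separated output is not what is wanted, the same import can take explicit formatting flags. A hedged variation (the tab delimiter and the target directory here are my own choices, not part of the original test):

# Same text import, but with tab-separated fields;
# --fields-terminated-by is a standard sqoop output-formatting option.
sqoop import --target-dir=/dev/test2_text_tab \
  --query='select * from HDFS_4 where $CONDITIONS' \
  --fields-terminated-by '\t' \
  -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL \
  --username flashone --password 1234 --split-by ID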



2. AVRO 


Command executed

sqoop import --target-dir=/dev/test2_avro --query='select *from HDFS_4 where $CONDITIONS' --as-avrodatafile  -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ID


Execution result

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/test2_avro --query='select *from HDFS_4 where $CONDITIONS' --as-avrodatafile  -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ID

16/08/10 09:15:04 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 09:15:04 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 09:15:04 INFO manager.SqlManager: Using default fetchSize of 1000

16/08/10 09:15:04 INFO tool.CodeGenTool: Beginning code generation

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 09:15:05 INFO manager.OracleManager: Time zone has been set to GMT

16/08/10 09:15:05 INFO manager.SqlManager: Executing SQL statement: select *from HDFS_4 where  (1 = 0) 

16/08/10 09:15:05 INFO manager.SqlManager: Executing SQL statement: select *from HDFS_4 where  (1 = 0) 

16/08/10 09:15:05 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/837dbbb2e304900b1151d2fa6186b0b7/QueryResult.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 09:15:07 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/837dbbb2e304900b1151d2fa6186b0b7/QueryResult.jar

16/08/10 09:15:07 INFO mapreduce.ImportJobBase: Beginning query import.

16/08/10 09:15:08 INFO manager.OracleManager: Time zone has been set to GMT

16/08/10 09:15:08 INFO manager.SqlManager: Executing SQL statement: select *from HDFS_4 where  (1 = 0) 

16/08/10 09:15:08 INFO manager.SqlManager: Executing SQL statement: select *from HDFS_4 where  (1 = 0) 

16/08/10 09:15:08 INFO mapreduce.DataDrivenImportJob: Writing Avro schema file: /tmp/sqoop-hdfs/compile/837dbbb2e304900b1151d2fa6186b0b7/QueryResult.avsc

16/08/10 09:15:08 INFO impl.TimelineClientImpl: Timeline service address: http://amb3.local:8188/ws/v1/timeline/

16/08/10 09:15:08 INFO client.RMProxy: Connecting to ResourceManager at amb3.local/192.168.0.143:8050

16/08/10 09:15:10 INFO db.DBInputFormat: Using read commited transaction isolation

16/08/10 09:15:10 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(ID), MAX(ID) FROM (select *from HDFS_4 where  (1 = 1) ) t1

16/08/10 09:15:10 WARN db.TextSplitter: Generating splits for a textual index column.

16/08/10 09:15:10 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.

16/08/10 09:15:10 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.

16/08/10 09:15:10 INFO mapreduce.JobSubmitter: number of splits:4

16/08/10 09:15:10 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470728284233_0008

16/08/10 09:15:11 INFO impl.YarnClientImpl: Submitted application application_1470728284233_0008

16/08/10 09:15:11 INFO mapreduce.Job: The url to track the job: http://amb3.local:8088/proxy/application_1470728284233_0008/

16/08/10 09:15:11 INFO mapreduce.Job: Running job: job_1470728284233_0008

16/08/10 09:15:18 INFO mapreduce.Job: Job job_1470728284233_0008 running in uber mode : false

16/08/10 09:15:18 INFO mapreduce.Job:  map 0% reduce 0%

16/08/10 09:15:26 INFO mapreduce.Job:  map 25% reduce 0%

16/08/10 09:15:27 INFO mapreduce.Job:  map 100% reduce 0%

16/08/10 09:15:28 INFO mapreduce.Job: Job job_1470728284233_0008 completed successfully

16/08/10 09:15:28 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=607148

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=489

HDFS: Number of bytes written=130230

HDFS: Number of read operations=16

HDFS: Number of large read operations=0

HDFS: Number of write operations=8

Job Counters 

Launched map tasks=4

Other local map tasks=4

Total time spent by all maps in occupied slots (ms)=49984

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=24992

Total vcore-seconds taken by all map tasks=24992

Total megabyte-seconds taken by all map tasks=38387712

Map-Reduce Framework

Map input records=1000

Map output records=1000

Input split bytes=489

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=1696

CPU time spent (ms)=38110

Physical memory (bytes) snapshot=1327677440

Virtual memory (bytes) snapshot=13322395648

Total committed heap usage (bytes)=721420288

File Input Format Counters 

Bytes Read=0

File Output Format Counters 

Bytes Written=130230

16/08/10 09:15:28 INFO mapreduce.ImportJobBase: Transferred 127.1777 KB in 20.4259 seconds (6.2263 KB/sec)

16/08/10 09:15:28 INFO mapreduce.ImportJobBase: Retrieved 1000 records.

[hdfs@amb2 ~]$ 



With cat, the area around the delimiters looks slightly garbled - Avro data files are binary, so this is expected. A dump sketch follows.
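To actually read the Avro output, one option is the avro-tools jar; the jar version/path and the part-file name below are assumptions, so adjust them to your environment:

# Pull one Avro part file to the local filesystem and dump it as JSON.
hdfs dfs -get /dev/test2_avro/part-m-00000.avro /tmp/part-m-00000.avro
java -jar avro-tools-1.8.1.jar tojson /tmp/part-m-00000.avro | head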



3. sequence file


Command executed

sqoop import --target-dir=/dev/test2_seq --query='select *from HDFS_4 where $CONDITIONS' --as-sequencefile  -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ID


Execution result

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/test2_seq --query='select *from HDFS_4 where $CONDITIONS' --as-sequencefile  -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ID

16/08/10 09:22:31 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 09:22:31 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 09:22:31 INFO manager.SqlManager: Using default fetchSize of 1000

16/08/10 09:22:31 INFO tool.CodeGenTool: Beginning code generation

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 09:22:32 INFO manager.OracleManager: Time zone has been set to GMT

16/08/10 09:22:32 INFO manager.SqlManager: Executing SQL statement: select *from HDFS_4 where  (1 = 0) 

16/08/10 09:22:32 INFO manager.SqlManager: Executing SQL statement: select *from HDFS_4 where  (1 = 0) 

16/08/10 09:22:32 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/3a0d3133347de2dccea51b1c74f948bd/QueryResult.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 09:22:34 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/3a0d3133347de2dccea51b1c74f948bd/QueryResult.jar

16/08/10 09:22:34 INFO mapreduce.ImportJobBase: Beginning query import.

16/08/10 09:22:36 INFO impl.TimelineClientImpl: Timeline service address: http://amb3.local:8188/ws/v1/timeline/

16/08/10 09:22:36 INFO client.RMProxy: Connecting to ResourceManager at amb3.local/192.168.0.143:8050

16/08/10 09:22:38 INFO db.DBInputFormat: Using read commited transaction isolation

16/08/10 09:22:38 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(ID), MAX(ID) FROM (select *from HDFS_4 where  (1 = 1) ) t1

16/08/10 09:22:38 WARN db.TextSplitter: Generating splits for a textual index column.

16/08/10 09:22:38 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.

16/08/10 09:22:38 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.

16/08/10 09:22:38 INFO mapreduce.JobSubmitter: number of splits:4

16/08/10 09:22:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470728284233_0009

16/08/10 09:22:38 INFO impl.YarnClientImpl: Submitted application application_1470728284233_0009

16/08/10 09:22:39 INFO mapreduce.Job: The url to track the job: http://amb3.local:8088/proxy/application_1470728284233_0009/

16/08/10 09:22:39 INFO mapreduce.Job: Running job: job_1470728284233_0009

16/08/10 09:22:47 INFO mapreduce.Job: Job job_1470728284233_0009 running in uber mode : false

16/08/10 09:22:47 INFO mapreduce.Job:  map 0% reduce 0%

16/08/10 09:22:54 INFO mapreduce.Job:  map 25% reduce 0%

16/08/10 09:22:55 INFO mapreduce.Job:  map 100% reduce 0%

16/08/10 09:22:56 INFO mapreduce.Job: Job job_1470728284233_0009 completed successfully

16/08/10 09:22:56 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=605572

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=489

HDFS: Number of bytes written=157788

HDFS: Number of read operations=16

HDFS: Number of large read operations=0

HDFS: Number of write operations=8

Job Counters 

Launched map tasks=4

Other local map tasks=4

Total time spent by all maps in occupied slots (ms)=48586

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=24293

Total vcore-seconds taken by all map tasks=24293

Total megabyte-seconds taken by all map tasks=37314048

Map-Reduce Framework

Map input records=1000

Map output records=1000

Input split bytes=489

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=765

CPU time spent (ms)=27050

Physical memory (bytes) snapshot=1312333824

Virtual memory (bytes) snapshot=13253160960

Total committed heap usage (bytes)=707788800

File Input Format Counters 

Bytes Read=0

File Output Format Counters 

Bytes Written=157788

16/08/10 09:22:56 INFO mapreduce.ImportJobBase: Transferred 154.0898 KB in 20.6491 seconds (7.4623 KB/sec)

16/08/10 09:22:56 INFO mapreduce.ImportJobBase: Retrieved 1000 records.

[hdfs@amb2 ~]$ 


With cat, the file declares "I am a sequence file!" right at the top and is followed by characters that look slightly garbled; a readable dump is sketched below.
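SequenceFiles are binary as well, so instead of cat, hdfs dfs -text can be used; it understands the format and prints the records as text (the part-file name below is an assumption):

# hdfs dfs -text decodes SequenceFiles and prints the records as text.
hdfs dfs -text /dev/test2_seq/part-m-00000 | head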



4. parquet files (however you pronounce it)


Command executed

* The sqoop version under test here is 1.4.6. Looking at the command, it is a little different from the ones above.

The commands above all pushed a query, but this one was switched to specifying a TABLE.

There is currently a bug, so I changed it as below; searching around at the time of testing, a patch file did exist.

But since I installed the whole stack through ambari, hunting down and patching files one by one is too much hassle, so I am just going to wait for version 1.4.7. Heh.

sqoop import --target-dir=/dev/test2_pq --table HDFS_4 --as-parquetfile -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ID


Below is the error that occurs when it is run with a query: a Java NullPointerException is thrown.

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/test2_pq --query='select *from HDFS_4 where $CONDITIONS' --as-parquetfile -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ID

16/08/10 09:29:29 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 09:29:29 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 09:29:29 INFO manager.SqlManager: Using default fetchSize of 1000

16/08/10 09:29:29 INFO tool.CodeGenTool: Beginning code generation

16/08/10 09:29:29 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException

java.lang.NullPointerException

at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:97)

at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)

at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)

at org.apache.sqoop.Sqoop.run(Sqoop.java:148)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)

at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184)

at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226)

at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235)

at org.apache.sqoop.Sqoop.main(Sqoop.java:244)

[hdfs@amb2 ~]$


Execution result

When it is run as a TABLE import, it fails partway through with the error below.

Damn. Skimming the message, it looks like a data-type problem, so I decided to create a view over the existing table that changes only that data type and import the view instead.

16/08/10 09:31:25 INFO mapreduce.Job: Task Id : attempt_1470728284233_0010_m_000002_1, Status : FAILED

Error: org.apache.avro.UnresolvedUnionException: Not in union ["null","long"]: 2016-08-29 21:09:28.0

at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:561)

at org.apache.avro.generic.GenericData.deepCopy(GenericData.java:941)

at org.apache.avro.generic.GenericData.deepCopy(GenericData.java:922)

at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat$DatasetRecordWriter.copy(DatasetKeyOutputFormat.java:327)

at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat$DatasetRecordWriter.write(DatasetKeyOutputFormat.java:321)

at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat$DatasetRecordWriter.write(DatasetKeyOutputFormat.java:300)

at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:658)

at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)

at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)

at org.apache.sqoop.mapreduce.ParquetImportMapper.map(ParquetImportMapper.java:70)

at org.apache.sqoop.mapreduce.ParquetImportMapper.map(ParquetImportMapper.java:39)

at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)

at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)

at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)

at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)

at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)

at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)


Container killed by the ApplicationMaster.

Container killed on request. Exit code is 143

Container exited with a non-zero exit code 143


After the error above: create a view and re-run. (An untested alternative is sketched right after the view.)

CREATE OR REPLACE VIEW VW_HDFS_4

AS

SELECT ID,NUM,TEST,To_Char(REG_DT,'YYYY-MM-DD HH24:MI:SS') REG_DT

  FROM HDFS_4;
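An alternative I did not try here would be to skip the view and instead tell sqoop to map the DATE column to a Java String, which sidesteps the union error; a hedged sketch (the target directory is a fresh one made up for illustration):

# Untested alternative: keep the base table and force REG_DT to a String with
# --map-column-java, so the generated Parquet/Avro schema does not try to
# store the date value as a long.
sqoop import --target-dir=/dev/test2_pq_alt --table HDFS_4 --as-parquetfile -direct \
  --map-column-java REG_DT=String \
  --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL \
  --username flashone --password 1234 --split-by ID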


Execution result - of the two runs below, the first is the one that errored and the second completed normally.

The error says the target directory and its meta file already exist in hadoop; they were created before this run.

[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/test2_pq --table VW_HDFS_4 --as-parquetfile -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ID

16/08/10 09:36:33 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 09:36:33 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 09:36:33 INFO manager.SqlManager: Using default fetchSize of 1000

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 09:36:34 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop will not process this sqoop connection, as "FLASHONE"."VW_HDFS_4" is not an Oracle table, it's a VIEW.

16/08/10 09:36:34 INFO tool.CodeGenTool: Beginning code generation

16/08/10 09:36:34 INFO tool.CodeGenTool: Will generate java class as codegen_VW_HDFS_4

16/08/10 09:36:35 INFO manager.OracleManager: Time zone has been set to GMT

16/08/10 09:36:35 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "VW_HDFS_4" t WHERE 1=0

16/08/10 09:36:35 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/95f85eece904f411d74c356ac450d5b5/codegen_VW_HDFS_4.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 09:36:36 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/95f85eece904f411d74c356ac450d5b5/codegen_VW_HDFS_4.jar

16/08/10 09:36:36 INFO mapreduce.ImportJobBase: Beginning import of VW_HDFS_4

16/08/10 09:36:37 INFO manager.OracleManager: Time zone has been set to GMT

16/08/10 09:36:38 INFO manager.OracleManager: Time zone has been set to GMT

16/08/10 09:36:38 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "VW_HDFS_4" t WHERE 1=0

16/08/10 09:36:38 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetExistsException: Descriptor directory already exists: hdfs://amb2.local:8020/dev/test2_pq/.metadata

org.kitesdk.data.DatasetExistsException: Descriptor directory already exists: hdfs://amb2.local:8020/dev/test2_pq/.metadata

at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.create(FileSystemMetadataProvider.java:219)

at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.create(FileSystemDatasetRepository.java:137)

at org.kitesdk.data.Datasets.create(Datasets.java:239)

at org.kitesdk.data.Datasets.create(Datasets.java:307)

at org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:107)

at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:89)

at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:108)

at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260)

at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)

at org.apache.sqoop.manager.OracleManager.importTable(OracleManager.java:445)

at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)

at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)

at org.apache.sqoop.Sqoop.run(Sqoop.java:148)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)

at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184)

at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226)

at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235)

at org.apache.sqoop.Sqoop.main(Sqoop.java:244)



[hdfs@amb2 ~]$ sqoop import --target-dir=/dev/test2_pq2 --table VW_HDFS_4 --as-parquetfile -direct --connect jdbc:oracle:thin:@192.168.0.117:1521:ORCL --username flashone --password 1234 --split-by ID

16/08/10 09:36:56 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258

16/08/10 09:36:56 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

16/08/10 09:36:56 INFO manager.SqlManager: Using default fetchSize of 1000

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

16/08/10 09:36:57 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop will not process this sqoop connection, as "FLASHONE"."VW_HDFS_4" is not an Oracle table, it's a VIEW.

16/08/10 09:36:57 INFO tool.CodeGenTool: Beginning code generation

16/08/10 09:36:57 INFO tool.CodeGenTool: Will generate java class as codegen_VW_HDFS_4

16/08/10 09:36:57 INFO manager.OracleManager: Time zone has been set to GMT

16/08/10 09:36:57 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "VW_HDFS_4" t WHERE 1=0

16/08/10 09:36:57 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce

Note: /tmp/sqoop-hdfs/compile/586ec1c10f04fc4c1fdd5986c9fae96b/codegen_VW_HDFS_4.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/08/10 09:36:59 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/586ec1c10f04fc4c1fdd5986c9fae96b/codegen_VW_HDFS_4.jar

16/08/10 09:36:59 INFO mapreduce.ImportJobBase: Beginning import of VW_HDFS_4

16/08/10 09:36:59 INFO manager.OracleManager: Time zone has been set to GMT

16/08/10 09:37:00 INFO manager.OracleManager: Time zone has been set to GMT

16/08/10 09:37:00 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "VW_HDFS_4" t WHERE 1=0

16/08/10 09:37:02 INFO impl.TimelineClientImpl: Timeline service address: http://amb3.local:8188/ws/v1/timeline/

16/08/10 09:37:02 INFO client.RMProxy: Connecting to ResourceManager at amb3.local/192.168.0.143:8050

16/08/10 09:37:04 INFO db.DBInputFormat: Using read commited transaction isolation

16/08/10 09:37:04 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN("ID"), MAX("ID") FROM "VW_HDFS_4"

16/08/10 09:37:04 WARN db.TextSplitter: Generating splits for a textual index column.

16/08/10 09:37:04 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.

16/08/10 09:37:04 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.

16/08/10 09:37:04 INFO mapreduce.JobSubmitter: number of splits:4

16/08/10 09:37:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470728284233_0011

16/08/10 09:37:04 INFO impl.YarnClientImpl: Submitted application application_1470728284233_0011

16/08/10 09:37:04 INFO mapreduce.Job: The url to track the job: http://amb3.local:8088/proxy/application_1470728284233_0011/

16/08/10 09:37:04 INFO mapreduce.Job: Running job: job_1470728284233_0011

16/08/10 09:37:11 INFO mapreduce.Job: Job job_1470728284233_0011 running in uber mode : false

16/08/10 09:37:11 INFO mapreduce.Job:  map 0% reduce 0%

16/08/10 09:37:20 INFO mapreduce.Job:  map 25% reduce 0%

16/08/10 09:37:21 INFO mapreduce.Job:  map 100% reduce 0%

16/08/10 09:37:22 INFO mapreduce.Job: Job job_1470728284233_0011 completed successfully

16/08/10 09:37:22 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=610560

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=28037

HDFS: Number of bytes written=137608

HDFS: Number of read operations=200

HDFS: Number of large read operations=0

HDFS: Number of write operations=36

Job Counters 

Launched map tasks=4

Other local map tasks=4

Total time spent by all maps in occupied slots (ms)=60614

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=30307

Total vcore-seconds taken by all map tasks=30307

Total megabyte-seconds taken by all map tasks=46551552

Map-Reduce Framework

Map input records=1000

Map output records=1000

Input split bytes=505

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=1345

CPU time spent (ms)=41390

Physical memory (bytes) snapshot=1502236672

Virtual memory (bytes) snapshot=13242277888

Total committed heap usage (bytes)=828375040

File Input Format Counters 

Bytes Read=0

File Output Format Counters 

Bytes Written=0

16/08/10 09:37:22 INFO mapreduce.ImportJobBase: Transferred 134.3828 KB in 21.319 seconds (6.3034 KB/sec)

16/08/10 09:37:22 INFO mapreduce.ImportJobBase: Retrieved 1000 records.

[hdfs@amb2 ~]$


Viewing the file with cat, the first screenshot is the beginning of the file and the second is the end.

The column types are described there as well, and a little further on the text ~avro.schema also appears.

As the search results suggested, the schema seems to be stored in Avro form.
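
Rather than cat, something like parquet-tools can print the embedded schema directly. This is a sketch assuming parquet-tools is available on the cluster; the version and part file name are placeholders:

# <part-file>.parquet stands for one of the files Sqoop wrote under the target dir
hadoop jar parquet-tools-<version>.jar schema hdfs://amb2.local:8020/dev/test2_pq2/<part-file>.parquet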



For this project I plan to use text (CSV) and Parquet.

Now I need to check whether other tools can actually read the data back once it is stored as Parquet (a quick spark-shell check is sketched below). ㅠ_ㅜ;
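
As a first check, reading the imported directory back from spark-shell should be enough to confirm that the Parquet files and their schema are readable. A minimal sketch, assuming a Spark 2.x spark-shell where the spark session is predefined:

// read the directory Sqoop wrote to and inspect the schema plus a few rows
val df = spark.read.parquet("hdfs://amb2.local:8020/dev/test2_pq2")
df.printSchema()
df.show(10)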
