Monday, October 19, 2015

UberOS Cloud Data Fighter - Hadoop Electronic Warfare System (HDFS, YARN)

In the UberOS Cloud Data Fighter's Hadoop electronic warfare system, the three container hosts nna, dna1, and dna2 are assigned by default to the HDFS distributed file system, the three container hosts rma, nma1, and nma2 handle the YARN distributed computing system, and the Hadoop Client role is played by the cla01 container host.

Starting and Configuring the HDFS Distributed File System

1. Start all HDFS container hosts
$ dkstart  a.hdfs
nna starting
   java version "1.7.0_79"
   Scala compiler version 2.11.5 -- Copyright 2002-2013, LAMP/EPFL

dna1 starting
   java version "1.7.0_79"
   Scala compiler version 2.11.5 -- Copyright 2002-2013, LAMP/EPFL

dna2 starting
   java version "1.7.0_79"
   Scala compiler version 2.11.5 -- Copyright 2002-2013, LAMP/EPFL

2. Format and start the HDFS distributed file system
$ formathdfs  a  myring (run this only once)
format (yes/no) yes
nna format ok
nna clean sn
dna1 clean dn
dna2 clean dn

$ starthdfs  a
starting namenode, logging to /tmp/hadoop-bigred-namenode-nna.out
starting secondarynamenode, logging to /tmp/hadoop-bigred-secondarynamenode-nna.out
starting datanode, logging to /tmp/hadoop-bigred-datanode-dna1.out
starting datanode, logging to /tmp/hadoop-bigred-datanode-dna2.out

3. Check HDFS status information
$ hdfs dfsadmin -report
Configured Capacity: 39348232192 (36.65 GB)
Present Capacity: 12454195200 (11.60 GB)
DFS Remaining: 7035166720 (6.55 GB)
DFS Used: 5419028480 (5.05 GB)
DFS Used%: 43.51%
Under replicated blocks: 4
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (2):

Name: 172.17.10.21:50010 (dna2)
Hostname: dna2
Decommission Status : Normal
Configured Capacity: 19674116096 (18.32 GB)
DFS Used: 2709520384 (2.52 GB)
Non DFS Used: 13448355840 (12.52 GB)
DFS Remaining: 3516239872 (3.27 GB)
DFS Used%: 13.77%
DFS Remaining%: 17.87%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Nov 01 00:14:23 CST 2015

Name: 172.17.10.20:50010 (dna1)
Hostname: dna1
Decommission Status : Normal
Configured Capacity: 19674116096 (18.32 GB)
DFS Used: 2709508096 (2.52 GB)
Non DFS Used: 13445681152 (12.52 GB)
DFS Remaining: 3518926848 (3.28 GB)
DFS Used%: 13.77%
DFS Remaining%: 17.89%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Nov 01 00:14:24 CST 2015
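The numbers in this report are internally consistent and easy to sanity-check: Present Capacity equals DFS Used plus DFS Remaining, and DFS Used% is DFS Used divided by Present Capacity. A quick check with awk, using the byte values from the report above:

```shell
# Byte values copied from the `hdfs dfsadmin -report` output above
awk 'BEGIN {
  used = 5419028480; remaining = 7035166720; present = 12454195200
  # Present Capacity = DFS Used + DFS Remaining
  if (used + remaining == present) print "present ok"
  # DFS Used% = DFS Used / Present Capacity
  printf "DFS Used%%: %.2f%%\n", used / present * 100
}'
```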

4. Plan the directory permissions of the HDFS distributed file system (run this only once)
$ ssh nna
Welcome to Ubuntu 14.04.3 LTS (GNU/Linux 3.16.0-45-generic x86_64)

* Documentation: https://help.ubuntu.com/
Last login: Thu Aug 20 11:37:01 2015 from 172.17.42.1

Create the ds01 and ds02 accounts
bigred@nna:~$ sudo useradd -m -s /bin/bash  ds01
bigred@nna:~$ sudo useradd -m -s /bin/bash  ds02

Create the biguser group
bigred@nna:~$ sudo groupadd biguser

Add ds01 and ds02 to the biguser group
bigred@nna:~$ sudo usermod -aG  biguser  ds01
bigred@nna:~$ sudo usermod -aG  biguser  ds02

Create the ds01 and ds02 data home directories
bigred@nna:~$ hdfs dfs -mkdir  /tmp
bigred@nna:~$ hdfs dfs -mkdir -p  /user/ds01
bigred@nna:~$ hdfs dfs -mkdir  /user/ds02

Make the /tmp directory accessible to everyone
bigred@nna:~$ hdfs dfs -chmod 777  /tmp

Set the permissions of the ds01 and ds02 data home directories
bigred@nna:~$ hdfs dfs -chown ds01:biguser  /user/ds01
bigred@nna:~$ hdfs dfs -chown ds02:biguser  /user/ds02

bigred@nna:~$ exit
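HDFS permissions follow the familiar POSIX user/group/other model, so the plan above can be illustrated with ordinary local directories. The sketch below is a local analogy only (plain mkdir/chmod in a temp directory, not hdfs dfs commands):

```shell
# Local illustration of the HDFS namespace layout and its permission plan
root=$(mktemp -d)
mkdir -p "$root/tmp" "$root/user/ds01" "$root/user/ds02"
chmod 777 "$root/tmp"        # everyone may read/write, like hdfs dfs -chmod 777 /tmp
stat -c '%a' "$root/tmp"     # prints 777
```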

The Data Scientist Goes to Work

1. Start the Hadoop Client container host
$ dkstart a.client
cla01 starting
   java version "1.7.0_79"
   Scala compiler version 2.11.5 -- Copyright 2002-2013, LAMP/EPFL
   dk:2211->172.17.10.100:22

2. Log in to the Hadoop Client container host
$ ssh ds01@cla01
ds01@cla01's password: ds01
Welcome to Ubuntu 14.04.3 LTS (GNU/Linux 3.16.0-50-generic x86_64)

* Documentation: https://help.ubuntu.com/
Last login: Sun Oct 18 00:11:33 2015 from 172.17.42.1

$ ping nna
PING nna (172.17.10.10) 56(84) bytes of data.
64 bytes from nna (172.17.10.10): icmp_seq=1 ttl=64 time=0.236 ms
64 bytes from nna (172.17.10.10): icmp_seq=2 ttl=64 time=0.298 ms
64 bytes from nna (172.17.10.10): icmp_seq=3 ttl=64 time=0.297 ms
64 bytes from nna (172.17.10.10): icmp_seq=4 ttl=64 time=0.123 ms

rtt min/avg/max/mdev = 0.123/0.238/0.298/0.072 ms

3. Data management - create a directory and upload a file
$ hdfs dfs -mkdir mytest
$ hdfs dfs -ls
Found 1 items
drwxr-xr-x - bigred biguser 0 2015-10-19 19:29 mytest

$ hdfs dfs -put /etc/passwd mytest/
$ hdfs dfs -ls mytest
Found 1 items
-rw-r--r-- 2 bigred biguser 1235 2015-10-19 19:39 mytest/passwd

$ hdfs dfs -cat mytest/passwd
                     ::
daemon,,,:/var/lib/colord:/bin/false

$ hdfs dfs -rm -r mytest
15/10/19 19:43:06 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted mytest

4. Data management - download the academic year 103 directory of colleges and universities
$ wget --no-check-certificate https://stats.moe.gov.tw/files/school/103/u1_new.txt

5. Convert the file encoding
$ iconv -f UCS-2 -t utf8 u1_new.txt -o school.txt
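The file published by stats.moe.gov.tw is UCS-2 (16-bit) encoded, so it must be converted before text tools can process it line by line. The following self-contained round trip demonstrates the same conversion on a local sample (the file names here are just for illustration):

```shell
# Convert a UTF-8 sample to UCS-2 and back, then verify nothing was lost
printf '0001 國立政治大學\n' > sample-utf8.txt
iconv -f utf8 -t UCS-2 sample-utf8.txt -o sample-ucs2.txt
iconv -f UCS-2 -t utf8 sample-ucs2.txt -o roundtrip.txt
cmp -s sample-utf8.txt roundtrip.txt && echo "round-trip ok"
```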

6. Display the file contents
$ head -n 5 school.txt 
103學年度大專校院名錄

代碼 學校名稱 縣市名稱 地址 電話 網址 體系別
0001 國立政治大學 [38]臺北市 [116]臺北市文山區指南路二段64號 (02)29393091 http://www.nccu.edu.tw [1]一般
0002 國立清華大學 [18]新竹市 [300]新竹市東區光復路二段101號 (03)5715131 http://www.nthu.edu.tw [1]一般

7. Data management - upload the academic year 103 college directory file
$ hdfs dfs -put school.txt school.txt
$ hdfs dfs -ls
Found 1 items
-rw-r--r-- 2 ds01 biguser 20807 2015-08-20 13:11 school.txt

8. Log out of the Hadoop Client container host
$ exit
logout
Connection to cla01 closed.

Starting the YARN Distributed Computing System

1. Start all YARN container hosts
$ dkstart a.yarn
rma starting
   java version "1.7.0_79"
   Scala compiler version 2.11.5 -- Copyright 2002-2013, LAMP/EPFL

nma1 starting
   java version "1.7.0_79"
   Scala compiler version 2.11.5 -- Copyright 2002-2013, LAMP/EPFL

nma2 starting
   java version "1.7.0_79"
   Scala compiler version 2.11.5 -- Copyright 2002-2013, LAMP/EPFL

2. Start the YARN distributed computing system
$ startyarn a
starting resourcemanager, logging to /tmp/yarn-bigred-resourcemanager-rma.out
starting historyserver, logging to /home/bigred/jhs/mapred-bigred-historyserver-rma.out
starting nodemanager, logging to /tmp/yarn-bigred-nodemanager-nma1.out
starting nodemanager, logging to /tmp/yarn-bigred-nodemanager-nma2.out

3. Check YARN status information
$ yarn node -list -all
15/11/01 00:18:32 INFO client.RMProxy: Connecting to ResourceManager at rma/172.17.10.30:8032
15/11/01 00:18:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Total Nodes:2
         Node-Id             Node-State Node-Http-Address       Number-of-Running-Containers
      nma1:52197                RUNNING         nma1:8042                                  0
      nma2:41974                RUNNING         nma2:8042                                  0

Reconfiguring YARN Computing Resources

1. Check the current resource allocation
$ curl http://rma:8088/ws/v1/cluster/metrics
{"clusterMetrics":{"appsSubmitted":0,"appsCompleted":0,"appsPending":0,"appsRunning":0,"appsFailed":0,"appsKilled":0,"reservedMB":0,"availableMB":2048,"allocatedMB":0,"reservedVirtualCores":0,"availableVirtualCores":2,"allocatedVirtualCores":0,"containersAllocated":0,"containersReserved":0,"containersPending":0,"totalMB":2048,"totalVirtualCores":2,"totalNodes":2,"lostNodes":0,"unhealthyNodes":0,"decommissionedNodes":0,"rebootedNodes":0,"activeNodes":2}}
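The metrics endpoint returns a single JSON object. The fields that matter for capacity planning here are totalMB, totalVirtualCores, and activeNodes. A sketch of extracting them with grep from a saved response (the sample below is an abbreviated copy of the output above; with jq installed, `jq .clusterMetrics.totalMB metrics.json` would be cleaner):

```shell
# Abbreviated sample of the /ws/v1/cluster/metrics response shown above
cat > metrics.json <<'EOF'
{"clusterMetrics":{"availableMB":2048,"availableVirtualCores":2,"totalMB":2048,"totalVirtualCores":2,"totalNodes":2,"activeNodes":2}}
EOF
for key in totalMB totalVirtualCores activeNodes; do
  printf '%s=%s\n' "$key" "$(grep -o "\"$key\":[0-9]*" metrics.json | cut -d: -f2)"
done
```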

[Key point] By default, YARN assumes every Node Manager has 8 GB of memory and 8 CPU cores

2. Configure YARN computing resources
$ sudo nano /opt/conf/A/yarn-site.xml
                           ::
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>1536</value>
  </property>

  <property>
     <name>yarn.nodemanager.resource.cpu-vcores</name>
     <value>2</value>
  </property>

3. Configure MapReduce job resources
$ sudo nano /opt/conf/A/mapred-site.xml
                               ::
  <property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>512</value>
  </property>
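With these values, each Node Manager advertises 1536 MB and 2 vcores, so after the restart in the next step the cluster metrics should report 2 × 1536 = 3072 total MB and 4 total vcores, and each 512 MB MapReduce ApplicationMaster leaves 1024 MB on its node for task containers. The arithmetic as a quick shell sketch:

```shell
# Expected cluster totals after applying the yarn-site.xml / mapred-site.xml above
nm_mem_mb=1536; nm_vcores=2; nodes=2; am_mb=512
echo "totalMB=$((nm_mem_mb * nodes))"             # totalMB=3072
echo "totalVirtualCores=$((nm_vcores * nodes))"   # totalVirtualCores=4
echo "mb_left_on_am_node=$((nm_mem_mb - am_mb))"  # mb_left_on_am_node=1024
```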


4. Restart YARN
$ stopyarn a
stopping resourcemanager 
stopping historyserver 
stopping nodemanager 
stopping nodemanager 

$ startyarn a
starting resourcemanager, logging to /tmp/yarn-bigred-resourcemanager-rma.out 
starting historyserver, logging to /home/bigred/jhs/mapred-bigred-historyserver-rma.out 
starting nodemanager, logging to /tmp/yarn-bigred-nodemanager-nma1.out 
starting nodemanager, logging to /tmp/yarn-bigred-nodemanager-nma2.out 

The Data Scientist Goes Back to Work

1. Log in to the Hadoop Client container host again
$ ssh ds01@cla01
ds01@cla01's password:
Welcome to Ubuntu 14.04.3 LTS (GNU/Linux 3.16.0-50-generic x86_64)

* Documentation: https://help.ubuntu.com/
Last login: Sun Oct 18 00:11:33 2015 from 172.17.42.1

2. Start the Pig analysis tool
$ pig  -version
Apache Pig version 0.15.0 (r1682971)
compiled Jun 01 2015, 11:44:35

$ pig
grunt> pwd
hdfs://nna:8020/user/ds01

3. Load and display the data
grunt> school = LOAD 'school.txt' AS (sno:int,name:chararray);
grunt> dump school;
              ::
(1283,仁德醫護管理專科學校)
(1284,樹人醫護管理專科學校)
(1285,慈惠醫護管理專科學校)
(1286,耕莘健康管理專科學校)
(1287,敏惠醫護管理專科學校)
(1288,高美醫護管理專科學校)
(1289,育英醫護管理專科學校)
(1290,崇仁醫護管理專科學校)
(1291,聖母醫護管理專科學校)
(1292,新生醫護管理專科學校)
(,)
(,)

4. Filter the data (drop the empty (,) records produced by blank lines in the source file)
grunt> frec = FILTER school by sno is not null;
grunt> dump frec;
                   ::
(1285,慈惠醫護管理專科學校)
(1286,耕莘健康管理專科學校)
(1287,敏惠醫護管理專科學校)
(1288,高美醫護管理專科學校)
(1289,育英醫護管理專科學校)
(1290,崇仁醫護管理專科學校)
(1291,聖母醫護管理專科學校)
(1292,新生醫護管理專科學校)
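Pig's LOAD splits each tab-separated line into (sno, name), and the FILTER drops the (,) records that blank lines in school.txt produced. The same filtering idea, sketched in plain shell with awk on a tiny local sample instead of HDFS data:

```shell
# Tab-separated sample mimicking school.txt, including one blank line
printf '1291\t聖母醫護管理專科學校\n1292\t新生醫護管理專科學校\n\n' > sample.txt
# Keep only records whose first field is present, like FILTER ... BY sno is not null
awk -F'\t' '$1 != "" { print "(" $1 "," $2 ")" }' sample.txt
```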

5. Exit Pig
grunt> quit;
2015-08-27 14:53:47,782 [main] INFO org.apache.pig.Main - Pig script completed in 21 seconds and 322 milliseconds (21322 ms)

6. Log out of the Hadoop Client container host
$ exit
logout
Connection to cla01 closed.

The Data Scientist Clocks Out

1. Shut down the HDFS and YARN distributed systems
$ stopyarn a
stopping resourcemanager
stopping historyserver
stopping nodemanager
stopping nodemanager

$ stophdfs a
stopping namenode
stopping secondarynamenode
stopping datanode
stopping datanode

2. Shut down all Hadoop container hosts
$ dkstop a
cla01 Exiting
dk:2211->172.17.10.100:22 deleted

nna Exiting
dna1 Exiting
dna2 Exiting
rma Exiting
nma1 Exiting
nma2 Exiting

3. Shut down the UberOS Cloud Data Fighter
$ bye

Monday, October 5, 2015

UberOS Cloud Data Fighter - Maiden Voyage (Hadoop 2.x)

Quick Operation Guide

The login account and password for the UberOS271 system are both bigred. After logging in, run the dkls command; the result looks like this:

bigred@dk:~$ dkls
Docker Utility 0.2 (2015/07/21)

[Container]
--------------------------------------------------------------------------------------------
myweb(816ddf9479ac)  Exited
nma2(c639239d3bf8) 172.17.10.51 Exited ()
nma1(f907aa2662c3) 172.17.10.50 Exited ()
rma(68bc2e91dbf2) 172.17.10.30 Exited ()
dna2(febce027f05e) 172.17.10.21 Exited ()
dna1(016988f1e165) 172.17.10.20 Exited ()
nna(d359071723e9) 172.17.10.10 Exited (user:ds01 ds02)
cla01(ad24d7c10d48) 172.17.10.100 Exited (dk:2211->cla01:22, user:ds01 ds02)

[Images]
--------------------------------------------------------------------------------------------
dafu/bigdata        0.2                 3cf2e2291746        9 weeks ago         1.491 GB
dafu/myweb          latest              7768c27140cc        4 months ago        367.3 MB

The output above shows that UberOS271 ships with 8 built-in container hosts. Start all of them except myweb with the following command:

bigred@dk:~$ dkstart a
cla01 starting
  java version "1.7.0_79"
  Scala compiler version 2.11.5 -- Copyright 2002-2013, LAMP/EPFL
  dk:2211->172.17.10.100:22

nna starting
  java version "1.7.0_79"
  Scala compiler version 2.11.5 -- Copyright 2002-2013, LAMP/EPFL

dna1 starting
  java version "1.7.0_79"
  Scala compiler version 2.11.5 -- Copyright 2002-2013, LAMP/EPFL

dna2 starting
  java version "1.7.0_79"
  Scala compiler version 2.11.5 -- Copyright 2002-2013, LAMP/EPFL

rma starting
  java version "1.7.0_79"
  Scala compiler version 2.11.5 -- Copyright 2002-2013, LAMP/EPFL

nma1 starting
  java version "1.7.0_79"
  Scala compiler version 2.11.5 -- Copyright 2002-2013, LAMP/EPFL

nma2 starting
  java version "1.7.0_79"
  Scala compiler version 2.11.5 -- Copyright 2002-2013, LAMP/EPFL

Once the container hosts are up, run the "starthdfs a" command to start the HDFS system

bigred@dk:~$ starthdfs a
starting namenode, logging to /tmp/hadoop-bigred-namenode-nna.out
starting secondarynamenode, logging to /tmp/hadoop-bigred-secondarynamenode-nna.out
starting datanode, logging to /tmp/hadoop-bigred-datanode-dna1.out
starting datanode, logging to /tmp/hadoop-bigred-datanode-dna2.out

Run the "hdfs dfs -ls /" command to inspect the root directory of the HDFS file system

bigred@dk:~$ hdfs dfs -ls /
Found 2 items
drwxrwxrwx   - bigred biguser          0 2015-09-28 15:28 /tmp
drwxr-xr-x   - bigred biguser          0 2015-09-01 19:18 /user

After confirming HDFS is running normally, run the "startyarn a" command to start the YARN system

bigred@dk:~$ startyarn a
starting resourcemanager, logging to /tmp/yarn-bigred-resourcemanager-rma.out
starting historyserver, logging to /home/bigred/jhs/mapred-bigred-historyserver-rma.out
starting nodemanager, logging to /tmp/yarn-bigred-nodemanager-nma1.out
starting nodemanager, logging to /tmp/yarn-bigred-nodemanager-nma2.out

Run the "yarn node -list -all" command to confirm that the two Node Manager hosts, nma1 and nma2, are running normally

bigred@dk:~$ yarn node -list -all
15/10/08 20:04:34 INFO client.RMProxy: Connecting to ResourceManager at rma/172.17.10.30:8032
15/10/08 20:04:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Total Nodes:2
         Node-Id             Node-State Node-Http-Address       Number-of-Running-Containers
      nma2:44635                RUNNING         nma2:8042                                  0
      nma1:46128                RUNNING         nma1:8042                                  0

Log in to the data scientist host (cla01); the account and password are both ds01:
bigred@dk:~$ ssh ds01@cla01
ds01@cla01's password:
Welcome to Ubuntu 14.04.3 LTS (GNU/Linux 3.16.0-50-generic x86_64)

 * Documentation:  https://help.ubuntu.com/
Last login: Tue Oct  6 11:17:29 2015 from 172.17.42.1
ds01@cla01:~$

Run the following MapReduce program to confirm that the Hadoop system is working properly.

ds01@cla01:~$ hadoop jar /opt/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 1 1
                                                   ::
        File Input Format Counters
                Bytes Read=118
        File Output Format Counters
                Bytes Written=97
Job Finished in 56.222 seconds
Estimated value of Pi is 4.00000000000000000000
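An estimate of 4.0 is expected, not a bug: the example computes Pi as 4 × (samples inside the quarter circle) / (total samples), and `pi 1 1` draws a single sample, so the answer can only be 0 or 4. Rerunning with more samples (e.g. `pi 10 1000`) converges toward 3.14. The same Monte Carlo idea in a few lines of awk:

```shell
# Monte Carlo Pi: count random points in the unit square that fall
# inside the quarter circle x^2 + y^2 <= 1
awk 'BEGIN {
  srand(1); n = 100000; inside = 0
  for (i = 0; i < n; i++) {
    x = rand(); y = rand()
    if (x * x + y * y <= 1) inside++
  }
  printf "%.2f\n", 4 * inside / n   # close to 3.14
}'
```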


Log out of the data scientist host (cla01):
ds01@cla01:~$ exit
logout
Connection to cla01 closed.

Shut down the YARN distributed computing system
bigred@dk:~$ stopyarn a
stopping resourcemanager
stopping historyserver
stopping nodemanager
nodemanager did not stop gracefully after 5 seconds: killing with kill -9
stopping nodemanager
nodemanager did not stop gracefully after 5 seconds: killing with kill -9

Shut down the HDFS distributed file system
bigred@dk:~$ stophdfs a
stopping namenode
stopping secondarynamenode
stopping datanode
stopping datanode

Shut down all container hosts
bigred@dk:~$ dkstop a
cla01 Exiting
dk:2211->172.17.10.100:22 deleted

nna Exiting
dna1 Exiting
dna2 Exiting
rma Exiting
nma1 Exiting
nma2 Exiting

Shut down the UberOS271 host
bigred@dk:~$ bye

About the UberOS Cloud Data Fighter

The UberOS Cloud Data Fighter is built for Big Data management, analysis, and training. It combines Ubuntu 14.04 + Docker 1.9.0 + Hadoop 2.7.1, and ships with a complete Hadoop 2.7.1 ecosystem. The system architecture diagram is shown below:

(architecture diagram: 8 container hosts)

In the diagram there are 8 container hosts (containers) in total: nna, dna1, and dna2 handle the HDFS distributed file system; rma, nma1, and nma2 handle the YARN distributed computing system; and cla01 is provided for data scientists' analysis work, with Pig and Hive pre-installed.

Currently the UberOS Cloud Data Fighter is available only as a VMware image. The minimum system requirements to run it are:
1. Dual-core CPU
2. 4 GB of RAM
3. Microsoft 64-bit operating system (Windows 7/8/10)
4. VMware Player virtualization software (free of charge); the download URL is as follows:

The download URL for the UberOS Cloud Data Fighter is as follows:

Updated 2015/11/06
1. Docker updated to version 1.9.0
2. The gateway in the /opt/hosts-0.2 configuration file updated to 172.17.0.1