In the Hadoop environment of the cloud-based UberOS 資戰機, the default layout is as follows: the three container hosts nna, dna1, and dna2 run the HDFS distributed file system; the three container hosts rma, nma1, and nma2 run the YARN distributed computing system; and the Hadoop Client runs on the container host cla01.
Starting and Configuring the HDFS Distributed File System
1. Start all HDFS container hosts
$ dkstart a.hdfs
nna starting
java version "1.7.0_79"
Scala compiler version 2.11.5 -- Copyright 2002-2013, LAMP/EPFL
dna1 starting
java version "1.7.0_79"
Scala compiler version 2.11.5 -- Copyright 2002-2013, LAMP/EPFL
dna2 starting
java version "1.7.0_79"
Scala compiler version 2.11.5 -- Copyright 2002-2013, LAMP/EPFL
2. Format and start the HDFS distributed file system
$ formathdfs a myring (only needs to be done once)
format (yes/no) yes
nna format ok
nna clean sn
dna1 clean dn
dna2 clean dn
$ starthdfs a
starting namenode, logging to /tmp/hadoop-bigred-namenode-nna.out
starting secondarynamenode, logging to /tmp/hadoop-bigred-secondarynamenode-nna.out
starting datanode, logging to /tmp/hadoop-bigred-datanode-dna1.out
starting datanode, logging to /tmp/hadoop-bigred-datanode-dna2.out
3. View HDFS status information
$ hdfs dfsadmin -report
Configured Capacity: 39348232192 (36.65 GB)
Present Capacity: 12454195200 (11.60 GB)
DFS Remaining: 7035166720 (6.55 GB)
DFS Used: 5419028480 (5.05 GB)
DFS Used%: 43.51%
Under replicated blocks: 4
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
Live datanodes (2):
Name: 172.17.10.21:50010 (dna2)
Hostname: dna2
Decommission Status : Normal
Configured Capacity: 19674116096 (18.32 GB)
DFS Used: 2709520384 (2.52 GB)
Non DFS Used: 13448355840 (12.52 GB)
DFS Remaining: 3516239872 (3.27 GB)
DFS Used%: 13.77%
DFS Remaining%: 17.87%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Nov 01 00:14:23 CST 2015
Name: 172.17.10.20:50010 (dna1)
Hostname: dna1
Decommission Status : Normal
Configured Capacity: 19674116096 (18.32 GB)
DFS Used: 2709508096 (2.52 GB)
Non DFS Used: 13445681152 (12.52 GB)
DFS Remaining: 3518926848 (3.28 GB)
DFS Used%: 13.77%
DFS Remaining%: 17.89%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Nov 01 00:14:24 CST 2015
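As a quick sanity check on the report above, the byte counts and the GB figures agree once you note that dfsadmin uses 1024^3 bytes per GB. A one-liner sketch (not part of the cluster tooling):

```shell
# Verify the unit conversion from the report: 39348232192 bytes
# (Configured Capacity) divided by 1024^3 should give the 36.65 GB
# shown on the summary line.
python3 -c 'print(round(39348232192 / 1024**3, 2))'   # 36.65
```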
4. Plan the HDFS directory permissions (only needs to be done once)
$ ssh nna
Welcome to Ubuntu 14.04.3 LTS (GNU/Linux 3.16.0-45-generic x86_64)
* Documentation: https://help.ubuntu.com/
Last login: Thu Aug 20 11:37:01 2015 from 172.17.42.1
Create the ds01 and ds02 accounts
bigred@nna:~$ sudo useradd -m -s /bin/bash ds01
bigred@nna:~$ sudo useradd -m -s /bin/bash ds02
Create the biguser group
bigred@nna:~$ sudo groupadd biguser
Add ds01 and ds02 to the biguser group
bigred@nna:~$ sudo usermod -aG biguser ds01
bigred@nna:~$ sudo usermod -aG biguser ds02
Create the HDFS home directories for ds01 and ds02
bigred@nna:~$ hdfs dfs -mkdir /tmp
bigred@nna:~$ hdfs dfs -mkdir -p /user/ds01
bigred@nna:~$ hdfs dfs -mkdir /user/ds02
Make the /tmp directory accessible to everyone
bigred@nna:~$ hdfs dfs -chmod 777 /tmp
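For readers less used to octal modes: 777 grants read, write, and execute to the owner, the group, and everyone else. HDFS follows the same convention as POSIX here; a throwaway local-filesystem illustration (the /tmp/demo_777 path is hypothetical, not part of the cluster setup):

```shell
# Illustration only, on the local filesystem rather than HDFS:
# mode 777 shows up as rwx for all three permission classes.
mkdir -p /tmp/demo_777
chmod 777 /tmp/demo_777
ls -ld /tmp/demo_777   # first column reads drwxrwxrwx
rm -rf /tmp/demo_777
```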
Set the ownership of the ds01 and ds02 home directories
bigred@nna:~$ hdfs dfs -chown ds01:biguser /user/ds01
bigred@nna:~$ hdfs dfs -chown ds02:biguser /user/ds02
bigred@nna:~$ exit
The Data Scientist Gets to Work
1. Start the Hadoop Client container host
$ dkstart a.client
cla01 starting
java version "1.7.0_79"
Scala compiler version 2.11.5 -- Copyright 2002-2013, LAMP/EPFL
dk:2211->172.17.10.100:22
2. Log in to the Hadoop Client container host
$ ssh ds01@cla01
ds01@cla01's password: ds01
Welcome to Ubuntu 14.04.3 LTS (GNU/Linux 3.16.0-50-generic x86_64)
* Documentation: https://help.ubuntu.com/
Last login: Sun Oct 18 00:11:33 2015 from 172.17.42.1
$ ping nna
PING nna (172.17.10.10) 56(84) bytes of data.
64 bytes from nna (172.17.10.10): icmp_seq=1 ttl=64 time=0.236 ms
64 bytes from nna (172.17.10.10): icmp_seq=2 ttl=64 time=0.298 ms
64 bytes from nna (172.17.10.10): icmp_seq=3 ttl=64 time=0.297 ms
64 bytes from nna (172.17.10.10): icmp_seq=4 ttl=64 time=0.123 ms
rtt min/avg/max/mdev = 0.123/0.238/0.298/0.072 ms
3. Data management - create a directory and upload a file
$ hdfs dfs -mkdir mytest
$ hdfs dfs -ls
Found 1 items
drwxr-xr-x - bigred biguser 0 2015-10-19 19:29 mytest
$ hdfs dfs -put /etc/passwd mytest/
$ hdfs dfs -ls mytest
Found 1 items
-rw-r--r-- 2 bigred biguser 1235 2015-10-19 19:39 mytest/passwd
$ hdfs dfs -cat mytest/passwd
::
daemon,,,:/var/lib/colord:/bin/false
$ hdfs dfs -rm -r mytest
15/10/19 19:43:06 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted mytest
4. Data management - download the academic year 103 (2014-15) directory of colleges and universities
$ wget --no-check-certificate https://stats.moe.gov.tw/files/school/103/u1_new.txt
5. Convert the file encoding
$ iconv -f UCS-2 -t utf8 u1_new.txt -o school.txt
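The MOE file is stored as UCS-2 (16-bit) text, which is why it needs converting before the usual line-oriented tools can read it. A self-contained round-trip sketch of the same conversion, using a generated sample file in place of u1_new.txt (the sample_* filenames are illustrative):

```shell
# Round-trip demo of the conversion above on a throwaway sample:
# encode a UTF-8 line as UCS-2, then convert it back the same way
# the tutorial converts u1_new.txt.
printf 'hello school\n' | iconv -f UTF-8 -t UCS-2 > sample_ucs2.txt
iconv -f UCS-2 -t UTF-8 sample_ucs2.txt -o sample_utf8.txt
cat sample_utf8.txt   # hello school
rm -f sample_ucs2.txt sample_utf8.txt
```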
6. Display the file contents
$ head -n 5 school.txt
103學年度大專校院名錄
代碼 學校名稱 縣市名稱 地址 電話 網址 體系別
0001 國立政治大學 [38]臺北市 [116]臺北市文山區指南路二段64號 (02)29393091 http://www.nccu.edu.tw [1]一般
0002 國立清華大學 [18]新竹市 [300]新竹市東區光復路二段101號 (03)5715131 http://www.nthu.edu.tw [1]一般
7. Data management - upload the college and university directory file
$ hdfs dfs -put school.txt school.txt
$ hdfs dfs -ls
Found 1 items
-rw-r--r-- 2 ds01 biguser 20807 2015-08-20 13:11 school.txt
8. Log out of the Hadoop Client container host
$ exit
logout
Connection to cla01 closed.
Starting the YARN Distributed Computing System
1. Start all YARN container hosts
$ dkstart a.yarn
rma starting
java version "1.7.0_79"
Scala compiler version 2.11.5 -- Copyright 2002-2013, LAMP/EPFL
nma1 starting
java version "1.7.0_79"
Scala compiler version 2.11.5 -- Copyright 2002-2013, LAMP/EPFL
nma2 starting
java version "1.7.0_79"
Scala compiler version 2.11.5 -- Copyright 2002-2013, LAMP/EPFL
2. Start the YARN distributed computing system
$ startyarn a
starting resourcemanager, logging to /tmp/yarn-bigred-resourcemanager-rma.out
starting historyserver, logging to /home/bigred/jhs/mapred-bigred-historyserver-rma.out
starting nodemanager, logging to /tmp/yarn-bigred-nodemanager-nma1.out
starting nodemanager, logging to /tmp/yarn-bigred-nodemanager-nma2.out
3. View YARN status information
$ yarn node -list -all
15/11/01 00:18:32 INFO client.RMProxy: Connecting to ResourceManager at rma/172.17.10.30:8032
15/11/01 00:18:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Total Nodes:2
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
nma1:52197 RUNNING nma1:8042 0
nma2:41974 RUNNING nma2:8042 0
Reconfiguring YARN Computing Resources
1. View the current resource allocation
$ curl http://rma:8088/ws/v1/cluster/metrics
{"clusterMetrics":{"appsSubmitted":0,"appsCompleted":0,"appsPending":0,"appsRunning":0,"appsFailed":0,"appsKilled":0,"reservedMB":0,"availableMB":2048,"allocatedMB":0,"reservedVirtualCores":0,"availableVirtualCores":2,"allocatedVirtualCores":0,"containersAllocated":0,"containersReserved":0,"containersPending":0,"totalMB":2048,"totalVirtualCores":2,"totalNodes":2,"lostNodes":0,"unhealthyNodes":0,"decommissionedNodes":0,"rebootedNodes":0,"activeNodes":2}}
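The raw JSON is hard to read at a glance; the headline numbers can be pulled out with a few lines of python3 (a sketch; it assumes python3 is available, and uses a trimmed copy of the payload captured above as a literal instead of re-querying rma):

```shell
# Extract the key fields from the cluster metrics JSON (sketch;
# the string below is the payload above trimmed to three fields).
metrics='{"clusterMetrics":{"totalMB":2048,"totalVirtualCores":2,"activeNodes":2}}'
echo "$metrics" | python3 -c 'import json, sys
m = json.load(sys.stdin)["clusterMetrics"]
print("memory MB:", m["totalMB"])
print("vcores:", m["totalVirtualCores"])
print("active nodes:", m["activeNodes"])'
```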
[Key point] By default, YARN assumes each Node Manager has 8 GB of memory and 8 CPU cores
2. Configure the YARN computing resources
$ sudo nano /opt/conf/A/yarn-site.xml
::
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>1536</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>2</value>
</property>
3. Configure resources for MapReduce applications
$ sudo nano /opt/conf/A/mapred-site.xml
::
<property>
<name>yarn.app.mapreduce.am.resource.mb</name>
<value>512</value>
</property>
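With the values above, some back-of-envelope arithmetic shows what the cluster can schedule: each Node Manager advertises 1536 MB, and a MapReduce ApplicationMaster requests 512 MB, so at most three 512 MB containers fit per node. A sketch (actual scheduling also honors settings such as yarn.scheduler.minimum-allocation-mb):

```shell
# Container math for the settings above (rough sketch):
nm_mb=1536   # yarn.nodemanager.resource.memory-mb
am_mb=512    # yarn.app.mapreduce.am.resource.mb
echo "$(( nm_mb / am_mb )) containers of ${am_mb} MB fit per NodeManager"
```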
4. Restart YARN
$ stopyarn a
stopping resourcemanager
stopping historyserver
stopping nodemanager
stopping nodemanager
$ startyarn a
starting resourcemanager, logging to /tmp/yarn-bigred-resourcemanager-rma.out
starting historyserver, logging to /home/bigred/jhs/mapred-bigred-historyserver-rma.out
starting nodemanager, logging to /tmp/yarn-bigred-nodemanager-nma1.out
starting nodemanager, logging to /tmp/yarn-bigred-nodemanager-nma2.out
The Data Scientist Gets Back to Work
1. Log in to the Hadoop Client container host again
$ ssh ds01@cla01
ds01@cla01's password:
Welcome to Ubuntu 14.04.3 LTS (GNU/Linux 3.16.0-50-generic x86_64)
* Documentation: https://help.ubuntu.com/
Last login: Sun Oct 18 00:11:33 2015 from 172.17.42.1
2. Launch the Pig analysis tool
$ pig -version
Apache Pig version 0.15.0 (r1682971)
compiled Jun 01 2015, 11:44:35
$ pig
grunt> pwd
hdfs://nna:8020/user/ds01
3. Load and display the data
grunt> school = LOAD 'school.txt' AS (sno:int,name:chararray);
grunt> dump school;
::
(1283,仁德醫護管理專科學校)
(1284,樹人醫護管理專科學校)
(1285,慈惠醫護管理專科學校)
(1286,耕莘健康管理專科學校)
(1287,敏惠醫護管理專科學校)
(1288,高美醫護管理專科學校)
(1289,育英醫護管理專科學校)
(1290,崇仁醫護管理專科學校)
(1291,聖母醫護管理專科學校)
(1292,新生醫護管理專科學校)
(,)
(,)
4. Filter the data
grunt> frec = FILTER school by sno is not null;
grunt> dump frec;
::
(1285,慈惠醫護管理專科學校)
(1286,耕莘健康管理專科學校)
(1287,敏惠醫護管理專科學校)
(1288,高美醫護管理專科學校)
(1289,育英醫護管理專科學校)
(1290,崇仁醫護管理專科學校)
(1291,聖母醫護管理專科學校)
(1292,新生醫護管理專科學校)
5. Exit Pig
grunt> quit;
2015-08-27 14:53:47,782 [main] INFO org.apache.pig.Main - Pig script completed in 21 seconds and 322 milliseconds (21322 ms)
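As a sanity check outside Pig, the "sno is not null" filter can be approximated with awk on the tab-separated text, keeping only rows whose first field is a numeric school code. A local sketch using a small generated sample in place of school.txt (sample.tsv is an illustrative name):

```shell
# Local approximation of the Pig FILTER above (sketch): keep
# tab-separated rows whose first column is a numeric code; the
# blank middle row plays the role of the (,) records Pig dropped.
printf '1291\tschool A\n\t\n1292\tschool B\n' > sample.tsv
awk -F'\t' '$1 ~ /^[0-9]+$/' sample.tsv   # prints the two coded rows
rm -f sample.tsv
```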
6. Log out of the Hadoop Client container host
$ exit
logout
Connection to cla01 closed.
The Data Scientist Wraps Up
1. Shut down the HDFS and YARN distributed systems
$ stopyarn a
stopping resourcemanager
stopping historyserver
stopping nodemanager
stopping nodemanager
$ stophdfs a
stopping namenode
stopping secondarynamenode
stopping datanode
stopping datanode
2. Shut down all Hadoop container hosts
$ dkstop a
cla01 Exiting
dk:2211->172.17.10.100:22 deleted
nna Exiting
dna1 Exiting
dna2 Exiting
rma Exiting
nma1 Exiting
nma2 Exiting
3. Shut down the cloud-based UberOS 資戰機
$ bye