Taiwan Big Data Elite Warriors

Open Data Taiwan
1. Government Open Data Platform
2. Taipei City Government Open Data Platform
3. Kaohsiung City Government Open Data Platform
4. Yilan County Government Open Data Platform
5. Tainan City Government Open Data Platform
6. New Taipei City Government Open Data Platform

First-Generation UberOS Data Fighter, Type C (Ubuntu Server Edition)
This model is the first to mount the Spark active phased-array radar and the Apache Tez turbocharger, lifting runtime performance a full tier; the fighter's electronic-warfare suite has also been upgraded to Hadoop 2.7.2. The structural diagram is shown below:

[Figure: UberOS Data Fighter architecture diagram]

System Requirements

1. Dual-core CPU
2. At least 8 GB of RAM
3. Microsoft 64-bit operating system (Windows 7/8/10)
4. VMware Workstation 12.1.1 Player (free software, available from VMware's download page)

UberOS Data Fighter download URL:
https://docs.google.com/uc?id=0ByAESZ_C1fg-bFZjdGVQQnkyQ0E&export=download

Operation Manual

1. Getting to know the cloud UberOS Data Fighter (UberOS271.zip)
2. The Docker engine (containers)
3. The fighter's maiden flight (the Hadoop cluster)
4. The Hadoop electronic-warfare suite (HDFS, YARN)
5. The ETL flash-dance radar (Pig)
6. The WH array radar (Hive)

First-Generation UberDL Aircraft Carrier (Ubuntu 16.04 Desktop + Docker)


Download URL (updated 2017/03/01)

https://docs.google.com/uc?id=0ByAESZ_C1fg-SDc4eXZhMWNKVTg&export=download


1. 2017/03/01: fixed a PuTTY connection failure (the external-facing network interface was misconfigured)
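
With that fix in place you should be able to reach the client containers from Windows through the forwarded ports. A hedged sketch: the port number follows the 22100->cla00:22 mapping shown in the dkls listings below, and <VM-IP> stands for whatever address VMware assigned the guest.

$ ssh dsa100@<VM-IP> -p 22100    # reaches container cla00; PuTTY with the same host and port also works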

Building and Planning the Hadoop Core System
Open a terminal and perform the following steps:

1. Create all container hosts of Hadoop cluster A
$ dkcreate a
yes/no : yes

cla00 created : dsa100 dsb100 dsc100 
cla01 created : dsa101 dsb101 dsc101 
hbma created
nginx created
nna created
rma created
spkma created
wka01 created
wka02 created
zka01 created
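
dkcreate (and dkls in the next step) are utilities bundled with the image; the hosts they manage are ordinary Docker containers, so, assuming the standard Docker CLI is on the PATH inside the VM, you can cross-check with:

$ sudo docker ps -a    # raw view of the same containers: cla00, cla01, hbma, nginx, nna, rma, spkma, wka01, wka02, zka01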

2. Inspect the architecture of Hadoop cluster A
$ dkls a
Docker Utility 0.6.0 (2017/02/01)

[A Cluster]
--------------------------------------------------------------------------------------------
zka01(a29d40f5af01) 172.17.6.30 Exited ()
wka02(8c441c83f381) 172.17.8.11 Exited ()
wka01(a7ad549704d2) 172.17.8.10 Exited ()
spkma(4d9d1e69181d) 172.17.6.20 Exited ()
rma(8dff19f0d4fe) 172.17.6.12 Exited ()
nna(968ee9420d20) 172.17.6.10 Exited ()
nginx(dc35fb5f597a) 172.17.7.20 Exited (CVBG:80->nginx:80)
hbma(90effcbad3cf) 172.17.6.30 Exited ()
cla01(17c404486d66) 172.17.2.11 Exited (CVBG:22101->cla01:22, user:dsa101 dsb101 dsc101)
cla00(08fb31ee111b) 172.17.2.10 Exited (CVBG:22100->cla00:22, user:dsa100 dsb100 dsc100)

[Docker Images]
--------------------------------------------------------------------------------------------
dafu/worker         16.04               523546e962f0        3 weeks ago         1.92 GB


3. Format the HDFS of Hadoop cluster A
$ formathdfs a jedi
format (yes/no) yes
Name Node (nna) format ok
Secondary NameNode (nna) ok
DataNode (wka01) ok
DataNode (wka02) ok

4. Create the Data Lake of Hadoop cluster A
$ createdlka
start HDFS ok

[create HDFS schema]
/elt (bigred:bigdata,750)
/dataset (bigred:bigdata,750)
/app (bigred:bigdata,750)
/metadata (bigred:bigdata,750)
/tmp (777)

[create Hadoop users]
(HDFS) /user dir created
(nna) bigdata group created
(nna) dsa100 -> /user/dsa100 created
(nna) dsb100 -> /user/dsb100 created
(nna) dsc100 -> /user/dsc100 created
(nna) dsa101 -> /user/dsa101 created
(nna) dsb101 -> /user/dsb101 created
(nna) dsc101 -> /user/dsc101 created
(nna) dsa150 -> /user/dsa150 created
(nna) dsa151 -> /user/dsa151 created

stop HDFS ok
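
To double-check the Data Lake layout, a short sketch (HDFS must be running; starthd a is introduced in the next section):

$ starthd a
$ hdfs dfs -ls /    # should list /app /dataset /elt /metadata /tmp /user with the owners and modes shown above
$ stophd a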

Getting Started with Big Data Analysis
Open a terminal and perform the following steps:

1. Start the Hadoop core system
$ starthd a
[Cluster A]
start Application Container ok
start HDFS ok
start YARN ok

2. Inspect the Hadoop core system
$ dkls a
Docker Utility 0.6.0 (2017/02/01)

[A Cluster]
--------------------------------------------------------------------------------------------
zka01(e335072afdbf) 172.17.6.30 Running ()
wka02(ecc0eb4a695b) 172.17.8.11 Running ( NodeManager DataNode )
wka01(7d1c16a590a6) 172.17.8.10 Running ( NodeManager DataNode )
spkma(92b188924db1) 172.17.6.20 Running ()
rma(3e2dec0dd023) 172.17.6.12 Running ( ResourceManager JobHistoryServer )
nna(560485ebef6f) 172.17.6.10 Running ( NameNode SecondaryNameNode )
nginx(0d07f7419af6) 172.17.7.20 Running (CVN79:80->nginx:80)
hbma(76eaa30d57c6) 172.17.6.30 Running ()
cla01(2e2416304852) 172.17.2.11 Running (CVN79:22101->cla01:22, user:dsa101 dsb101 dsc101)
cla00(cfe443a20ab7) 172.17.2.10 Running (CVN79:22100->cla00:22, user:dsa100 dsb100 dsc100)

[Docker Images]
--------------------------------------------------------------------------------------------
dafu/worker         16.04               8a5745347502        12 days ago         1.93 GB

3. Verify HDFS operating status
$ hdfs dfsadmin -printTopology
Rack: /default-rack
   172.17.8.10:50010 (wka01)
   172.17.8.11:50010 (wka02)
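
For per-DataNode capacity and block usage, the standard report gives more detail:

$ hdfs dfsadmin -report    # live/dead DataNodes, configured capacity, DFS used and remaining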

4. Verify YARN operating status
$ yarn node -list -all
17/01/01 19:08:26 INFO client.RMProxy: Connecting to ResourceManager at rma/172.17.6.12:8032
17/01/01 19:08:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Total Nodes:2
         Node-Id     Node-State Node-Http-Address Number-of-Running-Containers
     wka01:44971        RUNNING       wka01:8042                           0
     wka02:36045        RUNNING       wka02:8042                           0
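
Per-node detail is available as well, using a Node-Id from the listing above (the ResourceManager web UI, on Hadoop's default port 8088, shows the same data):

$ yarn node -status wka01:44971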

5. Log in to the Hadoop client host
$ dslogin dsa101
Welcome to Ubuntu 14.04.5 LTS (GNU/Linux 4.4.0-62-generic x86_64)

 * Documentation:  https://help.ubuntu.com/

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

build derby database ... ok

6. Run a MapReduce job
$ hadoop  jar  /opt/hadoop-2.8.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.0.jar pi 2 10000
                        :::
Job Finished in 85.291 seconds
Estimated value of Pi is 3.14280000000000000000
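
Running the jar with no arguments prints the full list of bundled examples (wordcount, grep, terasort, and more), each launched the same way as pi above:

$ hadoop jar /opt/hadoop-2.8.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.0.jar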

7. Use the Pig data analysis tool
$ hdfs dfs -put /opt/dataset/customer.csv 
$ pig -e 'ls' 2>/dev/null
hdfs://nna:8020/user/dsa101/customer.csv<r 2> 695

$ pig 2>/dev/null
grunt> a = load 'customer.csv' using PigStorage(',');
grunt> store a into 'customer' using PigStorage(',');
grunt> ls
hdfs://nna:8020/user/dsa101/customer <dir>
hdfs://nna:8020/user/dsa101/customer.csv<r 2> 695
grunt> quit
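
PigStorage reads fields untyped by default. A hedged sketch that declares a schema (the field names mirror the Hive table in the next step) and aggregates by occupation:

$ pig 2>/dev/null
grunt> a = load 'customer.csv' using PigStorage(',') as (cid:chararray, name:chararray, fname:chararray, age:int, occupation:chararray);
grunt> b = group a by occupation;
grunt> c = foreach b generate group, COUNT(a), AVG(a.age);
grunt> dump c;
grunt> quit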

8. Use the Hive data warehouse tool
$ nano customer.sql 
CREATE EXTERNAL TABLE CUSTOMER (
cid string,
name string,
fname string,
age int,
occupation string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE LOCATION '/user/dsa101/customer';

$ hive -S -f customer.sql 2>/dev/null

$ hive -S -e 'select * from CUSTOMER limit 2' 2>/dev/null
4000001 Kristina Chung 55 Pilot
4000002 Paige Chen 74 Teacher
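
Because CUSTOMER is an external table over the same files, ordinary HiveQL aggregations apply; a hedged sketch:

$ hive -S -e 'select occupation, count(*), avg(age) from CUSTOMER group by occupation' 2>/dev/null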

9. Log out of the Hadoop client host
$ exit
logout
Connection to cla01 closed.

Getting Started with the Big Data Database HBase
The Hadoop core system must be started first (starthd a)

1. Start the HBase system
$ starthba
starting master, logging to /tmp/hbase-bigred-master-zka01.out
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper-3.4.9/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
starting regionserver, logging to /tmp/hbase-bigred-regionserver-wka01.out
starting regionserver, logging to /tmp/hbase-bigred-regionserver-wka02.out

2. Check the HBase system
$ hbase hbck -metaonly 2>/dev/null
HBaseFsck command line options: -metaonly
Version: 1.2.4
Number of live region servers: 2
Number of dead region servers: 0
Master: zka01,16000,1486623999995
Number of backup masters: 0
Average load: 1.0
Number of requests: 0
Number of regions: 2
Number of regions in transition: 0

Number of empty REGIONINFO_QUALIFIER rows in hbase:meta: 0

Summary:
Table hbase:meta is okay.
    Number of regions: 1
    Deployed on:  wka02,16020,1486624008071
0 inconsistencies detected.
Status: OK
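
The shell's status command reports a similar summary:

$ echo "status" | hbase shell -n 2>/dev/null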

3. Create the customers table
$ echo "create 'customers', 'customers_data'" | hbase shell -n 2>/dev/null
0 row(s) in 5.7480 seconds

Hbase::Table - customers

4. Count the rows in the customers table
$ echo "count 'customers'" | hbase shell -n 2>/dev/null
0 row(s) in 1.3280 seconds

0
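
The table is empty at this point. A hedged sketch of writing and reading back one row before the drop (the row key and value here are illustrative, and running it would make the count above return 1):

$ echo "put 'customers', '4000001', 'customers_data:name', 'Kristina Chung'" | hbase shell -n 2>/dev/null
$ echo "scan 'customers'" | hbase shell -n 2>/dev/null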

5. Drop the customers table
$ echo "disable 'customers'" | hbase shell -n 2>/dev/null
0 row(s) in 3.6750 seconds

nil
$ echo "drop 'customers'" | hbase shell -n 2>/dev/null
0 row(s) in 2.8300 seconds

nil
$ echo "list" | hbase shell -n 2>/dev/null
TABLE                                                                                                  
0 row(s) in 1.2070 seconds

6. Shut down the HBase system
$ stophba 
stopping master.
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper-3.4.9/bin/../conf/zoo.cfg
Stopping zookeeper ... STOPPED
stopping regionserver.............................................
stopping regionserver...................

7. Shut down the Hadoop core system
$ stophd a
[Cluster A]
stop YARN ok
stop HDFS ok
stop Application Container ok


Hadoop Development Special-Ops Pack (Windows Edition) Operation Manual
Download URL:
https://docs.google.com/uc?id=0ByAESZ_C1fg-T05xd1ZZbXdjZGc&export=download

1. Installing and configuring the tools
2. Connecting to the UberOS Data Fighter Type C
3. Using the Eclipse development tool
