(Set 3) ZZ052 Big Data Application and Services Competition Tasks

曦, 2024-10-28

Module 1: Platform Setup and Operations

(1) Task 1: Big Data Platform Setup

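Subtask 1: Hadoop fully distributed installation and configuration

This task must be completed as the root user. Installing Hadoop requires the prerequisite environment to be configured first. Commands must use absolute paths. The specific requirements are as follows: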

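(1) On Master, extract hadoop-3.1.3.tar.gz and jdk-8u191-linux-x64.tar.gz from /opt/software into /opt/module (create the path if it does not exist). Copy the JDK extraction command and paste it under the corresponding task number in 【Release\提交结果.docx】 on the client desktop.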

[root@master software]# tar -zxvf jdk-8u391-linux-x64.tar.gz -C /opt/module/
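
The task statement says the target path must be created if it does not exist, which the one-line tar command leaves implicit. A minimal sketch of that step follows. `extract_to` is a hypothetical helper name, not part of the task or of any tool, and the path in the usage comment reuses the archive name from the transcript.

```shell
# Hypothetical helper (not from the task): create the target directory
# if it is missing, then unpack a gzip-compressed tarball into it.
extract_to() {
    archive="$1"   # absolute path to the .tar.gz / .tgz archive
    dest="$2"      # absolute path of the directory to unpack into
    mkdir -p "$dest" || return 1     # no error if the directory already exists
    tar -zxf "$archive" -C "$dest"   # -z gunzip, -x extract, -f archive, -C target dir
}

# Example call on master:
# extract_to /opt/software/jdk-8u391-linux-x64.tar.gz /opt/module
```

Passing both paths as absolute arguments also satisfies the requirement that commands use absolute paths, regardless of the current directory.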

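(2) Edit /etc/profile on Master to set the JDK environment variables and make them take effect. Then run "java -version" and "javac" on the Master node, take a screenshot of each result, and paste them under the corresponding task number in 【Release\提交结果.docx】 on the client desktop.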

[root@master module]# java -version
java version "1.8.0_391"
Java(TM) SE Runtime Environment (build 1.8.0_391-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.391-b13, mixed mode)
[root@master module]# javac
Usage: javac <options> <source files>
where possible options include:
  -g                         Generate all debugging info
  -g:none                    Generate no debugging info
  -g:{lines,vars,source}     Generate only some debugging info
  -nowarn                    Generate no warnings
  -verbose                   Output messages about what the compiler is doing
  -deprecation               Output source locations where deprecated APIs are used
  -classpath <path>          Specify where to find user class files and annotation processors
  -cp <path>                 Specify where to find user class files and annotation processors
  -sourcepath <path>         Specify where to find input source files
  -bootclasspath <path>      Override location of bootstrap class files
  -extdirs <dirs>            Override location of installed extensions
  -endorseddirs <dirs>       Override location of endorsed standards path
  -proc:{none,only}          Control whether annotation processing and/or compilation is done.
  -processor <class1>[,<class2>,<class3>...] Names of the annotation processors to run; bypasses default discovery process
  -processorpath <path>      Specify where to find annotation processors
  -parameters                Generate metadata for reflection on method parameters
  -d <directory>             Specify where to place generated class files
  -s <directory>             Specify where to place generated source files
  -h <directory>             Specify where to place generated native header files
  -implicit:{none,class}     Specify whether or not to generate class files for implicitly referenced files
  -encoding <encoding>       Specify character encoding used by source files
  -source <release>          Provide source compatibility with specified release
  -target <release>          Generate class files for specific VM version
  -profile <profile>         Check that API used is available in the specified profile
  -version                   Version information
  -help                      Print a synopsis of standard options
  -Akey[=value]              Options to pass to annotation processors
  -X                         Print a synopsis of nonstandard options
  -J<flag>                   Pass <flag> directly to the runtime system
  -Werror                    Terminate compilation if warnings occur
  @<filename>                Read options and filenames from file
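
Step (2) asks for JDK environment variables in /etc/profile, but the transcript does not show them. The sketch below appends the two usual export lines. `append_java_env` is a hypothetical helper, and /opt/module/jdk is an assumption based on the directory name used in the scp commands of step (3); the extracted directory would need to be renamed or symlinked to match.

```shell
# Hypothetical helper: append JAVA_HOME and PATH exports to a profile file.
append_java_env() {
    profile="$1"    # e.g. /etc/profile
    jdk_home="$2"   # assumed JDK directory, e.g. /opt/module/jdk
    # Unquoted heredoc: $jdk_home expands now, the escaped \$ stay literal.
    cat >> "$profile" <<EOF
export JAVA_HOME=$jdk_home
export PATH=\$PATH:\$JAVA_HOME/bin
EOF
}

# Example on master, then reload the profile in the current shell:
# append_java_env /etc/profile /opt/module/jdk
# . /etc/profile
```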

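(3) Complete the hosts configuration: name the three nodes master, slave1, and slave2, and set up passwordless SSH login. Use scp with absolute paths to copy the extracted JDK from Master to slave1 and slave2 (create the path if it does not exist), and configure the environment variables on slave1 and slave2. Copy all scp commands used to copy the JDK and paste them under the corresponding task number in 【Release\提交结果.docx】 on the client desktop.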

[root@master module]# scp -r /opt/module/jdk/ slave1:/opt/module
[root@master module]# scp -r /opt/module/jdk/ slave2:/opt/module
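
The scp step repeats one command per worker node, and it depends on host name resolution and passwordless SSH already working. As a sketch, the loop below only prints the commands so they can be reviewed before running. `scp_commands` is a hypothetical helper. The commented lines show the usual `ssh-keygen` and `ssh-copy-id` sequence for passwordless login, which the transcript does not record.

```shell
# Passwordless login from master (run once, interactively):
#   ssh-keygen -t rsa
#   ssh-copy-id slave1
#   ssh-copy-id slave2

# Hypothetical helper: print one "scp -r" command per target host.
scp_commands() {
    src="$1"    # absolute source path on master
    dest="$2"   # absolute destination directory on each host
    shift 2     # remaining arguments are host names
    for host in "$@"; do
        echo "scp -r $src $host:$dest"
    done
}

scp_commands /opt/module/jdk/ /opt/module slave1 slave2
# prints:
#   scp -r /opt/module/jdk/ slave1:/opt/module
#   scp -r /opt/module/jdk/ slave2:/opt/module
```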

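(4) On Master, extract Hadoop into /opt/module (create the path if it does not exist) and distribute the extracted directory to slave1 and slave2. The master, slave1, and slave2 nodes all act as datanodes. Configure the environment, then initialize (format) the Hadoop namenode. Paste the initialization command and a screenshot of the result (the last 20 lines of the initialization log are enough) under the corresponding task number in 【Release\提交结果.docx】 on the client desktop.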

2025-02-21 03:01:37,259 INFO util.GSet: capacity      = 2^13 = 8192 entries
2025-02-21 03:01:37,283 INFO namenode.FSImage: Allocated new BlockPoolId: BP-232044380-192.168.1.131-1740124897277
2025-02-21 03:01:37,292 INFO common.Storage: Storage directory /data/nn has been successfully formatted.
2025-02-21 03:01:37,314 INFO namenode.FSImageFormatProtobuf: Saving image file /data/nn/current/fsimage.ckpt_0000000000000000000 using no compression
2025-02-21 03:01:37,396 INFO namenode.FSImageFormatProtobuf: Image file /data/nn/current/fsimage.ckpt_0000000000000000000 of size 396 bytes saved in 0 seconds .
2025-02-21 03:01:37,402 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2025-02-21 03:01:37,425 INFO namenode.FSNamesystem: Stopping services started for active state
2025-02-21 03:01:37,425 INFO namenode.FSNamesystem: Stopping services started for standby state
2025-02-21 03:01:37,428 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid=0 when meet shutdown.
2025-02-21 03:01:37,428 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/192.168.1.131
************************************************************/
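
The log above is the tail of a namenode format run, but the command itself is not recorded. In Hadoop 3.x the format command is `hdfs namenode -format`. The sketch below wraps it and trims the output to the last 20 lines, as the task requests. `format_namenode_tail` is a hypothetical wrapper name. Formatting erases existing HDFS metadata, so it is meant for a fresh cluster only.

```shell
# Hypothetical wrapper: format the namenode and keep the last 20 log lines.
format_namenode_tail() {
    # Bail out early if the Hadoop bin directory is not on PATH.
    if ! command -v hdfs >/dev/null 2>&1; then
        echo "hdfs not found on PATH; check the Hadoop environment variables" >&2
        return 127
    fi
    # Merge stderr into stdout so log lines written to stderr are included.
    hdfs namenode -format 2>&1 | tail -n 20
}

# Run once on master before the first cluster start:
# format_namenode_tail
```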

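(5) Start the Hadoop cluster (both HDFS and YARN), use the jps command to list the Java processes on the Master node and the slave1 node, and paste screenshots of the jps command and its output under the corresponding task number in 【Release\提交结果.docx】 on the client desktop.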

[root@master hadoop]# jps
11728 JobHistoryServer
10547 DataNode
11235 NodeManager
11093 ResourceManager
11750 Jps
10857 SecondaryNameNode
11646 WebAppProxyServer
10399 NameNode

[root@slave1 ~]# jps
10193 NodeManager
10305 Jps
10085 DataNode
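
A quick way to check the jps listings above is to confirm that each expected daemon name appears. The sketch below does that against any captured jps output. `has_daemons` is a hypothetical helper, and the daemon lists in the usage comments simply mirror the transcript. The usual start scripts, `start-dfs.sh` and `start-yarn.sh`, are shown as comments because the transcript omits them.

```shell
# Typical start sequence on master (Hadoop sbin directory on PATH):
#   start-dfs.sh
#   start-yarn.sh

# Hypothetical helper: succeed only if every named daemon appears in the jps output.
has_daemons() {
    jps_output="$1"
    shift
    for daemon in "$@"; do
        # -w matches whole words, so "NameNode" does not match "SecondaryNameNode"
        if ! printf '%s\n' "$jps_output" | grep -qw "$daemon"; then
            echo "missing: $daemon" >&2
            return 1
        fi
    done
    return 0
}

# On master: has_daemons "$(jps)" NameNode DataNode ResourceManager NodeManager
# On slave1: has_daemons "$(jps)" DataNode NodeManager
```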

Subtask 2: Kafka installation and configuration

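This task must be completed as the root user, with Hadoop already installed and the prerequisite environment configured. The specific requirements are as follows: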

(1) On Master, extract apache-zookeeper-3.5.7-bin.tar.gz and kafka_2.12-2.4.1.tgz from /opt/software into /opt/module. Copy the Kafka extraction command and paste it under the corresponding task number in 【Release\提交结果.docx】 on the client desktop.

[root@master software]# tar -zxvf kafka_2.11-2.1.0.tgz -C /opt/module/

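(2) Configure ZooKeeper in cluster mode with master, slave1, and slave2 as its nodes (if ZooKeeper is already installed and configured, there is no need to configure it again). Set the Kafka environment variables, run kafka-server-start.sh --version to check the Kafka version, and paste a screenshot of the command and its result under the corresponding task number in 【Release\提交结果.docx】 on the client desktop.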

[root@master kafka]# kafka-server-start.sh --version
2.1.0 (Commit:809be928f1ae004e)

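(3) Complete the remaining configuration and distribute the Kafka files to slave1 and slave2. Start Kafka on every node and create a topic named installtopic with 2 partitions and 2 replicas. Paste screenshots of the creation command and its result under the corresponding task number in 【Release\提交结果.docx】 on the client desktop.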

[root@master kafka]# kafka-topics.sh --create --zookeeper master:2181,slave1:2181,slave2:2181 --partitions 2 --replication-factor 2 --topic installtopic
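
Part of completing the remaining configuration is that every broker needs its own `broker.id` in config/server.properties after the files are distributed. A sketch of that edit follows. `set_broker_id` is a hypothetical helper, and the /opt/module/kafka path and the ids 0, 1, 2 are assumptions rather than values from the transcript.

```shell
# Hypothetical helper: set broker.id in a Kafka server.properties file.
set_broker_id() {
    file="$1"   # path to server.properties
    id="$2"     # integer unique within the cluster
    # Rewrite the existing broker.id line via a temp file (portable alternative to sed -i).
    sed "s/^broker\.id=.*/broker.id=$id/" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Assumed layout, one id per node:
# master: set_broker_id /opt/module/kafka/config/server.properties 0
# slave1: set_broker_id /opt/module/kafka/config/server.properties 1
# slave2: set_broker_id /opt/module/kafka/config/server.properties 2
```

With a replication factor of 2, at least two brokers with distinct ids must be running before the topic can be created.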

Subtask 3: Hive installation and configuration

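This task must be completed as the root user, with Hadoop already installed and the prerequisite environment configured. The specific requirements are as follows: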

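(1) On Master, take the files apache-hive-3.1.2-bin.tar.gz and mysql-connector-java-5.1.37.jar from /opt/software and extract them into /opt/module. Copy the commands and paste them under the corresponding task number in 【Release\提交结果.docx】 on the client desktop.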

[root@master software]# tar -xvf mysql-5.7.44-1.el7.x86_64.rpm-bundle.tar -C /opt/module/
[root@master software]# tar -zxvf apache-hive-2.3.4-bin.tar.gz -C /opt/module/

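(2) Set the Hive environment variables and make them take effect. Run hive --version and paste a screenshot of the command and its result under the corresponding task number in 【Release\提交结果.docx】 on the client desktop.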

[root@master module]# hive --version
Hive 2.3.4
Git git://daijymacpro-2.local/Users/daijy/commit/hive -r 56acdd2120b9ce6790185c679223b8b5e884aaf2
Compiled by daijy on Wed Oct 31 14:20:50 PDT 2018
From source with checksum 9f2d17b212f3a05297ac7dd40b65bab0

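(3) Complete the related configuration and add the required dependency packages so that MySQL serves as the Hive metastore database. Initialize the Hive metadata with the schematool command, and paste a screenshot of the initialization result (the last 10 lines of the command output) under the corresponding task number in 【Release\提交结果.docx】 on the client desktop.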

[!NOTE]

I am running Hive 2.x here, so the output does not contain the long stretch of blank lines.

[root@master hive]# schematool -initSchema -dbType mysql
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/module/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/module/hadoop/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:	 jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true
Metastore Connection Driver :	 com.mysql.jdbc.Driver
Metastore connection User:	 root
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.mysql.sql
Initialization script completed
schemaTool completed

(2) Task 2: Database Configuration and Maintenance

Subtask 1: Database configuration

(1) Configure remote connections for the server-side MySQL database.

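(2) Initialize the MySQL database system. Paste the full command and a screenshot of the successful initialization under the corresponding task number in 【Release\提交结果.docx】 on the client desktop.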

[root@master module]# mysqld --initialize-insecure --user=mysql --datadir=/var/lib/mysql
[root@master module]#

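(3) Configure the root user to allow connections from any IP. Paste a screenshot of the full command under the corresponding task number in 【Release\提交结果.docx】 on the client desktop.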

mysql> update mysql.user set host='%' where user='root';
Query OK, 1 row affected (0.04 sec)
Rows matched: 1  Changed: 1  Warnings: 0

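(4) Log in to MySQL as the root user and list all tables in the mysql database. Paste a screenshot of the full command and its result under the corresponding task number in 【Release\提交结果.docx】 on the client desktop.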

mysql> show tables from mysql;
+---------------------------+
| Tables_in_mysql           |
+---------------------------+
| columns_priv              |
| db                        |
| engine_cost               |
| event                     |
| func                      |
| general_log               |
| gtid_executed             |
| help_category             |
| help_keyword              |
| help_relation             |
| help_topic                |
| innodb_index_stats        |
| innodb_table_stats        |
| ndb_binlog_index          |
| plugin                    |
| proc                      |
| procs_priv                |
| proxies_priv              |
| server_cost               |
| servers                   |
| slave_master_info         |
| slave_relay_log_info      |
| slave_worker_info         |
| slow_log                  |
| tables_priv               |
| time_zone                 |
| time_zone_leap_second     |
| time_zone_name            |
| time_zone_transition      |
| time_zone_transition_type |
| user                      |
+---------------------------+
31 rows in set (0.04 sec)

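(5) Enter the command to create a new user. Paste a screenshot of the full command and its result under the corresponding task number in 【Release\提交结果.docx】 on the client desktop.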

mysql> create user 'hkjcpdd'@'%' identified by '123456';
Query OK, 0 rows affected (0.11 sec)

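(6) Grant the new user permission to access data. Paste a screenshot of the full command and its result under the corresponding task number in 【Release\提交结果.docx】 on the client desktop.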

mysql> grant select on *.* to 'hkjcpdd'@'%';
Query OK, 0 rows affected (0.09 sec)

(7) Flush the privileges. Paste a screenshot of the full command and its result under the corresponding task number in 【Release\提交结果.docx】 on the client desktop.

mysql> flush privileges;
Query OK, 0 rows affected (0.05 sec)
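
The grant and flush steps can also be run non-interactively by generating the statements and piping them to the mysql client. The sketch below only prints the SQL. `grant_select_sql` is a hypothetical helper, and `*.*` (all databases and tables) is an assumed scope, since the task only says "access data".

```shell
# Hypothetical helper: print a GRANT SELECT statement followed by FLUSH PRIVILEGES.
grant_select_sql() {
    user="$1"    # account name
    host="$2"    # host pattern; % means any host
    scope="$3"   # privilege level such as '*.*' or 'mydb.*'; quote it to avoid globbing
    printf "GRANT SELECT ON %s TO '%s'@'%s';\n" "$scope" "$user" "$host"
    printf "FLUSH PRIVILEGES;\n"
}

grant_select_sql hkjcpdd '%' '*.*'
# prints:
#   GRANT SELECT ON *.* TO 'hkjcpdd'@'%';
#   FLUSH PRIVILEGES;

# To apply: grant_select_sql hkjcpdd '%' '*.*' | mysql -uroot -p
```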

Subtask 2: Creating the tables

(1) Create the hotel table (hotel) in the MySQL database from the following fields:

Field	Type	Meaning
Id	int	Hotel ID
hotel_name	varchar	Hotel name
City	varchar	City
Province	varchar	Province
Level	varchar	Star rating
room_num	int	Number of rooms
Score	double	Rating
shopping	varchar	Number of comments

mysql> create table hotel( id int, hotel_name varchar(255), City varchar(255), Province varchar(255), Level varchar(255), room_num int, Score double, shopping varchar(255));
Query OK, 0 rows affected (0.10 sec)

(2) Create the comment table (comment) in the MySQL database from the following fields:

Field	Type	Meaning
Id	int	Comment ID
Name	varchar	Hotel name
Commentator	varchar	Commenter
Score	double	Rating
comment_time	datetime	Comment time
Content	varchar	Comment text

mysql> create table comment(
    -> id int,
    -> Name varchar(255),
    -> Commentator varchar(255),
    -> Score double,
    -> comment_time datetime,
    -> Content varchar(255));
Query OK, 0 rows affected (0.06 sec)

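Take a screenshot of each of these two CREATE TABLE statements and paste them under the corresponding task number in 【Release\提交结果.docx】 on the client desktop.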

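Subtask 3: Maintaining the data tables

Using the provided SQL files, import the two datasets into any database you have created, then perform the following operations on the data: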

In the comment_all table, change the score of the row with id 30 to 5.
mysql> update comment_all set Score=5 where id=30;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
In the hotel table, count the total number of hotels in each city.
mysql> select city, count(hotel_name) from hotel group by city;
Take a screenshot of each of these two SQL statements and paste them under the corresponding task number in 【Release\提交结果.docx】 on the client desktop.