Abstract: The existing CMISS (China Integrated Meteorological Information Sharing System) cannot adequately support large-volume queries spanning long time series, multiple stations, and multiple meteorological elements. In this study, the monthly reports of historical surface meteorological records collected in Guangxi since each station's establishment, together with the existing Hadoop cluster's physical resources, are used to redesign the ETL process, construct a Parquet-format dataset, and complete its conversion and storage on HDFS. In addition, Spark Broadcast variables are embedded and the Spark cluster's execution parameters are optimized, which improves the cluster's processing parallelism and the join-query efficiency of Spark SQL. The results show that the maximum compression ratio of the Parquet dataset exceeded 95%, and that single large-volume queries ran 1 to 5 times faster than before while supporting highly concurrent access, providing effective technical support for the development of related forecasting services.