1.

李超岭, 李健强, 张宏春, 龚爱华, 魏东琦.《地质通报》2015, 34(07): 1288-1299

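Geological survey data consist mainly of diverse structured and unstructured data. Reports made up of diverse unstructured data files have, for technical reasons, long been stored as conventional directories of files. This storage method makes query, statistics, and update operations inefficient and is poorly suited to retrieval, search, and data-mining applications, leaving data service capability very low. By integrating the Hadoop ecosystem into the architecture of the China geological survey cloud platform, the authors build, on a Hadoop HDFS and HBase storage architecture, a storage organization model for a basic content repository of unstructured geological data. They use the Lucene full-text search engine framework together with a geological domain ontology vocabulary to construct an index-file mechanism for fast random access. This changes how diverse, fragmented, and complex unstructured geological survey data are stored, read, searched, and applied, and lays a foundation for accurate, fast services in intelligent geological survey.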

2.

Design and Implementation of a Big Data Storage and Application Platform for Hubei Transportation

杨厚新, 漆炜, 王鹏, 王伟, 廖思远, 周彦君.《测绘与空间地理信息》2021, 44(4): 24-28, 34

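To address current bottlenecks in storing, managing, and applying transportation data in Hubei Province, and against the background of current big data technology, this paper designs and implements a big data storage and analysis platform for Hubei transportation services based on the Hadoop ecosystem. It covers the design and implementation of the overall architecture, storage structure, extension architecture, data ingestion and governance, and distributed task scheduling, as well as the platform's application scenarios. Storage and efficiency experiments were run on the platform using 10 years of expressway data and concrete business scenarios. The results show that, compared with a traditional relational database, the platform has very clear advantages in both data storage and query computation.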

3.

Sobel Filtering Based on a Parallel Programming Computing Model

徐昌荣, 王聪颖, 袁秀华.《测绘科学》2014, 39(10)

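As the volume of remote sensing imagery grows sharply, the time needed to complete Sobel edge filtering on a single machine grows sharply as well. Drawing on the tiled structure of remote sensing data and the MapReduce parallel distributed computing model, this paper proposes a method for migrating the operation to a Hadoop cluster in order to perform Sobel filtering on massive image data. Experiments show that cluster computation significantly shortens computing time, and that the computing time tends to decrease as the number of cluster nodes increases.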
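As a rough illustration of the tile-wise idea in this abstract, the sketch below applies the standard 3x3 Sobel operator to independent image tiles in a map-style function. It uses plain Python lists rather than Hadoop. The function names (`sobel_tile`, `map_tiles`) and the choice to leave border pixels at zero are assumptions made for illustration, not the paper's implementation.

```python
from math import hypot

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
KX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
KY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_tile(tile):
    """Return the gradient magnitude of one tile.

    Assumption: border pixels are left at 0 because the tile has no
    neighbouring data available here.
    """
    rows, cols = len(tile), len(tile[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    v = tile[r + dr][c + dc]
                    gx += KX[dr + 1][dc + 1] * v
                    gy += KY[dr + 1][dc + 1] * v
            out[r][c] = hypot(gx, gy)
    return out


def map_tiles(tiles):
    """Map step: each (tile_id, pixels) pair is processed independently."""
    return {tile_id: sobel_tile(pixels) for tile_id, pixels in tiles}
```

In a real tiled run, neighbouring tiles would need a one-pixel overlap so that pixels on tile borders receive correct gradients; that step is omitted here.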

4.

A Spatiotemporal Data Model Design Method Based on MapReduce

苏韦, 李景文, 刘华尧, 张海英, 欧阳云.《测绘与空间地理信息》2013, 36(7): 41-44

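For spatiotemporal data storage and query, traditional methods suffer from high hardware cost and low storage efficiency. Through analysis and study of core cloud computing technologies such as the MapReduce model and the Hadoop framework, the paper proposes a Hadoop-based spatiotemporal data storage model and, on top of it, designs a MapReduce-based framework for parallelizing spatiotemporal queries. By operating on spatiotemporal data in parallel, the framework is suited to storing and managing massive spatiotemporal data.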

5.

Tuning Experiments for Meteorological Data Processing Based on the MapReduce Computing Model

杨润芝, 沈文海, 肖卫青, 胡开喜, 杨昕, 王颖, 田伟.《应用气象学报》2014, 25(5): 618-628

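Cloud computing uses distributed computing to deliver the capacity and efficiency of parallel computation, addressing the limited computing power of a single server. Climate normals computed from long-series historical data are important for real-time operations, near-real-time operations, and scientific research in meteorology. Because long-series historical data are large and the computation logic is fairly complex, compiling them on a traditional single-node platform takes a very long time. This paper builds a cluster-mode cloud computing platform on the Hadoop distributed computing framework and, with long-series historical data as the source, implements some of the compilation algorithms on the MapReduce computing model to shorten computing time. In addition, because the source data consist of many files that are individually small, the storage form and file size of the source data are restructured. The SequenceFile approach and a text-file merging approach are compared for computing time on the same scenario, with merges of 10 files and of 100 files tested, which shortens computing time further.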
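The small-file merging step mentioned in this abstract can be pictured with a minimal in-memory sketch. It groups many small files into batches of a fixed size (for example 10 or 100) and keeps each file name as the record key. The function name `merge_small_files` and the sample file names are hypothetical, and the batches only loosely mirror a key/value container; this is not the SequenceFile format or the paper's code.

```python
def merge_small_files(files, batch_size):
    """Group (name, content) pairs into batches of at most batch_size files.

    Each batch is a list of (name, content) records, so the original file
    name survives as a key inside the merged unit (an illustrative layout,
    not the Hadoop SequenceFile format).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    items = sorted(files.items())
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# 25 hypothetical small station files merged 10 at a time.
small = {f"station_{i:03d}.txt": f"record {i}" for i in range(25)}
merged = merge_small_files(small, 10)
```

Fewer, larger inputs mean fewer map tasks to schedule, which is the usual motivation for this kind of merging on Hadoop.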

6.

A Study of Storage Schemes for Seismic Observation Data in a Big Data Environment

单维锋, 滕云田, 刘海军, 杨冠泽.《中国地震》2019, 35(3): 558-564

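Big data technology offers a new data storage and computing model for handling massive seismic observation data. To address the low read/write speed, low user concurrency, and poor scalability of existing storage schemes based on relational databases, the paper takes seismic precursor observation data as an example and, after a detailed analysis of operational requirements, proposes a seismic big data storage scheme based on HBase and OpenTSDB. A big data test platform was built, and query, insert, and concurrency experiments were run under the different storage schemes. The results show that, compared with the relational database scheme, the schemes based on HBase and OpenTSDB scale well and handle concurrency well, and that the optimized HBase scheme achieves higher read and write performance.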

7.

A Preliminary Study of Query and Statistics over Massive Automatic Weather Station Data Using SparkSQL in a Hadoop Environment

黄志, 詹利群, 任晓炜, 李涛.《气象科技》2019, 47(5): 768-772

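Under a Hadoop distributed computing and storage architecture, custom ETL data-cleaning rules are used to merge massive hourly single-station files from automatic weather stations into large files by year and station number and store them in HDFS. The SparkSQL parallel computing framework is then used to compute daily statistics of common meteorological elements. The results show that data processing and retrieval are markedly faster than with a relational database. With SparkSQL, statistical queries over multiple meteorological elements, multiple stations, and long time series all achieve second-level response, and the advantage becomes more pronounced as the number of stations and the time span increase. The approach supports this kind of meteorological data service more efficiently and offers a new way of moving massive meteorological data processing from relational databases to a distributed big data architecture.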
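The two steps in this abstract, merging hourly records by year and station and then producing daily statistics, can be sketched in plain Python. The record layout `(station, "YYYY-MM-DD HH", value)`, the station IDs, and the function names are assumptions for illustration; the paper itself uses ETL rules and SparkSQL, which are not reproduced here.

```python
from collections import defaultdict


def merge_key(record):
    """Key an hourly record by (year, station id)."""
    station, timestamp, _ = record
    return (timestamp[:4], station)


def group_by_year_station(records):
    """Collect hourly records into one group per (year, station)."""
    groups = defaultdict(list)
    for rec in records:
        groups[merge_key(rec)].append(rec)
    return dict(groups)


def daily_mean(records):
    """Daily mean of the observed value per (station, date)."""
    sums = defaultdict(lambda: [0.0, 0])
    for station, timestamp, value in records:
        acc = sums[(station, timestamp[:10])]
        acc[0] += value
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


# Hypothetical hourly observations for one station across two years.
hourly = [
    ("S001", "2018-01-01 00", 2.0),
    ("S001", "2018-01-01 01", 4.0),
    ("S001", "2019-01-01 00", 1.0),
]
groups = group_by_year_station(hourly)
means = daily_mean(hourly)
```

In SparkSQL the second step would typically be a `GROUP BY` over station and date; the dictionary accumulation above stands in for that aggregation.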

8.

A spatiotemporal indexing approach for efficient processing of big array-based climate data with MapReduce

Zhenlong Li, Fei Hu, John L. Schnase, Daniel Q. Duffy, Tsengdar Lee, Michael K. Bowen.《International journal of geographical information science》2017, 31(1): 17-35

Climate observations and model simulations are producing vast amounts of array-based spatiotemporal data. Efficient processing of these data is essential for assessing global challenges such as climate change, natural disasters, and diseases. This is challenging not only because of the large data volume, but also because of the intrinsic high-dimensional nature of geoscience data. To tackle this challenge, we propose a spatiotemporal indexing approach to efficiently manage and process big climate data with MapReduce in a highly scalable environment. Using this approach, big climate data are directly stored in a Hadoop Distributed File System in its original, native file format. A spatiotemporal index is built to bridge the logical array-based data model and the physical data layout, which enables fast data retrieval when performing spatiotemporal queries. Based on the index, a data-partitioning algorithm is applied to enable MapReduce to achieve high data locality, as well as balancing the workload. The proposed indexing approach is evaluated using the National Aeronautics and Space Administration (NASA) Modern-Era Retrospective Analysis for Research and Applications (MERRA) climate reanalysis dataset. The experimental results show that the index can significantly accelerate querying and processing (~10× speedup compared to the baseline test using the same computing cluster), while keeping the index-to-data ratio small (0.0328%). The applicability of the indexing approach is demonstrated by a climate anomaly detection deployed on a NASA Hadoop cluster. This approach is also able to support efficient processing of general array-based spatiotemporal data in various geoscience domains without special configuration on a Hadoop cluster.
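One way to picture an index that bridges a logical array model and a physical file layout is a lookup table from (variable, time step) to a byte range. The sketch below assumes a file that stores fixed-size chunks ordered by variable and then by time. That layout, the file name, and the names `ChunkRef`, `build_index`, and `query` are illustrative assumptions, not the index structure used in the paper.

```python
from typing import NamedTuple


class ChunkRef(NamedTuple):
    path: str    # file holding the chunk (hypothetical name)
    offset: int  # byte offset of the chunk within the file
    length: int  # chunk size in bytes


def build_index(path, variables, n_times, chunk_bytes):
    """Index a file assumed to hold fixed-size chunks, variable-major then time."""
    index = {}
    offset = 0
    for var in variables:
        for t in range(n_times):
            index[(var, t)] = ChunkRef(path, offset, chunk_bytes)
            offset += chunk_bytes
    return index


def query(index, var, t_start, t_end):
    """Return chunk references for one variable over the inclusive time range."""
    return [index[(var, t)] for t in range(t_start, t_end + 1) if (var, t) in index]


# Two variables, four time steps, 100-byte chunks in one hypothetical file.
idx = build_index("merra_sample.hdf", ["T", "U"], 4, 100)
hits = query(idx, "U", 1, 2)
```

A query then reads only the listed byte ranges instead of scanning whole files, and a partitioner could group the returned references by the node that stores them.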

9.

A spatiotemporal algebra in Hadoop for moving objects

Mohamed S. Bakli, Mahmoud A. Sakr, Taysir Hassan A. Soliman.《地球空间信息科学学报》2018, 21(2): 102-114

Spatiotemporal data represent the real-world objects that move in geographic space over time. The enormous numbers of mobile sensors and location tracking devices continuously produce massive amounts of such data. This leads to the need for scalable spatiotemporal data management systems. Such systems shall be capable of representing spatiotemporal data in persistent storage and in memory. They shall also provide a range of query processing operators that may scale out in a cloud setting. Currently, very little research has been conducted to meet this requirement. This paper proposes a Hadoop extension with a spatiotemporal algebra. The algebra consists of moving object types added as Hadoop native types, and operators on top of them. The Hadoop file system has been extended to support parameter passing for files that contain spatiotemporal data, and for operators that can be unary or binary. Both the types and operators are accessible for the MapReduce jobs. Such an extension allows users to write Hadoop programs that can perform spatiotemporal analysis. Certain queries may call more than one operator for different jobs and keep these operators running in parallel. This paper describes the design and implementation of this algebra, and evaluates it using a benchmark that is specific to moving object databases.

10.

A Storage Method for Massive Small Files Combining RDBMS and Hadoop (total citations: 1; self-citations: 0; citations by others: 1)

刘小俊, 徐正全, 潘少明.《武汉大学学报(信息科学版)》2013, 38(1): 113-115, 120

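This paper proposes a storage method for massive small files that combines the respective strengths of an RDBMS and Hadoop cloud storage while avoiding the weaknesses of each. Experiments with a prototype system show that the method meets the small-file storage needs of "digital city" applications and can also serve as a storage system for other massive small-file data with structured characteristics.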