Similar Literature (20 results)
1.
A mathematical framework for earth science data provenance tracing
This paper identifies three distinct data production paradigms for Earth science data, each having its own versioning structure:
–  Climate data record production, used when the data producer’s dominant concern is providing a homogeneous error structure for each data set version, particularly when the data record is expected to cover a long time period

2.
The Nu Expression for Probabilistic Data Integration
The general problem of data integration is expressed as that of combining probability distributions conditioned to each individual datum or data event into a posterior probability for the unknown conditioned jointly to all data. Any such combination of information requires taking into account data interaction for the specific event being assessed. The nu expression provides an exact analytical representation of such a combination. This representation allows a clear and useful separation of the two components of any data integration algorithm: individual data information content and data interaction, the latter being different from data dependence. Any estimation workflow that fails to address data interaction is not only suboptimal, but may result in severe bias. The nu expression reduces the possibly very complex joint data interaction to a single multiplicative correction parameter ν₀, difficult to evaluate but whose exact analytical expression is given; availability of such an expression provides avenues for its determination or approximation. The case ν₀ = 1 is more comprehensive than data conditional independence; it delivers a preliminary robust approximation in the presence of actual data interaction. An experiment where the exact results are known allows the results of the ν₀ = 1 approximation to be checked against the traditional estimators based on the assumption of data independence.
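The combination rule summarized above can be illustrated with a small sketch. The distance-ratio form of the nu expression (x/x₀ = ν₀ · ∏ᵢ xᵢ/x₀, with x = (1 − P)/P) is taken from the published nu/tau-model literature rather than from this abstract, and the function and variable names are illustrative; ν₀ = 1 is the robust approximation the abstract refers to.

```python
def nu_posterior(prior, conditionals, nu0=1.0):
    """Combine single-datum posteriors P(A | D_i) into P(A | D_1, ..., D_n).

    Uses the distance-ratio form of the nu expression,
        x / x0 = nu0 * prod_i (x_i / x0),
    where x = (1 - P) / P is the probability-to-odds "distance".
    nu0 = 1 is the robust approximation discussed in the abstract;
    the exact nu0 accounts for joint data interaction.
    """
    x0 = (1.0 - prior) / prior          # distance of the prior
    x = nu0 * x0
    for p in conditionals:
        xi = (1.0 - p) / p              # distance of each single-datum posterior
        x *= xi / x0
    return 1.0 / (1.0 + x)              # back-transform distance to probability


# Example: prior P(A) = 0.2; two data events individually update it to 0.4 and 0.5.
print(nu_posterior(0.2, [0.4, 0.5]))    # about 0.73 under nu0 = 1
```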

3.
A diverse set of computer programs has been developed at the Lawrence Livermore National Laboratory (LLNL) to process geophysical data obtained from boreholes. These programs support such services as digitizing analog records, reading and processing raw data, cataloging and storing processed data, retrieving selected data for analysis, and generating data plots on several different devices. A variety of geophysical data types are accommodated, including both wireline logs and laboratory analyses of downhole samples. Many processing tasks are handled by means of a single, flexible, general-purpose, data-manipulation program. Separate programs are available for processing data from density, gravity, velocity, and epithermal neutron logs. The computer-based storage and retrieval system, which has been in operation since 1973, currently contains over 4400 data files. Most of this data was obtained from the Nevada Test Site (NTS) in conjunction with the nuclear test program. Each data file contains a single geophysical parameter as a function of depth. Automatic storage and retrieval are facilitated by the assignment of unique file names that define the storage location of each data file. Files of interest to the user may be located and retrieved by means of a search program that examines the catalog. A convention recognized by all programs in the system is that of a zero ordinate denoting a gap in an otherwise continuous data trace. This convention provides a simple mechanism for editing and displaying data files in an automated and consistent manner.
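As a small illustration of the zero-ordinate convention described above, the sketch below splits a depth/value trace into continuous segments wherever the value is exactly zero, so gaps can be edited and displayed consistently. It is a hypothetical reconstruction, not LLNL code, and the names are illustrative.

```python
import numpy as np

def split_at_gaps(depth, value):
    """Split a depth/value trace into continuous segments.

    Follows the convention described above: a zero ordinate marks a gap in an
    otherwise continuous trace, so the trace is broken wherever the value is
    exactly zero. Function and variable names are illustrative.
    """
    depth = np.asarray(depth, dtype=float)
    value = np.asarray(value, dtype=float)
    segments, start = [], None
    for i, v in enumerate(value):
        if v != 0.0 and start is None:
            start = i                                   # a continuous run begins
        elif v == 0.0 and start is not None:
            segments.append((depth[start:i], value[start:i]))
            start = None                                # the run ends at a gap
    if start is not None:
        segments.append((depth[start:], value[start:]))
    return segments


# Plotting each returned segment separately makes gaps appear as breaks in the curve.
print(len(split_at_gaps([0, 1, 2, 3, 4], [2.3, 2.5, 0.0, 2.7, 2.8])))   # 2 segments
```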

4.
Unidata’s Common Data Model mapping to the ISO 19123 Data Model
Access to real-time distributed Earth and Space Science (ESS) information is essential for enabling critical Decision Support Systems (DSS). Thus, data model interoperability between the ESS and DSS communities is a decisive achievement for enabling cyber-infrastructure which aims to serve important societal benefit areas. The ESS community is characterized by a certain heterogeneity, as far as data models are concerned. Recent spatial data infrastructures implement international standards for the data model in order to achieve interoperability and extensibility. This paper presents well-accepted ESS data models, introducing a unified data model called the Common Data Model (CDM). CDM mapping into the corresponding elements of the international standard coverage data model of ISO 19123 is presented and discussed at the abstract level. The mapping of CDM scientific data types to the ISO coverage model is a first step toward interoperability of data systems. This mapping will provide the abstract framework that can be used to unify subsequent efforts to define appropriate conventions along with explicit agreed-upon encoding forms for each data type. As a valuable case in point, the content mapping rules for CDM grid data are discussed addressing a significant example.
Lorenzo Bigagli (URL: www.imaa.cnr.it)

5.
This paper introduces a novel web-based database, SHRIMPDB, to support the efficient reutilization of U-Th-Pb geochronological data from sensitive high-resolution ion microprobe (SHRIMP) measurements. In order to provide complete data content that can be reutilized by Earth scientists, a new data model containing analytical data and relevant sample metadata is proposed according to analyses of measurement procedures and the data characteristics of SHRIMP. Vivid data visualization, real-time data query interfaces (including a novel and intuitive polygonal region search), and a pragmatic data management module are designed and implemented using web-based and cloud GIS-based technologies, which provide a platform for Earth scientists to efficiently curate and share SHRIMP data on the internet. An incentive that encourages geochronologists to contribute data is suggested through cooperation between SHRIMPDB and the Beijing SHRIMP center. The database is currently under evaluation at the Beijing SHRIMP center. SHRIMPDB is globally available online at http://202.198.17.27/shrimpdb/home.

6.
The data-driven research paradigm is revolutionizing the geosciences, and effective management and sharing of massive data are prerequisites for using those data efficiently. The British Geological Survey (BGS), the earliest national geological survey to be established, holds vast geoscience data resources and, through its comprehensive push toward digitization in recent years, has moved to the forefront of open data sharing worldwide. This article analyzes and surveys how the BGS manages its data resources and shares its data, describing in detail the main components of its one-stop management platform, OpenGeoscience, as well as the databases built in cooperation with partner organizations. OpenGeoscience brings together the data resources held across the Survey and links all of its data sets organically through a series of data-sharing services; by combining data and models it both meets users' data needs and extends the applications of the data in all directions, making it a good example of geoscience digitization within the framework of Earth system science research.

7.
8.
Benford’s Law gives the expected frequencies of the digits in tabulated data and asserts that the lower digits (1, 2, and 3) are expected to occur more frequently than the higher digits. This study tested whether the law applied to two large earth science data sets. The first test analyzed streamflow statistics and the finding was a close conformity to Benford’s Law. The second test analyzed the sizes of lakes and wetlands, and the finding was that the data did not conform to Benford’s Law. Further analysis showed that the lake and wetland data followed a power law. The expected digit frequencies for data following a power law were derived, and the lake data had a close fit to these expected digit frequencies. The use of Benford’s Law could serve as a quality check for streamflow data subsets, perhaps related to time or geographical area. Also, with the importance of lakes as essential components of the water cycle, either Benford’s Law or the expected digit frequencies of data following a power law could be used as an authenticity and validity check on future databases dealing with water bodies. We give several applications and avenues for future research, including an assessment of whether the digit frequencies of data could be used to derive the power law exponent, and whether the digit frequencies could be used to verify the range over which a power law applies. Our results indicate that data related to water bodies should conform to Benford’s Law and that nonconformity could indicate (a) an incomplete data set, (b) the sample not being representative of the population, (c) excessive rounding of the data, (d) data errors, inconsistencies, or anomalies, and/or (e) conformity to a power law with a large exponent.
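A minimal sketch of the kind of digit-frequency check described above: it computes observed first-digit frequencies for a data set and the Benford expectation P(d) = log10(1 + 1/d). The function names, sample values, and choice of comparison statistic are illustrative, not taken from the paper.

```python
import numpy as np

# Benford's Law: P(d) = log10(1 + 1/d) for first digits d = 1..9.
BENFORD = {d: float(np.log10(1 + 1 / d)) for d in range(1, 10)}

def first_digit(x):
    """Leading (most significant) digit of a non-zero number."""
    x = abs(float(x))
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

def digit_frequencies(data):
    """Observed first-digit frequencies of a data set (zeros are skipped)."""
    digits = [first_digit(x) for x in data if x != 0]
    return {d: digits.count(d) / len(digits) for d in range(1, 10)}

# A conformity check for, say, a streamflow subset could compare the two tables,
# e.g. with a chi-square statistic, before accepting the subset as complete.
sample = [1230.0, 18.7, 2.4, 310.0, 0.56, 47.0, 1.9, 880.0, 12.3]
observed = digit_frequencies(sample)
for d in range(1, 10):
    print(d, round(BENFORD[d], 3), round(observed[d], 3))
```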

9.
Global health threats such as the recent Ebola and Zika virus outbreaks require rapid and robust responses to prevent, reduce and recover from disease dispersion. As part of broader big data and digital humanitarianism discourses, there is an emerging interest in data produced through mobile phone communications for enhancing the data environment in such circumstances. This paper assembles user perspectives and critically examines existing evidence and future potential of mobile phone data derived from call detail records (CDRs) and two-way short message service (SMS) platforms, for managing and responding to humanitarian disasters caused by communicable disease outbreaks. We undertake a scoping review of relevant literature and in-depth interviews with key informants to ascertain: (i) the information that can be gathered from CDRs or SMS data; (ii) the phase(s) in the disease disaster management cycle when mobile data may be useful; (iii) the value added over conventional approaches to data collection and transfer; (iv) the barriers and enablers to use of mobile data in disaster contexts; and (v) the social and ethical challenges. Based on this evidence we develop a typology of mobile phone data sources, types, and end-uses, and a decision-tree for mobile data use, designed to enable effective use of mobile data for disease disaster management. We show that mobile data holds great potential for improving the quality, quantity and timing of selected information required for disaster management, but that testing and evaluation of the benefits, constraints and limitations of mobile data use in a wider range of mobile-user and disaster contexts is needed to fully understand its utility, validity, and limitations.

10.
INTRODUCTION Geographic information systems (GIS) is a new technology for storing and processing spatial information, which can combine graphics with many types of databases. It can also exhibit accurate and real spatial information with charts and texts according to actual need, and can integrate geographic locations and correlated data attributes as an organic whole. Geoscientists have shown GIS to be a very useful tool in the analysis of geoscience problems (Zhao et al., 2004; Singer, 1993…

11.
An Empirical Failure Criterion for Intact Rocks
The parameter mi is an important rock property parameter required for use of the Hoek–Brown failure criterion. The conventional method for determining mi is to fit a series of triaxial compression test data. In the absence of laboratory test data, guideline charts have been provided by Hoek to estimate the mi value. In the conventional Hoek–Brown failure criterion, the mi value is a constant for a given rock. It is observed that using a constant mi may not fit the triaxial compression test data well for some rocks. In this paper, a negative exponent empirical model is proposed to express mi as a function of confinement, and this exercise leads us to a new empirical failure criterion for intact rocks. Triaxial compression test data of various rocks are used to fit parameters of this model. It is seen that the new empirical failure criterion fits the test data better than the conventional Hoek–Brown failure criterion for intact rocks. The conventional Hoek–Brown criterion fits the test data well in the high-confinement region but fails to match data well in the low-confinement and tension regions. In particular, it overestimates the uniaxial compressive strength (UCS) and the uniaxial tensile strength of rocks. On the other hand, curves fitted by the proposed empirical failure criterion match test data very well, and the estimated UCS and tensile strength agree well with test data.
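For context, a small sketch of fitting the conventional Hoek–Brown criterion for intact rock (s = 1, a = 0.5) to triaxial data is given below. The confinement-dependent mi model proposed in the paper is not reproduced here because its exact form is not given in the abstract, and the data values shown are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def hoek_brown_intact(sigma3, sigma_ci, mi):
    """Conventional Hoek-Brown criterion for intact rock (s = 1, a = 0.5):
    sigma1 = sigma3 + sigma_ci * sqrt(mi * sigma3 / sigma_ci + 1)."""
    return sigma3 + sigma_ci * np.sqrt(mi * sigma3 / sigma_ci + 1.0)

# Hypothetical triaxial test data (sigma3, sigma1) in MPa, for illustration only.
sigma3 = np.array([0.0, 5.0, 10.0, 20.0, 40.0])
sigma1 = np.array([110.0, 160.0, 200.0, 265.0, 370.0])

# Least-squares fit of sigma_ci and a constant mi, as in the conventional criterion.
(fit_ci, fit_mi), _ = curve_fit(hoek_brown_intact, sigma3, sigma1, p0=(100.0, 10.0))
print(f"sigma_ci = {fit_ci:.1f} MPa, mi = {fit_mi:.1f}")
```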

12.
To understand the needs of geological data users more accurately and to close the information gap between data users and data managers, the National Geological Archives of China collected access logs from the main website of the Digital Geological Archives. The data set was recorded automatically by computer and captures complete information such as each visitor's location, search keywords, and IP address. Standardized data-processing methods and a quality-control system were adopted to make better use of these data. The data set provides the access records of the Digital Geological Archives main site from 2014 to 2017 and effectively reflects visitors' behavior when retrieving geological data; it can serve as a basis for future website construction, for the development and utilization of geological data, and for the management and servicing of geological data.

13.
Building models in the Earth Sciences often requires the solution of an inverse problem: some unknown model parameters need to be calibrated with actual measurements. In most cases, the set of measurements cannot completely and uniquely determine the model parameters; hence multiple models can describe the same data set. Bayesian inverse theory provides a framework for solving this problem. Bayesian methods rely on the fact that the conditional probability of the model parameters given the data (the posterior) is proportional to the likelihood of observing the data and a prior belief expressed as a prior distribution of the model parameters. When the prior distribution is not Gaussian and the relation between data and parameters (forward model) is strongly non-linear, one has to resort to iterative samplers, often Markov chain Monte Carlo methods, for generating samples that fit the data likelihood and reflect the prior model statistics. While theoretically sound, such methods can be slow to converge, and are often impractical when the forward model is CPU demanding. In this paper, we propose a new sampling method that allows sampling from a variety of priors and conditioning of model parameters to a variety of data types. The method does not rely on the traditional Bayesian decomposition of posterior into likelihood and prior; instead it uses so-called pre-posterior distributions, i.e. the probability of the model parameters given some subset of the data. The use of pre-posteriors allows the data to be decomposed into so-called “easy data” (or linear data) and “difficult data” (or nonlinear data). The method relies on fast non-iterative sequential simulation to generate model realizations. The difficult data are matched by perturbing an initial realization using a perturbation mechanism termed “probability perturbation.” The probability perturbation method moves the initial guess closer to matching the difficult data, while maintaining the prior model statistics and the conditioning to the linear data. Several examples are used to illustrate the properties of this method.
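The Bayesian decomposition that the abstract contrasts with its pre-posterior approach is the standard relation, written here in generic notation (not notation specific to this paper):

```latex
% Posterior of model parameters m given data d: likelihood times prior.
\pi(\mathbf{m} \mid \mathbf{d}) \;\propto\; L(\mathbf{d} \mid \mathbf{m})\, \pi(\mathbf{m})
```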

14.
Cluster analysis can be used to group samples and to develop ideas about the multivariate geochemistry of the data set at hand. Due to the complex nature of regional geochemical data (neither normal nor log-normal, strongly skewed, often multi-modal data distributions, data closure), cluster analysis results often strongly depend on the preparation of the data (e.g. choice of the transformation) and on the clustering algorithm selected. Different variants of cluster analysis can lead to surprisingly different cluster centroids, cluster sizes and classifications even when using exactly the same input data. Cluster analysis should not be misused as a statistical “proof” of certain relationships in the data. The use of cluster analysis as an exploratory data analysis tool requires a powerful program system to test different data preparation, processing and clustering methods, including the ability to present the results in a number of easy to grasp graphics. Such a tool has been developed as a package for the R statistical software. Two example data sets from geochemistry are used to demonstrate how the results change with different data preparation and clustering methods. A data set from S-Norway with a known number of clusters and cluster membership is used to test the performance of different clustering and data preparation techniques. For a complex data set from the Kola Peninsula, cluster analysis is applied to explore regional data structures.
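As an illustration of how data preparation enters such a workflow, the sketch below applies a centered log-ratio (clr) transform, a common way to handle the data closure mentioned above, before a simple k-means clustering. It uses Python/scikit-learn rather than the R package described in the abstract, and the data are synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import scale

def clr(x):
    """Centered log-ratio transform for closed (compositional) data:
    each row is divided by its geometric mean and log-transformed."""
    x = np.asarray(x, dtype=float)
    gmean = np.exp(np.log(x).mean(axis=1, keepdims=True))
    return np.log(x / gmean)

# Synthetic element concentrations (rows = samples, columns = elements).
rng = np.random.default_rng(0)
concentrations = rng.lognormal(mean=1.0, sigma=0.5, size=(200, 6))

# Different preparations (raw, log, clr, standardized) can yield different clusters;
# here the clr-transformed, standardized data are grouped into three clusters.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scale(clr(concentrations)))
print(np.bincount(labels))
```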

15.
Geological databases are an important component of geo-information science and provide a reliable data foundation for Earth science research. Re-Os isotope dating has been widely applied in studies of ore genesis, mantle evolution, and marine environments; building a Re-Os isotope dating database can integrate the relevant research results and raise the level of integrated management and application of data in this field. This paper follows the technical route of building a GIS spatial database, covering the database design concept, the methods for integrating and processing the data, …

16.
Characteristics of geological big data and their rational development and utilization
The rational development and use of big data will open up a new cognitive space, offer new models for solving all kinds of practical problems, give rise to an unprecedented field of the digital economy, and create new ways of living. Combining "mathematical geology" with "information technology" yields "digital geology," and "digital geology" is the "data science" of the geological sciences. Geological data science studies geology by means of data, researching, developing, and using geological big data according to the characteristics of geological data and the needs of geological work. This paper outlines the main characteristics of geological data, including their mixed nature, sampled nature, causality, spatio-temporal character, polymorphism, and multivariate character, and proposes that the development and use of geological big data should begin with building a "knowledge base," on which a "database," a "model base," and a "method base" are then built, so that big data can be acquired, analyzed, studied, and applied in a targeted way.

17.
Interpretation of geophysical data or other indirect measurements provides large-scale soft secondary data for modeling hard primary data variables. Calibration allows such soft data to be expressed as prior probability distributions of nonlinear block averages of the primary variable; poorer-quality soft data leads to prior distributions with large variance, while better-quality soft data leads to prior distributions with low variance. Another important feature of most soft data is that the quality is spatially variable; soft data may be very good in some areas while poorer in other areas. The main aim of this paper is to propose a new method of integrating such soft data, which is large-scale and has locally variable precision. The technique of simulated annealing is used to construct stochastic realizations that reflect the uncertainty in the soft data. This is done by constraining the cumulative probability values of the block average values to follow a specified distribution. These probability values are determined by the local soft prior distribution and a nonlinear average of the small-scale simulated values within the block, which are all known. For each realization to accurately capture the information contained in the soft data distributions, we show that the probability values should be uniformly distributed between 0 and 1. An objective function is then proposed for a simulated annealing based approach to enforce this uniform probability constraint. The theoretical justification of this approach is discussed, implementation details are considered, and an example is presented.
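One plausible way to score the uniform-probability constraint described above is a histogram misfit between the block probability values and a Uniform(0, 1) target, as sketched below. This is an illustrative objective only, not necessarily the objective function proposed in the paper, and the names are illustrative.

```python
import numpy as np

def uniformity_objective(prob_values, n_bins=10):
    """Misfit between block probability values and a Uniform(0, 1) target.

    prob_values holds, for every block, the prior cumulative probability of the
    nonlinear block average computed from the current fine-scale realization.
    As the abstract argues, these values should be uniform on [0, 1]; this
    histogram-based misfit is one plausible way to penalize departures from
    uniformity (illustrative, not necessarily the paper's objective function).
    """
    p = np.asarray(prob_values, dtype=float)
    observed, _ = np.histogram(p, bins=n_bins, range=(0.0, 1.0))
    expected = len(p) / n_bins
    return float(np.sum((observed - expected) ** 2))

# In an annealing loop this objective would be re-evaluated after each perturbation
# of the realization and accepted or rejected according to the annealing schedule.
print(uniformity_objective(np.random.default_rng(1).uniform(size=500)))
```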

18.
Geological data frequently have a heavy-tailed normal-in-the-middle distribution, which gives rise to grade distributions that appear to be normal except for the occurrence of a few outliers. This same situation also applies to log-transformed data to which lognormal kriging is to be applied. For such data, linear kriging is nonrobust in that (1) kriged estimates tend to infinity as the outliers do, and (2) it is also not minimum mean squared error. The more general nonlinear method of disjunctive kriging is even more nonrobust, computationally more laborious, and in the end need not produce better practical answers. We propose a robust kriging method for such nearly normal data based on linear kriging of an editing of the data. It is little more laborious than conventional linear kriging and, used in conjunction with a robust estimator of the variogram, provides good protection against the effects of data outliers. The method is also applicable to time series analysis.
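As an illustration of "editing" the data before linear kriging, the sketch below clips values at median ± k·MAD bounds. This is one simple robust edit for nearly normal data with outliers; it is not necessarily the specific editing rule proposed by the authors, and the names and example values are illustrative.

```python
import numpy as np

def winsorize_mad(values, k=3.0):
    """Edit a data set by clipping values at median +/- k * MAD.

    One simple robust 'editing' rule in the spirit of the abstract (linear kriging
    applied to an edited data set); it is not necessarily the specific edit the
    authors propose. The factor 1.4826 makes the MAD consistent with the
    standard deviation under normality.
    """
    v = np.asarray(values, dtype=float)
    med = np.median(v)
    mad = 1.4826 * np.median(np.abs(v - med))
    return np.clip(v, med - k * mad, med + k * mad)

# Example: a single extreme grade is pulled back toward the bulk of the data
# before variogram estimation and linear kriging.
print(winsorize_mad([2.1, 1.8, 2.4, 2.0, 55.0, 1.9]))
```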

19.
Holistic understanding of estuarine and coastal environments across interacting domains with high-dimensional complexity can profitably be approached through data-centric synthesis studies. Synthesis has been defined as “the inferential process whereby new models are developed from analysis of multiple data sets to explain observed patterns across a range of time and space scales.” Examples include ecological synthesis (across ecosystem components or organization levels), spatial synthesis (across spatial scales or multiple ecosystems), and temporal synthesis (across temporal scales). Though data quantity and volume are increasingly accessible, infrastructures for data sharing, management, and integration remain fractured. Integrating heterogeneous data sets is difficult yet critical. Technological and cultural obstacles hamper finding, accessing, and integrating data to answer scientific and policy questions. To investigate synthesis within the estuarine and coastal science community, we held a workshop at a Coastal and Estuarine Research Federation conference and conducted two case studies involving synthesis science. The workshop indicated that data-centric synthesis approaches are valuable for (1) hypothesis testing, (2) baseline monitoring, (3) historical perspectives, and (4) forecasting. Case studies revealed important weaknesses in current data infrastructures and highlighted opportunities for ecological synthesis science. Here, we list requirements for a coastal and estuarine data infrastructure. We model data needs and suggest directions for moving forward. For example, we propose developing community standards, accommodating and integrating big and small data (e.g., sensor feeds and single data sets), and digitizing ‘dark data’ (inaccessible, non-curated, non-archived data potentially destroyed when researchers leave science).

20.
Because Ecoinformatics research collects large quantities of physical data, those data are difficult for research collaborators to clean, share, visualize, and analyze. To resolve this difficulty, this study presents online weather data analysis and visualization cyber-infrastructures consisting of (1) online weather data analysis and visualization tools and (2) a near real-time online weather data portal. Firstly, the online tools at www.twibl.org/weather provide data sharing in three web pages: information on instruments and site; data access protected by simple password security; and data analysis and visualization services, the so-called “Ecoinfows”. Secondly, the near real-time online weather data portal for visualizing and forecasting weather data from cloud storage of many automatic weather stations is online at www.twibl.org/aaportal. To overcome speed and accessibility problems, we developed these tools with several technologies, i.e. cloud computing, online computing XML (webMathematica), and binary access data conversion.
