Similar literature
1.
Support Vector Machine (SVM) is a popular data mining technique that has been widely applied to astronomical tasks, especially stellar spectra classification. Because SVM does not take the data distribution into account, its classification efficiency cannot be greatly improved. SVM also ignores the internal information of the training dataset, such as the within-class and between-class structure. In view of this, we propose a new classification algorithm, SVM based on Within-Class Scatter and Between-Class Scatter (WBS-SVM). Like SVM, WBS-SVM tries to find an optimal hyperplane that separates two classes; the difference is that it incorporates the minimum within-class scatter and maximum between-class scatter of Linear Discriminant Analysis (LDA) into SVM. These two scatters represent the distribution of the training dataset, and the optimization of WBS-SVM ensures that samples in the same class are as close together as possible while samples in different classes are as far apart as possible. Experiments on K-, F-, and G-type stellar spectra from the Sloan Digital Sky Survey (SDSS) Data Release 8 show that the proposed WBS-SVM can greatly improve classification accuracy.
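The two scatters named in this abstract can be illustrated with a small sketch. It computes the standard LDA within-class scatter S_w and between-class scatter S_b for a toy two-class dataset; the function names and the toy feature vectors are assumptions made for illustration, and the abstract does not say how WBS-SVM weights these terms in its objective, so that part is not shown.

```python
def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def add_outer(acc, d, weight=1.0):
    """Add weight * d d^T to the square matrix acc in place."""
    for i, di in enumerate(d):
        for j, dj in enumerate(d):
            acc[i][j] += weight * di * dj

def scatter_matrices(samples_by_class):
    """Return (S_w, S_b) for a dict {label: list of feature vectors}."""
    all_rows = [x for rows in samples_by_class.values() for x in rows]
    dim = len(all_rows[0])
    overall = mean_vector(all_rows)
    s_w = [[0.0] * dim for _ in range(dim)]
    s_b = [[0.0] * dim for _ in range(dim)]
    for rows in samples_by_class.values():
        mu = mean_vector(rows)
        # Within-class scatter: spread of each sample around its class mean.
        for x in rows:
            add_outer(s_w, [xi - mi for xi, mi in zip(x, mu)])
        # Between-class scatter: spread of class means around the overall mean.
        add_outer(s_b, [mi - oi for mi, oi in zip(mu, overall)], weight=len(rows))
    return s_w, s_b

# Hypothetical two-feature samples standing in for K- and F-type spectra.
toy = {"K": [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]],
       "F": [[6.0, 1.0], [7.0, 2.0], [8.0, 3.0]]}
S_w, S_b = scatter_matrices(toy)
```

A small S_w with a large S_b means the classes are compact and well separated, which is the property the abstract says the WBS-SVM optimization encourages.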

2.
Support Vector Machine (SVM) is an important and widely used method for stellar spectral classification. However, its classification efficiency cannot be greatly improved because it does not take the class distribution into account. To address this problem, a modified SVM named Minimum within-class and Maximum between-class scatter Support Vector Machine (MMSVM) is constructed. MMSVM combines the advantages of Fisher's Discriminant Analysis (FDA) and SVM, and comparative experiments on the Sloan Digital Sky Survey (SDSS) show that MMSVM performs better than SVM.

3.
With the growing number of stellar spectra, how to classify them automatically has attracted astronomers' attention. Support Vector Machine (SVM), a typical classifier, has been widely used in stellar spectra classification. Because of its limited performance on various classification problems and its long training time, this paper instead uses a model with a pair of hyperspheres, the Twin Hypersphere Model (THM) proposed by Peng and Xu, for stellar spectra classification. In THM, the samples in one hypersphere are far from the other hypersphere in terms of Euclidean distance. Comparative experiments with SVM and Twin Support Vector Machine (TWSVM) on the SDSS datasets show that THM gives the best classification accuracy: 0.8836 for type F, 0.9446 for type G, and 0.9509 for type K, compared with 0.8000, 0.8484, and 0.8911 for SVM and 0.8413, 0.8699, and 0.9109 for TWSVM. We conclude that THM performs better than traditional techniques such as SVM and TWSVM on K-, F-, and G-type stellar spectra classification.

4.
With the help of computer tools and algorithms, automatic stellar spectral classification has become an area of current interest. The process mainly includes two steps: dimensionality reduction and classification. Principal Component Analysis (PCA) is a popular dimensionality reduction technique that is widely used in stellar spectra classification. Another dimensionality reduction technique, Locality Preserving Projections (LPP), has not been widely used in astronomy. The advantage of LPP is that it preserves the local structure of the data after dimensionality reduction. In view of this, we investigate how to apply LPP+SVM to classify stellar spectral subclasses, and compare the performance of LPP with that of PCA. The classification process consists of the following steps. First, PCA and LPP are each applied to reduce the dimension of the spectral data. Then, Support Vector Machine (SVM) is used to classify 4 subclasses of K-type and 3 subclasses of F-type spectra from the Sloan Digital Sky Survey (SDSS). Finally, the performance of LPP+SVM is compared with that of PCA+SVM, and we find that LPP does better than PCA.

5.
In this work, we select spectra of stars with high signal-to-noise ratio from LAMOST data and map their MK classes to the spectral features. The equivalent widths of prominent spectral lines, which play a role similar to that of multi-color photometry, form a clean stellar locus well ordered by MK class. The advantage of the stellar locus in line indices is that it gives a natural and continuous classification of stars consistent with either the broadly used MK classes or stellar astrophysical parameters. We also employ an SVM-based classification algorithm to assign MK classes to LAMOST stellar spectra. We find that the completeness of the classification is up to 90% for A and G type stars, but falls to about 50% for OB and K type stars. About 40% of the OB and K type stars are misclassified as A and G type stars, respectively. This is likely because the differences in spectral features between late B type and early A type stars, or between late G and early K type stars, are very weak. The relatively poor performance of the automatic MK classification with SVM suggests that using line indices directly to classify stars is likely the preferable choice.
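The completeness figures quoted above are per-class recall values, which can be read off a confusion table. The sketch below shows that calculation; the counts are hypothetical numbers chosen only to mirror the percentages in the abstract and are not taken from the paper.

```python
def completeness(confusion):
    """Per-class completeness (recall) from confusion[true_label][predicted_label] counts."""
    result = {}
    for true_label, row in confusion.items():
        total = sum(row.values())
        # Fraction of objects of this true class that were assigned the same class.
        result[true_label] = row.get(true_label, 0) / total if total else float("nan")
    return result

# Hypothetical counts, not data from the paper.
counts = {
    "G": {"G": 90, "K": 10},
    "K": {"G": 40, "K": 50, "M": 10},
}
rates = completeness(counts)
```

With these assumed counts, 40 of the 100 K stars land in the G column, which is the kind of late-G/early-K confusion the abstract describes.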

6.
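Sky survey observations are closely linked to fields such as high-energy physics and black hole astronomy. For the binary galaxy/supernova classification problem, we study spectral data preprocessing and use cosine similarity to improve the PCA (Principal Component Analysis) spectral decomposition feature extraction method. Training a support vector machine on a data set of 5620 spectra drawn from SDSS (the Sloan Digital Sky Survey) and WISeREP (the Weizmann Interactive Supernova data REPository) yields a recognition model with a generalization error of 0.498%, together with classification probabilities for new samples. Building an NPSVM (Neyman-Pearson Support Vector Machine) model with the Neyman-Pearson decision method further reduces the miss rate for supernovae.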
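One way to picture the Neyman-Pearson step is as a constraint on the decision threshold: cap the fraction of missed supernovae first, then accept whatever galaxy error follows. The sketch below shows only that thresholding idea, assuming each spectrum already has a classifier score where higher means "more supernova-like"; the function names, the scores, and the 25% cap are illustrative assumptions, not the NPSVM construction used in the paper.

```python
import math

def np_threshold(positive_scores, max_miss_rate):
    """Largest threshold that leaves at most max_miss_rate of the positives below it."""
    ordered = sorted(positive_scores)
    allowed_misses = min(math.floor(max_miss_rate * len(ordered)), len(ordered) - 1)
    return ordered[allowed_misses]

def classify(score, threshold):
    """Label a spectrum from its score; scores at the threshold count as supernova."""
    return "supernova" if score >= threshold else "galaxy"

# Hypothetical validation scores for spectra known to be supernovae.
sn_scores = [0.9, 0.3, 0.8, 0.6, 0.95, 0.7, 0.5, 0.99]
threshold = np_threshold(sn_scores, max_miss_rate=0.25)
missed = sum(1 for s in sn_scores if classify(s, threshold) == "galaxy")
```

Lowering `max_miss_rate` pushes the threshold down, so fewer supernovae are missed at the cost of more galaxies being flagged.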

7.
Because of noise, distortion, the observational environment, and other factors, celestial spectra need appropriate preprocessing before automatic classification. We have studied the effect of data format and flux standardization on the automatic classification of sky survey spectra. A basic model that adapts to order-of-magnitude variations in flux is proposed, and the corresponding standardization methods are given. Our experimental results on galaxy and quasar classification show that the logarithmic wavelength data format is better for automatic spectral classification. These experiments verify the reasonableness of the proposed model and the performance of the given flux standardization methods. In particular, the commonly used flux standardization is found to be the worst of the standardizations compared for automatic spectral classification.
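As a rough illustration of the two preprocessing choices discussed here, the sketch below builds a wavelength grid that is uniform in log10(wavelength) and rescales a flux vector to unit Euclidean norm. Both functions are assumptions for illustration: the abstract does not state which standardization methods were compared, and unit-norm scaling is shown only as one scheme that is insensitive to order-of-magnitude flux changes.

```python
import math

def log_wavelength_grid(start, stop, n):
    """n wavelengths spaced uniformly in log10(wavelength), from start to stop inclusive."""
    lo, hi = math.log10(start), math.log10(stop)
    step = (hi - lo) / (n - 1)
    return [10 ** (lo + i * step) for i in range(n)]

def unit_norm(flux):
    """Scale a flux vector to unit Euclidean norm (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(f * f for f in flux))
    return [f / norm for f in flux] if norm > 0 else list(flux)

# Hypothetical wavelength range in angstroms and a toy flux vector.
grid = log_wavelength_grid(3800.0, 9200.0, 5)
faint = unit_norm([3.0, 4.0])
bright = unit_norm([3000.0, 4000.0])  # same shape, 1000 times brighter
```

On a logarithmic grid the ratio between neighboring wavelengths is constant, and after scaling the faint and bright spectra become the same vector.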

8.
We present an automatic, fast, accurate and robust method of classifying astronomical objects. The Self Organizing Map (SOM), an unsupervised Artificial Neural Network (ANN) algorithm, is used to classify stellar spectra. The SOM is used to form clusters of the different spectral classes in the Jacoby, Hunter and Christian (JHC) library. This ANN technique needs no training examples; the stellar spectral data sets are fed directly to the network for classification. The JHC library contains 161 spectra, of which 158 are selected for classification. These 158 spectra are the input vectors to the network and are mapped onto a two-dimensional output grid. Input vectors close to each other are mapped to the same or neighboring neurons in the output space, so similar objects form clusters in the output map, which makes high-dimensional data easy to analyze.

9.
We investigate the application of neural networks to the automation of MK spectral classification. The data set for this project consists of a set of over 5000 optical (3800–5200 Å) spectra obtained from objective prism plates from the Michigan Spectral Survey. These spectra, along with their two-dimensional MK classifications listed in the Michigan Henry Draper Catalogue, were used to develop supervised neural network classifiers. We show that neural networks can give accurate spectral type classifications (σ68 = 0.82 subtypes, σrms = 1.09 subtypes) across the full range of spectral types present in the data set (B2–M7). We show also that the networks yield correct luminosity classes for over 95 per cent of both dwarfs and giants with a high degree of confidence. Stellar spectra generally contain a large amount of redundant information. We investigate the application of principal components analysis (PCA) to the optimal compression of spectra. We show that PCA can compress the spectra by a factor of over 30 while retaining essentially all of the useful information in the data set. Furthermore, it is shown that this compression optimally removes noise and can be used to identify unusual spectra. This paper is a continuation of the work carried out by von Hippel et al. (Paper I).
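The PCA compression described above can be sketched in miniature: find the leading principal component, store one coefficient per spectrum, and reconstruct from it. The example uses power iteration on the covariance matrix with only the standard library; the function names and the three-pixel toy "spectra" are assumptions for illustration, and real spectra would need several components and a linear-algebra library.

```python
import math

def leading_component(rows, iterations=200):
    """Return (mean, unit vector) for the first principal component via power iteration."""
    n, dim = len(rows), len(rows[0])
    mean = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, mean)] for row in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    # Start vector; assumes it is not orthogonal to the leading eigenvector.
    v = [1.0] * dim
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def compress(row, mean, v):
    """Project one spectrum onto the component: a single number per spectrum."""
    return sum((x - m) * c for x, m, c in zip(row, mean, v))

def reconstruct(coeff, mean, v):
    """Rebuild an approximate spectrum from its single coefficient."""
    return [m + coeff * c for m, c in zip(mean, v)]

# Toy spectra that differ only by an overall scale, so one component suffices.
spectra = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0], [4.0, 8.0, 12.0]]
mean, pc1 = leading_component(spectra)
coeffs = [compress(s, mean, pc1) for s in spectra]
rebuilt = [reconstruct(c, mean, pc1) for c in coeffs]
```

Here three numbers per spectrum are replaced by one with no loss, because the toy data vary along a single direction; noise orthogonal to the kept components would be discarded in the same step.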

10.
A method combining the support vector machine (SVM) and K-Nearest Neighbors (KNN), labelled the SVM-KNN method, is used to construct a solar flare forecasting model. Based on a proven relationship between SVM and KNN, the SVM-KNN method improves the SVM classification algorithm by taking advantage of the KNN algorithm according to the distribution of test samples in the feature space. In our flare forecast study, sunspot and 10 cm radio flux data observed during Solar Cycle 23 are taken as predictors, and the model predicts whether an M class flare will occur in each active region within two days. The SVM-KNN method is compared with the SVM and neural-network-based methods. The test results indicate that the rate of correct predictions from the SVM-KNN method is higher than that from the other two methods. This method shows promise as a practicable future forecasting model.

11.
A new method for classification of galaxy spectra is presented, based on a recently introduced information theoretical principle, the information bottleneck. For any desired number of classes, galaxies are classified such that the information content about the spectra is maximally preserved. The result is classes of galaxies with similar spectra, where the similarity is determined via a measure of information. We apply our method to ∼6000 galaxy spectra from the ongoing 2dF redshift survey, and a mock-2dF catalogue produced by a cold dark matter (CDM) based semi-analytic model of galaxy formation. We find a good match between the mean spectra of the classes found in the data and in the models. For the mock catalogue, we find that the classes produced by our algorithm form an intuitively sensible sequence in terms of physical properties such as colour, star formation activity, morphology, and internal velocity dispersion. We also show the correlation of the classes with the projections resulting from a principal component analysis.

12.
The rapid development of large-scale sky survey projects has produced a large amount of stellar spectral data, which makes the automatic classification of stellar spectra a challenging task. In this paper, we propose a stellar spectral classification method based on a capsule network. First, using a one-dimensional convolutional network and the short-time Fourier transform (STFT), the one-dimensional spectra of F5, G5, and K5 types selected from LAMOST Data Release 5 (DR5) are converted into two-dimensional Fourier spectrum images. Then, these images are classified automatically by the capsule network. The capsule network preserves the hierarchical pose relationships among the entities in an image and needs no pooling layers. The experimental results show that, for F5-, G5-, and K5-type stellar spectra, its classification accuracy is superior to that of other classification methods.
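The STFT step, which turns a one-dimensional sequence into a two-dimensional image, can be sketched as follows. This is a plain windowed DFT written with the standard library; the window length, hop size, Hann window, and test signal are assumptions for illustration, and the one-dimensional convolutional stage and the capsule network from the abstract are not shown.

```python
import cmath
import math

def stft_magnitude(signal, window=8, hop=4):
    """Return a list of frames; each frame holds |DFT| for bins 0..window//2."""
    # Periodic Hann window to taper each segment.
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / window) for n in range(window)]
    frames = []
    for start in range(0, len(signal) - window + 1, hop):
        seg = [signal[start + n] * hann[n] for n in range(window)]
        frame = []
        for k in range(window // 2 + 1):
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / window)
                      for n in range(window))
            frame.append(abs(acc))
        frames.append(frame)
    return frames

# Toy "spectrum": a pure oscillation with 2 cycles per 8 samples.
signal = [math.cos(2 * math.pi * 2 * n / 8) for n in range(32)]
image = stft_magnitude(signal)  # rows = positions along the spectrum, columns = frequency bins
```

Each row of `image` describes the local frequency content around one position, so stacking the rows gives the two-dimensional array that an image classifier could consume.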

13.
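This paper presents CCD spectra of 125 MK standard stars, with spectral types from O to M and luminosity classes from V to I, forming a fairly complete two-dimensional classification framework. The spectral coverage extends from the traditional blue-violet region into the yellow-red region. The main spectral features and criteria suitable for stellar classification in the yellow-red region are examined and summarized in a preliminary way. These results are very useful for stellar spectral classification work at similar resolution.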

14.
We present a method for radical linear compression of data sets where the data are dependent on some number M of parameters. We show that, if the noise in the data is independent of the parameters, we can form M linear combinations of the data which contain as much information about all the parameters as the entire data set, in the sense that the Fisher information matrices are identical; i.e. the method is lossless. We explore how these compressed numbers fare when the noise is dependent on the parameters, and show that the method, though not precisely lossless, increases errors by a very modest factor. The method is general, but we illustrate it with a problem for which it is well-suited: galaxy spectra, the data for which typically consist of ∼10^3 fluxes, and the properties of which are set by a handful of parameters such as age, and a parametrized star formation history. The spectra are reduced to a small number of data, which are connected to the physical processes entering the problem. This data compression offers the possibility of a large increase in the speed of determining physical parameters. This is an important consideration as data sets of galaxy spectra reach 10^6 in size, and the complexity of model spectra increases. In addition to this practical advantage, the compressed data may offer a classification scheme for galaxy spectra which is based rather directly on physical processes.
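The lossless claim can be checked by hand for a single parameter. The derivation below is a sketch under stated assumptions: Gaussian noise with covariance C that does not depend on the parameter θ, a data vector x with mean μ(θ), and notation (b, y, μ_{,θ}) chosen here for illustration rather than taken from the abstract.

```latex
% Fisher information of the full data set (noise covariance C independent of \theta):
F = \mu_{,\theta}^{T} C^{-1} \mu_{,\theta},
\qquad \mu_{,\theta} \equiv \frac{\partial \mu}{\partial \theta}.

% Compress to one number y = b^{T} x with the weighting vector
b = \frac{C^{-1}\mu_{,\theta}}{\sqrt{\mu_{,\theta}^{T} C^{-1} \mu_{,\theta}}},
\qquad\text{so that}\qquad b^{T} C\, b = 1 .

% y is Gaussian with mean b^{T}\mu and unit variance, so its Fisher information is
F_{y} = \frac{\left(b^{T}\mu_{,\theta}\right)^{2}}{b^{T} C\, b}
      = \frac{\left(\mu_{,\theta}^{T} C^{-1} \mu_{,\theta}\right)^{2}}
             {\mu_{,\theta}^{T} C^{-1} \mu_{,\theta}}
      = \mu_{,\theta}^{T} C^{-1} \mu_{,\theta} = F .
```

The single number y therefore carries the same Fisher information about θ as the whole data vector, which is the single-parameter case of the statement in the abstract.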

15.
The second phase of the Small Main-belt Asteroid Spectroscopic Survey (SMASSII) produced an internally consistent set of visible-wavelength charge-coupled device (CCD) spectra for 1447 asteroids (Bus and Binzel 2002, Icarus, ). These data provide a basis for developing a new asteroid taxonomy that utilizes more of the information contained in CCD spectra. Here we construct a classification system that builds on the robust framework provided by existing asteroid taxonomies. In particular, we define three major groupings (the S-, C-, and X-complexes) that adhere to the classical definitions of the S-, C-, and X-type asteroids. A total of 26 classes are defined, based on the presence or absence of specific spectral features. Definitions and boundary parameters are provided for each class, allowing new spectral observations to be placed in this system. Of these 26 classes, 12 bear familiar single-letter designations that follow previous conventions: A, B, C, D, K, O, Q, R, S, T, V, and X. A new L-class is introduced to describe 35 objects with spectra having a steep UV slope shortward of 0.75 μm, but which are relatively flat longward of 0.75 μm. Asteroids with intermediate spectral characteristics are assigned multiletter designations: Cb, Cg, Cgh, Ch, Ld, Sa, Sk, Sl, Sq, Sr, Xc, Xe, and Xk. Members of the Cgh- and Ch-classes have spectra containing a 0.7-μm feature that is generally attributed to hydration. Although previously considered featureless, CCD observations reveal distinct features of varying strengths in the spectra of asteroids in the X-complex, thus allowing the Xc-, Xe-, and Xk-classes to be established. Most notably, the spectra of Xe-type asteroids contain an absorption feature centered near 0.49 μm that may be associated with troilite. Several new members are identified for previously unique or sparsely populated classes: 12 A-types, 3 O-types, and 3 R-types. 
Q-types are common within the near-Earth asteroid population but remain unobserved in the main belt. More than 30 new V-types are found in the vicinity of Vesta. The heliocentric distribution of the SMASSII taxonomic classes is similar to that determined from previous studies, though additional structure is revealed as a result of the larger sample size.

16.
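Machine learning has achieved great success in many fields, but its predictive performance often depends on the specific problem. Ensemble learning predicts results by combining multiple base classifiers, so it adapts well to varied scenarios and achieves high classification accuracy. To address the low classification accuracy on the darkest source magnitude set of stars/galaxies in the Sloan Digital Sky Survey (SDSS), a star/galaxy classification algorithm based on Stacking ensemble learning is proposed. The complete photometric data set is obtained from SDSS-DR7 (SDSS Data Release 7) and divided by magnitude into a bright source magnitude set, a dark source magnitude set, and a darkest source magnitude set. Only the darkest source magnitude set, whose classification is the most complex and difficult, is studied. First, 10-fold nested cross-validation is applied to the darkest source magnitude set; then Support Vector Machine (SVM), Random Forest (RF), XGBoost (eXtreme Gradient Boosting), and other algorithms are used to build the base classifier models, and the Gradient Boosting Decision Tree (GBDT) is used as the meta-classifier model. Finally, using indicators such as the classification accuracy for galaxies, the classification results are compared with those of the Function Tree (FT), SVM, RF, GBDT, XGBoost, Stacked Denoising AutoEncoders (SDAE), Deep Belief Network (DBN), and Deep Perception Decision Tree (DPDT) models. The experimental results show that, on the darkest source magnitude set, the Stacking ensemble learning model improves the galaxy classification accuracy by nearly 10% over the FT algorithm. It also improves considerably on other traditional machine learning algorithms, strong boosting algorithms, and deep learning algorithms.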

17.
Machine learning has achieved great success in many areas, but its predictive performance often depends on the specific problem. Ensemble learning predicts results by combining multiple base classifiers, so it adapts well to varied scenarios and achieves high classification accuracy. In response to the low classification accuracy on the darkest source magnitude set of stars/galaxies in the Sloan Digital Sky Survey (SDSS), a star/galaxy classification algorithm based on stacking ensemble learning is proposed in this paper. The complete photometric data set is obtained from the SDSS Data Release (DR) 7 and divided by magnitude into the bright source magnitude set, dark source magnitude set, and darkest source magnitude set. First, the 10-fold nested cross-validation method is applied to the darkest source magnitude set; then the Support Vector Machine (SVM), Random Forest (RF), and eXtreme Gradient Boosting (XGBoost) algorithms are used to build the base-classifier models, and the Gradient Boosting Decision Tree (GBDT) is used as the meta-classifier model. Finally, based on the classification accuracy for galaxies and other indicators, the classification results are compared with those obtained by the Function Tree (FT), SVM, RF, GBDT, Stacked Denoising Autoencoders (SDAE), Deep Belief Nets (DBN), and Deep Perception Decision Tree (DPDT) models. The experimental results show that the stacking ensemble learning model improves the classification accuracy for galaxies in the darkest source magnitude set by nearly 10% compared with the function tree algorithm. Compared with other traditional machine learning algorithms, stronger boosting algorithms, and deep learning algorithms, the stacking ensemble learning model also shows varying degrees of improvement.
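The stacking idea, in which base classifiers feed a meta-classifier trained on their out-of-fold predictions, can be sketched with toy components. Everything here is a stand-in chosen to keep the example self-contained: one-feature threshold "stumps" replace SVM, RF, and XGBoost, an accuracy-weighted vote replaces GBDT, a plain 4-fold split replaces the 10-fold nested cross-validation, and the data are invented.

```python
def fit_stump(values, labels):
    """Toy base learner: threshold halfway between the two class means of one feature."""
    zeros = [v for v, t in zip(values, labels) if t == 0]
    ones = [v for v, t in zip(values, labels) if t == 1]
    m0, m1 = sum(zeros) / len(zeros), sum(ones) / len(ones)
    threshold, sign = (m0 + m1) / 2, (1 if m1 >= m0 else -1)
    return lambda v: 1 if sign * (v - threshold) > 0 else 0

def out_of_fold(X, y, k):
    """Base-learner predictions for every sample, made by models that never saw it."""
    n, n_feat = len(X), len(X[0])
    meta = [[0] * n_feat for _ in range(n)]
    for fold in range(k):
        held = set(range(fold, n, k))
        train = [i for i in range(n) if i not in held]
        for f in range(n_feat):
            model = fit_stump([X[i][f] for i in train], [y[i] for i in train])
            for i in held:
                meta[i][f] = model(X[i][f])
    return meta

def fit_stacking(X, y, k=4):
    """Train base learners and a toy meta-learner; return a predict(x) function."""
    meta = out_of_fold(X, y, k)
    n_feat = len(X[0])
    # Toy meta-learner: weight each base learner by (out-of-fold accuracy - 0.5).
    weights = [sum(row[f] == t for row, t in zip(meta, y)) / len(y) - 0.5
               for f in range(n_feat)]
    # Refit the base learners on all data for use at prediction time.
    bases = [fit_stump([row[f] for row in X], y) for f in range(n_feat)]

    def predict(x):
        score = sum(w * (2 * b(x[f]) - 1)
                    for f, (w, b) in enumerate(zip(weights, bases)))
        return 1 if score > 0 else 0
    return predict

# Invented data: feature 0 separates the classes, feature 1 is uninformative.
X = [[0.1, 0.9], [0.2, 0.1], [0.3, 0.8], [0.4, 0.2],
     [1.1, 0.85], [1.2, 0.15], [1.3, 0.7], [1.4, 0.3]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
predict = fit_stacking(X, y)
```

Training the meta-learner on out-of-fold predictions matters because predictions on data the base learners were fitted to would look better than they are, and the meta-learner would then trust the wrong base learner.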

18.
Planetary transits detected by the CoRoT mission can be mimicked by a low-mass star in orbit around a giant star. Spectral classification helps to identify the giant stars and also early-type stars, which are often excluded from further follow-up. We study the potential and the limitations of low-resolution spectroscopy to improve the photometric spectral types of CoRoT candidates. In particular, we want to study the influence of the signal-to-noise ratio (SNR) of the target spectrum in a quantitative way. We built our own template library and investigated whether a template library from the literature is able to reproduce the classifications. Including previous photometric estimates, we show how the additional spectroscopic information improves the constraints on spectral type. Low-resolution spectroscopy (R ≈ 1000) of 42 CoRoT targets covering a wide range in SNR (1–437) and of 149 templates was obtained in 2012–2013 with the Nasmyth spectrograph at the Tautenburg 2 m telescope. Spectral types have been derived automatically by comparing with the observed template spectra. The classification has been repeated with the external CFLIB library. The spectral class obtained with the external library agrees within a few sub-classes when the target spectrum has an SNR of at least about 100. While the photometric spectral type can deviate by an entire spectral class, the photometric luminosity classification is as close as a spectroscopic classification with the external library. A low SNR of the target spectrum limits the attainable accuracy of classification more strongly than the use of external templates or photometry. Furthermore, we found that low-resolution reconnaissance spectroscopy ensures that good planet candidates are kept that would otherwise be discarded based on photometric spectral type alone. (© 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)
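Classification by comparison with template spectra can be sketched as a least-squares match. The example assumes the target and templates are already sampled on the same wavelength grid, fits one free flux scale per template, and picks the template with the smallest squared residual; the abstract does not state the comparison metric actually used, and the labels and flux values are invented.

```python
def best_template(target, templates):
    """Return (label, chi2) of the template that best fits target after optimal scaling."""
    best = None
    for label, tmpl in templates.items():
        # Least-squares flux scale that maps this template onto the target.
        scale = sum(t * s for t, s in zip(tmpl, target)) / sum(t * t for t in tmpl)
        chi2 = sum((s - scale * t) ** 2 for t, s in zip(tmpl, target))
        if best is None or chi2 < best[1]:
            best = (label, chi2)
    return best

# Invented four-pixel templates and a target that is a brighter copy of one of them.
templates = {"G2": [1.0, 0.8, 0.9, 1.0], "K5": [0.6, 0.9, 1.0, 1.1]}
target = [2.5 * f for f in templates["K5"]]
label, chi2 = best_template(target, templates)
```

With noisy targets the residuals of neighboring templates become comparable, which is one way to see why a low SNR limits the attainable classification accuracy.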

19.
20.
Data from the Cassini plasma spectrometer (CAPS) electron spectrometer (ELS) have been found to be contaminated with an energy-independent background count rate which has been associated with radiation sources on Cassini. In this paper we describe this background radiation and quantitatively assess its impact on numerically integrated electron moments. The general properties of such a background and its effects on numerical moments are derived. The properties of the ELS background are described and a model for the background presented. A model to generate synthetic ELS spectra is presented and used to evaluate the density and temperature of pure noise and then extended to include ambient distributions. It is shown that the presence of noise produces a saturation of the electron density and temperature at quasi-constant values when the instrument is at background, but that these noise level moments are dependent on the floating spacecraft potential and the orientation of the ELS instrument with respect to the spacecraft. When the ambient distribution has a poor signal-to-noise ratio (SNR) the noise determines the density and temperature; however, as the SNR increases (increasing primarily with density) the density and temperature tend to those of the ambient distribution. It is also shown that these noise effects produce highly artificial density-temperature inverse correlations. A method to subtract this noise is presented and shown to correct for the presence of the noise. Simulated error estimates for the density and temperature are also presented. The analysis described in this paper not only applies to weak background noise, but also to more significant penetrating backgrounds such as those in radiation belt regions of planetary magnetospheres.

