Search Results

Showing items 1–10 of 468:

  • "High-dimensional data" x
Clear All

comprehensive genomic characterization for a cohort of cancer and normal samples. High-dimensional data offer a unique opportunity to more comprehensively describe the etiology and prognosis of important diseases. Correspondingly, the statistical literature has seen a surge of interest in ultra-high-dimensional survival data, where the covariate dimensionality $p$ grows exponentially, or non-polynomially, fast with the sample size $n$, i.e., $\log(p) = O(n^\alpha)$ for some $\alpha \in (0, 1/2)$. For ultra-high dimensional
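As a quick sanity check on the growth rate in the snippet above, a minimal sketch (the constant in the $O(\cdot)$ bound is an assumption here, taken to be 1):

```python
import math

# Illustration (not from the cited article): under the ultra-high-dimensional
# regime log(p) = O(n**alpha) with alpha in (0, 1/2), the covariate dimension p
# may grow like exp(n**alpha), i.e. faster than any fixed polynomial in n.
def max_dimension(n, alpha=0.4):
    """Largest p consistent with log(p) <= n**alpha (illustrative constant 1)."""
    return math.exp(n ** alpha)

for n in (100, 500, 1000):
    print(n, round(max_dimension(n)))
```

For example, at $n = 1000$ and $\alpha = 0.4$ the admissible $p$ already exceeds $n^2$, while $\log(p)$ stays well below $\sqrt{n}$, as the regime requires.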

1 Introduction 1.1 Statement of problem High-dimensional data is increasingly common in modern biological experiments, where the number of variables is on the order of thousands and beyond. For instance, in an international competition on the analysis of breast cancer, the raw data has p = 32,670 bins for predictors (Hand, 2008). At the Center for Cancer Research, the proteomic data for ovarian cancer has p = 360,000 predictors. Efron (2008, 2010) discussed p = 6,033 for microarray gene expression data, p = 15,445 for diffusion tensor imaging processing, and

Volume 8, Issue 1, 2009, Article 21. Statistical Applications in Genetics and Molecular Biology. Univariate Shrinkage in the Cox Model for High Dimensional Data. Robert J. Tibshirani, Stanford University. Recommended Citation: Tibshirani, Robert J. (2009) "Univariate Shrinkage in the Cox Model for High Dimensional Data," Statistical Applications in Genetics and Molecular Biology: Vol. 8: Iss. 1, Article 21. DOI: 10.2202/1544-6115.1438. Abstract: We propose a method for prediction in Cox

Volume 9, Issue 1, 2010, Article 17. Statistical Applications in Genetics and Molecular Biology. Sparse Partial Least Squares Classification for High Dimensional Data. Dongjun Chung, University of Wisconsin, Madison; Sunduz Keles, University of Wisconsin, Madison. Recommended Citation: Chung, Dongjun and Keles, Sunduz (2010) "Sparse Partial Least Squares Classification for High Dimensional Data," Statistical Applications in Genetics and Molecular Biology: Vol. 9: Iss. 1, Article 17. DOI: 10.2202/1544-6115.1492.

Gupta, A. (2003). An elementary proof of a theorem of Johnson and Lindenstrauss, Random Structures and Algorithms 22(1): 60–65. Donoho, D.L. (2000). High-dimensional data analysis: The curses and blessings of dimensionality, Technical report, Department of Statistics, Stanford University, Stanford, CA. Frankl, P. and Maehara, H. (1987). The Johnson-Lindenstrauss lemma and the sphericity of some graphs, Journal of Combinatorial Theory A 44(3): 355–362. Forbes, C., Evans, M., Hastings, N. and Peacock, B. (2011). Statistical Distributions, 4th Edn., John

Volume 3, Issue 1, 2007, Article 12. The International Journal of Biostatistics. Multiple Imputation and Random Forests (MIRF) for Unobservable, High-Dimensional Data. Bareng A. S. Nonyane, University of Massachusetts, Amherst; Andrea S. Foulkes, University of Massachusetts, Amherst. Recommended Citation: Nonyane, Bareng A. S. and Foulkes, Andrea S. (2007) "Multiple Imputation and Random Forests (MIRF) for Unobservable, High-Dimensional Data," The International Journal of Biostatistics: Vol. 3: Iss. 1, Article 12. DOI: 10.2202/1557-4679.1049.

and entropy estimation in manifold learning, IEEE Transactions on Signal Processing 52(8): 2210–2221. Costa, J.A. and Hero, A.O. (2005). Estimating local intrinsic dimension with k-nearest neighbor graphs, IEEE Transactions on Statistical Signal Processing 30(23): 1432–1436. Donoho, D.L. and Grimes, C. (2005). Hessian eigenmaps: New locally linear embedding techniques for high-dimensional data, Proceedings of the National Academy of Sciences 102(21): 7426–7431. Dzemyda, G., Kurasova, O. and Žilinskas, J. (2013). Multidimensional Data Visualization: Methods

DEMONSTRATIO MATHEMATICA Vol. XXXV No 3 2002. Mieczysław A. Kłopotek, A NEW SPACE-SAVING BAYESIAN TREE CONSTRUCTION METHOD FOR HIGH DIMENSIONAL DATA. Abstract: Bayesian networks have many practical applications due to their capability to represent a joint probability distribution over many variables in a compact way. Efficient reasoning methods exist for Bayesian networks, and many algorithms for learning Bayesian networks from empirical data have been developed. A well-known problem with Bayesian networks is the practical limitation


We propose privacy-preserving protocols for computing linear regression models, in the setting where the training dataset is vertically distributed among several parties. Our main contribution is a hybrid multi-party computation protocol that combines Yao’s garbled circuits with tailored protocols for computing inner products. Like many machine learning tasks, building a linear regression model involves solving a system of linear equations. We conduct a comprehensive evaluation and comparison of different techniques for securely performing this task, including a new Conjugate Gradient Descent (CGD) algorithm. This algorithm is suitable for secure computation because it uses an efficient fixed-point representation of real numbers while maintaining accuracy and convergence rates comparable to what can be obtained with a classical solution using floating point numbers. Our technique improves on Nikolaenko et al.’s method for privacy-preserving ridge regression (S&P 2013), and can be used as a building block in other analyses. We implement a complete system and demonstrate that our approach is highly scalable, solving data analysis problems with one million records and one hundred features in less than one hour of total running time.
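The fixed-point representation mentioned in the abstract above can be sketched as follows (a minimal illustration, not the paper's protocol; the 16-fractional-bit scaling factor is an assumption):

```python
# Minimal sketch of fixed-point encoding of reals, the kind of representation
# that makes iterative solvers such as CGD amenable to secure computation.
# Illustrative only; the choice of 2**16 as the scaling factor is assumed.
SCALE = 2 ** 16  # 16 fractional bits

def encode(x: float) -> int:
    """Map a real number to a fixed-point integer."""
    return round(x * SCALE)

def decode(k: int) -> float:
    """Map a fixed-point integer back to a real number."""
    return k / SCALE

def fx_mul(a: int, b: int) -> int:
    """Multiply two fixed-point values, rescaling to keep 16 fractional bits."""
    return (a * b) // SCALE

# Example: 1.5 * 2.25 = 3.375, computed entirely on integers.
product = decode(fx_mul(encode(1.5), encode(2.25)))
```

Note the integer division after each multiplication truncates; secure-computation protocols typically replace this step with a dedicated truncation sub-protocol, which is beyond this sketch.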


The presence of genotype-environment interaction (GEI) influences production, making the selection of cultivars a complex process. The two most widely used methods to analyze GEI and evaluate genotypes are AMMI and GGE Biplot, both applied to multi-environment trial (MET) data. Despite their different approaches, the two models complement each other and strengthen decision making. However, both rely on biplots, and biplot-based interpretation does not scale beyond two-dimensional plots; this becomes a problem whenever the first two components do not capture enough of the variation. This paper proposes an approach for such cases based on cluster analysis combined with the concept of medoids. It also applies AMMI and GGE Biplot to the adjusted data in order to compare the two models. The data are provided by the International Maize and Wheat Improvement Center (CIMMYT) and come from the 14th Semi-Arid Wheat Yield Trial (SAWYT), an experiment involving 50 genotypes of spring bread wheat (Triticum aestivum) germplasm adapted to low rainfall, performed in 36 environments across 14 countries. The analysis yielded 25 genotype clusters and 6 environment clusters. The two models proved equivalent for evaluating the data, permitting increased reliability in the selection of superior cultivars and test environments.
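The medoid concept used in the abstract above can be sketched as follows (a minimal illustration with made-up coordinates; the Euclidean metric is an assumed choice, not CIMMYT's pipeline):

```python
# Minimal sketch of a medoid: the cluster member minimizing total distance
# to all other members. Unlike a centroid, it is always an actual data point,
# which is what makes it a usable cluster representative (e.g. a genotype).
def medoid(points):
    def total_distance(p):
        return sum(
            sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
            for q in points
        )
    return min(points, key=total_distance)

# A cluster of scores in a 2-D component space (made-up numbers):
cluster = [(0.0, 0.0), (1.0, 0.1), (0.9, 0.0), (5.0, 5.0)]
rep = medoid(cluster)  # the outlier (5.0, 5.0) is never chosen
```

Because the representative must be a real member of the cluster, the selected genotype or environment can be inspected and trialed directly, which is the appeal of medoids over centroids here.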