Database-friendly random projections: Johnson-Lindenstrauss with binary coins

Journal of Computer and System Sciences 66 (2003) 671-687

Dimitris Achlioptas
Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA

Received 28 August 2001; revised 19 July 2002

Abstract

A classic result of Johnson and Lindenstrauss asserts that any set of n points in d-dimensional Euclidean space can be embedded into k-dimensional Euclidean space, where k is logarithmic in n and independent of d, so that all pairwise distances are maintained within an arbitrarily small factor. All known constructions of such embeddings involve projecting the n points onto a spherically random k-dimensional hyperplane through the origin. We give two constructions of such embeddings with the property that all elements of the projection matrix belong in {-1, 0, +1}. Such constructions are particularly well suited for database environments, as the computation of the embedding reduces to evaluating a single aggregate over k random partitions of the attributes.

© 2003 Elsevier Science (USA). All rights reserved.

1. Introduction

Consider projecting the points of your favorite sculpture first onto the plane and then onto a single line. The result amply demonstrates the power of dimensionality. In general, given a high-dimensional pointset it is natural to ask if it could be embedded into a lower dimensional space without suffering great distortion. In this paper, we consider this question for finite sets of points in Euclidean space. It will be convenient to think of n points in R^d as an n × d matrix A, each point represented as a row (vector).

Given such a matrix representation, one of the most commonly used embeddings is the one suggested by the singular value decomposition of A. That is, in order to embed the n points into R^k we project them onto the k-dimensional space spanned by the singular vectors corresponding to the k largest singular values of A.
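As an illustration of the kind of embedding the abstract describes, the following Python sketch projects n points, stored as the rows of an n × d matrix A, through a random d × k matrix whose entries are all in {-1, +1}. The equal-probability coin flips, the 1/sqrt(k) scaling, the sizes n, d and k, and every function name are assumptions made for this example; the excerpt above does not state the paper's exact distributions or constants.

```python
import math
import random


def sign_projection_matrix(d, k, rng):
    """Return a d x k matrix of independent fair coin flips in {-1, +1} (assumed distribution)."""
    return [[rng.choice((-1, 1)) for _ in range(k)] for _ in range(d)]


def embed(points, R):
    """Map each d-dimensional row of `points` to k dimensions as (1/sqrt(k)) * row * R."""
    k = len(R[0])
    scale = 1.0 / math.sqrt(k)  # assumed normalisation so squared lengths are preserved on average
    return [
        [scale * sum(x * R[i][j] for i, x in enumerate(row)) for j in range(k)]
        for row in points
    ]


def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


rng = random.Random(0)
n, d, k = 20, 500, 200  # illustrative sizes, not taken from the paper

# n random points in R^d, one point per row.
A = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
R = sign_projection_matrix(d, k, rng)
E = embed(A, R)

# Ratio of embedded distance to original distance for every pair of points.
ratios = [
    distance(E[i], E[j]) / distance(A[i], A[j])
    for i in range(n)
    for j in range(i + 1, n)
]
print("min ratio:", min(ratios), "max ratio:", max(ratios))
```

Because every entry of R is +1 or -1, each output coordinate is just a sum of some attributes minus a sum of the others, which is the sense in which such a projection needs only additions and subtractions. In this sketch the printed ratios should stay close to 1, and increasing k tightens them further.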