Gene Selection for Cancer Classification using Support Vector Machines
Isabelle Guyon, Jason Weston, Stephen Barnhill, and Vladimir Vapnik

Barnhill Bioinformatics, Savannah, Georgia, USA
AT&T Labs, Red Bank, New Jersey, USA

Address correspondence to: Isabelle Guyon, 955 Creston Road, Berkeley, CA 94708. Tel: (510) 524-6211. Email: isabelle@

Submitted to Machine Learning.

Summary

DNA micro-arrays now permit scientists to screen thousands of genes simultaneously and determine whether those genes are active, hyperactive, or silent in normal or cancerous tissue. Because these new micro-array devices generate bewildering amounts of raw data, new analytical methods must be developed to sort out whether cancer tissues have distinctive signatures of gene expression over normal tissues or other types of cancer tissues. In this paper, we address the problem of selecting a small subset of genes from broad patterns of gene expression data recorded on DNA micro-arrays. Using available training examples from cancer and normal patients, we build a classifier suitable for genetic diagnosis as well as drug discovery. Previous attempts to address this problem select genes with correlation techniques. We propose a new method of gene selection utilizing Support Vector Machine methods based on Recursive Feature Elimination (RFE).
We demonstrate experimentally that the genes selected by our techniques yield better classification performance and are biologically relevant to cancer. In contrast with the baseline method, our method eliminates gene redundancy automatically and yields better and more compact gene subsets. In patients with leukemia, our method discovered 2 genes that yield zero leave-one-out error, while 64 genes are necessary for the baseline method to get the best result (one leave-one-out error). In the colon cancer database, using only 4 genes, our method is 98% accurate, while the baseline method is only 86% accurate.

Keywords: Diagnosis, diagnostic tests, drug discovery.
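To make the idea of Recursive Feature Elimination and the leave-one-out error count more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The summary does not spell out the ranking criterion, so the sketch assumes a common choice for linear SVMs: rank each gene by its squared weight and drop the lowest-ranked gene at each round. The trainer (batch subgradient descent on the hinge loss, no bias term), the function names, the hyperparameters, and the synthetic data are all illustrative assumptions.

```python
import random

def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=100):
    """Batch subgradient descent on the L2-regularised hinge loss (no bias term)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wj for wj in w]
        for xi, yi in zip(X, y):
            if yi * sum(wj * xj for wj, xj in zip(w, xi)) < 1:  # margin violated
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        w = [wj - eta * gj for wj, gj in zip(w, grad)]
    return w

def svm_rfe(X, y, n_keep):
    """Retrain, then drop the feature with the smallest squared weight, until n_keep remain."""
    active = list(range(len(X[0])))
    eliminated = []
    while len(active) > n_keep:
        Xa = [[row[j] for j in active] for row in X]
        w = train_linear_svm(Xa, y)
        worst = min(range(len(active)), key=lambda k: w[k] ** 2)
        eliminated.append(active.pop(worst))
    return active, eliminated

def leave_one_out_errors(X, y, n_keep):
    """Count misclassified held-out samples; gene selection is redone inside every fold."""
    errors = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes, _ = svm_rfe(X_tr, y_tr, n_keep)
        w = train_linear_svm([[row[j] for j in genes] for row in X_tr], y_tr)
        score = sum(wk * X[i][j] for wk, j in zip(w, genes))
        errors += (1 if score > 0 else -1) != y[i]
    return errors

# Synthetic stand-in for expression data (an assumption, not real micro-array data):
# columns 0 and 1 follow the class label, columns 2 to 5 are bounded noise.
rng = random.Random(0)
y = [1 if i % 2 == 0 else -1 for i in range(16)]
X = [[yi * rng.uniform(1.5, 2.5), yi * rng.uniform(1.5, 2.5)]
     + [rng.uniform(-1, 1) for _ in range(4)]
     for yi in y]

selected, eliminated = svm_rfe(X, y, n_keep=2)
loo_errors = leave_one_out_errors(X, y, n_keep=2)
print("selected genes:", selected)
print("elimination order:", eliminated)
print("leave-one-out errors:", loo_errors)
```

On this toy data the two label-following columns are retained by construction, because their weights always dominate those of the bounded noise columns. Real expression data would be far noisier and would normally call for an established SVM library rather than this hand-rolled trainer. Repeating the selection inside each leave-one-out fold is a deliberate choice here: it keeps the held-out sample from influencing which genes are kept.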