Datasets
WHO’s Communicable Disease Global Atlas
http://globalatlas.who.int/
UCI Knowledge Discovery in Databases Archive
http://kdd.ics.uci.edu/
Raw input data for "small" Sequoia benchmark
http://epoch.cs.berkeley.edu:8000/sequoia/benchmark/
Kent Ridge Biomedical Data Set Repository
http://sdmc.i2r.a-star.edu.sg/rp/
Summary of the Kent Ridge Biomedical Data Set Repository:
Breast Cancer: 78+19 samples, 24481 features (genes), 2 classes;
Central Nervous System: 60 samples, 7129 genes, 2 classes;
Colon Tumor: 62 samples, 2000 genes, 2 classes;
Diffuse Large B-Cell Lymphoma (DLBCL)
DLBCL-Stanford: 47 samples, 4026 genes, 2 classes;
DLBCL-Harvard: 58+19 samples, 6817 genes, 2 classes;
DLBCL-NIH: 240 samples, 7399 microarray features, 2 classes;
Leukemia
Leukemia-ALLAML (Whitehead, MIT): 38+34 samples, 7129 probes from 6817 genes, 2 classes;
Leukemia-MLL (Whitehead, MIT): 57+15 samples, 12582 genes, 3 classes;
Leukemia-subtype (St. Jude): 215+112 samples, 12558 genes, 7 classes;
Lung Cancer
LungCancer-DanaFarberCancerInstitute-HarvardMedicalSchool: 203 samples, 12600 genes, 5 classes;
LungCancer-BrighamAndWomenHospital-HarvardMedicalSchool: 181 samples, 12533 genes, 2 classes;
LungCancer-Michigan: 86+10 samples, 7129 genes, 2 classes;
LungCancer-Ontario: 39 samples, 2880 genes, 2 classes;
Ovarian Cancer
OvarianCancer-NCI-PBSII-061902: 91+162 samples, 15154 M/Z identities, 2 classes;
OvarianCancer-NCI-QStar: 216 samples, 373401 features, 2 classes;
Prostate Cancer: (a) 52+50+25+9 samples, 12600 genes, 2 classes; (b) 21 samples, 2 classes;
Genomic Sequences
Translation Initiation Site Prediction: 3312 sequences, 927 features, 2 classes;
Polyadenylation Signal Prediction: 2327 (training) + 982 (testing) sequences, 168 features, 2 classes