Class Prediction Analysis for Gene Expression Data

This package builds a multivariate predictor that assigns each sample to a class. Several multivariate classification methods are available, including the Compound Covariate Predictor, Diagonal Linear Discriminant Analysis, Nearest Neighbor Predictor, Nearest Centroid Predictor, and Support Vector Machine Predictor. The package also reports, in graphical form, how accurately the multivariate class predictor performs.

The entire prediction procedure is evaluated by cross-validation: leave-one-out cross-validation, k-fold cross-validation, or the 0.632+ bootstrap. The performance of each classifier is reported as a cross-validated estimate of its misclassification rate. Predictors fitted to the full dataset can then be used to classify new samples.
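To make the idea of a cross-validated misclassification rate concrete, here is a minimal base R sketch of leave-one-out cross-validation with a nearest-centroid rule. It does not use the package: the simulated data, the helper predict_centroid, and all variable names are hypothetical, and gene selection (which a full evaluation would repeat inside every fold) is omitted for brevity.

```r
# Toy data: 20 genes (rows) x 12 samples (columns), two classes of 6 samples.
set.seed(1)
n_genes <- 20
cls <- rep(c("A", "B"), each = 6)
expr <- matrix(rnorm(n_genes * length(cls)), nrow = n_genes)
# Shift the first 5 genes upward in class B so the classes are separable.
expr[1:5, cls == "B"] <- expr[1:5, cls == "B"] + 3

# Hypothetical helper: assign a sample to the class whose mean
# expression profile (centroid) is closest in squared Euclidean distance.
predict_centroid <- function(train, train_cls, new_sample) {
  classes <- unique(train_cls)
  dists <- sapply(classes, function(k) {
    centroid <- rowMeans(train[, train_cls == k, drop = FALSE])
    sum((new_sample - centroid)^2)
  })
  classes[which.min(dists)]
}

# Leave-one-out: hold out each sample in turn, train on the rest, predict it.
loo_pred <- sapply(seq_along(cls), function(i) {
  predict_centroid(expr[, -i, drop = FALSE], cls[-i], expr[, i])
})

# Cross-validated estimate of the misclassification rate.
misclass_rate <- mean(loo_pred != cls)
```

Because every sample is predicted by a model that never saw it, misclass_rate estimates how the rule would behave on new samples rather than how well it fits the training data.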


Set working directory:

setwd("C:/Users/manso/OneDrive - University of West London/MSc Bioinformatics - UWL/6.BGA - Bioinformatics and Genome Analysis/week 5 - Microarray analysis/practical")

Install packages and load libraries:

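if (!require("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

# Class prediction package: supply the path to the package source file
install.packages("", repos = NULL, type = "source")

library(classpredict)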


1. Get Built-in sample data - Cancer data

Expression data:

dataset <- "Brca"

x <- read.delim(system.file("extdata", paste0(dataset, "_LOGRAT.TXT"), package = "classpredict"), header = FALSE)

2. Class Information

expdesign <- read.delim(system.file("extdata", paste0(dataset, "_EXPDESIGN.txt"), package = "classpredict"), header = TRUE)
head(expdesign)

  Patient.Array PID BRCA1.v.BRCA2.v.Sporadic BRCA1.V.BRCA2
1         s1321  20                 Sporadic              
2         s1996   1                    BRCA1         BRCA1
3         s1822   5                    BRCA1         BRCA1
4         s1714   3                    BRCA1         BRCA1
5         s1224   7                    BRCA1         BRCA1
6         s1252   2                    BRCA1         BRCA1
  BRCA1.v.Sporadic BRCA2.v.Sporadic BRCA1.v.notBRCA1 BRCA2.v.notBRCA2
1                                           notBRCA1         notBRCA2
2            BRCA1                             BRCA1         notBRCA2
3            BRCA1                             BRCA1         notBRCA2
4            BRCA1                             BRCA1         notBRCA2
5            BRCA1                             BRCA1         notBRCA2
6            BRCA1                             BRCA1         notBRCA2
  group predictTest
1     a    training
2     a    training
3     b    training
4     b    training
5     c    training
6     c    training
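Some class columns in the table above are blank for certain samples; for example, BRCA1.V.BRCA2 is empty for the Sporadic sample s1321. The sketch below shows one way to keep only the labelled samples for a two-class comparison. It rebuilds the six rows shown above in a small data frame named design so that it runs on its own, and treating a blank label as "not part of this comparison" is an assumption, not documented package behaviour.

```r
# The six rows printed above, restricted to two columns.
design <- data.frame(
  Patient.Array = c("s1321", "s1996", "s1822", "s1714", "s1224", "s1252"),
  BRCA1.V.BRCA2 = c("", "BRCA1", "BRCA1", "BRCA1", "BRCA1", "BRCA1"),
  stringsAsFactors = FALSE
)

# Assumption: an empty label means the sample is not part of this comparison.
keep <- design$BRCA1.V.BRCA2 != ""

cls <- design$BRCA1.V.BRCA2[keep]      # class labels of the retained samples
arrays <- design$Patient.Array[keep]   # array IDs of the retained samples

# Number of retained samples per class.
table(cls)
```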

3. Class Prediction Analysis

The “classPredict” function builds multiple classifiers for predicting the class of a new sample; it implements the multi-method class prediction tool of BRB-ArrayTools. For a quick start, the package also provides test.classPredict, which runs a class prediction analysis on one of the built-in sample datasets (“Brca”, “Perou”, or “Pomeroy”).

res1 <- test.classPredict('Brca', outputName = "ClassPrediction_Brca",
                          generateHTML = TRUE)
## Getting analysis results ...
res2 <- test.classPredict('Pomeroy', outputName = "ClassPrediction_Pomeroy",
                          generateHTML = TRUE)
## Getting analysis results ...
res3 <- test.classPredict('Perou', outputName = "ClassPrediction_Perou",
                          generateHTML = TRUE)
## Getting analysis results ...

4. List Objects In The Results

Each result object (res1, res2, or res3 above, written generically as res here) contains the following elements:

res$performClass: Data frame with the performance of the classifiers during cross-validation.
res$percentCorrectClass: Data frame with the mean percentage of correct classifications for each sample under the different prediction methods.
res$classifierTable: Data frame with the composition of the classifiers, such as geometric means of values in each class, p-values, and gene IDs.
res$probInClass: Data frame with the predicted probability, from the Bayesian Compound Covariate Predictor, that each training sample belongs to a class during cross-validation.
res$CCPSenSpec: Data frame with the performance (i.e., sensitivity, specificity, positive predictive value, negative predictive value) of the Compound Covariate Predictor Classifier.
res$LDASenSpec: Data frame with the performance (i.e., sensitivity, specificity, positive predictive value, negative predictive value) of the Diagonal Linear Discriminant Analysis Classifier.
res$K1NNSenSpec: Data frame with the performance (i.e., sensitivity, specificity, positive predictive value, negative predictive value) of the 1-Nearest Neighbor Classifier.
res$K3NNSenSpec: Data frame with the performance (i.e., sensitivity, specificity, positive predictive value, negative predictive value) of the 3-Nearest Neighbor Classifier.
res$CentroidSenSpec: Data frame with the performance (i.e., sensitivity, specificity, positive predictive value, negative predictive value) of the Nearest Centroid Classifier.
res$SVMSenSpec: Data frame with the performance (i.e., sensitivity, specificity, positive predictive value, negative predictive value) of the Support Vector Machine Classifier.
res$BCPPSenSpec: Data frame with the performance (i.e., sensitivity, specificity, positive predictive value, negative predictive value) of the Bayesian Compound Covariate Classifier.
res$weightLinearPred: Data frame with the gene weights for the linear predictors, such as Compound Covariate Predictor, Diagonal Linear Discriminant Analysis, and Support Vector Machine.
res$thresholdLinearPred: Thresholds for the linear prediction rules associated with res$weightLinearPred. Each prediction rule is defined by the inner product of the weights ($w_i$) and the log expression values ($x_i$) of the significant genes. In this case, a sample is assigned to class BRCA1 if the sum exceeds the threshold, that is, if $\sum_i w_i x_i > \text{threshold}$.
res$GRPCentroid: Data frame with the centroid of each class for each predictor gene.
res$pmethod: Vector of the prediction methods that were specified.
res$workPath: Path for Fortran and other intermediate outputs.
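
The linear prediction rule described for res$thresholdLinearPred can be sketched in a few lines of base R. The weights, threshold, gene names, and sample values below are invented for illustration and do not come from the package output; the label "other" is a placeholder for whichever class is not BRCA1.

```r
# Hypothetical weights and threshold for three genes (illustrative values only).
w <- c(geneA = 2.1, geneB = -1.4, geneC = 0.8)
threshold <- 0.5

# Log expression values of the same genes for two hypothetical samples.
samples <- rbind(
  s1 = c(geneA = 1.0, geneB = -0.5, geneC = 0.2),
  s2 = c(geneA = -0.8, geneB = 0.9, geneC = 0.1)
)

# Linear score per sample: sum over genes of weight * log expression.
score <- as.vector(samples %*% w)
names(score) <- rownames(samples)

# Assign BRCA1 when the score exceeds the threshold.
predicted <- ifelse(score > threshold, "BRCA1", "other")
```

Here s1 scores 2.96 and is assigned to BRCA1, while s2 scores -2.86 and falls below the threshold.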

5. Producing ROC Curves

Cross-validation ROC curves are provided for Compound Covariate Predictor, Diagonal Linear Discriminant Analysis and Bayesian Compound Covariate Classifiers.
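
As a rough illustration of what a cross-validation ROC curve summarises, the base R sketch below sweeps a decision threshold over a set of scores and records the false positive rate and true positive rate (sensitivity) at each step. The scores and labels are made up, and the code is independent of the package: it only shows the general construction, not how the package draws its curves.

```r
# Hypothetical cross-validated scores (e.g. predicted probabilities of the
# positive class) and true labels (1 = positive class, 0 = negative class).
truth <- c(1, 1, 1, 1, 0, 1, 0, 0, 1, 0)
score <- c(0.95, 0.90, 0.80, 0.70, 0.65, 0.60, 0.40, 0.30, 0.25, 0.10)

# Sweep the threshold from strictest (nothing called positive) to most lenient.
thresholds <- c(Inf, sort(unique(score), decreasing = TRUE))
roc <- t(sapply(thresholds, function(th) {
  pred <- as.integer(score >= th)
  c(fpr = sum(pred == 1 & truth == 0) / sum(truth == 0),   # 1 - specificity
    tpr = sum(pred == 1 & truth == 1) / sum(truth == 1))   # sensitivity
}))

# Area under the curve by the trapezoidal rule.
auc <- sum(diff(roc[, "fpr"]) *
           (head(roc[, "tpr"], -1) + tail(roc[, "tpr"], -1)) / 2)

# Draw the curve only in an interactive session.
if (interactive()) {
  plot(roc[, "fpr"], roc[, "tpr"], type = "l",
       xlab = "False positive rate", ylab = "True positive rate")
  abline(0, 1, lty = 2)  # reference line for a random classifier
}
```

An area close to 1 indicates that the scores separate the two classes well; an area near 0.5 corresponds to the dashed diagonal of a random classifier.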


