Fundamentals of DNA and RNA analysis - Using BioMart

Beatriz Manso
2021-12-02

Introduction

BioMart is an easy-to-use, web-based tool that lets users extract data without programming skills or knowledge of the underlying database structure. In the web interface, the left panel is used for navigation, and filters and attributes are chosen in the right panel. This practical queries the same Ensembl BioMart data from R using the biomaRt package.

Methods

1. Install the biomaRt package using BiocManager and load the library

#Set working directory
setwd("C:/Users/manso/OneDrive - University of West London/MSc Bioinformatics - UWL/done/BFG - Bioinformatics and Functional Genomics - final mark =/Practicals")
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install()
BiocManager::install("biomaRt")
library("biomaRt") 

2. List available Ensembl BioMart web services using listEnsembl()

listEnsembl()
        biomart                version
1         genes      Ensembl Genes 106
2 mouse_strains      Mouse strains 106
3          snps  Ensembl Variation 106
4    regulation Ensembl Regulation 106
listMarts()
               biomart                version
1 ENSEMBL_MART_ENSEMBL      Ensembl Genes 106
2   ENSEMBL_MART_MOUSE      Mouse strains 106
3     ENSEMBL_MART_SNP  Ensembl Variation 106
4 ENSEMBL_MART_FUNCGEN Ensembl Regulation 106

3. Select database

The first step in analyzing BioMart data is selecting a database. We connect to Ensembl’s Human Genes BioMart using the command below.

ensembl <- useEnsembl(biomart = "genes", 
                      dataset = "hsapiens_gene_ensembl")

With the default parameters for useEnsembl(), your queries will be routed to the nearest geographical mirror.

However, it’s possible to use the mirror argument to explicitly request a specific mirror:

ensembl <- useEnsembl(biomart = "genes", 
                   dataset = "hsapiens_gene_ensembl", 
                   mirror = "uswest")
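
If a mirror is slow or unreachable, one option is to try several mirrors in turn. The sketch below is only an illustration and not part of biomaRt: connect_with_fallback() is a hypothetical helper that calls a user-supplied connection function for each mirror name until one succeeds, and fake_connect() is an invented offline stand-in so the example runs without a network connection.

```r
# Hypothetical helper (not part of biomaRt): try each mirror in turn and
# return the first connection that succeeds.
connect_with_fallback <- function(connect, mirrors) {
  for (m in mirrors) {
    result <- tryCatch(connect(m), error = function(e) NULL)
    if (!is.null(result)) {
      return(list(mirror = m, mart = result))
    }
  }
  stop("No mirror could be reached: ", paste(mirrors, collapse = ", "))
}

# Offline stand-in for a real connection function, invented for this example:
# it pretends that only the "www" mirror responds.
fake_connect <- function(mirror) {
  if (mirror != "www") stop("mirror unavailable: ", mirror)
  paste("connected to", mirror)
}

conn <- connect_with_fallback(fake_connect, c("uswest", "useast", "www"))
print(conn$mirror)

# With biomaRt loaded, the connection function could instead be:
# function(m) useEnsembl(biomart = "genes",
#                        dataset = "hsapiens_gene_ensembl", mirror = m)
```

Passing the connection step in as a function keeps the retry logic separate from biomaRt, so it can be checked offline before being pointed at a real server.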

listDatasets() shows which datasets are available in the selected BioMart (biomart = “genes”):

datasets <- listDatasets(ensembl)
head(datasets) 
                       dataset                           description     version
1 abrachyrhynchus_gene_ensembl Pink-footed goose genes (ASM259213v1) ASM259213v1
2     acalliptera_gene_ensembl      Eastern happy genes (fAstCal1.2)  fAstCal1.2
3   acarolinensis_gene_ensembl       Green anole genes (AnoCar2.0v2) AnoCar2.0v2
4    acchrysaetos_gene_ensembl       Golden eagle genes (bAquChr1.2)  bAquChr1.2
5    acitrinellus_gene_ensembl        Midas cichlid genes (Midas_v5)    Midas_v5
6    amelanoleuca_gene_ensembl       Giant panda genes (ASM200744v2) ASM200744v2

4. Find entries matching a specific term or pattern, for example “hsapiens”

searchDatasets(mart = ensembl, pattern = "hsapiens")
                 dataset              description    version
81 hsapiens_gene_ensembl Human genes (GRCh38.p13) GRCh38.p13
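# Alternatively, select the dataset on an existing mart object:
ensembl = useDataset("hsapiens_gene_ensembl", mart=ensembl)
# Or connect with useMart(), naming the mart and the dataset in one call:
ensembl = useMart("ensembl",dataset="hsapiens_gene_ensembl")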

Now that we have selected our dataset, we’ll create a query and send it to the Ensembl BioMart server.

5. Send query to Ensembl BioMart server using getBM()

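getBM() has four main arguments:

attributes - a vector of the attributes to retrieve (the output columns of the query).

filters - a vector of the filters used to restrict the query.

values - a vector of values for the filter; when several filters are used, a list with one vector of values per filter.

mart - the Mart object created with useEnsembl() or useMart().

Filters define a restriction on the query.

listFilters() displays all the available filters in the selected dataset: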

filters = listFilters(ensembl)
filters[1:5,]
             name              description
1 chromosome_name Chromosome/scaffold name
2           start                    Start
3             end                      End
4      band_start               Band Start
5        band_end                 Band End

Attributes define the data we are interested in retrieving.

listAttributes() displays all the available attributes in the selected dataset:

attributes = listAttributes(ensembl)
attributes[1:5,] 
                           name                  description         page
1               ensembl_gene_id               Gene stable ID feature_page
2       ensembl_gene_id_version       Gene stable ID version feature_page
3         ensembl_transcript_id         Transcript stable ID feature_page
4 ensembl_transcript_id_version Transcript stable ID version feature_page
5            ensembl_peptide_id            Protein stable ID feature_page
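
Because getBM() expects exact filter and attribute names, it can help to check them against these tables before sending a query. The sketch below is illustrative only: check_names() is a hypothetical helper, and the two small data frames are copied by hand from the first rows printed above so that the example runs offline. With a live connection they would come from listFilters(ensembl) and listAttributes(ensembl).

```r
# First rows of the tables printed above, copied by hand for an offline demo.
filters_tbl <- data.frame(
  name = c("chromosome_name", "start", "end", "band_start", "band_end"),
  stringsAsFactors = FALSE
)
attributes_tbl <- data.frame(
  name = c("ensembl_gene_id", "ensembl_gene_id_version",
           "ensembl_transcript_id", "ensembl_transcript_id_version",
           "ensembl_peptide_id"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the requested names that are not available.
check_names <- function(requested, available) {
  setdiff(requested, available$name)
}

unknown_filters <- check_names(c("chromosome_name", "chromosome"), filters_tbl)
unknown_attrs <- check_names(c("ensembl_gene_id", "ensembl_peptide_id"),
                             attributes_tbl)

print(unknown_filters)  # names to correct before calling getBM()
print(unknown_attrs)    # empty when every requested name is valid
```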

6. Build a biomaRt query

Let’s apply this knowledge to solve the following problem:

We have a list of Affymetrix identifiers from the u133plus2 platform and we want to retrieve the corresponding EntrezGene identifiers using Ensembl mappings.

The u133plus2 platform will be the filter for this query and as values for this filter we use our list of Affymetrix identifiers.

As output (attributes) for the query we want to retrieve the EntrezGene and u133plus2 identifiers so we get a mapping of these two identifiers as a result.

The exact names needed to specify the attributes and filters can be retrieved with the listAttributes() and listFilters() functions, respectively.

Run the query:

affyids <- c("202763_at","209310_s_at","207500_at")

getBM(attributes = c('affy_hg_u133_plus_2', 
                     'entrezgene_id', 
                     'ensembl_gene_id'),
      filters = 'affy_hg_u133_plus_2',
      values = affyids, 
      mart = ensembl)
  affy_hg_u133_plus_2 entrezgene_id ensembl_gene_id
1           202763_at           836 ENSG00000164305
2         209310_s_at           837 ENSG00000196954
3           207500_at           838 ENSG00000137757
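
The rows returned by a query may not follow the order of the input identifiers, and an identifier can map to no gene or to several. A minimal sketch of aligning a result with its input, assuming the result is the three-row table printed above (re-typed by hand and deliberately shuffled so the example runs offline; the names res, ordered and missing_ids are chosen for illustration):

```r
affyids <- c("202763_at", "209310_s_at", "207500_at")

# Result of the query above, re-typed by hand with the rows shuffled.
res <- data.frame(
  affy_hg_u133_plus_2 = c("207500_at", "202763_at", "209310_s_at"),
  entrezgene_id = c(838L, 836L, 837L),
  ensembl_gene_id = c("ENSG00000137757", "ENSG00000164305",
                      "ENSG00000196954"),
  stringsAsFactors = FALSE
)

# match() gives, for each input ID, the row of its first hit (NA if none).
ordered <- res[match(affyids, res$affy_hg_u133_plus_2), ]
rownames(ordered) <- NULL
print(ordered)

# Input IDs that returned no mapping at all.
missing_ids <- setdiff(affyids, res$affy_hg_u133_plus_2)
print(missing_ids)
```

Note that match() keeps only the first hit per identifier, so a probe that maps to several genes would need merge() or split() instead.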

7. Use biomaRt to retrieve large amounts of data

listAttributes() and listFilters() return every available option for their respective types, which can produce very long output in which it is hard to find the value you are interested in.

searchAttributes() and searchFilters() find any entries matching a specific term or pattern. searchDatasets() works in the same way.

In this example we get the details for all the attributes that contain the pattern “hgnc”:

searchAttributes(mart = ensembl, pattern = "hgnc")
              name        description         page
62         hgnc_id            HGNC ID feature_page
63     hgnc_symbol        HGNC symbol feature_page
91 hgnc_trans_name Transcript name ID feature_page
searchDatasets(mart = ensembl, pattern = "hsapiens")
                 dataset              description    version
81 hsapiens_gene_ensembl Human genes (GRCh38.p13) GRCh38.p13

The pattern argument takes a regular expression, which means we can create more complex queries if required.

For example, suppose we have the string ENST00000577249.1 and are not sure what the appropriate filter term is. The example below uses a pattern that finds all filters containing the terms “ensembl” and “id”, allowing us to reduce the list of filters.

searchFilters(mart = ensembl, pattern = "ensembl.*id")
                            name
52               ensembl_gene_id
53       ensembl_gene_id_version
54         ensembl_transcript_id
55 ensembl_transcript_id_version
56            ensembl_peptide_id
57    ensembl_peptide_id_version
58               ensembl_exon_id
                                                      description
52                       Gene stable ID(s) [e.g. ENSG00000000003]
53       Gene stable ID(s) with version [e.g. ENSG00000000003.15]
54                 Transcript stable ID(s) [e.g. ENST00000000233]
55 Transcript stable ID(s) with version [e.g. ENST00000000233.10]
56                    Protein stable ID(s) [e.g. ENSP00000000233]
57     Protein stable ID(s) with version [e.g. ENSP00000000233.5]
58                              Exon ID(s) [e.g. ENSE00000000003]

Comparing ENST00000577249.1 with the examples in the description column shows that it is a transcript stable ID with a version. Thus the appropriate filter to use with it is ensembl_transcript_id_version.
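
The same comparison can be done programmatically. The sketch below is an illustration under stated assumptions rather than a biomaRt feature: guess_ensembl_filter() is a hypothetical helper that infers the filter name from the ENSG/ENST/ENSP prefix of a human Ensembl stable ID and from whether a version suffix is present. The filter names are taken from the searchFilters() output above.

```r
# Hypothetical helper: guess the biomaRt filter for a human Ensembl stable ID.
guess_ensembl_filter <- function(id) {
  types <- c(ENSG = "ensembl_gene_id",
             ENST = "ensembl_transcript_id",
             ENSP = "ensembl_peptide_id")
  # Human stable IDs: 4-letter prefix, 11 digits, optional ".version".
  if (!grepl("^ENS[GTP][0-9]{11}(\\.[0-9]+)?$", id)) {
    return(NA_character_)
  }
  base <- types[[substr(id, 1, 4)]]
  if (grepl("\\.[0-9]+$", id)) paste0(base, "_version") else base
}

print(guess_ensembl_filter("ENST00000577249.1"))
print(guess_ensembl_filter("ENSG00000164305"))

# Stripping the version allows the unversioned filter to be used instead.
stripped <- sub("\\.[0-9]+$", "", "ENST00000577249.1")
print(stripped)
```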

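Some filters accept only a limited set of values. listFilterOptions() lists the values available for a given filter, here chromosome_name:

listFilterOptions(mart = ensembl, filter = "chromosome_name")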
  [1] "1"                                     
  [2] "2"                                     
  [3] "3"                                     
  [4] "4"                                     
  [5] "5"                                     
  [6] "6"                                     
  [7] "7"                                     
  [8] "8"                                     
  [9] "9"                                     
 [10] "10"                                    
 [11] "11"                                    
 [12] "12"                                    
 [13] "13"                                    
 [14] "14"                                    
 [15] "15"                                    
 [16] "16"                                    
 [17] "17"                                    
 [18] "18"                                    
 [19] "19"                                    
 [20] "20"                                    
 [21] "21"                                    
 [22] "22"                                    
 [23] "CHR_HG1_PATCH"                         
 [24] "CHR_HG26_PATCH"                        
 [25] "CHR_HG28_PATCH"                        
 [26] "CHR_HG30_PATCH"                        
 [27] "CHR_HG76_PATCH"                        
 [28] "CHR_HG107_PATCH"                       
 [29] "CHR_HG109_PATCH"                       
 [30] "CHR_HG126_PATCH"                       
 [31] "CHR_HG142_HG150_NOVEL_TEST"            
 [32] "CHR_HG151_NOVEL_TEST"                  
 [33] "CHR_HG439_PATCH"                       
 [34] "CHR_HG545_PATCH"                       
 [35] "CHR_HG699_PATCH"                       
 [36] "CHR_HG705_PATCH"                       
 [37] "CHR_HG708_PATCH"                       
 [38] "CHR_HG721_PATCH"                       
 [39] "CHR_HG926_PATCH"                       
 [40] "CHR_HG986_PATCH"                       
 [41] "CHR_HG1277_PATCH"                      
 [42] "CHR_HG1298_PATCH"                      
 [43] "CHR_HG1309_PATCH"                      
 [44] "CHR_HG1311_PATCH"                      
 [45] "CHR_HG1320_PATCH"                      
 [46] "CHR_HG1342_HG2282_PATCH"               
 [47] "CHR_HG1362_PATCH"                      
 [48] "CHR_HG1384_PATCH"                      
 [49] "CHR_HG1395_PATCH"                      
 [50] "CHR_HG1398_PATCH"                      
 [51] "CHR_HG1445_PATCH"                      
 [52] "CHR_HG1485_PATCH"                      
 [53] "CHR_HG1524_PATCH"                      
 [54] "CHR_HG1531_PATCH"                      
 [55] "CHR_HG1535_PATCH"                      
 [56] "CHR_HG1651_PATCH"                      
 [57] "CHR_HG1708_PATCH"                      
 [58] "CHR_HG1815_PATCH"                      
 [59] "CHR_HG1832_PATCH"                      
 [60] "CHR_HG2002_PATCH"                      
 [61] "CHR_HG2021_PATCH"                      
 [62] "CHR_HG2022_PATCH"                      
 [63] "CHR_HG2023_PATCH"                      
 [64] "CHR_HG2030_PATCH"                      
 [65] "CHR_HG2046_PATCH"                      
 [66] "CHR_HG2047_PATCH"                      
 [67] "CHR_HG2057_PATCH"                      
 [68] "CHR_HG2058_PATCH"                      
 [69] "CHR_HG2060_PATCH"                      
 [70] "CHR_HG2062_PATCH"                      
 [71] "CHR_HG2063_PATCH"                      
 [72] "CHR_HG2066_PATCH"                      
 [73] "CHR_HG2067_PATCH"                      
 [74] "CHR_HG2072_PATCH"                      
 [75] "CHR_HG2087_PATCH"                      
 [76] "CHR_HG2088_PATCH"                      
 [77] "CHR_HG2095_PATCH"                      
 [78] "CHR_HG2104_PATCH"                      
 [79] "CHR_HG2111_PATCH"                      
 [80] "CHR_HG2114_PATCH"                      
 [81] "CHR_HG2115_PATCH"                      
 [82] "CHR_HG2116_PATCH"                      
 [83] "CHR_HG2121_PATCH"                      
 [84] "CHR_HG2128_PATCH"                      
 [85] "CHR_HG2133_PATCH"                      
 [86] "CHR_HG2191_PATCH"                      
 [87] "CHR_HG2198_PATCH"                      
 [88] "CHR_HG2213_PATCH"                      
 [89] "CHR_HG2217_PATCH"                      
 [90] "CHR_HG2232_PATCH"                      
 [91] "CHR_HG2233_PATCH"                      
 [92] "CHR_HG2235_PATCH"                      
 [93] "CHR_HG2236_PATCH"                      
 [94] "CHR_HG2239_PATCH"                      
 [95] "CHR_HG2246_HG2248_HG2276_PATCH"        
 [96] "CHR_HG2247_PATCH"                      
 [97] "CHR_HG2249_PATCH"                      
 [98] "CHR_HG2263_PATCH"                      
 [99] "CHR_HG2266_PATCH"                      
[100] "CHR_HG2285_HG106_HG2252_PATCH"         
[101] "CHR_HG2288_HG2289_PATCH"               
[102] "CHR_HG2290_PATCH"                      
[103] "CHR_HG2291_PATCH"                      
[104] "CHR_HG2334_PATCH"                      
[105] "CHR_HG2365_PATCH"                      
[106] "CHR_HG2412_PATCH"                      
[107] "CHR_HG2419_PATCH"                      
[108] "CHR_HG2442_PATCH"                      
[109] "CHR_HG2471_PATCH"                      
[110] "CHR_HG2499_PATCH"                      
[111] "CHR_HG2509_PATCH"                      
[112] "CHR_HG2510_PATCH"                      
[113] "CHR_HG2511_PATCH"                      
[114] "CHR_HG2512_PATCH"                      
[115] "CHR_HG2513_PATCH"                      
[116] "CHR_HG2525_PATCH"                      
[117] "CHR_HSCHRX_1_CTG3"                     
[118] "CHR_HSCHRX_2_CTG3"                     
[119] "CHR_HSCHRX_2_CTG12"                    
[120] "CHR_HSCHR1_ALT2_1_CTG32_1"             
[121] "CHR_HSCHR1_1_CTG3"                     
[122] "CHR_HSCHR1_1_CTG11"                    
[123] "CHR_HSCHR1_1_CTG31"                    
[124] "CHR_HSCHR1_1_CTG32_1"                  
[125] "CHR_HSCHR1_2_CTG3"                     
[126] "CHR_HSCHR1_2_CTG31"                    
[127] "CHR_HSCHR1_2_CTG32_1"                  
[128] "CHR_HSCHR1_3_CTG3"                     
[129] "CHR_HSCHR1_3_CTG31"                    
[130] "CHR_HSCHR1_3_CTG32_1"                  
[131] "CHR_HSCHR1_4_CTG3"                     
[132] "CHR_HSCHR1_4_CTG31"                    
[133] "CHR_HSCHR1_5_CTG3"                     
[134] "CHR_HSCHR1_5_CTG32_1"                  
[135] "CHR_HSCHR1_6_CTG3"                     
[136] "CHR_HSCHR1_8_CTG3"                     
[137] "CHR_HSCHR1_9_CTG3"                     
[138] "CHR_HSCHR2_1_CTG1"                     
[139] "CHR_HSCHR2_1_CTG5"                     
[140] "CHR_HSCHR2_1_CTG7"                     
[141] "CHR_HSCHR2_1_CTG7_2"                   
[142] "CHR_HSCHR2_1_CTG15"                    
[143] "CHR_HSCHR2_2_CTG1"                     
[144] "CHR_HSCHR2_2_CTG7"                     
[145] "CHR_HSCHR2_2_CTG7_2"                   
[146] "CHR_HSCHR2_2_CTG15"                    
[147] "CHR_HSCHR2_3_CTG1"                     
[148] "CHR_HSCHR2_3_CTG7_2"                   
[149] "CHR_HSCHR2_3_CTG15"                    
[150] "CHR_HSCHR2_4_CTG1"                     
[151] "CHR_HSCHR2_6_CTG7_2"                   
[152] "CHR_HSCHR2_7_CTG7_2"                   
[153] "CHR_HSCHR2_8_CTG7_2"                   
[154] "CHR_HSCHR3_1_CTG1"                     
[155] "CHR_HSCHR3_1_CTG2_1"                   
[156] "CHR_HSCHR3_1_CTG3"                     
[157] "CHR_HSCHR3_2_CTG2_1"                   
[158] "CHR_HSCHR3_2_CTG3"                     
[159] "CHR_HSCHR3_3_CTG1"                     
[160] "CHR_HSCHR3_3_CTG3"                     
[161] "CHR_HSCHR3_4_CTG1"                     
[162] "CHR_HSCHR3_4_CTG2_1"                   
[163] "CHR_HSCHR3_4_CTG3"                     
[164] "CHR_HSCHR3_5_CTG1"                     
[165] "CHR_HSCHR3_5_CTG2_1"                   
[166] "CHR_HSCHR3_5_CTG3"                     
[167] "CHR_HSCHR3_6_CTG2_1"                   
[168] "CHR_HSCHR3_6_CTG3"                     
[169] "CHR_HSCHR3_7_CTG3"                     
[170] "CHR_HSCHR3_8_CTG2_1"                   
[171] "CHR_HSCHR3_8_CTG3"                     
[172] "CHR_HSCHR3_9_CTG2_1"                   
[173] "CHR_HSCHR3_9_CTG3"                     
[174] "CHR_HSCHR4_1_CTG4"                     
[175] "CHR_HSCHR4_1_CTG6"                     
[176] "CHR_HSCHR4_1_CTG9"                     
[177] "CHR_HSCHR4_1_CTG12"                    
[178] "CHR_HSCHR4_2_CTG4"                     
[179] "CHR_HSCHR4_2_CTG12"                    
[180] "CHR_HSCHR4_3_CTG12"                    
[181] "CHR_HSCHR4_4_CTG12"                    
[182] "CHR_HSCHR4_5_CTG12"                    
[183] "CHR_HSCHR4_6_CTG12"                    
[184] "CHR_HSCHR4_7_CTG12"                    
[185] "CHR_HSCHR4_8_CTG12"                    
[186] "CHR_HSCHR4_9_CTG12"                    
[187] "CHR_HSCHR4_11_CTG12"                   
[188] "CHR_HSCHR4_12_CTG12"                   
[189] "CHR_HSCHR5_1_CTG1"                     
[190] "CHR_HSCHR5_1_CTG1_1"                   
[191] "CHR_HSCHR5_1_CTG5"                     
[192] "CHR_HSCHR5_2_CTG1"                     
[193] "CHR_HSCHR5_2_CTG1_1"                   
[194] "CHR_HSCHR5_2_CTG5"                     
[195] "CHR_HSCHR5_3_CTG1"                     
[196] "CHR_HSCHR5_3_CTG5"                     
[197] "CHR_HSCHR5_4_CTG1"                     
[198] "CHR_HSCHR5_4_CTG1_1"                   
[199] "CHR_HSCHR5_5_CTG1"                     
[200] "CHR_HSCHR5_6_CTG1"                     
[201] "CHR_HSCHR5_7_CTG1"                     
[202] "CHR_HSCHR5_8_CTG1"                     
[203] "CHR_HSCHR6_MHC_APD_CTG1"               
[204] "CHR_HSCHR6_MHC_COX_CTG1"               
[205] "CHR_HSCHR6_MHC_DBB_CTG1"               
[206] "CHR_HSCHR6_MHC_MANN_CTG1"              
[207] "CHR_HSCHR6_MHC_MCF_CTG1"               
[208] "CHR_HSCHR6_MHC_QBL_CTG1"               
[209] "CHR_HSCHR6_MHC_SSTO_CTG1"              
[210] "CHR_HSCHR6_1_CTG2"                     
[211] "CHR_HSCHR6_1_CTG3"                     
[212] "CHR_HSCHR6_1_CTG4"                     
[213] "CHR_HSCHR6_1_CTG5"                     
[214] "CHR_HSCHR6_1_CTG6"                     
[215] "CHR_HSCHR6_1_CTG7"                     
[216] "CHR_HSCHR6_1_CTG8"                     
[217] "CHR_HSCHR6_1_CTG9"                     
[218] "CHR_HSCHR6_8_CTG1"                     
[219] "CHR_HSCHR7_1_CTG1"                     
[220] "CHR_HSCHR7_1_CTG4_4"                   
[221] "CHR_HSCHR7_1_CTG6"                     
[222] "CHR_HSCHR7_1_CTG7"                     
[223] "CHR_HSCHR7_2_CTG1"                     
[224] "CHR_HSCHR7_2_CTG4_4"                   
[225] "CHR_HSCHR7_2_CTG6"                     
[226] "CHR_HSCHR7_2_CTG7"                     
[227] "CHR_HSCHR7_3_CTG1"                     
[228] "CHR_HSCHR7_3_CTG4_4"                   
[229] "CHR_HSCHR7_3_CTG6"                     
[230] "CHR_HSCHR8_1_CTG1"                     
[231] "CHR_HSCHR8_1_CTG6"                     
[232] "CHR_HSCHR8_1_CTG7"                     
[233] "CHR_HSCHR8_2_CTG1"                     
[234] "CHR_HSCHR8_2_CTG7"                     
[235] "CHR_HSCHR8_3_CTG1"                     
[236] "CHR_HSCHR8_3_CTG7"                     
[237] "CHR_HSCHR8_4_CTG1"                     
[238] "CHR_HSCHR8_4_CTG7"                     
[239] "CHR_HSCHR8_5_CTG1"                     
[240] "CHR_HSCHR8_5_CTG7"                     
[241] "CHR_HSCHR8_6_CTG1"                     
[242] "CHR_HSCHR8_7_CTG1"                     
[243] "CHR_HSCHR8_7_CTG7"                     
[244] "CHR_HSCHR8_8_CTG1"                     
[245] "CHR_HSCHR8_9_CTG1"                     
[246] "CHR_HSCHR9_1_CTG1"                     
[247] "CHR_HSCHR9_1_CTG2"                     
[248] "CHR_HSCHR9_1_CTG3"                     
[249] "CHR_HSCHR9_1_CTG4"                     
[250] "CHR_HSCHR9_1_CTG5"                     
[251] "CHR_HSCHR9_1_CTG6"                     
[252] "CHR_HSCHR10_1_CTG1"                    
[253] "CHR_HSCHR10_1_CTG2"                    
[254] "CHR_HSCHR10_1_CTG3"                    
[255] "CHR_HSCHR10_1_CTG4"                    
[256] "CHR_HSCHR10_1_CTG6"                    
[257] "CHR_HSCHR11_1_CTG1_2"                  
[258] "CHR_HSCHR11_1_CTG3_1"                  
[259] "CHR_HSCHR11_1_CTG5"                    
[260] "CHR_HSCHR11_1_CTG6"                    
[261] "CHR_HSCHR11_1_CTG7"                    
[262] "CHR_HSCHR11_1_CTG8"                    
[263] "CHR_HSCHR11_2_CTG1"                    
[264] "CHR_HSCHR11_2_CTG1_1"                  
[265] "CHR_HSCHR11_2_CTG8"                    
[266] "CHR_HSCHR11_3_CTG1"                    
[267] "CHR_HSCHR12_1_CTG1"                    
[268] "CHR_HSCHR12_1_CTG2_1"                  
[269] "CHR_HSCHR12_2_CTG2"                    
[270] "CHR_HSCHR12_2_CTG2_1"                  
[271] "CHR_HSCHR12_3_CTG2"                    
[272] "CHR_HSCHR12_3_CTG2_1"                  
[273] "CHR_HSCHR12_4_CTG2"                    
[274] "CHR_HSCHR12_4_CTG2_1"                  
[275] "CHR_HSCHR12_5_CTG2"                    
[276] "CHR_HSCHR12_5_CTG2_1"                  
[277] "CHR_HSCHR12_6_CTG2_1"                  
[278] "CHR_HSCHR12_8_CTG2_1"                  
[279] "CHR_HSCHR12_9_CTG2_1"                  
[280] "CHR_HSCHR13_1_CTG1"                    
[281] "CHR_HSCHR13_1_CTG3"                    
[282] "CHR_HSCHR13_1_CTG5"                    
[283] "CHR_HSCHR14_1_CTG1"                    
[284] "CHR_HSCHR14_2_CTG1"                    
[285] "CHR_HSCHR14_3_CTG1"                    
[286] "CHR_HSCHR14_7_CTG1"                    
[287] "CHR_HSCHR14_8_CTG1"                    
[288] "CHR_HSCHR14_9_CTG1"                    
[289] "CHR_HSCHR15_1_CTG1"                    
[290] "CHR_HSCHR15_1_CTG3"                    
[291] "CHR_HSCHR15_1_CTG8"                    
[292] "CHR_HSCHR15_2_CTG3"                    
[293] "CHR_HSCHR15_2_CTG8"                    
[294] "CHR_HSCHR15_3_CTG3"                    
[295] "CHR_HSCHR15_3_CTG8"                    
[296] "CHR_HSCHR15_4_CTG8"                    
[297] "CHR_HSCHR15_5_CTG8"                    
[298] "CHR_HSCHR15_6_CTG8"                    
[299] "CHR_HSCHR16_CTG2"                      
[300] "CHR_HSCHR16_1_CTG1"                    
[301] "CHR_HSCHR16_1_CTG3_1"                  
[302] "CHR_HSCHR16_2_CTG3_1"                  
[303] "CHR_HSCHR16_3_CTG1"                    
[304] "CHR_HSCHR16_4_CTG1"                    
[305] "CHR_HSCHR16_4_CTG3_1"                  
[306] "CHR_HSCHR16_5_CTG1"                    
[307] "CHR_HSCHR16_5_CTG3_1"                  
[308] "CHR_HSCHR17_1_CTG1"                    
[309] "CHR_HSCHR17_1_CTG2"                    
[310] "CHR_HSCHR17_1_CTG4"                    
[311] "CHR_HSCHR17_1_CTG5"                    
[312] "CHR_HSCHR17_1_CTG9"                    
[313] "CHR_HSCHR17_2_CTG1"                    
[314] "CHR_HSCHR17_2_CTG2"                    
[315] "CHR_HSCHR17_2_CTG4"                    
[316] "CHR_HSCHR17_2_CTG5"                    
[317] "CHR_HSCHR17_3_CTG1"                    
[318] "CHR_HSCHR17_3_CTG2"                    
[319] "CHR_HSCHR17_3_CTG4"                    
[320] "CHR_HSCHR17_4_CTG4"                    
[321] "CHR_HSCHR17_5_CTG4"                    
[322] "CHR_HSCHR17_6_CTG4"                    
[323] "CHR_HSCHR17_7_CTG4"                    
[324] "CHR_HSCHR17_8_CTG4"                    
[325] "CHR_HSCHR17_9_CTG4"                    
[326] "CHR_HSCHR17_10_CTG4"                   
[327] "CHR_HSCHR17_11_CTG4"                   
[328] "CHR_HSCHR17_12_CTG4"                   
[329] "CHR_HSCHR18_ALT2_CTG2_1"               
[330] "CHR_HSCHR18_ALT21_CTG2_1"              
[331] "CHR_HSCHR18_1_CTG1"                    
[332] "CHR_HSCHR18_1_CTG1_1"                  
[333] "CHR_HSCHR18_1_CTG2_1"                  
[334] "CHR_HSCHR18_2_CTG1_1"                  
[335] "CHR_HSCHR18_2_CTG2"                    
[336] "CHR_HSCHR18_2_CTG2_1"                  
[337] "CHR_HSCHR18_3_CTG2_1"                  
[338] "CHR_HSCHR18_5_CTG1_1"                  
[339] "CHR_HSCHR19_1_CTG2"                    
[340] "CHR_HSCHR19_1_CTG3_1"                  
[341] "CHR_HSCHR19_2_CTG2"                    
[342] "CHR_HSCHR19_2_CTG3_1"                  
[343] "CHR_HSCHR19_3_CTG2"                    
[344] "CHR_HSCHR19_3_CTG3_1"                  
[345] "CHR_HSCHR19_4_CTG2"                    
[346] "CHR_HSCHR19_4_CTG3_1"                  
[347] "CHR_HSCHR19_5_CTG2"                    
[348] "CHR_HSCHR19KIR_ABC08_AB_HAP_C_P_CTG3_1"
[349] "CHR_HSCHR19KIR_ABC08_AB_HAP_T_P_CTG3_1"
[350] "CHR_HSCHR19KIR_ABC08_A1_HAP_CTG3_1"    
[351] "CHR_HSCHR19KIR_CA01-TA01_1_CTG3_1"     
[352] "CHR_HSCHR19KIR_CA01-TA01_2_CTG3_1"     
[353] "CHR_HSCHR19KIR_CA01-TB01_CTG3_1"       
[354] "CHR_HSCHR19KIR_CA01-TB04_CTG3_1"       
[355] "CHR_HSCHR19KIR_CA04_CTG3_1"            
[356] "CHR_HSCHR19KIR_FH05_A_HAP_CTG3_1"      
[357] "CHR_HSCHR19KIR_FH05_B_HAP_CTG3_1"      
[358] "CHR_HSCHR19KIR_FH06_A_HAP_CTG3_1"      
[359] "CHR_HSCHR19KIR_FH06_BA1_HAP_CTG3_1"    
[360] "CHR_HSCHR19KIR_FH08_A_HAP_CTG3_1"      
[361] "CHR_HSCHR19KIR_FH08_BAX_HAP_CTG3_1"    
[362] "CHR_HSCHR19KIR_FH13_A_HAP_CTG3_1"      
[363] "CHR_HSCHR19KIR_FH13_BA2_HAP_CTG3_1"    
[364] "CHR_HSCHR19KIR_FH15_A_HAP_CTG3_1"      
[365] "CHR_HSCHR19KIR_FH15_B_HAP_CTG3_1"      
[366] "CHR_HSCHR19KIR_GRC212_AB_HAP_CTG3_1"   
[367] "CHR_HSCHR19KIR_GRC212_BA1_HAP_CTG3_1"  
[368] "CHR_HSCHR19KIR_G085_A_HAP_CTG3_1"      
[369] "CHR_HSCHR19KIR_G085_BA1_HAP_CTG3_1"    
[370] "CHR_HSCHR19KIR_G248_A_HAP_CTG3_1"      
[371] "CHR_HSCHR19KIR_G248_BA2_HAP_CTG3_1"    
[372] "CHR_HSCHR19KIR_HG2393_CTG3_1"          
[373] "CHR_HSCHR19KIR_HG2394_CTG3_1"          
[374] "CHR_HSCHR19KIR_HG2396_CTG3_1"          
[375] "CHR_HSCHR19KIR_LUCE_A_HAP_CTG3_1"      
[376] "CHR_HSCHR19KIR_LUCE_BDEL_HAP_CTG3_1"   
[377] "CHR_HSCHR19KIR_RP5_B_HAP_CTG3_1"       
[378] "CHR_HSCHR19KIR_RSH_A_HAP_CTG3_1"       
[379] "CHR_HSCHR19KIR_RSH_BA2_HAP_CTG3_1"     
[380] "CHR_HSCHR19KIR_T7526_A_HAP_CTG3_1"     
[381] "CHR_HSCHR19KIR_T7526_BDEL_HAP_CTG3_1"  
[382] "CHR_HSCHR19KIR_0010-5217-AB_CTG3_1"    
[383] "CHR_HSCHR19KIR_0019-4656-A_CTG3_1"     
[384] "CHR_HSCHR19KIR_0019-4656-B_CTG3_1"     
[385] "CHR_HSCHR19KIR_7191059-1_CTG3_1"       
[386] "CHR_HSCHR19KIR_7191059-2_CTG3_1"       
[387] "CHR_HSCHR19KIR_502960008-1_CTG3_1"     
[388] "CHR_HSCHR19KIR_502960008-2_CTG3_1"     
[389] "CHR_HSCHR19LRC_COX1_CTG3_1"            
[390] "CHR_HSCHR19LRC_COX2_CTG3_1"            
[391] "CHR_HSCHR19LRC_LRC_I_CTG3_1"           
[392] "CHR_HSCHR19LRC_LRC_J_CTG3_1"           
[393] "CHR_HSCHR19LRC_LRC_S_CTG3_1"           
[394] "CHR_HSCHR19LRC_LRC_T_CTG3_1"           
[395] "CHR_HSCHR19LRC_PGF1_CTG3_1"            
[396] "CHR_HSCHR19LRC_PGF2_CTG3_1"            
[397] "CHR_HSCHR20_1_CTG1"                    
[398] "CHR_HSCHR20_1_CTG2"                    
[399] "CHR_HSCHR20_1_CTG3"                    
[400] "CHR_HSCHR20_1_CTG4"                    
[401] "CHR_HSCHR21_2_CTG1_1"                  
[402] "CHR_HSCHR21_3_CTG1_1"                  
[403] "CHR_HSCHR21_4_CTG1_1"                  
[404] "CHR_HSCHR21_5_CTG2"                    
[405] "CHR_HSCHR21_6_CTG1_1"                  
[406] "CHR_HSCHR21_8_CTG1_1"                  
[407] "CHR_HSCHR22_1_CTG1"                    
[408] "CHR_HSCHR22_1_CTG2"                    
[409] "CHR_HSCHR22_1_CTG3"                    
[410] "CHR_HSCHR22_1_CTG4"                    
[411] "CHR_HSCHR22_1_CTG5"                    
[412] "CHR_HSCHR22_1_CTG6"                    
[413] "CHR_HSCHR22_1_CTG7"                    
[414] "CHR_HSCHR22_2_CTG1"                    
[415] "CHR_HSCHR22_3_CTG1"                    
[416] "CHR_HSCHR22_4_CTG1"                    
[417] "CHR_HSCHR22_5_CTG1"                    
[418] "CHR_HSCHR22_6_CTG1"                    
[419] "CHR_HSCHR22_7_CTG1"                    
[420] "CHR_HSCHR22_8_CTG1"                    
[421] "GL000009.2"                            
[422] "GL000194.1"                            
[423] "GL000195.1"                            
[424] "GL000205.2"                            
[425] "GL000213.1"                            
[426] "GL000216.2"                            
[427] "GL000218.1"                            
[428] "GL000219.1"                            
[429] "GL000220.1"                            
[430] "GL000225.1"                            
[431] "KI270442.1"                            
[432] "KI270711.1"                            
[433] "KI270713.1"                            
[434] "KI270721.1"                            
[435] "KI270726.1"                            
[436] "KI270727.1"                            
[437] "KI270728.1"                            
[438] "KI270731.1"                            
[439] "KI270733.1"                            
[440] "KI270734.1"                            
[441] "KI270744.1"                            
[442] "KI270750.1"                            
[443] "MT"                                    
[444] "X"                                     
[445] "Y"                                     
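
searchFilterOptions() narrows these values down with a regular expression, for example to the scaffolds whose names start with “GL”:

searchFilterOptions(mart = ensembl, filter = "chromosome_name", pattern = "^GL")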
 [1] "GL000009.2" "GL000194.1" "GL000195.1" "GL000205.2" "GL000213.1"
 [6] "GL000216.2" "GL000218.1" "GL000219.1" "GL000220.1" "GL000225.1"

The same search works for other filters, for example phenotype descriptions that mention “Crohn”:

searchFilterOptions(mart = ensembl, filter = "phenotype_description", pattern = "Crohn")
[1] "INFLAMMATORY BOWEL DISEASE CROHN DISEASE 1" 
[2] "INFLAMMATORY BOWEL DISEASE CROHN DISEASE 10"
[3] "INFLAMMATORY BOWEL DISEASE CROHN DISEASE 19"
[4] "INFLAMMATORY BOWEL DISEASE CROHN DISEASE 30"
[5] "NON RARE IN EUROPE: Crohn disease"          

8. Perform online ID conversion and annotate genes

Annotate a set of Affymetrix identifiers with HGNC symbols and chromosomal locations:

affyids=c("202763_at",
          "209310_s_at",
          "207500_at")

getBM(attributes = c('affy_hg_u133_plus_2', 
                     'hgnc_symbol', 
                     'chromosome_name',
                     'start_position', 
                     'end_position',
                     'band'),

      filters = 'affy_hg_u133_plus_2',
      values = affyids,
      mart = ensembl)
  affy_hg_u133_plus_2 hgnc_symbol chromosome_name start_position end_position  band
1           202763_at       CASP3               4      184627696    184649509 q35.1
2         209310_s_at       CASP4              11      104942866    104969366 q22.3
3           207500_at       CASP5              11      104994235    105023168 q22.3
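
Once the annotation is in a data frame, derived columns are easy to add. A minimal sketch, assuming the result is the table printed above (re-typed by hand so the example runs offline); the column names gene_length and cytoband are introduced here for illustration only:

```r
# Result of the annotation query above, re-typed by hand for an offline demo.
annot <- data.frame(
  hgnc_symbol = c("CASP3", "CASP4", "CASP5"),
  chromosome_name = c("4", "11", "11"),
  start_position = c(184627696, 104942866, 104994235),
  end_position = c(184649509, 104969366, 105023168),
  band = c("q35.1", "q22.3", "q22.3"),
  stringsAsFactors = FALSE
)

# Ensembl coordinates are 1-based and inclusive, hence the "+ 1".
annot$gene_length <- annot$end_position - annot$start_position + 1

# Conventional cytogenetic label, e.g. chromosome 4 band q35.1 -> "4q35.1".
annot$cytoband <- paste0(annot$chromosome_name, annot$band)

print(annot[, c("hgnc_symbol", "cytoband", "gene_length")])
```
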

Annotate a set of Entrez Gene identifiers with GO annotation:

entrez=c("673","837")

goids = getBM (attributes = c('entrezgene_id',
                            'go_id'),
              filters = 'entrezgene_id',
              values = entrez,
              mart = ensembl)

head(goids)
  entrezgene_id      go_id
1           673 GO:0043231
2           673 GO:0000166
3           673 GO:0004672
4           673 GO:0004674
5           673 GO:0005524
6           673 GO:0006468
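
The result has one row per gene and GO term pair, so it is often convenient to group the terms by gene. A minimal sketch using only the six rows shown by head(goids) above, re-typed by hand so it runs offline; the object names are chosen for illustration:

```r
# The six rows printed by head(goids), re-typed by hand.
goids_head <- data.frame(
  entrezgene_id = rep(673L, 6),
  go_id = c("GO:0043231", "GO:0000166", "GO:0004672",
            "GO:0004674", "GO:0005524", "GO:0006468"),
  stringsAsFactors = FALSE
)

# One character vector of GO IDs per gene.
go_by_gene <- split(goids_head$go_id, goids_head$entrezgene_id)

# Number of GO terms per gene, and a single ";"-separated string per gene.
n_terms <- lengths(go_by_gene)
collapsed <- vapply(go_by_gene, paste, character(1), collapse = ";")

print(n_terms)
print(collapsed)
```
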
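
Retrieve the HGNC symbols of all genes located on chromosomes 17, 20 or Y that are associated with specific GO terms:

go=c("GO:0051330","GO:0000080",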
     "GO:0000114","GO:0000082")

chrom=c(17,20,"Y")

getBM(attributes= "hgnc_symbol",
      filters=c("go","chromosome_name"),
      values=list(go, chrom), mart=ensembl)
  hgnc_symbol
1        E2F1
2     RPS6KB1
3        CDK3
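
When several filters are combined, values must be a list with one element per filter, in the same order as filters. One way to keep the two in step, sketched here as a suggestion rather than a biomaRt convention, is to build a single named list and derive both arguments from it. The names query, filters_arg and values_arg are hypothetical.

```r
go <- c("GO:0051330", "GO:0000080", "GO:0000114", "GO:0000082")
chrom <- c(17, 20, "Y")  # mixing numbers and "Y" yields a character vector

# One named list: the names are the filters, the elements are their values.
query <- list(go = go, chromosome_name = chrom)

filters_arg <- names(query)
values_arg <- unname(query)

print(filters_arg)
str(values_arg)

# With a live connection these could be passed on as:
# getBM(attributes = "hgnc_symbol", filters = filters_arg,
#       values = values_arg, mart = ensembl)
```
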

Annotate a set of RefSeq identifiers with InterPro protein domain identifiers:

refseqids = c("NM_005359","NM_000546")

ipro = getBM(attributes=c("refseq_mrna", "interpro",
                          "interpro_description"),
             filters="refseq_mrna",
             values=refseqids,
             mart=ensembl)

ipro
   refseq_mrna  interpro
1    NM_000546 IPR002117
2    NM_000546 IPR008967
3    NM_000546 IPR010991
4    NM_000546 IPR011615
5    NM_000546 IPR012346
6    NM_000546 IPR013872
7    NM_000546 IPR036674
8    NM_000546 IPR040926
9    NM_005359 IPR001132
10   NM_005359 IPR003619
11   NM_005359 IPR008984
12   NM_005359 IPR013019
13   NM_005359 IPR013790
14   NM_005359 IPR017855
15   NM_005359 IPR036578
                                                 interpro_description
1                                        p53 tumour suppressor family
2                          p53-like transcription factor, DNA-binding
3                                         p53, tetramerisation domain
4                                             p53, DNA-binding domain
5  p53/RUNT-type transcription factor, DNA-binding domain superfamily
6                                          p53 transactivation domain
7                         p53-like tetramerisation domain superfamily
8                Cellular tumor antigen p53, transactivation domain 2
9                                           SMAD domain, Dwarfin-type
10                                       MAD homology 1, Dwarfin-type
11                                        SMAD/FHA domain superfamily
12                                                  MAD homology, MH1
13                                                            Dwarfin
14                                       SMAD-like domain superfamily
15                                        SMAD MH1 domain superfamily

InterPro provides functional analysis of proteins by classifying them into families and predicting domains as well as important sites.
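
A quick way to summarise such a result is to count the InterPro entries per transcript and look for entries the transcripts share. The sketch below assumes the identifiers printed in the ipro table above, re-typed by hand so it runs offline; the object names are chosen for illustration.

```r
# Identifier columns of the ipro result above, re-typed by hand.
ipro_ids <- data.frame(
  refseq_mrna = c(rep("NM_000546", 8), rep("NM_005359", 7)),
  interpro = c("IPR002117", "IPR008967", "IPR010991", "IPR011615",
               "IPR012346", "IPR013872", "IPR036674", "IPR040926",
               "IPR001132", "IPR003619", "IPR008984", "IPR013019",
               "IPR013790", "IPR017855", "IPR036578"),
  stringsAsFactors = FALSE
)

# Number of InterPro entries annotated to each transcript.
domain_counts <- table(ipro_ids$refseq_mrna)
print(domain_counts)

# InterPro entries common to both transcripts (none in this result).
by_transcript <- split(ipro_ids$interpro, ipro_ids$refseq_mrna)
shared <- intersect(by_transcript[["NM_000546"]], by_transcript[["NM_005359"]])
print(shared)
```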

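Select all Affymetrix identifiers on the u133plus2 chip and the Ensembl gene identifiers for genes located on chromosome 16 between base pairs 1100000 and 1250000:

getBM(attributes = c('affy_hg_u133_plus_2',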
                     'ensembl_gene_id'),
      filters = c('chromosome_name','start','end'),
      values = list(16,1100000,1250000),
      mart = ensembl)
   affy_hg_u133_plus_2 ensembl_gene_id
1                      ENSG00000260702
2            215502_at ENSG00000260532
3                      ENSG00000273551
4            205845_at ENSG00000196557
5                      ENSG00000196557
6                      ENSG00000260403
7                      ENSG00000259910
8                      ENSG00000261294
9          220339_s_at ENSG00000116176
10                     ENSG00000277010
11         215382_x_at ENSG00000197253
12         207134_x_at ENSG00000197253
13         216474_x_at ENSG00000197253
14         217023_x_at ENSG00000197253
15         205683_x_at ENSG00000197253
16         210084_x_at ENSG00000197253
17         215382_x_at ENSG00000172236
18         207134_x_at ENSG00000172236
19         216474_x_at ENSG00000172236
20         217023_x_at ENSG00000172236
21         205683_x_at ENSG00000172236
22         210084_x_at ENSG00000172236
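
In this output, genes without a probe on the array appear with an empty affy_hg_u133_plus_2 value, and some probes are listed against more than one gene. A minimal sketch of tidying such a result, assuming the 22 rows printed above (re-typed by hand so the example runs offline) and assuming the blanks are empty strings rather than NA:

```r
# The 22 rows printed above, re-typed by hand; "" marks a missing probe ID.
x_probes <- c("215382_x_at", "207134_x_at", "216474_x_at",
              "217023_x_at", "205683_x_at", "210084_x_at")
region <- data.frame(
  affy_hg_u133_plus_2 = c("", "215502_at", "", "205845_at", "", "", "", "",
                          "220339_s_at", "", x_probes, x_probes),
  ensembl_gene_id = c("ENSG00000260702", "ENSG00000260532",
                      "ENSG00000273551", "ENSG00000196557",
                      "ENSG00000196557", "ENSG00000260403",
                      "ENSG00000259910", "ENSG00000261294",
                      "ENSG00000116176", "ENSG00000277010",
                      rep("ENSG00000197253", 6), rep("ENSG00000172236", 6)),
  stringsAsFactors = FALSE
)

# Keep only the rows that actually carry a probe identifier.
with_probe <- region[nzchar(region$affy_hg_u133_plus_2), ]

# Count distinct genes per probe and list the probes hitting several genes.
genes_per_probe <- tapply(with_probe$ensembl_gene_id,
                          with_probe$affy_hg_u133_plus_2,
                          function(g) length(unique(g)))
multi_gene_probes <- names(genes_per_probe)[genes_per_probe > 1]

print(nrow(with_probe))
print(multi_gene_probes)
```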