BioMart is an easy-to-use, web-based tool that lets users extract data without programming skills or knowledge of the underlying database structure. The left panel is used for navigation, and filters and attributes are chosen in the right panel.
#Set working directory
setwd("C:/Users/manso/OneDrive - University of West London/MSc Bioinformatics - UWL/done/BFG - Bioinformatics and Functional Genomics - final mark =/Practicals")
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install()
BiocManager::install("biomaRt")
library("biomaRt")
listEnsembl()
biomart version
1 genes Ensembl Genes 106
2 mouse_strains Mouse strains 106
3 snps Ensembl Variation 106
4 regulation Ensembl Regulation 106
listMarts()
biomart version
1 ENSEMBL_MART_ENSEMBL Ensembl Genes 106
2 ENSEMBL_MART_MOUSE Mouse strains 106
3 ENSEMBL_MART_SNP Ensembl Variation 106
4 ENSEMBL_MART_FUNCGEN Ensembl Regulation 106
The first step in working with BioMart data is selecting a database. We will connect to Ensembl’s Human Genes BioMart using the command below.
ensembl <- useEnsembl(biomart = "genes",
dataset = "hsapiens_gene_ensembl")
With the default parameters for useEnsembl(), your queries will be routed to the nearest geographical mirror.
However, it’s possible to use the mirror argument to explicitly request a specific mirror:
ensembl <- useEnsembl(biomart = "genes",
dataset = "hsapiens_gene_ensembl",
mirror = "uswest")
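If the requested mirror is unavailable, one option is to try several mirrors in turn. The following is a minimal sketch under stated assumptions: connect_with_fallback() is a hypothetical helper written for this illustration (it is not part of biomaRt), and the mirror names passed to it should be checked against the current biomaRt documentation. The demonstration uses a stand-in connector so that it runs without network access.

```r
# Hypothetical helper (not part of biomaRt): try each mirror in turn and
# return the first connection that succeeds.
connect_with_fallback <- function(mirrors, connect) {
  for (m in mirrors) {
    # tryCatch() turns a failed connection into NULL so the loop can continue
    mart <- tryCatch(connect(m), error = function(e) NULL)
    if (!is.null(mart)) {
      return(mart)
    }
  }
  stop("could not connect to any of: ", paste(mirrors, collapse = ", "))
}

# Offline demonstration with a stand-in connector that fails for "uswest"
fake_connect <- function(m) {
  if (m == "uswest") stop("mirror unavailable")
  paste("connected to", m)
}
demo <- connect_with_fallback(c("uswest", "useast"), fake_connect)
print(demo)

# Real usage (needs network access and the biomaRt package):
# ensembl <- connect_with_fallback(
#   c("uswest", "useast"),
#   function(m) useEnsembl(biomart = "genes",
#                          dataset = "hsapiens_gene_ensembl",
#                          mirror = m)
# )
```

Passing the connector as a function keeps the retry logic separate from the biomaRt call, which also makes it easy to test.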
listDatasets() shows which datasets are available in the selected BioMart (biomart = "genes"):
datasets <- listDatasets(ensembl)
head(datasets)
dataset description
1 abrachyrhynchus_gene_ensembl Pink-footed goose genes (ASM259213v1)
2 acalliptera_gene_ensembl Eastern happy genes (fAstCal1.2)
3 acarolinensis_gene_ensembl Green anole genes (AnoCar2.0v2)
4 acchrysaetos_gene_ensembl Golden eagle genes (bAquChr1.2)
5 acitrinellus_gene_ensembl Midas cichlid genes (Midas_v5)
6 amelanoleuca_gene_ensembl Giant panda genes (ASM200744v2)
version
1 ASM259213v1
2 fAstCal1.2
3 AnoCar2.0v2
4 bAquChr1.2
5 Midas_v5
6 ASM200744v2
searchDatasets(mart = ensembl, pattern = "hsapiens")
dataset description version
81 hsapiens_gene_ensembl Human genes (GRCh38.p13) GRCh38.p13
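# Alternatively, select the dataset on an existing Mart object
ensembl = useDataset("hsapiens_gene_ensembl", mart=ensembl)
# or connect and select the dataset in one step with useMart()
ensembl = useMart("ensembl",dataset="hsapiens_gene_ensembl")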
Now that we have selected our dataset, we’ll create a query and send it to the Ensembl BioMart server.
getBM() has 4 main arguments:
Attributes: vector of attributes that one wants to retrieve (= the output of the query).
Filters: vector of filters that one will use as input to the query.
Values: values for the filters. In case of multiple filters, the values argument requires a list of values where each position in the list corresponds to the position of the filters in the filters argument (see examples below).
Mart: object of class Mart, which is created by the useEnsembl() function.
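The positional pairing of filters and values can be illustrated without contacting the server. In the sketch below, pair_filters() is a hypothetical helper written for this illustration, not a biomaRt function; it only checks that each filter has a matching entry in values and labels the values accordingly.

```r
# Hypothetical helper: label each element of `values` with the filter at the
# same position, and fail early if the lengths do not match.
pair_filters <- function(filters, values) {
  # A single filter may be given a plain vector; wrap it so both cases are lists
  if (!is.list(values)) {
    values <- list(values)
  }
  if (length(filters) != length(values)) {
    stop("need one element of 'values' per filter")
  }
  setNames(values, filters)
}

paired <- pair_filters(
  filters = c("chromosome_name", "start", "end"),
  values  = list(16, 1100000, 1250000)
)
str(paired)
```

A check of this kind, run before calling getBM(), catches a values list whose length does not match the filters vector.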
listFilters() displays all the available filters in the selected dataset:
filters = listFilters(ensembl)
filters[1:5,]
name description
1 chromosome_name Chromosome/scaffold name
2 start Start
3 end End
4 band_start Band Start
5 band_end Band End
Attributes define the data we are interested in retrieving.
listAttributes() displays all the available attributes in the selected dataset:
attributes = listAttributes(ensembl)
attributes[1:5,]
name description
1 ensembl_gene_id Gene stable ID
2 ensembl_gene_id_version Gene stable ID version
3 ensembl_transcript_id Transcript stable ID
4 ensembl_transcript_id_version Transcript stable ID version
5 ensembl_peptide_id Protein stable ID
page
1 feature_page
2 feature_page
3 feature_page
4 feature_page
5 feature_page
Let’s apply this knowledge to solve the following problem:
We have a list of Affymetrix identifiers from the u133plus2 platform and we want to retrieve the corresponding EntrezGene identifiers using Ensembl mappings.
The u133plus2 platform will be the filter for this query and as values for this filter we use our list of Affymetrix identifiers.
As output (attributes) for the query we want to retrieve the EntrezGene and u133plus2 identifiers so we get a mapping of these two identifiers as a result.
The exact names that we have to use to specify the attributes and filters can be retrieved with the listAttributes() and listFilters() functions, respectively.
affyids <- c("202763_at","209310_s_at","207500_at")
getBM(attributes = c('affy_hg_u133_plus_2',
'entrezgene_id',
'ensembl_gene_id'),
filters = 'affy_hg_u133_plus_2',
values = affyids,
mart = ensembl)
affy_hg_u133_plus_2 entrezgene_id ensembl_gene_id
1 202763_at 836 ENSG00000164305
2 209310_s_at 837 ENSG00000196954
3 207500_at 838 ENSG00000137757
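After a mapping query it is worth checking that every input identifier was mapped and, if order matters, re-aligning the result to the input. The sketch below assumes the server may return rows in a different order than the input vector. It rebuilds the result printed above as a local data frame (a stand-in for the getBM() output, so no server is contacted) and uses only base R.

```r
affyids <- c("202763_at", "209310_s_at", "207500_at")

# Local copy of the result shown above, standing in for the getBM() output
res <- data.frame(
  affy_hg_u133_plus_2 = c("202763_at", "209310_s_at", "207500_at"),
  entrezgene_id       = c(836L, 837L, 838L),
  ensembl_gene_id     = c("ENSG00000164305", "ENSG00000196954",
                          "ENSG00000137757"),
  stringsAsFactors    = FALSE
)

# Identifiers that were queried but are absent from the result
unmapped <- setdiff(affyids, res$affy_hg_u133_plus_2)

# Re-order the rows to follow the input vector (first match per identifier)
ordered <- res[match(affyids, res$affy_hg_u133_plus_2), ]

print(unmapped)
print(ordered)
```

Note that match() keeps only the first row per identifier, so this re-ordering suits one-to-one mappings; a probe set that maps to several genes would need merge() or a similar join instead.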
listAttributes() and listFilters() return every available option for their respective types, which can produce very long output in which it is hard to find the value you are interested in.
searchAttributes() and searchFilters() try to find any entries matching a specific term or pattern. searchDatasets() works similarly.
In this example we get the details for all the attributes that contain the pattern “hgnc”:
searchAttributes(mart = ensembl, pattern = "hgnc")
name description page
62 hgnc_id HGNC ID feature_page
63 hgnc_symbol HGNC symbol feature_page
91 hgnc_trans_name Transcript name ID feature_page
searchDatasets(mart = ensembl, pattern = "hsapiens")
dataset description version
81 hsapiens_gene_ensembl Human genes (GRCh38.p13) GRCh38.p13
The pattern argument takes a regular expression, which means we can create more complex queries if required.
For example, suppose we have the string ENST00000577249.1 and are not sure what the appropriate filter term is. The example below uses a pattern that finds all filters whose name contains the term “ensembl” followed by “id”, allowing us to reduce the list of filters.
searchFilters(mart = ensembl, pattern = "ensembl.*id")
name
52 ensembl_gene_id
53 ensembl_gene_id_version
54 ensembl_transcript_id
55 ensembl_transcript_id_version
56 ensembl_peptide_id
57 ensembl_peptide_id_version
58 ensembl_exon_id
description
52 Gene stable ID(s) [e.g. ENSG00000000003]
53 Gene stable ID(s) with version [e.g. ENSG00000000003.15]
54 Transcript stable ID(s) [e.g. ENST00000000233]
55 Transcript stable ID(s) with version [e.g. ENST00000000233.10]
56 Protein stable ID(s) [e.g. ENSP00000000233]
57 Protein stable ID(s) with version [e.g. ENSP00000000233.5]
58 Exon ID(s) [e.g. ENSE00000000003]
Now we can compare ENST00000577249.1 against the examples in the description column and see that it is a transcript ID with version. Thus the appropriate filter to use with it is ensembl_transcript_id_version.
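The same comparison can be done programmatically. The sketch below is a hypothetical helper, not part of biomaRt. It assumes human stable IDs of the form ENSG, ENST or ENSP followed by digits and an optional ".version" suffix, and it returns the corresponding filter name from the list above, or NA when the identifier does not fit that form.

```r
# Hypothetical helper: guess the Ensembl filter name from the shape of an ID.
guess_ensembl_filter <- function(id) {
  # Anything that is not ENSG/ENST/ENSP + digits (+ optional .version) is unknown
  if (!grepl("^ENS[GTP][0-9]+(\\.[0-9]+)?$", id)) {
    return(NA_character_)
  }
  # The fourth character encodes the feature type
  type <- switch(substr(id, 4, 4),
                 G = "gene", T = "transcript", P = "peptide")
  # A trailing ".<digits>" means the ID carries a version
  versioned <- grepl("\\.[0-9]+$", id)
  paste0("ensembl_", type, "_id", if (versioned) "_version" else "")
}

print(guess_ensembl_filter("ENST00000577249.1"))
print(guess_ensembl_filter("ENSG00000164305"))
print(guess_ensembl_filter("202763_at"))
```

Exon IDs and identifiers from other species are deliberately left out of this sketch and return NA.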
listFilterOptions(mart = ensembl, filter = "chromosome_name")
[1] "1"
[2] "2"
[3] "3"
[4] "4"
[5] "5"
[6] "6"
[7] "7"
[8] "8"
[9] "9"
[10] "10"
[11] "11"
[12] "12"
[13] "13"
[14] "14"
[15] "15"
[16] "16"
[17] "17"
[18] "18"
[19] "19"
[20] "20"
[21] "21"
[22] "22"
[23] "CHR_HG1_PATCH"
[24] "CHR_HG26_PATCH"
[25] "CHR_HG28_PATCH"
[26] "CHR_HG30_PATCH"
[27] "CHR_HG76_PATCH"
[28] "CHR_HG107_PATCH"
[29] "CHR_HG109_PATCH"
[30] "CHR_HG126_PATCH"
[31] "CHR_HG142_HG150_NOVEL_TEST"
[32] "CHR_HG151_NOVEL_TEST"
[33] "CHR_HG439_PATCH"
[34] "CHR_HG545_PATCH"
[35] "CHR_HG699_PATCH"
[36] "CHR_HG705_PATCH"
[37] "CHR_HG708_PATCH"
[38] "CHR_HG721_PATCH"
[39] "CHR_HG926_PATCH"
[40] "CHR_HG986_PATCH"
[41] "CHR_HG1277_PATCH"
[42] "CHR_HG1298_PATCH"
[43] "CHR_HG1309_PATCH"
[44] "CHR_HG1311_PATCH"
[45] "CHR_HG1320_PATCH"
[46] "CHR_HG1342_HG2282_PATCH"
[47] "CHR_HG1362_PATCH"
[48] "CHR_HG1384_PATCH"
[49] "CHR_HG1395_PATCH"
[50] "CHR_HG1398_PATCH"
[51] "CHR_HG1445_PATCH"
[52] "CHR_HG1485_PATCH"
[53] "CHR_HG1524_PATCH"
[54] "CHR_HG1531_PATCH"
[55] "CHR_HG1535_PATCH"
[56] "CHR_HG1651_PATCH"
[57] "CHR_HG1708_PATCH"
[58] "CHR_HG1815_PATCH"
[59] "CHR_HG1832_PATCH"
[60] "CHR_HG2002_PATCH"
[61] "CHR_HG2021_PATCH"
[62] "CHR_HG2022_PATCH"
[63] "CHR_HG2023_PATCH"
[64] "CHR_HG2030_PATCH"
[65] "CHR_HG2046_PATCH"
[66] "CHR_HG2047_PATCH"
[67] "CHR_HG2057_PATCH"
[68] "CHR_HG2058_PATCH"
[69] "CHR_HG2060_PATCH"
[70] "CHR_HG2062_PATCH"
[71] "CHR_HG2063_PATCH"
[72] "CHR_HG2066_PATCH"
[73] "CHR_HG2067_PATCH"
[74] "CHR_HG2072_PATCH"
[75] "CHR_HG2087_PATCH"
[76] "CHR_HG2088_PATCH"
[77] "CHR_HG2095_PATCH"
[78] "CHR_HG2104_PATCH"
[79] "CHR_HG2111_PATCH"
[80] "CHR_HG2114_PATCH"
[81] "CHR_HG2115_PATCH"
[82] "CHR_HG2116_PATCH"
[83] "CHR_HG2121_PATCH"
[84] "CHR_HG2128_PATCH"
[85] "CHR_HG2133_PATCH"
[86] "CHR_HG2191_PATCH"
[87] "CHR_HG2198_PATCH"
[88] "CHR_HG2213_PATCH"
[89] "CHR_HG2217_PATCH"
[90] "CHR_HG2232_PATCH"
[91] "CHR_HG2233_PATCH"
[92] "CHR_HG2235_PATCH"
[93] "CHR_HG2236_PATCH"
[94] "CHR_HG2239_PATCH"
[95] "CHR_HG2246_HG2248_HG2276_PATCH"
[96] "CHR_HG2247_PATCH"
[97] "CHR_HG2249_PATCH"
[98] "CHR_HG2263_PATCH"
[99] "CHR_HG2266_PATCH"
[100] "CHR_HG2285_HG106_HG2252_PATCH"
[101] "CHR_HG2288_HG2289_PATCH"
[102] "CHR_HG2290_PATCH"
[103] "CHR_HG2291_PATCH"
[104] "CHR_HG2334_PATCH"
[105] "CHR_HG2365_PATCH"
[106] "CHR_HG2412_PATCH"
[107] "CHR_HG2419_PATCH"
[108] "CHR_HG2442_PATCH"
[109] "CHR_HG2471_PATCH"
[110] "CHR_HG2499_PATCH"
[111] "CHR_HG2509_PATCH"
[112] "CHR_HG2510_PATCH"
[113] "CHR_HG2511_PATCH"
[114] "CHR_HG2512_PATCH"
[115] "CHR_HG2513_PATCH"
[116] "CHR_HG2525_PATCH"
[117] "CHR_HSCHRX_1_CTG3"
[118] "CHR_HSCHRX_2_CTG3"
[119] "CHR_HSCHRX_2_CTG12"
[120] "CHR_HSCHR1_ALT2_1_CTG32_1"
[121] "CHR_HSCHR1_1_CTG3"
[122] "CHR_HSCHR1_1_CTG11"
[123] "CHR_HSCHR1_1_CTG31"
[124] "CHR_HSCHR1_1_CTG32_1"
[125] "CHR_HSCHR1_2_CTG3"
[126] "CHR_HSCHR1_2_CTG31"
[127] "CHR_HSCHR1_2_CTG32_1"
[128] "CHR_HSCHR1_3_CTG3"
[129] "CHR_HSCHR1_3_CTG31"
[130] "CHR_HSCHR1_3_CTG32_1"
[131] "CHR_HSCHR1_4_CTG3"
[132] "CHR_HSCHR1_4_CTG31"
[133] "CHR_HSCHR1_5_CTG3"
[134] "CHR_HSCHR1_5_CTG32_1"
[135] "CHR_HSCHR1_6_CTG3"
[136] "CHR_HSCHR1_8_CTG3"
[137] "CHR_HSCHR1_9_CTG3"
[138] "CHR_HSCHR2_1_CTG1"
[139] "CHR_HSCHR2_1_CTG5"
[140] "CHR_HSCHR2_1_CTG7"
[141] "CHR_HSCHR2_1_CTG7_2"
[142] "CHR_HSCHR2_1_CTG15"
[143] "CHR_HSCHR2_2_CTG1"
[144] "CHR_HSCHR2_2_CTG7"
[145] "CHR_HSCHR2_2_CTG7_2"
[146] "CHR_HSCHR2_2_CTG15"
[147] "CHR_HSCHR2_3_CTG1"
[148] "CHR_HSCHR2_3_CTG7_2"
[149] "CHR_HSCHR2_3_CTG15"
[150] "CHR_HSCHR2_4_CTG1"
[151] "CHR_HSCHR2_6_CTG7_2"
[152] "CHR_HSCHR2_7_CTG7_2"
[153] "CHR_HSCHR2_8_CTG7_2"
[154] "CHR_HSCHR3_1_CTG1"
[155] "CHR_HSCHR3_1_CTG2_1"
[156] "CHR_HSCHR3_1_CTG3"
[157] "CHR_HSCHR3_2_CTG2_1"
[158] "CHR_HSCHR3_2_CTG3"
[159] "CHR_HSCHR3_3_CTG1"
[160] "CHR_HSCHR3_3_CTG3"
[161] "CHR_HSCHR3_4_CTG1"
[162] "CHR_HSCHR3_4_CTG2_1"
[163] "CHR_HSCHR3_4_CTG3"
[164] "CHR_HSCHR3_5_CTG1"
[165] "CHR_HSCHR3_5_CTG2_1"
[166] "CHR_HSCHR3_5_CTG3"
[167] "CHR_HSCHR3_6_CTG2_1"
[168] "CHR_HSCHR3_6_CTG3"
[169] "CHR_HSCHR3_7_CTG3"
[170] "CHR_HSCHR3_8_CTG2_1"
[171] "CHR_HSCHR3_8_CTG3"
[172] "CHR_HSCHR3_9_CTG2_1"
[173] "CHR_HSCHR3_9_CTG3"
[174] "CHR_HSCHR4_1_CTG4"
[175] "CHR_HSCHR4_1_CTG6"
[176] "CHR_HSCHR4_1_CTG9"
[177] "CHR_HSCHR4_1_CTG12"
[178] "CHR_HSCHR4_2_CTG4"
[179] "CHR_HSCHR4_2_CTG12"
[180] "CHR_HSCHR4_3_CTG12"
[181] "CHR_HSCHR4_4_CTG12"
[182] "CHR_HSCHR4_5_CTG12"
[183] "CHR_HSCHR4_6_CTG12"
[184] "CHR_HSCHR4_7_CTG12"
[185] "CHR_HSCHR4_8_CTG12"
[186] "CHR_HSCHR4_9_CTG12"
[187] "CHR_HSCHR4_11_CTG12"
[188] "CHR_HSCHR4_12_CTG12"
[189] "CHR_HSCHR5_1_CTG1"
[190] "CHR_HSCHR5_1_CTG1_1"
[191] "CHR_HSCHR5_1_CTG5"
[192] "CHR_HSCHR5_2_CTG1"
[193] "CHR_HSCHR5_2_CTG1_1"
[194] "CHR_HSCHR5_2_CTG5"
[195] "CHR_HSCHR5_3_CTG1"
[196] "CHR_HSCHR5_3_CTG5"
[197] "CHR_HSCHR5_4_CTG1"
[198] "CHR_HSCHR5_4_CTG1_1"
[199] "CHR_HSCHR5_5_CTG1"
[200] "CHR_HSCHR5_6_CTG1"
[201] "CHR_HSCHR5_7_CTG1"
[202] "CHR_HSCHR5_8_CTG1"
[203] "CHR_HSCHR6_MHC_APD_CTG1"
[204] "CHR_HSCHR6_MHC_COX_CTG1"
[205] "CHR_HSCHR6_MHC_DBB_CTG1"
[206] "CHR_HSCHR6_MHC_MANN_CTG1"
[207] "CHR_HSCHR6_MHC_MCF_CTG1"
[208] "CHR_HSCHR6_MHC_QBL_CTG1"
[209] "CHR_HSCHR6_MHC_SSTO_CTG1"
[210] "CHR_HSCHR6_1_CTG2"
[211] "CHR_HSCHR6_1_CTG3"
[212] "CHR_HSCHR6_1_CTG4"
[213] "CHR_HSCHR6_1_CTG5"
[214] "CHR_HSCHR6_1_CTG6"
[215] "CHR_HSCHR6_1_CTG7"
[216] "CHR_HSCHR6_1_CTG8"
[217] "CHR_HSCHR6_1_CTG9"
[218] "CHR_HSCHR6_8_CTG1"
[219] "CHR_HSCHR7_1_CTG1"
[220] "CHR_HSCHR7_1_CTG4_4"
[221] "CHR_HSCHR7_1_CTG6"
[222] "CHR_HSCHR7_1_CTG7"
[223] "CHR_HSCHR7_2_CTG1"
[224] "CHR_HSCHR7_2_CTG4_4"
[225] "CHR_HSCHR7_2_CTG6"
[226] "CHR_HSCHR7_2_CTG7"
[227] "CHR_HSCHR7_3_CTG1"
[228] "CHR_HSCHR7_3_CTG4_4"
[229] "CHR_HSCHR7_3_CTG6"
[230] "CHR_HSCHR8_1_CTG1"
[231] "CHR_HSCHR8_1_CTG6"
[232] "CHR_HSCHR8_1_CTG7"
[233] "CHR_HSCHR8_2_CTG1"
[234] "CHR_HSCHR8_2_CTG7"
[235] "CHR_HSCHR8_3_CTG1"
[236] "CHR_HSCHR8_3_CTG7"
[237] "CHR_HSCHR8_4_CTG1"
[238] "CHR_HSCHR8_4_CTG7"
[239] "CHR_HSCHR8_5_CTG1"
[240] "CHR_HSCHR8_5_CTG7"
[241] "CHR_HSCHR8_6_CTG1"
[242] "CHR_HSCHR8_7_CTG1"
[243] "CHR_HSCHR8_7_CTG7"
[244] "CHR_HSCHR8_8_CTG1"
[245] "CHR_HSCHR8_9_CTG1"
[246] "CHR_HSCHR9_1_CTG1"
[247] "CHR_HSCHR9_1_CTG2"
[248] "CHR_HSCHR9_1_CTG3"
[249] "CHR_HSCHR9_1_CTG4"
[250] "CHR_HSCHR9_1_CTG5"
[251] "CHR_HSCHR9_1_CTG6"
[252] "CHR_HSCHR10_1_CTG1"
[253] "CHR_HSCHR10_1_CTG2"
[254] "CHR_HSCHR10_1_CTG3"
[255] "CHR_HSCHR10_1_CTG4"
[256] "CHR_HSCHR10_1_CTG6"
[257] "CHR_HSCHR11_1_CTG1_2"
[258] "CHR_HSCHR11_1_CTG3_1"
[259] "CHR_HSCHR11_1_CTG5"
[260] "CHR_HSCHR11_1_CTG6"
[261] "CHR_HSCHR11_1_CTG7"
[262] "CHR_HSCHR11_1_CTG8"
[263] "CHR_HSCHR11_2_CTG1"
[264] "CHR_HSCHR11_2_CTG1_1"
[265] "CHR_HSCHR11_2_CTG8"
[266] "CHR_HSCHR11_3_CTG1"
[267] "CHR_HSCHR12_1_CTG1"
[268] "CHR_HSCHR12_1_CTG2_1"
[269] "CHR_HSCHR12_2_CTG2"
[270] "CHR_HSCHR12_2_CTG2_1"
[271] "CHR_HSCHR12_3_CTG2"
[272] "CHR_HSCHR12_3_CTG2_1"
[273] "CHR_HSCHR12_4_CTG2"
[274] "CHR_HSCHR12_4_CTG2_1"
[275] "CHR_HSCHR12_5_CTG2"
[276] "CHR_HSCHR12_5_CTG2_1"
[277] "CHR_HSCHR12_6_CTG2_1"
[278] "CHR_HSCHR12_8_CTG2_1"
[279] "CHR_HSCHR12_9_CTG2_1"
[280] "CHR_HSCHR13_1_CTG1"
[281] "CHR_HSCHR13_1_CTG3"
[282] "CHR_HSCHR13_1_CTG5"
[283] "CHR_HSCHR14_1_CTG1"
[284] "CHR_HSCHR14_2_CTG1"
[285] "CHR_HSCHR14_3_CTG1"
[286] "CHR_HSCHR14_7_CTG1"
[287] "CHR_HSCHR14_8_CTG1"
[288] "CHR_HSCHR14_9_CTG1"
[289] "CHR_HSCHR15_1_CTG1"
[290] "CHR_HSCHR15_1_CTG3"
[291] "CHR_HSCHR15_1_CTG8"
[292] "CHR_HSCHR15_2_CTG3"
[293] "CHR_HSCHR15_2_CTG8"
[294] "CHR_HSCHR15_3_CTG3"
[295] "CHR_HSCHR15_3_CTG8"
[296] "CHR_HSCHR15_4_CTG8"
[297] "CHR_HSCHR15_5_CTG8"
[298] "CHR_HSCHR15_6_CTG8"
[299] "CHR_HSCHR16_CTG2"
[300] "CHR_HSCHR16_1_CTG1"
[301] "CHR_HSCHR16_1_CTG3_1"
[302] "CHR_HSCHR16_2_CTG3_1"
[303] "CHR_HSCHR16_3_CTG1"
[304] "CHR_HSCHR16_4_CTG1"
[305] "CHR_HSCHR16_4_CTG3_1"
[306] "CHR_HSCHR16_5_CTG1"
[307] "CHR_HSCHR16_5_CTG3_1"
[308] "CHR_HSCHR17_1_CTG1"
[309] "CHR_HSCHR17_1_CTG2"
[310] "CHR_HSCHR17_1_CTG4"
[311] "CHR_HSCHR17_1_CTG5"
[312] "CHR_HSCHR17_1_CTG9"
[313] "CHR_HSCHR17_2_CTG1"
[314] "CHR_HSCHR17_2_CTG2"
[315] "CHR_HSCHR17_2_CTG4"
[316] "CHR_HSCHR17_2_CTG5"
[317] "CHR_HSCHR17_3_CTG1"
[318] "CHR_HSCHR17_3_CTG2"
[319] "CHR_HSCHR17_3_CTG4"
[320] "CHR_HSCHR17_4_CTG4"
[321] "CHR_HSCHR17_5_CTG4"
[322] "CHR_HSCHR17_6_CTG4"
[323] "CHR_HSCHR17_7_CTG4"
[324] "CHR_HSCHR17_8_CTG4"
[325] "CHR_HSCHR17_9_CTG4"
[326] "CHR_HSCHR17_10_CTG4"
[327] "CHR_HSCHR17_11_CTG4"
[328] "CHR_HSCHR17_12_CTG4"
[329] "CHR_HSCHR18_ALT2_CTG2_1"
[330] "CHR_HSCHR18_ALT21_CTG2_1"
[331] "CHR_HSCHR18_1_CTG1"
[332] "CHR_HSCHR18_1_CTG1_1"
[333] "CHR_HSCHR18_1_CTG2_1"
[334] "CHR_HSCHR18_2_CTG1_1"
[335] "CHR_HSCHR18_2_CTG2"
[336] "CHR_HSCHR18_2_CTG2_1"
[337] "CHR_HSCHR18_3_CTG2_1"
[338] "CHR_HSCHR18_5_CTG1_1"
[339] "CHR_HSCHR19_1_CTG2"
[340] "CHR_HSCHR19_1_CTG3_1"
[341] "CHR_HSCHR19_2_CTG2"
[342] "CHR_HSCHR19_2_CTG3_1"
[343] "CHR_HSCHR19_3_CTG2"
[344] "CHR_HSCHR19_3_CTG3_1"
[345] "CHR_HSCHR19_4_CTG2"
[346] "CHR_HSCHR19_4_CTG3_1"
[347] "CHR_HSCHR19_5_CTG2"
[348] "CHR_HSCHR19KIR_ABC08_AB_HAP_C_P_CTG3_1"
[349] "CHR_HSCHR19KIR_ABC08_AB_HAP_T_P_CTG3_1"
[350] "CHR_HSCHR19KIR_ABC08_A1_HAP_CTG3_1"
[351] "CHR_HSCHR19KIR_CA01-TA01_1_CTG3_1"
[352] "CHR_HSCHR19KIR_CA01-TA01_2_CTG3_1"
[353] "CHR_HSCHR19KIR_CA01-TB01_CTG3_1"
[354] "CHR_HSCHR19KIR_CA01-TB04_CTG3_1"
[355] "CHR_HSCHR19KIR_CA04_CTG3_1"
[356] "CHR_HSCHR19KIR_FH05_A_HAP_CTG3_1"
[357] "CHR_HSCHR19KIR_FH05_B_HAP_CTG3_1"
[358] "CHR_HSCHR19KIR_FH06_A_HAP_CTG3_1"
[359] "CHR_HSCHR19KIR_FH06_BA1_HAP_CTG3_1"
[360] "CHR_HSCHR19KIR_FH08_A_HAP_CTG3_1"
[361] "CHR_HSCHR19KIR_FH08_BAX_HAP_CTG3_1"
[362] "CHR_HSCHR19KIR_FH13_A_HAP_CTG3_1"
[363] "CHR_HSCHR19KIR_FH13_BA2_HAP_CTG3_1"
[364] "CHR_HSCHR19KIR_FH15_A_HAP_CTG3_1"
[365] "CHR_HSCHR19KIR_FH15_B_HAP_CTG3_1"
[366] "CHR_HSCHR19KIR_GRC212_AB_HAP_CTG3_1"
[367] "CHR_HSCHR19KIR_GRC212_BA1_HAP_CTG3_1"
[368] "CHR_HSCHR19KIR_G085_A_HAP_CTG3_1"
[369] "CHR_HSCHR19KIR_G085_BA1_HAP_CTG3_1"
[370] "CHR_HSCHR19KIR_G248_A_HAP_CTG3_1"
[371] "CHR_HSCHR19KIR_G248_BA2_HAP_CTG3_1"
[372] "CHR_HSCHR19KIR_HG2393_CTG3_1"
[373] "CHR_HSCHR19KIR_HG2394_CTG3_1"
[374] "CHR_HSCHR19KIR_HG2396_CTG3_1"
[375] "CHR_HSCHR19KIR_LUCE_A_HAP_CTG3_1"
[376] "CHR_HSCHR19KIR_LUCE_BDEL_HAP_CTG3_1"
[377] "CHR_HSCHR19KIR_RP5_B_HAP_CTG3_1"
[378] "CHR_HSCHR19KIR_RSH_A_HAP_CTG3_1"
[379] "CHR_HSCHR19KIR_RSH_BA2_HAP_CTG3_1"
[380] "CHR_HSCHR19KIR_T7526_A_HAP_CTG3_1"
[381] "CHR_HSCHR19KIR_T7526_BDEL_HAP_CTG3_1"
[382] "CHR_HSCHR19KIR_0010-5217-AB_CTG3_1"
[383] "CHR_HSCHR19KIR_0019-4656-A_CTG3_1"
[384] "CHR_HSCHR19KIR_0019-4656-B_CTG3_1"
[385] "CHR_HSCHR19KIR_7191059-1_CTG3_1"
[386] "CHR_HSCHR19KIR_7191059-2_CTG3_1"
[387] "CHR_HSCHR19KIR_502960008-1_CTG3_1"
[388] "CHR_HSCHR19KIR_502960008-2_CTG3_1"
[389] "CHR_HSCHR19LRC_COX1_CTG3_1"
[390] "CHR_HSCHR19LRC_COX2_CTG3_1"
[391] "CHR_HSCHR19LRC_LRC_I_CTG3_1"
[392] "CHR_HSCHR19LRC_LRC_J_CTG3_1"
[393] "CHR_HSCHR19LRC_LRC_S_CTG3_1"
[394] "CHR_HSCHR19LRC_LRC_T_CTG3_1"
[395] "CHR_HSCHR19LRC_PGF1_CTG3_1"
[396] "CHR_HSCHR19LRC_PGF2_CTG3_1"
[397] "CHR_HSCHR20_1_CTG1"
[398] "CHR_HSCHR20_1_CTG2"
[399] "CHR_HSCHR20_1_CTG3"
[400] "CHR_HSCHR20_1_CTG4"
[401] "CHR_HSCHR21_2_CTG1_1"
[402] "CHR_HSCHR21_3_CTG1_1"
[403] "CHR_HSCHR21_4_CTG1_1"
[404] "CHR_HSCHR21_5_CTG2"
[405] "CHR_HSCHR21_6_CTG1_1"
[406] "CHR_HSCHR21_8_CTG1_1"
[407] "CHR_HSCHR22_1_CTG1"
[408] "CHR_HSCHR22_1_CTG2"
[409] "CHR_HSCHR22_1_CTG3"
[410] "CHR_HSCHR22_1_CTG4"
[411] "CHR_HSCHR22_1_CTG5"
[412] "CHR_HSCHR22_1_CTG6"
[413] "CHR_HSCHR22_1_CTG7"
[414] "CHR_HSCHR22_2_CTG1"
[415] "CHR_HSCHR22_3_CTG1"
[416] "CHR_HSCHR22_4_CTG1"
[417] "CHR_HSCHR22_5_CTG1"
[418] "CHR_HSCHR22_6_CTG1"
[419] "CHR_HSCHR22_7_CTG1"
[420] "CHR_HSCHR22_8_CTG1"
[421] "GL000009.2"
[422] "GL000194.1"
[423] "GL000195.1"
[424] "GL000205.2"
[425] "GL000213.1"
[426] "GL000216.2"
[427] "GL000218.1"
[428] "GL000219.1"
[429] "GL000220.1"
[430] "GL000225.1"
[431] "KI270442.1"
[432] "KI270711.1"
[433] "KI270713.1"
[434] "KI270721.1"
[435] "KI270726.1"
[436] "KI270727.1"
[437] "KI270728.1"
[438] "KI270731.1"
[439] "KI270733.1"
[440] "KI270734.1"
[441] "KI270744.1"
[442] "KI270750.1"
[443] "MT"
[444] "X"
[445] "Y"
searchFilterOptions(mart = ensembl, filter = "chromosome_name", pattern = "^GL")
[1] "GL000009.2" "GL000194.1" "GL000195.1" "GL000205.2" "GL000213.1"
[6] "GL000216.2" "GL000218.1" "GL000219.1" "GL000220.1" "GL000225.1"
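Once the options are held in an ordinary character vector, the same style of regular expression can be applied locally with base R. The sketch below uses a small hand-copied subset of the chromosome_name options listed above (not the full server response) and keeps only the primary chromosomes.

```r
# A few values copied from the chromosome_name options listed above
opts <- c("1", "22", "CHR_HG1_PATCH", "CHR_HSCHR6_MHC_COX_CTG1",
          "GL000009.2", "KI270442.1", "MT", "X", "Y")

# Keep numbered chromosomes plus X, Y and MT; drop patches and scaffolds
primary <- grep("^([0-9]+|X|Y|MT)$", opts, value = TRUE)
print(primary)
```

The anchors ^ and $ matter here: without them, a name such as CHR_HG1_PATCH would also match because it contains digits.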
searchFilterOptions(mart = ensembl, filter = "phenotype_description", pattern = "Crohn")
[1] "INFLAMMATORY BOWEL DISEASE CROHN DISEASE 1"
[2] "INFLAMMATORY BOWEL DISEASE CROHN DISEASE 10"
[3] "INFLAMMATORY BOWEL DISEASE CROHN DISEASE 19"
[4] "INFLAMMATORY BOWEL DISEASE CROHN DISEASE 30"
[5] "NON RARE IN EUROPE: Crohn disease"
# Annotate the Affymetrix probe IDs with HGNC symbol and chromosomal location
affyids=c("202763_at",
"209310_s_at",
"207500_at")
getBM(attributes = c('affy_hg_u133_plus_2',
'hgnc_symbol',
'chromosome_name',
'start_position',
'end_position',
'band'),
filters = 'affy_hg_u133_plus_2',
values = affyids,
mart = ensembl)
affy_hg_u133_plus_2 hgnc_symbol chromosome_name start_position
1 202763_at CASP3 4 184627696
2 209310_s_at CASP4 11 104942866
3 207500_at CASP5 11 104994235
end_position band
1 184649509 q35.1
2 104969366 q22.3
3 105023168 q22.3
# Retrieve the GO annotation for a set of Entrez Gene IDs
entrez=c("673","837")
goids = getBM(attributes = c('entrezgene_id',
'go_id'),
filters = 'entrezgene_id',
values = entrez,
mart = ensembl)
head(goids)
entrezgene_id go_id
1 673 GO:0043231
2 673 GO:0000166
3 673 GO:0004672
4 673 GO:0004674
5 673 GO:0005524
6 673 GO:0006468
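As the output shows, the result has one row per gene and GO term pair, so a gene with several annotations spans several rows. A base R sketch for grouping the terms by gene follows; it uses a local copy of the six rows printed above rather than a live query.

```r
# Local copy of the head(goids) output shown above
goids <- data.frame(
  entrezgene_id = rep(673L, 6),
  go_id = c("GO:0043231", "GO:0000166", "GO:0004672",
            "GO:0004674", "GO:0005524", "GO:0006468"),
  stringsAsFactors = FALSE
)

# One character vector of GO terms per Entrez Gene ID
go_by_gene <- split(goids$go_id, goids$entrezgene_id)

# Number of GO terms found for each gene
print(lengths(go_by_gene))
```

With the full query result, the list would contain one element per Entrez Gene ID that has at least one GO term.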
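# Retrieve HGNC symbols of genes on chromosomes 17, 20 or Y that are annotated with the given GO terms
go=c("GO:0051330","GO:0000080",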
"GO:0000114","GO:0000082")
chrom=c(17,20,"Y")
getBM(attributes= "hgnc_symbol",
filters=c("go","chromosome_name"),
values=list(go, chrom), mart=ensembl)
hgnc_symbol
1 E2F1
2 RPS6KB1
3 CDK3
# Retrieve InterPro domain IDs and descriptions for two RefSeq mRNA IDs
refseqids = c("NM_005359","NM_000546")
ipro = getBM(attributes=c("refseq_mrna", "interpro",
"interpro_description"),
filters="refseq_mrna",
values=refseqids,
mart=ensembl)
ipro
refseq_mrna interpro
1 NM_000546 IPR002117
2 NM_000546 IPR008967
3 NM_000546 IPR010991
4 NM_000546 IPR011615
5 NM_000546 IPR012346
6 NM_000546 IPR013872
7 NM_000546 IPR036674
8 NM_000546 IPR040926
9 NM_005359 IPR001132
10 NM_005359 IPR003619
11 NM_005359 IPR008984
12 NM_005359 IPR013019
13 NM_005359 IPR013790
14 NM_005359 IPR017855
15 NM_005359 IPR036578
interpro_description
1 p53 tumour suppressor family
2 p53-like transcription factor, DNA-binding
3 p53, tetramerisation domain
4 p53, DNA-binding domain
5 p53/RUNT-type transcription factor, DNA-binding domain superfamily
6 p53 transactivation domain
7 p53-like tetramerisation domain superfamily
8 Cellular tumor antigen p53, transactivation domain 2
9 SMAD domain, Dwarfin-type
10 MAD homology 1, Dwarfin-type
11 SMAD/FHA domain superfamily
12 MAD homology, MH1
13 Dwarfin
14 SMAD-like domain superfamily
15 SMAD MH1 domain superfamily
InterPro provides functional analysis of proteins by classifying them into families and predicting domains as well as important sites.
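To summarise a result like the one above, the InterPro entries can be counted or collapsed per transcript. The sketch below is illustrative only: it rebuilds the two ID columns of the printed result as a local data frame instead of querying the server, and the variable names are arbitrary.

```r
# Local copy of the refseq_mrna and interpro columns printed above
ipro <- data.frame(
  refseq_mrna = rep(c("NM_000546", "NM_005359"), times = c(8, 7)),
  interpro = c("IPR002117", "IPR008967", "IPR010991", "IPR011615",
               "IPR012346", "IPR013872", "IPR036674", "IPR040926",
               "IPR001132", "IPR003619", "IPR008984", "IPR013019",
               "IPR013790", "IPR017855", "IPR036578"),
  stringsAsFactors = FALSE
)

# Number of InterPro entries per transcript
n_domains <- table(ipro$refseq_mrna)
print(n_domains)

# All InterPro accessions of one transcript collapsed into a single string
nm_000546_domains <- paste(ipro$interpro[ipro$refseq_mrna == "NM_000546"],
                           collapse = ";")
print(nm_000546_domains)
```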
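# Retrieve Affymetrix probe IDs and Ensembl gene IDs for genes on chromosome 16 between base pairs 1100000 and 1250000
getBM(attributes = c('affy_hg_u133_plus_2',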
'ensembl_gene_id'),
filters = c('chromosome_name','start','end'),
values = list(16,1100000,1250000),
mart = ensembl)
affy_hg_u133_plus_2 ensembl_gene_id
1 ENSG00000260702
2 215502_at ENSG00000260532
3 ENSG00000273551
4 205845_at ENSG00000196557
5 ENSG00000196557
6 ENSG00000260403
7 ENSG00000259910
8 ENSG00000261294
9 220339_s_at ENSG00000116176
10 ENSG00000277010
11 215382_x_at ENSG00000197253
12 207134_x_at ENSG00000197253
13 216474_x_at ENSG00000197253
14 217023_x_at ENSG00000197253
15 205683_x_at ENSG00000197253
16 210084_x_at ENSG00000197253
17 215382_x_at ENSG00000172236
18 207134_x_at ENSG00000172236
19 216474_x_at ENSG00000172236
20 217023_x_at ENSG00000172236
21 205683_x_at ENSG00000172236
22 210084_x_at ENSG00000172236
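Several rows above have a blank affy_hg_u133_plus_2 value, presumably because the gene lies in the region but has no probe set listed for it. The sketch below drops those rows with base R. It assumes blanks arrive as empty strings or NA, and it uses a local copy of the first five rows printed above rather than a live query.

```r
# Local copy of the first five rows of the result shown above
res <- data.frame(
  affy_hg_u133_plus_2 = c("", "215502_at", "", "205845_at", ""),
  ensembl_gene_id = c("ENSG00000260702", "ENSG00000260532",
                      "ENSG00000273551", "ENSG00000196557",
                      "ENSG00000196557"),
  stringsAsFactors = FALSE
)

# Treat both "" and NA as "no probe set"
has_probe <- !is.na(res$affy_hg_u133_plus_2) & nzchar(res$affy_hg_u133_plus_2)
with_probe <- res[has_probe, ]
print(with_probe)
```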