BLAST (Basic Local Alignment Search Tool) is a bioinformatics tool designed to help us understand the relatioship between any query sequences by providing rapid sequence comparison of a query sequence against a database. Depending on the experiment, you can design use BLASTP, BLASTN, DELTABLAST, MMHER etc.
This practical provides the learning materials to develop the skills for writing CLI scripts to integrate BLAST searches into your pipelines.
setwd = ("C:/Users/manso/OneDrive - University of West London\MSc Bioinformatics - UWL/3.DSB - Data Science for Bioinformatics/Practice/DSB W10 - BLAST and MSA in R and Biopython Part 1")
We will use these four objects in Bio.SearchIO to: read, parse, index, or index_db.
The QueryResult object represents a single search query and contains zero or more Hit objects.
from Bio import SearchIO
blast_qresult = SearchIO.read("my_blast.xml", "blast-xml")
print (blast_qresult)
Program: blastn (2.12.0+)
Query: mystery_seq (61)
Homo sapiens microRNA 520b (MIR520B), microRNA
Target: nt
Hits: ---- ----- ----------------------------------------------------------
# # HSP ID + description
---- ----- ----------------------------------------------------------
0 41 gi|1549097453|gb|CP034522.1| Eukaryotic synthetic cons...
1 41 gi|1548994293|gb|CP034497.1| Eukaryotic synthetic cons...
2 1 gi|731185158|ref|NR_128580.1| Gorilla gorilla microRNA...
3 41 gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM1...
4 41 gi|1909942460|dbj|AP023479.1| Homo sapiens DNA, chromo...
5 1 gi|262205317|ref|NR_030195.1| Homo sapiens microRNA 52...
6 41 gi|8886975|gb|AC011453.4| Homo sapiens chromosome 19 c...
7 1 gi|301171311|ref|NR_035856.1| Pan troglodytes microRNA...
8 1 gi|270133242|ref|NR_032573.1| Macaca mulatta microRNA ...
9 37 gi|156181109|gb|EU086471.1| Symphalangus syndactylus B...
10 42 gi|156181108|gb|EU086470.1| Pygathrix bieti BAC YGM271...
11 45 gi|154152211|gb|AC200331.2| Pongo abelii BAC clone CH2...
12 2 gi|301171322|ref|NR_035857.1| Pan troglodytes microRNA...
13 1 gi|301171267|ref|NR_035851.1| Pan troglodytes microRNA...
14 2 gi|262205330|ref|NR_030198.1| Homo sapiens microRNA 52...
15 1 gi|262205302|ref|NR_030191.1| Homo sapiens microRNA 51...
16 1 gi|731442011|ref|NR_128601.1| Gorilla gorilla microRNA...
17 1 gi|301171259|ref|NR_035850.1| Pan troglodytes microRNA...
18 1 gi|262205451|ref|NR_030222.1| Homo sapiens microRNA 51...
19 1 gi|34534892|dbj|AK127814.1| Homo sapiens cDNA FLJ45916...
20 15 gi|25140998|gb|AC008753.9| Homo sapiens chromosome 19 ...
21 1 gi|731185130|ref|NR_128576.1| Gorilla gorilla microRNA...
22 2 gi|301171447|ref|NR_035871.1| Pan troglodytes microRNA...
23 1 gi|301171276|ref|NR_035852.1| Pan troglodytes microRNA...
24 1 gi|262205290|ref|NR_030188.1| Homo sapiens microRNA 51...
25 1 gi|1620864509|ref|NR_128592.2| Gorilla gorilla microRN...
26 1 gi|731184631|ref|NR_128574.1| Gorilla gorilla microRNA...
27 1 gi|301171354|ref|NR_035860.1| Pan troglodytes microRNA...
28 1 gi|262205281|ref|NR_030186.1| Homo sapiens microRNA 52...
29 53 gi|156181110|gb|EU086472.1| Ateles geoffroyi BAC SM257...
~~~
97 1 gi|731185470|ref|NR_128589.1| Gorilla gorilla microRNA...
98 1 gi|731185237|ref|NR_128596.1| Gorilla gorilla microRNA...
99 1 gi|301171206|ref|NR_035844.1| Pan troglodytes microRNA...
blat_qresult = SearchIO.read("my_blat.psl", "blat-psl")
print(blat_qresult)
Program: blat (<unknown version>)
Query: mystery_seq (61)
<unknown description>
Target: <unknown target>
Hits: ---- ----- ----------------------------------------------------------
# # HSP ID + description
---- ----- ----------------------------------------------------------
0 17 chr19 <unknown description>
1 1 chr11 <unknown description>
len(blast_qresult)
100
len(blat_qresult)
2
blast_qresult[0] #retrieves the top hit
Hit(id='gi|1549097453|gb|CP034522.1|', query_id='mystery_seq', 41 hsps)
blast_qresult[-1] #retrieves the last hit
Hit(id='gi|301171206|ref|NR_035844.1|', query_id='mystery_seq', 1 hsps)
In this case, the slice will return a new QueryResult object containing only the sliced hits:
blast_slice = blast_qresult[:3] #slices the first 3 hits
print(blast_slice)
Program: blastn (2.12.0+)
Query: mystery_seq (61)
Homo sapiens microRNA 520b (MIR520B), microRNA
Target: nt
Hits: ---- ----- ----------------------------------------------------------
# # HSP ID + description
---- ----- ----------------------------------------------------------
0 41 gi|1549097453|gb|CP034522.1| Eukaryotic synthetic cons...
1 41 gi|1548994293|gb|CP034497.1| Eukaryotic synthetic cons...
2 1 gi|731185158|ref|NR_128580.1| Gorilla gorilla microRNA...
blast_qresult["gi|301171206|ref|NR_035844.1|"]
Hit(id='gi|301171206|ref|NR_035844.1|', query_id='mystery_seq', 1 hsps)
blast_qresult.hits # gives a list of all hits
[Hit(id='gi|1549097453|gb|CP034522.1|', query_id='mystery_seq', 41 hsps), Hit(id='gi|1548994293|gb|CP034497.1|', query_id='mystery_seq', 41 hsps), Hit(id='gi|731185158|ref|NR_128580.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|2033715243|gb|CP068259.2|', query_id='mystery_seq', 41 hsps), Hit(id='gi|1909942460|dbj|AP023479.1|', query_id='mystery_seq', 41 hsps), Hit(id='gi|262205317|ref|NR_030195.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|8886975|gb|AC011453.4|', query_id='mystery_seq', 41 hsps), Hit(id='gi|301171311|ref|NR_035856.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|270133242|ref|NR_032573.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|156181109|gb|EU086471.1|', query_id='mystery_seq', 37 hsps), Hit(id='gi|156181108|gb|EU086470.1|', query_id='mystery_seq', 42 hsps), Hit(id='gi|154152211|gb|AC200331.2|', query_id='mystery_seq', 45 hsps), Hit(id='gi|301171322|ref|NR_035857.1|', query_id='mystery_seq', 2 hsps), Hit(id='gi|301171267|ref|NR_035851.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|262205330|ref|NR_030198.1|', query_id='mystery_seq', 2 hsps), Hit(id='gi|262205302|ref|NR_030191.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|731442011|ref|NR_128601.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|301171259|ref|NR_035850.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|262205451|ref|NR_030222.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|34534892|dbj|AK127814.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|25140998|gb|AC008753.9|', query_id='mystery_seq', 15 hsps), Hit(id='gi|731185130|ref|NR_128576.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|301171447|ref|NR_035871.1|', query_id='mystery_seq', 2 hsps), Hit(id='gi|301171276|ref|NR_035852.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|262205290|ref|NR_030188.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|1620864509|ref|NR_128592.2|', query_id='mystery_seq', 1 hsps), Hit(id='gi|731184631|ref|NR_128574.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|301171354|ref|NR_035860.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|262205281|ref|NR_030186.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|156181110|gb|EU086472.1|', query_id='mystery_seq', 53 hsps), Hit(id='gi|731184621|ref|NR_128582.1|', query_id='mystery_seq', 2 hsps), Hit(id='gi|262205298|ref|NR_030190.1|', query_id='mystery_seq', 2 hsps), Hit(id='gi|262205429|ref|NR_030218.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|731184686|ref|NR_128594.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|301171394|ref|NR_035865.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|262205423|ref|NR_030217.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|731184417|ref|NR_128578.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|301171401|ref|NR_035866.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|270133247|ref|NR_032574.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|262205309|ref|NR_030193.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|731184517|ref|NR_128401.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|270132717|ref|NR_032716.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|731184831|ref|NR_128586.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|301171437|ref|NR_035870.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|270133306|ref|NR_032587.1|', query_id='mystery_seq', 2 hsps), Hit(id='gi|731441564|ref|NR_128597.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|731184772|ref|NR_128581.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|301171428|ref|NR_035869.1|', query_id='mystery_seq', 2 hsps), Hit(id='gi|301171211|ref|NR_035845.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|301171153|ref|NR_035838.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|301171146|ref|NR_035837.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|270133254|ref|NR_032575.1|', query_id='mystery_seq', 2 hsps), Hit(id='gi|262205445|ref|NR_030221.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|262205441|ref|NR_030220.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|262205325|ref|NR_030197.1|', query_id='mystery_seq', 2 hsps), Hit(id='gi|262205321|ref|NR_030196.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|1209185461|ref|NR_032569.2|', query_id='mystery_seq', 1 hsps), Hit(id='gi|731184451|ref|NR_128587.1|', query_id='mystery_seq', 2 hsps), Hit(id='gi|301171236|ref|NR_035848.1|', query_id='mystery_seq', 2 hsps), Hit(id='gi|262205383|ref|NR_030209.1|', query_id='mystery_seq', 2 hsps), Hit(id='gi|301171342|ref|NR_035859.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|262205265|ref|NR_030183.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|731184500|ref|NR_128579.1|', query_id='mystery_seq', 2 hsps), Hit(id='gi|731184438|ref|NR_128599.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|301171247|ref|NR_035849.1|', query_id='mystery_seq', 2 hsps), Hit(id='gi|262205313|ref|NR_030194.1|', query_id='mystery_seq', 2 hsps), Hit(id='gi|731185113|ref|NR_128588.1|', query_id='mystery_seq', 2 hsps), Hit(id='gi|301171225|ref|NR_035847.1|', query_id='mystery_seq', 2 hsps), Hit(id='gi|270133232|ref|NR_032571.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|262205392|ref|NR_030211.1|', query_id='mystery_seq', 2 hsps), Hit(id='gi|156181117|gb|EU086479.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|731184691|ref|NR_128603.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|731184220|ref|NR_128595.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|301171300|ref|NR_035855.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|262205294|ref|NR_030189.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|563318547|ref|NR_106505.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|301171161|ref|NR_035839.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|270133259|ref|NR_032576.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|270133218|ref|NR_032568.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|262205396|ref|NR_030212.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|1406928263|ref|NR_035872.2|', query_id='mystery_seq', 1 hsps), Hit(id='gi|731441797|ref|NR_128583.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|731184761|ref|NR_128573.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|301171602|ref|NR_035634.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|301171333|ref|NR_035858.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|301171291|ref|NR_035854.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|301171219|ref|NR_035846.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|262205435|ref|NR_030219.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|262205336|ref|NR_030199.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|269847477|ref|NR_031696.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|731441601|ref|NR_128604.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|731185213|ref|NR_128598.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|269846946|ref|NR_031573.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|262205358|ref|NR_030204.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|270133264|ref|NR_032577.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|731184634|ref|NR_128600.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|301171169|ref|NR_035840.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|731185470|ref|NR_128589.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|731185237|ref|NR_128596.1|', query_id='mystery_seq', 1 hsps), Hit(id='gi|301171206|ref|NR_035844.1|', query_id='mystery_seq', 1 hsps)]
blast_qresult.hit_keys #gives a list of all hit IDs
['gi|1549097453|gb|CP034522.1|', 'gi|1548994293|gb|CP034497.1|', 'gi|731185158|ref|NR_128580.1|', 'gi|2033715243|gb|CP068259.2|', 'gi|1909942460|dbj|AP023479.1|', 'gi|262205317|ref|NR_030195.1|', 'gi|8886975|gb|AC011453.4|', 'gi|301171311|ref|NR_035856.1|', 'gi|270133242|ref|NR_032573.1|', 'gi|156181109|gb|EU086471.1|', 'gi|156181108|gb|EU086470.1|', 'gi|154152211|gb|AC200331.2|', 'gi|301171322|ref|NR_035857.1|', 'gi|301171267|ref|NR_035851.1|', 'gi|262205330|ref|NR_030198.1|', 'gi|262205302|ref|NR_030191.1|', 'gi|731442011|ref|NR_128601.1|', 'gi|301171259|ref|NR_035850.1|', 'gi|262205451|ref|NR_030222.1|', 'gi|34534892|dbj|AK127814.1|', 'gi|25140998|gb|AC008753.9|', 'gi|731185130|ref|NR_128576.1|', 'gi|301171447|ref|NR_035871.1|', 'gi|301171276|ref|NR_035852.1|', 'gi|262205290|ref|NR_030188.1|', 'gi|1620864509|ref|NR_128592.2|', 'gi|731184631|ref|NR_128574.1|', 'gi|301171354|ref|NR_035860.1|', 'gi|262205281|ref|NR_030186.1|', 'gi|156181110|gb|EU086472.1|', 'gi|731184621|ref|NR_128582.1|', 'gi|262205298|ref|NR_030190.1|', 'gi|262205429|ref|NR_030218.1|', 'gi|731184686|ref|NR_128594.1|', 'gi|301171394|ref|NR_035865.1|', 'gi|262205423|ref|NR_030217.1|', 'gi|731184417|ref|NR_128578.1|', 'gi|301171401|ref|NR_035866.1|', 'gi|270133247|ref|NR_032574.1|', 'gi|262205309|ref|NR_030193.1|', 'gi|731184517|ref|NR_128401.1|', 'gi|270132717|ref|NR_032716.1|', 'gi|731184831|ref|NR_128586.1|', 'gi|301171437|ref|NR_035870.1|', 'gi|270133306|ref|NR_032587.1|', 'gi|731441564|ref|NR_128597.1|', 'gi|731184772|ref|NR_128581.1|', 'gi|301171428|ref|NR_035869.1|', 'gi|301171211|ref|NR_035845.1|', 'gi|301171153|ref|NR_035838.1|', 'gi|301171146|ref|NR_035837.1|', 'gi|270133254|ref|NR_032575.1|', 'gi|262205445|ref|NR_030221.1|', 'gi|262205441|ref|NR_030220.1|', 'gi|262205325|ref|NR_030197.1|', 'gi|262205321|ref|NR_030196.1|', 'gi|1209185461|ref|NR_032569.2|', 'gi|731184451|ref|NR_128587.1|', 'gi|301171236|ref|NR_035848.1|', 'gi|262205383|ref|NR_030209.1|', 'gi|301171342|ref|NR_035859.1|', 'gi|262205265|ref|NR_030183.1|', 'gi|731184500|ref|NR_128579.1|', 'gi|731184438|ref|NR_128599.1|', 'gi|301171247|ref|NR_035849.1|', 'gi|262205313|ref|NR_030194.1|', 'gi|731185113|ref|NR_128588.1|', 'gi|301171225|ref|NR_035847.1|', 'gi|270133232|ref|NR_032571.1|', 'gi|262205392|ref|NR_030211.1|', 'gi|156181117|gb|EU086479.1|', 'gi|731184691|ref|NR_128603.1|', 'gi|731184220|ref|NR_128595.1|', 'gi|301171300|ref|NR_035855.1|', 'gi|262205294|ref|NR_030189.1|', 'gi|563318547|ref|NR_106505.1|', 'gi|301171161|ref|NR_035839.1|', 'gi|270133259|ref|NR_032576.1|', 'gi|270133218|ref|NR_032568.1|', 'gi|262205396|ref|NR_030212.1|', 'gi|1406928263|ref|NR_035872.2|', 'gi|731441797|ref|NR_128583.1|', 'gi|731184761|ref|NR_128573.1|', 'gi|301171602|ref|NR_035634.1|', 'gi|301171333|ref|NR_035858.1|', 'gi|301171291|ref|NR_035854.1|', 'gi|301171219|ref|NR_035846.1|', 'gi|262205435|ref|NR_030219.1|', 'gi|262205336|ref|NR_030199.1|', 'gi|269847477|ref|NR_031696.1|', 'gi|731441601|ref|NR_128604.1|', 'gi|731185213|ref|NR_128598.1|', 'gi|269846946|ref|NR_031573.1|', 'gi|262205358|ref|NR_030204.1|', 'gi|270133264|ref|NR_032577.1|', 'gi|731184634|ref|NR_128600.1|', 'gi|301171169|ref|NR_035840.1|', 'gi|731185470|ref|NR_128589.1|', 'gi|731185237|ref|NR_128596.1|', 'gi|301171206|ref|NR_035844.1|']
"gi|301171169|ref|NR_035840.1|" in blast_qresult #checking if a particular hit is present in the query results
True
"gi|262205317|ref|NR_030194.1|" in blast_qresult
False
blast_qresult.index("gi|301171437|ref|NR_035870.1|") #check the rank of the hit
43
#id and sequence lenght of first 5 hits
for hit in blast_qresult[:5]:
print ("%s %i" % (hit.id, hit.seq_len))
gi|1549097453|gb|CP034522.1| 64242768 gi|1548994293|gb|CP034497.1| 64242768 gi|731185158|ref|NR_128580.1| 99 gi|2033715243|gb|CP068259.2| 61707364 gi|1909942460|dbj|AP023479.1| 59105444
sort_key = lambda hit: hit.seq_len
sorted_qresult = blast_qresult.sort(key=sort_key, reverse=True, in_place=False)
for hit in sorted_qresult[:5]:
print("%s %i" % (hit.id, hit.seq_len)) #"i" prints ids and "s" prints sequence
gi|1549097453|gb|CP034522.1| 64242768 gi|1548994293|gb|CP034497.1| 64242768 gi|2033715243|gb|CP068259.2| 61707364 gi|1909942460|dbj|AP023479.1| 59105444 gi|154152211|gb|AC200331.2| 212089
To filter and map your QueryResult objects you can use the hit_filter, hsp_filter and map, QueryResult methods. These methods filter its QueryResult object either on its Hit objects or HSP objects. The map, QueryResult objects provide the hit_map and hsp_map methods. These methods apply a given function to all hits or HSPs in a QueryResult object, respectively. The method accepts a callback function that checks whether a given Hit object passes the condition you set as True or False.
filter_func = lambda hit: len(hit.hsps) >1 #Callback function
len(blast_qresult) #nº of hits before filtering
100
filtered_qresult = blast_qresult.hit_filter(filter_func)
len(filtered_qresult) #nº of hits after filtering
28
for hit in filtered_qresult[:5]: #check for the hit lenghts
print("%s %i" % (hit.id, len (hit.hsps)))
gi|1549097453|gb|CP034522.1| 41 gi|1548994293|gb|CP034497.1| 41 gi|2033715243|gb|CP068259.2| 41 gi|1909942460|dbj|AP023479.1| 41 gi|8886975|gb|AC011453.4| 41
Example: use hit_map to rename the hit IDs:
def map_func(hit):
hit.id = hit.id.split("|")[3] # renames "gi|301171322|ref|NR_035857.1|" to "NR_035857.1"
return hit
mapped_qresult = blast_qresult.hit_map(map_func)
for hit in mapped_qresult[:5]:
print(hit.id)
CP034522.1 CP034497.1 NR_128580.1 CP068259.2 AP023479.1
Hit objects represent all query results from a single database entry. They are the second-level container in the Bio.SearchIO object. They are contained by QueryResult objects, but the Hit object in turn contains HSP objects.
blast_qresult = SearchIO.read("my_blast.xml", "blast-xml")
blast_hit = blast_qresult[3] # fourth hit from the query result
print(blast_hit)
Query: mystery_seq
Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| (61707364)
Homo sapiens isolate CHM13 chromosome 19
HSPs: ---- -------- --------- ------ --------------- ---------------------
# E-value Bit score Span Query range Hit range
---- -------- --------- ------ --------------- ---------------------
0 4.9e-21 111.29 61 [0:61] [56780727:56780788]
1 8.8e-18 100.47 60 [0:60] [56774740:56774800]
2 8.8e-18 100.47 60 [1:61] [56786970:56787030]
3 3.1e-17 98.67 59 [0:59] [56841844:56841903]
4 3.8e-16 95.96 60 [0:60] [56766006:56766066]
5 1.3e-15 93.26 61 [0:61] [56761684:56761745]
6 4.6e-15 91.45 60 [0:60] [56773921:56773981]
7 1.6e-14 90.55 57 [2:59] [56831900:56831957]
8 1.6e-14 89.65 59 [0:59] [56830713:56830772]
9 5.6e-14 88.75 61 [0:61] [56777904:56777965]
10 2.4e-12 82.44 60 [0:60] [56782248:56782308]
11 2.4e-12 82.44 60 [0:60] [56785767:56785827]
12 2.4e-12 82.44 50 [11:61] [56836255:56836305]
13 2.4e-12 82.44 50 [11:61] [56840647:56840697]
14 8.3e-12 80.63 59 [0:59] [56809344:56809403]
15 2.9e-11 79.73 61 [0:61] [56755236:56755297]
16 1e-10 77.93 60 [0:60] [56779528:56779588]
17 3.5e-10 76.13 59 [0:59] [56814383:56814442]
18 1.2e-09 74.32 58 [0:58] [56770413:56770471]
19 1.2e-09 73.42 50 [11:61] [56816365:56816415]
20 4.3e-09 72.52 57 [2:59] [56788264:56788321]
21 4.3e-09 72.52 47 [8:55] [56833523:56833570]
22 4.3e-09 71.62 58 [3:59] [56837732:56837790]
23 1.5e-08 70.72 60 [3:61] [56768018:56768078]
24 1.5e-08 70.72 60 [3:61] [56799602:56799662]
25 5.2e-08 68.01 47 [8:55] [56810518:56810565]
26 1.8e-07 67.11 59 [0:59] [56828140:56828199]
27 6.4e-07 65.31 61 [0:61] [56806419:56806476]
28 6.4e-07 64.40 59 [0:59] [56820824:56820881]
29 2.2e-06 63.50 47 [8:55] [56818847:56818894]
30 7.8e-06 60.80 63 [0:59] [56801672:56801734]
31 7.8e-06 60.80 63 [0:59] [56822018:56822080]
32 2.7e-05 59.90 60 [1:61] [56779528:56779588]
33 2.7e-05 59.90 60 [1:61] [56785767:56785827]
34 2.7e-05 58.99 58 [0:55] [56826628:56826686]
35 9.4e-05 57.19 57 [2:59] [56759467:56759521]
36 0.00033 55.39 60 [0:60] [56786970:56787030]
37 0.0012 54.49 57 [2:59] [56796101:56796158]
38 0.014 50.88 60 [1:61] [56773921:56773981]
39 0.049 49.08 59 [2:61] [56809344:56809403]
40 0.049 49.08 59 [2:61] [56814383:56814442]
The query ID and description is present. A hit is always tied to a query, so we want to keep track of the originating query as well. These values can be accessed from a hit using the query_id and query_description attributes.
Unique hit ID, description, and full sequence lengths. They can be accessed using id, description, and seq_len, respectively.
Table containing quick information about the HSPs this hit contains. In each row, we’ve got the important HSP details listed: the HSP index, its e-value, its bit score, its span (the alignment length including gaps), its query coordinates, and its hit coordinates.
The previously BLAT search provided one hit with 17 HSPs.
blat_qresult = SearchIO.read("my_blat.psl", "blat-psl")
blat_hit = blat_qresult[0] #the only hit
print(blat_hit)
Query: mystery_seq
<unknown description>
Hit: chr19 (58617616)
<unknown description>
HSPs: ---- -------- --------- ------ --------------- ---------------------
# E-value Bit score Span Query range Hit range
---- -------- --------- ------ --------------- ---------------------
0 ? ? ? [0:61] [53701226:53701287]
1 ? ? ? [0:61] [53729850:53761209]
2 ? ? ? [0:61] [53751223:53756817]
3 ? ? ? [1:61] [53707466:53707526]
4 ? ? ? [0:60] [53695222:53695282]
5 ? ? ? [0:61] [53762356:53762417]
6 ? ? ? [0:61] [53734889:53736921]
7 ? ? ? [0:60] [53686481:53686541]
8 ? ? ? [0:61] [53682171:53682232]
9 ? ? ? [0:60] [53694403:53694463]
10 ? ? ? [0:61] [53752408:53752469]
11 ? ? ? [0:61] [53698397:53698458]
12 ? ? ? [8:61] [53708764:53725513]
13 ? ? ? [8:60] [53702755:53702807]
14 ? ? ? [10:61] [53675733:53675784]
15 ? ? ? [8:51] [53731024:53731067]
16 ? ? ? [8:61] [53734889:53734942]
• The e-value and bit score column values. As BLAT HSPs do not have evalues and bit scores, the display defaults to ‘?’.
• The span values is meant to display the complete alignment length, which consists of all residues and any gaps that may be present. The PSL format do not have this information readily available and Bio.SearchIO does not attempt to try guess what it is, so we get a ‘?’ similar to the e-value and bit score columns.
for hsp in blast_hit:
print(hsp)
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [0:61] (1)
Hit range: [56780727:56780788] (1)
Quick stats: evalue 4.9e-21; bitscore 111.29
Fragments: 1 (61 columns)
Query - CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAGGG
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Hit - CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAGGG
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [0:60] (1)
Hit range: [56774740:56774800] (1)
Quick stats: evalue 8.8e-18; bitscore 100.47
Fragments: 1 (60 columns)
Query - CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAGG
||||||| ||||||||||||||||||||||||||||||||||||||| ||||||||||||
Hit - CCCTCTAGAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCATCCTTTTAGAGG
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [1:61] (1)
Hit range: [56786970:56787030] (1)
Quick stats: evalue 8.8e-18; bitscore 100.47
Fragments: 1 (60 columns)
Query - CCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAGGG
|||||| |||||||| ||||||||||||||||||||||||||||||||||||||||||||
Hit - CCTCTAGAGGGAAGCACTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAGGG
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [0:59] (1)
Hit range: [56841844:56841903] (1)
Quick stats: evalue 3.1e-17; bitscore 98.67
Fragments: 1 (59 columns)
Query - CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAG
|||||||||||||||||||||||||||||||||||||| |||||||| |||||||||||
Hit - CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAGGAAAGTGCATCCTTTTAGAG
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [0:60] (1)
Hit range: [56766006:56766066] (1)
Quick stats: evalue 3.8e-16; bitscore 95.96
Fragments: 1 (60 columns)
Query - CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAGG
||||||| ||||||||||||||||||||||||||||||||||||||| || |||||||||
Hit - CCCTCTAGAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCATCTTTTTAGAGG
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [0:61] (1)
Hit range: [56761684:56761745] (1)
Quick stats: evalue 1.3e-15; bitscore 93.26
Fragments: 1 (61 columns)
Query - CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAGGG
||||||| ||||||||||||||||| ||| |||||||||| ||||||||||||||||||||
Hit - CCCTCTAAAGGGAAGCGCTTTCTGTGGTCAGAAAGAAAAGCAAGTGCTTCCTTTTAGAGGG
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [0:60] (1)
Hit range: [56773921:56773981] (1)
Quick stats: evalue 4.6e-15; bitscore 91.45
Fragments: 1 (60 columns)
Query - CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAGG
|||||| |||||||| |||||||||||||||||||| ||||||||||||||||||||||
Hit - CCCTCTTGAGGGAAGCACTTTCTGTTGTCTGAAAGAAGAGAAAGTGCTTCCTTTTAGAGG
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [2:59] (1)
Hit range: [56831900:56831957] (1)
Quick stats: evalue 1.6e-14; bitscore 90.55
Fragments: 1 (57 columns)
Query - CTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAG
||||| |||||||||||||||||||||||||||||| |||||||| |||||||||||
Hit - CTCTAGAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAGGAAAGTGCATCCTTTTAGAG
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [0:59] (1)
Hit range: [56830713:56830772] (1)
Quick stats: evalue 1.6e-14; bitscore 89.65
Fragments: 1 (59 columns)
Query - CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAG
||||||| ||||||||||||||||||||||||||||||||||| || |||| |||||||
Hit - CCCTCTAGAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAATGGTTCCCTTTAGAG
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [0:61] (1)
Hit range: [56777904:56777965] (1)
Quick stats: evalue 5.6e-14; bitscore 88.75
Fragments: 1 (61 columns)
Query - CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAGGG
||||||| |||||||||||||||||||||||||||||||||| | |||||| | |||||||
Hit - CCCTCTAGAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAACGCGCTTCCCTATAGAGGG
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [0:60] (1)
Hit range: [56782248:56782308] (1)
Quick stats: evalue 2.4e-12; bitscore 82.44
Fragments: 1 (60 columns)
Query - CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAGG
||||| | ||||||||||||||||||||||||||||||| |||| ||| || ||||||||
Hit - CCCTCCAGAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAACAAAGCGCTCCCCTTTAGAGG
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [0:60] (1)
Hit range: [56785767:56785827] (1)
Quick stats: evalue 2.4e-12; bitscore 82.44
Fragments: 1 (60 columns)
Query - CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAGG
||||||| |||||||| |||||||||| |||||||| |||||| |||||||||||||||
Hit - CCCTCTAGAGGGAAGCACTTTCTGTTGCTTGAAAGAAGAGAAAGCGCTTCCTTTTAGAGG
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [11:61] (1)
Hit range: [56836255:56836305] (1)
Quick stats: evalue 2.4e-12; bitscore 82.44
Fragments: 1 (50 columns)
Query - GAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAGGG
||||| ||||||||||||||||||||||||||||||||||||| ||||||
Hit - GAAGCACTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTCAGAGGG
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [11:61] (1)
Hit range: [56840647:56840697] (1)
Quick stats: evalue 2.4e-12; bitscore 82.44
Fragments: 1 (50 columns)
Query - GAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAGGG
||||| ||||||||||||||||||||||||||||||||||||| ||||||
Hit - GAAGCACTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTCAGAGGG
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [0:59] (1)
Hit range: [56809344:56809403] (1)
Quick stats: evalue 8.3e-12; bitscore 80.63
Fragments: 1 (59 columns)
Query - CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAG
||||||| ||||||||||||||||||| || ||||||||||||| |||||| || ||||
Hit - CCCTCTAGAGGGAAGCGCTTTCTGTTGGCTAAAAGAAAAGAAAGCGCTTCCCTTCAGAG
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [0:61] (1)
Hit range: [56755236:56755297] (1)
Quick stats: evalue 2.9e-11; bitscore 79.73
Fragments: 1 (61 columns)
Query - CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAGGG
||||| | | |||||| ||||||||||||||||| ||||||||||||||||||| |||||
Hit - CCCTCAAGATGGAAGCAGTTTCTGTTGTCTGAAAGGAAAGAAAGTGCTTCCTTTTTGAGGG
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [0:60] (1)
Hit range: [56779528:56779588] (1)
Quick stats: evalue 1e-10; bitscore 77.93
Fragments: 1 (60 columns)
Query - CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAGG
||||||| |||||||| |||||| |||||| ||||||||||||| ||||| ||||||||
Hit - CCCTCTAGAGGGAAGCACTTTCTCTTGTCTAAAAGAAAAGAAAGCGCTTCTCTTTAGAGG
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [0:59] (1)
Hit range: [56814383:56814442] (1)
Quick stats: evalue 3.5e-10; bitscore 76.13
Fragments: 1 (59 columns)
Query - CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAG
||||||| |||||||| ||||||||||||||||||||| |||| |||||| ||| |||
Hit - CCCTCTAGAGGGAAGCACTTTCTGTTGTCTGAAAGAAACCAAAGCGCTTCCCTTTGGAG
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [0:58] (1)
Hit range: [56770413:56770471] (1)
Quick stats: evalue 1.2e-09; bitscore 74.32
Fragments: 1 (58 columns)
Query - CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGA
||||| | ||||||| ||||||||||||||| |||||||||||||||||| ||| ||
Hit - CCCTCCAGAGGGAAGTACTTTCTGTTGTCTGAGAGAAAAGAAAGTGCTTCCCTTTGGA
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [11:61] (1)
Hit range: [56816365:56816415] (1)
Quick stats: evalue 1.2e-09; bitscore 73.42
Fragments: 1 (50 columns)
Query - GAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAGGG
||||| ||||||||| | ||||||||||||||||||||||||| ||||||
Hit - GAAGCACTTTCTGTTTTGTGAAAGAAAAGAAAGTGCTTCCTTTCAGAGGG
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [2:59] (1)
Hit range: [56788264:56788321] (1)
Quick stats: evalue 4.3e-09; bitscore 72.52
Fragments: 1 (57 columns)
Query - CTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAG
|||| |||||||| |||||||||||||||||||||| |||| ||||| |||||||
Hit - CTCTGGAGGGAAGCACTTTCTGTTGTCTGAAAGAAAACAAAGCGCTTCTCTTTAGAG
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [8:55] (1)
Hit range: [56833523:56833570] (1)
Quick stats: evalue 4.3e-09; bitscore 72.52
Fragments: 1 (47 columns)
Query - AGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTT
|||||||| ||||||||||||| |||||||||||||||||||| |||
Hit - AGGGAAGCCCTTTCTGTTGTCTAAAAGAAAAGAAAGTGCTTCCCTTT
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [3:59] (1)
Hit range: [56837732:56837790] (1)
Quick stats: evalue 4.3e-09; bitscore 71.62
Fragments: 1 (58 columns)
Query - TCTACA--GGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAG
|||||| || ||||||||||||||||||||||||||||||| |||||| ||| |||
Hit - TCTACAAAGGAAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAATCGCTTCCCTTTGGAG
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [3:61] (1)
Hit range: [56768018:56768078] (1)
Quick stats: evalue 1.5e-08; bitscore 70.72
Fragments: 1 (60 columns)
Query - TCTACA--GGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAGGG
|||||| || ||||||||||||||||| ||||||| |||||| |||||| ||| |||||
Hit - TCTACAAAGGAAAGCGCTTTCTGTTGTCAGAAAGAAGAGAAAGCGCTTCCCTTTTGAGGG
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [3:61] (1)
Hit range: [56799602:56799662] (1)
Quick stats: evalue 1.5e-08; bitscore 70.72
Fragments: 1 (60 columns)
Query - TCTACA--GGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAGGG
|||||| ||||||| ||||||||||||| ||||||||||||||||||| ||| | |||
Hit - TCTACAAAGGGAAGCCCTTTCTGTTGTCTAAAAGAAAAGAAAGTGCTTCTCTTTGGTGGG
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [8:55] (1)
Hit range: [56810518:56810565] (1)
Quick stats: evalue 5.2e-08; bitscore 68.01
Fragments: 1 (47 columns)
Query - AGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTT
|||||||| |||||||||||||||||||| |||||| |||||| |||
Hit - AGGGAAGCCCTTTCTGTTGTCTGAAAGAAGAGAAAGCGCTTCCCTTT
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [0:59] (1)
Hit range: [56828140:56828199] (1)
Quick stats: evalue 1.8e-07; bitscore 67.11
Fragments: 1 (59 columns)
Query - CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAG
||||| | ||||||| ||||||||||||| ||||||||||| | ||||| |||||||
Hit - CCCTCCAAAGGGAAGAACTTTCTGTTGTCTAAAAGAAAAGAACGCACTTCCCTTTAGAG
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [0:61] (1)
Hit range: [56806419:56806476] (1)
Quick stats: evalue 6.4e-07; bitscore 65.31
Fragments: 1 (61 columns)
Query - CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAGGG
||||||| |||||||| |||||||||| ||||||||||| ||| |||||| ||||||
Hit - CCCTCTAGAGGGAAGCACTTTCTGTTG----AAAGAAAAGAACATGCATCCTTTCAGAGGG
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [0:59] (1)
Hit range: [56820824:56820881] (1)
Quick stats: evalue 6.4e-07; bitscore 64.40
Fragments: 1 (59 columns)
Query - CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAG
||||||| | |||||| || |||||||||| ||||||||| |||| |||||||||||
Hit - CCCTCTAGATGGAAGCACTGTCTGTTGTCT--AAGAAAAGATCGTGCATCCTTTTAGAG
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [8:55] (1)
Hit range: [56818847:56818894] (1)
Quick stats: evalue 2.2e-06; bitscore 63.50
Fragments: 1 (47 columns)
Query - AGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTT
|||||||| ||||||||||||| |||||| |||||| |||||| |||
Hit - AGGGAAGCCCTTTCTGTTGTCTAAAAGAAGAGAAAGCGCTTCCCTTT
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [0:59] (1)
Hit range: [56801672:56801734] (1)
Quick stats: evalue 7.8e-06; bitscore 60.80
Fragments: 1 (63 columns)
Query - CCCTCTACAGGGAAGCGCTTTCTGTT----GTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAG
||||||| ||| |||| ||||||||| |||||| | |||| ||||||||||| |||||||
Hit - CCCTCTAGAGG-AAGCACTTTCTGTTTGTTGTCTGAGAAAAAACAAAGTGCTTCCCTTTAGAG
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [0:59] (1)
Hit range: [56822018:56822080] (1)
Quick stats: evalue 7.8e-06; bitscore 60.80
Fragments: 1 (63 columns)
Query - CCCTCTACAGGGAAGCGCTTTCTGTT----GTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAG
||||||| ||| |||| ||||||||| |||||| | |||| ||||||||||| |||||||
Hit - CCCTCTAGAGG-AAGCACTTTCTGTTTGTTGTCTGAGAAAAAACAAAGTGCTTCCCTTTAGAG
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [1:61] (1)
Hit range: [56779528:56779588] (-1)
Quick stats: evalue 2.7e-05; bitscore 59.90
Fragments: 1 (60 columns)
Query - CCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAGGG
|||||| || |||||||||||| || | | | | || ||||||||||||| | |||||||
Hit - CCTCTAAAGAGAAGCGCTTTCTTTTCTTTTAGACAAGAGAAAGTGCTTCCCTCTAGAGGG
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [1:61] (1)
Hit range: [56785767:56785827] (-1)
Quick stats: evalue 2.7e-05; bitscore 59.90
Fragments: 1 (60 columns)
Query - CCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAGGG
|||||| | ||||||||||||| || | | || || ||||||||||||| | |||||||
Hit - CCTCTAAAAGGAAGCGCTTTCTCTTCTTTCAAGCAACAGAAAGTGCTTCCCTCTAGAGGG
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [0:55] (1)
Hit range: [56826628:56826686] (1)
Quick stats: evalue 2.7e-05; bitscore 58.99
Fragments: 1 (58 columns)
Query - CCCTCTACAGG---GAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTT
|| |||| ||| ||||| ||||||||||||| |||||||||||||| ||| ||||
Hit - CCATCTAGAGGTAAGAAGCACTTTCTGTTGTCTTAAAGAAAAGAAAGTCCTTTTTTTT
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [2:59] (1)
Hit range: [56759467:56759521] (1)
Quick stats: evalue 9.4e-05; bitscore 57.19
Fragments: 1 (57 columns)
Query - CTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAG
||| | | || ||| ||||||||| |||||||||| ||||||| |||||||||||
Hit - CTCCAAAAGGGAGCACTTTCTGTT---TGAAAGAAAACAAAGTGCCTCCTTTTAGAG
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [0:60] (1)
Hit range: [56786970:56787030] (-1)
Quick stats: evalue 0.00033; bitscore 55.39
Fragments: 1 (60 columns)
Query - CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAGG
||||||| | |||||| |||||| || | | | | || ||||||||||||| | ||||||
Hit - CCCTCTAAAAGGAAGCACTTTCTTTTCTTTCAGACAACAGAAAGTGCTTCCCTCTAGAGG
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [2:59] (1)
Hit range: [56796101:56796158] (1)
Quick stats: evalue 0.0012; bitscore 54.49
Fragments: 1 (57 columns)
Query - CTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAG
||| | ||||||| ||||| |||||| ||||||||||| | ||||| |||||||
Hit - CTCCAAAGGGAAGAATTTTCTCTTGTCTAAAAGAAAAGAACGCACTTCCCTTTAGAG
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [1:61] (1)
Hit range: [56773921:56773981] (-1)
Quick stats: evalue 0.014; bitscore 50.88
Fragments: 1 (60 columns)
Query - CCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAGGG
|||||| | |||||| |||||| || | | | | || ||||||||||||| | ||||||
Hit - CCTCTAAAAGGAAGCACTTTCTCTTCTTTCAGACAACAGAAAGTGCTTCCCTCAAGAGGG
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [2:61] (1)
Hit range: [56809344:56809403] (-1)
Quick stats: evalue 0.049; bitscore 49.08
Fragments: 1 (59 columns)
Query - CTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAGGG
|||| ||||||||||||||| || | | | || |||||| |||||| | |||||||
Hit - CTCTGAAGGGAAGCGCTTTCTTTTCTTTTAGCCAACAGAAAGCGCTTCCCTCTAGAGGG
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|2033715243|gb|CP068259.2| Homo sapiens isolate CHM13 chromoso...
Query range: [2:61] (1)
Hit range: [56814383:56814442] (-1)
Quick stats: evalue 0.049; bitscore 49.08
Fragments: 1 (59 columns)
Query - CTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAGGG
||| | ||||||||||||| || | | | | || ||||||||||||| | |||||||
Hit - CTCCAAAGGGAAGCGCTTTGGTTTCTTTCAGACAACAGAAAGTGCTTCCCTCTAGAGGG
len (blast_hit)
41
len(blat_hit)
17
blat_hit[0] #retrieve single items
HSP(hit_id='chr19', query_id='mystery_seq', 1 fragments)
sliced_hit = blat_hit[4:9] #retrieve mutiple items
len(sliced_hit)
5
print(sliced_hit)
Query: mystery_seq
<unknown description>
Hit: chr19 (58617616)
<unknown description>
HSPs: ---- -------- --------- ------ --------------- ---------------------
# E-value Bit score Span Query range Hit range
---- -------- --------- ------ --------------- ---------------------
0 ? ? ? [0:60] [53695222:53695282]
1 ? ? ? [0:61] [53762356:53762417]
2 ? ? ? [0:61] [53734889:53736921]
3 ? ? ? [0:60] [53686481:53686541]
4 ? ? ? [0:61] [53682171:53682232]
HSP (high-scoring pair) represents region(s) in the hit sequence that contains significant alignment(s) to the query sequence. It contains the actual match between your query sequence and a database entry. As this match is determined by the sequence search tool’s algorithms, the HSP object contains the bulk of the statistics computed by the search tool. This also makes the distinction between HSP objects from different search tools more apparent compared to the differences you’ve seen in QueryResult or Hit objects.
blast_qresult = SearchIO.read("my_blast.xml", "blast-xml")
blast_hsp = blast_qresult[0][0] #first hit, first hsp
print(blast_hsp)
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|1549097453|gb|CP034522.1| Eukaryotic synthetic construct chro...
Query range: [0:61] (1)
Hit range: [54144480:54144541] (1)
Quick stats: evalue 4.9e-21; bitscore 111.29
Fragments: 1 (61 columns)
Query - CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAGGG
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Hit - CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAGGG
blast_hsp.query_range
(0, 61)
blast_hsp.evalue
4.89015e-21
blast_hsp.hit_start # start coordinate of the hit sequence
54144480
blast_hsp.query_span # how many residues in the query sequence
61
blast_hsp.aln_span # how long the alignment is
61
blast_hsp.gap_num # number of gaps
0
blast_hsp.ident_num # number of identical residues
61
These details are format-specific; they may not be present in other formats.
use .dict.keys() for a quick list of what’s available:
blast_hsp.__dict__.keys()
dict_keys(['output_index', '_items', 'bitscore', 'bitscore_raw', 'evalue', 'ident_num', 'pos_num', 'gap_num'])
blast_hsp.query
SeqRecord(seq=Seq('CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTT...GGG'), id='mystery_seq', name='aligned query sequence', description='Homo sapiens microRNA 520b (MIR520B), microRNA', dbxrefs=[])
blast_hsp.hit
SeqRecord(seq=Seq('CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTT...GGG'), id='gi|1549097453|gb|CP034522.1|', name='aligned hit sequence', description='Eukaryotic synthetic construct chromosome 19', dbxrefs=[])
print(blast_hsp.aln)
Alignment with 2 rows and 61 columns CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAG...GGG mystery_seq CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAG...GGG gi|1549097453|gb|CP034522.1|
blast_frag = blast_qresult [0][0][0]
print (blast_frag)
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|1549097453|gb|CP034522.1| Eukaryotic synthetic construct chro...
Query range: [0:61] (1)
Hit range: [54144480:54144541] (1)
Fragments: 1 (61 columns)
Query - CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAGGG
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Hit - CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAGGG
HSPFragment represents a single, contiguous match between the query and hit sequences. Consider it the core of the object model and search result, since it is the presence of these fragments that determine whether your search have results or not.
HSPFragment objects were written with to be as compact as possible. In most cases, they only contain attributes directly related to sequences: strands, reading frames, molecule types, coordinates,the sequences themselves, and their IDs and descriptions.
blast_qresult = SearchIO.read("my_blast.xml", "blast-xml")
blast_frag = blast_qresult[0][0][0] #first hit, first hsp, first fragment
print(blast_frag)
Query: mystery_seq Homo sapiens microRNA 520b (MIR520B), microRNA
Hit: gi|1549097453|gb|CP034522.1| Eukaryotic synthetic construct chro...
Query range: [0:61] (1)
Hit range: [54144480:54144541] (1)
Fragments: 1 (61 columns)
Query - CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAGGG
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Hit - CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAGGG
blast_frag.query_start # query start coordinate
0
blast_frag.hit_strand # hit sequence strand
1
blast_frag.hit # hit sequence, as a SeqRecord object
SeqRecord(seq=Seq('CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTT...GGG'), id='gi|1549097453|gb|CP034522.1|', name='aligned hit sequence', description='Eukaryotic synthetic construct chromosome 19', dbxrefs=[])
Use read and parse (submodules like Bio.SeqIO or Bio.AlignIO)
Requirement: you need to supply the search output file name and the file format name.
returns QueryResult object.
from Bio import SearchIO
qresult = SearchIO.read("tab_2226_tblastn_003.txt", "blast-tab")
qresult
QueryResult(id='gi|16080617|ref|NP_391444.1|', 3 hits)
qresult2 = SearchIO.read("tab_2226_tblastn_007.txt", "blast-tab",
comments=True)
qresult2
QueryResult(id='gi|16080617|ref|NP_391444.1|', 3 hits)
qresults = SearchIO.parse("tab_2226_tblastn_001.txt", "blast-tab")
for qresult in qresults:
print(qresult.id)
gi|16080617|ref|NP_391444.1| gi|11464971:4-101
qresults2 = SearchIO.parse("tab_2226_tblastn_005.txt", "blast-tab",
comments=True)
for qresult in qresults2:
print(qresult.id)
random_s00 gi|16080617|ref|NP_391444.1| gi|11464971:4-101
For large file you can use Bio.SearchIO.index or Bio.SearchIO.index_db to index the files.
idx = SearchIO.index("tab_2226_tblastn_001.txt", "blast-tab")
sorted(idx.keys())
['gi|11464971:4-101', 'gi|16080617|ref|NP_391444.1|']
idx["gi|16080617|ref|NP_391444.1|"]
QueryResult(id='gi|16080617|ref|NP_391444.1|', 3 hits)
idx.close()
idx = SearchIO.index("tab_2226_tblastn_005.txt", "blast-tab",
comments=True)
sorted(idx.keys())
['gi|11464971:4-101', 'gi|16080617|ref|NP_391444.1|', 'random_s00']
idx["gi|16080617|ref|NP_391444.1|"]
QueryResult(id='gi|16080617|ref|NP_391444.1|', 3 hits)
idx.close()
key_function = lambda id: id.upper() # capitalizes the keys
idx = SearchIO.index("tab_2226_tblastn_001.txt", "blast-tab",
key_function=key_function)
sorted(idx.keys())
['GI|11464971:4-101', 'GI|16080617|REF|NP_391444.1|']
idx["GI|16080617|REF|NP_391444.1|"]
QueryResult(id='gi|16080617|ref|NP_391444.1|', 3 hits)
idx.close()
Occasionally, you may need to manipulate search results from an output file and rewrite it to a new file. You can use the write function in Bio.SearchIO to do this. It takes a QueryResult objects, the output filename to write to. It returns a four-item tuple, which denotes the number or QueryResult, Hit, HSP, and HSPFragment objects that were written.
qresults = SearchIO.parse("mirna.xml", "blast-xml")
SearchIO.write(qresults, "results.tab", "blast-tab") # write totabular file
(3, 239, 277, 277)
SearchIO.convert("mirna.xml", "blast-xml", "results.tab", "blast-tab")
(3, 239, 277, 277)