BioPython Sequence Objects Part 2

Beatriz Manso

1. Set the working directory and import libraries:
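A minimal sketch of this step. The directory is an assumption: the current directory stands in for your own project folder, and `work_dir` is a hypothetical name.

```python
import os
from pathlib import Path

from Bio.Seq import Seq  # the main sequence class used throughout

# Placeholder: replace Path.cwd() with the folder that holds your data.
work_dir = Path.cwd()
os.chdir(work_dir)
print(os.getcwd())
```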

2. Create a sequence object
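For example, a Seq object can be built from a plain string. The sequence below is illustrative only and is not taken from the original notebook.

```python
from Bio.Seq import Seq

# Illustrative DNA sequence (an assumption, chosen for the examples).
my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")
print(my_seq)
print(len(my_seq))  # 32
```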

2.1. Access each nucleotide of the sequence.
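One way to do this is to loop over the sequence with enumerate, as you would with a string. The sequence is the same illustrative one as before.

```python
from Bio.Seq import Seq

my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")  # illustrative sequence

# enumerate() yields a zero-based index alongside each letter.
for index, letter in enumerate(my_seq):
    print(index, letter)

# A single position can also be read directly by index.
first_letter = my_seq[0]
print(first_letter)  # G
```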

3. Print GC Content
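A sketch of a manual calculation: count the G and C letters and divide by the sequence length. The variable name `gc_percent` is hypothetical.

```python
from Bio.Seq import Seq

my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")  # illustrative sequence

# Percentage of positions that are G or C.
gc_percent = 100 * (my_seq.count("G") + my_seq.count("C")) / len(my_seq)
print(gc_percent)  # 46.875
```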

Biopython's SeqUtils module also has a built-in function for calculating GC content:
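The function name depends on the Biopython version installed: older releases provide GC (a percentage), while recent releases provide gc_fraction (a value between 0 and 1). The sketch below tries both; the wrapper name `gc_percent` is hypothetical.

```python
from Bio.Seq import Seq

try:
    # Recent Biopython releases: gc_fraction returns a value between 0 and 1.
    from Bio.SeqUtils import gc_fraction

    def gc_percent(seq):
        return 100 * gc_fraction(seq)
except ImportError:
    # Older releases: GC already returns a percentage.
    from Bio.SeqUtils import GC as gc_percent

my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")  # illustrative sequence
gc_value = gc_percent(my_seq)
print(gc_value)
```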

4. Slice

Biopython sequence objects can be sliced in the same way as a normal Python string.
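For instance, with an illustrative sequence (an assumption, since the original cell is not shown), a slice from index 4 up to index 12 looks like this:

```python
from Bio.Seq import Seq

my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")  # illustrative sequence

# Indices are zero-based; the end index (12) is not included.
fragment = my_seq[4:12]
print(fragment)  # GATGGGCC
```

The result of slicing a Seq is another Seq, not a plain string.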

We have just taken the section from position 4 up to, but not including, position 12 of our sequence above.

An extended slicing feature lets you extract every nth position of the sequence. This is done using a double colon "::", e.g. extracting every third position starting from position 0 is done as follows:
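A short sketch with the same illustrative sequence; `every_third` is a hypothetical name.

```python
from Bio.Seq import Seq

my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")  # illustrative sequence

# start=0, no end given, step=3: positions 0, 3, 6, ...
every_third = my_seq[0::3]
print(every_third)  # GCTGTAGTAAG
```

Starting at 1 or 2 instead of 0 would give the second or third codon positions.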

5. Reverse a sequence using a step of -1:
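A sketch using the same illustrative sequence. A step of -1 walks the sequence backwards.

```python
from Bio.Seq import Seq

my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")  # illustrative sequence

# Omitting start and end with a step of -1 returns the whole sequence reversed.
reversed_seq = my_seq[::-1]
print(reversed_seq)  # CGCTAAAAGCTAGGATATATCCGGGTAGCTAG
```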

6. Convert sequence object into Python string
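The built-in str function is enough for this. The variable names below are hypothetical.

```python
from Bio.Seq import Seq

my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")  # illustrative sequence

# str() returns the full sequence as an ordinary Python string.
seq_string = str(my_seq)
print(type(seq_string))
print(seq_string)
```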

7. Concatenating Sequences:
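Two Seq objects can be joined with the + operator, as with strings. The two short sequences here are made up for illustration.

```python
from Bio.Seq import Seq

seq_a = Seq("ACGT")   # illustrative
seq_b = Seq("TTGCA")  # illustrative

combined = seq_a + seq_b
print(combined)  # ACGTTTGCA
```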

Loop to concatenate many sequences:
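A sketch of such a loop, starting from an empty Seq and adding each fragment in turn. The list `fragments` and its contents are assumptions for illustration.

```python
from Bio.Seq import Seq

fragments = [Seq("ACGT"), Seq("AACC"), Seq("GGTT")]  # illustrative sequences

# Start with an empty sequence and append each fragment.
concatenated = Seq("")
for fragment in fragments:
    concatenated += fragment

print(concatenated)  # ACGTAACCGGTT
```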

The .join method available for Python strings can also be used for Biopython Seq objects.

The concatenated sequences can be separated using a specified delimiter:
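As a sketch, a run of five "N" letters is used below as the delimiter. The spacer length and the contig sequences are arbitrary choices for illustration.

```python
from Bio.Seq import Seq

contigs = [Seq("ACGT"), Seq("AACC"), Seq("GGTT")]  # illustrative sequences

# The Seq that join() is called on is placed between the items.
spacer = Seq("N" * 5)
joined = spacer.join(contigs)
print(joined)  # ACGTNNNNNAACCNNNNNGGTT
```

Using an empty Seq as the spacer joins the sequences with nothing in between.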

If sequences of different cases are present in an uploaded sequence, or have been combined during analysis, they can be converted to the same case using the upper or lower methods:
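For example, with a made-up mixed-case sequence:

```python
from Bio.Seq import Seq

mixed_seq = Seq("acgtACGT")  # illustrative mixed-case sequence

upper_seq = mixed_seq.upper()
lower_seq = mixed_seq.lower()
print(upper_seq)  # ACGTACGT
print(lower_seq)  # acgtacgt
```

Both methods return a new Seq and leave the original unchanged.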

When searching for an exact match of a substring within a sequence, the match will not be found if the case differs.
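A sketch of the problem, using a made-up lower-case sequence and an upper-case search term:

```python
from Bio.Seq import Seq

dna_seq = Seq("acgtgatcaa")  # illustrative lower-case sequence

# The search is case-sensitive, so the upper-case motif is not found.
found = "GATC" in dna_seq
position = dna_seq.find("GATC")
print(found)     # False
print(position)  # -1
```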

If the case is converted first and the substring is then searched for, it is identified:
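Continuing with the same made-up sequence, converting to upper case before searching gives a match:

```python
from Bio.Seq import Seq

dna_seq = Seq("acgtgatcaa")  # illustrative lower-case sequence

# Convert to upper case first so that both sides use the same case.
upper_seq = dna_seq.upper()
found = "GATC" in upper_seq
position = upper_seq.find("GATC")
print(found)     # True
print(position)  # 4
```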

8. Translation Tables

To import and view the codon tables, use the CodonTable module in the Bio.Data package.

First import the Standard translation table, and the translation table for Vertebrate Mitochondrial DNA:
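A sketch of looking up both tables by name and printing them. The variable names are hypothetical.

```python
from Bio.Data import CodonTable

# Tables can be looked up by name (or by NCBI ID via unambiguous_dna_by_id).
standard_table = CodonTable.unambiguous_dna_by_name["Standard"]
mito_table = CodonTable.unambiguous_dna_by_name["Vertebrate Mitochondrial"]

print(standard_table)
print(mito_table)

# One difference between the two: TGA is a stop codon in the Standard
# table but codes for tryptophan (W) in the Vertebrate Mitochondrial table.
print(standard_table.stop_codons)
print(mito_table.forward_table["TGA"])
```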

9. Comparing Seq objects

Sequences can be compared to check whether they match.
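For example, with three short made-up sequences, the == operator compares the letters of the sequences:

```python
from Bio.Seq import Seq

seq_a = Seq("ACGT")  # illustrative
seq_b = Seq("ACGT")  # same letters as seq_a
seq_c = Seq("ACGA")  # differs in the last letter

same = seq_a == seq_b
different = seq_a == seq_c
print(same)       # True
print(different)  # False
```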

The Seq object is “read only” (immutable), like a Python string. The benefit is that your sequence data cannot be changed accidentally during analysis.

You will get an error if you try to alter your Seq:
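A sketch that triggers the error and catches it, so the script keeps running. The sequence is illustrative.

```python
from Bio.Seq import Seq

my_seq = Seq("GATCGATGGG")  # illustrative sequence

raised = False
try:
    # Item assignment is not supported on an immutable Seq.
    my_seq[5] = "C"
except TypeError as error:
    raised = True
    print(error)
```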

You can make your sequence mutable if you wish to alter it. First, turn your Seq object into a string, as we did above:
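One possible route, assuming the string is then passed to Biopython's MutableSeq class:

```python
from Bio.Seq import MutableSeq, Seq

my_seq = Seq("GATCGATGGG")  # illustrative sequence

# Convert to a plain string, then build a mutable copy from it.
seq_string = str(my_seq)
mutable_seq = MutableSeq(seq_string)
print(mutable_seq)
```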

Alternatively, the mutable object can be directly produced:
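For instance, a MutableSeq can be created straight from a string literal (the sequence is illustrative):

```python
from Bio.Seq import MutableSeq

# Build the mutable object directly, without creating a Seq first.
mutable_seq = MutableSeq("GATCGATGGG")
print(mutable_seq)
```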

Now the sequence can be altered:
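A sketch of a few in-place edits on an illustrative MutableSeq. Each operation changes the object itself rather than returning a new one.

```python
from Bio.Seq import MutableSeq, Seq

mutable_seq = MutableSeq("GATCGATGGG")  # illustrative sequence

mutable_seq[5] = "C"     # replace the letter at index 5 -> GATCGCTGGG
mutable_seq.remove("T")  # remove the first "T"          -> GACGCTGGG
mutable_seq.reverse()    # reverse in place              -> GGGTCGCAG
print(mutable_seq)

# Convert back to an immutable Seq once editing is finished.
final_seq = Seq(str(mutable_seq))
```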

10. Working with unknown sequences

There is an UnknownSeq object that is a subclass of the basic Seq object, and its purpose is to represent a sequence where we know the length, but not the individual letters.

The Seq object could be used in this scenario, but it would waste a lot of memory to hold a million "N" characters when it could be stored as a single letter "N" and the desired length as an integer.

For DNA or RNA sequences, unknown nucleotides are commonly denoted by the letter “N”, while for proteins “X” is commonly used for unknown amino acids. When creating an UnknownSeq, you can specify the character used to represent unknown letters instead of the default “?”:
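A hedged sketch: UnknownSeq exists only in older Biopython releases. Newer releases appear to have replaced it with a Seq of known length but undefined content, so the example falls back to that if the import fails. The length of 20 is arbitrary.

```python
try:
    # Older Biopython releases: a sequence of known length, shown as "N".
    from Bio.Seq import UnknownSeq

    unknown_seq = UnknownSeq(20, character="N")
except ImportError:
    # Assumed replacement in newer releases: the length is known,
    # but the letters are undefined.
    from Bio.Seq import Seq

    unknown_seq = Seq(None, length=20)

print(len(unknown_seq))  # 20
```

In the fallback case the letters are undefined, so only the length is printed here.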