import os
os.chdir("C:/Users/manso/OneDrive - University of West London/MSc Bioinformatics - UWL/3.DSB - Data Science for Bioinformatics/Practice/DSB W6")


! pip install markdown --user
! pip install nbconvert

import markdown
html = markdown.markdown('/SequenceAnnotation.ipynb')
print(html)

Requirement already satisfied: markdown in c:\users\manso\appdata\roaming\python\python38\site-packages (3.3.7)
Requirement already satisfied: importlib-metadata>=4.4; python_version < "3.10" in c:\users\manso\appdata\roaming\python\python38\site-packages (from markdown) (4.11.4)
Requirement already satisfied: zipp>=0.5 in c:\programdata\anaconda3\lib\site-packages (from importlib-metadata>=4.4; python_version < "3.10"->markdown) (3.4.0)
Requirement already satisfied: nbconvert in c:\programdata\anaconda3\lib\site-packages (6.0.7)
Requirement already satisfied: nbformat>=4.4 in c:\programdata\anaconda3\lib\site-packages (from nbconvert) (5.0.8)
Requirement already satisfied: mistune<2,>=0.8.1 in c:\programdata\anaconda3\lib\site-packages (from nbconvert) (0.8.4)
Requirement already satisfied: defusedxml in c:\programdata\anaconda3\lib\site-packages (from nbconvert) (0.6.0)
Requirement already satisfied: nbclient<0.6.0,>=0.5.0 in c:\programdata\anaconda3\lib\site-packages (from nbconvert) (0.5.1)
Requirement already satisfied: pandocfilters>=1.4.1 in c:\programdata\anaconda3\lib\site-packages (from nbconvert) (1.4.3)
Requirement already satisfied: traitlets>=4.2 in c:\programdata\anaconda3\lib\site-packages (from nbconvert) (5.0.5)
Requirement already satisfied: entrypoints>=0.2.2 in c:\programdata\anaconda3\lib\site-packages (from nbconvert) (0.3)
Requirement already satisfied: jinja2>=2.4 in c:\programdata\anaconda3\lib\site-packages (from nbconvert) (2.11.2)
Requirement already satisfied: bleach in c:\programdata\anaconda3\lib\site-packages (from nbconvert) (3.2.1)
Requirement already satisfied: testpath in c:\programdata\anaconda3\lib\site-packages (from nbconvert) (0.4.4)
Requirement already satisfied: jupyter-core in c:\programdata\anaconda3\lib\site-packages (from nbconvert) (4.6.3)
Requirement already satisfied: jupyterlab-pygments in c:\programdata\anaconda3\lib\site-packages (from nbconvert) (0.1.2)
Requirement already satisfied: pygments>=2.4.1 in c:\programdata\anaconda3\lib\site-packages (from nbconvert) (2.7.2)
Requirement already satisfied: ipython-genutils in c:\programdata\anaconda3\lib\site-packages (from nbformat>=4.4->nbconvert) (0.2.0)
Requirement already satisfied: jsonschema!=2.5.0,>=2.4 in c:\programdata\anaconda3\lib\site-packages (from nbformat>=4.4->nbconvert) (3.2.0)
Requirement already satisfied: jupyter-client>=6.1.5 in c:\programdata\anaconda3\lib\site-packages (from nbclient<0.6.0,>=0.5.0->nbconvert) (6.1.7)
Requirement already satisfied: nest-asyncio in c:\programdata\anaconda3\lib\site-packages (from nbclient<0.6.0,>=0.5.0->nbconvert) (1.4.2)
Requirement already satisfied: async-generator in c:\programdata\anaconda3\lib\site-packages (from nbclient<0.6.0,>=0.5.0->nbconvert) (1.10)
Requirement already satisfied: MarkupSafe>=0.23 in c:\users\manso\appdata\roaming\python\python38\site-packages (from jinja2>=2.4->nbconvert) (2.0.1)
Requirement already satisfied: packaging in c:\programdata\anaconda3\lib\site-packages (from bleach->nbconvert) (20.4)
Requirement already satisfied: six>=1.9.0 in c:\programdata\anaconda3\lib\site-packages (from bleach->nbconvert) (1.15.0)
Requirement already satisfied: webencodings in c:\programdata\anaconda3\lib\site-packages (from bleach->nbconvert) (0.5.1)
Requirement already satisfied: pywin32>=1.0; sys_platform == "win32" in c:\programdata\anaconda3\lib\site-packages (from jupyter-core->nbconvert) (227)
Requirement already satisfied: attrs>=17.4.0 in c:\programdata\anaconda3\lib\site-packages (from jsonschema!=2.5.0,>=2.4->nbformat>=4.4->nbconvert) (20.3.0)
Requirement already satisfied: setuptools in c:\programdata\anaconda3\lib\site-packages (from jsonschema!=2.5.0,>=2.4->nbformat>=4.4->nbconvert) (50.3.1.post20201107)
Requirement already satisfied: pyrsistent>=0.14.0 in c:\programdata\anaconda3\lib\site-packages (from jsonschema!=2.5.0,>=2.4->nbformat>=4.4->nbconvert) (0.17.3)
Requirement already satisfied: python-dateutil>=2.1 in c:\programdata\anaconda3\lib\site-packages (from jupyter-client>=6.1.5->nbclient<0.6.0,>=0.5.0->nbconvert) (2.8.1)
Requirement already satisfied: tornado>=4.1 in c:\programdata\anaconda3\lib\site-packages (from jupyter-client>=6.1.5->nbclient<0.6.0,>=0.5.0->nbconvert) (6.0.4)
Requirement already satisfied: pyzmq>=13 in c:\programdata\anaconda3\lib\site-packages (from jupyter-client>=6.1.5->nbclient<0.6.0,>=0.5.0->nbconvert) (19.0.2)
Requirement already satisfied: pyparsing>=2.0.2 in c:\programdata\anaconda3\lib\site-packages (from packaging->bleach->nbconvert) (2.4.7)
<p>/SequenceAnnotation.ipynb</p>


# "!" sends the command to the shell
! jupyter nbconvert --to html /SequenceAnnotation.ipynb

This application is used to convert notebook files (*.ipynb) to various other
formats.

WARNING: THE COMMANDLINE INTERFACE MAY CHANGE IN FUTURE RELEASES.

Options
=======
The options below are convenience aliases to configurable class-options,
as listed in the "Equivalent to" description-line of the aliases.
To see all configurable class-options for some <cmd>, use:
    <cmd> --help-all

--debug
    set log level to logging.DEBUG (maximize logging output)
    Equivalent to: [--Application.log_level=10]
--generate-config
    generate default config file
    Equivalent to: [--JupyterApp.generate_config=True]
-y
    Answer yes to any questions instead of prompting.
    Equivalent to: [--JupyterApp.answer_yes=True]
--execute
    Execute the notebook prior to export.
    Equivalent to: [--ExecutePreprocessor.enabled=True]
--allow-errors
    Continue notebook execution even if one of the cells throws an error and include the error message in the cell output (the default behaviour is to abort conversion). This flag is only relevant if '--execute' was specified, too.
    Equivalent to: [--ExecutePreprocessor.allow_errors=True]
--stdin
    read a single notebook file from stdin. Write the resulting notebook with default basename 'notebook.*'
    Equivalent to: [--NbConvertApp.from_stdin=True]
--stdout
    Write notebook output to stdout instead of files.
    Equivalent to: [--NbConvertApp.writer_class=StdoutWriter]
--inplace
    Run nbconvert in place, overwriting the existing notebook (only 
    relevant when converting to notebook format)
    Equivalent to: [--NbConvertApp.use_output_suffix=False --NbConvertApp.export_format=notebook --FilesWriter.build_directory=]
--clear-output
    Clear output of current file and save in place, 
    overwriting the existing notebook.
    Equivalent to: [--NbConvertApp.use_output_suffix=False --NbConvertApp.export_format=notebook --FilesWriter.build_directory= --ClearOutputPreprocessor.enabled=True]
--no-prompt
    Exclude input and output prompts from converted document.
    Equivalent to: [--TemplateExporter.exclude_input_prompt=True --TemplateExporter.exclude_output_prompt=True]
--no-input
    Exclude input cells and output prompts from converted document. 
    This mode is ideal for generating code-free reports.
    Equivalent to: [--TemplateExporter.exclude_output_prompt=True --TemplateExporter.exclude_input=True]
--allow-chromium-download
    Whether to allow downloading chromium if no suitable version is found on the system.
    Equivalent to: [--WebPDFExporter.allow_chromium_download=True]
--log-level=<Enum>
    Set the log level by value or name.
    Choices: any of [0, 10, 20, 30, 40, 50, 'DEBUG', 'INFO', 'WARN', 'ERROR', 'CRITICAL']
    Default: 30
    Equivalent to: [--Application.log_level]
--config=<Unicode>
    Full path of a config file.
    Default: ''
    Equivalent to: [--JupyterApp.config_file]
--to=<Unicode>
    The export format to be used, either one of the built-in formats
    ['asciidoc', 'custom', 'html', 'latex', 'markdown', 'notebook', 'pdf',
    'python', 'rst', 'script', 'slides', 'webpdf'] or a dotted object name that

[NbConvertApp] WARNING | pattern '/SequenceAnnotation.ipynb' matched no files

    represents the import path for an `Exporter` class
    Default: ''
    Equivalent to: [--NbConvertApp.export_format]
--template=<Unicode>
    Name of the template to use
    Default: ''
    Equivalent to: [--TemplateExporter.template_name]
--template-file=<Unicode>
    Name of the template file to use
    Default: None
    Equivalent to: [--TemplateExporter.template_file]
--writer=<DottedObjectName>
    Writer class used to write the  results of the conversion
    Default: 'FilesWriter'
    Equivalent to: [--NbConvertApp.writer_class]
--post=<DottedOrNone>
    PostProcessor class used to write the results of the conversion
    Default: ''
    Equivalent to: [--NbConvertApp.postprocessor_class]
--output=<Unicode>
    overwrite base name use for output files. can only be used when converting
    one notebook at a time.
    Default: ''
    Equivalent to: [--NbConvertApp.output_base]
--output-dir=<Unicode>
    Directory to write output(s) to. Defaults to output to the directory of each
    notebook. To recover previous default behaviour (outputting to the current
    working directory) use . as the flag value.
    Default: ''
    Equivalent to: [--FilesWriter.build_directory]
--reveal-prefix=<Unicode>
    The URL prefix for reveal.js (version 3.x). This defaults to the reveal CDN,
    but can be any url pointing to a copy  of reveal.js.
    For speaker notes to work, this must be a relative path to a local  copy of
    reveal.js: e.g., "reveal.js".
    If a relative path is given, it must be a subdirectory of the current
    directory (from which the server is run).
    See the usage documentation
    (https://nbconvert.readthedocs.io/en/latest/usage.html#reveal-js-html-
    slideshow) for more details.
    Default: ''
    Equivalent to: [--SlidesExporter.reveal_url_prefix]
--nbformat=<Enum>
    The nbformat version to write. Use this to downgrade notebooks.
    Choices: any of [1, 2, 3, 4]
    Default: 4
    Equivalent to: [--NotebookExporter.nbformat_version]

Examples
--------

    The simplest way to use nbconvert is
    
    > jupyter nbconvert mynotebook.ipynb --to html
    
    Options include ['asciidoc', 'custom', 'html', 'latex', 'markdown', 'notebook', 'pdf', 'python', 'rst', 'script', 'slides', 'webpdf'].
    
    > jupyter nbconvert --to latex mynotebook.ipynb
    
    Both HTML and LaTeX support multiple output templates. LaTeX includes
    'base', 'article' and 'report'.  HTML includes 'basic' and 'full'. You
    can specify the flavor of the format used.
    
    > jupyter nbconvert --to html --template lab mynotebook.ipynb
    
    You can also pipe the output to stdout, rather than a file
    
    > jupyter nbconvert mynotebook.ipynb --stdout
    
    PDF is generated via latex
    
    > jupyter nbconvert mynotebook.ipynb --to pdf
    
    You can get (and serve) a Reveal.js-powered slideshow
    
    > jupyter nbconvert myslides.ipynb --to slides --post serve
    
    Multiple notebooks can be given at the command line in a couple of 
    different ways:
    
    > jupyter nbconvert notebook*.ipynb
    > jupyter nbconvert notebook1.ipynb notebook2.ipynb
    
    or you can specify the notebooks list in a config file, containing::
    
        c.NbConvertApp.notebooks = ["my_notebook.ipynb"]
    
    > jupyter nbconvert --config mycfg.py

To see all available configurables, use `--help-all`.


!pip install -U notebook-as-pdf
!pip install biopython

Collecting notebook-as-pdf
  Using cached notebook_as_pdf-0.5.0-py3-none-any.whl (6.5 kB)
Requirement already satisfied, skipping upgrade: PyPDF2 in c:\programdata\anaconda3\lib\site-packages (from notebook-as-pdf) (2.3.1)
Collecting pyppeteer
  Using cached pyppeteer-1.0.2-py3-none-any.whl (83 kB)
Requirement already satisfied, skipping upgrade: nbconvert in c:\programdata\anaconda3\lib\site-packages (from notebook-as-pdf) (6.0.7)
Requirement already satisfied, skipping upgrade: typing-extensions; python_version < "3.10" in c:\programdata\anaconda3\lib\site-packages (from PyPDF2->notebook-as-pdf) (3.7.4.3)
Requirement already satisfied, skipping upgrade: pyee<9.0.0,>=8.1.0 in c:\programdata\anaconda3\lib\site-packages (from pyppeteer->notebook-as-pdf) (8.2.2)
Requirement already satisfied, skipping upgrade: websockets<11.0,>=10.0 in c:\programdata\anaconda3\lib\site-packages (from pyppeteer->notebook-as-pdf) (10.3)
Collecting certifi>=2021
  Using cached certifi-2022.6.15-py3-none-any.whl (160 kB)
Requirement already satisfied, skipping upgrade: importlib-metadata>=1.4 in c:\users\manso\appdata\roaming\python\python38\site-packages (from pyppeteer->notebook-as-pdf) (4.11.4)
Requirement already satisfied, skipping upgrade: tqdm<5.0.0,>=4.42.1 in c:\programdata\anaconda3\lib\site-packages (from pyppeteer->notebook-as-pdf) (4.50.2)
Requirement already satisfied, skipping upgrade: appdirs<2.0.0,>=1.4.3 in c:\programdata\anaconda3\lib\site-packages (from pyppeteer->notebook-as-pdf) (1.4.4)
Requirement already satisfied, skipping upgrade: urllib3<2.0.0,>=1.25.8 in c:\programdata\anaconda3\lib\site-packages (from pyppeteer->notebook-as-pdf) (1.25.11)
Requirement already satisfied, skipping upgrade: nbclient<0.6.0,>=0.5.0 in c:\programdata\anaconda3\lib\site-packages (from nbconvert->notebook-as-pdf) (0.5.1)
Requirement already satisfied, skipping upgrade: pandocfilters>=1.4.1 in c:\programdata\anaconda3\lib\site-packages (from nbconvert->notebook-as-pdf) (1.4.3)
Requirement already satisfied, skipping upgrade: defusedxml in c:\programdata\anaconda3\lib\site-packages (from nbconvert->notebook-as-pdf) (0.6.0)
Requirement already satisfied, skipping upgrade: pygments>=2.4.1 in c:\programdata\anaconda3\lib\site-packages (from nbconvert->notebook-as-pdf) (2.7.2)
Requirement already satisfied, skipping upgrade: jupyter-core in c:\programdata\anaconda3\lib\site-packages (from nbconvert->notebook-as-pdf) (4.6.3)
Requirement already satisfied, skipping upgrade: testpath in c:\programdata\anaconda3\lib\site-packages (from nbconvert->notebook-as-pdf) (0.4.4)
Requirement already satisfied, skipping upgrade: traitlets>=4.2 in c:\programdata\anaconda3\lib\site-packages (from nbconvert->notebook-as-pdf) (5.0.5)
Requirement already satisfied, skipping upgrade: bleach in c:\programdata\anaconda3\lib\site-packages (from nbconvert->notebook-as-pdf) (3.2.1)
Requirement already satisfied, skipping upgrade: mistune<2,>=0.8.1 in c:\programdata\anaconda3\lib\site-packages (from nbconvert->notebook-as-pdf) (0.8.4)
Requirement already satisfied, skipping upgrade: jupyterlab-pygments in c:\programdata\anaconda3\lib\site-packages (from nbconvert->notebook-as-pdf) (0.1.2)
Requirement already satisfied, skipping upgrade: jinja2>=2.4 in c:\programdata\anaconda3\lib\site-packages (from nbconvert->notebook-as-pdf) (2.11.2)
Requirement already satisfied, skipping upgrade: entrypoints>=0.2.2 in c:\programdata\anaconda3\lib\site-packages (from nbconvert->notebook-as-pdf) (0.3)
Requirement already satisfied, skipping upgrade: nbformat>=4.4 in c:\programdata\anaconda3\lib\site-packages (from nbconvert->notebook-as-pdf) (5.0.8)
Requirement already satisfied, skipping upgrade: zipp>=0.5 in c:\programdata\anaconda3\lib\site-packages (from importlib-metadata>=1.4->pyppeteer->notebook-as-pdf) (3.4.0)
Requirement already satisfied, skipping upgrade: async-generator in c:\programdata\anaconda3\lib\site-packages (from nbclient<0.6.0,>=0.5.0->nbconvert->notebook-as-pdf) (1.10)
Requirement already satisfied, skipping upgrade: jupyter-client>=6.1.5 in c:\programdata\anaconda3\lib\site-packages (from nbclient<0.6.0,>=0.5.0->nbconvert->notebook-as-pdf) (6.1.7)
Requirement already satisfied, skipping upgrade: nest-asyncio in c:\programdata\anaconda3\lib\site-packages (from nbclient<0.6.0,>=0.5.0->nbconvert->notebook-as-pdf) (1.4.2)
Requirement already satisfied, skipping upgrade: pywin32>=1.0; sys_platform == "win32" in c:\programdata\anaconda3\lib\site-packages (from jupyter-core->nbconvert->notebook-as-pdf) (227)
Requirement already satisfied, skipping upgrade: ipython-genutils in c:\programdata\anaconda3\lib\site-packages (from traitlets>=4.2->nbconvert->notebook-as-pdf) (0.2.0)
Requirement already satisfied, skipping upgrade: webencodings in c:\programdata\anaconda3\lib\site-packages (from bleach->nbconvert->notebook-as-pdf) (0.5.1)
Requirement already satisfied, skipping upgrade: packaging in c:\programdata\anaconda3\lib\site-packages (from bleach->nbconvert->notebook-as-pdf) (20.4)
Requirement already satisfied, skipping upgrade: six>=1.9.0 in c:\programdata\anaconda3\lib\site-packages (from bleach->nbconvert->notebook-as-pdf) (1.15.0)
Requirement already satisfied, skipping upgrade: MarkupSafe>=0.23 in c:\users\manso\appdata\roaming\python\python38\site-packages (from jinja2>=2.4->nbconvert->notebook-as-pdf) (2.0.1)
Requirement already satisfied, skipping upgrade: jsonschema!=2.5.0,>=2.4 in c:\programdata\anaconda3\lib\site-packages (from nbformat>=4.4->nbconvert->notebook-as-pdf) (3.2.0)
Requirement already satisfied, skipping upgrade: python-dateutil>=2.1 in c:\programdata\anaconda3\lib\site-packages (from jupyter-client>=6.1.5->nbclient<0.6.0,>=0.5.0->nbconvert->notebook-as-pdf) (2.8.1)
Requirement already satisfied, skipping upgrade: tornado>=4.1 in c:\programdata\anaconda3\lib\site-packages (from jupyter-client>=6.1.5->nbclient<0.6.0,>=0.5.0->nbconvert->notebook-as-pdf) (6.0.4)
Requirement already satisfied, skipping upgrade: pyzmq>=13 in c:\programdata\anaconda3\lib\site-packages (from jupyter-client>=6.1.5->nbclient<0.6.0,>=0.5.0->nbconvert->notebook-as-pdf) (19.0.2)
Requirement already satisfied, skipping upgrade: pyparsing>=2.0.2 in c:\programdata\anaconda3\lib\site-packages (from packaging->bleach->nbconvert->notebook-as-pdf) (2.4.7)
Requirement already satisfied, skipping upgrade: pyrsistent>=0.14.0 in c:\programdata\anaconda3\lib\site-packages (from jsonschema!=2.5.0,>=2.4->nbformat>=4.4->nbconvert->notebook-as-pdf) (0.17.3)
Requirement already satisfied, skipping upgrade: setuptools in c:\programdata\anaconda3\lib\site-packages (from jsonschema!=2.5.0,>=2.4->nbformat>=4.4->nbconvert->notebook-as-pdf) (50.3.1.post20201107)
Requirement already satisfied, skipping upgrade: attrs>=17.4.0 in c:\programdata\anaconda3\lib\site-packages (from jsonschema!=2.5.0,>=2.4->nbformat>=4.4->nbconvert->notebook-as-pdf) (20.3.0)
Installing collected packages: certifi, pyppeteer, notebook-as-pdf
  Attempting uninstall: certifi
    Found existing installation: certifi 2020.6.20
    Uninstalling certifi-2020.6.20:

ERROR: Could not install packages due to an EnvironmentError: [WinError 5] Access is denied: 'c:\\programdata\\anaconda3\\lib\\site-packages\\certifi-2020.6.20-py3.6.egg-info\\dependency_links.txt'
Consider using the `--user` option or check the permissions.

Requirement already satisfied: biopython in c:\programdata\anaconda3\lib\site-packages (1.79)
Requirement already satisfied: numpy in c:\programdata\anaconda3\lib\site-packages (from biopython) (1.19.2)


from Bio.SeqRecord import SeqRecord
help(SeqRecord)

Help on class SeqRecord in module Bio.SeqRecord:

class SeqRecord(builtins.object)
 |  SeqRecord(seq, id='<unknown id>', name='<unknown name>', description='<unknown description>', dbxrefs=None, features=None, annotations=None, letter_annotations=None)
 |  
 |  A SeqRecord object holds a sequence and information about it.
 |  
 |  Main attributes:
 |   - id          - Identifier such as a locus tag (string)
 |   - seq         - The sequence itself (Seq object or similar)
 |  
 |  Additional attributes:
 |   - name        - Sequence name, e.g. gene name (string)
 |   - description - Additional text (string)
 |   - dbxrefs     - List of database cross references (list of strings)
 |   - features    - Any (sub)features defined (list of SeqFeature objects)
 |   - annotations - Further information about the whole sequence (dictionary).
 |     Most entries are strings, or lists of strings.
 |   - letter_annotations - Per letter/symbol annotation (restricted
 |     dictionary). This holds Python sequences (lists, strings
 |     or tuples) whose length matches that of the sequence.
 |     A typical use would be to hold a list of integers
 |     representing sequencing quality scores, or a string
 |     representing the secondary structure.
 |  
 |  You will typically use Bio.SeqIO to read in sequences from files as
 |  SeqRecord objects.  However, you may want to create your own SeqRecord
 |  objects directly (see the __init__ method for further details):
 |  
 |  >>> from Bio.Seq import Seq
 |  >>> from Bio.SeqRecord import SeqRecord
 |  >>> record = SeqRecord(Seq("MKQHKAMIVALIVICITAVVAALVTRKDLCEVHIRTGQTEVAVF"),
 |  ...                    id="YP_025292.1", name="HokC",
 |  ...                    description="toxic membrane protein")
 |  >>> print(record)
 |  ID: YP_025292.1
 |  Name: HokC
 |  Description: toxic membrane protein
 |  Number of features: 0
 |  Seq('MKQHKAMIVALIVICITAVVAALVTRKDLCEVHIRTGQTEVAVF')
 |  
 |  If you want to save SeqRecord objects to a sequence file, use Bio.SeqIO
 |  for this.  For the special case where you want the SeqRecord turned into
 |  a string in a particular file format there is a format method which uses
 |  Bio.SeqIO internally:
 |  
 |  >>> print(record.format("fasta"))
 |  >YP_025292.1 toxic membrane protein
 |  MKQHKAMIVALIVICITAVVAALVTRKDLCEVHIRTGQTEVAVF
 |  <BLANKLINE>
 |  
 |  You can also do things like slicing a SeqRecord, checking its length, etc
 |  
 |  >>> len(record)
 |  44
 |  >>> edited = record[:10] + record[11:]
 |  >>> print(edited.seq)
 |  MKQHKAMIVAIVICITAVVAALVTRKDLCEVHIRTGQTEVAVF
 |  >>> print(record.seq)
 |  MKQHKAMIVALIVICITAVVAALVTRKDLCEVHIRTGQTEVAVF
 |  
 |  Methods defined here:
 |  
 |  __add__(self, other)
 |      Add another sequence or string to this sequence.
 |      
 |      The other sequence can be a SeqRecord object, a Seq object (or
 |      similar, e.g. a MutableSeq) or a plain Python string. If you add
 |      a plain string or a Seq (like) object, the new SeqRecord will simply
 |      have this appended to the existing data. However, any per letter
 |      annotation will be lost:
 |      
 |      >>> from Bio import SeqIO
 |      >>> record = SeqIO.read("Quality/solexa_faked.fastq", "fastq-solexa")
 |      >>> print("%s %s" % (record.id, record.seq))
 |      slxa_0001_1_0001_01 ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTNNNNNN
 |      >>> print(list(record.letter_annotations))
 |      ['solexa_quality']
 |      
 |      >>> new = record + "ACT"
 |      >>> print("%s %s" % (new.id, new.seq))
 |      slxa_0001_1_0001_01 ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTNNNNNNACT
 |      >>> print(list(new.letter_annotations))
 |      []
 |      
 |      The new record will attempt to combine the annotation, but for any
 |      ambiguities (e.g. different names) it defaults to omitting that
 |      annotation.
 |      
 |      >>> from Bio import SeqIO
 |      >>> with open("GenBank/pBAD30.gb") as handle:
 |      ...     plasmid = SeqIO.read(handle, "gb")
 |      >>> print("%s %i" % (plasmid.id, len(plasmid)))
 |      pBAD30 4923
 |      
 |      Now let's cut the plasmid into two pieces, and join them back up the
 |      other way round (i.e. shift the starting point on this plasmid, have
 |      a look at the annotated features in the original file to see why this
 |      particular split point might make sense):
 |      
 |      >>> left = plasmid[:3765]
 |      >>> right = plasmid[3765:]
 |      >>> new = right + left
 |      >>> print("%s %i" % (new.id, len(new)))
 |      pBAD30 4923
 |      >>> str(new.seq) == str(right.seq + left.seq)
 |      True
 |      >>> len(new.features) == len(left.features) + len(right.features)
 |      True
 |      
 |      When we add the left and right SeqRecord objects, their annotation
 |      is all consistent, so it is all conserved in the new SeqRecord:
 |      
 |      >>> new.id == left.id == right.id == plasmid.id
 |      True
 |      >>> new.name == left.name == right.name == plasmid.name
 |      True
 |      >>> new.description == plasmid.description
 |      True
 |      >>> new.annotations == left.annotations == right.annotations
 |      True
 |      >>> new.letter_annotations == plasmid.letter_annotations
 |      True
 |      >>> new.dbxrefs == left.dbxrefs == right.dbxrefs
 |      True
 |      
 |      However, we should point out that when we sliced the SeqRecord,
 |      any annotations dictionary or dbxrefs list entries were lost.
 |      You can explicitly copy them like this:
 |      
 |      >>> new.annotations = plasmid.annotations.copy()
 |      >>> new.dbxrefs = plasmid.dbxrefs[:]
 |  
 |  __bool__(self)
 |      Boolean value of an instance of this class (True).
 |      
 |      This behaviour is for backwards compatibility, since until the
 |      __len__ method was added, a SeqRecord always evaluated as True.
 |      
 |      Note that in comparison, a Seq object will evaluate to False if it
 |      has a zero length sequence.
 |      
 |      WARNING: The SeqRecord may in future evaluate to False when its
 |      sequence is of zero length (in order to better match the Seq
 |      object behaviour)!
 |  
 |  __contains__(self, char)
 |      Implement the 'in' keyword, searches the sequence.
 |      
 |      e.g.
 |      
 |      >>> from Bio import SeqIO
 |      >>> record = SeqIO.read("Fasta/sweetpea.nu", "fasta")
 |      >>> "GAATTC" in record
 |      False
 |      >>> "AAA" in record
 |      True
 |      
 |      This essentially acts as a proxy for using "in" on the sequence:
 |      
 |      >>> "GAATTC" in record.seq
 |      False
 |      >>> "AAA" in record.seq
 |      True
 |      
 |      Note that you can also use Seq objects as the query,
 |      
 |      >>> from Bio.Seq import Seq
 |      >>> Seq("AAA") in record
 |      True
 |      
 |      See also the Seq object's __contains__ method.
 |  
 |  __eq__(self, other)
 |      Define the equal-to operand (not implemented).
 |  
 |  __format__(self, format_spec)
 |      Return the record as a string in the specified file format.
 |      
 |      This method supports the Python format() function and f-strings.
 |      The format_spec should be a lower case string supported by
 |      Bio.SeqIO as a text output file format. Requesting a binary file
 |      format raises a ValueError. e.g.
 |      
 |      >>> from Bio.Seq import Seq
 |      >>> from Bio.SeqRecord import SeqRecord
 |      >>> record = SeqRecord(Seq("MKQHKAMIVALIVICITAVVAALVTRKDLCEVHIRTGQTEVAVF"),
 |      ...                    id="YP_025292.1", name="HokC",
 |      ...                    description="toxic membrane protein")
 |      ...
 |      >>> format(record, "fasta")
 |      '>YP_025292.1 toxic membrane protein\nMKQHKAMIVALIVICITAVVAALVTRKDLCEVHIRTGQTEVAVF\n'
 |      >>> print(f"Here is {record.id} in FASTA format:\n{record:fasta}")
 |      Here is YP_025292.1 in FASTA format:
 |      >YP_025292.1 toxic membrane protein
 |      MKQHKAMIVALIVICITAVVAALVTRKDLCEVHIRTGQTEVAVF
 |      <BLANKLINE>
 |      
 |      See also the SeqRecord's format() method.
 |  
 |  __ge__(self, other)
 |      Define the greater-than-or-equal-to operand (not implemented).
 |  
 |  __getitem__(self, index)
 |      Return a sub-sequence or an individual letter.
 |      
 |      Slicing, e.g. my_record[5:10], returns a new SeqRecord for
 |      that sub-sequence with some annotation preserved as follows:
 |      
 |      * The name, id and description are kept as-is.
 |      * Any per-letter-annotations are sliced to match the requested
 |        sub-sequence.
 |      * Unless a stride is used, all those features which fall fully
 |        within the subsequence are included (with their locations
 |        adjusted accordingly). If you want to preserve any truncated
 |        features (e.g. GenBank/EMBL source features), you must
 |        explicitly add them to the new SeqRecord yourself.
 |      * With the exception of any molecule type, the annotations
 |        dictionary and the dbxrefs list are not used for the new
 |        SeqRecord, as in general they may not apply to the
 |        subsequence. If you want to preserve them, you must explicitly
 |        copy them to the new SeqRecord yourself.
 |      
 |      Using an integer index, e.g. my_record[5] is shorthand for
 |      extracting that letter from the sequence, my_record.seq[5].
 |      
 |      For example, consider this short protein and its secondary
 |      structure as encoded by the PDB (e.g. H for alpha helices),
 |      plus a simple feature for its histidine self phosphorylation
 |      site:
 |      
 |      >>> from Bio.Seq import Seq
 |      >>> from Bio.SeqRecord import SeqRecord
 |      >>> from Bio.SeqFeature import SeqFeature, FeatureLocation
 |      >>> rec = SeqRecord(Seq("MAAGVKQLADDRTLLMAGVSHDLRTPLTRIRLAT"
 |      ...                     "EMMSEQDGYLAESINKDIEECNAIIEQFIDYLR"),
 |      ...                 id="1JOY", name="EnvZ",
 |      ...                 description="Homodimeric domain of EnvZ from E. coli")
 |      >>> rec.letter_annotations["secondary_structure"] = "  S  SSSSSSHHHHHTTTHHHHHHHHHHHHHHHHHHHHHHTHHHHHHHHHHHHHHHHHHHHHTT  "
 |      >>> rec.features.append(SeqFeature(FeatureLocation(20, 21),
 |      ...                     type = "Site"))
 |      
 |      Now let's have a quick look at the full record,
 |      
 |      >>> print(rec)
 |      ID: 1JOY
 |      Name: EnvZ
 |      Description: Homodimeric domain of EnvZ from E. coli
 |      Number of features: 1
 |      Per letter annotation for: secondary_structure
 |      Seq('MAAGVKQLADDRTLLMAGVSHDLRTPLTRIRLATEMMSEQDGYLAESINKDIEE...YLR')
 |      >>> rec.letter_annotations["secondary_structure"]
 |      '  S  SSSSSSHHHHHTTTHHHHHHHHHHHHHHHHHHHHHHTHHHHHHHHHHHHHHHHHHHHHTT  '
 |      >>> print(rec.features[0].location)
 |      [20:21]
 |      
 |      Now let's take a sub sequence, here chosen as the first (fractured)
 |      alpha helix which includes the histidine phosphorylation site:
 |      
 |      >>> sub = rec[11:41]
 |      >>> print(sub)
 |      ID: 1JOY
 |      Name: EnvZ
 |      Description: Homodimeric domain of EnvZ from E. coli
 |      Number of features: 1
 |      Per letter annotation for: secondary_structure
 |      Seq('RTLLMAGVSHDLRTPLTRIRLATEMMSEQD')
 |      >>> sub.letter_annotations["secondary_structure"]
 |      'HHHHHTTTHHHHHHHHHHHHHHHHHHHHHH'
 |      >>> print(sub.features[0].location)
 |      [9:10]
 |      
 |      You can also of course omit the start or end values, for
 |      example to get the first ten letters only:
 |      
 |      >>> print(rec[:10])
 |      ID: 1JOY
 |      Name: EnvZ
 |      Description: Homodimeric domain of EnvZ from E. coli
 |      Number of features: 0
 |      Per letter annotation for: secondary_structure
 |      Seq('MAAGVKQLAD')
 |      
 |      Or for the last ten letters:
 |      
 |      >>> print(rec[-10:])
 |      ID: 1JOY
 |      Name: EnvZ
 |      Description: Homodimeric domain of EnvZ from E. coli
 |      Number of features: 0
 |      Per letter annotation for: secondary_structure
 |      Seq('IIEQFIDYLR')
 |      
 |      If you omit both, then you get a copy of the original record (although
 |      lacking the annotations and dbxrefs):
 |      
 |      >>> print(rec[:])
 |      ID: 1JOY
 |      Name: EnvZ
 |      Description: Homodimeric domain of EnvZ from E. coli
 |      Number of features: 1
 |      Per letter annotation for: secondary_structure
 |      Seq('MAAGVKQLADDRTLLMAGVSHDLRTPLTRIRLATEMMSEQDGYLAESINKDIEE...YLR')
 |      
 |      Finally, indexing with a simple integer is shorthand for pulling out
 |      that letter from the sequence directly:
 |      
 |      >>> rec[5]
 |      'K'
 |      >>> rec.seq[5]
 |      'K'
 |  
 |  __gt__(self, other)
 |      Define the greater-than operand (not implemented).
 |  
 |  __init__(self, seq, id='<unknown id>', name='<unknown name>', description='<unknown description>', dbxrefs=None, features=None, annotations=None, letter_annotations=None)
 |      Create a SeqRecord.
 |      
 |      Arguments:
 |       - seq         - Sequence, required (Seq or MutableSeq)
 |       - id          - Sequence identifier, recommended (string)
 |       - name        - Sequence name, optional (string)
 |       - description - Sequence description, optional (string)
 |       - dbxrefs     - Database cross references, optional (list of strings)
 |       - features    - Any (sub)features, optional (list of SeqFeature objects)
 |       - annotations - Dictionary of annotations for the whole sequence
 |       - letter_annotations - Dictionary of per-letter-annotations, values
 |         should be strings, list or tuples of the same length as the full
 |         sequence.
 |      
 |      You will typically use Bio.SeqIO to read in sequences from files as
 |      SeqRecord objects.  However, you may want to create your own SeqRecord
 |      objects directly.
 |      
 |      Note that while an id is optional, we strongly recommend you supply a
 |      unique id string for each record.  This is especially important
 |      if you wish to write your sequences to a file.
 |      
 |      You can create a 'blank' SeqRecord object, and then populate the
 |      attributes later.
 |  
 |  __iter__(self)
 |      Iterate over the letters in the sequence.
 |      
 |      For example, using Bio.SeqIO to read in a protein FASTA file:
 |      
 |      >>> from Bio import SeqIO
 |      >>> record = SeqIO.read("Fasta/loveliesbleeding.pro", "fasta")
 |      >>> for amino in record:
 |      ...     print(amino)
 |      ...     if amino == "L": break
 |      X
 |      A
 |      G
 |      L
 |      >>> print(record.seq[3])
 |      L
 |      
 |      This is just a shortcut for iterating over the sequence directly:
 |      
 |      >>> for amino in record.seq:
 |      ...     print(amino)
 |      ...     if amino == "L": break
 |      X
 |      A
 |      G
 |      L
 |      >>> print(record.seq[3])
 |      L
 |      
 |      Note that this does not facilitate iteration together with any
 |      per-letter-annotation.  However, you can achieve that using the
 |      python zip function on the record (or its sequence) and the relevant
 |      per-letter-annotation:
 |      
 |      >>> from Bio import SeqIO
 |      >>> rec = SeqIO.read("Quality/solexa_faked.fastq", "fastq-solexa")
 |      >>> print("%s %s" % (rec.id, rec.seq))
 |      slxa_0001_1_0001_01 ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTNNNNNN
 |      >>> print(list(rec.letter_annotations))
 |      ['solexa_quality']
 |      >>> for nuc, qual in zip(rec, rec.letter_annotations["solexa_quality"]):
 |      ...     if qual > 35:
 |      ...         print("%s %i" % (nuc, qual))
 |      A 40
 |      C 39
 |      G 38
 |      T 37
 |      A 36
 |      
 |      You may agree that using zip(rec.seq, ...) is more explicit than using
 |      zip(rec, ...) as shown above.
 |  
 |  __le___(self, other)
 |      Define the less-than-or-equal-to operand (not implemented).
 |  
 |  __len__(self)
 |      Return the length of the sequence.
 |      
 |      For example, using Bio.SeqIO to read in a FASTA nucleotide file:
 |      
 |      >>> from Bio import SeqIO
 |      >>> record = SeqIO.read("Fasta/sweetpea.nu", "fasta")
 |      >>> len(record)
 |      309
 |      >>> len(record.seq)
 |      309
 |  
 |  __lt__(self, other)
 |      Define the less-than operand (not implemented).
 |  
 |  __ne__(self, other)
 |      Define the not-equal-to operand (not implemented).
 |  
 |  __radd__(self, other)
 |      Add another sequence or string to this sequence (from the left).
 |      
 |      This method handles adding a Seq object (or similar, e.g. MutableSeq)
 |      or a plain Python string (on the left) to a SeqRecord (on the right).
 |      See the __add__ method for more details, but for example:
 |      
 |      >>> from Bio import SeqIO
 |      >>> record = SeqIO.read("Quality/solexa_faked.fastq", "fastq-solexa")
 |      >>> print("%s %s" % (record.id, record.seq))
 |      slxa_0001_1_0001_01 ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTNNNNNN
 |      >>> print(list(record.letter_annotations))
 |      ['solexa_quality']
 |      
 |      >>> new = "ACT" + record
 |      >>> print("%s %s" % (new.id, new.seq))
 |      slxa_0001_1_0001_01 ACTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTNNNNNN
 |      >>> print(list(new.letter_annotations))
 |      []
 |  
 |  __repr__(self)
 |      Return a concise summary of the record for debugging (string).
 |      
 |      The python built in function repr works by calling the object's ___repr__
 |      method.  e.g.
 |      
 |      >>> from Bio.Seq import Seq
 |      >>> from Bio.SeqRecord import SeqRecord
 |      >>> rec = SeqRecord(Seq("MASRGVNKVILVGNLGQDPEVRYMPNGGAVANITLATSESWRDKAT"
 |      ...                     "GEMKEQTEWHRVVLFGKLAEVASEYLRKGSQVYIEGQLRTRKWTDQ"
 |      ...                     "SGQDRYTTEVVVNVGGTMQMLGGRQGGGAPAGGNIGGGQPQGGWGQ"
 |      ...                     "PQQPQGGNQFSGGAQSRPQQSAPAAPSNEPPMDFDDDIPF"),
 |      ...                 id="NP_418483.1", name="b4059",
 |      ...                 description="ssDNA-binding protein",
 |      ...                 dbxrefs=["ASAP:13298", "GI:16131885", "GeneID:948570"])
 |      >>> print(repr(rec))
 |      SeqRecord(seq=Seq('MASRGVNKVILVGNLGQDPEVRYMPNGGAVANITLATSESWRDKATGEMKEQTE...IPF'), id='NP_418483.1', name='b4059', description='ssDNA-binding protein', dbxrefs=['ASAP:13298', 'GI:16131885', 'GeneID:948570'])
 |      
 |      At the python prompt you can also use this shorthand:
 |      
 |      >>> rec
 |      SeqRecord(seq=Seq('MASRGVNKVILVGNLGQDPEVRYMPNGGAVANITLATSESWRDKATGEMKEQTE...IPF'), id='NP_418483.1', name='b4059', description='ssDNA-binding protein', dbxrefs=['ASAP:13298', 'GI:16131885', 'GeneID:948570'])
 |      
 |      Note that long sequences are shown truncated. Also note that any
 |      annotations, letter_annotations and features are not shown (as they
 |      would lead to a very long string).
 |  
 |  __str__(self)
 |      Return a human readable summary of the record and its annotation (string).
 |      
 |      The python built in function str works by calling the object's ___str__
 |      method.  e.g.
 |      
 |      >>> from Bio.Seq import Seq
 |      >>> from Bio.SeqRecord import SeqRecord
 |      >>> record = SeqRecord(Seq("MKQHKAMIVALIVICITAVVAALVTRKDLCEVHIRTGQTEVAVF"),
 |      ...                    id="YP_025292.1", name="HokC",
 |      ...                    description="toxic membrane protein, small")
 |      >>> print(str(record))
 |      ID: YP_025292.1
 |      Name: HokC
 |      Description: toxic membrane protein, small
 |      Number of features: 0
 |      Seq('MKQHKAMIVALIVICITAVVAALVTRKDLCEVHIRTGQTEVAVF')
 |      
 |      In this example you don't actually need to call str explicity, as the
 |      print command does this automatically:
 |      
 |      >>> print(record)
 |      ID: YP_025292.1
 |      Name: HokC
 |      Description: toxic membrane protein, small
 |      Number of features: 0
 |      Seq('MKQHKAMIVALIVICITAVVAALVTRKDLCEVHIRTGQTEVAVF')
 |      
 |      Note that long sequences are shown truncated.
 |  
 |  format(self, format)
 |      Return the record as a string in the specified file format.
 |      
 |      The format should be a lower case string supported as an output
 |      format by Bio.SeqIO, which is used to turn the SeqRecord into a
 |      string.  e.g.
 |      
 |      >>> from Bio.Seq import Seq
 |      >>> from Bio.SeqRecord import SeqRecord
 |      >>> record = SeqRecord(Seq("MKQHKAMIVALIVICITAVVAALVTRKDLCEVHIRTGQTEVAVF"),
 |      ...                    id="YP_025292.1", name="HokC",
 |      ...                    description="toxic membrane protein")
 |      >>> record.format("fasta")
 |      '>YP_025292.1 toxic membrane protein\nMKQHKAMIVALIVICITAVVAALVTRKDLCEVHIRTGQTEVAVF\n'
 |      >>> print(record.format("fasta"))
 |      >YP_025292.1 toxic membrane protein
 |      MKQHKAMIVALIVICITAVVAALVTRKDLCEVHIRTGQTEVAVF
 |      <BLANKLINE>
 |      
 |      The Python print function automatically appends a new line, meaning
 |      in this example a blank line is shown.  If you look at the string
 |      representation you can see there is a trailing new line (shown as
 |      slash n) which is important when writing to a file or if
 |      concatenating multiple sequence strings together.
 |      
 |      Note that this method will NOT work on every possible file format
 |      supported by Bio.SeqIO (e.g. some are for multiple sequences only,
 |      and binary formats are not supported).
 |  
 |  lower(self)
 |      Return a copy of the record with a lower case sequence.
 |      
 |      All the annotation is preserved unchanged. e.g.
 |      
 |      >>> from Bio import SeqIO
 |      >>> record = SeqIO.read("Fasta/aster.pro", "fasta")
 |      >>> print(record.format("fasta"))
 |      >gi|3298468|dbj|BAA31520.1| SAMIPF
 |      GGHVNPAVTFGAFVGGNITLLRGIVYIIAQLLGSTVACLLLKFVTNDMAVGVFSLSAGVG
 |      VTNALVFEIVMTFGLVYTVYATAIDPKKGSLGTIAPIAIGFIVGANI
 |      <BLANKLINE>
 |      >>> print(record.lower().format("fasta"))
 |      >gi|3298468|dbj|BAA31520.1| SAMIPF
 |      gghvnpavtfgafvggnitllrgivyiiaqllgstvaclllkfvtndmavgvfslsagvg
 |      vtnalvfeivmtfglvytvyataidpkkgslgtiapiaigfivgani
 |      <BLANKLINE>
 |      
 |      To take a more annotation rich example,
 |      
 |      >>> from Bio import SeqIO
 |      >>> old = SeqIO.read("EMBL/TRBG361.embl", "embl")
 |      >>> len(old.features)
 |      3
 |      >>> new = old.lower()
 |      >>> len(old.features) == len(new.features)
 |      True
 |      >>> old.annotations["organism"] == new.annotations["organism"]
 |      True
 |      >>> old.dbxrefs == new.dbxrefs
 |      True
 |  
 |  reverse_complement(self, id=False, name=False, description=False, features=True, annotations=False, letter_annotations=True, dbxrefs=False)
 |      Return new SeqRecord with reverse complement sequence.
 |      
 |      By default the new record does NOT preserve the sequence identifier,
 |      name, description, general annotation or database cross-references -
 |      these are unlikely to apply to the reversed sequence.
 |      
 |      You can specify the returned record's id, name and description as
 |      strings, or True to keep that of the parent, or False for a default.
 |      
 |      You can specify the returned record's features with a list of
 |      SeqFeature objects, or True to keep that of the parent, or False to
 |      omit them. The default is to keep the original features (with the
 |      strand and locations adjusted).
 |      
 |      You can also specify both the returned record's annotations and
 |      letter_annotations as dictionaries, True to keep that of the parent,
 |      or False to omit them. The default is to keep the original
 |      annotations (with the letter annotations reversed).
 |      
 |      To show what happens to the pre-letter annotations, consider an
 |      example Solexa variant FASTQ file with a single entry, which we'll
 |      read in as a SeqRecord:
 |      
 |      >>> from Bio import SeqIO
 |      >>> record = SeqIO.read("Quality/solexa_faked.fastq", "fastq-solexa")
 |      >>> print("%s %s" % (record.id, record.seq))
 |      slxa_0001_1_0001_01 ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTNNNNNN
 |      >>> print(list(record.letter_annotations))
 |      ['solexa_quality']
 |      >>> print(record.letter_annotations["solexa_quality"])
 |      [40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, -1, -2, -3, -4, -5]
 |      
 |      Now take the reverse complement, here we explicitly give a new
 |      identifier (the old identifier with a suffix):
 |      
 |      >>> rc_record = record.reverse_complement(id=record.id + "_rc")
 |      >>> print("%s %s" % (rc_record.id, rc_record.seq))
 |      slxa_0001_1_0001_01_rc NNNNNNACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT
 |      
 |      Notice that the per-letter-annotations have also been reversed,
 |      although this may not be appropriate for all cases.
 |      
 |      >>> print(rc_record.letter_annotations["solexa_quality"])
 |      [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40]
 |      
 |      Now for the features, we need a different example. Parsing a GenBank
 |      file is probably the easiest way to get an nice example with features
 |      in it...
 |      
 |      >>> from Bio import SeqIO
 |      >>> with open("GenBank/pBAD30.gb") as handle:
 |      ...     plasmid = SeqIO.read(handle, "gb")
 |      >>> print("%s %i" % (plasmid.id, len(plasmid)))
 |      pBAD30 4923
 |      >>> plasmid.seq
 |      Seq('GCTAGCGGAGTGTATACTGGCTTACTATGTTGGCACTGATGAGGGTGTCAGTGA...ATG')
 |      >>> len(plasmid.features)
 |      13
 |      
 |      Now, let's take the reverse complement of this whole plasmid:
 |      
 |      >>> rc_plasmid = plasmid.reverse_complement(id=plasmid.id+"_rc")
 |      >>> print("%s %i" % (rc_plasmid.id, len(rc_plasmid)))
 |      pBAD30_rc 4923
 |      >>> rc_plasmid.seq
 |      Seq('CATGGGCAAATATTATACGCAAGGCGACAAGGTGCTGATGCCGCTGGCGATTCA...AGC')
 |      >>> len(rc_plasmid.features)
 |      13
 |      
 |      Let's compare the first CDS feature - it has gone from being the
 |      second feature (index 1) to the second last feature (index -2), its
 |      strand has changed, and the location switched round.
 |      
 |      >>> print(plasmid.features[1])
 |      type: CDS
 |      location: [1081:1960](-)
 |      qualifiers:
 |          Key: label, Value: ['araC']
 |          Key: note, Value: ['araC regulator of the arabinose BAD promoter']
 |          Key: vntifkey, Value: ['4']
 |      <BLANKLINE>
 |      >>> print(rc_plasmid.features[-2])
 |      type: CDS
 |      location: [2963:3842](+)
 |      qualifiers:
 |          Key: label, Value: ['araC']
 |          Key: note, Value: ['araC regulator of the arabinose BAD promoter']
 |          Key: vntifkey, Value: ['4']
 |      <BLANKLINE>
 |      
 |      You can check this new location, based on the length of the plasmid:
 |      
 |      >>> len(plasmid) - 1081
 |      3842
 |      >>> len(plasmid) - 1960
 |      2963
 |      
 |      Note that if the SeqFeature annotation includes any strand specific
 |      information (e.g. base changes for a SNP), this information is not
 |      amended, and would need correction after the reverse complement.
 |      
 |      Note trying to reverse complement a protein SeqRecord raises an
 |      exception:
 |      
 |      >>> from Bio.Seq import Seq
 |      >>> from Bio.SeqRecord import SeqRecord
 |      >>> protein_rec = SeqRecord(Seq("MAIVMGR"), id="Test",
 |      ...                         annotations={"molecule_type": "protein"})
 |      >>> protein_rec.reverse_complement()
 |      Traceback (most recent call last):
 |         ...
 |      ValueError: Proteins do not have complements!
 |      
 |      If you have RNA without any U bases, it must be annotated as RNA
 |      otherwise it will be treated as DNA by default with A mapped to T:
 |      
 |      >>> from Bio.Seq import Seq
 |      >>> from Bio.SeqRecord import SeqRecord
 |      >>> rna1 = SeqRecord(Seq("ACG"), id="Test")
 |      >>> rna2 = SeqRecord(Seq("ACG"), id="Test", annotations={"molecule_type": "RNA"})
 |      >>> print(rna1.reverse_complement(id="RC", description="unk").format("fasta"))
 |      >RC unk
 |      CGT
 |      <BLANKLINE>
 |      >>> print(rna2.reverse_complement(id="RC", description="RNA").format("fasta"))
 |      >RC RNA
 |      CGU
 |      <BLANKLINE>
 |      
 |      Also note you can reverse complement a SeqRecord using a MutableSeq:
 |      
 |      >>> from Bio.Seq import MutableSeq
 |      >>> from Bio.SeqRecord import SeqRecord
 |      >>> rec = SeqRecord(MutableSeq("ACGT"), id="Test")
 |      >>> rec.seq[0] = "T"
 |      >>> print("%s %s" % (rec.id, rec.seq))
 |      Test TCGT
 |      >>> rc = rec.reverse_complement(id=True)
 |      >>> print("%s %s" % (rc.id, rc.seq))
 |      Test ACGA
 |  
 |  translate(self, table='Standard', stop_symbol='*', to_stop=False, cds=False, gap=None, id=False, name=False, description=False, features=False, annotations=False, letter_annotations=False, dbxrefs=False)
 |      Return new SeqRecord with translated sequence.
 |      
 |      This calls the record's .seq.translate() method (which describes
 |      the translation related arguments, like table for the genetic code),
 |      
 |      By default the new record does NOT preserve the sequence identifier,
 |      name, description, general annotation or database cross-references -
 |      these are unlikely to apply to the translated sequence.
 |      
 |      You can specify the returned record's id, name and description as
 |      strings, or True to keep that of the parent, or False for a default.
 |      
 |      You can specify the returned record's features with a list of
 |      SeqFeature objects, or False (default) to omit them.
 |      
 |      You can also specify both the returned record's annotations and
 |      letter_annotations as dictionaries, True to keep that of the parent
 |      (annotations only), or False (default) to omit them.
 |      
 |      e.g. Loading a FASTA gene and translating it,
 |      
 |      >>> from Bio import SeqIO
 |      >>> gene_record = SeqIO.read("Fasta/sweetpea.nu", "fasta")
 |      >>> print(gene_record.format("fasta"))
 |      >gi|3176602|gb|U78617.1|LOU78617 Lathyrus odoratus phytochrome A (PHYA) gene, partial cds
 |      CAGGCTGCGCGGTTTCTATTTATGAAGAACAAGGTCCGTATGATAGTTGATTGTCATGCA
 |      AAACATGTGAAGGTTCTTCAAGACGAAAAACTCCCATTTGATTTGACTCTGTGCGGTTCG
 |      ACCTTAAGAGCTCCACATAGTTGCCATTTGCAGTACATGGCTAACATGGATTCAATTGCT
 |      TCATTGGTTATGGCAGTGGTCGTCAATGACAGCGATGAAGATGGAGATAGCCGTGACGCA
 |      GTTCTACCACAAAAGAAAAAGAGACTTTGGGGTTTGGTAGTTTGTCATAACACTACTCCG
 |      AGGTTTGTT
 |      <BLANKLINE>
 |      
 |      And now translating the record, specifying the new ID and description:
 |      
 |      >>> protein_record = gene_record.translate(table=11,
 |      ...                                        id="phya",
 |      ...                                        description="translation")
 |      >>> print(protein_record.format("fasta"))
 |      >phya translation
 |      QAARFLFMKNKVRMIVDCHAKHVKVLQDEKLPFDLTLCGSTLRAPHSCHLQYMANMDSIA
 |      SLVMAVVVNDSDEDGDSRDAVLPQKKKRLWGLVVCHNTTPRFV
 |      <BLANKLINE>
 |  
 |  upper(self)
 |      Return a copy of the record with an upper case sequence.
 |      
 |      All the annotation is preserved unchanged. e.g.
 |      
 |      >>> from Bio.Seq import Seq
 |      >>> from Bio.SeqRecord import SeqRecord
 |      >>> record = SeqRecord(Seq("acgtACGT"), id="Test",
 |      ...                    description = "Made up for this example")
 |      >>> record.letter_annotations["phred_quality"] = [1, 2, 3, 4, 5, 6, 7, 8]
 |      >>> print(record.upper().format("fastq"))
 |      @Test Made up for this example
 |      ACGTACGT
 |      +
 |      "#$%&'()
 |      <BLANKLINE>
 |      
 |      Naturally, there is a matching lower method:
 |      
 |      >>> print(record.lower().format("fastq"))
 |      @Test Made up for this example
 |      acgtacgt
 |      +
 |      "#$%&'()
 |      <BLANKLINE>
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)
 |  
 |  letter_annotations
 |      Dictionary of per-letter-annotation for the sequence.
 |      
 |      For example, this can hold quality scores used in FASTQ or QUAL files.
 |      Consider this example using Bio.SeqIO to read in an example Solexa
 |      variant FASTQ file as a SeqRecord:
 |      
 |      >>> from Bio import SeqIO
 |      >>> record = SeqIO.read("Quality/solexa_faked.fastq", "fastq-solexa")
 |      >>> print("%s %s" % (record.id, record.seq))
 |      slxa_0001_1_0001_01 ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTNNNNNN
 |      >>> print(list(record.letter_annotations))
 |      ['solexa_quality']
 |      >>> print(record.letter_annotations["solexa_quality"])
 |      [40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, -1, -2, -3, -4, -5]
 |      
 |      The letter_annotations get sliced automatically if you slice the
 |      parent SeqRecord, for example taking the last ten bases:
 |      
 |      >>> sub_record = record[-10:]
 |      >>> print("%s %s" % (sub_record.id, sub_record.seq))
 |      slxa_0001_1_0001_01 ACGTNNNNNN
 |      >>> print(sub_record.letter_annotations["solexa_quality"])
 |      [4, 3, 2, 1, 0, -1, -2, -3, -4, -5]
 |      
 |      Any python sequence (i.e. list, tuple or string) can be recorded in
 |      the SeqRecord's letter_annotations dictionary as long as the length
 |      matches that of the SeqRecord's sequence.  e.g.
 |      
 |      >>> len(sub_record.letter_annotations)
 |      1
 |      >>> sub_record.letter_annotations["dummy"] = "abcdefghij"
 |      >>> len(sub_record.letter_annotations)
 |      2
 |      
 |      You can delete entries from the letter_annotations dictionary as usual:
 |      
 |      >>> del sub_record.letter_annotations["solexa_quality"]
 |      >>> sub_record.letter_annotations
 |      {'dummy': 'abcdefghij'}
 |      
 |      You can completely clear the dictionary easily as follows:
 |      
 |      >>> sub_record.letter_annotations = {}
 |      >>> sub_record.letter_annotations
 |      {}
 |      
 |      Note that if replacing the record's sequence with a sequence of a
 |      different length you must first clear the letter_annotations dict.
 |  
 |  seq
 |      The sequence itself, as a Seq or MutableSeq object.
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |  
 |  __hash__ = None


from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
simple_seq = Seq("GATCAGGATTAGGCC")
simple_seq_r = SeqRecord(simple_seq)


simple_seq_r.id

'<unknown id>'


simple_seq_r.id = "AC12345"


simple_seq_r.description = "not a real sequence"
print(simple_seq_r.description)

not a real sequence


simple_seq_r.seq

Seq('GATCAGGATTAGGCC')


from Bio.Seq import Seq
simple_seq = Seq("GATC")
from Bio.SeqRecord import SeqRecord
simple_seq_r = SeqRecord(simple_seq, id="AC12345")


simple_seq_r.annotations["evidence"] = "None."


print(simple_seq_r.annotations)

{'evidence': 'None.'}


simple_seq_r.letter_annotations["phred_quality"] = [40, 38, 30, 30]


print(simple_seq_r.letter_annotations)

{'phred_quality': [40, 38, 30, 30]}


print(simple_seq_r.letter_annotations["phred_quality"])

[40, 38, 30, 30]


translated_seq = simple_seq_r.seq.translate()
print(translated_seq)

D

C:\ProgramData\Anaconda3\lib\site-packages\Bio\Seq.py:2979: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
  warnings.warn(


#https://raw.githubusercontent.com/biopython/biopython/master/Tests/GenBank/NC_005816.fna
from Bio import SeqIO
recordN = SeqIO.read("NC_005816.fna", "fasta")
recordN

SeqRecord(seq=Seq('TGTAACGAACGGTGCAATAGTGATCCACACCCAACGCCTGAAATCAGATCCAGG...CTG'), id='gi|45478711|ref|NC_005816.1|', name='gi|45478711|ref|NC_005816.1|', description='gi|45478711|ref|NC_005816.1| Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete sequence', dbxrefs=[])


#https://www.ncbi.nlm.nih.gov/nuccore/V01325.1?report=fasta
from Bio import SeqIO
record = SeqIO.read("myFasta.fasta", "fasta")
record

SeqRecord(seq=Seq('AATTCCGTGAATCATCGAATCTTTGAACGCACATTGCGCCCCTTGGTATTCCAG...TAT'), id='V01325.1', name='V01325.1', description='V01325.1 Yeast (S. carlsbergensis) genes for 5.8S rRNA and 26S rRNA with intergenic region', dbxrefs=[])


record.seq

Seq('AATTCCGTGAATCATCGAATCTTTGAACGCACATTGCGCCCCTTGGTATTCCAG...TAT')


recordN.seq

Seq('TGTAACGAACGGTGCAATAGTGATCCACACCCAACGCCTGAAATCAGATCCAGG...CTG')


recordN.id

'gi|45478711|ref|NC_005816.1|'


record.name

'V01325.1'


record.description

'V01325.1 Yeast (S. carlsbergensis) genes for 5.8S rRNA and 26S rRNA with intergenic region'


recordN.dbxrefs

[]


from Bio import SeqIO
record = SeqIO.read("NC_005816.gb", "genbank")
record

SeqRecord(seq=Seq('TGTAACGAACGGTGCAATAGTGATCCACACCCAACGCCTGAAATCAGATCCAGG...CTG'), id='NC_005816.1', name='NC_005816', description='Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete sequence', dbxrefs=['Project:58037'])


record.seq

Seq('TGTAACGAACGGTGCAATAGTGATCCACACCCAACGCCTGAAATCAGATCCAGG...CTG')


record.id

'NC_005816.1'


record.name

'NC_005816'


record.description

'Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete sequence'


record.letter_annotations

{}


len(record.annotations)

13


record.annotations["source"]

'Yersinia pestis biovar Microtus str. 91001'


record.dbxrefs

['Project:58037']


len(record.features)

41


from Bio import SeqFeature
start_pos = SeqFeature.AfterPosition(5)
end_pos = SeqFeature.BetweenPosition(9, left=8, right=9)
my_location = SeqFeature.FeatureLocation(start_pos, end_pos)


print(my_location)

[>5:(8^9)]


my_location.start

AfterPosition(5)


print(my_location.start)

>5


my_location.end

BetweenPosition(9, left=8, right=9)


print(my_location.end)

(8^9)


int(my_location.start)

5


int(my_location.end)

9


my_location.nofuzzy_start

5


my_location.nofuzzy_end

9


exact_location = SeqFeature.FeatureLocation(5, 9)
print(exact_location)

[5:9]


exact_location.start

ExactPosition(5)


int(exact_location.start)

5


exact_location.nofuzzy_start

5


from Bio import SeqIO
my_snp = 4350
record = SeqIO.read("NC_005816.gb", "genbank")
for feature in record.features:
   if my_snp in feature:
     print("%s %s" % (feature.type, feature.qualifiers.get("db_xref")))

source ['taxon:229193']
gene ['GeneID:2767712']
CDS ['GI:45478716', 'GeneID:2767712']


from Bio.Seq import Seq
from Bio.SeqFeature import SeqFeature, FeatureLocation
seq = Seq("ACCGAGACGGCAAAGGCTAGCATAGGTATGAGACTTCCTTCCTGCCAGTGCTGAGGAACTGGGAGCCTAC")
feature = SeqFeature(FeatureLocation(5, 18), type="gene", strand=-1)


feature_seq = seq[feature.location.start:feature.location.end].reverse_complement()
print(feature_seq)

AGCCTTTGCCGTC


feature_seq = feature.extract(seq)
print(feature_seq)

AGCCTTTGCCGTC


print(len(feature_seq))

13


print(len(feature))

13


print(len(feature.location))

13


from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
record1 = SeqRecord(Seq("ACGT"), id="test")
record2 = SeqRecord(Seq("ACGT"), id="test")


record1 = record2


record1.id == record2.id

True


record1.seq == record2.seq

True


from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
record = SeqRecord(
    Seq(
         "MMYQQGCFAGGTVLRLAKDLAENNRGARVLVVCSEITAVTFRGPSETHLDSMVGQALFGD"
         "GAGAVIVGSDPDLSVERPLYELVWTGATLLPDSEGAIDGHLREVGLTFHLLKDVPGLISK"
         "NIEKSLKEAFTPLGISDWNSTFWIAHPGGPAILDQVEAKLGLKEEKMRATREVLSEYGNM"
         "SSAC"
         ),
         id="gi|14150838|gb|AAK54648.1|AF376133_1",
         description="chalcone synthase [Cucumis sativus]",
         )
print(record.format("fasta"))

>gi|14150838|gb|AAK54648.1|AF376133_1 chalcone synthase [Cucumis sativus]
MMYQQGCFAGGTVLRLAKDLAENNRGARVLVVCSEITAVTFRGPSETHLDSMVGQALFGD
GAGAVIVGSDPDLSVERPLYELVWTGATLLPDSEGAIDGHLREVGLTFHLLKDVPGLISK
NIEKSLKEAFTPLGISDWNSTFWIAHPGGPAILDQVEAKLGLKEEKMRATREVLSEYGNM
SSAC


from Bio import SeqIO
record = SeqIO.read("NC_005816.gb", "genbank")
record

SeqRecord(seq=Seq('TGTAACGAACGGTGCAATAGTGATCCACACCCAACGCCTGAAATCAGATCCAGG...CTG'), id='NC_005816.1', name='NC_005816', description='Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete sequence', dbxrefs=['Project:58037'])


len(record)

9609


len(record.features)

41


print(record.features[20])

type: gene
location: [4342:4780](+)
qualifiers:
    Key: db_xref, Value: ['GeneID:2767712']
    Key: gene, Value: ['pim']
    Key: locus_tag, Value: ['YP_pPCP05']


print(record.features[21])
# coding DNA sequence - CDS

type: CDS
location: [4342:4780](+)
qualifiers:
    Key: codon_start, Value: ['1']
    Key: db_xref, Value: ['GI:45478716', 'GeneID:2767712']
    Key: gene, Value: ['pim']
    Key: locus_tag, Value: ['YP_pPCP05']
    Key: note, Value: ['similar to many previously sequenced pesticin immunity protein entries of Yersinia pestis plasmid pPCP, e.g. gi| 16082683|,ref|NP_395230.1| (NC_003132) , gi|1200166|emb|CAA90861.1| (Z54145 ) , gi|1488655| emb|CAA63439.1| (X92856) , gi|2996219|gb|AAC62543.1| (AF053945) , and gi|5763814|emb|CAB531 67.1| (AL109969)']
    Key: product, Value: ['pesticin immunity protein']
    Key: protein_id, Value: ['NP_995571.1']
    Key: transl_table, Value: ['11']
    Key: translation, Value: ['MGGGMISKLFCLALIFLSSSGLAEKNTYTAKDILQNLELNTFGNSLSHGIYGKQTTFKQTEFTNIKSNTKKHIALINKDNSWMISLKILGIKRDEYTVCFEDFSLIRPPTYVAIHPLLIKKVKSGNFIVVKEIKKSIPGCTVYYH']


sub_record = record[4300:4800]
sub_record

SeqRecord(seq=Seq('ATAAATAGATTATTCCAAATAATTTATTTATGTAAGAACAGGATGGGAGGGGGA...TTA'), id='NC_005816.1', name='NC_005816', description='Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete sequence', dbxrefs=[])


len(sub_record)

500


len(sub_record.features)

2


print(sub_record.features[0])

type: gene
location: [42:480](+)
qualifiers:
    Key: db_xref, Value: ['GeneID:2767712']
    Key: gene, Value: ['pim']
    Key: locus_tag, Value: ['YP_pPCP05']


print(sub_record.features[1])

type: CDS
location: [42:480](+)
qualifiers:
    Key: codon_start, Value: ['1']
    Key: db_xref, Value: ['GI:45478716', 'GeneID:2767712']
    Key: gene, Value: ['pim']
    Key: locus_tag, Value: ['YP_pPCP05']
    Key: note, Value: ['similar to many previously sequenced pesticin immunity protein entries of Yersinia pestis plasmid pPCP, e.g. gi| 16082683|,ref|NP_395230.1| (NC_003132) , gi|1200166|emb|CAA90861.1| (Z54145 ) , gi|1488655| emb|CAA63439.1| (X92856) , gi|2996219|gb|AAC62543.1| (AF053945) , and gi|5763814|emb|CAB531 67.1| (AL109969)']
    Key: product, Value: ['pesticin immunity protein']
    Key: protein_id, Value: ['NP_995571.1']
    Key: transl_table, Value: ['11']
    Key: translation, Value: ['MGGGMISKLFCLALIFLSSSGLAEKNTYTAKDILQNLELNTFGNSLSHGIYGKQTTFKQTEFTNIKSNTKKHIALINKDNSWMISLKILGIKRDEYTVCFEDFSLIRPPTYVAIHPLLIKKVKSGNFIVVKEIKKSIPGCTVYYH']


sub_record.annotations

{'molecule_type': 'DNA'}


sub_record.dbxrefs

[]


sub_record.annotations['topology'] = 'linear'


sub_record.id

'NC_005816.1'


sub_record.name

'NC_005816'


sub_record.description

'Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete sequence'


sub_record.description = "Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, partial"


print(sub_record.format("genbank")[:200] + "...")

LOCUS       NC_005816                500 bp    DNA     linear   UNK 01-JAN-1980
DEFINITION  Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, partial.
ACCESSION   NC_005816
VERSION     NC_0058...


from Bio import SeqIO
record = next(SeqIO.parse("GSM461182_untreat_single_subset.fastq", "fastq"))
len(record)

75


print(record.seq)

CGAGCATTGCAATCAGATGGACGANCNNNNNNNNNNNNNNNNNNNNNNTNTNTTTGGCCTGGTTNNNNNNNANNN


print(record.letter_annotations["phred_quality"])

[31, 20, 32, 29, 33, 33, 34, 34, 34, 34, 33, 34, 33, 33, 33, 32, 29, 29, 2, 2, 2, 2, 2, 2, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0]


left = record[:26]


print(left.seq)

CGAGCATTGCAATCAGATGGACGANC


print(left.letter_annotations["phred_quality"])

[31, 20, 32, 29, 33, 33, 34, 34, 34, 34, 33, 34, 33, 33, 33, 32, 29, 29, 2, 2, 2, 2, 2, 2, 0, 2]


right = record[21:]


print(right.seq)

CGANCNNNNNNNNNNNNNNNNNNNNNNTNTNTTTGGCCTGGTTNNNNNNNANNN


print(right.letter_annotations["phred_quality"])

[2, 2, 2, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0]


edited = left + right


len(edited)

80


print(edited.seq)

CGAGCATTGCAATCAGATGGACGANCCGANCNNNNNNNNNNNNNNNNNNNNNNTNTNTTTGGCCTGGTTNNNNNNNANNN


print(edited.letter_annotations["phred_quality"])

[31, 20, 32, 29, 33, 33, 34, 34, 34, 34, 33, 34, 33, 33, 33, 32, 29, 29, 2, 2, 2, 2, 2, 2, 0, 2, 2, 2, 2, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0]


edited = record[:20] + record[21:]


from Bio import SeqIO
record = SeqIO.read("NC_005816.gb", "genbank")
record

SeqRecord(seq=Seq('TGTAACGAACGGTGCAATAGTGATCCACACCCAACGCCTGAAATCAGATCCAGG...CTG'), id='NC_005816.1', name='NC_005816', description='Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete sequence', dbxrefs=['Project:58037'])


len(record)

9609


len(record.features)

41


record.dbxrefs

['Project:58037']


record.annotations.keys()

dict_keys(['molecule_type', 'topology', 'data_file_division', 'date', 'accessions', 'sequence_version', 'gi', 'keywords', 'source', 'organism', 'taxonomy', 'references', 'comment'])


shifted = record[2000:] + record[:2000]
shifted

SeqRecord(seq=Seq('GATACGCAGTCATATTTTTTACACAATTCTCTAATCCCGACAAGGTCGTAGGTC...GGA'), id='NC_005816.1', name='NC_005816', description='Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete sequence', dbxrefs=[])


len(shifted)

9609


len(shifted.features)

40


shifted.dbxrefs

[]


shifted.annotations.keys()

dict_keys(['molecule_type'])


shifted.dbxrefs = record.dbxrefs[:]
shifted.annotations = record.annotations.copy()
shifted.dbxrefs

['Project:58037']


shifted.annotations.keys()

dict_keys(['molecule_type', 'topology', 'data_file_division', 'date', 'accessions', 'sequence_version', 'gi', 'keywords', 'source', 'organism', 'taxonomy', 'references', 'comment'])


from Bio import SeqIO
record = SeqIO.read("NC_005816.gb", "genbank")
print("%s %i %i %i %i" % (record.id, len(record), len(record.features), len(record.dbxrefs), len(record.annotations)))

NC_005816.1 9609 41 1 13


rc = record.reverse_complement(id="TESTING")
print("%s %i %i %i %i" % (rc.id, len(rc), len(rc.features), len(rc.dbxrefs), len(rc.annotations)))

TESTING 9609 41 0 0

Biopython Sequence Annotation

Beatriz Manso

1. Set Working directory and install required libraries¶

2. The SeqRecord object¶

Creating a SeqRecord:¶

Pass the id, name and description to the initialization function, but if not they will be set as strings indicating they are unknown, and can be modified subsequently:¶

Pass the ID - AC12345:¶

Pass the description:¶

Get the sequence:¶

Including an identifier is very important if you want to output your SeqRecord to a file. You would normally include this when creating the object:¶

3. Annotating sequences¶

Print the annotation:¶

Print the annotation:¶

Translate the sequence:¶

4. SeqRecord objects from FASTA files¶

Identifiers and description:¶

The /db_xref qualifier serves as a vehicle for linking DNA sequence records to other external databases:¶

5. SeqRecord objects from GenBank files¶

Get the sequence:¶

GenBank files don’t have any per-letter annotations:¶

dbxrefs list gets populated from any PROJECT or DBLINK lines:¶

6. Feature, location and position objects¶

SeqFeature objects¶

Positions and locations¶

Fuzzy Positions¶

Create a location with fuzzy end points:¶

Print out a FeatureLocation object¶

Access the fuzzy start and end positions using the start and end attributes of the location:¶

If you just want numbers, they are actually subclasses of integers so should work like integers:¶

You can just pass in numbers to the FeaturePosition constructors, and you’ll get back out ExactPosition objects:¶

Location testing¶

Sequence described by a feature or location¶

SeqFeature object has an extract method to take care of all this (and since Biopython 1.78 can handle trans-splicing by supplying a dictionary of referenced sequences):¶

Comparison¶

The format() method¶

Slicing a SeqRecord¶

Slice this parent record from 4300 to 4800 (enough to include the pim gene/CDS), and see how many features we get:¶

Adding SeqRecord objects¶

Example with features, we’ll use a GenBank file. Suppose you have a circular genome:¶

Reverse-complementing SeqRecord objects¶