Thursday, March 19, 2015

bpy10. Reading a 1-record GenBank file using Biopython

Search for 'GenBank: BC135714.1' at NCBI.


The results page groups hits into sections such as Literature, Health, and Genomes. In the Genomes section, select Nucleotide. If there is only one matching nucleotide record, NCBI goes straight to the GenBank record with that locus ID.


In the upper right, click the down arrow next to Send. Make sure Complete Record is selected, then choose File as the destination. Download options then appear; download the record as a GenBank file.


Rename the file to BC135714.1.gb and save it in the working directory or in a subfolder of it, such as data. The program in this post reads it from data/BC135714.1.gb.
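
The manual download can also be scripted. The following is only a minimal sketch, assuming Python 3, network access, and Biopython's Bio.Entrez module; the helper names genbank_path and fetch_genbank and the example e-mail address are made up for illustration and are not part of the original program.

```python
from pathlib import Path


def genbank_path(accession, folder="data"):
    """Return the local file path used in this post, e.g. data/BC135714.1.gb."""
    return Path(folder) / (accession + ".gb")


def fetch_genbank(accession, email, folder="data"):
    """Download one GenBank record from NCBI and save it under `folder`.

    Hypothetical helper: it needs network access and Biopython installed.
    """
    # Imported here so that genbank_path() stays usable without Biopython.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address with each request
    target = genbank_path(accession, folder)
    target.parent.mkdir(parents=True, exist_ok=True)

    # rettype="gb" with retmode="text" requests the plain-text GenBank record.
    handle = Entrez.efetch(db="nucleotide", id=accession,
                           rettype="gb", retmode="text")
    try:
        target.write_text(handle.read())
    finally:
        handle.close()
    return target
```

Calling fetch_genbank("BC135714.1", "you@example.com") should leave the record at data/BC135714.1.gb, the same place the manual steps put it.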


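In this program, the function Bio.SeqIO.read parses the GenBank text file into a single SeqRecord. We print some of the record's information, followed by the protein translated from one of its features. The selected feature is the last one in the record, which covers the coding region for the protein (ppm1a).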
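
As an aside on why SeqIO.read fits a 1-record file: it expects exactly one record and raises ValueError otherwise, while SeqIO.parse iterates over any number of records. The sketch below shows this with two tiny made-up FASTA strings held in memory, so it does not need the downloaded file; it assumes Python 3 and an installed Biopython.

```python
from io import StringIO
from Bio import SeqIO

# Made-up FASTA text: one record, and the same text with a second record added.
one = ">seq1\nATGGGA\n"
two = one + ">seq2\nATGAAA\n"

# Exactly one record: read() returns it directly as a SeqRecord.
rec = SeqIO.read(StringIO(one), "fasta")
print(rec.id, len(rec.seq))

# More than one record: read() raises ValueError.
try:
    SeqIO.read(StringIO(two), "fasta")
    refused = False
except ValueError as err:
    refused = True
    print("read() refused:", err)

# parse() yields the records one at a time instead.
ids = [r.id for r in SeqIO.parse(StringIO(two), "fasta")]
print(ids)
```

For a GenBank file that holds several records, SeqIO.parse with the 'genbank' format would be the call to use in place of SeqIO.read.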

# bpy10.py
from __future__ import print_function, division
from Bio import SeqIO

# SeqIO.read expects exactly one record in the file and returns a SeqRecord.
gb = SeqIO.read('data/BC135714.1.gb', 'genbank')
print('type(gb) =', type(gb), end='\n\n')
print('id =', gb.id, end='\n\n')
print('desc =', gb.description, end='\n\n')

# List the annotation keys; print the value only for 'source'.
for annot in gb.annotations:
    print("--> ", annot)
    if annot == 'source':
        print(gb.annotations[annot], end='\n\n')

# Show the location of every feature in the record.
for ft in gb.features:
    print('--ft loc-->', ft.location)

# The last (3rd) feature covers the coding region of the ppm1a protein.
cds = gb.features[-1]
dna = cds.extract(gb.seq)
print("3rd feature translation: ", dna.translate())

#type(gb) = <class 'Bio.SeqRecord.SeqRecord'>
#
#id = BC135714.1
#
#desc = Xenopus tropicalis protein phosphatase 1A (formerly 2C),
# magnesium-dependent, alpha isoform, mRNA (cDNA clone MGC:121664
# IMAGE:7625995), complete cds.
#
#-->  comment
#-->  sequence_version
#-->  source
#Xenopus (Silurana) tropicalis (western clawed frog)
#
#-->  taxonomy
#-->  keywords
#-->  references
#-->  accessions
#-->  data_file_division
#-->  date
#-->  organism
#-->  gi
#--ft loc--> [0:3385](+)
#--ft loc--> [0:3385](+)
#--ft loc--> [336:1488](+)
#3rd feature translation:  
#MGAFLDKPKMEKHNAHGQGNGLRYGLSSMQGWRVEMEDAHTAVIGLPNGLD
#AWSFFAVYDGHAGSQVAKYCCEHLLDHITSNQDFKGTDGHLSVWSVKNGIR
#TGFLQIDEHMRVISEKKHGADRSGSTAVGVMTSPNHIYFINCGDSRGLLCR
#SKKVHFFTQDHKPSNPLEKERIQNAGGSVMIQRVNGSLAVSRALGDFDYKC
#VHGKGPTEQLVSPEPEVYEIERSEEDDQFIILACDGIWDVMGNEELCDFVW
#SRLEVTDDLERVCNEIVDTCLYKGSRDNMSVILICFPSAPKVLPEAVKREA
#ELDKYLEGRVEDIIKKQGEEGVPDLVHVMRTLASENIPNLPPGGELASKRS
#VIEAVYNRLNPYRNDDTDSASTDDMW*
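
The third feature location, [336:1488](+), is a forward-strand slice of 1152 bases, i.e. 384 codons. To make the extract and translate steps less of a black box, here is a rough pure-Python sketch of what they amount to for such a simple location. It assumes the standard genetic code; the toy sequence and the helper names extract_forward and translate are made up for illustration, and Biopython's own implementation handles many more cases (reverse strand, joined locations, other codon tables).

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def extract_forward(seq, start, end):
    """Slice a forward-strand location [start:end] (0-based, end-exclusive)."""
    return seq[start:end]


def translate(dna):
    """Translate codon by codon; '*' marks a stop codon."""
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Made-up toy sequence: 4 leading bases, a 15-base coding region, 3 trailing bases.
toy = "CCCC" + "ATGGGAGCATTTTAA" + "GGG"
coding = extract_forward(toy, 4, 19)
print(translate(coding))  # MGAF*
```

The trailing * plays the same role as the one that ends the translation printed above: it stands for the stop codon that closes the coding region.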
