Sunday, March 15, 2015

bpy5. Reading a FASTA file with one record in Biopython

A test FASTA file, NC_005816.fna, is available from Biopython. It contains only one sequence. Download it to your working directory before running the script.


We can use Bio.SeqIO.read to read a file that contains exactly one record. Its arguments are the file name, the format and an optional alphabet. The format here is 'fasta'. If the alphabet is omitted, it defaults to SingleLetterAlphabet(). FASTA files carry no alphabet information, so you should supply one yourself. The file extension can be a hint: .fna here indicates a nucleic acid FASTA file, so a DNA alphabet is appropriate.
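To make the "exactly one record" rule concrete, here is a minimal pure-Python sketch of that behaviour. It is not Biopython's implementation: the function name read_single_fasta is hypothetical, and the toy data holds only the first 20 letters of the sequence. It mirrors the fact that SeqIO.read raises ValueError when the file holds no records or more than one.

```python
import io


def read_single_fasta(handle):
    """Return (header, sequence) for a FASTA handle holding exactly one record.

    Hypothetical helper for illustration only; it is not part of Biopython.
    """
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                # A second '>' line means the file has more than one record.
                raise ValueError("More than one record found in handle")
            header = line[1:]
        elif header is None:
            raise ValueError("Sequence data found before a '>' header line")
        else:
            chunks.append(line)  # sequence lines are joined at the end
    if header is None:
        raise ValueError("No records found in handle")
    return header, "".join(chunks)


# Toy data: the id from this post, with only the first 20 letters as sequence.
demo = io.StringIO(">gi|45478711|ref|NC_005816.1|\nTGTAACGAAC\nGGTGCAATAG\n")
header, sequence = read_single_fasta(demo)
print(header.split()[0])  # the first word of the header plays the role of record.id
print(sequence)
```

For files with several records, Bio.SeqIO.parse is the function to use instead, since it iterates over the records one by one.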


SeqIO.read returns a SeqRecord object, whose attributes, such as id and seq, we can then inspect.


We also use slicing to select some 10-letter subsequences.

# bpy5.py
# Note: Bio.Alphabet was removed in Biopython 1.78, so this script needs an earlier release.

from __future__ import print_function
from Bio import SeqIO
from Bio.Alphabet.IUPAC import unambiguous_dna as dna

# SeqIO.read expects exactly one record and returns a SeqRecord
record = SeqIO.read('NC_005816.fna', 'fasta', dna)
print('id:', record.id)
print('alpha:', record.seq.alphabet)
print('first 10 letters:', record.seq[:10])
print('last 10 letters:', record.seq[-10:])  # open-ended slice keeps the final letter
print('first 10 even letters:', record.seq[:20:2])  # indices 0, 2, ..., 18
print('first 10 odd letters:', record.seq[1:21:2])  # indices 1, 3, ..., 19
#id: gi|45478711|ref|NC_005816.1|
#alpha: IUPACUnambiguousDNA()
#first 10 letters: TGTAACGAAC
#last 10 letters: CCGACCCCTG
#first 10 even letters: TTAGAGTCAA
#first 10 odd letters: GACACGGATG
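
Seq slicing follows the same rules as Python string slicing, so the index arithmetic can be checked on a plain string without Biopython. The sketch below uses only the first 20 letters of the sequence, reconstructed by interleaving the even and odd letters printed above; the variable names are illustrative.

```python
# First 20 letters of the record, rebuilt from the printed even/odd letters.
even_letters = "TTAGAGTCAA"
odd_letters = "GACACGGATG"
prefix = "".join(a + b for a, b in zip(even_letters, odd_letters))

first_10 = prefix[:10]      # indices 0..9
last_10 = prefix[-10:]      # the final 10 letters of this 20-letter string
shifted_10 = prefix[-11:-1] # 10 letters too, but a stop of -1 excludes the final letter
even_idx = prefix[:20:2]    # indices 0, 2, ..., 18
odd_idx = prefix[1:21:2]    # indices 1, 3, ..., 19 (a stop past the end is allowed)

print(prefix)
print(first_10, last_10, shifted_10)
print(even_idx, odd_idx)
```

Note that "even" and "odd" here refer to zero-based indices, so the "even" letters are the 1st, 3rd, 5th and so on when counting from one.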
