Tuesday, March 17, 2015

bpy6. Reading a multi-record FASTA file into Biopython

This text file is saved as bpy6.fna in the working directory.

>seq1
ATTGGA
>seq2
ATG
>seq3
GCTA

To read a multi-record file, we use the function Bio.SeqIO.parse. It returns a generator object that yields one SeqRecord object per record in the file.


Usually, we will just loop over the generator. However, we can also call next on it to get the next SeqRecord object. When there are no more SeqRecord objects to yield, it raises a StopIteration exception.
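
The next and StopIteration behaviour can be tried without Biopython. The sketch below uses a small hypothetical generator, read_fasta, as a stand-in for SeqIO.parse. It is not Biopython's parser, and it yields plain (id, sequence) tuples rather than SeqRecord objects, but the built-in next() treats any generator the same way.

```python
# Same content as bpy6.fna, kept inline so the sketch is self-contained.
FASTA_TEXT = ">seq1\nATTGGA\n>seq2\nATG\n>seq3\nGCTA\n"

def read_fasta(lines):
    """Hypothetical stand-in for SeqIO.parse: yields (id, sequence) tuples."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            seq_id, chunks = line[1:], []
        elif line:
            chunks.append(line)
    # Yield the last record once the input is finished.
    if seq_id is not None:
        yield seq_id, "".join(chunks)

records = read_fasta(FASTA_TEXT.splitlines())
first = next(records)
second = next(records)
third = next(records)

# A fourth call has nothing left to yield, so StopIteration is raised.
try:
    next(records)
    exhausted = False
except StopIteration:
    exhausted = True

# Passing a default to next() returns it instead of raising.
fallback = next(records, None)

print(first, second, third, exhausted, fallback)
```

The two-argument form next(records, None) is handy when we only want to peek at the first record and the file may be empty.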


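We call SeqIO.parse a second time to get a fresh records object for the next example, because the for loop has already exhausted the first generator. The previous generator is garbage collected once the name records is rebound, as nothing else holds a reference to it.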

# bpy6.py

from __future__ import print_function
from Bio import SeqIO
records = SeqIO.parse('bpy6.fna','fasta')
print('type(records)=',type(records))
for record in records:
    print('id:',record.id)
    # Note: Seq.alphabet was removed in Biopython 1.78; drop this line on newer versions.
    print('alpha:',record.seq.alphabet)
    print('seq:',record.seq)
records = SeqIO.parse('bpy6.fna','fasta')
# The built-in next() works on Python 2 and 3; records.next() exists only on Python 2.
record1 = next(records)
record2 = next(records)
record3 = next(records)
rec_ids = [r.id for r in [record1,record2,record3]]
print('rec_ids =',rec_ids)
#type(records)= <type 'generator'>
#id: seq1
#alpha: SingleLetterAlphabet()
#seq: ATTGGA
#id: seq2
#alpha: SingleLetterAlphabet()
#seq: ATG
#id: seq3
#alpha: SingleLetterAlphabet()
#seq: GCTA
#rec_ids = ['seq1', 'seq2', 'seq3']
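
The script calls SeqIO.parse twice because a generator can only be consumed once. Here is a minimal sketch of that behaviour, using a hypothetical generator function fake_records in place of SeqIO.parse so that it runs without Biopython or the bpy6.fna file:

```python
def fake_records():
    """Hypothetical stand-in for SeqIO.parse('bpy6.fna', 'fasta'); yields ids only."""
    for seq_id in ("seq1", "seq2", "seq3"):
        yield seq_id

records = fake_records()
first_pass = [r for r in records]    # consumes every item in the generator
second_pass = [r for r in records]   # the generator is exhausted, so this is empty

records = fake_records()             # calling the function again starts over
restarted = next(records)

print(first_pass, second_pass, restarted)
```

If the records are needed more than once, wrapping the generator in list(...) keeps them all in memory, at the cost of loading the whole file.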
