Wednesday, April 8, 2015

bpy26. Parsing SwissProt files using Biopython

In the last example, we saved the insulin proteins to the 'data' subfolder.


The function createRecords takes one parameter, the subfolder where the *.txt files are stored; each file holds a single UniProtKB text record.


We could have given our files a different extension when saving, for example .dat or .swiss, in which case that extension would replace '.txt' in the list comprehension. If the subfolder contains only UniProtKB records, no filter is needed and we can simply write fils = os.listdir(fol).
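The two filtering options just described can be folded into one helper. This is only a sketch: the name list_record_files and its extensions parameter are made up for illustration and are not part of bpy26.py or Biopython.

```python
import os

def list_record_files(fol, extensions=None):
    """Return the file names in fol, sorted alphabetically.

    extensions is a hypothetical optional list such as ['.dat', '.swiss'];
    when it is None, every file in the folder is returned unfiltered.
    """
    names = sorted(os.listdir(fol))
    if extensions is None:
        return names
    # str.endswith accepts a tuple, so several extensions can be tested at once.
    return [name for name in names if name.endswith(tuple(extensions))]
```

Calling list_record_files('data', ['.txt']) would select the same files as the list comprehension in createRecords, although in sorted order, which os.listdir does not guarantee.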


Several attributes of each record are then printed: the entry name, the organism, the sequence length and the first cross reference.

# bpy26.py
from __future__ import print_function, division
import os
from Bio import SwissProt

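def createRecords(fol):
    # Read every *.txt file in fol as a single SwissProt record.
    records = []
    fils = [fil for fil in os.listdir(fol) if fil.endswith('.txt')]
    for fil in fils:
        # SwissProt.read expects exactly one record per file;
        # the with block closes the handle even if parsing fails.
        with open(os.path.join(fol, fil)) as handle:
            records.append(SwissProt.read(handle))
    return records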

if __name__ == '__main__':
    records = createRecords('data')
    for record in records:
        print('Entry Name:', record.entry_name)
        print('Organism:', record.organism)
        print('Length:', record.sequence_length)
        first_crossref = record.cross_references[0]
        print('First cross ref:')
        for field in first_crossref:
            print('\t', field)
        print()
# Output:
#Entry Name: IGF1R_HUMAN
#Organism: Homo sapiens (Human).
#Length: 1367
#First cross ref:
#         EMBL
#         X04434
#         CAA28030.1
#         -
#         mRNA
#
#Entry Name: IGF1R_MOUSE
#Organism: Mus musculus (Mouse).
#Length: 1373
#First cross ref:
#         EMBL
#         AF056187
#         AAC12782.1
#         -
#         mRNA
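As the output shows, each cross reference is a tuple whose first field is the database name and whose second field is the primary identifier. Here is a rough sketch of how a record's cross references could be grouped by database. The helper name group_crossrefs is hypothetical, and the code assumes every tuple has at least two fields. The sample list reuses the two tuples printed above, put together only for illustration, since they come from different records.

```python
from collections import defaultdict

def group_crossrefs(cross_references):
    """Map each database name to the list of its primary identifiers."""
    grouped = defaultdict(list)
    for ref in cross_references:
        # Assumption: field 0 is the database, field 1 the primary identifier.
        database, primary_id = ref[0], ref[1]
        grouped[database].append(primary_id)
    return dict(grouped)

# Tuples copied from the output above (IGF1R_HUMAN and IGF1R_MOUSE).
sample = [
    ('EMBL', 'X04434', 'CAA28030.1', '-', 'mRNA'),
    ('EMBL', 'AF056187', 'AAC12782.1', '-', 'mRNA'),
]
print(group_crossrefs(sample))
# {'EMBL': ['X04434', 'AF056187']}
```

With a parsed record, the call would be group_crossrefs(record.cross_references).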
