Tuesday, March 24, 2015

bpy12. Using egquery EUtil in Biopython

We can use Bio.Entrez.egquery to find how many records match a query term in each of the Entrez databases.


Set Entrez.email to your own email address. The query term here is 'asthma'. Entrez.read returns a dictionary of length 2, with the results under the key 'eGQueryResult'. The value stored under 'eGQueryResult' is a list containing one dictionary per database.
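
To make that shape concrete, here is a rough offline sketch that uses a plain Python dictionary as a stand-in for the parsed record. The key names 'Term', 'DbName' and 'Status', and the values shown for them, are assumptions about the response layout; only 'eGQueryResult', 'MenuName' and 'Count' are used by the script in this post. The two counts are copied from the sample output further down.

```python
# Hypothetical stand-in for the object returned by Entrez.read(handle).
# 'Term', 'DbName' and 'Status' are assumed key names, not taken from the script.
record = {
    "Term": "asthma",
    "eGQueryResult": [
        {"DbName": "pubmed", "MenuName": "PubMed", "Count": "149142", "Status": "Ok"},
        {"DbName": "pmc", "MenuName": "PubMed Central", "Count": "106753", "Status": "Ok"},
    ],
}

print(len(record))                 # number of top-level keys
first = record["eGQueryResult"][0]  # the row for the first database
print(first["MenuName"], first["Count"])
```

Note that the counts are strings, which is why the script converts them with int() before comparing against zero.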


The entries in 'eGQueryResult' are iterated over, with row being the dictionary for the current database. Usually, we are only interested in two of its keys: the database name 'MenuName' and the count 'Count'.


We populate list X with the 'Count' strings and list Y with the 'MenuName' strings. Finally, we print each database name with its count, skipping any database whose count is zero.

# bpy12.py
from __future__ import print_function, division
from Bio import Entrez

Entrez.email = "Your.Name.Here@example.org"
handle = Entrez.egquery(term="asthma")
record = Entrez.read(handle)
handle.close()
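X = []  # counts, kept as strings
Y = []  # database display names
for row in record["eGQueryResult"]:
    for i in row:
        if i == 'MenuName':
            Y.append(row[i])
        elif i == 'Count':
            X.append(row[i])
# Print each database name with its count, skipping zero counts
for i in range(len(X)):
    if int(X[i]) == 0:
        continue
    print("%20s\t%s" % (Y[i], X[i]))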

#              PubMed    149142
#      PubMed Central    106753
#                MeSH    11
#               Books    7165
#       PubMed Health    1461
#                OMIM    270
#         Site Search    75
#          Nucleotide    32446
#                 GSS    91
#                 EST    210
#             Protein    2558
#              Genome    1
#           Structure    390
#               dbVar    9
#                Gene    1034
#                 SRA    8245
#          BioSystems    125
#             UniGene    1
#   Conserved Domains    22
#              PopSet    7
#        GEO Profiles    731699
#        GEO DataSets    4417
#          HomoloGene    18
#    PubChem Compound    122
#   PubChem Substance    2125
#    PubChem BioAssay    3515
#         NLM Catalog    2610
#               Probe    34
#               dbGaP    2387
#          BioProject    209
#           BioSample    12542
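
Since each row is a dictionary, the two keys can also be read directly instead of looping over every key. The following is only a sketch of that variant, not the script used above: the helper name nonzero_counts and the sample rows are hypothetical, and the handling of a non-numeric count (for example if a database cannot be searched) is an assumption about what the service may return.

```python
def nonzero_counts(rows):
    """Return (menu_name, count) pairs for rows whose count is a positive integer."""
    pairs = []
    for row in rows:
        try:
            count = int(row["Count"])
        except ValueError:
            # Assumption: a non-numeric count marks a database with no usable result.
            continue
        if count > 0:
            pairs.append((row["MenuName"], count))
    return pairs


# Hypothetical rows; with a live query this would be record["eGQueryResult"].
rows = [
    {"MenuName": "PubMed", "Count": "149142"},
    {"MenuName": "Genome", "Count": "1"},
    {"MenuName": "Example Empty", "Count": "0"},
    {"MenuName": "Example Failed", "Count": "Error"},
]

for name, count in nonzero_counts(rows):
    print("%20s\t%d" % (name, count))
```

Keeping the name and count together in one tuple avoids relying on X and Y staying the same length.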
