Thursday, April 2, 2015

bpy22. Using ECitMatch in Python

If we have these five items for an article: (1) journal name, (2) year, (3) volume, (4) page, (5) author, we can find the PubMed ID for the article using ecitmatch. This will rarely be needed, since esearch will usually find the same information. However, parsing the string returned by a GET request is an important skill in general.


At the time of writing, we do not have a built-in Bio.Entrez utility to deal with ecitmatch. But esearch can also return PubMed IDs, and its search term can be limited by author, by journal, by date, etc.
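For comparison, here is a minimal sketch of how the same five items could be turned into a field-limited esearch query. It only builds the URL and does not contact NCBI. The helper name build_esearch_url is made up for this post, and the field tags used ([ta], [dp], [vi], [pg], [au]) are my assumption about which PubMed search tags fit these items, so check them against the PubMed help before relying on them.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode        # Python 2

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_esearch_url(journal, year, vol, page, author):
    """Hypothetical helper: build an esearch URL limited by field tags."""
    # Assumed tags: ta = journal, dp = publication date, vi = volume,
    # pg = page, au = author.
    term = ' AND '.join([
        '%s[ta]' % journal,
        '%s[dp]' % year,
        '%s[vi]' % vol,
        '%s[pg]' % page,
        '%s[au]' % author,
    ])
    # urlencode turns spaces into + and escapes the square brackets.
    return ESEARCH + '?' + urlencode([('db', 'pubmed'), ('term', term)])

url = build_esearch_url("Front Cell Neurosci", "2014", "8", "47", "Iyengar BR")
print(url)
```

The resulting address could be opened the same way as the ecitmatch address, but the esearch reply is XML, so it would need an XML parser rather than a split on pipes.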


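In the script, the tuples journal, year, vol, page, and author each hold the strings for the first and second article, and ids holds a label for each article. We replace spaces with +, put pipes between the items, and join the two citation strings using '%0D' (a URL-encoded carriage return).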

# bpy22.py
from __future__ import print_function, division
import urllib2

ids = 'article1','article2'
journal = "Front Cell Neurosci","J Biomed Sci"
year = "2014","2013"
vol = "8","20"
page = "47","92"
author = "Iyengar BR", "Skipper KA"
base = ("https://eutils.ncbi.nlm.nih.gov/entrez/"
        "eutils/ecitmatch.cgi?db=pubmed&retmode=xml&bdata=")
s = [None]*len(ids)
for i in range(len(ids)):
    fmt = (journal[i].replace(' ','+'), year[i], vol[i],
           page[i], author[i].replace(' ','+'), ids[i])
    s[i] = '%s|%s|%s|%s|%s|%s' % fmt
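addr = base + '%0D'.join(s)  # one citation string per article
response = urllib2.urlopen(addr)
text = response.read()  # one pipe-delimited line per citation
pIDs = []
for line in text.split('\n'):
    line = line.strip()
    if line == '': continue
    pIDs.append(line.split('|')[-1])  # the pubmed ID is the last field
print("pIDs=\n",pIDs)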

#pIDs=
# ['24605084', '24320156'] 
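The loop above assumes every citation is matched. As a rough sketch, the parsing step can be pulled into a function that maps each label from ids to its PubMed ID and returns None when the last field is not a number. The function name parse_ecitmatch is made up here, the sample text is rebuilt from the inputs and output of this post rather than copied from a live reply, and the third line (with its NOT_FOUND marker) is a hypothetical unmatched citation added for illustration.

```python
def parse_ecitmatch(text):
    """Hypothetical helper: map each citation label to a PubMed ID or None."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = line.split('|')
        if len(fields) < 7:
            continue  # not a full citation line (5 items + label + ID)
        label, pmid = fields[-2], fields[-1].strip()
        # Keep only purely numeric IDs; anything else counts as unmatched.
        result[label] = pmid if pmid.isdigit() else None
    return result

# Sample reply text, reconstructed for illustration (not a live response).
sample = (
    "Front+Cell+Neurosci|2014|8|47|Iyengar+BR|article1|24605084\n"
    "J+Biomed+Sci|2013|20|92|Skipper+KA|article2|24320156\n"
    "Some+Journal|2000|1|1|Nobody+X|article3|NOT_FOUND\n"
)
matches = parse_ecitmatch(sample)
print(matches)
```

Returning a dictionary keyed by label keeps each ID tied to the article it belongs to, even if a line is missing or unmatched.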
