Thursday, March 26, 2015

bR1. Getting the top 5 databases for 'asthma' at NCBI using R

We will read count.csv, which we created in the last example, sort it by count, normalize the counts to the largest value, and plot the top 5 databases as a barplot.


Nothing prevents us from doing this from within Python. Likewise, there is no technical reason to prefer Biopython over Bioconductor. For me, however, Biopython is easier to understand.

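# bR1.R
# Read the headerless two-column CSV (database name, hit count).
cnt <- read.table("count.csv", sep = ',',
                  header = FALSE, stringsAsFactors = FALSE,
                  col.names = c("db","count"))
# Sort by count, largest first.
cnt.ord <- cnt[order(cnt$count, decreasing = TRUE),]
# Normalize so that the largest count becomes 1.
cnt.ord$count <- cnt.ord$count/max(cnt.ord$count)
# Plot the five databases with the most hits.
barplot(cnt.ord$count[1:5], names.arg = cnt.ord$db[1:5],
        ylab = 'Relative Number', xlab = 'Database',
        main = "Top 5 databases for 'asthma' at NCBI")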
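
The sort-and-normalize step can also be wrapped in a small function, which makes it easy to check on a tiny table before plotting. This is only a sketch: the helper name top_n_normalized and the toy counts are made up for illustration and are not real NCBI results. It uses head() instead of [1:5], because indexing past the end of a vector in R yields NA values when the table has fewer than five rows.

```r
# Hypothetical helper (not part of bR1.R): sort by count, scale so the
# largest count is 1, and keep at most n rows.
top_n_normalized <- function(cnt, n = 5) {
  ord <- cnt[order(cnt$count, decreasing = TRUE), , drop = FALSE]
  ord$count <- ord$count / max(ord$count)
  head(ord, n)  # returns fewer than n rows if the table is short
}

# Made-up counts for illustration only; these are NOT real NCBI numbers.
toy <- data.frame(db = c("a", "b", "c"), count = c(10, 40, 20),
                  stringsAsFactors = FALSE)
top <- top_n_normalized(toy)
print(top)
```

The result can be passed to barplot() the same way as above, for example barplot(top$count, names.arg = top$db).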
