“A particular successful guide to understanding and modeling cancer progression has been evolutionary theory, which has a long tradition in cancer research. Already 40 years ago, seminal work established an evolutionary view of cancer, in which carcinogenesis is regarded as an evolutionary process driven by stepwise somatic mutations and clonal expansions” (Beerenwinkel et al, 2014)

Writing the review article from which I took this quote made me wonder how the long tradition of an evolutionary understanding of cancer plays out on PubMed.

Here is my analysis. For full transparency, the R markdown file underlying this document is available at http://www.markowetzlab.org/supplements/PubMedAnalysis.Rmd.

Step 1: Source the scripts

My analysis is based on a script described on Rpsychologist.com and deposited on GitHub.

My own version (sourced here) is identical to Rpsychologist’s except for a fix to a URL in line 62 of his function.


Step 2: Define the query

I have used three different queries (‘cancer heterogeneity’, ‘cancer evolution’, ‘reviews’) defined as follows:

query <- c("cancer heterogeneity" = "((intratumoral heterogeneity[Title/Abstract]) 
                                     OR (tumor heterogeneity[Title/Abstract]) 
                                     OR (genetic heterogeneity[Title/Abstract])) 
                                     AND cancer NOT review[Publication Type]",
           "cancer evolution" = "((clonal evolution[Title/Abstract]) 
                                     OR (cancer evolution[Title/Abstract])) 
                                     AND cancer NOT review[Publication Type]",
           "reviews" =  "((intratumoral heterogeneity[Title/Abstract]) 
                                     OR (tumor heterogeneity[Title/Abstract]) 
                                     OR (genetic heterogeneity[Title/Abstract]) 
                                     OR (clonal evolution[Title/Abstract]) 
                                     OR (cancer evolution[Title/Abstract])) 
                                     AND cancer AND review[Publication Type]")
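As a rough illustration of what happens to such a query string downstream, here is a minimal sketch that turns one query into a request URL for the NCBI E-utilities `esearch` endpoint, restricted to a single publication year. The helper name `build_count_url` is hypothetical and not taken from the sourced script, which may build its requests differently. No request is sent; the code only constructs the URL.

```r
# Sketch only: build an esearch URL asking for the hit count of one query
# in one publication year. build_count_url is a hypothetical helper.
build_count_url <- function(term, year) {
  term <- gsub("\\s+", " ", term)                 # collapse line breaks and indentation
  term <- sprintf("(%s) AND %d[dp]", term, year)  # [dp] = date of publication
  paste0("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
         "?db=pubmed&rettype=count&term=",
         utils::URLencode(term, reserved = TRUE)) # percent-encode spaces, brackets, etc.
}

q <- "((clonal evolution[Title/Abstract])
       OR (cancer evolution[Title/Abstract]))
       AND cancer NOT review[Publication Type]"
url <- build_count_url(q, 2010)
```

Collapsing whitespace first matters because the queries above are written across several indented lines, and those line breaks would otherwise end up encoded in the URL.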

The table summarizes the search terms used in the query and their connections:

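| Term                                       | cancer heterogeneity | cancer evolution | reviews |
|--------------------------------------------|----------------------|------------------|---------|
| intratumoral heterogeneity[Title/Abstract] | OR                   | -                | OR      |
| tumor heterogeneity[Title/Abstract]        | OR                   | -                | OR      |
| genetic heterogeneity[Title/Abstract]      | OR                   | -                | OR      |
| clonal evolution[Title/Abstract]           | -                    | OR               | OR      |
| cancer evolution[Title/Abstract]           | -                    | OR               | OR      |
| cancer                                     | AND                  | AND              | AND     |
| review[Publication Type]                   | NOT                  | NOT              | AND     |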
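The plotting code in Step 4 expects a data frame `df` with the columns `.id`, `year` and `count`. This text does not show how that frame is built, so the following is only a sketch of one way to assemble it, not the sourced script's implementation. Both `count_trend` and `count_fn` are hypothetical names; `count_fn` stands for any function that returns the number of hits for one query in one year.

```r
# Sketch only: assemble yearly hit counts into the long format used for plotting.
# count_fn is a hypothetical function(term, year) returning a single number,
# for example a wrapper around a PubMed request.
count_trend <- function(queries, years, count_fn) {
  rows <- lapply(names(queries), function(id) {
    data.frame(
      .id   = id,      # query name, later used as group and fill
      year  = years,
      count = vapply(years, function(y) count_fn(queries[[id]], y), numeric(1)),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows) # one row per query and year
}

# Stand-in counter so the sketch runs offline; it returns 0, not real data.
stub <- function(term, year) 0
toy_query <- c("cancer evolution" = "cancer evolution[Title/Abstract]")
df_toy <- count_trend(toy_query, 2010:2012, stub)
```

Passing the counter in as an argument keeps the reshaping logic separate from the network access, so the same function works with a real PubMed lookup or with a stub for testing.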

Step 4: Visualize the result

We plot the number of PubMed hits per year as a stacked area chart, with one band per query.

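library(ggplot2)

# df holds one row per query and year: .id (query name), year, count (PubMed hits)
ggplot(df, aes(year, count, group=.id, fill=.id)) + 
    geom_area() +   # stacked areas, one band per query
    labs(title="Cancer heterogeneity and evolution literature over the years", x = "year", y = "Number of PubMed hits") + 
    scale_fill_brewer(palette="Spectral") + 
    theme_bw()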

[Figure: stacked area chart of the number of PubMed hits per year for the three queries (chunk PubMedTrend)]

That’s all folks.