See also my earlier post, PubMed Publications by Year.

I recently came across this blog post, State of Deep Learning : H2 2018 Review [local copy], which included this statement:

    "Every 20 minutes, a new ML paper is born. The growth rate of machine learning papers has been around 3.5% a month since July -- which is around a 50% growth rate annually. This means around 2,200 machine learning papers a month and that we can expect around 30,000 new machine learning papers next year." [Note: see my comment, below.]

    [Image source.]
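The quoted figures can be sanity-checked by compounding the monthly rate. This is a minimal sketch, assuming the 3.5%/month rate holds for twelve consecutive months and that ~2,200 papers/month is the starting rate (both taken from the quote); the function names are illustrative only.

```python
def annual_growth(monthly_rate: float) -> float:
    """Compound a monthly growth rate over 12 months."""
    return (1 + monthly_rate) ** 12 - 1


def projected_year_total(start_per_month: float, monthly_rate: float) -> float:
    """Sum 12 months of output: month 0 at start_per_month, then growing by monthly_rate each month."""
    return sum(start_per_month * (1 + monthly_rate) ** m for m in range(12))


if __name__ == "__main__":
    print(f"annual growth: {annual_growth(0.035):.1%}")  # → 51.1%
    print(f"next-year total: {projected_year_total(2200, 0.035):,.0f} papers")
```

Compounding 3.5% monthly gives ~51% annually, consistent with the quoted ~50%; twelve months starting at 2,200 and growing at that rate sum to just over 32,000, in the same ballpark as the quoted ~30,000.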

That plot is based on this one, by Jeff Dean et al. [Google: A New Golden Age in Computer Architecture: Empowering the Machine-Learning Revolution (Mar/Apr 2018)]:


[Image source.]

I’m being very picky here, but strictly speaking the number quoted in State of Deep Learning : H2 2018 Review (~2,200 ML papers/month) is an overestimate, for the following reasons.

  • Jeff Dean’s plot (above) suggests a figure of ~1,750 ML submissions/month, which again appears to be an overestimate.

      (~21,000 ML papers in 2017) / (12 months/year) ≃ 1,750 ML papers/month

  • Looking at the plot below (see the image source link), machine learning (cs.LG) papers made up ~8% (measured at the plot margin: 16mm/195mm) of the 27,031 cs papers submitted to arXiv in 2017.

    By that estimate, there were ~2,162 cs.LG papers (27,031 * 0.08 ≃ 2,162), i.e. ~2.2 x 10³ (rounded to two significant figures), submitted to arXiv in 2017.

    (27,031 * 0.08) / 12 ≃ 180 cs.LG (machine learning) submissions/month.

    Even adding in the smaller proportion of cs.AI papers, cs.LG + cs.AI together account for ~12.8% (25mm/195mm, measured at the plot margin) of cs submissions, i.e. (27,031 * 0.128) / 12 ≃ 290 cs.LG + cs.AI papers/month.


    [Image source.]
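The plot-margin estimates above reduce to a share of the 2017 cs total, divided by 12. This sketch reproduces them from the millimetre measurements; the constants come from the text, while the helper name and the use of the unrounded share (16/195 rather than 8%) are my own choices.

```python
# Reproduce the plot-margin estimates for arXiv cs submissions in 2017.
CS_PAPERS_2017 = 27_031   # cs papers submitted to arXiv in 2017
BAR_HEIGHT_MM = 195       # full height of the 2017 bar, measured at the plot margin


def monthly_from_share(segment_mm: float, total_mm: float = BAR_HEIGHT_MM,
                       yearly_total: int = CS_PAPERS_2017) -> tuple[float, float]:
    """Return (share of cs submissions, estimated papers per month) for one bar segment."""
    share = segment_mm / total_mm
    return share, yearly_total * share / 12


if __name__ == "__main__":
    # Jeff Dean's plot: ~21,000 ML papers in 2017
    print(f"Dean et al.: {21_000 / 12:,.0f} papers/month")  # → 1,750
    for label, mm in [("cs.LG", 16), ("cs.LG + cs.AI", 25)]:
        share, per_month = monthly_from_share(mm)
        print(f"{label}: {share:.1%} of cs, ~{per_month:.0f} papers/month")
    # → cs.LG: 8.2% of cs, ~185 papers/month
    # → cs.LG + cs.AI: 12.8% of cs, ~289 papers/month
```

Using the unrounded share 16/195 ≈ 8.2% gives ~185 rather than ~180 cs.LG papers/month, a difference well within the precision of measuring a plot with a ruler.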

A direct query (submitted 2018-11-30) for 2017 cs.LG papers returned 2,521 results (an average of ~210 cs.LG papers/month in 2017). Similarly, a direct query (submitted 2018-11-30) for 2017 “machine learning” returned 4,360 results (~363/month).

However, if you look at the cs.LG (machine learning) recent submissions page, you will see that many papers from other arXiv categories (cs.CV; cs.DS; math.OC; stat.ML; etc.) are cross-listed to cs.LG, both obscuring and inflating those counts. That page showed 331 entries for Nov 26 - Nov 30, 2018, suggesting (very approximately) (331 / 5) * 30 ≃ 2,000 (rounded to one significant figure) “machine learning” papers/month. Note that arXiv announces new submissions only on weekdays, so those five listing days cover roughly one week of submissions; scaling by ~4.3 weeks/month instead gives ~1,400 papers/month.
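The per-month figures in the two paragraphs above come from simple scaling of raw counts. This sketch collects them in one place. The listing extrapolation is shown two ways: the 5-days-to-30-days scaling used above, and an alternative that assumes the five weekday announcement days correspond to about one week of submissions, with a month averaging 52/12 ≈ 4.33 weeks. That weekly interpretation is an assumption, and the helper names are illustrative.

```python
def per_month_from_year(yearly_count: int) -> float:
    """Average monthly rate from a full-year search count."""
    return yearly_count / 12


def per_month_from_listing(entries: int, periods_listed: float, periods_per_month: float) -> float:
    """Scale a listing count covering `periods_listed` periods up to one month."""
    return entries / periods_listed * periods_per_month


if __name__ == "__main__":
    print(f"2017 cs.LG query:          {per_month_from_year(2_521):.0f}/month")  # → 210
    print(f"2017 'machine learning':   {per_month_from_year(4_360):.0f}/month")  # → 363
    # 331 listing entries over 5 days, scaled to a 30-day month
    print(f"5 days -> 30 days:         {per_month_from_listing(331, 5, 30):,.0f}/month")  # → 1,986
    # Assumption: 5 weekday announcements ~ 1 week of submissions; 52/12 weeks per month
    print(f"1 week -> 52/12 weeks:     {per_month_from_listing(331, 1, 52 / 12):,.0f}/month")  # → 1,434
```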

These are my subject areas of primary interest (RSS feed subscriptions to {cs.LG | stat.ML | cs.AI | cs.IR}), along with the Advanced Search result counts for 2017 (queried 2018-11-30).

arXiv monthly submission rates [data: csv] are as follows:


[Image source.]

  • The arXiv monthly submission rates indicated (2018-11-30) that 13,446 submissions were received in October 2018.

  • The 1991-2017 arXiv submission rate statistics indicated that “The cs [computer science] rate started as well to ramp up in 2007, grew to exceed the hep rate in calendar year 2012, continued on a very fast growth rate 2011-2015, and on an even faster growth rate since 2015 (driven largely by the computer vision, machine learning, and computational linguistics communities).”
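The monthly submission csv linked above could be rolled up into yearly totals and year-over-year growth rates with the standard library. This is a sketch only: I am assuming the file has a `month` column formatted `YYYY-MM` and a `submissions` column of integer counts, so check the header and adjust the names before relying on it.

```python
import csv
import io
from collections import defaultdict


def yearly_totals(csv_text: str) -> dict[int, int]:
    """Sum monthly arXiv submissions per calendar year.

    Assumes (hypothetically) a 'month' column formatted YYYY-MM and a
    'submissions' column with integer counts.
    """
    totals: dict[int, int] = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        year = int(row["month"].split("-")[0])
        totals[year] += int(row["submissions"])
    return dict(totals)


def year_over_year_growth(totals: dict[int, int]) -> dict[int, float]:
    """Fractional growth of each year relative to the previous year."""
    return {y: totals[y] / totals[y - 1] - 1 for y in sorted(totals) if y - 1 in totals}
```

Usage: `totals = yearly_totals(open("monthly_submissions.csv").read())` (the file name is hypothetical). A year that is still in progress, such as 2018 as of November, will appear as a partial total and understate growth.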

There are more arXiv submission statistics (1991-2017) here. The arXiv Bulk Data Access guidelines and the OAI-PMH and API interfaces may allow programmatic access to those data (à la PubMed Publications by Year), but I haven’t pursued that option.

This dblp page, Statistics – Publications per year, gives the following plot:


[Image source. (Note: the source data in their csv file are faulty: mis-formatted, with some data missing.)]

Paper Gestalt (2010) presented this figure,


[Image source.]

… while Deep Paper Gestalt (Dec 2018) – which built upon that work – presented this figure:


[Image source.]