Computer Science Publications By Year
See also my earlier post, PubMed Publications by Year.
I recently came across this blog post, State of Deep Learning : H2 2018 Review [local copy], which included this statement:
- "Every 20 minutes, a new ML paper is born. The growth rate of machine learning papers has been around 3.5% a month since July -- which is around a 50% growth rate annually. This means around 2,200 machine learning papers a month and that we can expect around 30,000 new machine learning papers next year." [Note: see my comment, below.]

[Image source.]
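The arithmetic in the quoted statement can be sanity-checked in a few lines. This is only a sketch: the variable names are illustrative, and the "growing" total assumes the 3.5% monthly growth simply continues for the following 12 months, which the quoted post does not spell out.

```python
# Figures taken from the quoted statement; names are illustrative only.
monthly_growth = 0.035      # "around 3.5% a month"
papers_per_month = 2_200    # "around 2,200 machine learning papers a month"

# Compounding 3.5%/month over 12 months gives the implied annual growth rate.
annual_growth = (1 + monthly_growth) ** 12 - 1

# Yearly total if the monthly rate stayed flat at 2,200.
flat_year_total = papers_per_month * 12

# Yearly total if the rate keeps growing 3.5% each month (assumption).
growing_year_total = sum(
    papers_per_month * (1 + monthly_growth) ** month for month in range(1, 13)
)

print(f"implied annual growth: {annual_growth:.1%}")
print(f"flat-rate yearly total: {flat_year_total:,}")
print(f"compounding yearly total: {growing_year_total:,.0f}")
```

Compounding gives roughly 51% a year, consistent with the quoted "around 50%", and the quoted ~30,000 papers falls between the flat-rate and compounding totals.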
That plot is based on this one, by Jeff Dean et al. [Google: A New Golden Age in Computer Architecture: Empowering the Machine-Learning Revolution (Mar/Apr 2018)]:
[Image source.]
I’m being very picky here, but strictly speaking the number quoted in State of Deep Learning : H2 2018 Review, ~2,200 ML papers/month, is an overestimate, based on the following criteria.
- Jeff Dean’s plot (above) suggests a figure of ~1,750 ML submissions/month, which again appears to be an overestimate.
- (~21,000 ML papers in 2017) / (12 months/year) ≃ 1,750 ML papers/month
- Looking at the plot below (sourced from arXiv.org), machine learning (cs.LG) papers comprised ~8% (measured at the plot margin: 16mm/195mm) of the 27,031 cs papers submitted to arXiv in 2017.
By that estimate, there were 27,031 * 0.08 ≃ 2,162, i.e. ~2.2 x 10^3 (rounded to two significant figures), ML papers submitted to arXiv in 2017.
(27,031 * 0.08) / 12 ≃ 180 cs.LG (machine learning) submissions/month.
Even adding in the smaller proportion of cs.AI papers, cs.LG + cs.AI together make up ~12.8% (25mm/195mm, measured at the plot margin) of cs submissions, which works out to ~290 cs.LG + cs.AI papers/month.
[Image source.]
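The estimates above can be reproduced with a short script. It is a sketch that uses only the numbers stated in the text; the variable names are illustrative, and the plot-margin measurements (16mm, 25mm, 195mm) are approximate by nature.

```python
# Inputs as stated in the text; margin measurements are approximate.
cs_papers_2017 = 27_031             # cs submissions to arXiv in 2017
ml_share = round(16 / 195, 2)       # cs.LG share at the plot margin -> 0.08
ml_ai_share = round(25 / 195, 3)    # cs.LG + cs.AI share -> 0.128

# Jeff Dean's plot: ~21,000 ML papers in 2017, spread over 12 months.
dean_per_month = 21_000 / 12

# arXiv-based estimates for cs.LG alone, and for cs.LG + cs.AI.
ml_per_year = cs_papers_2017 * ml_share
ml_per_month = ml_per_year / 12
ml_ai_per_month = cs_papers_2017 * ml_ai_share / 12

print(f"Dean plot:      {dean_per_month:,.0f} papers/month")
print(f"cs.LG:          {ml_per_year:,.0f} papers/year, {ml_per_month:,.0f}/month")
print(f"cs.LG + cs.AI:  {ml_ai_per_month:,.0f} papers/month")
```

Both arXiv-based monthly figures come out far below the ~1,750 and ~2,200 papers/month quoted earlier.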
A direct query (submitted 2018-11-30), 2017 cs.LG, returned 2,521 results (an average of 210 cs.LG papers/month in 2017). Similarly, a direct query (submitted 2018-11-30), 2017 “machine learning”, returned 4,360 results (363/month).
However, if you look at the cs.LG (machine learning) recent submissions page, you will see that papers from many other arXiv categories (cs.CV; cs.DS; math.OC; stat.ML; etc.) are cross-listed as ML papers, both obfuscating and inflating those counts. That page showed 331 entries for Nov 26, 2018 - Nov 30, 2018, suggesting (very approximately) 331 / 5 * 30 ≃ 2,000 (rounded to one significant figure) “machine learning” papers/month.
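That extrapolation, including the rounding to one significant figure, can be sketched as follows. The helper `round_sig` is a hypothetical convenience function, not a library call, and the 30-day month is the same rough assumption used in the text.

```python
import math


def round_sig(x: float, sig: int) -> float:
    """Round x to `sig` significant figures (illustrative helper)."""
    if x == 0:
        return 0.0
    # Position of the last digit to keep, relative to the decimal point.
    digits = sig - 1 - math.floor(math.log10(abs(x)))
    return round(x, digits)


entries = 331          # entries listed for Nov 26 - Nov 30, 2018
days_listed = 5
days_per_month = 30    # rough assumption, as in the text

per_month = entries / days_listed * days_per_month

print(f"raw extrapolation: {per_month:.0f} papers/month")
print(f"one significant figure: {round_sig(per_month, 1):.0f} papers/month")
```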
These are the subject areas of primary personal interest – RSS feed subscriptions to {cs.LG | stat.ML | cs.CL | cs.AI | cs.IR} – and the Advanced Search result counts for 2017 (queried 2018-11-30):
- [machine learning] 2017 cs.LG: 2,195 results (avg. 182.9/mo)
- [machine learning] 2017 stat.ML: 1,579 results (131.6/mo)
- [machine learning] 2017 cs.LG OR stat.ML: 3,774 results (314.5/mo) [sum of previous two queries]
- [machine learning] 2017 “machine learning”: 2,333 results (194.4/mo)
- [computation & language] 2017 cs.CL: 1,870 results (155.8/mo)
- [artificial intelligence] 2017 cs.AI: 1,153 results (96.1/mo)
- [information retrieval] 2017 cs.IR: 469 results (39.1/mo)
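The monthly averages in this list are simply the yearly counts divided by 12. A minimal sketch that recomputes them from the counts quoted above (the dictionary layout is illustrative):

```python
# 2017 Advanced Search result counts, as listed above (queried 2018-11-30).
counts_2017 = {
    "cs.LG": 2_195,
    "stat.ML": 1_579,
    '"machine learning"': 2_333,
    "cs.CL": 1_870,
    "cs.AI": 1_153,
    "cs.IR": 469,
}

# The combined figure in the list is the sum of the two separate queries.
counts_2017["cs.LG OR stat.ML"] = counts_2017["cs.LG"] + counts_2017["stat.ML"]

# Average submissions per month, rounded to one decimal place.
monthly_avg = {name: round(count / 12, 1) for name, count in counts_2017.items()}

for name, count in counts_2017.items():
    print(f"{name:>20}: {count:>5,} results ({monthly_avg[name]}/mo)")
```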
arXiv monthly submission rates [data: csv] are as follows:
[Image source.]
- The arXiv monthly submission rates indicated (2018-11-30) that 13,446 submissions were received in October 2018.
- The 1991-2017 arXiv submission rate statistics indicated that “The cs [computer science] rate started as well to ramp up in 2007, grew to exceed the hep rate in calendar year 2012, continued on a very fast growth rate 2011-2015, and on an even faster growth rate since 2015 (driven largely by the computer vision, machine learning, and computational linguistics communities).”
There are more arXiv submission statistics (1991-2017) here. The arXiv Bulk Data Access guidelines and OAI-PMH / API may allow programmatic access to those data (à la PubMed Publications by Year), but I haven’t pursued that option.
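For anyone who does want to try the programmatic route, here is a rough, untested-against-the-live-service sketch of how a per-category yearly count might be requested from the arXiv API. It assumes the query endpoint at export.arxiv.org, the `cat:` and `submittedDate:[... TO ...]` query fields, and an `opensearch:totalResults` element in the Atom response; the function names and the sample feed are hypothetical, and the counts returned may not match the Advanced Search figures above.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"          # assumed endpoint
OPENSEARCH_NS = "{http://a9.com/-/spec/opensearch/1.1/}"  # assumed namespace


def build_count_url(category: str, year: int) -> str:
    """Build a query URL for one category and one calendar year."""
    query = f"cat:{category} AND submittedDate:[{year}01010000 TO {year}12312359]"
    params = {"search_query": query, "start": 0, "max_results": 1}
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def parse_total_results(atom_xml: bytes) -> int:
    """Extract the total hit count from an Atom response."""
    root = ET.fromstring(atom_xml)
    total = root.find(f"{OPENSEARCH_NS}totalResults")
    if total is None or total.text is None:
        raise ValueError("no opensearch:totalResults element in response")
    return int(total.text)


def fetch_count(category: str, year: int) -> int:
    """Query the live API (not called here; needs network access)."""
    with urllib.request.urlopen(build_count_url(category, year)) as response:
        return parse_total_results(response.read())


# Offline demonstration with a made-up minimal feed (the count is invented).
SAMPLE_FEED = b"""<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <opensearch:totalResults>42</opensearch:totalResults>
</feed>"""

print(build_count_url("cs.LG", 2017))
print(parse_total_results(SAMPLE_FEED))
```

Splitting URL construction from response parsing keeps both pieces checkable without touching the network; only `fetch_count` depends on the live service.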
This dblp page, Statistics – Publications per year, gives the following plot:
[Image source. (Note: the source data in their csv file is faulty: mis-formatted; missing data.)]
Paper Gestalt (2010) presented this paper,
[Image source.]
… while Deep Paper Gestalt (Dec 2018) – which built upon that work – presented this figure:
[Image source.]