What's hot in databases? (updated for 1975 to 2013!)

Trending topics in VLDB, via key words in the titles of publications from 1975 to 2013.

Last year's analysis and commentary is at vldb2012.html.

The more interesting cluster analysis is at index.html.

Thanks to @samrmadden for the 2000 - 2011 titles.
Thanks to DBLP for the rest!
If you want, the data and scripts are on GitHub.

Trends across all time

These trends are based on the keywords across all years (1975 to 2013) of VLDB publications.
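For readers who want to reproduce the counts, here is a minimal sketch of how per-year keyword counts could be built from titles. It is not the script from the GitHub repository: the function name `keyword_counts`, the stop-word list, and the tokenization are all assumptions, and the sample titles are made up purely for illustration. It also skips stemming, so "graph" and "graphs" are counted separately.

```python
import re
from collections import Counter, defaultdict

# A tiny, hypothetical stop-word list; a real analysis would use a longer one.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}

def keyword_counts(papers):
    """Map each year to a Counter of keywords found in that year's titles.

    `papers` is an iterable of (year, title) pairs.
    """
    by_year = defaultdict(Counter)
    for year, title in papers:
        # Lowercase the title and keep runs of letters as candidate keywords.
        words = re.findall(r"[a-z]+", title.lower())
        by_year[year].update(w for w in words if w not in STOP_WORDS)
    return by_year

# Made-up titles, for illustration only; these are not real VLDB papers.
papers = [
    (2012, "Efficient Processing of Graph Queries"),
    (2013, "Queries on Distributed Graphs"),
    (2013, "Scaling Graph Queries in the Cloud"),
]
counts = keyword_counts(papers)
print(counts[2013]["queries"])
```

Reading one keyword's count for every year from 1975 to 2013 gives a series of 39 numbers, which is the shape of each series listed on this page.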

Most Stable

As we would expect, the most consistently popular keywords are all tightly related to our expertise as a community. Databases, indexes, joins, models, and management are all at the core of what we do.

database
29,8,36,39,24,26,34,19,25,26,17,24,23,13,22,18,27,24,24,27,32,27,16,20,12,20,15,19,17,30,25,16,16,19,19,27,18,21,7
systems
9,3,17,15,13,12,19,8,12,11,5,13,10,5,3,6,10,8,6,13,17,14,4,3,8,7,12,14,10,20,18,11,15,17,8,8,16,7,4
distributed
4,0,6,7,3,9,6,4,5,5,2,3,3,5,1,2,3,4,1,3,0,3,1,1,1,1,3,3,5,5,6,4,5,8,4,3,3,6,8
management
6,2,7,7,6,5,3,4,2,6,2,10,7,2,5,3,7,7,7,4,8,8,3,3,3,5,8,6,8,15,8,5,9,8,6,3,8,2,1
relational
8,4,12,9,8,3,8,5,4,8,7,8,7,2,8,4,2,6,0,2,3,3,2,2,3,5,5,5,7,9,5,3,3,1,5,4,1,5,3
data
39,13,51,48,29,34,46,25,32,33,24,32,27,15,25,29,37,33,33,37,41,41,37,39,29,42,45,53,65,72,50,45,61,66,51,61,60,42,30
algorithms
0,0,1,0,2,1,2,1,0,3,2,3,2,5,3,6,2,2,2,5,1,2,5,6,2,2,1,4,0,4,5,6,4,2,3,2,3,4,3
queries
3,0,3,0,4,4,3,2,3,9,4,11,13,8,8,7,7,8,8,7,11,15,8,10,13,15,19,11,29,32,37,37,33,37,22,32,29,23,24
model
9,4,6,7,4,3,9,4,9,3,7,4,3,3,4,6,10,11,3,5,4,4,0,1,2,2,1,2,4,2,2,7,8,2,4,3,4,1,1
semantic
2,1,1,1,2,3,4,4,2,2,3,1,1,2,1,0,5,1,1,2,2,1,0,1,3,2,1,1,5,5,5,4,4,1,2,4,1,2,0
dynamic
1,0,0,1,1,0,0,0,1,3,1,1,2,3,1,1,1,2,3,1,3,1,1,1,2,1,4,4,2,0,4,2,4,4,4,3,0,3,2
performance
2,1,3,1,1,1,2,0,0,2,1,3,2,4,0,2,4,4,3,6,3,1,1,5,4,3,2,1,3,3,2,1,3,5,4,4,2,3,2

What's hot?

Here, we look for keywords that have an overall upward trend.

It's interesting that queries, indices, and efficiency are new in the past two decades. We can also see the web burst onto the scene, bringing search along with it. We see the excitement about streaming, then its decline. Distributed processing is on the rise in parallel with "big data".
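The page does not say how "an overall upward trend" is measured. One plausible way, shown here as a sketch under that assumption, is the least-squares slope of a keyword's yearly counts against the year index. The function name `trend_slope` is made up; the two series are copied from this page.

```python
def trend_slope(counts):
    """Least-squares slope of counts against the year index 0, 1, 2, ..."""
    n = len(counts)
    mean_x = (n - 1) / 2
    mean_y = sum(counts) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(counts))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var

# Yearly title counts for 1975 to 2013, copied from the series on this page.
graphs = [0,0,0,0,0,0,1,0,0,1,0,1,0,0,1,0,0,0,0,3,0,0,0,0,0,1,0,1,0,0,3,1,4,7,8,12,15,14,9]
model = [9,4,6,7,4,3,9,4,9,3,7,4,3,3,4,6,10,11,3,5,4,4,0,1,2,2,1,2,4,2,2,7,8,2,4,3,4,1,1]

print("graphs slope is positive:", trend_slope(graphs) > 0)
print("model slope is negative:", trend_slope(model) < 0)
```

A raw slope favors keywords with large counts such as "data"; dividing by the series mean would be one way to compare keywords of different popularity.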

queries
3,0,3,0,4,4,3,2,3,9,4,11,13,8,8,7,7,8,8,7,11,15,8,10,13,15,19,11,29,32,37,37,33,37,22,32,29,23,24
data
39,13,51,48,29,34,46,25,32,33,24,32,27,15,25,29,37,33,33,37,41,41,37,39,29,42,45,53,65,72,50,45,61,66,51,61,60,42,30
based
2,0,0,3,0,3,0,1,2,2,2,6,2,2,2,5,2,4,5,4,3,2,2,7,4,6,6,4,13,14,14,9,5,10,10,13,15,12,7
efficient
1,1,0,2,0,1,1,1,0,0,3,1,1,3,0,4,1,2,0,4,3,3,5,2,2,4,5,6,7,8,18,15,16,15,13,11,14,12,7
distributed
4,0,6,7,3,9,6,4,5,5,2,3,3,5,1,2,3,4,1,3,0,3,1,1,1,1,3,3,5,5,6,4,5,8,4,3,3,6,8
processing
1,0,2,0,4,3,6,2,1,6,2,6,1,1,2,5,2,3,1,1,3,2,4,3,2,5,10,4,9,10,8,10,11,13,6,15,10,6,6
graphs
0,0,0,0,0,0,1,0,0,1,0,1,0,0,1,0,0,0,0,3,0,0,0,0,0,1,0,1,0,0,3,1,4,7,8,12,15,14,9
web
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,2,5,9,16,11,14,9,8,4,5,12,17,8,14,7,5
index
1,0,0,0,0,0,0,0,0,0,1,1,2,2,0,4,0,2,0,6,1,1,2,2,7,5,7,6,7,7,7,10,10,8,8,10,8,8,4
search
2,0,3,6,2,0,0,0,0,1,0,2,1,1,1,1,1,0,6,1,4,1,4,5,1,6,2,9,7,9,8,5,12,16,6,18,9,7,4
large
10,1,6,2,2,3,3,1,1,2,0,2,1,5,0,0,2,0,2,3,7,5,2,5,2,1,5,3,5,6,5,3,4,6,4,8,9,7,5
streams
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,1,1,8,13,20,12,9,11,8,10,12,3,6,4

What's not?

On the long time scale, database, models and relational as keywords are dying!

We can see that object oriented, rules, and activity came and went. We always talk about XML being dead, and it sure looks that way on paper, along with streams, services, and caching.

database
29,8,36,39,24,26,34,19,25,26,17,24,23,13,22,18,27,24,24,27,32,27,16,20,12,20,15,19,17,30,25,16,16,19,19,27,18,21,7
systems
9,3,17,15,13,12,19,8,12,11,5,13,10,5,3,6,10,8,6,13,17,14,4,3,8,7,12,14,10,20,18,11,15,17,8,8,16,7,4
model
9,4,6,7,4,3,9,4,9,3,7,4,3,3,4,6,10,11,3,5,4,4,0,1,2,2,1,2,4,2,2,7,8,2,4,3,4,1,1
object
0,0,0,0,0,0,0,1,1,2,3,3,5,2,8,7,8,10,11,12,7,6,6,5,3,5,1,1,0,3,0,1,2,1,5,5,0,3,0
oriented
0,0,2,0,0,0,0,1,0,0,2,1,2,1,2,3,7,6,4,7,4,2,2,1,2,0,0,1,0,0,2,0,0,2,4,2,2,0,0
management
6,2,7,7,6,5,3,4,2,6,2,10,7,2,5,3,7,7,7,4,8,8,3,3,3,5,8,6,8,15,8,5,9,8,6,3,8,2,1
cached
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,1,1,0,3,0,2,4,1,8,4,4,3,4,1,0,2,1,1,0,2,1
dimensional
0,0,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,1,0,0,1,4,6,4,4,6,2,2,4,5,2,3,5,4,2,1,0,0,0
server
0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,1,0,2,1,2,2,2,4,4,3,2,0,2,1,2,0,0,1,2,2,3,0,0,0
data
39,13,51,48,29,34,46,25,32,33,24,32,27,15,25,29,37,33,33,37,41,41,37,39,29,42,45,53,65,72,50,45,61,66,51,61,60,42,30
xml
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,4,15,12,22,18,21,16,10,12,4,7,2,2,0
streams
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,1,1,8,13,20,12,9,11,8,10,12,3,6,4

The New Hotness

Let's now look at keywords that have burst onto the scene since I started graduate school (2007). These keywords are selected by computing the ratio of "the average number of times a keyword is used since 2010" to "the average before 2010".
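The ratio described above can be sketched as follows. This is an illustration rather than the original script: the name `hotness_ratio` is made up, "since 2010" is assumed to include 2010, and returning infinity for a keyword that never appeared before 2010 is also an assumption. The series are the ones listed on this page.

```python
FIRST_YEAR = 1975  # first year covered by every series on this page

def hotness_ratio(counts, split_year=2010, first_year=FIRST_YEAR):
    """Average yearly count since split_year divided by the average before it."""
    split = split_year - first_year
    before, since = counts[:split], counts[split:]
    avg_before = sum(before) / len(before)
    avg_since = sum(since) / len(since)
    if avg_before == 0:
        # Assumption: a keyword unseen before the split counts as infinitely hot.
        return float("inf") if avg_since > 0 else 0.0
    return avg_since / avg_before

# Series from this page; runs of leading zero years are written compactly.
mapreduce = [0] * 34 + [4, 6, 10, 9, 1]
crowd = [0] * 36 + [2, 3, 5]
database = [29,8,36,39,24,26,34,19,25,26,17,24,23,13,22,18,27,24,24,27,32,27,16,20,12,20,15,19,17,30,25,16,16,19,19,27,18,21,7]

for name, series in [("mapreduce", mapreduce), ("crowd", crowd), ("database", database)]:
    print(name, hotness_ratio(series))
```

A keyword like "crowd" has no uses before 2010, so the plain ratio is undefined; adding a small constant to both averages would be an alternative to returning infinity.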

MapReduce, scaling, and the cloud are still at the peak of the Gartner hype cycle, and there are so many systems that it's hard to even compare them.

Web data is inherently uncertain, so we need probabilistic techniques to search for similar data.

Finally, it's nice to see that crowdsourcing is still trending upwards. Perhaps work with a more human and social angle is ready for the limelight.

graphs
0,0,0,0,0,0,1,0,0,1,0,1,0,0,1,0,0,0,0,3,0,0,0,0,0,1,0,1,0,0,3,1,4,7,8,12,15,14,9
mapreduce
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,6,10,9,1
crowd
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,3,5
social
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,2,4,1,7,3,4
uncertain
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,2,2,2,6,1,8,7,5,2
awareness
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,2,3,2,2,9,4,3,2
cloud
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,6,3,4,1
match
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,1,0,0,1,2,1,4,6,4,7,1,5,6,8,6,3
web
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,2,5,9,16,11,14,9,8,4,5,12,17,8,14,7,5
probabilistic
0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,1,0,0,1,0,4,1,1,4,5,3,7,4,7,0
scale
0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,1,1,1,4,2,4,3,3,0,2,3,3,3,8,0,6
similarity
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,3,2,1,0,5,1,0,0,0,3,4,5,4,6,5,6,3
search
2,0,3,6,2,0,0,0,0,1,0,2,1,1,1,1,1,0,6,1,4,1,4,5,1,6,2,9,7,9,8,5,12,16,6,18,9,7,4
scalability
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,2,2,0,2,1,5,2,1,0,2,2,3,3,6,7,8,4,5,1
labeling
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,1,3

Top topics by year

It's good to see that data, query and databases are never far from our minds.
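Given per-keyword series like the ones earlier on this page, a top-k list for a single year could be derived roughly as below. This is a sketch: `top_keywords` is a made-up name, breaking ties alphabetically is an assumption, and only three series are included to keep the example short.

```python
def top_keywords(series, year, k=10, first_year=1975):
    """Return the k (keyword, count) pairs with the highest count in `year`."""
    idx = year - first_year
    # Sort by count descending; ties are broken alphabetically (an assumption).
    ranked = sorted(series.items(), key=lambda kv: (-kv[1][idx], kv[0]))
    return [(word, counts[idx]) for word, counts in ranked[:k]]

# Three of the series from this page, covering 1975 to 2013.
series = {
    "queries": [3,0,3,0,4,4,3,2,3,9,4,11,13,8,8,7,7,8,8,7,11,15,8,10,13,15,19,11,29,32,37,37,33,37,22,32,29,23,24],
    "graphs": [0,0,0,0,0,0,1,0,0,1,0,1,0,0,1,0,0,0,0,3,0,0,0,0,0,1,0,1,0,0,3,1,4,7,8,12,15,14,9],
    "distributed": [4,0,6,7,3,9,6,4,5,5,2,3,3,5,1,2,3,4,1,3,0,3,1,1,1,1,3,3,5,5,6,4,5,8,4,3,3,6,8],
}

for word, count in top_keywords(series, 2013, k=3):
    print(word, count)
```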

2013
queries 24
data 20
graphs 9
distributed 8
based 7
database 7
efficient 7
processing 6
scale 6
crowd 5

2012
queries 23
database 21
data 16
graphs 14
based 12
efficient 12
optimal 12
mapreduce 9
index 8
join 8

2011
data 39
queries 29
database 18
systems 16
based 15
graphs 15
efficient 14
web 14
mapreduce 10
processing 10

2010
queries 32
data 29
database 27
search 17
processing 15
based 13
graphs 12
streams 12
efficient 11
index 10

2009
data 28
queries 21
database 19
web 17
efficient 13
networks 11
based 10
streams 10
graphs 8
index 8

2008
data 45
queries 37
database 19
systems 17
search 16
efficient 15
processing 13
web 12
xml 12
based 10

2007
data 43
queries 33
database 16
efficient 16
systems 15
processing 11
streams 11
index 10
search 10
xml 10

2006
queries 37
data 26
database 16
xml 16
efficient 15
optimal 11
systems 11
index 10
processing 10
based 9

How is Eugene's career looking?

The topics I've decided to work on look like a mixed bag. Provenance or lineage seems to be down, but I believe in second chances. It's good to know that data analysis and workload-driven research are stable and increasingly popular topics!

lineage
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,2,3,2,2,1,1,1,1
visual
0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,2,0,2,1,2,0,0,0,2,0,1,1,0,1,0,1,1,3,2,0,1
analysi
3,0,2,3,0,2,2,0,1,2,1,0,1,1,0,1,1,0,2,1,0,4,0,3,0,1,0,2,2,1,0,3,3,4,7,5,5,3,2
bigdata
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,2
crowd
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,3,5
workload
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,0,0,0,0,0,0,0,0,2,1,0,0,2,0,4,3,4,0,2