Dangerous Liaisons - UW Libraries

July 6, 2017

Data and the Fine Arts: Thinking Through a New Resource

Laura Dimmit

When we think about research in the fine arts, it may seem a little counterintuitive to think in terms of data-driven or data-rich work. However, a recent release of historical data from the Carnegie Hall Archives in New York City offers a great opportunity to think through the what and why of arts-based data.

The Carnegie Hall Performance History data set contains more than 100 years of performance records. By releasing this data under a public domain license, the archives not only allow researchers and the general public to examine and manipulate the raw data, but also let them build upon it and create additional works.

In addition to a full GitHub repository, Carnegie Hall has created a SPARQL endpoint that lets users explore the data and run test queries without downloading the full data set. (What is SPARQL? In short, it is a query language for linked data stored in RDF format. Honestly, Wikipedia does a good job of explaining it, so read more there.)
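For readers who would rather try an endpoint from a script than from a web form, here is a minimal Python sketch that builds a SPARQL request using only the standard library. It assumes the endpoint follows the standard SPARQL 1.1 Protocol (the query travels as a URL-encoded `query` parameter, and JSON results are requested through the `Accept` header). The address `https://example.org/sparql` is a placeholder, not Carnegie Hall's actual URL, and the query deliberately assumes nothing about the data set's vocabulary.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder address (hypothetical): substitute the endpoint URL Carnegie Hall publishes.
ENDPOINT = "https://example.org/sparql"

# A generic "what's in here?" query. It works on any SPARQL endpoint because
# it makes no assumptions about the data set's vocabulary.
PEEK_QUERY = """
SELECT ?subject ?predicate ?object
WHERE { ?subject ?predicate ?object }
LIMIT 10
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) an HTTP GET request for a SPARQL query.

    The query is URL-encoded into a `query` parameter, and the Accept header
    asks for results in the SPARQL JSON results format.
    """
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = build_sparql_request(ENDPOINT, PEEK_QUERY)
print(req.full_url)
```

To run the query for real, substitute the published endpoint address and pass the request to `urllib.request.urlopen`; a JSON response lists the matching rows under `results`, then `bindings`.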

Screenshot of the Performance History SPARQL endpoint

Let’s take one of the suggested “sample queries” and dig a little deeper. How about “Number of works by Bach performed each year”?


Screenshot of the available “sample queries” for users to experiment with.


This query draws on several types of information included in the data set: the date of each performance, and the title and creator or composer of each work performed.
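To make the query's logic concrete, here is a minimal Python sketch of the same kind of aggregation, run over a handful of in-memory records. The records, their field names, and the helper `works_per_year` are hypothetical illustrations, not the Carnegie Hall schema or data, and the sketch assumes “number of works” means distinct titles per year rather than total performances.

```python
from collections import defaultdict

# Invented placeholder records, NOT real Carnegie Hall data. Each one carries
# the three fields the sample query relies on: date, work title, and composer.
performances = [
    {"date": "1901-02-10", "title": "Work A", "composer": "Bach"},
    {"date": "1901-03-15", "title": "Work B", "composer": "Bach"},
    {"date": "1901-11-20", "title": "Work A", "composer": "Bach"},
    {"date": "1902-01-08", "title": "Work C", "composer": "Handel"},
    {"date": "1902-04-02", "title": "Work B", "composer": "Bach"},
]


def works_per_year(records, composer):
    """Count distinct works by `composer` performed in each calendar year.

    Assumes ISO 8601 date strings (YYYY-MM-DD). A work performed several
    times in one year is counted once (an assumption about what
    "number of works" means).
    """
    titles_by_year = defaultdict(set)
    for rec in records:
        if rec["composer"] == composer:
            year = int(rec["date"][:4])  # first four characters are the year
            titles_by_year[year].add(rec["title"])
    # Sort by year so the output reads chronologically.
    return {year: len(titles) for year, titles in sorted(titles_by_year.items())}


print(works_per_year(performances, "Bach"))  # {1901: 2, 1902: 1}
```

Years with no matching records simply do not appear in the result, so a missing year means “nothing in the data,” which is not necessarily the same as “nothing performed.”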

Screenshot of the first section of results from the “Number of works by Bach performed each year” query.


The results of this query offer a number of jumping-off points. Perhaps you want to know how the popularity of a classic composer like Bach has changed over time. Perhaps you want to compare how often Bach is performed relative to one of his contemporaries, like Handel. Perhaps you are curious about which composers have been most popular over time in different parts of the world, or in different parts of the United States. A single query like this can inform a variety of research agendas.
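Comparing composers fairly usually means normalizing by how many performances took place each year, so that a busy season does not masquerade as a surge in one composer's popularity. Here is a minimal Python sketch of one way to do that. The `(year, composer)` records and the `composer_share` helper are invented for illustration and do not come from the Carnegie Hall data set.

```python
from collections import Counter

# Invented placeholder records of (year, composer), NOT real Carnegie Hall data.
performances = [
    (1901, "Bach"), (1901, "Handel"), (1901, "Beethoven"), (1901, "Bach"),
    (1902, "Handel"), (1902, "Brahms"),
]


def composer_share(records, composers):
    """Return, for each year, each composer's share of that year's performances."""
    totals = Counter(year for year, _ in records)  # all performances per year
    hits = Counter(
        (year, name) for year, name in records if name in composers
    )  # performances per (year, composer) for the composers of interest
    # Counter returns 0 for missing keys, so absent composers get a 0.0 share.
    return {
        year: {name: hits[(year, name)] / totals[year] for name in composers}
        for year in sorted(totals)
    }


shares = composer_share(performances, ["Bach", "Handel"])
print(shares)  # {1901: {'Bach': 0.5, 'Handel': 0.25}, 1902: {'Bach': 0.0, 'Handel': 0.5}}
```

Working with shares rather than raw counts keeps the comparison meaningful even when the number of recorded events varies a lot from year to year.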

However, it is equally important to preface the use of a data set like this with an acknowledgment of its limits. While the Carnegie Hall data set is expansive (more than 50,000 events), it is not comprehensive. As the data set overview states:

“Since our archives were not established until 1986, there are some gaps in these records, which we continue to fill using sources like digitized newspaper listings and reviews…”

Along with these literal limits, there are more contextual and historical limits to consider. Which types of artists and groups were invited to perform at a venue like Carnegie Hall? Who is, and was, able to attend performances there? How generalizable are the trends that can be identified within the performance history for a single venue?

New resources like this challenge me to be more creative and more thoughtful in my approach to supporting the research of students and faculty in my liaison areas. They remind me to cast a wider net when looking for resources to include in an instruction session or class guide; they remind me to ask bigger questions about new research projects, not just “What is your research question?” but also “What do you want this research to look like? How do you want people to interact with your results?” They also encourage me to continue to grow my data literacy, so that I can support the fullest, most vibrant suite of arts research.