Digital Approaches to Early Music and Linked Music Hackathon

Our partners at OeRC have recently been running a couple of mini-projects under the Semantic Media Network grant: Semantic Linking and Integration of Content, Knowledge and Metadata in Early Music (SLICKMEM) and Semantic Linking of BBC Radio (SLoBR). Both projects have a strong Linked Data remit, publishing a significant amount of new Linked Data to the Web. As they drew to a close, Kevin Page was keen to round them off with a hack day at which the developer community could experiment with the data sources the projects had generated and enhanced.

Kevin approached us with the idea of holding this event at Goldsmiths and running it in conjunction with a seminar on the musicological issues surrounding some of the data sets. My original idea was to make early instrumental music the subject of the seminar, as the data sets that SLICKMEM and SLoBR cover include a significant selection of lute music drawn from the Electronic Corpus of Lute Music (ECOLM). However, as the expressions of interest came in and we started to build up a list of data sets to use for the hack day, it became clear that we should broaden the remit.

So the seminar we held was called Digital Approaches to Early Music. We had five speakers: Katherine Butler (Oxford), Tim Crawford (Goldsmiths), Simon McVeigh (Goldsmiths), Laurent Pugin (RISM), and Frans Wiering (Utrecht), and about fifteen participants, which made for a nice, small, focused group.

I started the day with a short welcome in which I outlined some of the history of our digital musicology research group here at Goldsmiths and explained the context in which the Semantic Media Network projects had been carried out. The rest of the day was then chaired by David Lewis.

Frans Wiering gave the first paper, reporting on work he has been doing with Charlie Inskip at UCL: a survey of the current discipline of musicology, focusing on its use of digital methods and resources. Frans reported that respondents seemed broadly in favour of digital techniques in music research, but that they were also able to articulate several important limitations, such as the risk of digital techniques leading to rushed conclusions, closed access to resources, and search results that are too numerous and poorly filtered. In the discussion that followed, it was noted that access to resources seems to be one of the highest priorities, and that libraries therefore have an important part to play in the uptake of digital techniques. The discussion also touched on how few respondents in Frans' sample were interested in matters beyond access to resources, such as the availability of software or new digital research methods.

Katherine Butler gave a presentation on the techniques her team on the Tudor Partbooks project at Oxford are using to deal with a particularly problematic music manuscript. The books copied by John Sadler used a particularly acidic ink, which over the years has caused significant burn-through. Katherine described a variety of digital image editing techniques the team have been using to try to produce cleaner versions of the images. A significant question Katherine raised was what they are actually trying to end up with. Do they want to create images that look as the pages would have done when Sadler first copied the music? Or do they want to create something that looks as the pages would today if Sadler had not used such acidic ink? They also do not want to impose any editorial interventions at this stage; for example, they will not correct any copying errors. Katherine argued that this gives the work a slightly problematic status: it is not a facsimile, but it is not really an edition either; it sits somewhere in between. There was some discussion of possible techniques for quantifying the level of intervention applied to each published page, for example an appropriate digital distance measure between the original and restored images. Another significant discussion point concerned the different audience groups and how a digital edition could be made to serve their needs.
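The seminar did not settle on a particular distance measure. What follows is a minimal sketch of one simple candidate. It assumes that both images are available as equally sized greyscale pixel grids, with 0 for black and 255 for white. The sketch computes two numbers: the mean absolute per-pixel difference scaled to lie between 0 and 1, and the fraction of pixels changed by more than a threshold. The function names, the pixel representation and the threshold of 10 are all hypothetical choices made for illustration. This is not the Tudor Partbooks team's method.

```python
from typing import List

# Hypothetical representation: rows of greyscale pixels, 0 (black) to 255 (white).
Image = List[List[int]]


def _check_same_shape(original: Image, restored: Image) -> None:
    """Raise if the two images differ in size or are empty."""
    if len(original) != len(restored) or any(
        len(a) != len(b) for a, b in zip(original, restored)
    ):
        raise ValueError("images must have the same dimensions")
    if not original or not original[0]:
        raise ValueError("images must not be empty")


def intervention_score(original: Image, restored: Image) -> float:
    """Mean absolute per-pixel difference, scaled to the range [0, 1].

    0.0 means the restored page is identical to the original scan;
    1.0 would mean every pixel flipped between pure black and pure white.
    """
    _check_same_shape(original, restored)
    total = 0
    count = 0
    for row_o, row_r in zip(original, restored):
        for p_o, p_r in zip(row_o, row_r):
            total += abs(p_o - p_r)
            count += 1
    return total / (count * 255)


def fraction_changed(original: Image, restored: Image, threshold: int = 10) -> float:
    """Fraction of pixels whose value changed by more than `threshold` (hypothetical default)."""
    _check_same_shape(original, restored)
    changed = 0
    count = 0
    for row_o, row_r in zip(original, restored):
        for p_o, p_r in zip(row_o, row_r):
            if abs(p_o - p_r) > threshold:
                changed += 1
            count += 1
    return changed / count


# Toy example: one mid-grey pixel (burn-through, say) has been whitened.
scan = [[0, 255], [128, 255]]
cleaned = [[0, 255], [255, 255]]
print(intervention_score(scan, cleaned))  # 127 / (4 * 255), roughly 0.1245
print(fraction_changed(scan, cleaned))    # 1 of 4 pixels changed: 0.25
```

Reporting both numbers alongside each published page would let readers distinguish light retouching spread across a whole page from heavy changes confined to a few areas. A real project would probably want a perceptually informed measure instead of raw pixel differences.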

Laurent Pugin gave an introduction to RISM, focusing especially on its recent digital infrastructure, including its Linked Data facilities. He then gave several demonstrations of how data from RISM can be used to visualise things like publishing trends, in a bar chart or on a map of Europe. Laurent also mentioned RISM's use of Plaine and Easie Code for encoding music incipits and its interest in experimenting with MEI for this too. He described Verovio, the tool he has been developing for rendering MEI-encoded music notation in the browser. RISM is also exploring ways of providing content-based musical search using tools such as ThemeFinder. Laurent raised the discussion point of distinguishing between the logical and graphical domains in music encoding.

Simon McVeigh described his team's work on a Transforming Musicology mini-project analysing data from concert programmes of the 18th and 19th centuries. They have produced Linked Data resources from a variety of concert programme data sources and have been working on various visualisations and analysis methods. Simon raised a number of key discussion points. He asked whether quantitative analysis of this kind is really a new method for music research or just the same method carried out faster. He raised concerns about maintaining scholarly control when projects make use of crowd-sourcing. He mentioned the problematic notion of accuracy when discussing tools such as optical music recognition with musicologists: a figure such as "90% accuracy" needs some explaining outside a scientific or engineering context. Simon argued that many of the quantitative techniques are "suggestive" but do not really analyse; real analysis requires deeper thinking. As an example, Simon mentioned how their visualisation of concerts by country "constrains your thinking into an outmoded nationalism", and that this happens because the data make such questions easy to ask. Subsequent discussion covered how to deal with the noise in OCR output from newspapers and concert bills, particularly adverts. Simon argued that the adverts can give useful context and that discarding them during data preparation would deny future users access to this context. This seemed a good example of a tension between, on the one hand, the drive to automate and to work at scale (where such information is considered "noise") and, on the other, the drive to make resources available for careful consideration by scholars (where such information would be considered invaluable context). In short, what counts as "noise" is subjective.
Another interesting discussion point concerned the availability of data such as attendance figures at historic concerts. Simon argued that the broader question here is how you assess value, and whether such data can be considered a quantitative measure of value.

The last speaker of the day was Tim Crawford, who changed the topic of his talk at the last minute! Instead of talking about the technical infrastructure for analysing networks in music history, Tim gave an account of his research into Lord Danby's Lutebook. He also offered a personal reflection on how the technology he has been promoting and experimenting with over the past twenty years of his research career has, and has not, met his expectations and requirements. Tim's talk was well received, both as a demonstration of why digital approaches in music research can be really powerful and as a warning about where they fall short.

The concluding discussion covered resource publication and how current research funding seems to be moving away from resource generation, away from "mere digitisation". Related to this was the importance of researchers being able to draw peer-reviewed publications out of resource-generation work. It was also argued that future iterations of the REF may be more amenable to considering digital resources as valid outputs. Following this theme, the sustainability of digital outputs was discussed, especially the merits of Linked Open Data in this regard. The current lack of an obvious curator of humanities digital outputs was noted, particularly since the closure of the Arts and Humanities Data Service in 2008.

Another discussion point was the level of digital literacy amongst musicologists, and how low levels, especially when it comes to writing code to automate processing tasks and the like, could be hindering the uptake of digital approaches. Related to this, the difficulty scholars face in finding out about best practice in digital methods was raised. It was largely agreed that deeper training in digital techniques is required for early career researchers, and even for students.

The following day (Friday 9 October) was the Linked Music Hackathon, which will be the subject of part two of this post...