Listening with machines? The challenges of AI for oral history and digital public history in libraries

A photo of the Belval campus of the University of Luxembourg, with a sculpture of a large disk on tripod legs and a conference poster in the foreground

A conference paper I wrote with Charlie Morgan for IFPH2024, the 7th World Conference of the International Federation for Public History, in September 2024.

Abstracts weren't available in the conference programme so I've posted ours below. The abstract was written in November 2023, before we knew how much the ransomware attack in October 2023 was going to make our work with digital and digitised collections difficult-to-impossible for the next year or two.

Listening with machines? The challenges of AI for oral history and digital public history in libraries

Mia Ridge, Digital Curator, British Library; Charlie Morgan, Oral History Archivist, British Library

Almost every aspect of our personal and professional lives has been affected by 'AI' and machine-learning based tools. Digital public history is no exception. How does AI change the types of experiences that libraries, museums and archives can create for the public? How does it change our understanding of participatory history when family and community historians might want to use AI tools with digitised or born digital collections? What does it mean to share authority and co-create ‘knowledge’ with machine learning products, especially AI tools that see the world through the lens of Silicon Valley’s capitalist ‘winner takes all’ attitude?

This presentation shares work at the British Library on an AI Strategy and Ethical Guide for digital scholarship, with a particular focus on the implications of AI for archived oral histories. It will include a case study of the use and applicability of corpus linguistic and digital humanities tools to search interviews, identify themes and select sections of audio for close listening. We will also consider the lessons from this case study for our AI strategy more broadly.

What are the ethical, practical and research implications of using AI to transcribe, summarise or analyse oral histories? What is the Library's role, and that of other professional bodies, in providing guidance for research students and others conducting or analysing interviews on platforms with built-in AI tools (for example, Microsoft Teams / OpenAI's Whisper), or exploring how AI could make oral histories more accessible and discoverable? How might AI tools change processes for quality checking records, and how should AI-generated metadata, transcriptions and descriptions be labelled?

This work builds on previous considerations of the implications of AI for digital public history projects, challenging established models for working with crowdsourcing, user-generated content, and other forms of digital participatory history.

'AI and the Digital Humanities' session at CILIP's 2024 conference

I was invited to chair a session on 'AI and the digital humanities' at CILIP's 2024 conference, with speakers Ciaran Talbot (Associate Director AI & Ideas Adoption, University of Manchester Library) and Glen Robson (IIIF Technical Co-ordinator, International Image Interoperability Framework Consortium). CILIP is 'the UK library and information association'.

I wrote a blog post about it for the British Library's Digital Scholarship blog and CILIP also featured it on their AI Hub: AI and the Digital Humanities at CILIP Conference 2024.

'Community Engagement and Special Collections' talk

In April 2024 I was one of four presenters at the Association for Manuscripts and Archives in Research Collections (AMARC)'s Spring Meeting on 'Community Engagement and Special Collections', sharing our work on 'successful projects and strategies for engaging public audiences in meaningful ways through in-person events and digital outreach activities'.

I presented on 'Living with Machines: Crowdsourcing transcriptions for digitised historical collections of the British industrial revolution'. The video from the seminar is below.

New data paper and datasets from crowdsourcing on Living with Machines

After lots of hard work by me, Nilo Pedrazzini, Miguel V., Arianna Ciula and Barbara McGillivray, we have a data paper in the Journal of Open Humanities Data: Language of Mechanisation Crowdsourcing Datasets from the Living with Machines Project.

And huge thanks to the thousands of Zooniverse volunteers who annotated 19th century newspaper articles to create the datasets we've published alongside the data paper!

Abstract: We present the ‘Language of Mechanisation’ datasets with examples of re-use in visualisations and analysis. These reusable CSV files, published on the British Library’s Research Repository, contain automatically-transcribed text from 19th century British newspaper articles. Volunteers on the Zooniverse crowdsourcing platform took part in tasks that asked ‘How did the word x change over time and place?’ They annotated articles with pre-selected meanings (senses) for the words coach, car, trolley and bike.

The datasets can support scholarship in a range of historical and linguistic research areas, as well as research on crowdsourcing and online volunteering behaviours, data processing and data visualisation methodologies.
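As a rough illustration of the kind of re-use the abstract describes, here's a minimal Python sketch that counts how often each annotated sense of a word appears per decade. The column names (`word`, `year`, `place`, `sense`) and the sample rows are made up for this example. They are assumptions, not the published schema or real annotations, so check the actual CSV headers before adapting it.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows for illustration only. The column names and
# values are assumptions, not taken from the published datasets.
SAMPLE_CSV = """word,year,place,sense
car,1845,Leeds,railway carriage
car,1848,Leeds,railway carriage
car,1896,London,motor vehicle
car,1899,Leeds,motor vehicle
bike,1893,London,bicycle
"""


def senses_by_decade(csv_text, word):
    """Count annotated senses of `word`, grouped by decade."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["word"] != word:
            continue
        # Round the year down to the start of its decade, e.g. 1848 -> 1840.
        decade = int(row["year"]) // 10 * 10
        counts[(decade, row["sense"])] += 1
    return counts


car_counts = senses_by_decade(SAMPLE_CSV, "car")
for (decade, sense), n in sorted(car_counts.items()):
    print(decade, sense, n)
```

To try this against a downloaded file, you could replace `io.StringIO(csv_text)` with an opened file handle and swap in the real column names.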

The two datasets described are at:

Keynote video 'Evolutionary Innovations: Collections as Data in the AI era' for Making Meaning 2024


My slides for #SLQMakingMeaning #CollectionsAsData, 'Evolutionary Innovations: Collections as Data in the AI era', are online at https://zenodo.org/records/10795641

‘Collections as data’ describes the movement to publish open data from museum, library and archive collections that began in the noughties. The benefits of machine learning for better discoverability and research with digitised/born digital collections are alluring. The popularity of generative AI, and an increased awareness of the biases it reinscribes, has focused attention on responsible computational access to collections. But what does this mean in practical terms? Mia will share examples from the British Library and the Living with Machines data science project.