After lots of hard work by me, Nilo Pedrazzini, Miguel V., Arianna Ciula and Barbara McGillivray, we have a data paper in the Journal of Open Humanities Data: Language of Mechanisation Crowdsourcing Datasets from the Living with Machines Project.
And huge thanks to the thousands of Zooniverse volunteers who annotated 19th-century newspaper articles to create the datasets we've published alongside the data paper!
Abstract: We present the ‘Language of Mechanisation’ datasets with examples of re-use in visualisations and analysis. These reusable CSV files, published on the British Library’s Research Repository, contain automatically-transcribed text from 19th century British newspaper articles. Volunteers on the Zooniverse crowdsourcing platform took part in tasks that asked ‘How did the word x change over time and place?’ They annotated articles with pre-selected meanings (senses) for the words coach, car, trolley and bike.
The datasets can support scholarship on a range of historical and linguistic research areas, including research on crowdsourcing and online volunteering behaviours, data processing and data visualisation methodologies.
The two datasets described are at:
- Language of Mechanisation: annotated historical newspaper articles https://doi.org/10.23636/5t9m-0g59
- OCR and crowdsourced annotations, Language of Mechanisation, JSON files https://doi.org/10.23636/z634-km37
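If you'd like to try the kind of re-use the abstract mentions, here is a minimal Python sketch of counting annotated senses per decade for one word. It is only an illustration: the column names (`word`, `year`, `sense`), the sense labels and the sample rows are all made up for this example and are not taken from the published files, so check the documentation that accompanies the datasets and adjust the names before running it on the real CSVs.

```python
import csv
import io
from collections import Counter

# Invented sample rows standing in for a downloaded CSV file.
# The column names and sense labels are hypothetical, not the real schema.
SAMPLE_CSV = """word,year,sense
car,1825,railway carriage
car,1895,motor vehicle
coach,1830,horse-drawn vehicle
car,1898,motor vehicle
"""


def count_senses_by_decade(csv_file, word):
    """Count how often each sense of `word` was chosen, grouped by decade."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        if row["word"] == word:
            decade = int(row["year"]) // 10 * 10  # e.g. 1895 -> 1890
            counts[(decade, row["sense"])] += 1
    return counts


counts = count_senses_by_decade(io.StringIO(SAMPLE_CSV), "car")
for (decade, sense), n in sorted(counts.items()):
    print(decade, sense, n)
```

To use a real file, replace `io.StringIO(SAMPLE_CSV)` with something like `open("your-download.csv", newline="", encoding="utf-8")` (the filename here is a placeholder).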