
A Search and Mining System for Digital Humanities

Venue: Birkbeck Main Building

As a result of large-scale digitisation projects, humanities researchers face an overwhelming volume of digitised primary source material and “born digital” information relevant to their research. Current digital tools do not provide consistent support for analysing the content of digital archives that are potentially large in scale, multilingual, and stored in a range of data formats. These tools are often out of reach for many research disciplines in the humanities, and can be incompatible with the way researchers locate and compare relevant sources. The Samtla (Search And Mining Tools for Language Archives) system was developed to support the exploration of digital archives by providing humanities researchers with tools for search, browsing, and text mining in any domain or language, under a single system. The key to this domain-independent and language-independent digital infrastructure is a novel combination of language models and similarity measures. Comprehensive evaluation through crowd-sourcing has shown that the effectiveness of our system’s search functionality is on par with human-level performance.
