Data Repositories

Open science resources from the International Digital Oral History Lab

We are committed to the principles of open science and FAIR (Findable, Accessible, Interoperable, Reusable) data. Our GitHub repositories house the technical infrastructure, research tools, datasets, and documentation that underpin our projects — enabling researchers worldwide to reproduce, extend, and build upon our work.

MeDoraH Project

Transforming Oral History Research Through Semantic Technologies
Semantic Web · Natural Language Processing · FAIR Data · Knowledge Graph

MeDoraH is a collaborative research project between UCL and TU Darmstadt, developing innovative digital methods for oral history research. The project integrates semantic web technologies with historical-interpretative analysis to understand the evolution of Digital Humanities.

Bridging computational methods and humanistic inquiry, MeDoraH provides a comprehensive framework for representing oral history interviews, their metadata, and associated analytical data — designed to support advanced content analysis, facilitate interdisciplinary research, and adhere to FAIR data principles.

File Management
Digital library capturing technical metadata, provenance, and file relationships.
Content Modelling
Detailed models representing structure, semantics, and relationships.
Enrichment
Representing complex relationships for advanced querying and knowledge discovery.

MDOH Project

Multimodal Digital Oral History
Multimodal Analysis · Sound as Data · Digital Hermeneutics · Laughter Detection

MDOH develops methodologies and technical workflows for active engagement with the oral, aural, and sonic affordances of oral history collections — across both retro-digitised and born-digital materials. The project treats oral history artifacts as multifaceted resources rather than text-only objects, working across multiple representational modalities.

Central to MDOH is a commitment to reflexive digital practice. While leveraging computational approaches, the project remains attuned to oral history as a subjective and intersubjective meaning-making process, situated within specific cultural, temporal, and technological contexts.

Repository Structure
├── data/        released datasets and documentation
├── docs/        methodology notes and workflow descriptions
├── src/         reusable code, pipelines, and utilities
├── notebooks/   exploratory analysis and prototypes
└── outputs/     reproducible figures, tables, and exports

MeDoraH_NLP

NLP Toolkit & Text Mining Suite
Workbench · Workflow · Hermeneutic Analysis · Clustering · Visualisation

A comprehensive suite of text mining and knowledge graph construction tools that lets researchers transform unstructured historical narratives into structured, semantically rich knowledge representations.

Workflows
Workflows for information extraction and knowledge graph construction.
LLM Workbench
Cross-platform desktop app for hermeneutic analysis.
Preprocessing
Segmentation, sentence boundary detection, and context-aware pair generation.
Visualiser
Interactive visualisation of the ontology structure and graph data.
Hybrid Clustering
LLM-embedding and prompt-based clustering.
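
As a rough illustration of the Preprocessing component listed above, the sketch below splits a transcript into sentences and pairs each sentence with the text that precedes it. It is not taken from the MeDoraH_NLP code: the function names, the `window` parameter, the regex-based splitter, and the sample sentence are all assumptions made for this example, and "context-aware pair" is read here simply as (preceding context, sentence).

```python
import re
from typing import List, Tuple

# Naive sentence boundary: ., ! or ? followed by whitespace.
# Illustrative only; abbreviations such as "Dr." would be split incorrectly.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences with a simple punctuation rule."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def context_pairs(sentences: List[str], window: int = 1) -> List[Tuple[str, str]]:
    """Pair each sentence with up to `window` preceding sentences as context."""
    pairs = []
    for i, sentence in enumerate(sentences):
        context = " ".join(sentences[max(0, i - window):i])
        pairs.append((context, sentence))
    return pairs


# Made-up sample text, not drawn from any interview in the collection.
transcript = "I joined the lab in 1985. We had one terminal! Nobody called it Digital Humanities then."
sentences = split_sentences(transcript)
pairs = context_pairs(sentences, window=1)
```

A larger `window` gives each sentence more preceding context, at the cost of longer inputs for any downstream extraction step.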

MeDoraH_Ontology

Ontology & Schema Definitions
Ontology · Design Guidelines · FAIR Data · Metadata

Ontology and schema definitions supporting FAIR data principles and semantic enrichment for oral history research. This repository provides the formal knowledge representation layer that underpins the entire MeDoraH technical infrastructure.

Core Ontology
OWL/RDF ontology with domains: Actor, Event, Artefact, ConceptualItem, SpatialEntity, TemporalEntity.
Metadata Schemas
Standardised schemas for technical metadata, provenance, and relationships.
Properties
Relation definitions, domains, ranges, and specialisation hierarchies.
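
To make the idea of properties with domains and ranges concrete, here is a minimal sketch that uses the six class names listed under Core Ontology. The two property names (`participatedIn`, `tookPlaceAt`) are hypothetical and are not taken from the MeDoraH ontology. Domain and range are treated here as a simple validity check, which is a simplification of how RDFS and OWL interpret them.

```python
from dataclasses import dataclass

# The six class names given in the Core Ontology description.
CLASSES = {
    "Actor", "Event", "Artefact",
    "ConceptualItem", "SpatialEntity", "TemporalEntity",
}


@dataclass(frozen=True)
class Property:
    name: str
    domain: str   # class expected for the subject
    range_: str   # class expected for the object


# Hypothetical properties, invented for illustration only.
PROPERTIES = {
    p.name: p
    for p in (
        Property("participatedIn", "Actor", "Event"),
        Property("tookPlaceAt", "Event", "SpatialEntity"),
    )
}


def is_valid(subject_class: str, prop_name: str, object_class: str) -> bool:
    """Return True if the statement matches the property's domain and range."""
    prop = PROPERTIES.get(prop_name)
    if prop is None:
        return False
    return prop.domain == subject_class and prop.range_ == object_class
```

For example, `is_valid("Actor", "participatedIn", "Event")` returns `True`, while swapping the subject for `"Artefact"` returns `False`.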

Contribute to Open Research

We welcome contributions from the community — whether bug reports, documentation improvements, methodological critiques, or new implementations. All repositories follow open-source best practices.