Public talk features Michael J. Logan / Loughan
from Curraghaderry, Milltown
Deirdre Ní Chonghaile
Dear friends,
All are welcome to our talk at 12pm GMT on Tuesday 6 February in G010 HRB, part of the Digital Humanities Research Group series on campus at University of Galway. Register for livestream here:
NB this presentation will not be archived. Feel free to spread the word.
With best wishes,
Deirdre Ní Chonghaile
Title:
“To have the ‘million’ readers yet”: applying OCR & NER to bilingual Irish-English texts in An Gaodhal (1881-1898)
Speakers:
Deirdre Ní Chonghaile, Glucksman Ireland House, New York University
Oksana Dereza, Data Science Institute, University of Galway
Abstract:
Computerized text extraction for the Irish language (Gaeilge) faces a number of challenges, the most significant of which is the machine-readability of cló Gaelach, the typeface most commonly used in hand-written and printed Irish-language material up until the 1960s. To date, only a handful of OCR training models attuned to cló Gaelach, and to pre-standardized spelling, have emerged and none were trained on bilingual texts (Irish-English). Using the text-recognition software Transkribus, a team at New York University and University of Galway have developed two new OCR models: a Gaeilge-only model and a bilingual Gaeilge-English model. The core dataset for this OCR training exercise is the Brooklyn-based bilingual monthly newspaper An Gaodhal (1881-1898), the first serial dedicated to providing content to an Irish-language readership, which was established, edited, and printed by Galwayman Micheál Ó Lócháin (1836-1899). The contents of the newspaper reflect the cultural interests of Irish speakers in New York, Ireland, and the wider diaspora; Irish American life; New York history; and the development of the Irish language during the Celtic Revival period. Using the texts extracted from An Gaodhal, which are being corrected at word level, the team is developing NER (named entity recognition) tools to aid future NLP work in the Irish language. This work-in-progress presentation will share learnings from this on-going project.