Public talk features Michael J. Logan / Loughan

from Curraghaderry, Milltown

Deirdre Ní Chonghaile

Friends,

All are welcome to our talk at 12pm GMT on Tuesday 6 February in G010, HRB. The talk is part of the Digital Humanities Research Group series on campus at the University of Galway. Register for the livestream here:

https://forms.office.com/pages/responsepage.aspx?id=hrHjE0bEq0qcbZq5u3aBbC0nLCyJ5etMops-yds2521UMkw4TExGQUZHSlRMOEJLSTlPSUhXWkVXVC4u

NB: This presentation will not be archived. Feel free to spread the word.

With best wishes,

Deirdre Ní Chonghaile

Title:

“To have the ‘million’ readers yet”: applying OCR & NER to bilingual Irish-English texts in An Gaodhal (1881-1898)


Speakers:

Deirdre Ní Chonghaile, Glucksman Ireland House, New York University

Oksana Dereza, Data Science Institute, University of Galway


Abstract:

Computerized text extraction for the Irish language (Gaeilge) faces a number of challenges, the most significant of which is the machine-readability of cló Gaelach, the typeface most commonly used in handwritten and printed Irish-language material up until the 1960s. To date, only a handful of OCR training models attuned to cló Gaelach and to pre-standardized spelling have emerged, and none has been trained on bilingual (Irish-English) texts. Using the text-recognition software Transkribus, a team at New York University and the University of Galway has developed two new OCR models: a Gaeilge-only model and a bilingual Gaeilge-English model. The core dataset for this OCR training exercise is the Brooklyn-based bilingual monthly newspaper An Gaodhal (1881-1898), the first serial dedicated to providing content for an Irish-language readership, which was established, edited, and printed by Galwayman Micheál Ó Lócháin (1836-1899). The contents of the newspaper reflect the cultural interests of Irish speakers in New York, Ireland, and the wider diaspora; Irish American life; New York history; and the development of the Irish language during the Celtic Revival period. Using the texts extracted from An Gaodhal, which are being corrected at word level, the team is developing NER (named entity recognition) tools to aid future NLP work in the Irish language. This work-in-progress presentation will share what has been learned so far in this ongoing project.


This page was added on 05/02/2024.
