University of Sheffield Launches First National Census of UK Regional Swearwords to Preserve Language Diversity and Aid AI Development

by Elena Rossi

SHEFFIELD – The University of Sheffield has launched the first national census of swearwords to document and preserve regional insults and curses across the United Kingdom.

The project addresses the increasing homogenization of the English language and provides critical data for the development of artificial intelligence, which frequently fails to accurately interpret regional accents and non-standard speech. It also feeds into a wider debate about how public institutions, from schools to broadcasters and regulators, respond to taboo language in everyday life.

The initiative began after Dr. Chris Montgomery, a senior lecturer in dialectology and the project lead, was approached by the art project Modern Toss. The art group sought a university partner to assist in the creation of a comprehensive map of British swearwords.

“We’ve got quite a lot of large corpus-based data that allows you to track general swearwords over time from the 1990s to the 2010s,” said Montgomery. “But actually, we don’t know very much about regional swearing at all. We haven’t ever had a survey of regional swearing before, we sort of don’t know what’s going on.”

Montgomery stated that swearing serves a productive social function, acting as an indicator of frustration and a means of showing social solidarity. He noted that many of the terms being collected operate as in‑group signals rather than simple insults, and that understanding them can help explain how communities police their own social boundaries.

Researchers are soliciting submissions from individuals in towns and cities across Britain to establish a “vivid, honest record” of contemporary speech. The academics are focusing on terms recognized locally but unknown elsewhere, which often reflect the specific history and identity of the communities using them. Participants are asked to provide not just words but also context: who uses them, in what situations, and how offensive they are perceived to be.

Initial submissions include “arl arse” from Liverpool, “bampot” from Glasgow, and “radgie bastard” from north-east England. Other regional examples cited include “divvy” in Merseyside, “pillock” in Leeds, and “dinlo” in Portsmouth. The research team stresses that such examples are being logged alongside information about tone and intent, to distinguish casual teasing from genuinely abusive language.

Technological Integration and AI Development

The census serves a practical purpose beyond linguistic preservation. Research led by the University of Sheffield indicates that AI systems often struggle to process regional accents and non-standard English, with consequences for everything from customer service chatbots to automated content-moderation tools.

The research team said data on regional variation is needed to improve these technologies and to keep regional language from being excluded from them. Without such data, software used by public bodies and private companies risks misclassifying benign regional slang as hate speech, or failing to detect genuinely harmful abuse because it is phrased in local terms.

Montgomery emphasized that the project is not intended to promote offensive language, but to provide an insight into the English language as it is spoken in 2026.

“Some traditional regional dialects might be disappearing, and this project is about celebrating the regional language that people actually use and preserving a record of it, so future generations can get a real insight into people’s lives in 2026 and how people communicated in towns and cities across the country,” Montgomery said.

He added that treating taboo language as a serious research subject helps experts understand how language evolves and how it is used for social purposes. The work also intersects with regulatory debates: UK broadcasters and on‑demand services are subject to detailed rules on offensive language under the Ofcom Broadcasting Code, which requires them to gauge not only the words used but also how different audiences understand them.

Public Standards, Policy and Everyday Speech

Linguists involved in the project say the findings could ultimately inform how institutions think about speech in multi-accent, multilingual societies. Public-sector agencies experimenting with AI-assisted call handling or complaint triage, for example, increasingly need systems that can tell the difference between cathartic swearing, targeted harassment and language reclaimed by communities themselves.

The Sheffield team argues that a more nuanced map of swearing could, over time, support fairer moderation policies on digital platforms and more context-aware training data for AI models. It may also help schools, local authorities and employers frame codes of conduct that recognise real-world language use while setting clear boundaries around abuse.

Academic Validation and Public Exhibition

The project has received support from other linguistics experts, including Dr. Robbie Love, a lecturer in English language at Aston University in Birmingham.

Love noted that localized swearing practices often receive little attention and that recording these practices has inherent value because language variation reflects and reinforces regional identity.

“This is not about encouraging rudeness or bad behaviour, but rather celebrating diversity and just acknowledging that swearing is, for a lot of people, a day-to-day part of life,” Love said.

The collected data will be used to create national exhibitions. These will showcase contemporary speech patterns and may include an interactive map allowing visitors to hear regional swearwords spoken in their native accents. The team is also exploring ways for schools, museums and libraries to use the material in public engagement work, balancing educational value with sensitivity around offensive terms.

The project is currently in the data collection phase, soliciting public submissions to build the census record. Members of the public can take part online, with researchers stressing that contributions will be anonymised and assessed according to ethical guidelines governing research into sensitive material.
