HBCU Howard University recently released new data on how to improve the experience of Black people using artificial intelligence-based automatic speech recognition technology.
Released alongside Google as part of their “Project Elevate Black Voices” partnership, the data was based on catalogues of different dialects and diction used by people in Black communities across the U.S. Approximately 600 hours of data were collected from users of African American English to help improve automatic speech recognition, otherwise known as ASR. Thirty-two states are included as part of the project.
In collecting the data, Howard University researchers found that AAE is underrepresented in ASR because Black users of the AI-based technology are often prompted to adjust their voices when interacting with it. As a result, ASR models are not accurately trained on the dialect.
The data will first be made available to HBCU researchers working on AI updates to ensure that Black users are accurately represented in ASR. The dataset may eventually be opened to researchers outside of HBCUs, but HBCUs are given priority because their work aligns with Howard University’s values of empowerment, inclusivity and community-based research.
“African American English has been at the forefront of United States culture since almost the beginning of the country,” said Gloria Washington, Ph.D., a Howard University professor working on Project Elevate Black Voices, per the HBCU’s newspaper. “Voice assistant technology should understand different dialects of all African American English to truly serve not just African Americans, but other persons who speak these unique dialects. It’s about time that we provide the best experience for all users of these technologies.”
Previous research has highlighted not only that AI does not understand AAE, but also that it discriminates against users who speak the dialect. In a 2024 report, researchers fed five different large language models, including ChatGPT, over 2,000 social media posts written in AAE.
As part of the study, they asked the AI models how likely they would be to describe the posts’ authors with adjectives such as alert, intelligent, rude or neat. Overall, the models associated AAE with negative stereotypes, often characterizing speakers of the dialect as “rude, stupid, ignorant and lazy.” When asked to associate adjectives with Black people directly, the researchers received mixed responses: Black people were associated with “loud” and “aggressive,” but also “passionate,” “brilliant” and “imaginative.”
From the study, researchers concluded that AI training will have to address the covert racism embedded in linguistic bias, alongside overt racism.
The latest findings come as AI continues to become embedded in various fields and processes, including the creation of police reports and the filtering of job applications.