Improving the accessibility of web content
Empowering users: the rise of language processing technology
In recent decades, accessibility features in web browsers have seen significant improvements. This has been driven by growing recognition of the need for inclusivity and equal access to information for people with disabilities. Many of the new features, however, are often dismissed as being of use only to those with obvious disabilities like blindness or deafness.
But those with less obvious disabilities, such as dyslexia, ADHD or Asperger's, can also benefit greatly from accessibility tools like Microsoft's Immersive Reader, Apple's VoiceOver and Google's Reading mode app, amongst others. These tools are often integrated into applications and provide users with a customisable, distraction-free reading experience that improves focus and comprehension.
Figure 1. Google Chrome Reading mode
Similarly, those recovering from illness or an accident, or who are simply too tired to read, can listen to a screen reader that reads out web content. This uses text-to-speech (TTS) technology, an area that has seen particularly significant advances. From the early days of robotic-sounding voices in the 1960s, TTS has evolved to produce speech that is almost indistinguishable from human voices, thanks to advances in digital signal processing and linguistic modelling combined with machine learning and deep learning techniques.
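To make the idea concrete, here is a minimal sketch of how a script could read a passage aloud. It is only an illustration, not the mechanism behind any particular screen reader, and it assumes the open-source pyttsx3 package, which delegates synthesis to whichever speech engine the operating system provides.

```python
# A minimal text-to-speech sketch using the pyttsx3 library, which wraps
# the speech engine already installed on the platform (SAPI5 on Windows,
# NSSpeechSynthesizer on macOS, eSpeak on Linux).

import pyttsx3

def read_aloud(text: str) -> None:
    """Speak the given text using the system's default voice."""
    engine = pyttsx3.init()   # pick the platform's TTS driver
    engine.say(text)          # queue the text for synthesis
    engine.runAndWait()       # block until speech has finished

if __name__ == "__main__":
    read_aloud("Welcome to this article on web accessibility.")
```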
Those who struggle with texting on their phone, on the other hand, can choose to use speech-to-text (STT) technology to convert spoken words to text.
More and more of us are also using automatic speech recognition (ASR), for example to listen to a particular song, dim the lights or make a call.
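Speech recognition can be sketched just as briefly. The example below assumes the SpeechRecognition package and a short WAV recording whose file name is purely illustrative; it sends the audio to a free web recognition service, whereas a production voice assistant would run a far more robust ASR pipeline.

```python
# A minimal speech-to-text sketch using the SpeechRecognition package.
# "voice_note.wav" is a placeholder for any short mono WAV recording.

import speech_recognition as sr

def transcribe(wav_path: str) -> str:
    """Return the recognised text for a short WAV recording."""
    recogniser = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recogniser.record(source)   # read the whole file
    # recognize_google() uses a free web recognition API; offline
    # alternatives such as recognize_sphinx() also exist.
    return recogniser.recognize_google(audio)

if __name__ == "__main__":
    print(transcribe("voice_note.wav"))
```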
Behind all these innovations we find many patents, classified under CPC symbols such as G06F, G10L13 and G06F15.
Figure 2. Patent application filings classified in G10L13 (text-to-speech) from 1990 to 2020. Source: PATSTAT.
These language technologies are widely used across various applications, including virtual assistants, navigation systems and accessibility tools, fundamentally changing how we interact with our devices.
Human speech conveys a wide range of emotions, and mimicking it is very challenging: synthesised speech must not only be understandable, it must also employ precise pronunciation and the correct intonation, rhythm, stress patterns and nuances. Accurately reproducing the unique phonetic and syntactic characteristics of different languages and dialects poses additional challenges. Some users might also want to personalise pitch, speed and accents, which requires sophisticated neural network-based technology to generate speech that is both natural and intelligible.
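Simple engines already expose a few of these controls. The sketch below, again assuming pyttsx3, adjusts speaking rate, volume and voice; finer control over pitch, accent and prosody is precisely where the neural network-based systems mentioned above come in.

```python
# A sketch of basic voice personalisation with pyttsx3: speaking rate,
# volume and voice selection. Pitch, accent and prosody control generally
# call for neural TTS systems rather than these classic engine settings.

import pyttsx3

engine = pyttsx3.init()
engine.setProperty("rate", 150)        # words per minute (default is ~200)
engine.setProperty("volume", 0.8)      # 0.0 (silent) to 1.0 (full volume)

voices = engine.getProperty("voices")  # voices installed on this system
if len(voices) > 1:
    engine.setProperty("voice", voices[1].id)  # switch to another voice

engine.say("This sentence is read a little more slowly and quietly.")
engine.runAndWait()
```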
There have been some interesting developments in this area, and you can find examples in Espacenet using the following classification symbols (a short search sketch follows the list):
- G10L13/00 Speech synthesis; Text to speech systems
- G10L15/00 Speech recognition
- G06F40/00 Handling natural language data
- G06N3/00 Computing arrangements based on biological models
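As a small illustration of how such a search could be scripted, the sketch below turns these symbols into Espacenet search links. The URL pattern and the "cpc" query field are assumptions based on Espacenet's search interface (the "/low" suffix pulls in all subgroups, as in the G10L13/low query behind Figure 3 below); the authoritative way to explore the classification remains Espacenet itself.

```python
# A sketch that builds Espacenet search links for the classification
# symbols listed above. The URL pattern and the CQL-style "cpc" field
# are assumptions based on Espacenet's search interface.

from urllib.parse import quote

SYMBOLS = ["G10L13/00", "G10L15/00", "G06F40/00", "G06N3/00"]

def espacenet_search_url(symbol: str) -> str:
    """Return a search URL covering one CPC main group and its subgroups."""
    # Replacing "/00" with "/low" asks for the whole main group,
    # i.e. the symbol itself plus all of its subgroups.
    query = f'cpc="{symbol.replace("/00", "/low")}"'
    return "https://worldwide.espacenet.com/patent/search?q=" + quote(query)

for cpc in SYMBOLS:
    print(cpc, "->", espacenet_search_url(cpc))
```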
For instance, WO2019183062A1 describes a smart device to assist those facing challenges associated with ageing and cognitive/mental disorders. The device speaks in voices familiar to the patient and assists them with managing their home environment.
Devices like these must not only be able to handle confused or poor input; they must also protect against malicious attacks, for example in the form of deepfake audio.
Ongoing research and development, along with collaboration across fields such as linguistics, computer science and artificial intelligence, are helping language processing technology to overcome these challenges.
In today's rapidly advancing technological landscape, it comes as no surprise that many of these tools benefit more than just those with visual or auditory impairments. Indeed, the ability to use technology to translate text or sound into another language is a significant step towards global connectivity: whether in the form of voice-overs, captions, subtitles or text replacement, it makes knowledge and information accessible to everyone, regardless of the language they use. Many of these technologies are classified in CPC main group G10L13 (Speech synthesis; Text to speech systems). Figure 3 shows search results for this CPC main group in Espacenet (you can filter these results, for example by filing office).
Figure 3. Top applicant countries for text-to-speech (G10L13/low)
Artificial intelligence is now embedded in almost every aspect of our lives. Hearing aids, for instance, have evolved to include various features that benefit not just their users' hearing: they can translate text into other languages, read back shopping lists and even reduce background noise. In future they may even be able to count your steps or alert others in case of an accident, functions that already exist, of course, but not yet in small, comfortable devices that fit in the ear.
Advances in language processing technology have significantly improved accessibility. The resulting tools have become essential to promoting inclusivity and enhancing quality of life, from assisting those with visual impairments and cognitive challenges to benefiting those without specific needs. The technology continues to evolve, promising to narrow accessibility gaps even further.
Keywords: Accessibility, TTS, STT, ASR, language processing, Espacenet, AI, CPC