Screen readers do not need to be saved by AI

Imagine listening to your favourite podcast. You rewind it to go over something you missed, but each time you replay it, it’s somehow different.

This sounds frustrating, right? But that’s likely what would happen if we just stuffed large language models into screen readers, in a lazy attempt to avoid having to publish accessible content.

I’ve had this debate a few times on LinkedIn, but it came up again recently, after the awesome Access: Given conference, where Helen Dutson and Holly Tuke shared an example of emoji misuse. It was an RNIB post with clapping hands inserted between every word to highlight a common problem.

[Image: A slide which reads "Only use one or two emojis in your writing and never put them in between words", alongside an example RNIB post with clapping hands emojis between each word, described in more detail below.]
For a sighted person, the post looks… "fun". But for someone using a screen reader, it reads out: "This clapping hands is clapping hands what clapping hands screen clapping hands reader clapping hands users clapping hands hear clapping hands when clapping hands you clapping hands over-use clapping hands emojis clapping hands."

Annoying? Yes. One hundred percent! But, is it a problem for screen reader vendors to fix? I personally don’t think so. But, that’s the conclusion I see many people jumping to.

I appreciate this is just my opinion though. So, I figured it might be good to try to articulate some of the reasons why I’ve come to that conclusion.

The Temptation of AI

When people see issues like this, and because AI is now literally everywhere, the natural instinct is to suggest "smarter" technology. Surely an LLM could just tidy this up, skip over the redundant bits, and give the listener something more coherent? Right?

On the surface, it sounds appealing. But we need to step back and ask: what problem are we actually trying to solve here?

The truth is, screen readers already handle emojis pretty well. They recognise them as separate entities from text and images, and each emoji has a consistent text description across different devices and platforms.

The descriptions aren’t always great. As Holly pointed out during the conference, the "red flag" emoji is actually announced as "triangular flag on post", which does not usually provide enough context for the way that emoji is used in our culture. But again, is this a problem for screen reader vendors to fix? Or should it be fixed by iterating on the standardised text descriptions of emojis?
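Those descriptions aren’t something each vendor invents, either. They’re typically drawn from the Unicode standard and its related CLDR data, which is why the same emoji reads the same way across devices. Here’s a minimal Python sketch, using the standard library’s unicodedata module, showing the formal Unicode names behind the two emojis mentioned above:

```python
import unicodedata

# Look up the formal Unicode character names for the two emojis
# discussed above. These names come from the Unicode standard,
# not from any individual screen reader vendor.
for emoji in ["👏", "🚩"]:
    print(emoji, "->", unicodedata.name(emoji))

# Prints:
# 👏 -> CLAPPING HANDS SIGN
# 🚩 -> TRIANGULAR FLAG ON POST
```

So when a description like "triangular flag on post" falls short, the fix arguably belongs in the shared standard, where one change benefits every device, rather than in each screen reader separately.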

See, in both of these cases, the problem isn’t really the technology, it’s us! It’s the text descriptions humans have assigned to emojis, or the way humans are using them in their writing.

It’s a ridiculous and exaggerated example, but if I wrote a recipe and I missed out a few steps, none of us would expect an oven to figure it out and fix it! The responsibility for accessibility lies with the person creating the content. We shouldn't be looking for screen readers to try and patch bad content in real-time, it’s on us to communicate clearly in the first place.

The Real Cost of Bloating Screen Readers with AI

The cost of AI is a big issue: the monetary, environmental and ethical costs can all be huge.

Let’s imagine for a moment that a screen reader vendor decided to go down the AI route anyway. On paper, it sounds like a technological step forward. The marketing teams would love it! We’ll add some clever reasoning, train it on a few articles, and boom, all your problems will be solved!

Well, not exactly. In reality, the ripple effects would likely be enormous.

Context and accuracy

Language models are not designed to be consistent, they're designed to be coherent. Because they work by sampling from probability distributions, they're unlikely to give the same answer twice, unless it happens purely by chance.
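To illustrate, here’s a toy Python sketch with made-up numbers, not anything from a real model. LLMs pick each next word (token) by sampling from a probability distribution, so identical input can produce different output on every run:

```python
import random

# Toy next-token distribution. The tokens and probabilities are
# invented for illustration; a real model produces a distribution
# over tens of thousands of tokens at every step.
tokens = ["flag", "banner", "pennant"]
probabilities = [0.5, 0.3, 0.2]

for run in range(1, 4):
    # Sample one token, the way temperature-based decoding does.
    choice = random.choices(tokens, weights=probabilities, k=1)[0]
    print(f"run {run}: {choice}")

# Three runs over identical input can easily print three
# different tokens.
```

Now imagine that variability applied to every sentence of an article you’re trying to re-read.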

They also don't know facts, just the relationships between tokens. This is problematic if you're trying to read something exactly as it was written, or quote somebody in an academic paper.

We know LLMs can change the tone or context of something very easily, whether accidentally or on purpose. In a world with a lot of political sensitivities right now, changing the context, even slightly, could easily lead to somebody having their words misrepresented or misunderstood.

Development time

Building AI functionality into a product isn’t a quick update, it’s a huge engineering project, probably thousands of hours of engineering time. Until now, screen readers have had one job: read the content on the screen. Having them re-structure or re-write that content isn't just a feature, it's reimagining what a screen reader actually is!

If you rush it, and don't test it properly, you could easily end up with something that's biased or ableist in its language.

Hardware strain

Language models don’t run for free! Even the "lightweight" versions are pretty resource-heavy. You either need a permanent internet connection and a subscription to use cloud-based providers, or you need a decent amount of hardware to run them locally. I have a modern MacBook Pro, and most local LLMs will still try and cook my CPU doing fairly straightforward tasks.

A typical screen reader today can be installed on a modest laptop or smartphone and perform seamlessly, even at speeds of 600 to 800 words per minute, which are speeds I’ve seen many seasoned screen reader users rely on to keep up with studies, work, and everyday tasks.

However, once you introduce AI, the machine has to parse an input, break it into tokens, reason about meaning, generate an output, and then render it. Unless you have enough processing power, you can’t do that in real-time at 800 words per minute. Older machines would struggle, and screen reader users would be forced into a never-ending cycle of high-end upgrades just to keep up.
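To put rough numbers on that, here’s a back-of-the-envelope sketch. The 1.3 tokens-per-word figure is an assumption, a commonly quoted ballpark for English text:

```python
# Rough throughput budget for real-time reading at power-user speed.
words_per_minute = 800
tokens_per_word = 1.3  # assumed ballpark for English text

words_per_second = words_per_minute / 60                # ~13.3
tokens_per_second = words_per_second * tokens_per_word  # ~17.3

print(f"{words_per_second:.1f} words/s ≈ {tokens_per_second:.1f} tokens/s")
```

Sustaining roughly 17 generated tokens per second, continuously, on a phone or a modest laptop, before you even add tokenisation and the screen reader’s own rendering work, is a serious ask for a local model.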

That said, language models are constantly being tuned, so some are probably efficient enough to keep up, given the right hardware.

Energy consumption

More powerful software and hardware means everything inevitably uses more power. Whether that’s draining the battery on your phone quicker, or using more water to cool the cloud servers you’re heating up in a data centre somewhere, the impact adds up.

We live in a world where sustainability is already a pressing issue. Do we really want to increase the carbon footprint of a screen reader, just so it can "tidy up" an overuse of emojis, and probably alter the context of the original post in the process? For me, it feels a bit like using a rocket booster to toast a slice of bread!

Consumer impact

Perhaps the most important consideration of all is the human cost. Screen readers aren’t optional extras or luxury devices. They’re essential tools, like a wheelchair or a hearing aid. The primary users of these tools are people with disabilities, who are already more likely to be on lower incomes because of the disability pay gap.

If screen readers suddenly required more expensive hardware, steeper licence fees, or higher running costs, who absorbs that cost? It won’t be the tech vendors! It will fall squarely on the shoulders of the people who need these tools to live, work, and study. So, it’s not just impractical, it’s probably unethical.

A Simpler, Better Solution

So, where should we focus our energy?

I don't think it's in bloating assistive tech with AI, but in teaching people how to write inclusively. Not only is it cheaper and more sustainable, it’s also a more ethical and empathetic path. I highly recommend the book Considerate Content by Rebekah Barry if you want to learn more about making your content accessible.

Empowering humans to communicate in a more accessible way prevents the problem at its source, avoiding the need to rely on expensive, unreliable machines to try and patch it in real-time afterwards.

Final Thoughts

Screen readers aren’t broken. They don’t need "saving" by AI. What needs fixing is the way we, as humans, use language and iconography online. Accessibility is a shared responsibility, and the most powerful tool we have isn’t artificial intelligence, it’s empathy.

So, the next time you’re tempted to pepper your posts with emojis, use them as bullet points, or wedge them between every word, pause for a moment, and ask yourself, is this playful, or is this exclusionary? The answer might save someone a lot of unnecessary noise! 👏

Thanks, Craig

