Oxblood ink abstract calligraphy fills the image in curly glyphs. Own work 2023.

Text-to-speech is not just screen readers

How and why I use synthetic voices to read me books

by AK Krajewska

I often use text-to-speech to read. I get the sense that it's an unusual habit for a fully sighted person. Since I've mentioned it occasionally when I write about books, I thought I'd say a bit more about my reading method.

The tools I use for text-to-speech

I use a third generation Kindle, the Kindle Keyboard, which comes with a text-to-speech option. It's an old model from 2010, but you can still get them used. I know, because I'm on my third one. I broke the first one and lost the second one. Some newer Kindles have something called "VoiceView" which apparently pairs with assistive technology, but they don't have their own speakers and I also don't know how well they actually work. The Kindle Keyboard has a headphone jack, so I can stick a pair of headphones into it and put it in my purse when I go for a walk or in the pocket of my apron when I cook, and move around when I read. I like to move around. But more on that later. Some books come with DRM that doesn't allow text-to-speech. You can check that when you're buying them and only buy books that allow it (which is most of them) or find your own workarounds.

I use the Instapaper app on my iPhone to read articles from the internet. Instapaper has a text-to-speech option. You can control how fast it goes. Depending on my mood, I choose anywhere from 1x to 2x speed. 1.5x is my default. Instapaper works very well for things that take 10 minutes to read, and occasionally seizes up for longer pieces. I don't think it works very well for anything that takes longer than 30 minutes.

Finally, I sometimes use the system speech utility on my Mac. It's nice when something is too long to be practical for Instapaper, but not easy to import into a Kindle. I paste whatever I want to read into TextEdit and then go to Edit > Speech > Start Speaking. It also works in Notes, but I don't tend to use Notes like that, just because for me, Notes is for notes, not copy-pasted fanfiction. And more on that later as well.
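For anyone who prefers a terminal to the Edit menu, roughly the same thing can be scripted. What follows is only a minimal sketch, assuming macOS's built-in `say` command is on the path; the function names, the file name, and the speaking rate are hypothetical choices for illustration, not part of the method described above.

```python
import shutil
import subprocess


def build_say_command(text_path, words_per_minute=None, voice=None):
    """Build the argument list for macOS's `say` command to read a text file aloud.

    `-f` names the input file, `-r` sets the rate in words per minute,
    and `-v` picks a voice. The defaults here are illustrative assumptions.
    """
    command = ["say", "-f", str(text_path)]
    if words_per_minute is not None:
        command += ["-r", str(int(words_per_minute))]
    if voice is not None:
        command += ["-v", voice]
    return command


def read_aloud(text_path, words_per_minute=None, voice=None):
    """Speak the file with `say`; return False if `say` is not installed."""
    if shutil.which("say") is None:
        # Not on macOS, or `say` is missing: do nothing rather than crash.
        return False
    subprocess.run(
        build_say_command(text_path, words_per_minute, voice),
        check=True,
    )
    return True


# Example (hypothetical file name): read a pasted chapter at a brisk pace.
# read_aloud("chapter1.txt", words_per_minute=260)
```

Building the command separately from running it keeps the sketch easy to check on a machine that has no `say` at all.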

Why I use text-to-speech

While I don't have any significant vision impairment or learning disability that makes reading difficult, I don't like to do just one thing at a time. Before I had books on text-to-speech, I was that person walking down the street reading a paperback. If there is a life activity during which you may imagine a person reading, I have done so, or attempted to do so.

You may wonder, why not audio books? First, I also listened to audio books, but I ran out. Audio books are just a small subset of books, and what's more, they are very expensive. And some things just aren't available as narrated works at all, like weird blog posts and fanfiction.

Fanfiction is how I started. As best as I can reconstruct it from the extant evidence[1], I started using text-to-speech in the summer of 2007. Two things came together. The final Harry Potter book, Harry Potter and the Deathly Hallows, was published on July 21, 2007, and I read it immediately. At the same time, I was really into knitting, and often stayed up way too late doing it. After I read Deathly Hallows I felt very sad and dissatisfied with the Severus Snape storyline[2], and decided to find some fanfiction. By early August I was emailing myself lists of fics to read. I could not get enough Severus Snape fanfiction.

But I had a problem: I also wanted to knit. A lot. I could not get enough knitting and I could not get enough Severus Snape. When, a couple years earlier, I had experimented with speech-to-text (the experiment failed), I also discovered that my early model Mac laptop could do text-to-speech. It was pretty robotic and I didn't find much use for it. Until suddenly I very much did have a use for it. I would paste chapters of fanfic into TextEdit and make the robot voice read them to me while I knitted. It didn't take long to get used to the robotic voice. In the summer of 2008 I started playing World of Warcraft, and then I had the computer read me fanfiction while I played the game. Like I said, I hate to do just one thing at a time.

It must have been around 2012 when a friend offered to give me his used Kindle Keyboard. I hadn't been very interested in e-readers, but he said I could load fanfiction onto it and it would do text-to-speech. That was true, and it also opened the possibility of reading any book as text-to-speech. And it was easy to switch between audio and visual so I could read however I wanted.

Ever since then, I've used text-to-speech to read at least parts of every ebook I've read. I can read while getting dressed, while cooking, while cleaning, while knitting and crocheting, while drawing and doing calligraphy, while riding in a car where I would get carsick reading, and while walking and walking and walking. Certain books become associated with particular craft projects. Passages get attached to particular scenery I saw on a walk.

Text-to-speech lets me do boring things by adding books to them, and it also lets me read difficult books by attaching physical activity to them. I don't think I could have finished Of Grammatology or Seeing Like a State if I couldn't read parts of them while baking or playing World of Warcraft.

Preempting some objections

I don't want to be some kind of text-to-speech evangelist, but I think text-to-speech is pretty neat, especially if you're the kind of person who, like me, doesn't like to do just one thing at a time. I hear variations on these objections whenever I talk about text-to-speech, so I might as well preempt them.

Text-to-speech voices are flat and robotic

Yes, a lot of text-to-speech lacks the inflection of a human narrator, and it has some quirks. For example, my Kindle always reads people going "hm" as "hectometers." Not great. It also doesn't properly pause for punctuation, which can make it harder to follow. However, I found that I got used to it and now mostly tune out the style of the reading. I can't read anything with meter using text-to-speech. So no poetry, and no authors with a particularly rhythmic prose style.

That said, the voices are getting better on modern devices.

It's harder to follow and absorb audio than visual text

First, following text that you listen to is a skill you can develop and improve. I had to learn it in poetry workshops. Second, yes, it's true! I sometimes listen to sections more than once if I missed something because, for example, I had to run the water, or I just got distracted. But I, at least, get distracted when I'm reading things with my eyes and have to read them again, so it doesn't seem much of a loss.

It's not real reading

I guess if you're reading for some kind of merit badge or sport where you have to only use certain equipment, like, you know, raw powerlifting vs equipped powerlifting, and your particular reading sport division does not allow listening to books, then this objection might make some sense. I mean, some people think that ebooks don't count as reading because you aren't using the traditional equipment. How far do you want to take this argument? It's not real reading when you have to use reading glasses? It's not real reading when you're not reading it off an ancient scroll? I think it's silly. I also don't think there is anything inherently virtuous about reading, and definitely not anything special about one form of reading over another. Unless what you specifically want out of the act of reading is practice reading with your eyes, I don't think it matters at all how you read. Even then, it's not fake to listen to a story. If anything, it is the written text that is fake. But I will not get into Derrida again at this late hour, tempted as I may be to do so.

  1. Email responses to my LiveJournal posts about Snape/Hermione fanfiction I was reading.

  2. It was a more innocent time, when one could be disappointed in the storyline instead of the author. Alas.