A GUIDE TO
VOICE INTERFACES

six principles for designing for voice UI

WHAT IS A VOICE UI?

While we’re used to using our voices to get what we want from each other (“Number four with fries, please!”), the ability to get what we want from our products in the same way is still a relatively new phenomenon.


To help make sense of the current and future state of voice interfaces and navigate what’s destined to be an increasingly “Open Sesame!” age, we’ve compiled some of the most important things we’ve learned about voice UI for our friends in the design community. If you’re interested in learning about the building blocks of voice interfaces or need some inspiration to begin designing one, this guide is for you.

AN INTERFACE THAT GOES BEYOND THE SCREEN

Voice interfaces and the new design frontier

While you may think of voice commands primarily in the context of popular devices manufactured by Apple, Google and Amazon, you should instead think of them as interfaces that can be applied across the board to complement screens and transform digital interactions as we know them now.

This is a new era of service design, and with it comes endless possibilities for designing intuitive, holistic and human-centered experiences that people love.

MORE THAN A FEMALE ASSISTANT IN A CYLINDER SHAPE

The rise of voice personal assistance

Advances in natural language processing, deep learning algorithms and significantly improved microphones mean we are beginning to see interfaces that can understand and accommodate the loose, unpredictable structure of human conversation.

But these aren’t dry, dull electronics. Companies are developing personalities for their virtual assistants, which have mostly arrived as a set of female characters – embodied in phones, home assistants and navigation systems – personifying AI via voice.

However, it’s important to note that applying this gendered identity has ramifications, especially because the resulting impulse is to then add a “her” to every product we can. Instead, we should pay attention to the unexamined decisions we’re making to avoid digitizing existing power structures under the guise of a “default” identity.

A SOUND RELATIONSHIP BETWEEN TECH AND HUMAN

Interfaces must still be efficient when completing the user’s task

As digital natives, we have long been accustomed to adapting our behavior to satisfy the demands of the machine interface. However, usability is now improving so dramatically that our interfaces are adapting to our behavior instead, a shift made possible by the rise of language-driven technologies.

But while the rate of that change may seem faster than ever, what isn’t changing is the desire for a more human mode of interaction.

Getting Started
01

CONVERSATION AS USER INTERFACE

The promise of simple spoken control and information without a screen requires us to look at what we expect from conversations and how we design for relationships.

Conversations in real life aren’t one-sided – they require cooperation between participants and mutual acceptance, understanding and respect. Same with a voice interface. Other channels (lights, animations, chimes) can work in concert with voice to reinforce that the interface is cooperating. People inevitably build relationships with products, and reflecting the attributes of positive human interactions can ensure our digital relationships are strong.

WE SUGGEST

Before getting started, think about the service or product you’re designing for and map out the desired “conversation” – then design to achieve that outcome.

02

THE INTERFACE OF LEAST RESISTANCE

A user will typically look for the easiest way to complete a task, but their definition of “easy” can vary wildly depending on the context and situation.

While sometimes voice is the best solution, other times it isn’t. Need to text your mom while you’re driving to let her know you’ll be late? Siri is the better, safer solution to compose and send a message, from beginning to end. Want to turn the music off? Shh-ing Alexa is a simple – and instinctual – way to communicate. However, if you’re hoping to get through your inbox in bed in the morning, your virtual assistant can only get you so far – unless time is no object, and typos are no problem.

WE SUGGEST

Examine your own relationship to, and familiarity with, voice and language interfaces and get to know the specific people who will be using the interface and the context in which they’ll likely be using it.

03

EVERYTHING HAPPENS IN SEQUENCE

When interacting with a voice interface, users don’t have a way to view a menu or scan their options, and if the system isn’t easily navigable, the probability of user fatigue and frustration increases. People don’t want to learn something new; they want things to play out in an order that makes sense.

Effective conversational design follows familiar sequences and uses conversational structure and familiar cues to increase empathy, maintain a relationship with the user and motivate further engagement. Error messages and status updates in the form of light animations and subtle chimes – as well as earcons, which could basically be considered emoji for voice interfaces – are ways to convey meaning to the user intuitively.

WE SUGGEST

Create a sequence diagram that represents the pattern of interaction.
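
One way to start such a diagram is to write the interaction down as an ordered list of turns before drawing anything. The sketch below is only an illustration, not a prescribed method: the timer scenario, the Turn structure and the channel names ("voice", "light", "earcon") are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Turn:
    """One step in the interaction (hypothetical structure for this sketch)."""
    speaker: str  # "user" or "system"
    channel: str  # "voice", "light" or "earcon"
    content: str  # what is said, shown or played


def timer_sequence() -> List[Turn]:
    # An invented example: the user sets a timer and the system answers
    # through several channels, in a fixed and familiar order.
    return [
        Turn("user", "voice", "Set a timer for ten minutes."),
        Turn("system", "light", "listening animation"),
        Turn("system", "earcon", "confirmation chime"),
        Turn("system", "voice", "Ten minutes, starting now."),
    ]


def render(turns: List[Turn]) -> str:
    # Produce a numbered, text-only sequence diagram, one line per turn.
    lines = []
    for number, turn in enumerate(turns, start=1):
        if turn.speaker == "user":
            arrow = "User -> System"
        else:
            arrow = "System -> User"
        lines.append(f"{number}. {arrow} [{turn.channel}]: {turn.content}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(timer_sequence()))
```

Listing the turns this way makes it easy to spot a step where the user gets no feedback at all, or where two channels compete for attention at the same moment.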

04

CONTEXT IS KEY

If you’re walking down the street talking to yourself, people are likely to assume one of two things: you’re crazy... or you’re wearing a Bluetooth headset (not that the two are mutually exclusive). However, voice UI introduces a range of other possibilities – and with it, other things to consider – by potentially exposing your private preferences (“Google, add ultra-soft toilet paper to my shopping list”), embarrassing music choices (“Play the new Justin Bieber song”) and, more seriously, secure info (“My social security number is...”) to anyone within earshot. Another problem is voice traffic in crowded public spaces. If everyone crammed on the subway is yelling into the microphone at their personal Google, that crossfire isn’t conducive to effective user-device communication.

For this reason, voice interfaces are most useful in more private, controlled environments, such as the car or home. (We promise your Prius won’t judge your music preferences.)

WE SUGGEST

Consider the context in which your voice interface will be used and the security risks it can introduce. Be clear about the topics and times when the interface isn’t the best option.

05

WHAT'S YOUR NAME AGAIN?

Conversation is not simply about information exchange but about navigating and negotiating social relationships. Equipping a voice UI with distinct personality traits is as much about establishing a social power dynamic between human and machine (user and service) as it is about subtly expressing a brand. Building empathy is a big part of designing for a language interface – not only in how easily information is shared, but also in how readily users respond to it. Using emotive expressions to address the user and conversational context to recognize them helps build trust in both the interface and the service, and trust is a fundamental element of usability.

That being said, the service you’re designing for should inform the intonation, phrasing and sentence structure. Sarcasm is fine when you’re bantering with a home assistant, for instance, but a wisecracking interface in the classroom might not be as appreciated.

WE SUGGEST

Choose idiosyncratic elements that suit your service and map out how these are expressed in the interface. Be aware of any existing biases and evaluate or adjust accordingly.

06

MIND YOUR MANNERS

Remember when your parents taught you to say please and thank you? The importance of helpful words, and all that implies, applies to voice UI, too. Designing for empathy does more than facilitate information exchange – it also sets expectations and makes it easier for the user to forgive mistakes. Specific words and sentence structures help build the ongoing social relationship between the user and the interface, while the actual work is being done on the backend.

Keep in mind, too, that an interface is supposed to be fair and democratic, so it should never judge the user or their opinions, attitudes or limitations.

WE SUGGEST

Carefully consider limitations, barriers and specific circumstances – such as accents, speech impediments and bilingual households – and ensure a human-centered voice UI by designing to accommodate these individual needs.