On March 15, Grubhub celebrated the launch of our first voice-controlled ordering system, ‘Reorder with Grubhub’ for Amazon Alexa. Designed for frequent Grubhub users, our skill evolves the ordering experience by enabling diners to reorder their favorite dishes without lifting a finger. We’re so excited to offer this skill, and that enthusiasm extends to this blog post, where I’ll walk you through basic skill functionality, our design process, and our run-up to launch.
(Note: We launched ‘Reorder with Seamless’ for Amazon Alexa in May of this year. This skill works just like the Grubhub version, but is designed for Seamless diners who want to reorder from their Seamless order histories.)
An overview of the Grubhub skill
If you’re a current Grubhub user and you’ve enabled our skill in the Amazon Alexa store, you’ll be able to use the Alexa voice user interface (VUI) to reorder from your Grubhub Order History. You can also manage and update your default payment method and delivery address. This diagram, which will seem more intuitive as you read on, gives an overview of how our skill for Alexa works:
Phrases, slot values, and intents: VUI building blocks
Let’s look more closely at what’s happening between the user of our Alexa skill and the voice user interface (VUI). Designing a voice-driven skill like this requires us to anticipate what users will say and to build a comprehensive phrase library encompassing every prompt, question, and command a user may utter when interacting with the VUI.
Each phrase we place in the phrase library contains a few different components: utterances, slot values, and intents. Certain phrases also include one or two more components critical to successful exchanges of information between the user and the Alexa VUI: a) the “wake word,” which signals the Amazon Echo or other device with Amazon Alexa to pay attention to the user, and b) the “skill name,” which invokes a specific skill.
To understand how components are combined into phrases, and how phrases come together to build the library, let’s look at a sample exchange between a Grubhub user and her Amazon Echo or other device with Alexa. When the user wants to talk to Alexa, she says the “wake word” associated with her device (e.g., “Alexa”).
Here’s an example of what the user might say to initiate the exchange:
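“Alexa, ask Grubhub to reorder from Blue Ribbon Sushi.”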
If the user’s a Grubhub customer who has ordered from Blue Ribbon Sushi recently, and Grubhub indicates that the order’s available, Alexa will respond with the next step in the reordering process.
We also group phrases together based on the specific sort of action or response they trigger, and sometimes, other details such as the slot value they contain — for example, RestaurantName or OrderDate. These groups of phrases are called “intents,” and when we put them all together, they make up our library, which currently contains several hundred phrases and dozens of intents.
When formatted as an intent, the sample phrase takes a more compact form.
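In the sample-utterance format that Alexa skills use, it would read roughly as follows (a plausible rendering; the exact entry in our codebase may differ):

```
ReorderByRestaurant reorder from {RestaurantName}
```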
Note that in intent format, we omit the “wake word” (i.e., Alexa) and the skill name (i.e., Grubhub), since these aspects of the exchange are handled elsewhere in the code. Likewise, the restaurant uttered above (i.e., Blue Ribbon Sushi) has been replaced with the generic RestaurantName slot value string so the user can request any restaurant in her order history.
When we format a phrase as an intent, we add a prefix — in this example, “ReorderByRestaurant.” “ReorderByRestaurant” is the actual “intent.” In fact, every single phrase in our library that a) kicks off the reordering process and b) requests a specific reorder via the RestaurantName slot value must be mapped to the “ReorderByRestaurant” intent before it can be placed in the codebase, because the intent does the critical heavy lifting (and this is true for all Amazon Alexa skills). Rather than requiring Alexa to parse each word the user utters and match it to one of several hundred unique phrases in the codebase, the intent works as a shortcut, enabling Alexa to instantly recognize the user’s request and provide a response without delay.
Here, each time the user utters one of the phrases from the “ReorderByRestaurant” intent group, Alexa passes the intent and slot value to Grubhub, and Grubhub returns the response. If the restaurant mentioned matches an available order in the user’s history, Alexa responds by reading out the order (items, quantities, and total price) and asks the user to confirm the reorder. The example phrase we’ve been using, “Alexa, ask Grubhub to reorder from Blue Ribbon Sushi,” is just one way the user can trigger this specific response. She could also say “Alexa, ask Grubhub to read my last order from Superiority Burger,” or “Alexa, ask Grubhub to gimme the order from Chopt.”
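To make this concrete, here’s a minimal Python sketch of the backend side of that exchange: one handler per intent, keyed by intent name, with the slot value used to look up the order. All names and data here (dispatch, ORDER_HISTORY, the sample order) are illustrative assumptions, not Grubhub’s actual code:

```python
# Illustrative sketch of intent dispatch on a skill backend.
# Stand-in for the user's recent, reorderable order history.
ORDER_HISTORY = {
    "blue ribbon sushi": {
        "items": [("Spicy Tuna Roll", 2), ("Miso Soup", 1)],
        "total": "$24.50",
    },
}

def handle_reorder_by_restaurant(slots):
    """Handle the ReorderByRestaurant intent: look up the named
    restaurant in the user's recent orders and read the order back."""
    name = slots.get("RestaurantName", "").lower()
    order = ORDER_HISTORY.get(name)
    if order is None:
        return "I couldn't find a recent order from that restaurant."
    lines = ", ".join(f"{qty} {item}" for item, qty in order["items"])
    return (f"Your last order from {name.title()} was {lines}, "
            f"for a total of {order['total']}. Should I place it again?")

# Alexa resolves every phrase in an intent group to the same intent
# name, so the backend only needs one handler per intent.
HANDLERS = {"ReorderByRestaurant": handle_reorder_by_restaurant}

def dispatch(intent_name, slots):
    """Route a resolved intent and its slots to the matching handler."""
    return HANDLERS[intent_name](slots)
```

Calling dispatch("ReorderByRestaurant", {"RestaurantName": "Blue Ribbon Sushi"}) produces the read-back-and-confirm response; any phrase in the intent group resolves to the same call.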
In our codebase, each intent group collects its mapped phrases in one place.
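Drawing on the example phrases above, a sketch of the “ReorderByRestaurant” group in Alexa’s sample-utterance format might read (illustrative entries rather than our actual source):

```
ReorderByRestaurant reorder from {RestaurantName}
ReorderByRestaurant read my last order from {RestaurantName}
ReorderByRestaurant gimme the order from {RestaurantName}
ReorderByRestaurant place my usual order from {RestaurantName}
```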
This, too, is just an excerpt: the actual “ReorderByRestaurant” intent group contains many more phrases. Intent groups must be robust so that users enjoy maximum flexibility when interacting with our skill.
Prototyping the VUI
We modeled Reorder with Grubhub on natural human language patterns and conventions, which required us to make some basic assumptions about how Alexa would prompt and respond to users. We wanted Alexa’s responses to mimic the phrasing and flow of a human conversation partner, ensuring an engaging and realistic ordering experience.
For example, if a user utters a phrase indicating a desire to order food, without mentioning a specific order or restaurant, we wanted Alexa’s response to inform while encouraging further engagement. So if a user says “Alexa, tell Grubhub I’m hungry,” Alexa responds with “Great, here are your available reorders,” followed by a readout of the user’s three most recent Grubhub orders.
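Sketched in Python, that fallback behavior might look like the following. The helper name and restaurant list are hypothetical; the reply text mirrors the response quoted above:

```python
# Hypothetical sketch: build Alexa's reply when the user wants food but
# doesn't name a restaurant or order. Per the behavior described above,
# we offer only the three most recent reorderable restaurants.

def hungry_response(recent_restaurants):
    """Return the spoken reply listing up to three recent reorders."""
    top_three = recent_restaurants[:3]
    if not top_three:
        return "I couldn't find any recent orders to choose from."
    listing = "; ".join(
        f"{i}. {name}" for i, name in enumerate(top_three, 1))
    return f"Great, here are your available reorders: {listing}."

reply = hungry_response(
    ["Blue Ribbon Sushi", "Superiority Burger", "Chopt", "Joe's Pizza"])
# reply lists only the first three restaurants
```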
This excerpt from our VUI flow illustrates how this works. Alexa’s responses are denoted in the yellow boxes. Note the conversational style and tone of the content. This is intentional — we want our “voice” to include warmth and friendliness.
Of course, guessing what our real-life users will say and do is never as accurate as testing our assumptions with actual Grubhub users. So we figured: let’s test our skill by observing actual Grubhub customers interact with the VUI in real time, and encourage them to tell us exactly what they think of the experience.
Testing with real Grubhub users
Once our team developed a working prototype of our skill, we recruited loyal Grubhub users of different backgrounds and lifestyles as our test subjects. Each user sat with one of our product managers and walked through the different tasks within our skill. Observing users as they asked to order from Grubhub, decided what to reorder, and managed their account settings gave us immediate feedback about what was and wasn’t working.
These were some of our key takeaways from the testing sessions:
- Our initial phrase library was far too limited. Before we could release the skill, we’d have to build a library that accounted for a much greater diversity of user commands and requests than we included in the prototype.
- The prototype included too much “filler” language, particularly in Alexa’s responses. It takes just a minute to order online, and if VUI-based ordering took longer, it’d be too inconvenient for day-to-day use.
- Some of the exchanges featured too many back-and-forths. We needed to simplify the flow of information and pare the process of ordering down to the minimum steps required to ensure an accessible, pleasing, and convenient VUI.
- Sometimes, our prototype took too long to respond to users. Other times, Alexa’s phrasing was rushed or difficult for users to understand. We’d have to insert natural pauses into our code so the conversation felt more human.
- We tested more complex interactions between Alexa and our users — for example, requests for orders further back in the order history and modifications of existing orders. Ultimately, we discovered that we fared best limiting functionality to reordering the three most recent, currently available orders in the user’s Grubhub order history.
To market, to market
To prepare for the launch of our Alexa skill, our team kicked off a multi-phase, iterative cycle of VUI design and QA. We continued to refine and re-test our skill, focusing on perfecting the VUI experience and improving the robustness of the intent library. Many improvements were based on findings from our user testing sessions, while others were inspired by members of our team over the project lifecycle.
As our launch date approached, we sent the skill to Amazon’s Alexa team for further testing and approval. The team was very supportive in preparing us for launch, and someone from Amazon was always available to answer our questions. (If anyone on the Amazon Alexa team is reading this entry, I’d like to say “Thank you for all your help!”) Once the skill was certified and published to the Alexa skills store, we announced the launch to the community. We’re so excited to add voice-driven reordering to the Grubhub ordering ecosystem.