What is the future of voice?

Machines need to be taught how to interact with us (through our voice) as opposed to humans being taught machine interfaces. This has been in the making for decades. The first interfaces with computers were painfully low level. As our computational capabilities and our ability to design more abstract languages evolved, so did our interfaces. The modern GUI was the first leap, making computers accessible to large groups of educated people. The next leap in accessibility occurred when the mobile phone, with its simple touch interfaces, made computers accessible to everyone. The next leap in usability, which has already started, is voice interfaces. Our kids are adopting the first generation of voice-first applications like Alexa and Siri. We think a similar change will follow in the enterprise.

How far away is that future?

This is an evolution that won’t be complete in a single year; rather, it will be ongoing. Something similar happened with video and bandwidth-hungry applications. In the late 90s and early 2000s, many people asked when the year of video would arrive. For that to happen, higher bandwidths had to become available to a larger percentage of the population. That took time. The parallel in voice is that getting voice applications to do four things takes time. Those four drivers of voice adoption are: i) better recognition of language and vocabulary that are user specific, ii) getting used to accents, iii) understanding intent in a way that allows you to flexibly turn speech into action, and finally iv) understanding how to translate actions into a user’s specific environment or workflow. In the workplace, I would add a fifth driver: better controls around data ownership, privacy, and security. These five drivers are not one-and-done propositions. They require technology to continually improve and marketplaces to evolve. This is all happening now and will be the nexus of activity for the next five to 10 years.

What is the impact of voice in the enterprise?  What areas will it affect most?

The voice revolution in the enterprise will eventually affect almost all enterprise workflows. The first area that has already seen an impact is call centers, where speaking with customers is happening all the time. The next area this will affect is meetings, where most corporations spend almost half their time. The value proposition here will be to take this tremendous time bucket and make it both more productive and more structured. Enterprises have spent billions creating structured databases and attempting to derive structure from text. The next frontier is to derive structure from voice, and particularly from the voice in meetings. Once you home in on meetings, there are different types of meetings that drive different workflows: sales meetings, project meetings, interviews, and staff meetings.

Outside of meetings, the next area for enterprise voice disruption is in workflow input.  Rather than learning complex enterprise interfaces, B2B companies will speech-enable their interfaces so users can give more flexible commands and engage in dialogue.

Can voice become the dominant form of digital engagement?  If so, when?

Yes, if only because we spend so much time talking at work, specifically in meetings. The ability to extract meaning from these meetings will grow rapidly over the next five years. Once we activate and structure half of our corporate time, this will become the largest data asset the corporation has access to.

What are 3 ways voice is changing the way we do business?

The first and biggest way voice will affect how we do business is that it will restore our ability to focus on each other. Right now we are experiencing an epidemic of multitasking and distraction. People sitting together are paying more attention to their phones than to each other. We are in meetings to get work done, but we end up staring at our screens, addicted to and distracted by notifications. This is making us generally distracted, anxious and, for a time, actually dumber (distractions cause IQ to temporarily decrease).

This can be solved. Rather than having screens open, voice can allow us to shut down the screens and focus on each other. With voice technology, meetings can be fruitful and notification-free because an AI can take the notes for you and capture the salient points.

The next major benefit you can expect is fast and accurate follow-up. Most people leave meetings without an accurate picture of the actions agreed on and the decisions made. If an AI note taker is there, you not only capture these, but you are also prompted to follow up on them quickly and accurately. That way others are on the same page and collaborating with you towards creating results.

Finally, the third area of impact will be the continuity of these actions into our workflows. If you leave a sales meeting, you should be able to update Salesforce or other CRM systems. If you leave a project meeting, you should be able to update Jira, Trello, Asana or other systems.

Is voice over-hyped?  Is it under-hyped?

Good fundamental technology tends to get over-hyped in the short run.  That is because of the dynamics of how venture capital, start-ups, and the press all reinforce emerging trends into vast echo chambers.  But that same technology may actually be under-hyped in the long run. The reason for this is that fundamental technologies tend to get used in ways that the creators never fully imagined, and some of those end up truly transforming the way people behave.  Voice technologies have that potential.

What impact does voice have on the tech stack for the enterprise?

To understand the impact of voice in the enterprise, we first have to understand how enterprise voice differs from consumer voice applications like Alexa and OK Google. They are quite different on four dimensions:

1. Ownership of data. Enterprises require that the data belongs to them and not the network.

2. Social interaction. In the consumer space, people talk directly to Alexa. At work, on the other hand, people speak to each other, not to EVA; they use EVA to get more understanding out of their conversations with each other. This is a very different social dynamic, and commands have to be created in that context.

3. Non-discoverable version. Some conversations are not meant to be recorded and transcribed, yet you still want to capture notable actions. We have created a capability to capture such moments without leaving a discoverable artifact that makes some people insecure. This lack of discoverability is very important, and most note-taking and transcription companies lack this option.

4. Workflow integration. Alexa, Cortana and OK Google tend to be good at integrating with their own apps and with consumer entertainment apps in general, but they are generally not integrated into corporate workflows. This is crucial because people need continuity from what they agree to in a meeting into the workflow systems that help them manage outcomes. Consumer voice assistants don’t focus on such enterprise systems; they are more integrated with music and home automation.

Does voice democratize technology or does it further separate the haves from the have-nots?

Like mobile, voice should be a democratizing force. Not everyone can have an exec admin; with voice AI, we democratize that capability. We can make such productivity assistance cheaper and hence more available to everyone.

What loses if voice wins?

This is not a zero-sum proposition. Simplifying interfaces opens up far more opportunities than it destroys. Think about what mobile apps did to extend the popular use of the web. Think about how many countries have populations that skipped computer-based web adoption and went straight to mobile and mobile apps. Or think how many unbanked people have benefited from the innovations around mobile payments.

Is this transition to voice similar to other technical evolutions in business?

I think the GUI revolutionized computing by opening it up to regular employees of companies rather than concentrating it among a few experts who understood programming languages. Voice will follow that same evolution. Many people simply avoid corporate workflows because of their complexity. This is why the move to consumerized B2B applications is gaining such steam. Voice enablement is the next natural step. The rise of voice-enabled applications will drive much more adoption in the enterprise.

