How to choose a natural language processing (NLP) project that makes good sense for your organization.
For the past 18 months, my teams at Acxiom Research have worked extensively with a specific form of artificial intelligence called natural language processing (NLP). Our most exciting NLP development is called ABBY — our first artificially intelligent employee. But I’m not just here to talk about ABBY. I’m here to talk about the potential of NLP and how to decide if it’s a technology your own company should be exploring.
I want to leave you with two thoughts about NLP:
First, the open-source technology around NLP is so robust you can easily build “on the shoulders of giants” and create amazingly effective NLP applications right now using just a small, highly focused team and a platform approach.
Second, even with such a large amount of powerful technology at your fingertips, creating a front-end NLP (one that “talks back,” which is what most people think of when they think of AI) requires both vision and fortitude. Vision to see the power of the technology and sell it to your internal stakeholders. Fortitude because it will require a significant up-front investment before you see returns from some of the more advanced capabilities you need to develop. You must also be willing to learn the skills of a consumer marketer and deal with issues of changing behaviors already entrained in your user base.
NLP-based improvements to your business need not have a conversational front end. Backend-driven or linguistic analysis projects often offer the fastest, most cost-effective, highest-return way to use NLP in the short term. These projects involve teams of two to three people working for a few months.
Hilary Mason, GM of Machine Learning at Cloudera, presented a good example of backend NLP in a keynote at the most recent Strata Conference. Mason explained how Cloudera lowered its call center costs and improved customer satisfaction using NLP. They took a statistical sample of recorded calls from their call centers and transcribed them to text. They performed textual analysis on this corpus, seeking speech patterns tied to specific issues and problem resolution steps. They then deployed predictive models based on the results of this analysis into their call center systems. When a customer called, the underlying algorithms identified patterns of speech and proactively recommended a likely solution to the customer service representative while the representative was still speaking with the customer. The result, Mason said, was fewer calls to the call center as well as increased customer satisfaction (my team saw the same type of positive results in our own similar project).
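To make the shape of that pipeline concrete, here is a minimal sketch in Python. It is not Cloudera's system or ours: the transcripts, the resolution labels, and the function names are all invented for illustration, and a small naive Bayes word-count model stands in for whatever predictive models a production project would actually use.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical sample: (transcribed call text, resolution that worked).
TRANSCRIPTS = [
    ("my password expired and i cannot log in", "reset_password"),
    ("locked out of my account after the password change", "reset_password"),
    ("the cluster node will not start after the upgrade", "restart_service"),
    ("service fails to start on one node", "restart_service"),
    ("i was billed twice on my invoice this month", "refund_duplicate_charge"),
    ("duplicate charge on the invoice", "refund_duplicate_charge"),
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Count how often each word appears under each resolution label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def recommend(utterance, model):
    """Return the most likely resolution for what the caller just said."""
    word_counts, label_counts, vocab = model
    total_calls = sum(label_counts.values())
    scores = {}
    for label, n_calls in label_counts.items():
        # Log prior plus log likelihood with add-one smoothing.
        score = math.log(n_calls / total_calls)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(utterance):
            if tok in vocab:  # ignore words never seen in the sample
                score += math.log((word_counts[label][tok] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRANSCRIPTS)
print(recommend("customer says the password stopped working", model))
```

In a live call center the utterance would come from streaming speech-to-text, and the recommendation would appear on the representative's screen rather than being printed.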
Once you focus on conversational NLP (or AI), where you want the machine to interact with a human in a way that has something even vaguely like the fluidity and imprecision of normal human speech, the problem becomes technically challenging and expensive. I am not speaking here of chatbots. A chatbot is a very simple machine that can follow a relatively structured conversation for a specific task and sits in certain pre-defined environments like Facebook Messenger. Conversational AIs are completely different. Similar to Alexa, they are ubiquitous (they are wherever you are), can handle multiple applications (also called intents), and can deal with the wide range of responses even one person can give to the same statement. They can also change contexts rapidly — say from providing information about today’s weather to making restaurant reservations.
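A toy sketch can show what "multiple intents" and "changing contexts" mean mechanically. Everything in it is an assumption made for illustration: the intent names, the keyword lists, and the `Conversation` class are hypothetical, and keyword matching stands in for the trained language-understanding models a real conversational AI depends on.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical intents and trigger words (illustrative only).
INTENT_KEYWORDS = {
    "weather": {"weather", "forecast", "rain", "temperature"},
    "restaurant_reservation": {"restaurant", "table", "dinner", "reservation"},
}


def detect_intent(utterance):
    """Return the intent whose keywords best match, or None if none match."""
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    scores = {name: len(words & kws) for name, kws in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


@dataclass
class Conversation:
    """Tracks which intent is active so follow-up questions stay in context."""
    active_intent: Optional[str] = None
    history: List[str] = field(default_factory=list)

    def handle(self, utterance):
        detected = detect_intent(utterance)
        if detected is not None:
            # A recognized intent switches the context immediately.
            self.active_intent = detected
        # Otherwise treat the utterance as a follow-up to the active intent.
        intent = self.active_intent or "unknown"
        self.history.append(intent)
        return intent


chat = Conversation()
print(chat.handle("What's the weather like today?"))
print(chat.handle("And tomorrow?"))
print(chat.handle("Book a table for dinner at seven"))
```

The second utterance names no intent, so it is routed to the one already active; the third switches context. Handling the many ways one person can phrase the same request is the hard part that this sketch leaves out.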
Multiple open-source platforms already exist to allow your teams to build a functional, if imperfect, AI in a reasonable time frame and at a cost that provides a positive return. Companies like Apple, Google, Microsoft, and Amazon have each poured hundreds of millions of dollars, and the efforts of some of the brightest PhDs on the planet, into advanced NLP interfaces. These open-source libraries allowed us to build a foundational platform for a simple conversational AI in about a year, with a team of 3-4 people, for approximately $500,000. That early platform has a few simple intents, no pre-conversation awareness of the user (since that requires an interface with secure systems), and no memory of prior user sessions. From there, depending on the complexity of the intent, we have been able to deliver each new function for between $10,000 for a simple intent (e.g., weather) and $25,000 for a more complex intent (e.g., conference room reservations).
We view the platform as an investment to be spread across all apps built in a two-year payback period. Since we expect to add 48 new intents over that period, amortizing the platform adds roughly $10,000 to the cost of each intent. That is one way we cost-justify a new intent. For example, allowing people to self-service on a lost/forgotten password or other simple IT issues saves the time of at least one IT person a year. A quick calculation using the IRR function in Excel, assuming that role costs $100,000/year, puts the single-year ROI of that “complex” app at ~260 percent, which makes it worth doing. Cost is only one factor we use in prioritizing which intents to build, and sometimes we invest even without a strong ROI. But we do use it as a guideline.
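The amortization arithmetic can be written out in a few lines of Python. The dollar figures come from the text; the function names are invented for illustration. The ratio at the end is a plain first-year benefit-over-cost calculation, not the Excel IRR calculation mentioned above, so it should not be expected to reproduce the ~260 percent figure, whose exact inputs are not given here.

```python
PLATFORM_COST = 500_000        # foundational platform, from the text
PLANNED_INTENTS = 48           # intents expected over the two-year payback period
SIMPLE_INTENT_COST = 10_000    # e.g. weather
COMPLEX_INTENT_COST = 25_000   # e.g. conference room reservations
ANNUAL_SAVINGS = 100_000       # assumed cost of the IT role whose time is saved


def amortized_platform_share(platform_cost, planned_intents):
    """Spread the platform investment evenly across the planned intents."""
    return platform_cost / planned_intents


def loaded_cost(build_cost):
    """Build cost of one intent plus its share of the platform."""
    return build_cost + amortized_platform_share(PLATFORM_COST, PLANNED_INTENTS)


def simple_roi(annual_benefit, cost):
    """First-year net benefit divided by cost (not an IRR)."""
    return (annual_benefit - cost) / cost


share = amortized_platform_share(PLATFORM_COST, PLANNED_INTENTS)
complex_cost = loaded_cost(COMPLEX_INTENT_COST)
print(f"Platform share per intent: ${share:,.0f}")
print(f"Loaded cost of a simple intent: ${loaded_cost(SIMPLE_INTENT_COST):,.0f}")
print(f"Loaded cost of a complex intent: ${complex_cost:,.0f}")
print(f"Simple first-year ROI, complex intent: {simple_roi(ANNUAL_SAVINGS, complex_cost):.0%}")
```

Changing `PLANNED_INTENTS` shows how sensitive the per-intent cost is to the number of intents actually delivered, which is one more reason developer adoption of the platform matters.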
The following table provides an overview of some open-source tools worth looking into.
A conversational AI platform needs two forms of adoption to succeed: one with end users and, equally important, one with developers.
Achieving adoption of any new technology by a majority of end users is an arduous process. It is particularly difficult when users are reluctant to give up existing tools and ingrained behaviors. Purveyors of new technologies to consumers are well aware of this. They build a substantial adoption curve and associated marketing budgets into their business models. But developers and even product managers at many companies, especially those in B2B markets, have little experience with consumer adoption curves. They don’t factor it into their plans and, equally important, their managers don’t understand that curve either. There is very little patience or capability in many organizations for the kind of persistent messaging and salesmanship needed to gain widespread adoption of conversational interfaces. The result is that many front-end NLP projects never achieve adoption, which limits further investment.
My teams overcame this challenge with our ABBY project by treating the deployment of ABBY’s intents like any other typical new product marketing problem — we assigned a part-time product marketer to develop and execute marketing programs for internal adoption. We also developed a group of early adopters/beta testers who understand that part of their role is to promote the new intents to their peers in the organization. Lastly, our entire team is tasked with selling ABBY’s capabilities wherever we can when interacting with people in the organization. Just like in any standalone small company, everyone on the team is a salesperson.
But no matter how well you execute on internal marketing, front-end NLP is still a long-term evolution, and both the end user’s behavior and the capabilities of the AI are going to evolve over time as developers, the AI, and end users interact. It is for this reason that it is critical to develop an NLP platform for developers across the organization to use. Just as in an open marketplace, no one group can conceive of or build all the apps that may be important to the other various users or groups in your company. One way to enhance adoption is to have lots of teams building NLP apps for the conversational front end. Thus, developer adoption is a second critical element in the adoption cycle. We use many tools to promote adoption. We actively reach out to developers through team meetings, one-on-ones, and an NLP Special Interest Group. We also have NLP projects available for our regular quarterly hackathons.
This brings us to another design issue — efficacy. The intents to invest in are those that make an existing experience more effective, more efficient, or both. If it takes longer to do something conversationally, people will not use your AI. This is especially true where there is an ingrained behavior and significant, conscious extra effort is required for the end user to shift behavior. In our case, our phone directory project was a good investment because it was previously time consuming and inconvenient to get a person’s contact information from our internal systems. Once people used ABBY’s directory intent a few times, they began to switch. The same is true of room reservations. But when users were able to perform Google searches from within ABBY, we got very negative feedback. People thought we were silly to invest in an app when they could just switch to a browser and do a search that provided more robust information content in a format they understood.
A question I often get: “Where is the killer app?” The one area where conversational AI is making substantial inroads is customer service. But customer self-service is an instance of a broader class you can think of as diagnostics. That class of problems may define what can or cannot be a killer app for conversational AI. The question to ask with task-oriented users is, “When do they want or need to talk at length to an AI to accomplish a goal?” The answer is twofold. One element is where the resolution of a task requires many back-and-forth interactions between the user and the “helper.” The second is when many words are needed because the item to be described is inexact, so the user is trying to string together a “close enough” description for the listener to guess at the actual item. Computer service is a great example. Buying a complex product like data via an online interface is another. A third is research and tabulation of information from data, which can be thought of as the “diagnosis of data” to determine an information outcome. In all these cases, end users must engage in a “ranging exercise,” where they start with a broad concept or set of possibilities and, through a series of interactive steps, restrict the set of possibilities until a final result is found or a set of conclusions is reached.
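The “ranging exercise” is easy to picture as code. The sketch below uses a made-up catalog of data products and a made-up sequence of user answers; the attribute names and both functions are hypothetical. Each answer removes candidates until one remains.

```python
# Hypothetical catalog of data products (all names and attributes invented).
CATALOG = [
    {"name": "Retail purchase history", "region": "US", "subject": "consumer", "refresh": "monthly"},
    {"name": "Household demographics", "region": "US", "subject": "consumer", "refresh": "quarterly"},
    {"name": "Business firmographics", "region": "US", "subject": "business", "refresh": "monthly"},
    {"name": "Retail purchase history EU", "region": "EU", "subject": "consumer", "refresh": "monthly"},
    {"name": "Business credit profiles", "region": "EU", "subject": "business", "refresh": "quarterly"},
]


def narrow(candidates, attribute, value):
    """Keep only the candidates consistent with one answer from the user."""
    return [c for c in candidates if c[attribute] == value]


def ranging_exercise(candidates, answers):
    """Apply answers one at a time, stopping once a single candidate remains.

    Returns the surviving candidates and the candidate count after each step.
    """
    counts = [len(candidates)]
    for attribute, value in answers:
        if len(candidates) <= 1:
            break
        candidates = narrow(candidates, attribute, value)
        counts.append(len(candidates))
    return candidates, counts


# Each tuple stands for one back-and-forth turn in the conversation.
answers = [("region", "US"), ("subject", "consumer"), ("refresh", "quarterly")]
result, counts = ranging_exercise(CATALOG, answers)
print(counts)
print([item["name"] for item in result])
```

The conversational part lies in deciding which question to ask next and in mapping an inexact, “close enough” description onto attributes like these. The narrowing itself is simple.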
The reality is, however, there may be no killer app. Very few apps are used by everyone. Given that even an intent as universal as a phone directory requires promotion, imagine how much harder it is to gain adoption of intents focused on a single set of users. The analogy is mobile phones. There are very few universal apps in mobile. Most people use 10-15 apps, but the exact 10-15 are unique to each person. App downloads through the app store have a short “head end” and a long tail. App use is very idiosyncratic. It is very similar with apps within an organization, with the caveat that an individual’s role correlates strongly with which apps they are most interested in. This is why having a platform, and adoption by developers, is so critical.