Deploy generative AI self-service question answering using the QnABot on AWS solution powered by Amazon Lex with Amazon Kendra and Amazon Bedrock

October 2023: Amazon Bedrock is now generally available.

Powered by Amazon Lex, the QnABot on AWS solution is an open-source, multi-channel, multi-language conversational chatbot. QnABot allows you to quickly deploy self-service conversational AI into your contact center, websites, and social media channels, reducing costs, shortening hold times, and improving customer experience. Customers now want to experiment to see how the power of large language models (LLMs) can further improve the customer experience with generative AI capabilities. This includes automatically generating answers from existing company documents and knowledge bases, and making their self-service chatbots more conversational.

Our latest QnABot releases, v5.4.0+, now let you experiment using an LLM to disambiguate customer questions by taking conversational context into account, dynamically generating answers from relevant FAQs or Amazon Kendra search results and document passages. It also provides attribution and transparency by displaying links to the reference documents and context passages that were used by the LLM to construct the answers.

Generative AI can be used to rapidly generate content for common customer questions by searching through and summarizing the most relevant details from existing FAQs and knowledge base articles. Watch the demo video below to see a fictional insurance chatbot example where a customer is asking about specific insurance policies. The bot can generate a response and point to the knowledge sources that are used to create the response, helping provide additional context for the customer.

When you deploy QnABot, you can choose to automatically deploy an open-source LLM (Falcon-40B-instruct) on an Amazon SageMaker endpoint. The LLM landscape is constantly evolving: new models are released frequently, and our customers want to experiment with different models and prompt configurations to see what works best for their use cases. This is why QnABot also integrates with Amazon Bedrock, and with any other LLM through an AWS Lambda function. To help you get started, we’ve released a set of sample one-click deployable Lambda functions (plugins) that integrate QnABot with Amazon Bedrock and other LLMs.

In this post, we introduce the new generative AI features for QnABot and walk through a tutorial to create, deploy, and customize QnABot to use these features. We also discuss some relevant use cases.

New generative AI features

Using the LLM, QnABot now has two new important features, which we discuss in this section.

Generate answers to questions from Amazon Kendra search results or text passages

QnABot can now generate concise answers to questions from document extracts provided by an Amazon Kendra search, or text passages created or imported directly. This provides the following advantages:

The number of FAQs that you need to maintain and import into QnABot is reduced, because you can now synthesize concise answers on the fly from your existing documents.
Generated answers can be tailored to the intended channel. For example, you can keep answers short and concise for voice channel contact center bots, while website or text bots can provide more detailed information.
Generated answers are fully compatible with QnABot’s multi-language support—users can interact in their chosen languages and receive generated answers in the same language.
Generated answers can include links to the reference documents and context passages used, to provide attribution and transparency on how the LLM constructed the answers.

For example, when asked “What is Amazon Lex?”, QnABot can retrieve relevant passages from an Amazon Kendra index (containing AWS documentation). QnABot then asks (prompts) the LLM to answer the question based on the context of the passages (which can also optionally be viewed in the web client). The following screenshot shows an example.
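To make the prompting step more concrete, here is a minimal Python sketch of how retrieved passages and a user question might be combined into a single prompt. The template wording, the function name `build_qa_prompt`, and the `max_chars` budget are illustrative assumptions, not QnABot's actual defaults; the real prompt is controlled through QnABot settings.

```python
from typing import List

# Hypothetical template for illustration; not the QnABot default prompt.
QA_TEMPLATE = (
    "Answer the question using only the context below. "
    "If the context does not contain the answer, say \"Sorry, I don't know\".\n\n"
    "Context:\n{context}\n\n"
    "Question: {query}\n"
    "Answer:"
)


def build_qa_prompt(query: str, passages: List[str], max_chars: int = 2000) -> str:
    """Join retrieved passages into one context block and fill the template."""
    selected: List[str] = []
    used = 0
    for passage in passages:
        text = passage.strip()
        if not text:
            continue  # skip empty passages
        # Always keep the first passage; stop once the budget would be exceeded.
        if selected and used + len(text) > max_chars:
            break
        selected.append(text)
        used += len(text)
    context = "\n---\n".join(selected)
    return QA_TEMPLATE.format(context=context, query=query.strip())


if __name__ == "__main__":
    sample_passages = [
        "Amazon Lex is an AWS service for building conversational interfaces.",
        "Amazon Kendra is an intelligent search service.",
    ]
    print(build_qa_prompt("What is Amazon Lex?", sample_passages))
```

Capping the context length is one simple way to stay within a model's input limit; a real deployment would more likely count tokens than characters.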

Disambiguate follow-up questions that rely on preceding conversation context

Understanding the direction and context of an ever-evolving conversation is key to building natural, human-like conversational interfaces. User queries often require a bot to interpret requests based on conversation memory and context. Now QnABot will ask the LLM to generate a disambiguated question based on the conversation history. This can then be used as a search query to retrieve the FAQs, passages, or Amazon Kendra results to answer the user’s question. The following is an example chat history:

Human: What is Amazon Lex?
AI: "Amazon Lex is an AWS service for building conversational interfaces for applications using voice and text..."
Human: Can it integrate with my CRM?

QnABot uses the LLM to rewrite the follow-up question to make “it” unambiguous, for example, “Can Amazon Lex integrate with my CRM system?” This allows users to interact like they would in a human conversation, and QnABot generates clear search queries to find the relevant FAQs or document passages that have the information to answer the user’s question.
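As a rough illustration of this rewriting step, the following Python sketch formats a chat history and a follow-up question into a prompt that asks a model for a standalone question. The template text and the helper names are assumptions made for this example; QnABot's own template is configurable and may look quite different.

```python
from typing import List, Tuple

# Hypothetical template for illustration only.
QUERY_TEMPLATE = (
    "Given the chat history and a follow-up question, rewrite the follow-up "
    "as a standalone question.\n\n"
    "Chat history:\n{history}\n\n"
    "Follow-up question: {input}\n"
    "Standalone question:"
)


def format_history(turns: List[Tuple[str, str]]) -> str:
    """Render (speaker, text) pairs as 'Speaker: text' lines."""
    return "\n".join(f"{speaker}: {text}" for speaker, text in turns)


def build_disambiguation_prompt(turns: List[Tuple[str, str]], follow_up: str) -> str:
    """Fill the template with the rendered history and the new user input."""
    return QUERY_TEMPLATE.format(history=format_history(turns), input=follow_up)


if __name__ == "__main__":
    history = [
        ("Human", "What is Amazon Lex?"),
        ("AI", "Amazon Lex is an AWS service for building conversational interfaces."),
    ]
    print(build_disambiguation_prompt(history, "Can it integrate with my CRM?"))
```

The model's completion (for example, a question that names Amazon Lex explicitly) would then be used as the search query.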

These new features make QnABot more conversational and provide the ability to dynamically generate responses based on a knowledge base. This is still an experimental feature with tremendous potential. We strongly encourage users to experiment to find the best LLM and corresponding prompts and model parameters to use. QnABot makes it straightforward to experiment!

Tutorial

Time to try it! Let’s deploy the latest QnABot (v5.4.0 or later) and enable the new generative AI features. The high-level steps are as follows:

Create and populate an Amazon Kendra index.
Deploy the LLM plugin for Amazon Bedrock (optional).
Deploy QnABot.
Configure QnABot for your Lambda plugin (if using a plugin).
Access the QnABot web client and start experimenting.
Customize behavior using QnABot settings.
Add curated Q&As and text passages to the knowledge base.

Create and populate an Amazon Kendra index

Download and use the following AWS CloudFormation template to create a new Amazon Kendra index.

This template includes sample data containing AWS online documentation for Amazon Kendra, Amazon Lex, and SageMaker. Deploying the stack takes about 30 minutes, followed by about 15 minutes to synchronize and ingest the sample data into the index.

When the Amazon Kendra index stack is successfully deployed, navigate to the stack’s Outputs tab and note the Index Id, which you will use later when deploying QnABot.
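If you prefer to read stack outputs programmatically rather than from the console, a small helper like the following could flatten them into a dictionary. This is a sketch under assumptions: the output key name `KendraIndexId` and the placeholder GUID are hypothetical, so check your stack's Outputs tab for the actual key.

```python
from typing import Any, Dict


def stack_outputs(describe_stacks_response: Dict[str, Any]) -> Dict[str, str]:
    """Flatten the Outputs list of the first stack into a key -> value dict."""
    stacks = describe_stacks_response.get("Stacks", [])
    if not stacks:
        raise ValueError("response contains no stacks")
    outputs = stacks[0].get("Outputs", [])
    return {item["OutputKey"]: item["OutputValue"] for item in outputs}


if __name__ == "__main__":
    # Hand-written sample in the shape returned by CloudFormation DescribeStacks.
    # With boto3 installed and credentials configured, the response would come from:
    #   boto3.client("cloudformation").describe_stacks(StackName="<your-stack-name>")
    sample = {
        "Stacks": [
            {
                "Outputs": [
                    # Hypothetical key and placeholder value.
                    {
                        "OutputKey": "KendraIndexId",
                        "OutputValue": "00000000-0000-0000-0000-000000000000",
                    }
                ]
            }
        ]
    }
    print(stack_outputs(sample)["KendraIndexId"])
```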

Alternatively, if you already have an Amazon Kendra index with your own content, you can use it instead with your own example questions for the tutorial.

Deploy the LLM Lambda plugin for Amazon Bedrock (optional)

In this section, we show you how to deploy a pre-built Lambda function to integrate QnABot with Amazon Bedrock. Skip to the next step if you want to use the built-in LLM instead.

Additional plugin options are also available. Review your options from the qnabot-on-aws-plugin-samples repository README.

Deploy the Bedrock plugin QNABOT-BEDROCK-EMBEDDINGS-AND-LLM by choosing Launch Stack in the Deploy a new Plugin stack section, which will deploy into the us-east-1 Region by default (to deploy in other Regions, see Build and Publish QnABot Plugins CloudFormation artifacts).

When the Plugin stack is successfully deployed, navigate to the stack’s Outputs tab (see the following screenshot) and inspect its contents, which you will use in the following steps to deploy and configure QnABot. Keep this tab open in your browser.

Deploy QnABot

Choose Launch Solution from the QnABot implementation guide to deploy the latest QnABot template via AWS CloudFormation. Provide the following parameters:

For DefaultKendraIndexId, use the Amazon Kendra Index ID (a GUID) you collected earlier.
For EmbeddingsApi (see Semantic Search using Text Embeddings), choose one of the following:
- SAGEMAKER (the default built-in embeddings model)
- LAMBDA (Recommended: use the Amazon Bedrock embeddings API)
  - For EmbeddingsLambdaArn, use the EmbeddingsLambdaArn output value from your BEDROCK-EMBEDDINGS-AND-LLM Plugin stack.
For LLMApi (see Query Disambiguation for Conversational Retrieval, and Generative Question Answering), choose one of the following:
- SAGEMAKER (the default built-in LLM)
- LAMBDA (Recommended: use the Amazon Bedrock LLM API)
  - For LLMLambdaArn, use the LLMLambdaArn output value from your BEDROCK-EMBEDDINGS-AND-LLM Plugin stack.

For all other parameters, accept the defaults (see the implementation guide for parameter definitions), and proceed to launch the QnABot stack.
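For readers who script their deployments, the parameter choices above can be sketched as a small Python helper that builds the list-of-dicts shape CloudFormation uses for stack parameters. The function name and the validation rules are assumptions for illustration, and the ARNs in the usage example are placeholders; only the parameter names come from the steps above.

```python
from typing import Dict, List


def build_qnabot_parameters(
    index_id: str,
    embeddings_api: str = "SAGEMAKER",
    llm_api: str = "SAGEMAKER",
    embeddings_lambda_arn: str = "",
    llm_lambda_arn: str = "",
) -> List[Dict[str, str]]:
    """Return CloudFormation-style parameters for the choices described above."""
    if embeddings_api == "LAMBDA" and not embeddings_lambda_arn:
        raise ValueError("EmbeddingsLambdaArn is required when EmbeddingsApi is LAMBDA")
    if llm_api == "LAMBDA" and not llm_lambda_arn:
        raise ValueError("LLMLambdaArn is required when LLMApi is LAMBDA")

    values = {
        "DefaultKendraIndexId": index_id,
        "EmbeddingsApi": embeddings_api,
        "LLMApi": llm_api,
    }
    # Only pass the Lambda ARNs when the LAMBDA option is selected.
    if embeddings_api == "LAMBDA":
        values["EmbeddingsLambdaArn"] = embeddings_lambda_arn
    if llm_api == "LAMBDA":
        values["LLMLambdaArn"] = llm_lambda_arn
    return [{"ParameterKey": k, "ParameterValue": v} for k, v in values.items()]


if __name__ == "__main__":
    # Placeholder values; substitute the outputs from your own stacks.
    params = build_qnabot_parameters(
        index_id="00000000-0000-0000-0000-000000000000",
        embeddings_api="LAMBDA",
        llm_api="LAMBDA",
        embeddings_lambda_arn="arn:aws:lambda:us-east-1:111122223333:function:example-embeddings",
        llm_lambda_arn="arn:aws:lambda:us-east-1:111122223333:function:example-llm",
    )
    for p in params:
        print(p)
```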

Configure QnABot for the Amazon Bedrock plugin

If you deployed QnABot using the Amazon Bedrock plugin, update the QnABot model parameters and prompt templates using the outputs from the plugin stack. For more information, see Update QnABot Settings. If you used the SageMaker (built-in) LLM option, skip to the next step, because the settings are already configured for you.

Access the QnABot web client and start experimenting

On the AWS CloudFormation console, choose the Outputs tab of the QnABot CloudFormation stack and choose the ClientURL link. Alternatively, launch the client by choosing QnABot on AWS Client from the Content Designer tools menu.

Now, try to ask questions related to AWS services, for example:

What is Amazon Lex?
How does SageMaker scale up inference workloads?
Is Kendra a search service?

Then you can ask follow-up questions without specifying the previously mentioned services or context, for example:

Is it secure?
Does it scale?

Customize behavior using QnABot settings

You can customize many settings on the QnABot Content Designer Settings page—see README – LLM Settings for a full list of relevant settings. For example, try the following:

Set ENABLE_DEBUG_RESPONSES to TRUE, save the settings, and try the previous questions again. Now you will see additional debug output at the top of each response, showing you how the LLM generates the Amazon Kendra search query based on the chat history, how long the LLM inferences took to run, and more. For example:
```
[User Input: "Is it fast?", LLM generated query (1207 ms): "Does Amazon Kendra provide search results quickly?", Search string: "Is it fast? / Does Amazon Kendra provide search results quickly?"["LLM: LAMBDA"], Source: KENDRA RETRIEVE API
```
Set ENABLE_DEBUG_RESPONSES back to FALSE, set LLM_QA_SHOW_CONTEXT_TEXT and LLM_QA_SHOW_SOURCE_LINKS to FALSE, and try the examples again. Now the context and source links are not shown, and the output contains only the LLM-generated response.
If you feel adventurous, experiment also with the LLM prompt template settings—LLM_GENERATE_QUERY_PROMPT_TEMPLATE and LLM_QA_PROMPT_TEMPLATE. Refer to README – LLM Settings to see how you can use placeholders for runtime values like chat history, context, user input, query, and more. Note that the default prompts can most likely be improved and customized to better suit your use cases, so don’t be afraid to experiment! If you break something, you can always revert to the default settings using the RESET TO DEFAULTS option on the settings page.
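When comparing prompts or models, it can help to pull the latency and the rewritten query out of the debug output. The following Python sketch does this with a regular expression. The pattern is inferred only from the single debug line shown earlier in this section, so treat the format as an assumption that may not hold across QnABot versions.

```python
import re
from typing import Dict, Optional, Union

# Pattern inferred from one example debug line; the format is an assumption.
DEBUG_PATTERN = re.compile(
    r'User Input: "(?P<user_input>[^"]*)", '
    r'LLM generated query \((?P<latency_ms>\d+) ms\): "(?P<llm_query>[^"]*)"'
)


def parse_debug_line(line: str) -> Optional[Dict[str, Union[str, int]]]:
    """Extract the user input, LLM latency, and rewritten query, if present."""
    match = DEBUG_PATTERN.search(line)
    if match is None:
        return None
    return {
        "user_input": match.group("user_input"),
        "latency_ms": int(match.group("latency_ms")),
        "llm_query": match.group("llm_query"),
    }


if __name__ == "__main__":
    example = (
        '[User Input: "Is it fast?", LLM generated query (1207 ms): '
        '"Does Amazon Kendra provide search results quickly?", Search string: '
        '"Is it fast? / Does Amazon Kendra provide search results quickly?"'
        '["LLM: LAMBDA"], Source: KENDRA RETRIEVE API'
    )
    print(parse_debug_line(example))
```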

Add curated Q&As and text passages to the knowledge base

QnABot can, of course, continue to answer questions based on curated Q&As. It can also use the LLM to generate answers from text passages created or imported directly into QnABot, in addition to using an Amazon Kendra index.

QnABot attempts to find a good answer to the disambiguated user question in the following sequence:

QnA items
Text passage items
Amazon Kendra index
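The sequence above is a simple fallback chain: try each source in order and stop at the first one that yields a good answer. A minimal Python sketch of that idea follows. The retriever functions here are stand-in stubs invented for the example; they do not call QnABot, OpenSearch, or Amazon Kendra.

```python
from typing import Callable, List, Optional, Tuple

# A retriever returns an answer string, or None when it has no good answer.
Retriever = Callable[[str], Optional[str]]


def first_good_answer(
    question: str, retrievers: List[Tuple[str, Retriever]]
) -> Optional[Tuple[str, str]]:
    """Try each source in order; return (source name, answer) for the first hit."""
    for name, retrieve in retrievers:
        answer = retrieve(question)
        if answer is not None:
            return name, answer
    return None


# Stub retrievers, for illustration only.
def qna_items(question: str) -> Optional[str]:
    return "Curated answer" if "echo show" in question.lower() else None


def text_passages(question: str) -> Optional[str]:
    return "Passage-based answer" if "humpty" in question.lower() else None


def kendra_index(question: str) -> Optional[str]:
    return "Kendra-based answer" if "kendra" in question.lower() else None


if __name__ == "__main__":
    chain = [
        ("QnA items", qna_items),
        ("Text passage items", text_passages),
        ("Amazon Kendra index", kendra_index),
    ]
    print(first_good_answer("Where did Humpty Dumpty sit?", chain))
    print(first_good_answer("What is the weather?", chain))
```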

Let’s try some examples.

On the QnABot Content Designer tools menu, choose Import, then load the two example packages:

TextPassages-NurseryRhymeExamples
blog-samples-final

QnABot can use text embeddings to provide semantic search capability (using QnABot’s built-in OpenSearch index as a vector store), which improves accuracy and reduces question tuning, compared to standard OpenSearch keyword-based matching. To illustrate this, try questions like the following:

“Tell me about the Alexa device with the screen”
“Tell me about Amazon’s video streaming device?”

These should ideally match the sample QnA items you imported, even though the words used to ask the question are poor keyword matches (but good semantic matches) with the configured QnA items: Alexa.001 (What is an Amazon Echo Show) and FireTV.001 (What is an Amazon Fire TV).

Even if you are not (yet) using Amazon Kendra (and you should!), QnABot can also answer questions based on passages created or imported into Content Designer. The following questions (and follow-up questions) are all answered from an imported text passage item that contains the nursery rhyme 0.HumptyDumpty:

“Where did Humpty Dumpty sit before he fell?”
“What happened after he fell? Was he OK?”

When using embeddings, a good answer is one whose similarity score exceeds the value of the corresponding threshold setting. See Semantic question matching, using Large Language Model Text Embeddings for more details on how to test and tune the threshold settings.
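To illustrate the idea of a similarity threshold, the following Python sketch scores two embedding vectors with cosine similarity and compares the result to a cutoff. Cosine similarity, the tiny example vectors, and the threshold value are all assumptions chosen for illustration; the score that QnABot computes and its threshold settings may be defined differently.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_good_answer(
    question_vec: Sequence[float], item_vec: Sequence[float], threshold: float
) -> bool:
    """An item counts as a good answer when its score exceeds the threshold."""
    return cosine_similarity(question_vec, item_vec) > threshold


if __name__ == "__main__":
    question = [0.9, 0.1, 0.0]   # toy 3-dimensional embeddings
    close_item = [0.8, 0.2, 0.0]
    far_item = [0.0, 0.1, 0.9]
    threshold = 0.8              # hypothetical value
    print(is_good_answer(question, close_item, threshold))
    print(is_good_answer(question, far_item, threshold))
```

Raising the threshold trades recall for precision: fewer items match, but the ones that do are closer to the question.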

If there are no good answers, or if the LLM’s response matches the regular expression defined in LLM_QA_NO_HITS_REGEX, then QnABot invokes the configurable Custom Don’t Know (no_hits) behavior, which, by default, returns a message saying “You stumped me.”
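The regular expression check can be pictured as follows. In this Python sketch, the pattern value is a hypothetical example (not the default value of LLM_QA_NO_HITS_REGEX), and the fallback message is hard-coded where QnABot would use its configurable no_hits response.

```python
import re
from typing import Optional

# Hypothetical pattern; set LLM_QA_NO_HITS_REGEX to match your prompt's wording.
NO_HITS_PATTERN = r"Sorry, I don't know"
# Stand-in for the configurable Custom Don't Know (no_hits) response.
DONT_KNOW_MESSAGE = "You stumped me"


def is_no_hits(llm_response: str, pattern: str = NO_HITS_PATTERN) -> bool:
    """True when the LLM's response matches the 'no answer' pattern."""
    return re.search(pattern, llm_response, flags=re.IGNORECASE) is not None


def final_answer(llm_response: Optional[str]) -> str:
    """Fall back to the don't-know message when there is no usable answer."""
    if llm_response is None or is_no_hits(llm_response):
        return DONT_KNOW_MESSAGE
    return llm_response


if __name__ == "__main__":
    print(final_answer("Amazon Lex is a service for building chatbots."))
    print(final_answer("Sorry, I don't know the answer to that."))
    print(final_answer(None))
```

This works best when the prompt instructs the model to reply with a fixed phrase whenever the context does not contain the answer, so that the pattern has something predictable to match.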

Try some experiments by creating Q&As or text passage items in QnABot, as well as using an Amazon Kendra index for fallback generative answers. Experiment (using the TEST tab in the designer) to find the best values to use for the embedding threshold settings to get the behavior you want. It’s hard to get the perfect balance, but see if you can find a good enough balance that results in useful answers most of the time.

Clean up

You can, of course, leave QnABot running to experiment with it and show it to your colleagues! But it does incur some cost—see Plan your deployment – Cost for more details. To remove the resources and avoid costs, delete the following CloudFormation stacks:

QnABot stack
LLM Plugin stack (if applicable)
Amazon Kendra index stack

Use case examples

These new features make QnABot relevant for many customer use cases such as self-service customer service and support bots and automated web-based Q&A bots. We discuss two such use cases in this section.

Integrate with a contact center

QnABot’s automated question answering capabilities deliver effective self-service for inbound voice calls and chats in contact centers, with compelling outcomes. For example, see how Kentucky Transportation Cabinet reduced call hold time and improved customer experience with self-service virtual agents using Amazon Connect and Amazon Lex. Integrating the new generative AI features strengthens this value proposition further by dynamically generating reliable answers from existing content such as documents, knowledge bases, and websites. This eliminates the need for bot designers to anticipate and manually curate responses to every possible question that a user might ask. To integrate QnABot with Amazon Connect, see Connecting QnABot on AWS to an Amazon Connect call center.

The LLM-powered QnABot can also play a pivotal role as an automated real-time agent assistant. In this solution, QnABot passively listens to the conversation and uses the LLM to generate real-time suggestions for the human agents based on certain cues. It’s straightforward to set up and try—give it a go! This solution can be utilized with both Amazon Connect and other on-prem and cloud contact centers. For more information, see Live call analytics and agent assist for your contact center with Amazon language AI services.

To discover other generative AI use cases for customer service with Amazon Connect, see How contact center leaders can prepare for generative AI.

Integrate with a website

Embedding QnABot in your websites and applications allows users to get automated assistance with natural dialogue. If you integrate QnABot into your Amazon Connect flows, you can use the Amazon Connect chat user interface for an integrated web experience that combines bot interaction and live agent chat. For other web deployment scenarios, see Deploy a Web UI for your Chatbot.

The QnABot on AWS plugin samples repository

As shown in this post, QnABot v5.4.0+ not only offers built-in support for embeddings and LLMs hosted on SageMaker, but it also offers the ability to easily integrate with any other LLM by using Lambda functions. You can author your own custom Lambda functions or get started faster with one of the samples we have provided in our new qnabot-on-aws-plugin-samples repository.
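To give a feel for what a custom LLM Lambda function might look like, here is a minimal Python sketch. It assumes the function receives an event with a `prompt` string and an optional `parameters` object, and returns a `generated_text` string. Those field names reflect our understanding of the plugin interface rather than anything specified in this post, so verify them against the repository README before relying on them. The `fake_generate` function is a stub standing in for a real model call.

```python
from typing import Any, Callable, Dict

GenerateFn = Callable[[str, Dict[str, Any]], str]


def make_handler(generate: GenerateFn) -> Callable[[Dict[str, Any], Any], Dict[str, str]]:
    """Wrap a text-generation function in a Lambda-style handler."""

    def handler(event: Dict[str, Any], context: Any) -> Dict[str, str]:
        # Assumed event shape: {"prompt": "...", "parameters": {...}}
        prompt = event["prompt"]
        parameters = event.get("parameters") or {}
        # Assumed response shape: {"generated_text": "..."}
        return {"generated_text": generate(prompt, parameters)}

    return handler


def fake_generate(prompt: str, parameters: Dict[str, Any]) -> str:
    """Stub model: replace with a call to the LLM of your choice."""
    return f"(stub answer for a {len(prompt)}-character prompt)"


# Entry point name that a Lambda function would be configured to use.
lambda_handler = make_handler(fake_generate)

if __name__ == "__main__":
    print(lambda_handler({"prompt": "What is Amazon Lex?", "parameters": {}}, None))
```

Keeping the model call behind a plain function makes it easy to swap providers and to test the handler locally without any AWS resources.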

This repository includes a ready-to-deploy plugin for Amazon Bedrock, which supports both embeddings and text generation requests. Now that Amazon Bedrock is generally available, we expect to integrate it directly with QnABot, but why wait? Use our sample plugin to start experimenting!

Today’s LLM innovation cycle is driving a breakneck pace of new model releases, each aiming to surpass the last. This repository will expand to include additional QnABot plugin samples over time. We plan to add integrations for more LLMs, embeddings, and common use case examples involving Lambda hooks and knowledge bases. These plugins are offered as-is without warranty, for your convenience—users are responsible for supporting and maintaining them once deployed.

We hope that the QnABot plugins repository will mature into a thriving open-source community project. Watch the qnabot-on-aws-plugin-samples GitHub repo to receive updates on new plugins and features, use the Issues forum to report problems or provide feedback, and contribute improvements via pull requests. Contributions are welcome!

Conclusion

In this post, we introduced the new generative AI features for QnABot and walked through a solution to create, deploy, and customize QnABot to use these features. We also discussed some relevant use cases. Automating repetitive inquiries frees up human workers and boosts productivity. Rich responses create engaging experiences. Deploying the LLM-powered QnABot can help you elevate the self-service experience for customers and employees.

Don’t miss this opportunity—get started today. Experiment, and see if it provides value for you! You may revolutionize the user experience on your QnABot deployment!

About the authors

Clevester Teo is a Senior Partner Solutions Architect at AWS, focused on the Public Sector partner ecosystem. He enjoys building prototypes, staying active outdoors, and experiencing new cuisines. Clevester is passionate about experimenting with emerging technologies and helping AWS partners innovate and better serve public sector customers.

Windrich is a Solutions Architect at AWS who works with customers in industries such as finance and transport to help accelerate their cloud adoption journey. He is especially interested in serverless technologies and how customers can leverage them to bring value to their business. Outside of work, Windrich enjoys playing and watching sports, as well as exploring different cuisines around the world.

Bob Strahan is a Principal Solutions Architect in the AWS Language AI Services team.