AWS Machine Learning Blog

Perform intelligent search across emails in your Google Workspace using the Gmail connector for Amazon Kendra

Many organizations use Gmail for their business email needs. Gmail for Business is part of Google Workspace, which provides a set of productivity and collaboration tools like Google Drive, Google Docs, Google Sheets, and more. For any organization, emails contain a wealth of information, which could be within the subject of an email, the message content, or even email attachments. Performing an intelligent search on email interactions with coworkers can help find answers to questions, thereby improving employee productivity and enhancing the overall customer experience for the organization.

Amazon Kendra is a highly accurate and intelligent search service that allows your users to search unstructured and structured data using natural language processing (NLP) and advanced search algorithms. You can now use the Gmail connector for Amazon Kendra to index emails and email attachments in Gmail, and search for answers to your questions on this content using intelligent search in Amazon Kendra, powered by machine learning (ML).

This post walks you through the process of configuring the Gmail connector for Amazon Kendra for your organization’s Google Workspace, allowing you to index emails based on a defined scope and take advantage of the intelligent search capabilities of Amazon Kendra.

Solution overview

A data source is a data repository or location that Amazon Kendra connects to in order to index your documents or content. After you create an Amazon Kendra index, you can create one or many data sources and configure them to start ingesting documents. In our solution, we ingest emails and attachments from Gmail by configuring the new Gmail data source connector to include only emails that meet defined filter criteria. After the configuration is complete, we synchronize the data source to index the documents, which lets you perform intelligent search on the Amazon Kendra index.

Prerequisites

To enable the Gmail connector for Amazon Kendra, you need the following:

  • An AWS account
  • A Google Workspace account and an organization for your business with one or many users that have access to Gmail
  • Administrator account credentials to Google Workspace and the Google Cloud console

Configure Google Workspace

To enable Amazon Kendra to access and index emails from Gmail accounts within the organization and perform intelligent search on them, it’s essential to configure your organization’s Google Workspace. In the steps that follow, we create a service account that the Gmail connector uses to index emails. The service account is provided with authorization scopes to allow access to certain Gmail APIs. The authorization scopes express the permissions you request users to authorize for your app and are applicable for all emails within your organization’s Google Workspace.

  1. Log in to your organization’s Google Cloud account.
  2. Create a new project with an appropriate name and assign it to your organization. In our example, we name the project KendraGmailConnector.
  3. Choose Create.
  4. Monitor the progress of the project creation on the Notifications menu on the top right of the Google Cloud console.
  5. After the project is created, choose the options menu, choose APIs & Services, and choose Library to view the API Library.
  6. On the API Library, search for Admin SDK API and choose Enable. The Admin SDK API lets you manage Google Workspace account resources and audit usage.
  7. Similarly, search for Gmail API on the API Library page and choose Enable. The Gmail API lets you view and manage Gmail mailbox data like threads, messages, and labels.

We now create a service account, which the Gmail connector for Amazon Kendra uses to access your organization’s emails based on the allowed API scope.

  1. On the options menu, choose IAM & Admin, then choose Service Accounts.
  2. Choose Create service account.
  3. Enter a name for your service account. For this post, we name our service account AmazonKendraGmailConnector.
  4. Enter your service account ID and account description.
  5. Skip the optional steps Grant this service account access to project and Grant users access to this service account and choose Done.
  6. Choose the service account you created to open the service account details page.
  7. Note the unique ID for the service account (also known as a client ID), to use in a later step.

Next, we create a key for the service account, which the Gmail connector for Amazon Kendra uses to authenticate.

  1. On the Keys tab, choose Add key.
  2. For Key type, select JSON.
  3. Choose Create.

This step downloads the private key to your computer as a JSON file. Keep this file secure; you need its contents later when you configure the connector on the Amazon Kendra console.

  1. Choose Close.

The following screenshot shows an example of the credentials JSON file.
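As a rough text companion to the credentials file, the following sketch checks a downloaded key for the fields used later in this post. The field names follow the standard Google service account key layout. The helper name `read_key_fields` and all sample values are illustrative placeholders, not real credentials.

```python
import json

# Fields of a Google service account key file that this walkthrough relies on.
REQUIRED_FIELDS = ("type", "client_email", "client_id", "private_key")


def read_key_fields(key_json: str) -> dict:
    """Parse a service account key and return the fields used in later steps."""
    key = json.loads(key_json)
    missing = [name for name in REQUIRED_FIELDS if name not in key]
    if missing:
        raise ValueError(f"key file is missing fields: {', '.join(missing)}")
    if key["type"] != "service_account":
        raise ValueError(f"unexpected key type: {key['type']!r}")
    return {name: key[name] for name in REQUIRED_FIELDS}


# Placeholder content only; a real file carries additional fields
# such as project_id and private_key_id.
sample = json.dumps({
    "type": "service_account",
    "project_id": "example-project",
    "private_key": "-----BEGIN PRIVATE KEY-----\n<redacted>\n-----END PRIVATE KEY-----\n",
    "client_email": "connector@example-project.iam.gserviceaccount.com",
    "client_id": "123456789012345678901",
})
fields = read_key_fields(sample)
print(fields["client_email"])
```

The `client_id` value corresponds to the unique ID noted earlier, and `client_email` and `private_key` are the values entered later on the Amazon Kendra console.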

  1. On the Details tab, expand the Advanced settings section.
  2. Under Domain-wide delegation, choose View Google Workspace admin console.

Grant the service account domain-wide delegated access to your organization’s data with caution. You can reverse this access by disabling or deleting the service account, or by removing access through the Google Workspace admin console.

  1. Log in to the admin console using your Google Workspace admin credentials.
  2. In the navigation pane, under Security, choose Access and data control, then choose API controls.
  3. In the Domain-wide delegation section, choose Manage domain-wide delegation.
  4. Choose Add new.

This brings up the Add a new client ID dialog.

  1. Enter the unique ID for the service account you created earlier, and enter the following scopes to allow the service account to access the emails from Gmail:
    1. https://www.googleapis.com/auth/gmail.readonly
    2. https://www.googleapis.com/auth/admin.directory.user.readonly
  2. Choose Authorize.

This concludes the configuration within the Google Cloud console and Google Workspace admin console.
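To make the effect of domain-wide delegation more concrete, the following sketch builds the claim set that a service account presents when it requests an access token on behalf of a Workspace user, following Google's general OAuth 2.0 service account flow. This is an illustration of that flow under stated assumptions, not a description of how the Gmail connector is implemented. The function name `delegated_claims` and the email addresses are hypothetical. Signing the claims (RS256 with the private key) is left out because it needs a cryptography library.

```python
import time

# The two scopes authorized for the service account in the admin console.
GMAIL_CONNECTOR_SCOPES = (
    "https://www.googleapis.com/auth/gmail.readonly",
    "https://www.googleapis.com/auth/admin.directory.user.readonly",
)
TOKEN_URI = "https://oauth2.googleapis.com/token"


def delegated_claims(client_email, user_email, scopes=GMAIL_CONNECTOR_SCOPES,
                     now=None, lifetime=3600):
    """Build the JWT claim set for a delegated token request.

    client_email: the service account's email (the token issuer).
    user_email:   the Workspace user the service account acts for.
    """
    issued_at = int(time.time()) if now is None else int(now)
    return {
        "iss": client_email,
        "sub": user_email,            # present only for delegated access
        "scope": " ".join(scopes),    # scopes are space-delimited in the claim
        "aud": TOKEN_URI,
        "iat": issued_at,
        "exp": issued_at + lifetime,  # Google caps token lifetime at one hour
    }


claims = delegated_claims(
    "connector@example-project.iam.gserviceaccount.com",  # placeholder
    "admin@example.com",                                  # placeholder
    now=1_700_000_000,
)
print(claims["scope"])
```

A token request with a scope that was not authorized in the Add a new client ID dialog is rejected, which is why both scopes must be entered exactly as listed.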

Configure the Gmail connector for Amazon Kendra

In this section, we walk through the configuration steps for the Gmail connector for Amazon Kendra:

  1. On the Amazon Kendra console, create a new index or open an existing index. For this post, we use the existing index EnterpriseKendraIndex.
  2. Under Data management in the navigation pane, choose Data sources.
  3. Choose Add data source.
  4. On the list of data sources, find the Gmail connector and choose Add connector.
  5. On the Specify data source details page, complete the following steps:
    1. For Data source name, enter a name.
    2. For Description, enter an optional description.
    3. Leave the language as the default setting, English (en).

    Amazon Kendra supports a select set of languages with full semantic search. These languages include Spanish, Japanese, French, and others. For more information, see Adding documents in languages other than English.

    4. Add any tags to the index, then choose Next.

Next, we create an AWS Secrets Manager secret to store the Gmail authentication details, and use the values in the credentials JSON file that we downloaded earlier.
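If you prefer to prepare the secret value in a script rather than typing it into the console dialog, the following minimal sketch assembles it from the credentials JSON file. The secret key names (`adminAccountEmailId`, `clientEmailId`, `privateKey`) are an assumption modeled on the console field labels, so verify them against the connector documentation before relying on them. The helper names are hypothetical, and `create_secret` needs boto3 and AWS credentials, so it is defined but not called here.

```python
import json


def build_gmail_secret(credentials: dict, admin_email: str) -> str:
    """Return the JSON string to store as the Secrets Manager secret value."""
    return json.dumps({
        # Assumed key names; confirm against the Gmail connector documentation.
        "adminAccountEmailId": admin_email,
        "clientEmailId": credentials["client_email"],
        "privateKey": credentials["private_key"],
    })


def create_secret(name: str, secret_string: str):
    """Store the secret with boto3 (requires AWS credentials)."""
    import boto3  # imported lazily so build_gmail_secret works without boto3

    client = boto3.client("secretsmanager")
    return client.create_secret(Name=name, SecretString=secret_string)


# Placeholder values standing in for the downloaded credentials file.
credentials = {
    "client_email": "connector@example-project.iam.gserviceaccount.com",
    "private_key": "-----BEGIN PRIVATE KEY-----\n<redacted>\n-----END PRIVATE KEY-----\n",
}
secret_string = build_gmail_secret(credentials, "admin@example.com")
print(sorted(json.loads(secret_string)))
```

Keeping the payload builder separate from the AWS call makes it easy to inspect the value before anything is written to Secrets Manager.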

  1. On the Define access and security page, complete the following steps:
    1. In the Authentication section, choose Create and add new secret, which opens the Create an AWS Secrets Manager secret dialog.
    2. For Secret name, enter a name.
    3. For Client email, enter the client email ID from the credentials JSON file.
    4. For Admin account email, enter the admin email for the Google Cloud console.
    5. For Private key, enter the private key from the credentials JSON file.
    6. Choose Save to return to the Define access and security page.
    7. In the Configure VPC and security group section, you can choose a VPC and the subnets that will contain the data source, along with a security group that grants access to the host. For our configuration, we choose No VPC.
    8. In the IAM role section, choose Create a new role and enter a role name.
    9. Choose Next.
  2. On the Configure sync settings page, set the following parameters to sync all emails and email attachments sent from the admin email address:
    1. In the Sync scope section, select Message attachments.
    2. Under Additional configuration, configure filters for the emails to ingest into the Amazon Kendra index:
      1. For Date range, enter the start and end dates for emails to be crawled. Emails received on or after the start date and before the end date are included in the sync scope.
      2. For Email domains, enter the email from domains, email to domains, subject, CC, and BCC emails that you want to include in or exclude from your index. For this post, we set the email from domain to the admin email address.
      3. For Keywords in subjects, enter keywords to include or exclude emails that have at least one of those keywords in the subject.
      4. For Labels, add regular expression patterns to include or exclude certain labels or attachment types (up to 100 patterns).
      5. For Attachments, add regular expression patterns to include or exclude certain attachments (up to 100 patterns).
    3. In the Sync mode section, you can either specify a full sync to sync and index all contents in all entities regardless of the previous sync status, or only sync new, modified, or deleted content. For this post, we select Full sync.
    4. Lastly, set an appropriate frequency for the sync. For this post, we choose Run on demand.
    5. Choose Next.
  3. On the Set field mappings page, you associate or create a mapping of the required data source fields with fields in your index. You can also create mappings for custom index fields. You can specify mappings for both messages and message attachments. For this post, we add field mappings in the Message section:
    1. Select the Gmail field mappings subject, from, and to.
    2. Choose Next.
  4. On the Review and create page, review all the steps and choose Add data source to create your Gmail connector data source.
  5. After the data source is created, on the Data sources page, select the data source (kendra-gmail-connector) and choose Sync now.
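The sync scope filters described in the steps above can be read as simple predicates. The following sketch models two of them: the date range, with an inclusive start and exclusive end as stated, and the include and exclude regular expression patterns. It is our reading of the console descriptions, not the connector's code. In particular, letting an exclude match win over an include match is an assumption, and both function names are hypothetical.

```python
import re
from datetime import date


def in_date_range(received: date, start: date, end: date) -> bool:
    """Start date is inclusive and end date is exclusive."""
    return start <= received < end


def passes_patterns(value: str, include=(), exclude=()) -> bool:
    """Apply include/exclude regular expressions to a label or attachment name.

    Assumptions: an exclude match takes precedence over an include match,
    and an empty include list admits every value.
    """
    if any(re.search(pattern, value) for pattern in exclude):
        return False
    return not include or any(re.search(pattern, value) for pattern in include)


# An email received on the start date is in scope; one on the end date is not.
print(in_date_range(date(2023, 1, 1), date(2023, 1, 1), date(2023, 2, 1)))
print(in_date_range(date(2023, 2, 1), date(2023, 1, 1), date(2023, 2, 1)))

# Only PDF attachments, except those whose names start with "draft".
print(passes_patterns("report.pdf", include=[r"\.pdf$"], exclude=[r"^draft"]))
```

Testing patterns locally like this before a sync can save a crawl that indexes nothing because a pattern was too strict.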

The time the sync takes depends on the number of emails that match the sync scope and the size of the attachments that need to be indexed. To check the status of the sync operation, choose the Gmail data source and scroll down to the Sync run history section. Choose the status of an individual sync to view more details.

This section shows the start and end times of the sync and also the number of documents that were added, deleted, failed, or modified during the sync. A status of Completed denotes a sync where there are no failures. In cases where a document being ingested is blank, the sync status is set to Completed with Errors with the number of failed documents listed as Failed, as shown in the following screenshot. In case of a sync failure, you can investigate the reason by either choosing the number of failed documents or by choosing the entry in the Details column, which brings up the Amazon CloudWatch logs. In the following example, two documents failed ingestion because they were blank.
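The same sync run history is available programmatically. The following sketch uses the boto3 Amazon Kendra operations `start_data_source_sync_job` and `list_data_source_sync_jobs`. The helper names are hypothetical, and the sample entry is made up to mirror the two failed documents mentioned above. The API reports its own status strings, such as SUCCEEDED or INCOMPLETE, which do not necessarily match the console labels word for word.

```python
def summarize_sync_jobs(history):
    """Turn sync job history entries into one readable line per run."""
    lines = []
    for job in history:
        metrics = job.get("Metrics", {})
        counts = " ".join(
            f"{label}={metrics.get(key, '0')}"
            for label, key in (
                ("added", "DocumentsAdded"),
                ("modified", "DocumentsModified"),
                ("deleted", "DocumentsDeleted"),
                ("failed", "DocumentsFailed"),
            )
        )
        lines.append(f"{job.get('Status', 'UNKNOWN')}: {counts}")
    return lines


def start_sync_and_list(index_id, data_source_id):
    """Start a sync and return the run history (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the summary helper runs without boto3

    kendra = boto3.client("kendra")
    kendra.start_data_source_sync_job(Id=data_source_id, IndexId=index_id)
    response = kendra.list_data_source_sync_jobs(Id=data_source_id, IndexId=index_id)
    return response["History"]


# Made-up history entry; the API returns metric counts as strings.
sample_history = [{
    "Status": "INCOMPLETE",
    "Metrics": {
        "DocumentsAdded": "40",
        "DocumentsModified": "0",
        "DocumentsDeleted": "0",
        "DocumentsFailed": "2",
    },
}]
summary = summarize_sync_jobs(sample_history)
print(summary[0])
```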

After the sync is successful, you can perform a search on the Amazon Kendra index.

Search indexed content

To search on the indexed content, choose Search indexed content in the navigation pane on the Amazon Kendra console.

On the search console, enter any natural language question. In our example, we ask “What is SageMaker?” Amazon Kendra performs an intelligent search on the emails ingested into the index based on the scope of the sync and finds an answer, as shown in the following screenshot.

In this example, the Document fields section shows the field mappings that we specified while configuring our data source connector.
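You can run the same search from code with the boto3 Amazon Kendra `query` operation. The following sketch separates the API call from a small formatter so that the formatter can be tried on its own. The function names are hypothetical, and the sample response is invented for illustration and is not output from the index used in this post.

```python
def format_results(response, limit=3):
    """Render the top results of a Kendra query response as short lines."""
    rows = []
    for item in response.get("ResultItems", [])[:limit]:
        title = item.get("DocumentTitle", {}).get("Text", "(untitled)")
        excerpt = item.get("DocumentExcerpt", {}).get("Text", "")
        rows.append(f"[{item.get('Type', '?')}] {title}: {excerpt}")
    return rows


def search(index_id, question):
    """Query the index (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so format_results runs without boto3

    kendra = boto3.client("kendra")
    return kendra.query(IndexId=index_id, QueryText=question)


# Invented response fragment with the same shape as a real one.
sample_response = {
    "ResultItems": [
        {
            "Type": "ANSWER",
            "DocumentTitle": {"Text": "Re: SageMaker overview"},
            "DocumentExcerpt": {"Text": "Amazon SageMaker is a managed service..."},
        }
    ]
}
rows = format_results(sample_response)
print(rows[0])
```

With AWS credentials in place, `format_results(search(index_id, "What is SageMaker?"))` would print the top results for the question used in this post.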

Clean up

To avoid incurring future costs, clean up the resources you created as part of this solution. If you created a new Amazon Kendra index while testing this solution, delete it. If you only added a new data source using the Gmail connector, delete the added data source.
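The cleanup decision can also be scripted. The following sketch picks between the boto3 Amazon Kendra operations `delete_index` and `delete_data_source` based on whether the index was created only for this walkthrough. It assumes that deleting an index also removes the data sources attached to it. The function names and the IDs shown are hypothetical placeholders.

```python
def cleanup_calls(index_id, data_source_id, created_index):
    """Return the Kendra operations to run as (operation name, arguments) pairs."""
    if created_index:
        # Assumption: deleting the index also removes its data sources.
        return [("delete_index", {"Id": index_id})]
    return [("delete_data_source", {"Id": data_source_id, "IndexId": index_id})]


def run_cleanup(calls):
    """Execute the planned operations (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the plan can be reviewed without boto3

    kendra = boto3.client("kendra")
    for operation, kwargs in calls:
        getattr(kendra, operation)(**kwargs)


# Review the plan before running it; the IDs here are placeholders.
plan = cleanup_calls("example-index-id", "example-data-source-id", created_index=False)
print(plan)
```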

Conclusion

In this post, we showed how organizations can now use the Gmail connector for Amazon Kendra to allow users to perform intelligent search on emails and email attachments, thereby improving employee productivity and customer satisfaction.

Additionally, we walked through how to define field mappings to the Amazon Kendra data source, allowing users to refine their search results.

To learn more about the Gmail connector for Amazon Kendra, refer to Gmail data source connector for Amazon Kendra.


About the Author

Roshan Thomas is a Senior Solutions Architect at Amazon Web Services. He is based in Melbourne, Australia, and works closely with power and utilities customers to accelerate their journey in the cloud. He is passionate about technology and helping customers architect and build solutions on AWS.