AWS Machine Learning Blog

How Thomson Reuters delivers personalized content subscription plans at scale using Amazon Personalize

This post is co-written by Hesham Fahim from Thomson Reuters.

Thomson Reuters (TR) is one of the world’s most trusted information organizations for businesses and professionals. It provides companies with the intelligence, technology, and human expertise they need to find trusted answers, enabling them to make better decisions more quickly. TR’s customers span across the financial, risk, legal, tax, accounting, and media markets.

Thomson Reuters provides market-leading products in the Tax, Legal, and News segments, which users sign up for under a subscription licensing model. To enhance this experience for their customers, TR wanted to create a centralized recommendations platform that lets their sales team suggest the most relevant subscription packages to each customer. These suggestions raise awareness of products that could help customers serve their own markets better through tailored product selections.

Prior to building this centralized platform, TR had a legacy rules-based engine to generate renewal recommendations. The rules in this engine were predefined and written in SQL, which, aside from being difficult to manage, also struggled to cope with the proliferation of data from TR’s various integrated data sources. TR customer data was changing faster than the business rules could evolve to reflect changing customer needs. The key requirement for TR’s new machine learning (ML)-based personalization engine was an accurate recommendation system that takes recent customer trends into account. The desired solution would have low operational overhead, accelerate the delivery of business goals, and include a personalization engine that could be retrained regularly with up-to-date data to keep pace with changing consumer habits and new products.

Personalizing the renewal recommendations based on which products would be valuable to TR’s customers was an important business challenge for the sales and marketing team. TR has a wealth of data, collected from customer interactions and stored in a centralized data warehouse, that could be used for personalization. TR was an early adopter of ML with Amazon SageMaker, and their maturity in the AI/ML domain meant that this warehouse already held a significant dataset for training a personalization model. TR has continued their AI/ML innovation and has recently developed a revamped recommendation platform using Amazon Personalize, which is a fully managed ML service that uses user interactions and items to generate recommendations for users. In this post, we explain how TR used Amazon Personalize to build a scalable, multi-tenanted recommender system that provides the best product subscription plans and associated pricing to their customers.

Solution architecture

The solution had to be designed around TR’s core operations of understanding users through data; providing these users with personalized and relevant content from a large corpus of data was a mission-critical requirement. A well-designed recommendation system is key to delivering quality recommendations that are customized to each user’s requirements.

The solution required collecting and preparing user behavior data, training an ML model using Amazon Personalize, generating personalized recommendations through the trained model, and driving marketing campaigns with the personalized recommendations.

TR wanted to take advantage of AWS managed services where possible to simplify operations and reduce undifferentiated heavy lifting. TR used AWS Glue DataBrew and AWS Batch jobs to perform the extract, transform, and load (ETL) jobs in the ML pipelines, and SageMaker along with Amazon Personalize to tailor the recommendations. From a training data volume and runtime perspective, the solution needed to be scalable to process millions of records within the time frame already committed to downstream consumers in TR’s business teams.

The following sections explain the components involved in the solution.

ML training pipeline

Interactions between users and content are collected as clickstream data, which is generated as customers click on the content. TR analyzes whether the content is part of the customer’s subscription plan or beyond it, so that they can provide additional details about the price and plan enrollment options. The user interaction data from various sources is persisted in their data warehouse.

The following diagram illustrates the ML training pipeline.
ML engine training pipeline
The pipeline starts with an AWS Batch job that extracts the data from the data warehouse and transforms the data to create interactions, users, and items datasets.

The following datasets are used to train the model:

  • Structured product data – Subscriptions, orders, product catalog, transactions, and customer details
  • Semi-structured behavior data – Users, usage, and interactions
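As a rough illustration of the transformation step, the following Python sketch turns clickstream records into the CSV layout that an Amazon Personalize interactions dataset expects. The schema follows the documented required fields (USER_ID, ITEM_ID, TIMESTAMP) plus the optional EVENT_TYPE. The input keys (customer_id, product_id, clicked_at, action) are hypothetical, because TR’s warehouse columns are not described in this post.

```python
import csv
import io
import json

# Avro schema for a Personalize interactions dataset. USER_ID, ITEM_ID, and
# TIMESTAMP are required by the service; EVENT_TYPE is optional.
INTERACTIONS_SCHEMA = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},
        {"name": "EVENT_TYPE", "type": "string"},
    ],
    "version": "1.0",
}


def schema_json():
    """Return the schema as the JSON string that create_schema expects."""
    return json.dumps(INTERACTIONS_SCHEMA)


def to_interactions_csv(clicks):
    """Convert clickstream records into interactions CSV text.

    The input keys (customer_id, product_id, clicked_at, action) are
    hypothetical placeholders for the real warehouse columns.
    """
    buffer = io.StringIO()
    columns = [field["name"] for field in INTERACTIONS_SCHEMA["fields"]]
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for click in clicks:
        required = ("customer_id", "product_id", "clicked_at")
        # Skip rows that lack any of the required values.
        if not all(click.get(key) for key in required):
            continue
        writer.writerow({
            "USER_ID": str(click["customer_id"]),
            "ITEM_ID": str(click["product_id"]),
            "TIMESTAMP": int(click["clicked_at"]),  # Unix epoch seconds
            "EVENT_TYPE": click.get("action", "click"),
        })
    return buffer.getvalue()
```

The resulting text can be written to the S3 location that the dataset import job reads from, and `schema_json()` can be passed as the `schema` argument when the schema is registered.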

This transformed data is stored in an Amazon Simple Storage Service (Amazon S3) bucket and then imported into Amazon Personalize for ML training. Because TR wants to generate personalized recommendations for their users, they use the USER_PERSONALIZATION recipe to train ML models on their custom data, a step that is referred to as creating a solution version. After the solution version is created, it’s used to generate personalized recommendations for the users.
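The training step could look roughly like the following sketch. It assumes a client that behaves like `boto3.client("personalize")` is passed in, and that the dataset group and its datasets already exist. The function names and the solution name are illustrative and are not taken from TR’s implementation.

```python
# ARN of the managed User-Personalization recipe.
USER_PERSONALIZATION_RECIPE_ARN = (
    "arn:aws:personalize:::recipe/aws-user-personalization"
)


def train_solution_version(personalize, dataset_group_arn, solution_name):
    """Create a solution and start training one solution version.

    `personalize` is expected to behave like boto3.client("personalize").
    Training is asynchronous, so this returns as soon as it has started.
    """
    solution = personalize.create_solution(
        name=solution_name,
        datasetGroupArn=dataset_group_arn,
        recipeArn=USER_PERSONALIZATION_RECIPE_ARN,
    )
    version = personalize.create_solution_version(
        solutionArn=solution["solutionArn"],
        trainingMode="FULL",
    )
    return version["solutionVersionArn"]


def is_ready(personalize, solution_version_arn):
    """Return True once the solution version has finished training."""
    response = personalize.describe_solution_version(
        solutionVersionArn=solution_version_arn
    )
    return response["solutionVersion"]["status"] == "ACTIVE"
```

Because training can take a while, a caller would typically poll `is_ready` (or let the orchestration layer do so) before using the solution version for inference.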

The entire workflow is orchestrated using AWS Step Functions. The alerts and notifications are captured and published to Microsoft Teams using Amazon Simple Notification Service (Amazon SNS) and Amazon EventBridge.
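To make the alerting path more concrete, here is a minimal sketch of an EventBridge event pattern that matches unsuccessful Step Functions executions, together with a helper that publishes a summary to an SNS topic. The `matches` function is only a simplified local stand-in for EventBridge’s own matching, included so the pattern can be exercised without AWS. The `sns` argument is assumed to behave like `boto3.client("sns")`. How the SNS message reaches Microsoft Teams is not described in this post and is left out.

```python
import json

# EventBridge event pattern for unsuccessful Step Functions executions.
FAILED_EXECUTION_PATTERN = {
    "source": ["aws.states"],
    "detail-type": ["Step Functions Execution Status Change"],
    "detail": {"status": ["FAILED", "TIMED_OUT", "ABORTED"]},
}


def matches(pattern, event):
    """Simplified local check of an event against a pattern.

    Supports only exact-value lists and nested objects, which is a small
    subset of what EventBridge itself supports.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict):
                return False
            if not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


def publish_alert(sns, topic_arn, event):
    """Publish a short JSON summary of a failed execution to SNS."""
    detail = event["detail"]
    message = {
        "status": detail["status"],
        "executionArn": detail.get("executionArn"),
        "stateMachineArn": detail.get("stateMachineArn"),
    }
    # SNS subjects are limited to 100 characters.
    subject = f"Pipeline execution {detail['status']}"[:100]
    sns.publish(
        TopicArn=topic_arn,
        Subject=subject,
        Message=json.dumps(message),
    )
    return message
```

In a deployed setup, the pattern would be attached to an EventBridge rule whose target is the SNS topic (or a function that calls `publish_alert`).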

Generating personalized recommendations pipeline: Batch inference

Customer requirements and preferences change frequently, and the latest interactions captured in clickstream data serve as a key data point for understanding those changing preferences. To adapt, TR generates personalized recommendations on a daily basis.

The following diagram illustrates the pipeline to generate personalized recommendations.
Pipeline to generate personalized recommendations in Batch
A DataBrew job extracts data from the TR data warehouse for the users who are eligible to receive recommendations during renewal, based on their current subscription plan and recent activity. The DataBrew visual data preparation tool makes it easy for TR data analysts and data scientists to clean and normalize data to prepare it for analytics and ML. The ability to choose from over 250 pre-built transformations to automate data preparation tasks, all without writing any code, was an important feature. The DataBrew job generates an incremental interactions dataset and the input for the batch recommendations job, and stores both outputs in an S3 bucket. The newly generated incremental dataset is imported into the interactions dataset. When the incremental dataset import job succeeds, an Amazon Personalize batch recommendations job is triggered with the input data. Amazon Personalize generates the latest recommendations for the users provided in the input data and stores them in a recommendations S3 bucket.
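The two Amazon Personalize calls in this flow can be sketched as follows. The example assumes a client that behaves like `boto3.client("personalize")`, and all job names, ARNs, and S3 paths are placeholders supplied by the caller. The batch job input is a JSON Lines file with one `userId` object per line.

```python
import json


def users_to_job_input(user_ids):
    """Build the JSON Lines input for a batch recommendations job."""
    return "".join(json.dumps({"userId": str(u)}) + "\n" for u in user_ids)


def import_incremental_interactions(personalize, job_name, dataset_arn,
                                    s3_path, role_arn):
    """Append new interaction records to the existing interactions dataset."""
    response = personalize.create_dataset_import_job(
        jobName=job_name,
        datasetArn=dataset_arn,
        dataSource={"dataLocation": s3_path},
        roleArn=role_arn,
        importMode="INCREMENTAL",  # add to existing data instead of replacing
    )
    return response["datasetImportJobArn"]


def start_batch_recommendations(personalize, job_name, solution_version_arn,
                                input_path, output_path, role_arn,
                                num_results=10):
    """Start a batch inference job for the users listed at input_path."""
    response = personalize.create_batch_inference_job(
        jobName=job_name,
        solutionVersionArn=solution_version_arn,
        numResults=num_results,
        jobInput={"s3DataSource": {"path": input_path}},
        jobOutput={"s3DataDestination": {"path": output_path}},
        roleArn=role_arn,
    )
    return response["batchInferenceJobArn"]
```

Both calls are asynchronous, so the batch job should only be started after the import job reports an `ACTIVE` status.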

Price optimization is the last step before the newly formed recommendations are ready to use. TR runs a price optimization job on the generated recommendations, using SageMaker to run custom models as part of this final step. An AWS Glue job curates the output generated by Amazon Personalize and transforms it into the input format required by the SageMaker custom model. TR is able to take advantage of the breadth of services that AWS provides, using both Amazon Personalize and SageMaker in the recommendation platform to tailor recommendations based on the type of customer firm and end-users.
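The curation step might resemble the sketch below, which flattens the JSON Lines output of an Amazon Personalize batch inference job into one row per user and item. The output layout (`input`, `output.recommendedItems`, `output.scores`, `error`) follows the service’s batch output format. The target columns are an assumption, because the input format of TR’s pricing model is not described in this post.

```python
import csv
import io
import json


def flatten_batch_output(lines):
    """Turn batch inference output lines into one row per (user, item)."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("error"):
            continue  # skip users for whom no recommendations were produced
        output = record.get("output") or {}
        items = output.get("recommendedItems", [])
        scores = output.get("scores") or [None] * len(items)
        for rank, (item_id, score) in enumerate(zip(items, scores), start=1):
            rows.append({
                "user_id": record["input"]["userId"],
                "item_id": item_id,
                "rank": rank,
                "score": score,
            })
    return rows


def rows_to_csv(rows):
    """Serialize flattened rows as CSV (hypothetical pricing-model input)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["user_id", "item_id", "rank", "score"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```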

The entire workflow is decoupled and orchestrated using Step Functions, which gives the flexibility of scaling the pipeline depending on the data processing requirements. The alerts and notifications are captured using Amazon SNS and EventBridge.
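One way to picture this orchestration is the following Amazon States Language definition, built here as a Python dictionary. It is only a sketch: the Lambda function names are hypothetical stand-ins for the actual DataBrew, AWS Glue, and Amazon Personalize steps, and it assumes the status-check function returns an object with a `status` field that holds the import job status.

```python
import json


def lambda_task(function_name, next_state):
    """Build a Task state that invokes a (hypothetical) Lambda function."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke",
        "Parameters": {"FunctionName": function_name, "Payload.$": "$"},
        "OutputPath": "$.Payload",
        "Next": next_state,
    }


BATCH_PIPELINE = {
    "Comment": "Sketch: incremental import followed by batch inference",
    "StartAt": "StartIncrementalImport",
    "States": {
        "StartIncrementalImport": lambda_task("start-import", "WaitForImport"),
        "WaitForImport": {"Type": "Wait", "Seconds": 300, "Next": "CheckImport"},
        "CheckImport": lambda_task("check-import", "IsImportDone"),
        "IsImportDone": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.status", "StringEquals": "ACTIVE",
                 "Next": "StartBatchInference"},
                {"Variable": "$.status", "StringEquals": "CREATE FAILED",
                 "Next": "ImportFailed"},
            ],
            # Any other status means the job is still running: poll again.
            "Default": "WaitForImport",
        },
        "StartBatchInference": lambda_task("start-batch-inference", "Done"),
        "ImportFailed": {
            "Type": "Fail",
            "Error": "ImportFailed",
            "Cause": "The dataset import job did not complete",
        },
        "Done": {"Type": "Succeed"},
    },
}


def undefined_targets(definition):
    """Return transition targets that do not name a defined state."""
    states = definition["States"]
    targets = {definition["StartAt"]}
    for state in states.values():
        if "Next" in state:
            targets.add(state["Next"])
        if "Default" in state:
            targets.add(state["Default"])
        for choice in state.get("Choices", []):
            targets.add(choice["Next"])
    return targets - set(states)


def definition_json():
    """Serialize the definition for use when creating a state machine."""
    return json.dumps(BATCH_PIPELINE, indent=2)
```

The wait-and-check loop keeps each step decoupled: the state machine only needs the status that the check step reports, not the details of how the import runs.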

Driving email campaigns

The recommendations, along with the pricing results, are used to drive email campaigns to TR’s customers. An AWS Batch job curates the recommendations for each customer and enriches them with the optimized pricing information. These recommendations are ingested into TR’s campaigning systems, which drive the following email campaigns:

  • Automated subscription renewal or upgrade campaigns with new products that might interest the customer
  • Mid-contract renewal campaigns with better offers and more relevant products and legal content materials
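The enrichment performed before the recommendations reach the campaigning systems can be pictured with a small sketch like the one below. The record layout and the price lookup keyed by user and item are assumptions made for illustration. The post does not describe the actual data formats.

```python
def enrich_with_pricing(recommendations, prices):
    """Group recommendations per user and attach an optimized price.

    recommendations: list of dicts with user_id, item_id, and rank keys.
    prices: dict mapping (user_id, item_id) to a price (hypothetical layout).
    Items without a price are left out of the campaign payload.
    """
    by_user = {}
    ordered = sorted(recommendations, key=lambda r: (r["user_id"], r["rank"]))
    for rec in ordered:
        price = prices.get((rec["user_id"], rec["item_id"]))
        if price is None:
            continue
        by_user.setdefault(rec["user_id"], []).append({
            "item_id": rec["item_id"],
            "rank": rec["rank"],
            "price": price,
        })
    return by_user
```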

The information from this process is also replicated to the customer portal so customers reviewing their current subscription can see the new renewal recommendations. TR has seen a higher conversion rate from email campaigns, leading to increased sales orders, since implementing the new recommendation platform.

What’s next: Real-time recommendations pipeline

Customer requirements and shopping behaviors change in real time, and adapting recommendations to those changes is key to serving the right content. After the success of the batch recommendation system, TR is now planning to take this solution to the next level by implementing a real-time recommendations pipeline using Amazon Personalize.

The following diagram illustrates the architecture to provide real-time recommendations.
Real-time recommendations pipeline
The real-time integration starts with collecting the live user engagement data and streaming it to Amazon Personalize. As users interact with TR’s applications, they generate clickstream events, which are published into Amazon Kinesis Data Streams. The events are then ingested into TR’s centralized streaming platform, which is built on top of Amazon Managed Streaming for Apache Kafka (Amazon MSK). Amazon MSK makes it easy to ingest and process streaming data in real time with fully managed Apache Kafka. In this architecture, Amazon MSK serves as the streaming platform and performs any data transformations required on the raw incoming clickstream events. An AWS Lambda function is then triggered to filter the events into a schema compatible with the Amazon Personalize dataset and push them to an Amazon Personalize event tracker using the PutEvents API. This allows Amazon Personalize to learn from users’ most recent behavior and include relevant items in recommendations.
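The Lambda step could be sketched as follows. The example assumes the function is triggered by an Amazon MSK event source, whose records arrive grouped by topic partition with base64-encoded values, and that each value is a JSON clickstream event. The clickstream field names (user_id, session_id, item_id, ts, action) are hypothetical, and `events_client` is assumed to behave like `boto3.client("personalize-events")`.

```python
import base64
import json
from datetime import datetime, timezone


def decode_msk_records(event):
    """Yield decoded JSON payloads from an MSK-triggered Lambda event."""
    for batch in event.get("records", {}).values():
        for record in batch:
            yield json.loads(base64.b64decode(record["value"]))


def to_personalize_event(click):
    """Map a clickstream record to PutEvents arguments, or None if incomplete.

    The input field names are hypothetical placeholders.
    """
    required = ("user_id", "session_id", "item_id", "ts")
    if not all(click.get(key) for key in required):
        return None
    return {
        "userId": str(click["user_id"]),
        "sessionId": str(click["session_id"]),
        "event": {
            "eventType": click.get("action", "click"),
            "itemId": str(click["item_id"]),
            "sentAt": datetime.fromtimestamp(click["ts"], tz=timezone.utc),
        },
    }


def forward_events(events_client, tracking_id, event):
    """Send every valid record to the event tracker; return the count sent."""
    sent = 0
    for click in decode_msk_records(event):
        mapped = to_personalize_event(click)
        if mapped is None:
            continue  # filter out records that do not fit the schema
        events_client.put_events(
            trackingId=tracking_id,
            userId=mapped["userId"],
            sessionId=mapped["sessionId"],
            eventList=[mapped["event"]],
        )
        sent += 1
    return sent
```

A Lambda handler would call `forward_events` with a real client and the tracking ID of the event tracker, for example one read from an environment variable.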

TR’s web applications invoke an API deployed in Amazon API Gateway to get recommendations, which triggers a Lambda function that makes a GetRecommendations API call to Amazon Personalize. Amazon Personalize returns the latest set of personalized recommendations tailored to the user’s behavior, which are passed back to the web applications via Lambda and API Gateway.
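A minimal sketch of such a Lambda function is shown below. It assumes an API Gateway proxy integration with a `userId` path parameter and an optional `numResults` query parameter, a deployed Amazon Personalize campaign whose ARN is supplied by the caller, and a client that behaves like `boto3.client("personalize-runtime")`. None of these details are specified in this post.

```python
import json


def _response(status_code, payload):
    """Build an API Gateway proxy response."""
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def build_handler(runtime_client, campaign_arn, default_results=10):
    """Return a Lambda handler bound to a runtime client and campaign."""

    def handler(event, context):
        user_id = (event.get("pathParameters") or {}).get("userId")
        if not user_id:
            return _response(400, {"message": "userId is required"})
        query = event.get("queryStringParameters") or {}
        try:
            num_results = int(query.get("numResults", default_results))
        except (TypeError, ValueError):
            return _response(400, {"message": "numResults must be an integer"})
        # GetRecommendations accepts at most 500 results per call.
        num_results = max(1, min(num_results, 500))
        result = runtime_client.get_recommendations(
            campaignArn=campaign_arn,
            userId=user_id,
            numResults=num_results,
        )
        items = [
            {"itemId": item["itemId"], "score": item.get("score")}
            for item in result["itemList"]
        ]
        return _response(200, {"userId": user_id, "items": items})

    return handler
```

Binding the client and campaign ARN through `build_handler` keeps the handler easy to test with a stub client, while a deployed function would create the real client once at module load.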

With this real-time architecture, TR can serve their customers with personalized recommendations curated to their most recent behavior and serve their needs better.

Conclusion

In this post, we showed you how TR used Amazon Personalize and other AWS services to implement a recommendation engine. Amazon Personalize enabled TR to accelerate the development and deployment of high-performance models to provide recommendations to their customers. TR can now onboard a new suite of products within weeks, compared to months previously. With Amazon Personalize and SageMaker, TR is able to elevate the customer experience with better content subscription plans and prices for their customers.

If you enjoyed reading this blog and would like to learn more about Amazon Personalize and how it can help your organization build recommendation systems, please see the developer guide.


About the Authors

Hesham Fahim is a Lead Machine Learning Engineer and Personalization Engine Architect at Thomson Reuters. He has worked with organizations in academia and industry ranging from large enterprises to mid-sized startups. With a focus on scalable deep learning architectures, he has experience in mobile robotics, biomedical image analysis, and recommender systems. Away from computers, he enjoys astrophotography, reading, and long-distance biking.

Srinivasa Shaik is a Solutions Architect at AWS based in Boston. He helps enterprise customers accelerate their journey to the cloud. He is passionate about containers and machine learning technologies. In his spare time, he enjoys spending time with his family, cooking, and traveling.

Vamshi Krishna Enabothala is a Sr. Applied AI Specialist Architect at AWS. He works with customers from different sectors to accelerate high-impact data, analytics, and machine learning initiatives. He is passionate about recommendation systems, NLP, and computer vision areas in AI and ML. Outside of work, Vamshi is an RC enthusiast, building RC equipment (planes, cars, and drones), and also enjoys gardening.

Simone Zucchet is a Senior Solutions Architect at AWS. With over 6 years of experience as a Cloud Architect, Simone enjoys working on innovative projects that help transform the way organizations approach business problems. He helps support large enterprise customers at AWS and is part of the Machine Learning TFC. Outside of his professional life, he enjoys working on cars and photography.