Next-Generation Data Platform for a Top Insurance Company

Published on February 1, 2023
The primary goal of the client was to digitally transform the enterprise, starting with consolidating the data lake by implementing governance and granting efficient access to various personas.

The client needed a next-generation solution that effectively meets the needs of these personas while using resources efficiently and minimizing maintenance requirements.

The Cloudseed team worked in partnership with the client to develop a plan for migrating the existing data lake layer to the cloud.

Key project requirements were:

Data Scope

Raw Layer (~100 Sources, 10,000+ Data Sets):

• Structured Data – RDBMS & Flat File Sources

• Semi-Structured Data – XML Files, JSON Files

• Unstructured Data – Audio WAV Files, Chat Transcripts, Images

• Transform Layer for 6 Subject Areas (Personal Lines, Claims, Customer, etc.)

• Migration of Deployed Models and Apps

• Historical Data Migration of ~60 TB

Processes

• Ingestion of data into the Cloud (Land, Cleanse, and Create Active/History data copies)

• Historical as well as Incremental data loads 

• Metadata Delivery (Operational, Technical, and Business Metadata) 

• Checkout and Data Quality Processes 

• Role-Based Security and User Access

• High Availability and Disaster Recovery Processes
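The checkout and data quality processes listed above can be sketched as a set of named, rule-based checks run against each landed batch. This is a minimal illustration only; the check names, the `policy_id` field, and the record schema are assumptions, not the client's actual framework.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class QualityCheck:
    """A named pass/fail rule evaluated over a batch of records."""
    name: str
    predicate: Callable[[list[dict]], bool]

def run_checks(records: list[dict], checks: list[QualityCheck]) -> dict[str, bool]:
    """Run every check against the batch and return a name -> pass/fail map."""
    return {c.name: c.predicate(records) for c in checks}

# Illustrative checks: the batch must be non-empty and no record may have
# a null policy_id (field name assumed for this example).
checks = [
    QualityCheck("non_empty", lambda rs: len(rs) > 0),
    QualityCheck("no_null_policy_id",
                 lambda rs: all(r.get("policy_id") is not None for r in rs)),
]

batch = [{"policy_id": "P-1"}, {"policy_id": "P-2"}]
results = run_checks(batch, checks)
```

In a real pipeline, a failed check would typically halt promotion of the batch from the landing area to the active copy.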

User Migration

• Data Scientists First; Business Users Later 

• Migration of users with relevant data access to different groups

The team delivered a comprehensive, end-to-end cloud migration solution covering architecture design, source-system analysis, data ingestion, and historical data migration.

Historical data was migrated using a combination of Hadoop DistCp (distributed copy) and the ingestion framework. The team built a metadata-driven data ingestion framework on Amazon EMR with built-in safeguards and controls.

A data processing framework was built using AWS Glue and EMR. The Glue Data Catalog served as the centralized metadata repository and was integrated with existing data governance tools.
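Registering a data set in the Glue Data Catalog amounts to describing its schema and storage location in a `TableInput` payload. The field names below follow the AWS Glue API; the database name, table name, and columns are illustrative, not the client's actual catalog.

```python
def glue_table_input(name: str, location: str, columns: dict[str, str]) -> dict:
    """Build a Glue CreateTable 'TableInput' payload for a Parquet data set."""
    return {
        "Name": name,
        "StorageDescriptor": {
            "Columns": [{"Name": n, "Type": t} for n, t in columns.items()],
            "Location": location,
            "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
            },
        },
        "TableType": "EXTERNAL_TABLE",
    }

table = glue_table_input(
    "claims", "s3://lake/transform/claims/",
    {"claim_id": "string", "amount": "double"},
)
# With AWS credentials configured, the table would be registered via:
# import boto3
# boto3.client("glue").create_table(DatabaseName="insurance_lake", TableInput=table)
```

Because the catalog is shared, the same table definition is then visible to Glue jobs, EMR (via the Glue-backed Hive metastore), and downstream governance tooling.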

AWS infrastructure was deployed using CloudFormation automation, alongside a framework for the automated deployment of workflow and scheduling processes.
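CloudFormation-based automation boils down to submitting a template plus parameters as a stack. The helper below builds the keyword arguments for a boto3 `create_stack` call; the stack name, template URL, and parameters are illustrative assumptions.

```python
def stack_request(stack_name: str, template_url: str, params: dict[str, str]) -> dict:
    """Build the keyword arguments for a CloudFormation CreateStack call."""
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        "Parameters": [
            {"ParameterKey": k, "ParameterValue": v} for k, v in params.items()
        ],
        # Required when the template creates named IAM roles.
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
    }

req = stack_request(
    "data-lake-emr",
    "https://s3.amazonaws.com/templates/emr.yaml",
    {"ClusterSize": "5"},
)
# With AWS credentials configured:
# import boto3
# boto3.client("cloudformation").create_stack(**req)
```

Keeping infrastructure as parameterized templates is what makes environments reproducible and disposable, which underpins the DR processes listed earlier.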

Data Scientists and Data Analysts were empowered to independently access and explore data through self-service capabilities. This accelerates predictive model development by reducing the time spent acquiring and exploring data.

Elastic compute infrastructure shortens the project cycle: with on-demand access to higher compute, predictive models can be trained and retrained in less time. Furthermore, a single point of access to both structured and unstructured data yields more comprehensive insights.

• The cloud's ability to provide agility and flexibility improves the speed at which data solutions are delivered

• The cloud enables the use of cutting-edge technologies such as Machine Learning and Artificial Intelligence, allowing for faster implementation in solving business challenges

• Quick infrastructure provisioning and the ability to easily scale compute resources as needed allow for faster time to market

By separating compute and storage, the platform achieves greater flexibility and a 40% reduction in infrastructure costs. Additionally, it provides transparency to business divisions by showing the costs associated with their use of infrastructure resources and data, including data egress, compute, and storage costs. This enables potential cost-recovery (chargeback) options for business divisions.
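The cost-transparency idea above can be sketched as aggregating tagged usage records per business division. The record schema, division names, and dollar amounts below are illustrative assumptions; in practice the inputs would come from tagged AWS billing data.

```python
from collections import defaultdict

def chargeback(usage: list[dict]) -> dict[str, float]:
    """Sum egress, compute, and storage costs per business division.

    Each usage record is assumed to carry a 'division' tag plus the
    three cost components (hypothetical schema).
    """
    totals: dict[str, float] = defaultdict(float)
    for rec in usage:
        totals[rec["division"]] += rec["egress"] + rec["compute"] + rec["storage"]
    return dict(totals)

usage = [
    {"division": "claims", "egress": 10.0, "compute": 120.0, "storage": 30.0},
    {"division": "claims", "egress": 5.0, "compute": 80.0, "storage": 15.0},
    {"division": "personal_lines", "egress": 2.0, "compute": 40.0, "storage": 8.0},
]
costs = chargeback(usage)
```

Per-division totals like these are what make a chargeback or showback model possible once every resource is consistently tagged.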
