Introducing Genomics Tertiary Analysis and Data Lakes Using AWS Glue and Amazon Athena
Genomics Tertiary Analysis and Data Lakes Using AWS Glue and Amazon Athena is a new AWS Solutions Implementation that creates a scalable environment in AWS for preparing genomic data for large-scale analysis and for running interactive queries against a genomics data lake. The solution demonstrates how to 1) build, package, and deploy libraries used for genomics data conversion, 2) provision data ingestion pipelines that prepare and catalog genomics data, and 3) run interactive queries against the genomics data lake. The solution uses AWS CloudFormation to automate its deployment in the AWS Cloud. It also includes continuous integration and continuous delivery (CI/CD): source code lives in AWS CodeCommit repositories, and AWS CodePipeline builds and deploys updates to the data preparation jobs, crawlers, data analysis notebooks, and data lake infrastructure. Because the whole stack follows infrastructure-as-code principles and best practices, you can evolve the solution rapidly.
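To illustrate the third step, interactive querying, here is a minimal sketch of submitting a SQL query to Amazon Athena from Python and reading back the results. It assumes the data lake tables have already been cataloged by the solution's AWS Glue crawlers. The helper name `run_athena_query`, the database name, the table name, and the S3 output location in the usage comments are hypothetical placeholders, not names defined by the solution. The client calls (`start_query_execution`, `get_query_execution`, `get_query_results`) are the standard boto3 Athena client operations. The client is passed in as a parameter so the function stays testable without AWS credentials.

```python
import time
from typing import Any, List

# Terminal states reported by Athena for a query execution.
_TERMINAL_STATES = ("SUCCEEDED", "FAILED", "CANCELLED")


def run_athena_query(
    client: Any,
    sql: str,
    database: str,
    output_location: str,
    poll_seconds: float = 1.0,
) -> List[List[str]]:
    """Run one Athena query and return its rows as lists of strings.

    Hypothetical helper. `client` is expected to be a boto3 Athena client,
    e.g. boto3.client("athena"). Only the first page of results is read
    (get_query_results is paginated), which is enough for small interactive
    queries. For SELECT statements, the first returned row holds the column names.
    """
    # Submit the query against a Glue Data Catalog database.
    response = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = response["QueryExecutionId"]

    # Poll until Athena reports a terminal state.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in _TERMINAL_STATES:
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} finished in state {state}")

    # Flatten the ResultSet into plain rows; missing (NULL) values become "".
    result = client.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue", "") for cell in row["Data"]]
        for row in result["ResultSet"]["Rows"]
    ]


# Example usage (hypothetical database, table, and bucket names):
#   import boto3
#   athena = boto3.client("athena")
#   rows = run_athena_query(
#       athena,
#       "SELECT chromosome, COUNT(*) FROM variants GROUP BY chromosome",
#       database="genomics_datalake",
#       output_location="s3://example-athena-results/queries/",
#   )
```

Polling keeps the sketch simple. For long-running queries over large variant tables, you would likely add a timeout and page through results with `NextToken` rather than reading only the first page.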
Source:: Amazon AWS