Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.

Athena is easy to use. Simply point to your data in Amazon S3, define the schema, and start querying using standard SQL. Most results are delivered within seconds. With Athena, there’s no need for complex ETL jobs to prepare your data for analysis. This makes it easy for anyone with SQL skills to quickly analyze large-scale datasets.
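The three steps above (point to the data, define the schema, query) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not an official recipe: the table name, columns, and bucket path are hypothetical, the data is assumed to be comma-delimited text, and the helper only builds SQL text that would then be submitted to Athena.

```python
def build_external_table_ddl(table, columns, s3_location):
    """Return a CREATE EXTERNAL TABLE statement for comma-delimited data in S3.

    `columns` is a list of (name, type) pairs. Every name used in the
    example below is hypothetical.
    """
    column_sql = ",\n  ".join(f"`{name}` {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"  {column_sql}\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{s3_location}'"
    )


# Steps 1 and 2: point to the data and define the schema (hypothetical values).
ddl = build_external_table_ddl(
    "web_logs",
    [("request_time", "string"), ("status", "int"), ("bytes_sent", "bigint")],
    "s3://example-bucket/logs/",
)

# Step 3: query the table with standard SQL.
query = (
    "SELECT status, COUNT(*) AS requests "
    "FROM web_logs GROUP BY status ORDER BY requests DESC"
)

print(ddl)
print(query)
```

Both statements are plain strings, so they can be pasted into the Athena query editor or sent through an SDK.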

Athena integrates out of the box with the AWS Glue Data Catalog, which lets you create a unified metadata repository across various services, crawl data sources to discover schemas, populate your Catalog with new and modified table and partition definitions, and maintain schema versioning. You can also use AWS Glue’s fully managed ETL capabilities to transform data or convert it into columnar formats to reduce cost and improve performance.
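To see why a columnar format can lower cost, recall that Athena SQL queries are billed by the amount of data scanned, and a columnar file lets a query read only the columns it needs. The arithmetic below is a rough sketch only: the price per terabyte, the dataset size, and the fraction of column data read are all hypothetical inputs, the terabyte is assumed to be binary (1024^4 bytes), and compression is ignored.

```python
TB = 1024 ** 4  # bytes per terabyte (assumption: binary terabyte)


def estimate_query_cost(bytes_scanned, price_per_tb_usd):
    """Estimate the charge for one query that is billed by data scanned."""
    return bytes_scanned / TB * price_per_tb_usd


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
dataset_bytes = 2 * TB          # size of the raw, row-oriented dataset
columns_read_fraction = 0.10    # share of column data the query actually needs
price_per_tb_usd = 5.0          # illustrative price; check current pricing

# A row-oriented scan reads everything; a columnar scan reads only the
# needed columns.
row_scan_cost = estimate_query_cost(dataset_bytes, price_per_tb_usd)
columnar_scan_cost = estimate_query_cost(
    dataset_bytes * columns_read_fraction, price_per_tb_usd
)

print(f"row-oriented scan: ${row_scan_cost:.2f}")
print(f"columnar scan:     ${columnar_scan_cost:.2f}")
```

Under these assumptions the columnar scan costs one tenth as much, because the charge scales linearly with the bytes read.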


How it works

Amazon Athena is a serverless, interactive analytics service built on open-source frameworks that supports open table and file formats. Athena provides a simplified, flexible way to analyze petabytes of data where it lives. Using SQL or Python, you can analyze data or build applications from an Amazon Simple Storage Service (Amazon S3) data lake and 30 data sources, including on-premises data sources and other cloud systems. Athena is built on the open-source Trino and Presto engines and the Apache Spark framework, with no provisioning or configuration effort required.
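From Python, submitting a SQL query typically means starting the query, polling until it finishes, and then reading the results. The sketch below assumes a boto3 Athena client is passed in as `client`; the helper name `run_query`, the database name, and the S3 output location are hypothetical. It returns only the first page of results and, for a SELECT, the first row usually holds the column headers.

```python
import time


def run_query(client, sql, database, output_location, poll_seconds=1.0):
    """Submit `sql`, wait until it finishes, and return rows as lists of strings.

    `client` is expected to behave like boto3.client("athena").
    Only the first page of results is fetched in this sketch.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Query {query_id} ended in state {state}")

    result = client.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in result["ResultSet"]["Rows"]
    ]


# Example usage (requires boto3 and AWS credentials; values are hypothetical):
#   import boto3
#   rows = run_query(
#       boto3.client("athena"),
#       "SELECT status, COUNT(*) FROM web_logs GROUP BY status",
#       "default",
#       "s3://example-bucket/athena-results/",
#   )
```

Passing the client in as a parameter keeps the helper easy to exercise with a stand-in object, without touching AWS.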



Use cases

  • Run queries on S3, on premises, or on other clouds
    Submit a single SQL query to analyze data in relational, non-relational, object, and custom data sources running on S3, on premises, or in multicloud environments.


  • Prepare data for ML models
    Use ML models in SQL queries or Python to simplify complex tasks, such as anomaly detection, customer cohort analysis, and sales predictions.

     
  • Build distributed big data reconciliation engines
    Deploy a reconciliation tool with an engine built for the cloud to validate vast amounts of data effectively at scale.


  • Perform multicloud analytics
    Query Azure Synapse Analytics data and visualize the results with Amazon QuickSight.
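As a rough illustration of the first use case, a single SQL statement can join an S3-backed table with a table from another source by addressing each one with a three-part name: catalog, database, table. In the sketch below, `awsdatacatalog` is the default Glue-backed catalog, while `onprem_mysql`, the database names, the table names, and the columns are all hypothetical; a non-default catalog would first have to be registered in Athena through a data source connector.

```python
def qualified(catalog, database, table):
    """Return a quoted three-part name: "catalog"."database"."table"."""
    return ".".join(f'"{part}"' for part in (catalog, database, table))


# Hypothetical sources: one table in the S3 data lake, one in another system.
orders = qualified("awsdatacatalog", "sales", "orders")
customers = qualified("onprem_mysql", "crm", "customers")

# One query that spans both sources.
sql = f"""SELECT c.region, SUM(o.amount) AS revenue
FROM {orders} AS o
JOIN {customers} AS c ON o.customer_id = c.customer_id
GROUP BY c.region"""

print(sql)
```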