Amazon Athena

Amazon Athena is an interactive query service for analysing data directly in Amazon S3 using standard SQL. There is no infrastructure to manage: you point Athena at your data, define the schema, and start querying.

Key Features

  • Serverless: No clusters or infrastructure to provision
  • Standard SQL: Runs queries on a Presto/Trino-based engine
  • Schema-on-read: Define schemas without transforming source data
  • Multiple formats: Query CSV, JSON, ORC, Avro, and Parquet
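
To make the schema-on-read point concrete, here is a minimal sketch of a table definition over CSV files that already sit in S3. The bucket path, the header row, and the column types are assumptions for illustration; the column names are borrowed from the example query later on this page. Nothing is copied or transformed: the statement only records how Athena should interpret the files at query time.

```sql
-- Hypothetical table over raw CSV files; the S3 path is a placeholder.
-- `date` is backtick-quoted because it is a reserved word in Athena DDL.
CREATE EXTERNAL TABLE IF NOT EXISTS access_logs (
  `date`        string,
  status_code   int,
  response_time double
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://example-bucket/raw/access-logs/'
-- Assumes each file starts with one header line that should be skipped.
TBLPROPERTIES ('skip.header.line.count' = '1');
```

Dropping a table defined this way removes only the catalogue entry; the underlying files in S3 are left untouched.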

Common Use Cases

  • Log analysis: Query application logs, CloudTrail, VPC Flow Logs
  • Ad-hoc analytics: Explore data without setting up a data warehouse
  • Data lake queries: Analyse large datasets stored in S3
  • Cost analysis: Query AWS Cost and Usage Reports

Example Query

SELECT
  date,
  COUNT(*) AS requests,
  AVG(response_time) AS avg_response
FROM access_logs
WHERE status_code = 200
GROUP BY date
ORDER BY date DESC
LIMIT 30;

Performance Tips

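  • Use columnar formats: Parquet and ORC let Athena read only the columns a query references, which reduces both the data scanned (and therefore cost) and query time
  • Partition your data: Organise by date, region, or other common filters so queries can skip partitions they don't need
  • Compress data: Use Snappy or GZIP compression to reduce the bytes scanned
  • Use AWS Glue: Automate schema discovery with Glue crawlers, which register tables in the Glue Data Catalog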
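
The first three tips can be combined in a single CREATE TABLE AS SELECT (CTAS) statement. The sketch below assumes the access_logs table from the example query, a hypothetical target table name and S3 path, and a date column stored as a 'YYYY-MM-DD' string; adjust these to match your own data.

```sql
-- Rewrite access_logs as Snappy-compressed Parquet, partitioned by date.
-- Table name and S3 location are placeholders.
CREATE TABLE access_logs_parquet
WITH (
  format            = 'PARQUET',
  write_compression = 'SNAPPY',
  external_location = 's3://example-bucket/curated/access-logs/',
  partitioned_by    = ARRAY['date']
) AS
SELECT
  status_code,
  response_time,
  "date"  -- partition columns must come last; quoted defensively, since date is also a type name
FROM access_logs;

-- Filtering on the partition column lets Athena skip every other partition.
SELECT
  COUNT(*) AS requests,
  AVG(response_time) AS avg_response
FROM access_logs_parquet
WHERE "date" = '2024-01-15'
  AND status_code = 200;
```

Run the two statements separately. A single CTAS query can write at most 100 partitions, so larger date ranges need to be converted in batches.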

What We Like

  • Pay per query: Only pay for data scanned
  • Zero maintenance: No servers, no tuning, no upgrades
  • Instant queries: No waiting for cluster spin-up
  • Integration: Works seamlessly with AWS Glue, QuickSight, and other services
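
As a rough illustration of the pay-per-query model, the cost of a query is the volume of data it scans multiplied by the per-terabyte price. The figures below are hypothetical: a price of $5 per TB is assumed for the arithmetic (check the current rate for your region), and the two scan volumes stand in for the same query run against raw data and against a tenfold-smaller columnar, partitioned copy.

```latex
\text{cost} = D \times p,
\qquad D = \text{data scanned (TB)}, \quad p = \text{price per TB}

D = 0.2,\;  p = \$5 \;\Rightarrow\; \text{cost} = 0.2 \times 5 = \$1.00

D = 0.02,\; p = \$5 \;\Rightarrow\; \text{cost} = 0.02 \times 5 = \$0.10
```

Under these assumptions, reducing the data scanned by a factor of ten reduces the bill by the same factor.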

What We Don't Like

  • Cost unpredictability: Poorly optimised queries on large datasets can be expensive
  • No indexes: Relies entirely on partitioning and columnar formats for performance
  • Concurrency limits: Default limits can be restrictive for multi-user workloads