Amazon Athena
Amazon Athena is an interactive query service for analysing data directly in Amazon S3 using standard SQL. There is no infrastructure to manage: you point Athena at your data, define a schema, and start querying.
Key Features
- Serverless: No clusters or infrastructure to provision
- Standard SQL: Queries run on an engine based on Presto/Trino
- Schema-on-read: Define schemas without transforming source data
- Multiple formats: Query CSV, JSON, ORC, Avro, and Parquet
Common Use Cases
- Log analysis: Query application logs, CloudTrail, VPC Flow Logs
- Ad-hoc analytics: Explore data without setting up a data warehouse
- Data lake queries: Analyse large datasets stored in S3
- Cost analysis: Query AWS Cost and Usage Reports
Example Query
SELECT
    date,
    COUNT(*) AS requests,
    AVG(response_time) AS avg_response
FROM access_logs
WHERE status_code = 200
GROUP BY date
ORDER BY date DESC
LIMIT 30;
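A query like the one above can also be submitted from code. The following is a minimal sketch rather than a definitive implementation: it assumes `client` behaves like `boto3.client("athena")`, and the helper name `run_query`, the polling interval, and the poll limit are illustrative choices.

```python
import time

# States after which Athena will not change the query status again.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_query(client, sql, database, output_location,
              poll_seconds=1.0, max_polls=60):
    """Submit a query and poll until it reaches a terminal state.

    `client` is assumed to behave like boto3.client("athena").
    Returns a (state, bytes_scanned) tuple.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    for _ in range(max_polls):
        response = client.get_query_execution(QueryExecutionId=query_id)
        execution = response["QueryExecution"]
        state = execution["Status"]["State"]
        if state in TERMINAL_STATES:
            # Statistics may be absent for queries that failed early.
            stats = execution.get("Statistics", {})
            return state, stats.get("DataScannedInBytes", 0)
        time.sleep(poll_seconds)

    raise TimeoutError(f"query {query_id} did not finish after {max_polls} polls")
```

With boto3 installed and credentials configured, a call might look like `run_query(boto3.client("athena"), sql, "weblogs", "s3://example-bucket/athena-results/")`, where the database name and results bucket are placeholders. The returned byte count is worth logging, since it is what the query is billed on.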
Performance Tips
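- Use columnar formats: Parquet and ORC let Athena read only the columns a query references, which reduces both cost and run time
- Partition your data: Organise by date, region, or other common filters so queries can skip irrelevant data
- Compress data: Use Snappy or GZIP compression to reduce the bytes scanned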
- Use AWS Glue: Automate schema discovery with Glue crawlers, which register tables in the Glue Data Catalog
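To make the partitioning tip concrete, here is a small sketch that builds a Hive-style S3 prefix for one day of data and the DDL statement that registers it as a partition. The partition column `dt`, the bucket, and both helper names are hypothetical, and the sketch assumes the table was created with `PARTITIONED BY (dt string)`.

```python
from datetime import date


def partition_prefix(base, day):
    """Return a Hive-style S3 prefix such as .../dt=2024-01-15/ for one day."""
    return f"{base.rstrip('/')}/dt={day.isoformat()}/"


def add_partition_sql(table, base, day):
    """Build the DDL that registers one day's prefix as a partition."""
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (dt = '{day.isoformat()}') "
        f"LOCATION '{partition_prefix(base, day)}'"
    )


if __name__ == "__main__":
    # Hypothetical bucket and table names, for illustration only.
    print(add_partition_sql("access_logs",
                            "s3://example-bucket/access-logs",
                            date(2024, 1, 15)))
```

Once data is laid out this way, a filter such as `WHERE dt = '2024-01-15'` lets Athena read only the matching prefix instead of the whole table.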
What We Like
- Pay per query: You pay only for the data each query scans
- Zero maintenance: No servers, no tuning, no upgrades
- Instant queries: No waiting for cluster spin-up
- Integration: Works seamlessly with AWS Glue, QuickSight, and other services
What We Don't Like
- Cost unpredictability: Poorly optimised queries on large datasets can be expensive
- No indexes: Relies entirely on partitioning and columnar formats for performance
- Concurrency limits: Default limits can be restrictive for multi-user workloads
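Because billing follows the data scanned, a rough estimate can show how much a poorly optimised query costs compared with a pruned one. The sketch below is illustrative only. The default price per terabyte is an assumption that should be checked against current regional pricing, a terabyte is assumed to be 1024^4 bytes, and the rounding to the next megabyte with a 10 MB minimum per query reflects Athena's published billing rule at the time of writing.

```python
import math

MB = 1024 ** 2
TB = 1024 ** 4  # assumption: binary terabyte


def estimate_cost_usd(bytes_scanned, price_per_tb=5.0, min_bytes=10 * MB):
    """Estimate the charge for one query from the bytes it scanned.

    price_per_tb is an illustrative default, not an authoritative price.
    Scanned bytes are rounded up to the next megabyte, then the
    per-query minimum is applied.
    """
    billed = max(math.ceil(bytes_scanned / MB) * MB, min_bytes)
    return billed / TB * price_per_tb


if __name__ == "__main__":
    full_scan = estimate_cost_usd(2 * TB)           # whole table
    pruned = estimate_cost_usd(5 * 1024 ** 3)       # one partition, 5 GiB
    print(f"full scan: ${full_scan:.2f}, pruned: ${pruned:.4f}")
```

Feeding the function the scanned-bytes figure that Athena reports for each query gives a simple way to spot the expensive ones before they become a habit.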