Crossing the t’s: Athena vs. Redshift Spectrum

Alex Gordienko
Published in Nerd For Tech
2 min read · Jan 31, 2021


Photo by Jeroen den Otter on Unsplash

Amazon provides two managed services for querying the data in your data lake with SQL: Amazon Athena and Amazon Redshift Spectrum. Both offer similar query capabilities at the same price, but each has some distinctive features. I have prepared this article to cross the t’s and help you decide which one to use in your data projects.

Both services fully support AWS Glue Data Catalog, as well as an external Hive Metastore (Glue Data Catalog is itself Hive Metastore-compatible). AWS Glue is not available in some regions, but users of either service in those regions can use the Athena catalog instead.

Being part of the Redshift family, Redshift Spectrum connects to Redshift clusters natively. Athena, in contrast, can work with Redshift only through JDBC connectors. AWS provides additional information on this topic.

Athena supports the additional complex data types STRUCT, ARRAY, and MAP. This allows it to work with unstructured and semi-structured data (e.g. JSON, Avro) as well as with structured data, which can be vital when exploring a data lake. Redshift Spectrum, meanwhile, can handle only structured data and mainly fits the daily needs of those who work with Redshift data warehouses.
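To make the complex types more concrete, here is a minimal Python sketch of how a nested JSON record could map onto STRUCT and ARRAY column types written in the Hive-style `struct<...>` / `array<...>` notation. The helper `athena_type` and the sample record are hypothetical and exist only for illustration. They are not part of any AWS SDK, and real tables may need types chosen by hand (for example, MAP for objects with arbitrary keys).

```python
import json


def athena_type(value):
    """Guess a Hive-style column type for one parsed JSON value (illustrative only)."""
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        # Assumption: arrays are homogeneous, so the first element decides the type.
        inner = athena_type(value[0]) if value else "string"
        return f"array<{inner}>"
    if isinstance(value, dict):
        # Assumption: JSON objects become STRUCTs with one field per key.
        fields = ",".join(f"{key}:{athena_type(val)}" for key, val in value.items())
        return f"struct<{fields}>"
    raise TypeError(f"unsupported value: {value!r}")


# Hypothetical semi-structured record, as it might appear in a data lake file.
record = json.loads('{"user": {"id": 7, "name": "ann"}, "tags": ["a", "b"], "score": 1.5}')
schema = {column: athena_type(value) for column, value in record.items()}

for column, column_type in schema.items():
    print(f"{column} {column_type}")
```

In this sketch the `user` object becomes a STRUCT and `tags` becomes an ARRAY. JSON alone cannot tell a STRUCT from a MAP, so that choice stays with whoever defines the table.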

One more thing to keep in mind when choosing a service for querying data: Athena can work with S3 buckets in different regions, while Redshift Spectrum can load data only from buckets in the same region as the cluster. Prices for both services are the same: $5 per TB scanned. However, Redshift Spectrum is available in an AWS account only alongside a Redshift cluster, which costs additional money.
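As a rough illustration of the per-scan pricing, the sketch below estimates the cost of a query from the number of bytes it scans, using the $5 per TB figure quoted above. The function name and the choice of a binary terabyte (1024^4 bytes) are assumptions made for this example. It ignores minimum charges, rounding rules, and the separate cost of the Redshift cluster that Spectrum requires.

```python
PRICE_PER_TB_USD = 5.0      # figure quoted in this article
BYTES_PER_TB = 1024 ** 4    # assumption: binary terabyte


def scan_cost_usd(bytes_scanned: int) -> float:
    """Estimate the scan cost of one query in US dollars (illustrative only)."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


# Example: a query that scans 200 GB of data.
two_hundred_gb = 200 * 1024 ** 3
print(f"Estimated cost: ${scan_cost_usd(two_hundred_gb):.2f}")
```

Because both services bill by data scanned, anything that reduces the scanned volume (partitioning, columnar formats, compression) lowers the bill under either one.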

Here are the most important features of both services in table form. Feel free to share it:

Athena vs. Redshift Spectrum comparison
