
Access S3Tables data with Grafana and Amazon Athena
by Paul Symons
Summary
Amazon S3Tables is a new way to manage Iceberg tables on AWS.
Accessing the data using the Amazon Athena Console is straightforward, but making it play nice with Grafana takes a little extra work.
In this blog we explain how to utilise your S3Tables Catalogs when using the Grafana Athena Data Source plugin.
Context
In my last blog “Airwaves to Data Lake in 60 Seconds”, we described how enabling S3Tables bucket integration with AWS Analytics Services (described in this terrifying looking documentation) allows you to query your S3Tables data directly in services like Athena.
This is an excellent way to try out Iceberg Tables with relatively little investment in terms of both time and money. What is not obvious is that the Athena Console presents a view of your data that is different to what the Athena API makes available to you.
In the following screenshot, you can see the default AwsDataCatalog as a Data Source: a new Catalog selector is shown, featuring:
- None, the Catalog containing the original Glue Catalog Databases and Tables
- s3tablescatalog/tind-flight-tracking, a Catalog based on my S3Tables Table Bucket
A problem arises when you try to access your S3Tables data using the Grafana Athena Data Source - the S3Tables Table Buckets do not appear in the list of catalogs to select, nor in the list of databases offered when AwsDataCatalog is selected:
This occurs because of the (correct, in my opinion) way that the Grafana Athena Data Source lists catalogs.
Grafana Plugin for Athena
I’ve been using the Grafana plugin for Athena for about 3 years now - if you have time series data available through Athena, it can be a valuable way to quickly visualise your data in aesthetically pleasing ways.
When you set up the Grafana Athena Data Source, it presents auto-populated drop down boxes to select the Data Source and Database. These boxes do not accept free-typed input, so if a value does not appear in the list, you cannot override it.
Behind the scenes, the plugin makes the following Athena API calls:
- ListDataCatalogs to list the Data Sources
- ListDatabases to list the Databases in a selected Data Source
- ListTableMetadata to list the tables in the selected Database
As you can probably guess by now - the S3Tables Data Sources shown in the Amazon Athena Console (e.g. s3tablescatalog/tind-flight-tracking) are not listed in the results of the ListDataCatalogs API response.
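The three calls above can be sketched in Python. This is only an illustration of the API sequence, not the plugin's actual code; the helper names are my own, and `athena` is assumed to be a boto3 Athena client.

```python
def collect(call, result_key, **kwargs):
    """Follow NextToken pagination and gather every item under result_key."""
    items = []
    while True:
        page = call(**kwargs)
        items.extend(page.get(result_key, []))
        token = page.get("NextToken")
        if not token:
            return items
        kwargs["NextToken"] = token


def list_catalog_names(athena):
    """ListDataCatalogs: the values offered as Data Sources."""
    summaries = collect(athena.list_data_catalogs, "DataCatalogsSummary")
    return [c["CatalogName"] for c in summaries]


def list_database_names(athena, catalog):
    """ListDatabases: the Databases inside one Data Source."""
    databases = collect(athena.list_databases, "DatabaseList", CatalogName=catalog)
    return [d["Name"] for d in databases]


def list_table_names(athena, catalog, database):
    """ListTableMetadata: the tables inside one Database."""
    tables = collect(
        athena.list_table_metadata,
        "TableMetadataList",
        CatalogName=catalog,
        DatabaseName=database,
    )
    return [t["Name"] for t in tables]
```

With credentials configured, `list_catalog_names(boto3.client("athena"))` returns the names a consumer of the Athena API can see, which is where the s3tablescatalog entries go missing.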
The Poison
What’s really going to bake your noodle is that if you use the ListDatabases API call with a catalog-name parameter matching the federated name of your S3Tables bucket, e.g. s3tablescatalog/<your-table-bucket-name>, it will successfully list the namespaces from your S3Tables Table Bucket!
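To see the mismatch for yourself, a small probe along these lines may help. It is a sketch under assumptions: `probe_catalog` is a name I made up, `athena` is assumed to be a boto3 Athena client, and only the first page of databases is read.

```python
def probe_catalog(athena, catalog_name):
    """Report whether a catalog is listed, and which namespaces it still returns."""
    listed = set()
    kwargs = {}
    while True:
        # ListDataCatalogs, following NextToken until the last page
        page = athena.list_data_catalogs(**kwargs)
        listed.update(c["CatalogName"] for c in page.get("DataCatalogsSummary", []))
        if not page.get("NextToken"):
            break
        kwargs["NextToken"] = page["NextToken"]

    # ListDatabases with the federated name (first page only, for brevity)
    page = athena.list_databases(CatalogName=catalog_name)
    namespaces = [d["Name"] for d in page.get("DatabaseList", [])]

    return {"listed": catalog_name in listed, "namespaces": namespaces}
```

Called with `s3tablescatalog/<your-table-bucket-name>`, the behaviour described above would show up as `listed` being false while `namespaces` is populated.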
The Remedy
The workaround here is to create new Athena Catalog objects that map directly to your S3Tables Table Buckets.
You can do this on the CLI as follows:
$ aws athena create-data-catalog --name s3t-tind-flight-tracking \
--type GLUE \
--parameters catalog-id=s3tablescatalog/tind-flight-tracking
Or if using Terraform, something like this:
resource "aws_athena_data_catalog" "athena_catalog_tind_flight_tracking" {
  name        = "s3t-tind-flight-tracking"
  description = "Data Catalog connection to S3Tables Table Bucket"
  type        = "GLUE"
  parameters = {
    catalog-id = "s3tablescatalog/tind-flight-tracking"
  }
}
Once you have done this, the catalog s3t-tind-flight-tracking shown above will be available to the Grafana Athena plugin, and you can carry on your righteous business. Below, your new catalog appears as an additional Data Source in the Athena Console.
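If you would rather script this step, the same call can be made through boto3. This is a minimal sketch mirroring the CLI command above; `ensure_s3tables_catalog` is a hypothetical helper of my own, `athena` is assumed to be a boto3 Athena client, and the existence check is an addition so the function can be re-run safely.

```python
def ensure_s3tables_catalog(athena, name, table_bucket):
    """Create an Athena data catalog pointing at an S3Tables Table Bucket.

    Returns True if the catalog was created, False if it already existed.
    """
    existing = set()
    kwargs = {}
    while True:
        page = athena.list_data_catalogs(**kwargs)
        existing.update(c["CatalogName"] for c in page.get("DataCatalogsSummary", []))
        if not page.get("NextToken"):
            break
        kwargs["NextToken"] = page["NextToken"]

    if name in existing:
        return False

    # Same type and catalog-id parameter as the CLI and Terraform examples
    athena.create_data_catalog(
        Name=name,
        Type="GLUE",
        Description="Data Catalog connection to S3Tables Table Bucket",
        Parameters={"catalog-id": f"s3tablescatalog/{table_bucket}"},
    )
    return True
```

For the bucket used in this post, that would be `ensure_s3tables_catalog(boto3.client("athena"), "s3t-tind-flight-tracking", "tind-flight-tracking")`.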
Rant of the Day
The need to create a separate Athena Data Catalog is lightly documented in the Athena User Guide, though you’d really have to have an eagle eye to find it and translate its Console-only instructions to the API or CLI equivalents.
What’s more confusing and concerning to me is the inconsistency within Amazon Athena: the experience in the console differs from the experience offered by the API.
It begins consistently - the data sources presented in the Athena Console match the ListDataCatalogs API response, showing:
- the AwsDataCatalog data source - this is the traditional “Glue Catalog”
- my new S3Tables Athena data catalog - s3t-tind-flight-tracking
I guess we can confidently call AwsDataCatalog a data catalog, and s3t-tind-flight-tracking a data catalog, right? Or are they Data Sources? I’m confused.
Data Source - AwsDataCatalog
When I select the AwsDataCatalog data source, I am now presented with a dropdown selector to choose, well, another Catalog.
How interesting! Within the AwsDataCatalog data source (or data catalog, using the API parlance), I can see my S3Tables Catalog as a selectable Catalog.
It begins to feel like a dream within a dream.
And the original Glue Catalog databases and tables are helpfully placed in the catalog named None. Still following?
This is where the Console experience differs from the API experience - there is no API equivalent to the Catalog dropdown shown above, therefore any consumers (like the Grafana Athena plugin) that use only the Athena API will not see the S3TablesCatalog catalogs shown in the Athena Console, unless you add them yourself.
Language Matters
By now we have become accustomed to vendors using different names for the varying hierarchies of fluff that sit collectively above relations (tables, views, etc.).
| | Snowflake | Redshift | Databricks | Athena | LakeFormation |
|---|---|---|---|---|---|
| Relations | Table | Table | Table | Table | Table |
| Layer 1 | Schema | Schema | Schema | Database (Glue) or Namespaces (S3Tables) | Database |
| Layer 2 | Database | Database | Catalog | Catalog (S3Tables) or None (Glue) | Catalog |
| Layer 3 | Horizon Catalog | Redshift Catalog or Federated | Unity Catalog | Data Sources (Athena Catalogs) | Catalogs |
What is different here is that Athena and/or LakeFormation and Glue are not consistent with each other, and that is a problem.
In my opinion, this is unlikely to be an oversight, i.e. the Athena team simply forgetting to add s3tablescatalog to the ListDataCatalogs API. It is a fundamental disunity in the concept of data hierarchy within the AWS ecosystem, because each product, and thus each API, has a different worldview.
- Athena has its own definition of catalogs that can be of type GLUE or can be Federated - using Glue Connections
- Glue Data Catalog is a simple Hive Metastore whose only hierarchy for holding relations (tables and views) is a database.
- LakeFormation is a permission layer, but also lists the data hierarchy in its own way, which is consistent with neither Athena nor Glue
I find it very difficult to rationalise this disunity with customers and peers.
So in a forthcoming blog, I’ll demonstrate how you can stick with S3Tables, use Trino and ditch the rest of the AWS stack, saving yourself both heartache and money.