31 Mar 2025

Uncovering Event Patterns in SQL with Match_Recognize

by Paul Symons

Summary
Our Dataset - WA Fuelwatch
Using MATCH_RECOGNIZE
Further Learning
- Rows per match
- After Match Skip
Conclusion
- References

Summary

MATCH_RECOGNIZE is a SQL clause that helps you identify sequential patterns in your datasets, using concise and readable syntax. It has innumerable use cases, but popular ones include:

Detecting patterns of fraud in transactional or game play data
Triggering alarm states in response to sequences of IOT device events
Identifying the duration of repeating event sequences (e.g. WA Fuelwatch cycles)

Though MATCH_RECOGNIZE was added to the SQL:2016 standard some years ago, only a handful of systems support it, notably:

In this blog, we will demonstrate its usefulness by

Illustrating a simplified dataset where patterns can be identified
Exploring the syntax of the MATCH_RECOGNIZE clause
Identify further learning opportunities for complex use cases

I have used the open-source Trino engine with DuckDB for the examples here, but the results should be similar elsewhere.

Our Dataset - WA Fuelwatch

In Western Australia, there is a legislated and government-run service called FuelWatch. Alongside the legislation - which requires service stations to maintain consistent pricing for 24 hour windows - the website service allows consumers to find the current price of a service station’s fuel, and (after 2pm) also tomorrow’s price of fuel.

I’ve previously experimented with this data, for a presentation at a local data meetup group in Perth, to demonstrate how useful in combination DuckDB and Evidence.dev are for analytics processing and Business Intelligence.

A curious phenomenon observed here with Unleaded Petrol (ULP) is that many retailers (especially in the Perth metropolitan area) engage in price cycling; that is, over a period of days, they raise and lower the prices gradually.

Desired Analysis

We want to understand how the duration of the price cycle varies over time, for each service station.

This is easy to understand visually, when looking at a single station on a graph:

However, with over 800 service stations in the dataset, graphing every service station is not a sustainable solution, and also doesn’t help us to compare cycle durations across comparable dimension keys, such as postcode, or petrol brand.

We will show how MATCH_RECOGNIZE helps us quickly identify the dates and price range for each successive cycle.

Data Structure

We have a single table called fuelwatch which is comprised of historical CSV files from the Fuelwatch website.

Here is a random sample of the data:

	1	2	3	4	5
publish_date	2025-01-01	2025-01-01	2025-01-01	2025-01-01	2025-01-01
trading_name	7-Eleven Dunsborough	Ampol Foodary Butler	Ampol Foodary Carnarvon	Ampol Foodary Doubleview	Ampol Foodary High Wycombe
brand_description	7-Eleven	Ampol	Ampol	Ampol	Ampol
production_description	ULP	PULP	98 RON	98 RON	ULP
product_price	179.9	208.9	223.9	198.9	164.9
address	100 Commonage Rd	11 Headingly Cr	595 Robinson St	365 Scarborough Beach Rd	967 Abernethy Rd
location	DUNSBOROUGH	BUTLER	KINGSFORD	DOUBLEVIEW	HIGH WYCOMBE
postcode	6281	6036	6701	6018	6057
area_description	Busselton-Shire	North of River	Carnarvon	North of River	East/Hills
region_description	South-West	Metro	Gascoyne	Metro	Metro

From now on, we will focus on the following key elements:

publish_date - the date when a price is applicable
trading_name - the name of the service station
product_description - we will only focus on the ULP value (unleaded petrol)
product_price - the price on the given day

Data Sample

We are going to focus on the Ampol Foodary Thornlie Square Service Station for the month-to-date of March 2025. It has the following values for ULP:

publish_date	product_description	trading_name	product_price
2025-03-01	ULP	Ampol Foodary Thornlie Square	171.9
2025-03-02	ULP	Ampol Foodary Thornlie Square	170.9
2025-03-03	ULP	Ampol Foodary Thornlie Square	171.9
2025-03-04	ULP	Ampol Foodary Thornlie Square	163.9
2025-03-05	ULP	Ampol Foodary Thornlie Square	188.9
2025-03-06	ULP	Ampol Foodary Thornlie Square	182.9
2025-03-07	ULP	Ampol Foodary Thornlie Square	177.9
2025-03-08	ULP	Ampol Foodary Thornlie Square	175.9
2025-03-09	ULP	Ampol Foodary Thornlie Square	170.9
2025-03-10	ULP	Ampol Foodary Thornlie Square	172.9
2025-03-11	ULP	Ampol Foodary Thornlie Square	163.9
2025-03-12	ULP	Ampol Foodary Thornlie Square	187.9
2025-03-13	ULP	Ampol Foodary Thornlie Square	178.9
2025-03-14	ULP	Ampol Foodary Thornlie Square	176.9
2025-03-15	ULP	Ampol Foodary Thornlie Square	172.9
2025-03-16	ULP	Ampol Foodary Thornlie Square	171.9
2025-03-17	ULP	Ampol Foodary Thornlie Square	167.9
2025-03-18	ULP	Ampol Foodary Thornlie Square	160.9
2025-03-19	ULP	Ampol Foodary Thornlie Square	199.9
2025-03-20	ULP	Ampol Foodary Thornlie Square	183.9
2025-03-21	ULP	Ampol Foodary Thornlie Square	178.9
2025-03-22	ULP	Ampol Foodary Thornlie Square	175.9
2025-03-23	ULP	Ampol Foodary Thornlie Square	171.9
2025-03-24	ULP	Ampol Foodary Thornlie Square	169.9
2025-03-25	ULP	Ampol Foodary Thornlie Square	159.9
2025-03-26	ULP	Ampol Foodary Thornlie Square	171.9
2025-03-27	ULP	Ampol Foodary Thornlie Square	171.9
2025-03-28	ULP	Ampol Foodary Thornlie Square	167.9

The image below shows this data with some cycle start and end points marked, based on the above data. Green circles represent each cycle start, with the red circles representing the cycle end:

Using MATCH_RECOGNIZE

There’s no getting around the fact that MATCH_RECOGNIZE is a very powerful, but also complex SQL clause. Let’s build it up in stages.

Clause

It starts as follows:

SELECT * FROM fuelwatch
MATCH_RECOGNIZE(
    -- options
)

Note that the order in which options appear in the MATCH_RECOGNIZE clause are important; the statement may not compile if they are out of order.

All options are specified within the clause itself. The output of your query with MATCH_RECOGNIZE will differ depending on the options you specify. More on that later.

Partitions and Ordering

Like window functions, MATCH_RECOGNIZE can be applied partition-wise; this means patterns will be matched for rows that are grouped by distinct values of the partition key.

Ordering is also important, as patterns are expected to be matched sequentially. Omitting the order will often produce non-deterministic results.

Below we add the partition key trading_name (the name of the service station), and order by the publish_date of the price:

SELECT * FROM fuelwatch
MATCH_RECOGNIZE(
    PARTITION BY trading_name
    ORDER BY publish_date
    -- more options
)

Neither PARTITION BY nor ORDER BY are required, however it is difficult to imagine a use case that warrants omission of ORDER BY.

Declaring a Pattern

To specify how we will match sequences of rows, we introduce a pattern. This comprises a sequence of custom identifiers that may optionally include regex semantics.

For example, if I specify a pattern identifier LOWPRICE, that alone is valid. But I can additionally specify:

LOWPRICE{3,} to indicate 3 or more LOWPRICE pattern matches
LOWPRICE+ to indicate at least one LOWPRICE pattern match
LOWPRICE* for 0 or more LOWPRICE pattern matches

… and so on.

In our fuelwatch example, we want to match when the price goes down, optionally match when it plateaus, and finally, match when it goes up:

SELECT * FROM fuelwatch
MATCH_RECOGNIZE(
    PARTITION BY trading_name
    ORDER BY publish_date
    PATTERN ( DOWN_OR_PLATEAU* BOTTOM)
    -- more options
)

Defining Pattern Identifiers

Once a pattern has been specified, identifiers in the pattern can be defined.

To define identifiers, we use the DEFINE keyword, following by a comma separated list of identifiers and their definitions:

SELECT * FROM fuelwatch
MATCH_RECOGNIZE(
    PARTITION BY trading_name
    ORDER BY publish_date
    PATTERN ( DOWN_OR_PLATEAU* BOTTOM)
    DEFINE
        DOWN_OR_PLATEAU AS NEXT(product_price) <= product_price,
        BOTTOM AS NEXT(product_price) > product_price
    -- more options)

By now you are probably wondering, what is NEXT() for? The answer, it is one of many possible navigation functions that allow you to compare the current row to another row(s). In our case, we are comparing to a value in the row immediately following. These navigation patterns are incredible powerful and worth studying in detail to meet your use case.

It is not actually necessary to define every identifier, but if left undefined, an identifier will match any row relative to where it is referenced in the PATTERN().

For example, if we modified our pattern to be
PATTERN ( START DOWN_OR_PLATEAU* BOTTOM)
the START identifier would always match the row before a DOWN_OR_PLATEAU sequence was encountered.

This may seem useful, and it often is; but be careful, as the row matched by START will be included in match result. If your MATCH_RECOGNIZE clause is configured to skip past the last matching row - which is the default - your START row may steal the place of what should be your first DOWN_OR_PLATEAU row.

Adding Measures

Finally, we add some measures that will constitute our match results.

You can see more navigation functions here - e.g. FIRST() and LAST()

SELECT * FROM fuelwatch
MATCH_RECOGNIZE(
    PARTITION BY trading_name
    ORDER BY publish_date
    MEASURES    
        FIRST(DOWN_OR_PLATEAU.publish_date)    AS start_date
        , LAST(BOTTOM.publish_date)            AS end_date
        , FIRST(DOWN_OR_PLATEAU.product_price) AS start_price
        , LAST(BOTTOM.product_price)           AS end_price
        , DATE_DIFF('day', FIRST(DOWN_OR_PLATEAU.publish_date), LAST(BOTTOM.publish_date)) AS days	
    PATTERN ( DOWN_OR_PLATEAU* BOTTOM)
    DEFINE
        DOWN_OR_PLATEAU AS NEXT(product_price) <= product_price,
        BOTTOM AS NEXT(product_price) > product_price
)

This creates a result set that simplifies the desired facets of the input table shown previously:

trading_name	start_date	end_date	start_price	end_price	days
Ampol Foodary Thornlie Square	2025-03-01	2025-03-02	171.9	170.9	1
Ampol Foodary Thornlie Square	2025-03-03	2025-03-04	171.9	163.9	1
Ampol Foodary Thornlie Square	2025-03-05	2025-03-09	188.9	170.9	4
Ampol Foodary Thornlie Square	2025-03-10	2025-03-11	172.9	163.9	1
Ampol Foodary Thornlie Square	2025-03-12	2025-03-18	187.9	160.9	6
Ampol Foodary Thornlie Square	2025-03-19	2025-03-25	199.9	159.9	6

Further Learning

We’ve barely scratched the surface of what MATCH_RECOGNIZE is capable of, and we’ve also relied on several defaults. I’ll explain these briefly below.

Rows per match

There are two options for how to generate match results from your query:

ONE ROW PER MATCH
ALL ROWS PER MATCH (and options)

The first option is the default, and creates a summary record for each complete PATTERN match, based on the your defined MEASURES.

The second option will return all rows included in each match, and optionally, other rows that matched nothing. In this mode, additional functions like CLASSIFIER() become more useful in MEASURES, notating the PATTERN identifier that this row matches, e.g. DOWN_OR_PLATEAU, BOTTOM, etc.

After Match Skip

When a full pattern has been matched, you can control where the “cursor” is placed next. This is specified with phrases beginning AFTER MATCH SKIP...

The default is AFTER MATCH SKIP PAST LAST ROW - this will look for new matches from the row immediately following the last row from your previous match.

One of a few alternatives include AFTER MATCH SKIP TO NEXT ROW, which has more of a sliding window effect.

Conclusion

MATCH_RECOGNIZE takes some commitment to learn, but it pays back in dividends. I’ve found one of the most useful ways to make progress is to start with ALL ROWS PER MATCH WITH UNMATCHED ROWS, and specify a single MEASURE that shows the output of the CLASSIFIER() function. This allows you to focus on labelling rows correctly. Once rows are labelled correctly, adding further measures becomes much more straight forward.

I hope you’ve enjoyed this brief introduction to the MATCH_RECOGNIZE clause - it is a very powerful way to analyse sequences of events in your data!

References

Fuelwatch MATCH_RECOGNIZE example with Trino and DuckDB
Row pattern recognition with MATCH_RECOGNIZE
DuckDB Roadmap - MATCH_RECOGNIZE support is planned