AI generated image of rabbits inspecting dashboards, looking for a pattern of behaviour

Uncovering Event Patterns in SQL with Match_Recognize

by Paul Symons

Contents

Summary

MATCH_RECOGNIZE is a SQL clause that helps you identify sequential patterns in your datasets, using concise and readable syntax. It has innumerable use cases, but popular ones include:

  • Detecting patterns of fraud in transactional or game play data
  • Triggering alarm states in response to sequences of IOT device events
  • Identifying the duration of repeating event sequences (e.g. WA Fuelwatch cycles)

Though MATCH_RECOGNIZE was added to the SQL:2016 standard some years ago, only a handful of systems support it, notably:

In this blog, we will demonstrate its usefulness by

  • Illustrating a simplified dataset where patterns can be identified
  • Exploring the syntax of the MATCH_RECOGNIZE clause
  • Identify further learning opportunities for complex use cases

I have used the open-source Trino engine with DuckDB for the examples here, but the results should be similar elsewhere.

Our Dataset - WA Fuelwatch

In Western Australia, there is a legislated and government-run service called FuelWatch. Alongside the legislation - which requires service stations to maintain consistent pricing for 24 hour windows - the website service allows consumers to find the current price of a service station’s fuel, and (after 2pm) also tomorrow’s price of fuel.

I’ve previously experimented with this data, for a presentation at a local data meetup group in Perth, to demonstrate how useful in combination DuckDB and Evidence.dev are for analytics processing and Business Intelligence.

A curious phenomenon observed here with Unleaded Petrol (ULP) is that many retailers (especially in the Perth metropolitan area) engage in price cycling; that is, over a period of days, they raise and lower the prices gradually.

Desired Analysis

We want to understand how the duration of the price cycle varies over time, for each service station.

This is easy to understand visually, when looking at a single station on a graph:

However, with over 800 service stations in the dataset, graphing every service station is not a sustainable solution, and also doesn’t help us to compare cycle durations across comparable dimension keys, such as postcode, or petrol brand.

We will show how MATCH_RECOGNIZE helps us quickly identify the dates and price range for each successive cycle.

Data Structure

We have a single table called fuelwatch which is comprised of historical CSV files from the Fuelwatch website.

Here is a random sample of the data:

12345
publish_date2025-01-012025-01-012025-01-012025-01-012025-01-01
trading_name7-Eleven DunsboroughAmpol Foodary ButlerAmpol Foodary CarnarvonAmpol Foodary DoubleviewAmpol Foodary High Wycombe
brand_description7-ElevenAmpolAmpolAmpolAmpol
production_descriptionULPPULP98 RON98 RONULP
product_price179.9208.9223.9198.9164.9
address100 Commonage Rd11 Headingly Cr595 Robinson St365 Scarborough Beach Rd967 Abernethy Rd
locationDUNSBOROUGHBUTLERKINGSFORDDOUBLEVIEWHIGH WYCOMBE
postcode62816036670160186057
area_descriptionBusselton-ShireNorth of RiverCarnarvonNorth of RiverEast/Hills
region_descriptionSouth-WestMetroGascoyneMetroMetro

From now on, we will focus on the following key elements:

  • publish_date - the date when a price is applicable
  • trading_name - the name of the service station
  • product_description - we will only focus on the ULP value (unleaded petrol)
  • product_price - the price on the given day

Data Sample

We are going to focus on the Ampol Foodary Thornlie Square Service Station for the month-to-date of March 2025. It has the following values for ULP:

publish_dateproduct_descriptiontrading_nameproduct_price
2025-03-01ULPAmpol Foodary Thornlie Square171.9
2025-03-02ULPAmpol Foodary Thornlie Square170.9
2025-03-03ULPAmpol Foodary Thornlie Square171.9
2025-03-04ULPAmpol Foodary Thornlie Square163.9
2025-03-05ULPAmpol Foodary Thornlie Square188.9
2025-03-06ULPAmpol Foodary Thornlie Square182.9
2025-03-07ULPAmpol Foodary Thornlie Square177.9
2025-03-08ULPAmpol Foodary Thornlie Square175.9
2025-03-09ULPAmpol Foodary Thornlie Square170.9
2025-03-10ULPAmpol Foodary Thornlie Square172.9
2025-03-11ULPAmpol Foodary Thornlie Square163.9
2025-03-12ULPAmpol Foodary Thornlie Square187.9
2025-03-13ULPAmpol Foodary Thornlie Square178.9
2025-03-14ULPAmpol Foodary Thornlie Square176.9
2025-03-15ULPAmpol Foodary Thornlie Square172.9
2025-03-16ULPAmpol Foodary Thornlie Square171.9
2025-03-17ULPAmpol Foodary Thornlie Square167.9
2025-03-18ULPAmpol Foodary Thornlie Square160.9
2025-03-19ULPAmpol Foodary Thornlie Square199.9
2025-03-20ULPAmpol Foodary Thornlie Square183.9
2025-03-21ULPAmpol Foodary Thornlie Square178.9
2025-03-22ULPAmpol Foodary Thornlie Square175.9
2025-03-23ULPAmpol Foodary Thornlie Square171.9
2025-03-24ULPAmpol Foodary Thornlie Square169.9
2025-03-25ULPAmpol Foodary Thornlie Square159.9
2025-03-26ULPAmpol Foodary Thornlie Square171.9
2025-03-27ULPAmpol Foodary Thornlie Square171.9
2025-03-28ULPAmpol Foodary Thornlie Square167.9

The image below shows this data with some cycle start and end points marked, based on the above data. Green circles represent each cycle start, with the red circles representing the cycle end:

Using MATCH_RECOGNIZE

There’s no getting around the fact that MATCH_RECOGNIZE is a very powerful, but also complex SQL clause. Let’s build it up in stages.

Clause

It starts as follows:

SELECT * FROM fuelwatch
MATCH_RECOGNIZE(
    -- options
)

Note that the order in which options appear in the MATCH_RECOGNIZE clause are important; the statement may not compile if they are out of order.

All options are specified within the clause itself. The output of your query with MATCH_RECOGNIZE will differ depending on the options you specify. More on that later.

Partitions and Ordering

Like window functions, MATCH_RECOGNIZE can be applied partition-wise; this means patterns will be matched for rows that are grouped by distinct values of the partition key.

Ordering is also important, as patterns are expected to be matched sequentially. Omitting the order will often produce non-deterministic results.

Below we add the partition key trading_name (the name of the service station), and order by the publish_date of the price:

SELECT * FROM fuelwatch
MATCH_RECOGNIZE(
    PARTITION BY trading_name
    ORDER BY publish_date
    -- more options
)

Neither PARTITION BY nor ORDER BY are required, however it is difficult to imagine a use case that warrants omission of ORDER BY.

Declaring a Pattern

To specify how we will match sequences of rows, we introduce a pattern. This comprises a sequence of custom identifiers that may optionally include regex semantics.

For example, if I specify a pattern identifier LOWPRICE, that alone is valid. But I can additionally specify:

  • LOWPRICE{3,} to indicate 3 or more LOWPRICE pattern matches
  • LOWPRICE+ to indicate at least one LOWPRICE pattern match
  • LOWPRICE* for 0 or more LOWPRICE pattern matches

… and so on.

In our fuelwatch example, we want to match when the price goes down, optionally match when it plateaus, and finally, match when it goes up:

SELECT * FROM fuelwatch
MATCH_RECOGNIZE(
    PARTITION BY trading_name
    ORDER BY publish_date
    PATTERN ( DOWN_OR_PLATEAU* BOTTOM)
    -- more options
)

Defining Pattern Identifiers

Once a pattern has been specified, identifiers in the pattern can be defined.

To define identifiers, we use the DEFINE keyword, following by a comma separated list of identifiers and their definitions:

SELECT * FROM fuelwatch
MATCH_RECOGNIZE(
    PARTITION BY trading_name
    ORDER BY publish_date
    PATTERN ( DOWN_OR_PLATEAU* BOTTOM)
    DEFINE
        DOWN_OR_PLATEAU AS NEXT(product_price) <= product_price,
        BOTTOM AS NEXT(product_price) > product_price
    -- more options)

By now you are probably wondering, what is NEXT() for? The answer, it is one of many possible navigation functions that allow you to compare the current row to another row(s). In our case, we are comparing to a value in the row immediately following. These navigation patterns are incredible powerful and worth studying in detail to meet your use case.

It is not actually necessary to define every identifier, but if left undefined, an identifier will match any row relative to where it is referenced in the PATTERN().

For example, if we modified our pattern to be

PATTERN ( START DOWN_OR_PLATEAU* BOTTOM)

the START identifier would always match the row before a DOWN_OR_PLATEAU sequence was encountered.

This may seem useful, and it often is; but be careful, as the row matched by START will be included in match result. If your MATCH_RECOGNIZE clause is configured to skip past the last matching row - which is the default - your START row may steal the place of what should be your first DOWN_OR_PLATEAU row.

Adding Measures

Finally, we add some measures that will constitute our match results.

You can see more navigation functions here - e.g. FIRST() and LAST()

SELECT * FROM fuelwatch
MATCH_RECOGNIZE(
    PARTITION BY trading_name
    ORDER BY publish_date
    MEASURES    
        FIRST(DOWN_OR_PLATEAU.publish_date)    AS start_date
        , LAST(BOTTOM.publish_date)            AS end_date
        , FIRST(DOWN_OR_PLATEAU.product_price) AS start_price
        , LAST(BOTTOM.product_price)           AS end_price
        , DATE_DIFF('day', FIRST(DOWN_OR_PLATEAU.publish_date), LAST(BOTTOM.publish_date)) AS days	
    PATTERN ( DOWN_OR_PLATEAU* BOTTOM)
    DEFINE
        DOWN_OR_PLATEAU AS NEXT(product_price) <= product_price,
        BOTTOM AS NEXT(product_price) > product_price
)

This creates a result set that simplifies the desired facets of the input table shown previously:

trading_namestart_dateend_datestart_priceend_pricedays
Ampol Foodary Thornlie Square2025-03-012025-03-02171.9 170.91
Ampol Foodary Thornlie Square2025-03-032025-03-04 171.9163.9 1
Ampol Foodary Thornlie Square2025-03-052025-03-09188.9 170.9 4
Ampol Foodary Thornlie Square2025-03-102025-03-11172.9 163.9 1
Ampol Foodary Thornlie Square2025-03-122025-03-18187.9 160.9 6
Ampol Foodary Thornlie Square2025-03-192025-03-25199.9 159.9 6

Further Learning

We’ve barely scratched the surface of what MATCH_RECOGNIZE is capable of, and we’ve also relied on several defaults. I’ll explain these briefly below.

Rows per match

There are two options for how to generate match results from your query:

  • ONE ROW PER MATCH
  • ALL ROWS PER MATCH (and options)

The first option is the default, and creates a summary record for each complete PATTERN match, based on the your defined MEASURES.

The second option will return all rows included in each match, and optionally, other rows that matched nothing. In this mode, additional functions like CLASSIFIER() become more useful in MEASURES, notating the PATTERN identifier that this row matches, e.g. DOWN_OR_PLATEAU, BOTTOM, etc.

After Match Skip

When a full pattern has been matched, you can control where the “cursor” is placed next. This is specified with phrases beginning AFTER MATCH SKIP...

The default is AFTER MATCH SKIP PAST LAST ROW - this will look for new matches from the row immediately following the last row from your previous match.

One of a few alternatives include AFTER MATCH SKIP TO NEXT ROW, which has more of a sliding window effect.

Conclusion

MATCH_RECOGNIZE takes some commitment to learn, but it pays back in dividends. I’ve found one of the most useful ways to make progress is to start with ALL ROWS PER MATCH WITH UNMATCHED ROWS, and specify a single MEASURE that shows the output of the CLASSIFIER() function. This allows you to focus on labelling rows correctly. Once rows are labelled correctly, adding further measures becomes much more straight forward.

I hope you’ve enjoyed this brief introduction to the MATCH_RECOGNIZE clause - it is a very powerful way to analyse sequences of events in your data!

References

Post Analytics