Using Pants to Package Your Python Lambda

by Paul Symons

Summary

I’ve recently been working on a system to catalogue some of the dataset files available on the AEMO WEM Market Data website, and in doing so, ended up writing a small Python application to crawl the webftp-like data file repository. The application gathers file metadata - used to trace changes over time - to ask:

  • what kind of data exists here?
  • when is data added, updated, and removed?
  • what is the rate of change of different data models?
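To make those questions concrete, here is a minimal sketch of how two crawl snapshots could be compared. The `FileMeta` shape and the `diff_snapshots` helper are hypothetical names for illustration only; they are not taken from the actual application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    """Metadata for one remote file (hypothetical shape, for illustration)."""

    size: int
    modified: str  # last-modified timestamp as reported by the file listing


def diff_snapshots(
    old: dict[str, FileMeta], new: dict[str, FileMeta]
) -> dict[str, list[str]]:
    """Classify paths as added, updated, or removed between two crawls."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    # A path present in both crawls counts as updated if its metadata differs.
    updated = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return {"added": added, "updated": updated, "removed": removed}


yesterday = {
    "a.csv": FileMeta(100, "2024-01-01T00:00:00"),
    "b.csv": FileMeta(200, "2024-01-01T00:00:00"),
}
today = {
    "b.csv": FileMeta(250, "2024-01-02T00:00:00"),
    "c.csv": FileMeta(50, "2024-01-02T00:00:00"),
}
changes = diff_snapshots(yesterday, today)
print(changes)
```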

Whilst building a command line application was useful, the ultimate goal was for it to run in an automated way, and for its output data to be collected, processed, then used to tell a story.

Systemising the Python application

Gee, that sounds fancy. I just wanted to be able to:

  • Lint and Test the application
  • Manage the dependencies of the application (3rd party libraries that are not part of the Python runtime)
  • Package together all the necessary bits of the application, in order to deploy it somewhere
  • Deploy the package to a system where it can be run on a schedule, and its output data be captured

I was 18 years into my career before I used Python, so I consider myself a latecomer. But here are my humble observations:

Advantages | Disadvantages
--- | ---
Immeasurable strategies, libraries and utilities to get the job done | Immeasurable strategies, libraries and utilities to get the job done

It can be confusing for a newcomer to chart a way through this process, especially with Python’s complex and often messy history. But Pants can definitely help.

The fit for Pants

Pants is a multi-language build system that I hadn’t heard of until I started this project. After some research, I discovered that it originated with author Benjy Weinberger as he variously worked at Google, Twitter and, latterly, Foursquare. Originally a pure Python application, Pants now has a core engine written in Rust.

I’ve typically found packaging and deploying AWS Lambda Functions that depend on other Python libraries somewhat gross. It’s not that there aren’t good options out there: AWS CDK and AWS SAM are both effective at doing essentially the same thing, and given that both are made by AWS, you may wonder why you’d reach for anything other than the numerous competing methods offered by the cloud service provider.

Notably, the playing field is uneven. Tools variously focus on:

  • packaging
  • deployment
  • both packaging and deployment

Pants does not deploy; it is used to test, build and package. That makes it a good companion to Terraform.

Pairing with Terraform

Ultimately, many teams work with tools like Terraform to standardise Infrastructure-as-Code (IaC) delivery to platforms and capabilities beyond just the cloud service provider. Or because they don’t have a choice.

The Terraform AWS Provider does not natively support packaging Python projects into Lambda artifacts, so teams employ many workarounds to achieve this goal, such as null_resource resources with provisioner blocks, or innovative community modules such as the AWS Lambda Terraform Module.

As an alternative to these methods, I’m going to use Pants to test and package a Lambda Function zip artifact, and have Terraform refer to that artifact.
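When Terraform refers to a pre-built zip, it needs a way to notice that the artifact has changed. One common approach is the `source_code_hash` argument of `aws_lambda_function`, fed by Terraform's `filebase64sha256` function. Whether my configuration does exactly this is an assumption here; the sketch below only shows, in Python, how the same base64-encoded SHA-256 value can be computed, which is handy for checking an artifact by hand. The function name and the demo bytes are hypothetical.

```python
import base64
import hashlib
import tempfile
from pathlib import Path


def source_code_hash(zip_path: Path) -> str:
    """Return the base64-encoded SHA-256 digest of a file's contents."""
    digest = hashlib.sha256(zip_path.read_bytes()).digest()
    return base64.b64encode(digest).decode("ascii")


# Demo against stand-in bytes; in the real project the path would be the
# packaged artifact, e.g. dist/aemo-inventory.zip.
with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "aemo-inventory.zip"
    artifact.write_bytes(b"not a real zip, just demo bytes")
    demo_hash = source_code_hash(artifact)

print(demo_hash)
```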

I like the “Pants included” approach, because it co-ordinates the various Python tools I’d otherwise use anyway, and makes it simple for me to configure them. I also get a consistent interface to my build process, whether I’m using Python or another supported language.

How I use Pants

I am going to focus on a few commands only. Pants can do a lot more, but I’ll leave that for you to explore. I’ll also skip over installation as the docs are really good - it’s quick and simple.

Like many other build systems such as Maven for Java, Pants has defined goals that you run to achieve your… goals. A key design …goal… of Pants is to be simple, and it often favours convention over configuration. Consequently - as documented on the Python Goals page - it relies on tools like pip for dependency resolution, pytest for testing, and various other community tools that you can choose from.

Before launching into the anatomy of a Pants project, here are some sample commands:

command | description
--- | ---
pants test :: | run my tests
pants package :: | make a package of my project and any dependencies it requires
pants run src/python:lib | run your code
pants lint :: | lint your code and check its formatting (if enabled)
pants tailor :: | generate build files for your code

Most of that is probably familiar or self-explanatory, but there may be a couple of things worth explaining.

  • pants is the build tool
  • test / package / lint etc. are all goals
  • the 3rd parameter is the target.

Targets allow you to identify specific things to focus a goal on (e.g. the named lib target in the src/python folder’s BUILD file). But who makes the BUILD file 😱 ? Well, the tailor (goal) does. But you can also just hand code them.

As shown above, you can also specify :: as a target in some cases, which tells Pants to run the goal against every applicable target it finds.

Anatomy of a Pants Project

Before starting, make the Pants initial configuration your reference for how to start a Pants Project; in addition, you can refer to my sample repository which also serves as an example of Pants in action.

pants.toml - configuration file

This file is required, and largely determines the behaviour of Pants. In this file, you will declare the backends you require in your project - for example, Python support. In my project, I have the following backends defined:

backend_packages.add = [
    "pants.backend.python",
    "pants.backend.awslambda.python",
    "pants.backend.python.lint.black",
    "pants.backend.python.lint.flake8"
]

These give me support for:

  1. Python language and runtime support
  2. AWS Lambda packaging
  3. Black code formatter
  4. flake8 Linting

I appreciate that Pants configuration is relatively flat; for example, I have configured some flake8 exceptions and added coverage report formats as follows:

[flake8]
args = ["--max-line-length 88", "--ignore=E501,W503"]

[coverage-py]
report = ["console","json","html"]

BUILD file generation

If we now run a command, for example pants test ::, it will return with no output. That’s because Pants requires BUILD files to know what to test.

To generate BUILD files, we run the tailor goal:

$ pants tailor ::
Created BUILD:
  - Add python_requirements target root
Created src/python/BUILD:
  - Add python_sources target python
  - Add python_tests target tests
Created src/python/support/BUILD:
  - Add python_sources target support
  - Add python_tests target tests
$

In my repository, Pants has created three BUILD files:

file | reason/purpose
--- | ---
./BUILD | declaration of python_requirements(), because Pants found requirements.txt
./src/python/BUILD | folder containing two targets: python_sources() (folder has Python source code) and python_tests(...) (test code was found, e.g. files ending _test.py)
./src/python/support/BUILD | as above
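Going by the tailor output above, the generated files would contain roughly the following. This is a sketch rather than a copy of my files: the exact arguments tailor writes (the `name` values in particular) can vary between Pants versions.

```python
# ./BUILD
# Target name as reported by tailor above; the exact spelling is an assumption.
python_requirements(name="root")

# ./src/python/BUILD (src/python/support/BUILD holds the same two targets)
python_sources()
python_tests(name="tests")
```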

If we didn’t yet have a requirements.txt - or we preferred not to have one at all - we could instead declare individual python_requirement targets in a suitable BUILD file, and Pants would take care of resolving those dependencies when necessary.
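As a minimal sketch of declaring a dependency directly in a BUILD file, without a requirements.txt: the `requests` library and its version range are hypothetical placeholders, not dependencies of my project.

```python
# BUILD
python_requirement(
    name="requests",
    # Standard pip-style requirement strings; the version range is illustrative.
    requirements=["requests>=2.31,<3"],
)
```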

You can run pants tailor at any time, not only when you start a project. This is useful and typically necessary as you add new code to your project.

Testing

At this stage, we are ready to run most of our commands. We can run:

  • pants run src/python/function.py to call our code
  • pants lint :: to lint our code and check its formatting, if we added those backends to our pants.toml configuration (pants fmt :: applies the formatting fixes)
  • pants test :: to test our code
    • pants test --use-coverage :: if we want coverage output
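For a sense of what pants test :: picks up, here is a minimal sketch of a handler and a matching pytest-style test. The handler body and the test are hypothetical; in a real layout they would live in separate files (function.py and function_test.py), with the test importing the handler. They are shown together here so the example is self-contained.

```python
# function.py (hypothetical handler body, for illustration only)
def lambda_handler(event, context):
    """Echo back the prefix to crawl, defaulting to the repository root."""
    prefix = event.get("prefix", "/")
    return {"statusCode": 200, "prefix": prefix}


# function_test.py (the _test.py suffix is what python_tests looks for)
def test_lambda_handler_defaults_to_root_prefix():
    result = lambda_handler({}, None)
    assert result == {"statusCode": 200, "prefix": "/"}


def test_lambda_handler_uses_given_prefix():
    result = lambda_handler({"prefix": "/data"}, None)
    assert result["prefix"] == "/data"
```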

Packaging

Finally - and the reason I felt Pants was necessary - let’s add our packaging target.

I want to package my lambda handler src/python/function.py and all of its requirements. So I will edit the src/python/BUILD file, adding the following target:

python_aws_lambda_function(
    name="lambda",
    runtime="python3.10",
    handler="function.py:lambda_handler",
    output_path="aemo-inventory.zip"
)

Running pants package :: will now create dist/aemo-inventory.zip. A quick unzip -l dist/aemo-inventory.zip shows that it includes not only our code, but also that of our requirements.
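The same inspection can be scripted, for example as a sanity check before deploying. The sketch below lists the members of a zip using only the standard library; `list_artifact` is a hypothetical helper, and the demo builds a stand-in archive rather than reading the real artifact, whose exact contents will differ.

```python
import tempfile
import zipfile
from pathlib import Path


def list_artifact(zip_path: Path) -> list[str]:
    """Return the sorted member names of a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(archive.namelist())


# Demo against a stand-in archive; in the real project the path would be
# dist/aemo-inventory.zip, produced by `pants package`.
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / "demo.zip"
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("function.py", "def lambda_handler(event, context): ...\n")
        archive.writestr("requests/__init__.py", "")
    members = list_artifact(demo_zip)

print(members)
```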

Top tip: Don’t call your function handler file lambda_function.py. Ask me how I know.

Pants in Action

Continuous integration to verify and validate builds, and continuous delivery to promote new artifacts into live environments, are my primary motivations for using Pants. To demonstrate that, here’s a sample of the GitHub Actions workflow to lint, test, package and then deploy the Python application:

- name: INSTALL PANTS
  run: |
      ./get-pants.sh

- name: RUN LINTER
  run: |
      pants lint ::

- name: RUN TESTS
  run: |
      pants test --use-coverage ::
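      # percent_covered is a float, so compare inside jq rather than with the integer-only -lt test
      jq -e --argjson min "${MIN_TEST_COVERAGE}" '.totals.percent_covered >= $min' dist/coverage/python/coverage.json > /dev/null || { echo "Did not meet minimum coverage requirement of ${MIN_TEST_COVERAGE}%"; exit 1; }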

- name: BUILD LAMBDA ZIP
  run: |
      pants package src/python:lambda

- uses: hashicorp/setup-terraform@v3
- name: TF INIT
  run: |
      terraform init

- name: TF PLAN
  run: |
      terraform plan -var-file=envs/ci.tfvars

- name: TF APPLY
  run: |
      terraform apply -auto-approve -var-file=envs/ci.tfvars

Some steps have been removed for brevity, but in some ways, running Pants feels very much like running Terraform. There is also a series of supporting GitHub Actions you can use if you prefer a tighter Pants integration.
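The coverage gate in the RUN TESTS step could also be written as a small Python script instead of a shell one-liner. This is a sketch under assumptions: the helper names are hypothetical, and it relies only on the `totals.percent_covered` field of the coverage JSON report that the workflow already reads.

```python
import json
import tempfile
from pathlib import Path


def coverage_percent(report_path: Path) -> float:
    """Read totals.percent_covered from a coverage JSON report."""
    report = json.loads(report_path.read_text())
    return float(report["totals"]["percent_covered"])


def meets_minimum(report_path: Path, minimum: float) -> bool:
    """True when the reported coverage is at least the required minimum."""
    return coverage_percent(report_path) >= minimum


# Demo against a stand-in report; in CI the path would be
# dist/coverage/python/coverage.json.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "coverage.json"
    sample.write_text(json.dumps({"totals": {"percent_covered": 87.5}}))
    passed = meets_minimum(sample, 80)
    failed = meets_minimum(sample, 90)

print(passed, failed)
```

In a workflow, a script like this would exit with a non-zero status when `meets_minimum` returns False, which fails the step.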

Conclusion

I have really enjoyed using Pants: it configures easily, caches predictably, and runs fast.

I highly recommend you review the sizing chart to see if it might work for you.