Using Pants to Package Your Python Lambda
by Paul Symons
Contents
- Summary
- Systemising the Python application
- The fit for Pants
- How I use Pants
- Anatomy of a Pants Project
- Pants in Action
- Conclusion
Summary
I’ve recently been working on a system to catalogue some of the dataset files available on the AEMO WEM Market Data website, and in doing so, ended up writing a small Python application to crawl the webftp-like data file repository. The application gathers file metadata - used to trace changes over time - to ask:
- what kind of data exists here?
- when is data added, updated, and removed?
- what is the rate of change of different data models?
Whilst building a command line application was useful, the ultimate goal was for it to run in an automated way, and for its output data to be collected, processed, then used to tell a story.
Systemising the Python application
Gee, that sounds fancy. I just wanted to be able to:
- Lint and Test the application
- Manage the dependencies of the application (3rd party libraries that are not part of the Python runtime)
- Package together all the necessary bits of the application, in order to deploy it somewhere
- Deploy the package to a system where it can be run on a schedule, and its output data be captured
I was 18 years into my career before I used Python, so I consider myself a latecomer. But here are my humble observations:
Advantages | Disadvantages |
---|---|
Immeasurable strategies, libraries and utilities to get the job done | Immeasurable strategies, libraries and utilities to get the job done |
It can be confusing for a newcomer to chart a way through this process, especially with Python’s complex and often messy history. But Pants can definitely help.
The fit for Pants
Pants is a multi-language build system that I hadn’t heard of until I started this project. After doing some research, I discovered it had its origins with author Benjy Weinberger as he variously worked at Google, Twitter and latterly, Foursquare. Originally itself a Python application, its core engine is now written in Rust.
I’ve typically found packaging and deploying AWS Lambda Functions that depend on other Python libraries somewhat gross. It’s not that there aren’t good options out there: AWS CDK and AWS SAM are both effective at doing essentially the same thing, and given both are made by AWS, you may wonder why you’d look beyond the numerous competing methods offered by the cloud service provider.
Notably, the playing field is uneven. Tools variously focus on:
- packaging
- deployment
- both packaging and deployment
Pants does not deploy; it is used to test, build and package. It therefore makes a good companion to Terraform.
Pairing with Terraform
Ultimately, many teams work with tools like Terraform to standardise Infrastructure-as-Code (IaC) delivery to platforms and capabilities beyond just the cloud service provider. Or because they don’t have a choice.
The Terraform AWS Provider does not natively support packaging of Python projects into lambda artifacts, and as such there are many workarounds employed by teams to achieve this goal, such as using null_resource resources and provisioner blocks, or innovative community modules such as AWS Lambda Terraform Module.
As an alternative to these methods, I’m going to use Pants to test and package a Lambda Function zip artifact, and have Terraform refer to that artifact.
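When Terraform refers to a zip artifact, a common way to let it notice that the contents changed is to compare a hash of the file. As a rough sketch (the function names digest_b64 and artifact_hash are my own, not part of Pants or Terraform), this computes the base64-encoded SHA-256 digest of a file, which is the encoding Terraform's filebase64sha256 function produces:

```python
import base64
import hashlib
from pathlib import Path


def digest_b64(data: bytes) -> str:
    """Return the base64-encoded SHA-256 digest of some bytes."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


def artifact_hash(zip_path: str) -> str:
    """Hash a packaged artifact on disk (hypothetical helper for illustration)."""
    return digest_b64(Path(zip_path).read_bytes())
```

Calling artifact_hash on the zip that Pants produces gives a value that only changes when the package contents change.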
I like the “Pants included” approach, because it co-ordinates the various Python tools I’d otherwise use anyway, and makes it simple for me to configure them. I also get a consistent interface to my build process, whether I’m using Python or another supported language.
How I use Pants
I am going to focus on a few commands only. Pants can do a lot more, but I’ll leave that for you to explore. I’ll also skip over installation as the docs are really good - it’s quick and simple.
Like many other build systems such as Maven for Java, Pants has defined goals that you run to achieve your… goals. A key design …goal… of Pants is to be simple, and it often favours convention over configuration. Consequently - as documented on the Python Goals page - it relies on tools like pip for dependency resolution, pytest for testing, and various other community tools that you can choose from.
Before launching into the anatomy of a Pants project, here are some sample commands:
command | description |
---|---|
pants test :: | run my tests |
pants package :: | make a package of my project and any dependencies it requires |
pants run src/python:lib | run your code |
pants lint :: | lint your code and check its formatting (if enabled) |
pants tailor :: | generate build files for your code |
Most of that is probably familiar or self-explanatory, but there may be a couple of things worth explaining.
- pants is the build tool
- test / package / lint etc. are all goals
- the 3rd parameter is the target.
Targets allow you to identify specific things to focus a goal on (e.g. the named lib target in the src/python folder’s BUILD file).
But who makes the BUILD file 😱 ? Well, the tailor (goal) does. But you can also just hand code them.
As shown above, you can also specify :: as a target in some cases, which means “apply this goal to every applicable target found”.
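To make the shape of a target concrete, here is a minimal sketch of how a spec such as src/python:lib or :: breaks down into a folder, a target name and a "everything below here" flag. This is purely illustrative: TargetSpec and parse_spec are names I made up, and Pants' real address parsing handles more forms than this.

```python
from typing import NamedTuple


class TargetSpec(NamedTuple):
    directory: str   # folder holding the BUILD file ("" is the repository root)
    name: str        # named target in that BUILD file ("" when recursive)
    recursive: bool  # True when the spec ends with the "::" glob


def parse_spec(spec: str) -> TargetSpec:
    """Split a spec such as "src/python:lib" or "::" into its parts."""
    if spec.endswith("::"):
        # Everything before "::" is the folder to search from.
        return TargetSpec(spec[:-2].rstrip("/"), "", True)
    directory, _, name = spec.partition(":")
    return TargetSpec(directory, name, False)
```

So src/python:lib points at one named target, while :: on its own covers every target from the repository root down.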
Anatomy of a Pants Project
Before starting, make the Pants initial configuration your reference for how to start a Pants Project; in addition, you can refer to my sample repository which also serves as an example of Pants in action.
pants.toml - configuration file
This file is required, and largely determines the behaviour of Pants. In this file, you will declare the backends you require in your project - for example, Python support. In my project, I have the following backends defined:
[GLOBAL]
backend_packages.add = [
  "pants.backend.python",
  "pants.backend.awslambda.python",
  "pants.backend.python.lint.black",
  "pants.backend.python.lint.flake8"
]
These give me support for:
- Python language and runtime support
- AWS Lambda packaging
- Black code formatter
- flake8 Linting
I have appreciated that Pants configuration is relatively flat hierarchically; for example, I have configured some flake8 exceptions, and added pytest coverage output options as follows:
[flake8]
args = ["--max-line-length 88", "--ignore=E501,W503"]
[coverage-py]
report = ["console","json","html"]
BUILD file generation
If we now run a command, for example pants test ::, it will return with no output. That’s because Pants requires BUILD files to know what to test.
To generate BUILD files, we run the tailor goal:
$ pants tailor ::
Created BUILD:
- Add python_requirements target root
Created src/python/BUILD:
- Add python_sources target python
- Add python_tests target tests
Created src/python/support/BUILD:
- Add python_sources target support
- Add python_tests target tests
$
In my repository, Pants has created three BUILD files:
file | reason/purpose |
---|---|
./BUILD | declaration of python_requirements(), because Pants found requirements.txt |
./src/python/BUILD | folder containing two targets: python_sources() (folder has Python source code) and python_tests(...) (test code was found, e.g. files ending _test.py) |
./src/python/support/BUILD | as above |
If we didn’t yet have a requirements.txt - or we preferred not to have one at all - we could simply add a python_requirement target to any BUILD file where the dependency would be required, and Pants would take care of resolving that dependency when necessary.
You can run pants tailor :: at any time, not only when you start a project. This is useful, and typically necessary, as you add new code to your project.
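As a rough picture of the decision tailor makes in each folder, the sketch below sorts file names into the two target types seen in the output above. It is an approximation, not Pants' implementation: the patterns are my assumption of the python_tests defaults, the classify function is hypothetical, and special cases such as conftest.py are ignored.

```python
from fnmatch import fnmatch

# Assumption: these mirror the default file patterns of the python_tests target.
TEST_PATTERNS = ("test_*.py", "*_test.py", "tests.py")


def classify(filenames):
    """Group a folder's Python files by the target type that would own them."""
    targets = {"python_sources": [], "python_tests": []}
    for name in sorted(filenames):
        if not name.endswith(".py"):
            continue  # non-Python files are not owned by either target
        is_test = any(fnmatch(name, pattern) for pattern in TEST_PATTERNS)
        targets["python_tests" if is_test else "python_sources"].append(name)
    # Only report target types that actually own files, as tailor's output does.
    return {kind: files for kind, files in targets.items() if files}
```

A folder with only source files would get just a python_sources target, which is why re-running tailor after adding your first test file creates something new.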
Testing
At this stage, we are ready to run most of our commands. We can run:
- pants run src/python/function.py to call our code
- pants lint :: to lint our code and check its formatting, if we added those backends to our pants.toml configuration
- pants test :: to test our code
- pants test --use-coverage :: if we want coverage output
Packaging
Finally - and the reason why I felt Pants necessary - let’s add our packaging target.
I want to package my lambda handler src/python/function.py and all of its requirements.
So I will edit the src/python/BUILD file, adding the following target:
python_aws_lambda_function(
name="lambda",
runtime="python3.10",
handler="function.py:lambda_handler",
output_path="aemo-inventory.zip"
)
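The handler value in that target names a file and a function inside it. For reference, a bare-bones function.py that would satisfy it might look like the sketch below; the body is a placeholder I've invented for illustration, not the real crawler.

```python
def lambda_handler(event, context):
    """Entry point named by handler="function.py:lambda_handler" in the BUILD target."""
    # Placeholder logic: the real application would crawl and catalogue files here.
    return {"status": "ok", "received_keys": sorted(event)}
```

Lambda calls the function with the triggering event and a context object, so the same two-argument signature works whether it runs on a schedule or is invoked by hand.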
Running pants package :: will now create dist/aemo-inventory.zip, and if we do a quick unzip -l dist/aemo-inventory.zip, we will see it includes not only our code, but also that of our requirements.
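If you would rather check the package contents from Python (in a test, say), the standard zipfile module can do the same job. A minimal sketch, where top_level_entries is a hypothetical helper of mine:

```python
import zipfile


def top_level_entries(zip_path):
    """Return the sorted top-level names inside a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        # Keep only the first path component, so each package is listed once.
        return sorted({name.split("/", 1)[0] for name in archive.namelist()})
```

Pointed at dist/aemo-inventory.zip, this should list the handler module alongside one entry per bundled dependency.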
Top tip: Don’t call your function handler file lambda_function.py. Ask me how I know.
Pants in Action
Continuous integration to verify and validate builds, and continuous delivery to promote new artifacts into live environments, is my primary motivation for using Pants. To demonstrate that, here’s a sample of the GitHub Actions workflow to lint, test, package and then deploy the Python application:
- name: INSTALL PANTS
  run: |
    ./get-pants.sh
- name: RUN LINTER
  run: |
    pants lint ::
- name: RUN TESTS
  run: |
    pants test --use-coverage ::
    # percent_covered is a float, so round it down before the integer comparison
    COVERAGE=$(jq '.totals.percent_covered | floor' dist/coverage/python/coverage.json)
    if [ "${COVERAGE}" -lt "${MIN_TEST_COVERAGE}" ]; then echo "Did not meet minimum coverage requirement of ${MIN_TEST_COVERAGE}%"; exit 1; fi
- name: BUILD LAMBDA ZIP
  run: |
    pants package src/python:lambda
- uses: hashicorp/setup-terraform@v3
- name: TF INIT
  run: |
    terraform init
- name: TF PLAN
  run: |
    terraform plan -var-file=envs/ci.tfvars
- name: TF APPLY
  run: |
    terraform apply -auto-approve -var-file=envs/ci.tfvars
Some steps have been removed for brevity, but in some ways, running Pants feels very much like running Terraform. There is also a series of supporting GitHub Actions you can use if you prefer a tighter Pants integration.
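The coverage gate in the RUN TESTS step could equally be written in Python rather than shell. As a sketch under the assumption that the report is the coverage.py JSON output at dist/coverage/python/coverage.json (the function names here are hypothetical):

```python
import json


def meets_minimum(report: dict, minimum: float) -> bool:
    """Check the total percentage in a parsed coverage.py JSON report."""
    return report["totals"]["percent_covered"] >= minimum


def check_report_file(report_path: str, minimum: float) -> bool:
    """Load a coverage.py JSON report from disk and apply the same check."""
    with open(report_path, encoding="utf-8") as handle:
        return meets_minimum(json.load(handle), minimum)
```

Because the comparison is done on floats, there is no need to round the percentage first; a wrapper script would simply exit non-zero when check_report_file returns False.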
Conclusion
I have really enjoyed using Pants: it configures easily, caches predictably, and runs fast.
I highly recommend you review the sizing chart to see if it might work for you.