[Snapcraft 5.0](https://forum.snapcraft.io/t/release-notes-snapcraft-5-0/25751) implemented creating build IDs based on the project's contents instead of the directory path in https://github.com/snapcore/snapcraft/pull/3554. This is a feature we initially wanted, but it broke our workaround added in https://github.com/certbot/certbot/pull/8719. Our workaround is broken because now that the build ID is based on the project's contents, copying the project to a temporary directory has no effect.
This PR removes the workaround from https://github.com/certbot/certbot/pull/8719 and instead constructs a random build ID that it provides to snapcraft. This provides us with even more randomness to avoid build ID conflicts while avoiding having to copy the project to a temporary directory before every build.
* improve-remote-build
* use lowercase letters
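As a rough illustration of the random build ID idea, here is a minimal sketch assuming the ID is made only of lowercase letters. The helper name `make_build_id` and the 32-character length are hypothetical, and how the ID is handed to snapcraft is deliberately left out.

```python
import random
import string


def make_build_id(length=32):
    """Return a random ID of lowercase ASCII letters (hypothetical helper)."""
    # SystemRandom draws from the OS entropy source, so parallel processes
    # started at the same moment do not share a seed.
    rng = random.SystemRandom()
    return ''.join(rng.choice(string.ascii_lowercase) for _ in range(length))


build_id = make_build_id()
```

With 26 possible letters per position, two builds picking the same 32-character ID is extremely unlikely, which is the property the PR relies on.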
I think this PR improves tools/snap/build_remote.py's output in a number of ways such as:
* Logs of snap builds were being deleted because they weren't being copied out of the temporary directory added in https://github.com/certbot/certbot/pull/8719.
* The lock should now always be acquired before printing output when multiple processes are running, which helps prevent processes from mixing their output with each other.
* Output is never buffered, which ensures that repeated calls to `print` from the same process while it holds the output lock are kept together.
* The case where we printed output about the "chroot problem" and stopped retrying the build has been deleted because with the fix in https://github.com/certbot/certbot/pull/8719, we should be able to recover in this case.
* If the build failed for any reason, we dump as much output about the problem as we can. I think most times we won't need to read this output, but I personally prefer it being there in case we want it for some reason. Due to this change, I also simplified `_build_snap` and `_dump_results` a bit since `_build_snap` handles printing logs as needed.
* print more output
* lock when printing output
* clarify purpose of lock
* preserve logfiles
* python better
* consistently flush output
* remove workspaces dict
* rename variable
* remove unused variable
* don't use all which exits early
* fix typo
* Upgrade cryptography to 3.4.6
* Fix comment with instructions for how to use hashin
* run tools/rebuild_certbot_constraints.py
* add deps for building cryptography in snaps
* Update cryptography build dependencies for docker
* Update sources for test farm tests
* Remove rust if it's installed for test farm tests
* source bootstrap script and call sudo as needed
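Going back to the output handling in `tools/snap/build_remote.py` described above, the idea of locking and flushing can be sketched as follows. This is only an illustration: `print_block` is a hypothetical helper, and threads with an in-memory stream stand in for the separate processes and real terminal used by the script.

```python
import io
import threading


def print_block(lock, lines, stream):
    """Write all lines while holding the lock, flushing each one."""
    with lock:
        for line in lines:
            # flush=True avoids buffered text surfacing after the lock
            # has already been released.
            print(line, file=stream, flush=True)


stream = io.StringIO()
lock = threading.Lock()
workers = [
    threading.Thread(
        target=print_block,
        args=(lock, [f'{arch}: line {i}' for i in range(3)], stream))
    for arch in ('amd64', 'arm64', 'armhf')
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
output_lines = stream.getvalue().splitlines()
```

Each worker's three lines end up next to each other in `output_lines`, whatever order the workers run in.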
We recently observed several unexpected behaviors during the execution of snap jobs in Azure. In particular, `snapcraft remote-build` tends to reattach to the latest builds on Launchpad triggered by the nightly builds on master, regardless of the actual branch, the state of the code, or the targeted architectures.
First, if the builds on Launchpad are stalled for some reason, this effectively blocks every other Azure snap job until someone manually cancels the builds on Launchpad. Second, the outcome of the builds may be inconsistent, because they can be the result of a build of the master sources even if you are on a PR that modifies these sources (including `snapcraft.yaml`).
After digging into the `snapcraft` source code, I realized that the signature computed to decide whether a build should be resumed is not based on a hash of the snapcraft working directory's content, but is simply a hash of the working directory's absolute path *itself*. This means that all builds triggered from the working directory `/my/path/certbot`, for instance, are recognized as the same build on the Launchpad side and may be resumed if they already exist, regardless of the source code, `snapcraft.yaml`, or targeted architectures.
For the record, relevant parts in `snapcraft` source code:
* 82024d3748/snapcraft/project/_project.py (L44)
* 82024d3748/snapcraft/project/_project.py (L86-L89)
* 82024d3748/snapcraft/cli/remote.py (L128-L132)
This PR effectively makes the resume-build mechanism a no-op by first moving the source code into a temporary directory with a random name before running `snapcraft remote-build`. This way the signature is never the same, and builds are always recognized as brand new.
* Invalidate snapcraft remote-build cache by using a temporary workspace.
* Capture one more state in the build
* Kill snapcraft build when a "Chroot problem" is encountered
* Display specific helper for "Chroot problem" status and cancel retry mechanism in this case.
* Isolate build tmp directories
* Configure XDG_CACHE_HOME
* Kill snapcraftctl when chroot problem is encountered
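To illustrate why a path-based signature collides and why a temporary workspace avoids it, here is a small sketch. It assumes a SHA-1 digest of the path purely for demonstration; the hash function snapcraft actually uses is not confirmed here, and `path_signature` is a hypothetical stand-in.

```python
import hashlib
import tempfile


def path_signature(project_dir):
    """Hash only the absolute path, ignoring contents (illustrative)."""
    return hashlib.sha1(project_dir.encode()).hexdigest()


# Two builds from the same checkout path share a signature, whatever the
# branch or snapcraft.yaml contents are.
same_a = path_signature('/my/path/certbot')
same_b = path_signature('/my/path/certbot')

# Building from freshly created temporary directories gives a new path,
# hence a new signature, each time.
with tempfile.TemporaryDirectory() as workspace_a, \
        tempfile.TemporaryDirectory() as workspace_b:
    fresh_a = path_signature(workspace_a)
    fresh_b = path_signature(workspace_b)
```

Here `same_a` and `same_b` are identical, while `fresh_a` and `fresh_b` differ because the two temporary directories have different names.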
This PR adds a `--timeout` flag to `tools/snap/build_remote.py` so that the process fails if its execution time reaches the provided timeout. It is set to 5h30 on the relevant Azure job, while the job itself has a 6h timeout managed on the Azure side. This gives slightly better output for these jobs when the snapcraft build stalls for any reason.
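The general idea of failing before the outer job limit is hit can be sketched like this. It is not the script's implementation: `run_with_timeout` is a hypothetical helper built on `subprocess.run`, and a sleeping child process stands in for a stalled build.

```python
import subprocess
import sys


def run_with_timeout(command, timeout_seconds):
    """Run command; return its exit code, or None if the timeout is hit."""
    try:
        return subprocess.run(command, timeout=timeout_seconds).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return None


# 5h30 expressed in seconds, leaving 30 minutes before a 6h job limit.
AZURE_TIMEOUT_SECONDS = 5 * 3600 + 30 * 60

# Stand-in for a stalled build: a child that sleeps longer than the timeout.
stalled = [sys.executable, '-c', 'import time; time.sleep(30)']
result = run_with_timeout(stalled, timeout_seconds=1)
```

Because the inner timeout fires first, the script still has time to print what it knows before the CI job is killed from the outside.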
While reviewing https://github.com/certbot/certbot/pull/8404, it occurred to me that we're keeping both the generated files and the script used to generate them in `git`. Keeping both around seems unnecessary and is almost asking for the files to get out of sync at some point in the future. I fixed that by removing the files, adding them to `.gitignore`, and updating `build_remote.py` to generate them as needed.
* Remove generated files.
* Add generated files to gitignore.
* Reuse generate_dnsplugins_all.sh in build_remote
Fixes #8409.
Change the line in the README to allow `sudo /snap/bin/lxd.migrate -yes` to fail (for example, if there's nothing to migrate), but the whole command to succeed.
I tested this on a clean Focal install and confirmed it works.
This PR adds the following documentation improvements to fix https://github.com/certbot/certbot/issues/7958:
- Simplify building external plugins
- Separate out certbot snap instructions from plugin instructions
- Mention that dnsimple is just an example for the plugin instructions
- Mention remote build for other architectures
- Mention snap doc exists elsewhere in developer guide (`contributing.rst`)
* Set up generate_dnsplugins_all.sh for all files and parametrize snapcraft and postrefreshhook files
* Create constraints file in the generate_dnsplugins_all script
* Separate out plugin and certbot snaps and update instructions
* Add remote build instructions
* Add pointers to the README to contributing.rst
With more and more of our wildcard instructions on https://certbot.eff.org telling people to use these plugins, I think we should get ready to move our DNS plugins to the stable channel. This PR removes grade: devel so the snap store doesn't prevent us from doing that when we want to. See #8128 where we did this to the Certbot snap for more info.
You can see the snap tests passing with this change at https://dev.azure.com/certbot/certbot/_build/results?buildId=2797&view=results.
This reverts commit feca125437.
Since this change landed, ARM builds for many of the DNS plugins have failed every night. See https://dev.azure.com/certbot/certbot/_build?definitionId=5 or our public Mattermost channel.
I quickly tried to fix this myself and wasn't trivially able to do so. I tried setting `SNAPCRAFT_PYTHON_VENV_ARGS: --system-site-packages` and adding `python3-wheel` as a build dependency, but it didn't work for some reason. The `python3-wheel` package didn't seem to be installed.
I still suspect something like this is the approach we should take, however, I want to fix the failing tests now so things are no longer broken in `master` and those of us on the Certbot team at EFF stop getting spammed with 54 (!!) emails about failed builds from launchpad every night.
Unfortunately, while I was working on this the queue for ARM machines on Launchpad jumped up to an estimated ~20 hour wait, but I confirmed that this fixes the problem by building on an ARM AMI using the instructions at https://github.com/certbot/certbot/blob/master/tools/snap/README.md#use-testing-and-development. If whoever reviews this would like an ARM machine to test on themselves, please let me know.
Fixes #8169
This PR improves snaps remote builds script by dumping the output of `snapcraft remote-build` when unexpected behavior is detected:
* when all builds for a project finish with a zero status code, and none of them are marked as failed, we expect to have all the associated snap files available locally.
* when some builds are marked as failed, we expect to have a build output for each of them available locally.
In these two situations, if the expectations are not met, the script displays the output of `snapcraft remote-build` itself. I also added an error check to gracefully handle the absence of an expected build output on the local machine.
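The two expectations above can be written down as a small predicate. This is only a sketch covering those two cases: the function name and every parameter are hypothetical, with architectures represented as sets of strings.

```python
def should_dump_output(exit_code, archs, failed_archs, snaps_found, logs_found):
    """Return True when local files do not match what the build reported.

    archs is the set of targeted architectures, failed_archs those marked
    as failed, snaps_found and logs_found the architectures for which a
    snap file or a build log exists locally.
    """
    if failed_archs:
        # Every failed architecture should have left a build log behind.
        return not failed_archs <= logs_found
    if exit_code == 0:
        # A fully successful run should leave one snap per architecture.
        return not archs <= snaps_found
    # Other situations are outside the two cases described here.
    return False
```

For example, a zero exit code with no failures but a missing `arm64` snap returns `True`, meaning the raw `snapcraft remote-build` output is worth printing.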
* Improve log dump in snaps remote builds when an unexpected behavior is detected
* Use the manager
* Update tools/snap/build_remote.py
Co-authored-by: Brad Warren <bmw@users.noreply.github.com>
Fixes #7863.
Connect command is `sudo snap connect certbot-dns-dnsimple:certbot-metadata certbot:certbot-metadata`
Logs are `cat /var/snap/certbot-dns-dnsimple/current/debuglog`
Echoes in the hook are only printed to the terminal when it exits 0; otherwise, check the logs in `debuglog` mentioned above.
Manual tests include all iterations of connected, unconnected, installed for the first, second time, etc, with passing and failing version checks.
* Make dnsimple not update if certbot is too old
* create an interface to read cb version
* add missing newline
* fix syntax
* trying to figure out the consumer syntax
* trying to figure out the consumer syntax, again
* only check post first install
* valid setting name
* test for first install differently
* snapctl doesn't error if it fails I guess
* time to do some print debugging
* continue playing with syntax
* once again, fooled by bash int vs string comparisons!
* debugging
* if we use post and pre together we can do this
* is this how content interface syntax works
* it's a directory?
* more debug
* what's that error message again?
* try other syntax
* if it's not documented just guess at syntax
* actually, I think this is the syntax
* oops didn't set for new hook
* test passing information along connection
* interface attributes can only be set during the execution of prepare hooks
* just do it with main connection
* undo last few test changes
* Add some printing to make sure we understand what's going on
* create empty directory to bind to
* put mkdir in the correct part
* let's inspect the environment
* it can't run bash directly.
* perhaps only directories can be shared via the content interface
* update name of folder
* echo to debug log to understand what's going on exactly. we have file access though!
* update grep for new file
* more printing
* echo to the debug log
* ok NOW all print statements are going to the log
* why does echo need two >s
* remove unnecessary extra check, just check if the init file is available
* check if certbot version will be available post-refresh after all
* pre-refresh hook is not necessary to get certbot version
* update mkdir so we don't have to clean each time
* try comparing version numbers in python
* it's python3
* we need different prints for if we succeed or if we fail.
* improve bash syntax
* remove some debugging code
* Remove debug script
* remove spaces for clarity
* consolidate parts and remove more test code
* s/certbot-version/certbot-metadata/g
* use sys.exit instead of exit
* find and save certbot version on the certbot side
* change presence test to new file
* switch to using packaging.version.parse instead of LooseVersion
* switch to requiring certbot version >= plugin version
* add plugin snap changes to generate script
* Add comment to generation file saying not to edit generated files manually
* Create post-refresh hook for all plugins with script
* generate files using new script
* update snapcraft.yaml files for plugins
* bin/sh comes first
* Add packaging to install_requires
* Check that refresh is allowed in integration test
* switch plug and slot names in integration test
* Update tools/generate_dnsplugins_postrefreshhook.sh
Co-authored-by: Brad Warren <bmw@users.noreply.github.com>
* small bash fixes
* Update snap readme with new instructions
* Run tools/generate_dnsplugins_postrefreshhook.sh
* Update tools/snap/generate_dnsplugins_postrefreshhook.sh
Co-authored-by: Brad Warren <bmw@users.noreply.github.com>
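The commits above mention being fooled by string versus integer comparisons and settling on `packaging.version.parse` with a "certbot version >= plugin version" rule. The sketch below shows the same idea with a standard-library-only stand-in that handles dotted integers only; `parse_numeric_version` and `refresh_allowed` are hypothetical names, not the hook's actual code.

```python
def parse_numeric_version(version):
    """Turn '1.10.0' into (1, 10, 0); handles dotted integers only."""
    return tuple(int(part) for part in version.split('.'))


def refresh_allowed(certbot_version, plugin_version):
    """Allow the plugin refresh only if certbot is at least as new."""
    return (parse_numeric_version(certbot_version)
            >= parse_numeric_version(plugin_version))


# Plain string comparison gets multi-digit components wrong: it claims
# that 1.10.0 is older than 1.9.0.
string_compare_is_wrong = '1.10.0' < '1.9.0'
```

A real version parser is still preferable, since this stand-in does not handle suffixes such as `.dev0`.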
Snapcraft has a feature named `remote-build`. It allows snaps to be compiled for several architectures using Canonical's dedicated build infrastructure. Compared to the QEMU-enabled Docker approach currently used, remote builds have several advantages:
* the builds run on the native architecture, making them faster than what can be achieved with QEMU
* it removes the dependency on `adferrand/snapcraft` (which could otherwise be removed by merging https://github.com/snapcore/snapcraft/pull/3144, but this will not happen in the short term)
* when everything goes well, all snap builds can run in parallel and be orchestrated by a single Azure Pipeline job, since the heavy tasks are done remotely.
This PR makes the necessary adjustments to use the remote build feature instead of the QEMU-enabled Docker approach.
One complex task was compiling the `certbot` snap on `arm64` and `armhf`. On these architectures the pre-compiled wheel for `cffi` is not available, so it needs to be compiled during the snap build. Sadly, the current version of the python plugin in snapcraft is limited by the fact that `wheel` is not installed in the virtual environment set up to build the Python packages, and there is no easy way to change that except by overriding the whole build process.
In the long term, I think I will open a PR on the `snapcraft` Git repository to provide a consistent solution. For the short term, I used the ability to pass arguments to the `venv` module to add the `--system-site-packages` flag. With it, the virtual environment can use the system site packages, where `wheel` is available.
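To show what `--system-site-packages` changes, here is a small sketch using the standard library `venv` module directly. It does not show how snapcraft is configured; it only creates a throwaway environment in a temporary directory and inspects the resulting `pyvenv.cfg`.

```python
import os
import pathlib
import tempfile
import venv

with tempfile.TemporaryDirectory() as tmp:
    env_dir = pathlib.Path(tmp) / 'venv'
    # Roughly `python3 -m venv --system-site-packages --without-pip`.
    builder = venv.EnvBuilder(
        system_site_packages=True,
        with_pip=False,
        symlinks=(os.name != 'nt'),
    )
    builder.create(env_dir)
    config = (env_dir / 'pyvenv.cfg').read_text()

# The flag is recorded in pyvenv.cfg, which is what lets the environment
# see packages such as wheel installed system-wide.
system_packages_visible = 'include-system-site-packages = true' in config
```

Without the flag the same line reads `false`, and system-wide packages are hidden from the virtual environment.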
The other significant additions are in the `tools/snap/build_remote.py` script. While invoking the remote build on a local machine is quite straightforward, it is another story on the CI because we need build auditability and resiliency during these non-interactive actions. In particular, we should avoid inconsistent results on the nightly and release pipelines as much as possible.
So this script wraps the `snapcraft` call in retry logic and improves its logs in the context of parallel builds.
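The retry idea can be sketched as below. This is an assumption-laden illustration rather than the script's code: `run_with_retries`, its `build` callable, and the default of three attempts are all hypothetical.

```python
def run_with_retries(build, max_attempts=3):
    """Call build() until it returns 0 or attempts run out.

    build is any zero-argument callable returning an exit code.
    """
    exit_code = None
    for attempt in range(1, max_attempts + 1):
        exit_code = build()
        if exit_code == 0:
            break
        print(f'Build attempt {attempt}/{max_attempts} failed '
              f'with exit code {exit_code}.', flush=True)
    return exit_code


# Stand-in build that fails twice before succeeding.
outcomes = iter([1, 1, 0])
final_code = run_with_retries(lambda: next(outcomes))
```

Returning the last exit code lets the caller decide whether the whole job should fail once the attempts are exhausted.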
The minor modifications mainly ensure that plugins can be built (some of them also need `cffi`, for instance) and simplify the Azure Pipeline, since all snaps are retrieved in one go.
Please note that the `test-` branches still run only the `amd64` architecture. I noticed that builds on `arm64` and `armhf` tend to be very slow to start (up to 40 minutes), while the `amd64` ones wait at most 10 minutes, and usually only 30 seconds when the overall load on Canonical's side is low.
To work on the `certbot/certbot` repository, one secured file needs to be added, because `snapcraft` needs to be authenticated against Launchpad with credentials allowing remote builds. To do so, from a local machine that has this capability, one can extract the existing file at `$HOME/.local/share/snapcraft/provider/launchpad/credentials` and register it as a secured file in Azure Pipeline with the name `snapcraftRemoteBuildCredentials`.
* Define scripts
* Setup pipeline to use remote builds
* Focus on packaging builds
* Set credentials
* Setup git
* Launch all builds in parallel
* Add dev dependencies to build cffi and cryptography
* Convert to a python logic
* Reorganize the pipeline
* Handle the fact that snap builds may be taken from cache
* Generate constraints
* Exit code
* Check existence
* Try to handle better non zero exit code
* Add --system-site-packages to get wheel in the venv
* Add executable permissions
* Troubleshoot
* Dynamic display, take the maximum timeout for snap build job
* Allow retries if the remote build does not start
* Trigger only amd64 builds for test branches
* Exit properly
* Update snapcraft.yaml
* Fix snap run
* Set secured file name
* Update .azure-pipelines/templates/jobs/packaging-jobs.yml
Co-authored-by: Brad Warren <bmw@users.noreply.github.com>
* Update .azure-pipelines/templates/jobs/packaging-jobs.yml
Co-authored-by: Brad Warren <bmw@users.noreply.github.com>
* Update .azure-pipelines/templates/jobs/packaging-jobs.yml
Co-authored-by: Brad Warren <bmw@users.noreply.github.com>
* Move order in deps
* Reactivate all builds
* Use Manager() as a context manager
* Use Pool as a context manager
* Some nice refactorings
* Check snapcraft execution interruption with exit codes
* Use f-string and format expressions
* Start log
* Consistent use of single/double quotes
* Better loop to extract lines
* Retry on build failures
* Few optimizations
Co-authored-by: Brad Warren <bmw@users.noreply.github.com>
Fixes #8041
This PR makes Azure Pipeline build the DNS plugins snaps for the 3 architectures during the CI.
It leverages the existing logic for building the Certbot snap in order to deploy a QEMU environment with Docker, and leverages the local PyPI index to speed up the build when installing `cffi` and `cryptography`.
All DNS plugin snaps are built in a single Docker container in order to save the time required to install the system dependencies when `snapcraft` first starts, which significantly speeds up the build.
In the end, all `amd64` DNS plugin snaps are built within 6 minutes. For `arm64` and `armhf`, it takes around 40 minutes, which is actually quite fast considering that 14 DNS plugin snaps are built.
However, building for all 3 architectures is still an extremely heavy task, even for Azure Pipelines and its 10 parallel jobs. That is why I made the `arm64` and `armhf` builds be skipped for the `full-test-suite` and run only for `nightly` and `release`. This means, however, that these builds will not be done for the release branches. If this is a problem, I can add a more elaborate suspend condition to trigger the builds in that case.
All snaps are stored in the pipeline artifacts storage, making them available for publication during a `release` pipeline.
The PR is set as Draft for now because I am temporarily using `pr_test-suite` to validate the packaging jobs when commits are pushed. Once the PR is ready, I will revert to the normal configuration (running the standard tests).
* Configure a script to build DNS snaps
* Focus on packaging
* Trigger all architectures
* Add extra index
* Prepare conditional suspend
* Set final suspend logic
* Set final suspend value
* Loop for publication
* Use python3
* Clean before build
* Add a test
* Add test job in Azure
* Preserve env
* Apply normal config for pipelines
* Skip QEMU jobs only for test branches
* Make snap run tests also depend on the Certbot snap build
* Update .azure-pipelines/templates/jobs/packaging-jobs.yml
Co-authored-by: Brad Warren <bmw@users.noreply.github.com>
* Update .azure-pipelines/templates/stages/deploy-stage.yml
Co-authored-by: Brad Warren <bmw@users.noreply.github.com>
* More accurate way to get the plugin snap name
* Integrate DNS snap tests into certbot-ci
* Fixes
* Update certbot-ci/snap_integration_tests/conftest.py
Co-authored-by: Brad Warren <bmw@users.noreply.github.com>
* Update certbot-ci/snap_integration_tests/conftest.py
Co-authored-by: Brad Warren <bmw@users.noreply.github.com>
* Clean an `__init__.py` file
Co-authored-by: Brad Warren <bmw@users.noreply.github.com>