If you execute `tests/lock_test.py` or `tox -e integration` on a fairly recent machine, you will get the following error during tests executing against a live Nginx instance:
```
no "ssl_certificate" is defined in server listening on SSL port while SSL handshaking, client: x.x.x.x, server: y:y:y:y:z
```
Indeed, having no defined ssl certificate for a ssl port would inevitably lead to an error during the handshake SSL process between a client and this mis-configured nginx instance.
However it was not a problem one year before, because the handshake was not occurring in practice: the test just need to have a nginx started, and then immediately proceed to modify the configuration with a correct SSL setup. And nginx was able to start with a mis-configuration on SSL.
But then this fix has been done: https://trac.nginx.org/nginx/ticket/178
Basically with this, validation of the configuration is done during nginx startup, that will refuse to start with invalid configuration on SSL. Consequently, all related tests are failing with a sufficiently up-to-date nginx. For now, it is not seen on Travis because Ubuntu Trusty is used, with an old Nginx.
The PR fixes that, by generating on the fly self-signed certificates in the two impacted tests, and pushing the right parameters in the Nginx configuration.
* Fix nginx configuration with self-signed certificates generated on the fly
* Fix lint/mypy
* Fix old cryptography
* Unattended openssl
* Update lock_test.py
* First part
* Several optimizations about the docker env setup
* Documentation
* Various corrections and documentation. Add acme and certbot explicitly as dependencies of certbot-ci.
* Correct a variable misinterpreted as a pytest hook
* Correct strict parsing option on pebble
* Refactor acme setup to be executed from pytest hooks.
* Pass TRAVIS env variable to trigger specific xdist logic
* Retrigger build.
* Work in progress
* Config operational
* Propagate to xdist
* Corrections on acme and misc
* Correct subnet for pebble
* Remove gobetween, as tls-sni challenges are not tested anymore.
* Improve pebble setup. Reduce LOC.
* Update acme.py
* Optimize acme ca setup, with less temporary assets
* Silent setup
* Clean code
* Remove unused workspace
* Use default network driver
* Remove bridge
* Update package documentation
* Remove rerun capability for integration tests, not needed.
* Add documentation
* Variable for all ports and subnets used by the stack
* Update certbot-ci/certbot_integration_tests/conftest.py
Co-Authored-By: adferrand <adferrand@users.noreply.github.com>
* Update certbot-ci/certbot_integration_tests/utils/acme.py
Co-Authored-By: adferrand <adferrand@users.noreply.github.com>
* Update certbot-ci/certbot_integration_tests/utils/misc.py
Co-Authored-By: adferrand <adferrand@users.noreply.github.com>
* Update tox.ini
Co-Authored-By: adferrand <adferrand@users.noreply.github.com>
* Update certbot-ci/certbot_integration_tests/utils/misc.py
Co-Authored-By: adferrand <adferrand@users.noreply.github.com>
* Update certbot-ci/certbot_integration_tests/utils/acme.py
Co-Authored-By: adferrand <adferrand@users.noreply.github.com>
* Update certbot-ci/certbot_integration_tests/utils/acme.py
Co-Authored-By: adferrand <adferrand@users.noreply.github.com>
* Update certbot-ci/certbot_integration_tests/conftest.py
Co-Authored-By: adferrand <adferrand@users.noreply.github.com>
* Rename to acme_server
* Add comment
* Refactor in a unique context fixture
* Remove the need of CERTBOT_ACME_XDIST environment variable
* Remove nonstrict/strict options in pebble
* Clean dependencies
* Clean tox
* Change function name
* Add comment about coveragerc specificities
* Change a comment.
* Update setup.py
* Update conftest.py
* Use the production-ready docker-compose.yml file for Pebble
* New style class
* Tune pebble to have a stable test environment
* Pin a dependency
The rerun capability in a test campaign can be a nice feature. It can also a bad design.
It is right that with high level tests, like performance or end-to-end tests, a given test runtime can depend on an external component, outside the scope of the developers, that would spuriously fail. In theses situations, having the capacity to rerun several time a test can be a great benefit. Indeed, as these tests are inherently flaky, the rerun greatly reduces their failure because of external reasons, reducing also the Pierre et le Loup effect (from a well-known french book for children): because tests constantly fail for no reason, you stop to listen to them and to not see when they fail for a real reason.
However this not apply to unit tests, or operations about code quality. For theses executions, the flakiness should not exist: a unit test is supposed to have no external dependency. A rerun approach would just hide a situation that is not desirable for this kind of tests
Also effectively I see that our tests are usually not flaky: so the only effect of the rerun is to give the failure state about a test three times slower. If a test becomes flaky, this should be fixed, and so be visible immediately in our CI.
For these reasons, I remove the travis_retry in the script section of .travis-ci.yml, to call directly tox and let the pipeline fails on the first error.
* Disable rerun feature of Travis
* Update .travis.yml
* Remove completely the retry logic
* Fix pipstrap on windows
* Pipstrap pin setuptools version, so explicit install it is not needed anymore.
* Rebuild letsencrypt-auto source
* Use sys.executable in pipstrap to allow straightforward execution in a venv by choosing the python interpreter from this venv.
* Update letsencrypt-auto
* Simulate test-everything
* Revert "Simulate test-everything"
This reverts commit b62c4d719a.
* Clean pipstrap code
* Reenabling OCSP cryptography support
* Refactor the validation logic of OCSP response to match the OpenSSL one
* Prepare runtime for OCSP response test
* Move unrelated test to another relevant place
* Reimplement OCSP status checks in integration tests
* Clean script
* Protect OCSP check against connection errors
* Update tests/certbot-boulder-integration.sh
Co-Authored-By: adferrand <adferrand@users.noreply.github.com>
* Cleaning
* Add a specific script for letsencrypt-auto install+help
* Remove inconsistent assertion
* Add executable permissions
* Remove unused variable
* Move testdata
* Corrected cleanup code
* Empty commit
Fixes#6755.
POSTing the `keyAuthorization` in a JWS token when answering an ACME challenge, has been deprecated for some time now. Indeed, this is superfluous as the request is already authentified by the JWS signature.
Boulder still accepts to see this field in the JWS token, and ignore it. Pebble in non strict mode also. But Pebble in strict mode refuses the request, to prepare complete removal of this field in ACME v2.
Certbot still sends the `keyAuthorization` field. This PR removes it, and makes Certbot compliant with current ACME v2 protocol, and so Pebble in strict mode.
See also [letsencrypt/pebble#192](https://github.com/letsencrypt/pebble/issues/192) for implementation details server side.
* New implementation, with a fallback.
* Add deprecation on changelog
* Update acme/acme/client.py
Co-Authored-By: adferrand <adferrand@users.noreply.github.com>
* Fix an instance parameter
* Update changelog, extend coverage
* Update comment
* Add unit tests on keyAuthorization dump
* Update acme/acme/client.py
Co-Authored-By: adferrand <adferrand@users.noreply.github.com>
* Restrict the magic of setting a variable in immutable object in one place. Make a soon to be removed method private.
* Add acme library usage example
Create, edit and deactivate account.
Setup and perform http-01 challenge.
Issue, renew and revoke certificate.
* Adapt example to ACME-v2 and exclude data persistence
The code to persist/load data would length this example and distract from what is actually important.
* Fix domain names and e-mail addresses
* Remove unnecessary license header
This usage example is under the license for the acme package.
* Remove logging information
The code will be mostly read by developers, so simplify the logging info into comments.
* Revert abstraction of simple methods
All methods that are used only once in this example were expanded into the main code in order to make the process more explicit.
* Fix missing URL suffix
* Improve aesthetics and reorganize workflow
Also make words capitalization consistent and improve comments.
No complaints from pep8.
So merging the study from @bmw and me, here is what happened.
Each invocation of `certbot.logger.post_arg_parse_setup` create a file handler on `letsencrypt.log`. This function also set an atexit handler invoking `logger.shutdown()`, that have the effect to close all logger file handler not already closed at this point. This method is supposed to be called when a python process is close to exit, because it makes all logger unable to write new logs on any handler.
Before #6667 and this PR, for tests, the atexit handle would be triggered only at the end of the pytest process. It means that each test that launches `certbot.logger.post_arg_parse_setup` add a new file handler. These tests were typically connecting the file handler on a `letsencrypt.log` located in a temporary directory, and this directory and content was wipped out at each test tearDown. As a consequence, the file handles, not cleared from the logger, were accumulating in the logger, with all of them connected to a deleted file log, except the last one that was just created by the current test. Considering the number of tests concerned, there were ~300 file handler at the end of pytest execution.
One can see that, on prior #6667, by calling `print(logger.getLogger().handlers` on the `tearDown` of these tests, and see the array growing at each test execution.
Even if this represent a memory leak, this situation was not really a problem on Linux: because a file can be deleted before it is closed, it was only meaning that a given invocation of `logger.debug` for instance, during the tests, was written in 300 log files. The overhead is negligeable. On Windows however, the file handlers were failing because you cannot delete a file before it is closed.
It was one of the reason for #6667, that added a call to `logging.shutdown()` at each test tearDown, with the consequence to close all file handlers. At this point, Linux is not happy anymore. Any call to `logger.warn` will generate an error for each closed file handler. As a file handler is added for each test, the number of errors grows on each test, following an arithmetical suite divergence.
On `test_sdists.py`, that is using the bare setuptools test suite without output capturing, we can see the damages. The total output takes 216000 lines, and 23000 errors are generated. A decent machine can support this load, but a not a small AWS instance, that is crashing during the execution. Even with pytest, the captured output and the memory leak become so large that segfaults are generated.
On the current PR, the problem is solved, by resetting the file handlers array on the logging system on each test tearDown. So each fileHandler is properly closed, and removed from the stack. They do not participate anymore in the logging system, and can be garbage collected. Then we stay on always one file handler opened at any time, and tests can succeed on AWS instances.
For the record, here is all the places where the logging system is called and fail if there is still file handlers closed but not cleaned (extracted from the original huge output before correction):
```
Logged from file account.py, line 116
Logged from file account.py, line 178
Logged from file client.py, line 166
Logged from file client.py, line 295
Logged from file client.py, line 415
Logged from file client.py, line 422
Logged from file client.py, line 480
Logged from file client.py, line 503
Logged from file client.py, line 540
Logged from file client.py, line 601
Logged from file client.py, line 622
Logged from file client.py, line 750
Logged from file cli.py, line 220
Logged from file cli.py, line 226
Logged from file crypto_util.py, line 101
Logged from file crypto_util.py, line 127
Logged from file crypto_util.py, line 147
Logged from file crypto_util.py, line 261
Logged from file crypto_util.py, line 283
Logged from file crypto_util.py, line 307
Logged from file crypto_util.py, line 336
Logged from file disco.py, line 116
Logged from file disco.py, line 124
Logged from file disco.py, line 134
Logged from file disco.py, line 138
Logged from file disco.py, line 141
Logged from file dns_common_lexicon.py, line 45
Logged from file dns_common_lexicon.py, line 61
Logged from file dns_common_lexicon.py, line 67
Logged from file dns_common.py, line 316
Logged from file dns_common.py, line 64
Logged from file eff.py, line 60
Logged from file eff.py, line 73
Logged from file error_handler.py, line 105
Logged from file error_handler.py, line 110
Logged from file error_handler.py, line 87
Logged from file hooks.py, line 248
Logged from file main.py, line 1071
Logged from file main.py, line 1075
Logged from file main.py, line 1189
Logged from file ops.py, line 122
Logged from file ops.py, line 325
Logged from file ops.py, line 338
Logged from file reporter.py, line 55
Logged from file selection.py, line 110
Logged from file selection.py, line 118
Logged from file selection.py, line 123
Logged from file selection.py, line 176
Logged from file selection.py, line 231
Logged from file selection.py, line 310
Logged from file selection.py, line 66
Logged from file standalone.py, line 101
Logged from file standalone.py, line 88
Logged from file standalone.py, line 97
Logged from file standalone.py, line 98
Logged from file storage.py, line 52
Logged from file storage.py, line 59
Logged from file storage.py, line 75
Logged from file util.py, line 56
Logged from file webroot.py, line 165
Logged from file webroot.py, line 186
Logged from file webroot.py, line 187
Logged from file webroot.py, line 204
Logged from file webroot.py, line 223
Logged from file webroot.py, line 234
Logged from file webroot.py, line 235
Logged from file webroot.py, line 237
Logged from file webroot.py, line 91
```
* Reapply #6667
* Make setuptools delegates tests execution to pytest, like in acme module.
* Clean handlers at each tearDown to avoid memory leaks.
* Update changelog
This PR fixes certbot-nginx and relevant tests to make them succeed on Windows.
Next step will be to enable integration tests through certbot-ci in a future PR.
* Fix tests and incompabilities in certbot-nginx for Windows
* Fix lint, fix oldest local dependencies
The method `lock_and_call`, in `certbot.tests.util` is designed to acquire a lock on a foreign process, then execute a callable in the current process. This is done to closely reproduce the lock mechanism involved between two certbot instances that are running in parallel.
This method uses the `multiprocessing` module. But its implementation in `lock_and_call` is broken for Windows: the two processes fail to communicate, leading to a deadlock.
In fact, `multiprocessing` module is using the fork mechanism on Linux, and the spawn mechanism on Windows, leading to behavior inconsistencies between the two platforms.
As this method is for tests, and not for production code, I did not try to make two implementations "by the book", one suitable for Windows, the other for Linux, like for the `certbot.lock` module.
Instead, I use a `subprocess` approach with a trigger file allowing to coordinate the current process and the subprocess. With this, `lock_and_call` is running from the same code both on Linux and Windows.
Relevant tests in the `certbot.tests.lock_test` test module are now enabled for Windows.
* Implement new lock_and_call method
* Reactivate tests for Windows
First PR about this issue, #6440, involved to much refactoring to ensure a correct behavior on Linux and installer plugins.
This PR proposes a new implementation of a lock mechanism for Linux and Windows, without implying a refactoring. It takes strictly the existing behavior for Linux, and add the appropriate logic for Windows. The `lock` module formalizes two independant mechanism dedicated to each platform, to improve maintainability.
Tests related to locking are re-activated for Windows, or definitively skipped because of irrelevancy for this platform. 6 more tests are enabled overall.
* Reimplement lock file on basic level
* Remove unused code
* Re-activate some tests
* Update doc
* Reactivate tests relevant to locks in Windows. Correct a test that was not testing what is supposed to test.
* Clean compat.
* Move close sooner in Windows lock implementation
* Add strong mypy types
* Use os.name
* Refactor lock mechanism logic
* Enable more tests
* Update lock.py
* Update lock_test.py
This PR updates and fixes `pebble-fetch.sh` considering latest improvements done on Pebble, to start a working instance.
* Fix the pebble fetch script
* Update pebble-fetch.sh
* Update tox.ini
Apache plugin will now use command line default values from `ApacheConfingurator.OS_DEFAULTS` instead of respective distribution override when `CERTBOT_DOCS=1` environment variable is present.
Fixes: #6234
* Apache: respect CERTBOT_DOCS environment variable
* Move the tests to apache plugin
I think this is causing failures in some of our tests so this PR reverts the change until we can fix the problem.
The 2nd commit is to keep the change using more idiomatic wording in the changelog for another change that got included in this PR.
* Revert "Use built-in support for OCSP in cryptography >= 2.5 (#6603)"
This reverts commit 2ddaf3db04.
* keep changelog correction
We always run a full set of CI tests before beginning the release process. The way this would work previously is we would either trigger tests on the `test-everything` branch to run through Travis' web UI or if it was a point release, create a new branch based on `test-everything` but modify `.travis.yml` so the branch that was pulled in to be tested was the point release branch instead of `master`.
This no longer works because the former `test-everything` tests are now only run when Travis automatically runs our tests nightly.
We could create and maintain a separate branch for the purpose of manually running all tests or remove the conditionals from the latest `.travis.yml` file every time before we want to run these tests, but there must be A Better Way™.
This PR makes the change that in addition to running all tests nightly, they would also run on pushes to tested branches other than master. These changes do not affect the tests run on PRs or on commits to `master`.
What is affected is commits to point release branches and branches named `test-*`. (See [.travis.yml](2ddaf3db04/.travis.yml (L177)) for what branches we run tests on.) Running all tests on point release branches automates the step of running our full test suite before doing a point release.
The changes to `test-*` could be a mixed bag, however, since we switched to travis-ci.com over 3 weeks ago, I'm the only one who has used this functionality and I personally prefer things this way. At the very least, since these branches don't seem to be widely used, I think we can make this change and reevaluate if it becomes a problem.
* Test all on push events to non-master branches.
* Move branches section up.
* expand comment
Attempts to configure all of the following VirtualHosts for answering the HTTP challenge:
* VirtualHosts that have the requested domain name in either `ServerName` or `ServerAlias` directive.
* VirtualHosts that have a wildcard name that would match the requested domain name.
This also applies to HTTPS VirtualHosts, making Apache plugin able to handle cases where HTTP redirection takes place in reverse proxy or similar, before reaching the Apache HTTPD.
Even though also HTTPS VirtualHosts are selected, Apache plugin tries to ensure that at least one of the selected VirtualHosts listens to HTTP-01 port (configured with `--http-01-port` CLI option). So in a case where only HTTPS VirtualHosts exist, but user wants to configure those, `--http-01-port` parameter needs to be set for the port configured to the HTTPS VirtualHost(s).
Fixes: #6730
* Select all matching VirtualHosts for HTTP-01 challenges instead of just one
* Finalize PR and add tests
* Changelog entry
In response to #6594. [Fixes #6594.]
To execute OCSP requests, certbot relies currently on a openssl binary execution. If openssl is not present in the PATH, the OCSP check will be silently ignored. Since version 2.4, cryptography has support for OCSP requests, without the need to have openssl binary available locally.
This PR takes advantage of it, and will use the built-in support of OCSP in cryptography for versions >= 2.4. Otherwise, fallback is done do a direct call to openssl binary, allowing oldest requirements to still work with legacy cryptography versions.
Update: requirement is now cryptography >= 2.5, to avoid to rely on a private method from cryptography.
* Implement logic using cryptography
* Working OSCP using pure cryptography
* Fix openssl usage in unit tests
* Reduce verbosity
* Add tests
* Improve naive skipIf
* Test resiliency
* Update ocsp.py
* Validate OCSP response. Unify OCSP URL get
* Improve resiliency checks, correct lint/mypy
* Improve hash selection
* Fix warnings when calling openssl bin
* Load OCSP tests assets as vectors.
* Update ocsp.py
* Protect against invalid ocsp response.
* Add checks to OCSP response
* Add more control on ocsp response
* Be lenient about assertion that next_update must be in the future, similarly to openssl.
* Construct a more advanced OCSP response mock to trigger more logic in ocsp module.
* Add test
* Refactor signature process to use crypto_util
* Fallback for cryptography 2.4
* Avoid a collision with a meteor.
* Correct method signature documentation
* Relax OCSP update interval
* Trigger built-in ocsp logic from cryptography with 2.5+
* Update pinned version of cryptography
* Update certbot/ocsp.py
Co-Authored-By: adferrand <adferrand@users.noreply.github.com>
* Update ocsp.py
* Update ocsp_test.py
* Update CHANGELOG.md
* Update CHANGELOG.md
The existing `acme.TLSALPN01` challenge class did not have
a `response_cls`, meaning it was not possible to use a tls-alpn-01
challenge with `client.answer_challenge`.
To support the above a simple `TLSALPN01Response` class is added that
doesn't provide the ability to solve a tls-alpn-01 challenge end to end
(e.g. generating and serving the correct response certificate) but that
does allow the challenge to be initiated. This is sufficient for users
that have set up the challenge response independent of the `acme`
module code.
Resolves#6676
The account path used to store user credentials is calculated from the domain used to contact the relevant ACME CA server.
For instance, if the directory URL is https://my.domain.net/directory, then the account path will be $CONFIG_DIR/accounts/my.domain.net.
However, if non standard HTTP/HTTPS port need to be used, colons will be involved. For instance, https://my.domain.net:14000/directory will give $CONFIG_DIR/accounts/my.domain.net:14000.
Colons in paths are supported on POSIX systems, but not on Windows (it is reserved for the root drive letter).
This PR replaces colons by underscores for account paths on Windows, and leaves them untouched on Linux.
* Fix account path on Windows when colons are involved
* Protect colon in drive letter
* Refactor compat platform specific logic
It was pointed out to me that you can no longer run tox.cover.py directly to run coverage tests on a subset of the packages in this repo.
This happened after we did both of:
1. Factored out --pyargs from the different test files and put it in pytest.ini.
2. Moved the options we added to pytest.ini to tox.ini meaning that --pyargs is not set unless you run the file through tox.
I think the fact that we factored out --pyargs from the files that needed it was a mistake. --pytest is needed by tox.cover.py and install_and_test.py in order to work correctly.
I think CLI options like this which are needed for the file to function should be left in the file directly. Doing anything else in my opinion unnecessarily couples these scripts to other files making them more brittle and harder to maintain.
With that said, I also think CLI options which are not needed (such as --numprocesses) can be left to be optionally added through PYTEST_ADDOPTS.
* Add --pyargs to tox.cover.py.
* Add --pyargs to install_and_test.py.
* Remove --pyargs from tox.ini.
When certbot is executing, several resources are opened. It is typically file handles and locks on them. Of course, theses resources need to be cleanup. It is done in Certbot by registering cleanup functions through atexit module, that ensures theses functions will be called when Certbot exit. This allow to not care about resource cleanup everywhere in the code, as it is processed globally.
The problem with atexit is it cleanup functions are called when the Python program exit. If the program is Certbot itself when used, this is Pytest in unit test execution. So during a unit test execution, cleanup is not called after a test and before its tearDown, but when Pytest exit, so way after tests and their respective tearDown.
But many tearDown implies to delete folders where this kind of resources are hold.
This is never a problem on Linux, thanks to its non-blocking file handling. It is usually not a problem on Windows, despite its blocking approach. But if the tearDown requires folder cleanup, exceptions are raised, and currently hidden as warnings. There is currently 504 exceptions of this type in Certbot core tests on Windows.
This PR starts to correct this situation. To do so, some of the functions cleanup normally called through atexit, are explicitly called as part of the tearDown process of relevant test classes, before directory removal is done. Theses situations come all from the certbot.tests.util.TempDirTestCase, so the code is in this specific tearDown process.
As a consequence, exceptions drop from 504 to 64.
Then there are still a significant part of them, that will be handled in later mitigation.
* Call atexit handlers before test tearDown to reduce errors on Windows
* Clear locks dict after global releasing execution
* Remove last tearDown errors.
* Clean out mock on open.
* Remove a test
* Reenable some tests
This PR reactivates tls-sni-01 challenges on recent Boulder versions checkout for integration tests. This allows to continue testing this challenge until it is officially dropped from server (Boulder) and client (Certbot).
Reverts #6679.