Added counter metrics to track the total number of build jobs and the total number of failed builds. These can be used to calculate the build success rate in Grafana.
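The success rate derived from the two counters is simple arithmetic. The sketch below is illustrative only: the function and parameter names are hypothetical, and in practice the calculation would be done in a Grafana query over the exported counters rather than in Python.

```python
def build_success_rate(total_builds: int, failed_builds: int) -> float:
    """Return the fraction of builds that succeeded, in the range [0.0, 1.0]."""
    if total_builds <= 0:
        # Assumption: with no builds recorded, report 1.0 instead of dividing by zero.
        return 1.0
    return (total_builds - failed_builds) / total_builds
```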
In the previous kubernetes executor, the build job persisted in DEBUG mode because the virtual machine in the pod never exited. This kept the job alive for users to view the debug information. The `kubernetesPodman` executor does not run the VM, so the job is cleaned up because `ttlSecondsAfterFinished` is set on it. This change prevents the `ttlSecondsAfterFinished` field from being set when DEBUG is true, allowing the pod to stay alive so the logs can be retrieved.
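A minimal sketch of the conditional field, assuming the job spec is assembled as a plain dictionary. The function name, the default TTL value, and the surrounding spec layout are hypothetical and are not taken from the executor code.

```python
from typing import Any, Dict

# Hypothetical default; the real TTL value is not stated in this changelog.
DEFAULT_TTL_SECONDS = 120


def build_job_spec(debug: bool, ttl_seconds: int = DEFAULT_TTL_SECONDS) -> Dict[str, Any]:
    """Build a Kubernetes Job spec, omitting the TTL when DEBUG is enabled."""
    spec: Dict[str, Any] = {
        "template": {"spec": {"restartPolicy": "Never", "containers": []}},
    }
    if not debug:
        # Without DEBUG, let Kubernetes garbage-collect the finished job.
        spec["ttlSecondsAfterFinished"] = ttl_seconds
    return spec
```

With `debug=True` the key is absent, so the finished job and its pod are not removed by the TTL controller.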
Set the backoffLimit to 1 for kubernetes and kubernetesPodman builds. This prevents subsequent attempts from failing due to the token expiring. Having the job recreate pods is unnecessary, as the build manager already has retry logic.
Currently the CI breaks because `click`, a dependency of black, fails in its latest release with `ImportError: cannot import name '_unicodefun' from 'click'`. Since black does not pin its version of click, it pulls in the latest version containing the breaking change and fails the CI check. This updates black to a release that includes the patch. [See the original issue here.](https://github.com/psf/black/issues/2964) The rest of the changes are format updates introduced with the latest version of black.
Apply JOB_REGISTRATION_TIMEOUT when generating the build registration token. Also add the DEBUG option to the kubernetesPodman executor.
Changes made to allow use of a single quay-builder image for kubernetes and kubernetesPodman builds.
Implements the following changes:
- Added EXECUTOR env var to kubernetesPodman job configuration
- Updated the builder ignition config to overwrite the registry.conf file to set short name mode to permissive
- Always run the quay-builder in the VM as root
If not specified, TimeoutStartSec for the Docker service is set to
600. Since it's a service of type oneshot, this should either be left
unset, or be at least as long as the machine's lifetime.
Allow the build to move forward if it is already in the desired
phase/state. When a build fails and gets retried from the queue, its
phase doesn't get updated back to WAITING. So it is possible that it
is already in a phase such as SCHEDULED, which could prevent the
buildman from marking the new attempt as scheduled, as there would be
no apparent changes made to the build phase.
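The idea can be sketched as follows. This is an illustration under assumptions, not the buildman implementation: the `Build` class, the function names, the lowercase phase strings, and the set of terminal phases are all hypothetical.

```python
from dataclasses import dataclass

# Assumed terminal phases; the real set is not listed in this changelog.
TERMINAL_PHASES = {"complete", "error", "cancelled"}


@dataclass
class Build:
    """Hypothetical stand-in for a build record."""
    phase: str


def update_phase(build: Build, new_phase: str) -> bool:
    """Return True only if the stored phase actually changed."""
    if build.phase in TERMINAL_PHASES or build.phase == new_phase:
        return False
    build.phase = new_phase
    return True


def mark_scheduled(build: Build) -> bool:
    """Treat a build that is already scheduled as a successful transition."""
    if build.phase == "scheduled":
        # A retried build may still carry the phase of its previous attempt.
        return True
    return update_phase(build, "scheduled")
```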
With the previous hard-coded value of 15 sec, some builds could
sporadically expire before the instance had time to boot and make the
registration RPC call. Change the default to 30 sec, and make it
configurable.
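A minimal sketch of reading the timeout with the new default, assuming the option is the JOB_REGISTRATION_TIMEOUT key mentioned earlier in this log and that the manager config is a plain dictionary. The function and constant names are hypothetical.

```python
# New default from this change, in seconds.
DEFAULT_JOB_REGISTRATION_TIMEOUT = 30


def registration_timeout(manager_config: dict) -> int:
    """Return the configured registration timeout, falling back to the default."""
    return int(
        manager_config.get("JOB_REGISTRATION_TIMEOUT", DEFAULT_JOB_REGISTRATION_TIMEOUT)
    )
```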
* buildman: Add proxy variables to builds if they exist (PROJQUAY-2120)
Adds the ability to define proxy variables for builders. The proxy variables are parsed as env. variables and defined in Quay's config.yaml file.
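One way this could look, as a sketch only. The key names `HTTP_PROXY`, `HTTPS_PROXY`, and `NO_PROXY` are assumptions and may not match the keys that Quay's config.yaml actually accepts; the function name is hypothetical.

```python
# Assumed key names; the actual config keys are not listed in this changelog.
PROXY_KEYS = ("HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY")


def proxy_env(config: dict) -> dict:
    """Collect the proxy settings that exist in the config as env variables."""
    # Keys that are missing or empty are skipped, so builds without a
    # proxy receive no extra variables.
    return {key: str(config[key]) for key in PROXY_KEYS if config.get(key)}
```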
Update the log level from EXCEPTION to WARNING when getting a KeyError
from the orchestrator. The KeyError is valid and happens when a build
has expired.
When set to true, DEBUG will prevent the build nodes from shutting
down after the quay-builder service is done or fails, and will prevent the
build manager from cleaning up the instances (terminating EC2
instances or deleting k8s jobs).
This will allow debugging builder node issues, and should not be set
in a production environment.
The lifetime service will still exist, i.e. the instance will still
shut down after ~2h (EC2 instances will terminate, k8s jobs will
complete).
Setting DEBUG will also affect ALLOWED_WORKER_COUNT, as the
unterminated instances/jobs will still count towards the total number
of running workers.
* Handle non 200 api response from executors
* Allows the CA cert to be specified in the config for server verification
Allow the CA cert used for server verification to be specified in the
config even if client certificate authentication is not used.
Handles non-200 responses from executors when trying to get worker count.
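A sketch of the defensive handling, assuming the executor API returns a JSON object with an `items` list. That response shape and the function name are hypothetical, chosen only to illustrate the check.

```python
import json
from typing import Optional


def parse_worker_count(status_code: int, body: str) -> Optional[int]:
    """Return the number of running workers, or None if it cannot be determined."""
    if status_code != 200:
        # Non-200 responses carry no usable count.
        return None
    try:
        return len(json.loads(body)["items"])
    except (ValueError, KeyError, TypeError):
        # Malformed or unexpected payloads are treated like a failed request.
        return None
```

Returning `None` instead of raising leaves the decision about how to proceed to the caller.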
* Use safe_load when loading the config yaml
* Setup nginx ssl termination for grpc endpoints
* Bootstrap Quay's ca cert in the build executor nodes
* Update certificate mount point in ignition config
Mount the Fedora CoreOS/RHCOS based cert directory to /certs in the
builder container, where it will be installed by the container's
entrypoint.
Allow specifying the container runtime to the templated ignition file
Allow specifying the container runtime in the executor's ignition
file. This allows different runtimes, e.g. Docker or Podman, to run a build.
* Reenable builder in supervisord config
* Rewrites the buildmanager to use gRPC
Rewrite of the current buildmanager using gRPC.
This deprecates the enterprise type builder, as individual nodes will
no longer keep track of build states because of WAMP.
Also removes trollius, which was required by the WAMP servers.
Instead, gRPC uses a threaded model to serve its requests.
Deprecates etcd as state tracking for build states in favor of Redis
only.
Defines a state interface to manage/transition build states, implemented by the
buildmanager.
* Fix incorrect reference to aws connection
* Truncate the "Token" tag in ec2 to 36 char.
Normalize the token tag to 36 char in EC2.
Add an expiration to the original redis key, in the event that the
expiry handler is not able to delete the key, the original should be
removed eventually.
* Orchestrator: add context to KeyError
* EXPOSE 50051 in Dockerfiles
* Add buildman/README
Used by the manager to schedule builds based on the current running
count. Uses the specific executors' api to get the count of running
builders instead of Redis/Orchestrator.
This is due to issues encountered in the past where the manager would
have problems scheduling builds, and go into a weird state when
Redis was unavailable.
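The scheduling decision can be sketched as below. The callable that fetches the count from the executor's API is a hypothetical placeholder, and refusing to schedule when the count is unknown is an assumption, not documented behavior.

```python
from typing import Callable, Optional


def can_schedule(
    get_running_count: Callable[[], Optional[int]],
    allowed_worker_count: int,
) -> bool:
    """Decide whether another builder may be started."""
    running = get_running_count()
    if running is None:
        # Assumption: when the executor cannot report a count, do not schedule.
        return False
    return running < allowed_worker_count
```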
Remove wamp's REALM/websocket parameters from executor
Remove asyncio from executor
* Update the executor image from Container Linux to Fedora CoreOS
* Move the container cloud config script for templating from devtable to quay's repo
* Ignition config template
* Move dockersystemd from devtable repo
* Remove pinned dependency on devtable/container-cloud-config
* Removes squashed image and logentries
* Update builder image
* Update mounted cert directory for Fedora
* Removes old cloudconfig template
* Pass userdata as firmware config to qemu
* Use CentOS:8 as base image
* Convert all Python2 to Python3 syntax.
* Removes oauth2lib dependency
* Replace mockredis with fakeredis
* byte/str conversions
* Removes nonexisting __nonzero__ in Python3
* Python3 Dockerfile and related
* [PROJQUAY-98] Replace resumablehashlib with rehash
* PROJQUAY-123 - replace gpgme with python3-gpg
* [PROJQUAY-135] Fix unhashable class error
* Update external dependencies for Python 3
- Move github.com/app-registry/appr to github.com/quay/appr
- github.com/coderanger/supervisor-stdout
- github.com/DevTable/container-cloud-config
- Update to latest mockldap with changes applied from coreos/mockldap
- Update dependencies in requirements.txt and requirements-dev.txt
* Default FLOAT_REPR function to str in json encoder and removes keyword assignment
True, False, and str were not keywords in Python2...
* [PROJQUAY-165] Replace package `bencode` with `bencode.py`
- Bencode is not compatible with Python 3.x and is no longer
maintained. Bencode.py appears to be a drop-in replacement/fork
that is compatible with Python 3.
* Make sure monkey.patch is called before anything else (
* Removes anunidecode dependency and replaces it with text_unidecode
* Base64 encode/decode pickle dumps/loads when storing value in DB
Base64 encodes/decodes the serialized values when storing them in the
DB. Also make sure to return a Python3 string instead of a Bytes when
coercing for db, otherwise, Postgres' TEXT field will convert it into
a hex representation when storing the value.
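A minimal sketch of the round trip using only the standard library. The function names are hypothetical and are not the field methods used in the codebase.

```python
import base64
import pickle


def to_db_value(value) -> str:
    """Pickle a value and return it as an ASCII str suitable for a TEXT column."""
    # Decoding to str matters: handing bytes to the DB layer is what led to
    # the hex representation described above.
    return base64.b64encode(pickle.dumps(value)).decode("ascii")


def from_db_value(stored: str):
    """Reverse to_db_value: Base64-decode, then unpickle."""
    return pickle.loads(base64.b64decode(stored))
```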
* Implement __hash__ on Digest class
In Python 3, if a class defines __eq__() but not __hash__(), its
instances will not be usable as items in hashable collections (e.g sets).
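For illustration, a simplified class showing the pattern. The attribute names here are hypothetical and this is not the actual `Digest` implementation.

```python
class Digest:
    """Simplified stand-in for a content digest such as sha256:<hex>."""

    def __init__(self, hash_alg: str, hash_bytes: str):
        self._hash_alg = hash_alg
        self._hash_bytes = hash_bytes

    def __eq__(self, other):
        if not isinstance(other, Digest):
            return NotImplemented
        return (self._hash_alg, self._hash_bytes) == (other._hash_alg, other._hash_bytes)

    def __hash__(self):
        # Hash the same fields that __eq__ compares, so equal digests
        # land in the same bucket.
        return hash((self._hash_alg, self._hash_bytes))
```

With `__hash__` defined, two equal digests collapse to a single entry in a set.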
* Remove basestring check
* Fix expected message in credentials tests
* Fix usage of Cryptography.Fernet for Python3 (#219)
- Specifically, this addresses the issue where Byte<->String
conversions weren't being applied correctly.
* Fix utils
- tar+stream layer format utils
- filelike util
* Fix storage tests
* Fix endpoint tests
* Fix workers tests
* Fix docker's empty layer bytes
* Fix registry tests
* Appr
* Enable CI for Python 3.6
* Skip buildman tests
Skip buildman tests while it's being rewritten to allow ci to pass.
* Install swig for CI
* Update expected exception type in redis validation test
* Fix gpg signing calls
Fix gpg calls for updated gpg wrapper, and add signing tests.
* Convert / to // for Python3 integer division
* WIP: Update buildman to use asyncio instead of trollius.
This dependency is considered deprecated/abandoned and was only
used as an implementation/backport of asyncio on Python 2.x
This is a work in progress, and is included in the PR just to get the
rest of the tests passing. The builder is actually being rewritten.
* Target Python 3.8
* Removes unused files
- Removes unused files that were added accidentally while rebasing
- Small fixes/cleanup
- TODO tasks comments
* Add TODO to verify rehash backward compat with resumablehashlib
* Revert "[PROJQUAY-135] Fix unhashable class error" and implements __hash__ instead.
This reverts commit 735e38e3c1d072bf50ea864bc7e119a55d3a8976.
Instead, defines __hash__ for encrypted fields class, using the parent
field's implementation.
* Remove some unused files and imports
Co-authored-by: Kenny Lee Sin Cheong <kenny.lee@redhat.com>
Co-authored-by: Tom McKay <thomasmckay@redhat.com>
This is an attempt to keep the codebase as idiomatic as possible.
An additional benefit of reverting to the default histogram buckets is
that the slowest route durations are more accurate.