Added counter metrics to track the total number of build jobs and the total number of failed builds. These can be used to calculate the build success rate in Grafana.
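As a rough illustration of how the two counter values combine into a success rate (the function and argument names are hypothetical and are not the metric names used in the code):

```python
def build_success_rate(total_builds: int, failed_builds: int) -> float:
    """Return the fraction of builds that succeeded, given two counter values."""
    if total_builds <= 0:
        # No builds observed yet: avoid dividing by zero and report 1.0.
        return 1.0
    return (total_builds - failed_builds) / total_builds
```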
With the previous kubernetes executor, the build job persisted in DEBUG mode because the virtual machine in the pod never exited. This kept the job alive so users could view the debug information. The `kubernetesPodman` executor does not run the VM, so the job gets cleaned up because `ttlSecondsAfterFinished` is set on it. This change prevents the `ttlSecondsAfterFinished` field from being set when DEBUG is true, so the pod stays alive and its logs can be retrieved.
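A minimal sketch of the idea, assuming the job spec is assembled as a plain dict. The function name, the `ttl_seconds` default, and the surrounding fields are illustrative assumptions, not the actual executor code:

```python
def build_job_spec(debug: bool, ttl_seconds: int = 120) -> dict:
    """Build a simplified Kubernetes Job spec fragment (illustrative only)."""
    spec = {"template": {"spec": {"restartPolicy": "Never"}}}
    if not debug:
        # Only let Kubernetes garbage-collect the finished job outside DEBUG mode.
        spec["ttlSecondsAfterFinished"] = ttl_seconds
    return spec
```

With `debug=True` the field is simply absent, so the finished job and its pod are not removed by the TTL controller.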
Set the backoffLimit to 1 for kubernetes and kubernetesPodman builds. This prevents subsequent attempts from failing due to the token expiring. Having the job recreate pods is unnecessary, as the build manager already has its own retry logic.
This fixes the crash:
DataError: Invalid input of type: 'NoneType'. Convert to a bytes, string, int or float first.
This happens because we access the value of a key that has already expired.
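The guard can be sketched roughly as follows. A plain dict stands in for the key-value store, and the function name is hypothetical. The point is only that a missing or expired key yields `None`, which must not be written back:

```python
from typing import Optional


def refresh_entry(store: dict, key: str) -> Optional[str]:
    """Write an entry back to the store only if it still exists."""
    value = store.get(key)  # an expired key behaves like a missing one: None
    if value is None:
        # Nothing to refresh; writing None back is what triggered the DataError.
        return None
    store[key] = value
    return value
```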
Currently the CI breaks because `click`, a dependency of black, fails in its latest release with `ImportError: cannot import name '_unicodefun' from 'click'`. Since black does not pin its version of click, it pulls in the latest version containing the breaking change and fails the CI check. This updates black to a version that includes the fix. [See the original issue here.](https://github.com/psf/black/issues/2964) The rest of the changes are format updates introduced with the latest version of black.
Apply JOB_REGISTRATION_TIMEOUT when generating the build registration token. Also add the DEBUG option to the kubernetesPodman executor.
Changes made to allow use of a single quay-builder image for kubernetes and kubernetesPodman builds.
Implements the following changes:
- Added EXECUTOR env var to kubernetesPodman job configuration
- Updated the builder ignition config to overwrite the registry.conf file to set short name mode to permissive
- Always run the quay-builder in the VM as root
If not set, TimeoutStartSec for the Docker service is set to
600. Since it's a service of type oneshot, this should either be left
unset or be at least as long as the machine's lifetime.
Allow the build to move forward if it is already in the desired
phase/state. When a build fails and gets retried from the queue, its
phase doesn't get updated back to WAITING. So it is possible that it
is already in a phase such as SCHEDULED, which could prevent the
buildman from marking the new attempt as scheduled, as there would be
no apparent change made to the build phase.
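A minimal sketch of the relaxed check, assuming a build is represented as a dict with a `phase` field. The function name, phase strings, and `TERMINAL_PHASES` set are placeholders for illustration, not the actual buildman implementation:

```python
# Hypothetical set of phases a build can no longer leave.
TERMINAL_PHASES = {"COMPLETE", "ERROR", "CANCELLED"}


def mark_phase(build: dict, desired_phase: str) -> bool:
    """Return True if the build may proceed in desired_phase."""
    current = build.get("phase")
    if current == desired_phase:
        # A retried build may already be in this phase; treat the no-op as success.
        return True
    if current in TERMINAL_PHASES:
        return False
    build["phase"] = desired_phase
    return True
```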
With the previous hard-coded value of 15 sec, some builds could
sporadically expire before the instance had time to boot and
make the registration RPC call. Change the default to 30 sec, and make
it configurable.
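The lookup might look roughly like this, assuming the option is read from a dict-like config under the JOB_REGISTRATION_TIMEOUT key mentioned elsewhere in this log. The helper and constant names are hypothetical:

```python
DEFAULT_JOB_REGISTRATION_TIMEOUT = 30  # seconds; the previous hard-coded value was 15


def job_registration_timeout(config: dict) -> int:
    """Return the configured registration timeout, or the default if unset."""
    return int(config.get("JOB_REGISTRATION_TIMEOUT", DEFAULT_JOB_REGISTRATION_TIMEOUT))
```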
The quay-builder-qemu container image is built from this
directory alone. CPaaS integration does not support the case
where two container images are built from
the same source repo. To work around this,
the removed directory will live in the new
"quay-builder-qemu" GitHub repo.
Signed-off-by: harishsurf <hgovinda@redhat.com>
* buildman: Add proxy variables to builds if they exist (PROJQUAY-2120)
Adds the ability to define proxy variables for builders. The proxy variables are parsed as env. variables and defined in Quay's config.yaml file.
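As a sketch of the "if they exist" behaviour, assuming the proxy settings sit in the executor's config dict under the conventional `HTTP_PROXY`, `HTTPS_PROXY`, and `NO_PROXY` names (these key names and the helper are assumptions, not confirmed by this commit):

```python
# Assumed key names; the actual config keys may differ.
PROXY_KEYS = ("HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY")


def proxy_env(executor_config: dict) -> dict:
    """Collect only the proxy variables that are present and non-empty."""
    return {k: str(executor_config[k]) for k in PROXY_KEYS if executor_config.get(k)}
```

Keys that are absent or empty are omitted, so builders without a proxy get no extra environment variables.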
Adds ACCOUNT_RECOVERY_MODE to allow Quay to run with some core
features disabled. When this is set, the instance should only be used
by existing users who haven't linked their accounts to an
external login service, after database authentication has been
disabled.
Update the log level from EXCEPTION to WARNING when getting a KeyError
from the orchestrator. The KeyError is valid and happens when a build
has expired.
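In outline, the handling looks something like the following. A dict stands in for the orchestrator, and the function and logger names are hypothetical:

```python
import logging
from typing import Any, Optional

logger = logging.getLogger("buildman-sketch")


def get_job(orchestrator: dict, key: str) -> Optional[Any]:
    """Look up a job, treating a missing (expired) key as an expected condition."""
    try:
        return orchestrator[key]
    except KeyError:
        # Expected when the build has expired: warn instead of logging a traceback.
        logger.warning("Job %s no longer exists (likely expired)", key)
        return None
```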
When set to true, DEBUG will prevent the build nodes from shutting
down after the quay-builder service is done or fails, and will prevent the
build manager from cleaning up the instances (terminating EC2
instances or deleting k8s jobs).
This will allow debugging builder node issues, and should not be set
in a production environment.
The lifetime service will still exist, i.e. the instance will still
shut down after ~2h (EC2 instances will terminate, k8s jobs will
complete).
Setting DEBUG will also affect ALLOWED_WORKER_COUNT, as the
unterminated instances/jobs will still count towards the total number
of running workers.
Create the working directory and set its permissions and ownership in
the Dockerfile instead of at runtime to avoid permission issues when
running the entrypoint script.
* Handle non-200 API responses from executors
* Allow the CA cert to be specified in the config for server verification
Allow the CA cert used for server verification to be specified in the
config even if client certificate authentication is not used.
Handles non-200 responses from executors when trying to get worker count.
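A rough sketch of the non-200 handling when counting workers, assuming the executor's API returns a JSON body with an `items` list. The function name, the field name, and the choice to return `None` on failure are all assumptions for illustration:

```python
from typing import Optional


def parse_worker_count(status_code: int, body: dict) -> Optional[int]:
    """Return the number of running workers, or None if the API call failed."""
    if status_code != 200:
        # Do not try to interpret an error payload as a worker list.
        return None
    return len(body.get("items", []))
```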