Zuul v3 Proof of Concept

Brennen Bearnes bbearnes@wikimedia.org, Fall 2019

Summary

Zuul v3 is a very capable system and could meet most of the needs we've expressed. It seems likely that migrating from our existing Zuul configuration to this release would be more straightforward than implementing either GitLab or Argo.

Zuul is also complex, with its configuration distributed across a number of components. While most per-project configuration can likely be done in project repos with a .zuul.yaml or .zuul.d/, it will still require changes to (at minimum) Zuul's tenant configuration. The fact that configuration is shared between all projects within a tenant, while powerful, may also lead to some confusion. All of this probably means that Zuul would not be quite as self-serve as we'd prefer.
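
To make that concrete: onboarding a new project means a change along these lines to the central tenant config, on top of whatever .zuul.yaml the project itself carries (the full tenant config appears below; the new project name here is hypothetical):

- tenant:
    name: example-tenant
    source:
      gerrit:
        untrusted-projects:
          - blubber
          - some-new-project  # hypothetical addition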

Some bullet points follow.

Good:

  • A known quantity
  • Feature-rich, integrates with Gerrit
  • We have plenty of Python expertise
  • Upstream appears active
  • Docs aren't completely terrible

Bad:

  • YAML to edit in multiple places
  • I don't like Ansible

Uncertain:

  • I still don't know what the K8s story is like
  • How do we implement the equivalent of pipelinelib?
  • Will upstream make major architectural changes again?

In conclusion: It's imperfect and I'm not sure I like it, but I narrowly/weakly think that Zuul v3 would be our most effective near-term choice for a migration, whether as a temporary solution to be supplanted by Argo in the fullness of time or as something more permanent.

Configuration

I spun up Zuul on DigitalOcean droplets under my personal account. I initially tried to follow Zuul From Scratch on a Debian Buster system. This proved difficult. I encountered incompatibilities with Java runtimes and various other dependencies.

Eventually, I fell back to the docker-compose quick start, as described in the initial evaluation task, T218138, with the addition of a separate Buster droplet as a test node.

From the quick start's docker-compose.yaml, I use the following services:

  • Gerrit
  • MariaDB
  • zuul-web
  • zuul-executor
  • zuul-scheduler
  • Zookeeper
  • Nodepool launcher
  • an Apache log server

Initial configuration files for these services are mounted from doc/source/admin/examples in the zuul repo, where they can be modified in place.
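
As a rough sketch of what one of those mounts looks like in the quick start's docker-compose.yaml (service and image names and volume flags may differ between Zuul releases):

services:
  scheduler:
    image: zuul/zuul-scheduler
    volumes:
      # Mount the example Zuul config into the container, so edits on
      # the host take effect on restart.
      - ./etc_zuul/:/etc/zuul/:z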

Zuul

Zuul configuration includes:

  • etc_zuul/zuul.conf: ini-format config for various services
  • etc_zuul/main.yaml: YAML tenant config: projects are grouped into tenants, which share configuration and jobs
  • etc_zuul/scheduler_logging.conf: ini-format Python logging config

Finally, docker-compose runs an Ansible playbook to create an empty zuul-config repo on the Gerrit instance, and instructions are provided for adding pipelines and Ansible playbooks. Zuul configuration changes from the zuul-config repo take effect as soon as they're reviewed and merged in Gerrit.

zuul-config is listed under config-projects in the tenant config in etc_zuul/main.yaml, which means that it runs with elevated privileges. Normal projects (such as Blubber) and a sort of standard library of Ansible jobs called zuul-jobs are listed under untrusted-projects:

- tenant:
    name: example-tenant
    source:
      gerrit:
        config-projects:
          - zuul-config
        untrusted-projects:
          - test1
          - test2
          - blubber
      opendev.org:
        untrusted-projects:
          - zuul/zuul-jobs:
              include:
                - job

In zuul-config, I have the following structure:

brennen@lostsnail-0:~/zuul-config$ tree
.
├── playbooks
│   └── base
│       ├── post-logs.yaml
│       ├── post-ssh.yaml
│       └── pre.yaml
└── zuul.d
    ├── jobs.yaml
    ├── pipelines.yaml
    └── projects.yaml

3 directories, 6 files

zuul-config/playbooks/ contains base Ansible playbooks which are inherited by all jobs. These playbooks, in turn, apply roles defined in the zuul-jobs repo mentioned above.
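
As a sketch, playbooks/base/pre.yaml amounts to applying a couple of those roles (role names are from zuul-jobs; exact details may vary):

# Runs before every job: put a per-build SSH key on the node, then
# copy the prepared git repos over.
- hosts: all
  roles:
    - add-build-sshkey
    - prepare-workspace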

zuul-config/zuul.d/projects.yaml contains project definitions. The first uses a regex to match all projects:

- project:
    name: ^.*$
    check:
      jobs: []
    gate:
      jobs: []

- project:
    name: zuul-config
    check:
      jobs:
        - noop
    gate:
      jobs:
        - noop

zuul-config/zuul.d/jobs.yaml contains job definitions, and specifies the playbooks for setting up and tearing down SSH keys, copying the project's source to a node, and stashing the job's logs.

- job:
    name: base
    parent: null
    description: |
      The recommended base job.

      All jobs ultimately inherit from this.  It runs a pre-playbook
      which copies all of the job's prepared git repos on to all of
      the nodes in the nodeset.

      It also sets a default timeout value (which may be overridden).
    pre-run: playbooks/base/pre.yaml
    post-run:
      - playbooks/base/post-ssh.yaml
      - playbooks/base/post-logs.yaml
    roles:
      - zuul: zuul/zuul-jobs
    timeout: 1800
    nodeset:
      nodes:
        - name: debian-buster
          label: debian-buster

zuul-config/zuul.d/pipelines.yaml defines check and gate pipelines:

- pipeline:
    name: check
    description: |
      Newly uploaded patchsets enter this pipeline to receive an
      initial +/-1 Verified vote.
    manager: independent
    require:
      gerrit:
        open: True
        current-patchset: True
    trigger:
      gerrit:
        - event: patchset-created
        - event: change-restored
        - event: comment-added
          comment: (?i)^(Patch Set [0-9]+:)?( [\w\\+-]*)*(\n\n)?\s*recheck
    success:
      gerrit:
        Verified: 1
      mysql:
    failure:
      gerrit:
        Verified: -1
      mysql:

- pipeline:
    name: gate
    description: |
      Changes that have been approved are enqueued in order in this
      pipeline, and if they pass tests, will be merged.
    manager: dependent
    post-review: True
    require:
      gerrit:
        open: True
        current-patchset: True
        approval:
          - Workflow: 1
    trigger:
      gerrit:
        - event: comment-added
          approval:
            - Workflow: 1
    start:
      gerrit:
        Verified: 0
    success:
      gerrit:
        Verified: 2
        submit: true
      mysql:
    failure:
      gerrit:
        Verified: -2
      mysql:

Job Logging

Logs are copied to a shared volume using the upload-logs role provided in zuul-jobs, and served by an Apache container. upload-logs can also handle SCPing logs to a remote host.
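
The playbook side of that is tiny; playbooks/base/post-logs.yaml is roughly:

# Runs on the executor after the job: copy collected logs to the
# shared volume the Apache container serves.
- hosts: localhost
  roles:
    - upload-logs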

There's an upload-logs-swift role for use with OpenStack's Swift object store, although I haven't tried using it.

Other log stores would probably be easy enough to support, just by replacing upload-logs in the post-run playbook with an appropriate role.

Nodepool

Nodepool's configuration in etc_nodepool/nodepool.yaml includes a list of static nodes for running jobs - in this case, there's only one:

providers:
  - name: static-vms
    driver: static
    pools:
      - name: main
        nodes:
          - name: "167.71.188.58"
            labels: debian-buster
            host-key: "actual-host-key-goes-here"
            # Probably set to false because I couldn't get it to work:
            host-key-checking: false
            python-path: /usr/bin/python3
            username: root
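
Note that labels referenced by nodes also have to be declared at the top level of nodepool.yaml:

labels:
  # Must match the label the static node above advertises, and the
  # label jobs request in their nodesets.
  - name: debian-buster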

Individual Projects

Individual projects work essentially the same way zuul-config does: A .zuul.yaml or a .zuul.d/ is written to define jobs, which run Ansible playbooks, and these jobs are added to pipelines for the project.

As I understand it, all of this configuration is shared between the projects in a tenant - so a project can run playbooks defined in a different project, for example.

Here's the .zuul.yaml I wrote for Blubber:

brennen@lostsnail-0:~/blubber$ cat .zuul.yaml
- job:
    name: blubber-build
    run: playbooks/blubber-build.yaml

- job:
    name: blubber-test
    run: playbooks/blubber-test.yaml

- project:
    check:
      jobs:
        - blubber-build
        - blubber-test
    gate:
      jobs:
        - blubber-build
        - blubber-test

This in turn references two playbooks:

brennen@lostsnail-0:~/blubber$ cat playbooks/blubber-build.yaml
- hosts: all
  environment:
    GOPATH: /root
  tasks:
    - debug:
        msg: Building Blubber.
    - name: Install build and test dependencies
      apt:
        name: "{{ packages }}"
        update_cache: yes
      vars:
        packages:
          - build-essential
          - git
    - name: Build Blubber
      make:
        chdir: src/gerrit/blubber

brennen@lostsnail-0:~/blubber$ cat playbooks/blubber-test.yaml
- hosts: all
  tasks:
    - debug:
        msg: Running Blubber tests.
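
The test playbook is obviously a stub. A real one would presumably mirror the build playbook, something like this (assuming Blubber's Makefile has a test target):

# Hypothetical replacement for blubber-test.yaml.
- hosts: all
  environment:
    GOPATH: /root
  tasks:
    - name: Run Blubber tests
      make:
        chdir: src/gerrit/blubber
        target: test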

Web Interface

There is one. I had intended to provide screenshots, but I didn't.

That said, most interaction for users would be by way of Gerrit, a model our developers are already familiar with.

Remaining Problems

We obviously don't want to run CI jobs as root on a continuously reused VM. In principle, this is very solvable, but there is some work to be done.
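
As a stopgap, the static node could at least connect as an unprivileged user (the account name here is hypothetical), though per-build throwaway nodes are the real fix:

nodes:
  - name: "167.71.188.58"
    labels: debian-buster
    # Hypothetical unprivileged account in place of root:
    username: ci-runner
    python-path: /usr/bin/python3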

It's probably worth noting that zuul-jobs includes roles for building and uploading Docker images, described in the Docker Jobs, Container Roles, and Container Images sections of the docs. That last one mentions this:

The requires: docker-image attribute means that whenever this job (or any jobs which inherit from it) run, Zuul will search ahead of the change in the dependency graph to find any jobs which produce docker-images and tell this job about them. This allows the job to pull images from the intermediate registry into the buildset registry.

That sounds pretty neat, but like a lot of other things about Zuul it's complicated and I don't quite understand it.
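
For what it's worth, I'd guess wiring Blubber into those roles would look something like this in .zuul.yaml, using the build-docker-image job zuul-jobs provides (variable names per the Container Images docs; untested):

- job:
    name: blubber-docker-image
    parent: build-docker-image
    vars:
      docker_images:
        - context: .
          repository: wikimedia/blubber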