# Zuul v3 Proof of Concept

Brennen Bearnes <bbearnes@wikimedia.org>, Fall 2019

## Summary

Zuul v3 is a very capable system and could meet most of the needs we've
expressed. It seems likely that migrating from our existing Zuul configuration
to this release would be more straightforward than implementing either GitLab
or Argo.

Zuul is also complex, with its configuration distributed across a number of
components. While most per-project configuration can likely be done in
project repos with a `.zuul.yaml` or `.zuul.d/`, a new project will still
require changes to (at minimum) Zuul's tenant configuration. The fact that
configuration is shared between all projects within a tenant, while powerful,
may also lead to some confusion. All of this probably means that Zuul would not
be quite as self-serve as we'd prefer.

Some bullet points follow.

Good:

- A known quantity
- Feature-rich, integrates with Gerrit
- We have plenty of Python expertise
- Upstream appears active
- Docs aren't completely terrible

Bad:

- YAML to edit in multiple places
- I don't like Ansible

Uncertain:

- I still don't know what the K8s story is like
- How do we implement the equivalent of pipelinelib?
- Will upstream make major architectural changes again?

In conclusion: It's imperfect and I'm not sure I _like_ it, but I
narrowly/weakly think that Zuul v3 would be our most effective near-term choice
for a migration, whether as a temporary solution to be supplanted by Argo in
the fullness of time or as something more permanent.

## Configuration

I spun up Zuul on DigitalOcean droplets under my personal account. I initially
tried to follow [Zuul From Scratch][scratch] on a Debian Buster system. This
proved difficult: I encountered incompatibilities with Java runtimes and
various other dependencies.

Eventually, I fell back to [the docker-compose quick start][quick], as
described in the [initial evaluation task, T218138][T218138], with the addition
of a separate Buster droplet as a test node.

From the quick start's [`docker-compose.yaml`][docker-compose], I use the
following services:

- Gerrit
- MariaDB
- zuul-web
- zuul-executor
- zuul-scheduler
- Zookeeper
- Nodepool launcher
- an Apache log server

Initial configuration files for these services are mounted from
[doc/source/admin/examples][example-conf] in the zuul repo, where they can be
modified in place.

### Zuul

Zuul configuration includes:

- `etc_zuul/zuul.conf`: ini-format config for various services
- `etc_zuul/main.yaml`: YAML tenant config: projects are grouped into
  tenants, which share configuration and jobs
- `etc_zuul/scheduler_logging.conf`: ini-format Python logging config

Finally, docker-compose runs an Ansible playbook to create an empty
`zuul-config` repo on the Gerrit instance, and instructions are provided for
adding pipelines and Ansible playbooks. Zuul configuration changes from the
`zuul-config` repo take effect as soon as they're reviewed and merged in
Gerrit.

`zuul-config` is listed under _config-projects_ in the tenant config in
`etc_zuul/main.yaml`, which means that it runs with elevated privileges.
Normal projects (such as Blubber) and a sort of standard library of Ansible
jobs called [`zuul-jobs`][zuul-jobs] are listed under _untrusted-projects_:

```yaml
- tenant:
    name: example-tenant
    source:
      gerrit:
        config-projects:
          - zuul-config
        untrusted-projects:
          - test1
          - test2
          - blubber
      opendev.org:
        untrusted-projects:
          - zuul/zuul-jobs:
              include:
                - job
```

In `zuul-config`, I have the following structure:

```
brennen@lostsnail-0:~/zuul-config$ tree
.
├── playbooks
│   └── base
│       ├── post-logs.yaml
│       ├── post-ssh.yaml
│       └── pre.yaml
└── zuul.d
    ├── jobs.yaml
    ├── pipelines.yaml
    └── projects.yaml

3 directories, 6 files
```

`zuul-config/playbooks/` contains base Ansible playbooks which are inherited by
all jobs. These playbooks, in turn, apply roles defined in the zuul-jobs repo
mentioned above.

`zuul-config/zuul.d/projects.yaml` contains project definitions. The first uses
a regex to match all projects; the second applies to `zuul-config` itself:

```yaml
- project:
    name: ^.*$
    check:
      jobs: []
    gate:
      jobs: []

- project:
    name: zuul-config
    check:
      jobs:
        - noop
    gate:
      jobs:
        - noop
```

`zuul-config/zuul.d/jobs.yaml` contains job definitions, and specifies the
playbooks for setting up and tearing down SSH keys, copying the project's
source to a node, and stashing the job's logs.

```yaml
- job:
    name: base
    parent: null
    description: |
      The recommended base job.

      All jobs ultimately inherit from this. It runs a pre-playbook
      which copies all of the job's prepared git repos on to all of
      the nodes in the nodeset.

      It also sets a default timeout value (which may be overridden).
    pre-run: playbooks/base/pre.yaml
    post-run:
      - playbooks/base/post-ssh.yaml
      - playbooks/base/post-logs.yaml
    roles:
      - zuul: zuul/zuul-jobs
    timeout: 1800
    nodeset:
      nodes:
        - name: debian-buster
          label: debian-buster
```

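Because every job ultimately inherits from `base`, a project-level job only
needs to state what differs. A minimal sketch, assuming a hypothetical job name
`example-unit-tests` and a hypothetical playbook path (neither is part of this
proof of concept):

```yaml
# Hypothetical job for illustration; the name and playbook path are assumptions.
- job:
    name: example-unit-tests
    # Inherits pre-run, post-run, roles, and nodeset from the base job above.
    parent: base
    description: Run unit tests on a debian-buster node.
    run: playbooks/example-unit-tests.yaml
    # Override the 1800-second default set by the base job.
    timeout: 600
```

As far as I can tell from the Zuul docs, a job with no `parent` falls back to
the tenant's default parent, which is `base` unless configured otherwise, so
the `parent` line is probably optional here.
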
`zuul-config/zuul.d/pipelines.yaml` defines check and gate pipelines:

```yaml
- pipeline:
    name: check
    description: |
      Newly uploaded patchsets enter this pipeline to receive an
      initial +/-1 Verified vote.
    manager: independent
    require:
      gerrit:
        open: True
        current-patchset: True
    trigger:
      gerrit:
        - event: patchset-created
        - event: change-restored
        - event: comment-added
          comment: (?i)^(Patch Set [0-9]+:)?( [\w\\+-]*)*(\n\n)?\s*recheck
    success:
      gerrit:
        Verified: 1
      mysql:
    failure:
      gerrit:
        Verified: -1
      mysql:

- pipeline:
    name: gate
    description: |
      Changes that have been approved are enqueued in order in this
      pipeline, and if they pass tests, will be merged.
    manager: dependent
    post-review: True
    require:
      gerrit:
        open: True
        current-patchset: True
        approval:
          - Workflow: 1
    trigger:
      gerrit:
        - event: comment-added
          approval:
            - Workflow: 1
    start:
      gerrit:
        Verified: 0
    success:
      gerrit:
        Verified: 2
        submit: true
      mysql:
    failure:
      gerrit:
        Verified: -2
      mysql:
```

### Job Logging

Logs are copied to a shared volume using the `upload-logs` role provided in
`zuul-jobs`, and served by an Apache container. `upload-logs` can also handle
SCPing logs to a remote host.

There's an `upload-logs-swift` role for use with [OpenStack's Swift object
store][swift], although I haven't tried using it.

Other log stores would probably be easy enough to support, just by replacing
`upload-logs` with an appropriate playbook.

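As a rough, untested sketch of what such a replacement might look like, the
playbook below pushes the executor's log directory to some other store.
`example-store-cli` and `example-bucket` are hypothetical placeholders, not
real tools; `zuul.executor.log_root` and `zuul.build` are variables Zuul makes
available to playbooks.

```yaml
# Sketch of an alternative playbooks/base/post-logs.yaml.
# "example-store-cli" and "example-bucket" are hypothetical placeholders.
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Upload collected job logs to a hypothetical object store
      command: >-
        example-store-cli upload
        {{ zuul.executor.log_root }}/
        example-bucket/{{ zuul.build }}/
```

This assumes the playbook lives in `zuul-config`, since running commands on the
executor itself should only be possible from a config project.
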
### Nodepool

Nodepool's configuration in `etc_nodepool/nodepool.yaml` includes a list of
static nodes for running jobs. In this case, there's only one:

```yaml
providers:
  - name: static-vms
    driver: static
    pools:
      - name: main
        nodes:
          - name: "167.71.188.58"
            labels: debian-buster
            host-key: "actual-host-key-goes-here"
            # Probably set to false because I couldn't get it to work:
            host-key-checking: false
            python-path: /usr/bin/python3
            username: root
```

### Individual Projects

Individual projects work essentially the same way `zuul-config` does: A
`.zuul.yaml` or a `.zuul.d/` is written to define jobs, which run Ansible
playbooks, and these jobs are added to pipelines for the project.

As I understand it, _all_ of this configuration is shared between the projects
in a tenant, so a project can run playbooks defined in a different project,
for example.

Here's the `.zuul.yaml` I wrote for Blubber:

```
brennen@lostsnail-0:~/blubber$ cat .zuul.yaml
- job:
    name: blubber-build
    run: playbooks/blubber-build.yaml

- job:
    name: blubber-test
    run: playbooks/blubber-test.yaml

- project:
    check:
      jobs:
        - blubber-build
        - blubber-test
    gate:
      jobs:
        - blubber-build
        - blubber-test
```

This in turn references two playbooks:

```
brennen@lostsnail-0:~/blubber$ cat playbooks/blubber-build.yaml
- hosts: all
  environment:
    GOPATH=/root
  tasks:
    - debug:
        msg: Building Blubber.
    - name: Install build and test dependencies
      apt:
        name: "{{ packages }}"
        update_cache: yes
      vars:
        packages:
          - build-essential
          - git
    - make:
        chdir: src/gerrit/blubber

brennen@lostsnail-0:~/blubber$ cat playbooks/blubber-test.yaml
- hosts: all
  tasks:
    - debug:
        msg: Running Blubber tests.
```

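Since job definitions are shared tenant-wide, another project could in
principle build on these jobs rather than redefining them. An untested sketch
of what a `.zuul.yaml` in the `test1` repo might look like; the job name
`test1-build` and the variable `example_build_target` are hypothetical:

```yaml
# Hypothetical .zuul.yaml in the test1 repo (same tenant as blubber).
- job:
    name: test1-build
    # Reuse the job defined in the blubber repo, including its run playbook.
    parent: blubber-build
    vars:
      # Hypothetical variable; the parent playbook would need to read it.
      example_build_target: all

- project:
    check:
      jobs:
        - test1-build
```

As written, `blubber-build` changes into `src/gerrit/blubber`, so a genuinely
reusable parent would need to take that path from a variable (Zuul provides
`zuul.project.src_dir` for the project under test) instead of hardcoding it.
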
## Web Interface

There is one. I had intended to provide screenshots, but I didn't.

That said, most interaction for users would be by way of Gerrit, a model that
our developers are already well familiar with.

## Remaining Problems

We obviously don't want to run CI jobs as root on a continuously reused VM. In
principle, this is very solvable, but there is some work to be done.

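One small step in that direction might be to have Nodepool connect as an
unprivileged account. A sketch, assuming a hypothetical `zuul-worker` user has
already been created on the node with the executor's SSH key authorized (this
does nothing about VM reuse):

```yaml
providers:
  - name: static-vms
    driver: static
    pools:
      - name: main
        nodes:
          - name: "167.71.188.58"
            labels: debian-buster
            host-key: "actual-host-key-goes-here"
            python-path: /usr/bin/python3
            # Assumption: an unprivileged account provisioned ahead of time.
            username: zuul-worker
```

Tasks that install packages, like the `apt` task in the Blubber build playbook,
would then need `become: yes` plus a sudo rule on the node, or the
dependencies would have to be installed on the node in advance.
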
It's probably worth noting that `zuul-jobs` includes roles for building and
uploading Docker images, described in the
[Docker Jobs](https://zuul-ci.org/docs/zuul-jobs/docker-jobs.html),
[Container Roles](https://zuul-ci.org/docs/zuul-jobs/container-roles.html), and
[Container Images](https://zuul-ci.org/docs/zuul-jobs/docker-image.html) sections
of the docs. That last one mentions this:

> The requires: docker-image attribute means that whenever this job (or any
> jobs which inherit from it) run, Zuul will search ahead of the change in the
> dependency graph to find any jobs which produce docker-images and tell this
> job about them. This allows the job to pull images from the intermediate
> registry into the buildset registry.

That sounds pretty neat, but like a lot of other things about Zuul it's
complicated and I don't quite understand it.

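In configuration terms, the mechanism in that quote appears to come down to a
pair of job attributes, `provides` and `requires`. A minimal sketch with
hypothetical job names and playbook paths (the zuul-jobs roles would add the
actual registry handling on top of this):

```yaml
# Hypothetical jobs illustrating the provides/requires relationship.
- job:
    name: example-build-image
    run: playbooks/example-build-image.yaml
    # Declares that this job produces something labelled "docker-image".
    provides: docker-image

- job:
    name: example-use-image
    run: playbooks/example-use-image.yaml
    # Asks Zuul to look ahead in the dependency graph for docker-image providers.
    requires: docker-image
```
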
[T218138]: https://phabricator.wikimedia.org/T218138
[docker-compose]: https://opendev.org/zuul/zuul/src/branch/master/doc/source/admin/examples/docker-compose.yaml
[example-conf]: https://opendev.org/zuul/zuul/src/branch/master/doc/source/admin/examples/
[quick]: https://zuul-ci.org/docs/zuul/admin/quick-start.html
[scratch]: https://zuul-ci.org/docs/zuul/admin/zuul-from-scratch.html
[swift]: https://docs.openstack.org/swift/latest/
[zuul-jobs]: https://opendev.org/zuul/zuul-jobs