pipeline: Directed graph execution model

Our current pipelinelib-based jobs require repos to conform to a number of rigid conventions: assume the repo contains source for only a single application, build the "test" variant, run the "test" variant, build the "production" variant, helm deploy/test, and publish under a single tag name. These jobs also assume that all of these operations must be performed linearly. While this design was sufficient for our very first use cases, its convention-based approach is already proving prohibitively inflexible. For example, teams maintaining repos that contain multiple interrelated applications cannot build and test those applications as independent images; teams wanting to execute multiple test suites would have to wrap them in a single entrypoint and implement their own concurrency should they need it; etc.

Instead of Release Engineering maintaining a new specialized pipeline job for each team that performs only slightly different permutations of the same operations (resulting in duplicated job definitions and a large maintenance burden), we can instead establish a configuration format and interface by which teams provide their own pipeline compositions.

This initial commit in a series of pipeline-related commits implements two fundamental components to support a CI/CD pipeline that can execute any number of user-defined variant build/test/publish/deploy stages and steps in a safely concurrent model: a directed-graph based execution model, and name bindings for stage outputs. The former provides the model for composing stage execution, and the latter provides a decoupled system for defining which outputs each subsequent stage operates upon.

First, an `ExecutionGraph` class that can represent a directed acyclic graph given a number of linearly defined arcs (aka branches/edges). This component allows users to provide the overall execution flow as separate linear processes while still letting parallel branches of the execution graph be scheduled concurrently. Example:

  /* To represent a graph with separate parallel branches like:
   *
   *   a   x
   *    ⇘ ⇙
   *     b
   *    ⇙ ⇘
   *   y   c
   *    ⇘ ⇙
   *     z
   *
   * one only needs to provide each linear execution arc
   */
  def graph = new ExecutionGraph([["a", "b", "c", "z"], ["x", "b", "y", "z"]])

  /* The ExecutionGraph can solve how those arcs intersect and how the
   * nodes can be scheduled with a degree of concurrency that Jenkins
   * allows.
   */
  graph.stack() // => [["a", "x"], ["b"], ["y", "c"], ["z"]]

Second, a set of context classes for managing immutable global and local name/value bindings between nodes in the graph. Effectively this provides a way for pipeline stages to safely and deterministically consume inputs from previous stages along the same branch, and to provide their own outputs for subsequent stages to consume. For example, a stage called "build" that builds a container image will save the image ID in a predetermined local binding called `.imageID`, and a subsequent "publish" stage configured by the user can reference that image by `${build.imageID}`. Once a value is bound to a name, that name cannot be reused; bindings are immutable. Node contexts are only allowed to access namespaces for nodes that precede them in the same branch of the graph, ensuring deterministic behavior during parallel graph branch execution. See the unit tests for `ExecutionContext` for details on expected behavior.
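To make that binding flow concrete, here is a minimal sketch against the `ExecutionContext` API defined in the file below; the two-node graph, the stage names, and the image ID value are purely illustrative:

  def graph = new ExecutionGraph([["build", "publish"]])
  def context = new ExecutionContext(graph)

  // The "build" node saves its output under its own namespace
  // (the image ID shown here is made up for illustration).
  context.ofNode("build")["imageID"] = "6491ac38d3c4"

  // The "publish" node can later resolve that value directly, or by
  // interpolating a user-provided string with the % operator.
  assert context.ofNode("publish").binding("build", "imageID") == "6491ac38d3c4"
  assert (context.ofNode("publish") % 'docker tag ${build.imageID} example-registry/app') ==
    "docker tag 6491ac38d3c4 example-registry/app"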
Put together, these two data structures constitute an execution "stack" of sorts that can be safely mapped to Jenkins Pipeline stages, making use of parallel execution for graph branches. Specifically, the `ExecutionGraph.stack()` method is implemented to yield each set of independent stack "frames" in topological sort order, which can safely be scheduled to run in parallel.

Bug: T210267
Change-Id: Ic5d01bf54c703eaf14434a36f1e2b3e276b48b6f
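As a rough illustration of that mapping (not part of this change), each frame yielded by `stack()` could be turned into a set of parallel branches in a scripted Jenkins Pipeline, along these lines; `runStage` here is a hypothetical helper standing in for whatever work a node performs:

  def context = new ExecutionContext(graph)

  for (def frame in graph.stack()) {
    // Each frame holds nodes whose dependencies have all completed, so its
    // members can run as concurrent Pipeline branches.
    def branches = frame.collectEntries { node ->
      [(node): {
        stage(node) {
          runStage(node, context.ofNode(node))
        }
      }]
    }

    parallel(branches)
  }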
package org.wikimedia.integration

import com.cloudbees.groovy.cps.NonCPS
import org.codehaus.groovy.GroovyException

import org.wikimedia.integration.ExecutionGraph
/**
 * Provides execution context and value bindings to graph nodes. Each node's
 * context (see {@link ofNode()}) provides methods for saving its own values
 * and bindings for accessing values saved by ancestor nodes.
 *
 * These contexts can be used during graph stack evaluation to safely pass
 * values from ancestor nodes to descendent nodes, and to keep nodes from
 * different branches of execution from accessing each other's values.
 */
class ExecutionContext implements Serializable {
  private Map globals = [:]

  private ExecutionGraph graph

  ExecutionContext(executionGraph) {
    graph = executionGraph
  }

  /**
   * Create and return an execution context for the given node.
   */
  NodeContext ofNode(node) {
    new NodeContext(node, graph.ancestorsOf(node))
  }
  /**
   * Returns the names of all values bound by node contexts.
   */
  List getAllKeys() {
    def keys = []

    for (def ns in globals.keySet()) {
      for (def key in globals[ns].keySet()) {
        keys.add("${ns}.${key}")
      }
    }

    keys
  }
  /**
   * Provides an execution context for a single given node that can resolve
   * bindings for values stored by ancestor nodes, set its own values, and
   * safely interpolate user-provided strings.
   *
   * @example
   * Given a graph:
   * <pre><code>
   *   def context = new ExecutionContext(new ExecutionGraph([
   *     ["a", "b", "c", "d", "e", "f"],
   *     ["x", "d", "y", "f"],
   *   ]))
   * </code></pre>
   *
   * @example
   * Values can be bound to any existing node.
   * <pre><code>
   *   context.ofNode("a")["foo"] = "afoo"
   *   context.ofNode("a")["bar"] = "abar"
   *   context.ofNode("b")["foo"] = "bfoo"
   *   context.ofNode("x")["bar"] = "xbar"
   * </code></pre>
   *
   * @example
   * Those same values can be accessed in contexts of descendent nodes, but
   * not in contexts of unrelated nodes.
   * <pre><code>
   *   assert context.ofNode("c").binding("a", "foo") == "afoo"
   *   assert context.ofNode("c").binding("a", "bar") == "abar"
   *   assert context.ofNode("c").binding("b", "foo") == "bfoo"
   *
   *   assert context.ofNode("c").binding("x", "bar") == null
   *
   *   assert context.ofNode("e").binding("a", "foo") == "afoo"
   *   assert context.ofNode("e").binding("a", "bar") == "abar"
   *   assert context.ofNode("e").binding("b", "foo") == "bfoo"
   *   assert context.ofNode("e").binding("x", "bar") == "xbar"
   * </code></pre>
   *
   * @example
   * Leveraging all of the above, user-provided configuration can be safely
   * interpolated.
   * <pre><code>
   *   assert (context.ofNode("c") % 'w-t-${a.foo} ${b.foo}') == "w-t-afoo bfoo"
   *   assert (context.ofNode("c") % 'w-t-${a.bar}') == "w-t-abar"
   *   assert (context.ofNode("x") % 'w-t-${x.bar}') == 'w-t-${x.bar}'
   * </code></pre>
   */
  class NodeContext implements Serializable {
    final VAR_EXPRESSION = /\$\{\w*\.\w+\}/

    final def node
    final Set ancestors

    NodeContext(contextNode, contextAncestors) {
      globals[contextNode] = globals[contextNode] ?: [:]

      node = contextNode
      ancestors = contextAncestors
    }
    /**
     * Binds a value to a name in the globals store under the node's
     * namespace. The value may later be retrieved by any descendent node's
     * context using {@link binding()}.
     */
    @NonCPS
    void bind(String key, value)
      throws NameAlreadyBoundException {

      if (globals[node].containsKey(key)) {
        throw new NameAlreadyBoundException(node: node, key: key)
      }

      globals[node][key] = value
    }
    /**
     * Retrieves a value previously bound using {@link bind()} to this node's
     * context. If the given key is not found under this node's namespace, a
     * {@link NameNotFoundException} is thrown.
     */
    def binding(String key)
      throws NameNotFoundException {

      if (!globals[node].containsKey(key)) {
        throw new NameNotFoundException(ns: node, key: key)
      }

      globals[node][key]
    }
    /**
     * Retrieves a value previously bound using {@link bind()} under the given
     * ancestor node's namespace and name.
     */
    def binding(ancestorNode, String key)
      throws NameNotFoundException, AncestorNotFoundException {

      if (!(ancestorNode in ancestors)) {
        throw new AncestorNotFoundException(ancestor: ancestorNode, node: node)
      }

      if (!globals[ancestorNode].containsKey(key)) {
        throw new NameNotFoundException(ns: ancestorNode, key: key)
      }

      globals[ancestorNode][key]
    }
    /**
     * Returns all objects bound to the given name under any node namespace,
     * as well as the node under which it is found.
     *
     * This should only be used at the end of an execution graph.
     */
    Map getAll(String key) {
      globals.findAll { it.value[key] != null }.collectEntries { [it.key, it.value[key]] }
    }
    /**
     * Operator alias for {@link binding(String)} or, if a "namespace.key" is
     * given, {@link binding(def, String)}.
     */
    def getAt(String key) {
      def keys = key.split(/\./)

      if (keys.size() > 1) {
        if (keys[0] == "") {
          return binding(keys[1])
        }

        return binding(keys[0], keys[1])
      }

      return binding(key)
    }
    /**
     * Interpolates the given string by substituting all symbol expressions
     * with values previously bound by ancestor nodes.
     */
    String interpolate(String str) {
      // NOTE the call to replaceAll does not rely on its sub-matching feature,
      // as Groovy CPS does not implement it correctly, and marking this method
      // as NonCPS causes it to only ever return the first substitution
      str.replaceAll(VAR_EXPRESSION) {
        this[it[2..-2]]
      }
    }
    /**
     * Operator alias for {@link interpolate()}.
     */
    String mod(String str) {
      interpolate(str)
    }

    /**
     * Operator alias for {@link bind()}.
     */
    void putAt(String key, value) {
      bind(key, value)
    }
  }
  class AncestorNotFoundException extends GroovyException {
    def ancestor, node

    String getMessage() {
      "cannot access '${ancestor}.*' values since '${node}' does not follow it in the graph '${graph}'"
    }
  }

  class NameNotFoundException extends GroovyException {
    def ns, key

    String getMessage() {
      "no value bound for '${ns}.${key}'; all bound names are: ${getAllKeys().join(", ")}"
    }
  }
  class NameAlreadyBoundException extends GroovyException {
    def node, key

    String getMessage() {
      "'${node}' already has a value assigned to '${key}'"
    }
  }
}