pipeline: Directed graph execution model
Our current pipelinelib-based jobs require repos to conform to a number
of rigid conventions: assume the repo contains source for only a single
application, build "test" variant, run "test" variant, build
"production" variant, helm deploy/test, publish under a single tag name.
These jobs also assume all of these operations need to be performed
linearly.
While this design was sufficient for our very first use cases, its
convention-based approach is already proving prohibitively inflexible. For
example, teams maintaining repos that contain multiple interrelated
applications cannot build and test these applications as independent
images; teams wanting to execute multiple test suites would have to wrap
them in a single entrypoint and implement their own concurrency should
they need it; etc.
Instead of Release Engineering maintaining a new specialized pipeline
job for each team that performs only slightly different permutations of
the same operations (resulting in duplication of job definitions and a
large maintenance burden), we can instead establish a configuration
format and interface by which teams provide their own pipeline
compositions.
This initial commit in a series of pipeline-related commits implements
two fundamental components to support a CI/CD pipeline that can execute
any number of user-defined variant build/test/publish/deploy stages and
steps in a safely concurrent model: a directed-graph-based execution
model, and name bindings for stage outputs. The former provides the
model for composing stage execution, and the latter provides a decoupled
system for defining which outputs each subsequent stage operates upon.
First, an `ExecutionGraph` class that can represent a directed acyclic
graph given a number of linearly defined arcs (aka branches/edges). This
component allows users to provide the overall execution flow as separate
linear sequences while still allowing parallel branches of the graph to
be scheduled concurrently.
Example:
/* To represent a graph with separate parallel branches like:
 *
 *   a       x
 *    ⇘     ⇙
 *       b
 *    ⇙     ⇘
 *   y       c
 *    ⇘     ⇙
 *       z
 *
 * One only needs to provide each linear execution arc
 */
def graph = new ExecutionGraph([["a", "b", "c", "z"], ["x", "b", "y", "z"]])
/* The ExecutionGraph can resolve how those arcs intersect and how the
 * nodes can be scheduled with whatever degree of concurrency Jenkins
 * allows.
 */
graph.stack() // => [["a", "x"], ["b"], ["y", "c"], ["z"]]
Second, a set of context classes for managing immutable global and local
name/value bindings between nodes in the graph. Effectively this will
provide a way for pipeline stages to safely and deterministically
consume inputs from previous stages along the same branch, and to
provide their own outputs for subsequent stages to consume.
For example, one stage called "build" that builds a container image will
save the image ID in a predetermined local binding called `.imageID`,
and a subsequent "publish" stage configured by the user can reference
that image by `${build.imageID}`.
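For illustration, here is a minimal sketch of that flow using the
classes introduced in this commit (the "build" and "publish" stage names
and the image ID value are hypothetical):

def context = new ExecutionContext(new ExecutionGraph([["build", "publish"]]))

/* the "build" stage binds its output under its own namespace */
context.ofNode("build")["imageID"] = "sha256:f00"

/* the user-configured "publish" stage resolves it via interpolation */
assert (context.ofNode("publish") % 'registry/app@${build.imageID}') ==
       "registry/app@sha256:f00"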
Once a value is bound to a name, that name cannot be reused; bindings
are immutable. Node contexts are only allowed to access namespaces for
nodes that precede them in the same branch of the graph, ensuring
deterministic behavior during parallel graph branch execution. See the
unit tests for `ExecutionContext` for details on expected behavior.
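As a brief sketch of those rules (the node names are illustrative):

def ctx = new ExecutionContext(new ExecutionGraph([["a", "b"], ["x", "b"]]))

ctx.ofNode("a")["foo"] = "afoo"

/* "b" follows "a" on the same branch, so it may read the binding */
assert ctx.ofNode("b")["a.foo"] == "afoo"

/* rebinding the same name throws a NameAlreadyBoundException */
ctx.ofNode("a")["foo"] = "again"

/* "x" is on a parallel branch, so reading "a" values throws an
 * AncestorNotFoundException
 */
ctx.ofNode("x").binding("a", "foo")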
Put together, these two data structures can constitute an execution
"stack" of sorts that can be safely mapped to Jenkins Pipeline stages,
and make use of parallel execution for graph branches. Specifically, the
`ExecutionGraph.stack()` method yields sets of independent stack
"frames" in topological sort order, each of which can safely be
scheduled to run in parallel.
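For example, a hypothetical Jenkinsfile loop mapping stack frames onto
the Jenkins `parallel` step might look like this (`runNode` is an
assumed helper, not part of this change):

for (def frame in graph.stack()) {
  /* nodes within a frame are mutually independent and may run in
   * parallel; the frames themselves must run in order
   */
  parallel(frame.collectEntries { node ->
    [node, { runNode(node) }]
  })
}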
Bug: T210267
Change-Id: Ic5d01bf54c703eaf14434a36f1e2b3e276b48b6f
package org.wikimedia.integration

import com.cloudbees.groovy.cps.NonCPS
import org.codehaus.groovy.GroovyException

import org.wikimedia.integration.ExecutionGraph

/**
 * Provides execution context and value bindings to graph nodes. Each node's
 * context (see {@link ofNode()}) provides methods for saving its own values
 * and bindings for accessing values saved by ancestor nodes.
 *
 * These contexts can be used during graph stack evaluation to safely pass
 * values from ancestor nodes to descendent nodes, and to keep nodes from
 * different branches of execution from accessing each other's values.
 */
class ExecutionContext implements Serializable {
  private Map globals = [:]
  private ExecutionGraph graph

  ExecutionContext(executionGraph) {
    graph = executionGraph
  }

  /**
   * Create and return an execution context for the given node.
   */
  NodeContext ofNode(node) {
    new NodeContext(node, graph.ancestorsOf(node))
  }

  /**
   * Returns the names of all values bound by node contexts.
   */
  List getAllKeys() {
    def keys = []

    // iterate over the key sets explicitly; iterating a Map directly
    // yields Map.Entry objects, which cannot be used to index back into
    // the map
    for (def ns in globals.keySet()) {
      for (def key in globals[ns].keySet()) {
        keys.add("${ns}.${key}")
      }
    }

    keys
  }

  /**
   * Provides an execution context for a single given node that can resolve
   * bindings for values stored by ancestor nodes, set its own values, and
   * safely interpolate user-provided strings.
   *
   * @example
   * Given a graph:
   * <pre><code>
   *   def context = new ExecutionContext(new ExecutionGraph([
   *     ["a", "b", "c", "d", "e", "f"],
   *     ["x", "d", "y", "f"],
   *   ]))
   * </code></pre>
   *
   * @example
   * Values can be bound to any existing node.
   * <pre><code>
   *   context.ofNode("a")["foo"] = "afoo"
   *   context.ofNode("a")["bar"] = "abar"
   *   context.ofNode("b")["foo"] = "bfoo"
   *   context.ofNode("x")["bar"] = "xbar"
   * </code></pre>
   *
   * @example
   * Those same values can be accessed in contexts of descendent nodes, but
   * not in contexts of unrelated nodes.
   * <pre><code>
   *   assert context.ofNode("c").binding("a", "foo") == "afoo"
   *   assert context.ofNode("c").binding("a", "bar") == "abar"
   *   assert context.ofNode("c").binding("b", "foo") == "bfoo"
   *
   *   // "x" does not precede "c", so this throws an
   *   // AncestorNotFoundException
   *   context.ofNode("c").binding("x", "bar")
   *
   *   assert context.ofNode("e").binding("a", "foo") == "afoo"
   *   assert context.ofNode("e").binding("a", "bar") == "abar"
   *   assert context.ofNode("e").binding("b", "foo") == "bfoo"
   *   assert context.ofNode("e").binding("x", "bar") == "xbar"
   * </code></pre>
   *
   * @example
   * Leveraging all of the above, user-provided configuration can be safely
   * interpolated.
   * <pre><code>
   *   assert (context.ofNode("c") % 'w-t-${a.foo} ${b.foo}') == "w-t-afoo bfoo"
   *   assert (context.ofNode("c") % 'w-t-${a.bar}') == "w-t-abar"
   *
   *   // namespaces outside of a node's ancestry cannot be referenced and
   *   // will throw an AncestorNotFoundException
   *   context.ofNode("x") % 'w-t-${a.bar}'
   * </code></pre>
   */
  class NodeContext implements Serializable {
    final VAR_EXPRESSION = /\$\{\w*\.\w+\}/

    final def node
    final Set ancestors

    NodeContext(contextNode, contextAncestors) {
      globals[contextNode] = globals[contextNode] ?: [:]

      node = contextNode
      ancestors = contextAncestors
    }

    /**
     * Binds a value to a name in the globals store under the node's
     * namespace. The value may later be retrieved by any descendent node's
     * context using {@link binding()}.
     */
    @NonCPS
    void bind(String key, value)
      throws NameAlreadyBoundException {

      if (globals[node].containsKey(key)) {
        throw new NameAlreadyBoundException(node: node, key: key)
      }

      globals[node][key] = value
    }

    /**
     * Retrieves a value previously bound using {@link bind()} to this node's
     * context. If the given key is not found under this node's namespace, a
     * {@link NameNotFoundException} is thrown.
     */
    def binding(String key)
      throws NameNotFoundException {

      if (!globals[node].containsKey(key)) {
        throw new NameNotFoundException(ns: node, key: key)
      }

      globals[node][key]
    }

    /**
     * Retrieves a value previously bound using {@link bind()} under the given
     * ancestor node's namespace and name.
     */
    def binding(ancestorNode, String key)
      throws NameNotFoundException, AncestorNotFoundException {

      if (!(ancestorNode in ancestors)) {
        throw new AncestorNotFoundException(ancestor: ancestorNode, node: node)
      }

      if (!globals[ancestorNode].containsKey(key)) {
        throw new NameNotFoundException(ns: ancestorNode, key: key)
      }

      globals[ancestorNode][key]
    }

    /**
     * Returns all objects bound to the given name under any node namespace,
     * as well as the node under which each is found.
     *
     * This should only be used at the end of an execution graph.
     */
    Map getAll(String key) {
      globals.findAll { it.value[key] != null }.collectEntries { [it.key, it.value[key]] }
    }

    /**
     * Operator alias for {@link binding(String)} or, if a "namespace.key" is
     * given, {@link binding(def, String)}.
     */
    def getAt(String key) {
      def keys = key.split(/\./)

      if (keys.size() > 1) {
        if (keys[0] == "") {
          return binding(keys[1])
        }

        return binding(keys[0], keys[1])
      }

      return binding(key)
    }

    /**
     * Interpolates the given string by substituting all symbol expressions
     * with values previously bound by ancestor nodes.
     */
    String interpolate(String str) {
      // NOTE the call to replaceAll does not rely on its submatching
      // feature, as Groovy CPS does not implement it correctly, and marking
      // this method as NonCPS causes it to only ever return the first
      // substitution
      str.replaceAll(VAR_EXPRESSION) {
        this[it[2..-2]]
      }
    }

    /**
     * Operator alias for {@link interpolate()}.
     */
    String mod(String str) {
      interpolate(str)
    }

    /**
     * Operator alias for {@link bind()}.
     */
    void putAt(String key, value) {
      bind(key, value)
    }
  }

  class AncestorNotFoundException extends GroovyException {
    def ancestor, node

    String getMessage() {
      "cannot access '${ancestor}.*' values since '${node}' does not follow it in the graph '${graph}'"
    }
  }

  class NameNotFoundException extends GroovyException {
    def ns, key

    String getMessage() {
      "no value bound for '${ns}.${key}'; all bound names are: ${getAllKeys().join(", ")}"
    }
  }

  class NameAlreadyBoundException extends GroovyException {
    def node, key

    String getMessage() {
      "'${node}' already has a value assigned to '${key}'"
    }
  }
}