Our current pipelinelib-based jobs require repos to conform to a number
of rigid conventions: assume the repo contains source for only a single
application, build the "test" variant, run the "test" variant, build
the "production" variant, helm deploy/test, and publish under a single
tag name. These jobs also assume that all of these operations must be
performed linearly.
While this design was sufficient for our very first use cases, its
reliance on convention is already proving prohibitively inflexible. For
example, teams maintaining repos that contain multiple interrelated
applications cannot build and test these applications as independent
images; teams wanting to execute multiple test suites would have to
wrap them in a single entrypoint and implement their own concurrency
should they need it; etc.
Instead of Release Engineering maintaining a new specialized pipeline
job for each team that performs only slightly different permutations of
the same operations (resulting in duplicated job definitions and a
large maintenance burden), we can establish a configuration format and
interface by which teams provide their own pipeline compositions.
This initial commit in a series of pipeline-related commits implements
two fundamental components to support a CI/CD pipeline that can execute
any number of user-defined variant build/test/publish/deploy stages and
steps in a safely concurrent model: a directed-graph-based execution
model, and name bindings for stage outputs. The former provides the
means of composing stage execution, and the latter provides a decoupled
system for defining which outputs each subsequent stage operates upon.
First, an `ExecutionGraph` class that can represent a directed acyclic
graph given a number of linearly defined arcs (aka branches/edges).
This component allows users to express the overall execution flow as
separate linear processes while still allowing parallel branches of the
graph to be scheduled concurrently.
Example:
/* To represent a graph with separate parallel branches like:
*
* a x
* ⇘ ⇙
* b
* ⇙ ⇘
* y c
* ⇘ ⇙
* z
*
 * One only needs to provide each linear execution arc
*/
def graph = new ExecutionGraph([["a", "b", "c", "z"], ["x", "b", "y", "z"]])
/* The ExecutionGraph resolves how those arcs intersect and how the
 * nodes can be scheduled with whatever degree of concurrency Jenkins
 * allows.
*/
graph.stack() // => [["a", "x"], ["b"], ["y", "c"], ["z"]]
Second, a set of context classes for managing immutable global and local
name/value bindings between nodes in the graph. Effectively this will
provide a way for pipeline stages to safely and deterministically
consume inputs from previous stages along the same branch, and to
provide their own outputs for subsequent stages to consume.
For example, one stage called "build" that builds a container image
will save the image ID in a predetermined local binding called
`.imageID`, and a subsequent "publish" stage configured by the user can
reference that image as `${build.imageID}`.
Once a value is bound to a name, that name cannot be reused; bindings
are immutable. Node contexts are only allowed to access namespaces for
nodes that precede them in the same branch of the graph, ensuring
deterministic behavior during parallel graph branch execution. See the
unit tests for `ExecutionContext` for details on expected behavior.
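To illustrate, here is a rough sketch of the intended usage (the method
names `ofNode`, `bind`, and `interpolate` are illustrative only and may
differ from the final API):
def context = new ExecutionContext(graph)
// the "build" node binds its output under its own namespace
context.ofNode("build").bind("imageID", "a4df8054c09b")
// a downstream "publish" node on the same branch can dereference it
context.ofNode("publish").interpolate('${build.imageID}') // => "a4df8054c09b"
// rebinding "imageID", or reading from a node that is not an ancestor
// in the graph, would throw rather than return a nondeterministic value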
Put together, these two data structures can constitute an execution
"stack" of sorts that can be safely mapped to Jenkins Pipeline stages
and make use of parallel execution for graph branches. Specifically,
the `ExecutionGraph.stack()` method is implemented to yield each set of
independent stack "frames" in topological sort order, which can safely
be scheduled to run in parallel.
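For illustration, such a mapping might look roughly like the following
in a scripted Jenkins Pipeline (`runStage` and `ofNode` stand in for
hypothetical steps/methods; they are not part of this commit):
def context = new ExecutionContext(graph)
graph.stack().each { frame ->
  // nodes within a frame are mutually independent, so each frame can
  // be dispatched as one set of parallel Jenkins stages
  parallel(frame.collectEntries { node ->
    [(node): { stage(node) { runStage(node, context.ofNode(node)) } }]
  })
}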
Bug: T210267
Change-Id: Ic5d01bf54c703eaf14434a36f1e2b3e276b48b6f