Contract-Driven Development – a Real-World Adoption Journey

Key Takeaways

  • Contract-driven development (CDD) can help with microservices integration but needs a new collaboration model among stakeholders.
  • CDD involves writing and storing API specifications before building applications, which is different from usual practices and may take time to adopt.
  • CDD builds on and contributes to a strong test pyramid, which requires changes to test composition, strategy, and application architecture to improve component testability.
  • Start the CDD adoption journey with a few teams first to show its value and identify the necessary changes to your way of working before expanding it to others.
  • Use the learnings and data from the initial proof of concept to persuade leadership and others to help drive large-scale adoption through a smooth onboarding experience with contextualized playbooks and utilities for CDD.

Testing contract compatibility between components independently and early in the SDLC has become necessary for effective microservices integration. Contract-driven development makes for a compelling option to address this problem by leveraging API specifications as executable contracts. However, contract-driven development is not just about choosing API specifications and/or associated tooling. It can take time to warm up to the concepts and internalize the underlying rationale.

In this article, I will share our journey of how a large financial services company adopted contract-driven development. We will be covering the situation on the ground before adoption and initial skepticism, and how we were able to influence change by demonstrating value and eventually getting people to collaborate over API specifications.

Prequel—The story before contract-driven development

Monolith to microservices transition

The company was transitioning from a handful of large monolithic applications to smaller microservices. However, as the number of services grew, our ability to ship features suffered to the point where we had to stop feature development frequently to iron out microservices integration issues.

Contract testing (Consumer-driven contract testing)

We heard about consumer-driven contract testing, which seemed to address our concerns. However, early on, we realized that its mechanics did not suit our development style. The main issue was that the consumer application is built first, generating the API contracts for the provider in the process, and the provider is then built based on those API contracts, which translated to a sequential development style. Many of our microservices followed an API provider-first approach, and almost everyone preferred a technique that allowed parallel development of consumer and provider applications. However, we did pursue the approach in areas where it was possible (for example, UI applications with backend-for-frontend microservices).

OpenAPI specifications

OpenAPI specifications seemed like a great idea to help the teams articulate API designs clearly to each other. It also seemed like this would allow parallel development of provider and consumer applications. OpenAPI specification did help us unambiguously communicate API design, especially data types, required fields, etc., compared to free-form text.

However, we could not achieve parallel development because of how these API specifications were created and shared. Any feature development started with provider application code changes based on which OpenAPI specifications were generated. Consumer applications then leveraged these API specifications to generate API client boilerplate code. This made the entire development process sequential.

Moreover, this process could not prevent integration issues between consumer and provider applications. When API specifications were being generated from the provider application code, we ran into several problems:

  1. Human error—Any missed annotations or metadata by the provider developers would result in an API specification that is not representative of the behavior of the code. Client applications generating code based on these erroneous API specifications were incompatible with the provider applications.
  2. Non-standard distribution mechanism—API specifications were shared over email and documentation sites. This led to several versions of the API specification without a single source of truth.
  3. Additional burden—With consumer-driven contract testing happening in parallel in pockets, some teams had to maintain both the API contract files and generate/maintain OpenAPI specification files. This was especially a concern since there was no additional perceived benefit for the developers on the ground.
  4. Discoverability—Since each provider team owned their API specification and had their respective ways of sharing the same, consumer teams did not have a uniform experience in accessing the correct API specifications. There were talks about adopting an API management tool to address this. However, we realized that it would not solve the other pain points like human error and the added burden of additional tooling and process.

In all of the above issues, OpenAPI specification was not the problem. API specifications by themselves only help us articulate API behavior. Just generating API specifications from code or vice versa does not amount to an adoption strategy. We were missing standardized mechanisms to leverage OpenAPI specifications effectively.

In short, both consumer-driven contract testing and attempting OpenAPI specification adoption (without the necessary platforming) had not moved the needle in a big way for the organization in terms of resolving contract-compatibility issues to help ship features effectively. 

Starting the contract-driven development adoption journey

In our search for an alternative solution, we wanted to try contract-driven development because it seemed to satisfy our initial criteria:

  1. Parallel development of provider and consumer applications
  2. API specification is the API contract (instead of two separate artifacts)
  3. An automated technique, other than one based on code generation, to ensure providers were adhering to the contract
  4. More emphasis on API design and promoting collaboration among teams in the process

However, the teams were also skeptical about contract-driven development because it again involved API specifications and contract testing, both of which they had already tried and had seen a low return on investment. Still, addressing these concerns was a great starting point for getting the teams started on contract-driven development. We felt the most convincing way of achieving this would be through a real-world example in their context and taking it to production.

First API specification as executable contract

To gain more buy-in, we set out to introduce contract-driven development to just a handful of teams working on a feature that cut across two or three microservices and frontend components.

Authoring API specifications

None of these teams had written an OpenAPI specification by hand. They had always generated them from the code and found it difficult to author an API with raw OpenAPI YAML. However, once we discovered the extensive tooling available, the teams started making progress. Most people settled on one tool of choice, and running a containerized version of it on their local machines meant they could always quickly validate the syntax and browse through endpoints visually. The other tool that appealed to many was Stoplight Studio. After that, the teams went on to discover several plugins for their favorite IDEs, and then there was no stopping them.

API design

We now got the consumer and provider teams to jointly discuss the changes for the new feature over the OpenAPI specification file. This was the eye-opening moment for the teams involved, who were collaborating on API specifications. Examples:

  1. The consumer team voiced their needs for aspects like pagination, filtering, etc., which had not been intuitive enough in the earlier design, and the provider team made the necessary changes.
  2. The provider team explained the limitations in building a sub-100 ms synchronous API call that, in turn, had to orchestrate several API calls to its dependencies. Eventually, we reached a consensus with the consumer team to have a callback mechanism.
  3. Architecture and security stakeholders suggested that the provider application return appropriate HTTP error codes to avoid enumeration attacks.

In short, the outcome of several discussions was fully captured in OpenAPI, and there was no ambiguity. Seeing the benefits of this discussion dispelled the skepticism around starting with authoring API specifications.

Storing API specifications

Now that we had created our first API specification by hand, the question on everyone’s mind was where to store it. I suggested creating a central contract repository to store the same. We saw some interesting questions and also some pushback on this.

Provider application teams, who historically owned these API specifications (because they generated them from their code), wanted to continue the same practice. In their opinion, it made sense to keep the API specifications in the same repository as the code for the microservice implementing it and then publish the API specification as a build artifact to a common location for discoverability. However, from a contract-driven development point of view, there are downsides to this approach.

  1. Provider bias—Since the provider application teams have complete control over these API specifications, they may not always see the necessity to involve consumer teams before updating API specifications. The consumer teams may not even be made aware of the changes (even after the fact), because of which API design changes may surprise them.
  2. Not collaboration friendly—Consumer applications have to raise pull/merge requests to multiple repositories (for each provider dependency) to propose changes to their API specifications. The same difficulty applies to stakeholders like solution architects, etc., who may work across multiple teams. This can discourage participation from parties other than provider application developers in the API design and specification authoring process.

Demonstrating the value of central contract repository

A central contract repository (storing API specifications across the organization in a single version control repository), in contrast to the above approach, serves as a common meeting point for consumer teams, provider teams, and other stakeholders while acting as the single source of truth for API specifications across the board. To convince people of these benefits, we created a central contract repository on which we could experiment.

Figure 1: Pre-merge checks for central contract repository

  1. All direct commits to the master branch were disallowed
  2. Any change had to be proposed through a pull/merge request only
  3. Each merge request went through three pre-merge checks
    • Automated API style guide/Linter
    • Backward compatibility checks with Specmatic to prevent breaking changes
    • A manual review prior to merge to allow stakeholders to share their opinions 

Example: Pre-merge checks implemented with GitHub Actions.

Let us look at each of these pre-merge checks in detail.

  1. Linter stage ensured that all API specifications adhered to organization-wide practices, style guides, and coding standards for API specifications. This made onboarding easy for team members, and the experience was consistent because all the guidelines were in place. Example: We started with a basic Spectral configuration that extended the standard OpenAPI 3 ruleset. We made minor changes to severity levels, and then added custom rules to codify our style guide.

  2. Backward compatibility (#nocode) checks were now possible at the earliest point where a change could be introduced. The Specmatic backward compatibility check was added as the second step in the pre-merge process to prevent breaking changes from being introduced, even by accident. Example: A build failure prevented the merge because a breaking change was being proposed as part of the pull request.

  3. Manual review, as the last step, ensured that a sufficient number of people representing all stakeholders reviewed the proposed change. The API specifications pertaining to each team/service were organized under folders with naming conventions similar to how we organize code under packages. These folders were seeded with a JSON configuration listing the reviewers. Anyone in the organization could raise a pull/merge request to propose changes to API specifications under any team. However, the pre-merge checks enabled the merge button only after all reviewers listed in the configuration for the folder where changes were being proposed had submitted their reviews. This way, even though the central contract repository is shared across all teams, we could control the minimum list of reviews required at a folder level. Example: Central contract repository folder structure and specific collaborators for accounts service.
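The folder-level review rule from the last step can be illustrated with a minimal Python sketch. The folder names, reviewer names, and JSON layout below are hypothetical, and the real pre-merge check may well have been implemented differently; the sketch only shows the idea that a merge is allowed once every reviewer configured for the touched folders has approved.

```python
import json

# Hypothetical reviewer configuration, keyed by folder. The folder names,
# reviewer names, and JSON layout are illustrative assumptions.
REVIEWERS_BY_FOLDER = json.loads("""
{
  "com/example/accounts": {"reviewers": ["alice", "bob"]},
  "com/example/products": {"reviewers": ["carol"]}
}
""")


def required_reviewers(changed_files, reviewers_by_folder):
    """Collect the reviewers of every configured folder that the change touches."""
    required = set()
    for path in changed_files:
        for folder, config in reviewers_by_folder.items():
            if path.startswith(folder + "/"):
                required.update(config["reviewers"])
    return required


def merge_allowed(changed_files, approvals, reviewers_by_folder):
    """Allow the merge only once every required reviewer has approved."""
    required = required_reviewers(changed_files, reviewers_by_folder)
    return required <= set(approvals)
```

Calling `merge_allowed(["com/example/accounts/accounts.yaml"], ["alice"], REVIEWERS_BY_FOLDER)` would keep the merge blocked until the second configured reviewer also approves. Approvals from people outside the configured list do not hurt, but they do not count towards the minimum either.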

Re-usability also became practical because all API specifications were in a single location. We could extract common components such as headers or sections of schema to reduce duplication. Example: Headers and security schemes defined in common.yaml referenced in products.yaml 

At this point, there was enough evidence of collaboration (pull/merge request review comments, etc.) and the ability to prevent breaking changes in API design (through automated backward compatibility checks), which convinced the teams to start pushing more of their API specifications into this location.
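To make the idea of an automated backward compatibility check concrete, here is a deliberately simplified Python sketch. It is not how Specmatic works internally; it compares two versions of a made-up, flattened description of a single operation (field names mapped to type names) and applies a few common-sense rules. The dictionary layout and the field names are assumptions for illustration only.

```python
def breaking_changes(old, new):
    """List breaking changes between two versions of a simplified operation description."""
    problems = []

    # Request side: consumers written against the old contract must still be accepted.
    old_req, new_req = old["request"], new["request"]
    for field in sorted(set(new_req["required"]) - set(old_req["required"])):
        problems.append(f"request: new required field '{field}'")
    for field, old_type in sorted(old_req["properties"].items()):
        new_type = new_req["properties"].get(field)
        if new_type is not None and new_type != old_type:
            problems.append(f"request: field '{field}' changed type {old_type} -> {new_type}")

    # Response side: consumers may rely on every field the old contract promised.
    old_res, new_res = old["response"], new["response"]
    for field, old_type in sorted(old_res["properties"].items()):
        new_type = new_res["properties"].get(field)
        if new_type is None:
            problems.append(f"response: field '{field}' removed")
        elif new_type != old_type:
            problems.append(f"response: field '{field}' changed type {old_type} -> {new_type}")
    return problems


# Hypothetical old and proposed versions of one operation.
OLD = {
    "request": {"required": ["accountId"], "properties": {"accountId": "integer"}},
    "response": {"properties": {"balance": "number", "currency": "string"}},
}
PROPOSED = {
    "request": {
        "required": ["accountId", "branchCode"],
        "properties": {"accountId": "integer", "branchCode": "string"},
    },
    "response": {"properties": {"balance": "string"}},
}

if __name__ == "__main__":
    for problem in breaking_changes(OLD, PROPOSED):
        print(problem)
```

A pre-merge check built on this idea would fail the build whenever the returned list is non-empty. Adding an optional request field, by contrast, produces no findings in this sketch, which matches the intuition that existing consumers are unaffected by it.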

API specifications as executable contracts

Now that we had our API specification stored in the central contract repository, it was time to enforce it on the consumer and provider teams.

Consumer application—API specification as mock

Contract compatibility is enforced on the consumer side by having the consumer applications interact with an emulation of the provider application based on the commonly agreed API specification (contract as stub). How effectively we identify potential compatibility issues with this mechanism depends on the thoroughness of consumer API tests.

However, our teams had no such tests. We had unit tests, and the next level of tests we wrote ran in an integrated environment in which all the dependent services were also deployed.

Isolating the application from its dependencies was our first step. On consumer applications built with Spring, we leveraged profiles to inject appropriate stubs, in-memory databases, queues, etc. On NodeJS applications, we achieved similar isolation with libraries like node-config. We generated OpenAPI specifications (by intercepting and recording traffic in test and/or staging environments with Specmatic proxy mode) for HTTP dependencies that did not already have one and pushed them to the central contract repository. Based on these API specifications, we could then emulate associated HTTP dependencies with Specmatic “contract as stub” server and set expectation data. We made appropriate changes in property files/configurations to run consumer applications independent of other services in local developer machines.
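The profile-based isolation described above can be sketched in a language-neutral way. The following Python snippet is only an analogy for what Spring profiles or node-config provide; the profile names, setting keys, environment variable, and URLs are all hypothetical. The point is that the application code asks for a setting, and the active profile decides whether it gets a local stub and an in-memory database or the real dependencies.

```python
import os

# Hypothetical settings per profile. The "component-test" profile points the
# application at a locally running stub and an in-memory database.
PROFILES = {
    "component-test": {
        "accounts_api_url": "http://localhost:9000",  # assumed local stub address
        "database_url": "sqlite:///:memory:",
    },
    "staging": {
        "accounts_api_url": "https://accounts.staging.example.com",
        "database_url": "postgresql://staging-db.example.com/app",
    },
}


def load_settings(profile=None):
    """Return the settings for the given profile, or for APP_PROFILE if none is given."""
    profile = profile or os.environ.get("APP_PROFILE", "component-test")
    if profile not in PROFILES:
        raise ValueError(f"unknown profile: {profile}")
    return PROFILES[profile]
```

With this kind of wiring, running the application for component tests on a developer machine needs nothing beyond the stub server and the application itself.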

Writing API component tests was now a breeze compared to earlier because we had successfully eliminated the need for a complex environment setup on local machines. Each team chose a component testing tool that best suited their needs (JavaScript UI components leveraged Karma, and some microservices teams used RestAssured). The main criterion was that irrespective of the testing framework, the basic anatomy of an API component test should be satisfied.

Figure 2: Consumer component test setup

Running component tests in CI was our next logical step. This was now a trivial task. Because Specmatic is a platform- and programming-language-agnostic executable, the same “contract as stub” setup running on local developer machines could now be leveraged in CI as well. Since Specmatic was pulling the API specifications from the central contract repository in both places (local dev env and CI), the component test setup was consistent across these two environments.

Figure 3: Consumer component test setup in local and CI environments

Consistent component test setup in local and CI with Specmatic “contract as stub” boosted confidence among consumer application developers because they could be sure that component tests that were successful on local machines would work in the CI also. And conversely, any test failures on CI could be replicated easily on their local machines, making it easy to resolve.

Provider applications—API specifications as “contract as tests”

Provider applications also had issues similar to their consumer counterparts and had to be made component testable. Once they got past this point, they could run their API specifications from the central contract repository as contract tests against their application.

Consumer and provider terminology is relative. Example: Application B can be a provider to application A and also be a consumer of application C. So, in this case, application B needs to run Specmatic “contract as test” based on its own API specification, which it has shared with application A, and isolate itself from its dependencies by running Specmatic “contract as stub” based on the API specifications of its dependencies.

Figure: Application B runs “contract as test” (API spec of B) on its provider side and “contract as stub” (API spec of dependencies of B) on its dependency side

This was an important realization for the teams, who could now see how Specmatic was able to help them on both fronts (as a test and as a stub).

With “contract as tests” running, we now wanted to get the API component tests running for the provider application. At this point, the provider application developers raised an important question: Why write “API tests” when we are running API specifications as “contract as tests”?

While “contract as test” verifies the API signature of the provider application, “API component tests” validate the application’s logic. Example: consider an API for a calculator that accepts two integer operands and an enum for the operation, and returns another integer as a result. Here the “contract as tests” are concerned with verifying that the application implementing the said API accepts two integers and one of the operation enum values, and that the result is also an integer. We are not verifying the correctness of the calculation itself as long as the contract is met (data types, schema, HTTP status, etc., are satisfied).
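The calculator example can be sketched in a few lines of Python. This is an illustration of the distinction only, not of how Specmatic generates or runs contract tests; the operation names and the deliberately buggy provider are made up. The signature-level check passes for a provider whose arithmetic is wrong, while a logic-level check catches the bug.

```python
from itertools import product

OPERATIONS = ("ADD", "SUBTRACT", "MULTIPLY")  # hypothetical enum values


def is_integer(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def buggy_calculator(left, right, operation):
    """Honours the signature but always subtracts, whatever the operation."""
    if operation not in OPERATIONS:
        raise ValueError(f"unsupported operation: {operation}")
    return left - right


def contract_test(provider):
    """Check only the signature: integer operands and each enum value in, an integer out."""
    for left, right, operation in product((0, 7), (-3, 2), OPERATIONS):
        if not is_integer(provider(left, right, operation)):
            return False
    return True


def component_test(provider):
    """Check the logic: the results must be arithmetically correct."""
    return (
        provider(2, 3, "ADD") == 5
        and provider(2, 3, "SUBTRACT") == -1
        and provider(2, 3, "MULTIPLY") == 6
    )
```

Here `contract_test(buggy_calculator)` succeeds because every response is an integer, whereas `component_test(buggy_calculator)` fails on the very first assertion. Both kinds of test are needed, and each answers a different question.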

On the other hand, “API/component tests” are concerned with verifying the calculation. While we can argue that API signature is also being exercised during the “API component test,” it is important to note that the API specification is not involved here, and thereby the consumer applications’ interests are not represented. Similarly, when this calculator API specification is being leveraged as “contract as stub” by the consumer, the stub server, by default, returns randomly auto-generated values matching the datatype of the result. The consumer application team can, however, provide an expectation to the stub server to return specific results to input combinations that are correct mathematically in addition to having the right datatype as per the calculator API spec. 
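The stub behavior described above can be mimicked with a small Python class. It is a toy model written for this example and not Specmatic’s implementation: by default it returns a random integer, which satisfies the contract’s data type, and it returns a specific value only when the consumer has registered an expectation for that input. The class name, method names, and value range are hypothetical.

```python
import random

OPERATIONS = ("ADD", "SUBTRACT", "MULTIPLY")  # hypothetical enum values


def _is_integer(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


class CalculatorStub:
    """Toy stand-in for a provider, driven only by the agreed signature."""

    def __init__(self, seed=None):
        self._random = random.Random(seed)
        self._expectations = {}

    def _check_request(self, left, right, operation):
        if not (_is_integer(left) and _is_integer(right) and operation in OPERATIONS):
            raise ValueError("request does not match the contract")

    def expect(self, left, right, operation, result):
        """Register a specific result; in this sketch it must itself satisfy the contract."""
        self._check_request(left, right, operation)
        if not _is_integer(result):
            raise ValueError("expected result does not match the contract")
        self._expectations[(left, right, operation)] = result

    def calculate(self, left, right, operation):
        """Return the registered result if any, else a random integer."""
        self._check_request(left, right, operation)
        key = (left, right, operation)
        if key in self._expectations:
            return self._expectations[key]
        return self._random.randint(-1000, 1000)
```

A consumer test that only cares about wiring can use the random default, while a test that displays the sum to the user would first call `stub.expect(2, 3, "ADD", 5)`. Rejecting expectations that contradict the contract keeps the consumer’s test data from drifting away from what the provider has agreed to return.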

Understanding this difference helped them gain important insights about the purpose of each type of test and where each belongs in the overall test pyramid. One interesting learning was that the number of API component tests reduced drastically, since many of the aspects they covered were already verified by the contract tests.

Figure 4: Test pyramid

Also, now that we had both “contract as tests” and “API component tests”, we had to sequence them in our CI pipeline. We ran the “contract as tests” first to ensure that the API signature was in line with the API specification for this application before proceeding with the “API component tests,” which verified logic.
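The sequencing amounts to a fail-fast pipeline, which can be sketched as follows. The stage names mirror the text, but the runner itself is a hypothetical illustration and not tied to any particular CI product.

```python
def run_stages(stages):
    """Run (name, check) stages in order and stop at the first failure.

    Returns the names of the stages that were executed and whether all passed.
    """
    executed = []
    for name, check in stages:
        executed.append(name)
        if not check():
            return executed, False
    return executed, True


if __name__ == "__main__":
    # Hypothetical checks; real ones would invoke the respective test suites.
    stages = [
        ("contract as tests", lambda: False),
        ("API component tests", lambda: True),
    ]
    print(run_stages(stages))
```

When the contract stage fails, the component tests never run, so a signature mismatch is reported on its own rather than being buried among logic failures that it would cause downstream.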

Figure 5: Provider contract and component tests in local and CI environments

Early identification of contract compatibility issues

With the above local and continuous integration environment setup, both consumer and provider application teams were able to identify contract compatibility issues early in their development cycle. This translated to a stable staging environment; the teams could now ship features independently while ensuring that when these components are deployed together, they will work well with each other.

Figure 6: Identifying contract compatibility issues early with contract-driven development

Wider adoption—the long haul

These initial teams that participated in the proof-of-concept for contract-driven development at this point had become strong proponents of the approach and were now in a position to even guide other teams to champion the effort. However, this did not immediately translate to the entire organization embracing the idea.

To convince other teams to participate, we had to extract data from the early adopter teams’ journey to show proof of value addition. For some of these metrics, such as issue resolution time and bug leakage, it took time for meaningful patterns to emerge and to correlate and compare them with baseline data.

Another area that required significant time and effort investment was making the adoption journey as seamless as possible through playbooks and utilities. With this, we saw an increase in uptake as the time for onboarding reduced. We also had to constantly watch for any regression in CDD adoption and add checks to prevent the same.

While the overall adoption journey may have taken a few quarters, the slow pace had its advantages. We experimented with the concepts and contextualized the approach before a wider rollout.