
Securing a Kafka Cluster in Kubernetes Using Strimzi

Key Takeaways

  • Strimzi simplifies the deployment of a Kafka cluster to a Kubernetes cluster.
  • Strimzi configuration lets you secure Kafka communications and provide user/topic RBAC management in a declarative way.
  • Debezium Server provides attributes to connect to a secured Kafka cluster.
  • Debezium Embedded can also be used, since Strimzi creates Kubernetes Secrets with the required credentials that any Kubernetes deployment can read.
  • By default, Kubernetes doesn’t encrypt Secrets at rest; you need to configure encryption to protect them against attacks.

In part 3 of this series, we learned about the dual-writes problem and how to solve it using Change Data Capture patterns, specifically using Debezium to read changes made in the database (through its transaction log) and propagate them to a Kafka topic.

In part 4 of this series, we moved the example one step forward, going from running the application on the local development machine to running it on Kubernetes (the production environment). We relied on Strimzi to deploy and configure Kafka and Debezium in the Kubernetes cluster.

But we skipped one important topic at that time to keep things simple, even though it is essential: security.

  • How to secure the MySQL instance without having the username/password directly hardcoded in the deployment file.
  • How to add authentication to the Kafka cluster using Strimzi.
  • How to configure Debezium to authenticate against Kafka and MySQL instances securely.

In this article, we’ll answer all these questions by taking the application developed in the previous article (using the Debezium Server approach) and securing it.

Kubernetes

We need a Kubernetes cluster with Strimzi installed. We explained this in part 4 of this series; if you are reusing it, you first need to delete the application, the MySQL database, the Kafka cluster, and the Debezium instance.

IMPORTANT: You only need to run the following steps IF you still have the cluster from part 4. If you already deleted it, skip the deletion commands and continue with the cluster creation below.

Run the following commands in a terminal window to delete them:

kubectl delete deployment movie-plays-producer-debezium-server -n kafka
kubectl delete service movie-plays-producer-debezium-server -n kafka
kubectl delete -f mysql-deployment.yaml -n kafka
kubectl delete -f debezium-kafka-connector.yaml -n kafka
kubectl delete -f debezium-kafka-connect.yaml -n kafka
kubectl delete -f kafka.yaml -n kafka

IMPORTANT: You only need to run the following step if you don’t have a Kubernetes cluster.

If you have already destroyed the cluster, follow the quick instructions to create a new one. In a terminal window, run these commands:

minikube start -p strimzi --kubernetes-version='v1.22.12' --vm-driver='virtualbox' --memory=12096 --cpus=3

kubectl create namespace kafka

kubectl create -f 'https://strimzi.io/install/latest?namespace=kafka' -n kafka

Validate the operator installation by running the following command:

kubectl get pods -n kafka

NAME                                        READY   STATUS    RESTARTS   AGE
strimzi-cluster-operator-597d67c7d6-ms987   1/1     Running   0          4m27s

Wait until the operator is running and ready.

At this point, we can start installing all the components with authentication and authorization instead of anonymous access.

MySQL

In the previous article, we deployed the MySQL instance, hardcoding the username/password in the deployment file as an environment variable:

env:
    - name: MYSQL_ROOT_PASSWORD
      value: alex
    - name: MYSQL_DATABASE
      value: moviesdb
    - name: MYSQL_USER
      value: alex
    - name: MYSQL_PASSWORD
      value: alex

Let’s create a Kubernetes Secret to store this sensitive data. Data in a Kubernetes Secret manifest must be encoded in Base64 format. The alex string encoded in Base64 is YWxleA==.

To generate this value, run the following command:

echo -n 'alex' | base64
YWxleA==

Create the mysql-secret.yaml file with the secrets set:

apiVersion: v1
kind: Secret
metadata:
 name: mysqlsecret
type: Opaque
data:
 mysqlrootpassword: YWxleA==
 mysqluser: YWxleA==
 mysqlpassword: YWxleA==

And apply it to the cluster:

kubectl apply -f mysql-secret.yaml -n kafka

Then update the MySQL deployment file to read the values from the Secret created in the previous step, using a secretKeyRef inside valueFrom instead of a literal value:

apiVersion: v1
kind: Service
metadata:
 name: mysql
 labels:
   app: mysql
spec:
 ports:
   - port: 3306
 selector:
   app: mysql
 clusterIP: None
---
apiVersion: apps/v1
kind: Deployment
metadata:
 name: mysql
 labels:
   app: mysql
spec:
 selector:
   matchLabels:
     app: mysql
 strategy:
   type: Recreate
 template:
   metadata:
     labels:
       app: mysql
   spec:
     containers:
     - image: mysql:8.0.30
       name: mysql
       env:
       - name: MYSQL_ROOT_PASSWORD
         valueFrom:
           secretKeyRef:
             key: mysqlrootpassword
             name: mysqlsecret
       - name: MYSQL_DATABASE
         value: moviesdb
       - name: MYSQL_USER
         valueFrom:
           secretKeyRef:
             key: mysqluser
             name: mysqlsecret
       - name: MYSQL_PASSWORD
         valueFrom:
           secretKeyRef:
             key: mysqlpassword
             name: mysqlsecret
       ports:
       - containerPort: 3306
         name: mysql

In each secretKeyRef section, we specify the name of the Secret that stores the value (in this case mysqlsecret, as defined in the mysql-secret.yaml file) and the key to read from it.
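
If you want to double-check what ends up stored in the Secret, you can read one key back and decode it; this is a quick sanity check using standard kubectl and base64 invocations, with the key names from the mysql-secret.yaml file above:

kubectl get secret mysqlsecret -n kafka -o jsonpath='{.data.mysqluser}' | base64 -d
alex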

Deploy the MySQL instance into the Kubernetes cluster:

kubectl apply -f mysql-deployment.yaml -n kafka

We can validate that the secrets are injected correctly by listing the environment variables inside the container. First, let’s get the Pod name:

kubectl get pods -n kafka
NAME                                        READY   STATUS    RESTARTS   AGE
mysql-7888f99967-4cj47                      1/1     Running   0          90s

And then run the export command inside the container by executing the following commands in a terminal window:

kubectl exec -n kafka -ti mysql-7888f99967-4cj47 -- /bin/bash

bash-4.4# export
declare -x GOSU_VERSION="1.14"
declare -x HOME="/root"
declare -x HOSTNAME="mysql-7888f99967-4cj47"
declare -x KUBERNETES_PORT="tcp://10.96.0.1:443"
declare -x KUBERNETES_PORT_443_TCP="tcp://10.96.0.1:443"
declare -x KUBERNETES_PORT_443_TCP_ADDR="10.96.0.1"
declare -x KUBERNETES_PORT_443_TCP_PORT="443"
declare -x KUBERNETES_PORT_443_TCP_PROTO="tcp"
declare -x KUBERNETES_SERVICE_HOST="10.96.0.1"
declare -x KUBERNETES_SERVICE_PORT="443"
declare -x KUBERNETES_SERVICE_PORT_HTTPS="443"
declare -x MYSQL_DATABASE="moviesdb"
declare -x MYSQL_MAJOR="8.0"
declare -x MYSQL_PASSWORD="alex"
declare -x MYSQL_ROOT_PASSWORD="alex"
declare -x MYSQL_SHELL_VERSION="8.0.30-1.el8"
declare -x MYSQL_USER="alex"
declare -x MYSQL_VERSION="8.0.30-1.el8"
declare -x OLDPWD
declare -x PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
declare -x PWD="/"
declare -x SHLVL="1"
declare -x TERM="xterm"

Now you can exit the container:

exit

The MySQL database credentials are now configured using a Kubernetes Secret, which is much better than setting them in the deployment file. The other part to update is the application as it now needs to read the credentials from the Secret instead of having them statically set in the configuration file.

Movie Plays Producer Debezium

The database username and password are hardcoded in the application.properties file. It would be better if the application could be configured automatically with username and password set in the Kubernetes Secret when deployed to Kubernetes.

One way to do this would be to inject the secrets as environment variables into the application Pod, as we did in the MySQL deployment. For example, in the case of the password, the env part of the deployment file would be:

- name: MYSQL_PASSWORD
  valueFrom:
    secretKeyRef:
      key: mysqlpassword
      name: mysqlsecret

Now update the application.properties file to set the password value from the environment variable:

%prod.quarkus.datasource.password=${mysql-password}

This works, but storing secrets as environment variables isn’t the most secure option, as they can easily be read by anyone able to list the environment variables.
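
To see how easy that is, anyone with exec access to a Pod can dump its environment variables; for instance, against the MySQL Pod deployed earlier (a quick illustration; the deploy/mysql shorthand simply picks one of the Deployment’s Pods):

kubectl exec -n kafka deploy/mysql -- env | grep MYSQL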

Quarkus includes the kubernetes-config extension that allows the application to read Kubernetes ConfigMaps and Secrets directly from the Kubernetes API server. This way, secrets are securely transmitted from the Kubernetes cluster to the application memory without any middle step like materializing them as environment variables or mounting them as volumes.

Kubernetes Config Extension

The first thing to do is register the kubernetes-config extension. Open the pom.xml file and add the following dependency:


<dependency>
  <groupId>io.quarkus</groupId>
  <artifactId>quarkus-kubernetes-config</artifactId>
</dependency>

Then, enable the application to read Kubernetes Secrets directly from the Kubernetes API, and set the name of the Secret (in our case mysqlsecret) to read.

Open the src/main/resources/application.properties file and append the following lines:

%prod.quarkus.kubernetes-config.secrets.enabled=true                           
quarkus.kubernetes-config.secrets=mysqlsecret

Then update the quarkus.datasource.username and quarkus.datasource.password properties to read their values from the keys, mysqluser and mysqlpassword, from the mysqlsecret Secret.

In the application.properties file, update these properties accordingly:

%prod.quarkus.datasource.username=${mysqluser}
%prod.quarkus.datasource.password=${mysqlpassword}

Both values are assigned with the value obtained from the key set in the mysqlsecret Secret.

Since reading Kubernetes Secrets involves interacting with the Kubernetes API Server, when RBAC (role-based access control) is enabled on the cluster, the ServiceAccount used to run the application must have the proper permissions for such access.

Because we registered the Kubernetes extension in the previous article, all the necessary Kubernetes resources to make that happen are automatically generated, so we don’t need to do anything.
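
If you are curious, you can inspect the manifests Quarkus generates during the build to confirm the RBAC resources are present; a quick check, assuming the default output location of the Quarkus Kubernetes extension (target/kubernetes/kubernetes.yml):

./mvnw clean package -DskipTests
grep -n -B 2 -A 8 'kind: Role' target/kubernetes/kubernetes.yml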

Let’s deploy the application running the following command in a terminal window:

./mvnw clean package -DskipTests -Dquarkus.kubernetes.deploy=true

…
[INFO] [io.quarkus.kubernetes.deployment.KubernetesDeployer] Deploying to kubernetes server: https://192.168.59.104:8443/ in namespace: kafka.
[INFO] [io.quarkus.kubernetes.deployment.KubernetesDeployer] Applied: Service movie-plays-producer-debezium-server.
[INFO] [io.quarkus.kubernetes.deployment.KubernetesDeployer] Applied: Deployment movie-plays-producer-debezium-server.
[INFO] [io.quarkus.deployment.QuarkusAugmentor] Quarkus augmentation completed in 9537ms

To validate the correctness of the deployment, inspect the Pod’s log and check that no errors are shown and the SQL statements are executed correctly:

kubectl get pods -n kafka

NAME                                                         READY   STATUS      RESTARTS   AGE
movie-plays-producer-debezium-server-auth-7cc69fb56c-nc8tx   1/1     Running     0          44s


kubectl logs movie-plays-producer-debezium-server-auth-7cc69fb56c-nc8tx -n kafka

... (Quarkus startup banner and application logs; the application starts without errors) ...

In the following illustration, you can see the part we correctly secured.


 
Now that the application is running, with the MySQL credentials correctly managed, let’s move on to securing the Kafka and Debezium parts.

Kafka

So far, we’ve deployed an open Kafka cluster; no authentication or authorization logic was enabled.

Strimzi lets you deploy a Kafka cluster with the following authentication mechanisms:

  • SASL SCRAM-SHA-512
  • TLS client authentication
  • OAuth 2.0 token-based authentication

Since the Strimzi Operator is already installed on the Kubernetes cluster, we can use the Kafka custom resource. The Kafka resource configures a cluster deployment, in this case with TLS client authentication enabled.

In the listeners block, Strimzi provides options to configure a listener to use TLS as the communication protocol (tls: true) and to set the authentication method (the authentication field).

Create a new file named kafka.yaml with the following content to configure a secured Kafka:

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
 name: my-cluster
 namespace: kafka
spec:
 kafka:
   version: 3.2.0
   replicas: 1
   listeners:
     - name: demo
       port: 9092
       type: internal
       tls: false
     - name: secure
       port: 9093
       type: internal
       tls: true
       authentication:
         type: tls
   authorization:
     type: simple
   config:
     offsets.topic.replication.factor: 1
     transaction.state.log.replication.factor: 1
     transaction.state.log.min.isr: 1
     default.replication.factor: 1
     min.insync.replicas: 1
     inter.broker.protocol.version: "3.2"
   storage:
     type: ephemeral
 zookeeper:
   replicas: 1
   storage:
     type: ephemeral
 entityOperator:
   topicOperator: {}
   userOperator: {}

And apply it to the Kubernetes cluster:

kubectl apply -f kafka.yaml -n kafka

kafka.kafka.strimzi.io/my-cluster created

Let’s validate the Kafka cluster is up and running:

kubectl get pods -n kafka

NAME                                         READY   STATUS    RESTARTS   AGE
my-cluster-entity-operator-d4db5ff58-rt96n   3/3     Running   0          2m26s
my-cluster-kafka-0                           1/1     Running   0          2m58s
my-cluster-zookeeper-0                       1/1     Running   0          3m31s

Since we set the listener to use TLS, Strimzi has automatically created a Kubernetes Secret with the cluster certificate, pkcs12 truststore, and associated password as data.

kubectl get secrets -n kafka

my-cluster-clients-ca                        Opaque                                1      9m14s
my-cluster-clients-ca-cert                   Opaque                                3      9m14s
my-cluster-cluster-ca                        Opaque                                1      9m14s
my-cluster-cluster-ca-cert                   Opaque                                3      9m14s
my-cluster-cluster-operator-certs            Opaque                                4      9m14s
my-cluster-entity-operator-dockercfg-5wwb5   kubernetes.io/dockercfg               1      8m9s
my-cluster-entity-operator-token-h9xkq       kubernetes.io/service-account-token   4      8m9s
my-cluster-entity-operator-token-npvfc       kubernetes.io/service-account-token   4      8m9s
my-cluster-entity-topic-operator-certs       Opaque                                4      8m9s
my-cluster-entity-user-operator-certs        Opaque                                4      8m8s
my-cluster-kafka-brokers                     Opaque                                4      8m41s
my-cluster-kafka-dockercfg-fgpx2             kubernetes.io/dockercfg               1      8m41s
my-cluster-kafka-token-2x7s8                 kubernetes.io/service-account-token   4      8m41s
my-cluster-kafka-token-6qdgk                 kubernetes.io/service-account-token   4      8m41s
my-cluster-zookeeper-dockercfg-p296g         kubernetes.io/dockercfg               1      9m13s
my-cluster-zookeeper-nodes                   Opaque                                4      9m13s
my-cluster-zookeeper-token-dp9sc             kubernetes.io/service-account-token   4      9m13s
my-cluster-zookeeper-token-gbrxg             kubernetes.io/service-account-token   4      9m13s

The important Secret here is the one whose name ends with -cluster-ca-cert (in this case my-cluster-cluster-ca-cert).
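
If you want to peek at the generated CA before using it, you can extract the certificate from that Secret and inspect it locally; a quick check that assumes openssl is installed (note the escaped dot in the jsonpath expression):

kubectl get secret my-cluster-cluster-ca-cert -n kafka -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -noout -subject -dates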

List the content of the secret by running the following command in a terminal window:

kubectl get secret my-cluster-cluster-ca-cert -o yaml -n kafka

apiVersion: v1
data:
  ca.crt: LS0tLS1CRUdJTiBDRVJU
  ca.p12: MIIGkwIBAzCCBk==
  ca.password: azJjY2tIMEs1c091
kind: Secret
metadata:
  annotations:
    strimzi.io/ca-cert-generation: "0"
  creationTimestamp: "2022-08-21T19:32:55Z"
  labels:
    app.kubernetes.io/instance: my-cluster
    app.kubernetes.io/managed-by: strimzi-cluster-operator
    app.kubernetes.io/name: strimzi
    app.kubernetes.io/part-of: strimzi-my-cluster
    strimzi.io/cluster: my-cluster
    strimzi.io/kind: Kafka
    strimzi.io/name: strimzi
  name: my-cluster-cluster-ca-cert
  namespace: kafka
  ownerReferences:
  - apiVersion: kafka.strimzi.io/v1beta2
    blockOwnerDeletion: false
    controller: false
    kind: Kafka
    name: my-cluster
    uid: 23c84dfb-bb33-47ed-bd41-b4e87e0a4c3a
  resourceVersion: "49424"
  uid: 6c2679a8-216f-421b-880a-de0e6a0879fa
type: Opaque

Let’s now create a user that authenticates with mTLS and assign it the required authorization rules.

Security and Debezium

With Kafka secured, let’s create a KafkaUser resource that authenticates with mTLS and defines the authorization rules (ACLs) for the consumer groups and topics the user may access.

Create a new file named kafka-user-connect-all-topics.yaml with the following content:

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
 name: my-connect
 namespace: kafka
 labels:
   # Cluster name set previously
   strimzi.io/cluster: my-cluster
spec:
 authentication:
   type: tls
 authorization:
   type: simple
   acls:
    # Consumer groups and Kafka Connect internal topics used to store configuration, offsets, or status
   - resource:
       type: group
       name: outbox-viewer
     operation: Read
   - resource:
       type: group
       name: outbox-viewer
     operation: Describe
   - resource:
       type: group
       name: mysql-dbhistory
     operation: Read
   - resource:
       type: group
       name: mysql-dbhistory
     operation: Describe
   - resource:
       type: group
       name: connect-cluster
     operation: Read
   - resource:
       type: group
       name: connect-cluster
     operation: Describe
   - resource:
       type: topic
       name: connect-cluster-configs
     operation: Read
   - resource:
       type: topic
       name: connect-cluster-configs
     operation: Describe
   - resource:
       type: topic
       name: connect-cluster-configs
     operation: Write
   - resource:
       type: topic
       name: connect-cluster-configs
     operation: Create
   - resource:
       type: topic
       name: connect-cluster-status
     operation: Read
   - resource:
       type: topic
       name: connect-cluster-status
     operation: Describe
   - resource:
       type: topic
       name: connect-cluster-status
     operation: Write
   - resource:
       type: topic
       name: connect-cluster-status
     operation: Create
   - resource:
       type: topic
       name: connect-cluster-offsets
     operation: Read
   - resource:
       type: topic
       name: connect-cluster-offsets
     operation: Write
   - resource:
       type: topic
       name: connect-cluster-offsets
     operation: Describe
   - resource:
       type: topic
       name: connect-cluster-offsets
     operation: Create
   - resource:
       type: group
       name: connect-cluster
     operation: Read
   # Debezium topics
   - resource:
       type: topic
       name: "*"
     operation: Read
   - resource:
       type: topic
       name: "*"
     operation: Describe
   - resource:
       type: topic
       name: "*"
     operation: Write
   - resource:
       type: topic
       name: "*"
     operation: Create

Apply the resource in a terminal window:

kubectl apply -f kafka-user-connect-all-topics.yaml -n kafka
kafkauser.kafka.strimzi.io/my-connect created

After registering this Kafka user, Strimzi creates a new Secret with the same name as the KafkaUser resource (my-connect), containing the PKCS12 keystore that holds the client’s private key and the password to access it.

kubectl get secret my-connect -n kafka -o yaml

apiVersion: v1
data:
  ca.crt: LS0tLS1CK
  user.crt: LS0tLS1CRUdJTiB==
  user.key: LS0tLS1CRUdJTiBQUklWQVRK
  user.p12: MIILNAIBAzCAA==
  user.password: UUR4Nk5NemsxUVFF
kind: Secret
metadata:
  creationTimestamp: "2022-08-21T20:12:44Z"
  labels:
    app.kubernetes.io/instance: my-connect
    app.kubernetes.io/managed-by: strimzi-user-operator
    app.kubernetes.io/name: strimzi-user-operator
    app.kubernetes.io/part-of: strimzi-my-connect
    strimzi.io/cluster: my-cluster
    strimzi.io/kind: KafkaUser
  name: my-connect
  namespace: kafka
  ownerReferences:
  - apiVersion: kafka.strimzi.io/v1beta2
    blockOwnerDeletion: false
    controller: false
    kind: KafkaUser
    name: my-connect
    uid: 882447cc-7759-4884-9d2f-f57f8be92711
  resourceVersion: "60439"
  uid: 9313676f-3417-42d8-b3fb-a1b1fe1b3a39
type: Opaque

So now we’ve got a new Kafka user with the permissions needed to use the required Kafka topics and consumer groups.

Before deploying the Debezium Kafka Connector, let’s permit the Kafka Connect cluster to read the MySQL credentials directly from the mysqlsecret Secret object using the Kubernetes API (as we did for the application), so the connector can authenticate against the database and read the transaction log.

Create the kafka-role-binding.yaml file with the following content:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
 name: connector-configuration-role
 namespace: kafka
rules:
- apiGroups: [""]
  resources: ["secrets"]
  resourceNames: ["mysqlsecret", "my-connect", "my-cluster-cluster-ca-cert"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
 name: connector-configuration-role-binding
 namespace: kafka
subjects:
- kind: ServiceAccount
  name: debezium-connect-cluster-connect
  namespace: kafka
roleRef:
 kind: Role
 name: connector-configuration-role
 apiGroup: rbac.authorization.k8s.io

Notice that the name under the subjects block is the service account that runs the Debezium Kafka Connect Pod. We haven’t deployed that Pod yet, but when a Kafka Connect component is deployed, the service account created follows the format $KafkaConnectName-connect. Since the Debezium Kafka Connect will be named debezium-connect-cluster, the service account created will be debezium-connect-cluster-connect, and we give this account permission to read the Kubernetes Secrets directly.

Apply the kafka-role-binding.yaml before deploying the Debezium Kafka Connect:

kubectl apply -f kafka-role-binding.yaml -n kafka

role.rbac.authorization.k8s.io/connector-configuration-role created
rolebinding.rbac.authorization.k8s.io/connector-configuration-role-binding created
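
Even before the Kafka Connect Pod exists, you can verify that the binding grants the expected access, because RBAC is evaluated purely by name; a quick check with kubectl auth can-i (it should answer yes, assuming your current user is allowed to impersonate service accounts, as the default minikube admin is):

kubectl auth can-i get secrets/mysqlsecret -n kafka \
  --as=system:serviceaccount:kafka:debezium-connect-cluster-connect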

The following illustration summarizes the current secured communications:

To deploy the Debezium Kafka Connect, we’ll again use the KafkaConnect object provided by Strimzi, but with some changes to authenticate against the Kafka cluster and to enable reading configuration parameters from Kubernetes Secrets (the main purpose being to read the MySQL credentials that Debezium uses to authenticate).

The following fields are configured:

  • The port is now 9093.
  • The cluster CA certificate is set so the communication with the cluster uses TLS (tls field).
  • The user certificate and key are set to authenticate against the cluster (authentication field).
  • config.providers is set so the MySQL connector can read configuration values from Kubernetes Secrets.
  • The externalConfiguration section materializes the truststores and keystores from Secrets into files under the /opt/kafka/external-configuration/ directory, where the MySQL connector can access them.

Create the kafka-connect.yaml file as shown in the following listing:

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
 name: debezium-connect-cluster
 namespace: kafka
 annotations:
   strimzi.io/use-connector-resources: "true"
spec:
 version: 3.2.0
 image: quay.io/lordofthejars/debezium-connector-mysql:1.9.4
 replicas: 1
 bootstrapServers: my-cluster-kafka-bootstrap:9093
 logging:
   type: inline
   loggers:
     connect.root.logger.level: "INFO"
 tls:
   trustedCertificates:
     - secretName: my-cluster-cluster-ca-cert
       certificate: ca.crt
 authentication:
   type: tls
   certificateAndKey:
     secretName: my-connect
     certificate: user.crt
     key: user.key
 config:
   config.providers: secrets
   config.providers.secrets.class: io.strimzi.kafka.KubernetesSecretConfigProvider
   group.id: connect-cluster
   offset.storage.topic: connect-cluster-offsets
   offset.storage.replication.factor: 1
   config.storage.topic: connect-cluster-configs
   config.storage.replication.factor: 1
   status.storage.topic: connect-cluster-status
   status.storage.replication.factor: 1
 externalConfiguration:
   volumes:
     - name: cluster-ca
       secret:
         secretName: my-cluster-cluster-ca-cert
     - name: my-user
       secret:
         secretName: my-connect

The trustedCertificates are set from the secret created when the Kafka cluster was deployed using the Kafka object.

The certificateAndKey, under the authentication block, is set from the secret created when KafkaUser was registered.

Deploy the resource and validate it’s correctly deployed and configured:

kubectl apply -f kafka-connect.yaml -n kafka
kafkaconnect.kafka.strimzi.io/debezium-connect-cluster created

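Once the Connect Pod is up and ready, you can also check that the externalConfiguration volumes were materialized as files; a quick look inside the Pod, assuming Strimzi names the Connect Deployment <KafkaConnect name>-connect as described earlier:

kubectl exec -n kafka deploy/debezium-connect-cluster-connect -- ls -R /opt/kafka/external-configuration
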
Create a new file named debezium-kafka-connector.yaml configuring Debezium to register the MySQL connector that accesses the transaction log of the MySQL instance. In this case, we are not using a plain-text username and password in the connector configuration, but referring to the Secret object we previously created with the MySQL credentials. The format to reference a Secret value is ${secrets:<namespace>/<secret-name>:<key>}. Moreover, the connector reads the truststores and keystores materialized when you applied the KafkaConnect definition.


apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
 name: debezium-connector-mysql
 namespace: kafka
 labels:
   strimzi.io/cluster: debezium-connect-cluster
spec:
 class: io.debezium.connector.mysql.MySqlConnector
 tasksMax: 1
 config:
   group.id: connect-cluster
   tasks.max: 1
   database.hostname: mysql
   database.port: 3306
   database.user: root
   database.password: ${secrets:kafka/mysqlsecret:mysqlpassword}
   database.server.id: 184054
   database.server.name: mysql
   database.include.list: moviesdb
   database.allowPublicKeyRetrieval: true
   table.include.list: moviesdb.OutboxEvent
   database.history.kafka.bootstrap.servers: my-cluster-kafka-bootstrap:9093
   database.history.kafka.topic: schema-changes.movies
   database.history.producer.security.protocol: SSL
   database.history.producer.ssl.keystore.type: PKCS12
   database.history.producer.ssl.keystore.location: /opt/kafka/external-configuration/my-user/user.p12
   database.history.producer.ssl.keystore.password: ${secrets:kafka/my-connect:user.password}
   database.history.producer.ssl.truststore.type: PKCS12
   database.history.producer.ssl.truststore.location: /opt/kafka/external-configuration/cluster-ca/ca.p12
   database.history.producer.ssl.truststore.password: ${secrets:kafka/my-cluster-cluster-ca-cert:ca.password}

   database.history.consumer.security.protocol: SSL
   database.history.consumer.ssl.keystore.type: PKCS12
   database.history.consumer.ssl.keystore.location: /opt/kafka/external-configuration/my-user/user.p12
   database.history.consumer.ssl.keystore.password: ${secrets:kafka/my-connect:user.password}
   database.history.consumer.ssl.truststore.type: PKCS12
   database.history.consumer.ssl.truststore.location: /opt/kafka/external-configuration/cluster-ca/ca.p12
   database.history.consumer.ssl.truststore.password: ${secrets:kafka/my-cluster-cluster-ca-cert:ca.password}

Apply the file to register the MySQL connector running the following command in a terminal window:

kubectl apply -f debezium-kafka-connector.yaml -n kafka
kafkaconnector.kafka.strimzi.io/debezium-connector-mysql created

Finally, all the communications are secured.

Demo

And that’s all: we now have the same example shown in the previous article, but secured.

To test it, there is a Quarkus application named outbox-viewer that prints all the content of the OutboxEvent topic to the console. Apply the following YAML file to deploy it:

---
apiVersion: v1
kind: ServiceAccount
metadata:
 annotations:
   app.quarkus.io/commit-id: ebe139afdc9f7f956725af5c5a92cf3c03486bca
   app.quarkus.io/build-timestamp: 2022-08-23 - 11:14:36 +0000
 labels:
   app.kubernetes.io/name: outbox-viewer
   app.kubernetes.io/version: 1.0.0-SNAPSHOT
 name: outbox-viewer
 namespace: kafka
---
apiVersion: v1
kind: Service
metadata:
 annotations:
   app.quarkus.io/commit-id: ebe139afdc9f7f956725af5c5a92cf3c03486bca
   app.quarkus.io/build-timestamp: 2022-08-23 - 11:14:36 +0000
 labels:
   app.kubernetes.io/name: outbox-viewer
   app.kubernetes.io/version: 1.0.0-SNAPSHOT
 name: outbox-viewer
 namespace: kafka
spec:
 ports:
   - name: http
     port: 80
     targetPort: 8080
 selector:
   app.kubernetes.io/name: outbox-viewer
   app.kubernetes.io/version: 1.0.0-SNAPSHOT
 type: ClusterIP
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
 name: view-secrets
 namespace: kafka
rules:
 - apiGroups:
     - ""
   resources:
     - secrets
   verbs:
     - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
 name: outbox-viewer-view
 namespace: kafka
roleRef:
 kind: ClusterRole
 apiGroup: rbac.authorization.k8s.io
 name: view
subjects:
 - kind: ServiceAccount
   name: outbox-viewer
   namespace: kafka
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
 name: outbox-viewer-view-secrets
 namespace: kafka
roleRef:
 kind: Role
 apiGroup: rbac.authorization.k8s.io
 name: view-secrets
subjects:
 - kind: ServiceAccount
   name: outbox-viewer
   namespace: kafka
---
apiVersion: apps/v1
kind: Deployment
metadata:
 annotations:
   app.quarkus.io/commit-id: ebe139afdc9f7f956725af5c5a92cf3c03486bca
   app.quarkus.io/build-timestamp: 2022-08-23 - 11:14:36 +0000
 labels:
   app.kubernetes.io/name: outbox-viewer
   app.kubernetes.io/version: 1.0.0-SNAPSHOT
 name: outbox-viewer
 namespace: kafka
spec:
 replicas: 1
 selector:
   matchLabels:
     app.kubernetes.io/name: outbox-viewer
     app.kubernetes.io/version: 1.0.0-SNAPSHOT
 template:
   metadata:
     annotations:
       app.quarkus.io/commit-id: ebe139afdc9f7f956725af5c5a92cf3c03486bca
       app.quarkus.io/build-timestamp: 2022-08-23 - 11:14:36 +0000
     labels:
       app.kubernetes.io/name: outbox-viewer
       app.kubernetes.io/version: 1.0.0-SNAPSHOT
     namespace: kafka
   spec:
     containers:
       - env:
           - name: KUBERNETES_NAMESPACE
             valueFrom:
               fieldRef:
                 fieldPath: metadata.namespace
         image: quay.io/lordofthejars/outbox-viewer:1.0.0-SNAPSHOT
         imagePullPolicy: Always
         name: outbox-viewer
         ports:
           - containerPort: 8080
             name: http
             protocol: TCP
         volumeMounts:
           - mountPath: /home/jboss/cluster
             name: cluster-volume
             readOnly: false
           - mountPath: /home/jboss/user
             name: user-volume
             readOnly: false
     serviceAccountName: outbox-viewer
     volumes:
       - name: cluster-volume
         secret:
           optional: false
           secretName: my-cluster-cluster-ca-cert
       - name: user-volume
         secret:
           optional: false
           secretName: my-connect

Then in one terminal window, follow the logs of the application’s Pod.

kubectl logs outbox-viewer-684969f9f6-7snng -f -n kafka

Substitute the Pod name with your own.

Find the IP and port of the Movie Plays Producer application by running the following commands in a terminal:

minikube ip -p strimzi

192.168.59.106

Get the exposed port of the movie-plays-producer-debezium service, which is the second port shown in the following snippet (32460):

kubectl get services -n kafka

movie-plays-producer-debezium   LoadBalancer   10.100.117.203        80:32460/TCP                 67m
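
If you prefer not to read the port from the table, you can also query it directly; a small helper that assumes the service exposes a single port and uses the service name shown above:

kubectl get service movie-plays-producer-debezium -n kafka -o jsonpath='{.spec.ports[0].nodePort}'
32460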

Then, send a curl request to the Movie Plays Producer application:

curl -X 'POST' \
  'http://192.168.59.106:32460/movie' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "name": "Minions: The Rise of Gru",
  "director": "Kyle Balda",
  "genre": "Animation"
}'

Adapt the IP and port to your case.

Finally, inspect the output of the outbox-viewer Pod to see the transmission of the data from the database to Kafka using the Debezium Server approach.

{"schema":{"type":"struct","fields":[{"type":"struct","fields":[{"type":"bytes","optional":false,"field”
…
,"aggregatetype":"Movie","aggregateid":"1","type":"MovieCreated","timestamp":1661339188708005,"payload":"{"id":1,"name":"Minions: The Rise of Gru","director":"Kyle Balda","genre":"Animation"}","tracingspancontext":null},"source":{"version":"1.9.4.Final","connector":"mysql","name":"mysql","ts_ms":1661339188000,"snapshot":"false","db":"moviesdb","sequence":null,"table":"OutboxEvent","server_id":1,"gtid":null,"file":"binlog.000002","pos":2967,"row":0,"thread":15,"query":null},"op":"c","ts_ms":1661339188768,"transaction":null}}

Debezium Embedded

So far, we’ve secured the interactions between the Application and the MySQL database, Debezium Server and MySQL, Debezium Server and Kafka.

But you might wonder: what happens if, instead of using Debezium Server, I use Debezium Embedded deployed within the Quarkus application? How can I configure the Kafka connection to be secured using the mTLS method?

Quarkus offers two ways to connect to Kafka: the Kafka client and the Reactive Messaging client. Let’s see the properties required in both cases to authenticate to a Kafka cluster using the mTLS authentication method.

KeyStore and TrustStore

To configure mTLS on the client side, four elements are required:

  • Cluster TrustStore to make the mTLS connection
  • TrustStore password
  • Kafka User KeyStore to authenticate
  • KeyStore password

The first two elements are stored in the my-cluster-cluster-ca-cert Kubernetes Secret created before when we applied the Strimzi resources. To get them, run the following commands in a terminal window:

kubectl get secret my-cluster-cluster-ca-cert -n kafka -o jsonpath='{.data.ca\.p12}' | base64 -d > mtls-cluster-ca.p12

And the password:

kubectl get secret my-cluster-cluster-ca-cert -n kafka -o jsonpath='{.data.ca\.password}' | base64 -d
k2cckH0K5sOu

The later elements are stored in the my-connect Kubernetes Secret. To get them, run the following commands in a terminal window:

kubectl get secret my-connect -n kafka -o jsonpath='{.data.user\.p12}' | base64 -d > mtls-user.p12

And the password:

kubectl get secret my-connect -n kafka -o jsonpath='{.data.user\.password}' | base64 -d
QDx6NMzk1QQE

Now, set the Quarkus Kafka configuration properties to authenticate to the Kafka cluster using the previous credentials:

%prod.kafka.ssl.truststore.location=mtls-cluster-ca.p12
%prod.kafka.ssl.truststore.password=k2cckH0K5sOu
%prod.kafka.ssl.truststore.type=PKCS12
%prod.kafka.ssl.keystore.location=mtls-user.p12
%prod.kafka.ssl.keystore.password=QDx6NMzk1QQE
%prod.kafka.ssl.keystore.type=PKCS12
%prod.kafka.security.protocol=SSL

%prod.mp.messaging.incoming.movies.ssl.truststore.location=mtls-cluster-ca.p12
%prod.mp.messaging.incoming.movies.ssl.truststore.password=k2cckH0K5sOu
%prod.mp.messaging.incoming.movies.ssl.truststore.type=PKCS12
%prod.mp.messaging.incoming.movies.ssl.keystore.location=mtls-user.p12
%prod.mp.messaging.incoming.movies.ssl.keystore.password=QDx6NMzk1QQE
%prod.mp.messaging.incoming.movies.ssl.keystore.type=PKCS12
%prod.mp.messaging.incoming.movies.security.protocol=SSL

We could use the Quarkus Kubernetes Config extension, as we did with the MySQL credentials, to inject these values directly, but for the sake of simplicity we set them in the properties file here.
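
Before wiring the stores into the application, it can be useful to sanity-check the files you just extracted; a quick local check, assuming keytool from a JDK is on your PATH, using the passwords obtained above:

keytool -list -storetype PKCS12 -keystore mtls-cluster-ca.p12 -storepass k2cckH0K5sOu
keytool -list -storetype PKCS12 -keystore mtls-user.p12 -storepass QDx6NMzk1QQE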

But in terms of security, one important point is still missing: how do we safely store secrets inside a YAML file, and how do we keep secrets at rest secure inside a Kubernetes cluster?

Encryption of Secrets

At the beginning of this article, we created a Kubernetes Secret object with the MySQL credentials, but it was a YAML file with sensitive information merely encoded in Base64 format, which is not secure at all. This YAML file will probably end up in a Git repository, making the secrets available to anyone with access to the repo. In the following section, we’ll fix this.

Sealed Secrets

Sealed Secrets is a Kubernetes controller that lets you encrypt Kubernetes Secret resources on the client side (your local machine) and decrypts them inside the Kubernetes cluster when they are applied.

There are two components to start using the Sealed Secrets project. The first one is the kubeseal CLI tool to encrypt secrets.

To install kubeseal, download the package for your operating system from the following link.

The second one is the kubeseal Kubernetes controller. To install it, run the following command on the command line:

kubectl apply -f https://github.com/bitnami-labs/sealed-secrets/releases/download/v0.18.1/controller.yaml -n kube-system

role.rbac.authorization.k8s.io/sealed-secrets-service-proxier created
clusterrole.rbac.authorization.k8s.io/secrets-unsealer created
deployment.apps/sealed-secrets-controller created
customresourcedefinition.apiextensions.k8s.io/sealedsecrets.bitnami.com created
service/sealed-secrets-controller created
role.rbac.authorization.k8s.io/sealed-secrets-key-admin created
clusterrolebinding.rbac.authorization.k8s.io/sealed-secrets-controller created
serviceaccount/sealed-secrets-controller created
rolebinding.rbac.authorization.k8s.io/sealed-secrets-service-proxier created
rolebinding.rbac.authorization.k8s.io/sealed-secrets-controller created

Check that the controller is correctly deployed and running by executing the following command:

kubectl get pods -n kube-system

sealed-secrets-controller-554d94cb68-xr6mw                                1/1     Running   0          8m46s

After that, we can take the mysql-secret.yaml file and use the kubeseal tool to create a new Kubernetes resource of kind SealedSecret with the data field encrypted.

kubeseal -n kube -o yaml < mysql-secret.yaml > mysql-secret-encrypted.yaml

The new file, named mysql-secret-encrypted.yaml, is of kind SealedSecret with the value of each key encrypted:

apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
 creationTimestamp: null
 name: mysqlsecret
 namespace: kube
spec:
 encryptedData:
   mysqlpassword: AgBl721mnowwPlC35FfO26zP0
   mysqlrootpassword: AgAKl1tWV8hahn00yGS4ucs
   mysqluser: AgCWrWFl1/LcS
 template:
   data: null
   metadata:
     creationTimestamp: null
     name: mysqlsecret
     namespace: kafka
   type: Opaque

At this point, you can safely remove the mysql-secret.yaml file as it’s not required anymore.

Apply the encrypted resource as any other Kubernetes resource file, and the Sealed Secrets Kubernetes controller will decrypt and store it correctly inside Kubernetes as a normal Secret.
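
For completeness, applying the encrypted resource looks like applying any other manifest, using the file generated above:

kubectl apply -f mysql-secret-encrypted.yaml -n kafka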

You can validate the Secret by running the following command:

kubectl  get secret mysqlsecret -n kafka -o yaml

apiVersion: v1
data:
  mysqlpassword: YWxleA==
  mysqlrootpassword: YWxleA==
  mysqluser: YWxleA==
kind: Secret
metadata:
  creationTimestamp: "2022-08-21T19:05:21Z"
  name: mysqlsecret
  namespace: kafka
  ownerReferences:
  - apiVersion: bitnami.com/v1alpha1
    controller: true
    kind: SealedSecret
    name: mysqlsecret
    uid: 2a5ee74b-c2b2-49b3-9a9f-877e7a77b163
  resourceVersion: "41514"
  uid: 494cbe8b-7480-4ebd-9cc5-6fe396795eaa
type: Opaque

It’s important to note that the result is a regular, decrypted Kubernetes Secret holding an owner reference to the SealedSecret responsible for its creation. In this way, the lifecycle of the Secret is tied to that of the SealedSecret.

We’ve fixed the problem of storing the YAML file without revealing sensitive data, but when the Secret is applied to the Kubernetes cluster, it’s still stored in Base64 encoding, so it isn’t really secret.

Secrets at Rest

By default, Kubernetes doesn’t store secrets encrypted at rest in the etcd database. Encrypting Secret data at rest is a huge topic that deserves its own post (in fact, there is a book, Kubernetes Secrets Management, dedicated to it). Every Kubernetes distribution might have a different way to enable encryption of secrets at rest, although in the end it comes down to a configuration file (EncryptionConfiguration) copied onto every kube-apiserver node.

This file has the following form:

apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
 - resources:
     - secrets
   providers:
     - identity: {}
     - aesgcm:
         keys:
           - name: key1
             secret: c2VjcmV0IGlzIHNlY3VyZQ==
           - name: key2
             secret: dGhpcyBpcyBwYXNzd29yZA==
     - aescbc:
         keys:
           - name: key1
             secret: c2VjcmV0IGlzIHNlY3VyZQ==
           - name: key2
             secret: dGhpcyBpcyBwYXNzd29yZA==
     - secretbox:
         keys:
           - name: key1
             secret: YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXoxMjM0NTY=

In the following image, we can see the flow of a secret when an EncryptionConfiguration file is registered in the kube-apiserver.

Now, secrets are encrypted in the YAML file thanks to the SealedSecret object and are also protected at rest using the EncryptionConfiguration file.

Conclusions

Securing the whole infrastructure is important, and in this article we’ve learned how to secure access to the database and to Kafka using Kubernetes Secrets.

With Strimzi, we are able to define not only the authentication part but also the authorization part, providing rules on who can do what with Kafka topics.

Accessing these secrets is also an important part, and Quarkus and Debezium let you access them in an efficient yet secure way, injecting them directly into memory without persisting them in the filesystem or as environment variables.

Security is an important topic, and Strimzi is the perfect match when it’s time to manage it in a Kafka cluster.

The source code is available on GitHub.

Microservices Integration Done Right Using Contract-Driven Development

Key Takeaways

  • Integration testing has become the largest hurdle impacting the independent development and deployment of microservices, which affects quality, time-to-market, predictability of delivery, and ultimately business agility
  • We need an alternate approach to help identify compatibility issues between microservices early in the development cycle to reduce the dependence on integration testing
  • Adopting API specifications such as OpenAPI or AsyncAPI is a step in the right direction to clearly communicate the API signature to avoid communication gaps. Why stop there when you can get a lot more value from them?
  • Contract Driven Development helps us leverage API specifications as executable contracts using Specmatic to shift left the identification of compatibility issues thereby eliminating / reducing the need for integration tests
  • Specmatic has a #NOCODE approach that holds both consumer and provider teams accountable to a commonly agreed API specification by emulating the provider for the consumers through “Smart Mocks” and emulating the consumer for the provider through “Contract as Test”

The ability to develop and deploy a single microservice independently is the most critical indicator of a successful microservice adoption strategy. However, most teams must undergo an extensive integration testing phase before deploying them. This is because integration tests have become necessary to identify compatibility issues between microservices since unit and component/API tests do not cover interactions between microservices.

Firstly, integration tests are a late feedback mechanism to find compatibility issues. The cost of fixing such issues increases severalfold with how late they are discovered (represented by the heatmap at the bottom of the above image).

Also, this can cause extensive rework for both consumer and provider teams which impacts the predictability of feature delivery severely because teams often must juggle regular feature development along with fixing integration bugs.

Also, integration environments can be extremely brittle. Even a single broken interaction, because of compatibility issues between two components/services, can render the entire environment compromised, which means even other unrelated features and microservices cannot be tested.

This blocks the path to production, even for critical fixes. And can bring the entire delivery to a grinding halt. We call this the “Integration Hell.”

Integration Testing – understanding the beast

Before we kill integration tests, let us understand what they actually are. The term is often a misnomer.
 
Testing an application is not just about testing the logic within each function, class, or component. Features and capabilities are a result of these individual snippets of logic interacting with their counterparts. If a service boundary/API between two pieces of software is not properly implemented, it leads to what is popularly known as an integration issue. Example: If functionA calls functionB with only one parameter while functionB expects two mandatory parameters, there is an integration/compatibility issue between the two functions. Such quick feedback helps us course correct early and fix the problem immediately.

However, when we look at such compatibility issues at the level of microservices where the service boundaries are at the http, messaging, or event level, any deviation or violation of the service boundary is not immediately identified during unit and component/api testing. The microservices must be tested with all their real counterparts to verify if there are broken interactions. This is what is broadly (and in a way wrongly) classified as integration testing.

Integration testing is used as a term to cover a broad spectrum of checks:

  1. Compatibility between two or more components
  2. Workflow testing – an entire feature that involves an orchestration of interactions
  3. Interaction with other dependencies such as storage, messaging infrastructure, etc.
  4. And more, just short of end-to-end tests with production infrastructure

To be clear, when we are talking about killing “integration testing,” we are talking about removing the dependency on “integration tests” as the only way to identify compatibility issues between microservices. Other aspects, such as workflow testing, may still be necessary.

Identifying the inflection point – knowing where to strike

When all the code is part of a monolith, the API specification for a service boundary may just be a method signature. Also, these method signatures can be enforced through mechanisms such as compile time checks, thereby giving early feedback to developers.

However, when a service boundary is lifted to an interface such as http REST API by splitting the components into microservices, this early feedback is lost. The API specification, which was earlier documented as an unambiguous method signature, now needs to be documented explicitly to convey the right way of invoking it. This can lead to a lot of confusion and communication gaps between teams if the API documentation is not machine parsable.

Without a well-documented service boundary,

  1. The consumer/client can only be built with an approximate emulation of the provider – Hand-rolled mocks and other stubbing techniques often lead to a problem called stale stubs, where the mock is not truly representative of the provider anymore.
  2. Likewise, for the provider, there is no emulation of the consumer.

This means that we must resort to a slow and sequential style of development where we wait for one of the components to be built before we start with the other. This is not a productive approach if we need to ship features quickly.

We have now lost two critical aspects by moving to microservices:

  1. Ability to clearly communicate the API specification for a service boundary between two components leveraging it to interact with each other
  2. Also, the ability to enforce that API specification describing the service boundary.

We need a way to compensate for these deficiencies.

API Specifications

Adopting an API specification standard such as OpenAPI or AsyncAPI is critical to bring back the ability to communicate API signatures in an unambiguous and machine-readable manner. While this adds to developers’ workload to create and maintain these specs, the benefits outweigh the effort.

That said, API specifications, as the name suggests, only help in describing the API signatures. What about the aspect of enforcing them in the development process for early feedback? That part is still missing.

Code/Document Generation – Ineffective and unsustainable

We can argue that API specifications can be enforced by code generation techniques. And it seems to make sense at a surface level that if the code was generated based on the specification, how can it deviate from the specification?

However, here are the difficulties

  1. Ongoing development – Most code gen tools/techniques generate scaffolds for server/provider and client/consumer code and require us to fill in our business logic within that scaffold/template. The problem is when there is a change in the specification, and we usually need to regenerate the scaffold, extract our business logic from older versions of the code, and paste it there again, which leaves room for human error.
  2. Data type mismatches – Code gen tools/techniques have to build the capability for each programming language. In a polyglot environment, the generated scaffold may not be consistent in terms of data types, etc., across various programming languages. This is further exacerbated if we leverage document generation (generating API specifications based on provider/service code) in one programming language and then leverage that generated specification to further generate scaffolding for client code.

Overall, code generation and document generation may work only in limited scenarios, with several caveats. While they may initially provide a quick way for teams to build applications by giving them free code, the ongoing cost of such techniques makes them impractical.

So, we need another way to enforce API specifications.

Contract-Driven Development – API Specifications as Executable Contracts

A method signature can be enforced by a compiler to give early feedback to a developer when they are deviating from the method signature. Can something similar be done for APIs?

Contract testing is an attempt to achieve this goal. According to Pact.io’s documentation:

Contract testing is a technique for testing an integration point by checking each application in isolation to ensure the messages it sends or receives conform to a shared understanding that is documented in a “contract.”

However, it is important to note that there are several approaches to contract testing itself, such as consumer-driven contract testing (Pact.io), provider-driven contracts (Spring Cloud Contract in the producer contract testing approach), bi-directional contract testing (Pactflow.io), and so on. In a large majority of these approaches, the API contract is a separate document from the API specification. For example, in Pact.io, pact JSON files are the API contracts. Spring Cloud Contract also has its own DSL to define the contract. Instead of maintaining two different artifacts, which could potentially go out of sync, what if we could leverage the API specification itself as the API contract to give developers early feedback when their implementation deviates from the API specification, causing a problem for the consumer/API client?
 
That is exactly what Specmatic can achieve. Specmatic is an open-source tool that embodies Contract-Driven Development (CDD). It enables us to split the interactions between the consumer and provider into independently verifiable units. Consider the following interaction between two microservices, which are currently being verified only in higher environments.

ServiceA → ServiceB

CDD helps us split this interaction into its constituent components
 

ServiceA → Contract as Stub {API spec of ServiceB}
             Contract as Test {API spec of ServiceB} → ServiceB

Let us now examine the above in detail; a command-line sketch follows the breakdown.

  1. Left-hand side – ServiceA => Contract as Stub
    1. Here, we are emulating the provider (ServiceB) for the consumer (ServiceA) so that the consumer application development can progress independently of the provider.
    2. Since the Contract as Stub (Smart Mock) is based on the mutually-agreed API specification, this is indeed a truly representative emulation of the provider (ServiceB) and gives feedback/throws an error when the consumer (ServiceA) implementation deviates from the API specification while invoking it.
  2. Right-hand side – Contract as Test => ServiceB
    1. Emulate the consumer (ServiceA) for the provider (ServiceB) by invoking it and verifying if the response is as per mutually-agreed API specification.
    2. Contract as Test will provide immediate feedback to the provider (ServiceB) application developer as soon as they deviate from the spec.

Now that we can adhere to the API specification at a component level for both the consumer (ServiceA) and provider (ServiceB) applications while building them independently of each other, there is no need to test their interactions by deploying them together, and thereby no more dependency on integration tests for identifying compatibility issues.

This is how Specmatic is able to leverage API specifications as executable contracts.

Contract as Code

The linchpin here is the API specification itself, which allows the API providers and consumers to decouple and drive the development and deployment of their respective components independently while keeping all of them aligned.

For Contract-driven development (CDD) to be successful, we need to take an API-first approach, where API providers and consumers collaboratively design and document the API specification first. This means they would need to use one of the modern visual editors like Swagger, Postman, Stoplight, etc., to author their API specification by focusing on the API design and making sure all the stakeholders are in-sync before they go off and start building their pieces independently.

Teams that are habituated to generating the API specification from their code might feel uncomfortable with this reverse flow of writing the API specification first. CDD requires a mindset shift similar to Test-driven-development, where we hand-write the test first to guide/drive our code design. Similarly, in CDD, we hand-code the API specification first and then use tools like Specmatic to turn them into executable contract tests.

With approaches that rely on generating the API specification from code, I have observed that the API design takes a backseat, becoming more of an afterthought or being biased toward the consumer or the provider. Also, with time-to-market pressure, by starting with the API specification first, we are able to independently progress on consumer and provider components in parallel. This is not possible when we are depending on generating the API specification from code (consumers have to wait until providers have written code and generated the specs.)

Now that you have agreed on a common API specification first, it is absolutely important that there is a single source of truth for these API specifications. Multiple copies of these specs floating around will lead to consumer and provider teams diverging in their implementation.

CDD stands on the strength of three fundamental pillars. While “Contract as Stub” and “Contract as Test” keeps the consumers and provider teams in line, the glue holding everything together is the third pillar which is the “central contract repo.”

API specifications are machine-parsable code. And what better place to store them than a version control system. By storing them in a version control system such as Git, we can also add some rigor to the process of authoring them by adding a Pull/Merge request process. The Pull/Merge request should ideally involve the following steps:

  1. Syntax check + linting to ensure consistency
  2. Backward compatibility checks to identify if there are any breaking changes
  3. A final review and merge step

It is highly recommended that specs be stored in a central location. This suits most cases (even large enterprises). Storing specifications across multiple repositories is not recommended unless there is an absolute necessity for that practice.

Once the specifications are in the central repo, they can be:

  1. Leveraged by consumer and provider teams to make independent progress
  2. Published to API gateways where appropriate

The death of integration tests

Now that we have eliminated the need for integration tests to identify compatibility issues between applications, what about system and workflow testing?

CDD paves the way for more stable higher environments since all compatibility issues have been identified much earlier in the development cycle (in environments such as local and CI), where the cost of fixing such issues is significantly lower. This allows us to run system and workflow tests to verify complex orchestrations in the now-stable higher environments. And since we have removed the need for running integration tests to identify compatibility issues, this reduces the overall run time of test suites in higher environments.

The Service and the Beast: Building a Windows Service that Does Not Fail to Restart

Key Takeaways

  • Windows Services play a key role in the Microsoft Windows operating system, and support the creation and management of long-running processes.
  • Service Isolation is important and powerful, however, when the service needs to interact with the user’s space, isolation makes things harder, but you can manage this.
  • Services are ideal for use alongside a watchdog mechanism. Such a mechanism will ensure a given application is always running, and in case it shuts down abnormally, it will restart.
  • A good logging mechanism is always useful during development, using either a simple, or, when needed, a complex logging tool.
  • Testing the final solution is imperative. Once the code is checked and verified to work, up to 2% of testers might still report bugs, which is within reason.
     

When programming C++ for Windows, working with Windows Services is almost inevitable. Windows Services play a key role in the Microsoft Windows operating system, and support the creation and management of long-running processes that survive sleep, hibernate, restart and shutdown. But what happens if they don’t? The inability to restart a service after shutting down the PC when Fast Startup is checked can result in a program catastrophe. Service Isolation, introduced by Microsoft in Windows Vista, can cause this type of havoc – and here’s how you can solve it.

Thank You for Your Service …

We’ve been working with Windows Services for years, yet it seems that no matter how much we think we know about Services, or how much we believe we can handle them, we keep encountering more problems, challenges and issues. Some of these issues are undocumented or, if we’re “lucky”, they are poorly documented.

Ever since Service Isolation was introduced by Microsoft, one of the most annoying problems we’ve encountered is the inability to restart a service after shutting down the PC when Fast Startup is checked. As we could not find a solution, we decided to roll up our sleeves and created one ourselves, which led to the development of a persistent Service.

But before we dive deeper and explain more about our solution, let’s start with the basics and explain what Services are and why we even need to use Windows Services in the first place.

NT Service (also known as Windows Service) is the term given to a special process which is loaded by the Service Control Manager of the NT kernel, and runs in the background right after Windows starts (before users log on). We use services to perform core and low-level OS tasks, such as Web serving, event logging, file serving, help and support, printing, cryptography, and error reporting.

Additionally, services enable us to create executable, long-running applications. The reason is that a Service runs in its own Windows session environment, so it does not interfere with other components or sessions of your application. Obviously, Services are expected to start automatically once the computer boots – and we’ll get to that in a minute.

Moving further, the obvious question is – why do we need persistent Services? The answer is pretty clear, a service is able to:

  • run continuously in the background;
  • invoke itself under the logged-in user’s session;
  • act as a watchdog and make sure a given application is always running.

A Windows Service needs to survive sleep, hibernate, restart and shutdown. However, as explained, there are specific and dangerous issues when “Fast Startup” is checked and the PC is turned off and on again. In most of these cases, the service fails to restart.

Since we were developing an Anti-Virus, which is supposed to restart after reboot or shutdown, this issue created a serious problem which we were eager to solve.

Stay! Good Service …

To create the near perfect persistent Windows service, we had to solve several underlying issues first.

One of those issues was related to Service Isolation – the isolated Service can’t access any context associated with any specific user. One of our software products used to store data in c:\users\<username>\appdata\local, but when it ran from our service, the path was invalid since the service runs from Session 0. Moreover, after reboot, the Service starts before any user logs in, which leads to the first piece of the solution: waiting for the user to log in.

To figure out how to do this, we posted our question here.

This turned out to be a problem with no perfect solution; however, the code that accompanies this article has been used and fully tested with no issues.

The Basics

The structure and the flow of our code may look complex, and that is for a reason. Since Service Isolation was introduced with Windows Vista, Windows Services have run isolated, under the SYSTEM user account rather than under any logged-in user’s account.

The reason for the isolation is that services are powerful and can be a potential security risk. Before that change, all services ran in Session 0 along with user applications; after it, things changed.
The idea behind our code was to have the Windows Service launch itself as a user, by calling CreateProcessAsUserW, as will be explained further.
Our Service, named SG_RevealerService, has several commands, and when called using the following command line parameters, it acts accordingly.

#define SERVICE_COMMAND_INSTALL L"Install"             // The command line argument
                                                       // for installing the service

#define SERVICE_COMMAND_LAUNCHER L"ServiceIsLauncher"  // Launcher command for
                                                       // NT service

When calling SG_RevealerService, there are three options:

Option 1 – called without any command line argument – nothing will happen.

Option 2 – called with the Install command line argument. In this case, the service will install itself, and if a valid executable path is added after a hash (#) separator, that executable will be started and the watchdog will keep it running.

The Service then runs itself using CreateProcessAsUserW(), and the new process runs under the user account. This gives the Service the ability to access the context that the calling instance has no access to due to Service Isolation.

Option 3 – called with the ServiceIsLauncher command line argument. In this case, the service’s client (the main application) is started; the entry function detects that the service has started itself with the current user’s privileges. At this point, you can see two instances of SG_RevealerService in the Task Manager: one under SYSTEM, and the other under the currently logged-in user.
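The listings in this article focus on launching and guarding the host; the installation call triggered by the Install command is not shown. As a rough sketch (our illustration, not the original code; the helper name InstallSelfAsService is ours), a service can register itself for automatic start like this:

#include <windows.h>

// Sketch: register the given executable as an auto-start service.
// Assumes the caller runs elevated (installing a service requires admin rights).
bool InstallSelfAsService(const wchar_t* serviceName, const wchar_t* exePath)
{
    SC_HANDLE scm = OpenSCManagerW(NULL, NULL, SC_MANAGER_CREATE_SERVICE);
    if (scm == NULL)
        return false;

    SC_HANDLE svc = CreateServiceW(
        scm,
        serviceName,                 // internal service name
        serviceName,                 // display name
        SERVICE_ALL_ACCESS,
        SERVICE_WIN32_OWN_PROCESS,
        SERVICE_AUTO_START,          // start automatically after boot
        SERVICE_ERROR_NORMAL,
        exePath,                     // full path to the service executable
        NULL, NULL, NULL, NULL, NULL);

    bool ok = (svc != NULL);
    if (svc) CloseServiceHandle(svc);
    CloseServiceHandle(scm);
    return ok;
}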

/*
RunHost
*/

BOOL RunHost(LPWSTR HostExePath,LPWSTR CommandLineArguments)
{
    WriteToLog(L"RunHost '%s'",HostExePath);

    STARTUPINFO startupInfo = {};
    startupInfo.cb = sizeof(STARTUPINFO);
    startupInfo.lpDesktop = (LPTSTR)_T("winsta0\\default");

    HANDLE hToken = 0;
    BOOL bRes = FALSE;

    LPVOID pEnv = NULL;
    CreateEnvironmentBlock(&pEnv, hToken, TRUE);

    PROCESS_INFORMATION processInfoAgent = {};
    PROCESS_INFORMATION processInfoHideProcess = {};
    PROCESS_INFORMATION processInfoHideProcess32 = {};

    if (PathFileExists(HostExePath))
    {
        std::wstring commandLine;
        commandLine.reserve(1024);

        commandLine += L"\"";
        commandLine += HostExePath;
        commandLine += L"\" \"";
        commandLine += CommandLineArguments;
        commandLine += L"\"";

        WriteToLog(L"launch host with CreateProcessAsUser ...  %s",
                     commandLine.c_str());

        bRes = CreateProcessAsUserW(hToken, NULL, &commandLine[0],
               NULL, NULL, FALSE, NORMAL_PRIORITY_CLASS |
               CREATE_UNICODE_ENVIRONMENT | CREATE_NEW_CONSOLE |
               CREATE_DEFAULT_ERROR_MODE, pEnv,
            NULL, &startupInfo, &processInfoAgent);
        if (bRes == FALSE)
        {
            DWORD   dwLastError = ::GetLastError();
            TCHAR   lpBuffer[256] = _T("?");
            if (dwLastError != 0)    // Don't want to see an
                                     // "operation done successfully" error ;-)
            {
                ::FormatMessage(FORMAT_MESSAGE_FROM_SYSTEM,    // It's a system error
                    NULL,                                      // No string to be
                                                               // formatted needed
                    dwLastError,                               // Hey Windows: Please
                                                               // explain this error!
                    MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT), // Do it in the standard
                                                               // language
                    lpBuffer,              // Put the message here
                    255,                   // Size of the buffer, in characters
                    NULL);
            }
            WriteToLog(L"CreateProcessAsUser failed - Command Line = %s Error : %s",
                         commandLine, lpBuffer);
        }
        else
        {
            if (!writeStringInRegistry(HKEY_LOCAL_MACHINE,
               (PWCHAR)SERVICE_REG_KEY, (PWCHAR)SERVICE_KEY_NAME, HostExePath))
            {
                WriteToLog(L"Failed to write registry");
            }
        }
    }
    else
    {
        WriteToLog(L"RunHost failed because path '%s' does not exists", HostExePath);
    }
    hPrevAppProcess = processInfoAgent.hProcess;
    
    CloseHandle(hToken);
    WriteToLog(L"Run host end!");

    return bRes;
}

Detecting User Log On

The first challenge is to start some of the actions only when and if a user logs in.
In order to detect a user log on, we first define a global variable.

bool g_bLoggedIn = false;

It will be set to true when a user logs in.

Subscribing to the Logon event

We defined the following preprocessor directives:

#define EVENT_SUBSCRIBE_PATH    L"Security"
#define EVENT_SUBSCRIBE_QUERY    L"Event/System[EventID=4624]"

After the Service starts, we subscribe to the logon event, so the moment a user has logged in, we get an alert via the callback function we have set, and we can continue.
To implement this, we need a class to handle the creation of the subscription and wait for the event callback.

class UserLoginListner
{
    HANDLE hWait = NULL;
    HANDLE hSubscription = NULL;

public:
    ~UserLoginListner()
    {
        CloseHandle(hWait);
        EvtClose(hSubscription);
    }

    UserLoginListner()
    {
        const wchar_t* pwsPath = EVENT_SUBSCRIBE_PATH;
        const wchar_t* pwsQuery = EVENT_SUBSCRIBE_QUERY;

        hWait = CreateEvent(NULL, FALSE, FALSE, NULL);

        hSubscription = EvtSubscribe(NULL, NULL,
            pwsPath, pwsQuery,
            NULL,
            hWait,
            (EVT_SUBSCRIBE_CALLBACK)UserLoginListner::SubscriptionCallback,
            EvtSubscribeToFutureEvents);
        if (hSubscription == NULL)
        {
            DWORD status = GetLastError();

            if (ERROR_EVT_CHANNEL_NOT_FOUND == status)
                WriteToLog(L"Channel %s was not found.n", pwsPath);
            else if (ERROR_EVT_INVALID_QUERY == status)
                WriteToLog(L"The query "%s" is not valid.n", pwsQuery);
            else
                WriteToLog(L"EvtSubscribe failed with %lu.n", status);

            CloseHandle(hWait);
        }
    }

Next we need a function for the wait itself:

void WaitForUserToLogIn()
{
    WriteToLog(L"Waiting for a user to log in...");
    WaitForSingleObject(hWait, INFINITE);
    WriteToLog(L"Received a Logon event - a user has logged in");
}

We also need a callback function:

static DWORD WINAPI SubscriptionCallback(EVT_SUBSCRIBE_NOTIFY_ACTION action, PVOID
       pContext, EVT_HANDLE hEvent)
{
    if (action == EvtSubscribeActionDeliver)
    {
        WriteToLog(L"SubscriptionCallback invoked.");
        HANDLE Handle = (HANDLE)(LONG_PTR)pContext;
        SetEvent(Handle);
    }

    return ERROR_SUCCESS;
}

Then, all we need to do is add a block of code with the following lines:

WriteToLog(L"Launch clientn"); // launch client ...
{
    UserLoginListner WaitTillAUserLogins;
    WaitTillAUserLogins.WaitForUserToLogIn();
}

Once we reach the end of this block, we can be assured that a user has logged in.
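The later watchdog code examines the g_bLoggedIn flag defined earlier; the listings do not show where it is set, but a natural place (our assumption) is right after the wait returns:

{
    UserLoginListner WaitTillAUserLogins;
    WaitTillAUserLogins.WaitForUserToLogIn();
}
g_bLoggedIn = true;   // assumption: record the logon so the watchdog may start the host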

Later in the article, we will explain how to retrieve the account/username of the logged-in user and how to use our GetLoggedInUser() function.

It’s Not You, It’s Me: Impersonating a User

When we know for sure that a user has logged in, we need to impersonate them.

The following function does the job. Not only does it impersonate the user, it also calls CreateProcessAsUserW() and runs itself as that user.
By doing so, we give the service access to the user’s context, including documents, desktop, etc. and allow the service to use the UI, which isn’t possible for a service running from Session 0.

CreateProcessAsUserW creates a new process along with its primary thread, which will run in the context of a given user.

//Function to run a process as active user from Windows service
void ImpersonateActiveUserAndRun()
{
    DWORD session_id = -1;
    DWORD session_count = 0;
    WTS_SESSION_INFOW *pSession = NULL;

    if (WTSEnumerateSessions(WTS_CURRENT_SERVER_HANDLE, 0, 1, &pSession, &session_count))
    {
        WriteToLog(L"WTSEnumerateSessions - success");
    }
    else
    {
        WriteToLog(L"WTSEnumerateSessions - failed. Error %d",GetLastError());
        return;
    }
    TCHAR szCurModule[MAX_PATH] = { 0 };

    GetModuleFileName(NULL, szCurModule, MAX_PATH);


    for (size_t i = 0; i < session_count; i++)
    {
        session_id = pSession[i].SessionId;

        WTS_CONNECTSTATE_CLASS wts_connect_state = WTSDisconnected;
        WTS_CONNECTSTATE_CLASS *ptr_wts_connect_state = NULL;
        DWORD bytes_returned = 0;

        if (::WTSQuerySessionInformationW(WTS_CURRENT_SERVER_HANDLE, session_id,
            WTSConnectState, reinterpret_cast<LPWSTR*>(&ptr_wts_connect_state),
            &bytes_returned))
        {
            wts_connect_state = *ptr_wts_connect_state;
            ::WTSFreeMemory(ptr_wts_connect_state);
            if (wts_connect_state != WTSActive) continue;
        }
        else
        {
            continue;
        }

        HANDLE hImpersonationToken;
        if (!WTSQueryUserToken(session_id, &hImpersonationToken))
        {
            continue;
        }

        //Get the actual token from impersonation one
        DWORD neededSize1 = 0;
        HANDLE *realToken = new HANDLE;
        if (GetTokenInformation(hImpersonationToken, (::TOKEN_INFORMATION_CLASS) TokenLinkedToken, realToken, sizeof(HANDLE), &neededSize1))
        {
            CloseHandle(hImpersonationToken);
            hImpersonationToken = *realToken;
        }
        else
        {
            continue;
        }
        HANDLE hUserToken;
        if (!DuplicateTokenEx(hImpersonationToken,
            TOKEN_ASSIGN_PRIMARY | TOKEN_ALL_ACCESS | MAXIMUM_ALLOWED,
            NULL,
            SecurityImpersonation,
            TokenPrimary,
            &hUserToken))
        {
            continue;
        }


        // Get user name of this process
        WCHAR* pUserName;
        DWORD user_name_len = 0;
        if (WTSQuerySessionInformationW(WTS_CURRENT_SERVER_HANDLE, session_id, WTSUserName, &pUserName, &user_name_len))
        {
            //Now we got the user name stored in pUserName
        }
        // Free allocated memory                         
        if (pUserName) WTSFreeMemory(pUserName);
        ImpersonateLoggedOnUser(hUserToken);
        STARTUPINFOW StartupInfo;
        GetStartupInfoW(&StartupInfo);
        StartupInfo.cb = sizeof(STARTUPINFOW);
        PROCESS_INFORMATION processInfo;
        SECURITY_ATTRIBUTES Security1;
        Security1.nLength = sizeof SECURITY_ATTRIBUTES;
        SECURITY_ATTRIBUTES Security2;
        Security2.nLength = sizeof SECURITY_ATTRIBUTES;
        void* lpEnvironment = NULL;

        // Obtain all needed necessary environment variables of the logged in user.
        // They will then be passed to the new process we create.

        BOOL resultEnv = CreateEnvironmentBlock(&lpEnvironment, hUserToken, FALSE);
        if (!resultEnv)
        {
            WriteToLog(L"CreateEnvironmentBlock - failed. Error %d",GetLastError());
            continue;
        }
        std::wstring commandLine;
        commandLine.reserve(1024);
        commandLine += L"\"";
        commandLine += szCurModule;
        commandLine += L"\" \"";
        commandLine += SERVICE_COMMAND_LAUNCHER;
        commandLine += L"\"";
        WCHAR PP[1024]; //path and parameters
        ZeroMemory(PP, 1024 * sizeof WCHAR);
        wcscpy_s(PP, commandLine.c_str());

        // Next we impersonate - by starting the process as if the currently logged-in user had started it
        BOOL result = CreateProcessAsUserW(hUserToken,
            NULL,
            PP,
            NULL,
            NULL,
            FALSE,
            NORMAL_PRIORITY_CLASS | CREATE_NEW_CONSOLE,
            NULL,
            NULL,
            &StartupInfo,
            &processInfo);

        if (!result)
        {
            WriteToLog(L"CreateProcessAsUser - failed. Error %d",GetLastError());
        }
        else
        {
            WriteToLog(L"CreateProcessAsUser - success");
        }
        DestroyEnvironmentBlock(lpEnvironment);
        CloseHandle(hImpersonationToken);
        CloseHandle(hUserToken);
        CloseHandle(realToken);
        RevertToSelf();
    }
    WTSFreeMemory(pSession);
}

Finding the Logged-In User

In order to find the logged-in user’s account name, we use the following function:

std::wstring GetLoggedInUser()
{
    std::wstring user{L""};
    WTS_SESSION_INFO *SessionInfo;
    unsigned long SessionCount;
    unsigned long ActiveSessionId = -1;

    if(WTSEnumerateSessions(WTS_CURRENT_SERVER_HANDLE,
                            0, 1, &SessionInfo, &SessionCount))
    {
        for (size_t i = 0; i < SessionCount; i++)
        {
            if (SessionInfo[i].State == WTSActive)
            {
                ActiveSessionId = SessionInfo[i].SessionId;
                break;
            }
        }

        // Query the user name of the active session, if one was found
        LPWSTR pUserName = NULL;
        DWORD user_name_len = 0;
        if (ActiveSessionId != (unsigned long)-1 &&
            WTSQuerySessionInformationW(WTS_CURRENT_SERVER_HANDLE, ActiveSessionId,
                                        WTSUserName, &pUserName, &user_name_len))
        {
            user = pUserName;
            WTSFreeMemory(pUserName);
        }
        WTSFreeMemory(SessionInfo);
    }

    return user;
}

We use this function soon after the Service kicks in. As long as there is no user logged in, this function returns an empty string, and while it does, we know we should wait.
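For example, a simple polling loop (a sketch of our own, not taken from the original listings) could look like this:

// Sketch: block until some user has an active session.
std::wstring loggedInUser = GetLoggedInUser();
while (loggedInUser.empty())
{
    Sleep(1000);                       // check once per second
    loggedInUser = GetLoggedInUser();
}
WriteToLog(L"Active user: %s", loggedInUser.c_str());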

A Watchdog Is a Service’s Best Friend

Services are ideal for use along with a Watchdog mechanism.

Such a mechanism will ensure a given application is always running, and in case it shuts down abnormally, it will restart it. We always need to remember that the user may simply select Quit, and in such a case, we don’t want to restart the process. However, if the process is stopped via the Task Manager, or by any other means, we do want to restart it. A good example would be an Anti-Virus program: we want to make sure that malware is not able to terminate the Anti-Virus that is supposed to detect it.

To achieve that, we need the Service to provide some sort of an API to the program using it, so when the user of that program selects “Quit”, the program informs the Service that its job is done, and it can uninstall itself.
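The article does not show this API; one way to sketch it (an assumption on our side, not the product’s actual mechanism) is a named event that the host signals when the user deliberately quits, and that the Service checks before restarting the host:

#include <windows.h>

// Host side (sketch): signal the Service that the user chose Quit.
// The event name is illustrative.
void NotifyServiceUserQuit()
{
    HANDLE hQuitEvent = CreateEventW(NULL, TRUE, FALSE,
                                     L"Global\\SG_RevealerService_UserQuit");
    if (hQuitEvent)
    {
        SetEvent(hQuitEvent);
        CloseHandle(hQuitEvent);
    }
}

// Service side (sketch): before restarting the host, check whether the user quit on purpose.
bool DidUserQuit()
{
    HANDLE hQuitEvent = OpenEventW(SYNCHRONIZE, FALSE,
                                   L"Global\\SG_RevealerService_UserQuit");
    if (!hQuitEvent)
        return false;                  // event never created: the host did not quit itself
    bool quit = (WaitForSingleObject(hQuitEvent, 0) == WAIT_OBJECT_0);
    CloseHandle(hQuitEvent);
    return quit;
}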

Some Building Blocks

Next, we will explain some building blocks that are required to understand the code in this article.

GetExePath

In order to obtain the folder that contains our Service, or any other executable, the following function comes in handy.

/**
 * GetExePath() - returns the directory that contains the current executable,
 *                including the trailing path separator.
 *
 * @param values - none.
 * @return a std::wstring containing the directory of the current executable.
 */
std::wstring GetExePath()
{
    wchar_t buffer[65536];
    GetModuleFileName(NULL, buffer, sizeof(buffer) / sizeof(*buffer));
    int pos = -1;
    int index = 0;
    while (buffer[index])
    {
        if (buffer[index] == L'\\' || buffer[index] == L'/')
        {
            pos = index;
        }
        index++;
    }
    buffer[pos + 1] = 0;
    return buffer;
}

WriteLogFile

When developing a Windows Service (and any software, for that matter), it’s important to have a logging mechanism. We have a very complex logging mechanism, but for the purposes of this article, I added a minimal logging function named WriteToLog. It works like printf, but everything sent to it is not only formatted but also stored in a log file, which can later be checked. This log file grows as new log entries are appended to it.

The path of the log file would normally be the path of the Service’s EXE; however, due to Service Isolation, for a short while after rebooting the PC, the current directory changes to c:\Windows\System32, and we don’t want that. So our log function checks for the path of our EXE and does not assume the current directory will remain the same throughout the lifecycle of the Service.

/**
 * WriteToLog() - writes formatted text into a log file, and on screen (console)
 *
 * @param values - formatted text, such as L"The result is %d",result.
 * @return - none
 */
void WriteToLog(LPCTSTR lpText, ...)
{
    FILE *fp;
    wchar_t log_file[MAX_PATH]{L""};
    if(wcscmp(log_file,L"") == NULL)
    {
        wcscpy(log_file,GetExePath().c_str());
        wcscat(log_file,L"log.txt");
    }
    // find gmt time, and store in buf_time
    time_t rawtime;
    struct tm* ptm;
    wchar_t buf_time[DATETIME_BUFFER_SIZE];
    time(&rawtime);
    ptm = gmtime(&rawtime);
    wcsftime(buf_time, sizeof(buf_time) / sizeof(*buf_time), L"%d.%m.%Y %H:%M", ptm);

    // store the passed message (lpText) into buffer_in
    wchar_t buffer_in[BUFFER_SIZE];

    va_list ptr;
    va_start(ptr, lpText);

    vswprintf(buffer_in, BUFFER_SIZE, lpText, ptr);
    va_end(ptr);

    // build the output message in buffer_out: the timestamp followed by the formatted text
    wchar_t buffer_out[BUFFER_SIZE];

    swprintf(buffer_out, BUFFER_SIZE, L"%s %s\n", buf_time, buffer_in);

    _wfopen_s(&fp, log_file, L"a,ccs=UTF-8");
    if (fp)
    {
        fwprintf(fp, L"%s\n", buffer_out);
        fclose(fp);
    }
    wcscat(buffer_out, L"\n");
    HANDLE stdOut = GetStdHandle(STD_OUTPUT_HANDLE);
    if (stdOut != NULL && stdOut != INVALID_HANDLE_VALUE)
    {
        DWORD written = 0;
        WriteConsole(stdOut, buffer_out, wcslen(buffer_out), &written, NULL);
    }
}

More Building Blocks – Registry Stuff

Here are some functions we use to store the watched executable’s path in the registry, so when the Service restarts after a PC reboot or shutdown, it will have that path available.
 

BOOL CreateRegistryKey(HKEY hKeyParent, PWCHAR subkey)
{
    DWORD dwDisposition; //Verify new key is created or open existing key
    HKEY  hKey;
    DWORD Ret;
    Ret =
        RegCreateKeyEx(
            hKeyParent,
            subkey,
            0,
            NULL,
            REG_OPTION_NON_VOLATILE,
            KEY_ALL_ACCESS,
            NULL,
            &hKey,
            &dwDisposition);
    if (Ret != ERROR_SUCCESS)
    {
        WriteToLog(L"Error opening or creating new keyn");
        return FALSE;
    }
    RegCloseKey(hKey); //close the key
    return TRUE;
}

BOOL writeStringInRegistry(HKEY hKeyParent, PWCHAR subkey,
                           PWCHAR valueName, PWCHAR strData)
{
    DWORD Ret;
    HKEY hKey;
    //Check if the registry exists
    Ret = RegOpenKeyEx(
        hKeyParent,
        subkey,
        0,
        KEY_WRITE,
        &hKey
    );
    if (Ret == ERROR_SUCCESS)
    {
        if (ERROR_SUCCESS !=
            RegSetValueEx(
                hKey,
                valueName,
                0,
                REG_SZ,
                (LPBYTE)(strData),
                ((((DWORD)lstrlen(strData) + 1)) * 2)))
        {
            RegCloseKey(hKey);
            return FALSE;
        }
        RegCloseKey(hKey);
        return TRUE;
    }
    return FALSE;
}

LONG GetStringRegKey(HKEY hKey, const std::wstring &strValueName,
                     std::wstring &strValue, const std::wstring &strDefaultValue)
{
    strValue = strDefaultValue;
    TCHAR szBuffer[MAX_PATH];
    DWORD dwBufferSize = sizeof(szBuffer);
    ULONG nError;
    nError = RegQueryValueEx(hKey, strValueName.c_str(), 0, NULL,
             (LPBYTE)szBuffer, &dwBufferSize);
    if (nError == ERROR_SUCCESS)
    {
        strValue = szBuffer;
        if (strValue.front() == _T('"') && strValue.back() == _T('"'))
        {
            strValue.erase(0, 1); // erase the first character
            strValue.erase(strValue.size() - 1); // erase the last character
        }
    }
    return nError;
}

BOOL readStringFromRegistry(HKEY hKeyParent, PWCHAR subkey,
                            PWCHAR valueName, std::wstring& readData)
{
    HKEY hKey;
    DWORD len = 1024;
    DWORD readDataLen = len;
    PWCHAR readBuffer = (PWCHAR)malloc(sizeof(PWCHAR) * len);
    if (readBuffer == NULL)
        return FALSE;
    //Check if the registry exists
    DWORD Ret = RegOpenKeyEx(
        hKeyParent,
        subkey,
        0,
        KEY_READ,
        &hKey
    );
    if (Ret == ERROR_SUCCESS)
    {
        Ret = RegQueryValueEx(
            hKey,
            valueName,
            NULL,
            NULL,
            (BYTE*)readBuffer,
            &readDataLen
        );
        while (Ret == ERROR_MORE_DATA)
        {
            // Get a buffer that is big enough.
            len += 1024;
            readBuffer = (PWCHAR)realloc(readBuffer, len);
            readDataLen = len;
            Ret = RegQueryValueEx(
                hKey,
                valueName,
                NULL,
                NULL,
                (BYTE*)readBuffer,
                &readDataLen
            );
        }
        if (Ret != ERROR_SUCCESS)
        {
            RegCloseKey(hKey);
            return false;
        }
        readData = readBuffer;
        RegCloseKey(hKey);
        return true;
    }
    else
    {
        return false;
    }
}

Checking If Our Host Is Running

One key ability of the program in this article is to guard our SampleApp (which we call “the host”), and when it’s not running, restart it (hence the watchdog name). In real life, we would check if the host was terminated by the user, which is OK, or terminated by some malware (which isn’t OK), and in the latter case, restart it (otherwise, the user will select Quit, but the App would continue to “haunt” the system and be executed again and again).

Here is how it’s done:

We create a Timer event, and every given amount of time (it shouldn’t be too frequent) we check if the host’s process is running; if it isn’t, we start it. We use a static boolean flag (is_running) to indicate that we are already inside this block of code, so it won’t be re-entered while it is still being handled. This is something I always do in WM_TIMER code blocks because, when a timer is set at too high a frequency, the code block may be called while the code from the previous WM_TIMER event is still being executed.

We also check if a user is logged in by examining the g_bLoggedIn boolean flag.
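The WM_TIMER handler below assumes a periodic timer has already been armed; a typical setup (our assumption, since the original listings do not show it) would be something like:

#include <windows.h>

// Sketch: arm a periodic timer on the Service's hidden message window so that
// the WM_TIMER handler below runs every few seconds.
void StartWatchdogTimer(HWND hWnd)
{
    const UINT_PTR IDT_WATCHDOG = 1;            // illustrative timer id
    SetTimer(hWnd, IDT_WATCHDOG, 5000, NULL);   // 5000 ms between checks
}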

        case WM_TIMER:
        {
            if (is_running) break;
            WriteToLog(L"Timer event");
            is_running = true;
            HANDLE hProcessSnap;
            PROCESSENTRY32 pe32;
            bool found{ false };

            WriteToLog(L"Enumerating all processess...");
            // Take a snapshot of all processes in the system.
            hProcessSnap = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
            if (hProcessSnap == INVALID_HANDLE_VALUE)
            {
                WriteToLog(L"Failed to call CreateToolhelp32Snapshot(). Error code %d",GetLastError());
                is_running = false;
                return 1;
            }

            // Set the size of the structure before using it.
            pe32.dwSize = sizeof(PROCESSENTRY32);

            // Retrieve information about the first process,
            // and exit if unsuccessful
            if (!Process32First(hProcessSnap, &pe32))
            {
                WriteToLog(L"Failed to call Process32First(). Error code %d",GetLastError());
                CloseHandle(hProcessSnap);          // clean the snapshot object
                is_running=false;
                break;
            }

            // Now walk the snapshot of processes, and
            // display information about each process in turn
            DWORD svchost_parent_pid = 0;
            DWORD dllhost_parent_pid = 0;
            std::wstring szPath = L"";

            if (readStringFromRegistry(HKEY_LOCAL_MACHINE, (PWCHAR)SERVICE_REG_KEY, (PWCHAR)SERVICE_KEY_NAME, szPath))
            {
                m_szExeToFind = szPath.substr(szPath.find_last_of(L"/\\") + 1);    // The process name is the executable name only
                m_szExeToRun = szPath;                                            // The executable to run is the full path
            }
            else
            {
                WriteToLog(L"Error reading ExeToFind from the Registry");
            }

            do
            {
                if (wcsstr( m_szExeToFind.c_str(), pe32.szExeFile))
                {
                    WriteToLog(L"%s is running",m_szExeToFind.c_str());
                    found = true;
                    is_running=false;
                    break;
                }
                if (!g_bLoggedIn)
                {
                    WriteToLog(L"WatchDog isn't starting '%s' because user isn't logged in",m_szExeToFind.c_str());
                    return 1;
                }
            }
            while (Process32Next(hProcessSnap, &pe32));
            if (!found)
            {
                WriteToLog(L"'%s' is not running. Need to start it",m_szExeToFind.c_str());
                if (!m_szExeToRun.empty())    // watchdog start the host app
                {
                    if (!g_bLoggedIn)
                    {
                        WriteToLog(L"WatchDog isn't starting '%s' because user isn't logged in",m_szExeToFind.c_str());
                        return 1;
                    }
                    ImpersonateActiveUserAndRun();

                    RunHost((LPWSTR)m_szExeToRun.c_str(), (LPWSTR)L"");

                }
                else
                {
                    WriteToLog(L"m_szExeToRun is empty");
                }
            }
            CloseHandle(hProcessSnap);
        }
        is_running=false;
        break;

How to Test the Service

When we wanted to test the solution, we hired 20 qualified and cooperative testers. Throughout the progress of the work, more and more tests succeeded. At some point, it worked perfectly on our own Surface Pro laptops, but luckily, one of our employees reported that on his PC, after shutting it down, the service wasn’t coming up again, or came up without starting itself under Ring 3 (user mode). That’s good news, as during development, when you suspect a bug, the worst news is not being able to find or reproduce it. All in all, 10% of the testers reported a problem. The version posted here now works perfectly on that employee’s PC; however, 2% of the testers still report problems from time to time, i.e., SampleApp doesn’t start after shutting down the PC and turning it on.

Here are instructions for testing the service and the watchdog.

The SampleApp

We have included a sample application generated by the Visual Studio Wizard, as the “host” application that will be kept running by the watchdog. You can run it on its own and it should show up like in the image below. This application doesn’t do much. In fact, it doesn’t do anything …

In the following section, we will provide the instructions for testing the service and the watchdog. You can download the source code from GitHub.

Running from CMD

Open CMD as an Administrator. Change the current directory to where the Service’s EXE resides and type:

SG_RevealerService.exe Install#SampleApp.exe

As you can see, we have two elements:

  • The command, which is Install
  • The argument, which is attached to the command through a hash (#) and should be any executable you want your watchdog to watch.

The Service will first start SampleApp, and from that moment, if you try to terminate or kill SampleApp, the watchdog will restart it after a few seconds. If you reboot, or turn the PC off and on again, you can verify that the Service comes back and starts SampleApp again. That sums up the goal and functionality of our Service.

Uninstalling

Finally, to stop and uninstall the service we have included the uninstall.bat script, which goes like this:

sc stop sg_revealerservice
sc delete sg_revealerservice
taskkill /f /im sampleapp.exe
taskkill /f /im sg_revealerservice.exe

Conclusion

  • Windows Services play a key role in the Microsoft Windows operating system, and support the creation and management of long-running processes.
  • In some cases, when “Fast Startup” is checked, and the PC is started after a regular shutdown, services tend to fail to restart.
  • The aim of this article is to create a persistent service that will always run and restart after Windows restarts, or after shutdown.
  • One of the main issues relates to Service Isolation. The isolation itself (which was introduced in Windows Vista) is important and powerful, however, when we need to interact with the user’s space, that creates some limitations.
  • When a service restarts, we want it to interact with the user’s space, however it can’t be too early (before any user logs in). You can solve this problem though by subscribing to the logon event.
  • Services are ideal for use alongside a watchdog mechanism. Such a mechanism will ensure a given application is always running, and in case it shuts down abnormally, it will restart. We managed to develop that as well, based on the methods described earlier, which made it possible to always run, be alerted when users log in, and interact with user’s space.
  • A Timer event was used to monitor the operation of the watched process.
  • A good logging mechanism is always useful during development, using either a simple, or, when needed, a complex logging tool.
  • Testing the final solution is imperative. Once the code is checked and verified to work, up to 2% of testers might still report bugs, which is within reason.

The Importance of Pipeline Quality Gates and How to Implement Them

Key Takeaways

  • A quality gate is an enforced measure built into your pipeline that the software needs to meet before it can proceed to the next step
  • Security scans should be added as quality gates to your pipeline at the beginning of your project
  • While manual approval steps should be avoided, when needed, they should be built into the pipeline for improved accountability
  • While automated checks are preferred, you should build manual overrides for each gate to address urgent issues

  

There is no doubt that CI/CD pipelines have become a vital part of the modern development ecosystem that allows teams to get fast feedback on the quality of the code before it gets deployed. At least that is the idea in principle.

The sad truth is that too often companies fail to take advantage of the opportunity a CI/CD pipeline offers for rapid test feedback and good quality control, because they fail to implement effective quality gates in their pipelines.

What is a Quality Gate and Why Do You Need Them?

A quality gate is an enforced measure built into your pipeline that the software needs to meet before it can proceed to the next step. This measure enforces rules and best practices that the code needs to adhere to, preventing poor quality from creeping into the codebase.

It can also drive the adoption of test automation, as it requires testing to be executed in an automated manner across the pipeline.

This has the knock-on effect of reducing the need for manual regression testing in the development cycle, driving rapid delivery across the project.

These quality gates are typically automated, to allow for the pipeline to self-monitor the quality of the code delivered.

Still, it is possible to place a manual verification step into a CI/CD pipeline to prevent accidental errors or ensure certain measures have been properly signed off.

How a Typical Pipeline Looks with Quality Gates in Place

So, I’ve briefly explained the purpose of quality gates, but it perhaps makes more sense to describe how a quality gate will affect the structure of a pipeline and check for quality at the different stages of a project.

While pipelines will all be structured differently based on their purpose and the type of environments a team is working with, the following quality checks in a pipeline are helpful.

Stage 1 – Setup and Checkout:

The developer checks out the code. A setup file should be present to ensure that the environment is then built to be consistent for each developer. Part of this setup should include several linting standards that will also check that certain coding principles are being adhered to. This will prevent code from being deployed where it does not meet the appropriate linting standards.

Quality Check: Linting standards need to be met before the code build can be successful.

Stage 2 – Build:

Once a developer is ready to submit their code, the pipeline will take their code and build it in a container/mocked environment. This environment is where the unit tests will be run.

Stage 3 – Execute Unit Tests/CI Tests:

These tests include both the unit tests written by the developer for the modules under development and some broader component tests which will represent the execution across all the modules, but at a mocked level. These component tests are especially useful when developers have worked on different components of code in isolation and some additional automated tests are required to ensure correct interoperability between the modules.  

Quality Check: Check that the execution of unit tests meets pre-set criteria, e.g., 100% successful completion of all tests with 90% code coverage achieved at a unit testing level.

Stage 4 – Static Analysis:

The relevant static analysis scans and security scans are run against the code base to ensure that certain coding and security best practices have been adhered to.

Quality Check: Successful completion of scans with 100% code coverage.

Stage 5 – Environment readiness check

These are contract-level tests that will run to ensure that the test environment meets the required expectations of the deployment code. They could range from something as simple as getting a successful response from certain dependent applications and databases, to compliance or version checks of certain software and patches, to measuring existing server utilization to ensure it meets the requirements to successfully run the new change before deployment.

Stage 6 – Deployment to Test Env:

It is only at this point that the code is deployed into an integrated test environment, where it runs against other unmocked modules (actual deployed code sitting outside of the current repo’s domain), and where tests developed by the testing team cover a wider range of unmocked integration tests and end-to-end tests.

Stage 7 – Post-Deployment Checks (Smoke Testing):

These should be lightweight tests of the code to ensure that it is working effectively within the test environment. It will run tests against the newly deployed functionality to ensure it is working correctly and do a high-level regression (smoke testing) of other functionality to ensure nothing critical has been broken by the change. Should it fail here, the code is rolled back and the QA environment is restored.

Quality Check: Successful passing of all post-deployment and smoke tests.

Stage 8 – Automated Functional Integration Tests:

This is where the remainder of the automated tests identified by the testing team are executed. This will span a wider coverage of the codebase and should include some unmocked tests as well, with more realistic data that better resembles production.

Quality Check: All tests need to pass.

Stage 9 – Dynamic Code Analysis:

This is another scan that is run against live code (unlike the static analysis scans which are run against pre-deployed code) and provides an additional measure of quality and security checks. This includes checking SQL queries, long input strings (to exploit buffer overflow vulnerabilities), large numbers (to detect integer overflow and underflow vulnerabilities), and unexpected input data (to exploit invalid assumptions by developers). These are all vital checks that are best run against an actual working environment.

Quality Check: Successful completion of scans.

Stage 10 – Deploy to Staging:

Before this step, you may also want to run a readiness assessment like the one run in Stage 5 to ensure this environment is ready for deployment too; I have simply chosen not to repeat those steps here.

The code is then passed on to a staging environment, which is another integrated environment, but one that better reflects the state of production. Also, unlike test environments, which can be scaled up and down, this one should be permanently available and configured as the code would be in production (unless your actual production environment is scaled the same way). This is important because we want this environment to mimic production settings as closely as possible and be configured in the same way, to provide an accurate environment to test against.

Any final manual or automated validations can also be conducted at this time by the testing team. These won’t necessarily form part of the automated tests unless the testing team deems it necessary, though anything that can be automated should ideally be automated.

Post-Deployment Checks:

As was conducted against the QA environment, a set of post-deployment tests are run. This ensures that the staging environment is in the correct state. Smoke tests are then executed to ensure the deployed code is in a usable state.

Quality Check: Successful passing of all post-deployment and smoke tests.

Stage 11 – Non-Functional Test Execution:

It’s at this stage that all load, performance, and additional security tests are executed to ensure that the code meets all the required non-functional requirements (NFR) standards before being deployed into production.

Quality Check: Successful completion and passing of all NFR tests.

Once the code has passed all these stringent quality checks, it is deemed ready to be deployed to production.

What Does a Quality Gate Check

It all depends on how much information you have to make decisions upon. If you make use of the right data and can identify a reliable measurement based on that data, you can use it to build a quality gate into your system. However, some common quality gate criteria that should be considered are the following:

Quality Validation

This is the most obvious and common form of quality gate that teams make use of. Metrics from the test build artifacts such as pass rate or code coverage are measured and the code is deployed only if they are within required thresholds. In most pipelines, you will probably want to have multiple gates to assess quality across different layers of testing: unit, component, integration, and even any automated end-to-end (E2E) testing.

That automation element is important because any reliance on manual testing or manual processes will affect the speed of your pipeline and reduce its effectiveness. You will also want to have as many unit and component tests as possible, to reduce the execution times of the quality gates and provide quicker feedback.  

Security Scans on Artifacts

This is another quality gate that you want to build into your pipeline checks. Preferably from the beginning of your project. This quality gate requires security scans for things such as anti-virus checking, code signing, and policy checking to be set up against the code repo and for the scan results to then be analyzed and passed against a certain threshold before the code is deployed any further. A gate should initiate the scan and check for completion and success before moving the code to the next stage.

Along with searching for vulnerabilities in the code, you can also use this gate to check for outdated packages and add this to the scoring system. This will help to drive continued maintenance of the code to the latest versions and reduce future tech debt.

Infrastructure Health

This ensures that the environment you are intending to deploy into is in the right state to receive your code. Throughout your pipeline, you can monitor and validate the infrastructure against compliance rules after each deployment, or wait for health resource utilization to meet a certain preset requirement before continuing. These health parameters may vary over time, regularly changing their status from healthy to unhealthy and back to healthy. This is one of the reasons why routinely checking this can be useful as it prevents flakiness in deployments.

To account for such variations, all the gates are periodically re-evaluated until all of them are successful at the same time. The release execution and deployment do not proceed unless all gates succeed within the same evaluation interval and before the configured timeout.

Incident and Issue Management

Ensure that the required statuses for work items, incidents, and issues tracked by your sprint/defect management system are all in the correct state. For example, deployments should only occur if no priority-zero bugs exist. You should also validate that there are no active incidents after deployment.

Seeking Approvals Through External Users

Notify external users such as legal approval departments, auditors, or IT managers about a deployment by integrating with approval collaboration systems such as Microsoft Teams or Slack, and waiting for the approval to be complete. This might seem like an unnecessary step that would slow down the agility of a pipeline, but on certain work items, it could be essential.

For instance, I’ve worked on certain applications where specific releases needed to meet specific regulatory and legal requirements, and we had a legal person sign off on this functionality in test and give manual approval before it could be deployed. Many financial companies may have similar audit requirements that need to be met, depending on the functionality being worked on. When such approval is essential, it is important, for accountability, that it gets built into the pipeline process.

User Experience Relative to Baseline

Using product telemetry, ensure the user experience hasn’t regressed from the baseline state. The experience level before the deployment could be considered as the baseline.

Overriding any measures

While a lot of these gates are automated checks that will enforce the rules strictly, there will always be scenarios where you will want to deploy code to the next stage knowing it won’t be able to meet certain criteria, for example to address an urgent issue. You should build a manual deployment override that can bypass any or all steps, subject to verification by multiple people. Preferably they should not all belong to the same discipline, so that at least two people across development/business and testing need to agree on the decision.

How to Build Quality Gates into a Pipeline

So, we now know what quality gates do, how they change the structure of a typical pipeline, and what should be checked – but how do we build these quality gates into your pipeline?

A lot of this depends on the tools that are being used by a specific company, so I will present a few examples of quality gates being implemented in YAML that should be able to work with the most common CI/CD applications.

Checking Environments Before Deployment

What you want to do here is run a set of smoke tests against an environment and then stop the deployment if the smoke tests fail:

- name: Pre-deploy test
  task:
    jobs:
      - name: Server & database
        commands:
          - checkout
          - bash ./scripts/check-db-up.sh
          - bash ./scripts/check-server-up.sh

In this code example, some bash scripts have been developed to check the state of a server and DB before the deployment scripts are executed. If those commands return failures, then the deployment script does not run.

Similarly, the smoke tests could be placed after the code is deployed to ensure that the systems are still operational post-deploy.

- name: Post-deploy test
  task:
    jobs:
      - name: Smoke test
        commands:
          - checkout
          - bash ./scripts/check-app-up.sh

In these two examples, I’m simply calling shell scripts that do basic environment checks for us, but you can easily call an actual suite of tests here to verify the health of your environment pre- and post-deployment.

We are not interested in trying to measure any form of coverage here; instead, if any of the tests fail, the code should not deploy. And so, the trick is to build break commands within the tests themselves that will prevent the tests from finishing should a failure occur.

Measuring Code Coverage and Pass Rates Before Deployment

Now depending on the code coverage tool you are using it might represent results differently, so you should set up your tool in a way that meets your needs and then adjust your pipeline accordingly.

 # ReportGenerator extension to combine code coverage outputs into one      
      - task: reportgenerator@4
        inputs:
          reports: '$(Agent.TempDirectory)/*/coverage.cobertura.xml'
          targetdir: '$(Build.SourcesDirectory)/CoverageResults'
 
      # Publish code coverage report to the pipeline
      - task: PublishCodeCoverageResults@1
        displayName: 'Publish code coverage'
        inputs:
          codeCoverageTool: Cobertura
          summaryFileLocation: '$(Build.SourcesDirectory)/CoverageResults/Cobertura.xml'
          reportDirectory: '$(Build.SourcesDirectory)/CoverageResults'
       
      - task: davesmits.codecoverageprotector.codecoveragecomparerbt.codecoveragecomparerbt@1
        displayName: 'Compare Code Coverage'
        inputs:
          codecoveragetarget: 90
 
      - task: CopyFiles@2
        displayName: 'Copy coverage results'
        inputs:
          SourceFolder: '$(Build.SourcesDirectory)/CoverageResults'
          Contents: '**'
          TargetFolder: '$(Build.ArtifactStagingDirectory)/CoverageResults'

In this sample code, we make use of a specific code coverage tool (Cobertura) together with the ReportGenerator task to combine the results from different code coverage scans, and we have set a soft target of 90% for the coverage to meet. The results are then published so that the development team can analyze and improve them where necessary.

Ensuring Security Scans are Passed Before Deployment

This code looks at the results of a security scan and determines if the pipeline passes or fails based on a scoring ratio.
For this code example, I will show you first a YAML file that pulls the code from the repo, builds it, and then executes the security scan on it (in this case, an Aqua Security scan). However, all this will do is execute the security scan and unless the scan itself fails for any particular reason, the code will still pass.

steps:

  main_clone:
    title: Cloning main repository...
    type: git-clone
    repo: '${{CF_REPO_OWNER}}/${{CF_REPO_NAME}}'
    revision: '${{CF_REVISION}}'
    stage: prepare
  build:
    title: "Building Docker Image"
    type: "build"
    image_name: "${{CF_ACCOUNT}}/${{CF_REPO_NAME}}"
    tag: ${{CF_REVISION}}
    dockerfile: "Dockerfile"
    stage: "build"
  AquaSecurityScan:
    title: 'Aqua Private scan'
    image: codefresh/cfstep-aqua
    stage: test
    environment:
      - 'AQUA_HOST=${{AQUA_HOST}}'
      - 'AQUA_PASSWORD=${{AQUA_PASSWORD}}'
      - 'AQUA_USERNAME=${{AQUA_USERNAME}}'
      - IMAGE=${{CF_ACCOUNT}}/${{CF_REPO_NAME}}
      - TAG=${{CF_REVISION}}
      - REGISTRY=codefresh

To ensure we stop the deployment from going ahead if the scan flags any vulnerabilities, we can write a scan policy, like the below example, that will fail or pass based on the results.

$ kubectl apply -f - -o yaml << EOF
> ---
> kind: ScanPolicy
> metadata:
>   name: scan-policy
> spec:
>   regoFile: |
>     package policies
>
>     default isCompliant = false
>
>     # Accepted Values: "Critical", "High", "Medium", "Low", "Negligible", "UnknownSeverity"
>     violatingSeverities := ["Critical","High","UnknownSeverity"]
>     ignoreCVEs := []
>
>     contains(array, elem) = true {
>       array[_] = elem
>     } else = false { true }
>
>     isSafe(match) {
>       fails := contains(violatingSeverities, match.Ratings.Rating[_].Severity)
>       not fails
>     }
>
>     isSafe(match) {
>       ignore := contains(ignoreCVEs, match.Id)
>       ignore
>     }
>
>     isCompliant = isSafe(input.currentVulnerability)
> EOF

It’s important to remember here that the results are dependent on the scanning tool itself and its configuration.

Proactive Quality

Every team wants to release reliable, quality code and find the right balance between the testing effort and the ability to deliver and deploy code quickly. By utilizing quality gates in your CI/CD pipelines, your team can take control of its QA and testing process and build confidence, knowing that code is suitably tested across multiple testing disciplines before it is deployed into production.
 

PHP 8 – Functions and Methods

Key Takeaways

  • PHP 8.1 simplifies the syntax for creating a callable to AnyCallableExpression(...).
  • PHP 8.0 introduces named function parameters in addition to positional arguments.
  • PHP 8.1 introduces Fibers as interruptible functions to facilitate multi-tasking.
  • PHP 8 adds new standard library functions, including __toString(), and sets new requirements on the use of magic methods.
  • Private methods may be inherited and reimplemented without any restrictions, except for private final constructors that must be kept as such. 

This article is part of the article series “PHP 8.x”. You can subscribe to receive notifications about new articles in this series via RSS.

PHP continues to be one of the most widely used scripting languages on  the web with 77.3% of all the websites whose server-side programming language is known using it according to w3tech. PHP 8 brings many new features and other improvements, which we shall explore in this article series.

PHP 8.0 adds support for several function- and method-related features, some of which improve existing features, while others are completely new. The enhanced callable syntax in PHP 8.1 can be used to create anonymous functions from a callable. Named function arguments may be used along with positional arguments, with the added benefit that named arguments are not ordered and can convey meaning by their name. Fibers are interruptible functions that add support for multitasking.

Inheritance on private methods is redefined

Object inheritance is a programming paradigm that is used by most object-oriented languages, including PHP. It makes it possible to override public and protected methods, as well as class properties and constants, defined in a class from any class that extends it. In PHP, public methods cannot be reimplemented with more restrictive access, such as by making a public method private. To demonstrate this, consider a class B that extends class A and reimplements a public method from A.

<?php
class A {
    public function sortArray($arrayToSort) {
        sort($arrayToSort);
    }
}

class B extends A {
    private function sortArray($arrayToSort) {
        sort($arrayToSort);
    }
}

$b = new B();
$b->sortArray(array("B", "A", "f", "C"));

When run, the script generates an error message:

Fatal error: Access level to B::sortArray() must be public (as in class A) 
public method cannot be reimplemented.

On the contrary, private methods defined in a class are not inherited and can be reimplemented in a class that extends it.   As an example, class B extends class A in the following script and reimplements a private method from A.

Prior to PHP 8.0, two restrictions applied to private method redeclaration in an extending class: the final and static modifiers were not allowed to be changed. If a private method was declared final, an extending class was not allowed to redeclare the method. If a private method was declared static, it had to be kept static in an extending class; and if a private method did not have the static modifier, an extending class was not allowed to add one. Both restrictions have been lifted in PHP 8. The following script runs OK in PHP 8.

The only private method restriction in PHP 8 is to enforce private final constructors, which are sometimes used to disable the constructor when using static factory methods as a substitute.  
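
A minimal sketch with a static factory method, as mentioned above (names are illustrative):

<?php
class A {
    private final function __construct() {
    }

    public static function create(): static {
        return new static();
    }
}

class B extends A {
    // Not allowed: a private final constructor must be kept as such
    private function __construct() {
    }
}
?>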

The script generates an error message:

Fatal error: Cannot override final method A::__construct() 

A variadic argument may replace any number of function arguments

In PHP 8, a single variadic argument may replace any number of function arguments.  Consider the following script in which class B extends class A and replaces the three-argument function sortArray with a single variadic argument. 

<?php
class A {

    public function sortArray($arrayToSort, $sortType, $arraySize) {

       if ($sortType == "asc") {
             sort($arrayToSort);
             foreach ($arrayToSort as $key => $val) {
                echo "$key = $val ";
             }  
       } elseif ($sortType == "desc") {
             rsort($arrayToSort);
             foreach ($arrayToSort as $key => $val) {
                echo "$key = $val ";
             }  
       }  
   }
}
class B extends A {

    public function sortArray(...$multiple) {

        $arrayToSort= $multiple[0];
        $sortType=$multiple[1];
    
    if ($sortType == "asc") {
             sort($arrayToSort);
             foreach ($arrayToSort as $key => $val) {
                echo "$key = $val ";
             }  
        } elseif ($sortType == "desc") {
             rsort($arrayToSort);
             foreach ($arrayToSort as $key => $val) {
                echo "$key = $val ";
             }  
        }  
   }
}

The sortArray function in class B  may be called using multiple arguments.

$sortType="asc";
$arrayToSort=array("B", "A", "f", "C");
$arraySize=4;
 
$b=new B();
$b->sortArray($arrayToSort,$sortType,$arraySize);

The output is as follows:

0 = A 1 = B 2 = C 3 = f 

Simplified Callable Syntax

A callable is a PHP expression that can be called, such as an instance method, a static method, or an invocable object. A callable can be used to create a short-form expression for a method call, for example. PHP 8.1 adds a new callable syntax:

AVariableCallableExpression(...)

AVariableCallableExpression represents any variable callable expression. The ellipsis (...) is part of the syntax.

Why a new callable syntax? Let's recall what callables created from the traditional string and array forms look like:

$f1 = 'strlen'(...);
$f2 = [$someobj, 'somemethod'](...);
$f3 = [SomeClass::class, 'somestaticmethod'](...);

 This has two issues:

  1. The syntax involves strings and arrays
  2. The scope is not maintained at the point at which the callable is created.

To demonstrate this, consider the following script for sorting an array, in which the getSortArrayMethod() method returns a callable for the sortArray() method with return [$this, 'sortArray']:

<?php
class Sort {

    private $arrayToSort;
    private $sortType;

    public function __construct($arrayToSort, $sortType) {
        $this->arrayToSort = $arrayToSort;
        $this->sortType = $sortType;
    }

    public function getSortArrayMethod() {
             return [$this, 'sortArray'];
         
    }

    private function sortArray() {
        if ($this->sortType == "Asc") {
             sort($this->arrayToSort);
             foreach ($this->arrayToSort as $key => $val) {
                echo "$key = $val ";
             }  
        } elseif ($this->sortType == "Desc") {
             rsort($this->arrayToSort);
             foreach ($this->arrayToSort as $key => $val) {
                echo "$key = $val ";
             }  
        } else {
              
             shuffle($this->arrayToSort);
             foreach ($this->arrayToSort as $key => $val) {
                echo "$key = $val ";
             }  
        }
    }
}

$sortType="Asc";
$arrayToSort=array("B", "A", "f", "C");
$sort = new Sort($arrayToSort,$sortType);
$c = $sort->getSortArrayMethod();
$c();

The script generates an error message:

Fatal error: Uncaught Error: Call to private method Sort::sortArray()
 from global scope

Using Closure::fromCallable([$this, 'sortArray']) instead of  [$this, 'sortArray'] would fix the scope issue, but using the Closure::fromCallable method makes the call verbose. The new callable syntax fixes both the scope and syntax verbosity issues. With the new callable syntax, the function becomes:

public function getSortArrayMethod() {

       return $this->sortArray(...);         
}

The array gets sorted with output:

0 = A 1 = B 2 = C 3 = f

The new syntax can be combined with the traditional syntax involving strings and arrays to fix the scope issue. The scope at which the callable is created is kept unchanged.

public  function getSortArrayMethod() {
        return [$this, 'sortArray'](...);       
            
}

The new callable syntax may be used with static methods as well, as demonstrated by the following script that includes a static function. 

<?php
class Sort {

    private $arrayToSort;
    private $sortType;

    public function __construct($arrayToSort, $sortType) {
        $this->arrayToSort = $arrayToSort;
        $this->sortType = $sortType;
    }

    public  function getStaticMethod() {
       
       return Sort::aStaticFunction(...); 
        
    }
    
    private  static  function aStaticFunction() {

    }
}

$sortType="Asc";
$arrayToSort=array("B", "A", "f", "C");
$sort = new Sort($arrayToSort,$sortType);
$cStatic=$sort->getStaticMethod();
$cStatic();

The callable is created and invoked without error; because aStaticFunction() has an empty body in this example, it produces no output of its own.

The following are equivalent ways of creating a callable for an instance method:

return $this->sortArray(...); 
return Closure::fromCallable([$this, 'sortArray']);
return [$this, 'sortArray'](...); 

The following are equivalent ways of creating a callable for a static method:

return Sort::aStaticFunction(...); 
return [Sort::class, 'aStaticFunction'](...); 
return Closure::fromCallable([Sort::class, 'aStaticFunction']);

 The new callable syntax may be used even if a function declares parameters.

<?php
class Sort {

    private $arrayToSort;
    private $sortType;

    public function __construct($arrayToSort, $sortType) {
        $this->arrayToSort = $arrayToSort;
        $this->sortType = $sortType;
    }

    public  function getSortArrayMethod() {

       return $this->sortArray(...); 
    }

    private function sortArray(int $a,string $b) {
        if ($this->sortType == "Asc") {
             sort($this->arrayToSort);
             foreach ($this->arrayToSort as $key => $val) {
                echo "$key = $val ";
             }  
        } elseif ($this->sortType == "Desc") {
             rsort($this->arrayToSort);
             foreach ($this->arrayToSort as $key => $val) {
                echo "$key = $val ";
             }  
        } else {
              
             shuffle($this->arrayToSort);
             foreach ($this->arrayToSort as $key => $val) {
                echo "$key = $val ";
             }  
        }
    }
}

A callable must be called with its arguments if the method declares any.

$sortType="Asc";
$arrayToSort=array("B", "A", "f", "C");
$sort = new Sort($arrayToSort,$sortType);
$c = $sort->getSortArrayMethod();
$c(1,"A");

Simplified syntax can be used with any PHP callable expression

The simplified callable syntax may be used with any PHP callable expression. It is not supported with the new operator for object creation, because the syntax AVariableCallableExpression(...) provides no way to specify constructor arguments, which could be required. The following is an example that is not supported:

$sort = new Sort(...);

An error message is generated:

Fatal error: Cannot create Closure for new expression

The following script demonstrates the full range of callable expressions that are supported.

<?php
class Sort {

    private $arrayToSort;
    private $sortType;

    public function __construct($arrayToSort, $sortType) {
        $this->arrayToSort = $arrayToSort;
        $this->sortType = $sortType;
    }

    public  function getSortArrayMethod() {

       return $this->sortArray(...); 
    }

    public  function getStaticMethod() {
       
      return Sort::aStaticFunction(...);   
         
    }
    
    public  static  function aStaticFunction() {

    }
     
    public    function sortArray(int $a,string $b) {
        if ($this->sortType == "Asc") {
             sort($this->arrayToSort);
             foreach ($this->arrayToSort as $key => $val) {
                echo "$key = $val ";
             }  
        } elseif ($this->sortType == "Desc") {
             rsort($this->arrayToSort);
             foreach ($this->arrayToSort as $key => $val) {
                echo "$key = $val ";
             }  
        } else {
              
             shuffle($this->arrayToSort);
             foreach ($this->arrayToSort as $key => $val) {
                echo "$key = $val ";
             }  
        }
    }
    public function __invoke() {}
}

$sortType="Asc";
$arrayToSort=array("B", "A", "f", "C");

 
$classStr = 'Sort';
$staticmethodStr = 'aStaticFunction';
$c1 = $classStr::$staticmethodStr(...);
$methodStr = 'sortArray';

$sort = new Sort($arrayToSort,$sortType);
$c2 = strlen(...);
$c3 = $sort(...);  // invokable object
$c4 = $sort->sortArray(...);
$c5 = $sort->$methodStr(...);
$c6 = Sort::aStaticFunction(...);
$c7 = $classStr::$staticmethodStr(...);

// traditional callable using string, array
$c8 = 'strlen'(...);
$c9 = [$sort, 'sortArray'](...);
$c10 = [Sort::class, 'aStaticFunction'](...); 
$c11 = $sort->getSortArrayMethod();
$c11(1,"A");
$cStatic=$sort->getStaticMethod();
$cStatic();
 

Trailing comma and optional/required arguments order 

Another new feature in PHP 8.0 is support for a trailing comma at the end of a function's parameter list to improve readability. Any trailing comma is ignored. A trailing comma is most useful when the parameter list is long, or when the parameter names are long enough that it is convenient to list them vertically. A trailing comma is also supported in closure use lists.

PHP 8.0 deprecates declaring optional arguments before required arguments. Optional arguments declared before required arguments are implicitly required. 

The following script demonstrates the required parameters implicit order in addition to the use of trailing comma. 
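
A minimal sketch consistent with the deprecation notice shown below; the function name and the first two parameter names are assumptions:

<?php
function aFunction(
    $the_first_arg_of_this_function,
    $the_second_arg_of_this_function,
    $the_third_arg_of_this_function = 3,   // optional, declared before a required parameter
    $the_last_arg_of_this_function,        // required; the trailing comma after this parameter is allowed in PHP 8.0
) {
}
?>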


The output is as follows:

Deprecated: Optional parameter $the_third_arg_of_this_function declared before required parameter $the_last_arg_of_this_function is implicitly treated as a required parameter

Nullable parameters are not considered optional parameters and may be declared before required parameters, either using the $param = null form or an explicit nullable type, as in the following script:
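
A minimal sketch of the $param = null form (names are illustrative):

<?php
// No deprecation notice: the null default makes $an_optional_arg implicitly nullable
function aFunction(string $an_optional_arg = null, string $a_required_arg) {
}
?>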

 

Named Function Parameters and Arguments

PHP 8.0 adds support for named function parameters and arguments besides already supported positional parameters and arguments. Named arguments are passed in a function call with the following syntax:

Argument_name:Argument_value

Some of the benefits of named arguments are the following:

  • Function parameters may be given a meaningful name to make them self-documenting
  • Arguments are order-independent when passed by name
  • Default values may be skipped arbitrarily.

In the following script, the array_hashtable function declares named parameters. The function may be passed argument values with or without argument names. When positional arguments are passed, the function parameters declaration order is used. When named arguments are passed, any arbitrary order may be used.

";
}
// Using positional arguments:
array_hashtable(0, 10, 50, 20, 25);

// Using named arguments: 

array_hashtable(key2: 0, key5: 25, key1: 10, key4: 50, key3: 20);
?>

The output is:

0 10 50 20 25
10 0 20 50 25

Named arguments and positional arguments may be used in the same function call. The following mixed-arguments call uses the same example function array_hashtable.

";
}
// Using mixed arguments:
array_hashtable(0, 10, 50, key5: 25, key4: 20);
 
?>

The output is:

0 10 50 20 25

Notice that named arguments are used only after positional arguments. The following script reverses the order and uses positional  arguments after named arguments:

";
}
// Using mixed arguments:
array_hashtable(0, 10, key3: 25, 50, key5: 20);
 
?>

The script generates an error message:

Fatal error: Cannot use positional argument after named argument 

Declaring optional arguments before required arguments is deprecated even with named arguments, as demonstrated by the following script:

";
}
// Using mixed arguments:
array_hashtable(1,2,key3: 25, key4: 1,key5: 20);
 
?>

The output includes the deprecation messages:

Deprecated: Optional parameter $key1 declared before required parameter $key5 is implicitly treated as a required parameter 
Deprecated: Optional parameter $key2 declared before required parameter $key5 is implicitly treated as a required parameter 
Deprecated: Optional parameter $key3 declared before required parameter $key5 is implicitly treated as a required parameter 

When optional parameters are declared after required parameters, named arguments may be used to skip over one or more optional parameters in a function call, as in the script:

";
}
// Using mixed arguments:
array_hashtable(key1:1, key2:2,key4: 25);
 
?>

The output is:

1 2 20 25 10

You may call a function with only a subset of its optional arguments, regardless of their order.

";
}
// Using mixed arguments:
array_hashtable(1,2,key4: 25);
 
?>

Output is as follows:

1 2 20 25 10

Even when calling a function with a subset of its optional arguments, positional arguments cannot be used after named arguments, as demonstrated by script:

";
}
// Using mixed arguments:
array_hashtable(1,2,key4: 25,5);
 
?>

The following error message is produced:

Fatal error: Cannot use positional argument after named argument

PHP 8.1 improves on the named arguments feature by supporting named arguments after argument unpacking, as in the script:

";
}
echo array_hashtable(...[10, 20], key5: 40);  
echo array_hashtable(...['key2' => 2, 'key1' => 2], key4: 50);  
 
?>

The output is as follows:

10 20 30 40 40
2 2 30 50 50

However, a named argument must not overwrite an earlier argument, as demonstrated by the following script:

";
}
echo array_hashtable(...[10, 20], key2: 40);   
?>

Output is as follows:

Fatal error: Uncaught Error: Named parameter $key2 overwrites previous argument.

Non-static method cannot be called statically

Prior to PHP 8.0, if you called a non-static method in a static context, or statically, you only got a deprecation message. With 8.0 you now get an error message. Also, $this is undefined in a static context. To demonstrate this, consider the following script in which a non-static method aNonStaticMethod() is called with the static syntax A::aNonStaticMethod():
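
A minimal sketch of such a script:

<?php
class A {
    public function aNonStaticMethod() {
    }
}

// Calling a non-static method statically throws an Error in PHP 8
A::aNonStaticMethod();
?>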

If you run the script, you would get an error message:

Uncaught Error: Non-static method A::aNonStaticMethod() cannot be called statically

Fibers

PHP 8.1 adds support for multi-tasking with Fibers. A Fiber is an interruptible function with a stack of its own. A Fiber may be suspended from anywhere in the call stack, and resumed later. The new Fiber class is a final class that supports the following public methods:

  • __construct(callable $callback) – Creates a new Fiber instance. The parameter is the callable to invoke when starting the fiber. Arguments given to Fiber::start() are provided as arguments to that callable. The callable need not call Fiber::suspend() at all, or may call it only indirectly; the call to Fiber::suspend() may be deeply nested down the call stack.
  • start(mixed ...$args): mixed – Starts the fiber. Returns when the fiber suspends or terminates. The variadic argument list is passed to the callable given when constructing the fiber. Returns the value passed to the first suspension point, or NULL if the fiber returns. Throws a FiberError if the fiber has already been started.
  • resume(mixed $value = null): mixed – Resumes the fiber; the given value is returned from Fiber::suspend() inside the fiber. Returns when the fiber next suspends or terminates, yielding the value passed to the next suspension point, or NULL if the fiber returns. Throws a FiberError if the fiber has not started, is running, or has terminated.
  • throw(Throwable $exception): mixed – Throws the given exception into the fiber from the pending Fiber::suspend() call. Returns when the fiber next suspends or terminates, yielding the value passed to the next suspension point, or NULL if the fiber returns. Throws a FiberError if the fiber has not started, is running, or has terminated.
  • getReturn(): mixed – Gets the return value of the fiber callback; NULL if the fiber does not have a return statement. Throws a FiberError if the fiber has not terminated or if it threw an exception.
  • isStarted(): bool – Returns true if the fiber has been started.
  • isSuspended(): bool – Returns true if the fiber is currently suspended.
  • isRunning(): bool – Returns true if the fiber is currently running.
  • isTerminated(): bool – Returns true if the fiber has terminated.
  • static suspend(mixed $value = null): mixed – Suspends the fiber, returning execution to the call to Fiber::start(), Fiber::resume(), or Fiber::throw() that is currently running it; that call returns the given value. The fiber may later be resumed with Fiber::resume() or Fiber::throw(), and the value returned by suspend() itself is the one provided to Fiber::resume(). Throws a FiberError if called from outside a fiber, i.e., from the main thread.
  • static getCurrent(): ?Fiber – Returns the currently executing fiber instance, or NULL when called from the main thread.

A Fiber may be started only once, but may be suspended and resumed multiple times.  The following script demonstrates multitasking by using a Fiber to perform different types of sorts on an array. The Fiber is suspended after each sort, and resumed later to perform a different type of sort. 

<?php
$fiber = new Fiber(function ($arr) {
   sort($arr);
   foreach ($arr as $key => $val) {
      echo "$key = $val ";
   }
   echo "<br/>";
   Fiber::suspend();

   rsort($arr);
   foreach ($arr as $key => $val) {
      echo "$key = $val ";
   }
   echo "<br/>";
   Fiber::suspend();

   shuffle($arr);
   foreach ($arr as $key => $val) {
      echo "$key = $val ";
   }
});

$arrayToSort = array("B", "A", "f", "C");
$value = $fiber->start($arrayToSort);
$fiber->resume();
$fiber->resume();
?>

The output is as follows:

0 = A 1 = B 2 = C 3 = f
0 = f 1 = C 2 = B 3 = A
0 = C 1 = f 2 = A 3 = B

If the Fiber is not resumed after the first suspension, only one type of sort is performed. This can be demonstrated by commenting out the two calls to resume():

//$fiber->resume();

//$fiber->resume(); 

Output is the result from the first sort:

0 = A 1 = B 2 = C 3 = f

Stringable interface and __toString()

PHP 8.0 introduces a new interface called Stringable that provides only one method, __toString(). A __toString() method provided in a class implicitly implements the Stringable interface. Consider the class A that provides a __toString() method:
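
A minimal sketch of such a class, with a type check for Stringable (the returned string is illustrative):

<?php
class A {
    public function __toString(): string {
        return "A";
    }
}

// Prints 1: A implicitly implements Stringable
echo new A() instanceof Stringable;
?>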

The script returns 1 from the type check for Stringable.

The reverse is, however, not true: if a class implements the Stringable interface, it must explicitly provide the __toString() method, as the method is not added automatically, as in:
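
A sketch of such a class (the class name and return value are illustrative):

<?php
class B implements Stringable {
    // __toString() must be provided explicitly; implementing Stringable does not add it
    public function __toString(): string {
        return "B";
    }
}
?>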

New standard library functions

PHP 8 introduces a number of  new functions belonging to its standard library. 

The str_contains function returns a bool to indicate if the string given as the first argument contains the string given as the second argument. The following script returns false:
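
A minimal sketch (the string values are illustrative):

<?php
if (str_contains('haystack', 'needle')) {
    echo true;
} else {
    echo false;
}
?>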

And the following script returns 1, or true:
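
For example, checking for a substring that is present:

<?php
if (str_contains('haystack', 'hay')) {
    echo true;
} else {
    echo false;
}
?>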

The str_starts_with function returns a bool to indicate if the string given as the first argument starts with the string given as the second argument. The following script returns false.
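
A sketch with a prefix that does not match:

<?php
if (str_starts_with('haystack', 'stack')) {
    echo true;
} else {
    echo false;
}
?>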

And the following script returns 1, or true.
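
And with a matching prefix:

<?php
if (str_starts_with('haystack', 'hay')) {
    echo true;
} else {
    echo false;
}
?>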

The str_ends_with function returns a bool to indicate if the string given as the first argument ends with the string given as the second argument. The following script returns false.


if (str_ends_with('haystack', 'needle')) {
    echo true;
} else {
    echo false;
}

And the following script returns 1, or true.
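
A sketch with a matching suffix:

<?php
if (str_ends_with('haystack', 'stack')) {
    echo true;
} else {
    echo false;
}
?>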

The fdiv function divides two numbers and returns a float value, as demonstrated by the script:
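
A sketch with operand pairs chosen to be consistent with the output shown below (the exact operands are assumptions):

<?php
var_dump(fdiv(15, 13));      // a regular float quotient
var_dump(fdiv(10, 2));
var_dump(fdiv(10, 0));       // division by zero yields INF ...
var_dump(fdiv(-10, 0));      // ... or -INF ...
var_dump(fdiv(0, 0));        // ... or NAN, instead of an error
var_dump(fdiv(10.0, 2.0));
var_dump(fdiv(-10, -2));
?>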

The output is:

float(1.1538461538461537)
float(5) 
float(INF) 
float(-INF)
float(NAN)
float(5)
float(5)

The fdatasync function, aliased to fsync on Windows, synchronizes buffered data written to a file stream to the underlying storage. To demonstrate its use, create an empty file test.txt in the directory that contains the PHP scripts to run. Run the script:
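
A minimal sketch, assuming the file is opened in write mode:

<?php
$stream = fopen('test.txt', 'w');

fwrite($stream, 'first line of data' . PHP_EOL);
fwrite($stream, 'second line of data' . PHP_EOL);
fwrite($stream, 'third line of data' . PHP_EOL);

// Flush the written data to the underlying storage
fdatasync($stream);

fclose($stream);
?>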

Subsequently, open the test.txt file to find the text:

first line of data
second line of data
third line of data

The array_is_list function returns a bool to indicate whether a given array is a list: its keys must be consecutive integers starting at 0, in that order. The following script demonstrates the array_is_list function:

<?php
// the first two calls are assumed; together with the third they produce the three 1s below
echo array_is_list([]);                 // true
echo array_is_list(['a', 'b']);         // true
echo array_is_list([0 => 'a', 'b']);    // true

echo array_is_list([1 => 'a', 'b']); // false
echo array_is_list([1 => 'a', 0 => 'b']); // false
echo array_is_list([0 => 'a', 'b' => 'b']); // false
echo array_is_list([0 => 'a', 2 => 'b']); // false

The  output is:

1
1
1

Magic methods must have correct signatures  

Magic methods are special methods in PHP that override default actions. They include the following methods, of which the constructor __construct() may be the most familiar:

__construct(), __destruct(), __call(), __callStatic(), __get(), __set(), __isset(), __unset(), __sleep(), __wakeup(), __serialize(), __unserialize(), __toString(), __invoke(), __set_state(), __clone(), and __debugInfo(). 

As of PHP 8.0, the signatures of magic method definitions are checked: if type declarations are used for parameters or the return type, they must match the documented signatures. For example, __toString(), if it declares a return type at all, must declare string. To demonstrate this, declare the return type as int:
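
A minimal sketch of such a class:

<?php
class A {
    // Not allowed: __toString() must declare string if it declares a return type
    public function __toString(): int {
        return 1;
    }
}
?>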

An error message is generated:

Fatal error: A::__toString(): Return type must be string when declared

However, methods that by definition never return a value, such as the constructor, must not declare a return type, not even void. The following script is an example:
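
A minimal sketch:

<?php
class A {
    // Not allowed: constructors may not declare a return type
    public function __construct(): void {
    }
}
?>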

The script returns an error message:

Fatal error: Method A::__construct() cannot declare a return type

All magic methods, with a few exceptions such as __construct(), must be declared with public visibility. To demonstrate this, declare __callStatic() with private visibility:
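
A minimal sketch:

<?php
class A {
    // Triggers a warning: magic methods must be public
    private static function __callStatic($name, $arguments) {
    }
}
?>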

 

A warning message is output:

Warning: The magic method A::__callStatic() must have public visibility

Even though it is OK to omit the mixed return type, the rest of the method signature must match. For example, in the following script, class A declares __callStatic() without specifying its return type, which is fine, while class B declares its first parameter as an int:
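
A minimal sketch of the two classes:

<?php
class A {
    // OK: the mixed return type may be omitted
    public static function __callStatic(string $name, array $arguments) {
    }
}

class B {
    // Not allowed: the first parameter must be of type string
    public static function __callStatic(int $name, array $arguments) {
    }
}
?>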

An error message is output:

Fatal error: B::__callStatic(): Parameter #1 ($name) must be of type string when declared

Return Type Compatibility with Internal Classes 

With PHP 8.1, most internal methods, which are the methods of internal classes, have “tentatively” started to declare a return type. Tentatively implies that while in 8.1 only a deprecation notice is raised, in version 9.0 an error will be raised. Thus, any extending class must declare a return type that is compatible with the internal class, or a deprecation notice is issued. To demonstrate this, extend the internal class Directory and redeclare the function read() without a return type:
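
A minimal sketch:

<?php
class A extends Directory {
    // No return type declared: triggers the deprecation notice shown below
    public function read() {
    }
}
?>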

The script generates a deprecation notice:

Deprecated: Return type of A::read() should either be compatible with Directory::read(): string|false, or the #[ReturnTypeWillChange] attribute should be used to temporarily suppress the notice

The following script, however, is OK:
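
A minimal sketch with a compatible return type (the body is illustrative):

<?php
class A extends Directory {
    public function read(): string|false {
        return false;
    }
}
?>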

Adding the #[ReturnTypeWillChange] attribute suppresses the deprecation notice:
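
A minimal sketch:

<?php
class A extends Directory {
    #[ReturnTypeWillChange]
    public function read() {
    }
}
?>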

The SensitiveParameter attribute

While stack traces for exceptions that include detailed information about method parameters are quite useful for debugging, you may not want to output parameter values for some sensitive parameters such as those associated with passwords and credentials. PHP 8.2 introduces a new attribute called SensitiveParameter so that, if a method parameter is annotated with the SensitiveParameter attribute, the parameter’s value is not output in an exception stack trace.

To demonstrate this, consider the following script in which the function f1 has the $password parameter associated with the SensitiveParameter attribute. 
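
A minimal sketch of such a function; the other parameter names and the default values are assumptions, with $password carrying the attribute:

<?php
function f1(
    $param1 = 1,
    #[SensitiveParameter] $password = 'secret',
    $param3 = null,
) {
    throw new Exception('Something went wrong');
}
?>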

The function throws an Exception just to demonstrate the SensitiveParameter feature. Call the function:

f1(param3: 'a');

Notice that the exception stack trace does not include the value for the $password parameter, and instead has Object(SensitiveParameterValue) listed.

Stack trace: #0 : f1(1, Object(SensitiveParameterValue), 'a') #1 {main}

Built-in functions deprecation/enhancement

The built-in functions  utf8_encode() and  utf8_decode() have often been misunderstood because their names imply encoding/decoding just about any string. Actually the functions are for encoding/decoding only ISO8859-1, aka “Latin-1”, strings. Additionally, the error messages they generate are not descriptive enough for debugging. PHP 8.2 has deprecated these functions. The following script makes use of them:
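
A minimal sketch (the Latin-1 input string is illustrative):

<?php
// "\xE9t\xE9" is "été" encoded as ISO-8859-1 (Latin-1)
$utf8   = utf8_encode("\xE9t\xE9");
$latin1 = utf8_decode($utf8);
?>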

With PHP 8.2, a deprecation notice is output:

Deprecated: Function utf8_encode() is deprecated  
Deprecated: Function utf8_decode() is deprecated

In PHP 8.2, the functions iterator_count() and iterator_to_array() accept all iterables. The iterator_to_array() function copies the elements of an iterator into an array, and iterator_count() counts the elements of an iterator. Both functions accept an $iterator as the first argument. In PHP 8.2, the type of the $iterator parameter has been widened from Traversable to Traversable|array, so that any arbitrary iterable value is accepted.

The following script demonstrates their use with both arrays and Traversables.

<?php
$a = array(1 => 'one', 'two', 'three', 'four');
$iterator = new ArrayIterator($a);
var_dump(iterator_to_array($iterator, true));
var_dump(iterator_to_array($a, true));

var_dump(iterator_count($iterator));
var_dump(iterator_count($a));

The output is as follows:

array(4) { [1]=> string(3) "one" [2]=> string(3) "two" [3]=> string(5) "three" [4]=> string(4) "four" } 
array(4) { [1]=> string(3) "one" [2]=> string(3) "two" [3]=> string(5) "three" [4]=> string(4) "four" } 
int(4) 
int(4)

Summary

In this article in the PHP 8 series, we discussed the new features related to functions and methods, the most salient being named function parameters/arguments, a simplified callable syntax, and interruptible functions called Fibers. 

In the next article in the series, we will cover new features for PHP’s  type system.


How Behaviour-Driven Development Helps Those with Sleep Disorders Contribute Effectively

Key Takeaways

  • Invisible illnesses, including sleep disorders, are more prevalent in the workplace than most people realise
  • Techniques like sketchnoting, and candour with one’s team can help those with sleep disorders to facilitate their own productivity 
  • Teams that are open and accepting of those with sleep disorders benefit in their planning accuracy and quality practice from diverse viewpoints 
  • Measures that teams can take to facilitate those with sleep disorders can benefit everyone on the team
  • An open and welcoming team culture plays a large part in helping team members overcome challenges

When I was a young software engineer, fresh out of college, I was willing and eager to be an active, contributing member of my software team. There was only one problem; I had a sleep disorder, which would not be diagnosed for another nine years. The diagnosis made sense of a lot of my personal experiences in the industry, as I learned to negotiate a new way of working within software development, test and automation. Along with my own personal journey came new information about sleep disorders in general, and how software teams can help those of us with such challenges to be active contributors in their development lifecycle. In so doing, the teams benefit from their input, from a diversity of viewpoints in analysing requirements and the software being developed, and in laying the foundation of acceptance for other challenged team members in the future.  

What are sleep disorders?

Sleep disorders are conditions that affect sleep quality, timing, or duration and impact a person’s ability to properly function while they are awake. Some examples of sleep disorders include:

  • Insomnia – a failure to achieve sleep
  • Delayed Sleep Phase Disorder – where sleep onset happens later and later every day
  • Obstructive Sleep Apnoea – where your airway is blocked during sleep
  • Restless Leg Syndrome – involuntary leg movement while sleeping
  • Narcolepsy – a generalised instability in switching between sleep and wakefulness
  • Idiopathic Hypersomnia – similar to Narcolepsy, but with factors that are still poorly understood

It’s worth remembering as well that all of these diagnoses are fuzzy and often overlap. 

For myself, for example, my diagnosis is somewhere between Idiopathic Hypersomnia and Narcolepsy. That means I get Excessive Daytime Sleepiness (EDS), Sleep Drunkenness, rapid REM onset, transient cognitive deficits and some other disturbances. In my case, I became aware I had an issue after my son was born.

We had the usual sleeplessness for a while, and when he started sleeping through the night, I didn’t recover. Then we broke down what my actual sleep experience was and had been for years, moving from subjective to objective understanding. It was a series of questions and answers: 

Me: Y’know when you wake up in the morning, fall back asleep and dream the entire day in five minutes?

Others: No… No…

Me: Y’know when you have entire detailed conversations in the morning and you can’t remember having had them an hour later?

Others: No… No…

… and so on. I went back through my history – remembering when I was 17, coming home from school at 5pm and napping, every single day. Remembering sleeping in lectures that fascinated me when I was in college. Remembering, in fact, waking up while speaking to the lecturer, not aware of what I’d said. Remembering my first job out of college, how hard it seemed to think and remember things. 

With a good primary care medical practitioner and a sleep specialist, and after a thorough sleep study, it all started to fall into place. I started on medication, which really helped my day-to-day experience – though it will likely never feel the way it does for normal sleepers. I’ll never wake up refreshed. I may still have symptoms on a bad day. I even have some strange additional sleep symptoms since starting medication, such as sleep paralysis, and sleep attacks. I’m happy to live with these rare occurrences though, if it means I have a somewhat normal circadian rhythm. 

On a day-to-day basis, my life looks mostly normal, and I should make it clear that my sleep disorder is not as severe as others’. I may experience some attention or cognitive deficits around midday or late afternoon, depending on a lot of factors. If I feel that coming on, I might dedicate some of my lunch break to having a nap. Aside from this Excessive Daytime Sleepiness, the next biggest issue for me is transient cognitive deficit.

Cognitive Deficit

Cognitive deficit describes an inability to retain details or to parse them effectively and draw conclusions. The memory issues can manifest in a few ways. Short-term memory is probably the most commonly affected – when suffering from EDS especially, it can be hard for the information to “go in”, or rather for it to “stay in”. 

In other circumstances we may retain something for several hours, even days, and then spontaneously lose the memory of it. It can be very hard to evaluate what is a firm memory and what is not. Many of us use backup methods for keeping track of important details. Note-taking is somewhat of an art-form among sleepy heads. I used to use pen and paper for this, and always had a notepad with me for calls and meetings. With the advent of the pandemic, I started experimenting with digital methods. Lately I use OneNote for work notes, and Google Keep for personal notes. Both can be used on mobile devices, and so the capability for taking notes, and for having that information on hand, travels with you. The one trade-off from hand-written notes is that it can be more difficult to connect data diagrammatically – as someone who conceptualises in a very visual manner, it was very useful to be able to draw my notes as well as using script. As such, I’ve invested in an eReader device that has a pen interface, which allows me to draw as well as type notes into the apps mentioned. It’s still early days for that, so I’ll see how that pans out. 

For me, focusing on multi-sensory learning helps to improve my retention of important information – this is where drawing relationships spatially, as well as just jotting bullet points helps to lodge that data more securely in my head. This practice is commonly referred to as “sketchnoting”; there are many very beautiful examples online, but I just focus on whatever is necessary to capture my own understanding. I also recommend “re-noting” notes – take the notes that you took in the moment and condense them further, reading them aloud as you do so. I’ve even heard of people singing their notes to themselves.  


There are other types of operational information that don’t require separate notes to be made, as they arrive to us digitally – these include email, Slack messages, etc. These methods come with their own means of storage and re-reference, however I sometimes find that I need to make temporary copies for more immediate reference. It’s like the difference between cache memory and main memory on a computer; keep immediate knowledge brief but readily accessible, and then dig deeper for context. On Slack, I do this by sharing a link to a message or thread to myself. With email, I use category-based search folders, and then tag the required email using my “Memo” category. 

Sleep Disorders and Work

As you might imagine, most jobs involve having timely access to some kind of operational information. If someone is unable to store or call that information to mind, it can seriously impair their effectiveness. 

If it’s noticed, it may be interpreted by colleagues in a variety of ways, some of them negative. Even if it’s not noticed, it can lead to considerable anxiety on the part of the sufferer – “Will I remember what I need to remember?”, or even worse, “Will I be found out?” This last fear is especially common among those who, for whatever reason, have not felt able to share their condition with their manager or teammates. It can also lead to crippling Imposter Syndrome – a sufferer may doubt their ability to do the job for which they were hired. 

I want to stress that the answer to this is almost always “Yes, you can do the job!” Memory issues are transient, and can be managed – for example with the methods described above – and employers generally hire based on experience, attitude and aptitude, not some ability to perform feats of memory.

Additionally, the team benefits from having people of diverse backgrounds and lived experiences contributing to the software that they build. Imagine having someone on your team who – because of their sleep disorder – is keenly aware of the attention drop-off pitfalls that UI developers often inadvertently build into software, and can help steer the team clear of them. Imagine a developer or tester who – because of their sleep disorder – is extremely motivated to ensure that the code is legible and easily understandable for future re-use. These are just two examples, but research [1] has shown that diverse teams operate better and plan with more accuracy than teams with narrower ranges of backgrounds and experience. 

How Behaviour-Driven Development Can Help

Some of the issues that I have encountered with my sleep disorder have affected my memory, my focus, and my wakefulness in professional settings. Key practices within a Behaviour-Driven Development based workflow can help me to sidestep these particular issues. Behaviour-Driven Development is an approach to software development that was devised to address drawbacks that had become apparent in more traditional methods of developing software. The main issue is communication: how do we ensure that everyone in our organisation is working on the same page and has the same understanding of what we need to develop, and how?

The Three Practices of Behaviour Driven Development are Discovery, Formulation and Automation

The core practices are:

  • Discovery – where we try to learn whatever we can about the next change that will be implemented
  • Formulation – where we build our single source of truth by writing concise scenarios that describe the desired behaviour of our software using domain terminology, and 
  • Automation – where we tie those scenarios to automated routines that perform the repetitive tasks for us

A common way of conducting this discovery is using example mapping – a way of categorising ideas on “cards” (real or virtual) and ordering them for better understanding. Starting with the story card, we then create a set of rule cards, and then for each rule, we create one or more example cards. Questions that arise and that can’t be answered by those in attendance are written on question cards for follow up later. 

Example Mapping involves the expression of ideas on “cards” so that they can impart meaning to the issue at hand. 

With this knowledge, we go to formulation, where we write a set of scenarios in plain language that describe our software’s desired behaviour. This becomes our single source of truth, defined using our own business language, to which everyone refers back.

Gherkin syntax allows us to express & automate desired behaviour of software using plain language

So, for me, engaging in discovery – especially when using example mapping – renders the tasks at hand in a clear, easily understood manner, which helps when I’m having an episode of “brain fog”. Having clear and concise descriptions of functionality in a single source of truth helps when I’m unable to recall what was discussed; I can refer back to the Features. Automation helps when performing repetitive tasks precisely the same each time would be complicated due to some transient cognitive deficits. 

Teams that follow BDD-based workflows also tend to engage in other practices, such as mobbing, and these active/passive collaborative sessions, where the other parties are actively engaged in helping you understand, aid understanding and retention of information. I have been in situations in the past where I would have struggled with the complexity of some upcoming change, and what might have taken days and several long email threads was effectively explained in person within twenty minutes. There also tends to be a culture of openness in these teams, of meeting team members where they are and helping them to contribute to the best of their ability, which necessarily improves engagement and mental health within the team. The open discussion activities within teams that follow BDD practices take the Agile concept of democratic teams and run with it, making it a cornerstone of day-to-day work. 

Lessons Learned

I’ve learned that, for me, it’s a benefit to be up front about my needs, to make it clear that while I have challenges, I still want to work with the team towards common goals and contribute. A good team will respond positively, and understand that diversity of experience among team members all contributes toward good software. In theory, if a team responds negatively – well then we’re not a good match, which is also valuable information; it has, in the past, given me the impetus to look elsewhere for a team that may value my contribution.  

Advice to Teams

The following is some advice I would give to teams accepting new members into their ranks. It is obviously geared towards those who suffer from sleep disorders, but these measures can benefit the onboarding of any team member, whether they have sleep disorders, invisible illnesses, mental health challenges, or none of these.

Be open. Be generous. Never take it personally if someone drifts off to sleep during a meeting. Ask: how can I/we help you to contribute to the level you would like? Understand that their input is heartfelt, even if they seem tired or distracted. And above all, make it clear that if anyone has persistent challenges in their life – whether it’s sleep disorders, chronic health issues, mental health issues – that the team can be flexible around them, and there’s no reason for anyone to be excluded. 

References

1. Rock, D. Grant, H. Why Diverse Teams are Smarter. Harvard Business Review Accessed Dec 14, 2022

GraalVM Java Compilers Join OpenJDK in 2023, Align with OpenJDK Releases and Processes

Key Takeaways

  • The Community Editions of the GraalVM JIT and Ahead-of-Time (AOT) compilers will move to OpenJDK in 2023.
  • They will align with Java releases and use the OpenJDK Community processes.
  • Existing releases, GraalVM Enterprise Edition features, and other GraalVM projects will remain at GraalVM.
  • GraalVM 22.3 provides experimental support for JDK 19 and improves observability.
  • Project Leyden will standardize AOT compilation in Java and define far-reaching optimizations for Java applications running in a JRE with a JIT compiler.

As part of the GraalVM release 22.3, Oracle detailed the planned move of two GraalVM projects to OpenJDK. Sometime in 2023, the code for the Community Editions of the Just-in-time (JIT) compiler (“Graal Compiler”) and the Ahead-of-Time (AOT) compiler for native OS executables (“Native Image”) will move to at least one new OpenJDK project. Existing releases, GraalVM Enterprise Edition features, and other GraalVM projects will remain at GraalVM.

Oracle originally announced this move during JavaOne in October 2022 without providing specifics. In mid-December 2022, OpenJDK then proposed Project Galahad for the move.

The GraalVM project is part of Oracle Labs and, thereby, not under OpenJDK governance. GraalVM currently has four feature releases per year and follows a different development process than OpenJDK.

At OpenJDK, the GraalVM Java compilers will align with the Java release cadence of two feature updates per year, four annual patch updates, and one Long-Term Support (LTS) release every two years. The GraalVM OpenJDK project will use the OpenJDK Community processes and submit JDK Enhancement Proposals (JEP) for inclusion in OpenJDK Java releases. 

The Graal compiler is written in Java and runs inside the HotSpot VM of a Java Runtime Environment (JRE). It replaces the C2 JIT compiler, written in C++, which ships in most Java distributions.

The GraalVM Native Image AOT compiler produces native executables that typically start much faster, use less CPU and memory, and have a smaller disk size than Java applications running in a JRE with a JIT compiler. That makes Java more competitive in the cloud. GraalVM Native Image achieves these optimizations by removing unused code and pre-calculating the application heap snapshot, using the Graal Compiler under the hood. But that also excludes some Java applications from using GraalVM Native Image. InfoQ recently published an article series on this topic.

The Enterprise Edition provides improved Native Image performance, such as the Profile-Guided Optimizations for runtime profiling, but it will not move to OpenJDK. Neither will the other GraalVM projects, such as support for other languages like JavaScript or Python, or Java on Truffle, a Java replacement for the entire Hotspot VM.

The GraalVM Community Edition ships under the GNU General Public License, version 2, with the Classpath Exception. Many OpenJDK distributions, including Oracle’s OpenJDK builds, use that same license. Oracle’s Java distribution, on the other hand, uses the “Oracle No-Fee Terms and Conditions” license. Oracle announced the alignment of “all the GraalVM technologies with Java […] from a licensing perspective” and promised “additional details […] in the coming months.”

GraalVM Release 22.3 Supports JDK 19, Improves Observability

GraalVM 22.3 was the last feature release for 2022. It has experimental support for JDK 19, including virtual threads and structured concurrency from Project Loom. Full support for JDK 19 will come in GraalVM 23.0 at the end of January 2023. 

The release contains a lot of improvements for monitoring native executables, an area that lags behind Java programs running in a JRE. The JDK tool, jvmstat, can now monitor the performance and resource usage of native executables and collect heap dumps for inspection with VisualVM. Native executables can also record the JavaMonitorEnter, JavaMonitorWait, and ThreadSleep events for the free Java Flight Recorder (JFR) tool.

The GraalVM Native Image compiler needs so-called hints about the usage of reflection in Java code. Working with the Spring and Micronaut frameworks, GraalVM launched a public repository of such hints for Java libraries in July 2022. That repository, called “GraalVM Reachability Metadata,” now has entries for Hibernate, Jetty, JAXB, and Thymeleaf.

The GraalVM Native Image AOT compiler performed  2-3 times faster in select benchmarks. Native executables use less memory at runtime and run integer min/max operations much more quickly. Two optimizations, StripMineCountedLoops and EarlyGVN, added as experimental in 22.2, are now stable and enabled by default.

Native executables can now contain a Software Bill of Materials (SBOM), and the debugging experience has improved by better identifying memory usage and memory leaks.

In GraalVM ecosystem news, the current IntelliJ version 2022.3 has experimental support for debugging native executables, and JUnit 5.9.1 has annotations for including or excluding them. 

The Python implementation in GraalVM changed its name to GraalPy. It saw many compatibility and performance improvements, as did the Ruby implementation TruffleRuby. Python and other languages in GraalVM benefit from the experimental availability of the LLVM Runtime on Windows.

This GraalVM release provides a new download option through a one-line shell script for macOS and Windows:

bash <(curl -sL https://get.graalvm.org/jdk)

Project Leyden Optimizes Java With Condensers

Project Leyden is an OpenJDK initiative with a mission “to improve the startup time, time to peak performance, and footprint of Java programs.” Oracle clarified GraalVM’s relationship to that project: it “plans to evolve the Native Image technology in the OpenJDK Community to track the Project Leyden specification.” The original goal of Project Leyden was to add native executables, like the ones produced by the GraalVM Native Image AOT compiler, to the Java Language Specification. After its formal creation in June 2020, the project showed no public activity for two years.

In May 2022, Project Leyden emerged again with a second goal: identifying and implementing far-reaching optimizations for Java applications running in a JRE with a JIT compiler. It did this because the GraalVM AOT compiler enforces a closed-world assumption and must have all application information, such as its classes and resources, at build time. Some Java applications and libraries use dynamic features in Java that don’t work within this constraint. At least some of the optimizations from this second goal will also require changes to the Java Language Specification. Its implementation will rely on existing tools, such as the Hotspot VM and the jlink tool.

In October 2022, Project Leyden detailed how to achieve both goals. It introduced the concept of a condenser that runs between compile time and run time. It “transforms a program into a new, faster, and potentially smaller program while preserving the meaning” of the original program. The Java Language Specification will evolve to contain condensers.

Condensers will define how AOT compilation and native executables fit into Java, fulfilling the original goal of Project Leyden. But condensers will also improve Java applications running in a JRE with a JIT compiler, serving the second goal. The GraalVM Java compilers, the JRE, and the HotSpot JIT compiler continue to receive features and updates independent of Project Leyden. So condensers provide an additional layer of optimization.

As of mid-December 2022, the development of condensers has yet to start. That makes it unlikely that Project Leyden will ship condensers in the next Java LTS release, Java 21, in September 2023. So the earliest possible Java LTS release with results of Project Leyden may be Java 25 in September 2025.

The Java Community Reacts to Oracle’s Move

Spring Boot, with the release of version 3.0 in November 2022, joins the group of Java frameworks that have already supported AOT compilation with GraalVM Native Image in production: Quarkus, Micronaut, and Helidon. InfoQ spoke to representatives from all four frameworks about Oracle’s announcement.

The first responses come from Andrew Dinn, Distinguished Engineer in Red Hat’s Java Team, and Dan Heidinga, Principal Software Engineer at Red Hat. They published an InfoQ article in May 2022, arguing for a closer alignment between OpenJDK and GraalVM.

InfoQ: Your InfoQ article from May 2022 said, “Native Java needs to be brought into OpenJDK to enable co-evolution with other ongoing enhancements.” From that perspective, how do you view Oracle’s announcement?

Andrew Dinn & Dan Heidinga: We’re really positive on Oracle’s announcement and on the recent progress on Project Leyden. Bringing the key parts of GraalVM — the JIT compiler and the Native Image code — under the OpenJDK project helps bring both development and user communities together.  

Working together under the OpenJDK project using a common development model and the existing OpenJDK governance enables better communication and collaboration between the GraalVM developers and the broader OpenJDK community. And it makes it easier for these components to influence the design of other Java projects like Valhalla or Amber in ways that make them more amenable to AOT compilation.

Finally, it brings GraalVM directly under the remit of the Java specification process and ensures that Java and GraalVM evolve together in a consistent direction.

InfoQ: Oracle also announced that the GraalVM Native Image AOT compiler will implement the specifications of the OpenJDK project Leyden. Leyden recently delayed the standardization of native Java in favor of optimizing JIT compilation. How do you view that decision now, given this new development?

Dinn & Heidinga: Leyden recently laid out Mark Reinhold’s vision for how to start to address the problem space. He suggests using “Condensers” to help shift computation from one application phase to another. And he mentions both specification changes to support this approach (a necessary task to provide a stable foundation to build on!) and Java language changes, such as the lazy statics JEP draft.

None of this says “delay AOT” or “optimize JIT.” This approach enables both AOT and JIT — as well as general execution — to be optimized. His example is using an XML library to read a configuration file before run time. This approach to shifting computation helps both AOT and JIT compilation. As we’ve seen with frameworks like Quarkus, being able to initialize state at build time has been critical for faster startup. Leyden is now laying the groundwork to do that pre-initialization in a way that preserves the meaning of the program — one of the key requirements Andrew and I had called out in our article.

This is not simply a matter of ensuring that GraalVM is tightly and reliably bound to the Java specification. Mark’s proposal clarifies that Leyden will, if required, carefully and coherently update the specification to permit behavioral variations appropriate to specific Condensation steps. For example, it even mentions the need to clarify the semantics of class loader behavior when targeting a fully closed-world AOT program.

Bringing GraalVM into OpenJDK also makes mixing and matching the code from both projects easier. Maybe GraalVM’s Native Image AOT compiler becomes the final condenser in a pipeline of program transformations?

InfoQ: Your article described some Java features that don’t work in native Java. Java Agent support, which is vital for observability, is another missing feature. What should GraalVM do about these limitations — if anything?

Dinn & Heidinga: The GraalVM team will likely be fairly busy migrating their code into OpenJDK over the next year or so. Migration projects tend to take longer than expected, even when expecting them to do so.

We see the GraalVM community working to increase the tooling available for Native Image, including efforts by Red Hat to support JFR and JMX, and work on debugging support, including efforts by both Red Hat and IntelliJ, all in conjunction with the Oracle developers.

GraalVM is in a great place to continue evolving and collaborating with existing agent providers to find the right instrumentation API for Native Image. Anything learned in this process will be directly applicable to Leyden and speed up the delivery of Leyden.

That’s the end goal now: getting all the experience and learning — and hopefully some code — fed into the Leyden development process to help that project deliver the required specification updates, language changes, and condenser tools to drive broad adoption across the frameworks, libraries, and applications.

Sébastien Deleuze, Spring Framework Committer at VMware, shared the views of the Spring ecosystem.

InfoQ: How do you view the move of the GraalVM JIT and AOT compilers for Java to OpenJDK?

Sébastien Deleuze: Since our work compiling Spring applications to native executables began, we have collaborated very closely with the GraalVM team. Our goal is to limit the differences between “native Java” and “Java on the JVM” while keeping the efficiency benefits of the AOT approach. So from our point of view, the move of the GraalVM JIT and AOT compilers to OpenJDK is good news because it is a key step towards a more unified Java under the OpenJDK banner, even if some differences will remain between native and the JVM.

We are also reaching a point where more and more issues or optimizations require some changes on the OpenJDK codebase. So hopefully, having GraalVM on the OpenJDK side will help, including via closer collaboration with Project Leyden.

GraalVM has been an umbrella project with many distinct sub-projects, like polyglot technologies, with different levels of maturity and different use cases. The clear split between what moves to OpenJDK and the rest will likely help to focus more on the GraalVM Native Image support and clarify what GraalVM means for end users.

InfoQ: The GraalVM Java compilers now have four feature releases per year and will have two in the future. How does that affect Spring?

Deleuze: It is true that the four feature releases have been pretty useful for moving forward fast while we were experimenting with the Spring Native project. But Spring Boot 3, which was released in late November, started our official and production-grade support for GraalVM native. So, from that perspective, the switch to a slower pace in terms of features, synchronized with OpenJDK releases, will help handle those upgrades consistently, with less frequent risks of breaking changes and more time to work on ambitious new features. Let’s not forget there will also be four predictable quarterly Critical Patch Updates annually to fix glitches on native image support.

InfoQ: The development of the GraalVM Java compilers will be different, with at least one OpenJDK project, committers, reviewers, and JEPs. What’s the impact on Spring?

Deleuze: While the processes will change, we expect to continue the collaboration between Spring and GraalVM teams on the OpenJDK side. Also, as announced last year, we work closely with BellSoft, one of the leading OpenJDK contributors, on both JDK and native support. We will share more details on the impact in the upcoming months.

Jason Greene, Distinguished Engineer & Manager at Red Hat, responded on behalf of the Quarkus framework.

InfoQ: How do you view the move of the GraalVM JIT and AOT compilers for Java to OpenJDK?

Jason Greene: We see it as a positive change for both the GraalVM and OpenJDK communities. Bringing the two projects closer together will increase collaboration and code-sharing between the efforts, including work that advances Project Leyden. We have a positive relationship with both teams and look forward to that continuing in the new structure.

InfoQ: The GraalVM Java compilers now have four feature releases per year and will have two in the future. How does that affect Quarkus?

Greene: The change may mean waiting a little longer for a new Native Image feature to appear in a formal GraalVM release. However, GraalVM also has an extensibility SPI that we currently utilize in Quarkus. This, in combination with related improvements to Quarkus itself, allows for improvements to the Quarkus GraalVM Native Image experience within the frequent Quarkus release schedule.

InfoQ: The GraalVM Java compilers will have at least one OpenJDK project with committers, reviewers, and JEPs. What’s the impact on Quarkus?

Greene: We expect minimal impact on the Quarkus community from these changes. Even though the OpenJDK processes and tools have some differences, they share similar goals with the current model. While GraalVM did not use JEPs, they did have design discussions on issues and a PR process involving regular code reviews.

Graeme Rocher, Architect at Oracle, provided a view from the Micronaut framework.

InfoQ: How do you view the move of the GraalVM JIT and AOT compilers for Java to OpenJDK?

Graeme Rocher: This is an excellent step for the community and the broader adoption of GraalVM.

InfoQ: The GraalVM Java compilers now have four feature releases per year and will have two in the future. How does that affect Micronaut?

Rocher: The quarterly releases were helpful during the early adoption phase of GraalVM. GraalVM is mature and stable now, so moving to two releases a year is less of a problem and more of a benefit at this stage. The Micronaut team and the GraalVM team both work at Oracle Labs and will continue to collaborate and ensure developer builds and snapshots are well-tested for each release.

InfoQ: The GraalVM Java compilers will have at least one OpenJDK project with committers, reviewers, and JEPs. What’s the impact on Micronaut?

Rocher: There will undoubtedly be efforts to standardize many of the APIs and annotations for AOT, which we will gradually move to as these new APIs emerge. However, this is not a new challenge for Micronaut as it has evolved in parallel with GraalVM and adapted to improvements as they have emerged.

Tomas Langer, architect at Oracle, responded for the Helidon framework.

InfoQ: How do you view the move of the GraalVM JIT and AOT compilers for Java to OpenJDK?

Tomas Langer: The JIT compiler helps the runtime and has no impact on the sources and development process of Helidon. Any performance improvement to OpenJDK is excellent!

AOT compilation impacts the way we design (and test) our software. If Native Image becomes part of OpenJDK, complexity will decrease for any work related to AOT — we would only need a single installation of a JDK. The same would be true for our customers.

InfoQ: The GraalVM Java compilers now have four feature releases per year and will have two in the future. How does that affect Helidon?

Langer: As GraalVM Native Image becomes more mature, we should not see such significant changes as we have seen in the past. Having fewer releases will actually make our life easier, as we should be able to support the latest release more easily than before: today we sometimes skip support for a GraalVM release, as the amount of testing and associated work makes it harder to stay on the latest one.

InfoQ: The GraalVM Java compilers will have at least one OpenJDK project with committers, reviewers, and JEPs. What’s the impact on Helidon?

Langer: I think the answer is very similar to the previous one — the more mature GraalVM Native Image is, the easier it should be for us to consume it.

GraalVM currently plans four feature releases for 2023. At least the first release will still contain the GraalVM Java compilers, as GraalVM Native Image will get full Java 19 support in version 23.0 on January 24, 2023. It’s unclear when the GraalVM Java compilers will have their last release with GraalVM and when they’ll have their first release with OpenJDK.

How We Improved Application’s Resiliency by Uncovering Our Hidden Issues Using Chaos Testing

Key Takeaways

  • Chaos testing is a disciplined approach to testing a system’s integrity that can be carried out at the unit, integration, and system levels
  • Chaos testing has well-defined principles for building a hypothesis, varying real-world events, running experiments in production, automating them, and limiting the blast radius
  • While chaos testing presents many advantages for the business and for testers, such as identifying scenarios that surface only via live data streams, it also has disadvantages and risks; for instance, testing in production may disrupt a service and bring the system to a halt
  • Chaos testing should be performed on a small part of the system; as confidence builds up, the blast radius can be expanded
  • To perform chaos testing, the resolve should come from management, and they should know upfront what they want out of the tests, as well as the associated risks

 

As software has become more complex, with the rise of microservices and distributed infrastructure, it is very hard to control system failures. In the past, when infrastructure was developed and managed on premises, sysadmins found it relatively easy to maintain. Now that systems are hosted on globally distributed infrastructure, it is hard to predict what failures might occur.

Chaos testing is the act of disrupting and breaking an application system in order to build resilience. It is generally performed on production systems, which makes it an extremely sensitive exercise.

In this article, I list the chaos testing principles outlined by Netflix. Readers should be able to understand the advantages and disadvantages that chaos testing offers, which will help them decide whether they want to perform it. I also explain how to convince management to perform chaos tests by weighing the benefits against the risks.

What is chaos testing?

Chaos testing is a highly disciplined approach to testing a system’s integrity by proactively simulating and identifying failures in a given environment before they lead to unplanned downtime or a negative user experience.

It can also be defined as a method of testing distributed software that purposely introduces failures and faulty scenarios to verify how the system behaves in the face of random disruptions. These disruptions can cause applications to respond unpredictably and break under pressure.

As an example, while working for a banking client, we came across an issue that happened once a day. Each day, for a few seconds, the site became unresponsive and this error showed up on screen: “Bank Website down, not working”. After a few seconds it started responding normally again. The issue was not reproducible on staging environments, but after each release it was reported on production.

After we convinced the client that we needed to watch the production systems closely, we used chaos testing. First we decided to perform the testing on a Sunday between 12:00 AM and 4:00 AM, when the least traffic was reported. We started shutting down services randomly, one at a time, and observing the impact on the overall system. Soon we found an API (API-A) that got its data feed from a third-party API (API-B) that had never been in scope of our testing. On inspecting API-A closely, we found that once a day there was a lag of 31 seconds in receiving the data from API-B. We passed our findings to the third party and asked them to look into the issue. They found that API-B hung for exactly 31 seconds when their servers’ clocks crossed from AM to PM at noon, which kept API-A waiting and ultimately hung the system.

It was a learning experience, as we had focused on our application alone while testing, without taking into account external code and our dependencies on it. We modified our strategy and included chaos testing in our test plan.
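To make that lesson concrete, here is a minimal, hypothetical sketch of the kind of resilience check we added afterwards. It simulates an upstream feed that only answers after the 31-second lag and asserts that the caller fails fast with its own timeout instead of hanging. The names (SlowFeedClient, fetchFeed) are illustrative rather than taken from the actual project, and the sketch assumes JUnit 5 is on the classpath:

import static org.junit.jupiter.api.Assertions.assertThrows;

import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import org.junit.jupiter.api.Test;

class FeedTimeoutTest {

    // Stand-in for the third-party API-B: answers only after a configurable delay.
    static class SlowFeedClient {
        private final Duration delay;
        SlowFeedClient(Duration delay) { this.delay = delay; }

        CompletableFuture<String> fetchFeed() {
            CompletableFuture<String> result = new CompletableFuture<>();
            CompletableFuture.delayedExecutor(delay.toMillis(), TimeUnit.MILLISECONDS)
                    .execute(() -> result.complete("feed-data"));
            return result;
        }
    }

    @Test
    void callerFailsFastInsteadOfHangingOnSlowUpstream() {
        SlowFeedClient apiB = new SlowFeedClient(Duration.ofSeconds(31));

        // Our side (API-A) should give up after its own budget (here two seconds)
        // instead of blocking the whole request path for 31 seconds.
        assertThrows(TimeoutException.class,
                () -> apiB.fetchFeed().get(2, TimeUnit.SECONDS));
    }
}

The point is not the exact timeout value but that the failure mode of a slow dependency is now part of the test plan instead of being discovered in production.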

Principles of chaos testing

Chaos engineering is made up of five main principles:

  1. Identify a steady state
    We should define a “steady state”, or control, as a measurable system output that indicates normal working behaviour (in most cases, well below a one percent error rate); a minimal sketch of such a check follows this list.
  2. Hypothesize that the system will hold its steady state
    Once a steady state has been determined, it must be hypothesized that it will continue in both control and experimental conditions.
  3. Ensure minimal impact to your users
    During chaos testing, the goal is to actively try to break or disrupt the system, but it’s important to do so in a way that minimises any negative impact to your users. Your team will be responsible for ensuring all tests are focused on specific areas and should be ready for incident response as needed.
  4. Introduce chaos
    Once you are confident that your system is working, your team is prepared, and the impact areas are contained, you can start running your chaos testing applications. Try to introduce different variables to simulate real world scenarios, including everything from a server crash to malfunctioning hardware and severed network connections. It’s best to test in a non-production environment so you can monitor how your service or application would react to these events without directly affecting the live version and active users.
  5. Monitor and repeat
    With chaos engineering, the key is to test consistently, introducing chaos to pinpoint any weaknesses within your system. The goal of chaos engineering is to disprove the established hypothesis from number two and build a more reliable system in the process.
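As a minimal sketch of principles one and two, the steady state can be expressed as a measurable output (for example, an error rate below one percent) and the hypothesis checked before and after a fault is injected. The names and thresholds below are purely illustrative, and the record syntax assumes Java 16 or later:

import java.util.List;

public class SteadyStateCheck {

    // One observation scraped from monitoring; 5xx responses count as errors.
    record Observation(int httpStatus) {
        boolean isError() { return httpStatus >= 500; }
    }

    static double errorRate(List<Observation> window) {
        if (window.isEmpty()) return 0.0;
        long errors = window.stream().filter(Observation::isError).count();
        return (double) errors / window.size();
    }

    // The steady state: the error rate stays well below one percent.
    static boolean steadyStateHolds(List<Observation> window) {
        return errorRate(window) < 0.01;
    }

    public static void main(String[] args) {
        List<Observation> beforeChaos = List.of(new Observation(200), new Observation(200));
        List<Observation> duringChaos = List.of(new Observation(200), new Observation(503));

        System.out.println("Steady state before chaos: " + steadyStateHolds(beforeChaos));
        // If this prints false, the hypothesis is disproved and a weakness has been found.
        System.out.println("Steady state during chaos: " + steadyStateHolds(duringChaos));
    }
}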

Chaos testing test pyramid

Over the years, the IT industry has experienced some rather dramatic changes in the design, build, and operational scale of computer systems. This has resulted in the development of more complex systems, and the cumulative effect is large-scale distributed systems with more opportunities for failure.

The goal of chaos engineering is to educate and inform the organization of unknown vulnerabilities and previously unanticipated outcomes of a computer system. A primary focus of these testing procedures is to identify hidden problems that can arise in production environments before they cause an outage outside of the organization’s control. Only then can the disaster recovery team address systemic weaknesses and enhance the system’s overall fault tolerance and resiliency. Hence, chaos testing is carried out at various levels.

A typical test pyramid consists of three areas of common testing:

  1. Unit Level
    The primary objective of unit tests is to evaluate an individual component’s specific, expected behaviours. The component being tested must be detached from its conventional dependencies while the chaos engineering team maintains control of its behaviour with the help of mocks. Worst-case scenarios are tested against the expected behaviour.
  2. Integration Level
    Individual components interact with each other and hence integration tests focus on the interactions and interrelationships between individual components. Engineers ideally run these tests automatically following the successful unit testing of the individual components. These integration tests can be very useful in determining the stable state or common operational metrics of complex applications and systems.
  3. System Level
    Systems tests proactively evaluate how the entire computer system reacts under the increased stress of a particular, worst-case failure scenario. Only in real-world conditions involving standard production environments can the disaster recovery team definitively determine the steady state behaviours of the individual components and their integration protocols within the overall architecture.

Advantages and disadvantages of chaos testing

Advantages:

By testing the limits of your applications, you can gain insights that deliver a lot of benefits for your development teams and your overall business. Here are the benefits of a healthy, well-managed chaos engineering practice.

  1. Increases resiliency and reliability
    Chaos testing enriches the organisation’s intelligence about how software performs under stress and how to make it more resilient.
  2. Accelerates innovation
    Intelligence from chaos testing funnels back to developers who can implement design changes that make software more stable, resulting in improved production quality.
  3. Advances collaboration
    Developers aren’t the only group to see advantages. The entire technical group gains insights that will lead to faster response times and better collaboration.
  4. Speeds up incident response
    By learning what failure scenarios are possible, the disaster recovery teams can speed up troubleshooting, repairs, and incident response.
  5. Improves customer satisfaction
    Increased resilience and faster response times lead to less downtime. Greater innovation and collaboration from development and SRE teams means better software that meets new customer demands quickly with efficiency and high performance.
  6. Boosts business outcomes
    Chaos testing can extend an organisation’s competitive advantage through faster time-to-value, saving time, money, and resources, and producing a better bottom line.

The more resilient an organisation’s software is, the more customers can enjoy its services without distraction or disappointment.

Disadvantages:

Although the benefits of chaos testing are clear, it is a practice that should be undertaken with deliberation. Here are the top concerns and challenges.

  1. Unnecessary damage
    The major concern with chaos testing is the potential for unnecessary damage. Chaos engineering can lead to a real-world loss that exceeds the allowances of justifiable testing. To limit the cost of uncovering application vulnerabilities, organisations should avoid tests that overrun the designated blast radius. The goal is to control the blast radius so you can pinpoint the cause of failure.
  2. Lack of observability
    Without comprehensive observability, it can be difficult to understand critical dependencies vs non-critical dependencies. A lack of visibility can also make it difficult for teams to determine the exact root cause of an issue, which can complicate remediation plans.
  3. Unclear starting system state
    Another issue is having a clear picture of the starting state of the system before the test is run. Without this clarity, teams can have difficulty understanding the true effects of the test. This can diminish the effectiveness of chaos testing and can put downstream systems at greater risk.

Convince the upper management about doing chaos testing

Chaos testing is a new approach in which any failure may lead to the shutdown of whole systems. Hence, it is critical to convince your bosses before performing chaos testing. Here is what we can do:

  1. Educate them
    Biologist Henri Laborit wrote in 1976: “Faced with an unknown experience, man has only three choices: fight, do nothing, or flee.” This subject is quite new, with little visibility, so it is important to allow your boss to discover the concept at their own speed, avoiding instinctive rejection. You may start by sharing some interesting papers on the subject on internal or external social networks.

  2. Tell the right story
    Once you have educated them, you should adapt your story to their concerns, questions, or objections. Never hesitate to play with emotions; they are a major factor in decision-making. The most obvious emotion to play with is fear: fear of a major outage that will impact your revenue. For instance, a five-minute outage represents around one million dollars for Google/Alphabet, one hundred thousand dollars for Netflix, and three million for Apple.


Moreover, an incident itself is an opportunity to advance your pawns: don’t hesitate to propose new practices that will limit the impact of future incidents.

One of the best ways is to speak about resilience – the ability to recover quickly from difficulties.

Things to take care of when planning for chaos testing

Chaos testing is a new concept, but we have always had the mindset to perform it, and we sometimes did perform it without knowing it was chaos testing. It has its own principles, benefits, and pitfalls. I would advise all teams to weigh the pros and cons of conducting these tests before formulating a plan. Be very clear about what you want to achieve from these disruptive tests. Get permission from your bosses and convince them why it is important to carry out these tests. Once they are convinced, lay out a plan that defines the blast radius of your tests. Monitoring should be in place and the systems under full observation. Chaos testing requires a lot of preparation before you even begin; if the preparation is right and the intentions are clear, these tests will give you a lot of valuable insights.

An Ode to Unit Tests: in Defense of the Testing Pyramid

Key Takeaways

  • The testing diamond didn’t address the problems of the testing pyramid; instead, it sidestepped problems that were caused by misinformation about unit tests.
  • Opaque-box tests are not exclusive to testing through public interfaces of the system. A system is composed of many boundaries, and all of them benefit from behaviour-focused tests.
  • By avoiding clear-box testing, the need for heavy mocking and public interfaces will drop significantly. This leads to a more maintainable test portfolio.
  • Avoid unnecessarily public code at all costs. The less accessible code you have, the easier it is to maintain, evolve, and refactor your code.
  • Build architectures with a testing strategy in mind. How easy it is to test them will dictate the success of the architecture.

 

It was 2014 when David Heinemeier Hansson set the Software Development world on fire. He was on a RailsConf stage when he proclaimed that “TDD is dead”.

It was a bold move. But he was the leader that many developers unhappy with testing were looking for. Many followed along, splitting the community into two camps.

That moment was the epicenter of a new wave. A wave that took us to today, where unit tests are losing importance in favor of integration tests.

The famous Testing Pyramid by Mike Cohn is now being reshaped as a diamond.

It’s impossible to find a single reason for this movement, but it’s easy to find many behind the discontentment with the existing testing practices.

This happens when practices are spread like dogma, lack proper guidance, and are rooted in abstract thinking.

Everyone starts by doing their best. Trying, failing, and trying again. Until the moment that someone breaks the chain and presents a different path. A path to a promised low-maintenance test suite. 

What Is the Best Direction?

One thing I’ve learned in this industry is that, even though it is a new field, we quickly forget our history. The rapid pace makes us believe that the past has no answers and that the future has many great things to unveil. I can’t argue with the future perspectives, but I can tell you that our first tendency is to look for innovation instead of information.

The following questions would probably have saved many from feeling the need for a Testing Diamond as a replacement for the Pyramid:

  • Is this problem caused by unit tests or how I write unit tests?
  • Am I applying integration testing to components that need it?
  • Did I misunderstand anything that led me to the same assertions in multiple places? 
  • Am I improving my design through tests or testing around the existing design?

Getting Back to Our Roots  

Likely the answer is once again hidden in the past.

So, what does history tell us about integration tests? Historically, integration testing was the stage when different development units were tested together. Those units were developed in isolation, often by multiple teams. That was the phase when we guaranteed that the defined interfaces were well implemented and worked accordingly.

Nowadays, we see integration tests applied to code units developed by the same team. This implies that each source code file is a system boundary, as if each file had been developed by an autonomous team. This blurs the lines between unit and integration tests.

Based on that, we could reason that the distinction between integration and unit tests was rooted in a mistake. The idea that integration tests are for testing between teams and unit tests are for testing within a team is the wrong distinction. We were fixing a problem we created ourselves.

What we should do instead is define clear boundaries. Not layers, but boundaries between each development team. Those boundaries will give you a perspective on the system’s role and how it interacts with other domains. This is similar to how Alistair Cockburn describes the Hexagonal Architecture, also known as Ports and Adapters. In his work, he describes a system as having two sides. The internal and the external ones. Now, we need to bridge those two sides through well-defined boundaries.

How does that help? It is this internal/external relationship that clarifies the relationship between unit and integration tests. The unit tests are responsible for testing the boundary from an outside-in perspective, while the integration tests test the boundary from an inside-out perspective. In concrete terms, integration tests ensure the correct behavior of the Adapters, Gateways, and Clients that mediate the relationship with other development units (such as APIs, Plugins, Databases, and Modules).
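A minimal, hypothetical sketch of such a boundary in Java may help. MovieCatalog plays the role of the port, RecommendationService is the domain behind it, and InMemoryCatalog is the in-memory adapter used by outside-in unit tests; the real database adapter would implement the same port and be covered by an inside-out integration test. All names are illustrative:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

interface MovieCatalog {                         // port: the boundary the domain depends on
    List<String> findByGenre(String genre);
}

class RecommendationService {                    // domain logic behind the boundary
    private final MovieCatalog catalog;
    RecommendationService(MovieCatalog catalog) { this.catalog = catalog; }

    String recommend(String genre) {
        List<String> movies = catalog.findByGenre(genre);
        return movies.isEmpty() ? "nothing to recommend" : movies.get(0);
    }
}

// In-memory adapter used by outside-in unit tests; the real JDBC adapter would
// implement the same port and be covered by an inside-out integration test.
class InMemoryCatalog implements MovieCatalog {
    private final Map<String, List<String>> byGenre = new HashMap<>();
    void add(String genre, String title) {
        byGenre.computeIfAbsent(genre, g -> new ArrayList<>()).add(title);
    }
    @Override
    public List<String> findByGenre(String genre) {
        return byGenre.getOrDefault(genre, List.of());
    }
}

public class RecommendationExample {
    public static void main(String[] args) {
        InMemoryCatalog catalog = new InMemoryCatalog();
        catalog.add("sci-fi", "Dune");
        RecommendationService service = new RecommendationService(catalog);
        System.out.println(service.recommend("sci-fi"));   // behavior observed through the boundary
    }
}

Nothing on the outside-in side needs to know how RecommendationService is structured internally; only the behavior observable through the boundary matters.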

Behavior Focused Testing

What does the unit in unit tests mean? It means a unit of behavior. There’s nothing in that definition dictating that a test has to focus on a single file, object, or function. Why is it difficult to write unit tests focused on behavior?

A common problem with many types of testing comes from a tight connection between software structure and tests. That happens when the developer loses sight of the test goal and approaches it in a clear-box (sometimes referred to as white-box) way. 

Clear-box testing means testing with the internal design in mind to guarantee the system works correctly. This is really common in unit tests. The problem with clear-box testing is that tests tend to become too granular, and you end up with a huge number of tests that are hard to maintain due to their tight coupling to the underlying structure.

Part of the unhappiness around unit tests stems from this fact. Integration tests, being more removed from the underlying design, tend to be impacted less by refactoring than unit tests. 

I like to look at things differently. Is this a benefit of integration tests or a problem caused by the clear-box testing approach? What if we had approached unit tests in an opaque-box (sometimes referred to as black-box) (behavioral driven) approach? Wouldn’t we have reached similar or even better results?

A common misunderstanding is thinking that opaque-box testing can only be applied to the outer boundaries of our system. That is wrong. Our system is built with many boundaries. Some may be accessible through a communication protocol, while others may be extended with in-process adapters. Each adapter has its own boundaries and can be tested in a behavioral-driven approach.

Mocking: All or Nothing

In a clear-box testing approach, there’s often heavy use of mocks. But when you overuse mocks, tests become harder to maintain. Maybe that is what Mark Seemann refers to when he says that stubs and mocks break encapsulation.

Once you start facing this kind of problem due to heavy mocking, it’s normal to start to hate mocking. So, you try to avoid it at all costs. An API-only testing approach will commonly lead to the need for heavy mocking.

Once again, I question whether it was a problem due to mocking or misusing mocks. 

Mocks and stubs may be harder to maintain, but they exist for a reason. They have a valid role to fulfill in making tests faster and more stable. It’s our responsibility to control them. We don’t want to overuse them beyond where they are essential.
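As a sketch of that controlled use, the test below reuses the hypothetical MovieCatalog port and RecommendationService from the earlier sketch and mocks only that boundary, rather than every internal collaborator. It assumes Mockito and JUnit 5 are on the classpath:

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import java.util.List;

import org.junit.jupiter.api.Test;

class RecommendationServiceTest {

    @Test
    void recommendsFirstMatchForGenre() {
        // The mock stands in for the boundary only; internals stay private and unmocked.
        MovieCatalog catalog = mock(MovieCatalog.class);
        when(catalog.findByGenre("sci-fi")).thenReturn(List.of("Dune", "Arrival"));

        RecommendationService service = new RecommendationService(catalog);

        assertEquals("Dune", service.recommend("sci-fi"));
    }

    @Test
    void fallsBackWhenCatalogIsEmpty() {
        MovieCatalog catalog = mock(MovieCatalog.class);
        when(catalog.findByGenre("horror")).thenReturn(List.of());

        RecommendationService service = new RecommendationService(catalog);

        assertEquals("nothing to recommend", service.recommend("horror"));
    }
}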

Reduce the Public Surface of Your Code

Another side effect of clear-box testing is that it leads to exposing more code than needed. Validators, mappers, and other pieces of code that could be internal implementation details are now part of the public contract just because we exposed them for the sake of testing. Also, anyone working in the Java and C# world knows how prevalent Interfaces are in their codebases. Once again, for the sake of testing. To mock a dependency, the developer might introduce an Interface.

Once a piece of code is accessible from the outside, it becomes harder to change, and tests become required. This will lead to code where maintainability is a problem and refactoring is almost impossible without rewriting a ton of unit tests.

On the surface that looks like an argument in favor of integration tests, since integration tests focus on the outer layer where many of these implementation details don’t leak. 

Once again, I ask: is it a problem with unit tests, or with the way we are implementing unit tests? If we implement unit tests in an opaque-box way, ignoring the internal design decisions and only being concerned with what consumers need, we end up with a smaller contract. A contract that is easier to test, with fewer tests, and tests that are easier to maintain.
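As a tiny, hypothetical illustration of a smaller public surface: the validator below stays package-private and is exercised only through the public boundary, so it can be refactored freely without rewriting any tests. The names are made up for the example:

// An implementation detail: package-private, never mocked, never tested directly.
class EmailValidator {
    boolean isValid(String email) {
        return email != null && email.contains("@");
    }
}

// The public boundary; behavior-focused tests target register() only.
public class RegistrationService {
    private final EmailValidator validator = new EmailValidator();

    public String register(String email) {
        return validator.isValid(email) ? "registered" : "rejected";
    }
}

If EmailValidator later becomes a regular-expression check or an external lookup, tests written against register() keep passing unchanged.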

Architecture as the Guiding Principle

Tests tend to grow around architecture. We design our systems, then we think about testing. When we do that, systems can become harder to test. We have seen that happen with multi-layer architectures, where the dependency on data access technology brings complexity when unit testing the domain layer.

That can easily be avoided by adopting an architecture with test isolation in mind. From Hexagonal Architecture to Clean Architecture, we have many options from which we can pick.

This type of architecture is built to be device independent. All infrastructure dependencies are plugged into the system through dependency configuration. This type of architecture will make unit testing comfortable and lead you to use integration tests for what they should be: testing adapters to the outside world.

Integration testing only the adapters does introduce one weak spot into our testing strategy: when you run tests with all the components connected, you also verify things like configuration and composition, and we obviously want to test that. We can still run tests with all components connected; the difference is that those become “smoke tests” and don’t need to cover every single corner case. That leads to more stable and reliable tests.

Conclusion

It is as important to question old beliefs in the industry as it is to know them well before starting to question them.

We know the past repeats itself. We know that the past also informs our decisions about the future. We should also know that we will inevitably make the same mistakes over and over again. It’s human nature, so it’s up to us to try to avoid it.

Testing strategies are one of those cases where we tend to repeat our mistakes. We are addressing the pain caused by a lack of good information and education while avoiding the existing good practices.

Testing and architecture are deeply connected. It’s up to us to design architectures with testing in mind. And unit testing will still be a tool we use in our pursuit of good testing strategies.

Java InfoQ Trends Report – December 2022

Key Takeaways

  • Adoption of Java Virtual Threads will continue to grow as frameworks, such as Helidon and Vert.x, have introduced their own Virtual Threads platforms.
  • Adoption of Native Java will also continue to grow as Project Leyden, having been dormant for two years, was resurrected in May 2022, and there is support from frameworks such as Spring.
  • There has been a migration from MicroProfile Metrics to metrics and tracing offered by Micrometer.
  • There have been enormous efforts from both the commercial ecosystem and Java community to set the minimum Java version requirement on Java 11, but some frameworks, such as the Spring ecosystem, have raised the bar to Java 17.
  • With the release of Java 17, Java 11 has finally overtaken legacy Java 8 in the market.
  • As the speed of innovation increases, modernization continues to be a challenge for many Java developers.
  • Since the release of Java 17, the latest LTS, we’re noticing faster adoption of Java 17 than we did when Java 11 was released.

This report provides a summary of how the InfoQ Java editorial team currently sees the adoption of technology and emerging trends within the Java space.

We focus on Java the language, as well as related languages like Kotlin and Scala, the Java Virtual Machine (JVM), and Java-based frameworks and utilities.

We discuss trends in core Java, such as the adoption of new versions of Java, and also the evolution of frameworks such as Spring Framework, Jakarta EE, Quarkus, Micronaut, Helidon, MicroProfile and MicroStream.

This report has two main goals:

  • To assist technical leaders in making mid- to long-term technology investment decisions.
  • To help individual developers in choosing where to invest their valuable time and resources for learning and skill development.

This is our fourth published Java trends report. However, this topic has received ample news coverage as we have been internally tracking Java and JVM trends since 2006.

To help navigate current and future trends at InfoQ and QCon, we make use of the “crossing the chasm” mental model for technology success pioneered by Geoffrey Moore in his book of the same name.

We try to identify ideas that fit what Moore referred to as the early market, where “the customer base is made up of technology enthusiasts and visionaries who are looking to get ahead of either an opportunity or a looming problem.”

As we have done for the 2021, 2020 and 2019 Java trend reports, we present the internal topic graph for 2022:

For context, this was our internal topic graph for 2021:

Aside from some new technologies having been identified in the Innovators space, notable changes are described as follows.

We decided to place all of the downstream distributions of OpenJDK into one label, namely Java Community JDKs, and placed them in the Early Majority space. This list would include: Amazon Corretto; Azul Zulu; Microsoft Build of OpenJDK; BellSoft Liberica JDK; Eclipse Temurin; IBM Semeru; and Alibaba Dragonwell.

Java 17 has moved into the Early Adopters space as some frameworks, especially Spring, have committed to Java 17 as a baseline.

MicroStream joins Helidon and Micronaut in the Early Adopters space due to its continued development and integration with Helidon, Micronaut and Spring Boot.

Spring Native was removed from the model and replaced with the more generic Native Java. This was due to: the resurrection of Project Leyden in May 2022, initially introduced in 2020 and having been dormant for two years; and VMware deciding to supersede the Spring Native project in favor of GraalVM to support generating native images.

After more than a year in development, Spring Framework 6.0 and Spring Boot 3.0 were both released in November 2022 featuring a Java 17+ and Jakarta EE 9 baseline. Embedded observability through Micrometer with tracing and metrics has also been included with these releases.

What follows is a lightly-edited summary of the corresponding discussion on various topics among several InfoQ Java Queue editors and Java Champions:

  • Michael Redlich, Senior Research Technician at ExxonMobil Technology & Engineering Company and Java Queue Lead Editor at InfoQ
  • Ben Evans, Senior Principal Software Engineer at Red Hat and Java Queue Editor at InfoQ
  • Johan Janssen, Software Architect at ASML and Java Queue Editor at InfoQ
  • Dalia Abo Sheasha, Product Manager at Microsoft
  • Billy Korando, Developer Advocate at Oracle
  • Otávio Santana, Distinguished Software Engineer at Zup Innovation

We also acknowledge the Java Queue editors who provided input on updating our “crossing the chasm” model for 2022:

  • Erik Costlow, Senior Director of Product Management and Java Queue Editor at InfoQ
  • Karsten Silz, Full-Stack Java Developer and Java Queue Editor at InfoQ

We feel this provides more context for our recommended positioning of some of the technologies on the internal topic graph.

OpenJDK

Korando: The productivity features of Records, Pattern Matching, and improvements to Strings as part of Project Amber have me the most excited. There have been several times in my career where these features would have been very helpful with data transformation, working with formatted strings, and other frustrating areas. I’m excited about these features for my own use, but also because future Java developers will benefit from them and won’t have to go through those frustrations like I did.
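For readers who haven’t tried these features yet, here is a small illustrative sketch of the Project Amber style Korando describes: a sealed hierarchy of records deconstructed with pattern matching for switch. At the time of writing, pattern matching for switch and record patterns are preview features, so running this on Java 19 or 20 requires the --enable-preview flag; the names are made up for the example:

public class AmberSketch {

    sealed interface Shape permits Circle, Square {}
    record Circle(double radius) implements Shape {}
    record Square(double side) implements Shape {}

    // Pattern matching for switch with record deconstruction; the sealed
    // hierarchy makes the switch exhaustive without a default branch.
    static double area(Shape shape) {
        return switch (shape) {
            case Circle(var r) -> Math.PI * r * r;
            case Square(var s) -> s * s;
        };
    }

    public static void main(String[] args) {
        System.out.println(area(new Circle(2.0)));
        System.out.println("Unit square area: %.1f".formatted(area(new Square(1.0))));
    }
}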

Santana: Java 8 has become outdated. In recent market surveys, such as the JRebel report, we can see that Java 8 is decreasing in popularity. This evolution is a good thing for the Java market and for the developer experience industry: IDEs, frameworks, integration tools, etc. We can also see enormous efforts from both the market and the Java community to set the minimum requirement to Java 11, as seen with the latest versions of Quarkus and Jakarta EE. I believe other application frameworks will follow suit and set Java 11 as a minimum requirement. When we talk about the Java platform, I believe these tools and frameworks represent around 80% of enterprise code, so they will be updated so as not to fall out of favor with enterprise developers. The migration to Java 11 also points to a more fast-paced culture, where the Java and JDK version will be updated more often, at least every two years.

Evans: With the release of JDK 17, JDK 11 has finally overtaken legacy JDK 8 in the market.

The launch of the Adoptium Marketplace, and the arrival of reproducible builds, is far more significant than many people realize. This is especially true for enterprises and folks that have to care about supply chain security.

Redlich: The releases of Java 18 and Java 19 over this past year delivered new preview and incubating features such as Virtual Threads, Structured Concurrency, Pattern Matching for switch, Record Patterns, and the Foreign Function & Memory API. These features, in the form of JEPs, provide continued contribution towards fulfilling Project Amber, Project Loom and Project Panama. Java 20, scheduled for release in March 2023, will deliver upgraded previews and incubations of those features.

Java 17 and Beyond

Sheasha: Since the release of Java 17, the latest LTS, we’re noticing faster adoption of Java 17 than we did when Java 11 was released. There are various factors, including companies adopting more modern DevOps processes and pipelines that allow for faster and easier application updates. Another factor is frameworks and libraries adopting a more rapid release cadence, which had previously been a big blocker for developers ready to upgrade their applications. The latest Spring Framework release, version 6, is based on Java 17, which signals to developers a commitment to adopting the latest Java versions. Another team embracing the faster Java release cadence is the Minecraft team, which now ships Java 17 to millions of its players.

Meanwhile, we’re still seeing lots of developers running apps on Java 8 but since the jump from Java 8 to Java 11 is harder than Java 11 to 17, teams that have done the hard work of upgrading to Java 11 are a lot more likely to quickly adopt Java 17.

We’re also seeing more developers using non-LTS Java versions (Java 18+) as they feel more confident adopting newer Java versions for their applications giving them access to new features of Java that they no longer have to wait years for. We’re seeing lots of developers pick the latest non-LTS Java version when prototyping or developing new applications. For production, however, an LTS version is still the choice for most teams.

Santana: The newest Java LTS version, Java 17, brings several new features for the Java developer. To highlight one, the Record construct brings a new perspective, primarily for enterprise business code.

Janssen: There are many exciting developments such as Spring Boot requiring Java 17 which will hopefully boost the adoption of Java 17.

Project Loom and Virtual Threads

Sheasha: There is a ton of excitement around the performance of Java. Lots of developers are excited to see the updates coming out of Project Loom as developers get early access to some of the project’s work, such as Virtual Threads.

Korando: The merging of key features of Project Loom into the mainline Java release is probably the most significant change over the past year. The development of Project Loom has been eagerly watched over the past several years, as its promise of much greater horizontal scalability is applicable to many applications Java developers work on. While features like Virtual Threads are ready to be used in production now, there is no doubt that many developers are eagerly waiting for them to move out of preview status and become part of an LTS release, hopefully with Java 21 in September 2023.

Evans: I hear a lot of people talking about Project Loom, but I am somewhat more reserved about it – I would rather wait until we have more real-world experience with it. I think it’s possible that it will be the huge game-changer that some people think, but I don’t think it’s a slam-dunk.

Redlich: The much anticipated release of Java 19 in September 2022 featured support for Virtual Threads. Development in this area has already produced incubating frameworks such as Helidon Níma, a microservices framework offered by Oracle, and the Virtual Threads Incubator project offered by Vert.x. I anticipate other vendors will follow suit.
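As a quick illustration of the API shape (a preview feature in Java 19 under JEP 425, so it needs the --enable-preview flag there), the sketch below submits ten thousand blocking tasks, each on its own virtual thread; the numbers are arbitrary and only meant to show that blocking is cheap in this model:

import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.IntStream;

public class VirtualThreadsSketch {
    public static void main(String[] args) {
        // One cheap virtual thread per task instead of a pool of platform threads.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            IntStream.range(0, 10_000).forEach(i ->
                    executor.submit(() -> {
                        Thread.sleep(Duration.ofMillis(100)); // blocking call, but cheap here
                        return i;
                    }));
        } // close() implicitly waits for all submitted tasks to complete
    }
}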

Jakarta EE

After a delay of about three months, the much anticipated release of Jakarta EE 10 was made available to the Java community on September 22, 2022.

Redlich: The release of Jakarta EE 10 featured updates to over 20 of the specifications and a new Core Profile to complement the existing Platform and Web Profile. Plans for a point release of Jakarta EE 10 and for Jakarta EE 11 are already being discussed within the Jakarta EE Working Group.

Native Java (GraalVM/Spring Native/Project Leyden)

Santana: GraalVM is becoming increasingly popular, giving a massive space to Project Leyden. It is a race to make Java startup faster!

Janssen: GraalVM is continuously improving and supporting more use cases such as Spring applications.

Open Telemetry

Evans: OpenTelemetry has made it to version 1.0 and is making extremely strong progress for such a young standard. I didn’t expect to see OpenTelemetry easily exceeding what were already aggressive expectations. It is set to achieve Gartner’s target of “the majority of telemetry traffic by end of 2023” which is well ahead of schedule.

Redlich: The upcoming release of MicroProfile 6.0 will feature the debut of the MicroProfile Telemetry specification to replace the MicroProfile OpenTracing specification that was first introduced in MicroProfile 1.3.

Containers

Sheasha: As more Java workloads are shifting to run in containers, we’re seeing a shift from merely talking about how to containerize a Java app to how to best containerize a Java app. We are seeing more guidance around best practices when running an application in a container. For example, Microsoft has published an article with recommendations around memory allocation and garbage collection.

Evans: Containerisation of Java apps continues to gain ground.

Microsoft Support for Java

After introducing their own downstream distribution of OpenJDK in April 2021, Microsoft has continued its embrace of the Java programming language.

Sheasha: Microsoft joined the Jakarta EE and MicroProfile Working Groups as Microsoft continues to make large investments in the Java ecosystem for many reasons highlighted in this blog post from the GitHub ReadME project.

What is the Java Community Saying?

Sheasha: As the speed of innovation increases, modernization continues to be a challenge for many developers. The more Java versions are released, the bigger the gap becomes for apps on old Java versions and frameworks, and the harder it gets to close. This is why we’re seeing more companies invest in modernization tooling. Projects like OpenRewrite are important as we proceed with innovation without leaving apps behind. We’re also seeing new projects, such as the Eclipse Migration Toolkit for Java, that help developers with their Java version migrations.

Developers are increasingly overwhelmed by the amount of knowledge required to build an application. There is a constant stream of new libraries, frameworks, and features to learn about. In addition, we’re seeing more of the “Ops” in “DevOps” sneaking into developers’ responsibilities. Developers are now expected to have a good understanding of Docker and Kubernetes.

Korando: I think a lot of the excitement around Project Loom being delivered into mainline OpenJDK is now turning towards Project Valhalla. Project Valhalla is another long-running OpenJDK project that promises significant improvements to memory management and throughput performance. Hopefully, we will see Project Valhalla start delivering features into mainline OpenJDK in 2023!

Santana: There is a new trend towards reflectionless frameworks, where reflection is eliminated to decrease application startup time and memory consumption. Frameworks such as Quarkus, Micronaut, Spring Native and Jakarta CDI Lite are examples of this.

The cloud is the new target of any solution. We can see the migration to all environments, not only Infrastructure as a Service (IaaS), but any solution that makes the Java developer’s life easier and gives more abstraction to the operation layer.

Serverless brings scalability and simplicity to handle that from the software developer’s perspective. We can see several solutions moving to the native way to take advantage of it.

Evans: It seems like this year Quarkus has started to properly break through into developers’ consciousness. I meet developers all the time now who have experimented with it, although the number of production apps is still growing. I think people have realized that it’s not just a native-compiled Java, but is also a great developer experience as well as being a first-class Kubernetes Java.

What is New and Exciting That We Didn’t Expect?

Korando: The announcement of the Graal JIT compiler and native image being merged into OpenJDK at JavaOne 2022 was unexpected and exciting. A lot of the technology of GraalVM has been exciting, but has been difficult to use for many Java developers. The merging of these key features into OpenJDK will make them more accessible to all Java developers!

Santana: We know that developer tools and architecture are a vast industry. Therefore, we see many companies talk about “perfect solutions” and non-trade-off decisions. In reality, Java has proven consistent and has delivered success cases across several scenarios. Each architecture decision has trade-offs, and we need to understand the context to apply the best solution to the best scenario. The Java ecosystem shows us that it is a real-life and production-ready platform. It offers solutions for microservices, CQRS, cloud-native, serverless, event-driven design, monoliths, SQL, NoSQL, mappers, active records, etc.

Evans: I didn’t expect Loom to get merged to mainline (in experimental form) in time for Java 19 and I really like Gunnar Morling’s JFR Analytics project.

Janssen: I recently learned about Coordinated Restore at Checkpoint (CRaC) in OpenJDK; it combines fast startup times with runtime optimizations.

The Java Community

Sheasha: Personally, I love how many doors Java has opened for me over my entire career. I’ve had a chance to be a developer, team lead, developer advocate and program manager, all within the Java ecosystem.

Java’s continuous innovation and improvements to the language keeps the space interesting. It also keeps the language a great choice for solving various problems across a variety of industries. For me, one of the biggest reasons I’ve loved working in the Java space is the wonderful community that surrounds Java full of welcoming and supportive people from all over the world.

Evans: I am excited about sun-setting Java 8 and moving the community on to Java 17 and beyond. Also, the Observability community, especially OpenTelemetry, is going from strength to strength. And there is new work on profiling, including Java Flight Recorder, starting up.

Janssen: I’m really looking forward to working with project Loom and hope we can soon start to use it in projects.

Redlich: I am enjoying my time contributing to open source projects and have recently been elected as a committer to the Jakarta NoSQL and Jakarta Data specifications and the Eclipse JNoSQL project, the compatible implementation of Jakarta NoSQL. We have been working to have the two Jakarta specifications be included in the Jakarta EE Platform in time for Jakarta EE 11.

Conclusion

Please note that the viewpoints of our contributors only tell part of the story. Different parts of the Java ecosystem and locales may have different experiences. Our report for 2022 should be considered as a starting point for debate, rather than a definitive statement, and an invitation to an open discussion about the direction the industry is taking.