At DHL I built a microservice using cats, cats-effect, cats-tagless, refined, doobie, http4s and ZIO
In this post I’d like to go over some of the tech I used to build a new microservice at DHL and why these technologies and methodologies might be interesting for you as well. I had the freedom to pick new technologies, and I would like to thank DHL Netherlands for giving me this opportunity.
Back to the project! I’m quite fond of functional programming, so I decided to roll with a functional stack: http4s, circe, refined, doobie, cats, cats-tagless, cats-effect and, on top of that, ZIO.
I think there is some great material already out there about cats, http4s, circe, discipline, scalacheck and ZIO.
In a typical Scala application, you model data structures with primitive types such as String, Int and so forth. In a lot of cases these primitive types are too wide; in other words, they accept too many values. This makes your functions harder to test, because your functions and data structures accept a lot of values and can therefore be in a lot of states. To constrain these primitive types we can use refinement types.
A basic example: in my backend, we refer to a depot by a depot code, which is a code of exactly 3 characters. As a refinement type I would describe it like this:
// a depot code is exactly 3 characters long
type DepotCode = Size[Equal[W.`3`.T]]

def getUsers(depotCode: String Refined DepotCode): Task[List[User]] = ???
In this example, the method getUsers has a refined String type that only allows a String of 3 characters. This is incredibly useful because the method is easier to test and you don’t need to check whether the depotCode is actually 3 characters long.
This is a basic example of refinement types. You can also use variable-sized strings (3 to 200 characters), regular expressions, et cetera. Variable-sized strings can be useful to avoid losing data when you store it in a database like Postgres or MySQL.
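To give an impression, here is a minimal sketch of such predicates with refined (the type names and the postal code format are my own illustrations, not from the project):

import eu.timepit.refined.W
import eu.timepit.refined.api.Refined
import eu.timepit.refined.boolean.And
import eu.timepit.refined.collection.{MaxSize, MinSize}
import eu.timepit.refined.string.MatchesRegex

// a string between 3 and 200 characters, matching e.g. a varchar(200) column
type Name = String Refined (MinSize[W.`3`.T] And MaxSize[W.`200`.T])

// a string constrained by a regular expression, e.g. a Dutch postal code
type PostalCode = String Refined MatchesRegex[W.`"[0-9]{4} [A-Z]{2}"`.T]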
There are modules for cats, circe, doobie and other libraries that deal with data to constrain your primitives. In my backend, I use circe to encode and decode JSON. With the refined module for circe, I can be sure that an incoming payload from an HTTP endpoint is in the right shape for me to process further. This eliminates a lot of bugs and tests.
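As a sketch of how that looks (the User type below is illustrative), the codecs from circe-refined make decoding fail on values that don’t satisfy the predicate:

import eu.timepit.refined.api.Refined
import eu.timepit.refined.collection.NonEmpty
import io.circe.Decoder
import io.circe.generic.semiauto.deriveDecoder
import io.circe.refined._

final case class User(name: String Refined NonEmpty)

implicit val userDecoder: Decoder[User] = deriveDecoder

// a payload with an empty name is rejected at the boundary
io.circe.parser.decode[User]("""{ "name": "" }""") // Left(DecodingFailure(...))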
You can also do this with collections, by using lists and sets which are not empty, like NonEmptyList and NonEmptySet. You can even go further and include the dimensions of a collection in a type by using dependent types. This could be useful for matrix operations, which is relevant in machine learning.
In the past I’ve used Slick and Hibernate, which map tables and rows to types (ORMs). While this adds type safety, I think these approaches have some downsides.
Therefore I prefer Doobie over these solutions. Doobie has a specialized string interpolator for SQL queries, so you can write SQL as you’re used to, while using string interpolation to safely insert values into your queries, like this:
def list(userId: UserId): Query0[ExamResult] =
  sql"select exam_id, correct, total from exams where user_id = $userId".query[ExamResult]
This is a Query0 statement, which means it’s not a ConnectionIO yet. To get one, you can use combinators like run, to[List], stream, unique or option to get the right shape out of your database. ConnectionIO is similar to DBIO (as seen in Slick): it allows you to compose ConnectionIO statements through a for comprehension, and executing these composed statements as a whole makes them a transaction.
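A minimal sketch of such a composition (the queries and the transactor are illustrative):

import cats.effect.IO
import doobie._
import doobie.implicits._

def insertExam(userId: String): ConnectionIO[Int] =
  sql"insert into exams (user_id, correct, total) values ($userId, 0, 0)".update.run

def examCount(userId: String): ConnectionIO[Long] =
  sql"select count(*) from exams where user_id = $userId".query[Long].unique

// composing two ConnectionIO statements; transact runs them in one transaction
def insertAndCount(xa: Transactor[IO], userId: String): IO[Long] =
  (for {
    _     <- insertExam(userId)
    count <- examCount(userId)
  } yield count).transact(xa)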
Another nice feature of Doobie is that you can check whether the query type checks against your case class. In this example that’s ExamResult, which is defined as case class ExamResult(id: String Refined ExamId, correct: Int, total: Int).
With the check method, which is included in the test harness of doobie, you can verify the query is correct:
+ Query0[ExamResult] defined at ExamRepository.scala:20
select exam_id, correct, total from exams where user_id = ?
+ SQL Compiles and TypeChecks
+ P01 UUID → OTHER (uuid)
+ C01 exam_id VARCHAR (varchar) NOT NULL → String
+ C02 correct INTEGER (int4) NOT NULL → Int
+ C03 total INTEGER (int4) NOT NULL → Int
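Such a test could look roughly like this, assuming the doobie-scalatest module (the connection details and the way UserId is constructed are illustrative):

import cats.effect.{ContextShift, IO}
import doobie.Transactor
import doobie.scalatest.IOChecker
import org.scalatest.funsuite.AnyFunSuite

import scala.concurrent.ExecutionContext

class ExamRepositorySpec extends AnyFunSuite with IOChecker {
  implicit val cs: ContextShift[IO] = IO.contextShift(ExecutionContext.global)

  // check needs a live connection, because the database itself analyses the query
  val transactor: Transactor[IO] = Transactor.fromDriverManager[IO](
    "org.postgresql.Driver", "jdbc:postgresql:exams", "postgres", ""
  )

  test("ExamRepository.list type checks") {
    // any representative value works; check only analyses the SQL
    check(ExamRepository.list(UserId("u-123")))
  }
}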
Doobie is simple and works with any JDBC driver. This means you can start with an ordinary Postgres database and later switch to CockroachDB, Citus, TimescaleDB or even ClickHouse.
In a backend, you either work with in-memory data structures that need to be modified, or with data structures inside a database. If you use a database like Postgres or MySQL, multiple statements can be composed together to run in a transaction. A transaction is atomic, which means all the changes happen at once or not at all.
It’s a common practice to group methods that query the database in a Repository. A repository is an interface for querying a coherent group of entities. A little example:
trait EventRepository[F[_]] {
  def allEvents(offset: Long): Stream[F, AggregateEvent[Json]]
  def deleteStream(aggregateId: UUID): F[Int]
  def getEventsByAggregateId(aggregateId: UUID, fromSeqNumber: Option[Long] = None): Stream[F, Event[Json]]
  def insert(aggregateId: UUID, lastSeqNumber: Long, events: NonEmptyList[Event[Json]]): F[Int]
}
As you can see, we have a tagless final interface for a repository. Why? Well, we want to implement all of our repositories in terms of ConnectionIO (the same thing as DBIO in Slick), so we can either compose multiple statements into a transaction or run a single method on its own. To do the latter, we need to transform the complete interface. Luckily cats-tagless has some utilities for that:
trait FunctorK[A[_[_]]] {
  def mapK[F[_], G[_]](af: A[F])(fk: F ~> G): A[G]
}
This may look daunting, but it’s not that hard. If you look at the type A[_[_]], EventRepository[F[_]] fits in. So for the parameter af we could use an EventRepository[ConnectionIO], which implies that F is ConnectionIO. To get values out of a ConnectionIO, doobie has a method trans on Transactor (which gives a natural transformation). This method looks like this:
def trans(implicit ev: Monad[M]): ConnectionIO ~> M
If we want M to be a ZIO Task, we are game! The cats-tagless library offers a nifty macro that makes it easy to derive a FunctorK instance for a tagless final interface:
implicit val functorK: FunctorK[EventRepository] = Derive.functorK[EventRepository]
I have a utility method that lets you select a ConnectionIO-based repository and get back a Task-based version.
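Roughly, a sketch of what it could look like (Env, PostgresRepos and the zio.interop.catz._ import that provides the Monad[Task] instance are assumptions on my part):

import cats.tagless.FunctorK
import cats.tagless.implicits._
import doobie.{ConnectionIO, Transactor}
import zio.Task
import zio.interop.catz._

final case class Env(xa: Transactor[Task], repos: PostgresRepos[ConnectionIO]) {
  // select one repository and transform it to Task, so that every method
  // runs in its own transaction via the xa.trans natural transformation
  def pgRepo[A[_[_]]: FunctorK](select: PostgresRepos[ConnectionIO] => A[ConnectionIO]): A[Task] =
    select(repos).mapK(xa.trans)
}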
This lets you write something like env.pgRepo(_.users).findById(userId). PostgresRepos is a case class that contains all the repositories based on ConnectionIO. Thanks to the FunctorK instance, we can transform the complete algebra/interface into a Task-based one.
But what if you want to compose multiple statements into a single transaction? I use this method:
def pgTransact[A](tx: Monad[ConnIO] => PostgresRepos[ConnIO] => ConnIO[A]): Task[A] =
  trans(tx(monadConnIO)(postgresRepos))
Here’s a little example:
def deleteRows: Task[Unit] = env.pgTransact { implicit connIO => repos =>
  for {
    _ <- repos.courierEvents.deleteStream(userId.repr)
    _ <- repos.courierTokens.delete(userId)
    _ <- repos.exams.delete(userId)
  } yield ()
}
Et voilà!
Tagless final works well with repositories, but it can also work pretty well with other subsystems that deal with data. This could be a REST or gRPC client interacting with a subsystem like Keycloak, Kubernetes, etc. You could also decorate your algebra with logging, caching, tracing or metrics, or add timeouts and circuit breakers to all methods using a natural transformation.
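For example, error logging for a whole algebra is a single mapK with a natural transformation (a sketch; taskRepo stands for the Task-based repository we obtained earlier):

import cats.~>
import cats.tagless.implicits._
import zio.{Task, UIO}

// a natural transformation that logs every failed operation
val logErrors: Task ~> Task = new (Task ~> Task) {
  def apply[A](fa: Task[A]): Task[A] =
    fa.tapError(err => UIO(println(s"operation failed: $err")))
}

// every method of the algebra is now decorated with error logging
val loggedRepo: EventRepository[Task] = taskRepo.mapK(logErrors)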
Discipline is a great tool for testing your tagless final algebras with tests that are not bound to a specific database or API. It lets you write down the semantics/laws of your algebras, which you can then plug in to test, for example, a ConnectionIO-based implementation or a test implementation that uses ZIO Task. If you decide to move a repository over to Cassandra, it’s a matter of setting up your test harness to cope with Cassandra-specific IO operations.
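To sketch the idea with a deliberately simple KeyValueStore algebra (everything here is illustrative; runToProp is an assumed hook that each implementation supplies to execute its effect and check the outcome):

import cats.Monad
import cats.syntax.all._
import org.scalacheck.{Arbitrary, Prop}
import org.typelevel.discipline.Laws

trait KeyValueStore[F[_]] {
  def put(key: String, value: String): F[Unit]
  def get(key: String): F[Option[String]]
}

trait KeyValueStoreLaws[F[_]] extends Laws {
  implicit def F: Monad[F]
  def store: KeyValueStore[F]
  // each implementation (ConnectionIO, Task, ...) decides how to run its effect
  def runToProp[A](fa: F[A])(assert: A => Boolean): Prop

  def laws(implicit arb: Arbitrary[(String, String)]): RuleSet =
    new DefaultRuleSet(
      "keyValueStore", None,
      "putThenGet" -> Prop.forAll { (kv: (String, String)) =>
        runToProp(store.put(kv._1, kv._2) *> store.get(kv._1))(_ == Some(kv._2))
      }
    )
}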
I think functional programming has a lot more to offer Scala. Scala offers the power to write embedded domain-specific languages (EDSLs), like Haskell does. This means you embed a micro-language in the host language itself. Such a language can describe data structures, but also HTTP/gRPC endpoints.
There are projects like formulation, skeuomorph and scalaz-schema which describe data structures in order to derive encoders and decoders, but also schemas, from data types. I think it’s good to have explicit mappings of your data structures in your code, as you will eventually have to think about migrations and backward/forward compatibility.
Projects like endpoints, itinere and tapir are there to describe HTTP endpoints. If you combine them with the projects above, you can remove the boilerplate of writing HTTP endpoints by hand, as your business logic is mapped onto these endpoints! You could see HTTP endpoints as encoders and decoders as well. With these endpoint libraries you can derive a server and a client, but also a Swagger/OpenAPI specification which can be offered to external consumers.
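As an impression of the style, a tapir endpoint description might look like this (a sketch against a recent tapir version; the User type and the paths are illustrative):

import io.circe.generic.auto._
import sttp.tapir._
import sttp.tapir.generic.auto._
import sttp.tapir.json.circe._

final case class User(name: String)

// one description, written once, independent of any server or client library
val getUsers =
  endpoint.get
    .in("depots" / path[String]("depotCode") / "users")
    .out(jsonBody[List[User]])
    .errorOut(stringBody)

From this single getUsers value, tapir’s interpreter modules can then derive a server route, a client, and an OpenAPI specification.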
I think having these tools is crucial to prevent compatibility problems, both for the data inside your event logs and for external consumers of your API.