Static Types and their Impact on Testing

November 18, 2014

In this post (series?), I’d like to explore how to write a program in a type-safe and testable way. I’m using Scala because it’s the language I’m most proficient in; this would be prettier and less boilerplatey in Haskell.

The basic point that I hope to get across in this post (and the potential follow-ups) is that by encoding our domain into static types and avoiding side effects we can accomplish a few things:

We can greatly limit the scope of our tests
For the tests that still make sense to write, we can write very strong tests
By using meaningful values and separating side-effecting functions from pure ones, we can more easily reason about our program

We’re going to write a simple service that takes a form filled out by a user, sends an email based on that form, and then records that event to a database.

Here’s a possible data model for our email and a form.

package com.notik

case class EmailAddress(address: String)
case class FromAddress(address: EmailAddress)
case class ToAddress(address: EmailAddress)
case class EmailBody(body: String)
case class Recipient(name: String)
case class Email(from: FromAddress, to: ToAddress, body: EmailBody, recipient: Recipient)
case class WebForm(from: FromAddress, to: ToAddress, body: EmailBody, recipient: Recipient)

Right away, we’ve gained something. Not a ton, but something.

If we had encoded the idea of an Email as Email(fromAddress: String, toAddress: String, body: String, recipient: String), we could easily shoot ourselves in the foot. We might mistakenly write Email("hello", "jason", "jason@gmail.com", "body of email"), for example. We’ve mixed up all of the parameters, and we only find out that things have gone wrong once we run our program and it blows up with an InvalidEmailAddressException or whatever. Worse, it might not blow up at all, and instead our program is just wrong.

What we should do is encode our domain as strongly as possible and let the type system/compiler do as much work as possible.
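
To make the difference concrete, here is a small sketch that reuses the case classes above. The wrapper object TypedEmailDemo and the sample addresses are made up for illustration; the commented-out lines show the kind of mix-up the compiler now rejects.

```scala
object TypedEmailDemo {
  case class EmailAddress(address: String)
  case class FromAddress(address: EmailAddress)
  case class ToAddress(address: EmailAddress)
  case class EmailBody(body: String)
  case class Recipient(name: String)
  case class Email(from: FromAddress, to: ToAddress, body: EmailBody, recipient: Recipient)

  // Each argument has its own type, so each one fits in exactly one position.
  val email: Email = Email(
    FromAddress(EmailAddress("jason@example.com")),
    ToAddress(EmailAddress("levi@example.com")),
    EmailBody("body of email"),
    Recipient("Levi")
  )

  // Swapping two arguments is a compile error rather than a runtime surprise:
  //   Email(ToAddress(EmailAddress("levi@example.com")),
  //         FromAddress(EmailAddress("jason@example.com")),
  //         EmailBody("hi"), Recipient("Levi"))
  // does not compile, because a ToAddress is not a FromAddress.
}
```

Note that the wrappers only protect us once the values are wrapped; whatever turns raw strings into these types is still a place where mistakes can happen.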

Ok, so with our case classes we have separate types representing the different parts of an email. But, we’re probably creating this Email by populating it from the fields of some web form and there’s still nothing preventing us from mixing up the strings and getting incorrect data, right?

This is always a possibility and users might input bad data altogether. Ok, so let’s encode this into our types. Firstly, we’re going to create a function which takes in a WebForm and produces an Email. But instead of producing an Email directly, we’re going to produce “possibly” an email. We’re not going to assume we have an email and wait for runtime exceptions nor are we going to do validation checks and simply halt in the case of invalid input. Instead, we’re going to encode the notion that something could always go wrong when submitting the form. See EmailService.mkEmail. It takes a WebForm and returns a ValidationNel[String, Email]. If the form is incorrect (as defined by our domain logic) then we get back a list of all the errors. If nothing is wrong, we get an Email.

object EmailService {
  import scalaz._
  import Scalaz._
  import argonaut.Parse

  /*
    Validation represents the idea of success or failure. NEL stands for non-empty list. We can actually require
    that the list of errors we get back in the case of failure is non-empty.

    This is the first step towards a more strongly typed way of handling web form submissions. We encode the notion
    of failure in our type. We no longer have simply an Email as a result of a WebForm. Instead, we are forced, at
    the level of the type system, to deal with the possibility of failure.
   */

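  // The dots (and the plus) are escaped so that they match literally rather than matching any character.
  val emailRegex = "^[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})$".r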

  def mkEmail(form: WebForm): scalaz.ValidationNel[String, Email] =
    (validateFrom(form.from).toValidationNel |@|
      validateTo(form.to).toValidationNel |@|
      validateBody(form.body).toValidationNel |@|
      validateRecipient(form.recipient).toValidationNel)(Email.apply)

  def validateFrom(from: FromAddress):Validation[String, FromAddress] =
    emailRegex.findFirstIn(from.address.address)
     .map(_ => from.success)
     .getOrElse(s"$from is not a valid email address".fail[FromAddress])

  def validateTo(to: ToAddress):Validation[String, ToAddress] =
    emailRegex.findFirstIn(to.address.address)
     .map(_ => to.success)
     .getOrElse(s"$to is not a valid email address".fail[ToAddress])

  def validateBody(emailBody: EmailBody): Validation[String, EmailBody] =
   if(emailBody.body.nonEmpty) emailBody.success
   else "body of the email cannot be empty".fail[EmailBody]

  def validateRecipient(recipient: Recipient): Validation[String, Recipient] =
    if(recipient.name.nonEmpty) recipient.success
    else "recipient name cannot be empty".fail[Recipient]

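  // Note: using Validation in a for-comprehension relies on flatMap, which scalaz 7.1+
  // only provides via `import scalaz.Validation.FlatMap._`.
  def emailFromJsonForm(json: String): Validation[NonEmptyList[String], Email] = for {
    form  <- Parse.decodeValidation[WebForm](json).toValidationNel
    email <- mkEmail(form)
  } yield email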
}

With this in place, if we tried creating an email out of a form with something like from = "gmail.com", to = "@gmail.com", body = "", recipient = "Levi", we’d get a list of errors like this:

NonEmptyList(FromAddress(EmailAddress(gmail.com)) is not a valid email address, 
ToAddress(EmailAddress(@gmail.com)) is not a valid email address, body of the email cannot be empty)
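
If the applicative syntax above feels opaque, the accumulation itself can be sketched with nothing but the standard library. This is not how scalaz implements it; it's a rough approximation that uses Either with a plain List of errors in place of ValidationNel, so the "non-empty" guarantee is lost. Form, Message, and the simplified "@" check are hypothetical stand-ins for the types and regex used above.

```scala
object AccumulateDemo {
  // Hypothetical, simplified stand-ins for WebForm and Email.
  case class Form(from: String, body: String, recipient: String)
  case class Message(from: String, body: String, recipient: String)

  type Errors = List[String]

  def validateFrom(from: String): Either[Errors, String] =
    if (from.contains("@") && !from.startsWith("@")) Right(from)
    else Left(List(s"$from is not a valid email address"))

  def validateBody(body: String): Either[Errors, String] =
    if (body.nonEmpty) Right(body)
    else Left(List("body of the email cannot be empty"))

  def validateRecipient(name: String): Either[Errors, String] =
    if (name.nonEmpty) Right(name)
    else Left(List("recipient name cannot be empty"))

  // Run every check first, then either build the Message or return all errors.
  def mkMessage(form: Form): Either[Errors, Message] = {
    val from      = validateFrom(form.from)
    val body      = validateBody(form.body)
    val recipient = validateRecipient(form.recipient)
    (from, body, recipient) match {
      case (Right(f), Right(b), Right(r)) => Right(Message(f, b, r))
      case _ =>
        // At least one check failed: keep every error, in field order.
        Left(List(from, body, recipient).collect { case Left(errs) => errs }.flatten)
    }
  }
}
```

The important part is that no check is skipped when an earlier one fails, so AccumulateDemo.mkMessage(AccumulateDemo.Form("gmail.com", "", "Levi")) reports both the bad address and the empty body.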

Our ultimate goal is to send an email and log a record of that. But we’re still dealing with just our data/model at this point.

Ok, so how do we get this WebForm? For our example, we’ll assume a user is filling out some input fields which will then be posted as JSON. If we were doing it through query params or some other way, the same general principles would apply.

On to JSON. We’ll use the Argonaut library to deserialize some JSON data into our WebForm type.

import argonaut._, Argonaut._

/*
We need a codec that defines, in a *type safe* way, how to decode our JSON into our WebForm class. We'll put our codecs in the respective companion objects so the implicits can be found without additional imports.
*/

object WebForm {
  implicit def WebFormCodecJson: CodecJson[WebForm] =
    casecodec4(WebForm.apply, WebForm.unapply)("from", "to", "body", "recipient")
}

/*
case class EmailBody(body: String)
case class Recipient(name: String)
case class Email(from: FromAddress, to: ToAddress, body: EmailBody, recipient: Recipient)
*/

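// FromAddress and ToAddress wrap an EmailAddress, so it needs a codec of its own
// (the "address" key name here is an assumption).
object EmailAddress {
  implicit def EmailAddressCodecJson: CodecJson[EmailAddress] =
    casecodec1(EmailAddress.apply, EmailAddress.unapply)("address")
}

object FromAddress {
  implicit def FromAddressCodecJson: CodecJson[FromAddress] =
    casecodec1(FromAddress.apply, FromAddress.unapply)("address")
}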

object ToAddress {
  implicit def ToAddressCodecJson: CodecJson[ToAddress] =
    casecodec1(ToAddress.apply, ToAddress.unapply)("address")
}

object EmailBody {
  implicit def EmailBodyCodecJson: CodecJson[EmailBody] =
    casecodec1(EmailBody.apply, EmailBody.unapply)("body")
}

object Recipient {
  implicit def RecipientCodecJson: CodecJson[Recipient] =
    casecodec1(Recipient.apply, Recipient.unapply)("recipient")
}

Parsing/decoding may fail for a number of reasons. As with the validation examples above, the possibility of failure is encoded into our types. Specifically, when we decode, the value produced is essentially an Either[String, WebForm] (Parse.decodeValidation, used above, returns the equivalent Validation[String, WebForm]), where the left side contains the error message in the case of failure and the right side contains the WebForm in the case of success. Again, the basic idea is simple but powerful: instead of pretending that we have the values we ultimately want even though we know that things may very well blow up, we simply encode the possibility of failure into our type.

Using functional constructs we can deal with the “happy path” and deal with errors at the very end, instead of sprinkling error handling all over our code. We do this by decoding the form from JSON and then calling mkEmail. The function emailFromJsonForm returns a Validation[NonEmptyList[String], Email] with the list of any errors on the left and the Email on the right side.
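
Here’s a dependency-free sketch of that “happy path first, errors at the end” shape. It assumes Scala 2.12 or later (where Either is right-biased) and swaps the JSON decoder for a made-up decode function that reads a "to=...;body=..." string; Form, Message, and describe are likewise hypothetical names, not part of the service above.

```scala
object PipelineDemo {
  case class Form(to: String, body: String)
  case class Message(to: String, body: String)

  // Hypothetical decoder: reads "to=...;body=..." instead of JSON.
  def decode(raw: String): Either[List[String], Form] = {
    val fields = raw.split(";").toList.map(_.split("=", 2)).collect {
      case Array(key, value) => key.trim -> value.trim
    }.toMap
    (fields.get("to"), fields.get("body")) match {
      case (Some(to), Some(body)) => Right(Form(to, body))
      case _                      => Left(List(s"could not decode form from: $raw"))
    }
  }

  def validate(form: Form): Either[List[String], Message] =
    if (form.to.contains("@")) Right(Message(form.to, form.body))
    else Left(List(s"${form.to} is not a valid email address"))

  // Only the happy path is written out; the first Left short-circuits the rest.
  def messageFromRaw(raw: String): Either[List[String], Message] =
    for {
      form    <- decode(raw)
      message <- validate(form)
    } yield message

  // Errors are handled once, at the edge of the program.
  def describe(raw: String): String =
    messageFromRaw(raw).fold(
      errors  => "rejected: " + errors.mkString(", "),
      message => s"ready to send to ${message.to}"
    )
}
```

Neither decode nor validate knows how failures will eventually be reported; that decision lives in one place, describe.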

At this point, we have no side-effecting functions. This may seem like a lot of boilerplate, but we’ve gained a lot from it, and the benefits only increase as our program grows larger.

To see the benefits of this approach, consider an alternative program where functions don’t return meaningful values and are only called for their side effects.

def sendEmail(webForm: String): Unit = {
    val form = deserialize[WebForm](parse(webForm))
    val email = Email(form.from, form.to, form.body, form.recipient)
    EmailService.sendEmail(email) //sends the email
    logEvent(email) //writes record to the DB
}

This monolithic function mixes the data transformations with the side effect of sending an email. The Unit return type is completely opaque. It’s not a meaningful value that we can reason about. By returning Unit, we’re saying “Nothing to look at here..move along”. By definition the only way to test our program is to test the entire thing at once. Since Unit has no meaning, we can only test that our program does what we want by inspecting that the side effects seem to meet our requirements.

This kind of program becomes impossible to maintain, especially as it grows larger and we add more functionality. Reasoning about this program requires an ever-increasing amount of mental energy. With no meaningful types anywhere, we can’t add functionality without keeping everything in mind at once. This function also ignores the real possibility of errors occurring when deserializing to our WebForm type and leaves out any validations on our Email.

Contrast this to the program we’ve written where we have clearly defined types and transformations that return clear values which indicate the possibility of failure. We can now write tests using something like ScalaCheck to confirm that malformed email addresses, for example, are rejected and return failures. Furthermore, the scope of our tests is greatly diminished. Instead of writing tests that exhaustively check that the side effects of our program occur as we expect (since we can’t inspect anything directly), we write small functions that return meaningful values for which we can write strong tests that confirm the correctness of our code directly. With the monolithic and side-effecting approach, if the tests don’t pass, we’re not necessarily sure why. We’re left to fish through our code, trying to find the bug that caused the problem.
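
As a rough idea of what such a test looks like, here is a hand-rolled property check. ScalaCheck would generate and shrink the inputs for us; to keep the sketch dependency-free I’m using scala.util.Random with a fixed seed instead. validateAddress, genWithoutAt, and malformedAreRejected are hypothetical names, and the validator returns a plain Either rather than a scalaz Validation.

```scala
import scala.util.Random

object PropertyDemo {
  // An email pattern along the lines of the one used by the service, dots escaped.
  private val emailRegex =
    "^[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})$".r

  // Pure validator: same input, same answer, and nothing else happens.
  def validateAddress(address: String): Either[String, String] =
    if (emailRegex.findFirstIn(address).isDefined) Right(address)
    else Left(s"$address is not a valid email address")

  // Hand-rolled generator: random strings (possibly empty) that never contain '@'.
  def genWithoutAt(rnd: Random): String = {
    val alphabet = "abcXYZ019._-+ "
    val length   = rnd.nextInt(20)
    List.fill(length)(alphabet(rnd.nextInt(alphabet.length))).mkString
  }

  // Property: every string without an '@' must be rejected.
  def malformedAreRejected(samples: Int, seed: Long): Boolean = {
    val rnd = new Random(seed)
    (1 to samples).forall(_ => validateAddress(genWithoutAt(rnd)).isLeft)
  }
}
```

Because validateAddress is pure, the property needs no mocks, no mail server, and no database; we call the function and look at the value it returns.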

Even with all the problems in my monolithic sendEmail example, I’m still using more clearly defined types than a lot of programs I see. For example, a lot of people seem to use hashes as their data type for everything. Someone showed me an example where they deleted two entries in some hash where they had intended to remove only one. Their tests blew up and eventually they tracked down the unintended deletion that caused the issue. The problem here is that a hash is the wrong type for almost anything in your program’s domain. All we know about a hash is that it contains keys and values. That’s it. There’s no information there and no type (beyond simply the most general notion of a structure that contains keys/values). By designing a very specific model, encoding it into strong types, and using a compiler, you can catch errors before things blow up at runtime.
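
A tiny sketch of the contrast, using Scala’s Map as the “hash”. The names here (HashVsTypeDemo, Addresses, and the sample values) are invented for illustration and have nothing to do with the example I was shown.

```scala
object HashVsTypeDemo {
  // "Hash" style: every field is just a string key, so a typo only shows up at runtime.
  val asHash: Map[String, String] = Map(
    "from" -> "jason@gmail.com",
    "to"   -> "levi@gmail.com"
  )
  val typoLookup: Option[String] = asHash.get("form") // compiles fine, silently None

  // Deleting entries is equally unchecked: nothing says which keys must remain.
  val afterDelete: Map[String, String] = asHash - "from" - "to"

  // Typed style: the fields are part of the type.
  case class Addresses(from: String, to: String)
  val typed: Addresses = Addresses("jason@gmail.com", "levi@gmail.com")
  // typed.form  // does not compile: value form is not a member of Addresses
  val sender: String = typed.from
}
```

With the Map, both the typo and the over-eager deletion type-check; with the case class, there is simply no way to ask for a field that doesn’t exist or to “remove” one.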

I’ll either edit this post or add a part two that completes the picture, illustrating how to perform the effect of sending an email, how to do type-safe database queries and insertions, etc.