Cancer Risk Prediction in Rust

I have had the great pleasure of rewriting and updating some of our risk models in Rust. We moved these risk calculations from our Rails applications to a standalone JSON API.

What is Rust?

As the Rust team describes the language, “Rust is a systems programming language that runs blazingly fast, prevents segfaults, and guarantees thread safety.”1

Rust is a compiled, statically typed, non-garbage-collected language. It is expressive, providing facilities familiar to users of higher-level and functional languages. It has aspects that will be familiar to developers who have worked with Ruby, Swift, Haskell, and C-like languages.

Although Rust offers many features available in high-level languages, it embraces the idea of “zero-cost abstractions.” Borrowed from C++, the principle is: “‘What you don’t use, you don’t pay for. And further: What you do use, you couldn’t hand code any better.’”2 For example, Rust lets the developer chain iterator adaptors (the name for transformations on collections such as map, filter, and fold),3 and the compiler can often optimize away the overhead.4
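As a small illustration of the idea (not a benchmark), the two functions below compute the same value: one chains iterator adaptors, and the other is a hand-written loop. The function names are hypothetical. Whether the optimizer actually emits identical machine code for both is not something this sketch checks; it only confirms that the results agree.

```rust
/// Sum of the squares of the even numbers in `values`, written with chained
/// iterator adaptors (`filter`, `map`) and a consumer (`sum`).
fn sum_even_squares_iter(values: &[u32]) -> u32 {
    values
        .iter()
        .filter(|&&v| v % 2 == 0) // `filter` receives `&&u32`, so destructure twice
        .map(|&v| v * v)
        .sum()
}

/// The same computation written as an explicit loop.
fn sum_even_squares_loop(values: &[u32]) -> u32 {
    let mut total = 0;
    for &v in values {
        if v % 2 == 0 {
            total += v * v;
        }
    }
    total
}

fn main() {
    let values = [1, 2, 3, 4, 5, 6];
    let chained = sum_even_squares_iter(&values);
    let looped = sum_even_squares_loop(&values);
    assert_eq!(chained, looped);
    println!("{}", chained); // 4 + 16 + 36 = 56
}
```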

Rust also enforces, through its strict compiler, a set of rules that allow it to guarantee memory safety despite the absence of a garbage collector. Put very briefly, each value has exactly one owner. Other variables may borrow from the owner, but the compiler must be able to determine that no borrow outlives the owner. Variables are immutable by default, and at any given time a value may have either one mutable reference or any number of immutable references, but not both.
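Here is a minimal sketch of those rules; the helper functions are hypothetical. The lines the compiler would reject are left commented out, with the error code rustc reports for each.

```rust
/// Immutable borrow: we can read `names`, but not modify or move it.
fn total_length(names: &[String]) -> usize {
    names.iter().map(|name| name.len()).sum()
}

/// Mutable borrow: we may modify the vector through this reference.
fn add_name(names: &mut Vec<String>, name: &str) {
    names.push(name.to_string());
}

fn main() {
    let names = vec![String::from("Ada"), String::from("Grace")];

    // Ownership moves from `names` to `owner`; `names` can no longer be used.
    let mut owner = names;
    // println!("{:?}", names); // rejected: `names` was moved (E0382)

    // Any number of immutable borrows may coexist.
    let first = &owner[0];
    let len = total_length(&owner);
    println!("first name: {}, total length: {}", first, len);

    // A mutable borrow is allowed once the immutable borrows are no longer used.
    add_name(&mut owner, "Barbara");

    // Rejected (E0502): a mutable borrow while an immutable one is still in use.
    // let first = &owner[0];
    // add_name(&mut owner, "Edsger");
    // println!("{}", first);

    assert_eq!(owner.len(), 3);
}
```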

Rust has a somewhat steep learning curve, and the borrowing rules described above are a big reason why. But those rules address problems that exist in any non-garbage-collected language. A C/C++ developer must keep track of the same issues Rust’s borrow checker is designed to catch, or she is likely to encounter memory-management bugs such as use-after-free errors.5 That is, Rust’s borrow checker isn’t gratuitous: it protects against genuine errors.

Why did we choose Rust?

We chose Rust for several reasons. The advantage I’m going to discuss today relates to null guarding. Our risk models draw on data from numerous Rails models in our application. Many of the associations between those models can be null, while some cannot. Collections can be null, empty, or contain data. We wanted to be forced to think about each possibility as we built out our risk service, so that we could avoid falling victim to the billion-dollar mistake of null references, which in Ruby shows up as NoMethodError: undefined method 'age' for nil:NilClass.
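In Rust, those three states of a collection can be spelled out in the types. The sketch below is illustrative only: CancerHistory and summarize are hypothetical names, and a list of diagnosis ages stands in for the real collection of cancers.

```rust
/// Hypothetical summary of a nullable collection's three possible states.
#[derive(Debug, PartialEq)]
enum CancerHistory {
    Unknown,         // the field was null: we don't know
    NoneReported,    // the field was an empty list
    Reported(usize), // the list contained this many diagnoses
}

fn summarize(ages_of_diagnosis: Option<&Vec<u8>>) -> CancerHistory {
    // `match` must cover every case, or the program will not compile.
    match ages_of_diagnosis {
        None => CancerHistory::Unknown,
        Some(ages) if ages.is_empty() => CancerHistory::NoneReported,
        Some(ages) => CancerHistory::Reported(ages.len()),
    }
}

fn main() {
    let missing: Option<Vec<u8>> = None;
    let empty: Option<Vec<u8>> = Some(vec![]);
    let present: Option<Vec<u8>> = Some(vec![49, 51]);

    // `as_ref()` borrows the contents: `&Option<Vec<u8>>` becomes `Option<&Vec<u8>>`.
    assert_eq!(summarize(missing.as_ref()), CancerHistory::Unknown);
    assert_eq!(summarize(empty.as_ref()), CancerHistory::NoneReported);
    assert_eq!(summarize(present.as_ref()), CancerHistory::Reported(2));
}
```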

To simplify a bit, consider the following Ruby code:

# Simulate a JSON request parsed into a Ruby hash
#
# @return [Hash{String => Hash}]
def request_body
  { 'abc_123' => {
    date_of_birth: nil,
    cancers: [
      { cancer_type: :breast,
        age_of_diagnosis: nil },
      { cancer_type: :colorectal,
        age_of_diagnosis: 51 }
    ]
  } }
end

# Determine whether any family member has breast cancer under 50
#
# @param data [Hash{String => Hash}] family members keyed by ID
# @return [Boolean]
def family_has_breast_cancer_under_50(data)
  data.values.any? do |family_member|
    family_member[:cancers].any? do |cancer|
      cancer[:cancer_type] == :breast && cancer[:age_of_diagnosis] < 50
    end
  end
end

puts family_has_breast_cancer_under_50(request_body)

We’re hoping for false, but instead we get:

undefined method `<' for nil:NilClass (NoMethodError)

This error message is not as clear as we might want: it tells us that we called < on nil, but not which field was missing. The problem is that we forgot to guard against the possibility that age_of_diagnosis is nil. We should have done this:6

def family_has_breast_cancer_under_50(data)
  data.values.any? do |family_member|
    family_member[:cancers].any? do |cancer|
      # `&.` returns nil instead of raising when age_of_diagnosis is nil
      cancer[:cancer_type] == :breast && (cancer[:age_of_diagnosis]&.< 50)
    end
  end
end

What if we could get a compile-time error if we forget to handle situations in which we anticipate a null, and a clearer runtime error when the JSON fails to meet our expectations? That’s what we get with Rust’s static typing and its Option type.

The Option type is defined as

pub enum Option<T> {
    None,
    Some(T),
}

As we will see, the Option type requires the developer to handle both cases: the underlying type wrapped in a Some, or a None.
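A minimal sketch of what "requires the developer to handle both cases" means in practice: a match on an Option must have an arm for Some and an arm for None. The describe_age function is hypothetical and not part of our service.

```rust
/// Hypothetical helper: describe an optional age of diagnosis.
fn describe_age(age_of_diagnosis: Option<u8>) -> String {
    match age_of_diagnosis {
        Some(age) => format!("diagnosed at {}", age),
        None => String::from("age of diagnosis unknown"),
        // Deleting either arm is a compile error (E0004: non-exhaustive patterns).
    }
}

fn main() {
    println!("{}", describe_age(Some(49)));
    println!("{}", describe_age(None));
}
```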

If we know that age_of_diagnosis cannot be null, we can write this code:

#![allow(dead_code)]

use std::collections::HashMap;

// We can derive implementations of traits, like `PartialEq`, for types we define. Traits are
// similar to Haskell's type classes, and `#[derive]` is similar to Haskell's `deriving`.
#[derive(PartialEq)]
enum CancerType {
    Breast,
    Colorectal,
    Ovarian,
    Endometrial,
    Gastric,
    Pancreatic,
}

struct Cancer {
    cancer_type: CancerType,
    age_of_diagnosis: u8, // an unsigned 8-bit integer
}

struct FamilyMember {
    date_of_birth: String,
    cancers: Vec<Cancer>,
}

// This is a tuple struct implementing the `newtype` pattern.
struct Family(HashMap<String, FamilyMember>);

// An `impl` block adds methods to a `struct` or `enum`.
impl Family {
    // `&self` means the method borrows the `Family` by reference. Unlike a C++ reference, a Rust
    // reference is immutable unless declared `&mut`; the receiver would be `&mut self` if the
    // method needed to mutate the `Family`.
    pub fn has_breast_cancer_under_50(&self) -> bool {
        // The single field of a tuple struct is still accessed by index, as `self.0`.
        self.0.values().any(|family_member| {
            // Unlike Ruby, we cannot call iterator methods such as `any()` directly on a
            // collection; we must first get an iterator. `iter()` borrows the elements, while
            // `into_iter()` on an owned collection consumes it.
            family_member.cancers.iter().any(|cancer| {
                cancer.cancer_type == CancerType::Breast && cancer.age_of_diagnosis < 50
            })
        })
    }
}

fn main() {
    let mut family_hash = HashMap::with_capacity(1);

    // Rust has two different kinds of strings: `String` and `&str`. Their difference is beyond the
    // scope of this post.
    family_hash.insert(String::from("abc-123"), get_family_member());

    let family = Family(family_hash); // Constructor for `Family`.
    assert!(family.has_breast_cancer_under_50());
}

// =================================================================================================
// Helper functions
// =================================================================================================

fn get_family_member() -> FamilyMember {
    FamilyMember {
        date_of_birth: String::from("1938-01-10"),
        cancers: vec![
            Cancer {
                cancer_type: CancerType::Breast,
                age_of_diagnosis: 49,
            },
            Cancer {
                cancer_type: CancerType::Colorectal,
                age_of_diagnosis: 51,
            },
        ],
    }
}

But if age_of_diagnosis can potentially be null, we could try this:

Cancer {
    cancer_type: CancerType::Breast,
    age_of_diagnosis: None,
}

But then we get a type error (here we’re trying to use None, which isn’t a u8):

   |
68 |                 age_of_diagnosis: None,
   |                                   ^^^^ expected u8, found enum `std::option::Option`
   |
   = note: expected type `u8`
              found type `std::option::Option<_>`

What if we update our Cancer struct to allow Option types like so?

struct Cancer {
    cancer_type: CancerType,
    age_of_diagnosis: Option<u8>,
}

The struct now accepts None, but the comparison in has_breast_cancer_under_50 no longer type-checks:

error[E0308]: mismatched types
  --> src/main.rs:32:43
   |
32 |                 cancer.age_of_diagnosis < 50
   |                                           ^^
   |                                           |
   |                                           expected enum `std::option::Option`, found integral variable
   |                                           help: try using a variant of the expected type: `Some(50)`
   |
   = note: expected type `std::option::Option<u8>`
              found type `{integer}`

Now we see that we’re trying to compare age_of_diagnosis, which is an integer wrapped in an Option, directly with an integer. The compiler is saying, “Hey, you told me that this could be undefined, but here you’re acting as though it will always be defined. Please handle the None case or I’m not going to compile.” This is precisely what Ruby doesn’t do, and why we sometimes don’t notice null errors until we encounter them at runtime (ideally in our unit tests, but not always).

So:

impl Family {
    pub fn has_breast_cancer_under_50(&self) -> bool {
        self.0.values().any(|family_member| {
            family_member.cancers.iter().any(|cancer| {
                cancer.cancer_type == CancerType::Breast &&
                // `map_or()` applies the second argument, a lambda function (or `closure` in Rust
                // parlance), to the value contained in the `Some(_)`, or returns the first argument
                // if the value is a `None`.
                cancer.age_of_diagnosis.map_or(false, |age| age < 50)
            })
        })
    }
}
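map_or is not the only way to write this check. As a sketch, the hypothetical functions below compare it with the matches! macro and Option::is_some_and (stabilized in Rust 1.70). All three treat None as "not under 50", which is the design choice the snippet above makes.

```rust
/// `map_or`: apply the closure to `Some(age)`, or return `false` for `None`.
fn under_50_map_or(age: Option<u8>) -> bool {
    age.map_or(false, |a| a < 50)
}

/// `matches!`: a pattern with a guard; `None` does not match the pattern.
fn under_50_matches(age: Option<u8>) -> bool {
    matches!(age, Some(a) if a < 50)
}

/// `is_some_and` (Rust 1.70+): `true` only for `Some` values passing the predicate.
fn under_50_is_some_and(age: Option<u8>) -> bool {
    age.is_some_and(|a| a < 50)
}

fn main() {
    for age in [None, Some(49), Some(50), Some(51)] {
        let expected = under_50_map_or(age);
        assert_eq!(under_50_matches(age), expected);
        assert_eq!(under_50_is_some_and(age), expected);
        println!("{:?} -> {}", age, expected);
    }
}
```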

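The more deeply we nest Option types, the more complex the code becomes. Suppose, for example, that each FamilyMember has an optional medical_history of type Option<MedicalHistory>, and that the cancers field on MedicalHistory can also be null. Then we must write: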

impl Family {
    pub fn has_breast_cancer_under_50(&self) -> bool {
        self.0.values().any(|family_member| {
            family_member
                .medical_history
                // `as_ref()` turns `&Option<MedicalHistory>` into `Option<&MedicalHistory>`, so we
                // borrow the history instead of trying to move it out of `family_member`.
                .as_ref()
                // `and_then()` is called `flat_map()` in some languages
                .and_then(|medical_history| medical_history.cancers.as_ref())
                .map_or(false, |cancers| {
                    cancers.iter().any(|cancer| {
                        cancer.cancer_type == CancerType::Breast
                            && cancer.age_of_diagnosis.map_or(false, |age| age < 50)
                    })
                })
        })
    }
}
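That snippet omits the struct definitions. Below is a self-contained sketch in which MedicalHistory and the Option fields are assumptions inferred from the snippet, not our production types. It also adds a hypothetical helper, cancers_of, that flattens the same two levels with the ? operator instead of and_then.

```rust
use std::collections::HashMap;

// Hypothetical shapes inferred from the snippet above: the medical history
// and its list of cancers may each be missing.
#[derive(PartialEq)]
enum CancerType {
    Breast,
    Colorectal,
}

struct Cancer {
    cancer_type: CancerType,
    age_of_diagnosis: Option<u8>,
}

struct MedicalHistory {
    cancers: Option<Vec<Cancer>>,
}

struct FamilyMember {
    medical_history: Option<MedicalHistory>,
}

struct Family(HashMap<String, FamilyMember>);

impl Family {
    // Same logic as the snippet above: borrow, flatten, then default to `false`.
    pub fn has_breast_cancer_under_50(&self) -> bool {
        self.0.values().any(|family_member| {
            family_member
                .medical_history
                .as_ref()
                .and_then(|medical_history| medical_history.cancers.as_ref())
                .map_or(false, |cancers| {
                    cancers.iter().any(|cancer| {
                        cancer.cancer_type == CancerType::Breast
                            && cancer.age_of_diagnosis.map_or(false, |age| age < 50)
                    })
                })
        })
    }
}

// Hypothetical helper: inside a function that returns `Option`, `?` returns
// `None` early if the medical history is missing.
fn cancers_of(member: &FamilyMember) -> Option<&Vec<Cancer>> {
    member.medical_history.as_ref()?.cancers.as_ref()
}

fn main() {
    let mut members = HashMap::new();
    // No medical history at all.
    members.insert(String::from("a"), FamilyMember { medical_history: None });
    // A medical history whose list of cancers is missing.
    members.insert(
        String::from("b"),
        FamilyMember { medical_history: Some(MedicalHistory { cancers: None }) },
    );
    // Breast cancer at an unknown age, and colorectal cancer at 51.
    members.insert(
        String::from("c"),
        FamilyMember {
            medical_history: Some(MedicalHistory {
                cancers: Some(vec![
                    Cancer { cancer_type: CancerType::Breast, age_of_diagnosis: None },
                    Cancer { cancer_type: CancerType::Colorectal, age_of_diagnosis: Some(51) },
                ]),
            }),
        },
    );

    let family = Family(members);
    assert!(cancers_of(&family.0["a"]).is_none());
    assert!(cancers_of(&family.0["b"]).is_none());
    assert_eq!(cancers_of(&family.0["c"]).map(|cancers| cancers.len()), Some(2));
    assert!(!family.has_breast_cancer_under_50());
}
```

Whether the ? version reads better than and_then is a matter of taste; both encode the same domain rule.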

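But again, this is domain complexity, not incidental complexity: the data really can be missing at each level, and the types make us decide what each absence means.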

❧ ❧ ❧

In a future post, I’ll describe an additional advantage of Rust: it’s super fast. I’ll also discuss the awesome Serde library, and how it helps enforce the schema for the JSON.


1 The Rust Programming Language, https://www.rust-lang.org/en-US/ (last visited Oct. 12, 2017).

2 Aaron Turon, Abstraction without overhead: Traits in Rust, The Rust Programming Language Blog (May 11, 2015), https://blog.rust-lang.org/2015/05/11/traits.html (quoting Bjarne Stroustrup) (citation omitted).

3 Technically, Rust uses the phrase iterator adaptor for a method that transforms one iterator into another (e.g. map, filter, zip, chain, etc.), and an iterator consumer for a method that produces a final value (e.g. fold, any, all, find, etc.). The Rust Programming Language § 13.2, https://doc.rust-lang.org/book/second-edition/ch13-02-iterators.html.

4 See Ruud van Asseldonk, Zero-cost abstractions (Nov. 30, 2016), https://ruudvanasseldonk.com/2016/11/30/zero-cost-abstractions.

5 See Nerijus Arlauskas, Short intro to C++ for Rust developers: Ownership and Borrowing (Jan. 22, 2017) (describing how C++ developers protect against the problems Rust’s borrow checker aims to solve), https://nercury.github.io/c++/intro/2017/01/22/cpp-for-rust-devs.html.

6 Note the use of the safe navigation operator (&.). See Georgi Mitrev, The Safe Navigation Operator (&.) in Ruby (Nov. 13, 2015), http://mitrev.net/ruby/2015/11/13/the-operator-in-ruby/.