Cancer Risk Prediction in Rust
I have had the great pleasure of rewriting and updating some of our risk models in Rust. We moved these risk calculations from our Rails applications to a standalone JSON API.
What is Rust?
As the Rust team describes the language, “Rust is a systems programming language that runs blazingly fast, prevents segfaults, and guarantees thread safety.”1
Rust is a compiled, statically typed, non-garbage-collected language. It is expressive, providing facilities familiar to users of higher-level and functional languages. It has aspects that will be familiar to developers who have worked with Ruby, Swift, Haskell, and C-like languages.
Although Rust offers many features available in high-level languages, it embraces the idea of “zero-cost abstractions.” Taken from C++, the notion is, “‘What you don’t use, you don’t pay for. And further: What you do use, you couldn’t hand code any better.’”2 Thus, for example, Rust allows the developer to chain iterator adaptors (the name for transformations to collections like map, filter, and fold),3 and the compiler can often optimize away the overhead.4
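As a rough illustration of such a chain, the sketch below sums the squares of the even numbers in a slice. The function name sum_of_even_squares and the sample data are invented for this example, and whether the generated code matches a hand-written loop depends on the optimization level and the particular chain.

```rust
/// Sums the squares of the even values in `values`.
/// (Hypothetical helper, written only to illustrate the methods named above.)
fn sum_of_even_squares(values: &[u32]) -> u32 {
    values
        .iter()
        // `filter` is an adaptor: it yields only the even values.
        .filter(|&&n| n % 2 == 0)
        // `map` is an adaptor: it squares each remaining value.
        .map(|&n| n * n)
        // `fold` consumes the iterator and produces the final sum.
        .fold(0, |acc, n| acc + n)
}

fn main() {
    let values = [1, 2, 3, 4, 5, 6];
    // 2*2 + 4*4 + 6*6 = 56
    println!("{}", sum_of_even_squares(&values));
}
```

Nothing is computed until fold starts pulling values through the chain, which is part of what gives the compiler room to fuse the steps into a single loop.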
Rust also enforces a set of rules through its strict compiler that allow it to guarantee memory safety despite the absence of a garbage collector. Put very briefly, each value has exactly one owning variable. Other variables may borrow from the owner, but the compiler must be able to determine that no borrow outlives the owner. Variables are immutable by default, and a value cannot have more than one mutable reference at a time, nor a mutable reference alongside immutable ones.
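A minimal sketch of those rules in action follows. The function names total and add_age and the sample ages are made up for illustration; the comments mark where the compiler would object if the rules were broken.

```rust
/// Borrows the data immutably; the caller keeps ownership.
fn total(ages: &[u8]) -> u32 {
    ages.iter().map(|&age| u32::from(age)).sum()
}

/// Borrows the vector mutably; only one such borrow may be live at a time.
fn add_age(ages: &mut Vec<u8>, age: u8) {
    ages.push(age);
}

fn main() {
    // Bindings are immutable by default; `mut` opts in to mutation.
    let mut ages = vec![49, 51];

    // An immutable borrow: `ages` still owns the data afterwards.
    let before = total(&ages);

    // A mutable borrow: allowed here because no other borrow is still in use.
    add_age(&mut ages, 62);

    // Ownership moves to `moved`; using `ages` after this line would not compile.
    let moved = ages;
    println!("{} {}", before, total(&moved));
}
```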
Rust has a somewhat steep learning curve, and a big reason for that is the borrowing rules described above. But those rules address problems that exist in any non-garbage-collected language. A C/C++ developer must keep track of the issues Rust’s borrow checker is designed to protect against, or she is likely to encounter memory-management problems.5 That is, Rust’s borrow checker isn’t gratuitous: it protects against genuine errors.
Why did we choose Rust?
We chose Rust for several reasons. The advantage I’m going to discuss today relates to null guarding. Our risk models use data from numerous models within our application. Many of the associations between models can be null, while some cannot. Collections can be null, empty, or contain data. We wanted to be forced to think about each possibility as we built out our risk service, so that we could avoid falling victim to the billion-dollar mistake, NoMethodError: undefined method 'age' for nil:NilClass.
To simplify a bit, consider the following Ruby code:
# Simulate a JSON request parsed into a Ruby hash
#
# @return [Hash{String => Hash}]
def request_body
  { 'abc_123' => {
    date_of_birth: nil,
    cancers: [
      { cancer_type: :breast,
        age_of_diagnosis: nil },
      { cancer_type: :colorectal,
        age_of_diagnosis: 51 }
    ]
  } }
end

# Determine whether any family member has breast cancer under 50
#
# @return [Boolean]
def family_has_breast_cancer_under_50(data)
  data.values.any? do |family_member|
    family_member[:cancers].any? do |cancer|
      cancer[:cancer_type] == :breast && cancer[:age_of_diagnosis] < 50
    end
  end
end

puts family_has_breast_cancer_under_50(request_body)
We’re hoping for a false, but we actually get
undefined method `<' for nil:NilClass (NoMethodError)
This error message is not as clear as we might want. The problem is that we forgot to guard against the possibility that age_of_diagnosis is nil. We should have done this:6
def family_has_breast_cancer_under_50(data)
  data.values.any? do |family_member|
    family_member[:cancers].any? do |cancer|
      cancer[:cancer_type] == :breast && (cancer[:age_of_diagnosis]&.< 50)
    end
  end
end
What if we could get a compile-time error if we forget to handle situations in which we anticipate a null, and a clearer runtime error when the JSON fails to meet our expectations? That’s what we get with Rust’s static typing and its Option type.
The Option type is defined as
pub enum Option<T> {
    None,
    Some(T),
}
As we will see, the Option type requires the developer to handle both cases: the underlying type wrapped in a Some, or a None.
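The simplest way to see that requirement is a match expression. In the sketch below, describe_age is a hypothetical helper that is not part of our service; the point is only that the compiler rejects the match if either arm is missing.

```rust
/// Describes an optional age of diagnosis.
/// (Hypothetical helper for illustration only.)
fn describe_age(age_of_diagnosis: Option<u8>) -> String {
    // `match` must cover every variant; deleting either arm is a compile error.
    match age_of_diagnosis {
        Some(age) => format!("diagnosed at {}", age),
        None => String::from("age of diagnosis unknown"),
    }
}

fn main() {
    println!("{}", describe_age(Some(49)));
    println!("{}", describe_age(None));
}
```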
If we know that age_of_diagnosis cannot be null, we can write this code:
#![allow(dead_code)]

use std::collections::HashMap;

// We can derive `trait`s, like `PartialEq`, to add functionality to types we define. These are
// similar to Haskell's type classes.
#[derive(PartialEq)]
enum CancerType {
    Breast,
    Colorectal,
    Ovarian,
    Endometrial,
    Gastric,
    Pancreatic,
}

struct Cancer {
    cancer_type: CancerType,
    age_of_diagnosis: u8, // an unsigned 8-bit integer
}

struct FamilyMember {
    date_of_birth: String,
    cancers: Vec<Cancer>,
}

// This is a tuple struct implementing the `newtype` pattern.
struct Family(HashMap<String, FamilyMember>);

// An `impl` block adds methods to a `struct` or `enum`.
impl Family {
    // The `&` means pass-by-reference. Unlike C/C++, pass-by-reference is by default immutable.
    // The argument would be `&mut self` if the method were allowed to mutate its argument.
    pub fn has_breast_cancer_under_50(&self) -> bool {
        // A single-element tuple must still have its only field retrieved by index.
        self.0.values().any(|family_member| {
            // Unlike Ruby and similar languages, we cannot call an iterator method directly
            // on a collection; we must first call `iter()` or `into_iter()` (which differ based on
            // whether they consume the underlying collection).
            family_member.cancers.iter().any(|cancer| {
                cancer.cancer_type == CancerType::Breast && cancer.age_of_diagnosis < 50
            })
        })
    }
}

fn main() {
    let mut family_hash = HashMap::with_capacity(1);

    // Rust has two different kinds of strings: `String` and `&str`. Their difference is beyond the
    // scope of this post.
    family_hash.insert(String::from("abc-123"), get_family_member());

    let family = Family(family_hash); // Constructor for `Family`.
    assert!(family.has_breast_cancer_under_50());
}

// =================================================================================================
// Helper functions
// =================================================================================================

fn get_family_member() -> FamilyMember {
    FamilyMember {
        date_of_birth: String::from("1938-01-10"),
        cancers: vec![
            Cancer {
                cancer_type: CancerType::Breast,
                age_of_diagnosis: 49,
            },
            Cancer {
                cancer_type: CancerType::Colorectal,
                age_of_diagnosis: 51,
            },
        ],
    }
}
But if age_of_diagnosis can potentially be null, we could try this:
Cancer {
    cancer_type: CancerType::Breast,
    age_of_diagnosis: None,
}
But then we get a type error (here we’re trying to use None, which isn’t a u8):
   |
68 |                 age_of_diagnosis: None,
   |                                   ^^^^ expected u8, found enum `std::option::Option`
   |
   = note: expected type `u8`
              found type `std::option::Option<_>`
What if we update our Cancer struct to allow Option types like so?
struct Cancer {
    cancer_type: CancerType,
    age_of_diagnosis: Option<u8>,
}
That fixes the struct literal, but now the comparison in has_breast_cancer_under_50 fails to compile:
error[E0308]: mismatched types
  --> src/main.rs:32:43
   |
32 |                 cancer.age_of_diagnosis < 50
   |                                           ^^
   |                                           |
   |                                           expected enum `std::option::Option`, found integral variable
   |                                           help: try using a variant of the expected type: `Some(50)`
   |
   = note: expected type `std::option::Option<u8>`
              found type `{integer}`
Now we see that we’re trying to compare age_of_diagnosis, which is an integer wrapped in an Option, directly with an integer. The compiler is saying, “Hey, you told me that this could be undefined, but here you’re acting as though it will always be defined. Please handle the None case or I’m not going to compile.” This is precisely what Ruby doesn’t do, and why we sometimes don’t notice null errors until we encounter them at runtime (ideally through unit testing, but maybe not).
So:
impl Family {
    pub fn has_breast_cancer_under_50(&self) -> bool {
        self.0.values().any(|family_member| {
            family_member.cancers.iter().any(|cancer| {
                cancer.cancer_type == CancerType::Breast &&
                    // `map_or()` applies the second argument, a lambda function (or `closure` in Rust
                    // parlance) to the value contained in the `Some(_)`, or returns the first argument
                    // if the value is a `None`.
                    cancer.age_of_diagnosis.map_or(false, |age| age < 50)
            })
        })
    }
}
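For readers who find map_or() terse, the same check can be spelled out with a match and a guard. This is only a side-by-side sketch; the two function names are invented here, and both versions treat an unknown age as "not under 50".

```rust
/// The `map_or` form used above.
fn under_50_map_or(age: Option<u8>) -> bool {
    age.map_or(false, |age| age < 50)
}

/// The same logic with an explicit `match` and a guard.
fn under_50_match(age: Option<u8>) -> bool {
    match age {
        Some(age) if age < 50 => true,
        // Covers both `None` and `Some(age)` with `age >= 50`.
        _ => false,
    }
}

fn main() {
    // `Option<u8>` is `Copy`, so each value can be passed to both functions.
    for &age in [Some(49), Some(50), None].iter() {
        assert_eq!(under_50_map_or(age), under_50_match(age));
    }
    println!("both forms agree");
}
```

Returning false for None is a modeling decision rather than something the compiler dictates. The compiler only insists that the None case be handled somewhere.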
The more deeply we nest Option types, the more complex the code becomes. If, for example, a family member’s medical_history can be null, and the cancers field on MedicalHistory can also be null, we must do this:
impl Family {
    pub fn has_breast_cancer_under_50(&self) -> bool {
        self.0.values().any(|family_member| {
            family_member
                .medical_history
                // `as_ref()` turns `&Option<T>` into `Option<&T>`, so we borrow the contents
                // instead of moving them, which avoids borrow-checker problems.
                .as_ref()
                // `and_then()` is called `flat_map()` in some languages
                .and_then(|medical_history| medical_history.cancers.as_ref())
                .map_or(false, |cancers| {
                    cancers.iter().any(|cancer| {
                        cancer.cancer_type == CancerType::Breast
                            && cancer.age_of_diagnosis.map_or(false, |age| age < 50)
                    })
                })
        })
    }
}
But this is domain complexity, not incidental complexity: every branch corresponds to a case the data can actually present.
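To make the nested version concrete, here is a self-contained sketch that compiles on its own. The shapes of MedicalHistory and FamilyMember are assumptions, since the post does not show them, and family_of and history are hypothetical helpers added only to exercise each null case.

```rust
use std::collections::HashMap;

#[derive(PartialEq)]
enum CancerType {
    Breast,
    Colorectal,
}

struct Cancer {
    cancer_type: CancerType,
    age_of_diagnosis: Option<u8>,
}

// Assumed shape: the list of cancers may be null.
struct MedicalHistory {
    cancers: Option<Vec<Cancer>>,
}

// Assumed shape: the medical history itself may be null.
struct FamilyMember {
    medical_history: Option<MedicalHistory>,
}

struct Family(HashMap<String, FamilyMember>);

impl Family {
    fn has_breast_cancer_under_50(&self) -> bool {
        self.0.values().any(|family_member| {
            family_member
                .medical_history
                .as_ref()
                .and_then(|medical_history| medical_history.cancers.as_ref())
                .map_or(false, |cancers| {
                    cancers.iter().any(|cancer| {
                        cancer.cancer_type == CancerType::Breast
                            && cancer.age_of_diagnosis.map_or(false, |age| age < 50)
                    })
                })
        })
    }
}

/// Builds a one-member family (hypothetical helper).
fn family_of(medical_history: Option<MedicalHistory>) -> Family {
    let mut members = HashMap::new();
    members.insert(String::from("abc-123"), FamilyMember { medical_history });
    Family(members)
}

/// Builds a history containing a single cancer (hypothetical helper).
fn history(cancer_type: CancerType, age_of_diagnosis: Option<u8>) -> Option<MedicalHistory> {
    Some(MedicalHistory {
        cancers: Some(vec![Cancer { cancer_type, age_of_diagnosis }]),
    })
}

fn main() {
    // No medical history at all.
    assert!(!family_of(None).has_breast_cancer_under_50());
    // A history whose list of cancers is null.
    assert!(!family_of(Some(MedicalHistory { cancers: None })).has_breast_cancer_under_50());
    // Breast cancer with an unknown age of diagnosis.
    assert!(!family_of(history(CancerType::Breast, None)).has_breast_cancer_under_50());
    // A different cancer under 50.
    assert!(!family_of(history(CancerType::Colorectal, Some(45))).has_breast_cancer_under_50());
    // Breast cancer at 49: the only case that should return true.
    assert!(family_of(history(CancerType::Breast, Some(49))).has_breast_cancer_under_50());
    println!("all cases handled");
}
```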
❧ ❧ ❧
In a future post, I’ll describe an additional advantage of Rust: it’s super fast. I’ll also discuss the awesome Serde library, and how it helps enforce the schema for the JSON.
1 The Rust Programming Language, https://www.rust-lang.org/en-US/ (last visited Oct. 12, 2017).
2 Aaron Turon, Abstraction without overhead: Traits in Rust, The Rust Programming Language Blog (May 5, 2015) https://blog.rust-lang.org/2015/05/11/traits.html (quoting Bjarne Stroustrup) (citation omitted).
3 Technically, Rust uses the phrase iterator adaptor for a method that transforms one iterator into another (e.g. map, filter, zip, chain, etc.), and an iterator consumer for a method that produces a final value (e.g. fold, any, all, find, etc.). The Rust Programming Language § 13.2, https://doc.rust-lang.org/book/second-edition/ch13-02-iterators.html.
4 See Ruud van Asseldonk, Zero-cost abstractions (Nov. 30, 2016), https://ruudvanasseldonk.com/2016/11/30/zero-cost-abstractions.
5 See Nerijus Arlauskas, Short intro to C++ for Rust developers: Ownership and Borrowing (Jan. 22, 2017) (describing how C++ developers protect against the problems Rust’s borrow checker aims to solve), https://nercury.github.io/c++/intro/2017/01/22/cpp-for-rust-devs.html.
6 Note the use of the safe navigation operator (&.). See Georgi Mitrev, The Safe Navigation Operator (&.) in Ruby (Nov. 13, 2015), http://mitrev.net/ruby/2015/11/13/the-operator-in-ruby/.