When Ferrous Metals Corrode, pt. I

Intro

Hello Rustlang. I have been circling around this for a while: there are lots of folks I respect who have nothing but nice things to say, and both efficiency and safety have come to the forefront more and more for me. I think the final straw was when I recently learned that there's a good chance the Linux kernel is going to get a second official language besides C. It's finally time to learn me some Rust programming.

I plan to write up some notes about my learning experience here once a week or so. The notes will mostly be me excerpting the Rust book, plus some experimentation of my own. Typically I learn stuff better if I have to write about it, and hopefully doing this in public will embarrass me into keeping at it.

First, Getting Books

Browsing around for learning resources, I found no shortage. There is the free "The Book" which looks decent, but then there's also "Rust In Action" and the O'Reilly book "Programming Rust". Quickly perusing them, RIA looks like a decent second book, while the O'Reilly book will be the one I'm going to start out with (I think the first chapter's heading, "Systems Programmers Can Have Nice Things", did it for me in the end!).

Tooling

There are packages for Rust in the Ubuntu repos, but they are somewhat downrev, and rustup seems to be the preferred option – I'm just tinkering around, so I'm going for rustup (which works well enough). JetBrains products usually work well for me, so I'm going to try out their Rust IDE plugin for CLion.

A Tour of Rust

Starting the O'Reilly book on the "A Tour Of Rust" chapter. Running cargo new hello and cargo run hello works as it should; inspecting the Cargo.toml reveals some metadata and empty deps. I like Cargo.toml, having metadata and deps management built in right from the start is awesome (just ask any Python dev).

The resulting binary is 3.6 MB – larger than I expected.

Moving on, the book introduces functions with a gcd() fn. Can I just print out the result of gcd()? I type this into the IDE and compile, but the compiler complains about a type mismatch. A bit disappointing that CLion didn't warn me about this tbh. Maybe because it's a macro – I don't know how macros actually work in Rust, but extrapolating from other langs I guess it would be harder to analyse. The cargo error message otoh is very helpful and recommends a formatting string: println!("gcd: {}", gcd(x, y)). There we go, that looks nice and familiar!
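For reference, here is a minimal sketch of what such a gcd() fn can look like, using Euclid's algorithm. This is my own reconstruction and not necessarily the book's exact listing; describe_gcd() is a hypothetical helper that only exists to show the format string.

```rust
// Euclid's algorithm on unsigned 64-bit integers (a sketch, not the book's listing).
fn gcd(mut n: u64, mut m: u64) -> u64 {
    // zero inputs are rejected up front
    assert!(n != 0 && m != 0);
    while m != 0 {
        if m < n {
            // keep the larger value in m
            std::mem::swap(&mut m, &mut n);
        }
        m %= n;
    }
    n
}

// format! uses the same format-string syntax as println!:
// the first argument is a string literal, the values fill the {} placeholders.
fn describe_gcd(x: u64, y: u64) -> String {
    format!("gcd: {}", gcd(x, y))
}
```

Swapping format! for println! prints the same text to stdout instead of returning it.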

Testing

I can write test functions right into the main source with a #[test] annotation, nice! It makes total sense to me to keep test functions close to the unit under test, but otoh even for medium-sized programs you will need scaffolding and test data etc., which would tend to clutter up your main code, so you'd want to split that out.
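A small sketch of how this tends to look in practice. The gcd() here is a compact recursive stand-in so the block compiles on its own, and the test names are made up. Wrapping the tests in a module marked #[cfg(test)] is the usual convention and keeps them out of normal builds.

```rust
// Compact stand-in for the function under test.
fn gcd(n: u64, m: u64) -> u64 {
    if m == 0 { n } else { gcd(m, n % m) }
}

// This module is only compiled when running `cargo test`.
#[cfg(test)]
mod tests {
    use super::gcd;

    #[test]
    fn gcd_of_coprime_numbers_is_one() {
        assert_eq!(gcd(14, 15), 1);
    }

    #[test]
    fn gcd_finds_shared_factors() {
        // the two products share only the factors 3 and 11
        assert_eq!(gcd(2 * 3 * 5 * 11 * 17, 3 * 7 * 11 * 13 * 19), 3 * 11);
    }
}
```

For the heavier scaffolding mentioned above, Cargo also picks up files in a tests/ directory next to src/ as integration tests, which is one way of splitting things out.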

Handling Command-Line Arguments

The chapter starts out by introducing the std::str::FromStr trait. A trait is a collection of methods that a type can implement, which reminds me of Go's interfaces. I don't much like the fact that you import the FromStr trait but it never gets mentioned in the following code – instead you use the from_str method that the trait declares, so you just have to know which methods a trait brings into scope.
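To make the import question a bit more concrete, here is a small sketch with two hypothetical helper fns. As far as I can tell, the use line is only needed when the trait method is called directly; str::parse goes through the same trait but is a regular method on str.

```rust
use std::num::ParseIntError;
use std::str::FromStr;

// Calling the trait method by its path needs the `use std::str::FromStr` above.
fn parse_explicit(s: &str) -> Result<u64, ParseIntError> {
    u64::from_str(s)
}

// str::parse calls FromStr internally, so this fn would compile
// even without importing the trait.
fn parse_via_str(s: &str) -> Result<u64, ParseIntError> {
    s.parse::<u64>()
}
```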

Example code follows, with some annotations

use std::str::FromStr;
use std::env;

fn main() {
    // have to declare mutable even to add elements to the list
    // the type of the elems, u64, is inferred by usage below
    let mut numbers = Vec::new();

    // for... over an iterator
    // iterators can be Vec, lines over a file, msgs from a comms channel...
    for arg in env::args().skip(1) {

        numbers.push(
            // from_str() returns a 'Result' value
            u64::from_str(&arg)
            // Result.expect() unwraps the Ok value, or panics with this message on Err
            .expect("error parsing argument"));
    }

    if numbers.len() == 0 {
        eprintln!("Usage: gcd NUMBER ...");
        std::process::exit(1);
    }

    let mut d = numbers[0];
    // & means borrow a ref
    for m in &numbers[1..] {
        // * means deref; clearly a mnemonic borrowed from C
        d = gcd(d, *m);
    }

    println!("The greatest common divisor of {:?} is {}",
             numbers, d);
}

Firstly, I think it's great that there is an explicit Result type. Go has some conventions around this, but this feels cleaner.

Secondly, while the mnemonics for & and * are clear, I'm not exactly sure what borrowing a reference entails. I guess if we hadn't borrowed the ref, we would have needed to transfer the ownership of numbers to the gcd fn? Would this mean copying the Vec? Will need to punt on this now.
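A tentative sketch to poke at the question, with made-up fn names. From what I can tell, passing a Vec by value moves ownership without deep-copying the elements, but the caller can't touch the variable afterwards; borrowing leaves the caller in charge.

```rust
// Takes a borrowed slice: the caller keeps ownership of the vector.
fn sum_borrowed(numbers: &[u64]) -> u64 {
    let mut total = 0;
    for n in numbers {
        // n is a &u64 here, so it has to be dereferenced
        total += *n;
    }
    total
}

// Takes the vector by value: ownership moves into the function.
// The elements are not deep-copied, but the caller loses access.
fn sum_owned(numbers: Vec<u64>) -> u64 {
    let mut total = 0;
    for n in numbers {
        // n is a plain u64 here; the loop consumes the vector
        total += n;
    }
    total
}

fn demo() -> (u64, u64, usize) {
    let numbers: Vec<u64> = vec![4, 8, 15];
    let a = sum_borrowed(&numbers);
    let len = numbers.len(); // still usable after lending it out
    let b = sum_owned(numbers);
    // numbers.len() would not compile at this point: the value was moved
    (a, b, len)
}
```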

Serving Pages to the Web

The next chapter in the book is about serving up a calculation via http.

We're pulling in ext. dependencies via the Cargo.toml – did I already say how much I like that this is built right into Rust? Specifically we're pulling the Actix Web framework for serving http, and Serde for deserializing form data.

We can specify versions of crates (i.e. Rust's libraries/binaries) but also specify features we'd like to use:

[dependencies]
actix-web = "1.0.8"
serde = { version = "1.0", features = ["derive"] }

First part of example code, with inline commentary.

// pull in modules for serving http, shortened
use actix_web::{web, App, HttpResponse, HttpServer};

fn main() {
    // the '||' is the Rust closure / lambda expression 
    let server = HttpServer::new(|| {
        // body of the lambda returns a new App, with a GET and a POST route attached
        // each route calls a handler fn: get_index() and post_gcd()
        App::new()
            .route("/", web::get().to(get_index))
            .route("/gcd", web::post().to(post_gcd)) // post_gcd see below
    });

    println!("Serving on http://localhost:3000...");
    server
        .bind("127.0.0.1:3000").expect("error binding server to address")
        .run().expect("error running server");
}  

Ok, so closures – || really is nice and terse, to a fault.
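A tiny std-only sketch of the closure syntax outside of any web framework; apply_twice() and closure_demo() are made-up names. The argument list sits between the bars, so || is simply a closure taking no arguments.

```rust
// Accepts any closure (or fn) that takes and returns a u64.
fn apply_twice<F: Fn(u64) -> u64>(f: F, x: u64) -> u64 {
    f(f(x))
}

fn closure_demo() -> (u64, u64) {
    // no arguments, so the bars are empty
    let no_args = || 42;
    // closures can capture variables from the enclosing scope
    let offset = 10;
    let add_offset = |n: u64| n + offset;
    (no_args(), apply_twice(add_offset, 1))
}
```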

I like how the method calls on server and the Result values are chained, and how concise error handling via expect is.

Second part of the example code, get_index(), builds an HTTP response:

fn get_index() -> HttpResponse {
    // we always succeed
    HttpResponse::Ok()
        .content_type("text/html")
        .body(
            // multiline "raw" string, can use more than one # to avoid conflict
            r#"
                <title>GCD Calculator</title>
                <form action="/gcd" method="post">
                <input type="text" name="n"/>
                <input type="text" name="m"/>
                <button type="submit">Compute GCD</button>
                </form>
            "#,
        )
}  

Third part of the example code is about deserialization:

use serde::Deserialize;  // goes at the top


// attribute which the serde machinery picks up on
#[derive(Deserialize)]
struct GcdParameters { // struct a la Go or C
    n: u64,
    m: u64,
}

Part four defines a fn that handles the web form and returns an HttpResponse; it was attached to the App route above:

// web::Form is generic; here it wraps our GcdParameters struct
fn post_gcd(form: web::Form<GcdParameters>) -> HttpResponse {
    // input checking
    if form.n == 0 || form.m == 0 {
        return HttpResponse::BadRequest()
            .content_type("text/html")
            .body("Computing the GCD with zero is boring.");
    }

    // building response string, similar formatting to println!()
    let response =
        format!("The greatest common divisor of the numbers {} and {} \
                 is <b>{}</b>\n",
                form.n, form.m, gcd(form.n, form.m));

    HttpResponse::Ok()
        .content_type("text/html")
        .body(response)
}

Ok, so how Deserialize actually works is a bit unclear at this point, but presumably the web framework extracts the values from the form and passes them into the post_gcd() fn. The rest is building an http response.

Concurrency

The next chapter introduces concurrency.

Rust provides some guarantees to ensure data can't be accessed without holding a lock, and that data can't be modified if it's shared read-only. Rust can also track ownership of data passed between threads.
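To get a feel for the lock guarantee, here is a std-only sketch (count_in_parallel() is a made-up fn, not from the book). The counter lives inside the Mutex, so the only way to reach it is through the guard that lock() hands out.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn count_in_parallel(n_threads: u64, increments_per_thread: u64) -> u64 {
    // Arc shares ownership between threads, Mutex guards the value inside
    let counter = Arc::new(Mutex::new(0u64));
    let mut handles = Vec::new();

    for _ in 0..n_threads {
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            for _ in 0..increments_per_thread {
                // the data is only reachable through the lock guard
                let mut guard = counter.lock().expect("mutex poisoned");
                *guard += 1;
            }
        }));
    }

    for handle in handles {
        handle.join().expect("worker thread panicked");
    }

    let total = *counter.lock().expect("mutex poisoned");
    total
}
```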

I'll skip over the Mandelbrot source code, just noting some interesting language features from time to time.

Infinite loops

Rust has a dedicated syntax for an infinite loop, interesting.

loop {
    // ... loop infinitely
}

Complex numbers

The num crate has a complex number type. It's a generic struct, parametrized with the number type used for the real and imaginary parts.

Definition of the struct with two fields of the generic type T:

struct Complex<T> {
    /// Real portion of the complex number
    re: T,

    /// Imaginary portion of the complex number
    im: T,
}  

Example function using a Complex<f64> type:

use num::Complex;

fn complex_square_add_loop(c: Complex<f64>) {
    let mut z = Complex { re: 0.0, im: 0.0 };
    loop {
        z = z * z + c;
    }
}  

Documentation comments

While // marks the beginning of a regular comment, triple slashes /// are special to the rustdoc utility and denote documentation to extract (is there some kind of doctest as well?).

Iterate over a range of integers

We've seen a for loop iterating over command-line args. This one iterates over a range of integers, from 0 up to but not including max_iter:

for i in 0..max_iter {
    // ...
}

Option, Some, None

These are built-in types (from the std::option module). Option is an enum, i.e. it can either be None or Some(value). This is useful for e.g. functions that are not defined over their whole input range. Option<T> is a generic type, i.e. the value v in Some(v) is of a generic type T.
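A sketch that puts the range loop and Option together, in the spirit of the Mandelbrot escape-time check. The Complex struct below is a hand-rolled, non-generic stand-in for num::Complex<f64> so that the block needs nothing beyond std, and escape_time() is my own approximation rather than the book's listing.

```rust
use std::ops::{Add, Mul};

// Minimal stand-in for num::Complex<f64>.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Complex {
    re: f64,
    im: f64,
}

impl Add for Complex {
    type Output = Complex;
    fn add(self, rhs: Complex) -> Complex {
        Complex { re: self.re + rhs.re, im: self.im + rhs.im }
    }
}

impl Mul for Complex {
    type Output = Complex;
    fn mul(self, rhs: Complex) -> Complex {
        Complex {
            re: self.re * rhs.re - self.im * rhs.im,
            im: self.re * rhs.im + self.im * rhs.re,
        }
    }
}

impl Complex {
    // squared distance from the origin
    fn norm_sqr(self) -> f64 {
        self.re * self.re + self.im * self.im
    }
}

// Some(i) if z leaves the circle of radius 2 after i iterations,
// None if it is still inside after `limit` rounds.
fn escape_time(c: Complex, limit: usize) -> Option<usize> {
    let mut z = Complex { re: 0.0, im: 0.0 };
    for i in 0..limit {
        if z.norm_sqr() > 4.0 {
            return Some(i);
        }
        z = z * z + c;
    }
    None
}
```

Returning None for "never escaped within the limit" avoids picking a magic number such as -1 for that case.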

More generics and pattern matching

Here's an interesting function. It parses an input string, looking for a separator character, and either returns two generic values or None if no separator is found:

fn parse_pair<T: FromStr>(s: &str, separator: char) -> Option<(T, T)> {
    match s.find(separator) {
        None => None,
        Some(index) => {
            match (T::from_str(&s[..index]), T::from_str(&s[index + 1..])) {
                (Ok(l), Ok(r)) => Some((l, r)),
                _ => None
            }
        }
    }
}

The match thing is an expression, not a statement; it evals to the value of the leg it chose.

The function is generic: it takes a type parameter which can be any type that implements the FromStr trait. E.g. parse_pair::<f64>("0.5x1.5", 'x') would return Some((0.5, 1.5)). This specifies the f64 float type explicitly, but often the Rust compiler can infer this type.

In the function we search with s.find() which returns an Option, either None or an index into s. This is passed to the match construct which branches on that Option – either returning None immediately or using the index value in another match construct.

The inner match tries to apply from_str() to the two slices of the input string on either side of the separator. If we get two Ok(v) values those are packaged into a Some(tuple) and returned, otherwise we return None. The version of from_str() that gets applied depends on the type parameter T; Ok(v) is the success variant of the Result type (see above).

The _ is the fall-through pattern, matching everything else (i.e. one or both Results being Err(e)).
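The two match idioms, pulled out into a pair of made-up helper fns as a minimal sketch: matching on a tuple with a catch-all leg, and using match as an expression whose value gets bound to a variable.

```rust
// Matches on a tuple; the _ leg catches every case with at least one None.
fn sum_if_both(a: Option<i32>, b: Option<i32>) -> Option<i32> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x + y),
        _ => None,
    }
}

// match is an expression: its value is whatever the chosen leg evaluates to.
fn describe(sum: Option<i32>) -> String {
    let text = match sum {
        Some(v) => format!("sum is {}", v),
        None => String::from("at least one value missing"),
    };
    text
}
```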

Building on parse_pair, below is a parse function for Complex<f64> values. Note the nested generic Option<Complex<f64>> return type.

fn parse_complex(s: &str) -> Option<Complex<f64>> {
    match parse_pair(s, ',') {
        Some((re, im)) => Some(Complex { re, im }),
        None => None
    }
}  

The Some(Complex { re, im }) part is shorthand for Some(Complex { re: re, im: im })

Tuple access and type conversion

This accesses the first element of the px tuple: px.0

Conversion to a f64 value: let f = px.0 as f64

I/O and unit type

I/O writing functions are typically called for their side effect – they don't really have useful values to return, except to signal errors.

This can be written like Result<(), std::io::Error>, meaning either () which is the unit type, similar to void, or an error (here an I/O Error).

Example function for writing png data to a file:

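// imports this snippet relies on (older image crate API, as used by the book)
use image::png::PNGEncoder;
use image::ColorType;
use std::fs::File;

fn write_image(filename: &str, pixels: &[u8], bounds: (usize, usize))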
    -> Result<(), std::io::Error>
{
    let output = File::create(filename)?;

    let encoder = PNGEncoder::new(output);
    encoder.encode(pixels,
                   bounds.0 as u32, bounds.1 as u32,
                   ColorType::Gray(8))?;

    Ok(())
}  

The ? is shorthand for this match construct:

let output = match File::create(filename) {
    Ok(f) => f,
    Err(e) => {
        return Err(e);
    }
};  

This would either return an error immediately or assign the Ok(value). Both File::create() and write_image() use std::io::Error as an error type. Nice and terse.

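Note that ? doesn't work in a main() that is declared without a return type, since there is nowhere to propagate the error to. If main is declared to return a Result, ? can be used there as well.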
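A std-only sketch of ? in action, with made-up fn names. The first fn propagates a parse error through a Result. The second shows that ? also works on Option, using a variation of parse_pair that I wrote for illustration (this is not the book's version; it leans on str::split_once).

```rust
use std::num::ParseIntError;
use std::str::FromStr;

// ? on Result: the first parse error is returned to the caller right away.
fn add_strs(a: &str, b: &str) -> Result<u64, ParseIntError> {
    let x = a.parse::<u64>()?;
    let y = b.parse::<u64>()?;
    Ok(x + y)
}

// ? on Option: a None short-circuits the function in the same way.
fn parse_pair_alt<T: FromStr>(s: &str, separator: char) -> Option<(T, T)> {
    // split_once yields None when the separator is missing
    let (left, right) = s.split_once(separator)?;
    // .ok() turns a Result<T, _> into an Option<T>
    let l = T::from_str(left).ok()?;
    let r = T::from_str(right).ok()?;
    Some((l, r))
}
```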

Macro vec!

This creates a vector of 23 elements and initializes it with ones: let mut v = vec![1; 23]

Crossbeam and scoped threads

The example uses the crossbeam library for computing Mandelbrot sets in parallel. It does this by creating scoped threads, which are guaranteed to terminate before the crossbeam::scope function returns. This means the threads can safely reference data from the calling function.

crossbeam::scope(|spawner| {
    spawner.spawn(...)
    ...
}).unwrap();  

The spawner here is used to actually create the new threads. The scope() function waits until all threads have finished execution. If any of the threads panicked, scope() will return an Err, otherwise Ok(()), i.e. an Ok wrapping the unit value.
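The standard library has a similar facility in std::thread::scope (in newer Rust versions), which makes for a dependency-free sketch of the same idea. Note that this is the std API and not crossbeam's: the closure passed to spawn takes no argument here, and a panicking thread makes scope itself panic instead of returning an Err. fill_in_parallel() is a made-up example fn.

```rust
use std::thread;

// Fills each chunk of the buffer with its chunk index, one thread per chunk.
// The threads borrow `buf` from the caller, which is fine because
// thread::scope joins all of them before it returns.
// chunk_len must be non-zero, chunks_mut panics otherwise.
fn fill_in_parallel(buf: &mut [u32], chunk_len: usize) {
    thread::scope(|s| {
        for (i, chunk) in buf.chunks_mut(chunk_len).enumerate() {
            s.spawn(move || {
                for slot in chunk.iter_mut() {
                    *slot = i as u32;
                }
            });
        }
    });
}
```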

It's a pity that we're dealing with threads here, I wish Rust had a safer and more high-level concurrency abstraction, something like channels in Golang or, even better, actors a la Erlang. Some quick googling tells me that the Actix framework implements actors though (the name should have tipped me off I guess). Something to look at later.

Closures and move

The move keyword at the front indicates that this closure takes ownership of the variables it uses.

move |_| { ... }
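A minimal std-only sketch of why move matters, using a made-up fn. A plain thread::spawn thread may outlive the function that started it, so the closure isn't allowed to just borrow local data; move hands the vector over to the closure instead.

```rust
use std::thread;

fn sum_on_worker(numbers: Vec<u64>) -> u64 {
    // without `move` the closure would only borrow `numbers`,
    // which thread::spawn rejects
    let handle = thread::spawn(move || numbers.iter().sum::<u64>());
    handle.join().expect("worker thread panicked")
}
```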

Filesystems and Command-Line Tools

Next the book shows a cli utility which uses the text-colorizer and regex crates.

The derive debug attribute

In the example we collect cli args into a struct. To be able to print the struct with the {:?} format specifier in println!, a #[derive(Debug)] attribute is added.

#[derive(Debug)]
struct Arguments {
    target: String,
    replacement: String,
    ...
}  
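A quick sketch of what the derived formatting produces. The struct here only carries the two fields shown above (the elided ones are left out), and debug_line() is a made-up helper.

```rust
#[derive(Debug)]
struct Arguments {
    target: String,
    replacement: String,
}

// {:?} uses the Debug implementation generated by the derive attribute.
fn debug_line(args: &Arguments) -> String {
    format!("{:?}", args)
}
```

For a struct with target "a" and replacement "b" this yields Arguments { target: "a", replacement: "b" }. Using {:#?} instead spreads the fields over multiple lines.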

Printing errors

There's an eprintln! macro which works like println! but outputs to stderr.

Text colors

Importing everything from the text-colorizer crate, then output to stderr, with the program name in green:

use text_colorizer::*;
eprintln!("{} - change occurrences of one string into another", "quickreplace".green());

Reading a file

File i/o is done with the std::fs module of the standard library:

use std::fs;

The snippet below uses the read_to_string() fn to read the file given in args into the data var as a string. If i/o fails, it outputs a bold red error and exits with status 1.

let data = match fs::read_to_string(&args.filename) {
    Ok(v) => v,
    Err(e) => {
        eprintln!("{} failed to read from file '{}': {:?}",
                  "Error:".red().bold(), args.filename, e);
        std::process::exit(1);
    }
};

Writing a file

Similarly to the above, we're writing data into a file, printing an error on failure. This is done with the write() fn:

match fs::write(&args.output, &data) {
    Ok(_) => {},
    Err(e) => {
        // report the output path, since that is the file we failed to write
        eprintln!("{} failed to write to file '{}': {:?}",
                  "Error:".red().bold(), args.output, e);
        std::process::exit(1);
    }
};

We're not interested in the return value of the write() fn; we just want to handle errors.
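When the code sits in a helper fn rather than directly in main, the same two calls can propagate errors with ? and leave the reporting to the caller. A small sketch with a made-up fn name; io::Result<String> is just shorthand for Result<String, std::io::Error>.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Writes the text to the file, then reads it back.
// Any i/o error is handed to the caller instead of exiting the process.
fn write_then_read(path: &Path, text: &str) -> io::Result<String> {
    fs::write(path, text)?;
    fs::read_to_string(path)
}
```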

Regex replace

The regex crate offers regular expression handling, for instance regex replace.

Annotated example:

use regex::Regex;

fn replace(target: &str, replacement: &str, text: &str)
    -> Result<String, regex::Error>
{
    // compile a regex, error out if this should fail
    let regex = Regex::new(target)?; 

    // use the regex to replace text

    Ok( // we wrap this into an Ok(v) Result value
        regex.replace_all( // do the actual replace
            text, replacement).to_string() // replace_all returns a Cow<str>; to_string() makes an owned String
    )
}

Coda

Whoa, quite a tour. This concludes the first section; next up will be data types.