100 days of Rust - Alexandre Chaintreuil

Motivations

I became interested in Rust when I heard about safety. Buffer overruns or some sort of mishandling of memory cause a lot of vulnerabilites. There is an interesting discussion about that on reddit. It is a bit biased because it’s on /r/rust.

People claim that you just have to git gud (and usually claim that they write bug free C or C++). Yet super experienced developers are making those bugs. Rust appears to be an interesting way of avoiding these kind of bugs by design, and without runtime cost.

An interesting debate is whether or not we should rewrite everything in Rust.. (I’m not sure that’s a pratical solution).

It was enough to spark my interest. Even if I don’t end up rewriting everything in Rust, I will probably learn interesting concepts along the way.

I’m not a total newbie, I have already started two projects:

A web server: That was my first non “Hello World!” program, trying to understand how a web server might work and learning Rust at the same time. It was over a year ago.
An SQLite clone: I started 4 months ago, based on cstack tutorial on how to write a db. I struggled because my knowledge of the language wasn’t good and I struggled to manage allocations of Rows and Pages.

Then, I was preparing interviews so I focused on that. Now that I got the job (Yay! 🎉), I can get back at it.

Resources

I will start with the Rust book.

I will then port a program I wrote for some homework: a http log monitoring program.

From then, I have three options that I find interesting:

Get back at the database
Write an OS following CS140e from Stanford or Philipp Oppermann’s blog
Contribute to an open source project

The plan

The plan is to spend at least an hour per day coding in Rust and to write about my learnings and my struggles.

Starting today!

Week 1

Day 1 (1h)

I read the Rust Book, second edition on ownership, structs and pattern matching. It’s nicer than the first edition, the explaination are more detailled than in the first edition.

I have not much to say today, because I am re-learning things I vaguely knew. I do feel that I have a better understanding of ownership now.

I don’t want to spend too much time reading the docs because I might lose interest without a real problem. On the cool side, I noticed a part about parsing CLI inputs in the book. That will be relevant for the http log monitoring program.

I will try to reach that part by tomorrow, so I can build something!

Day 2 (1h)

Today, I read some sections of the Rust book (modules and common collections). It is taking longer than expected because the doc if really thourough.

The section 8.2 about String and UTF-8 is very interesting.

Day 3 (1h)

I read about error handling, generic types and lifetimes. This has always been a part I never really understood. I think it’s clearer now, but I need to write some code to apply this knowledge.

Fortunately tomorrow, I will reach the Testing, and then the I/O program to write some real code.

Day 4 (45min)

I read about Testing and experimented a bit on my own. It’s nice to have everything built into the language and not import external libs (unittest or pytest comes to mind).

It seems to lack the assert_almost_equals and other syntaxic sugar that I am used to.

It is also a bit depressing not to have reached the coding part yet.

Day 5 (45min)

Finally, I started coding! It took longer than expected but I’m getting there. It was frustrating to read interesting stuff and not code.

I went with the minigrep example, trying to code ahead of the code blocks. I don’t regret the time I spent reading the book because I understood all the compile error I got. I even foreseen some of them. It is definitely a step up from last time, when I would try to add &, mut, str/String and lifetime until it compiles. I stopped halfway through the chapter as I had to go to work.

Day 6 (2h)

I was excited to come home and continue coding! I went through the exercice, wrote some tests and refactored the code.

I also read the chapter on functional language features and the chapter on cargo. I will now work on the port of my log monitoring program.

I will stop reading the books by large chunks and focus more on coding. There are 31 chapters left, I can take my time. I might need the chapter on concurrency for the port but not right from the start.

Day 7 (1h)

It’s time to start my own project: rewriting my http log monitoring program in Rust!

I took the log generator that’s written in Python to get started quickly. For the first part of the project, I can reuse what I learned about building a CLI with the minigrep project

Starting the project was similar to the minigrep project. I built a program that could parse input from the CLI and read a file according to the argument

As I’m parsing structured files, I used a regex and the regex crate. Fortunately, I had already written the regex in my earlier Python version. It wasn’t too hard to adapt. I had a hard time escaping the double quote because I though r"..." was the regex string syntax just like in Python. In Rust, it means raw string.

Reading the file and displaying its content was straighforward. I also wrote some tests on the parsing.

Day 8 (1h)

I created a my first Consumer that will ingest events and generate reports. I will have multiple consumers that will all ingest events and generate reports.

This sounds like some similar behaviour between different classes. It makes sense to create a Trait for this:

pub trait Consumer {
    fn ingest(&mut self, http_log: HttpLog);
    fn report(&self);
}

I implemented the first one ErrorWatcher that counts the number of 4xx and 5xx errors. The struct holds a numeric counter, and a hashmap that holds an Enum corresponding to the error codes and their respective counts.

I could have done it more easily with just numeric counters (error_4xx_count and error_5xx_count) but I wanted to experiment with the std::collections::Hashmap.

It was interesting to see you need to implement 3 Traits to use something as a hashmap key. For basic types, you can derive them automatically with the annotation: #[derive(PartialEq, Eq, Hash)]

I managed to wrote some tests for the ingestion.

Day 9 (3h)

I implemented another consumer: the Ranker that reports the most active hosts and the most requested first directories of urls (this-part in http://url.com/this-part/that-other-part).

I had more fun with hashmaps: the Ranker struct holds 2 Hashmap<String, u32>. To compute a ranking, I needed to sort the keys by values. I built a Vec holding the tuple and sort it on the 2^nd value:

fn rank<'a>(&self, container: &'a HashMap<String, u32>) -> Vec<(&'a String, &'a u32)> {
    let mut ranking: Vec<(&String, &u32)> = container.iter().collect();
    ranking.sort_by(|a, b| b.1.cmp(a.1));

    ranking.truncate(self.max_ranking);

    ranking
}

It doesn’t feel idiomatic though…

The consumer.rs file started to grow big. I refactored it as a module. I also implemented some tests.

Day 10 (1h)

My goal is to implement threading. That will allow the program to have one ingesting thread that continuously fills a queue with new log events, and another thread that computes stuff and displays it in the terminal.

My first challenge was to have a container that contains a trait. That is not directly possible because

error[E0277]: the trait bound `consumer::Consumer: std::marker::Sized` is not satisfied
  --> src/lib.rs:40:21
   |
40 |     let consumers : Vec<Consumer> = Vec::new();
   |                     ^^^^^^^^^^^^^ `consumer::Consumer` does not have a constant size known at compile-time
   |
   = help: the trait `std::marker::Sized` is not implemented for `consumer::Consumer`
   = note: required by `std::vec::Vec`

The solution is quite simple (use a reference or a Box) but it raises an interesting questions: should I use a & or a Box? TL;DR: With a reference, you lend the value, with a Box you take ownership and you’re now responsible for the Drop. In our case, we don’t want more responsibility: reference it is.

I run into an interesting problem. This code would not compile

let mut consumers : Vec<&mut Consumer> = Vec::new();
let mut error_watcher = ErrorWatcher::new();
let mut ranker = Ranker::new();
consumers.push(&mut error_watcher);
consumers.push(&mut ranker);

error[E0597]: `error_watcher` does not live long enough
  --> src/lib.rs:45:25
   |
45 |     consumers.push(&mut error_watcher);
   |                         ^^^^^^^^^^^^^ borrowed value does not live long enough
...
72 | }
   | - `error_watcher` dropped here while still borrowed
   |
   = note: values in a scope are dropped in the opposite order they are created

The important thing here is the note at the bottom: values in a scope are dropped in the opposite order they are created At the end of the program, error_watcher would get dropped, then consumers. In that short moment of time, consumers would have a dangling reference.

The solution is to change the order:

-let mut consumers : Vec<&mut Consumer> = Vec::new();
 let mut error_watcher = ErrorWatcher::new();
 let mut ranker = Ranker::new();
+let mut consumers : Vec<&mut Consumer> = Vec::new();
 consumers.push(&mut error_watcher);
 consumers.push(&mut ranker);

Day 11 (2h)

I started by reading more of the Rust Book: the chapters on Smart Pointers and Concurrency. That was an interesting read again, and I have some pointers (haha!) on how I will implement threading in the program.

I first wanted to benchmark the single threaded performance and compare it to the Python program. I started implementing the Alerter which raises an alert when there are too many requests.

I implented the Alert struct that will hold the information for one alert.

Day 12 (2h)

I implented the logic for the Alerter (the consumer that raises alerts). I spent way more time than I expected on handling timezones with the chrono crate.

At first, I couldn’t get the parsing of the date working:

extern crate chrono;

use chrono::{DateTime, Local, TimeZone, FixedOffset, Utc};

fn main() {
    let working_input_string = "05/Jun/2018:20:55:45 +0200";
    match Local.datetime_from_str(working_input_string,"%d/%b/%Y:%T %z") {
        Ok(d) => println!("{:?}", d),
        Err(err) => println!("{}", err),
    }

    let failing_input_string = "05/Jun/2018:20:55:45 +0300";
    match Local.datetime_from_str(failing_input_string,"%d/%b/%Y:%T %z") {
        Ok(d) => println!("{:?}", d),
        Err(err) => println!("{}", err),
    }
}

The first datetime would be parsed just fine, but the second one wouldn’t. I get the following error message: no possible date and time matching input. Note that I live in a UTC+2 timezone

At first, I thought that I was related to the offset (the lib has no way of knowing which timezone to convert to because the offset of my timezone changes with DST). But the same issue appears with UTC (this time only the +0000 offset get parsed). I will raise an issue on the crate repo to clarify this. Edit: Done

Finally, I chose to parse time with DateTime::parse_from_str and convert it to Local time.

Final Update

I failed at the 100 days challenge. I took a week of holidays in July and then I didn’t went back at coding in Rust (mostly out of laziness). Now that things have settled a bit (I switched jobs and appartments during the summer).

I still have an interest in Rust so I will take out this challenge again, maybe under another form, but the goal is to code in Rust for around 100 hours and see what progress I made.