How to Read a Text File in Rust, for Beginners

Kent West - kent.west@{that mail that swore to do no evil}

The Rust documentation currently available online is scattered, and none of it makes the process especially clear for beginners. Hopefully this document will fill that gap.

Almost the exact same document can be found here, but it does/says things slightly differently. Comparing the differences might help the beginner understand the concepts a little better.

1. You need a working Rust setup, and to be able to compile and run the simple “Hello, World!” program. So let’s start with that.

main.rs

fn main() {
  println!("Hello, World!");
} // end of main()

Pretty much a no-brainer, right?

2. You need a text file to read.

Now let’s create a text file to be read, something like:

data.txt

Kent
Mason
jugular
tiger, river-dancing tse-tse “flyboys in the air”
This.Is.The.End.Of.This.Text.File!

We could name the file anything, but we’ll call it data.txt. I'm going to put my data.txt file in the src subdirectory below my project directory, which is /home/kent/PROGRAMMING/Rust/file_experiment/.

There are basically three ways a file can be read:

all at once, into one big text string
a line at a time
a character (or chunk) at a time

Side Note: Be wary of that term “character”. In an ASCII text file like the one above, one letter of the alphabet (or one numeral, or one punctuation mark, etc.) is stored as exactly one byte, so we are safe to think of one character as one byte of data as one symbol on the keyboard (or in an ASCII chart). But many symbols used in other languages (and even accented letters or “curly” quotes in English text) take more than one byte each. Ideally, you should get familiar with UTF-8 (of which ASCII is a subset), and think in those terms, rather than thinking in terms of ASCII characters.

Also, you might be in the habit of thinking of a “string” as a connected series of “char”s, as in the string “Rover is a good dog” is a connected series of “R” + “o” + “v” + “e” and so on. Be aware that in Rust, a “String” type is not simply an array (a connected series) of “char” types; it is stored as a sequence of UTF-8 bytes.

But both of these points are for a later time. For today, you can think of ASCII text as chars, meaning the symbols on your keyboard plus a few control characters you would have to look up in an ASCII chart (like tab and newline), and a string as a connected series of chars.
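To make the side note a little more concrete, here is a small sketch (the helper name “byte_and_char_counts” is made up for illustration) that compares how many bytes and how many chars Rust reports for a string. For pure ASCII text the two counts agree; once a non-ASCII symbol shows up, they no longer do.

```rust
/// Returns (number of bytes, number of chars) for a piece of text.
/// Hypothetical helper, for illustration only.
fn byte_and_char_counts(text: &str) -> (usize, usize) {
    (text.len(), text.chars().count())
}

fn main() {
    // Pure ASCII: every char is stored as exactly one byte.
    let (bytes, chars) = byte_and_char_counts("Kent");
    println!("Kent: {} bytes, {} chars", bytes, chars);

    // \u{e9} is the single char "e with acute accent",
    // which UTF-8 stores as two bytes.
    let (bytes, chars) = byte_and_char_counts("caf\u{e9}");
    println!("caf\u{e9}: {} bytes, {} chars", bytes, chars);
}
```

This is why “.len()” on a Rust string should be read as “number of bytes”, not “number of letters”.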

We’re only going to look at the first of these three methods, because if you understand it, there’s a good chance that the existing online documentation will be understandable enough for you to follow for the second and third methods.

3. Read the file as one big text string.

This is probably the easiest way to read a text file. Your program simply opens the text file and gulps the entire file in one big bite into a string. Below is the basic idea: the two new lines are the “let” line and the “println!” line that prints “file_contents”. Trying to compile it as currently written will result in an error (go ahead, try it). We no longer need the “Hello, World!” line, so let’s get rid of it to clean up our code.

main.rs

/*
Learning to Read a Text File
by Kent West
*/

fn main() {
  // println!("Hello, World!"); // no longer needed; delete this line
  let file_contents = read_to_string("src/data.txt");
  println!("{}", file_contents);
} // end of main()

As you can see, we don’t have an “open file” or “close file” statement as is true in many other programming languages; we’re just grabbing (or trying to grab) the file contents in one big gulp, and then printing them to the terminal window.

But the compiler complains that “read_to_string” is “not found in this scope”. In other words, the compiler needs some “path” information to follow the trail to the function “read_to_string”. We can supply this in a couple of different ways.

We can add the path directly in the command:


let file_contents = std::fs::read_to_string("src/data.txt");

or we can put the path information in a use statement at the beginning of the program file:

main.rs

/*
Learning to Read a Text File
by Kent West
*/
  
use std::fs::read_to_string;
  
fn main() {
  let file_contents = read_to_string("src/data.txt");
  println!("{}", file_contents);
} // end of main()

The program still doesn’t run, though, giving you this error:

  |   println!("{}", file_contents);
  |                  ^^^^^^^^^^^^^ `std::result::Result<String, std::io::Error>` cannot be formatted with the default formatter

That simply means that what “read_to_string” hands back is not a plain string but a wrapped-up value (a “Result”, explained in the next section), and the “println!” command (“macro”, actually, as indicated by the bang (“!”)) doesn’t know how to print that kind of value with a plain “{}”.

We can tell the “println!” macro to use a special “debug” format instead. You don’t really need to know exactly what this is doing; it tells the println! macro to print the value the way a programmer would want to inspect it, wrapper and all. We do this by changing the “{}” in the println! statement to “{:?}”.

println!("{:?}", file_contents);

The program should now compile and run, and give you some output that looks like this:

Ok("Kent\nMason\njugular\ntiger, river-dancing tse-tse “flyboys in the air”\nThis.Is.The.End.Of.This.Text.File!\n")
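To see the two format styles side by side, here is a small sketch (the helper name “debug_text” is made up for illustration). A plain String can be printed with either “{}” or “{:?}”, while the kind of value that “read_to_string” returns can only be printed with “{:?}”.

```rust
use std::io;

/// Formats a read result with the debug formatter, the same way
/// println!("{:?}", ...) would. Hypothetical helper, for illustration only.
fn debug_text(result: &Result<String, io::Error>) -> String {
    format!("{:?}", result)
}

fn main() {
    // A plain String supports both "{}" and "{:?}".
    let plain = String::from("Kent\nMason\n");
    println!("{}", plain); // prints the text as-is, on separate lines
    println!("{:?}", plain); // prints it in quotes, with each newline shown as \n

    // The wrapped value only supports "{:?}"; using "{}" here would not compile.
    let wrapped: Result<String, io::Error> = Ok(plain);
    println!("{}", debug_text(&wrapped));
}
```

The debug style is meant for the programmer's eyes, which is why it shows the quotes and the “\n” escapes instead of actually starting new lines.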

4. Understanding the Output of the “read_to_string” Line

Here again is the output of our program:

Ok("Kent\nMason\njugular\ntiger, river-dancing tse-tse “flyboys in the air”\nThis.Is.The.End.Of.This.Text.File!\n")

The “\n” is the “newline” character (singular, although it’s actually two characters when typed out in English text, a backslash and an ‘n’). This corresponds to hitting ENTER to create a new line in your text editor.

When the “read_to_string” function reads a file, there are two possible outcomes: either the read succeeds, or it fails.

If the file is read successfully, the contents of the file are put into a "wrapper", and then a label is put onto that wrapper that says "Ok". If the file is not read successfully, an error message is put into that wrapper, and then a label is put onto that wrapper that says "Err".

This "wrapper" is then returned to the calling routine as a special type of data called a "Result" type (just as an integer might be an "i32" type, or a string of letters might be a "String" type, etc). A "Result" is always either an "Ok" carrying the expected results or an "Err" carrying an error message.
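As a rough sketch of the idea, the little helper below (the name “label_of” is made up for illustration) looks only at the label on the wrapper and ignores what is inside it:

```rust
use std::fs::read_to_string;

/// Reports which label is on the wrapper that read_to_string returns.
/// Hypothetical helper, for illustration only.
fn label_of(path: &str) -> &'static str {
    match read_to_string(path) {
        Ok(_) => "Ok",   // the file was read; its contents are inside the wrapper
        Err(_) => "Err", // the read failed; an error value is inside the wrapper
    }
}

fn main() {
    println!("src/data.txt -> {}", label_of("src/data.txt"));
    println!("no/such/file.txt -> {}", label_of("no/such/file.txt"));
}
```

The underscore in “Ok(_)” and “Err(_)” just means “there is something in here, but I don't care what it is right now”.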

Try temporarily renaming your data file from “src/data.txt” to “src/data.tx”, and re-running your program.

Now you get this output from the program:

Err(Os { code: 2, kind: NotFound, message: "No such file or directory" })

Nifty, eh? Some error information is “wrapped” in a package labeled “Err”.

But the program continues on. You can see this by adding another message at the end of the program:

main.rs

...
fn main() {
  let file_contents = read_to_string("src/data.txt");
  println!("{:?}", file_contents);
  println!("This is the end of the program.");
} // end of main()

Now if you run the program you'll see the error message printed out, and then the proof that this did not stop the program. In many cases, that could be bad; do you really want your program continuing after it has failed in some previous part?

It’s good Rust style to handle errors.

Handling Errors

There are four ways of dealing with errors:

Don't Handle the Error

One way to handle errors is to simply not handle them, expecting them to not occur, as we did above.

Don't Handle the Error, But Panic the Program to a Stop

A marginally better way is to not handle the error, but to stop the program from going any farther. We can do this by forcing the program to "panic" and quit. Try the following code (with your data file still misnamed).

let file_contents = read_to_string("src/data.txt").unwrap();

When you run the program now, it crashes. You get essentially the same error message as before, but this time the program panics and stops instead of carrying on. Both situations (crashing and not crashing) can be bad, but carrying on with bad or missing data might turn out worse than just crashing.

If there are no errors, though (go ahead and fix the misspelling of “data.txt” now), you get your data, and it no longer comes wrapped up in an “Ok” (or an “Err”) package. This is what “.unwrap()” does for us: it unwraps the data from the wrapper that “read_to_string” returns. And since the package is already unwrapped by the time it gets printed, you no longer need the "debug" format:

main.rs

...
fn main() {
  let file_contents = read_to_string("src/data.txt").unwrap();
  println!("{}", file_contents);
  println!("This is the end of the program.");
} // end of main()

Don't Handle the Error, But Panic the Program to a Stop with a Custom Message

We can improve this non-handling of an error. Rather than using “.unwrap()”, we can use “.expect()”. This does much the same thing, except that if we don’t get an “Ok” like we expect, we can print out a customized message:

main.rs

...
let file_contents = read_to_string("src/data.txt").expect("Something went wrong while trying to read the 'src/data.txt' file. Sorry.");
...

Now if you run your program, and there is no error, you get the data, but it’s not wrapped in the "Result" wrapper, so the file contents can be printed without using the "debug" format. But if something does go wrong, your customized message gets printed along with the error from the failure.

Handle the Error Properly

There's a finer-grained way to handle errors:

main.rs

...
let file_contents = read_to_string("src/data.txt");

let file_contents = match file_contents {
  Ok(file) => file,
  Err(error) => panic!("\nOh great! We're dying now. Here's the error message:\n\n=====\n{:?}\n=====\n\n", error),
};

println!("{}", file_contents);
...

In the above snippet of code, we're initially creating a variable named "file_contents", and putting into it a "Result" value that contains either an "Ok" along with the file contents or an "Err" along with the error message given by the operating system to the "read_to_string" function. Then we're running a "match" against that variable to find out which result we have. If it's an "Ok", we take the file contents out of it and put them into a new-but-same-named variable, "file_contents". If the result is an "Err", we panic, printing out a custom message which includes the error message from the OS.
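The "Err" arm of a "match" does not have to end in a panic. As a rough sketch of one alternative (the function name "read_or_empty" and the decision to carry on with an empty string are assumptions made for illustration, not part of the program above), the "Err" arm can look at what kind of error occurred and decide what to do about it:

```rust
use std::fs::read_to_string;
use std::io::ErrorKind;

/// Reads a file, treating a missing file as empty text.
/// Any other kind of error still panics.
/// Hypothetical helper, for illustration only.
fn read_or_empty(path: &str) -> String {
    match read_to_string(path) {
        Ok(contents) => contents,
        // A "guard" (the `if` part) narrows this arm to one kind of error.
        Err(error) if error.kind() == ErrorKind::NotFound => {
            eprintln!("Note: '{}' was not found; continuing with no data.", path);
            String::new()
        }
        Err(error) => panic!("Could not read '{}': {:?}", path, error),
    }
}

fn main() {
    let file_contents = read_or_empty("src/data.txt");
    println!("{}", file_contents);
}
```

Whether carrying on with no data is acceptable depends entirely on your program; the point is only that the "match" gives you the choice.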

Parsing the Data

The format of the returned data is not particularly helpful. It would be better if we could break up the data into logical “chunks”. A logical way of breaking up this data would be line-by-line. So if we could break up the data at the newlines, that’d be great. You can see that would be especially helpful on a data file that is more uniform, like this new “data2.txt”:

data2.txt

West,Kent,157,555-123-4567,Male
Johnson,Bendi,42,888-329-7555,Female
Dalcern,Frederique,2230,415-444-4448,Female

(Remember to change the instance of “data.txt” in your program to “data2.txt”, and to change it in the error message instance as well, so your error message won’t be in error.)

When we read this file, we read it into a single String variable ("file_contents") as one long string, like this:

West,Kent,157,555-123-4567,Male\nJohnson,Bendi,42,888-329-7555,Female\nDalcern,Frederique,2230,415-444-4448,Female<EOF>

It's easy to break up the data line-by-line, and convert each line into an element of a vector (a one-dimensional array).

main.rs

...
fn main() {

    let file_contents = read_to_string("src/data2.txt");

    // Did an error occur during reading?
    let file_contents = match file_contents {
      Ok(file) => file,
      Err(error) => panic!("\nOh great! We're dying now. Here's the error message:\n\n=====\n{:?}\n=====\n\n", error),
    };

    // This splits the string at the newlines, and puts the lines into a vector of Strings.
    let file_vec: Vec<String> = file_contents
        .split("\n") // Split the contents on the newlines, which returns an iterator.
        .map(|line| line.to_string()) // We map the items in the iterator to an iterator of Strings.
        .collect(); // The iterator of Strings returned by map is collected into a vector, stored as "file_vec".

    // Now we have a vector of Strings that can be printed out one by one.
    println!("The first line is \"{}\".", file_vec[0]);
    println!("The third line is \"{}\".", file_vec[2]);

    // println!("End of program.");
} // end of main()

The above can be condensed to the following:

main.rs

...
fn main() {

    // Read a file into a vector of Strings (and watch out for errors).
    let file_vec: Vec<String> = match read_to_string("src/data2.txt") {
      Ok(file) => file.lines().map(|line| line.to_string()).collect(),
      Err(error) => panic!("\nOh great! We're dying now. Here's the error message:\n\n=====\n{:?}\n=====\n\n", error),
    };

    // Now we have a vector of Strings that can be printed out one by one.
    println!("The first line is \"{}\".", file_vec[0]);
    println!("The third line is \"{}\".", file_vec[2]);

} // end of main()
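One thing to be aware of: the longer version splits with split("\n"), while the condensed version uses lines(), and the two do not behave identically. The sketch below (the two helper names are made up for illustration) shows the difference on text that ends with a newline, as most text files do.

```rust
/// Splits text on "\n", as in the longer version of the program.
fn split_on_newlines(text: &str) -> Vec<String> {
    text.split("\n").map(|line| line.to_string()).collect()
}

/// Splits text with lines(), as in the condensed version.
fn split_with_lines(text: &str) -> Vec<String> {
    text.lines().map(|line| line.to_string()).collect()
}

fn main() {
    let text = "West,Kent\nJohnson,Bendi\n"; // note the trailing newline

    // split("\n") produces an extra empty string after the final newline.
    println!("{:?}", split_on_newlines(text));

    // lines() does not, and it also strips a "\r" before each "\n".
    println!("{:?}", split_with_lines(text));
}
```

For reading text files line by line, lines() is usually the more convenient of the two.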

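Each line of data2.txt can be broken up further in the same way, this time at the commas. A minimal sketch (the function name "split_fields" is made up for illustration, and it assumes no field ever contains a comma of its own):

```rust
/// Splits one comma-separated line into its fields,
/// trimming any stray spaces around each field.
/// Hypothetical helper, for illustration only.
fn split_fields(line: &str) -> Vec<String> {
    line.split(',').map(|field| field.trim().to_string()).collect()
}

fn main() {
    let line = "West,Kent,157,555-123-4567,Male";
    let fields = split_fields(line);
    println!("Last name: {}", fields[0]);
    println!("First name: {}", fields[1]);
}
```

Real-world comma-separated files can be trickier than this (quoted fields, embedded commas), but for a simple, uniform file like data2.txt the idea is the same as splitting on newlines.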