Fill in the foreign-function part of the tutorial

author: Marijn Haverbeke <marijnh@gmail.com> 2011-11-01 17:35:18 +0100
committer: Marijn Haverbeke <marijnh@gmail.com> 2011-11-01 17:35:18 +0100
commit: 11999351e0f59d0dcf2ed3b2cf661dcc2e4f0edd (patch)
tree: 76f8dbd8cf19727a8a0393b41a7b5ba331f2a867 /doc/tutorial
parent: 4fec1798d0d7177c6d15b825d69f2db1a2e814ff (diff)
5 files changed, 191 insertions, 2 deletions
diff --git a/doc/tutorial/ext.md b/doc/tutorial/ext.md
new file mode 100644
index 00000000000..ff2589b8443
--- /dev/null
+++ b/doc/tutorial/ext.md
@@ -0,0 +1,3 @@
+# Syntax extension
+
+FIXME to be written
diff --git a/doc/tutorial/ffi.md b/doc/tutorial/ffi.md
index 1b000bb7298..797589c3599 100644
--- a/doc/tutorial/ffi.md
+++ b/doc/tutorial/ffi.md
@@ -1,3 +1,186 @@
 # Interacting with foreign code
 
-FIXME to be written
+One of Rust's aims, as a systems programming language, is to
+interoperate well with C code.
+
+We'll start with an example. It's a bit bigger than usual, and
+contains a number of new concepts. We'll go over it one piece at a
+time.
+
+This is a program that uses OpenSSL's `SHA1` function to compute the
+hash of its first command-line argument, which it then converts to a
+hexadecimal string and prints to standard output. If you have the
+OpenSSL libraries installed, it should 'just work'.
+
+    use std;
+    import std::{vec, str};
+    
+    native "cdecl" mod ssl {
+        fn SHA1(src: *u8, sz: uint, out: *u8) -> *u8;
+    }
+    
+    fn as_hex(data: [u8]) -> str {
+        let acc = "";
+        for byte in data { acc += #fmt("%02x", byte as uint); }
+        ret acc;
+    }
+
+    fn sha1(data: str) -> str unsafe {
+        let bytes = str::bytes(data);
+        let hash = ssl::SHA1(vec::unsafe::to_ptr(bytes),
+                             vec::len(bytes), std::ptr::null());
+        ret as_hex(vec::unsafe::from_buf(hash, 20u));
+    }
+    
+    fn main(args: [str]) {
+        std::io::println(sha1(args[1]));
+    }
+
+## Native modules
+
+Before we can call `SHA1`, we have to declare it. That is what this
+part of the program is responsible for:
+
+    native "cdecl" mod ssl {
+        fn SHA1(src: *u8, sz: uint, out: *u8) -> *u8;
+    }
+
+A `native` module declaration tells the compiler that the program
+should be linked with a library by that name, and that the listed
+functions are available in that library.
+
+In this case, it'll change the name `ssl` to a shared library name in
+a platform-specific way (`libssl.so` on Linux, for example), and link
+that in. If you want the module to have a different name from the
+actual library, you can say `native "cdecl" mod something = "ssl" {
+... }`.
+
+The `"cdecl"` string indicates the calling convention to use for
+functions in this module. Most C libraries use cdecl as their calling
+convention. You can also specify `"x86stdcall"` to use stdcall
+instead.
+
+FIXME: Mention c-stack variants? Are they going to change?
+
+## Unsafe pointers
+
+The native `SHA1` function is declared to take three arguments, and
+return a pointer.
+
+    fn SHA1(src: *u8, sz: uint, out: *u8) -> *u8;
+
+When declaring the argument types to a foreign function, the Rust
+compiler has no way to check whether your declaration is correct, so
+you have to be careful. If you get the number or types of the
+arguments wrong, you're likely to get a segmentation fault. Or,
+probably even worse, your code will work on one platform, but break on
+another.
+
+In this case, `SHA1` is defined as taking two `unsigned char*`
+arguments and one `unsigned long`. The Rust equivalents are `*u8`
+unsafe pointers and a `uint` (which, like `unsigned long`, is a
+machine-word-sized type).
+
+Unsafe pointers can be created through various functions in the
+standard library, usually with `unsafe` somewhere in their name. You
+can dereference an unsafe pointer with the `*` operator, but use
+caution: unlike Rust's other pointer types, unsafe pointers are
+completely unmanaged, so they might point at invalid memory, or be
+null pointers.
+
+## Unsafe blocks
+
+The `sha1` function is the most obscure part of the program.
+
+    fn sha1(data: str) -> str unsafe {
+        let bytes = str::bytes(data);
+        let hash = ssl::SHA1(vec::unsafe::to_ptr(bytes),
+                             vec::len(bytes), std::ptr::null());
+        ret as_hex(vec::unsafe::from_buf(hash, 20u));
+    }
+
+Firstly, what does the `unsafe` keyword at the top of the function
+mean? `unsafe` is a block modifier: it declares that the block
+following it is known to be unsafe.
+
+Some operations, like dereferencing unsafe pointers or calling
+functions that have been marked unsafe, are only allowed inside unsafe
+blocks. With the `unsafe` keyword, you're telling the compiler 'I know
+what I'm doing'. The main motivation for such an annotation is that
+when you have a memory error (and you will, if you're using unsafe
+constructs), you have some idea where to look—it will most likely be
+caused by some unsafe code.
+
+Unsafe blocks isolate unsafety. Unsafe functions, on the other hand,
+advertise it to the world. An unsafe function is written like this:
+
+    unsafe fn kaboom() { log "I'm harmless!"; }
+
+This function can only be called from an unsafe block or another
+unsafe function.
+
+## Pointer fiddling
+
+The standard library defines a number of helper functions for dealing
+with unsafe data, casting between types, and generally subverting
+Rust's safety mechanisms.
+
+Let's look at our `sha1` function again.
+
+    let bytes = str::bytes(data);
+    let hash = ssl::SHA1(vec::unsafe::to_ptr(bytes),
+                         vec::len(bytes), std::ptr::null());
+    ret as_hex(vec::unsafe::from_buf(hash, 20u));
+
+The `str::bytes` function is perfectly safe: it converts a string to
+a `[u8]`. This byte vector is then fed to `vec::unsafe::to_ptr`, which
+returns an unsafe pointer to its contents.
+
+This pointer will become invalid as soon as the vector it points into
+is cleaned up, so you should be very careful how you use it. In this
+case, the local variable `bytes` outlives the pointer, so we're good.
+
+Passing a null pointer as the third argument to `SHA1` causes it to
+use a static buffer, and thus saves us the effort of allocating memory
+ourselves. `ptr::null` is a generic function that will return an
+unsafe null pointer of the correct type (Rust generics are awesome
+like that: they can take the right form depending on the type that
+they are expected to return).
+
+Finally, `vec::unsafe::from_buf` builds up a new `[u8]` from the
+unsafe pointer that was returned by `SHA1`. SHA1 digests are always
+twenty bytes long, so we can pass `20u` for the length of the new
+vector.
+
+## Passing structures
+
+C functions often take pointers to structs as arguments. Since Rust
+records are binary-compatible with C structs, Rust programs can call
+such functions directly.
+
+This program uses the POSIX function `gettimeofday` to get a
+microsecond-resolution timer.
+
+    use std;
+    type timeval = {tv_sec: u32, tv_usec: u32};
+    native "cdecl" mod libc = "" {
+        fn gettimeofday(tv: *mutable timeval, tz: *()) -> i32;
+    }
+    fn unix_time_in_microseconds() -> u64 unsafe {
+        let x = {tv_sec: 0u32, tv_usec: 0u32};
+        libc::gettimeofday(std::ptr::addr_of(x), std::ptr::null());
+        ret (x.tv_sec as u64) * 1000_000_u64 + (x.tv_usec as u64);
+    }
+
+The `libc = ""` sets the name of the native module to the empty string
+to prevent the Rust compiler from trying to link it. The standard C
+library is already linked with Rust programs.
+
+A `timeval`, in C, is a struct with two integer fields, taken here to be
+32 bits each (the widths vary by platform). Thus, we define a record type
+with the same contents, and declare `gettimeofday` to take a pointer to it.
+
+The second argument to `gettimeofday` (the time zone) is not used by
+this program, so it simply declares it to be a pointer to the nil
+type. Since null pointers look the same, no matter which type they are
+supposed to point at, this is safe.
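
Note for readers on current Rust: the `native` module syntax in the ffi.md text above dates from 2011 and no longer compiles. As a rough sketch only, the `gettimeofday` example and the `vec::unsafe::from_buf` step might look as follows today. The names `Timeval` and `from_buf` are illustrative rather than taken from any library. The struct layout assumes a platform such as 64-bit Linux, where both `timeval` fields are C `long`, and the `unsafe extern` form assumes a recent compiler (Rust 1.82 or later).

```rust
use std::ffi::{c_int, c_long, c_void};
use std::ptr;

// C's `struct timeval`, laid out the way a C compiler would lay it out.
// Assumption: both fields are C `long`, as on 64-bit Linux; other
// platforms may use different field widths.
#[repr(C)]
struct Timeval {
    tv_sec: c_long,
    tv_usec: c_long,
}

// The C library is already linked into Rust programs, so no `#[link]`
// attribute is needed for this declaration.
unsafe extern "C" {
    fn gettimeofday(tv: *mut Timeval, tz: *mut c_void) -> c_int;
}

/// Microseconds since the Unix epoch, or `None` if the call fails.
fn unix_time_in_microseconds() -> Option<u64> {
    let mut tv = Timeval { tv_sec: 0, tv_usec: 0 };
    // SAFETY: `tv` is a live, writable `Timeval` for the whole call, and
    // `gettimeofday` accepts a null time-zone pointer.
    let status = unsafe { gettimeofday(&mut tv, ptr::null_mut()) };
    if status != 0 {
        return None;
    }
    Some(tv.tv_sec as u64 * 1_000_000 + tv.tv_usec as u64)
}

/// Copies `len` bytes out of a foreign buffer into an owned vector,
/// playing the role that `vec::unsafe::from_buf` plays in the tutorial.
///
/// # Safety
/// `buf` must point to at least `len` readable bytes.
unsafe fn from_buf(buf: *const u8, len: usize) -> Vec<u8> {
    // SAFETY: the caller guarantees that `buf` is valid for `len` bytes.
    let bytes = unsafe { std::slice::from_raw_parts(buf, len) };
    bytes.to_vec()
}
```

`unix_time_in_microseconds` keeps its `unsafe` block private and offers a safe function to callers, while `from_buf` is itself an `unsafe fn`, so every caller has to wrap the call in an `unsafe` block. This mirrors the split between unsafe blocks and unsafe functions described in the tutorial text.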
diff --git a/doc/tutorial/order b/doc/tutorial/order
index bf621816d47..7b66416f291 100644
--- a/doc/tutorial/order
+++ b/doc/tutorial/order
@@ -8,5 +8,6 @@ args
 generic
 mod
 ffi
+ext
 task
 test
diff --git a/doc/tutorial/syntax.md b/doc/tutorial/syntax.md
index 1ccd49760ec..bcdcac68101 100644
--- a/doc/tutorial/syntax.md
+++ b/doc/tutorial/syntax.md
@@ -1,5 +1,7 @@
 # Syntax Basics
 
+FIXME: briefly mention syntax extensions, #fmt
+
 ## Braces
 
 Assuming you've programmed in any C-family language (C++, Java,
diff --git a/doc/tutorial/web/default.css b/doc/tutorial/web/default.css
index e68f0fb378c..aff4f66f386 100644
--- a/doc/tutorial/web/default.css
+++ b/doc/tutorial/web/default.css
@@ -4,7 +4,7 @@
 .cm-s-default span.cm-def {color: #00f;}
 .cm-s-default span.cm-variable {color: black;}
 .cm-s-default span.cm-variable-2 {color: #05a;}
-.cm-s-default span.cm-variable-3 {color: #0a5;}
+.cm-s-default span.cm-variable-3 {color: #085;}
 .cm-s-default span.cm-property {color: black;}
 .cm-s-default span.cm-operator {color: black;}
 .cm-s-default span.cm-comment {color: #a50;}