Alan Jeffrey, Mozilla
Regular parsing is all-at-once:
let parser = ···;
let mut test_data = "(x + 37) + y".chars();
match parser.init(&mut test_data).unwrap() {
Done(parsed) ⇒ ··· // yay, parsed it!
}
Streaming parsing is bit-by-bit:
let mut test_data1 = "(x + ".chars();
let mut test_data2 = "37) + y".chars();
match parser.init(&mut test_data1).unwrap() {
Continue(parsing) ⇒ match parsing.more(&mut test_data2) {
Done(parsed) ⇒ ··· // yay, parsed it!
} ··· }
Streaming parsing is painful.
Input arrives in chunks (e.g. from the network).
Pain: save/restore parser state at chunk boundaries.
Gain: much less buffering.
We’d like the parsing library to use zero heap.
User code (e.g. building an AST) can do what it likes.
Can the library work with no memory allocation?
No.
Contradicts goal 1.
Remember save/restore parser state at chunk boundaries?
That state has to go somewhere.
Can’t go on the stack (not fixed size).
This just leaves the heap.
No memory allocation between chunk boundaries.
Parser libraries often have complex types.
For example, the parser “foo.or_else(bar).and_then(baz)”
has type “AndThen〈OrElse〈Foo,Bar〉,Baz〉”.
This is very similar to the Rust iterator library.
Users should not have to see complex types.
Yay type inference!
Dynamic method calls can be expensive.
Can we do everything with static method calls?
AFAIK no, at least not in Rust.
Contradicts previous goal.
Rust does static method dispatch by type.
You write “this.that()”,
rustc works out (this: SomeType)
and looks up (impl ThatTrait for SomeType),
at compile time.
Imagine a parser for expressions like "(x + 37) + y".
Expressions are arbitrarily deep, so the parser is recursive.
Imagine a type Exp1〈P〉 that can parse expressions of depth 1,
and uses parser P each time it reaches parentheses.
We’re looking for the type Exp1〈Exp1〈Exp1〈···〉〉〉.
Yay, Rust has recursive types, using Box.
struct Exp(Box〈Exp1〈Exp〉〉);
Yay! Recursive parsers, static dispatch, we’re good?
Remember Users should not have to see complex types?
struct Exp(Box〈Exp1〈Exp〉〉);
Exp1 is one of those complex types.
AFAIK, Rust doesn’t allow type inference to make these invisible,
(e.g. in supertrait type arguments or associated type definitions).
Trait objects to the rescue!
struct Exp(Box〈SomeParserTrait〉);
Yay: parser traits much are simpler than the complex parser types.
But: trait objects use dynamic dispatch.
No dynamic method calls between chunk boundaries.
let ALPHABETIC = character(char::is_alphabetic);
let ALPHANUMERIC = character(char::is_alphanumeric);
let parser = ALPHABETIC
.and_then(ALPHANUMERIC.star(String::new));
let mut test_data = "abc123!".chars();
match parser.init(&mut test_data).unwrap() {
Done(result) ⇒ assert_eq!(result, ('a', "bc123")), ...
}
assert_eq!(test_data.as_str(), "!");
let parser = ALPHABETIC
.and_then(ALPHANUMERIC.star(ignore))
.buffer(); // Returns a Cow<'a,str>
let mut test_data = "abc123!".chars();
match parser.init(&mut test_data).unwrap() {
Done(result) ⇒ assert_eq!(result, Borrowed("abc123")), ...
}
let parser = ALPHABETIC
.and_then(ALPHANUMERIC.star(ignore))
.buffer(); // Returns a Cow<'a,str>
let mut test_data1 = "abc".chars();
let mut test_data2 = "123!".chars();
match parser.init(&mut test_data1).unwrap() {
Continue(parsing) ⇒ match parsing.more(&mut test_data2) {
Done(result) ⇒ assert_eq!(result, Owned(String::from("abc123"))), ...
}, ... }
trait Stateful〈Ch, Str〉where Str: Iterator〈Item=Ch〉{
type Output;
fn more(self, data: &mut Str) → ParseResult〈Self, Self::Output〉;
}
enum ParseResult〈State, Output〉{
Continue(State),
Done(Output),
}