CakeParse

CakeParse is a lexer and recursive descent parser combinator library for Kotlin. It can parse any LL(*) grammer, and contains line/column number error reporting. A calculator expression parser example is included.

Usage

Step 0: Setup

Check the version badge for the latest version number: Version badge

Gradle

repositories {
  jcenter()
}

dependencies {
  compile "me.sargunvohra.lib:CakeParse:<version>"
}

Kobalt

dependencies {
  compile("me.sargunvohra.lib.CakeParse:<version>")
}

Step 1: Tokens

Define your tokens by name and regex. These make up the building blocks of your parser rules. Whitespace can be ignored by defining a token with ignore = true.

import me.sargunvohra.lib.cakeparse.api.*

val number = token("number", "[0-9]+")
val plus = token("plus", "\\+")
val space = token("space", "[ \\t\\r]+", ignore = true)
// ...

Step 2: Productions

Define your productions using the the combinators and other methods.

val parenExpr = lPar then expr before rPar
val primExpr = number map { it.raw.toInt() } or parenExpr
val goal = oneOrMore(number)
// ...

The available combinators are:

There are also a number of convenience parsers:

Step 3: Parse

Pass your input through a lexer, and then a parser.

val tokens = setOf(token1, token2, token3)
try {
  val input = System.in
  val result = tokens.lexer().lex(input).parseToEnd(goal).value
  println("Result: $result")
} catch (e: LexerException) {
  System.err.println("Lexing error: ${e.message}")
} catch (e: ParserException) {
  System.err.println("Parsing error: ${e.message}")
}

Note: Recursion

If you have a rule or system of rules that involves recursion, then you'll have to use the ref function to make it work. Be sure to explicitly specify the type of a ref. This is needed because the compiler can't infer the type of a recursive reference.

// put all tokens first
val lParen = token("lParen", "\\(")
val rParen = token("rParen", "\\)")

// then put all refs
val parenRef: Parser<Token> = ref({ paren })

// then put all rules
val paren = (lParen then parenRef before rParen) or empty()

The best way to do this is to put your goal rule at the bottom, all rules that depend on it above, all rules that depend on those above those, etc. Anywhere a rule refers to itself or the rule below, use a ref.