OCaml Planet

The OCaml Planet aggregates various blogs from the OCaml community. If you would like to be added, read the Planet syndication HOWTO.

927 blog posts are available. You can read the 30 more recent ones below or view older ones.

Proving a mem/map property — Shayne Fletcher, May 11, 2017

Here are two well known "classic" functions over polymorphic lists.

map f l computes a new list from l by applying f to each of its elements.


        let rec map (f : α → β) : α list → β list = function
        | [] → []
        | h :: t → f h :: map f t
        ;;

mem x l returns true is x is an element of l and returns false if it is not.


        let rec mem (a : α) : …

Here are two well known "classic" functions over polymorphic lists.

map f l computes a new list from l by applying f to each of its elements.


        let rec map (f : α → β) : α list → β list = function
        | [] → []
        | h :: t → f h :: map f t
        ;;

mem x l returns true is x is an element of l and returns false if it is not.


        let rec mem (a : α) : α list → bool  = function
        | [] → false
        | x :: l → a = x || mem a l
        ;;

If y is an element of the list obtained by mapping f over l then there must be an element x in l such that f x = y. Conversely, if there exists an x in l such that y = f x, then y must be a member of the list obtained by mapping f over l.

We attempt a proof of correctness of the given definitions with respect to this property.

Lemma mem_map_iff:


    ∀ (f : α → β) (l : α list) (y : β),
        mem y (map f l) ⇔ ∃(x : α), f x = y ∧ mem x l.

Proof:

We first treat the forward implication
```
    ∀ (f : α → β) (l : α list) (y : β), 
      mem y (map f l) ⇒ ∃(x : α), f x = y ∧ mem x l
        
```
and proceed by induction on l.
- l = []:
  - Show mem y (map f []) ⇒ ∃(x : α), f x = y ∧ mem x [].
  - mem y (map f []) ≡ False.
  - Proof follows (ex falso quodlibet).
- l has form x' :: l (use l now to refer to the tail):
  - Assume the induction hypothesis:
    - mem y (map f l) ⇒ ∃x, f x = y ∧ mem x l.
  - We are required to show for an arbitrary (x' : α):
    - mem y (map f (x' :: l)) ⇒ ∃(x : α), f x = y ∧ mem x (x' :: l).
  - By simplification, we can rewrite the above to:
    - f x' = y ∨ mem y (map f l) ⇒ ∃(x : α), f x = y ∧ (x' = x ∨ mem x l).
  - We assume then an (x' : α) and a (y : β) such that:
    1. f x' = y ∨ mem y (map f l).
    2. mem y (map f l) ⇒ ∃(x : α), f x = y ∧ mem x l.
  - Show ∃(x : α), f x = y ∧ (x' = x ∨ mem x l).
    - First consider f x' = y in (1).
      - Take x = x' in the goal.
      - Then by (1) f x = y ∧ x = x'.
      - So x' is a witness.
    - Now consider mem y (map f l) in (1).
      - ∃(x^* : α), f x^* = y ∧ mem x^* l by (2).
      - Take x = x^* in the goal.
      - By the above f x^* = y ∧ mem x^* l
      - So x^* is a witness.
We now work on the reverse implication. We want to show that
```
    ∀ (f : α → β) (l : α list) (y : β),
       ∃(x : α), f x = y ∧ mem x l ⇒ mem y (map f l)
      
```
and proceed by induction on l.
- l = []:
  - Assume x, y with f x = y ∧ mem x [].
  - Show mem y (map f []):
- l has form x' :: l (use l now to refer to the tail):
  - Assume the induction hypothesis:
    - ∃(x : α), f x = y ∧ mem x l ⇒ mem y (map f l).
  - We are required to show for an arbitrary (x' : α):
    - ∃ (x : α), f x = y ∧ mem x (x' :: l) ⇒ mem y (map f (x' :: l))
  - By simplification, we can rewrite the above to:
    - ∃ (x : α), f x = y ∧ x = x' ∨ mem x l ⇒ f x' = y ∨ mem y (map f l).
  - Assume the goal and induction hypotheses:
    - There is (x : α) and (y : β) such that:
      1. f x = y ∧ (x = x' ∨ mem x l)
      2. f x = y ∧ mem x l ⇒ mem y (map f l)
  - Show f x' = y ∨ mem y (map f l).
    - Assume x = x' in (1) and show f x' = y:
      - Since, f x = y is given by (1.), f x' = y.
    - Assume mem x l in (1) and show mem y (map f l):
      - Rewrite mem y (map f l) via (2) to f x = y ∧ mem x l.
      - f x = y by (1) so mem y (map f l).

∎

References:
"Sofware Foundations" -- Pierce et. al.

Hide

New opam features: more expressive dependencies — OCamlPro, May 11, 2017

This blog will cover yet another aspect of the improvements opam 2.0 has over opam 1.2. It may be a little more technical than previous issues, as it covers a feature directed specifically at packagers and repository maintainers, and regarding the package definition format.

Specifying dependencies in opam 1.2

Opam 1.2 already has an advanced way of specifying package dependencies, using formulas on packages and versions, with the following syntax:

depends: [
  "foo" {>= "3.0&q…

Specifying dependencies in opam 1.2

Opam 1.2 already has an advanced way of specifying package dependencies, using formulas on packages and versions, with the following syntax:

depends: [
  "foo" {>= "3.0" & < "4.0~"}
  ( "bar" | "baz" {>= "1.0"} )
]

meaning that the package being defined depends on both package foo, within the 3.x series, and one of bar or baz, the latter with version at least 1.0. See here for a complete documentation.

This only allows, however, dependencies that are static for a given package.

Opam 1.2 introduced build, test and doc “dependency flags” that could provide some specifics for dependencies (e.g. test dependencies would only be needed when tests were requested for the package). These were constrained to appear before the version constraints, e.g. "foo" {build & doc & >= "3.0"}.

Extensions in opam 2.0

Opam 2.0 generalises the dependency flags, and makes the dependencies specification more expressive by allowing to mix filters, i.e. formulas based on opam variables, with the version constraints. If that formula holds, the dependency is enforced, if not, it is discarded.

This is documented in more detail in the opam 2.0 manual.

Note also that, since the compilers are now packages, the required OCaml version is now expressed using this mechanism as well, through a dependency to the (virtual) package ocaml, e.g. depends: [ "ocaml" {>= "4.03.0"} ]. This replaces uses of the available: field and ocaml-version switch variable.

Conditional dependencies

This makes it trivial to add, for example, a condition on the OS to a given dependency, using the built-in variable os:

depends: [ "foo" {>= "3.0" & < "4.0~" & os = "linux"} ]

here, foo is simply not needed if the OS isn't Linux. We could also be more specific about other OSes using more complex formulas:

depends: [
  "foo" { "1.0+linux" & os = "linux" |
          "1.0+osx" & os = "darwin" }
  "bar" { os != "osx" & os != "darwin" }
]

Meaning that Linux and OSX require foo, respectively versions 1.0+linux and 1.0+osx, while other systems require bar, any version.

Dependency flags

Dependency flags, as used in 1.2, are no longer needed, and are replaced by variables that can appear anywhere in the version specification. The following variables are typically useful there:

with-test, with-doc: replace the test and doc dependency flags, and are true when the package's tests or documentation have been requested
likewise, build behaves similarly as before, limiting the dependency to a “build-dependency”, implying that the package won't need to be rebuilt if the dependency changes
dev: this boolean variable holds true on “development” packages, that is, packages that are bound to a non-stable source (a version control system, or if the package is pinned to an archive without known checksum). dev sources often happen to need an additional preliminary step (e.g. autoconf), which may have its own dependencies.

Use opam config list for a list of pre-defined variables. Note that the with-test, with-doc and build variables are not available everywhere: the first two are allowed only in the depends:, depopts:, build: and install: fields, and the latter is specific to the depends: and depopts: fields.

For example, the datakit.0.9.0 package has:

depends: [
  ...
  "datakit-server" {>= "0.9.0"}
  "datakit-client" {with-test & >= "0.9.0"}
  "datakit-github" {with-test & >= "0.9.0"}
  "alcotest" {with-test & >= "0.7.0"}
]

When running opam install datakit.0.9.0, the with-test variable is set to false, and the datakit-client, datakit-github and alcotest dependencies are filtered out: they won't be required. With opam install datakit.0.9.0 --with-test, the with-test variable is true (for that package only, tests on packages not listed on the command-line are not enabled!). In this case, the dependencies resolve to:

depends: [
  ...
  "datakit-server" {>= "0.9.0"}
  "datakit-client" {>= "0.9.0"}
  "datakit-github" {>= "0.9.0"}
  "alcotest" {>= "0.7.0"}
]

which is treated normally.

Computed versions

It is also possible to use variables, not only as conditions, but to compute the version values: "foo" {= var} is allowed and will require the version of package foo corresponding to the value of variable var.

This is useful, for example, to define a family of packages, which are released together with the same version number: instead of having to update the dependencies of each package to match the common version at each release, you can leverage the version package-variable to mean “that other package, at the same version as current package”. For example, foo-client could have the following:

depends: [ "foo-core" {= version} ]

It is even possible to use variable interpolations within versions, e.g. specifying an os-specific version differently than above:

depends: [ "foo" {= "1.0+%{os}%"} ]

this will expand the os variable, resolving to 1.0+linux, 1.0+darwin, etc.

Getting back to our datakit example, we could leverage this and rewrite it to the more generic:

depends: [
  ...
  "datakit-server" {>= version}
  "datakit-client" {with-test & >= version}
  "datakit-github" {with-test & >= version}
  "alcotest" {with-test & >= "0.7.0"}
]

Since the datakit-* packages follow the same versioning, this avoids having to rewrite the opam file on every new version, with a risk of error each time.

As a side note, these variables are consistent with what is now used in the build: field, and the build-test: field is now deprecated. So this other part of the same datakit opam file:

build:
  ["ocaml" "pkg/pkg.ml" "build" "--pinned" "%{pinned}%" "--tests" "false"]
build-test: [
  ["ocaml" "pkg/pkg.ml" "build" "--pinned" "%{pinned}%" "--tests" "true"]
  ["ocaml" "pkg/pkg.ml" "test"]
]

would now be preferably written as:

build:
  ["ocaml" "pkg/pkg.ml" "build" "--pinned" "%{pinned}%" "--tests" "%{with-test}%"]
run-test:
  ["ocaml" "pkg/pkg.ml" "test"]

which avoids building twice just to change the options.

Conclusion

Hopefully this extension to expressivity in dependencies will make the life of packagers easier; feedback is welcome on your personal use-cases.

Note that the official repository is still in 1.2 format (served as 2.0 at https://opam.ocaml.org/2.0, through automatic conversion), and will only be migrated a little while after opam 2.0 is finally released. You are welcome to experiment on custom repositories or pinned packages already, but will need a little more patience before you can contribute package definitions making use of the above to the official repository.

NOTE: this article is cross-posted on opam.ocaml.org and ocamlpro.com. Please head to the latter for the comments!

Hide

Weekly News — OCaml Weekly News, May 09, 2017

Full Time: Software Developer (Functional Programming) at Jane Street in New York, NY; London, UK; Hong Kong — GitHub Jobs, May 08, 2017

Software Developer

Jane Street is a proprietary quantitative trading firm, focusing primarily on trading equities and equity derivatives. We use innovative technology, a scientific approach, and a deep understanding of markets to stay successful in our highly competitive field. We operate around the clock and around the globe, employing over 400 people in offices in New York, London and Hong Kong.

The markets in which we trade change rapidly, but our intellectual approach changes faster still…

Software Developer

The markets in which we trade change rapidly, but our intellectual approach changes faster still. Every day, we have new problems to solve and new theories to test. Our entrepreneurial culture is driven by our talented team of traders and programmers. At Jane Street, we don't come to work wanting to leave. We come to work excited to test new theories, have thought-provoking discussions, and maybe sneak in a game of ping-pong or two. Keeping our culture casual and our employees happy is of paramount importance to us.

We are looking to hire great software developers with an interest in functional programming. OCaml, a statically typed functional programming language with similarities to Haskell, Scheme, Erlang, F# and SML, is our language of choice. We've got the largest team of OCaml developers in any industrial setting, and probably the world's largest OCaml codebase. We use OCaml for running our entire business, supporting everything from research to systems administration to trading systems. If you're interested in seeing how functional programming plays out in the real world, there's no better place.

The atmosphere is informal and intellectual. There is a focus on education, and people learn about software and trading, both through formal classes and on the job. The work is challenging, and you get to see the practical impact of your efforts in quick and dramatic terms. Jane Street is also small enough that people have the freedom to get involved in many different areas of the business. Compensation is highly competitive, and there's a lot of room for growth.

You can learn more about Jane Street and our technology from our main site, janestreet.com. You can also look at a a talk given at CMU about why Jane Street uses functional programming (http://ocaml.janestreet.com/?q=node/61), and our programming blog (http://ocaml.janestreet.com).

We also have extensive benefits, including:

90% book reimbursement for work-related books
90% tuition reimbursement for continuing education
Excellent, zero-premium medical and dental insurance
Free lunch delivered daily from a selection of restaurants
Catered breakfasts and fresh brewed Peet's coffee
An on-site, private gym in New York with towel service
Kitchens fully stocked with a variety of snack choices
Full company 401(k) match up to 6% of salary, vests immediately
Three weeks of paid vacation for new hires in the US
16 weeks fully paid maternity/paternity leave for primary caregivers, plus additional unpaid leave

More information at http://janestreet.com/culture/benefits/

Hide

Preprocessor extensions for code generation — Shayne Fletcher, May 04, 2017

PPX

Preprocessor extensions for code generation

"A Guide to Extension Points in OCaml"[1] provides a great "quick-start" on using the OCaml extension points API to implement preprocessor extensions for abstract syntax tree rewrites. This post picks up where that tutorial leaves off by showing how to write a ppx that does code generation.

The problem treated here is one posed in Whitequark's blog : "Implement a syntax extension that would accept type declarations of th…

PPX

Preprocessor extensions for code generation

The problem treated here is one posed in Whitequark's blog : "Implement a syntax extension that would accept type declarations of the form type t = A [@id 1] | B of int [@id 4] [@@id_of] to generate a function mapping a value of type t to its integer representation."

Implementing the "`id_of`" ppx

The basic strategy

In the OCaml parse tree, structures are lists of structure items. Type declarations are structure items as are let-bindings to functions.

In this program, analysis of an inductive type declaration t may result in the production of a new structure item, the AST of an of_id function to be appended to the structure containing t.

Now the general strategy in writing a ppx is to provide a record of type Ast_mapper.mapper. That record is based on the Ast_mapper.default_mapper record but selectively overriding those fields for those sytactic categories that the ppx is intending to transform.

Now, as we determined above, the effect of the ppx is to provide a function from a structure to a new structure. Accordingly, at a minimum then we'll want to override the structure field of the default mapper. Schematically then our ppx code will take on the following shape.


  open Ast_mapper
  open Ast_helper
  open Asttypes
  open Parsetree
  open Longident

  let structure_mapper mapper structure =
    ...

   let id_of_mapper =  {
     default_mapper with structure = structure_mapper
   }

   let () = register "id_of" id_of_mapper

This program goes just a little bit further though. Any @id or @@id_of attributes that get as far as the OCaml compiler would be ignored. So, it's not neccessary that they be removed by our ppx once they've been acted upon but it seems tidy to do so. Accordingly, there are two more syntactic constructs that this ppx operates on.


  open Ast_mapper
  open Ast_helper
  open Asttypes
  open Parsetree
  open Longident

  let structure_mapper mapper structure =
    ...

  let type_declaration_mapper mapper decl =
    ...

  let constructor_declaration_mapper mapper decl =
    ...

  let id_of_mapper argv = {
    default_mapper with
      structure = structure_mapper;
      type_declaration = type_declaration_mapper;
      constructor_declaration = constructor_declaration_mapper
  }

Implementing the mappings

To warm up, lets start with the easy mappers.

The role of type_declaration_mapper is a function from a type_declaration argument to a type_declaration result that is the argument in all but that any @@id_of attribute has been removed.


  let type_declaration_mapper 
      (mapper : mapper) 
      (decl : type_declaration) : type_declaration  =
    match decl with
      (*Case of an inductive type "t"*)
    | {ptype_name = {txt = "t"; _};
       ptype_kind = Ptype_variant constructor_declarations;
       ptype_attributes;_} ->
      let (_, attrs) = 
        List.partition (fun ({txt;_},_) ->txt="id_of") ptype_attributes in
      {(default_mapper.type_declaration mapper decl) 
      with ptype_attributes=attrs}
    (*Not an inductive type named "t"*)
    | _ -> default_mapper.type_declaration mapper decl

constructor_declaration_mapper is analogous to type_declaration_mapper above but this time its @id attributes that are removed.


  let constructor_declaration_mapper 
      (mapper : mapper) 
      (decl : constructor_declaration) : constructor_declaration =
    match decl with
    | {pcd_attributes; _} ->
      let (_, attrs) = 
        List.partition (fun ({txt;_}, _) -> txt="id") pcd_attributes  in
      {(default_mapper.constructor_declaration mapper decl) 
      with pcd_attributes=attrs}

Now to the raison d'etre of the ppx, structure_mapper.

First, a utility function that computes from a constructor_declaration with an @id attribute, a (function) case for it. For example, suppose "Bar of int [@id 4]" is the constructor declaration, then the case to be computed is the AST corresponding to the code "| Bar _ -> 4".


  let case_of_constructor_declaration : 
      constructor_declaration -> case =  function
    | {pcd_name={txt;loc};pcd_args;pcd_attributes; _} ->
      match List.filter (fun ({txt;_}, _) -> txt="id") pcd_attributes with
      (*No "@id"*)
      | [] -> 
        raise (Location.Error (Location.error ~loc "[@id] : Missing"))
      (*Single "@id"*)
      | [(_, payload)] -> 
        begin match payload with 
          | PStr [{pstr_desc=Pstr_eval ({pexp_desc=
              Pexp_constant (Pconst_integer (id, None)); _}, _)
            }] ->
            Exp.case 
              (Pat.construct 
                 {txt=Lident txt; loc=(!default_loc)}  
                 (match pcd_args with 
                 | Pcstr_tuple [] -> None | _ -> Some (Pat.any ())))
              (Exp.constant (Pconst_integer (id, None)))
          | _ -> 
            raise (Location.Error (Location.error ~loc 
            "[@id] : Bad (or missing) argument (should be int e.g. [@id 4])"))
        end
      (*Many "@id"s*)
      | (_ :: _) -> 
        raise (Location.Error (Location.error ~loc 
        "[@id] : Multiple occurences"))

One more utility function is required.

eval_structure_item item acc computes structure items to push on the front of acc. If item is a single declaration of an inductive type t attributed with @@id_of, then two structure items will be produced : one for t and one synthesized for t's of_id function. In all other cases, just one structure item will be pushed onto acc.


  let eval_structure_item
      (mapper : mapper) 
      (item : structure_item) 
      (acc : structure) : structure =
    match item with
    (*Case of a single inductive type declaration*)
    | { pstr_desc = Pstr_type (_, [type_decl]); pstr_loc} ->
      begin
        match type_decl with
        (*Case where the type identifer is [t]*)
        | {ptype_name = {txt = "t"; _};
           ptype_kind = Ptype_variant constructor_declarations;
           ptype_attributes;
           _} ->
          begin
            match List.filter (fun ({txt;_},_) ->txt="id_of") 
              ptype_attributes 
            with
            (*No [@@id_of]*)
            | [] -> default_mapper.structure_item mapper item :: acc
  
            (*At least one [@@id_of] (treat multiple occurences as if
              one)*)
            | _ ->
              (*Cases of an [id_of] function for [t], one for each
                of its constructors*)
              let cases= 
                List.fold_right 
                  (fun x acc -> 
                    case_of_constructor_declaration x :: acc) 
                  constructor_declarations [] in
              (*The [id_of] function itself*)
              let id_of : structure_item = 
                Str.value Nonrecursive [
                  Vb.mk 
                    (Pat.var {txt="id_of"; loc=(!default_loc)}) 
                    (Exp.function_ cases)] in
  
              default_mapper.structure_item mapper item :: id_of :: acc
          end
        (*Case the type identifier is something other than [t]*)
        | _ -> default_mapper.structure_item mapper item :: acc
      end
    (*Case this structure item is something other than a single type
      declaration*)
    | _ -> default_mapper.structure_item mapper item :: acc

Finally we can write structure_mapper itself as a simple fold over a structure.


  let structure_mapper 
      (mapper : mapper) 
      (structure : structure) : structure =
    List.fold_right (eval_structure_item mapper) structure []

Building and testing

So that's it, this preprocessor extension is complete. Assuming the code is contained in a file called ppx_id_of.ml it can be compiled with a command along the lines of the following.


  ocamlc -o ppx_id_of.exe  -I +compiler-libs ocamlcommon.cma ppx_id_of.ml

When built, it can be tested with a command like ocamlc -dsource -ppx ppx_id_of.exe test.ml.

For example, when invoked on the following program,


  type t = 
    | A [@id 2] 
    | B of int [@id 4] [@@id_of]

  module M = struct
    type t =
    | Foo of int [@id 42]
    | Bar [@id 43] [@@id_of]

    module N = struct
       type t = 
       | Baz [@id 8] 
       | Quux of string * int [@id 7] [@@id_of]

      module Q = struct
        type t = 
          | U [@id 0] [@@id_of]
      end
    end
  end

the resulting output is,


  type t =
    | A
    | B of int
  let id_of = function | A  -> 2 | B _ -> 4
  module M =
    struct
      type t =
        | Foo of int
        | Bar
      let id_of = function | Foo _ -> 42 | Bar  -> 43
      module N =
        struct
          type t =
            | Baz
            | Quux of string * int
          let id_of = function | Baz  -> 8 | Quux _ -> 7
          module Q = struct type t =
                              | U
                            let id_of = function | U  -> 0  end
        end
    end

References:
[1] "A Guide to Extension Points in OCaml" -- Whitequark (blog post 2014)

Hide

New opam features: “opam install DIR” — OCamlPro, May 04, 2017

After the opam build feature was announced followed a lot of discussions, mainly having to do with its interface, and misleading name. The base features it offered, though, were still widely asked for:

a way to work directly with the project in the current directory, assuming it contains definitions for one or more packages
a way to copy the installed files of a package below a specified destdir
an easier way to get started hacking on a project, even without an initialised opam

Status of `opam …`

a way to work directly with the project in the current directory, assuming it contains definitions for one or more packages
a way to copy the installed files of a package below a specified destdir
an easier way to get started hacking on a project, even without an initialised opam

Status of `opam build`

opam build, as described in a previous post has been dropped. It will be absent from the next Beta, where the following replaces it.

Handling a local project

Consistently with what was done with local switches, it was decided, where meaningful, to overload the <packages> arguments of the commands, allowing directory names instead, and meaning “all packages defined there”, with some side-effects.

For example, the following command is now allowed, and I believe it will be extra convenient to many:

opam install . --deps-only

What this does is find opam (or <pkgname>.opam) files in the current directory (.), resolve their installations, and install all required packages. That should be the single step before running the source build by hand.

The following is a little bit more complex:

opam install .

This also retrieves the packages defined at ., pins them to the current source (using version-control if present), and installs them. Note that subsequent runs actually synchronise the pinnings, so that packages removed or renamed in the source tree are tracked properly (i.e. removed ones are unpinned, new ones pinned, the other ones upgraded as necessary).

opam upgrade, opam reinstall, and opam remove have also been updated to handle directories as arguments, and will work on “all packages pinned to that target”, i.e. the packages pinned by the previous call to opam install <dir>. In addition, opam remove <dir> unpins the packages, consistently reverting the converse install operation.

opam show already had a --file option, but has also been extended in the same way, for consistency and convenience.

This all, of course, works well with a local switch at ./, but the two features can be used completely independently. Note also that the directory name must be made unambiguous with a possible package name, so make sure to use ./foo rather than just foo for a local project in subdirectory foo.

Specifying a destdir

This relies on installed files tracking, but was actually independent from the other opam build features. It is now simply a new option to opam install:

opam install foo --destdir ~/local/

will install foo normally (if it isn’t installed already) and copy all its installed files, following the same hierarchy, into ~/local. opam remove --destdir is also supported, to remove these files.

Initialising

Automatic initialisation has been dropped for the moment. It was only saving one command (opam init, that opam will kindly print out for you if you forget it), and had two drawbacks:

some important details (like shell setup for opam) were skipped
the initialisation options were much reduced, so you would often have to go back to opam init anyway. The other possibility being to duplicate init options to all commands, adding lots of noise. Keeping things separate has its merits.

Granted, another command, opam switch create ., was made implicit. But using a local switch is a user choice, and worse, in contradiction with the previous de facto opam default, so not creating one automatically seems safer: having to specify --no-autoinit to opam build in order to get the more simple behaviour was inconvenient and error-prone.

One thing is provided to help with initialisation, though: opam switch create <dir> has been improved to handle package definitions at <dir>, and will use them to choose a compatible compiler, as opam build did. This avoids the frustration of creating a switch, then finding out that the package wasn’t compatible with the chosen compiler version, and having to start over with an explicit choice of a different compiler.

If you would really like automatic initialisation, and have a better interface to propose, your feedback is welcome!

More related options

A few other new options have been added to opam install and related commands, to improve the project-local workflows:

opam install --keep-build-dir is now complemented with --reuse-build-dir, for incremental builds within opam (assuming your build-system supports it correctly). At the moment, you should specify both on every upgrade of the concerned packages, or you could set the OPAMKEEPBUILDDIR and OPAMREUSEBUILDDIR environment variables.
opam install --inplace-build runs the scripts directly within the source dir instead of a dedicated copy. If multiple packages are pinned to the same directory, this disables parallel builds of these packages.
opam install --working-dir uses the working directory state of your project, instead of the state registered in the version control system. Don’t worry, opam will warn you if you have uncommitted changes and forgot to specify --working-dir.

NOTE: this article is cross-posted on opam.ocaml.org and ocamlpro.com. Please head to the latter for the comments!

Hide

Weekly News — OCaml Weekly News, May 02, 2017

Looking for a technical writer — Jane Street (Yaron Minsky), May 01, 2017

Jane Street is looking to hire a technical writer. If you're interested, or know someone who you think would be a good match, here's the application link.

We've always believed that developers should spend time and effort documenting their own code, but at the same time, a great writer with a feel for the technology can raise the level of quality in a way that few developers can. And as we've grown, having someone dedicated to writing makes a ton of sense.

Here are the kinds of things we'd like …

Jane Street is looking to hire a technical writer. If you're interested, or know someone who you think would be a good match, here's the application link.

Here are the kinds of things we'd like to have a technical writer work on:

Training material. We have a training program that many new hires go through, including most new developers and all new traders. In that program, they learn about OCaml, our base libraries, our build system, the UNIX shell, Emacs, and our dev tools. Part of the job would be to help make the course better, both by improving what we have, and by adding new material.
Improving library documentation. While we expect developers to do a reasonable job of documenting their code, our most important libraries deserve the time and care to make them really shine. This is aimed both internally and externally, since a lot of these libraries, like Async, Core and Incremental, are open source.
Writing longer pieces. We need more tutorials and overviews on a variety of topics. Part of the work would be to create great new documentation, and part of it is to serve as an example for others as to what good documentation looks like. And where possible, we want to do this so that the documentation effectively compiles against our current APIs, preventing it from just drifting out of date.

In terms of skills, we want someone who is both a clear and effective written communicator, and who is good enough at programming to navigate our codebase, work through our tutorials, and write up examples. An interest in functional programming and expressive type systems is a plus, but you don't need to know any OCaml (the language we use). That's something we're happy to teach you here.

Hide

What do you mean ExceptT doesn't Compose? — Erik de Castro Lopo, Apr 30, 2017

Disclaimer: I work at Ambiata (our Github presence) probably the biggest Haskell shop in the southern hemisphere. Although I mention some of Ambiata's coding practices, in this blog post I am speaking for myself and not for Ambiata. However, the way I'm using ExceptT and handling exceptions in this post is something I learned from my colleagues at Ambiata.

At work, I've been spending some time tracking down exceptions in some of our Haskell code that have been bubbling up to the top level…

At work, I've been spending some time tracking down exceptions in some of our Haskell code that have been bubbling up to the top level an killing a complex multi-threaded program. On Friday I posted a somewhat flippant comment to Google Plus:

Using exceptions for control flow is the root of many evils in software.ï»¿

Lennart Kolmodin who I remember from my very earliest days of using Haskell in 2008 and who I met for the first time at ICFP in Copenhagen in 2011 responded:

Yet what to do if you want composable code? Currently I have
type Rpc a = ExceptT RpcError IO a
which is terrible

But what do we mean by "composable"? I like the wikipedia definition:

ï»¿Composability is a system design principle that deals with the inter-relationships of components. A highly composable system provides recombinant components that can be selected and assembled in various combinations to satisfy specific user requirements.

The ensuing discussion, which also included Sean Leather, suggested that these two experienced Haskellers were not aware that with the help of some combinator functions, ExceptT composes very nicely and results in more readable and more reliable code.

At Ambiata, our coding guidelines strongly discourage the use of partial functions. Since the type signature of a function doesn't include information about the exceptions it might throw, the use of exceptions is strongly discouraged. When using library functions that may throw exceptions, we try to catch those exceptions as close as possible to their source and turn them into errors that are explicit in the type signatures of the code we write. Finally, we avoid using String to hold errors. Instead we construct data types to carry error messages and render functions to convert them to Text.

In order to properly demonstrate the ideas, I've written some demo code and made it available in this GitHub repo. It compiles and even runs (providing you give it the required number of command line arguments) and hopefully does a good job demonstrating how the bits fit together.

So lets look at the naive version of a program that doesn't do any exception handling at all.


  import Data.ByteString.Char8 (readFile, writeFile)

  import Naive.Cat (Cat, parseCat)
  import Naive.Db (Result, processWithDb, renderResult, withDatabaseConnection)
  import Naive.Dog (Dog, parseDog)

  import Prelude hiding (readFile, writeFile)

  import System.Environment (getArgs)
  import System.Exit (exitFailure)

  main :: IO ()
  main = do
    args <- getArgs
    case args of
      [inFile1, infile2, outFile] -> processFiles inFile1 infile2 outFile
      _ -> putStrLn "Expected three file names." >> exitFailure

  readCatFile :: FilePath -> IO Cat
  readCatFile fpath = do
    putStrLn "Reading Cat file."
    parseCat <$> readFile fpath

  readDogFile :: FilePath -> IO Dog
  readDogFile fpath = do
    putStrLn "Reading Dog file."
    parseDog <$> readFile fpath

  writeResultFile :: FilePath -> Result -> IO ()
  writeResultFile fpath result = do
    putStrLn "Writing Result file."
    writeFile fpath $ renderResult result

  processFiles :: FilePath -> FilePath -> FilePath -> IO ()
  processFiles infile1 infile2 outfile = do
    cat <- readCatFile infile1
    dog <- readDogFile infile2
    result <- withDatabaseConnection $ \ db ->
                 processWithDb db cat dog
    writeResultFile outfile result

Once built as per the instructions in the repo, it can be run with:


  dist/build/improved/improved Naive/Cat.hs Naive/Dog.hs /dev/null
  Reading Cat file 'Naive/Cat.hs'
  Reading Dog file 'Naive/Dog.hs'.
  Writing Result file '/dev/null'.

The above code is pretty naive and there is zero indication of what can and cannot fail or how it can fail. Here's a list of some of the obvious failures that may result in an exception being thrown:

Either of the two readFile calls.
The writeFile call.
The parsing functions parseCat and parseDog.
Opening the database connection.
The database connection could terminate during the processing stage.

So lets see how the use of the standard Either type, ExceptT from the transformers package and combinators from Gabriel Gonzales' errors package can improve things.

Firstly the types of parseCat and parseDog were ridiculous. Parsers can fail with parse errors, so these should both return an Either type. Just about everything else should be in the ExceptT e IO monad. Lets see what that looks like:


  {-# LANGUAGE OverloadedStrings #-}
  import           Control.Exception (SomeException)
  import           Control.Monad.IO.Class (liftIO)
  import           Control.Error (ExceptT, fmapL, fmapLT, handleExceptT
                                 , hoistEither, runExceptT)

  import           Data.ByteString.Char8 (readFile, writeFile)
  import           Data.Monoid ((<>))
  import           Data.Text (Text)
  import qualified Data.Text as T
  import qualified Data.Text.IO as T

  import           Improved.Cat (Cat, CatParseError, parseCat, renderCatParseError)
  import           Improved.Db (DbError, Result, processWithDb, renderDbError
                               , renderResult, withDatabaseConnection)
  import           Improved.Dog (Dog, DogParseError, parseDog, renderDogParseError)

  import           Prelude hiding (readFile, writeFile)

  import           System.Environment (getArgs)
  import           System.Exit (exitFailure)

  data ProcessError
    = ECat CatParseError
    | EDog DogParseError
    | EReadFile FilePath Text
    | EWriteFile FilePath Text
    | EDb DbError

  main :: IO ()
  main = do
    args <- getArgs
    case args of
      [inFile1, infile2, outFile] ->
              report =<< runExceptT (processFiles inFile1 infile2 outFile)
      _ -> do
          putStrLn "Expected three file names, the first two are input, the last output."
          exitFailure

  report :: Either ProcessError () -> IO ()
  report (Right _) = pure ()
  report (Left e) = T.putStrLn $ renderProcessError e


  renderProcessError :: ProcessError -> Text
  renderProcessError pe =
    case pe of
      ECat ec -> renderCatParseError ec
      EDog ed -> renderDogParseError ed
      EReadFile fpath msg -> "Error reading '" <> T.pack fpath <> "' : " <> msg
      EWriteFile fpath msg -> "Error writing '" <> T.pack fpath <> "' : " <> msg
      EDb dbe -> renderDbError dbe


  readCatFile :: FilePath -> ExceptT ProcessError IO Cat
  readCatFile fpath = do
    liftIO $ putStrLn "Reading Cat file."
    bs <- handleExceptT handler $ readFile fpath
    hoistEither . fmapL ECat $ parseCat bs
    where
      handler :: SomeException -> ProcessError
      handler e = EReadFile fpath (T.pack $ show e)

  readDogFile :: FilePath -> ExceptT ProcessError IO Dog
  readDogFile fpath = do
    liftIO $ putStrLn "Reading Dog file."
    bs <- handleExceptT handler $ readFile fpath
    hoistEither . fmapL EDog $ parseDog bs
    where
      handler :: SomeException -> ProcessError
      handler e = EReadFile fpath (T.pack $ show e)

  writeResultFile :: FilePath -> Result -> ExceptT ProcessError IO ()
  writeResultFile fpath result = do
    liftIO $ putStrLn "Writing Result file."
    handleExceptT handler . writeFile fpath $ renderResult result
    where
      handler :: SomeException -> ProcessError
      handler e = EWriteFile fpath (T.pack $ show e)

  processFiles :: FilePath -> FilePath -> FilePath -> ExceptT ProcessError IO ()
  processFiles infile1 infile2 outfile = do
    cat <- readCatFile infile1
    dog <- readDogFile infile2
    result <- fmapLT EDb . withDatabaseConnection $ \ db ->
                 processWithDb db cat dog
    writeResultFile outfile result

The first thing to notice is that changes to the structure of the main processing function processFiles are minor but all errors are now handled explicitly. In addition, all possible exceptions are caught as close as possible to the source and turned into errors that are explicit in the function return types. Sceptical? Try replacing one of the readFile calls with an error call or a throw and see it get caught and turned into an error as specified by the type of the function.

We also see that despite having many different error types (which happens when code is split up into many packages and modules), a constructor for an error type higher in the stack can encapsulate error types lower in the stack. For example, this value of type ProcessError:


  EDb (DbError3 ResultError1)

contains a DbError which in turn contains a ResultError. Nesting error types like this aids composition, as does the separation of error rendering (turning an error data type into text to be printed) from printing.

We also see that with the use of combinators like fmapLT, and the nested error types of the previous paragraph, means that ExceptT monad transformers do compose.

Using ExceptT with the combinators from the errors package to catch exceptions as close as possible to their source and converting them to errors has numerous benefits including:

Errors are explicit in the types of the functions, making the code easier to reason about.
Its easier to provide better error messages and more context than what is normally provided by the Show instance of most exceptions.
The programmer spends less time chasing the source of exceptions in large complex code bases.
More robust code, because the programmer is forced to think about and write code to handle errors instead of error handling being and optional afterthought.

Want to discuss this? Try reddit.

Hide

New opam features: local switches — OCamlPro, Apr 27, 2017

Among the areas we wanted to improve on for opam 2.0 was the handling of switches. In opam 1.2, they are simply accessed by a name (the OCaml version by default), and are always stored into ~/.opam/<name>. This is fine, but can get a bit cumbersome when many switches are in presence, as there is no way to sort them or associate them with a given project.

A reminder about switches

For those unfamiliar with it, switches, in opam, are independent prefixes with their own compiler and set of i…

A reminder about switches

For those unfamiliar with it, switches, in opam, are independent prefixes with their own compiler and set of installed packages. The opam switch command allows to create and remove switches, as well as select the currently active one, where operations like opam install will operate.

Their uses include easily juggling between versions of OCaml, or of a library, having incompatible packages installed separately but at the same time, running tests without damaging your “main” environment, and, quite often, separation of environment for working on different projects.

You can also select a specific switch for a single command, with
opam install foo --switch other
or even for a single shell session, with
eval $(opam env --switch other)

What opam 2.0 adds to this is the possibility to create so-called local switches, stored below a directory of your choice. This gets users back in control of how switches are organised, and wiping the directory is a safe way to get rid of the switch.

Using within projects

This is the main intended use: the user can define a switch within the source of a project, for use specifically in that project. One nice side-effect to help with this is that, if a “local switch” is detected in the current directory or a parent, opam will select it automatically. Just don’t forget to run eval $(opam env) to make the environment up-to-date before running make.

Interface

The interface simply overloads the switch-name arguments, wherever they were present, allowing directory names instead. So for example:

cd ~/src/project
opam switch create ./

will create a local switch in the directory ~/src/project. Then, it is for example equivalent to run opam list from that directory, or opam list --switch=~/src/project from anywhere.

Note that you can bypass the automatic local-switch selection if needed by using the --switch argument, by defining the variable OPAMSWITCH or by using eval $(opam env --switch <name>)

Implementation

In practice, the switch contents are placed in a _opam/ subdirectory. So if you create the switch ~/src/project, you can browse its contents at ~/src/project/_opam. This is the direct prefix for the switch, so e.g. binaries can be found directly at _opam/bin/: easier than searching the opam root! The opam metadata is placed below that directory, in a .opam-switch/ subdirectory.

Local switches still share the opam root, and in particular depend on the repositories defined and cached there. It is now possible, however, to select different repositories for different switches, but that is a subject for another post.

Finally, note that removing that _opam directory is handled transparently by opam, and that if you want to share a local switch between projects, symlinking the _opam directory is allowed.

Current status

This feature has been present in our dev builds for a while, and you can already use it in the current beta.

Limitations and future extensions

It is not, at the moment, possible to move a local switch directory around, mainly due to issues related to relocating the OCaml compiler.

Creating a new switch still implies to recompile all the packages, and even the compiler itself (unless you rely on a system installation). The projected solution is to add a build cache, avoiding the need to recompile the same package with the same dependencies. This should actually be possible with the current opam 2.0 code, by leveraging the new hooks that are made available. Note that relocation of OCaml is also an issue for this, though.

Editing tools like ocp-indent or merlin can also become an annoyance with the multiplication of switches, because they are not automatically found if not installed in the current switch. But the user-setup plugin (run opam user-setup install) already handles this well, and will access ocp-indent or tuareg from their initial switch, if not found in the current one. You will still need to install tools that are tightly bound to a compiler version, like merlin and ocp-index, in the switches where you need them, though.

NOTE: this article is cross-posted on opam.ocaml.org and ocamlpro.com. Please head to the latter for the comments!

Hide

Caveat Configurator: how to replace configs with code, and why you might not want to — Jane Street (Yaron Minsky), Apr 25, 2017

We have a new tech talk coming up on May 17th, from our very own Dominick LoBraico. This one is about how to represent configurations with programs. In some sense, this is an obvious idea. Lots of programmers have experienced the dysphoria that comes from watching your elegant little configuration format metamorphize into a badly constructed programming language with miserable tools. This happens because, as you try to make your configs clearer and more concise, you often end up walking down the…

Weekly News — OCaml Weekly News, Apr 25, 2017

Full Time: Front-end Developer at issuu in Copenhagen — GitHub Jobs, Apr 24, 2017

Fulltime, Copenhagen

issuu is the world's fastest-growing digital publishing platform. We are looking for a new member to join our fantastic team. With great people, unique ideas and stunning technology, we're changing the future of publishing today. Can you be the best at what you do? Join us!

About this job

As a Front-end Developer at issuu, you will be joining a team of highly skilled web enthusiasts building web applications in an agile environment. We currently develop for deskto…

Fulltime, Copenhagen

About this job

As a Front-end Developer at issuu, you will be joining a team of highly skilled web enthusiasts building web applications in an agile environment. We currently develop for desktops, tablets and mobile web. This requires great responsive sites that ensure the best possible user experience on all platforms.

We run a NodeJS server in a production system that serves billions of pages every month. Our client-side applications are a mix of vanilla JavaScript (transpiled with Babel from ES2015), and React / Redux, apps with an in-house developed, modular, BEM-styled styling framework.SASS enables us to keep our CSS well structured.

As our ideal candidate, you are excited about HTML5 features, like SVG and , and have a feel for the pulse of new developments in browser technology.In a system the size of issuu’s, code maintainability is just as important as accessibility. That also means that you are willing to share your code with other people or sit together for pair-programming sessions.

What we Like

We think browsers are awesome.
We love HTML5, CSS3 and all the new exciting web APIs.
We use React, Redux, ES2015 and CSS Modules.
We draw with Canvas, WebGL and SVGs.
We build modern progressive web apps.
We like declarative and functional programming.
We used to like jQuery and Backbone, but now we sort of grew apart.
We also like it if our front-end developers aren’t afraid of touching our backends, which we make with -
Python, NodeJS, OCaml and Erlang.

Qualifications

Demonstrated ability to deliver results in a fast-paced environment as a self-starter and quick learner
Expert understanding of browsers, web technologies (object-oriented JavaScript, ES2015, HTML, CSS), and a passion for the latest and greatest web standards
Experience in collaborating with UI Designers
Experience architecting and building robust, high-performance production web applications
Used to working with Design and Product teams in an agile fashion
Experience with building web applications for mobile web is a plus
Experience with MVC-based web applications is a plus
Experience with server-side languages like Python is a plus
Solid communications skills in English

What we Offer

You’ll be a part of issuu – an amazing place with room for parents, foodies, geeks, odd birds and the occasional normal person
Informal, inclusive and very flexible workplace
Competitive compensation package
Flat hierarchy – every opinion matters regardless of team and position
Regular hackathons
A sleek Mac and a pretty sweet desk setup
Great offices in Copenhagen, Palo Alto and Berlin

About the Team and Office

You will be joining our DK engineering team in Copenhagen consisting of approx. 27 developers. We are a diverse office with members of all parts of issuu, including Customer Support, Product, Design, Data Analytics and Management.

In a company the size of issuu, code maintainability is just as important as site accessibility. That also means that you are willing to share your code with other people or sit together for pair programming sessions.

Our office is conveniently located next to the Copenhagen Central Station. We provide nice, catered lunches each day, regular outings and monthly Friday bars with drinks, snacks and activities.

About issuu

issuu connects the world’s publishers to an audience of active consumers, on a global scale. Every day millions of people find, read and share billions of pieces of content they find meaningful, from every device. Millions of magazine, newspaper and catalog creators are empowered to distribute, measure and monetize their work through the issuu platform issuu. If you are interested in a sneak peek of who we are as issuuees, have a look at our brandbook.

Details

It’s a prerequisite that you have a valid EU work permit

Hide

Seventeenth OCaml compiler hacking evening at Pembroke — OCaml Labs compiler hacking, Apr 18, 2017

Our next OCaml Compiler Hacking event will be on Tuesday 16th May in The Old Library at Pembroke College, Cambridge.

If you're planning to come along, it'd be helpful if you could indicate interest via Doodle and sign up to the mailing list to receive updates.

Where: The Old Library, Pembroke College, Cambridge CB2 1RF

The Old Library is the first building on the left straight after the Porters Lodge.

When: 6:30pm, Tuesday 16th May

Who: anyone interested in impr…

News about Tyre | Drup's thingies — Gabriel Radanne, Apr 17, 2017

Here are some news about Tyre, along with release of version 0.3.

EzSudoku — OCamlPro, Apr 11, 2017

As you may have noticed, on the begining of April I have some urge to write something technical about some deeply specific point of OCaml. This time I’d like to tackle that through sudoku.

It appearch that Sudoku is of great importance considering the number of posts explaining how to write a solver. Following that trend I will explain how to write one in OCaml. But with a twist.

We will try to optimize it. I won’t show you anything as obvious as how to micro-optimize your code or so…

As you may have noticed, on the begining of April I have some urge to write something technical about some deeply specific point of OCaml. This time I’d like to tackle that through sudoku.

It appearch that Sudoku is of great importance considering the number of posts explaining how to write a solver. Following that trend I will explain how to write one in OCaml. But with a twist.

We will try to optimize it. I won’t show you anything as obvious as how to micro-optimize your code or some smart heuristc. No we are not aiming for being merely algorithmically good. We will try to make something serious, we are want it to be solved even before the program starts.

Yes really. Before. And I will show you how to use a feature of OCaml 4.03 that is sadly not well known.

—

First of all, as we do like type and safe programs, we will define what a well formed sudoku solution looks like. And by defining of course I mean declaring some GADTs with enough constraints to ensure that only well correct solutions are valid.

I assume tha you know the rules of Sudoku and will refrain from infuriating you by explaining it. But we will still need some vocabulary.

So the aim of sudoku is to fill a ‘grid’ with ‘symbols’ satisfying some ‘row’ ‘column’ and ‘square’ constraints.

To make the code examples readable we will stick to 4*4 sudokus. It’s the smallest size that behaves the same way as 9*9 ones (I considered going for 1*1 ones, but the article ended up being a bit short). Of course everything would still apply to any n^2*n^2 sized one.

So let’s start digging in some types. As we will refine them along the way, I will leave some parts to be filled later. This is represented by ‘…’ .

First there are symbols, just 4 of them befause we reduced the size. Nothing special about that right now.

type ... symbol =
| A : ...
| B : ...
| C : ...
| D : ...

And a grid is 16 symbols. To avoid too much visual clutter in the type I just put them linearly. The comment show how it is supposed to be seen in the 2d representation of the grid:

(* a b c d
   e f g h
   i j k l
   m n o p *)

type grid =
Grid :
... symbol * (* a *)
... symbol * (* b *)
... symbol * (* c *)
... symbol * (* d *)

... symbol * (* e *)
... symbol * (* f *)
... symbol * (* g *)
... symbol * (* h *)

... symbol * (* i *)
... symbol * (* j *)
... symbol * (* k *)
... symbol * (* l *)

... symbol * (* m *)
... symbol * (* n *)
... symbol * (* o *)
... symbol (* p *)
-> solution

Right now grid is a simple 16-uple of symbols, but we will soon start filling those ‘…’ to forbid any set of symbols that is not a valid solution.

Each constraint looks like, ‘among those 4 positions neither 2 symbols are the same’. To express that (in fact something equivalent but a bit simpler to state with our types), we will need to name positions. So let’s introduce some names:

type r1 (* the first position among a row constraint *)
type r2 (* the second position among a row constraint *)
type r3
type r4

type c1 (* the first position among a column constraint *)
type c2
type c3
type c4

type s1
type s2
type s3
type s4

type ('row, 'column, 'square) position

On the 2d grid this is how the various positions will be mapped.

r1 r2 r3 r4
r1 r2 r3 r4
r1 r2 r3 r4
r1 r2 r3 r4

c1 c1 c1 c1
c2 c2 c2 c2
c3 c3 c3 c3
c4 c4 c4 c4

s1 s2 s1 s2
s3 s4 s3 s4
s1 s2 s1 s2
s3 s4 s4 s4

For instance, the position g, in the 2nd row, 3rd column, will at the 3rd position in its row constraint, 2nd in its column constraint, and 3rd in its square constraint:

type g = (r3, c2, s3) position

We could have declare a single constraint position type, but this is slightly more readable. than:

type g = (p3, p2, p3) position

The position type is phantom, we could have provided a representation, but since no value of this type will ever be created, it’s less confusing to state it that way.

type a = (r1, c1, s1) position
type b = (r2, c1, s2) position
type c = (r3, c1, s1) position
type d = (r4, c1, s2) position

type e = (r1, c2, s3) position
type f = (r2, c2, s4) position
type g = (r3, c2, s3) position
type h = (r4, c2, s4) position

type i = (r1, c3, s1) position
type j = (r2, c3, s2) position
type k = (r3, c3, s1) position
type r = (r4, c3, s2) position

type m = (r1, c4, s3) position
type n = (r2, c4, s4) position
type o = (r3, c4, s3) position
type p = (r4, c4, s4) position

It is now possible to state for each symbol in which position it is, so we will start filling a bit those types.

type ('position, ...) symbol =
| A : (('r, 'c, 's) position, ...) symbol
| B : (('r, 'c, 's) position, ...) symbol
| C : (('r, 'c, 's) position, ...) symbol
| D : (('r, 'c, 's) position, ...) symbol

This means that a symbol value is then associated to a single position in each constraint. We will need to state that in the grid type too:

type grid =
Grid :
(a, ...) symbol * (* a *)
(b, ...) symbol * (* b *)
(c, ...) symbol * (* c *)
(d, ...) symbol * (* d *)

(e, ...) symbol * (* e *)
(f, ...) symbol * (* f *)
(g, ...) symbol * (* g *)
(h, ...) symbol * (* h *)

(i, ...) symbol * (* i *)
(j, ...) symbol * (* j *)
(k, ...) symbol * (* k *)
(l, ...) symbol * (* l *)

(m, ...) symbol * (* m *)
(n, ...) symbol * (* n *)
(o, ...) symbol * (* o *)
(p, ...) symbol (* p *)
-> solution

We just need to forbid a symbol to appear in two different positions of a given row/column/square to prevent invalid solutions.

type 'fields row constraint 'fields = < a : 'a; b : 'b; c : 'c; d : 'd >
type 'fields column constraint 'fields = < a : 'a; b : 'b; c : 'c; d : 'd >
type 'fields square constraint 'fields = < a : 'a; b : 'b; c : 'c; d : 'd >

Those types represent the statement ‘in this line/column/square, the symbol a is at the position ‘a, the symbol b is at the position ‘b, …’

For instance, the row ‘A D B C’ will be represented by

< a : l1; b : l3; c : l4; d : l2 > row

Which reads: ‘The symbol A is in first position, B in third position, C in fourth, and D in second’

The object type is used to make things a bit lighter later and allow to state names.

Now the symbols can be a bit more annotated:

type ('position, 'row, 'column, 'square) symbol =
| A : (('r, 'c, 's) position,
< a : 'r; .. > row,
< a : 'c; .. > column,
< a : 's; .. > square)
symbol

| B : (('r, 'c, 's) position,
< b : 'r; .. > row,
< b : 'c; .. > column,
< b : 's; .. > square)
symbol

| C : (('r, 'c, 's) position,
< c : 'r; .. > row,
< c : 'c; .. > column,
< c : 's; .. > square)
symbol

| D : (('r, 'c, 's) position,
< d : 'r; .. > row,
< d : 'c; .. > column,
< d : 's; .. > square)
symbol

Notice that ‘..’ is not ‘…’. Those dots are really part of the OCaml syntax: it means ‘put whatever you want here, I don’t care’. There is nothing more to add to this type.

This type declaration reports the position information. Using the same variable name ‘r in the position and in the row constraint parameter for instance means that both fields will have the same type.

For instance, a symbol ‘B’ in position ‘g’ would be in the 3rd position of its row, 2nd position of its column , and 3rd position of its square:

# let v : (g, _, _, _) symbol = B;;
val v :
(g, < b : r3 > row,
    < b : c2 > column,
    < b : s3 > square)
symbol = B

Those types constraints ensure that this is correctly reported.

The real output of the type checker is a bit more verbose, but I remove the irrelevant part:

val v :
(g, < a : 'a; b : r3; c : 'b; d : 'c > row,
    < a : 'd; b : c2; c : 'e; d : 'f > column,
    < a : 'g; b : s3; c : 'h; d : 'i > square)
symbol = B

We are now quite close from a completely constrained type. We just need to say that the various symbols from the same row/line/column constraint have the same type:

type grid =
Grid :
(a, 'row1, 'column1, 'square1) symbol *
(b, 'row1, 'column2, 'square1) symbol *
(c, 'row1, 'column3, 'square2) symbol *
(d, 'row1, 'column4, 'square2) symbol *

(e, 'row2, 'column1, 'square1) symbol *
(f, 'row2, 'column2, 'square1) symbol *
(g, 'row2, 'column3, 'square2) symbol *
(h, 'row2, 'column4, 'square2) symbol *

(i, 'row3, 'column1, 'square3) symbol *
(j, 'row3, 'column2, 'square3) symbol *
(k, 'row3, 'column3, 'square4) symbol *
(l, 'row3, 'column4, 'square4) symbol *

(m, 'row4, 'column1, 'square3) symbol *
(n, 'row4, 'column2, 'square3) symbol *
(o, 'row4, 'column3, 'square4) symbol *
(p, 'row4, 'column4, 'square4) symbol *

That is two symbols in the same row/column/square will share the same ‘row/’symbol/’square type. For any couple of symbols in say, a row, they must agree on that type, hence, on the position of every symbol.

Let’s look at the ‘A’ symbol for the ‘a’ and ‘c’ position for instance. Both share the same ‘row1 type variable. There are two cases. Either both are ‘A’s ore one is not.

* If one symbol is not a ‘A’, let’s say those are ‘C’ and ‘A’ symbols. Their row type (pun almost intended) will be respectively < c : r1; .. > and < a : r3; .. >. Meaning that ‘C’ does not care about the position of ‘A’ and conversly. Those types are compatible. No problem here.

* If both are ‘A’s then something else happens. Their row types will be < a : r1; .. > and < a : r3; .. > which is certainly not compatible since r1 and r3 are not compatible. This will be rejected.

Now we have a grid type that checks the sudoku constraints !

Let’s try it.

let ok =
Grid
(A, B, C, D,
C, D, A, B,

D, A, B, C,
B, C, D, A)

val ok : grid = Grid (A, B, C, D, C, D, A, B, D, A, B, C, B, C, D, A)

let not_ok =
Grid
(A, B, C, D,
C, D, A, B,

D, A, B, C,
B, C, A, D)

B, C, A, D);;
^
Error: This expression has type
(o, < a : r3; b : r1; c : r2; d : 'a > row,
    < a : c4; b : 'b; c : 'c; d : 'd > column,
    < a : s3; b : 'e; c : 'f; d : 'g > square)
  symbol
but an expression was expected of type
(o, < a : r3; b : r1; c : r2; d : 'a > row,
    < a : c2; b : c3; c : c1; d : 'h > column,
    < a : 'i; b : s1; c : s2; d : 'j > square)
  symbol
Types for method a are incompatible

What it is trying to say is that ‘A’ is both at position ‘2’ and ‘4’ of its column. Well it seems to work.

Solving it

But we are not only interested in checking that a solution is correct, we want to find them !

But with ‘one weird trick’ we will magically transform it into a solver, namely the -> . syntax. It was introduced in OCaml 4.03 for some other purpose. But we will now use its hidden power !

This is the right hand side of a pattern. It explicitely states that a pattern is unreachable. For instance

type _ t =
  | Int : int -> int t
  | Float : float -> float t

let add (type v) (a : v t) (b : v t) : v t =
  match a, b with
  | Int a, Int b -> Int (a + b)
  | Float a, Float b -> Float (a +. b)
  | _ -> .

By writing it here you state that you don’t expect any other pattern to verify the type constraints. This is effectively the case here. In general you won’t need this as the exhaustivity checker will see it. But in some intricate situations it will need some hints to work a bit more. For more information see Jacques Garrigue / Le Normand article

This may be a bit obscure, but this is what we now need. Indeed, we can ask the exhaustivity checker if there exist a value verifying the pattern and the type constraints. For instance to solve a problem, we ask the compiler to check if there is any value verifying a partial solution encoded as a pattern.

 A _ C _
 _ D _ B
 _ A D _
 D _ B _

let test x =
match x with
| Grid
  (A, _, C, _,
   _, D, _, B,

   _, A, D, _,
   D, _, B, _) -> .
| _ -> ()

Error: This match case could not be refuted.
Here is an example of a value that would reach it:
Grid (A, B, C, D, C, D, A, B, B, A, D, C, D, C, B, A)

The checker tells us that there is a solution verifying those constraints, and provides it.

If there were no solution, there would have been no error.

let test x =
match x with
| Grid
  (A, B, C, _,
   _, _, _, D,

   _, _, _, _,
   _, _, _, _) -> .
| _ -> ()

val test : grid -> unit =

And that’s it !

Wrapping it up

Of course that’s a bit cheating since the program is not executable, but who cares really ?
If you want to use it, I made a small (ugly) script generating those types. You can try it on bigger problems, but in fact it is a bit exponential. So you shouldn’t really expect an answer too soon.

Hide

Weekly News — OCaml Weekly News, Apr 11, 2017

Version 0.0.2 of the Frama-Clang plugin plugin is available for download. — Frama-C, Apr 05, 2017

Weekly News — OCaml Weekly News, Apr 04, 2017

Ann: Pumping | Drup's thingies — Gabriel Radanne, Apr 01, 2017

I’m happy to announce the release of Pumping, a library to leverage the OCaml type system to recognize regular languages.

Dealing with source code locations (in lexical and syntax analysis) — Shayne Fletcher, Mar 30, 2017

Locations

Writing compilers and interpreters requires rigorous management of source code locations harvested during syntax analysis and associated error handling mechanisms that involve reporting those locations along with details of errors they associate to.

This article does a "deep dive" into the the Location module of the OCaml compiler. The original source can be found in the ocaml/parsing directory of an OCaml distribution (copyright Xavier…

Locations

This article does a "deep dive" into the the Location module of the OCaml compiler. The original source can be found in the ocaml/parsing directory of an OCaml distribution (copyright Xavier Leroy).

Location is a masterclass in using the standard library Format module. If you have had difficulties understanding Format and what it provides the OCaml programmer, then this is for you. Furthermore, Location contains invaluable idioms for error reporting & exception handling. Learn them here to be able to apply them in your own programs.

Describing locations

A location corresponds to a range of characters in a source file. Location defines this type and a suite of functions for the production of location values.


type t = { 
  loc_start : Lexing.position;
  loc_end : Lexing.position;
  loc_ghost : bool
}

val none : t
val in_file : string → t
val init : Lexing.lexbuf → string → unit
val curr : Lexing.lexbuf → t

val symbol_rloc : unit → t
val symbol_gloc : unit → t
val rhs_loc : int → t

type 'a loc = { 
  txt : 'a;
  loc : t; 
}

val mkloc : 'a → t → 'a loc
val mknoloc : 'a → 'a loc

A value of the (standard library) type Lexing.position describes a point in a source file.


    type position = {
      pos_fname : string;
      pos_lnum : int;
      pos_bol : int;
      pos_cnum : int
    }

The fields of this record have the following meanings:

pos_fname is the file name
pos_lnum is the line number
pos_bol is the offset of the beginning of the line (the number of characters between the beginning of the lexbuf and the beginning of the line)
pos_cnum is the offset of the position (number of characters between the beginning of the lexbuf (details below) and the position)

The difference between pos_cnum and pos_bol is the character offset within the line (i.e. the column number, assuming each character is one column wide).

A location in a source file is defined by two positions : where the location starts and where the location ends.


type t = {
  loc_start : position;
  loc_end : position;
  loc_ghost : bool
}

The third field loc_ghost is used to disambiguate locations that do not appear explicitly in the source file. A location will not be marked as ghost if it contains a piece of code that is syntactically valid and corresponds to an AST node and will be marked as a ghost location otherwise.

There is a specific value denoting a null position. It is called none and it is defined by the function in_file.


let in_file (name : string) : t =
  let loc : position = {
    pos_fname = name; (*The name of the file*)
    pos_lnum = 1; (*The line number of the position*)
    pos_bol = 0; (*Offset from the beginning of the lexbuf of the line*)
    pos_cnum = -1; (*Offset of the position from the beginning of the lexbuf*)
  } in
  { loc_start = loc; loc_end = loc; loc_ghost = true }

let none : t = in_file "_none_"

Lexing.lexbuf is the (standard library) type of lexer buffers. A lexer buffer is the argument passed to the scanning functions defined by generated scanners (lexical analyzers). The lexer buffer holds the current state of the scanner plus a function to refill the buffer from the input.


type lexbuf = {
  refill_buff : lexbuf → unit;
  mutable lex_buffer : bytes;
  mutable lex_buffer_len : int;
  mutable lex_abs_pos : int;
  mutable lex_start_pos : int;
  mutable lex_curr_pos : int;
  mutable lex_last_pos : int;
  mutable lex_last_action : int;
  mutable lex_eof_reached : bool;
  mutable lex_mem : int array;
  mutable lex_start_p : position;
  mutable lex_curr_p : position;
}

At each token, the lexing engine will copy lex_curr_p to lex_start_p then change the pos_cnum field of lex_curr_p by updating it with the number of characters read since the start of the lexbuf. The other fields are left unchanged by the the lexing engine. In order to keep them accurate, they must be initialized before the first use of the lexbuf and updated by the relevant lexer actions.


(*Set the file name and line number of the [lexbuf] to be the start
   of the named file*)
let init (lexbuf : Lexing.lexbuf) (fname : string) : unit =
  let open Lexing in
  lexbuf.lex_curr_p <- {
    pos_fname = fname;
    pos_lnum = 1;
    pos_bol = 0;
    pos_cnum = 0;
  }

The location of the current token in a lexbuf is computed by the function curr.


(*Get the location of the current token from the [lexbuf]*)
let curr (lexbuf : Lexing.lexbuf) : t = 
  let open Lexing in {
    loc_start = lexbuf.lex_start_p;
    loc_end = lexbuf.lex_curr_p;
    loc_ghost = false;
  }

Parsing is the run-time library for parsers generated by ocamlyacc. The functions symbol_start and symbol_end are to be called in the action part of a grammar rule (only). They return the offset of the string that matches the left-hand-side of the rule : symbol_start returns the offset of the first character; symbol_end returns the offset after the last character. The first character in a file is at offset 0.

symbol_start_pos and symbol_end_pos are like symbol_start and symbol_end but return Lexing.position values instead of offsets.


(*Compute the span of the left-hand-side of the matched rule in the
   program source*)
let symbol_rloc () : t = {
  loc_start = Parsing.symbol_start_pos ();
  loc_end = Parsing.symbol_end_pos ();
  loc_ghost = false
}

(*Same as [symbol_rloc] but designates the span as a ghost range*)
let symbol_gloc () : t =  { 
  loc_start = Parsing.symbol_start_pos ();
  loc_end = Parsing.symbol_end_pos ();
  loc_ghost = true
}

The Parsing functions rhs_start and rhs_end are the same as symbol_start and symbol_end but return the offset of the string matching the n-th item on the right-hand-side of the rule where n is the integer parameter to rhs_start and rhs_end. n is 1 for the leftmost item. rhs_start_pos and rhs_end_pos return a position instead of an offset.


(*Compute the span of the [n]th item of the right-hand-side of the
   matched rule*)
let rhs_loc n = {
  loc_start = Parsing.rhs_start_pos n;
  loc_end = Parsing.rhs_end_pos n;
  loc_ghost = false;
}

The type 'a loc associates a value with a location.


(*A type for the association of a value with a location*)
type 'a loc = { 
  txt : 'a;
  loc : t; 
}

(*Create an ['a loc] value from an ['a] value and location*)
let mkloc (txt : 'a) (loc : t) : 'a loc = { txt ; loc }

(*Create an ['a loc] value bound to the distinguished location called
   [none]*)
let mknoloc (txt : 'a) : 'a loc = mkloc txt none

Error reporting with locations

Location has a framework for error reporting across modules concerned with locations (think lexer, parser, type-checker, etc).


open Format

type error =
{
  loc : t;
  msg : string;
  sub : error list;
}

val error_of_printer : t →  (formatter → 'a → unit) → 'a → error
val errorf_prefixed : ?loc : t → ?sub : error list → ('a, Format.formatter, unit, error) format4 → 'a

So, in the definition of the error record, loc is a location in the source code, msg an explanation of the error and sub a list of related errors. We deal here with the error formatting functions. The utility function print_error_prefix simply writes an error prefix to a formatter.


let error_prefix = "Error"

let warning_prefix = "Warning"

let print_error_prefix (ppf : formatter) () : unit =
  fprintf ppf "@{<error>%s@}:" error_prefix;
  ()

The syntax, "@{<error>%s}@" associates the embedded text with the named tag "error".

Next another utility, pp_ksprintf.


let pp_ksprintf 
    ?(before : (formatter → unit) option) 
    (k : string → 'd)
    (fmt : ('a, formatter, unit, 'd) format4) : 'a =
  let buf = Buffer.create 64 in
  let ppf = Format.formatter_of_buffer buf in
  begin match before with
    | None → ()
    | Some f → f ppf
  end;
  kfprintf
    (fun (_ : formatter) : 'd →
      pp_print_flush ppf ();
      let msg = Buffer.contents buf in
      k msg)
    ppf fmt

It proceeds as follows. A buffer and a formatter over that buffer is created. When presented with all of the arguments of the format operations implied by the fmt argument, if the before argument is non-empty, call it on the formatter. Finally, call kfprintf (from the standard library Format module) which performs the format operations on the buffer before handing control to a function that retrieves the contents of the now formatted buffer and passes them to the user provided continuation k.

With pp_ksprintf at our disposal, one can write the function errorf_prefixed.


let errorf_prefixed 
    ?(loc:t = none) 
    ?(sub : error list = []) 
    (fmt : ('a, Format.formatter, unit, error) format4) : 'a =
  let e : 'a =
    pp_ksprintf
      ~before:(fun ppf → fprintf ppf "%a " print_error_prefix ())
      (fun (msg : string) : error → {loc; msg; sub})
    fmt
  in e

errorf_prefixed computes a function. The function it computes provides the means to produce error values by way of formatting operations to produce the msg field of the error result value. The formatting operations include prefixing the msg field with the error_prefix string. The type of the arguments of the computed function unifies with the type variable 'a. In other words, the type of the computed function is 'a → error. For example, the type of errorf_prefixed "%d %s" is int → string → error.

The groundwork laid with errorf_prefixed above means a polymorphic function error_of_printer can now be produced.


let error_of_printer 
    (loc : t) 
    (printer : formatter → 'error_t → unit) 
    (x : 'error_t) : error =
  let mk_err : 'error_t → error = 
    errorf_prefixed ~loc "%a@?" printer in
  mk_err x

The idea is that error_of_printer is provided a function that can format a value of type 'error. This function is composed with errorf_prefixed thereby producing a function of type 'error → error. For example, we can illustrate how this works by making an error of a simple integer with code like the following:


# error_of_printer none (fun ppf x → Format.fprintf ppf "Code %d" x) 3;;
- : error =
{loc =
  {loc_start =
    {pos_fname = "_none_"; pos_lnum = 1; pos_bol = 0; pos_cnum = -1};
   loc_end = {pos_fname = "_none_"; pos_lnum = 1; pos_bol = 0; pos_cnum = -1};
   loc_ghost = true};
 msg = "Error: Code 3"; sub = []}

So, that's error_of_printer. The following utility function is much simpler - it simply writes a given filename to a formatter.


let print_filename (ppf : formatter) (file : string) : unit =
  fprintf ppf "%s" file

Next, a set of constants for consistent messages that involve locations and a function to get the file, line and column of a position.


let (msg_file, msg_line, msg_chars, msg_to, msg_colon) =
    ("File \"",        (*'msg file'*)
     "\", line ",      (*'msg line'*)
     ", characters ",  (*'msg chars'*)
     "-",              (*'msg to'*)
     ":")              (*'msg colon'*)

let get_pos_info pos = (pos.pos_fname, pos.pos_lnum, pos.pos_cnum - pos.pos_bol)

Making use of the above we have now print_loc : a function to print a location on a formatter in terms of file, line and character numbers.


let print_loc (ppf : formatter) (loc : t) : unit  =
  let (file, line, startchar) = get_pos_info loc.loc_start in
  let endchar = loc.loc_end.pos_cnum - loc.loc_start.pos_cnum + startchar in
  if file = "//toplevel//" then
    fprintf ppf "Characters %i-%i"
      loc.loc_start.pos_cnum loc.loc_end.pos_cnum
  else begin
    fprintf ppf "%s@{<loc>%a%s%i" msg_file print_filename file msg_line line;
    if startchar >= 0 then
      fprintf ppf "%s%i%s%i" msg_chars startchar msg_to endchar;
    fprintf ppf "@}"
  end

Locations generally speaking come out in a format along the lines of: File "<string>, line 1, characters 0-10:"


let print (ppf : formatter) (loc : t) : unit =
  (* The syntax, [@{<loc>%a@}] associates the embedded text with the
     named tag 'loc'*)
  fprintf ppf "@{<loc>%a@}%s@." print_loc loc msg_colon

That last function, print is just a small wrapper over print_loc that appends a colon to the location.

Exception handling involving errors with locations

This section is concerned with the following section of the Location's signature.


val register_error_of_exn : (exn → error option) → unit
val error_of_exn : exn → error option
val error_reporter : (formatter → error → unit) ref
val report_error : formatter → error → unit
val report_exception : formatter → exn → unit

Location contains a mutable list of exception handlers where an exception handler is a function of type exn → error option.


let error_of_exn : (exn → error option) list ref = ref []

A function is provided that adds an exception handler to the above list.


let register_error_of_exn f = error_of_exn := f :: !error_of_exn

The next function error_of_exn (yes, it is the only remaining function that manipulates the list error_exn previously defined directly) walks the list looking for a handler returning the contents of the result of the first such function that doesn't return a None value.


let error_of_exn exn =
  let rec loop = function
    | [] → None
    | f :: rest →
      match f exn with
      | Some _ as r → r
      | None → loop rest
  in
  loop !error_of_exn

We define now a "default" error reporting function. Given a formatter and an error, write the error location, an explanation of the error to the formatter and the same for any associated "sub" errors.


let rec default_error_reporter 
    (ppf : formatter) ({loc; msg; sub} : error) : unit =
  print ppf loc;
  Format.pp_print_string ppf msg;
  List.iter (Format.fprintf ppf "@\n@[<2>%a@]" default_error_reporter) sub

Now, error_reporter itself is a reference cell with default value default_error_reporter.


let error_reporter = ref default_error_reporter

This next function, print_updating_num_loc_lines looks more complicated than it is but does demonstrate a rather advanced usage of Format by containing calls to the functions pp_get_formatter_out_functions, pp_set_formatter_out_functions to tempoarily replace the default function for writing strings. The semantic of the function is to print an error on a formatter incidentally recording the number of lines required to do so.


(* A mutable line count*)
let num_loc_lines : int ref = ref 0

(*Prints an error on a formatter incidentally recording the number of
  lines required to do so*)
let print_updating_num_loc_lines 
    (ppf : formatter) 
    (f : formatter → error → unit) 
    (arg : error) : unit =
  (*A record of functions of output primitives*)
  let out_functions : formatter_out_functions
      = pp_get_formatter_out_functions ppf () in
  (*The strategoy is to temporarily replace the basic function for
    writing a string with this one*)
  let out_string (str : string) (start : int) (len : int) : unit =
    (*A function for counting line breaks in [str]. [c] is the current
      count, [i] is the current char under consideration*)
    let rec count (i : int) (c : int) : int=
      if i = start + len then c
      else if String.get str i = '\n' then count (succ i) (succ c)
      else count (succ i) c 
    in
    (*Update the count*)
    num_loc_lines := !num_loc_lines + count start 0;
    (*Write the string to the formatter*)
    out_functions.out_string str start len 
  in
  (*Replace the standard string output primitive with the one just
    defined *)
  pp_set_formatter_out_functions ppf 
    {out_functions with out_string} ;
  (*Write the error to the formatter*)
  f ppf arg ;
  pp_print_flush ppf ();
  (*Restore the standard primitive*)
  pp_set_formatter_out_functions ppf out_functions

The function report_error uses the currently installed error reporter to write an error report for a given error and formatter incidentally updating a count indicating the number of lines written.


let report_error (ppf : formatter) (err : error) : unit=
  print_updating_num_loc_lines ppf !error_reporter err

This next function, report_exception_rec tries a number of times to find a handler for a given error and if successful formats it. In the worst case a handler is never found and the exception propogates.


let rec report_exception_rec (n : int) (ppf : formatter) (exn : exn) : unit =
  (*Try to find a handler for the exception*)
  try match error_of_exn exn with
  | Some err → 
    (*Format the resulting error using the current error reporter*)
    fprintf ppf "@[%a@]@." report_error err 
  (*The syntax @[%a@]@ writes function output in a box followed by a
    'cut' break hint*)
  | None → raise exn (*A handler could not be found*)
  with exn when n > 0 →
    (*A handler wasn't found. Try again*)
    report_exception_rec (n - 1) ppf exn

The last piece is report_exception. It attempts to write an error report for the given exception on the provided formatter. The exception can be re-raised if no handler is found.


let report_exception (ppf : formatter) (exn : exn) : unit = 
  report_exception_rec 5 ppf exn

Usage

In this section we see how an example of how the above machinery is used. Consider defining a lexical analyzer as an example. Suppose the scanner is defined by the file lexer.mll (the input file to ocamllex). We can imagine its header containing code like the following.


     {
        (*The cases of lexer errors*)
        type error =
          | Illegal_character of char
          | Unterminated_comment of Location.t

        (*The lexer exception type*)
        exception Error of error * Location.t

        (*This function takes a formatter and an instance of type
          [error] and writes a message to the formatter explaining the
          meaning. This is a "printer"*)
        let report_error (ppf : Format.formatter) : error → unit = function
         | Illegal_character c → 
            Format.fprintf ppf "Illegal character (%s)" (Char.escaped c)
         | Unterminated_comment _ → 
            Format.fprintf ppf "Comment not terminated"

        (*Note that [report_error] is a function that unifies with
          the [formatter → 'a → unit] parameter of
          [error_of_printer]*)

        (*Register an exception handler for the lexer exception type*)
        let () =
         Location.register_error_of_exn
          (function
           | Error (err, loc) →
              Some (Location.error_of_printer loc report_error err)
           | _ →  None
          )
     }

     /*...*/
     rule token = ...

A function to handle errors with attached locations (in a REPL for example) is expressible as an idiom as simple as something like this.


let handle_interpreter_error ?(finally=(fun () → ())) ex =
  finally ();
  Location.report_exception (Format.std_formatter) ex

let safe_proc ?finally f =
  try f ()
  with exn → handle_interpreter_error ?finally exn

Hide

Weekly News — OCaml Weekly News, Mar 28, 2017

Two PhD positions in Ljubljana starting October 2017 — Andrej Bauer, Mar 28, 2017

We are looking for two PhD students at the Faculty of Mathematics and Physics, University of Ljubljana. The programme starts in October 2017 and lasts three years. The positions will be fully funded (subject to approval by the funding agency). The candidates should have a Master’s degree in mathematics or computer science. No knowledge of Slovene is required.

The first PhD student will be advised by dr. Andrej Bauer. The topic of research is foundations of type theory. The candidate should…

Polynomials over rings — Shayne Fletcher, Mar 21, 2017

Polynomials over rings

This post provides a workout in generic programming using modules & functors.

The program presented here models univariate polynomials over rings based on an exercise in "The Module Language" chapter, of Didier Rémy's book, Using, Understanding and Unraveling the OCaml Lanaguage.

Arithmetics and rings

We begin with a type for modules implementing arithmetic.


    module type ARITH = sig
      type t
    …

Polynomials over rings

This post provides a workout in generic programming using modules & functors.

Arithmetics and rings

We begin with a type for modules implementing arithmetic.


    module type ARITH = sig
      type t
      val of_int : int -> t            val to_int : t -> int
      val of_string : string -> t      val to_string : t -> string
      val zero : t                     val one : t
      val add : t -> t -> t            val sub : t -> t -> t
      val mul : t -> t -> t            val div : t -> t -> t
      val compare : t -> t -> int      val equal : t -> t -> bool
    end;;

A ring is a set equipped with two binary operations that generalize the arithmetic operations of addition and multiplication.


    module type RING = sig
      type t
      type extern_t
      val print : t -> unit
      val make : extern_t -> t         val show : t -> extern_t
      val zero : t                     val one : t
      val add : t -> t -> t            val mul : t -> t -> t
      val equal : t -> t -> bool
    end;;

We can build rings over arithmetics with functors. This particular one fixes the external representation of the elements of the ring to int.


    module Ring (A : ARITH) :
      RING  with type t = A.t and type extern_t = int =
    struct
      include A
      type extern_t = int
      let make = of_int                let show = to_int
      let print x = print_int (show x)
    end;;

Thus, here for example are rings over various specific arithmetic modules.


    module Ring_int32 = Ring (Int32);;
    module Ring_int64 = Ring (Int64);;
    module Ring_nativeint = Ring (Nativeint);;
    module Ring_int = Ring (
      struct
        type t = int
        let of_int x = x                   let to_int x = x
        let of_string = int_of_string      let to_string = string_of_int
        let zero = 0                       let one = 1
        let add = ( + )                    let sub = ( - )
        let mul = ( * )                    let div = ( / )
        let compare = Pervasives.compare   let equal = ( = )
      end
    );;

Polynomials

We define now the type of polynomials.


    module type POLYNOMIAL = sig
      type coeff (*Type of coefficients*)
      type coeff_extern_t (*Type of coeff. external rep*)

      (*Polynomials satisfy the ring interface*)
      include RING (*Declares a type [t] and [extern_t]*)

      (*Function to evaluate a polynomial at a point*)
      val eval : t -> coeff -> coeff
    end;;

Given a module implementing a ring, we can generate a module implementing polynomials with coefficients drawn from the ring.


    module Polynomial (R : RING) :
      POLYNOMIAL with type coeff = R.t
      and type coeff_extern_t = R.extern_t
      and type extern_t = (R.extern_t * int) list =
    struct
    
      type coeff = R.t (*Coefficient type*)
      type coeff_extern_t = R.extern_t (*External coeff. rep*)
      type extern_t = (coeff_extern_t * int) list (*External polynomial rep*)
    
      (*List of coefficients and their powers*)
      type t = (coeff * int) list (*Invariant : Ordered by powers,
                                    lower order terms at the front*)

      (* ... *)

    end;;

As the comments indicate, the polynomial data structure is a list of pairs of coefficients and powers, ordered so that lower powers come before higher ones. Here's a simple printing utility to aid visualization.


    let print p =
      List.iter
        (fun (c, k) -> Printf.printf "+ (";
          R.print c;
          Printf.printf ")X^%d " k)
        p

In order that we get a canonical representation, null coefficients are eliminated. In particular, the null monomial is simply the empty list.


    let zero = []

The multiplicative identity one is not really necessary as it is just a particular monomial however, its presence makes polynomials themselves satisfy the interface of rings.


    let one = [R.one, 0]

This helper function constructs monomials.


    let monomial (a : coeff) (k : int) =
      if k < 0 then
        failwith "monomial : negative powers not supported"
      else if R.equal a R.zero then [] else [a, k]

Next up, we define addition of polynomials by the following function. Care is taken to ensure the representation invariant is respected.


    let rec add u v =
      match u, v with
      | [], _ -> v
      | _, [] -> u
      | ((c1, k1) :: r1 as p1), ((c2, k2) :: r2 as p2) ->
        if k1 < k2 then
          (c1, k1) :: (add r1 p2)
        else if k1 = k2 then
          let c = R.add c1 c2 in
          if R.equal c R.zero then add r1 r2
          else (c, k1) :: (add r1 r2)
        else (c2, k2) :: (add p1 r2)

With monomial and add avaialable, we can now write make that computes a polynomial from an external representation. We also give the inverse function show here too.


    let make l =
      List.fold_left (fun acc (c, k) ->
        add (monomial (R.make c) k) acc) zero l

    let show p =
      List.fold_right (fun (c, k) acc -> (R.show c, k) :: acc) p []

The module private function times left-multiplies a polynomial by a monomial.


    let rec times (c, k) = function
      | [] -> []
      | (c1, k1) :: q ->
        let c2 = R.mul c c1 in
        if R.equal c2 R.zero then times (c, k) q
        else (c2, k + k1) :: times (c, k) q

Given the existence of times, polynomial multiplication can be expressed in a "one-liner".


    let mul p = List.fold_left (fun r m -> add r (times m p)) zero

Comparing two polynomials for equality is achieved with the following predicate.


    let rec equal p1 p2 =
      match p1, p2 with
      | [], [] -> true
      | (c1, k1) :: q1, (c2, k2) :: q2 ->
        k1 = k2 && R.equal c1 c2 && equal q1 q2
      | _ -> false

In the course of evaluating polynomials for a specific value of their indeterminate, we'll require a function for computing powers. The following routine uses the exponentiation by squaring technique.


    let rec pow c = function
      | 0 -> R.one
      | 1 -> c
      | k ->
        let l = pow c (k lsr 1) in
        let l2 = R.mul l l in
        if k land 1 = 0 then l2 else R.mul c l2

Finally, the function eval for evaluation of a polynomial at a specific point. The implementation uses Horner's rule for computationally efficient evaluation.


    let eval p c = match List.rev p with
      | [] -> R.zero
      | (h :: t) ->
        let reduce (a, k) (b, l) =
          let n = pow c (k - l) in
          let t = R.add (R.mul a n) b in
          (t, l)  in
        let a, k = List.fold_left reduce h t in
        R.mul (pow c k) a

Testing and example usage

The following interactive session creates a polynomial with integer coefficients module and uses it to confirm the equivalence of $(1 + x)(1 - x)$ with $(1 - x^{2})$.


    # #use "polynomial.ml";;
    # module R = Ring_int;;
    # module P = Polynomial (R);;
    # let p = P.mul (P.make [(1, 0); (1, 1)]) (P.make [(1, 0); (-1, 1)]);;
    # let q = P.make [(1, 0); (-1, 2)];;
    # P.equal p q;;
    - : bool = true

Polynomials in two variables, can be treated as univariate polynomials with polynomial coefficients. For example, the polynomial $(X + Y)$ can be regarded as $(1 \times X^{1})Y^{0} + (1 \times X^{0})Y^{1}$. Similarly we can write $X - Y$ as $(1 \times X^{1})Y^{0} + (-1 \times X^{0}) Y^1$ and now check the equivalence $(X + Y)(X - Y) = (X^{2} - Y^{2})$.


    #module Y = Polynomial (P);;
    
    #(* (X + Y) *)
    #let r = Y.make [
    #  ([1, 1], 0); (*(1 X^1) Y^0*)
    #  ([1, 0], 1)  (*(1 X^0) Y^1*)
    #];;

    #(* (X - Y) *)
    #let s = Y.make [
    #  ([1, 1], 0); (*(1 X^1) Y^0*)
    #  ([-1, 0], 1) (*((-1) X^0) Y^1*)
    #];;

    #(* (X^2 - Y^2) *)
    #let t = Y.make [
    #  ([1, 2], 0);   (*(1 X^2) Y^0*)
    #  ([-1, 0], 2)  (* (-1 X^0) Y^2*)
    #];;

    #Y.equal (Y.mul r s) t;;
    - : bool = true

Hide

Weekly News — OCaml Weekly News, Mar 21, 2017

Trivial meta-programming with cinaps — Jane Street (Jeremie Dimino), Mar 20, 2017

From now and then, I found myself having to write some mechanical and repetitive code. The usual solution for this is to write a code generator; for instance in the form of a ppx rewriter in the case of OCaml code. This however comes with a cost: code generators are harder to review than plain code and it is a new syntax to learn for other developers. So when the repetitive pattern is local to a specific library or not widely used, it is often not worth the effort. Especially if the code in ques…

Then there is the possibility of using a macro pre-processor such as cpp or cppo which is the equivalent of cpp but for OCaml. This can help in some cases but this has a cost as well:

macros generally make the code harder to read
errors tends to be harder to understand since they don't point where you'd expect
you can say goodbye to merlin

In fact, when the repetitive pattern is specific to one particular case and of reasonable size, committing and reviewing the generated code is acceptable. That's the problem Cinaps tries to solve.

What is cinaps?

Cinaps is an application that reads input files and recognize special syntactic forms. Such forms are expected to embed some OCaml code printing something to stdout. What they print is compared against what follow these special forms. The rest works exactly the same as expectation tests.

The special form is (*$ <ocaml-code> *) for ml source files, /*$ <ocaml-code> */ for C source files and #|$ <ocaml-code> |# for S-expression files.

For instance:

$ cat file.ml
let x = 1
(*$ print_newline ();
    List.iter (fun s -> Printf.printf "let ( %s ) = Pervasives.( %s )\n" s s)
      ["+"; "-"; "*"; "/"] *)
(*$*)
let y = 2

$ cinaps file.ml
---file.ml
+++file.ml.corrected
File "file.ml", line 5, characters 0-1:
  let x = 1
  (*$ print_newline ();
      List.iter (fun s -> Printf.printf "let ( %s ) = Pervasives.( %s )\n" s s)
        ["+"; "-"; "*"; "/"] *)
+|let ( + ) = Pervasives.( + )
+|let ( - ) = Pervasives.( - )
+|let ( * ) = Pervasives.( * )
+|let ( / ) = Pervasives.( / )
  (*$*)
  let y = 2

$ echo $?
1
$ cp file.ml.corrected file.ml
$ cinaps file.ml
$ echo $?
0

Real example

What follows is a real example where using Cinaps made the code much easier to write and maintain. However, I changed the names for this blog post since this code is not released publicly. Note also that this example shows one way we usually write C bindings at Jane Street. It is not meant as a model of how to write C bindings, and the excellent ctypes library should be the default choice in most cases. However, this code pre-dates ctypes and migrating it would be quite a lot of work.

The example itself is part of a C binding that I wrote a few years ago. While doing so I used Core.Flags in order to represent a few C enumerations on the OCaml side. Core.Flags is a module providing a nice abstraction for representing C flags.

The OCaml code looks like what you'd expect from code using Core.Flags:

module Open_flags = struct
external get_rdonly : unit -> Int63.t = "mylib_O_RDONLY" [@@noalloc]
external get_wronly : unit -> Int63.t = "mylib_O_WRONLY" [@@noalloc]
external get_rdwr : unit -> Int63.t = "mylib_O_RDWR" [@@noalloc]
external get_nonblock : unit -> Int63.t = "mylib_O_NONBLOCK" [@@noalloc]
external get_append : unit -> Int63.t = "mylib_O_APPEND" [@@noalloc]
external get_creat : unit -> Int63.t = "mylib_O_CREAT" [@@noalloc]
external get_trunc : unit -> Int63.t = "mylib_O_TRUNC" [@@noalloc]
external get_excl : unit -> Int63.t = "mylib_O_EXCL" [@@noalloc]
external get_noctty : unit -> Int63.t = "mylib_O_NOCTTY" [@@noalloc]
external get_dsync : unit -> Int63.t = "mylib_O_DSYNC" [@@noalloc]
external get_sync : unit -> Int63.t = "mylib_O_SYNC" [@@noalloc]
external get_rsync : unit -> Int63.t = "mylib_O_RSYNC" [@@noalloc]
let rdonly = get_rdonly ()
let wronly = get_wronly ()
let rdwr = get_rdwr ()
let nonblock = get_nonblock ()
let append = get_append ()
let creat = get_creat ()
let trunc = get_trunc ()
let excl = get_excl ()
let noctty = get_noctty ()
let dsync = get_dsync ()
let sync = get_sync ()
let rsync = get_rsync ()
include Flags.Make(struct
let known =
[ rdonly , "rdonly"
; wronly , "wronly"
; rdwr , "rdwr"
; nonblock , "nonblock"
; append , "append"
; creat , "creat"
; trunc , "trunc"
; excl , "excl"
; noctty , "noctty"
; dsync , "dsync"
; sync , "sync"
; rsync , "rsync"
]
let remove_zero_flags = false
let allow_intersecting = false
let should_print_error = true
end)
end

And there are about 3 modules like this in this file, plus the corresponding stubs in the C file. Writing this code initially was no fun, and adding new flags now that the C library has evolved is still no fun.

The rest of this section explains how to make it more fun with cinaps.

Setting up and using cinaps

First I add a rule in the build system to call cinaps appropriately. I use a few settings specific to our jenga based builds and it is currently not possible to replicate this outside of Jane Street, but assuming you have a Makefile, you can write:

.PHONY: cinaps
cinaps:
    cinaps -i src/*.ml src/*.c

Now whenever you call make cinaps, all the files will be updated in place. You can then do git diff to see what changed.

Then I write a file src/cinaps_helpers. It is plain OCaml source file, however it is not suffixed with .ml so that it is not confused with a regular module of the library. It contains the various bits that are common between the ml/C files in the library:

(* -*- tuareg -*- *)
let stub_prefix = "mylib_"
let stub name = stub_prefix ^ name
let open_flags =
[ "O_RDONLY"
; "O_WRONLY"
; "O_RDWR"
; "O_NONBLOCK"
; "O_APPEND"
; "O_CREAT"
; "O_TRUNC"
; "O_EXCL"
; "O_NOCTTY"
; "O_DSYNC"
; "O_SYNC"
; "O_RSYNC"
]
let other_flags =
[ ...
]
let yet_other_flags =
[ ...
]
let all_flags = open_flags @ other_flags @ yet_other_flags
open StdLabels
open Printf
let pr fmt = printf (fmt ^^ "\n")
let flags_module module_name flags ~prefix ~allow_intersection =
<code to print an Open_flags like module>

Now, in my original .ml file, I can write:

(*$ #use "cinaps_helpers" $*)
(*$ flags_module "Open_flags" open_flags ~prefix:"O_" ~allow_intersecting:false *)
module Open_flags = struct
external get_rdonly : unit -> Int63.t = "mylib_O_RDONLY" [@@noalloc]
external get_wronly : unit -> Int63.t = "mylib_O_WRONLY" [@@noalloc]
external get_rdwr : unit -> Int63.t = "mylib_O_RDWR" [@@noalloc]
external get_nonblock : unit -> Int63.t = "mylib_O_NONBLOCK" [@@noalloc]
external get_append : unit -> Int63.t = "mylib_O_APPEND" [@@noalloc]
external get_creat : unit -> Int63.t = "mylib_O_CREAT" [@@noalloc]
external get_trunc : unit -> Int63.t = "mylib_O_TRUNC" [@@noalloc]
external get_excl : unit -> Int63.t = "mylib_O_EXCL" [@@noalloc]
external get_noctty : unit -> Int63.t = "mylib_O_NOCTTY" [@@noalloc]
external get_dsync : unit -> Int63.t = "mylib_O_DSYNC" [@@noalloc]
external get_sync : unit -> Int63.t = "mylib_O_SYNC" [@@noalloc]
external get_rsync : unit -> Int63.t = "mylib_O_RSYNC" [@@noalloc]
let rdonly = get_rdonly ()
let wronly = get_wronly ()
let rdwr = get_rdwr ()
let nonblock = get_nonblock ()
let append = get_append ()
let creat = get_creat ()
let trunc = get_trunc ()
let excl = get_excl ()
let noctty = get_noctty ()
let dsync = get_dsync ()
let sync = get_sync ()
let rsync = get_rsync ()
include Flags.Make(struct
let known =
[ rdonly , "rdonly"
; wronly , "wronly"
; rdwr , "rdwr"
; nonblock , "nonblock"
; append , "append"
; creat , "creat"
; trunc , "trunc"
; excl , "excl"
; noctty , "noctty"
; dsync , "dsync"
; sync , "sync"
; rsync , "rsync"
]
let remove_zero_flags = false
let allow_intersecting = false
let should_print_error = true
end)
end
(*$*)

And cinaps will check that the text between the (*$ ... *) and (*$*) forms is what is printed by flags_module "Open_flags" .... I write something similar in the .c file. Note the initial (*$ ... $*) form, which is not expected to print anything and is only used for its other side effects.

Adding new flags become trivial: add it to the list in src/cinaps_helper and execute make cinaps.

Pushing the system

Now I decide that I don't like the fact that all my constant flags are initialized at runtime and I want them to be static constant on the ml side. A simple way to do this is to write a C program that include the right headers and output a .ml file defining these constants. I use cynaps to write this C file as well:

/*$ #use "cinaps_helpers" $*/
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int main()
{
printf("open Core\n");
printf("let mk = Int63.of_int_exn\n");
/*$
printf "\n";
let len = longest all_flags in
List.iter all_flags ~f:(fun f ->
pr {| printf("let _%-*s = mk %%d\n", %-*s);|} len f len f );
printf " " */
printf("let _O_RDONLY = mk %d\n", O_RDONLY );
printf("let _O_WRONLY = mk %d\n", O_WRONLY );
printf("let _O_RDWR = mk %d\n", O_RDWR );
printf("let _O_NONBLOCK = mk %d\n", O_NONBLOCK);
printf("let _O_APPEND = mk %d\n", O_APPEND );
printf("let _O_CREAT = mk %d\n", O_CREAT );
printf("let _O_TRUNC = mk %d\n", O_TRUNC );
printf("let _O_EXCL = mk %d\n", O_EXCL );
printf("let _O_NOCTTY = mk %d\n", O_NOCTTY );
printf("let _O_DSYNC = mk %d\n", O_DSYNC );
printf("let _O_SYNC = mk %d\n", O_SYNC );
printf("let _O_RSYNC = mk %d\n", O_RSYNC );
/*$*/
return 0;
}

Updating the various flag modules in the the ml code is as simple as editing src/cinaps_helpers and doing make cinaps:

(*$ flags_module "Open_flags" open_flags ~prefix:"O_" ~allow_intersecting:false *)
module Open_flags = struct
let rdonly = Consts._O_RDONLY
let wronly = Consts._O_WRONLY
let rdwr = Consts._O_RDWR
let nonblock = Consts._O_NONBLOCK
let append = Consts._O_APPEND
let creat = Consts._O_CREAT
let trunc = Consts._O_TRUNC
let excl = Consts._O_EXCL
let noctty = Consts._O_NOCTTY
let dsync = Consts._O_DSYNC
let sync = Consts._O_SYNC
let rsync = Consts._O_RSYNC
include Flags.Make(struct
let known =
[ Consts._O_RDONLY , "rdonly"
; Consts._O_WRONLY , "wronly"
; Consts._O_RDWR , "rdwr"
; Consts._O_NONBLOCK , "nonblock"
; Consts._O_APPEND , "append"
; Consts._O_CREAT , "creat"
; Consts._O_TRUNC , "trunc"
; Consts._O_EXCL , "excl"
; Consts._O_NOCTTY , "noctty"
; Consts._O_DSYNC , "dsync"
; Consts._O_SYNC , "sync"
; Consts._O_RSYNC , "rsync"
]
let remove_zero_flags = false
let allow_intersecting = false
let should_print_error = true
end)
end
(*$*)

Tweak: indenting the generated code

You can either write cinaps code that produce properly indented code, or you can use the styler option:

.PHONY: cinaps
cinaps:
    cinaps -styler ocp-indent -i src/*.ml src/*.c

History behind the name

I initially wrote this tool while I did some work on the ocaml-migrate-parsetree project. ocaml-migrate-parsetree was started by Alain Frisch and continued by Frederic Bour and aims at providing a solid and stable base for authors of ppx rewriters or other tools using the OCaml frontend. I helped a bit during development and did some testing on a large scale while rebasing our ppx infrastructure on top it.

Due to its nature, this project contains a lot of repetitive code that cannot be factorized other than by using some kind of meta-programming. Initially we had a small pre-preprocessor that was interpreting a made-up syntax and was working like cpp does. The syntax was yet another DSL and the generated code was generated on the fly. This made the .ml and .mli files harder to understand since you had to decode this DSL in order to understand what the code was.

Cinaps replaced this tool and the name was chosen to emphasize that it is not a preprocessor. It means "Cinaps Is Not A Preprocessing System".

Status

Cinaps is published on github and is part of the upcoming v0.9 Jane Street release. The version that is published doesn't yet support the C/S-expression syntaxes but once the stable release has gone through, an updated version of Cinaps supporting these syntaxes will be released.

Hide

New opam features: “opam build” — OCamlPro, Mar 16, 2017

The new opam 2.0 release, currently in beta, introduces several new features. This post gets into some detail on the new opam build command, its purpose, its use, and some implementation aspects.

opam build is run from the source tree of a project, and does not rely on a pre-existing opam installation. As such, it adds a new option besides the existing workflows based on managing shared OCaml installations in the form of switches.

What does it do ?

Typically, this is used in a fresh git clone of…

The new opam 2.0 release, currently in beta, introduces several new features. This post gets into some detail on the new opam build command, its purpose, its use, and some implementation aspects.

What does it do ?

Typically, this is used in a fresh git clone of some OCaml project. Like when pinning the package, opam will find and leverage package definitions found in the source, in the form of opam files.

if opam hasn’t been initialised (no ~/.opam), this is taken care of.
if no switch is otherwise explicitely selected, a local switch is used, and
created if necessary (i.e. in ./_opam/)
the metadata for the current project is registered, and the package installed
after its dependencies, as opam usually does

This is particularly useful for distributing projects to people not used to opam and the OCaml ecosystem: the setup steps are automatically taken care of, and a single opam build invocation can take care of resolving the dependency chains for your package.

If building the project directly is preferred, adding --deps-only is a good way to get the dependencies ready for the project:

  $ opam build --deps-only
  $ eval $(opam config env)
  $ ./configure; make; etc.

Note that if you just want to handle project-local opam files, opam build can also be used in your existing switches: just specify --no-autoinit, --switch or make sure the OPAMSWITCH variable is set. E.g. opam build --no-autoinit --deps-only is a convenient way to get the dependencies for the local project ready in your current switch.

Additional functions

Installation

The installation of the packages happens as usual to the prefix corresponding to the switch used (<project-root>/_opam/ for a local switch). But it is possible, with --install-prefix, to further install the package to the system:

opam build --install-prefix ~/local

will install the results of the package found in the current directory below ~/local.

The dependencies of the package won’t be installed, so this is intended for programs, assuming they are relocatable, and not for libraries.

Choosing custom repositories

The user can pre-select the repositories to use on the creation of the local switch with:

  opam build --repositories <repos>

where <repos> is a comma-separated list of repositories, specified either as name=URL, or name if already configured on the system.

Multiple packages

Multiple packages are commonly found to share a single repository. In this case, opam build registers and builds all of them, respecting cross-dependencies. The opam files to use can also be explicitely selected on the command-line. In this case, specific opam files must be named <package-name>.opam.

Implementation details

The choice of the compiler, on automatic initialisation, is either explicit, using the --compiler option, or automatic. In the latter case, the default selection is used (see opam init --help, section “CONFIGURATION FILE” for details), but a compiler compatible with the local packages found is searched from that. This allows, for example, to choose a system compiler when available and compatible, avoiding a recompilation of OCaml.

When using --install-prefix, the normal installation is done, then the tracking of package-installed files, introduced in opam 2.0, is used to extract the installed files from the switch and copy them to the prefix.

The packages installed through opam build are not registered in any repository, and this is not an implicit use of opam pin: the rationale is that packages installed this way will also be updated by repeating opam build. This means that when using other commands, e.g. opam upgrade, opam won’t try to keep the packages to their local, source version, and will either revert them to their repository definition, or remove them, if they need recompilation.

Planned extensions

This is still in beta: there are still rough edges, please experiment and give feedback! It is still possible that the command syntax and semantics change significantly before release.

Another use-case that we are striving to improve is sharing of development setups (share sets of pinned packages, depend on specific remotes or git hashes, etc.). We have many ideas to improve on this, but opam build is not, as of today, a direct solution to this. In particular, installing this way still relies on the default opam repository; a way to define specific options for the switch that is implicitely created on opam build is in the works.

NOTE: this article is cross-posted on opam.ocaml.org and ocamlpro.com. Please head to the latter for the comments!

Hide

One more talk, two more videos — Jane Street (Yaron Minsky), Mar 15, 2017

I'm happy to announce our next public tech talk, called Seven Implementations of Incremental, on Wednesday, April 5th, presented by yours truly. You can register here.

The talk covers the history of Incremental, a library for building efficient online algorithms. The need to update computations incrementally is pretty common, and we've found Incremental to be useful in creating such computations in a number of different domains, from constructing efficient financial calculations to writing resp…

Weekly News — OCaml Weekly News, Mar 14, 2017

From visitors to iterators — GaGallium (François Pottier), Mar 14, 2017

I have been asked whether an automatically-generated visitor, as produced by the visitors syntax extension for OCaml, can be used to construct an iterator.

It turns out that this can be done in a simple and efficient manner. (Up to a constant factor, the time complexity of this solution is optimal.) As the problem is interesting and its solution is somewhat nonobvious, I am describing them here.

To play with this code in an OCaml toplevel, first install visitors via the command opam instal…

I have been asked whether an automatically-generated visitor, as produced by the visitors syntax extension for OCaml, can be used to construct an iterator.

To play with this code in an OCaml toplevel, first install visitors via the command opam install visitors. Then, launch ocaml and type this:

#use "topfind";;
#require "visitors.ppx";;
#require "visitors.runtime";;

Problem Statement

Suppose we have an arbitrary data structure that contains elements of type 'a. Here, it is a binary tree, but it could be anything:

type 'a sometree =
  | Leaf
  | Node of 'a sometree * 'a * 'a sometree

We would like to enumerate the elements of this data structure. More precisely, we would like to construct an iterator, that is, an on-demand producer of elements. Here is a simple definition of a stateful iterator:

type 'a iterator =
  unit -> 'a option

The question is, can we construct an iterator for the type 'a sometree, based on an automatically-generated visitor, so that the construction is entirely independent of the type 'a sometree?

Cascades

For starters, let us define cascades, which are a more pleasant kind of iterators. A cascade is a persistent (stateless) iterator. It can be thought of as a delayed list, that is, a list whose elements are computed only on demand.

Cascades could (should) be part of a separate library. As the time of writing (March, 2017), there is in fact a proposal to add them to OCaml's standard library.

type 'a cascade =
  unit -> 'a head

and 'a head =
  | Nil
  | Cons of 'a * 'a cascade

A delayed computation is represented as a function of type unit -> _.

The cascade constructors are as follows:

let nil : 'a cascade =
  fun () -> Nil

let cons (x : 'a) (xs : 'a cascade) : 'a cascade =
  fun () -> Cons (x, xs)

Forcing a cascade reveals its head:

let force xs =
  xs()

A cascade can be easily converted to a stateful iterator:

let cascade_to_iterator (xs : 'a cascade) : 'a iterator =
  let s = ref xs in
  fun () ->
    match force !s with
    | Nil ->
        s := nil;
        None
    | Cons (x, xs) ->
        s := xs;
        Some x

In the above code, writing nil into s may seem superfluous, but is in fact necessary to guarantee that the computation that led to a Nil outcome is not repeated in the future.

Because cascades are close cousins of lists, they are easy to work with. Constructing a cascade for a tree-like data structure is straightforward, whereas directly constructing a stateful iterator would be more involved.

Back to the problem

Now, can we use some kind of visitor to turn a tree of type 'a sometree into a cascade of type 'a cascade?

At first sight, this does not seem very easy, because of two issues:

a visitor usually traverses a tree in an eager manner, whereas we need the traversal to make progress only as cascade elements are demanded;
a visitor performs a bottom-up computation, without a left-to-right bias (assuming mutable state is not used), whereas a cascade enumerates elements in a left-to-right manner. (Or in a right-to-left manner. As will be apparent below, both directions are possible.)

The trick is to use another intermediate step. Instead of turning a tree directly into a cascade, we first transform it into a generic tree-like structure: a delayed tree. The first issue above is solved because, by introducing delays into the new tree, we allow its construction to be carried out on demand. The second issue is solved because this tree-to-tree transformation can be carried out in a purely bottom-up manner by a reduce visitor. Then, finally, it is straightforward to transform a delayed tree into a cascade.

From Delayed Trees to Cascades and Iterators

A delayed tree contains ordinary nodes of arity 0, 1, and 2. Furthermore, it contains DTDelay nodes, of arity 1, whose child is delayed, that is, computed only on demand.

type 'a delayed_tree =
  | DTZero
  | DTOne of 'a
  | DTTwo of 'a delayed_tree * 'a delayed_tree
  | DTDelay of (unit -> 'a delayed_tree)

A delayed tree is converted to a cascade as follows. As is often the case, when building a cascade, one must take a continuation k as an argument, so as to avoid naive and costly cascade concatenation operations. Thus, the specification of the function call delayed_tree_to_cascade dt k is to construct a cascade whose elements are the elements of the delayed tree dt (listed from left to right), followed with the elements of the cascade k.

let rec delayed_tree_to_cascade (dt : 'a delayed_tree) (k : 'a cascade)
: 'a cascade =
  fun () -> delayed_tree_to_head dt k

and delayed_tree_to_head (dt : 'a delayed_tree) (k : 'a cascade) : 'a head =
  match dt with
  | DTZero ->
      force k
  | DTOne x ->
      Cons (x, k)
  | DTTwo (dt1, dt2) ->
      delayed_tree_to_head dt1 (delayed_tree_to_cascade dt2 k)
  | DTDelay dt ->
      delayed_tree_to_head (force dt) k

let delayed_tree_to_cascade (dt : 'a delayed_tree) : 'a cascade =
  delayed_tree_to_cascade dt nil

let delayed_tree_to_iterator (dt : 'a delayed_tree) : 'a iterator =
  cascade_to_iterator (delayed_tree_to_cascade dt)

In the above code, we have chosen to perform a left-to-right traversal of the delayed tree, but could just as well have chosen a right-to-left traversal.

It is possible to make the delayed tree data structure implicit in the code; this is explained further on.

Constructing Delayed Trees

The type 'a delay is a synonym for 'a. We will use it as a decoration, in a type definition, to indicate that a call to the method visit_delay is desired.

type 'a delay = 'a

We now set up four constructor functions or constructor methods, which construct delayed trees, and which we will use in a reduce visitor.

Delayed trees form a monoid, in the sense that we concatenate them using DTTwo, and the neutral element is DTZero. We package these two data constructors in the methods zero and plus, which are automatically called in an automatically-generated reduce visitor.

The visitor method visit_delay delays the visit of a subtree by constructing and returning a DTDelay node, which carries a delayed recursive call to a visitor.

The visitor function yield will be invoked at elements of type 'a. It constructs a one-element delayed tree.

class ['self] delayed_tree_monoid = object (_ : 'self)

  method zero =
    DTZero

  method plus s1 s2 =
    match s1, s2 with
    | DTZero, s
    | s, DTZero ->
        s
    | _, _ ->
        DTTwo (s1, s2)

  method visit_delay: 'env 'a .
    ('env -> 'a -> 'b delayed_tree) ->
    'env -> 'a delay -> 'b delayed_tree
  = fun visit_'a env x ->
      DTDelay (fun () -> visit_'a env x)

end

let yield _env x =
  DTOne x

In the method plus, above, treating DTZero specially is not mandatory. It is an optimization, which helps allocate fewer nodes.

Generating a Visitor

It is now time to generate a reduce visitor for the type 'a sometree. This is the only part of the code which is specific of sometree. Everything else is generic.

We must insert delays into the structure of the type 'a sometree so as to indicate where visit_delay should be called and (therefore) where DTDelay nodes should be allocated. To do this, we write a copy of the definition of the type 'a sometree, with extra uses of delay in it. The new type is actually considered equal to 'a sometree by OCaml. Its role is purely to carry a @@deriving visitors annotation.

In the data constructor Node, the left-hand delay is in fact superfluous. With or without it, our iterators will eagerly descend along the leftmost branch of a tree, anyway.

type 'a mytree = 'a sometree =
  | Leaf
  | Node of 'a mytree delay * 'a * 'a mytree delay

and 'a mytree_delay =
  'a mytree delay

[@@deriving visitors { variety = "reduce"; polymorphic = true;
                       concrete = true; ancestors = ["delayed_tree_monoid"] }]

This approach is pleasant insofar as one controls exactly where delays are inserted. However, it requires copying the type definition, which may be unpleasant. Another approach is described further on.

Using the Generated Visitor

For demonstration purposes, let us make our visitor verbose. In production, one should remove verbose_reduce and use reduce instead.

class ['self] verbose_reduce = object (_ : 'self)
  inherit [_] reduce as super
  method! visit_Leaf visit_'a env =
    Printf.printf "Visiting a leaf.\n%!";
    super#visit_Leaf visit_'a env
  method! visit_Node visit_'a env t1 x t2 =
    Printf.printf "Visiting a node.\n%!";
    super#visit_Node visit_'a env t1 x t2
end

The problem is solved! There remains to write a couple lines of glue code:

let sometree_to_delayed_tree (t : 'a sometree) =
  new verbose_reduce # visit_mytree_delay yield () t

let sometree_to_iterator (t : 'a sometree) : 'a iterator =
  delayed_tree_to_iterator (sometree_to_delayed_tree t)

We use visit_mytree_delay above, even though visit_mytree would work just as well, so as to ensure that we get a delayed tree whose root is a DTDelay node. Thus, absolutely no work is performed when the iterator is created; iteration begins only when the first element is demanded.

Demo

A little demo helps see what is going on.

let t : int sometree =
  Node (Node (Leaf, 1, Leaf), 2, Node (Leaf, 3, Leaf))

let i : int iterator =
  sometree_to_iterator t

Here is a transcript of an OCaml toplevel session:

  # i();;
  Visiting a node.
  Visiting a node.
  Visiting a leaf.
  - : int option = Some 1
  # i();;
  Visiting a leaf.
  - : int option = Some 2
  # i();;
  Visiting a node.
  Visiting a leaf.
  - : int option = Some 3
  # i();;
  Visiting a leaf.
  - : int option = None
  # i();;
  - : int option = None

Variant: Avoiding Duplication of the Type Definition

Earlier, we have generated a visitor for the existing type 'a sometree in a posteriori style. We have manually created an isomorphic copy of the type 'a sometree, which we have named 'a mytree, and have annotated this copy with [@@deriving visitors { ... }]. Furthermore, we have taken this opportunity to insert delay type constructors into the type, so as to influence the generated visitor.

This style is relatively pleasant because it is declarative and lets us control exactly where delays are inserted. However, it requires duplicating the definition of the type 'a sometree, which may be unpleasant (if the definition is large) or impossible (if the definition is hidden behind an abstraction barrier).

Another approach is to generate a visitor in a priori style. When the type 'a sometree is first defined, a reduce visitor can be immediately generated for it, as follows:

type 'a sometree =
  | Leaf
  | Node of 'a sometree * 'a * 'a sometree

[@@deriving visitors { variety = "reduce"; polymorphic = true;
                       name = "sometree_reduce" }]

At this point, we pretend that we do not know yet what this visitor will be used for, so we do not annotate the type definition with delay, and do not use delayed_tree_monoid as a base class. We get a visitor class, named sometree_reduce. This class has two virtual methods, zero and plus.

Then, we create a subclass, named reduce, which we tailor to our needs. We mix the generated class sometree_reduce with the class delayed_tree_monoid, and insert a delay at every tree node by overriding the method visit_sometree:

class ['self] reduce = object (self : 'self)
  inherit [_] sometree_reduce as super
  inherit [_] delayed_tree_monoid
  method! visit_sometree visit_'a env t =
    self#visit_delay (super#visit_sometree visit_'a) env t
end

The rest of the code is unchanged (except the method visit_mytree_delay no longer exists; one calls visit_sometree instead).

Variant: Getting Rid of Explicit Delayed Trees

I like to present delayed trees as an explicit data structure, because this helps understand what is going on. However, if desired, it is possible to hide them by refunctionalization (the opposite of defunctionalization).

Above, the function delayed_tree_to_cascade was written with the help of an auxiliary function, delayed_tree_to_head. We could also have written it directly, as follows:

let rec delayed_tree_to_cascade (dt : 'a delayed_tree) (k : 'a cascade)
: 'a cascade =
  match dt with
  | DTZero ->
      k
  | DTOne x ->
      cons x k
  | DTTwo (dt1, dt2) ->
      delayed_tree_to_cascade dt1 (delayed_tree_to_cascade dt2 k)
  | DTDelay dt ->
      fun () -> delayed_tree_to_cascade (force dt) k ()

In this form, the only function that is ever applied to a delayed tree is delayed_tree_to_cascade. Therefore, wherever we construct a delayed tree t, we could instead directly build a closure whose behavior is equivalent to delayed_tree_to_cascade t. This is refunctionalization.

The data structure of delayed trees seems to disappears. The type 'a delayed_tree can be defined as a synonym for a 'a cascade -> 'a cascade. I usually refer to this type as 'a producer: it is the type of a function that concatenates elements in front of the cascade that it receives as an argument.

type 'a producer =
  'a cascade -> 'a cascade

type 'a delayed_tree =
  'a producer

The four data constructors are defined as follows:

let _DTZero k =
  k

let _DTOne x k =
  cons x k

let _DTTwo dt1 dt2 k =
  dt1 (dt2 k)

let _DTDelay dt k =
  fun () -> force dt k ()

The reader can check that the types of these data constructors are the same as in the previous version of the code. For instance, _DTDelay has type (unit -> 'a delayed_tree) -> 'a delayed_tree.

The root function delayed_tree_to_cascade (without a continuation argument) is defined as follows:

let delayed_tree_to_cascade (dt : 'a delayed_tree) : 'a cascade =
  dt nil

The delayed tree monoid uses the new versions of the four data constructors:

class ['self] delayed_tree_monoid = object (_ : 'self)

  method zero =
    _DTZero

  method plus =
    _DTTwo

  method visit_delay: 'env 'a .
    ('env -> 'a -> 'b delayed_tree) ->
    'env -> 'a delay -> 'b delayed_tree
  = fun visit_'a env x ->
      _DTDelay (fun () -> visit_'a env x)

end

let yield _env x =
  _DTOne x

The four functions _DTZero, _DTOne, _DTTwo, and _DTDelay could be inlined, if desired, so as to make the above code seem even more concise.

The rest of the code is unchanged.

Acknowledgements

KC Sivaramakrishnan asked whether a visitor can be used to construct an iterator. Gabriel Scherer pointed out that the delayed tree data structure can be hidden by refunctionalization.

Hide

View older blog posts.

OCaml Planet

Proving a mem/map property — Shayne Fletcher, May 11, 2017

New opam features: more expressive dependencies — OCamlPro, May 11, 2017

Specifying dependencies in opam 1.2

Specifying dependencies in opam 1.2

Extensions in opam 2.0

Conditional dependencies

Dependency flags

Computed versions

Conclusion

Weekly News — OCaml Weekly News, May 09, 2017

Full Time: Software Developer (Functional Programming) at Jane Street in New York, NY; London, UK; Hong Kong — GitHub Jobs, May 08, 2017

Preprocessor extensions for code generation — Shayne Fletcher, May 04, 2017

Preprocessor extensions for code generation

Preprocessor extensions for code generation

Implementing the "id_of" ppx

The basic strategy

Implementing the mappings

Building and testing

New opam features: “opam install DIR” — OCamlPro, May 04, 2017

Status of opam …

Status of opam build

Handling a local project

Specifying a destdir

Initialising

More related options

Weekly News — OCaml Weekly News, May 02, 2017

Looking for a technical writer — Jane Street (Yaron Minsky), May 01, 2017

What do you mean ExceptT doesn't Compose? — Erik de Castro Lopo, Apr 30, 2017

New opam features: local switches — OCamlPro, Apr 27, 2017

A reminder about switches

A reminder about switches

Using within projects

Interface

Implementation

Current status

Limitations and future extensions

Caveat Configurator: how to replace configs with code, and why you might not want to — Jane Street (Yaron Minsky), Apr 25, 2017

Weekly News — OCaml Weekly News, Apr 25, 2017

Full Time: Front-end Developer at issuu in Copenhagen — GitHub Jobs, Apr 24, 2017

Seventeenth OCaml compiler hacking evening at Pembroke — OCaml Labs compiler hacking, Apr 18, 2017

News about Tyre | Drup's thingies — Gabriel Radanne, Apr 17, 2017

EzSudoku — OCamlPro, Apr 11, 2017

Solving it

Wrapping it up

Weekly News — OCaml Weekly News, Apr 11, 2017

Version 0.0.2 of the Frama-Clang plugin plugin is available for download. — Frama-C, Apr 05, 2017

Weekly News — OCaml Weekly News, Apr 04, 2017

Ann: Pumping | Drup's thingies — Gabriel Radanne, Apr 01, 2017

Dealing with source code locations (in lexical and syntax analysis) — Shayne Fletcher, Mar 30, 2017

Describing locations

Error reporting with locations

Exception handling involving errors with locations

Usage

Weekly News — OCaml Weekly News, Mar 28, 2017

Two PhD positions in Ljubljana starting October 2017 — Andrej Bauer, Mar 28, 2017

Polynomials over rings — Shayne Fletcher, Mar 21, 2017

Arithmetics and rings

Arithmetics and rings

Polynomials

Testing and example usage

Weekly News — OCaml Weekly News, Mar 21, 2017

Trivial meta-programming with cinaps — Jane Street (Jeremie Dimino), Mar 20, 2017

What is cinaps?

Real example

Setting up and using cinaps

Pushing the system

Tweak: indenting the generated code

History behind the name

Status

New opam features: “opam build” — OCamlPro, Mar 16, 2017

What does it do ?

What does it do ?

Additional functions

Installation

Choosing custom repositories

Multiple packages

Implementation details

Planned extensions

One more talk, two more videos — Jane Street (Yaron Minsky), Mar 15, 2017

Implementing the "`id_of`" ppx

Status of `opam …`

Status of `opam build`