Concurrency, Parallelism, and Distributed Systems
Concurrency refers to running multiple computations more or less simultaneously, typically by interleaving them, whereas parallelism refers to using multiple cores or OS-level threads to execute computations at the same time. The former is relatively safe and easy to reason about, whereas the latter is much harder to get right and is a common source of subtle bugs. OCaml currently supports concurrency elegantly, but parallelism support is not built into the runtime.
- lwt: a monadic concurrency library. Concurrent code uses monads to express the higher-level abstractions of control flow.
- Async: another monadic concurrency library developed by Jane Street. This library is covered in Real World OCaml. While the concept is very similar to lwt, small discrepancies make compatibility between the libraries difficult.
- Bindings to libuv, the event loop-based library that runs Node.js. This is also a replacement for the Unix module, allowing for full process control in a system-independent manner.
- The blog post that introduced Async
- A user gives up on Async
- Cooperative Concurrency in OCaml: Using Async
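To make the idea of monadic concurrency concrete, here is a minimal toy sketch of a cooperative scheduler written against the standard library only. It is not how lwt or Async are implemented, and every name in it (`return`, `bind`, `yield`, `spawn`, `run`, `record`, `worker`) is a hypothetical helper chosen for illustration. What it shares with those libraries is the style: tasks are chained with a bind operator, and control passes to other tasks only at explicit points.

```ocaml
(* A toy cooperative-concurrency monad. Illustration only: this is NOT
   the implementation or API of lwt or Async. *)

(* A computation is a function that receives its continuation. *)
type 'a t = ('a -> unit) -> unit

let return (x : 'a) : 'a t = fun k -> k x

let bind (m : 'a t) (f : 'a -> 'b t) : 'b t = fun k -> m (fun x -> f x k)

let ( >>= ) = bind

(* Run queue of suspended tasks, processed in FIFO order. *)
let ready : (unit -> unit) Queue.t = Queue.create ()

(* Suspend the current task so that other queued tasks can run. *)
let yield () : unit t = fun k -> Queue.add (fun () -> k ()) ready

(* Schedule a new task; it starts when the scheduler reaches it. *)
let spawn (task : unit t) : unit =
  Queue.add (fun () -> task (fun () -> ())) ready

(* Run queued tasks until none are left. *)
let run () : unit =
  while not (Queue.is_empty ready) do
    (Queue.pop ready) ()
  done

(* Shared log used to observe the interleaving. *)
let log : string list ref = ref []

let record (s : string) : unit t = fun k -> log := s :: !log; k ()

(* Record one entry per step, yielding between steps. *)
let rec worker (name : string) (n : int) : unit t =
  if n = 0 then return ()
  else
    record (Printf.sprintf "%s%d" name n) >>= fun () ->
    yield () >>= fun () ->
    worker name (n - 1)

let () =
  spawn (worker "a" 2);
  spawn (worker "b" 2);
  run ();
  (* prints "a2 b2 a1 b1" *)
  print_endline (String.concat " " (List.rev !log))
```

The two workers interleave even though only one piece of code runs at any moment, which is why this style avoids the data races associated with parallel threads. Real libraries add I/O readiness, timers, error propagation, and cancellation on top of the same basic shape.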
As mentioned above, OCaml currently has no native support for running multiple OS-level OCaml threads simultaneously: a global lock allows only one OCaml thread to run at a time.
Since thread-level parallelism is currently unavailable, process-level parallelism is used instead.
- Parmap: Provides easy-to-use parallel map and fold functions. The library makes use of forking to create short-lived child processes, and memory mapping to feed the data back to the parent process.
- Parany: Similar to Parmap; computes a given function over multiple processes in parallel.
- hack-parallel: Parallel processing library using shared memory. Used by Facebook’s Hack.
- lwt-parallel: Lower-level mechanism to create child processes in lwt and have them communicate with the parent via a socket.
- ForkWork: Similar to Parmap above.
- By interfacing with external C code through the FFI, OCaml can pass off long-running computations to C threads running at the same time as OCaml code. This is easier nowadays thanks to Ctypes (see ffi).
- Nproc: A process pool implementation for OCaml using lwt. Rather than creating or forking processes as needed, it preallocates them and sends them units of work as required.
- Ocamlnet: An enhanced system platform library. It contains the netmulticore library to compute tasks on as many cores of the machine as needed. This is the most powerful implementation of parallelism currently available for OCaml, as it is capable of creating a shared memory region and running a custom-made garbage collector on that region.
- Sklml: A functional parallel skeleton compiler and programming system for OCaml programs.
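As a rough illustration of process-level parallelism, the sketch below forks one child process per list element and sends each result back to the parent over a pipe using `Marshal`. It is not the implementation of Parmap or any other library listed above (Parmap, for instance, uses memory mapping rather than pipes), and `parallel_map` is a hypothetical name. It assumes a Unix-like system, linking against the `unix` library, and OCaml 4.12 or later for `Unix._exit`.

```ocaml
(* Fork-based parallel map. Illustration only: requires the unix library
   and a Unix-like OS; this is NOT Parmap's implementation. *)

let parallel_map (f : 'a -> 'b) (xs : 'a list) : 'b list =
  (* Fork one child for [x]; return its pid and the read end of its pipe. *)
  let spawn x =
    let rd, wr = Unix.pipe () in
    match Unix.fork () with
    | 0 ->
      (* Child: compute the result, marshal it to the parent, then exit
         immediately without running the parent's at_exit handlers. *)
      Unix.close rd;
      let oc = Unix.out_channel_of_descr wr in
      Marshal.to_channel oc (f x) [];
      close_out oc;
      Unix._exit 0
    | pid ->
      (* Parent: keep only the read end. *)
      Unix.close wr;
      (pid, Unix.in_channel_of_descr rd)
  in
  (* Start every child first so that they all run at the same time. *)
  let children = List.map spawn xs in
  (* Then collect the results in the original order. *)
  List.map
    (fun (pid, ic) ->
      let (y : 'b) = Marshal.from_channel ic in
      close_in ic;
      ignore (Unix.waitpid [] pid);
      y)
    children

let () =
  let squares = parallel_map (fun x -> x * x) [1; 2; 3; 4] in
  List.iter (Printf.printf "%d ") squares;
  print_newline ()
```

This sketch has no error handling: if `f` raises in a child, the parent fails while reading the result. Results must also be plain data, since `Marshal` rejects closures unless given extra flags. Forking one process per element is wasteful for large inputs, which is one reason the libraries above split work into chunks or maintain a pool of workers.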
The most promising and powerful way to use multicore is with the new multicore branch. This branch uses a parallel garbage collector, which means that OCaml will eventually be able to run on multiple cores in the same process. Note that this branch is not yet ready for real work, but it’s rapidly advancing. For more information, consult the Multicore Wiki.
- Parallel Programming in Multicore OCaml: great article on using the Multicore OCaml branch.
Distributed computing is similar to process-based parallelism, except that the child processes may run on remote (though generally not too remote) machines. Distributed computing libraries can therefore generally also be used for parallelism on a single machine.
- Rpc.Parallel: a library for spawning processes on a cluster of machines, and passing typed messages between them.
- Functory: a distributed computing library which facilitates distributed execution of parallelizable computations in a seamless fashion.
- MPI: Message Passing Interface bindings for OCaml.
- ocaml-rpc: lightweight library for dealing with RPCs in OCaml.
- distributed: Library for distributed computation in OCaml. Similar to Erlang’s model and inspired by Cloud Haskell.
- reactor (alpha): Actor model for OCaml, similar to Erlang/Elixir.
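The libraries above each have their own API, but a recurring idea is that processes exchange typed messages which are serialized for transport. The following sketch shows that idea using only the standard library's `Marshal` module and simulates the round trip inside a single process. The `request` and `response` types and the functions `encode`, `decode_request`, `decode_response`, `handle`, and `call` are all hypothetical; none of this is the API of Rpc.Parallel, ocaml-rpc, or any other library listed here.

```ocaml
(* Typed request/response messages serialized with Marshal.
   Illustration only: all types and function names are hypothetical. *)

type request =
  | Sum of int list
  | Echo of string

type response =
  | Int_result of int
  | Text_result of string

(* Serialize a value to a byte string, as it might be written to a socket. *)
let encode (v : 'a) : string = Marshal.to_string v []

(* Marshal does not check types when decoding, so each decoder trusts that
   the bytes were produced from a value of the annotated type. *)
let decode_request (s : string) : request = Marshal.from_string s 0
let decode_response (s : string) : response = Marshal.from_string s 0

(* Worker side: interpret a request and produce a response. *)
let handle (req : request) : response =
  match req with
  | Sum xs -> Int_result (List.fold_left ( + ) 0 xs)
  | Echo s -> Text_result s

(* Simulated remote call: encode, "send", handle, encode, "receive". *)
let call (req : request) : response =
  let wire_request = encode req in
  let wire_response = encode (handle (decode_request wire_request)) in
  decode_response wire_response

let () =
  match call (Sum [1; 2; 3]) with
  | Int_result n -> Printf.printf "sum = %d\n" n
  | Text_result s -> Printf.printf "text = %s\n" s
```

In a real distributed setting the two byte strings would cross a network connection, and both ends would need to agree on the message types, for example by running the same binary. Because `Marshal` offers no type checking on decode, production libraries typically rely on generated or derived serializers instead.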