1. Intro
Morloc is a strongly-typed functional programming language where functions are imported from foreign languages and unified through a common type system. This language is designed to serve as the foundation for a universal library of functions. Each function in the library has one general type and zero or more implementations. An implementation may be either a function sourced from a foreign language or a composition of such functions. All interop code is generated by the Morloc compiler.
2. Why Morloc?
2.1. Compose functions across languages under a common type system
Morloc allows functions from polyglot libraries to be composed in a simple functional language. The focus isn’t on classic interoperability (e.g., calling Python from C) or serialization (e.g., sending data between applications via protobufs) — though morloc implementations may use these under the hood. Instead, you define types, import implementations, and build complex programs through function composition. The compiler invisibly generates any required interop code.
2.2. Write in your favorite language, share with everyone
Do you want to write in language X but have to write in language Y because everyone in your team does or because your expected users do? Love C for algorithms, R for statistics, but don’t want to write full apps in either? Morloc lets you mix and match, so you can use each language where it shines, with no bindings or boilerplate.
2.3. Run benchmarks and tests across languages
Tired of learning new benchmark and testing suites across all your languages? Is it hard to benchmark similar tools wrapped in applications with varying input formats, input validation costs, or startup overhead? In Morloc, functions with the same general type signature can be swapped in and out for benchmarking and testing. The same test suites and test cases will work across all supported languages because inputs/output of all functions of the same type share equivalent Morloc binary forms, making validation and comparison easy.
2.4. Design universal libraries
With Morloc, we can build abstract libraries using the general types as a logical framework. Then we can import implementations of these functions from one or more of the supported languages and easily test and benchmark them. These libraries are the foundation for an ecosystem where functions may be verified, organized/searched by type, and used to build rigorous programs.
2.5. Make better bioinformatics workflows
Within the bioinformatics space, Morloc can serve as a replacement for the brittle application/file paradigm of workflow design. Replace heavy CLI applications with pure function libraries, ad hoc textual file formats with explicit data structures, and workflow specifications with function compositions. See the first Morloc paper for details (pre-released here).
3. Current status
Morloc is under heavy development in several areas:
- language support – We need to further standardize the language onboarding process and then start adding new languages beyond the three that are currently supported (Python, R, and C++)
- type system – There’s lots to do here: sum types, effect handling, constraints, extensible records
- performance – The shared library implementation lacks proper memory defragmentation, and there is some unnecessary memory copying between languages
- scaling – I’ve implemented some of the infrastructure and syntax for remote job submission, but more work is needed before it can be used in practice
- syntax – Pattern matching, custom operators, namespaces, string interpolation, and more are on the roadmap
- tooling – We need a linter, debugger, dependency manager, and better backend generators that produce better CLI usage statements and programmatic APIs
There is one island of stability, though: the native functions Morloc imports are fully independent of Morloc itself. For a given Morloc program, most of your code will be pure functions in native languages (e.g., Python, C++, or R). This code will never have to change between Morloc versions. Where Morloc will change is in how it describes these native functions, the syntax it uses to compose them, and the particulars of code generation.
Is Morloc ready for production? Maybe. Currently, Morloc has many sharp edges, and new versions may introduce breaking changes. So Morloc is most appropriate right now for adventurous first adopters who can solve problems and write clear issue reports. Morloc may be about one year of full-time work away from v1.0.
Want to contribute? The most helpful thing you can do is join the community (see the Contact section), try out Morloc, and offer feedback on social media or via GitHub issue reports. The community is just starting, and the language is young, so you can strongly influence how the system evolves.
4. Getting Started
4.1. Install the compiler
Note: Not well tested.
The easiest way to start using Morloc is through containers. I recommend using podman, since it doesn’t require a daemon or sudo access. But Docker, Singularity, and other container engines are fine as well.
An image with the morloc executable and batteries included can be retrieved from the GitHub container registry as follows:
$ podman pull ghcr.io/morloc-project/morloc/morloc-full:0.54.0
The 0.54.0 tag may be replaced with the desired Morloc version.
Now you can enter a shell with a full working installation of Morloc:
$ podman run --shm-size=4g \
-v $HOME:$HOME \
-w $PWD \
-e HOME=$HOME \
-it ghcr.io/morloc-project/morloc/morloc-full:0.54.0 \
/bin/bash
The --shm-size=4g option sets the shared memory space to 4GB. Morloc uses shared memory for communication between languages, but containers often limit the shared memory space to 64MB by default. By mounting your home directory, the changes you make in the container (including the installation of Morloc modules) will be persistent across sessions.
You can set up a script to run commands in a Morloc environment. To do this, paste the following code into a file named menv:
mkdir -p ~/.morloc
podman run --rm \
--shm-size=4g \
-e HOME=$HOME \
-v $HOME/.morloc:$HOME/.morloc \
-v $PWD:$HOME \
-w $HOME \
ghcr.io/morloc-project/morloc/morloc-full:0.54.0 "$@"
Make it executable (chmod 755 menv) and place it in a bin folder on your PATH (e.g., ~/bin). The script will mount your current working directory and your Morloc home directory, allowing you to run commands in a Morloc-compatible environment.
With the menv script, you can run commands like so:
$ menv morloc --version # get the current morloc version
$ menv morloc -h # list morloc commands
This should print the Morloc version and usage info.
Next you need to initialize the Morloc home directory:
$ menv morloc init -f # setup the morloc environment
This will write required headers to your environment and build the required libraries.
You can install Morloc modules as well:
$ menv morloc install types # install a morloc module
These modules will be retrieved from GitHub and written into Morloc home.
You can compile Morloc programs within this container as well:
$ menv morloc make -o foo foo.loc # compile a local morloc module
The last command builds a Morloc program with the executable "foo" from the Morloc script file "foo.loc". The generated executable may not work on your system since it was compiled within the container environment, so you should run it in the container environment as well:
$ menv ./foo bar 1 2 3
More advanced solutions with richer dependency handling will be introduced in the future, but for now this allows easy experimentation with the language in a safe(ish) sandbox.
The menv morloc or menv ./foo syntax is a bit verbose, but I’ll let you play with alternative aliases. The conventions here are still fluid. Let me know if you find something better or if you find bugs in this approach.
4.2. Installing from source
Note: Not well tested.
If you want to compile Morloc from source, but don’t want to install a Haskell environment, the following instructions may be helpful.
First clone the Morloc repo:
$ git clone https://github.com/morloc-project/morloc
$ cd morloc
Now, you need a container to build Morloc. Create the following script in your PATH and name it mtest (or whatever you like):
podman run --shm-size=4g \
--rm \
-e HOME=$HOME \
-e PATH="$HOME/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" \
-v $HOME/.morloc:$HOME/.morloc \
-v $HOME/.local/bin:$HOME/.local/bin \
-v $PWD:$HOME \
-w $HOME \
ghcr.io/morloc-project/morloc/morloc-test:latest "$@"
Swap out podman for whichever Docker-compatible container engine you prefer.
This script will allow the image to alter your MORLOC_HOME directory and to install the Morloc executable locally (to ~/.local/bin).
With this container, you can build the Morloc executable:
$ mtest stack install
This will build the morloc executable. The stack utility will install a Haskell compiler (ghc) in a local sandbox along with all required Haskell modules. This will take a while the first time you run it.
On success, morloc will be installed. You can test the build like so:
$ mtest morloc -h
You may run the Morloc test suite from here as well:
$ mtest stack test
And you can build Morloc programs:
$ mtest morloc make foo.loc
$ mtest ./nexus foo 1 2 3
As before, you need to run the generated executable in the container environment as well.
4.3. Setting up IDEs
Editor support for Morloc is still a work in progress.
If you are working in vim, you can install Morloc syntax highlighting as follows:
$ mkdir -p ~/.vim/syntax/
$ mkdir -p ~/.vim/ftdetect/
$ curl -o ~/.vim/syntax/loc.vim https://raw.githubusercontent.com/morloc-project/vimmorloc/main/loc.vim
$ echo 'au BufRead,BufNewFile *.loc set filetype=loc' > ~/.vim/ftdetect/loc.vim
Developing a full plugin is left as an exercise for the user (pull requests welcome).
If you are working in VS Code, I’ve made a simple extension that offers syntax highlighting and snippets. You can pull the extension from GitHub and move it into your VS Code extensions folder:
$ git clone https://github.com/morloc-project/vscode ~/.vscode-oss/extensions/morloc
Update the path to the extensions folder as needed on your system. This manually installs the extension, which is not ideal. I’ll push the extension to the official VS Code package manager soon.
4.4. Say hello
The inevitable "Hello World" case is implemented in Morloc like so:
module main (hello)
hello = "Hello up there"
The module named main exports the term hello, which is assigned to a literal string value.
Paste this code into a file (e.g. "hello.loc") and then it can be imported by other Morloc modules or directly compiled into a program where every exported term is a subcommand.
$ morloc make hello.loc
This command will produce two files: a C program, nexus.c, and its compiled binary, nexus. The nexus is the command line interface (CLI) to the commands exported from the module.
Calling nexus with no arguments or with the -h flag will print a help message:
$ ./nexus -h
Usage: ./nexus [OPTION]... COMMAND [ARG]...
Nexus Options:
-h, --help Print this help message
-o, --output-file Print to this file instead of STDOUT
-f, --output-format Output format [json|mpk|voidstar]
Exported Commands:
hello
return: Str
This usage message is automatically generated. For each exported term, it specifies the input (none, in this case) and output types as inferred by the compiler. For this case, the exported command is just the term hello, so no input types are listed.
The command is called like so:
$ ./nexus hello
Hello up there
4.5. Dice rolling
Let’s write a little program that rolls a pair of 20-sided dice and prints the larger result. Here is the Morloc script:
module dnd (rollAdv)
import types
source Py from "foo.py" ("roll", "max", "narrate")
roll :: Int -> Int -> [Int]
max :: [Int] -> Int
narrate :: Int -> Str
rollAdv = narrate (max (roll 2 20))
Here we define a module named dnd that exports the function rollAdv. The import statement brings in the required type definitions from the Morloc module types; later on we’ll go into how these types are defined. The source statement pulls three functions from the Python file "foo.py". The three lines that follow assign each of these functions a Morloc type signature. You can think of the arrows in the signatures as separating arguments. For example, the function roll takes two integers as arguments and returns a list of integers. The square brackets indicate lists. In the final line, we define the rollAdv function.
The Python functions are sourced from the Python file "foo.py" with the following code:
import random
def roll(n, d):
    # Roll a d-sided die n times, return a list of results
    return [random.randint(1, d) for _ in range(n)]
def narrate(roll_value):
    return f"You rolled a {roll_value!s}"
Nothing about this code is particular to Morloc.
One of Morloc’s core values is that foreign source code never needs to know anything about the Morloc ecosystem. Sourced code should always be nearly idiomatic code that uses normal data types. The inputs and outputs of these functions are natural Python integers, lists, and strings — they are not Morloc-specific serialized data or ad hoc textual formats.
We can compile and run this program as so:
$ morloc make main.loc
$ ./nexus rollAdv
"You rolled a 20"
As a random function, it will return a new result every time.
So, what’s the point? We could have done this more easily in a pure Python script. Morloc generates a CLI for us, type checks the program, and performs some runtime validation (by default, just on the final inputs and outputs). But there are other tools in the Python universe that can achieve this same end. Where Morloc is uniquely valuable is in the polyglot setting.
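For comparison, the pure Python version mentioned above might look like the following. This is only a sketch: the roll_adv name is hypothetical, and it simply calls the same roll and narrate functions together with the builtin max, without the CLI, type checking, or validation that Morloc generates.

```python
import random


def roll(n, d):
    # Roll a d-sided die n times and return the list of results
    return [random.randint(1, d) for _ in range(n)]


def narrate(roll_value):
    return f"You rolled a {roll_value!s}"


def roll_adv():
    # Hypothetical Python counterpart of: rollAdv = narrate (max (roll 2 20))
    return narrate(max(roll(2, 20)))


if __name__ == "__main__":
    print(roll_adv())
```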
4.6. Polyglot dice rolling
In this next example, we rewrite the prior dice example with all three functions being sourced from different languages:
module dnd (rollAdv)
import types
source R from "foo.R" ("roll")
source Cpp from "foo.hpp" ("max")
source Py from "foo.py" ("narrate")
roll :: Int -> Int -> [Int]
max :: [Int] -> Int
narrate :: Int -> Str
rollAdv = narrate (max (roll 2 20))
Note that all of this code is exactly the same as in the prior example except the source statements.
The roll function is defined in R:
roll <- function(n, d){
  # sample with replacement so that two dice may show the same value
  sample(1:d, n, replace = TRUE)
}
The max function is defined in C++:
#pragma once
#include <vector>
#include <algorithm>
template <typename A>
A max(const std::vector<A>& xs) {
return *std::max_element(xs.begin(), xs.end());
}
The narrate function is defined in Python:
def narrate(roll_value):
    return f"You rolled a {roll_value!s}"
This can be compiled and run in exactly the same way as the prior monoglot example. It will run a bit slower, mostly because of the heavy cost of starting the R interpreter.
The Morloc compiler automatically generates all code required to translate data between the languages. Exactly how this is done will be discussed later.
4.7. Parallelism example
Here is an example showing a parallel map function written in Python that calls C++ functions.
module m (sumOfSums)
import types
source Py from "foo.py" ("pmap")
source Cpp from "foo.hpp" ("sum")
pmap a b :: (a -> b) -> [a] -> [b]
sum :: [Real] -> Real
sumOfSums = sum . pmap sum
This Morloc script exports a function that sums a list of lists of real numbers. Here we use the dot operator for function composition. The sum function is implemented in C++:
// C++ header sourced by morloc script
#pragma once
#include <vector>
double sum(const std::vector<double>& vec) {
double sum = 0.0;
for (double value : vec) {
sum += value;
}
return sum;
}
The parallel pmap function is written in Python:
# Python3 file sourced by morloc script
import multiprocessing as mp
def pmap(f, xs):
    with mp.Pool() as pool:
        results = pool.map(f, xs)
    return results
The inner summation jobs will be run in parallel. The pmap function has the same signature as the non-parallel map function, so it can serve as a drop-in replacement.
This can be compiled and run with the lists being provided in JSON format:
$ morloc make main.loc
$ ./nexus sumOfSums '[[1,2],[3,4,5]]'
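To make the data flow concrete, here is a rough single-language sketch of what sumOfSums computes for the input above. It is not what Morloc generates: seq_map is a hypothetical sequential stand-in for pmap (same signature, no process pool), and the builtin sum stands in for the C++ sum.

```python
def seq_map(f, xs):
    # Sequential stand-in for pmap: (a -> b) -> [a] -> [b]
    return [f(x) for x in xs]


def sum_of_sums(xss, mapper=seq_map):
    # Mirrors: sumOfSums = sum . pmap sum
    return sum(mapper(sum, xss))


if __name__ == "__main__":
    # Inner sums are 3 and 12, so the total is 15
    print(sum_of_sums([[1, 2], [3, 4, 5]]))
```

Because the mapper is passed in as a parameter, any function with the same signature, including a parallel one, can be swapped in without touching the composition.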
5. Syntax and Features
5.1. Source functions from foreign languages
In Morloc, you can import functions from many languages and compose them under a common type system. The syntax for importing functions from source files is as follows:
source Cpp from "foo.hpp" ("map", "sum", "snd")
source Py from "foo.py" ("map", "sum", "snd")
This brings the functions map, sum, and snd into scope in the Morloc script. Each of these functions must be defined in the C++ and Python scripts. For Python, since map and sum are builtins, only snd needs to be defined. So the foo.py file only requires the following two lines:
def snd(pair):
    return pair[1]
The C++ file, foo.hpp, may be implemented as a simple header file with generic implementations of the three required functions.
#pragma once
#include <vector>
#include <tuple>
// map :: (a -> b) -> [a] -> [b]
template <typename A, typename B, typename F>
std::vector<B> map(F f, const std::vector<A>& xs) {
std::vector<B> result;
result.reserve(xs.size());
for (const auto& x : xs) {
result.push_back(f(x));
}
return result;
}
// snd :: (a, b) -> b
template <typename A, typename B>
B snd(const std::tuple<A, B>& p) {
return std::get<1>(p);
}
// sum :: [a] -> a
template <typename A>
A sum(const std::vector<A>& xs) {
A total = A{0};
for (const auto& x : xs) {
total += x;
}
return total;
}
Note that these implementations are completely independent of Morloc — they have no special constraints, they operate on perfectly normal native data structures, and their usage is not limited to the Morloc ecosystem. The Morloc compiler is responsible for mapping data between the languages. But to do this, Morloc needs a little information about the function types. This is provided by the general type signatures, like so:
map a b :: (a -> b) -> [a] -> [b]
snd a b :: (a, b) -> b
sum :: [Real] -> Real
The syntax for these type signatures is inspired by Haskell, with the exception that generic terms (a and b here) must be declared on the left. Square brackets represent homogeneous lists, parenthesized comma-separated values represent tuples, and arrows represent functions. In the map type, (a -> b) is a function from generic value a to generic value b; [a] is the input list of initial values; [b] is the output list of transformed values.
Removing the syntactic sugar for lists and tuples, the signatures may be written as:
map a b :: (a -> b) -> List a -> List b
snd a b :: Tuple2 a b -> b
sum :: List Real -> Real
These signatures provide the general types of the functions. But one general type may map to multiple native, language-specific types. So we need to provide an explicit mapping from general to native types.
type Cpp => List a = "std::vector<$1>" a
type Cpp => Tuple2 a b = "std::tuple<$1,$2>" a b
type Cpp => Real = "double"
type Py => List a = "list" a
type Py => Tuple2 a b = "tuple" a b
type Py => Real = "float"
These type functions guide the synthesis of native types from general types. Take the C++ mapping for List a as an example. The basic C++ list type is vector from the standard template library. After the Morloc typechecker has solved for the type of the generic parameter a, and recursively converted it to C++, its type will be substituted for $1. So if a is inferred to be a Real, it will map to the C++ double, and then be substituted into the list type yielding std::vector<double>. This type will be used in the generated C++ code.
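The substitution described above can be sketched in a few lines of Python. This is an illustration only, not the compiler's implementation: the to_native function, the table layout, and the (name, args) encoding of types are all hypothetical.

```python
import re

# Hypothetical tables mirroring the type functions shown above
CPP_TYPES = {
    "List": "std::vector<$1>",
    "Tuple2": "std::tuple<$1,$2>",
    "Real": "double",
}
PY_TYPES = {"List": "list", "Tuple2": "tuple", "Real": "float"}


def to_native(ty, table):
    """Convert a general type, encoded as (name, [args]), to a native type string."""
    name, args = ty
    # Recursively convert the parameters first
    native_args = [to_native(arg, table) for arg in args]
    # Then interpolate them into the $1, $2, ... slots of the template
    return re.sub(
        r"\$(\d+)",
        lambda m: native_args[int(m.group(1)) - 1],
        table[name],
    )


if __name__ == "__main__":
    list_of_real = ("List", [("Real", [])])
    print(to_native(list_of_real, CPP_TYPES))  # std::vector<double>
    print(to_native(list_of_real, PY_TYPES))   # list
```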
5.2. Functions
Functions are defined with arguments separated by whitespace:
foo x = g (f x)
Here foo is the Morloc function name and x is its first argument.
Morloc supports the . operator for composition, so we can re-write foo as:
foo = g . f
Morloc supports partial application of arguments.
For example, to multiply every element in a list by 2, we can write:
multiplyByTwo = map (mul 2.0)
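For readers more familiar with Python, composition and partial application correspond roughly to the sketch below. The compose and lmap helpers are hypothetical names introduced here; they are not part of Morloc or of any Morloc module.

```python
from functools import partial, reduce
import operator


def compose(*fs):
    # compose(g, f)(x) == g(f(x)), like the Morloc expression g . f
    return reduce(lambda g, f: lambda x: g(f(x)), fs)


def lmap(f, xs):
    # List-returning map, standing in for the Morloc map
    return [f(x) for x in xs]


# Analogue of: multiplyByTwo = map (mul 2.0)
multiply_by_two = partial(lmap, partial(operator.mul, 2.0))

if __name__ == "__main__":
    print(multiply_by_two([1.0, 2.5]))  # [2.0, 5.0]
```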
5.3. Modules
A module includes all the code defined under the module <module_name> statement. It can be imported with the import command.
The following module defines the constant x and exports it.
module foo (x)
x = 42
Another module can import foo:
import foo (x)
...
A term may be imported from multiple modules. For example:
module main (add)
import cppbase (add)
import pybase (add)
import rbase (add)
This module imports the C++, Python, and R add functions and exports all of them. Modules that import add will import three different versions of the function. The compiler will choose which to use.
5.4. Docstrings and toolboxes
Morloc has early support for docstrings in comments that propagate to the generated CLI code.
For example:
module main (add, sum)
import types
source Py from "main.py" ("add", "sum")
--' Add two floats
add :: Real -> Real -> Real
--' Sum a list of floats
sum :: [Real] -> Real
The special comment --' introduces a docstring that is attached to the following type signature and will be propagated through to the code generated by the backend.
$ morloc make main.loc
$ ./nexus -h
Usage: ./nexus [OPTION]... COMMAND [ARG]...
Nexus Options:
-h, --help Print this help message
-o, --output-file Print to this file instead of STDOUT
-f, --output-format Output format [json|mpk|voidstar]
Exported Commands:
add Add two floats
param 1: Real
param 2: Real
return: Real
sum Sum a list of floats
param 1: [Real]
return: Real
5.5. User arguments and outputs
User data is passed to Morloc executables as positional arguments to the specified function subcommand. The argument may be a literal JSON string or a filename. For files, the format may be JSON, MessagePack, or Morloc binary (VoidStar) format. The Morloc nexus first checks for a ".json" extension; if found, the nexus attempts to parse the file as JSON. Next the nexus checks for a ".mpk" or ".msgpack" extension, and if found it attempts to parse the file as a MessagePack file. If neither extension is found, it attempts to parse the file first as Morloc binary, then as MessagePack, and finally as JSON. See the parse_cli_data_argument function in morloc.h for details.
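The extension-based dispatch described above can be summarized as a small Python sketch. It is not a port of parse_cli_data_argument: the candidate_formats function and the format labels are hypothetical, and it covers only the file case, not literal JSON arguments.

```python
import os


def candidate_formats(path):
    """Return the formats to try, in order, for a file argument."""
    ext = os.path.splitext(path)[1]
    if ext == ".json":
        return ["json"]
    if ext in (".mpk", ".msgpack"):
        return ["msgpack"]
    # No recognized extension: try Morloc binary, then MessagePack, then JSON
    return ["voidstar", "msgpack", "json"]


if __name__ == "__main__":
    for name in ("baz.json", "data.mpk", "data.vs"):
        print(name, candidate_formats(name))
```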
Passing literal JSON on the command line can be a little unintuitive since extra quoting may be required. Here are a few examples:
# The Bash shell removes the outer quotes, so double quoting is required
$ ./nexus foo '"this is a literal string"'
# Single quotes around lists are fine, but inner strings still need to be quoted
$ ./nexus bar '["asdf", "df"]'
# By default, output is written to JSON format
$ ./nexus baz 1 2 3 > baz.json
# The output can be directly read by a downstream morloc program
$ ./nexus bif baz.json
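The need for double quoting can be checked outside of Morloc. The sketch below uses Python's shlex module to approximate POSIX shell word splitting and json to parse what the program would receive; the nexus_args helper is a hypothetical name used only for this illustration.

```python
import json
import shlex


def nexus_args(command_line):
    # Split the way a POSIX shell would, then drop the program name
    return shlex.split(command_line)[1:]


# With the outer single quotes, the inner double quotes survive the shell
quoted = nexus_args("""./nexus foo '"this is a literal string"'""")
assert json.loads(quoted[1]) == "this is a literal string"

# Without them, the shell strips the quotes and the argument is not valid JSON
bare = nexus_args('./nexus foo "this is a literal string"')
try:
    json.loads(bare[1])
    bare_is_json = True
except json.JSONDecodeError:
    bare_is_json = False

if __name__ == "__main__":
    print(quoted[1], bare[1], bare_is_json)
```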
Data may be written to MessagePack or VoidStar via the -f argument:
$ ./nexus -f voidstar head '[["some","random"],["data"]]' > data.vs
$ ./nexus -f json head data.vs > data.json
$ ./nexus -f mpk reverse data.json > data.mpk
$ ./nexus reverse data.mpk
"some"
The VoidStar format is the richest and is the only form that contains the schema describing the data.
5.6. Mapping general types to native types
When a function is sourced from a foreign language, Morloc needs to know how Morloc general types map to the function’s native types. This information is encoded in language-specific type functions. For example:
type R => Bool = "logical"
type Py => Bool = "bool"
type Cpp => Bool = "bool"
type R => Int32 = "integer"
type Py => Int32 = "int"
type Cpp => Int32 = "int"
Language-specific types are always quoted since they may contain syntax that is illegal in the Morloc language.
Consider an integer addition function addi:
addi :: Int32 -> Int32 -> Int32
This can be automatically mapped to a C++ function with the prototype int addi(int x, int y).
Containers can be similarly mapped to native types:
type Py => List a = "list" a
type Cpp => List a = "std::vector<$1>" a
The $1 symbol is used to represent the interpolation of the first parameter into the native type. So the Morloc type List Int32 would translate to std::vector<int> in C++.
5.7. Records and tables
Morloc has dedicated support for defining records and tables.
Here is a record example:
module foo (incAge)
import types (Int, Str)
source R from "foo.R" ("incAge")
record Person = Person
{ name :: Str
, age :: Int
}
-- Increment the person's age
incAge :: Person -> Person
Where the "foo.R" file contains the function:
incAge <- function(person){
person$age <- person$age + 1
person
}
This may be compiled and run like so:
$ morloc make foo.loc
$ ./nexus incAge '{"name":"Alice","age":32}'
Tables are similar, but all fields are lists of equal length:
module foo (readPeople, addPeople)
import types (Int, Str)
source R from "people-tables.R"
( "read.delim" as readPeople
, "addPeople")
table People = People
{ name :: Str
, age :: Int
}
readPeople :: Filename -> People
addPeople :: [Str] -> [Int] -> People -> People
With "people-tables.R" containing:
addPeople <- function(names, ages, df){
rbind(df, data.frame(name = names, age = ages))
}
This can be compiled and run as so:
# read a tab-delimited file containing person rows
./nexus readPeople data.tab > people.json
# add a row to the table
./nexus addPeople '["Eve"]' '[99]' people.json
The record and table types are currently excessively strict. Defining functions that add or remove fields/columns requires defining entirely new records/tables. Generic functions for operations such as removing lists of columns cannot be defined at all. Future versions of Morloc will have more flexible tables, but for now most operations should be done in coarser functions. Alternatively, custom non-parameterized tabular/record types may be defined.
The case study in the Morloc paper uses a JsonObj type that represents an arbitrarily nested object that serializes to/from JSON. In Python, it deserializes to a dict object; in R, to a list object; and in C++, to an ordered_json object from Niels Lohmann’s json package.
A similar approach could be used to define a non-parameterized table type that serializes to CSV or some binary type (such as Parquet).
These non-parameterized solutions are flexible and easy to use, but lack the reliability of the typed structures.
5.8. Type hierarchies
In some cases, there is a single obvious native type for a given Morloc general type. For example, most languages have exactly one reasonable way to represent a boolean. However, other data types may have many forms. The Morloc List is a simple example. In Python, the list type is most often used for representing ordered lists; however, it is inefficient for heavy numeric problems. In such cases, it is better to use a numpy vector. Further, there are data structures that are isomorphic to lists but that are more efficient for certain problems, such as stacks and queues.
We can define type hierarchies that represent these relationships.
-- aliases at the general level
type Stack a = List a
type LList a = List a
type ForwardList a = List a
type Deque a = List a
type Queue a = List a
type Vector a = List a
-- define a C++ specialization for each special type
type Cpp => Stack a = "std::stack<$1>" a
type Cpp => LList a = "std::list<$1>" a
type Cpp => ForwardList a = "std::forward_list<$1>" a
type Cpp => Deque a = "std::deque<$1>" a
type Cpp => Queue a = "std::queue<$1>" a
Here we equate each of the specialized containers with the general List type. This indicates that they all share the same common form and can all be converted to the same binary. Then we specify language-specific patterns as desired. When the Morloc compiler seeks a native form for a type, it will evaluate these type functions by incremental steps. At each step the compiler first checks to see if there is a direct native mapping for the language; if none is found, it evaluates the general type function.
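The stepwise lookup can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the compiler's algorithm: aliases are reduced to name-to-name entries, the resolve function is hypothetical, and the mappings for List and Int are assumed here because they are not part of the listing above.

```python
import re

# General-level aliases from the listing above, reduced to name -> name
ALIASES = {"Stack": "List", "Deque": "List", "Vector": "List"}

# C++ specializations; the List and Int entries are assumptions for this sketch
CPP_NATIVE = {
    "Stack": "std::stack<$1>",
    "Deque": "std::deque<$1>",
    "List": "std::vector<$1>",
    "Int": "int",
}


def resolve(ty, native=CPP_NATIVE, aliases=ALIASES):
    """Find a native type for a general type encoded as (name, [args])."""
    name, args = ty
    # Prefer a direct native mapping; otherwise step through the general alias
    while name not in native:
        if name not in aliases:
            raise KeyError(f"no native mapping for {name}")
        name = aliases[name]
    native_args = [resolve(arg, native, aliases) for arg in args]
    return re.sub(
        r"\$(\d+)",
        lambda m: native_args[int(m.group(1)) - 1],
        native[name],
    )


if __name__ == "__main__":
    print(resolve(("Stack", [("Int", [])])))   # direct mapping: std::stack<int>
    print(resolve(("Vector", [("Int", [])])))  # falls back to List: std::vector<int>
```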
Native type annotations are also passed to the language binders, allowing them to implement specialized behavior for more efficient conversion to binary.
5.9. One term may have many definitions
Morloc supports what might be called term polymorphism. Each term may have many definitions. For example, the function mean has three definitions below:
import base (sum, div, size, fold, add)
import types
source Cpp from "mean.hpp" ("mean")
mean :: [Real] -> Real
mean xs = div (sum xs) (size xs)
mean xs = div (fold 0 add xs) (size xs)
mean is sourced directly from C++, it is defined in terms of the sum function, and it is defined more generally with sum written as a fold operation. The Morloc compiler is responsible for deciding which implementation to use.
The equals operator in Morloc indicates functional substitutability. When you say a term is "equal" to something, you are giving the compiler an option for what may be substituted for the term. The function mean, for example, has many functionally equivalent definitions. They may be in different languages, or they may be more optimal in different situations.
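As a rough illustration of substitutability, the two composed definitions of mean can be mirrored in Python and checked against each other. The function names are hypothetical, and Python's sum, len, and reduce stand in for the sum, size, and fold functions imported from base.

```python
from functools import reduce
import operator


def mean_via_sum(xs):
    # Mirrors: mean xs = div (sum xs) (size xs)
    return sum(xs) / len(xs)


def mean_via_fold(xs):
    # Mirrors: mean xs = div (fold 0 add xs) (size xs)
    return reduce(operator.add, xs, 0) / len(xs)


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0]
    # Either definition may be substituted for the other
    print(mean_via_sum(xs), mean_via_fold(xs))
```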
Now this ability to simply state that two things are the same can be abused. The following statement is syntactically allowed in Morloc:
x = 1
x = 2
What is x after this code is run? It is 1 or 2. The latter definition does not mask the former; it is added alongside it. Now in this case, the two values are certainly not substitutable. Morloc has a simple value checker that will catch this type of primitive contradiction. However, the value checker cannot yet catch more nuanced errors, such as:
x = div 1 (add 1 1)
x = div 2 1
In this case, the value checker cannot check within the implementation of add, so it cannot know that there is a contradiction. For this reason, some care is needed in making these definitions.
5.10. Overload terms with typeclasses
In addition to term polymorphism, Morloc offers more traditional ad hoc polymorphism over types. Here typeclasses may be defined and type-specific instances may be given. This idea is similar to typeclasses in Haskell, traits in Rust, interfaces in Java, and concepts in C++.
In the example below, Addable and Foldable classes are defined and used to create a polymorphic sum function.
class Addable a where
    zero a :: a
    add a :: a -> a -> a
instance Addable Int where
    source Py from "arithmetic.py" ("add")
    source Cpp from "arithmetic.hpp" ("add")
    zero = 0
instance Addable Real where
    source Py from "arithmetic.py" ("add")
    source Cpp from "arithmetic.hpp" ("add")
    zero = 0.0
class Foldable f where
    foldr a b :: (a -> b -> b) -> b -> f a -> b
instance Foldable List where
    source Py from "foldable.py" ("foldr")
    source Cpp from "foldable.hpp" ("foldr")
sum = foldr add zero
The instances may import implementations for many languages.
The native functions may themselves be polymorphic, so the imported implementations may be repeated across many instances. For example, the Python add may be written as:
def add(x, y):
    return x + y
And the C++ add as:
template <class A>
A add(A x, A y){
return x + y;
}
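The contents of "foldable.py" are not shown here. One way a foldr matching the signature (a -> b -> b) -> b -> f a -> b could be written in Python is sketched below; this is an assumption for illustration, not the actual library source, and poly_sum is a hypothetical name that takes zero explicitly because Python has no typeclass dispatch.

```python
from functools import reduce


def foldr(f, init, xs):
    # Right fold: f(x1, f(x2, ... f(xn, init)))
    return reduce(lambda acc, x: f(x, acc), reversed(xs), init)


def add(x, y):
    return x + y


def poly_sum(zero, xs):
    # Analogue of: sum = foldr add zero, with zero passed by hand
    return foldr(add, zero, xs)


if __name__ == "__main__":
    print(poly_sum(0, [1, 2, 3]))      # 6
    print(poly_sum(0.0, [1.5, 2.5]))   # 4.0
```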
Typeclasses work currently, but they are not yet in their final form. They cannot be directly imported and they are not explicit in type signatures. I would be happy to hear your thoughts on Morloc typeclasses. Getting them right is crucial to the grand structure of the future Morloc library.
5.11. Binary forms
Every Morloc general type maps unambiguously to a binary form that consists of several fixed-width literal types, a list container, and a tuple container. The literal types include a unit type, a boolean, signed integers (8, 16, 32, and 64 bit), unsigned integers (8, 16, 32, and 64 bit), and IEEE floats (32 and 64 bit). The list container is represented by a 64-bit size integer and a pointer to an unboxed vector. The tuple is represented as a set of values in contiguous memory. These basic types are listed below:
Type | Domain | Schema | Width (bytes) |
---|---|---|---|
Unit | | z | 1 |
Bool | | b | 1 |
UInt8 | \([0,2^{8})\) | u1 | 1 |
UInt16 | \([0,2^{16})\) | u2 | 2 |
UInt32 | \([0,2^{32})\) | u4 | 4 |
UInt64 | \([0,2^{64})\) | u8 | 8 |
Int8 | \([-2^{7},2^{7})\) | i1 | 1 |
Int16 | \([-2^{15},2^{15})\) | i2 | 2 |
Int32 | \([-2^{31},2^{31})\) | i4 | 4 |
Int64 | \([-2^{63},2^{63})\) | i8 | 8 |
Float32 | IEEE float | f4 | 4 |
Float64 | IEEE double | f8 | 8 |
List x | homogeneous lists | a{x} | \(16 + n \Vert x \Vert\) |
Tuple2 x1 x2 | 2-ples | t2{x1}{x2} | \(\Vert x_1 \Vert + \Vert x_2 \Vert\) |
TupleX \(t_1\ ...\ t_k\) | k-ples | \(t k t_1\ ...\ t_k\) | \(\sum_{i=1}^{k} \Vert t_i \Vert\) |
\(\{ f_1 :: t_1,\ ...\ , f_k :: t_k \}\) | records | \(m k \Vert f_1 \Vert f_1 t_1\ ...\ \Vert f_k \Vert f_k t_k\) | \(\sum_{i=1}^{k} \Vert t_i \Vert\) |
All basic types may be written to a schema that is used internally to direct conversions between Morloc binary and native basic types. The schema values are shown in the table above. For example, the type [(Bool, [Int8])] would have the schema at2bai1. You will not usually have to worry about these schemas, since they are mostly used internally. They are worth knowing, though, since they appear in low-level tests, generated source code, and binary data packets.
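To make the schema rule concrete, here is a small Python sketch that derives schema strings for primitives, lists, and tuples from the codes in the table. The nested-tuple encoding of types is a hypothetical convenience for this example and is not part of Morloc. Records and tuples with more than nine elements are left out, since their exact encoding is not spelled out here.

```python
# Schema codes for fixed-width primitives, copied from the table above.
PRIMITIVES = {
    "Unit": "z", "Bool": "b",
    "UInt8": "u1", "UInt16": "u2", "UInt32": "u4", "UInt64": "u8",
    "Int8": "i1", "Int16": "i2", "Int32": "i4", "Int64": "i8",
    "Float32": "f4", "Float64": "f8",
}


def schema(t):
    """Return the schema string for a type description.

    Hypothetical encoding (an assumption of this sketch): a primitive is its
    name as a string, a list is ("List", elem), and a tuple is
    ("Tuple", elem1, elem2, ...).
    """
    if isinstance(t, str):
        return PRIMITIVES[t]
    tag, *args = t
    if tag == "List":
        (elem,) = args
        return "a" + schema(elem)
    if tag == "Tuple":
        if not 2 <= len(args) <= 9:
            raise ValueError("this sketch handles tuples of 2 to 9 elements")
        # "t", then the element count, then each element's schema in order
        return "t" + str(len(args)) + "".join(schema(a) for a in args)
    raise ValueError(f"unsupported type: {t!r}")


# The type [(Bool, [Int8])] from the text
example = ("List", ("Tuple", "Bool", ("List", "Int8")))
print(schema(example))  # at2bai1
```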
Here is an example of how the type ([UInt8], Bool), with the value ([3,4,5],True), might be laid out in memory:
---
03 00 00 00 00 00 00 00 -- first tuple element, specifies list length (little-endian)
30 00 00 00 00 00 00 00 -- first tuple element, pointer to list
01 00 00 00 00 00 00 00 -- second tuple element, with 0-padding
03 04 05                -- 8-bit values of 3, 4, and 5
---
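The same kind of layout can be reproduced with Python's struct module. This is only a sketch under stated assumptions: 64-bit little-endian length and pointer fields, a Bool padded to 8 bytes, and a pointer written as an offset from the start of the buffer. The real runtime stores a pointer into its own memory, so the pointer value here (24, or 0x18) differs from the 0x30 shown in the example. The function name is hypothetical.

```python
import struct


def layout_list_bool(xs, flag):
    """Lay out a ([UInt8], Bool) value in one contiguous buffer.

    Assumptions of this sketch: little-endian fields, the Bool padded to
    8 bytes, and the list pointer stored as an offset into this buffer.
    """
    fixed_size = 8 + 8 + 8   # length + pointer + padded Bool
    data = bytes(xs)         # raises ValueError for values outside 0..255
    # Q = 64-bit unsigned, B = one byte, 7x = seven zero pad bytes
    fixed = struct.pack("<QQB7x", len(xs), fixed_size, 1 if flag else 0)
    return fixed + data


buf = layout_list_bool([3, 4, 5], True)
for i in range(0, len(buf), 8):
    print(buf[i:i + 8].hex(" "))
```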
Records and tables are represented as tuples. The names for each field are stored only in the type schemas. Morloc also supports tables, which are just records where the field types correspond to the column types and where fields are all equal-length lists. Records and tables may be defined as shown below:
A record is a named, heterogeneous list such as a struct in C, a dict in Python, or a list in R. The type of the record exactly describes the data stored in the record (in contrast to parameterized types like [a] or Map a b). Records are represented in Morloc binary as tuples; the keys are stored only in the schemas.
A table is like a record where field types represent the column types. But table is not just syntactic sugar for a record of lists: the table annotation is passed with the record through the compiler all the way to the translator, where the language-specific serialization functions may have special handling for tables.
record Person = Person { name :: Str, age :: UInt8 }
table People = People { name :: Str, age :: Int }
alice = { name = "Alice", age = 27 }
students = { name = ["Alice", "Bob"], age = [27, 25] }
The Morloc type signatures can be translated to schema strings that may be parsed by a foundational Morloc C library into a type structure. Every supported language in the Morloc ecosystem must provide a library that wraps this Morloc C library and translates to/from Morloc binary given the Morloc type schema.
5.12. Defining non-primitive types
Types that are composed entirely of Morloc primitives, lists, tuples, records and tables may be directly and unambiguously translated to Morloc binary forms and thus shared between languages. But what about types that do not break down cleanly into these forms? For example, consider the parameterized Map k v type that represents a collection with keys of generic type k and values of generic type v. This type may have many representations, including a list of pairs, a pair of columns, a binary tree, and a hashmap. In order for Morloc to know how to convert all Map types in all languages to one form, it must know how to express the Map type in terms of more primitive types. The user can provide this information by defining instances of the Packable typeclass for Map. This typeclass defines two functions, pack and unpack, that construct and deconstruct a complex type.
class Packable a b where
  pack a b :: a -> b
  unpack a b :: b -> a
The Map type for Python and C++ may be defined as follows:
type Py => Map key val = "dict" key val
type Cpp => Map key val = "std::map<$1,$2>" key val
instance Packable ([a],[b]) (Map a b) where
  source Cpp from "map-packing.hpp" ("pack", "unpack")
  source Py from "map-packing.py" ("pack", "unpack")
The Morloc user never needs to directly apply the pack and unpack functions. Rather, these are used by the compiler within the generated code. The compiler constructs a serialization tree from the general type and from this tree generates the native code needed to (un)pack types recursively until only primitive types remain. These may then be directly translated to Morloc binary using the language-specific binding libraries.
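The contents of map-packing.py are not shown here. A minimal sketch of what the Python pair might look like, assuming the unpacked form ([a],[b]) arrives as a tuple of two equal-length lists and the packed form is a plain dict:

```python
def pack(columns):
    """([a],[b]) -> Map a b: build a dict from parallel key and value lists."""
    keys, vals = columns
    if len(keys) != len(vals):
        raise ValueError("key and value lists must have the same length")
    return dict(zip(keys, vals))


def unpack(m):
    """Map a b -> ([a],[b]): split a dict into parallel key and value lists."""
    return (list(m.keys()), list(m.values()))
```

In this sketch, unpack(pack(x)) returns x whenever the keys are unique, which is the property the generated (un)packing code would depend on.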
In some cases, the native type may not be as generic as the general type. Or you may want to add specialized (un)packers. In such cases, you can define more specialized instances of Packable. For example, if the R Map type is defined as an R list, then keys can only be strings. Any other type should raise an error. So we can write:
type R => Map key val = "list" key val
instance Packable ([Str],[b]) (Map Str b) where
  source R from "map-packing.R" ("pack", "unpack")
Now whenever the key generic type of Map is inferred to be anything other than a string, all R implementations will be pruned.
5.13. The universal library
A module may export types, typeclasses, and function signatures but no implementations. Such a module would be completely language agnostic. A powerful approach to building libraries in the Morloc ecosystem is to write one module that defines all types, then \(n\) modules for language-specific implementations that import the type module, and then one module to import and merge all implementations. This is the approach taken by the base module and by other core libraries.
In the future, when hundreds of languages are supported, and when possibly some functions may even have many implementations per language, it will be desirable to have finer control over what functions are used. One solution would be to add filters to the import statement. Thus the import expressions would be a sort of query. Alternatively, constraints could be added at the function level, and thus the entire Morloc script would be a query over the universal library. This would be especially powerful when imported types are expressed as unknowns to be inferred by usage.
6. Internals
6.1. Packet protocols
A Morloc program compiles into a single "nexus" file and a "pool" file for each language. The nexus program accepts user input, dispatches to the pools, and formats results. The pools "pool" all functions from each specific language. When a command is sent to the nexus, it initializes the pools as background processes, pool daemons. Then the nexus sends the given information to the pool daemon that contains the top function in the composition.
Data is passed between the nexus and the pool daemons, and between daemons, using a combination of UNIX domain socket messages and shared memory storage.
Packets follow a binary protocol that is defined in the data/morloc.h file in the main Morloc repository.
Each packet has a 32-byte header with the following fields:
Field | Type | Width | Description |
---|---|---|---|
 | unsigned int | 4 | Morloc-specific constant: 6D F8 07 07 (mo-ding-ding) |
 | unsigned int | 2 | Morloc "plain" (see below for description) |
 | unsigned int | 2 | Packet version |
 | unsigned int | 2 | Metadata convention |
 | unsigned int | 2 | Evaluation mode (e.g., debug or verbose) |
 | 8-byte union | 8 | Packet description |
 | unsigned int | 4 | length of metadata block |
 | unsigned int | 8 | length of the data payload |
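As an illustration of this layout, the header can be packed and parsed with Python's struct module. This is a sketch under stated assumptions, not the code in morloc.h: the field names are hypothetical, multi-byte integers are taken to be little-endian with no padding between fields, and the magic constant is written as raw bytes in the order printed in the table.

```python
import struct

# Field order and widths follow the table above. "<" selects little-endian
# with no padding (an assumption), giving 4+2+2+2+2+8+4+8 = 32 bytes.
HEADER_FORMAT = "<4sHHHH8sIQ"
MAGIC = bytes.fromhex("6DF80707")

# Hypothetical field names, used only in this sketch.
FIELD_NAMES = ("magic", "plain", "version", "convention", "mode",
               "command", "metadata_length", "payload_length")


def make_header(command, metadata_length, payload_length,
                plain=0, version=0, convention=0, mode=0):
    """Pack a 32-byte header around an 8-byte command block."""
    if len(command) != 8:
        raise ValueError("command must be exactly 8 bytes")
    return struct.pack(HEADER_FORMAT, MAGIC, plain, version, convention,
                       mode, command, metadata_length, payload_length)


def read_header(raw):
    """Unpack the first 32 bytes into a dict, rejecting a wrong magic value."""
    fields = struct.unpack(HEADER_FORMAT, raw[:32])
    if fields[0] != MAGIC:
        raise ValueError("not a Morloc packet")
    return dict(zip(FIELD_NAMES, fields))
```

Checking the magic constant first gives a cheap way to reject input that is not a packet before any lengths are trusted.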
The Morloc plain specifies membership in a special set of Morloc libraries that follows a certain set of conventions or requirements. This is broader than a namespace. An example of a plain would be a "Safe" plain where all functions are verified in some fashion. In this case, it may be required that any packets that are read into a program should also have been created by a member of the Safe plain. Handling for plains is not yet implemented.
The command specifies the type of packet. There are currently three: data, call, and ping.
A data packet represents data and how it is stored. These packets have three main uses. First, they are used to store arguments that are passed between the nexus and pool daemons. Second, they transport data within the pools that has not been transformed into native structures. This allows data to be efficiently transferred between languages without being "naturalized" unless needed. Third, data packets may be written by the nexus as the output of a program.
Field | Type | Width | Description |
---|---|---|---|
 | char | 1 | Constant 0 - for "data" type |
 | char | 1 | Source type (e.g., file, message, pointer to shared memory) |
 | char | 1 | Data format (e.g., JSON, MessagePack, Text, Voidstar) |
 | char | 1 | Compression algorithm |
 | char | 1 | Encryption algorithm |
 | char | 1 | Pass or fail status |
 | char[2] | 2 | zero-padding |
The source field states "where" the data is. It might be stored literally in the packet itself. It might be stored in a file, in which case the packet stores only the filename. It might be stored in the shared memory volumes, in which case the packet stores a relative pointer to the memory location. The format field stores the data type. It might be JSON data, MessagePack data, literal text (e.g., for error messages), or the Morloc binary (which I sometimes call Voidstar). The compression and encryption fields are not currently used. But in the future they will be needed to support packet-specific compression/encryption of payloads. The status field represents whether the producing computation failed. If so, then the packet may contain an error message.
A call packet is sent from the nexus to a pool daemon or between daemons (for foreign calls). These packets specify the function to call and contain a contiguous list of data packets representing positional arguments as their payload.
Field | Type | Width | Description |
---|---|---|---|
 | char | 1 | Constant 1 - for "call" type |
 | char | 1 | stores if the call is local or remote (or something else) |
 | char[2] | 2 | zero-padding |
 | unsigned int | 4 | ID for the function to call in the pool daemon |
The entrypoint field is currently used when a call packet is sent to a remote machine for execution.
A ping packet is a header-only packet that is sent to check if some resource (such as a pool daemon) is up and running. The nexus will ping all pool daemons until it gets a response before sending them call instructions.
Field | Type | Width | Description |
---|---|---|---|
 | char | 1 | Constant 2 - for "ping" type |
 | char[7] | 7 | 0 padding |
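The three 8-byte command layouts (data, call, and ping) can be sketched with Python's struct module. The function and parameter names are hypothetical, the numeric codes for source, format, and entrypoint are not given in this section and are left to the caller, and the little-endian byte order of the function ID is an assumption.

```python
import struct


def data_command(source, fmt, compression=0, encryption=0, status=0):
    """Type byte 0, five one-byte fields, then 2 bytes of zero padding."""
    return struct.pack("<6B2x", 0, source, fmt, compression, encryption, status)


def call_command(entrypoint, function_id):
    """Type byte 1, an entrypoint byte, 2 pad bytes, a 4-byte function ID."""
    return struct.pack("<BB2xI", 1, entrypoint, function_id)


def ping_command():
    """Type byte 2 followed by 7 bytes of zero padding."""
    return struct.pack("<B7x", 2)
```

Each function returns exactly 8 bytes, so any of them fits the 8-byte union slot in the packet header. A ping packet is then just a header with the ping command and zero-length metadata and payload.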
Metadata is stored in blocks that start with an 8-byte metadata header.
Field | Type | Width | Description |
---|---|---|---|
 | char[3] | 3 | Constant "mmh" string (Morloc Metadata Header) |
 | char | 1 | Metadata type (e.g., schema string or data hash) |
 | uint | 4 | Data format (e.g., JSON, MessagePack, Text, Voidstar) |
Currently Morloc uses the metadata section to store data type schemas and to cache data hashes. In the future, these sections could be extended to store provenance data, benchmark data, environment info, runtime dags, or even raw inputs and code. The nexus can write data packets as a final output.
7. Q&A
7.1. I only use one language, is Morloc still useful?
Yes, Morloc remains useful even if you only use one programming language.
While Morloc is designed to allow polyglot development, its core benefits also apply to single-language projects. In the Morloc ecosystem, you may continue working in your preferred language, but the focus shifts to writing libraries instead of standalone applications.
Morloc lets you compose these functions and automatically generate applications from them, offering several advantages:
- Broader usability: Your functions can be easily reused and easily accessed by other language communities.
- Improved testing and benchmarking: Functions can be integrated into language-agnostic testing and benchmarking frameworks.
- Future-proofing: If you ever need to migrate to a new language, Morloc’s type annotations and documentation carry over; only the implementation needs to change. And if you want to leave the Morloc ecosystem, your implementation does not need to change.
- Better workflows: Especially in fields like bioinformatics, Morloc shifts workflows from chaining applications and files to composing typed functions and native data structures, making pipelines more robust and easier to validate.
- No more format parsing: Morloc data structures replace bespoke file formats and offer efficient serialization.
While language interop is a major feature of Morloc, it is not the main purpose. The very first version of Morloc was not polyglot at all. The focus originally was to just have a simple composition language that separated pure code from associated effects, conditions, caching, etc.
The primary goal of Morloc is to support the development of composable, typed universal libraries. Support for many languages is required for this goal, since no one language is best for all cases. Most Morloc users would continue to program in their favorite language, but gain the ability to compose, share, and extend functionality more easily.
7.2. Is this just a bioinformatics workflow language?
No. The pre-released Morloc paper is focused on bioinformatics applications. As discussed at length in the paper, Morloc addresses systematic flaws in the traditional approaches to building bioinformatics workflows. Given the need, and also given my personal background, bioinformatics is a good place to start. However, Morloc can be more broadly applied to any functional problem.
7.3. Do you really want to deprecate all the bioinformatics formats?
Yes, with the possible exception of specialized binary formats, which may offer performance benefits.
For human-readable semi-structured formats, I think only two are necessary: a tabular format (e.g., CSV) and a tree format (e.g., JSON).
7.4. Do you really want to deprecate all the bioinformatics applications?
Yes, with the exception of interactive graphical applications.
7.5. Do you want to deprecate the conventional workflow languages?
Not entirely. They do offer good scaling support that Morloc cannot yet match. Some also support GUIs which offer an intuitive and valuable way to visualize and create workflows from coarse components.
Hybrid solutions are possible. Conventional workflow languages can wrap Morloc compiled applications and pass Morloc generated data in place of bespoke bioinformatics formats.
7.6. Does Morloc allow function-specific environments?
No, unlike workflow managers such as Snakemake and Nextflow, Morloc does not offer function-specific environments. This is a deliberate design choice.
Dependency resolution is a hard and heavily researched problem. The general goal of dependency solvers is to find one set of dependencies that satisfies the entire program. The bioinformatics community often gives up on finding unified environments and instead runs each function in its independent environment. With every function running in its own container, all dependency issues are encapsulated and all functions may be executed from one manager. But this comes at a heavy cost. Each application must be wrapped in a script, the script must be executed via an expensive system call into the container, and data must be serialized and sent to the container. This approach is reasonable for workflows with a small number of heavy components. But from a programming language perspective, wrapping every function call in its own environment is inefficient and opaque.
Morloc is designed not to hide problems in boxes, but rather to solve the root problem. Conventional workflow languages attempt to simplify workflow design by layering frameworks over the functions. The Morloc approach is the exact opposite. First delete everything unnecessary from all applications and lift their light algorithmic cores into clean, well-typed libraries. Then build upwards through composition of these pure functions, and judicious use of impure ones, to create efficient, reliable, and composable tools.
Now, if you really do need to run something in a container, you can just make a function that wraps a call to a container and then use it just as you would any other function. You could even write a wrapper function that takes a record with all the metadata needed for a conda environment and execute its function within that environment. We can do this through libraries, so there is no need to hardcode this pattern into the Morloc language itself.
The reproducibility of Morloc workflows may be ensured by running the entire Morloc program in an environment or container, with a single set of dependencies. The specific Morloc compiler version can be specified and modules may be imported using their git hashes. This is done in the current Morloc examples (see the Dockerfile in the workflow-comparisons folder of https://github.com/morloc-project/examples).
7.7. What about recursion?
Recursion is not directly supported in Morloc. It may be in the future, but the implementation is complicated by inconsistent support for recursion in different languages. For example, Python has a recursion limit that can cause runtime crashes. Instead, recursive algorithms should be written as control functions in foreign languages.
For example, the Morloc bio module contains many generic tree traversal algorithms. One is the foldTree function:
foldTree n e l a
  :: (l -> a -> a)
  -> (n -> e -> a -> a)
  -> a
  -> RootedTree n e l
  -> a
Here the RootedTree type represents a phylogenetic tree with generic node (n), edge (e), and leaf (l) types. The foldTree function accepts two functions as arguments. The first, (l -> a -> a), reduces leaf values to the accumulator value. The second, (n -> e -> a -> a), reduces a branch given the parent node, the edge, and the current accumulator value. The strategy for implementing this function is decided in the foreign source code. The current C++ implementation is recursive, but iterative alternatives are possible.
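To illustrate what an iterative alternative could look like, here is a Python sketch that folds a tree with an explicit stack instead of recursion. The Leaf and Node classes, the function name fold_tree, and the traversal order (each branch is reduced just before its subtree, children left to right) are all assumptions made for this example. The actual bio module may represent trees and order the reduction differently.

```python
from dataclasses import dataclass, field


@dataclass
class Leaf:
    value: object


@dataclass
class Node:
    value: object
    # Each child is an (edge, subtree) pair; hypothetical representation.
    children: list = field(default_factory=list)


def fold_tree(leaf_f, node_f, acc, tree):
    """Fold a rooted tree without recursion, using an explicit work stack."""
    stack = [("tree", tree)]
    while stack:
        kind, payload = stack.pop()
        if kind == "branch":
            node_value, edge = payload
            acc = node_f(node_value, edge, acc)
        elif isinstance(payload, Leaf):
            acc = leaf_f(payload.value, acc)
        else:
            # Push in reverse so the first child is processed first, with
            # its branch reduced immediately before its subtree.
            for edge, child in reversed(payload.children):
                stack.append(("tree", child))
                stack.append(("branch", (payload.value, edge)))
    return acc


tree = Node("root", [
    (1.0, Leaf("a")),
    (2.0, Node("inner", [(0.5, Leaf("b")), (0.25, Leaf("c"))])),
])

# Count leaves by ignoring branches and adding one per leaf.
leaf_count = fold_tree(lambda l, a: a + 1, lambda n, e, a: a, 0, tree)
```

Because the work list lives on the heap rather than the call stack, this form is not affected by Python's recursion limit on deep trees.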
7.8. What about object-oriented programming?
An "object" is a somewhat loaded term in the programming world. As far as Morloc is concerned, an object is a thing that contains data and possibly other unknown stuff, such as hidden fields and methods. All types in Morloc must have forms that are transferable between languages. Methods do not easily transfer; at least they cannot be written to Morloc binary. However, it is possible to convey class-like APIs through typeclasses. Hidden fields are more challenging since, by design, they are not accessible. So objects cannot generally be directly represented in the Morloc ecosystem.
Objects that have a clear "plain old data" representation can be handled by Morloc. These objects, and their component types, must have no vital hidden data, no vital state, and no required methods. Examples of these are all the basic Python types (int, float, list, dict, etc) and many C++ types such as the standard vector and tuple types. When these objects are passed between languages, they are reduced to their pure data.
7.9. Is Morloc still relevant when AI can program and translate?
Maybe. Morloc may serve as a system for functional composition, verification, and automation even when most functions are generated by machines.
I’ll lay out an argument for this below, starting with a few propositions:
- Adversaries exist. AIs may themselves be adversarial or there might be adversarial code in the ecosystem around the AIs (for example, prompt injection). Humans can’t trust humans, humans can’t trust AIs, AIs can’t trust humans, and AIs can’t trust AIs. Depending on their architecture, AIs may not even be able to trust their own memories.
- Stupid is fast. Narrow intelligence outperforms general intelligence for narrow problems. A vast AGI system with deep understanding of physics and Shakespeare will not be the fastest tool for sorting a list of integers. There will always be a need for programs across the intelligence spectrum, from classical functions, to statistical models, to general intelligences.
- Creating functions is expensive. Designing high-performance algorithms is not trivial. Even simple functions, like sorting algorithms, require deep thought to optimize for a given use case. But there is a further combinatorial explosion of more complex physical simulations, graphics engines, and statistical algorithms. While simple functions might be created in seconds, others may take years of CPU time to optimize.
- Reproducibility is important. Future AIs may serve as nearly perfect oracles, but they are complex entities and future AIs will likely be capable of evolving over time as persons. So they will likely not give equivalent answers day to day. It is valuable to be able to crystallize a thought process into something that will behave the same every time it is invoked on a given input. So again, functions are important.
- Correctness is important. If functions are being composed by AIs to create new programs, any function that does not behave in the way the AI expects can cause cascading errors. It doesn’t matter how intelligent the AI is, if it is building programs from functions that it cannot verify, then the programs may not be safe.
A few things follow from these propositions.
First, AI will benefit from writing functions. Even in a world with no humans, they will need functions for efficiently solving narrow problems. They will likely generate libraries of billions of specialized functions. Some may be classical functions and others may be small statistical models. By caching these functions, compute time can be saved. Rather than generating entire programs from first principles, they can build them logically through composition of prior functions. The same forms of abstractions that help humans reason will also be of value to AIs. Yes, they have far larger working memories than we do, but that does not change the fact that abstraction and composition reduce the costs of re-derivation.
Time can also be saved if different AIs share functions they have written (both with each other and with humans). Since adversaries exist, shared functions must be verified. But verification is hard, especially if a godlike super-intelligence were trying to hide adversarial features in the binary. The problem can be simplified by using a controlled language that can be formally verified by a trusted classical computer program — a compiler. So rather than share functions as binary, it would make sense to share them in strict controlled languages. For this reason, I believe that something resembling current programming languages will exist far into the future. Their main purpose will be as easily verifiable and human readable specifications for languages that can be compiled into high-performance code.
So in this imagined future, there are billions of functions in databases that are written in verifiable languages readable by humans, classical machines, and AIs. But what language is used? Maybe the AIs can converge on one standard. But even for AIs, and perhaps especially for them, I don’t think a single language is optimal. Rather, just as in human mathematics, there will likely be many languages for many domains. Languages make trade-offs. In general, the more complex a language is, the more difficult it is to parse, verify and optimize. So even if we ignore human factors, multi-lingual ecosystems are still likely to appear. Adding in human factors, we are again likely to see a spectrum of languages that accept different trade-offs in rigor, ease of use, and domain specificity.
I predict a future where humans and AIs use libraries of functions written in specialized languages. All the functions need to be easily verifiable by an outside actor and verified functions need to be composed into more complex programs using a well-verified composer. Since we don’t trust any agent to verify, we need a classical program. Morloc is a potential candidate for this role. It would serve as a classical composition tool, function verification ecosystem, automation engine, and conceptual framework for organizing and using billions of mostly machine-generated functions.
Of course, the future is impossible to predict, especially where AI is concerned. It is possible that AIs will converge on a single universal representation for computation. It is possible that the need for human readability and curation may disappear. It is possible that classical computer functions could be entirely replaced by discrete mathematical constructs that are composable and machine verifiable but entirely incomprehensible to humans.
7.10. Why is it named after Morlocks, weren’t they, like, bad?
While the Morlocks of Wellsian fame are best known for their culinary preferences, I think Wells misrepresented them. And even if he didn’t, we don’t treat our own Eloi any better. Meat choices aside, the Morlocks worked below to maintain the machines that simplified life above. That’s why the Morloc language adapts their name.
8. Contact
This is a young project and any brave early users are highly valued. Feel free to contact me for any reason!
- discord: https://discord.gg/dyhKd9sJfF
- BlueSky: https://bsky.app/profile/morloc-project.bsky.social
- email: z@morloc.io