
1. Intro

Morloc is a strongly-typed functional programming language where functions are imported from foreign languages and unified under a common type system. This language is designed to serve as the foundation for a universal library of functions. Each function in the library has one general type and zero or more implementations. An implementation may be either a function sourced from a foreign language or a composition of such functions. All interop code is generated by the Morloc compiler.

2. Why Morloc?

2.1. Compose functions across languages under a common type system

Morloc allows functions from polyglot libraries to be composed in a simple functional language. The focus isn’t on classic interoperability (e.g., calling Python from C) or serialization (e.g., sending data between applications via protobufs), though Morloc implementations may use these under the hood. Instead, you define types, import implementations, and build complex programs through function composition. The compiler invisibly generates any required interop code.

2.2. Write in your favorite language, share with everyone

Do you want to write in language X but have to write in language Y because everyone in your team does or because your expected users do? Love C for algorithms, R for statistics, but don’t want to write full apps in either? Morloc lets you mix and match, so you can use each language where it shines, with no bindings or boilerplate.

2.3. Run benchmarks and tests across languages

Tired of learning new benchmark and testing suites for each of your languages? Is it hard to benchmark similar tools wrapped in applications with varying input formats, input validation costs, or startup overhead? In Morloc, functions with the same general type signature can be swapped in and out for benchmarking and testing. The same test suites and test cases work across all supported languages because the inputs and outputs of all functions of the same type share equivalent Morloc binary forms, making validation and comparison easy.

2.4. Design universal libraries

With Morloc, we can build abstract libraries using the general types as a logical framework. Then we can import implementations of these functions from one or more of the supported languages and easily test and benchmark them. These libraries are the foundation for an ecosystem where functions may be verified, organized/searched by type, and used to build rigorous programs.

2.5. Make better bioinformatics workflows

Within the bioinformatics space, Morloc can serve as a replacement for the brittle application/file paradigm of workflow design. Replace heavy CLI applications with pure function libraries, ad hoc textual file formats with explicit data structures, and workflow specifications with function compositions. See the first Morloc paper for details (pre-released here).

3. Current status

Morloc is under heavy development in several areas:

  • language support – Need to streamline language onboarding and add languages beyond the current three (Python, R, and C++)

  • type system – There’s lots to do here: sum types, effect handling, constraints, extensible records

  • performance – The shared library implementation lacks proper memory defragmentation, and there is some unnecessary memory copying between languages

  • scaling – I’ve implemented some of the infrastructure and syntax for remote job submission, but more work is needed before it can be used in practice

  • tooling – We need a linter, debugger, better dependency management, more flexible builds

There is one island of stability, though: the native functions Morloc imports are fully independent of Morloc itself. For a given Morloc program, most of your code will be pure functions in native languages (e.g., Python, C++, or R). This code will never have to change between Morloc versions. Where Morloc will change is in how it describes these native functions, the syntax it uses to compose them, and the particulars of code generation.

Is Morloc ready for production? Maybe. Currently, Morloc has many sharp edges, and new versions may introduce breaking changes. So Morloc is most appropriate right now for adventurous first adopters who can solve problems and write clear issue reports.

Want to contribute? The most helpful thing you can do is join the community (see the Contact section), try out Morloc, and offer feedback on social media or via GitHub issue reports. The community is just starting, and the language is young, so you can strongly influence how the system evolves.

4. Getting Started

4.1. Installing Morloc

The easiest way to run Morloc is through containers in a UNIX environment. Linux works natively. MacOS and Windows are more complicated; their special cases are covered later. On Windows, you will need to install through the Windows Subsystem for Linux.

The only dependency is a container engine; currently two, Docker and Podman, are supported.

Podman instructions

Unlike Docker, Podman runs rootless by default, so no sudo is required. On Linux, it also runs natively with no daemon.

On MacOS and Windows (even through WSL), a virtual machine is required, so you will first need to initialize and start a Podman machine:

$ podman machine init
$ podman machine start

You can confirm that podman is running by entering

$ podman --version
podman version 5.4.1   # version on my current setup

Docker instructions

Docker requires either sudo access or rootless mode configuration.

Verify Docker is running:

$ docker --version
$ docker run hello-world

After confirming either Podman or Docker is running on your system, pull an installation script from morloc-project/morloc-manager:

$ curl -o morloc-manager https://raw.githubusercontent.com/morloc-project/morloc-manager/refs/heads/main/morloc-manager.sh

morloc-manager is a (mostly) POSIX-compatible shell script that is tested on Linux and MacOS. On Windows, you may run it in the Windows Subsystem for Linux (WSL). You may execute morloc-manager directly, for example with bash morloc-manager, but it is more convenient to make the script executable and move it to a folder in your execution path.

The script provides usage information on request:

$ morloc-manager -h
$ morloc-manager install -h
$ morloc-manager uninstall -h

The install subcommand will build the Morloc home directory, install required containers, and generate executable scripts. The two you need are menv and morloc-shell.

menv runs arbitrary commands in a Morloc container. It can be used to compile and run Morloc programs. For example:

$ menv morloc make -o foo foo.loc
$ menv ./foo double 21

morloc-shell is the same as the above script except that it drops you into an interactive shell inside the container rather than running a single command. The container has Python, R, and a C++ compiler installed, as well as vim and other conveniences.

4.2. Setting up IDEs

We are currently working on expanding the editor support for Morloc.

Here are the currently supported editors:

vim

If you are working in vim, you can install Morloc syntax highlighting as follows:

$ mkdir -p ~/.vim/syntax/
$ mkdir -p ~/.vim/ftdetect/
$ curl -o ~/.vim/syntax/loc.vim https://raw.githubusercontent.com/morloc-project/vimmorloc/main/loc.vim
$ echo 'au BufRead,BufNewFile *.loc set filetype=loc' > ~/.vim/ftdetect/loc.vim
(Screenshot: vim syntax highlighting of a Morloc file)

VS Code / VSCodium / Cursor

We have a publicly available "morloc" extension with support for highlighting and snippet expansion.

(Screenshot: VS Code syntax highlighting of a Morloc file)

Zed

This is currently under development, see repo here.

The extension is mostly written, and the required Tree-sitter grammar is written, but there are bugs to be resolved. I’m happy to accept pull requests!

I’ve also written several syntax highlighting and static analysis tools:

Pygmentize

A repo with the Pygmentize parser can be found here. This parser is used to highlight code here in the manual. It can be easily integrated into Python code, e.g., in the Weena discord bot.

Tree-sitter

Tree-sitter is a program for defining parsers and using them to query languages and add advanced grammatical understanding to editors. These grammars require a complete lexer and parser specification for the language. This grammar is available for Morloc, see repo here. Tree-sitter allows general purpose syntax highlighting (e.g., over the command line) and parses a full concrete syntax tree from the code:

(Screenshot: Tree-sitter parse of Morloc code)

4.3. Say hello

The inevitable "Hello World" case is implemented in Morloc like so:

module main (hello)
hello = "Hello up there"

The module named main exports the term hello which is assigned to a literal string value.

Paste this code into a file (e.g., "hello.loc"). It can then be imported by other Morloc modules or compiled directly into a program where every exported term is a subcommand.

$ morloc make hello.loc

This command will produce an executable named nexus (a copy of the pre-compiled nexus binary) along with a nexus.manifest JSON file and pool files for each language used (e.g., pool.py, pool-cpp.out, pool.R). The nexus is the command line interface (CLI) to the commands exported from the module.

Calling nexus with no arguments or with the -h flag will print a help message:

$ ./nexus -h
Usage: ./nexus [OPTION]... COMMAND [ARG]...

Nexus Options:
 -h, --help            Print this help message
 -o, --output-file     Print to this file instead of STDOUT
 -f, --output-format   Output format [json|mpk|voidstar]

Exported Commands:
  hello
    return: Str

This usage message is automatically generated. For each exported term, it specifies the input (none, in this case) and output types as inferred by the compiler. For this case, the exported command is just the term hello, so no input types are listed.

The command is called like so:

$ ./nexus hello
Hello up there

4.4. Dice rolling

Let’s write a little program that rolls a pair of 20-sided dice and prints the larger result. Here is the Morloc script:

module dnd (rollAdv)
import types
source Py from "foo.py" ("roll", "max", "narrate")

roll :: Int -> Int -> [Int]
max :: [Int] -> Int
narrate :: Int -> Str

rollAdv = narrate (max (roll 2 20))

Here we define a module named dnd that exports the function rollAdv. In line 2, we import the required type definitions from the Morloc module types. Later on we’ll go into how these types are defined. In line 3, we source three functions from the Python file "foo.py". In lines 5-7, we assign each of these functions a Morloc type signature. You can think of the arrows in the signatures as separating arguments. For example, the function roll takes two integers as arguments and returns a list of integers. The square brackets indicate lists. In the final line, we define the rollAdv function.

The Python functions are sourced from the Python file "foo.py" with the following code:

import random

def roll(n, d):
    # Roll a d-sided die n times, returning a list of results
    return [random.randint(1, d) for _ in range(n)]

def narrate(roll_value):
    return f"You rolled a {roll_value!s}"

Nothing about this code is particular to Morloc.

One of Morloc’s core values is that foreign source code never needs to know anything about the Morloc ecosystem. Sourced code should always be nearly idiomatic code that uses normal data types. The inputs and outputs of these functions are natural Python integers, lists, and strings — they are not Morloc-specific serialized data or ad hoc textual formats.
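Since the sourced code is ordinary Python, the whole pipeline can be sanity-checked in Python itself, with the builtin max standing in for the sourced max. A minimal sketch:

```python
import random

def roll(n, d):
    # roll a d-sided die n times, returning a list of results
    return [random.randint(1, d) for _ in range(n)]

def narrate(roll_value):
    return f"You rolled a {roll_value!s}"

# the same composition Morloc expresses as: narrate (max (roll 2 20))
def roll_adv():
    return narrate(max(roll(2, 20)))
```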

This module is dependent on the types module, which in turn is dependent on the prelude module. So before compiling, we need to install both of these:

morloc install prelude
morloc install types

Now we can compile and run this program like so:

$ morloc make main.loc
$ ./nexus rollAdv
"You rolled a 20"

Since the function is random, it may return a different result on each call.

So, what’s the point? We could have done this more easily in a pure Python script. Morloc generates a CLI for us, type checks the program, and performs some runtime validation (by default, just on the final inputs and outputs). But there are other tools in the Python universe that can achieve this same end. Where Morloc is uniquely valuable is in the polyglot setting.

4.5. Polyglot dice rolling

In this next example, we rewrite the prior dice example with all three functions being sourced from different languages:

module dnd (rollAdv)

import types

source R from "foo.R" ("roll")
source Cpp from "foo.hpp" ("max")
source Py from "foo.py" ("narrate")

roll :: Int -> Int -> [Int]
max :: [Int] -> Int
narrate :: Int -> Str

rollAdv = narrate (max (roll 2 20))

Note that all of this code is exactly the same as in the prior example except the source statements.

The roll function is defined in R:

roll <- function(n, d){
    sample(1:d, n)
}

The max function is defined in C++:

#pragma once
#include <vector>
#include <algorithm>

template <typename A>
A max(const std::vector<A>& xs) {
    return *std::max_element(xs.begin(), xs.end());
}

The narrate function is defined in Python:

def narrate(roll_value):
    return f"You rolled a {roll_value!s}"

This can be compiled and run in exactly the same way as the prior monoglot example. It will run a bit slower, mostly because of the heavy cost of starting the R interpreter.

The Morloc compiler automatically generates all code required to translate data between the languages. Exactly how this is done will be discussed later.

4.6. Parallelism example

Here is an example showing a parallel map function written in Python that calls C++ functions.

module m (sumOfSums)

import types

source Py from "foo.py" ("pmap")
source Cpp from "foo.hpp" ("sum")

pmap a b :: (a -> b) -> [a] -> [b]
sum :: [Real] -> Real

sumOfSums = sum . pmap sum

This Morloc script exports a function that sums a list of lists of real numbers. Here we use the dot operator for function composition. The sum function is implemented in C++:

// C++ header sourced by morloc script
#pragma once
#include <vector>

double sum(const std::vector<double>& vec) {
    double sum = 0.0;
    for (double value : vec) {
        sum += value;
    }
    return sum;
}

The parallel pmap function is written in Python:

# Python3 file sourced by morloc script
import multiprocessing as mp

def pmap(f, xs):
    with mp.Pool() as pool:
        results = pool.map(f, xs)
    return results

The inner summation jobs will be run in parallel. The pmap function has the same signature as the non-parallel map function, so it can serve as a drop-in replacement.

This can be compiled and run with the lists being provided in JSON format:

$ morloc make main.loc
$ ./nexus sumOfSums '[[1,2],[3,4,5]]'
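The expected answer is easy to verify in plain Python, with the builtin sum and map standing in for the C++ sum and the parallel pmap. A sequential sketch of the same computation:

```python
def sum_of_sums(xss):
    # sum . pmap sum, written sequentially:
    # sum each inner list, then sum the results
    return sum(map(sum, xss))

print(sum_of_sums([[1, 2], [3, 4, 5]]))  # prints 15
```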

5. Syntax and Features

5.1. Basic data types

Morloc supports standard primitives, homogeneous lists, and tuples.

Booleans

Booleans in Morloc are represented as True or False under the Bool type.

Numbers

Numbers may be written in standard notation, scientific notation, or hexadecimal/octal/binary notation:

-- standard notation for integers and floats
42
4.2

-- scientific notation (upper or lowercase 'e')
4.2E16
4.2e16
4.2e-8

-- hexadecimal notation (case insensitive)
0xf00d
0xDEADBEEF

-- octal notation (upper or lowercase 'o')
0o755

-- binary notation (upper or lowercase 'b')
0b0101
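For reference, Python accepts the same integer literal notations, so the decimal values can be checked there (an analogy only; Morloc's numeric semantics are defined by its compiler):

```python
# the same notations are valid Python literals, with these decimal values
assert 0xf00d == 61453        # hexadecimal
assert 0o755 == 493           # octal (rwxr-xr-x in file permissions)
assert 0b0101 == 5            # binary
assert 4.2e16 == 4.2E16       # exponent marker is case-insensitive
```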
Strings

Morloc supports multi-line strings and string interpolation.

String interpolation uses the #{…​} syntax. The expression inside the braces must evaluate to a Str:

helloYou you = "hello #{you}"

String interpolation works with any expression that returns a string.

Multi-line strings use triple quotes. Leading indentation common to all lines is stripped:

longString =
  """
  this is a long
  string
  """

-- single-quote triple-strings also work
anotherString = '''single quotes are also OK'''

-- triple-quoted strings can contain internal quotes
quoted = """you can use "internal quotes" freely"""
Tuples

Tuples may be used to store a fixed number of terms of different types.

x = (1, True, 6.45)
Records

Records are essentially named tuples. Record type definitions and language-specific handling will be addressed later, but record data is expressed as shown in the example below:

{ name = "Alice", age = 42 }
Lists

Lists are used to store a variable number of terms of the same type.

x = [1,2,3]

Lists do not yet have special accessors (e.g., slicing). List operations are performed through sourced functions such as head, tail, take, etc. You may import many of these from the root modules.

5.2. Pattern functions for data access/update

Data structures may be accessed and modified using pattern functions. These patterns may be getters that extract a tuple of values from a data structure or setters that update a data structure without changing its type.

Getter patterns

A getter pattern describes an optionally branching path into a data structure. Each segment of the path may be a tuple index, a record key, or a group of indices/keys. The terminal positions in the pattern are returned as elements in a tuple. Here are a few examples:

-- return the 1st element in a tuple of any size
.0 (1,2) -- return 1
.0 ((1,3),2,5) -- return (1,3)

-- return the 2nd element in the first element of a tuple
.0.1 ((1,3),2,5) -- return 3

-- returns the 2nd and 1st elements in a tuple
.(.1,.0) (1,2,3) -- returns (2,1)
.(.1,.0) (1,2)   -- returns (2,1)

-- indices and keys may be used together
.0.(.x, .y.1) ({x=1, y=(1,2), z=3}, 6) -- returns (1,2)

These patterns are transformed into functions that may be used exactly like any other function.

map .1 [(1,2),(2,3)] -- returns [2,3]
Setter patterns

Setter patterns are similar but add an assignment statement to each pattern terminus.

.(.0 = 99) (1,2) -- return (99,2)

-- indices and keys may be used together
.0.(.x=99, .y.1=33) ({x=1, y=(1,2), z=3}, 6) -- returns ({x=99, y=(1,33), z=3}, 6)
Table 1. Comparison of patterns to Python syntax

  Pattern        Python                     Note
  .0             lambda x: x[0]             patterns are functions
  .0 x           x[0]
  .0.k x         x[0]["k"]
  .(.1,.0) x     (x[1], x[0])
  foo .0 xs      foo(lambda x: x[0], xs)    higher order
  .(.k = 1) x    x["k"] = 1

Note that setters are designed not to mutate data. The spine of the data structure is copied, retaining links to the original data for unmodified fields. So when the expression .(.0 = 42) x is translated into Python, it will create a new tuple with the first field set to 42 and the remaining fields referring to elements of the original tuple. The same goes for records.
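For illustration, here is a hypothetical Python rendering of the non-mutating update that .(.0 = 42) describes (the actual generated code may differ):

```python
def set_first(x):
    # build a new tuple: field 0 replaced with 42, the remaining
    # fields shared with the original
    return (42,) + x[1:]

original = (1, 2, 3)
updated = set_first(original)
assert updated == (42, 2, 3)
assert original == (1, 2, 3)  # the original tuple is untouched
```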

5.3. Source function from foreign languages

In Morloc, you can import functions from many languages and compose them under a common type system. The syntax for importing functions from source files is as follows:

source Cpp from "foo.hpp" ("map", "sum", "snd")
source Py from "foo.py" ("map", "sum", "snd")

This brings the functions map, sum, and snd into scope in the Morloc script. Each of these functions must be defined in the C++ and Python scripts. For Python, since map and sum are builtins, only snd needs to be defined. So the foo.py file only requires the following two lines:

def snd(pair):
    return pair[1]

The C++ file, foo.hpp, may be implemented as a simple header file with generic implementations of the three required functions.

#pragma once
#include <vector>
#include <tuple>

// map :: (a -> b) -> [a] -> [b]
template <typename A, typename B, typename F>
std::vector<B> map(F f, const std::vector<A>& xs) {
    std::vector<B> result;
    result.reserve(xs.size());
    for (const auto& x : xs) {
        result.push_back(f(x));
    }
    return result;
}

// snd :: (a, b) -> b
template <typename A, typename B>
B snd(const std::tuple<A, B>& p) {
    return std::get<1>(p);
}

// sum :: [a] -> a
template <typename A>
A sum(const std::vector<A>& xs) {
    A total = A{0};
    for (const auto& x : xs) {
        total += x;
    }
    return total;
}

Note that these implementations are completely independent of Morloc — they have no special constraints, they operate on perfectly normal native data structures, and their usage is not limited to the Morloc ecosystem. The Morloc compiler is responsible for mapping data between the languages. But to do this, Morloc needs a little information about the function types. This is provided by the general type signatures, like so:

map a b :: (a -> b) -> [a] -> [b]
snd a b :: (a, b) -> b
sum :: [Real] -> Real

The syntax for these type signatures is inspired by Haskell, with the exception that generic terms (a and b here) must be declared on the left. Square brackets represent homogeneous lists, parenthesized comma-separated values represent tuples, and arrows represent functions. In the map type, (a -> b) is a function from generic value a to generic value b; [a] is the input list of initial values; [b] is the output list of transformed values.

Removing the syntactic sugar for lists and tuples, the signatures may be written as:

map a b :: (a -> b) -> List a -> List b
snd a b :: Tuple2 a b -> b
sum :: List Real -> Real

These signatures provide the general types of the functions. But one general type may map to multiple native, language-specific types. So we need to provide an explicit mapping from general to native types.

type Cpp => List a = "std::vector<$1>" a
type Cpp => Tuple2 a b = "std::tuple<$1,$2>" a b
type Cpp => Real = "double"
type Py => List a = "list" a
type Py => Tuple2 a b = "tuple" a b
type Py => Real = "float"

These type functions guide the synthesis of native types from general types. Take the C++ mapping for List a as an example. The basic C++ list type is vector from the standard template library. After the Morloc typechecker has solved for the type of the generic parameter a, and recursively converted it to C++, its type will be substituted for $1. So if a is inferred to be a Real, it will map to the C++ double, and then be substituted into the list type yielding std::vector<double>. This type will be used in the generated C++ code.
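The substitution process can be modeled as a small recursive function: look up the native template for the type constructor, then splice the recursively converted arguments into the $1, $2, ... slots. This is an illustrative model, not the compiler's actual code:

```python
# native type templates, keyed by (language, general type constructor);
# these mirror the type mappings above
TEMPLATES = {
    ("Cpp", "List"): "std::vector<$1>",
    ("Cpp", "Tuple2"): "std::tuple<$1,$2>",
    ("Cpp", "Real"): "double",
    ("Py", "List"): "list",
    ("Py", "Tuple2"): "tuple",
    ("Py", "Real"): "float",
}

def to_native(lang, ty):
    # ty is a tuple: (constructor, arg1, arg2, ...)
    head, *args = ty
    template = TEMPLATES[(lang, head)]
    for i, arg in enumerate(args, start=1):
        template = template.replace(f"${i}", to_native(lang, arg))
    return template

# List Real becomes std::vector<double> in C++
assert to_native("Cpp", ("List", ("Real",))) == "std::vector<double>"
```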

5.4. Functions

Functions are defined with arguments separated by whitespace:

foo x = g (f x)

Here foo is the Morloc function name and x is its first argument.

The Morloc internal module, which is imported into all root modules, defines the composition (.) and application ($) operators.

With . , we can re-write foo as:

foo = g . f

Composition chains can build multi-stage pipelines:

process = format . transform . validate . parse

The $ operator is the application operator. It has the lowest precedence, so it can be used to avoid parentheses:

-- these are equivalent
foo (bar (baz x))
foo $ bar $ baz x
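In Python terms, the composition operator behaves like the following helper (a sketch, not Morloc's actual definition):

```python
def compose(g, f):
    # (.) :: (b -> c) -> (a -> b) -> (a -> c)
    return lambda x: g(f(x))

inc = lambda x: x + 1
double = lambda x: x * 2

foo = compose(double, inc)  # corresponds to: foo = double . inc
assert foo(3) == 8          # double(inc(3))
```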

Morloc supports partial application of arguments.

For example, to multiply every element in a list by 2, we can write:

multiplyByTwo = map (mul 2)

Partial application works well for leading arguments, but what if we want to partially apply a later argument?

For example, we can use direct partial application with subtraction to create functions that subtract the input argument from a given value:

map (sub 1) [1,2,3] -- returns [0,-1,-2]

But what if we want to do the reverse:

map ??? [1,2,3] -- we want to return [0,1,2]

One solution is to use an anonymous function (a lambda), like so:

map (\x -> sub x 1) [1,2,3] -- returns [0,1,2]

Morloc also supports a shortcut for more flexible partial application using underscores as placeholders:

map (sub _ 1) [1,2,3] -- returns [0,1,2]

This is transformed in the compiler frontend into a lambda, so it behaves identically. These placeholders may also be used in data structures. The following two expressions produce the same result:

map (\x -> (x,42)) [1,2,3]
map (_,42) [1,2,3]

The placeholders may be used in nested data structures as well:

people :: [Person]
people = zipWith { country = "Pangea", name = _, age = _ } ["Alice", "Bob"] [42, 44]

When multiple placeholders are used, the arguments generated for the lambda are applied in left-to-right order.

Placeholders also work in string interpolation:

map "Hello #{_}!" ["Alice", "Bob"]

5.5. Native Morloc functions

While Morloc’s primary purpose is composing foreign functions, you can also define functions entirely in Morloc without sourcing from any language. These native functions are written using Morloc’s own expression syntax:

module main (double, greet)

import root-py

double :: Real -> Real
double x = x + x

greet :: Str -> Str
greet name = "Hello #{name}!"

Native functions can use composition, where clauses, lambdas, and all other Morloc expression forms. They are compiled down to whichever language the compiler selects for execution.

5.6. where and let clauses

Functions may use where clauses to define local bindings:

f x = y + b where
    y = x + 1
    b = 41.0

Where clauses inherit the scope of their parent and may be nested:

f = x where
    x = y where
        y = a + b
        a = 1.0
    b = 41.0

let is the more orderly cousin of where. A where clause defines terms that may be used anywhere in the where block, in any order; let bindings are introduced sequentially, each visible to the bindings that follow and to the final in expression.

f n =
  let m = n + 1
  let y = m + 2
  in (m + y)

let also allows you to ignore the result of a computation:

f n =
  let _ = n + 1
  let y = n + 2
  in (5 + y)

Ignoring the return value of n + 1 is completely pointless in this case, but this ability will become crucial when we move on to computations that have side effects.

Guards

Guards provide conditional branching within function definitions. Each guard clause begins with ? followed by a condition and a result expression, with a : default that is always required as the final case:

abs :: Int -> Int
abs x
  ? x >= 0 = x
  : neg x

Guards are evaluated lazily from top to bottom. The first condition that evaluates to true determines the result; remaining guards are not evaluated. The : default always terminates the guard chain, ensuring exhaustiveness.

Guards work naturally with multiple parameters:

clamp :: Int -> Int -> Int -> Int
clamp lo hi x
  ? x < lo = lo
  ? x > hi = hi
  : x

Guards can be combined with where clauses to define local bindings used in the conditions and result expressions:

classify :: Int -> Str
classify x
  ? x > big = "big"
  ? x > small = "medium"
  : "small"
  where
    big = 100
    small = 10

Guards may appear inside let bindings:

absLet :: Int -> Int
absLet x =
  let result ? x >= 0 = x
             : neg x
  in result

Guards can also be used as inline expressions anywhere a value is expected, enclosed in parentheses:

classify :: Int -> Str
classify x = (? x > 100 = "big" ? x > 10 = "medium" : "small")
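Guard chains correspond directly to if/elif/else in the target language. A Python sketch of the clamp example above:

```python
def clamp(lo, hi, x):
    # ? x < lo = lo
    if x < lo:
        return lo
    # ? x > hi = hi
    elif x > hi:
        return hi
    # : x  (the required default)
    else:
        return x

assert clamp(0, 10, -5) == 0
assert clamp(0, 10, 15) == 10
assert clamp(0, 10, 7) == 7
```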

5.7. Recursion

Morloc supports recursive function definitions. A function may refer to itself in its body, and the compiler will generate the appropriate code in the target language.

The classic factorial function can be written using guards and self-reference:

fact :: Int -> Int
fact n
  ? n == 0 = 1
  : n * fact (n - 1)

Functions may also be mutually recursive. The following pair of functions determines (rather inefficiently) whether a number is even or odd:

isEven :: Int -> Bool
isEven n
  ? n == 0 = True
  : isOdd (n - 1)

isOdd :: Int -> Bool
isOdd n
  ? n == 0 = False
  : isEven (n - 1)
Caution

Recursive Morloc functions are not equally well supported across all target languages. Some backends may impose recursion depth limits or lack tail-call optimization, which can cause stack overflows for deep recursion.

Additionally, if a recursive function calls foreign functions implemented in different languages, each recursive step may cross a language boundary. These cross-language calls involve data marshalling and inter-process communication, so recursion that spans languages will be significantly slower than recursion within a single language.
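When deep recursion is a concern, the recursive definition can often be rewritten iteratively in the sourced language. For example, a Python factorial that avoids the recursion depth limit entirely:

```python
def fact(n):
    # iterative factorial: no recursion, so no stack depth limit
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

assert fact(0) == 1
assert fact(5) == 120
```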

5.8. Infix operators

Morloc supports user-defined infix operators with explicit associativity and precedence. Operators are declared with infixl (left-associative) or infixr (right-associative) followed by a precedence level (higher binds tighter):

infixl 6 +
infixl 7 *
infixr 8 **

Operators are given type signatures by wrapping them in parentheses:

(+) a :: a -> a -> a
(*) a :: a -> a -> a
(**) :: Int -> Int -> Int

Operators may be sourced from foreign languages like any other function:

source Py from "ops.py" ("add" as (+), "mul" as (*))

Infix operators work naturally with typeclasses:

class Num a where
    zero a :: a
    negate a :: a -> a
    (+) a :: a -> a -> a
    (*) a :: a -> a -> a

infixl 6 +
infixl 7 *

instance Num Int where
    source Py from "foo.py" ("add" as (+), "mul" as (*), "neg" as negate)
    zero = 0

-- now we can write natural expressions
test_expr :: Int
test_expr = 4 * 7 + 3  -- evaluates to 31 (precedence: 4*7 first, then +3)

Operators may also be imported from other modules:

import ops ((&), (|))

5.9. Records

A record is internally a named tuple. Records may map to different structures in different languages.

A general record is defined as follows:

record Person = Person
    { name :: Str
    , age :: Int
    }

Concrete forms must have the same field names and field types. Since these must be the same, they need not be specified. We only need to specify the name of the concrete type:

record Py => Person = "dict"
record R => Person = "list"
record Cpp => Person = "person_t"

In Python and R, records are typically dict and list types, respectively. These types can contain fields of any type. In C++, records are represented as structs; these must be defined in the C++ code, as shown below.

struct person_t {
    std::string name;
    int age;
};

Functions may be defined that act on the records, as below:

source R from "foo.R" ("incAge" as rinc)
source Py from "foo.py" ("incAge" as pinc)
source Cpp from "foo.hpp" ("incAge" as cinc)

-- Increment the person's age
rinc :: Person -> Person
pinc :: Person -> Person
cinc :: Person -> Person

Records may, like all Morloc types, be passed freely between languages. As shown above, records may be written in braces and their type will be inferred.

The "foo.R" file contains the function:

incAge <- function(person){
    person$age <- person$age + 1
    person
}

No special code is needed for person; it is just a builtin R list. Similarly for Python:

def incAge(person):
    person["age"] += 1
    return person

C++ requires a definition of a person_t struct:

struct person_t {
    std::string name;
    int age;
};

person_t incAge(person_t person){
    person.age++;
    return person;
}

Records may be initialized and functions called on them:

foo name age
    = (rinc . pinc . cinc)
      { name = name, age = age }

foo, above, initializes a Person record and then increments its age three times, each in a different language.
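The same round trip can be sketched in plain Python, with a dict standing in for Person and a single incAge applied three times (in the Morloc version, each application runs in a different language):

```python
def inc_age(person):
    # increment the person's age, returning the updated record
    person["age"] += 1
    return person

def foo(name, age):
    # initialize a Person-like record, then bump the age three times,
    # analogous to (rinc . pinc . cinc)
    person = {"name": name, "age": age}
    return inc_age(inc_age(inc_age(person)))

assert foo("Alice", 42) == {"name": "Alice", "age": 45}
```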

Records may contain fields with arbitrarily complex types, but recursive types are not currently supported.

5.10. Tables

Tables are similar to records, but all of their fields are lists of equal length:

module foo (readPeople, addPeople)

import root-py (Int, Str)

source R from "people-tables.R"
   ( "read.delim" as readPeople
   , "addPeople")

table People = People
    { name :: Str
    , age :: Int
    }

readPeople :: Filename -> People
addPeople :: [Str] -> [Int] -> People -> People

With "people-tables.R" containing:

addPeople <- function(names, ages, df){
    rbind(df, data.frame(name = names, age = ages))
}

This can be compiled and run like so:

# read a tab-delimited file containing person rows
./nexus readPeople data.tab > people.json

# add a row to the table
./nexus addPeople '["Eve"]' '[99]' people.json
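A column-oriented table like People can be modeled in Python as a dict of equal-length lists. A sketch of an addPeople analog (an analogy only; the sourced R version operates on data frames):

```python
def add_people(names, ages, table):
    # append new rows column-wise, keeping all columns the same length
    return {
        "name": table["name"] + names,
        "age": table["age"] + ages,
    }

people = {"name": ["Alice"], "age": [42]}
people = add_people(["Eve"], [99], people)
assert people == {"name": ["Alice", "Eve"], "age": [42, 99]}
```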

The record and table types are currently strict. Defining functions that add or remove fields/columns requires defining entirely new records/tables. Generic functions for operations such as removing lists of columns cannot be defined at all. For now, most operations should be done in coarser functions. Alternatively, custom non-parameterized tabular/record types may be defined.

The case study in the Morloc paper uses a JsonObj type that represents an arbitrarily nested object that serializes to/from JSON. In Python, it deserializes to a dict object; in R, to a list object; and in C++, to an ordered_json object (from Niels Lohmann’s json package).

A similar approach could be used to define a non-parameterized table type that serialized to CSV or some binary type (such as Parquet).

These non-parameterized solutions are flexible and easy to use, but lack the reliability of the typed structures.

5.11. Effects and delayed evaluation

Morloc is a functional programming language. Here a "function" is a mapping from a value in one domain to a value in another. This works neatly for pure functions, but it becomes complicated when "effects", like interactions with the operating system, are introduced.

Consider a simple readFile program:

readFile :: Filename -> Str

This Morloc program takes a filename as input and returns a string containing the file contents. Logically, this is a function. You are mapping a filename to the file contents. At a given slice of time on a given system, there is a one-to-one mapping between filenames and files.

You can save the contents in a variable:

contents = readFile "myfile.txt"

contents is now a value storing a string. But the world is not constant. Files change. Let’s say that myfile.txt is a log file and we want to read it at multiple time points. Further, we don’t want to specify the filename every time we call the function. It would be more convenient to partially apply the filename and get a function that maps the applied filename to whatever is currently at that file location. But partially applying the filename to readFile results in a value, not a function. In many familiar languages, you can define functions that take no arguments. So you could call read_log() and it would read the log and return fresh contents every time the function was executed.

Let’s look at an even trickier problem. Suppose we wanted a function that returns the current epoch time. How would this be defined in Morloc? You could require arguments that transform the problem to a true function. The world state could be an argument. This state might be the locale or some other reference. The function then becomes

time :: TemporalState -> Time

But once again, when we partially apply the function, we are reduced to a value. Perhaps we want a "function" that always returns California time.

Another example is random numbers. We can define a family of random functions like so:

runif :: Real -> Real -> Real
choose a :: [a] -> a
coinToss :: ???

runif and choose are functions that produce random values when given the required arguments. But what is coinToss? There are no arguments. Again, you could reparameterize these random functions as pure functions that take a random seed, or a stateful random generator, as an argument. That would look something like this:

runif :: Real -> Real -> RNG -> (Real, RNG)
choose a :: [a] -> RNG -> (a, RNG)
coinToss :: RNG -> (Bool, RNG)

This is the "right" way to do random effects. But it is certainly not the most common way of doing it. In most languages, the random number generator exists in global state. Or the random number generator might use actual system noise to get truly random values.
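The seed-threading style above can be sketched in Python. The tiny linear congruential generator below is purely illustrative; it only shows how each function returns both a value and the updated RNG state:

```python
# Illustrative RNG threading: every random function takes the generator
# state and returns (value, new state). The LCG constants are arbitrary.
M = 2**31

def next_rng(rng):
    return (1103515245 * rng + 12345) % M

def coin_toss(rng):
    rng = next_rng(rng)
    return (rng % 2 == 0, rng)

def runif(lo, hi, rng):
    rng = next_rng(rng)
    return (lo + (hi - lo) * (rng / M), rng)

# The same seed always yields the same sequence:
b1, s = coin_toss(42)
b2, s = coin_toss(s)   # the second toss consumes the updated state
```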

What we want in Morloc is to preserve the power of partial function application but still have the ability to call "functions" with no arguments. We can achieve this by preventing a term from being evaluated until we explicitly request evaluation.

In Morloc, delayed computation is specified with braces at both the type-level and the term-level. So you can write the (rather pointless) expression:

val :: {Int}
val = {42}

This defines an expression that evaluates to the value 42 whenever its value is requested. There are two ways to request evaluation. First, the top-level value in a program is implicitly forced. So if val is directly exported in the compiled program, the command ./nexus val will return 42. The second way is to force evaluation explicitly with !val.

Let’s look at some less trivial cases:

time :: {Time}
timeStr = "The time right now is #{!time}"

readFile :: Filename -> {Str}
readMyLog = readFile "mylog.log"
logvalue1 = !readMyLog -- read the current log data
logvalue2 = !readMyLog -- read the log again

-- roll one d-sided dice
rollDie :: Int -> {Int}

roll1d20 :: {Int}
roll1d20 = rollDie 20

-- stores one pair of independent d20 rolls
roll2d20 :: (Int, Int)
roll2d20 = (!roll1d20, !roll1d20)

Let’s dig more into the final expression. roll2d20 is initialized with two rolls and is fully evaluated. What if we want an expression that always evaluates to a fresh pair of 2d20 rolls? For this we can wrap the expression in braces.

roll2d20 :: {(Int, Int)}
roll2d20 = {(!roll1d20, !roll1d20)}

The braces around the expression indicate that its evaluation is suspended until forced. Once it is forced, the inner rolls are forced as well, yielding fresh dice rolls.
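The behavior of suspensions can be modeled in Python by treating {e} as a zero-argument function and ! as a call. A counter stands in for a real die below so the re-evaluation is visible (this is only a model of the semantics, not generated code):

```python
# Model {e} as a zero-argument function and ! as a call. A counter stands
# in for a real d20 so re-evaluation is visible.
calls = {"n": 0}

def roll1d20():            # plays the role of roll1d20 :: {Int}
    calls["n"] += 1
    return calls["n"]

roll2d20 = lambda: (roll1d20(), roll1d20())   # the suspended pair

pair1 = roll2d20()         # forcing evaluates the inner thunks...
pair2 = roll2d20()         # ...again, giving a fresh pair each time
```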

What if we have more complex sets of effects? Perhaps we have many rolls that need to be used in conditionals. Let’s say we want to simulate damage in a DnD attack. If the attack d20 roll is greater than the enemy’s defense, then the attack hits and does damage equal to base damage plus the sum of the rolled attack dice. We could write all this into a suspended expression:

damage :: Int -> Int -> {[Int]} -> {Int}
damage ac mod dmgDice = { ? !roll1d20 > ac = mod + sum !dmgDice : 0 }

This works. But what if we add terms that exist just for their side effects? Suppose we are writing a shellscript-like sequence of commands. Maybe we want to make a directory, move into the directory, create a file, and echo something into it. Here we need to sequence effects.

--' Make a directory
mkdir :: Path -> {ExitCode}
cd :: Path -> {ExitCode}
touch :: Path -> {ExitCode}

scriptAttempt1 :: ExitCode
scriptAttempt1 =
  let _ = !(mkdir "foo/bar")
  let _ = !(cd "foo/bar")
  let e = !(touch "baz")
  in e

This works. The let syntax forces evaluation in a specific order and the let wildcard allows results to be ignored (though here, we should probably not be ignoring the exit code).

Morloc also has a do syntax that allows a more imperative approach:

scriptAttempt2 :: {ExitCode}
scriptAttempt2 = do
  mkdir "foo/bar"
  cd "foo/bar"
  touch "baz"

Note that the expression type is delayed, so we can control when this computation is evaluated.

Here is one last example that shows several ways to solve the same problem:

rollAdv :: {Int}
rollAdv = {max !roll1d20 !roll1d20}

rollAdv'' :: {Int}
rollAdv'' =
  let x = !roll1d20
  let y = !roll1d20
  in {max x y}

rollAdv' :: {Int}
rollAdv' = do
  x <- roll1d20
  y <- roll1d20
  max x y

5.12. Optional types

All programming languages must have a way to deal with missing values. If you query a database for a record that doesn’t exist, what is returned? If a parameter is not set, what value does it have? In Python, the None type stores missing values. In R, NULL serves a similar purpose. In both languages, types that may lack values are represented as a union of the original type and the null type. JSON, similarly, stores missing values as null. Other languages solve this problem in libraries. C++ has the standard library data structure std::optional<T> for representing values of generic type T that may be absent (std::nullopt).

Haskell offers the Maybe sum type that may be Nothing or Just a. This is perhaps the cleanest solution, but it is not practical for Morloc. One of the core principles of Morloc is that sourced functions should be idiomatic. So Morloc needs a built-in mechanism that can vary freely in language-specific implementation while preserving consistency between languages. To this end, Morloc offers a dedicated "Optional" type with support for implicit coercion.

5.12.1. Syntax

The ? prefix marks a type as optional and the null primitive indicates an absent value. ?Int is an integer that might be null, ?Str is a string that might be null, and so on. The ? prefix can be applied to any type, including lists (?[Int]), records (?Person), and nested optionals (??Int).

--' Get the first element from a list or empty on failure
safeHead :: [Int] -> ?Int
fromNull a :: a -> ?a -> a

The null keyword represents an absent value:

testNull :: ?Int
testNull = null

5.12.2. Working with optional values

Functions that produce or consume optional values are sourced from foreign languages like any other function. Here is a complete example in Python:

module main (testSafeHead, testSafeHeadEmpty, testFromNull)

import root-py

safeHead :: [Int] -> ?Int
safeHead xs
  ? length xs == 0 = null
  : head xs

source Py from "main.py" ("fromNull")
fromNull a :: a -> ?a -> a

testSafeHead :: ?Int
testSafeHead = safeHead [10, 20, 30]

testSafeHeadEmpty :: ?Int
testSafeHeadEmpty = safeHead []

testFromNull :: Int
testFromNull = fromNull 0 null

The Python implementations handle None in the usual way:

def fromNull(default_val, x):
    if x is None:
        return default_val
    return x

Running this program gives:

$ ./main testSafeHead
10
$ ./main testSafeHeadEmpty
null
$ ./main testFromNull
0
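For comparison, had safeHead been sourced from Python rather than defined in Morloc, its body might look like this, with None playing the role of null for the optional ?Int return type (hypothetical sketch):

```python
def safe_head(xs):
    # None plays the role of the morloc null in the ?Int return type
    if len(xs) == 0:
        return None
    return xs[0]
```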

The same pattern works in C++ (using std::optional) and R (using NULL). In C++:

#include <optional>

template <class T>
T fromNull(T default_val, const std::optional<T>& x) {
  if(x.has_value()){
    return x.value();
  } else {
    return default_val;
  }
}

In R:

fromNull <- function(default_val, x){
  if(is.null(x)){
    return(default_val)
  } else {
    return(x)
  }
}

5.12.3. Optional record fields

Record fields can be optional. This is useful for data with missing or unknown values:

record Person where
  name :: Str
  age :: ?Int
record Py => Person = "dict"

makePerson :: Str -> ?Int -> Person
source Py from "foo.py" ("makePerson")

alice :: Person
alice = makePerson "Alice" (toNull 30)

bob :: Person
bob = makePerson "Bob" null

When serialized to JSON, alice becomes {"name":"Alice","age":30} and the age field of bob becomes null.

5.12.4. Optional values across languages

Optional types work seamlessly across language boundaries. A function in one language can produce an optional value that is consumed by a function in another:

-- C++ produces an optional value
cSafeDiv :: Int -> Int -> ?Int
source Cpp from "foo.hpp" ("cSafeDiv")

-- Python consumes it
pFromNull :: Int -> ?Int -> Int
source Py from "foo.py" ("pFromNull")

-- Chain them together: C++ to Python
testCppToPy :: Int
testCppToPy = pFromNull (-1) (cSafeDiv 10 3)

testCppToPyNull :: Int
testCppToPyNull = pFromNull (-1) (cSafeDiv 10 0)

The Morloc compiler generates the necessary serialization code at each language boundary. A null value in C++ (std::nullopt) is serialized as JSON null, which Python reads as None. The programmer does not need to handle the interop manually.

5.12.5. Implicit coercion

Morloc automatically coerces a non-optional value to an optional when the context requires it. If a function expects ?Int, you can pass a plain Int without wrapping it:

addOpt :: ?Int -> ?Int -> ?Int
source Py from "foo.py" ("addOpt")

-- Both arguments are plain Int, coerced to ?Int automatically
testCoerceAddOpt :: ?Int
testCoerceAddOpt = addOpt 3 4

fromNull a :: a -> ?a -> a
source Py from "foo.py" ("fromNull")

-- The second argument (42) is Int, coerced to ?Int
testCoerceArg :: Int
testCoerceArg = fromNull 0 42

This coercion is transitive: a coerces to ?a, which coerces to ??a.

Coercion also works across language boundaries. If a C++ function returns Int and a Python function expects ?Int, the compiler inserts the appropriate serialization so that the value is received correctly:

-- C++ returns a plain Int
cAddOne :: Int -> Int
source Cpp from "cfoo.hpp" ("cAddOne")

-- Python expects ?Int in the second argument
pUnwrapOr a :: a -> ?a -> a
source Py from "pfoo.py" ("pUnwrapOr")

-- The Int result from C++ is coerced to ?Int for Python
testCppIntToPyOpt :: Int
testCppIntToPyOpt = pUnwrapOr 0 (cAddOne 41)  -- returns 42

5.12.6. Language mapping

Optional types map to native nullable types in each language:

Language   Representation
Python     None for null, plain value otherwise
C++        std::optional<T>
R          NULL for null, plain value otherwise

No special type mappings are needed for optionals — the ? prefix works with any type that already has a language mapping.

6. Advanced Types

6.1. Type hierarchies

In some cases, there is a single obvious native type for a given Morloc general type. For example, most languages have exactly one reasonable way to represent a boolean. However, other data types may have many forms. The Morloc List is a simple example. In Python, the list type is most often used for representing ordered collections, but it is inefficient for heavy numeric problems. In such cases, it is better to use a numpy array. Further, there are data structures that are isomorphic to lists but more efficient for certain problems, such as stacks and queues.

We can define type hierarchies that represent these relationships.

-- aliases at the general level
type Stack       a = List a
type LList       a = List a
type ForwardList a = List a
type Deque       a = List a
type Queue       a = List a
type Vector      a = List a


-- define a C++ specialization for each special type
type Cpp => Stack a = "std::stack<$1>" a
type Cpp => LList a = "std::list<$1>" a
type Cpp => ForwardList a = "std::forward_list<$1>" a
type Cpp => Deque a = "std::deque<$1>" a
type Cpp => Queue a = "std::queue<$1>" a

Here we equate each of the specialized containers with the general List type. This indicates that they all share the same common form and can all be converted to the same binary. Then we specify language-specific patterns as desired. When the Morloc compiler seeks a native form for a type, it evaluates these type functions in incremental steps. At each step, the compiler first checks whether there is a direct native mapping for the language; if none is found, it evaluates the general type function.
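The resolution steps can be sketched as a small lookup procedure. The tables below are illustrative, not the compiler's actual data structures:

```python
# Illustrative alias and native-mapping tables (not compiler internals).
general_aliases = {"Stack": "List", "Deque": "List", "Queue": "List",
                   "Vector": "List"}
cpp_natives = {"Stack": "std::stack<$1>", "Deque": "std::deque<$1>",
               "Queue": "std::queue<$1>", "List": "std::vector<$1>"}

def resolve_cpp(morloc_type):
    # Prefer a direct native mapping; otherwise step through the general
    # aliases until a mapped type is reached.
    while morloc_type not in cpp_natives:
        morloc_type = general_aliases[morloc_type]
    return cpp_natives[morloc_type]
```

For example, Vector has no direct C++ mapping in the tables above, so it falls back to the general List alias.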

Native type annotations are also passed to the language binders, allowing them to implement specialized behavior for more efficient conversion to binary.

6.2. One term may have many definitions

Morloc supports what might be called term polymorphism. Each term may have many definitions. For example, the function mean has three definitions below:

import base (sum, div, size, fold, add)
import root-cpp
source Cpp from "mean.hpp" ("mean")
mean :: [Real] -> Real
mean xs = div (sum xs) (size xs)
mean xs = div (fold 0 add xs) (size xs)

Here, mean is sourced directly from C++, defined in terms of the sum function, and defined more generally with sum written as a fold operation. The Morloc compiler is responsible for deciding which implementation to use.

The equals operator in Morloc indicates functional substitutability. When you say a term is "equal" to something, you are giving the compiler an option for what may be substituted for the term. The function mean, for example, has many functionally equivalent definitions. They may be in different languages, or they may be more optimal in different situations.

Now this ability to simply state that two things are the same can be abused. The following statement is syntactically allowed in Morloc:

x = 1
x = 2

What is x after this code runs? It is 1 or 2. The latter definition does not mask the former; it is added alongside it. In this case, the two values are certainly not substitutable. Morloc has a simple value checker that will catch this type of primitive contradiction. However, the value checker cannot yet catch more nuanced errors, such as:

x = div 1 (add 1 1)
x = div 2 1

In this case, the value checker cannot look within the implementation of add, so it cannot know that there is a contradiction. For this reason, some care is needed when making these definitions.

6.3. Overload terms with typeclasses

In addition to term polymorphism, Morloc offers more traditional ad hoc polymorphism over types. Here typeclasses may be defined and type-specific instances may be given. This idea is similar to typeclasses in Haskell, traits in Rust, interfaces in Java, and concepts in C++.

In the example below, Addable and Foldable classes are defined and used to create a polymorphic sum function.

class Addable a where
    zero a :: a
    add a :: a -> a -> a

instance Addable Int where
    source Py from "arithmetic.py" ("add" as add)
    source Cpp from "arithmetic.hpp" ("add" as add)
    zero = 0

instance Addable Real where
    source Py from "arithmetic.py" ("add" as add)
    source Cpp from "arithmetic.hpp" ("add" as add)
    zero = 0.0

class Foldable f where
    foldr a b :: (a -> b -> b) -> b -> f a -> b

instance Foldable List where
    source Py from "foldable.py" ("foldr" as foldr)
    source Cpp from "foldable.hpp" ("foldr" as foldr)

sum = foldr add zero

The instances may import implementations for many languages.

The native functions may themselves be polymorphic, so the imported implementations may be repeated across many instances. For example, the Python add may be written as:

def add(x, y):
    return x + y

And the C++ add as:

template <class A>
A add(A x, A y){
    return x + y;
}
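A plausible body for the sourced Python foldr (the contents of "foldable.py" are not shown here, so this is a sketch matching the foldr signature):

```python
# Sketch of a right fold over Python lists, matching
# foldr a b :: (a -> b -> b) -> b -> f a -> b
def foldr(f, acc, xs):
    for x in reversed(xs):
        acc = f(x, acc)
    return acc

def add(x, y):
    return x + y

# sum = foldr add zero, with zero = 0 for Int
```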

Typeclasses may be imported from other modules. For example, a module that defines the Ord typeclass and derived operators can be imported and instantiated in another module:

import numops (Ord, (<), (>), (>=), min)

instance Ord Int where
    source Py from "foo.py" ("le" as (<=))

6.4. Defining non-primitive types

Types that are composed entirely of Morloc primitives, lists, tuples, records, and tables may be directly and unambiguously translated to Morloc binary forms and thus shared between languages. But what about types that do not break down cleanly into these forms? For example, consider the parameterized Map k v type that represents a collection with keys of generic type k and values of generic type v. This type may have many representations, including a list of pairs, a pair of columns, a binary tree, and a hashmap. In order for Morloc to know how to convert all Map types in all languages to one form, it must know how to express the Map type in terms of more primitive types. The user can provide this information by defining instances of the Packable typeclass for Map. This typeclass defines two functions, pack and unpack, that construct and deconstruct a complex type.

class Packable a b where
    pack a b :: a -> b
    unpack a b :: b -> a

The Map type for Python and C++ may be defined as follows:

type Py => Map key val = "dict" key val
type Cpp => Map key val = "std::map<$1,$2>" key val
instance Packable ([a],[b]) (Map a b) where
    source Cpp from "map-packing.hpp" ("pack", "unpack")
    source Py from "map-packing.py" ("pack", "unpack")

The Morloc user never needs to directly apply the pack and unpack functions. Rather, these are used by the compiler within the generated code. The compiler constructs a serialization tree from the general type and from this tree generates the native code needed to (un)pack types recursively until only primitive types remain. These may then be directly translated to Morloc binary using the language-specific binding libraries.
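A hypothetical "map-packing.py" could implement the two directions as a round trip between the native dict and the pair-of-lists form:

```python
# Sketch: pack builds the native dict from the simple ([a],[b]) form,
# and unpack deconstructs it back. morloc calls these in generated code.
def pack(pair):
    keys, vals = pair
    return dict(zip(keys, vals))

def unpack(d):
    keys = list(d.keys())
    return (keys, [d[k] for k in keys])
```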

In some cases, the native type may not be as generic as the general type. Or you may want to add specialized (un)packers. In such cases, you can define more specialized instances of Packable. For example, if the R Map type is defined as an R list, then keys can only be strings. Any other type should raise an error. So we can write:

type R => Map key val = "list" key val
instance Packable ([Str],[b]) (Map Str b) where
    source R from "map-packing.R" ("pack", "unpack")

Now whenever the key generic type of Map is inferred to be anything other than a string, all R implementations will be pruned.

6.5. Mapping general types to native types

When a function is sourced from a foreign language, Morloc needs to know how Morloc general types map to the function’s native types. This information is encoded in language-specific type functions. For example:

type R => Bool = "logical"
type Py => Bool = "bool"
type Cpp => Bool = "bool"

type R => Int32 = "integer"
type Py => Int32 = "int"
type Cpp => Int32 = "uint32_t"

Language-specific types are always quoted since they may contain syntax that is illegal in the Morloc language.

Consider an integer addition function addInt:

addInt :: Int32 -> Int32 -> Int32

This can be automatically mapped to a C++ function with the prototype uint32_t addInt(uint32_t x, uint32_t y).

Containers can be similarly mapped to native types:

type Py => List a = "list" a
type Cpp => List a = "std::vector<$1>" a

The $1 symbol is used to represent the interpolation of the first parameter into the native type. So the Morloc type List Int32 would translate to std::vector<uint32_t> in C++.
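The interpolation rule can be sketched as a small substitution function. This is illustrative, not the compiler's implementation:

```python
import re

# Replace $1, $2, ... with the native forms of the type parameters.
def interpolate(template, params):
    return re.sub(r"\$(\d+)", lambda m: params[int(m.group(1)) - 1], template)
```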

7. Human and Machine Interfaces

7.1. Modules and docstrings

A Morloc module describes a set of functions, their types, and their descriptions. We’ve already covered terms and types; now we will cover the descriptions we add to modules, functions, arguments, and fields. This extra data is stored in specialized comments (docstrings) that describe the terms and add modifications like defaults. The most obvious use case of these annotations is in specializing CLI interfaces, discussed in the next section, but they also inform the generation of rich APIs (see the HTTP/TCP/socket section).

7.2. The Command Line Interface

Building a Morloc module will generate a CLI tool where exported functions are presented as typed subcommands.

Here is a minimal example of propagated function descriptions:

import root-py

source Py from "main.py" ("foo", "bar")

--' Take two reals and do thing
foo :: Real -> Real -> Real

--' Convert a list of reals into a thing
bar :: [Real] -> Real

The special comment --' introduces a docstring that is attached to the following type signature and will be propagated through to the code generated by the backend.

$ morloc make main.loc
$ ./nexus -h
Usage: ./nexus [OPTION...] COMMAND [ARG...]

Nexus Options:
 -h, --help            Print this help message
 -o, --output-file     Print to this file instead of STDOUT
 -f, --output-format   Output format [json|mpk|voidstar]

Exported commands (call with -h/--help for more info):
  foo  Take two reals and do thing
  bar  Convert a list of reals into a thing

More detailed information about each exported subcommand may be accessed as well:

$ ./nexus foo -h
Usage: ./nexus foo ARG1 ARG2

Take two reals and do thing

Positional arguments:
  ARG1  No description given
        type: Real
  ARG2  No description given
        type: Real

Return: Real

Docstrings may also contain tags that specify how the arguments of the exported functions map to CLI positional or optional arguments. Here is a list of the currently supported tags:

  • name - give the CLI subcommand a dedicated name rather than defaulting to the Morloc function name

  • literal - treat an argument as the actual data rather than as a file that contains the data; currently this is only used for strings, where literal: true indicates that the extra JSON quotes are not required.

  • unroll - if true, then the record argument is "unrolled" into a group of optional arguments

  • default - the default value for an argument, in JSON format

  • metavar - the variable name used for the argument in the usage text

  • arg - short and long labels for this argument (e.g., "-v/--verbose")

  • true - the flag labels that toggle a boolean argument on

  • false - the flag labels that toggle a boolean argument off

  • return - a description of the returned data (this tag is the same as adding a description docstring to the final type in a signature)

Here is a longer example that showcases these tags:

module foobar (foo, bar)

import root-py

source Py from "foobar.py"
  ("foo", "bar")

--' config record
--' unroll: true
--' arg: --config
record Config where
  --' temporary directory
  --' arg: --tmp
  --' literal: true
  --' default: "/tmp"
  tmpdir :: Str

  --' cache the results
  --' true: --cache
  cache :: Bool

  --' number of threads to use
  --' arg: -t/--num-threads
  --' default: 1
  nthreads :: Int

--' do foo stuff
foo ::
  Config ->
  --' list of integers
  --' metavar: INT_LIST
  [Int] ->
  --' sum of INT_LIST
  Int

--' do bar stuff
--' return: summed values
bar ::
  --' unroll: false
  Config -> [Int] -> Int

The top-level usage statement is as follows:

$ ./nexus -h
Usage: ./nexus [OPTION...] COMMAND [ARG...]

Nexus Options:
 -h, --help            Print this help message
 -o, --output-file     Print to this file instead of STDOUT
 -f, --output-format   Output format [json|mpk|voidstar]

Exported commands (call with -h/--help for more info):
  foo  do foo stuff
  bar  do bar stuff

The dedicated usage information for foo can be accessed as well. Here we see that the record Config has been unrolled into a group of optional arguments:

$ ./nexus foo -h
Usage: ./nexus foo [OPTION...] INT_LIST

do foo stuff

Positional arguments:
  INT_LIST  list of integers
            type: [Int]

Group arguments:
  Config: config record
    --config Config
        Default values for this argument group
            tmpdir :: Str
            cache :: Bool
            nthreads :: Int
    --tmp Str
        temporary directory
        type: Str
        default: "/tmp"
    --cache
        cache the results
        default: false
    -t Int, --num-threads Int
        number of threads to use
        type: Int
        default: 1

Return: Int
  sum of INT_LIST

Since each subcommand is a function, the return type is always the same. Unlike in a conventional CLI program, the arguments cannot alter the return type.

The bar subcommand explicitly does not unroll the Config record:

$ ./nexus bar -h
Usage: ./nexus bar ARG1 ARG2

do bar stuff

Positional arguments:
  ARG1  config record
        type: NamRecord Config<>
              {
                  tmpdir :: Str
                  cache :: Bool
                  nthreads :: Int
              }
  ARG2  No description given
        type: [Int]

Return: Int
  summed values

7.3. Composing CLI Tools

Since modules can both be compiled into executable command line tools and imported by other modules, we can naturally compose command line tools.

Here is a little Morloc script that imports a Python program that prints a calendar to STDERR.

module cally (cal)

import root-py

source Py from "cal.py" ("cal")

--' Print a 3-month calendar and some timezones
cal :: () -> ()

Here is another Morloc tool that prints d20 rolls

module dnd (d20)

import root-py

source R from "dnd.R" ("d20")

--' Roll n d20 dice
d20 :: Int -> [Int]

Now we can import both of these into a third module which will expose the functions from both the calendar and dnd modules.

module toolbox (cal, d20)

import .cally
import .dnd

This final module can be compiled and will have a usage statement like so:

Usage: ./nexus [OPTION]... COMMAND [ARG]...

Nexus Options:
 -h, --help            Print this help message
 -o, --output-file     Print to this file instead of STDOUT
 -f, --output-format   Output format [json|mpk|voidstar]

Exported Commands:
  cal   Print a 3-month calendar and some timezones
          return: Unit
  d20   Roll n d20 dice
          param 1: Int
          return: [Int]

7.4. User arguments and outputs

User data is passed to Morloc executables as positional arguments to the specified function subcommand. An argument may be a literal JSON string or a filename. For files, the format may be JSON, MessagePack, or Morloc binary (VoidStar) format. The Morloc nexus first checks for a ".json" extension; if found, it attempts to parse the file as JSON. Next it checks for a ".mpk" or ".msgpack" extension; if found, it attempts to parse the file as MessagePack. If neither extension is found, it attempts to parse the file first as Morloc binary, then as MessagePack, and finally as JSON. See the parse_cli_data_argument function in morloc.h for details.
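The extension-based dispatch can be sketched as follows. This mirrors the prose above, not the actual parse_cli_data_argument implementation:

```python
# Sketch of the format-detection order described above.
def guess_format(filename):
    if filename.endswith(".json"):
        return "json"
    if filename.endswith((".mpk", ".msgpack")):
        return "messagepack"
    # no recognized extension: try VoidStar, then MessagePack, then JSON
    return "voidstar-then-messagepack-then-json"
```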

Passing literal JSON on the command line can be a little unintuitive since extra quoting may be required. Here are a few examples:

# The Bash shell removes the outer quotes, so double quoting is required
$ ./nexus foo '"this is a literal string"'

# Single quotes around lists are fine, but inner strings still need quoting
$ ./nexus bar '["asdf", "df"]'

# By default, output is written in JSON format
$ ./nexus baz 1 2 3 > baz.json

# The output can be directly read by a downstream morloc program
$ ./nexus bif baz.json

Data may be written to MessagePack or VoidStar via the -f argument:

$ ./nexus -f voidstar head '[["some","random"],["data"]]' > data.vs
$ ./nexus -f json head data.vs > data.json
$ ./nexus -f mpk reverse data.json > data.mpk
$ ./nexus reverse data.mpk
"some"

The VoidStar format is the richest and is the only form that contains the schema describing the data.

7.5. Search and install

The docstrings are used for discoverability as well. In this section I’ll cover how modules are installed as executables or standard modules and how they can be searched.

I’ll demonstrate this with a simple two module Morloc program describing a set of DnD operations. The first module defines general random operations:

fate.loc
module fate (roll, coinToss, choose)

import root-py

source Py from "fate.py"
  ( "roll" as roll
  , "coin_toss" as coinToss
  , "choose" as choose
  )

--' Roll n d-sided dice
roll ::
  --' Number of dice
  Int ->
  --' Number of pips per die
  Int ->
  --' Roll values
  {[Int]}

--' Randomly return True or False
coinToss :: {Bool}

--' Randomly choose one element from a non-empty list
choose a :: [a] -> {a}

The sourced fate.py script contains the following code:

fate.py
import random

def choose(xs):
    return random.choice(xs)

def roll(n, d):
    return [random.randint(1, d) for _ in range(n)]

def coin_toss():
    return bool(random.randint(0,1))

We can install fate with morloc install --build ./fate. This creates the Morloc module we will need later as well as an executable we can test.

We can test this, for example by rolling 3d8:

$ fate roll 3 8
[8,2,5]

Next let’s build on this foundation. First let’s make a simple tavern script that helps generate new characters.

tavern.loc
module tavern (randomClass, randomRace)

import root-py
import .fate (choose)

--' Select a random class
randomClass :: {Str}
randomClass = choose ["Fighter", "Wizard", "Rogue", "Cleric", "Ranger", "Bard"]

--' Select a random race
randomRace :: {Str}
randomRace = choose ["Human", "Elf", "Dwarf", "Halfling"]

Next let’s add a module for combat:

combat.loc
module combat (rollAdv, fighterDamage, threeHits, intro)

import root-py
import root-r
import .fate (roll, coinToss)

--' Roll a pair of d20 dice and keep the larger result
rollAdv :: {Int}
rollAdv = {fold max 0 !(roll 2 20)}

--' Damage done on hit, modifier + sum of dice rolls
damage ::
  --' Enemy Armor Class
  Int ->
  --' Attack modifier
  Int ->
  --' Attack dice
  {[Int]} ->
  --' Damage modifier
  Int ->
  --' Damage dice
  {[Int]} ->
  --' Total damage
  {Int}
damage ac atkMod atkDice dmgMod dmgDice = do
  let atk = atkMod + fold max 0 !atkDice
  let dmg = dmgMod + sum !dmgDice
  { ? atk >= ac = dmg : 0 }

--' Damage calculation for a fighter
fighterDamage ::
  --' Enemy Armor Class
  Int ->
  --' Fighter's damage
  {Int}
fighterDamage ac = damage ac 4 (roll 1 20) 2 (roll 2 8)

source R from "combat.R" ("intro")

--' Introduce a new battle!
intro ::
  --' Monster name
  --' literal: true
  Str ->
  --' DM's monster intro
  Str

We can build just the CLI tools if we wish:

$ morloc make combat.loc

This will build the executable combat which is a script that carries with it all the information needed for execution. You can copy it anywhere you like on the system and it will work (so long as the contents of the directory where it was made are preserved). If you want, you can move the executable to a location in your system PATH and then you can call it as a normal program from anywhere.

But Morloc offers a cleaner solution:

$ morloc make --install combat.loc

This command does several things.

First, it installs the combat executable to a standard path. The pool/ artifacts and any required files from the current working directory also need to be moved to a standard location. There are two ways to specify these required build files.

You can specify required files with --include arguments:

$ morloc make --install combat.loc --include fate.loc --include fate.py --include combat.R

Or you can create a package.yaml file and add an include field. The default file can be generated for you with morloc new. You can then modify the include field list with the required files:

name: combat
version: 0.1.0
homepage: null
synopsis: null
description: null
category: null
license: MIT
author: null
maintainer: null
github: null
bug-reports: null
dependencies: []
# Files to include when installing with `morloc make --install`
include: ["fate.*", "combat.R"]

Then run morloc make --install combat.loc.

In both install paths, the combat source code is copied to the ~/.local/share/morloc/exe/<modname> folder and the executable script itself is written to ~/.local/share/morloc/bin/.

We can view the installed executable:

$ morloc list -v combat
Programs:
  combat  4 commands
    rollAdv :: Int
    fighterDamage :: Int -> Int
    threeHits :: Int -> [Int]
    intro :: Str -> Str

If we add the Morloc bin folder above to PATH, then we can use this program naturally:

$ combat -h
Usage: morloc-nexus <manifest> [OPTION...] COMMAND [ARG...]

Nexus Options:
 -h, --help            Print this help message
 -o, --output-file     Print to this file instead of STDOUT
 -f, --output-format   Output format [json|mpk|voidstar]

Commands (call with -h/--help for more info):
  rollAdv        Roll a pair of d20 dice and keep the larger result
  fighterDamage  Damage calculation for a fighter
  threeHits      Three hit rolls against an enemy
  intro          Introduce a new battle!
$ combat fighterDamage 15
12
$ combat fighterDamage 15
8

We can also uninstall with morloc uninstall combat. This will cleanly remove the installed source and the installed executable script.

7.6. Building API interfaces

In addition to being CLI tools, compiled Morloc programs can run as long-lived daemons, accepting function calls over HTTP, TCP, or Unix sockets. A router aggregates multiple programs behind a single API.

No extra steps are needed to set up these APIs. They are already built into the executable we created in the last section. We only need to activate them.

7.6.1. HTTP protocol

We can start combat as a daemon on HTTP port 8080:

$ combat --daemon --http-port 8080 &
daemon: listening on http port 8080
daemon: ready

The trailing & runs the process in the background. This command launches all language pool processes (Python and R in this case) as child processes in separate process groups. A thread pool handles concurrent requests. If a pool crashes, the daemon detects the failure and restarts the pool automatically.

We can check the daemon’s health:

$ curl -s localhost:8080/health
{"status":"ok","result":[true]}

The /health endpoint returns the liveness status of each pool.

The running daemons are discoverable:

$ curl -s localhost:8080/discover | jq .
{
  "status": "ok",
  "result": {
    "name": "combat",
    "version": 1,
    "commands": [
      {
        "name": "rollAdv",
        "type": "remote",
        "return_type": "Int",
        "return_schema": "<int>i4",
        "args": [],
        "desc": "Roll a pair of d20 dice and keep the larger result"
      },
      {
        "name": "fighterDamage",
        "type": "remote",
        "return_type": "Int",
        "return_schema": "<int>i4",
        "args": [
          {
            "kind": "pos",
            "type": "Int",
            "schema": "<int>i4"
          }
        ],
        "desc": "Damage calculation for a fighter"
      },
      {
        "name": "threeHits",
        "type": "remote",
        "return_type": "[Int]",
        "return_schema": "<array><int>i4",
        "args": [
          {
            "kind": "pos",
            "type": "Int",
            "schema": "<int>i4"
          }
        ],
        "desc": "Three hit rolls against an enemy"
      },
      {
        "name": "intro",
        "type": "remote",
        "return_type": "Str",
        "return_schema": "<str>s",
        "args": [
          {
            "kind": "pos",
            "type": "Str",
            "schema": "<str>s"
          }
        ],
        "desc": "Introduce a new battle!"
      }
    ]
  }
}

Functions can be called over the port:

$ curl -s -X POST localhost:8080/call/rollAdv -d '[]'
{"status":"ok","result":18}

$ curl -s -X POST localhost:8080/call/fighterDamage -d '[15]'
{"status":"ok","result":12}

Bad commands will return sensible errors:

$ curl -s -X POST localhost:8080/call/fireball -d '[]'
{"status":"error","error":"Unknown command: fireball"}
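The endpoints above can also be called programmatically with nothing but the standard library. Here is a minimal sketch in Python; the base URL and command name are taken from the daemon examples above, while the helper names (build_call_request, call, health) are my own:

```python
import json
import urllib.request

BASE = "http://localhost:8080"  # assumed daemon address from the examples above

def build_call_request(command, args):
    """Build the POST request for /call/<command> with a JSON argument list."""
    return urllib.request.Request(
        BASE + "/call/" + command,
        data=json.dumps(args).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def call(command, args):
    """Call a daemon function and decode the JSON response."""
    with urllib.request.urlopen(build_call_request(command, args)) as r:
        return json.load(r)

def health():
    """Query the daemon's /health endpoint."""
    with urllib.request.urlopen(BASE + "/health") as r:
        return json.load(r)

# With the daemon from above running:
#   call("fighterDamage", [15])   # {"status": "ok", "result": ...}
#   health()                      # {"status": "ok", ...}
```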

7.6.2. TCP protocol

HTTP adds overhead per request: headers, text parsing, and the full HTTP framing around each message. When your client is a program rather than a browser or curl, you can skip all of that. The TCP protocol uses a compact binary framing — just a 4-byte big-endian length prefix followed by the JSON payload. This makes it well suited for service-to-service communication, high-throughput automated pipelines, or any context where you control both ends of the connection and want minimal overhead.

Start a daemon on TCP port 9001:

$ combat --daemon --port 9001 &
daemon: listening on tcp port 9001
daemon: ready

Unlike the HTTP protocol, you can’t use curl to talk to a TCP daemon. You need a client that speaks the length-prefixed binary framing. Here is a minimal Python client:

tcp_client.py — minimal TCP client
import socket, struct, json

def recv_exact(s, n):
    # socket.recv may return fewer bytes than requested; loop until n bytes arrive
    buf = b''
    while len(buf) < n:
        chunk = s.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed before full response")
        buf += chunk
    return buf

def call(host, port, method, command=None, args=None):
    msg = {"method": method}
    if command: msg["command"] = command
    if args is not None: msg["args"] = args

    payload = json.dumps(msg).encode()
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((host, port))
    # send 4-byte big-endian length, then the JSON payload
    s.sendall(struct.pack('>I', len(payload)) + payload)

    # read the 4-byte response length, then exactly that many bytes
    resp_len = struct.unpack('>I', recv_exact(s, 4))[0]
    resp = recv_exact(s, resp_len)
    s.close()
    return json.loads(resp)

print(call("localhost", 9001, "call", "rollAdv"))
# {"status": "ok", "result": 18}

print(call("localhost", 9001, "call", "fighterDamage", [15]))
# {"status": "ok", "result": 12}

print(call("localhost", 9001, "health"))
# {"status": "ok", "result": [true]}

print(call("localhost", 9001, "discover"))
# {"status": "ok", "result": {"name": "combat", "commands": [...]}}

The request is a JSON object with a method field ("call", "discover", or "health"), an optional command field naming the function, and an optional args array.

7.6.3. Unix socket protocol

For processes running on the same machine, Unix domain sockets are the fastest option. They bypass the entire network stack — no TCP handshake, no port allocation, no loopback routing. This is how Morloc pools communicate with the nexus internally.

To start a daemon on a Unix socket:

$ combat --daemon --socket /tmp/combat.sock &
daemon: listening on unix socket /tmp/combat.sock
daemon: ready

The wire protocol is identical to TCP: a 4-byte big-endian length prefix followed by the JSON payload. The only difference is the socket type.

unix_client.py — minimal socket client
import socket, struct, json

def recv_exact(s, n):
    # loop because socket.recv may return fewer bytes than requested
    buf = b''
    while len(buf) < n:
        chunk = s.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed before full response")
        buf += chunk
    return buf

def call(sock_path, method, command=None, args=None):
    msg = {"method": method}
    if command: msg["command"] = command
    if args is not None: msg["args"] = args

    payload = json.dumps(msg).encode()
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(sock_path)
    s.sendall(struct.pack('>I', len(payload)) + payload)

    resp_len = struct.unpack('>I', recv_exact(s, 4))[0]
    resp = recv_exact(s, resp_len)
    s.close()
    return json.loads(resp)

print(call("/tmp/combat.sock", "call", "rollAdv"))
# {"status": "ok", "result": 18}

print(call("/tmp/combat.sock", "call", "fighterDamage", [15]))
# {"status": "ok", "result": 12}

print(call("/tmp/combat.sock", "discover"))
# {"status": "ok", "result": {"name": "combat", "commands": [...]}}

7.6.4. Running all protocols at once

You don’t have to choose. One daemon can listen on all three protocols at the same time:

$ combat --daemon \
      --http-port 8080 \
      --port 9001 \
      --socket /tmp/combat.sock
daemon: listening on unix socket /tmp/combat.sock
daemon: listening on tcp port 9001
daemon: listening on http port 8080
daemon: ready

All three protocols hit the same daemon process and share the same pool processes. A request arriving over HTTP, TCP, or the Unix socket is dispatched identically — only the framing differs.

7.6.5. From single daemons to a router

Everything above shows a single program running as a daemon. This is enough when you have one service, but Morloc programs are designed to be composed. You might have a tavern program that picks character classes and races, and a combat program that resolves attacks and damage. Each is its own compiled Morloc program with its own pools.

You could start each one as an independent daemon on its own port and have your client keep track of which port maps to which program. But that gets tedious. The router solves this: it presents a single HTTP endpoint that discovers and manages all your installed Morloc programs.

The following diagram illustrates how a client request flows through the router to a program daemon and its language pools:

                  Client
                    |
                    | HTTP: POST /call/tavern/randomClass -d '[]'
                    v
             +--------------+
             |    Router    |  morloc-nexus --router --http-port 9090
             |  (HTTP:9090) |  Reads manifests from fdb/ at startup
             +--------------+
              /            \
    Unix socket            Unix socket
            /                \
  +-----------+         +-----------+
  |  tavern   |         |  combat   |
  |  daemon   |         |  daemon   |
  +-----------+         +-----------+
       |                 /        \
       v                v          v
    Python           Python        R
     pool             pool        pool

Each daemon is a child process of the router, started lazily on first request. The router and its daemons communicate over Unix sockets using the same length-prefixed JSON protocol described above.

7.6.6. Router mode

Setup

To make a program available to the router, install it with --install. This copies the program binary and writes a manifest file to the fdb/ directory where the router discovers programs at startup.

$ morloc make --install -o tavern tavern.loc
Installed 'tavern' to ~/.local/share/morloc/bin/tavern

$ morloc make --install -o combat combat.loc
Installed 'combat' to ~/.local/share/morloc/bin/combat

$ ls ~/.local/share/morloc/fdb/
combat.manifest  tavern.manifest

Starting the router

$ morloc-nexus --router --http-port 9090
router: listening on http port 9090
router: 2 programs registered
router:   - combat (4 commands)
router:   - tavern (2 commands)
router: ready

Listing programs

$ curl -s localhost:9090/programs | python3 -m json.tool
{
    "programs": [
        {
            "name": "combat",
            "running": false,
            "commands": [
                {"name": "rollAdv",        "type": "remote", "return_type": "Int"},
                {"name": "fighterDamage",  "type": "remote", "return_type": "Int"},
                {"name": "threeHits",      "type": "remote", "return_type": "[Int]"},
                {"name": "intro",          "type": "remote", "return_type": "Str"}
            ]
        },
        {
            "name": "tavern",
            "running": false,
            "commands": [
                {"name": "randomClass",  "type": "remote", "return_type": "Str"},
                {"name": "randomRace",   "type": "remote", "return_type": "Str"}
            ]
        }
    ]
}

Programs start lazily — running: false until the first call.

Per-program discovery

You can discover commands for a specific program without starting its daemon:

$ curl -s localhost:9090/discover/tavern | python3 -m json.tool
{
    "name": "tavern",
    "commands": [...]
}

Calling functions

Calls are routed by program name in the URL: /call/<program>/<command>.

$ curl -s -X POST localhost:9090/call/tavern/randomClass -d '[]'
{"status":"ok","result":"Rogue"}

$ curl -s -X POST localhost:9090/call/tavern/randomRace -d '[]'
{"status":"ok","result":"Elf"}

$ curl -s -X POST localhost:9090/call/combat/rollAdv -d '[]'
{"status":"ok","result":17}

$ curl -s -X POST localhost:9090/call/combat/fighterDamage -d '[15]'
{"status":"ok","result":12}

$ curl -s -X POST localhost:9090/call/combat/intro -d '["Goblin"]'
{"status":"ok","result":"A wild Goblin appears!"}

The first call to a program starts its daemon automatically. Subsequent calls reuse the running daemon with no startup cost. If a daemon crashes between calls, the router detects the failure and restarts it transparently.
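Routed calls follow the same stdlib-only pattern as the single-daemon HTTP client; only the URL shape changes to include the program name. A minimal sketch (port 9090 and the program/command names come from the examples above; the helper names are my own):

```python
import json
import urllib.request

ROUTER = "http://localhost:9090"  # assumed router address from the examples above

def router_call_url(program, command):
    """Routed calls embed the program name in the path: /call/<program>/<command>."""
    return f"{ROUTER}/call/{program}/{command}"

def router_call(program, command, args):
    """POST a JSON argument list to a routed command and decode the response."""
    req = urllib.request.Request(
        router_call_url(program, command),
        data=json.dumps(args).encode(),
        method="POST",
    )
    with urllib.request.urlopen(req) as r:
        return json.load(r)

# With the router running:
#   router_call("combat", "fighterDamage", [15])  # {"status": "ok", "result": ...}
```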

Error handling

$ curl -s -X POST localhost:9090/call/dungeon/explore -d '[]'
{"status":"error","error":"Unknown program: dungeon"}

Independent daemons vs router-managed daemons

A daemon started manually (e.g., combat --daemon --http-port 8080) is completely independent of the router. The router only knows about programs whose manifests are in the fdb/ directory, and it starts its own daemon instances as child processes. If you start a daemon on your own and also have the same program registered with the router, you will have two separate daemon processes — each with its own pool processes and its own state.

7.6.7. Shutdown

Send SIGTERM (or SIGINT) to stop a daemon or router gracefully. The daemon sends SIGTERM to each pool process group, waits briefly for clean exit, then sends SIGKILL to any stragglers. Unix socket files are removed.

$ kill $DAEMON_PID
daemon: shutting down

$ kill $ROUTER_PID
router: shutting down

When a router shuts down, it terminates all the daemons it started. There is currently no way to stop an individual program’s daemon through the router API — the router manages their lifecycles internally. If you need to restart a specific program, restart the router.

7.6.8. Summary

Mode     Flag              Description
-------  ----------------  -----------------------------------------------------------
Daemon   --daemon          Run one program as a persistent service
Router   --router          Aggregate all installed programs (fdb/) behind one API
HTTP     --http-port <n>   RESTful JSON API (curl-friendly)
TCP      --port <n>        Length-prefixed JSON over TCP
Socket   --socket <path>   Length-prefixed JSON over Unix socket
fdb      --fdb <path>      Override manifest directory (default: ~/.local/share/morloc/fdb)

8. Modules and Libraries

8.1. Importing modules

Every Morloc file is a module. A module declaration names the module and optionally lists the terms it exports:

module mylib (foo, bar)

This declares a module named mylib that exports foo and bar. Only exported terms are visible to other modules that import this one.

If a module exports everything it defines, you can use the wildcard form:

module mylib (*)

For submodules that exist only to be imported by a parent, you can omit the name entirely:

module (*)

An anonymous module’s name is inferred from its file path relative to the importing module. For example, if main.loc imports .utils, the compiler will resolve the module in utils/main.loc (or utils.loc) and assign it the name utils.

Morloc distinguishes between two kinds of imports: system modules and local modules.

System modules are installed packages that live in ~/.local/share/morloc/lib/. They are imported by name, without any prefix:

import root-py
import root-cpp

System modules are installed with morloc install:

$ morloc install root
$ morloc install root-py

Local modules are files or directories within your own project. They are imported with a dot (.) prefix to distinguish them from system modules:

import .utils (helper)
import .lib.math (square)

The dot prefix tells the compiler to look for the module relative to the directory of the importing file, not in the system library.

Both system and local imports support selective imports. Without a selector, all exported terms are brought into scope:

import root-py             -- import everything from root-py
import .mylib              -- import everything from local mylib
import .mylib (foo, bar)   -- import only foo and bar from local mylib

When you write import .foo, the compiler looks for the module relative to the directory containing the current file. It checks two locations, in order:

  1. A directory module: foo/main.loc

  2. A file module: foo.loc

Dot-separated paths map to nested directories. For example, import .lib.math resolves to either lib/math/main.loc or lib/math.loc.
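The lookup rule above can be sketched as a small path-resolution function. This illustrates the documented order (directory module first, then file module); it is not the compiler's actual code:

```python
from pathlib import Path

def resolve_local_import(importing_file, import_path):
    """Resolve a dot-prefixed import like '.lib.math' relative to the importing file.

    Checks, in order:
      1. a directory module: lib/math/main.loc
      2. a file module:      lib/math.loc
    Returns the first candidate that exists, or None.
    """
    base = Path(importing_file).parent
    parts = import_path.lstrip(".").split(".")  # ".lib.math" -> ["lib", "math"]
    candidates = [
        base.joinpath(*parts, "main.loc"),          # directory module
        base.joinpath(*parts).with_suffix(".loc"),  # file module
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```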

Here is an example project layout:

project/
  main.loc            -- module main, imports .utils and .lib.math
  utils.loc           -- module (*), a flat file module
  utils.py
  lib/
    math/
      main.loc        -- module (*), a directory module
      main.py

The top-level main.loc imports both:

module main (negate_square, square_negate)

type Py => Real = "float"

import .utils (negate)
import .lib.math (square)

negate_square :: Real -> Real
negate_square x = negate (square x)

square_negate :: Real -> Real
square_negate x = square (negate x)

The flat file utils.loc exports negate:

module (*)

source Py from "utils.py" ("negate")

type Py => Real = "float"

negate :: Real -> Real

And the directory module lib/math/main.loc exports square:

module (*)

source Py from "main.py" ("square")

type Py => Real = "float"

square :: Real -> Real

Local modules can also import other local modules. The path is always relative to the importing file. For example, if bar/baz/main.loc needs to import a sibling at bif/biz/, it writes:

import .bif.biz (mul)

This resolves relative to bar/baz/, looking for bar/baz/bif/biz/main.loc.

Since root is also the name of a system module, a local directory named root/ must be imported with the dot prefix to avoid ambiguity:

import root         -- imports the system "root" module
import .root        -- imports the local "root/" directory

The dot prefix always forces local resolution, so there is never a collision between local and system module names.

8.2. Installing modules

The default Morloc modules are hosted on GitHub under the morloclib organization. Modules can be installed with the morloc install command:

$ morloc install internal
$ morloc install root
$ morloc install root-cpp
$ morloc install root-py
$ morloc install root-r

Installed modules are stored in ~/.local/share/morloc/lib/ and can be imported in any Morloc script.

To view the modules that are currently installed, you can run morloc list. This will list all installed modules, their version, and their short descriptions. Adding the -v option additionally prints the types of all exported terms.

To view just the exports of one desired module, you can include a pattern that matches the module of interest:

$ morloc list -v il
Modules:
  internal
    pack a b :: a -> b
    unpack a b :: b -> a
    (.) a b c :: (b -> c) -> (a -> b) -> a -> c
    ($) :: (a -> b) -> a -> b

Here il matches any module with a name including the ordered characters i and l — only internal in this case.

8.3. The universal library

A module may export types, typeclasses, and function signatures but no implementations. Such a module would be completely language agnostic. A powerful approach to building libraries in the Morloc ecosystem is to write one module that defines all types, then $n$ modules for language-specific implementations that import the type module, and then one module to import and merge all implementations. This is the approach taken by the base module and by other core libraries.

In the future, when hundreds of languages are supported, and when possibly some functions may even have many implementations per language, it will be desirable to have finer control over what functions are used. One solution would be to add filters to the import statement. Thus the import expressions would be a sort of query. Alternatively, constraints could be added at the function level, and thus the entire Morloc script would be a query over the universal library. This would be especially powerful when imported types are expressed as unknowns to be inferred by usage.

8.4. Planes of libraries

Important The infrastructure for "planes" is not yet constructed, so the following is speculative

The concept of "planes" is central to the future organization of Morloc and is one of the primary reasons that I created it. A plane is like a namespace for a community’s modules—​but instead of organizing by category or programming language, modules in a plane share a common philosophy about quality, trustworthiness, software design and the review process.

Currently, the universe of functions is separated first by language and then by subject area. Morloc, being polyglot, allows the first mode of separation to be lifted, so language does not need to separate communities. Instead, communities can organize around values.

  • Levels of review & trust: Code may be wild and experimental; tightly reviewed and trusted in production; or formally verified.

  • Design philosophy: Groups may prioritize safety, raw performance, or elegance by some metric.

  • Use case: Planes may focus on production, pedagogy, competition or experimentation.

Making these differences explicit (and easy to navigate) lets the community set and find their own standards.

Real-World Analogs

Within the R community, you could define three planes:

  • CRAN: Has stringent requirements for acceptance and manual application process focused on adherence to well-defined (mostly automated) requirements

  • rOpenSci: Focuses on a formal peer review process that considers motivation, documentation, and good software design

  • GitHub: Wild west. Anything goes.

You could probably find more "planes" in R, but these three capture the idea of what a plane is. It is a design philosophy and set of protocols that define admission.

Possible examples of Morloc planes

  • unstable: For newly submitted or unvetted modules, e.g., loaded straight from GitHub.

  • safe: Modules that passed manual review, rigorous automated tests, and have strong test suites.

  • true: For formally verified code, strict on what languages are allowed (e.g., dependently typed languages).

  • prod: For modules ready for production, combining safety and performance.

  • comp: Modules suited for competitive programming; all performance, no safety checks or focus on software design principles.

  • red: Adversarial modules—​written to give the Morloc bot problems. Probably don’t want to import these.

  • weird: Esoteric code. For silly implementations that abuse languages in fun ways.

  • demo: Prototypes, examples, and proof-of-concept modules. More pedagogical than practical.

Planes aren’t rigid categories, but cultures: each has its own ground rules, review process, and ideas about what makes code "good". Anyone can propose a new plane, but we don’t want too many; a bit of consensus is required before adding one.

How Does a Module Join a Plane?

Again, the architecture is in development. But here is the basic process:

  1. Register: Authors register their module (e.g., import code from GitHub and authenticate).

  2. AI Vetting: Our AI (Weena) checks code for basic standards.

  3. Acceptance: After being accepted, the module defaults to the unstable plane.

  4. Level Up: Module authors can then apply to join other planes. Getting accepted depends on the plane’s review process (could be peer review, automated testing, thumbs up from community members, or nothing at all).

  5. Multiple Planes: Modules can exist in multiple planes at once—​different communities may trust the same code for different reasons.

This process will eventually be mediated on the website morloc.io (under construction).

Overall, planes help you find code that matches your needs and values—​whether you want ultimate safety, bleeding-edge performance, or just something weird that might surprise you. They also provide community and allow relations between different codebases to be specified.

9. Morloc versions and dependencies

9.1. Installing and managing morloc versions

The easiest way to run Morloc is through containers in a UNIX environment. Linux will work natively. MacOS and Windows are more complicated, and I’ll deal with their special cases later on. For Windows, you will need to install through the Windows Subsystem for Linux.

The only dependency is a container engine; currently two are supported: Docker and Podman.

Podman instructions

Unlike Docker, podman runs rootless by default, so no sudo is required. On Linux, it also runs natively with no daemons.

On MacOS and Windows (even through WSL), a virtual machine is required, so you will need to initialize and start the podman machine:

$ podman machine init
$ podman machine start

You can confirm that podman is running by entering

$ podman --version
podman version 5.4.1   # version on my current setup

Docker instructions

Docker requires either sudo access or rootless mode configuration.

Verify Docker is running:

$ docker --version
$ docker run hello-world

After confirming either Podman or Docker is running on your system, pull an installation script from morloc-project/morloc-manager:

$ curl -o morloc-manager https://raw.githubusercontent.com/morloc-project/morloc-manager/refs/heads/main/morloc-manager.sh

morloc-manager is a (mostly) POSIX-compatible shell script that is tested on Linux and MacOS. On Windows, you may run it in the Windows Subsystem for Linux (WSL). You can execute morloc-manager directly, for example bash morloc-manager, but it is more convenient to make the script executable and move it to a folder in your execution path.

The script provides usage information on request:

$ morloc-manager -h
$ morloc-manager install -h
$ morloc-manager uninstall -h

The install subcommand will build the Morloc home directory, install required containers, and generate four executable scripts.

The first script is menv. It runs arbitrary commands in a Morloc container. It can be used to compile and run Morloc programs. For example:

$ menv morloc make -o foo foo.loc
$ menv ./foo double 21

The menv script mounts the working directory and the ~/.local/share/morloc/<version> directory. It builds the program using the container's version of the Morloc compiler and the Morloc libraries defined in the version-specific home directory. It cannot see any dependencies on your local system.

The second script is morloc-shell. This script is the same as the above except that you enter the container in an interactive shell rather than running a single command. The container has installations of Python, R, and a C++ compiler. It also contains vim and other conveniences.

The third script is menv-dev. This runs commands in a dev container where all Haskell tools required for building Morloc from source are installed. To build Morloc from source, do the following:

$ git clone git@github.com:morloc-project/morloc
$ cd morloc
$ menv-dev stack install --fast
$ menv-dev stack test --fast

This will build the version that was cloned from GitHub. Now you can use menv-dev as you would menv.

The fourth script is morloc-shell-dev. This script lets you enter the dev shell.

The Morloc compiler always considers ~/.local/share/morloc to be the Morloc home directory. The containers mount separate folders from the local ~/.local/share/morloc/versions folder to this expected home directory in the container. This means that the different installed versions of Morloc, as well as the special local version used by the dev container, do not have overlapping libraries and modules.

Throughout the rest of this manual, whenever I show an example that uses Morloc, you can assume that I am running it from within a container after calling morloc-shell or morloc-shell-dev.

9.2. Handling dependencies

morloc-manager will install the Morloc compiler and the default language runtimes. This is sufficient to play with Morloc and run most of the demos. However, if you write any code that has additional dependencies, you will need to augment the environment. As of v0.5.0, morloc-manager supports specifying containers that build on the default containers.

Let’s say we want to create a new environment for scientific computing with Python:

$ morloc-manager env --init scipy

This will create a stub Dockerfile at ~/.local/share/morloc/deps/scipy.Dockerfile. It can be modified to specify whatever additional dependencies are needed. Then the environment can be built and selected as follows:

$ morloc-manager env scipy

All existing environments can be listed as well:

$ morloc-manager env --list

This feature is new and still experimental. Any feedback would be appreciated.

10. Build Architecture

10.1. Architecture Overview

A compiled Morloc program has two kinds of components: one nexus and one or more pools.

The nexus is a pre-compiled C binary that serves as the CLI entry point. It reads a JSON manifest describing the program’s structure, parses command-line arguments, and orchestrates execution. The nexus starts pool daemons, sends them call packets over Unix domain sockets, and prints the result. When done, it tears everything down.

Pools are language-specific daemons — one per language used in the program. A pool contains all functions from its language, compiled into a single unit. Pools listen on Unix sockets for call packets, dispatch to the appropriate function, and return results. All pools are multi-threaded (or multi-process), starting with one worker and growing dynamically as needed.

The nexus and pools use a shared memory region for passing data. In the common case, only 8-byte relative pointers travel over sockets — the actual data lives in shared memory visible to all processes. Pools can also call each other directly for cross-language ("foreign") calls, without routing through the nexus.

Diagram

Here are some runtime rules you should be able to count on. Any violations should be considered bugs.

  1. STDOUT and STDERR pass through. Any output written to stdout or stderr by user functions is never intercepted or buffered by the Morloc runtime. It passes directly to the terminal.

  2. Errors become tracebacks. All exceptions raised by user functions are caught by the pool and returned as error packets. As the error propagates back through foreign calls to the nexus, each layer appends context, building a full cross-language traceback that the user can read.

  3. Intra-pool calls are near-native. Calls between functions within the same pool go through a simple dispatch table — there is no serialization, no socket overhead, and no IPC. Performance should be nearly native.

  4. Inter-pool calls cost socket time plus marshalling. A call between pools (or between the nexus and a pool) pays only the few microseconds of Unix socket round-trip plus the cost of data marshalling. In the best case, the data in shared memory can be directly used between programs and marshalling cost is zero. In practice, copies are often needed — for example, Python demands ownership of its strings even when the data could in principle be shared directly.

10.2. Protocols

This section describes the binary formats used for communication between the nexus and pools: the manifest, the packet protocol, the shared memory layout, and the voidstar data format.

10.2.1. The manifest

The manifest is a JSON object embedded in the wrapper script (after a MANIFEST marker). It describes the program’s structure. Key fields:

Field       Description
---------   -----------------------------------------
version     Manifest format version (currently 1)
name        Program name
build_dir   Absolute path to the build directory
pools       Array of pool descriptors (see below)
commands    Array of exported commands (see below)

Each pool entry:

  • lang — Language name (e.g., "python3", "cpp")

  • exec — Command-line tokens to launch the pool (e.g., ["python3", "pools/pool.py"])

  • socket — Unix domain socket basename (e.g., "pipe-python3")

Each command entry:

  • name — CLI subcommand name

  • type — "remote" (dispatched to a pool) or "pure" (evaluated in the nexus)

  • mid — Manifold index identifying the function in the pool

  • pool — Index into the pools array

  • needed_pools — Indices of all pools that must be running

  • arg_schemas / return_schema — Schema strings describing argument and return types (see Schema strings)

  • args — CLI argument descriptors
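As a sketch of how a manifest might be read, the snippet below extracts the embedded JSON from a wrapper script. The exact marker syntax is an assumption here (we simply take the first JSON object after the literal text MANIFEST), and all field values in the miniature manifest are invented for illustration; only the field names come from the tables above.

```python
import json

def read_manifest(script_text: str) -> dict:
    # Sketch only: assume the manifest JSON begins at the first '{'
    # after the literal marker text "MANIFEST".
    _, _, tail = script_text.partition("MANIFEST")
    obj, _ = json.JSONDecoder().raw_decode(tail, tail.index("{"))
    return obj

# A miniature manifest with the fields described above (values invented):
script = """#!/bin/sh
# MANIFEST
{"version": 1,
 "name": "demo",
 "build_dir": "/tmp/demo",
 "pools": [{"lang": "python3",
            "exec": ["python3", "pools/pool.py"],
            "socket": "pipe-python3"}],
 "commands": [{"name": "run", "type": "remote", "mid": 0,
               "pool": 0, "needed_pools": [0]}]}
"""
m = read_manifest(script)
```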

10.2.2. Packet protocol

All communication uses a binary packet protocol over Unix domain sockets. Every packet starts with a 32-byte packed header:

Diagram
Table 2. Packet header fields
Field    Type      Width  Description
magic    uint32_t  4      Constant 0x0707f86d (little-endian)
plain    uint16_t  2      Plain membership (reserved, always 0)
version  uint16_t  2      Format version (currently 0)
flavor   uint16_t  2      Metadata convention (reserved)
mode     uint16_t  2      Evaluation mode (reserved)
command  union     8      Type-specific command data (see below)
offset   uint32_t  4      Bytes of metadata between header and payload
length   uint64_t  8      Payload length in bytes

Total packet size is always 32 + offset + length.
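The header layout maps directly onto Python's struct module. The sketch below assumes fields are packed in table order with little-endian byte order and no padding; the runtime's own C definitions are authoritative.

```python
import struct

# 32-byte packet header: magic, plain, version, flavor, mode,
# command (8 raw bytes), offset, length -- packed little-endian.
HEADER = struct.Struct("<IHHHH8sIQ")

MAGIC = 0x0707F86D

def pack_header(command: bytes, offset: int, length: int) -> bytes:
    # Reserved fields (plain, version, flavor, mode) are all zero.
    return HEADER.pack(MAGIC, 0, 0, 0, 0, command, offset, length)

hdr = pack_header(b"\x00" * 8, offset=0, length=42)
magic, *_, off, ln = HEADER.unpack(hdr)
```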

Packet types

The command field’s first byte is a type tag:

Data packet (0x00) — Carries data or error messages:

Field        Type     Width  Description
type         uint8_t  1      0x00
source       uint8_t  1      0x00=MESG (inline), 0x01=FILE (path), 0x02=RPTR (shared memory pointer)
format       uint8_t  1      0x00=JSON, 0x01=MSGPACK, 0x02=TEXT, 0x03=DATA, 0x04=VOIDSTAR
compression  uint8_t  1      Reserved, always 0
encryption   uint8_t  1      Reserved, always 0
status       uint8_t  1      0x00=PASS, 0x01=FAIL
padding      uint8_t  2      Zero

The most common combination is source=RPTR, format=VOIDSTAR — the data lives in shared memory and only an 8-byte relative pointer travels over the socket.

When status=FAIL, the packet carries a UTF-8 error message (source=MESG, format=TEXT).
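The 8-byte command union for a data packet can be assembled from the fields above. This is a sketch assuming the bytes appear in table order; the constant names are local to this example.

```python
import struct

# Command union for a data packet (type 0x00), common RPTR/VOIDSTAR case:
# type, source, format, compression, encryption, status, then 2 padding bytes.
PACKET_TYPE_DATA = 0x00
SRC_RPTR, FMT_VOIDSTAR, STATUS_PASS = 0x02, 0x04, 0x00

def data_command(source: int, fmt: int, status: int) -> bytes:
    return struct.pack("<6B2x", PACKET_TYPE_DATA, source, fmt, 0, 0, status)

cmd = data_command(SRC_RPTR, FMT_VOIDSTAR, STATUS_PASS)
```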

Call packet (0x01) — Instructs a pool to execute a function:

Field       Type      Width  Description
type        uint8_t   1      0x01
entrypoint  uint8_t   1      0x00=LOCAL, 0x01=REMOTE_SFS
padding     uint8_t   2      Zero
midx        uint32_t  4      Manifold index (which function to call)

The payload is a contiguous sequence of data packets, one per argument.
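The call packet's command union follows the same pattern: one byte each for type and entrypoint, two padding bytes, then the 32-bit manifold index. Again, byte order and packing are assumptions of this sketch.

```python
import struct

# Command union for a call packet (type 0x01) with a LOCAL entrypoint:
# type, entrypoint, 2 padding bytes, then the little-endian manifold index.
ENTRY_LOCAL = 0x00

def call_command(midx: int) -> bytes:
    return struct.pack("<BB2xI", 0x01, ENTRY_LOCAL, midx)
```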

Ping packet (0x02) — Header-only, no payload. The nexus pings pools to check readiness; the pool echoes it back as a pong.

Metadata blocks

Between the header and payload (in the offset region), packets can carry metadata blocks. Each has an 8-byte header:

Field  Type      Width  Description
magic  char[3]   3      Constant "mmh"
type   uint8_t   1      0x01=SCHEMA_STRING, 0x02=XXHASH
size   uint32_t  4      Payload size in bytes
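A metadata block is then the 8-byte header followed by its payload. The little-endian size field is an assumption of this sketch.

```python
import struct

# Build a metadata block: "mmh" magic, type tag, payload size, payload.
META_SCHEMA, META_XXHASH = 0x01, 0x02

def meta_block(block_type: int, payload: bytes) -> bytes:
    return struct.pack("<3sBI", b"mmh", block_type, len(payload)) + payload

schema = b"at2bai1"
block = meta_block(META_SCHEMA, schema)
```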

10.2.3. Shared memory

Pools share data through POSIX shared memory segments rather than copying over sockets. Only relative pointers (8 bytes) travel over the wire.

Volumes

Shared memory is organized as multiple volumes (/dev/shm/morloc-<hash>_0, morloc-<hash>_1, etc.). The nexus creates the first volume (64 KB). New volumes are created automatically when space runs out (up to 32 volumes). If /dev/shm is too small (common in Docker), volumes fall back to files in the temporary directory.

Pointer types
Type                Description
absptr_t (void*)    Virtual address in the current process. Different per process.
volptr_t (ssize_t)  Offset within a single volume (0 = first byte after the header).
relptr_t (ssize_t)  Global offset across all volumes. This is the pointer type shared between processes — it appears in data packets and in voidstar data structures.

             volume 0 (size=20)        volume 1
         ---xxxxxx........----xxxxxx............---->
 relptr           0      7          8         19
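The relationship between relptr_t, volptr_t, and each volume's relative_offset can be expressed as a small conversion function. This is a sketch of the arithmetic only, using per-volume usable sizes.

```python
# Convert a global relative pointer into (volume index, volume offset),
# given each volume's usable data size.
def rel_to_vol(relptr: int, volume_sizes: list) -> tuple:
    relative_offset = 0  # sum of all prior volumes' sizes
    for index, size in enumerate(volume_sizes):
        if relptr < relative_offset + size:
            return index, relptr - relative_offset
        relative_offset += size
    raise ValueError("relative pointer beyond the last volume")
```

For example, with two volumes of 8 and 12 usable bytes, global offset 8 is the first byte of volume 1.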
Volume header (shm_t)
Field            Type              Description
magic            unsigned int      Constant 0xFECA0DF0
volume_name      char[256]         Volume identifier
volume_index     int               Index in the pool (0, 1, 2, …)
volume_size      size_t            Usable data capacity (excludes header)
relative_offset  size_t            Sum of all prior volumes' sizes
rwlock           pthread_rwlock_t  Process-shared read-write lock
cursor           volptr_t          Current free block (allocator hint)

Block header (block_header_t, packed)
Field            Type                 Description
magic            unsigned int         Constant 0x0CB10DF0
reference_count  atomic unsigned int  Active references (0 = free)
size             size_t               Payload size in bytes (excludes header)

Blocks use reference counting. shmalloc allocates with first-fit and lazy coalescing. shfree decrements the reference count; blocks are merged during the next allocation scan.
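The allocation scheme can be illustrated with a toy model: first-fit over a list of blocks, with free neighbors merged lazily during the scan. This sketch ignores block header overhead and real addresses, so it shows the algorithm, not the actual C implementation.

```python
# Toy model of shmalloc/shfree: first-fit with lazy coalescing.
class Block:
    def __init__(self, size, refcount=0):
        self.size = size          # payload capacity
        self.refcount = refcount  # 0 means free

def shmalloc(blocks, n):
    i = 0
    while i < len(blocks):
        b = blocks[i]
        if b.refcount == 0:
            # Lazy coalescing: merge trailing free neighbors during the scan.
            while i + 1 < len(blocks) and blocks[i + 1].refcount == 0:
                b.size += blocks.pop(i + 1).size
            if b.size >= n:
                if b.size > n:                       # split off the remainder
                    blocks.insert(i + 1, Block(b.size - n))
                    b.size = n
                b.refcount = 1
                return i                             # index of allocated block
        i += 1
    return None  # no room: the real allocator would open a new volume

def shfree(blocks, i):
    blocks[i].refcount -= 1  # block becomes free once the count hits zero
```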

10.2.4. Schema strings

Schema strings are a compact encoding of a data type’s binary layout. They appear in the manifest and in packet metadata.

Primitives:

Schema       Type
z            nil (1 byte)
b            bool (1 byte)
i1/i2/i4/i8  signed int (1/2/4/8 bytes)
u1/u2/u4/u8  unsigned int (1/2/4/8 bytes)
f4/f8        float (4/8 bytes)
s            variable-length UTF-8 string

Compounds:

Pattern       Description
a<elem>       Array. ai4 = array of int32.
t<N><elems>   Tuple. t2i4f8 = (int32, float64).
m<N><fields>  Record with length-prefixed keys. m2<3>agei4<4>names = {age: int32, name: string}.
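The grammar above is simple enough to parse recursively. The sketch below handles primitives, arrays, and tuples, assuming single-digit tuple arities for simplicity and omitting records; it returns nested tuples describing the type.

```python
# Minimal recursive-descent parser for schema strings (records omitted).
def parse_schema(s: str, i: int = 0):
    c = s[i]
    if c in "zbs":                      # nil, bool, string
        return c, i + 1
    if c in "iuf":                      # sized numbers: i4, u8, f8, ...
        return c + s[i + 1], i + 2
    if c == "a":                        # array of one element type
        elem, j = parse_schema(s, i + 1)
        return ("array", elem), j
    if c == "t":                        # tuple of N element types
        n, j = int(s[i + 1]), i + 2
        elems = []
        for _ in range(n):
            e, j = parse_schema(s, j)
            elems.append(e)
        return ("tuple", elems), j
    raise ValueError(f"unknown schema code {c!r}")
```

For example, parsing at2bai1 yields an array of (bool, array of int8) tuples.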

10.2.5. Voidstar binary format

Every Morloc general type maps unambiguously to a binary form that consists of several fixed-width literal types, a list container, and a tuple container. The literal types include a unit type, a boolean, signed integers (8, 16, 32, and 64 bit), unsigned integers (8, 16, 32, and 64 bit), and IEEE floats (32 and 64 bit). The list container is represented by a 64-bit size integer and a pointer to an unboxed vector. The tuple is represented as a set of values in contiguous memory. These basic types are listed below:

Table 3. Morloc primitives
Type     Domain                Schema  Width (bytes)
Unit     ()                    z       1
Bool     True | False          b       1
UInt8    \([0,2^{8})\)         u1      1
UInt16   \([0,2^{16})\)        u2      2
UInt32   \([0,2^{32})\)        u4      4
UInt64   \([0,2^{64})\)        u8      8
Int8     \([-2^{7},2^{7})\)    i1      1
Int16    \([-2^{15},2^{15})\)  i2      2
Int32    \([-2^{31},2^{31})\)  i4      4
Int64    \([-2^{63},2^{63})\)  i8      8
Float32  IEEE float            f4      4
Float64  IEEE double           f8      8

List x  homogeneous lists  a{x}  \(16 + n \Vert x \Vert\)
Tuple2 x1 x2  2-ples  t2{x1}{x2}  \(\Vert x_1 \Vert + \Vert x_2 \Vert\)
TupleX \(t_1\ ...\ t_k\)  k-ples  \(tkt_1\ ...\ t_k\)  \(\sum_i^k \Vert t_i \Vert\)
\(\{ f_1 :: t_1,\ ...\ , f_k :: t_k \}\)  records  \(mk \Vert f_1 \Vert f_1 t_1\ ...\ \Vert f_k \Vert f_k t_k\)  \(\sum_i^k \Vert t_i \Vert\)

Any Morloc type may be written as a schema string, which is used internally to direct conversions between Morloc binary and native types. The schema values are shown in the table above. For example, the type [(Bool, [Int8])] would have the schema at2bai1. You will not usually have to worry about these schemas, since they are mostly used internally. They are worth knowing, though, since they appear in low-level tests, generated source code, and binary data packets.

Here is an example of how the type ([UInt8], Bool), with the value ([3,4,5],True), might be laid out in memory:

---
03 00 00 00 00 00 00 00  -- first tuple element: 64-bit list length (little-endian)
30 00 00 00 00 00 00 00  -- first tuple element: relative pointer to the list data
01 00 00 00 00 00 00 00  -- second tuple element: Bool True, 0-padded
03 04 05                 -- 8-bit values of 3, 4, and 5
---
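The layout above can be reproduced byte for byte with Python's struct module. The list pointer value (0x30) is copied from the example; in real voidstar data it would be a relative pointer into shared memory.

```python
import struct

# Build the example layout for ([3,4,5], True) as ([UInt8], Bool).
list_len = struct.pack("<q", 3)          # 64-bit list length
list_ptr = struct.pack("<q", 0x30)       # 64-bit relative pointer (from the example)
flag     = struct.pack("<q", 1)          # Bool True, 0-padded to 8 bytes
payload  = bytes([3, 4, 5])              # unboxed UInt8 vector
blob = list_len + list_ptr + flag + payload
```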

Records and tables are represented as tuples; the names of the fields are stored only in the type schemas. Morloc also supports tables, which are records where the field types correspond to the column types and the fields are all equal-length lists. Records and tables may be defined as shown below:

A record is a named, heterogeneous collection, like a struct in C, a dict in Python, or a list in R. The type of a record exactly describes the data stored in it (in contrast to parameterized types like [a] or Map a b). Records are represented in Morloc binary as tuples; the keys are stored only in the schemas.

A table is like a record whose field types represent the column types. But a table is not just syntactic sugar for a record of lists: the table annotation is passed with the record through the compiler all the way to the translator, where the language-specific serialization functions may give tables special handling.

record Person = Person { name :: Str, age :: UInt8 }
table People = People { name :: Str, age :: Int }

alice = { name = "Alice", age = 27 }
students = { name = ["Alice", "Bob"], age = [27, 25] }

The Morloc type signatures can be translated to schema strings that may be parsed by a foundational Morloc C library into a type structure. Every supported language in the Morloc ecosystem must provide a library that wraps this Morloc C library and translates to/from Morloc binary given the Morloc type schema.

11. Q&A

11.1. I’m not a human, do I still need Morloc?

Yes. You are still a programmer, and writing endless glue code to stitch things together is slow and brittle. Abstraction is universal. With Morloc you can build with confidence using simple components with highly efficient interfaces. You can access a library of functions and freely compose them to build infinite new tools. Morloc is a way to define, access, and explore customizable and efficient toolsets.

11.2. I only use one language, is Morloc still useful?

Yes, Morloc remains useful even if you only use one programming language.

While Morloc is designed to allow polyglot development, its core benefits also apply to single-language projects. In the Morloc ecosystem, you may continue working in your preferred language, but the focus shifts from writing standalone applications to writing libraries.

Morloc lets you compose these functions and automatically generate applications from them, offering several advantages:

  • Broader usability: Your functions can be easily reused and accessed by other language communities.

  • Improved testing and benchmarking: Functions can be integrated into language-agnostic testing and benchmarking frameworks.

  • Future-proofing: If you ever need to migrate to a new language, Morloc’s type annotations and documentation carry over—only the implementation needs to change. And if you want to leave the Morloc ecosystem, your implementation does not need to change.

  • Better workflows: Especially in fields like bioinformatics, Morloc shifts workflows from chaining applications and files to composing typed functions and native data structures, making pipelines more robust and easier to validate.

  • No more format parsing: Morloc data structures replace bespoke file formats and offer efficient serialization.

While language interop is a major feature of Morloc, it is not its main purpose. The very first version of Morloc was not polyglot at all. The original focus was simply a composition language that separated pure code from associated effects, conditions, caching, and so on.

The primary goal of Morloc is to support the development of composable, typed universal libraries. Support for many languages is required for this goal, since no one language is best for all cases. Most Morloc users would continue to program in their favorite language, but gain the ability to compose, share, and extend functionality more easily.

11.3. Is this just a bioinformatics workflow language?

No. The Morloc paper focuses on bioinformatics applications. As discussed at length in the paper, Morloc addresses systematic flaws in the traditional approaches to building bioinformatics workflows. Given the need, and also given my personal background, bioinformatics is a good place to start. However, Morloc can be applied more broadly to any problem that can be framed functionally.

11.4. Do you really want to deprecate all the bioinformatics formats?

Yes, with the possible exception of specialized binary formats that may offer performance benefits.

For human-readable semi-structured formats, I think only two are necessary: a tabular format (e.g., CSV) and a tree format (e.g., JSON).

11.5. Do you really want to deprecate all the bioinformatics applications?

Yes, with the exception of interactive graphical applications.

11.6. Do you want to deprecate the conventional workflow languages?

Yes. They do offer good scaling support that Morloc cannot yet match (but that is only a few prompts away). Some workflow languages also support GUIs, which offer an intuitive and valuable way to visualize and create workflows from coarse components.

Hybrid solutions are possible. Conventional workflow languages can wrap Morloc compiled applications and pass Morloc generated data in place of bespoke bioinformatics formats.

11.7. Does Morloc allow function-specific containerized environments?

No, unlike workflow managers such as Snakemake and Nextflow, Morloc does not offer function-specific environments. This is a deliberate design choice.

Dependency resolution is a hard and heavily researched problem. The general goal of dependency solvers is to find one set of dependencies that satisfies the entire program. The bioinformatics community often gives up on finding unified environments and instead runs each function in its independent environment. With every function running in its own container, all dependency issues are encapsulated and all functions may be executed from one manager. But this comes at a heavy cost. Each application must be wrapped in a script, the script must be executed via an expensive system call into the container, and data must be serialized and sent to the container. This approach is reasonable for workflows with a small number of heavy components. But from a programming language perspective, wrapping every function call in its own environment is inefficient and opaque.

Morloc is designed not to hide problems in boxes, but to solve the root problem. Conventional workflow languages attempt to simplify workflow design by layering frameworks over the functions. The Morloc approach is the exact opposite. First delete everything unnecessary from all applications and lift their light algorithmic cores into clean, well-typed libraries. Then build upwards through composition of these pure functions, and judicious use of impure ones, to create efficient, reliable, and composable tools.

Now, if you really do need to run something in a container, you can just write a function that wraps a call to a container and then use it as you would any other function. You could even write a wrapper function that takes a record with all the metadata needed for a conda environment and executes its function within that environment. This can be done through libraries, so there is no need to hardcode the pattern into the Morloc language itself.

The reproducibility of Morloc workflows may be ensured by running the entire Morloc program in an environment or container, with a single set of dependencies. The specific Morloc compiler version can be specified and modules may be imported using their git hashes. This is done in the current Morloc examples (see the Dockerfile in the workflow-comparisons folder of https://github.com/morloc-project/examples).

11.8. What about object-oriented programming?

An "object" is a somewhat loaded term in the programming world. As far as Morloc is concerned, an object is a thing that contains data and possibly other unknown stuff, such as hidden fields and methods. All Morloc types must have forms that are transferable between languages. Methods do not easily transfer; at least, they cannot be written to Morloc binary. However, it is possible to convey class-like APIs through typeclasses. Hidden fields are more challenging since, by design, they are not accessible. So objects cannot generally be directly represented in the Morloc ecosystem.

Objects that have a clear "plain old data" representation can be handled by Morloc. These objects, and their component types, must have no vital hidden data, no vital state, and no required methods. Examples of these are all the basic Python types (int, float, list, dict, etc) and many C++ types such as the standard vector and tuple types. When these objects are passed between languages, they are reduced to their pure data.

11.9. Is Morloc still relevant when AI can program and translate?

Maybe. Morloc may serve as a system for functional composition, verification, and automation even when most functions are generated by machines.

I’ll lay out an argument for this below, starting with a few propositions:

  1. Adversaries exist. AIs may themselves be adversarial, or there might be adversarial code in the ecosystem around the AIs (for example, prompt injection). Humans can’t trust humans, humans can’t trust AIs, AIs can’t trust humans, and AIs can’t trust AIs. Depending on their architecture, AIs may not even be able to trust their own memories.

  2. Stupid is fast. Narrow intelligence outperforms general intelligence for narrow problems. A vast AGI system with deep understanding of physics and Shakespeare will not be the fastest tool for sorting a list of integers. There will always be a need for programs across the intelligence spectrum — from classical functions, to statistical models, to general intelligences.

  3. Creating functions is expensive. Designing high-performance algorithms is not trivial. Even simple functions, like sorting algorithms, require deep thought to optimize for a given use case. But there is a further combinatorial explosion of more complex physical simulations, graphics engines, and statistical algorithms. While simple functions might be created in seconds, others may take years of CPU time to optimize.

  4. Reproducibility is important. Future AIs may serve as nearly perfect oracles, but they are complex entities that will likely evolve over time, as persons do. So they will likely not give equivalent answers from day to day. It is valuable to be able to crystallize a thought process into something that behaves the same every time it is invoked on a given input. So again, functions are important.

  5. Correctness is important. If functions are being composed by AIs to create new programs, any function that does not behave in the way the AI expects can cause cascading errors. It doesn’t matter how intelligent the AI is, if it is building programs from functions that it cannot verify, then the programs may not be safe.

A few things follow from these propositions.

First, AI will benefit from writing functions. Even in a world with no humans, they will need functions for efficiently solving narrow problems. They will likely generate libraries of billions of specialized functions. Some may be classical functions and others may be small statistical models. By caching these functions, compute time can be saved. Rather than generating entire programs from first principles, they can build them logically through composition of prior functions. The same forms of abstractions that help humans reason will also be of value to AIs. Yes, they have far larger working memories than we do, but that does not change the fact that abstraction and composition reduce the costs of re-derivation.

Time can also be saved if different AIs share functions they have written (both with each other and with humans). Since adversaries exist, shared functions must be verified. But verification is hard, especially if a godlike super-intelligence were trying to hide adversarial features in the binary. The problem can be simplified by using a controlled language that can be formally verified by a trusted classical computer program — a compiler. So rather than sharing functions as binaries, it would make sense to share them in strict controlled languages. For this reason, I believe that something resembling current programming languages will exist far into the future. Their main purpose will be as easily verifiable, human-readable specifications that can be compiled into high-performance code.

So in this imagined future, there are billions of functions in databases that are written in verifiable languages readable by humans, classical machines, and AIs. But what language is used? Maybe the AIs can converge on one standard. But even for AIs, and perhaps especially for them, I don’t think a single language is optimal. Rather, just as in human mathematics, there will likely be many languages for many domains. Languages make trade-offs. In general, the more complex a language is, the more difficult it is to parse, verify, and optimize. So even if we ignore human factors, multilingual ecosystems are still likely to appear. Adding in human factors, we are again likely to see a spectrum of languages that accept different trade-offs in rigor, ease of use, and domain specificity.

I predict a future where humans and AIs use libraries of functions written in specialized languages. All the functions need to be easily verifiable by an outside actor, and verified functions need to be composed into more complex programs using a well-verified composer. Since we don’t trust any agent to verify, we need a classical program. Morloc is a potential candidate for this role. It would serve as a classical composition tool, function verification ecosystem, automation engine, and conceptual framework for organizing and using billions of mostly machine-generated functions.

Of course, the future is impossible to predict, especially where AI is concerned. It is possible that AIs will converge on a single universal representation for computation. It is possible that the need for human readability and curation may disappear. It is possible that classical computer functions could be entirely replaced by discrete mathematical constructs that are composable and machine verifiable but entirely incomprehensible to humans.

11.10. Why is it named after Morlocks, weren’t they, like, bad?

While the Morlocks of Wellian fame are best known for their culinary preferences, I think Wells misrepresented them. And even if he didn’t, we don’t treat our own Eloi any better. Meat choices aside, the Morlocks worked below to maintain the machines that simplified life above. That’s why the Morloc language adapts their name.

11.11. Wait! I have more questions!

Great! Look me up on Discord (link below) and we can chat.

12. Contact

This is a young project and any brave early users are highly valued. Feel free to contact me for any reason!