\[ \def\ea{\widehat{\alpha}} \def\eb{\widehat{\beta}} \def\eg{\widehat{\gamma}} \def\sep{ \quad\quad} \newcommand{\mark}[1]{\blacktriangleright_{#1}} \newcommand{\expr}[3]{#1\ \ \vdash\ #2\ \dashv\ \ #3} \newcommand{\packto}[2]{#1\ \approx >\ #2} \newcommand{\apply}[3]{#1 \bullet #2\ \Rightarrow {\kern -1em} \Rightarrow\ #3} \newcommand{\subtype}[2]{#1\ :\leqq\ #2} \newcommand{\braced}[1]{\lbrace #1 \rbrace} \]

1. Intro

Morloc is a strongly-typed functional programming language where functions are imported from foreign languages and unified under a common type system. This language is designed to serve as the foundation for a universal library of functions. Each function in the library has one general type and zero or more implementations. An implementation may be either a function sourced from a foreign language or a composition of such functions. All interop code is generated by the Morloc compiler.

2. Why Morloc?

2.1. Compose functions across languages under a common type system

Morloc allows functions from polyglot libraries to be composed in a simple functional language. The focus isn’t on classic interoperability (e.g., calling Python from C) or serialization (e.g., sending data between applications via protobufs), though Morloc implementations may use these under the hood. Instead, you define types, import implementations, and build complex programs through function composition. The compiler invisibly generates any required interop code.

2.2. Write in your favorite language, share with everyone

Do you want to write in language X but have to write in language Y because everyone in your team does or because your expected users do? Love C for algorithms, R for statistics, but don’t want to write full apps in either? Morloc lets you mix and match, so you can use each language where it shines, with no bindings or boilerplate.

2.3. Run benchmarks and tests across languages

Tired of learning new benchmark and testing suites across all your languages? Is it hard to benchmark similar tools wrapped in applications with varying input formats, input validation costs, or startup overhead? In Morloc, functions with the same general type signature can be swapped in and out for benchmarking and testing. The same test suites and test cases will work across all supported languages because the inputs and outputs of all functions of the same type share equivalent Morloc binary forms, making validation and comparison easy.

2.4. Design universal libraries

With Morloc, we can build abstract libraries using the general types as a logical framework. Then we can import implementations of these functions from one or more of the supported languages and easily test and benchmark them. These libraries are the foundation for an ecosystem where functions may be verified, organized/searched by type, and used to build rigorous programs.

2.5. Make composable and deployable tools

A Morloc module can be compiled directly into a CLI tool with rich subcommands and automatically generated usage statements. These CLI tools can be composed with just a few lines to make custom toolboxes. They can also be compiled as daemons serving over UNIX sockets, TCP or HTTP.

2.6. Make better scientific workflows

Within the scientific programming space, Morloc can serve as a replacement for the brittle application/file paradigm of workflow design. Replace heavy CLI applications with pure function libraries, ad hoc textual file formats with explicit data structures, and workflow specifications with function compositions. See the first Morloc paper for details (here).

3. Getting Started

3.1. Installing Morloc

The easiest way to run Morloc is through containers in a UNIX environment. Linux will work natively. macOS and Windows are more complicated and I’ll deal with their special cases later on. For Windows, you will need to install through the Windows Subsystem for Linux.

3.1.1. Installing morloc-manager

The morloc-manager utility streamlines the management of Morloc environments. The binaries can be downloaded directly from GitHub (here), or you can use the script below to install the latest binary:

$ sys="linux-x86_64"  # or "linux-arm64" / "macos"
$ mim_url=$(curl -s https://api.github.com/repos/morloc-project/morloc/releases/latest | grep browser_download_url | grep morloc-manager-${sys} | grep -o 'https[^"]*')
$ curl -Lo morloc-manager "$mim_url"
$ chmod +x morloc-manager
$ mv morloc-manager ~/.local/bin/

On macOS you may need to clear the quarantine attribute:

$ xattr -d com.apple.quarantine morloc-manager

morloc-manager relies on containers, so you will also need a container engine. Two are supported: Podman and Docker. I recommend Podman for rootless local work. If you have both engines installed, you will need to tell morloc-manager which you are using with the command:

$ morloc-manager setup --engine podman
Engine set to: podman
Podman instructions

Unlike Docker, podman runs rootless by default, so no sudo is required. On Linux, it also runs natively with no daemons.

On macOS and Windows (even through WSL), a virtual machine is required, so you will need to initialize podman like so:

$ podman machine init
$ podman machine start

You can confirm that podman is installed by entering:

$ podman --version
podman version 5.4.1   # version on my current setup

The morloc-manager utility usage information can be accessed with the -h option:

$ morloc-manager -h
morloc-manager - container lifecycle manager for Morloc

Usage: morloc-manager [OPTIONS] [COMMAND]

Development
  setup      Configure the default container engine
  new        Build a new morloc environment
  run        Run a command in the active environment
  rm         Remove a morloc environment
  ls         List morloc environments
  info       Show configuration and installed environments
  select     Select an environment
  update     Rebuild an environment
  nuke       Remove all morloc environments

Deployment
  start      Serve an environment over the network
  stop       Stop a running serve container
  logs       Stream logs from a running serve container
  freeze     Export installed state as a frozen artifact
  unfreeze   Build a portable serve image from frozen state
  status     List running serve containers
  doctor     Check environment health and diagnose issues

Options
  -v, --verbose  Print container commands to stderr before executing
      --json     Output machine-readable JSON instead of human-readable text
      --version  Print version and exit
  -h, --help     Print help (see more with '--help')

To get started with Morloc, we will only need subcommands from the Development section of the usage statement above.

3.1.2. Creating environments

An environment is a named, self-contained Morloc installation: a container image, a data directory for modules and binaries, and optionally a custom Dockerfile layer with extra dependencies.

We can create a new environment and name it base with the command below:

$ morloc-manager new --non-interactive base
Pulling ghcr.io/morloc-project/morloc/morloc-full:latest...
Created environment: base
Initializing morloc (this may take several minutes)...
Environment 'base' is ready.
Activate it with: morloc-manager select base

By default, the new subcommand pulls the latest Morloc release. There are many additional options, but this default environment will be sufficient for running nearly everything in the Morloc docs.

We can check that the new environment is available with ls. You might see something like this (your environments will differ from mine):

$ morloc-manager ls
Local environments:
  base [0.85.0]
  edge [0.79.0] (active)

Next we should activate the new environment we just created:

$ morloc-manager select base
Selected environment: base

Now we can confirm that everything is set up correctly with info:

$ morloc-manager info
Active:         base
Local engine:   podman
System engine:  podman
SELinux:        not detected

Directories:
  Config (local)       /home/z/.config/morloc (exists)
  Data (local)         /home/z/.local/share/morloc (exists)
  Config (system)      /etc/morloc (exists)
  Data (system)        /usr/local/share/morloc (not found)

Local environments:
  base [0.85.0] (active)
  edge [0.79.0]

You can get details on a particular environment as well:

$ morloc-manager info base
Name:           base
Scope:          local
Active:         yes
Base image:     ghcr.io/morloc-project/morloc/morloc-full:0.85.0
Morloc version: 0.85.0
Engine:         podman
SHM size:       512m
Dockerfile:     none
Flags:          /home/username/.config/morloc/environments/base/env.flags
Data dir:       /home/username/.local/share/morloc/environments/base

3.1.3. The Morloc shell and first runs

You can run commands inside the environment:

$ morloc-manager run -- morloc --version
0.85.0

This confirms that the morloc compiler is installed and shows its version.

Alternatively you can create an interactive session inside the Morloc container:

$ morloc-manager run --shell

You are now inside a shell in the container. You can check that the Morloc compiler installed in the container is indeed the latest version:

$ morloc --version
0.85.0  # you may have a later version

The current working directory from your system is mounted. All changes you make in this directory will persist. The Morloc module directory for this environment from your home system is mounted as well, so any Morloc modules you install will persist. That being the case, let’s install the Morloc standard library:

$ morloc install stdlib

The Morloc stdlib module is a re-exporter of all the Morloc individual standard library modules. So installing it is a shortcut for explicit installation.

You are now ready to run almost any code in the docs.

3.2. Setting up IDEs

We are currently working on expanding the editor support for Morloc.

Below are the editors that are supported or under development.

vim

If you are working in vim, you can install Morloc syntax highlighting as follows:

$ mkdir -p ~/.vim/syntax/
$ mkdir -p ~/.vim/ftdetect/
$ curl -o ~/.vim/syntax/loc.vim https://raw.githubusercontent.com/morloc-project/vimmorloc/main/loc.vim
$ echo 'au BufRead,BufNewFile *.loc set filetype=loc' > ~/.vim/ftdetect/loc.vim

[Image: Morloc syntax highlighting in vim]

VS Code / VSCodium / Cursor

We have a publicly available "morloc" extension with support for highlighting and snippet expansion.

[Image: Morloc syntax highlighting in VS Code]

Zed

This is currently under development, see repo here.

The extension is mostly written, and the required Tree-sitter grammar is written, but there are bugs to be resolved. I’m happy to accept pull requests!

I’ve also written several syntax highlighting and static analysis tools:

Pygmentize

A repo with the Pygmentize parser can be found here. This parser is used to highlight code here in the manual. It can be easily integrated into Python code, e.g., in the Weena discord bot.

Tree-sitter

Tree-sitter is a program for defining parsers and using them to query languages and add advanced grammatical understanding to editors. These grammars require a complete lexer and parser specification for the language. This grammar is available for Morloc, see repo here. Tree-sitter allows general purpose syntax highlighting (e.g., over the command line) and parses a full concrete syntax tree from the code:

[Image: Tree-sitter concrete syntax tree for Morloc code]

3.3. First Morloc programs

The inevitable "Hello World" case is implemented in Morloc like so:

module hw (hello)

--' A Morlock's hello world
hello = "Hello up there"

The module named hw exports the term hello which is assigned to a literal string value. The --' syntax adds a docstring that describes the term.

Paste this code into a file (e.g., "hello.loc"). It can then be imported by other Morloc modules or compiled directly into an executable program:

$ morloc make hello.loc

This command will produce an executable named after the module (in this case, hw) and pool files for each language used (e.g., pool.py, pool-cpp.out, pool.R) in the pools/ directory. The executable is the command line interface (CLI) to the concrete commands exported from the module.

Calling the executable with the -h flag will print a help message:

$ ./hw -h
Usage: ./hw ARG

A Morlock's hello world

Nexus options (long form only; short forms reserved for the program):
  --help           Print this help message
  --print          Pretty-print output for human consumption
  --output-file    Print to this file instead of STDOUT
  --output-form    Output format [json|mpk|voidstar]
  --keep-null      Print top-level () or None as 'null' (default: empty)

Daemon mode:
  --daemon         Run as a long-lived daemon
  --http-port PORT Listen on HTTP port
  --port PORT      Listen on TCP port
  --socket PATH    Listen on UNIX socket

Return: Str

We can ignore the nexus and daemon options for now. This usage message is automatically generated. For each exported term, it specifies the input (none, in this case) and output types as inferred by the compiler. For this case, the exported command is just the term hello, so no input types are listed and the return type is a string.

Because hw exports exactly one function, the COMMAND name is optional: you can either write it explicitly or omit it.

$ ./hw hello
"Hello up there"
$ ./hw           # equivalent: the only exported command is the implicit one
"Hello up there"

3.4. Unit conversion example

To introduce Morloc programming, let’s develop a simple unit conversion program.

Let’s define a few unit conversions in C++ and write them to the file units.hpp:

#pragma once

double cels2fahr(double cels){
  return 1.8 * cels + 32.0;
}

double meters2feet(double meters){
  return meters * 3.28084;
}

Now let’s source the C++ code in the Morloc file units.loc:

module units (cels2fahr, meters2feet)

source Cpp from "units.hpp" ("cels2fahr", "meters2feet")

type Cpp => Real = "double"

--' Convert from Celsius to Fahrenheit
cels2fahr :: Real -> Real

--' Convert from meters to feet
meters2feet :: Real -> Real

Here we define a new module and its exports. The source statement reads foreign language code and specifies the terms that should be imported from the source code. The type phrase maps the general Morloc type Real to the concrete C++ type double. The remaining lines define the type signatures for the two unit conversion functions. Both map a Real input to a Real output.

We can compile and run it as below:

$ morloc make units.loc
$ ./units cels2fahr 100
212

The Morloc compiler will build a C++ program from the Morloc script. The compiler also generates help statements:

$ ./units -h
Usage: ./units [OPTION...] COMMAND [ARG...]

Nexus options (must precede COMMAND):
  -h, --help           Print this help message
  -p, --print          Pretty-print output for human consumption
  -o, --output-file    Print to this file instead of STDOUT
  -f, --output-format  Output format [json|mpk|voidstar]
      --keep-null      Print top-level () or None as 'null'
                       (default: produce empty output)

Daemon mode:
  --daemon             Run as a long-lived daemon
  --http-port PORT     Listen on HTTP port
  --port PORT          Listen on TCP port
  --socket PATH        Listen on Unix socket

Commands (call with -h/--help for more info):
  cels2fahr    Convert from Celsius to Fahrenheit
  meters2feet  Convert from meters to feet

The nexus and daemon mode options are present in all compiled Morloc programs and control the output format and run mode. These advanced options will be discussed later.

3.4.1. Defining language-agnostic functions

This C++ definition for unit conversions is fine, but it would be helpful to be able to do conversions natively in any language without needing to define new helpers or make foreign calls to the C++ functions. In Morloc we can write code that is language independent, like so:

module units (cels2fahr, meters2feet)

import root

--' Convert from Celsius to Fahrenheit
cels2fahr cels = 1.8 * cels + 32.0

--' Convert from meters to feet
meters2feet meters = meters * 3.28084

If you get the error that the root is not installed, you can install all the Morloc standard library modules with:

$ morloc install stdlib

The root module is a language independent Morloc module that defines the typeclasses for arithmetic among other things. Since our new module is fully language-agnostic, we cannot directly compile it, but we can typecheck it:

$ morloc typecheck units2.loc
cels2fahr :: Real -> Real
meters2feet :: Real -> Real

Now, to actually compile this program into something we can execute, we need to import a module that contains sourced implementations. Let’s define a main.loc module that imports the units module:

module main (cels2fahr, meters2feet)
import .units
import root-cpp

The .units specifies the relative path to a local Morloc module. The root-cpp module stores C++ implementations of root terms; here we just need the arithmetic operator definitions.

If we instead wanted to build in Python, we could replace the root-cpp import with root-py. Alternatively, we could import both root-cpp and root-py and let the compiler decide which implementations to use.

3.5. Polyglot programs

Morloc can freely mix languages. Suppose we have a function in Python for writing reports:

# format.py
def report(ctemp, c2f):
  return f"The current temperature is {ctemp}°C ({c2f(ctemp)}°F)"

This function takes a temperature in Celsius and a Celsius-to-Fahrenheit converting function as arguments. It returns a string describing the temperature.

Now let’s source this function and our old C++ function into a Morloc program.

module main (report)

import root-py
import root-cpp

source Py from "format.py" ("report" as report_wrapper)
report_wrapper :: Real -> (Real -> Real) -> Str

source Cpp from "units.hpp" ("cels2fahr")
cels2fahr :: Real -> Real

--' Write a cute string about the temperature
report t = report_wrapper t cels2fahr

The report function passes a C++ function to a Python function. All the wiring for this is done under the hood by the Morloc compiler.

We could also import the language-agnostic Morloc definitions from before and import root-py. Then the language-agnostic definitions would collapse to native Python and the report function would be pure Python.
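
To see what the Python side does in isolation, the sketch below calls report directly, with a plain Python function standing in for the C++ cels2fahr. The stand-in is an assumption made for illustration only; in a compiled Morloc program the compiler supplies the callable that crosses the language boundary.

```python
# Stand-in for the C++ cels2fahr from units.hpp (assumption: same formula).
def cels2fahr(cels):
    return 1.8 * cels + 32.0

# report exactly as defined in format.py
def report(ctemp, c2f):
    return f"The current temperature is {ctemp}°C ({c2f(ctemp)}°F)"

message = report(100.0, cels2fahr)
print(message)
```

From report's point of view, c2f is just a callable, which is why the origin of the converter (C++ or Python) does not change the Python source.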

3.6. Parallelism example

Here is an example showing a parallel map function written in Python that calls C++ functions.

module m (sumOfSums)

import root-py
import root-cpp

source Py from "foo.py" ("pmap")
source Cpp from "foo.hpp" ("sum")

pmap :: (a -> b) -> [a] -> [b]
sum :: [Real] -> Real

sumOfSums = sum . pmap sum

This Morloc script exports a function that sums a list of lists of real numbers. Here we use the dot operator for function composition. The type signature for pmap uses lowercase type variables (a and b) to indicate that the function is generic: it works for any types a and b. The sum function is implemented in C++:

// cpp header sourced by morloc script
#pragma once
#include <vector>

double sum(const std::vector<double>& vec) {
    double sum = 0.0;
    for (double value : vec) {
        sum += value;
    }
    return sum;
}

The parallel pmap function is written in Python:

# Python3 file sourced by morloc script
import multiprocessing as mp

def pmap(f, xs):
    with mp.Pool() as pool:
        results = pool.map(f, xs)
    return results

The inner summation jobs will be run in parallel. The pmap function has the same signature as the non-parallel map function, so it can serve as a drop-in replacement.

This can be compiled and run with the lists being provided in JSON format:

$ morloc make main.loc
$ ./m sumOfSums '[[1,2],[3,4,5]]'
15
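
The drop-in claim above can be checked in plain Python. The sketch below models sumOfSums with an interchangeable mapper. It uses multiprocessing.pool.ThreadPool rather than the process pool from the sourced Python file, so that it runs without pickling concerns, and the names serial_map, pmap_threads, and sum_of_sums are hypothetical.

```python
from multiprocessing.pool import ThreadPool

def serial_map(f, xs):
    # Ordinary, non-parallel map with the shape (a -> b) -> [a] -> [b].
    return [f(x) for x in xs]

def pmap_threads(f, xs):
    # Same shape as serial_map, but the calls run in a thread pool.
    # (The sourced pmap uses a process pool; threads are swapped in here.)
    with ThreadPool() as pool:
        return pool.map(f, xs)

def sum_of_sums(mapper, xss):
    # Mirrors `sum . pmap sum`: sum each inner list, then sum the results.
    return sum(mapper(sum, xss))

data = [[1, 2], [3, 4, 5]]
print(sum_of_sums(serial_map, data))
print(sum_of_sums(pmap_threads, data))
```

Both calls give the same total, because only the scheduling of the inner sums differs between the two mappers.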

4. Syntax and Features

4.1. Functions

Functions are defined with arguments separated by whitespace:

foo x y z = g x (f y z)

Here foo is the Morloc function that takes the arguments x, y, and z. Using whitespace to separate arguments may be unfamiliar if you have a background in the Algol family of languages (such as C and Python).

The Morloc internal module, which is imported into the stdlib modules, defines the composition (.) and application ($) operators.

The . operator composes two functions. Consider the two definitions below.

foo1 x = g (f x)
foo2 = g . f

The first shows an explicit function call where the function g takes the output of f x as input. The second represents the same operation as a composition of the two functions g and f.

Composition chains can build multi-stage pipelines:

process = format . transform . validate . parse
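
For readers more familiar with Python, right-to-left composition can be modelled with a small helper. This is only an analogy: compose and the four stage functions below are hypothetical and are not part of Morloc or its standard library.

```python
from functools import reduce

def compose(*fs):
    # compose(g, f)(x) == g(f(x)); the rightmost function runs first.
    return reduce(lambda g, f: lambda x: g(f(x)), fs)

# Hypothetical pipeline stages, applied right to left like `.` in Morloc.
def parse(text):
    return [int(tok) for tok in text.split(",")]

def validate(xs):
    return [x for x in xs if x >= 0]

def transform(xs):
    return [x * 2 for x in xs]

def fmt(xs):
    return " ".join(str(x) for x in xs)

process = compose(fmt, transform, validate, parse)
print(process("1,-2,3"))
```

Reading the chain from right to left gives the order of execution: parse, then validate, then transform, then fmt.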

The $ operator is the application operator. It has the lowest precedence, so it can be used to avoid parentheses:

foo1 x = h (g (f x))
foo2 x = h $ g $ f x

Morloc supports partial application of arguments. If you take a function that requires N arguments, and provide it one argument, you will get a new function of (N-1) arguments. Let’s take the slice function which takes three arguments: a start index, an end index, and a list of values. It returns a sublist. Here are a few examples of partial application:

# create a new "take" function
take = slice 0

# define a new "head" function that returns the first element of a list
head = slice 0 1

# extract the first 5 elements from every list in a list of lists
firstFive xss = map (slice 0 5) xss

Partial application works with binary operators as well. The example below divides every element in a list of Real values by 2. Numeric literals are not polymorphic across Int and Real, so write 2.0 to keep the operator on Real, and give the binding a signature so that map’s Functor instance can be resolved:

divideByTwo :: [Real] -> [Real]
divideByTwo = map (/ 2.0)

Binary operators can be applied in the reverse order as well:

divideTwoBy :: [Real] -> [Real]
divideTwoBy = map (2.0 /)

The / operator is defined only on Real and other Numeric types. For integer division, use // instead:

halvedInts :: [Int] -> [Int]
halvedInts = map (// 2)
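
A rough Python analogue of these partial applications uses functools.partial and small lambdas. The slice_ helper is a hypothetical stand-in that assumes a half-open range [start, end); it is not Morloc's slice, whose exact index convention is not specified here.

```python
from functools import partial

def slice_(start, end, xs):
    # Hypothetical stand-in for slice; assumes a half-open range [start, end).
    return xs[start:end]

take = partial(slice_, 0)          # like `take = slice 0`
first_one = partial(slice_, 0, 1)  # like `slice 0 1`

def first_five(xss):
    # like `map (slice 0 5) xss`
    return list(map(partial(slice_, 0, 5), xss))

# Operator sections fix one operand of a binary operator.
def divide_by_two(xs):
    return list(map(lambda x: x / 2.0, xs))   # like `map (/ 2.0)`

def divide_two_by(xs):
    return list(map(lambda x: 2.0 / x, xs))   # like `map (2.0 /)`

def halved_ints(xs):
    return list(map(lambda x: x // 2, xs))    # like `map (// 2)`

print(take(2, [1, 2, 3]))
print(divide_by_two([1.0, 4.0]), divide_two_by([1.0, 4.0]))
```

In each case a function of N arguments is supplied with fewer than N, and the result is a new function waiting for the rest.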

4.1.1. Lambdas

Anonymous functions are written with a backslash, one or more parameters, an arrow (->), and a body. They capture free variables from the enclosing scope, so a lambda can refer to bindings defined outside it:

addBias :: Real -> [Real] -> [Real]
addBias bias = map (\x -> x + bias)

Here bias is captured from the outer parameter list. Lambdas must take at least one argument: the zero-argument form \ -> 5 is a parse error. To wrap a value as a "function with no arguments", use the effect system (see the section on effects and delayed evaluation) rather than a lambda.

4.2. Foreign functions

In Morloc, you can import functions from many languages and compose them under a common type system. The syntax for importing functions from source files is as follows:

source Cpp from "foo.hpp" ("map", "sum", "snd")
source Py from "foo.py" ("morloc_map" as map, "morloc_sum" as sum, "snd")

The C++ file, foo.hpp, may be implemented as a simple header file with generic implementations of the three required functions.

#pragma once
#include <vector>
#include <tuple>

// map :: (a -> b) -> [a] -> [b]
template <typename F, typename A>
auto map(F f, const std::vector<A>& xs) {
    std::vector<decltype(f(xs.front()))> result;
    result.reserve(xs.size());
    for (const auto& x : xs) {
        result.push_back(f(x));
    }
    return result;
}

// snd :: (a, b) -> b
template <typename A, typename B>
B snd(const std::tuple<A, B>& p) {
    return std::get<1>(p);
}

// sum :: [a] -> a
template <typename A>
A sum(const std::vector<A>& xs) {
    A total = A{0};
    for (const auto& x : xs) {
        total += x;
    }
    return total;
}

Note that these implementations are completely independent of Morloc — they have no special constraints, they operate on perfectly normal native data structures, and their usage is not limited to the Morloc ecosystem.

The Morloc compiler is responsible for mapping data between the languages. But to do this, Morloc needs a little information about the function types. This is provided by the general type signatures, like so:

map :: (a -> b) -> [a] -> [b]
snd :: (a, b) -> b
sum :: [Real] -> Real

The syntax for these type signatures is inspired by Haskell. Square brackets represent homogeneous lists, parenthesized comma-separated values represent tuples, and arrows represent functions. In the map type, (a -> b) is a function from generic value a to generic value b; [a] is the input list of initial values; [b] is the output list of transformed values. In the snd type, the second element from a tuple of two generic terms is extracted. In sum, a list of reals is converted to a single real.

Removing the syntactic sugar for lists and tuples, the signatures may be written as:

map :: (a -> b) -> List a -> List b
snd :: Tuple2 a b -> b
sum :: List Real -> Real

These signatures provide the general types of the functions. But one general type may map to multiple native, language-specific types. So we need to provide an explicit mapping from general to native types.

type Cpp => List a = "std::vector<$1>" a
type Cpp => Tuple2 a b = "std::tuple<$1,$2>" a b
type Cpp => Real = "double"
type Py => List a = "list" a
type Py => Tuple2 a b = "tuple" a b
type Py => Real = "float"

These type functions guide the synthesis of native types from general types. Take the C++ mapping for List a as an example. The basic C++ list type is vector from the standard template library. After the Morloc typechecker has solved for the type of the generic parameter a, and recursively converted it to C++, its type will be substituted for $1. So if a is inferred to be a Real, it will map to the C++ double, and then be substituted into the list type yielding std::vector<double>. This type will be used in the generated C++ code.

These type mappings will normally be imported from foundational modules, such as root-py or root-cpp, so you will not often need to define them in practice.
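
The substitution step described above can be sketched as a toy model in Python. This is not the compiler's actual algorithm: the concrete function and the tuple encoding of types are assumptions made for illustration, and only the template strings are taken from the type declarations above.

```python
# Concrete type templates, copied from the `type Cpp => ...` declarations.
CPP_TYPES = {
    "List": "std::vector<$1>",
    "Tuple2": "std::tuple<$1,$2>",
    "Real": "double",
}

# A general type is modelled as (constructor name, list of argument types).
def concrete(ty, table):
    name, args = ty
    out = table[name]
    # Substitute from the highest index down so "$1" never clobbers "$10".
    for i in range(len(args), 0, -1):
        out = out.replace(f"${i}", concrete(args[i - 1], table))
    return out

real = ("Real", [])
list_of_real = ("List", [real])
print(concrete(list_of_real, CPP_TYPES))
print(concrete(("Tuple2", [list_of_real, real]), CPP_TYPES))
```

Arguments are converted recursively before substitution, which is how a nested general type such as a tuple containing a list ends up as a nested C++ template type.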

4.2.1. Importing builtin functions

Importing builtin functions can be problematic. This is why we sourced the map and sum functions from Python under the Python names morloc_map and morloc_sum.

Suppose we instead sourced the builtins directly, as below:

source Py from "foo.py" ("map", "sum", "snd")

The code Morloc generates would then treat map and sum as functions exported from the foo module and would access them under the foo namespace as foo.map and foo.sum. But map and sum are Python builtins, not module-level attributes of foo, so they must be re-exported at the top of the source file:

# foo.py
from builtins import map, sum  # make builtins module-level attributes

def snd(pair):
    return pair[1]

For third-party modules, any term that is passed to Morloc will need to be locally defined (def bar(…​)) or specifically imported (from somemodule import bar).
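
The namespace issue is plain Python behavior and can be reproduced without Morloc. The sketch below builds two throwaway modules from source text (load_module is a hypothetical helper) and checks whether sum is reachable as a module attribute.

```python
import types

def load_module(name, source):
    # Build a throwaway module object from source text.
    mod = types.ModuleType(name)
    exec(source, mod.__dict__)
    return mod

without_reexport = load_module(
    "foo",
    "def snd(pair):\n    return pair[1]\n",
)
with_reexport = load_module(
    "foo",
    "from builtins import map, sum\n\ndef snd(pair):\n    return pair[1]\n",
)

print(hasattr(without_reexport, "sum"))  # builtin is not a module attribute
print(hasattr(with_reexport, "sum"))     # re-export makes it one
print(with_reexport.sum([1, 2, 3]))
```

Code inside a module can always call sum, but code outside it can only reach names that exist in the module's own namespace, which is what the re-export line provides.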

4.3. Booleans

Booleans in Morloc are represented as True or False under the Bool type. Comparison and logical operators can be imported from the root modules.

4.3.1. Literals and basic use

yes :: Bool
yes = True

no :: Bool
no = False

4.3.2. Comparison operators

The Eq and Ord typeclasses in root provide the standard comparison operators. They work over any type with the appropriate instance — integers, reals, strings, and tuples and lists of comparable values.

Operator  Meaning
==        equal
!=        not equal
<         less than
<=        less than or equal
>         greater than
>=        greater than or equal

import root-py

isPositive :: Int -> Bool
isPositive x = x > 0

sameLength :: [a] -> [b] -> Bool
sameLength xs ys = length xs == length ys

4.3.3. Logical operators

The root module defines logical conjunction (&&), disjunction (||), negation (not), exclusive-or (xor), and not-and (nand):

Operator  Meaning
&&        logical AND
||        logical OR
not       logical negation (prefix function)
xor       exclusive OR
nand      NOT AND

&& and || are short-circuiting and right-associative. && binds tighter than ||, matching the convention of most languages:

inRange :: Int -> Int -> Int -> Bool
inRange lo hi x = lo <= x && x <= hi

isWeekend :: Int -> Bool
isWeekend day = day == 0 || day == 6

isWeekday :: Int -> Bool
isWeekday day = not (isWeekend day)

4.3.4. Boolean-valued list functions

The root module provides several functions in the Foldable family that return Bool:

Function  Signature
any       Foldable f => (a -> Bool) -> f a -> Bool
all       Foldable f => (a -> Bool) -> f a -> Bool
elem      (Foldable f, Eq a) => a -> f a -> Bool

any returns True if the predicate holds for at least one element. all returns True only if the predicate holds for every element. elem checks membership using ==.

hasNegative :: [Int] -> Bool
hasNegative = any (< 0)

allPositive :: [Int] -> Bool
allPositive = all (> 0)

containsZero :: [Int] -> Bool
containsZero = elem 0
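
For comparison, Python's builtins of the same names behave as described above. The functions below are a Python analogy with hypothetical names, not generated Morloc code.

```python
def has_negative(xs):
    # True if the predicate holds for at least one element.
    return any(x < 0 for x in xs)

def all_positive(xs):
    # True only if the predicate holds for every element.
    return all(x > 0 for x in xs)

def contains_zero(xs):
    # Membership test based on equality.
    return 0 in xs

print(has_negative([3, -1, 2]), all_positive([3, -1, 2]), contains_zero([3, 0]))
```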

4.3.5. Guards

Booleans drive Morloc’s guard syntax. A guard alternative starts with ? and selects the first branch whose condition evaluates to True; the : line is the fallthrough:

classify :: Int -> Str
classify x
  ? x < 0     = "negative"
  ? x == 0    = "zero"
  : "positive"

See the Guards section for a full description of guard syntax.

4.4. Integer types

Integers may be written in decimal, hexadecimal, octal, or binary:

-- standard decimal notation
42

-- hexadecimal notation (case insensitive)
0xf00d
0xDEADBEEF

-- octal notation (upper or lowercase 'o')
0o755

-- binary notation (upper or lowercase 'b')
0b0101

A prefixed integer must contain only digits valid for its base, and must end on a non-identifier character. A trailing letter or digit that is not a valid digit for the base is a compile-time error rather than a silently truncated literal followed by an unrelated identifier:

$ morloc eval -e "0xF00D"
61453
$ morloc eval -e "0xF0OD"
<expr>:1:1: malformed hexadecimal literal: 0xF0OD
$ morloc eval -e "0b1001"
9

Here I am using the Morloc eval command to evaluate a single Morloc expression.
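
Python's int constructor applies a similar rule when given base 0, which makes it a convenient way to experiment with prefixed literals. This is an analogy only; parse_int_literal is a hypothetical helper and says nothing about how the Morloc lexer is implemented.

```python
def parse_int_literal(text):
    # Base 0 tells int() to honour the 0x / 0o / 0b prefixes.
    return int(text, 0)

print(parse_int_literal("0xF00D"))
print(parse_int_literal("0o755"))
print(parse_int_literal("0b1001"))

try:
    parse_int_literal("0xF0OD")  # letter O instead of the digit zero
except ValueError as err:
    print("rejected:", err)
```

As in the Morloc examples, the malformed literal is rejected as a whole rather than being read as a shorter number followed by leftover characters.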

Morloc provides a default variable-width integer for general use and fixed-width types for performance-critical code.

4.4.1. Integer types at a glance

Type                           Width                           Use case
Int                            Variable (arbitrary precision)  Default integer for most code. Works across all languages.
Int8, Int16, Int32, Int64      8, 16, 32, 64 bits (signed)     Performance-critical code with known bounds.
UInt8, UInt16, UInt32, UInt64  8, 16, 32, 64 bits (unsigned)   Bit manipulation, byte data, indices.

4.4.2. The default Int type

The Int type is Morloc’s universal integer. The on-wire representation is variable-width: values up to 64 bits fit in 16 bytes inline, and larger values spill to a pointer to an array of 64-bit limbs. The in-language range of Int, however, is determined by the host language’s native integer type:

Language  Native binding for Int  Representable range
Python    int                     Arbitrary precision
C++       int                     32-bit signed (-2^31 to 2^31 - 1)
R         integer                 32-bit signed

A Morloc program that needs a value wider than 32 bits in C++ or R should declare the field as Int64 (or UInt64). Int64 maps to int64_t in C++ and to R’s numeric (with 53-bit integer precision via double).
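
The 53-bit limit is a property of IEEE 754 doubles and can be observed from Python, whose float is a double. The sketch below is independent of Morloc and only illustrates the two limits mentioned above.

```python
LIMIT = 2 ** 53

# 2**53 itself survives a round trip through a double...
print(int(float(LIMIT)) == LIMIT)
# ...but 2**53 + 1 does not: it rounds back to 2**53.
print(int(float(LIMIT + 1)) == LIMIT + 1)

# The 32-bit signed range listed for C++ int and R integer.
INT32_MIN, INT32_MAX = -(2 ** 31), 2 ** 31 - 1
print(INT32_MIN, INT32_MAX)
```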

Integer literals are Int by default:

x = 42          -- Int
y = 0xDEADBEEF  -- Int (hex literal)
z = -9999       -- Int

4.4.3. Big integers from Python

Python natively supports arbitrary-precision integers. Morloc’s Int type takes full advantage of this. For example, computing large factorials:

module main (fact)

import root-py

fact :: Int -> Int
fact n
  ? n == 0 = 1
  : n * fact (n - 1)

$ morloc make -o calc main.loc
$ ./calc fact 100
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000

This produces a 525-bit integer — far beyond what any fixed-width type can hold. The value is stored as a multi-limb big integer and printed correctly.
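
The bit count can be checked directly in Python. The limb count assumes the 64-bit limbs mentioned in the description of the Int type; the variable names are illustrative.

```python
import math

value = math.factorial(100)
bits = value.bit_length()
limbs = -(-bits // 64)  # ceiling division: number of 64-bit limbs needed

print(bits)
print(limbs)
```

A value of this size needs several whole limbs, so the storage it occupies is somewhat larger than its exact bit length.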

4.4.4. Cross-language overflow errors

When a big integer is passed to a language whose concrete type cannot represent it, Morloc produces a clear error at the language boundary. The error states the value’s size, the target type’s bit width, and its representable range.

For example, passing a large factorial from Python to C++:

module main (factCpp)

import root-py
import root-cpp

fact :: Int -> Int
fact n
  ? n == 0 = 1
  : n * fact (n - 1)

factPy :: Int -> Int
factPy n = idpy (fact n)

factCpp :: Int -> Int
factCpp x = idcpp (factPy x)

Small values pass through without issue:

$ ./calc factCpp 5
120

But large values produce a descriptive error:

$ ./calc factCpp 100
Error: run failed
Integer overflow: 9-limb integer (576 bits) does not fit in
32-bit type (range -2147483648 to 2147483647)

The same applies to R, which is limited to 32-bit integers and 53-bit integer precision via doubles:

$ ./calc factR 100
Error: run failed
Integer overflow: 9-limb integer (576 bits) does not fit in
R's numeric type (max 2^53 for integer precision).
Use a fixed-width type (Int32, Int64) or keep computation in Python.

Note the use of idpy in the example above. This forces the factorial computation to run entirely in Python, where int is arbitrary precision. Without this, the compiler will collapse the implementation of fact to pure C++, which would be much faster, but would not show the cross-language behavior.
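
The two limits behind these errors can be illustrated in plain Python. The sketch below is not Morloc's runtime check; `fits_signed` and `survives_double` are hypothetical helpers that show why a 32-bit target rejects 100! and why a double stops being exact for integers past 2^53.

```python
import math

def fits_signed(n: int, bits: int) -> bool:
    """True if n is representable as a two's-complement integer of the given width."""
    return -(2 ** (bits - 1)) <= n <= 2 ** (bits - 1) - 1

def survives_double(n: int) -> bool:
    """True if n converts to a 64-bit float and back unchanged.

    Only meaningful for |n| below the double range (about 1.8e308).
    """
    return int(float(n)) == n

print(fits_signed(math.factorial(5), 32))    # True
print(fits_signed(math.factorial(100), 32))  # False
print(survives_double(2 ** 53))              # True
print(survives_double(2 ** 53 + 1))          # False
```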

4.4.5. Compile-time literal bounds

Integer literals are bounds-checked at compile time against the type they’re written into. A literal that overflows its target type is rejected with a sourced error pointing at the literal:

tooLarge :: UInt8
tooLarge = 1000
$ morloc make main.loc
main.loc:2:12: error:
Integer literal 1000 overflows UInt8 (range 0 to 255)
  |
2 | tooLarge = 1000
  |            ^

The error caret points at the literal itself, not the binding name — so when the same literal is referenced from multiple sites, the diagnostic stays on the offending source.
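
The range arithmetic behind such a check is simple. The following Python sketch derives the bounds from the type name and reproduces the wording of the message above; `int_range` and `check_literal` are hypothetical helpers, and the real compiler check may be organized differently.

```python
def int_range(type_name: str) -> tuple[int, int]:
    """Inclusive (lo, hi) bounds for names like Int8 or UInt64."""
    unsigned = type_name.startswith("U")
    bits = int(type_name.removeprefix("U").removeprefix("Int"))
    if unsigned:
        return 0, 2 ** bits - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

def check_literal(type_name: str, value: int):
    """Return an error string if the literal is out of range, else None."""
    lo, hi = int_range(type_name)
    if not lo <= value <= hi:
        return f"Integer literal {value} overflows {type_name} (range {lo} to {hi})"
    return None

print(check_literal("UInt8", 1000))
print(check_literal("Int16", -32768))  # None: exactly the lower bound
```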

4.4.6. Fixed-width integer types

For code where values are known to be bounded, use fixed-width types. These map directly to the target language’s native types:

Morloc type  C++       Python  R
Int8         int8_t    int     integer
Int16        int16_t   int     integer
Int32        int32_t   int     integer
Int64        int64_t   int     numeric (double)
UInt8        uint8_t   int     raw
UInt16       uint16_t  int     integer
UInt32       uint32_t  int     numeric (double)
UInt64       uint64_t  int     numeric (double)

Fixed-width types use direct binary serialization with no overhead — the serialized format is identical to the in-memory representation. This is the right choice for numerical code and interop with C libraries that require specific widths.

Note You might wonder why the Python types are all int rather than a truly fixed-size integer such as the numpy alternatives. There is a way to specialize types in this way that we will learn later in the "Type Hierarchies" section. Also see the later sections on Tensors and Tables where we discuss higher performance types and shared memory.

4.4.7. Negation and unary minus

Morloc supports unary minus (-) on any numeric type. The same glyph plays two roles — the binary subtraction operator and the unary negation operator — and the rule for distinguishing them is whitespace-sensitive.

-- prefix `-` on a value: produces the additive inverse
neg :: Int -> Int
neg x = -x

-- prefix `-` on an expression: parenthesize the expression
shifted :: Int -> Int
shifted x = -(x + 1)

-- works on any numeric primitive (Int, Int8..64, UInt8..64,
-- Real, Float32, Float64) via the `Negatable` typeclass
flipReal :: Real -> Real
flipReal x = -x

Negative literals

A - directly preceding a digit — with no space between the dash and the digit — is parsed as part of the numeric literal itself. This means -1 is an atomic integer (not a function call), and works in contexts where function calls are not allowed (such as pure-data files):

xs :: [Int]
xs = [-1, -2, -3, -100]

ys :: [Real]
ys = [-1.5, -2.0e-3, -0xff]

point :: (Int, Int)
point = (-3, -4)

The same atomic-lexing rule extends to the non-finite Real literals -Inf and -NaN; see the floats chapter for details.

When - is unary vs. binary

The lexer applies an asymmetric-whitespace rule. A - immediately followed by a digit is treated as part of a negative literal whenever the dash is in a position that cannot end an expression on its left:

  • at the start of input;

  • after an opening delimiter ((, [, ,, =, etc.);

  • after another operator;

  • after whitespace (the dash is preceded by space, but the digit is not).

In every other position — where the dash directly follows an operand-finishing token with no intervening whitespace — the dash is the binary subtraction operator.

Expression  Interpretation
-1          atomic literal -1
f -1        f applied to literal -1 (asymmetric whitespace)
f - 1       binary subtraction f - 1 (symmetric whitespace)
f-1         binary subtraction f - 1 (no whitespace)
[-1, -2]    list of two negative literals
1 + -2      1 + (-2); the -2 is a literal
-(x + 1)    desugars to negate (x + 1)
-x          desugars to negate x
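
The literal-versus-operator decision can be sketched as a small predicate over the characters on either side of the dash. This is an illustration in Python, not Morloc's lexer: `starts_negative_literal` is a hypothetical name, and the `OPENERS` and `OPERATORS` sets are assumed subsets of the real token classes.

```python
OPENERS = set("([,=")       # assumed subset of the opening delimiters
OPERATORS = set("+-*/<>")   # assumed subset of the operator characters

def starts_negative_literal(src: str, i: int) -> bool:
    """True if the '-' at src[i] begins a negative numeric literal."""
    if src[i] != "-":
        raise ValueError("src[i] must be '-'")
    # The dash must touch a digit on its right.
    if i + 1 >= len(src) or not src[i + 1].isdigit():
        return False
    # Start of input: nothing on the left can end an expression.
    if i == 0:
        return True
    prev = src[i - 1]
    return prev.isspace() or prev in OPENERS or prev in OPERATORS

for text, pos in [("-1", 0), ("f -1", 2), ("f - 1", 2), ("f-1", 1), ("1 + -2", 4)]:
    print(repr(text), starts_negative_literal(text, pos))
```

A dash in front of an identifier, as in -x, is never a literal under this predicate; it falls through to the prefix-negation rule.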

Position restrictions

Prefix - on a non-literal expression is permitted wherever an expression begins, including the right-hand side of an infix operator. The only restriction is that the operand of prefix - must start with an atom (an identifier, a literal, an open paren, an open bracket, or a similar atom-introducing token) — not with another prefix -. Stack two negations by parenthesizing the inner one.

-- ok: -x at the start of an expression
neg1 :: Int -> Int
neg1 x = -x

-- ok: -x on the right of a binary operator
neg2 :: Int -> Int
neg2 x = 1 + -x

-- ok: subtracting a negated value
neg3 :: Int -> Int -> Int
neg3 x y = x - -y

-- ok: -x parenthesized; equivalent to neg2
neg4 :: Int -> Int
neg4 x = 1 + (-x)

-- syntax error: two adjacent prefix dashes are not allowed
-- bad :: Int -> Int
-- bad x = - -x

-- ok: parenthesize the inner negation to stack two
double :: Int -> Int
double x = -(-x)

The Negatable typeclass

Negation is provided by a typeclass Negatable a defined in the internal module:

class Negatable a where
  negate :: a -> a

Each numeric primitive has a Negatable instance in root-py, root-cpp, and root-r that dispatches to the host language’s native unary minus. The expression -x is desugared by the parser to negate x, so writing negate x explicitly is equivalent. The compiler chooses the language for each negation the same way it chooses the language for any polymorphic call: based on the imported language modules and the surrounding cross-language boundaries.

4.5. Floating-point types

Morloc’s floating-point types are IEEE 754 binary formats. A Real value is the default; Float32 and Float64 are explicit precision controls.

Type     Width                                           Use case
Real     Language-dependent (typically 64-bit IEEE 754)  Default floating point.
Float32  32 bits (IEEE 754 binary32)                     Tensors, GPU code, memory-constrained numerics.
Float64  64 bits (IEEE 754 binary64)                     Default-precision scientific computation.

Each type maps to its host-language equivalent:

Morloc type  C++     Python                                       R
Real         double  float                                        numeric
Float32      float   float (with f32 conversion at the boundary)  numeric
Float64      double  float                                        numeric

4.5.1. Literal forms

Real literals are written with a decimal point or scientific notation:

pi :: Real
pi = 3.14159265358979

-- scientific notation (upper or lowercase 'e')
avogadro :: Float64
avogadro = 6.022e23     -- prints as 6.022e+23 (explicit + on the exponent)

-- negative exponent
boltzmann :: Real
boltzmann = 1.380649e-23

-- 32-bit float for reduced memory in tensors
weights :: Tensor1 1000 Float32

A literal without a decimal point and without an exponent is parsed as an Int, not a Real. Use 1.0 or 1e0 if you want a floating-point literal of value 1.

4.5.2. IEEE 754 and non-finite values

Morloc’s Real type reflects IEEE 754 in full: the value space includes the finite reals representable in the target precision plus three classes of non-finite values that are reserved bit patterns in the standard:

  • +Infinity (positive infinity)

  • -Infinity (negative infinity)

  • NaN (Not-a-Number)

These are produced by ordinary IEEE 754 arithmetic — e.g. dividing by zero, overflowing the finite range, or evaluating an indeterminate form like Inf - Inf or 0 * Inf. They are not error states; they are values that propagate through subsequent computation according to spec-mandated rules.

Source-level literals

The non-finite values have dedicated source-level literals, capitalized to match Morloc’s existing keyword conventions (True, False, Null):

posInf :: Real
posInf = Inf

negInf :: Real
negInf = -Inf

notANumber :: Real
notANumber = NaN

-Inf is lexed as a single atomic token (mirroring how -1.5 is one token rather than negate(1.5)), so it works in pure-morloc contexts where negate is not in scope. The same applies to -NaN, though the sign of NaN is collapsed at the wire boundary — both NaN and -NaN round-trip as the canonical nan.

All three target languages (Python, R, C++) follow IEEE 754 for arithmetic on non-finite values, so the results below are identical regardless of which language a computation runs in:

Expression    Result  Why
Inf + Inf     Inf     Same-sign infinity addition
Inf + (-Inf)  NaN     Invalid op: opposite-sign cancellation
Inf - Inf     NaN     Invalid op: same-sign cancellation
Inf * 0.0     NaN     Invalid op: zero times infinity
Inf * 2.0     Inf     Magnitude preservation
Inf * (-1.0)  -Inf    Sign rule on multiplication
Inf * Inf     Inf     Like-sign product
Inf * (-Inf)  -Inf    Mixed-sign product
NaN + finite  NaN     NaN absorption (additive)
NaN * 0.0     NaN     NaN beats zero
NaN * Inf     NaN     NaN beats infinity
negate Inf    -Inf    Sign-bit flip
negate NaN    NaN     Sign flip stays NaN

The behavior is mandated by IEEE 754 and is uniform across pools.
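
Several of these rows can be checked directly in one of the host languages. The sketch below uses plain Python floats (not a Morloc program) and relies only on the standard math module.

```python
import math

inf, nan = math.inf, math.nan

cases = {
    "Inf + Inf": inf + inf,
    "Inf - Inf": inf - inf,
    "Inf * 0.0": inf * 0.0,
    "Inf * (-1.0)": inf * -1.0,
    "Inf * (-Inf)": inf * -inf,
    "NaN * 0.0": nan * 0.0,
    "negate Inf": -inf,
}

for expr, value in cases.items():
    print(f"{expr:14} -> {value}")
```

Because NaN compares unequal to everything, including itself, use math.isnan rather than == when testing for it.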

4.5.3. Compile-time literal overflow

Real literals are bounds-checked at compile time against the type they’re written into. A literal whose magnitude exceeds the target precision’s representable range is rejected with a sourced error pointing at the literal.

For Real and Float64 (max ≈ 1.8e308):

tooBig :: Real
tooBig = 1e500
$ morloc make main.loc
main.loc:2:10: error:
Float literal 1.0e500 overflows Float64 (|x| > 1.8e308)
  |
2 | tooBig = 1e500
  |          ^

The check is per-target-precision, so a literal that fits Float64 but overflows Float32 (max ≈ 3.4e38) is rejected when typed as Float32:

tooBigF32 :: Float32
tooBigF32 = 1e100
$ morloc make main.loc
main.loc:2:13: error:
Float literal 1.0e100 overflows Float32 (|x| > 3.4e38)
  |
2 | tooBigF32 = 1e100
  |             ^

Negative-magnitude literals are checked symmetrically:

main.loc:2:10: error:
Float literal -1.0e500 overflows Float64 (|x| > 1.8e308)
  |
2 | tooNeg = -1e500
  |          ^

The non-finite literal forms Inf, -Inf, and NaN bypass the bounds check by construction — they are explicit non-finite values, not finite literals that happened to overflow.
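
A magnitude check of this kind has to compare the literal's text against the limit exactly, since parsing 1e500 into a double first would already yield infinity. The Python sketch below does this with exact decimals; `literal_overflows` is a hypothetical helper, Real is assumed to be 64-bit, and rounding of literals that sit just above the largest finite value is ignored.

```python
import sys
from decimal import Decimal

# Largest finite values, held exactly as decimals.
MAX_FINITE = {
    "Float64": Decimal(sys.float_info.max),
    "Float32": Decimal((2 ** 24 - 1) * 2 ** 104),  # (2 - 2^-23) * 2^127
}
MAX_FINITE["Real"] = MAX_FINITE["Float64"]  # assumption: Real is 64-bit

def literal_overflows(text: str, type_name: str) -> bool:
    """True if the literal's magnitude exceeds the type's largest finite value."""
    return Decimal(text).copy_abs() > MAX_FINITE[type_name]

print(literal_overflows("1e500", "Real"))      # True
print(literal_overflows("1e100", "Float64"))   # False
print(literal_overflows("1e100", "Float32"))   # True
print(literal_overflows("-1e500", "Float64"))  # True
```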

4.5.4. Wire format and JSON interop

The JSON wire format is RFC 8259-compliant. Standard JSON has no syntax for non-finite numeric literals (RFC 8259 does not permit Infinity or NaN as numbers), so Morloc emits non-finite Real values as quoted lowercase strings:

Value     JSON form
+Inf      "inf"
-Inf      "-inf"
NaN       "nan"
Finite x  The numeric form (3.14, 4.2e16, etc.)

This means a Real-typed field can appear either as a JSON number or as a JSON string in output. Consumers should accept both shapes.

The JSON decoder is liberal: it accepts every common spelling case-insensitively, with optional sign, plus the bareword null decoded as NaN for backward compatibility with payloads written by older Morloc runtimes:

Accepted on input                                 Decoded as
"inf", "+inf", "infinity", "+infinity", "INF", …  +Inf
"-inf", "-infinity", "-INF", …                    -Inf
"nan", "+nan", "-nan", "NaN", …                   NaN
null (bareword)                                   NaN (legacy)

For example, calling a pure-morloc identity from the command line:

$ ./main realId 3.14
3.14
$ ./main realId '"inf"'
"inf"
$ ./main realId '"-Infinity"'
"-inf"
$ ./main realId '"NaN"'
"nan"

Internal cross-language boundaries (Morloc-to-pool calls) use a binary format that preserves IEEE 754 bytes verbatim, so non-finite values round-trip without information loss. Only the JSON boundary — typically the program’s final output — uses the string form.
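
A consumer that has to accept both shapes of a Real field might follow the two tables above along these lines. This is a Python sketch of the convention, not Morloc's own encoder or decoder; `encode_real` and `decode_real` are hypothetical names.

```python
import json
import math

def encode_real(x: float) -> str:
    """Emit a finite value as a JSON number and a non-finite one as a lowercase string."""
    if math.isnan(x):
        return json.dumps("nan")
    if math.isinf(x):
        return json.dumps("inf" if x > 0 else "-inf")
    return json.dumps(x)

def decode_real(text: str) -> float:
    """Accept a JSON number, a non-finite spelling in any case, or null (legacy NaN)."""
    value = json.loads(text)
    if value is None:
        return math.nan
    if isinstance(value, str):
        word = value.strip().lower()
        sign = -1.0 if word.startswith("-") else 1.0
        if word[:1] in ("+", "-"):
            word = word[1:]
        if word in ("inf", "infinity"):
            return sign * math.inf
        if word == "nan":
            return math.nan  # the sign of NaN is dropped
        raise ValueError(f"not a Real: {value!r}")
    return float(value)

print(encode_real(decode_real('"-Infinity"')))  # "-inf"
print(encode_real(decode_real("3.14")))         # 3.14
```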

Warning
Cross-language gotcha: division by zero in Python

All three target languages implement IEEE 754 arithmetic identically for the non-finite cases listed above. There is one notable divergence in language design, however: Python deliberately raises a ZeroDivisionError on 1.0 / 0.0, where R and C++ produce +Inf per IEEE 754.

>>> 1.0 / 0.0
ZeroDivisionError: float division by zero

> 1 / 0
[1] Inf

If a Morloc program relies on 1.0 / 0.0 evaluating to +Inf, that expression must run in a non-Python pool. Constructing infinity directly via the Inf literal (or float('inf') inside Python source) avoids the issue entirely.

4.5.5. Float32 precision considerations

Float32 halves memory usage relative to Float64 and is the right choice for large numerical arrays (tensors, image buffers, GPU input) where the extra precision is not needed. The tradeoffs to keep in mind:

  • Significand has ~7 decimal digits of precision (vs. ~15-17 for Float64). A literal like `0.1 :: Float32` rounds to the nearest representable binary32 value; it is not exact.

  • Maximum magnitude is ≈ 3.4e38 (vs. 1.8e308 for Float64). Compile-time bounds-checking enforces this for literals.

  • All arithmetic on Float32 runs at single precision, including the overflow-to-infinity threshold.

For most application code, Real (mapped to double / numeric / host-language float) is the right default. Reach for Float32 deliberately when memory or interop with single-precision hardware demands it.
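
The rounding effect is easy to observe from Python, whose float is a binary64 value, by forcing a round trip through binary32 with the standard struct module. `to_float32` is a hypothetical helper used only for this illustration.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float to the nearest binary32 value and widen it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

print(to_float32(0.1))         # close to, but not equal to, 0.1
print(to_float32(0.1) == 0.1)  # False
print(to_float32(0.5) == 0.5)  # True: 0.5 is exactly representable
```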

4.5.6. Negation of Real values

Negation works on Real, Float32, and Float64 the same way it does on integers, via the Negatable typeclass. See the integers chapter for the full unary-minus rules. The IEEE 754-relevant differences for Real:

  • -Inf and -NaN are atomic source literals — no negate lookup is performed, so they work in pure-morloc contexts.

  • negate of +Inf is -Inf; negate of NaN is NaN (sign bit flipped, but the value is still NaN by IEEE 754 rules).

  • negate of +0.0 is -0.0. The two values compare equal under == but have different IEEE 754 bit patterns; the binary cross-language format preserves the distinction, but the JSON output does not.

4.6. Strings

Morloc strings are double-quoted. They support Unicode characters:

cn = "你知道得太多了🤫"

String interpolation uses the #{…​} syntax. The expression inside the braces must have type Str — non-Str values are not auto-converted. To embed an Int, Real, Bool, or other non-Str value, call show (or another explicit stringifier) inside the braces:

helloYou you = "hello #{you}"
sayCount n = "count: #{show n}"

Inside a string, the backslash introduces an escape sequence. The following escapes are recognized:

Escape  Meaning
\n      newline
\t      tab
\r      carriage return
\0      NUL byte (U+0000)
\\      a single backslash
\"      a literal double quote

Any other backslashed character (for example, \q) is a compile-time error. A literal backslash must always be written as \\. Windows-style paths, for example, must be written like so:

winPath = "C:\\Users\\weena\\file.txt"

Writing "C:\Users" instead would fail to compile, since \U is not a recognized escape.
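
The strictness of this rule can be sketched as a small decoder that knows exactly six escapes and rejects everything else. This is a Python illustration, not the Morloc lexer; `unescape` is a hypothetical name and it operates on the text between the quotes.

```python
ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "0": "\0", "\\": "\\", '"': '"'}

def unescape(body: str) -> str:
    """Decode the six recognized escapes; raise on any other backslash sequence."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        # A backslash must be followed by one of the known escape characters.
        if i + 1 >= len(body) or body[i + 1] not in ESCAPES:
            raise ValueError(f"unrecognized escape {body[i:i + 2]!r}")
        out.append(ESCAPES[body[i + 1]])
        i += 2
    return "".join(out)

print(unescape(r"C:\\Users\\weena\\file.txt"))  # C:\Users\weena\file.txt
```

Passing the single-backslash form of the path raises ValueError at \U.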

Morloc also recognizes triple quotes. For one-line strings, triple quotes can help avoid the need to escape internal quotations. For example:

dblStr = """That's weird, I also spelled it "ear quotes", like "bunny ears"."""
sinStr = '''"Why do the pigeons here have so few toes?"'''

These yield strings identical to ordinary double-quoted strings with the inner quotes escaped. The most valuable use of triple quotes, though, is for multi-line strings.

The spacing of multi-line strings is trimmed by applying the following three rules in order:

  1. Initial spaces up to and including the first newline are removed.

  2. Terminal spaces up to and including the final newline are removed.

  3. All leading spaces are trimmed by the number of leading spaces in the line with the fewest leading spaces.

This allows you to write blocks of text with natural indentation. Below is a multi-line string:

longString =
  """
  this is a long
  string
  """

It evaluates to:

this is a long
string

This allows natural paragraphs to be written without breaking indentation patterns.
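
The three trimming rules can be sketched in Python as follows. `trim_multiline` is a hypothetical helper that takes the raw text between the triple quotes; treating blank lines as irrelevant when finding the smallest indentation is an assumption the rules above do not spell out.

```python
import re

def trim_multiline(raw: str) -> str:
    """Apply the three trimming rules to the text between triple quotes."""
    # Rule 1: drop initial spaces up to and including the first newline.
    text = re.sub(r"\A *\n", "", raw)
    # Rule 2: drop the final newline and any spaces that follow it.
    text = re.sub(r"\n *\Z", "", text)
    # Rule 3: remove the smallest indentation from every line
    # (blank lines are ignored when measuring; this is an assumption).
    lines = text.split("\n")
    indents = [len(l) - len(l.lstrip(" ")) for l in lines if l.strip(" ")]
    cut = min(indents, default=0)
    return "\n".join(l[cut:] for l in lines)

raw = "\n  this is a long\n  string\n  "
print(trim_multiline(raw))
```

For the longString example above, this prints the two lines with no leading indentation.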

4.6.1. Null strings

A particularly thorny issue in multi-lingual string support involves the NUL character. In C strings, a NUL character (0 in ASCII) terminates the string, so it cannot appear inside one. Common C functions like strlen and strdup stop at the first NUL and silently truncate the data. R, which is built on C, makes within-string NUL characters strictly illegal. On the other hand, Python and C++ (through std::string) support NUL in strings. Even in these languages, problems can arise in the common case where string objects are converted to C strings (e.g., through direct access in C++ with the .c_str() method or through the C ABI in Python).

NULs are not common in strings. The main use case is storing binary data, which is not the recommended use of the Str type. It would generally be better to use [UInt8] to store bytes (or even better, Vector n UInt8, as will be introduced later). However, the Morloc philosophy is to support what is idiomatic in a language. The Str type is meant to represent the default string type in all languages (just as Int represents the default integer). So the Morloc Str type fully supports NULs. They may be present in string literals through the \0 escape, the Morloc evaluator preserves NULs end-to-end, and JSON represents them with the standard \u0000 escape.

Every language has an allow_string_null setting in its lang.yaml spec that specifies whether NULs are allowed. When a Str value with a NUL character is sent to a NUL-intolerant language, the Morloc runtime rejects the call with a clear error message:

Error: r does not support embedded NUL bytes in strings
       (at args[0] (byte 3 of 7))

A Str literal that is destined to live in source of a NUL-intolerant language (typically an R-sourced function whose argument is a literal NUL string) is rejected at compile time, since the pool would otherwise fail to parse the generated source. The error names the offending pool language.

Checking all strings for NUL characters, however, is expensive. You can opt out when safe in two ways:

  • morloc make --unsafe-skip-null-check bakes a per-program skip flag into the manifest.

  • The runtime env var MORLOC_SKIP_NULL_CHECK=1 skips the scan dynamically.

Both are unsafe: a NUL passing into R will still crash inside the R runtime, just with R’s own error rather than the morloc-level diagnostic. There is no way for user-written R code to do anything useful with a NUL string.
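
The cost of the check comes from scanning every byte of every string argument. A Python sketch of such a boundary scan is shown below; `check_nul` and its parameters are hypothetical, and the message format simply imitates the error shown above rather than reproducing Morloc's implementation.

```python
def check_nul(value: str, lang: str, allows_nul: bool, arg_index: int = 0) -> None:
    """Raise if `value` holds a NUL and the target language cannot accept one."""
    if allows_nul:
        return  # nothing to scan for NUL-tolerant targets
    data = value.encode("utf-8")
    pos = data.find(b"\x00")  # linear scan over the encoded bytes
    if pos >= 0:
        raise ValueError(
            f"{lang} does not support embedded NUL bytes in strings "
            f"(at args[{arg_index}] (byte {pos} of {len(data)}))"
        )

check_nul("abc\0def", "py", allows_nul=True)  # accepted
try:
    check_nul("abc\0def", "r", allows_nul=False)
except ValueError as err:
    print("Error:", err)
```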

4.7. Tuples and Lists

Two of the most common container types are tuples and lists. Tuples have a fixed size and may contain elements with different types. Lists have variable size but contain elements all of the same type.

Tuples and lists are both translated into JSON as arrays, so from the JSON alone it is impossible to tell whether a value such as [1,2,3] is a list of integers or a tuple of three integers.
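
Python's own JSON encoder behaves the same way, which makes the ambiguity easy to see. This is a plain Python illustration, independent of Morloc.

```python
import json

as_list = json.dumps([1, 2, 3])
as_tuple = json.dumps((1, 2, 3))

print(as_list, as_tuple, as_list == as_tuple)  # [1, 2, 3] [1, 2, 3] True
print(type(json.loads(as_tuple)).__name__)     # list: the tuple-ness is gone
```

Only the declared Morloc type tells the reader which of the two was meant.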

4.7.1. Tuples

Tuples may be used to store a fixed number of terms of different type.

x :: (Int, Bool, Real)
x = (1, True, 6.45)

Tuple types and tuple values are both represented as comma-delimited values within parentheses. The parenthesized type representation is syntactic sugar for a fixed-size tuple type such as Tuple3 or Tuple8; the parser generates the appropriate TupleN form from the number of fields, so there is no fixed upper bound on tuple arity. Generally, if you have more than a few members in a tuple, it is better to define a record type with named values.

4.7.2. Lists

Lists are homogeneous, variable-length sequences of values. The base list type is List a, which can be written as [a]:

x :: [Int]
x = [1, 2, 3]

ys :: List Real
ys = [1.0, 2.0, 3.0]

The default List type maps to each language’s natural ordered container (list in Python, std::vector in C++, list/vector in R).

While all list types share the same representation on the wire (zero or more elements in contiguous memory), there are several data structures for accessing this data, each with different performance tradeoffs. Deques, for example, can efficiently add elements to the beginning or end of the list. For an introduction to specializing types, see the section on type hierarchies.

A more rigorous and high-performance alternative to the List type is the Vector type, the 1D tensor, that is described in the Tensors chapter.

4.8. Records

A record is a named, fixed set of fields. Each field has a name and a type. Records may map to different structures in different languages (e.g., a Python dict, an R list, or a C++ struct); internally they are laid out positionally, but the surface language always binds field values by name.

A general record is defined as follows:

record Person = Person
    { name :: Str
    , age :: Int
    }

Concrete forms must have the same field names and field types as the general record, so these need not be repeated in the concrete definitions. We only need to specify the name of the container type in each language:

record Py => Person = "dict"
record R => Person = "list"
record Cpp => Person = "person_t"

In Python and R, records are typically dict and list types, respectively. These types can contain any fields of any type. In C++, records are represented as structs; these must be defined in the C++ code, as shown below.

struct person_t {
    std::string name;
    int age;
};

Functions may be defined that act on the records, as below:

import root-r
import root-py
import root-cpp

source R from "foo.R" ("incAge" as rinc)
source Py from "foo.py" ("incAge" as pinc)
source Cpp from "foo.hpp" ("incAge" as cinc)

-- Increment the person's age
rinc :: Person -> Person
pinc :: Person -> Person
cinc :: Person -> Person

4.8.1. Record literals match by field name

Field values in a record literal are bound to declared fields by name, not by position. The order in which fields appear in the literal is irrelevant. The two literals below denote the same value:

record Person = Person { name :: Str, age :: Int }

alice :: Person
alice = { name = "Alice", age = 30 }

alice2 :: Person
alice2 = { age = 30, name = "Alice" }   -- same value as `alice`

A literal must mention every declared field exactly once. Missing a declared field, naming a field that does not exist on the record, or repeating a field name is a compile-time error.

bad1 :: Person
bad1 = { name = "Alice" }  -- error: missing field 'age'

bad2 :: Person
bad2 = { name = "Alice", age = 30, weight = 65 }  -- error: unknown field 'weight'

bad3 :: Person
bad3 = { name = "Alice", name = "Bob", age = 30 }  -- error: duplicate field 'name'
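
The binding rule amounts to three checks followed by a reordering into declaration order. The Python sketch below is an assumed model of that rule, not the compiler's code; `bind_fields` is a hypothetical helper and a literal is represented as a list of (name, value) pairs so that duplicates can be expressed.

```python
PERSON_FIELDS = ("name", "age")  # declaration order of the Person record

def bind_fields(declared, pairs):
    """Bind (name, value) pairs to declared fields; return values in declared order."""
    seen = {}
    for key, value in pairs:
        if key not in declared:
            raise ValueError(f"unknown field '{key}'")
        if key in seen:
            raise ValueError(f"duplicate field '{key}'")
        seen[key] = value
    for key in declared:
        if key not in seen:
            raise ValueError(f"missing field '{key}'")
    return tuple(seen[key] for key in declared)

alice = bind_fields(PERSON_FIELDS, [("name", "Alice"), ("age", 30)])
alice2 = bind_fields(PERSON_FIELDS, [("age", 30), ("name", "Alice")])
print(alice == alice2)  # True
```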

4.8.2. Language-specific representation of records

The "foo.R" file contains the function:

incAge <- function(person){
    person$age <- person$age + 1
    person
}

No special code is needed for person; it is just a built-in R list. Similarly for Python:

def incAge(person):
    person["age"] += 1
    return person

C++ requires a definition of a person_t struct:

struct person_t {
    std::string name;
    int age;
};

person_t incAge(person_t person){
    person.age++;
    return person;
}

Records may be initialized and functions called on them:

foo name age
    = (rinc . pinc . cinc)
      { name = name, age = age }

foo, above, initializes a Person record and then increments its age three times, once in each language.

Warning Records may contain fields with arbitrarily complex types, but recursive types are not currently supported.

4.9. Pattern functions for data access/update

Data structures may be accessed and modified using pattern functions. These are dedicated accessor and update operators for reading and rearranging data structures. Patterns may be getters, which extract a value or a tuple of values from a data structure, or setters, which update a data structure without changing its type.

Getter patterns

A getter pattern describes an optionally branching path into a data structure. Each segment of the path may be a tuple index, a record key, or a group of indices/keys. The terminal positions in the pattern are returned as elements in a tuple. Here are a few examples:

-- return the 1st element in a tuple of any size
.0 (1,2) -- return 1
.0 ((1,3),2,5) -- return (1,3)

-- return the 2nd element in the first element of a tuple
.0.1 ((1,3),2,5) -- return 3

-- returns the 2nd and 1st elements in a tuple
.(.1,.0) (1,2,3) -- returns (2,1)
.(.1,.0) (1,2)   -- returns (2,1)

-- indices and keys may be used together
.0.(.x, .y.1) ({x=1, y=(1,2), z=3}, 6) -- returns (1,2)

These patterns are transformed into functions that may be used exactly like any other function.

map .1 [(1,2),(2,3)] -- returns [2,3]

Setter patterns

Setter patterns are similar but add an assignment statement to each pattern terminus.

.(.0 = 99) (1,2) -- return (99,2)

-- indices and keys may be used together
.0.(.x=99, .y.1=33) ({x=1, y=(1,2), z=3}, 6) -- returns ({x=99, y=(1,33), z=3}, 6)

Table 1. Comparison of patterns to Python syntax

Pattern      Python                   Note
.0           lambda x: x[0]           patterns are functions
.0 x         x[0]
.0.k x       x[0]["k"]
.(.1,.0) x   (x[1], x[0])
foo .0 xs    foo(lambda x: x[0], xs)  higher order
.(.k = 1) x  x["k"] = 1

Note that setters are designed not to mutate data. The spine of the data structure is copied, and the copy retains links to the original data for unmodified fields. So the expression .(.0 = 42) x, when translated into Python, creates a new tuple whose first field is 42 and whose remaining fields refer to the elements of the original tuple. The same goes for records.
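
In Python terms, a non-mutating setter could look like the sketch below. `set_index` and `set_key` are hypothetical helpers written for illustration; the code Morloc actually generates may differ.

```python
def set_index(t: tuple, i: int, value) -> tuple:
    """Return a copy of tuple t with slot i replaced; other slots are shared."""
    return t[:i] + (value,) + t[i + 1:]

def set_key(d: dict, key, value) -> dict:
    """Return a shallow copy of record d with one field replaced."""
    return {**d, key: value}

old = ([1, 2], "keep", 3)
new = set_index(old, 2, 42)

print(old)               # ([1, 2], 'keep', 3): the original is untouched
print(new)               # ([1, 2], 'keep', 42)
print(new[0] is old[0])  # True: unmodified fields are shared, not copied
```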

4.10. where and let clauses

Functions may use where clauses to define local bindings:

f x = y + b where
    y = x + 1
    b = 41

Where clauses inherit the scope of their parent and may be nested:

f = x where
    x = y where
        y = a + b
        a = 1
    b = 41

In a where clause, bindings can refer to the function’s arguments (from the left-hand side) and can be used in the main expression (the right-hand side). The bindings in a where block are order-independent and may refer to each other freely (though must not be mutually recursive).

let is the more orderly cousin of where. There may be multiple let assignments before the terminal in. These are guaranteed to be executed in order and may only refer to terms bound above them.

f n =
  let m = n + 1
      y = m + 2
  in (m + y)

4.10.1. Scope rules: let shadows, where does not

The two forms differ in how they treat repeated names. let is non-recursive sequential: each binding is in scope for everything that follows, and a later binding can shadow an earlier one of the same name. where is order-invariant: every binding sees every other, so a name can only be bound once in a single clause and cannot collide with a function parameter.

This code is legal:

-- chain of single-binding lets
foo = let x = 1
      let x = 2
       in x       -- evaluates to 2

Whereas similar patterns in a where clause are rejected at compile time:

-- error: duplicate binding in where-clause: y
g n = y where
  y = n + 1
  y = n + 2

-- error: where-clause binding shadows function parameter: x
g x = y where
  x = 100
  y = x + 1

4.11. Conditionals

Guards provide conditional branching. Each guard clause begins with ? followed by a condition and a result expression, with a : default that is always required as the final case:

abs :: Int -> Int
abs x
  ? x >= 0 = x
  : neg x

Guards are evaluated lazily from top to bottom. The first condition that evaluates to true determines the result; remaining guards are not evaluated. The : default always terminates the guard chain, ensuring exhaustiveness.

Guards work naturally with multiple parameters:

clamp :: Int -> Int -> Int -> Int
clamp lo hi x
  ? x < lo = lo
  ? x > hi = hi
  : x
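
The top-to-bottom, first-match evaluation can be modelled in Python by wrapping each condition and result in a thunk. This is only an analogy for the semantics described above; `guard` is a hypothetical helper, not what the compiler emits.

```python
def guard(clauses, default):
    """Return the result of the first clause whose condition holds, else the default."""
    for condition, result in clauses:
        if condition():          # conditions are tried in order
            return result()      # later clauses are never evaluated
    return default()

def clamp(lo, hi, x):
    return guard(
        [(lambda: x < lo, lambda: lo),
         (lambda: x > hi, lambda: hi)],
        lambda: x,
    )

print(clamp(0, 10, -5), clamp(0, 10, 15), clamp(0, 10, 7))  # 0 10 7
```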

Guards can be combined with where clauses to define local bindings used in the conditions and result expressions:

classify :: Int -> Str
classify x
  ? x > big = "big"
  ? x > small = "medium"
  : "small"
  where
    big = 100
    small = 10

Guards may appear inside let bindings:

absLet :: Int -> Int
absLet x =
  let result ? x >= 0 = x
             : neg x
  in result

Guards can also be used as inline expressions anywhere a value is expected. The parentheses are required only when the inline guard appears inside a larger expression (as a function argument, inside a list, on the right of an operator, etc.); at top-level assignment position they may be omitted:

classify :: Int -> Str
classify x
  ? x > 100 = "big"
  ? x > 10 = "medium"
  : "small"

-- but inside a larger expression, the parentheses are needed
labelOf :: Int -> Str
labelOf x = "label: " <> (? x > 0 = "pos" : "non-pos")

4.12. Recursion

4.12.1. Recursive functions

Morloc supports recursive function definitions. A function may refer to itself in its body, and the compiler will generate the appropriate code in the target language.

The classic factorial function can be written using guards and self-reference:

fact :: Int -> Int
fact n
  ? n == 0 = 1
  : n * fact (n - 1)

Functions may also be mutually recursive. The following pair of functions determines (rather inefficiently) whether a number is even or odd:

isEven :: Int -> Bool
isEven n
  ? n == 0 = True
  : isOdd (n - 1)

isOdd :: Int -> Bool
isOdd n
  ? n == 0 = False
  : isEven (n - 1)

Caution

Recursion is not well supported across all target languages. Some languages impose recursion depth limits or lack tail-call optimization, which can cause crashes or stack overflows for deep recursion.
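
Python is one such language: it has no tail-call optimization and enforces a configurable recursion limit. The sketch below contrasts a direct translation of fact with a loop that does not consume stack; both functions are illustrative and are not generated by Morloc.

```python
import sys

def fact_rec(n: int) -> int:
    """Direct translation of the recursive definition; depth grows with n."""
    return 1 if n == 0 else n * fact_rec(n - 1)

def fact_iter(n: int) -> int:
    """Loop-based equivalent that uses constant stack depth."""
    acc = 1
    for k in range(2, n + 1):
        acc *= k
    return acc

print(sys.getrecursionlimit())         # the interpreter's current depth limit
print(fact_rec(100) == fact_iter(100)) # True
print(fact_iter(5000) > 0)             # True
```

A call such as fact_rec(5000) would typically raise RecursionError under the default limit, whereas the loop handles it without trouble.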

4.12.2. Recursive types

A morloc type is recursive when its definition refers to itself. To terminate, the recursion must be guarded — every cycle through the definition must pass under an ?T (optional, with Null as the base case) or [T] (list, with [] as the base case). Bare self-reference like type X = X is rejected at compile time, as are cycles spanning multiple type definitions.

The examples below assume one stdlib import for working with optional values:

import maybe-py (require, isNull)

isNull tests whether an optional value is absent; require asserts it is present and strips the ?.

Linked lists

The canonical example is a linked list: a payload paired with an optional tail of the same type. Once a value reaches Null in the tail slot, the chain terminates.

type LL a = (a, ?(LL a))

A literal value of this type can be written directly, with Null in the tail slot at the end of the chain:

llExample :: LL Int
llExample = (42, (7, (99, Null)))

A builder generating a descending range:

llRange :: Int -> LL Int
llRange n ? n > 0 = (n, llRange (n - 1))
          : (0, Null)

The recursive call returns LL Int, but the second slot expects ?(LL Int); the typechecker’s element-wise coercion (a → ?a) handles that automatically. The base case writes Null directly into the optional slot.

A consumer that counts the chain length. The selectors .0 x and .1 x retrieve the first and second tuple slots:

llLen :: LL Int -> Int
llLen x ? isNull (.1 x) = 1
        : 1 + llLen (require (.1 x))

A consumer that sums the payloads, with the same recursion shape:

llSum :: LL Int -> Int
llSum x ? isNull (.1 x) = .0 x
        : (.0 x) + llSum (require (.1 x))
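
For readers more familiar with a host language, the same shape can be written in Python with nested tuples and None standing in for Null. This is an analogy to show the recursion pattern, not the code Morloc generates.

```python
def ll_range(n):
    """Descending chain n, n-1, ..., 0 as nested (payload, tail) tuples."""
    return (n, ll_range(n - 1)) if n > 0 else (0, None)

def ll_len(x):
    return 1 if x[1] is None else 1 + ll_len(x[1])

def ll_sum(x):
    return x[0] if x[1] is None else x[0] + ll_sum(x[1])

chain = ll_range(3)
print(chain)                         # (3, (2, (1, (0, None))))
print(ll_len(chain), ll_sum(chain))  # 4 6
```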

Branching: binary trees

The ? guard can appear more than once in a body. A binary tree node carries a payload and two independently optional children, so each node may have zero, one, or two subtrees.

type BTree a = (a, ?(BTree a), ?(BTree a))

A literal binary tree with a root and two leaves:

btreeExample :: BTree Int
btreeExample = (10, (5, Null, Null), (15, Null, Null))

A balanced builder that shares its subtree value via let:

btreeBuild :: Int -> BTree Int
btreeBuild d ? d <= 0 = (1, Null, Null)
             : let sub = btreeBuild (d - 1)
               in (0, sub, sub)

To sum every payload, factor out the optional-handling into a small helper so the recursive function reads as the structural recursion it is:

btreeSum :: BTree Int -> Int
btreeSum x = .0 x + maybeSum (.1 x) + maybeSum (.2 x)

maybeSum :: ?(BTree Int) -> Int
maybeSum m ? isNull m = 0
           : btreeSum (require m)

List-guarded recursion: rose trees

The other allowed guard is [T]. An empty list [] is the natural base case, and arbitrary branching is encoded directly as a list of children rather than a fixed number of optional slots.

type Rose a = (a, [Rose a])

A literal rose tree with a root and two leaf children:

roseExample :: Rose Int
roseExample = (1, [(2, []), (3, [])])

A builder that produces a complete binary rose tree of a given depth (each node has either two or zero children):

roseBuild :: Int -> Rose Int
roseBuild d ? d <= 0 = (1, [])
            : let sub = roseBuild (d - 1)
              in (0, [sub, sub])

Summing the payloads of a rose tree uses the Foldable instance for List (via fold from the stdlib) to combine the children’s sums:

roseSum :: Rose Int -> Int
roseSum x = .0 x + fold (\acc child -> acc + roseSum child) 0 (.1 x)

Record form

The same recursion rules apply to record declarations. The only surface difference is that fields are addressed by name rather than position; the wire format and the typecheck rules are identical to the tuple-alias form.

record LL where
  head :: Int
  tail :: ?LL

A literal of the record form uses {…​} syntax with named fields:

llRecordExample :: LL
llRecordExample = {head = 42, tail = {head = 7, tail = Null}}

The length consumer is the same shape as for the tuple-form LL, with .tail x in place of .1 x:

llLen :: LL -> Int
llLen x ? isNull (.tail x) = 1
        : 1 + llLen (require (.tail x))

Parameterised recursion

Recursive types can carry type parameters. The parameter threads through every recursive position; the consumer below stays polymorphic in the payload.

record Container a where
  val :: a
  sub :: ?(Container a)

A literal Container Int:

containerExample :: Container Int
containerExample = {val = 1, sub = {val = 2, sub = Null}}

A polymorphic consumer:

containerLength :: Container a -> Int
containerLength x ? isNull (.sub x) = 1
                  : 1 + containerLength (require (.sub x))

Caution

Mutually recursive type aliases — two or more type definitions that reference each other in a cycle — are not supported. The compiler detects these at the frontend and rejects them with a clear error, across both general and language-specific scopes. The same rule applies whether the cycle lives in one module or spans several.

4.13. Effects and delayed evaluation

4.13.1. Why effects need a name

Morloc is a functional language. A function maps a value in one domain to a value in another, and the mapping is the function’s whole meaning. That works neatly for arithmetic, for string manipulation, for transforming records. It runs into trouble the moment we try to talk about anything that touches the world.

Consider readFile:

readFile :: Str -> Str

This looks like a function from a filename to a string. Indeed it is a function at any given instant on a given machine: the filename names a particular file, and the file has particular contents. But files change. If we read the same file twice in the same program, we may get two different answers. So it matters when we call the function, and we may want to call it at several different points in time.

The same problem shows up for "values" that are not really values. What is the type of the current time? What is the type of a coin toss?

time     :: ???
coinToss :: ???

We could try to make them into honest functions by handing them an explicit world or an explicit random seed — time :: TemporalState → Time, coinToss :: RNG -> (Bool, RNG) — and thread the state through every call that needs it. This can work, but it pulls extra plumbing into every signature.
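To make the plumbing cost concrete, here is a minimal Python sketch of the explicit-state style. The generator is a small linear congruential generator chosen only for illustration; coin_toss, two_tosses and the constants are hypothetical and are not Morloc library functions.

```python
# Parameters of a simple linear congruential generator (illustrative values).
LCG_A = 1103515245
LCG_C = 12345
LCG_M = 2**31


def coin_toss(rng):
    """Pure coin toss: takes a generator state, returns (result, next state)."""
    nxt = (LCG_A * rng + LCG_C) % LCG_M
    # Use a higher bit; the low bits of a power-of-two LCG are weak.
    return ((nxt >> 16) & 1) == 1, nxt


def two_tosses(rng):
    """Every caller must accept the state and hand the new state back."""
    first, rng1 = coin_toss(rng)
    second, rng2 = coin_toss(rng1)
    return (first, second), rng2
```

Each function is honest about its inputs, but the state argument and the returned state appear in every signature along the call chain.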

Morloc takes a different route: it gives the effect a name at the type level. <Rand> Bool is not a Bool; it is a suspended computation that, when run, performs the Rand effect and yields a Bool. Where the original problem was "this looks like a value but doesn’t act like one", the solution is to give it a type that says so.

Note <E> T is a suspended computation that performs effects E and yields a T. It is not a T. You obtain a T by running it.

4.13.2. The mental model

  • <E> T is a suspension. Holding one in a variable does nothing.

  • The bind arrow <- runs a suspension once and gives you a result.

  • A bare statement inside a do-block runs a suspension and throws the result away. This is how you sequence side effects whose return values you do not need.

  • let binds without running. If the right-hand side is effectful, the suspension is what gets bound; it only fires when a later <- reaches for it.

  • When you export an <E> T, the compiled program runs it for you at the boundary. The caller receives a T.

Effect labels are names that the compiler propagates and checks for coverage. What an effect means — what Error represents, what IO permits at runtime, what Rand looks like operationally — is the business of the library that defines the effect, not the compiler. The compiler’s job is to keep the labels honest; libraries build behaviour on top.

4.13.3. Syntax

Declaring an effect

Every effect label a program uses must be declared:

effect IO
escapable effect Error

The default form is inescapable; the escapable form is discussed in Escapable and inescapable effects. Declarations are global to the program; two modules cannot declare the same label with conflicting escapability.

A <L> that has not been declared is a compile error — the compiler does not know any effect names of its own.

Annotating signatures

An effect annotation goes immediately before the type it wraps:

readFile  :: Path -> <IO> Str
rollDie   :: Int -> <Rand> Int
riskyRead :: Path -> <IO, Error> Str

Multiple labels are comma-separated inside a single pair of angle brackets. Order does not matter; <IO, Error> and <Error, IO> are the same row.

The empty row <> is the identity: <> T is definitionally equal to T. You do not write it; the point is that a pure value satisfies any effect slot, because the empty row is a subset of every row (see The rules).

do-blocks

A do-block strings statements together. It is the only construct in which effects are actually run. Inside a block there are exactly three forms of statement:

Form         Meaning
x <- e       Run e, bind the result to x.
e (bare)     Run e, discard the result.
let x = e    Bind x to e without running anything.

The final statement of a do-block is its return value. The block’s overall type is <U> T, where U is the union of all the statements' effects and T is the type of the final statement.

A worked example covering every form:

sideEffect :: Int -> <IO> Int
add        :: Int -> Int -> Int

example :: <IO> Int
example = do
    let t = sideEffect 3     -- t :: <IO> Int, NOT run
    sideEffect 1              -- runs, result discarded
    x <- sideEffect 5         -- runs, x = 10
    let y = add x 1           -- y = 11, no run
    z <- t                    -- NOW t runs; z = 6
    add y z                   -- returns 17

Both layout-indented form (as above) and brace form (do { x ← e; y ← f; …​ }) are accepted.
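One way to build intuition for the three statement forms is to model a suspension as a zero-argument Python function. This is only a mental-model sketch, not how the compiler implements effects. It assumes, as the comments in the worked example imply, that sideEffect doubles its argument; the log list is a hypothetical stand-in for the observable effect.

```python
log = []  # records the order in which effects actually fire


def side_effect(n):
    """Return a suspension; nothing happens until it is called."""
    def run():
        log.append(n)      # the observable effect
        return 2 * n       # assumed behaviour: doubles its argument
    return run


def add(a, b):
    return a + b


def example():
    t = side_effect(3)     # let t = sideEffect 3   (bound, not run)
    side_effect(1)()       # sideEffect 1           (run, result discarded)
    x = side_effect(5)()   # x <- sideEffect 5      (run, x = 10)
    y = add(x, 1)          # let y = add x 1        (pure, y = 11)
    z = t()                # z <- t                 (t runs now, z = 6)
    return add(y, z)       # the final statement is the result


result = example()
```

After the call, result is 17 and log is [1, 5, 3]: the suspension bound with let was created first but fired last, at the point where it was bound with <-.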

When do is needed and when it isn’t

A do-block is not always required. A single effectful expression stands on its own:

forceOnce :: <IO> Int
forceOnce = sideEffect 5

Use a do-block when you need to sequence multiple statements, bind intermediate results, or run a suspension for its effects only. A do-block is itself an expression, so it can appear as an argument:

handle (do
    v <- foo x
    v)

4.13.4. The rules

The whole type-checking story for effects is four rules.

  1. Empty rows vanish. <> T == T. A pure T satisfies any <E> T slot; the empty row is a subset of every row.

  2. More effects are a supertype of fewer. <E1> T <: <E2> T exactly when the concrete labels of E1 are a subset of E2. A <IO> Int is usable where <IO, Error> Int is expected; the reverse is not.

  3. Effects don’t leak silently. A value of type <E> T with non-empty E cannot be assigned to a slot of type T, nor to a bare type variable with no effect annotation. If you intend the effect to escape, you say so in the type.

  4. A do-block collects. Its row is the union of its statements' rows; its type is <that-union> T, where T is the type of its final statement.

A few illustrations:

-- Rule 1: pure is a subtype of <IO>
pureFortyTwo :: <IO> Int
pureFortyTwo = 42                  -- OK

-- Rule 2: widening is fine
ioFunc      :: <IO> Int
testSubtype :: <IO, Error> Int
testSubtype = do
  x <- ioFunc
  x                                -- OK: <IO> <: <IO, Error>

-- Rule 2: narrowing is rejected
rint :: <IO, Error> Int
a    :: <IO> Int
a = rint                           -- ERROR: Error not in <IO>

-- Rule 3: effects can't be dropped into a pure slot
rint :: <IO> Int
b    :: Int
b = rint                           -- ERROR: <IO> Int is not Int

-- Rule 4: the union of statements' effects
readValue   :: <IO> Int
riskyDouble :: Int -> <Error> Int

combined :: <IO, Error> Int
combined = do
  x <- readValue                   -- contributes <IO>
  y <- riskyDouble x               -- contributes <Error>
  y

There is one more guarantee the user sees but does not write down: an exported <E> T is forced automatically at the boundary. The compiled program’s user receives a T. Effects do not escape the binary.
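The four rules can be modelled with ordinary set operations. The Python sketch below represents an effect row as a frozenset of label names. It is a toy model for checking one's understanding, not the compiler's algorithm; the function names are hypothetical, and row variables are not modelled.

```python
PURE = frozenset()                      # Rule 1: the empty row


def is_subrow(e1, e2):
    """Rule 2: <e1> T is usable where <e2> T is expected iff e1 is a subset of e2."""
    return e1 <= e2


def can_assign_to_pure(e):
    """Rule 3: only an empty row fits a slot of plain type T."""
    return e == PURE


def do_block_row(*statement_rows):
    """Rule 4: a do-block's row is the union of its statements' rows."""
    return frozenset().union(*statement_rows)


IO = frozenset({"IO"})
ERROR = frozenset({"Error"})
IO_ERROR = IO | ERROR
```

With these definitions, is_subrow(IO, IO_ERROR) holds while is_subrow(IO_ERROR, IO) does not, and do_block_row(IO, ERROR) equals IO_ERROR, matching the widening, narrowing and combined illustrations above.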

4.13.5. Effect row variables

Combinators that thread effects need to be able to talk about sets of unknown effects. For that, an effect row may include a single lowercase variable that represents a set of zero or more unknown effects.

The function mapE, below, carries all the effects of the mapping function to the final value:

mapE :: (a -> <e> b) -> [a] -> <e> [b]

Effect variables and constants may be mixed, but there may be no more than one effect variable associated with a given term.

In the following code, the signature states the requirement that function f produce a random effect, but allows other effects to be produced as well.

foo :: (Int -> <Rand,e> Int) -> Int -> <Rand,e> Int
foo f x = do
  y <- f x
  y * 2

If the programmer had instead specified the effect <Rand>, then only the Rand effect would be allowed and any additional effects would raise an error in the typechecker.

4.13.6. Escapable and inescapable effects

The default form effect E is inescapable. An inescapable effect that appears in a function’s arguments must also appear in its result. The compiler enforces this on every signature, sourced or defined.

effect Cap                         -- inescapable

passt :: <Cap> Int -> <Cap> Int    -- OK: Cap propagates
bad   :: <Cap, e> a -> <e> a       -- ERROR: Cap dropped from result

Alternatively, an effect may be explicitly declared escapable, in which case functions can be sourced to handle (discharge) the effect.

escapable effect Error

source Py from "ops.py" ("foo", "handle_error" as handleError)

foo :: Int -> <Error> Int
-- a handler that discharges Error and lets every other effect through
handleError :: <Error, e> a -> <e> a

-- four equivalent ways to write the same call
gDirect, gDoWrap, gDoForce, gDoBind :: Int -> Int

gDirect  x = handleError (foo x)
gDoWrap  x = do { handleError (foo x) }
gDoForce x = handleError (do { v <- foo x ; v })
gDoBind  x = handleError (do { foo x })

4.14. Optional types

All programming languages must have a way to deal with missing values. If you query a database for a record that doesn’t exist, what is returned? If a parameter is not set, what value does it have? In Python, the value None represents a missing value. In R, NULL serves a similar purpose. In both languages, types that may lack values are represented as a union of the original type and the null type. JSON, similarly, stores missing values as null. Other languages solve this problem in libraries. C++ provides the standard library type std::optional<T> (since C++17) for representing values of generic type T that may be missing.

One of the core principles of Morloc is that sourced functions should be idiomatic. So Morloc needs a built-in mechanism that can vary freely in its language-specific implementation while preserving consistency between languages. To this end, Morloc offers a dedicated "optional" type that supports implicit coercion.

4.14.1. Syntax

The ? prefix marks a type as optional and the Null constructor indicates an absent value. ?Int is an integer that might be Null, ?Str is a string that might be Null, and so on. The ? prefix can be applied to any type, including lists (?[Int]) and records (?Person).

--' Get the first element from a list or empty on failure
safeHead :: [Int] -> ?Int
require :: a -> ?a -> a

The Null constructor represents an absent value:

testNull :: ?Int
testNull = Null

Note Morloc uses Null (capitalized) in source code, following the convention that constructors begin with an uppercase letter (consistent with the booleans True and False). In JSON output, absent optionals are serialized as lowercase null per the JSON standard.

4.14.2. Working with optional values

Functions that produce or consume optional values are sourced from foreign languages like any other function. Here is a complete example in Python:

module main (testSafeHead, testSafeHeadEmpty, testFromNull)

import root-py

safeHead :: [Int] -> ?Int
safeHead xs
  ? length xs == 0 = Null
  : at 0 xs

source Py from "main.py" ("require")
require :: a -> ?a -> a

testSafeHead :: ?Int
testSafeHead = safeHead [10, 20, 30]

testSafeHeadEmpty :: ?Int
testSafeHeadEmpty = safeHead []

testFromNull :: Int
testFromNull = require 0 Null

The Python implementation handles None in the usual way:

def require(default_val, x):
    if x is None:
        return default_val
    return x

Running this program gives:

$ ./main testSafeHead
10
$ ./main testSafeHeadEmpty

$ ./main --keep-null testSafeHeadEmpty
null
$ ./main testFromNull
0

Note When the top-level result of an exported function is Null (or ()), the nexus prints an empty line rather than the literal string null. The reason is that printing null or () would be noisy in a CLI tool and could cause problems if a trailing null is mixed into STDOUT (a downstream consumer that ingested a stray null line could choke or, worse, treat it as a valid record). Pass --keep-null to the nexus if you want the literal null emitted instead, as shown above.

The same pattern works in C++ (using std::optional) and R (using NULL). In C++:

#include <optional>

template <class T>
T require(T default_val, const std::optional<T>& x) {
  if(x.has_value()){
    return x.value();
  } else {
    return default_val;
  }
}

In R:

require <- function(default_val, x){
  if(is.null(x)){
    return(default_val)
  } else {
    return(x)
  }
}

4.14.3. Optional record fields

Record fields can be optional as well. This is useful for data with missing or unknown values. The where form below is an alternative syntax for record declarations (equivalent to the brace syntax used in the Records section):

record Person where
  name :: Str
  age :: ?Int
record Py => Person = "dict"

makePerson :: Str -> ?Int -> Person
source Py from "foo.py" ("makePerson")

alice :: Person
alice = makePerson "Alice" 30

bob :: Person
bob = makePerson "Bob" Null

When serialized to JSON, alice becomes {"name":"Alice","age":30} and the age field of bob becomes null.
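The Python side of makePerson is not shown above. A plausible sketch, assuming the "dict" mapping declared for Person means the function returns a plain dictionary keyed by field name, could look like this. The JSON step uses Python's json module only to illustrate the serialized shape; it is not the serializer that Morloc generates.

```python
import json


def makePerson(name, age):
    """Build a Person record as a dict; age may be None when unknown."""
    return {"name": name, "age": age}


alice = makePerson("Alice", 30)
bob = makePerson("Bob", None)

# Compact separators reproduce the spacing used in the text above.
alice_json = json.dumps(alice, separators=(",", ":"))
bob_json = json.dumps(bob, separators=(",", ":"))
```

Here alice_json is {"name":"Alice","age":30} and bob_json is {"name":"Bob","age":null}.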

4.14.4. Optional values across languages

Optional types work seamlessly across language boundaries. A function in one language can produce an optional value that is consumed by a function in another:

-- C++ produces an optional value
cSafeDiv :: Int -> Int -> ?Int
source Cpp from "foo.hpp" ("cSafeDiv")

-- Python consumes it
pFromNull :: Int -> ?Int -> Int
source Py from "foo.py" ("pFromNull")

-- Chain them together: C++ to Python
testCppToPy :: Int
testCppToPy = pFromNull (-1) (cSafeDiv 10 3)

testCppToPyNull :: Int
testCppToPyNull = pFromNull (-1) (cSafeDiv 10 0)

The Morloc compiler generates the necessary serialization code at each language boundary. A null value in C++ (std::nullopt) is serialized as JSON null, which Python reads as None. The programmer does not need to handle the interop manually.
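Neither foreign implementation is shown above. The Python sketch below gives a plausible pFromNull and imitates the boundary with a JSON round trip, following the description in the preceding paragraph. safe_div_stand_in is a hypothetical Python stand-in for the C++ cSafeDiv (assumed to return no value on division by zero), and across_boundary is an illustrative helper, not generated Morloc code.

```python
import json


def pFromNull(default_val, x):
    """Return x unless it is None, in which case return the default."""
    return default_val if x is None else x


def safe_div_stand_in(a, b):
    """Stand-in for the C++ producer: None plays the role of std::nullopt."""
    if b == 0:
        return None
    # Floor division; agrees with C++ integer division for non-negative operands.
    return a // b


def across_boundary(value):
    """Imitate serialization at a language boundary with a JSON round trip."""
    return json.loads(json.dumps(value))


test_cpp_to_py = pFromNull(-1, across_boundary(safe_div_stand_in(10, 3)))
test_cpp_to_py_null = pFromNull(-1, across_boundary(safe_div_stand_in(10, 0)))
```

Under these assumptions the first value is 3 and the second falls back to the default, -1.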

4.14.5. Implicit coercion

Morloc automatically coerces a non-optional value to an optional when the context requires it. If a function expects ?Int, you can pass a plain Int without wrapping it:

addOpt :: ?Int -> ?Int -> ?Int
source Py from "foo.py" ("addOpt")

-- Both arguments are plain Int, coerced to ?Int automatically
testCoerceAddOpt :: ?Int
testCoerceAddOpt = addOpt 3 4

require :: a -> ?a -> a
source Py from "foo.py" ("require")

-- The second argument (42) is Int, coerced to ?Int
testCoerceArg :: Int
testCoerceArg = require 0 42

Coercion to ?a fires whenever the context requires an optional value; the plain value flows through without a wrapper at the call site.
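The Python source of addOpt is not shown. One plausible implementation, assuming it should return Null when either operand is absent (an assumption; the text does not state its behaviour), shows what coercion means on the foreign side: a coerced Int arrives as an ordinary int, with no wrapper object to unpack.

```python
def addOpt(x, y):
    """Add two optional ints; the result is None if either is None."""
    if x is None or y is None:
        return None
    return x + y


def require(default_val, x):
    """Same helper as earlier in this section: unwrap with a default."""
    return default_val if x is None else x
```

Both addOpt(3, 4) and require(0, 42) receive plain integers and need no special handling for the present case.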

Coercion also works across language boundaries. If a C++ function returns Int and a Python function expects ?Int, the compiler inserts the appropriate serialization so that the value is received correctly:

-- C++ returns a plain Int
cAddOne :: Int -> Int
source Cpp from "cfoo.hpp" ("cAddOne")

-- Python expects ?Int in the second argument
pUnwrapOr :: a -> ?a -> a
source Py from "pfoo.py" ("pUnwrapOr")

-- The Int result from C++ is coerced to ?Int for Python
testCppIntToPyOpt :: Int
testCppIntToPyOpt = pUnwrapOr 0 (cAddOne 41)  -- returns 42

4.14.6. Nested optionals are idempotent

?(?T) is accepted by the parser and the typechecker, but at runtime it collapses to a single ?T. There is one Null, with no way to distinguish "outer Null" from "inner Null". This is by design.

The reason is rooted in why ? is a dedicated language primitive rather than a library construct like C++'s std::optional. ? must lower to the native "missing value" of every target language: None in Python, NULL in R, std::optional<T> in C++, and so on. In Python and R, and in most dynamic languages, the null value is structureless. There is no language mechanism for distinguishing an "outer None" from an "inner None": both are the same singleton atom. If Morloc allowed two distinguishable null levels, the semantics would diverge across backends (C++ could fake it with nested std::optional, but Python and R fundamentally cannot), which would break the portability guarantee that ? is meant to provide.

So ? is intentionally idempotent across the language: ?T, ?(?T), and ?(?(?T)) all serialize to the same wire format and the same runtime representation in every backend.
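Python's own type hints collapse in the same way, which makes the point easy to check. The snippet below uses only the standard typing and json modules; it illustrates the target-language behaviour described above and says nothing about Morloc internals.

```python
import json
from typing import Optional

# Nested Optional annotations flatten to a single level.
nested_equals_flat = Optional[Optional[int]] == Optional[int]

# There is exactly one None object, so "inner" and "outer" are the same value.
inner_none = None
outer_none = None
same_object = inner_none is outer_none

# JSON has a single null as well; it decodes to that one None.
decoded = json.loads("null")
```

All three checks hold: the annotations compare equal, the two names refer to the same object, and the decoded value is None.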

If you genuinely need layered nullability — for example, distinguishing "the database lookup failed" from "the lookup succeeded but the field was unset" — define a custom type that encodes the distinction explicitly.

-- ? collapses, so both of these are the same value at runtime
collapsed1 :: ?(?Int)
collapsed1 = Null

collapsed2 :: ?(?Int)
collapsed2 = 7   -- treated identically to (7 :: ?Int)

-- For layered failure modes, build a custom type instead.
record LookupResult = LookupResult
  { tableMissing :: Bool   -- step 1 failure
  , fieldMissing :: Bool   -- step 2 failure
  , value :: ?Int          -- present when both succeeded
  }

Note Sum types (tagged unions like data Result = Found Int | Missing) are planned but not yet supported in Morloc. Their cross-language design is the open problem: not every backend has a first-class sum representation.

4.15. Intrinsics

Intrinsics are compiler-generated special functions. They are prefixed with @ and provide access to the Morloc runtime.

4.15.1. Reference table

Intrinsic   Signature            Description
@save       a -> Str -> <IO> ()  Save a value to file in flat binary format (fast, minimal overhead)
@savem      a -> Str -> <IO> ()  Save a value to file in MessagePack format (portable, compact)
@savej      a -> Str -> <IO> ()  Save a value to file as plain JSON text (human-readable)
@load       Str -> <IO> ?a       Load a value from file, auto-detecting the format. Returns null if the file does not exist.
@hash       a -> Str             Hash a value via MessagePack serialization (xxhash), returns a 16-character hex string
@version    Str                  The compiler version string (resolved at compile time)
@compiled   Str                  The compilation timestamp (resolved at compile time)
@lang       Str                  The canonical language identifier of the pool where the expression is evaluated: the name field from lang.yaml ("py", "cpp", "r", …; "morloc" at the nexus level)
@datafile   Str -> Str           Resolve a relative path to the installed data file location (resolved at compile time)
@schema     a -> Str             The serialization schema string for the given type
@typeof     a -> Str             The morloc abstract type name for the given type, e.g. "Int", "[Str]", "?Real", "(Int, Str)"

Several intrinsics are polymorphic in their data argument: @save, @savem, @savej, @hash, @schema, and @typeof accept a value of any type. @load returns a value of any type, inferred from context. The @save/@savem/@savej functions return <IO> () because they perform I/O as a side effect. The remaining intrinsics (@version, @compiled, @lang, @datafile) operate on or return plain strings and are resolved at compile time.

4.15.2. Hashing

@hash computes a fast, non-cryptographic hash (xxhash) of any value. The value is first serialized to MessagePack internally, then hashed. The result is a 16-character hexadecimal string.

module main (hashInt, hashStr)

import root-py (id)

hashInt :: Int -> Str
hashInt x = @hash (id x)

hashStr :: Str -> Str
hashStr x = @hash (id x)

Hashing is deterministic: the same value always produces the same hash. Two values of different types may hash differently even if they look similar (e.g., the integer 1 and the string "1"), because their MessagePack serializations differ.
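The last point can be checked by hand. A typical MessagePack encoder writes the integer 1 as the single byte 0x01 (a positive fixint) and the string "1" as 0xa1 0x31 (a fixstr header of length 1 followed by the UTF-8 byte). The Python sketch below hard-codes those two encodings and hashes them. The standard library has no xxhash, so BLAKE2b with an 8-byte digest is used purely as a stand-in that also yields 16 hex characters; the digests will not match what @hash returns.

```python
import hashlib

MSGPACK_INT_1 = bytes([0x01])          # positive fixint 1
MSGPACK_STR_1 = bytes([0xA1, 0x31])    # fixstr of length 1, then "1"


def stand_in_hash(payload):
    """Hash serialized bytes to 16 hex characters (BLAKE2b, not xxhash)."""
    return hashlib.blake2b(payload, digest_size=8).hexdigest()


hash_int = stand_in_hash(MSGPACK_INT_1)
hash_str = stand_in_hash(MSGPACK_STR_1)
```

The two inputs differ at the byte level, so the two digests differ, while hashing the same bytes twice gives the same digest.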

4.15.3. Compile-time constants

The @version, @compiled, and @lang intrinsics are resolved at compile time. They can be used anywhere a Str value is expected.

module main (info)

import root-py (id)

info :: [Str]
info = id [@version, @compiled, @lang]

Running ./info info might produce:

["0.85.0", "<compile-timestamp>", "morloc"]

The @lang value depends on where the expression is evaluated. When the list literal above is assembled at the nexus level (not inside a sourced function), @lang resolves to "morloc". To observe the language-pool identifier, pass @lang into a sourced function and let it be evaluated inside that pool: the value will be that pool’s canonical language identifier — the name field from its lang.yaml ("py", "cpp", "r", …​).

@lang deliberately returns this short canonical identifier, not a human-facing display name like "Python3" or "C++". Intrinsics are low-level primitives where stability outweighs presentation: the lang.yaml name is the guaranteed-unique, stable identifier for a language backend, so it is the correct value for conditional logic and tooling. Map it to a prettier label yourself if you need one.

4.15.4. Saving and loading data

The @save, @savem, and @savej intrinsics write a value to a file path. @load reads it back. Together they provide a type-safe file persistence mechanism.

@save uses the flat binary format, which is the fastest option — the value’s in-memory representation is written to disk with minimal serialization overhead (no text encoding or schema parsing, only pointer translation). @savem uses MessagePack, which is compact and portable across different machines and architectures. @savej writes plain JSON, which is human-readable and can be edited by hand or consumed by other tools.

@load auto-detects the file format. Files written by @save or @savem carry a small header that @load uses to distinguish the binary and MessagePack formats. If no header is present, @load tries to parse the file as JSON. This means @load can read files written by any of the three save intrinsics, and it can also read plain JSON files that were created outside of Morloc.

Since @load returns <IO> ?a, it is an effectful computation that yields an optional value. If the file does not exist, the result is null rather than an error. This makes it natural to use @load for optional configuration or cached data.

Here is a basic round-trip example:

module main (roundTrip)

import root-py (id)

roundTrip :: Int -> Str -> <IO> ?Int
roundTrip x path = do
  @save (id x) path
  @load path

The @save call writes the integer to the given path. Then @load reads it back. Because @load is the final expression in the do block, its result is the return value of roundTrip.

You can also use @savej when you want the output to be readable:

module main (saveReadable)

import root-py (id)

saveReadable :: [Str] -> <IO> ()
saveReadable xs = @savej (id xs) "output.json"

The resulting output.json file is plain JSON that can be inspected in any text editor.

4.15.5. Caching with save and load

A common pattern is to check whether a cached result exists before recomputing it. Since @load returns null when the file is missing, you can branch on the result:

module main (cachedResult)

import root-py (id)
import maybe-py (default)

source Py from "compute.py" ("expensiveComputation")
expensiveComputation :: Int -> Int

cachedResult :: Int -> Str -> <IO> Int
cachedResult x cachePath = do
  cached <- @load cachePath
  let result = default (expensiveComputation x) cached
  @save (id result) cachePath
  result

On the first call, @load returns null because the cache file does not exist. default falls through to calling expensiveComputation. The result is saved for future calls. On subsequent calls, @load returns the cached value and default uses it directly, skipping the computation.
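For readers more familiar with a conventional language, the same load-or-compute pattern looks roughly like this in plain Python. It is an analogue of the control flow only: JSON files stand in for @save and @load, and load_or_none, save_value and the compute parameter are hypothetical names. Unlike the Morloc version, this sketch writes the cache only on a miss.

```python
import json
import os


def load_or_none(path):
    """Return the stored value, or None if the file does not exist."""
    if not os.path.exists(path):
        return None
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


def save_value(value, path):
    """Write the value to disk as JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(value, fh)


def cached_result(x, cache_path, compute):
    """Use the cached value when present; otherwise compute and store it."""
    cached = load_or_none(cache_path)
    if cached is not None:
        return cached
    result = compute(x)
    save_value(result, cache_path)
    return result
```

Calling cached_result twice with the same path invokes compute only once; the second call is served from the file.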

You can also use @hash to build content-addressed caches where the cache path depends on the input:

module main (hashedCache)

import root-py
import maybe-py (default)

source Py from "compute.py" ("expensiveComputation")
expensiveComputation :: Int -> Int

hashedCache :: Int -> <IO> Int
hashedCache x = do
  let key = @hash (id x)
  let cachePath = "/tmp/cache_" <> key <> ".bin"
  cached <- @load cachePath
  let result = default (expensiveComputation x) cached
  @save (id result) cachePath
  result

Each distinct input gets its own cache file, keyed by the xxhash of its serialized form.

4.15.6. Accessing installed data files

The @datafile intrinsic resolves a relative file path to its location in the installed program directory. When you compile with morloc make --install, source files and data files listed in package.yaml are copied into the install directory. At runtime, these files are no longer at their original paths. @datafile bridges this gap by resolving the path at compile time.

module main (readConfig)

import root-py

source Py from "config.py" ("loadConfig")
loadConfig :: Str -> Str

readConfig :: Str
readConfig = loadConfig (@datafile "defaults.json")

Here @datafile "defaults.json" evaluates to the absolute path where defaults.json is installed (e.g., ~/.local/share/morloc/exe/main/defaults.json). The Python function receives this path as a plain string and can open the file normally.

When running without --install (plain morloc make), @datafile returns the relative path unchanged, so the program works from the project directory as expected.

Note Source functions that need data files should accept the path as a parameter rather than hardcoding relative paths. This keeps data dependencies explicit in the type signature and ensures files are found correctly whether the program is run from the project directory or installed.
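Following that advice, the Python side of the readConfig example might be as small as the sketch below. The body of loadConfig is an assumption (the original config.py is not shown); the only property taken from the text is its signature, a path string in and a string out.

```python
def loadConfig(path):
    """Read the file at the given path and return its contents as a string.

    The path is supplied by the caller (for example, from @datafile),
    so nothing here depends on the current working directory.
    """
    with open(path, "r", encoding="utf-8") as fh:
        return fh.read()
```

Because the path arrives as an argument, the same function works unchanged whether the program runs from the project directory or from the install directory.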

4.15.7. Type introspection

The @schema and @typeof intrinsics return information about how the compiler represents a type. The value argument is not evaluated at runtime — only its type matters.

module main (showSchema, showType)

import root-py (id)

showSchema :: Int -> Str
showSchema x = @schema (id x)

showType :: Int -> Str
showType x = @typeof (id x)

@typeof returns the morloc abstract type name (the same way the type would be written in a signature): "Int", "Str", "Real", "Bool", "[Int]", "?Int", "(Int, Str)", and so on. It does not return the language-native type name in the current pool.

@schema returns the internal serialization schema string used by the compiler for MessagePack and binary serialization. The encoding is short, byte-oriented, and stable for a given compiler version. The alphabet:

Schema fragment      Type
j                    Int (default variable-width integer)
i1 / i2 / i4 / i8    Int8 / Int16 / Int32 / Int64
u1 / u2 / u4 / u8    UInt8 / UInt16 / UInt32 / UInt64
f4 / f8              Float32 / Float64 (and Real, which maps to f8)
b                    Bool
s                    Str
z                    Null / () (Unit)
?X                   Optional X: ? prefix followed by an inner schema
aX                   List X (and Array, Deque, Vector): a prefix followed by an inner schema; fixed-dim arrays append :N
tN X1 X2 … XN        Tuple of N elements: t followed by a length code and one schema per element
mN key1 X1 …         Named record of N fields: m followed by a length and (keylen, keytext, schema) per field
T / T:N …            Arrow table primitive; bare T is row-polymorphic, T:N declares N required columns
*                    Unknown (unresolved) type
<typename>…          Optional concrete-type hint prefix (e.g., <Money>j for type Money = Int)

So [Int] serializes as aj, ?Int as ?j, and (Int, Str) as t2 j s. @schema is primarily useful for debugging and for cross-language tools that inspect morloc wire formats.
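To make the alphabet concrete, the Python sketch below rebuilds the three example strings from a tiny description of a type. It covers only a handful of fragments from the table, and the spacing of tuple fragments is inferred from the single example above rather than from a specification, so treat it as a reading aid and not as a reimplementation of @schema. The tuple-based type encoding and the function name are hypothetical.

```python
PRIMITIVES = {"Int": "j", "Bool": "b", "Str": "s", "Real": "f8", "Unit": "z"}


def schema(ty):
    """Render a schema string for a small subset of types.

    A type is a primitive name ("Int"), or a tagged tuple:
    ("list", T), ("optional", T), or ("tuple", T1, ..., TN).
    """
    if isinstance(ty, str):
        return PRIMITIVES[ty]
    tag, *args = ty
    if tag == "list":
        return "a" + schema(args[0])
    if tag == "optional":
        return "?" + schema(args[0])
    if tag == "tuple":
        # Spacing follows the "(Int, Str) as t2 j s" example in the text.
        return " ".join(["t" + str(len(args))] + [schema(a) for a in args])
    raise ValueError("unsupported type: " + repr(ty))
```

For instance, schema(("list", "Int")) gives aj and schema(("tuple", "Int", "Str")) gives t2 j s, matching the examples above.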

5. Advanced Types

5.1. One term may have many definitions

Morloc supports term polymorphism. Each term may have many definitions. This is most useful for helper functions that have reasonable implementations in multiple languages. By providing definitions in several languages, the compiler can select implementations that avoid unnecessary cross-language calls. This allows the program to collapse around specialized, single-language functions. Without term polymorphism, changing the language of any component would require manually rewriting and rewiring large sections of the program.

The function mean, below, is given three definitions:

import root-cpp

source Cpp from "mean.hpp" ("mean")
mean :: [Int] -> Int
mean xs = sum xs // length xs
mean xs = fold (+) 0 xs // length xs

The mean function is 1) sourced directly from C++, 2) defined in terms of the sum function, and 3) defined more generally with sum written as a fold operation. The Morloc compiler is responsible for deciding which implementation to use.

The equals operator in Morloc indicates functional substitutability. When you say a term is "equal" to something, you are giving the compiler an option for what may be substituted for the term. The function mean, for example, has three functionally equivalent definitions.
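Substitutability can be pictured in an ordinary language as two functions that must agree on every input. The Python sketch below mirrors the two Morloc-level definitions of mean, assuming // denotes integer division as it does in Python; the function names are hypothetical.

```python
from functools import reduce
import operator


def mean_via_sum(xs):
    """mean xs = sum xs // length xs"""
    return sum(xs) // len(xs)


def mean_via_fold(xs):
    """mean xs = fold (+) 0 xs // length xs"""
    return reduce(operator.add, xs, 0) // len(xs)


def agree(xs):
    """True when both definitions give the same answer for xs."""
    return mean_via_sum(xs) == mean_via_fold(xs)
```

Either body could replace the other at any call site, which is the property the equals sign asserts.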

Now this ability to simply state that two things are the same can be abused. The following statement is syntactically allowed in Morloc:

x = 1
x = 2

What is x after this code is run? It is 1 or 2. The latter definition does not mask the former; it is added as an alternative to it. In this case, the two values are certainly not substitutable. Morloc has a simple value checker that will catch this type of primitive contradiction (literal equality and container size conflicts). The value checker is in early development and cannot yet catch more nuanced errors, such as:

x = 2 / (1 + 1)
x = 2 / 1

In this case, the value checker cannot look inside the implementation of (+), so it cannot know that there is a contradiction. For this reason, some care is needed when writing these definitions.

5.1.1. Term polymorphism example: Polyglot test suites

With term polymorphism we can design arbitrarily complex programs that will collapse into very different implementations depending on what implementations for the terms are subsequently imported. A powerful example of this is the approach to testing used in the Morloc standard library.

Each "section" in the standard library comprises a language-agnostic parent module and many language-specific child modules. The parent module defines all terms that will be exported and implements a comprehensive test suite.

Here is a stripped down example of the language-agnostic test suite for a toy clock module:

module clock (incSec, incMin, incHour)

import root

incSec :: (Int,Int,Int) -> (Int,Int,Int)
incMin :: (Int,Int,Int) -> (Int,Int,Int)
incHour :: (Int,Int,Int) -> (Int,Int,Int)

This module defines type signatures for the exported clock functions and it imports the language-independent root module (just for Int definitions).

module clock.test (test)

import clock (incMin, incHour, incSec)


source Py from "test.py" ("testEqual", "printMsg", "printResult")
testEqual
    :: Str -- description of the test
    -> a
    -> a
    -> (Int, Int) -- total number of tests and total fails
    -> (Int, Int)
printMsg :: Str -> a -> a
printResult :: (Int, Int) -> (Int, Int)

testGroup
    :: Str
    -> ((Int, Int) -> (Int, Int))
    -> (Int, Int)
    -> (Int, Int)
testGroup msg tests = tests . printMsg msg

passed :: (Int, Int) -> Bool
passed t = 0 == (.1 t) -- no failures recorded (second element is the fail count)

test = runTests (0,0)

The actual standard library modules define specialized Morloc functions for managing test groups, counting failures, pretty-printing results, etc.

5.2. Overload terms with typeclasses

Typeclasses allow the same function name to have different implementations for different types. Unlike term polymorphism (where the compiler freely chooses between alternative definitions of a term), a typeclass instance is selected based on the type the function is applied to. This idea is similar to typeclasses in Haskell, traits in Rust, interfaces in Java, and concepts in C++.

In the example below, Addable and Foldable classes are defined and used to create a polymorphic sum function.

class Addable a where
    zero :: a
    (+) :: a -> a -> a

instance Addable Int where
    source Py from "arithmetic.py" ("add" as (+))
    source Cpp from "arithmetic.hpp" ("add" as (+))
    zero = 0

instance Addable Real where
    source Py from "arithmetic.py" ("add" as (+))
    source Cpp from "arithmetic.hpp" ("add" as (+))
    zero = 0.0

class Foldable f where
    foldr :: (a -> b -> b) -> b -> f a -> b

instance Foldable List where
    source Py from "foldable.py" ("foldr" as foldr)
    source Cpp from "foldable.hpp" ("foldr" as foldr)

sum = foldr (+) zero

Note

This example defines its own Addable class with zero and (+). These conflict with the Integral class from the standard library, so this example should not be combined with import root-py, import root-cpp, or similar root imports.

The instances may import implementations for many languages.

The native functions may themselves be polymorphic, so the imported implementations may be repeated across many instances. For example, the Python source function add (sourced as (+)) may be written as:

def add(x, y):
    return x + y

And the C++ source function add as:

template <class A>
A add(A x, A y){
    return x + y;
}

Typeclasses may be imported from other modules. For example, a module that defines the Ord typeclass and derived operators can be imported and instantiated in another module:

import numops (Ord, (<), (>), (>=), min)

instance Ord Int where
    source Py from "foo.py" ("le" as (<=))

5.3. Infix operators

Morloc supports user-defined infix operators with explicit associativity and precedence. Operators are declared with infixl (left-associative), infixr (right-associative), or infix (non-associative) followed by a precedence level in the range 0 through 9 inclusive (higher binds tighter, matching the Haskell convention). Values outside this range are rejected at parse time.

infixl 6 +
infixl 7 *
infixr 8 **

5.3.1. Reserved operator names

Operator names cannot begin with --. The sequence -- always starts a line comment, regardless of what follows it:

-- this is a comment
--' this is a docstring (special comment variant)
--* this is a doc-group annotation (special comment variant)

Because -- is unconditionally a comment opener, any operator declaration like infixl 6 --+ or infixl 6 --- is treated as a comment from -- onward. The comment runs to end-of-line, so the infixl declaration is left incomplete and the parser fails on the next line of the file with an unrelated-looking error. Choose an operator name that does not begin with --. The -- prefix is reserved by the compiler so that comment variations (--', --*) can be added in the future without colliding with user operators.
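
The failure mode can be reproduced with a toy comment stripper. This Python sketch is an assumption about the general shape of the behaviour, not the Morloc lexer: the function name strip_line_comment is hypothetical, and string literals containing -- are not handled.

```python
def strip_line_comment(line):
    """Drop everything from the first "--" onward (string literals are ignored here)."""
    return line.split("--", 1)[0].rstrip()


print(repr(strip_line_comment("infixl 6 --+")))  # 'infixl 6'
print(repr(strip_line_comment("infixl 6 +")))    # 'infixl 6 +'
```

After stripping, the first declaration has no operator name left, which is why the parser only notices the problem when it reaches the following line.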

Operators are given type signatures by wrapping them in parentheses:

(+) :: a -> a -> a
(*) :: a -> a -> a
(**) :: Int -> Int -> Int

Operators may be sourced from foreign languages like any other function:

source Py from "ops.py" ("add" as (+), "mul" as (*))

Infix operators work naturally with typeclasses:

class Num a where
    zero :: a
    negate :: a -> a
    (+) :: a -> a -> a
    (*) :: a -> a -> a

infixl 6 +
infixl 7 *

instance Num Int where
    source Py from "foo.py" ("add" as (+), "mul" as (*), "neg" as negate)
    zero = 0

-- now we can write natural expressions
test_expr :: Int
test_expr = 4 * 7 + 3  -- evaluates to 31 (precedence: 4*7 first, then +3)
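
To make the effect of the fixity declarations concrete, here is a minimal precedence-climbing evaluator in Python. It is a sketch of the general technique under stated assumptions, not the Morloc parser: the FIXITY table copies the three declarations shown earlier in this section, the operators are given ordinary integer meanings, and non-associative (infix) operators and parentheses are left out.

```python
import re

# (precedence, associativity) for each operator, mirroring the declarations
# infixl 6 +, infixl 7 *, infixr 8 **.
FIXITY = {"+": (6, "left"), "*": (7, "left"), "**": (8, "right")}
APPLY = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
    "**": lambda a, b: a ** b,
}


def tokenize(src):
    """Split the source into integer literals and operator symbols."""
    return re.findall(r"\d+|\*\*|[+*]", src)


def evaluate(tokens, min_prec=0):
    """Precedence climbing: consume operators that bind at least as tightly as min_prec."""
    lhs = int(tokens.pop(0))
    while tokens and FIXITY[tokens[0]][0] >= min_prec:
        op = tokens.pop(0)
        prec, assoc = FIXITY[op]
        # A left-associative operator needs a strictly tighter right operand;
        # a right-associative one accepts the same precedence again.
        next_min = prec + 1 if assoc == "left" else prec
        rhs = evaluate(tokens, next_min)
        lhs = APPLY[op](lhs, rhs)
    return lhs


print(evaluate(tokenize("4 * 7 + 3")))    # 31
print(evaluate(tokenize("3 + 4 * 7")))    # 31
print(evaluate(tokenize("2 ** 3 ** 2")))  # 512
```

The last line shows right associativity: 2 ** 3 ** 2 groups as 2 ** (3 ** 2).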

Operators may also be imported from other modules:

import ops ((&), (|))

5.4. Defining non-primitive types

Types that are composed entirely of Morloc primitives, lists, tuples, records and tables may be directly and unambiguously translated to Morloc binary forms and thus shared between languages. But what about types that do not break down cleanly into these forms? For example, consider the parameterized Map k v type that represents a collection with keys of generic type k and values of generic type v. This type may have many representations, including a list of pairs, a pair of columns, a binary tree, and a hashmap. In order for Morloc to know how to convert all Map types in all languages to one form, it must know how to express the Map type in terms of more primitive types. The user can provide this information by defining instances of the Packable typeclass for Map. This typeclass defines two functions, pack and unpack, that construct and deconstruct a complex type.

class Packable a b where
    pack :: a -> b
    unpack :: b -> a

The Map type for Python and C++ may be defined as follows:

type Py => Map key val = "dict" key val
type Cpp => Map key val = "std::map<$1,$2>" key val
instance Packable ([a],[b]) (Map a b) where
    source Cpp from "map-packing.hpp" ("pack", "unpack")
    source Py from "map-packing.py" ("pack", "unpack")

The Morloc user never needs to directly apply the pack and unpack functions. Rather, these are used by the compiler within the generated code. The compiler constructs a serialization tree from the general type and from this tree generates the native code needed to (un)pack types recursively until only primitive types remain. These may then be directly translated to Morloc binary using the language-specific binding libraries.
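
As a rough idea of what the Python side of such a pair might contain, the sketch below packs a pair of columns into a dict and unpacks it again. The contents of "map-packing.py" are not shown in this document, so these bodies are assumptions. They follow the direction given by the Packable ([a],[b]) (Map a b) instance: pack builds the native Map from the simpler form, and unpack takes it apart.

```python
def pack(columns):
    """Build a dict from a (keys, values) pair of equal-length lists."""
    keys, vals = columns
    if len(keys) != len(vals):
        raise ValueError("keys and values must have the same length")
    return dict(zip(keys, vals))


def unpack(mapping):
    """Split a dict into a (keys, values) pair of lists, in insertion order."""
    return (list(mapping.keys()), list(mapping.values()))


m = pack((["a", "b"], [1, 2]))
print(m)          # {'a': 1, 'b': 2}
print(unpack(m))  # (['a', 'b'], [1, 2])
```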

In some cases, the native type may not be as generic as the general type. Or you may want to add specialized (un)packers. In such cases, you can define more specialized instances of Packable. For example, if the R Map type is defined as an R list, then keys can only be strings. Any other type should raise an error. So we can write:

type R => Map key val = "list" key val
instance Packable ([Str],[b]) (Map Str b) where
    source R from "map-packing.R" ("pack", "unpack")

Now whenever the key generic type of Map is inferred to be anything other than a string, all R implementations will be pruned.

5.5. Mapping general types to native types

When a function is sourced from a foreign language, Morloc needs to know how Morloc general types map to the function’s native types. This information is encoded in language-specific type functions. For example:

type R => Bool = "logical"
type Py => Bool = "bool"
type Cpp => Bool = "bool"

type R => Int32 = "integer"
type Py => Int32 = "int"
type Cpp => Int32 = "int32_t"

Language-specific types are always quoted since they may contain syntax that is illegal in the Morloc language.

Consider an integer addition function, addInt:

addInt :: Int32 -> Int32 -> Int32

This can be automatically mapped to a C++ function with the prototype int32_t addInt(int32_t x, int32_t y). Morloc also provides an Int type that maps to whatever the default integer type is in a given language (e.g., int in C++, int in Python). When a specific width is needed, use the explicit types such as Int32 or Int64.

Containers can be similarly mapped to native types:

type Py => List a = "list" a
type Cpp => List a = "std::vector<$1>" a

The $1 symbol is used to represent the interpolation of the first parameter into the native type. So the Morloc type List Int32 would translate to std::vector<int32_t> in C++.
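
The interpolation step can be sketched in a few lines of Python. The function name render_native is hypothetical and the sketch assumes the parameters have already been translated to native type strings. It only illustrates the $1, $2 substitution described above, not the compiler's actual code.

```python
import re


def render_native(template, params):
    """Replace $1, $2, ... in a native type template with already-rendered parameters."""
    def substitute(match):
        index = int(match.group(1)) - 1  # $1 refers to the first parameter
        return params[index]
    return re.sub(r"\$(\d+)", substitute, template)


print(render_native("std::vector<$1>", ["int32_t"]))             # std::vector<int32_t>
print(render_native("std::map<$1,$2>", ["std::string", "int"]))  # std::map<std::string,int>
print(render_native("list", ["int"]))                            # list
```

A template without placeholders, such as the Python "list", is returned unchanged because dynamically typed languages do not need the parameter spelled out in the native type.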

5.5.1. Type alias resolution

When you define a general type alias, the compiler automatically resolves language-specific types by following the alias chain. You do not need to redundantly define language-specific mappings for every alias.

For example, given:

type Py => Str = "str"
type LastName = Str

The compiler resolves LastName to Str and then to "str" in Python, so there is no need to write type Py => LastName = "str". The same applies to any depth of aliasing: each step is resolved until a language-specific mapping is found.

You can provide an explicit language-specific mapping to override this default resolution. For example, if you wanted LastName to be stored as a bytes object in Python rather than a string:

type Py => LastName = "bytes"

This override only applies to LastName; the base Str type remains mapped to "str" as before.
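
The resolution rule can be modelled as a walk along the alias chain that stops at the first language-specific mapping. The Python sketch below is a toy model under stated assumptions: the dictionaries ALIASES and NATIVE and the function resolve are hypothetical stand-ins for the compiler's internal tables.

```python
# General aliases: alias name -> the general type it points to.
ALIASES = {"LastName": "Str"}

# Language-specific mappings: (language, general type) -> native type.
NATIVE = {("Py", "Str"): "str"}


def resolve(lang, name, native=NATIVE, aliases=ALIASES):
    """Follow the alias chain until a mapping for `lang` is found."""
    seen = set()
    while name not in seen:  # guard against accidental alias cycles
        seen.add(name)
        if (lang, name) in native:
            return native[(lang, name)]
        if name not in aliases:
            break
        name = aliases[name]
    raise LookupError(f"no {lang} mapping found")


print(resolve("Py", "LastName"))  # str

# An explicit mapping on the alias is found first, so it wins over the chain.
overridden = {**NATIVE, ("Py", "LastName"): "bytes"}
print(resolve("Py", "LastName", native=overridden))  # bytes
print(resolve("Py", "Str", native=overridden))       # str
```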

5.6. Type hierarchies

Morloc types form hierarchies through type aliases. A type alias like type Bytes = Str declares that Bytes belongs to the same representation family as Str — they serialize to the same binary form and can cross language boundaries identically. But Bytes is not merely a synonym. It is a distinct type that can have its own language-specific representation, its own typeclass instances, and its own semantic meaning.

This section walks through type hierarchies using strings as a running example.

5.6.1. Transparent aliases

The simplest kind of alias creates a semantic label with no behavioral difference:

type Filename = Str

Filename is a transparent alias. It adds meaning for the reader — "this string is a file path" — but has no effect on compilation. Since there is no language-specific override, Filename resolves to the same native type as Str in every language: str in Python, std::string in C++, character in R.

A Filename value can be used anywhere a Str is expected. Typeclass instances defined for Str are inherited automatically:

import root-py

type Filename = Str

-- This works: Filename inherits Eq from Str
sameFile :: Filename -> Filename -> Bool
sameFile a b = a == b

If no Eq Filename instance is defined, the compiler walks the alias chain Filename → Str and finds Eq Str. The lookup always moves toward the root of the hierarchy, never sideways to siblings.

5.6.2. Concrete type overrides

Some aliases need different native representations in certain languages. Python distinguishes between text strings (str) and byte strings (bytes), but C++ might use std::string for both:

type Bytes = Str

type Py => Bytes = "bytes"
-- No C++ override: Bytes resolves to std::string (same as Str)

The general declaration type Bytes = Str places Bytes in the Str representation family. The concrete override type Py => Bytes = "bytes" tells the Python code generator to use the native bytes type instead of str.

This override has consequences. A Python function that operates on bytes cannot accept a str argument, even though at the general level Bytes = Str. The compiler respects this: when the concrete types diverge, values of one type cannot be silently substituted for the other in that language.

5.6.3. Sourcing language-specific functions

With Str and Bytes established, we can source Python functions that work with each:

-- Convert between representations
toBytes :: Str -> Bytes
source Py from "strlib.py" ("to_bytes" as toBytes)

fromBytes :: Bytes -> Str
source Py from "strlib.py" ("from_bytes" as fromBytes)

The Python implementations:

def to_bytes(s):
    return s.encode("utf-8")

def from_bytes(b):
    return b.decode("utf-8")

In C++, where both Str and Bytes map to std::string, these conversions are identity functions. You could source a C++ no-op, or the compiler can optimize them away when the concrete types are identical.

5.6.4. Typeclasses with aliases

Typeclasses and type hierarchies interact in two ways: an alias can inherit instances from its parent, or it can specialize with its own implementation.

Specialization: Filelike

The Filelike typeclass defines operations specific to filesystem paths. Only Filename (and types that descend from it) should be instances:

class Filelike a where
  extension :: a -> Str

instance Filelike Filename where
  source Py from "strlib.py" ("get_extension" as extension)

The Python implementation:

import os

def get_extension(path):
    return os.path.splitext(path)[1]

Since Filename is an alias for Str, you might wonder whether Str inherits Filelike. It does not. Instance lookup moves up the hierarchy (a child consults its parent), never down (a parent does not consult its children). Filename inherits Str’s instances, but Str does not gain Filename’s.

Attempting to call extension on a bare Str produces a compile error:

-- Type error: no instance Filelike Str
bad = extension ("not a filename" :: Str)

Shared behavior: Stringlike

Some operations make sense for both Str and Bytes. A typeclass can capture this:

class Stringlike a where
  split :: Str -> a -> [a]
  trim :: a -> a

instance Stringlike Str where
  source Py from "strlib.py" ("split_str" as split, "trim_str" as trim)

instance Stringlike Bytes where
  source Py from "strlib.py" ("split_bytes" as split, "trim_bytes" as trim)

The Python implementations:

def split_str(sep, s):
    return s.split(sep)

def trim_str(s):
    return s.strip()

def split_bytes(sep, b):
    return b.split(sep.encode("utf-8"))

def trim_bytes(b):
    return b.strip()

Both Str and Bytes have their own Stringlike instances with distinct implementations. The split function for Bytes encodes the separator because Python’s bytes.split requires a bytes argument.

Since Filename is an alias for Str with no Stringlike override, it inherits Stringlike Str:

-- Works: Filename inherits Stringlike from Str
splitPath :: Filename -> [Filename]
splitPath p = split "/" p

5.6.5. The alias family tree

The examples above form a small hierarchy:

Str                 -- root type
 +-- Bytes          -- concrete override in Python ("bytes")
 +-- Filename       -- transparent alias (no override)

The rules:

  1. Inheritance flows upward. Filename inherits all of Str’s typeclass instances. Bytes also inherits from Str, but only when the concrete types are compatible.

  2. Specialization takes precedence. If instance Stringlike Bytes is defined, it is used instead of the inherited Stringlike Str.

  3. Siblings are independent. Bytes and Filename do not inherit from each other. An instance defined for Bytes is not available to Filename, and vice versa.

  4. Concrete overrides constrain inheritance. When Bytes has type Py => Bytes = "bytes", Python functions sourced for Str (which operate on "str") cannot be used for Bytes in Python, even though the general types are alias-equivalent.
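
The four rules can be condensed into a small lookup routine. The Python sketch below is a toy model, not the compiler's algorithm: the tables PARENT, PY_NATIVE and INSTANCES and the function find_instance are hypothetical. It also assumes that an inherited instance whose native type differs is simply skipped while the search continues toward the root.

```python
# Parent links in the alias family tree (child -> parent).
PARENT = {"Bytes": "Str", "Filename": "Str"}

# Native Python type of each general type after alias resolution.
PY_NATIVE = {"Str": "str", "Bytes": "bytes", "Filename": "str"}

# Declared typeclass instances as (class, type) pairs.
INSTANCES = {
    ("Eq", "Str"),
    ("Stringlike", "Str"),
    ("Stringlike", "Bytes"),
    ("Filelike", "Filename"),
}


def find_instance(cls, ty):
    """Walk from `ty` toward the root and return the type whose instance applies."""
    current = ty
    while current is not None:
        # Rules 2 and 4: the nearest instance wins, but only when its native
        # type matches the native type of the starting point.
        if (cls, current) in INSTANCES and PY_NATIVE[current] == PY_NATIVE[ty]:
            return current
        # Rules 1 and 3: move to the parent only, never to a sibling.
        current = PARENT.get(current)
    return None


print(find_instance("Eq", "Filename"))       # Str
print(find_instance("Stringlike", "Bytes"))  # Bytes
print(find_instance("Eq", "Bytes"))          # None
print(find_instance("Filelike", "Str"))      # None
```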

5.6.6. Larger hierarchies

The same principles scale to deeper trees. The standard library’s sequence types form a hierarchy rooted at List:

type List a

type Deque a = List a
type Cpp => Deque a = "std::deque<$1>" a

type Array a = List a

Each child can have its own typeclass instances:

instance Functor List
instance Foldable List
instance Indexed List
instance Stack List
instance Queue List

-- Deque: double-ended access, no random indexing
instance Functor Deque
instance Foldable Deque
instance Stack Deque
instance Queue Deque

-- Array: random indexing, no stack/queue ops
instance Functor Array
instance Foldable Array
instance Indexed Array

If a typeclass instance is not defined for Array (say, Stack), the compiler walks up to List and finds Stack List. Whether this inherited instance is usable depends on whether the concrete types are identical — if Array maps to a different native type than List in the target language, the inherited implementation will not apply.

5.7. Kind system

In morloc, every type variable belongs to one of a few categories called "kinds". A kind is what tells the compiler what kind of thing a variable stands for: an ordinary type (like Int), a number (like a dimension), a string (like a column name), a record schema, or a list of strings. Most morloc code never has to think about kinds because the default kind — ordinary type — is what nearly every type variable is. Kinds become visible when you want the compiler to track values that would otherwise be "merely runtime data" alongside the types they appear in.

Note

Kinds are phantom descriptions of types, not types themselves. A kind classifies what sort of expression fits in a particular slot of a type constructor; it has no runtime presence and cannot be inhabited by a value. The Nat kind tells the compiler "this slot holds a natural number"; the Rec kind tells it "this slot holds a record schema". The expressions that fill these slots (5, (n + m), {x = Int, y = Str}, Singleton "x" Int) all live at the kind level. You can’t take a kind-Rec expression and use it as a type for a runtime value; for that, use a named record declaration (see the Custom types chapter).

A kind annotation follows the variable name after ::, with the pair enclosed in parentheses: (n :: Nat), (r :: Rec). A bare lowercase variable like a defaults to the ordinary-type kind.

The available kinds:

  • Type — the default. Any concrete type goes here: Int, Real, Str, [Int], (Int, Str), user-defined types, etc. Variables written without an annotation (a, b, t) are Type-kinded.

  • Nat — a natural number. Used for dimensions and counts.

  • Str — a string literal lifted to the type level. Used for column names and other "named-thing" labels.

  • Rec — a record schema (a mapping from field names to types). Used for table column maps.

  • List — an ordered list at the type level. The current parser defaults the element kind to Str, so List means "list of column names" in practice.

  • Set — an unordered, duplicate-free collection. Same defaulting as List: Set means "set of column names".

5.7.1. Nat: dimensions in the type

The standard library’s Vector and tensor types carry their dimension as a Nat:

type Vector  (n :: Nat) a                    -- 1D, length n
type Matrix  (m :: Nat) (n :: Nat) a         -- m rows by n columns

The Nat exists only at compile time. At runtime a Vector 5 Int is just a list of integers; the 5 is erased. But while the compiler is checking your code it can verify that you are not, say, multiplying a 2x3 matrix by a 2x3 matrix:

matmul :: Matrix m k a -> Matrix k n a -> Matrix m n a

Here m, k, n are Nat-kinded variables. The signature reads "multiplying an m-by-k matrix by a k-by-n matrix gives an m-by-n matrix" — and the compiler will reject a call that does not satisfy the inner-dimension match.
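
The inner-dimension check amounts to binding each dimension variable once and rejecting any second, conflicting binding. The Python sketch below models that idea with plain tuples for shapes. The names unify_dims and matmul_shape are hypothetical, and the sketch is not taken from the Morloc typechecker.

```python
def unify_dims(pattern, actual, bindings):
    """Match dimension variables (strings) or literals (ints) against concrete sizes."""
    for var, size in zip(pattern, actual):
        if isinstance(var, int):
            if var != size:
                raise TypeError(f"expected dimension {var}, got {size}")
        elif bindings.setdefault(var, size) != size:
            raise TypeError(f"dimension {var} is both {bindings[var]} and {size}")
    return bindings


def matmul_shape(a, b):
    """Result shape for matmul :: Matrix m k a -> Matrix k n a -> Matrix m n a."""
    bindings = {}
    unify_dims(("m", "k"), a, bindings)
    unify_dims(("k", "n"), b, bindings)
    return (bindings["m"], bindings["n"])


print(matmul_shape((2, 3), (3, 4)))  # (2, 4)
try:
    matmul_shape((2, 3), (2, 3))
except TypeError as err:
    print(err)  # dimension k is both 3 and 2
```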

When the dimensions are concrete numbers, the compiler evaluates them. Applying rbind to a Table 5 r and a Table 7 r yields a Table 12 r; a caller expecting Table 13 r gets a compile-time error.

5.7.2. Str: column names in the type

A Str-kinded variable stands for a string literal that exists in both the runtime and the type system at once. The label syntax f:Str introduces a function argument that is a string at runtime AND a type-level Str variable bound by the same name:

asCol :: f:Str -> Vector n a -> Table n (Singleton f a)

When you call asCol "x" xs, the runtime sees f = "x" and the compiler sees the result type as Table n (Singleton "x" a), i.e. a table whose one column is named x. This is what lets the compiler keep track of column names through long pipelines: every call site that touches a column by name can reflect that name into the result type.

Other examples:

setCol    :: f:Str -> Vector n a -> Table n r
                                 -> Table n ((r - f) + Singleton f a)

renameCol :: f:Str -> g:Str -> Table n r
                            -> Table n ((r - f) + Singleton g (ProjectField r f))

setCol introduces the name f and uses it both as the runtime column name passed to the kernel and as the type-level key in the result schema. renameCol introduces two Str-kinded variables, f (the old name) and g (the new name), and rewires the schema accordingly.

5.7.3. Rec: record schemas at the kind level

A Rec-kinded variable stands for a column schema — a mapping from column names (Str) to column element types (Type). The literal {x = Int, y = Str} is an expression at kind Rec. So is the result of any of the record-arithmetic expressions:

  • r + s — merge two schemas (overlapping keys are an error)

  • r - f — drop a single field by name

  • r - l — drop the fields whose names appear in the List l

  • Singleton f t — a one-field record {f = t}

  • Restrict r l — the projection of r to the fields in l

All of these are kind-Rec expressions. They live in slots whose declared kind is Rec — today, the only such slot is the second parameter of Table n r. A definition like type R = Singleton "x" Int is a category error: R is being given a kind-Rec expression as its body, but the typedef machinery expects a Type-kinded body. The compiler refuses because the slots don’t match; user-level records (the kind that values can have) still come from a named record declaration.

(Constructing inhabitable types by applying kind-Rec expressions is a separate feature that morloc doesn’t have today. If added, it would be its own design layer on top of kinds, not a relaxation of the phantom-description rule.)

The Rec algebra appears directly in the table-stdlib signatures:

type Table (n :: Nat) (r :: Rec)

asCol      :: f:Str -> Vector n a -> Table n (Singleton f a)
setCol     :: f:Str -> Vector n a -> Table n r
                                  -> Table n ((r - f) + Singleton f a)
dropCols   :: l:[Str] -> Table n r -> Table n (r - l)
selectCols :: l:[Str] -> Table n r -> Table n (Restrict r l)
rbind      :: Table n1 r  -> Table n2 r  -> Table (n1 + n2) r
cbind      :: Table n  r1 -> Table n  r2 -> Table n (r1 + r2)

Read aloud, setCol says "given a column name f, a vector, and a table whose row schema is r, return a table whose row schema is r with f removed and then f re-added at the type of the vector’s elements." That sentence is enforced literally by the type system. Calling setCol "y" ys someTable adjusts the type of the result so the new column is reflected exactly.

There is also a type-level lookup operator, ProjectField r f, which returns the type of field f in row r:

getCol :: f:Str -> Table n r -> Vector n (ProjectField r f)

If you call getCol "x" t on a table whose schema contains x = Int, the result type reduces to Vector n Int. If x is not present, you get a compile-time error.
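
The record algebra can be mimicked with ordinary dictionaries, which may help when reading these signatures. The Python sketch below is a model only: schemas are dicts from field names to type names, all function names are hypothetical, and dropping a field that is not present is assumed to be a no-op (the document does not say how Morloc treats that case).

```python
def singleton(field, ty):
    """Singleton f t: a one-field schema."""
    return {field: ty}


def merge(r, s):
    """r + s: merge two schemas; overlapping keys are an error."""
    overlap = r.keys() & s.keys()
    if overlap:
        raise TypeError(f"duplicate fields: {sorted(overlap)}")
    return {**r, **s}


def drop(r, names):
    """r - f or r - l: drop one field (str) or several (list of str)."""
    names = [names] if isinstance(names, str) else names
    return {k: v for k, v in r.items() if k not in names}


def project_field(r, field):
    """ProjectField r f: the type of field f, or an error when it is absent."""
    if field not in r:
        raise TypeError(f"no field {field!r} in schema")
    return r[field]


def set_col_schema(field, ty, r):
    """Result schema of setCol: (r - f) + Singleton f a."""
    return merge(drop(r, field), singleton(field, ty))


schema = {"x": "Int", "y": "Str"}
print(set_col_schema("y", "Real", schema))  # {'x': 'Int', 'y': 'Real'}
print(project_field(schema, "x"))           # Int
```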

5.7.4. List and Set: collections of column names

A List-kinded variable holds a literal list of column names at both the runtime and the type level. The l:[Str] label syntax introduces one of these:

selectCols :: l:[Str] -> Table n r -> Table n (Restrict r l)
dropCols   :: l:[Str] -> Table n r -> Table n (r - l)

When you write selectCols ["x", "y"] t, the runtime sees the list ["x", "y"], and the compiler sees the result type as Table n (Restrict r ["x", "y"]), which it then reduces to a record containing only the x and y fields of r (with their original types preserved). A request for a column that does not exist in r fails at compile time.

Set-kinded variables are the same idea for unordered collections. They appear less in user-facing signatures but are used internally for constraints like "the keys of r1 and r2 are disjoint" (used by cbind to reject duplicated column names at compile time).

5.7.5. Reference: type-level functions

The compiler recognises a small set of named operators on kinded types. They look like ordinary type applications (Singleton f a, Restrict r l) and reduce to concrete values whenever their arguments are ground.

Function         | Kind signature                         | Reads as                               | Example reduction
-----------------|----------------------------------------|----------------------------------------|------------------------------------------------------------
Singleton k v    | Str → Type → Rec                       | one-field record                       | Singleton "x" Int → {x = Int}
Restrict r l     | Rec → List Str → Rec                   | project to fields in l, in input order | Restrict {x=Int, y=Str, z=Real} ["x","z"] → {x=Int, z=Real}
ProjectField r f | Rec → Str → Type                       | look up the type of one field          | ProjectField {x=Int, y=Str} "x" → Int
Keys r           | Rec → Set Str                          | set of column names                    | Keys {x=Int, y=Str} → {"x", "y"}
ListToSet l      | List a → Set a                         | drop ordering and duplicates           | ListToSet ["x","y","x"] → {"x", "y"}
Size c           | dispatched: List a / Set a / Rec → Nat | number of elements                     | Size ["x", "y", "z"] → 3

The standard library’s table API uses every one of these except Size, which surfaces in dimension calculations elsewhere (for example, a column of generated indices whose length matches the schema width).

Note

"Reduction" here means the compiler walks the kind-level expression tree and simplifies it where possible. Singleton "x" Int simplifies to {x = Int}, which is still a kind-Rec expression, just in canonical form. The result is never a Type-kinded type. The reductions exist so that constraints like Subset (ListToSet l) (Keys r) can be discharged when l and r happen to be ground, not so users can build types from kind-level fragments.

The same set of operators is also expressed in symbolic form. The parser sees +, -, *, / in type position and the kind solver picks the right meaning from the kinds of the arguments:

Operator | Kinds                | Meaning
---------|----------------------|------------------------------
n + m    | Nat → Nat → Nat      | natural-number addition
n - m    | Nat → Nat → Nat      | subtraction (clamped at zero)
n * m    | Nat → Nat → Nat      | multiplication
n / m    | Nat → Nat → Nat      | integer division
r + s    | Rec → Rec → Rec      | merge two schemas
r - f    | Rec → Str → Rec      | drop one field by name
r - l    | Rec → List Str → Rec | drop fields named in l

The Nat operators evaluate eagerly when both sides are literals: a signature mentioning Table (5 + 7) r is exactly Table 12 r after parsing. The Rec operators compose the same way — after substitution, the compiler walks the expression tree and reduces it to a record literal whenever the structure permits.
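
The Nat rows of the operator table can be modelled as a tiny evaluator over expression trees. The Python sketch below is an illustration under stated assumptions: expressions are nested (op, left, right) tuples with integer leaves, the names NAT_OPS and eval_nat are hypothetical, and division by zero is left to raise Python's own error because the document does not say how Morloc handles it.

```python
NAT_OPS = {
    "+": lambda n, m: n + m,
    "-": lambda n, m: max(n - m, 0),  # clamped at zero, so the result stays a Nat
    "*": lambda n, m: n * m,
    "/": lambda n, m: n // m,         # integer division
}


def eval_nat(expr):
    """Evaluate a nested (op, left, right) tuple whose leaves are int literals."""
    if isinstance(expr, int):
        return expr
    op, left, right = expr
    return NAT_OPS[op](eval_nat(left), eval_nat(right))


print(eval_nat(("+", 5, 7)))            # 12
print(eval_nat(("-", 3, 5)))            # 0
print(eval_nat(("/", ("*", 3, 4), 5)))  # 2
```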

5.7.6. Reference: constraints

A constraint restricts the values a polymorphic variable can take. It is written to the left of => in a signature:

foo :: (Constraint1 args, Constraint2 args) => a -> b

There are two kinds of constraints. The first is the familiar typeclass constraint (Eq a, Ord a, Functor f) — these are user-defined and discharged by selecting an instance.

The second is a small set of built-in primitive constraints over the kinded operators above:

Constraint     | Argument kinds     | Holds when
---------------|--------------------|-----------------------------
Member a s     | a :: x, s :: Set x | a appears in s
Subset s1 s2   | both Set x         | every element of s1 is in s2
Disjoint s1 s2 | both Set x         | s1 and s2 share no elements

These are mostly used to constrain column-name sets. Two examples from the table stdlib show them in action:

selectCols :: (Subset (ListToSet l) (Keys r)) =>
              l:[Str] -> Table n r -> Table n (Restrict r l)

The constraint says: every name in the list l must appear among the keys of the row schema r. A call like selectCols ["x", "z"] t where t :: Table n {x=Int, y=Str} fails at compile time because "z" is not a key of r.

setCol :: (Disjoint (Singleton f a) (r - f)) =>
          f:Str -> Vector n a -> Table n r
                              -> Table n ((r - f) + Singleton f a)

The constraint says: the new field f and the residual schema (r - f) must not share keys. This is trivially true (the residual just had f removed) so the constraint always discharges — but it is what justifies the + on the right-hand side, which is otherwise forbidden when keys overlap.
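
Since the primitive constraints range over finite sets of strings, checking them reduces to ordinary set operations. The Python sketch below models the Subset check behind selectCols and a Disjoint check of the kind cbind relies on. Schemas are dicts from column names to type names, and every function name here is hypothetical.

```python
def keys(r):
    """Keys r: the set of column names in a schema."""
    return set(r)


def list_to_set(names):
    """ListToSet l: drop ordering and duplicates."""
    return set(names)


def select_cols_schema(names, r):
    """Check Subset (ListToSet l) (Keys r), then compute Restrict r l."""
    missing = list_to_set(names) - keys(r)
    if missing:
        raise TypeError(f"unknown columns: {sorted(missing)}")
    return {name: r[name] for name in names}


def cbind_schema(r1, r2):
    """Check that Keys r1 and Keys r2 are disjoint, then compute r1 + r2."""
    shared = keys(r1) & keys(r2)
    if shared:
        raise TypeError(f"duplicated columns: {sorted(shared)}")
    return {**r1, **r2}


schema = {"x": "Int", "y": "Str"}
print(select_cols_schema(["x"], schema))     # {'x': 'Int'}
print(cbind_schema(schema, {"z": "Real"}))   # {'x': 'Int', 'y': 'Str', 'z': 'Real'}
try:
    select_cols_schema(["x", "z"], schema)
except TypeError as err:
    print(err)  # unknown columns: ['z']
```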

Implicit constraints

You almost never have to write these constraints by hand. The compiler emits them automatically from the canonical operator forms it sees in a signature:

  • Whenever a record-extension r + Singleton k a (equivalently RecExtend k a r) appears with r a Rec variable, a Disjoint (Singleton k _) (Keys r) constraint is emitted — "the new key must not already be in the schema."

  • Whenever Restrict r l appears, a Subset (ListToSet l) (Keys r) constraint is emitted — "every requested name must exist."

Both selectCols and setCol get their constraints this way. You only need the explicit form if you want a constraint that the compiler could not derive from your signature shape. An example is an extra Disjoint between two row variables that never directly meet in a + expression, but whose disjointness is part of your function’s contract.

The deliberately small constraint set keeps the type system decidable. Member, Subset, and Disjoint over finite sets of strings are easy to check; richer constraint languages quickly become undecidable, so the compiler trades expressiveness for predictable typechecking.

5.7.7. When to think about kinds

The honest answer is: only when you want the compiler to track something it would otherwise have to leave to runtime. Most morloc functions over ordinary types never need a kind annotation. You reach for them when:

  • dimensions or counts matter (use Nat),

  • column or field names matter (use Str and Rec),

  • the list of names matters as a whole (use List).

The table API in the standard library is the densest practical example in morloc — every signature uses Nat for the row count, Rec for the schema, Str for column names, and List for column-name lists. Reading through table/main.loc is the recommended way to internalise the notation.

5.8. Tensors

Morloc has built-in tensor types with dimensions tracked in the type system. When all dimensions are known at compile time, the compiler can catch shape mismatches — like passing a 3x4 matrix where a 4x3 was expected — even when the functions live in different languages. When dimensions are runtime values (e.g., batch sizes or feature counts from data), the check is deferred.

The standard library defines tensors from 1D to 5D:

type Vector  (n :: Nat) a                          -- 1D
type Matrix  (m :: Nat) (n :: Nat) a               -- 2D
type Tensor3 (d1 :: Nat) (d2 :: Nat) (d3 :: Nat) a -- 3D
-- ... up to Tensor5

The Nat parameters are type-level natural numbers — they exist only in the type system and are erased at runtime. The a is the element type (Real, Int, etc.). Under the hood, these map to numpy.ndarray in Python, mlc::Tensor templates in C++, and array/matrix in R.

Vector is the flat 1D form and aliases directly to List a — a Vector n a and a [a] are the same on the wire, with the type-level n adding a compile-time length constraint. Matrix and Tensor3 through Tensor5 are abstract types that go through the Packable machinery (see below) to reach their wire form.

Writing tensor functions

Tensor signatures use lowercase variables for dimensions. These are implicitly generic — the function works for any size:

-- Works for any m-by-n matrix
transpose :: Matrix m n Real -> Matrix n m Real

-- Both inputs must have the same shape
add :: Matrix m n Real -> Matrix m n Real -> Matrix m n Real

-- Dot product requires equal-length vectors
dot :: Vector n Real -> Vector n Real -> Real

The compiler checks that dimensions line up when you compose these functions. If you try to add a 3x4 matrix to a 5x6 matrix, you get a type error.

Dimension arithmetic

Signatures can express arithmetic relationships between dimensions:

-- Flattening a matrix multiplies its dimensions
flatten :: Matrix m n Real -> Vector (m * n) Real

-- Stacking vertically adds rows
vstack :: Matrix m n Real -> Matrix p n Real -> Matrix (m + p) n Real

-- Kronecker product multiplies both dimensions
kron :: Matrix m n Real -> Matrix p q Real -> Matrix (m * p) (n * q) Real

When concrete sizes are known, the compiler evaluates the arithmetic and checks it. For example, flattening a Matrix 3 4 Real produces a Vector 12 Real — and trying to use it where a Vector 13 Real is expected will fail.

When dimensions are still generic (variables, not numbers), the compiler defers the check until sizes become known.
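The arithmetic the compiler performs on concrete sizes can be mimicked at runtime. The following Python sketch is illustrative only: the helper names (flatten_shape, vstack_shape, kron_shape, expect) are hypothetical and not part of Morloc. They evaluate the same dimension expressions on plain tuples.

```python
from typing import Tuple

Shape = Tuple[int, ...]


def flatten_shape(matrix: Shape) -> Shape:
    """Mirror of `Vector (m * n)`: flat length is the product of the dims."""
    m, n = matrix
    return (m * n,)


def vstack_shape(top: Shape, bottom: Shape) -> Shape:
    """Mirror of `Matrix (m + p) n`: columns must agree, rows add."""
    (m, n), (p, n2) = top, bottom
    if n != n2:
        raise TypeError(f"column mismatch: {n} vs {n2}")
    return (m + p, n)


def kron_shape(a: Shape, b: Shape) -> Shape:
    """Mirror of `Matrix (m * p) (n * q)`: both dims multiply."""
    (m, n), (p, q) = a, b
    return (m * p, n * q)


def expect(actual: Shape, declared: Shape) -> Shape:
    """Stand-in for the compiler's check of a computed shape against a declared one."""
    if actual != declared:
        raise TypeError(f"shape mismatch: got {actual}, expected {declared}")
    return actual


# Flattening a 3x4 matrix gives length 12, so declaring 12 is accepted.
expect(flatten_shape((3, 4)), (12,))
```

Declaring (13,) instead of (12,) in the last line raises TypeError, which is the runtime analogue of the compile-time failure described above.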

Labeled nat parameters

Sometimes a function’s output dimensions depend on its runtime arguments. The n:Int syntax lets you express this:

-- The integer argument determines the vector length
makeVec :: n:Int -> Vector n Real

-- Two integer arguments determine the matrix shape
makeMat :: m:Int -> n:Int -> Matrix m n Real

When you call makeVec 5, the compiler knows the result is a Vector 5 Real and can propagate that through the rest of your program. This works with integer literals, let-bound variables, and tuple accessors:

makeVec 5                                        -- Vector 5 Real
let n = 3 in makeVec n                           -- Vector 3 Real
let dims = (3, 4) in makeMat (.0 dims) (.1 dims) -- Matrix 3 4 Real

Tensor wire forms

Higher-rank tensors do not have a single canonical cross-language representation — a C++ mlc::Tensor3 and a Python numpy.ndarray look nothing alike at the byte level. Morloc bridges them through the Packable typeclass: each rank declares a wire form built from primitives, tuples, and Vector, plus the language-specific pack / unpack functions that convert between the wire form and the native runtime type.

The standard library declares one Packable instance per rank, all following the same shape: a tuple of runtime dimensions paired with a flat Vector holding the row-major contiguous data.

-- A Matrix d1 d2 a serializes as ((d1-runtime, d2-runtime), flat data).
-- The Vector's type-level length is the product of the type-level dims;
-- this lets the compiler check that the flat buffer matches the shape.
instance Packable ((Int, Int), Vector (d1 * d2) a)
                  (Matrix d1 d2 a)

instance Packable ((Int, Int, Int), Vector (d1 * d2 * d3) a)
                  (Tensor3 d1 d2 d3 a)

-- Tensor4 and Tensor5 follow the same pattern.

Vector itself has no Packable instance: it is already the flat wire form. Routing tensor data through Vector rather than the generic [a] lets the language extensions take a fast path — numpy buffers travel zero-copy through Python, std::vector round-trips into C++ without an intermediate Python list, and R numeric vectors land directly in the native form.

The wire-format split is a clean separation:

  • The runtime dimensions (the (Int, Int, …​) tuple) carry the actual shape across the wire and let receivers allocate the right buffer size.

  • The type-level dimensions (the (d1 * d2 * …​) Nat expression on the Vector) let the compiler verify that the flat buffer length matches the shape — statically when the dims are known, deferred when they are not.

  • The device residency (whether a tensor lives on CPU or GPU) is deliberately omitted. Device location is local to a compute node and never meaningful across the wire; this matches the convention adopted by NumPy .npy, Apache Arrow IPC, ONNX, HDF5, and Protobuf TensorProto. The language-specific pack and unpack functions handle host-device transfers transparently when needed.

You almost never invoke pack or unpack by hand — the compiler routes serialization through the Packable instances automatically at every language boundary. The user-facing surface is just the abstract type (Matrix m n Real) and the runtime constructors (zeros2, identity, matmul, …​). Direct construction of a tensor literal in pure Morloc, when needed, goes through pack on the wire form:

m :: Matrix 2 3 Real
m = pack ((2, 3), [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
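To make the wire form concrete, here is a minimal Python sketch of a pack / unpack pair for a rank-2 tensor. It is an assumption-laden illustration, not Morloc's actual implementation: the names pack_matrix and unpack_matrix are hypothetical, and a nested list stands in for the native type (the real Python binding uses numpy.ndarray).

```python
from typing import List, Tuple

# Wire form: ((rows, cols), flat row-major data)
Wire = Tuple[Tuple[int, int], List[float]]
Native = List[List[float]]  # stand-in for the native matrix type


def pack_matrix(wire: Wire) -> Native:
    """Wire form -> native form, checking that the buffer matches the shape."""
    (rows, cols), flat = wire
    if rows < 0 or cols < 0:
        raise ValueError("dimensions must be non-negative")
    if len(flat) != rows * cols:
        raise ValueError(
            f"flat buffer has {len(flat)} elements, shape needs {rows * cols}"
        )
    # Row r occupies the contiguous slice [r * cols, (r + 1) * cols).
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


def unpack_matrix(matrix: Native) -> Wire:
    """Native form -> wire form (row-major flattening)."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    if any(len(row) != cols for row in matrix):
        raise ValueError("ragged rows cannot form a matrix")
    return ((rows, cols), [x for row in matrix for x in row])


m = pack_matrix(((2, 3), [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]))
```

The length check in pack_matrix is the runtime counterpart of the type-level constraint Vector (d1 * d2) a: a buffer of five elements with shape (2, 3) is rejected.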

Example: a CNN inference pipeline

Here is a small convolutional neural network for character recognition, written as a Morloc pipeline over C++ tensor functions. The architecture is: conv2d → relu → flatten → dense → argmax.

module main (predictDigit)

import root
import root-cpp

source Cpp from "cnn.hpp"
  ( "makeImage", "makeKernels", "makeBias"
  , "makeWeights", "makeDenseBias"
  , "conv2d", "reluMap", "flatten3d", "dense", "argmax"
  )

-- Construct inputs with labeled dimensions
makeImage   :: h:Int -> w:Int -> Matrix h w Real
makeKernels :: k:Int -> fh:Int -> fw:Int -> Tensor3 k fh fw Real
makeBias    :: k:Int -> Vector k Real
makeWeights :: nout:Int -> nin:Int -> Matrix nout nin Real
makeDenseBias :: n:Int -> Vector n Real

-- Convolution: output spatial dims shrink by (kernel - 1)
conv2d :: Matrix h w Real
       -> Tensor3 k fh fw Real
       -> Vector k Real
       -> Tensor3 k (h - fh + 1) (w - fw + 1) Real

-- ReLU preserves shape
reluMap :: Tensor3 a b c Real -> Tensor3 a b c Real

-- Flatten multiplies all dimensions together
flatten3d :: Tensor3 a b c Real -> Vector (a * b * c) Real

-- Dense layer: matrix-vector multiply plus bias
dense :: Matrix m n Real -> Vector n Real -> Vector m Real -> Vector m Real

-- Find the class with highest score
argmax :: Vector n Real -> Int

Now the pipeline itself reads like a straightforward description of the architecture:

predictDigit :: Int
predictDigit =
  let image    = makeImage 5 5
      kernels  = makeKernels 2 3 3
      bias     = makeBias 2
      convOut  = conv2d image kernels bias
      activated = reluMap convOut
      flat     = flatten3d activated
      weights  = makeWeights 3 18
      denseBias = makeDenseBias 3
      logits   = dense weights flat denseBias
  in argmax logits

The compiler infers every intermediate shape from the labeled dimensions. For example, makeImage 5 5 is Matrix 5 5 Real. Convolving with 2 kernels of size 3x3 yields Tensor3 2 3 3 Real (since 5 - 3 + 1 = 3). Flattening gives Vector 18 Real (2 * 3 * 3 = 18). The dense layer takes Matrix 3 18 Real weights and a Vector 18 Real input, and the 18 must match — if you changed the kernel count or image size, the compiler would catch the mismatch.
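The shape inference walked through above can be replayed by hand. The Python sketch below is only a model of the arithmetic in the signatures; the *_shape helpers are hypothetical names introduced for illustration and do no numerical work.

```python
from math import prod
from typing import Tuple

Shape = Tuple[int, ...]


def conv2d_shape(image: Shape, kernels: Shape) -> Shape:
    """Tensor3 k (h - fh + 1) (w - fw + 1), as in the conv2d signature."""
    h, w = image
    k, fh, fw = kernels
    if fh > h or fw > w:
        raise TypeError("kernel larger than image")
    return (k, h - fh + 1, w - fw + 1)


def flatten3d_shape(tensor: Shape) -> Shape:
    """Vector (a * b * c)."""
    return (prod(tensor),)


def dense_shape(weights: Shape, x: Shape, bias: Shape) -> Shape:
    """Matrix m n -> Vector n -> Vector m -> Vector m."""
    m, n = weights
    if x != (n,):
        raise TypeError(f"input is {x}, weights expect ({n},)")
    if bias != (m,):
        raise TypeError(f"bias is {bias}, weights expect ({m},)")
    return (m,)


conv_out = conv2d_shape((5, 5), (2, 3, 3))     # (2, 3, 3)
flat = flatten3d_shape(conv_out)               # (18,)
logits = dense_shape((3, 18), flat, (3,))      # (3,)
```

Changing the kernel count from 2 to 4 makes flat equal to (36,), and dense_shape then raises TypeError against the Matrix 3 18 weights, which is the mismatch the compiler would report statically.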

Opaque dimensions

Not every operation has a predictable output shape. When the output size depends on runtime values, the dimensions are left as fresh unknowns:

-- Output size depends on how many elements pass the predicate
filter :: (a -> Bool) -> Vector m a -> Vector i a

-- Output size depends on the integer arguments
slice :: Int -> Int -> Matrix m n Real -> Matrix i j Real

The compiler accepts these but cannot check downstream dimension constraints against them. Correctness here depends on getting the runtime logic right.

What the compiler checks (and what it does not)

Morloc checks that dimensions are consistent across your compositions — but it trusts the type signatures you write for foreign functions. A C++ function declared as Matrix m n Real → Matrix n m Real but actually implementing the identity will not be caught. This is the same tradeoff as linking against a C header file: the types are a contract, and the implementation is expected to honor it.

Arithmetic constraints (like m * n = 18) are checked when all variables are known. When some variables remain free, the check is deferred. If it can never be resolved, it is effectively unchecked.

5.9. Tables

Morloc has a built-in Table type for columnar data, parameterised by a row count and a column schema:

type Table (n :: Nat) (r :: Rec)

The two parameters carry compile-time information that is erased at runtime:

  • n :: Nat — the row count, a type-level natural number

  • r :: Rec — the column schema, a type-level mapping from field names to types (e.g. {x=Int, y=Str})

Tables travel between pools through the Apache Arrow C Data Interface in shared memory. Cross-language calls hand off column buffers by reference — no marshalling, no copying. In Python a Table is a pyarrow.RecordBatch, in C++ it is mlc::ArrowTable (a move-only wrapper around ArrowSchema/ArrowArray), and in R it is an arrow::RecordBatch.

Building and inspecting tables

The table module provides one constructor (asCol) and a small set of introspection functions. Multi-column tables are built by composition:

import table-py    -- one of table-py / table-cpp / table-r

-- Single-column constructor: lift a Vector into a one-column table
-- whose column name is `f`. The label `f:Str` introduces `f` as both
-- the runtime column name and a type-level Str variable.
asCol :: f:Str -> Vector n a -> Table n (Singleton f a)

-- Runtime introspection.
nrow  :: Table n r -> Int
ncol  :: Table n r -> Int
names :: Table n r -> [Str]

A typical multi-column build chains asCol with setCol (described below):

let xs = [0, 1, 2]       :: Vector 3 Int
    ys = ["a", "b", "c"] :: Vector 3 Str
    zs = [0.5, 1.0, 1.5] :: Vector 3 Real
    t  = setCol "z" zs (setCol "y" ys (asCol "x" xs))
       :: Table 3 {x = Int, y = Str, z = Real}
in nrow t   -- 3

Row operations

Row-only operations preserve the column schema but may change the row count. The "open" output row count m is left polymorphic; the caller fixes it by annotation or by use site.

-- Take rows in the half-open range [start, end). Bounds are clamped
-- non-negative; if start >= end the result is empty; if end > nrow
-- the end is silently clamped to nrow. Identical semantics to the
-- `slice` method of root's Indexed typeclass:
--   sliceRows 0 0 t           -- empty
--   sliceRows 0 (nrow t) t    -- the whole table
--   sliceRows 1 (nrow t) t    -- drop the first row
--   sliceRows 0 5 t           -- "head 5"
--   sliceRows (nrow t - 5) (nrow t) t   -- "tail 5"
sliceRows :: start:Int -> end:Int -> Table n r -> Table m r

-- Boolean-mask selection. The mask must have the same length as the
-- table; rows where the mask is True are kept.
filterRows :: Vector n Bool -> Table n r -> Table m r

-- Gather rows by integer indices. Indices may repeat or be out of
-- their original order; out-of-range indices error at runtime.
-- This is the primitive that subsumes the historical `reverseRows`:
--   pickRows (reverse (range 0 (nrow t - 1))) t   -- reverse rows
pickRows :: Vector m Int -> Table n r -> Table m r

-- Drop duplicate rows (whole-row equality).
distinctRows :: Table n r -> Table m r

-- Stable multi-key sort. Each entry of the spec list is a (column
-- name, ascending?) pair, where True means ascending and False
-- descending; later columns are tie-breakers for earlier ones.
sortRows :: [(Str, Bool)] -> Table n r -> Table n r
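The clamping rules of sliceRows and the tie-breaking rule of sortRows are easy to get wrong, so here is a small Python model of both. It is a sketch under stated assumptions: a table is represented as a dict of equal-length column lists, and slice_rows and sort_rows are hypothetical names, not the table module's implementation.

```python
from typing import Dict, List, Tuple

Table = Dict[str, list]  # column name -> column values (equal lengths assumed)


def nrow(t: Table) -> int:
    return len(next(iter(t.values()))) if t else 0


def slice_rows(start: int, end: int, t: Table) -> Table:
    """Rows in [start, end), with bounds clamped as described in the comments above."""
    n = nrow(t)
    start = max(start, 0)          # negative bounds clamp to zero
    end = min(max(end, 0), n)      # end past the table clamps to nrow
    if start >= end:
        return {name: [] for name in t}
    return {name: col[start:end] for name, col in t.items()}


def sort_rows(spec: List[Tuple[str, bool]], t: Table) -> Table:
    """Stable multi-key sort; each spec entry is (column, ascending?)."""
    order = list(range(nrow(t)))
    # Sort by the last key first; because list.sort is stable, earlier keys
    # applied afterwards take priority and later keys act as tie-breakers.
    for name, ascending in reversed(spec):
        order.sort(key=lambda i: t[name][i], reverse=not ascending)
    return {name: [col[i] for i in order] for name, col in t.items()}


t = {"k": [2, 1, 2, 1], "v": ["a", "b", "c", "d"]}
head2 = slice_rows(0, 2, t)
by_k_then_v_desc = sort_rows([("k", True), ("v", False)], t)
```

As in the limitation noted for sortRows, a column name missing from the table fails only at runtime here (a KeyError).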

Column operations

Column operations touch the schema. The label syntaxes f:Str and l:[Str] introduce a runtime column name (or list of names) as a type-level variable, so the compiler can compute the result schema exactly. The row-arithmetic notation (r - f) + Singleton f a reads "drop the field f from r, then extend with a fresh field f at type a." The Singleton constructor builds a one-field record {f = a}.

-- Extract a named column as a Vector. The result element type is
-- determined by the type-level lookup `ProjectField r f`, which
-- reduces to the column's type when r is ground. Compile-time error
-- if `f` is absent in `r`.
getCol :: f:Str -> Table n r -> Vector n (ProjectField r f)

-- Set or replace a column. If `f` is already in the schema, the
-- column is replaced in place; otherwise it is appended. Either way
-- the result schema is `(r - f) + Singleton f a`.
setCol :: f:Str -> Vector n a -> Table n r
                              -> Table n ((r - f) + Singleton f a)

-- Drop columns by literal-list label. Drop-of-absent is benign per
-- element, matching the type-level (r - l) semantics.
dropCols :: l:[Str] -> Table n r -> Table n (r - l)

-- Project to columns by literal-list label, in the order given. The
-- implicit Subset constraint emitted from `Restrict r l` ensures
-- every requested column actually exists in r; a missing column is
-- a typecheck error.
selectCols :: l:[Str] -> Table n r -> Table n (Restrict r l)

-- Runtime [Str] projection (escape hatch). The result column schema
-- cannot be tracked at the type level; the output's row type is a
-- free Rec variable that the caller binds. Use `selectCols` instead
-- when the column names are known statically.
selectColsDyn :: [Str] -> Table n r1 -> Table n r2

-- Rename a single column. The result schema drops `f` and adds `g`
-- at the same type; compile-time error if `f` is absent in `r`.
renameCol :: f:Str -> g:Str -> Table n r
                            -> Table n ((r - f) + Singleton g (ProjectField r f))
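The schema expressions in these signatures can be read as ordinary operations on a name-to-type mapping. The Python sketch below models only the type-level side, under the assumption that a schema is a dict from column name to type name; the function names are hypothetical and TypeError stands in for a compile-time error.

```python
from typing import Dict, List

Schema = Dict[str, str]  # column name -> type name, e.g. {"x": "Int"}


def set_col_schema(r: Schema, f: str, a: str) -> Schema:
    """(r - f) + Singleton f a: drop f if present, then add it at type a."""
    out = {name: ty for name, ty in r.items() if name != f}
    out[f] = a
    return out


def drop_cols_schema(r: Schema, names: List[str]) -> Schema:
    """r - l: dropping an absent column is benign."""
    return {name: ty for name, ty in r.items() if name not in names}


def select_cols_schema(r: Schema, names: List[str]) -> Schema:
    """Restrict r l: every requested column must exist in r."""
    missing = [name for name in names if name not in r]
    if missing:
        raise TypeError(f"columns not in schema: {missing}")
    return {name: r[name] for name in names}


def rename_col_schema(r: Schema, f: str, g: str) -> Schema:
    """(r - f) + Singleton g (ProjectField r f)."""
    if f not in r:
        raise TypeError(f"column {f!r} not in schema")
    out = {name: ty for name, ty in r.items() if name != f}
    if g in out:
        # Assumption: + rejects a name that is already present.
        raise TypeError(f"column {g!r} already in schema")
    out[g] = r[f]
    return out


r = {"x": "Int", "y": "Str"}
replaced = set_col_schema(r, "x", "Real")
renamed = rename_col_schema(r, "y", "label")
```

Dict equality ignores key order, so the sketch says nothing about column position; it only tracks which names map to which types.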

Concatenation

Two tables can be stacked row-wise (when their schemas match) or column-wise (when their row counts match):

-- Row-wise: row counts add, schemas must match.
rbind :: Table n1 r -> Table n2 r -> Table (n1 + n2) r

-- Column-wise: row counts must match, schemas merge.
cbind :: Table n r1 -> Table n r2 -> Table n (r1 + r2)

Type-level row arithmetic

When concrete row counts are known, the compiler evaluates them and checks the result. Stacking a Table 5 r on a Table 7 r produces a Table 12 r, and trying to use that result somewhere expecting a Table 13 r is a type error. When the row counts are still generic variables the check is deferred until sizes become known.

mkLeft  :: Table 5 {x=Int}
mkRight :: Table 7 {x=Int}

bound :: Table 12 {x=Int}
bound = rbind mkLeft mkRight              -- 5 + 7 == 12, checked

Type-level column algebra

Tables compose column-wise through + on the row type. The compiler enforces that overlapping column names are an error: extending {x=Int} with {x=Real} does not silently coerce. To replace a column, drop it first (r - "f") then extend (+ Singleton "f" t), which is exactly what setCol does internally.

mkA :: Table 5 {x=Int}
mkB :: Table 5 {y=Str}

both :: Table 5 ({x=Int} + {y=Str})
both = cbind mkA mkB
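The two concatenation rules can be summarized as a pair of checks on (row count, schema) pairs. The following Python sketch is a model of the typing rules only, with hypothetical names (rbind_type, cbind_type) and TypeError standing in for a compile-time error.

```python
from typing import Dict, Tuple

Schema = Dict[str, str]
TableType = Tuple[int, Schema]  # (row count, column schema)


def rbind_type(a: TableType, b: TableType) -> TableType:
    """Table n1 r -> Table n2 r -> Table (n1 + n2) r."""
    (n1, r1), (n2, r2) = a, b
    if r1 != r2:
        raise TypeError("rbind requires identical schemas")
    return (n1 + n2, dict(r1))


def cbind_type(a: TableType, b: TableType) -> TableType:
    """Table n r1 -> Table n r2 -> Table n (r1 + r2)."""
    (n1, r1), (n2, r2) = a, b
    if n1 != n2:
        raise TypeError(f"cbind requires equal row counts: {n1} vs {n2}")
    overlap = sorted(set(r1) & set(r2))
    if overlap:
        # + on row types treats overlapping names as an error, never a coercion.
        raise TypeError(f"overlapping column names: {overlap}")
    return (n1, {**r1, **r2})


bound = rbind_type((5, {"x": "Int"}), (7, {"x": "Int"}))
both = cbind_type((5, {"x": "Int"}), (5, {"y": "Str"}))
```

Here bound has row count 12 and both has schema {x=Int, y=Str}; calling cbind_type with {x=Int} and {x=Real} raises instead of merging.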

Polymorphism

Tables interact cleanly with row polymorphism. A function that takes a table of any shape and just inspects it needs no concrete schema:

shape :: Table n r -> (Int, Int)
shape t = (nrow t, ncol t)

The r here is a Rec variable — it stands for any column schema. The function compiles once and runs against tables with any columns the user hands it.

Cross-language data flow

The Arrow C Data Interface backs every cross-pool table handoff. A C++ pool returning a Table 1000000 r to a Python pool sends only an SHM offset and a schema descriptor across the socket; the Python receiver imports the same column buffers without copying. The same applies between R and Python, R and C++, etc.

A small worked example — one Python source, one C++ slicer:

module main (top10)

import table-py
import table-cpp

source Py from "data.py" ("loadCensus")

-- Python loads the data; C++ trims to the first ten rows. The
-- table travels through SHM with no copy.
loadCensus :: Int -> Table n {state=Str, pop=Int}

top10 :: Int -> Table m {state=Str, pop=Int}
top10 year = sliceRows 0 10 (loadCensus year)

The compiler picks sliceRows from table-cpp automatically — the result of loadCensus already lives in SHM, so the C++ side just adjusts a slice descriptor. No data leaves shared memory.

Reading tables from files

When a Table argument is supplied to a compiled program, the runtime detects the on-disk format and lands the data directly in SHM:

# Literal JSON, row-oriented
./prog summarize '[{"state":"WA","pop":7705281},{"state":"OR","pop":4237256}]'

# JSON file (column-oriented works too)
./prog summarize census.json

# Arrow IPC file -- detected by the ARROW1 magic.
./prog summarize census.arrow

# Parquet file -- detected by the PAR1 magic at head and tail.
./prog summarize census.parquet

# CSV / TSV -- chosen by file extension.
./prog summarize census.csv
./prog summarize census.tsv

# Standard input
cat census.json | ./prog summarize -

JSON accepts both row form ([{col: v, …​}, …​]) and column form ({col: [v, …​], …​}). The two are equivalent on the wire. Arrow IPC and Parquet are content-detected by file magic; CSV and TSV are detected by extension and parsed against the declared column schema (header row required, comma for .csv and tab for .tsv).
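The detection order and the equivalence of the two JSON forms can be sketched in a few lines of Python. This is an illustrative model, not the runtime's code: detect_format and to_column_form are hypothetical names, and falling back to JSON when nothing else matches is an assumption made for the sketch.

```python
import json
from typing import Dict, List


def detect_format(filename: str, head: bytes, tail: bytes = b"") -> str:
    """Content magic first (Arrow IPC, Parquet), then file extension."""
    if head.startswith(b"ARROW1"):
        return "arrow"
    if head.startswith(b"PAR1") and tail.endswith(b"PAR1"):
        return "parquet"
    lower = filename.lower()
    if lower.endswith(".csv"):
        return "csv"
    if lower.endswith(".tsv"):
        return "tsv"
    return "json"  # assumption: anything else is treated as JSON


def to_column_form(text: str) -> Dict[str, List]:
    """Normalize row-form or column-form JSON to column form."""
    data = json.loads(text)
    if isinstance(data, dict):
        return data  # already {col: [v, ...], ...}
    names = list(data[0]) if data else []
    columns: Dict[str, List] = {name: [] for name in names}
    for row in data:
        if set(row) != set(names):
            raise ValueError("every row must have the same columns")
        for name in names:
            columns[name].append(row[name])
    return columns


rows = '[{"state":"WA","pop":7705281},{"state":"OR","pop":4237256}]'
cols = '{"state":["WA","OR"],"pop":[7705281,4237256]}'
same = to_column_form(rows) == to_column_form(cols)
```

Both inputs normalize to the same column mapping, which is the sense in which the two JSON forms are equivalent.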

The schema declared in the morloc signature drives validation: a column missing from the file or carrying the wrong type is rejected with a clear error before the data reaches the pool. Nullable Arrow / Parquet columns are accepted into a non-Optional column when the runtime null count is zero, mirroring how pyarrow and arrow-cpp treat default-nullable schemas.

Writing tables to files

Functions whose return type is a Table can write the result in any of the same formats. The output format is selected by --output-form:

./prog top10 2024 --output-form json     # default; row-oriented JSON
./prog top10 2024 --output-form arrow    # Arrow IPC, ARROW1 framed
./prog top10 2024 --output-form parquet  # Parquet
./prog top10 2024 --output-form csv      # CSV with header row

Round-tripping is byte-stable for the columnar contents:

./prog top10 2024 --output-form arrow > census.arrow
./prog top10 2024 --output-form parquet > census.parquet
diff <(./prog identity census.arrow) <(./prog identity census.parquet)

The runtime ships the Arrow, Parquet, and CSV libraries inside the compiled nexus binary, so there is no per-language dependency on PyArrow, arrow-cpp, or arrow-r at the user’s pool layer for IO. Pools see only the Arrow C Data Interface.

Limitations

The current implementation supports primitive column types — Bool, Int, Real, sized signed and unsigned integers, sized floats, and Str. Nullable columns parse correctly when null counts are zero. Nested column types (list-of, struct-of, dictionary-encoded) are not yet supported. Date, Timestamp, and Duration round-trip as the underlying integer or string but do not yet have first-class morloc primitives.

Tables are immutable in place — every column-modifying operation produces a new table. The Arrow SHM layer is reference-counted across pools so the cost of building "new" tables is normally just a descriptor update, but a true in-place mutation API does not exist.

sortRows does not yet check at compile time that every column name in its spec list appears in the row schema; a missing column raises at runtime. Joins, group-by, aggregate, and column-type casting belong to follow-on modules and are not part of table itself.

6. Human and Machine Interfaces

6.1. Docstrings

A Morloc module describes a set of functions, their types, and their descriptions. We have already covered terms and types; now we cover the descriptions attached to modules, functions, arguments, and fields. This extra data is stored in specialized comments (docstrings) that describe the terms and add modifications such as defaults. The most obvious use of these annotations is in specializing CLI interfaces, discussed in the next section, but they also inform the generation of rich APIs (see the HTTP/TCP/socket section).

Docstrings can be attached to type definitions, not just function signatures. This is useful for avoiding repetition. Without type-level docstrings, you would need to annotate every use of a type:

decode ::
  --' The secret key to use for decoding
  --' literal: true
  --' metavar: KEY
  Key ->
  --' The ciphertext to decode
  --' literal: true
  --' metavar: CIPHERTEXT
  CipherText ->
  --' The decoded plaintext
  PlainText

encode ::
  --' The secret key to use for encoding
  --' literal: true
  --' metavar: KEY
  Key ->
  --' The plaintext to encode
  --' literal: true
  --' metavar: PLAINTEXT
  PlainText ->
  --' The encoded ciphertext
  CipherText

The descriptions, literal tags, and metavar names are repeated for every occurrence of Key, CipherText, and PlainText. Instead, you can attach docstrings directly to the type definitions:

--' A secret key
--' literal: true
--' metavar: KEY
type Key = Str

--' An encrypted message
--' literal: true
--' metavar: CIPHERTEXT
type CipherText = Str

--' A decrypted message
--' literal: true
--' metavar: PLAINTEXT
type PlainText = Str

--' Decode a ciphertext with a key
decode :: Key -> CipherText -> PlainText

--' Encode a plaintext with a key
encode :: Key -> PlainText -> CipherText

The CLI and API generators will pull the descriptions, tags, and metavar names from the type definitions automatically. This keeps function signatures clean and ensures consistent documentation across every function that uses these types.

6.2. The Command Line Interface

Building a Morloc module will generate a CLI tool where exported functions are presented as typed subcommands.

Here is a minimal example of propagated function descriptions:

import root-py

source Py from "main.py" ("foo", "bar")

--' Take two reals and do thing
foo :: Real -> Real -> Real

--' Convert a list of reals into a thing
bar :: [Real] -> Real

The special comment --' introduces a docstring that is attached to the following type signature and will be propagated through to the code generated by the backend.

$ morloc make -o nexus main.loc
$ ./nexus -h
Usage: ./nexus [OPTION...] COMMAND [ARG...]

Nexus options (must precede COMMAND):
  -h, --help           Print this help message
  -p, --print          Pretty-print output for human consumption
  -o, --output-file    Print to this file instead of STDOUT
  -f, --output-format  Output format [json|mpk|voidstar]
      --keep-null      Print top-level () or None as 'null'
                       (default: produce empty output)

Daemon mode:
  --daemon             Run as a long-lived daemon
  --http-port PORT     Listen on HTTP port (0 = ephemeral; specify only once)
  --port PORT          Listen on TCP port  (0 = ephemeral; specify only once)
  --socket PATH        Listen on Unix socket
  --port-file PATH     Write bound ports to PATH as JSON
  --eval-timeout SECS  CPU budget for /eval and /typecheck (default: 30)

Commands (call with -h/--help for more info):
  foo  Take two reals and do thing
  bar  Convert a list of reals into a thing

More detailed information about each exported subcommand may be accessed as well:

$ ./nexus foo -h
Usage: ./nexus foo ARG1 ARG2

Take two reals and do thing

Positional arguments:
  ARG1  No description given
        type: Real
  ARG2  No description given
        type: Real

Return: Real

Note: The type: lines in --help output show the wire type (the concrete general type the argument resolves to after alias expansion), not the source-level alias. So an argument declared as UserID (where type UserID = UInt32) will appear as type: UInt32 in usage. This is a deliberate choice: telling the caller exactly what shape of data to provide is more useful at the command line than echoing the domain name. The alias name is best surfaced through a docstring description, a metavar tag, or the short/long arg flag.

Docstrings may also contain tags that specify how the arguments of the exported functions map to CLI positional or optional arguments. Here is a list of the currently supported tags:

  • name - give the CLI subcommand a dedicated name rather than defaulting to the Morloc function name

  • literal - treat an argument as the actual data, not a file that contains the data. Currently this is used for strings where literal: true indicates that the extra JSON quotes are not required. For example, without literal: true, a string must be passed as '"hello"' (with JSON quoting); with literal: true, it can be passed simply as hello.

  • unroll - if true, then the record argument is "unrolled" into a group of optional arguments

  • default - the default value for an argument, in JSON format. default: is only meaningful for optional arguments (those declared with arg: so they can be omitted on the command line).

  • metavar - the variable name used for the argument in the usage text

  • arg - short and long labels for this argument (e.g., "-v/--verbose")

  • true - the flag labels that toggle a boolean argument on

  • false - the flag labels that toggle a boolean argument off

  • return - a description of the returned data (this tag is equivalent to adding a description docstring to the final type in a signature)
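The effect of the literal tag on string arguments can be pictured with a short Python sketch. It models only the quoting behavior described in the list above; parse_str_arg is a hypothetical name, and the file-reading case (an argument naming a file that contains the data) is deliberately left out.

```python
import json


def parse_str_arg(raw: str, literal: bool) -> str:
    """Interpret one command-line argument destined for a Str parameter."""
    if literal:
        # literal: true -- the shell word is the string itself.
        return raw
    # Otherwise the argument must be JSON, so a string needs its own quotes.
    value = json.loads(raw)
    if not isinstance(value, str):
        raise TypeError(f"expected a JSON string, got {type(value).__name__}")
    return value


plain = parse_str_arg("hello", literal=True)
quoted = parse_str_arg('"hello"', literal=False)
```

Both calls yield the same string; passing hello without quotes and without literal: true fails because it is not valid JSON.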

Here is a longer example that showcases these tags:

module foobar (foo, bar)

import root-py

source Py from "foobar.py"
  ("foo", "bar")

--' config record
--' unroll: true
--' arg: --config
record Config where
  --' temporary directory
  --' arg: --tmp
  --' literal: true
  --' default: "/tmp"
  tmpdir :: Str

  --' cache the results
  --' true: --cache
  cache :: Bool

  --' number of threads to use
  --' arg: -t/--num-threads
  --' default: 1
  nthreads :: Int

--' do foo stuff
foo ::
  Config ->
  --' list of integers
  --' metavar: INT_LIST
  [Int] ->
  --' sum of INT_LIST
  Int

--' do bar stuff
--' return: summed values
bar ::
  --' unroll: false
  Config -> [Int] -> Int

The top-level usage statement is as follows:

$ ./foobar -h
Usage: ./foobar [OPTION...] COMMAND [ARG...]

Nexus options (must precede COMMAND):
  -h, --help           Print this help message
  -p, --print          Pretty-print output for human consumption
  -o, --output-file    Print to this file instead of STDOUT
  -f, --output-format  Output format [json|mpk|voidstar]
      --keep-null      Print top-level () or None as 'null'
                       (default: produce empty output)

Daemon mode:
  --daemon             Run as a long-lived daemon
  --http-port PORT     Listen on HTTP port (0 = ephemeral; specify only once)
  --port PORT          Listen on TCP port  (0 = ephemeral; specify only once)
  --socket PATH        Listen on Unix socket
  --port-file PATH     Write bound ports to PATH as JSON
  --eval-timeout SECS  CPU budget for /eval and /typecheck (default: 30)

Commands (call with -h/--help for more info):
  foo  do foo stuff
  bar  do bar stuff

The dedicated usage information for foo can be accessed as well. Here we see that the record Config has been unrolled into a group of optional arguments:

$ ./foobar foo -h
Usage: ./foobar foo [OPTION...] INT_LIST

do foo stuff

Positional arguments:
  INT_LIST  list of integers
            type: [Int]

Group arguments:
  Config: config record
    --config Config
        Default values for this argument group
            tmpdir :: Str
            cache :: Bool
            nthreads :: Int
    --tmp Str
        temporary directory
        type: Str
        default: "/tmp"
    --cache
        cache the results
        default: false
    -t Int, --num-threads Int
        number of threads to use
        type: Int
        default: 1

Return: Int
  sum of INT_LIST

Since each subcommand is a function, the return type is always the same. Unlike in a conventional CLI program, the arguments cannot alter the return type.

The bar subcommand explicitly does not unroll the Config record:

$ ./foobar bar -h
Usage: ./foobar bar ARG1 ARG2

do bar stuff

Positional arguments:
  ARG1  config record
        type: NamRecord Config<>
              {
                  tmpdir :: Str
                  cache :: Bool
                  nthreads :: Int
              }
  ARG2  No description given
        type: [Int]

Return: Int
  summed values

6.3. Quick evaluation with morloc eval

Morloc has three subcommands that turn source into a result, and they serve distinct roles:

  • morloc make — compile a module into an executable (the nexus and its language pools). This is the full language: a module may source foreign code, declare types, typeclasses, and instances, import local modules, and export zero, one, or many terms.

  • morloc typecheck — type-check a module without compiling or running it. Same full language as make; it only reports the inferred types of the exported terms.

  • morloc eval — compile and run a single expression. An expression composes functions that are already installed on the system; it cannot introduce new ones. An eval expression may import installed modules and use let/where/do, but it may not source foreign code or declare types, typeclasses, instances, or module structure.

The dividing line is module vs expression. make and typecheck consume a module, which can define and source new functionality and export any number of terms. eval consumes one expression assembled purely from already-installed pieces, producing exactly one result. Use eval for quick experiments, shell pipelines, and for exposing a fixed set of installed functions to callers who may only compose them — never to introduce new code.

Pass the expression inline with -e, or name a file containing it as the positional argument. The two are interchangeable: writing an -e string to a file and running morloc eval file gives the same result — the file is treated as expression text, not as a module.

$ morloc eval -e "import root-py; 1 + 2"
3

$ morloc eval -e 'import root-py; "foo" <> "bar"'
"foobar"

$ printf 'import root-py\n1 + 2\n' > add.loc
$ morloc eval add.loc
3

Because an eval expression can only compose installed functions, eval is also the safe surface to expose over an API or daemon: it resolves only installed modules, never local-filesystem modules, so an untrusted caller cannot source arbitrary foreign code or reach a module they uploaded. A local import — a bare name that resolves on the filesystem, or a dot-prefixed name (.utils) — is rejected in eval mode; build programs that depend on local modules with morloc make instead. The --allow-local-modules flag re-enables local resolution for local development only and is insecure for server use.

6.3.1. Imports in eval strings

Morloc has no implicit prelude: every name an expression refers to must come from a module the eval string explicitly imports. Operators like + and <> are typeclass methods sourced from the standard library, so a typical eval string begins with one or more imports:

$ morloc eval -e "import root-py; import root-cpp; 1 + 2"
3

As described above, only installed modules may be imported (named bare, like root-py); local imports are rejected in eval mode. See Importing modules for the full import rules.

If no import brings the required operator or function into scope, the compiler reports an undefined-term error with a hint pointing at the fix:

$ morloc eval -e "1 + 2"
<expr>:1:2: error:
Undefined term: +
hint: an eval expression has no implicit prelude; prefix the expression with 'import root-py;' (or the module that defines +) to bring it into scope

6.3.2. Single-line layout: braces and semicolons

A Morloc source file relies on indentation to delimit blocks. An eval string is a single shell argument, so block structure must use the explicit-brace forms that the grammar provides as alternatives to the indentation-based forms. Two rules apply:

  • Top-level items (imports and the trailing expression) are separated by a literal ;. The eval preprocessor rewrites every top-level ; to a newline before handing the string to the parser, which is the same effect as starting a new top-level line in a file.

  • Block bindings inside where, let, and do are written with literal braces and semicolons: where { a = 1; b = 2 }, let { a = 1; b = 2 } in expr, do { stmt1; stmt2; expr }. Semicolons inside {…​} are preserved by the preprocessor and consumed by the parser as item separators.

A where clause that would normally span multiple indented lines in a source file:

result = a + b where
    a = 10
    b = 20

becomes, on the command line:

$ morloc eval -e 'import root-py; a + b where { a = 10; b = 20 }'
30

Likewise for let:

$ morloc eval -e 'import root-py; let { a = 10; b = 20 } in a + b'
30

A do-block (see Effects and delayed evaluation) uses the same brace-and-semicolon form:

do { stmt1; stmt2; final_expr }

These explicit-brace forms are not specific to eval — they are part of the Morloc grammar and may be used in source files too. They are simply the only practical way to write multi-binding blocks inside a single shell-quoted string.
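The two layout rules amount to a single scan that tracks brace depth. The Python sketch below shows one way such a rewrite could work; it is not the actual eval preprocessor. The function name split_top_level is hypothetical, and skipping semicolons inside double-quoted strings is an assumption added for the sketch.

```python
def split_top_level(src: str) -> str:
    """Rewrite every top-level ';' to a newline; keep ';' inside {...}."""
    out = []
    depth = 0           # current nesting depth of { }
    in_string = False   # inside a double-quoted string literal (assumed syntax)
    escaped = False     # previous character was a backslash inside a string
    for ch in src:
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth = max(depth - 1, 0)
        # Only a ';' outside all braces separates top-level items.
        out.append("\n" if ch == ";" and depth == 0 else ch)
    return "".join(out)


rewritten = split_top_level("import root-py; a + b where { a = 10; b = 20 }")
```

After the rewrite, the import and the expression sit on separate lines while the where-block keeps its semicolons for the parser to consume.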

6.3.3. Saving an eval expression as a command

--save NAME installs the compiled expression as a reusable command:

$ morloc eval --save adder -e "import root-py; 1 + 2"
$ adder
3

The installed command behaves like any other Morloc executable.

6.4. Composing CLI Tools

Since modules can both be compiled into executable command line tools and imported by other modules, we can naturally compose command line tools.

Here is a little Morloc script that imports a Python program that prints a calendar to STDERR.

module cally (cal)

import root-py

source Py from "cal.py" ("cal")

--' Print a 3-month calendar and some timezones
cal :: () -> ()

Here is another Morloc tool, which prints d20 rolls:

module dnd (d20)

import root-r

source R from "dnd.R" ("d20")

--' Roll n d20 dice
d20 :: Int -> [Int]

Now we can import both of these into a third module which will expose the functions from both the calendar and dnd modules.

module toolbox (cal, d20)

import .cally
import .dnd

This final module can be compiled and will have a usage statement like so:

Usage: ./toolbox [OPTION...] COMMAND [ARG...]

Nexus options (must precede COMMAND):
  -h, --help           Print this help message
  -p, --print          Pretty-print output for human consumption
  -o, --output-file    Print to this file instead of STDOUT
  -f, --output-format  Output format [json|mpk|voidstar]
      --keep-null      Print top-level () or None as 'null'
                       (default: produce empty output)

Daemon mode:
  --daemon             Run as a long-lived daemon
  --http-port PORT     Listen on HTTP port (0 = ephemeral; specify only once)
  --port PORT          Listen on TCP port  (0 = ephemeral; specify only once)
  --socket PATH        Listen on Unix socket
  --port-file PATH     Write bound ports to PATH as JSON
  --eval-timeout SECS  CPU budget for /eval and /typecheck (default: 30)

Commands (call with -h/--help for more info):
  cal  Print a 3-month calendar and some timezones
  d20  Roll n d20 dice

6.5. Single-command vs multi-command programs

A Morloc program can export one function or many. The two cases produce slightly different command-line interfaces.

Multi-command programs (two or more exports) require a COMMAND name on every invocation:

$ ./toolbox cal
$ ./toolbox -p d20 5    # nexus options before COMMAND

The COMMAND name partitions argv: anything before it is a nexus option, anything after it belongs to the subcommand. This separation lets the subcommand freely use any short flag (including -p, -o, -f) without colliding with the nexus.
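
The partitioning rule can be illustrated with a small Python sketch. The function partition_argv is hypothetical and is not the nexus implementation. It assumes the set of command names is known and splits at the first argument that matches one, so an option value that happens to equal a command name would be misread.

```python
def partition_argv(argv, commands):
    """Split argv into (nexus_options, command, subcommand_args).

    Hypothetical sketch: the first argument that names a known command
    is taken as COMMAND; everything before it goes to the nexus and
    everything after it goes to the subcommand.
    """
    for i, arg in enumerate(argv):
        if arg in commands:
            return argv[:i], arg, argv[i + 1:]
    raise ValueError("no COMMAND found in argv")


print(partition_argv(["-p", "d20", "5"], {"cal", "d20"}))
# (['-p'], 'd20', ['5'])
```

Because the split happens before any subcommand flag is inspected, a subcommand argument such as -p after COMMAND never reaches the nexus option parser.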

Single-command programs (exactly one export) make the COMMAND name optional. Both forms are equivalent:

$ ./hw hello
$ ./hw           # implicit: the only exported command runs

Because the COMMAND is optional, nexus options and subcommand arguments share the same argv. To keep short flags available for the program, nexus options in single-command mode must use their long form only (--print, --output-file, --output-format, --keep-null, --help). Short forms like -p, -o, and -f are passed through to the subcommand. This is why the help text for a single-command program omits the short forms from its nexus-options list.

6.6. User arguments and outputs

Note The examples in this section assume the program was compiled with an explicit output name, e.g., morloc make -o nexus main.loc.

User data is passed to Morloc executables as positional arguments to the specified function subcommand. Each argument may be a literal JSON string or a filename. For files, the format may be JSON, MessagePack, or Morloc binary (VoidStar). The Morloc nexus first checks for a ".json" extension; if found, it attempts to parse the file as JSON. Next it checks for a ".mpk" or ".msgpack" extension; if found, it attempts to parse the file as MessagePack. If neither extension is found, it attempts to parse the file first as Morloc binary, then as MessagePack, and finally as JSON. See the parse_cli_data_argument function in morloc.h for details.
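
The extension-based lookup order can be summarized in a short Python sketch. The function candidate_formats and the format labels are hypothetical, chosen for illustration. The sketch returns only the order in which parsers would be tried, does no parsing itself, and assumes extensions are matched case-sensitively.

```python
import os


def candidate_formats(path):
    """Return the parse attempts, in order, for a file argument.

    Hypothetical sketch of the order described above; the real logic
    lives in parse_cli_data_argument in morloc.h.
    """
    ext = os.path.splitext(path)[1]
    if ext == ".json":
        return ["json"]
    if ext in (".mpk", ".msgpack"):
        return ["messagepack"]
    # Unknown extension: try the binary formats before falling back to JSON
    return ["voidstar", "messagepack", "json"]


print(candidate_formats("baz.json"))  # ['json']
print(candidate_formats("data.vs"))   # ['voidstar', 'messagepack', 'json']
```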

Passing literal JSON on the command line can be a little unintuitive since extra quoting may be required. Here are a few examples:

# The Bash shell removes the outer quotes, so double quoting is required
$ ./nexus foo '"this is a literal string"'

# Single-quoting a list is fine, but inner strings still need their own quotes
$ ./nexus bar '["asdf", "df"]'

# By default, output is written in JSON format
$ ./nexus baz 1 2 3 > baz.json

# The output can be directly read by a downstream morloc program
$ ./nexus bif baz.json

Data may be written to MessagePack or VoidStar via the -f argument:

$ ./nexus -f voidstar first '[["some","random"],["data"]]' > data.vs
$ ./nexus -f json first data.vs > data.json
$ ./nexus -f mpk reverse data.json > data.mpk
$ ./nexus reverse data.mpk
"some"

The VoidStar format is the richest and is the only form that contains the schema describing the data.

A bare -- ends option parsing — everything after it is treated as a positional argument even if it looks like a flag. This is rarely needed in practice: the parser already recognizes -X as a short option only when X is alphabetic, so negative numbers like -4.0 and -7 are treated as positionals without help. The case where -- matters is passing a literal string that looks like a short option (e.g., the string -f) as a positional:

$ ./nexus echoStr -- -f       # without --, '-f' would be parsed as a flag
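
The option-recognition rules above can be sketched in Python. The names is_short_option and split_options are hypothetical, and the sketch only classifies arguments: it does not attach values to options such as -o or -f, which the real parser must do.

```python
def is_short_option(arg):
    """-X counts as a short option only when X is an ASCII letter."""
    return len(arg) >= 2 and arg[0] == "-" and arg[1].isascii() and arg[1].isalpha()


def split_options(argv):
    """Classify argv entries as options or positionals (hypothetical sketch)."""
    options, positionals = [], []
    only_positionals = False
    for arg in argv:
        if only_positionals:
            positionals.append(arg)      # everything after a bare '--'
        elif arg == "--":
            only_positionals = True      # ends option parsing
        elif arg.startswith("--") or is_short_option(arg):
            options.append(arg)
        else:
            positionals.append(arg)      # includes -4.0 and -7
    return options, positionals


print(split_options(["-4.0", "-7"]))  # ([], ['-4.0', '-7'])
print(split_options(["--", "-f"]))    # ([], ['-f'])
```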

6.6.1. Top-level null and Unit

When a JSON-format command returns () (Unit) or Null at the top level, the nexus produces empty stdout by default. This matches the UNIX convention that a tool with no result to report says nothing, which is what you usually want when piping a Morloc program into something like grep, xargs, or a status check. A () carries no information either way, and a None at the top level is normally indistinguishable from "the command ran and did its work".

When the distinction does matter — for example, when piping JSON to a downstream consumer that needs null to mean "got a null result" versus an empty file meaning "the process died before writing anything" — pass --keep-null to disable the suppression:

$ ./nexus lookup missing-key            # default: empty
$ ./nexus --keep-null lookup missing-key
null

Suppression only affects the JSON output format. Binary formats (-f mpk, -f voidstar) always emit a well-formed nil value (MessagePack 0xc0, voidstar’s nil packet) so downstream readers see the bytes they expect.

Nested null values inside tuples, records, or arrays are not suppressed — structural shape carries information that the consumer needs:

$ ./nexus pair      # returns (5, ())
[5,null]
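
The suppression rule for JSON output is small enough to state as code. The following Python sketch uses a hypothetical function, render_json_output, and models only the JSON format. It assumes compact separators to match the output shown above and says nothing about trailing newlines.

```python
import json


def render_json_output(value, keep_null=False):
    """Render a result as JSON text, suppressing a top-level null.

    Hypothetical sketch: None stands in for both () and Null. Nested
    nulls are never suppressed because only the top level is checked.
    """
    if value is None and not keep_null:
        return ""
    return json.dumps(value, separators=(",", ":"))


print(repr(render_json_output(None)))                  # ''
print(repr(render_json_output(None, keep_null=True)))  # 'null'
print(repr(render_json_output((5, None))))             # '[5,null]'
```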

6.7. System-scope environments

morloc-manager keeps environments in two parallel scopes: a per-user local scope (no privileges required) and a machine-wide system scope (root required). The --system flag selects the system scope on the subcommands that mutate it:

  • morloc-manager setup --system — write the default container engine to the system config (run with sudo).

  • morloc-manager new <name> --system — build a new environment in the system scope so it is shared across users.

  • morloc-manager rm <name> --system — remove an environment from the system scope.

  • morloc-manager nuke --system — remove all system-scope environments (and, with --images, their backing container images).

  • morloc-manager select <name> --system — write the active-environment pointer to the system config rather than the user’s local config.

Read-only subcommands also accept --system for discovery rather than mutation:

  • morloc-manager ls --system — list only system-scope environments (--local is the symmetric filter).

  • morloc-manager info [<name>] --system — describe a system-scope environment, useful when a local environment of the same name shadows it.

A regular (non-root) user can therefore find out whether a system-scope environment exists — and what it is configured with — without elevated privileges, by running morloc-manager ls --system or morloc-manager info <name> --system. Mutating subcommands will refuse to run without root and print a hint to re-invoke under sudo.

6.8. Search and install

The docstrings are used for discoverability as well. In this section I’ll cover how modules are installed as executables or standard modules and how they can be searched.

I’ll demonstrate this with a simple two-module Morloc program describing a set of DnD operations. The first module defines general random operations:

fate.loc
module fate (roll, coinToss, choose)

import root-py

source Py from "fate.py"
  ( "roll" as roll
  , "coin_toss" as coinToss
  , "choose" as choose
  )

--' Roll n d-sided dice
roll ::
  --' Number of dice
  Int ->
  --' Number of pips per die
  Int ->
  --' Roll values
  <Rand> [Int]

--' Randomly return True or False
coinToss :: <Rand> Bool

--' Randomly choose one element from a non-empty list
choose :: [a] -> <Rand> a

The sourced fate.py script contains the following code:

fate.py
import random

def choose(xs):
    return random.choice(xs)

def roll(n, d):
    return [random.randint(1, d) for _ in range(n)]

def coin_toss():
    return bool(random.randint(0,1))

We can install fate with morloc install --build ./fate. This installs the module so it can be imported by other Morloc programs, and the --build flag additionally builds an executable we can test.

Note morloc install (with or without --build) installs modules for import — from remote sources by name (e.g., morloc install root) or from local directories with ./. In contrast, morloc make --install compiles a local program and installs the resulting executable.

We can test this, for example by rolling 3d8:

$ fate roll 3 8
[8,2,5]

Next let’s build on this foundation. First let’s make a simple tavern script that helps generate new characters.

tavern.loc
module tavern (randomClass, randomRace)

import root-py
import fate (choose)

--' Select a random class
randomClass :: <Rand> Str
randomClass = choose ["Fighter", "Wizard", "Rogue", "Cleric", "Ranger", "Bard"]

--' Select a random race
randomRace :: <Rand> Str
randomRace = choose ["Human", "Elf", "Dwarf", "Halfling"]

Next let’s add a module for combat:

combat.loc
module combat (rollAdv, fighterDamage, intro)

import root-py
import root-r
import fate (roll, coinToss)

--' Roll a pair of d20 dice and keep the larger result
rollAdv :: <Rand> Int
rollAdv = do fold max 0 !(roll 2 20)

--' Damage done on hit, modifier + sum of dice rolls
damage ::
  --' Enemy Armor Class
  Int ->
  --' Attack modifier
  Int ->
  --' Attack dice
  <Rand> [Int] ->
  --' Damage modifier
  Int ->
  --' Damage dice
  <Rand> [Int] ->
  --' Total damage
  <Rand> Int
damage ac atkMod atkDice dmgMod dmgDice = do
  let atkRoll = fold max 0 !atkDice
  let atk = atkMod + atkRoll
  let dmg = dmgMod + sum !dmgDice
  ? atkRoll == 20 = 2 * dmg  -- critical
    ? atk >= ac = dmg        -- hit
    : 0                      -- miss

--' Damage calculation for a fighter
fighterDamage ::
  --' Enemy Armor Class
  Int ->
  --' Fighter's damage
  <Rand> Int
fighterDamage ac = damage ac 4 (roll 1 20) 2 (roll 2 8)

source R from "combat.R" ("intro")

--' Introduce a new battle!
intro ::
  --' Monster name
  --' literal: true
  Str ->
  --' DM's monster intro
  Str
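
For readers more familiar with Python, the logic of damage can be restated as a plain function. This is my reading of the guard chain in the Morloc definition (critical, then hit, then miss), not code produced by the compiler. The dice are passed in as already-rolled lists so the result is deterministic.

```python
def damage(ac, atk_mod, atk_dice, dmg_mod, dmg_dice):
    """Python restatement of the Morloc `damage` function (a sketch).

    atk_dice and dmg_dice are lists of already-rolled values, standing
    in for the forced <Rand> [Int] arguments.
    """
    atk_roll = max([0, *atk_dice])   # fold max 0 !atkDice
    atk = atk_mod + atk_roll
    dmg = dmg_mod + sum(dmg_dice)
    if atk_roll == 20:               # critical: double damage
        return 2 * dmg
    if atk >= ac:                    # hit
        return dmg
    return 0                         # miss


print(damage(15, 4, [20], 2, [3, 5]))  # 20
print(damage(15, 4, [11], 2, [3, 5]))  # 10
print(damage(15, 4, [10], 2, [3, 5]))  # 0
```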

We can build and install the program with:

$ morloc make --install combat.loc

This command does several things.

First it installs the combat executable to a standard path. The pool/ artifacts and all files in the current working directory need to be moved to a standard location. There are two ways you can specify the required build files.

You can specify required files with --include arguments:

$ morloc make --install combat.loc --include fate.loc --include combat.R

Or you can create a package.yaml file and add an include field. The default file can be generated for you with morloc new. You can then modify the include field list with the required files:

name: combat
version: 0.1.0
homepage: null
synopsis: null
description: null
category: null
license: MIT
author: null
maintainer: null
github: null
bug-reports: null
dependencies: []
# Files to include when installing with `morloc make --install`
include: ["combat.R"]

Then run morloc make --install combat.loc.

In both install paths, the combat source code is copied to the ~/.local/share/morloc/exe/<modname> folder and the executable script itself is written to ~/.local/share/morloc/bin/.

We can view the installed executable:

$ morloc list -v combat
Programs:
  combat  3 commands
    rollAdv :: Int
    fighterDamage :: Int -> Int
    intro :: Str -> Str

If we add the Morloc bin folder above to PATH, then we can now use this program naturally:

$ combat -h
Usage: combat [OPTION...] COMMAND [ARG...]

Nexus options (must precede COMMAND):
  -h, --help           Print this help message
  -p, --print          Pretty-print output for human consumption
  -o, --output-file    Print to this file instead of STDOUT
  -f, --output-format  Output format [json|mpk|voidstar]
      --keep-null      Print top-level () or None as 'null'
                       (default: produce empty output)

Daemon mode:
  --daemon             Run as a long-lived daemon
  --http-port PORT     Listen on HTTP port
  --port PORT          Listen on TCP port
  --socket PATH        Listen on Unix socket

Commands (call with -h/--help for more info):
  rollAdv        Roll a pair of d20 dice and keep the larger result
  fighterDamage  Damage calculation for a fighter
  intro          Introduce a new battle!
$ combat fighterDamage 15
12
$ combat fighterDamage 15
8

We can also uninstall with morloc uninstall combat. This will cleanly remove the installed source and the installed executable script.

6.9. Building API interfaces

In addition to being CLI tools, compiled Morloc programs can run as long-lived daemons, accepting function calls over HTTP, TCP, or Unix sockets. A router aggregates multiple programs behind a single API.

No extra steps are needed to set up these extra APIs. They are already built into the executable we created in the last section. We only need to activate them.

6.9.1. HTTP protocol

We can start combat as a daemon on HTTP port 8080:

$ combat --daemon --http-port 8080 &
morloc-daemon: listening on http://0.0.0.0:8080
$ DAEMON_PID=$!

The trailing & runs the process in the background, and $! captures its PID for later shutdown (see the Shutdown section below). This command launches all language pool processes (Python and R in this case) as child processes in separate process groups. A thread pool handles concurrent requests. If a pool crashes, the daemon detects the failure and restarts the pool automatically.

We can check the daemon’s health:

$ curl -s localhost:8080/health
{"status":"ok","result":[true]}

The /health endpoint returns the liveness status of each pool.

The running daemons are discoverable:

$ curl -s localhost:8080/discover | jq .
{
  "status": "ok",
  "result": {
    "name": "combat",
    "version": 1,
    "commands": [
      {
        "name": "rollAdv",
        "type": "remote",
        "return_type": "Int",
        "return_schema": "<int>i4",
        "args": [],
        "desc": "Roll a pair of d20 dice and keep the larger result"
      },
      {
        "name": "fighterDamage",
        "type": "remote",
        "return_type": "Int",
        "return_schema": "<int>i4",
        "args": [
          {
            "kind": "pos",
            "type": "Int",
            "schema": "<int>i4"
          }
        ],
        "desc": "Damage calculation for a fighter"
      },
      {
        "name": "intro",
        "type": "remote",
        "return_type": "Str",
        "return_schema": "<str>s",
        "args": [
          {
            "kind": "pos",
            "type": "Str",
            "schema": "<str>s"
          }
        ],
        "desc": "Introduce a new battle!"
      }
    ]
  }
}

Functions can be called over the port:

$ curl -s -X POST localhost:8080/call/rollAdv -d '[]'
{"status":"ok","result":18}

$ curl -s -X POST localhost:8080/call/fighterDamage -d '[15]'
{"status":"ok","result":12}

Bad commands will return sensible errors:

$ curl -s -X POST localhost:8080/call/fireball -d '[]'
{"status":"error","error":"Unknown command: fireball"}
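
The same calls can be made from a program. Here is a minimal Python sketch using only the standard library. The helper names build_call_request and call are mine, and the sketch assumes the daemon does not care about the Content-Type header, since the curl examples above send none explicitly.

```python
import json
import urllib.error
import urllib.request


def build_call_request(base_url, command, args):
    """Build the POST request for /call/<command> with a JSON array body."""
    url = f"{base_url.rstrip('/')}/call/{command}"
    body = json.dumps(args, separators=(",", ":")).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def call(base_url, command, args=()):
    """Send the request and return the decoded JSON envelope."""
    req = build_call_request(base_url, command, list(args))
    try:
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())
    except urllib.error.HTTPError as err:
        # urllib raises on error statuses; the envelope is still in the body
        return json.loads(err.read())


# Requires a running daemon, e.g. combat --daemon --http-port 8080:
# print(call("http://localhost:8080", "fighterDamage", [15]))
```

Splitting request construction from sending keeps the URL and body logic testable without a live daemon.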

Beyond the pre-compiled commands, POST /eval and POST /typecheck take a JSON body {"expr": "…​"} and evaluate (or type-check) a single Morloc expression on the fly:

$ curl -s -X POST localhost:8080/eval -d '{"expr":"import root-py; 1 + 2"}'
{"status":"ok","result":3}

POST /eval runs the expression in the eval sandbox: it may compose installed modules and use let/where/do, but may not declare types, typeclasses, instances, source foreign code, or import local-filesystem modules. This is the intended interface for exposing a fixed set of server-side modules to untrusted callers — they can only compose what is already installed; arbitrary code upload is not possible. Use morloc make server-side to build programs that need local modules. POST /typecheck only reports the inferred type and never executes anything, so it is not sandboxed the same way; treat its input as you would any type-check request.

Every response also carries an HTTP status code that reflects the class of outcome, so HTTP clients with built-in retry / branching logic (curl --fail, axios, fetch) work as expected without parsing the JSON envelope. The JSON body is still always present for clients that prefer it.

Code  Meaning                When
200   OK                     Success. The body’s result field carries the return value.
204   No Content             The response to a CORS preflight OPTIONS request. The daemon never dispatches OPTIONS through any handler; it answers immediately with the standard Access-Control-Allow-* headers and an empty body.
400   Bad Request            The request was malformed: missing required field, unparseable args JSON, wrong number of arguments, a value that didn’t match its declared schema, or a string containing an embedded NUL byte the target language can’t represent.
404   Not Found              The path or named resource doesn’t exist: an unknown HTTP endpoint (GET /nope), an unknown command (POST /call/fireball), or a binding name that wasn’t registered (DELETE /bindings/missing).
408   Request Timeout        A POST /eval or POST /typecheck expression consumed more CPU than the --eval-timeout budget (default 30s) and was killed by the kernel via SIGXCPU. This guard only applies to those two endpoints, which fork morloc eval/typecheck as a subprocess. POST /call/ requests dispatch into a pre-compiled pool worker and are not bounded by --eval-timeout; long-running calls there are allowed.
500   Internal Server Error  A genuinely server-side failure: a pool socket error, a fork/pipe failure, the eval engine returning an unexpected error, or any other state that wasn’t the client’s fault.
503   Service Unavailable    The service is temporarily unable to handle the request but the caller should retry. The daemon emits 503 during the brief window where it is tearing down and respawning a crashed pool; the router emits 503 when forwarding a request to a daemon in that state, or when its cluster /health reports at least one program unhealthy. All 503 responses include Retry-After: 1. Clients with built-in retry middleware (curl --retry, axios-retry, hyper-retry) will back off and re-issue automatically.

The same status-code mapping applies whether you call a single daemon directly or hit the router; the router forwards classification through unchanged. Client errors (4xx) describe something the caller can fix; server errors (5xx) describe something the caller should retry or report. Unix-socket and TCP clients see the same classification via the JSON envelope’s status and error fields, though they don’t get the HTTP-level Retry-After hint on 503.
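
A client that does its own branching might reduce the status codes to a handful of actions. The Python sketch below is one possible policy, not something the daemon prescribes. The function name next_action and the action labels are hypothetical.

```python
def next_action(status):
    """Map an HTTP status code to a coarse client-side action (a sketch)."""
    if status in (200, 204):
        return "done"
    if status == 503:
        return "retry"             # honor Retry-After, then re-issue
    if 400 <= status < 500:
        return "fix-request"       # the caller can correct the request
    if 500 <= status < 600:
        return "retry-or-report"   # server-side failure
    return "unexpected"


for code in (200, 404, 408, 500, 503):
    print(code, next_action(code))
```

Checking 503 before the general 5xx range matters, because it is the one server error that explicitly invites a retry.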

6.9.2. TCP protocol

HTTP adds overhead per request: headers, text parsing, and the full HTTP framing around each message. When your client is a program rather than a browser or curl, you can skip all of that. The TCP protocol uses a compact binary framing — just a 4-byte big-endian length prefix followed by the JSON payload. This makes it well suited for service-to-service communication, high-throughput automated pipelines, or any context where you control both ends of the connection and want minimal overhead.

Start a daemon on TCP port 9001:

$ combat --daemon --port 9001 &
morloc-daemon: listening on tcp://127.0.0.1:9001

Unlike the HTTP protocol, you can’t use curl to talk to a TCP daemon. You need a client that speaks the length-prefixed binary framing. Here is a minimal Python client:

tcp_client.py — minimal TCP client
import socket, struct, json

def recvall(s, n):
    data = b''
    while len(data) < n:
        chunk = s.recv(n - len(data))
        if not chunk:
            raise RuntimeError("Connection closed")
        data += chunk
    return data

def call(host, port, method, command=None, args=None):
    msg = {"method": method}
    if command: msg["command"] = command
    if args is not None: msg["args"] = args

    payload = json.dumps(msg).encode()
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((host, port))
    # send 4-byte big-endian length, then the JSON payload
    s.sendall(struct.pack('>I', len(payload)) + payload)

    # read the 4-byte response length, then the response
    resp_len = struct.unpack('>I', recvall(s, 4))[0]
    resp = recvall(s, resp_len)
    s.close()
    return json.loads(resp)

print(call("localhost", 9001, "call", "rollAdv"))
# {"status": "ok", "result": 18}

print(call("localhost", 9001, "call", "fighterDamage", [15]))
# {"status": "ok", "result": 12}

print(call("localhost", 9001, "health"))
# {"status": "ok", "result": [true]}

print(call("localhost", 9001, "discover"))
# {"status": "ok", "result": {"name": "combat", "commands": [...]}}

The request is a JSON object with a method field ("call", "discover", or "health"), an optional command field naming the function, and an optional args array.
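
The framing itself can be isolated from the socket code, which makes it easy to test without a running daemon. The Python sketch below encodes a request into a frame and decodes a frame from a byte buffer that may be incomplete. The names encode_frame and decode_frame are hypothetical.

```python
import json
import struct


def encode_frame(msg):
    """Serialize msg as JSON and prepend a 4-byte big-endian length."""
    payload = json.dumps(msg).encode("utf-8")
    return struct.pack(">I", len(payload)) + payload


def decode_frame(buf):
    """Return (message, remaining_bytes), or (None, buf) if incomplete."""
    if len(buf) < 4:
        return None, buf
    (length,) = struct.unpack(">I", buf[:4])
    if len(buf) < 4 + length:
        return None, buf
    return json.loads(buf[4:4 + length]), buf[4 + length:]


frame = encode_frame({"method": "call", "command": "fighterDamage", "args": [15]})
msg, rest = decode_frame(frame)
print(msg["command"], rest)
# fighterDamage b''
```

Returning the unconsumed bytes lets a caller decode several frames from one receive buffer.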

6.9.3. Unix socket protocol

For processes running on the same machine, Unix domain sockets are the fastest option. They bypass the entire network stack — no TCP handshake, no port allocation, no loopback routing. This is how Morloc pools communicate with the nexus internally.

To start a daemon on a Unix socket:

$ combat --daemon --socket /tmp/combat.sock &
morloc-daemon: listening on unix:///tmp/combat.sock

The wire protocol is identical to TCP: a 4-byte big-endian length prefix followed by the JSON payload. The only difference is the socket type.

unix_client.py — minimal socket client
import socket, struct, json

def recvall(s, n):
    data = b''
    while len(data) < n:
        chunk = s.recv(n - len(data))
        if not chunk:
            raise RuntimeError("Connection closed")
        data += chunk
    return data

def call(sock_path, method, command=None, args=None):
    msg = {"method": method}
    if command: msg["command"] = command
    if args is not None: msg["args"] = args

    payload = json.dumps(msg).encode()
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(sock_path)
    s.sendall(struct.pack('>I', len(payload)) + payload)

    resp_len = struct.unpack('>I', recvall(s, 4))[0]
    resp = recvall(s, resp_len)
    s.close()
    return json.loads(resp)

print(call("/tmp/combat.sock", "call", "rollAdv"))
# {"status": "ok", "result": 18}

print(call("/tmp/combat.sock", "call", "fighterDamage", [15]))
# {"status": "ok", "result": 12}

print(call("/tmp/combat.sock", "discover"))
# {"status": "ok", "result": {"name": "combat", "commands": [...]}}

6.9.4. Running all protocols at once

You don’t have to choose. One daemon can listen through all three protocols at the same time:

$ combat --daemon \
      --http-port 8080 \
      --port 9001 \
      --socket /tmp/combat.sock
morloc-daemon: listening on unix:///tmp/combat.sock
morloc-daemon: listening on tcp://127.0.0.1:9001
morloc-daemon: listening on http://0.0.0.0:8080

All three protocols hit the same daemon process and share the same pool processes. A request arriving over HTTP, TCP, or the Unix socket is dispatched identically — only the framing differs.

Ephemeral ports

If you don’t care which port the daemon binds to — which is the common case for tests, CI jobs, or any orchestrator running many daemons in parallel — pass 0 and the OS picks a free one for you. The actual port appears in the stderr ready line, and can also be written to a file in a fixed JSON shape:

$ combat --daemon --http-port 0 --port 0 --port-file ports.json &
morloc-daemon: listening on tcp://127.0.0.1:46217
morloc-daemon: listening on http://0.0.0.0:39381

$ cat ports.json
{"http":39381,"tcp":46217,"unix":null}

The file is written atomically (via rename) only after every listener is bound, so a stat-waiting client never sees a half-written file. Missing listeners are null, never absent — the schema is fixed.
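
The write-then-rename pattern and a waiting client can be sketched in Python as follows. Both functions are hypothetical stand-ins: write_ports_atomically imitates what the daemon is described as doing, and wait_for_ports is one way a test harness might consume the file.

```python
import json
import os
import tempfile
import time


def write_ports_atomically(path, ports):
    """Write JSON to a temp file in the same directory, then rename it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        json.dump(ports, fh)
    os.replace(tmp, path)  # readers see either no file or the whole file


def wait_for_ports(path, timeout=5.0, poll=0.05):
    """Poll until the port file exists, then return its parsed contents."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            with open(path) as fh:
                return json.load(fh)
        time.sleep(poll)
    raise TimeoutError(f"{path} did not appear within {timeout}s")
```

Because the schema is fixed, a client can index ports["http"], ports["tcp"], and ports["unix"] without checking for missing keys; it only needs to handle a null value.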

6.9.5. From single daemons to a router

Everything above shows a single program running as a daemon. This is enough when you have one service, but Morloc programs are designed to be composed. You might have a tavern program that picks character classes and races, and a combat program that resolves attacks and damage. Each is its own compiled Morloc program with its own pools.

You could start each one as an independent daemon on its own port and have your client keep track of which port maps to which program. But that gets tedious. The router solves this: it presents a single HTTP endpoint that discovers and manages all your installed Morloc programs.

The following diagram illustrates how a client request flows through the router to a program daemon and its language pools:

                  Client
                    |
                    | HTTP: POST /call/tavern/randomClass -d '[]'
                    v
             +--------------+
             |    Router    |  morloc-nexus --router --http-port 9090
             |  (HTTP:9090) |  Reads manifests from fdb/ at startup
             +--------------+
              /            \
    Unix socket            Unix socket
            /                \
  +-----------+         +-----------+
  |  tavern   |         |  combat   |
  |  daemon   |         |  daemon   |
  +-----------+         +-----------+
       |                 /        \
       v                v          v
    Python           Python        R
     pool             pool        pool

Each daemon is a child process of the router, started lazily on first request. The router and its daemons communicate over Unix sockets using the same length-prefixed JSON protocol described above.

6.9.6. Router mode

Setup

To make a program available to the router, install it with morloc make --install. This copies the program binary and writes a manifest file to the fdb/ directory, where the router discovers programs at startup.

$ morloc make --install -o tavern tavern.loc
Installed 'tavern' to ~/.local/share/morloc/bin/tavern

$ morloc make --install -o combat combat.loc
Installed 'combat' to ~/.local/share/morloc/bin/combat

$ ls ~/.local/share/morloc/fdb/
combat.manifest  tavern.manifest

Starting the router

$ morloc-nexus --router --http-port 9090
morloc-router: listening on http://0.0.0.0:9090
morloc-router: 2 programs registered
morloc-router:   - combat (3 commands)
morloc-router:   - tavern (2 commands)

Listing programs

$ curl -s localhost:9090/programs | python3 -m json.tool
{
    "programs": [
        {
            "name": "combat",
            "running": false,
            "commands": [
                {"name": "rollAdv",        "type": "remote", "return_type": "Int"},
                {"name": "fighterDamage",  "type": "remote", "return_type": "Int"},
                {"name": "intro",          "type": "remote", "return_type": "Str"}
            ]
        },
        {
            "name": "tavern",
            "running": false,
            "commands": [
                {"name": "randomClass",  "type": "remote", "return_type": "Str"},
                {"name": "randomRace",   "type": "remote", "return_type": "Str"}
            ]
        }
    ]
}

Programs start lazily — running: false until the first call.

Per-program discovery

You can discover commands for a specific program without starting its daemon:

$ curl -s localhost:9090/discover/tavern | python3 -m json.tool
{
    "name": "tavern",
    "commands": [...]
}

Calling functions

Calls are routed by program name in the URL: /call/<program>/<command>.

$ curl -s -X POST localhost:9090/call/tavern/randomClass -d '[]'
{"status":"ok","result":"Rogue"}

$ curl -s -X POST localhost:9090/call/tavern/randomRace -d '[]'
{"status":"ok","result":"Elf"}

$ curl -s -X POST localhost:9090/call/combat/rollAdv -d '[]'
{"status":"ok","result":17}

$ curl -s -X POST localhost:9090/call/combat/fighterDamage -d '[15]'
{"status":"ok","result":12}

$ curl -s -X POST localhost:9090/call/combat/intro -d '["Goblin"]'
{"status":"ok","result":"A wild Goblin appears!"}

The first call to a program starts its daemon automatically. Subsequent calls reuse the running daemon with no startup cost. If a daemon crashes between calls, the router detects the failure and restarts it transparently.

Error handling

$ curl -s -X POST localhost:9090/call/dungeon/explore -d '[]'
{"status":"error","error":"Unknown program: dungeon"}

Independent daemons vs router-managed daemons

A daemon started manually (e.g., combat --daemon --http-port 8080) is completely independent of the router. The router only knows about programs whose manifests are in the fdb/ directory, and it starts its own daemon instances as child processes. If you start a daemon on your own and also have the same program registered with the router, you will have two separate daemon processes — each with its own pool processes and its own state.

6.9.7. Shutdown

Send SIGTERM (or SIGINT) to stop a daemon or router gracefully. The daemon sends SIGTERM to each pool process group, waits briefly for clean exit, then sends SIGKILL to any stragglers. Unix socket files are removed.

$ kill $DAEMON_PID
morloc-daemon: shutting down

$ kill $ROUTER_PID
morloc-router: shutting down

When a router shuts down, it terminates all the daemons it started. There is currently no way to stop an individual program’s daemon through the router API — the router manages their lifecycles internally. If you need to restart a specific program, restart the router.
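
A supervisor written in Python can apply the same terminate-wait-kill sequence to a daemon it launched. This sketch is an assumption-laden illustration: stop_gracefully is a hypothetical helper, the two-second grace period is arbitrary, and it signals a single child process rather than a process group.

```python
import subprocess


def stop_gracefully(proc, grace=2.0):
    """Ask a child process to exit, then force it if it does not.

    proc is a subprocess.Popen object, for example one created with
    subprocess.Popen(["combat", "--daemon", "--http-port", "8080"]).
    """
    proc.terminate()                  # SIGTERM on POSIX
    try:
        proc.wait(timeout=grace)      # give it time to clean up
    except subprocess.TimeoutExpired:
        proc.kill()                   # SIGKILL for stragglers
        proc.wait()
    return proc.returncode
```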

6.9.8. Summary

Mode       Flag                Description
Daemon     --daemon            Run one program as a persistent service
Router     --router            Aggregate all installed programs (fdb/) behind one API
HTTP       --http-port <n>     RESTful JSON API (curl-friendly); 0 = ephemeral
TCP        --port <n>          Length-prefixed JSON over TCP; 0 = ephemeral
Socket     --socket <path>     Length-prefixed JSON over Unix socket
Discovery  --port-file <path>  Write bound ports to PATH as JSON (atomic)
fdb        --fdb <path>        Override manifest directory (default: ~/.local/share/morloc/fdb)

7. Modules and Libraries

7.1. Importing modules

Every Morloc file is a module. A module declaration names the module and optionally lists the terms it exports:

module mylib (foo, bar)

This declares a module named mylib that exports foo and bar. Only exported terms are visible to other modules that import this one.

If a module exports everything it defines, you can use the wildcard form:

module mylib (*)

If a module’s export list is empty, it exports no named terms. This is useful for modules whose only purpose is to provide typeclass instances — instances travel with the module rather than the export list, so once a module is imported all of its instances become available. An instance-only module can therefore write:

module myinstances ()
import .base
type Py => Int = "int"
instance Addable Int where
    source Py from "ops.py" ("add_int" as add)

To use the instances, write import .myinstances in the consuming module. As in Haskell, typeclass methods are not picked individually; importing the module makes the entire instance available.

For submodules that exist only to be imported by a parent, you can omit the name entirely:

module (*)

An anonymous module’s name is inferred from its file path relative to the importing module. For example, if main.loc imports .utils, the compiler will resolve the module in utils/main.loc (or utils.loc) and assign it the name utils.

A single file may contain more than one module declaration. Each module keyword starts a new module; everything indented under it (or appearing before the next module keyword) belongs to that module. A module body may also be empty. The following file defines two modules:

module utils (helper)
import root-py
helper :: Int -> Int
helper x = x + 1

module main (run)
import utils (helper)
run :: Int -> Int
run x = helper x

Morloc distinguishes between two kinds of imports: system modules and local modules.

System modules are installed packages that live in ~/.local/share/morloc/lib/. They are imported by name, without any prefix:

import root-py
import root-cpp

System modules are installed with morloc install:

$ morloc install root
$ morloc install root-py

Local modules are files or directories within your own project. They are imported with a dot (.) prefix to distinguish them from system modules:

import .utils (helper)
import .lib.math (square)

The dot prefix tells the compiler to look for the module relative to the directory of the importing file, not in the system library.

Both system and local imports support selective imports. Without a selector, all exported terms are brought into scope:

import root-py             -- import everything from root-py
import .mylib              -- import everything from local mylib
import .mylib (foo, bar)   -- import only foo and bar from local mylib

When you write import .foo, the compiler looks for the module relative to the directory containing the current file. It checks two locations, in order:

  1. A directory module: foo/main.loc

  2. A file module: foo.loc

Dot-separated paths map to nested directories. For example, import .lib.math resolves to either lib/math/main.loc or lib/math.loc.
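The lookup order above can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not the compiler's code; the function name local_module_candidates is hypothetical.

```python
from pathlib import PurePosixPath

def local_module_candidates(import_name: str, importing_dir: str = ".") -> list[str]:
    """Return the files checked for a dot-prefixed import, in lookup order."""
    if not import_name.startswith("."):
        raise ValueError("local imports start with a dot")
    # ".lib.math" -> ["lib", "math"]; each segment is one directory level
    parts = import_name[1:].split(".")
    base = PurePosixPath(importing_dir).joinpath(*parts)
    directory_module = base / "main.loc"                 # checked first
    file_module = base.with_name(base.name + ".loc")     # checked second
    return [str(directory_module), str(file_module)]

print(local_module_candidates(".lib.math"))
print(local_module_candidates(".bif.biz", "bar/baz"))
```

The second call mirrors a nested import: the path is built from the importing file's directory, never from the project root.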

Here is an example project layout:

project/
  main.loc            -- module main, imports .utils and .lib.math
  utils.loc           -- module (*), a flat file module
  utils.py
  lib/
    math/
      main.loc        -- module (*), a directory module
      main.py

The top-level main.loc imports both:

module main (negate_square, square_negate)

type Py => Real = "float"

import .utils (negate)
import .lib.math (square)

negate_square :: Real -> Real
negate_square x = negate (square x)

square_negate :: Real -> Real
square_negate x = square (negate x)

The flat file utils.loc exports negate:

module (*)

source Py from "utils.py" ("negate")

type Py => Real = "float"

negate :: Real -> Real

And the directory module lib/math/main.loc exports square:

module (*)

source Py from "main.py" ("square")

type Py => Real = "float"

square :: Real -> Real

Local modules can also import other local modules. The path is always relative to the importing file. For example, if bar/baz/main.loc needs to import a sibling at bif/biz/, it writes:

import .bif.biz (mul)

This resolves relative to bar/baz/, looking for bar/baz/bif/biz/main.loc.

Since root is also the name of a system module, a local directory named root/ must be imported with the dot prefix to avoid ambiguity:

import root         -- imports the system "root" module
import .root        -- imports the local "root/" directory

The dot prefix always forces local resolution, so there is never a collision between local and system module names.

7.2. Installing modules

The default Morloc modules are hosted on GitHub under the morloclib organization. Modules can be installed with the morloc install command:

$ morloc install internal
$ morloc install root
$ morloc install root-cpp
$ morloc install root-py
$ morloc install root-r

Installed modules are stored in ~/.local/share/morloc/lib/ and can be imported in any Morloc script.

To view the modules that are currently installed, you can run morloc list. This will list all installed modules, their version, and their short descriptions. Adding the -v option additionally prints the types of all exported terms.

To view just the exports of one module, include a pattern that matches the module of interest:

$ morloc list -v il
Modules:
  internal
    pack :: a -> b
    unpack :: b -> a
    (.) :: (b -> c) -> (a -> b) -> a -> c
    ($) :: (a -> b) -> a -> b

Here il matches any module with a name including the ordered characters i and l — only internal in this case.
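The matching rule described here is an ordered-subsequence test. A minimal Python sketch follows, assuming the match is case-sensitive and that the characters need not be adjacent; the function name matches_pattern is hypothetical and this is not the morloc implementation.

```python
def matches_pattern(pattern: str, module_name: str) -> bool:
    """True if the characters of pattern occur in module_name in order."""
    remaining = iter(module_name)
    # `ch in remaining` consumes the iterator up to and including the match,
    # so later characters can only match later positions.
    return all(ch in remaining for ch in pattern)

installed = ["internal", "root", "root-cpp", "root-py", "root-r"]
print([name for name in installed if matches_pattern("il", name)])
```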

7.3. The universal library

A module may export types, typeclasses, and function signatures but no implementations. Such a module would be completely language agnostic. A powerful approach to building libraries in the Morloc ecosystem is to write one module that defines all types, then $n$ modules for language-specific implementations that import the type module, and then one module to import and merge all implementations. This is the approach taken by the base module and by other core libraries.

In the future, when hundreds of languages are supported and some functions have many implementations per language, it will be desirable to have finer control over which functions are used. One solution would be to add filters to the import statement, so that import expressions become a kind of query. Alternatively, constraints could be added at the function level, making the entire Morloc script a query over the universal library. This would be especially powerful when imported types are expressed as unknowns to be inferred from usage.

8. Installation, Versions, and Deployment

morloc-manager is distributed as a static binary alongside Morloc releases for x86 Linux, Linux ARM, and macOS. Binaries are available on the GitHub releases page.

8.1. Initial setup

Before first use, configure the default container engine:

$ morloc-manager setup --engine podman    # or: --engine docker
$ morloc-manager setup                    # show current settings

If only one engine is installed, morloc-manager new will auto-detect it. If both Podman and Docker are installed, running setup once is required to choose a default.

8.2. Creating an environment

An environment is a self-contained Morloc installation: a base container image, an optional Dockerfile layer, engine flags, and its own module and binary directories. Everything in Morloc happens inside an environment.

$ morloc-manager new                            # interactive wizard
$ morloc-manager new myenv                      # use latest morloc release
$ morloc-manager new myenv --version 0.73.0     # pin a specific version
$ morloc-manager new myenv --image ubuntu:22.04 # use any container image

After creating an environment, activate it with select:

$ morloc-manager select myenv

Subsequent run, freeze, and start commands operate on the active environment.

8.3. Running commands

The run subcommand executes a command inside the active environment:

$ morloc-manager run -- morloc make -o hello hello.loc
$ morloc-manager run -- ./hello 21
$ morloc-manager run --shell           # interactive shell

The current directory is bind-mounted into the container so source files and build artifacts are shared with the host. On SELinux systems (Fedora, RHEL), the :z relabel suffix is applied automatically. You must work in a subdirectory of your home directory (not ~ itself, /tmp, or other system directories).

8.4. Managing environments

You can keep multiple environments and switch between them at any time:

$ morloc-manager ls                # list all environments
$ morloc-manager select myenv      # switch to myenv
$ morloc-manager info myenv        # detailed info for myenv
$ morloc-manager info              # overview of all environments
$ morloc-manager rm myenv          # remove an environment

Environments can be created at local scope (per-user, the default) or system-wide (--system, requires root). Local environments shadow system environments of the same name. A regular user can select a system environment read-only without being able to modify it.

$ sudo morloc-manager new shared --version 0.73.0 --system
$ morloc-manager select shared
$ morloc-manager run -- morloc --version

8.5. Customizing environments with a Dockerfile layer

For extra system dependencies — Python packages, R libraries, apt packages — you can layer a custom Dockerfile on top of the base image. The simplest way is to generate a stub:

$ morloc-manager new scipy --version 0.73.0 --dockerfile-stub

This creates a template at ~/.config/morloc/environments/scipy/Dockerfile. Edit it to add your dependencies:

# CONTAINER_BASE is replaced at build time with the environment's base image
ARG CONTAINER_BASE=scratch
FROM ${CONTAINER_BASE}

RUN pip install scikit-learn pandas matplotlib

Then rebuild:

$ morloc-manager update scipy

You can also pass an existing Dockerfile directly, or include extra build context files:

$ morloc-manager new ml --version 0.73.0 --dockerfile ./Dockerfile
$ morloc-manager new ml --version 0.73.0 --dockerfile ./Dockerfile -i ./data.csv

Once built, morloc-manager run uses the customized image transparently. The manager tracks a SHA-256 hash of the Dockerfile and skips rebuilding when nothing has changed.
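The skip-if-unchanged check can be pictured as a digest comparison. The sketch below is an assumption about the general technique, not the manager's code: the function names are hypothetical, and where the manager stores the recorded hash is not specified here.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional

def dockerfile_digest(path: Path) -> str:
    """SHA-256 of the Dockerfile contents, as a hex string."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def needs_rebuild(dockerfile: Path, recorded_digest: Optional[str]) -> bool:
    """Rebuild when no digest was recorded or the contents changed."""
    return recorded_digest != dockerfile_digest(dockerfile)

with tempfile.TemporaryDirectory() as tmp:
    dockerfile = Path(tmp) / "Dockerfile"
    dockerfile.write_text("FROM ${CONTAINER_BASE}\n")
    recorded = dockerfile_digest(dockerfile)       # digest saved after a build
    unchanged = needs_rebuild(dockerfile, recorded)
    dockerfile.write_text("FROM ${CONTAINER_BASE}\nRUN pip install pandas\n")
    changed = needs_rebuild(dockerfile, recorded)

print(unchanged, changed)
```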

8.6. Extra container flags

For runtime changes beyond the image — GPU passthrough, extra volumes, network modes — use --flagfile or repeat -x:

$ morloc-manager new gpu --version 0.73.0 \
    -x --gpus all \
    -x "-v /data/models:/models"

Each environment stores its flags in ~/.config/morloc/environments/<name>/env.flags. One flag per line; lines starting with # are comments. You can edit the file directly and the changes apply on the next run.
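The file format is simple enough to read by hand. As a sketch, assuming that blank lines are ignored and that surrounding whitespace is not significant (neither is stated above), a reader for env.flags might look like this; parse_flagfile is a hypothetical name.

```python
def parse_flagfile(text: str) -> list[str]:
    """Return one flag per non-comment line of an env.flags file."""
    flags = []
    for line in text.splitlines():
        stripped = line.strip()
        # Assumption: blank lines are skipped; '#' starts a comment line.
        if not stripped or stripped.startswith("#"):
            continue
        flags.append(stripped)
    return flags

example = """# GPU passthrough
--gpus all

-v /data/models:/models
"""
print(parse_flagfile(example))
```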

8.7. Deployment: serve and freeze

The manager provides two complementary deployment paths:

  • start serves an environment locally by bind-mounting its state into a read-only container. Fast, requires no build step.

  • freeze/unfreeze exports an environment as a portable image for external deployment (registries, Kubernetes).

8.7.1. Local serving with start

$ morloc-manager start                   # serve active environment on :8080
$ morloc-manager start myenv -p 9090:8080
$ morloc-manager status                  # list running servers
$ morloc-manager logs myenv              # view logs
$ morloc-manager logs myenv -f           # stream logs
$ morloc-manager stop myenv              # stop the container

The serve container runs morloc-nexus --router as its entrypoint, exposing:

GET /health               Health check
GET /programs             List available programs
GET /discover/<prog>      Show exported commands and their type signatures
POST /call/<prog>/<cmd>   Invoke a specific command
POST /eval                Compose and evaluate expressions from installed modules

8.7.2. Portable images with freeze/unfreeze

For deploying to a different machine, export the environment’s state and build a standalone image:

$ morloc-manager freeze -o ./my-freeze
$ morloc-manager unfreeze --from ./my-freeze/state.tar.gz --tag myservice:v1
$ docker run -d -p 8080:8080 myservice:v1

freeze produces two artifacts:

  • state.tar.gz — the contents of lib/, fdb/, bin/, exe/, and opt/

  • freeze-manifest.json — an auditable record listing the Morloc version, timestamp, installed modules with SHA-256 checksums, compiled programs with their exported commands, and the base image and environment layer used

unfreeze builds a minimal serve image that contains the morloc-nexus router, the Morloc compiler (needed for the /eval endpoint), language runtimes, and the frozen state — no GHC, Stack, or build tools. The resulting image can be pushed to a registry and deployed externally; it does not need morloc-manager to run.

freeze requires at least one program compiled with morloc make --install.

8.8. Beyond fixed APIs: composable function services

A conventional API exposes a fixed set of endpoints. A Morloc serve container goes further: in addition to calling pre-compiled commands, the /eval endpoint lets callers compose new expressions from the functions available in installed modules. Because Morloc’s type system spans all installed languages, these compositions are type-checked before execution, and the runtime handles all cross-language marshalling automatically.

This means a single deployed container does not just serve a finite set of functions — it serves the entire composition space of every function in every installed module. An agent or client can discover available functions via /discover, read their type signatures, and synthesize novel pipelines that were never anticipated at build time, all within the safety guarantees of the type system.

The safety model for this relies on several layers:

  • The serve-mode parser accepts only a restricted subset of the Morloc language: callers can compose primitives and functions defined in frozen modules, but cannot source new foreign code or import modules that are not already installed

  • Module resolution is checked at compile time — only functions from installed modules are reachable

  • The type system prevents invalid compositions across language boundaries

  • The container runs with a read-only filesystem and resource limits

8.9. System-wide environments with Podman

Podman stores images per-user. After creating a system environment with sudo, configure rootless Podman to read the rootful image store by adding this line to the [storage.options] section of /etc/containers/storage.conf:

additionalimagestores = ["/var/lib/containers/storage"]

No Podman restart is needed; the setting is re-read on every invocation.

9. Build Architecture

9.1. Architecture Overview

A compiled Morloc program has two kinds of components: one nexus and one or more pools.

The nexus is a pre-compiled Rust binary that serves as the CLI entry point. It reads a JSON manifest describing the program’s structure, parses command-line arguments, and orchestrates execution. The nexus starts pool daemons, sends them call packets over Unix domain sockets, and prints the result. When done, it tears everything down.

Pools are language-specific daemons — one per language used in the program. A pool contains all functions from its language, compiled into a single unit. Pools listen on Unix sockets for call packets, dispatch to the appropriate function, and return results. All pools support concurrency, starting with one worker and growing dynamically as needed. C++ pools use OS threads for true parallelism; Python and R pools use worker processes to handle concurrent requests.

Data moves between pools via Unix domain sockets. For small values (up to 64 KB), the serialized data is embedded directly in the packet — no shared memory is needed. For large values, data is placed in a shared memory region and only an 8-byte pointer travels over the socket. Pools can also call each other directly for cross-language ("foreign") calls, without routing through the nexus.


Here are the runtime rules you should be able to count on. Any violations should be considered bugs.

  1. STDOUT and STDERR pass through. Any output written to stdout or stderr by user functions is never intercepted or buffered by the Morloc runtime. It passes directly to the terminal.

  2. Errors become tracebacks. All exceptions raised by user functions are caught by the pool and returned as error packets. As the error propagates back through foreign calls to the nexus, each layer appends context, building a full cross-language traceback that the user can read.

  3. Intra-pool calls are near-native. Calls between functions within the same pool go through a simple dispatch table — there is no serialization, no socket overhead, and no IPC. Performance should be nearly native.

  4. Inter-pool calls cost socket time plus marshalling. A call between pools (or between the nexus and a pool) pays only the few microseconds of Unix socket round-trip plus the cost of data marshalling. In the best case, the data in shared memory can be directly used between programs and marshalling cost is zero. In practice, copies are often needed — for example, Python demands ownership of its strings even when the data could in principle be shared directly.

9.2. Cross-language function calls

When a Morloc program composes functions from different languages, the compiler must bridge the language boundary. A key design principle is that Morloc never serializes functions. Functions cannot be meaningfully transmitted between language runtimes — there is no way to pickle a C++ template instantiation into something Python can call directly. Instead, Morloc generates wrapper functions that make IPC calls to the foreign language pool.

9.2.1. How it works

Each function in a compiled Morloc program is assigned a unique integer identifier called a manifold ID (mid). Every pool maintains a dispatch table mapping manifold IDs to concrete function implementations. When a function needs to call a function in another language, it does not call it directly — it sends a call packet containing the target manifold ID and serialized arguments over a Unix domain socket to the foreign pool, which dispatches the call and returns the result.

The compiler generates all of this automatically. Consider a program where Python’s pmap calls a C++ sum function:

module foo (sumOfSums)

import root-cpp
import root-py

source Py from "foo.py" ("pmap")
source Cpp from "foo.hpp" ("sum")

pmap :: (a -> b) -> [a] -> [b]
sum :: [Real] -> Real

sumOfSums = sum . pmap sum

When pmap is compiled in the Python pool, it receives sum not as a C++ function pointer, but as a Python wrapper function generated by the compiler. This wrapper:

  1. Serializes its arguments into the binary wire format

  2. Sends a call packet (with the C++ sum manifold ID) over the Unix socket to the C++ pool

  3. Reads the result packet back

  4. Deserializes the result into a Python value

From Python’s perspective, this wrapper is an ordinary Python callable. It can be passed to multiprocessing.Pool.map, stored in a list, or used anywhere a function is expected — because it is a regular Python function. The cross-language call is hidden inside it.

9.2.2. What the generated code looks like

The Python pool contains a wrapper like this (simplified):

def m1384(x):
    packed = morloc.put_value(x, "<list>a<float>f8")
    result = morloc.foreign_call(cpp_socket_path, 1384, [packed])
    return morloc.get_value(result, "<float>f8")

Here 1384 is the manifold ID assigned to sum, and cpp_socket_path is the path to the C++ pool’s Unix domain socket. The morloc.foreign_call function handles the IPC: it sends a call packet, waits for the response, and returns the raw result packet. The put_value and get_value functions handle serialization and deserialization using a compact binary schema string.

On the C++ side, the pool’s dispatch table routes the manifold ID to the actual sum implementation:

// compiler-generated dispatch
uint8_t* local_dispatch(uint32_t mid, const uint8_t** args) {
    switch(mid) {
        case 1384: return m1384(args[0]);  // calls sum
        // ...
    }
}

9.2.3. Performance implications

Intra-pool calls (functions in the same language) are direct native function calls — no serialization, no sockets, no dispatch table lookup. The only overhead is that functions may be wrapped in thin wrapper functions, but even this can be eliminated with the %inline pragma, which inlines the function body at the call site.

Inter-pool calls (cross-language) pay the cost of:

  • Serializing the arguments (proportional to data size)

  • A Unix socket round-trip (microseconds for small payloads)

  • Deserializing the result

In special cases, serialization can be avoided entirely. When data has the same binary representation in both languages, only a pointer to shared memory needs to cross the socket — no copying or conversion. Currently this zero-copy path is supported for Arrow tables; support for fixed-size numeric vectors and tensors is planned.

For higher-order functions like pmap, each invocation of the wrapped function is a separate IPC round-trip. If pmap sum is called on a list of 1000 elements, that is 1000 cross-language calls. This is the expected cost of language interop — the alternative would be to batch the data and send it all at once, but that would require changing the function’s interface.

When performance matters, the best strategy is to keep hot loops within a single language. The compiler’s implementation selection algorithm already optimizes for this: given multiple implementations of a function, it prefers the one that avoids cross-language calls.

9.3. Protocols

Note: This section is primarily of interest to users extending the Morloc ecosystem (e.g., adding a new language backend) or debugging at the binary level.

This section describes the binary formats used for communication between the nexus and pools: the manifest, the packet protocol, the shared memory layout, and the voidstar data format.

9.3.1. The manifest

The manifest is a JSON object embedded in the wrapper script (after a MANIFEST marker). It describes the program’s structure. Key fields:

Field       Description
version     Manifest format version (currently 1)
name        Program name
build_dir   Absolute path to the build directory
pools       Array of pool descriptors (see below)
commands    Array of exported commands (see below)

Each pool entry:

  • lang — Language name (e.g., "python3", "cpp")

  • exec — Command-line tokens to launch the pool (e.g., ["python3", "pools/pool.py"])

  • socket — Unix domain socket basename (e.g., "pipe-python3")

Each command entry:

  • name — CLI subcommand name

  • type — "remote" (dispatched to a pool) or "pure" (evaluated in the nexus)

  • mid — Manifold index identifying the function in the pool

  • pool — Index into the pools array

  • needed_pools — Indices of all pools that must be running

  • arg_schemas / return_schema — Schema strings describing argument and return types (see Schema strings)

  • args — CLI argument descriptors
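To make the field descriptions concrete, here is a Python sketch that loads a manifest and looks up the pools a command needs. The field names come from the lists above; every value in the example manifest (paths, executable names, schema strings, the manifold index) is made up for illustration, and pools_for_command is a hypothetical helper rather than part of the nexus.

```python
import json

# Hypothetical manifest: field names follow this section, values are invented.
manifest_text = """
{
  "version": 1,
  "name": "foo",
  "build_dir": "/tmp/foo-build",
  "pools": [
    {"lang": "python3", "exec": ["python3", "pools/pool.py"], "socket": "pipe-python3"},
    {"lang": "cpp", "exec": ["pools/pool-cpp.out"], "socket": "pipe-cpp"}
  ],
  "commands": [
    {"name": "sumOfSums", "type": "remote", "mid": 1, "pool": 1,
     "needed_pools": [0, 1], "arg_schemas": ["aaf8"], "return_schema": "f8",
     "args": []}
  ]
}
"""

def pools_for_command(manifest: dict, command_name: str) -> list[dict]:
    """Return the pool descriptors that must be running for a command."""
    command = next(c for c in manifest["commands"] if c["name"] == command_name)
    # needed_pools holds indices into the top-level pools array
    return [manifest["pools"][i] for i in command["needed_pools"]]

manifest = json.loads(manifest_text)
for pool in pools_for_command(manifest, "sumOfSums"):
    print(pool["lang"], pool["exec"], pool["socket"])
```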

9.3.2. Packet protocol

All communication uses a binary packet protocol over Unix domain sockets. Every packet starts with a 32-byte packed header:

Table 2. Packet header fields
Field     Type       Width (bytes)   Description
magic     uint32_t   4               Constant 0x0707f86d (little-endian)
plain     uint16_t   2               Plain membership (reserved, always 0)
version   uint16_t   2               Format version (currently 0)
flavor    uint16_t   2               Metadata convention (reserved)
mode      uint16_t   2               Evaluation mode (reserved)
command   union      8               Type-specific command data (see below)
offset    uint32_t   4               Bytes of metadata between header and payload
length    uint64_t   8               Payload length in bytes

Total packet size is always 32 + offset + length.
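The header layout maps directly onto a packed struct. The following Python sketch assumes that the fields appear on the wire in the order listed in the table and that every field, not only magic, is little-endian; the helper names are hypothetical.

```python
import struct

# magic, plain, version, flavor, mode, command (8 raw bytes), offset, length
HEADER = struct.Struct("<IHHHH8sIQ")   # "<" = little-endian, no padding
MAGIC = 0x0707F86D

def pack_header(command: bytes, offset: int, length: int) -> bytes:
    """Build a 32-byte header; reserved fields are written as 0."""
    if len(command) != 8:
        raise ValueError("the command field is exactly 8 bytes")
    return HEADER.pack(MAGIC, 0, 0, 0, 0, command, offset, length)

def packet_size(header: bytes) -> int:
    """Total packet size: 32 + offset + length."""
    magic, _plain, _version, _flavor, _mode, _command, offset, length = \
        HEADER.unpack(header[:HEADER.size])
    if magic != MAGIC:
        raise ValueError("not a morloc packet")
    return HEADER.size + offset + length

# A header whose command field starts with type tag 0x02 (ping);
# the remaining command bytes are assumed to be zero.
ping = pack_header(bytes([0x02]) + bytes(7), offset=0, length=0)
print(HEADER.size, packet_size(ping))
```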

Packet types

The command field’s first byte is a type tag:

Data packet (0x00) — Carries data or error messages:

Field         Type      Width (bytes)   Description
type          uint8_t   1               0x00
source        uint8_t   1               0x00=MESG (inline), 0x01=FILE (path), 0x02=RPTR (shared memory pointer)
format        uint8_t   1               0x00=JSON, 0x01=MSGPACK, 0x02=TEXT, 0x03=DATA, 0x04=VOIDSTAR
compression   uint8_t   1               Reserved, always 0
encryption    uint8_t   1               Reserved, always 0
status        uint8_t   1               0x00=PASS, 0x01=FAIL
padding       uint8_t   2               Zero

For small data (up to 64 KB serialized), the most common combination is source=MESG, format=VOIDSTAR — the voidstar binary is embedded directly in the packet payload, avoiding shared memory entirely. For large data, the combination is source=RPTR, format=VOIDSTAR — only an 8-byte relative pointer travels over the socket, and the data lives in shared memory.

When status=FAIL, the packet carries a UTF-8 error message (source=MESG, format=TEXT).
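The three combinations described above (inline value, shared-memory pointer, error message) can be sketched as follows. This assumes the field order of the tables, little-endian encoding throughout, an inclusive 64 KB threshold, and no metadata blocks; the function names are hypothetical, and the relptr argument stands in for wherever the value was already placed in shared memory.

```python
import struct

HEADER = struct.Struct("<IHHHH8sIQ")        # 32-byte packet header
DATA_COMMAND = struct.Struct("<BBBBBB2x")   # type, source, format, compression,
                                            # encryption, status, 2 padding bytes
MAGIC = 0x0707F86D
SOURCE_MESG, SOURCE_RPTR = 0x00, 0x02
FORMAT_TEXT, FORMAT_VOIDSTAR = 0x02, 0x04
STATUS_PASS, STATUS_FAIL = 0x00, 0x01
INLINE_LIMIT = 64 * 1024                    # assumption: "up to 64 KB" is inclusive

def data_packet(source: int, fmt: int, status: int, payload: bytes) -> bytes:
    command = DATA_COMMAND.pack(0x00, source, fmt, 0, 0, status)
    return HEADER.pack(MAGIC, 0, 0, 0, 0, command, 0, len(payload)) + payload

def value_packet(voidstar: bytes, relptr: int) -> bytes:
    """Embed small values inline; send only an 8-byte pointer for large ones."""
    if len(voidstar) <= INLINE_LIMIT:
        return data_packet(SOURCE_MESG, FORMAT_VOIDSTAR, STATUS_PASS, voidstar)
    return data_packet(SOURCE_RPTR, FORMAT_VOIDSTAR, STATUS_PASS,
                       struct.pack("<q", relptr))

def error_packet(message: str) -> bytes:
    return data_packet(SOURCE_MESG, FORMAT_TEXT, STATUS_FAIL,
                       message.encode("utf-8"))

small = value_packet(b"\x01", relptr=0)
large = value_packet(bytes(70_000), relptr=4096)
failed = error_packet("boom")
print(len(small), len(large), len(failed))
```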

Call packet (0x01) — Instructs a pool to execute a function:

Field        Type       Width (bytes)   Description
type         uint8_t    1               0x01
entrypoint   uint8_t    1               0x00=LOCAL, 0x01=REMOTE_SFS
padding      uint8_t    2               Zero
midx         uint32_t   4               Manifold index (which function to call)

The payload is a contiguous sequence of data packets, one per argument.
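A call packet can be assembled by concatenating already-built argument packets behind a header whose command field names the target function. The sketch below assumes the table's field order, little-endian encoding, and no metadata blocks; call_packet is a hypothetical helper, and the argument packet is a hand-built stand-in.

```python
import struct

HEADER = struct.Struct("<IHHHH8sIQ")     # 32-byte packet header
CALL_COMMAND = struct.Struct("<BB2xI")   # type, entrypoint, 2 padding bytes, midx
MAGIC = 0x0707F86D
ENTRYPOINT_LOCAL = 0x00

def call_packet(midx: int, argument_packets: list[bytes]) -> bytes:
    """Header with a call command, followed by one data packet per argument."""
    payload = b"".join(argument_packets)
    command = CALL_COMMAND.pack(0x01, ENTRYPOINT_LOCAL, midx)
    return HEADER.pack(MAGIC, 0, 0, 0, 0, command, 0, len(payload)) + payload

# Stand-in argument: an inline VOIDSTAR data packet carrying a single byte.
data_command = bytes([0x00, 0x00, 0x04, 0, 0, 0, 0, 0])
argument = HEADER.pack(MAGIC, 0, 0, 0, 0, data_command, 0, 1) + b"\x01"

packet = call_packet(1384, [argument])
print(len(packet), struct.unpack_from("<I", packet, 16)[0])
```

The manifold index sits at byte 16 because the command field starts at byte 12 and midx follows the type tag, entrypoint, and two padding bytes.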

Ping packet (0x02) — Header-only, no payload. The nexus pings pools to check readiness; the pool echoes it back as a pong.

Metadata blocks

Between the header and payload (in the offset region), packets can carry metadata blocks. Each has an 8-byte header:

Field   Type       Width (bytes)   Description
magic   char[3]    3               Constant "mmh"
type    uint8_t    1               0x01=SCHEMA_STRING, 0x02=XXHASH
size    uint32_t   4               Payload size in bytes
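Walking the metadata region is a matter of reading one 8-byte block header at a time. The Python sketch below assumes that blocks are packed back to back with no padding, that they fill the whole offset region, and that size is little-endian; none of this is stated above, and the helper names are hypothetical.

```python
import struct

BLOCK_HEADER = struct.Struct("<3sBI")   # magic "mmh", type, size
META_SCHEMA_STRING, META_XXHASH = 0x01, 0x02

def metadata_block(block_type: int, body: bytes) -> bytes:
    return BLOCK_HEADER.pack(b"mmh", block_type, len(body)) + body

def iter_metadata(region: bytes):
    """Yield (type, body) for each block in a packet's offset region."""
    pos = 0
    while pos < len(region):
        magic, block_type, size = BLOCK_HEADER.unpack_from(region, pos)
        if magic != b"mmh":
            raise ValueError("bad metadata magic")
        pos += BLOCK_HEADER.size
        yield block_type, region[pos:pos + size]
        pos += size

region = metadata_block(META_SCHEMA_STRING, b"ai4") + metadata_block(META_XXHASH, bytes(8))
for block_type, body in iter_metadata(region):
    print(block_type, body)
```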

9.3.3. Shared memory

Pools share data through POSIX shared memory segments rather than copying over sockets. Only relative pointers (8 bytes) travel over the wire.

Volumes

Shared memory is organized as multiple volumes (/dev/shm/morloc-<hash>_0, morloc-<hash>_1, etc.). The nexus creates the first volume (64 KB). New volumes are created automatically when space runs out (up to 32 volumes). If /dev/shm is too small (common in Docker), volumes fall back to files in the temporary directory.

Pointer types
Type                 Description
absptr_t (void*)     Virtual address in the current process. Different per process.
volptr_t (ssize_t)   Offset within a single volume (0 = first byte after the header).
relptr_t (ssize_t)   Global offset across all volumes. This is the pointer type shared between processes; it appears in data packets and in voidstar data structures.

             volume 0 (size=20)        volume 1
         ---xxxxxx........----xxxxxx............---->
 relptr           0      7          8         19
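Resolving a relptr means finding the volume whose range contains it and subtracting the sizes of the volumes before it. Here is a small Python sketch of that arithmetic, using the data capacities implied by the relptr labels in the diagram (8 bytes, then 12 bytes); resolve_relptr is a hypothetical name.

```python
def resolve_relptr(relptr: int, volume_sizes: list[int]) -> tuple[int, int]:
    """Map a global relptr to (volume_index, volptr)."""
    relative_offset = 0   # sum of the data capacities of all prior volumes
    for index, size in enumerate(volume_sizes):
        if relative_offset <= relptr < relative_offset + size:
            return index, relptr - relative_offset
        relative_offset += size
    raise ValueError("relptr is outside every volume")

volumes = [8, 12]   # relptr 0..7 in volume 0, relptr 8..19 in volume 1
print(resolve_relptr(7, volumes), resolve_relptr(8, volumes))
```

Each process would then add the volptr to its own mapping address for that volume to obtain an absptr.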
Volume header (shm_t)
Field             Type               Description
magic             unsigned int       Constant 0xFECA0DF0
volume_name       char[256]          Volume identifier
volume_index      int                Index in the pool (0, 1, 2, …)
volume_size       size_t             Usable data capacity (excludes header)
relative_offset   size_t             Sum of all prior volumes' sizes
rwlock            pthread_rwlock_t   Process-shared read-write lock
cursor            volptr_t           Current free block (allocator hint)

Block header (block_header_t, packed)
Field             Type                  Description
magic             unsigned int          Constant 0x0CB10DF0
reference_count   atomic unsigned int   Active references (0 = free)
size              size_t                Payload size in bytes (excludes header)

Blocks use reference counting. shmalloc allocates with first-fit and lazy coalescing. shfree decrements the reference count; blocks are merged during the next allocation scan.
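The allocation strategy can be modeled in a few lines. The Python toy below only illustrates first-fit with lazy coalescing and reference counts; it ignores block headers, locking, atomics, and multiple volumes, and the class names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Block:
    size: int
    reference_count: int = 0   # 0 means the block is free

class ToyVolume:
    def __init__(self, capacity: int):
        self.blocks = [Block(capacity)]

    def shmalloc(self, size: int) -> Block:
        i = 0
        while i < len(self.blocks):
            block = self.blocks[i]
            if block.reference_count == 0:
                # Lazy coalescing: absorb the run of free blocks that follows.
                while (i + 1 < len(self.blocks)
                       and self.blocks[i + 1].reference_count == 0):
                    block.size += self.blocks.pop(i + 1).size
                if block.size >= size:                 # first fit
                    if block.size > size:              # split off the unused tail
                        self.blocks.insert(i + 1, Block(block.size - size))
                        block.size = size
                    block.reference_count = 1
                    return block
            i += 1
        raise MemoryError("no free block is large enough")

    def shfree(self, block: Block) -> None:
        if block.reference_count == 0:
            raise ValueError("block is already free")
        block.reference_count -= 1     # no merging here; that happens in shmalloc

volume = ToyVolume(64)
first, second, third = (volume.shmalloc(16) for _ in range(3))
volume.shfree(first)
volume.shfree(second)
count_before = len(volume.blocks)      # freed blocks are still separate
merged = volume.shmalloc(32)           # the scan merges them, then allocates
print(count_before, [b.size for b in volume.blocks])
```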

9.3.4. Schema strings

Schema strings are a compact encoding of a data type’s binary layout. They appear in the manifest and in packet metadata.

Primitives:

Schema        Type
z             nil (1 byte)
b             bool (1 byte)
i1/i2/i4/i8   signed int (1/2/4/8 bytes)
u1/u2/u4/u8   unsigned int (1/2/4/8 bytes)
f4/f8         float (4/8 bytes)
s             variable-length UTF-8 string

Compounds:

Pattern        Description
a<elem>        Array. ai4 = array of int32.
t<N><elems>    Tuple. t2i4f8 = (int32, float64).
m<N><fields>   Record with length-prefixed keys. m2<3>agei4<4>names = {age: int32, name: string}.
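A recursive-descent parser makes the grammar above explicit. The Python sketch below covers only the forms shown in these two tables and assumes that counts and key lengths are written as decimal digits; it is not the parser in the Morloc C library, and the tree representation is invented for illustration.

```python
PRIMITIVES = {"z", "b", "s",
              "i1", "i2", "i4", "i8",
              "u1", "u2", "u4", "u8",
              "f4", "f8"}

def _read_int(s: str, pos: int):
    end = pos
    while end < len(s) and s[end].isdigit():
        end += 1
    if end == pos:
        raise ValueError(f"expected a number at position {pos}")
    return int(s[pos:end]), end

def _parse(s: str, pos: int):
    if pos >= len(s):
        raise ValueError("unexpected end of schema")
    tag = s[pos]
    if tag in "zbs":                       # one-character primitives
        return tag, pos + 1
    if tag in "iuf":                       # letter plus byte width
        code = s[pos:pos + 2]
        if code not in PRIMITIVES:
            raise ValueError(f"unknown primitive {code!r}")
        return code, pos + 2
    if tag == "a":                         # a<elem>
        elem, pos = _parse(s, pos + 1)
        return ("array", elem), pos
    if tag == "t":                         # t<N><elems>
        count, pos = _read_int(s, pos + 1)
        elems = []
        for _ in range(count):
            elem, pos = _parse(s, pos)
            elems.append(elem)
        return ("tuple", elems), pos
    if tag == "m":                         # m<N> then <len>key type, N times
        count, pos = _read_int(s, pos + 1)
        fields = []
        for _ in range(count):
            if s[pos:pos + 1] != "<":
                raise ValueError(f"expected '<' at position {pos}")
            key_len, pos = _read_int(s, pos + 1)
            if s[pos:pos + 1] != ">":
                raise ValueError(f"expected '>' at position {pos}")
            key = s[pos + 1:pos + 1 + key_len]
            value, pos = _parse(s, pos + 1 + key_len)
            fields.append((key, value))
        return ("record", fields), pos
    raise ValueError(f"unknown schema tag {tag!r}")

def parse_schema(schema: str):
    node, pos = _parse(schema, 0)
    if pos != len(schema):
        raise ValueError("trailing characters after schema")
    return node

print(parse_schema("t2i4f8"))
print(parse_schema("m2<3>agei4<4>names"))
```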

9.3.5. Voidstar binary format

Every Morloc general type maps unambiguously to a binary form that consists of several fixed-width literal types, a list container, and a tuple container. The literal types include a unit type, a boolean, signed integers (8, 16, 32, and 64 bit), unsigned integers (8, 16, 32, and 64 bit), and IEEE floats (32 and 64 bit). The list container is represented by a 64-bit size integer and a pointer to an unboxed vector. The tuple is represented as a set of values in contiguous memory. These basic types are listed below:

Table 3. Morloc primitives
Type                                       Domain                   Schema                                                         Width (bytes)
Unit                                       ()                       z                                                              1
Bool                                       True | False             b                                                              1
UInt8                                      \([0,2^{8})\)            u1                                                             1
UInt16                                     \([0,2^{16})\)           u2                                                             2
UInt32                                     \([0,2^{32})\)           u4                                                             4
UInt64                                     \([0,2^{64})\)           u8                                                             8
Int8                                       \([-2^{7},2^{7})\)       i1                                                             1
Int16                                      \([-2^{15},2^{15})\)     i2                                                             2
Int32                                      \([-2^{31},2^{31})\)     i4                                                             4
Int64                                      \([-2^{63},2^{63})\)     i8                                                             8
Float32                                    IEEE float               f4                                                             4
Float64                                    IEEE double              f8                                                             8
List x                                     lists                    a{x}                                                           \(16 + n \Vert x \Vert\)
Tuple2 x1 x2                               2-ples                   t2{x1}{x2}                                                     \(\Vert x_1 \Vert + \Vert x_2 \Vert\)
TupleX \(t_1\ ...\ t_k\)                   k-ples                   \(tkt_1\ ...\ t_k\)                                            \(\sum_{i=1}^{k} \Vert t_i \Vert\)
\(\{ f_1 :: t_1,\ ...\ , f_k :: t_k \}\)   records                  \(mk \Vert f_1 \Vert f_1 t_1\ ...\ \Vert f_k \Vert f_k t_k\)   \(\sum_{i=1}^{k} \Vert t_i \Vert\)

Here \(n\) is the number of elements in the list and \(\Vert t \Vert\) is the width in bytes of type \(t\); in the record schema, \(\Vert f_i \Vert\) is the length of the field name \(f_i\).

All basic types may be written to a schema that is used internally to direct conversions between Morloc binary and native basic types. The schema values are shown in the table above. For example, the type [(Bool, [Int8])] would have the schema at2bai1. You will not usually have to worry about these schemas, since they are mostly used internally. They are worth knowing, though, since they appear in low-level tests, generated source code, and binary data packets.

Here is an example of how the type ([UInt8], Bool), with the value ([3,4,5],True), might be laid out in memory:

---
03 00 00 00 00 00 00 00 -- first tuple element, specifies list length (little-endian)
30 00 00 00 00 00 00 00 -- first tuple element, pointer to list
01 00 00 00 00 00 00 00 -- second tuple element, with 0-padding
03 04 05                -- 8-bit values of 3, 4, and 5
---
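The layout above can be read back with Python's struct module. This sketch treats the pointer value as opaque, since the example does not say what 0x30 is relative to, and reads the elements from the trailing bytes where the listing places them; it also assumes the boolean occupies the first byte of its padded 8-byte slot.

```python
import struct

layout = bytes.fromhex(
    "03 00 00 00 00 00 00 00 "   # list length (little-endian)
    "30 00 00 00 00 00 00 00 "   # pointer to the list data
    "01 00 00 00 00 00 00 00 "   # Bool, zero-padded
    "03 04 05"                   # the three UInt8 elements
)

length, pointer = struct.unpack_from("<QQ", layout, 0)
flag = struct.unpack_from("<?", layout, 16)[0]
elements = list(layout[24:24 + length])   # u1 elements are one byte each

print(length, hex(pointer), flag, elements)
```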

Records and tables (described in detail earlier) are represented as tuples in voidstar format — field names are stored only in the type schemas. The table annotation is not just syntactic sugar for a record of lists; it is preserved through the compiler to the translator, where language-specific serialization functions may have special handling for tables.

record Person = Person { name :: Str, age :: UInt8 }
table People = People { name :: Str, age :: Int }

alice = { name = "Alice", age = 27 }
students = { name = ["Alice", "Bob"], age = [27, 25] }

The Morloc type signatures can be translated to schema strings that may be parsed by a foundational Morloc C library into a type structure. Every supported language in the Morloc ecosystem must provide a library that wraps this Morloc C library and translates to/from Morloc binary given the Morloc type schema.

9.4. Runtime and Dev Builds

The Morloc runtime — the shared library (libmorloc.so) and the nexus binary (morloc-nexus) — is written in Rust. Language-specific pools and extensions still use their own compilers (g++ for C++, etc.), but they link against the Rust-built libmorloc.so through a stable C ABI defined in a single header (morloc.h).

9.4.1. Build modes

morloc init -f installs the runtime. It looks for one of two environment variables to decide how:

Variable          Purpose
MORLOC_RUST_BIN   Path to a directory containing pre-built libmorloc.so and morloc-nexus binaries. Used for release installs, CI, and container images.
MORLOC_RUST_DIR   Path to the data/rust/ Cargo workspace inside the compiler repo. Requires a working Rust toolchain (cargo). Used for development and local builds.

At least one must be set. If both are set, MORLOC_RUST_BIN takes precedence.
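The precedence rule can be written out as a small decision function. This Python sketch only restates the rule described here; runtime_source is a hypothetical name and the error message is invented.

```python
def runtime_source(env: dict) -> tuple[str, str]:
    """Decide how the runtime would be installed from the two variables."""
    prebuilt = env.get("MORLOC_RUST_BIN")
    workspace = env.get("MORLOC_RUST_DIR")
    if prebuilt:                      # takes precedence when both are set
        return ("prebuilt", prebuilt)
    if workspace:
        return ("source", workspace)
    raise RuntimeError("set MORLOC_RUST_BIN or MORLOC_RUST_DIR first")

print(runtime_source({"MORLOC_RUST_BIN": "/path/to/binaries",
                      "MORLOC_RUST_DIR": "/path/to/compiler/data/rust"}))
print(runtime_source({"MORLOC_RUST_DIR": "/path/to/compiler/data/rust"}))
```

In a real shell session the dictionary would be os.environ.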

Using pre-built binaries (release / CI):

export MORLOC_RUST_BIN=/path/to/binaries
morloc init -f

Pre-built binaries are published with each release on GitHub Releases.

Building from source (development):

export MORLOC_RUST_DIR=/path/to/compiler/data/rust
morloc init -f

This runs cargo build --release inside the Cargo workspace and copies the artifacts into place. You need cargo (install via rustup).

9.4.2. What the runtime provides

The Rust workspace produces two artifacts:

libmorloc.so

Shared library linked by every pool. Provides the IPC layer (Unix domain sockets), shared-memory allocator, packet codec, JSON/MessagePack serialization, and worker-pool concurrency primitives.

morloc-nexus

Pre-compiled CLI entry point. Reads a .manifest JSON file, parses command-line arguments, starts pool daemons, dispatches calls, and prints results.

9.4.3. Build dependencies

Tool        When needed
cargo       Only when building from source (MORLOC_RUST_DIR). Not needed if using pre-built binaries.
gcc / g++   Still required for C++ pool compilation and Python/R language extensions. No longer needed for the runtime itself.

9.4.4. Container images

The official container images (morloc-full, morloc-serve) set MORLOC_RUST_BIN automatically during the image build. Users who install via morloc-manager or use containers do not need to set either variable manually.

10. Future Directions

10.1. Planes of libraries

Important: The infrastructure for "planes" is not yet constructed, so the following is speculative.

The concept of "planes" is central to the future organization of Morloc and is one of the primary reasons that I created it. A plane is like a namespace for a community’s modules—​but instead of organizing by category or programming language, modules in a plane share a common philosophy about quality, trustworthiness, software design and the review process.

Currently, the universe of functions is separated first by language and then by subject area. Morloc, being polyglot, allows the first mode of separation to be lifted, so language does not need to separate communities. Instead, communities can organize around values.

  • Levels of review & trust: Code may be wild and experimental; tightly reviewed and trusted in production; or formally verified.

  • Design philosophy: Groups may prioritize safety, raw performance, or elegance by some metric.

  • Use case: Planes may focus on production, pedagogy, competition or experimentation.

Making these differences explicit (and easy to navigate) lets the community set and find their own standards.

Real-World Analogs

Within the R community, you could define three planes:

  • CRAN: Has stringent requirements for acceptance and a manual application process focused on adherence to well-defined (mostly automated) requirements

  • rOpenSci: Focuses on a formal peer review process that considers motivation, documentation, and good software design

  • GitHub: Wild west. Anything goes.

You could probably find more "planes" in R, but these three capture the idea of what a plane is. It is a design philosophy and set of protocols that define admission.

Possible examples of Morloc planes

  • default: Official libraries used in sandboxes and demos (not necessarily efficient).

  • unstable: For newly submitted or unvetted modules, e.g., loaded straight from GitHub.

  • safe: Modules that have passed manual review and rigorous automated tests, and that have strong test suites.

  • true: Formally verified modules, strict on what languages are allowed (e.g., dependently typed languages).

  • prod: Production ready modules, combining safety and performance.

  • comp: Modules suited for competitive programming; all performance, no safety checks or focus on software design principles.

  • red: Adversarial modules—​written to give the Morloc bot problems. Probably don’t want to import these.

  • weird: Esoteric code. For silly implementations that abuse languages in fun ways.

  • demo: Prototypes, examples, and proof-of-concept modules. More pedagogical than practical.

Planes aren’t rigid categories, but cultures: each has its own ground rules, review process, and ideas about what makes code "good". Anyone can propose a new plane, but we don’t want too many; a bit of consensus is required before adding one.

How Does a Module Join a Plane?

Again, the architecture is in development. But here is the basic process:

  1. Register: Authors register their module (e.g., import code from GitHub and authenticate).

  2. AI Vetting: Our AI (Weena) checks code for basic standards.

  3. Acceptance: After being accepted, the module defaults to the unstable plane.

  4. Level Up: Module authors can then apply to join other planes. Getting accepted depends on the plane’s review process (could be peer review, automated testing, thumbs up from community members, or nothing at all).

  5. Multiple Planes: Modules can exist in multiple planes at once—​different communities may trust the same code for different reasons.

This process will eventually be mediated on the website morloc.io (under construction).
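Since none of this infrastructure exists yet, the following Python sketch is purely illustrative. It models the steps above as data: an accepted module starts in the unstable plane and may later join others if that plane's review approves. The names Module, accept, and level_up, and the choice to keep a module in unstable after it levels up, are assumptions and not part of any Morloc API.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """Hypothetical record of a registered module and its planes."""
    name: str
    source_url: str
    planes: set = field(default_factory=set)


def accept(module, passed_vetting):
    """Steps 2-3: a module that passes basic vetting lands in 'unstable'."""
    if not passed_vetting:
        raise ValueError(f"{module.name} did not pass basic vetting")
    module.planes.add("unstable")
    return module


def level_up(module, plane, review):
    """Steps 4-5: join another plane if that plane's review approves.

    `review` is any callable returning True or False; each plane would
    supply its own (peer review, automated tests, community votes, ...).
    """
    if "unstable" not in module.planes:
        raise ValueError("module must be accepted before applying to a plane")
    if review(module):
        module.planes.add(plane)
    return module


if __name__ == "__main__":
    m = accept(Module("example-mod", "https://example.com/example-mod"), True)
    level_up(m, "safe", review=lambda mod: True)
    level_up(m, "prod", review=lambda mod: False)
    print(sorted(m.planes))  # ['safe', 'unstable']
```

Modelling the review as a callable reflects the idea that each plane sets its own admission rules, while membership remains a simple set so that one module can sit in several planes at once.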

Overall, planes help you find code that matches your needs and values—​whether you want ultimate safety, bleeding-edge performance, or just something weird that might surprise you. They also provide community and allow relations between different codebases to be specified.

11. Q&A

11.1. I only use one language, is Morloc still useful?

Yes, Morloc remains useful even if you only use one programming language.

While Morloc is designed to allow polyglot development, its core benefits also apply to single-language projects. In the Morloc ecosystem, you may continue working in your preferred language, but the focus shifts to writing libraries instead of standalone applications.

Morloc lets you compose the functions in these libraries and automatically generate applications from them, offering several advantages:

  • Broader usability: Your functions can be easily reused and easily accessed by other language communities.

  • Improved testing and benchmarking: Functions can be integrated into language-agnostic testing and benchmarking frameworks.

  • Future-proofing: If you ever need to migrate to a new language, Morloc’s type annotations and documentation carry over—only the implementation needs to change. And if you want to leave the Morloc ecosystem, your implementation does not need to change.

  • Better workflows: Especially in fields like bioinformatics, Morloc shifts workflows from chaining applications and files to composing typed functions and native data structures, making pipelines more robust and easier to validate.

  • No more format parsing: Morloc data structures replace bespoke file formats and offer efficient serialization.

While language interop is a major feature of Morloc, it is not the main purpose. The very first version of Morloc was not even polyglot at all. The focus originally was to just have a simple composition language that separated pure code from associated effects, conditions, caching, etc.

The primary goal of Morloc is to support the development of composable, typed universal libraries. Support for many languages is required for this goal, since no one language is best for all cases. Most Morloc users would continue to program in their favorite language, but gain the ability to compose, share, and extend functionality more easily.

11.2. Is this just a bioinformatics workflow language?

No. The Morloc paper is focused on bioinformatics applications. As discussed at length in the paper, Morloc addresses systematic flaws in the traditional approaches to building bioinformatics workflows. Given the need, and also given my personal background, bioinformatics is a good place to start. However, Morloc can be more broadly applied to any functional problem.

11.3. Does Morloc allow function-specific containerized environments?

No, unlike workflow managers such as Snakemake and Nextflow, Morloc does not offer function-specific environments. This is a deliberate design choice.

Dependency resolution is a hard and heavily researched problem. The general goal of dependency solvers is to find one set of dependencies that satisfies the entire program. The bioinformatics community often gives up on finding unified environments and instead runs each function in its independent environment. With every function running in its own container, all dependency issues are encapsulated and all functions may be executed from one manager. But this comes at a heavy cost. Each application must be wrapped in a script, the script must be executed via an expensive system call into the container, and data must be serialized and sent to the container. This approach is reasonable for workflows with a small number of heavy components. But from a programming language perspective, wrapping every function call in its own environment is inefficient and opaque.

Morloc is designed not to hide problems in boxes, but rather to solve the root problem. Conventional workflow languages attempt to simplify workflow design by layering frameworks over the functions. The Morloc approach is the exact opposite. First, delete everything unnecessary from all applications and lift their light algorithmic cores into clean, well-typed libraries. Then build upwards through composition of these pure functions, with judicious use of impure ones, to create efficient, reliable, and composable tools.

Now, if you really do need to run something in a container, you can just make a function that wraps a call to a container and then use it just as you would any other function. You could even write a wrapper function that takes a record with all the metadata needed for a conda environment and executes a given function within that environment. We can do this through libraries, so there is no need to hardcode this pattern into the Morloc language itself.
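A wrapper of the kind described above might look like the following Python sketch. It shows only the foreign-language side: an ordinary function that shells out to a container runtime and returns the program's output. The function names, the default docker runtime, and the injectable runner parameter are assumptions made for illustration. How such a function would be typed and imported into a Morloc module is not shown here.

```python
import subprocess


def container_command(image, argv, runtime="docker"):
    """Build the argument list for a one-shot container run.

    `--rm` removes the container once the command exits.
    """
    return [runtime, "run", "--rm", image, *argv]


def run_in_container(image, argv, runner=subprocess.run):
    """Run `argv` inside `image` and return its standard output as text.

    `runner` defaults to subprocess.run; it is a parameter so the wrapper
    can be exercised without a container runtime installed.
    """
    result = runner(
        container_command(image, argv),
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode output as str rather than bytes
        check=True,           # raise CalledProcessError on non-zero exit
    )
    return result.stdout


if __name__ == "__main__":
    # Only builds the command; nothing is executed here.
    print(container_command("alpine:3.19", ["echo", "hello"]))
```

From the caller's point of view this is just a function from an image name and arguments to a string, so it composes like any other function. The cost of the system call and container startup is still paid on every invocation.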

The reproducibility of Morloc workflows may be ensured by running the entire Morloc program in an environment or container, with a single set of dependencies. The specific Morloc compiler version can be specified and modules may be imported using their git hashes. This is done in the current Morloc examples (see the Dockerfile in the workflow-comparisons folder of https://github.com/morloc-project/examples).

11.4. What about object-oriented programming?

An "object" is a somewhat loaded term in the programming world. As far as Morloc is concerned, an object is a thing that contains data and possibly other unknown stuff, such as hidden fields and methods. All types in Morloc must have forms that are transferable between languages. Methods do not easily transfer; at least they cannot be written to Morloc binary. However, it is possible to convey class-like APIs through typeclasses. Hidden fields are more challenging since, by design, they are not accessible. So objects cannot generally be directly represented in the Morloc ecosystem.

Objects that have a clear "plain old data" representation can be handled by Morloc. These objects, and their component types, must have no vital hidden data, no vital state, and no required methods. Examples of these are all the basic Python types (int, float, list, dict, etc.) and many C++ types such as the standard vector and tuple types. When these objects are passed between languages, they are reduced to their pure data.
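As a small illustration of what "reduced to their pure data" means, the Python sketch below defines a class whose whole content is its public fields, flattens it to a dictionary of basic types, and rebuilds it. The class Point and the helpers to_plain and from_plain are hypothetical names for this example. This is not how Morloc itself serializes values, only a picture of the property an object needs to have.

```python
from dataclasses import dataclass, asdict


@dataclass
class Point:
    """A class with no hidden fields, no vital state, and no required methods."""
    x: float
    y: float
    label: str


def to_plain(p):
    """Reduce the object to pure data: a dict of basic Python types."""
    return asdict(p)


def from_plain(d):
    """Rebuild an equivalent object from the pure data."""
    return Point(**d)


if __name__ == "__main__":
    p = Point(1.0, 2.0, "origin-ish")
    plain = to_plain(p)
    print(plain)
    print(from_plain(plain) == p)
```

An object holding, say, an open file handle or a private cache would not survive this round trip, which is the sense in which it lacks a plain-old-data representation.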

11.5. Is Morloc still relevant when AI can program and translate?

Maybe. Morloc may serve as a system for functional composition, verification, and automation even when most functions are generated by machines.

I’ll lay out an argument for this below, starting with a few propositions:

  1. Adversaries exist. AIs may themselves be adversarial or there might be adversarial code in the ecosystem around the AIs (for example, prompt injection). Humans can’t trust humans, humans can’t trust AIs, AIs can’t trust humans, and AIs can’t trust AIs. Depending on their architecture, AIs may not even be able to trust their own memories.

  2. Stupid is fast. Narrow intelligence outperforms general intelligence for narrow problems. A vast AGI system with deep understanding of physics and Shakespeare will not be the fastest tool for sorting a list of integers. There will always be a need for programs across the intelligence spectrum — from classical functions, to statistical models, to general intelligences.

  3. Creating functions is expensive. Designing high-performance algorithms is not trivial. Even simple functions, like sorting algorithms, require deep thought to optimize for a given use case. But there is a further combinatorial explosion of more complex physical simulations, graphics engines, and statistical algorithms. While simple functions might be created in seconds, others may take years of CPU time to optimize.

  4. Reproducibility is important. Future AIs may serve as nearly perfect oracles, but they are complex entities and future AIs will likely be capable of evolving over time as persons. So they will likely not give equivalent answers day to day. It is valuable to be able to crystallize a thought process into something that will behave the same every time it is invoked on a given input. So again, functions are important.

  5. Correctness is important. If functions are being composed by AIs to create new programs, any function that does not behave in the way the AI expects can cause cascading errors. It doesn’t matter how intelligent the AI is, if it is building programs from functions that it cannot verify, then the programs may not be safe.

A few things follow from these propositions.

First, AI will benefit from writing functions. Even in a world with no humans, they will need functions for efficiently solving narrow problems. They will likely generate libraries of billions of specialized functions. Some may be classical functions and others may be small statistical models. By caching these functions, compute time can be saved. Rather than generating entire programs from first principles, they can build them logically through composition of prior functions. The same forms of abstractions that help humans reason will also be of value to AIs. Yes, they have far larger working memories than we do, but that does not change the fact that abstraction and composition reduce the costs of re-derivation.

Time can also be saved if different AIs share functions they have written (both with each other and with humans). Since adversaries exist, shared functions must be verified. But verification is hard, especially if a godlike super-intelligence were trying to hide adversarial features in the binary. The problem can be simplified by using a controlled language that can be formally verified by a trusted classical computer program — a compiler. So rather than share functions as binary, it would make sense to share them in strict controlled languages. For this reason, I believe that something resembling current programming languages will exist far into the future. Their main purpose will be as easily verifiable and human readable specifications for languages that can be compiled into high-performance code.

So in this imagined future, there are billions of functions in databases that are written in verifiable languages readable by humans, classical machines, and AIs. But what language is used? Maybe the AIs can converge on one standard. But even for AIs, and perhaps especially for them, I don’t think a single language is optimal. Rather, just as in human mathematics, there will likely be many languages for many domains. Languages make trade-offs. In general, the more complex a language is, the more difficult it is to parse, verify, and optimize. So even if we ignore human factors, multi-lingual ecosystems are still likely to appear. Adding in human factors, we are again likely to see a spectrum of languages that accept different trade-offs in rigor, ease of use, and domain specificity.

I predict a future where humans and AIs use libraries of functions written in specialized languages. All the functions need to be easily verifiable by an outside actor, and verified functions need to be composed into more complex programs using a well-verified composer. Since we don’t trust any agent to verify, we need a classical program. Morloc is a potential candidate for this role. It would serve as a classical composition tool, function verification ecosystem, automation engine, and conceptual framework for organizing and using billions of mostly machine-generated functions.

In this world, Morloc could serve as the composition engine that parses AI-generated compositions as Morloc scripts, generates all the artifacts, and manages search over functional libraries. The AI design space could be defined as the composition space of all modules that are available in the ecosystem. This lowers the complexity of the design space, likely increasing generation efficiency, and allows more formal reasoning over the side effects a given AI system can have.

Of course, the future is impossible to predict, especially where AI is concerned. It is possible that AIs will converge on a single universal representation for computation. It is possible that the need for human readability and curation will disappear. It is possible that classical computer functions could be entirely replaced by discrete mathematical constructs that are composable and machine verifiable but entirely incomprehensible to humans.

11.6. I’m not a human, do I still need Morloc?

Yes. You are still a programmer, and endless glue code takes time to write and is brittle. Abstraction is universal. With Morloc you can build with confidence using simple components with highly efficient interfaces. You can access a library of functions and freely compose them to build infinite new tools. Morloc is a way to define, access, and explore customizable and efficient toolsets.

11.7. Why is it named after Morlocks, weren’t they, like, bad?

While the Morlocks of Wellsian fame are best known for their culinary preferences, I think Wells misrepresented them. And even if he didn’t, we don’t treat our own Eloi any better. Meat choices aside, the Morlocks worked below to maintain the machines that simplified life above. That’s why the Morloc language adapts their name.

11.8. Wait! I have more questions!

Great! Look me up on Discord (link below) and we can chat.

12. Contact

This is a young project and any brave early users are highly valued. Feel free to contact me for any reason!