1. What is morloc?

morloc is a strongly-typed functional programming language where functions are imported from foreign languages and unified through a common type system.

This language is designed to serve as the foundation for a universal library of functions. Each function in the library has one universal type and zero or more implementations. An implementation may be either a function sourced from a foreign language or a composition of morloc functions. The morloc compiler takes the specification for a program, written as a composition of typed functions, and generates an optimized program.

2. Getting Started

2.1. Hello world!

module main (hello)
hello = "Hello World"

The module named main exports the term hello which is assigned the string value of "Hello World".

Paste this into a file (e.g. "hello.loc") and then it can be imported by other morloc modules or directly compiled into a program where every exported term is a subcommand.

morloc make hello.loc

This will generate a single file named "nexus.py". The nexus is the command line user interface to the commands exported from the module. For this simple example, it is the only generated file. Currently the UI is written in Python, though I will likely move to a compiled language in the future to avoid Python’s overhead.

Calling "nexus.py" with no arguments or with the -h flag will print a help message:

$ ./nexus.py -h
The following commands are exported:
  hello
    return: Str

The command is called like so:

$ ./nexus.py hello
Hello World

2.2. Single language function composition

The following code uses only C++ functions (fold, map, add and mul).

module main (sumOfSquares)

import cppbase (fold, map, add, mul)

square x = mul x x

sumOfSquares xs = fold add 0.0 (map square xs)

If this script is pasted into the file "example-1.loc", it can be compiled as follows:

morloc install cppbase
morloc make example-1.loc

The install command clones the cppbase repo from GitHub into the local directory ~/.morloc/lib. The make command generates a file named nexus.py, which is an executable interface to the exported functions.

You can see the exported functions and their input and output types:

$ ./nexus.py -h
The following commands are exported:
  sumOfSquares
    param 1: List Real
    return: Real

Then you can call the exported functions:

$ ./nexus.py sumOfSquares [1,2,3]
14
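
For readers more at home in Python, the computation performed by sumOfSquares can be mirrored roughly as follows. This is only an illustration of what the morloc composition computes, not the code the compiler generates; the names square and sum_of_squares are chosen for the example, and Python's reduce stands in for fold.

```python
from functools import reduce
from operator import add, mul


def square(x):
    """Mirror of the morloc definition: square x = mul x x."""
    return mul(x, x)


def sum_of_squares(xs):
    """Mirror of: sumOfSquares xs = fold add 0.0 (map square xs)."""
    return reduce(add, map(square, xs), 0.0)
```

Calling sum_of_squares([1, 2, 3]) gives 14.0, matching the result printed by the nexus above.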

The compilation process generates three files: the nexus.py interface script, a pool.cpp file containing auto-generated code that wraps the imported C++ functions and their compositions, and the executable pool-cpp.out. The nexus script parses the user's commands and dispatches them to the executables.

All arguments passed from the user to a morloc nexus are in JSON format. In the case above, [1,2,3] is a JSON list of numbers. However, passing JSON on the command line can be tedious, especially where strings are concerned, so passing JSON in files is also accepted. For example:

$ echo '[1,2,3]' > list.json
$ ./nexus.py sumOfSquares list.json
14

Bash process substitution is also accepted:

$ ./nexus.py sumOfSquares <(echo '[1,2,3]')
14

Or STDIN can be read with the explicit STDIN path:

$ echo '[1,2,3]' | ./nexus.py sumOfSquares /dev/stdin
14

Use of STDIN without the path is not yet supported.
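
The argument handling described above (inline JSON, a JSON file, or a path such as /dev/stdin) could be sketched in Python as below. This is a minimal sketch and not the actual nexus code: the function name load_json_argument is hypothetical, and the rule "treat the argument as a file if a path with that name exists, otherwise parse it as inline JSON" is an assumption made for the example.

```python
import json
import os


def load_json_argument(arg):
    """Return the Python value for one command-line argument.

    Assumption: an argument that names an existing path is read as a
    JSON file; anything else is parsed as inline JSON text.
    """
    if os.path.exists(arg):
        with open(arg) as handle:
            return json.load(handle)
    return json.loads(arg)
```

Under this assumption, process substitution and /dev/stdin need no special handling, since both are presented to the program as readable paths.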

2.3. Composition between languages

morloc can compose functions across languages. For example:

module main (fibplot)

import math (fibonacci)
import rbase (plotVectorPDF, ints2reals)

fibplot n = plotVectorPDF (ints2reals (fibonacci n)) "fibonacci-plot.pdf"

The fibplot function calculates Fibonacci numbers using a C++ function and plots them using an R function.

plotVectorPDF is defined in the morloc module rbase (https://github.com/morloclib/rbase). The morloc module contains the following files:

README.md
package.yaml
main.loc
core.R
rbase.R

The main.loc file contains morloc function signatures, compositions, and export statements. The core.R and rbase.R files contain R source code. rbase.R contains the general serialization functions required for R interoperability with other languages. The core.R file contains mostly core utilities that are common between languages (map, zip, fold, add, sub, etc). ints2reals is an alias for the base R function as.numeric. plotVectorPDF is a wrapper around the generic R plot function, as shown below:

plotVectorPDF <- function(x, filename){
  pdf(filename)
  plot(x)
  dev.off()
}

This is a perfectly normal R function with no extra boilerplate. It takes an arbitrary input x and a filename, passes x to the generic plot function, and writes the result to a PDF with the name filename.

The main.loc file contains the general type signature for this function:

plotVectorPDF :: [Real] -> Str -> ()

This signature is the general type. The concrete, R-specific type is not written out here; it is obtained from the general type by evaluating type functions.

Similarly the fibonacci C++ function has the type:

fibonacci :: Int -> List Int

The general type, Int → List Int, describes a function that takes an integer and returns a list of integers. List Int could be written equivalently as [Int].

A concrete type signature can be inferred from the general signature by evaluating type functions provided in the libraries:

type Cpp => Int = "int"
type Cpp => List a = "std::vector<$1>" a

After recursively applying these two type functions, the general type evaluates to the concrete type:

fibonacci :: "int" -> "std::vector<$1>" "int"

"std::vector<$1>" "int" aligns to List Int (or equivalently, [Int]). The type that is used in C++ prototypes and type annotations is generated from the concrete type via macro expansion into the type constructor: int replaces $1 yielding the concrete type std::vector<int>.

The fibonacci function itself is a normal C++ function with the prototype:

std::vector<int> fibonacci(int n)
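
As a rough illustration of the two steps described above (recursively evaluating type functions, then substituting $1 into the type constructor), here is a small Python sketch. The table CPP_TYPE_FUNCTIONS and the function to_concrete are hypothetical names, and the tuple encoding of general types is an assumption made for the example; this is not how the morloc compiler is implemented.

```python
import re

# Hypothetical table mirroring the two Cpp type functions shown above.
CPP_TYPE_FUNCTIONS = {
    "Int": "int",
    "List": "std::vector<$1>",
}


def to_concrete(general_type):
    """Evaluate a general type to a C++ type string.

    A general type is encoded (by assumption) as (name, [arguments]),
    so List Int is written ("List", [("Int", [])]).
    """
    name, arguments = general_type
    constructor = CPP_TYPE_FUNCTIONS[name]
    # Evaluate the parameters first, then expand $1, $2, ... in order.
    expanded = [to_concrete(argument) for argument in arguments]
    return re.sub(
        r"\$(\d+)",
        lambda match: expanded[int(match.group(1)) - 1],
        constructor,
    )
```

With this encoding, to_concrete(("List", [("Int", [])])) yields the string std::vector<int>, the return type in the prototype above.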

3. Syntax and Features

3.1. Primitives and containers

data : True | False
     | number
     | string
     | [data, ...]       -- lists
     | (data, ...)       -- tuples
     | {x = data, ...}   -- records

As far as morloc is concerned, a number is of arbitrary length and precision. Strings are double-quoted and support escapes. In the future, I will add support for string interpolation.

3.2. Functions

Function definition follows Haskell syntax.

foo x = g (f x)

morloc supports the . operator for composition, so we can re-write foo as:

foo = g . f

morloc supports partial application of arguments.

For example, to multiply every element in a list by 2, we can write:

multiplyByTwo = map (mul 2.0)
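
For comparison, composition and partial application can be imitated in Python with functools.partial. This is only an analogy to the morloc definitions above: compose and map_list are helper names introduced for the example, and the functions used in increment_then_double are hypothetical stand-ins for f and g.

```python
from functools import partial
from operator import mul


def compose(g, f):
    """Return the function x -> g(f(x)), mirroring morloc's g . f."""
    return lambda x: g(f(x))


def map_list(function, values):
    """Eager list map, standing in for the morloc map function."""
    return [function(value) for value in values]


# mul 2.0 becomes a one-argument function; map (mul 2.0) a list function.
multiply_by_two = partial(map_list, partial(mul, 2.0))

# foo = g . f, with f adding one and g doubling (illustrative choices).
increment_then_double = compose(partial(mul, 2.0), lambda x: x + 1)
```

Here multiply_by_two([1.0, 2.5]) returns [2.0, 5.0], and increment_then_double(3) returns 8.0.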

3.3. Type signatures and type functions

General type declarations also follow Haskell syntax:

take a :: Int -> List a -> List a

Where a is a generic type variable. morloc supports [a] as sugar for List a.

The general types may be translated to concrete types by fully evaluating them with a set of language-specific type functions. For example:

type Cpp => Int = "int"
type Py => Int = "int"

type Cpp => List a = "std::vector<$1>" a
type Py => List a = "list" a

Language-specific types are always quoted since they may contain syntax that is illegal in the morloc language.

Type functions may also map between general types.

type (Pairlist a b) = [(a,b)]

Why do I call them type functions, rather than just aliases? There is a lot more that can be done with these functions that I am just beginning to explore.

3.4. Sourcing functions

Sourcing a function from a foreign language is done as follows:

source Cpp from "foo.h" ("mlc_foo" as foo)

foo :: A -> B

Here we state that we are importing the function mlc_foo from the C++ source file foo.h and calling it foo. We then give it a general type signature.

Currently morloc treats language-specific functions as black boxes. The compiler does not parse the C++ code to ensure that the type the programmer wrote is correct. Checking a morloc general type for a function against the source code may often be possible with conventional static analysis. LLMs are also quite effective at both inferring morloc types from source code and checking types against source code.

For statically typed languages like C++, incorrectly typed functions will usually be caught by the foreign language compiler.

3.5. Records, objects, and tables

Support of records, objects and tables in morloc is still immature.

Records, objects, and tables are all defined with the same syntax (for now) but have different meanings.

A record is a named, heterogeneous list such as a struct in C, a dict in Python, or a list in R. The type of the record exactly describes the data stored in the record (in contrast to parameterized types like [a] or Map a b).

A table is like a record where all types are replaced by lists of that type. But a table is not just syntactic sugar for a record of lists: the table annotation is passed with the record through the compiler all the way to the translator, where the language-specific serialization functions may have special handling for tables.

An object is a record with baggage. It is a special case of an OOP class where all arguments passed to the constructor can be accessed as public fields.

All three are defined in similar ways.

record (PersonRec a) = PersonRec {name :: Str, age :: Int}
record Cpp => PersonRec a = "MyObj"

table (PersonTbl a) = PersonObj {name :: Str, age :: Int}
table R => PersonTbl a = "data.frame"
table Cpp => PersonTbl a = "struct"

record (PersonRec a) = PersonRec {name :: Str, age :: Int}
object Cpp => PersonRec a = "MyObj"

Notice that object is undefined for general types, since they don’t check luggage. Also note the difference between the type constructor (e.g. PersonRec) and the data constructor (e.g., "MyObj"). The latter corresponds to the class constructor in the OOP language.

3.6. Modules

A module includes all the code defined under the module <module_name> statement. It can be imported with the import command.

The following module defines the constant x and exports it.

module foo (x)
x = 42

Another module can import foo:

import foo (x)

...

A term may be imported from multiple modules. For example:

module main (add)
import cppbase (add)
import pybase (add)
import rbase (add)

This module imports the C++, Python, and R add functions and exports all of them. Modules that import add will import three different versions of the function. The compiler will choose which to use.

3.7. Ad hoc polymorphism (overloading and type classes)

morloc supports ad hoc polymorphism, where instances of a function may be defined for multiple types.

Here is an example of a simple type class, Sizeable, which represents objects that can be mapped to an integer that conveys the notion of size:

module size (add)

class Sizeable a where
  size a :: a -> Int

Instances of Sizeable may be defined in this module or in modules that import this module. For example:

module foo *

type Cpp => List a = "std::vector<$1>" a
type Py => List a = "list" a

type Cpp => Str = "std::string"
type Py => Str = "str"

instance Sizeable [a] where
  source Cpp from "foo.hpp" ("size" as size)
  source Py ("len" as size)

instance Sizeable Str where
  source Cpp from "foo.hpp" ("size" as size)
  source Py ("len" as size)

Here, in C++, the generic function size returns the length of any C++ type with a size method. For Python, the builtin len can be used directly.
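
The idea of one function name with per-type instances has a loose Python analogue in functools.singledispatch. The sketch below is an analogy only, not how morloc resolves type class instances; it registers the builtin len as the size implementation for lists and strings, as the Python instances above do, and the error message is invented for the example.

```python
from functools import singledispatch


@singledispatch
def size(value):
    """Fallback for types that have no Sizeable-like instance."""
    raise TypeError(f"no size instance for {type(value).__name__}")


@size.register(list)
def _size_of_list(value):
    # Counterpart of: instance Sizeable [a] ... source Py ("len" as size)
    return len(value)


@size.register(str)
def _size_of_str(value):
    # Counterpart of: instance Sizeable Str ... source Py ("len" as size)
    return len(value)
```

One difference worth keeping in mind: singledispatch chooses the implementation at run time from the argument's class, whereas in morloc the instance is a matter for the compiler.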

morloc also supports multiple parameter typeclasses, such as in the Packable typeclass below:

class Packable a b where
  pack a b :: a -> b
  unpack a b :: b -> a

This specific typeclass is special in the morloc ecosystem since it handles the simplification of complex types before serialization. Instances may overlap and the most specific one will be selected. Packable may have instances such as the following:

instance Packable [a] (Queue a) where
  ...

instance Packable [a] (Set a) where
  ...

instance Packable [(a,b)] (Map a b) where
  ...

instance Packable [(Int,b)] (Map Int b) where
  ...

3.8. Core libraries

Each supported language has a base library that roughly corresponds to the Haskell prelude. They have functions for mapping over lists, working with strings, etc. They also contain standard type aliases for each language. For example, type Cpp ⇒ Int = "int".

The root of the current library is the conventions module that defines the core type classes and the type signatures for the core functions. The conventions library does not, however, load any foreign source code, so it is entirely language agnostic.

Next, each language has its own base module (such as pybase, rbase, and cppbase) that imports conventions and includes the implementations for all (or some) of the defined functions and typeclasses.

Finally, a base module imports all of the language-specific bases. Currently, there are only three supported languages, so importing all their base modules is not impractical. In the future, more selective approaches may be used.

4. Language Interoperability

4.1. Type inference

Every sourced function in morloc must be given a general type signature. These are usually the only type annotations that are needed in a morloc program. Types of all other expressions in the program can be inferred. But this type inference gives us only the general types of all expressions. In order to generate code and properly (de)serialize when needed, we must know the language-specific type of every expression. The transformation from general type to concrete type is performed with user-provided type functions. For example:

type Cpp => Map a b = "std::map<$1,$2>" a b
type Cpp => Tuple2 a b = "std::tuple<$1,$2>" a b
type Cpp => List a = "std::vector<$1>" a
type Cpp => Int = "int"
type Cpp => Str = "std::string"

source Cpp from "foo.hpp" ("listToMap", "strLen", "map")

listToMap a b :: [(a,b)] -> Map a b
strLen :: Str -> Int
map a b :: (a -> b) -> [a] -> [b]

makeLengthMap = listToMap . map (\x -> (x, strLen x))

The sourced functions listToMap, strLen, and map all require general type signatures. From these general type signatures, the type of every sub-expression in makeLengthMap can be inferred, so this function does not need a type signature. Its type is: [Str] → Map Str Int.

There are currently a small number of special types in morloc. Among these are the primitives Int, Real, Bool, Str, and Unit. The Int is an integer of unlimited size. The Real is a float of unlimited precision and width. These two types correspond to the integers and reals that are allowed in JSON. The Str is currently limited to ASCII. The reason for this is partly my bias from scientific computing, where ASCII is usually all we need (there are no umlauts in DNA sequences). I will extend support eventually. The Unit type corresponds to the JSON null. The other special types are List and TupleX, where X is any integer of 2 or greater.

The Int and Real types can be thought of as mathematical ideals. In contrast, the C++ int and double types are more limited. When the deviations from the ideal integer and real numbers matter, more specific general types may be created, such as BigInt, Int32, or Float64 types, for integers of unlimited size, 32-bit integers, or 64-bit floats, respectively.

The Map type is not special in morloc. To define a new type, such as Map or BigInt, you have to tell morloc how the type can be broken down into simpler components. How this is done is described in the next section.

4.2. Serialization

morloc's current interoperability paradigm is based entirely on serialization. Serialization is not a fundamental requirement of morloc. JSON serialization could be replaced with machine-level interoperability for a pair of languages. This change would only affect performance, requiring no new code on the part of the programmer, since all interop is handled by the compiler.

Data types that have an unambiguous mapping to the JSON data model can be automatically serialized without any extra boilerplate. The JSON data model follows this grammar:

json : number
     | bool
     | string
     | null
     | [json]
     | {key1 : json, key2 : json, ...}
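
Read as a recursive definition, the grammar above amounts to a simple structural check. The Python sketch below is one way to express it; the function name fits_json_model is hypothetical, and treating Python tuples like lists is an assumption made for the example.

```python
def fits_json_model(value):
    """Return True if value is built only from JSON-model pieces."""
    # null, bool, number, and string are the leaves of the grammar.
    if value is None or isinstance(value, (bool, int, float, str)):
        return True
    # Lists (and, by assumption, tuples) hold JSON values.
    if isinstance(value, (list, tuple)):
        return all(fits_json_model(item) for item in value)
    # Objects map string keys to JSON values.
    if isinstance(value, dict):
        return all(
            isinstance(key, str) and fits_json_model(item)
            for key, item in value.items()
        )
    return False
```

A value such as a Python set fails this check, which is the kind of case where an (un)packing function is needed before serialization.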

Types that are compositions of primitives and containers can be automatically serialized. This includes records and the subset of objects for which arguments passed to the constructor are assigned to accessible fields. For other types, an (un)packing function that simplifies the data is required. For example, take the general type Map a b, which maps keys of type a to values of type b. In a given language, the Map type may be implemented as a hash table, a tree, pair lists, or even a connection to a database. The types a and b do not give enough information to serialize the object. Therefore, the user must provide an unpack function which could be Map a b → ([a],[b]) or Map a b → [(a,b)]. The pack function works in the opposite direction. These functions are provided in an instance of the Packable type class, for example:

module map (Map)

type Cpp => Map a b = "std::map<$1,$2>" a b

class Packable a b where
  pack a b :: a -> b
  unpack a b :: b -> a

instance Packable ([a],[b]) (Map a b) where
  source Cpp from "map.hpp" ("packMap" as pack, "unpackMap" as unpack)

Note that the unpack function Map a b → ([a],[b]) may not take us all the way to a serializable form since a and b may be arbitrarily complex objects. This is fine; morloc will recursively handle (de)serialization all the way down.
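
To make the pack and unpack roles concrete, here is a minimal Python sketch that uses a dict in place of the map type. The example in this section sources its functions from C++, so everything here is an illustrative assumption: the names unpack_map, pack_map, serialize_map, and deserialize_map are hypothetical, and the sketch covers only one level, where keys and values already fit the JSON data model.

```python
import json


def unpack_map(mapping):
    """Map a b -> ([a],[b]): split a dict into parallel key/value lists."""
    return (list(mapping.keys()), list(mapping.values()))


def pack_map(unpacked):
    """([a],[b]) -> Map a b: rebuild the dict from the two lists."""
    keys, values = unpacked
    return dict(zip(keys, values))


def serialize_map(mapping):
    """Unpack to the simpler form, then serialize that form as JSON."""
    return json.dumps(unpack_map(mapping))


def deserialize_map(text):
    """Parse the JSON form, then pack it back into a dict."""
    keys, values = json.loads(text)
    return pack_map((keys, values))
```

For the dict {"a": 1, "bb": 2}, serialize_map produces the JSON text [["a", "bb"], [1, 2]], and deserialize_map restores the original dict.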

5. Acknowledgements

This documentation page was built with AsciiDoc (the best markup language ever) and the asciidoctor-jet template made by Harsh Kapadia.
