1. What is morloc?
morloc is a strongly-typed functional programming language where functions are
imported from foreign languages and unified through a common type system.
This language is designed to serve as the foundation for a universal
library of functions. Each function in the library has one universal type and
zero or more implementations. An implementation may be either a function sourced
from a foreign language or a composition of morloc functions. The morloc
compiler takes the specification for a program, written as a composition of
typed functions, and generates an optimized program.
2. Getting Started
2.1. Hello world!
module main (hello)
hello = "Hello World"
The module named main exports the term hello, which is assigned the string
value "Hello World".
Paste this into a file (e.g. "hello.loc") and then it can be imported by other
morloc modules or directly compiled into a program where every exported term
is a subcommand.
morloc make hello.loc
This will generate a single file named "nexus.py". The nexus is the command line user interface to the commands exported from the module. For this simple example, it is the only generated file. Currently the UI is written in Python, though I will likely move to a compiled language in the future to avoid Python’s overhead.
Calling "nexus.py" with no arguments or with the -h flag will print a help
message:
$ ./nexus.py -h
The following commands are exported:
hello
return: Str
The command is called like so:
$ ./nexus.py hello
Hello World
2.2. Single language function composition
The following code uses only C++ functions (fold, map, add, and mul).
module main (sumOfSquares)
import cppbase (fold, map, add, mul)
square x = mul x x
sumOfSquares xs = fold add 0.0 (map square xs)
If this script is pasted into the file "example-1.loc", it can be compiled as follows:
morloc install cppbase
morloc make example-1.loc
The install command clones the cppbase repo from GitHub into the
local directory ~/.morloc/lib. The make command will generate a file named
nexus.py, which is an executable interface to the exported functions.
You can see the exported functions and their input and output types:
$ ./nexus.py -h
The following commands are exported:
sumOfSquares
param 1: List Real
return: Real
Then you can call the exported functions:
$ ./nexus.py sumOfSquares [1,2,3]
14
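For readers more familiar with Python, the same composition can be sketched with the standard library. This is only an illustrative analogue, not the code that morloc generates; the Python functions fold, add, mul, and square below are local stand-ins for the cppbase functions of the same names.

```python
from functools import reduce

def add(x, y):
    return x + y

def mul(x, y):
    return x * y

def fold(f, init, xs):
    """Left fold: combine the elements of xs with f, starting from init."""
    return reduce(f, xs, init)

def square(x):
    return mul(x, x)

def sum_of_squares(xs):
    # Mirrors: sumOfSquares xs = fold add 0.0 (map square xs)
    return fold(add, 0.0, map(square, xs))

print(sum_of_squares([1, 2, 3]))  # 14.0
```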
The compilation process generates three files: the nexus.py interface script, a
pool.cpp file containing auto-generated code wrapping the imported C++
functions and their compositions, and the executable pool-cpp.out. The nexus
script parses the user's commands and dispatches them to the executables.
All arguments passed from the user to a morloc nexus are in JSON format. In
the case above, [1,2,3] is a JSON list of numbers. However, passing JSON on
the command line can be tedious, especially where strings are concerned, so
passing JSON in files is also accepted. For example:
$ echo '[1,2,3]' > list.json
$ ./nexus.py sumOfSquares list.json
14
Bash process substitution is also accepted:
$ ./nexus.py sumOfSquares <(echo '[1,2,3]')
14
Or STDIN can be read with the explicit STDIN path:
$ echo '[1,2,3]' | ./nexus.py sumOfSquares /dev/stdin
14
Use of STDIN without the path is not yet supported.
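The input styles above all reduce to one rule: an argument is either a path that contains JSON or a literal JSON string. A minimal Python sketch of that rule follows. It is an assumption about how such a dispatcher could behave, not the code morloc generates, and resolve_argument is a hypothetical name.

```python
import json
import os

def resolve_argument(arg):
    """Interpret a command line argument as JSON data.

    If arg names an existing path (a regular file, /dev/stdin, or the
    path Bash creates for a process substitution), read JSON from it.
    Otherwise parse arg itself as a JSON literal.
    """
    if os.path.exists(arg):
        with open(arg) as handle:
            return json.load(handle)
    return json.loads(arg)

print(resolve_argument("[1,2,3]"))  # [1, 2, 3]
```

One consequence of this design is that a literal argument that happens to match an existing file name would be read as a file, so a real implementation may need a stricter rule.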
2.3. Composition between languages
morloc can compose functions across languages. For example:
module main (fibplot)
import math (fibonacci)
import rbase (plotVectorPDF, ints2reals)
fibplot n = plotVectorPDF (ints2reals (fibonacci n)) "fibonacci-plot.pdf"
The fibplot function calculates Fibonacci numbers using a C++ function and
plots them using an R function.
plotVectorPDF is defined in the morloc module rbase
(https://github.com/morloclib/rbase). The morloc module
contains the following files:
README.md
package.yaml
main.loc
core.R
rbase.R
The main.loc file contains morloc function signatures, compositions, and
export statements. The core.R and rbase.R files contain R source code.
rbase.R contains the general serialization functions required for R
interoperability with other languages. The core.R file contains mostly core
utilities that are common between languages (map, zip, fold, add, sub, etc).
ints2reals is an alias for the base R function as.numeric.
plotVectorPDF is a wrapper around the generic R plot function, as shown below:
plotVectorPDF <- function(x, filename){
pdf(filename)
plot(x)
dev.off()
}
This is a perfectly normal R function with no extra boilerplate. It takes an
arbitrary input x and a filename, passes x to the generic plot function,
and writes the result to a PDF with the name filename.
The main.loc file contains the general type signature for this function:
plotVectorPDF :: [Real] -> Str -> ()
This signature is the general type. The concrete, R-specific type is derived from it by evaluating type functions.
Similarly, the fibonacci C++ function has the type:
fibonacci :: Int -> List Int
The general type, Int -> List Int, describes a function that takes an integer
and returns a list of integers. List Int could be written equivalently as
[Int].
A concrete type signature can be inferred from the general signature by evaluating type functions provided in the libraries:
type Cpp => Int = "int"
type Cpp => List a = "std::vector<$1>" a
After recursively applying these two type functions, the general type evaluates to the concrete type:
fibonacci :: "int" -> "std::vector<$1>" "int"
"std::vector<$1>" "int" aligns to List Int (or equivalently, [Int]). The
type that is used in C++ prototypes and type annotations is generated from the
concrete type via macro expansion into the type constructor: int replaces $1,
yielding the concrete type std::vector<int>.
The fibonacci function itself is a normal C++ function with the prototype:
std::vector<int> fibonacci(int n)
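The evaluation of type functions and the $1 macro expansion described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the morloc compiler's algorithm: the tuple encoding of general types and the names CPP_TYPE_FUNCTIONS and to_concrete are hypothetical.

```python
import re

# Hypothetical encoding of a general type: (name, [parameters]).
# ("List", [("Int", [])]) stands for `List Int`.
CPP_TYPE_FUNCTIONS = {
    "Int": "int",               # type Cpp => Int = "int"
    "List": "std::vector<$1>",  # type Cpp => List a = "std::vector<$1>" a
}

def to_concrete(general, type_functions):
    """Recursively evaluate a general type into a concrete type string."""
    name, params = general
    template = type_functions[name]
    # Evaluate the parameters first, then substitute them for $1, $2, ...
    concrete_params = [to_concrete(p, type_functions) for p in params]
    return re.sub(
        r"\$(\d+)",
        lambda match: concrete_params[int(match.group(1)) - 1],
        template,
    )

list_int = ("List", [("Int", [])])
print(to_concrete(list_int, CPP_TYPE_FUNCTIONS))  # std::vector<int>
```

Because the parameters are evaluated before substitution, nested types such as List (List Int) expand from the inside out.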
3. Syntax and Features
3.1. Primitives and containers
data : True | False
| number
| string
| [data, ...] -- lists
| (data, ...) -- tuples
| {x = data, ...} -- records
As far as morloc is concerned, number is of arbitrary length and precision.
Strings are double quoted and support escapes. In the future, I will add
support for string interpolation.
3.2. Functions
Function definition follows Haskell syntax.
foo x = g (f x)
morloc supports the . operator for composition, so we can rewrite foo as:
foo = g . f
morloc supports partial application of arguments.
For example, to multiply every element in a list by 2, we can write:
multiplyByTwo = map (mul 2.0)
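Composition and partial application have close analogues in Python, which may help readers coming from that language. The sketch below is only an analogy: compose is a hypothetical helper, and increment and double are arbitrary stand-ins for f and g.

```python
from functools import partial

def compose(g, f):
    """Return the function x -> g(f(x)), like morloc's `g . f`."""
    return lambda x: g(f(x))

def mul(x, y):
    return x * y

def increment(x):
    return x + 1

def double(x):
    return mul(2.0, x)

# foo = g . f, with f = increment and g = double (arbitrary choices)
foo = compose(double, increment)

# multiplyByTwo = map (mul 2.0)
def multiply_by_two(xs):
    return list(map(partial(mul, 2.0), xs))

print(foo(3))                      # 8.0
print(multiply_by_two([1, 2, 3]))  # [2.0, 4.0, 6.0]
```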
3.3. Type signatures and type functions
General type declarations also follow Haskell syntax:
take a :: Int -> List a -> List a
Here a is a generic type variable. morloc supports [a] as sugar for List a.
The general types may be translated to concrete types by fully evaluating them with a set of language-specific type functions. For example:
type Cpp => Int = "int"
type Py => Int = "int"
type Cpp => List a = "std::vector<$1>" a
type Py => List a = "list" a
Language-specific types are always quoted since they may contain syntax that is
illegal in the morloc language.
Type functions may also map between general types.
type (Pairlist a b) = [(a,b)]
Why do I call them type functions, rather than just aliases? There is a lot more that can be done with these functions that I am just beginning to explore.
3.4. Sourcing functions
Sourcing a function from a foreign language is done as follows:
source Cpp from "foo.h" ("mlc_foo" as foo)
foo :: A -> B
Here we state that we are importing the function mlc_foo from the C++ source
file foo.h and calling it foo. We then give it a general type signature.
Currently morloc treats language-specific functions as black boxes. The
compiler does not parse the C++ code to ensure that the type the programmer
wrote is correct. Checking a morloc general type for a function against the
source code may often be possible with conventional static analysis. LLMs are
also quite effective at both inferring morloc types from source code and
checking types against source code.
For statically typed languages like C++, incorrectly typed functions will
usually be caught by the foreign language compiler.
3.5. Records, objects, and tables
Support for records, objects, and tables in morloc is still immature.
Records, objects, and tables are all defined with the same syntax (for
now) but have different meanings.
A record is a named, heterogeneous list such as a struct in C, a dict in
Python, or a list in R. The type of the record exactly describes the data
stored in the record (in contrast to parameterized types like [a] or Map a b).
A table is like a record where all types are replaced by lists of that type.
But table is not just syntactic sugar for a record of lists: the table
annotation is passed with the record through the compiler all the way to the
translator, where the language-specific serialization functions may have
special handling for tables.
An object is a record with baggage. It is a special case of an OOP class
where all arguments passed to the constructor can be accessed as public fields.
All three are defined in similar ways.
record (PersonRec a) = PersonRec {name :: Str, age :: Int}
record Cpp => PersonRec a = "MyObj"
table (PersonTbl a) = PersonObj {name :: Str, age :: Int}
table R => PersonTbl a = "data.frame"
table Cpp => PersonTbl a = "struct"
record (PersonRec a) = PersonRec {name :: Str, age :: Int}
object Cpp => PersonRec a = "MyObj"
Notice that object is undefined for general types, since they don’t check
luggage. Also note the difference between the type constructor (e.g.,
PersonRec) and the data constructor (e.g., "MyObj"). The latter corresponds
to the class constructor in the OOP language.
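The relationship between a record and a table (a record whose fields hold lists) can be made concrete with plain Python dicts. This is a sketch of the data shapes only, not of morloc's serialization code; the function names and the sample people are made up for illustration.

```python
def records_to_table(records):
    """Turn a list of records (dicts with the same keys) into a table:
    one dict whose values are equal-length columns."""
    if not records:
        return {}
    return {key: [rec[key] for rec in records] for key in records[0]}

def table_to_records(table):
    """Inverse of records_to_table: split columns back into rows."""
    keys = list(table)
    return [dict(zip(keys, row)) for row in zip(*table.values())]

# Made-up sample data with the fields used in the definitions above
people = [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25},
]
table = records_to_table(people)
print(table)  # {'name': ['Alice', 'Bob'], 'age': [30, 25]}
```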
3.6. Modules
A module includes all the code defined under the module <module_name>
statement. It can be imported with the import command.
The following module defines the constant x and exports it.
module foo (x)
x = 42
Another module can import foo:
import foo (x)
...
A term may be imported from multiple modules. For example:
module main (add)
import cppbase (add)
import pybase (add)
import rbase (add)
This module imports the C++, Python, and R add functions and exports all
of them. Modules that import add will import three different versions of the
function. The compiler will choose which to use.
3.7. Ad hoc polymorphism (overloading and type classes)
morloc supports ad hoc polymorphism, where instances of a function may be
defined for multiple types.
Here is an example of a simple type class, Sizeable, which represents objects
that can be mapped to an integer that conveys the notion of size:
module size (add)
class Sizeable a where
size a :: a -> Int
Instances of Sizeable may be defined in this module or in modules that import
this module. For example:
module foo *
type Cpp => List a = "std::vector<$1>" a
type Py => List a = "list" a
type Cpp => Str = "std::string"
type Py => Str = "str"
instance Sizeable [a] where
source Cpp "foo.hpp" ("size" as size)
source Py ("len" as size)
instance Sizeable Str where
source Cpp "foo.hpp" ("size" as size)
source Py ("len" as size)
Here, in C++, the generic function size returns the length of any C++ type
with a size method. For Python, the builtin len can be used directly.
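A rough Python analogue of a single-parameter type class with per-type instances is functools.singledispatch. The sketch below is only an analogy under that assumption: Python dispatches at run time on the class of the argument, and nothing here reflects how morloc itself resolves instances.

```python
from functools import singledispatch

@singledispatch
def size(x):
    """Fallback used when no instance is registered for the type of x."""
    raise TypeError(f"no Sizeable instance for {type(x).__name__}")

@size.register(list)
def _size_list(x):
    # Analogue of: instance Sizeable [a]
    return len(x)

@size.register(str)
def _size_str(x):
    # Analogue of: instance Sizeable Str
    return len(x)

print(size([1, 2, 3]))  # 3
print(size("hello"))    # 5
```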
morloc also supports multiple parameter typeclasses, such as the Packable
typeclass below:
class Packable a b where
pack a b :: a -> b
unpack a b :: b -> a
This specific typeclass is special in the morloc ecosystem since it handles
the simplification of complex types before serialization. Instances may overlap
and the most specific one will be selected. Packable may have instances such
as the following:
instance Packable [a] (Queue a) where
...
instance Packable [a] (Set a) where
...
instance Packable [(a,b)] (Map a b) where
...
instance Packable [(Int,b)] (Map Int b) where
...
3.8. Core libraries
Each supported language has a base library that roughly corresponds to the
Haskell prelude. They have functions for mapping over lists, working with
strings, etc. They also contain standard type aliases for each language. For
example, type Cpp => Int = "int".
The root of the current library is the conventions module that defines the
core type classes and the type signatures for the core functions. The
conventions library does not, however, load any foreign source code, so it is
entirely language agnostic.
Next, each language has its own base module (such as pybase, rbase, and
cppbase) that imports conventions and includes the implementations for all
(or some) of the defined functions and typeclasses.
Finally, a base module imports all of the language-specific bases. Currently,
there are only three supported languages, so importing all their base modules is
not impractical. In the future, more selective approaches may be used.
4. Language Interoperability
4.1. Type inference
Every sourced function in morloc must be given a general type signature. These
are usually the only type annotations that are needed in a morloc
program. Types of all other expressions in the program can be inferred. But this
type inference gives us only the general types of all expressions. In order to
generate code and properly (de)serialize when needed, we must know the
language-specific type of every expression. The transformation from general type
to concrete type is performed with user-provided type functions. For example:
type Cpp => Map a b = "std::map<$1,$2>" a b
type Cpp => Tuple2 a b = "std::tuple<$1,$2>" a b
type Cpp => List a = "std::vector<$1>" a
type Cpp => Int = "int"
type Cpp => Str = "std::string"
source Cpp from "foo.hpp" ("listToMap", "strLen", "map")
listToMap a b :: [(a,b)] -> Map a b
strLen :: Str -> Int
map a b :: (a -> b) -> [a] -> [b]
makeLengthMap = listToMap . map (\x -> (x, strLen x))
The sourced functions listToMap, strLen, and map all require general type
signatures. From these general type signatures, the type of every sub-expression
in makeLengthMap can be inferred, so this function does not need a type
signature. Its type is: [Str] -> Map Str Int.
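To make the inferred type [Str] -> Map Str Int concrete, here is a Python analogue of makeLengthMap. It is a sketch only: the three helper functions are Python stand-ins for the functions sourced from foo.hpp, and a dict plays the role of Map.

```python
def str_len(s):
    """Str -> Int"""
    return len(s)

def list_to_map(pairs):
    """[(a, b)] -> Map a b, using a Python dict as the Map."""
    return dict(pairs)

def make_length_map(xs):
    # Mirrors: listToMap . map (\x -> (x, strLen x))
    # Each x is a Str, so each pair is (Str, Int) and the result
    # is a map from Str to Int.
    return list_to_map(map(lambda x: (x, str_len(x)), xs))

print(make_length_map(["a", "bc", "def"]))  # {'a': 1, 'bc': 2, 'def': 3}
```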
There are currently a small number of special types in morloc. Among these are
the primitives Int, Real, Bool, Str, and Unit. The Int is an integer
of unlimited size. The Real is a float of unlimited precision and
width. These two types correspond to the integers and reals that are allowed in
JSON. The Str is currently limited to ASCII. The reason for this is partly my
bias from scientific computing, where ASCII is usually all we need (there are no
umlauts in DNA sequence). I will extend support eventually. The Unit type
corresponds to the JSON null. The other special types are List and TupleX,
where X is any integer greater than 1.
The Int and Real types can be thought of as mathematical ideals. In
contrast, the C++ int and double types are more limited. When the
deviations from the ideal integer and real numbers matter, more specific general
types may be created, such as BigInt, Int32, or Float64 types, for
integers of unlimited size, 32-bit integers, or 64-bit floats, respectively.
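The gap between an ideal integer and a fixed-width one is easy to demonstrate in Python, whose built-in int has arbitrary precision. The helper below is a hypothetical check, assuming a signed 32-bit target (which is what a C++ int is on most current platforms, though the C++ standard does not guarantee it).

```python
INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1

def fits_int32(n):
    """True if the integer n can be stored in a signed 32-bit int."""
    return INT32_MIN <= n <= INT32_MAX

big = 2 ** 40  # representable as a Python int, which has no fixed width

print(fits_int32(14))   # True
print(fits_int32(big))  # False
```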
The Map type is not special in morloc. To define a new type, such as Map
or BigInt, you have to tell morloc how the type can be broken down into
simpler components. How this is done is described in the next section.
4.2. Serialization
morloc's current interoperability paradigm is based entirely on
serialization. Serialization is not a fundamental requirement of morloc. JSON
serialization could be replaced with machine-level interoperability for a pair
of languages. This change would only affect performance, requiring no new code
on the part of the programmer, since all interop is handled by the compiler.
Data types that have an unambiguous mapping to the JSON data model can be automatically serialized without any extra boilerplate. The JSON data model follows this grammar:
json : number
| bool
| string
| null
| [json]
| {key1 : json, key2 : json, ...}
Types that are compositions of primitives and containers can be automatically
serialized. This includes records and the subset of objects for which arguments
passed to the constructor are assigned to accessible fields. For other types, an
(un)packing function that simplifies the data is required. For example, take the
general type Map a b, which maps keys of type a to values of type b. In a
given language, the Map type may be implemented as a hash table, a tree, pair
lists, or even a connection to a database. The types a and b do not give
enough information to serialize the object. Therefore, the user must provide an
unpack function which could be Map a b -> ([a],[b]) or Map a b -> [(a,b)].
The pack function works in the opposite direction. These functions are
provided in an instance of the Packable type class, for example:
module map (Map)
type Cpp => Map a b = "std::map<$1,$2>" a b
class Packable a b where
pack a b :: a -> b
unpack a b :: b -> a
instance Packable ([a],[b]) (Map a b) where
source Cpp "map.hpp" ("packMap" as pack, "unpackMap" as unpack)
Note that the unpack function Map a b -> ([a],[b]) may not take us all the way
to a serializable form since a and b may be arbitrarily complex
objects. This is fine: morloc will recursively handle (de)serialization all
the way down.
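The recursive unpacking described above can be sketched in Python. This is an illustration under stated assumptions rather than morloc's generated code: the Map class is a stand-in for a language-specific map type, and unpack_map, pack_map, and simplify are hypothetical names. Only the serializing direction is made recursive here; rebuilding nested values would also need the target type, which is omitted.

```python
import json

class Map:
    """Stand-in for a language-specific map type that is not plain JSON."""
    def __init__(self, pairs):
        self._data = dict(pairs)

    def items(self):
        return self._data.items()

    def __eq__(self, other):
        return isinstance(other, Map) and self._data == other._data

def unpack_map(m):
    """Map a b -> ([a], [b])"""
    keys, values = [], []
    for k, v in m.items():
        keys.append(k)
        values.append(v)
    return (keys, values)

def pack_map(unpacked):
    """([a], [b]) -> Map a b"""
    keys, values = unpacked
    return Map(zip(keys, values))

def simplify(x):
    """Recursively unpack until only JSON-compatible data remains."""
    if isinstance(x, Map):
        return simplify(unpack_map(x))
    if isinstance(x, (list, tuple)):
        return [simplify(item) for item in x]
    return x

# A map whose values are themselves maps: one unpack step is not enough
nested = Map([("a", Map([(1, 2.5)])), ("b", Map([]))])
print(json.dumps(simplify(nested)))  # [["a", "b"], [[[1], [2.5]], [[], []]]]
```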
5. Acknowledgements
This documentation page was built with Asciidocs — the best markdown language ever — and the asciidoctor-jet template made by Harsh Kapadia.