Do-it-yourself, open-source modelling
You want to create a mathematical model of sorts. Well, not that mathematical maybe. More, like computational. You have this image in your mind, which you just put down on paper. Maybe it looks like this:
The math is not that complicated. The arrows represent equations specifying that both owls and cats eat mice, which in turn eat your morning cereals (important model restriction: exclude yourself as a cereal forager and leave out the cereal production system). If we looked inside the boxes and elaborated further, we could add how cereals are a resource for mice offspring. Throw in kittens and owlets, and we’ve got a whole ecosystem going. We’ll look into textbooks of population dynamics and quickly find a few simple equations describing each sub-process. Having the system diagram (which has grown since our first sketch above) drawn on one sheet of paper and an assortment of equations listed on another, how do we proceed? Every bit is simple enough, but assembling it all into a neat model running on your computer seems beyond your reach.
If you are an engineer, I am sorry for having bothered you with this bottomless pit of biological processes and ever-enfolded sub-processes and sub-sub-processes. Maybe you are more into physical systems, like a greenhouse (from j.compag.2017.08.020):
So what’s the problem? Go ahead. Write down those differential equations, built on the safe physical principles of thermodynamics, heat exchange, etc. But then you realise that the smoothness of your equations belies the abruptness of the real-world system. You’ve got windows opening and closing, curtains being drawn, heat system turning on/off: a battery of effectuators controlled by the logic of the greenhouse climate computer. Trying to incorporate that logic into your equations disrupts their beauty and turns the model as a whole into an unmanageable monster. No. You need another approach: Describe the logic inside each box on its own and then connect the boxes, like Lego bricks, to build the whole model.
If you are a mathematician, you may have come to the wrong place, unless you want to learn about model-building that involves no fancy math at all.
BoxScript is a language developed at the Ecological Modelling Laboratory as a tool to compose models out of model building blocks, like those described loosely in the two examples above. It is not yet another software to draw models by dropping building blocks (boxes, arrows and whatnot) on a graphical canvas. BoxScript is a programming language used to write models in text files, which we call boxscripts.
Knowing how to write BoxScript code only gets you half-way. You also need software that will read the boxscript and run the model. Universal Simulator is the software created at the lab for this purpose. It reads and executes boxscripts and produces output that is seamlessly exported to R, where the output is displayed and, optionally, further analysed. Any R skills you may have honed can thus be applied readily for visualisation and statistical analysis of model outputs.
The Universal Simulator comes with a toolbox of model building blocks that you can use in your boxscripts for model-building. However, if you are going to build a model of any complexity, you will likely need to create additional building blocks, custom-made for your modelling needs. Building blocks are written in C++. You code your own by downloading the source code for Universal Simulator, as it includes a simple (yes, it is simple) programming framework for defining new building blocks based on the Box base class. C++ from scratch is difficult but the Box framework lets you cheat and write C++ code without knowing the technical details of what goes on under the surface.
You will use Qt Creator to write and manage your C++ code. You will use Qt Creator on its open-source license, which means that Universal Simulator and your code as well must be open-source. Qt Creator and Universal Simulator work on the major computer platforms: Windows, Mac and Linux. You will, lastly, need a decent text editor to write boxscripts. Any one will do but if you use Notepad++ or Atom, you will benefit from the highlighting of BoxScript syntax.
It seems to be a common understanding, which I will not hesitate to deem a misunderstanding, that graphical tools are superior to text-based tools. Surely, it is easier to draw a model on the screen, mousing around, dragging and dropping, than it is to type code into a text editor? Well, if that were the case, why then is R such a global success? The very first time you set out to create a scatter plot in R, carry out a linear regression and overlay the scatter plot with the estimated line, you will be too exhausted after the ordeal to answer that question. You could have achieved the same result in a spreadsheet in no time. However, after the quick solution in the spreadsheet (which, admit it, was just a first stab at analysing your data), you realise that you have to filter your data, add random factors and, by the way, produce a print-ready figure of 68 mm width for the journal (in addition to the full-coloured one for your talk). After finishing that job in the spreadsheet, you are too exhausted to consider that you will have to repeat more or less the same procedure for five other data sets. How again did I fix this axis? Did I right-click it? Did I change the layout or the options? Enter R. After you have produced the first figure, the whole procedure is documented right there in the code. Copy and paste to do the other five data sets or, if you are experienced, cut out the common code and package it into a function that does most of the job. The same benefit holds for BoxScript when you compare it to a graphical tool. Graphical tools make the first baby steps easy but become a hindrance once you need to get fully up and running.
R is a success also because it can be extended with new functionality through its package system. BoxScript takes the same approach, as you can define new building blocks, which are then available in your boxscripts. When you are writing building blocks in the Box framework, you create them in your own C++ namespace; say, you call it savanna. Your savanna building blocks end up in a savanna module (technically, a binary library file, i.e., on Windows a DLL file), which you can upload to make it immediately useful to other users.
Other reasons for R's success, which work in concert with those above, are that it is open-source and that it is an interpreted language. The latter means that you can execute your R code immediately. BoxScript and Universal Simulator are open-source too but BoxScript cannot be executed directly, as we'll quickly see from the examples in the following. Another difference is that BoxScript is a declarative language whereas R is an imperative language. In BoxScript you describe what a model consists of but not how the simulation should be carried out. In R you describe how every computational step should be carried out.
There are many programming languages already. You may be well-versed in a few of them, or you may be a newbie in the programming world. Why should you learn BoxScript in addition, or maybe as a first language? Well, first of all, BoxScript is really simple. There is not much to learn. That's because it was designed as a domain-specific language (DSL). The domain is modelling; well, not any kind of modelling. If you have a system which you can describe adequately with differential equations, go ahead and use software for that domain. This is not the domain BoxScript was designed for. But if your model contains heterogeneous components (building blocks) made out of math and algorithms (i.e., code), that is what BoxScript is for. Such models can be difficult to manage, e.g. difficult to extend and re-use, but with BoxScript they become manageable. In R, I am telling you, they do not. I created BoxScript because I found no language or tool that could solve this problem for me.
While the domain of heterogeneous, modular (i.e., building block-based) modelling is still rather broad, the more precise domain is defined by the building blocks available in the toolbox. What those are has been determined by the various projects that I have worked on through the years in collaboration with students, researchers and engineers in academia and R&D companies. Thus you will find building blocks to model crops, pests, population dynamics and greenhouse microclimate. In addition, there are building blocks to deal with generic modelling tasks, such as random number generation, uncertainty and sensitivity analysis, and visualisation of model outputs.
Enough talk. Let's see what a real boxscript looks like. Here's one:
// butterfly.box
Simulation sim {
  Calendar calendar {
    .begin = 01/05/2009
    .end = 30/09/2009
  }
  Records weather {
    .fileName = "flakkebjerg 2009.txt"
  }
  Box butterfly {
    DayDegrees time {
      .T0 = 5
      .T = weather[Tavg]
    }
    Stage egg {
      .initial = 100
      .duration = 140
      .timeStep = ../time[step]
    }
    Stage larva {
      .inflow = ../egg[outflow]
      .duration = 200
      .timeStep = ../time[step]
    }
    Stage pupa {
      .inflow = ../larva[outflow]
      .duration = 100
      .timeStep = ../time[step]
    }
    Stage adult {
      .inflow = ../pupa[outflow]
      .duration = 28
      .timeStep = 1
    }
  }
  OutputR {
    PageR {
      .xAxis = calendar[date]
      PlotR {
        .ports = weather[Tavg]
      }
      PlotR {
        .ports = Stage::*[content]
      }
    }
  }
}
Even if you are not an entomologist, you should be able to recognize the series of life stages that leads to an adult butterfly. You should also be able to guess from this code how many eggs we start out with, and on what date the eggs were laid. The concept of day-degrees may be new to you, but it is easy to understand by example: If you have a threshold of 5 degrees and an average temperature of the day of 23 degrees, then that corresponds to 18 day-degrees. A cold day below the threshold would correspond to zero day-degrees. Right, so what is the temperature threshold in this model? And what is the source for daily temperature readings? See if you can find that information in the boxscript above.
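The day-degree arithmetic can be sketched in a few lines of Python. This is only an illustration of the rule described above, not the code of the DayDegrees box; the function names are made up for the example, and the second function ignores any spread in development time that the Stage box may model.

```python
def day_degrees(t_avg: float, t0: float = 5.0) -> float:
    """Day-degrees contributed by one day with average temperature t_avg.

    t0 is the temperature threshold; days colder than t0 contribute zero.
    """
    return max(0.0, t_avg - t0)


def days_to_complete(duration: float, daily_t_avg: list, t0: float = 5.0):
    """Return the 1-based day on which the accumulated day-degrees first
    reach `duration`, or None if the temperature series is too short.

    Hypothetical helper: it treats a stage as finished once the sum of
    day-degrees reaches the stage duration.
    """
    total = 0.0
    for day, t_avg in enumerate(daily_t_avg, start=1):
        total += day_degrees(t_avg, t0)
        if total >= duration:
            return day
    return None


# The example from the text: threshold 5, daily average 23
print(day_degrees(23, 5))
# A cold day below the threshold
print(day_degrees(3, 5))
```

With a constant daily average of 23 degrees, an egg stage of 140 day-degrees would under these assumptions be completed on day 8, since 7 days give 126 day-degrees and 8 days give 144.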
You will notice that each of the three immature stages (egg, larva, pupa) has a timeStep that refers to ../time[step]. For the adult butterflies, it is more straightforward as timeStep is set to 1 (One what? We'll return to that). First of all, you should appreciate the structure of the file, which is by no means original; you will find the exact same structure in standard languages, such as XML, JSON, HTML, etc. It is a hierarchy of boxes inside boxes, each box delineated by a pair of braces { }. The type of the box (which corresponds one-to-one with the C++ class defining the behaviour of the box) is written in front of the braces, optionally followed by the name of that particular box (the name of that particular object in programmer's parlance).
Thus this box is of the Stage class and is named pupa:
Stage pupa { }
And this box is of the OutputR class and is unnamed:
OutputR { }
Each kind (class) of box defines the inputs that it will take, and the outputs that it will compute and make available to other boxes. A common name for inputs and outputs is ports. You can look up the ports of a certain box class in the class's documentation.
To set the value of an input, you precede its name with a period. Here duration is set to 100:
.duration = 100
In the butterfly boxscript, you can find examples of the various types of inputs. If the input is a string (i.e., a piece of text), it must be enclosed in double quotes:
.fileName = "flakkebjerg 2009.txt"
Dates can be written in European (day/month/year), international (year/month/day) or American (/month/day/year) format. So, these three lines are equivalent:
.end = 30/09/2009
.end = 2009/09/30
.end = /9/30/2009
Note that the year must be written in full, and that American notation is preceded by an extra slash. Leading zeroes on day and month are optional.
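To make the three notations concrete, here is a small Python sketch that tells them apart using only the rules stated above (four-digit year, extra leading slash for American notation). The function name is invented for the illustration; this is not how the BoxScript Engine itself parses dates.

```python
from datetime import date


def parse_boxscript_date(text: str) -> date:
    """Parse a date in day/month/year, year/month/day or /month/day/year form.

    Assumes the year is always written with four digits, which is what
    lets the European and international notations be told apart.
    """
    parts = text.strip().split("/")
    if len(parts) == 4 and parts[0] == "" and len(parts[3]) == 4:
        # American notation: the leading slash yields an empty first part
        month, day, year = parts[1:]
    elif len(parts) == 3 and len(parts[0]) == 4:
        # International notation: year first
        year, month, day = parts
    elif len(parts) == 3 and len(parts[2]) == 4:
        # European notation: year last
        day, month, year = parts
    else:
        raise ValueError(f"Unrecognised date: {text!r}")
    return date(int(year), int(month), int(day))


# The three equivalent spellings from the text
for spelling in ["30/09/2009", "2009/09/30", "/9/30/2009"]:
    print(spelling, "->", parse_boxscript_date(spelling))
```

All three spellings yield the same date, 30 September 2009.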
Finally, but importantly, an input can be set not to a fixed value but to the output from (or the input to) another box. This line will set the value of the T input equal to the Tavg output delivered by the weather box:
.T = weather[Tavg]
In this case T is a scalar input (defined in the C++ implementation of the DayDegrees class and reported in the documentation of that class), which means it takes exactly one value. Hence, there must be exactly one match when weather[Tavg] is looked up in the boxscript. Had T been defined as a vector input, then zero or more matches would have been all right.
References to ports are written as paths. Just like a file path (e.g., "/home/documents/letter.pdf" or "*.pdf") may point to zero, one or more files in the folder hierarchy of your computer's file system, a BoxScript reference may point to zero, one or more ports in the boxscript. In BoxScript references, the port name is written in brackets at the end of the path. Here are some examples of references that are valid in the butterfly boxscript:
butterfly/egg[duration]
egg[inflow]
*[inflow]
Slashes join one box (the parent) to a box inside (the child). An asterisk functions as a wildcard (any name) and is likely to yield many matches (four in the case of *[inflow]). Optionally, you can specify which class a box on the path must belong to. Thus this path matches a port named content found in any box of the Stage class:
Stage::*[content]
References will often benefit from being relative, which is marked by an initial single-period (meaning me) or double-period (meaning my parent). You can find this path in the butterfly boxscript:
.inflow = ../larva[outflow]
It refers to the parent (look it up and you should find butterfly to be the parent) and inside that the child larva from which it takes the outflow output. In effect it refers to the sibling box called larva.
Without this explanation you would likely have guessed the meaning correctly, despite not knowing the details about how paths specify references (in fact, there are even more details given in the documentation of reference paths).
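For readers who like to see the matching rules spelled out, the following Python sketch resolves the example references against a hand-built copy of the butterfly box hierarchy. It is a simplified illustration under my own assumptions (a path without a leading period may match at any depth; only a single leading . or .. is handled), not the algorithm of the BoxScript Engine; the Box class and the resolve function are invented for the example.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Box:
    cls: str                      # class name, e.g. "Stage"
    name: str                     # object name, e.g. "pupa"
    ports: Dict[str, object] = field(default_factory=dict)
    children: List["Box"] = field(default_factory=list)
    parent: Optional["Box"] = field(default=None, repr=False)

    def add(self, child: "Box") -> "Box":
        child.parent = self
        self.children.append(child)
        return child


def walk(box):
    """Yield the box and all its descendants, depth first."""
    yield box
    for child in box.children:
        yield from walk(child)


def segment_matches(segment: str, box: Box) -> bool:
    """Match one path segment such as 'egg', '*' or 'Stage::*'."""
    cls, _, name = segment.rpartition("::")
    return cls in ("", box.cls) and name in ("*", box.name)


def resolve(reference: str, root: Box, origin: Optional[Box] = None):
    """Return the matching ports as 'boxname[port]' strings."""
    path, port = re.fullmatch(r"(.*)\[(\w+)\]", reference).groups()
    segments = path.split("/")
    if segments[0] in (".", ".."):
        # Relative path: start at the referring box or at its parent
        if origin is None:
            raise ValueError("A relative reference needs an origin box")
        found = [origin if segments[0] == "." else origin.parent]
        for seg in segments[1:]:
            found = [c for b in found for c in b.children
                     if segment_matches(seg, c)]
    else:
        # Assumption: the path may match a chain of boxes at any depth
        found = []
        for box in walk(root):
            b = box
            for seg in reversed(segments):
                if b is None or not segment_matches(seg, b):
                    break
                b = b.parent
            else:
                found.append(box)
    return [f"{b.name}[{port}]" for b in found if port in b.ports]


# Hand-built copy of the box hierarchy in the butterfly boxscript
stage_ports = ["initial", "inflow", "duration", "timeStep", "content", "outflow"]
sim = Box("Simulation", "sim")
sim.add(Box("Calendar", "calendar", {"date": None}))
sim.add(Box("Records", "weather", {"Tavg": None}))
butterfly = sim.add(Box("Box", "butterfly"))
butterfly.add(Box("DayDegrees", "time", {"T0": None, "T": None, "step": None}))
stages = {n: butterfly.add(Box("Stage", n, dict.fromkeys(stage_ports)))
          for n in ["egg", "larva", "pupa", "adult"]}

print(resolve("*[inflow]", sim))
print(resolve("Stage::*[content]", sim))
print(resolve("butterfly/egg[duration]", sim))
print(resolve("../larva[outflow]", sim, origin=stages["pupa"]))
```

The relative reference is resolved from the pupa box, mirroring the line in the boxscript where pupa takes its inflow from its sibling larva.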
If we step back once again to get an overview of the model dynamics, we can see that the first three life stages will last 140, 200 and 100 day-degrees, respectively. The adult stage lasts, in fact, 28 days. That's because the time step by default is 1 day. The time step is specified by the Calendar box (if one is present).
All box inputs will have a sensible default value, which they will keep unless you change the input value in the boxscript. Thus the calendar box in the butterfly boxscript has many inputs besides the begin and end inputs, which were set explicitly. One of them is timeStep, which defaults to a value of 1; another is timeUnit, which defaults to "d" for days. You can look up the default values for all inputs in the class documentation.
Most often you would like to see model outputs as time series in a figure. You specify this with an OutputR box (not more than one), which can contain one or more PageR boxes (each will produce one page of output in R), which again each may contain one or more PlotR boxes (each producing one plot on the page). All plots on a page have the same x-axis.
Here is the page, consisting of two plots, produced by the boxscript:
We chose to put date on the x-axis:
.xAxis = calendar[date]
Each plot may show one or more ports. In the left plot, we show just a single port, namely the daily average temperature, resulting in one curve in the plot:
.ports = weather[Tavg]
In the right plot, we show the current number of individuals in each life stage, resulting in four curves in the plot:
.ports = Stage::*[content]
If you are not satisfied with the default plots, you can write your own R scripts to produce plots and do other post-processing of the simulation outputs, which are all accessible as a sim data frame in R. You would specify these scripts as inputs to the PageR box:
PageR {
  .scripts = c("my-analysis.R", "my-plots.R")
}
(Yes. You construct vectors in boxscript with a c function, just like in R.)
It turns out that "BoxScript" is so obvious a name for a language or a script that it has been used earlier (see e.g. Liu & Cunningham 2005 and Aldebaran Robotics). In comparison, I believe you will find the present BoxScript a lot simpler!
The Universal Simulator provides a prompt as the user interface. Under the hood, it contains the BoxScript Engine, which carries out the actual simulations. It collaborates smoothly with R to produce simulation outputs. This is the overall process:
Here, the green-dashed boxes are text files (such as weather files) and R scripts optionally supplied by the modeller. The violet-dotted boxes are temporary files, together with content copied to the clipboard (the R snippet). These can usually be ignored by the modeller but they might come in handy when debugging faulty models.
The BoxScript Engine reads the boxscript and other input files (specified in the boxscript) and carries out the simulation. The user steers this process through the load and run commands typed in at the prompt. Remember that a boxscript does not include the code defining the behaviour of the boxes that make up the model. It only declares which boxes the model is constructed of, while setting the input ports to the desired values. The behaviour of the boxes is defined in the C++ implementation of each Box class. Technically, those Box class implementations are loaded from dynamic link libraries by the BoxScript Engine. If you should refer to a Box class that is lacking from these libraries, you will be told so by an error message at the prompt.
The simulation output comes as a tab-separated text file, which contains columns corresponding to the ports referred to in the boxscript's PageR and PlotR boxes, with a row for each time step of the model. In addition, an R script is produced that will show the requested plots (R ggplot is used for this purpose). For ease of use, finally, a snippet of R code is put into the clipboard, so that all you have to do as a user, once the engine has finished its job, is to paste the clipboard at the R prompt. The plots will appear immediately, optionally amended with analyses defined in additional R scripts supplied by the user (these R scripts can be specified in the boxscript).
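Because the output is an ordinary tab-separated text file, it can also be read outside R should you ever need to. The Python sketch below shows the idea on a stand-in for such a file; the column names and the numbers in the sample are invented placeholders, not real simulation output, and the actual header written by Universal Simulator may look different.

```python
import csv
import io

# Invented stand-in for an output file: one column per port, one row per
# time step. Names and values are placeholders for illustration only.
sample = (
    "date\tTavg\tegg.content\n"
    "2009-05-01\t9.5\t100.0\n"
    "2009-05-02\t11.0\t99.2\n"
)


def read_output(stream):
    """Read tab-separated output into a dict mapping column name to values."""
    reader = csv.DictReader(stream, delimiter="\t")
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(value)
    return columns


columns = read_output(io.StringIO(sample))
print(list(columns))          # the column names
print(len(columns["Tavg"]))   # the number of time steps read
```

For a real file you would pass an open file object instead of the io.StringIO wrapper; the values are kept as text here and would need converting to numbers or dates before analysis.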
The BoxScript Engine is embedded inside the Universal Simulator, so there was a need for a user interface to interact with it. This could have been designed with a top menu line, roll-down and pop-up menus, click buttons and other user input paraphernalia. But it isn't. You control Universal Simulator through its prompt. For example:
Other commands include
The manual contains the full list of commands and their documentation. Here, I shall only explain that items in brackets [ ] are optional, and that the pipe symbol | separates options of which one or more can be included. You have already guessed that you should replace content in angle brackets (< ... >) with your own text.
This prompt is much more restricted in use than the R prompt. At the R prompt, you can write R code. At the Universal Simulator prompt, you cannot write BoxScript code; you can only issue a few commands to manage (load, run, edit, list, search) your boxscripts and to look up help.