#head.html
EcolMod Lab
#header.html#home#Do-it-yourself, open-source modelling

xxx

Motivation

You want to create a mathematical model of sorts. Well, maybe not that mathematical. More like computational. You have this image in your mind, which you just put down on paper. Maybe it looks like this:

The math is not that complicated. The arrows represent equations specifying that both owls and cats eat mice, which in turn eat your morning cereals (important model restriction: exclude yourself as a cereal forager and leave out the cereal production system). If we looked inside the boxes and elaborated further, we could add how cereals are a resource for mice offspring. Throw in kittens and owlets, and we’ve got a whole ecosystem going. We’ll look into textbooks of population dynamics and quickly find a few simple equations describing each sub-process. Having the system diagram (which has grown since our first sketch above) drawn on one sheet of paper and an assortment of equations listed on another, how do we proceed? Every bit is simple enough, but assembling it all into a neat model running on your computer seems beyond your reach.

If you are an engineer, I am sorry for having bothered you with this bottomless pit of biological processes and ever-enfolded sub-processes and sub-sub-processes. Maybe you are more into physical systems, like a greenhouse (from j.compag.2017.08.020):

Image

So what’s the problem? Go ahead. Write down those differential equations, built on the safe physical principles of thermodynamics, heat exchange, etc. But then you realise that the smoothness of your equations belies the abruptness of the real-world system. You’ve got windows opening and closing, curtains being drawn, the heating system turning on and off: a battery of actuators controlled by the logic of the greenhouse climate computer. Trying to incorporate that logic into your equations disrupts their beauty and turns the model as a whole into an unmanageable monster. No. You need another approach: Describe the logic inside each box on its own and then connect the boxes, like Lego bricks, to build the whole model.

If you are a mathematician, you may have come to the wrong place, unless you want to learn about model-building that involves no fancy math at all.

BoxScript is a language developed at the Ecological Modelling Laboratory as a tool to compose models out of model building blocks, like those described loosely in the two examples above. It is not yet another piece of software for drawing models by dropping building blocks (boxes, arrows and whatnot) on a graphical canvas. BoxScript is a programming language used to write models in text files, which we call boxscripts.

Knowing how to write BoxScript code only gets you half-way. You also need software that will read the boxscript and run the model. Universal Simulator is the software created at the lab for this purpose. It reads and executes boxscripts and produces output that is seamlessly exported to R, where the output is displayed and, optionally, further analysed. Any R skills you may have honed can thus be applied readily for visualisation and statistical analysis of model outputs.

The Universal Simulator comes with a toolbox of model building blocks that you can use in your boxscripts for model-building. However, if you are going to build a model of any complexity, you will likely need to create additional building blocks, custom-made for your modelling needs. Building blocks are written in C++. You code your own by downloading the source code for Universal Simulator, as it includes a simple (yes, it is simple) programming framework for defining new building blocks based on the Box base class. C++ from scratch is difficult, but the Box framework lets you cheat and write C++ code without knowing the technical details of what goes on under the surface.

You will use Qt Creator to write and manage your C++ code. You will use Qt Creator under its open-source license, which means that Universal Simulator and your code as well must be open-source. Qt Creator and Universal Simulator work on the major computer platforms: Windows, Mac and Linux. Lastly, you will need a decent text editor to write boxscripts. Any will do, but if you use Notepad++ or Atom, you will benefit from the highlighting of BoxScript syntax.

xxx

Why BoxScript?

It seems to be a common understanding, which I will not hesitate to deem a misunderstanding, that graphical tools are superior to text-based tools. Surely, it is easier to draw a model on the screen, mousing around, dragging and dropping, than it is to type code into a text editor? Well, if that were the case, why is R then such a global success? The very first time you set forth to create a scatter plot in R, carry out a linear regression and overlay the scatter plot with the estimated line, you will be too exhausted after the ordeal to be capable of answering that question. You could have achieved the same result in a spreadsheet in no time. However, after the quick solution in the spreadsheet (which, admit it, was just a first stab at analysing your data), you realise that you have to filter your data, add random factors and, by the way, produce a print-ready figure of 68 mm width for the journal (in addition to the full-coloured one for your talk). After finishing that job in the spreadsheet, you are too exhausted to consider that you will have to repeat more or less the same procedure for five other data sets. How again did I fix this axis? Did I right-click it? Did I change the layout or the options? Enter R. After having produced the first figure, the whole procedure is documented right there in the code. Copy and paste to do the other five data sets or, if you are experienced, cut out the common code and package it into a function that does most of the job. BoxScript offers the same benefit when you compare it to a graphical tool. Graphical tools make the first baby steps easy but then become a hindrance to getting properly up and running.

R is a success also because it can be extended with new functionality through its package system. BoxScript takes the same approach, as you can define new building blocks, which are then available in your boxscripts. When you are writing building blocks in the Box framework, you create them in your own C++ namespace; say you call it savanna. Your savanna building blocks end up in a savanna module (technically, a binary library file, i.e., on Windows a DLL file), which you can upload to make it immediately useful to other users.

Other reasons for R's success, which work in concert with those above, are that it is open-source and that it is an interpreted language. The latter means that you can execute your R code immediately. BoxScript and Universal Simulator are open-source too, but BoxScript cannot be executed directly, as the examples below will show. Another difference is that BoxScript is a declarative language whereas R is an imperative language. In BoxScript you describe what a model consists of but not how the simulation should be carried out. In R you describe how every computational step should be carried out.

There are many programming languages already. You may be well-versed in a few of them, or you may be a newbie in the programming world. Why should you learn BoxScript in addition, or maybe as a first language? Well, first of all BoxScript is really simple. There is nothing much to learn. That's because it was designed as a domain-specific language (DSL). The domain is modelling. Well, not any kind of modelling. If you have a system which you can describe adequately with differential equations, go ahead and use software for that domain. This is not the domain BoxScript was designed for. But if your model contains heterogeneous components (building blocks) made out of math and algorithms (i.e., code), that is what BoxScript is for. Such models can be difficult to manage, e.g., difficult to extend and re-use, but with BoxScript they become manageable. In R, I am telling you, they do not. I created BoxScript because I found no language or tool that could solve this problem for me.

While the domain of heterogeneous, modular (i.e., building block-based) modelling is still rather broad, the more precise domain is defined by the building blocks available in the toolbox. What those are has been determined by the various projects that I have worked on through the years in collaboration with students, researchers and engineers in academia and R&D companies. Thus you will find building blocks to model crops, pests, population dynamics and greenhouse microclimate. In addition, there are building blocks to deal with generic modelling tasks, such as random number generation, uncertainty and sensitivity analysis, and visualisation of model outputs.

xxx

What's BoxScript?

Enough talk. Let's see what a real boxscript looks like. Here's one:

// butterfly.box
Simulation sim {
  Calendar calendar {
    .begin = 01/05/2009
    .end   = 30/09/2009
  }
  Records weather {
    .fileName = "flakkebjerg 2009.txt"
  }
  Box butterfly {
    DayDegrees time {
      .T0 = 5
      .T  = weather[Tavg]
    }
    Stage egg {
      .initial  = 100 
      .duration = 140
      .timeStep = ../time[step]
    }
    Stage larva {
      .inflow   = ../egg[outflow]
      .duration = 200
      .timeStep = ../time[step]
    }
    Stage pupa {
      .inflow   = ../larva[outflow]
      .duration = 100
      .timeStep = ../time[step]
    }
    Stage adult {
      .inflow   = ../pupa[outflow]
      .duration = 28
      .timeStep = 1
    }
  }
  OutputR {
    PageR {
      .xAxis = calendar[date]
      PlotR {
        .ports = weather[Tavg]
      }
      PlotR {
        .ports = Stage::*[content]
      }
    }
  }
}

Even if you are not an entomologist, you should be able to recognize the series of life stages that leads to an adult butterfly. You should also be able to guess from this code how many eggs we start out with, and on what date the eggs were laid. The concept of day-degrees may be new to you, but it is easy to understand by example: If you have a threshold of 5 degrees and an average temperature of the day of 23 degrees, then that corresponds to 18 day-degrees. A cold day below the threshold would correspond to zero day-degrees. Right, so what is the temperature threshold in this model? And what is the source for daily temperature readings? See if you can find that information in the boxscript above.
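The day-degree arithmetic in the example can be written out in a few lines. This is only a sketch in Python of the rule as stated in the text (excess over the threshold, never negative, one day per step); the function name and the temperature list are made up for illustration, and the actual DayDegrees class may well offer more options than this.

```python
def day_degrees(t_avg: float, t0: float) -> float:
    """Day-degrees gained in one day: the excess of the daily average
    temperature over the threshold t0, and zero on days below it."""
    return max(0.0, t_avg - t0)


# The worked example from the text: threshold 5, daily average 23
print(day_degrees(23, 5))   # 18

# A cold day below the threshold contributes nothing
print(day_degrees(3, 5))    # 0.0

# Accumulated day-degrees over four made-up daily averages
temperatures = [3.0, 8.0, 12.5, 23.0]
print(sum(day_degrees(t, 5.0) for t in temperatures))   # 28.5
```

Seen this way, a duration of 140 day-degrees means that a stage is completed once the running sum has reached 140, however many calendar days that takes.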

You will notice that each of the three immature stages (egg, larva, pupa) has a timeStep that refers to ../time[step]. For the adult butterflies, it is more straightforward as timeStep is set to 1 (One what? We'll return to that). First of all, you should appreciate the structure of the file, which is by no means original; you will find the exact same structure in standard languages, such as XML, JSON, HTML, etc. It is a hierarchy of boxes inside boxes, each box delineated by a pair of braces { }. The type of the box (which corresponds one-to-one with the C++ class defining the behaviour of the box) is written in front of the braces, optionally followed by the name of that particular box (the name of that particular object in programmer's parlance).

Thus this box is of the Stage class and is named pupa:

Stage pupa { }

And this box is of the OutputR class and is unnamed:

OutputR { }

Each kind (class) of box defines the inputs that it will take, and the outputs that it will compute and make available to other boxes. A common name for inputs and outputs is ports. You can look up the ports of a certain box class in the class's documentation.

To set the value of an input, you precede its name with a period. Here duration is set to 100:

.duration = 100

In the butterfly boxscript, you can find examples of the various types of inputs. If the input is a string (i.e., a piece of text), it must be enclosed in quotation marks:

.fileName = "flakkebjerg 2009.txt"

Dates can be written in European (day/month/year), international (year/month/day) or American (/month/day/year) format. So, these three lines are equivalent:

.end = 30/09/2009
.end = 2009/09/30
.end = /9/30/2009

Note that the year must be written in full, and that American notation is preceded by an extra slash. Leading zeroes on day and month are optional.
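To see why the three notations never clash, it may help to spell the rules out as code. The Python sketch below is an illustration only, not the parser used by Universal Simulator; the function name is hypothetical, and it assumes that a four-digit first field is what marks the international format.

```python
from datetime import date


def parse_box_date(text: str) -> date:
    """Parse d/m/yyyy, yyyy/m/d or /m/d/yyyy (illustrative rules only)."""
    text = text.strip()
    if text.startswith("/"):
        # American: the extra leading slash announces month/day/year
        month, day, year = text[1:].split("/")
    else:
        first, second, third = text.split("/")
        if len(first) == 4:
            # International (assumed rule): a full year comes first
            year, month, day = first, second, third
        else:
            # European: day/month/year
            day, month, year = first, second, third
    if len(year) != 4:
        raise ValueError(f"the year must be written in full: {text!r}")
    return date(int(year), int(month), int(day))


# The three equivalent lines from the text give the same date
print(parse_box_date("30/09/2009"))   # 2009-09-30
print(parse_box_date("2009/09/30"))   # 2009-09-30
print(parse_box_date("/9/30/2009"))   # 2009-09-30
```

Because the year is always four digits and the American form always starts with a slash, a date such as 01/05/2009 can only be read as 1 May 2009.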

Finally, but importantly, an input can be set not to a fixed value but to the output from (or the input to) another box. This line will set the value of the T input equal to the Tavg output delivered by the weather box:

.T = weather[Tavg]

In this case T is a scalar input (defined in the C++ implementation of the DayDegrees class and reported in the documentation of that class), which means it takes exactly one value. Hence, there must be exactly one match when weather[Tavg] is looked up in the boxscript. Had T been defined as a vector input, then any number of matches, including zero, would have been all right.

References to ports are written as paths. Just like a file path (e.g., "/home/documents/letter.pdf" or "*.pdf") may point to zero, one or more files in the folder hierarchy of your computer's file system, a BoxScript reference may point to zero, one or more ports in the boxscript. In BoxScript references, the port name is written in brackets at the end of the path. Here are some examples of references that are valid in the butterfly boxscript:

butterfly/egg[duration]
egg[inflow]
*[inflow]

Slashes join one box (the parent) to a box inside (the child). An asterisk functions as a wildcard (any name) and is likely to yield many matches (four in the case of *[inflow]). Optionally, you can specify which class a box on the path must belong to. Thus this path matches a port named content found in any box of the Stage class:

Stage::*[content]

References will often benefit from being relative, which is marked by an initial single-period (meaning me) or double-period (meaning my parent). You can find this path in the butterfly boxscript:

.inflow = ../larva[outflow]

It refers to the parent (look it up and you should find butterfly to be the parent) and inside that the child larva from which it takes the outflow output. In effect it refers to the sibling box called larva.

Without this explanation you would likely have guessed the meaning correctly, despite not knowing the details about how paths specify references (in fact, there are even more details given in the documentation of reference paths).
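For readers who like to see the lookup rules at work, here is a toy resolver written in Python. It is a sketch under stated assumptions and not the algorithm of the BoxScript Engine: the BOXES table is a hand-made, abbreviated excerpt of the butterfly boxscript, the function names are hypothetical, and a path without a leading period is simply matched against the tail of each box's full path.

```python
from fnmatch import fnmatchcase

# Full path of each box -> (class name, port names). Hand-made excerpt of
# the butterfly boxscript; the port lists are abbreviated.
STAGE_PORTS = ("initial", "inflow", "duration", "timeStep", "content", "outflow")
BOXES = {
    ("sim",): ("Simulation", ()),
    ("sim", "calendar"): ("Calendar", ("begin", "end", "date")),
    ("sim", "weather"): ("Records", ("fileName", "Tavg")),
    ("sim", "butterfly"): ("Box", ()),
    ("sim", "butterfly", "time"): ("DayDegrees", ("T0", "T", "step")),
    ("sim", "butterfly", "egg"): ("Stage", STAGE_PORTS),
    ("sim", "butterfly", "larva"): ("Stage", STAGE_PORTS),
    ("sim", "butterfly", "pupa"): ("Stage", STAGE_PORTS),
    ("sim", "butterfly", "adult"): ("Stage", STAGE_PORTS),
}


def step_matches(step, path):
    """Does one path step ('name' or 'Class::name') match the box at path?"""
    wanted_class, _, name = step.rpartition("::")
    box_class, _ = BOXES[path]
    return fnmatchcase(path[-1], name) and wanted_class in ("", box_class)


def find_ports(reference, context=()):
    """Return all ports matched by a reference such as 'Stage::*[content]'.
    context is the full path of the box in which the reference is written;
    it is only needed for references starting with '.' or '..'."""
    path_text, port = reference.rstrip("]").split("[")
    steps = path_text.split("/")
    relative = steps[0] in (".", "..")
    if relative:
        base = context if steps[0] == "." else context[:-1]
        steps = steps[1:]
    matches = []
    for path, (_, ports) in BOXES.items():
        if port not in ports or len(path) < len(steps):
            continue
        if relative and (path[:len(base)] != base
                         or len(path) != len(base) + len(steps)):
            continue
        # Compare each step with the box at the corresponding depth
        first = len(path) - len(steps)
        if all(step_matches(step, path[:first + i + 1])
               for i, step in enumerate(steps)):
            matches.append("/".join(path) + "[" + port + "]")
    return matches


print(find_ports("butterfly/egg[duration]"))  # ['sim/butterfly/egg[duration]']
print(len(find_ports("*[inflow]")))           # 4
print(len(find_ports("Stage::*[content]")))   # 4
print(find_ports("../larva[outflow]", context=("sim", "butterfly", "pupa")))
# ['sim/butterfly/larva[outflow]']
```

A scalar input would demand that the returned list holds exactly one element, whereas a vector input would accept the list whatever its length.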

If we step back once again to get an overview of the model dynamics, we can see that the first three life stages will last 140, 200 and 100 day-degrees, respectively. The adult stage, in contrast, lasts 28 days. That's because the time step by default is 1 day. The time step is specified by the Calendar box (if one is present).

All box inputs will have a sensible default value, which they will keep unless you change the input value in the boxscript. Thus the calendar box in the butterfly boxscript has many inputs besides the begin and end inputs, which were set explicitly. One of them is timeStep, which defaults to a value of 1; another is timeUnit, which defaults to "d" for days. You can look up the default values for all inputs in the class documentation.

Most often you would like to see model outputs as time series in a figure. You specify this with an OutputR box (not more than one), which can contain one or more PageR boxes (each will produce one page of output in R), which again each may contain one or more PlotR boxes (each producing one plot on the page). All plots on a page have the same x-axis.

Here is the page, consisting of two plots, produced by the boxscript:

Image

We chose to put date on the x-axis:

.xAxis = calendar[date]

Each plot may show one or more ports. In the left plot, we show just a single port, namely the daily average temperature, resulting in one curve in the plot:

.ports = weather[Tavg]

In the right plot, we show the current number of individuals in each life stage, resulting in four curves in the plot:

.ports = Stage::*[content]

If you are not satisfied with the default plots, you can write your own R scripts to produce plots and do other post-processing of the simulation outputs, which are all accessible as a sim data frame in R. You would specify these scripts as inputs to the PageR box:

PageR {
  .scripts = c("my-analysis.R", "my-plots.R")
}

(Yes. You construct vectors in boxscript with a c function, just like in R.)

It turns out that "BoxScript" is so obvious a name for a language or a script that it has been used earlier (see e.g. Liu & Cunningham 2005 and Aldebaran Robotics). In comparison, I believe you will find the present BoxScript a lot simpler!

xxx

What’s the Universal Simulator?

The Universal Simulator provides a prompt as the user interface. Under the hood, it contains the BoxScript Engine, which carries out the actual simulations. It collaborates smoothly with R to produce simulation outputs. This is the overall process:

Image

Here, the green-dashed boxes are text files (such as weather files) and R scripts optionally supplied by the modeller. The violet-dotted boxes are temporary files, together with content copied to the clipboard (the R snippet). These can usually be ignored by the modeller, but they might come in handy when debugging faulty models.

The BoxScript Engine reads the boxscript and other input files (specified in the boxscript) and carries out the simulation. The user steers this process through the load and run commands typed in at the prompt. Remember that a boxscript does not include the code defining the behaviour of the boxes that make up the model. It only declares which boxes the model is constructed of, while setting the input ports to the desired values. The behaviour of the boxes is defined in the C++ implementation of each Box class. Technically, those Box class implementations are loaded from dynamic link libraries by the BoxScript Engine. If you should refer to a Box class that is lacking from these libraries, you will be told so by an error message at the prompt.

The simulation output comes as a tab-separated text file, which contains columns corresponding to the ports referred to in the boxscript's PageR and PlotR boxes, with a row for each time step of the model. In addition, an R script is produced that will show the requested plots (the ggplot2 package is used for this purpose). Finally, for ease of use, a snippet of R code is put into the clipboard, so that all you have to do as a user, once the engine has finished its job, is to paste the clipboard at the R prompt. The plots will appear immediately, optionally amended with analyses defined in additional R scripts supplied by the user (these R scripts can be specified in the boxscript).

The BoxScript Engine is embedded inside the Universal Simulator, so there was a need for a user interface to interact with it. This could have been designed with a top menu line, drop-down and pop-up menus, click buttons and other user input paraphernalia. But it isn't. You command Universal Simulator through its prompt. For example:

  • load butterfly.box
  • run

Other commands include

  • load [<file name>]
  • run [<file name>]
  • edit
  • list [<path>] [p|r|i|x]
  • find <path>
  • help [<class name>]

The manual contains the full list of commands and their documentation. Here, I shall only explain that items in brackets [ ] are optional, and that the pipe symbol | separates options of which one or more can be included. You have already guessed that you should replace content in angle brackets (< ... >) with your own text.

This prompt is much more restricted in use than the R prompt. At the R prompt, you can write R code. At the Universal Simulator prompt, you cannot write BoxScript code; you can only issue a few commands to manage (load, run, edit, list, find) your boxscripts and to look up help.

#right.html

Try it!

Download the latest version of Universal Simulator with the freshly updated Virtual Greenhouse model.

2 May 2024

Model just published

Read our paper on the Cereal Aphid-Fungus model and study the detailed documentation. Any questions? Write us.

2 Aug 2023

Home page overhaul

We remain candy-coloured until further notice.

1 Aug 2023

Contact

Any questions concerning our models and tools? Interested in visiting the lab? Want to chat online? Write us.

#footer.html