Machinetalk explained Part 5: Code Generation

This blog post is part 5 of the Machinetalk explained series describing the concepts and ideas behind the Machinetalk middleware.

In this section, I describe the part of Machinetalk which probably needs the most explanation: Machinetalk GSL, the language bindings for Machinetalk built using code generation.

But before we delve deep into code generation and meta-programming, I recommend reading the previous parts of the Machinetalk explained series.

A Case for Model-Driven Software Development

Before I start to explain MOP (Model Oriented Programming) it is important that you understand the reasons behind choosing this development approach.

I began to work on Machinetalk when I started the QtQuickVcp project. The project is a combination of UI components and Machinetalk bindings written in Qt/C++. It turned out that coding the language bindings was rather simple using ZeroMQ and Protobuf. However, the hardest part was to look up the Machinetalk API in the Machinekit code sources and to implement the protocol flow correctly.

Last fall, I began to work on pymachinetalk, the Machinetalk bindings for Python. For implementing the Python language bindings, I had to do almost the same things as for the Qt/C++ language bindings. Furthermore, I noticed that I also had to implement almost the same things for every new Machinetalk service I added to Machinekit.

As a programmer, I know that whenever you need to do a repetitive task it is time for abstraction. Your abstraction can be a function, an object or a module. But, we are usually only taught how to abstract in a particular programming language.

However, this particular problem is only partially solvable using language-dependent means of abstraction. For example, we could introduce abstract Machinetalk service classes in C++ to make the task less repetitive. You can easily see that we would need to do the same thing again in Python and any other programming language. The problem gets worse the fewer means of abstraction a particular programming language offers; think of a C Machinetalk language binding, for example.

Other middleware solutions such as ROS have the same problem. Vendors provide a reference implementation for a particular programming language; the rest is left to the community. However, living in a strongly heterogeneous world, we cannot accept middleware solutions that work only for a particular programming language.

When studying the ZeroMQ reference manual, the zguide, I came across the Code Generation section mentioning the GSL tool. It quickly caught my attention since iMatix claims to use it to build protocols themselves.

Now that you have seen the problems that led me to explore the MOP approach for Machinetalk, it is time to explain Model-Oriented Programming.

Model-Oriented Programming

Model-Oriented Programming (MOP) is the application of model-driven development methods to programming. In comparison to traditional Model Driven Architecture (MDA) development approaches it does not depend on any general-purpose modeling language such as the Unified Modeling Language (UML).

Scientists found that model-centric software development approaches have not been widely adopted although they are in general considered good practice. A majority of the interviewed programmers claim that they do not think the generated output of code generators can be considered decent code.

Furthermore, general-purpose modeling languages are often found to be too generic. In MOP this problem is approached by developing not only domain-specific abstract models but also code generators for domain-specific modeling languages. Therefore, MOP can be applied to express concepts related to the problem domain. From these models not only code skeletons without a function can be generated but also full working software components.

MOP is most useful for projects that require repetitive coding. Moreover, models created in the process of MOP are technology and language independent and convertible to domain-specific and optimized source code. An advantage compared to general-purpose modeling languages is that code generators can be optimized to generate high quality and human-readable source code.

iMatix Generator Scripting Language

GSL by iMatix is an open source code construction tool and MOP language.
GSL uses simple XML documents without style sheets and namespaces as model files. Therefore, GSL is a Textual Modeling Language (TML) and shares all the benefits of text-based modeling.

You need no special software to edit GSL files. However, I found GNU Emacs most useful for editing them. Put the editor into the major mode for the target language (e.g. python-mode) and enable the following minor mode with M-x gsl-mode.

(define-minor-mode gsl-mode
  "Highlight GSL script lines and $(...) substitutions."
  :lighter " gsl"
  (if gsl-mode
      (progn
        ;; Script lines: lines starting with a dot.
        (highlight-regexp "\\(^\\..*\\)\n" 'hi-green-b)
        ;; Substitutions such as $(name:c).
        (highlight-regexp "\\(\\$(.*?)\\)" 'hi-red-b))
    (unhighlight-regexp "\\(^\\..*\\)\n")
    (unhighlight-regexp "\\(\\$(.*?)\\)")))

I found electric-indent-mode very annoying when editing GSL files. You can easily toggle it off with M-x electric-indent-mode.

[Figure: GSL]

The GSL interpreter uses XML and GSL documents as input. It extracts data from the XML files and pushes it into a data tree.

The GSL interpreter interprets GSL documents in either template mode or script mode.

In template mode, the interpreter treats only lines starting with a . symbol as GSL commands. All other lines are directly output to the specified output file.

In script mode, the interpreter does the exact opposite: every line is a GSL command, and only lines starting with a > symbol are output.
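
To make the template-mode rule concrete, here is a minimal Python sketch of how lines could be sorted into commands and output text. It is only an illustration of the rule, not how the GSL interpreter is implemented, and the function name split_template_lines is made up for this example.

```python
def split_template_lines(template_text):
    """Classify lines the way GSL template mode does (heavily simplified)."""
    commands, output = [], []
    for line in template_text.splitlines():
        if line.startswith("."):
            # A leading dot marks a GSL command; strip the dot and padding.
            commands.append(line[1:].strip())
        else:
            # Everything else is copied to the output file.
            output.append(line)
    return commands, output


sample = """.for class
class $(class.Name)(object):
.endfor"""

commands, output = split_template_lines(sample)
print(commands)
print(output)
```

A real interpreter would additionally expand $(...) substitutions in the output lines and execute the commands against the data tree.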

GSL Example

I personally prefer examples over long descriptions. Therefore, I created a simple GSL example for generating Python classes from an abstract model.

The model model.xml looks as follows:

<?xml version = "1.0" ?>
<module name = "foo">
    <class name = "bar">
        <property name = "foo bar"/>
        <property name = "bar"/>
    </class>
</module>

To generate code from the model we need a GSL template. The template pygen.gsl looks as follows:

.template 1
.output "$(module.name:c).py"
.for class
class $(class.Name)(object):
    def __init__(self):
.  for property
        self._$(name:c) = None
.  endfor

.  for property
    @property
    def $(name:c)(self):
        print('queried "$(name)"')
        return self._$(name:c)

.  endfor
.endfor
.endtemplate

As you see, without proper code highlighting the template becomes rather confusing. With the gsl-mode enabled the same code looks as follows:

[Figure: pygen.gsl with gsl-mode highlighting]

Executing the script with the command gsl -script:pygen.gsl model.xml produces the Python module foo.py:

class Bar(object):
    def __init__(self):
        self._foo_bar = None
        self._bar = None

    @property
    def foo_bar(self):
        print('queried "foo bar"')
        return self._foo_bar

    @property
    def bar(self):
        print('queried "bar"')
        return self._bar

Of course, we wouldn’t create a model for such a simple problem in real life. However, it demonstrates the capabilities and the simplicity of GSL very well. We also see that the approach pays off more as the model grows, in this case as we add more modules, classes, and properties.
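
For readers who do not have GSL installed, the following Python sketch mimics what the template does with the model. It is a rough stand-in and not a replacement for GSL: the helper c_name only approximates the :c modifier, and capitalize() approximates $(class.Name). Both helpers are assumptions made for this example.

```python
import re
import xml.etree.ElementTree as ET

MODEL = """<module name = "foo">
    <class name = "bar">
        <property name = "foo bar"/>
        <property name = "bar"/>
    </class>
</module>"""


def c_name(name):
    """Rough stand-in for GSL's :c modifier: turn a name into an identifier."""
    return re.sub(r"\W", "_", name)


def generate(model_xml):
    """Walk the model and emit Python source, like the .for loops in pygen.gsl."""
    module = ET.fromstring(model_xml)
    lines = []
    for cls in module.findall("class"):
        props = [p.get("name") for p in cls.findall("property")]
        lines.append(f"class {cls.get('name').capitalize()}(object):")
        lines.append("    def __init__(self):")
        lines += [f"        self._{c_name(p)} = None" for p in props]
        lines.append("")
        for p in props:
            lines += [
                "    @property",
                f"    def {c_name(p)}(self):",
                f"        print('queried \"{p}\"')",
                f"        return self._{c_name(p)}",
                "",
            ]
    return "\n".join(lines)


source = generate(MODEL)
print(source)

# Load the generated source to check that it is valid Python.
namespace = {}
exec(source, namespace)
bar = namespace["Bar"]()
value = bar.foo_bar
```

Comparing the two versions shows why a template language is attractive: in GSL the output text stays readable, while the Python generator buries it in string literals.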

According to a discussion on Reddit, GSL is a second-order meta-programming language. Second-order meta-programming means that one can use the language to build Domain Specific Languages, which is what we need for the Machinetalk code generator.

Modeling the Machinetalk Middleware

Now that we have seen the tools used for generating the Machinetalk language bindings it is time to explain the modeling approach.

The Machinetalk middleware design is split into sub-models to decrease the complexity of individual models and to separate the scope of each model.

[Figure: code generation]

  • Protocol models contain messages and their relation to the Machinetalk Protobuf container.
  • Component models are used to design behavior and interface of software components.

The GSL compiler converts the models into executable language bindings for multiple programming languages. The Protobuf compiler generates message classes. The generated component classes use these message classes to serialize and deserialize messages.

Developers implementing new language bindings only need to develop a GSL template (a code-generator) for the target language and component classes containing language-specific details.

Model and Protocol Layering

The Machinetalk middleware separates the models into three layers.

[Figure: layering]

  • The Channel layer models the behavior of a single channel, such as, for example, the RPC or publish-subscribe channels.
  • The Composition layer composes multiple channels to form a multi-channel protocol. This method allows combining the power of publish-subscribe and RPC in services.
  • The Implementation layer is, as the name suggests, not covered by the models. It implements the language-dependent presentation of the message data.
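
The layering can be pictured with a small Python sketch. All class and method names here (RpcClient, Subscriber, CommandService, deliver, run) are hypothetical and chosen only for illustration; they are not the names used by the generated bindings.

```python
class RpcClient:
    """Channel layer: one request/reply channel (hypothetical stand-in)."""

    def __init__(self):
        self.sent = []

    def send(self, message):
        self.sent.append(message)


class Subscriber:
    """Channel layer: one publish-subscribe channel (hypothetical stand-in)."""

    def __init__(self):
        self.handlers = []

    def on_message(self, handler):
        self.handlers.append(handler)

    def deliver(self, topic, message):
        # A real binding would call this from the socket, not by hand.
        for handler in self.handlers:
            handler(topic, message)


class CommandService:
    """Composition layer: combines an RPC channel with a status subscription."""

    def __init__(self):
        self.command = RpcClient()
        self.status = Subscriber()
        self.last_status = None
        self.status.on_message(self._status_received)

    def _status_received(self, topic, message):
        self.last_status = (topic, message)

    def run(self, line_number):
        self.command.send({"type": "emc task plan run", "line_number": line_number})


service = CommandService()
service.run(10)
service.status.deliver("task", "running")
print(service.command.sent, service.last_status)
```

In this picture, an implementation layer class would wrap CommandService and present the message data in whatever form suits the target language or UI framework.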

Protocol Model

The protocol model has two main functions. First, it defines and documents all messages related to the protocol used by a Machinetalk channel or component composed of multiple channels. Moreover, it also clearly defines the relation between the structure of Protobuf messages and Machinetalk messages.

Protobuf as API

Protobuf is a great serialization technology. Unfortunately, it lacks a few things to work as API description for Machinetalk services.

First, Protobuf itself provides an Interface Description Language (IDL) for describing messages. However, it does not include tools to describe the relation between messages.

Secondly, Machinetalk uses a single top-level container message and sub-messages for each protocol. The reasons behind this decision have been described earlier. However, this leads to the problem that a single message description is not enough to describe the API of a Machinetalk service.

Example

An example is worth a thousand words:

<data name="command">
    <field name="ticket" requirement="MAY" />
    <response name="emccmd executed" />
    <response name="emccmd completed" />
    <response name="error" />
</data>

<message name= "emc task plan run" inherit="command">
    Run the task planner from the specified line number.
    <field name="emc_command_params" message="EmcCommandParameters" requirement="MUST">
        <field name="line number" requirement="MUST" />
    </field>
    <field name="interp_name" requirement="MUST" />
</message>

<system name="RPC">
    Description of RPC components.
    <include filename="rpc_client.xml" />
    <include filename="rpc_service.xml" />
</system>

The model contains the description for all messages used in a system (combination of client and server / publisher and subscriber). Based on this model the code generator produces the protocol documentation.
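
As a rough idea of how a generator might consume such a model, the following Python sketch resolves the inherit attribute. It assumes that a message inherits the fields and responses of the referenced data block, and it wraps the fragments in a made-up protocol root element so that they form one well-formed document. Neither assumption is taken from the Machinetalk GSL sources.

```python
import xml.etree.ElementTree as ET

# The <protocol> root element is hypothetical; it only makes the fragment parseable.
PROTOCOL = """<protocol>
  <data name="command">
    <field name="ticket" requirement="MAY" />
    <response name="emccmd executed" />
    <response name="emccmd completed" />
    <response name="error" />
  </data>
  <message name="emc task plan run" inherit="command">
    <field name="interp_name" requirement="MUST" />
  </message>
</protocol>"""


def resolve_messages(protocol_xml):
    """Merge each message with the data block it inherits from."""
    root = ET.fromstring(protocol_xml)
    bases = {d.get("name"): d for d in root.findall("data")}
    resolved = {}
    for message in root.findall("message"):
        base = bases.get(message.get("inherit"))
        # Inherited entries come first, then the message's own entries.
        sources = ([base] if base is not None else []) + [message]
        resolved[message.get("name")] = {
            "fields": {f.get("name"): f.get("requirement")
                       for src in sources for f in src.findall("field")},
            "responses": [r.get("name")
                          for src in sources for r in src.findall("response")],
        }
    return resolved


messages = resolve_messages(PROTOCOL)
print(messages["emc task plan run"])
```

A documentation generator could then walk the resolved dictionary and print one section per message, listing its fields and the possible responses.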

Component Model

The component model describes the component state machines, channels, sockets, and timers.

The state machines are defined in the SCXML format, a W3C standard for defining state machines. My favorite editor for editing these charts is Qt Creator (>= 4.2), but you can also find free and open source tools from other vendors to edit the files graphically. Alternatively, you can also use a simple text editor to modify the XML source tree.

I don’t want to go too much into detail about SCXML. Instead, please take a look at the following statechart generated by the Machinetalk dot-file generator:

[Figure: subscribe statechart]

The transitions and actions in the statechart have special meaning. Events can be triggered by incoming and outgoing messages, timers, triggers, and socket state changes.

Actions send messages, start and stop channels, start, stop and reset timers and trigger custom slots.

<trigger name = "start">
    <event name = "connect" when = "down"/>
</trigger>

<slot name="set connected" />
<slot name="clear connected" />
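
One way to read the trigger and slot definitions is shown in the following Python sketch. The state names down, trying and up appear in the models, but the transition table, the event name "socket up" and the class names are assumptions made for this example.

```python
class StateMachine:
    """Tiny table-driven state machine (illustrative only)."""

    def __init__(self, initial, transitions):
        self.state = initial
        # Maps (current state, event) to the next state.
        self.transitions = transitions

    def fire(self, event):
        next_state = self.transitions.get((self.state, event))
        if next_state is None:
            return False  # the event is ignored in this state
        self.state = next_state
        return True


class Component:
    def __init__(self):
        self.connected = False
        self.fsm = StateMachine("down", {
            ("down", "connect"): "trying",   # hypothetical transition table
            ("trying", "socket up"): "up",
        })

    def start(self):
        # Trigger "start": raises the event "connect" only when in state "down".
        if self.fsm.state == "down":
            self.fsm.fire("connect")

    def set_connected(self):
        # Slot "set connected": a custom hook filled in by hand.
        self.connected = True

    def socket_up(self):
        if self.fsm.fire("socket up"):
            self.set_connected()


component = Component()
component.start()
component.start()      # ignored, the component is no longer "down"
component.socket_up()
print(component.fsm.state, component.connected)
```

The point of the trigger's when attribute is visible in start(): calling it twice is harmless because the event is only raised in the matching state.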

Another core element of the middleware components is the timer. A typical use case of timers in middleware components is sending and verifying periodic heartbeat messages.

<timer
    name = "heartbeat"
    interval = "2500"
    liveness = "2" >
    For monitoring if the connection is alive.
    <tick>
        <event name = "heartbeat tick" when = "up" />
        <event name = "heartbeat tick" when = "trying" />
    </tick>
    <timeout>
        <event name = "heartbeat timeout" when = "up" />
        <event name = "heartbeat timeout" when = "trying" />
    </timeout>
</timer>
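
The following Python sketch shows one possible interpretation of the interval and liveness attributes: every tick without a received message lowers a counter, and the timeout event is raised once the counter reaches zero. This interpretation follows the common ZeroMQ heartbeating pattern and is an assumption; the generated bindings may behave differently. The class name HeartbeatMonitor is made up.

```python
class HeartbeatMonitor:
    """Counts missed heartbeats; no real timer, ticks are driven by hand."""

    def __init__(self, interval_ms=2500, liveness=2):
        self.interval_ms = interval_ms
        self.max_liveness = liveness
        self.liveness = liveness
        self.events = []

    def message_received(self):
        # Any incoming message proves that the peer is alive.
        self.liveness = self.max_liveness

    def tick(self):
        """Call once per interval; a real binding would use a framework timer."""
        self.liveness -= 1
        if self.liveness <= 0:
            self.events.append("heartbeat timeout")
            self.liveness = self.max_liveness
        else:
            self.events.append("heartbeat tick")


monitor = HeartbeatMonitor()
monitor.tick()              # liveness 2 -> 1
monitor.message_received()  # liveness back to 2
monitor.tick()              # liveness 2 -> 1
monitor.tick()              # liveness 1 -> 0, timeout
print(monitor.events)
```

The events recorded here would be fed into the component state machine, which decides what to do with them depending on the current state.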

At the core of the component model are the socket and channel definitions. If a socket definition refers to a class, we are working on a composition layer component that reuses a channel layer component.

In addition to the events triggered by state changes, each socket contains definitions for incoming and outgoing messages. The public attribute defines the visibility of the message interface in the resulting software class.

<socket name="command" class="RPC Client" module="Machinetal2
    The command channel is used to issue commands to mklaunc3
    <state name="trying">
        <event name="command trying" when="up" />
    </state>
    <state name="up">
        <event name="command up" when="trying" />
    </state>

    <outgoing name="emc task abort" public="true" />
    <incoming name="*">
        <event name = "any msg sent" when = "up" />
        <event name = "any msg sent" when = "trying" />
    </incoming>
    <incoming name="error" public="true">
        <note />
    </incoming>
</socket>

Please also note the use of the special tag note. This tag copies the content of a note message to the error string. I tried to avoid these implementation specific tags as much as possible.
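
To illustrate the effect of the public attribute, the following Python sketch derives a list of interface names from a simplified socket definition. The naming scheme (send_... and ..._received) and the function public_interface are hypothetical; the real code generators choose names that fit the target language.

```python
import xml.etree.ElementTree as ET

# Simplified socket definition; states and events are left out on purpose.
SOCKET = """<socket name="command">
  <outgoing name="emc task abort" public="true" />
  <incoming name="*" />
  <incoming name="error" public="true">
    <note />
  </incoming>
</socket>"""


def public_interface(socket_xml):
    """Return the names a generator might expose for public messages."""
    socket = ET.fromstring(socket_xml)

    def identifier(name):
        return name.replace(" ", "_")

    sends = [f"send_{identifier(m.get('name'))}"
             for m in socket.findall("outgoing") if m.get("public") == "true"]
    receives = [f"{identifier(m.get('name'))}_received"
                for m in socket.findall("incoming") if m.get("public") == "true"]
    return sends, receives


sends, receives = public_interface(SOCKET)
print(sends, receives)
```

Messages without public="true", such as the wildcard entry, are still handled by the component internally but would not show up in the interface of the resulting class.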

Code Generators

Besides the models, the code generators are the second most important part of Machinetalk GSL.

The fundamental idea behind Machinetalk GSL is that for a new language binding one only needs to write a new code generator. Writing a code generator is far less complex than writing a complete language binding by hand in any programming language.

For the core Machinetalk services I measured a code generation ratio (lines of generated code per line of code-generator code) of 6 for Python and 10 for C++. This ratio increases with every additional Machinetalk service.

But before I talk too much about the benefits of code generation, let’s take a look at how to implement a new language binding.

To implement a new Machinetalk binding you need to fulfill the following requirements:

  • FSM implementation: required for the component state machines
  • Concurrency: Machinetalk uses an asynchronous API. Therefore, we need some form of concurrency support such as an event loop or multi-threading.
  • Timers: Timers are required for heartbeat messages.
  • Service Discovery: such as mDNS/DNS-SD

Implementation Process

To implement a new language binding using Machinetalk GSL, I recommend the following process:

[Figure: implementation process]

  • First of all, research the minimum requirements in your target programming language and framework.

  • Next, create a small proof of concept implementation. This step will help you write the code generator.

  • As the third step, generalize the proof of concept to implement the code generator. The existing implementations will help you.

  • Once you have completed the code generator, continue by implementing the implementation layer components using the newly generated language bindings.

Already implemented Code Generators

During the last year, I have continuously added code generators to Machinetalk GSL. Currently, the project contains code generators for the following programming languages, frameworks, and tools:

  • Qt/C++: used in QtQuickVcp
  • Python: for pymachinetalk, not yet integrated
  • Node.js: not used so far
  • JavaScript (Browser): used in WebVCP
  • Markdown + Graphviz Dot: used in Machinetalk-Doc
  • UPPAAL: used for formal verification of the middleware models

Conclusion

In this blog post, we have learned about code generators for the Machinetalk language bindings. We used the GSL language and tool to write the code generators and created XML models.

If you want to learn more about Machinetalk, GSL, and code generation, I recommend taking a look at the machinetalk-gsl GitHub repository.

Even if you are not going to work on Machinetalk GSL, I still recommend taking a closer look at MOP and adding it to your toolbox.

The end of this article also brings me to the end of the Machinetalk explained series. I hope you have enjoyed reading the articles and learned more about the Machinetalk middleware.

Please send me feedback, ideas, and recommendations.

Your
Machine Koder
