C++ API Design for SWIG

SWIG, the “Simplified Wrapper and Interface Generator”, is a tool which automatically generates wrappers that enable C and C++ code to be called from many other languages, such as Java, .NET, Python, Perl and Ruby. This article looks at how to design an API with SWIG in mind. Differences between C/C++ and the target languages, and limitations of the tool, mean some styles of APIs can more successfully be wrapped than others. This article discusses generating wrappers for Java, .NET and Python, but the principles can be applied more generally.

Approaching SWIG

You get the best results from SWIG by designing your API with SWIG and the desired target-languages in mind from the start.

Decide on SWIG-friendly API conventions and follow them through. Set up whatever is needed to generate and build the language bindings automatically for all the desired target languages early in the development process, so you can check that the results are acceptable. You can then expect that future changes to the API that follow the same conventions will produce working and palatable results in your target languages. This is a great position to be in: after the initial investment in automation, you can rely on it to keep working, rather than needing constant adjustment or further hand-written code to wrap up the SWIG-generated output.

API Paradigm

SWIG can generate wrappers for both C and C++ code, but if you’re looking for a low-maintenance solution, then you’re better off wrapping a class-based C++ API.

Whilst C code can be wrapped by the tool, and you could hand-craft classes in the target language to wrap up, for example, handle-based C API functions, that leaves you with a lot of ongoing work maintaining those wrapper classes by hand in each target language of interest. It can be done, and you might deliberately choose to do it: SWIG then serves as a convenience that spares you the native-interface code (like P/Invoke in .NET or JNI in Java) for each target language, whilst you keep fine control over the API that a caller uses in the target language.

However, this article considers approaches that minimize the need to maintain hand-crafted code on the generated end. Object-oriented programming is a dominant paradigm in many of the target languages that you’re likely to want to generate bindings for using SWIG, so starting off with an object-oriented API using C++ classes avoids a mismatch that would result in un-idiomatic generated code.

Primitive Types

Primitive types, such as integers and booleans, appear frequently in the method parameters and return types of most APIs.

Care must be taken that any types used map to appropriate (i.e. idiomatic and also compatible) types in each target language of interest. The types should also be used consistently (e.g. if you’re going to use a 32-bit signed integer as a collection size in one place, use that type for collection sizes elsewhere too).

bool maps to boolean types in Java, .NET and Python so is safe to use.

Both float and double map to types of the same name in Java and C#, and both map to Python’s float type, which is implemented as a double-precision type in practice, so float and double are OK too.

Integers can be a tricky area, as C and C++ have so many different integer types, with considerations of both signed vs unsigned and fixed size vs platform-dependent size coming into play.

The best plan is to look at the target languages you’re using to decide what will work. For example, Java’s integer types are all signed, and whilst C# does support unsigned integer types, they’re second-class citizens and not supported by all .NET languages. The one exception is byte, which is unsigned in C# but signed in Java. Both Java and C# use fixed sizes for the integer types (e.g. int is 32-bit and long is 64-bit). Python’s built-in integer types are also signed, with less variety, just “plain integers” (typically 32-bit or 64-bit size) and “long integers” (arbitrary sized integers, a.k.a. bignums).

Given the above, I’d suggest typedef-ing integer types for byte, 16-bit signed, 32-bit signed and 64-bit signed, and using those typedefs exclusively in your SWIG-facing API definitions. Your choice of integer types may differ depending on the target languages of interest, e.g. if your target language doesn’t have a built-in 64-bit integer type.

Here’s an example of typedefs for primitive types that are to be used throughout a SWIG-wrapped API. Even if the typedefs are trivial, like for bool, it’s good for future-proofing to define and use typedefs for all primitive types in your API definition, because later on you may want to conditionally handle a given type for a particular target language which does things a bit differently from the ones you initially support.
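A minimal sketch of such a set of typedefs might look like the following. The mylib namespace and the typedef names are hypothetical, and the choice of underlying built-in types is an assumption that holds on common platforms; the static assertions (hidden from SWIG) catch a platform where it does not.

```cpp
namespace mylib {

// Hypothetical typedef names, used exclusively in the SWIG-facing API.
typedef bool          Bool;
typedef unsigned char Byte;     // signedness of "byte" differs between C# and Java
typedef short         Int16;
typedef int           Int32;
typedef long long     Int64;
typedef float         Float32;
typedef double        Float64;

#ifndef SWIG
// Seen only by the C++ compiler: check the sizes the API promises.
static_assert(sizeof(Int16) == 2, "Int16 must be 16 bits wide");
static_assert(sizeof(Int32) == 4, "Int32 must be 32 bits wide");
static_assert(sizeof(Int64) == 8, "Int64 must be 64 bits wide");
#endif

} // namespace mylib
```

Because every signature in the API goes through these names, changing how one type is handled for a new target language means touching one line rather than every method.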

String Types

Strings are another good argument for preferring C++ over C when using SWIG. SWIG provides built-in support for the C++ std::string and std::wstring types, so you don’t need to worry about problems of ownership and memory management that you would with C char* or wchar_t* strings. Support for mapping std::string is enabled by including std_string.i in your SWIG interface definition. For std::wstring support include std_wstring.i.

Unicode is supported by the Java and Python bindings, but not by the C# bindings (even with std::wstring). The lack of out-of-the-box Unicode support for C# is disappointing, but it can be fixed with custom string marshalling, as shown in my example on GitHub linked at the end of this article.

As with the primitive types, it would be a good idea to typedef your string type, so that you can change its definition for some target languages if necessary in the future.
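As a rough sketch, a header could pull in SWIG’s string support and define the typedef in one place. The mylib::String name and the Greeter class are hypothetical; the %include line could equally live in the interface file.

```cpp
#include <string>

#ifdef SWIG
// Seen only by SWIG: enable the built-in std::string mapping.
%include "std_string.i"
#endif

namespace mylib {

// Hypothetical typedef; its definition can change later without touching the API.
typedef std::string String;

class Greeter {
public:
    // Strings cross the language boundary by value or by const reference.
    String greet(const String& name) const { return "Hello, " + name; }
};

} // namespace mylib
```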

Enumeration Types

Enumeration types are supported by SWIG, but you will want to consider whether you’re satisfied with the way they are being represented in your target languages.

For C#, enums in C++ are mapped to enums in C#, so nothing to worry about here.

By default, C++ enums are wrapped with Java classes with static members named after the enum labels. You can tell SWIG to generate real Java enums by adding the following to your SWIG interface file:
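SWIG’s Java library ships an enums.swg file for this purpose. The sketch below guards the directive with SWIGJAVA so that it can sit next to a (hypothetical) enum in a header; in an interface file the %include line would appear on its own.

```cpp
#ifdef SWIGJAVA
// Seen only when SWIG generates Java: wrap C++ enums as proper Java enums.
%include "enums.swg"
#endif

namespace mylib {

// Hypothetical enumeration exposed by the API.
enum Colour { Red, Green, Blue };

// Cycles through the labels, wrapping from Blue back to Red.
inline Colour nextColour(Colour c) {
    return static_cast<Colour>((static_cast<int>(c) + 1) % 3);
}

} // namespace mylib
```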

In Python, the enumeration labels will be defined at the module level (not namespaced inside a class for example). Admittedly that scoping is consistent with the original C++ (unless it was an enum class), but I still find it a little jarring, as we lose any notion of the enum being a separate type in its own right. You can use C++11 enum class to achieve namespacing after a fashion, but it is wrapped as EnumTypeName_LabelName, which I don’t think is particularly idiomatic.

Pointers and References

C++ supports passing and returning objects by value, by reference and by pointer. References and pointers can also be const or non-const.

Non-const references don’t work in the current version of SWIG (the generated C# wrappers don’t compile), and if you use pointers, SWIG will generate awkwardly named pointer classes like SWIGTYPE_p_std__string for a pointer to a string, which you won’t be able to do anything with in the target-language, except pass back into the C++ code. Pass and return by value and pass and return by const reference are both safe to use.

The following class declaration summarizes the situation (options that compile but produce poor quality bindings are marked as “Un-idiomatic”):
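A sketch of such a declaration is given here. The Person class and its method names are hypothetical, and the comments restate the observations made above rather than the results of a fresh test. The problematic overloads are hidden from SWIG so that C++ callers can still use them.

```cpp
#include <string>

class Person {
public:
    explicit Person(const std::string& name) : name_(name) {}

    // OK: pass and return by value.
    std::string getNameByValue() const { return name_; }
    void setNameByValue(std::string name) { name_ = name; }

    // OK: pass and return by const reference.
    const std::string& getNameByConstRef() const { return name_; }
    void setNameByConstRef(const std::string& name) { name_ = name; }

#ifndef SWIG
    // Hidden from SWIG: non-const reference (reported above as failing for C#).
    std::string& getNameByRef() { return name_; }

    // Hidden from SWIG: un-idiomatic, wraps as an opaque SWIGTYPE_p_std__string.
    std::string* getNameByPointer() { return &name_; }
    void setNameByPointer(const std::string* name) { if (name) name_ = *name; }
#endif

private:
    std::string name_;
};
```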

Bottom line: pass and return objects by value or by const reference. If you want non-const references or pointers for the benefit of C++ callers, see the Preprocessor section below to see how to hide those methods from SWIG.

Member Variables

Private member variables are ignored by SWIG as you would expect. Protected and public member variables may be exposed in your API. I would lean towards avoiding them if possible, and preferring use of methods, for maximum control and to allow for languages where wrapping of members may be suboptimal, but SWIG does an adequate job for C#, Java and Python.

Both C# and Python wrappers will have member properties of the same name that you can get and set with automatic translation across the language boundary. Java wrappers will have getMemberName() and setMemberName() methods generated for retrieving and setting the members, and whilst that will make the interface look different from the original C++ code, it is idiomatic for Java.

Callbacks

If you want your C++ code to be able to call back into a method implemented in the target language, don’t use function pointers, use virtual methods.

With virtual methods, a caller in the target language can subclass your wrapped C++ class and implement the virtual method in the same way as they would for any other class in the target language. SWIG takes care of the translation using its directors feature. This is not enabled by default. You can enable it either when invoking SWIG with the command-line arguments -directors -features directors, or in the SWIG interface file as follows (replacing mylibraryname with your module name):
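As a rough sketch, the module line is shown in a comment and the director feature is switched on for a single class. The Formatter, Logger and BracketFormatter names are hypothetical, and BracketFormatter is only a C++ stand-in for what a subclass written in the target language would do.

```cpp
#include <string>

#ifdef SWIG
// Seen only by SWIG. The module line normally lives in the interface file:
//   %module(directors="1") mylibraryname
// Enable directors for the class that callers are meant to subclass.
%feature("director") Formatter;
#endif

// Callback interface: subclassed in the target language.
class Formatter {
public:
    virtual ~Formatter() {}
    virtual std::string format(const std::string& text) const = 0;
};

// C++ code that calls back through the virtual method.
class Logger {
public:
    std::string log(const Formatter& formatter, const std::string& text) const {
        return formatter.format(text);
    }
};

#ifndef SWIG
// C++ stand-in for a target-language subclass, hidden from SWIG.
class BracketFormatter : public Formatter {
public:
    std::string format(const std::string& text) const { return "[" + text + "]"; }
};
#endif
```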

Some caveats to be aware of:

  • An abstract (i.e. pure-virtual) method in the C++ class will not be abstract in the target language, but it can be overridden in the target language. If it is not overridden, an exception will be thrown.
  • It is inadvisable to call back into the target language from a different thread from the one on which the target language originally entered the C++ code. Consider that the language runtime may be storing thread-local data, or it may simply not expect to be running in threads not created by the language runtime. Whilst .NET does seem to cope with this, Java can be problematic and Python won’t work at all.
  • See next section for discussion of exceptions thrown by callback.

Exceptions

C++ to Target Language

If your C++ methods may throw exceptions to report errors, you will have to decide on how they should be translated into the target languages. SWIG provides a fair amount of flexibility for exceptions, but the better quality the translation, the more work you’ll have to do, and the more it will be language specific (a downside if you’re trying to support more than one target language).

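One approach you can take is to use C++ exception throw specifications on methods that may throw, indicating what exceptions they may throw, and to define SWIG typemaps to translate to exceptions in the target language. Bear in mind that dynamic exception specifications were deprecated in C++11 and removed in C++17, so with a modern compiler they need to be shown to SWIG only, or replaced with SWIG’s %catches directive. If you add %include <exception.i> to your SWIG interface file, you can make use of some standard SWIG exception mappings: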

  • SWIG_MemoryError
  • SWIG_IOError
  • SWIG_RuntimeError
  • SWIG_IndexError
  • SWIG_TypeError
  • SWIG_DivisionByZero
  • SWIG_OverflowError
  • SWIG_SyntaxError
  • SWIG_ValueError
  • SWIG_SystemError
  • SWIG_UnknownError

You may want to make special considerations for Java though, given that it requires throws specifiers in order for exceptions to be propagated.

Here is an example of a C++ exception class (hidden from SWIG), along with typemaps (one for Java specifically and one for other languages using one of the standard SWIG exception mappings). Note the $1.what() calls in the typemaps that pass the message from the C++ exception to the translated exception.
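A sketch along those lines follows. The choice of std::runtime_error as the base class and of java.io.IOException as the Java exception are assumptions; the Java typemap follows the pattern used in the SWIG Java documentation.

```cpp
#include <stdexcept>
#include <string>

#ifndef SWIG
// Seen only by the C++ compiler: the exception class itself is not wrapped.
class MyIOException : public std::runtime_error {
public:
    explicit MyIOException(const std::string& message)
        : std::runtime_error(message) {}
};
#else
// Seen only by SWIG: how to translate a thrown MyIOException.
%include <exception.i>
#ifdef SWIGJAVA
// Java: throw java.io.IOException and add it to the generated throws clause.
%typemap(throws, throws="java.io.IOException") MyIOException {
    jclass excep = jenv->FindClass("java/io/IOException");
    if (excep) {
        jenv->ThrowNew(excep, $1.what());
    }
    return $null;
}
#else
// Other targets: use the standard SWIG IO error mapping.
%typemap(throws) MyIOException {
    SWIG_exception(SWIG_IOError, $1.what());
}
#endif
#endif
```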

Then if a method declares that it throws the MyIOException from this example, SWIG will translate the exception for the target language.
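For instance, a method could declare the exception as in the sketch below. The FileReader class and the MYLIB_THROWS macro are hypothetical. The macro is an assumption of mine, used to show the throw specification to SWIG only, because dynamic exception specifications no longer compile as C++17. A minimal copy of the exception class is repeated so that the snippet stands alone.

```cpp
#include <stdexcept>
#include <string>

#ifndef SWIG
// Minimal stand-in for the exception class, hidden from SWIG.
class MyIOException : public std::runtime_error {
public:
    explicit MyIOException(const std::string& message)
        : std::runtime_error(message) {}
};
#endif

// Hypothetical macro: SWIG sees throw(type), the C++ compiler sees nothing.
#ifdef SWIG
#define MYLIB_THROWS(type) throw(type)
#else
#define MYLIB_THROWS(type)
#endif

class FileReader {
public:
    // Throws MyIOException if the path is empty.
    std::string read(const std::string& path) const MYLIB_THROWS(MyIOException) {
        if (path.empty()) {
            throw MyIOException("empty path");
        }
        return "contents of " + path;
    }
};
```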

In Java, this method will be generated with a throws clause in its signature, naming the exception class given in the Java typemap.

In other languages, appropriate built-in exceptions for IO errors will be used if the C++ method throws. In Python, IOError will be raised, and in C#, System.IO.IOException will be raised.

In this example, I have shown defining the typemaps inline in the C++ header, but you could of course define them separately in the SWIG interface file if you prefer. If you have a lot of exceptions to be mapped, you might wish to condense the typemap definitions with some macro magic.

Note also that you may prefer not to use built-in exceptions in the target language, but instead to either hand-craft some exception classes in each target language, or to arrange for C++ exception classes to be wrapped. Both options are achievable.

Target Language To C++

That only deals with exception translation in one direction though.

If you’re using virtual methods to call back into the target language, you also need to consider the translation of exceptions in the other direction.

However, SWIG’s support in this area is poor.

The Python bindings support throwing an exception if a virtual method call failed:
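The fragment below is a sketch following the pattern given in the SWIG Python documentation, guarded so that only the Python target sees it. The first directive turns a Python error raised inside an overridden virtual method into a C++ exception; the second lets that exception propagate back out as a Python error when the outermost caller is itself Python.

```cpp
#ifdef SWIGPYTHON
// If the Python implementation of a virtual method raised an error,
// throw a C++ exception from the director call.
%feature("director:except") {
    if ($error != NULL) {
        throw Swig::DirectorMethodException();
    }
}

// When a wrapped C++ method is called from Python and a director
// exception escapes from it, report failure back to Python.
%exception {
    try { $action }
    catch (Swig::DirectorException &e) { SWIG_fail; }
}
#endif
```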

This doesn’t seem to be supported for Java or C#. It’s not a great solution for Python either, as there’s no meaningful translation of exception data (you just get to know there was an exception).

Don’t forget either that you may have to ensure the translated exception gets translated again back to the target language if the sequence is target language => C++ method => target language callback.

For practical purposes, you have three options:

  1. Document that virtual method implementations shouldn’t throw exceptions
  2. Dig into SWIG customization features, probably per-language, to try and do some better translation
  3. Do some hand-crafting of wrappers in the target language(s) that arranges for exceptions to be caught in the target language and, for example, returned from some proxy method to C++ as some kind of data type with information about the exception

I have followed the third option in the past. It’s not ideal given a goal is to avoid intervention once the automation has been put in place, but it may be supportable if virtual method callbacks into the target language are only an occasional feature of your API.
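The C++ side of that third option can be sketched roughly as follows. All of the names here (CallbackResult, Task, Runner, FailingTask) are hypothetical. The hand-written target-language wrapper would override runProxy(), catch any exception from user code, and return its details as data. FailingTask is a C++ stand-in for such a wrapper.

```cpp
#include <stdexcept>
#include <string>

// Result type carrying either success or details of a caught exception.
struct CallbackResult {
    bool ok;
    std::string errorMessage;

    CallbackResult() : ok(true) {}
    explicit CallbackResult(const std::string& message)
        : ok(false), errorMessage(message) {}
};

// Callback interface: the proxy method never throws across the boundary.
class Task {
public:
    virtual ~Task() {}
    virtual CallbackResult runProxy() const = 0;
};

class Runner {
public:
    // Turns a failed result back into a C++ exception.
    void run(const Task& task) const {
        const CallbackResult result = task.runProxy();
        if (!result.ok) {
            throw std::runtime_error(result.errorMessage);
        }
    }
};

#ifndef SWIG
// C++ stand-in for a target-language implementation that failed.
class FailingTask : public Task {
public:
    CallbackResult runProxy() const { return CallbackResult("boom"); }
};
#endif
```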

Hopefully this is an area where SWIG will be improved in the future.

Collections and Templates

SWIG has built-in support for various C++ standard library collection templates. The supported templates vary according to language, but all three of Python, Java and C# support both std::vector and std::map.

The bindings in Python and C# are reasonably idiomatic, with support for native collection iterators (so you can use foreach in C# and for ... in in Python), and both languages also support index-style syntax with the wrappers for std::map.

The Java bindings are adequate, but lack the syntactic sugar of the C# and Python bindings.

To enable SWIG’s built-in support for these templates, include std_vector.i and std_map.i in your SWIG interface file.

SWIG can only wrap instantiations of templates, so you must declare each instantiation that you want a binding generated for using the %template directive, specifying the name of the instantiated class. If your API uses a lot of collection types, I like to wrap up the template instantiations in a macro that also declares a typedef giving the type the same name in C++ as the wrapped type has in the target language, as in the following example:
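A sketch of such a pair of macros is shown here. The MYLIB_VECTOR and MYLIB_MAP names and the two instantiations are hypothetical. SWIG sees a %define macro that emits both the typedef and the %template directive, whilst the C++ compiler sees a plain #define that emits only the typedef.

```cpp
#include <map>
#include <string>
#include <vector>

#ifdef SWIG
%include "std_string.i"
%include "std_vector.i"
%include "std_map.i"

// SWIG side: typedef plus %template under the same name.
%define MYLIB_VECTOR(Name, T)
typedef std::vector<T> Name;
%template(Name) std::vector<T>;
%enddef

%define MYLIB_MAP(Name, K, V)
typedef std::map<K, V> Name;
%template(Name) std::map<K, V>;
%enddef
#else
// Compiler side: only the typedef is needed.
#define MYLIB_VECTOR(Name, T) typedef std::vector<T> Name;
#define MYLIB_MAP(Name, K, V) typedef std::map<K, V> Name;
#endif

// One line per collection type used in the API.
MYLIB_VECTOR(IntVector, int)
MYLIB_MAP(StringIntMap, std::string, int)
```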

In addition to SWIG’s built-in support for certain templates, you may also expose your own template definitions to SWIG and similarly instantiate them using the %template keyword for all types that you need in the target languages (or that are used by other classes in your API).

Preprocessor

As you will have seen from some of the code snippets above, you can use the preprocessor to control what definitions are seen by SWIG and what by the C++ compiler.

The SWIG preprocessor symbol is defined whenever the SWIG tool is being run, so you can use it to divide your definitions between those seen by SWIG and those that are not. You may want to expose different definitions when SWIG does its pass (like in the case of the exceptions example above, where the exception class was seen by C++, but the typemaps for exception translation were seen by SWIG). You may also just want to hide code that won’t wrap well from SWIG (code that presumably is still useful to C++ callers).

You can also control what definitions, both in your C++ header files and in the SWIG interface file, are seen by each SWIG target language using SWIG’s language-specific preprocessor symbols. See the SWIG Preprocessor documentation for the full list, but here are some examples:

  • SWIGJAVA – Java target
  • SWIGPYTHON – Python target
  • SWIGCSHARP – C# target

Finally, if your C++ definitions are split across multiple headers, probably included by the main header file for the library, then you’ll need to pass the -includeall command-line parameter to SWIG, and also hide any system/standard library headers inside an #ifndef SWIG block so that SWIG won’t try looking for them.
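Putting these pieces together, a header prepared for SWIG might be laid out roughly like this. The Document class is hypothetical; the point is only which parts each tool gets to see.

```cpp
#ifndef SWIG
// Seen only by the C++ compiler: standard headers are hidden so that
// SWIG, run with -includeall, does not go looking for them.
#include <string>
#endif

#ifdef SWIG
// Seen only by SWIG: library support for the types used below.
%include "std_string.i"
#endif

class Document {
public:
    Document() : title_("untitled") {}

    // Seen by both: wraps cleanly.
    std::string title() const { return title_; }

#ifndef SWIG
    // Seen only by the C++ compiler: useful to C++ callers,
    // but a raw pointer would not wrap well.
    std::string* mutableTitle() { return &title_; }
#endif

private:
    std::string title_;
};
```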

SWIG XML

If you find you can’t get the results you want with SWIG (or indeed if there is a target language which SWIG doesn’t support which you want to use), there is a route of last resort.

SWIG supports an XML target language, which results in an XML file being generated containing the information that SWIG has parsed from the C++ headers.

You could use this as the input for your own code generator, enabling you to get a somewhat more easily parsed representation of your library’s interface, whilst having full control over the language bindings.

It can be done (I did it once), but you’re probably better off sticking with SWIG’s own extensibility mechanisms, unless it is indeed a non-supported language that you are targeting.

Example Code

As I’ve mentioned above, SWIG does have various mechanisms available for you to customize the language bindings it generates, so it is worth looking into them and deciding what makes sense for your use case if you want to get the best quality results.

To give an impression of what’s possible though, I’ve put together an example that covers the various topics discussed above on the Softwariness GitHub site. I’ve only created build files for Visual Studio/Windows, but there is nothing inherently Windows-specific there, so you can easily create a makefile for other platforms if you want. The example uses unit tests in Python, C# and Java to show that the various features are working as expected in the target languages.
