Existing Approaches

The general task of all frameworks and tools reviewed here is to provide access to a C struct, or a sequence of such structs, given in a byte stream. The main property of this structured data type is that all member variables are physically grouped in one contiguous block of memory. Even arrays and nested structs reside in the same block; they are not references to other memory areas. The order of data inside a struct is defined by the order of declaration: if one member variable precedes another in the declaration of the struct, it does so in memory as well. The position of a member relative to the start of the data block depends on the order of members in the structure and on their alignment and padding.

In C, there is a standard alignment relative to physical memory addresses for basic types and structs, defined by the system's application binary interface (see e.g. the Unix System V Application Binary Interface), which is mainly motivated by performance concerns with respect to the system's hardware. The developer can explicitly declare a different alignment for every type or variable he or she introduces. Thus, each member of a struct can have a specific alignment relative to physical memory addresses, and this is the usual case when data is written to byte streams, in order to remove unnecessary padding.

To provide access to the data in a type-safe manner, the frameworks, tools and concepts reviewed here have to have the following building blocks:
The challenge here is the lack of a Java type that has the same properties as a structured type in C, mainly because nested types and arrays are always references in Java. Therefore, the most practical approach is to exchange the whole structure by value only. However, the major disadvantage of this approach is the processing time wasted when only a small subset of the data is actually touched by the application. The review focuses on the following properties of the applied approaches:
These figures are rough estimates only. An actual evaluation is not part of this review.

LWJGL v3: Struct

LWJGL uses a class called Classes for mapped objects are derived from class There is a helper class called Offsets of member variables are stored in static final members of the struct class, which are treated as constants by the JIT compiler and as such are inserted into the code (text cache, not data cache). An instance of the struct class is then attached to a memory location, which might be a byte buffer, memory on the stack or memory in a native library. But as long as the mapped object stays in memory, the buffer is not garbage collected. To access a sequence of structs there is a similar class

LWJGL v2: MappedObject (legacy)

In LWJGL v2 there is a class MappedObject, which allows struct types to be defined in Java and mapped to byte buffers. It very much looks like this is derived work from the author of LibStruct (see below), but it was discarded in LWJGL 3 to focus more on LWJGL's core functionality, which is binding. Mapped objects are declared as Java classes derived from The declared mapped object can even have public member variables and integrates seamlessly with the remaining code. This comes at the cost of more responsibility on the user's side. The user needs to understand that this object (and especially references to members such as arrays) can only exist as long as the backing memory is available; otherwise he or she will run into errors which cannot be located and fixed without that knowledge. The mechanism relies on runtime code generation/instrumentation, which is triggered through a special class loader Offsets are inserted as constants in the generated code, which puts them in the text cache, not the data cache.

LibStruct

The LibStruct project follows a very similar approach to Classes of mapped objects (here structs) are declared by adding the annotation
Methods can have a body which operates on member variables of the struct before returning a value. Declared struct types can have public member variables (even references). Again, struct types integrate seamlessly with the surrounding code, which leads to the same risks on the user's side as described for LibStruct supports runtime code generation and instrumentation through a Java agent, but also comes with a tool The concept requires generation of code to handle offsets in data access for each struct class, and instrumentation of all classes using the mapped struct objects to allow direct access to members. When using runtime code generation, it also requires a class loader which triggers instrumentation of classes using structs.

Java Binary Block Parser

The Java Binary Block Parser (JBBP) aims to provide a framework which supports access to structs with variables of any integer type (even unsigned) but, for example, no floating point types. A strength of the framework is its support for mapping single bits to variables. JBBP provides a domain-specific language to declare those struct types or even new simple types such as three-byte integers ( Since version 1.3 (Sep. 2017) JBBP supports offline code generation, which improves the performance of data access at runtime. The declaration of structs did not change.

Javolution: Struct

Javolution is a framework aiming to provide tools for the development of real-time and embedded systems. Support for access to unstructured byte streams is provided through base classes Structs are declared as classes derived from It introduces classes for all standard primitive types in C, and similar classes can be declared for nested structs or arrays with appropriate getters and setters for their value(s). This strategy establishes a concept which is consistent in its structure and semantics. When accessing a reference to a nested struct, the user gets a view object on this struct which cannot be mistaken for a copy, for example.
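The view-object idea described for Javolution can be sketched in plain Java. The following is a minimal illustration only and does not use Javolution's API: the names `Int32Member`, `PointView` and `LineView` are hypothetical, and a packed little-endian layout of two nested `struct { int32_t x; int32_t y; }` values is assumed. Every member is represented by its own accessor object that reads and writes the shared buffer directly, so a nested struct is always a view and never a copy.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ViewStructSketch {

    /** Hypothetical accessor object for one 32-bit integer member. */
    static final class Int32Member {
        private final ByteBuffer buffer;
        private final int offset; // absolute byte offset inside the buffer

        Int32Member(ByteBuffer buffer, int offset) {
            this.buffer = buffer;
            this.offset = offset;
        }

        int get() { return buffer.getInt(offset); }

        void set(int value) { buffer.putInt(offset, value); }
    }

    /** View on an assumed packed C struct { int32_t x; int32_t y; }. */
    static final class PointView {
        static final int SIZE = 8; // two 4-byte members, no padding

        final Int32Member x;
        final Int32Member y;

        PointView(ByteBuffer buffer, int base) {
            x = new Int32Member(buffer, base);
            y = new Int32Member(buffer, base + 4);
        }
    }

    /** View on struct { Point from; Point to; }: nested structs are views too. */
    static final class LineView {
        final PointView from;
        final PointView to;

        LineView(ByteBuffer buffer, int base) {
            from = new PointView(buffer, base);
            to = new PointView(buffer, base + PointView.SIZE);
        }
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN);
        LineView line = new LineView(buf, 0);

        // Writing through the nested view changes the backing buffer itself.
        line.to.y.set(42);
        System.out.println(buf.getInt(12));   // prints 42
        System.out.println(line.to.y.get());  // prints 42
    }
}
```

Note that mapping one 16-byte struct already allocates seven helper objects here (one `LineView`, two `PointView` and four `Int32Member` instances), which is the kind of per-attribute overhead this style of mapping incurs.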
The lack of a code generator results in considerable development effort for struct types, almost the same effort as writing the functionality without the helper classes. Having an object instance for every single attribute of the struct causes a lot of memory management overhead, an increased footprint and potentially more cache misses, because the data is widely scattered across heap memory.

Preon

Preon calls the problem by its name and aims to provide a codec infrastructure for struct types with its library. Mapping byte streams to Java types is in fact bound to the tasks of encoding and decoding data. Preon does not provide code generation or instrumentation, but a strong concept and classes to implement such tasks. Struct types are declared as simple Java classes, and each member variable to be considered in the mapping has to be explicitly marked with an annotation At runtime, the application requests a codec from the framework, providing its annotated Java type as reference. The codec creates a new instance of the given Java type and decodes the data into it at once. Where possible it decodes lazily, which is achieved through proxies (for example for nested structs) that are inserted into the object representing the struct in Java. But it usually copies a lot of data at once. The processing effort to instantiate a codec for a given type is presumably rather high, because each time a codec is requested, the given struct class is searched for annotations and appropriate codecs for its attributes are instantiated. Having object instances for each attribute also results in the same kind of distribution of data in heap memory as in Javolution, and therefore the same impact on cache misses etc. But the development effort is lower than with Javolution.

Simplified Wrapper and Interface Generator (SWIG)

A different approach to accessing native data is to use JNI.
Because JNI supports only the generation of C header files from Java classes, which lacks the ability to define alignment etc., the most flexible approach is to use a framework such as SWIG, GlueGen or JNA (see below). In SWIG, struct types can be declared in C with support for all the features of C structs. A code generator then generates the corresponding counterpart in Java and C code to access the data in JNI code (see documentation). Thus, the application needs to load this native library at runtime and pass the pointer to the byte buffer through JNI to get an instance of the struct object. This object then has Native calls come at a certain cost, which is not just related to byte order manipulations. This cost is presumably very close to that of pure Java approaches if properly optimised (TODO: analyse); the main disadvantage for many users will be elsewhere:
JogAmp: GlueGen

GlueGen is part of the JogAmp project. It is a code generator which generates Java and JNI/C code to call C libraries. GlueGen requires C header files with declared C functions and data types, and corresponding configuration files, as input to generate (they call it 'emit') appropriate Java native methods/classes and JNI interface code. The configuration file provides vital information, such as the C functions and data types to be considered, the target package and class names for generated code, and fine-grained control over more detailed options such as ignoring fields of structs or the visibility of generated native methods (public/private etc.). Conversion of data between Java and C occurs in the generated JNI code.

Java Native Access (JNA)

JNA is another framework which provides interfacing to native functions or data, but unlike with SWIG and GlueGen, the API user writes all of his or her code in Java. In terms of struct types, the framework provides classes to declare and access native data in, for example, byte buffers. Declared struct types are classes derived from

Project Panama

Project Panama, officially announced in June 2014 by John Rose (Oracle), aims to improve interconnections between the JVM and native libraries in general. The current prototype of this project can be found under the name Java Native Runtime. With respect to struct types, it follows, at the time of this review, a very similar approach to Javolution: every single attribute of a struct is boxed, which has the disadvantages already mentioned with regard to Javolution. In his blog, the project lead states that he expects "that value types will narrow the gap eventually for other C types, but they are not here yet" (refer to the next section for value types). So there might be a chance that the approach to accessing struct types will change.

Java Value Types

Last but not least, there are plans to support struct-like types in a future version of the Java language.
First concepts refer to such new data types as value types, which have almost the same properties as structs in C. There is also an approach to flatten arrays, such that a multi-dimensional array does not have to be physically an array of references to other arrays. If those two approaches make their way into the Java language specification, there will be support for representing data in a more compact form, but that does not involve interfacing or mapping to byte streams. Also, as of now, there are no signs of support for controlling the alignment of data or for unions or mapped bits. So whether it will be useful for such purposes is unknown for now.
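To make the notion of a flattened array concrete, the following sketch emulates it by hand in current Java. It is not the proposed language feature, only an illustration of the memory layout such a feature would aim for, and the class name `FlatIntMatrix` is hypothetical. A two-dimensional matrix is stored in one contiguous `int[]` in row-major order instead of an array of references to row arrays.

```java
public class FlatArraySketch {

    /** Hypothetical 2D int matrix backed by a single contiguous array. */
    static final class FlatIntMatrix {
        private final int rows;
        private final int cols;
        private final int[] data; // row-major: element (r, c) lives at r * cols + c

        FlatIntMatrix(int rows, int cols) {
            this.rows = rows;
            this.cols = cols;
            this.data = new int[rows * cols];
        }

        /** Translates a (row, col) pair into the index in the flat array. */
        int flatIndex(int row, int col) {
            if (row < 0 || row >= rows || col < 0 || col >= cols) {
                throw new IndexOutOfBoundsException("(" + row + ", " + col + ")");
            }
            return row * cols + col;
        }

        int get(int row, int col) { return data[flatIndex(row, col)]; }

        void set(int row, int col, int value) { data[flatIndex(row, col)] = value; }
    }

    public static void main(String[] args) {
        FlatIntMatrix m = new FlatIntMatrix(2, 3);
        m.set(1, 2, 7);
        System.out.println(m.flatIndex(1, 2)); // prints 5
        System.out.println(m.get(1, 2));       // prints 7
    }
}
```

With `int[][]`, each row is a separate heap object reached through a reference; the flat variant needs a single allocation and keeps all elements adjacent, at the price of doing the index arithmetic manually.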
Holger Machens, 02-Jan-2021