Specification

Background

Properties of a Structured Type in C

When we talk about the native representation of data in memory, we refer to the representation used by C libraries, which complies with the platform's Application Binary Interface (ABI), the standard low-level interface between an application and the system. This section explains certain properties of data representation with respect to structured types.

A struct defines an ordered group of attributes. The order of attributes of a struct also defines their physical order in memory. Structs do not support inheritance. A struct has an alignment which is either defined by its first attribute or explicitly for the struct itself (they must be equal, if both have been declared).
A union defines an unordered set of alternative attributes for the same data. Alternative attributes of the same union refer to the same location in memory.
Each attribute in the declaration of a struct or union has:
- A type
- A name, which can be empty (i.e. anonymous structs or unions).
- An alignment in memory.
- A logical position in the order of attributes (which is the same for all alternative attributes of unions).
- An offset, which can be calculated from alignment and logical position.
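
As a rough illustration of how an offset follows from alignment and logical position, the sketch below lays out attributes in declaration order. It assumes that the size of each attribute is known and that each offset is rounded up to the attribute's alignment, as common C ABIs do. The names StructLayoutSketch, alignUp and structOffsets are hypothetical and not part of this specification.

```java
import java.util.Arrays;

final class StructLayoutSketch {
    /** Rounds offset up to the next multiple of alignment (which must be a power of two). */
    static int alignUp(int offset, int alignment) {
        return (offset + alignment - 1) & -alignment;
    }

    /**
     * Computes the offset of each attribute of a struct.
     * sizes[i] and alignments[i] describe the attribute at logical position i.
     */
    static int[] structOffsets(int[] sizes, int[] alignments) {
        int[] offsets = new int[sizes.length];
        int next = 0; // first free byte after the previous attribute
        for (int i = 0; i < sizes.length; i++) {
            offsets[i] = alignUp(next, alignments[i]);
            next = offsets[i] + sizes[i];
        }
        return offsets;
    }

    public static void main(String[] args) {
        // struct { char c; int i; short s; double d; } with natural alignment (assumed)
        int[] offsets = structOffsets(new int[] {1, 4, 2, 8}, new int[] {1, 4, 2, 8});
        System.out.println(Arrays.toString(offsets)); // prints [0, 4, 8, 16]
    }
}
```

For a union, all alternative attributes share the same logical position, so under the same assumptions each of them would have offset 0.
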
The type of an attribute has:
- A kind, which is one of:
  - struct
  - union
  - array
  - scalar
  - pointer
- A physical size (in bytes)
- A format, only if it is of scalar kind. The format defines the representation of a scalar value in memory. A few examples:
  - integer: take as is.
  - signed integer: two's complement.
  - floating point types, e.g. IEEE 754
  - fixed point number, e.g. 24 bits for the digits before the point and 8 bits for the fraction.
  - bitset, which defines a subset of the bits of an integer to be treated as an integer with those bits shifted to the least significant bit.
  - etc.
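
To make some of these formats concrete, here is a minimal sketch of how a raw 32-bit value might be interpreted as a fixed point number or as a bitset. It assumes an unsigned 24.8 fixed point layout, and the class and method names are hypothetical.

```java
final class ScalarFormatSketch {
    /** Interprets raw as unsigned fixed point: 24 integer bits, 8 fraction bits. */
    static double fixed24_8ToDouble(int raw) {
        return (raw & 0xFFFFFFFFL) / 256.0;
    }

    /** Extracts width bits starting at bit shift and moves them to the least significant bit. */
    static int extractBits(int raw, int shift, int width) {
        int mask = (width == 32) ? -1 : (1 << width) - 1;
        return (raw >>> shift) & mask;
    }

    public static void main(String[] args) {
        System.out.println(fixed24_8ToDouble(0x0000_0380)); // prints 3.5
        System.out.println(extractBits(0b1011_0000, 4, 4)); // prints 11
        System.out.println((byte) 0xFF);                    // prints -1 (two's complement)
    }
}
```
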
An array is an ordered sequence of elements (components). The order of elements defines their order in physical memory. An array can have one or more dimensions where each dimension has a fixed size (boundaries). Thus, an array has an element type and one or more dimensions, each with a fixed element count.

Concept

Overview

The system provides serialisation and deserialisation between Java objects and a native data representation that complies with some imaginary structured C type. The structured C type (i.e. the data layout) of an object is declared in terms of a Java class. A code generator creates a corresponding view class for the given type declaration. The view performs data exchange (and translation) between data in native representation (given in a ByteBuffer) and a Java object (an instance of the class that served as the type declaration).
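
The following hand-written sketch illustrates the idea of a type declaration and its view. It is not output of the generator; the names Point2D and Point2DView and the method names get and set are assumptions made for this example, and the layout assumes two 4-byte integers without padding.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Type declaration: stands for the C type struct { int x; int y; }. */
class Point2D {
    public int x;
    public int y;
}

/** Hand-written stand-in for a generated view on Point2D. */
final class Point2DView {
    private ByteBuffer buffer;

    void open(ByteBuffer buffer) { this.buffer = buffer; }
    void close() { this.buffer = null; }
    int sizeof() { return 2 * Integer.BYTES; }

    /** Copies the native representation into an existing object. */
    Point2D get(Point2D target) {
        target.x = buffer.getInt(0);
        target.y = buffer.getInt(4);
        return target;
    }

    /** Writes the fields of the object into the buffer. */
    void set(Point2D source) {
        buffer.putInt(0, source.x);
        buffer.putInt(4, source.y);
    }

    public static void main(String[] args) {
        Point2DView view = new Point2DView();
        view.open(ByteBuffer.allocate(view.sizeof()).order(ByteOrder.LITTLE_ENDIAN));
        Point2D p = new Point2D();
        p.x = 3;
        p.y = -7;
        view.set(p);
        Point2D q = view.get(new Point2D());
        System.out.println(q.x + ", " + q.y); // prints 3, -7
        view.close();
    }
}
```
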

Mapping to Native Types

Mapping establishes a link between an instance of a Java type and an instance of a native type in a given block of memory. The native type is one of struct, union, array or scalar. A mapping requires a translation between the Java type and the native type.

Java type is one of the genuine Java types. We have to consider:
- Scalar Types:
  - byte
  - short
  - int
  - long
  - float
  - double
  - char
  - boolean
- Regular Classes: All classes.
- Arrays: Arrays of any element type.
- Immutable Types: Objects of classes that prohibit modification of their state such as all boxed primitive types or java.lang.String.
- Enum Classes: Enum classes do not support explicit object instantiation.
Block of memory has:
- an address (virtual memory address)
- a particular byte ordering.
Translation requires knowledge of
- the Java type
- the native type
- the byte ordering of the memory block
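
The effect of byte ordering on translation can be seen with a plain java.nio.ByteBuffer. The sketch below reads the same four bytes with both byte orders; the class name ByteOrderSketch and the helper readInt are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteOrderSketch {
    /** Reads the first four bytes of memory as an int using the given byte order. */
    static int readInt(byte[] memory, ByteOrder order) {
        return ByteBuffer.wrap(memory).order(order).getInt(0);
    }

    public static void main(String[] args) {
        byte[] memory = {0x01, 0x00, 0x00, 0x00};
        System.out.println(readInt(memory, ByteOrder.LITTLE_ENDIAN)); // prints 1
        System.out.println(readInt(memory, ByteOrder.BIG_ENDIAN));    // prints 16777216
    }
}
```
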

Views

Views can be generated for classes (structs) only. Views, whether generated or user implemented, have the following properties:

Associated Type: A view class is statically linked to one particular type specification, which is a scalar type, a complex type or an array type.
Associated Element: A field of a complex type or all elements of an array can be associated with a particular view.
Data Size: A view provides a method sizeof() to retrieve the overall size (number of bytes) of the associated type in a buffer. This number of bytes can differ from the size of the associated Java type.
Buffer Assignment: A view (and all its internal views) is assigned to a buffer using one of its open() methods. All data and view access methods will refer to the assigned buffer from there on. The view is meant to be detached from the buffer using its close() method, after its work on the buffer is done. A buffer will not be garbage collected until either the close method has been called or the view was garbage collected!
Data Access: A view provides read/write access to a value of the given type or elements of that type, such as elements of an array or fields of a complex type.
Complex Field View Access: A view provides access to views on fields of a struct or array type in the associated type.
Alignment: A view can consider a given alignment for the associated element or associated type. Alignment is always a power of two. Alignment of element overrides alignment of type. The alignment of a view instance cannot be changed throughout its lifetime.
Interpretation: A view allows interpretation (conversion) of a native type, such as required for strings or integer types of specific size (e.g. int24). Interpretation converts data between the associated Java type and a number of bytes in the buffer.
Cloning: Every view provides methods to clone it for reuse purposes. A clone shares all properties with its origin but is detached from it, meaning that internally used views of the clone are cloned as well.
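
As an illustration of buffer assignment, data size and interpretation, the sketch below shows what a user-implemented view for a signed 24-bit integer (int24) might look like. The class name Int24ViewSketch, the offset parameter of open() and the get/set method names are assumptions; only sizeof(), open() and close() are named in this specification.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical user-implemented view that interprets 3 native bytes as a Java int. */
final class Int24ViewSketch {
    private ByteBuffer buffer; // assigned buffer, null while detached
    private int offset;        // position of the value inside the buffer

    /** The native size (3 bytes) differs from the size of a Java int (4 bytes). */
    int sizeof() { return 3; }

    void open(ByteBuffer buffer, int offset) {
        this.buffer = buffer;
        this.offset = offset;
    }

    void close() { this.buffer = null; }

    /** Reads a signed 24-bit integer, honouring the byte order of the buffer. */
    int get() {
        int b0 = buffer.get(offset) & 0xFF;
        int b1 = buffer.get(offset + 1) & 0xFF;
        int b2 = buffer.get(offset + 2) & 0xFF;
        int raw = buffer.order() == ByteOrder.BIG_ENDIAN
                ? (b0 << 16) | (b1 << 8) | b2
                : (b2 << 16) | (b1 << 8) | b0;
        return (raw << 8) >> 8; // sign-extend from 24 to 32 bits
    }

    /** Writes the low 24 bits of value, honouring the byte order of the buffer. */
    void set(int value) {
        boolean big = buffer.order() == ByteOrder.BIG_ENDIAN;
        buffer.put(offset, (byte) (big ? value >> 16 : value));
        buffer.put(offset + 1, (byte) (value >> 8));
        buffer.put(offset + 2, (byte) (big ? value : value >> 16));
    }
}
```

A caller would assign the view with open(buffer, offset), use get() and set(), and detach it with close() once its work on the buffer is done.
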

The following sections clarify details of views in use for specific kinds of types.

Scalar Types

Elements of scalar types (Java primitives) only appear as fields of classes or elements of arrays. Properties of scalar types are:

Return-by-Value only: The most efficient way to retrieve the value of a scalar type from a view is to return it by value, i.e. via the stack.
No Generics: Java primitive types cannot be used in generic class declarations.
Views: A view on a primitive type is only required for a specific interpretation of the native type (e.g. a user-provided view). There will be standard view interfaces for all genuine primitive types to support their interpretation.

Complex Types

Complex types are classes which are interpreted as C structs. They have the following properties:

Accessible: Class is declared as public (and implicitly static).
Alignment: A class can have a standard alignment declared through annotation.
View: The view for classes is usually generated through the generator. A class can have a declared view, which interprets its content differently. A declared view for a class will suppress the generation of its view.
Fields: A class can have fields. All fields are interpreted as part of the C struct if the following requirements are met:
- Accessible: All non-public fields are discarded unless explicitly included. Included non-public fields will force the view to use the reflection API.
- Not of Enclosing Type: A class cannot have a field of its own type or of the type of one of its enclosing classes, because it would result in a native type of infinite size (when serialised).
- Not static: Static members are members of the class but not of objects of this class.
- Not transient: In the context of serialization, the keyword transient declares fields to be non-permanent (e.g. a cache variable may be declared transient).
- Not Excluded: A field will be discarded if excluded by declaration (using annotation).
- Explicitly Included: Fields excluded by any of the above can be explicitly included by declaration (using annotation).
The declaration of a field can have the following properties, given by annotation:
- Alignment: A field can have a particular alignment which overrides the alignment declared for its type.
- Immutable: A field can be explicitly declared immutable (see Immutable Types).
- View: Access to a field is performed based on the field's type or the declared view type. For fields of an array type, a view can be declared for the elements only.
- Lengths: Array fields require the declaration of the size of each of their dimensions. Fields with an explicitly declared view can have a length attribute as well. In the case of an array with a declared element view, the declared length attribute contains the lengths for the elements at the end (after the length of the last dimension of the array). Those are provided to the element view at runtime.
Member Classes: A class can have nested classes (member classes). Those will be considered as part of the C struct if:
- Regular Class: It complies with general rules for classes.
- Accessible: It is declared as public static and thus accessible from the outside.
- Type Use: There is at least one field of the enclosing class, which uses the type of the nested class.
Views of member classes will be member classes of the view of their enclosing class.
Constructors: Views will always try to use the default constructor. A missing default constructor forces the view to use the reflection API.
Inheritance: Inheritance is not supported by view generation, because fields of derived classes can potentially hide fields of base classes.
Return-by-Reference: Objects can only be referenced. Thus, retrieving the value of an object from a buffer requires either instantiating a new object or passing a reference to an existing object, which receives the value (e.g. T get(T object)).
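
The field rules above can be sketched with the Java reflection API. The annotations Include and Exclude are hypothetical placeholders, since this specification does not name the actual annotations, and the helper mappedFields only illustrates the selection logic, not the generator itself. The sketch also ignores the case where a field carries both annotations.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

final class FieldSelectionSketch {
    // Hypothetical annotations; the specification does not name them.
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD) @interface Include {}
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD) @interface Exclude {}

    /** Example type declaration. */
    public static class Sample {
        public int id;               // mapped
        public static int counter;   // discarded: static
        public transient int cache;  // discarded: transient
        int hidden;                  // discarded: not public
        @Include int included;       // mapped: explicitly included
        @Exclude public int skipped; // discarded: explicitly excluded
    }

    /** Returns the names of the fields that would become part of the C struct. */
    static List<String> mappedFields(Class<?> type) {
        List<String> names = new ArrayList<>();
        // Note: getDeclaredFields() does not guarantee declaration order.
        for (Field f : type.getDeclaredFields()) {
            if (f.isSynthetic()) {
                continue; // skip compiler-generated fields
            }
            int m = f.getModifiers();
            boolean eligible = Modifier.isPublic(m) && !Modifier.isStatic(m)
                    && !Modifier.isTransient(m) && !f.isAnnotationPresent(Exclude.class);
            if (eligible || f.isAnnotationPresent(Include.class)) {
                names.add(f.getName());
            }
        }
        return names;
    }
}
```

Because reflection does not guarantee the order of fields, a real generator would presumably determine the physical order from the source declaration rather than from getDeclaredFields().
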

Arrays

Array types have the following properties:

Dimensions: An array type has a declared set of dimensions (e.g. int[][] has two dimensions).
Lengths: An instance of an array type has a specific length for each of its dimensions (e.g. int[2][4] has lengths {2,4}).
Array Components: The components of an array are the items of its first dimension (e.g. int[][] has components of type int[]).
Array Elements: The elementary items of an array are its elements (e.g. int[][] has elements of type int). The element type cannot be an array type.
Component Distance: The distance between two components/elements of an array can be greater than the size of its elements, if the alignment is larger. The method stride() returns the distance from one component's start to the next component's start in number of bytes.
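
A minimal sketch of the component distance, assuming that the stride is the component size rounded up to the next multiple of the alignment. The class StrideSketch and the helper componentOffset are hypothetical; only the method name stride() comes from this specification.

```java
final class StrideSketch {
    /** Distance in bytes between the starts of two neighbouring components. */
    static int stride(int componentSize, int alignment) {
        return (componentSize + alignment - 1) & -alignment; // alignment is a power of two
    }

    /** Byte offset of the component at index, relative to the start of the array. */
    static int componentOffset(int index, int componentSize, int alignment) {
        return index * stride(componentSize, alignment);
    }

    public static void main(String[] args) {
        System.out.println(stride(3, 4));             // prints 4: a 3-byte element aligned to 4
        System.out.println(stride(4, 4));             // prints 4: no padding needed
        System.out.println(componentOffset(5, 3, 4)); // prints 20
    }
}
```
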

View on Fields of Array Types

A view for a class, which has fields of array types, will get special methods to access all components of each dimension of that array.

Access Entire Data: The view provides methods to access the entire value of the array (e.g. for int[][] a there will be int[][] getA() and setA(int[][])).
Direct Access to Component/Element: For each dimension there will be a getter and a setter method to read/write the particular component/element type (e.g. for field int[][] a there will be int[] getA(int index0) and int getA(int index0, int index1)).
Call-by-Reference: If the component type of the referenced dimension supports call-by-reference semantics, then there will be additional getter methods which take an object of that type. This is always the case for component types which are array types (i.e. all but the last dimension of a multidimensional array).
View Components: For each array there will be a method to retrieve a view instance for that array, which especially supports iteration over its components (see Array Views below).
View Elements: For each element, there will be a method to retrieve a view on that element unless it is of primitive type and has no associated view.
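
The accessor methods described above might look roughly like the following hand-written sketch for a field int[][] a. It assumes declared lengths {2, 3}, 4-byte elements without padding and a buffer that starts at the field; the class name MatrixViewSketch is hypothetical and the code is not generator output.

```java
import java.nio.ByteBuffer;

/** Hand-written stand-in for a generated view on a class with a field int[][] a. */
final class MatrixViewSketch {
    static final int LEN0 = 2;                      // declared length of dimension 1 (assumed)
    static final int LEN1 = 3;                      // declared length of dimension 2 (assumed)
    static final int ELEMENT_SIZE = Integer.BYTES;
    static final int ROW_SIZE = LEN1 * ELEMENT_SIZE;

    private final ByteBuffer buffer;

    MatrixViewSketch(ByteBuffer buffer) { this.buffer = buffer; }

    /** Element access: one value, returned by value. */
    int getA(int index0, int index1) {
        return buffer.getInt(index0 * ROW_SIZE + index1 * ELEMENT_SIZE);
    }

    void setA(int index0, int index1, int value) {
        buffer.putInt(index0 * ROW_SIZE + index1 * ELEMENT_SIZE, value);
    }

    /** Component access with call-by-reference: fills and returns the given row array. */
    int[] getA(int index0, int[] target) {
        for (int j = 0; j < LEN1; j++) {
            target[j] = getA(index0, j);
        }
        return target;
    }

    /** Component access: instantiates a new row array. */
    int[] getA(int index0) {
        return getA(index0, new int[LEN1]);
    }

    /** Access to the entire value of the field. */
    int[][] getA() {
        int[][] result = new int[LEN0][];
        for (int i = 0; i < LEN0; i++) {
            result[i] = getA(i);
        }
        return result;
    }
}
```
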

Array Views

Array types only appear as fields of complex types. However, access to multidimensional arrays is faster when iterating over their components rather than over their elements. Access to an element of a multidimensional array requires consideration of the component size of each dimension. The following is an example of direct element access:

1: for (int i = 0; i < array.length; i++) {
2:   for (int j = 0; j < array[i].length; j++) {
3:     array[i][j] = value;
4:   }
5: }

Line 3 in this example will be translated into an expression that first calculates the offset of the element based on the component sizes of each dimension of the array. In pseudo code, it looks like this:

	// constants for component sizes,
	// precalculated by the compiler:
	elementSize        = sizeof(int);
	componentSize_Dim1 = elementSize * length_Dim2;  // one sub-array holds length_Dim2 elements
	componentSize_Dim2 = elementSize;
	
	// loop headers here
	
	// access to individual element
	offset = i * componentSize_Dim1 + j * componentSize_Dim2;
	reference(array_address + offset) = value;
	
	// end loops here

If we change our loop to iterate over components only, it would instead look like this:

for (int i = 0; i < array.length; i++) {
  int[] subarray = array[i];
  for (int j = 0; j < subarray.length; j++) {
    subarray[j] = value;
  }
}

The corresponding pseudo code after translation would be this:

	// loop header for dim 1
	
	offset1 = i*componentSize_Dim1;
	subarray_address = array_address + offset1;

	// loop header for dim 2
	
	// access to individual element
	offset2 = j*componentSize_Dim2;
	reference(subarray_address + offset2) = value;
	
	// end loops here

Thus, the address of the sub-array will be calculated once for each sub-array, instead of once for each element. For native Java arrays, a good JIT compiler will probably optimise even the first example that way, but it probably cannot optimise direct access to elements via methods of views. Thus, there will be array views, which support iterating over their components, rather than elements.
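
The same difference can be sketched for data in a ByteBuffer, which is the situation a view is in. The example assumes a two-dimensional int array stored row by row without padding; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

final class ComponentIterationSketch {
    /** Element-wise: the full offset is recomputed for every element. */
    static void fillByElement(ByteBuffer buf, int len0, int len1, int value) {
        int rowSize = len1 * Integer.BYTES;
        for (int i = 0; i < len0; i++) {
            for (int j = 0; j < len1; j++) {
                buf.putInt(i * rowSize + j * Integer.BYTES, value);
            }
        }
    }

    /** Component-wise: the offset of each sub-array is computed once per sub-array. */
    static void fillByComponent(ByteBuffer buf, int len0, int len1, int value) {
        int rowSize = len1 * Integer.BYTES;
        for (int i = 0; i < len0; i++) {
            int rowOffset = i * rowSize;
            for (int j = 0; j < len1; j++) {
                buf.putInt(rowOffset + j * Integer.BYTES, value);
            }
        }
    }
}
```

Both methods write the same bytes; they differ only in how often the offset of the sub-array is calculated.
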

Array views will have the following properties:

Component Access: There will be getters and setters to get/set components of that array. Remember: components of a one-dimensional array are its elements.
Component View: There will be a method V view(int index) to retrieve a view of type V on any of the components and a method V view(V view, int index) to configure an existing view to point to a component.
Length: There will be a method length() to retrieve the number of components of the viewed array.
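
A minimal sketch of how these properties could fit together. The interface PositionableView, the classes IntRowViewSketch and ArrayViewSketch, and the use of a Supplier to create component views are assumptions made for this example; only the method shapes V view(int index), V view(V view, int index) and length() are taken from the list above.

```java
import java.nio.ByteBuffer;
import java.util.function.Supplier;

/** Hypothetical contract for a view that can be pointed at a position in a buffer. */
interface PositionableView {
    void open(ByteBuffer buffer, int offset);
    int sizeof();
}

/** View on one component (an int[] row) of a two-dimensional array. */
final class IntRowViewSketch implements PositionableView {
    private final int length;
    private ByteBuffer buffer;
    private int offset;

    IntRowViewSketch(int length) { this.length = length; }

    public void open(ByteBuffer buffer, int offset) {
        this.buffer = buffer;
        this.offset = offset;
    }

    public int sizeof() { return length * Integer.BYTES; }
    int length() { return length; }
    int get(int index) { return buffer.getInt(offset + index * Integer.BYTES); }
    void set(int index, int value) { buffer.putInt(offset + index * Integer.BYTES, value); }
}

/** Generic array view whose components are accessed through component views of type V. */
final class ArrayViewSketch<V extends PositionableView> {
    private final ByteBuffer buffer;
    private final int length;
    private final int stride;
    private final Supplier<V> componentViews;

    ArrayViewSketch(ByteBuffer buffer, int length, Supplier<V> componentViews) {
        this.buffer = buffer;
        this.length = length;
        this.componentViews = componentViews;
        this.stride = componentViews.get().sizeof(); // assumes no padding between components
    }

    /** Number of components of the viewed array. */
    int length() { return length; }

    /** Creates a new view on the component at index. */
    V view(int index) { return view(componentViews.get(), index); }

    /** Points an existing view at the component at index, avoiding a new allocation. */
    V view(V view, int index) {
        view.open(buffer, index * stride);
        return view;
    }
}
```

When iterating, a caller could obtain one component view with view(0) and then reposition it with view(row, i) for each further component, so that no view object is created per component.
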

An array view class will usually require another view on its components. This will be the case for all arrays, which contain arrays (i.e. multidimensional arrays) and arrays whose element type is a struct (class). All boxed types will be treated as structs in this regard. This type of array view will be implemented as one generic class.

Since Java does not support primitive types in generic classes, there will be two special array view classes for one-dimensional arrays of each primitive type. One will support a user-defined view on the primitive elements of its array (like the generic array view) and the other will provide optimised access without using an element view.

Array views will only exist for API users and will not be used internally. Thus, requested array views will be instantiated on demand only.

Instantiation of arrays can be optimised in views on structs containing an array, but not in array views. Either (1) each array view stores the length of each dimension, resulting in the same information being stored multiple times, or (2) the length of each dimension has to be requested recursively from each view's component view, or (3) the component views are called recursively to instantiate each component. The standard method will be (3), to avoid conflicts with user views, but users aiming for higher performance should avoid it. There are methods which take a preinitialised array to retrieve the value of a component and thus support reuse of existing arrays.

Holger Machens, 02-Jan-2021