Java Native I/O

Specification

Background

Properties of a Structured Type in C

When we talk about the native representation of data in memory, we mean the representation used by C libraries, which complies with the platform's Application Binary Interface (ABI) - the standard low-level interface between applications and the system. This section explains certain properties of data representation with respect to structured types.

Concept

Overview

The system provides serialisation and deserialisation between Java objects and a native data representation that conforms to some imaginary structured C type. The structured C type (i.e. the data layout) of an object is declared in terms of a Java class. A code generator creates a corresponding view class for the given type declaration. The view performs data exchange (and translation) between data in native representation (given in a ByteBuffer) and a Java object (an instance of the class that served as the type declaration).
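To make the roles of type declaration and view more concrete, the following is a minimal hand-written sketch of what a view might look like. The names Point, PointView, read and write are hypothetical and not taken from the specification, and the field offsets assume a typical ABI with 4-byte int and 8-byte, 8-aligned double. A generated view may well look different.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical type declaration, standing in for the C type
//   struct point { int32_t x; int32_t y; double weight; };
class Point {
    int x;
    int y;
    double weight;
}

// Hand-written stand-in for a generated view (assumed shape, not the real generator output).
final class PointView {
    // Offsets assume 4-byte int and 8-byte double aligned to 8 bytes.
    static final int OFFSET_X = 0;
    static final int OFFSET_Y = 4;
    static final int OFFSET_WEIGHT = 8;
    static final int SIZE = 16;

    // Copies the native representation starting at 'position' into a new Java object.
    static Point read(ByteBuffer buffer, int position) {
        Point p = new Point();
        p.x = buffer.getInt(position + OFFSET_X);
        p.y = buffer.getInt(position + OFFSET_Y);
        p.weight = buffer.getDouble(position + OFFSET_WEIGHT);
        return p;
    }

    // Copies the fields of a Java object into the native representation at 'position'.
    static void write(ByteBuffer buffer, int position, Point p) {
        buffer.putInt(position + OFFSET_X, p.x);
        buffer.putInt(position + OFFSET_Y, p.y);
        buffer.putDouble(position + OFFSET_WEIGHT, p.weight);
    }
}

class PointViewDemo {
    public static void main(String[] args) {
        // Native byte order, so that a C library would see the same values.
        ByteBuffer buffer = ByteBuffer.allocateDirect(PointView.SIZE)
                                      .order(ByteOrder.nativeOrder());
        Point p = new Point();
        p.x = 3;
        p.y = -4;
        p.weight = 2.5;
        PointView.write(buffer, 0, p);
        Point copy = PointView.read(buffer, 0);
        System.out.println(copy.x + " " + copy.y + " " + copy.weight);
    }
}
```

The sketch uses absolute get and put methods of ByteBuffer, so the buffer position is left untouched and several views can share one buffer.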

Mapping to Native Types

A mapping establishes a link between an instance of a Java type and an instance of a native type in a given block of memory. The type is one of struct, union, array or scalar. A mapping requires a translation between the Java type and the native type.

Views

Views can be generated for classes (structs) only. Views, whether generated or user implemented, have the following properties:

The following sections clarify details of views in use for specific kinds of types.

Scalar Types

Elements of scalar types (Java primitives) only appear as fields of classes or elements of arrays. Properties of scalar types are:

Complex Types

Complex types are classes which are interpreted as C structs. They have the following properties:

Arrays

Array types have the following properties:

View on Fields of Array Types

A view for a class that has fields of array types provides special methods to access all components of each dimension of those arrays.

Array Views

Array types only appear as fields of complex types. However, access to multidimensional arrays is faster when iterating over their components rather than over their elements, because access to an element of a multidimensional array requires consideration of the component size of each dimension. The following is an example of direct element access:

	for (int i = 0; i < array.length; i++) {
	  for (int j = 0; j < array[i].length; j++) {
	    array[i][j] = value;
	  }
	}

The assignment array[i][j] = value in this example is translated into an expression that first calculates the offset of the element, based on the component sizes of each dimension of the array. In pseudo code, it looks like this:

	// constants for component sizes
	// precalculated by the compiler:
	elementSize        = sizeof(int);
	// a component of dimension 1 is a sub-array of length_Dim2 elements
	componentSize_Dim1 = elementSize*length_Dim2;
	componentSize_Dim2 = elementSize;
	
	// loop headers here
	
	// access to individual element
	offset = i*componentSize_Dim1 + j*componentSize_Dim2;
	reference(array_address + offset) = value;
	
	// end loops here

If we change our loop to iterate over components only, it would instead look like this:

	for (int i = 0; i < array.length; i++) {
	  int[] subarray = array[i];
	  for (int j = 0; j < subarray.length; j++) {
	    subarray[j] = value;
	  }
	}

The corresponding pseudo code after translation would be this:

	// loop header for dim 1
	
	// address of the sub-array, calculated once per sub-array
	offset1 = i*componentSize_Dim1;
	subarray_address = array_address + offset1;

	// loop header for dim 2
	
	// access to individual element
	offset2 = j*componentSize_Dim2;
	reference(subarray_address + offset2) = value;
	
	// end loops here

Thus, the address of the sub-array is calculated once per sub-array instead of once per element. For native Java arrays, a good JIT compiler will probably optimise even the first example that way, but it probably cannot optimise direct access to elements via methods of views. Therefore, there will be array views that support iterating over their components rather than over their elements.
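The difference between the two access patterns can be illustrated on a ByteBuffer that holds a two-dimensional int array in row-major order. This is only a sketch under assumptions: the class IntMatrixAccess and its methods are hypothetical, the array is assumed to start at offset 0 of the buffer, and no claim is made about how much faster the second variant is in practice.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper comparing element access with component access.
final class IntMatrixAccess {
    static final int ELEMENT_SIZE = Integer.BYTES;

    // Element access: the complete offset is recomputed for every element.
    static void fillByElement(ByteBuffer buffer, int rows, int cols, int value) {
        int rowSize = ELEMENT_SIZE * cols; // component size of dimension 1
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                buffer.putInt(i * rowSize + j * ELEMENT_SIZE, value);
            }
        }
    }

    // Component access: the base offset of each row is computed once per row.
    static void fillByComponent(ByteBuffer buffer, int rows, int cols, int value) {
        int rowSize = ELEMENT_SIZE * cols;
        for (int i = 0; i < rows; i++) {
            int rowBase = i * rowSize; // plays the role of subarray_address
            for (int j = 0; j < cols; j++) {
                buffer.putInt(rowBase + j * ELEMENT_SIZE, value);
            }
        }
    }
}

class IntMatrixAccessDemo {
    public static void main(String[] args) {
        int rows = 3;
        int cols = 4;
        ByteBuffer a = ByteBuffer.allocate(rows * cols * Integer.BYTES).order(ByteOrder.nativeOrder());
        ByteBuffer b = ByteBuffer.allocate(rows * cols * Integer.BYTES).order(ByteOrder.nativeOrder());
        IntMatrixAccess.fillByElement(a, rows, cols, 42);
        IntMatrixAccess.fillByComponent(b, rows, cols, 42);
        // Both variants write the same bytes; only the offset arithmetic differs.
        System.out.println(a.equals(b));
    }
}
```

An array view that hands out a view on each row corresponds to the second variant: the row base is fixed when the component view is obtained and is reused for every element of that row.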

Array views will have the following properties:

An array view class will usually require another view on its components. This is the case for all arrays that contain arrays (i.e. multidimensional arrays) and for arrays whose element type is a struct (class). All boxed types are treated as structs in this regard. This type of array view will be implemented as one generic class.

Since Java does not allow primitive types as type arguments of generic classes, there will be two special array view classes for one-dimensional arrays of each primitive type. One supports the use of a user-defined view on the primitive elements of its array (like the generic array view); the other provides optimised access without an element view.
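The two variants for a primitive type could be sketched for int as follows. All names here (IntElementView, DirectIntArrayView, MappedIntArrayView, ByteSwappedIntView) are hypothetical and only illustrate the idea; the specification does not define them. The byte-swapping element view is an arbitrary example of a user-defined translation.

```java
import java.nio.ByteBuffer;

// Hypothetical user-definable view on a single int element.
interface IntElementView {
    int get(ByteBuffer buffer, int offset);
    void set(ByteBuffer buffer, int offset, int value);
}

// Variant without element view: reads and writes the buffer directly.
final class DirectIntArrayView {
    private final ByteBuffer buffer;
    private final int base;
    private final int length;

    DirectIntArrayView(ByteBuffer buffer, int base, int length) {
        this.buffer = buffer;
        this.base = base;
        this.length = length;
    }

    int length() { return length; }
    int get(int index) { return buffer.getInt(base + index * Integer.BYTES); }
    void set(int index, int value) { buffer.putInt(base + index * Integer.BYTES, value); }
}

// Variant with element view: every access is delegated to the user's view.
final class MappedIntArrayView {
    private final ByteBuffer buffer;
    private final int base;
    private final int length;
    private final IntElementView elementView;

    MappedIntArrayView(ByteBuffer buffer, int base, int length, IntElementView elementView) {
        this.buffer = buffer;
        this.base = base;
        this.length = length;
        this.elementView = elementView;
    }

    int length() { return length; }
    int get(int index) { return elementView.get(buffer, base + index * Integer.BYTES); }
    void set(int index, int value) { elementView.set(buffer, base + index * Integer.BYTES, value); }
}

// Example user view: stores each value with its bytes reversed.
final class ByteSwappedIntView implements IntElementView {
    public int get(ByteBuffer buffer, int offset) {
        return Integer.reverseBytes(buffer.getInt(offset));
    }
    public void set(ByteBuffer buffer, int offset, int value) {
        buffer.putInt(offset, Integer.reverseBytes(value));
    }
}

class IntArrayViewDemo {
    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(4 * Integer.BYTES);
        MappedIntArrayView mapped = new MappedIntArrayView(buffer, 0, 4, new ByteSwappedIntView());
        DirectIntArrayView direct = new DirectIntArrayView(buffer, 0, 4);
        mapped.set(1, 0x01020304);
        // The direct view sees the raw, byte-swapped value.
        System.out.println(Integer.toHexString(direct.get(1)));
    }
}
```

Because both classes work on int rather than Integer, no boxing takes place, which is the reason for not using the generic array view here.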

Array views exist only for API users and are not used internally. Thus, array views are instantiated only on demand, when requested.

Instantiation of arrays can be optimised in views on structs containing an array, but not in array views. Either (1) each array view stores the length of each dimension, so that the same information is stored multiple times, or (2) the length of each dimension has to be requested recursively from each view's component view, or (3) the component views are called recursively to instantiate each component. The standard method will be (3), to avoid conflicts with user views, but users aiming for higher performance should avoid it. There are methods that take a preinitialised array to retrieve the value of a component and thus support reuse of existing arrays.
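The difference between a method that instantiates the array and one that takes a preinitialised array might look like the following sketch. The class IntMatrixReader and the methods read and readInto are hypothetical names, and the native data is assumed to be a row-major int array starting at offset 0 of the buffer.

```java
import java.nio.ByteBuffer;

// Hypothetical reader for a two-dimensional int array stored in row-major order.
final class IntMatrixReader {

    // Allocating variant: creates a new Java array on every call.
    static int[][] read(ByteBuffer buffer, int rows, int cols) {
        int[][] result = new int[rows][cols];
        readInto(buffer, result);
        return result;
    }

    // Reusing variant: fills a preinitialised array, so no allocation takes place.
    // The dimensions are taken from the target array itself.
    static void readInto(ByteBuffer buffer, int[][] target) {
        int offset = 0;
        for (int[] row : target) {
            for (int j = 0; j < row.length; j++) {
                row[j] = buffer.getInt(offset);
                offset += Integer.BYTES;
            }
        }
    }
}

class IntMatrixReaderDemo {
    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(2 * 3 * Integer.BYTES);
        for (int k = 0; k < 6; k++) {
            buffer.putInt(k * Integer.BYTES, k);
        }
        int[][] matrix = IntMatrixReader.read(buffer, 2, 3); // allocates once
        buffer.putInt(5 * Integer.BYTES, 99);                 // native data changes
        IntMatrixReader.readInto(buffer, matrix);             // same array is refilled
        System.out.println(matrix[1][2]);
    }
}
```

In a loop that reads the same structure repeatedly, the reusing variant avoids creating a new array, and all of its sub-arrays, on each iteration.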