Java Native I/O

Major Design Decisions

Separate Definition of Data Type and View

The general concept now differentiates between a data type definition, given as a Java class that resembles a C struct, and a view type, which provides methods to access the C struct in a ByteBuffer.

Defining type and view in a single interface proved impractical: if a member field of a struct has both a getter and a setter method, which of the two should carry the necessary annotations?
The view has a very specific purpose: it converts data between a Java type and its corresponding C type. Separating the two makes this obvious to users and prevents mistakes in their use.
Separation also makes the generated view independent of the type declaration, which avoids conflicts caused by implementation constraints, such as inheriting from final classes or overriding methods that are final or that have the same signature but a different return type. This also eases the integration of types from third-party libraries.
Separation also allows different (user-implemented) views to be provided for the same type.
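As an illustration of the separation, the following hand-written sketch shows a data type and a view for a struct with two 32-bit integers. The names Point and PointView, the fixed offsets (no padding) and the use of the platform's native byte order are assumptions for this example, not output of the actual generator.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Data type: a plain Java class resembling the C struct
//   struct Point { int32_t x; int32_t y; };
class Point {
    public int x;
    public int y;
}

// View: converts between the Java type and the C layout in a ByteBuffer.
// Hypothetical hand-written example; offsets assume no padding between the ints.
class PointView {
    static final int SIZE = 8;
    private static final int OFFSET_X = 0;
    private static final int OFFSET_Y = 4;

    private final ByteBuffer buffer;
    private final int base; // offset of the struct inside the buffer

    PointView(ByteBuffer buffer, int base) {
        // switches the buffer to native byte order, as used by C code
        this.buffer = buffer.order(ByteOrder.nativeOrder());
        this.base = base;
    }

    public int getX() { return buffer.getInt(base + OFFSET_X); }
    public void setX(int value) { buffer.putInt(base + OFFSET_X, value); }
    public int getY() { return buffer.getInt(base + OFFSET_Y); }
    public void setY(int value) { buffer.putInt(base + OFFSET_Y, value); }
}
```

Since PointView neither extends nor modifies Point, further views with a different layout could be written for the same type without touching its declaration.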

Static Code Generation Integrated in IDE

To improve user experience, dynamic code generation has been replaced by static code generation integrated with the IDE.

With the originally planned dynamic code generation, the user had to define an interface declaring the methods to access the struct in a ByteBuffer. The signatures of those methods had to comply with specific rules (e.g. common getter/setter methods and specific methods to access array elements). Every change of a member field of the struct type forced a corresponding change of the access methods. All of this made the approach more error-prone than helpful, and the effort became too high to justify the use of a code generator.
Static code generation integrated in the IDE can provide much better usability. It generates (or updates) a view for a given Java type on request, including all required methods. The user just needs to select the struct type (e.g. via the context menu in the project explorer).
Compared to hand-made code, static code generation induces no additional processing effort at runtime.

Generated Code of StructViews

A generated view provides the following methods and fields.

Open/Close Semantics: A view is meant to be opened and closed, which attaches a buffer to and detaches it from the view. These semantics are well known from other resources, such as files, sockets and streams in general. IDEs also help users identify resource leaks of objects implementing java.lang.AutoCloseable (which java.io.Closeable extends). But a view is not a stream. A separate class, StructViewStream, supports stream-like access and iteration over lists of structs.
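A minimal sketch of the open/close semantics, assuming a struct with a single 32-bit member. The class CounterView and the methods open() and isOpen() are hypothetical; the actual signatures are not specified here. Implementing AutoCloseable lets try-with-resources detach the view automatically.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical view for struct Counter { int32_t count; }.
// open() attaches a buffer, close() detaches it again.
class CounterView implements AutoCloseable {
    private ByteBuffer buffer; // null while the view is closed
    private int base;

    public CounterView open(ByteBuffer buffer, int base) {
        this.buffer = buffer.order(ByteOrder.nativeOrder());
        this.base = base;
        return this;
    }

    public boolean isOpen() { return buffer != null; }

    public int getCount() {
        if (buffer == null) throw new IllegalStateException("view is closed");
        return buffer.getInt(base);
    }

    public void setCount(int value) {
        if (buffer == null) throw new IllegalStateException("view is closed");
        buffer.putInt(base, value);
    }

    @Override
    public void close() {
        // detach only; the buffer itself stays owned by the caller
        buffer = null;
    }
}
```

Usage: `try (CounterView v = view.open(buf, 0)) { v.setCount(42); }` leaves `view` closed afterwards, so it can be reopened on another buffer.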
Getter and Setter for the Whole Struct: Each view has the methods T get() and set(T value), which read and write the entire struct, and an additional method T get(T result), which writes the result into the given parameter.
Getter and Setter for each Member:
- void setMember(T value)
- T getMember()
- T getMember(T result)
The latter has been added to avoid excessive object allocation. It takes an object of the requested type as parameter, which will receive the result. Scalar and immutable types will not produce this type of getter.
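The following sketch shows the getter and setter shapes listed above for a hypothetical struct Body with a scalar member id and a structured member pos. All names, the layout (one int32 followed by two float32, no padding) and native byte order are assumptions. Note that the scalar member id gets no result-parameter variant.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class Vec2 {
    public float x, y;
}

// Hypothetical struct type: struct Body { int32_t id; struct Vec2 pos; }
class Body {
    public int id;
    public Vec2 pos = new Vec2();
}

// Hypothetical view illustrating the accessor shapes.
class BodyView {
    static final int SIZE = 12; // int32 id + two float32
    private final ByteBuffer buf;
    private final int base;

    BodyView(ByteBuffer buf, int base) {
        this.buf = buf.order(ByteOrder.nativeOrder());
        this.base = base;
    }

    // scalar member: no getId(T result) variant
    public int getId() { return buf.getInt(base); }
    public void setId(int v) { buf.putInt(base, v); }

    // structured member: allocating getter plus allocation-free variant
    public Vec2 getPos() { return getPos(new Vec2()); }
    public Vec2 getPos(Vec2 result) {
        result.x = buf.getFloat(base + 4);
        result.y = buf.getFloat(base + 8);
        return result;
    }
    public void setPos(Vec2 v) {
        buf.putFloat(base + 4, v.x);
        buf.putFloat(base + 8, v.y);
    }

    // whole struct
    public Body get() { return get(new Body()); }
    public Body get(Body result) {
        result.id = getId();
        if (result.pos == null) result.pos = new Vec2();
        getPos(result.pos); // reuses the existing Vec2 instance
        return result;
    }
    public void set(Body value) {
        setId(value.id);
        setPos(value.pos);
    }
}
```

Passing the same result object to get(Body result) in a loop avoids allocating a new Body and Vec2 on every read.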
Getter and Setter for Array Elements: Arrays result in additional getters and setters that access the contained components, identified by indices passed as parameters. Thus, a member field T[][] array produces
- T[][] getArray()
- T[][] getArray(T[][] v)
- T[] getArray(int i)
- T[] getArray(T[] v, int i)
- T getArray(int i, int j)
- T getArray(T v, int i, int j)
and corresponding setters if applicable.
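For a scalar component type, the accessors above could look like the following sketch of a member int32_t array[2][3]. The class name, the fixed dimensions and the setter signature setArray(int value, int i, int j) are assumptions; C stores the elements in row-major order. Because int is scalar, the innermost getter has no result-parameter variant.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical view for a member `int32_t array[2][3]` stored at offset base.
class MatrixView {
    static final int ROWS = 2, COLS = 3, SIZE = ROWS * COLS * Integer.BYTES;
    private final ByteBuffer buf;
    private final int base;

    MatrixView(ByteBuffer buf, int base) {
        this.buf = buf.order(ByteOrder.nativeOrder());
        this.base = base;
    }

    // byte offset of element (i, j) in row-major layout
    private int offset(int i, int j) {
        return base + (i * COLS + j) * Integer.BYTES;
    }

    // single element: T getArray(int i, int j)
    public int getArray(int i, int j) { return buf.getInt(offset(i, j)); }
    public void setArray(int value, int i, int j) { buf.putInt(offset(i, j), value); }

    // one row: T[] getArray(int i) and T[] getArray(T[] v, int i)
    public int[] getArray(int i) { return getArray(new int[COLS], i); }
    public int[] getArray(int[] v, int i) {
        for (int j = 0; j < COLS; j++) v[j] = getArray(i, j);
        return v;
    }

    // whole array: T[][] getArray() and T[][] getArray(T[][] v)
    public int[][] getArray() { return getArray(new int[ROWS][COLS]); }
    public int[][] getArray(int[][] v) {
        for (int i = 0; i < ROWS; i++) getArray(v[i], i);
        return v;
    }
}
```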
Access to Nested Structs: Each member of a struct that is itself a struct has a corresponding public field in the view, which provides a view on that value.
```
class A {
    public static class B { public int c, d; }
    public B b;
}
```
results in a view
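```
class AView {
    public static class BView {
        public int getC() { ... } // reads member c of the nested B
        public int getD() { ... } // reads member d of the nested B
    }
    public BView b; // view on member b of A
}
```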
The view can then be used like this to access a member of B directly:
```
AView view = ...;
view.b.getC();
```
Direct access to members of nested structs would not have been possible via methods whose names combine the name of the member and the name of its nested member, because such names could clash with other members. For example, a method AView.getB_getC() to access member c in B would conflict with a member b_getC in A, which would result in the same method getB_getC() in AView.
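A runnable, simplified version of the AView example might look as follows. The offsets, the native byte order and the constructor are assumptions for illustration; the point is that the public field b is a view that shares A's buffer and starts at the offset of member b.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch for
//   class A { public static class B { public int c, d; } public B b; }
// assuming the C layout struct A { struct B { int32_t c, d; } b; }.
class AView {
    public static class BView {
        private final ByteBuffer buf;
        private final int base; // offset of the nested B inside the buffer

        BView(ByteBuffer buf, int base) { this.buf = buf; this.base = base; }

        public int getC() { return buf.getInt(base); }
        public int getD() { return buf.getInt(base + 4); }
        public void setC(int v) { buf.putInt(base, v); }
        public void setD(int v) { buf.putInt(base + 4, v); }
    }

    // view on member b; shares the buffer and starts at b's offset in A
    public final BView b;

    AView(ByteBuffer buf, int base) {
        buf.order(ByteOrder.nativeOrder());
        this.b = new BView(buf, base); // b is the first member of A
    }
}
```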

Still Existing Gotchas and Issues (i.e. TODOs)

Struct Type Declaration

The current version imposes some constraints on struct types, which conflict with ease of use in general and with the integration of types declared in third-party libraries.

No Inheritance: Supporting inheritance would require the generator either to create views with a similar inheritance structure or to treat the inherited member fields as if they were 'inlined' in the declared struct type.
Public Members Only: This was mainly driven by the fact that private members can only be modified through the Java reflection API or the non-standard Unsafe interface. This either causes more processing effort or may lead to incompatibilities. Technically, private members can still be supported through those APIs if users want it, which will most likely be the case in the future.
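For reference, accessing a private member through the standard reflection API could be sketched as follows. The classes Sample and PrivateIntAccessor are hypothetical. setAccessible(true) is called only once, but every read and write still goes through reflection, which is the extra processing effort mentioned above; in a modular application setAccessible may also fail with InaccessibleObjectException unless the package is opened to the caller.

```java
import java.lang.reflect.Field;

// Hypothetical struct type with a private member,
// which the current generator would reject.
class Sample {
    private int value;
    int peek() { return value; }
}

// Hypothetical accessor for a private int field via java.lang.reflect.
class PrivateIntAccessor {
    private final Field field;

    PrivateIntAccessor(Class<?> type, String name) throws NoSuchFieldException {
        field = type.getDeclaredField(name);
        field.setAccessible(true); // suppresses the Java language access check
    }

    int get(Object target) throws IllegalAccessException { return field.getInt(target); }
    void set(Object target, int v) throws IllegalAccessException { field.setInt(target, v); }
}
```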
Public Static Member Classes: Like member fields, member classes need to be accessible (public). They also need to be static, because an instance of a non-static (inner) class declared inside a struct type holds a reference to an instance of the outer class.
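The difference can be illustrated with a hypothetical struct type Outer: a static member class can be instantiated on its own, while an instance of an inner class can only be created for an existing instance of the outer class and keeps a hidden reference to it.

```java
// Hypothetical struct type illustrating why member classes must be static.
class Outer {
    // static member class: no hidden reference, can be instantiated on its own
    public static class Nested { public int v; }

    // inner (non-static) class: every instance references an Outer instance
    public class Inner { public int v; }
}

class OuterDemo {
    public static void main(String[] args) {
        Outer.Nested nested = new Outer.Nested();    // works without an Outer
        Outer.Inner inner = new Outer().new Inner(); // requires an Outer instance
        System.out.println(nested.v + inner.v);
    }
}
```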
Issues with Unions: Java has no corresponding type construct, which makes it hard to declare unions without running into issues. This is obviously no problem for generating a view, but using the corresponding Java type for anything beyond declaring the struct layout becomes questionable. Perhaps a declared union should itself be a kind of view, with methods to access the data through any of the types declared in the union, where each setter in fact affects all members, since they share the same memory. Also, every view currently contains the methods get() and set(T value), which refer to an instance of the entire struct. This would make no sense for unions.
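To illustrate these semantics, here is a sketch of a view for a hypothetical C union of an int32_t and a float. Both members occupy the same four bytes, so writing through one setter changes what the other getter returns. The class name and layout are assumptions; note that the sketch has no whole-union get()/set(T value).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical view for the C union
//   union IntOrFloat { int32_t i; float f; };
// Both accessors address the same bytes, so any setter
// overwrites the value seen through every other member.
class IntOrFloatView {
    static final int SIZE = 4; // size of the largest member
    private final ByteBuffer buf;
    private final int base;

    IntOrFloatView(ByteBuffer buf, int base) {
        this.buf = buf.order(ByteOrder.nativeOrder());
        this.base = base;
    }

    public int getI() { return buf.getInt(base); }
    public void setI(int v) { buf.putInt(base, v); }
    public float getF() { return buf.getFloat(base); }
    public void setF(float v) { buf.putFloat(base, v); }
}
```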

View Implementation

Nested Views: A view for a struct that contains nested structs currently holds references to instances of the corresponding views in member fields, as explained above. This should be replaced by calls to static methods of those views, passing the buffer and the offset of the required data as parameters, but I don't see a valid solution to the naming issue described above for direct access to members of nested structs. I actually wonder whether those view objects, which are instantiated together with the view itself, will reside in the same (virtual memory) page and thus achieve cache efficiency similar to passing parameters on the stack, since the view object is referenced anyway and its page will therefore already be in the TLB.
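The static-method alternative could be sketched like this for the nested struct B { int32_t c, d; }. The class and method names are hypothetical, and the caller is assumed to know the offset of b inside A and to have set the buffer's byte order. No view object is allocated, but the sketch leaves open how AView would expose these accessors without the naming conflicts discussed above.

```java
import java.nio.ByteBuffer;

// Hypothetical stateless accessors for struct B { int32_t c, d; }.
// Buffer and offset are passed on every call instead of being
// stored in a view instance; byte order is the caller's responsibility.
final class BView {
    static final int SIZE = 8;
    private BView() {}

    static int getC(ByteBuffer buf, int offset) { return buf.getInt(offset); }
    static int getD(ByteBuffer buf, int offset) { return buf.getInt(offset + 4); }
    static void setC(ByteBuffer buf, int offset, int v) { buf.putInt(offset, v); }
    static void setD(ByteBuffer buf, int offset, int v) { buf.putInt(offset + 4, v); }
}
```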
Views for Arrays: For arrays with a structured component type (e.g. Struct[]), the view currently instantiates an element view on demand to encode/decode the array elements. This could also be solved with static methods or with a member field holding a view for each of those component types.
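The member-field alternative might look like the following sketch for a hypothetical member struct Pair pairs[4]. A single element view is stored in a field and re-positioned for each index instead of being instantiated on demand. All names and the layout are assumptions; the shared element view is only valid until the next call and is not thread-safe.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical element view for struct Pair { int32_t a, b; }
// that can be re-positioned to any element.
class PairView {
    static final int SIZE = 8;
    private ByteBuffer buf;
    private int base;

    PairView at(ByteBuffer buf, int base) { this.buf = buf; this.base = base; return this; }
    int getA() { return buf.getInt(base); }
    int getB() { return buf.getInt(base + 4); }
    void setA(int v) { buf.putInt(base, v); }
    void setB(int v) { buf.putInt(base + 4, v); }
}

// Hypothetical view for a member `struct Pair pairs[4]`.
class PairArrayView {
    static final int LENGTH = 4;
    private final ByteBuffer buf;
    private final int base;
    private final PairView element = new PairView(); // reused for every index

    PairArrayView(ByteBuffer buf, int base) {
        this.buf = buf.order(ByteOrder.nativeOrder());
        this.base = base;
    }

    // returns the shared element view positioned at index i
    PairView element(int i) {
        if (i < 0 || i >= LENGTH) throw new IndexOutOfBoundsException("index " + i);
        return element.at(buf, base + i * PairView.SIZE);
    }
}
```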
Code Efficiency: The generated code is not well optimised yet. Just a reminder.
User-Implemented Views: The code generator takes user-implemented code into account when it is referenced through an annotation on the respective member field. This feature is not final yet and was mainly introduced to support problematic types such as Java boolean, which has no defined size in memory; String, whose size must be determined from a null terminator while accounting for different character encodings; and the boxed types, which are classes representing scalar types. Standard views could be provided for certain types, but it is questionable whether they are worth the implementation effort. I would rather provide tools that ease the implementation of such views.
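As an example of what a user-implemented view might contain, the following hypothetical helpers handle two of the problematic types mentioned above. They assume that a string member is stored as a fixed-size char array holding a null-terminated string in a given charset, and that a boolean is stored as a single byte (0 for false), as is common in C. The annotation that links such code to a member field is not shown, since its name is not given here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helpers for a user-implemented view.
final class CTypes {
    private CTypes() {}

    // Reads a null-terminated string from a char[capacity] field.
    // Scanning for a single zero byte only works for charsets such as
    // UTF-8 or ISO-8859-1; UTF-16 would need a two-byte terminator.
    static String getString(ByteBuffer buf, int offset, int capacity, Charset cs) {
        int len = 0;
        while (len < capacity && buf.get(offset + len) != 0) len++;
        byte[] bytes = new byte[len];
        for (int k = 0; k < len; k++) bytes[k] = buf.get(offset + k);
        return new String(bytes, cs);
    }

    // Writes the encoded string plus terminator; rejects strings that do not fit.
    static void setString(ByteBuffer buf, int offset, int capacity, Charset cs, String s) {
        byte[] bytes = s.getBytes(cs);
        if (bytes.length >= capacity) throw new IllegalArgumentException("string too long");
        for (int k = 0; k < bytes.length; k++) buf.put(offset + k, bytes[k]);
        buf.put(offset + bytes.length, (byte) 0); // null terminator
    }

    // boolean stored as one byte: 0 = false, anything else = true
    static boolean getBoolean(ByteBuffer buf, int offset) { return buf.get(offset) != 0; }
    static void setBoolean(ByteBuffer buf, int offset, boolean v) { buf.put(offset, (byte) (v ? 1 : 0)); }
}
```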

IDE Integration

Currently there is a basic implementation of an Eclipse plugin. It is basic in the sense that it does not use the abstract syntax tree features of Eclipse JDT and therefore cannot mark errors in the code of a struct type declaration (e.g. in the editor). If possible, the code generator will be integrated more deeply, but it will be kept independent to allow its use as a standalone tool (e.g. from the command line or as an Ant task) and its integration with other IDEs.

Holger Machens, 02-Jan-2021