jak-project/doc/type_system.md
2021-01-19 23:38:28 +00:00

Type System

This document explains the GOAL type system. The GOAL type system supports runtime typing, single inheritance, virtual methods, and dynamically sized structures.

Everything in GOAL has a type at compile time. A subset of compile-time types are also available in the runtime as objects with the same name as the type. For example, there is a string type, and at runtime there is a global object named string which is an object of type type containing information about the string type.

Some objects have runtime type information, and others don't. Objects which have runtime type information can have their type identified at runtime, and are called "boxed objects". Objects without runtime type information are called "unboxed objects". An unboxed object cannot reliably be detected as an unboxed object - you can't write a function that takes an arbitrary object and tells you if it's boxed or not. However, boxed objects can always be recognized as boxed.

All types have a parent type, and all types descend from the parent type object, except for the special type none (and maybe _type_, but more on this later). The none type doesn't exist in the runtime and is used to represent an invalid value that the compiler should not use. For example, the return type of a function which doesn't return anything is none, and attempting to use this value should cause an error.

Here are some important special types:

  • object - the parent of all types
  • structure - parent type of any type with fields
  • basic - parent type of any structure with runtime type information.

All types have methods. Objects have access to all of their parents' methods, and may override parent methods. All types have these 9 methods:

  • new - like a constructor, returns a new object. It is not used in all cases or on all types, and exactly when it is used still needs to be documented.
  • delete - basically unused, but like a destructor. Often calls kfree, which does nothing.
  • print - prints a short, one line representation of the object to the PrintBuffer
  • inspect - prints a multi-line description of the object to the PrintBuffer. Usually auto-generated by the compiler and prints out the name and value of each field.
  • length - Returns a length if the type has something like a length (number of characters in string, etc). Otherwise returns 0. Usually returns the number of filled slots, instead of the total number of allocated slots, when there is possibly a difference.
  • asize-of - Gets the size in memory of the entire object. Usually this just looks this up from the appropriate type, unless it's dynamically sized.
  • copy - Create a copy of this object on the given heap. Not used very much?
  • relocate - Some GOAL objects will be moved in memory by the kernel as part of the compacting actor heap system. After being moved, the relocate method will be called with the offset of the move, and the object should fix up any internal pointers which may point to the old location. It's also called on v2 objects loaded by the linker when they are first loaded into memory.
  • memusage - Not understood yet, but probably returns how much memory in bytes the object uses. Not supported by all objects.

Usually a method which overrides a parent method must have the same argument and return types. The only exception is new methods, which can have different argument/return types from the parent. (See the later section on _type_ for another exception.)

The compiler's implementation for calling a method is:

  • Is the type a basic?
    • If so, look up the type using runtime type information
    • Get the method from the vtable
  • Is the type not a basic?
    • Get the method from the vtable of the compile-time type
    • Note that this process isn't very efficient - instead of directly linking to the slot in the vtable (one deref) it first looks up the type by symbol, then the slot (two derefs). I have no idea why it's done this way.
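
The lookup rule above can be modeled with a small Python sketch. This is only an illustration of the decision the compiler makes, not the real implementation: `GoalType`, `BoxedObject`, `UnboxedObject`, `call_method`, and the example types are all hypothetical names, and the vtable is modeled as a dictionary keyed by method name rather than by slot.

```python
class GoalType:
    """Hypothetical stand-in for a GOAL type and its method table (vtable)."""

    def __init__(self, name, parent=None, is_basic=False, methods=None):
        self.name = name
        # Children of a basic are also basics.
        self.is_basic = is_basic or (parent is not None and parent.is_basic)
        # Start from the parent's methods, then apply overrides.
        self.vtable = dict(parent.vtable) if parent else {}
        self.vtable.update(methods or {})


class BoxedObject:
    """A basic: carries a reference to its runtime type."""

    def __init__(self, runtime_type):
        self.runtime_type = runtime_type


class UnboxedObject:
    """A structure without runtime type information."""


def call_method(obj, compile_time_type, method_name):
    if compile_time_type.is_basic:
        # Boxed: find the actual type at runtime, then use its vtable.
        vtable = obj.runtime_type.vtable
    else:
        # Unboxed: only the compile-time type is known.
        vtable = compile_time_type.vtable
    return vtable[method_name](obj)


shape = GoalType("shape", is_basic=True, methods={"print": lambda obj: "shape"})
square = GoalType("square", parent=shape, methods={"print": lambda obj: "square"})
point = GoalType("point", methods={"print": lambda obj: "point"})
point3 = GoalType("point3", parent=point, methods={"print": lambda obj: "point3"})

# A square held in a variable of compile-time type shape still prints as a square.
boxed_result = call_method(BoxedObject(square), shape, "print")
# An unboxed point3 seen through compile-time type point uses point's method.
unboxed_result = call_method(UnboxedObject(), point, "print")
```

The second call shows the practical consequence: for types that are not basics, an override in a child type is only picked up when the compiler knows the child type at compile time.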

In general, I suspect that the method system was modified after GOAL was first created. There is some evidence that types were once stored in the symbol table, but were removed because the symbol table became full. This could explain some of the weirdness around method calls/definition rules, and the disaster that is the method-set! function.

GOAL Value Types

Some GOAL types are "value types", meaning they are passed by value when used as arguments to functions, return values from functions, local variables, and when using set!. These are always very small and fit directly into the CPU registers. Some example value types:

  • Floating point numbers
  • Integers

GOAL Reference Types

Other GOAL types are "reference types", meaning they act like a reference to data when used as arguments to functions, return values from functions, local variables, and when using set!. The data can be allocated on a heap, on the stack, or as part of static data included when loading code (which is technically also on a heap). All structure/basic types are reference types.

You can think of these like C/C++ pointers or references, which is how it is implemented. The difference is that there's no special notation for this. A GOAL string object is like a C/C++ string* or string&. A GOAL "pointer to reference type" is like a C/C++ my_type**.

Note - this is quite a bit different from C/C++. In C++ you can have a structure with value semantics (normal), or reference semantics (C++ reference or pointer). In GOAL, there are no value semantics for structures! This is great because it means function arguments/variables always fit into registers.

GOAL Fields

GOAL field definitions look like this:

(name type-name [optional stuff])

where optional stuff can include these, in any order:

  • :inline #t (default is false), to mark field as inline. This can only be done for a reference type, and indicates that the data should be stored inline, in the type, rather than just storing a reference to data stored elsewhere.
  • :dynamic #t (default is false), to mark field as dynamically-sized array (must be the last field in the type)
  • a number, to give an array size.
  • :offset x where x is a number, to manually specify where the field is located

There are many combinations of reference/value, dynamic/not-dynamic, inline/not-inline, array-size/no-array-size, and it can be confusing. This list explains all that are valid.

  • Value type, no modifiers: a single value is stored in the field. The field type is the value type.
  • Value type, :dynamic #t: the field marks the beginning of an array (of unknown size). Field type is (pointer your-type)
  • Value type, with array size: the field marks the beginning of an array (of known size). Field type is (pointer your-type)
  • Value type, with :inline #t: invalid in all cases.
  • Reference type, no modifiers: a single reference is stored in the type. Type of field is your-type (a C++ pointer).
  • Reference type, :inline #t: a single object is stored inside the type. Type of field is your-type still (a C++ pointer). The access logic is different to make this work.
  • Reference type, :dynamic #t or array size: the field marks the beginning of an array of references. Field type is (pointer your-type). Like C array of pointers.
  • Reference type, :inline #t and (:dynamic #t or array size): the field marks the beginning of an array of inline objects. Field type is (inline-array your-type). Like C array of structs.
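
The combinations listed above reduce to a small decision function. The following Python sketch encodes exactly those rules; the function name `field_type` and its parameters are made up for this illustration and are not part of the type system library.

```python
def field_type(type_name, is_reference, inline=False, dynamic=False, array_size=None):
    """Return the type of a field, following the list of valid combinations."""
    is_array = dynamic or array_size is not None
    if inline and not is_reference:
        # Value type with :inline #t is invalid in all cases.
        raise ValueError("value types cannot be marked :inline")
    if not is_array:
        # Single value, single reference, or single inline object.
        return type_name
    if inline:
        # Array of inline objects, like a C array of structs.
        return f"(inline-array {type_name})"
    # Array of values or array of references.
    return f"(pointer {type_name})"


single_value = field_type("int32", is_reference=False)
value_array = field_type("int32", is_reference=False, array_size=4)
inline_object = field_type("vector", is_reference=True, inline=True)
reference_array = field_type("vector", is_reference=True, dynamic=True)
inline_objects = field_type("vector", is_reference=True, inline=True, array_size=8)
```

Note that a single inline object and a single reference report the same field type; only the access logic differs, which this sketch does not model.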

Bonus ones, for where the array is stored outside of the type:

  • A dynamically typed GOAL array, stored outside your type (think std::vector): use (name (array your-type))
  • A dynamically typed GOAL array, stored inside your type: Not allowed, array is dynamic!
  • An array of value types, stored outside your type: use (name (pointer your-type))
  • An array of references (C++ array of pointers), stored outside your type: use (name (pointer your-ref-type))
  • An array of objects of reference type (C++ array of structs), stored outside your type: use (name (inline-array your-ref-type))

Of course, you can combine these, to get even more confusing types! But this seems uncommon.

GOAL Field Placement

The exact rules for placing fields in GOAL types are unknown, but the simple approach of "place the next field as close as possible to the end of the last field" seems to get it right almost all the time. However, we need to be extra certain that we lay out type fields correctly because many GOAL types have overlapping fields.

The theory I'm going with for now is:

  • The order of fields in the inspect method is the order in which fields are listed in the type definition
  • In the rare cases this is wrong, this is due to somebody manually specifying an offset.

As a result, we should specify offsets like this:

  • If we think a field was manually placed, use :offset to override. This is certain to be right.
  • If we think a field was automatically placed, use :offset-assert to inform the compiler where we expect it to be. In this case it will still place the field automatically, but if the result is different from the :offset-assert, it will throw an error.
  • Avoid defining any fields without :offset or :offset-assert
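
As a rough illustration of how :offset and :offset-assert interact with automatic placement, here is a minimal Python sketch. It assumes that automatic placement means "align the current end of the type up to the field's alignment", and that a manually placed field never moves the end of the type backwards; both are assumptions, as are the names `place_fields`, `offset`, and `offset_assert`.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def place_fields(fields, start=0):
    """Place fields in order and return a dict of name -> offset.

    Each field is a dict with "name", "size", "alignment", and optionally
    "offset" (manual placement) or "offset_assert" (expected automatic offset).
    """
    placed = {}
    end = start
    for field in fields:
        if "offset" in field:
            # Manual placement: trusted as-is, may overlap earlier fields.
            offset = field["offset"]
        else:
            # Automatic placement: as close as possible to the current end.
            offset = align_up(end, field["alignment"])
            expected = field.get("offset_assert")
            if expected is not None and expected != offset:
                raise ValueError(
                    f"{field['name']}: placed at {offset}, expected {expected}"
                )
        placed[field["name"]] = offset
        # Assumption: the end of the type never moves backwards.
        end = max(end, offset + field["size"])
    return placed


layout = place_fields([
    {"name": "x", "size": 4, "alignment": 4, "offset_assert": 0},
    {"name": "y", "size": 4, "alignment": 4, "offset_assert": 4},
    {"name": "flag", "size": 1, "alignment": 1, "offset_assert": 8},
    {"name": "id", "size": 8, "alignment": 8, "offset_assert": 16},
    {"name": "x-bits", "size": 4, "alignment": 4, "offset": 0},  # overlaps x
])
```

If the automatic result ever disagrees with an `offset_assert`, the sketch raises instead of silently producing a different layout, which is the behavior described for :offset-assert.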

GOAL Arrays

For value types, arrays work as you expect. They have type (pointer your-type). Arrays of references come in two versions:

  • Array of references: (pointer your-type), like a C array of pointers
  • Array of inline objects: (inline-array your-type), like a C array of structs

The default alignment of structs is 16 bytes, which is also the minimum alignment of kmalloc, and the minimum alignment used when using a reference type as an inline field. However, it's possible to violate this rule in an (inline-array your-type) to be more efficient. The your-type can set a flag indicating it should be packed in an inline array.

I believe the alignment then becomes the largest alignment required by any field of your-type. So if you have a type with two uint32s (alignment 4 bytes), an (inline-array your-type) can then have a spacing of 8 bytes, instead of the usual minimum of 16. The behavior of a (field-name your-type :inline #t) is unchanged and will still align at the minimum of 16 bytes. I believe that the first element of the array will still have an alignment of 16.
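
The spacing arithmetic can be checked with a short Python sketch. It encodes the belief stated above (a packed type uses the largest field alignment, an unpacked type uses 16), so it is only as correct as that guess; `inline_array_stride` is a hypothetical helper name.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def inline_array_stride(fields, packed):
    """Spacing between elements of an inline array.

    fields is a list of (size, alignment) pairs in declaration order.
    """
    end = 0
    for size, alignment in fields:
        end = align_up(end, alignment) + size
    # Packed: largest field alignment. Not packed: the default of 16 bytes.
    element_alignment = max(a for _, a in fields) if packed else 16
    return align_up(end, element_alignment)


two_uint32s = [(4, 4), (4, 4)]
packed_stride = inline_array_stride(two_uint32s, packed=True)    # 8 bytes
default_stride = inline_array_stride(two_uint32s, packed=False)  # 16 bytes
```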

There's a single type system library, located in common/type_system. It will be used in both the decompiler and compiler. The plan is to have a single all_types.gc file which contains all type information (type definitions and types of globals). The decompiler will help generate this, but some small details may need to be filled in manually for some types. Later versions of the decompiler can use this information to figure out what fields of types are being accessed. We can also add a test to make sure that types defined in the decompiled game match all_types.gc.

The main features are:

  • TypeSystem stores all type information and provides a convenient way to add new types or request information about existing types.
  • Type stores information about a GOAL type. A "base GOAL type" is identified by a single unique string. Examples: function, string, vector3h.
  • TypeSpec is a way to specify either a Type or a "compound type". Compound types are used to create types which represent specific function types (function which takes two integer arguments and returns a string), or specific pointer/array types (pointer to an integer). These can be represented as (possibly nested) lists, like (pointer integer) or (function (integer integer) string).
  • Type Checking for compiler
  • Parsing of type definitions for compiler
  • Lowest common ancestor implementation for compiler to figure out return types for branching forms.
  • Logic to catch multiple incompatible type definitions for both compiler warnings and decompiler sanity checks
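
To make the "lowest common ancestor" feature concrete, here is a minimal Python sketch for a single-inheritance hierarchy. The parent table is illustrative only: object, structure, and basic follow the descriptions earlier in this document, while the placement of string and the square/rectangle types are assumptions made for the example.

```python
# Child -> parent. object is the root and has no parent.
PARENT = {
    "object": None,
    "structure": "object",
    "basic": "structure",
    "string": "basic",       # assumed to be a basic for this example
    "square": "basic",       # hypothetical type
    "rectangle": "square",   # hypothetical type
}


def ancestors(type_name):
    """Return the type followed by all of its parents, up to object."""
    chain = []
    while type_name is not None:
        chain.append(type_name)
        type_name = PARENT[type_name]
    return chain


def lowest_common_ancestor(a, b):
    """Most specific type that both a and b descend from (or are)."""
    seen = set(ancestors(a))
    for candidate in ancestors(b):
        if candidate in seen:
            return candidate
    raise ValueError(f"{a} and {b} share no ancestor")


# An if-like form returning a rectangle on one branch and a string on the other.
branch_result_type = lowest_common_ancestor("rectangle", "string")
```

Because every type in this table descends from object, the search always succeeds; the special none type mentioned earlier would need separate handling.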

Compile Time vs. Run Time types

The types in the runtime are only a subset of the compile time types. Here are the rules I've discovered so far:

  • Any compound types become just the first type. So (pointer my-type) becomes pointer.
  • The inline-array class just becomes pointer.
  • Some children of integers disappear, but others don't. The rules for this aren't known yet.
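
The first two rules can be written as a short Python sketch. It represents a compound type as a nested tuple whose first element is the base type, mirroring the list form used for TypeSpec; this representation and the name `runtime_type_name` are assumptions for illustration. The third rule (integer children) is not modeled because it isn't known yet.

```python
def runtime_type_name(typespec):
    """Map a compile-time type to the name of the type seen at runtime.

    typespec is either a string (a base type) or a tuple such as
    ("pointer", "my-type") whose first element is the base type.
    """
    base = typespec if isinstance(typespec, str) else typespec[0]
    # inline-array does not survive to runtime; it becomes pointer.
    return "pointer" if base == "inline-array" else base


pointer_runtime = runtime_type_name(("pointer", "my-type"))
inline_array_runtime = runtime_type_name(("inline-array", "my-type"))
function_runtime = runtime_type_name(("function", ("integer", "integer"), "string"))
string_runtime = runtime_type_name("string")
```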

Special _type_ for methods

The first argument of a method always contains the object that the method is being called on. It also must have the type _type_, which will be substituted by the type system (at compile time) using the following rules:

  • At method definition: replace with the type that the method is being defined for.
  • At method call: replace with the compile-time type of the object the method is being called on.

A method can have other arguments or a return value that's of type _type_. This special "type" will be replaced at compile time with the type which is defining or calling the method. No part of this exists at runtime. It may seem weird, but there are two uses for this.

The first is to allow children to specialize methods and have their own child type as an argument type. For example, say you have a method is-same-shape, which compares two objects and sees if they are the same shape. Suppose you first defined this for type square with

(defmethod square is-same-shape ((obj1 square) (obj2 square))
  (= (-> obj1 side-length) (-> obj2 side-length))
 )

Then, if you created a child class of square called rectangle (this is a terrible way to use inheritance, but it's just an example), and overrode the is-same-shape method, you would have to have arguments that are squares, which blocks you from accessing rectangle-specific fields. The solution is to define the original method with type _type_ for the first two arguments. Then, the method defined for rectangle also will have arguments of type _type_, which will expand to rectangle.

The second use is for a return value. For example, the print and inspect methods both return the object that is passed to them, which will always be the same type as the argument passed in. If print was defined as (function object object), then (print my-square) would lose the information that the returned object is a square. If print is a (function _type_ _type_), the type system will know that (print my-square) will return a square.
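
The substitution itself is simple enough to sketch in Python. Here a method signature is modeled as a list of type names with the return type last, matching the (function _type_ _type_) notation; the helper name `substitute_type` and the symbol return type of is-same-shape are assumptions for this example.

```python
def substitute_type(signature, concrete_type):
    """Replace every _type_ in a signature with a concrete type name."""
    return [concrete_type if t == "_type_" else t for t in signature]


# (function _type_ _type_): print returns the object it was given.
print_signature = ["_type_", "_type_"]
# Two _type_ arguments; the "symbol" return type is assumed for illustration.
is_same_shape_signature = ["_type_", "_type_", "symbol"]

# At a call site, use the compile-time type of the object.
print_on_square = substitute_type(print_signature, "square")
# At a method definition, use the type the method is defined for.
same_shape_for_rectangle = substitute_type(is_same_shape_signature, "rectangle")
```

Since the replacement happens purely on the signature, nothing about _type_ needs to exist at runtime, which matches the description above.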

Inline Array Class

There's a weird inline-array-class type that's not fully understood yet. It uses heap-base.

Heap Base

This is a field in type. What does it mean? It's zero for most types (at least the early types).

Second Size Field

There are two fields in type for storing the size. The first one stores the exact size, and by default the second stores the size rounded up to the nearest 16 bytes. Why? Who uses it? Does it ever get changed?
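
For reference, the default relationship between the two size fields described above is just a round-up to a multiple of 16. A minimal Python sketch, with `round_up_16` as a made-up helper name:

```python
def round_up_16(exact_size):
    """Round a size in bytes up to the nearest multiple of 16."""
    return (exact_size + 15) & ~15


# Exact size -> default value of the second size field.
examples = {size: round_up_16(size) for size in (4, 16, 17, 36)}
```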

The Type System

The type system will store:

  • The types of all global variables (this includes functions)
  • Information about all types:
    • Fields/specific details on how to load from memory, alignment, sign extension, size in arrays, etc...
    • Parent type
    • Methods not defined for the parent.

It's important that all of the type-related info is stored/calculated in a single location. The proof of concept compiler did not have the equivalent of TypeSystem and scattered field/array access logic all over the place. This was extremely confusing to get right.

If type information is specified multiple times, and is also inconsistent, the TypeSystem can be configured to either throw an exception or print a warning.

This will be a big improvement over the "proof of concept" compiler which did not handle this situation well. When debugging GOAL you will often put the same file through the compiler again and again, changing functions, but not types. In this case, there should be no warnings. If the type does change, it should warn (as old code may exist that uses the old type layout), but shouldn't cause the compiler to abort, error, or do something very unexpected.
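
The intended redefinition behavior can be sketched as follows. This is a toy model in Python, not the interface of the actual TypeSystem class: `TypeRegistry`, `define`, and `throw_on_redefinition` are hypothetical names, and warnings are collected in a list where a real implementation would print them.

```python
class TypeRegistry:
    """Toy registry that reacts to inconsistent type redefinitions."""

    def __init__(self, throw_on_redefinition=False):
        self.throw_on_redefinition = throw_on_redefinition
        self.types = {}
        self.warnings = []

    def define(self, name, fields):
        old = self.types.get(name)
        if old is not None and old != fields:
            message = f"type {name} was redefined with a different layout"
            if self.throw_on_redefinition:
                raise ValueError(message)
            # Old code may still use the old layout, so warn but continue.
            self.warnings.append(message)
        # An identical redefinition is silently accepted.
        self.types[name] = fields


registry = TypeRegistry()
registry.define("square", [("side-length", "float")])
registry.define("square", [("side-length", "float")])  # same file recompiled
registry.define("square", [("side-length", "float"), ("color", "uint32")])
```

After these three calls the registry holds the newest layout and has recorded exactly one warning, for the third call.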

Method System

All type definitions should also define all the methods, in the order they appear in the vtable. I suspect GOAL had this as well because the method ordering otherwise seems random, and in some cases impossible to get right unless (at least) the number of methods was specified in the type declaration.
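
One way to picture a fixed method order is a list where a method's position is its vtable slot. The Python sketch below assumes the nine common methods occupy the first nine slots in the order they were listed earlier in this document, that a child's table begins with its parent's table, and that an override reuses the parent's slot; `extend_method_table` and the example method names are hypothetical.

```python
# Assumed slot order for the methods every type has.
COMMON_METHODS = [
    "new", "delete", "print", "inspect", "length",
    "asize-of", "copy", "relocate", "memusage",
]


def extend_method_table(parent_table, declared_methods):
    """Build a child's method table from its parent's table.

    Overrides keep the parent's slot; new methods are appended in the
    order they are declared.
    """
    table = list(parent_table)
    for name in declared_methods:
        if name not in table:
            table.append(name)
    return table


square_table = extend_method_table(COMMON_METHODS, ["print", "is-same-shape"])
rectangle_table = extend_method_table(square_table, ["is-same-shape", "area"])
```

With the order declared up front, the slot of every method is known from the type definition alone, regardless of the order in which method bodies are later defined.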

Todo

  • Kernel types that are built-in
  • Signed/unsigned for a few built-in type fields
  • Tests for field placement logic (probably a full compiler test?)
  • Bitfield types
  • Type redefinition tests (these are a pain and probably useless, might just wait for full compiler tests?)
  • Stuff for decompiler
    • What field is here?
    • Export all deftypes