aboutsummaryrefslogtreecommitdiff
path: root/doc
diff options
context:
space:
mode:
authorToni Uhlig <matzeton@googlemail.com>2023-07-16 02:03:33 +0200
committerToni Uhlig <matzeton@googlemail.com>2023-07-16 02:03:33 +0200
commit5a40295c4cf0af5ea8da9ced04a4ce7d3621a080 (patch)
treecb21506e7b04d10b45d6066a0ee1655563d5d52b /doc
Squashed 'flatcc/' content from commit 473da2a
git-subtree-dir: flatcc git-subtree-split: 473da2afa5ca435363f8c5e6569167aee6bc31c5
Diffstat (limited to 'doc')
-rw-r--r--doc/Grammar-2015-07-23.md42
-rw-r--r--doc/Grammar.README.md2
-rw-r--r--doc/benchmarks.md147
-rw-r--r--doc/binary-format.md1378
-rw-r--r--doc/builder.md1845
-rw-r--r--doc/eclectic.fbs12
-rw-r--r--doc/flatcc-help.md106
-rw-r--r--doc/json_parser_design.md138
-rw-r--r--doc/security.md272
9 files changed, 3942 insertions, 0 deletions
diff --git a/doc/Grammar-2015-07-23.md b/doc/Grammar-2015-07-23.md
new file mode 100644
index 0000000..cafc054
--- /dev/null
+++ b/doc/Grammar-2015-07-23.md
@@ -0,0 +1,42 @@
+# Formal Grammar of the schema language
+
+schema = include*
+ ( namespace\_decl | type\_decl | enum\_decl | root\_decl |
+ file_extension_decl | file_identifier_decl |
+ attribute\_decl | object )*
+
+include = `include` string\_constant `;`
+
+namespace\_decl = `namespace` ident ( `.` ident )* `;`
+
+attribute\_decl = `attribute` string\_constant `;`
+
+type\_decl = ( `table` | `struct` ) ident metadata `{` field\_decl+ `}`
+
+enum\_decl = ( `enum` | `union` ) ident [ `:` type ] metadata `{` commasep(
+enumval\_decl ) `}`
+
+root\_decl = `root_type` ident `;`
+
+field\_decl = ident `:` type [ `=` scalar ] metadata `;`
+
+type = `bool` | `byte` | `ubyte` | `short` | `ushort` | `int` | `uint` |
+`float` | `long` | `ulong` | `double`
+ | `string` | `[` type `]` | ident
+
+enumval\_decl = ident [ `=` integer\_constant ]
+
+metadata = [ `(` commasep( ident [ `:` scalar ] ) `)` ]
+
+scalar = integer\_constant | float\_constant | `true` | `false`
+
+object = { commasep( ident `:` value ) }
+
+value = scalar | object | string\_constant | `[` commasep( value ) `]`
+
+commasep(x) = [ x ( `,` x )\* ]
+
+file_extension_decl = `file_extension` string\_constant `;`
+
+file_identifier_decl = `file_identifier` string\_constant `;`
+
diff --git a/doc/Grammar.README.md b/doc/Grammar.README.md
new file mode 100644
index 0000000..15fdaac
--- /dev/null
+++ b/doc/Grammar.README.md
@@ -0,0 +1,2 @@
+Official flatbuffer grammer listed for reference.
+
diff --git a/doc/benchmarks.md b/doc/benchmarks.md
new file mode 100644
index 0000000..06863da
--- /dev/null
+++ b/doc/benchmarks.md
@@ -0,0 +1,147 @@
+# Benchmarks
+
+Benchmarks are defined for raw C structs, Googles `flatc` generated C++
+and the `flatcc` compilers C ouput.
+
+These can be run with:
+
+ scripts/benchmark.sh
+
+and requires a C++ compiler installed - the benchmark for flatcc alone can be
+run with:
+
+ test/benchmark/benchflatcc/run.sh
+
+this only requires a system C compiler (cc) to be installed (and
+flatcc's build environment).
+
+A summary for OS-X 2.2 GHz Haswell core-i7 is found below. Generated
+files for OS-X and Ubuntu are found in the benchmark folder.
+
+The benchmarks use the same schema and dataset as Google FPL's
+FlatBuffers benchmark.
+
+In summary, 1 million iterations runs at about 500-540MB/s at 620-700
+ns/op encoding buffers and 29-34ns/op traversing buffers. `flatc` and
+`flatcc` are close enough in performance for it not to matter much.
+`flatcc` is a bit faster encoding but it is likely due to less memory
+allocation. Throughput and time per operatin are of course very specific
+to this test case.
+
+Generated JSON parser/printer shown below, for flatcc only but for OS-X
+and Linux.
+
+
+## operation: flatbench for raw C structs encode (optimized)
+ elapsed time: 0.055 (s)
+ iterations: 1000000
+ size: 312 (bytes)
+ bandwidth: 5665.517 (MB/s)
+ throughput in ops per sec: 18158707.100
+ throughput in 1M ops per sec: 18.159
+ time per op: 55.070 (ns)
+
+## operation: flatbench for raw C structs decode/traverse (optimized)
+ elapsed time: 0.012 (s)
+ iterations: 1000000
+ size: 312 (bytes)
+ bandwidth: 25978.351 (MB/s)
+ throughput in ops per sec: 83263946.711
+ throughput in 1M ops per sec: 83.264
+ time per op: 12.010 (ns)
+
+## operation: flatc for C++ encode (optimized)
+ elapsed time: 0.702 (s)
+ iterations: 1000000
+ size: 344 (bytes)
+ bandwidth: 490.304 (MB/s)
+ throughput in ops per sec: 1425301.380
+ throughput in 1M ops per sec: 1.425
+ time per op: 701.606 (ns)
+
+## operation: flatc for C++ decode/traverse (optimized)
+ elapsed time: 0.029 (s)
+ iterations: 1000000
+ size: 344 (bytes)
+ bandwidth: 11917.134 (MB/s)
+ throughput in ops per sec: 34642832.398
+ throughput in 1M ops per sec: 34.643
+ time per op: 28.866 (ns)
+
+
+## operation: flatcc for C encode (optimized)
+ elapsed time: 0.626 (s)
+ iterations: 1000000
+ size: 336 (bytes)
+ bandwidth: 536.678 (MB/s)
+ throughput in ops per sec: 1597255.277
+ throughput in 1M ops per sec: 1.597
+ time per op: 626.074 (ns)
+
+## operation: flatcc for C decode/traverse (optimized)
+ elapsed time: 0.029 (s)
+ iterations: 1000000
+ size: 336 (bytes)
+ bandwidth: 11726.930 (MB/s)
+ throughput in ops per sec: 34901577.551
+ throughput in 1M ops per sec: 34.902
+ time per op: 28.652 (ns)
+
+## JSON benchmark
+
+*Note: this benchmark is only available for `flatcc`. It uses the exact
+same data set as above.*
+
+The benchmark uses Grisu3 floating point parsing and printing algorithm
+with exact fallback to strtod/sprintf when the algorithm fails to be
+exact. Better performance can be gained by enabling inexact Grisu3 and
+SSE 4.2 in build options, but likely not worthwhile in praxis.
+
+## operation: flatcc json parser and printer for C encode (optimized)
+
+(encode means printing from existing binary buffer to JSON)
+
+ elapsed time: 1.407 (s)
+ iterations: 1000000
+ size: 722 (bytes)
+ bandwidth: 513.068 (MB/s)
+ throughput in ops per sec: 710619.931
+ throughput in 1M ops per sec: 0.711
+ time per op: 1.407 (us)
+
+## operation: flatcc json parser and printer for C decode/traverse (optimized)
+
+(decode/traverse means parsing json to flatbuffer binary and calculating checksum)
+
+ elapsed time: 2.218 (s)
+ iterations: 1000000
+ size: 722 (bytes)
+ bandwidth: 325.448 (MB/s)
+ throughput in ops per sec: 450758.672
+ throughput in 1M ops per sec: 0.451
+ time per op: 2.218 (us)
+
+## JSON parsing and printing on same hardware in Virtual Box Ubuntu
+
+Numbers for Linux included because parsing is significantly faster.
+
+## operation: flatcc json parser and printer for C encode (optimized)
+
+ elapsed time: 1.210 (s)
+ iterations: 1000000
+ size: 722 (bytes)
+ bandwidth: 596.609 (MB/s)
+ throughput in ops per sec: 826328.137
+ throughput in 1M ops per sec: 0.826
+ time per op: 1.210 (us)
+
+## operation: flatcc json parser and printer for C decode/traverse
+
+ elapsed time: 1.772 (s)
+ iterations: 1000000
+ size: 722 (bytes)
+ bandwidth: 407.372 (MB/s)
+ throughput in ops per sec: 564227.736
+ throughput in 1M ops per sec: 0.564
+ time per op: 1.772 (us)
+
diff --git a/doc/binary-format.md b/doc/binary-format.md
new file mode 100644
index 0000000..7c2f53e
--- /dev/null
+++ b/doc/binary-format.md
@@ -0,0 +1,1378 @@
+# FlatBuffers Binary Format
+
+
+<!-- vim-markdown-toc GFM -->
+
+* [Overview](#overview)
+* [Memory Blocks](#memory-blocks)
+* [Example](#example)
+* [Primitives](#primitives)
+ * [Numerics](#numerics)
+ * [Boolean](#boolean)
+ * [Format Internal Types](#format-internal-types)
+ * [Scalars](#scalars)
+ * [Structs](#structs)
+* [Internals](#internals)
+* [Type Hashes](#type-hashes)
+ * [Conflicts](#conflicts)
+ * [Type Hash Variants](#type-hash-variants)
+* [Unions](#unions)
+* [Optional Fields](#optional-fields)
+* [Alignment](#alignment)
+* [Default Values and Deprecated Values](#default-values-and-deprecated-values)
+* [Schema Evolution](#schema-evolution)
+* [Keys and Sorting](#keys-and-sorting)
+* [Size Limits](#size-limits)
+* [Verification](#verification)
+* [Risks](#risks)
+* [Nested FlatBuffers](#nested-flatbuffers)
+* [Fixed Length Arrays](#fixed-length-arrays)
+* [Big Endian FlatBuffers](#big-endian-flatbuffers)
+* [StructBuffers](#structbuffers)
+* [StreamBuffers](#streambuffers)
+* [Bidirectional Buffers](#bidirectional-buffers)
+* [Possible Future Features](#possible-future-features)
+ * [Force Align](#force-align)
+ * [Mixins](#mixins)
+
+<!-- vim-markdown-toc -->
+
+
+## Overview
+
+## Memory Blocks
+
+A FlatBuffers layout consists of the following types of memory blocks:
+
+- header
+- table
+- vector
+- string
+- vtable
+- (structs)
+
+Each of these have a contigous memory layout. The header references the
+root table. Every table references a vtable and stores field in a single
+block of memory. Some of these fields can be offsets to vectors, strings
+and vectors. Vectors are one-dimensional blocks where each element is
+self-contained or stores reference to table or a string based on schema
+type information. A vtable decides which fields are present in table and
+where they are stored. Vtables are the key to support schema evolution.
+A table has no special rules about field ordering except fields must be
+properly aligned, and if they are ordered by size the will pack more
+densely but possibly more complicated to construct and the schema may
+provide guidelines on the preferred order. Two vtables might mean
+different things but accidentally have the same structure. Such vtables
+can be shared between different tables. vtable sharing is important when
+vectors store many similar tables. Structs a dense memory regions of
+scalar fields and smaller structs. They are mostly found embedded in
+tables but they are independent blocks when referenced from a union or
+union vector, or when used as a buffer root. Structs hold no references.
+
+Space between the above blocks are zero padded and present in order to
+ensure proper alignment. Structs must be packed as densely as possible
+according the alignment rules that apply - this ensures that all
+implementations will agree on the layout. The blocks must not overlap in
+memory but two blocks may be shared if they represent the same data such
+as sharing a string.
+
+FlatBuffers are constructed back to front such that lower level objects
+such as sub-tables and strings are created first in stored last and such
+that the root object is stored last and early in the buffer. See also
+[Stream Buffers](#stream-buffers) for a proposed variation over this
+theme.
+
+All addressing in FlatBuffers are relative. The reason for this? When
+navigating the buffer you don't need to store the buffer start or the
+buffer length, you can also find a new reference relative the reference
+you already have. vtables require the table location to find a field,
+but a table field referencing a string field only needs the table field
+location, not the table start in order to find the referenced string.
+This results in efficient navigation.
+
+## Example
+
+Uoffsets add their content to their address and are positive while a
+tables offset to its vtable is signed and is substracted. A vtables
+element is added to the start of the table referring to the vtable.
+
+
+Schema ([eclectic.fbs]) :
+
+ namespace Eclectic;
+
+ enum Fruit : byte { Banana = -1, Orange = 42 }
+ table FooBar {
+ meal : Fruit = Banana;
+ density : long (deprecated);
+ say : string;
+ height : short;
+ }
+ file_identifier "NOOB";
+ root_type FooBar;
+
+JSON :
+
+ { "meal": "Orange", "say": "hello", "height": -8000 }
+
+Buffer :
+
+ header:
+
+ +0x0000 00 01 00 00 ; find root table at offset +0x00000100.
+ +0x0004 'N', 'O', 'O', 'B' ; possibly our file identifier
+
+ ...
+
+ table:
+
+ +0x0100 e0 ff ff ff ; 32-bit soffset to vtable location
+ ; two's complement: 2^32 - 0xffffffe0 = -0x20
+ ; effective address: +0x0100 - (-0x20) = +0x0120
+ +0x0104 00 01 00 00 ; 32-bit uoffset string field (FooBar.say)
+ ; find string +0x100 = 256 bytes _from_ here
+ ; = +0x0104 + 0x100 = +0x0204.
+ +0x0108 42d ; 8-bit (FooBar.meal)
+ +0x0109 0 ; 8-bit padding
+ +0x010a -8000d ; 16-bit (FooBar.height)
+ +0x010c ... ; (first byte after table end)
+
+ ...
+
+ vtable:
+
+ +0x0120 0c 00 ; vtable length = 12 bytes
+ +0x0122 0c 00 ; table length = 12 bytes
+ +0x0124 08 00 ; field id 0: +0x08 (meal)
+ +0x0126 00 00 ; field id 1: <missing> (density)
+ +0x0128 04 00 ; field id 2: +0004 (say)
+ +0x012a 0a 00 ; field id 3: +0x0a (height)
+
+ ...
+
+ string:
+
+ +0x0204 05 00 00 00 ; vector element count (5 ubyte elements)
+ +0x0208 'h' 'e' ; vector data
+ +0x020a 'l' 'l' ; vector data
+ +0x020c 'o' ; vector data
+ +0x020d 00 ; zero termination
+ ; special case for string vectors
+
+ ...
+
+
+Actual FlatCC generated buffer :
+
+ Eclectic.FooBar:
+ 0000 08 00 00 00 4e 4f 4f 42 e8 ff ff ff 08 00 00 00 ....NOOB........
+ 0010 2a 00 c0 e0 05 00 00 00 68 65 6c 6c 6f 00 00 00 *.......hello...
+ 0020 0c 00 0c 00 08 00 00 00 04 00 0a 00 ............
+
+
+Note that FlatCC often places vtables last resulting in `e0 ff ff ff`
+style vtable offsets, while Googles flatc builder typically places them
+before the table resulting in `20 00 00 00` style vtable offsets which
+might help understand why the soffset is subtracted from and not added
+to the table start. Both forms are equally valid.
+
+
+## Primitives
+
+Primitives are data used to build more complex structures such as
+strings, vectors, vtables and tables. Although structs are not strictly
+a primitive, it helps to view them as self-contained primitives.
+
+
+### Numerics
+
+FlatBuffers are based on the following primitives that are 8, 16, 32 and
+64 bits in size respectively (IEEE-754, two's complement, little endian):
+
+ uint8, uint16, uint32, uint64 (unsigned)
+ int8, int16, int32, int64 (two's complement)
+ float32, float64 (IEEE-754)
+
+
+### Boolean
+
+ flatbuffers_bool (uint8)
+ flatbuffers_true (flatbuffers_bool assign as 1, read as != 0)
+ flatbuffers_false (flatbuffers_bool = 0)
+
+Note that a C99 `bool` type has no defined size or sign so it is not an
+exact representation of a flatbuffers boolean encoding.
+
+When a stored value is interpreted as boolean it should not be assumed
+to be either 1 or 0 but rather as `not equal to 0`. When storing a
+boolean value or when converting a boolean value to integer before
+storing, the value should be 1 for true and 0 for false, In C this can
+be done using `!!x`.
+
+### Format Internal Types
+
+ flatbuffers_union_type_t (uint8, NONE = 0, 0 <= type <= 255)
+ flatbuffers_identifier_t (uint8[4])
+ flatbuffers_uoffset_t (uin32)
+ flatbuffers_soffset_t (int32),
+ flatbuffers_voffset_t (uint16)
+
+These types can change in FlatBuffer derived formats, but when we say
+FlatBuffers without further annotations, we mean the above sizes in
+little endian encoding. We always refer to specific field type and not a
+specific field size because this allows us to derive new formats easily.
+
+
+### Scalars
+
+To simpify discussion we use the term scalars to mean integers, boolean,
+floating point and enumerations. Enumerations always have an underlying
+signed or unsigned integer type used for its representation.
+
+Scalars are primitives that has a size between 1 and 8 bytes. Scalars
+are stored aligned to the own size.
+
+The format generally supports vectors and structs of any scalar type but
+some language bindings might not have support for all combinations such
+as arrays of booleans.
+
+An special case is the encoding of a unions type code which internally
+is an enumeration but it is not normally permitted in places where we
+otherwise allow for scalars.
+
+Another special case is enumerations of type boolean which may not be
+widely supported, but possible. The binary format is not concerned with
+this distinction because a boolean is just an integer at this level.
+
+### Structs
+
+A struct is a fixed lengthd block of a fixed number of fields in a specific order
+defined by a schema. A field is either a scalar, another struct or a fixed length
+array of these, or a fixed length char array. A struct cannot contain fields that
+contain itself directly or indirectly. A struct is self-contained and has no
+references. A struct cannot be empty.
+
+A schema cannot change the layout of a struct without breaking binary
+compatibility, unlike tables.
+
+When used as a table field, a struct is embedded within the table block
+unless it is union value. A vector of structs are placed in a separate
+memory block, similar to vectors of scalars. A vector of unions that has
+a struct as member will reference the struct as an offset, and the
+struct is then an independent memory block like a table.
+
+FlatCC supports that a struct can be the root object of a FlatBuffer, but
+most implementations likely won't support this. Structs as root are very
+resource efficient.
+
+Structs cannot have vectors but they can have fixed length array fields. which
+are equivalent to stacking multiple non-array fields of the same type after each
+other in a compact notation with similar alignment rules. Additionally arrays
+can be of char type to have a kind of fixed length string. The char type is not
+used outside of char arrays. A fixed length array can contain a struct that
+contains one more fixed length arrays. If the char array type is not support, it
+can be assumed to be a byte array.
+
+
+## Internals
+
+All content is little endian and offsets are 4 bytes (`uoffset_t`).
+A new buffer location is found by adding a uoffset to the location where
+the offset is stored. The first location (offset 0) points to the root
+table. `uoffset_t` based references are to tables, vectors, and strings.
+References to vtables and to table fields within a table have a
+different calculations as discussed below.
+
+A late addition to the format allow for adding a size prefix before the
+standard header. When this is done, the builder must know about so it
+can align content according to the changed starting position. Receivers
+must also know about the size field just as they must know about the
+excpected buffer type.
+
+The next 4 bytes (`sizeof(uoffset_t)`) might represent a 4 byte buffer
+identifier, or it might be absent but there is no obvious way to know
+which. The file identifier is typically ASCII characters from the
+schemas `file_identifier` field padded with 0 bytes but may also contain
+any custom binary identifier in little endian encoding. See
+[Type-Hashes](#type-hashes). The 0 identifier should be avoided because
+the buffer might accidentally contain zero padding when an identifier is
+absent and because 0 can be used by API's to speficy that no identifier
+should be stored.
+
+When reading a buffer, it should be checked that the length is at least
+8 bytes (2 * `sizeof(uoffset_t)`) Otherwise it is not safe to check the
+file identifier.
+
+The root table starts with a 4 byte vtable offset (`soffset_t`). The
+`soffset_t` has the same size as `uoffset_t` but is signed.
+
+The vtable is found by *subtracting* the signed 4 byte offset to the
+location where the vtable offset is stored. Note that the `FlatCC`
+builder typically stores vtables at the end of the buffer (clustered
+vtables) and therefore the vtable offset is normally negative. Other
+builders often store the vtable before the table unless reusing an
+existing vtable and this makes the soffset positive.
+(Nested FlatBuffers will not store vtables at the end because it would
+break compartmentalization).
+
+The vtable is a table of 2 byte offsets (`voffset_t`). The first two
+entreis are the vtable size in bytes and the table size in bytes. The
+next offset is the vtable entry for table field 0. A vtable will always
+have as many entries as the largest stored field in the table, but it
+might skip entries that are not needed or known (version independence) -
+therefore the vtable length must be checked. The table length is only
+needed by verifiers. A vtable lookup of an out of range table id is the
+same as a vtable entry that has a zero entry and results in a default
+value for the expected type. Multiple tables may shared the same vtable,
+even if they have different types. This is done via deduplication during
+buffer construction. A table field id maps to a vtable offset using the
+formula `vtable-start + sizeof(voffset_t) * (field-id + 2)`, iff the
+result is within the vtable size.
+
+The vtable entry stores a byte offset relative to the location where the
+`soffset_t` where stored at the start of the table. A table field is
+stored immediately after the `soffset_t` or may contain 0 padding. A
+table field is always aligned to at least its own size, or more if the
+schema demands it. Strings, sub-tables and vectors are stored as a
+4-byte `uoffset_t`. Such sub-elements are always found later in the
+buffer because `uoffset_t` is unsigned. A struct is stored in-place.
+Conceptually a struct is not different from an integer in this respect,
+just larger.
+
+If a sub-table or vector is absent and not just without fields, 0
+length, the containing table field pointing the sub-table or vector
+should be absent. Note that structs cannot contain sub-tables or
+vectors.
+
+A struct is always 0 padded up to its alignment. A structs alignment
+is given as the largest size of any non-struct member, or the alignment
+of a struct member (but not necessarily the size of a struct member), or
+the schema specified aligment.
+
+A structs members are stored in-place so the struct fields are known
+via the the schema, not via information in the binary format
+
+An enum is represent as its underlying integer type and can be a member
+of structs, fields and vectors just like integer and floating point
+scalars.
+
+A table is always aligned to `sizeof(uoffset_t)`, but may contain
+internal fields with larger alignment. That is, the table start or end
+are not affected by alignment requirements of field members, unlike
+structs.
+
+A string is a special case of a vector with the extra requirement that a
+0 byte must follow the counted content, so the following also applies to
+strings.
+
+A vector starts with a `uoffset_t` field the gives the length in element
+counts, not in bytes. Table fields points to the length field of a
+vector, not the first element. A vector may have 0 length. Note that the
+FlatCC C binding will return a pointer to a vectors first element which
+is different from the buffer internal reference.
+
+All elements of a vector has the same type which is either a scalar type
+including enums, a struct, or a uoffset reference where all the
+reference type is to tables all of the same type, all strings, or,
+for union vectors, references to members of the same union. Because a
+union member can be a struct, it is possible to have vectors that
+reference structs instead of embeddding them, but only via unions. It is
+not possible to have vectors of vectors other than string vectors,
+except indirectly via vectors containing tables.
+
+A vectors first element is aligned to the size of `uoffset_t` or the
+alignment of its element type, or the alignment required by the schema,
+whichever is larger. Note that the vector itself might be aligned to 4
+bytes while the first element is aligned to 8 bytes.
+
+(Currently the schema semantics do no support aligning vectors beyond
+their element size, but that might change and can be forced when
+building buffers via dedicated api calls).
+
+Strings are stored as vectors of type `ubyte_t`, i.e. 8-bit elements. In
+addition, a trailing zero is always stored. The trailing zero is not
+counted in the length field of the vector. Technically a string is
+supposed to be valid UTF-8, but in praxis it is permitted that strings
+contain 0 bytes or other invalid sequences. It is recommended to
+explicitly check strings for UTF-8 conformance when this is required
+rather than to expect this to alwways be true. However, the ubyte vector
+should be preferred for binary data.
+
+A string, vector or table may be referenced by other tables and vectors.
+This is known as a directed acyclic graph (DAG). Because uoffsets are
+unsigned and because uoffsets are never stored with a zero value (except
+for null entries in union vectors), it is not possible for a buffer to
+contain cycles which makes them safe to traverse with too much fear of
+excessive recursion. This makes it possible to efficiently verify that a
+buffer does not contain references or content outside of its expected
+boundaries.
+
+A vector can also hold unions, but it is not supported by all
+implementations. A union vector is in reality two separate vectors: a
+type vector and an offset vector in place of a single unions type and
+value fields in table. See unions.
+
+
+## Type Hashes
+
+A type hash is a 32-bit value defined as the fnv1a32 hash of a tables or
+a structs fully qualified name. If the fnv1a32 hash returns 0 it should
+instead hash the empty string. 0 is used to indicate that a buffer
+should not store an identifier.
+
+Every table and struct has a name and optionally also a namespace of one
+or more levels. The fully qualified name is optional namespace followed
+by the type name using '.' as a separator. For example
+"MyGame.Sample.Monster", or "Eclectic.FooBar".
+
+The type hash can be used as the buffer identifier instead of the schema
+provided `file_identifier`. The type hash makes it possible to
+distinguish between different root types from the same schema, and even
+across schema as long as the namespace is unique.
+
+Type hashes introduce no changes to the binary format but the application
+interface must choose to support user defined identifiers or explicitly
+support type hashes. Alternatively an application can peak directly into
+the buffer at offset 4 (when `uoffset_t` is 4 bytes long).
+
+FlatCC generates the following identifier for the "MyGame.Sample.Monster"
+table:
+
+ #define MyGame_Sample_Monster_type_hash ((flatbuffers_thash_t)0xd5be61b)
+ #define MyGame_Sample_Monster_type_identifier "\x1b\xe6\x5b\x0d"
+
+But we can also
+[compute one online](https://www.tools4noobs.com/online_tools/hash/)
+for our example buffer:
+
+ fnv1a32("Eclectic.FooBar") = 0a604f58
+
+Thus we can open a hex editor and locate
+
+ +0x0000 00 01 00 00 ; find root table at offset +0x00000100.
+ +0x0004 'N', 'O', 'O', 'B' ; possibly our file identifier
+
+and replace it with
+
+ +0x0000 00 01 00 00 ; find root table at offset +0x00000100.
+ +0x0004 58 4f 60 0a ; very likely our file identifier identifier
+
+or generate it with `flatcc`:
+
+ $ bin/flatcc --stdout doc/eclectic.fbs | grep FooBar_type
+ #define Eclectic_FooBar_type_hash ((flatbuffers_thash_t)0xa604f58)
+ #define Eclectic_FooBar_type_identifier "\x58\x4f\x60\x0a"
+
+
+The following snippet implements fnv1a32, and returns the empty string
+hash if the hash accidentially should return 0:
+
+
+ static inline flatbuffers_thash_t flatbuffers_type_hash_from_name(const char *name)
+ {
+ uint32_t hash = 2166136261UL;
+ while (*name) {
+ hash ^= (uint32_t)*name;
+ hash = hash * 16777619UL;
+ ++name;
+ }
+ if (hash == 0) {
+ hash = 2166136261UL;
+ }
+ return hash;
+ }
+
+### Conflicts
+
+It is possible to have conficts between two type hashes although the
+risk is small. Conflicts are not important as long as an application can
+distinguish between all the types it may encouter in actual use. A
+switch statement in C will error at compile time for two cases that have
+the same value, so the problem is easily detectable and fixable by
+modifying a name or a namespace.
+
+For global conflict resolution, a type should be identified by its fully
+qualified name using adequate namespaces. This obviously requires it to
+be stored separate from the buffer identifier due to size constraints.
+
+
+### Type Hash Variants
+
+If an alternative buffer format is used, the type hash should be
+modified. For example, if `uoffset_t` is defined as a 64-bit value, the
+fnv1a64 hash should be used instead. For big endian variants the hash
+remains unchanged but is byteswapped. The application will use the same
+id while the acces layer will handle the translation.
+
+For buffers using structs as roots, the hash remains unchanged because
+the struct is a unique type in schema. In this way a receiver that does
+not handle struct roots can avoid trying to read the root as a table.
+
+For futher variations of the format, a format identifier can be inserted
+in front of the namespace when generating the hash. There is no formal
+approach to this, but as an example, lets say we want to use only 1 byte
+per vtable entry and identify these buffers with type hash using the
+prefix "ebt:" for example buffer type. We then have the type hash:
+
+ #define type_hash_prefix "ebt:"
+
+ hash = fnv1a32(type_hash_prefix "Eclectic.FooBar");
+ hash = hash ? hash : fnv1a32(type_hash_prefix);
+
+If the hash returns 0 we hash the prefix.
+
+
+## Unions
+
+A union is a contruction on top of the above primitives. It consists of
+a type and a value.
+
+In the schema a union type is a set of table types with each table name
+assigned a type enumeration starting from 1. 0 is the type NONE meaning
+the union has not value assigned. The union type is represented as a
+ubyte enum type, or in the binary format as a value of type
+`union_type_t` which for standard FlatBuffers is an 8-bit unsigned code
+with 0 indicating the union stores not value and a non-zero value
+indicating the type of the stored union.
+
+A union is stored in a table as a normal sub-table reference with the
+only difference being that the offset does not always point to a table
+of the same type. The 8-bit union type is stored as a separate table
+field conventially named the same as the union value field except for a
+`_type` suffix. The value (storing the table offset) MUST have a field
+ID that is exactly one larger than the type field. If value field is
+present the type field MUST also be present. If the type is NONE the
+value field MUST be absent and the type field MAY be absent because a
+union type always defaults to the value NONE.
+
+Vectors of unions is a late addition (mid 2017) to the FlatBuffers
+format. FlatCC supports union vectors as of v0.5.0.
+
+Vectors of unions have the same two fields as normal unions but they
+both store a vector and both vectors MUST have the same length or both
+be absent from the table. The type vector is a vector of 8-bit enums and
+the value vector is a vector of table offsets. Obviously each type
+vector element represents the type of the table in the corresponding
+value element. If an element is of type NONE the value offset must be
+stored as 0 which is a circular reference. This is the only offset that
+can have the value 0.
+
+A later addition (mid 2017) to the format allows for structs and strings
+to also be member of a union. A union value is always an offset to an
+independent memory block. For strings this is just the offset to the
+string. For tables it is the offset to the table, naturally, and for
+structs, it is an offset to an separate aligned memory block that holds
+a struct and not an offset to memory inside any other table or struct.
+FlatCC supports mixed type unions and vectors of these as of v0.5.0.
+
+## Optional Fields
+
+As of mid 2020 the FlatBuffers format introduced optional scalar table fields.
+There is no change to the binary schema, but the semantics have changed slightly
+compared to ordinary scalar fields (which remain supported as is): If an
+optional field is not stored in a table, it is considered to be a null value. An
+optinal scalar field will have null as its default value, so any representable
+scalar value will always be stored in the buffer, unlike other scalar fields
+which by default do not store the field if the value matches the default numeric
+value. This was already possible before by using `force_add` semantics to force
+a value to be written even if it was matching the default value, and by
+providing an `is_present` test when reading a value so that it would be possible
+to distinguish between a value that happened to be a default value, and a value
+that was actually absent. However, such techniques were ad-hoc. Optional
+fields formalize the semantics of null values for scalars. Other field types
+already have meaningful null values. Only table fields can be optional so struct
+fields must always assign a value to all members.
+
+## Alignment
+
+All alignments are powers of two between 1 and 256. Large alignments are
+only possible via schema specified alignments. The format naturally has
+a maximum alignment of the largest scalar it stores, which is a 64-bit
+integer or floating point value. Because C malloc typically returns
+buffers aligned to a least 8 bytes, it is often safe to place buffers in
+heap allocated memory, but if the target system does not permit
+unaligned access, or is slow on unaligned access, a buffer should be
+placed in sufficiently aligned memory. Typically it is a good idea to
+place buffer in cacheline aligned boundary anyway.
+
+A buffers alignment is the same as the largest alignment of any
+object or struct it contains.
+
+The buffer size is not guaranteed to be aligned to its own alignment
+unlike struct. Googles `flatc` builder does this, at least when size
+prefixed. The `FlatCC` tool currently does not, but it might later add
+an option to pad up to alignment. This would make it simpler to stack
+similar typed buffers in file - but users can retrieve the buffers
+alignment and do this manually. Thus, when stacking size prefixed
+buffers, each buffer should start aligned to its own size starting at
+the size field, and should also be zero padded up to its own alignment.
+
+
+## Default Values and Deprecated Values
+
+A table can can omit storing any field that is not a required field. For
+strings, vectors and tables this result in returning a null value
+different from an empty value when reading the buffer. Struct fields not
+present in table are also returned as null.
+
+All fields that can return null do not have a default value. Other
+values, which are integers, floats and enumerations, can all have
+default values. The default value is returned if not found in the table.
+
+If a default value is not specified the default defaults to zero for the
+corresponding type. If an enumeration does not have a 0 value and no
+explicit default value, this is a schema error.
+
+When building a buffer the builder will compare a stored field to the
+known default value. If it matches the field will simple be skipped.
+Some builder API's makes it possible to force a default value to be
+stored and to check if a field is missing when reading the buffer. This
+can be used to handle NULL values differently from default or missing
+values.
+
+A deprecated field is a schema construct. The binary format either stores
+a table field, or it does not.
+
+A deprecated field should be treated as not available, as in no way to
+read the value as opposed to returning a default value regardless of
+whether the field is present or not. If they for some reason are made
+accessible the verifier must also understand and verify these fields.
+
+A deprecated field also cannot be written to a new buffer, although if
+it against guidelines remains possible to do so, it should be done as if
+the field was not deprecated.
+
+Structs cannot have default values and cannot have deprecated fields in
+stadard FlatBuffers. FlatCC supports marking a struct field as
+deprecated. This implies the field will always be zeroed and with no
+trivial accessors. A struct can never change size without breaking
+support for schema evolution.
+
+FlatCC JSON parsers allow structs to only set some values. Remaining
+values will be implicitly zeroed. The C API for writing buffers do not
+have this concern because a struct can just be written as a C struct
+so there is no control over which fields a user choose to set or zero.
+However, structs should be zeroed and padded where data is not otherwise
+set. This makes it possible to hash and integrity check structs (though
+this is not an integral part of the format).
+
+
+## Schema Evolution
+
+A table has a known set of low-valued field identifiers. Any unused
+field id can be used in a future version. If a field (as is normal) is
+implicitly assigned an id, new fields can only be added at the end of
+the table. Internally this translates into new versions getting ever
+larger vtables. Note that vtables are only stored as large as needed for
+the actual content of table, so a rarely used new field will not cause
+vtables to grew when unused.
+
+Enumarations may not change values in future versions. Unions may only
+added new table names to the end of the union list.
+
+Structs cannot change size nor content. They cannot evolve. FlatCC
+permits deprecating fields which means old fields will be zeroed.
+
+Names can be changed to the extend it make sense to the applications already
+written around the schema, but it may still break applications relying
+on some reflective information. For example, a reader may provide the
+string representation of a numeric enum value.
+
+New types can be added. For example adding a new table is always safe
+as long as it does not conflict with any existing schemas using the same
+namespace.
+
+Required fields cannot stop being required and they cannot be deprecated.
+
+Various attributes and other changes form a gray area that will not make
+the binary format unsafe but may still break due to changes in code
+generation, serialization to JSON, or similar. For example, a generated
+constructor that creates a table from a list of positional arguments
+might break if the field order changes or grows or have fields
+deprecated. JSON parsers could cease to work on old formats if base64
+serialization is added subsequently, and so on.
+
+## Keys and Sorting
+
+Keys and sorting is a meta construct driven by the schema. The binary
+format has no special concept of keys and sorting and a vector can be
+sorted by one of several keys so it makes no sense to enforce a specific
+order.
+
+The basic FlatBuffers format only permit at most one key and generally
+sorts vectors by that key during buffer construction. FlatCC does not do
+this both because sorting is not practical while building the buffer and
+because FlatCC supports sorting by one of several keys. Thus, in general
+it is not safe to assume that a vector is sorted, but it can be sorted
+if needed.
+
+## Size Limits
+
+A buffer should never be larger than `2^(sizeof(soffset_t) * 8 - 1) - 1`
+or `2^31 - 1` i.e. 2GB for standard FlatBuffers. Beyond this safe it is
+not safe to represent vtable offsets and implementations can no longer
+use signed types to store `uoffset_t` values. This limit also ensures
+that all vectors can be represented safely with a signed 32-bit length
+type.
+
+The application interface is likely to use a native type for
+representing sizes and vector indices. If this type is smaller that
+`sizeof(soffset_t)` or equivalently `sizeof(uoffset_t)` there is a risk
+of overflow. The simplest way to avoid any issues is to limit the
+accepted buffer size of the native size type. For example, on some
+embedded microprocessor systems a C compiler might have a 16-bit int and
+`size_t` type, even if it supports `uint32_t` as well. In this the safe
+assumption is to limit buffers to `2^15 - 1` which is very likely more
+than sufficient on such systems.
+
+A builder API might also want to prevent vectors from being created when
+they cannot stay within the size type of the platform when the element
+size is multipled by the element count. This is deceiving because the
+element count might easily be within range. Such issues will be rare in
+praxis but they can be the source of magnificent vulnerabilites if not
+handled appropriately.
+
+
+## Verification
+
+Verification as discussed here is not just about implementing a
+verifier. It is as much requirements that any builder must fulfill.
+
+The objective of verification is to make it safe to read from an
+untrusted buffer, or a trusted buffer that accidentally has an
+unexpected type. The verifier does not guarantee that the type is indeed
+the expeced type exact it can read the buffer identifier if present
+which is still no guarantee. The verifier cannot make it safe to modify
+buffers in-place because the cost of doing such a check would be
+prohitive in the average case.
+
+A buffer verifier is expected to verify that all objects (strings,
+vectors and tables) do not have an end position beyond the externally
+specified buffer size and that all offset are aligned relative to offset
+zero, and sometimes also relative to actually specified buffer (but
+sometimes it is desireable to verify buffers that are not stored
+aligned, such as in network buffers).
+
+A verifier primarily checks that:
+
+- the buffer is at least `2 * sizeof(uoffset_t)` such that the root
+ root table and buffer identifier can be checked safely.
+- any offset being chased is inside the buffer that any data accessed
+ from that resuling location is entirely inside the buffer and aligned
+ notably the vtables first entry must be valid before the vtable can
+ safely validate itself, but this also applies to table, string and
+ vector fields.
+- any uoffset has size of at least `sizeof(uoffse_t)` to aviod
+ self-reference and is no larger than the largest positive soffset_t
+ value of same size - this ensures that implementations can safely add
+ uoffset_t even if converting to signed first. It also, incidentally,
+ ensures compatibility with StreamBuffers - see below.
+- vtable size is aligned and does not end outside buffer.
+- vtable size is at least the two header fields
+ (`2 * `sizeof(voffset_t)`).
+- required table fields are present.
+- recursively verify all known fields and ignore other fields. Unknown
+ fields are vtable entries after the largest known field ID of a table.
+ These should be ignored in order to support forward versioning.
+- deprecated fields are valid if accessors are available to do so, or
+ ignore if the there is no way to access the field by application code.
+- vectors end within the buffer.
+- strings end within the buffer and has a zero byte after the end which
+ is also within the buffer.
+- table fields are aligned relative to buffer start - both structs,
+ scalars, and offset types.
+- table field size is aligned relative to field start.
+- any table field does not end outside the tables size as given by the
+ vtable.
+- table end (without chasing offsets) is not outside buffer.
+- all data referenced by offsets are also valid within the buffer
+ according the type given by the schema.
+- verify that recursion depth is limited to a configurable acceptable
+ level for the target system both to protect itself and such that
+ general recursive buffer operations need not be concerned with stack
+ overflow checks (a depth of 100 or so would normally do).
+- verify that if the union type is NONE the value (offset) field is absent and
+ if it is not NONE that the value field is present. If the union type
+ is known, the table should be verified. If the type is not known
+ the table field should be ignored. A reader using the same schema would
+ see the union as NONE. An unknown union is not an error in order to
+ support forward versioning.
+- verifies the union value according to the type just like any other
+ field or element.
+- verify that a union vector always has type vector if the offset vector
+ is present and vice versa.
+
+A verifier may choose to reject unknown fields and union types, but this
+should only be an user selectable option, otherwise schema evolution
+will not be possible.
+
+A verifier needs to be very careful in how it deals with overflow and
+signs. Vector elements multiplied by element size can overflow. Adding
+an invalid offset might become valid due to overflow. In C math on
+unsigned types yield predictable two's complement overflow while signed
+overflow is undefined behavior and can and will result in unpredictable
+values with modern optimizing compilers. The upside is that if the
+verifier handles all this correctly, the application reader logic can be
+much simpler while staying safe.
+
+
+A verifier does __not__ enforce that:
+
+- structs and other table fields are aligned relative to table start because
+ tables are only aligned to their soffset field. This means a table cannot be
+ copied naively into a new buffer even if it has no offset fields.
+- the order of individual fields within a table. Even if a schema says
+ something about ordering this should be considered advisory. A
+ verifier may additionally check for ordering for specific
+ applications. Table order does not affect safety nor general buffer
+ expectations. Ordering might affect size and performance.
+- sorting as specified by schema attributes. A table might be sorted in
+ different ways and an implementation might avoid sorting for
+ performance reasons and other practical reasons. A verifier may,
+ however, offer additional verification to ensure specific vectors or
+ all vectors are sorted according to schema or other guidelines. Lack
+ of sorting might affect expected behavior but will not make the buffer
+ unsafe to access.
+- that structures do not overlap. Overlap can result in vulnerabilities
+ if a buffer is modified, and no sane builder should create overlaps
+ other than proper DAGs except via a separate compression/decopression
+ stage of no interest to the verifier.
+- strings are UTF-8 compliant. Lack of compliance may affect expections
+ or may make strings appear shorter or garbled. Worst case a naive
+ UTF-8 reader might reach beyond the end when observing a lead byte
+ suggest data after buffer end, but such a read should discover the
+ terminal 0 before things get out of hand. Being relaxed permits
+ specific use cases where UTF-8 is not suitable without giving up the
+ option to verify buffers. Implementations can add additional
+ verification for specific usecases for example by providing a
+ strict-UTF8 flag to a verifier or by verifying at the application
+ level. This also avoids unnecessary duplicate validation, for example
+ when an API first verifies the buffer then converts strings to an
+ internal heap representation where UTF-8 is validated anyway.
+- default values are not stored. It is common to force default values to
+ be stored. This may be used to implement NULL values as missing
+ values different from default values.
+- enum values are only among the enum values of its type. There are many
+ use cases where it is convenient to add enum values for example flags
+ or enums as units e.g. `2 * kilo + 3 * ounce. More generally ordinary
+ integers may have value range restrictions which is also out of scope
+ for verifier. An application may provide additional verification when
+ requested.
+
+More generally we can say that a verifier is a basic fast assurance that
+the buffer is safe to access. Any additional verification is application
+specific. The verifier makes it safe to apply secondary validation.
+Seconary validation could be automated via schema attributes and may be
+very useful as such, but it is a separate problem and out of scope for a
+core binary format verifier.
+
+A buffer identifier is optional so the verifier should be informed
+whether an identifier must match a given id. It should check both ASCII
+text and zero padding not just a string compare. It is non-trivial to
+decide if the second buffer field is an identifier, or some other data,
+but if the field does not match the expected identifier, it certainly
+isn't what is expected.
+
+Note that it is not entirely trivial to check vector lengths because the
+element size must be mulplied by the stored element count. For large
+elements this can lead to overflows.
+
+
+## Risks
+
+Because a buffer can contain DAGs constructed to explode exponentiall,
+it can be dangerous to print JSON or otherwise copy content blindly
+if there is no upper limit on the export size.
+
+In-place modification cannot be trusted because a standard buffer
+verifier will detect safe read, but will not detect if two objects are
+overlapping. For example, a table could be stored inside another table.
+Modifing one table might cause access to the contained table to go out
+of bounds, for example by directing the vtable elsewhere.
+
+The platform native integer and size type might not be able to handle
+large FlatBuffers - see [Size Limits](#size-limits).
+
+Becaue FlatCC requires buffers to be sorted after builiding, there is
+risk due to buffer modifications. It is not sufficient to verify buffers
+after sorting because sorting is done inline. Therefore buffers must be
+trusted or rewritten before sorting.
+
+
+## Nested FlatBuffers
+
+FlatBuffers can be nested inside other FlatBuffers. In concept this is
+very simple: a nested buffer is just a chunk of binary data stored in a
+`ubyte` vector, typically with some convenience methods generated to
+access a stored buffer. In praxis it adds a lot of complexity. Either
+a nested buffer must be created strictly separately and copied in as
+binary data, but often users mix the two builder contexts accidentally
+storing strings from one buffer inside another. And when done right, the
+containing ubyte vector might not be aligned appropriately making it
+invalid to access the buffer without first copying out of the containing
+buffer except where unaligned access is permitted. Further, a nested
+FlatBuffer necessarily has a length prefix because any ubyte vector has
+a length field at its start. Therefore, size prefixed flatbuffers should
+not normally be stored as nested buffers, but sometimes it is necessary
+in order have the buffer properly aligned after extraction.
+
+The FlatCC builder makes it possible to build nested FlatBuffers while
+the containing table of the parent buffer is still open. It is very
+careful to ensure alignment and to ensure that vtables are not shared
+between the two (or more) buffers, otherwise a buffer could not safely
+be copied out. Users can still make mistakes by storing references from
+one buffer in another.
+
+Still, this area is so complex that several bugs have been found.
+Thus, it is advice to use nested FlatBuffers with some care.
+
+On the other hand, nested FlatBuffers make it possible to trivially
+extract parts of FlatBuffer data. Extracting a table would require
+chasing pointers and could potentially explode due to shared sub-graphs,
+if not handled carefully.
+
+## Fixed Length Arrays
+
+This feature can be seen as equivalent to repeating a field of the same type
+multiple times in struct.
+
+Fixed length array struct fields has been introduced mid 2019.
+
+A fixed length array is somewhat like a vector of fixed length containing inline
+fixed length elements with no stored size header. The element type can be
+scalars, enums and structs but not other fixed length errors (without wrapping
+them in a struct).
+
+An array should not be mistaken for vector as vectors are independent objects
+while arrays are not. Vectors cannot be fixed length. An array can store fixed
+size arrays inline by wrapping them in a struct and the same applies to unions.
+
+The binary format of a fixed length vector of length `n` and type `t` can
+be precisely emulated by created a struct that holds exactly `n` fields
+of type `t`, `n > 0`. This means that a fixed length array does not
+store any length information in a header and that it is stored inline within
+a struct. Alignment follows the structs alignment rules with arrays having the
+same alignment as their elements and not their entire size.
+
+The maximum length is limited by the maximum struct size and / or an
+implementation imposed length limit. Flatcc accepts any array that will fit in
+struct with a maximum size of 2^16-1 by default but can be compiled with a
+different setting. Googles flatc implementation currently enforces a maximum
+element count of 2^16-1.
+
+Assuming the schema compiler computes the correct alignment for the overall
+struct, there is no additonal work in verifying a buffer containing a fixed
+length array because structs are verified based on the outermost structs size
+and alignment without having to inspect its content.
+
+Fixed lenght arrays also support char arrays. The `char` type is similar to the
+`ubyte` or `uint8` type but a char can only exist as a char array like
+`x:[char:10]`. Chars cannot exist as a standalone struct or table field, and
+also not as a vector element. Char arrays are like strings, but they contain no
+length information and no zero terminator. They are expected to be endian
+neutral and to contain ASCII or UTF-8 encoded text zero padded up the array
+size. Text can contain embedded nulls and other control characters. In JSON form
+the text is printed with embedded null characters but stripped from trailing
+null characters and a parser will padd the missing null characters.
+
+
+The following example uses fixed length arrays. The example is followed by the
+equivalent representation without such arrays.
+
+ struct Basics {
+ int a;
+ int b;
+ }
+
+ struct MyStruct {
+ x: int;
+ z: [short:3];
+ y: float;
+ w: [Basics:2];
+ name: [char:4];
+ }
+
+ // NOT VALID - use a struct wrapper:
+ table MyTable {
+ t: [ubyte:2];
+ m: [MyStruct:2];
+ }
+
+Equivalent representation:
+
+ struct MyStructEquivalent {
+ x: int;
+ z1: short;
+ z2: short;
+ z3: short;
+ y: float;
+ wa1: int;
+ wa2: int;
+ name1: ubyte;
+ name2: ubyte;
+ name3: ubyte;
+ name4: ubyte;
+ }
+
+ struct MyStructArrayEquivalent {
+ s1: MyStructEquivalent;
+ s2: MyStructEquivalent;
+ }
+
+ struct tEquivalent {
+ t1: ubyte;
+ t2: ubyte;
+ }
+
+ table MyTableEquivalent {
+ t: tEquivalent;
+ m: MyStructArrayEquivalent;
+ }
+
+
+Note that forced zero-termination can be obtained by adding a trailing ubyte
+field since uninitialized struct fields should be zeroed:
+
+ struct Text {
+ str: [char:255];
+ zterm: ubyte;
+ }
+
+Likewise a length prefix field could be added if the applications involved know
+how to interpret such a field:
+
+ struct Text {
+ len: ubyte;
+ str: [char:254];
+ zterm: ubyte;
+ }
+
+The above is just an example and not part of the format as such.
+
+
+## Big Endian FlatBuffers
+
+FlatBuffers are formally always little endian and even on big-endian
+platforms they are reasonably efficient to access.
+
+However it is possible to compile FlatBuffers with native big endian
+suppport using the FlatCC tool. The procedure is out of scope for this
+text, but the binary format is discussed here:
+
+All fields have exactly the same type and size as the little endian
+format but all scalars including floating point values are stored
+byteswapped within their field size. Offset types are also byteswapped.
+
+The buffer identifier is stored byte swapped if present. For example
+the 4-byte "MONS" identifier becomes "SNOM" in big endian format. It is
+therefore reasonably easy to avoid accidentially mixing little- and
+big-endian buffers. However, it is not trivial to handle identifers that
+are not exactly 4-bytes. "HI\0\0" could be "IH\0\0" or "\0\0IH". It is
+recommended to always use 4-byte identifies to avoid this problem. See
+the FlatCC release 0.4.0 big endian release for details.
+
+When using type hash identifiers the identifier is stored as a big
+endian encoded hash value. The user application will the hash in its
+native form and accessor code will do the conversion as for other
+values.
+
+
+## StructBuffers
+
+_NOTE: the Google FlatBuffer project originally documented structs as
+valid root objects, but never actually implemented it, and has as of mid
+2020 changed the specification to disallow root structs as covered in
+this section. FlatCC for C has been supporting root structs for a long
+time, and they can provide significant speed advantages, so FlatCC will
+continue to support these._
+
+Unlike tables, structs are are usually embedded in in a fixed memory
+block representing a table, in a vector, or embedded inline in other
+structs, but can also be independent when used in a union.
+
+The root object in FlatBuffers is conventionally expected to be a table,
+but it can also be struct. FlatCC supports StructBuffers. Since structs
+do not contain references, such buffers are truly flat. Most
+implementations are not likely to support structs are root but even so
+they are very useful:
+
+It is possible to create very compact and very fast buffers this way.
+They can be used where one would otherwise consider using manual structs
+or memory blocks but with the advantage of a system and language
+independent schema.
+
+StructBuffers may be particularly interesting for the Big Endian
+variant of FlatBuffers for two reasons: the first being that performance
+likely matters when using such buffers and thus Big Endian platforms
+might want them. The second reason is the network byte order is
+traditionally big endian, and this has proven very difficult to change,
+even in new evolving IETF standards. StructBuffers can be used to manage
+non-trival big endian encoded structs, especially structs containing
+other structs, even when the receiver does not understand FlatBuffers as
+concept since the header can just be dropped or trivially documented.
+
+
+## StreamBuffers
+
+StreamBuffers are so far only a concept although some implementations
+may already be able to read them. The format is documented to aid
+possible future implementations.
+
+StreamBuffers makes it possible to store partially completed buffers
+for example by writing directly to disk or by sending partial buffer
+data over a network. Traditional FlatBuffers require an extra copying
+step to make this possible, and if the writes are partial, each section
+written must also store the segment length to support reassembly.
+StreamBuffers avoid this problem.
+
+StreamBuffers treat `uoffset_t` the same as `soffset_t` when the special
+limitation that `uoffset_t` is always negative when viewed as two's
+complement values.
+
+The implication is that everthing a table or vector references must be
+stored earlier in the buffer, except vtables that can be stored freely.
+Existing reader implementations that treat `uoffset_t` as signed, such as
+Javascript, will be able to read StreamBuffers with no modification.
+Other readers can easily be modified by casting the uoffset value to
+signed before it.
+
+Verifiers must ensure that any buffer verified always stores all
+`uoffset_t` with the same sign. This ensures the DAG structure is
+preserved without cycles.
+
+The root table offset remains positive as an exception and points to the
+root buffer. A stream buffer root table will be the last table in the
+buffer.
+
+The root table offset may be replaced with a root table offset at the
+end of the buffer instead of the start. An implementation may also
+zero the initial offset and update it later. In either case the buffer
+should be aligned accordingly.
+
+Some may prefer the traditional FlatBuffer approach because the root
+table is stored and it is somehow easier or faster to access, but modern
+CPU cache systems do not care much about the order of access as long as
+as their is some aspect of locality of reference and the same applies to
+disk access while network access likely will have the end of the buffer
+in hot cache as this is the last sent. The average distance between
+objects will be the same for both FlatBuffers and StreamBuffers.
+
+
+## Bidirectional Buffers
+
+Bidirectional Buffers is a generalization of StreamBuffers and FlatBuffers
+and so far only exists as an idea.
+
+FlatBuffers and StreamBuffers only allow one direction because it
+guarantees easy cycle rejection. Cycles are unwanted because it is
+expensive to verify buffers with cycles and because recursive navigation
+might never terminate.
+
+As it happens, but we can also easily reject cycles in bidirectional
+buffers if we apply certain constraints that are fully backwards
+compatible with existing FlatBuffers that have a size below 2GB.
+
+The rule is that when an offset changes direction relative to a parent
+objects direction, the object creating the change of direction becomes a
+new start or end of buffer for anything reachable via that offset.
+
+The root table R is reached via forward reference from the buffer
+header. Its boundaries are itself and the buffer end. If the this table
+has an offset pointing back, for example to new table X, then table X
+must see the buffer start and R as two boundaries. X must not directly
+or indirectly reach outside this region. Likewise, if the table R points
+to new table Y in the forward direction, then Y is bounded by itself and
+the buffer end.
+
+A lower bound is the end of a table or a vector althoug we just say the
+table or vector is a boundary. An upper bound is until the start of the
+table or vector, or the end of the buffer.
+
+When a buffer is verified, the boundaries are initially the buffer start
+and buffer end. When the references from the root table are followed,
+there new boundaries are either buffer start to root table, or root
+table to end, depending on the offsets direction. And so forth
+recursively.
+
+We can relax the requirement that simple vectors and vtables cannot
+cross these boundaries, except for the hard buffer limits, but it does
+compicate things and is likely not necessary.
+
+Nested FlatBuffers already follow similar boundary rules.
+
+The point of Bidirectional buffers is not that it would be interesting
+to store things in both directions, but to provide a single coherent
+buffer model for both ways of storing buffers without making a special
+case. This will allow StreamBuffers to co-exist with FlatBuffers without
+any change, and it will allow procedures to build larger buffers to make
+their own changes on how to build their subsection of the buffer.
+
+We still need to make allowance for keeping the buffer headers root
+pointer implicit by context, at least as an option. Otherwise it is not,
+in general, possible to start streaming buffer content before the entire
+buffer is written.
+
+
+## Possible Future Features
+
+This is highly speculative, but documents some ideas that have been
+floating in order to avoid incompatible custom extensions on the same
+theme. Still these might never be implemented or end up being
+implemented differently. Be warned.
+
+### Force Align
+
+`force_align` attribute supported on fields of structs, scalars and
+and vectors of fixed length elements.
+
+### Mixins
+
+A mixin is a table or a struct that is apparently expanded into the table
+or struct that contains it.
+
+Mixins add new accessors to make it appear as if the fields of the mixed
+in type is copied into the containing type although physically this
+isn't the case. Mixins can include types that themselves have mixins
+and these mixed in fields are also expanded.
+
+A `mixin` is an attribute applied to table or a struct field when that
+field has a struct or a table type. The binary format is unchanged and
+the table or struct will continue to work as if it wasn't a mixin and
+is therefore fully backwards compatible.
+
+Example:
+
+ struct Position {
+ spawned: bool(required);
+ x: int;
+ y: int;
+ }
+
+ table Motion {
+ place: Position(mixin);
+ vx: int = 0;
+ vy: int = 0;
+ }
+
+ table Status {
+ title: string;
+ energy: int;
+ sleep: int;
+ }
+
+ table NPC {
+ npcid: int;
+ motion: Motion(mixin);
+ stat: Status(mixin);
+ }
+
+ table Rock {
+ here: Position(mixin);
+ color: uint32 = 0xa0a0a000;
+ }
+
+ table Main {
+ npc1: NPC;
+ rock1: Rock;
+ }
+
+ root_type Main;
+
+
+Here the table NPC and Rock will appear with read accessors is if they have the fields:
+
+ table NPC {
+ npcid: int;
+ spawned: bool(required);
+ x: int;
+ y: int;
+ vx: int = 0;
+ vy: int = 0;
+ title: string;
+ energy: int;
+ sleep: int;
+ place: Position;
+ motion: Motion(required);
+ stat: Status;
+ }
+
+ table Rock {
+ spawned: bool(required);
+ x: int;
+ y: int;
+ here: Position(required);
+ color: uint32 = 0xa0a0a000;
+ }
+
+ table Main {
+ npc1: NPC;
+ rock1: Rock;
+ }
+
+ root_type Main;
+
+or in JSON:
+
+ {
+ "npc1": {
+ "npcid": 1,
+ "spawned": true;
+ "x": 2,
+ "y": 3,
+ "vx": -4,
+ "vy": 5,
+ "title": "Monster",
+ "energy": 6,
+ "sleep": 0
+ },
+ "rock1": {
+ "spawned": false;
+ "x": 0,
+ "y": 0,
+ "color": 0xa0a0a000
+ }
+ }
+
+
+
+Note that there is some redundancy here which makes it possible to
+access the mixed in fields in different ways and to perform type casts
+to a mixed in type.
+
+A cast can happen through a generated function along the lines of
+`npc.castToPosition()`, or via field a accessor `npc.getPlace()`.
+
+A JSON serializer excludes the intermediate container fields such as
+`place` and `motion` in the example.
+
+A builder may choose to only support the basic interface and required
+each mixed in table or struct be created separately. A more advanced
+builder would alternatively accept adding fields directly, but would
+error if a field is set twice by mixing up the approaches.
+
+If a mixed type has a required field, the required status propagates to
+the parent, but non-required siblings are not required as can be seen in
+the example above.
+
+Mixins also places some constraints on the involved types. It is not
+possible to mix in the same type twice because names would conflict and
+it would no longer be possible to do trivially cast a table or struct
+to one of its kinds. An empty table could be mixed in to
+provide type information but such a type can also only be added once.
+
+Mixing in types introduces the risk of name conflicts. It is not valid
+to mix in a type directly or indirectly in a way that would lead to
+conflicting field names in a containing type.
+
+Note that in the example it is not possible to mix in the pos struct
+twice, otherwise we could have mixed in a Coord class twice to have
+position and velocity, but in that case it would be natural to
+have two fields of Coord type which are not mixed in.
+
+Schema evolution is fully supported because the vtable and field id's
+are kept separate. It is possible to add new fields to any type that
+is mixed in. However, adding fields could lead to name conficts
+which are then reported by the schema compiler.
+
+
+[eclectic.fbs]: https://github.com/dvidelabs/flatcc/blob/master/doc/eclectic.fbs
diff --git a/doc/builder.md b/doc/builder.md
new file mode 100644
index 0000000..c953770
--- /dev/null
+++ b/doc/builder.md
@@ -0,0 +1,1845 @@
+# Builder Interface Reference
+
+<!-- vim-markdown-toc GFM -->
+
+* [Introduction](#introduction)
+* [Size Prefixed Buffers](#size-prefixed-buffers)
+* [Namespaces](#namespaces)
+* [Error Codes](#error-codes)
+* [Endianess](#endianess)
+ * [Deprecated](#deprecated)
+* [Buffers](#buffers)
+* [Tables](#tables)
+ * [Adding Fields](#adding-fields)
+ * [Nested Tables](#nested-tables)
+* [Packing tables](#packing-tables)
+* [Strings](#strings)
+* [Structs](#structs)
+ * [Fixed Length Arrays in Structs](#fixed-length-arrays-in-structs)
+* [Nested Buffers](#nested-buffers)
+* [Scalars and Enums](#scalars-and-enums)
+* [Vectors](#vectors)
+* [Unions](#unions)
+ * [Union Vectors](#union-vectors)
+ * [Unions of Strings and Structs](#unions-of-strings-and-structs)
+* [Error Handling](#error-handling)
+* [Type System Overview](#type-system-overview)
+* [Cloning](#cloning)
+* [Picking](#picking)
+* [Sorting Vectors](#sorting-vectors)
+ * [Dangers of Sorting](#dangers-of-sorting)
+ * [Scanning](#scanning)
+* [Example of different interface type users](#example-of-different-interface-type-users)
+* [Special Emitters](#special-emitters)
+
+<!-- vim-markdown-toc -->
+
+
+## Introduction
+
+We assume a separate read-only file and add extensions to this with
+support from a builder library and a builder object.
+
+The underlying builder library supports two modes of operation that mix
+together: `create` which sends data directly to the target buffer
+(emitter object) and a stack driven `start/end` approach which allocates
+objects and vectors on the stack. The code generator chooses the most
+efficient approach given the circumstances.
+
+Unlike most FlatBuffer language interfaces, tables and vectors are not
+created back to front: They are either created completely in one
+operation, or they are constructed on a stack front to back until they
+can be emitted. The final buffer is still constructed back to front.
+For big-endian platforms this may require temporary stack allocation of
+complete vectors where little endian platforms can emit directly.
+
+Tables and vectors stored in other tables or vectors must be completed
+before the can be stored, but unlike must language interfaces they can
+be constructed while a parent is also being constructed as long as
+nesting remains balanced. While this occasionally may require more
+stack, it may also avoid external temporary allocation.
+
+A builder object is required to start buffer construction. The builder
+must be initialized first and can be reset and reused between buffers,
+reusing stack allocation. The builder can have a customized emitter
+object but here we use the default. Finalizing the buffer depends
+the emitter and we can use a default finalizer only because we use the
+default emitter - it allocates and populates a linear buffer from a
+paged emitter ring buffer.
+
+Note that in most cases `flatcc_builder_finalize_buffer` is sufficient,
+but to be strictly portable, use
+`flatcc_builder_finalize_aligned_buffer` and `aligned_free`.
+`aligned_free` is often implemented as `free` in `flatcc/portable` but
+not on all platforms. As of flatcc version 0.5.0
+`flatcc_builder_aligned_free` is provided to add robustness in case the
+applications `aligned_free` implementation might differ from the library
+version due to changes in compile time flags.
+
+Generally we use the monster example with various extensions, but to
+show a simple complete example we use a very simple schema (`myschema.fbs`):
+
+ table mytable { myfield1: int; myfield2: int; }
+
+ #include "myschema_builder.h"
+
+ void testfun() {
+
+ void *buffer;
+ size_t size;
+ flatcc_builder_t builder, *B;
+ mytable_table_t mt;
+ B = &builder;
+ flatcc_builder_init(B);
+
+ /* Construct a buffer specific to schema. */
+ mytable_create_as_root(B, 1, 2);
+
+ /* Retrieve buffer - see also `flatcc_builder_get_direct_buffer`. */
+ /* buffer = flatcc_builder_finalize_buffer(B, &size); */
+ buffer = flatcc_builder_finalize_aligned_buffer(B, &size);
+
+ /* This is read-only buffer access. */
+ mt = mytable_as_root(buffer);
+ assert(mytable_myfield1(mt) == 1);
+ assert(mytable_myfield2(mt) == 2);
+
+ /* free(buffer); */
+ flatcc_builder_aligned_free(buffer);
+
+ /*
+ * Reset, but keep allocated stack etc.,
+ * or optionally reduce memory using `flatcc_builder_custom_reset`.
+ */
+ flatcc_builder_reset(B);
+
+ /* ... construct another a buffer */
+
+ /* Reclaim all memory. */
+ flatcc_builder_clear(B);
+ }
+
+Note that a compiled schema generates a `myschema_reader.h` file and
+optionally a `myschema_builder.h` and some common support files. When
+building a buffer the `myschema_builder.h` must be used but when only
+reading then the `myschema_reader.h` file should be used instead. Here
+we are only concerned with building. When building, it is necessary to
+link with `libflatccrt.a` runtime library but when reading, all
+nesessary code is contained in the generated header files.
+
+The builder object only manages a stack of currently active objects and
+does not store an object that is complete. Instead it calls an emitter
+object with the partial data ready for emission, similar to a write
+function. A default emitter is provided which implements a ring buffer
+and the result may be written to a file, copied to a buffer or a
+finalized to an allocated buffer. The builder supports these methods
+directly for default emitter, and only the default emitter because
+emitters are otherwise defined by only one simple emit function - see
+`emit_test.c` for a simple example of a custom emitter.
+A custom allocator may be useful when working with small buffers in a
+constrained environment - the allocator handles temporary stacks,
+virtual table caches etc. but not the emitter.
+
+The allocator and emitter interface is documented in the builder library
+header pflatcc_builder.h] and the default implementation in
+[flatcc_emitter.h]. The default allocator is implemented as part of the
+flatcc_builder source.
+
+The builder can be reused between buffers using the `reset` operation.
+The default emitter can also be reused and will automaticallhy reset
+when the buffer is. For custom emitters, any reset operation must be
+called manually. The same applies to clear. The reset operations
+maintain allocated memory by also reduce memory consumption across
+multiple resets heuristically.
+
+
+## Size Prefixed Buffers
+
+Buffers can be created with a size prefix of type `uoffset_t`. When
+doing this, the buffer is aligned relative to the size prefix such that
+buffers can be stacked in a file and for example be accessed via memory
+mapping.
+
+The usual `create_as_root` and `start_as_root` has a variant called
+`create_as_root_with_size` and `start_as_root_with_size`.
+
+To read a buffer with a size prefix use:
+
+ size_t size;
+ buffer = flatbuffers_read_size_prefix(rawbuffer, &size);
+
+The size the size of the buffer excluding the size prefix. When
+verifying buffers the buffer and size arguments should be used. See also
+[monster_test.c] for an example.
+
+Note that the size prefix ensures internal alignment but does not
+guarantee that the next buffer in a file can be appended directly
+because the next buffers alignment is unknown and because it potentially
+wastes padding bytes. The buffer size at offset 0 can increased to the
+needed alignment as long as endianness is handled and the size of the
+size field is subtracted, and zeroes are appended as necesary.
+
+## Namespaces
+
+The generated code is typically wrapped in a custom namespace and
+functions and definitions that are library specific are usually mapped
+into the namespace. We often use an empty namespace for custom types and
+`flatbuffers_` for library names, but usually a `foo_` prefix could also
+be used on both cases, where `foo` is a custom namespace.
+
+Note that the name `flatcc_emitter` is only used with the default emitter
+and the name [flatcc_builder] is only used for buffer management but not
+for constructing content. Once a valid buffer is ready the common and
+namespace (`flatbuffers`) and schema specific (or empty) namespace is used
+with schema specific operations.
+
+All schema specific content is prefixed with a namespace to avoid
+conflicts - although the namespace is empty if the schema doesn't
+specify any. Note that the same schema can have multiple
+namespaces. An example of a namespace prefixed operation:
+
+ MyGame_Example_Monster_create_as_root(B, ... lots of args);
+
+To simplify this we can use a macro to prefix a namespace. The use
+of the name `ns` is arbitrary and we can choose different names for
+different namespaces.
+
+ #undef ns
+ #define ns(x) MyGame_Example_ ## x
+
+But the above doesn't work with nested calls to ns such as
+
+ ns(Monster_color_add(B, ns(Color_Green));
+
+it would have to be:
+
+ ns(Monster_color_add)(B, ns(Color_Green);
+
+Therefore we have a helper macro the does allow nesting:
+
+ #undef ns
+ #define ns(x) FLATBUFFERS_WRAP_NAMESPACE(MyGame_Example, x)
+
+The common namespace can also be wrapped for a more consistent
+appearance:
+
+ #undef nsc
+ #define nsc(x) FLATBUFFERS_WRAP_NAMESPACE(flatbuffers, x)
+
+ nsc(string_ref_t) s;
+ s = nsc(string_create_str(B, "hello, world!"));
+
+instead of
+
+ flatbuffers_string_ref_t s;
+ s = flatbuffers_string_create_str(B, "hellow, world!);
+
+
+## Error Codes
+
+Functions return values can be grouped roughly into 4 groups: functions
+returning pointer, references, `size_t` lengths, and `int` status codes.
+Pointers and references return 0 on error. Sizes do not return error.
+Status codes return 0 on success or an error code that is usually -1.
+Status codes may be checked with `flatbuffers_failed(...)`.
+
+
+## Endianess
+
+The function `flatbuffers_is_native_pe()` provide an efficient runtime
+check for endianness. Since FlatBuffers are little endian, the function
+returns true when the native endianness matches the protocol endianness
+which for FlatBuffers is little endian. We do not hardcode little endian
+because it enables us to support other protocols in the future - for
+example the struct conversions may be very useful for big endian network
+protocols.
+
+> As of flatcc 0.4.0 it is possible to compile flatcc with native
+> big-endian support which has been tested on AIX. More details in
+> [README Endianness](https://github.com/dvidelabs/flatcc#endianness)
+
+
+By testing `is_native_pe` dependencies on speficic compile time flags
+can be avoided, and these are fragile:
+
+During build, vectors and structs behave differently from tables: A
+table updates one field at a time, doing endian conversion along the
+way. A struct is either placed in a table, and is converted by the table
+specific operation, or it is placed in a vector. A vector only does the
+endian conversion when the vector is finished, so when a vector is not
+created atomically with a single `create` call, the elements are placed on a
+stack. By default this is in native format, but the user may choose to
+place buffer encoded structs or scalars in the vector and call
+`vec_end_pe`. The same `push` operation can be used to place a
+natively encoded struct and a buffer encoded struct in the vector
+because it does no conversion at that point. Therefore there is also no
+`push_pe` method that would mean to push an unconverted element unto
+the stack. Only for tables and entire vectors does the pe command make
+sense. If a vector wishes to push a buffer encoded struct when the
+vector is otherwise constructed in native encoding or vice versa, the
+vector may be extended empty and then assigned using any of the
+`assign`, `assign_from_pe` or `assign_to_pe` calls.
+
+We did not mention that a struct can also be a standalone object
+as a buffer root, and for that it has a `end_pe` call that essentially
+works like a single element vector without a length prefix.
+
+The `clone` operation is a more userfriendly `pe` operation which takes
+an object or a vector from an existing buffer and places it in a new
+buffer without endian conversion.
+
+### Deprecated
+
+__NOTE: `FLATBUFFERS_LITTLEENDIAN` is deprecated and will be removed in
+a future version. It just complicates endina handling.__
+
+The header files tries to define `FLATBUFFERS_LITTLEENDIAN` to 0 or 1
+based on system definitions but otherwise leaves the flag undefined.
+Simply testing for
+
+ #if FLATBUFFERS_LITTLEENDIAN
+ ...
+ #endif
+
+will not fail if the endianness is undetected but rather give the
+impression that the system is big endian, which is not necessarily true.
+The `flatbuffers_is_native_pe()` relates to the detected or system
+provided conversion functions if a suitable `endian.h` file after the
+header file gave up on its own detection (e.g. `le16toh(1) == 1`).
+Therefore, it is better to use `flatbuffers_is_native_pe()` in most
+cases. It also avoids making assumptions on whether the protocol is
+little or big endian.
+
+## Buffers
+
+A buffer can most simply be created with the `create_as_root` call for
+a table or a struct as seen ealier. The `as_root` part is just a thin
+wrapper around buffer start and stop calls and using these allows for
+more flexibility. the `as_root` also automatically uses the defined file
+identifier if any.
+
+The build process begins with starting a buffer. The buffer may contain
+a struct or table, so one of these should be constructed subsequently.
+Structs are generally created inline in tables, only at the buffer level
+is a struct created independently. The api actually permits other
+formats, but it will not be valid flatbuffers then.
+
+ flatcc_builder_ref_t root;
+ flatcc_builder_init(B);
+ /* 0 indicates no file identifier. */
+ flatcc_builder_buffer_start(B, 0);
+ root = /* ... construct a table or a struct */
+ flatcc_builder_buffer_end(B, root);
+
+`buffer_start` takes a file identifier as second argument. If null or a
+string with null characters, the identifier is not stored in the buffer.
+
+Regardless of whether a struct or table is declared as root in the schema or
+not, there are methods to automatically start both the buffer and struct or buffer
+and table such as `Monster_start/end_as_root`. This is also valid for
+nested buffers. If the schema has a file identifier, it is used as
+identifier for the created object. The alternative
+`create_as_root_with_identifier` allows for explicitly setting an id or
+explicitly dropping an id by providing a null argument. The
+corresponding reader function `Monster_as_root(buffer)` also has a
+`Monster_as_root_with_identifier(buffer, id)`. Here the id is ignored if the id
+is null, and otherwise the operation returns null if the id does not match.
+For the most part ids are handled transparently by these defaults.
+
+The buffer can be started with block alignment and/or a custom
+identifier using the `flatcc_builder_buffer_start_aligned`:
+
+ flatcc_builder_buffer_start_aligned(B, "myid", 16);
+ ...
+ flatcc_builder_buffer_end(B, root);
+
+The alignment can be 0 using the minimum required alignment, which is
+derived from the operations between `start/end`. The alignment argument
+is called `block_align` and is useful if the emitter operates on blocks
+such as encryption, cache line isolation, or compression blocks where
+the final buffer should align with the blocks used during construction.
+This can lead to significant zero padding just after the block header,
+depending on block size.
+
+The schema specified identifier is given as:
+
+ flatbuffers_identifier
+
+and defaults to null. The schema specified extension is given as:
+
+ flatbuffers_extension
+
+and defaults to null. Note that `flatbuffers_` is replaced by whatever
+namespace is chosen. Each specific schema type also has a named file
+exntension reflection the extension active when the type was defined,
+for example:
+
+ MyGame_Example_Monster_file_identifier
+
+This define is used when `create_as_root` automatically sets a file
+identifier.
+
+NOTE: before flatcc 0.6.1, the identifier was named
+
+ MyGame_Example_Monster_identifier (DEPRECATED)
+
+but that would conflict with a table field named `identifier` which
+happened often enough to be a problem. This naming is now removed on
+conflict and will be completely removed in a future version.
+
+When the buffer is ended, nothing special happens but only at this point
+does it really makes sense to access the resulting buffer. The default
+emitter provides a copy method and a direct buffer access method. These
+are made available in the builder interface and will return null for
+other emitters. See also [flatcc_builder.h] and the default emitter in
+`flatcc_emitter.h`.
+
+
+## Tables
+
+### Adding Fields
+
+If `Monster` is a table, we can create a Monster buffer (after
+builder init) as follows:
+
+ Monster_start(B);
+ Monster_Hp_add(B, 80);
+ ...
+ flatcc_builder_buffer_create(B, Monster_end(B));
+
+All scalar and enums are added similar to the `Monster_add_Hp` call. We
+will subsequently see how to deal with other types.
+
+A table can also be created in a single operation using `create`:
+
+ Monster_ref_t m;
+ m = Monster_create(B, 80, ...);
+
+The create arguments are those taken by the individual fields `add`
+operations which is either an scalar, enum, or a reference returned by
+another create or end call. Note that unlike the C++ interface, unions
+only take a single argument that is also accepted by the `add` operation
+of a union field. Deprecated fields are not included in the argument
+list.
+
+As of v0.5.3 the arguments are given in field id order which is usually
+the same as the schema listed order, except with id attributes are
+given explicitly. Using id order ensures version stability. Note that
+since deprecated fields are omitted, deprecated fields can still break
+existing code.
+
+BREAKING: Prior to flatcc v0.5.3 the create call would use the schema order
+also when fields have id attributes specifying a different order. This
+could break code across versions and did not match the C++ behavior.
+It was also document that the `original_order` attribute affected create
+argument order, but that was incorrect.
+
+NOTE: If the `original_order` attribute is set on a table, the `create`
+implementation adds fields to the table in schema listed order,
+otherwise it adds fields in order of decreasing size to reduce alignment
+overhead. Generally there should be no need to use the `original_order`
+attribute. This doesn't affect the call argument order although that
+was incorrectly document prior to v 0.5.3.
+
+NOTE: the `create` and `create_as_root` operations are not guaranteed to
+be available when the number of fields is sufficiently large because it
+might break some compilers. Currently there are no such restrictions.
+
+Scalars and enums do not store the value if it it matches the default
+value which is by default 0 and otherwise defined in the schema. To
+override this behavior, use `force_add`. In the monster example, health
+points default to 100 (percent), so if we wish to force store it in the
+buffer we could use:
+
+ Monster_hp_force_add(B, 100);
+
+Only scalar fields and enums have a `force_add` operation since only these
+types have a default value, and other types have a meaningful
+interpretation of null. (It is not quite clear if empty tables separate
+from null/absent are valid in all implementations).
+
+`force_add` may be useful when roundtripping data from a database where it is
+relevant to distinguish between any valid value and null. Most readers will not
+be able to tell the difference, but it is possible to inspect a flatbuffer to
+see if a table field is present, present and default, or absent, meaning null.
+
+NOTE: As of mid 2020, FlatBuffers added optional scalar table fields with support in flatcc 0.6.1. These fields automatically imply `force_add` to represent null values when a field is absent and therefore these fields do not have a `force_add` method and these fields also do not have a default value other than `null`, i.e. null if not added.
+
+If Monster is declared as root, the above may also be called as:
+
+ Monster_start_as_root(B);
+ Monster_add_hp(B, 80);
+ ...
+ Monster_end_as_root(B);
+
+(Calling `Monster_end` instead would require `buffer_end` call
+subsequently, and is basically a violation of nesting).
+
+### Nested Tables
+
+Tables can be nested, for example the Mini field may have type
+Monster table again (a recursive type):
+
+ buffer_start(B);
+ Monster_start(B);
+ Monster_add_Hp(B, 80);
+ Monster_start(B);
+ Monster_hp_add(B, 81);
+ ...
+ Monster_mini_add(Monster_end(B));
+ ...
+ flatcc_builder_buffer_end(B, Monster_end(B));
+
+The child Monster table may be created before the parent or as above
+between the tables start and end. If created before, reference must be
+stored until it can be added. The only requirement is that start and
+end are balanced, that the sub-table is ended before the parent, and
+that both are created in the same buffer (nested buffers can be created
+while the parent buffer is still being created, similar to sub-tables,
+so it is possible to mess this up):
+
+ Monster_ref_t root, mini;
+
+ buffer_start(B);
+ Monster_start(B);
+ Monster_hp_add(B, 81);
+ mini = Monster_end(B);
+
+ Monster_start(B);
+ Monster_hp_add(B, 80);
+ Monster_mini_add(B, mini);
+ root = Monster_end(B);
+
+ flatcc_builder_buffer_end(B, root)
+
+
+Rather than adding a child table explicitly, it can be started and ended
+as an operation on the field name, here with `Monster_Mini_start/end`:
+
+ Monster_ref_t root;
+
+ Monster_start(B);
+ Monster_add_Hp(B, 80);
+ Monster_mini_start(B);
+ Monster_hp_add(B, 81);
+ Monster_mini_end(B);
+ root = Monster_end(B);
+
+ flatcc_builder_buffer_end(B, root);
+
+We can repeat the the table nesting as deep as we like, provided our
+builder is willing to allocate enough stack space.
+
+**Warning**: It is possible to use the wrong table type operations
+between `start/end` - don't do that. It is a tradeoff between usability
+and type safety.
+
+Note that vectors, strings and structs map several standard operations
+to a field name, for example `mytable_myfield_push(B, x)`. This is not the
+case with table fields which only map `start/end/create` in part because it
+would never terminate for recursive types and in part because each table
+is different making a generic mapping rather complex and with very long
+names.
+
+A table may be created with a constructor, but it requires all
+non-scalar objects to be references or pointers. Struct fields must be
+pointers to zero padded structs, and strings, vectors and tables must be
+references. The constructors are probably most useful for simple tables
+with mostly scalar values (here we use the original Monster fields and
+leaves out any we have invented for the sake of illustration):
+
+IMPORTANT: objects can generally only be created within a buffer
+context, i.e. after `buffer_start`. For example calling
+`flatbuffers_uint8_vec_create` before `Monster_create_as_root`
+technically violates this rule because the create call also starts the
+buffer. It is, however, allowed at the top level. For nested buffers
+(see later) this must be avoided because the vector would end up in the
+wrong buffer.
+
+ Monster_ref_t m;
+ uint8_t invdata[4] = { 1, 2, 3, 4 };
+ Vec3_t vec;
+
+ flatbuffers_uint8_vec_ref_t inventory =
+ flatbuffers_uint8_vec_create(B, invdata, 4);
+ m = Monster_create(B, &vec, 150, 80, name, inventory,
+ Color_Red, Any_as_NONE());
+ flatcc_builder_buffer_create(m);
+
+or
+
+ Monster_create_as_root(B, &vec, 150, 80, name, inventory,
+ Color_Red, Any_as_NONE());
+
+## Packing tables
+
+By reordering the fields, the table may be packed better, or be better
+able to reuse an existing vtable. The `create` call already does this
+unless the attribute `original_order` has been set. Unions present a
+special problem since it is two fields treated as one and the type field
+will generally waste padding space if stored in order:
+
+To help pack unions better these can be added with the type
+seperate from the value reference using `add_type(B, test.type)`,
+`add_value(B, test)` where the value is only added if the type is
+not `NONE`. The `add_type` should be called last since it is the
+smallest type.
+
+The same field should not be added more than at most once. Internal
+reservations that track offset fields may overflow otherwise. An
+assertion will fail in debug builds.
+
+Required table fields will be asserted in debug builds as part of the
+`end/create` call. Only offset fields can have a required attribute.
+
+The generated `monster_test_reader.h` from [monster_test.fbs] shows how
+the default packing takes place in generated `create` calls, see for
+example the typealias test. Note that for example vectors are stored
+together with integers like `uint32` because references to vectors have
+the same size as `uint32`.
+
+
+## Strings
+
+Strings can be added to tables with zero terminated strings as source
+
+ Monster_start(B);
+ ...
+ Monster_name_create_str(B, "Mega Monster");
+ Monster_end(B);
+
+or strings potententially containing zeroes:
+
+ #define MONSTER "Mega\0Monster"
+ Monster_start(B);
+ ...
+ /* Includes embedded zero. */
+ Monster_name_create(B, MONSTER, sizeof(MONSTER));
+ Monster_end(B);
+
+or zero terminated source up to at most `max_len` characters.
+
+ #define MONSTER "Mega\0Monster"
+ Monster_start(B);
+ ...
+ /* "Mega" */
+ Monster_name_create_strn(B, MONSTER, 12);
+ Monster_end(B);
+
+The `create_str` and `create_strn` versions finds the string length via strlen
+and strnlen respectively. `append_string` also has `_str/_strn` versions.
+
+A string can also be created from an existing flatbuffer string in which
+case the length is expected to be stored 4 bytes before the pointer in
+little endian format, and aligned properly:
+
+ Monster_name_clone(B, mybufferstring);
+
+or, create a string at most 4 characters long starting at 0-based index
+10, if present:
+
+ Monster_name_slice(B, mybufferstring, 10, 4);
+
+If index or index + len goes beyond the source, the result is truncated
+accordingly, possibly resulting in an empty string.
+
+A string can also be create independently. The above is just shortcuts
+for that:
+
+ flatbuffers_string_ref_t monster_name;
+ monster_name = flatbuffers_string_create_str("Mega Monster");
+ Monster_name_add(B, monster_name);
+
+Strings are generally expected to be utf-8, but any binary data will be
+stored. Zero termination or embedded control codes are includes as is.
+The string gets a final zero temination regardless, not counted in the
+string length (in compliance with the FlatBuffers format).
+
+A string can also be constructed from a more elaborate sequence of
+operations. A string can be extended, appended to, or truncated and
+reappended to, but it cannot be edited after other calls including calls
+to update the same string. This may be useful if stripping escape codes
+or parsed delimiters, etc., but here we just create the same "Mega
+Monster" string in a more convoluted way:
+
+ flatbuffers_string_ref_t name;
+ char *s;
+ #define N 20
+ Monster_start(B);
+ ...
+ flatbuffers_string_start(B);
+ flatbuffers_string_append(B, "Mega", 4);
+ flatbuffers_string_append(B, " ", 1);
+ s = flatbuffers_string_extend(B, N);
+ strncpy(s, "Monster", N);
+ flatbuffers_string_truncate(B, N - strlen(s));
+ name = flatbuffers_string_end(B);
+ Monster_name_add(B, name);
+ ...
+ Monster_end(B);
+
+`flatbuffers_string_create...` calls are also available when creating
+the string separate from adding it to a table, for example:
+
+ flatbuffers_string_h name;
+ name = flatbuffers_string_create_str(B, "Mini Monster");
+
+It is guaranteed that any returned the string buffer is zero filled and
+has an extra zero after the requested length such that strlen can be
+called on the content, but only the requested bytes may be updated.
+
+Every call only returns the substring being added to the string in that
+operation. It is also possible to call `flatbuffers_string_edit` to get a
+modifiable pointer to the start of the string.
+
+`flatbuffers_string_reserved_len(B)` returns the current string length
+including any embedded zeroes, but excluding final zero termination. It
+is only valid until `string_end` is called.
+
+See [flatcc_builder.h] for detailed documentation. Essentially `extend`
+reserves zeroed space on the stack and returns a buffer to the new
+space, and truncate reduces the overall size again, and the string is
+then given the final length and a zero termination at the end.
+
+There is no endian conversion (except internally for the string length),
+because UTF-8 strings are not sensitive to endianness.
+
+Like tables, the string may be created while a parent container is being
+constructed, or before.
+
+Strings can also be used as vector elements, but we will get that when
+discussing vectors.
+
+## Structs
+
+Structs in tables can be added as:
+
+ Monster_pos_create(B, 1, 2, 3);
+
+The above essentially does the following:
+
+ Vec3_t *v;
+ v = Monster_pos_start(B);
+ Vec3_assign(v, 1, 2, -3.2);
+ Monster_pos_end(B);
+
+Some versions of the monster schema has extra test fields - these would
+break the assign approach above because there would be extra arguments.
+Instead we can rely on the zero intialization and assign known fields.
+
+ Vec3_t *v;
+ v = Monster_pos_start(B);
+ v->x = 1, v->y = 2, v->z = -3.2;
+ Monster_pos_end(B);
+
+`Monster_pos_end_pe(B)` can be used when the struct is known to be
+little endian (pe for protocol endian, meaning no conversion is necessary),
+for example copied from an existing buffer, but then `clone` is a better
+choice:
+
+ Monster_pos_clone(B, &v);
+
+When the struct is created alone for use as root:
+
+ Vec3_ref_t root;
+ root = Vec3_create(B, 1, 2, 3)
+ flatcc_builder_buffer_create(B, root);
+
+An existing struct can be added as:
+
+ Vec3_t v;
+ Vec3_assign(&v, 1, 2, 3);
+ /* v does not have to be zero padded. */
+ Monster_pos_add(B, &v);
+
+When adding a struct that is already little endian, presumably from an
+existing buffer, it can be cloned using:
+
+ Monster_pos_clone(B, &v);
+
+Clone assumes the source struct is both little endian and that padding
+is already zeroed (example ignores error handling), and `end_pe`
+does nothing.
+
+ *Monster_pos_start(B) = v;
+ Monster_pos_end_pe(B);
+
+There are several assignment types that convert between host (native)
+endianness and buffer endiannes. We use `pe` to indicate
+`protocol_endian` rather than just `le` for `little endian` because it
+allows us to change endianness to big endian in the the future and it
+more clearly states the intention. While big endian is not allowed in
+FlatBuffers, big endian structs may be useful in other network
+protocols - but it is not currently supported because it would force
+little endian platforms to support byte-swapping. The operations are:
+
+`assign_from_pe`, `assign_to_pe`, `copy`, `copy_from_pe`,
+`copy_to_pe`, `to_pe` and `from_pe`.
+
+All the copy operations takes a const pointer as source, and
+`to/from_pe` is just copy with same source and destination:
+
+ Vec3_t v, v2;
+ Vec3_assign_to_pe(&v2, 1, 2, 3);
+ Vec3_copy_from_pe(Vec3_clear(&v), &v2);
+ Vec3_to_pe(&v);
+
+`from_pe` means from little endian to native endian, end `to_pe`
+is the opposite. On little endian platforms all copy operations behave
+the same and only move fields, not padding. `to/from_pe` conversion
+will leave deprecated fields either as they were, or zero them because
+the operation may be skipped entirely on protocol endian native platforms.
+
+While struct fields cannot be deprecated officially, they are supported
+if the schema compiler is flagged to accept then. The struct fields are
+renamed and assigned 0 when using assign or copy, and assign / create has
+no argument for them.
+
+Because padding can carry noise and unintended information, structs
+should be cleared before assignment - but if used as a source to copy
+the padding is not copied so only the destation need to be zeroed.
+
+If a struct is nested, the assign operation includes all fields as if
+the struct was flattened:
+
+ typedef struct Plane Plane_t;
+ struct Plane {
+ Vec3_t direction;
+ Vec3_t normal;
+ };
+ Plane_t plane;
+ Plane_clear(&plane);
+ Plane_assign(&plane, 1, 2, 3, 7, 8, 9);
+
+Structs can also be created standalone, similar to tables and vectors,
+but FlatBuffers only support this when the struct is used as root.
+
+Assuming Vec3 is declared as root, a buffer only holding a Vec3 struct
+can be created using:
+
+ Vec3_create_as_root(B, 1, 2, 3);
+
+Important: do not store the above as a nested buffer - it would be
+missing the vector size field. If `Monster_playground` is a ubyte vector
+with `nested_flatbuffer` attribute, then
+`Monster_playground_start/end_as_root` may be used.
+
+Structs also support `start/end_as_root`. In this case `start` returns
+the struct pointer, and `end_pe_as_root` is supported:
+
+ Vec3_t *v;
+ v = Vec3_start_as_root(B);
+ v->x = 1, v->y = 2, v->z = 3;
+ Vec3_end_as_root(B);
+
+(Be careful with the different result codes since a tables `start_as_root`
+returns an integer result code where 0 is success while a struct returns
+a pointer that is null on failure.)
+
+The following also creates a buffer at top-level, but it may also be
+added as a nested buffer because the stack frame detects the nesting:
+
+ Vec3_t *v;
+ flatcc_builder_buffer_start(B);
+ v = Vec3_start(B);
+ v->x = 1, v->y = 2, v->z = 3;
+ flatcc_builder_buffer_end(B, Vec3_end(B));
+
+or
+ flatcc_builder_buffer_start(B);
+ ...
+ Monster_start(B);
+ flatcc_builder_buffer_start(B);
+ v = Vec3_start(B);
+ v->x = 1, v->y = 2, v->z = 3;
+ Monster_playground_add(B,
+ flatcc_builder_buffer_end(B, Vec3_end(B)));
+ flatcc_builder_buffer_end(B, Monster_end(B));
+
+or
+
+ flatcc_builder_buffer_ref_t nested_root;
+ flatcc_builder_buffer_start(B);
+ nested_root = Vec3_create_as_root(B, 1, 2, 3);
+ Monster_start(B);
+ Monster_playground_add(B, nested_root);
+ flatcc_builder_buffer_end(B, Monster_end(B));
+
+A `buffer_ref_t` can be used as `uint8_vec_ref_t` when the
+buffer is nested, and otherwise the reference cannot be used
+for anything other than testing for failure. The buffer content
+should match the type declared in a `nested_flatbuffers` attribute
+but it isn't enforced, and a root can be stored in any field of
+[ubyte] type.
+
+When `Monster_playground` is declared as nested:
+
+ ...
+ Monster_start(B);
+ Monster_playground_create_as_root(B, 1, 2, 3);
+ flatcc_builder_buffer_end(B, Monster_end(B));
+ ...
+
+Be aware that `Vec3_t` is for native updates while `Vec3_struct_t` is a const
+pointer to an endian encoded struct used in the reader interface, and actually
+also as source type in the clone operation.
+
+### Fixed Length Arrays in Structs
+
+As of flatcc 0.6.0 it is possible to have fixed length arrays as structs
+members. A fixed length array is equivalent to having a struct field repeated
+one or more times. The schema syntax is `name : [type:count];` similar to an
+ordinary struct field `name : type;`. The type is any type that can ba valid
+struct field type including enums and nested structs. The size cannot be 0 and
+the overall size is limited by the maximum struct size the array is contained
+within which is typically 65535 (2^16-1).
+
+For example, given the schema:
+
+ struct MyStruct {
+ counters:[int:3];
+ // char is only valid as a fixed length array type
+ name:[char:6];
+ }
+ table MyTable {
+ mystruct:MyStruct;
+ }
+
+The table can be created with:
+
+ ns(MyStruct_t) *x;
+ ns(MyTable_start_as_root(B));
+ x = ns(MyTable_mystruct_start(B));
+ x->counters[0] = 1;
+ x->counters[1] = 2;
+ x->counters[2] = 3;
+ strncpy(x->name, "Kermit", sizeof(x->name));
+ ns(MyTable_mystruct_end(B));
+ ns(MyTable_end_as_root(B));
+
+Note that char arrays are not zero terminated but they are zero padded, so
+strncpy is exactly the right operation to use when assigning to char arrays,
+at least when they do not contain embedded nulls which is valid.
+Char arrays are expected to be ASCII or UTF-8, but an application may use
+other encodings if this is clear to all users.
+
+With assignment:
+
+ int data[3] = { 1, 2, 3 };
+ ns(MyStruct_t) *x;
+ ns(MyTable_start_as_root(B));
+ x = ns(MyTable_mystruct_start(B));
+ // Careful: the name argument does not use strncpy internally
+ // so the source must be at least the expected length
+ // like other array arguments. Strings can have embedded nulls.
+ ns(MyStruct_assign(x, data, "Kermit");
+ ns(MyTable_mystruct_end(B));
+ ns(MyTable_end_as_root(B));
+
+To read a struct the pointer to the struct is retrieved first
+
+ int sum;
+ int i;
+ const char *name;
+ size_t name_len;
+ ns(MyTable_table_t) t;
+ ns(MyStruct_struct_t) x;
+
+ t = ns(MyTable_as_root(buf));
+ x = ns(MyTable_mystruct_get(t));
+ for (sum = 0, i = 0; i < ns(MyStruct_counters_get_len()); ++i) {
+ sum += ns(MyStruct_counters_get(x, i)) +
+ // char arrays are endian neutral, so we can use pointer access.
+ name = ns(MyStruct_name_get_ptr(x);
+ name_len = strnlen(name, ns(MyStruct_name_get_len()));
+ printf("Added counters from %.*s", name_len, name);
+ // char arrays can be accessed like other arrays:
+ // ns(MyStruct_name_get(x, i);
+ }
+
+An alternative to `strnlen` is strip trailing zeroes which will allow for
+char arrays embedded zeroes, but there is no direct support for this. The JSON
+printer uses this approach to shorten the printed char array string.
+
+The `_get` suffix can be ommitted in the above if the flatcc `-g` has not
+supplied to reduce the risk of name conflicts, but not for `_get_len` and
+`_get_ptr`.
+
+Note that it is not possible to have fixed length arrays as part of a table but
+it is possible to wrap such data in a struct, and it is also possible to have
+vectors of structs that contain fixed length arrays.
+
+
+## Nested Buffers
+
+These are discussed under Structs and Table sections but it is worth
+noting that a nested buffers can also be added as pe ubyte vectors
+which is probably the original intention with nested buffers. However,
+when doing so it can be difficult to ensure the buffer is correctly
+aligned. The untyped `flatcc_builder` has various options to deal with
+this, but with generated code it is better to create a nested buffer
+inline when suitable (with nested `buffer_start/end` or
+`mytable_myfield_create_as_root`) - for example a message wrapper with
+a union of tables holding buffer for a specific message type. In other
+cases the buffer may truly be created independently of the current
+buffer and then it can be added with controlled alignment using either
+the `flatcc_builder` api for full control, or the `nest` operation on
+nested table and struct fields:
+
+To create and add a ubyte vector with a higher alignment than ubytes
+single byte alignment, the following operation is available as an
+operation on a nested buffer field:
+
+ Monster_playground_nest(B, void *data, size_t size, uint16_t align);
+
+If alignment is unknown, it can be set to 0, and it will default to 8
+for nested table types, and to the struct alignment for struct buffers.
+
+Block alignment is inherited from the parent buffer so the child buffer
+ends up in its own set of blocks, if block alignment is being used. If
+the nested buffer needs a different block alignment, the `flatcc_builder`
+api must be used.
+
+All structs and tables have an `start/end/create_as_root` even if they
+are not referenced by any `nested_flatbuffers` field and they will
+create [ubyte] vectors containing a nested buffer but only [ubyte]
+fields with `nested_flatbuffers` attribute will dedicated
+`start/end/create_as_root` on the field name. Structs also have
+`end_pe_as_root`.
+
+
+## Scalars and Enums
+
+Scalars keep their original type names `uint8_t`, `double`, etc, but
+they get some operations similar to structs. These are contained in a
+namespace which by default is `flatbuffers_`, for example:
+
+ uint16_t *flatbuffers_uint16_to_pe(uint16_t *p);
+ uint16_t *flatbuffers_uint16_from_pe(uint16_t *p);
+ flatbuffers_bool_t *flatbuffers_bool_to_pe(flatbuffers_bool_t *p);
+ flatbuffers_bool_t *flatbuffers_bool_from_pe(flatbuffers_bool_t *p);
+
+These may be used freely, but are primarily present as an interface to
+the vector operations also defined for structs.
+
+Enums have similar definitions which may be used to convert endianness
+without being concerned with the underlying integer type, for example:
+
+ Color_enum_t *Color_to_pe(Color_enum_t *p);
+
+## Vectors
+
+Vectors can be created independently, or directly when updating a table - the
+end result is the same. Builder vector operations always reference element
+values by pointer, or by reference for offset types like tables and strings.
+
+ uint8_t v;
+ Monster_inventory_start(B);
+ v = 1;
+ flatbuffers_uint8_vec_push(B, &v);
+ v = 2;
+ flatbuffers_uint8_vec_push(B, &v);
+ v = 3;
+ flatbuffers_uint8_vec_push(B, &v);
+ Monster_inventory_end(B);
+
+or
+
+ flatbuffers_uint8_vec_ref_t inv;
+ uint8_t v;
+ flatbuffers_uint8_vec_start(B);
+ v = 1;
+ flatbuffers_uint8_vec_push(B, &v);
+ v = 2;
+ flatbuffers_uint8_vec_push(B, &v);
+ v = 3;
+ flatbuffers_uint8_vec_push(B, &v);
+ inv = flatbuffers_uint8_vec_end(B);
+ Monster_inventory_add(B, inv);
+
+Because it can be tedious and error-prone to recall the exact field
+type, and because the operations are not type safe (any kind of push
+would be accepted), some vector operations are also mapped to the field
+name:
+
+ uint8_t v;
+ Monster_inventory_start(B);
+ v = 1;
+ Monster_inventory_push(B, &v);
+ v = 2;
+ Monster_inventory_push(B, &v);
+ v = 3;
+ Monster_inventory_push(B, &v);
+ Monster_inventory_end(B);
+
+Note: vector operations on a type uses the `_vec_<operation>` syntax, for
+example `uint8_vec_push` or `Monster_vec_push` while operations that are mapped
+onto table field names of vector type do not use the `_vec` infix because it is
+not a type name, for example `Monster_inventory_push`.
+
+A slightly faster operation preallocates the vector:
+
+ uint8_t *v;
+ Monster_inventory_start(B);
+ v = Monster_inventory_extend(B, 3);
+ v[0] = 1, v[1] = 2, v[2] = 3;
+ v = Monster_inventory_extend(B, 2);
+ v[0] = 4, v[1] = 5;
+ Monster_inventory_end(B);
+
+Push just extends one element at time. Note that `extend` returns the
+pointer to the extended vector segment. The full vector can be accessed
+with `edit` and `reserved_len` between `start/end` (recalling that pointers
+cannot be reused across buffer calls):
+
+ uint8_t *v, i;
+ uint8_t data[] = { 1, 2 };
+ Monster_inventory_start(B);
+ Monster_inventory_push(B, &data[0]);
+ Monster_inventory_push(B, &data[1]);
+ v = Monster_inventory_edit(B);
+ for (i = 1; i < Monster_inventory_reserved_len(B); ++i) {
+ v[i] = v[i - 1] + v[i];
+ }
+ Monster_inventory_end(B);
+
+Note that the name `reserved_len` is to avoid confusion with
+`_vec_len` read operation. It also indicates that it is not the final
+size since it may change with `truncate/extend`.
+
+A vector can also contain structs. Let us extend the Monster example
+with a vector of positions, so we can have a breadcrumb trail:
+
+ Monster_breadcrumbs_start(B);
+ Vec3_vec_push_create(B, 1, 2, 3);
+ Vec3_vec_push_create(B, 3, 4, 5);
+ Monster_breadcrumbs_end(B);
+
+or
+
+ Monster_breadcrumbs_start(B);
+ Monster_breadcrumbs_push_create(B, 1, 2, 3);
+ Monster_breadcrumbs_push_create(B, 3, 4, 5);
+ Monster_breadcrumbs_end(B);
+
+or
+
+ Vec3_t *trails[2];
+ Monster_breadcrumbs_start(B);
+ trails = Monster_breadcrumbs_extend(B, 2);
+ Vec3_create(&trails[0], 1, 2, 3);
+ Vec3_create(&trails[1], 4, 5, 6);
+ Monster_breadcrumbs_end(B);
+
+The `vec_start/exttend/end/end_pe/create/create_pe/clone/slice` are
+translated into similar calls prefixed with the field name instead of
+`vector` and except for `start`, the calls also add the vector to the
+table if successful, for example:
+
+ uint8_t data[] = { 1, 2, 3 };
+ Monster_inventory_create(B, data, 3);
+ Monster_breadcrumbs_slice(B, some_other_breadcrumbs, 0, 10);
+
+Vector operations that are allowed between `vec_start` and
+`vec_end(_pe)` are also mapped. These are
+`vec_extend/append/truncate/edit/reserved_len`, and `push/push_create/push_copy`.
+`push_copy` ensures only valid fields are copied, not zero padding (or
+the unofficial deprecated fields).
+
+A struct `push_clone` is the same as a `push_copy` operation
+because structs are stored inline in vectors - with the
+exception of union vectors which have `push_clone` that does the
+right thing.
+
+The `add` call adds a vector created independently from the table field,
+and this is what is going on under the surface in the other calls:
+
+ Vec3_t x;
+ Vec3_vec_ref_t inv;
+
+ /* Clear any padding in `x` because it is not allocated by builder. */
+ Vec3_assign(Vec3_clear(&x), 3, 4, 5);
+ Vec3_vec_start(B);
+ Vec3_vec_push_create(B, 1, 2, 3);
+ Vec3_vec_push(B, &v);
+ inv = Vec3_vec_end(B);
+
+ Monster_breadcrumbs_add(B, inv);
+
+As always, a reference such as `inv` may only be used at most once, and
+should be used once to avoid garbage.
+
+Note that `Vec3_vec_start` would create an independent struct instead of a
+vector of structs. Also note that `vec_ref_t` is a builder specific
+temporary type while `vec_t` is intended as a const pointer to the first
+element in an existing buffer in little endian encoding with a size
+prefix (to be used with clone, for example).
+
+An existing Vec3 struct can also be pushed with `Vec3_push(B, &v)`. The
+argument must be zero padded. Because vectors are converted at the end,
+there is no `push_pe`, but a struct may be in little endian using push
+on all platforms if `vec_end_pe` is used at the end.
+
+A vector may also be created from an existing array:
+
+ uint8_t data[] = { 1, 2, 3 };
+ Monster_inventory_add(B, flatbuffers_uint8_vec_create(B, data, 3));
+
+This also applies to arrays of structs as long as they are properly zero
+padded. `create_pe` is similar but does not do any endian conversion,
+and is similar to `clone` except there are no header prefix.
+
+Likewise an existing vector with proper zero padding may be appended
+using the `extend` operation. The format must be native or little endian
+depending on whether `vec_end` or `vec_end_pe` is called at the end.
+
+All vectors are converted to little endian when the `end` command is
+called. `end_pe` prevents this from happening.
+
+`clone` and `slice` and can be used to copy an entire, or a partial
+array from an existing buffer. The pointer must be to the first vector
+element in little endian format, and it must have a size prefix and be
+aligned (like any flatbuffer vector). `slice` takes a base-0 index and
+a vector length where the result is truncated if the source is not
+large enough.
+
+ Monster_inventory_clone(B, v);
+
+or
+
+ Monster_inventory_add(flatbuffers_int8_clone(B, v);
+
+or
+
+ Monster_inventory_add(flatbuffers_int8_slice(B, v, 2, 4);
+
+or
+
+ Monster_inventory_slice(B, v, 2, 4);
+
+A vector of strings an be constructed as (`friends` is a string
+vector field that we just invented for the occasion):
+
+ flatbuffers_string_ref_t friend, *p;
+ Monster_friends_start(B);
+ friend = flatbuffer_string_create_str(B, "Peter Pan");
+ Monster_friends_push_create_str(B, "Shrek");
+ Monster_friends_push_create_str(B, "Pinnochio");
+ Monster_friends_push_create_str(B, "Pinnochio");
+ Monster_friends_push_create(B, "Hector", 6);
+ Monster_friends_push(friend);
+ p = Monster_friends_extend(B, 1);
+ *p = flatbuffers_string_create_str("Cindarella");
+ Monster_friends_push_start(B);
+ flatbuffers_string_append("The Little");
+ flatbuffers_string_append("Mermaid");
+ Monster_friends_push_end(B);
+ Monster_friends_end(B);
+
+Vectors and strings have a second argument to start, see also the `spawn` example
+below.
+
+Finally, vectors can contain tables. Table vectors are offset
+vectors just like string vectors. `push_start` pushes a new table and
+allows for updates until `push_end`. If we have a spawn vector of monsters in
+the Monster table, we can populate it like this:
+
+ Monster_spawn_start(B);
+ Monster_vec_push_start(B);
+ Monster_Hp_add(B, 27);
+ Monster_vec_push_end(B);
+ Monster_vec_push_create(B,
+ /* Approximate argument list for illustration only. */
+ &vec, 150, 80, name, inventory, Color_Red, Any_as_None());
+ Monster_spawn_end(B);
+
+The push operation has constructors `push_start/end/create` for both tables
+struct, and string elements. String elements also have
+`push_create_str/create_strn/clone/slice`. Structs also have
+`push_copy`. Between `push_start` and
+`push_end` the operations valid for the given table or string element can be
+used (typically `add` for tables, and `append` for strings).
+
+Instead of `Monster_vec_push_start` we can also uses
+`Monster_spawn_push_start` etc. - in this case the child type is the
+same as the parent, but using the field specific `push_start` ensures we
+get the right table element type.
+
+`Monster_spawn_push_start(B)` takes no length argument because it is a
+table element, while `Monster_friends_push_start(B)` because it is a
+string element (similar to a vector).
+
+`Monster_spawn_start(B)` should just be followed by push operations
+rather than following up with `Monster_spawn_extend(B, n)` because we
+risk loose references that can lead to crashes. But handled carefully
+it is possible:
+
+ Monster_vec_ref_t mvec;
+ Monster_spawn_start(B);
+ mvec = Monster_spawn_extend(B, 2);
+ mvec[0] = Monster_create(B, ...);
+ mvec[1] = Monster_create(B, ...);
+ Monster_spawn_end(B);
+
+We can also push a reference to an independently create monster table,
+all as seen before with strings.
+
+As of flatcc version 0.5.2 it is also possible to clone tables.
+Therefore we also have `push_clone` on vectors of tables.
+
+While the use of `extend` and `truncate` is possible with vectors of
+strings and tables, they should be used with care because the elements
+are references and will just end up as garbage if truncated. On the
+other hand, unused elements should be truncated as 0 elements in an
+offset vector is not valid.
+
+A vector of tables or strings can be created using an externally built
+array of references, for example:
+
+ Monster_ref_t monsters[20];
+ Monster_vec_ref_t mvec;
+ monsters[0] = Monster_create(B, ...);
+ ...
+ mvec = Monster_vec_create(B, monsters, 20);
+
+By convention, create calls bypass the internal stack when the endian
+format is otherwise compatible, and thus feed the emitter directly.
+This is not possible with table and string vectors because the
+references in the source vectors must be translated into offsets.
+Therefore these create calls are similar to start, append, end calls.
+There is an internal, but unexposed `flatcc_builder` version
+`create_offset_vector_direct` which destroys the source vector instead
+of allocating a stack copy.
+
+## Unions
+
+Unlike the C++ Flatbuffers library, we do not expose a separate union
+type field except via a small struct with a union of typed references
+and a type field. This struct is given to the create argument, and above
+it is zero initialized meaning default None.
+
+Unions can be created with value specific `start/end/create` calls. The add
+call is not specialized since it takes a union reference:
+
+
+ Monster_test_Weapon_start(B);
+ Weapon_rounds_add(B, 50);
+ Monster_test_Weapon_end(B);
+
+or
+
+ Monster_test_Weapon_create(B, 50);
+
+or
+
+ Monster_test_Weapon_add(B, Weapon_create(B, 50));
+
+or
+
+ Monster_test_Pickup_start(B);
+ Pickup_location_create(B, 0, 0, 17);
+ Pickup_hint_create_str(B, "Jump High!");
+ Monster_test_Pickup_end(B);
+
+or
+
+ Pickup_ref_t test;
+ Pickup_start(B);
+ Pickup_location_create(B, 0, 0, 17);
+ test = Pickup_end(B);
+ Monster_test_add(B, Any_as_Pickup(test));
+
+or
+
+ Any_union_ref_t test;
+ Pickup_start(B);
+ Pickup_location_create(B, 0, 0, 17);
+ /* test.Pickup = Pickup_end(B); no longer possible as of v0.5.0 */
+ test.value = Pickup_end(B); /* As of v0.5.1. */
+ test.type = Any_Pickup;
+ Monster_test_add(B, test);
+
+The following is valid and will not return an error, but also has no effect:
+
+ Monster_test_add(B, Any_as_NONE());
+
+
+_Note: the union structure has been changed for v0.5.0, and v0.5.1.
+Both unions and union vectors are now represented by a struct with the
+fields { type, value } in the low level interfaces. Before 0.5.0 only
+unions of tables were supported._
+
+
+### Union Vectors
+
+The `monster_test.fbs` schema has a field named manyany in the Monster
+table. It is vector of unions of type Any.
+
+We can create a vector using
+
+ Any_union_vec_ref_t anyvec_ref;
+
+ Any_vec_start(B);
+ Any_vec_push(TestSimpleTableWithEnum_create(B));
+ anyvec_ref = Any_vec_end(B);
+ Monster_manyany_add(anyvec_ref);
+
+A union can be constructed with type specific `_push` or `_push_create` operations:
+
+ Monster_manyany_start(B);
+ Monster_manyany_push(B, Any_as_TestSimpleTableWithEnum(ref));
+ Monster_manyany_end(B);
+
+ Monster_manyany_start(B);
+ Monster_manyany_TestSimpleTableWithEnum_push(B, ref);
+ Monster_manyany_end(B);
+
+ Monster_manyany_start(B);
+ Monster_manyany_TestSimpleTableWithEnum_push_create(B, args);
+ Monster_manyany_end(B);
+
+and other similar operations, much like other vectors.
+
+Note that internally `anyvec_ref` is really two references, one to type
+vector and one to a table vector. The vector is constructed a single
+vector of unions and later split into two before final storage. If it is
+necessary to create a union vector from a vector of tables and types,
+the low level builder interface has a `direct` call to do this.
+
+Union vectos generally use more temporary stack space because during
+construction because each element as a struct of type and reference
+which don't back as densely as a two separate tables. In addition the
+separated type and table vectors must be constructed temporarily. The
+finaly buffer result is resonably compatct since the type vector does
+not use much space. Unions will also be somewhat slower to construct,
+but not unreasonably so.
+
+
+### Unions of Strings and Structs
+
+_Note: as of v0.5.0 unions can also contain strings and structs in
+addition to tables. Support for these types in other languages may vary,
+but C++ does support them too._
+
+All union values are stored by reference. Structs that are not unions
+are stored inline in tables and cannot be shared but unions of struct
+type are stored by reference and can be shared. A union value is
+therefore always a reference. This is mostly transparent because the
+generated table field methods has `create/start/end` calls for each union
+value type and addition to `add`.
+
+To illustrate the use of these variation we use the Movie table from
+[monster_test.fbs]:
+
+ namespace Fantasy;
+
+ table Attacker {
+ sword_attack_damage: int;
+ }
+
+ struct Rapunzel {
+ hair_length: uint16;
+ }
+
+ struct BookReader {
+ books_read: int;
+ }
+
+ union Character {
+ MuLan: Attacker = 2, // Can have name be different from type.
+ Rapunzel = 8, // Or just both the same, as before.
+ Belle: Fantasy.BookReader,
+ BookFan: BookReader,
+ Other: string,
+ Unused: string = 255
+ }
+
+ table Movie {
+ main_character: Character;
+ antagonist: Character;
+ side_kick: Character;
+ cameo: Character;
+ characters: [Character];
+ }
+
+
+and the mixed type test case from [monster_test.c]:
+
+
+ nsf(Character_union_ref_t) ut;
+ nsf(Rapunzel_ref_t) cameo_ref;
+ nsf(Attacker_ref_t) attacker_ref;
+ nsf(BookReader_ref_t) br_ref;
+ nsf(BookReader_t *) pbr;
+ nsf(Movie_table_t) mov;
+
+ nsf(Movie_start_as_root(B));
+ br_ref = nsf(BookReader_create(B, 10));
+ cameo_ref = nsf(Rapunzel_create(B, 22));
+ ut = nsf(Character_as_Rapunzel(cameo_ref));
+ nsf(Movie_main_character_Rapunzel_create(B, 19));
+ nsf(Movie_cameo_Rapunzel_add(B, cameo_ref));
+ attacker_ref = nsf(Attacker_create(B, 42));
+ nsf(Movie_antagonist_MuLan_add(B, attacker_ref));
+ nsf(Movie_side_kick_Other_create_str(B, "Nemo"));
+ nsf(Movie_characters_start(B));
+ nsf(Movie_characters_push(B, ut));
+ nsf(Movie_characters_MuLan_push(B, attacker_ref));
+ nsf(Movie_characters_MuLan_push_create(B, 1));
+ nsf(Character_vec_push(B, nsf(Character_as_Other(nsc(string_create_str(B, "other"))))));
+ nsf(Movie_characters_Belle_push(B, br_ref));
+ pbr = nsf(Movie_characters_Belle_push_start(B));
+ pbr->books_read = 3;
+ nsf(Movie_characters_Belle_push_end(B));
+ nsf(Movie_characters_Belle_push(B, nsf(BookReader_create(B, 1))));
+ nsf(Movie_characters_Belle_push_create(B, 2));
+ nsf(Movie_characters_Other_push(B, nsc(string_create_str(B, "another"))));
+ nsf(Movie_characters_Other_push_create_str(B, "yet another"));
+ nsf(Movie_characters_end(B));
+ nsf(Movie_end_as_root(B));
+
+Note that reading a union of string type requires a cast which can be
+seen in the full test case in [monster_test.c].
+## Error Handling
+
+The API generally expects all error codes to be checked but the
+following table and vector operations will accept and return an error:
+
+- `add` null reference to table, vector, or string.
+- `push` null reference to table or string.
+- `buffer_end/create` null reference to root.
+
+This can simplify pushing or adding atomically created objects, for
+example by adding a cloned vector to table field.
+
+It is especially important to check start operations because the builder
+will not be in the expected stack frame context after failure and will
+not have reserved necessary internal memory, for example when adding a
+table field.
+
+On a server with reasonable amount of memory using the default
+allocator, and with an emitter that will not return errors, and when it
+can be expected that inputs will not exceed the size contraints of the
+flatbuffer data types, and if the api is being used correctly, then there
+are no reason for failure and error handling may be skipped. However,
+it is sometimes desireable for servers to restrict a single clients
+memory usage, and then errors are very likely unless the source data is
+already limited. As an opposite example, an embedded device sending
+small network packages using a fixed but large enough allocation pool,
+would be in total control and need not be concerned with any errors.
+
+
+
+## Type System Overview
+
+The generated methods for building buffers may look the same but
+have different semantics. For example `_clone` on a table field
+such as `Monster_enemy_clone` will actually create a table based
+on the content of a table in a another buffer, then add that
+table to the currently open table. But `Monster_clone` will
+create clone and just return a reference without adding the
+reference to any table. There is also `push_clone` which adds
+an element to an open vector. The same applies to many other
+operations.
+
+Basically there are
+the following different types of methods:
+
+- Methods on native flatbuffer types, such as
+ `flatbuffer_string_start`.
+- Methods on generated types types such as `Monster_start`
+- Methods on field members such as as `Monster_emeny_start`
+- Methods on vectors on vectors of the above such as
+ `flatbuffers_string_vec_start`, `Monster_vec_start`.
+ `Monster_inventory_vec_start`.
+- Slight adaptions for buffer roots and nested buffer roots.
+
+For unions and union vectors the story is more complex - and the
+api might need to be cleaned up further, but generally there are
+both union type fields, union value fields, and union fields
+representing both, and vectors of the same. In additional there
+are pseudo fields for each union member because `create` on a
+union does not make sense, but
+`Monster_myvariant_MyTable_create` does create and `MyTable`
+table and assigns it with the correct type to the field
+`Monster_myvariant_type` and `Monster_myvariant.
+
+
+## Cloning
+
+As of flatcc v0.5.2 it is also possible to clone tables, unions,
+vectors of tables, vectors of strings, and vectors of unions.
+Previously many operations did have a clone or a `push_clone`
+operator, but these were all raw byte copies. Table cloning and
+union cloning is signficantly more involved as it a simple copy
+will not work due to stored references, possible sharing of
+references and because the required alignment of table is hard
+to reason about without building a new table. Unions and union
+vectors are even more difficult.
+
+That said, cloning is now implemented for all relevant data
+types.
+
+All clone operations expect the content to originate from
+another finalized buffer. For scalars and structs there are
+copy operations that are almost the same as clone - they both
+avoid endian conversion.
+
+Structs have a special case with clone and copy: Whenever a
+struct is stored inline in the desitination buffer, it behaves
+like copy. Whenever the destination is a buffer root, or a union
+member, the result is a reference to an independent memory
+block. When calling clone on a struct type the destination is
+unknown and a indendpendent reference is created. If this is not
+the intention a `copy` operation can be used. When used field
+methods the destination type is known at the right thing will
+happen.
+
+Cloning a table will, by default, expand any shared references
+in the source into separate copies. This is also true when
+cloning string vectors, or any other data that holds references.
+Worst case this can blow up memory (which is also true when
+printing JSON from a buffer).
+
+It is possible to preserve the exact DAG structure when cloning.
+It may not worthwhile for simple use cases but it goes as
+follows:
+
+The builder has a pointer to a `flatcc_refmap_t` object. This is
+a fairly small stack allocated object that implements a
+hashtable. By default this pointer is null, and we have the
+above mentioned expansion. If it is not null, each newly cloned
+object will have its reference stored in the refmap. The next
+time the same object is cloned, the existing reference will be
+taken from the refmap instead. See source comments in
+`flatcc_refmap.h` and `flatcc_builder.h`, and `monster_test.c`
+clone tests.
+
+Note that, for example, it might be relevant to preserve DAG
+structure when cloning one object with all its sub-objects, but
+if it is cloned a second time, a new copy is desired still while
+preseving the inner DAG structure. This can be done by working
+with multiple refmaps and simple swapping them out via
+`flatcc_builder_set_refmap`. It is also possible to add
+references manually to a refmap before cloning.
+
+Warning: the refmap MUST not hold any foreign references when
+starting a nested root clone or when cloning inside a nested
+buffer that has been started but not ended because it is
+invalid to share references between buffers and there are no
+safety checks for this.
+
+
+## Picking
+
+Picking is a method that is related to clone and also introduced
+with flatcc 0.5.2. A pick method is only defined on a table
+field or a struct field. Instead of taking an a read reference
+of same type as the field, it takes a reference to to the same
+container type (table or struct). Essentially pick means: find
+myself in the other table, clone me, and and me to the new table
+which is currently open. So clone takes an entire table where
+pick takes a single field. Table cloning is implemented as a
+sequence of pick method, one for each field as can be seen in
+the generated builder source. A pick operation does nothting if
+the field is not set. Pick also works with refmaps because it
+does an internal clone operation. In the generated code, only
+clone on types will use the refmap but other clone and pick
+operations do depend on these type clone methods.
+
+
+## Sorting Vectors
+
+Vectors can be sorted, but not by the primary builder interface because:
+
+String and table elements cannot be accessed after they have been
+emitted. The emitter can do all sorts of async operations other than
+actually building a buffer, for example encrypting blocks and / or send
+partial buffers over the network. Scalars could be sorted, but the most
+efficient way of emitting vectors does not create a temporary vector but
+emits the source directly when endianess allows for it. Less
+significant, the buffer producer is likely busy processing content and /
+or on a resource constrained device. Altogether, it is much simpler to
+not support sorting at this interface level.
+
+To understand how sorting is implemented, lets first look at how an
+already sorted vector can be searched:
+
+Every vector of string, scalar and enum element types have a `find`
+operation in the reader interface that performs a binary seach. Every
+vector of table and struct elements have a `find_by_<field_name>` iff
+there is a key attribute on at least one top-level scalar, enum or
+string field type. FlatBuffers do not officially allow for multiple key
+attributes, but if enabled, there will by a `find_by` operation for
+every keyed element field. In addition there is a `find` operation that
+maps to the first keyed field.
+
+The read interface returns a vector type, which is a const pointer, when
+accessing a table field of vector type. The find operation takes such a
+vector as first argument, and a key as second. Strings have variations
+to allow for keys with a given length (similar to strcmp vs strncmp).
+
+This leads us to the sort interface:
+
+Every `find` and `find_by` operation has a matching `sort` and `sort_by`
+operation table and struct vectors maps `sort` to the first keyed
+`sort_by` operation. The sort operation takes a non-const vector which
+has the type name suffix `_mutable_vec_t`. These
+vectors are not available via the reader interface and must be cast
+explicitly from `_vec_t` to `_mutable_vec_t`. When this is done, the
+vector can be sorted in-place in the buffer without any memory
+allocation and without any recursion.
+
+
+
+If the namespace is
+`flatbuffers`, a string vector is sorted by:
+
+ flatbuffers_string_vec_t vec;
+ vec = ...;
+ `flatbuffers_string_vec_sort((flatbuffers_string_mutable_vec_t)vec)`
+
+Scalar and enum vectors have similar inline sort operations, for
+example:
+
+ flatbuffers_uint8_vec_sort(flatbuffer_uint8_mutable_vec_t vec);
+
+For vectors of tables or structs the sort function is named by the key
+field. Assuming the Monster table has a key attribute on the `Hp` field,
+the following sort operation is available:
+
+ MyGame_Example_Monster_vec_t monsters;
+ monsters = ...;
+ MyGame_Example_Monster_vec_sort_by_Hp(
+ (MyGame_Example_Monster_mutable_vec_t)monsters);
+
+Note: this is the reader interface. Any kind of `ref_t` type used by the
+builder do not apply here. (Advanced: if an emitter builds a buffer, the
+ref type can be used to find the actual vector pointer and then it can
+be sorted by casting the pointer to a vector, even if the buffer isn't
+finished).
+
+Multiple keys per table or struct is an optional feature. Each key will
+have its own sort and find function similar to the above. The first key
+also has the shortcut:
+
+ MyGame_Example_Monster_vec_sort(m);
+
+The current implementation uses heap sort which is nearly as fast as
+quicksort and has a compact implementation that does not require
+recursion or external memory and is robust against DOS attacks by having
+worst case O(n log n). It is, however, not a stable sort. The sort
+assumes struct have a reasonable size so swap operations can be done
+efficiently. For large structs a decicated sort operation building an
+external index vector would be better, but this is not supported.
+
+Note that a DAG is valid so there can be multiple vectors referring to
+the same table elements, and each can be sorted by a different key.
+
+The find operations are stable meaning they always return the lowest
+index of any matching key or `flatbuffers_not_found` which is larger
+than any other index.
+
+### Dangers of Sorting
+
+If a buffer was received over, say, an untrusted network the buffer
+should be verified before being accessed. But verification only makes it
+safe to read a buffer, not to modify a buffer because for example two
+vectors can be crafted to overlap each other without breaking any
+verification rules.
+
+Thus, sorting is intended to be done shortly after the buffer is
+constructed while it can still be trusted.
+
+Using find on a buffer that is supposed to be sorted, but isn't, can
+yield unexpected search results, but the result will always be a one
+element in the vector being searched, not a buffer overrun.
+
+### Scanning
+
+Some vectors can be sorted by different keys depending on which version
+version of `_sort_by` is being used. Obviously `_find_by` must match the
+sorted key.
+
+If we need to search for a key that is not sorted, or if we simply do
+not want to sort the vector, it is possible to use scanning operations
+instead by using `_scan` or `_scan_by`. Scanning is similar to find
+except that it does a linear search and it supports scanning from a
+given position.
+
+More information on scanning in the
+[README](https://github.com/dvidelabs/flatcc#searching-and-sorting)
+file, and in the [monster_test.c] test file.
+
+
+## Example of different interface type users
+
+A resource constrained microcontroller is building flatbuffers from
+sensor data using an emitter that sends UDP packages of the flatbuffer
+as soon as enough data is ready. A server reassembles the packages or
+discards them if any UDP package was lost. One the package is assembled,
+the server sorts specific vectors such as temparture levels in the buffer
+before it sends the buffer upstream to a storage service through a
+TCP/IP connection. The analyzers perform taks such as detecting
+abnormal temparature readings based on the sorted vector data.
+
+In the above example, the original sensor devices are not interested in
+the reader interface nor the sort interface. While the sort and find
+operations may be available, it is dead inline code that does not
+inflate the binary codes image size, but the libflatccrt library is
+linked in. The collecting server is not interested in the builder
+interface and does not link with the `libflatccrt` library but uses
+both the inline functions of the reader intrface and the sort interface.
+The upstream data storage service uses no interface at all since it
+treats the buffers as binary blobs in a database indexed by device and
+time. The end users only use the read only interface to visualize and
+analyze and has no need for the builder or the sort interface.
+
+
+## Special Emitters
+
+An emitter only need to implement one function to replace or wrap the
+default emitter. See [flatcc_builder.h] on `flatcc_builder_emit_fun` for
+details, and also `emit_test.c` for a very simple custom emitter that
+just prints debug messages, and [flatcc_emitter.h].
+
+When adding padding `flatcc_builder_padding_base` is used as base in iov
+entries and an emitter may detect this pointer and assume the entire
+content is just nulls. Usually padding is of limited size by its very
+nature so the benefit of handling this is also limited, but it, or a
+similar user provided constants can be used for similar purposes:
+
+When creating a vector in a single operation from an external C-array,
+no copying takes place on the internal builder stack. Therefore it is
+valid to provide a null pointer or a valid array such as
+`flatcc_builder_padding_base` that is is too small for the given length,
+provided that the emitter is aware of it. This in turn can be used to
+allocate space in the emitters internal datastructure so the vector can
+be filled after the fact if so desired. Pointer tagging may be another
+way to communicate special intent. Be aware that only `create` calls
+support this - any `append`, `start/end` or other dynamic operation will
+require valid inpout and will stack allocate temporary space.
+
+Emitters always receive a small table of iov entries that together form
+a single object including necessary headers and padding, for example a
+vector, a string, a nested buffer header, or a vtable. This is
+guaranteed by the api, but there is no coordination to provide details
+about which call is in order to keep the interface simple and fast. If
+this is desired the user must hint the emitter out of band before
+calling the relevant build operation. This can also be one indirectly by
+setting `user_state` in the emitter and have the emitter inspect this
+setting.
+
+When adding vectors piecemeal using `append` or similar as opposed to
+zero or less than zero copy approach above, the memory cost is obviously
+higher, but unless the individual objects grow large, the stack will
+operate in hot cpu cache so the bandwidth from main memory to cpu and
+back will not necessarily double. If the stack grows large it may also
+be worthwhile trimming the stack with a custom allocator and custom
+builder reset between buffers to reduce stack size and initialization
+overhead.
+
+[monster_test.c]: https://github.com/dvidelabs/flatcc/blob/master/test/monster_test/monster_test.c
+[flatcc_builder.h]: https://github.com/dvidelabs/flatcc/blob/master/include/flatcc/flatcc_builder.h
+[flatcc_emitter.h]: https://github.com/dvidelabs/flatcc/blob/master/include/flatcc/flatcc_emitter.h
+[monster_test.fbs]: https://github.com/dvidelabs/flatcc/blob/master/test/monster_test/monster_test.fbs
diff --git a/doc/eclectic.fbs b/doc/eclectic.fbs
new file mode 100644
index 0000000..26ecd85
--- /dev/null
+++ b/doc/eclectic.fbs
@@ -0,0 +1,12 @@
+ namespace Eclectic;
+
+ enum Fruit : byte { Banana = -1, Orange = 42 }
+ table FooBar {
+ meal : Fruit = Banana;
+ density : long (deprecated);
+ say : string;
+ height : short;
+ }
+ file_identifier "NOOB";
+ root_type FooBar;
+
diff --git a/doc/flatcc-help.md b/doc/flatcc-help.md
new file mode 100644
index 0000000..b9efc20
--- /dev/null
+++ b/doc/flatcc-help.md
@@ -0,0 +1,106 @@
+```
+flatcc FlatBuffers schema compiler for C by dvide.com
+version: 0.5.2-pre
+usage: flatcc [options] file [...]
+options:
+ --reader (default) Generate reader
+ -c, --common Generate common include header(s)
+ --common_reader Generate common reader include header(s)
+ --common_builder Generate common builder include header(s)
+ -w, --builder Generate builders (writable buffers)
+ -v, --verifier Generate verifier
+ -r, --recursive Recursively generate included schema files
+ -a Generate all (like -cwvr)
+ -g Use _get suffix only to avoid conflicts
+ -d Dependency file like gcc -MMD
+ -I<inpath> Search path for include files (multiple allowed)
+ -o<outpath> Write files relative to this path (dir must exist)
+ --stdout Concatenate all output to stdout
+ --outfile=<file> Like --stdout, but to a file.
+ --depfile=<file> Dependency file like gcc -MF.
+ --deptarget=<file> Override --depfile target like gcc -MT.
+ --prefix=<prefix> Add prefix to all generated names (no _ added)
+ --common-prefix=<prefix> Replace 'flatbuffers' prefix in common files
+ --schema Generate binary schema (.bfbs)
+ --schema-length=no Add length prefix to binary schema
+ --verifier Generate verifier for schema
+ --json-parser Generate json parser for schema
+ --json-printer Generate json printer for schema
+ --json Generate both json parser and printer for schema
+ --version Show version
+ -h | --help Help message
+
+This is a flatbuffer compatible compiler implemented in C generating C
+source. It is largely compatible with the flatc compiler provided by
+Google Fun Propulsion Lab but does not support JSON objects or binary
+schema.
+
+By example 'flatcc monster.fbs' generates a 'monster.h' file which
+provides functions to read a flatbuffer. A common include header is also
+required. The common file is generated with the -c option. The reader
+has no external dependencies.
+
+The -w (--builder) option enables code generation to build buffers:
+`flatbuffers -w monster.fbs` will generate `monster.h` and
+`monster_builder.h`, and also a builder specific common file with the
+-cw option. The builder must link with the extern `flatbuilder` library.
+
+-v (--verifier) generates a verifier file per schema. It depends on the
+runtime library but not on other generated files, except other included
+verifiers.
+
+-r (--recursive) generates all schema included recursively.
+
+--reader is the default option to generate reader output but can be used
+explicitly together with other options that would otherwise disable it.
+
+All C output can be concated to a single file using --stdout or
+--outfile with content produced in dependency order. The outfile is
+relative to cwd.
+
+-g Only add '_get' suffix to read accessors such that, for example,
+only 'Monster_name_get(monster)` will be generated and not also
+'Monster_name(monster)'. This avoids potential conflicts with
+other generated symbols when a schema change is impractical.
+
+-d generates a dependency file, e.g. 'monster.fbs.d' in the output dir.
+
+--depfile implies -d but accepts an explicit filename with a path
+relative to cwd. The dependency files content is a gnu make rule with a
+target followed by the included schema files The target must match how
+it is seen by the rest of the build system and defaults to e.g.
+'monster_reader.h' or 'monster.bfbs' paths relative to the working
+directory.
+
+--deptarget overrides the default target for --depfile, simiar to gcc -MT.
+
+--schema will generate a binary .bfbs file for each top-level schema file.
+Can be used with --stdout if no C output is specified. When used with multiple
+files --schema-length=yes is recommend.
+
+--schema-length adds a length prefix of type uoffset_t to binary schema so
+they can be concatenated - the aligned buffer starts after the prefix.
+
+--json-parser generates a file that implements a fast typed json parser for
+the schema. It depends on some flatcc headers and the runtime library but
+not on other generated files except other parsers from included schema.
+
+--json-printer generates a file that implements json printers for the schema
+and has dependencies similar to --json-parser.
+
+--json is generates both printer and parser.
+
+The generated source can redefine offset sizes by including a modified
+`flatcc_types.h` file. The flatbuilder library must then be compiled with the
+same `flatcc_types.h` file. In this case --prefix and --common-prefix options
+may be helpful to avoid conflict with standard offset sizes.
+
+The output size may seem bulky, but most content is rarely used inline
+functions and macros. The compiled binary need not be large.
+
+The generated source assumes C11 functionality for alignment, compile
+time assertions and inline functions but an optional set of portability
+headers can be included to work with most any compiler. The portability
+layer is not throughly tested so a platform specific test is required
+before production use. Upstream patches are welcome.
+```
diff --git a/doc/json_parser_design.md b/doc/json_parser_design.md
new file mode 100644
index 0000000..3d7cc6a
--- /dev/null
+++ b/doc/json_parser_design.md
@@ -0,0 +1,138 @@
+# JSON Parser Design
+
+The overall principle of the json parser is as follows:
+
+`flatcc/flatcc_json.h` contains functions to parse json primitives
+and type conversions, and a generic json parser to skip unrecognized
+fields.
+
+For each table all known fields are sorted and a trie is constructed.
+After recognizing a json object, each member name is read partially
+and matched against the trie, and more of the field name is read as
+demanded by the trie and a match or rejection is decided.
+
+The member name is read as a single 64 bit word read in big endian
+format (zero padded at the end of file). Big endian makes the first
+character in the name most significant and allows the word to contain
+trailing "garbage". The efficiency depends on unaligned 64-bit reads
+with a fast byteswapping operation, but there is a fall-back that
+works in all cases via an unaligned read support function.
+
+The trie is constructed by sorting all field names and taking a
+median split. If the split name has preceeding name which is a strict
+prefix, that name is chosen as split instead. This is important
+because words are read with trailing "garbage". If the median name is
+longer than a 64 bit word, it is treated specially where a match
+triggers reading more data and repeating the process.
+
+The median is tested for less than, but not equality. Each side if
+the choice uses a new median like above, until there is only one
+option left. This option is then tested for exact match by masking
+out any trailing data. If the match succeeds the input member name
+must be checked for closing `"` or other valid termination if
+allowing unquoted names.
+
+Note that this match only visits data once, unlike a hash table where
+a lookup still requires matching the name. Since we expect the number
+of field names per table to be reasonable, this approach is likely
+faster than a hash table.
+
+If the match fails, then an error can be generated with a "unknown
+field" error, or the field can be consumed by the generic json
+parser.
+
+When a member name is successfully matched againts a known field, the
+member value is expected to be of a primitive type such as a string
+or an integer, the value is parsed using a json support parser and
+type converter which may issue overflow and format errors. For
+example, integers won't accept decimal points and ubyte fields won't
+accept integers of value 256 or higher. If the parse was successful
+the member is added the current flatbuffer table opened at start of
+the json object. In fact, the value is parsed directly into the
+flatbuffer builders entry for the field.
+
+If a scalar field fails to parse its value, a second attempt is
+done parsing the value as a symbolic value. This parse is similar
+to parsing member names but uses a global set of known constants
+from enumerations. If this also fails, the field is invalid and
+an error is generated.
+
+When the field is parsed, comma is detected and the process is
+repeated. It may fail on duplicate fields because the builder will
+complain.
+
+Once a closing bracket is matched, the table is closed in the
+builder.
+
+In the above we discussed tables, but structs work the same with the
+exception that any fields not present are simply set to zero and it
+is not checked. Nor are duplicate fields checked. This is to avoid
+allocating extra space for something that isn't very important.
+
+Complex member types require more consideration. We wish to avoid a
+recursive descent parser because it imposes limits on nesting depth,
+but we want to have a composable parser. This leads to solutions such
+as a trampoline, returning a parser function for the matched field
+which is called by a driver function, before resuming the current
+parser. We wish to avoid this as it adds overhead, especially in
+inlining.
+
+To avoid trampolines we compile all tables and structs into one large
+function where each type can be reached by a goto label or a switch
+statement. We can then use a variant of Simon Tathams famous
+co-routine by duff device to abort and resume the current parse. We
+still need a stack to track return state.
+<http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html>
+
+Outside the giant state machine we provide wrappers so we can parse
+each type from a clean call interface to parse root type.
+
+The above solution has one major problem: we apparently have to
+include types from included schema as well. But we don't have to it
+turns out: Each included schema is closed and can never have a member
+type in our current schema. Therefore mutual recursion is not
+possible. This means we can afford to use function call recursion
+into included parsers with recursion depth bounded by the number of
+included schema.
+
+Spaces are optimized by first testing to see if there is a character
+above space and exit immediately if so. Otherwise, if unaligned
+access is valid and the buffer holds at least 16 bytes, we descend
+hierarchically to test ever small unligned loads against a word of
+spaces. This loops as long as 2x64 bit words can be matched to a word
+of pure spaces (ASCII 0x20). If there is any control character, or
+if we reach near the end or, or if unaligned access is not available,
+we bail out to a standard character parsing loop that also does line
+counting.
+
+A final, and major issue, is how to handle unions:
+
+Googles `flatc` compiler makes the assumtion that the union type
+field appear before the union table field. However, this order cannot
+be trusted in standard JSON. If we were to sort the fields first
+or prescan for type, it would complete ruin our parsing strategy.
+
+To handle unions efficiently we therefore require either that union
+fields appear before the union table, and that the associated
+union table is present before another unions type field, or we
+require that the member name is tagged with its type. We prefer
+the tagged approach, but it is not compliant with `flatc v1.2`.
+
+By using a tag such as `"test_as_Monster"` instead of just "test",
+the parser can treat the field as any other table field. Of course
+there is only allowed to be one instance, so `"test_as_Weapon"`
+in the same object would not be allowed. To the parser it just
+triggers a duplicate field error.
+
+In addition, the `"test_type"` field is allowed to appear anywhere,
+or to be absent. If it is present, it must have a type that
+matches the tagged member, unless it is of type NONE. NONE may
+also be represented as `"test_as_NONE": null`.
+
+While the tagged approach has no use of the type field, it may still
+be very useful to other JSON consumers so they can know what tagged
+field to look up in the JSON object.
+
+The parser handles the type field by setting the type field in the
+flatcc builder table and checking if it is already set when
+seeing a tagged or untagged union table field.
diff --git a/doc/security.md b/doc/security.md
new file mode 100644
index 0000000..52e0515
--- /dev/null
+++ b/doc/security.md
@@ -0,0 +1,272 @@
+# Security Considerations
+
+This sections covers questions such as when is it safe to access a
+buffer without risking access violation, buffer overruns, or denial of
+service attacks, but cannot possibly cover all security aspects.
+
+## Reading Buffers
+
+When reading a buffer you have to know the schema type before
+reading and it is preferable to know the size of the buffer although not
+strictly required. If the type is not known, the `file_type`, aka buffer
+type, can be checked. This no guarantee due to collisions with other
+data formats and because the identifier field may be absent or
+misleading. The identifier therefore works best on buffers that can be
+trusted.
+
+If a buffer cannot be trusted, such as when receiving it over a public
+network, it may be the case that buffer type is known, but it is not
+known if someone uses an incorrect implementation of FlatBuffers, or if
+the buffer has somehow been corrupted in transit, or someone
+intentionally tampered with the buffer. In this case the buffer can be
+verified. A verification does not prove that the buffer has the correct
+type, but it does prove that it is safe to read (not write) from the
+buffer. The buffer size must be known in order to verify the buffer. If
+the buffer has a wrong type, but still (unlikey but possible) passes
+verification, then unexpected data may be read from the buffer, but it
+will not cause any crashes when using the API correctly.
+
+It is preferable to know the required alignment of a buffer which isn't
+trivially available unless retrieved from the builder when the buffer is
+created. The buffer alignment can be deduced from the schema.
+
+On many systems there is no real problem in accessing a buffer
+unaligned, but for systems where it matters, care must be taken because
+unaligned access can result in slow performance or access violations.
+Even on systems where alignment matters, a standard malloc operation is
+often sufficient because it normally aligns to the largest word that
+could cause access violations when unaligned. For special use case such
+as GPU memory access more alignment may be needed and FlatBuffers
+support higher alignments in the schema. Portable `aligned_alloc` and
+`aligned_free` support methods are available to help allocate memory with
+sufficient alignment. Because compile time flags can change between
+compilation of the runtime library and the application,
+`flatcc_builder_aligned_free` ensures a consistent deallocation method
+for aligned buffers allocated by the runtime library.
+
+A verifier for C requires the buffer to placed in aligned memory and it
+will fail if the buffer content is not properly aligned relative to the
+buffer or to an absolute memory address regardless of whether the
+current systems requires alignment or not. Therefore a buffer verified
+on one system is safe to use on all systems. One could use this fact to
+sign a buffer, but this is beyond the scope of FlatBuffers itself, and
+verifying a signature is likely much slower than re-verifying a buffer
+when a verifier is available.
+
+Note: it would be helpful if the verifier allowed verification only
+relative to the buffer start instead of requiring the absolute addresses
+to be aligned. This would allow verification of buffers before copying
+them out of unaligned locations in network buffers and also allow
+subsequent reading of such buffers without copying iff the system
+supports unaligned access. However, the verifier does not currently
+support this.
+
+It is not always safe to verify a buffer. A buffer can be constructed to
+trigger deep nesting. The FlatCC verifier has a hard coded non-exact
+limit of about 100 levels. This is to protection stack recursion. If the
+limit is exceeded, the verifier will safely fail. The limit can be
+changed with a compile time flag. If the limit is too permissive a
+system may run into stack overflow, but it is unlikely on most systems
+today. Typical application code may have similar recursive access
+functions. Therefore it is likely that recursion is safe if the verifier
+succeeds but it depends on the application.
+
+A buffer can point to the same data from multiple places. This is known
+as a DAG. The verifier rejects cycles that could lead to infinite loops
+during application traversal but does permit DAGs. For normal use DAGs
+are safe but it is possible to maliciously construct a buffer with a
+long vector where all elements points to a table that also has a vector
+of a similar nature. After a few levels, this can lead to a finite but
+exponentially large number of places to visit. The current FlatCC
+verifier does not protect against this but Googles flatc compiler has a
+verifier the limits the number of visited tables.
+
+When reading a buffer in C no memory allocation takes place after the
+buffer has initially been placed in memory. For example, strings can be
+read directly as C strings and strings are 0-terminated. A string might
+contain embedded 0 bytes which is not strictly correct but permitted.
+This will result in a shorter string if used naively as a C string
+without reading the string length via the API but it will still be safe.
+Other languages might have to allocate memory for objects such as
+strings though.
+
+A field can generally be absent. Scalar fields are always safe to
+access, even if they are absent, because they have a default value that
+will be returned. Tables, Vectors, Strings, and Structs may return null
+when a field is absent. This is perfectly valid but if the application
+does not check for null, this can lead to an access violation.
+
+A field can marked 'required' in the schema. If it is required, it will
+never return null unless the buffer is invalid. A verifier will detect
+this. On a practical note some FlatBuffer builders might not enforce the
+required field and readers do not always verify buffers before access
+(nor should they have to) - therefore an application is advised to check
+return values even on required fields unless the buffer is entirely
+trusted.
+
+If a buffer is verified, it is safe to access all vector elements up to
+its size. Access of elements via API calls do not necessarily check for
+out of bounds but some might.
+
+A buffer may also be encoded in big endian format. This is not standard,
+but FlatCC supports for systems that are primarily big endian. The
+buffer identifier will usually detect the difference because the
+identifier will be byte swapped. A reader therefore need to be aware of
+this possiblity, but most often this is not a concern since standard
+FlatBuffers are always little endian. The verifier will likely fail an
+unpexcted endian encoding but at least make it safe to access.
+
+
+## Thread Safety
+
+There is no thread safety on the FlatBuffers API but read access does
+not mutate any state. Every read location is a temporary variable so as
+long as the application code is otherwise sane, it is safe read a buffer
+from multiple threads and if the buffer is placed on cache line
+alignment (typically 64 or 128 bytes) it is also efficient without false
+sharing.
+
+A verifier is also safe to use because it it only reads from a buffer.
+
+A builder is inherently NOT safe for multihreaded access. However, with
+proper synchronization there is nothing preventing one thread from doing
+the grunt work and another putting the high level pieces together as
+long as only one thread at a time is access the builder object, or the
+associated allocator and emitter objects. From a performance perspective
+this doesn't make much sense, but it might from an architectural
+perspective.
+
+A builder object can be cleared and reused after a buffer is constructed
+or abandoned. The clear operation can optionally reduce the amount of
+memory or keep all the memory from the previous operation. In either
+case it is safe for new thread to use the builder after it is cleared
+but two threads cannot use the builder at the same time.
+
+It is fairly cheap to create a new builder object, but of course cheaper
+to reuse existing memory. Often the best option is for each thread to
+have its own builder and own memory and defer any sharing to the point
+where the buffer is finished and readable.
+
+
+## Schema Evolution
+
+Accessing a buffer that was created by a more recent of a FlatBuffers
+schema is safe iff the new version of the schema was created according
+the guidelines for schema evolution - notably no change of field
+identifiers or existing enum values and no removal or deprecation of
+required fields. Googles flatc tool can check if a new schema version is
+safe.
+
+Fields that are not required but deprecated in a new version will still
+be safe to access by old version but they will likely read default or
+null values and should be prepared for this.
+
+
+## Copying or Printing Buffer Content
+
+Even if it is safe to read a buffer, it is not safe to copy or even
+print a buffer because a DAG can unfold to consume much more output
+space than the given input. In data compression this is known as a Zip
+Bomb but even without malicious intent, users need to aware of the
+potential expansion. This is also a concern when printing JSON.
+
+A table also cannot be trivially copied based on memory content because
+it has offsets to other content. This is not an issue when using any
+official API but may be if a new buffer is attempted to be constructed
+by Frankenstein from parts of existing buffers.
+
+Nested buffers are not allowed to share any content with its parents,
+siblings or child buffers for similar reasons.
+
+The verifier should complain if a buffer goes out if bounds.
+
+
+## Modifying Buffers
+
+It is not safe to modify buffers in-place unless the buffer is
+absolutely trusted. Verifying a buffer is not enough. FlatC does not
+provide any means to modify a buffer in-place but it is not hard to
+achieve this is if so desired. It is especially easy to do this with
+structs, so if this is needed this is the way to do it.
+
+Modifying a buffer is unsafe because it is possible to place one table
+inside another table, as an example, even if this is not valid. Such
+overlaps are too expensive to verify by a standard verifier. As long as
+the buffer is not modified this does not pose any real problem, but if a
+field is modified in one table, it might cause a field of another table
+to point out of bounds. This is so obvious an attack vector that anyone
+wanting to hack a system is likely to use this approach. Therefore
+in-place modification should be avoided unless on a trusted platform.
+For example, a trusted network might bump a time-to-live counter when
+passing buffers around.
+
+Even if it is safe to modify buffers, this will not work on platforms
+that require endian conversion. This is usually big endian platforms,
+but it is possible to compile flatbuffers with native big endian format
+as well.
+platforms unless extra precautions are taken. FlatCC has a lot of
+low-level `to/from_pe` calls that performs the proper
+
+
+## Building Buffers
+
+A buffer can be constructed incorrectly in a large number of ways that
+are not efficient to detect at runtime.
+
+When a buffer is constructed with a debug library then assertions will
+sometimes find the most obvious problems such as closing a table after
+opening a vector. FlatCC is quite permissive in the order of object
+creation but the nesting order must be respected, and it cannot type
+check all data. Other FlatBuffer builders typically require that child
+objects are completed created before a parent object is started. FlatCC
+does not require this but will internally organize objects in a
+comptible way. This removes a number of potential mistakes, but not all.
+
+Notably a table from a parent object or any other external reference
+should not be used in a nested buffer.
+
+It is a good idea to run a verifier on a constructed buffer at least
+until some confidence has been gained in the code building buffers.
+
+If a buffer needs to be constructed with sorted keys this cannot be done
+during construction, unlike the C++ API because the builder allocates as
+little memory as possible. Instead the reader interface supports a
+mutable cast for use with a sort operation. This sort operation must
+only be used on absolutely trusted buffers and verification is not
+sufficient if malicous overlaps can be expected.
+
+The builder will normally consume very little memory. It operates a few
+stacks and a small hash table in additional to a circular buffer to
+consum temporary buffer output. It is not possible to access constructed
+buffer objects buffer the buffer is complete because data may span
+multiple buffers. Once a buffer is complete the content can be
+copied out, or a support function can be used allocate new memory with
+the final buffer as content.
+
+The internal memory can grow large when the buffer grows large,
+naturally. In addition the temporary stacks may grow large if there are
+are large tables or notably large vectors that cannot be be copied
+directly to the output buffers. This creates a potential for memory
+allocation errors, especially on constrained systems.
+
+The builder returns error codes, but it is tedious to check these. It is
+not necessary to check return codes if the API is used correctly and if
+there are no allocation errors. It is possible to provide a custom
+allocator and a custom emitter. These can detect memory failures early
+making it potentially safe to use the builder API without any per
+operation checks.
+
+The generated JSON parser checks all return codes and can be used to
+construct a buffer safely, especially since the buffer is naturally
+bounded by the size of the JSON input. JSON printing, on the other hand,
+can potentially explode, as discussed earlier.
+
+FlatCC generated create calls such as `MyGame_Example_Monster_create()`
+will not be compatible across versions if there are deprecated fields
+even if the schema change otherwise respects schema evolutation rules.
+This is mostly a concern if new fields are added because compilation
+will otherwise break on argument count mismatch. Prior to flatcc-0.5.3
+argument order could change if the field (id: x) attribute was used
+which could lead to buffers with unexpected content. JSON parsers that
+support constructors (objects given as an array of create arguments)
+have similar concerns but here trailing arguments can be optional.