1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
1137
1138
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1175
1176
1177
1178
1179
1180
1181
1182
1183
1184
1185
1186
1187
1188
1189
1190
1191
1192
1193
1194
1195
1196
1197
1198
1199
1200
1201
1202
1203
1204
1205
1206
1207
1208
1209
1210
1211
1212
1213
1214
1215
1216
1217
1218
1219
1220
1221
1222
1223
1224
1225
1226
1227
1228
1229
1230
1231
1232
1233
1234
1235
1236
1237
1238
1239
1240
1241
1242
1243
1244
1245
1246
1247
1248
1249
1250
1251
1252
1253
1254
1255
1256
1257
1258
1259
1260
1261
1262
1263
1264
1265
1266
1267
1268
1269
1270
1271
1272
1273
1274
1275
1276
1277
1278
1279
1280
1281
1282
1283
1284
1285
1286
1287
1288
1289
1290
1291
1292
1293
1294
1295
1296
1297
1298
1299
1300
1301
1302
1303
1304
1305
1306
1307
1308
1309
1310
1311
1312
1313
1314
1315
1316
1317
1318
1319
1320
1321
1322
1323
1324
1325
1326
1327
1328
1329
1330
1331
1332
1333
1334
1335
1336
1337
1338
1339
1340
1341
1342
1343
1344
1345
1346
1347
1348
1349
1350
1351
1352
1353
1354
1355
1356
1357
1358
1359
1360
1361
1362
1363
1364
1365
1366
1367
1368
1369
1370
1371
1372
1373
1374
1375
1376
1377
1378
|
# FlatBuffers Binary Format
<!-- vim-markdown-toc GFM -->
* [Overview](#overview)
* [Memory Blocks](#memory-blocks)
* [Example](#example)
* [Primitives](#primitives)
* [Numerics](#numerics)
* [Boolean](#boolean)
* [Format Internal Types](#format-internal-types)
* [Scalars](#scalars)
* [Structs](#structs)
* [Internals](#internals)
* [Type Hashes](#type-hashes)
* [Conflicts](#conflicts)
* [Type Hash Variants](#type-hash-variants)
* [Unions](#unions)
* [Optional Fields](#optional-fields)
* [Alignment](#alignment)
* [Default Values and Deprecated Values](#default-values-and-deprecated-values)
* [Schema Evolution](#schema-evolution)
* [Keys and Sorting](#keys-and-sorting)
* [Size Limits](#size-limits)
* [Verification](#verification)
* [Risks](#risks)
* [Nested FlatBuffers](#nested-flatbuffers)
* [Fixed Length Arrays](#fixed-length-arrays)
* [Big Endian FlatBuffers](#big-endian-flatbuffers)
* [StructBuffers](#structbuffers)
* [StreamBuffers](#streambuffers)
* [Bidirectional Buffers](#bidirectional-buffers)
* [Possible Future Features](#possible-future-features)
* [Force Align](#force-align)
* [Mixins](#mixins)
<!-- vim-markdown-toc -->
## Overview
## Memory Blocks
A FlatBuffers layout consists of the following types of memory blocks:
- header
- table
- vector
- string
- vtable
- (structs)
Each of these have a contigous memory layout. The header references the
root table. Every table references a vtable and stores field in a single
block of memory. Some of these fields can be offsets to vectors, strings
and vectors. Vectors are one-dimensional blocks where each element is
self-contained or stores reference to table or a string based on schema
type information. A vtable decides which fields are present in table and
where they are stored. Vtables are the key to support schema evolution.
A table has no special rules about field ordering except fields must be
properly aligned, and if they are ordered by size the will pack more
densely but possibly more complicated to construct and the schema may
provide guidelines on the preferred order. Two vtables might mean
different things but accidentally have the same structure. Such vtables
can be shared between different tables. vtable sharing is important when
vectors store many similar tables. Structs a dense memory regions of
scalar fields and smaller structs. They are mostly found embedded in
tables but they are independent blocks when referenced from a union or
union vector, or when used as a buffer root. Structs hold no references.
Space between the above blocks are zero padded and present in order to
ensure proper alignment. Structs must be packed as densely as possible
according the alignment rules that apply - this ensures that all
implementations will agree on the layout. The blocks must not overlap in
memory but two blocks may be shared if they represent the same data such
as sharing a string.
FlatBuffers are constructed back to front such that lower level objects
such as sub-tables and strings are created first in stored last and such
that the root object is stored last and early in the buffer. See also
[Stream Buffers](#stream-buffers) for a proposed variation over this
theme.
All addressing in FlatBuffers are relative. The reason for this? When
navigating the buffer you don't need to store the buffer start or the
buffer length, you can also find a new reference relative the reference
you already have. vtables require the table location to find a field,
but a table field referencing a string field only needs the table field
location, not the table start in order to find the referenced string.
This results in efficient navigation.
## Example
Uoffsets add their content to their address and are positive while a
tables offset to its vtable is signed and is substracted. A vtables
element is added to the start of the table referring to the vtable.
Schema ([eclectic.fbs]) :
namespace Eclectic;
enum Fruit : byte { Banana = -1, Orange = 42 }
table FooBar {
meal : Fruit = Banana;
density : long (deprecated);
say : string;
height : short;
}
file_identifier "NOOB";
root_type FooBar;
JSON :
{ "meal": "Orange", "say": "hello", "height": -8000 }
Buffer :
header:
+0x0000 00 01 00 00 ; find root table at offset +0x00000100.
+0x0004 'N', 'O', 'O', 'B' ; possibly our file identifier
...
table:
+0x0100 e0 ff ff ff ; 32-bit soffset to vtable location
; two's complement: 2^32 - 0xffffffe0 = -0x20
; effective address: +0x0100 - (-0x20) = +0x0120
+0x0104 00 01 00 00 ; 32-bit uoffset string field (FooBar.say)
; find string +0x100 = 256 bytes _from_ here
; = +0x0104 + 0x100 = +0x0204.
+0x0108 42d ; 8-bit (FooBar.meal)
+0x0109 0 ; 8-bit padding
+0x010a -8000d ; 16-bit (FooBar.height)
+0x010c ... ; (first byte after table end)
...
vtable:
+0x0120 0c 00 ; vtable length = 12 bytes
+0x0122 0c 00 ; table length = 12 bytes
+0x0124 08 00 ; field id 0: +0x08 (meal)
+0x0126 00 00 ; field id 1: <missing> (density)
+0x0128 04 00 ; field id 2: +0004 (say)
+0x012a 0a 00 ; field id 3: +0x0a (height)
...
string:
+0x0204 05 00 00 00 ; vector element count (5 ubyte elements)
+0x0208 'h' 'e' ; vector data
+0x020a 'l' 'l' ; vector data
+0x020c 'o' ; vector data
+0x020d 00 ; zero termination
; special case for string vectors
...
Actual FlatCC generated buffer :
Eclectic.FooBar:
0000 08 00 00 00 4e 4f 4f 42 e8 ff ff ff 08 00 00 00 ....NOOB........
0010 2a 00 c0 e0 05 00 00 00 68 65 6c 6c 6f 00 00 00 *.......hello...
0020 0c 00 0c 00 08 00 00 00 04 00 0a 00 ............
Note that FlatCC often places vtables last resulting in `e0 ff ff ff`
style vtable offsets, while Googles flatc builder typically places them
before the table resulting in `20 00 00 00` style vtable offsets which
might help understand why the soffset is subtracted from and not added
to the table start. Both forms are equally valid.
## Primitives
Primitives are data used to build more complex structures such as
strings, vectors, vtables and tables. Although structs are not strictly
a primitive, it helps to view them as self-contained primitives.
### Numerics
FlatBuffers are based on the following primitives that are 8, 16, 32 and
64 bits in size respectively (IEEE-754, two's complement, little endian):
uint8, uint16, uint32, uint64 (unsigned)
int8, int16, int32, int64 (two's complement)
float32, float64 (IEEE-754)
### Boolean
flatbuffers_bool (uint8)
flatbuffers_true (flatbuffers_bool assign as 1, read as != 0)
flatbuffers_false (flatbuffers_bool = 0)
Note that a C99 `bool` type has no defined size or sign so it is not an
exact representation of a flatbuffers boolean encoding.
When a stored value is interpreted as boolean it should not be assumed
to be either 1 or 0 but rather as `not equal to 0`. When storing a
boolean value or when converting a boolean value to integer before
storing, the value should be 1 for true and 0 for false, In C this can
be done using `!!x`.
### Format Internal Types
flatbuffers_union_type_t (uint8, NONE = 0, 0 <= type <= 255)
flatbuffers_identifier_t (uint8[4])
flatbuffers_uoffset_t (uin32)
flatbuffers_soffset_t (int32),
flatbuffers_voffset_t (uint16)
These types can change in FlatBuffer derived formats, but when we say
FlatBuffers without further annotations, we mean the above sizes in
little endian encoding. We always refer to specific field type and not a
specific field size because this allows us to derive new formats easily.
### Scalars
To simpify discussion we use the term scalars to mean integers, boolean,
floating point and enumerations. Enumerations always have an underlying
signed or unsigned integer type used for its representation.
Scalars are primitives that has a size between 1 and 8 bytes. Scalars
are stored aligned to the own size.
The format generally supports vectors and structs of any scalar type but
some language bindings might not have support for all combinations such
as arrays of booleans.
An special case is the encoding of a unions type code which internally
is an enumeration but it is not normally permitted in places where we
otherwise allow for scalars.
Another special case is enumerations of type boolean which may not be
widely supported, but possible. The binary format is not concerned with
this distinction because a boolean is just an integer at this level.
### Structs
A struct is a fixed lengthd block of a fixed number of fields in a specific order
defined by a schema. A field is either a scalar, another struct or a fixed length
array of these, or a fixed length char array. A struct cannot contain fields that
contain itself directly or indirectly. A struct is self-contained and has no
references. A struct cannot be empty.
A schema cannot change the layout of a struct without breaking binary
compatibility, unlike tables.
When used as a table field, a struct is embedded within the table block
unless it is union value. A vector of structs are placed in a separate
memory block, similar to vectors of scalars. A vector of unions that has
a struct as member will reference the struct as an offset, and the
struct is then an independent memory block like a table.
FlatCC supports that a struct can be the root object of a FlatBuffer, but
most implementations likely won't support this. Structs as root are very
resource efficient.
Structs cannot have vectors but they can have fixed length array fields. which
are equivalent to stacking multiple non-array fields of the same type after each
other in a compact notation with similar alignment rules. Additionally arrays
can be of char type to have a kind of fixed length string. The char type is not
used outside of char arrays. A fixed length array can contain a struct that
contains one more fixed length arrays. If the char array type is not support, it
can be assumed to be a byte array.
## Internals
All content is little endian and offsets are 4 bytes (`uoffset_t`).
A new buffer location is found by adding a uoffset to the location where
the offset is stored. The first location (offset 0) points to the root
table. `uoffset_t` based references are to tables, vectors, and strings.
References to vtables and to table fields within a table have a
different calculations as discussed below.
A late addition to the format allow for adding a size prefix before the
standard header. When this is done, the builder must know about so it
can align content according to the changed starting position. Receivers
must also know about the size field just as they must know about the
excpected buffer type.
The next 4 bytes (`sizeof(uoffset_t)`) might represent a 4 byte buffer
identifier, or it might be absent but there is no obvious way to know
which. The file identifier is typically ASCII characters from the
schemas `file_identifier` field padded with 0 bytes but may also contain
any custom binary identifier in little endian encoding. See
[Type-Hashes](#type-hashes). The 0 identifier should be avoided because
the buffer might accidentally contain zero padding when an identifier is
absent and because 0 can be used by API's to speficy that no identifier
should be stored.
When reading a buffer, it should be checked that the length is at least
8 bytes (2 * `sizeof(uoffset_t)`) Otherwise it is not safe to check the
file identifier.
The root table starts with a 4 byte vtable offset (`soffset_t`). The
`soffset_t` has the same size as `uoffset_t` but is signed.
The vtable is found by *subtracting* the signed 4 byte offset to the
location where the vtable offset is stored. Note that the `FlatCC`
builder typically stores vtables at the end of the buffer (clustered
vtables) and therefore the vtable offset is normally negative. Other
builders often store the vtable before the table unless reusing an
existing vtable and this makes the soffset positive.
(Nested FlatBuffers will not store vtables at the end because it would
break compartmentalization).
The vtable is a table of 2 byte offsets (`voffset_t`). The first two
entreis are the vtable size in bytes and the table size in bytes. The
next offset is the vtable entry for table field 0. A vtable will always
have as many entries as the largest stored field in the table, but it
might skip entries that are not needed or known (version independence) -
therefore the vtable length must be checked. The table length is only
needed by verifiers. A vtable lookup of an out of range table id is the
same as a vtable entry that has a zero entry and results in a default
value for the expected type. Multiple tables may shared the same vtable,
even if they have different types. This is done via deduplication during
buffer construction. A table field id maps to a vtable offset using the
formula `vtable-start + sizeof(voffset_t) * (field-id + 2)`, iff the
result is within the vtable size.
The vtable entry stores a byte offset relative to the location where the
`soffset_t` where stored at the start of the table. A table field is
stored immediately after the `soffset_t` or may contain 0 padding. A
table field is always aligned to at least its own size, or more if the
schema demands it. Strings, sub-tables and vectors are stored as a
4-byte `uoffset_t`. Such sub-elements are always found later in the
buffer because `uoffset_t` is unsigned. A struct is stored in-place.
Conceptually a struct is not different from an integer in this respect,
just larger.
If a sub-table or vector is absent and not just without fields, 0
length, the containing table field pointing the sub-table or vector
should be absent. Note that structs cannot contain sub-tables or
vectors.
A struct is always 0 padded up to its alignment. A structs alignment
is given as the largest size of any non-struct member, or the alignment
of a struct member (but not necessarily the size of a struct member), or
the schema specified aligment.
A structs members are stored in-place so the struct fields are known
via the the schema, not via information in the binary format
An enum is represent as its underlying integer type and can be a member
of structs, fields and vectors just like integer and floating point
scalars.
A table is always aligned to `sizeof(uoffset_t)`, but may contain
internal fields with larger alignment. That is, the table start or end
are not affected by alignment requirements of field members, unlike
structs.
A string is a special case of a vector with the extra requirement that a
0 byte must follow the counted content, so the following also applies to
strings.
A vector starts with a `uoffset_t` field the gives the length in element
counts, not in bytes. Table fields points to the length field of a
vector, not the first element. A vector may have 0 length. Note that the
FlatCC C binding will return a pointer to a vectors first element which
is different from the buffer internal reference.
All elements of a vector has the same type which is either a scalar type
including enums, a struct, or a uoffset reference where all the
reference type is to tables all of the same type, all strings, or,
for union vectors, references to members of the same union. Because a
union member can be a struct, it is possible to have vectors that
reference structs instead of embeddding them, but only via unions. It is
not possible to have vectors of vectors other than string vectors,
except indirectly via vectors containing tables.
A vectors first element is aligned to the size of `uoffset_t` or the
alignment of its element type, or the alignment required by the schema,
whichever is larger. Note that the vector itself might be aligned to 4
bytes while the first element is aligned to 8 bytes.
(Currently the schema semantics do no support aligning vectors beyond
their element size, but that might change and can be forced when
building buffers via dedicated api calls).
Strings are stored as vectors of type `ubyte_t`, i.e. 8-bit elements. In
addition, a trailing zero is always stored. The trailing zero is not
counted in the length field of the vector. Technically a string is
supposed to be valid UTF-8, but in praxis it is permitted that strings
contain 0 bytes or other invalid sequences. It is recommended to
explicitly check strings for UTF-8 conformance when this is required
rather than to expect this to alwways be true. However, the ubyte vector
should be preferred for binary data.
A string, vector or table may be referenced by other tables and vectors.
This is known as a directed acyclic graph (DAG). Because uoffsets are
unsigned and because uoffsets are never stored with a zero value (except
for null entries in union vectors), it is not possible for a buffer to
contain cycles which makes them safe to traverse with too much fear of
excessive recursion. This makes it possible to efficiently verify that a
buffer does not contain references or content outside of its expected
boundaries.
A vector can also hold unions, but it is not supported by all
implementations. A union vector is in reality two separate vectors: a
type vector and an offset vector in place of a single unions type and
value fields in table. See unions.
## Type Hashes
A type hash is a 32-bit value defined as the fnv1a32 hash of a tables or
a structs fully qualified name. If the fnv1a32 hash returns 0 it should
instead hash the empty string. 0 is used to indicate that a buffer
should not store an identifier.
Every table and struct has a name and optionally also a namespace of one
or more levels. The fully qualified name is optional namespace followed
by the type name using '.' as a separator. For example
"MyGame.Sample.Monster", or "Eclectic.FooBar".
The type hash can be used as the buffer identifier instead of the schema
provided `file_identifier`. The type hash makes it possible to
distinguish between different root types from the same schema, and even
across schema as long as the namespace is unique.
Type hashes introduce no changes to the binary format but the application
interface must choose to support user defined identifiers or explicitly
support type hashes. Alternatively an application can peak directly into
the buffer at offset 4 (when `uoffset_t` is 4 bytes long).
FlatCC generates the following identifier for the "MyGame.Sample.Monster"
table:
#define MyGame_Sample_Monster_type_hash ((flatbuffers_thash_t)0xd5be61b)
#define MyGame_Sample_Monster_type_identifier "\x1b\xe6\x5b\x0d"
But we can also
[compute one online](https://www.tools4noobs.com/online_tools/hash/)
for our example buffer:
fnv1a32("Eclectic.FooBar") = 0a604f58
Thus we can open a hex editor and locate
+0x0000 00 01 00 00 ; find root table at offset +0x00000100.
+0x0004 'N', 'O', 'O', 'B' ; possibly our file identifier
and replace it with
+0x0000 00 01 00 00 ; find root table at offset +0x00000100.
+0x0004 58 4f 60 0a ; very likely our file identifier identifier
or generate it with `flatcc`:
$ bin/flatcc --stdout doc/eclectic.fbs | grep FooBar_type
#define Eclectic_FooBar_type_hash ((flatbuffers_thash_t)0xa604f58)
#define Eclectic_FooBar_type_identifier "\x58\x4f\x60\x0a"
The following snippet implements fnv1a32, and returns the empty string
hash if the hash accidentially should return 0:
static inline flatbuffers_thash_t flatbuffers_type_hash_from_name(const char *name)
{
uint32_t hash = 2166136261UL;
while (*name) {
hash ^= (uint32_t)*name;
hash = hash * 16777619UL;
++name;
}
if (hash == 0) {
hash = 2166136261UL;
}
return hash;
}
### Conflicts
It is possible to have conficts between two type hashes although the
risk is small. Conflicts are not important as long as an application can
distinguish between all the types it may encouter in actual use. A
switch statement in C will error at compile time for two cases that have
the same value, so the problem is easily detectable and fixable by
modifying a name or a namespace.
For global conflict resolution, a type should be identified by its fully
qualified name using adequate namespaces. This obviously requires it to
be stored separate from the buffer identifier due to size constraints.
### Type Hash Variants
If an alternative buffer format is used, the type hash should be
modified. For example, if `uoffset_t` is defined as a 64-bit value, the
fnv1a64 hash should be used instead. For big endian variants the hash
remains unchanged but is byteswapped. The application will use the same
id while the acces layer will handle the translation.
For buffers using structs as roots, the hash remains unchanged because
the struct is a unique type in schema. In this way a receiver that does
not handle struct roots can avoid trying to read the root as a table.
For futher variations of the format, a format identifier can be inserted
in front of the namespace when generating the hash. There is no formal
approach to this, but as an example, lets say we want to use only 1 byte
per vtable entry and identify these buffers with type hash using the
prefix "ebt:" for example buffer type. We then have the type hash:
#define type_hash_prefix "ebt:"
hash = fnv1a32(type_hash_prefix "Eclectic.FooBar");
hash = hash ? hash : fnv1a32(type_hash_prefix);
If the hash returns 0 we hash the prefix.
## Unions
A union is a contruction on top of the above primitives. It consists of
a type and a value.
In the schema a union type is a set of table types with each table name
assigned a type enumeration starting from 1. 0 is the type NONE meaning
the union has not value assigned. The union type is represented as a
ubyte enum type, or in the binary format as a value of type
`union_type_t` which for standard FlatBuffers is an 8-bit unsigned code
with 0 indicating the union stores not value and a non-zero value
indicating the type of the stored union.
A union is stored in a table as a normal sub-table reference with the
only difference being that the offset does not always point to a table
of the same type. The 8-bit union type is stored as a separate table
field conventially named the same as the union value field except for a
`_type` suffix. The value (storing the table offset) MUST have a field
ID that is exactly one larger than the type field. If value field is
present the type field MUST also be present. If the type is NONE the
value field MUST be absent and the type field MAY be absent because a
union type always defaults to the value NONE.
Vectors of unions is a late addition (mid 2017) to the FlatBuffers
format. FlatCC supports union vectors as of v0.5.0.
Vectors of unions have the same two fields as normal unions but they
both store a vector and both vectors MUST have the same length or both
be absent from the table. The type vector is a vector of 8-bit enums and
the value vector is a vector of table offsets. Obviously each type
vector element represents the type of the table in the corresponding
value element. If an element is of type NONE the value offset must be
stored as 0 which is a circular reference. This is the only offset that
can have the value 0.
A later addition (mid 2017) to the format allows for structs and strings
to also be member of a union. A union value is always an offset to an
independent memory block. For strings this is just the offset to the
string. For tables it is the offset to the table, naturally, and for
structs, it is an offset to an separate aligned memory block that holds
a struct and not an offset to memory inside any other table or struct.
FlatCC supports mixed type unions and vectors of these as of v0.5.0.
## Optional Fields
As of mid 2020 the FlatBuffers format introduced optional scalar table fields.
There is no change to the binary schema, but the semantics have changed slightly
compared to ordinary scalar fields (which remain supported as is): If an
optional field is not stored in a table, it is considered to be a null value. An
optinal scalar field will have null as its default value, so any representable
scalar value will always be stored in the buffer, unlike other scalar fields
which by default do not store the field if the value matches the default numeric
value. This was already possible before by using `force_add` semantics to force
a value to be written even if it was matching the default value, and by
providing an `is_present` test when reading a value so that it would be possible
to distinguish between a value that happened to be a default value, and a value
that was actually absent. However, such techniques were ad-hoc. Optional
fields formalize the semantics of null values for scalars. Other field types
already have meaningful null values. Only table fields can be optional so struct
fields must always assign a value to all members.
## Alignment
All alignments are powers of two between 1 and 256. Large alignments are
only possible via schema specified alignments. The format naturally has
a maximum alignment of the largest scalar it stores, which is a 64-bit
integer or floating point value. Because C malloc typically returns
buffers aligned to a least 8 bytes, it is often safe to place buffers in
heap allocated memory, but if the target system does not permit
unaligned access, or is slow on unaligned access, a buffer should be
placed in sufficiently aligned memory. Typically it is a good idea to
place buffer in cacheline aligned boundary anyway.
A buffers alignment is the same as the largest alignment of any
object or struct it contains.
The buffer size is not guaranteed to be aligned to its own alignment
unlike struct. Googles `flatc` builder does this, at least when size
prefixed. The `FlatCC` tool currently does not, but it might later add
an option to pad up to alignment. This would make it simpler to stack
similar typed buffers in file - but users can retrieve the buffers
alignment and do this manually. Thus, when stacking size prefixed
buffers, each buffer should start aligned to its own size starting at
the size field, and should also be zero padded up to its own alignment.
## Default Values and Deprecated Values
A table can can omit storing any field that is not a required field. For
strings, vectors and tables this result in returning a null value
different from an empty value when reading the buffer. Struct fields not
present in table are also returned as null.
All fields that can return null do not have a default value. Other
values, which are integers, floats and enumerations, can all have
default values. The default value is returned if not found in the table.
If a default value is not specified the default defaults to zero for the
corresponding type. If an enumeration does not have a 0 value and no
explicit default value, this is a schema error.
When building a buffer the builder will compare a stored field to the
known default value. If it matches the field will simple be skipped.
Some builder API's makes it possible to force a default value to be
stored and to check if a field is missing when reading the buffer. This
can be used to handle NULL values differently from default or missing
values.
A deprecated field is a schema construct. The binary format either stores
a table field, or it does not.
A deprecated field should be treated as not available, as in no way to
read the value as opposed to returning a default value regardless of
whether the field is present or not. If they for some reason are made
accessible the verifier must also understand and verify these fields.
A deprecated field also cannot be written to a new buffer, although if
it against guidelines remains possible to do so, it should be done as if
the field was not deprecated.
Structs cannot have default values and cannot have deprecated fields in
stadard FlatBuffers. FlatCC supports marking a struct field as
deprecated. This implies the field will always be zeroed and with no
trivial accessors. A struct can never change size without breaking
support for schema evolution.
FlatCC JSON parsers allow structs to only set some values. Remaining
values will be implicitly zeroed. The C API for writing buffers do not
have this concern because a struct can just be written as a C struct
so there is no control over which fields a user choose to set or zero.
However, structs should be zeroed and padded where data is not otherwise
set. This makes it possible to hash and integrity check structs (though
this is not an integral part of the format).
## Schema Evolution
A table has a known set of low-valued field identifiers. Any unused
field id can be used in a future version. If a field (as is normal) is
implicitly assigned an id, new fields can only be added at the end of
the table. Internally this translates into new versions getting ever
larger vtables. Note that vtables are only stored as large as needed for
the actual content of table, so a rarely used new field will not cause
vtables to grew when unused.
Enumarations may not change values in future versions. Unions may only
added new table names to the end of the union list.
Structs cannot change size nor content. They cannot evolve. FlatCC
permits deprecating fields which means old fields will be zeroed.
Names can be changed to the extend it make sense to the applications already
written around the schema, but it may still break applications relying
on some reflective information. For example, a reader may provide the
string representation of a numeric enum value.
New types can be added. For example adding a new table is always safe
as long as it does not conflict with any existing schemas using the same
namespace.
Required fields cannot stop being required and they cannot be deprecated.
Various attributes and other changes form a gray area that will not make
the binary format unsafe but may still break due to changes in code
generation, serialization to JSON, or similar. For example, a generated
constructor that creates a table from a list of positional arguments
might break if the field order changes or grows or have fields
deprecated. JSON parsers could cease to work on old formats if base64
serialization is added subsequently, and so on.
## Keys and Sorting
Keys and sorting is a meta construct driven by the schema. The binary
format has no special concept of keys and sorting and a vector can be
sorted by one of several keys so it makes no sense to enforce a specific
order.
The basic FlatBuffers format only permit at most one key and generally
sorts vectors by that key during buffer construction. FlatCC does not do
this both because sorting is not practical while building the buffer and
because FlatCC supports sorting by one of several keys. Thus, in general
it is not safe to assume that a vector is sorted, but it can be sorted
if needed.
## Size Limits
A buffer should never be larger than `2^(sizeof(soffset_t) * 8 - 1) - 1`
or `2^31 - 1` i.e. 2GB for standard FlatBuffers. Beyond this safe it is
not safe to represent vtable offsets and implementations can no longer
use signed types to store `uoffset_t` values. This limit also ensures
that all vectors can be represented safely with a signed 32-bit length
type.
The application interface is likely to use a native type for
representing sizes and vector indices. If this type is smaller that
`sizeof(soffset_t)` or equivalently `sizeof(uoffset_t)` there is a risk
of overflow. The simplest way to avoid any issues is to limit the
accepted buffer size of the native size type. For example, on some
embedded microprocessor systems a C compiler might have a 16-bit int and
`size_t` type, even if it supports `uint32_t` as well. In this the safe
assumption is to limit buffers to `2^15 - 1` which is very likely more
than sufficient on such systems.
A builder API might also want to prevent vectors from being created when
they cannot stay within the size type of the platform when the element
size is multipled by the element count. This is deceiving because the
element count might easily be within range. Such issues will be rare in
praxis but they can be the source of magnificent vulnerabilites if not
handled appropriately.
## Verification
Verification as discussed here is not just about implementing a
verifier. It is as much requirements that any builder must fulfill.
The objective of verification is to make it safe to read from an
untrusted buffer, or a trusted buffer that accidentally has an
unexpected type. The verifier does not guarantee that the type is indeed
the expeced type exact it can read the buffer identifier if present
which is still no guarantee. The verifier cannot make it safe to modify
buffers in-place because the cost of doing such a check would be
prohitive in the average case.
A buffer verifier is expected to verify that all objects (strings,
vectors and tables) do not have an end position beyond the externally
specified buffer size and that all offset are aligned relative to offset
zero, and sometimes also relative to actually specified buffer (but
sometimes it is desireable to verify buffers that are not stored
aligned, such as in network buffers).
A verifier primarily checks that:
- the buffer is at least `2 * sizeof(uoffset_t)` such that the root
root table and buffer identifier can be checked safely.
- any offset being chased is inside the buffer that any data accessed
from that resuling location is entirely inside the buffer and aligned
notably the vtables first entry must be valid before the vtable can
safely validate itself, but this also applies to table, string and
vector fields.
- any uoffset has size of at least `sizeof(uoffse_t)` to aviod
self-reference and is no larger than the largest positive soffset_t
value of same size - this ensures that implementations can safely add
uoffset_t even if converting to signed first. It also, incidentally,
ensures compatibility with StreamBuffers - see below.
- vtable size is aligned and does not end outside buffer.
- vtable size is at least the two header fields
(`2 * `sizeof(voffset_t)`).
- required table fields are present.
- recursively verify all known fields and ignore other fields. Unknown
fields are vtable entries after the largest known field ID of a table.
These should be ignored in order to support forward versioning.
- deprecated fields are valid if accessors are available to do so, or
ignore if the there is no way to access the field by application code.
- vectors end within the buffer.
- strings end within the buffer and has a zero byte after the end which
is also within the buffer.
- table fields are aligned relative to buffer start - both structs,
scalars, and offset types.
- table field size is aligned relative to field start.
- any table field does not end outside the tables size as given by the
vtable.
- table end (without chasing offsets) is not outside buffer.
- all data referenced by offsets are also valid within the buffer
according the type given by the schema.
- verify that recursion depth is limited to a configurable acceptable
level for the target system both to protect itself and such that
general recursive buffer operations need not be concerned with stack
overflow checks (a depth of 100 or so would normally do).
- verify that if the union type is NONE the value (offset) field is absent and
if it is not NONE that the value field is present. If the union type
is known, the table should be verified. If the type is not known
the table field should be ignored. A reader using the same schema would
see the union as NONE. An unknown union is not an error in order to
support forward versioning.
- verifies the union value according to the type just like any other
field or element.
- verify that a union vector always has type vector if the offset vector
is present and vice versa.
A verifier may choose to reject unknown fields and union types, but this
should only be an user selectable option, otherwise schema evolution
will not be possible.
A verifier needs to be very careful in how it deals with overflow and
signs. Vector elements multiplied by element size can overflow. Adding
an invalid offset might become valid due to overflow. In C math on
unsigned types yield predictable two's complement overflow while signed
overflow is undefined behavior and can and will result in unpredictable
values with modern optimizing compilers. The upside is that if the
verifier handles all this correctly, the application reader logic can be
much simpler while staying safe.
A verifier does __not__ enforce that:
- structs and other table fields are aligned relative to table start because
tables are only aligned to their soffset field. This means a table cannot be
copied naively into a new buffer even if it has no offset fields.
- the order of individual fields within a table. Even if a schema says
something about ordering this should be considered advisory. A
verifier may additionally check for ordering for specific
applications. Table order does not affect safety nor general buffer
expectations. Ordering might affect size and performance.
- sorting as specified by schema attributes. A table might be sorted in
different ways and an implementation might avoid sorting for
performance reasons and other practical reasons. A verifier may,
however, offer additional verification to ensure specific vectors or
all vectors are sorted according to schema or other guidelines. Lack
of sorting might affect expected behavior but will not make the buffer
unsafe to access.
- that structures do not overlap. Overlap can result in vulnerabilities
if a buffer is modified, and no sane builder should create overlaps
other than proper DAGs except via a separate compression/decopression
stage of no interest to the verifier.
- strings are UTF-8 compliant. Lack of compliance may affect expections
or may make strings appear shorter or garbled. Worst case a naive
UTF-8 reader might reach beyond the end when observing a lead byte
suggest data after buffer end, but such a read should discover the
terminal 0 before things get out of hand. Being relaxed permits
specific use cases where UTF-8 is not suitable without giving up the
option to verify buffers. Implementations can add additional
verification for specific usecases for example by providing a
strict-UTF8 flag to a verifier or by verifying at the application
level. This also avoids unnecessary duplicate validation, for example
when an API first verifies the buffer then converts strings to an
internal heap representation where UTF-8 is validated anyway.
- default values are not stored. It is common to force default values to
be stored. This may be used to implement NULL values as missing
values different from default values.
- enum values are only among the enum values of its type. There are many
use cases where it is convenient to add enum values for example flags
or enums as units e.g. `2 * kilo + 3 * ounce. More generally ordinary
integers may have value range restrictions which is also out of scope
for verifier. An application may provide additional verification when
requested.
More generally we can say that a verifier is a basic fast assurance that
the buffer is safe to access. Any additional verification is application
specific. The verifier makes it safe to apply secondary validation.
Seconary validation could be automated via schema attributes and may be
very useful as such, but it is a separate problem and out of scope for a
core binary format verifier.
A buffer identifier is optional so the verifier should be informed
whether an identifier must match a given id. It should check both ASCII
text and zero padding not just a string compare. It is non-trivial to
decide if the second buffer field is an identifier, or some other data,
but if the field does not match the expected identifier, it certainly
isn't what is expected.
Note that it is not entirely trivial to check vector lengths because the
element size must be mulplied by the stored element count. For large
elements this can lead to overflows.
## Risks
Because a buffer can contain DAGs constructed to explode exponentiall,
it can be dangerous to print JSON or otherwise copy content blindly
if there is no upper limit on the export size.
In-place modification cannot be trusted because a standard buffer
verifier will detect safe read, but will not detect if two objects are
overlapping. For example, a table could be stored inside another table.
Modifing one table might cause access to the contained table to go out
of bounds, for example by directing the vtable elsewhere.
The platform native integer and size type might not be able to handle
large FlatBuffers - see [Size Limits](#size-limits).
Becaue FlatCC requires buffers to be sorted after builiding, there is
risk due to buffer modifications. It is not sufficient to verify buffers
after sorting because sorting is done inline. Therefore buffers must be
trusted or rewritten before sorting.
## Nested FlatBuffers
FlatBuffers can be nested inside other FlatBuffers. In concept this is
very simple: a nested buffer is just a chunk of binary data stored in a
`ubyte` vector, typically with some convenience methods generated to
access a stored buffer. In praxis it adds a lot of complexity. Either
a nested buffer must be created strictly separately and copied in as
binary data, but often users mix the two builder contexts accidentally
storing strings from one buffer inside another. And when done right, the
containing ubyte vector might not be aligned appropriately making it
invalid to access the buffer without first copying out of the containing
buffer except where unaligned access is permitted. Further, a nested
FlatBuffer necessarily has a length prefix because any ubyte vector has
a length field at its start. Therefore, size prefixed flatbuffers should
not normally be stored as nested buffers, but sometimes it is necessary
in order have the buffer properly aligned after extraction.
The FlatCC builder makes it possible to build nested FlatBuffers while
the containing table of the parent buffer is still open. It is very
careful to ensure alignment and to ensure that vtables are not shared
between the two (or more) buffers, otherwise a buffer could not safely
be copied out. Users can still make mistakes by storing references from
one buffer in another.
Still, this area is so complex that several bugs have been found.
Thus, it is advice to use nested FlatBuffers with some care.
On the other hand, nested FlatBuffers make it possible to trivially
extract parts of FlatBuffer data. Extracting a table would require
chasing pointers and could potentially explode due to shared sub-graphs,
if not handled carefully.
## Fixed Length Arrays
This feature can be seen as equivalent to repeating a field of the same type
multiple times in struct.
Fixed length array struct fields has been introduced mid 2019.
A fixed length array is somewhat like a vector of fixed length containing inline
fixed length elements with no stored size header. The element type can be
scalars, enums and structs but not other fixed length errors (without wrapping
them in a struct).
An array should not be mistaken for vector as vectors are independent objects
while arrays are not. Vectors cannot be fixed length. An array can store fixed
size arrays inline by wrapping them in a struct and the same applies to unions.
The binary format of a fixed length vector of length `n` and type `t` can
be precisely emulated by created a struct that holds exactly `n` fields
of type `t`, `n > 0`. This means that a fixed length array does not
store any length information in a header and that it is stored inline within
a struct. Alignment follows the structs alignment rules with arrays having the
same alignment as their elements and not their entire size.
The maximum length is limited by the maximum struct size and / or an
implementation imposed length limit. Flatcc accepts any array that will fit in
struct with a maximum size of 2^16-1 by default but can be compiled with a
different setting. Googles flatc implementation currently enforces a maximum
element count of 2^16-1.
Assuming the schema compiler computes the correct alignment for the overall
struct, there is no additonal work in verifying a buffer containing a fixed
length array because structs are verified based on the outermost structs size
and alignment without having to inspect its content.
Fixed lenght arrays also support char arrays. The `char` type is similar to the
`ubyte` or `uint8` type but a char can only exist as a char array like
`x:[char:10]`. Chars cannot exist as a standalone struct or table field, and
also not as a vector element. Char arrays are like strings, but they contain no
length information and no zero terminator. They are expected to be endian
neutral and to contain ASCII or UTF-8 encoded text zero padded up the array
size. Text can contain embedded nulls and other control characters. In JSON form
the text is printed with embedded null characters but stripped from trailing
null characters and a parser will padd the missing null characters.
The following example uses fixed length arrays. The example is followed by the
equivalent representation without such arrays.
struct Basics {
int a;
int b;
}
struct MyStruct {
x: int;
z: [short:3];
y: float;
w: [Basics:2];
name: [char:4];
}
// NOT VALID - use a struct wrapper:
table MyTable {
t: [ubyte:2];
m: [MyStruct:2];
}
Equivalent representation:
struct MyStructEquivalent {
x: int;
z1: short;
z2: short;
z3: short;
y: float;
wa1: int;
wa2: int;
name1: ubyte;
name2: ubyte;
name3: ubyte;
name4: ubyte;
}
struct MyStructArrayEquivalent {
s1: MyStructEquivalent;
s2: MyStructEquivalent;
}
struct tEquivalent {
t1: ubyte;
t2: ubyte;
}
table MyTableEquivalent {
t: tEquivalent;
m: MyStructArrayEquivalent;
}
Note that forced zero-termination can be obtained by adding a trailing ubyte
field since uninitialized struct fields should be zeroed:
struct Text {
str: [char:255];
zterm: ubyte;
}
Likewise a length prefix field could be added if the applications involved know
how to interpret such a field:
struct Text {
len: ubyte;
str: [char:254];
zterm: ubyte;
}
The above is just an example and not part of the format as such.
## Big Endian FlatBuffers
FlatBuffers are formally always little endian and even on big-endian
platforms they are reasonably efficient to access.
However it is possible to compile FlatBuffers with native big endian
suppport using the FlatCC tool. The procedure is out of scope for this
text, but the binary format is discussed here:
All fields have exactly the same type and size as the little endian
format but all scalars including floating point values are stored
byteswapped within their field size. Offset types are also byteswapped.
The buffer identifier is stored byte swapped if present. For example
the 4-byte "MONS" identifier becomes "SNOM" in big endian format. It is
therefore reasonably easy to avoid accidentially mixing little- and
big-endian buffers. However, it is not trivial to handle identifers that
are not exactly 4-bytes. "HI\0\0" could be "IH\0\0" or "\0\0IH". It is
recommended to always use 4-byte identifies to avoid this problem. See
the FlatCC release 0.4.0 big endian release for details.
When using type hash identifiers the identifier is stored as a big
endian encoded hash value. The user application will the hash in its
native form and accessor code will do the conversion as for other
values.
## StructBuffers
_NOTE: the Google FlatBuffer project originally documented structs as
valid root objects, but never actually implemented it, and has as of mid
2020 changed the specification to disallow root structs as covered in
this section. FlatCC for C has been supporting root structs for a long
time, and they can provide significant speed advantages, so FlatCC will
continue to support these._
Unlike tables, structs are are usually embedded in in a fixed memory
block representing a table, in a vector, or embedded inline in other
structs, but can also be independent when used in a union.
The root object in FlatBuffers is conventionally expected to be a table,
but it can also be struct. FlatCC supports StructBuffers. Since structs
do not contain references, such buffers are truly flat. Most
implementations are not likely to support structs are root but even so
they are very useful:
It is possible to create very compact and very fast buffers this way.
They can be used where one would otherwise consider using manual structs
or memory blocks but with the advantage of a system and language
independent schema.
StructBuffers may be particularly interesting for the Big Endian
variant of FlatBuffers for two reasons: the first being that performance
likely matters when using such buffers and thus Big Endian platforms
might want them. The second reason is the network byte order is
traditionally big endian, and this has proven very difficult to change,
even in new evolving IETF standards. StructBuffers can be used to manage
non-trival big endian encoded structs, especially structs containing
other structs, even when the receiver does not understand FlatBuffers as
concept since the header can just be dropped or trivially documented.
## StreamBuffers
StreamBuffers are so far only a concept although some implementations
may already be able to read them. The format is documented to aid
possible future implementations.
StreamBuffers makes it possible to store partially completed buffers
for example by writing directly to disk or by sending partial buffer
data over a network. Traditional FlatBuffers require an extra copying
step to make this possible, and if the writes are partial, each section
written must also store the segment length to support reassembly.
StreamBuffers avoid this problem.
StreamBuffers treat `uoffset_t` the same as `soffset_t` when the special
limitation that `uoffset_t` is always negative when viewed as two's
complement values.
The implication is that everthing a table or vector references must be
stored earlier in the buffer, except vtables that can be stored freely.
Existing reader implementations that treat `uoffset_t` as signed, such as
Javascript, will be able to read StreamBuffers with no modification.
Other readers can easily be modified by casting the uoffset value to
signed before it.
Verifiers must ensure that any buffer verified always stores all
`uoffset_t` with the same sign. This ensures the DAG structure is
preserved without cycles.
The root table offset remains positive as an exception and points to the
root buffer. A stream buffer root table will be the last table in the
buffer.
The root table offset may be replaced with a root table offset at the
end of the buffer instead of the start. An implementation may also
zero the initial offset and update it later. In either case the buffer
should be aligned accordingly.
Some may prefer the traditional FlatBuffer approach because the root
table is stored and it is somehow easier or faster to access, but modern
CPU cache systems do not care much about the order of access as long as
as their is some aspect of locality of reference and the same applies to
disk access while network access likely will have the end of the buffer
in hot cache as this is the last sent. The average distance between
objects will be the same for both FlatBuffers and StreamBuffers.
## Bidirectional Buffers
Bidirectional Buffers is a generalization of StreamBuffers and FlatBuffers
and so far only exists as an idea.
FlatBuffers and StreamBuffers only allow one direction because it
guarantees easy cycle rejection. Cycles are unwanted because it is
expensive to verify buffers with cycles and because recursive navigation
might never terminate.
As it happens, but we can also easily reject cycles in bidirectional
buffers if we apply certain constraints that are fully backwards
compatible with existing FlatBuffers that have a size below 2GB.
The rule is that when an offset changes direction relative to a parent
objects direction, the object creating the change of direction becomes a
new start or end of buffer for anything reachable via that offset.
The root table R is reached via forward reference from the buffer
header. Its boundaries are itself and the buffer end. If the this table
has an offset pointing back, for example to new table X, then table X
must see the buffer start and R as two boundaries. X must not directly
or indirectly reach outside this region. Likewise, if the table R points
to new table Y in the forward direction, then Y is bounded by itself and
the buffer end.
A lower bound is the end of a table or a vector althoug we just say the
table or vector is a boundary. An upper bound is until the start of the
table or vector, or the end of the buffer.
When a buffer is verified, the boundaries are initially the buffer start
and buffer end. When the references from the root table are followed,
there new boundaries are either buffer start to root table, or root
table to end, depending on the offsets direction. And so forth
recursively.
We can relax the requirement that simple vectors and vtables cannot
cross these boundaries, except for the hard buffer limits, but it does
compicate things and is likely not necessary.
Nested FlatBuffers already follow similar boundary rules.
The point of Bidirectional buffers is not that it would be interesting
to store things in both directions, but to provide a single coherent
buffer model for both ways of storing buffers without making a special
case. This will allow StreamBuffers to co-exist with FlatBuffers without
any change, and it will allow procedures to build larger buffers to make
their own changes on how to build their subsection of the buffer.
We still need to make allowance for keeping the buffer headers root
pointer implicit by context, at least as an option. Otherwise it is not,
in general, possible to start streaming buffer content before the entire
buffer is written.
## Possible Future Features
This is highly speculative, but documents some ideas that have been
floating in order to avoid incompatible custom extensions on the same
theme. Still these might never be implemented or end up being
implemented differently. Be warned.
### Force Align
`force_align` attribute supported on fields of structs, scalars and
and vectors of fixed length elements.
### Mixins
A mixin is a table or a struct that is apparently expanded into the table
or struct that contains it.
Mixins add new accessors to make it appear as if the fields of the mixed
in type is copied into the containing type although physically this
isn't the case. Mixins can include types that themselves have mixins
and these mixed in fields are also expanded.
A `mixin` is an attribute applied to table or a struct field when that
field has a struct or a table type. The binary format is unchanged and
the table or struct will continue to work as if it wasn't a mixin and
is therefore fully backwards compatible.
Example:
struct Position {
spawned: bool(required);
x: int;
y: int;
}
table Motion {
place: Position(mixin);
vx: int = 0;
vy: int = 0;
}
table Status {
title: string;
energy: int;
sleep: int;
}
table NPC {
npcid: int;
motion: Motion(mixin);
stat: Status(mixin);
}
table Rock {
here: Position(mixin);
color: uint32 = 0xa0a0a000;
}
table Main {
npc1: NPC;
rock1: Rock;
}
root_type Main;
Here the table NPC and Rock will appear with read accessors is if they have the fields:
table NPC {
npcid: int;
spawned: bool(required);
x: int;
y: int;
vx: int = 0;
vy: int = 0;
title: string;
energy: int;
sleep: int;
place: Position;
motion: Motion(required);
stat: Status;
}
table Rock {
spawned: bool(required);
x: int;
y: int;
here: Position(required);
color: uint32 = 0xa0a0a000;
}
table Main {
npc1: NPC;
rock1: Rock;
}
root_type Main;
or in JSON:
{
"npc1": {
"npcid": 1,
"spawned": true;
"x": 2,
"y": 3,
"vx": -4,
"vy": 5,
"title": "Monster",
"energy": 6,
"sleep": 0
},
"rock1": {
"spawned": false;
"x": 0,
"y": 0,
"color": 0xa0a0a000
}
}
Note that there is some redundancy here which makes it possible to
access the mixed in fields in different ways and to perform type casts
to a mixed in type.
A cast can happen through a generated function along the lines of
`npc.castToPosition()`, or via field a accessor `npc.getPlace()`.
A JSON serializer excludes the intermediate container fields such as
`place` and `motion` in the example.
A builder may choose to only support the basic interface and required
each mixed in table or struct be created separately. A more advanced
builder would alternatively accept adding fields directly, but would
error if a field is set twice by mixing up the approaches.
If a mixed type has a required field, the required status propagates to
the parent, but non-required siblings are not required as can be seen in
the example above.
Mixins also places some constraints on the involved types. It is not
possible to mix in the same type twice because names would conflict and
it would no longer be possible to do trivially cast a table or struct
to one of its kinds. An empty table could be mixed in to
provide type information but such a type can also only be added once.
Mixing in types introduces the risk of name conflicts. It is not valid
to mix in a type directly or indirectly in a way that would lead to
conflicting field names in a containing type.
Note that in the example it is not possible to mix in the pos struct
twice, otherwise we could have mixed in a Coord class twice to have
position and velocity, but in that case it would be natural to
have two fields of Coord type which are not mixed in.
Schema evolution is fully supported because the vtable and field id's
are kept separate. It is possible to add new fields to any type that
is mixed in. However, adding fields could lead to name conficts
which are then reported by the schema compiler.
[eclectic.fbs]: https://github.com/dvidelabs/flatcc/blob/master/doc/eclectic.fbs
|