Merge commit '5a40295c4cf0af5ea8da9ced04a4ce7d3621a080' as 'flatcc'

author: Toni Uhlig <matzeton@googlemail.com> 2023-07-16 02:03:33 +0200
committer: Toni Uhlig <matzeton@googlemail.com> 2023-07-16 02:03:33 +0200
commit: b31e4bc16d1df62b50c6f77a77041f9e7b6c906d (patch)
tree: 024c74c13d918aa6bde302aab6836fa33607613c /flatcc/doc/security.md
parent: ba6815ef8fb8ae472412b5af2837a7caba2799c2 (diff)
parent: 5a40295c4cf0af5ea8da9ced04a4ce7d3621a080 (diff)
1 files changed, 272 insertions, 0 deletions
diff --git a/flatcc/doc/security.md b/flatcc/doc/security.md
new file mode 100644
index 0000000..52e0515
--- /dev/null
+++ b/flatcc/doc/security.md
@@ -0,0 +1,272 @@
+# Security Considerations
+
+This sections covers questions such as when is it safe to access a
+buffer without risking access violation, buffer overruns, or denial of
+service attacks, but cannot possibly cover all security aspects.
+
+## Reading Buffers
+
+When reading a buffer you have to know the schema type before
+reading and it is preferable to know the size of the buffer although not
+strictly required. If the type is not known, the `file_type`, aka buffer
+type, can be checked. This no guarantee due to collisions with other
+data formats and because the identifier field may be absent or
+misleading. The identifier therefore works best on buffers that can be
+trusted.
+
+If a buffer cannot be trusted, such as when receiving it over a public
+network, it may be the case that buffer type is known, but it is not
+known if someone uses an incorrect implementation of FlatBuffers, or if
+the buffer has somehow been corrupted in transit, or someone
+intentionally tampered with the buffer. In this case the buffer can be
+verified. A verification does not prove that the buffer has the correct
+type, but it does prove that it is safe to read (not write) from the
+buffer. The buffer size must be known in order to verify the buffer. If
+the buffer has a wrong type, but still (unlikey but possible) passes
+verification, then unexpected data may be read from the buffer, but it
+will not cause any crashes when using the API correctly.
+
+It is preferable to know the required alignment of a buffer which isn't
+trivially available unless retrieved from the builder when the buffer is
+created. The buffer alignment can be deduced from the schema.
+
+On many systems there is no real problem in accessing a buffer
+unaligned, but for systems where it matters, care must be taken because
+unaligned access can result in slow performance or access violations.
+Even on systems where alignment matters, a standard malloc operation is
+often sufficient because it normally aligns to the largest word that
+could cause access violations when unaligned. For special use case such
+as GPU memory access more alignment may be needed and FlatBuffers
+support higher alignments in the schema. Portable `aligned_alloc` and
+`aligned_free` support methods are available to help allocate memory with
+sufficient alignment. Because compile time flags can change between
+compilation of the runtime library and the application,
+`flatcc_builder_aligned_free` ensures a consistent deallocation method
+for aligned buffers allocated by the runtime library.
+
+A verifier for C requires the buffer to placed in aligned memory and it
+will fail if the buffer content is not properly aligned relative to the
+buffer or to an absolute memory address regardless of whether the
+current systems requires alignment or not. Therefore a buffer verified
+on one system is safe to use on all systems. One could use this fact to
+sign a buffer, but this is beyond the scope of FlatBuffers itself, and
+verifying a signature is likely much slower than re-verifying a buffer
+when a verifier is available.
+
+Note: it would be helpful if the verifier allowed verification only
+relative to the buffer start instead of requiring the absolute addresses
+to be aligned. This would allow verification of buffers before copying
+them out of unaligned locations in network buffers and also allow
+subsequent reading of such buffers without copying iff the system
+supports unaligned access. However, the verifier does not currently
+support this.
+
+It is not always safe to verify a buffer. A buffer can be constructed to
+trigger deep nesting. The FlatCC verifier has a hard coded non-exact
+limit of about 100 levels. This is to protection stack recursion. If the
+limit is exceeded, the verifier will safely fail. The limit can be
+changed with a compile time flag. If the limit is too permissive a
+system may run into stack overflow, but it is unlikely on most systems
+today. Typical application code may have similar recursive access
+functions. Therefore it is likely that recursion is safe if the verifier
+succeeds but it depends on the application.
+
+A buffer can point to the same data from multiple places. This is known
+as a DAG. The verifier rejects cycles that could lead to infinite loops
+during application traversal but does permit DAGs. For normal use DAGs
+are safe but it is possible to maliciously construct a buffer with a
+long vector where all elements points to a table that also has a vector
+of a similar nature. After a few levels, this can lead to a finite but
+exponentially large number of places to visit. The current FlatCC
+verifier does not protect against this but Googles flatc compiler has a
+verifier the limits the number of visited tables.
+
+When reading a buffer in C no memory allocation takes place after the
+buffer has initially been placed in memory. For example, strings can be
+read directly as C strings and strings are 0-terminated. A string might
+contain embedded 0 bytes which is not strictly correct but permitted.
+This will result in a shorter string if used naively as a C string
+without reading the string length via the API but it will still be safe.
+Other languages might have to allocate memory for objects such as
+strings though.
+
+A field can generally be absent. Scalar fields are always safe to
+access, even if they are absent, because they have a default value that
+will be returned. Tables, Vectors, Strings, and Structs may return null
+when a field is absent. This is perfectly valid but if the application
+does not check for null, this can lead to an access violation.
+
+A field can marked 'required' in the schema. If it is required, it will
+never return null unless the buffer is invalid. A verifier will detect
+this. On a practical note some FlatBuffer builders might not enforce the
+required field and readers do not always verify buffers before access
+(nor should they have to) - therefore an application is advised to check
+return values even on required fields unless the buffer is entirely
+trusted.
+
+If a buffer is verified, it is safe to access all vector elements up to
+its size. Access of elements via API calls do not necessarily check for
+out of bounds but some might.
+
+A buffer may also be encoded in big endian format. This is not standard,
+but FlatCC supports for systems that are primarily big endian. The
+buffer identifier will usually detect the difference because the
+identifier will be byte swapped. A reader therefore need to be aware of
+this possiblity, but most often this is not a concern since standard
+FlatBuffers are always little endian. The verifier will likely fail an
+unpexcted endian encoding but at least make it safe to access.
+
+
+## Thread Safety
+
+There is no thread safety on the FlatBuffers API but read access does
+not mutate any state. Every read location is a temporary variable so as
+long as the application code is otherwise sane, it is safe read a buffer
+from multiple threads and if the buffer is placed on cache line
+alignment (typically 64 or 128 bytes) it is also efficient without false
+sharing.
+
+A verifier is also safe to use because it it only reads from a buffer.
+
+A builder is inherently NOT safe for multihreaded access. However, with
+proper synchronization there is nothing preventing one thread from doing
+the grunt work and another putting the high level pieces together as
+long as only one thread at a time is access the builder object, or the
+associated allocator and emitter objects. From a performance perspective
+this doesn't make much sense, but it might from an architectural
+perspective.
+
+A builder object can be cleared and reused after a buffer is constructed
+or abandoned. The clear operation can optionally reduce the amount of
+memory or keep all the memory from the previous operation. In either
+case it is safe for new thread to use the builder after it is cleared
+but two threads cannot use the builder at the same time.
+
+It is fairly cheap to create a new builder object, but of course cheaper
+to reuse existing memory. Often the best option is for each thread to
+have its own builder and own memory and defer any sharing to the point
+where the buffer is finished and readable.
+
+
+## Schema Evolution
+
+Accessing a buffer that was created by a more recent of a FlatBuffers
+schema is safe iff the new version of the schema was created according
+the guidelines for schema evolution - notably no change of field
+identifiers or existing enum values and no removal or deprecation of
+required fields. Googles flatc tool can check if a new schema version is
+safe.
+
+Fields that are not required but deprecated in a new version will still
+be safe to access by old version but they will likely read default or
+null values and should be prepared for this.
+
+
+## Copying or Printing Buffer Content
+
+Even if it is safe to read a buffer, it is not safe to copy or even
+print a buffer because a DAG can unfold to consume much more output
+space than the given input. In data compression this is known as a Zip
+Bomb but even without malicious intent, users need to aware of the
+potential expansion. This is also a concern when printing JSON.
+
+A table also cannot be trivially copied based on memory content because
+it has offsets to other content. This is not an issue when using any
+official API but may be if a new buffer is attempted to be constructed
+by Frankenstein from parts of existing buffers.
+
+Nested buffers are not allowed to share any content with its parents,
+siblings or child buffers for similar reasons.
+
+The verifier should complain if a buffer goes out if bounds.
+
+
+## Modifying Buffers
+
+It is not safe to modify buffers in-place unless the buffer is
+absolutely trusted. Verifying a buffer is not enough. FlatC does not
+provide any means to modify a buffer in-place but it is not hard to
+achieve this is if so desired. It is especially easy to do this with
+structs, so if this is needed this is the way to do it.
+
+Modifying a buffer is unsafe because it is possible to place one table
+inside another table, as an example, even if this is not valid. Such
+overlaps are too expensive to verify by a standard verifier. As long as
+the buffer is not modified this does not pose any real problem, but if a
+field is modified in one table, it might cause a field of another table
+to point out of bounds. This is so obvious an attack vector that anyone
+wanting to hack a system is likely to use this approach. Therefore
+in-place modification should be avoided unless on a trusted platform.
+For example, a trusted network might bump a time-to-live counter when
+passing buffers around.
+
+Even if it is safe to modify buffers, this will not work on platforms
+that require endian conversion. This is usually big endian platforms,
+but it is possible to compile flatbuffers with native big endian format
+as well.
+platforms unless extra precautions are taken. FlatCC has a lot of
+low-level `to/from_pe` calls that performs the proper
+
+
+## Building Buffers
+
+A buffer can be constructed incorrectly in a large number of ways that
+are not efficient to detect at runtime.
+
+When a buffer is constructed with a debug library then assertions will
+sometimes find the most obvious problems such as closing a table after
+opening a vector. FlatCC is quite permissive in the order of object
+creation but the nesting order must be respected, and it cannot type
+check all data. Other FlatBuffer builders typically require that child
+objects are completed created before a parent object is started. FlatCC
+does not require this but will internally organize objects in a
+comptible way. This removes a number of potential mistakes, but not all.
+
+Notably a table from a parent object or any other external reference
+should not be used in a nested buffer.
+
+It is a good idea to run a verifier on a constructed buffer at least
+until some confidence has been gained in the code building buffers.
+
+If a buffer needs to be constructed with sorted keys this cannot be done
+during construction, unlike the C++ API because the builder allocates as
+little memory as possible. Instead the reader interface supports a
+mutable cast for use with a sort operation. This sort operation must
+only be used on absolutely trusted buffers and verification is not
+sufficient if malicous overlaps can be expected.
+
+The builder will normally consume very little memory. It operates a few
+stacks and a small hash table in additional to a circular buffer to
+consum temporary buffer output. It is not possible to access constructed
+buffer objects buffer the buffer is complete because data may span
+multiple buffers. Once a buffer is complete the content can be
+copied out, or a support function can be used allocate new memory with
+the final buffer as content.
+
+The internal memory can grow large when the buffer grows large,
+naturally. In addition the temporary stacks may grow large if there are
+are large tables or notably large vectors that cannot be be copied
+directly to the output buffers. This creates a potential for memory
+allocation errors, especially on constrained systems.
+
+The builder returns error codes, but it is tedious to check these. It is
+not necessary to check return codes if the API is used correctly and if
+there are no allocation errors. It is possible to provide a custom
+allocator and a custom emitter. These can detect memory failures early
+making it potentially safe to use the builder API without any per
+operation checks.
+
+The generated JSON parser checks all return codes and can be used to
+construct a buffer safely, especially since the buffer is naturally
+bounded by the size of the JSON input. JSON printing, on the other hand,
+can potentially explode, as discussed earlier.
+
+FlatCC generated create calls such as `MyGame_Example_Monster_create()`
+will not be compatible across versions if there are deprecated fields
+even if the schema change otherwise respects schema evolutation rules.
+This is mostly a concern if new fields are added because compilation
+will otherwise break on argument count mismatch. Prior to flatcc-0.5.3
+argument order could change if the field (id: x) attribute was used
+which could lead to buffers with unexpected content. JSON parsers that
+support constructors (objects given as an array of create arguments)
+have similar concerns but here trailing arguments can be optional.
author	Toni Uhlig <matzeton@googlemail.com>	2023-07-16 02:03:33 +0200
committer	Toni Uhlig <matzeton@googlemail.com>	2023-07-16 02:03:33 +0200
commit	b31e4bc16d1df62b50c6f77a77041f9e7b6c906d (patch)
tree	024c74c13d918aa6bde302aab6836fa33607613c /flatcc/doc/security.md
parent	ba6815ef8fb8ae472412b5af2837a7caba2799c2 (diff)
parent	5a40295c4cf0af5ea8da9ced04a4ce7d3621a080 (diff)