| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add the concept of "global context".
Right now every instance of `struct ndpi_detection_module_struct` (we
will call it "local context" in this description) is completely
independent from each other. This provide optimal performances in
multithreaded environment, where we pin each local context to a thread,
and each thread to a specific CPU core: we don't have any data shared
across the cores.
Each local context has, internally, also some information correlating
**different** flows; something like:
```
if flow1 (PeerA <-> Peer B) is PROTOCOL_X; then
flow2 (PeerC <-> PeerD) will be PROTOCOL_Y
```
To get optimal classification results, both flow1 and flow2 must be
processed by the same local context. This is not an issue at all in the far
most common scenario where there is only one local context, but it might
be impractical in some more complex scenarios.
Create the concept of "global context": multiple local contexts can use
the same global context and share some data (structures) using it.
This way the data correlating multiple flows can be read/write from
different local contexts.
This is an optional feature, disabled by default.
Obviously data structures shared in a global context must be thread safe.
This PR updates the code of the LRU implementation to be, optionally,
thread safe.
Right now, only the LRU caches can be shared; the other main structures
(trees and automas) are basically read-only: there is little sense in
sharing them. Furthermore, these structures don't have any information
correlating multiple flows.
Every LRU cache can be shared, independently from the others, via
`ndpi_set_config(ndpi_struct, NULL, "lru.$CACHE_NAME.scope", "1")`.
It's up to the user to find the right trade-off between performances
(i.e. without shared data) and classification results (i.e. with some
shared data among the local contexts), depending on the specific traffic
patterns and on the algorithms used to balance the flows across the
threads/cores/local contexts.
Add some basic examples of library initialization in
`doc/library_initialization.md`.
This code needs libpthread as external dependency. It shouldn't be a big
issue; however a configure flag has been added to disable global context
support. A new CI job has been added to test it.
TODO: we should need to find a proper way to add some tests on
multithreaded enviroment... not an easy task...
*** API changes ***
If you are not interested in this feature, simply add a NULL parameter to
any `ndpi_init_detection_module()` calls.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some low hanging fruits found using nallocfuzz.
See: https://github.com/catenacyber/nallocfuzz
See: https://github.com/google/oss-fuzz/pull/9902
Most of these errors are quite trivial to fix; the only exception is the
stuff in the uthash.
If the insertion fails (because of an allocation failure), we need to
avoid some memory leaks. But the only way to check if the `HASH_ADD_*`
failed, is to perform a new lookup: a bit costly, but we don't use that
code in any critical data-path.
|
|
|
|
|
|
|
| |
Start using a dictionary for fuzzing (see:
https://llvm.org/docs/LibFuzzer.html#dictionaries).
Remove some dead code.
Fuzzing with debug enabled is not usually a great idea (from performance
POV). Keep the code since it might be useful while debugging.
|
|
|
|
|
|
|
|
|
|
| |
Load some custom configuration (like in the unit tests) and factorize some
(fuzzing) common code.
There is no way to pass file paths to the fuzzers as parameters. The safe
solution seems to be to load them from the process working dir. Anyway,
missing file is not a blocking error.
Remove some dead code (found looking at the coverage report)
|
|
|
|
|
| |
We don't need specific targets to reproduce fuzzing issues.
After all, calling `./fuzz/fuzz_process_packet_with_main $ARTIFACT_FILE`
is equivalento to `./fuzz/fuzz_process_packet $ARTIFACT_FILE`
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In a lot of places in ndPI we use *packet* source/dest info
(address/port/direction) when we are interested in *flow* client/server
info, instead.
Add basic logic to autodetect this kind of information.
nDPI doesn't perform any "flow management" itself but this task is
delegated to the external application. It is then likely that the
application might provide more reliable hints about flow
client/server direction and about the TCP handshake presence: in that case,
these information might be (optionally) passed to the library, disabling
the internal "autodetect" logic.
These new fields have been used in some LRU caches and in the "guessing"
algorithm.
It is quite likely that some other code needs to be updated.
|
|
|
|
|
|
|
|
|
|
|
| |
serialization interface. (#1535)
* Fixes #1528
* Serialization Interface should also fuzzed
* libjson-c may only be used in the unit test to verify the internal serialization interface
* Serialization Interface supports tlv(broken), csv and json
* Unit test does work again and requires libjson-c
Signed-off-by: lns <matzeton@googlemail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove the last uses of `struct ndpi_id_struct`.
That code is not really used and it has not been updated for a very long
time: see #1279 for details.
Correlation among flows is achieved via LRU caches.
This change allows to further reduce memory consumption (see also
91bb77a8).
At nDPI 4.0 (more precisly, at a6b10cf, because memory stats
were wrong until that commit):
```
nDPI Memory statistics:
nDPI Memory (once): 221.15 KB
Flow Memory (per flow): 2.94 KB
```
Now:
```
nDPI Memory statistics:
nDPI Memory (once): 235.27 KB
Flow Memory (per flow): 688 B <--------
```
i.e. memory usage per flow has been reduced by 77%.
Close #1279
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can write to `flow->protos` only after a proper classification.
This issue has been found in Kerberos, DHCP, HTTP, STUN, IMO, FTP,
SMTP, IMAP and POP code.
There are two kinds of fixes:
* write to `flow->protos` only if a final protocol has been detected
* move protocol state out of `flow->protos`
The hard part is to find, for each protocol, the right tradeoff between
memory usage and code complexity.
Handle Kerberos like DNS: if we find a request, we set the protocol
and an extra callback to further parsing the reply.
For all the other protocols, move the state out of `flow->protos`. This
is an issue only for the FTP/MAIL stuff.
Add DHCP Class Identification value to the output of ndpiReader and to
the Jason serialization.
Extend code coverage of fuzz tests.
Close #1343
Close #1342
|
|
|
|
|
|
|
|
|
|
|
|
| |
ndpi_finalize_initialization(). (#1334)
* fixed several memory errors (heap-overflow, unitialized memory, etc)
* ability to build fuzz_process_packet with a main()
allowing to replay crash data generated with fuzz_process_packet
by LLVMs libfuzzer
* temporarily disable fuzzing if `tests/do.sh`
executed with env FUZZY_TESTING_ENABLED=1
Signed-off-by: Toni Uhlig <matzeton@googlemail.com>
|
| |
|
| |
|
|
And configur option enable-fuzztargets
|