Diffstat (limited to 'README.md')
-rw-r--r-- | README.md | 425 |
1 files changed, 296 insertions, 129 deletions
@@ -1,182 +1,349 @@
-JSMN
-====
+[](https://github.com/utoni/nDPId/actions/workflows/build.yml)
+[](https://gitlab.com/utoni/nDPId/-/pipelines)
+[](https://app.circleci.com/pipelines/github/utoni/nDPId)
+[](https://sonarcloud.io/summary/new_code?id=lnslbrty_nDPId)
+[](https://sonarcloud.io/summary/new_code?id=lnslbrty_nDPId)
+[](https://sonarcloud.io/summary/new_code?id=lnslbrty_nDPId)
+[](https://sonarcloud.io/summary/new_code?id=lnslbrty_nDPId)
+[](https://sonarcloud.io/summary/new_code?id=lnslbrty_nDPId)
+[](https://sonarcloud.io/summary/new_code?id=lnslbrty_nDPId)
+
-[](https://travis-ci.org/zserge/jsmn)
+# References
-jsmn (pronounced like 'jasmine') is a minimalistic JSON parser in C. It can be
-easily integrated into resource-limited or embedded projects.
+[ntop Webinar 2022](https://www.ntop.org/webinar/ntop-webinar-on-dec-14th-community-meeting-and-future-plans/)
+[ntopconf 2023](https://www.ntop.org/ntopconf2023/)
-You can find more information about JSON format at [json.org][1]
+# Disclaimer
-Library sources are available at https://github.com/zserge/jsmn
+Please respect & protect the privacy of others.
-The web page with some information about jsmn can be found at
-[http://zserge.com/jsmn.html][2]
+The purpose of this software is not to spy on others, but to detect network anomalies and malicious traffic.
-Philosophy
------------
+# Abstract
-Most JSON parsers offer you a bunch of functions to load JSON data, parse it
-and extract any value by its name. jsmn proves that checking the correctness of
-every JSON packet or allocating temporary objects to store parsed JSON fields
-often is an overkill.
+nDPId is a set of daemons and tools to capture, process and classify network traffic.
+Its minimal dependencies (besides a half-way modern C library and POSIX threads) are libnDPI (>=4.9.0 or current github dev branch) and libpcap.
-JSON format itself is extremely simple, so why should we complicate it?
+The daemon `nDPId` is capable of multithreading for packet processing, but w/o mutexes for performance reasons.
+Instead, synchronization is achieved by a packet distribution mechanism.
+To balance the workload across all threads (more or less) equally, a unique identifier represented as a hash value is calculated using a 3-tuple consisting of: IPv4/IPv6 src/dst address; IP header value of the layer4 protocol; and (for TCP/UDP) src/dst port. Other protocols, e.g. ICMP/ICMPv6, lack relevance for DPI, thus nDPId does not distinguish between different ICMP/ICMPv6 flows coming from the same host. This saves memory and improves performance, but might change in the future.
-jsmn is designed to be **robust** (it should work fine even with erroneous
-data), **fast** (it should parse data on the fly), **portable** (no superfluous
-dependencies or non-standard C extensions). And of course, **simplicity** is a
-key feature - simple code style, simple algorithm, simple integration into
-other projects.
+`nDPId` uses libnDPI's JSON serialization interface to generate a JSON message for each event it receives from the library, which it then sends out to a UNIX-socket (default: `/tmp/ndpid-collector.sock`). From such a socket, `nDPIsrvd` (or other custom applications) can retrieve incoming JSON-messages and further process/distribute them to higher-level applications.
-Features
----------
+Unfortunately, `nDPIsrvd` does not yet support any encryption/authentication for TCP connections (TODO!).
-* compatible with C89
-* no dependencies (even libc!)
-* highly portable (tested on x86/amd64, ARM, AVR)
-* about 200 lines of code
-* extremely small code footprint
-* API contains only 2 functions
-* no dynamic memory allocation
-* incremental single-pass parsing
-* library code is covered with unit-tests
+# Architecture
-Design
--------
+This project uses a kind of microservice architecture.
-The rudimentary jsmn object is a **token**. Let's consider a JSON string:
+```text
+    connect to UNIX socket [1]        connect to UNIX/TCP socket [2]
+_______________________   |                                 |   __________________________
+|     "producer"      |___|                                 |___|       "consumer"       |
+|---------------------|      _____________________________      |------------------------|
+|                     |      |         nDPIsrvd          |      |                        |
+| nDPId --- Thread 1 >| ---> |>           |             <| ---> |< example/c-json-stdout |
+| (eth0) `- Thread 2 >| ---> |> collector | distributor <| ---> |________________________|
+|        `- Thread N >| ---> |>    >>> forward >>>      <| ---> |                        |
+|_____________________|  ^   |____________|______________|   ^  |< example/py-flow-info   |
+|                     |  |                                   |  |________________________|
+| nDPId --- Thread 1 >|  `- send serialized data [1]         |  |                        |
+| (eth1) `- Thread 2 >|                                      |  |< example/...           |
+|        `- Thread N >|         receive serialized data [2] -'  |________________________|
+|_____________________|
-    '{ "name" : "Jack", "age" : 27 }'
+```
+where:
-It holds the following tokens:
+* `nDPId` captures traffic, extracts traffic data (with libnDPI) and sends a JSON-serialized output stream to an already existing UNIX-socket;
+* `nDPIsrvd`:
-* Object: `{ "name" : "Jack", "age" : 27}` (the whole object)
-* Strings: `"name"`, `"Jack"`, `"age"` (keys and some values)
-* Number: `27`
+  * create and manage an "incoming" UNIX-socket (ref [1] above), to fetch data from a local `nDPId`;
+  * apply a buffering logic to received data;
+  * create and manage an "outgoing" UNIX or TCP socket (ref [2] above) to relay matched events
+    to connected clients
-In jsmn, tokens do not hold any data, but point to token boundaries in JSON
-string instead. In the example above jsmn will create tokens like: Object
-[0..31], String [3..7], String [12..16], String [20..23], Number [27..29].
+* `consumers` are common/custom applications able to receive selected flows/events via either a UNIX-socket or a TCP-socket.
-Every jsmn token has a type, which indicates the type of corresponding JSON
-token. jsmn supports the following token types:
-* Object - a container of key-value pairs, e.g.:
-  `{ "foo":"bar", "x":0.3 }`
-* Array - a sequence of values, e.g.:
-  `[ 1, 2, 3 ]`
-* String - a quoted sequence of chars, e.g.: `"foo"`
-* Primitive - a number, a boolean (`true`, `false`) or `null`
+# JSON stream format
-Besides start/end positions, jsmn tokens for complex types (like arrays
-or objects) also contain a number of child items, so you can easily follow
-object hierarchy.
+JSON messages streamed by both `nDPId` and `nDPIsrvd` are presented with:
-This approach provides enough information for parsing any JSON data and makes
-it possible to use zero-copy techniques.
+* a 5-digit decimal number giving the length of the **entire** JSON message, including the newline `\n` at the end;
+* the JSON message itself
-Usage
-------
+```text
+[5-digit-number][JSON message]
+```
-Download `jsmn.h`, include it, done.
+as with the following example:
+```text
+01223{"flow_event_id":7,"flow_event_name":"detection-update","thread_id":12,"packet_id":307,"source":"wlan0", ...snip...}
+00458{"packet_event_id":2,"packet_event_name":"packet-flow","thread_id":11,"packet_id":324,"source":"wlan0", ...snip...}
+00572{"flow_event_id":1,"flow_event_name":"new","thread_id":11,"packet_id":324,"source":"wlan0", ...snip...}
 ```
-#include "jsmn.h"
-...
-jsmn_parser p;
-jsmntok_t t[128]; /* We expect no more than 128 JSON tokens */
+The full stream of `nDPId` generated JSON-events can be retrieved directly from `nDPId`, without relying on `nDPIsrvd`, by providing a properly managed UNIX-socket.
+
+Technical details about the JSON-message format can be obtained from the related `.schema` file included in the `schema` directory.
+
+
+# Events
+
+`nDPId` generates JSON messages whereby each string is assigned to a certain event.
+Those events specify the contents (key-value-pairs) of the JSON message.
+They are divided into four categories, each with a number of subevents.
+
+## Error Events
+There are 17 distinct events, indicating that layer2 or layer3 packet processing failed or that not enough flow memory was available:
+
+1. Unknown datalink layer packet
+2. Unknown L3 protocol
+3. Unsupported datalink layer
+4. Packet too short
+5. Unknown packet type
+6. Packet header invalid
+7. IP4 packet too short
+8. Packet smaller than IP4 header
+9. nDPI IPv4/L4 payload detection failed
+10. IP6 packet too short
+11. Packet smaller than IP6 header
+12. nDPI IPv6/L4 payload detection failed
+13. TCP packet smaller than expected
+14. UDP packet smaller than expected
+15. Captured packet size is smaller than expected packet size
+16. Max flows to track reached
+17. Flow memory allocation failed
+
+Detailed JSON-schema is available [here](schema/error_event_schema.json)
+
+## Daemon Events
+There are 4 distinct events indicating startup/shutdown or status events as well as a reconnect event if there was a previous connection failure (collector):
+
+1. init: `nDPId` startup
+2. reconnect: (UNIX) socket connection lost previously and was established again
+3. shutdown: `nDPId` terminates gracefully
+4. status: statistics about the daemon itself e.g. memory consumption, zLib compressions (if enabled)
+
+Detailed JSON-schema is available [here](schema/daemon_event_schema.json)
+
-jsmn_init(&p);
-r = jsmn_parse(&p, s, strlen(s), t, 128); // "s" is the char array holding the json content
+## Packet Events
+There are 2 events containing base64 encoded packet payloads either belonging to a flow or not:
+
+1. packet: does not belong to any flow
+2. packet-flow: belongs to a flow e.g. TCP/UDP or ICMP
+
+Detailed JSON-schema is available [here](schema/packet_event_schema.json)
+
+## Flow Events
+There are 9 distinct events related to a flow:
+
+1. new: a new TCP/UDP/ICMP flow seen which will be tracked
+2. end: a TCP connection terminates
+3. idle: a flow timed out, because there was no packet on the wire for a certain amount of time
+4. update: inform nDPIsrvd or other apps about a long-lasting flow, whose detection was finished a long time ago but is still active
+5. analyse: provide some information about extracted features of a flow (Experimental; disabled by default, enable with `-A`)
+6. guessed: `libnDPI` was not able to reliably detect a layer7 protocol and falls back to IP/Port based detection
+7. detected: `libnDPI` successfully detected a layer7 protocol
+8. detection-update: `libnDPI` dissected more layer7 protocol data (after detection already done)
+9. not-detected: neither detected nor guessed
+
+Detailed JSON-schema is available [here](schema/flow_event_schema.json). Also, a graphical representation of the *Flow Events* timeline is available [here](schema/flow_events_diagram.png).
+
+# Flow States
+
+A flow can have three different states while it is being tracked by `nDPId`.
+
+1. skipped: the flow will be tracked, but no detection will happen to reduce memory usage.
+   See command line argument `-I` and `-E`
+2. finished: detection finished and the memory used for the detection is freed
+3. info: detection is in progress and all flow memory required for `libnDPI` is allocated (this state consumes most memory)
+
+# Build (CMake)
+
+The `nDPId` build system is based on [CMake](https://cmake.org/)
+
+```shell
+git clone https://github.com/utoni/nDPId.git
+[...]
+cd nDPId
+mkdir build
+cd build
+cmake ..
+[...]
+make
 ```
-Since jsmn is a single-header, header-only library, for more complex use cases
-you might need to define additional macros. `#define JSMN_STATIC` hides all
-jsmn API symbols by making them static. Also, if you want to include `jsmn.h`
-from multiple C files, to avoid duplication of symbols you may define `JSMN_HEADER` macro.
+see below for a full/test live-session
+
+
+Based on your build environment and/or desiderata, you could need:
+
+```shell
+mkdir build
+cd build
+ccmake ..
+```
+
+or to build with a statically linked libnDPI:
+
+```shell
+cmake -S . -B ./build \
+    -DSTATIC_LIBNDPI_INSTALLDIR=[path/to/your/libnDPI/installdir] \
+    -DNDPI_NO_PKGCONFIG=ON
+cmake --build ./build
+```
+
+If you use the latter, make sure that you've configured libnDPI with `./configure --prefix=[path/to/your/libnDPI/installdir]`
+and remember to set all necessary CMake variables to link against shared libraries used by your nDPI build.
+You'll also need to use `-DNDPI_NO_PKGCONFIG=ON` if `STATIC_LIBNDPI_INSTALLDIR` does not contain a pkg-config file.
+
+e.g.:
+
+```shell
+cmake -S . -B ./build \
+    -DSTATIC_LIBNDPI_INSTALLDIR=[path/to/your/libnDPI/installdir] \
+    -DNDPI_NO_PKGCONFIG=ON \
+    -DNDPI_WITH_GCRYPT=ON -DNDPI_WITH_PCRE=OFF -DNDPI_WITH_MAXMINDDB=OFF
+cmake --build ./build
 ```
-/* In every .c file that uses jsmn include only declarations: */
-#define JSMN_HEADER
-#include "jsmn.h"
-/* Additionally, create one jsmn.c file for jsmn implementation: */
-#include "jsmn.h"
+Or let a shell script do the work for you:
+
+```shell
+cmake -S . -B ./build \
+    -DBUILD_NDPI=ON
+cmake --build ./build
 ```
-API
-----
+The CMake cache variable `-DBUILD_NDPI=ON` builds a version of `libnDPI` residing as a git submodule in this repository.
+
+# run
+
+As mentioned above, in order to run `nDPId`, a UNIX-socket needs to be provided in order to stream our related JSON-data.
+
+Such a UNIX-socket can be provided either by the included `nDPIsrvd` daemon or, if you simply need a quick check, by the [ncat](https://nmap.org/book/ncat-man.html) utility with a simple `ncat -U /tmp/listen.sock -l -k`. Remember that OpenBSD `netcat` is not able to handle multiple connections reliably.
-Token types are described by `jsmntype_t`:
+Once the socket is ready, you can run `nDPId` capturing and analyzing your own traffic, with something similar to: `sudo nDPId -c /tmp/listen.sock`
+If you're using OpenBSD `netcat`, you need to run: `sudo nDPId -c /tmp/listen.sock -o max-reader-threads=1`
+Make sure that the UNIX socket is accessible by the user (see `-u`) that nDPId switches to (default: nobody).
-    typedef enum {
-        JSMN_UNDEFINED = 0,
-        JSMN_OBJECT = 1 << 0,
-        JSMN_ARRAY = 1 << 1,
-        JSMN_STRING = 1 << 2,
-        JSMN_PRIMITIVE = 1 << 3
-    } jsmntype_t;
+Of course, both `ncat` and `nDPId` need to point to the same UNIX-socket (`nDPId` provides the `-c` option exactly for this; by default, `nDPId` refers to `/tmp/ndpid-collector.sock`, and the same default path is also used by `nDPIsrvd` for the incoming socket).
-**Note:** Unlike JSON data types, primitive tokens are not divided into
-numbers, booleans and null, because one can easily tell the type using the
-first character:
+Give `nDPId` some real traffic. You can capture your own traffic, with something similar to:
-* <code>'t', 'f'</code> - boolean
-* <code>'n'</code> - null
-* <code>'-', '0'..'9'</code> - number
+```shell
+socat -u UNIX-Listen:/tmp/listen.sock,fork - # does the same as `ncat`
+sudo chown nobody:nobody /tmp/listen.sock # default `nDPId` user/group, see `-u` and `-g`
+sudo ./nDPId -c /tmp/listen.sock -l
+```
+
+`nDPId` also supports UDP collector endpoints:
+
+```shell
+nc -d -u 127.0.0.1 7000 -l -k
+sudo ./nDPId -c 127.0.0.1:7000 -l
+```
+
+or you can generate an nDPId-compatible JSON dump with:
-Token is an object of `jsmntok_t` type:
+```shell
+./nDPId-test [path-to-a-PCAP-file]
+```
-    typedef struct {
-        jsmntype_t type; // Token type
-        int start; // Token start position
-        int end; // Token end position
-        int size; // Number of child (nested) tokens
-    } jsmntok_t;
+You can also start both `nDPId` and `nDPIsrvd` automatically with:
-**Note:** string tokens point to the first character after
-the opening quote and the previous symbol before final quote. This was made
-to simplify string extraction from JSON data.
+Daemons:
+```shell
+make -C [path-to-a-build-dir] daemon
+```
-All job is done by `jsmn_parser` object. You can initialize a new parser using:
+Or a manual approach with:
-    jsmn_parser parser;
-    jsmntok_t tokens[10];
+```shell
+./nDPIsrvd -d
+sudo ./nDPId -d
+```
-    jsmn_init(&parser);
+or for a usage printout:
+```shell
+./nDPIsrvd -h
+./nDPId -h
+```
-    // js - pointer to JSON string
-    // tokens - an array of tokens available
-    // 10 - number of tokens available
-    jsmn_parse(&parser, js, strlen(js), tokens, 10);
+And why not a flow-info example?
+```shell
+./examples/py-flow-info/flow-info.py
+```
-This will create a parser, and then it tries to parse up to 10 JSON tokens from
-the `js` string.
+or anything below `./examples`.
+
+# nDPId tuning
+
+It is possible to change `nDPId` internals w/o recompiling by using `-o subopt=value`.
+But be careful: changing the default values may render `nDPId` useless and is not well tested.
+
+Suboptions for `-o`:
+
+Format: `subopt` (unit, comment): description
+
+ * `max-flows-per-thread` (N, caution advised): affects max. memory usage
+ * `max-idle-flows-per-thread` (N, safe): max. allowed idle flows whose memory gets freed after `flow-scan-interval`
+ * `max-reader-threads` (N, safe): number of packet processing threads, every thread can have a max. of `max-flows-per-thread` flows
+ * `daemon-status-interval` (ms, safe): specifies how often daemon event `status` is generated
+ * `compression-scan-interval` (ms, untested): specifies how often `nDPId` scans for inactive flows ready for compression
+ * `compression-flow-inactivity` (ms, untested): the shortest period of time elapsed before `nDPId` considers compressing a flow (e.g. nDPI flow struct) that neither sent nor received any data
+ * `flow-scan-interval` (ms, safe): min. amount of time after which `nDPId` scans for idle or long-lasting flows
+ * `generic-max-idle-time` (ms, untested): time after which a non TCP/UDP/ICMP flow times out
+ * `icmp-max-idle-time` (ms, untested): time after which an ICMP flow times out
+ * `udp-max-idle-time` (ms, caution advised): time after which a UDP flow times out
+ * `tcp-max-idle-time` (ms, caution advised): time after which a TCP flow times out
+ * `tcp-max-post-end-flow-time` (ms, caution advised): a TCP flow that received a FIN or RST waits this amount of time before flow tracking stops and the flow memory is freed
+ * `max-packets-per-flow-to-send` (N, safe): max. `packet-flow` events generated for the first N packets of each flow
+ * `max-packets-per-flow-to-process` (N, caution advised): max. number of packets processed by `libnDPI`
+ * `max-packets-per-flow-to-analyze` (N, safe): max. packets to analyze before sending an `analyse` event, requires `-A`
+ * `error-event-threshold-n` (N, safe): max. error events to send until threshold time has passed
+ * `error-event-threshold-time` (N, safe): time after which the error event threshold resets
+
+# test
+
+The recommended way to run regression / diff tests:
+
+```shell
+cmake -S . -B ./build-like-ci \
+    -DBUILD_NDPI=ON -DENABLE_ZLIB=ON -DBUILD_EXAMPLES=ON
+# optional: -DENABLE_CURL=ON -DENABLE_SANITIZER=ON
+./test/run_tests.sh ./libnDPI ./build-like-ci/nDPId-test
+# or: make -C ./build-like-ci test
+```
-A non-negative return value of `jsmn_parse` is the number of tokens actually
-used by the parser.
-Passing NULL instead of the tokens array would not store parsing results, but
-instead the function will return the number of tokens needed to parse the given
-string. This can be useful if you don't know yet how many tokens to allocate.
+Run `./test/run_tests.sh` to see some usage information.
-If something goes wrong, you will get an error. Error will be one of these:
+Remember that all test results are tied to a specific libnDPI commit hash
+as part of the `git submodule`. Using `test/run_tests.sh` for other commit hashes
+will most likely result in PCAP diffs.
-* `JSMN_ERROR_INVAL` - bad token, JSON string is corrupted
-* `JSMN_ERROR_NOMEM` - not enough tokens, JSON string is too large
-* `JSMN_ERROR_PART` - JSON string is too short, expecting more JSON data
+# Code Coverage
-If you get `JSMN_ERROR_NOMEM`, you can re-allocate more tokens and call
-`jsmn_parse` once more. If you read json data from the stream, you can
-periodically call `jsmn_parse` and check if return value is `JSMN_ERROR_PART`.
-You will get this error until you reach the end of JSON data.
+You may generate code coverage by using:
-Other info
------------
+```shell
+cmake -S . -B ./build-coverage \
+    -DENABLE_COVERAGE=ON -DENABLE_ZLIB=ON
+# optional: -DBUILD_NDPI=ON
+make -C ./build-coverage coverage-clean
+make -C ./build-coverage clean
+make -C ./build-coverage all
+./test/run_tests.sh ./libnDPI ./build-coverage/nDPId-test
+make -C ./build-coverage coverage
+make -C ./build-coverage coverage-view
+```
-This software is distributed under [MIT license](http://www.opensource.org/licenses/mit-license.php),
- so feel free to integrate it in your commercial products.
+# Contributors
-[1]: http://www.json.org/
-[2]: http://zserge.com/jsmn.html
+Special thanks to Damiano Verzulli ([@verzulli](https://github.com/verzulli)) from [GARRLab](https://www.garrlab.it) for providing server and test infrastructure.