Diffstat (limited to 'README.md')
-rw-r--r-- README.md 425
1 files changed, 296 insertions, 129 deletions
diff --git a/README.md b/README.md
index e94679775..7844b21f9 100644
--- a/README.md
+++ b/README.md
@@ -1,182 +1,349 @@
-JSMN
-====
+[![Build](https://github.com/utoni/nDPId/actions/workflows/build.yml/badge.svg)](https://github.com/utoni/nDPId/actions/workflows/build.yml)
+[![Gitlab-CI](https://gitlab.com/utoni/nDPId/badges/main/pipeline.svg)](https://gitlab.com/utoni/nDPId/-/pipelines)
+[![Circle-CI](https://circleci.com/gh/utoni/nDPId.svg?style=shield "Circle-CI")](https://app.circleci.com/pipelines/github/utoni/nDPId)
+[![Lines of Code](https://sonarcloud.io/api/project_badges/measure?project=lnslbrty_nDPId&metric=ncloc)](https://sonarcloud.io/summary/new_code?id=lnslbrty_nDPId)
+[![Code Smells](https://sonarcloud.io/api/project_badges/measure?project=lnslbrty_nDPId&metric=code_smells)](https://sonarcloud.io/summary/new_code?id=lnslbrty_nDPId)
+[![Bugs](https://sonarcloud.io/api/project_badges/measure?project=lnslbrty_nDPId&metric=bugs)](https://sonarcloud.io/summary/new_code?id=lnslbrty_nDPId)
+[![Vulnerabilities](https://sonarcloud.io/api/project_badges/measure?project=lnslbrty_nDPId&metric=vulnerabilities)](https://sonarcloud.io/summary/new_code?id=lnslbrty_nDPId)
+[![Reliability Rating](https://sonarcloud.io/api/project_badges/measure?project=lnslbrty_nDPId&metric=reliability_rating)](https://sonarcloud.io/summary/new_code?id=lnslbrty_nDPId)
+[![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=lnslbrty_nDPId&metric=alert_status)](https://sonarcloud.io/summary/new_code?id=lnslbrty_nDPId)
+![Docker Automated build](https://img.shields.io/docker/automated/utoni/ndpid)
-[![Build Status](https://travis-ci.org/zserge/jsmn.svg?branch=master)](https://travis-ci.org/zserge/jsmn)
+# References
-jsmn (pronounced like 'jasmine') is a minimalistic JSON parser in C. It can be
-easily integrated into resource-limited or embedded projects.
+[ntop Webinar 2022](https://www.ntop.org/webinar/ntop-webinar-on-dec-14th-community-meeting-and-future-plans/)
+[ntopconf 2023](https://www.ntop.org/ntopconf2023/)
-You can find more information about JSON format at [json.org][1]
+# Disclaimer
-Library sources are available at https://github.com/zserge/jsmn
+Please respect & protect the privacy of others.
-The web page with some information about jsmn can be found at
-[http://zserge.com/jsmn.html][2]
+The purpose of this software is not to spy on others, but to detect network anomalies and malicious traffic.
-Philosophy
-----------
+# Abstract
-Most JSON parsers offer you a bunch of functions to load JSON data, parse it
-and extract any value by its name. jsmn proves that checking the correctness of
-every JSON packet or allocating temporary objects to store parsed JSON fields
-often is an overkill.
+nDPId is a set of daemons and tools to capture, process and classify network traffic.
+Its minimal dependencies (besides a half-way modern C library and POSIX threads) are libnDPI (>=4.9.0 or current github dev branch) and libpcap.
-JSON format itself is extremely simple, so why should we complicate it?
+The daemon `nDPId` is capable of multithreading for packet processing, but w/o mutexes for performance reasons.
+Instead, synchronization is achieved by a packet distribution mechanism.
+To balance the workload to all threads (more or less) equally, a unique identifier represented as hash value is calculated using a 3-tuple consisting of: IPv4/IPv6 src/dst address; IP header value of the layer4 protocol; and (for TCP/UDP) src/dst port. Other protocols e.g. ICMP/ICMPv6 lack relevance for DPI, thus nDPId does not distinguish between different ICMP/ICMPv6 flows coming from the same host. This saves memory and performance, but might change in the future.
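
The distribution idea described above can be sketched as follows. This is only an illustration, not the hash `nDPId` actually uses: `thread_for_flow`, the CRC32 choice and the endpoint sorting (so that both directions of a flow land on the same thread) are assumptions made for the example.

```python
import zlib


def thread_for_flow(src_ip, dst_ip, l4_proto, src_port=0, dst_port=0, n_threads=4):
    """Map a flow to a worker thread index (hypothetical helper).

    The endpoints are sorted first, so a packet and its reply produce
    the same key and therefore end up on the same thread. Protocols
    without ports (e.g. ICMP) simply keep the default port value 0.
    """
    first, second = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{first[0]}|{first[1]}|{second[0]}|{second[1]}|{l4_proto}".encode()
    # CRC32 is a stand-in for whatever hash function the daemon really uses.
    return zlib.crc32(key) % n_threads
```

Because every packet of a flow is handled by exactly one thread, the per-flow state never needs to be shared, which is why no mutexes are required.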
-jsmn is designed to be **robust** (it should work fine even with erroneous
-data), **fast** (it should parse data on the fly), **portable** (no superfluous
-dependencies or non-standard C extensions). And of course, **simplicity** is a
-key feature - simple code style, simple algorithm, simple integration into
-other projects.
+`nDPId` uses libnDPI's JSON serialization interface to generate a JSON message for each event it receives from the library, which it then sends to a UNIX-socket (default: `/tmp/ndpid-collector.sock`). From such a socket, `nDPIsrvd` (or other custom applications) can retrieve incoming JSON-messages and process or distribute them to higher-level applications.
-Features
---------
+Unfortunately, `nDPIsrvd` does not yet support any encryption/authentication for TCP connections (TODO!).
-* compatible with C89
-* no dependencies (even libc!)
-* highly portable (tested on x86/amd64, ARM, AVR)
-* about 200 lines of code
-* extremely small code footprint
-* API contains only 2 functions
-* no dynamic memory allocation
-* incremental single-pass parsing
-* library code is covered with unit-tests
+# Architecture
-Design
-------
+This project uses a kind of microservice architecture.
-The rudimentary jsmn object is a **token**. Let's consider a JSON string:
+```text
+ connect to UNIX socket [1] connect to UNIX/TCP socket [2]
+_______________________ | | __________________________
+| "producer" |___| |___| "consumer" |
+|---------------------| _____________________________ |------------------------|
+| | | nDPIsrvd | | |
+| nDPId --- Thread 1 >| ---> |> | <| ---> |< example/c-json-stdout |
+| (eth0) `- Thread 2 >| ---> |> collector | distributor <| ---> |________________________|
+| `- Thread N >| ---> |> >>> forward >>> <| ---> | |
+|_____________________| ^ |____________|______________| ^ |< example/py-flow-info |
+| | | | |________________________|
+| nDPId --- Thread 1 >| `- send serialized data [1] | | |
+| (eth1) `- Thread 2 >| | |< example/... |
+| `- Thread N >| receive serialized data [2] -' |________________________|
+|_____________________|
- '{ "name" : "Jack", "age" : 27 }'
+```
+where:
-It holds the following tokens:
+* `nDPId` captures traffic, extracts traffic data (with libnDPI) and sends a JSON-serialized output stream to an already existing UNIX-socket;
+* `nDPIsrvd`:
-* Object: `{ "name" : "Jack", "age" : 27}` (the whole object)
-* Strings: `"name"`, `"Jack"`, `"age"` (keys and some values)
-* Number: `27`
+ * create and manage an "incoming" UNIX-socket (ref [1] above), to fetch data from a local `nDPId`;
+ * apply a buffering logic to received data;
+ * create and manage an "outgoing" UNIX or TCP socket (ref [2] above) to relay matched events
+ to connected clients
-In jsmn, tokens do not hold any data, but point to token boundaries in JSON
-string instead. In the example above jsmn will create tokens like: Object
-[0..31], String [3..7], String [12..16], String [20..23], Number [27..29].
+* `consumers` are common/custom applications able to receive selected flows/events via either a UNIX-socket or a TCP-socket.
-Every jsmn token has a type, which indicates the type of corresponding JSON
-token. jsmn supports the following token types:
-* Object - a container of key-value pairs, e.g.:
- `{ "foo":"bar", "x":0.3 }`
-* Array - a sequence of values, e.g.:
- `[ 1, 2, 3 ]`
-* String - a quoted sequence of chars, e.g.: `"foo"`
-* Primitive - a number, a boolean (`true`, `false`) or `null`
+# JSON stream format
-Besides start/end positions, jsmn tokens for complex types (like arrays
-or objects) also contain a number of child items, so you can easily follow
-object hierarchy.
+JSON messages streamed by both `nDPId` and `nDPIsrvd` are presented with:
-This approach provides enough information for parsing any JSON data and makes
-it possible to use zero-copy techniques.
+* a 5-digit number giving (in decimal) the length of the **entire** JSON message, including the newline `\n` at the end;
+* the JSON message itself
-Usage
------
+```text
+[5-digit-number][JSON message]
+```
-Download `jsmn.h`, include it, done.
+as with the following example:
+```text
+01223{"flow_event_id":7,"flow_event_name":"detection-update","thread_id":12,"packet_id":307,"source":"wlan0", ...snip...}
+00458{"packet_event_id":2,"packet_event_name":"packet-flow","thread_id":11,"packet_id":324,"source":"wlan0", ...snip...}
+00572{"flow_event_id":1,"flow_event_name":"new","thread_id":11,"packet_id":324,"source":"wlan0", ...snip...}
```
-#include "jsmn.h"
-...
-jsmn_parser p;
-jsmntok_t t[128]; /* We expect no more than 128 JSON tokens */
+The full stream of `nDPId` generated JSON-events can be retrieved directly from `nDPId`, without relying on `nDPIsrvd`, by providing a properly managed UNIX-socket.
+
+Technical details about the JSON-message format can be obtained from the related `.schema` file included in the `schema` directory.
+
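
A consumer has to split the stream back into individual messages. The sketch below shows one way to do that in Python, assuming the 5-digit prefix counts the bytes that follow it (JSON text plus trailing newline). `split_messages` and `frame` are hypothetical helper names, not part of any nDPId library.

```python
import json

LENGTH_DIGITS = 5  # width of the decimal length prefix


def split_messages(buffer: bytes):
    """Decode all complete messages in `buffer`.

    Returns (messages, remainder); the remainder holds an incomplete
    trailing message that should be prepended to the next read.
    """
    messages = []
    pos = 0
    while len(buffer) - pos >= LENGTH_DIGITS:
        prefix = buffer[pos:pos + LENGTH_DIGITS]
        if not prefix.isdigit():
            raise ValueError(f"invalid length prefix: {prefix!r}")
        start = pos + LENGTH_DIGITS
        end = start + int(prefix)
        if end > len(buffer):
            break  # message not fully received yet
        messages.append(json.loads(buffer[start:end]))
        pos = end
    return messages, buffer[pos:]


def frame(obj) -> bytes:
    """Build a length-prefixed message (used here only to create test input)."""
    payload = json.dumps(obj).encode() + b"\n"
    return b"%05d" % len(payload) + payload
```

In a real consumer, the bytes returned by each `recv()` call would be appended to the previous remainder before calling `split_messages` again.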
+
+# Events
+
+`nDPId` generates JSON messages whereby each string is assigned to a certain event.
+Those events specify the contents (key-value-pairs) of the JSON message.
+They are divided into four categories, each with a number of subevents.
+
+## Error Events
+There are 17 distinct events, indicating that layer2 or layer3 packet processing failed or that not enough flow memory is available:
+
+1. Unknown datalink layer packet
+2. Unknown L3 protocol
+3. Unsupported datalink layer
+4. Packet too short
+5. Unknown packet type
+6. Packet header invalid
+7. IP4 packet too short
+8. Packet smaller than IP4 header
+9. nDPI IPv4/L4 payload detection failed
+10. IP6 packet too short
+11. Packet smaller than IP6 header
+12. nDPI IPv6/L4 payload detection failed
+13. TCP packet smaller than expected
+14. UDP packet smaller than expected
+15. Captured packet size is smaller than expected packet size
+16. Max flows to track reached
+17. Flow memory allocation failed
+
+Detailed JSON-schema is available [here](schema/error_event_schema.json)
+
+## Daemon Events
+There are 4 distinct events indicating startup/shutdown or status events as well as a reconnect event if there was a previous connection failure (collector):
+
+1. init: `nDPId` startup
+2. reconnect: (UNIX) socket connection lost previously and was established again
+3. shutdown: `nDPId` terminates gracefully
+4. status: statistics about the daemon itself e.g. memory consumption, zLib compressions (if enabled)
+
+Detailed JSON-schema is available [here](schema/daemon_event_schema.json)
+
-jsmn_init(&p);
-r = jsmn_parse(&p, s, strlen(s), t, 128); // "s" is the char array holding the json content
+## Packet Events
+There are 2 events containing base64 encoded packet payloads either belonging to a flow or not:
+
+1. packet: does not belong to any flow
+2. packet-flow: belongs to a flow e.g. TCP/UDP or ICMP
+
+Detailed JSON-schema is available [here](schema/packet_event_schema.json)
+
+## Flow Events
+There are 9 distinct events related to a flow:
+
+1. new: a new TCP/UDP/ICMP flow seen which will be tracked
+2. end: a TCP connection terminates
+3. idle: a flow timed out, because there was no packet on the wire for a certain amount of time
+4. update: inform nDPIsrvd or other apps about a long-lasting flow, whose detection was finished a long time ago but is still active
+5. analyse: provide some information about extracted features of a flow (Experimental; disabled per default, enable with `-A`)
+6. guessed: `libnDPI` was not able to reliably detect a layer7 protocol and falls back to IP/Port based detection
+7. detected: `libnDPI` successfully detected a layer7 protocol
+8. detection-update: `libnDPI` dissected more layer7 protocol data (after detection already done)
+9. not-detected: neither detected nor guessed
+
+Detailed JSON-schema is available [here](schema/flow_event_schema.json). Also, a graphical representation of *Flow Events* timeline is available [here](schema/flow_events_diagram.png).
+
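
A consumer usually needs to tell the four event categories apart before it looks at the event name. The sketch below does this by checking which `*_event_name` key is present. The flow and packet keys appear in the stream example above; the daemon and error key names are assumed by analogy, so check them against the schema files. `classify` is a hypothetical helper.

```python
# Key names for "daemon" and "error" are assumptions made by analogy
# with the flow/packet keys visible in the stream example.
CATEGORY_KEYS = {
    "flow": "flow_event_name",
    "packet": "packet_event_name",
    "daemon": "daemon_event_name",
    "error": "error_event_name",
}


def classify(message: dict):
    """Return (category, event_name), or (None, None) if no key matches."""
    for category, key in CATEGORY_KEYS.items():
        if key in message:
            return category, message[key]
    return None, None
```

For example, a message containing `"flow_event_name": "detection-update"` is classified as `("flow", "detection-update")`.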
+# Flow States
+
+A flow can have three different states while it is being tracked by `nDPId`.
+
+1. skipped: the flow will be tracked, but no detection will happen to reduce memory usage.
+ See command line argument `-I` and `-E`
+2. finished: detection finished and the memory used for the detection is freed
+3. info: detection is in progress and all flow memory required for `libnDPI` is allocated (this state consumes most memory)
+
+# Build (CMake)
+
+`nDPId` build system is based on [CMake](https://cmake.org/)
+
+```shell
+git clone https://github.com/utoni/nDPId.git
+[...]
+cd nDPId
+mkdir build
+cd build
+cmake ..
+[...]
+make
```
-Since jsmn is a single-header, header-only library, for more complex use cases
-you might need to define additional macros. `#define JSMN_STATIC` hides all
-jsmn API symbols by making them static. Also, if you want to include `jsmn.h`
-from multiple C files, to avoid duplication of symbols you may define `JSMN_HEADER` macro.
+see below for a full/test live-session
+
+![](examples/ndpid_install_and_run.gif)
+Depending on your build environment and requirements, you may need:
+
+```shell
+mkdir build
+cd build
+ccmake ..
+```
+
+or to build with a statically linked libnDPI:
+
+```shell
+cmake -S . -B ./build \
+ -DSTATIC_LIBNDPI_INSTALLDIR=[path/to/your/libnDPI/installdir] \
+ -DNDPI_NO_PKGCONFIG=ON
+cmake --build ./build
+```
+
+If you use the latter, make sure that you've configured libnDPI with `./configure --prefix=[path/to/your/libnDPI/installdir]`
+and remember to set all necessary CMake variables to link against shared libraries used by your nDPI build.
+You'll also need to use `-DNDPI_NO_PKGCONFIG=ON` if `STATIC_LIBNDPI_INSTALLDIR` does not contain a pkg-config file.
+
+e.g.:
+
+```shell
+cmake -S . -B ./build \
+ -DSTATIC_LIBNDPI_INSTALLDIR=[path/to/your/libnDPI/installdir] \
+ -DNDPI_NO_PKGCONFIG=ON \
+ -DNDPI_WITH_GCRYPT=ON -DNDPI_WITH_PCRE=OFF -DNDPI_WITH_MAXMINDDB=OFF
+cmake --build ./build
```
-/* In every .c file that uses jsmn include only declarations: */
-#define JSMN_HEADER
-#include "jsmn.h"
-/* Additionally, create one jsmn.c file for jsmn implementation: */
-#include "jsmn.h"
+Or let a shell script do the work for you:
+
+```shell
+cmake -S . -B ./build \
+ -DBUILD_NDPI=ON
+cmake --build ./build
```
-API
----
+The CMake cache variable `-DBUILD_NDPI=ON` builds a version of `libnDPI` residing as a git submodule in this repository.
+
+# run
+
+As mentioned above, `nDPId` needs a UNIX-socket to which it can stream its JSON-data.
+
+Such a UNIX-socket can be provided either by the included `nDPIsrvd` daemon or, if you simply need a quick check, by the [ncat](https://nmap.org/book/ncat-man.html) utility, with a simple `ncat -U /tmp/listen.sock -l -k`. Remember that OpenBSD `netcat` is not able to handle multiple connections reliably.
-Token types are described by `jsmntype_t`:
+Once the socket is ready, you can run `nDPId` capturing and analyzing your own traffic, with something similar to: `sudo nDPId -c /tmp/listen.sock`
+If you're using OpenBSD `netcat`, you need to run: `sudo nDPId -c /tmp/listen.sock -o max-reader-threads=1`
+Make sure that the UNIX socket is accessible by the user that `nDPId` switches to (see `-u`; default: nobody).
- typedef enum {
- JSMN_UNDEFINED = 0,
- JSMN_OBJECT = 1 << 0,
- JSMN_ARRAY = 1 << 1,
- JSMN_STRING = 1 << 2,
- JSMN_PRIMITIVE = 1 << 3
- } jsmntype_t;
+Of course, both `ncat` and `nDPId` need to point to the same UNIX-socket. `nDPId` provides the `-c` option for exactly this purpose. By default, `nDPId` refers to `/tmp/ndpid-collector.sock`, and the same default path is also used by `nDPIsrvd` for the incoming socket.
-**Note:** Unlike JSON data types, primitive tokens are not divided into
-numbers, booleans and null, because one can easily tell the type using the
-first character:
+Give `nDPId` some real-traffic. You can capture your own traffic, with something similar to:
-* <code>'t', 'f'</code> - boolean
-* <code>'n'</code> - null
-* <code>'-', '0'..'9'</code> - number
+```shell
+socat -u UNIX-Listen:/tmp/listen.sock,fork - # does the same as `ncat`
+sudo chown nobody:nobody /tmp/listen.sock # default `nDPId` user/group, see `-u` and `-g`
+sudo ./nDPId -c /tmp/listen.sock -l
+```
+
+`nDPId` also supports UDP collector endpoints:
+
+```shell
+nc -d -u 127.0.0.1 7000 -l -k
+sudo ./nDPId -c 127.0.0.1:7000 -l
+```
+
+or you can generate a nDPId-compatible JSON dump with:
-Token is an object of `jsmntok_t` type:
+```shell
+./nDPId-test [path-to-a-PCAP-file]
+```
- typedef struct {
- jsmntype_t type; // Token type
- int start; // Token start position
- int end; // Token end position
- int size; // Number of child (nested) tokens
- } jsmntok_t;
+You can also start both `nDPId` and `nDPIsrvd` automatically, with:
-**Note:** string tokens point to the first character after
-the opening quote and the previous symbol before final quote. This was made
-to simplify string extraction from JSON data.
+Daemons:
+```shell
+make -C [path-to-a-build-dir] daemon
+```
-All job is done by `jsmn_parser` object. You can initialize a new parser using:
+Or a manual approach with:
- jsmn_parser parser;
- jsmntok_t tokens[10];
+```shell
+./nDPIsrvd -d
+sudo ./nDPId -d
+```
- jsmn_init(&parser);
+or for a usage printout:
+```shell
+./nDPIsrvd -h
+./nDPId -h
+```
- // js - pointer to JSON string
- // tokens - an array of tokens available
- // 10 - number of tokens available
- jsmn_parse(&parser, js, strlen(js), tokens, 10);
+And why not a flow-info example?
+```shell
+./examples/py-flow-info/flow-info.py
+```
-This will create a parser, and then it tries to parse up to 10 JSON tokens from
-the `js` string.
+or anything below `./examples`.
+
+# nDPId tuning
+
+It is possible to change `nDPId` internals w/o recompiling by using `-o subopt=value`.
+But be careful: changing the default values may render `nDPId` useless and is not well tested.
+
+Suboptions for `-o`:
+
+Format: `subopt` (unit, comment): description
+
+ * `max-flows-per-thread` (N, caution advised): affects max. memory usage
+ * `max-idle-flows-per-thread` (N, safe): max. allowed idle flows whose memory gets freed after `flow-scan-interval`
+ * `max-reader-threads` (N, safe): amount of packet processing threads, every thread can have a max. of `max-flows-per-thread` flows
+ * `daemon-status-interval` (ms, safe): specifies how often daemon event `status` is generated
+ * `compression-scan-interval` (ms, untested): specifies how often `nDPId` scans for inactive flows ready for compression
+ * `compression-flow-inactivity` (ms, untested): the shortest period of time elapsed before `nDPId` considers compressing a flow (e.g. nDPI flow struct) that neither sent nor received any data
+ * `flow-scan-interval` (ms, safe): min. amount of time after which `nDPId` scans for idle or long-lasting flows
+ * `generic-max-idle-time` (ms, untested): time after which a non TCP/UDP/ICMP flow times out
+ * `icmp-max-idle-time` (ms, untested): time after which an ICMP flow times out
+ * `udp-max-idle-time` (ms, caution advised): time after which a UDP flow times out
+ * `tcp-max-idle-time` (ms, caution advised): time after which a TCP flow times out
+ * `tcp-max-post-end-flow-time` (ms, caution advised): a TCP flow that received a FIN or RST waits this amount of time before flow tracking stops and the flow memory is freed
+ * `max-packets-per-flow-to-send` (N, safe): max. `packet-flow` events generated for the first N packets of each flow
+ * `max-packets-per-flow-to-process` (N, caution advised): max. amount of packets processed by `libnDPI`
+ * `max-packets-per-flow-to-analyze` (N, safe): max. packets to analyze before sending an `analyse` event, requires `-A`
+ * `error-event-threshold-n` (N, safe): max. error events to send until threshold time has passed
+ * `error-event-threshold-time` (N, safe): time after which the error event threshold resets
+
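
When `nDPId` is launched from a script, the `-o subopt=value` pairs can be assembled programmatically. The sketch below only builds the argument list; `build_ndpid_argv` is a hypothetical helper, and passing one `-o` per suboption is an assumption about how several suboptions are combined.

```python
def build_ndpid_argv(collector, subopts=None, binary="nDPId"):
    """Build an argument list such as: nDPId -c <collector> -o name=value ...

    `subopts` maps suboption names (e.g. "max-reader-threads") to values.
    Assumption: each suboption is passed with its own `-o` flag.
    """
    argv = [binary, "-c", collector]
    for name, value in (subopts or {}).items():
        argv += ["-o", f"{name}={value}"]
    return argv
```

The resulting list can be handed to `subprocess.run`, for example with `sudo` prepended, to reproduce the OpenBSD `netcat` invocation shown in the run section.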
+# test
+
+The recommended way to run regression / diff tests:
+
+```shell
+cmake -S . -B ./build-like-ci \
+ -DBUILD_NDPI=ON -DENABLE_ZLIB=ON -DBUILD_EXAMPLES=ON
+# optional: -DENABLE_CURL=ON -DENABLE_SANITIZER=ON
+./test/run_tests.sh ./libnDPI ./build-like-ci/nDPId-test
+# or: make -C ./build-like-ci test
+```
-A non-negative return value of `jsmn_parse` is the number of tokens actually
-used by the parser.
-Passing NULL instead of the tokens array would not store parsing results, but
-instead the function will return the number of tokens needed to parse the given
-string. This can be useful if you don't know yet how many tokens to allocate.
+Run `./test/run_tests.sh` to see some usage information.
-If something goes wrong, you will get an error. Error will be one of these:
+Remember that all test results are tied to a specific libnDPI commit hash
+as part of the `git submodule`. Using `test/run_tests.sh` for other commit hashes
+will most likely result in PCAP diffs.
-* `JSMN_ERROR_INVAL` - bad token, JSON string is corrupted
-* `JSMN_ERROR_NOMEM` - not enough tokens, JSON string is too large
-* `JSMN_ERROR_PART` - JSON string is too short, expecting more JSON data
+# Code Coverage
-If you get `JSMN_ERROR_NOMEM`, you can re-allocate more tokens and call
-`jsmn_parse` once more. If you read json data from the stream, you can
-periodically call `jsmn_parse` and check if return value is `JSMN_ERROR_PART`.
-You will get this error until you reach the end of JSON data.
+You may generate code coverage by using:
-Other info
-----------
+```shell
+cmake -S . -B ./build-coverage \
+ -DENABLE_COVERAGE=ON -DENABLE_ZLIB=ON
+# optional: -DBUILD_NDPI=ON
+make -C ./build-coverage coverage-clean
+make -C ./build-coverage clean
+make -C ./build-coverage all
+./test/run_tests.sh ./libnDPI ./build-coverage/nDPId-test
+make -C ./build-coverage coverage
+make -C ./build-coverage coverage-view
+```
-This software is distributed under [MIT license](http://www.opensource.org/licenses/mit-license.php),
- so feel free to integrate it in your commercial products.
+# Contributors
-[1]: http://www.json.org/
-[2]: http://zserge.com/jsmn.html
+Special thanks to Damiano Verzulli ([@verzulli](https://github.com/verzulli)) from [GARRLab](https://www.garrlab.it) for providing server and test infrastructure.