| Commit message (Collapse) | Author | Age |
| |
|
|
|
|
|
|
|
|
|
| |
Use DNS information to get a better First Packet Classification.
See: #2322
---------
Co-authored-by: Nardi Ivan <nardi.ivan@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Added
size_t ndpi_compress_str(const char * in, size_t len, char * out, size_t bufsize);
size_t ndpi_decompress_str(const char * in, size_t len, char * out, size_t bufsize);
used to compress short strings such as domain names. This code is based on
https://github.com/Ed-von-Schleck/shoco
* Major code rewrite for ndpi_hash and ndpi_domain_classify
* Improvements to make sure custom categories are loaded and enabled
* Fixed string encoding
* Extended SalesForce/Cloudflare domains list
|
|
|
|
|
| |
* unused parameters and functions pollute the code and decrease readability
Signed-off-by: Toni Uhlig <matzeton@googlemail.com>
|
| |
|
|
|
|
|
|
|
|
|
| |
* Normalization of host_server_name
The ndpi_hostname_sni_set() function replaces all non-printable
characters with the "?" character and removing whitespace characters
at the end of the line.
* Added conditional hostname normalization.
|
|
|
|
|
| |
* Enable/disable sub-classification of DNS flows
* Enable/disable processing of DNS responses
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove some unreached/duplicated code.
Add error checking for `atoi()` calls.
About `isdigit()` and similar functions. The warning reported is:
```
Negative Character Value help
isdigit() is invoked here with an argument of signed type char, but only
has defined behavior for int arguments that are either representable
as unsigned char or equal to the value of macro EOF(-1).
Casting the argument to unsigned char will avoid the undefined behavior.
In a number of libc implementations, isdigit() is implemented using lookup
tables (arrays): passing in a negative value can result in a read underrun.
```
Switching to our macros fix that.
Add a check to `check_symbols.sh` to avoid using the original functions
from libc.
|
| |
|
| |
|
|
|
|
|
|
| |
1) Public API/headers in `src/include/` [as it has always been]
2) Private API/headers in `src/lib/`
Try to keep the "ndpi_" prefix only for the public functions
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix compilation on Windows.
"dirent.h" file has been taken from https://github.com/tronkko/dirent/
Fix Python bindings
Fix some warnings with x86_64-w64-mingw32-gcc:
```
protocols/dns.c: In function ‘ndpi_search_dns’:
protocols/dns.c:775:41: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
775 | unsigned long first_element_len = (unsigned long)dot - (unsigned long)_hostname;
| ^
protocols/dns.c:775:62: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
775 | unsigned long first_element_len = (unsigned long)dot - (unsigned long)_hostname;
|
```
```
In file included from ndpi_bitmap64.c:31:
third_party/include/binaryfusefilter.h: In function ‘binary_fuse8_hash’:
third_party/include/binaryfusefilter.h:160:32: error: left shift count >= width of type [-Werror=shift-count-overflow]
160 | uint64_t hh = hash & ((1UL << 36) - 1);
```
```
In function ‘ndpi_match_custom_category’,
inlined from ‘ndpi_fill_protocol_category.part.0’ at ndpi_main.c:7056:16:
ndpi_main.c:3419:3: error: ‘strncpy’ specified bound depends on the length of the source argument [-Werror=stringop-overflow=]
3419 | strncpy(buf, name, name_len);
```
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
| |
The option NSID (RFC5001) is used by Google DNS to report the
airport code of the metro where the DNS query is handled.
This option is quite rare, but the added overhead in DNS code is pretty
much zero for "normal" DNS traffic
|
|
|
|
| |
that in DNS (instead) are not permitted
|
|
|
|
| |
Fix CI
See: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=54614
|
|
|
|
| |
about issues found on traffic.
|
|
|
|
| |
Improved DNS dissection
|
|
|
| |
Remove some dead code (found via coverage report)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The application may enable only some protocols.
Disabling a protocol means:
*) don't register/use the protocol dissector code (if any)
*) disable classification by-port for such a protocol
*) disable string matchings for domains/certificates involving this protocol
*) disable subprotocol registration (if any)
This feature can be tested with `ndpiReader -B list_of_protocols_to_disable`.
Custom protocols are always enabled.
Technically speaking, this commit doesn't introduce any API/ABI
incompatibility. However, calling `ndpi_set_protocol_detection_bitmask2()`
is now mandatory, just after having called `ndpi_init_detection_module()`.
Most of the diffs (and all the diffs in `/src/lib/protocols/`) are due to
the removing of some function parameters.
Fix the low level macro `NDPI_LOG`. This issue hasn't been detected
sooner simply because almost all the code uses only the helpers `NDPI_LOG_*`
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Load some custom configuration (like in the unit tests) and factorize some
(fuzzing) common code.
There is no way to pass file paths to the fuzzers as parameters. The safe
solution seems to be to load them from the process working dir. Anyway,
missing file is not a blocking error.
Remove some dead code (found looking at the coverage report)
|
|
|
|
| |
Found by sydr-fuzz
Close #1803
|
| |
|
|
|
|
|
|
| |
DNS flows should have `NDPI_PROTOCOL_CATEGORY_NETWORK` as category,
regardless of the subprotocol (if any).
Follow-up of 83de3e47
|
|
|
|
|
| |
DNS flows should have `NDPI_PROTOCOL_CATEGORY_NETWORK` as category,
regardless of the subprotocol (if any).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The host automa is used for two tasks:
* protocol sub-classification (obviously);
* DGA evaluation: the idea is that if a domain is present in this
automa, it can't be a DGA, regardless of its format/name.
In most dissectors both checks are executed, i.e. the code is something
like:
```
ndpi_match_host_subprotocol(..., flow->host_server_name, ...);
ndpi_check_dga_name(..., flow->host_server_name,...);
```
In that common case, we can perform only one automa lookup: if we check the
sub-classification before the DGA, we can avoid the second lookup in
the DGA function itself.
|
|
|
|
|
|
|
|
| |
See 95e16872.
After c0732eda, we can safely remove the protocol list from
`ndpi_process_extra_packet()`.
The field `flow->check_extra_packets` is redundant; remove it.
|
|
|
|
|
|
|
| |
Move the prottocol specific logic into the proper dissector code, where
it belongs.
Next step: remove that list of protocols. Long goal: remove this
function altogether...
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
* Checking for port 5353/5355 is not enough.
* Added additional multicast address and header checks.
Signed-off-by: Toni Uhlig <matzeton@googlemail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As a general rule, the higher the confidence value, the higher the
"reliability/precision" of the classification.
In other words, this new field provides an hint about "how" the flow
classification has been obtained.
For example, the application may want to ignore classification "by-port"
(they are not real DPI classifications, after all) or give a second
glance at flows classified via LRU caches (because of false positives).
Setting only one value for the confidence field is a bit tricky: more
work is probably needed in the next future to tweak/fix/improve the logic.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Improve Microsoft, GMail, Likee, Whatsapp, DisneyPlus and Tiktok
detection.
Add Vimeo, Fuze, Alibaba and Firebase Crashlytics detection.
Try to differentiate between Messenger/Signal standard flows (i.e chat)
and their VOIP (video)calls (like we already do for Whatsapp and
Snapchat).
Add a partial list of some ADS/Tracking stuff.
Fix Cassandra, Radius and GTP false positives.
Fix DNS, Syslog and SIP false negatives.
Improve GTP (sub)classification: differentiate among GTP-U, GTP_C and
GTP_PRIME.
Fix 3 LGTM warnings.
|
|
|
|
| |
ipAddress and rfc822Name were specified in certificates
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Looking at `struct ndpi_flow_struct` the two bigger fields are
`host_server_name[240]` (mainly for HTTP hostnames and DNS domains) and
`protos.tls_quic.client_requested_server_name[256]`
(for TLS/QUIC SNIs).
This commit aims to reduce `struct ndpi_flow_struct` size, according to
two simple observations:
1) maximum one of these two fields is used for each flow. So it seems safe
to merge them;
2) even if hostnames/SNIs might be very long, in practice they are rarely
longer than a fews tens of bytes. So, using a (single) large buffer is a
waste of memory for all kinds of flows. If we need to truncate the name,
we keep the *last* characters, easing domain matching.
Analyzing some real traffic, it seems safe to assume that the vast
majority of hostnames/SNIs is shorter than 80 bytes.
Hostnames/SNIs are always converted to lowercase.
Attention was given so as to be sure that unit-tests outputs are not
affected by this change.
Because of a bug, TLS/QUIC SNI were always truncated to 64 bytes (the
*first* 64 ones): as a consequence, there were some "Suspicious DGA
domain name" and "TLS Certificate Mismatch" false positives.
|
|
|
| |
Signed-off-by: Toni Uhlig <matzeton@googlemail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are no real reasons to embed `struct ndpi_packet_struct` (i.e. "packet")
in `struct ndpi_flow_struct` (i.e. "flow"). In other words, we can avoid
saving dissection information of "current packet" into the "flow" state,
i.e. in the flow management table.
The nDPI detection module processes only one packet at the time, so it is
safe to save packet dissection information in `struct ndpi_detection_module_struct`,
reusing always the same "packet" instance and saving a huge amount of memory.
Bottom line: we need only one copy of "packet" (for detection module),
not one for each "flow".
It is not clear how/why "packet" ended up in "flow" in the first place.
It has been there since the beginning of the GIT history, but in the original
OpenDPI code `struct ipoque_packet_struct` was embedded in
`struct ipoque_detection_module_struct`, i.e. there was the same exact
situation this commit wants to achieve.
Most of the changes in this PR are some boilerplate to update something
like "flow->packet" into something like "module->packet" throughout the code.
Some attention has been paid to update `ndpi_init_packet()` since we need
to reset some "packet" fields before starting to process another packet.
There has been one important change, though, in ndpi_detection_giveup().
Nothing changed for the applications/users, but this function can't access
"packet" anymore.
The reason is that this function can be called "asynchronously" with respect
to the data processing, i.e in context where there is no valid notion of
"current packet"; for example ndpiReader calls it after having processed all
the traffic, iterating the entire session table.
Mining LRU stuff seems a bit odd (even before this patch): probably we need
to rethink it, as a follow-up.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This field is an exact copy of `ndpi_flow_struct->detected_protocol_stack[2]`:
* at the very beginning of packet dissection, the value saved in
`flow->detected_protocol_stack` is copied in `packet->detected_protocol_stack`
(via `ndpi_detection_process_packet()` -> `ndpi_init_packet_header()`)
* every time we update `flow->detected_protocol_stack` we update
`packet->detected_protocol_stack` too (via `ndpi_int_change_protocol()`
-> `ndpi_int_change_packet_protocol()`)
These two fields are always in sync: keeping the same value in two
different places is useless.
|
| |
|
| |
|