aboutsummaryrefslogtreecommitdiff
path: root/src/lib/ndpi_domain_classify.c
Commit message (Collapse)AuthorAge
* ndpi_mod is now optional (albeit better to spcify it) in ↵Luca Deri2024-05-07
| | | | ndpi_domain_classify_xxx calls
* fuzz: improvements (#2400)Ivan Nardi2024-04-20
| | | | | Create the zip file with all the traces only once. Add a new fuzzer to test "shoco" compression algorithm
* Domain Classification Improvements (#2396)Luca Deri2024-04-18
| | | | | | | | | | | | | | | | | | | * Added size_t ndpi_compress_str(const char * in, size_t len, char * out, size_t bufsize); size_t ndpi_decompress_str(const char * in, size_t len, char * out, size_t bufsize); used to compress short strings such as domain names. This code is based on https://github.com/Ed-von-Schleck/shoco * Major code rewrite for ndpi_hash and ndpi_domain_classify * Improvements to make sure custom categories are loaded and enabled * Fixed string encoding * Extended SalesForce/Cloudflare domains list
* Added support for roaring bitmap v3 (#2355)Luca Deri2024-03-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Integrated RoaringBitmap v3 * Renamed ndpi_bitmap64 ro ndpi_bitmap64_fuse * Fixes to ndpi_bitmap for new roaring library * Fixes for bitmap serialization * Fixed format * Warning fix * Conversion fix * Warning fix * Added check for roaring v3 support * Updated file name * Updated path * Uses clang-9 (instead of clang-7) for builds * Fixed fuzz_ds_bitmap64_fuse * Fixes nDPI printf handling * Disabled printf * Yet another printf fix * Cleaup * Fx for compiling on older platforms * Fixes for old compilers * Initialization changes * Added compiler check * Fixes for old compilers * Inline function is not static inline * Added missing include
* Removed check fir skipping .arpa domainsLuca Deri2024-01-27
|
* Fixed loading of non-ICANN domains that caused false positives with ↵Luca Deri2024-01-27
| | | | | | ndpi_load_domain_suffixes Minor hash optimization
* Improved ndpi_get_host_domainLuca2024-01-16
|
* Added ndpi_get_host_domain() for returning the host domainLuca2024-01-16
| | | | vs ndpi_get_host_domain_prefix() that instead returnd the host TLD
* Removes extraneous parentheses that caused macOS to complainLuca2024-01-15
|
* Added new API callsLuca2024-01-15
| | | | | | | | | | - ndpi_load_domain_suffixes() - ndpi_get_host_domain_suffix() whose goal is to find the domain name of a hostname. Example: www.bbc.co.uk -> co.uk mail.apple.com -> com
* Fix some warnings reported by CODESonar (#2227)Ivan Nardi2024-01-12
| | | | | | | | | | | | | | | | | | | Remove some unreached/duplicated code. Add error checking for `atoi()` calls. About `isdigit()` and similar functions. The warning reported is: ``` Negative Character Value help isdigit() is invoked here with an argument of signed type char, but only has defined behavior for int arguments that are either representable as unsigned char or equal to the value of macro EOF(-1). Casting the argument to unsigned char will avoid the undefined behavior. In a number of libc implementations, isdigit() is implemented using lookup tables (arrays): passing in a negative value can result in a read underrun. ``` Switching to our macros fix that. Add a check to `check_symbols.sh` to avoid using the original functions from libc.
* fuzz: extend fuzzing coverageNardi Ivan2023-09-16
|
* Add `ndpi_domain_classify_finalize()` function (#2084)Ivan Nardi2023-09-12
| | | | | | | | | The "domain classify" data structure is immutable, since it uses "bitmap64". Allow to finalize it before starting to process packets (i.e. before calling `ndpi_domain_classify_contains()`) to avoid, in the data-path, all the memory allocations due to compression. Calling `ndpi_domain_classify_finalize()` is optional.
* Fixes matches with domain name strings that start with a dotLuca Deri2023-09-11
|
* fuzz: add fuzzers to test bitmap64 and domain_classify data structures (#2082)Ivan Nardi2023-09-10
|
* fuzz: add fuzzers to test reader_util code (#2080)Ivan Nardi2023-09-10
|
* Fix some errors found by fuzzers (#2078)Ivan Nardi2023-09-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix compilation on Windows. "dirent.h" file has been taken from https://github.com/tronkko/dirent/ Fix Python bindings Fix some warnings with x86_64-w64-mingw32-gcc: ``` protocols/dns.c: In function ‘ndpi_search_dns’: protocols/dns.c:775:41: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast] 775 | unsigned long first_element_len = (unsigned long)dot - (unsigned long)_hostname; | ^ protocols/dns.c:775:62: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast] 775 | unsigned long first_element_len = (unsigned long)dot - (unsigned long)_hostname; | ``` ``` In file included from ndpi_bitmap64.c:31: third_party/include/binaryfusefilter.h: In function ‘binary_fuse8_hash’: third_party/include/binaryfusefilter.h:160:32: error: left shift count >= width of type [-Werror=shift-count-overflow] 160 | uint64_t hh = hash & ((1UL << 36) - 1); ``` ``` In function ‘ndpi_match_custom_category’, inlined from ‘ndpi_fill_protocol_category.part.0’ at ndpi_main.c:7056:16: ndpi_main.c:3419:3: error: ‘strncpy’ specified bound depends on the length of the source argument [-Werror=stringop-overflow=] 3419 | strncpy(buf, name, name_len); ```
* Reworked initializationLuca Deri2023-09-10
|
* Minor warning fixesLuca Deri2023-09-05
|
* Improved classification further reducing memory usedLuca Deri2023-09-05
|
* Added sub-domain classification fixLuca Deri2023-09-05
|
* Classification fixesLuca Deri2023-09-05
|
* Added ndpi_bitmap64 supportLuca Deri2023-09-05
|
* Added ndpi_murmur_hash to the nDPI APILuca Deri2023-09-04
|
* Merged new and old version of ndpi_domain_classify.c codeLuca Deri2023-09-02
|
* Reworked domain classification based on binary filtersLuca Deri2023-09-02
|
* Improvement for reducing false positivesLuca Deri2023-09-01
|
* Added ndpi_binary_bitmap datastrutureLuca Deri2023-08-31
| | | | | It is similar to ndpi_filter but based on binary search and with the ability to store a category per value (as ndpi_domain_classify)
* Code cleanupLuca Deri2023-08-31
|
* Swap from Aho-Corasick to an experimental/home-grown algorithm that uses a ↵Luca Deri2023-08-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | probabilistic approach for handling Internet domain names. For switching back to Aho-Corasick it is necessary to edit ndpi-typedefs.h and uncomment the line // #define USE_LEGACY_AHO_CORASICK [1] With Aho-Corasick $ ./example/ndpiReader -G ./lists/ -i tests/pcap/ookla.pcap | grep Memory nDPI Memory statistics: nDPI Memory (once): 37.34 KB Flow Memory (per flow): 960 B Actual Memory: 33.09 MB Peak Memory: 33.09 MB [2] With the new algorithm $ ./example/ndpiReader -G ./lists/ -i tests/pcap/ookla.pcap | grep Memory nDPI Memory statistics: nDPI Memory (once): 37.31 KB Flow Memory (per flow): 960 B Actual Memory: 7.42 MB Peak Memory: 7.42 MB In essence from ~33 MB to ~7 MB This new algorithm will enable larger lists to be loaded (e.g. top 1M domans https://s3-us-west-1.amazonaws.com/umbrella-static/index.html) In ./lists there are file names that are named as <category>_<string>.list With -G ndpiReader can load all of them at startup
* fix compilation and symbol checkToni Uhlig2023-08-27
| | | | Signed-off-by: Toni Uhlig <matzeton@googlemail.com>
* Search fixesLuca Deri2023-08-26
|
* LEak fixLuca Deri2023-08-26
|
* Added ndpi_domain_classify_XXX(0 APILuca Deri2023-08-26