path: root/src/lib/ndpi_domain_classify.c
Commit message | Author | Age
* Removed check for skipping .arpa domains | Luca Deri | 2024-01-27
* Fixed loading of non-ICANN domains that caused false positives with ndpi_load_domain_suffixes | Luca Deri | 2024-01-27
  Minor hash optimization
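The sketch below shows one way to keep private entries out of a suffix table. It is a minimal sketch, not how ndpi_load_domain_suffixes() is implemented. The Public Suffix List separates ICANN suffixes from privately registered ones (e.g. blogspot.com) using `===BEGIN ICANN DOMAINS===` / `===END ICANN DOMAINS===` comment markers. Loading only that section is one plausible reading of the fix above. The function name and the in-memory array of lines are hypothetical, and PSL wildcard (`*.`) and exception (`!`) rules are ignored.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch: copy only the ICANN section of a Public Suffix List
 * (given as an array of lines) into `out`, skipping comments, blank lines
 * and everything outside the ICANN block (e.g. PRIVATE DOMAINS).
 * Returns the number of suffixes stored (at most out_cap). */
size_t load_icann_suffixes(const char **lines, size_t n_lines,
                           const char **out, size_t out_cap) {
  int in_icann = 0;
  size_t kept = 0;

  for (size_t i = 0; i < n_lines; i++) {
    const char *line = lines[i];

    /* Section markers are themselves comment lines in the PSL file. */
    if (strstr(line, "===BEGIN ICANN DOMAINS===") != NULL) { in_icann = 1; continue; }
    if (strstr(line, "===END ICANN DOMAINS===") != NULL)   { in_icann = 0; continue; }

    /* Skip lines outside the ICANN block, plus comments and blank lines. */
    if (!in_icann || line[0] == '\0' || strncmp(line, "//", 2) == 0)
      continue;

    if (kept == out_cap)
      break;                     /* output array is full */
    out[kept++] = line;
  }
  return kept;
}
```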
* Improved ndpi_get_host_domain | Luca | 2024-01-16
* Added ndpi_get_host_domain() for returning the host domain | Luca | 2024-01-16
  as opposed to ndpi_get_host_domain_suffix(), which instead returns the host TLD
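The helper below illustrates the distinction between a host domain and its suffix once the suffix is known. `host_domain()` is hypothetical and is not an nDPI function. The host domain is the suffix plus the label immediately to its left, so www.bbc.co.uk with suffix co.uk yields bbc.co.uk.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: given a hostname and its already-known suffix
 * (e.g. "co.uk"), return a pointer into `host` at the host domain, i.e.
 * the suffix plus the label immediately to its left.
 * Returns NULL if `suffix` is not a dot-aligned suffix of `host`. */
const char *host_domain(const char *host, const char *suffix) {
  size_t hlen = strlen(host), slen = strlen(suffix);

  /* The suffix must end the hostname and be preceded by a '.'. */
  if (slen >= hlen || strcmp(host + hlen - slen, suffix) != 0
      || host[hlen - slen - 1] != '.')
    return NULL;

  /* Walk left from the dot before the suffix to the previous dot (or start). */
  const char *p = host + hlen - slen - 1;
  while (p > host && p[-1] != '.')
    p--;
  return p;
}
```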
* Removes extraneous parentheses that caused macOS to complain | Luca | 2024-01-15
* Added new API calls | Luca | 2024-01-15
  - ndpi_load_domain_suffixes()
  - ndpi_get_host_domain_suffix(), whose goal is to find the domain suffix of a hostname. Example:
      www.bbc.co.uk -> co.uk
      mail.apple.com -> com
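The following is a naive sketch of the suffix lookup, assuming the suffixes are already loaded into an array. It picks the longest list entry that matches the end of the hostname on a label boundary. `host_domain_suffix()` is a hypothetical name, and the linear scan is for clarity only; a real table would use hashing or a trie.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch: return a pointer into `host` at the longest entry of
 * `suffixes` that matches on a label boundary, or NULL if none matches. */
const char *host_domain_suffix(const char *host,
                               const char *const *suffixes, size_t n) {
  size_t hlen = strlen(host);
  const char *best = NULL;
  size_t best_len = 0;

  for (size_t i = 0; i < n; i++) {
    size_t slen = strlen(suffixes[i]);

    /* Skip empty entries, entries longer than the host, or no improvement. */
    if (slen == 0 || slen > hlen || slen <= best_len)
      continue;
    const char *tail = host + hlen - slen;
    /* Match whole labels only: the tail is the full host or follows a dot. */
    if (strcmp(tail, suffixes[i]) == 0 && (tail == host || tail[-1] == '.')) {
      best = tail;
      best_len = slen;
    }
  }
  return best;
}
```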
* Fix some warnings reported by CODESonar (#2227) | Ivan Nardi | 2024-01-12
  Remove some unreached/duplicated code.
  Add error checking for `atoi()` calls.
  About `isdigit()` and similar functions. The warning reported is:
  ```
  Negative Character Value
  isdigit() is invoked here with an argument of signed type char, but only has defined behavior for int arguments that are either representable as unsigned char or equal to the value of macro EOF(-1).
  Casting the argument to unsigned char will avoid the undefined behavior.
  In a number of libc implementations, isdigit() is implemented using lookup tables (arrays): passing in a negative value can result in a read underrun.
  ```
  Switching to our macros fixes that.
  Add a check to `check_symbols.sh` to avoid using the original functions from libc.
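The sketch below shows the two patterns described above. The first casts to `unsigned char` before calling a `<ctype.h>` function. The second replaces `atoi()` with `strtol()` so that empty, malformed or out-of-range input is reported. `SAFE_ISDIGIT`, `parse_int()` and `count_digits()` are hypothetical stand-ins and are not nDPI's own macros.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Cast to unsigned char first: passing a negative plain-char value
 * (other than EOF) to isdigit() is undefined behavior. */
#define SAFE_ISDIGIT(c) isdigit((unsigned char)(c))

/* Hypothetical atoi() replacement that reports errors.
 * Returns 0 and stores the value in *out on success, -1 otherwise. */
int parse_int(const char *s, int *out) {
  char *end;
  long v;

  errno = 0;
  v = strtol(s, &end, 10);
  if (end == s || *end != '\0')                       /* no digits or trailing junk */
    return -1;
  if (errno == ERANGE || v < INT_MIN || v > INT_MAX)  /* does not fit in an int */
    return -1;
  *out = (int)v;
  return 0;
}

/* Count ASCII digits; safe even for bytes >= 0x80 on signed-char platforms. */
size_t count_digits(const char *s) {
  size_t n = 0;
  for (; *s != '\0'; s++)
    if (SAFE_ISDIGIT(*s))
      n++;
  return n;
}
```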
* fuzz: extend fuzzing coverage | Nardi Ivan | 2023-09-16
* Add `ndpi_domain_classify_finalize()` function (#2084) | Ivan Nardi | 2023-09-12
  The "domain classify" data structure is immutable, since it uses "bitmap64".
  Allow finalizing it before starting to process packets (i.e. before calling
  `ndpi_domain_classify_contains()`) to avoid, in the data path, all the memory
  allocations due to compression.
  Calling `ndpi_domain_classify_finalize()` is optional.
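This is a minimal sketch of the build-then-finalize pattern. It assumes a plain sorted array of 64-bit values instead of nDPI's compressed bitmap64; the `u64_set` type and its functions are hypothetical. Because finalization is optional, a lookup on an unfinalized set still works, but it pays the sorting cost on the first lookup, inside the data path.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical build-then-finalize set (not nDPI's bitmap64). */
typedef struct {
  uint64_t *vals;
  size_t len, cap;
  int finalized;
} u64_set;

static int cmp_u64(const void *a, const void *b) {
  uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
  return (x > y) - (x < y);
}

/* Append while building; rejected once the set is finalized (immutable). */
int u64_set_add(u64_set *s, uint64_t v) {
  if (s->finalized) return -1;
  if (s->len == s->cap) {
    size_t ncap = s->cap ? s->cap * 2 : 16;
    uint64_t *p = realloc(s->vals, ncap * sizeof(*p));
    if (p == NULL) return -1;
    s->vals = p;
    s->cap = ncap;
  }
  s->vals[s->len++] = v;
  return 0;
}

/* Sort and deduplicate once, before packet processing starts. */
void u64_set_finalize(u64_set *s) {
  if (s->finalized) return;
  if (s->len > 0) {
    qsort(s->vals, s->len, sizeof(uint64_t), cmp_u64);
    size_t w = 1;
    for (size_t r = 1; r < s->len; r++)
      if (s->vals[r] != s->vals[w - 1])
        s->vals[w++] = s->vals[r];
    s->len = w;
  }
  s->finalized = 1;
}

/* Allocation-free binary search; finalizes lazily if the caller did not. */
int u64_set_contains(u64_set *s, uint64_t v) {
  if (!s->finalized) u64_set_finalize(s);
  size_t lo = 0, hi = s->len;
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (s->vals[mid] < v) lo = mid + 1;
    else hi = mid;
  }
  return lo < s->len && s->vals[lo] == v;
}
```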
* Fixes matches with domain name strings that start with a dot | Luca Deri | 2023-09-11
* fuzz: add fuzzers to test bitmap64 and domain_classify data structures (#2082) | Ivan Nardi | 2023-09-10
* fuzz: add fuzzers to test reader_util code (#2080) | Ivan Nardi | 2023-09-10
* Fix some errors found by fuzzers (#2078) | Ivan Nardi | 2023-09-10
  Fix compilation on Windows. "dirent.h" file has been taken from https://github.com/tronkko/dirent/
  Fix Python bindings
  Fix some warnings with x86_64-w64-mingw32-gcc:
  ```
  protocols/dns.c: In function ‘ndpi_search_dns’:
  protocols/dns.c:775:41: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
    775 |       unsigned long first_element_len = (unsigned long)dot - (unsigned long)_hostname;
        |                                         ^
  protocols/dns.c:775:62: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
    775 |       unsigned long first_element_len = (unsigned long)dot - (unsigned long)_hostname;
        |
  ```
  ```
  In file included from ndpi_bitmap64.c:31:
  third_party/include/binaryfusefilter.h: In function ‘binary_fuse8_hash’:
  third_party/include/binaryfusefilter.h:160:32: error: left shift count >= width of type [-Werror=shift-count-overflow]
    160 |     uint64_t hh = hash & ((1UL << 36) - 1);
  ```
  ```
  In function ‘ndpi_match_custom_category’,
      inlined from ‘ndpi_fill_protocol_category.part.0’ at ndpi_main.c:7056:16:
  ndpi_main.c:3419:3: error: ‘strncpy’ specified bound depends on the length of the source argument [-Werror=stringop-overflow=]
   3419 |   strncpy(buf, name, name_len);
  ```
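The sketch below gives portable versions of the three constructs flagged above; it is not the actual nDPI patch. On 64-bit Windows (LLP64), `unsigned long` is only 32 bits. As a result, casting pointers to it can truncate them, and `1UL << 36` shifts past the type's width. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 1. Subtract pointers directly (result is ptrdiff_t) instead of casting
 *    each pointer to a possibly 32-bit `unsigned long`. */
size_t first_label_len(const char *hostname) {
  const char *dot = strchr(hostname, '.');
  return dot ? (size_t)(dot - hostname) : strlen(hostname);
}

/* 2. Build 64-bit masks from a 64-bit constant, not from `1UL`. */
uint64_t low_bits_mask(unsigned bits) {
  return (bits >= 64) ? UINT64_MAX : (((uint64_t)1 << bits) - 1);
}

/* 3. Bound the copy by the destination size, not only by the source length,
 *    and always NUL-terminate. Assumes `name` has at least name_len bytes. */
void copy_name(char *dst, size_t dst_size, const char *name, size_t name_len) {
  size_t n;

  if (dst_size == 0) return;
  n = (name_len < dst_size - 1) ? name_len : dst_size - 1;
  memcpy(dst, name, n);
  dst[n] = '\0';
}
```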
* Reworked initialization | Luca Deri | 2023-09-10
* Minor warning fixes | Luca Deri | 2023-09-05
* Improved classification further reducing memory used | Luca Deri | 2023-09-05
* Added sub-domain classification fix | Luca Deri | 2023-09-05
* Classification fixes | Luca Deri | 2023-09-05
* Added ndpi_bitmap64 support | Luca Deri | 2023-09-05
* Added ndpi_murmur_hash to the nDPI API | Luca Deri | 2023-09-04
* Merged new and old version of ndpi_domain_classify.c code | Luca Deri | 2023-09-02
* Reworked domain classification based on binary filters | Luca Deri | 2023-09-02
* Improvement for reducing false positives | Luca Deri | 2023-09-01
* Added ndpi_binary_bitmap data structure | Luca Deri | 2023-08-31
  It is similar to ndpi_filter but based on binary search, with the ability to store a category per value (like ndpi_domain_classify)
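This is a minimal sketch of the idea. It assumes each value is a 64-bit hash paired with a numeric category. The pairs are sorted once and then looked up with `bsearch()`. The `entry_t` type and the function names are hypothetical and do not reflect the ndpi_binary_bitmap layout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical (value, category) pair; not nDPI's internal layout. */
typedef struct {
  uint64_t value;
  uint16_t category;
} entry_t;

static int cmp_entry(const void *a, const void *b) {
  uint64_t x = ((const entry_t *)a)->value, y = ((const entry_t *)b)->value;
  return (x > y) - (x < y);
}

/* Sort once after loading; lookups are then O(log n). */
void entries_sort(entry_t *e, size_t n) {
  if (n > 1)
    qsort(e, n, sizeof(*e), cmp_entry);
}

/* Returns 1 and stores the category if `value` is present, 0 otherwise. */
int entries_lookup(const entry_t *e, size_t n, uint64_t value, uint16_t *category) {
  entry_t key = { value, 0 };
  const entry_t *hit = bsearch(&key, e, n, sizeof(*e), cmp_entry);

  if (hit == NULL) return 0;
  *category = hit->category;
  return 1;
}
```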
* Code cleanup | Luca Deri | 2023-08-31
* Swap from Aho-Corasick to an experimental/home-grown algorithm that uses a probabilistic approach for handling Internet domain names | Luca Deri | 2023-08-29
  To switch back to Aho-Corasick, edit ndpi_typedefs.h and uncomment the line

  // #define USE_LEGACY_AHO_CORASICK

  [1] With Aho-Corasick

  $ ./example/ndpiReader -G ./lists/ -i tests/pcap/ookla.pcap | grep Memory
  nDPI Memory statistics:
      nDPI Memory (once):      37.34 KB
      Flow Memory (per flow):  960 B
      Actual Memory:           33.09 MB
      Peak Memory:             33.09 MB

  [2] With the new algorithm

  $ ./example/ndpiReader -G ./lists/ -i tests/pcap/ookla.pcap | grep Memory
  nDPI Memory statistics:
      nDPI Memory (once):      37.31 KB
      Flow Memory (per flow):  960 B
      Actual Memory:           7.42 MB
      Peak Memory:             7.42 MB

  In essence, memory drops from ~33 MB to ~7 MB.

  This new algorithm will enable larger lists to be loaded (e.g. the top 1M domains https://s3-us-west-1.amazonaws.com/umbrella-static/index.html).

  In ./lists, files are named <category>_<string>.list; with -G, ndpiReader can load all of them at startup.
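The sketch below illustrates why a hash-based list can be much smaller than an automaton. Only a 64-bit hash per listed domain is kept, and a lookup hashes the hostname and each of its parent domains. FNV-1a is a stand-in chosen for brevity, since the entry does not say which hash the new algorithm uses. `domain_listed()` is a hypothetical name. Distinct names can share a hash, so a match may be a false positive; this is what makes the approach probabilistic.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* 64-bit FNV-1a, used here only as an illustrative hash function. */
static uint64_t fnv1a64(const char *s, size_t len) {
  uint64_t h = 0xcbf29ce484222325ULL;   /* FNV offset basis */
  for (size_t i = 0; i < len; i++) {
    h ^= (unsigned char)s[i];
    h *= 0x100000001b3ULL;              /* FNV prime */
  }
  return h;
}

static int cmp_u64(const void *a, const void *b) {
  uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
  return (x > y) - (x < y);
}

/* `hashes` must be sorted. Checks the host and each parent domain
 * ("a.b.com", "b.com", "com"), so listing "b.com" also matches "a.b.com". */
int domain_listed(const uint64_t *hashes, size_t n, const char *host) {
  const char *p = host;

  while (p != NULL && *p != '\0') {
    uint64_t h = fnv1a64(p, strlen(p));
    if (bsearch(&h, hashes, n, sizeof(h), cmp_u64) != NULL)
      return 1;
    p = strchr(p, '.');
    if (p != NULL) p++;                 /* move past the dot to the parent */
  }
  return 0;
}
```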
* fix compilation and symbol check | Toni Uhlig | 2023-08-27
  Signed-off-by: Toni Uhlig <matzeton@googlemail.com>
* Search fixes | Luca Deri | 2023-08-26
* Leak fix | Luca Deri | 2023-08-26
* Added ndpi_domain_classify_XXX() API | Luca Deri | 2023-08-26