aboutsummaryrefslogtreecommitdiff
path: root/utils/gambling_sites_download.sh
Commit message (Collapse)AuthorAge
* shell: reformatted, fixed inspections, typos (#2506)Petr2024-07-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reformatted shell scripts according to [ShellCheck](https://github.com/koalaman/shellcheck/). I. Most common changes: 1. https://github.com/koalaman/shellcheck/wiki/SC2086 `$var` → `"$var"` Note: this isn't always necessary and I've been careful not to substitute where it wasn't necessary in meaning. 2. https://github.com/koalaman/shellcheck/wiki/SC2006 `` `command` `` → `$(command)` 3. https://github.com/koalaman/shellcheck/wiki/SC2004 `$(( $a + $b ))` → `$(( a + b ))` 4. https://github.com/koalaman/shellcheck/wiki/SC2164 `cd "$dir"` → `cd "$dir" || exit 1` 5. https://github.com/koalaman/shellcheck/wiki/SC2166 `[ check1 -o check2 ]` → `[ check1 ] || [ check2 ]` 6. https://github.com/koalaman/shellcheck/wiki/SC2002 `cat "${file}" | wc -c` → `< "${file}" wc -c` Note: this looks a bit uglier but works faster. II. Some special changes: 1. In file `utils/common.sh`: https://github.com/koalaman/shellcheck/wiki/SC2112 This script is interpreted by `sh`, not by `bash`, but uses the keyword `function`. So I replaced `#!/usr/bin/env sh` to `#!/usr/bin/env bash`. 2. After that I thought of replacing all shebangs to `#!/usr/bin/env bash` for consistency and cross-platform compatibility, especially since most of the files already use bash. 3. But in cases when it was `#!/bin/sh -e` or `#!/bin/bash -eu` another problem appears: https://github.com/koalaman/shellcheck/wiki/SC2096 So I decided to make all shebangs look uniform: ``` #!/usr/bin/env bash set -e (or set -eu) (if needed) ``` 4. In file `tests/ossfuzz.sh`: https://github.com/koalaman/shellcheck/wiki/SC2162 `read i` → `read -r i` Note: I think that there is no need in special treatment for backslashes, but I could be wrong. 5. In file `tests/do.sh.in`: https://github.com/koalaman/shellcheck/wiki/SC2035 `ls *.*cap*` → `ls -- *.*cap*` 6. In file `utils/verify_dist_tarball.sh`: https://github.com/koalaman/shellcheck/wiki/SC2268 `[ "x${TARBALL}" = x ]` → `[ -z "${TARBALL}" ]` 7. In file `utils/check_symbols.sh`: https://github.com/koalaman/shellcheck/wiki/SC2221 `'[ndpi_utils.o]'|'[ndpi_memory.o]'|'[roaring.o]')` → `'[ndpi_utils.o]'|'[ndpi_memory.o]')` 8. In file `autogen.sh`: https://github.com/koalaman/shellcheck/wiki/SC2145 `echo "./configure $@"` → `echo "./configure $*"` https://github.com/koalaman/shellcheck/wiki/SC2068 `./configure $@` → `./configure "$@"` III. `LIST6_MERGED` and `LIST_MERGED6` There were typos with this variables in files `utils/aws_ip_addresses_download.sh`, `utils/aws_ip_addresses_download.sh` and `utils/microsoft_ip_addresses_download.sh` where variable `LIST6_MERGED` was defined, but `LIST_MERGED6` was removed by `rm`. I changed all `LIST_MERGED6` to `LIST6_MERGED`. Not all changes are absolutely necessary, but some may save you from future bugs.
* Improved Polish gambling sites fetch script. (#2315)Toni2024-02-10
| | | | | * fails quite often in the CI, so ignore potential xmllint error Signed-off-by: Toni Uhlig <matzeton@googlemail.com>
* Improved belgium gambling sites regex. (#2184)Toni2023-11-29
| | | Signed-off-by: Toni Uhlig <matzeton@googlemail.com>
* Swap from Aho-Corasick to an experimental/home-grown algorithm that uses a ↵Luca Deri2023-08-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | probabilistic approach for handling Internet domain names. For switching back to Aho-Corasick it is necessary to edit ndpi-typedefs.h and uncomment the line // #define USE_LEGACY_AHO_CORASICK [1] With Aho-Corasick $ ./example/ndpiReader -G ./lists/ -i tests/pcap/ookla.pcap | grep Memory nDPI Memory statistics: nDPI Memory (once): 37.34 KB Flow Memory (per flow): 960 B Actual Memory: 33.09 MB Peak Memory: 33.09 MB [2] With the new algorithm $ ./example/ndpiReader -G ./lists/ -i tests/pcap/ookla.pcap | grep Memory nDPI Memory statistics: nDPI Memory (once): 37.31 KB Flow Memory (per flow): 960 B Actual Memory: 7.42 MB Peak Memory: 7.42 MB In essence from ~33 MB to ~7 MB This new algorithm will enable larger lists to be loaded (e.g. top 1M domans https://s3-us-west-1.amazonaws.com/umbrella-static/index.html) In ./lists there are file names that are named as <category>_<string>.list With -G ndpiReader can load all of them at startup
* Changes for supporinng more efficient sub-string matchingLuca Deri2023-08-26
|
* Included Gambling website data from the Polish `hazard.mf.gov.pl` list (#2041)snicket21002023-07-14
| | | | | | | | | | | | | * Refreshed the Belgium Gambling Site list data Unfortunately some hostnames have been removed from that list, which means they are disappearing from the `ndpi_gambling_match.c.inc` file as well. * build: added `libxml2-utils` (for `xmllint`) * Included Gambling website data from the Polish `hazard.mf.gov.pl` list The list contains over 30k gambling website hostnames as of today.
* Improved helper scripts. (#1986)Toni2023-05-28
| | | | | * added additional (more restrictive) checks Signed-off-by: lns <matzeton@googlemail.com>
* Added scripts to auto generate hostname/SNI *.inc files. (#1984)Toni2023-05-20
* add illegal gambling sites (Belgium) Signed-off-by: lns <matzeton@googlemail.com>