mirror of
https://github.com/soxoj/maigret.git
synced 2026-05-17 11:55:36 +00:00
Compare commits
5 Commits
| SHA1 |
|---|
| ff00a51840 |
| 073c20338b |
| d1ff1d0e66 |
| 3e77c13743 |
| c5885331d6 |
+90
-1
@@ -1,5 +1,94 @@
|
|||||||
# Changelog
|
# Changelog
|
||||||
|
|
||||||
|
## [0.6.1] - 2026-05-15
|
||||||
|
|
||||||
|
## What's Changed
|
||||||
|
* build(deps): bump pypdf from 6.9.2 to 6.10.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2512
|
||||||
|
* Fix duplicate attribute initialization in SimpleAiohttpChecker.__init__ by @MichaelMVS in https://github.com/soxoj/maigret/pull/2513
|
||||||
|
* Support Python 3.14 in tests by @soxoj in https://github.com/soxoj/maigret/pull/2515
|
||||||
|
* build(deps-dev): bump tuna from 0.5.11 to 0.5.13 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2516
|
||||||
|
* build(deps): bump lxml from 6.0.3 to 6.0.4 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2519
|
||||||
|
* build(deps): bump chardet from 7.4.1 to 7.4.2 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2517
|
||||||
|
* build(deps-dev): bump mypy from 1.20.0 to 1.20.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2518
|
||||||
|
* build(deps): bump pillow from 12.1.1 to 12.2.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2520
|
||||||
|
* build(deps): bump chardet from 7.4.2 to 7.4.3 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2521
|
||||||
|
* build(deps): bump pypdf from 6.10.0 to 6.10.2 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2527
|
||||||
|
* Checks fixes by @soxoj in https://github.com/soxoj/maigret/pull/2528
|
||||||
|
* Update of Readme and documentation by @soxoj in https://github.com/soxoj/maigret/pull/2514
|
||||||
|
* build(deps): bump lxml from 6.0.4 to 6.1.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2533
|
||||||
|
* Fix site checks: recover 6 CF sites via tls_fingerprint, 500px GraphQ… by @soxoj in https://github.com/soxoj/maigret/pull/2535
|
||||||
|
* fix site checks: 14 sites → ip_reputation, 7 disabled, 5 dead deleted by @soxoj in https://github.com/soxoj/maigret/pull/2536
|
||||||
|
* Fix site checks: 4 fixed, 14 → ip_reputation, 8 disabled, 5 dead deleted by @soxoj in https://github.com/soxoj/maigret/pull/2537
|
||||||
|
* Fix site checks: 3 fixed, 2 → ip_reputation, 7 disabled, 1 dead deleted by @soxoj in https://github.com/soxoj/maigret/pull/2539
|
||||||
|
* Add 3 crypto sites (Polymarket, Zora, Revolut.me), added crypto inves… by @soxoj in https://github.com/soxoj/maigret/pull/2538
|
||||||
|
* Automated Sites List Update by @github-actions[bot] in https://github.com/soxoj/maigret/pull/2541
|
||||||
|
* Fix site checks: 3 fixed, 2 → ip_reputation, 7 disabled, 1 dead deleted by @soxoj in https://github.com/soxoj/maigret/pull/2543
|
||||||
|
* Automated Sites List Update by @github-actions[bot] in https://github.com/soxoj/maigret/pull/2545
|
||||||
|
* Add OnlyFans with activation mechanism; updated site ranks by @soxoj in https://github.com/soxoj/maigret/pull/2546
|
||||||
|
* build(deps-dev): bump mypy from 1.20.1 to 1.20.2 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2547
|
||||||
|
* build(deps): bump idna from 3.11 to 3.12 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2548
|
||||||
|
* Fix site checks: 3 → ip_reputation, 10 fixed, 6 disabled, 2 dead deleted by @soxoj in https://github.com/soxoj/maigret/pull/2549
|
||||||
|
* Fix site checks: 12 fixed, 19 disabled; add new protection tags by @soxoj in https://github.com/soxoj/maigret/pull/2550
|
||||||
|
* build(deps): bump certifi from 2026.2.25 to 2026.4.22 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2552
|
||||||
|
* AI mode by @soxoj in https://github.com/soxoj/maigret/pull/2529
|
||||||
|
* Fix site checks: 4 → ip_reputation, 9 fixed, 16 disabled, 3 dead dele… by @soxoj in https://github.com/soxoj/maigret/pull/2555
|
||||||
|
* Fix Google Cloud Shell launch by @soxoj in https://github.com/soxoj/maigret/pull/2557
|
||||||
|
* build(deps): bump pyinstaller from 6.19.0 to 6.20.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2554
|
||||||
|
* build(deps): bump idna from 3.12 to 3.13 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2553
|
||||||
|
* test: loosen executor timing upper bounds for slower CI by @juliosuas in https://github.com/soxoj/maigret/pull/2558
|
||||||
|
* Fix site checks: 5 fixed; readme fix by @soxoj in https://github.com/soxoj/maigret/pull/2562
|
||||||
|
* Add Docker web image with multi-stage building by @soxoj in https://github.com/soxoj/maigret/pull/2564
|
||||||
|
* Fix site checks: 7 fixed, 1 disabled by @soxoj in https://github.com/soxoj/maigret/pull/2565
|
||||||
|
* Fix site checks: 5 fixed, 4 disabled; fix UA leak bug by @soxoj in https://github.com/soxoj/maigret/pull/2569
|
||||||
|
* build(deps): bump arabic-reshaper from 3.0.0 to 3.0.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2573
|
||||||
|
* Add site checks: 18 new sites by @soxoj in https://github.com/soxoj/maigret/pull/2575
|
||||||
|
* Automated Sites List Update by @github-actions[bot] in https://github.com/soxoj/maigret/pull/2576
|
||||||
|
* build(deps): bump reportlab from 4.4.10 to 4.5.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2578
|
||||||
|
* Fix ID extraction crash when regex groups are optional by @egrezeli in https://github.com/soxoj/maigret/pull/2572
|
||||||
|
* Update CONTRIBUTING.md with instructions for developers by @soxoj in https://github.com/soxoj/maigret/pull/2589
|
||||||
|
* Fix outdated Google Colab setup and dependency installation by @SayanDey322 in https://github.com/soxoj/maigret/pull/2591
|
||||||
|
* fix: disable RomanticCollection check by @juliosuas in https://github.com/soxoj/maigret/pull/2588
|
||||||
|
* docs: add Simplified Chinese (zh-CN) README translation by @whtis in https://github.com/soxoj/maigret/pull/2606
|
||||||
|
* Automated Sites List Update by @github-actions[bot] in https://github.com/soxoj/maigret/pull/2607
|
||||||
|
* Improve startup error message for missing dependencies by @SayanDey322 in https://github.com/soxoj/maigret/pull/2593
|
||||||
|
* Modernize python package workflow by @SayanDey322 in https://github.com/soxoj/maigret/pull/2594
|
||||||
|
* Fix site checks: 8 → ip_reputation, 6 fixed, 9 disabled, 1 dead deleted by @soxoj in https://github.com/soxoj/maigret/pull/2611
|
||||||
|
* Reddit fix by @soxoj in https://github.com/soxoj/maigret/pull/2614
|
||||||
|
* Automated Sites List Update by @github-actions[bot] in https://github.com/soxoj/maigret/pull/2615
|
||||||
|
* Fix site checks: 7 fixed, 1 disabled, 1 dead deleted by @soxoj in https://github.com/soxoj/maigret/pull/2616
|
||||||
|
* Fixed duplicates of YouTube and Periscope by @soxoj in https://github.com/soxoj/maigret/pull/2618
|
||||||
|
* Fix network graph height to be viewport-responsive instead of fixed 750px by @SayanDey322 in https://github.com/soxoj/maigret/pull/2590
|
||||||
|
* Add web interface tests by @soxoj in https://github.com/soxoj/maigret/pull/2619
|
||||||
|
* refactor:reduces the cognitive complexity of get_ai_analysis by @odanilosalve in https://github.com/soxoj/maigret/pull/2581
|
||||||
|
* AI mode documentation by @soxoj in https://github.com/soxoj/maigret/pull/2620
|
||||||
|
* build(deps): bump python-bidi from 0.6.7 to 0.6.9 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2622
|
||||||
|
* build(deps-dev): bump mypy from 1.20.2 to 2.0.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2625
|
||||||
|
* Cloudflare bypass webgate by @soxoj in https://github.com/soxoj/maigret/pull/2628
|
||||||
|
* Fix context field using class instead of instance in error handling by @disappear00 in https://github.com/soxoj/maigret/pull/2627
|
||||||
|
* Add test for CheckError bug by @soxoj in https://github.com/soxoj/maigret/pull/2631
|
||||||
|
* Update download badge links in README.md by @soxoj in https://github.com/soxoj/maigret/pull/2636
|
||||||
|
* fix(security): harden /reports path containment via send_from_directory by @aaronjmars in https://github.com/soxoj/maigret/pull/2635
|
||||||
|
* build(deps-dev): bump coverage from 7.13.5 to 7.14.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2638
|
||||||
|
* build(deps): bump idna from 3.13 to 3.14 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2639
|
||||||
|
* Update links to the community Telegram bot by @soxoj in https://github.com/soxoj/maigret/pull/2641
|
||||||
|
* build(deps): bump urllib3 from 2.6.3 to 2.7.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2642
|
||||||
|
* build(deps-dev): bump mypy from 2.0.0 to 2.1.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2644
|
||||||
|
* Refresh stale Duolingo usernameClaimed sample (blue → duolingo) by @razbenya in https://github.com/soxoj/maigret/pull/2650
|
||||||
|
* Fix linktr.ee detector (status_code, not stale message check) by @razbenya in https://github.com/soxoj/maigret/pull/2649
|
||||||
|
* Apply --proxy to CurlCffiChecker (tls_fingerprint sites) by @razbenya in https://github.com/soxoj/maigret/pull/2648
|
||||||
|
* Refresh stale Gravatar usernameClaimed sample (blue → automattic) by @razbenya in https://github.com/soxoj/maigret/pull/2651
|
||||||
|
* Add regression tests for CurlCffiChecker proxy forwarding (#2648 follow-up) by @razbenya in https://github.com/soxoj/maigret/pull/2652
|
||||||
|
* build(deps): bump idna from 3.14 to 3.15 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2647
|
||||||
|
* build(deps): bump reportlab from 4.5.0 to 4.5.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2645
|
||||||
|
* build(deps): bump requests from 2.33.1 to 2.34.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2656
|
||||||
|
* build(deps-dev): bump pytest-rerunfailures from 16.1 to 16.2 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2654
|
||||||
|
* build(deps): bump python-bidi from 0.6.9 to 0.6.10 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2655
|
||||||
|
* Make xhtml2pdf optional, fix install on Linux without libcairo by @soxoj in https://github.com/soxoj/maigret/pull/2659
|
||||||
|
* build(deps): bump requests from 2.34.1 to 2.34.2 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2658
|
||||||
|
* Fix site checks: 2 fixed, 3 disabled; add Faceit; fix utils import by @soxoj in https://github.com/soxoj/maigret/pull/2660
|
||||||
|
|
||||||
|
**Full Changelog**: https://github.com/soxoj/maigret/compare/v0.6.0...v0.6.1
|
||||||
|
|
||||||
## [0.6.0] - 2025-04-10
|
## [0.6.0] - 2025-04-10
|
||||||
|
|
||||||
## What's Changed
|
## What's Changed
|
||||||
@@ -778,4 +867,4 @@
|
|||||||
## [0.1.1] - 2020-12-05 [YANKED]
|
## [0.1.1] - 2020-12-05 [YANKED]
|
||||||
|
|
||||||
## [0.1.0] - 2020-12-05
|
## [0.1.0] - 2020-12-05
|
||||||
* initial release
|
* initial release
|
||||||
|
|||||||
@@ -95,6 +95,13 @@ Each site entry uses one of three `checkType` modes to decide whether a profile
|
|||||||
|
|
||||||
**Errors vs absence.** Anything that means "the server can't answer right now" — rate limits, captchas, "Checking your browser", "unusual traffic", maintenance pages — belongs in `errors` (mapping the substring to a human-readable error string), not in `absenceStrs`. The `errors` mechanism produces an UNKNOWN result instead of a false CLAIMED or false AVAILABLE.
|
**Errors vs absence.** Anything that means "the server can't answer right now" — rate limits, captchas, "Checking your browser", "unusual traffic", maintenance pages — belongs in `errors` (mapping the substring to a human-readable error string), not in `absenceStrs`. The `errors` mechanism produces an UNKNOWN result instead of a false CLAIMED or false AVAILABLE.
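
The split between `errors` and `absenceStrs` can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Maigret's actual checking code: the helper name `classify_response`, the order of the checks, and the sample strings are all hypothetical and only illustrate why a "can't answer right now" page should yield UNKNOWN.

```python
def classify_response(body, errors, absence_strs):
    """Toy model of a message-based check (hypothetical helper, not Maigret's API)."""
    # Assumption: error markers are checked first, so a captcha or rate-limit
    # page is never mistaken for "profile exists" or "profile missing".
    for marker, message in errors.items():
        if marker in body:
            return "UNKNOWN", message
    # Only a genuine "no such user" page counts as absence.
    for marker in absence_strs:
        if marker in body:
            return "AVAILABLE", None
    return "CLAIMED", None


# Hypothetical site entry fragment using the two fields discussed above.
site = {
    "errors": {
        "Checking your browser": "Bot protection page",
        "unusual traffic": "Rate limit",
    },
    "absenceStrs": ["User not found"],
}

if __name__ == "__main__":
    page = "<html>Checking your browser before accessing the site</html>"
    print(classify_response(page, site["errors"], site["absenceStrs"]))
```

If the captcha text were listed in `absenceStrs` instead, the same page would be reported as AVAILABLE, which is exactly the false result the paragraph warns about.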
|
||||||
|
|
||||||
|
**`regexCheck` and non-ASCII usernames.** When `{username}` is interpolated into a URL **path segment** and the username contains characters that need percent-encoding (Cyrillic, Chinese, Korean, spaces, etc.), Maigret skips the site with a `URL-incompatible username` error rather than sending a request that would land on a generic listing/homepage and trip overly broad `presenseStrs`. This default avoids the cascade of false positives observed in [#459](https://github.com/soxoj/maigret/issues/459) and [#2633](https://github.com/soxoj/maigret/issues/2633). Two corollaries for site entries:
|
||||||
|
|
||||||
|
- If your site legitimately accepts non-ASCII characters in the URL path (a wiki that mounts Unicode usernames, a Russian forum that serves Cyrillic slugs, etc.), declare the actual format with an explicit `regexCheck`. For example, a MediaWiki-style wiki could use `"regexCheck": "^[^\\/\\\\#<>\\[\\]\\|{}]+$"`; a Japanese blog platform might use `"regexCheck": "^[\\w\\-_\\.]+$"` (Python's `\w` matches Unicode letters). Don't paper this over with `regexCheck: "."` — pick a regex that reflects what the site actually accepts.
|
||||||
|
- If `{username}` is in a query string (`?name={username}`) or only in `requestPayload`, the default has no effect — query/body values are URL-encoded as parameters and most APIs handle that fine.
|
||||||
|
|
||||||
|
The default kicks in *only* when no per-site `regexCheck` is set. Existing per-site regexes always win.
|
||||||
|
|
||||||
Full reference for `checkType`, `urlProbe`, `engine`, and the rest of the `data.json` schema is in the [development guide](docs/source/development.rst), section *How to fix false-positives*.
|
Full reference for `checkType`, `urlProbe`, `engine`, and the rest of the `data.json` schema is in the [development guide](docs/source/development.rst), section *How to fix false-positives*.
|
||||||
|
|
||||||
### Editing `data.json` safely
|
### Editing `data.json` safely
|
||||||
|
|||||||
+22
-9
@@ -51,19 +51,32 @@ pip install --upgrade certifi
|
|||||||
|
|
||||||
If you are behind a corporate proxy, set `HTTPS_PROXY` / `HTTP_PROXY` environment variables and pass `--proxy "$HTTPS_PROXY"` so Maigret uses the same route.
|
If you are behind a corporate proxy, set `HTTPS_PROXY` / `HTTP_PROXY` environment variables and pass `--proxy "$HTTPS_PROXY"` so Maigret uses the same route.
|
||||||
|
|
||||||
-## ".onion / .i2p sites are skipped"
+## Running over Tor, I2P, or Tails OS
 
-These sites only load through the matching gateway. Start your Tor or I2P daemon first, then:
+Two different goals, two different flags:
 
-```bash
-# Tor
-maigret user --tor-proxy socks5://127.0.0.1:9050
-
-# I2P
-maigret user --i2p-proxy http://127.0.0.1:4444
-```
-
-Maigret does not launch or manage these daemons; they must already be running.
+- **Route only `.onion` / `.i2p` sites through their gateway** (clearweb checks still use your direct connection). Use `--tor-proxy` / `--i2p-proxy`:
+  ```bash
+  maigret user --tor-proxy socks5://127.0.0.1:9050   # only .onion goes via Tor
+  maigret user --i2p-proxy http://127.0.0.1:4444     # only .i2p goes via I2P
+  ```
+  Without these flags, `.onion` / `.i2p` sites are silently skipped.
+
+- **Route the whole run through Tor / a proxy** (e.g. on Tails OS, or to anonymise the scan). Use `--proxy`:
+  ```bash
+  # system tor daemon (apt install tor, Tails)
+  maigret user --proxy socks5://127.0.0.1:9050 --timeout 60 --retries 2
+
+  # Tor Browser bundle (different SOCKS port!)
+  maigret user --proxy socks5://127.0.0.1:9150 --timeout 60 --retries 2
+  ```
+  Most public WAFs block Tor exits, so expect more UNKNOWNs over Tor than on a residential line; this is the cost of anonymity, not a bug. Raising `--timeout` to 60 and adding `--retries 2` materially reduces noise.
+
+On Tails, `torsocks maigret …` / `torify maigret …` do **not** work: Maigret's HTTP client bypasses libc, so the wrapper has no effect. Use `--proxy` instead. To install Maigret over Tor: `torsocks pip install --user maigret`.
+
+Maigret does not launch or manage Tor / I2P daemons; they must already be running.
+
+For the full walkthrough (Tor Browser vs system `tor` ports, Tails persistence, reports paths), see the [Tor, I2P, and proxies](https://maigret.readthedocs.io/en/latest/tor-and-proxies.html) page on readthedocs.
|
|
||||||
## "The PDF / XMind / HTML report looks wrong"
|
## "The PDF / XMind / HTML report looks wrong"
|
||||||
|
|
||||||
|
|||||||
@@ -63,6 +63,29 @@ from slow sites. On the other hand, this may cause a long delay to
|
|||||||
gather all results. The choice of the right timeout should be carried
|
gather all results. The choice of the right timeout should be carried
|
||||||
out taking into account the bandwidth of the Internet connection.
|
out taking into account the bandwidth of the Internet connection.
|
||||||
|
|
||||||
|
Network and proxy options
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
``--proxy PROXY_URL`` / ``-p PROXY_URL`` - Route **every** check through
|
||||||
|
the given HTTP or SOCKS proxy. Example: ``socks5://127.0.0.1:1080``,
|
||||||
|
``http://user:pass@proxy.example:3128``. This is the flag to use for
|
||||||
|
routing the whole run through Tor (``--proxy socks5://127.0.0.1:9050``),
|
||||||
|
a residential proxy, or any corporate gateway. No default.
|
||||||
|
|
||||||
|
``--tor-proxy TOR_PROXY_URL`` - Gateway used **only** for ``.onion``
|
||||||
|
sites in the database **(default: socks5://127.0.0.1:9050)**. Clearweb
|
||||||
|
sites are unaffected — for them Maigret uses your direct connection or
|
||||||
|
``--proxy`` if you set one. Without this flag, ``.onion`` sites are
|
||||||
|
silently skipped.
|
||||||
|
|
||||||
|
``--i2p-proxy I2P_PROXY_URL`` - Gateway used **only** for ``.i2p``
|
||||||
|
sites in the database **(default: http://127.0.0.1:4444)**. Same
|
||||||
|
"only matching protocol" rule as ``--tor-proxy``.
|
||||||
|
|
||||||
|
Maigret does not start the Tor or I2P daemon for you — launch it first.
|
||||||
|
For a full walkthrough (Tor Browser vs system ``tor`` port numbers,
|
||||||
|
Tails OS recipe, timeout/retry tuning), see :doc:`tor-and-proxies`.
|
||||||
|
|
||||||
``--cookies-jar-file`` - File with custom cookies in Netscape format
|
``--cookies-jar-file`` - File with custom cookies in Netscape format
|
||||||
(aka cookies.txt). You can install an extension to your browser to
|
(aka cookies.txt). You can install an extension to your browser to
|
||||||
download your own cookies (`Chrome <https://chrome.google.com/webstore/detail/get-cookiestxt/bgaddhkoddajcdgocldbbfleckgcbcid>`_, `Firefox <https://addons.mozilla.org/en-US/firefox/addon/cookies-txt/>`_).
|
download your own cookies (`Chrome <https://chrome.google.com/webstore/detail/get-cookiestxt/bgaddhkoddajcdgocldbbfleckgcbcid>`_, `Firefox <https://addons.mozilla.org/en-US/firefox/addon/cookies-txt/>`_).
|
||||||
|
|||||||
+2
-2
@@ -6,8 +6,8 @@ project = 'Maigret'
|
|||||||
copyright = '2025, soxoj'
|
copyright = '2025, soxoj'
|
||||||
author = 'soxoj'
|
author = 'soxoj'
|
||||||
|
|
||||||
-release = '0.5.0'
-version = '0.5'
+release = '0.6.1'
+version = '0.6'
|
|
||||||
# -- General configuration
|
# -- General configuration
|
||||||
|
|
||||||
|
|||||||
@@ -134,11 +134,50 @@ There are few options for sites data.json helpful in various cases:
|
|||||||
- ``engine`` - a predefined check for the sites of certain type (e.g. forums), see the ``engines`` section in the JSON file
|
- ``engine`` - a predefined check for the sites of certain type (e.g. forums), see the ``engines`` section in the JSON file
|
||||||
- ``headers`` - a dictionary of additional headers to be sent to the site
|
- ``headers`` - a dictionary of additional headers to be sent to the site
|
||||||
- ``requestHeadOnly`` - set to ``true`` if it's enough to make a HEAD request to the site
|
- ``requestHeadOnly`` - set to ``true`` if it's enough to make a HEAD request to the site
|
||||||
- ``regexCheck`` - a regex to check if the username is valid, in case of frequent false-positives
|
- ``regexCheck`` - a regex to check if the username is valid, in case of frequent false-positives (see ``regexCheck`` and the non-ASCII default below)
|
||||||
- ``requestMethod`` - set the HTTP method to use (e.g., ``POST``). By default, Maigret uses GET or HEAD.
|
- ``requestMethod`` - set the HTTP method to use (e.g., ``POST``). By default, Maigret uses GET or HEAD.
|
||||||
- ``requestPayload`` - a dictionary with the JSON payload to send for POST requests (e.g., ``{"username": "{username}"}``), extremely useful for parsing GraphQL or modern JSON APIs.
|
- ``requestPayload`` - a dictionary with the JSON payload to send for POST requests (e.g., ``{"username": "{username}"}``), extremely useful for parsing GraphQL or modern JSON APIs.
|
||||||
- ``protection`` - a list of protection types detected on the site (see below).
|
- ``protection`` - a list of protection types detected on the site (see below).
|
||||||
|
|
||||||
|
``regexCheck`` and non-ASCII usernames
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
When ``{username}`` is interpolated into a URL **path segment** and the user-supplied username contains characters that would be percent-encoded by :py:func:`urllib.parse.quote` (Cyrillic, Chinese, Korean, Arabic, spaces, etc.), Maigret skips the site with a ``URL-incompatible username`` error rather than sending a request that would land on a generic listing/homepage and trip overly broad ``presenseStrs``. This default stops the cascade of false positives observed in `issue #459 <https://github.com/soxoj/maigret/issues/459>`_ and `issue #2633 <https://github.com/soxoj/maigret/issues/2633>`_.
|
||||||
|
|
||||||
|
Scope of the default:
|
||||||
|
|
||||||
|
- Active **only** when ``{username}`` is in the URL path of ``url`` (or ``urlProbe`` if set), e.g. ``https://example.com/u/{username}``.
|
||||||
|
- **Not** active when ``{username}`` is in the query string (``?name={username}``) or only in ``requestPayload`` — those values are URL-encoded as parameters and most APIs handle them fine.
|
||||||
|
- **Always** deferred when the site has its own ``regexCheck`` — an explicit per-site rule wins.
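
The three scope rules above can be approximated with the standard library. This is a minimal sketch, not Maigret's implementation: the function name ``skipped_by_default`` and the choice of ``safe=""`` for ``quote`` are assumptions made for illustration.

```python
from urllib.parse import quote, urlsplit


def skipped_by_default(url_template, username, regex_check=None):
    """Return True if the non-ASCII default would skip this site (illustrative only)."""
    # Rule 3: an explicit per-site regexCheck always wins, so the default steps aside.
    if regex_check is not None:
        return False
    # Rules 1 and 2: the default only applies when {username} sits in the URL path,
    # not in the query string or the request payload.
    if "{username}" not in urlsplit(url_template).path:
        return False
    # The username is "URL-incompatible" if percent-encoding would change it.
    # Assumption: no extra safe characters are allowed.
    return quote(username, safe="") != username


if __name__ == "__main__":
    print(skipped_by_default("https://example.com/u/{username}", "Иван"))
    print(skipped_by_default("https://example.com/search?name={username}", "Иван"))
```

A plain ASCII slug such as ``john_doe`` passes through ``quote`` unchanged, so such usernames are never affected by the default.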
|
||||||
|
|
||||||
|
Opting a site into broader matching:
|
||||||
|
|
||||||
|
If a site genuinely accepts non-ASCII characters in the URL path (a wiki that mounts Unicode usernames, a Russian forum that serves Cyrillic slugs, etc.), declare the actual accepted format with an explicit ``regexCheck`` that matches your reality. A few worked examples:
|
||||||
|
|
||||||
|
- A MediaWiki-style wiki that allows any character except the MediaWiki-forbidden punctuation:
|
||||||
|
|
||||||
|
.. code-block:: json
|
||||||
|
|
||||||
|
{
|
||||||
|
"url": "https://wiki.example/wiki/User:{username}",
|
||||||
|
"regexCheck": "^[^\\/\\\\#<>\\[\\]\\|{}]+$"
|
||||||
|
}
|
||||||
|
|
||||||
|
- A Japanese blog platform that allows Unicode word characters + dash + dot:
|
||||||
|
|
||||||
|
.. code-block:: json
|
||||||
|
|
||||||
|
{
|
||||||
|
"url": "https://blog.example/{username}",
|
||||||
|
"regexCheck": "^[\\w\\-_\\.]+$"
|
||||||
|
}
|
||||||
|
|
||||||
|
In Python's regex engine, ``\\w`` against a ``str`` pattern matches Unicode letters by default, so Hiragana / Hangul / Cyrillic / etc. all pass.
|
||||||
|
|
||||||
|
**Do not** paper this over with ``"regexCheck": "."`` — that's a placeholder, not a description of what the site accepts; it will let any string through, including URLs and emails that other parts of Maigret may pick up and feed back into recursive search (see ``parse_usernames`` in ``checking.py``).
|
||||||
|
|
||||||
|
The complementary direction also matters: if you notice an existing site with a too-permissive ``regexCheck`` (e.g. ``"^[^\\.]+$"``, which means "anything but a dot" — that gladly lets non-ASCII through), tighten it to the actual accepted character class for the site (typically ``"^[A-Za-z0-9_-]+$"`` for ASCII slugs) when fixing related false-positives.
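
The behaviour of the patterns discussed in this section can be checked directly with Python's ``re`` module. The sketch below uses the regexes quoted above after JSON unescaping; the sample usernames are made up for illustration.

```python
import re

# Patterns from this section, written as Python raw strings (JSON escaping removed).
unicode_words = re.compile(r"^[\w\-_\.]+$")        # Unicode word chars, dash, dot
mediawiki = re.compile(r"^[^\/\\#<>\[\]\|{}]+$")   # anything except forbidden punctuation
too_permissive = re.compile(r"^[^\.]+$")           # "anything but a dot"
ascii_slug = re.compile(r"^[A-Za-z0-9_-]+$")       # tightened ASCII-only slug


def accepts(pattern, username):
    """True if the username passes the given regexCheck-style pattern."""
    return pattern.match(username) is not None


if __name__ == "__main__":
    for name in ("john_doe", "Иван", "山田.taro", "a#b"):
        print(name, accepts(unicode_words, name), accepts(mediawiki, name),
              accepts(too_permissive, name), accepts(ascii_slug, name))
```

For ``str`` patterns ``\w`` is Unicode-aware by default, which is why the "anything but a dot" pattern and the word-character pattern both accept Cyrillic input while the ASCII slug pattern rejects it.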
|
||||||
|
|
||||||
``protection`` (site protection tracking)
|
``protection`` (site protection tracking)
|
||||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
@@ -283,12 +322,21 @@ PyPi package.
|
|||||||
|
|
||||||
git checkout -b 0.4.0
|
git checkout -b 0.4.0
|
||||||
|
|
||||||
-2. Update Maigret version in three files manually:
+2. Update Maigret version in four files manually. **All four must be in
+   sync**: the previous bump missed ``docs/source/conf.py`` and
+   ``snapcraft.yaml`` and they fell behind by a release.
 
-   - pyproject.toml
-   - maigret/__version__.py
-   - docs/source/conf.py
-   - snapcraft.yaml
+   - ``pyproject.toml``: single line ``version = "X.Y.Z"`` under
+     ``[tool.poetry]``.
+   - ``maigret/__version__.py``: single line ``__version__ = 'X.Y.Z'``.
+   - ``docs/source/conf.py``: **two** Sphinx fields. ``release`` is the
+     full version (``'X.Y.Z'``); ``version`` is the short ``major.minor``
+     (``'X.Y'``, **without** the patch number). Update **both**.
+   - ``snapcraft.yaml``: single line ``version: X.Y.Z`` (no quotes, no
+     ``v`` prefix).
+
+   After editing, sanity-check with ``grep -rE '0\.5\.|0\.6\.|<old>'`` to
+   catch any straggler reference.
|
|
||||||
3. Create a new empty text section at the beginning of the file `CHANGELOG.md` with the current date:
|
3. Create a new empty text section at the beginning of the file `CHANGELOG.md` with the current date:
|
||||||
|
|
||||||
|
|||||||
@@ -30,6 +30,7 @@ You may be interested in:
|
|||||||
- :doc:`Command line options <command-line-options>`
|
- :doc:`Command line options <command-line-options>`
|
||||||
- :doc:`Features list <features>`
|
- :doc:`Features list <features>`
|
||||||
- :doc:`Library usage <library-usage>`
|
- :doc:`Library usage <library-usage>`
|
||||||
|
- :doc:`Tor, I2P, and proxies <tor-and-proxies>`
|
||||||
|
|
||||||
.. toctree::
|
.. toctree::
|
||||||
:hidden:
|
:hidden:
|
||||||
@@ -40,13 +41,19 @@ You may be interested in:
|
|||||||
usage-examples
|
usage-examples
|
||||||
command-line-options
|
command-line-options
|
||||||
features
|
features
|
||||||
library-usage
|
|
||||||
philosophy
|
philosophy
|
||||||
supported-identifier-types
|
supported-identifier-types
|
||||||
tags
|
tags
|
||||||
settings
|
|
||||||
development
|
development
|
||||||
|
|
||||||
|
.. toctree::
|
||||||
|
:hidden:
|
||||||
|
:caption: Advanced usage
|
||||||
|
|
||||||
|
library-usage
|
||||||
|
settings
|
||||||
|
tor-and-proxies
|
||||||
|
|
||||||
.. toctree::
|
.. toctree::
|
||||||
:hidden:
|
:hidden:
|
||||||
:caption: Use cases
|
:caption: Use cases
|
||||||
|
|||||||
@@ -0,0 +1,122 @@
|
|||||||
|
.. _tor-and-proxies:
|
||||||
|
|
||||||
|
Tor, I2P, and proxies
|
||||||
|
=====================
|
||||||
|
|
||||||
|
Maigret can route checks through an HTTP/SOCKS proxy, the Tor network, or I2P. Three CLI flags cover three distinct goals — knowing which one you need is the most common stumbling block.
|
||||||
|
|
||||||
|
``--proxy`` vs ``--tor-proxy`` (and ``--i2p-proxy``)
|
||||||
|
----------------------------------------------------
|
||||||
|
|
||||||
|
The most-asked question (see `issue #544 <https://github.com/soxoj/maigret/issues/544>`_):
|
||||||
|
|
||||||
|
- **You want every check to go through Tor** (e.g. you're on Tails OS, or behind a country-level block, or your IP is rate-limited). → Use ``--proxy``, pointing at your Tor SOCKS port:
|
||||||
|
|
||||||
|
.. code-block:: console
|
||||||
|
|
||||||
|
maigret <username> --proxy socks5://127.0.0.1:9050
|
||||||
|
|
||||||
|
- **You want to reach ``.onion`` sites in the Maigret database**, while the rest of the run still uses your normal connection. → Use ``--tor-proxy``:
|
||||||
|
|
||||||
|
.. code-block:: console
|
||||||
|
|
||||||
|
maigret <username> --tor-proxy socks5://127.0.0.1:9050
|
||||||
|
|
||||||
|
``--tor-proxy`` is **only** consulted for sites whose ``url`` is a ``.onion`` host. For every other site Maigret uses your direct connection (or ``--proxy`` if set). Without ``--tor-proxy``, ``.onion`` sites are silently skipped.
|
||||||
|
|
||||||
|
The same split applies to ``--i2p-proxy``: it is consulted only for ``.i2p`` hosts, never for clearweb sites.
|
||||||
|
|
||||||
|
Defaults: ``--tor-proxy`` defaults to ``socks5://127.0.0.1:9050`` and ``--i2p-proxy`` to ``http://127.0.0.1:4444``. ``--proxy`` has no default. Maigret does **not** launch ``tor`` or an I2P router for you — start the daemon first.
|
||||||
|
|
||||||
|
Tor Browser vs system ``tor``: port numbers
|
||||||
|
-------------------------------------------
|
||||||
|
|
||||||
|
The SOCKS port differs by Tor installation:
|
||||||
|
|
||||||
|
- **System ``tor`` daemon** (``apt install tor``, ``brew install tor``, Tails) listens on ``9050``.
|
||||||
|
- **Tor Browser bundle** ships its own ``tor`` listening on ``9150``.
|
||||||
|
|
||||||
|
If the connection is refused, try the other port:
|
||||||
|
|
||||||
|
.. code-block:: console
|
||||||
|
|
||||||
|
# system tor
|
||||||
|
maigret <username> --proxy socks5://127.0.0.1:9050
|
||||||
|
|
||||||
|
# Tor Browser running in the background
|
||||||
|
maigret <username> --proxy socks5://127.0.0.1:9150
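
Rather than guessing, the two ports can be probed before starting a run. The following is a small standalone sketch using only the standard library; the helper name ``first_open_port`` is hypothetical and not part of Maigret, and a successful TCP connect only shows that something is listening, not that it is a Tor SOCKS proxy.

```python
import socket


def first_open_port(host, ports, timeout=1.0):
    """Return the first port accepting TCP connections, or None if none does."""
    for port in ports:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError) on failure.
            with socket.create_connection((host, port), timeout=timeout):
                return port
        except OSError:
            continue
    return None


if __name__ == "__main__":
    # 9050: system tor daemon, 9150: Tor Browser bundle (as described above).
    port = first_open_port("127.0.0.1", [9050, 9150])
    if port is None:
        print("No Tor SOCKS port found; start tor or Tor Browser first.")
    else:
        print(f"maigret <username> --proxy socks5://127.0.0.1:{port}")
```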
|
||||||
|
|
||||||
|
A note on results over Tor
|
||||||
|
--------------------------
|
||||||
|
|
||||||
|
Most public WAFs (Cloudflare, DDoS-Guard, AWS WAF, Akamai) block Tor exit nodes by default — usually more aggressively than they block datacenter IPs. A Tor run typically produces **more UNKNOWNs and fewer CLAIMEDs** than the same run from a residential connection. This is not a bug in Maigret; it is the cost of anonymity.
|
||||||
|
|
||||||
|
Recommended flags for a Tor run:
|
||||||
|
|
||||||
|
.. code-block:: console
|
||||||
|
|
||||||
|
maigret <username> --proxy socks5://127.0.0.1:9050 --timeout 60 --retries 2
|
||||||
|
|
||||||
|
- ``--timeout 60`` — Tor circuits add 1–3 seconds per request; the default 30 s causes spurious timeouts.
|
||||||
|
- ``--retries 2`` — retries cover transient circuit failures, which are common on Tor.
|
||||||
|
- Optional ``-n 20`` — lowering concurrency (default 100) reduces the chance of exits rate-limiting you.
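
When Maigret is launched from a script, the recommended flags can be assembled once and reused. This is only a sketch: the helper ``tor_run_argv`` is hypothetical, and it uses nothing beyond the flags listed above.

```python
def tor_run_argv(username, socks_port=9050, max_connections=None):
    """Build a command line for a whole-run-over-Tor scan (illustrative helper)."""
    argv = [
        "maigret", username,
        "--proxy", f"socks5://127.0.0.1:{socks_port}",
        "--timeout", "60",   # Tor circuits add latency
        "--retries", "2",    # cover transient circuit failures
    ]
    if max_connections is not None:
        # Optional: lower concurrency, e.g. 20 instead of the default 100.
        argv += ["-n", str(max_connections)]
    return argv


if __name__ == "__main__":
    print(" ".join(tor_run_argv("someuser", max_connections=20)))
```

The resulting list can be passed to ``subprocess.run`` unchanged, which avoids shell quoting issues with unusual usernames.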
|
||||||
|
|
||||||
|
If you mainly need to bypass WAFs (rather than to remain anonymous), a residential proxy will usually outperform Tor by a wide margin. See the **"Lots of sites fail / timeout / return 403"** section in `TROUBLESHOOTING.md <https://github.com/soxoj/maigret/blob/main/TROUBLESHOOTING.md>`_.

Running on Tails OS
-------------------

Tails forces every outbound connection through Tor at the network layer. Maigret needs no special configuration to comply; pointing ``--proxy`` at the Tails Tor daemon is enough:

.. code-block:: console

    maigret <username> --proxy socks5://127.0.0.1:9050 --timeout 60

Things that are **not** needed:

- ``torsocks maigret …`` and ``torify maigret …``: these wrap libc socket calls, but Maigret's HTTP client (``aiohttp`` / ``curl_cffi``) bypasses libc for network I/O, so the wrapper has no effect. Use ``--proxy`` instead.
- ``--tor-proxy``: on Tails, *everything* must go via Tor (the OS enforces this), so the niche "only .onion via Tor" mode that ``--tor-proxy`` provides does not apply.

Installation over Tor on Tails
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``pip`` itself does not know about Tor; on Tails you need ``torsocks`` to wrap it:

.. code-block:: console

    torsocks pip install --user maigret

After install, the binary lands in ``~/.local/bin/maigret``. If you see ``maigret: command not found``, either add ``~/.local/bin`` to ``PATH`` or invoke it as ``python3 -m maigret <username>``.

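A script that has to work whether or not ``~/.local/bin`` is on ``PATH`` can apply the same fallback itself. The sketch below is illustrative: ``maigret_command`` is a hypothetical helper, and it assumes Maigret was installed for the interpreter that runs the script.

```python
import shutil
import sys
from typing import List


def maigret_command() -> List[str]:
    """Return an argv prefix that starts Maigret (hypothetical helper)."""
    executable = shutil.which("maigret")
    if executable:
        # The console script is reachable through PATH.
        return [executable]
    # Fall back to the module form, which does not depend on PATH.
    return [sys.executable, "-m", "maigret"]


if __name__ == "__main__":
    print(maigret_command() + ["<username>"])
```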
Persisting Maigret across Tails sessions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Tails wipes ``~/.local/`` on reboot unless you configure the Persistent Storage to keep it. This is Tails configuration, not Maigret configuration; see the official Tails docs:

- `Persistent Storage on Tails <https://tails.boum.org/doc/persistent_storage/>`_
- `Configuring Persistent Storage features <https://tails.boum.org/doc/persistent_storage/configure/>`_

A step-by-step recipe contributed by a user (persisting ``~/.local/lib/python3.9`` and ``~/.local/bin`` and patching ``.bashrc``) is in `issue #544 <https://github.com/soxoj/maigret/issues/544#issuecomment-1356469171>`_. Treat it as a starting point: the Python version and Tails internals change between Tails releases.

Reports on Tails — where to save them
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The default ``reports/`` directory lives next to the working directory and is wiped with the amnesiac session. To save reports somewhere persistent, either pass ``-fo``:

.. code-block:: console

    maigret <username> --html -fo "/home/amnesia/Persistent/maigret-reports"

or set ``"reports_path"`` in your ``settings.json`` to a persistent path. See :doc:`settings`.

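The ``settings.json`` route can be scripted as well. This is a sketch under stated assumptions: the location of ``settings.json`` is not given here, so the caller passes it in; ``set_reports_path`` is a hypothetical helper; and the only key it touches is ``"reports_path"``, leaving other keys as they are.

```python
import json
from pathlib import Path


def set_reports_path(settings_file: Path, reports_dir: str) -> dict:
    """Merge "reports_path" into a JSON settings file and return the result."""
    settings = {}
    if settings_file.exists():
        # Keep whatever the file already contains.
        settings = json.loads(settings_file.read_text(encoding="utf-8"))
    settings["reports_path"] = reports_dir
    settings_file.write_text(json.dumps(settings, indent=4) + "\n", encoding="utf-8")
    return settings
```

On Tails the second argument would be a directory inside the Persistent Storage, such as the ``/home/amnesia/Persistent/maigret-reports`` path used in the ``-fo`` example.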
Programmatic equivalents (Python library)
-----------------------------------------

The same options are available through the Python API. See :doc:`library-usage`; the relevant keyword arguments are ``proxy=``, ``tor_proxy=`` and ``i2p_proxy=``, accepting the same URL formats as the CLI flags.

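One way to keep embedded code tidy is to collect the proxy settings once and unpack them into the search call. The sketch below only builds the keyword arguments; ``proxy_kwargs`` is a hypothetical helper, and the function that receives these arguments is the one described in the library-usage page, which is not reproduced here.

```python
from typing import Dict, Optional


def proxy_kwargs(
    proxy: Optional[str] = None,
    tor_proxy: Optional[str] = None,
    i2p_proxy: Optional[str] = None,
) -> Dict[str, str]:
    """Collect only the proxy options that were actually set (hypothetical helper)."""
    candidates = {"proxy": proxy, "tor_proxy": tor_proxy, "i2p_proxy": i2p_proxy}
    return {name: url for name, url in candidates.items() if url}


# On Tails, route everything through the local Tor daemon.
kwargs = proxy_kwargs(proxy="socks5://127.0.0.1:9050")
print(kwargs)
```

Leaving unset options out of the dictionary lets the library apply its own defaults for them.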
See also
--------

- :doc:`command-line-options`: full reference for the three flags.
- `TROUBLESHOOTING.md <https://github.com/soxoj/maigret/blob/main/TROUBLESHOOTING.md>`_: quick recipes for ``.onion`` / I2P sites and for WAF-induced 403s.
- :doc:`library-usage`: proxy options for embedded use.

@@ -1,3 +1,3 @@
 """Maigret version file"""
 
-__version__ = '0.6.0'
+__version__ = '0.6.1'
@@ -49,6 +49,34 @@ SUPPORTED_IDS = (
 BAD_CHARS = "#"
 
 
+def _username_fits_url_template(site: MaigretSite, username: str) -> bool:
+    """Decide whether a username can be safely substituted into a site's URL
+    path without producing a percent-encoded slug that the site cannot match.
+
+    Rationale: most sites that interpolate ``{username}`` into a URL path
+    segment treat the slug as an ASCII identifier. When a username contains
+    non-ASCII characters (or other reserved characters), ``urllib.parse.quote``
+    percent-encodes the bytes; the site typically cannot resolve such a slug
+    and falls back to a generic listing/homepage that trips overly-broad
+    ``presenseStrs`` markers, producing a false CLAIMED. See issues #459 and
+    #2633. Sites that genuinely accept broader character sets (e.g. wikis
+    that allow Unicode usernames) opt into permissive matching by setting
+    their own ``regexCheck``; in that case this helper is bypassed entirely.
+
+    Returns True when the check should proceed, False when the result is
+    inherently unreliable and the site should be skipped (ILLEGAL).
+    """
+    if site.regex_check:
+        return True
+    template = site.url_probe or site.url or ""
+    if "{username}" not in template:
+        return True
+    path_part, _sep, _query = template.partition("?")
+    if "{username}" not in path_part:
+        return True
+    return quote(username, safe='') == username
+
+
 def build_cloudflare_bypass_config(
     settings_obj: Optional[Any], force_enable: bool = False
 ) -> Optional[Dict[str, Any]]:
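The last line of the helper carries the whole decision, so it is worth seeing in isolation. The sketch below reproduces only that comparison with the standard library; ``survives_percent_encoding`` is an illustrative name, not a Maigret function.

```python
from urllib.parse import quote


def survives_percent_encoding(username: str) -> bool:
    """True when quoting with no extra safe characters leaves the name unchanged."""
    return quote(username, safe="") == username


if __name__ == "__main__":
    for name in ["alice", "alice.bob_42", "alice bob", "Александр", "a/b"]:
        # Show the verdict next to the encoded form that would end up in the URL path.
        print(f"{name!r}: {survives_percent_encoding(name)} -> {quote(name, safe='')}")
```

With ``safe=''`` even ``/`` is encoded, so only ASCII letters, digits and ``_.-~`` pass unchanged.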
@@ -880,6 +908,23 @@ def make_site_result(
         results_site["http_status"] = ""
         results_site["response_text"] = ""
         # query_notify.update(results_site["status"])
+    # username would be percent-encoded into a path segment — see #459/#2633.
+    elif not _username_fits_url_template(site, username):
+        results_site["status"] = MaigretCheckResult(
+            username,
+            site.name,
+            url,
+            MaigretCheckStatus.ILLEGAL,
+            error=CheckError(
+                'URL-incompatible username',
+                'username contains characters that would be percent-encoded '
+                'in this site\'s URL path; result would be unreliable. Add a '
+                '`regexCheck` to opt this site in if it accepts these chars.'
+            ),
+        )
+        results_site["url_user"] = ""
+        results_site["http_status"] = ""
+        results_site["response_text"] = ""
     else:
         # URL of user on site (if it exists)
         results_site["url_user"] = url
@@ -57,7 +57,8 @@
                 "\"routePath\":null"
             ],
             "errors": {
-                "Login • Instagram": "Login required"
+                "Login • Instagram": "Login required",
+                "\"routePath\":\"\\/\"": "Login required (rate-limited or session blocked)"
             },
             "alexaRank": 4,
             "urlMain": "https://www.instagram.com/",
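The new ``errors`` entry relies on the closing quote after ``\/`` to tell the login wall apart from a real profile route. The sketch below illustrates that with plain substring matching. It is a simplified stand-in: Maigret's own ``detect_error_page`` is not shown in this diff, and ``first_matching_error`` is a hypothetical name. The two sample strings are the ones used in the regression tests of this change.

```python
from typing import Dict, Optional

# Decoded form of the markers in the Instagram "errors" object.
INSTAGRAM_ERRORS: Dict[str, str] = {
    "Login • Instagram": "Login required",
    r'"routePath":"\/"': "Login required (rate-limited or session blocked)",
}


def first_matching_error(html: str, errors: Dict[str, str]) -> Optional[str]:
    """Return the description of the first marker found in html, else None."""
    for marker, description in errors.items():
        if marker in html:
            return description
    return None


login_wall = r'...{"routePath":"\/"},"timeSpent":...'
profile = r'foo,"routePath":"\/{username}\/{?tab}\/{?view_type}\/",bar'

if __name__ == "__main__":
    print(first_matching_error(login_wall, INSTAGRAM_ERRORS))
    print(first_matching_error(profile, INSTAGRAM_ERRORS))
```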
@@ -3766,7 +3767,7 @@
             "absenceStrs": [
                 "Couldn't find any profile with name"
             ],
-            "regexCheck": "^.{1,25}$",
+            "regexCheck": "^[A-Za-z0-9_]{3,16}$",
             "usernameClaimed": "blue",
             "usernameUnclaimed": "noonewouldeverusethis7",
             "alexaRank": 1635,
@@ -8217,7 +8218,17 @@
         "Namuwiki": {
             "url": "https://namu.wiki/w/%EC%82%AC%EC%9A%A9%EC%9E%90:{username}",
             "urlMain": "https://namu.wiki/",
-            "checkType": "status_code",
+            "checkType": "message",
+            "presenseStrs": [
+                "<meta property=\"og:title\""
+            ],
+            "absenceStrs": [
+                "새 문서 만들기"
+            ],
+            "regexCheck": "^[\\w\\-_.]+$",
+            "protection": [
+                "cf_js_challenge"
+            ],
             "usernameClaimed": "namu",
             "usernameUnclaimed": "noonewouldeverusethis7",
             "alexaRank": 7047,
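The Namuwiki entry is an example of a site opting into Unicode usernames through its own ``regexCheck``. The sketch below shows why that pattern admits Korean names; it assumes the pattern is applied with Python's ``re`` module to a ``str``, where ``\w`` is Unicode-aware by default.

```python
import re

# JSON "^[\\w\\-_.]+$" decodes to this pattern.
NAMUWIKI_REGEX = r"^[\w\-_.]+$"


def accepted(username: str) -> bool:
    """True when the username matches the Namuwiki pattern."""
    return re.search(NAMUWIKI_REGEX, username) is not None


if __name__ == "__main__":
    for name in ["namu", "홍길동", "alice bob"]:
        print(name, accepted(name))
```

A space is still rejected, because it is neither a word character nor one of the three listed punctuation marks.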
@@ -13241,7 +13252,7 @@
                 "ru"
             ],
             "checkType": "response_url",
-            "regexCheck": "^[^-]+$",
+            "regexCheck": "^[A-Za-z0-9_.]+$",
             "alexaRank": 29071,
             "urlMain": "https://studfile.net",
             "url": "https://studfile.net/users/{username}/",
@@ -15602,7 +15613,7 @@
             "tags": [
                 "coding"
             ],
-            "regexCheck": "^[^\\.]+$",
+            "regexCheck": "^[A-Za-z0-9_-]+$",
             "checkType": "message",
             "absenceStrs": [
                 "<title>Users - Hacking with Swift</title>"
@@ -17095,7 +17106,7 @@
             "tags": [
                 "hacking"
             ],
-            "regexCheck": "^[^\\.]+$",
+            "regexCheck": "^[A-Za-z0-9_-]+$",
             "checkType": "message",
             "absenceStrs": [
                 "Cannot Retrieve Information For The Specified Username"
@@ -17555,7 +17566,7 @@
             "errors": {
                 "An error has occurred.": "Site error"
             },
-            "regexCheck": "^[^\\.]+$",
+            "regexCheck": "^[A-Za-z0-9_-]+$",
             "checkType": "message",
             "absenceStrs": [
                 "No such user."
@@ -20679,7 +20690,7 @@
             "tags": [
                 "ru"
             ],
-            "regexCheck": "^[^\\.]+$",
+            "regexCheck": "^[A-Za-z0-9_-]+$",
             "checkType": "message",
             "absenceStrs": [
                 "Указанный пользователь не найден"
@@ -20811,7 +20822,7 @@
             "tags": [
                 "hu"
             ],
-            "regexCheck": "^[^\\.]+$",
+            "regexCheck": "^[A-Za-z0-9_-]+$",
             "checkType": "message",
             "absenceStrs": [
                 "<title>Log in - Chan4Chan</title>"
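The ``regexCheck`` edits in ``data.json`` all replace a deny-list pattern with an ASCII allow-list. Because a site that defines ``regexCheck`` bypasses the new URL-template helper, the old patterns would have kept letting non-ASCII usernames through. The sketch below compares old and new patterns on one Cyrillic and one ASCII name; it assumes the patterns are applied with ``re.search``, and ``accepts`` is an illustrative helper.

```python
import re

# (old pattern, new pattern) pairs as they appear in the data.json hunks, JSON-decoded.
CHANGES = [
    (r"^.{1,25}$", r"^[A-Za-z0-9_]{3,16}$"),
    (r"^[^-]+$", r"^[A-Za-z0-9_.]+$"),
    (r"^[^\.]+$", r"^[A-Za-z0-9_-]+$"),
]


def accepts(pattern: str, username: str) -> bool:
    """True when the username passes the given regexCheck pattern."""
    return re.search(pattern, username) is not None


if __name__ == "__main__":
    for old, new in CHANGES:
        print(old, "->", new)
        for name in ["blue", "Александр"]:
            print("   ", name, accepts(old, name), accepts(new, name))
```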
@@ -1,8 +1,8 @@
 {
     "version": 1,
-    "updated_at": "2026-05-15T16:12:58Z",
+    "updated_at": "2026-05-17T08:44:03Z",
     "sites_count": 3155,
-    "min_maigret_version": "0.6.0",
-    "data_sha256": "df2ab3dbc96bdcdc8aa4e9da485df75ce6c3274814080f00a35e89f7f43783e1",
+    "min_maigret_version": "0.6.1",
+    "data_sha256": "896a15cfb0de131848de5ae915a81d60d9d86a3e4537dc1004adeab29ceb4b43",
     "data_url": "https://raw.githubusercontent.com/soxoj/maigret/main/maigret/resources/data.json"
 }
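A consumer of this metadata file would typically compare a downloaded ``data.json`` against ``data_sha256`` before using it. The sketch below assumes that the field is the hex SHA-256 digest of the raw file bytes; the diff does not show how Maigret itself performs the check, and ``matches_sha256`` is a hypothetical helper.

```python
import hashlib


def matches_sha256(data: bytes, expected_hex: str) -> bool:
    """Compare the SHA-256 digest of data with an expected hex string."""
    return hashlib.sha256(data).hexdigest() == expected_hex.strip().lower()


if __name__ == "__main__":
    payload = b'{"sites": {}}'  # stand-in for the downloaded file contents
    print(matches_sha256(payload, hashlib.sha256(payload).hexdigest()))
```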
+1
-1
@@ -4,7 +4,7 @@ build-backend = "poetry.core.masonry.api"
 
 [tool.poetry]
 name = "maigret"
-version = "0.6.0"
+version = "0.6.1"
 description = "🕵️♂️ Collect a dossier on a person by username from thousands of sites."
 authors = ["Soxoj <soxoj@protonmail.com>"]
 readme = "README.md"
@@ -3159,16 +3159,16 @@ Rank data fetched from Majestic Million by domains.
 1.  [GreasyFork (https://greasyfork.org)](https://greasyfork.org)*: top 100M, coding*
 1.  [Faceit (https://faceit.com/)](https://faceit.com/)*: top 100M, gaming*
 
-The list was updated at (2026-05-15)
+The list was updated at (2026-05-17)
 ## Statistics
 
 Enabled/total sites: 2522/3155 = 79.94%
 
 Incomplete message checks: 311/2522 = 12.33% (false positive risks)
 
-Status code checks: 635/2522 = 25.18% (false positive risks)
+Status code checks: 634/2522 = 25.14% (false positive risks)
 
-False positive risk (total): 37.51%
+False positive risk (total): 37.47%
 
 Sites with probing: 500px, Armchairgm, BinarySearch (disabled), BleachFandom, Bluesky, BongaCams, Boosty, BuyMeACoffee, Calendly, Cent, Chess, Code Sandbox (disabled), Code Snippet Wiki, DailyMotion, Discord, Diskusjon.no, Disqus, Docker Hub, Duolingo, Faceit, FandomCommunityCentral, GitHub, GitLab, Google Plus (archived), Gravatar, HackTheBox, Hackerrank, Hashnode, Holopin, Imgur, Issuu, Keybase, Kick, Kvinneguiden, LeetCode, Lesswrong, Livejasmin, LocalCryptos (disabled), Medium, MicrosoftLearn, MixCloud, Monkeytype, NPM, Niftygateway, Omg.lol, OnlyFans, Paragraph, Picsart, Plurk, Polarsteps, Rarible, Reddit, Reddit Search (Pushshift) (disabled), Revolut.me, RoyalCams, Scratch, Soop, SportsTracker, Spotify, StackOverflow, Substack, TAP'D, Topcoder, Trello, Twitch, Twitter, Twitter Shadowban (disabled), UnstoppableDomains, Vimeo, Vivino, Warframe Market, Warpcast, Weibo, Wikipedia, Yapisal (disabled), YouNow, en.brickimedia.org, forums.grandstream.com, nightbot, notabug.org, qiwi.me (disabled)
 
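The changed percentages follow from one site (Namuwiki) moving from a status-code check to a message check. The arithmetic can be reproduced as below; the sketch assumes the total risk is the sum of the two counts divided by the number of enabled sites, which is consistent with the figures shown.

```python
ENABLED = 2522
INCOMPLETE_MESSAGE_CHECKS = 311


def pct(part: int, whole: int) -> float:
    """Percentage rounded to two decimals, as printed in the statistics."""
    return round(100 * part / whole, 2)


if __name__ == "__main__":
    for status_code_checks in (635, 634):  # before and after this change
        total = INCOMPLETE_MESSAGE_CHECKS + status_code_checks
        print(status_code_checks, pct(status_code_checks, ENABLED), pct(total, ENABLED))
```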
+1
-1
@@ -7,7 +7,7 @@ description: |
 
   Currently supported more than 3000 sites, search is launched against 500 popular sites in descending order of popularity by default. Also supported checking of Tor sites, I2P sites, and domains (via DNS resolving).
 
-version: 0.5.0
+version: 0.6.1
 license: MIT
 base: core22
 confinement: strict
@@ -13,6 +13,7 @@ from maigret.checking import (
     timeout_check,
     debug_response_logging,
     process_site_result,
+    _username_fits_url_template,
 )
 from maigret.errors import CheckError
 from maigret.result import MaigretCheckResult, MaigretCheckStatus
@@ -126,6 +127,113 @@ def test_detect_error_page_ok():
     assert detect_error_page("hello world", 200, {}, ignore_403=False) is None
 
 
+def test_detect_error_page_instagram_login_wall():
+    """Regression for #11: when Instagram serves the login wall (typically the
+    response after rate-limiting an unauthenticated client), the JSON state
+    contains `"routePath":"\\/"` (root path) rather than a username route. The
+    Instagram entry in data.json carries this marker in `errors` so the result
+    surfaces as UNKNOWN instead of a false AVAILABLE.
+    """
+    instagram_errors = {
+        "Login • Instagram": "Login required",
+        '"routePath":"\\/"': "Login required (rate-limited or session blocked)",
+    }
+    login_wall_html = '...{"routePath":"\\/"},"timeSpent":...'
+    err = detect_error_page(login_wall_html, 200, instagram_errors, ignore_403=False)
+    assert err is not None
+    assert err.type == "Site-specific"
+    assert "rate-limited" in err.desc
+
+
+def _site_for_url(url_pattern, regex_check=None, url_probe=None):
+    """Build a minimal MaigretSite stub for the URL-template helper tests."""
+    raw = {
+        "url": url_pattern,
+        "urlMain": "https://example.com/",
+        "checkType": "message",
+        "usernameClaimed": "alice",
+        "usernameUnclaimed": "noone",
+    }
+    if regex_check is not None:
+        raw["regexCheck"] = regex_check
+    if url_probe is not None:
+        raw["urlProbe"] = url_probe
+    return MaigretSite("Example", raw)
+
+
+# Regression tests for #459 / #2633 — usernames that would be percent-encoded
+# into a URL path segment trip generic presence markers on fallback pages.
+def test_username_fits_path_segment_ascii_slug_passes():
+    site = _site_for_url("https://example.com/u/{username}")
+    assert _username_fits_url_template(site, "alice") is True
+    assert _username_fits_url_template(site, "alice-bob") is True
+    assert _username_fits_url_template(site, "alice.bob_42") is True
+
+
+def test_username_fits_path_segment_non_ascii_blocked():
+    site = _site_for_url("https://example.com/u/{username}")
+    # Cyrillic
+    assert _username_fits_url_template(site, "Александр") is False
+    # Chinese
+    assert _username_fits_url_template(site, "快嘴摩卡酱") is False
+    # Korean
+    assert _username_fits_url_template(site, "홍길동") is False
+    # Space (also percent-encoded)
+    assert _username_fits_url_template(site, "alice bob") is False
+
+
+def test_username_fits_query_string_is_unconstrained():
+    """If {username} sits in the query string, the value is URL-encoded as a
+    parameter and most APIs handle that fine — don't block."""
+    site = _site_for_url("https://example.com/api/users?name={username}")
+    assert _username_fits_url_template(site, "快嘴摩卡酱") is True
+    assert _username_fits_url_template(site, "Александр") is True
+
+
+def test_username_fits_explicit_regex_check_bypasses_helper():
+    """When the site declares its own regexCheck, the helper defers entirely."""
+    # Permissive site: accepts anything via Unicode-friendly regex.
+    site = _site_for_url(
+        "https://wiki.example/User:{username}", regex_check=r"^[\w\- .]+$"
+    )
+    assert _username_fits_url_template(site, "Александр") is True
+    assert _username_fits_url_template(site, "快嘴摩卡酱") is True
+
+
+def test_username_fits_url_probe_overrides_url():
+    """urlProbe is the actual request URL; the helper must use it when set."""
+    # Path-segment url, but urlProbe is a clean query API → no validation
+    site = _site_for_url(
+        "https://example.com/u/{username}",
+        url_probe="https://example.com/api/u?name={username}",
+    )
+    assert _username_fits_url_template(site, "快嘴摩卡酱") is True
+
+
+def test_username_fits_post_payload_sites_skipped():
+    """Sites with {username} only in requestPayload (no {username} in URL
+    template at all) should pass unconditionally — payload is JSON-encoded,
+    not URL-path-encoded."""
+    site = _site_for_url("https://api.example.com/check")
+    assert _username_fits_url_template(site, "快嘴摩卡酱") is True
+
+
+def test_detect_error_page_instagram_marker_no_false_positive_on_profile():
+    """The login-wall marker must NOT match a real profile page. On a claimed
+    user page, `routePath` carries the user-route template
+    (`"routePath":"\\/{username}\\/..."`); the closing-quote form
+    `"routePath":"\\/"` only appears on the login wall.
+    """
+    instagram_errors = {
+        '"routePath":"\\/"': "Login required (rate-limited or session blocked)",
+    }
+    profile_html = (
+        'foo,"routePath":"\\/{username}\\/{?tab}\\/{?view_type}\\/",bar'
+    )
+    err = detect_error_page(profile_html, 200, instagram_errors, ignore_403=False)
+    assert err is None
+
+
 def test_parse_usernames_single_username():
     logger = Mock()
     result = parse_usernames({"profile_username": "alice"}, logger)