diff --git a/TROUBLESHOOTING.md b/TROUBLESHOOTING.md index a14237a..3ee79c0 100644 --- a/TROUBLESHOOTING.md +++ b/TROUBLESHOOTING.md @@ -51,19 +51,32 @@ pip install --upgrade certifi If you are behind a corporate proxy, set `HTTPS_PROXY` / `HTTP_PROXY` environment variables and pass `--proxy "$HTTPS_PROXY"` so Maigret uses the same route. -## ".onion / .i2p sites are skipped" +## Running over Tor, I2P, or Tails OS -These sites only load through the matching gateway. Start your Tor or I2P daemon first, then: +Two different goals, two different flags: -```bash -# Tor -maigret user --tor-proxy socks5://127.0.0.1:9050 +- **Route only `.onion` / `.i2p` sites through their gateway** (clearweb checks still use your direct connection). Use `--tor-proxy` / `--i2p-proxy`: + ```bash + maigret user --tor-proxy socks5://127.0.0.1:9050 # only .onion goes via Tor + maigret user --i2p-proxy http://127.0.0.1:4444 # only .i2p goes via I2P + ``` + Without these flags, `.onion` / `.i2p` sites are silently skipped. -# I2P -maigret user --i2p-proxy http://127.0.0.1:4444 -``` +- **Route the whole run through Tor / a proxy** (e.g. on Tails OS, or to anonymise the scan). Use `--proxy`: + ```bash + # system tor daemon (apt install tor, Tails) + maigret user --proxy socks5://127.0.0.1:9050 --timeout 60 --retries 2 -Maigret does not launch or manage these daemons — they must already be running. + # Tor Browser bundle (different SOCKS port!) + maigret user --proxy socks5://127.0.0.1:9150 --timeout 60 --retries 2 + ``` + Most public WAFs block Tor exits, so expect more UNKNOWNs over Tor than on a residential line — this is the cost of anonymity, not a bug. Raising `--timeout` to 60 and adding `--retries 2` materially reduces noise. + +On Tails, `torsocks maigret …` / `torify maigret …` do **not** work — Maigret's HTTP client bypasses libc, so the wrapper has no effect. Use `--proxy` instead. To install Maigret over Tor: `torsocks pip install --user maigret`. 
+ +Maigret does not launch or manage Tor / I2P daemons — they must already be running. + +For the full walkthrough (Tor Browser vs system `tor` ports, Tails persistence, report paths), see the [Tor, I2P, and proxies](https://maigret.readthedocs.io/en/latest/tor-and-proxies.html) page on Read the Docs. ## "The PDF / XMind / HTML report looks wrong" diff --git a/docs/source/command-line-options.rst b/docs/source/command-line-options.rst index 2d97d74..e6034c1 100644 --- a/docs/source/command-line-options.rst +++ b/docs/source/command-line-options.rst @@ -63,6 +63,29 @@ from slow sites. On the other hand, this may cause a long delay to gather all results. The choice of the right timeout should be carried out taking into account the bandwidth of the Internet connection. +Network and proxy options +~~~~~~~~~~~~~~~~~~~~~~~~~ + +``--proxy PROXY_URL`` / ``-p PROXY_URL`` - Route **every** check through +the given HTTP or SOCKS proxy. Examples: ``socks5://127.0.0.1:1080``, +``http://user:pass@proxy.example:3128``. This is the flag to use for +routing the whole run through Tor (``--proxy socks5://127.0.0.1:9050``), +a residential proxy, or any corporate gateway. No default. + +``--tor-proxy TOR_PROXY_URL`` - Gateway used **only** for ``.onion`` +sites in the database **(default: socks5://127.0.0.1:9050)**. Clearweb +sites are unaffected — for them Maigret uses your direct connection or +``--proxy`` if you set one. Without this flag, ``.onion`` sites are +silently skipped. + +``--i2p-proxy I2P_PROXY_URL`` - Gateway used **only** for ``.i2p`` +sites in the database **(default: http://127.0.0.1:4444)**. Same +"only matching protocol" rule as ``--tor-proxy``. + +Maigret does not start the Tor or I2P daemon for you — launch it first. +For a full walkthrough (Tor Browser vs system ``tor`` port numbers, +Tails OS recipe, timeout/retry tuning), see :doc:`tor-and-proxies`. + ``--cookies-jar-file`` - File with custom cookies in Netscape format (aka cookies.txt). 
You can install an extension to your browser to download own cookies (`Chrome `_, `Firefox `_). diff --git a/docs/source/index.rst b/docs/source/index.rst index 49e6016..8562e3f 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -30,6 +30,7 @@ You may be interested in: - :doc:`Command line options ` - :doc:`Features list ` - :doc:`Library usage ` +- :doc:`Tor, I2P, and proxies ` .. toctree:: :hidden: @@ -40,13 +41,19 @@ You may be interested in: usage-examples command-line-options features - library-usage philosophy supported-identifier-types tags - settings development +.. toctree:: + :hidden: + :caption: Advanced usage + + library-usage + settings + tor-and-proxies + .. toctree:: :hidden: :caption: Use cases diff --git a/docs/source/tor-and-proxies.rst b/docs/source/tor-and-proxies.rst new file mode 100644 index 0000000..ad8117b --- /dev/null +++ b/docs/source/tor-and-proxies.rst @@ -0,0 +1,122 @@ +.. _tor-and-proxies: + +Tor, I2P, and proxies +===================== + +Maigret can route checks through an HTTP/SOCKS proxy, the Tor network, or I2P. Three CLI flags cover three distinct goals — knowing which one you need is the most common stumbling block. + +``--proxy`` vs ``--tor-proxy`` (and ``--i2p-proxy``) +---------------------------------------------------- + +The most-asked question (see `issue #544 `_): + +- **You want every check to go through Tor** (e.g. you're on Tails OS, or behind a country-level block, or your IP is rate-limited). → Use ``--proxy``, pointing at your Tor SOCKS port: + + .. code-block:: console + + maigret --proxy socks5://127.0.0.1:9050 + +- **You want to reach ``.onion`` sites in the Maigret database**, while the rest of the run still uses your normal connection. → Use ``--tor-proxy``: + + .. code-block:: console + + maigret --tor-proxy socks5://127.0.0.1:9050 + + ``--tor-proxy`` is **only** consulted for sites whose ``url`` is a ``.onion`` host. 
For every other site Maigret uses your direct connection (or ``--proxy`` if set). Without ``--tor-proxy``, ``.onion`` sites are silently skipped. + +The same split applies to ``--i2p-proxy``: it is consulted only for ``.i2p`` hosts, never for clearweb sites. + +Defaults: ``--tor-proxy`` is ``socks5://127.0.0.1:9050`` and ``--i2p-proxy`` is ``http://127.0.0.1:4444``. ``--proxy`` has no default. Maigret does **not** launch ``tor`` or an I2P router for you — start the daemon first. + +Tor Browser vs system ``tor``: port numbers +------------------------------------------- + +The SOCKS port differs by Tor installation: + +- **System ``tor`` daemon** (``apt install tor``, ``brew install tor``, Tails) listens on ``9050``. +- **Tor Browser bundle** ships its own ``tor`` listening on ``9150``. + +If the connection is refused, try the other port: + +.. code-block:: console + + # system tor + maigret --proxy socks5://127.0.0.1:9050 + + # Tor Browser running in the background + maigret --proxy socks5://127.0.0.1:9150 + +A note on results over Tor +-------------------------- + +Most public WAFs (Cloudflare, DDoS-Guard, AWS WAF, Akamai) block Tor exit nodes by default — usually more aggressively than they block datacenter IPs. A Tor run typically produces **more UNKNOWNs and fewer CLAIMEDs** than the same run from a residential connection. This is not a bug in Maigret; it is the cost of anonymity. + +Recommended flags for a Tor run: + +.. code-block:: console + + maigret --proxy socks5://127.0.0.1:9050 --timeout 60 --retries 2 + +- ``--timeout 60`` — Tor circuits add 1–3 seconds per request; the default 30 s causes spurious timeouts. +- ``--retries 2`` — retries cover transient circuit failures, which are common on Tor. +- Optional ``-n 20`` — lowering concurrency (default 100) reduces the chance of exits rate-limiting you. + +If you mainly need to bypass WAFs (rather than to remain anonymous), a residential proxy will usually outperform Tor by a wide margin. 
See the **"Lots of sites fail / timeout / return 403"** section in `TROUBLESHOOTING.md `_. + +Running on Tails OS +------------------- + +Tails forces every outbound connection through Tor at the network layer. Maigret needs no configuration beyond ``--proxy`` — pointing it at the Tails Tor daemon is enough: + +.. code-block:: console + + maigret --proxy socks5://127.0.0.1:9050 --timeout 60 + +Things that are **not** needed: + +- ``torsocks maigret …`` and ``torify maigret …`` — these wrap libc socket calls, but Maigret's HTTP client (``aiohttp`` / ``curl_cffi``) bypasses libc for network I/O, so the wrapper has no effect. Use ``--proxy`` instead. +- ``--tor-proxy`` — on Tails, *everything* must go via Tor (the OS enforces this), so the niche "only .onion via Tor" mode that ``--tor-proxy`` provides does not apply. + +Installation over Tor on Tails +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +``pip`` itself does not know about Tor; on Tails you need ``torsocks`` to wrap it: + +.. code-block:: console + + torsocks pip install --user maigret + +After installation, the executable lands in ``~/.local/bin/maigret``. If the shell reports ``maigret: command not found``, either add ``~/.local/bin`` to ``PATH`` or invoke it as ``python3 -m maigret ``. + +Persisting Maigret across Tails sessions +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Tails wipes ``~/.local/`` on reboot unless you configure Persistent Storage to keep it. This is Tails configuration, not Maigret configuration — see the official Tails docs: + +- `Persistent Storage on Tails `_ +- `Configuring Persistent Storage features `_ + +A step-by-step recipe contributed by a user (persisting ``~/.local/lib/python3.9`` and ``~/.local/bin`` and patching ``.bashrc``) is in `issue #544 `_. Treat it as a starting point: the Python version and Tails internals change between Tails releases. 
+ +Reports on Tails — where to save them +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The default ``reports/`` directory lives next to the working directory and is wiped with the amnesiac session. To save reports somewhere persistent, either pass ``-fo``: + +.. code-block:: console + + maigret --html -fo "/home/amnesia/Persistent/maigret-reports" + +or set ``"reports_path"`` in your ``settings.json`` to a persistent path. See :doc:`settings`. + +Programmatic equivalents (Python library) +----------------------------------------- + +The same options are available through the Python API. See :doc:`library-usage` — the relevant keyword arguments are ``proxy=``, ``tor_proxy=`` and ``i2p_proxy=``, accepting the same URL formats as the CLI flags. + +See also +-------- + +- :doc:`command-line-options` — full reference for the three flags. +- `TROUBLESHOOTING.md `_ — quick recipes for ``.onion`` / I2P sites and for WAF-induced 403s. +- :doc:`library-usage` — proxy options for embedded use. diff --git a/maigret/resources/db_meta.json b/maigret/resources/db_meta.json index 8e6d5e2..e479cfb 100644 --- a/maigret/resources/db_meta.json +++ b/maigret/resources/db_meta.json @@ -1,6 +1,6 @@ { "version": 1, - "updated_at": "2026-05-15T18:46:56Z", + "updated_at": "2026-05-16T10:45:38Z", "sites_count": 3155, "min_maigret_version": "0.6.1", "data_sha256": "df2ab3dbc96bdcdc8aa4e9da485df75ce6c3274814080f00a35e89f7f43783e1",