Cloudflare bypass webgate (#2628)

Soxoj
2026-05-09 10:48:43 +03:00
committed by GitHub
parent b98a134fcf
commit 5c93b206e7
14 changed files with 1170 additions and 159 deletions
+13
@@ -268,6 +268,19 @@ maigret user --i2p-proxy http://127.0.0.1:4444
Start your Tor / I2P daemon before running the command — Maigret does not manage these gateways.
### Cloudflare bypass
> **Experimental.** The Cloudflare webgate is under active development; the configuration schema, CLI behaviour, and the set of routed sites may change without backwards-compatibility guarantees.
A subset of sites in the database require a real browser to solve a JavaScript challenge. Maigret can offload these checks to a local [FlareSolverr](https://github.com/FlareSolverr/FlareSolverr) instance:
```bash
docker run -d -p 8191:8191 --name flaresolverr ghcr.io/flaresolverr/flaresolverr:latest
maigret --cloudflare-bypass <username>
```
The bypass is opt-in (`--cloudflare-bypass` or `cloudflare_bypass.enabled` in `settings.json`) and only fires for sites whose `protection` field matches. See the [feature docs](https://maigret.readthedocs.io/en/latest/features.html#cloudflare-bypass) for backend options and configuration.
## Contributing
Add or fix new sites surgically in `data.json` (no `json.load`/`json.dump`), then run `./utils/update_site_data.py` to regenerate `sites.md` and the database metadata, and open a pull request. For more details, see the [CONTRIBUTING guide](https://github.com/soxoj/maigret/blob/main/CONTRIBUTING.md) and [development docs](https://maigret.readthedocs.io/en/latest/development.html). Release history: [CHANGELOG.md](CHANGELOG.md).
+11
@@ -95,6 +95,17 @@ the run after the explicit update finishes.
``--retries RETRIES`` - Count of attempts to restart temporarily failed
requests.
``--cloudflare-bypass`` *(experimental)* - Route checks for sites tagged
``protection: ["cf_js_challenge"]`` / ``["cf_firewall"]`` / ``["webgate"]``
through a local Chrome-based solver (FlareSolverr by default). The bypass
is opt-in — without this flag (or
``settings.cloudflare_bypass.enabled = true``) those sites are checked
the usual way, which Cloudflare almost always blocks: you get an UNKNOWN
status with a JS-challenge / firewall error rather than a real result.
Configure the backend in ``settings.cloudflare_bypass.modules``.
See :ref:`cloudflare-bypass`. **Experimental** — the flag, schema and
routing rules may change without backwards-compatibility guarantees.
.. _custom-database:
Using a custom sites database
+55
@@ -237,6 +237,61 @@ The Maigret database contains not only the original websites, but also mirrors,
It allows getting additional info about the person and checking the existence of the account even if the main site is unavailable (bot protection, captcha, etc.)
.. _cloudflare-bypass:
Cloudflare webgate bypass
-------------------------
.. warning::
**Experimental feature.** The Cloudflare webgate is under active
development. The configuration schema, CLI flag behaviour, and the set
of sites that route through it may change without backwards-compatibility
guarantees. Expect rough edges (CF rate limits, occasional solver
failures) and report issues so they can be ironed out.
Some sites sit behind a full Cloudflare JavaScript challenge or a CF firewall
hard block — these are tagged ``protection: ["cf_js_challenge"]`` or
``protection: ["cf_firewall"]`` in the database and are normally kept disabled
because neither aiohttp nor curl_cffi can solve the JS challenge on their own.
Maigret can offload these checks to a local Chrome-based solver. Two backends
are supported, configured in ``settings.json`` under
``cloudflare_bypass.modules`` (the first reachable module wins; subsequent
ones are tried as a fallback chain):
* **FlareSolverr** (recommended). Runs a real Chrome instance and exposes a
JSON API. The upstream HTTP status, headers and final URL are preserved, so
``checkType: status_code`` and ``checkType: response_url`` keep working
through the bypass.
.. code-block:: console
docker run -d -p 8191:8191 --name flaresolverr ghcr.io/flaresolverr/flaresolverr:latest
* **CloudflareBypassForScraping** (legacy fallback). Returns rendered HTML
only, so the upstream status code is lost — ``checkType: message`` keeps
working but ``status_code`` checks misfire (treated as 200 on success).
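As a rough illustration of the FlareSolverr (``json_api``) exchange described above, the sketch below builds the JSON body and unpacks the reply. The field names (``cmd``, ``url``, ``maxTimeout``, ``session``, ``solution.status``, ``solution.response``) follow the checker code in this change; the helper names ``build_request`` and ``parse_response`` are hypothetical and exist only for this example.

```python
from typing import Any, Dict, Optional, Tuple


def build_request(url: str, session: str, max_timeout_ms: int = 60000) -> Dict[str, Any]:
    """Build the JSON body for a FlareSolverr ``request.get`` call (hypothetical helper)."""
    return {
        "cmd": "request.get",
        "url": url,
        "maxTimeout": max_timeout_ms,
        "session": session,
    }


def parse_response(data: Dict[str, Any]) -> Tuple[Optional[str], int, Optional[str]]:
    """Return (body, upstream_status, error) from a FlareSolverr reply (hypothetical helper)."""
    if data.get("status") != "ok":
        # The solver reported a failure; surface its message as the error.
        return None, 0, data.get("message", "unknown")
    solution = data.get("solution") or {}
    # The upstream status survives, which is why status_code checks keep working.
    return solution.get("response") or "", int(solution.get("status") or 0), None
```

Because the upstream status is carried in ``solution.status``, a 404 from the target site stays a 404 after passing through the solver.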
Activate the bypass either with the CLI flag::
maigret --cloudflare-bypass <username>
or by setting ``cloudflare_bypass.enabled`` to ``true`` in ``settings.json``.
The bypass only fires for sites whose ``protection`` field intersects
``cloudflare_bypass.trigger_protection`` (default
``["cf_js_challenge", "cf_firewall", "webgate"]``); all other sites use the
normal aiohttp / curl_cffi path.
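The routing rule above amounts to a set-intersection test. A minimal sketch, assuming the default trigger list quoted in the text; ``needs_webgate`` here is a hypothetical standalone function, not Maigret's public API:

```python
from typing import Iterable, Sequence

# Default taken from the documentation above.
DEFAULT_TRIGGERS = ("cf_js_challenge", "cf_firewall", "webgate")


def needs_webgate(
    site_protection: Iterable[str],
    bypass_enabled: bool,
    triggers: Sequence[str] = DEFAULT_TRIGGERS,
) -> bool:
    """True when the bypass is on and the site's protection matches a trigger."""
    return bypass_enabled and any(p in triggers for p in site_protection)
```

A site tagged only ``tls_fingerprint`` therefore never reaches the solver, even with the bypass enabled.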
If all configured modules are unreachable, affected sites get an UNKNOWN
status with an actionable error pointing at the first module's URL — the
fix is almost always to start the FlareSolverr container.
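The fallback behaviour can be pictured with a small synchronous sketch. It assumes each module is reduced to a ``(name, solve)`` pair whose callable returns ``(body, status, error)``; the real checker is asynchronous, and ``run_chain`` is a hypothetical name used only here.

```python
from typing import Callable, List, Optional, Tuple

Result = Tuple[Optional[str], int, Optional[str]]  # (body, status, error)


def run_chain(modules: List[Tuple[str, Callable[[], Result]]]) -> Result:
    """Try modules in order; the first one that succeeds wins."""
    attempts: List[str] = []
    for name, solve in modules:
        body, status, err = solve()
        if err is None:
            return body, status, None
        # Remember why this module failed and move on to the next one.
        attempts.append(f"{name}:{err}")
    # Every module failed: return a single summarising error.
    return None, 0, f"all {len(modules)} module(s) failed [{', '.join(attempts)}]"
```

Collecting the per-module failures into one message is what lets the final error point the user at the most likely fix.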
FlareSolverr session reuse is automatic: Maigret derives one session per
target host and run, named ``<session_prefix>-<pid>-<host>``, so cf_clearance
cookies are shared between checks of the same domain (5-10× faster on
subsequent requests to that host).
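In the checker code in this change, the session ID is derived from the session prefix, the process PID and the target host. A minimal standalone sketch of that derivation follows; the function name and its ``pid`` parameter are assumptions made so the example can be tested:

```python
import os
import re
from typing import Optional
from urllib.parse import urlparse


def session_id(url: str, prefix: str = "maigret", pid: Optional[int] = None) -> str:
    """Build a per-host FlareSolverr session ID (illustrative helper)."""
    pid = os.getpid() if pid is None else pid
    host = urlparse(url or "").hostname or "default"
    # Replace anything outside letters, digits, dots and hyphens.
    host_safe = re.sub(r"[^a-zA-Z0-9.-]", "_", host)
    return f"{prefix}-{pid}-{host_safe}"
```

Two probes against the same host map to the same session, which is what allows the cf_clearance cookie to be reused.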
Activation
----------
The activation mechanism helps make requests to sites requiring additional authentication like cookies, JWT tokens, or custom headers.
+22
@@ -125,3 +125,25 @@ After installing the system dependencies, retry the maigret installation.
If you continue to have issues, consider using Docker instead, which includes all
necessary dependencies.
Optional: Cloudflare bypass solver
----------------------------------
.. warning::
**Experimental.** The Cloudflare webgate is under active development;
the configuration schema and CLI behaviour may change without
backwards-compatibility guarantees.
Sites tagged ``cf_js_challenge`` / ``cf_firewall`` need a real browser to pass
their JavaScript challenge. To check those sites you can run a local
`FlareSolverr <https://github.com/FlareSolverr/FlareSolverr>`_ instance —
Maigret will route protected checks to it when ``--cloudflare-bypass`` is set:
.. code-block:: bash
docker run -d -p 8191:8191 --name flaresolverr ghcr.io/flaresolverr/flaresolverr:latest
This is **optional** — Maigret runs without it; only sites whose
``protection`` field intersects ``settings.cloudflare_bypass.trigger_protection``
require the solver. See :ref:`cloudflare-bypass` for details.
+89
@@ -102,6 +102,95 @@ This is recommended for **Docker containers**, **CI pipelines**, and **air-gappe
**Using a custom database** with ``--db`` always skips auto-update — you are explicitly choosing your data source.
Cloudflare webgate
------------------
.. warning::
**Experimental.** The ``cloudflare_bypass`` block is under active
development; field names, defaults, and the trigger-protection routing
rules may change without backwards-compatibility guarantees.
The ``cloudflare_bypass`` block in ``settings.json`` configures the optional
bypass described in :ref:`cloudflare-bypass`. Default value:
.. code-block:: json
{
"cloudflare_bypass": {
"enabled": false,
"session_prefix": "maigret",
"trigger_protection": ["cf_js_challenge", "cf_firewall", "webgate"],
"modules": [
{
"name": "flaresolverr",
"method": "json_api",
"url": "http://localhost:8191/v1",
"max_timeout_ms": 60000
},
{
"name": "chrome_webgate",
"method": "url_rewrite",
"url": "http://localhost:8000/html?url={url}&retries=1"
}
]
}
}
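How this block is turned into an active configuration can be sketched as follows. The logic mirrors ``build_cloudflare_bypass_config`` from this change, but takes a plain dict instead of a settings object; ``resolve_bypass`` is a hypothetical name for the example.

```python
from typing import Any, Dict, List, Optional

DEFAULT_TRIGGERS = ["cf_js_challenge", "cf_firewall", "webgate"]


def resolve_bypass(raw: Dict[str, Any], force_enable: bool = False) -> Optional[Dict[str, Any]]:
    """Return the active bypass config, or None when the bypass is off or unusable."""
    if not (force_enable or raw.get("enabled", False)):
        return None
    modules: List[Dict[str, Any]] = []
    for module in raw.get("modules") or []:
        method, url = module.get("method"), module.get("url")
        if method == "json_api" and url:
            modules.append(dict(module))
        elif method == "url_rewrite" and url and "{url}" in url:
            # url_rewrite modules are only usable with a {url} placeholder.
            modules.append(dict(module))
    if not modules:
        return None
    return {
        "trigger_protection": list(raw.get("trigger_protection") or DEFAULT_TRIGGERS),
        "modules": modules,
        "session_prefix": raw.get("session_prefix", "maigret"),
    }
```

Modules that fail validation are silently dropped, so a config whose only module lacks a ``url`` behaves as if the bypass were disabled.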
**Fields.**
.. list-table::
:header-rows: 1
:widths: 30 70
* - Field
- Description
* - ``enabled``
- When ``true``, the bypass is active for every run; when ``false``
(the default), it activates only on ``--cloudflare-bypass``.
* - ``trigger_protection``
- List of ``site.protection`` values that route a check through the
webgate. Sites whose protection is empty or doesn't intersect this
list use the default (aiohttp / curl_cffi) checker.
* - ``session_prefix``
- Prefix for the FlareSolverr ``session`` field. Maigret appends the
process PID (so concurrent runs don't collide) and the target host.
Reusing a session caches cf_clearance between checks of the same domain.
* - ``modules``
- Ordered list of backend modules. The first reachable module
handles the check; later ones serve as a fallback chain.
**Module methods.**
* ``json_api`` — FlareSolverr-compatible POST endpoint at ``url``.
Preserves real upstream HTTP status, headers and final URL.
Optional ``max_timeout_ms`` (default ``60000``) is the per-request
budget the solver is allowed to spend on the JS challenge.
* ``url_rewrite`` — legacy CloudflareBypassForScraping endpoint. The
``url`` must contain a ``{url}`` placeholder; the original probe URL
is URL-encoded and substituted in. Returns rendered HTML only —
``checkType: status_code`` and ``response_url`` checks misfire under
this method (treated as a synthetic HTTP 200 on success).
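The ``url_rewrite`` substitution can be illustrated with the standard library alone. This sketch assumes the template shape from the default settings; ``rewrite_url`` is a hypothetical helper name:

```python
from urllib.parse import quote_plus


def rewrite_url(template: str, probe_url: str) -> str:
    """Substitute the URL-encoded probe URL into the module's template."""
    if "{url}" not in template:
        raise ValueError("template has no {url} placeholder")
    return template.format(url=quote_plus(probe_url))


# Example: the probe URL is encoded so its own '?' and '=' do not clash
# with the proxy endpoint's query string.
print(rewrite_url(
    "http://localhost:8000/html?url={url}&retries=1",
    "https://example.com/u/alice?x=1",
))
```

Encoding the whole probe URL keeps its query parameters from being interpreted by the proxy endpoint itself.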
**Optional ``proxy`` field (``json_api`` only).**
A module may carry a ``proxy`` entry that the solver routes the upstream
request through. Useful when a site enforces ``ip_reputation`` rules
that block the solver host. Two forms are accepted:
.. code-block:: json
{ "proxy": "socks5://localhost:1080" }
.. code-block:: json
{ "proxy": { "url": "http://gw.example:3128",
"username": "u",
"password": "p" } }
Only ``url``/``username``/``password`` are forwarded; other keys are
dropped. Cloudflare ``Error 1015 / 1020`` responses indicate the IP is
rate-limited or banned — switch the proxy rather than retrying.
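The two accepted ``proxy`` forms normalise to the same shape before being forwarded. A minimal sketch of that normalisation, based on the checker code in this change; ``normalize_proxy`` is a hypothetical name:

```python
from typing import Any, Dict, Optional, Union

ALLOWED_KEYS = ("url", "username", "password")


def normalize_proxy(proxy: Union[str, Dict[str, Any], None]) -> Optional[Dict[str, Any]]:
    """Turn either proxy form into the dict forwarded to the solver."""
    if isinstance(proxy, str) and proxy:
        return {"url": proxy}
    if isinstance(proxy, dict) and proxy.get("url"):
        # Forward only the allowed keys; anything else is dropped.
        return {k: v for k, v in proxy.items() if k in ALLOWED_KEYS}
    return None
```

A dict without a ``url`` key is treated the same as no proxy at all.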
.. _ai-analysis-settings:
AI analysis
+287 -3
@@ -2,6 +2,7 @@
import ast
import asyncio
import logging
import os
import random
import re
import ssl
@@ -48,6 +49,53 @@ SUPPORTED_IDS = (
BAD_CHARS = "#"
def build_cloudflare_bypass_config(
settings_obj: Optional[Any], force_enable: bool = False
) -> Optional[Dict[str, Any]]:
"""Resolve Cloudflare webgate config from settings + CLI flag.
Returns ``None`` when bypass is inactive or no usable module is configured.
Otherwise returns a dict consumed by ``CloudflareWebgateChecker``:
- ``trigger_protection``: list of ``site.protection`` values that
activate the bypass (e.g. ``["cf_js_challenge", "cf_firewall", "webgate"]``)
- ``modules``: ordered list of backend modules to try; each entry has
``name``, ``method`` (``json_api`` for FlareSolverr, ``url_rewrite``
for CloudflareBypassForScraping), and a method-specific ``url`` plus
optional ``max_timeout_ms``.
- ``session_prefix``: prefix for FlareSolverr session reuse.
"""
raw = {}
if settings_obj is not None:
raw = getattr(settings_obj, "cloudflare_bypass", {}) or {}
enabled = bool(force_enable) or bool(raw.get("enabled", False))
if not enabled:
return None
modules_raw = raw.get("modules") or []
valid_modules: List[Dict[str, Any]] = []
for module in modules_raw:
method = module.get("method")
url = module.get("url")
if method == "json_api" and url:
valid_modules.append(dict(module))
elif method == "url_rewrite" and url and "{url}" in url:
valid_modules.append(dict(module))
if not valid_modules:
return None
trigger = raw.get("trigger_protection") or [
"cf_js_challenge",
"cf_firewall",
"webgate",
]
return {
"trigger_protection": list(trigger),
"modules": valid_modules,
"session_prefix": raw.get("session_prefix", "maigret"),
}
class CheckerBase:
pass
@@ -287,6 +335,221 @@ class CurlCffiChecker(CheckerBase):
return None, 0, CheckError("Unexpected", str(e))
class CloudflareWebgateChecker(CheckerBase):
"""Sends checks through a Cloudflare-bypass proxy.
Supports two backends, selected per module by ``method`` in settings
(modules are tried in order as a fallback chain):
- ``json_api`` (FlareSolverr): POST to ``/v1`` with ``cmd: request.get``.
Preserves real upstream status_code, headers and final URL — drop-in
replacement for SimpleAiohttpChecker.
- ``url_rewrite`` (CloudflareBypassForScraping ``/html`` endpoint):
legacy mode. Returns rendered HTML only. Real upstream status is
lost (proxy answers 200 on success). status_code / response_url
check types degrade to "200 if HTML returned, AVAILABLE otherwise".
"""
SESSION_PREFIX_DEFAULT = "maigret"
def __init__(self, *args, **kwargs):
self.logger = kwargs.get('logger', Mock())
config = kwargs.get('config') or {}
self._modules: List[Dict[str, Any]] = []
for raw in config.get('modules') or []:
module = dict(raw)
module.setdefault('method', 'json_api')
module.setdefault('name', module.get('method'))
self._modules.append(module)
if not self._modules:
raise ValueError("CloudflareWebgateChecker requires at least one module")
# Session ID is computed per-request from the target host. Sharing a
# single session across hosts caused FlareSolverr to break in
# practice (TLS state / cookies leaking between domains), so each
# host gets its own Chrome instance.
self._session_prefix = (
f"{config.get('session_prefix', self.SESSION_PREFIX_DEFAULT)}-{os.getpid()}"
)
self.url = None
self.headers = None
self.allow_redirects = True
self.timeout = 0
self.method = 'get'
self.payload = None
@property
def session_id(self) -> str:
"""FlareSolverr session ID, scoped per target host."""
from urllib.parse import urlparse
host = urlparse(self.url or "").hostname or "default"
host_safe = re.sub(r"[^a-zA-Z0-9.-]", "_", host)
return f"{self._session_prefix}-{host_safe}"
def prepare(self, url, headers=None, allow_redirects=True, timeout=0, method='get', payload=None):
self.url = url
self.headers = headers or {}
self.allow_redirects = allow_redirects
self.timeout = timeout
self.method = method
self.payload = payload
return None
async def close(self):
pass
async def check(self) -> Tuple[Optional[str], int, Optional[CheckError]]:
attempts: List[str] = []
last_error: Optional[CheckError] = None
for module in self._modules:
method = module.get('method')
module_name = module.get('name', method or '?')
if method == 'json_api':
result = await self._check_flaresolverr(module)
elif method == 'url_rewrite':
result = await self._check_url_rewrite(module)
else:
self.logger.warning(
f"Webgate module '{module_name}' has unknown method "
f"'{method}', skipping"
)
attempts.append(f"{module_name}:unknown-method")
continue
body, status, err = result
if err is None:
return result
last_error = err
attempts.append(f"{module_name}:{err.type}")
self.logger.info(
f"Webgate module '{module_name}' failed for {self.url}: "
f"{err.type}: {err.desc}. Trying next module if any."
)
# All modules failed. Give the user a single, actionable error with
# the first module's URL — that's almost always FlareSolverr, and
# the most common failure is "user forgot to start the container".
primary = self._modules[0]
primary_url = primary.get('url', '?')
primary_method = primary.get('method', '?')
hint = (
"docker run -d -p 8191:8191 ghcr.io/flaresolverr/flaresolverr:latest"
if primary_method == 'json_api'
else "start the local proxy container"
)
last_desc = last_error.desc if last_error else "unknown"
return None, 0, CheckError(
"Webgate unavailable",
f"all {len(self._modules)} module(s) failed [{', '.join(attempts)}]. "
f"Last error: {last_desc}. "
f"Is the solver running at {primary_url}? (hint: {hint})",
)
async def _check_flaresolverr(
self, module: Dict[str, Any]
) -> Tuple[Optional[str], int, Optional[CheckError]]:
endpoint = module.get('url') or 'http://localhost:8191/v1'
max_timeout_ms = int(module.get('max_timeout_ms', 60000))
post_method = self.method.lower() == 'post'
cmd = "request.post" if post_method else "request.get"
body: Dict[str, Any] = {
"cmd": cmd,
"url": self.url,
"maxTimeout": max_timeout_ms,
"session": self.session_id,
}
proxy = module.get('proxy')
if isinstance(proxy, str) and proxy:
body["proxy"] = {"url": proxy}
elif isinstance(proxy, dict) and proxy.get("url"):
body["proxy"] = {k: v for k, v in proxy.items() if k in ("url", "username", "password")}
if post_method and self.payload is not None:
# FlareSolverr expects postData as urlencoded string for form data,
# but if site.request_payload is JSON we still send it.
body["postData"] = (
"&".join(f"{k}={quote(str(v))}" for k, v in self.payload.items())
)
timeout = max(int(self.timeout) if self.timeout else 30, max_timeout_ms / 1000 + 5)
try:
async with ClientSession() as session:
async with session.post(
endpoint, json=body, timeout=timeout
) as resp:
if resp.status >= 500:
return None, 0, CheckError(
"Webgate", f"FlareSolverr {resp.status}"
)
data = await resp.json()
except (ClientConnectorError, ServerDisconnectedError) as e:
return None, 0, CheckError("Webgate unreachable", str(e))
except asyncio.TimeoutError:
return None, 0, CheckError("Webgate timeout", endpoint)
except Exception as e:
self.logger.debug(e, exc_info=True)
return None, 0, CheckError("Webgate", str(e))
if data.get("status") != "ok":
return None, 0, CheckError("Webgate", data.get("message", "unknown"))
solution = data.get("solution") or {}
upstream_status = int(solution.get("status") or 0)
response_text = solution.get("response") or ""
# Diagnostic: warn if FlareSolverr returned the CF challenge page
# itself (challenge not fully solved) rather than the real content.
# When this happens with sites that have weak presenseStrs/absenceStrs,
# maigret's default-true presence rule produces false CLAIMED.
cf_markers = ("Just a moment", "_cf_chl_opt", "cf-mitigated", "challenges.cloudflare.com")
if response_text and any(m in response_text for m in cf_markers):
self.logger.warning(
f"Webgate response from {self.url} still contains CF challenge "
f"markers (status={upstream_status}, body={len(response_text)}b). "
f"FlareSolverr likely did not solve the challenge — site checks "
f"with weak markers may produce false CLAIMED."
)
self.logger.info(
f"Webgate response: url={self.url} status={upstream_status} "
f"body_len={len(response_text)}"
)
return response_text, upstream_status, None
async def _check_url_rewrite(
self, module: Dict[str, Any]
) -> Tuple[Optional[str], int, Optional[CheckError]]:
url_template = module.get('url') or ''
if "{url}" not in url_template:
return None, 0, CheckError(
"Webgate", f"module '{module.get('name')}' url has no {{url}} placeholder"
)
from urllib.parse import quote_plus
proxy_url = url_template.format(url=quote_plus(self.url))
timeout = self.timeout if self.timeout else 30
try:
async with ClientSession() as session:
async with session.get(proxy_url, timeout=timeout) as resp:
if resp.status >= 500:
return None, 0, CheckError(
"Webgate", f"url_rewrite proxy {resp.status}"
)
body = await resp.text()
except (ClientConnectorError, ServerDisconnectedError) as e:
return None, 0, CheckError("Webgate unreachable", str(e))
except asyncio.TimeoutError:
return None, 0, CheckError("Webgate timeout", proxy_url)
except Exception as e:
self.logger.debug(e, exc_info=True)
return None, 0, CheckError("Webgate", str(e))
# url_rewrite mode CANNOT recover the upstream HTTP status.
# We assume 200 when HTML is returned; status_code/response_url
# check types will misfire (see docs).
return body, 200, None
class CheckerMock:
def __init__(self, *args, **kwargs):
pass
@@ -547,9 +810,24 @@ def make_site_result(
# workaround to prevent slash errors
url = re.sub("(?<!:)/+", "/", url)
# Select checker: use curl_cffi for sites requiring TLS impersonation
# Select checker. Order of precedence:
# 1. Cloudflare webgate (FlareSolverr / CloudflareBypassForScraping) when
# bypass is active and site.protection requests it.
# 2. curl_cffi for sites requiring TLS impersonation.
# 3. Default protocol-specific checker (aiohttp).
cf_bypass = options.get("cloudflare_bypass")
needs_webgate = bool(cf_bypass) and any(
p in cf_bypass["trigger_protection"] for p in site.protection
)
needs_impersonation = 'tls_fingerprint' in site.protection
if needs_impersonation and CURL_CFFI_AVAILABLE:
if needs_webgate:
checker = CloudflareWebgateChecker(logger=logger, config=cf_bypass)
logger.info(
f"Using Cloudflare webgate for {site.name} "
f"(protection: {list(site.protection)})"
)
elif needs_impersonation and CURL_CFFI_AVAILABLE:
checker = CurlCffiChecker(logger=logger, browser_emulate='chrome')
elif needs_impersonation and not CURL_CFFI_AVAILABLE:
logger.warning(
@@ -761,6 +1039,7 @@ async def maigret(
cookies=None,
retries=0,
check_domains=False,
cloudflare_bypass: Optional[Dict[str, Any]] = None,
*args,
**kwargs,
) -> QueryResultWrapper:
@@ -859,6 +1138,7 @@ async def maigret(
options["timeout"] = timeout
options["id_type"] = id_type
options["forced"] = forced
options["cloudflare_bypass"] = cloudflare_bypass
# results from analysis of all sites
all_results: Dict[str, QueryResultWrapper] = {}
@@ -962,6 +1242,7 @@ async def site_self_check(
cookies=None,
auto_disable=False,
diagnose=False,
cloudflare_bypass: Optional[Dict[str, Any]] = None,
):
"""
Self-check a site configuration.
@@ -1002,6 +1283,7 @@ async def site_self_check(
tor_proxy=tor_proxy,
i2p_proxy=i2p_proxy,
cookies=cookies,
cloudflare_bypass=cloudflare_bypass,
)
# don't disable entries with other ids types
@@ -1130,6 +1412,7 @@ async def self_check(
auto_disable=False,
diagnose=False,
no_progressbar=False,
cloudflare_bypass: Optional[Dict[str, Any]] = None,
) -> dict:
"""
Run self-check on sites.
@@ -1158,7 +1441,8 @@ async def self_check(
for _, site in all_sites.items():
check_coro = site_self_check(
site, logger, sem, db, silent, proxy, tor_proxy, i2p_proxy,
skip_errors=True, auto_disable=auto_disable, diagnose=diagnose
skip_errors=True, auto_disable=auto_disable, diagnose=diagnose,
cloudflare_bypass=cloudflare_bypass,
)
future = asyncio.ensure_future(check_coro)
tasks.append((site.name, future))
+24
@@ -34,6 +34,7 @@ from .checking import (
self_check,
BAD_CHARS,
maigret,
build_cloudflare_bypass_config,
)
from . import errors
from .notify import QueryNotifyPrint
@@ -281,6 +282,13 @@ def setup_arguments_parser(settings: Settings):
default=settings.domain_search,
help="Enable (experimental) feature of checking domains on usernames.",
)
parser.add_argument(
"--cloudflare-bypass",
action="store_true",
default=False,
help="Enable Cloudflare webgate bypass for sites with protection cf_js_challenge / cf_firewall / webgate. "
"Requires a local solver, FlareSolverr by default (see settings.json -> cloudflare_bypass.modules).",
)
filter_group = parser.add_argument_group(
'Site filtering', 'Options to set site search scope'
@@ -552,6 +560,20 @@ async def main():
arg_parser = setup_arguments_parser(settings)
args = arg_parser.parse_args()
# Resolve Cloudflare webgate config (CLI flag OR settings.cloudflare_bypass.enabled)
cf_bypass_config = build_cloudflare_bypass_config(
settings, force_enable=args.cloudflare_bypass
)
if cf_bypass_config:
modules_summary = ", ".join(
f"{m.get('name', m.get('method'))}({m.get('url')})"
for m in cf_bypass_config["modules"]
)
logger.info(
f"Cloudflare webgate active: triggers={cf_bypass_config['trigger_protection']}, "
f"modules=[{modules_summary}]"
)
# Re-set logging level based on args
if args.debug:
log_level = logging.DEBUG
@@ -682,6 +704,7 @@ async def main():
auto_disable=args.auto_disable,
diagnose=args.diagnose,
no_progressbar=args.no_progressbar,
cloudflare_bypass=cf_bypass_config,
)
is_need_update = check_result.get('needs_update', False)
@@ -816,6 +839,7 @@ async def main():
no_progressbar=args.no_progressbar,
retries=args.retries,
check_domains=args.with_domains,
cloudflare_bypass=cf_bypass_config,
)
if not args.ai:
File diff suppressed because it is too large
+21 -1
@@ -61,5 +61,25 @@
"web_interface_port": 5000,
"no_autoupdate": false,
"db_update_meta_url": "https://raw.githubusercontent.com/soxoj/maigret/main/maigret/resources/db_meta.json",
"autoupdate_check_interval_hours": 24
"autoupdate_check_interval_hours": 24,
"cloudflare_bypass": {
"enabled": false,
"session_prefix": "maigret",
"trigger_protection": ["cf_js_challenge", "cf_firewall", "webgate"],
"modules": [
{
"name": "flaresolverr",
"method": "json_api",
"url": "http://localhost:8191/v1",
"max_timeout_ms": 60000,
"comment": "FlareSolverr (https://github.com/FlareSolverr/FlareSolverr). docker run -d -p 8191:8191 ghcr.io/flaresolverr/flaresolverr:latest"
},
{
"name": "chrome_webgate",
"method": "url_rewrite",
"url": "http://localhost:8000/html?url={url}&retries=1",
"comment": "CloudflareBypassForScraping fallback. WARNING: returns rendered HTML only — checkType: status_code and response_url misfire."
}
]
}
}
+1
@@ -47,6 +47,7 @@ class Settings:
no_autoupdate: bool
db_update_meta_url: str
autoupdate_check_interval_hours: int
cloudflare_bypass: dict
# submit mode settings
presence_strings: list
+1
@@ -113,6 +113,7 @@ class Submitter:
cookies=self.args.cookie_file,
# Don't skip errors in submit mode: we need to check both false positives/true negatives
skip_errors=False,
cloudflare_bypass=getattr(self, 'cloudflare_bypass', None),
)
return changes
+37 -37
@@ -100,7 +100,7 @@ Rank data fetched from Majestic Million by domains.
1. ![](https://www.google.com/s2/favicons?domain=https://www.op.gg/) [OP.GG LoL Vietnam (https://www.op.gg/)](https://www.op.gg/)*: top 500, gaming, vn*
1. ![](https://www.google.com/s2/favicons?domain=https://www.op.gg/) [OP.GG LoL Thailand (https://www.op.gg/)](https://www.op.gg/)*: top 500, gaming, th*
1. ![](https://www.google.com/s2/favicons?domain=https://www.xing.com/) [Xing (https://www.xing.com/)](https://www.xing.com/)*: top 500, de, eu*
1. ![](https://www.google.com/s2/favicons?domain=https://www.patreon.com/) [Patreon (https://www.patreon.com/)](https://www.patreon.com/)*: top 500, finance*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://www.patreon.com/) [Patreon (https://www.patreon.com/)](https://www.patreon.com/)*: top 500, finance*
1. ![](https://www.google.com/s2/favicons?domain=https://deviantart.com) [DeviantART (https://deviantart.com)](https://deviantart.com)*: top 500, art, photo*
1. ![](https://www.google.com/s2/favicons?domain=https://www.gofundme.com) [Gofundme (https://www.gofundme.com)](https://www.gofundme.com)*: top 500, finance*
1. ![](https://www.google.com/s2/favicons?domain=https://www.zhihu.com/) [Zhihu (https://www.zhihu.com/)](https://www.zhihu.com/)*: top 500, cn*, search is disabled
@@ -170,7 +170,7 @@ Rank data fetched from Majestic Million by domains.
1. ![](https://www.google.com/s2/favicons?domain=https://www.liveinternet.ru) [LiveInternet (https://www.liveinternet.ru)](https://www.liveinternet.ru)*: top 5K, ru*
1. ![](https://www.google.com/s2/favicons?domain=https://www.buymeacoffee.com/) [BuyMeACoffee (https://www.buymeacoffee.com/)](https://www.buymeacoffee.com/)*: top 5K, freelance*
1. ![](https://www.google.com/s2/favicons?domain=https://gitea.com/) [Gitea (https://gitea.com/)](https://gitea.com/)*: top 5K, coding*
1. ![](https://www.google.com/s2/favicons?domain=https://genius.com/) [Genius (https://genius.com/)](https://genius.com/)*: top 5K, music*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://genius.com/) [Genius (https://genius.com/)](https://genius.com/)*: top 5K, music*
1. ![](https://www.google.com/s2/favicons?domain=https://www.techrepublic.com) [Techrepublic (https://www.techrepublic.com)](https://www.techrepublic.com)*: top 5K, news, tech*
1. ![](https://www.google.com/s2/favicons?domain=https://hubpages.com/) [HubPages (https://hubpages.com/)](https://hubpages.com/)*: top 5K, blog*
1. ![](https://www.google.com/s2/favicons?domain=https://www.artstation.com) [Artstation (https://www.artstation.com)](https://www.artstation.com)*: top 5K, art, stock*
@@ -182,7 +182,7 @@ Rank data fetched from Majestic Million by domains.
1. ![](https://www.google.com/s2/favicons?domain=https://www.alltrails.com/) [AllTrails (https://www.alltrails.com/)](https://www.alltrails.com/)*: top 5K, sport, travel*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://habr.com/) [Habr (https://habr.com/)](https://habr.com/)*: top 5K, blog, discussion, ru*
1. ![](https://www.google.com/s2/favicons?domain=https://www.allrecipes.com/) [AllRecipes (https://www.allrecipes.com/)](https://www.allrecipes.com/)*: top 5K, hobby*
1. ![](https://www.google.com/s2/favicons?domain=https://www.redbubble.com/) [Redbubble (https://www.redbubble.com/)](https://www.redbubble.com/)*: top 5K, shopping*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://www.redbubble.com/) [Redbubble (https://www.redbubble.com/)](https://www.redbubble.com/)*: top 5K, shopping*
1. ![](https://www.google.com/s2/favicons?domain=https://www.diigo.com/) [Diigo (https://www.diigo.com/)](https://www.diigo.com/)*: top 5K, bookmarks*
1. ![](https://www.google.com/s2/favicons?domain=https://windy.com/) [Windy (https://windy.com/)](https://windy.com/)*: top 5K, maps*
1. ![](https://www.google.com/s2/favicons?domain=https://codecanyon.net) [Codecanyon (https://codecanyon.net)](https://codecanyon.net)*: top 5K, coding, shopping*
@@ -270,7 +270,7 @@ Rank data fetched from Majestic Million by domains.
1. ![](https://www.google.com/s2/favicons?domain=https://hackaday.io/) [Hackaday (https://hackaday.io/)](https://hackaday.io/)*: top 5K, hobby, tech*
1. ![](https://www.google.com/s2/favicons?domain=https://www.animenewsnetwork.com) [AnimeNewsNetwork (https://www.animenewsnetwork.com)](https://www.animenewsnetwork.com)*: top 5K, anime, news*
1. ![](https://www.google.com/s2/favicons?domain=https://www.librarything.com/) [LibraryThing (https://www.librarything.com/)](https://www.librarything.com/)*: top 5K, books*
1. ![](https://www.google.com/s2/favicons?domain=https://www.fodors.com) [Fodors (https://www.fodors.com)](https://www.fodors.com)*: top 5K, travel*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://www.fodors.com) [Fodors (https://www.fodors.com)](https://www.fodors.com)*: top 5K, travel*
1. ![](https://www.google.com/s2/favicons?domain=https://99designs.com) [Designs99 (https://99designs.com)](https://99designs.com)*: top 5K, design, photo*
1. ![](https://www.google.com/s2/favicons?domain=https://www.pscp.tv) [Periscope (https://www.pscp.tv)](https://www.pscp.tv)*: top 5K, streaming, video*
1. ![](https://www.google.com/s2/favicons?domain=https://freesound.org/) [Freesound (https://freesound.org/)](https://freesound.org/)*: top 5K, music*
@@ -415,13 +415,13 @@ Rank data fetched from Majestic Million by domains.
1. ![](https://www.google.com/s2/favicons?domain=https://www.thestudentroom.co.uk) [TheStudentRoom (https://www.thestudentroom.co.uk)](https://www.thestudentroom.co.uk)*: top 100K, forum, gb*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://www.codementor.io/) [Codementor (https://www.codementor.io/)](https://www.codementor.io/)*: top 100K, coding*
1. ![](https://www.google.com/s2/favicons?domain=https://n4g.com/) [N4g (https://n4g.com/)](https://n4g.com/)*: top 100K, gaming, news*
1. ![](https://www.google.com/s2/favicons?domain=https://www.lomography.com) [Lomography (https://www.lomography.com)](https://www.lomography.com)*: top 100K, photo*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://www.lomography.com) [Lomography (https://www.lomography.com)](https://www.lomography.com)*: top 100K, photo*
1. ![](https://www.google.com/s2/favicons?domain=https://pixelfed.social/) [pixelfed.social (https://pixelfed.social/)](https://pixelfed.social/)*: top 100K, art, photo*
1. ![](https://www.google.com/s2/favicons?domain=https://www.hackerearth.com) [Hackerearth (https://www.hackerearth.com)](https://www.hackerearth.com)*: top 100K, freelance*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://weedmaps.com) [Weedmaps (https://weedmaps.com)](https://weedmaps.com)*: top 100K, us*
1. ![](https://www.google.com/s2/favicons?domain=https://www.redtube.com/) [Redtube (https://www.redtube.com/)](https://www.redtube.com/)*: top 100K, porn*
1. ![](https://www.google.com/s2/favicons?domain=https://www.neoseeker.com) [Neoseeker (https://www.neoseeker.com)](https://www.neoseeker.com)*: top 100K, forum, gaming*
1. ![](https://www.google.com/s2/favicons?domain=https://liberapay.com) [Liberapay (https://liberapay.com)](https://liberapay.com)*: top 100K, finance*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://liberapay.com) [Liberapay (https://liberapay.com)](https://liberapay.com)*: top 100K, finance*
1. ![](https://www.google.com/s2/favicons?domain=https://www.sythe.org) [Sythe (https://www.sythe.org)](https://www.sythe.org)*: top 100K, forum*
1. ![](https://www.google.com/s2/favicons?domain=https://www.filmweb.pl/user/adam) [FilmWeb (https://www.filmweb.pl/user/adam)](https://www.filmweb.pl/user/adam)*: top 100K, movies, pl*
1. ![](https://www.google.com/s2/favicons?domain=https://listal.com/) [Listal (https://listal.com/)](https://listal.com/)*: top 100K, movies, music*
@@ -430,7 +430,7 @@ Rank data fetched from Majestic Million by domains.
1. ![](https://www.google.com/s2/favicons?domain=https://www.spatial.io) [Spatial (https://www.spatial.io)](https://www.spatial.io)*: top 100K, crypto, gaming*
1. ![](https://www.google.com/s2/favicons?domain=https://www.nn.ru/) [NN.RU (https://www.nn.ru/)](https://www.nn.ru/)*: top 100K, ru*
1. ![](https://www.google.com/s2/favicons?domain=https://paragraph.com) [Paragraph (https://paragraph.com)](https://paragraph.com)*: top 100K, blog, crypto*
1. ![](https://www.google.com/s2/favicons?domain=https://www.huntingnet.com) [Huntingnet (https://www.huntingnet.com)](https://www.huntingnet.com)*: top 100K, us*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://www.huntingnet.com) [Huntingnet (https://www.huntingnet.com)](https://www.huntingnet.com)*: top 100K, us*
1. ![](https://www.google.com/s2/favicons?domain=https://telescope.ac) [telescope.ac (https://telescope.ac)](https://telescope.ac)*: top 100K, blog*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://chaos.social/) [chaos.social (https://chaos.social/)](https://chaos.social/)*: top 100K, social*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://chaos.social/) [mastodon.social (https://chaos.social/)](https://chaos.social/)*: top 100K, social*
@@ -522,7 +522,7 @@ Rank data fetched from Majestic Million by domains.
1. ![](https://www.google.com/s2/favicons?domain=https://mastodon.cloud/) [mastodon.cloud (https://mastodon.cloud/)](https://mastodon.cloud/)*: top 100K, pk*
1. ![](https://www.google.com/s2/favicons?domain=https://1x.com) [1x (https://1x.com)](https://1x.com)*: top 100K, photo*
1. ![](https://www.google.com/s2/favicons?domain=https://www.patientslikeme.com) [PatientsLikeMe (https://www.patientslikeme.com)](https://www.patientslikeme.com)*: top 100K, medicine, us*
1. ![](https://www.google.com/s2/favicons?domain=https://www.picuki.com/) [Picuki (https://www.picuki.com/)](https://www.picuki.com/)*: top 100K, photo*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://www.tikvib.com/) [Picuki (https://www.tikvib.com/)](https://www.tikvib.com/)*: top 100K, video*
1. ![](https://www.google.com/s2/favicons?domain=https://www.pokecommunity.com) [Pokecommunity (https://www.pokecommunity.com)](https://www.pokecommunity.com)*: top 100K, forum, gaming*
1. ![](https://www.google.com/s2/favicons?domain=https://eintracht.de) [Eintracht (https://eintracht.de)](https://eintracht.de)*: top 100K, tr*
1. ![](https://www.google.com/s2/favicons?domain=https://www.datpiff.com) [Datpiff (https://www.datpiff.com)](https://www.datpiff.com)*: top 100K, us*
@@ -623,14 +623,14 @@ Rank data fetched from Majestic Million by domains.
1. ![](https://www.google.com/s2/favicons?domain=https://mywed.com/ru) [Mywed (https://mywed.com/ru)](https://mywed.com/ru)*: top 100K, ru*
1. ![](https://www.google.com/s2/favicons?domain=https://golbis.com) [Golbis (https://golbis.com)](https://golbis.com)*: top 100K, ru*
1. ![](https://www.google.com/s2/favicons?domain=https://www.sooplive.co.kr/) [Soop (https://www.sooplive.co.kr/)](https://www.sooplive.co.kr/)*: top 100K, kr*
1. ![](https://www.google.com/s2/favicons?domain=https://freelancehunt.com) [Freelancehunt (https://freelancehunt.com)](https://freelancehunt.com)*: top 100K, freelance, ru, ua*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://freelancehunt.com) [Freelancehunt (https://freelancehunt.com)](https://freelancehunt.com)*: top 100K, freelance, ru, ua*
1. ![](https://www.google.com/s2/favicons?domain=https://atcoder.jp/) [Atcoder (https://atcoder.jp/)](https://atcoder.jp/)*: top 100K, coding, jp*
1. ![](https://www.google.com/s2/favicons?domain=https://www.livejasmin.com/) [Livejasmin (https://www.livejasmin.com/)](https://www.livejasmin.com/)*: top 100K, us, webcam*
1. ![](https://www.google.com/s2/favicons?domain=https://wanelo.com/) [Wanelo (https://wanelo.com/)](https://wanelo.com/)*: top 100K, shopping*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://motherless.com/) [Motherless (https://motherless.com/)](https://motherless.com/)*: top 100K, porn*
1. ![](https://www.google.com/s2/favicons?domain=http://fanlore.org) [Fanlore (http://fanlore.org)](http://fanlore.org)*: top 100K, us*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=http://fanlore.org) [Fanlore (http://fanlore.org)](http://fanlore.org)*: top 100K, us*
1. ![](https://www.google.com/s2/favicons?domain=https://www.jetpunk.com) [Jetpunk (https://www.jetpunk.com)](https://www.jetpunk.com)*: top 100K, gaming*
1. ![](https://www.google.com/s2/favicons?domain=https://icobench.com) [Icobench (https://icobench.com)](https://icobench.com)*: top 100K, kr, ru*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://icobench.com) [Icobench (https://icobench.com)](https://icobench.com)*: top 100K, kr, ru*
1. ![](https://www.google.com/s2/favicons?domain=https://www.rappad.co) [Rappad (https://www.rappad.co)](https://www.rappad.co)*: top 100K, music*
1. ![](https://www.google.com/s2/favicons?domain=https://maxpark.com) [Maxpark (https://maxpark.com)](https://maxpark.com)*: top 100K, news, ru*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://savingadvice.com) [savingadvice.com (https://savingadvice.com)](https://savingadvice.com)*: top 100K, finance*
@@ -671,7 +671,7 @@ Rank data fetched from Majestic Million by domains.
1. ![](https://www.google.com/s2/favicons?domain=https://rmmedia.ru) [Rmmedia (https://rmmedia.ru)](https://rmmedia.ru)*: top 100K, forum, ru*
1. ![](https://www.google.com/s2/favicons?domain=https://trashbox.ru/) [Trashbox.ru (https://trashbox.ru/)](https://trashbox.ru/)*: top 100K, az, ru*
1. ![](https://www.google.com/s2/favicons?domain=https://www.ddo.com) [Ddo (https://www.ddo.com)](https://www.ddo.com)*: top 100K, forum*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://www.hometheaterforum.com) [Hometheaterforum (https://www.hometheaterforum.com)](https://www.hometheaterforum.com)*: top 100K, forum, us*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://www.hometheaterforum.com) [Hometheaterforum (https://www.hometheaterforum.com)](https://www.hometheaterforum.com)*: top 100K, forum, us*
1. ![](https://www.google.com/s2/favicons?domain=https://www.vlr.gg) [VLR (https://www.vlr.gg)](https://www.vlr.gg)*: top 100K, gaming*
1. ![](https://www.google.com/s2/favicons?domain=https://www.hackingwithswift.com) [HackingWithSwift (https://www.hackingwithswift.com)](https://www.hackingwithswift.com)*: top 100K, coding*
1. ![](https://www.google.com/s2/favicons?domain=https://partyflock.nl) [Partyflock (https://partyflock.nl)](https://partyflock.nl)*: top 100K, nl*
@@ -682,7 +682,7 @@ Rank data fetched from Majestic Million by domains.
1. ![](https://www.google.com/s2/favicons?domain=https://www.medikforum.ru) [Medikforum (https://www.medikforum.ru)](https://www.medikforum.ru)*: top 100K, de, forum, nl, ru, ua*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://mynickname.com) [mynickname.com (https://mynickname.com)](https://mynickname.com)*: top 100K, social*
1. ![](https://www.google.com/s2/favicons?domain=https://appleinsider.ru) [appleinsider.ru (https://appleinsider.ru)](https://appleinsider.ru)*: top 100K, news, ru, tech*
1. ![](https://www.google.com/s2/favicons?domain=https://imginn.com) [ImgInn (https://imginn.com)](https://imginn.com)*: top 100K, photo*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://imginn.com) [ImgInn (https://imginn.com)](https://imginn.com)*: top 100K, photo*
1. ![](https://www.google.com/s2/favicons?domain=https://rpggeek.com) [RPGGeek (https://rpggeek.com)](https://rpggeek.com)*: top 100K, gaming*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://www.suomi24.fi) [Suomi24 (https://www.suomi24.fi)](https://www.suomi24.fi)*: top 100K, fi, jp*
1. ![](https://www.google.com/s2/favicons?domain=https://ethereum-magicians.org) [Ethereum-magicians (https://ethereum-magicians.org)](https://ethereum-magicians.org)*: top 100K, cr, forum*
@@ -763,7 +763,7 @@ Rank data fetched from Majestic Million by domains.
1. ![](https://www.google.com/s2/favicons?domain=https://www.freelancejob.ru) [FreelanceJob (https://www.freelancejob.ru)](https://www.freelancejob.ru)*: top 10M, ru*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://www.rusfootball.info/) [Football (https://www.rusfootball.info/)](https://www.rusfootball.info/)*: top 10M, ru*
1. ![](https://www.google.com/s2/favicons?domain=http://www.beerintheevening.com) [Beerintheevening (http://www.beerintheevening.com)](http://www.beerintheevening.com)*: top 10M, gb*
1. ![](https://www.google.com/s2/favicons?domain=https://fortnitetracker.com/challenges) [FortniteTracker (https://fortnitetracker.com/challenges)](https://fortnitetracker.com/challenges)*: top 10M, gaming*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://fortnitetracker.com/challenges) [FortniteTracker (https://fortnitetracker.com/challenges)](https://fortnitetracker.com/challenges)*: top 10M, gaming*
1. ![](https://www.google.com/s2/favicons?domain=https://www.heavy-r.com/) [Heavy R (https://www.heavy-r.com/)](https://www.heavy-r.com/)*: top 10M, porn*
1. ![](https://www.google.com/s2/favicons?domain=http://www.coolminiornot.com) [Coolminiornot (http://www.coolminiornot.com)](http://www.coolminiornot.com)*: top 10M, forum, sg*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://www.1001tracklists.com) [1001tracklists (https://www.1001tracklists.com)](https://www.1001tracklists.com)*: top 10M, music*
@@ -779,7 +779,7 @@ Rank data fetched from Majestic Million by domains.
1. ![](https://www.google.com/s2/favicons?domain=https://professionali.ru) [Professionali (https://professionali.ru)](https://professionali.ru)*: top 10M, ru*
1. ![](https://www.google.com/s2/favicons?domain=https://listography.com/adam) [Listography (https://listography.com/adam)](https://listography.com/adam)*: top 10M, sharing*
1. ![](https://www.google.com/s2/favicons?domain=https://www.theanswerbank.co.uk) [The AnswerBank (https://www.theanswerbank.co.uk)](https://www.theanswerbank.co.uk)*: top 10M, gb, q&a*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://www.bdoutdoors.com) [Bdoutdoors (https://www.bdoutdoors.com)](https://www.bdoutdoors.com)*: top 10M, us*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://www.bdoutdoors.com) [Bdoutdoors (https://www.bdoutdoors.com)](https://www.bdoutdoors.com)*: top 10M, us*
1. ![](https://www.google.com/s2/favicons?domain=http://millerovo161.ru) [millerovo161.ru (http://millerovo161.ru)](http://millerovo161.ru)*: top 10M, forum, ru*
1. ![](https://www.google.com/s2/favicons?domain=https://shikimori.one) [Shikimori (https://shikimori.one)](https://shikimori.one)*: top 10M, ru*
1. ![](https://www.google.com/s2/favicons?domain=https://www.kharkovforum.com/) [KharkovForum (https://www.kharkovforum.com/)](https://www.kharkovforum.com/)*: top 10M, forum, ua*, search is disabled
@@ -796,7 +796,7 @@ Rank data fetched from Majestic Million by domains.
1. ![](https://www.google.com/s2/favicons?domain=https://www.fluther.com/) [Fluther (https://www.fluther.com/)](https://www.fluther.com/)*: top 10M, q&a*
1. ![](https://www.google.com/s2/favicons?domain=https://www.sbazar.cz/) [Sbazar.cz (https://www.sbazar.cz/)](https://www.sbazar.cz/)*: top 10M, cz, shopping*
1. ![](https://www.google.com/s2/favicons?domain=https://vintage-mustang.com) [vintage-mustang.com (https://vintage-mustang.com)](https://vintage-mustang.com)*: top 10M, forum, us*
1. ![](https://www.google.com/s2/favicons?domain=http://www.forum.hr) [forum.hr (http://www.forum.hr)](http://www.forum.hr)*: top 10M, forum, hr*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://www.forum.hr) [forum.hr (https://www.forum.hr)](https://www.forum.hr)*: top 10M, forum, hr*
1. ![](https://www.google.com/s2/favicons?domain=http://school2dobrinka.ru) [school2dobrinka.ru (http://school2dobrinka.ru)](http://school2dobrinka.ru)*: top 10M, education, ru*
1. ![](https://www.google.com/s2/favicons?domain=https://kosmetista.ru) [Kosmetista (https://kosmetista.ru)](https://kosmetista.ru)*: top 10M, ru*
1. ![](https://www.google.com/s2/favicons?domain=https://www.pbnation.com/) [Pbnation (https://www.pbnation.com/)](https://www.pbnation.com/)*: top 10M, ca*, search is disabled
@@ -880,7 +880,7 @@ Rank data fetched from Majestic Million by domains.
1. ![](https://www.google.com/s2/favicons?domain=https://proglib.io) [Proglib (https://proglib.io)](https://proglib.io)*: top 10M, ru*
1. ![](https://www.google.com/s2/favicons?domain=https://nightbot.tv/) [nightbot (https://nightbot.tv/)](https://nightbot.tv/)*: top 10M, jp*
1. ![](https://www.google.com/s2/favicons?domain=https://www.hunttalk.com) [Hunttalk (https://www.hunttalk.com)](https://www.hunttalk.com)*: top 10M, forum, us*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://dmoj.ca/) [DMOJ (https://dmoj.ca/)](https://dmoj.ca/)*: top 10M, ca, coding*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://dmoj.ca/) [DMOJ (https://dmoj.ca/)](https://dmoj.ca/)*: top 10M, ca, coding*
1. ![](https://www.google.com/s2/favicons?domain=https://truesteamachievements.com) [Truesteamachievements (https://truesteamachievements.com)](https://truesteamachievements.com)*: top 10M, az, gb*
1. ![](https://www.google.com/s2/favicons?domain=https://www.thefastlaneforum.com) [TheFastlaneForum (https://www.thefastlaneforum.com)](https://www.thefastlaneforum.com)*: top 10M, forum, us*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=http://www.lada-vesta.net) [lada-vesta.net (http://www.lada-vesta.net)](http://www.lada-vesta.net)*: top 10M, auto, forum, ru*
@@ -944,7 +944,7 @@ Rank data fetched from Majestic Million by domains.
1. ![](https://www.google.com/s2/favicons?domain=https://www.gps-data-team.com) [Gps-data-team (https://www.gps-data-team.com)](https://www.gps-data-team.com)*: top 10M, maps*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://yasobe.ru) [Soberu (https://yasobe.ru)](https://yasobe.ru)*: top 10M, ru*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://www.imood.com/) [Imood (https://www.imood.com/)](https://www.imood.com/)*: top 10M, blog*
1. ![](https://www.google.com/s2/favicons?domain=https://elakiri.com) [Elakiri (https://elakiri.com)](https://elakiri.com)*: top 10M, lk*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://elakiri.com) [Elakiri (https://elakiri.com)](https://elakiri.com)*: top 10M, lk*
1. ![](https://www.google.com/s2/favicons?domain=https://www.countable.us/) [Countable (https://www.countable.us/)](https://www.countable.us/)*: top 10M, us*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://www.shipmodeling.ru/phpbb) [shipmodeling.ru (https://www.shipmodeling.ru/phpbb)](https://www.shipmodeling.ru/phpbb)*: top 10M, forum, ru*
1. ![](https://www.google.com/s2/favicons?domain=https://armtorg.ru/) [Armtorg (https://armtorg.ru/)](https://armtorg.ru/)*: top 10M, forum, ru*
@@ -979,7 +979,7 @@ Rank data fetched from Majestic Million by domains.
1. ![](https://www.google.com/s2/favicons?domain=https://www.mdshooters.com) [Mdshooters (https://www.mdshooters.com)](https://www.mdshooters.com)*: top 10M, forum, us*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://prodaman.ru) [Prodaman (https://prodaman.ru)](https://prodaman.ru)*: top 10M, ru*
1. ![](https://www.google.com/s2/favicons?domain=https://mikrob.ru) [mikrob.ru (https://mikrob.ru)](https://mikrob.ru)*: top 10M, forum, ru*
1. ![](https://www.google.com/s2/favicons?domain=https://www.gardrops.com) [Gardrops (https://www.gardrops.com)](https://www.gardrops.com)*: top 10M, shopping, tr*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://www.gardrops.com) [Gardrops (https://www.gardrops.com)](https://www.gardrops.com)*: top 10M, shopping, tr*
1. ![](https://www.google.com/s2/favicons?domain=https://zagony.ru) [Zagony (https://zagony.ru)](https://zagony.ru)*: top 10M, ru*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://pogovorim.by) [Pogovorim (https://pogovorim.by)](https://pogovorim.by)*: top 10M, by, ru*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://sniperforums.com) [sniperforums.com (https://sniperforums.com)](https://sniperforums.com)*: top 10M, forum*
@@ -1754,7 +1754,7 @@ Rank data fetched from Majestic Million by domains.
1. ![](https://www.google.com/s2/favicons?domain=https://social.tchncs.de/) [social.tchncs.de (https://social.tchncs.de/)](https://social.tchncs.de/)*: top 100M, de*
1. ![](https://www.google.com/s2/favicons?domain=https://forums.alliedmods.net/) [alliedmods (https://forums.alliedmods.net/)](https://forums.alliedmods.net/)*: top 100M, forum, gb, jp, tr, uz*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://forums.gamerevolution.com) [GameRevolution (https://forums.gamerevolution.com)](https://forums.gamerevolution.com)*: top 100M, forum, gaming*
1. ![](https://www.google.com/s2/favicons?domain=https://ru.pathofexile.com) [Pathofexile (https://ru.pathofexile.com)](https://ru.pathofexile.com)*: top 100M, ru*
1. ![](https://www.google.com/s2/favicons?domain=https://ru.pathofexile.com) [Pathofexile (https://ru.pathofexile.com)](https://ru.pathofexile.com)*: top 100M, ru*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://boards.theforce.net) [boards.theforce.net (https://boards.theforce.net)](https://boards.theforce.net)*: top 100M*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://community.justlanded.com) [Justlanded (https://community.justlanded.com)](https://community.justlanded.com)*: top 100M*
1. ![](https://www.google.com/s2/favicons?domain=http://forum.igromania.ru/) [igromania (http://forum.igromania.ru/)](http://forum.igromania.ru/)*: top 100M, forum, gaming, ru*
@@ -1826,7 +1826,7 @@ Rank data fetched from Majestic Million by domains.
1. ![](https://www.google.com/s2/favicons?domain=https://forum.ubuntu-it.org) [forum.ubuntu-it.org (https://forum.ubuntu-it.org)](https://forum.ubuntu-it.org)*: top 100M, ch, forum, it*
1. ![](https://www.google.com/s2/favicons?domain=https://forum.endeavouros.com) [forum.endeavouros.com (https://forum.endeavouros.com)](https://forum.endeavouros.com)*: top 100M, forum*
1. ![](https://www.google.com/s2/favicons?domain=http://forum.newlcn.com) [forum.newlcn.com (http://forum.newlcn.com)](http://forum.newlcn.com)*: top 100M, forum*
1. ![](https://www.google.com/s2/favicons?domain=https://discussion.squadhelp.com) [discussion.squadhelp.com (https://discussion.squadhelp.com)](https://discussion.squadhelp.com)*: top 100M, forum*
1. ![](https://www.google.com/s2/favicons?domain=https://discussion.squadhelp.com) [discussion.squadhelp.com (https://discussion.squadhelp.com)](https://discussion.squadhelp.com)*: top 100M, forum*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://discuss.flarum.org) [discuss.flarum.org (https://discuss.flarum.org)](https://discuss.flarum.org)*: top 100M*
1. ![](https://www.google.com/s2/favicons?domain=https://forum.mirf.ru/) [mirf (https://forum.mirf.ru/)](https://forum.mirf.ru/)*: top 100M, forum, ru*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=http://kpyto.pp.net.ua) [kpyto.pp.net.ua (http://kpyto.pp.net.ua)](http://kpyto.pp.net.ua)*: top 100M, ua*
@@ -1885,7 +1885,7 @@ Rank data fetched from Majestic Million by domains.
1. ![](https://www.google.com/s2/favicons?domain=https://forum.gong.bg) [forum.gong.bg (https://forum.gong.bg)](https://forum.gong.bg)*: top 100M, bg, forum*
1. ![](https://www.google.com/s2/favicons?domain=https://forum.velomania.ru/) [Velomania (https://forum.velomania.ru/)](https://forum.velomania.ru/)*: top 100M, forum, ru*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=http://bbs.evony.com) [bbs.evony.com (http://bbs.evony.com)](http://bbs.evony.com)*: top 100M, forum, pk, tr*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://forum.vectric.com) [forum.vectric.com (https://forum.vectric.com)](https://forum.vectric.com)*: top 100M, forum*
1. ![](https://www.google.com/s2/favicons?domain=https://forum.vectric.com) [forum.vectric.com (https://forum.vectric.com)](https://forum.vectric.com)*: top 100M, forum*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=http://forum.bratsk.org) [Bratsk Forum (http://forum.bratsk.org)](http://forum.bratsk.org)*: top 100M, forum, ru*
1. ![](https://www.google.com/s2/favicons?domain=https://forums.runnersworld.co.uk/) [Runnersworld (https://forums.runnersworld.co.uk/)](https://forums.runnersworld.co.uk/)*: top 100M, forum, sport*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=http://forum.qwas.ru) [Qwas (http://forum.qwas.ru)](http://forum.qwas.ru)*: top 100M, forum, ru*
@@ -2141,7 +2141,7 @@ Rank data fetched from Majestic Million by domains.
1. ![](https://www.google.com/s2/favicons?domain=https://fireworktv.com) [Fireworktv (https://fireworktv.com)](https://fireworktv.com)*: top 100M, jp*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://flbord.com) [Flbord (https://flbord.com)](https://flbord.com)*: top 100M, ru, ua*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://fm-forum.ru) [Fm-forum (https://fm-forum.ru)](https://fm-forum.ru)*: top 100M, forum, ru*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=http://forum.glow-dm.ru) [Forum.glow-dm.ru (http://forum.glow-dm.ru)](http://forum.glow-dm.ru)*: top 100M, forum, ru*
1. ![](https://www.google.com/s2/favicons?domain=http://forum.glow-dm.ru) [Forum.glow-dm.ru (http://forum.glow-dm.ru)](http://forum.glow-dm.ru)*: top 100M, forum, ru*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://forum.jambox.ru) [Forum.jambox.ru (https://forum.jambox.ru)](https://forum.jambox.ru)*: top 100M, forum, ru*
1. ![](https://www.google.com/s2/favicons?domain=http://forum.quake2.com.ru/) [Forum.quake2.com.ru (http://forum.quake2.com.ru/)](http://forum.quake2.com.ru/)*: top 100M, forum, ru*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=http://forum29.net) [Forum29 (http://forum29.net)](http://forum29.net)*: top 100M, forum, ru*, search is disabled
@@ -2199,7 +2199,7 @@ Rank data fetched from Majestic Million by domains.
1. ![](https://www.google.com/s2/favicons?domain=https://www.invalidnost.com) [Invalidnost (https://www.invalidnost.com)](https://www.invalidnost.com)*: top 100M, ru*
1. ![](https://www.google.com/s2/favicons?domain=) [IonicFramework ()]()*: top 100M*
1. ![](https://www.google.com/s2/favicons?domain=http://ispdn.ru) [Ispdn (http://ispdn.ru)](http://ispdn.ru)*: top 100M, ru*
1. ![](https://www.google.com/s2/favicons?domain=https://itforums.ru) [Itforums (https://itforums.ru)](https://itforums.ru)*: top 100M, forum, ru*
1. ![](https://www.google.com/s2/favicons?domain=https://itforums.ru) [Itforums (https://itforums.ru)](https://itforums.ru)*: top 100M, forum, ru*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://itfy.org) [Itfy (https://itfy.org)](https://itfy.org)*: top 100M, ru*
1. ![](https://www.google.com/s2/favicons?domain=) [Jbzd ()]()*: top 100M*
1. ![](https://www.google.com/s2/favicons?domain=) [Jeja.pl ()]()*: top 100M*
@@ -2278,7 +2278,7 @@ Rank data fetched from Majestic Million by domains.
1. ![](https://www.google.com/s2/favicons?domain=) [Ninjakiwi ()]()*: top 100M*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://www.nationalgunforum.com) [NationalgunForum (https://www.nationalgunforum.com)](https://www.nationalgunforum.com)*: top 100M, ca, forum*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://naturalworld.guru) [Naturalworld (https://naturalworld.guru)](https://naturalworld.guru)*: top 100M, ru*
1. ![](https://www.google.com/s2/favicons?domain=) [Needrom ()]()*: top 100M*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=) [Needrom ()]()*: top 100M*
1. ![](https://www.google.com/s2/favicons?domain=https://no-jus.com) [No-jus (https://no-jus.com)](https://no-jus.com)*: top 100M, ru*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=https://numizmat-forum.ru) [Numizmat (https://numizmat-forum.ru)](https://numizmat-forum.ru)*: top 100M, forum, ru*
1. ![](https://www.google.com/s2/favicons?domain=) [Nyaa.si ()]()*: top 100M*
@@ -2306,7 +2306,7 @@ Rank data fetched from Majestic Million by domains.
1. ![](https://www.google.com/s2/favicons?domain=) [Polczat.pl ()]()*: top 100M*
1. ![](https://www.google.com/s2/favicons?domain=) [Policja2009 ()]()*: top 100M*
1. ![](https://www.google.com/s2/favicons?domain=) [Polleverywhere ()]()*: top 100M*
1. ![](https://www.google.com/s2/favicons?domain=) [Polymart ()]()*: top 100M*
1. ![](https://www.google.com/s2/favicons?domain=) [Polymart ()]()*: top 100M*, search is disabled
1. ![](https://www.google.com/s2/favicons?domain=) [PornhubPornstars ()]()*: top 100M*
1. ![](https://www.google.com/s2/favicons?domain=) [Poshmark ()]()*: top 100M*
1. ![](https://www.google.com/s2/favicons?domain=http://pro-cats.ru) [Pro-cats (http://pro-cats.ru)](http://pro-cats.ru)*: top 100M, ru*
@@ -3158,16 +3158,16 @@ Rank data fetched from Majestic Million by domains.
1. ![](https://www.google.com/s2/favicons?domain=https://app.airnfts.com) [AirNFTs (https://app.airnfts.com)](https://app.airnfts.com)*: top 100M, crypto, nft*
1. ![](https://www.google.com/s2/favicons?domain=https://greasyfork.org) [GreasyFork (https://greasyfork.org)](https://greasyfork.org)*: top 100M, coding*
The list was updated at (2026-05-05)
The list was updated at (2026-05-08)
## Statistics
Enabled/total sites: 2510/3154 = 79.58%
Enabled/total sites: 2524/3154 = 80.03%
Incomplete message checks: 308/2510 = 12.27% (false positive risks)
Incomplete message checks: 311/2524 = 12.32% (false positive risks)
Status code checks: 631/2510 = 25.14% (false positive risks)
Status code checks: 636/2524 = 25.2% (false positive risks)
False positive risk (total): 37.41%
False positive risk (total): 37.52%
Sites with probing: 500px, Armchairgm, BinarySearch (disabled), BleachFandom, Bluesky, BongaCams, Boosty, BuyMeACoffee, Calendly, Cent, Chess, Code Sandbox (disabled), Code Snippet Wiki, DailyMotion, Discord, Diskusjon.no, Disqus, Docker Hub, Duolingo, FandomCommunityCentral, GitHub, GitLab, Google Plus (archived), Gravatar, HackTheBox, Hackerrank, Hashnode, Holopin, Imgur, Issuu, Keybase, Kick, Kvinneguiden, LeetCode, Lesswrong, Livejasmin, LocalCryptos (disabled), Medium, MicrosoftLearn, MixCloud, Monkeytype, NPM, Niftygateway, Omg.lol, OnlyFans, Paragraph, Picsart, Plurk, Polarsteps, Rarible, Reddit, Reddit Search (Pushshift) (disabled), Revolut.me, RoyalCams, Scratch, Soop, SportsTracker, Spotify, StackOverflow, Substack, TAP'D, Topcoder, Trello, Twitch, Twitter, Twitter Shadowban (disabled), UnstoppableDomains, Vimeo, Vivino, Warframe Market, Warpcast, Weibo, Wikipedia, Yapisal (disabled), YouNow, en.brickimedia.org, forums.grandstream.com, nightbot, notabug.org, qiwi.me (disabled)
@@ -3198,10 +3198,10 @@ Top 20 profile URLs:
Sites by engine:
- `uCoz`: 634/709 (89.4%)
- `XenForo`: 179/223 (80.3%)
- `phpBB/Search`: 120/127 (94.5%)
- `vBulletin`: 30/120 (25.0%)
- `Discourse`: 85/92 (92.4%)
- `XenForo`: 177/223 (79.4%)
- `phpBB/Search`: 119/127 (93.7%)
- `vBulletin`: 31/120 (25.8%)
- `Discourse`: 84/92 (91.3%)
- `phpBB`: 21/27 (77.8%)
- `engine404`: 19/23 (82.6%)
- `op.gg`: 17/17 (100.0%)
@@ -3217,7 +3217,7 @@ Top 20 tags:
- (749) `forum`
- (128) `gaming`
- (88) `coding`
- (58) `photo`
- (57) `photo`
- (46) `tech`
- (45) `social`
- (42) `news`
@@ -3226,8 +3226,8 @@ Top 20 tags:
- (31) `shopping`
- (29) `crypto`
- (27) `finance`
- (25) `video`
- (25) `sharing`
- (24) `video`
- (23) `education`
- (22) `freelance`
- (21) `art`
+1
View File
@@ -53,6 +53,7 @@ DEFAULT_ARGS: Dict[str, Any] = {
    'ai_model': 'gpt-4o',
    'no_autoupdate': False,
    'force_update': False,
    'cloudflare_bypass': False,
}
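
Note on the new default: `cloudflare_bypass` stays `False`, so the bypass only runs when explicitly requested. The following is a minimal sketch of how such a flag could gate a settings-driven configuration. It assumes only the behaviour exercised by the tests in this commit (disabled unless forced, `url_rewrite` modules need a `{url}` placeholder, unknown methods are dropped, three default trigger tags). `sketch_build_config`, `KNOWN_METHODS`, and `DEFAULT_TRIGGERS` are hypothetical names for illustration, not Maigret's actual implementation.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Optional

# Hypothetical constants for illustration; the real values live in Maigret's code.
KNOWN_METHODS = {"json_api", "url_rewrite"}
DEFAULT_TRIGGERS = ["cf_js_challenge", "cf_firewall", "webgate"]


def sketch_build_config(settings: Any, force_enable: bool = False) -> Optional[Dict[str, Any]]:
    """Return a cleaned bypass config, or None when the bypass should stay off."""
    raw = getattr(settings, "cloudflare_bypass", None) or {}
    if not (force_enable or raw.get("enabled")):
        return None

    modules: List[Dict[str, Any]] = []
    for module in raw.get("modules", []):
        method = module.get("method")
        url = module.get("url", "")
        if method not in KNOWN_METHODS or not url:
            continue  # unknown backend type or missing endpoint
        if method == "url_rewrite" and "{url}" not in url:
            continue  # rewrite endpoints need a placeholder for the target URL
        modules.append(module)

    if not modules:
        return None  # nothing usable, so behave as if the bypass were disabled
    return {
        "modules": modules,
        "trigger_protection": raw.get("trigger_protection", DEFAULT_TRIGGERS),
    }


# Usage: a CLI flag would map onto force_enable, overriding "enabled": False.
settings = SimpleNamespace(cloudflare_bypass={
    "enabled": False,
    "modules": [{"method": "json_api", "url": "http://127.0.0.1:8191/v1"}],
})
forced = sketch_build_config(settings, force_enable=True)
```

Returning `None` for both the disabled case and the no-valid-modules case lets a caller treat "bypass off" as a single condition.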
+256
View File
@@ -0,0 +1,256 @@
"""Tests for the Cloudflare webgate config + checker."""
import json
from types import SimpleNamespace
from mock import Mock
import pytest
from maigret.checking import (
CloudflareWebgateChecker,
build_cloudflare_bypass_config,
)
def _settings(payload):
return SimpleNamespace(cloudflare_bypass=payload)
def test_config_disabled_by_default():
s = _settings({"enabled": False, "modules": [{"method": "json_api", "url": "x"}]})
assert build_cloudflare_bypass_config(s, force_enable=False) is None
def test_config_force_enable_overrides_disabled_settings():
s = _settings({"enabled": False, "modules": [{"method": "json_api", "url": "http://x:8191/v1"}]})
cfg = build_cloudflare_bypass_config(s, force_enable=True)
assert cfg is not None
assert cfg["modules"][0]["url"] == "http://x:8191/v1"
def test_config_drops_invalid_modules():
s = _settings({
"enabled": True,
"modules": [
{"method": "url_rewrite", "url": "http://x:8000/html"}, # missing {url}
{"method": "json_api", "url": "http://x:8191/v1"},
{"method": "unknown", "url": "http://x"},
],
})
cfg = build_cloudflare_bypass_config(s)
assert len(cfg["modules"]) == 1
assert cfg["modules"][0]["method"] == "json_api"
def test_config_returns_none_when_no_valid_modules():
s = _settings({"enabled": True, "modules": [{"method": "url_rewrite", "url": "no-placeholder"}]})
assert build_cloudflare_bypass_config(s) is None
def test_config_default_trigger_protection():
s = _settings({"enabled": True, "modules": [{"method": "json_api", "url": "http://x:8191/v1"}]})
cfg = build_cloudflare_bypass_config(s)
assert "cf_js_challenge" in cfg["trigger_protection"]
assert "cf_firewall" in cfg["trigger_protection"]
assert "webgate" in cfg["trigger_protection"]
@pytest.mark.asyncio
async def test_flaresolverr_success(httpserver):
httpserver.expect_request("/v1", method="POST").respond_with_json({
"status": "ok",
"solution": {"status": 404, "response": "<html>missing</html>", "url": "https://site/missing"},
})
config = {
"modules": [{"name": "fs", "method": "json_api", "url": httpserver.url_for("/v1")}],
"session_prefix": "test",
}
c = CloudflareWebgateChecker(logger=Mock(), config=config)
c.prepare(url="https://site/missing", timeout=5)
body, status, err = await c.check()
assert err is None
assert status == 404 # upstream status preserved — fixes status_code checktype
assert "missing" in body


@pytest.mark.asyncio
async def test_flaresolverr_solver_error_propagates(httpserver):
    httpserver.expect_request("/v1", method="POST").respond_with_json({
        "status": "error",
        "message": "Challenge could not be solved",
    })
    config = {
        "modules": [{"name": "fs", "method": "json_api", "url": httpserver.url_for("/v1")}],
    }
    c = CloudflareWebgateChecker(logger=Mock(), config=config)
    c.prepare(url="https://site/page", timeout=5)
    body, status, err = await c.check()
    assert err is not None
    assert "Challenge could not be solved" in err.desc


@pytest.mark.asyncio
async def test_falls_back_to_next_module_on_failure(httpserver):
    # Bind only the second module; the first is unreachable.
    httpserver.expect_request("/v1", method="POST").respond_with_json({
        "status": "ok",
        "solution": {"status": 200, "response": "ok-from-second", "url": "https://x"},
    })
    config = {
        "modules": [
            {"name": "broken", "method": "json_api", "url": "http://127.0.0.1:1/v1"},
            {"name": "good", "method": "json_api", "url": httpserver.url_for("/v1")},
        ],
    }
    c = CloudflareWebgateChecker(logger=Mock(), config=config)
    c.prepare(url="https://site/page", timeout=5)
    body, status, err = await c.check()
    assert err is None
    assert status == 200
    assert body == "ok-from-second"


@pytest.mark.asyncio
async def test_url_rewrite_returns_html_with_synthetic_200(httpserver):
    # CloudflareBypassForScraping returns just the rendered HTML, no JSON wrapper.
    httpserver.expect_request("/html").respond_with_data(
        "<html>profile body</html>", status=200, content_type="text/html"
    )
    config = {
        "modules": [{
            "name": "cbfs",
            "method": "url_rewrite",
            "url": httpserver.url_for("/html") + "?url={url}",
        }],
    }
    c = CloudflareWebgateChecker(logger=Mock(), config=config)
    c.prepare(url="https://site/page", timeout=5)
    body, status, err = await c.check()
    assert err is None
    # The 200 is synthetic: url_rewrite cannot recover the real upstream status.
    assert status == 200
    assert "profile body" in body


@pytest.mark.asyncio
async def test_all_modules_unreachable_actionable_error():
    config = {
        "modules": [
            {"name": "fs", "method": "json_api", "url": "http://127.0.0.1:1/v1"},
            {"name": "cbfs", "method": "url_rewrite", "url": "http://127.0.0.1:2/html?url={url}"},
        ],
    }
    c = CloudflareWebgateChecker(logger=Mock(), config=config)
    c.prepare(url="https://site/page", timeout=2)
    body, status, err = await c.check()
    assert err is not None
    assert err.type == "Webgate unavailable"
    # Per-module attempt summary helps users see WHICH backend failed
    assert "fs:" in err.desc and "cbfs:" in err.desc
    # Primary URL is shown so the user knows where to look
    assert "http://127.0.0.1:1/v1" in err.desc
    # FlareSolverr docker hint when primary is json_api
    assert "flaresolverr" in err.desc.lower()


@pytest.mark.asyncio
async def test_session_is_scoped_per_host(httpserver):
    seen_sessions = []

    def handler(request):
        seen_sessions.append(request.get_json()["session"])
        return {"status": "ok", "solution": {"status": 200, "response": "", "url": "x"}}

    httpserver.expect_request("/v1", method="POST").respond_with_handler(handler)
    config = {"modules": [{"name": "fs", "method": "json_api", "url": httpserver.url_for("/v1")}]}
    c = CloudflareWebgateChecker(logger=Mock(), config=config)
    c.prepare(url="https://patreon.com/foo", timeout=5)
    await c.check()
    c.prepare(url="https://patreon.com/bar", timeout=5)
    await c.check()
    c.prepare(url="https://lomography.com/baz", timeout=5)
    await c.check()
    assert seen_sessions[0] == seen_sessions[1], "same host -> same session"
    assert seen_sessions[0] != seen_sessions[2], "different host -> different session"
    assert "patreon.com" in seen_sessions[0]
    assert "lomography.com" in seen_sessions[2]


@pytest.mark.asyncio
async def test_flaresolverr_request_body_shape(httpserver):
    captured = {}

    def handler(request):
        captured["body"] = request.get_json()
        return {"status": "ok", "solution": {"status": 200, "response": "", "url": "x"}}

    httpserver.expect_request("/v1", method="POST").respond_with_handler(handler)
    config = {"modules": [{"name": "fs", "method": "json_api", "url": httpserver.url_for("/v1")}]}
    c = CloudflareWebgateChecker(logger=Mock(), config=config)
    c.prepare(url="https://site/page", headers={"User-Agent": "test-ua/1.0"}, timeout=5)
    await c.check()
    body = captured["body"]
    assert body["cmd"] == "request.get"
    assert body["url"] == "https://site/page"
    assert body["session"].startswith("maigret-")
    # userAgent was removed in FlareSolverr v2; the impersonated browser's
    # own UA must be used to keep TLS+UA consistent.
    assert "userAgent" not in body
    assert "proxy" not in body


@pytest.mark.asyncio
async def test_flaresolverr_proxy_string_passed_through(httpserver):
    captured = {}

    def handler(request):
        captured["body"] = request.get_json()
        return {"status": "ok", "solution": {"status": 200, "response": "", "url": "x"}}

    httpserver.expect_request("/v1", method="POST").respond_with_handler(handler)
    config = {
        "modules": [
            {
                "name": "fs",
                "method": "json_api",
                "url": httpserver.url_for("/v1"),
                "proxy": "socks5://localhost:1080",
            }
        ]
    }
    c = CloudflareWebgateChecker(logger=Mock(), config=config)
    c.prepare(url="https://site/page", headers={}, timeout=5)
    await c.check()
    assert captured["body"]["proxy"] == {"url": "socks5://localhost:1080"}


@pytest.mark.asyncio
async def test_flaresolverr_proxy_dict_with_credentials(httpserver):
    captured = {}

    def handler(request):
        captured["body"] = request.get_json()
        return {"status": "ok", "solution": {"status": 200, "response": "", "url": "x"}}

    httpserver.expect_request("/v1", method="POST").respond_with_handler(handler)
    config = {
        "modules": [
            {
                "name": "fs",
                "method": "json_api",
                "url": httpserver.url_for("/v1"),
                "proxy": {
                    "url": "http://proxy.example:3128",
                    "username": "u",
                    "password": "p",
                    "stripped_extra": "ignored",
                },
            }
        ]
    }
    c = CloudflareWebgateChecker(logger=Mock(), config=config)
    c.prepare(url="https://site/page", headers={}, timeout=5)
    await c.check()
    proxy = captured["body"]["proxy"]
    # Keys other than url, username and password are expected to be dropped.
    assert proxy == {"url": "http://proxy.example:3128", "username": "u", "password": "p"}