mirror of https://github.com/soxoj/maigret.git
synced 2026-05-09 08:04:32 +00:00

Compare commits (1 commit)

Commit: 5830c9ce72
@@ -44,4 +44,3 @@ settings.json
*.egg-info
build
LLM
lib
@@ -84,9 +84,6 @@ ids. Useful for repeated scanning with found known irrelevant usernames.

``--db`` - Load the Maigret database from a local JSON file or a valid
online JSON file. See :ref:`custom-database` below.

``--extra-db`` - Load an **additional** sites database on top of
``--db`` (overlay). Repeatable. See :ref:`extra-database` below.

``--no-autoupdate`` - Disable the automatic database update check that
runs at startup. The currently cached (or bundled) database is used
as-is.

@@ -142,47 +139,6 @@ disabled and all sites scanned, looks like::

       --db LLM/maigret_private_db.json \
       --no-autoupdate -a

.. _extra-database:

Overlaying additional databases (``--extra-db``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``--extra-db FILE`` loads an additional sites database **on top of**
``--db``, rather than replacing it. The flag is repeatable, so multiple
extras can be layered in one invocation::

    python3 -m maigret username \
        --extra-db private_sites.json \
        --extra-db team_sites.json -a

Each extra accepts the same three forms as ``--db`` (HTTP(S) URL,
absolute or cwd-relative local path, or module-relative path).

**Merge semantics.** Sites, engines and tags are merged into the main
database. On duplicate names, **last wins**: a site or engine defined
later (either in a subsequent ``--extra-db`` or in an ``--extra-db``
that re-defines a name from ``--db``) overrides the earlier definition.
Tag lists are deduplicated while preserving first-seen order.
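
The last-wins merge and the order-preserving tag dedup can be illustrated with a short standalone sketch (illustrative only, not Maigret's actual code; the plain-dict site representation is an assumption):

```python
def overlay_databases(base_sites, extra_sites, base_tags, extra_tags):
    """Overlay extras on the base: on duplicate site names the last
    definition wins; tags are deduplicated in first-seen order."""
    merged = {}
    for site in base_sites + extra_sites:
        merged[site["name"]] = site  # later definitions overwrite earlier ones
    tags = list(dict.fromkeys(base_tags + extra_tags))  # ordered dedup
    return list(merged.values()), tags
```

This mirrors the documented behaviour: a site defined in a later extra silently overrides one first defined in an earlier extra or in the main ``--db``.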

**Auto-update.** Extras are never auto-updated — they are read exactly
as provided, regardless of ``--no-autoupdate`` / ``--force-update``.

**Save behaviour.** While any ``--extra-db`` is active, Maigret **skips
every database save** — including the implicit end-of-run save, the
``--self-check --auto-disable`` save, and the ``--submit`` save. This
prevents silently writing merged (main + extras) content back into the
main ``--db`` file. If you need to persist edits, run Maigret again
without ``--extra-db``. You will see a warning at startup::

    [!] Database modifications will NOT be persisted while --extra-db is active.

**Missing or unreadable extra.** Maigret exits with a non-zero status —
extras are opt-in, so a silent skip would hide configuration errors.

**Not supported with** ``--web``. The web UI reloads its own database
from the main ``--db`` path, so extras would be invisible. Passing both
exits with an error.

Reports
-------

@@ -148,14 +148,8 @@ Supported values:

- ``tls_fingerprint`` — the site fingerprints the TLS handshake (JA3/JA4) and blocks non-browser clients. Maigret automatically uses ``curl_cffi`` with Chrome browser emulation to bypass this. Requires the ``curl_cffi`` package (included as a dependency). Examples: Instagram, NPM, Codepen, Kickstarter, Letterboxd.
- ``ip_reputation`` — the site blocks requests from datacenter/cloud IPs regardless of headers or TLS. Cannot be bypassed automatically; run Maigret from a regular internet connection (not a datacenter) or use a proxy (``--proxy``). Examples: Reddit, Patreon, Figma.
- ``cf_js_challenge`` — Cloudflare Managed Challenge / Turnstile JS challenge. Symptom: HTTP 403 with ``cf-mitigated: challenge`` header; body contains ``challenges.cloudflare.com``, ``_cf_chl_opt``, ``window._cf_chl``, or "Just a moment". Not bypassable via ``curl_cffi`` TLS impersonation (verified across Chrome 123/124/131, Safari 17/18, Firefox 133/135, Edge 101 — all return the same 403 challenge page); a real browser executing the challenge JS is required to obtain the clearance cookie. Documentation-only flag; sites stay ``disabled: true`` until a CF-challenge solver is integrated. Examples: DMOJ, Elakiri, Fanlore, Bdoutdoors, TheStudentRoom, forum.hr.
- ``cf_firewall`` — Cloudflare firewall rule / bot score block (WAF action=block, **not** action=challenge). Symptom: HTTP 403 served by Cloudflare (``server: cloudflare``, ``cf-ray`` header) **without** JS-challenge markers — body typically shows "Access denied", "Attention Required", or just a bare 1015/1016/1020 error page. Unlike ``ip_reputation``, residential IPs are **not** sufficient to bypass — Cloudflare decides based on a composite of bot score, TLS fingerprint, UA, ASN, and custom site-owner rules, so ``curl_cffi`` Chrome impersonation from a residential line still returns 403. Documentation-only flag; sites stay ``disabled: true`` until a per-site bypass (cookies, real browser, or residential+clean session) is found. Examples: Fark, Fodors, Huntingnet, Hunttalk.
- ``js_challenge`` — the site serves a JavaScript challenge page (e.g. "Just a moment...") that cannot be solved without a browser. Maigret detects challenge signatures and returns UNKNOWN instead of a false positive.
- ``aws_waf_js_challenge`` — the site is protected by AWS WAF with a JavaScript challenge. Symptom: HTTP 202 with empty body and ``x-amzn-waf-action: challenge`` header (a token-granting challenge that requires executing the CAPTCHA/challenge JS bundle). Neither ``curl_cffi`` TLS impersonation nor User-Agent changes bypass this — a real browser or the official AWS WAF challenge-solver SDK is required. Currently marked for documentation only; sites using this protection stay ``disabled: true`` until a solver is integrated. Example: Dreamwidth.
- ``ddos_guard_challenge`` — DDoS-Guard (ddos-guard.net) anti-bot page. Symptom: HTTP 403 with ``server: ddos-guard`` header; body contains "DDoS-Guard". DDoS-Guard fingerprints different UAs per source IP, so a single User-Agent override does not work across environments; a JS-capable bypass or DDoS-Guard-aware solver is required. Documentation-only flag; sites stay ``disabled: true`` until a solver is integrated. Example: ForumHouse.
- ``js_challenge`` — **fallback** for JavaScript-challenge systems whose provider cannot be identified (custom in-house challenge pages that are not Cloudflare, AWS WAF, or any other recognized vendor). Prefer a provider-specific tag whenever the provider can be pinned down from response headers or body signatures.
- ``custom_bot_protection`` — **fallback** for non-JS-challenge bot protection served by a custom/in-house system (not Cloudflare, not AWS WAF, not DDoS-Guard). Typical symptom: HTTP 403 from the site's own origin server (``server: nginx``, AWS ELB, etc.) with a branded block page, returned regardless of TLS fingerprint or residential IP. Not generically bypassable; investigate per site (cookies, session, proxy geography). Examples: Hackerearth ("HackerEarth Guardian"), FreelanceJob (nginx-level block).

**Rule: prefer provider-specific protection tags.** When a site is blocked by an identifiable anti-bot vendor, always record the vendor in the tag (``cf_js_challenge``, ``cf_firewall``, ``aws_waf_js_challenge``, ``ddos_guard_challenge``, and future additions such as ``sucuri_challenge``, ``incapsula_challenge``). The generic ``js_challenge`` and ``custom_bot_protection`` tags are reserved for custom/unknown systems. Rationale: bypass solvers are inherently provider-specific (a Cloudflare Turnstile solver does not help with AWS WAF); recording the provider in advance lets us fan out fixes the moment a per-provider solver is added, without re-auditing every disabled site. The same principle applies to other protection categories when the provider is identifiable.

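The documented signatures lend themselves to a simple classifier. A minimal sketch of how they could drive tag selection (illustrative only; the function name and the header/body inputs are assumptions, not Maigret's internal API):

```python
def classify_protection(status, headers, body):
    """Map a blocked HTTP response to a protection tag using the
    signatures documented above. Returns None when nothing matches."""
    headers = {k.lower(): v.lower() for k, v in headers.items()}
    server = headers.get("server", "")
    # AWS WAF token-granting challenge: 202 + x-amzn-waf-action: challenge
    if status == 202 and headers.get("x-amzn-waf-action") == "challenge":
        return "aws_waf_js_challenge"
    # DDoS-Guard: 403 with its own server header
    if status == 403 and "ddos-guard" in server:
        return "ddos_guard_challenge"
    # Cloudflare Managed Challenge / Turnstile markers
    if status == 403 and (
        headers.get("cf-mitigated") == "challenge"
        or "challenges.cloudflare.com" in body
        or "_cf_chl_opt" in body
    ):
        return "cf_js_challenge"
    # Cloudflare 403 without JS-challenge markers: WAF action=block
    if status == 403 and server == "cloudflare":
        return "cf_firewall"
    # Unidentified JS-challenge provider (generic fallback)
    if status == 403 and "just a moment" in body.lower():
        return "js_challenge"
    return None
```

Ordering matters: the provider-specific checks run before the generic fallback, matching the rule above.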
Example:

@@ -46,9 +46,3 @@ You may be interested in:

   tags
   settings
   development

.. toctree::
   :hidden:
   :caption: Use cases

   use-cases/crypto

@@ -1,147 +0,0 @@

.. _use-case-crypto:

Cryptocurrency & Web3 Investigations
=====================================

Blockchain transactions are public, but the people behind wallets are not. Maigret helps bridge this gap by finding Web3 accounts tied to a username, revealing the person behind a pseudonymous crypto persona.

Why it matters
--------------

Crypto investigations often start with a wallet address or an ENS name but hit a wall — the blockchain tells you *what* happened, not *who* did it. A username, however, is reused across platforms. If someone trades on OpenSea as ``zachxbt`` and posts on Warpcast as ``zachxbt``, Maigret connects the dots and builds a full profile.

Common scenarios:

- **Scam attribution.** A rug-pull promoter uses the same alias on Fragment (Telegram username marketplace), OpenSea, and a personal blog.
- **Sanctions compliance.** Verifying whether a counterparty's online footprint matches known sanctioned individuals.
- **Due diligence.** Before an OTC deal or DAO vote, checking whether the other party has a consistent online presence or is a freshly created sockpuppet.
- **Stolen funds tracing.** A stolen NFT appears on OpenSea under a new account — but the username matches a Warpcast profile with real-world links.

Supported sites
---------------

Maigret currently checks the following crypto and Web3 platforms:

.. list-table::
   :header-rows: 1
   :widths: 20 40 40

   * - Site
     - What it reveals
     - Notes
   * - **OpenSea**
     - NFT collections, trading history, profile bio, linked website
     -
   * - **Rarible**
     - NFT marketplace profile, collections, listing history
     - Complements OpenSea for NFT attribution across marketplaces
   * - **Zora**
     - Zora Network profile, minted NFTs, creator activity
     - Ethereum L2 creator platform; useful for on-chain art attribution
   * - **Polymarket**
     - Prediction-market profile, positions, public portfolio P&L
     - Useful for political/financial prediction attribution
   * - **Warpcast** (Farcaster)
     - Decentralized social profile, posts, follower graph, Farcaster ID
     - Every Farcaster ID maps to an Ethereum address via the on-chain ID registry
   * - **Fragment**
     - Telegram username ownership, TON wallet address, purchase date and price
     - Valuable for linking Telegram identities to TON wallets
   * - **Paragraph**
     - Web3 blog/newsletter, ETH wallet address, linked Twitter handle
     - Richest cross-platform data among crypto sites
   * - **Tonometerbot**
     - TON wallet balance, subscriber count, NFT collection, rankings
     - TON blockchain analytics
   * - **Spatial**
     - Metaverse profile, linked social accounts (Discord, Twitter, Instagram, LinkedIn, TikTok)
     - Rich cross-platform links
   * - **Revolut.me**
     - Payment handle: first/last name, country code, base currency, supported payment methods
     - Not strictly Web3, but widely used by crypto OTC traders for fiat off-ramps; the public API returns structured KYC-adjacent data

Real-world example: zachxbt
---------------------------

`ZachXBT <https://twitter.com/zachxbt>`_ is a well-known on-chain investigator. Let's see what Maigret can find from just the username ``zachxbt``:

.. code-block:: console

   maigret zachxbt --tags crypto

Maigret finds 5 accounts and automatically extracts structured data from each:

**Fragment** — confirms the Telegram username ``@zachxbt`` is claimed, reveals the TON wallet address (``EQBisZrk...``), purchase price (10 TON), and date (January 2023).

**Paragraph** — the richest result. Returns the real name used on the platform (``ZachXBT``), bio (``Scam survivor turned 2D investigator``), an Ethereum wallet address (``0x23dBf066...``), and a linked Twitter handle (``zachxbt``). The ``wallet_address`` field is especially valuable — it directly links the pseudonym to an on-chain identity.

**Warpcast** — Farcaster profile with a Farcaster ID (``fid: 20931``), profile image, and social graph (33K followers). Every Farcaster ID is tied to an Ethereum address via the on-chain ID registry, so this is another on-chain anchor.

**OpenSea** — NFT marketplace profile with bio (``On-chain sleuth | 10x rug pull survivor``), avatar (hosted on ``seadn.io`` with an Ethereum address in the URL path), and a link to an external investigations page.

**Hive Blog** — blockchain-based blog account created in March 2025. Low activity (1 post), but confirms the username is claimed across blockchain ecosystems.

From a single username, Maigret produces:

- **2 wallet addresses** — one TON (from Fragment), one Ethereum (from Paragraph)
- **1 confirmed Twitter handle** — ``zachxbt`` (from Paragraph)
- **1 Telegram username** — ``@zachxbt`` (from Fragment)
- **1 external link** — ``investigations.notion.site`` (from OpenSea)
- **Social graph data** — 33K Farcaster followers, blog activity timestamps

This is enough to pivot into blockchain analysis tools (Etherscan, Arkham, Nansen) using the wallet addresses, or into social media analysis using the Twitter handle.

Workflow: from username to wallet
---------------------------------

**Step 1: Search crypto platforms**

.. code-block:: console

   maigret <username> --tags crypto -v

Review the results. Pay attention to:

- **Fragment** — if the username is claimed, you get a TON wallet address directly.
- **Paragraph** — blog profiles often contain an ETH address and a Twitter handle.
- **Warpcast** — Farcaster IDs map to Ethereum addresses via the on-chain registry.
- **OpenSea** — avatar URLs sometimes contain wallet addresses in the path.

**Step 2: Expand with extracted identifiers**

Maigret automatically extracts additional identifiers from found profiles (real names, linked accounts, profile URLs) and recursively searches for them. This is enabled by default. If Maigret finds a linked Twitter handle on a Paragraph profile, it will automatically search for that handle across all sites.

**Step 3: Cross-reference with non-crypto platforms**

The real power is connecting crypto personas to mainstream accounts. Drop the tag filter:

.. code-block:: console

   maigret <username> -a

This checks all 3000+ sites. A match on GitHub, Reddit, or a forum can reveal the person behind the wallet.

Workflow: from wallet to identity
---------------------------------

If you start with a wallet address rather than a username, you can use complementary tools to get a username first:

1. **ENS / Unstoppable Domains** — resolve the wallet address to a human-readable name (``vitalik.eth``). Then search that name in Maigret.
2. **Etherscan labels** — check if the address has a public label (exchange, known entity).
3. **Fragment** — search the TON wallet address to find which Telegram usernames it purchased.
4. **Arkham Intelligence / Nansen** — blockchain attribution platforms that may tag the address with a known identity.

Once you have a username candidate, feed it to Maigret.

Tips
----

- **Username reuse is the #1 signal.** Crypto-native users often reuse their ENS name (``alice.eth``) or a variation (``alice_eth``, ``aliceeth``) across platforms. Try all variations.
- **Fragment is uniquely valuable** because it directly links Telegram usernames to TON wallet addresses — a rare on-chain / off-chain bridge.
- **Warpcast profiles are Ethereum-native.** Every Farcaster account is tied to an Ethereum address via the ID registry contract. If you find a Warpcast profile, you implicitly have a wallet address.
- **Paragraph often has the richest data** — wallet address, Twitter handle, bio, and activity timestamps in a single API response.
- **Use** ``--exclude-tags`` **to skip irrelevant sites** when you're focused on crypto:

  .. code-block:: console

     maigret alice_eth --exclude-tags porn,dating,forum
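
The username-variation tip above can be automated. A small helper (an illustrative sketch, not part of Maigret) that expands an ENS name into candidate usernames to feed back into the tool:

```python
def ens_variations(ens_name):
    """Expand an ENS name like 'alice.eth' into common username
    variations seen across platforms (alice, alice_eth, aliceeth, ...)."""
    base = ens_name.removesuffix(".eth")
    candidates = [
        base,            # bare handle
        f"{base}.eth",   # full ENS name (some sites allow dots)
        f"{base}_eth",   # underscore variant
        f"{base}-eth",   # hyphen variant
        f"{base}eth",    # concatenated variant
    ]
    # deduplicate while preserving order
    return list(dict.fromkeys(candidates))
```

Each candidate can then be searched with ``maigret <candidate> --tags crypto``.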

+3 -54
@@ -7,7 +7,7 @@ from aiohttp import CookieJar

class ParsingActivator:
    @staticmethod
    def twitter(site, logger, cookies={}, **kwargs):
    def twitter(site, logger, cookies={}):
        headers = dict(site.headers)
        del headers["x-guest-token"]
        import requests
@@ -19,7 +19,7 @@ class ParsingActivator:
        site.headers["x-guest-token"] = guest_token

    @staticmethod
    def vimeo(site, logger, cookies={}, **kwargs):
    def vimeo(site, logger, cookies={}):
        headers = dict(site.headers)
        if "Authorization" in headers:
            del headers["Authorization"]
@@ -31,58 +31,7 @@ class ParsingActivator:
        site.headers["Authorization"] = "jwt " + jwt_token

    @staticmethod
    def onlyfans(site, logger, url=None, **kwargs):
        # Signing rules (static_param / checksum_indexes / checksum_constant / format / app_token)
        # live in data.json under OnlyFans.activation and rotate upstream every ~1–3 weeks.
        # If "Please refresh the page" keeps firing after activation, refresh them from:
        # https://raw.githubusercontent.com/DATAHOARDERS/dynamic-rules/main/onlyfans.json
        import hashlib
        import secrets
        import time as _time
        from urllib.parse import urlparse

        import requests

        act = site.activation
        static_param = act["static_param"]
        indexes = act["checksum_indexes"]
        constant = act["checksum_constant"]
        fmt = act["format"]
        init_url = act["url"]

        user_id = site.headers.get("user-id", "0") or "0"

        def _sign(path):
            t = str(int(_time.time() * 1000))
            msg = "\n".join([static_param, t, path, user_id]).encode()
            sha = hashlib.sha1(msg).hexdigest()
            cs = sum(ord(sha[i]) for i in indexes) + constant
            return t, fmt.format(sha, abs(cs))

        if site.headers.get("x-bc", "").strip("0") == "":
            site.headers["x-bc"] = secrets.token_hex(20)

        if not site.headers.get("cookie"):
            init_path = urlparse(init_url).path
            t, sg = _sign(init_path)
            hdrs = dict(site.headers)
            hdrs["time"] = t
            hdrs["sign"] = sg
            hdrs.pop("cookie", None)
            r = requests.get(init_url, headers=hdrs, timeout=15)
            jar = "; ".join(f"{k}={v}" for k, v in r.cookies.items())
            if jar:
                site.headers["cookie"] = jar
            logger.debug(f"OnlyFans init: got cookies {list(r.cookies.keys())}")

        target_path = urlparse(url).path if url else urlparse(init_url).path
        t, sg = _sign(target_path)
        site.headers["time"] = t
        site.headers["sign"] = sg
        logger.debug(f"OnlyFans signed {target_path} time={t}")

    @staticmethod
    def weibo(site, logger, **kwargs):
    def weibo(site, logger):
        headers = dict(site.headers)
        import requests

+15 -6
@@ -6,6 +6,7 @@ import random
import re
import ssl
import sys
import time
from typing import Any, Dict, List, Optional, Tuple
from urllib.parse import quote

@@ -334,7 +335,12 @@ def debug_response_logging(url, html_text, status_code, check_error):


def process_site_result(
    response, query_notify, logger, results_info: QueryResultWrapper, site: MaigretSite
    response,
    query_notify,
    logger,
    results_info: QueryResultWrapper,
    site: MaigretSite,
    response_time: Optional[float] = None,
):
    if not response:
        return results_info
@@ -362,9 +368,6 @@ def process_site_result(

    html_text, status_code, check_error = response

    # TODO: add elapsed request time counting
    response_time = None

    if logger.level == logging.DEBUG:
        debug_response_logging(url, html_text, status_code, check_error)

@@ -667,7 +670,10 @@ async def check_site_for_username(
        print(f"error, no checker for {site.name}")
        return site.name, default_result

    elapsed = 0.0
    t0 = time.perf_counter()
    response = await checker.check()
    elapsed += time.perf_counter() - t0
    html_text = response[0] if response and response[0] else ""

    # Retry once after token-style activation (e.g. Twitter guest token refresh).
@@ -678,7 +684,7 @@ async def check_site_for_username(
        method = act["method"]
        try:
            activate_fun = getattr(ParsingActivator(), method)
            activate_fun(site, logger, url=checker.url)
            activate_fun(site, logger)
        except AttributeError as e:
            logger.warning(
                f"Activation method {method} for site {site.name} not found!",
@@ -700,10 +706,13 @@ async def check_site_for_username(
        method=checker.method,
        payload=getattr(checker, 'payload', None),
    )
    t1 = time.perf_counter()
    response = await checker.check()
    elapsed += time.perf_counter() - t1

    response_result = process_site_result(
        response, query_notify, logger, default_result, site
        response, query_notify, logger, default_result, site,
        response_time=elapsed,
    )

    query_notify.update(response_result['status'], site.similar_search)

+4 -63
@@ -202,17 +202,6 @@ def setup_arguments_parser(settings: Settings):
        default=settings.sites_db_path,
        help="Load Maigret database from a JSON file or HTTP web resource.",
    )
    parser.add_argument(
        "--extra-db",
        metavar="EXTRA_DB_FILE",
        dest="extra_db_files",
        action="append",
        default=[],
        help="Load an additional sites database on top of --db. Repeatable. "
        "Accepts a local path (absolute or cwd-relative) or HTTP(S) URL. "
        "Never auto-updated. Changes from --self-check / --submit are NOT "
        "persisted when any --extra-db is loaded.",
    )
    parser.add_argument(
        "--no-autoupdate",
        action="store_true",
@@ -625,46 +614,6 @@ async def main():
            )
        else:
            raise

    for extra_arg in args.extra_db_files:
        try:
            extra_path = resolve_db_path(
                db_file_arg=extra_arg,
                no_autoupdate=True,
                meta_url=settings.db_update_meta_url,
                check_interval_hours=settings.autoupdate_check_interval_hours,
                color=not args.no_color,
            )
        except FileNotFoundError as e:
            logger.error(f"--extra-db: {e}")
            sys.exit(2)

        before = len(db.sites)
        try:
            db.load_extra_from_path(extra_path)
        except Exception as e:
            logger.error(f"Failed to load extra database from {extra_path}: {e}")
            sys.exit(2)
        query_notify.success(
            f'Loaded extra database: {extra_path} '
            f'(+{len(db.sites) - before} new, {len(db.sites)} total sites)'
        )

    if args.extra_db_files:
        query_notify.warning(
            'Database modifications will NOT be persisted while --extra-db is active.'
        )

    def save_db_if_safe(reason: str) -> bool:
        if args.extra_db_files:
            logger.warning(
                f"Skipping database save ({reason}): --extra-db is active; "
                "modifications are in-memory only."
            )
            return False
        db.save_to_file(db_file)
        return True

    get_top_sites_for_id = lambda x: db.ranked_sites_dict(
        top=args.top_sites,
        tags=args.tags,
@@ -680,7 +629,7 @@ async def main():
        submitter = Submitter(db=db, logger=logger, settings=settings, args=args)
        is_submitted = await submitter.dialog(args.new_site_to_submit, args.cookie_file)
        if is_submitted:
            save_db_if_safe("post-submit")
            db.save_to_file(db_file)
        await submitter.close()

    # Database self-checking
@@ -714,8 +663,8 @@ async def main():
            'y',
            '',
        ):
            if save_db_if_safe("post-self-check"):
                print('Database was successfully updated.')
            db.save_to_file(db_file)
            print('Database was successfully updated.')
        else:
            print('Updates will be applied only for current search session.')

@@ -738,14 +687,6 @@ async def main():

    # Web interface
    if args.web is not None:
        if args.extra_db_files:
            logger.error(
                '--web is not compatible with --extra-db: the web UI reloads '
                'the database from --db only, so extras would be silently '
                'ignored. Remove --extra-db or use the CLI mode.'
            )
            sys.exit(2)

        from maigret.web.app import app

        app.config["MAIGRET_DB_FILE"] = db_file
@@ -932,7 +873,7 @@ async def main():
        print(text_report)

    # update database
    save_db_if_safe("end-of-run")
    db.save_to_file(db_file)


def run():

+1235 -1457
File diff suppressed because it is too large
@@ -1,8 +1,8 @@
{
  "version": 1,
  "updated_at": "2026-04-23T15:02:48Z",
  "sites_count": 3142,
  "updated_at": "2026-04-21T00:02:26Z",
  "sites_count": 3141,
  "min_maigret_version": "0.6.0",
  "data_sha256": "1e1ed6da2aa9db0f34171f61a044c20bbd1ed53a0430dec4a9ce8f8543655d1a",
  "data_sha256": "d93fb2d051328b60126c98fbf02841a6974549f0c8c9220a207a9172b3ee0c90",
  "data_url": "https://raw.githubusercontent.com/soxoj/maigret/main/maigret/resources/data.json"
}

@@ -516,15 +516,6 @@ class MaigretDatabase:
        else:
            return self.load_from_file(path)

    def load_extra_from_path(self, path: str) -> "MaigretDatabase":
        """Merge an additional DB on top of self. Last-wins on duplicate
        site/engine names; tags deduped preserving first-seen order."""
        self.load_from_path(path)
        self._sites = list({s.name: s for s in self._sites}.values())
        self._engines = list({e.name: e for e in self._engines}.values())
        self._tags = list(dict.fromkeys(self._tags))
        return self

    def load_from_http(self, url: str) -> "MaigretDatabase":
        is_url_valid = url.startswith("http://") or url.startswith("https://")

Generated
+50 -53
@@ -1261,18 +1261,18 @@ lxml = ["lxml ; platform_python_implementation == \"CPython\""]

[[package]]
name = "idna"
version = "3.12"
version = "3.11"
description = "Internationalized Domain Names in Applications (IDNA)"
optional = false
python-versions = ">=3.8"
groups = ["main"]
files = [
    {file = "idna-3.12-py3-none-any.whl", hash = "sha256:60ffaa1858fac94c9c124728c24fcde8160f3fb4a7f79aa8cdd33a9d1af60a67"},
    {file = "idna-3.12.tar.gz", hash = "sha256:724e9952cc9e2bd7550ea784adb098d837ab5267ef67a1ab9cf7846bdbdd8254"},
    {file = "idna-3.11-py3-none-any.whl", hash = "sha256:771a87f49d9defaf64091e6e6fe9c18d4833f140bd19464795bc32d966ca37ea"},
    {file = "idna-3.11.tar.gz", hash = "sha256:795dafcc9c04ed0c1fb032c2aa73654d8e8c5023a7df64a53f39190ada629902"},
]

[package.extras]
all = ["mypy (>=1.11.2)", "pytest (>=8.3.2)", "ruff (>=0.6.2)"]
all = ["flake8 (>=7.1.1)", "mypy (>=1.11.2)", "pytest (>=8.3.2)", "ruff (>=0.6.2)"]

[[package]]
name = "iniconfig"
@@ -1985,56 +1985,56 @@ typing-extensions = {version = ">=4.1.0", markers = "python_version < \"3.11\""}

[[package]]
name = "mypy"
version = "1.20.2"
version = "1.20.1"
description = "Optional static typing for Python"
optional = false
python-versions = ">=3.10"
groups = ["dev"]
files = [
    {file = "mypy-1.20.2-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:cf5a4db6dca263010e2c7bff081c89383c72d187ba2cf4c44759aac970e2f0c4"},
    {file = "mypy-1.20.2-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:7b0e817b518bff7facd7f85ea05b643ad8bdcce684cf29784987b0a7c8e1f997"},
    {file = "mypy-1.20.2-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:97d7b9a485b40f8ca425460e89bf1da2814625b2da627c0dcc6aa46c92631d14"},
    {file = "mypy-1.20.2-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:1e1c12f6d2db3d78b909b5f77513c11eb7f2dd2782b96a3ab6dffc7d44575c99"},
    {file = "mypy-1.20.2-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:89dce27e142d25ffbc154c1819383b69f2e9234dc4ed4766f42e0e8cb264ab5c"},
    {file = "mypy-1.20.2-cp310-cp310-win_amd64.whl", hash = "sha256:f376e37f9bf2a946872fc5fd1199c99310748e3c26c7a26683f13f8bdb756cbd"},
    {file = "mypy-1.20.2-cp310-cp310-win_arm64.whl", hash = "sha256:6e2b469efd811707bc530fd1effef0f5d6eebcb7fe376affae69025da4b979a2"},
    {file = "mypy-1.20.2-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:4077797a273e56e8843d001e9dfe4ba10e33323d6ade647ff260e5cd97d9758c"},
    {file = "mypy-1.20.2-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:cdecf62abcc4292500d7858aeae87a1f8f1150f4c4dd08fb0b336ee79b2a6df3"},
    {file = "mypy-1.20.2-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:c566c3a88b6ece59b3d70f65bedef17304f48eb52ff040a6a18214e1917b3254"},
    {file = "mypy-1.20.2-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:0deb80d062b2479f2c87ae568f89845afc71d11bc41b04179e58165fd9f31e98"},
    {file = "mypy-1.20.2-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:bba9ad231e92a3e424b3e56b65aa17704993425bba97e302c832f9466bb85bac"},
    {file = "mypy-1.20.2-cp311-cp311-win_amd64.whl", hash = "sha256:baf593f2765fa3a6b1ef95807dbaa3d25b594f6a52adcc506a6b9cb115e1be67"},
    {file = "mypy-1.20.2-cp311-cp311-win_arm64.whl", hash = "sha256:20175a1c0f49863946ec20b7f63255768058ac4f07d2b9ded6a6b46cfb5a9100"},
    {file = "mypy-1.20.2-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:4dbfcf869f6b0517f70cf0030ba6ea1d6645e132337a7d5204a18d8d5636c02b"},
    {file = "mypy-1.20.2-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:4b6481b228d072315b053210b01ac320e1be243dc17f9e5887ef167f23f5fae4"},
    {file = "mypy-1.20.2-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:34397cdced6b90b836e38182076049fdb41424322e0b0728c946b0939ebdf9f6"},
    {file = "mypy-1.20.2-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:a5da6976f20cae27059ea8d0c86e7cef3de720e04c4bb9ee18e3690fdb792066"},
    {file = "mypy-1.20.2-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:56908d7e08318d39f85b1f0c6cfd47b0cac1a130da677630dac0de3e0623e102"},
    {file = "mypy-1.20.2-cp312-cp312-win_amd64.whl", hash = "sha256:d52ad8d78522da1d308789df651ee5379088e77c76cb1994858d40a426b343b9"},
    {file = "mypy-1.20.2-cp312-cp312-win_arm64.whl", hash = "sha256:785b08db19c9f214dc37d65f7c165d19a30fcecb48abfa30f31b01b5acaabb58"},
    {file = "mypy-1.20.2-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:edfbfca868cdd6bd8d974a60f8a3682f5565d3f5c99b327640cedd24c4264026"},
    {file = "mypy-1.20.2-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:e2877a02380adfcdbc69071a0f74d6e9dbbf593c0dc9d174e1f223ffd5281943"},
    {file = "mypy-1.20.2-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:7488448de6007cd5177c6cea0517ac33b4c0f5ee9b5e9f2be51ce75511a85517"},
    {file = "mypy-1.20.2-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:bb9c2fa06887e21d6a3a868762acb82aec34e2c6fd0174064f27c93ede68ad15"},
    {file = "mypy-1.20.2-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:9d56a78b646f2e3daa865bc70cd5ec5a46c50045801ca8ff17a0c43abc97e3ee"},
    {file = "mypy-1.20.2-cp313-cp313-win_amd64.whl", hash = "sha256:2a4102b03bb7481d9a91a6da8d174740c9c8c4401024684b9ca3b7cc5e49852f"},
    {file = "mypy-1.20.2-cp313-cp313-win_arm64.whl", hash = "sha256:a95a9248b0c6fd933a442c03c3b113c3b61320086b88e2c444676d3fd1ca3330"},
    {file = "mypy-1.20.2-cp314-cp314-macosx_10_15_x86_64.whl", hash = "sha256:419413398fe250aae057fd2fe50166b61077083c9b82754c341cf4fd73038f30"},
    {file = "mypy-1.20.2-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:e73c07f23009962885c197ccb9b41356a30cc0e5a1d0c2ea8fd8fb1362d7f924"},
    {file = "mypy-1.20.2-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:0c64e5973df366b747646fc98da921f9d6eba9716d57d1db94a83c026a08e0fb"},
    {file = "mypy-1.20.2-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:5a65aa591af023864fd08a97da9974e919452cfe19cb146c8a5dc692626445dc"},
    {file = "mypy-1.20.2-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:4fef51b01e638974a6e69885687e9bd40c8d1e09a6cd291cca0619625cf1f558"},
    {file = "mypy-1.20.2-cp314-cp314-win_amd64.whl", hash = "sha256:913485a03f1bcf5d279409a9d2b9ed565c151f61c09f29991e5faa14033da4c8"},
|
||||
{file = "mypy-1.20.2-cp314-cp314-win_arm64.whl", hash = "sha256:c3bae4f855d965b5453784300c12ffc63a548304ac7f99e55d4dc7c898673aa3"},
|
||||
{file = "mypy-1.20.2-cp314-cp314t-macosx_10_15_x86_64.whl", hash = "sha256:2de3dcea53babc1c3237a19002bc3d228ce1833278f093b8d619e06e7cc79609"},
|
||||
{file = "mypy-1.20.2-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:52b176444e2e5054dfcbcb8c75b0b719865c96247b37407184bbfca5c353f2c2"},
|
||||
{file = "mypy-1.20.2-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:688c3312e5dadb573a2c69c82af3a298d43ecf9e6d264e0f95df960b5f6ac19c"},
|
||||
{file = "mypy-1.20.2-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:29752dbbf8cc53f89f6ac096d363314333045c257c9c75cbd189ca2de0455744"},
|
||||
{file = "mypy-1.20.2-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:803203d2b6ea644982c644895c2f78b28d0e208bba7b27d9b921e0ec5eb207c6"},
|
||||
{file = "mypy-1.20.2-cp314-cp314t-win_amd64.whl", hash = "sha256:9bcb8aa397ff0093c824182fd76a935a9ba7ad097fcbef80ae89bf6c1731d8ec"},
|
||||
{file = "mypy-1.20.2-cp314-cp314t-win_arm64.whl", hash = "sha256:e061b58443f1736f8a37c48978d7ab581636d6ab03e3d4f99e3fa90463bb9382"},
|
||||
{file = "mypy-1.20.2-py3-none-any.whl", hash = "sha256:a94c5a76ab46c5e6257c7972b6c8cff0574201ca7dc05647e33e795d78680563"},
|
||||
{file = "mypy-1.20.2.tar.gz", hash = "sha256:e8222c26daaafd9e8626dec58ae36029f82585890589576f769a650dd20fd665"},
|
||||
    {file = "mypy-1.20.1-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:3ba5d1e712ada9c3b6223dcbc5a31dac334ed62991e5caa17bcf5a4ddc349af0"},
    {file = "mypy-1.20.1-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:2e731284c117b0987fb1e6c5013a56f33e7faa1fce594066ab83876183ce1c66"},
    {file = "mypy-1.20.1-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:f8e945b872a05f4fbefabe2249c0b07b6b194e5e11a86ebee9edf855de09806c"},
    {file = "mypy-1.20.1-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:2fc88acef0dc9b15246502b418980478c1bfc9702057a0e1e7598d01a7af8937"},
    {file = "mypy-1.20.1-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:14911a115c73608f155f648b978c5055d16ff974e6b1b5512d7fedf4fa8b15c6"},
    {file = "mypy-1.20.1-cp310-cp310-win_amd64.whl", hash = "sha256:76d9b4c992cca3331d9793ef197ae360ea44953cf35beb2526e95b9e074f2866"},
    {file = "mypy-1.20.1-cp310-cp310-win_arm64.whl", hash = "sha256:b408722f80be44845da555671a5ef3a0c63f51ca5752b0c20e992dc9c0fbd3cd"},
    {file = "mypy-1.20.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:c01eb9bac2c6a962d00f9d23421cd2913840e65bba365167d057bd0b4171a92e"},
    {file = "mypy-1.20.1-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:55d12ddbd8a9cac5b276878bd534fa39fff5bf543dc6ae18f25d30c8d7d27fca"},
    {file = "mypy-1.20.1-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:c0aa322c1468b6cdfc927a44ce130f79bb44bcd34eb4a009eb9f96571fd80955"},
    {file = "mypy-1.20.1-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:3f8bc95899cf676b6e2285779a08a998cc3a7b26f1026752df9d2741df3c79e8"},
    {file = "mypy-1.20.1-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:47c2b90191a870a04041e910277494b0d92f0711be9e524d45c074fe60c00b65"},
    {file = "mypy-1.20.1-cp311-cp311-win_amd64.whl", hash = "sha256:9857dc8d2ec1a392ffbda518075beb00ac58859979c79f9e6bdcb7277082c2f2"},
    {file = "mypy-1.20.1-cp311-cp311-win_arm64.whl", hash = "sha256:09d8df92bb25b6065ab91b178da843dda67b33eb819321679a6e98a907ce0e10"},
    {file = "mypy-1.20.1-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:36ee2b9c6599c230fea89bbd79f401f9f9f8e9fcf0c777827789b19b7da90f51"},
    {file = "mypy-1.20.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:fba3fb0968a7b48806b0c90f38d39296f10766885a94c83bd21399de1e14eb28"},
    {file = "mypy-1.20.1-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:ef1415a637cd3627d6304dfbeddbadd21079dafc2a8a753c477ce4fc0c2af54f"},
    {file = "mypy-1.20.1-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:ef3461b1ad5cd446e540016e90b5984657edda39f982f4cc45ca317b628f5a37"},
    {file = "mypy-1.20.1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:542dd63c9e1339b6092eb25bd515f3a32a1453aee8c9521d2ddb17dacd840237"},
    {file = "mypy-1.20.1-cp312-cp312-win_amd64.whl", hash = "sha256:1d55c7cd8ca22e31f93af2a01160a9e95465b5878de23dba7e48116052f20a8d"},
    {file = "mypy-1.20.1-cp312-cp312-win_arm64.whl", hash = "sha256:f5b84a79070586e0d353ee07b719d9d0a4aa7c8ee90c0ea97747e98cbe193019"},
    {file = "mypy-1.20.1-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:8f3886c03e40afefd327bd70b3f634b39ea82e87f314edaa4d0cce4b927ddcc1"},
    {file = "mypy-1.20.1-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:e860eb3904f9764e83bafd70c8250bdffdc7dde6b82f486e8156348bf7ceb184"},
    {file = "mypy-1.20.1-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:a4b5aac6e785719da51a84f5d09e9e843d473170a9045b1ea7ea1af86225df4b"},
    {file = "mypy-1.20.1-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:f37b6cd0fe2ad3a20f05ace48ca3523fc52ff86940e34937b439613b6854472e"},
    {file = "mypy-1.20.1-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:e4bbb0f6b54ce7cc350ef4a770650d15fa70edd99ad5267e227133eda9c94218"},
    {file = "mypy-1.20.1-cp313-cp313-win_amd64.whl", hash = "sha256:c3dc20f8ec76eecd77148cdd2f1542ed496e51e185713bf488a414f862deb8f2"},
    {file = "mypy-1.20.1-cp313-cp313-win_arm64.whl", hash = "sha256:a9d62bbac5d6d46718e2b0330b25e6264463ed832722b8f7d4440ff1be3ca895"},
    {file = "mypy-1.20.1-cp314-cp314-macosx_10_15_x86_64.whl", hash = "sha256:12927b9c0ed794daedcf1dab055b6c613d9d5659ac511e8d936d96f19c087d12"},
    {file = "mypy-1.20.1-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:752507dd481e958b2c08fc966d3806c962af5a9433b5bf8f3bdd7175c20e34fe"},
    {file = "mypy-1.20.1-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:c614655b5a065e56274c6cbbe405f7cf7e96c0654db7ba39bc680238837f7b08"},
    {file = "mypy-1.20.1-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:2c3f6221a76f34d5100c6d35b3ef6b947054123c3f8d6938a4ba00b1308aa572"},
    {file = "mypy-1.20.1-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:4bdfc06303ac06500af71ea0cdbe995c502b3c9ba32f3f8313523c137a25d1b6"},
    {file = "mypy-1.20.1-cp314-cp314-win_amd64.whl", hash = "sha256:0131edd7eba289973d1ba1003d1a37c426b85cdef76650cd02da6420898a5eb3"},
    {file = "mypy-1.20.1-cp314-cp314-win_arm64.whl", hash = "sha256:33f02904feb2c07e1fdf7909026206396c9deeb9e6f34d466b4cfedb0aadbbe4"},
    {file = "mypy-1.20.1-cp314-cp314t-macosx_10_15_x86_64.whl", hash = "sha256:168472149dd8cc505c98cefd21ad77e4257ed6022cd5ed2fe2999bed56977a5a"},
    {file = "mypy-1.20.1-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:eb674600309a8f22790cca883a97c90299f948183ebb210fbef6bcee07cb1986"},
    {file = "mypy-1.20.1-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:ef2b2e4cc464ba9795459f2586923abd58a0055487cbe558cb538ea6e6bc142a"},
    {file = "mypy-1.20.1-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:dee461d396dd46b3f0ed5a098dbc9b8860c81c46ad44fa071afcfbc149f167c9"},
    {file = "mypy-1.20.1-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:e364926308b3e66f1361f81a566fc1b2f8cd47fc8525e8136d4058a65a4b4f02"},
    {file = "mypy-1.20.1-cp314-cp314t-win_amd64.whl", hash = "sha256:a0c17fbd746d38c70cbc42647cfd884f845a9708a4b160a8b4f7e70d41f4d7fa"},
    {file = "mypy-1.20.1-cp314-cp314t-win_arm64.whl", hash = "sha256:db2cb89654626a912efda69c0d5c1d22d948265e2069010d3dde3abf751c7d08"},
    {file = "mypy-1.20.1-py3-none-any.whl", hash = "sha256:1aae28507f253fe82d883790d1c0a0d35798a810117c88184097fe8881052f06"},
    {file = "mypy-1.20.1.tar.gz", hash = "sha256:6fc3f4ecd52de81648fed1945498bf42fa2993ddfad67c9056df36ae5757f804"},
]

[package.dependencies]
@@ -2042,10 +2042,7 @@ librt = {version = ">=0.8.0", markers = "platform_python_implementation != \"PyP
mypy_extensions = ">=1.0.0"
pathspec = ">=1.0.0"
tomli = {version = ">=1.1.0", markers = "python_version < \"3.11\""}
typing_extensions = [
    {version = ">=4.6.0", markers = "python_version < \"3.15\""},
    {version = ">=4.14.0", markers = "python_version >= \"3.15\""},
]
typing_extensions = ">=4.6.0"

[package.extras]
dmypy = ["psutil (>=4.0)"]

@@ -1,7 +1,12 @@
import asyncio
import logging

from mock import Mock
import pytest

from maigret import search
from maigret.checking import check_site_for_username, process_site_result
from maigret.result import MaigretCheckResult, MaigretCheckStatus


def site_result_except(server, username, **kwargs):
@@ -67,3 +72,70 @@ async def test_checking_by_message_negative(httpserver, local_test_db):

    result = await search('unclaimed', site_dict=sites_dict, logger=Mock())
    assert result['Message']['status'].is_found() is True


def test_process_site_result_threads_response_time(local_test_db):
    """process_site_result must thread the response_time kwarg into the result's query_time."""
    site = local_test_db.sites_dict['StatusCode']
    results_info = {
        'username': 'claimed',
        'parsing_enabled': False,
        'url_user': site.url.replace('{username}', 'claimed'),
        'status': None,
        'rank': 0,
        'url_main': site.url_main,
        'ids_data': {},
    }
    response = ('body', 200, None)
    logger = logging.getLogger('test')
    query_notify = Mock()

    out = process_site_result(
        response, query_notify, logger, results_info, site,
        response_time=1.234,
    )
    assert out['status'].query_time == pytest.approx(1.234)


def test_process_site_result_defaults_response_time_to_none(local_test_db):
    """Omitting response_time keeps query_time as None (backward compatible)."""
    site = local_test_db.sites_dict['StatusCode']
    results_info = {
        'username': 'claimed',
        'parsing_enabled': False,
        'url_user': site.url.replace('{username}', 'claimed'),
        'status': None,
        'rank': 0,
        'url_main': site.url_main,
        'ids_data': {},
    }
    out = process_site_result(
        ('body', 200, None), Mock(), logging.getLogger('test'), results_info, site,
    )
    assert out['status'].query_time is None


@pytest.mark.slow
@pytest.mark.asyncio
async def test_query_time_populated_from_http_check(httpserver, local_test_db):
    """check_site_for_username measures HTTP round-trip and populates query_time."""
    sites_dict = local_test_db.sites_dict

    # Delay the response on the test HTTP server to produce a measurable query_time.
    DELAY = 0.25

    def delayed_handler(request):
        import time as _time
        _time.sleep(DELAY)
        from werkzeug.wrappers import Response
        return Response('ok', status=200)

    httpserver.expect_request('/url', query_string='id=claimed').respond_with_handler(delayed_handler)

    result = await search('claimed', site_dict={'StatusCode': sites_dict['StatusCode']}, logger=Mock())
    status = result['StatusCode']['status']
    assert status.is_found() is True
    assert isinstance(status.query_time, float)
    assert status.query_time >= DELAY
    # Upper bound: the measurement should not wildly exceed the server delay.
    assert status.query_time < DELAY + 5.0

@@ -51,7 +51,6 @@ DEFAULT_ARGS: Dict[str, Any] = {
    'md': False,
    'no_autoupdate': False,
    'force_update': False,
    'extra_db_files': [],
}


@@ -127,38 +126,6 @@ def test_args_exclude_tags(argparser):
        assert getattr(args, arg) == want_args[arg]


def test_args_single_extra_db(argparser):
    args = argparser.parse_args('--extra-db extras.json username'.split())

    want_args = dict(DEFAULT_ARGS)
    want_args.update(
        {
            'extra_db_files': ['extras.json'],
            'username': ['username'],
        }
    )

    for arg in vars(args):
        assert getattr(args, arg) == want_args[arg]


def test_args_multiple_extra_dbs(argparser):
    args = argparser.parse_args(
        '--extra-db a.json --extra-db https://example.com/b.json username'.split()
    )

    want_args = dict(DEFAULT_ARGS)
    want_args.update(
        {
            'extra_db_files': ['a.json', 'https://example.com/b.json'],
            'username': ['username'],
        }
    )

    for arg in vars(args):
        assert getattr(args, arg) == want_args[arg]


def test_args_tags_with_exclude_tags(argparser):
    args = argparser.parse_args('--tags coding --exclude-tags porn username'.split())


@@ -1,6 +1,5 @@
"""Maigret Database test functions"""

import json
from typing import Any, Dict

from maigret.sites import MaigretDatabase, MaigretSite
@@ -97,163 +96,6 @@ def test_site_strip_engine_data_with_site_prior_updates():
    assert amperka_stripped.json == UPDATED_EXAMPLE_DB['sites']['Amperka']


def _write_db(tmp_path, name, data):
    p = tmp_path / name
    p.write_text(json.dumps(data), encoding='utf-8')
    return str(p)


def test_extra_db_new_site(tmp_path):
    db = MaigretDatabase()
    db.load_from_json(EXAMPLE_DB)
    assert len(db.sites) == 1

    extra = {
        'engines': {},
        'sites': {
            'ExampleExtra': {
                'tags': ['us'],
                'checkType': 'status_code',
                'url': 'https://example.com/{username}',
                'urlMain': 'https://example.com/',
                'usernameClaimed': 'test',
                'usernameUnclaimed': 'noonewouldeverusethis7',
            }
        },
        'tags': ['us'],
    }
    db.load_extra_from_path(_write_db(tmp_path, 'extra.json', extra))

    assert len(db.sites) == 2
    assert set(db.sites_dict.keys()) == {'Amperka', 'ExampleExtra'}
    assert len(db._sites) == len(db.sites_dict)


def test_extra_db_site_override_last_wins(tmp_path):
    db = MaigretDatabase()
    db.load_from_json(EXAMPLE_DB)
    assert db.sites_dict['Amperka'].url_main == 'http://forum.amperka.ru'

    extra = {
        'engines': {},
        'sites': {
            'Amperka': {
                'engine': 'XenForo',
                'rank': 1,
                'tags': ['overridden'],
                'urlMain': 'https://overridden.example',
                'usernameClaimed': 'adam',
                'usernameUnclaimed': 'noonewouldeverusethis7',
            }
        },
        'tags': [],
    }
    db.load_extra_from_path(_write_db(tmp_path, 'extra.json', extra))

    assert len(db.sites) == 1
    amperka = db.sites_dict['Amperka']
    assert amperka.url_main == 'https://overridden.example'
    assert 'overridden' in amperka.tags


def test_extra_db_engine_override(tmp_path):
    main = {
        'engines': {
            'Proto': {
                'presenseStrs': ['orig'],
                'site': {
                    'absenceStrs': ['original absence'],
                    'checkType': 'message',
                    'url': '{urlMain}/orig/{username}',
                },
            }
        },
        'sites': {
            'MainSite': {
                'engine': 'Proto',
                'rank': 1,
                'tags': [],
                'urlMain': 'https://main.example',
                'usernameClaimed': 'a',
                'usernameUnclaimed': 'noonewouldeverusethis7',
            }
        },
        'tags': [],
    }
    db = MaigretDatabase()
    db.load_from_json(main)

    extra = {
        'engines': {
            'Proto': {
                'presenseStrs': ['overridden'],
                'site': {
                    'absenceStrs': ['overridden absence'],
                    'checkType': 'message',
                    'url': '{urlMain}/overridden/{username}',
                },
            }
        },
        'sites': {
            'ExtraSite': {
                'engine': 'Proto',
                'rank': 10,
                'tags': [],
                'urlMain': 'https://extra.example',
                'usernameClaimed': 'a',
                'usernameUnclaimed': 'noonewouldeverusethis7',
            }
        },
        'tags': [],
    }
    db.load_extra_from_path(_write_db(tmp_path, 'extra.json', extra))

    assert len(db._engines) == 1
    assert db.engines_dict['Proto'].presenseStrs == ['overridden']
    extra_site = db.sites_dict['ExtraSite']
    assert extra_site.absence_strs == ['overridden absence']
    main_site = db.sites_dict['MainSite']
    assert main_site.absence_strs == ['original absence']


def test_extra_db_tag_dedup(tmp_path):
    db = MaigretDatabase()
    db.load_from_json({'engines': {}, 'sites': {}, 'tags': ['forum', 'ru']})

    extra = {'engines': {}, 'sites': {}, 'tags': ['forum', 'us']}
    db.load_extra_from_path(_write_db(tmp_path, 'extra.json', extra))

    assert db._tags.count('forum') == 1
    assert sorted(db._tags) == ['forum', 'ru', 'us']


def test_extra_db_chain_last_wins(tmp_path):
    db = MaigretDatabase()
    db.load_from_json(EXAMPLE_DB)

    def site_with_url(url):
        return {
            'engines': {},
            'sites': {
                'Amperka': {
                    'engine': 'XenForo',
                    'rank': 1,
                    'tags': ['ru'],
                    'urlMain': url,
                    'usernameClaimed': 'adam',
                    'usernameUnclaimed': 'noonewouldeverusethis7',
                }
            },
            'tags': [],
        }

    db.load_extra_from_path(_write_db(tmp_path, 'a.json', site_with_url('https://a')))
    db.load_extra_from_path(_write_db(tmp_path, 'b.json', site_with_url('https://b')))

    assert len(db.sites) == 1
    assert db.sites_dict['Amperka'].url_main == 'https://b'


def test_saving_site_error():
    db = MaigretDatabase()
