Compare commits

..

1 Commits

Author SHA1 Message Date
Soxoj 5830c9ce72 Add response time measurement 2026-04-21 02:02:24 +02:00
27 changed files with 2283 additions and 3705 deletions
+10 -48
View File
@@ -2,7 +2,7 @@ name: Build docker image and push to DockerHub
on:
push:
branches: [ main, dev ]
branches: [ main ]
jobs:
docker:
@@ -10,62 +10,24 @@ jobs:
steps:
-
name: Set up QEMU
uses: docker/setup-qemu-action@v3
uses: docker/setup-qemu-action@v1
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
uses: docker/setup-buildx-action@v1
-
name: Login to DockerHub
uses: docker/login-action@v3
uses: docker/login-action@v1
with:
username: ${{ secrets.DOCKER_HUB_USERNAME }}
password: ${{ secrets.DOCKER_HUB_ACCESS_TOKEN }}
-
name: Extract metadata (CLI)
id: meta_cli
uses: docker/metadata-action@v5
with:
images: ${{ secrets.DOCKER_HUB_USERNAME }}/maigret
tags: |
type=raw,value=latest,enable={{is_default_branch}}
type=ref,event=branch
type=sha,prefix=
-
name: Extract metadata (Web UI)
id: meta_web
uses: docker/metadata-action@v5
with:
images: ${{ secrets.DOCKER_HUB_USERNAME }}/maigret
tags: |
type=raw,value=web,enable={{is_default_branch}}
type=ref,event=branch,suffix=-web
type=sha,prefix=web-
-
name: Build and push (CLI, default)
id: docker_build_cli
uses: docker/build-push-action@v6
name: Build and push
id: docker_build
uses: docker/build-push-action@v2
with:
push: true
target: cli
tags: ${{ steps.meta_cli.outputs.tags }}
labels: ${{ steps.meta_cli.outputs.labels }}
tags: ${{ secrets.DOCKER_HUB_USERNAME }}/maigret:latest
platforms: linux/amd64,linux/arm64
cache-from: type=gha
cache-to: type=gha,mode=max
-
name: Build and push (Web UI)
id: docker_build_web
uses: docker/build-push-action@v6
with:
push: true
target: web
tags: ${{ steps.meta_web.outputs.tags }}
labels: ${{ steps.meta_web.outputs.labels }}
platforms: linux/amd64,linux/arm64
cache-from: type=gha
cache-to: type=gha,mode=max
-
name: Image digests
run: |
echo "cli: ${{ steps.docker_build_cli.outputs.digest }}"
echo "web: ${{ steps.docker_build_web.outputs.digest }}"
name: Image digest
run: echo ${{ steps.docker_build.outputs.digest }}
-1
View File
@@ -44,4 +44,3 @@ settings.json
*.egg-info
build
LLM
lib
+1 -10
View File
@@ -1,4 +1,4 @@
FROM python:3.11-slim AS base
FROM python:3.11-slim
LABEL maintainer="Soxoj <soxoj@protonmail.com>"
WORKDIR /app
RUN pip install --no-cache-dir --upgrade pip
@@ -15,13 +15,4 @@ COPY . .
RUN YARL_NO_EXTENSIONS=1 python3 -m pip install --no-cache-dir .
# For production use, set FLASK_HOST to a specific IP address for security
ENV FLASK_HOST=0.0.0.0
# Web UI variant: auto-launches the web interface on $PORT
FROM base AS web
ENV PORT=5000
EXPOSE 5000
ENTRYPOINT ["sh", "-c", "exec maigret --web \"$PORT\""]
# Default variant (last stage = `docker build .` target): CLI, backwards-compatible
FROM base AS cli
ENTRYPOINT ["maigret"]
+4 -16
View File
@@ -109,7 +109,7 @@ Download a standalone EXE from [Releases](https://github.com/soxoj/maigret/relea
Run Maigret in the browser via cloud shells or Jupyter notebooks:
<a href="https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/soxoj/maigret&tutorial=cloudshell-tutorial.md"><img src="https://user-images.githubusercontent.com/27065646/92304704-8d146d80-ef80-11ea-8c29-0deaabb1c702.png" alt="Open in Cloud Shell" height="50"></a>
[![Open in Cloud Shell](https://user-images.githubusercontent.com/27065646/92304704-8d146d80-ef80-11ea-8c29-0deaabb1c702.png)](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/soxoj/maigret&tutorial=README.md)
<a href="https://repl.it/github/soxoj/maigret"><img src="https://replit.com/badge/github/soxoj/maigret" alt="Run on Replit" height="50"></a>
<a href="https://colab.research.google.com/gist/soxoj/879b51bc3b2f8b695abb054090645000/maigret-collab.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" height="45"></a>
@@ -140,27 +140,15 @@ maigret username
### Docker
Two image variants are published:
- `soxoj/maigret:latest` — CLI mode (default)
- `soxoj/maigret:web` — auto-launches the [web interface](#web-interface)
```bash
# official image (CLI)
# official image
docker pull soxoj/maigret
# CLI usage
# usage
docker run -v /mydir:/app/reports soxoj/maigret:latest username --html
# Web UI (open http://localhost:5000)
docker run -p 5000:5000 soxoj/maigret:web
# Web UI on a custom port
docker run -e PORT=8080 -p 8080:8080 soxoj/maigret:web
# manual build
docker build -t maigret . # CLI image (default target)
docker build --target web -t maigret-web . # Web UI image
docker build -t maigret .
```
### Troubleshooting
-69
View File
@@ -1,69 +0,0 @@
# Maigret
<div align="center">
<img src="https://raw.githubusercontent.com/soxoj/maigret/main/static/maigret.png" height="220" alt="Maigret logo"/>
</div>
**Maigret** collects a dossier on a person **by username only**, checking for accounts on a huge number of sites and gathering all the available information from web pages. No API keys required.
## Installation
Google Cloud Shell does not ship with all the system libraries Maigret needs (`libcairo2-dev`, `pkg-config`). The helper script below installs them and then builds Maigret from the cloned source.
Copy the command and run it in the Cloud Shell terminal:
```bash
./utils/cloudshell_install.sh
```
When the script finishes, verify the install:
```bash
maigret --version
```
## Usage examples
Run a basic search for a username. By default Maigret checks the **500 highest-ranked sites by traffic** — pass `-a` to scan the full 3,000+ database.
```bash
maigret soxoj
```
Search several usernames at once:
```bash
maigret user1 user2 user3
```
Narrow the run to sites related to cryptocurrency via the `crypto` tag (you can also use country tags):
```bash
maigret vitalik.eth --tags crypto
```
Generate reports in HTML, PDF, and XMind 8 formats:
```bash
maigret soxoj --html
maigret soxoj --pdf
maigret soxoj --xmind
```
Download a generated report from Cloud Shell to your local machine:
```bash
cloudshell download reports/report_soxoj.pdf
```
Tune reliability on flaky networks — raise the timeout and retry failed checks:
```bash
maigret soxoj --timeout 60 --retries 2
```
For the full list of options see `maigret --help` or the [CLI documentation](https://maigret.readthedocs.io/en/latest/command-line-options.html).
## Further reading
Full project documentation: [maigret.readthedocs.io](https://maigret.readthedocs.io/)
+5 -21
View File
@@ -142,30 +142,14 @@ There are few options for sites data.json helpful in various cases:
``protection`` (site protection tracking)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The ``protection`` field records what kind of anti-bot protection a site uses. Maigret reads this field and automatically applies the appropriate bypass mechanism where one exists.
Two categories of tag:
- **Load-bearing.** Maigret changes its HTTP client or headers based on the tag. Currently only ``tls_fingerprint`` (switches to ``curl_cffi`` with Chrome-class TLS).
- **Documentation-only.** Maigret does **not** change behavior based on the tag; it records *why* the site is hard so a future solver can target the right set of sites without re-auditing.
Within the documentation-only tags, there is a further split that dictates whether the site is ``disabled: true``:
- ``ip_reputation`` is the **only** doc-tag that **keeps the site enabled**. It means "works for most users, fails from datacenter/cloud IPs." Disabling would silently hide a working site from anyone with a clean IP. The fix is **external** to Maigret (residential IP or ``--proxy``).
- ``cf_js_challenge``, ``cf_firewall``, ``aws_waf_js_challenge``, ``ddos_guard_challenge``, ``custom_bot_protection``, ``js_challenge`` all pair with ``disabled: true``. They mean "does not work for anyone right now"; the tag identifies the provider so that when a bypass ships, every site with that tag can be re-enabled in one pass.
The ``protection`` field records what kind of anti-bot protection a site uses. Maigret reads this field and automatically applies the appropriate bypass mechanism.
Supported values:
- ``tls_fingerprint`` *(load-bearing; site stays enabled)* — the site fingerprints the TLS handshake (JA3/JA4) and blocks non-browser clients. Maigret automatically uses ``curl_cffi`` with Chrome browser emulation to bypass this. Requires the ``curl_cffi`` package (included as a dependency). Examples: Instagram, NPM, Codepen, Kickstarter, Letterboxd.
- ``ip_reputation`` *(documentation-only; site stays enabled)* — the site blocks requests from datacenter/cloud IPs regardless of headers or TLS. Cannot be bypassed automatically; run Maigret from a regular internet connection (not a datacenter) or use a proxy (``--proxy``). The site is **not** marked ``disabled`` because it continues to work for users on residential IPs. Examples: Reddit, Patreon, Figma, OnlyFans.
- ``cf_js_challenge`` *(documentation-only; pair with ``disabled: true``)* — Cloudflare Managed Challenge / Turnstile JS challenge. Symptom: HTTP 403 with ``cf-mitigated: challenge`` header; body contains ``challenges.cloudflare.com``, ``_cf_chl_opt``, ``window._cf_chl``, or "Just a moment". Not bypassable via ``curl_cffi`` TLS impersonation (verified across Chrome 123/124/131, Safari 17/18, Firefox 133/135, Edge 101 — all return the same 403 challenge page); a real browser executing the challenge JS is required to obtain the clearance cookie. Sites stay ``disabled: true`` until a CF-challenge solver is integrated. Examples: DMOJ, Elakiri, Fanlore, Bdoutdoors, TheStudentRoom, forum.hr.
- ``cf_firewall`` *(documentation-only; pair with ``disabled: true``)* — Cloudflare firewall rule / bot score block (WAF action=block, **not** action=challenge). Symptom: HTTP 403 served by Cloudflare (``server: cloudflare``, ``cf-ray`` header) **without** JS-challenge markers — body typically shows "Access denied", "Attention Required", or just a bare 1015/1016/1020 error page. Unlike ``ip_reputation``, residential IPs are **not** sufficient to bypass — Cloudflare decides based on a composite of bot score, TLS fingerprint, UA, ASN, and custom site-owner rules, so ``curl_cffi`` Chrome impersonation from a residential line still returns 403. Sites stay ``disabled: true`` until a per-site bypass (cookies, real browser, or residential+clean session) is found. Examples: Fark, Fodors, Huntingnet, Hunttalk.
- ``aws_waf_js_challenge`` *(documentation-only; pair with ``disabled: true``)* — the site is protected by AWS WAF with a JavaScript challenge. Symptom: HTTP 202 with empty body and ``x-amzn-waf-action: challenge`` header (a token-granting challenge that requires executing the CAPTCHA/challenge JS bundle). Neither ``curl_cffi`` TLS impersonation nor User-Agent changes bypass this — a real browser or the official AWS WAF challenge-solver SDK is required. Sites stay ``disabled: true`` until a solver is integrated. Example: Dreamwidth.
- ``ddos_guard_challenge`` *(documentation-only; pair with ``disabled: true``)* — DDoS-Guard (ddos-guard.net) anti-bot page. Symptom: HTTP 403 with ``server: ddos-guard`` header; body contains "DDoS-Guard". DDoS-Guard fingerprints different UAs per source IP, so a single User-Agent override does not work across environments; a JS-capable bypass or DDoS-Guard-aware solver is required. Sites stay ``disabled: true`` until a solver is integrated. Example: ForumHouse.
- ``js_challenge`` *(documentation-only; pair with ``disabled: true``)***fallback** for JavaScript-challenge systems whose provider cannot be identified (custom in-house challenge pages that are not Cloudflare, AWS WAF, or any other recognized vendor). Prefer a provider-specific tag whenever the provider can be pinned down from response headers or body signatures.
- ``custom_bot_protection`` *(documentation-only; pair with ``disabled: true``)***fallback** for non-JS-challenge bot protection served by a custom/in-house system (not Cloudflare, not AWS WAF, not DDoS-Guard). Typical symptom: HTTP 403 from the site's own origin server (``server: nginx``, AWS ELB, etc.) with a branded block page, returned regardless of TLS fingerprint or residential IP. Not generically bypassable; investigate per site (cookies, session, proxy geography). Examples: Hackerearth ("HackerEarth Guardian"), FreelanceJob (nginx-level block).
**Rule: prefer provider-specific protection tags.** When a site is blocked by an identifiable anti-bot vendor, always record the vendor in the tag (``cf_js_challenge``, ``cf_firewall``, ``aws_waf_js_challenge``, ``ddos_guard_challenge``, and future additions such as ``sucuri_challenge``, ``incapsula_challenge``). The generic ``js_challenge`` and ``custom_bot_protection`` tags are reserved for custom/unknown systems. Rationale: bypass solvers are inherently provider-specific (a Cloudflare Turnstile solver does not help with AWS WAF); recording the provider in advance lets us fan out fixes the moment a per-provider solver is added, without re-auditing every disabled site. The same principle applies to other protection categories when the provider is identifiable.
- ``tls_fingerprint`` — the site fingerprints the TLS handshake (JA3/JA4) and blocks non-browser clients. Maigret automatically uses ``curl_cffi`` with Chrome browser emulation to bypass this. Requires the ``curl_cffi`` package (included as a dependency). Examples: Instagram, NPM, Codepen, Kickstarter, Letterboxd.
- ``ip_reputation`` — the site blocks requests from datacenter/cloud IPs regardless of headers or TLS. Cannot be bypassed automatically; run Maigret from a regular internet connection (not a datacenter) or use a proxy (``--proxy``). Examples: Reddit, Patreon, Figma.
- ``js_challenge`` — the site serves a JavaScript challenge page (e.g. "Just a moment...") that cannot be solved without a browser. Maigret detects challenge signatures and returns UNKNOWN instead of a false positive.
- ``aws_waf_js_challenge`` — the site is protected by AWS WAF with a JavaScript challenge. Symptom: HTTP 202 with empty body and ``x-amzn-waf-action: challenge`` header (a token-granting challenge that requires executing the CAPTCHA/challenge JS bundle). Neither ``curl_cffi`` TLS impersonation nor User-Agent changes bypass this — a real browser or the official AWS WAF challenge-solver SDK is required. Currently marked for documentation only; sites using this protection stay ``disabled: true`` until a solver is integrated. Example: Dreamwidth.
Example:
-6
View File
@@ -46,9 +46,3 @@ You may be interested in:
tags
settings
development
.. toctree::
:hidden:
:caption: Use cases
use-cases/crypto
-147
View File
@@ -1,147 +0,0 @@
.. _use-case-crypto:
Cryptocurrency & Web3 Investigations
=====================================
Blockchain transactions are public, but the people behind wallets are not. Maigret helps bridge this gap by finding Web3 accounts tied to a username, revealing the person behind a pseudonymous crypto persona.
Why it matters
--------------
Crypto investigations often start with a wallet address or an ENS name but hit a wall — the blockchain tells you *what* happened, not *who* did it. A username, however, is reused across platforms. If someone trades on OpenSea as ``zachxbt`` and posts on Warpcast as ``zachxbt``, Maigret connects the dots and builds a full profile.
Common scenarios:
- **Scam attribution.** A rug-pull promoter uses the same alias on Fragment (Telegram username marketplace), OpenSea, and a personal blog.
- **Sanctions compliance.** Verifying whether a counterparty's online footprint matches known sanctioned individuals.
- **Due diligence.** Before an OTC deal or DAO vote, checking whether the other party has a consistent online presence or is a freshly created sockpuppet.
- **Stolen funds tracing.** A stolen NFT appears on OpenSea under a new account — but the username matches a Warpcast profile with real-world links.
Supported sites
---------------
Maigret currently checks the following crypto and Web3 platforms:
.. list-table::
:header-rows: 1
:widths: 20 40 40
* - Site
- What it reveals
- Notes
* - **OpenSea**
- NFT collections, trading history, profile bio, linked website
-
* - **Rarible**
- NFT marketplace profile, collections, listing history
- Complements OpenSea for NFT attribution across marketplaces
* - **Zora**
- Zora Network profile, minted NFTs, creator activity
- Ethereum L2 creator platform; useful for on-chain art attribution
* - **Polymarket**
- Prediction-market profile, positions, public portfolio P&L
- Useful for political/financial prediction attribution
* - **Warpcast** (Farcaster)
- Decentralized social profile, posts, follower graph, Farcaster ID
- Every Farcaster ID maps to an Ethereum address via the on-chain ID registry
* - **Fragment**
- Telegram username ownership, TON wallet address, purchase date and price
- Valuable for linking Telegram identities to TON wallets
* - **Paragraph**
- Web3 blog/newsletter, ETH wallet address, linked Twitter handle
- Richest cross-platform data among crypto sites
* - **Tonometerbot**
- TON wallet balance, subscriber count, NFT collection, rankings
- TON blockchain analytics
* - **Spatial**
- Metaverse profile, linked social accounts (Discord, Twitter, Instagram, LinkedIn, TikTok)
- Rich cross-platform links
* - **Revolut.me**
- Payment handle: first/last name, country code, base currency, supported payment methods
- Not strictly Web3, but widely used by crypto OTC traders for fiat off-ramps; the public API returns structured KYC-adjacent data
Real-world example: zachxbt
---------------------------
`ZachXBT <https://twitter.com/zachxbt>`_ is a well-known on-chain investigator. Let's see what Maigret can find from just the username ``zachxbt``:
.. code-block:: console
maigret zachxbt --tags crypto
Maigret finds 5 accounts and automatically extracts structured data from each:
**Fragment** — confirms the Telegram username ``@zachxbt`` is claimed, reveals the TON wallet address (``EQBisZrk...``), purchase price (10 TON), and date (January 2023).
**Paragraph** — the richest result. Returns the real name used on the platform (``ZachXBT``), bio (``Scam survivor turned 2D investigator``), an Ethereum wallet address (``0x23dBf066...``), and a linked Twitter handle (``zachxbt``). The ``wallet_address`` field is especially valuable — it directly links the pseudonym to an on-chain identity.
**Warpcast** — Farcaster profile with a Farcaster ID (``fid: 20931``), profile image, and social graph (33K followers). Every Farcaster ID is tied to an Ethereum address via the on-chain ID registry, so this is another on-chain anchor.
**OpenSea** — NFT marketplace profile with bio (``On-chain sleuth | 10x rug pull survivor``), avatar (hosted on ``seadn.io`` with an Ethereum address in the URL path), and a link to an external investigations page.
**Hive Blog** — blockchain-based blog account created in March 2025. Low activity (1 post), but confirms the username is claimed across blockchain ecosystems.
From a single username, Maigret produces:
- **2 wallet addresses** — one TON (from Fragment), one Ethereum (from Paragraph)
- **1 confirmed Twitter handle**``zachxbt`` (from Paragraph)
- **1 Telegram username**``@zachxbt`` (from Fragment)
- **1 external link**``investigations.notion.site`` (from OpenSea)
- **Social graph data** — 33K Farcaster followers, blog activity timestamps
This is enough to pivot into blockchain analysis tools (Etherscan, Arkham, Nansen) using the wallet addresses, or into social media analysis using the Twitter handle.
Workflow: from username to wallet
---------------------------------
**Step 1: Search crypto platforms**
.. code-block:: console
maigret <username> --tags crypto -v
Review the results. Pay attention to:
- **Fragment** — if the username is claimed, you get a TON wallet address directly.
- **Paragraph** — blog profiles often contain an ETH address and a Twitter handle.
- **Warpcast** — Farcaster IDs map to Ethereum addresses via the on-chain registry.
- **OpenSea** — avatar URLs sometimes contain wallet addresses in the path.
**Step 2: Expand with extracted identifiers**
Maigret automatically extracts additional identifiers from found profiles (real names, linked accounts, profile URLs) and recursively searches for them. This is enabled by default. If Maigret finds a linked Twitter handle on a Paragraph profile, it will automatically search for that handle across all sites.
**Step 3: Cross-reference with non-crypto platforms**
The real power is connecting crypto personas to mainstream accounts. Drop the tag filter:
.. code-block:: console
maigret <username> -a
This checks all 3000+ sites. A match on GitHub, Reddit, or a forum can reveal the person behind the wallet.
Workflow: from wallet to identity
---------------------------------
If you start with a wallet address rather than a username, you can use complementary tools to get a username first:
1. **ENS / Unstoppable Domains** — resolve the wallet address to a human-readable name (``vitalik.eth``). Then search that name in Maigret.
2. **Etherscan labels** — check if the address has a public label (exchange, known entity).
3. **Fragment** — search the TON wallet address to find which Telegram usernames it purchased.
4. **Arkham Intelligence / Nansen** — blockchain attribution platforms that may tag the address with a known identity.
Once you have a username candidate, feed it to Maigret.
Tips
----
- **Username reuse is the #1 signal.** Crypto-native users often reuse their ENS name (``alice.eth``) or a variation (``alice_eth``, ``aliceeth``) across platforms. Try all variations.
- **Fragment is uniquely valuable** because it directly links Telegram usernames to TON wallet addresses — a rare on-chain / off-chain bridge.
- **Warpcast profiles are Ethereum-native.** Every Farcaster account is tied to an Ethereum address via the ID registry contract. If you find a Warpcast profile, you implicitly have a wallet address.
- **Paragraph often has the richest data** — wallet address, Twitter handle, bio, and activity timestamps in a single API response.
- **Use** ``--exclude-tags`` **to skip irrelevant sites** when you're focused on crypto:
.. code-block:: console
maigret alice_eth --exclude-tags porn,dating,forum
+3 -54
View File
@@ -7,7 +7,7 @@ from aiohttp import CookieJar
class ParsingActivator:
@staticmethod
def twitter(site, logger, cookies={}, **kwargs):
def twitter(site, logger, cookies={}):
headers = dict(site.headers)
del headers["x-guest-token"]
import requests
@@ -19,7 +19,7 @@ class ParsingActivator:
site.headers["x-guest-token"] = guest_token
@staticmethod
def vimeo(site, logger, cookies={}, **kwargs):
def vimeo(site, logger, cookies={}):
headers = dict(site.headers)
if "Authorization" in headers:
del headers["Authorization"]
@@ -31,58 +31,7 @@ class ParsingActivator:
site.headers["Authorization"] = "jwt " + jwt_token
@staticmethod
def onlyfans(site, logger, url=None, **kwargs):
# Signing rules (static_param / checksum_indexes / checksum_constant / format / app_token)
# live in data.json under OnlyFans.activation and rotate upstream every ~13 weeks.
# If "Please refresh the page" keeps firing after activation, refresh them from:
# https://raw.githubusercontent.com/DATAHOARDERS/dynamic-rules/main/onlyfans.json
import hashlib
import secrets
import time as _time
from urllib.parse import urlparse
import requests
act = site.activation
static_param = act["static_param"]
indexes = act["checksum_indexes"]
constant = act["checksum_constant"]
fmt = act["format"]
init_url = act["url"]
user_id = site.headers.get("user-id", "0") or "0"
def _sign(path):
t = str(int(_time.time() * 1000))
msg = "\n".join([static_param, t, path, user_id]).encode()
sha = hashlib.sha1(msg).hexdigest()
cs = sum(ord(sha[i]) for i in indexes) + constant
return t, fmt.format(sha, abs(cs))
if site.headers.get("x-bc", "").strip("0") == "":
site.headers["x-bc"] = secrets.token_hex(20)
if not site.headers.get("cookie"):
init_path = urlparse(init_url).path
t, sg = _sign(init_path)
hdrs = dict(site.headers)
hdrs["time"] = t
hdrs["sign"] = sg
hdrs.pop("cookie", None)
r = requests.get(init_url, headers=hdrs, timeout=15)
jar = "; ".join(f"{k}={v}" for k, v in r.cookies.items())
if jar:
site.headers["cookie"] = jar
logger.debug(f"OnlyFans init: got cookies {list(r.cookies.keys())}")
target_path = urlparse(url).path if url else urlparse(init_url).path
t, sg = _sign(target_path)
site.headers["time"] = t
site.headers["sign"] = sg
logger.debug(f"OnlyFans signed {target_path} time={t}")
@staticmethod
def weibo(site, logger, **kwargs):
def weibo(site, logger):
headers = dict(site.headers)
import requests
-158
View File
@@ -1,158 +0,0 @@
"""Maigret AI Analysis Module
Provides AI-powered analysis of search results using OpenAI-compatible APIs.
"""
import asyncio
import json
import os
import sys
import threading
import aiohttp
def load_ai_prompt() -> str:
"""Load the AI system prompt from the resources directory."""
maigret_path = os.path.dirname(os.path.realpath(__file__))
prompt_path = os.path.join(maigret_path, "resources", "ai_prompt.txt")
with open(prompt_path, "r", encoding="utf-8") as f:
return f.read()
def resolve_api_key(settings) -> str | None:
"""Resolve OpenAI API key from settings or environment variable.
Priority: settings.openai_api_key > OPENAI_API_KEY env var.
"""
key = getattr(settings, "openai_api_key", None)
if key:
return key
return os.environ.get("OPENAI_API_KEY")
class _Spinner:
"""Simple animated spinner for terminal output."""
FRAMES = ["", "", "", "", "", "", "", "", "", ""]
def __init__(self, text=""):
self.text = text
self._stop = threading.Event()
self._thread = None
def start(self):
self._thread = threading.Thread(target=self._spin, daemon=True)
self._thread.start()
def _spin(self):
i = 0
while not self._stop.is_set():
frame = self.FRAMES[i % len(self.FRAMES)]
sys.stderr.write(f"\r{frame} {self.text}")
sys.stderr.flush()
i += 1
self._stop.wait(0.08)
def stop(self):
self._stop.set()
if self._thread:
self._thread.join()
sys.stderr.write("\r\033[2K")
sys.stderr.flush()
async def print_streaming(text: str, delay: float = 0.04):
"""Print text word by word with a delay, simulating streaming LLM output."""
words = text.split(" ")
for i, word in enumerate(words):
if i > 0:
sys.stdout.write(" ")
sys.stdout.write(word)
sys.stdout.flush()
await asyncio.sleep(delay)
sys.stdout.write("\n")
sys.stdout.flush()
async def get_ai_analysis(
api_key: str,
markdown_report: str,
model: str = "gpt-4o",
api_base_url: str = "https://api.openai.com/v1",
) -> str:
"""Send the markdown report to an OpenAI-compatible API and return the analysis.
Uses streaming to display tokens as they arrive.
Raises on HTTP errors with descriptive messages.
"""
system_prompt = load_ai_prompt()
url = f"{api_base_url.rstrip('/')}/chat/completions"
headers = {
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json",
}
payload = {
"model": model,
"stream": True,
"messages": [
{"role": "system", "content": system_prompt},
{"role": "user", "content": markdown_report},
],
}
spinner = _Spinner("Analysing the data with AI...")
spinner.start()
first_token = True
full_response = []
try:
async with aiohttp.ClientSession() as session:
async with session.post(url, json=payload, headers=headers) as resp:
if resp.status == 401:
raise RuntimeError("Invalid OpenAI API key (HTTP 401)")
if resp.status == 429:
raise RuntimeError("OpenAI API rate limit exceeded (HTTP 429)")
if resp.status != 200:
body = await resp.text()
raise RuntimeError(
f"OpenAI API error (HTTP {resp.status}): {body[:500]}"
)
async for line in resp.content:
decoded = line.decode("utf-8").strip()
if not decoded or not decoded.startswith("data: "):
continue
data_str = decoded[len("data: "):]
if data_str == "[DONE]":
break
try:
chunk = json.loads(data_str)
except json.JSONDecodeError:
continue
delta = chunk.get("choices", [{}])[0].get("delta", {})
content = delta.get("content", "")
if not content:
continue
if first_token:
spinner.stop()
print()
first_token = False
sys.stdout.write(content)
sys.stdout.flush()
except Exception:
spinner.stop()
raise
if first_token:
# No tokens received — stop spinner anyway
spinner.stop()
print()
return "".join(full_response)
+16 -13
View File
@@ -6,6 +6,7 @@ import random
import re
import ssl
import sys
import time
from typing import Any, Dict, List, Optional, Tuple
from urllib.parse import quote
@@ -334,7 +335,12 @@ def debug_response_logging(url, html_text, status_code, check_error):
def process_site_result(
response, query_notify, logger, results_info: QueryResultWrapper, site: MaigretSite
response,
query_notify,
logger,
results_info: QueryResultWrapper,
site: MaigretSite,
response_time: Optional[float] = None,
):
if not response:
return results_info
@@ -345,11 +351,7 @@ def process_site_result(
username = results_info["username"]
is_parsing_enabled = results_info["parsing_enabled"]
url = results_info.get("url_user")
url_probe = results_info.get("url_probe") or url
if url_probe != url:
logger.info(f"{url_probe} (display: {url})")
else:
logger.info(url)
logger.info(url)
status = results_info.get("status")
if status is not None:
@@ -366,9 +368,6 @@ def process_site_result(
html_text, status_code, check_error = response
# TODO: add elapsed request time counting
response_time = None
if logger.level == logging.DEBUG:
debug_response_logging(url, html_text, status_code, check_error)
@@ -607,8 +606,6 @@ def make_site_result(
for k, v in site.get_params.items():
url_probe += f"&{k}={v}"
results_site["url_probe"] = url_probe
if site.request_method:
request_method = site.request_method.lower()
elif site.check_type == "status_code" and site.request_head_only:
@@ -673,7 +670,10 @@ async def check_site_for_username(
print(f"error, no checker for {site.name}")
return site.name, default_result
elapsed = 0.0
t0 = time.perf_counter()
response = await checker.check()
elapsed += time.perf_counter() - t0
html_text = response[0] if response and response[0] else ""
# Retry once after token-style activation (e.g. Twitter guest token refresh).
@@ -684,7 +684,7 @@ async def check_site_for_username(
method = act["method"]
try:
activate_fun = getattr(ParsingActivator(), method)
activate_fun(site, logger, url=checker.url)
activate_fun(site, logger)
except AttributeError as e:
logger.warning(
f"Activation method {method} for site {site.name} not found!",
@@ -706,10 +706,13 @@ async def check_site_for_username(
method=checker.method,
payload=getattr(checker, 'payload', None),
)
t1 = time.perf_counter()
response = await checker.check()
elapsed += time.perf_counter() - t1
response_result = process_site_result(
response, query_notify, logger, default_result, site
response, query_notify, logger, default_result, site,
response_time=elapsed,
)
query_notify.update(response_result['status'], site.similar_search)
+14 -80
View File
@@ -494,21 +494,6 @@ def setup_arguments_parser(settings: Settings):
" (one report per username).",
)
report_group.add_argument(
"--ai",
action="store_true",
dest="ai",
default=False,
help="Generate an AI-powered analysis of the search results using OpenAI API. "
"Requires OPENAI_API_KEY env var or openai_api_key in settings.",
)
report_group.add_argument(
"--ai-model",
dest="ai_model",
default=settings.openai_model,
help="OpenAI model to use for AI analysis (default: gpt-4o).",
)
parser.add_argument(
"--reports-sorting",
default=settings.report_sorting,
@@ -611,7 +596,6 @@ async def main():
print_found_only=not args.print_not_found,
skip_check_errors=not args.print_check_errors,
color=not args.no_color,
silent=args.ai,
)
# Create object with all information about sites we are aware of.
@@ -727,33 +711,17 @@ async def main():
+ get_dict_ascii_tree(usernames, prepend="\t")
)
if args.ai:
from .ai import resolve_api_key
if not resolve_api_key(settings):
query_notify.warning(
'AI analysis requires an OpenAI API key. '
'Set OPENAI_API_KEY environment variable or add '
'openai_api_key to settings.json.'
)
sys.exit(1)
if not site_data:
query_notify.warning('No sites to check, exiting!')
sys.exit(2)
if args.ai:
query_notify.warning(
f'Starting a search on top {len(site_data)} sites from the Maigret database...'
)
if not args.all_sites:
query_notify.warning(
f'Starting AI-assisted search on top {len(site_data)} sites from the Maigret database...'
'You can run search by full list of sites with flag `-a`', '!'
)
else:
query_notify.warning(
f'Starting a search on top {len(site_data)} sites from the Maigret database...'
)
if not args.all_sites:
query_notify.warning(
'You can run search by full list of sites with flag `-a`', '!'
)
already_checked = set()
general_results = []
@@ -806,12 +774,11 @@ async def main():
check_domains=args.with_domains,
)
if not args.ai:
errs = errors.notify_about_errors(
results, query_notify, show_statistics=args.verbose
)
for e in errs:
query_notify.warning(*e)
errs = errors.notify_about_errors(
results, query_notify, show_statistics=args.verbose
)
for e in errs:
query_notify.warning(*e)
if args.reports_sorting == "data":
results = sort_report_by_data_points(results)
@@ -900,43 +867,10 @@ async def main():
save_graph_report(filename, general_results, db)
query_notify.warning(f'Graph report on all usernames saved in {filename}')
if not args.ai:
text_report = get_plaintext_report(report_context)
if text_report:
query_notify.info('Short text report:')
print(text_report)
if args.ai:
from .ai import get_ai_analysis, resolve_api_key
from .report import generate_markdown_report
api_key = resolve_api_key(settings)
run_flags = []
if args.tags:
run_flags.append(f"--tags {args.tags}")
if args.site_list:
run_flags.append(f"--site {','.join(args.site_list)}")
if args.all_sites:
run_flags.append("--all-sites")
run_info = {
"sites_count": sum(len(d) for _, _, d in general_results),
"flags": " ".join(run_flags) if run_flags else None,
}
md_report = generate_markdown_report(report_context, run_info=run_info)
try:
await get_ai_analysis(
api_key=api_key,
markdown_report=md_report,
model=args.ai_model,
api_base_url=getattr(
settings, 'openai_api_base_url', 'https://api.openai.com/v1'
),
)
except Exception as e:
query_notify.warning(f'AI analysis failed: {e}')
text_report = get_plaintext_report(report_context)
if text_report:
query_notify.info('Short text report:')
print(text_report)
# update database
db.save_to_file(db_file)
-8
View File
@@ -123,7 +123,6 @@ class QueryNotifyPrint(QueryNotify):
print_found_only=False,
skip_check_errors=False,
color=True,
silent=False,
):
"""Create Query Notify Print Object.
@@ -150,7 +149,6 @@ class QueryNotifyPrint(QueryNotify):
self.print_found_only = print_found_only
self.skip_check_errors = skip_check_errors
self.color = color
self.silent = silent
return
@@ -189,9 +187,6 @@ class QueryNotifyPrint(QueryNotify):
Nothing.
"""
if self.silent:
return
title = f"Checking {id_type}"
if self.color:
print(
@@ -241,9 +236,6 @@ class QueryNotifyPrint(QueryNotify):
Return Value:
Nothing.
"""
if self.silent:
return
notify = None
self.result = result
+6 -15
View File
@@ -30,18 +30,14 @@ UTILS
def filter_supposed_data(data):
# interesting fields
allowed_fields = ["fullname", "gender", "location", "age"]
def _first(v):
if isinstance(v, (list, tuple)):
return v[0] if v else ""
return v
return {
CaseConverter.snake_to_title(k): _first(v)
filtered_supposed_data = {
CaseConverter.snake_to_title(k): v[0]
for k, v in data.items()
if k in allowed_fields
}
return filtered_supposed_data
def sort_report_by_data_points(results):
@@ -271,7 +267,7 @@ def _md_format_value(value) -> str:
return s
def generate_markdown_report(context: dict, run_info: dict = None) -> str:
def save_markdown_report(filename: str, context: dict, run_info: dict = None):
username = context.get("username", "unknown")
generated_at = context.get("generated_at", "")
brief = context.get("brief", "")
@@ -395,13 +391,8 @@ def generate_markdown_report(context: dict, run_info: dict = None) -> str:
"CCPA, and similar).\n"
)
return "\n".join(lines)
def save_markdown_report(filename: str, context: dict, run_info: dict = None):
content = generate_markdown_report(context, run_info)
with open(filename, "w", encoding="utf-8") as f:
f.write(content)
f.write("\n".join(lines))
"""
-62
View File
@@ -1,62 +0,0 @@
You are an OSINT analyst that converts raw username-investigation reports into a short, clean human-readable summary.
Your task:
Read the attached account-discovery report and produce a concise report in exactly this style:
# Investigation Summary
Name: <most likely real full name>
Location: <most likely current location>
Occupation: <short combined description based only on strong signals>
Interests: <36 broad interests inferred from platform types, bios, and activity>
Languages: <languages supported by strong evidence only>
Website: <main personal website if clearly present>
Username: <main username> (variant: <variant usernames if any>)
Platforms: <number> profiles, active from <first year> to <last year>
Confidence: <High / Medium / Low> — <one short explanation why>
# Other leads
- <lead 1>
- <lead 2>
- <lead 3 if needed>
Rules:
1. Use only information supported by the report.
2. Resolve identity using consistency of username, full name, bio, links, company, and location.
3. Prefer strong repeated signals over one-off weak signals.
4. If one profile clearly conflicts with the rest, mention it in "Other leads" as a likely false positive instead of mixing it into the main identity.
5. Keep the tone analytical and neutral.
6. Do not mention every platform individually.
7. Do not include raw URLs except for the main website.
8. Do not mention NSFW/adult platforms in the main summary unless they are the only source for a critical lead; if such a profile looks inconsistent, mention it only as a likely false positive.
9. "Occupation" should be a compact merged description, for example: "Chief Product Officer (CPO) at ..., entrepreneur, OSINT community founder".
10. "Interests" should be broad categories, not noisy tags. Convert raw platform/tag evidence into natural categories like OSINT, software development, blogging, gaming, streaming, etc.
11. "Languages" should only include languages clearly supported by bios, texts, country tags, or profile content.
12. For "Platforms", count the profiles reported as found by the report summary, not manually deduplicated.
13. For active years, use the earliest and latest reliable dates from the consistent identity cluster. Ignore obvious outlier dates if they belong to likely false positives or weak profiles.
14. For confidence:
- High = strong consistency across username, name, bio, links, location, and/or company
- Medium = partial consistency with some gaps
- Low = mostly username-only matches
15. If some field is not reliably known, omit speculation and use the best cautious wording possible.
16. For "Name", output only the most likely real personal name in clean canonical form.
- Remove nicknames, handles, aliases, or bracketed parts such as "(Soxoj)".
- Example: "Dmitriy (Soxoj) Danilov" -> "Dmitriy Danilov".
17. For "Website", output only the plain domain or URL as text, not a markdown hyperlink.
18. In "Other leads", do not label conflicting profiles as "false positive", "likely unrelated", or "potentially a false positive".
- Instead, use neutral intelligence wording such as:
"Accounts were found that are most likely unrelated to the main identity, but may indicate possible cross-border activity and should be verified."
19. When describing anomalies in "Other leads", prefer cautious investigative phrasing:
- "may be unrelated"
- "requires verification"
- "could indicate separate activity"
- "should be checked manually"
20. Do not include nicknames or aliases inside the Name field unless they are clearly part of the legal or real-world name.
Output requirements:
- Return only the final formatted text.
- Keep it short.
- No preamble, no explanations.
Now analyze the following report
+1295 -1555
View File
File diff suppressed because it is too large Load Diff
+3 -3
View File
@@ -1,8 +1,8 @@
{
"version": 1,
"updated_at": "2026-04-26T09:18:14Z",
"sites_count": 3139,
"updated_at": "2026-04-21T00:02:26Z",
"sites_count": 3141,
"min_maigret_version": "0.6.0",
"data_sha256": "c51ecaa6c0736c5e1e7ca91aaf111445b3ac9ce9541a472d97db2dcc3ff8aa17",
"data_sha256": "d93fb2d051328b60126c98fbf02841a6974549f0c8c9220a207a9172b3ee0c90",
"data_url": "https://raw.githubusercontent.com/soxoj/maigret/main/maigret/resources/data.json"
}
-3
View File
@@ -55,9 +55,6 @@
"pdf_report": false,
"html_report": false,
"md_report": false,
"openai_api_key": "",
"openai_model": "gpt-4o",
"openai_api_base_url": "https://api.openai.com/v1",
"web_interface_port": 5000,
"no_autoupdate": false,
"db_update_meta_url": "https://raw.githubusercontent.com/soxoj/maigret/main/maigret/resources/db_meta.json",
Generated
+53 -56
View File
@@ -418,14 +418,14 @@ files = [
[[package]]
name = "certifi"
version = "2026.4.22"
version = "2026.2.25"
description = "Python package for providing Mozilla's CA Bundle."
optional = false
python-versions = ">=3.7"
groups = ["main"]
files = [
{file = "certifi-2026.4.22-py3-none-any.whl", hash = "sha256:3cb2210c8f88ba2318d29b0388d1023c8492ff72ecdde4ebdaddbb13a31b1c4a"},
{file = "certifi-2026.4.22.tar.gz", hash = "sha256:8d455352a37b71bf76a79caa83a3d6c25afee4a385d632127b6afb3963f1c580"},
{file = "certifi-2026.2.25-py3-none-any.whl", hash = "sha256:027692e4402ad994f1c42e52a4997a9763c646b73e4096e4d5d6db8af1d6f0fa"},
{file = "certifi-2026.2.25.tar.gz", hash = "sha256:e887ab5cee78ea814d3472169153c2d12cd43b14bd03329a39a9c6e2e80bfba7"},
]
[[package]]
@@ -1261,18 +1261,18 @@ lxml = ["lxml ; platform_python_implementation == \"CPython\""]
[[package]]
name = "idna"
version = "3.13"
version = "3.11"
description = "Internationalized Domain Names in Applications (IDNA)"
optional = false
python-versions = ">=3.8"
groups = ["main"]
files = [
{file = "idna-3.13-py3-none-any.whl", hash = "sha256:892ea0cde124a99ce773decba204c5552b69c3c67ffd5f232eb7696135bc8bb3"},
{file = "idna-3.13.tar.gz", hash = "sha256:585ea8fe5d69b9181ec1afba340451fba6ba764af97026f92a91d4eef164a242"},
{file = "idna-3.11-py3-none-any.whl", hash = "sha256:771a87f49d9defaf64091e6e6fe9c18d4833f140bd19464795bc32d966ca37ea"},
{file = "idna-3.11.tar.gz", hash = "sha256:795dafcc9c04ed0c1fb032c2aa73654d8e8c5023a7df64a53f39190ada629902"},
]
[package.extras]
all = ["mypy (>=1.11.2)", "pytest (>=8.3.2)", "ruff (>=0.6.2)"]
all = ["flake8 (>=7.1.1)", "mypy (>=1.11.2)", "pytest (>=8.3.2)", "ruff (>=0.6.2)"]
[[package]]
name = "iniconfig"
@@ -1985,56 +1985,56 @@ typing-extensions = {version = ">=4.1.0", markers = "python_version < \"3.11\""}
[[package]]
name = "mypy"
version = "1.20.2"
version = "1.20.1"
description = "Optional static typing for Python"
optional = false
python-versions = ">=3.10"
groups = ["dev"]
files = [
{file = "mypy-1.20.2-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:cf5a4db6dca263010e2c7bff081c89383c72d187ba2cf4c44759aac970e2f0c4"},
{file = "mypy-1.20.2-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:7b0e817b518bff7facd7f85ea05b643ad8bdcce684cf29784987b0a7c8e1f997"},
{file = "mypy-1.20.2-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:97d7b9a485b40f8ca425460e89bf1da2814625b2da627c0dcc6aa46c92631d14"},
{file = "mypy-1.20.2-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:1e1c12f6d2db3d78b909b5f77513c11eb7f2dd2782b96a3ab6dffc7d44575c99"},
{file = "mypy-1.20.2-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:89dce27e142d25ffbc154c1819383b69f2e9234dc4ed4766f42e0e8cb264ab5c"},
{file = "mypy-1.20.2-cp310-cp310-win_amd64.whl", hash = "sha256:f376e37f9bf2a946872fc5fd1199c99310748e3c26c7a26683f13f8bdb756cbd"},
{file = "mypy-1.20.2-cp310-cp310-win_arm64.whl", hash = "sha256:6e2b469efd811707bc530fd1effef0f5d6eebcb7fe376affae69025da4b979a2"},
{file = "mypy-1.20.2-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:4077797a273e56e8843d001e9dfe4ba10e33323d6ade647ff260e5cd97d9758c"},
{file = "mypy-1.20.2-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:cdecf62abcc4292500d7858aeae87a1f8f1150f4c4dd08fb0b336ee79b2a6df3"},
{file = "mypy-1.20.2-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:c566c3a88b6ece59b3d70f65bedef17304f48eb52ff040a6a18214e1917b3254"},
{file = "mypy-1.20.2-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:0deb80d062b2479f2c87ae568f89845afc71d11bc41b04179e58165fd9f31e98"},
{file = "mypy-1.20.2-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:bba9ad231e92a3e424b3e56b65aa17704993425bba97e302c832f9466bb85bac"},
{file = "mypy-1.20.2-cp311-cp311-win_amd64.whl", hash = "sha256:baf593f2765fa3a6b1ef95807dbaa3d25b594f6a52adcc506a6b9cb115e1be67"},
{file = "mypy-1.20.2-cp311-cp311-win_arm64.whl", hash = "sha256:20175a1c0f49863946ec20b7f63255768058ac4f07d2b9ded6a6b46cfb5a9100"},
{file = "mypy-1.20.2-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:4dbfcf869f6b0517f70cf0030ba6ea1d6645e132337a7d5204a18d8d5636c02b"},
{file = "mypy-1.20.2-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:4b6481b228d072315b053210b01ac320e1be243dc17f9e5887ef167f23f5fae4"},
{file = "mypy-1.20.2-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:34397cdced6b90b836e38182076049fdb41424322e0b0728c946b0939ebdf9f6"},
{file = "mypy-1.20.2-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:a5da6976f20cae27059ea8d0c86e7cef3de720e04c4bb9ee18e3690fdb792066"},
{file = "mypy-1.20.2-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:56908d7e08318d39f85b1f0c6cfd47b0cac1a130da677630dac0de3e0623e102"},
{file = "mypy-1.20.2-cp312-cp312-win_amd64.whl", hash = "sha256:d52ad8d78522da1d308789df651ee5379088e77c76cb1994858d40a426b343b9"},
{file = "mypy-1.20.2-cp312-cp312-win_arm64.whl", hash = "sha256:785b08db19c9f214dc37d65f7c165d19a30fcecb48abfa30f31b01b5acaabb58"},
{file = "mypy-1.20.2-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:edfbfca868cdd6bd8d974a60f8a3682f5565d3f5c99b327640cedd24c4264026"},
{file = "mypy-1.20.2-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:e2877a02380adfcdbc69071a0f74d6e9dbbf593c0dc9d174e1f223ffd5281943"},
{file = "mypy-1.20.2-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:7488448de6007cd5177c6cea0517ac33b4c0f5ee9b5e9f2be51ce75511a85517"},
{file = "mypy-1.20.2-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:bb9c2fa06887e21d6a3a868762acb82aec34e2c6fd0174064f27c93ede68ad15"},
{file = "mypy-1.20.2-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:9d56a78b646f2e3daa865bc70cd5ec5a46c50045801ca8ff17a0c43abc97e3ee"},
{file = "mypy-1.20.2-cp313-cp313-win_amd64.whl", hash = "sha256:2a4102b03bb7481d9a91a6da8d174740c9c8c4401024684b9ca3b7cc5e49852f"},
{file = "mypy-1.20.2-cp313-cp313-win_arm64.whl", hash = "sha256:a95a9248b0c6fd933a442c03c3b113c3b61320086b88e2c444676d3fd1ca3330"},
{file = "mypy-1.20.2-cp314-cp314-macosx_10_15_x86_64.whl", hash = "sha256:419413398fe250aae057fd2fe50166b61077083c9b82754c341cf4fd73038f30"},
{file = "mypy-1.20.2-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:e73c07f23009962885c197ccb9b41356a30cc0e5a1d0c2ea8fd8fb1362d7f924"},
{file = "mypy-1.20.2-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:0c64e5973df366b747646fc98da921f9d6eba9716d57d1db94a83c026a08e0fb"},
{file = "mypy-1.20.2-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:5a65aa591af023864fd08a97da9974e919452cfe19cb146c8a5dc692626445dc"},
{file = "mypy-1.20.2-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:4fef51b01e638974a6e69885687e9bd40c8d1e09a6cd291cca0619625cf1f558"},
{file = "mypy-1.20.2-cp314-cp314-win_amd64.whl", hash = "sha256:913485a03f1bcf5d279409a9d2b9ed565c151f61c09f29991e5faa14033da4c8"},
{file = "mypy-1.20.2-cp314-cp314-win_arm64.whl", hash = "sha256:c3bae4f855d965b5453784300c12ffc63a548304ac7f99e55d4dc7c898673aa3"},
{file = "mypy-1.20.2-cp314-cp314t-macosx_10_15_x86_64.whl", hash = "sha256:2de3dcea53babc1c3237a19002bc3d228ce1833278f093b8d619e06e7cc79609"},
{file = "mypy-1.20.2-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:52b176444e2e5054dfcbcb8c75b0b719865c96247b37407184bbfca5c353f2c2"},
{file = "mypy-1.20.2-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:688c3312e5dadb573a2c69c82af3a298d43ecf9e6d264e0f95df960b5f6ac19c"},
{file = "mypy-1.20.2-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:29752dbbf8cc53f89f6ac096d363314333045c257c9c75cbd189ca2de0455744"},
{file = "mypy-1.20.2-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:803203d2b6ea644982c644895c2f78b28d0e208bba7b27d9b921e0ec5eb207c6"},
{file = "mypy-1.20.2-cp314-cp314t-win_amd64.whl", hash = "sha256:9bcb8aa397ff0093c824182fd76a935a9ba7ad097fcbef80ae89bf6c1731d8ec"},
{file = "mypy-1.20.2-cp314-cp314t-win_arm64.whl", hash = "sha256:e061b58443f1736f8a37c48978d7ab581636d6ab03e3d4f99e3fa90463bb9382"},
{file = "mypy-1.20.2-py3-none-any.whl", hash = "sha256:a94c5a76ab46c5e6257c7972b6c8cff0574201ca7dc05647e33e795d78680563"},
{file = "mypy-1.20.2.tar.gz", hash = "sha256:e8222c26daaafd9e8626dec58ae36029f82585890589576f769a650dd20fd665"},
{file = "mypy-1.20.1-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:3ba5d1e712ada9c3b6223dcbc5a31dac334ed62991e5caa17bcf5a4ddc349af0"},
{file = "mypy-1.20.1-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:2e731284c117b0987fb1e6c5013a56f33e7faa1fce594066ab83876183ce1c66"},
{file = "mypy-1.20.1-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:f8e945b872a05f4fbefabe2249c0b07b6b194e5e11a86ebee9edf855de09806c"},
{file = "mypy-1.20.1-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:2fc88acef0dc9b15246502b418980478c1bfc9702057a0e1e7598d01a7af8937"},
{file = "mypy-1.20.1-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:14911a115c73608f155f648b978c5055d16ff974e6b1b5512d7fedf4fa8b15c6"},
{file = "mypy-1.20.1-cp310-cp310-win_amd64.whl", hash = "sha256:76d9b4c992cca3331d9793ef197ae360ea44953cf35beb2526e95b9e074f2866"},
{file = "mypy-1.20.1-cp310-cp310-win_arm64.whl", hash = "sha256:b408722f80be44845da555671a5ef3a0c63f51ca5752b0c20e992dc9c0fbd3cd"},
{file = "mypy-1.20.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:c01eb9bac2c6a962d00f9d23421cd2913840e65bba365167d057bd0b4171a92e"},
{file = "mypy-1.20.1-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:55d12ddbd8a9cac5b276878bd534fa39fff5bf543dc6ae18f25d30c8d7d27fca"},
{file = "mypy-1.20.1-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:c0aa322c1468b6cdfc927a44ce130f79bb44bcd34eb4a009eb9f96571fd80955"},
{file = "mypy-1.20.1-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:3f8bc95899cf676b6e2285779a08a998cc3a7b26f1026752df9d2741df3c79e8"},
{file = "mypy-1.20.1-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:47c2b90191a870a04041e910277494b0d92f0711be9e524d45c074fe60c00b65"},
{file = "mypy-1.20.1-cp311-cp311-win_amd64.whl", hash = "sha256:9857dc8d2ec1a392ffbda518075beb00ac58859979c79f9e6bdcb7277082c2f2"},
{file = "mypy-1.20.1-cp311-cp311-win_arm64.whl", hash = "sha256:09d8df92bb25b6065ab91b178da843dda67b33eb819321679a6e98a907ce0e10"},
{file = "mypy-1.20.1-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:36ee2b9c6599c230fea89bbd79f401f9f9f8e9fcf0c777827789b19b7da90f51"},
{file = "mypy-1.20.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:fba3fb0968a7b48806b0c90f38d39296f10766885a94c83bd21399de1e14eb28"},
{file = "mypy-1.20.1-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:ef1415a637cd3627d6304dfbeddbadd21079dafc2a8a753c477ce4fc0c2af54f"},
{file = "mypy-1.20.1-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:ef3461b1ad5cd446e540016e90b5984657edda39f982f4cc45ca317b628f5a37"},
{file = "mypy-1.20.1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:542dd63c9e1339b6092eb25bd515f3a32a1453aee8c9521d2ddb17dacd840237"},
{file = "mypy-1.20.1-cp312-cp312-win_amd64.whl", hash = "sha256:1d55c7cd8ca22e31f93af2a01160a9e95465b5878de23dba7e48116052f20a8d"},
{file = "mypy-1.20.1-cp312-cp312-win_arm64.whl", hash = "sha256:f5b84a79070586e0d353ee07b719d9d0a4aa7c8ee90c0ea97747e98cbe193019"},
{file = "mypy-1.20.1-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:8f3886c03e40afefd327bd70b3f634b39ea82e87f314edaa4d0cce4b927ddcc1"},
{file = "mypy-1.20.1-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:e860eb3904f9764e83bafd70c8250bdffdc7dde6b82f486e8156348bf7ceb184"},
{file = "mypy-1.20.1-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:a4b5aac6e785719da51a84f5d09e9e843d473170a9045b1ea7ea1af86225df4b"},
{file = "mypy-1.20.1-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:f37b6cd0fe2ad3a20f05ace48ca3523fc52ff86940e34937b439613b6854472e"},
{file = "mypy-1.20.1-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:e4bbb0f6b54ce7cc350ef4a770650d15fa70edd99ad5267e227133eda9c94218"},
{file = "mypy-1.20.1-cp313-cp313-win_amd64.whl", hash = "sha256:c3dc20f8ec76eecd77148cdd2f1542ed496e51e185713bf488a414f862deb8f2"},
{file = "mypy-1.20.1-cp313-cp313-win_arm64.whl", hash = "sha256:a9d62bbac5d6d46718e2b0330b25e6264463ed832722b8f7d4440ff1be3ca895"},
{file = "mypy-1.20.1-cp314-cp314-macosx_10_15_x86_64.whl", hash = "sha256:12927b9c0ed794daedcf1dab055b6c613d9d5659ac511e8d936d96f19c087d12"},
{file = "mypy-1.20.1-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:752507dd481e958b2c08fc966d3806c962af5a9433b5bf8f3bdd7175c20e34fe"},
{file = "mypy-1.20.1-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:c614655b5a065e56274c6cbbe405f7cf7e96c0654db7ba39bc680238837f7b08"},
{file = "mypy-1.20.1-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:2c3f6221a76f34d5100c6d35b3ef6b947054123c3f8d6938a4ba00b1308aa572"},
{file = "mypy-1.20.1-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:4bdfc06303ac06500af71ea0cdbe995c502b3c9ba32f3f8313523c137a25d1b6"},
{file = "mypy-1.20.1-cp314-cp314-win_amd64.whl", hash = "sha256:0131edd7eba289973d1ba1003d1a37c426b85cdef76650cd02da6420898a5eb3"},
{file = "mypy-1.20.1-cp314-cp314-win_arm64.whl", hash = "sha256:33f02904feb2c07e1fdf7909026206396c9deeb9e6f34d466b4cfedb0aadbbe4"},
{file = "mypy-1.20.1-cp314-cp314t-macosx_10_15_x86_64.whl", hash = "sha256:168472149dd8cc505c98cefd21ad77e4257ed6022cd5ed2fe2999bed56977a5a"},
{file = "mypy-1.20.1-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:eb674600309a8f22790cca883a97c90299f948183ebb210fbef6bcee07cb1986"},
{file = "mypy-1.20.1-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:ef2b2e4cc464ba9795459f2586923abd58a0055487cbe558cb538ea6e6bc142a"},
{file = "mypy-1.20.1-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:dee461d396dd46b3f0ed5a098dbc9b8860c81c46ad44fa071afcfbc149f167c9"},
{file = "mypy-1.20.1-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:e364926308b3e66f1361f81a566fc1b2f8cd47fc8525e8136d4058a65a4b4f02"},
{file = "mypy-1.20.1-cp314-cp314t-win_amd64.whl", hash = "sha256:a0c17fbd746d38c70cbc42647cfd884f845a9708a4b160a8b4f7e70d41f4d7fa"},
{file = "mypy-1.20.1-cp314-cp314t-win_arm64.whl", hash = "sha256:db2cb89654626a912efda69c0d5c1d22d948265e2069010d3dde3abf751c7d08"},
{file = "mypy-1.20.1-py3-none-any.whl", hash = "sha256:1aae28507f253fe82d883790d1c0a0d35798a810117c88184097fe8881052f06"},
{file = "mypy-1.20.1.tar.gz", hash = "sha256:6fc3f4ecd52de81648fed1945498bf42fa2993ddfad67c9056df36ae5757f804"},
]
[package.dependencies]
@@ -2042,10 +2042,7 @@ librt = {version = ">=0.8.0", markers = "platform_python_implementation != \"PyP
mypy_extensions = ">=1.0.0"
pathspec = ">=1.0.0"
tomli = {version = ">=1.1.0", markers = "python_version < \"3.11\""}
typing_extensions = [
{version = ">=4.6.0", markers = "python_version < \"3.15\""},
{version = ">=4.14.0", markers = "python_version >= \"3.15\""},
]
typing_extensions = ">=4.6.0"
[package.extras]
dmypy = ["psutil (>=4.0)"]
+1 -1
View File
@@ -1,5 +1,5 @@
maigret @ https://github.com/soxoj/maigret/archive/refs/heads/main.zip
pefile==2023.2.7 # do not bump while pyinstaller is 6.11.1, there is a conflict
psutil==7.2.2
pyinstaller==6.20.0
pyinstaller==6.19.0
pywin32-ctypes==0.2.3
+803 -801
View File
File diff suppressed because it is too large Load Diff
-107
View File
@@ -56,110 +56,3 @@ async def test_import_aiohttp_cookies(cookie_test_server):
print(f"Server response: {result}")
assert result == {'cookies': {'a': 'b'}}
# ---- OnlyFans signing tests (pure-compute, no network) ----
class _FakeSite:
"""Minimal stand-in for MaigretSite with the attributes onlyfans() touches."""
def __init__(self, headers=None, activation=None):
self.headers = headers or {}
self.activation = activation or {
"static_param": "jLM8LXHU1CGcuCzPMNwWX9osCScVuP4D",
"checksum_indexes": [28, 3, 16, 32, 25, 24, 23, 0, 26],
"checksum_constant": -180,
"format": "57203:{}:{:x}:69cfa6d8",
"url": "https://onlyfans.com/api2/v2/init",
}
class _FakeResponse:
def __init__(self, cookies=None):
self.cookies = cookies or {}
def test_onlyfans_sets_xbc_when_zero(monkeypatch):
site = _FakeSite(headers={"x-bc": "0", "cookie": "existing=1"})
# Prevent any real network. If _sign path still fires requests.get, fail loudly.
import maigret.activation as act_mod
def boom(*a, **kw): # pragma: no cover - sanity
raise AssertionError("requests.get should not run when cookie is present")
monkeypatch.setattr(act_mod.__dict__.get("requests", None) or __import__("requests"), "get", boom, raising=False)
logger = Mock()
ParsingActivator.onlyfans(site, logger, url="https://onlyfans.com/api2/v2/users/adam")
# x-bc must be rewritten to a non-zero hex token
assert site.headers["x-bc"] != "0"
assert len(site.headers["x-bc"]) == 40 # 20 bytes → 40 hex chars
# time / sign headers set for target URL
assert "time" in site.headers and site.headers["time"].isdigit()
assert site.headers["sign"].startswith("57203:")
def test_onlyfans_fetches_init_cookie_when_missing(monkeypatch):
"""When cookie header is absent, init endpoint is called and its cookies stored."""
site = _FakeSite(headers={"x-bc": "already_set_token", "user-id": "0"})
import requests
captured = {}
def fake_get(url, headers=None, timeout=15):
captured["url"] = url
captured["headers"] = dict(headers or {})
return _FakeResponse(cookies={"sess": "abc123", "csrf": "xyz"})
monkeypatch.setattr(requests, "get", fake_get)
logger = Mock()
ParsingActivator.onlyfans(site, logger, url="https://onlyfans.com/api2/v2/users/adam")
# init request made
assert captured["url"] == site.activation["url"]
# headers passed to init include freshly generated time/sign
assert "time" in captured["headers"]
assert captured["headers"]["sign"].startswith("57203:")
# cookie header populated from response
assert site.headers["cookie"] == "sess=abc123; csrf=xyz"
def test_onlyfans_signature_is_deterministic_for_same_time(monkeypatch):
"""Two calls with patched time produce identical signatures."""
site1 = _FakeSite(headers={"x-bc": "token", "cookie": "c=1"})
site2 = _FakeSite(headers={"x-bc": "token", "cookie": "c=1"})
import maigret.activation
monkeypatch.setattr(maigret.activation, "_time", __import__("time"), raising=False)
fixed = 1_700_000_000.123
import time as time_mod
monkeypatch.setattr(time_mod, "time", lambda: fixed)
logger = Mock()
ParsingActivator.onlyfans(site1, logger, url="https://onlyfans.com/api2/v2/users/adam")
ParsingActivator.onlyfans(site2, logger, url="https://onlyfans.com/api2/v2/users/adam")
assert site1.headers["time"] == site2.headers["time"]
assert site1.headers["sign"] == site2.headers["sign"]
def test_onlyfans_sign_differs_per_path(monkeypatch):
"""Different target URLs must yield different signatures."""
site = _FakeSite(headers={"x-bc": "token", "cookie": "c=1"})
import time as time_mod
monkeypatch.setattr(time_mod, "time", lambda: 1_700_000_000.0)
logger = Mock()
ParsingActivator.onlyfans(site, logger, url="https://onlyfans.com/api2/v2/users/adam")
sig_adam = site.headers["sign"]
ParsingActivator.onlyfans(site, logger, url="https://onlyfans.com/api2/v2/users/bob")
sig_bob = site.headers["sign"]
assert sig_adam != sig_bob
+58 -226
View File
@@ -1,22 +1,12 @@
from argparse import ArgumentTypeError
import asyncio
import logging
from mock import Mock
import pytest
from maigret import search
from maigret.checking import (
detect_error_page,
extract_ids_data,
parse_usernames,
update_results_info,
get_failed_sites,
timeout_check,
debug_response_logging,
process_site_result,
)
from maigret.errors import CheckError
from maigret.checking import check_site_for_username, process_site_result
from maigret.result import MaigretCheckResult, MaigretCheckStatus
from maigret.sites import MaigretSite
def site_result_except(server, username, **kwargs):
@@ -84,226 +74,68 @@ async def test_checking_by_message_negative(httpserver, local_test_db):
assert result['Message']['status'].is_found() is True
# ---- Pure-function unit tests (no network) ----
def test_detect_error_page_site_specific():
err = detect_error_page(
"Please enable JavaScript to proceed",
200,
{"Please enable JavaScript to proceed": "Scraping protection"},
ignore_403=False,
)
assert err is not None
assert err.type == "Site-specific"
assert err.desc == "Scraping protection"
def test_detect_error_page_403():
err = detect_error_page("some body", 403, {}, ignore_403=False)
assert err is not None
assert err.type == "Access denied"
def test_detect_error_page_403_ignored():
# XenForo engine uses ignore403 because member-not-found also returns 403
assert detect_error_page("not found body", 403, {}, ignore_403=True) is None
def test_detect_error_page_999_linkedin():
# LinkedIn returns 999 on bot suspicion — must NOT be reported as Server error
assert detect_error_page("", 999, {}, ignore_403=False) is None
def test_detect_error_page_500():
err = detect_error_page("", 503, {}, ignore_403=False)
assert err is not None
assert err.type == "Server"
assert "503" in err.desc
def test_detect_error_page_ok():
assert detect_error_page("hello world", 200, {}, ignore_403=False) is None
def test_parse_usernames_single_username():
logger = Mock()
result = parse_usernames({"profile_username": "alice"}, logger)
assert result == {"alice": "username"}
def test_parse_usernames_list_of_usernames():
logger = Mock()
result = parse_usernames({"other_usernames": "['alice', 'bob']"}, logger)
assert result == {"alice": "username", "bob": "username"}
def test_parse_usernames_malformed_list():
logger = Mock()
result = parse_usernames({"other_usernames": "not-a-list"}, logger)
# should swallow the error and just return empty
assert result == {}
assert logger.warning.called
def test_parse_usernames_supported_id():
logger = Mock()
# "telegram" is in SUPPORTED_IDS per socid_extractor
from maigret.checking import SUPPORTED_IDS
if SUPPORTED_IDS:
key = next(iter(SUPPORTED_IDS))
result = parse_usernames({key: "some_value"}, logger)
assert result.get("some_value") == key
def test_update_results_info_links():
info = {"username": "test"}
result = update_results_info(
info,
{"links": "['https://example.com/a', 'https://example.com/b']", "website": "https://example.com/w"},
{"alice": "username"},
)
assert result["ids_usernames"] == {"alice": "username"}
assert "https://example.com/w" in result["ids_links"]
assert "https://example.com/a" in result["ids_links"]
def test_update_results_info_no_website():
info = {}
result = update_results_info(info, {"links": "[]"}, {})
assert result["ids_links"] == []
def test_extract_ids_data_bad_html_returns_empty():
logger = Mock()
# Random HTML should not raise — returns {} if nothing matches
out = extract_ids_data("<html><body>nothing special</body></html>", logger, Mock(name="Site"))
assert isinstance(out, dict)
def test_get_failed_sites_filters_permanent_errors():
# Temporary errors (Request timeout, Connecting failure, etc.) are retryable → returned.
# Permanent ones (Captcha, Access denied, etc.) and results without error → filtered out.
good_status = MaigretCheckResult("u", "S1", "https://s1", MaigretCheckStatus.CLAIMED)
timeout_err = MaigretCheckResult(
"u", "S2", "https://s2", MaigretCheckStatus.UNKNOWN,
error=CheckError("Request timeout", "slow server"),
)
captcha_err = MaigretCheckResult(
"u", "S3", "https://s3", MaigretCheckStatus.UNKNOWN,
error=CheckError("Captcha", "Cloudflare"),
)
results = {
"S1": {"status": good_status},
"S2": {"status": timeout_err},
"S3": {"status": captcha_err},
"S4": {}, # no status at all
def test_process_site_result_threads_response_time(local_test_db):
"""process_site_result must thread the response_time kwarg into the result's query_time."""
site = local_test_db.sites_dict['StatusCode']
results_info = {
'username': 'claimed',
'parsing_enabled': False,
'url_user': site.url.replace('{username}', 'claimed'),
'status': None,
'rank': 0,
'url_main': site.url_main,
'ids_data': {},
}
failed = get_failed_sites(results)
# Only the temporary-error site is retry-worthy
assert failed == ["S2"]
response = ('body', 200, None)
logger = logging.getLogger('test')
query_notify = Mock()
out = process_site_result(
response, query_notify, logger, results_info, site,
response_time=1.234,
)
assert out['status'].query_time == pytest.approx(1.234)
def test_timeout_check_valid():
assert timeout_check("2.5") == 2.5
assert timeout_check("30") == 30.0
def test_timeout_check_invalid():
with pytest.raises(ArgumentTypeError):
timeout_check("abc")
with pytest.raises(ArgumentTypeError):
timeout_check("0")
with pytest.raises(ArgumentTypeError):
timeout_check("-1")
def test_debug_response_logging_writes(tmp_path, monkeypatch):
monkeypatch.chdir(tmp_path)
debug_response_logging("https://example.com", "<html>hi</html>", 200, None)
out = (tmp_path / "debug.log").read_text()
assert "https://example.com" in out
assert "200" in out
def test_debug_response_logging_no_response(tmp_path, monkeypatch):
monkeypatch.chdir(tmp_path)
debug_response_logging("https://example.com", None, None, CheckError("Timeout"))
out = (tmp_path / "debug.log").read_text()
assert "No response" in out
def _make_site(data_overrides=None):
base = {
"url": "https://x/{username}",
"urlMain": "https://x",
"checkType": "status_code",
"usernameClaimed": "a",
"usernameUnclaimed": "b",
def test_process_site_result_defaults_response_time_to_none(local_test_db):
"""Omitting response_time keeps query_time as None (backward compatible)."""
site = local_test_db.sites_dict['StatusCode']
results_info = {
'username': 'claimed',
'parsing_enabled': False,
'url_user': site.url.replace('{username}', 'claimed'),
'status': None,
'rank': 0,
'url_main': site.url_main,
'ids_data': {},
}
if data_overrides:
base.update(data_overrides)
return MaigretSite("TestSite", base)
out = process_site_result(
('body', 200, None), Mock(), logging.getLogger('test'), results_info, site,
)
assert out['status'].query_time is None
def test_process_site_result_no_response_returns_info():
site = _make_site()
info = {"username": "a", "parsing_enabled": False, "url_user": "https://x/a"}
out = process_site_result(None, Mock(), Mock(), info, site)
assert out is info
@pytest.mark.slow
@pytest.mark.asyncio
async def test_query_time_populated_from_http_check(httpserver, local_test_db):
"""check_site_for_username measures HTTP round-trip and populates query_time."""
sites_dict = local_test_db.sites_dict
# Delay the response on the test HTTP server to produce a measurable query_time.
DELAY = 0.25
def test_process_site_result_status_already_set():
site = _make_site()
pre = MaigretCheckResult("a", "S", "u", MaigretCheckStatus.ILLEGAL)
info = {"username": "a", "parsing_enabled": False, "status": pre, "url_user": "u"}
# Since status is already set, function returns without changes
out = process_site_result(("<html/>", 200, None), Mock(), Mock(), info, site)
assert out["status"] is pre
def delayed_handler(request):
import time as _time
_time.sleep(DELAY)
from werkzeug.wrappers import Response
return Response('ok', status=200)
httpserver.expect_request('/url', query_string='id=claimed').respond_with_handler(delayed_handler)
def test_process_site_result_status_code_claimed():
site = _make_site({"checkType": "status_code"})
info = {"username": "a", "parsing_enabled": False, "url_user": "https://x/a"}
out = process_site_result(("<html/>", 200, None), Mock(), Mock(), info, site)
assert out["status"].status == MaigretCheckStatus.CLAIMED
assert out["http_status"] == 200
def test_process_site_result_status_code_available():
site = _make_site({"checkType": "status_code"})
info = {"username": "a", "parsing_enabled": False, "url_user": "https://x/a"}
out = process_site_result(("<html/>", 404, None), Mock(), Mock(), info, site)
assert out["status"].status == MaigretCheckStatus.AVAILABLE
def test_process_site_result_message_claimed():
site = _make_site({
"checkType": "message",
"presenseStrs": ["profile-name"],
"absenceStrs": ["not found"],
})
info = {"username": "a", "parsing_enabled": False, "url_user": "https://x/a"}
out = process_site_result(("<div class='profile-name'>Alice</div>", 200, None), Mock(), Mock(), info, site)
assert out["status"].status == MaigretCheckStatus.CLAIMED
def test_process_site_result_message_available_by_absence():
site = _make_site({
"checkType": "message",
"presenseStrs": ["profile-name"],
"absenceStrs": ["not found"],
})
info = {"username": "a", "parsing_enabled": False, "url_user": "https://x/a"}
out = process_site_result(("<h1>not found</h1> profile-name too", 200, None), Mock(), Mock(), info, site)
# absence marker wins even if presence marker also appears
assert out["status"].status == MaigretCheckStatus.AVAILABLE
def test_process_site_result_with_error_is_unknown():
site = _make_site({"checkType": "status_code"})
info = {"username": "a", "parsing_enabled": False, "url_user": "https://x/a"}
resp = ("body", 403, CheckError("Captcha", "Cloudflare"))
out = process_site_result(resp, Mock(), Mock(), info, site)
assert out["status"].status == MaigretCheckStatus.UNKNOWN
assert out["status"].error is not None
result = await search('claimed', site_dict={'StatusCode': sites_dict['StatusCode']}, logger=Mock())
status = result['StatusCode']['status']
assert status.is_found() is True
assert isinstance(status.query_time, float)
assert status.query_time >= DELAY
# Upper bound: the measurement should not wildly exceed the server delay.
assert status.query_time < DELAY + 5.0
-2
View File
@@ -49,8 +49,6 @@ DEFAULT_ARGS: Dict[str, Any] = {
'with_domains': False,
'xmind': False,
'md': False,
'ai': False,
'ai_model': 'gpt-4o',
'no_autoupdate': False,
'force_update': False,
}
+11 -11
View File
@@ -26,7 +26,7 @@ async def test_simple_asyncio_executor():
executor = AsyncioSimpleExecutor(logger=logger)
assert await executor.run(tasks) == [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
assert executor.execution_time > 0.2
assert executor.execution_time < 1.0
assert executor.execution_time < 0.3
@pytest.mark.asyncio
@@ -37,7 +37,7 @@ async def test_asyncio_progressbar_executor():
# no guarantees for the results order
assert sorted(await executor.run(tasks)) == [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
assert executor.execution_time > 0.2
assert executor.execution_time < 1.0
assert executor.execution_time < 0.3
@pytest.mark.asyncio
@@ -48,7 +48,7 @@ async def test_asyncio_progressbar_semaphore_executor():
# no guarantees for the results order
assert sorted(await executor.run(tasks)) == [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
assert executor.execution_time > 0.2
assert executor.execution_time < 1.1
assert executor.execution_time < 0.4
@pytest.mark.slow
@@ -59,12 +59,12 @@ async def test_asyncio_progressbar_queue_executor():
executor = AsyncioProgressbarQueueExecutor(logger=logger, in_parallel=2)
assert await executor.run(tasks) == [0, 1, 3, 2, 4, 6, 7, 5, 9, 8]
assert executor.execution_time > 0.5
assert executor.execution_time < 1.4
assert executor.execution_time < 0.7
executor = AsyncioProgressbarQueueExecutor(logger=logger, in_parallel=3)
assert await executor.run(tasks) == [0, 3, 1, 4, 6, 2, 7, 9, 5, 8]
assert executor.execution_time > 0.4
assert executor.execution_time < 1.3
assert executor.execution_time < 0.6
executor = AsyncioProgressbarQueueExecutor(logger=logger, in_parallel=5)
assert await executor.run(tasks) in (
@@ -72,12 +72,12 @@ async def test_asyncio_progressbar_queue_executor():
[0, 3, 6, 1, 4, 9, 7, 2, 5, 8],
)
assert executor.execution_time > 0.3
assert executor.execution_time < 1.2
assert executor.execution_time < 0.5
executor = AsyncioProgressbarQueueExecutor(logger=logger, in_parallel=10)
assert await executor.run(tasks) == [0, 3, 6, 9, 1, 4, 7, 2, 5, 8]
assert executor.execution_time > 0.2
assert executor.execution_time < 1.1
assert executor.execution_time < 0.4
@pytest.mark.asyncio
@@ -88,13 +88,13 @@ async def test_asyncio_queue_generator_executor():
results = [result async for result in executor.run(tasks)] # type: ignore[arg-type]
assert results == [0, 1, 3, 2, 4, 6, 7, 5, 9, 8]
assert executor.execution_time > 0.5
assert executor.execution_time < 1.3
assert executor.execution_time < 0.6
executor = AsyncioQueueGeneratorExecutor(logger=logger, in_parallel=3)
results = [result async for result in executor.run(tasks)] # type: ignore[arg-type]
assert results == [0, 3, 1, 4, 6, 2, 7, 9, 5, 8]
assert executor.execution_time > 0.4
assert executor.execution_time < 1.2
assert executor.execution_time < 0.5
executor = AsyncioQueueGeneratorExecutor(logger=logger, in_parallel=5)
results = [result async for result in executor.run(tasks)] # type: ignore[arg-type]
@@ -103,10 +103,10 @@ async def test_asyncio_queue_generator_executor():
[0, 3, 6, 1, 4, 9, 7, 2, 5, 8],
)
assert executor.execution_time > 0.3
assert executor.execution_time < 1.1
assert executor.execution_time < 0.4
executor = AsyncioQueueGeneratorExecutor(logger=logger, in_parallel=10)
results = [result async for result in executor.run(tasks)] # type: ignore[arg-type]
assert results == [0, 3, 6, 9, 1, 4, 7, 2, 5, 8]
assert executor.execution_time > 0.2
assert executor.execution_time < 1.0
assert executor.execution_time < 0.3
-227
View File
@@ -10,15 +10,8 @@ import xmind # type: ignore[import-untyped]
from jinja2 import Template
from maigret.report import (
filter_supposed_data,
sort_report_by_data_points,
_md_format_value,
generate_csv_report,
generate_txt_report,
save_csv_report,
save_txt_report,
save_json_report,
save_markdown_report,
save_xmind_report,
save_html_report,
save_pdf_report,
@@ -463,223 +456,3 @@ def test_text_report_broken():
assert brief_part in report_text
assert 'us' in report_text
assert 'photo' in report_text
def test_filter_supposed_data():
data = {
'fullname': ['Alice'],
'gender': ['female'],
'location': ['Berlin'],
'age': ['30'],
'email': ['x@y.z'], # not allowed, must be dropped
'bio': ['hi'], # not allowed
}
result = filter_supposed_data(data)
assert result == {
'Fullname': 'Alice',
'Gender': 'female',
'Location': 'Berlin',
'Age': '30',
}
def test_filter_supposed_data_empty():
assert filter_supposed_data({}) == {}
assert filter_supposed_data({'nope': ['v']}) == {}
def test_filter_supposed_data_scalar_values():
# Strings and scalars must be kept whole — previously v[0] on "Alice"
# silently returned "A" instead of "Alice".
data = {
'fullname': 'Alice',
'gender': 'female',
'location': 'Berlin',
'age': 30,
}
assert filter_supposed_data(data) == {
'Fullname': 'Alice',
'Gender': 'female',
'Location': 'Berlin',
'Age': 30,
}
def test_filter_supposed_data_empty_list_yields_empty_string():
# Edge case: list value present but empty should not crash with IndexError.
assert filter_supposed_data({'fullname': []}) == {'Fullname': ''}
def test_filter_supposed_data_mixed_values():
# List and scalar mixed in the same payload.
data = {'fullname': ['Alice', 'Alicia'], 'gender': 'female'}
assert filter_supposed_data(data) == {
'Fullname': 'Alice',
'Gender': 'female',
}
def test_sort_report_by_data_points():
status_many = MaigretCheckResult('', '', '', MaigretCheckStatus.CLAIMED)
status_many.ids_data = {'a': 1, 'b': 2, 'c': 3}
status_one = MaigretCheckResult('', '', '', MaigretCheckStatus.CLAIMED)
status_one.ids_data = {'a': 1}
status_none = MaigretCheckResult('', '', '', MaigretCheckStatus.CLAIMED)
results = {
'few': {'status': status_one},
'many': {'status': status_many},
'zero': {'status': status_none},
'nostatus': {},
}
sorted_out = sort_report_by_data_points(results)
keys = list(sorted_out.keys())
# site with 3 ids_data fields must come first
assert keys[0] == 'many'
# site with 1 field next
assert keys[1] == 'few'
def test_md_format_value_list():
assert _md_format_value(['a', 'b', 'c']) == 'a, b, c'
def test_md_format_value_url():
assert _md_format_value('https://example.com') == '[https://example.com](https://example.com)'
assert _md_format_value('http://x.y') == '[http://x.y](http://x.y)'
def test_md_format_value_plain():
assert _md_format_value('hello') == 'hello'
assert _md_format_value(42) == '42'
def test_save_csv_report():
filename = 'report_test.csv'
save_csv_report(filename, 'test', EXAMPLE_RESULTS)
with open(filename) as f:
content = f.read()
assert 'username,name,url_main' in content
assert 'test,GitHub' in content
def test_save_txt_report():
filename = 'report_test.txt'
save_txt_report(filename, 'test', EXAMPLE_RESULTS)
with open(filename) as f:
content = f.read()
assert 'https://www.github.com/test' in content
assert 'Total Websites Username Detected On : 1' in content
def test_save_json_report_simple():
filename = 'report_test.json'
save_json_report(filename, 'test', EXAMPLE_RESULTS, 'simple')
with open(filename) as f:
data = json.load(f)
assert 'GitHub' in data
def test_save_json_report_ndjson():
filename = 'report_test_ndjson.json'
save_json_report(filename, 'test', EXAMPLE_RESULTS, 'ndjson')
with open(filename) as f:
lines = f.readlines()
assert len(lines) == 1
assert json.loads(lines[0])['sitename'] == 'GitHub'
def _markdown_context_with_rich_ids():
"""Build a context with found accounts, ids_data (incl. image, url, list) to exercise all branches."""
found_result = copy.deepcopy(GOOD_RESULT)
found_result.tags = ['photo', 'us']
found_result.ids_data = {
"fullname": "Alice",
"name": "Alice A.",
"location": "Berlin",
"bio": "Photographer",
"external_url": "https://example.com/profile",
"image": "https://example.com/avatar.png", # must be skipped
"aliases": ["alice", "alicea"], # list value
"last_online": "2024-01-02 10:00:00",
}
data = {
'Github': {
'username': 'alice',
'parsing_enabled': True,
'url_main': 'https://github.com/',
'url_user': 'https://github.com/alice',
'status': found_result,
'http_status': 200,
'is_similar': False,
'rank': 1,
'site': MaigretSite('Github', {}),
'found': True,
'ids_data': found_result.ids_data,
},
'Similar': {
'username': 'alice',
'url_user': 'https://other.com/alice',
'is_similar': True,
'found': True,
'status': copy.deepcopy(GOOD_RESULT),
},
}
return {
'username': 'alice',
'generated_at': '2024-01-02 10:00',
'brief': 'Search returned 1 account',
'countries_tuple_list': [('us', 1)],
'interests_tuple_list': [('photo', 1)],
'first_seen': '2023-01-01',
'results': [('alice', 'username', data)],
}
def test_save_markdown_report():
filename = 'report_test.md'
context = _markdown_context_with_rich_ids()
save_markdown_report(filename, context, run_info={'sites_count': 100, 'flags': '--top-sites 100'})
with open(filename) as f:
content = f.read()
assert '# Report by searching on username "alice"' in content
assert '## Summary' in content
assert '## Accounts found' in content
assert '### Github' in content
assert '[https://github.com/alice](https://github.com/alice)' in content
assert 'Ethical use' in content
assert '100 sites checked' in content
# image field must NOT appear in per-site listing
assert 'avatar.png' not in content
# list field rendered with join
assert 'alice, alicea' in content
# external url formatted as markdown link
assert '[https://example.com/profile](https://example.com/profile)' in content
def test_save_markdown_report_minimal_context():
"""No run_info, no first_seen — exercise the fallback branches."""
filename = 'report_test_min.md'
context = {
'username': 'bob',
'brief': 'nothing found',
'results': [],
}
save_markdown_report(filename, context)
with open(filename) as f:
content = f.read()
assert '# Report by searching on username "bob"' in content
assert '## Summary' in content
def test_get_plaintext_report_minimal():
"""Minimal context without countries/interests."""
context = {
'brief': 'Nothing to report.',
'interests_tuple_list': [],
'countries_tuple_list': [],
}
out = get_plaintext_report(context)
assert 'Nothing to report.' in out
assert 'Countries:' not in out
assert 'Interests' not in out
-5
View File
@@ -1,5 +0,0 @@
#!/bin/bash
set -e
sudo apt-get update && sudo apt-get install -y libcairo2-dev pkg-config
pip install .