diff --git a/.gitignore b/.gitignore
index 99a9ca0..4b887a9 100644
--- a/.gitignore
+++ b/.gitignore
@@ -42,4 +42,4 @@ settings.json
# other
*.egg-info
-build
\ No newline at end of file
+build
diff --git a/LLM/site-checks-guide.md b/LLM/site-checks-guide.md
deleted file mode 100644
index 015f0bc..0000000
--- a/LLM/site-checks-guide.md
+++ /dev/null
@@ -1,558 +0,0 @@
-# Site checks — guide (Maigret)
-
-Working document for future changes: workflow, findings from reviews, and practical steps. See also [`site-checks-playbook.md`](site-checks-playbook.md) (short checklist), [`socid_extractor_improvements.log`](socid_extractor_improvements.log) (proposals for upstream identity extraction), and the code in [`maigret/checking.py`](../maigret/checking.py).
-
-**Documentation maintenance:** whenever you improve Maigret, add search tooling, or change check logic, update **this file** and [`site-checks-playbook.md`](site-checks-playbook.md) in sync (see the section at the end). If you change rules about the JSON API check or the `socid_extractor` log format, update **[`socid_extractor_improvements.log`](socid_extractor_improvements.log)** (template / header) together with this guide.
-
----
-
-## 1. How checks work
-
-Logic lives in `process_site_result` ([`maigret/checking.py`](../maigret/checking.py)):
-
-| `checkType` | Meaning |
-|-------------|---------|
-| `message` | Profile is “found” if the HTML contains **none** of the `absenceStrs` substrings **and** at least one `presenseStrs` marker matches. If `presenseStrs` is **empty**, presence is treated as true for **any** page (risky configuration). |
-| `status_code` | HTTP **2xx** is enough — only safe if the server does **not** return 200 for “user not found”. |
-| `response_url` | Custom flow with **redirects disabled** so the status/URL of the *first* response can be used. |
-
-For other `checkType` values, [`make_site_result`](../maigret/checking.py) sets **`allow_redirects=True`**: the client follows redirects and `process_site_result` sees the **final** response body and status (not the pre-redirect hop). You do **not** need to “turn on” follow-redirect separately for most sites.
-
-Sites with an `engine` field (e.g. XenForo) are merged with a template from the `engines` section in [`maigret/resources/data.json`](../maigret/resources/data.json) ([`MaigretSite.update_from_engine`](../maigret/sites.py)).
-
-### `urlProbe`: probe URL vs reported profile URL
-
-- **`url`** — pattern for the **public profile page** users should open (what appears in reports as `url_user`). Supports `{username}`, `{urlMain}`, `{urlSubpath}`; the username segment is URL-encoded when the string is built ([`make_site_result`](../maigret/checking.py)).
-- **`urlProbe`** (optional) — if set, Maigret sends the HTTP **GET** (or HEAD where applicable) to **this** URL for the check, instead of to `url`. Same placeholders. Use it when the reliable signal is a **JSON/API** endpoint but the human-facing link must stay on the main site (e.g. `https://picsart.com/u/{username}` + probe `https://api.picsart.com/users/show/{username}.json`, or GitHub’s `https://github.com/{username}` + `https://api.github.com/users/{username}`).
-
-If `urlProbe` is omitted, the probe URL defaults to `url`.
-
-### Redirects and final URL as a signal
-
-If the **HTML shell** looks the same for “user exists” and “user does not exist” (typical SPA), it is still worth checking whether the **server** behaves differently:
-
-- **Final URL** after redirects (e.g. profile canonical URL vs `/404` path).
-- **Redirect chain** length or target host (e.g. lander vs profile).
-
-If that differs reliably, you may be able to use **`checkType`: `response_url`** in [`data.json`](../maigret/resources/data.json) (no auto-follow) or extend logic — but only when the difference is stable.
-
-**Server-side HTTP vs client-side navigation.** Maigret follows **HTTP** redirects only; it does **not** run JavaScript. If the browser shows a navigation to `/u/name/posts` or `/not-found` **after** the SPA bundle loads, that may never appear as an extra hop in `curl`/aiohttp — only a **trailing-slash** `301` might show up. Always confirm with `curl -sIL` / a small script whether the **Location** chain differs for real vs fake users before relying on URL-based rules.
-
-**Empirical check (claimed vs non-existent usernames, `GET` with follow redirects, no JS):**
-
-| Site | Result |
-|------|--------|
-| **Kaskus** | No HTTP redirects beyond the request path; same generic `<title>` and near-identical body length — **no** discriminating signal from redirects alone. |
-| **Bibsonomy** | Both requests redirect to **`/pow-challenge/?return=/user/...`** (proof-of-work). Only the `return` path changes with the username; **both** existing and fake hit the same challenge flow — not a profile-vs-missing distinction. |
-| **Picsart (web UI `https://picsart.com/u/{username}`)** | Only a **trailing-slash** `301`; the first HTML is the same empty app shell (~3 KiB) for real and fake users. Browser-only routes such as `…/posts` vs `…/not-found` are **not** visible as additional HTTP redirects in this pipeline. |
-
-**Picsart — workable check via public API.** The site exposes **`https://api.picsart.com/users/show/{username}.json`**: JSON with `"status":"success"` and a user object when the account exists, and `"reason":"user_not_found"` when it does not. Put that URL in **`urlProbe`**, set **`url`** to the web profile pattern **`https://picsart.com/u/{username}`**, and use **`checkType`: `message`** with narrow `presenseStrs` / `absenceStrs` so reports show the human link while the request hits the API (see **`urlProbe`** above).
-
-For **Kaskus** and **Bibsonomy**, HTTP-level comparison still does **not** unlock a safe check without PoW / richer signals; keep **`disabled: true`** until something stable appears (API, SSR markers, etc.).
-
----
-
-## 2. Standard checks: public JSON API and `socid_extractor` log
-
-### 2.1 Public JSON API (always)
-
-When diagnosing a site—especially **SPAs**, **soft 404s**, or **near-identical HTML** for real vs fake users—**routinely look for a public JSON (or JSON-like) API** used for profile or user lookup. Typical leads: paths containing `/api/`, `/v1/`, `graphql`, `users/show`, `.json` suffixes, or the same endpoints mobile apps use. Verify with `curl` (or the Maigret request path) that **claimed** and **unclaimed** usernames produce **reliably different** bodies or status codes. If such an endpoint is more stable than HTML, put it in **`urlProbe`** and keep **`url`** as the canonical profile page on the main site (see **`urlProbe`** in section 1). If there is no separate public URL for humans, you may still point **`url`** at the API only (reports will show that URL).
-
-This is a **standard** part of site-check work, not an optional extra.
-
-### 2.2 Mandatory: [`LLM/socid_extractor_improvements.log`](socid_extractor_improvements.log)
-
-If you discover **either**:
-
-1. **JSON embedded in HTML** with user/profile fields (inline scripts, `__NEXT_DATA__`, `application/ld+json`, hydration blobs, etc.), or
-2. A **standalone JSON HTTP response** (public API) with user/profile data for that service,
-
-you **must append** a proposal block to **[`LLM/socid_extractor_improvements.log`](socid_extractor_improvements.log)**.
-
-**Why:** Maigret calls [`socid_extractor.extract`](https://pypi.org/project/socid-extractor/) on the response body ([`extract_ids_data` in `checking.py`](../maigret/checking.py)) to fill `ids_data`. New payloads usually need a **new scheme** upstream (`flags`, `regex`, optional `extract_json`, `fields`, optional `url_mutations` / `transforms`), matching patterns such as **`GitHub API`** or **`Gitlab API`** in `socid_extractor`’s `schemes.py`.
-
-**Each log entry must include:**
-
-- **Date** — ISO `YYYY-MM-DD` (day you add the entry).
-- **Example username** — Prefer the site’s `usernameClaimed` from `data.json`, or any account that reproduces the payload.
-- **Proposal** — Use the **block template** in the log file: detection idea, optional URL mutation, and field mappings in the same style as existing schemes.
-
-If the service is **already covered** by an existing `socid_extractor` scheme, add a **short** entry anyway (date, example username, scheme name, “already implemented”) so there is an audit trail.
-
-Do **not** paste secrets, cookies, or full private JSON; short key names and structure hints are enough.
-
----
-
-## 3. Improvement workflow
-
-### Phase A — Reproduce
-
-1. Targeted run:
- ```bash
- maigret --db /path/to/maigret/resources/data.json \
- TEST_USERNAME \
- --site "SiteName" \
- --print-not-found --print-errors \
- --no-progressbar -vv
- ```
-2. Run separately with a **real** existing username and a **definitely non-existent** one (as `usernameClaimed` / `usernameUnclaimed` in JSON).
-3. If needed: `-vvv` and `debug.log` (raw response).
-4. Automated pair check:
- ```bash
- maigret --db ... --self-check --site "SiteName" --no-progressbar
- ```
-
-### Phase B — Classify the cause
-
-| Symptom | Likely cause |
-|---------|----------------|
-| False “found” with `status_code` | Soft 404 (200 on a “not found” page). |
-| False “found” with `message` | Overly broad `presenseStrs` (`name`, `email`, JSON keys) or stale `absenceStrs`. |
-| Same HTML for different users | SPA / skeleton shell before hydration — also compare **final URL / redirect chain** (see above); if still identical, often `disabled`. |
-| Login page instead of profile | XenForo etc.: guest, `ignore403`, “must be logged in” strings. |
-| reCAPTCHA / “Checking your browser” / “not a bot” | Bot protection; Maigret’s default User-Agent may worsen the response. |
-| Redirect to another domain / lander | Stale URL template. |
-
-### Phase C — Edits in [`data.json`](../maigret/resources/data.json)
-
-**CRITICAL — surgical edits only.** `data.json` is a ~36 000-line file. **Never** rewrite it via `json.load()` + `json.dump()` — this reformats every line and produces a 70 000-line diff that is impossible to review. Instead, make **targeted text-level edits** (find the site's block, change only the specific lines). Use the `Edit` tool (or equivalent line-precise method), not a full JSON round-trip. The same rule applies to scripts: if a helper writes `data.json`, it must preserve the original formatting of untouched entries.
-
-1. Update `url` / `urlMain` if needed (HTTPS, new profile path).
-2. Replace inappropriate `status_code` with `message` (or `response_url`), choosing:
- - **`absenceStrs`** — only what reliably appears on the “user does not exist” page;
- - **`presenseStrs`** — narrow markers of a real profile (avoid generic words).
-3. For XenForo: override only fields that differ in the site entry; do not break the global `engines` template.
-4. Refresh `usernameClaimed` / `usernameUnclaimed` if reference accounts disappeared.
-5. Set **`headers`** (e.g. another `User-Agent`) if the site serves a captcha only to “suspicious” clients.
-6. Use **`errors`**: HTML substring → meaningful check error (UNKNOWN), so it is not confused with “available”.
-
-### Phase D — Decision criteria
-
-| Outcome | When to use |
-|---------|-------------|
-| **Check fixed** | The `claimed` / `unclaimed` pair behaves predictably, `--self-check` passes, no regression on a similar site with the same engine. |
-| **Check disabled** (`disabled: true`) | Cloudflare / anti-bot / login required / indistinguishable SPA without stable markers. |
-| **Entry removed** | **Only** if the domain/service is gone (NXDOMAIN, clearly dead project), not “because it is hard to fix”. |
-
-### Phase E — Before commit
-
-- `maigret --self-check` for affected sites.
-- `make test`.
-
----
-
-## 4. Findings from reviews (concrete site batch)
-
-Summary from an earlier false-positive review for: OpenSea, Mercado Livre, Redtube, Tom’s Guide, Kaggle, Kaskus, Livemaster, TechPowerUp, authorSTREAM, Bibsonomy, Bulbagarden, iXBT, Serebii, Picsart, Hashnode, hi5.
-
-### What most often broke checks
-
-1. **`status_code` where content checks are needed** — soft 404 with status 200.
-2. **Broad `presenseStrs`** — matches on error pages or generic SPA shells.
-3. **XenForo + guest** — HTML includes strings like “You must be logged in” that overlap the engine template.
-4. **User-Agent** — on some sites (e.g. Kaggle) the default UA triggered a reCAPTCHA page instead of profile HTML; a deliberate `User-Agent` in site `headers` helped.
-5. **SPAs and redirects** — identical first HTML, redirect to lander / another product (hi5 → Tagged), URL format changes by region (Mercado Livre).
-
-### What worked as a fix
-
-- Switching to **`message`** with narrow strings from **`<title>`** or unique markup where stable (**Kaggle**, **Mercado Livre**, **Hashnode**).
-- For **Kaggle**, additionally: **`headers`**, **`errors`** for browser-check text.
-- **Redtube** stayed valid on **`status_code`** with a stable **404** for non-existent users.
-- **Picsart**: the web profile URL is a thin SPA shell; use the **JSON API** (`api.picsart.com/users/show/{username}.json`) in **`url`** with **`message`**-style markers (`"status":"success"` vs `user_not_found`), not the browser-only `/posts` vs `/not-found` navigation.
-- For **Weblate / Anubis Anti-Bot**: Setting `headers` with a basic script User-Agent (e.g. `python-requests/2.25.1`) rather than the default browser UA completely bypassed the Anubis Proof-of-Work challenge HTTP 307 redirect, instantly recovering the native HTTP 404 framework.
-
-### What required disabling checks
-
-Where you **cannot** reliably tell “profile exists” from “no profile” without bypassing protection, login, or full JS:
-
-- Anti-bot / captcha / “not a bot” page;
-- Guest-only access to the needed page;
-- SPA with indistinguishable first response;
-- Forums returning **403** and a login page instead of a member profile for the member-search URL;
-- Stale URLs that redirect to a stub.
-
-In those cases **`disabled: true`** is better than false “found”; remove the DB entry only on **actual** domain death.
-
-### Code notes
-
-- For the `status_code` branch in `process_site_result`, use **strict** comparison `check_type == "status_code"`, not a substring match inside `"status_code"`.
-- Treat empty `presenseStrs` with `message` as risky: when debugging, watch DEBUG-level logs if that diagnostics exists in code.
-
----
-
-## 5. Future ideas (Maigret improvements)
-
-- A mode or script: one site, two usernames, print statuses and first N bytes of the response (wrapper around `maigret()`).
-- Document in CLI help that **`--use-disabled-sites`** is needed to analyze disabled entries.
-
----
-
-## 6. Development utilities
-
-### 6.1 `utils/site_check.py` — Single site diagnostics
-
-A comprehensive utility for testing individual sites with multiple modes:
-
-```bash
-# Basic comparison of claimed vs unclaimed (aiohttp)
-python utils/site_check.py --site "VK" --check-claimed
-
-# Test via Maigret's checker directly
-python utils/site_check.py --site "VK" --maigret
-
-# Compare aiohttp vs Maigret results (find discrepancies)
-python utils/site_check.py --site "VK" --compare-methods
-
-# Full diagnosis with recommendations
-python utils/site_check.py --site "VK" --diagnose
-
-# Test with custom URL
-python utils/site_check.py --url "https://example.com/{username}" --compare user1 user2
-
-# Find a valid username for a site
-python utils/site_check.py --site "VK" --find-user
-```
-
-**Key features:**
-- `--maigret` — Uses Maigret's actual checking code, not raw aiohttp
-- `--compare-methods` — Shows if aiohttp and Maigret see different results (useful for debugging)
-- `--diagnose` — Validates checkType against actual responses, suggests fixes
-- Color output with markers detection (captcha, cloudflare, login, etc.)
-- `--json` flag for machine-readable output
-
-**When to use each mode:**
-
-| Mode | Use case |
-|------|----------|
-| `--check-claimed` | Quick sanity check: do claimed/unclaimed still differ? |
-| `--maigret` | Verify Maigret's actual behavior matches expectations |
-| `--compare-methods` | Debug "works in curl but fails in Maigret" issues |
-| `--diagnose` | Full analysis when a site is broken, get fix recommendations |
-
-### 6.2 `utils/check_top_n.py` — Mass site checking
-
-Batch-check top N sites by Alexa rank with categorized reporting:
-
-```bash
-# Check top 100 sites
-python utils/check_top_n.py --top 100
-
-# Faster with more parallelism
-python utils/check_top_n.py --top 100 --parallel 10
-
-# Output JSON report
-python utils/check_top_n.py --top 100 --output report.json
-
-# Only show broken sites
-python utils/check_top_n.py --top 100 --only-broken
-```
-
-**Output categories:**
-- `working` — Site check passes
-- `broken` — Check fails (wrong status, missing markers)
-- `timeout` — Request timed out
-- `anti_bot` — 403/429 or captcha detected
-- `error` — Connection or other errors
-- `disabled` — Already disabled in data.json
-
-**Report includes:**
-- Summary counts by category
-- List of broken sites with issues
-- Recommendations for fixes (e.g., "Switch to checkType: status_code")
-
-### 6.3 Self-check behavior (`--self-check`)
-
-The self-check command has been improved to be less aggressive:
-
-```bash
-# Check sites WITHOUT auto-disabling (default)
-maigret --self-check --site "VK"
-
-# Auto-disable failing sites (old behavior)
-maigret --self-check --site "VK" --auto-disable
-
-# Show detailed diagnosis for each failure
-maigret --self-check --site "VK" --diagnose
-```
-
-**Behavior changes:**
-
-| Flag | Effect |
-|------|--------|
-| `--self-check` alone | Reports issues but does NOT disable sites |
-| `--auto-disable` | Automatically disables sites that fail (opt-in) |
-| `--diagnose` | Prints detailed diagnosis with recommendations |
-
-**Why this matters:**
-- Old behavior was too aggressive — sites got disabled without explanation
-- New behavior reports issues and suggests fixes
-- Explicit `--auto-disable` required to modify database
-
----
-
-## 7. Lessons learned (practical observations)
-
-Collected from hands-on work fixing top-ranked sites (Reddit, Wikipedia, Microsoft Learn, Baidu, etc.).
-
-### 7.1 JSON API is the first thing to look for
-
-Both Reddit and Microsoft Learn had working public APIs that solved the problem entirely. The web pages were SPAs or blocked by anti-bot measures, but the APIs worked reliably:
-
-- **Reddit**: `https://api.reddit.com/user/{username}/about` — returns JSON with user data or `{"message": "Not Found", "error": 404}`.
-- **Microsoft Learn**: `https://learn.microsoft.com/api/profiles/{username}` — returns JSON with `userName` field or HTTP 404.
-
-This confirms the playbook recommendation: always check for `/api/`, `.json`, GraphQL endpoints before giving up on a site.
-
-### 7.2 `urlProbe` is a powerful tool
-
-It separates "what we check" (API) from "what we show the user" (human-readable profile URL). Reddit is a perfect example:
-
-```json
-{
- "url": "https://www.reddit.com/user/{username}",
- "urlProbe": "https://api.reddit.com/user/{username}/about",
- "checkType": "message",
- "presenseStrs": ["\"name\":"],
- "absenceStrs": ["Not Found"]
-}
-```
-
-The check hits the API, but reports display `www.reddit.com/user/blue`.
-
-### 7.3 aiohttp ≠ curl ≠ requests
-
-Wikipedia returned HTTP 200 for `curl` and Python `requests`, but HTTP 403 for `aiohttp`. This is **TLS fingerprinting** — the server identifies the HTTP library by cryptographic characteristics of the TLS handshake, not by headers.
-
-**Key insight:** Changing `User-Agent` does **not** help against TLS fingerprinting. Always test with aiohttp directly (or via Maigret with `-vvv` and `debug.log`), not just `curl`.
-
-```python
-# This returns 403 for Wikipedia even with browser UA:
-async with aiohttp.ClientSession() as session:
- async with session.get(url, headers={"User-Agent": "Mozilla/5.0 ..."}) as resp:
- print(resp.status) # 403
-```
-
-### 7.4 HTTP 403 in Maigret can mean different things
-
-Initially it seemed Wikipedia was returning 403, but `curl` showed 200. Only `debug.log` revealed the real picture — aiohttp was getting blocked at TLS level.
-
-**Lesson:** Use `-vvv` flag and inspect `debug.log` for raw response status and body. The warning message alone may be misleading.
-
-### 7.5 Dead services migrate, not disappear
-
-MSDN Social and TechNet profiles redirected to Microsoft Learn. Instead of deleting old entries:
-
-1. Keep old entries with `disabled: true` as historical record.
-2. Create a new entry for the current service with working API.
-
-This preserves audit trail and avoids breaking existing workflows.
-
-### 7.6 `status_code` is more reliable than `message` for APIs
-
-Microsoft Learn API returns HTTP 404 for non-existent users — a clean signal without HTML parsing. For JSON APIs that return proper HTTP status codes, `status_code` is often the best choice:
-
-```json
-{
- "checkType": "status_code",
- "urlProbe": "https://learn.microsoft.com/api/profiles/{username}"
-}
-```
-
-No need for fragile string matching when the API speaks HTTP correctly.
-
-### 7.8 Engine templates can silently break across many sites
-
-The **vBulletin** engine template has `absenceStrs` in six languages ("This user has not registered…", two Russian variants, Turkish, Ukrainian, Dutch). In a comprehensive audit of all 57 enabled vBulletin sites (2026-03-27), **26 were broken** (46%). The root causes were **not** template marker mismatch in most cases:
-
-| Category | Count | Examples |
-|----------|-------|---------|
-| Cloudflare challenge (403 `cf-mitigated`) | 7 | Mpgh, TheStudentRoom, SevenForums, alliedmods |
-| Dead/unreachable | 5 | Tanks, holodforum.ru, Microchip |
-| Server-side 403 (non-CF) | 5 | scaleforum.ru, forum-history.ru, Gorod.dp.ua |
-| Redirect/domain moved | 5 | Warface, Revelation, Stratege |
-| Login required to view profiles | 4 | goha, Animeforum, WiredNewYork |
-
-Only the "login required" category relates to the template markers: when a forum requires authentication, the member.php page shows a generic response without the "user not registered" text. All 26 sites were disabled.
-
-**Note on Russian translations:** Two distinct Russian vBulletin translations exist in the wild:
-- `"Этот пользователь ещё не зарегистрирован, поэтому его профиль недоступен."` (standard)
-- `"Пользователь не зарегистрирован и не имеет профиля для просмотра."` (goha.ru variant)
-
-Both are now in the engine template.
-
-**Lesson:** When a whole engine class shows high failure rates, categorize failures first — most are site-level infrastructure issues (CF, dead, auth), not template problems. Batch-disable broken sites rather than patching individually. Only investigate the template itself if the HTTP response is 200 but markers don't match.
-
-### 7.9 Search-by-author URLs are architecturally unreliable
-
-Several sites (OnanistovNet, Shoppingzone, Pogovorim, Astrogalaxy, Sexwin) used a phpBB-style `search.php?keywords=&terms=all&author={username}` URL as the check endpoint. This searches for **posts** by that author, not for the user account itself. Even if the markers worked, a user who exists but has zero posts would be indistinguishable from a non-existent user. And in practice, the sites changed their response format — some now return HTTP 404, others dropped the expected Russian absence text altogether.
-
-**Lesson:** Avoid author-search URLs as the check endpoint; they test "has posts" rather than "account exists" and are doubly fragile (both logic mismatch and format drift).
-
-### 7.10 Some sites generate a page for any path — permanent false positives
-
-Two distinct patterns:
-
-- **Pbase** creates a stub page titled "pbase Artist {username}" for **every** URL, real or fake. Both return HTTP 200 with nearly identical content (~3.3 KB). No markers can distinguish them.
-- **ffm.bio** is even trickier: for the non-existent username `a.slomkoowski` it generated a page titled "mr.a" with description "a is a", apparently fuzzy-matching the path to the closest real entry. Both return HTTP 200 with large, content-rich pages.
-
-**Lesson:** Before writing markers for a site, verify that the "unclaimed" URL actually produces an **error-like** response (different status, different title, unique error text). If the site always returns a plausible-looking page, no combination of `presenseStrs` / `absenceStrs` will help — `disabled: true` is the only safe option.
-
-### 7.11 TLS fingerprinting can degrade over time (Kaggle)
-
-Kaggle was previously fixed with a custom `User-Agent` header and `errors` for the "Checking your browser" captcha page. In the latest batch review, aiohttp receives HTTP 404 with identical content for **both** claimed and unclaimed usernames — the site now blocks the entire request before it reaches the profile page. This matches the TLS fingerprinting pattern seen earlier with Wikipedia (section 7.3), but here the degradation happened **after** a working fix was already in place.
-
-**Lesson:** Sites that rely on bot-detection can tighten their rules at any time. A working `User-Agent` override today may fail tomorrow. When a previously fixed site starts returning identical responses for both usernames, suspect TLS fingerprinting first, and accept `disabled: true` if no public API is available.
-
-### 7.12 API endpoints may bypass Cloudflare even when the main site is blocked
-
-All four Fandom wikis returned HTTP 403 with a Cloudflare "Just a moment..." challenge when aiohttp accessed the user profile page (`/wiki/User:{username}`). However, the **MediaWiki API** on the same domain (`/api.php?action=query&list=users&ususers={username}&format=json`) returned clean JSON without any challenge. Similarly, **Substack** served a captcha-laden SPA for `/@{username}`, but its `public_profile` API (`/api/v1/user/{username}/public_profile`) responded with proper JSON and correct HTTP 404 for missing users.
-
-This is likely because API routes are excluded from the Cloudflare WAF rules or use a different pipeline than the HTML-serving paths.
-
-**Lesson:** When a site's main pages are blocked by Cloudflare or similar WAF, still check API endpoints on the **same domain** — they may not go through the same protection layer. This is especially true for:
-- MediaWiki's `api.php` on wiki farms (Fandom, Wikia, self-hosted MediaWiki)
-- REST API paths (`/api/v1/`, `/api/v2/`) on SPA-heavy sites
-- Internal data endpoints that the SPA itself calls
-
-### 7.13 GraphQL APIs often support GET, not just POST
-
-**hashnode** exposes a GraphQL endpoint at `https://gql.hashnode.com`. While GraphQL is typically associated with POST requests, many implementations also support **GET** with the query passed as a URL parameter. This is critical for Maigret, which only supports GET/HEAD for `urlProbe`.
-
-```
-GET https://gql.hashnode.com?query=%7Buser(username%3A%20%22melwinalm%22)%20%7B%20name%20username%20%7D%7D
-→ {"data":{"user":{"name":"Melwin D'Almeida","username":"melwinalm"}}}
-
-GET https://gql.hashnode.com?query=%7Buser(username%3A%20%22a.slomkoowski%22)%20%7B%20name%20username%20%7D%7D
-→ {"data":{"user":null}}
-```
-
-**Lesson:** Before giving up on a GraphQL-only site, try the same query via GET with `?query=...` (URL-encoded). Many GraphQL servers accept both methods.
-
-### 7.14 URL-encoding resolves template placeholder conflicts
-
-The hashnode GraphQL query `{user(username: "{username}") { name }}` contains curly braces that conflict with Maigret's `{username}` placeholder — Python's `str.format()` would raise a `KeyError` on `{user(username...}`.
-
-The fix: URL-encode the GraphQL braces (`{` → `%7B`, `}` → `%7D`) but leave `{username}` as-is. Python's `.format()` only interprets literal `{…}` as placeholders, not `%7B…%7D`, and the GraphQL server decodes the percent-encoding on its end:
-
-```
-urlProbe: https://gql.hashnode.com?query=%7Buser(username%3A%20%22{username}%22)%20%7B%20name%20username%20%7D%7D
-```
-
-After `.format(username="melwinalm")`:
-```
-https://gql.hashnode.com?query=%7Buser(username%3A%20%22melwinalm%22)%20%7B%20name%20username%20%7D%7D
-```
-
-**Lesson:** When a `urlProbe` needs literal curly braces (GraphQL, JSON in URL, etc.), percent-encode them. This is a general technique for any `data.json` URL field processed by `.format()`.
-
-### 7.15 Rate-limit responses belong in `errors`, not `absenceStrs`
-
-When a site's API returns a rate-limit response, the text may **not** match the `absenceStrs` entry — either because the wording varies between API versions (`"The resource is being rate limited"` vs `"You are being rate limited."`) or because the JSON structure differs entirely. If the rate-limit string is in `absenceStrs` and the actual response uses a different phrasing, **no** absence string matches. With empty `presenseStrs` (presence always true), the result is a false **CLAIMED**.
-
-**Fix:** Move rate-limit strings out of `absenceStrs` and into `errors` (mapping to `"Rate limited"` or similar). The `errors` mechanism produces an **UNKNOWN** result instead of CLAIMED or NOT FOUND, which is the correct semantic: rate limiting means "we don't know", not "user exists" or "user doesn't exist".
-
-```json
-{
- "absenceStrs": ["{\"taken\":false}"],
- "errors": {
- "The resource is being rate limited": "Rate limited",
- "You are being rate limited": "Rate limited"
- }
-}
-```
-
-**General rule:** Any response that means "I can't answer right now" (rate limit, maintenance page, CAPTCHA, temporary ban) should go into `errors`, never into `absenceStrs` or `presenseStrs`. Only strings that reliably indicate "user does / does not exist" belong in the presence/absence lists.
-
-**Discord example (2026-03-24):** The POST API at `discord.com/api/v9/unique-username/username-attempt-unauthed` returns `{"taken":true}` / `{"taken":false}` normally, but under load returns varying rate-limit messages. Keeping only `{"taken":false}` in `absenceStrs` and all rate-limit variants in `errors` eliminates the transient false positives the Maigret bot was reporting.
-
-### 7.16 Non-UTF-8 page encoding silently breaks string markers
-
-**opennet.ru** serves pages in **KOI8-R** encoding. The `absenceStrs` value `"Имя участника не найдено"` is stored as UTF-8 bytes in `data.json`, but the HTTP response body contains the same text encoded as KOI8-R bytes. Since Maigret (and aiohttp) compares raw bytes by default, the substring is **never found** — the absence check silently fails, and empty `presenseStrs` (presence always true) produces a false CLAIMED.
-
-**How to detect:** If `absenceStrs` contains non-ASCII text and the check fails despite the string visibly appearing on the page in a browser, inspect the `Content-Type` header or raw bytes for a non-UTF-8 `charset` (KOI8-R, Windows-1251, ISO-8859-*, etc.). Also check with `curl -s URL | iconv -f KOI8-R -t UTF-8` to confirm.
-
-**Lesson:** Maigret has no built-in charset transcoding for marker comparison. If a site serves a non-UTF-8 charset and the relevant markers contain non-ASCII characters, string matching will fail. Options:
-- Find ASCII-only markers that work in any encoding (HTML tags, class names, English text).
-- Use a JSON API endpoint (APIs almost always return UTF-8).
-- If neither is available, `disabled: true`.
-
-### 7.17 ARIA and HTML boilerplate attributes are dangerous `presenseStrs`
-
-SlideShare had `"polite"` in `presenseStrs`, matching the standard `aria-live="polite"` attribute. This attribute appears on virtually any modern web page — including anti-bot challenge pages, error pages, and homepage redirects. When the real profile page is replaced by such a generic page, `absenceStrs` don't match (different content) but `presenseStrs` still fires → false CLAIMED.
-
-**Common traps:** `polite`, `alert`, `status`, `navigation`, `assertive`, `banner`, `main`, `complementary`, `contentinfo` — all standard ARIA landmark/live-region values present on most pages.
-
-**Lesson:** Never use single generic words that are part of HTML/ARIA boilerplate as `presenseStrs`. Profile markers should be **specific to the profile page structure**: unique CSS classes (e.g. `"profile-card"`), `<title>` fragments with the site name (e.g. `"- salon24.pl"`), or JSON field names from API responses (e.g. `"displayName"`).
-
-### 7.18 Anti-bot challenge pages can pass through `message` checks as false CLAIMED
-
-When a site intermittently serves an anti-bot challenge page (e.g. SlideShare's "Client Challenge", Cloudflare "Just a moment..."), a specific failure mode occurs with `checkType: "message"`:
-
-1. The challenge HTML replaces the real profile/error page.
-2. `absenceStrs` don't match (challenge page has different content than "user not found").
-3. If `presenseStrs` is empty (presence always true) **or** contains a broad marker that matches the challenge HTML → result is **CLAIMED**.
-
-This is different from a simple "anti-bot → disable" situation because the challenge may be **intermittent** — the check works most of the time but produces sporadic false positives under load or for specific IPs.
-
-**Fix:** Add the challenge page's distinctive text to `errors`:
-```json
-{
- "errors": {
- "Client Challenge": "Anti-bot challenge",
- "Just a moment": "Cloudflare challenge",
- "Checking your browser": "Anti-bot challenge"
- }
-}
-```
-
-The `errors` mechanism produces **UNKNOWN** instead of CLAIMED, which is correct: "we got a challenge page, not a profile page, so we don't know."
-
-**Lesson:** When fixing a site that is **intermittently** reported as false positive, check whether the failure happens only when anti-bot protection triggers. If so, adding challenge markers to `errors` is better than disabling the entire check.
-
-### 7.19 Redirect-to-homepage as a "user not found" signal
-
-Some sites (e.g. **Salon24.pl**) redirect non-existent user URLs to the **homepage** via HTTP 301/302, while existing users get a 200 with profile content. Since Maigret follows redirects by default (`allow_redirects=True` for `message`/`status_code` checks), it sees the **final** page — the homepage.
-
-This creates a usable signal for `checkType: "message"`:
-- **`presenseStrs`** with a fragment unique to profile pages (e.g. `"- salon24.pl"` which appears in `"test 1 - salon24.pl"` on profiles but not on the generic homepage title).
-- No `absenceStrs` needed — the homepage simply doesn't contain the profile-specific marker.
-
-**Lesson:** When a site returns the same HTTP 200 for both claimed and unclaimed usernames (after redirects are followed), compare the **final page content** of the two responses. If the unclaimed username lands on the homepage, use a profile-specific `presenseStrs` marker rather than trying to find an absence string on the homepage.
-
-### 7.20 Non-standard HTTP status codes from anti-bot systems
-
-Anti-bot systems don't always use standard 403/429 codes. Observed examples:
-- **HTTP 468** (forum.exkavator.ru): custom Tengine anti-bot status.
-- **HTTP 520–530**: Cloudflare-specific error codes (520 = unknown error, 521 = web server down, 522 = connection timed out, 523 = origin unreachable, 524 = a timeout occurred, 525 = SSL handshake failed, 526 = invalid SSL certificate, 530 = returned together with a Cloudflare 1xxx error).
-
-**Lesson:** When diagnosing a site that returns connection errors or unexpected statuses in Maigret, check with `curl -sIL` first. If the status code is non-standard (one the origin application would not normally emit, such as 468 or 520–530), it most likely comes from an intermediary (CDN, WAF, anti-bot) and the site should be `disabled: true`.
-
-### 7.21 `site_check.py --diagnose` does not test POST APIs
-
-The `utils/site_check.py --diagnose` tool performs raw aiohttp GET requests to compare claimed/unclaimed responses. For sites that use `requestMethod: "POST"` (e.g. Discord, Holopin), the diagnose tool will show the site as broken because GET to a POST endpoint returns different content (often the site's homepage or an error page).
-
-**Workaround:** For POST-based checks, verify manually with `curl -X POST` or use `maigret --self-check --site "SiteName"` which respects the full configuration including request method and payload.
-
-### 7.7 The playbook classification works
-
-The decision tree from the documentation accurately describes real-world cases:
-
-| Situation | Playbook says | Actual result |
-|-----------|---------------|---------------|
-| Captcha (Baidu) | `disabled: true` | Correct |
-| TLS fingerprinting (Wikipedia) | `disabled: true` (anti-bot) | Correct |
-| Working API available (Reddit, MS Learn) | Use `urlProbe` | Correct |
-| Service migrated (MSDN → MS Learn) | Update URL or create new entry | Correct |
-
----
-
-## Documentation maintenance
-
-For any of the changes below, **always** keep these artifacts in sync — this file ([`site-checks-guide.md`](site-checks-guide.md)), [`site-checks-playbook.md`](site-checks-playbook.md), and (when rules or templates change) the header/template in [`socid_extractor_improvements.log`](socid_extractor_improvements.log):
-
-- Maigret code changes (including [`maigret/checking.py`](../maigret/checking.py), request executors, CLI);
-- New or changed search tools / helper utilities for site checks;
-- Changes to rules or semantics of `checkType`, `data.json` fields, self-check, etc.;
-- Changes to the **public JSON API** diagnostic step or **mandatory** `socid_extractor` logging rules.
-
-Prefer updating the guide, playbook, and log template in one commit or in the same task so instructions do not diverge. **Append-only:** new proposals go at the bottom of `socid_extractor_improvements.log`; do not delete historical entries when editing the template.
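Reviewer note on section 7.16 of the deleted guide: the encoding mismatch it describes can be reproduced without any network access. The snippet below is a minimal sketch using only Python's standard codecs. The HTML wrapper around the marker is an invented stand-in for a KOI8-R page, not a real opennet.ru response, and the comparison is done directly on bytes rather than through Maigret's own code path.

```python
# Marker text as stored in data.json (UTF-8 on disk).
marker = "Имя участника не найдено"
utf8_marker = marker.encode("utf-8")

# Hypothetical page body, encoded the way a KOI8-R server would send it.
koi8_body = f"<html><body>{marker}</body></html>".encode("koi8-r")

# Byte-level search: the UTF-8 marker is not a substring of the KOI8-R body.
found_raw = utf8_marker in koi8_body

# Decoding the body with the declared charset makes the marker match again.
found_decoded = marker in koi8_body.decode("koi8-r")

# ASCII-only markers survive, because KOI8-R and UTF-8 agree on ASCII bytes.
found_ascii = b"<body>" in koi8_body

print(found_raw, found_decoded, found_ascii)  # False True True
```

This is why the guide's first recommendation (ASCII-only markers such as tags or class names) works regardless of the page charset.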
diff --git a/LLM/site-checks-playbook.md b/LLM/site-checks-playbook.md
deleted file mode 100644
index b637a6d..0000000
--- a/LLM/site-checks-playbook.md
+++ /dev/null
@@ -1,142 +0,0 @@
-# Site checks — playbook (Maigret)
-
-Short checklist for edits to [`maigret/resources/data.json`](../maigret/resources/data.json) and, when needed, [`maigret/checking.py`](../maigret/checking.py). Full guide: [`site-checks-guide.md`](site-checks-guide.md). Upstream extraction proposals: [`socid_extractor_improvements.log`](socid_extractor_improvements.log).
-
-**Documentation maintenance:** whenever you improve Maigret, add search tooling, or change check logic, update **both** this file and [`site-checks-guide.md`](site-checks-guide.md) (see the “Documentation maintenance” section at the end of that file). When JSON API / `socid_extractor` logging rules change, update the **template header** in [`socid_extractor_improvements.log`](socid_extractor_improvements.log) in the same change.
-
-## 0. Standard checks (do alongside reproduce / classify)
-
-- **Public JSON API:** always look for a stable JSON (or GraphQL JSON) profile endpoint (`/api/`, `.json`, mobile-style URLs). When the API is more reliable than HTML, set **`urlProbe`** to that endpoint and keep **`url`** as the human-readable profile link (e.g. `https://picsart.com/u/{username}`). If there is no separate profile URL, use the API as `url` only. Details: **`urlProbe`** and section **2.1** in [`site-checks-guide.md`](site-checks-guide.md).
-- **`socid_extractor` log (mandatory):** if you find **embedded user JSON in HTML** or a **standalone JSON profile API**, append a dated entry (with **example username**) to [`socid_extractor_improvements.log`](socid_extractor_improvements.log). Details: section **2.2** in [`site-checks-guide.md`](site-checks-guide.md).
-
-## 1. Reproduce
-
-- Run a targeted check:
- `maigret USER --db /path/to/maigret/resources/data.json --site "SiteName" --print-not-found --print-errors --no-progressbar -vv`
-- Compare an **existing** and a **non-existent** username (as `usernameClaimed` / `usernameUnclaimed` in JSON).
-- With `-vvv`, inspect `debug.log` (raw response in the log).
-
-## 2. Classify the cause
-
-| Symptom | Typical cause | Action |
-|--------|-----------------|--------|
-| HTTP 200 for “user does not exist” | Soft 404 | Move from `status_code` to `message` or `response_url`; add `absenceStrs` / narrow `presenseStrs` |
-| Generic words match (`name`, `email`) | `presenseStrs` too broad | Remove generic markers; add profile-specific ones. **Avoid** ARIA/boilerplate words (`polite`, `alert`, `navigation`, etc.) — see 7.17 in guide |
-| Same HTML without JS | SPA / skeleton shell | Compare **final URL and HTTP redirects** (Maigret already follows redirects by default). If the browser shows extra routes (`/posts`, `/not-found`) only **after JS**, they will **not** appear to Maigret — try a **public JSON/API** endpoint for the same site if one exists. See **Redirects and final URL** and **Picsart** in [`site-checks-guide.md`](site-checks-guide.md). |
-| Unclaimed redirects to homepage | Site returns 301/302 to main page | Use `presenseStrs` with a profile-specific marker (e.g. title fragment unique to profile pages). See 7.19 in guide |
-| 403 / “Log in” / guest-only | Auth or anti-bot required | `disabled: true` |
-| reCAPTCHA / “Checking your browser” / “Client Challenge” | Bot protection | Add challenge text to `errors` (→ UNKNOWN). Try a reasonable `User-Agent` in `headers`. If intermittent, `errors` is better than `disabled`. See 7.18 in guide |
-| Non-standard HTTP code (468, 520–530) | CDN/WAF anti-bot | `disabled: true`. Check with `curl -sIL` to confirm the code comes from an intermediary. See 7.20 in guide |
-| Non-ASCII `absenceStrs` not matching despite visible text | Page encoding ≠ UTF-8 | Check `Content-Type` for charset (KOI8-R, Windows-1251, etc.). Use ASCII-only markers, a JSON API, or `disabled: true`. See 7.16 in guide |
-| Domain does not resolve / persistent timeout | Dead service | Remove entry **only** after confirming the domain is dead |
-
-## 3. Data edits
-
-**CRITICAL — surgical edits only.** Never rewrite `data.json` via `json.load()` + `json.dump()` — this reformats the entire ~36 000-line file and produces an unreviewable diff. Make targeted, line-level edits to only the fields you are changing. See Phase C in [`site-checks-guide.md`](site-checks-guide.md).
-
-1. Update `url` / `urlMain` if needed (HTTPS redirects). Use optional **`urlProbe`** when the HTTP check should hit a different URL than the profile link shown in reports (API vs web UI).
-2. For `message`: **always** tune string pairs so `absenceStrs` fire on “no user” pages and `presenseStrs` fire on real profiles without false absence hits.
- - **Never** use ARIA/boilerplate words as `presenseStrs` (`polite`, `alert`, `navigation`, `status`, `main`, etc.).
- - If markers contain **non-ASCII text**, verify the page charset is UTF-8. Non-UTF-8 pages (KOI8-R, Windows-1251) will silently fail byte comparison — prefer ASCII-only markers or a JSON API.
-3. Engine (`engine`, e.g. XenForo): override only differing fields in the site entry so other sites are not broken.
-4. Keep `status_code` only if the response **reliably** differs by status code without soft 404.
-5. Add **anti-bot challenge text** to `errors` (not `absenceStrs`) when the site intermittently serves challenge pages. Common patterns: `"Client Challenge"`, `"Just a moment"`, `"Checking your browser"`, `"Attention Required"`. This produces UNKNOWN instead of false CLAIMED.
-
-## 4. Verify
-
-- `maigret --self-check --site "SiteName" --db ...` for touched entries.
-- `make test` before commit.
-
-## 5. Code notes
-
-- `process_site_result` uses strict comparison to `"status_code"` for `checkType` (not a substring trick).
-- Empty `presenseStrs` with `message` means “presence always true”; a debug line is logged only at DEBUG level.
-
-## 6. Development utilities
-
-Quick reference for site check utilities. Full details: section **6** in [`site-checks-guide.md`](site-checks-guide.md).
-
-| Command | Purpose |
-|---------|---------|
-| `python utils/site_check.py --site "X" --check-claimed` | Quick aiohttp comparison |
-| `python utils/site_check.py --site "X" --maigret` | Test via Maigret checker |
-| `python utils/site_check.py --site "X" --compare-methods` | Find aiohttp vs Maigret discrepancies |
-| `python utils/site_check.py --site "X" --diagnose` | Full diagnosis with fix recommendations |
-| `python utils/check_top_n.py --top 100` | Mass-check top 100 sites |
-| `maigret --self-check --site "X"` | Self-check (reports only, no auto-disable) |
-| `maigret --self-check --site "X" --auto-disable` | Self-check with auto-disable |
-| `maigret --self-check --site "X" --diagnose` | Self-check with detailed diagnosis |
-
-## 7. Quick tips (lessons learned)
-
-Practical observations from fixing top-ranked sites. Full details: section **7** in [`site-checks-guide.md`](site-checks-guide.md).
-
-| Tip | Why it matters |
-|-----|----------------|
-| **API first** | Reddit, Microsoft Learn — APIs worked when web pages were blocked. Always check `/api/`, `.json` endpoints. |
-| **`urlProbe` separates check from display** | Check via API, show human URL in reports. Example: Reddit API → `www.reddit.com/user/` link. |
-| **aiohttp ≠ curl** | Wikipedia returned 200 for curl, 403 for aiohttp (TLS fingerprinting). Always test with Maigret directly. |
-| **Use `debug.log`** | Run with `-vvv` to see raw response. Warning messages alone can be misleading. |
-| **`status_code` for clean APIs** | If API returns proper 404 for missing users, prefer `status_code` over `message`. |
-| **Migrate, don't delete** | MSDN → Microsoft Learn: keep old entry disabled, create new one for current service. |
-| **Engine templates break silently** | vBulletin `absenceStrs` failed on ~12 forums at once — many require login, showing a generic page with no error text. Check the engine template first. |
-| **Search-by-author is unreliable** | phpBB `search.php?author=` checks for posts, not accounts. A user with zero posts looks identical to a non-existent user. Avoid these URLs. |
-| **Some sites always generate a page** | Pbase stubs "pbase Artist {name}" for any path; ffm.bio fuzzy-matches to the nearest real entry. No markers can help — `disabled: true`. |
-| **TLS fingerprinting degrades over time** | Kaggle's custom `User-Agent` fix stopped working — aiohttp now gets 404 for both usernames. Accept `disabled: true` when no API exists. |
-| **API endpoints bypass Cloudflare** | Fandom `api.php` and Substack `/api/v1/` returned clean JSON while main pages were blocked by Cloudflare. Always try API paths on the same domain. |
-| **Inspect Network tab for POST APIs** | Many modern platforms (e.g., Discord) heavily protect HTML profiles but expose unauthenticated `POST` endpoints for username checks. Maigret supports this natively: define `"request_method": "POST"` and `"request_payload": {"username": "{username}"}` in `data.json` to query them! |
-| **Strict JSON markers are bulletproof** | When probing APIs, use `checkType: "message"` with exact JSON substrings (like `"{\"taken\": false}"`). Unlike HTML layout checks, this approach is immune to UI redesigns, A/B testing, and language translations. |
-| **GraphQL supports GET too** | hashnode GraphQL works via `GET ?query=...` (URL-encoded). You can use either native POST payloads or GET `urlProbe` for GraphQL. |
-| **URL-encode braces for template safety** | GraphQL `{...}` conflicts with Maigret's `{username}`. Use `%7B`/`%7D` for literal braces in `urlProbe` — `.format()` ignores percent-encoded chars. |
-| **Anti-bot bypass via simple UA** | "Anubis" anti-bot PoW screens (like on Weblate) intercept modern browser UAs via HTTP 307. Hardcoding `"headers": {"User-Agent": "python-requests/2.25.1"}` circumvents the scraper filter and restores default detection logic. |
-| **Rate-limit → `errors`, not `absenceStrs`** | Rate-limit wording varies across API versions. If the phrasing doesn't match `absenceStrs` and `presenseStrs` is empty, the result is a false CLAIMED. Put all "can't answer right now" strings (rate limit, CAPTCHA, maintenance) in `errors` so the result is UNKNOWN. |
-| **Non-UTF-8 encoding breaks markers** | opennet.ru serves KOI8-R; UTF-8 `absenceStrs` never match raw bytes. Use ASCII-only markers, a JSON API, or `disabled: true`. |
-| **ARIA attrs are presenseStrs traps** | `"polite"`, `"alert"`, `"navigation"` match `aria-live`/ARIA landmarks on any page including anti-bot challenges. Use profile-specific markers instead. |
-| **Anti-bot challenge + broad markers = false CLAIMED** | Challenge pages bypass `absenceStrs` but match broad `presenseStrs`. Add challenge text (e.g. `"Client Challenge"`) to `errors` → UNKNOWN. Better than disabling for intermittent issues. |
-| **Redirect-to-homepage as signal** | Salon24.pl 301-redirects unclaimed users to homepage. Use `presenseStrs` with a profile-only marker (e.g. `"- salon24.pl"`). |
-| **Non-standard anti-bot HTTP codes** | HTTP 468 (Tengine), 520–530 (Cloudflare) — not standard 403/429. Check with `curl -sIL`; if code is from intermediary → `disabled: true`. |
-| **`--diagnose` doesn't test POST** | `site_check.py --diagnose` uses GET only. For POST APIs (Discord, Holopin), verify with `curl -X POST` or `maigret --self-check`. |
-
-## 8. Site naming rules
-
-Site names in `data.json` are the **keys** of the `"sites"` object and appear in user-facing reports. Follow these rules:
-
-| Rule | Example | Counter-example |
-|------|---------|-----------------|
-| **Title Case** by default | `Hacker News`, `Product Hunt` | ~~`hackernews`~~, ~~`product hunt`~~ |
-| **Lowercase** if the brand is written that way | `kofi`, `note`, `hi5` | ~~`Kofi`~~, ~~`Note`~~ |
-| **No domain suffix** unless it is part of the recognized brand | `Flickr`, `Calendly`, `Upwork` | ~~`www.flickr.com`~~, ~~`calendly.com`~~ |
-| **Domain OK** when the brand is commonly written with it | `last.fm`, `VC.ru`, `Archive.org` | |
-| **No full UPPERCASE** unless the brand is an acronym/initialism or is itself written in uppercase | `VK`, `CNET`, `ICQ`, `IFTTT`, `VSCO` | ~~`BOOTH`~~ → `Booth` |
-| **`{username}` templates** in names are OK | `{username}.tilda.ws` | |
-| **Spaces** are allowed when the brand uses them | `Star Citizen`, `Google Maps` | |
-| **No `www.` or `https://`** prefix | `Flickr`, `Change.org` | ~~`www.flickr.com`~~, ~~`https:`~~ |
-
-When in doubt, check how the service refers to itself on its homepage or in its page title.
-
-## 9. Tagging rules
-
-### Country tags (ISO 3166-1 alpha-2)
-
-The goal of a country tag is to **attribute a person to their country of origin or residence**, not to be a perfect truth source.
-
-| Scenario | Action | Example |
-|----------|--------|---------|
-| Site is global, account says nothing about country | **No country tag** | GitHub, YouTube, Reddit, Medium, Udemy |
-| Account implies connection to a specific country | **Add country tag** | VK → `ru`, Naver → `kr`, Zhihu → `cn` |
-| Service used mostly in a few specific countries | **Multiple country tags OK** | Xing → `de`, `eu` |
-| Very local/regional site | **Must have country tag** | Nairaland → `ng`, 4pda → `ru` |
-
-**Do NOT** assign country tags based on traffic statistics (e.g. Alexa/SimilarWeb audience data). A site popular in India by traffic is not "Indian" if it is used globally. The `in` tag was previously over-applied this way.
-
-### Category tags
-
-- Every tag used in `data.json` must be registered in the `"tags"` array at the bottom of the file. The `test_tags_validity` test enforces this.
-- Do not use platform/software names as tags (`writefreely`, `pixelfed`). Use category names instead (`blog`, `photo`).
-- Avoid 2-letter category tags that collide with ISO country codes (e.g. `ai` = Anguilla). The `is_country_tag()` function treats any 2-letter tag as a country code.
-- Keep existing category tags when modifying country tags.
-- Top-50 sites by alexaRank must have at least one category tag (enforced by `test_top_sites_have_category_tag`).
-
-## 10. Documentation maintenance
-
-When you change Maigret, add search tools, or change check logic, keep **this playbook**, [`site-checks-guide.md`](site-checks-guide.md), and (when applicable) the template in [`socid_extractor_improvements.log`](socid_extractor_improvements.log) aligned. New log **entries** are append-only at the bottom of that file.
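Reviewer note on the playbook tips about ARIA markers and anti-bot challenge pages: the interaction between `absenceStrs`, `presenseStrs`, and `errors` can be sketched in a few lines. This is a simplified model, not the code in `maigret/checking.py`. The function name `classify_message_check`, the order in which markers are tested, the `AVAILABLE` label, and the sample challenge HTML are all assumptions made for illustration.

```python
from typing import Dict, List


def classify_message_check(
    body: str,
    absence_strs: List[str],
    presence_strs: List[str],
    errors: Dict[str, str],
) -> str:
    """Simplified model of a `message` check (hypothetical helper)."""
    # Assumed order: error markers win, so a challenge page yields UNKNOWN.
    if any(marker in body for marker in errors):
        return "UNKNOWN"
    # Any absence marker means the username is not registered.
    if any(marker in body for marker in absence_strs):
        return "AVAILABLE"
    # Empty presence list means "presence always true" (risky configuration).
    if not presence_strs or any(marker in body for marker in presence_strs):
        return "CLAIMED"
    return "AVAILABLE"


# Invented stand-in for an anti-bot challenge page.
challenge_page = '<title>Client Challenge</title><div aria-live="polite"></div>'
absence = ["User not found"]

broad = classify_message_check(challenge_page, absence, ["polite"], {})
with_errors = classify_message_check(
    challenge_page, absence, ["polite"], {"Client Challenge": "Anti-bot challenge"}
)
specific = classify_message_check(challenge_page, absence, ["profile-card"], {})

print(broad, with_errors, specific)  # CLAIMED UNKNOWN AVAILABLE
```

The first result is the false positive the tips warn about. The second shows the `errors` fix, and the third shows that a profile-specific marker also avoids the false CLAIMED, although it reports the username as free rather than unknown.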
diff --git a/MANIFEST.in b/MANIFEST.in
deleted file mode 100644
index 6255e80..0000000
--- a/MANIFEST.in
+++ /dev/null
@@ -1,4 +0,0 @@
-include LICENSE
-include README.md
-include requirements.txt
-include maigret/resources/*
diff --git a/maigret/checking.py b/maigret/checking.py
index c0a0603..c26f6f4 100644
--- a/maigret/checking.py
+++ b/maigret/checking.py
@@ -139,12 +139,18 @@ class SimpleAiohttpChecker(CheckerBase):
async def check(self) -> Tuple[str, int, Optional[CheckError]]:
from aiohttp_socks import ProxyConnector
+ # Use a real SSL context instead of ssl=False to avoid TLS fingerprinting
+ # blocks by Cloudflare and similar WAFs. Certificate verification is
+ # disabled to handle sites with invalid/expired certs.
+ ssl_context = ssl.create_default_context()
+ ssl_context.check_hostname = False
+ ssl_context.verify_mode = ssl.CERT_NONE
+
connector = (
ProxyConnector.from_url(self.proxy)
if self.proxy
- else TCPConnector(ssl=False)
+ else TCPConnector(ssl=ssl_context)
)
- connector.verify_ssl = False
async with ClientSession(
connector=connector,
diff --git a/maigret/errors.py b/maigret/errors.py
index 3b79a6c..d8930c5 100644
--- a/maigret/errors.py
+++ b/maigret/errors.py
@@ -58,6 +58,7 @@ COMMON_ERRORS = {
'Censorship', 'MGTS'
),
'Incapsula incident ID': CheckError('Bot protection', 'Incapsula'),
+ 'DDoS-Guard': CheckError('Bot protection', 'DDoS-Guard'),
'Сайт заблокирован хостинг-провайдером': CheckError(
'Site-specific', 'Site is disabled (Beget)'
),
diff --git a/maigret/resources/data.json b/maigret/resources/data.json
index e216cb6..0e13114 100644
--- a/maigret/resources/data.json
+++ b/maigret/resources/data.json
@@ -88,6 +88,7 @@
},
"Twitter": {
"tags": [
+ "messaging",
"social"
],
"headers": {
@@ -126,7 +127,6 @@
},
"MicrosoftTechNet": {
"disabled": true,
- "tags": [],
"checkType": "status_code",
"urlMain": "https://social.technet.microsoft.com",
"url": "https://social.technet.microsoft.com/profile/{username}/",
@@ -146,7 +146,6 @@
},
"social.msdn.microsoft.com": {
"disabled": true,
- "tags": [],
"engine": "engine404",
"urlMain": "https://social.msdn.microsoft.com",
"url": "https://social.msdn.microsoft.com/profile/{username}",
@@ -164,7 +163,6 @@
"usernameUnclaimed": "noonewouldeverusethis7"
},
"AppleDiscussions": {
- "tags": [],
"checkType": "status_code",
"urlMain": "https://discussions.apple.com/",
"url": "https://discussions.apple.com/profile/{username}",
@@ -199,7 +197,8 @@
"WordPressOrg": {
"tags": [
"blog",
- "coding"
+ "coding",
+ "in"
],
"checkType": "response_url",
"alexaRank": 12,
@@ -301,8 +300,8 @@
},
"TikTok": {
"tags": [
- "video",
- "social"
+ "social",
+ "video"
],
"headers": {
"user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"
@@ -325,7 +324,8 @@
},
"Amazon": {
"tags": [
- "shopping"
+ "shopping",
+ "us"
],
"checkType": "message",
"presenseStrs": [
@@ -342,7 +342,6 @@
},
"community.adobe.com": {
"similarSearch": true,
- "tags": [],
"checkType": "message",
"presenseStrs": [
"lia-user-item-profile",
@@ -382,7 +381,6 @@
"usernameUnclaimed": "noonewouldeverusethis7"
},
"Mozilla Support": {
- "tags": [],
"checkType": "message",
"absenceStrs": [
">Page Not Found",
@@ -697,7 +695,6 @@
},
"Oracle Community": {
"disabled": true,
- "tags": [],
"checkType": "status_code",
"urlMain": "https://community.oracle.com",
"url": "https://community.oracle.com/people/{username}",
@@ -765,8 +762,10 @@
},
"ResearchGate": {
"tags": [
+ "in",
"research",
- "social"
+ "social",
+ "us"
],
"regexCheck": "\\w+_\\w+",
"checkType": "response_url",
@@ -789,7 +788,6 @@
"usernameUnclaimed": "noonewould"
},
"cyber.harvard.edu": {
- "tags": [],
"checkType": "status_code",
"urlMain": "https://cyber.harvard.edu",
"url": "https://cyber.harvard.edu/people/{username}",
@@ -923,8 +921,8 @@
"rate_limited": "Rate limited"
},
"tags": [
- "messaging",
- "gaming"
+ "gaming",
+ "messaging"
]
},
"Unsplash": {
@@ -941,7 +939,8 @@
},
"Calendly": {
"tags": [
- "business"
+ "business",
+ "us"
],
"checkType": "status_code",
"presenseStrs": [
@@ -1099,6 +1098,7 @@
},
"Bluesky": {
"tags": [
+ "messaging",
"social"
],
"checkType": "message",
@@ -1324,8 +1324,8 @@
},
"Myspace": {
"tags": [
- "social",
- "music"
+ "music",
+ "social"
],
"checkType": "status_code",
"alexaRank": 244,
@@ -2588,8 +2588,8 @@
"alexaRank": 749,
"tags": [
"coding",
- "tech",
- "llm"
+ "llm",
+ "tech"
]
},
"Laracast": {
@@ -2913,6 +2913,7 @@
"Foursquare": {
"tags": [
"geosocial",
+ "in",
"social"
],
"checkType": "message",
@@ -3104,7 +3105,6 @@
"usernameUnclaimed": "noonewouldeverusethis7"
},
"djskt.lnk.to": {
- "tags": [],
"checkType": "message",
"presenseStrs": [
"artistName",
@@ -3923,8 +3923,8 @@
"OpenSea": {
"disabled": true,
"tags": [
- "nft",
- "crypto"
+ "crypto",
+ "nft"
],
"checkType": "message",
"presenseStrs": [
@@ -4077,7 +4077,6 @@
]
},
"fablero.ucoz.ru": {
- "tags": [],
"engine": "uCoz",
"urlMain": "http://fablero.ucoz.ru",
"usernameClaimed": "alex",
@@ -5324,7 +5323,6 @@
]
},
"SoftwareInformer": {
- "tags": [],
"checkType": "response_url",
"urlMain": "https://users.software.informer.com",
"url": "https://users.software.informer.com/{username}/",
@@ -5459,7 +5457,6 @@
"usernameUnclaimed": "noonewouldeverusethis7"
},
"T-MobileSupport": {
- "tags": [],
"checkType": "status_code",
"urlMain": "https://support.t-mobile.com",
"url": "https://support.t-mobile.com/people/{username}",
@@ -5721,8 +5718,8 @@
"LiveLeak": {
"disabled": true,
"tags": [
- "video",
- "news"
+ "news",
+ "video"
],
"checkType": "message",
"absenceStrs": [
@@ -6028,12 +6025,11 @@
"usernameUnclaimed": "cawlpwmifx",
"alexaRank": 3027,
"tags": [
- "science",
- "hobby"
+ "hobby",
+ "science"
]
},
"Pluralsight": {
- "tags": [],
"checkType": "message",
"errors": {
"Unfortunately, Pluralsight's products are not available in your area at this time": "Site censorship"
@@ -6157,8 +6153,8 @@
},
"OpenCollective": {
"tags": [
- "finance",
- "coding"
+ "coding",
+ "finance"
],
"checkType": "message",
"absenceStrs": [
@@ -6496,759 +6492,9 @@
},
"Plurk": {
"tags": [
+ "social",
"tw",
- "social"
- ],
- "checkType": "message",
- "absenceStrs": [
- "User Not Found!"
- ],
- "alexaRank": 3426,
- "urlMain": "https://www.plurk.com/",
- "url": "https://www.plurk.com/{username}",
- "usernameClaimed": "adam",
- "usernameUnclaimed": "noonewouldeverusethis7"
- },
- "ceed.at.ua": {
- "disabled": true,
- "engine": "uCoz",
- "urlMain": "http://ceed.at.ua",
- "usernameClaimed": "alex",
- "usernameUnclaimed": "noonewouldeverusethis7"
- },
- "club-2105.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://club-2105.at.ua",
- "usernameClaimed": "alex",
- "usernameUnclaimed": "noonewouldeverusethis7"
- },
- "delta72.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://delta72.at.ua",
- "usernameClaimed": "alex",
- "usernameUnclaimed": "noonewouldeverusethis7"
- },
- "fobia.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://fobia.at.ua",
- "usernameClaimed": "alex",
- "usernameUnclaimed": "noonewouldeverusethis7"
- },
- "komsomolskiy.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://komsomolskiy.at.ua",
- "usernameClaimed": "alex",
- "usernameUnclaimed": "noonewouldeverusethis7"
- },
- "my-citrus.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://my-citrus.at.ua",
- "usernameClaimed": "admin",
- "usernameUnclaimed": "noonewouldeverusethis7"
- },
- "netbiz.at.ua": {
- "disabled": true,
- "engine": "uCoz",
- "urlMain": "http://netbiz.at.ua",
- "usernameClaimed": "admin",
- "usernameUnclaimed": "noonewouldeverusethis7"
- },
- "nikos.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://nikos.at.ua",
- "usernameClaimed": "alex",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "tags": [
- "ua"
- ]
- },
- "stalkerbar.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://stalkerbar.at.ua",
- "usernameClaimed": "admin",
- "usernameUnclaimed": "noonewouldeverusethis7"
- },
- "tdo888.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://tdo888.at.ua",
- "usernameClaimed": "admin",
- "usernameUnclaimed": "noonewouldeverusethis7"
- },
- "uahack.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://uahack.at.ua",
- "usernameClaimed": "alex",
- "usernameUnclaimed": "noonewouldeverusethis7"
- },
- "vii.at.ua": {
- "disabled": true,
- "engine": "uCoz",
- "urlMain": "http://vii.at.ua",
- "usernameClaimed": "alex",
- "usernameUnclaimed": "noonewouldeverusethis7"
- },
- "vovdm.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://vovdm.at.ua",
- "usernameClaimed": "alex",
- "usernameUnclaimed": "noonewouldeverusethis7"
- },
- "alka-mine.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://alka-mine.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "alex"
- },
- "oih.at.ua": {
- "tags": [
- "ua"
- ],
- "engine": "uCoz",
- "urlMain": "http://oih.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "alex"
- },
- "medkarta.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://medkarta.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "admin"
- },
- "lexus-club.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://lexus-club.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "alex"
- },
- "wolga24.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://wolga24.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "alex"
- },
- "avon-kiev.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://avon-kiev.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "admin"
- },
- "prof-rem-zona.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://prof-rem-zona.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "alex"
- },
- "super-warez-por.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://super-warez-por.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "admin"
- },
- "girl.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://girl.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "alex"
- },
- "bull-baza.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://bull-baza.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "alex"
- },
- "starfiles.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://starfiles.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "admin"
- },
- "expressinfo.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://expressinfo.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "alex",
- "tags": [
- "classified",
- "ua"
- ]
- },
- "uface.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://uface.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "admin"
- },
- "drujba.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://drujba.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "admin"
- },
- "zhelezyaka.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://zhelezyaka.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "admin"
- },
- "reklama-x.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://reklama-x.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "admin"
- },
- "hamradio.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://hamradio.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "alex"
- },
- "torworld.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://torworld.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "alex"
- },
- "stop-nazi.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://stop-nazi.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "alex"
- },
- "osiris.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://osiris.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "alex"
- },
- "sloboganec.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://sloboganec.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "admin"
- },
- "ohorona.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://ohorona.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "admin"
- },
- "ganjaspice.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://ganjaspice.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "alex"
- },
- "uko.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://uko.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "alex"
- },
- "koshtoris.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://koshtoris.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "admin"
- },
- "alikgor.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://alikgor.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "admin"
- },
- "novayamebel.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://novayamebel.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "admin"
- },
- "greenbacks.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://greenbacks.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "admin",
- "tags": [
- "finance",
- "ru"
- ]
- },
- "bashtanka.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://bashtanka.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "alex"
- },
- "bot-cs.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://bot-cs.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "alex"
- },
- "gool-live.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://gool-live.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "admin"
- },
- "trainmodels.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://trainmodels.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "admin"
- },
- "sbuda.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://sbuda.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "alex",
- "tags": [
- "ua"
- ]
- },
- "vse-o-zaz.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://vse-o-zaz.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "alex"
- },
- "vinbazar.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://vinbazar.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "admin"
- },
- "angell.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://angell.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "admin"
- },
- "kupluradiodetal.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://kupluradiodetal.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "alex",
- "tags": [
- "forum",
- "ua"
- ]
- },
- "lksmu-lg.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://lksmu-lg.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "admin"
- },
- "tv-android.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://tv-android.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "alex"
- },
- "fcbarca.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://fcbarca.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "alex"
- },
- "piratapes.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://piratapes.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "john"
- },
- "magia.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://magia.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "alex"
- },
- "gdz.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://gdz.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "admin"
- },
- "greenvisa.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://greenvisa.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "alex"
- },
- "generalu.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://generalu.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "alex"
- },
- "russianfoxmail.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://russianfoxmail.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "admin"
- },
- "umorbos.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://umorbos.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "admin",
- "tags": [
- "ua"
- ]
- },
- "avangard-basket.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://avangard-basket.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "admin"
- },
- "mechta-sev.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://mechta-sev.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "admin"
- },
- "polotno.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://polotno.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "alex"
- },
- "futajik.at.ua": {
- "tags": [
- "ru"
- ],
- "engine": "uCoz",
- "urlMain": "http://futajik.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "admin"
- },
- "avto-box.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://avto-box.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "admin"
- },
- "smarton.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://smarton.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "alex"
- },
- "xemera.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://xemera.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "alex"
- },
- "mystyle.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://mystyle.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "alex"
- },
- "pc-world.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://pc-world.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "admin"
- },
- "molodezh-ua.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://molodezh-ua.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "admin"
- },
- "metanoia.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://metanoia.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "admin"
- },
- "motomanual.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://motomanual.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "admin"
- },
- "mlm.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://mlm.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "alex"
- },
- "programm.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://programm.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "admin"
- },
- "maslinka.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://maslinka.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "alex"
- },
- "unreal.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://unreal.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "admin"
- },
- "nordar.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://nordar.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "admin"
- },
- "satsoft.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://satsoft.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "alex"
- },
- "ltsai.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://ltsai.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "alex"
- },
- "autosila.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://autosila.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "admin"
- },
- "tovyanskaya.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://tovyanskaya.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "admin"
- },
- "fx-profit.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://fx-profit.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "alex"
- },
- "lori.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://lori.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "alex"
- },
- "kotneko.at.ua": {
- "engine": "uCoz",
- "urlMain": "http://kotneko.at.ua",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "usernameClaimed": "alex"
- },
- "TechPowerUp": {
- "disabled": true,
- "tags": [
- "tech"
- ],
- "checkType": "message",
- "absenceStrs": [
- "The specified member cannot be found"
- ],
- "alexaRank": 3480,
- "urlMain": "https://www.techpowerup.com",
- "url": "https://www.techpowerup.com/forums/members/?username={username}",
- "usernameClaimed": "adam",
- "usernameUnclaimed": "noonewouldeverusethis7"
- },
- "MoneySavingExpert": {
- "tags": [
- "forum",
- "gb"
- ],
- "checkType": "status_code",
- "urlMain": "https://forums.moneysavingexpert.com",
- "url": "https://forums.moneysavingexpert.com/profile/{username}",
- "usernameClaimed": "adam",
- "usernameUnclaimed": "noonewouldeverusethis7"
- },
- "HackerOne": {
- "tags": [
- "hacking"
- ],
- "checkType": "message",
- "absenceStrs": [
- "Page not found"
- ],
- "alexaRank": 3652,
- "urlMain": "https://hackerone.com/",
- "url": "https://hackerone.com/{username}",
- "usernameClaimed": "test",
- "usernameUnclaimed": "noonewouldeverusethis7"
- },
- "Zomato": {
- "tags": [
- "geosocial",
- "in"
- ],
- "headers": {
- "Accept-Language": "en-US,en;q=0.9"
- },
- "checkType": "status_code",
- "alexaRank": 3688,
- "urlMain": "https://www.zomato.com/",
- "url": "https://www.zomato.com/pl/{username}/foodjourney",
- "usernameClaimed": "deepigoyal",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "disabled": true
- },
- "The Movie DB": {
- "url": "https://www.themoviedb.org/u/{username}",
- "urlMain": "https://www.themoviedb.org/",
- "checkType": "status_code",
- "usernameClaimed": "blue",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "alexaRank": 3699,
- "tags": [
- "movies"
- ]
- },
- "Winamp": {
- "disabled": true,
- "tags": [
- "forum"
- ],
- "engine": "vBulletin",
- "urlMain": "http://forums.winamp.com",
- "usernameClaimed": "red",
- "usernameUnclaimed": "noonewouldeverusethis7"
- },
- "Skyrock": {
- "disabled": true,
- "tags": [
- "fr"
- ],
- "regexCheck": "^[^_\\.]+$",
- "checkType": "status_code",
- "alexaRank": 3781,
- "urlMain": "https://skyrock.com/",
- "url": "https://{username}.skyrock.com/",
- "usernameClaimed": "adam",
- "usernameUnclaimed": "noonewouldeverusethis7"
- },
- "Netvibes": {
- "tags": [
- "business",
- "fr"
- ],
- "disabled": true,
- "checkType": "status_code",
- "alexaRank": 3789,
- "headers": {
- "User-Agent": "curl/7.64.1"
- },
- "urlMain": "https://www.netvibes.com",
- "url": "https://www.netvibes.com/{username}#General",
- "usernameClaimed": "blue",
- "usernameUnclaimed": "noonewouldeverusethis7"
- },
- "AngelList": {
- "absenceStrs": [
- "render_not_found"
- ],
- "presenseStrs": [
- "Profile",
- "profiles",
- "User profile",
- "name",
- "layouts/profile"
- ],
- "url": "https://angel.co/u/{username}",
- "urlMain": "https://angel.co",
- "usernameClaimed": "john",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "checkType": "message",
- "tags": [
- "business"
- ],
- "alexaRank": 3821
- },
- "Tomtom": {
- "disabled": true,
- "tags": [
- "de",
- "it",
- "nl",
- "no"
- ],
- "checkType": "status_code",
- "urlMain": "https://discussions.tomtom.com/",
- "url": "https://discussions.tomtom.com/en/profile/{username}",
- "usernameClaimed": "adam",
- "usernameUnclaimed": "noonewouldeverusethis7"
- },
- "Cracked": {
- "tags": [
- "news"
- ],
- "checkType": "response_url",
- "alexaRank": 3845,
- "urlMain": "https://www.cracked.com/",
- "url": "https://www.cracked.com/members/{username}/",
- "errorUrl": "https://www.cracked.com/",
- "usernameClaimed": "blue",
- "usernameUnclaimed": "noonewouldeverusethis"
- },
- "Digitalspy": {
- "disabled": true,
- "tags": [
- "forum",
- "gb"
- ],
- "checkType": "status_code",
- "urlMain": "https://forums.digitalspy.com/",
- "url": "https://forums.digitalspy.com/profile/discussions/{username}",
- "usernameClaimed": "adam",
- "usernameUnclaimed": "noonewouldeverusethis7"
- },
- "night.kharkov.ua": {
- "engine": "uCoz",
- "urlMain": "http://night.kharkov.ua",
- "usernameClaimed": "alex",
- "usernameUnclaimed": "noonewouldeverusethis7"
- },
- "scooterclub.kharkov.ua": {
- "engine": "uCoz",
- "urlMain": "http://scooterclub.kharkov.ua",
- "usernameClaimed": "alex",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "disabled": true
- },
- "Aparat": {
- "disabled": true,
- "absenceStrs": [
- "404 - Page Not Found"
- ],
- "presenseStrs": [
- "Profile",
- "username",
- "ProfileMore",
- "name",
- "provider"
- ],
- "urlProbe": "https://www.aparat.com/api/fa/v1/user/user/information/username/{username}",
- "url": "https://www.aparat.com/{username}",
- "urlMain": "https://www.aparat.com",
- "usernameClaimed": "BoHBiG",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "checkType": "message",
- "alexaRank": 3928,
- "tags": [
- "ir",
- "video"
- ]
- },
- "Gab": {
- "tags": [
- "us",
- "social"
+ "us"
],
"urlProbe": "https://gab.com/api/v1/account_by_username/{username}",
"checkType": "status_code",
@@ -7282,7 +6528,6 @@
"usernameUnclaimed": "noonewouldeverusethis7"
},
"BodyBuilding": {
- "tags": [],
"checkType": "response_url",
"urlMain": "https://bodyspace.bodybuilding.com/",
"url": "https://bodyspace.bodybuilding.com/{username}",
@@ -7444,8 +6689,8 @@
},
"Hackaday": {
"tags": [
- "tech",
- "hobby"
+ "hobby",
+ "tech"
],
"checkType": "status_code",
"alexaRank": 4348,
@@ -8160,9 +7405,9 @@
"usernameUnclaimed": "noonewouldeverusethis7",
"alexaRank": 5921,
"tags": [
- "kr",
"blog",
- "coding"
+ "coding",
+ "kr"
]
},
"Tinkoff Invest": {
@@ -8251,6 +7496,7 @@
},
"Minds": {
"tags": [
+ "in",
"social"
],
"checkType": "message",
@@ -8438,8 +7684,8 @@
"usernameUnclaimed": "noonewouldeverusethis7",
"alexaRank": 6611,
"tags": [
- "tech",
- "nl"
+ "nl",
+ "tech"
]
},
"Destructoid": {
@@ -8669,12 +7915,11 @@
"usernameUnclaimed": "noonewouldeverusethis7",
"alexaRank": 6693,
"tags": [
- "wiki",
- "kr"
+ "kr",
+ "wiki"
]
},
"Cheezburger": {
- "tags": [],
"checkType": "response_url",
"urlMain": "https://profile.cheezburger.com",
"url": "https://profile.cheezburger.com/{username}",
@@ -8693,7 +7938,9 @@
"GaiaOnline": {
"tags": [
"gaming",
- "social"
+ "ro",
+ "social",
+ "us"
],
"checkType": "message",
"absenceStrs": [
@@ -8919,6 +8166,9 @@
"AskFM": {
"disabled": true,
"tags": [
+ "eg",
+ "in",
+ "ru",
"social"
],
"regexCheck": "^[a-zA-Z0-9_]{3,40}$",
@@ -9135,6 +8385,7 @@
},
"Ello": {
"tags": [
+ "in",
"social"
],
"checkType": "message",
@@ -9950,8 +9201,8 @@
},
"Airliners": {
"tags": [
- "photo",
- "hobby"
+ "hobby",
+ "photo"
],
"checkType": "status_code",
"alexaRank": 8484,
@@ -10135,7 +9386,9 @@
},
"AminoApp": {
"tags": [
- "social"
+ "br",
+ "social",
+ "us"
],
"checkType": "status_code",
"alexaRank": 9040,
@@ -10298,7 +9551,6 @@
"usernameUnclaimed": "noonewouldeverusethis7"
},
"Niftygateway": {
- "tags": [],
"urlProbe": "https://api.niftygateway.com/user/profile-and-offchain-nifties-by-url/?profile_url={username}",
"checkType": "message",
"presenseStrs": [
@@ -10567,8 +9819,8 @@
},
"Depop": {
"tags": [
- "shopping",
- "fashion"
+ "fashion",
+ "shopping"
],
"checkType": "message",
"presenseStrs": [
@@ -10673,8 +9925,8 @@
],
"urlProbe": "https://rarible.com/marketplace/api/v4/urls/{username}",
"tags": [
- "nft",
- "crypto"
+ "crypto",
+ "nft"
]
},
"Computerbase": {
@@ -10843,8 +10095,8 @@
"usernameUnclaimed": "noonewouldeverusethis7",
"alexaRank": 10985,
"tags": [
- "forum",
- "de"
+ "de",
+ "forum"
]
},
"Badoo": {
@@ -10899,8 +10151,8 @@
},
"Diary.ru": {
"tags": [
- "ru",
- "blog"
+ "blog",
+ "ru"
],
"checkType": "message",
"absenceStrs": [
@@ -11004,18 +10256,30 @@
"reading",
"ru"
],
- "checkType": "status_code",
+ "checkType": "message",
+ "presenseStrs": [
+ "на livelib.ru"
+ ],
+ "absenceStrs": [
+ "не найден"
+ ],
"alexaRank": 11577,
"urlMain": "https://www.livelib.ru/",
"url": "https://www.livelib.ru/reader/{username}",
"usernameClaimed": "blue",
- "usernameUnclaimed": "noonewouldeverusethis7"
+ "usernameUnclaimed": "noonewouldeverusethis7",
+ "headers": {
+ "User-Agent": ""
+ },
+ "errors": {
+ "DDoS-Guard": "DDoS protection detected, use proxy/vpn"
+ }
},
"PCPartPicker": {
"disabled": true,
"tags": [
- "tech",
- "shopping"
+ "shopping",
+ "tech"
],
"checkType": "status_code",
"alexaRank": 11598,
@@ -11154,7 +10418,6 @@
},
"InfosecInstitute": {
"disabled": true,
- "tags": [],
"checkType": "status_code",
"urlMain": "https://community.infosecinstitute.com",
"url": "https://community.infosecinstitute.com/profile/{username}",
@@ -11232,8 +10495,8 @@
},
"Neoseeker": {
"tags": [
- "gaming",
- "forum"
+ "forum",
+ "gaming"
],
"checkType": "status_code",
"alexaRank": 13021,
@@ -11577,13 +10840,15 @@
"usernameUnclaimed": "noonewouldeverusethis7",
"alexaRank": 14425,
"tags": [
- "movies",
- "kr"
+ "kr",
+ "movies"
]
},
"MeetMe": {
"tags": [
- "social"
+ "in",
+ "social",
+ "us"
],
"errors": {
"fa fa-spinner fa-pulse loading-icon-lg": "Registration page"
@@ -11841,8 +11106,8 @@
},
"Avforums": {
"tags": [
- "gb",
- "forum"
+ "forum",
+ "gb"
],
"checkType": "message",
"absenceStrs": [
@@ -11970,7 +11235,6 @@
"usernameUnclaimed": "noonewouldeverusethis7"
},
"Justlanded": {
- "tags": [],
"checkType": "status_code",
"urlMain": "https://community.justlanded.com",
"url": "https://community.justlanded.com/en/profile/{username}",
@@ -12089,8 +11353,8 @@
"alexaRank": 18203,
"requestMethod": "GET",
"tags": [
- "shopping",
- "gb"
+ "gb",
+ "shopping"
]
},
"TJournal": {
@@ -12172,8 +11436,8 @@
},
"codeforces.com": {
"tags": [
- "ru",
- "coding"
+ "coding",
+ "ru"
],
"errors": {
"The page is temporarily blocked by administrator.": "IP ban"
@@ -12473,7 +11737,6 @@
]
},
"Storycorps": {
- "tags": [],
"checkType": "status_code",
"urlMain": "https://archive.storycorps.org",
"url": "https://archive.storycorps.org/user/{username}/",
@@ -12488,8 +11751,8 @@
"usernameUnclaimed": "noonewouldeverusethis7",
"alexaRank": 21854,
"tags": [
- "shopping",
- "ar"
+ "ar",
+ "shopping"
]
},
"fixya": {
@@ -12517,6 +11780,7 @@
},
"Nitter": {
"tags": [
+ "messaging",
"social"
],
"headers": {
@@ -12720,8 +11984,8 @@
},
"Lobsters": {
"tags": [
- "news",
- "coding"
+ "coding",
+ "news"
],
"regexCheck": "[A-Za-z0-9][A-Za-z0-9_-]{0,24}",
"checkType": "status_code",
@@ -13124,7 +12388,6 @@
"usernameUnclaimed": "noonewouldeverusethis7"
},
"osu!": {
- "tags": [],
"checkType": "status_code",
"urlMain": "https://osu.ppy.sh/",
"url": "https://osu.ppy.sh/users/{username}",
@@ -13621,8 +12884,8 @@
},
"Mamot": {
"tags": [
- "mastodon",
- "fr"
+ "fr",
+ "mastodon"
],
"checkType": "status_code",
"urlMain": "https://mamot.fr",
@@ -13789,7 +13052,6 @@
"alexaRank": 35942
},
"TalkDrugabuse": {
- "tags": [],
"checkType": "message",
"absenceStrs": [
"The specified member cannot be found"
@@ -13895,13 +13157,12 @@
"usernameUnclaimed": "noonewouldeverusethis7",
"checkType": "message",
"tags": [
- "tech",
- "llm"
+ "llm",
+ "tech"
],
"alexaRank": 37110
},
"Insanejournal": {
- "tags": [],
"checkType": "message",
"absenceStrs": [
"404 Not Found",
@@ -14415,8 +13676,8 @@
},
"Gapyear": {
"tags": [
- "travel",
- "gb"
+ "gb",
+ "travel"
],
"checkType": "status_code",
"alexaRank": 47746,
@@ -14591,7 +13852,9 @@
},
"Vero": {
"tags": [
- "social"
+ "in",
+ "social",
+ "us"
],
"checkType": "message",
"absenceStrs": [
@@ -14775,8 +14038,7 @@
"urlMain": "https://board.phpbuilder.com",
"engine": "Flarum",
"usernameClaimed": "alex",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "tags": []
+ "usernameUnclaimed": "noonewouldeverusethis7"
},
"ProfilesTigweb": {
"tags": [
@@ -15375,8 +14637,7 @@
"urlMain": "https://discuss.flarum.org",
"engine": "Flarum",
"usernameClaimed": "alex",
- "usernameUnclaimed": "noonewouldeverusethis7",
- "tags": []
+ "usernameUnclaimed": "noonewouldeverusethis7"
},
"Root-me": {
"tags": [
@@ -15842,7 +15103,6 @@
"usernameUnclaimed": "noonewouldeverusethis7"
},
"KanoWorld": {
- "tags": [],
"checkType": "status_code",
"urlMain": "https://world.kano.me/",
"url": "https://api.kano.me/progress/user/{username}",
@@ -16708,8 +15968,8 @@
"usernameClaimed": "blue",
"usernameUnclaimed": "noonewouldeverusethis7",
"tags": [
- "social",
- "mastodon"
+ "mastodon",
+ "social"
]
},
"HackTheBox": {
@@ -17446,30 +16706,19 @@
"usernameUnclaimed": "noonewouldeverusethis7"
},
"Livios": {
- "checkType": "message",
- "absenceStrs": [
- "not found",
- "email",
- "nav-profile",
- "og:title",
- "\r"
- ],
- "presenseStrs": [
- "