Commit Graph

6 Commits

Author SHA1 Message Date
Soxoj 42de0a526d Sites re-check (#2423) 2026-04-08 00:48:37 +02:00
Soxoj 55fc8694ed Fix false-positive site checks reported by Maigret Bot (#2376) 2026-04-08 00:48:36 +02:00
Soxoj 06c3360e3d feat(core): add POST request support, new sites, migrate to Majestic Million ranking (#2317)
* feat(core): add POST request support, new sites, migrate to Majestic Million ranking
- Added native POST request support to the Maigret engine (requestMethod, requestPayload) to enable querying modern JSON registration endpoints.
- Replaced the discontinued Alexa rank API with the Majestic Million dataset for global popularity sorting and automated CI updates.
- Fixed multiple false positives among top 500 sites and bypassed standard anti-bot protections using custom User-Agents.
- Updated public documentation and internal playbooks to reflect the new features.

* feat(data): apply all data.json site check updates from main branch

- Added CTFtime and PentesterLab (new sites added in main)
- Removed forums.imore.com (deleted in main as dead site)
- Disabled 5 sites per main branch fixes: Librusec, MirTesen, amateurvoyeurforum.com, forums.stevehoffman.tv, vegalab
- Fixed 5 site checks per main branch: SoundCloud, Taplink, Setlist, RoyalCams, club.cnews.ru (switched from status_code to message checkType with proper markers)

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/a1d194d9-c0ff-4e2b-974c-c5e4b59548bf

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
Soxoj 5fa86187f5 feat(sites): fix false positives: disable 74 broken sites, fix 8 with API probes and better markers (#2302)
- Disable 74 sites: Cloudflare/captcha blocks, identical responses,
    dead domains, vBulletin/phpBB engine failures
  - Fix Roblox, Salon24.pl, Planetaexcel → status_code (clear 404 signal)
  - Fix en.brickimedia.org → message with "noarticletext" absenceStr
  - Fix Arduino → narrower title-based presenseStrs/absenceStrs
  - Re-enable Fandom (3 wikis) via MediaWiki api.php urlProbe
  - Re-enable Substack via /api/v1/user/{}/public_profile urlProbe
  - Re-enable hashnode via GraphQL GET urlProbe (URL-encoded query)
  - Document lessons: engine template drift, search-by-author fragility,
    always-200 sites, TLS degradation, API bypassing Cloudflare,
    GraphQL GET support, URL-encoding for template safety
2026-04-08 00:48:36 +02:00
Soxoj c9ab9d676b Improve site-check quality: fix broken site configs, add diagnostic utilities, and make self-check report-only by default with opt-in auto-disable. (#2301)
- Fix VK and TradingView checkType; add Reddit and Microsoft Learn API-style probes where appropriate; adjust or disable entries that are unreliable under anti-bot protection.
- Self-check: stop aggressive auto-disable; default to reporting issues only; add --auto-disable and --diagnose for optional fixes and deeper output.
- Tooling: add utils/site_check.py and utils/check_top_n.py (and related helpers) to inspect and rank site behavior against the top-N list
- Scope: aligns with fixing top-traffic / high-impact sites and making diagnostics repeatable without silently flipping disabled flags
2026-04-08 00:48:36 +02:00
Soxoj 59535c59e5 Fixed false positives in top-500 (#2292) 2026-04-08 00:48:36 +02:00