Commit Graph

28 Commits

Author SHA1 Message Date
Soxoj 77c11df119 Fix Google Cloud Shell launch (#2557) 2026-04-23 21:45:27 +02:00
Soxoj abce3c9be4 Fix false positives (#2459)
* Fix false positives: APClips, Taplink, gentoo, Discord.bio, ChaturBate; disable 7Cups, playtime, openriskmanual, reactos; update tags

* Fix db_meta.json regeneration in update_site_data.py (inline instead of module import)
2026-04-04 18:22:21 +02:00
Soxoj 269d50eedc DB update mechanism (#2458)
* Database update mechanism
2026-04-04 18:00:50 +02:00
Soxoj fa1a4d1b4a Sites re-check (#2423) 2026-03-27 22:41:55 +01:00
Soxoj b145e7b26f feat(core): add POST request support, new sites, migrate to Majestic Million ranking (#2317)
* feat(core): add POST request support, new sites, migrate to Majestic Million ranking
- Added native POST request support to the Maigret engine (requestMethod, requestPayload) to enable querying modern JSON registration endpoints.
- Replaced the discontinued Alexa rank API with the Majestic Million dataset for global popularity sorting and automated CI updates.
- Fixed multiple false positives among top 500 sites and bypassed standard anti-bot protections using custom User-Agents.
- Updated public documentation and internal playbooks to reflect the new features.

* feat(data): apply all data.json site check updates from main branch

- Added CTFtime and PentesterLab (new sites added in main)
- Removed forums.imore.com (deleted in main as dead site)
- Disabled 5 sites per main branch fixes: Librusec, MirTesen, amateurvoyeurforum.com, forums.stevehoffman.tv, vegalab
- Fixed 5 site checks per main branch: SoundCloud, Taplink, Setlist, RoyalCams, club.cnews.ru (switched from status_code to message checkType with proper markers)

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/a1d194d9-c0ff-4e2b-974c-c5e4b59548bf

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-03-24 22:08:42 +01:00
Soxoj 97cc4b46d9 Improve site-check quality: fix broken site configs, add diagnostic utilities, and make self-check report-only by default with opt-in auto-disable. (#2301)
- Fix VK and TradingView checkType; add Reddit and Microsoft Learn API-style probes where appropriate; adjust or disable entries that are unreliable under anti-bot protection.
- Self-check: stop aggressive auto-disable; default to reporting issues only; add --auto-disable and --diagnose for optional fixes and deeper output.
- Tooling: add utils/site_check.py and utils/check_top_n.py (and related helpers) to inspect and rank site behavior against the top-N list.
- Scope: aligns with fixing top-traffic / high-impact sites and making diagnostics repeatable without silently flipping disabled flags.
2026-03-22 16:48:35 +01:00
Soxoj 127d9032c3 Fixed Vimeo, activation/probing mechanisms improvements 2024-12-11 00:56:00 +01:00
Soxoj c66d776f8a Refactoring, test coverage increased to 60% (#1943) 2024-12-08 02:13:28 +01:00
Soxoj 2f93963a0a Refactored sites module, updated documentation (#1918) 2024-12-01 11:41:41 +01:00
Soxoj 324c118530 Parallel execution optimization (#1897)
* Connection failure fix: removed futures, added semaphores

* Additional fixes

* Replaced tqdm with alive_progress, poetry update

* Self-check mode fix, tests fixes

* Sites checks fixes (#1896)

* Fixed incorrect site names, added method to compare sites
2024-11-26 13:55:12 +01:00
Richard Mwewa f7f77e587c Fixed/Disabled sites. Update requirements.txt (#1517)
* Fixed/Disabled sites. Update requirements.txt

fixed_sites: AllRecipes, Linktree, CreativeMarket, ImgInn, Shutterstock, Contently

disabled_sites: Forums.ea.com, CrunchyRoll, Windy, MetaCritic, InfosecInstitute, Armchairgm.fandom.com, Bleach.fandom.com

Update requirements to prevent dependency conflicts.

* Update requirements.txt

Update requirements.txt to prevent dependency conflicts

* Update requirements.txt

* Update sites.md

* fixed_sites: Armchairgm.fandom.com, Bleach.fandom.com, Battleraprus. disabled_sites: MicrosoftTechNet, club.cnews.ru, Scorcher
2024-05-14 15:11:17 +02:00
Soxoj cbe1f09536 Added new forums, updated ranks, some utils improvements (#481)
* Added new forums, updated ranks, some utils improvements

* Updated requirements
2022-05-14 13:29:48 +03:00
Soxoj 1e2d5cf742 Fixed issue with str alexaRank 2022-03-06 16:19:25 +03:00
Soxoj 8a53a38543 Fixed the rest of false positives for now (#371)
* Fixed the rest of false positives for now

* Fixed tag

* Updated site list and statistics
2022-02-26 16:43:40 +03:00
Soxoj 1683e5b744 Added DB statistics autoupdate and write to sites.md (#357) 2022-02-23 18:01:42 +03:00
Soxoj 79f872c77c Added some scripts (#355) 2022-02-23 14:33:37 +03:00
Soxoj ecabf88c3a Added a couple of sites, fixed false positives (#286) 2022-01-03 01:35:53 +03:00
Soxoj 5912ad4fbc Added fuzzy search by StackOverflow 2021-05-10 00:39:36 +03:00
Soxoj 9eb62e4e22 Tags sorting and some updates 2021-05-09 23:19:41 +03:00
Soxoj 43f189f774 Tags updates, script added 2021-05-09 16:25:42 +03:00
Soxoj f77d7d307a Tags markup stabilization 2021-05-08 00:59:54 +03:00
Soxoj b6a207d0e3 Updated sites, improved submit dialog, bump to 0.2.2 2021-05-07 12:27:24 +03:00
Soxoj 8c700b9810 Added new sites through auto submit, some fixes 2021-03-18 23:21:33 +03:00
Soxoj c16fc7c002 Added several sites, updated sites list 2021-02-13 23:24:53 +03:00
Soxoj 10426c07aa Favicons added to sites list 2021-02-07 00:43:42 +03:00
Soxoj 50f1f6d915 Sites list updated 2021-01-21 22:18:05 +03:00
Soxoj 47cf6f4d81 Self-checking mode fixed, tags/names site filtering & ranking 2021-01-11 21:42:24 +03:00
Soxoj ac0be37480 first commit 2020-01-08 09:51:07 +03:00