Commit Graph

112 Commits

Author SHA1 Message Date
Soxoj 42de0a526d Sites re-check (#2423) 2026-04-08 00:48:37 +02:00
github-actions[bot] 20b74383ee Updated site list and statistics (#2399)
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
Soxoj 06c3360e3d feat(core): add POST request support, new sites, migrate to Majestic Million ranking (#2317)
* feat(core): add POST request support, new sites, migrate to Majestic Million ranking
- Added native POST request support to the Maigret engine (requestMethod, requestPayload) to enable querying modern JSON registration endpoints.
- Replaced the discontinued Alexa rank API with the Majestic Million dataset for global popularity sorting and automated CI updates.
- Fixed multiple false positives among top 500 sites and bypassed standard anti-bot protections using custom User-Agents.
- Updated public documentation and internal playbooks to reflect the new features.

* feat(data): apply all data.json site check updates from main branch

- Added CTFtime and PentesterLab (new sites added in main)
- Removed forums.imore.com (deleted in main as dead site)
- Disabled 5 sites per main branch fixes: Librusec, MirTesen, amateurvoyeurforum.com, forums.stevehoffman.tv, vegalab
- Fixed 5 site checks per main branch: SoundCloud, Taplink, Setlist, RoyalCams, club.cnews.ru (switched from status_code to message checkType with proper markers)

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/a1d194d9-c0ff-4e2b-974c-c5e4b59548bf

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
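The POST support described in commit 06c3360e3d names two site fields, requestMethod and requestPayload. A minimal sketch of how such fields might be turned into a request follows. The `build_request` and `fill` helpers, the `{username}` placeholder, and the example URL are assumptions for illustration, not Maigret's actual code.

```python
import json
from typing import Any, Dict


def fill(value: Any, username: str) -> Any:
    """Recursively substitute the (assumed) {username} placeholder."""
    if isinstance(value, str):
        return value.replace("{username}", username)
    if isinstance(value, dict):
        return {key: fill(item, username) for key, item in value.items()}
    if isinstance(value, list):
        return [fill(item, username) for item in value]
    return value


def build_request(site: Dict[str, Any], username: str) -> Dict[str, Any]:
    """Hypothetical helper: describe the HTTP request for one site entry."""
    # Default to GET when the entry does not set requestMethod.
    method = site.get("requestMethod", "GET").upper()
    request = {"method": method, "url": fill(site["url"], username)}
    if method == "POST":
        # Serialize the payload as JSON, as a JSON registration endpoint expects.
        payload = fill(site.get("requestPayload", {}), username)
        request["body"] = json.dumps(payload)
    return request


# Hypothetical site entry; only the two field names come from the commit message.
site = {
    "url": "https://example.com/api/check",
    "requestMethod": "POST",
    "requestPayload": {"username": "{username}"},
}
request = build_request(site, "alice")
```

Entries without requestMethod fall through to a plain GET in this sketch, so existing site definitions would not need to change.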
github-actions[bot] ff13d5fafb Automated Sites List Update (#2341)
* Updated site list and statistics

* Rebase and regenerate sites.md against latest main (#2351)

* Updated site list and statistics

* Initial plan

* Disable MirTesen site check (false positive) (#2350)

* Initial plan

* Disable MirTesen site check to fix false-positive probe

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/61c86064-423d-4f1b-8277-2838f747dd89

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>

* build(deps): bump attrs from 25.4.0 to 26.1.0 (#2344)

Bumps [attrs](https://github.com/sponsors/hynek) from 25.4.0 to 26.1.0.
- [Commits](https://github.com/sponsors/hynek/commits)

---
updated-dependencies:
- dependency-name: attrs
  dependency-version: 26.1.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Updated site list and statistics

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: soxoj <soxoj@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: soxoj <soxoj@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
github-actions[bot] ebbff47829 Automated Sites List Update (#2339)
* Updated site list and statistics

* Rebase: merge origin/main into auto/update-sites-list (#2340)

* Updated site list and statistics (#2315)

Co-authored-by: soxoj <soxoj@users.noreply.github.com>

* Initial plan

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: soxoj <soxoj@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>

---------

Co-authored-by: soxoj <soxoj@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
github-actions[bot] 9d6319aebd Updated site list and statistics (#2315)
Co-authored-by: soxoj <soxoj@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
Copilot 05015a9cce [WIP] Fix invalid link on forums.imore.com (#2337)
* Initial plan

* Remove dead forums.imore.com site from database

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/c83530d0-d24f-45fc-aca3-ae1e46ece33c

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
Copilot 278a5082ce Remove dead site xxxforum.org (#2310)
* Initial plan

* Remove broken site xxxforum.org from data.json and sites.md

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/bfbd3aa8-bfb1-480a-b2e7-a2c40fc69def

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
github-actions[bot] d1ecd8a965 Updated site list and statistics (#2314)
Co-authored-by: soxoj <soxoj@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
Copilot 6c5f67f30b Re-enable taplink.cc with browser User-Agent to bypass Cloudflare (#2308)
* Initial plan

* fix(taplink): re-enable taplink.cc with browser User-Agent header to bypass Cloudflare

Remove disabled flag and add a Chrome User-Agent header to help
bypass Cloudflare bot detection for taplink.cc profile checks.
If Cloudflare still blocks requests, maigret's built-in error
detection will gracefully mark results as UNKNOWN.

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/271904b6-e358-4aeb-b503-21c9b91186d9

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
Soxoj 5fa86187f5 feat(sites): fix false positives: disable 74 broken sites, fix 8 with API probes and better markers (#2302)
- Disable 74 sites: Cloudflare/captcha blocks, identical responses, dead domains, vBulletin/phpBB engine failures
- Fix Roblox, Salon24.pl, Planetaexcel → status_code (clear 404 signal)
- Fix en.brickimedia.org → message with "noarticletext" absenceStr
- Fix Arduino → narrower title-based presenseStrs/absenceStrs
- Re-enable Fandom (3 wikis) via MediaWiki api.php urlProbe
- Re-enable Substack via /api/v1/user/{}/public_profile urlProbe
- Re-enable hashnode via GraphQL GET urlProbe (URL-encoded query)
- Document lessons: engine template drift, search-by-author fragility, always-200 sites, TLS degradation, API bypassing Cloudflare, GraphQL GET support, URL-encoding for template safety
2026-04-08 00:48:36 +02:00
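Commit 5fa86187f5 switches sites between the status_code and message check types and tunes presenseStrs/absenceStrs markers. The difference between the two check types can be sketched as below. The `classify` function, the result labels, and the exact decision rules are assumptions for illustration; only the field names come from the commit messages.

```python
from typing import Any, Dict


def classify(site: Dict[str, Any], status: int, body: str) -> str:
    """Hypothetical classifier for one HTTP response."""
    check_type = site.get("checkType")
    if check_type == "status_code":
        # Works only when the site returns a clear 404 for missing profiles.
        if status == 404:
            return "AVAILABLE"
        if 200 <= status < 300:
            return "CLAIMED"
        return "UNKNOWN"
    if check_type == "message":
        # Needed for sites that answer 200 for every username.
        if any(marker in body for marker in site.get("absenceStrs", [])):
            return "AVAILABLE"
        if any(marker in body for marker in site.get("presenseStrs", [])):
            return "CLAIMED"
        return "UNKNOWN"
    return "UNKNOWN"


wiki = {"checkType": "message", "absenceStrs": ["noarticletext"]}
missing_page = classify(wiki, 200, '<div class="noarticletext">No page</div>')
```

In this sketch a response that matches no marker is reported as UNKNOWN rather than guessed, which is one way to avoid the false positives the commit targets.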
Soxoj c9ab9d676b Improve site-check quality: fix broken site configs, add diagnostic utilities, and make self-check report-only by default with opt-in auto-disable. (#2301)
- Fix VK and TradingView checkType; add Reddit and Microsoft Learn API-style probes where appropriate; adjust or disable entries that are unreliable under anti-bot protection.
- Self-check: stop aggressive auto-disable; default to reporting issues only; add --auto-disable and --diagnose for optional fixes and deeper output.
- Tooling: add utils/site_check.py and utils/check_top_n.py (and related helpers) to inspect and rank site behavior against the top-N list
- Scope: aligns with fixing top-traffic / high-impact sites and making diagnostics repeatable without silently flipping disabled flags
2026-04-08 00:48:36 +02:00
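The report-only default described in commit c9ab9d676b, with --auto-disable as an opt-in, can be sketched with argparse as follows. The flag names come from the commit message; the parser, the `apply_results` helper, and the `disabled` key layout are assumptions for illustration and do not reproduce the real self-check code.

```python
import argparse
from typing import Dict, List


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing the two opt-in flags."""
    parser = argparse.ArgumentParser(prog="self-check-sketch")
    parser.add_argument("--auto-disable", action="store_true",
                        help="disable sites that fail the self-check")
    parser.add_argument("--diagnose", action="store_true",
                        help="print more detail for each failing site")
    return parser


def apply_results(sites: Dict[str, dict], failures: List[str],
                  auto_disable: bool) -> List[str]:
    """Report failures; flip the disabled flag only when asked to."""
    report = []
    for name in failures:
        if auto_disable:
            sites[name]["disabled"] = True
            report.append(f"{name}: failed, disabled")
        else:
            report.append(f"{name}: failed, left unchanged")
    return report


# With no flags given, the run is report-only.
args = build_parser().parse_args([])
sites = {"ExampleSite": {"disabled": False}}
report = apply_results(sites, ["ExampleSite"], args.auto_disable)
```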
Soxoj 59535c59e5 Fixed false positives in top-500 (#2292) 2026-04-08 00:48:36 +02:00
Soxoj fb26ccd1f6 Disabled some sites giving false positive results (#2170) 2025-08-22 03:10:47 +02:00
Soxoj bebadb0362 Bump to 0.5.0 (#2108) 2025-08-10 13:10:50 +02:00
Pierre-Yves Lapersonne f76ea5d738 [#2010] Add 6 more websites to manage (#2009)
* feat: add `framapiaf.org` in supported web sites, add tag `mastodon` (#2010)

Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info>

* feat: add `write.as` in supported web sites, add tag `writefreely` (#2010)

Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info>

* feat: add `programming.dev` in supported web sites, add tag `lemmy` (#2010)

Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info>

* feat: add `mamot.fr` in supported web sites (#2010)

Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info>

* feat: add `pixelfed.social` in supported web sites, add tag `pixelfed` (#2010)

Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info>

* feat: add `Outgress` in supported web sites (#2010)

Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info>

* Updated the list of supported sites

---------

Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info>
Co-authored-by: Soxoj <soxoj@protonmail.com>
2025-06-28 23:33:29 +02:00
Soxoj c4af0a4df0 Fixed flaky tests to check cookies (#1965) 2024-12-13 12:37:58 +01:00
Soxoj f212bc9bc8 Site check fixes 2024-12-12 21:39:35 +01:00
Soxoj 64ae391a4a Updated Vimeo, CNET, DailyMotion 2024-12-11 01:17:20 +01:00
Soxoj 127d9032c3 Fixed Vimeo, activation/probing mechanisms improvements 2024-12-11 00:56:00 +01:00
Soxoj 81a817a39f Improved "submit new site" mode, added tests, fixed top-500 sites (#1952) 2024-12-10 18:02:43 +01:00
Soxoj 5517636850 Updated OP.GG checks (#1950)
* Updated OP.GG checks
* Finalized LoL, added Valorant, disabled Archive.org
2024-12-09 15:59:19 +01:00
Soxoj c66d776f8a Refactoring, test coverage increased to 60% (#1943) 2024-12-08 02:13:28 +01:00
Soxoj 2aa1ea39a0 Site fixes (#1940) 2024-12-06 14:27:38 +01:00
Soxoj f04de78682 Activation mechanism documentation added (#1935)
A few site checks fixed
2024-12-06 01:35:19 +01:00
Soxoj 2f93963a0a Refactored sites module, updated documentation (#1918) 2024-12-01 11:41:41 +01:00
Soxoj d15e12750b Sites fixes (#1917)
* Some sites fixes

* Sites stats updated
2024-12-01 03:19:36 +01:00
Soxoj e96d09dee7 Permutator output and documentation updates (#1914) 2024-11-29 13:15:03 +01:00
Soxoj 324c118530 Parallel execution optimization (#1897)
* Connection failure fix: removed futures, added semaphores

* Additional fixes

* Replaced tqdm with alive_progress, poetry update

* Self-check mode fix, tests fixes

* Sites checks fixes (#1896)

* Fixed incorrect site names, added method to compare sites
2024-11-26 13:55:12 +01:00
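Commit 324c118530 mentions replacing futures with semaphores to fix connection failures. A minimal sketch of semaphore-bounded concurrency follows, assuming asyncio is the concurrency layer. The `run_limited` helper and the `fake_check` job are hypothetical stand-ins, not the project's code.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_limited(jobs: List[Callable[[], Awaitable[str]]],
                      limit: int) -> List[str]:
    """Run all jobs, allowing at most `limit` of them at the same time."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job: Callable[[], Awaitable[str]]) -> str:
        # Each job waits here until one of the `limit` slots is free.
        async with semaphore:
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))


active = 0  # jobs currently inside the guarded section
peak = 0    # highest value `active` ever reached


async def fake_check() -> str:
    """Stand-in for a site check; records how many run concurrently."""
    global active, peak
    active += 1
    peak = max(peak, active)
    await asyncio.sleep(0.01)
    active -= 1
    return "done"


results = asyncio.run(run_limited([fake_check] * 10, limit=3))
```

Capping the number of in-flight checks this way keeps the number of simultaneously open connections bounded, which is a plausible reason for the change described in the commit.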
Soxoj b370bc4c44 Sites checks fixes (#1896)
Fixed incorrect site names, added method to compare sites
2024-11-26 13:29:43 +01:00
Soxoj 13c20afe5b Improved self-check mode (#1887) 2024-11-25 18:27:59 +01:00
Soxoj d8a05807ba New sites added (#1888) 2024-11-25 18:24:20 +01:00
Soxoj 86d51bced0 Added 7 sites, implemented integration with Marple, docs update (#1881)
* Added 5 sites, implemented integration with Marple

* Added 2 more sites, updated docs

* Updated sites list
2024-11-25 14:41:34 +01:00
Soxoj 54b864f167 Disabled unavailable sites (#1880) 2024-11-24 17:19:31 +01:00
Soxoj 24e545b62c Added dev documentation, fixed some sites, removed GitHub issue links from reports (#1869) 2024-11-23 18:45:56 +01:00
Richard Mwewa 034153791b Fixed 3 sites, disabled 3, added (#1539)
* Fixed/Disabled sites. Update requirements.txt

fixed_sites: AllRecipes, Linktree, CreativeMarket, ImgInn, Shutterstock, Contently

disabled_sites: Forums.ea.com, CrunchyRoll, Windy, MetaCritic, InfosecInstitute, Armchairgm.fandom.com, Bleach.fandom.com

Update requirements to prevent dependency conflicts.

* Update requirements.txt

Update requirements.txt to prevent dependency conflicts

* Update requirements.txt

* Update sites.md

* fixed_sites: Armchairgm.fandom.com, Bleach.fandom.com, Battleraprus. disabled_sites: MicrosoftTechNet, club.cnews.ru, Scorcher

* fixed 2 sites, disabled 22 sites, and added 1 site

* fixed 3 sites, disabled 28, added 4 sites

* update sites.md

* Added 2 more sites

* fixed 3 sites, disabled 3 sites, added 1 site

* fix Twitch. Update snapcraft.yaml. Add pyproject.toml. Remove setup.py, requirements.txt, test-requirements.txt, as they are already specified in pyproject.toml

* Update sites.md

* Update sites.md

* fix forums.drom.ru

* Add EduGeek

* Add EduGeek

* Update python-package.yml

Fix dependency installation

* Update python-package.yml

* Update python-package.yml
2024-05-24 14:51:27 +02:00
Richard Mwewa 9399737ee6 Fixed 4 sites, added 6 sites, disabled 27 sites (#1536)
* Fixed/Disabled sites. Update requirements.txt

fixed_sites: AllRecipes, Linktree, CreativeMarket, ImgInn, Shutterstock, Contently

disabled_sites: Forums.ea.com, CrunchyRoll, Windy, MetaCritic, InfosecInstitute, Armchairgm.fandom.com, Bleach.fandom.com

Update requirements to prevent dependency conflicts.

* Update requirements.txt

Update requirements.txt to prevent dependency conflicts

* Update requirements.txt

* Update sites.md

* fixed_sites: Armchairgm.fandom.com, Bleach.fandom.com, Battleraprus. disabled_sites: MicrosoftTechNet, club.cnews.ru, Scorcher

* fixed 2 sites, disabled 22 sites, and added 1 site

* fixed 3 sites, disabled 28, added 4 sites

* update sites.md

* Added 2 more sites
2024-05-18 01:50:05 +02:00
Richard Mwewa f7f77e587c Fixed/Disabled sites. Update requirements.txt (#1517)
* Fixed/Disabled sites. Update requirements.txt

fixed_sites: AllRecipes, Linktree, CreativeMarket, ImgInn, Shutterstock, Contently

disabled_sites: Forums.ea.com, CrunchyRoll, Windy, MetaCritic, InfosecInstitute, Armchairgm.fandom.com, Bleach.fandom.com

Update requirements to prevent dependency conflicts.

* Update requirements.txt

Update requirements.txt to prevent dependency conflicts

* Update requirements.txt

* Update sites.md

* fixed_sites: Armchairgm.fandom.com, Bleach.fandom.com, Battleraprus. disabled_sites: MicrosoftTechNet, club.cnews.ru, Scorcher
2024-05-14 15:11:17 +02:00
Soxoj 2ccef4a9f9 Updated site statistics (#1273) 2023-10-27 21:48:37 +02:00
Soxoj 656b9c19ea Improved search through UnstoppableDomains (#1040) 2023-07-07 21:24:20 +02:00
Soxoj 932e07a8ee Added 26 ENS and similar domains with tag crypto (#942) 2023-05-13 18:23:17 +08:00
Soxoj fc1f5bfc82 Fixed false positives on Mastodon sites (#901) 2023-04-17 10:51:32 +02:00
fen0s 6020e766ce fix opensea and shutterstock, disable a few dead sites (#798)
* fix shutterstock and disable allsoft

* disable dead forums and fix opensea

* Update sites.md
2022-12-18 12:22:24 +03:00
fen0s aebd8539ed disable broken sites (#756)
* Update data.json

* Update sites.md
2022-11-22 23:13:52 +03:00
fen0s fea1c6b552 disable not working sites (#739)
* Update data.json
* Update sites.md

Co-authored-by: Soxoj <31013580+soxoj@users.noreply.github.com>
2022-11-08 10:47:21 +04:00
Soxoj 026fd98304 Fixed YouTube (#717) 2022-10-17 01:17:09 +03:00
fen0s d4d525647c fix sites from issues (#680)
* Update data.json
* Update sites.md
2022-10-03 23:00:48 +03:00
fen0s d9fd6e0b29 fix false positives from bot (#663)
* fix false positives from bot

* Update data.json

* Update sites.md
2022-09-29 20:56:15 +03:00
Soxoj eb304b6804 Invalid results fixes (#634) 2022-09-11 14:26:19 +03:00
Soxoj c5e973bc5b Streaming sites (#628)
* Added new sites, new error solution caption
2022-09-11 01:49:46 +03:00