Soxoj
f9f9ec8ada
Fix false positives ( #2459 )
* Fix false positives: APClips, Taplink, gentoo, Discord.bio, ChaturBate; disable 7Cups, playtime, openriskmanual, reactos; update tags
* Fix db_meta.json regeneration in update_site_data.py (inline instead of module import)
2026-04-08 00:48:37 +02:00
Soxoj
66b741793e
Added Crypto/Web3 site checks ( #2457 )
2026-04-08 00:48:37 +02:00
Soxoj
99847ad3e7
Add site protection tracking system, fix broken site checks (Instagra… ( #2452 )
* Add site protection tracking system, fix broken site checks (Instagram, StackOverflow, LeetCode, Boosty, LiveLib), preserve unicode in data.json
* Update poetry.lock by running poetry lock
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/14333f41-67d5-4e28-a782-9730b31fc667
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com >
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
2026-04-08 00:48:37 +02:00
Soxoj
f41f9439fc
Overhaul site tags and naming: add social tag to 33 networks, fill mi… ( #2430 )
* Overhaul site tags and naming: add social tag to 33 networks, fill missing tags for 213 top-1000 sites, clean up false us/in country tags (~374 sites), normalize site names to Title Case, add tag validation tests, document tagging and naming rules
Remove LLM folder: ask @soxoj for the up-to-date version!
* Remove LLM/ from version control
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com >
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com >
2026-04-08 00:48:37 +02:00
Soxoj
d27acbed86
Add urlProbes ( #2425 )
2026-04-08 00:48:37 +02:00
Soxoj
42de0a526d
Sites re-check ( #2423 )
2026-04-08 00:48:37 +02:00
github-actions[bot]
20b74383ee
Updated site list and statistics ( #2399 )
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com >
2026-04-08 00:48:36 +02:00
Soxoj
06c3360e3d
feat(core): add POST request support, new sites, migrate to Majestic Million ranking ( #2317 )
* feat(core): add POST request support, new sites, migrate to Majestic Million ranking
- Added native POST request support to the Maigret engine (requestMethod, requestPayload) to enable querying modern JSON registration endpoints.
- Replaced the discontinued Alexa rank API with the Majestic Million dataset for global popularity sorting and automated CI updates.
- Fixed multiple false positives among top 500 sites and bypassed standard anti-bot protections using custom User-Agents.
- Updated public documentation and internal playbooks to reflect the new features.
* feat(data): apply all data.json site check updates from main branch
- Added CTFtime and PentesterLab (new sites added in main)
- Removed forums.imore.com (deleted in main as dead site)
- Disabled 5 sites per main branch fixes: Librusec, MirTesen, amateurvoyeurforum.com, forums.stevehoffman.tv, vegalab
- Fixed 5 site checks per main branch: SoundCloud, Taplink, Setlist, RoyalCams, club.cnews.ru (switched from status_code to message checkType with proper markers)
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com >
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/a1d194d9-c0ff-4e2b-974c-c5e4b59548bf
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
2026-04-08 00:48:36 +02:00
github-actions[bot]
ff13d5fafb
Automated Sites List Update ( #2341 )
* Updated site list and statistics
* Rebase and regenerate sites.md against latest main (#2351 )
* Updated site list and statistics
* Initial plan
* Disable MirTesen site check (false positive) (#2350 )
* Initial plan
* Disable MirTesen site check to fix false-positive probe
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com >
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/61c86064-423d-4f1b-8277-2838f747dd89
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com >
* build(deps): bump attrs from 25.4.0 to 26.1.0 (#2344 )
Bumps [attrs](https://github.com/sponsors/hynek ) from 25.4.0 to 26.1.0.
- [Commits](https://github.com/sponsors/hynek/commits )
---
updated-dependencies:
- dependency-name: attrs
dependency-version: 26.1.0
dependency-type: direct:production
update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Updated site list and statistics
---------
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: soxoj <soxoj@users.noreply.github.com >
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
---------
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: soxoj <soxoj@users.noreply.github.com >
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com >
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
github-actions[bot]
ebbff47829
Automated Sites List Update ( #2339 )
* Updated site list and statistics
* Rebase: merge origin/main into auto/update-sites-list (#2340 )
* Updated site list and statistics (#2315 )
Co-authored-by: soxoj <soxoj@users.noreply.github.com >
* Initial plan
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: soxoj <soxoj@users.noreply.github.com >
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
---------
Co-authored-by: soxoj <soxoj@users.noreply.github.com >
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com >
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
github-actions[bot]
9d6319aebd
Updated site list and statistics ( #2315 )
Co-authored-by: soxoj <soxoj@users.noreply.github.com >
2026-04-08 00:48:36 +02:00
Copilot
05015a9cce
[WIP] Fix invalid link on forums.imore.com ( #2337 )
* Initial plan
* Remove dead forums.imore.com site from database
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com >
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/c83530d0-d24f-45fc-aca3-ae1e46ece33c
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com >
2026-04-08 00:48:36 +02:00
Copilot
278a5082ce
Remove dead site xxxforum.org ( #2310 )
* Initial plan
* Remove broken site xxxforum.org from data.json and sites.md
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com >
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/bfbd3aa8-bfb1-480a-b2e7-a2c40fc69def
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com >
2026-04-08 00:48:36 +02:00
github-actions[bot]
d1ecd8a965
Updated site list and statistics ( #2314 )
Co-authored-by: soxoj <soxoj@users.noreply.github.com >
2026-04-08 00:48:36 +02:00
Copilot
6c5f67f30b
Re-enable taplink.cc with browser User-Agent to bypass Cloudflare ( #2308 )
* Initial plan
* fix(taplink): re-enable taplink.cc with browser User-Agent header to bypass Cloudflare
Remove disabled flag and add a Chrome User-Agent header to help
bypass Cloudflare bot detection for taplink.cc profile checks.
If Cloudflare still blocks requests, maigret's built-in error
detection will gracefully mark results as UNKNOWN.
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com >
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/271904b6-e358-4aeb-b503-21c9b91186d9
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com >
2026-04-08 00:48:36 +02:00
Soxoj
5fa86187f5
feat(sites): fix false positives: disable 74 broken sites, fix 8 with API probes and better markers ( #2302 )
- Disable 74 sites: Cloudflare/captcha blocks, identical responses,
dead domains, vBulletin/phpBB engine failures
- Fix Roblox, Salon24.pl, Planetaexcel → status_code (clear 404 signal)
- Fix en.brickimedia.org → message with "noarticletext" absenceStr
- Fix Arduino → narrower title-based presenseStrs/absenceStrs
- Re-enable Fandom (3 wikis) via MediaWiki api.php urlProbe
- Re-enable Substack via /api/v1/user/{}/public_profile urlProbe
- Re-enable hashnode via GraphQL GET urlProbe (URL-encoded query)
- Document lessons: engine template drift, search-by-author fragility,
always-200 sites, TLS degradation, API bypassing Cloudflare,
GraphQL GET support, URL-encoding for template safety
2026-04-08 00:48:36 +02:00
Soxoj
c9ab9d676b
Improve site-check quality: fix broken site configs, add diagnostic utilities, and make self-check report-only by default with opt-in auto-disable. ( #2301 )
- Fix VK and TradingView checkType; add Reddit and Microsoft Learn API-style probes where appropriate; adjust or disable entries that are unreliable under anti-bot protection.
- Self-check: stop aggressive auto-disable; default to reporting issues only; add --auto-disable and --diagnose for optional fixes and deeper output.
- Tooling: add utils/site_check.py and utils/check_top_n.py (and related helpers) to inspect and rank site behavior against the top-N list
- Scope: aligns with fixing top-traffic / high-impact sites and making diagnostics repeatable without silently flipping disabled flags
2026-04-08 00:48:36 +02:00
Soxoj
59535c59e5
Fixed false positives in top-500 ( #2292 )
2026-04-08 00:48:36 +02:00
Soxoj
fb26ccd1f6
Disabled some sites giving false positive results ( #2170 )
2025-08-22 03:10:47 +02:00
Soxoj
bebadb0362
Bump to 0.5.0 ( #2108 )
2025-08-10 13:10:50 +02:00
Pierre-Yves Lapersonne
f76ea5d738
[ #2010 ] Add 6 more websites to manage ( #2009 )
* feat: add `framapiaf.org` in supported web sites, add tag `mastodon` (#2010 )
Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info >
* feat: add `write.as` in supported web sites, add tag `writefreely` (#2010 )
Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info >
* feat: add `programming.dev` in supported web sites, add tag `lemmy` (#2010 )
Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info >
* feat: add `mamot.fr` in supported web sites (#2010 )
Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info >
* feat: add `pixelfed.social` in supported web sites, add tag `pixelfed` (#2010 )
Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info >
* feat: add `Outgress` in supported web sites (#2010 )
Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info >
* Updated the list of supported sites
---------
Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info >
Co-authored-by: Soxoj <soxoj@protonmail.com >
2025-06-28 23:33:29 +02:00
Soxoj
c4af0a4df0
Fixed flaky tests to check cookies ( #1965 )
2024-12-13 12:37:58 +01:00
Soxoj
f212bc9bc8
Site check fixes
2024-12-12 21:39:35 +01:00
Soxoj
64ae391a4a
Updated Vimeo, CNET, DailyMotion
2024-12-11 01:17:20 +01:00
Soxoj
127d9032c3
Fixed Vimeo, activation/probing mechanisms improvements
2024-12-11 00:56:00 +01:00
Soxoj
81a817a39f
Improved "submit new site" mode, added tests, fixed top-500 sites ( #1952 )
2024-12-10 18:02:43 +01:00
Soxoj
5517636850
Updated OP.GG checks ( #1950 )
* Updated OP.GG checks
* Finalized LoL, added Valorant, disabled Archive.org
2024-12-09 15:59:19 +01:00
Soxoj
c66d776f8a
Refactoring, test coverage increased to 60% ( #1943 )
2024-12-08 02:13:28 +01:00
Soxoj
2aa1ea39a0
Site fixes ( #1940 )
2024-12-06 14:27:38 +01:00
Soxoj
f04de78682
Activation mechanism documentation added ( #1935 )
Fixed a few site checks
2024-12-06 01:35:19 +01:00
Soxoj
2f93963a0a
Refactored sites module, updated documentation ( #1918 )
2024-12-01 11:41:41 +01:00
Soxoj
d15e12750b
Sites fixes ( #1917 )
* Some sites fixes
* Sites stats updated
2024-12-01 03:19:36 +01:00
Soxoj
e96d09dee7
Permutator output and documentation updates ( #1914 )
2024-11-29 13:15:03 +01:00
Soxoj
324c118530
Parallel execution optimization ( #1897 )
* Connection failure fix: removed futures, added semaphores
* Additional fixes
* Replaced tqdm with alive_progress, poetry update
* Self-check mode fix, tests fixes
* Sites checks fixes (#1896 )
* Fixed incorrect site names, added method to compare sites
2024-11-26 13:55:12 +01:00
Soxoj
b370bc4c44
Sites checks fixes ( #1896 )
Fixed incorrect site names, added method to compare sites
2024-11-26 13:29:43 +01:00
Soxoj
13c20afe5b
Improved self-check mode ( #1887 )
2024-11-25 18:27:59 +01:00
Soxoj
d8a05807ba
New sites added ( #1888 )
2024-11-25 18:24:20 +01:00
Soxoj
86d51bced0
Added 7 sites, implemented integration with Marple, docs update ( #1881 )
* Added 5 sites, implemented integration with Marple
* Added 2 more sites, updated docs
* Updated sites list
2024-11-25 14:41:34 +01:00
Soxoj
54b864f167
Disabled unavailable sites ( #1880 )
2024-11-24 17:19:31 +01:00
Soxoj
24e545b62c
Added dev documentation, fixed some sites, removed GitHub issue links from reports ( #1869 )
2024-11-23 18:45:56 +01:00
Richard Mwewa
034153791b
Fixed 3 sites, disabled 3, added ( #1539 )
* Fixed/Disabled sites. Update requirements.txt
fixed_sites: AllRecipes, Linktree, CreativeMarket, ImgInn, Shutterstock, Contently
disabled_sites: Forums.ea.com, CrunchyRoll, Windy, MetaCritic, InfosecInstitute, Armchairgm.fandom.com, Bleach.fandom.com
Update requirements to prevent dependency conflicts.
* Update requirements.txt
Update requirements.txt to prevent dependency conflicts
* Update requirements.txt
* Update sites.md
* fixed_sites: Armchairgm.fandom.com, Bleach.fandom.com, Battleraprus. disabled_sites: MicrosoftTechNet, club.cnews.ru, Scorcher
* fixed 2 sites, disabled 22 sites, and added 1 site
* fixed 3 sites, disabled 28, added 4 sites
* update sites.md
* Added 2 more sites
* fixed 3 sites, disabled 3 sites, added 1 site
* fix Twitch. Update snapcraft.yaml. Add pyproject.toml. Remove setup.py, requirements.txt, test-requirements.txt, as they are already specified in pyproject.toml
* Update sites.md
* fix Twitch. Update snapcraft.yaml. Add pyproject.toml. Remove setup.py, requirements.txt, test-requirements.txt, as they are already specified in pyproject.toml
* Update sites.md
* fix forums.drom.ru
* Add EduGeek
* Update python-package.yml
Fix dependency installation
* Update python-package.yml
2024-05-24 14:51:27 +02:00
Richard Mwewa
9399737ee6
Fixed 4 sites, added 6 sites, disabled 27 sites ( #1536 )
* Fixed/Disabled sites. Update requirements.txt
fixed_sites: AllRecipes, Linktree, CreativeMarket, ImgInn, Shutterstock, Contently
disabled_sites: Forums.ea.com, CrunchyRoll, Windy, MetaCritic, InfosecInstitute, Armchairgm.fandom.com, Bleach.fandom.com
Update requirements to prevent dependency conflicts.
* Update requirements.txt
Update requirements.txt to prevent dependency conflicts
* Update requirements.txt
* Update sites.md
* fixed_sites: Armchairgm.fandom.com, Bleach.fandom.com, Battleraprus. disabled_sites: MicrosoftTechNet, club.cnews.ru, Scorcher
* fixed 2 sites, disabled 22 sites, and added 1 site
* fixed 3 sites, disabled 28, added 4 sites
* update sites.md
* Added 2 more sites
2024-05-18 01:50:05 +02:00
Richard Mwewa
f7f77e587c
Fixed/Disabled sites. Update requirements.txt ( #1517 )
* Fixed/Disabled sites. Update requirements.txt
fixed_sites: AllRecipes, Linktree, CreativeMarket, ImgInn, Shutterstock, Contently
disabled_sites: Forums.ea.com, CrunchyRoll, Windy, MetaCritic, InfosecInstitute, Armchairgm.fandom.com, Bleach.fandom.com
Update requirements to prevent dependency conflicts.
* Update requirements.txt
Update requirements.txt to prevent dependency conflicts
* Update requirements.txt
* Update sites.md
* fixed_sites: Armchairgm.fandom.com, Bleach.fandom.com, Battleraprus. disabled_sites: MicrosoftTechNet, club.cnews.ru, Scorcher
2024-05-14 15:11:17 +02:00
Soxoj
2ccef4a9f9
Updated site statistics ( #1273 )
2023-10-27 21:48:37 +02:00
Soxoj
656b9c19ea
Improved search through UnstoppableDomains ( #1040 )
2023-07-07 21:24:20 +02:00
Soxoj
932e07a8ee
Added 26 ENS and similar domains with tag crypto ( #942 )
2023-05-13 18:23:17 +08:00
Soxoj
fc1f5bfc82
Fixed false positives on Mastodon sites ( #901 )
2023-04-17 10:51:32 +02:00
fen0s
6020e766ce
fix opensea and shutterstock, disable a few dead sites ( #798 )
* fix shutterstock and disable allsoft
* disable dead forums and fix opensea
* Update sites.md
2022-12-18 12:22:24 +03:00
fen0s
aebd8539ed
disable broken sites ( #756 )
* Update data.json
* Update sites.md
2022-11-22 23:13:52 +03:00
fen0s
fea1c6b552
disable non-working sites ( #739 )
* Update data.json
* Update sites.md
Co-authored-by: Soxoj <31013580+soxoj@users.noreply.github.com >
2022-11-08 10:47:21 +04:00