Mirror of https://github.com/soxoj/maigret.git, synced 2026-05-06 14:08:59 +00:00
Improve site-check quality: fix broken site configs, add diagnostic utilities, and make self-check report-only by default with opt-in auto-disable. (#2301)
- Fix the VK and TradingView `checkType`; add Reddit and Microsoft Learn API-style probes where appropriate; adjust or disable entries that are unreliable under anti-bot protection.
- Self-check: stop aggressive auto-disabling; report issues only by default; add `--auto-disable` and `--diagnose` for optional fixes and deeper output.
- Tooling: add `utils/site_check.py` and `utils/check_top_n.py` (and related helpers) to inspect and rank site behavior against the top-N list.
- Scope: focuses on fixing top-traffic, high-impact sites and on making diagnostics repeatable without silently flipping `disabled` flags.
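The new self-check default (report problems, change nothing unless asked) can be illustrated with a minimal sketch. The names `Site`, `self_check` and its `auto_disable` argument are hypothetical. They only mirror the behaviour the commit message describes for `--auto-disable`, and they are not maigret's actual code. The set of failing site names is taken as a given input instead of being produced by real HTTP checks.

```python
from dataclasses import dataclass


@dataclass
class Site:
    # Hypothetical site record: a name plus the flag a self-check could flip.
    name: str
    disabled: bool = False


def self_check(sites, failing, auto_disable=False):
    """Report sites whose check failed; disable them only when auto_disable is set.

    `failing` is an assumed input: the set of site names whose check failed.
    Returns the failing site names in their original order.
    """
    issues = [site.name for site in sites if site.name in failing]
    if auto_disable:
        # Opt-in path: the configuration is mutated only on explicit request.
        for site in sites:
            if site.name in failing:
                site.disabled = True
    return issues


sites = [Site("VK"), Site("Baidu")]
print(self_check(sites, {"Baidu"}))  # ['Baidu']
print(sites[1].disabled)             # False: report-only by default
self_check(sites, {"Baidu"}, auto_disable=True)
print(sites[1].disabled)             # True
```

Keeping the mutation behind a flag means a plain run is safe to repeat. Disabling becomes a deliberate second step.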
@@ -1,5 +1,5 @@

-## List of supported sites (search methods): total 3143
+## List of supported sites (search methods): total 3144

 Rank data fetched from Alexa by domains.

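Each `@@ -a,b +c,d @@` line on this page is a unified-diff hunk header. The hunk covers `b` lines of the old file starting at line `a`, and `d` lines of the new file starting at line `c`. When a count is 1, the format may omit it. The helper below is a hypothetical sketch, not part of maigret; it simply parses such a header with a regular expression.

```python
import re

# "@@ -start[,count] +start[,count] @@" followed by an optional section heading.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return (old_start, old_count, new_start, new_count) for a hunk header."""
    m = HUNK_RE.match(line)
    if m is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count = m.groups()
    # An omitted count means the hunk spans exactly one line.
    return (
        int(old_start),
        int(old_count) if old_count is not None else 1,
        int(new_start),
        int(new_count) if new_count is not None else 1,
    )


print(parse_hunk_header("@@ -8,13 +8,14 @@ Rank data fetched from Alexa by domains."))
# (8, 13, 8, 14)
```

For the hunk starting at line 8, the new side is one line longer (14 - 13 = 1). This matches the site total rising from 3143 to 3144.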
@@ -8,13 +8,14 @@ Rank data fetched from Alexa by domains.
 1.  [GooglePlayStore (https://play.google.com/store)](https://play.google.com/store)*: top 1, apps, us*
 1.  [YouTube (https://www.youtube.com/)](https://www.youtube.com/)*: top 2, video*
 1.  [YouTube User (https://www.youtube.com/)](https://www.youtube.com/)*: top 2, video*
-1.  [Baidu (https://tieba.baidu.com)](https://tieba.baidu.com)*: top 3, cn*
+1.  [Baidu (https://tieba.baidu.com)](https://tieba.baidu.com)*: top 3, cn*, search is disabled
 1.  [Facebook (https://www.facebook.com/)](https://www.facebook.com/)*: top 10, networking*
 1.  [Amazon (https://amazon.com)](https://amazon.com)*: top 50, us*
-1.  [Wikipedia (https://www.wikipedia.org/)](https://www.wikipedia.org/)*: top 50, wiki*
+1.  [Wikipedia (https://en.wikipedia.org/)](https://en.wikipedia.org/)*: top 50, wiki*, search is disabled
 1.  [Reddit (https://www.reddit.com/)](https://www.reddit.com/)*: top 50, discussion, news*
 1.  [social.msdn.microsoft.com (https://social.msdn.microsoft.com)](https://social.msdn.microsoft.com)*: top 50, us*, search is disabled
 1.  [MicrosoftTechNet (https://social.technet.microsoft.com)](https://social.technet.microsoft.com)*: top 50, us*, search is disabled
+1.  [MicrosoftLearn (https://learn.microsoft.com)](https://learn.microsoft.com)*: top 50, tech, us*
 1.  [Weibo (https://weibo.com)](https://weibo.com)*: top 50, cn, networking*
 1.  [GitHubGist (https://gist.github.com)](https://gist.github.com)*: top 50, coding, sharing*
 1.  [VK (https://vk.com/)](https://vk.com/)*: top 50, ru*
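The commit adds API-style probes for Reddit and Microsoft Learn, and both now appear in the "Sites with probing" statistic. A minimal sketch of the idea follows. It assumes a probed site stores a separate probe URL (called `urlProbe` here) that is requested during the check, while the ordinary profile URL is what the report shows. The key names, the `check_urls` helper and the example.com URLs are illustrative assumptions.

```python
def check_urls(site, username):
    """Return (url_to_request, url_to_report) for one site entry.

    `site` is a dict with the assumed keys "url" and, optionally, "urlProbe".
    With a probe URL, the check hits the probe, but users still see the profile URL.
    """
    report_url = site["url"].format(username=username)
    probe_template = site.get("urlProbe")
    if probe_template:
        return probe_template.format(username=username), report_url
    # No probe: the profile page itself is what gets checked.
    return report_url, report_url


site = {
    "url": "https://example.com/user/{username}",
    "urlProbe": "https://api.example.com/v1/users/{username}",
}
print(check_urls(site, "alice"))
# ('https://api.example.com/v1/users/alice', 'https://example.com/user/alice')
```

Splitting the two URLs lets a check use a machine-readable endpoint, which is often easier to interpret than an HTML page behind anti-bot protection. Results still link to the page a person would open.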
@@ -52,7 +53,7 @@ Rank data fetched from Alexa by domains.
 1.  [YandexBugbounty (https://yandex.ru/bugbounty/)](https://yandex.ru/bugbounty/)*: top 50, hacking, ru*, search is disabled
 1.  [YandexCollections API (by yandex_public_id) (https://yandex.ru/collections/)](https://yandex.ru/collections/)*: top 50, ru, sharing*
 1.  [YandexMarket (https://market.yandex.ru/)](https://market.yandex.ru/)*: top 50, ru*
-1.  [YandexMusic (https://music.yandex.ru/)](https://music.yandex.ru/)*: top 50, music, ru*
+1.  [YandexMusic (https://music.yandex.ru/)](https://music.yandex.ru/)*: top 50, music, ru*, search is disabled
 1.  [YandexZnatoki (https://yandex.ru/q/)](https://yandex.ru/q/)*: top 50, ru*
 1.  [YandexZenChannel (https://dzen.ru)](https://dzen.ru)*: top 50, ru*
 1.  [YandexZenUser (https://zen.yandex.ru)](https://zen.yandex.ru)*: top 50, ru*
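Every entry in this generated list has the same shape:

- the name and main URL as a link;
- a rank label followed by the tags, all between asterisks (in every entry shown here, the tags are in alphabetical order);
- a trailing `, search is disabled` for disabled sites, as with YandexMusic in this hunk.

The sketch below reproduces that format. It assumes the rank label is already computed. `render_entry` is a hypothetical name, not maigret's generator.

```python
def render_entry(name, url, rank_label, tags, disabled=False):
    """Format one site as a line of the supported-sites list (hypothetical helper)."""
    # Rank label first, then tags in alphabetical order, inside the *...* span.
    details = ", ".join([rank_label] + sorted(tags))
    line = f"1.  [{name} ({url})]({url})*: {details}*"
    if disabled:
        # Disabled sites keep their entry but get a suffix outside the emphasis.
        line += ", search is disabled"
    return line


print(render_entry("YandexMusic", "https://music.yandex.ru/", "top 50", ["ru", "music"], disabled=True))
# 1.  [YandexMusic (https://music.yandex.ru/)](https://music.yandex.ru/)*: top 50, music, ru*, search is disabled
```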
@@ -61,18 +62,18 @@ Rank data fetched from Alexa by domains.
 1.  [OK (https://ok.ru/)](https://ok.ru/)*: top 100, ru*
 1.  [community.adobe.com (https://community.adobe.com)](https://community.adobe.com)*: top 100, us*
 1.  [TradingView (https://www.tradingview.com/)](https://www.tradingview.com/)*: top 100, trading, us*
-1.  [Aparat (https://www.aparat.com)](https://www.aparat.com)*: top 100, ir, video*
+1.  [Aparat (https://www.aparat.com)](https://www.aparat.com)*: top 100, ir, video*, search is disabled
 1.  [ChaturBate (https://chaturbate.com)](https://chaturbate.com)*: top 100, us*
 1.  [Medium (https://medium.com/)](https://medium.com/)*: top 100, blog, us*, search is disabled
 1.  [Livejasmin (https://www.livejasmin.com/)](https://www.livejasmin.com/)*: top 100, us, webcam*
-1.  [Pornhub (https://pornhub.com/)](https://pornhub.com/)*: top 100, porn*
+1.  [Pornhub (https://pornhub.com/)](https://pornhub.com/)*: top 100, porn*, search is disabled
 1.  [Imgur (https://imgur.com)](https://imgur.com)*: top 100, photo*
 1.  [Armchairgm (https://armchairgm.fandom.com/)](https://armchairgm.fandom.com/)*: top 100, us, wiki*
 1.  [Battleraprus (https://battleraprus.fandom.com/ru)](https://battleraprus.fandom.com/ru)*: top 100, ru, us, wiki*
 1.  [BleachFandom (https://bleach.fandom.com/ru)](https://bleach.fandom.com/ru)*: top 100, ru, wiki*
 1.  [Fandom (https://www.fandom.com/)](https://www.fandom.com/)*: top 100, us*
 1.  [FandomCommunityCentral (https://community.fandom.com)](https://community.fandom.com)*: top 100, wiki*
-1.  [Etsy (https://www.etsy.com/)](https://www.etsy.com/)*: top 100, shopping, us*
+1.  [Etsy (https://www.etsy.com/)](https://www.etsy.com/)*: top 100, shopping, us*, search is disabled
 1.  [GitHub (https://www.github.com/)](https://www.github.com/)*: top 100, coding*
 1.  [Spotify (https://open.spotify.com/)](https://open.spotify.com/)*: top 100, music, us*, search is disabled
 1.  [TikTok (https://www.tiktok.com/)](https://www.tiktok.com/)*: top 100, video*
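The commit fixes the `checkType` of VK and of TradingView (listed in this hunk). The statistics at the end of the list count status-code checks as a false-positive risk. The sketch below shows why, under simplifying assumptions:

- A status-code check treats any 2xx response as "profile exists".
- A message check additionally requires that no known "not found" marker appears in the page.

The function name, the check-type strings and the marker list are illustrative, not maigret's exact configuration schema.

```python
def profile_exists(check_type, status, body, absence_markers=()):
    """Decide whether a profile exists from one HTTP response (simplified sketch).

    check_type: "status_code" or "message" (assumed names).
    absence_markers: strings that only appear on a "user not found" page.
    """
    ok = 200 <= status < 300
    if check_type == "status_code":
        # Any 2xx counts as found, so a "not found" page served with 200 slips through.
        return ok
    if check_type == "message":
        # This sketch also requires 2xx, then rejects pages containing a "not found" marker.
        return ok and not any(marker in body for marker in absence_markers)
    raise ValueError(f"unknown check type: {check_type}")


soft_404 = "<h1>User not found</h1>"
print(profile_exists("status_code", 200, soft_404))                  # True (false positive)
print(profile_exists("message", 200, soft_404, ["User not found"]))  # False
```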
@@ -80,7 +81,7 @@ Rank data fetched from Alexa by domains.
 1.  [Tumblr (https://www.tumblr.com)](https://www.tumblr.com)*: top 500, blog*
 1.  [Roblox (https://www.roblox.com/)](https://www.roblox.com/)*: top 500, gaming, us*
 1.  [SoundCloud (https://soundcloud.com/)](https://soundcloud.com/)*: top 500, music*
-1.  [Udemy (https://www.udemy.com)](https://www.udemy.com)*: top 500, in*
+1.  [Udemy (https://www.udemy.com)](https://www.udemy.com)*: top 500, in*, search is disabled
 1.  [discourse.mozilla.org (https://discourse.mozilla.org)](https://discourse.mozilla.org)*: top 500*
 1.  [linktr.ee (https://linktr.ee)](https://linktr.ee)*: top 500, links*
 1.  [xHamster (https://xhamster.com)](https://xhamster.com)*: top 500, porn, us*
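The commit describes `utils/check_top_n.py` as ranking site behaviour against the top-N list. What follows does not reproduce that script. It is a guess at the core selection step: take the N best-ranked sites and split them into checked and disabled entries. The `RankedSite` fields and the helper names are hypothetical, and the sample sites and ranks are placeholders.

```python
from dataclasses import dataclass


@dataclass
class RankedSite:
    name: str
    rank: int  # traffic rank; lower means more popular (assumed field)
    disabled: bool = False


def top_n(sites, n):
    """Return the n best-ranked sites, most popular first (stable for equal ranks)."""
    return sorted(sites, key=lambda s: s.rank)[:n]


def summarize_top_n(sites, n):
    """List which of the top n sites would be checked and which are disabled."""
    selected = top_n(sites, n)
    return {
        "checked": [s.name for s in selected if not s.disabled],
        "disabled": [s.name for s in selected if s.disabled],
    }


sites = [
    RankedSite("site-c", rank=300),
    RankedSite("site-a", rank=5),
    RankedSite("site-b", rank=40, disabled=True),
]
print(summarize_top_n(sites, 2))
# {'checked': ['site-a'], 'disabled': ['site-b']}
```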
@@ -525,7 +526,7 @@ Rank data fetched from Alexa by domains.
 1.  [Neoseeker (https://www.neoseeker.com)](https://www.neoseeker.com)*: top 100K, us*
 1.  [InfosecInstitute (https://community.infosecinstitute.com)](https://community.infosecinstitute.com)*: top 100K, us*, search is disabled
 1.  [Armorgames (https://armorgames.com)](https://armorgames.com)*: top 100K, gaming, us*
-1.  [giters.com (https://giters.com)](https://giters.com)*: top 100K, coding*
+1.  [giters.com (https://giters.com)](https://giters.com)*: top 100K, coding*, search is disabled
 1.  [teamtreehouse.com (https://teamtreehouse.com)](https://teamtreehouse.com)*: top 100K, us*
 1.  [Blu-ray (https://forum.blu-ray.com/)](https://forum.blu-ray.com/)*: top 100K, forum, us*, search is disabled
 1.  [TheOdysseyOnline (https://www.theodysseyonline.com)](https://www.theodysseyonline.com)*: top 100K, blog*
@@ -1120,7 +1121,7 @@ Rank data fetched from Alexa by domains.
 1.  [commons.ishtar-collective.net (https://commons.ishtar-collective.net)](https://commons.ishtar-collective.net)*: top 10M, forum, gaming*
 1.  [4cheat (https://4cheat.ru)](https://4cheat.ru)*: top 10M, forum, ru*, search is disabled
 1.  [svtperformance.com (https://svtperformance.com)](https://svtperformance.com)*: top 10M, forum, us*
-1.  [githubplus.com (https://githubplus.com)](https://githubplus.com)*: top 10M, coding*
+1.  [githubplus.com (https://githubplus.com)](https://githubplus.com)*: top 10M, coding*, search is disabled
 1.  [Runitonce (https://www.runitonce.com/)](https://www.runitonce.com/)*: top 10M, ca, us*
 1.  [Paypal (https://www.paypal.me)](https://www.paypal.me)*: top 10M, finance*
 1.  [Seatracker (https://seatracker.ru/)](https://seatracker.ru/)*: top 10M, ru*
@@ -1239,7 +1240,7 @@ Rank data fetched from Alexa by domains.
 1.  [Faqusha (https://faqusha.ru)](https://faqusha.ru)*: top 10M, ru*
 1.  [Skyrimforums (https://skyrimforums.org)](https://skyrimforums.org)*: top 10M, forum, in, us*
 1.  [juce (https://forum.juce.com)](https://forum.juce.com)*: top 10M, ca, forum, us*
-1.  [rblx.trade (https://rblx.trade)](https://rblx.trade)*: top 10M, gaming*
+1.  [rblx.trade (https://rblx.trade)](https://rblx.trade)*: top 10M, gaming*, search is disabled
 1.  [quik (https://forum.quik.ru)](https://forum.quik.ru)*: top 10M, forum, ru*
 1.  [navimba.com (https://navimba.com)](https://navimba.com)*: top 10M*
 1.  [Gardenstew (https://www.gardenstew.com)](https://www.gardenstew.com)*: top 10M, forum, in, us*, search is disabled
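Several hunks in this commit only append `, search is disabled` to an existing entry, for example giters.com, githubplus.com and rblx.trade. The commit aims to stop silently flipping disabled flags, so it helps to list such flips directly from a unified diff. `newly_disabled` is a hypothetical helper. It reports an added `+` line only when it equals a removed `-` line plus that suffix, so changed entries such as the Wikipedia URL swap are not counted.

```python
SUFFIX = ", search is disabled"


def newly_disabled(diff_lines):
    """Return added entries whose only change from a removed entry is the disabled suffix."""
    removed = {
        line[1:]
        for line in diff_lines
        if line.startswith("-") and not line.startswith("---")  # skip file headers
    }
    flipped = []
    for line in diff_lines:
        if line.startswith("+") and not line.startswith("+++"):
            entry = line[1:]
            if entry.endswith(SUFFIX) and entry[: -len(SUFFIX)] in removed:
                flipped.append(entry)
    return flipped


hunk = [
    " 1.  [juce (https://forum.juce.com)](https://forum.juce.com)*: top 10M, ca, forum, us*",
    "-1.  [rblx.trade (https://rblx.trade)](https://rblx.trade)*: top 10M, gaming*",
    "+1.  [rblx.trade (https://rblx.trade)](https://rblx.trade)*: top 10M, gaming*, search is disabled",
]
print(newly_disabled(hunk))
```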
@@ -3147,18 +3148,18 @@ Rank data fetched from Alexa by domains.
 1.  [OP.GG [Valorant] (https://valorant.op.gg)](https://valorant.op.gg)*: top 100M, gaming*
 1.  [write.as (https://write.as)](https://write.as)*: top 100M, writefreely*

-The list was updated at (2026-03-21)
+The list was updated at (2026-03-22)
 ## Statistics

-Enabled/total sites: 2650/3143 = 84.31%
+Enabled/total sites: 2641/3144 = 84.0%

-Incomplete message checks: 387/2650 = 14.6% (false positive risks)
+Incomplete message checks: 386/2641 = 14.62% (false positive risks)

-Status code checks: 607/2650 = 22.91% (false positive risks)
+Status code checks: 608/2641 = 23.02% (false positive risks)

-False positive risk (total): 37.51%
+False positive risk (total): 37.64%

-Sites with probing: 500px, Aparat, BinarySearch (disabled), BongaCams, BuyMeACoffee, Cent, Disqus, Docker Hub, Duolingo, Gab, GitHub, GitLab, Google Plus (archived), Gravatar, Imgur, Issuu, Keybase, Livejasmin, LocalCryptos (disabled), MixCloud, Niftygateway, Reddit Search (Pushshift) (disabled), SportsTracker, Spotify (disabled), TAP'D, Trello, Twitch, Twitter, Twitter Shadowban (disabled), UnstoppableDomains, Vimeo, Weibo, Yapisal (disabled), YouNow, nightbot, notabug.org, polarsteps, qiwi.me (disabled)
+Sites with probing: 500px, Aparat (disabled), BinarySearch (disabled), BongaCams, BuyMeACoffee, Cent, Chess, Disqus, Docker Hub, Duolingo, Gab, GitHub, GitLab, Google Plus (archived), Gravatar, Imgur, Issuu, Keybase, Livejasmin, LocalCryptos (disabled), MicrosoftLearn, MixCloud, Niftygateway, Picsart, Reddit, Reddit Search (Pushshift) (disabled), SportsTracker, Spotify (disabled), TAP'D, Trello, Twitch, Twitter, Twitter Shadowban (disabled), UnstoppableDomains, Vimeo, Weibo, Yapisal (disabled), YouNow, nightbot, notabug.org, polarsteps, qiwi.me (disabled)

 Sites with activation: Spotify (disabled), Twitter, Vimeo, Weibo

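The statistics are consistent with three rules:

- Each percentage is rounded to two decimals. The value shown as 84.0 rather than 84.00 suggests Python float formatting.
- The two check ratios use the enabled-site count as the denominator.
- The total false-positive risk is the sum of the two rounded percentages: 14.62 + 23.02 = 37.64 for the new list, and 14.6 + 22.91 = 37.51 for the old one.

The sketch below recomputes the figures under those assumptions. `pct` and `list_stats` are hypothetical helpers, not the generator's code.

```python
def pct(part, whole):
    """Percentage rounded to two decimals, as the statistics lines appear to use."""
    return round(part / whole * 100, 2)


def list_stats(total, enabled, incomplete_message, status_code):
    """Recompute the summary figures from raw counts (hypothetical helper)."""
    incomplete = pct(incomplete_message, enabled)
    status = pct(status_code, enabled)
    return {
        "enabled": pct(enabled, total),
        "incomplete_message": incomplete,
        "status_code": status,
        # Total risk taken as the sum of the two rounded percentages.
        "false_positive_total": round(incomplete + status, 2),
    }


print(list_stats(total=3144, enabled=2641, incomplete_message=386, status_code=608))
# {'enabled': 84.0, 'incomplete_message': 14.62, 'status_code': 23.02, 'false_positive_total': 37.64}
```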
@@ -3170,7 +3171,7 @@ Top 20 profile URLs:
 - (133) `{urlMain}{urlSubpath}/member.php?username={username} (vBulletin)`
 - (127) `{urlMain}{urlSubpath}/search.php?author={username} (phpBB/Search)`
 - (118) `/profile/{username}`
-- (111) `/u/{username}`
+- (112) `/u/{username}`
 - (88) `/users/{username}`
 - (87) `{urlMain}/u/{username}/summary (Discourse)`
 - (54) `/@{username}`
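The profile-URL patterns above are templates. `{urlMain}`, `{urlSubpath}` and `{username}` are placeholders, and a trailing parenthesised label such as vBulletin, phpBB/Search or Discourse names the forum engine. The brace syntax suggests Python's `str.format`. The hypothetical `expand` helper below assumes exactly that, and the forum URLs are made up.

```python
def expand(template, url_main, username, url_subpath=""):
    """Fill the placeholders of a profile-URL template (the engine label is not part of it)."""
    # str.format ignores unused keyword arguments, so short templates like "/u/{username}" also work.
    return template.format(urlMain=url_main, urlSubpath=url_subpath, username=username)


vbulletin = "{urlMain}{urlSubpath}/member.php?username={username}"
print(expand(vbulletin, "https://forum.example.com", "alice"))
# https://forum.example.com/member.php?username=alice
print(expand(vbulletin, "https://example.com", "alice", url_subpath="/forum"))
# https://example.com/forum/member.php?username=alice
```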
@@ -3191,7 +3192,7 @@ Top 20 tags:
 - (92) `gaming`
 - (48) `photo`
 - (41) `coding`
-- (30) `tech`
+- (31) `tech`
 - (29) `news`
 - (28) `blog`
 - (23) `music`
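The "Top 20 tags" section ranks tags by how many sites carry them. The rise of `tech` from 30 to 31 is consistent with the newly added MicrosoftLearn entry, which is tagged `tech, us`. The sketch below tallies tags with `collections.Counter` and prints them in this section's `- (count) \`tag\`` style. It assumes each site is a dict with a `tags` list, a hypothetical shape.

```python
from collections import Counter


def top_tags(sites, n=20):
    """Return the n most common tags as (tag, count) pairs, most common first."""
    counts = Counter(tag for site in sites for tag in site.get("tags", []))
    return counts.most_common(n)


def render_top_tags(sites, n=20):
    """Format the tally like the list above: - (count) `tag`."""
    return [f"- ({count}) `{tag}`" for tag, count in top_tags(sites, n)]


sites = [
    {"name": "MicrosoftLearn", "tags": ["tech", "us"]},
    {"name": "VK", "tags": ["ru"]},
    {"name": "OK", "tags": ["ru"]},
]
print(render_top_tags(sites, n=2))
# ['- (2) `ru`', '- (1) `tech`']
```

`Counter.most_common` orders by count. Ties keep first-seen order, which is why `tech` precedes `us` here.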