Commit Graph

488 Commits

Author SHA1 Message Date
Soxoj 5c93b206e7 Cloudflare bypass webgate (#2628) 2026-05-09 10:48:43 +03:00
Soxoj 03b62027f6 Fixed duplicates of YouTube and Periscope (#2618) 2026-05-05 14:02:37 +02:00
Soxoj f293bff417 Fix site checks: 7 fixed, 1 disabled, 1 dead deleted (#2616) 2026-05-04 23:40:58 +02:00
Soxoj a77a8b3e84 Reddit fix (#2614) 2026-05-04 14:12:22 +02:00
Soxoj 3ff05b240a Fix site checks: 8 → ip_reputation, 6 fixed, 9 disabled, 1 dead deleted (#2611) 2026-05-03 20:02:45 +02:00
github-actions[bot] ff0ffce427 Updated site list and statistics (#2607)
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-05-03 10:49:46 +02:00
Julio César Suástegui 8b5dce1d3c fix: disable RomanticCollection check (#2588)
* fix: disable RomanticCollection check

* chore: regenerate db metadata

---------

Co-authored-by: Julio César Suástegui <juliosuas@users.noreply.github.com>
2026-05-02 15:29:45 +02:00
Soxoj b79f8aca28 Add site checks: 18 new sites (#2575) 2026-04-29 16:55:47 +02:00
Soxoj 1352bd35c6 Fix site checks: 5 fixed, 4 disabled; fix UA leak bug (#2569) 2026-04-26 14:51:44 +02:00
Soxoj 3960510b63 Fix site checks: 7 fixed, 1 disabled (#2565)
False-positive site probe issues #2531, #2542, #2556, #2559, #2560, #2561, #2563, #2496.
2026-04-26 12:34:52 +02:00
Soxoj e962b8c693 Fix site checks: 5 fixed; readme fix (#2562)
* Fix site checks: 5 fixed; readme fix

* Logging improvements

* Improve YouTube data extraction
2026-04-25 18:15:38 +02:00
Soxoj 25026e21ea Fix site checks: 4 → ip_reputation, 9 fixed, 16 disabled, 3 dead dele… (#2555)
* Fix site checks: 4 → ip_reputation, 9 fixed, 16 disabled, 3 dead deleted; clarify ip_reputation tag semantics

* Improved test coverage
2026-04-23 21:17:07 +02:00
Soxoj 5e1cc45c17 Fix site checks: 12 fixed, 19 disabled; add new protection tags (#2550) 2026-04-22 20:25:41 +02:00
Soxoj d9b361b626 Fix site checks: 3 → ip_reputation, 10 fixed, 6 disabled, 2 dead deleted (#2549) 2026-04-22 12:46:53 +02:00
Soxoj 0131f0b64c Add OnlyFans with activation mechanism; updated site ranks (#2546) 2026-04-21 19:03:45 +02:00
Soxoj 98f03c153b Add 3 crypto sites (Polymarket, Zora, Revolut.me), added crypto inves… (#2538)
* Add 3 crypto sites (Polymarket, Zora, Revolut.me), added crypto investigation use case page in docs

* Added fintech tag

* Updated sites metadata
2026-04-21 11:08:48 +02:00
Soxoj 1f823e8322 Fix site checks: 3 fixed, 2 → ip_reputation, 7 disabled, 1 dead deleted (#2539) 2026-04-21 10:58:45 +02:00
Soxoj d6905a8fd8 Fix site checks: 4 fixed, 14 → ip_reputation, 8 disabled, 5 dead deleted (#2537) 2026-04-21 00:40:24 +02:00
Soxoj 7d216638fa fix site checks: 14 sites → ip_reputation, 7 disabled, 5 dead deleted (#2536) 2026-04-20 23:51:18 +02:00
Soxoj fb71f26fd0 Fix site checks: recover 6 CF sites via tls_fingerprint, 500px GraphQL, delete 4 dead domains (#2535) 2026-04-20 22:41:51 +02:00
Soxoj f74f82ee13 Fixed: Hack MD, DailyKos, Mywed, WikimapiaSearch, TikTok Online Viewer (#2526, #2522, #2523, #2500, #2496); Disabled: Radiokot, Lurkmore, Mylespaul, AppleDiscussions, Loveplanet (#2524, #2511, #2498) (#2528) 2026-04-17 17:04:50 +02:00
Soxoj dc8751ac55 Added 3 sites, fixed 6, disabled 8 (#2505) 2026-04-10 12:26:41 +02:00
Copilot 9303b1686d Disable Kinja.com site check (#2503) 2026-04-10 12:16:28 +02:00
Soxoj 5e24117e93 Fix false positives (#2499)
* Re-disable 29 false positives from #2478
2026-04-09 17:48:45 +02:00
Soxoj 777e503e30 Re-enable 69 stale-disabled sites validated via self-check (#2478)
Total: 2539 → 2608 enabled sites (+69).
2026-04-09 12:27:48 +02:00
Soxoj b213f6e079 vBulletin cleanup, Flarum sites, engine stats, UA bump (#2476) 2026-04-09 01:17:24 +02:00
Copilot fbb8255518 Update HackTheBox and Wikipedia to use new API endpoints (#2470)
* Initial plan

* Update HackTheBox and Wikipedia to use new API endpoints for username checking

Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/6dc9147c-787f-4f4f-8903-7b9873ac6ac9

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-04-08 10:23:07 +02:00
Soxoj 6db1df2ddb Fix failing test for custom DB path resolution (#2468)
* Fix `--db` bug

* Fix test_resolve_db_path_custom_file to create the file before testing

Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/3ea7b2e8-0565-4fca-8ec2-eff8eb4ee617

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-04-08 00:53:57 +02:00
Soxoj 6834483360 Fix Spotify, add Spotify Community forum (#2467) 2026-04-07 18:25:13 +02:00
Soxoj 3fd34afb77 Sites fixes (#2464) 2026-04-06 21:41:16 +02:00
Soxoj ad95302745 Add Markdown reports for LLM analysis (#2463) 2026-04-06 18:26:43 +02:00
Soxoj 6d0a22b738 False positive fixes (#2460)
* Fix false positives: APClips, Taplink, gentoo, Discord.bio, ChaturBate; disable 7Cups, playtime, openriskmanual, reactos; update tags

* Fix db_meta.json regeneration in update_site_data.py (inline instead of module import)

* Fix false positives: disable Bit.ly, Firearmstalk, Needrom, Travelblog; fix gentoo, Discord.bio, brickimedia via API; remove dead sites dreamhost, typepad
2026-04-04 19:08:51 +02:00
Soxoj abce3c9be4 Fix false positives (#2459)
* Fix false positives: APClips, Taplink, gentoo, Discord.bio, ChaturBate; disable 7Cups, playtime, openriskmanual, reactos; update tags

* Fix db_meta.json regeneration in update_site_data.py (inline instead of module import)
2026-04-04 18:22:21 +02:00
Soxoj 269d50eedc DB update mechanism (#2458)
* Database update mechanism
2026-04-04 18:00:50 +02:00
Soxoj e8f4318e5d Added Crypto/Web3 site checks (#2457) 2026-04-04 16:49:12 +02:00
Julio César Suástegui eeb38ccdc0 fix(data): update InterPals absence string to match current site response (#2442)
The previous absence string 'The requested user does not exist or is inactive'
no longer matches the live site response. InterPals now returns 'User not found'
for non-existent profiles, causing false positives for all username searches.

Tested against interpals.net/noneownsthisusername (non-existent) and
interpals.net/blue (claimed) to confirm detection accuracy.

Closes #2433

Co-authored-by: Julio César Suástegui <juliosuas@users.noreply.github.com>
2026-04-03 13:43:33 +02:00
Soxoj 5d502eaef6 Add site protection tracking system, fix broken site checks (Instagra… (#2452)
* Add site protection tracking system, fix broken site checks (Instagram, StackOverflow, LeetCode, Boosty, LiveLib), preserve unicode in data.json

* Update poetry.lock by running poetry lock

Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/14333f41-67d5-4e28-a782-9730b31fc667

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-04-02 20:28:20 +02:00
Soxoj 5aa7f6429b Overhaul site tags and naming: add social tag to 33 networks, fill mi… (#2430)
* Overhaul site tags and naming: add social tag to 33 networks, fill missing tags for 213 top-1000 sites, clean up false us/in country tags (~374 sites), normalize site names to Title Case, add tag validation tests, document tagging and naming rules
Remove LLM folder: ask @soxoj for the up-to-date version!

* Remove LLM/ from version control

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 19:48:16 +01:00
Soxoj a5d337b765 Tags and site names improvements (#2427)
- Added social tag to social networks (33 sites)
- Fixed wrong tags (8 sites)
- Filled empty tags for 213 sites in top-1000
- Country tag cleanup (~374 sites)
- Site naming normalization (75 sites)
- New tests (3)
- Documentation updates
2026-03-28 15:42:12 +01:00
Soxoj 51b452ad71 Add urlProbes (#2425) 2026-03-28 00:08:02 +01:00
Soxoj fa1a4d1b4a Sites re-check (#2423) 2026-03-27 22:41:55 +01:00
Soxoj 656fe1df24 Added Max.ru check; --no-progressbar flag fixed (#2386) 2026-03-25 11:48:12 +01:00
Soxoj bc3d9faad9 Fix false-positive site checks reported by Maigret Bot (#2376) 2026-03-24 23:01:11 +01:00
Soxoj b145e7b26f feat(core): add POST request support, new sites, migrate to Majestic Million ranking (#2317)
* feat(core): add POST request support, new sites, migrate to Majestic Million ranking
- Added native POST request support to the Maigret engine (requestMethod, requestPayload) to enable querying modern JSON registration endpoints.
- Replaced the discontinued Alexa rank API with the Majestic Million dataset for global popularity sorting and automated CI updates.
- Fixed multiple false positives among top 500 sites and bypassed standard anti-bot protections using custom User-Agents.
- Updated public documentation and internal playbooks to reflect the new features.

* feat(data): apply all data.json site check updates from main branch

- Added CTFtime and PentesterLab (new sites added in main)
- Removed forums.imore.com (deleted in main as dead site)
- Disabled 5 sites per main branch fixes: Librusec, MirTesen, amateurvoyeurforum.com, forums.stevehoffman.tv, vegalab
- Fixed 5 site checks per main branch: SoundCloud, Taplink, Setlist, RoyalCams, club.cnews.ru (switched from status_code to message checkType with proper markers)

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/a1d194d9-c0ff-4e2b-974c-c5e4b59548bf

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-03-24 22:08:42 +01:00
Copilot 3e56c95e16 Fix SoundCloud false-positive: switch to message-based check (#2355)
* Initial plan

* Fix SoundCloud false-positive: switch from status_code to message checkType

SoundCloud returns HTTP 200 for non-existent user profiles (soft 404),
causing status_code check to report CLAIMED for random usernames.

Switch to message checkType with:
- presenseStrs: hydratable user marker in server-rendered HTML
- absenceStrs: generic page title for non-existent users

Markers sourced from WhatsMyName project's verified SoundCloud entry.

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/8aa10eef-78bf-4251-bf42-473cd94c7ef4

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-24 15:12:56 +01:00
Copilot 28f35f9a4f Fix club.cnews.ru false positive: switch from status_code to message checkType (#2342)
* Initial plan

* Fix club.cnews.ru false positive: switch from status_code to message checkType with absence strings

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/af131d2f-c7b5-4798-8ad1-86bab2673fe4

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-24 10:52:23 +01:00
Julio César Suástegui 79cea49526 feat: add CTFtime and PentesterLab site support (#2318)
Add two cybersecurity platforms for username enumeration:
- CTFtime (ctftime.org) - CTF competition platform
- PentesterLab (pentesterlab.com) - Security training platform

Both verified working with status_code check type.
Returns 200 for existing users, 404 for non-existent.

Co-authored-by: Julio César Suástegui <juliosuas@users.noreply.github.com>
2026-03-24 10:52:07 +01:00
Copilot eb541dcf51 Disable MirTesen site check (false positive) (#2350)
* Initial plan

* Disable MirTesen site check to fix false-positive probe

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/61c86064-423d-4f1b-8277-2838f747dd89

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-24 09:51:31 +01:00
Copilot 4c97025a32 Disable Librusec site check (false positive) (#2349)
* Initial plan

* Disable Librusec site check to fix false-positive probe

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-03-24 09:51:16 +01:00
Copilot d3f13ac295 Fix false-positive site probe: Re-enable Taplink with message checkType (#2326)
* Initial plan

* Disable Taplink site check to fix false-positive detections

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/ef9281f4-ba67-4760-a6e2-57564ac4ea94

* Re-enable Taplink with message checkType and absenceStrs

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/db3e572e-b79b-4cec-ac7f-062e76144660

* Improve Taplink absenceStrs: add Russian variant and presenseStrs

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/28e24317-e8b9-45f6-bad5-0e549b891313

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-23 21:36:36 +01:00