Commit Graph

541 Commits

Author SHA1 Message Date
Soxoj 656fe1df24 Added Max.ru check; --no-progressbar flag fixed (#2386) 2026-03-25 11:48:12 +01:00
Soxoj bc3d9faad9 Fix false-positive site checks reported by Maigret Bot (#2376) 2026-03-24 23:01:11 +01:00
Soxoj b145e7b26f feat(core): add POST request support, new sites, migrate to Majestic Million ranking (#2317)
* feat(core): add POST request support, new sites, migrate to Majestic Million ranking
- Added native POST request support to the Maigret engine (requestMethod, requestPayload) to enable querying modern JSON registration endpoints.
- Replaced the discontinued Alexa rank API with the Majestic Million dataset for global popularity sorting and automated CI updates.
- Fixed multiple false positives among top 500 sites and bypassed standard anti-bot protections using custom User-Agents.
- Updated public documentation and internal playbooks to reflect the new features.

* feat(data): apply all data.json site check updates from main branch

- Added CTFtime and PentesterLab (new sites added in main)
- Removed forums.imore.com (deleted in main as dead site)
- Disabled 5 sites per main branch fixes: Librusec, MirTesen, amateurvoyeurforum.com, forums.stevehoffman.tv, vegalab
- Fixed 5 site checks per main branch: SoundCloud, Taplink, Setlist, RoyalCams, club.cnews.ru (switched from status_code to message checkType with proper markers)

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/a1d194d9-c0ff-4e2b-974c-c5e4b59548bf

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-03-24 22:08:42 +01:00
Copilot abd9aa57fe Fix domain substring matching and NoneType crash in submit dialog (#2367)
* Initial plan

* Fix domain matching and NoneType error in submit.py

- Use regex with domain boundary matching instead of substring matching
  to prevent x.com from matching 500px.com, mix.com, etc.
- Handle None old_site gracefully when user enters a site name not in
  the matched list, fixing AttributeError crash.
- Add tests for both fixes.
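
The domain-boundary idea above can be sketched as follows. This is a minimal illustration, not the code from submit.py: the function name `domain_matches` and the exact pattern are assumptions made for this example.

```python
import re


def domain_matches(domain: str, url: str) -> bool:
    """Hypothetical helper: True if `url` appears to be on `domain` or a subdomain of it."""
    # The domain must start at the beginning of the string, right after "//",
    # or right after a "." (subdomain), and must end at the end of the string
    # or before one of "/", ":", "?" or "#".
    pattern = r"(?:^|//|\.)" + re.escape(domain) + r"(?:$|[/:?#])"
    return re.search(pattern, url, flags=re.IGNORECASE) is not None
```

With plain substring matching, `"x.com" in "https://500px.com/user"` is true; the boundary check rejects it because the character before `x.com` is `p`. A stricter variant could parse the URL first and compare host names only.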

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/7eabc755-47fd-4b80-a38c-9d6c056c2ce9

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-24 22:04:10 +01:00
Copilot 2e430e5039 feat: add tag blacklisting via --exclude-tags (#2352)
* Initial plan

* feat: add tag blacklisting support (--exclude-tags CLI flag, web UI, docs, tests)

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/1a656af2-36bf-494f-9f03-1b5340f0357c

* fix: correct tag cloud label to match click-cycle interaction

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/1a656af2-36bf-494f-9f03-1b5340f0357c

* feat: add all country tags to web interface tag cloud

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/7e184b24-ff26-48fd-8a93-aea12b0a8d7b

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-24 22:00:59 +01:00
Copilot 3e56c95e16 Fix SoundCloud false-positive: switch to message-based check (#2355)
* Initial plan

* Fix SoundCloud false-positive: switch from status_code to message checkType

SoundCloud returns HTTP 200 for non-existent user profiles (soft 404),
causing status_code check to report CLAIMED for random usernames.

Switch to message checkType with:
- presenseStrs: hydratable user marker in server-rendered HTML
- absenceStrs: generic page title for non-existent users

Markers sourced from WhatsMyName project's verified SoundCloud entry.
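
A rough sketch of how a message-based check differs from a status-code check, assuming simple substring markers. The function name, the marker strings in the usage note, and the "AVAILABLE" label are assumptions for illustration; this is not Maigret's actual implementation.

```python
from typing import Iterable


def classify_response(html: str,
                      presence_strs: Iterable[str],
                      absence_strs: Iterable[str]) -> str:
    """Hypothetical classifier that ignores the HTTP status and inspects the body."""
    # An absence marker wins: the page itself says the profile does not exist,
    # even if the server answered with HTTP 200 (soft 404).
    if any(marker in html for marker in absence_strs):
        return "AVAILABLE"
    # A presence marker means the page rendered a real profile.
    if any(marker in html for marker in presence_strs):
        return "CLAIMED"
    # Neither marker found: do not guess.
    return "UNKNOWN"
```

For example, with the made-up markers `presence_strs=['"kind":"user"']` and `absence_strs=["<title>Not found</title>"]`, a soft-404 page containing the second string is classified as AVAILABLE regardless of its status code.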

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/8aa10eef-78bf-4251-bf42-473cd94c7ef4

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-24 15:12:56 +01:00
Copilot 28f35f9a4f Fix club.cnews.ru false positive: switch from status_code to message checkType (#2342)
* Initial plan

* Fix club.cnews.ru false positive: switch from status_code to message checkType with absence strings

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/af131d2f-c7b5-4798-8ad1-86bab2673fe4

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-24 10:52:23 +01:00
Julio César Suástegui 79cea49526 feat: add CTFtime and PentesterLab site support (#2318)
Add two cybersecurity platforms for username enumeration:
- CTFtime (ctftime.org) - CTF competition platform
- PentesterLab (pentesterlab.com) - Security training platform

Both verified working with status_code check type.
Returns 200 for existing users, 404 for non-existent.

Co-authored-by: Julio César Suástegui <juliosuas@users.noreply.github.com>
2026-03-24 10:52:07 +01:00
Copilot eb541dcf51 Disable MirTesen site check (false positive) (#2350)
* Initial plan

* Disable MirTesen site check to fix false-positive probe

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/61c86064-423d-4f1b-8277-2838f747dd89

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-24 09:51:31 +01:00
Copilot 4c97025a32 Disable Librusec site check (false positive) (#2349)
* Initial plan

* Disable Librusec site check to fix false-positive probe

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-03-24 09:51:16 +01:00
Copilot d3f13ac295 Fix false-positive site probe: Re-enable Taplink with message checkType (#2326)
* Initial plan

* Disable Taplink site check to fix false-positive detections

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/ef9281f4-ba67-4760-a6e2-57564ac4ea94

* Re-enable Taplink with message checkType and absenceStrs

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/db3e572e-b79b-4cec-ac7f-062e76144660

* Improve Taplink absenceStrs: add Russian variant and presenseStrs

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/28e24317-e8b9-45f6-bad5-0e549b891313

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-23 21:36:36 +01:00
Copilot 00a9249229 [WIP] Fix invalid link on forums.imore.com (#2337)
* Initial plan

* Remove dead forums.imore.com site from database

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/c83530d0-d24f-45fc-aca3-ae1e46ece33c

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-23 20:20:28 +01:00
Copilot 005863c2e0 Fix Setlist site check: switch to message checkType with proper markers (#2333)
* Initial plan

* Disable Setlist site check due to false positives (soft 404)

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/8c552ca6-51e5-4e79-a791-ddd6f27d2461

* Fix Setlist check: switch to message checkType with proper markers

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/3c387df6-1dfe-451f-96d8-b4b6455f7857

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-23 20:18:33 +01:00
Copilot e3aada6aef Fix RoyalCams site check using BongaCams white-label pattern (#2334)
* Initial plan

* Disable RoyalCams site check to fix false-positive probe

The Telegram Maigret bot auto-probe reported CLAIMED for three random
usernames. The status_code checkType is unreliable as the site returns
200 for non-existent user profiles (soft 404). Disabling the site check
until a reliable detection method can be established.

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/05b3d513-fe15-477d-a455-0c9ddf0b8b51

* Fix RoyalCams: switch to message checkType using BongaCams white-label pattern

RoyalCams runs on the BongaCams platform. Applied the same fix pattern:
- Switch from status_code to message checkType
- Use Portuguese locale (pt.royalcams.com) as urlProbe
- absenceStrs matches generic title on non-existent profiles
- presenseStrs matches Portuguese profile title for existing users
- Add browser-like headers matching BongaCams config
- Remove disabled flag

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/2f6a9523-278a-4992-ba7c-c320de14bfa4

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-23 20:16:45 +01:00
Copilot 9b35fc1ab0 [WIP] Fix false-positive probe for vegalab site (#2336)
* Initial plan

* Disable vegalab site check: domain is dead (DNS does not resolve), causing false positives

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/98430e81-5dcb-4cb3-9aaa-f8c5ce86d026

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-23 20:09:46 +01:00
Copilot 146bc0481b Disable forums.stevehoffman.tv due to false positives (#2331)
* Initial plan

* Disable forums.stevehoffman.tv to fix false-positive site probe

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/39fea4a9-ec6d-4a12-b34b-1a3486d647e4

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-23 20:08:15 +01:00
Copilot 5930a3022e Disable false-positive site probe: amateurvoyeurforum.com (#2332)
* Initial plan

* Disable amateurvoyeurforum.com site check to fix false positives

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/e7fcad2b-4511-4e6d-b186-411951170e0a

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-23 20:07:42 +01:00
Copilot b1a211c3cd Disable forums.developer.nvidia.com (auth-gated user profiles) (#2305)
* Initial plan

* disable forums.developer.nvidia.com due to auth-locked user pages

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/b8f41f15-8588-4aac-a443-af5e2aaa1918

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-22 22:43:51 +01:00
Copilot 56d0c9f2f1 Remove dead site xxxforum.org (#2310)
* Initial plan

* Remove broken site xxxforum.org from data.json and sites.md

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/bfbd3aa8-bfb1-480a-b2e7-a2c40fc69def

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-22 22:43:21 +01:00
Copilot 01049b730d Fix Love.Mail.ru: update to numeric-only identifiers and new profile URL (#2307)
* Initial plan

* fix: update Love.Mail.ru to use numeric-only identifiers (#1264)

- Add regexCheck to enforce numeric-only IDs (^\d+$)
- Update usernameClaimed/usernameUnclaimed to numeric values
- Site remains disabled pending live verification
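
As an illustration of what a numeric-only check with the pattern above accepts and rejects, here is a small sketch. The helper name `is_valid_identifier` is an assumption; how Maigret applies regexCheck internally is not shown in this commit message.

```python
import re

# Pattern taken from the commit message: one or more digits and nothing else.
NUMERIC_ID = re.compile(r"^\d+$")


def is_valid_identifier(identifier: str) -> bool:
    """Hypothetical helper: True if `identifier` consists of digits only."""
    return NUMERIC_ID.search(identifier) is not None
```

Identifiers that fail the pattern can be skipped without sending a request to the site at all.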

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/6de16097-6bc1-424a-beb1-1d2ec6b99944

* fix: update Love.Mail.ru URL to /profile/ path, enable check with verified ID

Use maintainer-provided working link https://love.mail.ru/profile/1838153357.
- Change URL pattern from /ru/{username} to /profile/{username}
- Set usernameClaimed to 1838153357
- Remove disabled flag to enable the check

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/ac07d38e-46e2-42d3-9e93-eda3e5cfbcc3

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-22 22:42:59 +01:00
Copilot 4f397fed1c Re-enable taplink.cc with browser User-Agent to bypass Cloudflare (#2308)
* Initial plan

* fix(taplink): re-enable taplink.cc with browser User-Agent header to bypass Cloudflare

Remove disabled flag and add a Chrome User-Agent header to help
bypass Cloudflare bot detection for taplink.cc profile checks.
If Cloudflare still blocks requests, maigret's built-in error
detection will gracefully mark results as UNKNOWN.

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/271904b6-e358-4aeb-b503-21c9b91186d9

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-22 22:10:44 +01:00
Soxoj 959b2be136 feat(sites): fix false positives: disable 74 broken sites, fix 8 with API probes and better markers (#2302)
- Disable 74 sites: Cloudflare/captcha blocks, identical responses,
  dead domains, vBulletin/phpBB engine failures
- Fix Roblox, Salon24.pl, Planetaexcel → status_code (clear 404 signal)
- Fix en.brickimedia.org → message with "noarticletext" absenceStr
- Fix Arduino → narrower title-based presenseStrs/absenceStrs
- Re-enable Fandom (3 wikis) via MediaWiki api.php urlProbe
- Re-enable Substack via /api/v1/user/{}/public_profile urlProbe
- Re-enable hashnode via GraphQL GET urlProbe (URL-encoded query)
- Document lessons: engine template drift, search-by-author fragility,
  always-200 sites, TLS degradation, API bypassing Cloudflare,
  GraphQL GET support, URL-encoding for template safety
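
The "URL-encoding for template safety" point can be illustrated with a short sketch. A GraphQL query contains literal braces, which would clash with a brace-based username placeholder, so the static parts are percent-encoded first. The endpoint, the query shape, and the function name below are all hypothetical and are not the hashnode configuration from data.json.

```python
from urllib.parse import quote

# Static query fragments are percent-encoded so that the only braces left in
# the template belong to the {username} placeholder.
_PREFIX = quote('{user(username:"', safe="")
_SUFFIX = quote('"){name}}', safe="")

# Hypothetical endpoint, used only for illustration.
PROBE_TEMPLATE = "https://example.invalid/graphql?query=" + _PREFIX + "{username}" + _SUFFIX


def build_probe_url(username: str) -> str:
    """Fill the placeholder with a percent-encoded username."""
    return PROBE_TEMPLATE.format(username=quote(username, safe=""))
```

Decoding the query parameter of `build_probe_url("alice")` gives back the original query text with `alice` in place of the placeholder.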
2026-03-22 20:47:51 +01:00
Soxoj 97cc4b46d9 Improve site-check quality: fix broken site configs, add diagnostic utilities, and make self-check report-only by default with opt-in auto-disable. (#2301)
- Fix VK and TradingView checkType; add Reddit and Microsoft Learn API-style probes where appropriate; adjust or disable entries that are unreliable under anti-bot protection.
- Self-check: stop aggressive auto-disable; default to reporting issues only; add --auto-disable and --diagnose for optional fixes and deeper output.
- Tooling: add utils/site_check.py and utils/check_top_n.py (and related helpers) to inspect and rank site behavior against the top-N list
- Scope: aligns with fixing top-traffic / high-impact sites and making diagnostics repeatable without silently flipping disabled flags
2026-03-22 16:48:35 +01:00
Soxoj 227a25bfa1 Twitter fixed, mirrors mechanism improvement (#2299) 2026-03-22 01:14:17 +01:00
Soxoj f99091f5f7 Fixed false positives in top-500 (#2292) 2026-03-21 23:35:59 +01:00
Tang Vu 4cd1fccaa3 ♻️ Refactor: Hardcoded relative path for database file (#2285)
* refactor: hardcoded relative path for database file

`app.config['MAIGRET_DB_FILE']` is set to a hardcoded relative path `os.path.join('maigret', 'resources', 'data.json')`. If the Flask application is executed from a different working directory (other than the repository root), it will fail to find the database file and crash.

Affected files: app.py, settings.py
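
One common way to avoid this kind of working-directory dependence is to build the path from the location of the module file rather than from the current directory. The sketch below assumes the `maigret/resources/data.json` layout sits next to the module; the helper name `db_file_path` is hypothetical and the actual change in app.py and settings.py may differ.

```python
import os


def db_file_path(module_file: str) -> str:
    """Hypothetical helper: absolute path to data.json, anchored at the module's directory."""
    # abspath() pins the path to the module's directory instead of leaving it
    # dependent on whatever directory the process was started from.
    base_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(base_dir, "maigret", "resources", "data.json")


# Intended use inside the Flask app module (assumption):
# app.config['MAIGRET_DB_FILE'] = db_file_path(__file__)
```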

2026-03-21 18:06:36 +01:00
Soxoj 48ca13dc4d Make web interface accessible for Docker deployment by default (#2189) 2025-08-31 16:14:42 +02:00
Soxoj fb26ccd1f6 Disabled some sites giving false positive results (#2170) 2025-08-22 03:10:47 +02:00
Soxoj bebadb0362 Bump to 0.5.0 (#2108) 2025-08-10 13:10:50 +02:00
MR-VL d90d8a8ac9 Disable AskFM (#2037) 2025-07-13 16:16:49 +02:00
Darlyson Rangel c9e38632ca Disable ICQ site (#1993) 2025-06-28 23:46:09 +02:00
Pierre-Yves Lapersonne f76ea5d738 [#2010] Add 6 more websites to manage (#2009)
* feat: add `framapiaf.org` in supported web sites, add tag `mastodon` (#2010)

Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info>

* feat: add `write.as` in supported web sites, add tag `writefreely` (#2010)

Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info>

* feat: add `programming.dev` in supported web sites, add tag `lemmy` (#2010)

Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info>

* feat: add `mamot.fr` in supported web sites (#2010)

Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info>

* feat: add `pixelfed.social` in supported web sites, add tag `pixelfed` (#2010)

Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info>

* feat: add `Outgress` in supported web sites (#2010)

Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info>

* Updated the list of supported sites

---------

Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info>
Co-authored-by: Soxoj <soxoj@protonmail.com>
2025-06-28 23:33:29 +02:00
pykereaper b21ac36b27 Fix usage of data.json files from web (#2020) 2025-06-28 23:20:02 +02:00
pykereaper 0f7aa2c456 Pass db_file configuration to web interface (#2019)
* pass db_file configuration to web interface
* Autoformatting

---------

Co-authored-by: Soxoj <soxoj@protonmail.com>
2025-06-28 23:15:56 +02:00
Soxoj 97e5f600d0 Async generator-executor for site checks (#1978) 2024-12-17 22:48:11 +01:00
overcuriousity 36ce285572 make graph more meaningful (#1977)
* make graph more meaningful

If a search is launched with multiple usernames, the graph gets an additional site node for each site where both usernames are found.
Advantages:
- makes it easier to recognize that users are connected to each other
- makes false positives easier to detect when a search is launched with two fake usernames (a shared site node is a definite false positive)

* fix Graph linking report.py
2024-12-17 16:51:19 +01:00
overcuriousity c2e3e96cb7 Improving the web interface (#1975)
* update web interface with commandline options
* improve web interface
* update README images of web interface
* fix bug in app.py
* fix web interface
2024-12-17 16:50:49 +01:00
Soxoj c3dfe9cb4d Small docs and parameters fixes for web interface mode (#1973) 2024-12-16 17:18:22 +01:00
overcuriousity 88d68490f3 Created web frontend launched via --web flag (#1967)
Author: overcuriousity 
Co-authored-by: Soxoj <soxoj@protonmail.com>
2024-12-16 14:24:03 +01:00
Soxoj cb01535565 Preparation of 0.5.0 alpha version (#1966) 2024-12-13 12:51:31 +01:00
Soxoj c4af0a4df0 Fixed flaky tests to check cookies (#1965) 2024-12-13 12:37:58 +01:00
Soxoj b2283a5b04 Merge pull request #1961 from overcuriousity/main
fix bad linux filename generation
2024-12-12 22:07:21 +01:00
Soxoj f212bc9bc8 Site check fixes 2024-12-12 21:39:35 +01:00
overcuriousity b8c62f95ae fix bad linux filename generation
Currently Maigret parses URLs as usernames related to Gravatar. On a Linux host this produces bad output filenames: the slashes make it try to write into subfolders, and the script aborts with the error "file does not exist".
Applied a simple fix that replaces every "/" with "_" when generating output file names.
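
The replacement described above amounts to something like the following sketch. The function name `safe_filename` is an assumption for this example and is not necessarily what the patch uses.

```python
def safe_filename(username: str) -> str:
    """Hypothetical helper: make a username usable as a single file name component."""
    # "/" is the path separator on Linux, so a URL-like username would
    # otherwise be interpreted as nested directories.
    return username.replace("/", "_")
```

A URL-like username such as `gravatar.com/alice` then maps to one flat file name instead of a path with a subfolder.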
2024-12-12 15:00:54 +01:00
Soxoj 2653c617f8 Merge pull request #1958 from soxoj/gravatar-pypi-fix
Fixed Gravatar parsing (socid_extractor)
2024-12-12 02:32:35 +01:00
Soxoj 4dd82bf4c9 Fixed Gravatar parsing (socid_extractor) 2024-12-12 02:30:29 +01:00
Ikko Eltociear Ashimine f8ab484cd2 chore: update submit.py
futher -> further
2024-12-11 23:23:45 +09:00
Soxoj 64ae391a4a Updated Vimeo, CNET, DailyMotion 2024-12-11 01:17:20 +01:00
Soxoj 127d9032c3 Fixed Vimeo, activation/probing mechanisms improvements 2024-12-11 00:56:00 +01:00
Soxoj 81a817a39f Improved "submit new site" mode, added tests, fixed top-500 sites (#1952) 2024-12-10 18:02:43 +01:00