Commit Graph

583 Commits

Author SHA1 Message Date
Copilot eb541dcf51 Disable MirTesen site check (false positive) (#2350)
* Initial plan

* Disable MirTesen site check to fix false-positive probe

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/61c86064-423d-4f1b-8277-2838f747dd89

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-24 09:51:31 +01:00
Copilot 4c97025a32 Disable Librusec site check (false positive) (#2349)
* Initial plan

* Disable Librusec site check to fix false-positive probe

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-03-24 09:51:16 +01:00
Copilot d3f13ac295 Fix false-positive site probe: Re-enable Taplink with message checkType (#2326)
* Initial plan

* Disable Taplink site check to fix false-positive detections

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/ef9281f4-ba67-4760-a6e2-57564ac4ea94

* Re-enable Taplink with message checkType and absenceStrs

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/db3e572e-b79b-4cec-ac7f-062e76144660

* Improve Taplink absenceStrs: add Russian variant and presenseStrs

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/28e24317-e8b9-45f6-bad5-0e549b891313

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-23 21:36:36 +01:00
Copilot 00a9249229 [WIP] Fix invalid link on forums.imore.com (#2337)
* Initial plan

* Remove dead forums.imore.com site from database

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/c83530d0-d24f-45fc-aca3-ae1e46ece33c

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-23 20:20:28 +01:00
Copilot 005863c2e0 Fix Setlist site check: switch to message checkType with proper markers (#2333)
* Initial plan

* Disable Setlist site check due to false positives (soft 404)

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/8c552ca6-51e5-4e79-a791-ddd6f27d2461

* Fix Setlist check: switch to message checkType with proper markers

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/3c387df6-1dfe-451f-96d8-b4b6455f7857

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-23 20:18:33 +01:00
Copilot e3aada6aef Fix RoyalCams site check using BongaCams white-label pattern (#2334)
* Initial plan

* Disable RoyalCams site check to fix false-positive probe

The Telegram Maigret bot auto-probe reported CLAIMED for three random
usernames. The status_code checkType is unreliable as the site returns
200 for non-existent user profiles (soft 404). Disabling the site check
until a reliable detection method can be established.

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/05b3d513-fe15-477d-a455-0c9ddf0b8b51

* Fix RoyalCams: switch to message checkType using BongaCams white-label pattern

RoyalCams runs on the BongaCams platform. Applied the same fix pattern:
- Switch from status_code to message checkType
- Use Portuguese locale (pt.royalcams.com) as urlProbe
- absenceStrs matches generic title on non-existent profiles
- presenseStrs matches Portuguese profile title for existing users
- Add browser-like headers matching BongaCams config
- Remove disabled flag
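
A minimal sketch of why the message checkType helps on a soft-404 site, assuming simple substring matching. The helper `looks_claimed` and the marker strings are hypothetical and are not taken from maigret's code or database:

```python
from typing import Sequence


def looks_claimed(body: str,
                  absence_strs: Sequence[str],
                  presence_strs: Sequence[str] = ()) -> bool:
    """Decide from page content (not HTTP status) whether a profile exists.

    Hypothetical helper for illustration only.
    """
    # An absence marker means the page is the generic "no such user" page.
    if any(marker in body for marker in absence_strs):
        return False
    # If presence markers are configured, at least one must appear.
    if presence_strs:
        return any(marker in body for marker in presence_strs)
    return True


# Both pages would arrive with HTTP 200 on a soft-404 site,
# so a status_code check cannot tell them apart.
missing_page = "<title>Live cams</title>"          # made-up generic title
profile_page = "<title>Perfil de alice</title>"    # made-up profile title

absence = ["<title>Live cams</title>"]
presence = ["Perfil de"]

print(looks_claimed(missing_page, absence, presence))
print(looks_claimed(profile_page, absence, presence))
```

Checking the body instead of the status code is what lets the check distinguish the two pages.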

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/2f6a9523-278a-4992-ba7c-c320de14bfa4

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-23 20:16:45 +01:00
Copilot 9b35fc1ab0 [WIP] Fix false-positive probe for vegalab site (#2336)
* Initial plan

* Disable vegalab site check: domain is dead (DNS does not resolve), causing false positives

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/98430e81-5dcb-4cb3-9aaa-f8c5ce86d026

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-23 20:09:46 +01:00
Copilot 146bc0481b Disable forums.stevehoffman.tv due to false positives (#2331)
* Initial plan

* Disable forums.stevehoffman.tv to fix false-positive site probe

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/39fea4a9-ec6d-4a12-b34b-1a3486d647e4

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-23 20:08:15 +01:00
Copilot 5930a3022e Disable false-positive site probe: amateurvoyeurforum.com (#2332)
* Initial plan

* Disable amateurvoyeurforum.com site check to fix false positives

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/e7fcad2b-4511-4e6d-b186-411951170e0a

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-23 20:07:42 +01:00
Copilot b1a211c3cd Disable forums.developer.nvidia.com (auth-gated user profiles) (#2305)
* Initial plan

* disable forums.developer.nvidia.com due to auth-locked user pages

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/b8f41f15-8588-4aac-a443-af5e2aaa1918

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-22 22:43:51 +01:00
Copilot 56d0c9f2f1 Remove dead site xxxforum.org (#2310)
* Initial plan

* Remove broken site xxxforum.org from data.json and sites.md

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/bfbd3aa8-bfb1-480a-b2e7-a2c40fc69def

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-22 22:43:21 +01:00
Copilot 01049b730d Fix Love.Mail.ru: update to numeric-only identifiers and new profile URL (#2307)
* Initial plan

* fix: update Love.Mail.ru to use numeric-only identifiers (#1264)

- Add regexCheck to enforce numeric-only IDs (^\d+$)
- Update usernameClaimed/usernameUnclaimed to numeric values
- Site remains disabled pending live verification
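
As a rough illustration, the numeric-only rule can be tried with Python's `re` module. Only the pattern `^\d+$` comes from the commit message; the function name is hypothetical:

```python
import re

# Pattern quoted in the commit message: one or more digits and nothing else.
NUMERIC_ID = re.compile(r"^\d+$")


def is_numeric_identifier(username: str) -> bool:
    """Return True when the identifier consists of digits only."""
    return NUMERIC_ID.search(username) is not None


print(is_numeric_identifier("1838153357"))
print(is_numeric_identifier("alice"))
```

Identifiers that fail the pattern can be skipped before any request is sent to the site.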

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/6de16097-6bc1-424a-beb1-1d2ec6b99944

* fix: update Love.Mail.ru URL to /profile/ path, enable check with verified ID

Use maintainer-provided working link https://love.mail.ru/profile/1838153357.
- Change URL pattern from /ru/{username} to /profile/{username}
- Set usernameClaimed to 1838153357
- Remove disabled flag to enable the check

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/ac07d38e-46e2-42d3-9e93-eda3e5cfbcc3

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-22 22:42:59 +01:00
Copilot 4f397fed1c Re-enable taplink.cc with browser User-Agent to bypass Cloudflare (#2308)
* Initial plan

* fix(taplink): re-enable taplink.cc with browser User-Agent header to bypass Cloudflare

Remove disabled flag and add a Chrome User-Agent header to help
bypass Cloudflare bot detection for taplink.cc profile checks.
If Cloudflare still blocks requests, maigret's built-in error
detection will gracefully mark results as UNKNOWN.

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/271904b6-e358-4aeb-b503-21c9b91186d9

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-22 22:10:44 +01:00
Soxoj 959b2be136 feat(sites): fix false positives: disable 74 broken sites, fix 8 with API probes and better markers (#2302)
- Disable 74 sites: Cloudflare/captcha blocks, identical responses,
  dead domains, vBulletin/phpBB engine failures
- Fix Roblox, Salon24.pl, Planetaexcel → status_code (clear 404 signal)
- Fix en.brickimedia.org → message with "noarticletext" absenceStr
- Fix Arduino → narrower title-based presenseStrs/absenceStrs
- Re-enable Fandom (3 wikis) via MediaWiki api.php urlProbe
- Re-enable Substack via /api/v1/user/{}/public_profile urlProbe
- Re-enable hashnode via GraphQL GET urlProbe (URL-encoded query)
- Document lessons: engine template drift, search-by-author fragility,
  always-200 sites, TLS degradation, API bypassing Cloudflare,
  GraphQL GET support, URL-encoding for template safety
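
One way to read "URL-encoding for template safety": a raw GraphQL query contains braces, which would collide with a `{username}` placeholder if the probe URL is filled in with `str.format`. The sketch below assumes that mechanism. The endpoint, query, and sentinel are hypothetical and are not the values used for hashnode:

```python
from urllib.parse import quote

# Hypothetical endpoint and query, for illustration only.
ENDPOINT = "https://api.example.com/graphql"
SENTINEL = "USERNAME_SENTINEL"
QUERY = 'query { user(username: "%s") { id } }' % SENTINEL

# Percent-encode everything, so GraphQL braces become %7B / %7D.
encoded_query = quote(QUERY, safe="")

# Put the only remaining pair of braces back as the username placeholder.
template = f"{ENDPOINT}?query={encoded_query}".replace(SENTINEL, "{username}")

# str.format now sees exactly one placeholder and no stray braces.
url = template.format(username="alice")
print(url)
```

Without the encoding step, `str.format` would raise on the unescaped GraphQL braces.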
2026-03-22 20:47:51 +01:00
Soxoj 97cc4b46d9 Improve site-check quality: fix broken site configs, add diagnostic utilities, and make self-check report-only by default with opt-in auto-disable. (#2301)
- Fix VK and TradingView checkType; add Reddit and Microsoft Learn API-style probes where appropriate; adjust or disable entries that are unreliable under anti-bot protection.
- Self-check: stop aggressive auto-disable; default to reporting issues only; add --auto-disable and --diagnose for optional fixes and deeper output.
- Tooling: add utils/site_check.py and utils/check_top_n.py (and related helpers) to inspect and rank site behavior against the top-N list.
- Scope: aligns with fixing top-traffic / high-impact sites and making diagnostics repeatable without silently flipping disabled flags.
2026-03-22 16:48:35 +01:00
Soxoj 227a25bfa1 Twitter fixed, mirrors mechanism improvement (#2299) 2026-03-22 01:14:17 +01:00
Soxoj f99091f5f7 Fixed false positives in top-500 (#2292) 2026-03-21 23:35:59 +01:00
Tang Vu 4cd1fccaa3 ♻️ Refactor: Hardcoded relative path for database file (#2285)
* refactor: hardcoded relative path for database file

`app.config['MAIGRET_DB_FILE']` is set to a hardcoded relative path `os.path.join('maigret', 'resources', 'data.json')`. If the Flask application is executed from a different working directory (other than the repository root), it will fail to find the database file and crash.

Affected files: app.py, settings.py
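
A common remedy for this class of bug is to anchor the path to the module's own location rather than the current working directory. The sketch below shows that general technique; the helper name and the directory layout are assumptions, not the code from this commit:

```python
import os


def resolve_relative_to(module_file: str, *parts: str) -> str:
    """Build an absolute path relative to the directory of module_file.

    Hypothetical helper: the result does not depend on the process's
    current working directory when module_file is an absolute path.
    """
    base_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(base_dir, *parts)


# In a real module this would typically be called with __file__, e.g.
#   resolve_relative_to(__file__, "maigret", "resources", "data.json")
example = resolve_relative_to("/tmp/proj/app.py",
                              "maigret", "resources", "data.json")
print(example)
```

The value could then be assigned to `app.config['MAIGRET_DB_FILE']` in place of the bare relative path.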

2026-03-21 18:06:36 +01:00
Soxoj 48ca13dc4d Make web interface accessible for Docker deployment by default (#2189) 2025-08-31 16:14:42 +02:00
Soxoj fb26ccd1f6 Disabled some sites giving false positive results (#2170) 2025-08-22 03:10:47 +02:00
Soxoj bebadb0362 Bump to 0.5.0 (#2108) 2025-08-10 13:10:50 +02:00
MR-VL d90d8a8ac9 Disable AskFM (#2037) 2025-07-13 16:16:49 +02:00
Darlyson Rangel c9e38632ca Disable ICQ site (#1993) 2025-06-28 23:46:09 +02:00
Pierre-Yves Lapersonne f76ea5d738 [#2010] Add 6 more websites to manage (#2009)
* feat: add `framapiaf.org` in supported web sites, add tag `mastodon` (#2010)

Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info>

* feat: add `write.as` in supported web sites, add tag `writefreely` (#2010)

Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info>

* feat: add `programming.dev` in supported web sites, add tag `lemmy` (#2010)

Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info>

* feat: add `mamot.fr` in supported web sites (#2010)

Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info>

* feat: add `pixelfed.social` in supported web sites, add tag `pixelfed` (#2010)

Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info>

* feat: add `Outgress` in supported web sites (#2010)

Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info>

* Updated the list of supported sites

---------

Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info>
Co-authored-by: Soxoj <soxoj@protonmail.com>
2025-06-28 23:33:29 +02:00
pykereaper b21ac36b27 Fix usage of data.json files from web (#2020) 2025-06-28 23:20:02 +02:00
pykereaper 0f7aa2c456 Pass db_file configuration to web interface (#2019)
* pass db_file configuration to web interface
* Autoformatting

---------

Co-authored-by: Soxoj <soxoj@protonmail.com>
2025-06-28 23:15:56 +02:00
Soxoj 97e5f600d0 Async generator-executor for site checks (#1978) 2024-12-17 22:48:11 +01:00
overcuriousity 36ce285572 make graph more meaningful (#1977)
* make graph more meaningful

If a search is launched with multiple usernames, the graph now gets an additional site node for each site where both are found.
Advantages:
- easier to recognize that users are connected to each other
- better detection of false positives when launching a search with two fake usernames (site node = definite false positive)
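
The shared site node idea can be sketched without any graph library. The function and data shape below are hypothetical and do not mirror report.py:

```python
from collections import defaultdict
from typing import Dict, Set


def shared_site_nodes(found: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each site to the usernames found there, keeping shared sites only.

    `found` maps a username to the set of site names where it was reported.
    """
    users_by_site: Dict[str, Set[str]] = defaultdict(set)
    for username, site_names in found.items():
        for site in site_names:
            users_by_site[site].add(username)
    # A site becomes a shared node only when two or more usernames hit it.
    return {site: users for site, users in users_by_site.items()
            if len(users) > 1}


results = {
    "alice": {"GitHub", "Reddit"},
    "bob": {"GitHub"},
}
print(shared_site_nodes(results))
```

Run with two made-up usernames, any site that appears in the output is a strong false-positive candidate.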

* fix Graph linking report.py
2024-12-17 16:51:19 +01:00
overcuriousity c2e3e96cb7 Improving the web interface (#1975)
* update web interface with commandline options
* improve web interface
* update README images of web interface
* fix bug in app.py
* fix web interface
2024-12-17 16:50:49 +01:00
Soxoj c3dfe9cb4d Small docs and parameters fixes for web interface mode (#1973) 2024-12-16 17:18:22 +01:00
overcuriousity 88d68490f3 Created web frontend launched via --web flag (#1967)
Author: overcuriousity 
Co-authored-by: Soxoj <soxoj@protonmail.com>
2024-12-16 14:24:03 +01:00
Soxoj cb01535565 Preparation of 0.5.0 alpha version (#1966) 2024-12-13 12:51:31 +01:00
Soxoj c4af0a4df0 Fixed flaky tests to check cookies (#1965) 2024-12-13 12:37:58 +01:00
Soxoj b2283a5b04 Merge pull request #1961 from overcuriousity/main
fix bad linux filename generation
2024-12-12 22:07:21 +01:00
Soxoj f212bc9bc8 Site check fixes 2024-12-12 21:39:35 +01:00
overcuriousity b8c62f95ae fix bad linux filename generation
Currently maigret parses URLs as usernames related to Gravatar. This leads to bad output filenames on my Linux host: the slashes make it try to write into subfolders, so the script aborts with the error "file does not exist".
Applied a simple fix: replace all "/" with "_" when generating output filenames.
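
A minimal sketch of the replacement described above. The function name and the report filename format are hypothetical:

```python
def safe_report_name(username: str) -> str:
    """Replace path separators so the name cannot point into subfolders."""
    return username.replace("/", "_")


# A URL that was picked up as a "username" no longer creates nested paths.
name = safe_report_name("https://gravatar.com/alice")
print(f"report_{name}.txt")
```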
2024-12-12 15:00:54 +01:00
Soxoj 2653c617f8 Merge pull request #1958 from soxoj/gravatar-pypi-fix
Fixed Gravatar parsing (socid_extractor)
2024-12-12 02:32:35 +01:00
Soxoj 4dd82bf4c9 Fixed Gravatar parsing (socid_extractor) 2024-12-12 02:30:29 +01:00
Ikko Eltociear Ashimine f8ab484cd2 chore: update submit.py
futher -> further
2024-12-11 23:23:45 +09:00
Soxoj 64ae391a4a Updated Vimeo, CNET, DailyMotion 2024-12-11 01:17:20 +01:00
Soxoj 127d9032c3 Fixed Vimeo, activation/probing mechanisms improvements 2024-12-11 00:56:00 +01:00
Soxoj 81a817a39f Improved "submit new site" mode, added tests, fixed top-500 sites (#1952) 2024-12-10 18:02:43 +01:00
Soxoj 51ab988e36 Fixed ProductHunt check (#1951) 2024-12-09 17:06:03 +01:00
Soxoj 5517636850 Updated OP.GG checks (#1950)
* Updated OP.GG checks
* Finalized LoL, added Valorant, disabled Archive.org
2024-12-09 15:59:19 +01:00
Soxoj 4eada16b94 Added a test for submitter (#1944) 2024-12-08 13:35:27 +01:00
Soxoj c66d776f8a Refactoring, test coverage increased to 60% (#1943) 2024-12-08 02:13:28 +01:00
Soxoj 4b1317789d Refactored self-check method, code formatting, small lint fixes (#1942) 2024-12-07 18:05:30 +01:00
Soxoj 8b7d8073d9 Fixed Linktr and discourse.mozilla.org (#1941) 2024-12-07 17:11:39 +01:00
Soxoj 2aa1ea39a0 Site fixes (#1940) 2024-12-06 14:27:38 +01:00
Soxoj cd789ed138 Fixed Ebay and BongaCams checks (#1939) 2024-12-06 13:32:51 +01:00