Compare commits

318 Commits

Author SHA1 Message Date
Soxoj 7a8b82f1da Merge branch 'freelabz-fix/lxml-python314' 2026-04-08 00:49:18 +02:00
Copilot f83b73bf89 Fix crash on -a --self-check by adding exception handling to site check coroutines (#2466)
* Initial plan

* Fix crash on -a --self-check by adding exception handling in site_self_check and self_check

Wrap the body of site_self_check in try/except to catch unexpected errors
and always return a valid changes dict. Also add a safety-net try/except
in self_check around awaiting individual site check futures so that a
single site failure doesn't crash the entire self-check process.

Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/5e27d620-5cbb-43d2-a9f9-ecb53a29904d

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>

* Restore @pytest.mark.slow on test_maigret_results

Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/5e27d620-5cbb-43d2-a9f9-ecb53a29904d

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>

* Document --self-check error resilience, --auto-disable, and --diagnose in docs/

Update command-line-options.rst with expanded --self-check description
and new --auto-disable and --diagnose entries. Add a "Database self-check"
section to features.rst explaining error-resilient behaviour and usage
examples. Update usage-examples.rst to reference --auto-disable.

Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/af1f0f09-9112-4902-8475-e81d235ff3ed

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-04-08 00:48:37 +02:00
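
The error-resilience pattern described in the commit above (#2466) can be sketched roughly as follows. This is a minimal illustration, not the project's code: `check_site`, the shape of the `changes` dict, and the site names are hypothetical stand-ins; only the function names `site_self_check` and `self_check` come from the commit message.

```python
import asyncio


async def check_site(name: str) -> dict:
    """Hypothetical per-site check; fails for one site to simulate a crash."""
    if name == "broken":
        raise RuntimeError("unexpected response")
    return {"disabled": False}


async def site_self_check(name: str) -> dict:
    """Always return a valid changes dict, even if the check itself raises."""
    changes = {"disabled": False}
    try:
        changes.update(await check_site(name))
    except Exception as exc:  # catch unexpected errors from the site check
        changes["error"] = str(exc)
    return changes


async def self_check(names: list) -> dict:
    """Await each site check separately so one failure cannot stop the run."""
    tasks = {name: asyncio.create_task(site_self_check(name)) for name in names}
    results = {}
    for name, task in tasks.items():
        try:
            results[name] = await task
        except Exception as exc:  # safety net around the awaited task
            results[name] = {"disabled": False, "error": str(exc)}
    return results


results = asyncio.run(self_check(["ok", "broken"]))
```

The two layers are deliberately redundant: the inner `try/except` keeps the return value well-formed, while the outer one guards against anything that still escapes an individual task.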
Soxoj 6083ea2664 Fix Spotify, add Spotify Community forum (#2467) 2026-04-08 00:48:37 +02:00
Copilot 7307e5328b Add installation troubleshooting for missing system dependencies (#2465)
* Initial plan

* Add installation troubleshooting section for system dependency errors

Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/6c3a5612-bdd5-4611-ba77-aea7ab52e304

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>

* Simplify README troubleshooting to a link to the full docs

Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/6c557093-0643-4980-93ad-973e2d3141ef

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-04-08 00:48:37 +02:00
Soxoj 86fb3cb414 Sites fixes (#2464) 2026-04-08 00:48:37 +02:00
Soxoj be01cb5d1b Add Markdown reports for LLM analysis (#2463) 2026-04-08 00:48:37 +02:00
dependabot[bot] 2f5df91a82 build(deps): bump curl-cffi from 0.14.0 to 0.15.0 (#2462)
Bumps [curl-cffi](https://github.com/lexiforest/curl_cffi) from 0.14.0 to 0.15.0.
- [Release notes](https://github.com/lexiforest/curl_cffi/releases)
- [Changelog](https://github.com/lexiforest/curl_cffi/blob/main/docs/changelog.rst)
- [Commits](https://github.com/lexiforest/curl_cffi/compare/v0.14.0...v0.15.0)

---
updated-dependencies:
- dependency-name: curl-cffi
  dependency-version: 0.15.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:37 +02:00
Soxoj 5578b6881a False positive fixes (#2460)
* Fix false positives: APClips, Taplink, gentoo, Discord.bio, ChaturBate; disable 7Cups, playtime, openriskmanual, reactos; update tags

* Fix db_meta.json regeneration in update_site_data.py (inline instead of module import)

* Fix false positives: disable Bit.ly, Firearmstalk, Needrom, Travelblog; fix gentoo, Discord.bio, brickimedia via API; remove dead sites dreamhost, typepad
2026-04-08 00:48:37 +02:00
Soxoj f9f9ec8ada Fix false positives (#2459)
* Fix false positives: APClips, Taplink, gentoo, Discord.bio, ChaturBate; disable 7Cups, playtime, openriskmanual, reactos; update tags

* Fix db_meta.json regeneration in update_site_data.py (inline instead of module import)
2026-04-08 00:48:37 +02:00
Soxoj f3093fd5af DB update mechanism (#2458)
* Database update mechanism
2026-04-08 00:48:37 +02:00
Soxoj 66b741793e Added Crypto/Web3 site checks (#2457) 2026-04-08 00:48:37 +02:00
Soxoj 59b1570f1f Update of MIT License (#2455) 2026-04-08 00:48:37 +02:00
Julio César Suástegui 4f070f5e6c fix(data): update InterPals absence string to match current site response (#2442)
The previous absence string 'The requested user does not exist or is inactive'
no longer matches the live site response. InterPals now returns 'User not found'
for non-existent profiles, causing false positives for all username searches.

Tested against interpals.net/noneownsthisusername (non-existent) and
interpals.net/blue (claimed) to confirm detection accuracy.

Closes #2433

Co-authored-by: Julio César Suástegui <juliosuas@users.noreply.github.com>
2026-04-08 00:48:37 +02:00
Soxoj 2329bd62fd Multiple lint and types fixes (#2454) 2026-04-08 00:48:37 +02:00
Soxoj 99847ad3e7 Add site protection tracking system, fix broken site checks (Instagra… (#2452)
* Add site protection tracking system, fix broken site checks (Instagram, StackOverflow, LeetCode, Boosty, LiveLib), preserve unicode in data.json

* Update poetry.lock by running poetry lock

Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/14333f41-67d5-4e28-a782-9730b31fc667

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-04-08 00:48:37 +02:00
dependabot[bot] c8a183683a build(deps): bump aiohttp from 3.13.4 to 3.13.5 (#2448)
---
updated-dependencies:
- dependency-name: aiohttp
  dependency-version: 3.13.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:37 +02:00
dependabot[bot] 77288cd4ca build(deps-dev): bump mypy from 1.19.1 to 1.20.0 (#2447)
Bumps [mypy](https://github.com/python/mypy) from 1.19.1 to 1.20.0.
- [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/python/mypy/compare/v1.19.1...v1.20.0)

---
updated-dependencies:
- dependency-name: mypy
  dependency-version: 1.20.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:37 +02:00
dependabot[bot] 870bb1761e build(deps): bump requests from 2.33.0 to 2.33.1 (#2444)
Bumps [requests](https://github.com/psf/requests) from 2.33.0 to 2.33.1.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](https://github.com/psf/requests/compare/v2.33.0...v2.33.1)

---
updated-dependencies:
- dependency-name: requests
  dependency-version: 2.33.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:37 +02:00
dependabot[bot] 02de3b649a build(deps): bump pygments from 2.18.0 to 2.20.0 (#2440)
Bumps [pygments](https://github.com/pygments/pygments) from 2.18.0 to 2.20.0.
- [Release notes](https://github.com/pygments/pygments/releases)
- [Changelog](https://github.com/pygments/pygments/blob/master/CHANGES)
- [Commits](https://github.com/pygments/pygments/compare/2.18.0...2.20.0)

---
updated-dependencies:
- dependency-name: pygments
  dependency-version: 2.20.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:37 +02:00
dependabot[bot] cb0288ceeb build(deps): bump aiohttp from 3.13.3 to 3.13.4 (#2435)
---
updated-dependencies:
- dependency-name: aiohttp
  dependency-version: 3.13.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:37 +02:00
dependabot[bot] 07d1839148 build(deps): bump platformdirs from 4.5.0 to 4.9.4 (#2434)
Bumps [platformdirs](https://github.com/tox-dev/platformdirs) from 4.5.0 to 4.9.4.
- [Release notes](https://github.com/tox-dev/platformdirs/releases)
- [Changelog](https://github.com/tox-dev/platformdirs/blob/main/docs/changelog.rst)
- [Commits](https://github.com/tox-dev/platformdirs/compare/4.5.0...4.9.4)

---
updated-dependencies:
- dependency-name: platformdirs
  dependency-version: 4.9.4
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:37 +02:00
dependabot[bot] 687ef4d249 build(deps): bump chardet from 5.2.0 to 7.4.0.post2 (#2436)
Bumps [chardet](https://github.com/chardet/chardet) from 5.2.0 to 7.4.0.post2.
- [Release notes](https://github.com/chardet/chardet/releases)
- [Changelog](https://github.com/chardet/chardet/blob/main/docs/changelog.rst)
- [Commits](https://github.com/chardet/chardet/compare/5.2.0...7.4.0.post2)

---
updated-dependencies:
- dependency-name: chardet
  dependency-version: 7.4.0.post2
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:37 +02:00
dependabot[bot] 661cfb8a69 build(deps): bump multidict from 6.7.0 to 6.7.1 (#2396)
Bumps [multidict](https://github.com/aio-libs/multidict) from 6.7.0 to 6.7.1.
- [Release notes](https://github.com/aio-libs/multidict/releases)
- [Changelog](https://github.com/aio-libs/multidict/blob/master/CHANGES.rst)
- [Commits](https://github.com/aio-libs/multidict/compare/v6.7.0...v6.7.1)

---
updated-dependencies:
- dependency-name: multidict
  dependency-version: 6.7.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:37 +02:00
Soxoj f41f9439fc Overhaul site tags and naming: add social tag to 33 networks, fill mi… (#2430)
* Overhaul site tags and naming: add social tag to 33 networks, fill missing tags for 213 top-1000 sites, clean up false us/in country tags (~374 sites), normalize site names to Title Case, add tag validation tests, document tagging and naming rules
Remove LLM folder: ask @soxoj for the up-to-date version!

* Remove LLM/ from version control

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 00:48:37 +02:00
Soxoj c2f912b164 Tags and site names improvements (#2427)
- Added social tag to social networks (33 sites)
- Fixed wrong tags (8 sites)
- Filled empty tags for 213 sites in top-1000
- Country tag cleanup (~374 sites)
- Site naming normalization (75 sites)
- New tests (3)
- Documentation updates
2026-04-08 00:48:37 +02:00
dependabot[bot] b653d1dda8 build(deps): bump cryptography from 46.0.5 to 46.0.6 (#2422)
Bumps [cryptography](https://github.com/pyca/cryptography) from 46.0.5 to 46.0.6.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/46.0.5...46.0.6)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-version: 46.0.6
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:37 +02:00
Soxoj d27acbed86 Add urlProbes (#2425) 2026-04-08 00:48:37 +02:00
Soxoj 42de0a526d Sites re-check (#2423) 2026-04-08 00:48:37 +02:00
dependabot[bot] f150966056 build(deps): bump soupsieve from 2.8 to 2.8.3 (#2404)
Bumps [soupsieve](https://github.com/facelessuser/soupsieve) from 2.8 to 2.8.3.
- [Release notes](https://github.com/facelessuser/soupsieve/releases)
- [Commits](https://github.com/facelessuser/soupsieve/compare/2.8...2.8.3)

---
updated-dependencies:
- dependency-name: soupsieve
  dependency-version: 2.8.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:37 +02:00
dependabot[bot] 8e8d410ee8 build(deps-dev): bump pytest from 9.0.1 to 9.0.2 (#2381)
Bumps [pytest](https://github.com/pytest-dev/pytest) from 9.0.1 to 9.0.2.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/9.0.1...9.0.2)

---
updated-dependencies:
- dependency-name: pytest
  dependency-version: 9.0.2
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
dependabot[bot] aefc236af3 build(deps): bump psutil from 7.1.3 to 7.2.2 (#2406)
Bumps [psutil](https://github.com/giampaolo/psutil) from 7.1.3 to 7.2.2.
- [Changelog](https://github.com/giampaolo/psutil/blob/master/docs/changelog.rst)
- [Commits](https://github.com/giampaolo/psutil/compare/v7.1.3...v7.2.2)

---
updated-dependencies:
- dependency-name: psutil
  dependency-version: 7.2.2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
dependabot[bot] adcbefb59a build(deps): bump pyinstaller from 6.16.0 to 6.19.0 (#2405)
Bumps [pyinstaller](https://github.com/pyinstaller/pyinstaller) from 6.16.0 to 6.19.0.
- [Release notes](https://github.com/pyinstaller/pyinstaller/releases)
- [Changelog](https://github.com/pyinstaller/pyinstaller/blob/develop/doc/CHANGES.rst)
- [Commits](https://github.com/pyinstaller/pyinstaller/compare/v6.16.0...v6.19.0)

---
updated-dependencies:
- dependency-name: pyinstaller
  dependency-version: 6.19.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
Soxoj ac5b4df258 Readme update: commercial use (#2403) 2026-04-08 00:48:36 +02:00
dependabot[bot] 299e88a4a4 build(deps): bump requests from 2.32.5 to 2.33.0 (#2394)
Bumps [requests](https://github.com/psf/requests) from 2.32.5 to 2.33.0.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](https://github.com/psf/requests/compare/v2.32.5...v2.33.0)

---
updated-dependencies:
- dependency-name: requests
  dependency-version: 2.33.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
github-actions[bot] 20b74383ee Updated site list and statistics (#2399)
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
dependabot[bot] 48559a2fe8 build(deps-dev): bump pytest-httpserver from 1.1.0 to 1.1.5 (#2397)
Bumps [pytest-httpserver](https://github.com/csernazs/pytest-httpserver) from 1.1.0 to 1.1.5.
- [Release notes](https://github.com/csernazs/pytest-httpserver/releases)
- [Changelog](https://github.com/csernazs/pytest-httpserver/blob/master/CHANGES.rst)
- [Commits](https://github.com/csernazs/pytest-httpserver/compare/1.1.0...1.1.5)

---
updated-dependencies:
- dependency-name: pytest-httpserver
  dependency-version: 1.1.5
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
dependabot[bot] 19abcc8586 build(deps): bump pypdf from 6.9.1 to 6.9.2 (#2392)
Bumps [pypdf](https://github.com/py-pdf/pypdf) from 6.9.1 to 6.9.2.
- [Release notes](https://github.com/py-pdf/pypdf/releases)
- [Changelog](https://github.com/py-pdf/pypdf/blob/main/CHANGELOG.md)
- [Commits](https://github.com/py-pdf/pypdf/compare/6.9.1...6.9.2)

---
updated-dependencies:
- dependency-name: pypdf
  dependency-version: 6.9.2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
dependabot[bot] bc3a069624 build(deps): bump yarl from 1.22.0 to 1.23.0 (#2383)
---
updated-dependencies:
- dependency-name: yarl
  dependency-version: 1.23.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
dependabot[bot] 337b9b5270 build(deps): bump asgiref from 3.11.0 to 3.11.1 (#2384)
Bumps [asgiref](https://github.com/django/asgiref) from 3.11.0 to 3.11.1.
- [Changelog](https://github.com/django/asgiref/blob/main/CHANGELOG.txt)
- [Commits](https://github.com/django/asgiref/compare/3.11.0...3.11.1)

---
updated-dependencies:
- dependency-name: asgiref
  dependency-version: 3.11.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
Soxoj 0c6388aff5 Added Max.ru check; --no-progressbar flag fixed (#2386) 2026-04-08 00:48:36 +02:00
dependabot[bot] f860ce788a build(deps): bump pycountry from 24.6.1 to 26.2.16 (#2382)
Bumps [pycountry](https://github.com/pycountry/pycountry) from 24.6.1 to 26.2.16.
- [Release notes](https://github.com/pycountry/pycountry/releases)
- [Changelog](https://github.com/pycountry/pycountry/blob/main/HISTORY.txt)
- [Commits](https://github.com/pycountry/pycountry/compare/24.6.1...26.2.16)

---
updated-dependencies:
- dependency-name: pycountry
  dependency-version: 26.2.16
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
Soxoj 55fc8694ed Fix false-positive site checks reported by Maigret Bot (#2376) 2026-04-08 00:48:36 +02:00
Copilot 31b759c08b Fix update-site-data workflow race condition on branch push (#2366)
* Initial plan

* Fix update-site-data workflow race condition on branch push

- Add concurrency control to cancel in-progress runs on new pushes to main
- Delete existing PR branch before creating new one to avoid stale ref conflicts
- Upgrade peter-evans/create-pull-request from v5 to v7 (Node.js 20 deprecation)

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/a095d3d3-0093-43e8-9cc5-82797bd52453

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
Soxoj 06c3360e3d feat(core): add POST request support, new sites, migrate to Majestic Million ranking (#2317)
* feat(core): add POST request support, new sites, migrate to Majestic Million ranking
- Added native POST request support to the Maigret engine (requestMethod, requestPayload) to enable querying modern JSON registration endpoints.
- Replaced the discontinued Alexa rank API with the Majestic Million dataset for global popularity sorting and automated CI updates.
- Fixed multiple false positives among top 500 sites and bypassed standard anti-bot protections using custom User-Agents.
- Updated public documentation and internal playbooks to reflect the new features.

* feat(data): apply all data.json site check updates from main branch

- Added CTFtime and PentesterLab (new sites added in main)
- Removed forums.imore.com (deleted in main as dead site)
- Disabled 5 sites per main branch fixes: Librusec, MirTesen, amateurvoyeurforum.com, forums.stevehoffman.tv, vegalab
- Fixed 5 site checks per main branch: SoundCloud, Taplink, Setlist, RoyalCams, club.cnews.ru (switched from status_code to message checkType with proper markers)

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/a1d194d9-c0ff-4e2b-974c-c5e4b59548bf

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
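
As a rough sketch of how a site entry with `requestMethod` and `requestPayload` (the two field names mentioned in commit #2317 above) might be turned into an HTTP request: the entry layout, the `{username}` placeholder syntax, the JSON encoding, and the example URL are all assumptions made for illustration, and the request is only built, never sent.

```python
import json
import urllib.request


def build_request(site: dict, username: str) -> urllib.request.Request:
    """Build (but do not send) a request for a hypothetical site entry."""
    method = site.get("requestMethod", "GET")
    url = site["url"].format(username=username)
    data = None
    headers = {}
    if method == "POST":
        # Substitute the username into every string value of the payload.
        payload = {
            key: value.format(username=username) if isinstance(value, str) else value
            for key, value in site.get("requestPayload", {}).items()
        }
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


# Hypothetical entry for a JSON registration endpoint.
example_site = {
    "url": "https://example.com/api/check",
    "requestMethod": "POST",
    "requestPayload": {"username": "{username}"},
}
request = build_request(example_site, "alice")
```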
Copilot e95d061460 Fix domain substring matching and NoneType crash in submit dialog (#2367)
* Initial plan

* Fix domain matching and NoneType error in submit.py

- Use regex with domain boundary matching instead of substring matching
  to prevent x.com from matching 500px.com, mix.com, etc.
- Handle None old_site gracefully when user enters a site name not in
  the matched list, fixing AttributeError crash.
- Add tests for both fixes.

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/7eabc755-47fd-4b80-a38c-9d6c056c2ce9

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
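
The difference between substring matching and domain-boundary matching described in commit #2367 above can be illustrated with a short sketch. The function name and the exact regular expression are assumptions for illustration and are not taken from `submit.py`.

```python
import re


def domain_matches(query: str, url: str) -> bool:
    """Return True when `query` appears in `url` at a domain boundary."""
    # The domain must be preceded by the start of the string, "//" or a dot,
    # and followed by the end of the string or one of "/", ":", "?", "#".
    pattern = r"(?:^|//|\.)" + re.escape(query) + r"(?:$|[/:?#])"
    return re.search(pattern, url) is not None


# Naive substring matching produces the false match named in the commit.
naive_hit = "x.com" in "https://500px.com/user"
boundary_hit = domain_matches("x.com", "https://500px.com/user")
```

With this sketch, `naive_hit` is true while `boundary_hit` is false, and a genuine URL such as `https://x.com/user` still matches.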
Copilot 3097b37eb3 feat: add tag blacklisting via --exclude-tags (#2352)
* Initial plan

* feat: add tag blacklisting support (--exclude-tags CLI flag, web UI, docs, tests)

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/1a656af2-36bf-494f-9f03-1b5340f0357c

* fix: correct tag cloud label to match click-cycle interaction

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/1a656af2-36bf-494f-9f03-1b5340f0357c

* feat: add all country tags to web interface tag cloud

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/7e184b24-ff26-48fd-8a93-aea12b0a8d7b

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
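
Tag blacklisting as introduced by `--exclude-tags` in commit #2352 above amounts to dropping every site that carries at least one excluded tag. A minimal sketch, assuming a hypothetical in-memory mapping of site names to entries with a `tags` list (the site names and tags here are invented examples):

```python
def filter_sites(sites: dict, exclude_tags: set) -> dict:
    """Keep only the sites that share no tag with `exclude_tags`."""
    return {
        name: info
        for name, info in sites.items()
        if not exclude_tags & set(info.get("tags", []))
    }


# Hypothetical site database.
sites = {
    "ExampleForum": {"tags": ["forum", "us"]},
    "ExamplePhoto": {"tags": ["photo"]},
    "ExampleUntagged": {},
}
remaining = filter_sites(sites, {"forum"})
```

Sites without any tags are kept, since there is nothing to match against the blacklist.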
dependabot[bot] 7d1911288f build(deps): bump certifi from 2025.11.12 to 2026.2.25 (#2346)
Bumps [certifi](https://github.com/certifi/python-certifi) from 2025.11.12 to 2026.2.25.
- [Commits](https://github.com/certifi/python-certifi/compare/2025.11.12...2026.02.25)

---
updated-dependencies:
- dependency-name: certifi
  dependency-version: 2026.2.25
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
Copilot c9272983fc Fix SoundCloud false-positive: switch to message-based check (#2355)
* Initial plan

* Fix SoundCloud false-positive: switch from status_code to message checkType

SoundCloud returns HTTP 200 for non-existent user profiles (soft 404),
causing status_code check to report CLAIMED for random usernames.

Switch to message checkType with:
- presenseStrs: hydratable user marker in server-rendered HTML
- absenceStrs: generic page title for non-existent users

Markers sourced from WhatsMyName project's verified SoundCloud entry.

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/8aa10eef-78bf-4251-bf42-473cd94c7ef4

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
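
To show why a message-based check avoids the soft-404 problem described in commit #2355 above, here is a minimal sketch. The function, the marker strings, and the `AVAILABLE`/`UNKNOWN` labels are hypothetical; only the `presenseStrs`/`absenceStrs` concepts and the `CLAIMED` status come from the commit message, and the way Maigret actually combines the two lists may differ.

```python
def message_check(html: str, presense_strs: list, absence_strs: list) -> str:
    """Classify a profile page by its content instead of its status code."""
    # An absence marker wins: the page says the user does not exist.
    if any(marker in html for marker in absence_strs):
        return "AVAILABLE"
    # Otherwise require a positive marker before reporting the name as taken.
    if any(marker in html for marker in presense_strs):
        return "CLAIMED"
    return "UNKNOWN"


# Both hypothetical pages would be served with HTTP 200.
existing_page = '<script>{"hydratable":"user"}</script>'
missing_page = "<title>Generic landing page</title>"

presense = ['"hydratable":"user"']
absence = ["<title>Generic landing page</title>"]

status_existing = message_check(existing_page, presense, absence)
status_missing = message_check(missing_page, presense, absence)
```

A status-code check would report both pages as claimed, whereas the content-based check separates them.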
Copilot 797daf0aa1 Fix club.cnews.ru false positive: switch from status_code to message checkType (#2342)
* Initial plan

* Fix club.cnews.ru false positive: switch from status_code to message checkType with absence strings

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/af131d2f-c7b5-4798-8ad1-86bab2673fe4

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
Julio César Suástegui abfb579755 feat: add CTFtime and PentesterLab site support (#2318)
Add two cybersecurity platforms for username enumeration:
- CTFtime (ctftime.org) - CTF competition platform
- PentesterLab (pentesterlab.com) - Security training platform

Both verified working with status_code check type.
Returns 200 for existing users, 404 for non-existent.

Co-authored-by: Julio César Suástegui <juliosuas@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
github-actions[bot] ff13d5fafb Automated Sites List Update (#2341)
* Updated site list and statistics

* Rebase and regenerate sites.md against latest main (#2351)

* Updated site list and statistics

* Initial plan

* Disable MirTesen site check (false positive) (#2350)

* Initial plan

* Disable MirTesen site check to fix false-positive probe

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/61c86064-423d-4f1b-8277-2838f747dd89

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>

* build(deps): bump attrs from 25.4.0 to 26.1.0 (#2344)

Bumps [attrs](https://github.com/sponsors/hynek) from 25.4.0 to 26.1.0.
- [Commits](https://github.com/sponsors/hynek/commits)

---
updated-dependencies:
- dependency-name: attrs
  dependency-version: 26.1.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Updated site list and statistics

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: soxoj <soxoj@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: soxoj <soxoj@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
dependabot[bot] 2c30d19c9f build(deps): bump attrs from 25.4.0 to 26.1.0 (#2344)
Bumps [attrs](https://github.com/sponsors/hynek) from 25.4.0 to 26.1.0.
- [Commits](https://github.com/sponsors/hynek/commits)

---
updated-dependencies:
- dependency-name: attrs
  dependency-version: 26.1.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
Copilot 089cb48773 Disable MirTesen site check (false positive) (#2350)
* Initial plan

* Disable MirTesen site check to fix false-positive probe

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/61c86064-423d-4f1b-8277-2838f747dd89

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
Copilot 2bb473d919 Disable Librusec site check (false positive) (#2349)
* Initial plan

* Disable Librusec site check to fix false-positive probe

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
dependabot[bot] 1a8a76ca21 build(deps-dev): bump mypy from 1.19.0 to 1.19.1 (#2347)
Bumps [mypy](https://github.com/python/mypy) from 1.19.0 to 1.19.1.
- [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/python/mypy/compare/v1.19.0...v1.19.1)

---
updated-dependencies:
- dependency-name: mypy
  dependency-version: 1.19.1
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
dependabot[bot] ddb5f26813 build(deps): bump aiodns from 3.5.0 to 4.0.0 (#2345)
Bumps [aiodns](https://github.com/saghul/aiodns) from 3.5.0 to 4.0.0.
- [Release notes](https://github.com/saghul/aiodns/releases)
- [Changelog](https://github.com/aio-libs/aiodns/blob/master/ChangeLog)
- [Commits](https://github.com/saghul/aiodns/compare/v3.5.0...v4.0.0)

---
updated-dependencies:
- dependency-name: aiodns
  dependency-version: 4.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
Copilot 6b5955d676 Fix false-positive site probe: Re-enable Taplink with message checkType (#2326)
* Initial plan

* Disable Taplink site check to fix false-positive detections

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/ef9281f4-ba67-4760-a6e2-57564ac4ea94

* Re-enable Taplink with message checkType and absenceStrs

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/db3e572e-b79b-4cec-ac7f-062e76144660

* Improve Taplink absenceStrs: add Russian variant and presenseStrs

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/28e24317-e8b9-45f6-bad5-0e549b891313

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
github-actions[bot] ebbff47829 Automated Sites List Update (#2339)
* Updated site list and statistics

* Rebase: merge origin/main into auto/update-sites-list (#2340)

* Updated site list and statistics (#2315)

Co-authored-by: soxoj <soxoj@users.noreply.github.com>

* Initial plan

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: soxoj <soxoj@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>

---------

Co-authored-by: soxoj <soxoj@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
github-actions[bot] 9d6319aebd Updated site list and statistics (#2315)
Co-authored-by: soxoj <soxoj@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
Copilot 05015a9cce [WIP] Fix invalid link on forums.imore.com (#2337)
* Initial plan

* Remove dead forums.imore.com site from database

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/c83530d0-d24f-45fc-aca3-ae1e46ece33c

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
Copilot 444eba65cd Fix Setlist site check: switch to message checkType with proper markers (#2333)
* Initial plan

* Disable Setlist site check due to false positives (soft 404)

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/8c552ca6-51e5-4e79-a791-ddd6f27d2461

* Fix Setlist check: switch to message checkType with proper markers

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/3c387df6-1dfe-451f-96d8-b4b6455f7857

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
Copilot 86777eeccf Fix RoyalCams site check using BongaCams white-label pattern (#2334)
* Initial plan

* Disable RoyalCams site check to fix false-positive probe

The Telegram Maigret bot auto-probe reported CLAIMED for three random
usernames. The status_code checkType is unreliable as the site returns
200 for non-existent user profiles (soft 404). Disabling the site check
until a reliable detection method can be established.

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/05b3d513-fe15-477d-a455-0c9ddf0b8b51

* Fix RoyalCams: switch to message checkType using BongaCams white-label pattern

RoyalCams runs on the BongaCams platform. Applied the same fix pattern:
- Switch from status_code to message checkType
- Use Portuguese locale (pt.royalcams.com) as urlProbe
- absenceStrs matches generic title on non-existent profiles
- presenseStrs matches Portuguese profile title for existing users
- Add browser-like headers matching BongaCams config
- Remove disabled flag

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/2f6a9523-278a-4992-ba7c-c320de14bfa4

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
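
The switch from `status_code` to `message` described in #2334 can be sketched as follows. This is a minimal illustration, not maigret's actual checker: the function name `classify_by_markers`, the `AVAILABLE` status label, and the marker strings in the usage note are assumptions made for the example. Only the idea of matching `absenceStrs` and `presenseStrs` against the response body comes from the commit message.

```python
from typing import Sequence


def classify_by_markers(
    html: str,
    presence_strs: Sequence[str],
    absence_strs: Sequence[str],
) -> str:
    """Classify a profile page by body markers instead of the HTTP status.

    A site with "soft 404" pages answers 200 for missing profiles, so the
    status code carries no signal and only the page content is inspected.
    """
    # An absence marker wins: the page explicitly looks like "no such user".
    if any(marker in html for marker in absence_strs):
        return "AVAILABLE"  # hypothetical label for "username not found"
    # A presence marker confirms that a real profile page was rendered.
    if any(marker in html for marker in presence_strs):
        return "CLAIMED"
    # Neither marker matched (captcha, redirect, changed layout, ...).
    return "UNKNOWN"
```

For example, `classify_by_markers("<title>Profile of alice</title>", ["Profile of"], ["Live cams"])` returns `"CLAIMED"`, while a page containing neither marker is reported as `"UNKNOWN"` rather than as a false positive.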
Copilot 829063d1a1 [WIP] Fix false-positive probe for vegalab site (#2336)
* Initial plan

* Disable vegalab site check: domain is dead (DNS does not resolve), causing false positives

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/98430e81-5dcb-4cb3-9aaa-f8c5ce86d026

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
Copilot b6d4473f6f Disable forums.stevehoffman.tv due to false positives (#2331)
* Initial plan

* Disable forums.stevehoffman.tv to fix false-positive site probe

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/39fea4a9-ec6d-4a12-b34b-1a3486d647e4

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
Copilot 93e48e583b Disable false-positive site probe: amateurvoyeurforum.com (#2332)
* Initial plan

* Disable amateurvoyeurforum.com site check to fix false positives

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/e7fcad2b-4511-4e6d-b186-411951170e0a

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
dependabot[bot] d2b71dce57 build(deps): bump aiohttp-socks from 0.10.1 to 0.11.0 (#2319)
Bumps [aiohttp-socks](https://github.com/romis2012/aiohttp-socks) from 0.10.1 to 0.11.0.
- [Release notes](https://github.com/romis2012/aiohttp-socks/releases)
- [Commits](https://github.com/romis2012/aiohttp-socks/compare/v0.10.1...v0.11.0)

---
updated-dependencies:
- dependency-name: aiohttp-socks
  dependency-version: 0.11.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
dependabot[bot] d8f2a48870 build(deps-dev): bump pytest-cov from 7.0.0 to 7.1.0 (#2320)
Bumps [pytest-cov](https://github.com/pytest-dev/pytest-cov) from 7.0.0 to 7.1.0.
- [Changelog](https://github.com/pytest-dev/pytest-cov/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest-cov/compare/v7.0.0...v7.1.0)

---
updated-dependencies:
- dependency-name: pytest-cov
  dependency-version: 7.1.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
dependabot[bot] b5aa76a52c build(deps-dev): bump coverage from 7.12.0 to 7.13.5 (#2321)
Bumps [coverage](https://github.com/coveragepy/coveragepy) from 7.12.0 to 7.13.5.
- [Release notes](https://github.com/coveragepy/coveragepy/releases)
- [Changelog](https://github.com/coveragepy/coveragepy/blob/main/CHANGES.rst)
- [Commits](https://github.com/coveragepy/coveragepy/compare/7.12.0...7.13.5)

---
updated-dependencies:
- dependency-name: coverage
  dependency-version: 7.13.5
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
dependabot[bot] f8bd42baa8 build(deps): bump reportlab from 4.4.5 to 4.4.10 (#2323)
Bumps [reportlab](https://www.reportlab.com/) from 4.4.5 to 4.4.10.

---
updated-dependencies:
- dependency-name: reportlab
  dependency-version: 4.4.10
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
Copilot ce4c768891 Pin requests-toolbelt>=1.0.0 to fix urllib3 v2 incompatibility (#2316)
* Initial plan

* Add requests-toolbelt ^1.0.0 as explicit dependency to fix urllib3 v2 appengine ImportError

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/458d41b2-c135-4b51-b0b1-b1832490c808

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
Copilot d566d48d1f Disable forums.developer.nvidia.com (auth-gated user profiles) (#2305)
* Initial plan

* disable forums.developer.nvidia.com due to auth-locked user pages

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/b8f41f15-8588-4aac-a443-af5e2aaa1918

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
Copilot 278a5082ce Remove dead site xxxforum.org (#2310)
* Initial plan

* Remove broken site xxxforum.org from data.json and sites.md

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/bfbd3aa8-bfb1-480a-b2e7-a2c40fc69def

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
Copilot 87c8356ff1 Fix Love.Mail.ru: update to numeric-only identifiers and new profile URL (#2307)
* Initial plan

* fix: update Love.Mail.ru to use numeric-only identifiers (#1264)

- Add regexCheck to enforce numeric-only IDs (^\d+$)
- Update usernameClaimed/usernameUnclaimed to numeric values
- Site remains disabled pending live verification

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/6de16097-6bc1-424a-beb1-1d2ec6b99944

* fix: update Love.Mail.ru URL to /profile/ path, enable check with verified ID

Use maintainer-provided working link https://love.mail.ru/profile/1838153357.
- Change URL pattern from /ru/{username} to /profile/{username}
- Set usernameClaimed to 1838153357
- Remove disabled flag to enable the check

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/ac07d38e-46e2-42d3-9e93-eda3e5cfbcc3

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
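
The effect of the `regexCheck` added in #2307 can be illustrated with a short sketch. The pattern `^\d+$` and the `/profile/` URL come from the commit message; the helper names `is_checkable` and `build_profile_url` are hypothetical and do not reflect how maigret applies the check internally.

```python
import re
from typing import Optional

# Pattern quoted in the commit message: numeric-only identifiers.
NUMERIC_ID = re.compile(r"^\d+$")

# URL pattern from the commit message, with a named slot for the identifier.
PROFILE_URL = "https://love.mail.ru/profile/{username}"


def is_checkable(identifier: str) -> bool:
    """Return True when the identifier passes the numeric-only regex."""
    return NUMERIC_ID.search(identifier) is not None


def build_profile_url(identifier: str) -> Optional[str]:
    """Build the profile URL, or return None so the site can be skipped."""
    if not is_checkable(identifier):
        return None
    return PROFILE_URL.format(username=identifier)
```

Under these assumptions, `build_profile_url("1838153357")` yields the maintainer-provided link, while a textual username such as `"alice"` is skipped instead of producing a request that cannot match.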
github-actions[bot] d1ecd8a965 Updated site list and statistics (#2314)
Co-authored-by: soxoj <soxoj@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
Soxoj 44991e0b7c Update site data workflow fix: remove ambiguous main tag (#2313)
* feat(workflow): fix update site data workflow err

* feat(workflow): the final update site data workflow fix (hopefully)
2026-04-08 00:48:36 +02:00
Soxoj 1a3ce4f114 feat(workflow): fix update site data workflow err (#2312) 2026-04-08 00:48:36 +02:00
Copilot 6c5f67f30b Re-enable taplink.cc with browser User-Agent to bypass Cloudflare (#2308)
* Initial plan

* fix(taplink): re-enable taplink.cc with browser User-Agent header to bypass Cloudflare

Remove disabled flag and add a Chrome User-Agent header to help
bypass Cloudflare bot detection for taplink.cc profile checks.
If Cloudflare still blocks requests, maigret's built-in error
detection will gracefully mark results as UNKNOWN.

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/271904b6-e358-4aeb-b503-21c9b91186d9

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
Soxoj a2982da0f3 feat(workflow): fix update site data workflow dependency (#2306) 2026-04-08 00:48:36 +02:00
dependabot[bot] 6b8ad90172 Bump svglib from 1.5.1 to 1.6.0 (#2205)
* Bump svglib from 1.5.1 to 1.6.0

Bumps [svglib](https://github.com/deeplook/svglib) from 1.5.1 to 1.6.0.
- [Changelog](https://github.com/deeplook/svglib/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/deeplook/svglib/commits)

---
updated-dependencies:
- dependency-name: svglib
  dependency-version: 1.6.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Add libcairo2-dev to CI workflow for svglib 1.6.0 compatibility (#2304)

* Initial plan

* Add libcairo2-dev system dependency install step to test workflow

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/3ecab70e-d4a3-4e74-9245-bffc58d6d0a3

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
Soxoj e95057abf8 Update sites list workflow (#2303) 2026-04-08 00:48:36 +02:00
Soxoj 5fa86187f5 feat(sites): fix false positives: disable 74 broken sites, fix 8 with API probes and better markers (#2302)
- Disable 74 sites: Cloudflare/captcha blocks, identical responses,
  dead domains, vBulletin/phpBB engine failures
- Fix Roblox, Salon24.pl, Planetaexcel → status_code (clear 404 signal)
- Fix en.brickimedia.org → message with "noarticletext" absenceStr
- Fix Arduino → narrower title-based presenseStrs/absenceStrs
- Re-enable Fandom (3 wikis) via MediaWiki api.php urlProbe
- Re-enable Substack via /api/v1/user/{}/public_profile urlProbe
- Re-enable hashnode via GraphQL GET urlProbe (URL-encoded query)
- Document lessons: engine template drift, search-by-author fragility,
  always-200 sites, TLS degradation, API bypassing Cloudflare,
  GraphQL GET support, URL-encoding for template safety
2026-04-08 00:48:36 +02:00
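
The "URL-encoding for template safety" lesson from #2302 can be sketched like this. The reading assumed here is that literal braces in a GraphQL query would collide with a brace-style username placeholder in the probe URL, so the query is percent-encoded and only the placeholder keeps its braces. The endpoint, the query text, and the helper names are all hypothetical; none of them is taken from the maigret database.

```python
from urllib.parse import quote

# Hypothetical endpoint and query halves, used only for illustration.
ENDPOINT = "https://api.example.com/graphql"
QUERY_BEFORE = '{ user(username: "'
QUERY_AFTER = '") { name } }'


def make_probe_template(endpoint: str, before: str, after: str) -> str:
    """Percent-encode the query so that only {username} keeps its braces."""
    encoded_before = quote(before, safe="")
    encoded_after = quote(after, safe="")
    # The doubled braces emit a literal "{username}" placeholder.
    return f"{endpoint}?query={encoded_before}{{username}}{encoded_after}"


def fill_probe(template: str, username: str) -> str:
    """Substitute the (also percent-encoded) username into the template."""
    return template.format(username=quote(username, safe=""))


PROBE_TEMPLATE = make_probe_template(ENDPOINT, QUERY_BEFORE, QUERY_AFTER)
```

Because the braces of the query are stored as `%7B` and `%7D`, `fill_probe(PROBE_TEMPLATE, "alice")` can use plain `str.format` without raising on the GraphQL syntax.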
Soxoj c9ab9d676b Improve site-check quality: fix broken site configs, add diagnostic utilities, and make self-check report-only by default with opt-in auto-disable. (#2301)
- Fix VK and TradingView checkType; add Reddit and Microsoft Learn API-style probes where appropriate; adjust or disable entries that are unreliable under anti-bot protection.
- Self-check: stop aggressive auto-disable; default to reporting issues only; add --auto-disable and --diagnose for optional fixes and deeper output.
- Tooling: add utils/site_check.py and utils/check_top_n.py (and related helpers) to inspect and rank site behavior against the top-N list.
- Scope: aligns with fixing top-traffic / high-impact sites and making diagnostics repeatable without silently flipping disabled flags.
2026-04-08 00:48:36 +02:00
Soxoj 4784ecdacc Update Telegram bot link in README (#2300) 2026-04-08 00:48:36 +02:00
dependabot[bot] f24a73846e Bump certifi from 2025.10.5 to 2025.11.12 (#2249)
Bumps [certifi](https://github.com/certifi/python-certifi) from 2025.10.5 to 2025.11.12.
- [Commits](https://github.com/certifi/python-certifi/compare/2025.10.05...2025.11.12)

---
updated-dependencies:
- dependency-name: certifi
  dependency-version: 2025.11.12
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
dependabot[bot] 9e64f7260d build(deps): bump werkzeug from 3.1.4 to 3.1.6 (#2288)
Bumps [werkzeug](https://github.com/pallets/werkzeug) from 3.1.4 to 3.1.6.
- [Release notes](https://github.com/pallets/werkzeug/releases)
- [Changelog](https://github.com/pallets/werkzeug/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/werkzeug/compare/3.1.4...3.1.6)

---
updated-dependencies:
- dependency-name: werkzeug
  dependency-version: 3.1.6
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
dependabot[bot] ee68542028 Bump reportlab from 4.4.4 to 4.4.5 (#2251)
Bumps [reportlab](https://www.reportlab.com/) from 4.4.4 to 4.4.5.

---
updated-dependencies:
- dependency-name: reportlab
  dependency-version: 4.4.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
dependabot[bot] c2240b6607 build(deps): bump flask from 3.1.2 to 3.1.3 (#2289)
Bumps [flask](https://github.com/pallets/flask) from 3.1.2 to 3.1.3.
- [Release notes](https://github.com/pallets/flask/releases)
- [Changelog](https://github.com/pallets/flask/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/flask/compare/3.1.2...3.1.3)

---
updated-dependencies:
- dependency-name: flask
  dependency-version: 3.1.3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:36 +02:00
Soxoj 54f1a46d79 Twitter fixed, mirrors mechanism improvement (#2299) 2026-04-08 00:48:36 +02:00
Soxoj f5151ca0fd Pyinstaller GitHub workflow fix (#2298) 2026-04-08 00:48:36 +02:00
Soxoj 43db010dfe Update Telegram bot link in README (#2293) 2026-04-08 00:48:36 +02:00
Soxoj 59535c59e5 Fixed false positives in top-500 (#2292) 2026-04-08 00:48:36 +02:00
Soxoj eccc09275a Dockerfile fix (#2290) 2026-04-08 00:48:23 +02:00
dependabot[bot] 2e94bafb7b Bump pypdf from 6.4.0 to 6.9.1 (#2281)
Bumps [pypdf](https://github.com/py-pdf/pypdf) from 6.4.0 to 6.9.1.
- [Release notes](https://github.com/py-pdf/pypdf/releases)
- [Changelog](https://github.com/py-pdf/pypdf/blob/main/CHANGELOG.md)
- [Commits](https://github.com/py-pdf/pypdf/compare/6.4.0...6.9.1)

---
updated-dependencies:
- dependency-name: pypdf
  dependency-version: 6.9.1
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:23 +02:00
dependabot[bot] 5a6ee2ef90 Bump cryptography from 44.0.1 to 46.0.5 (#2270)
Bumps [cryptography](https://github.com/pyca/cryptography) from 44.0.1 to 46.0.5.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/44.0.1...46.0.5)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-version: 46.0.5
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:23 +02:00
dependabot[bot] d180101a06 Bump black from 25.11.0 to 26.3.1 (#2280)
Bumps [black](https://github.com/psf/black) from 25.11.0 to 26.3.1.
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/compare/25.11.0...26.3.1)

---
updated-dependencies:
- dependency-name: black
  dependency-version: 26.3.1
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:48:18 +02:00
dependabot[bot] c0af820de3 Bump pillow from 11.0.0 to 12.1.1 (#2271)
Bumps [pillow](https://github.com/python-pillow/Pillow) from 11.0.0 to 12.1.1.
- [Release notes](https://github.com/python-pillow/Pillow/releases)
- [Changelog](https://github.com/python-pillow/Pillow/blob/main/CHANGES.rst)
- [Commits](https://github.com/python-pillow/Pillow/compare/11.0.0...12.1.1)

---
updated-dependencies:
- dependency-name: pillow
  dependency-version: 12.1.1
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:45:06 +02:00
dependabot[bot] b7125dc97e Bump urllib3 from 2.5.0 to 2.6.3 (#2262)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.5.0 to 2.6.3.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.5.0...2.6.3)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-version: 2.6.3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 00:45:06 +02:00
Tang Vu 108fef50ee refactor: unexpanded tilde in file path (#2283)
The path `'~/.maigret/settings.json'` uses a tilde (`~`) which is not automatically expanded by Python's `open()` function. This will cause the settings file in the user's home directory to be silently ignored (caught by `FileNotFoundError`) because Python will look for a literal directory named `~` in the current working directory.

Affected files: settings.py
2026-04-08 00:45:06 +02:00
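
The tilde problem described in #2283 can be reproduced and avoided with the standard library. The sketch below is not the patch itself; the helper name `resolve_settings_path` is an assumption, and only the path `~/.maigret/settings.json` comes from the commit message.

```python
import os


def resolve_settings_path(path: str) -> str:
    """Expand a leading "~" to the user's home directory.

    open() treats "~" as an ordinary directory name, so the expansion
    has to happen before the path is handed to it.
    """
    return os.path.expanduser(path)


# Path taken from the commit message.
SETTINGS_PATH = "~/.maigret/settings.json"
```

Calling `open(resolve_settings_path(SETTINGS_PATH))` then targets the file under the home directory, whereas `open(SETTINGS_PATH)` would look for a literal `~` directory relative to the current working directory.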
Tang Vu 4470c3a440 refactor: missing tests for settings cascade and override logic (#2287)
The `Settings.load()` method iterates through multiple configuration file paths and updates the internal `__dict__`, intending to override earlier default settings with later user-specific ones. This cascading logic is a core configuration feature but lacks explicit tests to guarantee that dictionary merging and overriding behave exactly as documented (e.g., ensuring a setting in `~/.maigret/settings.json` correctly overrides `resources/settings.json` without wiping out other keys).


Affected files: test_settings.py
2026-04-08 00:45:06 +02:00
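
The cascade that #2287 asks to test can be sketched as below. This is an assumed shape of the logic, based only on the description that later files override earlier ones without wiping out other keys; `load_cascade` is a hypothetical helper, not `Settings.load()` itself, and it performs a shallow, top-level merge.

```python
import json
from typing import Any, Dict, Iterable


def load_cascade(paths: Iterable[str]) -> Dict[str, Any]:
    """Merge JSON settings files in order; later files override earlier ones."""
    merged: Dict[str, Any] = {}
    for path in paths:
        try:
            with open(path, encoding="utf-8") as fh:
                data = json.load(fh)
        except FileNotFoundError:
            # A missing optional file is skipped, not treated as an error.
            continue
        # Shallow update: keys absent from the later file are preserved.
        merged.update(data)
    return merged
```

A test in this spirit writes two temporary files, for example `{"timeout": 30, "retries": 1}` and `{"timeout": 10}`, and asserts that the merged result keeps `retries` while taking `timeout` from the second file.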
Tang Vu 84529cd5b4 ♻️ Refactor: Hardcoded relative path for database file (#2285)
* refactor: hardcoded relative path for database file

`app.config['MAIGRET_DB_FILE']` is set to a hardcoded relative path `os.path.join('maigret', 'resources', 'data.json')`. If the Flask application is executed from a different working directory (other than the repository root), it will fail to find the database file and crash.

Affected files: app.py, settings.py
2026-04-08 00:45:06 +02:00
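
One common way to avoid the working-directory dependency described in #2285 is to anchor the path at the module's own location. The sketch below shows that technique under stated assumptions: `resource_path` is a hypothetical helper, and the actual fix in `app.py` and `settings.py` may be implemented differently.

```python
from pathlib import Path


def resource_path(anchor_file: str, *parts: str) -> str:
    """Return an absolute path built relative to the directory of anchor_file.

    Passing a module's __file__ as the anchor makes the result independent
    of the directory the process was started from.
    """
    return str(Path(anchor_file).resolve().parent.joinpath(*parts))
```

Inside a module, a call such as `resource_path(__file__, "resources", "data.json")` would replace `os.path.join('maigret', 'resources', 'data.json')`, assuming the `resources` directory sits next to that module.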
Copilot 23adc178ea Fix crash on -a --self-check by adding exception handling to site check coroutines (#2466)
* Initial plan

* Fix crash on -a --self-check by adding exception handling in site_self_check and self_check

Wrap the body of site_self_check in try/except to catch unexpected errors
and always return a valid changes dict. Also add a safety-net try/except
in self_check around awaiting individual site check futures so that a
single site failure doesn't crash the entire self-check process.

Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/5e27d620-5cbb-43d2-a9f9-ecb53a29904d

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>

* Restore @pytest.mark.slow on test_maigret_results

Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/5e27d620-5cbb-43d2-a9f9-ecb53a29904d

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>

* Document --self-check error resilience, --auto-disable, and --diagnose in docs/

Update command-line-options.rst with expanded --self-check description
and new --auto-disable and --diagnose entries. Add a "Database self-check"
section to features.rst explaining error-resilient behaviour and usage
examples. Update usage-examples.rst to reference --auto-disable.

Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/af1f0f09-9112-4902-8475-e81d235ff3ed

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-04-07 19:44:09 +02:00
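
The two layers of protection described in #2466 can be sketched with plain `asyncio`. This is an illustration of the pattern only: the function names, the `check` callback, and the keys of the result dictionary are assumptions and do not mirror the real `site_self_check` and `self_check` signatures.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

CheckFn = Callable[[str], Awaitable[Dict[str, Any]]]


async def safe_site_check(name: str, check: CheckFn) -> Dict[str, Any]:
    """Inner layer: always return a valid result dict, even on failure."""
    try:
        return await check(name)
    except Exception as exc:
        # Report the problem instead of propagating it to the caller.
        return {"site": name, "disabled": False, "error": str(exc)}


async def run_self_check(names: List[str], check: CheckFn) -> List[Dict[str, Any]]:
    """Outer layer: a failing task must not abort the whole run."""
    tasks = [asyncio.ensure_future(safe_site_check(n, check)) for n in names]
    results: List[Dict[str, Any]] = []
    for name, task in zip(names, tasks):
        try:
            results.append(await task)
        except Exception as exc:
            # Safety net in case the inner handler itself misbehaves.
            results.append({"site": name, "disabled": False, "error": str(exc)})
    return results
```

With a `check` coroutine that raises for one site, `asyncio.run(run_self_check(["ok", "broken"], check))` still returns two entries, and the failing site carries an `error` field instead of crashing the run.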
Soxoj 6834483360 Fix Spotify, add Spotify Community forum (#2467) 2026-04-07 18:25:13 +02:00
Copilot 6ed8fdefcc Add installation troubleshooting for missing system dependencies (#2465)
* Initial plan

* Add installation troubleshooting section for system dependency errors

Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/6c3a5612-bdd5-4611-ba77-aea7ab52e304

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>

* Simplify README troubleshooting to a link to the full docs

Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/6c557093-0643-4980-93ad-973e2d3141ef

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-04-07 17:54:02 +02:00
Soxoj 3fd34afb77 Sites fixes (#2464) 2026-04-06 21:41:16 +02:00
Soxoj ad95302745 Add Markdown reports for LLM analysis (#2463) 2026-04-06 18:26:43 +02:00
dependabot[bot] 44a6c729e3 build(deps): bump curl-cffi from 0.14.0 to 0.15.0 (#2462)
Bumps [curl-cffi](https://github.com/lexiforest/curl_cffi) from 0.14.0 to 0.15.0.
- [Release notes](https://github.com/lexiforest/curl_cffi/releases)
- [Changelog](https://github.com/lexiforest/curl_cffi/blob/main/docs/changelog.rst)
- [Commits](https://github.com/lexiforest/curl_cffi/compare/v0.14.0...v0.15.0)

---
updated-dependencies:
- dependency-name: curl-cffi
  dependency-version: 0.15.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-06 15:04:48 +02:00
Soxoj 6d0a22b738 False positive fixes (#2460)
* Fix false positives: APClips, Taplink, gentoo, Discord.bio, ChaturBate; disable 7Cups, playtime, openriskmanual, reactos; update tags

* Fix db_meta.json regeneration in update_site_data.py (inline instead of module import)

* Fix false positives: disable Bit.ly, Firearmstalk, Needrom, Travelblog; fix gentoo, Discord.bio, brickimedia via API; remove dead sites dreamhost, typepad
2026-04-04 19:08:51 +02:00
Soxoj abce3c9be4 Fix false positives (#2459)
* Fix false positives: APClips, Taplink, gentoo, Discord.bio, ChaturBate; disable 7Cups, playtime, openriskmanual, reactos; update tags

* Fix db_meta.json regeneration in update_site_data.py (inline instead of module import)
2026-04-04 18:22:21 +02:00
Soxoj 269d50eedc DB update mechanism (#2458)
* Database update mechanism
2026-04-04 18:00:50 +02:00
Soxoj e8f4318e5d Added Crypto/Web3 site checks (#2457) 2026-04-04 16:49:12 +02:00
Soxoj 75289c78bf Update of MIT License (#2455) 2026-04-03 18:02:54 +02:00
Julio César Suástegui eeb38ccdc0 fix(data): update InterPals absence string to match current site response (#2442)
The previous absence string 'The requested user does not exist or is inactive'
no longer matches the live site response. InterPals now returns 'User not found'
for non-existent profiles, causing false positives for all username searches.

Tested against interpals.net/noneownsthisusername (non-existent) and
interpals.net/blue (claimed) to confirm detection accuracy.

Closes #2433

Co-authored-by: Julio César Suástegui <juliosuas@users.noreply.github.com>
2026-04-03 13:43:33 +02:00
Soxoj d136014576 Multiple lint and types fixes (#2454) 2026-04-02 21:01:49 +02:00
Soxoj 5d502eaef6 Add site protection tracking system, fix broken site checks (Instagra… (#2452)
* Add site protection tracking system, fix broken site checks (Instagram, StackOverflow, LeetCode, Boosty, LiveLib), preserve unicode in data.json

* Update poetry.lock by running poetry lock

Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/14333f41-67d5-4e28-a782-9730b31fc667

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-04-02 20:28:20 +02:00
dependabot[bot] 9e8a701c54 build(deps): bump aiohttp from 3.13.4 to 3.13.5 (#2448)
---
updated-dependencies:
- dependency-name: aiohttp
  dependency-version: 3.13.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-02 08:11:42 +02:00
dependabot[bot] 7b67c61240 build(deps-dev): bump mypy from 1.19.1 to 1.20.0 (#2447)
Bumps [mypy](https://github.com/python/mypy) from 1.19.1 to 1.20.0.
- [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/python/mypy/compare/v1.19.1...v1.20.0)

---
updated-dependencies:
- dependency-name: mypy
  dependency-version: 1.20.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-01 20:12:29 +02:00
dependabot[bot] 0e113c4592 build(deps): bump requests from 2.33.0 to 2.33.1 (#2444)
Bumps [requests](https://github.com/psf/requests) from 2.33.0 to 2.33.1.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](https://github.com/psf/requests/compare/v2.33.0...v2.33.1)

---
updated-dependencies:
- dependency-name: requests
  dependency-version: 2.33.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-31 16:49:17 +02:00
dependabot[bot] fb4e17be92 build(deps): bump pygments from 2.18.0 to 2.20.0 (#2440)
Bumps [pygments](https://github.com/pygments/pygments) from 2.18.0 to 2.20.0.
- [Release notes](https://github.com/pygments/pygments/releases)
- [Changelog](https://github.com/pygments/pygments/blob/master/CHANGES)
- [Commits](https://github.com/pygments/pygments/compare/2.18.0...2.20.0)

---
updated-dependencies:
- dependency-name: pygments
  dependency-version: 2.20.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-30 20:25:10 +02:00
dependabot[bot] adb19e5930 build(deps): bump aiohttp from 3.13.3 to 3.13.4 (#2435)
---
updated-dependencies:
- dependency-name: aiohttp
  dependency-version: 3.13.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-30 14:27:54 +02:00
dependabot[bot] 116fae3e0f build(deps): bump platformdirs from 4.5.0 to 4.9.4 (#2434)
Bumps [platformdirs](https://github.com/tox-dev/platformdirs) from 4.5.0 to 4.9.4.
- [Release notes](https://github.com/tox-dev/platformdirs/releases)
- [Changelog](https://github.com/tox-dev/platformdirs/blob/main/docs/changelog.rst)
- [Commits](https://github.com/tox-dev/platformdirs/compare/4.5.0...4.9.4)

---
updated-dependencies:
- dependency-name: platformdirs
  dependency-version: 4.9.4
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-30 11:09:27 +02:00
dependabot[bot] bf495cd57e build(deps): bump chardet from 5.2.0 to 7.4.0.post2 (#2436)
Bumps [chardet](https://github.com/chardet/chardet) from 5.2.0 to 7.4.0.post2.
- [Release notes](https://github.com/chardet/chardet/releases)
- [Changelog](https://github.com/chardet/chardet/blob/main/docs/changelog.rst)
- [Commits](https://github.com/chardet/chardet/compare/5.2.0...7.4.0.post2)

---
updated-dependencies:
- dependency-name: chardet
  dependency-version: 7.4.0.post2
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-30 11:09:14 +02:00
dependabot[bot] e49aa533df build(deps): bump multidict from 6.7.0 to 6.7.1 (#2396)
Bumps [multidict](https://github.com/aio-libs/multidict) from 6.7.0 to 6.7.1.
- [Release notes](https://github.com/aio-libs/multidict/releases)
- [Changelog](https://github.com/aio-libs/multidict/blob/master/CHANGES.rst)
- [Commits](https://github.com/aio-libs/multidict/compare/v6.7.0...v6.7.1)

---
updated-dependencies:
- dependency-name: multidict
  dependency-version: 6.7.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-29 12:20:48 +02:00
Soxoj 5aa7f6429b Overhaul site tags and naming: add social tag to 33 networks, fill mi… (#2430)
* Overhaul site tags and naming: add social tag to 33 networks, fill missing tags for 213 top-1000 sites, clean up false us/in country tags (~374 sites), normalize site names to Title Case, add tag validation tests, document tagging and naming rules
Remove LLM folder: ask @soxoj for the up-to-date version!

* Remove LLM/ from version control

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 19:48:16 +01:00
Soxoj a5d337b765 Tags and site names improvements (#2427)
- Added social tag to social networks (33 sites)
- Fixed wrong tags (8 sites)
- Filled empty tags for 213 sites in top-1000
- Country tag cleanup (~374 sites)
- Site naming normalization (75 sites)
- New tests (3)
- Documentation updates
2026-03-28 15:42:12 +01:00
dependabot[bot] 5aa0c908b0 build(deps): bump cryptography from 46.0.5 to 46.0.6 (#2422)
Bumps [cryptography](https://github.com/pyca/cryptography) from 46.0.5 to 46.0.6.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/46.0.5...46.0.6)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-version: 46.0.6
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-28 10:10:35 +01:00
Soxoj 51b452ad71 Add urlProbes (#2425) 2026-03-28 00:08:02 +01:00
Soxoj fa1a4d1b4a Sites re-check (#2423) 2026-03-27 22:41:55 +01:00
dependabot[bot] 184519b202 build(deps): bump soupsieve from 2.8 to 2.8.3 (#2404)
Bumps [soupsieve](https://github.com/facelessuser/soupsieve) from 2.8 to 2.8.3.
- [Release notes](https://github.com/facelessuser/soupsieve/releases)
- [Commits](https://github.com/facelessuser/soupsieve/compare/2.8...2.8.3)

---
updated-dependencies:
- dependency-name: soupsieve
  dependency-version: 2.8.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-27 22:41:40 +01:00
dependabot[bot] a203eecbb2 build(deps-dev): bump pytest from 9.0.1 to 9.0.2 (#2381)
Bumps [pytest](https://github.com/pytest-dev/pytest) from 9.0.1 to 9.0.2.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/9.0.1...9.0.2)

---
updated-dependencies:
- dependency-name: pytest
  dependency-version: 9.0.2
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-27 22:15:56 +01:00
dependabot[bot] dde1cd5d78 build(deps): bump psutil from 7.1.3 to 7.2.2 (#2406)
Bumps [psutil](https://github.com/giampaolo/psutil) from 7.1.3 to 7.2.2.
- [Changelog](https://github.com/giampaolo/psutil/blob/master/docs/changelog.rst)
- [Commits](https://github.com/giampaolo/psutil/compare/v7.1.3...v7.2.2)

---
updated-dependencies:
- dependency-name: psutil
  dependency-version: 7.2.2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-27 15:58:24 +01:00
dependabot[bot] 547512519b build(deps): bump pyinstaller from 6.16.0 to 6.19.0 (#2405)
Bumps [pyinstaller](https://github.com/pyinstaller/pyinstaller) from 6.16.0 to 6.19.0.
- [Release notes](https://github.com/pyinstaller/pyinstaller/releases)
- [Changelog](https://github.com/pyinstaller/pyinstaller/blob/develop/doc/CHANGES.rst)
- [Commits](https://github.com/pyinstaller/pyinstaller/compare/v6.16.0...v6.19.0)

---
updated-dependencies:
- dependency-name: pyinstaller
  dependency-version: 6.19.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-27 09:19:04 +01:00
Soxoj b333a2e2b2 Readme update: commercial use (#2403) 2026-03-26 21:51:53 +01:00
dependabot[bot] 2835ec71c7 build(deps): bump requests from 2.32.5 to 2.33.0 (#2394)
Bumps [requests](https://github.com/psf/requests) from 2.32.5 to 2.33.0.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](https://github.com/psf/requests/compare/v2.32.5...v2.33.0)

---
updated-dependencies:
- dependency-name: requests
  dependency-version: 2.33.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-26 21:10:51 +01:00
github-actions[bot] af67a6a3f3 Updated site list and statistics (#2399)
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-26 16:36:23 +01:00
dependabot[bot] 4f737b5260 build(deps-dev): bump pytest-httpserver from 1.1.0 to 1.1.5 (#2397)
Bumps [pytest-httpserver](https://github.com/csernazs/pytest-httpserver) from 1.1.0 to 1.1.5.
- [Release notes](https://github.com/csernazs/pytest-httpserver/releases)
- [Changelog](https://github.com/csernazs/pytest-httpserver/blob/master/CHANGES.rst)
- [Commits](https://github.com/csernazs/pytest-httpserver/compare/1.1.0...1.1.5)

---
updated-dependencies:
- dependency-name: pytest-httpserver
  dependency-version: 1.1.5
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-26 16:12:53 +01:00
dependabot[bot] 185e09e4ea build(deps): bump pypdf from 6.9.1 to 6.9.2 (#2392)
Bumps [pypdf](https://github.com/py-pdf/pypdf) from 6.9.1 to 6.9.2.
- [Release notes](https://github.com/py-pdf/pypdf/releases)
- [Changelog](https://github.com/py-pdf/pypdf/blob/main/CHANGELOG.md)
- [Commits](https://github.com/py-pdf/pypdf/compare/6.9.1...6.9.2)

---
updated-dependencies:
- dependency-name: pypdf
  dependency-version: 6.9.2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-25 23:33:41 +01:00
dependabot[bot] 5865e0f375 build(deps): bump yarl from 1.22.0 to 1.23.0 (#2383)
---
updated-dependencies:
- dependency-name: yarl
  dependency-version: 1.23.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-25 16:59:46 +01:00
dependabot[bot] 815c8cb2f3 build(deps): bump asgiref from 3.11.0 to 3.11.1 (#2384)
Bumps [asgiref](https://github.com/django/asgiref) from 3.11.0 to 3.11.1.
- [Changelog](https://github.com/django/asgiref/blob/main/CHANGELOG.txt)
- [Commits](https://github.com/django/asgiref/compare/3.11.0...3.11.1)

---
updated-dependencies:
- dependency-name: asgiref
  dependency-version: 3.11.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-25 15:03:02 +01:00
Soxoj 656fe1df24 Added Max.ru check; --no-progressbar flag fixed (#2386) 2026-03-25 11:48:12 +01:00
dependabot[bot] 1c5dc5f152 build(deps): bump pycountry from 24.6.1 to 26.2.16 (#2382)
Bumps [pycountry](https://github.com/pycountry/pycountry) from 24.6.1 to 26.2.16.
- [Release notes](https://github.com/pycountry/pycountry/releases)
- [Changelog](https://github.com/pycountry/pycountry/blob/main/HISTORY.txt)
- [Commits](https://github.com/pycountry/pycountry/compare/24.6.1...26.2.16)

---
updated-dependencies:
- dependency-name: pycountry
  dependency-version: 26.2.16
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-25 09:54:11 +01:00
Soxoj bc3d9faad9 Fix false-positive site checks reported by Maigret Bot (#2376) 2026-03-24 23:01:11 +01:00
Copilot 5aae2ee005 Fix update-site-data workflow race condition on branch push (#2366)
* Initial plan

* Fix update-site-data workflow race condition on branch push

- Add concurrency control to cancel in-progress runs on new pushes to main
- Delete existing PR branch before creating new one to avoid stale ref conflicts
- Upgrade peter-evans/create-pull-request from v5 to v7 (Node.js 20 deprecation)

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/a095d3d3-0093-43e8-9cc5-82797bd52453

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-24 22:10:03 +01:00
Soxoj b145e7b26f feat(core): add POST request support, new sites, migrate to Majestic Million ranking (#2317)
* feat(core): add POST request support, new sites, migrate to Majestic Million ranking
- Added native POST request support to the Maigret engine (requestMethod, requestPayload) to enable querying modern JSON registration endpoints.
- Replaced the discontinued Alexa rank API with the Majestic Million dataset for global popularity sorting and automated CI updates.
- Fixed multiple false positives among top 500 sites and bypassed standard anti-bot protections using custom User-Agents.
- Updated public documentation and internal playbooks to reflect the new features.
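
The POST support described above can be pictured with a small standard-library sketch. It is illustrative only and not Maigret's actual code: the `requestMethod` and `requestPayload` keys come from this commit message, while the `build_request` helper, the example site entry, and the idea of substituting `{username}` inside payload values are assumptions made for the example.

```python
import json
from urllib.request import Request


def build_request(site: dict, username: str) -> Request:
    """Build (but do not send) a request for a site entry. Hypothetical helper."""
    url = site["url"].replace("{username}", username)
    method = site.get("requestMethod", "GET").upper()
    data = None
    headers = {}
    payload = site.get("requestPayload")
    if method == "POST" and payload is not None:
        # Assumption: string values in the payload may contain a {username} placeholder.
        filled = {
            key: value.replace("{username}", username) if isinstance(value, str) else value
            for key, value in payload.items()
        }
        data = json.dumps(filled).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(url, data=data, headers=headers, method=method)


# Hypothetical site entry, not taken from data.json.
example_site = {
    "url": "https://example.com/api/check",
    "requestMethod": "POST",
    "requestPayload": {"login": "{username}"},
}
request = build_request(example_site, "alice")
```

Entries without `requestMethod` fall back to a plain GET in this sketch, so existing site definitions would not need to change.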

* feat(data): apply all data.json site check updates from main branch

- Added CTFtime and PentesterLab (new sites added in main)
- Removed forums.imore.com (deleted in main as dead site)
- Disabled 5 sites per main branch fixes: Librusec, MirTesen, amateurvoyeurforum.com, forums.stevehoffman.tv, vegalab
- Fixed 5 site checks per main branch: SoundCloud, Taplink, Setlist, RoyalCams, club.cnews.ru (switched from status_code to message checkType with proper markers)

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/a1d194d9-c0ff-4e2b-974c-c5e4b59548bf

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-03-24 22:08:42 +01:00
Copilot abd9aa57fe Fix domain substring matching and NoneType crash in submit dialog (#2367)
* Initial plan

* Fix domain matching and NoneType error in submit.py

- Use regex with domain boundary matching instead of substring matching
  to prevent x.com from matching 500px.com, mix.com, etc.
- Handle None old_site gracefully when user enters a site name not in
  the matched list, fixing AttributeError crash.
- Add tests for both fixes.
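
The difference between substring matching and domain boundary matching can be sketched as follows. This is a minimal illustration, not the code from submit.py: the `domain_matches` helper and its exact pattern are assumptions chosen to show the idea.

```python
import re


def domain_matches(domain: str, url: str) -> bool:
    """Return True if `url` points at `domain` or one of its subdomains."""
    # The domain must start the string or follow "//" or ".", and must be
    # followed by "/", ":", "?", "#" or the end of the string.
    pattern = r"(?:^|//|\.)" + re.escape(domain) + r"(?=[/:?#]|$)"
    return re.search(pattern, url) is not None


# A naive substring test accepts 500px.com when looking for x.com.
naive_hit = "x.com" in "https://500px.com/user"
boundary_hit = domain_matches("x.com", "https://500px.com/user")
```

Here `naive_hit` is true while `boundary_hit` is false, which is the false match the fix is meant to prevent.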

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/7eabc755-47fd-4b80-a38c-9d6c056c2ce9

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-24 22:04:10 +01:00
Copilot 2e430e5039 feat: add tag blacklisting via --exclude-tags (#2352)
* Initial plan

* feat: add tag blacklisting support (--exclude-tags CLI flag, web UI, docs, tests)

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/1a656af2-36bf-494f-9f03-1b5340f0357c

* fix: correct tag cloud label to match click-cycle interaction

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/1a656af2-36bf-494f-9f03-1b5340f0357c

* feat: add all country tags to web interface tag cloud

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/7e184b24-ff26-48fd-8a93-aea12b0a8d7b

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-24 22:00:59 +01:00
dependabot[bot] f5786f11ce build(deps): bump certifi from 2025.11.12 to 2026.2.25 (#2346)
Bumps [certifi](https://github.com/certifi/python-certifi) from 2025.11.12 to 2026.2.25.
- [Commits](https://github.com/certifi/python-certifi/compare/2025.11.12...2026.02.25)

---
updated-dependencies:
- dependency-name: certifi
  dependency-version: 2026.2.25
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-24 15:13:10 +01:00
Copilot 3e56c95e16 Fix SoundCloud false-positive: switch to message-based check (#2355)
* Initial plan

* Fix SoundCloud false-positive: switch from status_code to message checkType

SoundCloud returns HTTP 200 for non-existent user profiles (soft 404),
causing status_code check to report CLAIMED for random usernames.

Switch to message checkType with:
- presenseStrs: hydratable user marker in server-rendered HTML
- absenceStrs: generic page title for non-existent users

Markers sourced from WhatsMyName project's verified SoundCloud entry.
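
Why a message-based check helps with a soft 404 can be shown with a short sketch. It is a simplified illustration under stated assumptions, not Maigret's implementation: the `message_check` function, the AVAILABLE status name, and the marker strings are hypothetical placeholders, not the real SoundCloud markers.

```python
from typing import Sequence


def message_check(body: str, presence: Sequence[str], absence: Sequence[str]) -> str:
    """Classify a page by its content instead of its HTTP status code."""
    # An absence marker means "no such user", even when the server answered 200.
    if any(marker in body for marker in absence):
        return "AVAILABLE"
    # With presence markers configured, at least one must appear in the body.
    if not presence or any(marker in body for marker in presence):
        return "CLAIMED"
    return "AVAILABLE"


# Placeholder markers for illustration only.
PRESENCE = ["PROFILE_MARKER"]
ABSENCE = ["NOT_FOUND_TITLE"]

existing = message_check("<html>PROFILE_MARKER</html>", PRESENCE, ABSENCE)
soft_404 = message_check("<html>NOT_FOUND_TITLE</html>", PRESENCE, ABSENCE)
```

Both pages could arrive with HTTP 200, yet only the first is reported as CLAIMED.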

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/8aa10eef-78bf-4251-bf42-473cd94c7ef4

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-24 15:12:56 +01:00
Copilot 28f35f9a4f Fix club.cnews.ru false positive: switch from status_code to message checkType (#2342)
* Initial plan

* Fix club.cnews.ru false positive: switch from status_code to message checkType with absence strings

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/af131d2f-c7b5-4798-8ad1-86bab2673fe4

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-24 10:52:23 +01:00
Julio César Suástegui 79cea49526 feat: add CTFtime and PentesterLab site support (#2318)
Add two cybersecurity platforms for username enumeration:
- CTFtime (ctftime.org) - CTF competition platform
- PentesterLab (pentesterlab.com) - Security training platform

Both verified working with status_code check type.
Returns 200 for existing users, 404 for non-existent.

Co-authored-by: Julio César Suástegui <juliosuas@users.noreply.github.com>
2026-03-24 10:52:07 +01:00
github-actions[bot] 2d94269656 Automated Sites List Update (#2341)
* Updated site list and statistics

* Rebase and regenerate sites.md against latest main (#2351)

* Updated site list and statistics

* Initial plan

* Disable MirTesen site check (false positive) (#2350)

* Initial plan

* Disable MirTesen site check to fix false-positive probe

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/61c86064-423d-4f1b-8277-2838f747dd89

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>

* build(deps): bump attrs from 25.4.0 to 26.1.0 (#2344)

Bumps [attrs](https://github.com/sponsors/hynek) from 25.4.0 to 26.1.0.
- [Commits](https://github.com/sponsors/hynek/commits)

---
updated-dependencies:
- dependency-name: attrs
  dependency-version: 26.1.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Updated site list and statistics

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: soxoj <soxoj@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: soxoj <soxoj@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-24 10:51:21 +01:00
dependabot[bot] 829bda885a build(deps): bump attrs from 25.4.0 to 26.1.0 (#2344)
Bumps [attrs](https://github.com/sponsors/hynek) from 25.4.0 to 26.1.0.
- [Commits](https://github.com/sponsors/hynek/commits)

---
updated-dependencies:
- dependency-name: attrs
  dependency-version: 26.1.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-24 09:51:42 +01:00
Copilot eb541dcf51 Disable MirTesen site check (false positive) (#2350)
* Initial plan

* Disable MirTesen site check to fix false-positive probe

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/61c86064-423d-4f1b-8277-2838f747dd89

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-24 09:51:31 +01:00
Copilot 4c97025a32 Disable Librusec site check (false positive) (#2349)
* Initial plan

* Disable Librusec site check to fix false-positive probe

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-03-24 09:51:16 +01:00
dependabot[bot] 2775181a6a build(deps-dev): bump mypy from 1.19.0 to 1.19.1 (#2347)
Bumps [mypy](https://github.com/python/mypy) from 1.19.0 to 1.19.1.
- [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/python/mypy/compare/v1.19.0...v1.19.1)

---
updated-dependencies:
- dependency-name: mypy
  dependency-version: 1.19.1
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-24 09:35:54 +01:00
dependabot[bot] b00ef1f5dd build(deps): bump aiodns from 3.5.0 to 4.0.0 (#2345)
Bumps [aiodns](https://github.com/saghul/aiodns) from 3.5.0 to 4.0.0.
- [Release notes](https://github.com/saghul/aiodns/releases)
- [Changelog](https://github.com/aio-libs/aiodns/blob/master/ChangeLog)
- [Commits](https://github.com/saghul/aiodns/compare/v3.5.0...v4.0.0)

---
updated-dependencies:
- dependency-name: aiodns
  dependency-version: 4.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-24 09:32:12 +01:00
Copilot d3f13ac295 Fix false-positive site probe: Re-enable Taplink with message checkType (#2326)
* Initial plan

* Disable Taplink site check to fix false-positive detections

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/ef9281f4-ba67-4760-a6e2-57564ac4ea94

* Re-enable Taplink with message checkType and absenceStrs

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/db3e572e-b79b-4cec-ac7f-062e76144660

* Improve Taplink absenceStrs: add Russian variant and presenseStrs

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/28e24317-e8b9-45f6-bad5-0e549b891313

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-23 21:36:36 +01:00
github-actions[bot] 479a614d1d Automated Sites List Update (#2339)
* Updated site list and statistics

* Rebase: merge origin/main into auto/update-sites-list (#2340)

* Updated site list and statistics (#2315)

Co-authored-by: soxoj <soxoj@users.noreply.github.com>

* Initial plan

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: soxoj <soxoj@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>

---------

Co-authored-by: soxoj <soxoj@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-03-23 21:36:09 +01:00
github-actions[bot] e0559e4320 Updated site list and statistics (#2315)
Co-authored-by: soxoj <soxoj@users.noreply.github.com>
2026-03-23 20:20:53 +01:00
Copilot 00a9249229 [WIP] Fix invalid link on forums.imore.com (#2337)
* Initial plan

* Remove dead forums.imore.com site from database

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/c83530d0-d24f-45fc-aca3-ae1e46ece33c

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-23 20:20:28 +01:00
Copilot 005863c2e0 Fix Setlist site check: switch to message checkType with proper markers (#2333)
* Initial plan

* Disable Setlist site check due to false positives (soft 404)

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/8c552ca6-51e5-4e79-a791-ddd6f27d2461

* Fix Setlist check: switch to message checkType with proper markers

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/3c387df6-1dfe-451f-96d8-b4b6455f7857

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-23 20:18:33 +01:00
Copilot e3aada6aef Fix RoyalCams site check using BongaCams white-label pattern (#2334)
* Initial plan

* Disable RoyalCams site check to fix false-positive probe

The Telegram Maigret bot auto-probe reported CLAIMED for three random
usernames. The status_code checkType is unreliable as the site returns
200 for non-existent user profiles (soft 404). Disabling the site check
until a reliable detection method can be established.

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/05b3d513-fe15-477d-a455-0c9ddf0b8b51

* Fix RoyalCams: switch to message checkType using BongaCams white-label pattern

RoyalCams runs on the BongaCams platform. Applied the same fix pattern:
- Switch from status_code to message checkType
- Use Portuguese locale (pt.royalcams.com) as urlProbe
- absenceStrs matches generic title on non-existent profiles
- presenseStrs matches Portuguese profile title for existing users
- Add browser-like headers matching BongaCams config
- Remove disabled flag

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/2f6a9523-278a-4992-ba7c-c320de14bfa4

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-23 20:16:45 +01:00
Copilot 9b35fc1ab0 [WIP] Fix false-positive probe for vegalab site (#2336)
* Initial plan

* Disable vegalab site check: domain is dead (DNS does not resolve), causing false positives

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/98430e81-5dcb-4cb3-9aaa-f8c5ce86d026

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-23 20:09:46 +01:00
Copilot 146bc0481b Disable forums.stevehoffman.tv due to false positives (#2331)
* Initial plan

* Disable forums.stevehoffman.tv to fix false-positive site probe

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/39fea4a9-ec6d-4a12-b34b-1a3486d647e4

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-23 20:08:15 +01:00
Copilot 5930a3022e Disable false-positive site probe: amateurvoyeurforum.com (#2332)
* Initial plan

* Disable amateurvoyeurforum.com site check to fix false positives

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/e7fcad2b-4511-4e6d-b186-411951170e0a

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-23 20:07:42 +01:00
dependabot[bot] b4482e0ba4 build(deps): bump aiohttp-socks from 0.10.1 to 0.11.0 (#2319)
Bumps [aiohttp-socks](https://github.com/romis2012/aiohttp-socks) from 0.10.1 to 0.11.0.
- [Release notes](https://github.com/romis2012/aiohttp-socks/releases)
- [Commits](https://github.com/romis2012/aiohttp-socks/compare/v0.10.1...v0.11.0)

---
updated-dependencies:
- dependency-name: aiohttp-socks
  dependency-version: 0.11.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-23 12:16:47 +01:00
dependabot[bot] 2c55501bc2 build(deps-dev): bump pytest-cov from 7.0.0 to 7.1.0 (#2320)
Bumps [pytest-cov](https://github.com/pytest-dev/pytest-cov) from 7.0.0 to 7.1.0.
- [Changelog](https://github.com/pytest-dev/pytest-cov/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest-cov/compare/v7.0.0...v7.1.0)

---
updated-dependencies:
- dependency-name: pytest-cov
  dependency-version: 7.1.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-23 12:15:36 +01:00
dependabot[bot] 3ba07591a1 build(deps-dev): bump coverage from 7.12.0 to 7.13.5 (#2321)
Bumps [coverage](https://github.com/coveragepy/coveragepy) from 7.12.0 to 7.13.5.
- [Release notes](https://github.com/coveragepy/coveragepy/releases)
- [Changelog](https://github.com/coveragepy/coveragepy/blob/main/CHANGES.rst)
- [Commits](https://github.com/coveragepy/coveragepy/compare/7.12.0...7.13.5)

---
updated-dependencies:
- dependency-name: coverage
  dependency-version: 7.13.5
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-23 12:14:31 +01:00
dependabot[bot] a2d4373b68 build(deps): bump reportlab from 4.4.5 to 4.4.10 (#2323)
Bumps [reportlab](https://www.reportlab.com/) from 4.4.5 to 4.4.10.

---
updated-dependencies:
- dependency-name: reportlab
  dependency-version: 4.4.10
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-23 12:14:18 +01:00
Copilot b960acec10 Pin requests-toolbelt>=1.0.0 to fix urllib3 v2 incompatibility (#2316)
* Initial plan

* Add requests-toolbelt ^1.0.0 as explicit dependency to fix urllib3 v2 appengine ImportError

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/458d41b2-c135-4b51-b0b1-b1832490c808

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-22 23:53:20 +01:00
Copilot b1a211c3cd Disable forums.developer.nvidia.com (auth-gated user profiles) (#2305)
* Initial plan

* disable forums.developer.nvidia.com due to auth-locked user pages

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/b8f41f15-8588-4aac-a443-af5e2aaa1918

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-22 22:43:51 +01:00
Copilot 56d0c9f2f1 Remove dead site xxxforum.org (#2310)
* Initial plan

* Remove broken site xxxforum.org from data.json and sites.md

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/bfbd3aa8-bfb1-480a-b2e7-a2c40fc69def

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-22 22:43:21 +01:00
Copilot 01049b730d Fix Love.Mail.ru: update to numeric-only identifiers and new profile URL (#2307)
* Initial plan

* fix: update Love.Mail.ru to use numeric-only identifiers (#1264)

- Add regexCheck to enforce numeric-only IDs (^\d+$)
- Update usernameClaimed/usernameUnclaimed to numeric values
- Site remains disabled pending live verification
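
The effect of a numeric-only regexCheck can be sketched like this. The pattern `^\d+$` is the one named above; the `is_checkable` helper is a hypothetical name, and how Maigret applies the pattern internally is not shown here.

```python
import re

# Pattern from the commit message: identifiers made of digits only.
NUMERIC_ID = re.compile(r"^\d+$")


def is_checkable(username: str) -> bool:
    """Return True if the identifier should be probed on a numeric-only site."""
    # fullmatch avoids accepting a trailing newline, which "$" alone would allow.
    return NUMERIC_ID.fullmatch(username) is not None


numeric_ok = is_checkable("1838153357")
text_rejected = is_checkable("soxoj")
```

Usernames that fail the pattern can be skipped without sending a request, which avoids meaningless probes against a site that only has numeric profile IDs.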

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/6de16097-6bc1-424a-beb1-1d2ec6b99944

* fix: update Love.Mail.ru URL to /profile/ path, enable check with verified ID

Use maintainer-provided working link https://love.mail.ru/profile/1838153357.
- Change URL pattern from /ru/{username} to /profile/{username}
- Set usernameClaimed to 1838153357
- Remove disabled flag to enable the check

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/ac07d38e-46e2-42d3-9e93-eda3e5cfbcc3

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-22 22:42:59 +01:00
github-actions[bot] 2c2d3409e2 Updated site list and statistics (#2314)
Co-authored-by: soxoj <soxoj@users.noreply.github.com>
2026-03-22 22:42:50 +01:00
Soxoj e81b50ef61 Update site data workflow fix: remove ambiguous main tag (#2313)
* feat(workflow): fix update site data workflow err

* feat(workflow): the final update site data workflow fix (hopefully)
2026-03-22 22:37:48 +01:00
Soxoj 9ac0a65914 feat(workflow): fix update site data workflow err (#2312) 2026-03-22 22:31:55 +01:00
Copilot 4f397fed1c Re-enable taplink.cc with browser User-Agent to bypass Cloudflare (#2308)
* Initial plan

* fix(taplink): re-enable taplink.cc with browser User-Agent header to bypass Cloudflare

Remove disabled flag and add a Chrome User-Agent header to help
bypass Cloudflare bot detection for taplink.cc profile checks.
If Cloudflare still blocks requests, maigret's built-in error
detection will gracefully mark results as UNKNOWN.

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/271904b6-e358-4aeb-b503-21c9b91186d9

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-22 22:10:44 +01:00
Soxoj a17e0c7a13 feat(workflow): fix update site data workflow dependency (#2306) 2026-03-22 21:34:30 +01:00
dependabot[bot] e84e394e6f Bump svglib from 1.5.1 to 1.6.0 (#2205)
* Bump svglib from 1.5.1 to 1.6.0

Bumps [svglib](https://github.com/deeplook/svglib) from 1.5.1 to 1.6.0.
- [Changelog](https://github.com/deeplook/svglib/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/deeplook/svglib/commits)

---
updated-dependencies:
- dependency-name: svglib
  dependency-version: 1.6.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Add libcairo2-dev to CI workflow for svglib 1.6.0 compatibility (#2304)

* Initial plan

* Add libcairo2-dev system dependency install step to test workflow

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/3ecab70e-d4a3-4e74-9245-bffc58d6d0a3

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-22 21:06:10 +01:00
Soxoj b8ada1c818 Update sites list workflow (#2303) 2026-03-22 20:59:37 +01:00
Soxoj 959b2be136 feat(sites): fix false positives: disable 74 broken sites, fix 8 with API probes and better markers (#2302)
- Disable 74 sites: Cloudflare/captcha blocks, identical responses,
  dead domains, vBulletin/phpBB engine failures
- Fix Roblox, Salon24.pl, Planetaexcel → status_code (clear 404 signal)
- Fix en.brickimedia.org → message with "noarticletext" absenceStr
- Fix Arduino → narrower title-based presenseStrs/absenceStrs
- Re-enable Fandom (3 wikis) via MediaWiki api.php urlProbe
- Re-enable Substack via /api/v1/user/{}/public_profile urlProbe
- Re-enable hashnode via GraphQL GET urlProbe (URL-encoded query)
- Document lessons: engine template drift, search-by-author fragility,
  always-200 sites, TLS degradation, API bypassing Cloudflare,
  GraphQL GET support, URL-encoding for template safety
2026-03-22 20:47:51 +01:00
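Note on 959b2be136: the "URL-encoding for template safety" lesson can be illustrated with a short sketch. It assumes the probe URL is a template in which `{username}` is later substituted with `str.format`, so literal GraphQL braces would clash with the placeholder. The endpoint, the query text, and the helper name `build_template` are hypothetical and are not taken from maigret's site database.

```python
from urllib.parse import quote, unquote

# Hypothetical endpoint and query, used only to show the encoding step.
ENDPOINT = "https://example.com/graphql"
QUERY = 'query { user(username: "USERNAME") { name } }'


def build_template(endpoint: str, query: str) -> str:
    """Percent-encode the query, then restore a single {username} placeholder."""
    encoded = quote(query, safe="")  # braces become %7B / %7D
    return f"{endpoint}?query={encoded}".replace("USERNAME", "{username}")


template = build_template(ENDPOINT, QUERY)
url = template.format(username="alice")  # safe: no stray braces remain
print(url)
print(unquote(url))
```

Because the only literal braces left in the template belong to the placeholder, formatting it does not raise a `KeyError` or `ValueError` on the GraphQL syntax.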
Soxoj 97cc4b46d9 Improve site-check quality: fix broken site configs, add diagnostic utilities, and make self-check report-only by default with opt-in auto-disable. (#2301)
- Fix VK and TradingView checkType; add Reddit and Microsoft Learn API-style probes where appropriate; adjust or disable entries that are unreliable under anti-bot protection.
- Self-check: stop aggressive auto-disable; default to reporting issues only; add --auto-disable and --diagnose for optional fixes and deeper output.
- Tooling: add utils/site_check.py and utils/check_top_n.py (and related helpers) to inspect and rank site behavior against the top-N list
- Scope: aligns with fixing top-traffic / high-impact sites and making diagnostics repeatable without silently flipping disabled flags
2026-03-22 16:48:35 +01:00
Soxoj f3b741d283 Update Telegram bot link in README (#2300) 2026-03-22 12:23:35 +01:00
dependabot[bot] 33620853a1 Bump certifi from 2025.10.5 to 2025.11.12 (#2249)
Bumps [certifi](https://github.com/certifi/python-certifi) from 2025.10.5 to 2025.11.12.
- [Commits](https://github.com/certifi/python-certifi/compare/2025.10.05...2025.11.12)

---
updated-dependencies:
- dependency-name: certifi
  dependency-version: 2025.11.12
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-22 12:21:32 +01:00
dependabot[bot] 19ed03a94d build(deps): bump werkzeug from 3.1.4 to 3.1.6 (#2288)
Bumps [werkzeug](https://github.com/pallets/werkzeug) from 3.1.4 to 3.1.6.
- [Release notes](https://github.com/pallets/werkzeug/releases)
- [Changelog](https://github.com/pallets/werkzeug/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/werkzeug/compare/3.1.4...3.1.6)

---
updated-dependencies:
- dependency-name: werkzeug
  dependency-version: 3.1.6
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-22 12:20:42 +01:00
dependabot[bot] 35372446e0 Bump reportlab from 4.4.4 to 4.4.5 (#2251)
Bumps [reportlab](https://www.reportlab.com/) from 4.4.4 to 4.4.5.

---
updated-dependencies:
- dependency-name: reportlab
  dependency-version: 4.4.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-22 12:19:43 +01:00
dependabot[bot] 519bb46db6 build(deps): bump flask from 3.1.2 to 3.1.3 (#2289)
Bumps [flask](https://github.com/pallets/flask) from 3.1.2 to 3.1.3.
- [Release notes](https://github.com/pallets/flask/releases)
- [Changelog](https://github.com/pallets/flask/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/flask/compare/3.1.2...3.1.3)

---
updated-dependencies:
- dependency-name: flask
  dependency-version: 3.1.3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-22 12:19:21 +01:00
Soxoj 227a25bfa1 Twitter fixed, mirrors mechanism improvement (#2299) 2026-03-22 01:14:17 +01:00
Soxoj 5da4e78092 Pyinstaller GitHub workflow fix (#2298) 2026-03-22 00:59:17 +01:00
Soxoj e4d6b064df Update Telegram bot link in README (#2293) 2026-03-21 23:49:45 +01:00
Soxoj f99091f5f7 Fixed false positives in top-500 (#2292) 2026-03-21 23:35:59 +01:00
Soxoj f26976f1dd Dockerfile fix (#2290) 2026-03-21 20:02:35 +01:00
dependabot[bot] 83ae9c0133 Bump pypdf from 6.4.0 to 6.9.1 (#2281)
Bumps [pypdf](https://github.com/py-pdf/pypdf) from 6.4.0 to 6.9.1.
- [Release notes](https://github.com/py-pdf/pypdf/releases)
- [Changelog](https://github.com/py-pdf/pypdf/blob/main/CHANGELOG.md)
- [Commits](https://github.com/py-pdf/pypdf/compare/6.4.0...6.9.1)

---
updated-dependencies:
- dependency-name: pypdf
  dependency-version: 6.9.1
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-21 18:09:10 +01:00
dependabot[bot] 93c4fdeba9 Bump cryptography from 44.0.1 to 46.0.5 (#2270)
Bumps [cryptography](https://github.com/pyca/cryptography) from 44.0.1 to 46.0.5.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/44.0.1...46.0.5)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-version: 46.0.5
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-21 18:08:57 +01:00
dependabot[bot] 6ec3c47769 Bump black from 25.11.0 to 26.3.1 (#2280)
Bumps [black](https://github.com/psf/black) from 25.11.0 to 26.3.1.
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/compare/25.11.0...26.3.1)

---
updated-dependencies:
- dependency-name: black
  dependency-version: 26.3.1
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-21 18:08:45 +01:00
dependabot[bot] 3dc3fe9371 Bump pillow from 11.0.0 to 12.1.1 (#2271)
Bumps [pillow](https://github.com/python-pillow/Pillow) from 11.0.0 to 12.1.1.
- [Release notes](https://github.com/python-pillow/Pillow/releases)
- [Changelog](https://github.com/python-pillow/Pillow/blob/main/CHANGES.rst)
- [Commits](https://github.com/python-pillow/Pillow/compare/11.0.0...12.1.1)

---
updated-dependencies:
- dependency-name: pillow
  dependency-version: 12.1.1
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-21 18:08:18 +01:00
dependabot[bot] ebf8227bf1 Bump urllib3 from 2.5.0 to 2.6.3 (#2262)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.5.0 to 2.6.3.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.5.0...2.6.3)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-version: 2.6.3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-21 18:07:51 +01:00
Tang Vu 5b7b28e683 refactor: unexpanded tilde in file path (#2283)
The path `'~/.maigret/settings.json'` uses a tilde (`~`) which is not automatically expanded by Python's `open()` function. This will cause the settings file in the user's home directory to be silently ignored (caught by `FileNotFoundError`) because Python will look for a literal directory named `~` in the current working directory.

Affected files: settings.py
2026-03-21 18:07:23 +01:00
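Note on 5b7b28e683: the tilde problem described in this commit can be reproduced and avoided with the standard library. This is a minimal sketch, not the patch itself; only the path string comes from the commit message.

```python
import os

# Path quoted in the commit message.
raw_path = "~/.maigret/settings.json"

# open(raw_path) would look for a directory literally named "~" under the
# current working directory. expanduser() resolves "~" to the home directory.
expanded = os.path.expanduser(raw_path)

print(raw_path)
print(expanded)
```

`pathlib.Path(raw_path).expanduser()` is an equivalent alternative when the rest of the code already works with `Path` objects.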
Tang Vu 0e95e2e3cc refactor: missing tests for settings cascade and override logic (#2287)
The `Settings.load()` method iterates through multiple configuration file paths and updates the internal `__dict__`, intending to override earlier default settings with later user-specific ones. This cascading logic is a core configuration feature but lacks explicit tests to guarantee that dictionary merging and overriding behave exactly as documented (e.g., ensuring a setting in `~/.maigret/settings.json` correctly overrides `resources/settings.json` without wiping out other keys).


Affected files: test_settings.py
2026-03-21 18:06:54 +01:00
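Note on 0e95e2e3cc: a test of the cascade described in this commit might look roughly like the sketch below. `SketchSettings` is a stand-in written for this example and is not maigret's `Settings` class; the keys `timeout` and `retries` and the file names are hypothetical as well.

```python
import json
import os
import tempfile


class SketchSettings:
    """Stand-in for a cascading settings loader; not maigret's actual class."""

    def load(self, paths):
        for path in paths:
            try:
                with open(path, encoding="utf-8") as fh:
                    data = json.load(fh)
            except FileNotFoundError:
                continue  # a missing file is skipped; later files still apply
            self.__dict__.update(data)  # later files override earlier keys


def write_json(directory, name, payload):
    """Write payload as JSON into directory/name and return the full path."""
    path = os.path.join(directory, name)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(payload, fh)
    return path


with tempfile.TemporaryDirectory() as tmp:
    defaults = write_json(tmp, "defaults.json", {"timeout": 30, "retries": 3})
    user = write_json(tmp, "user.json", {"timeout": 10})
    missing = os.path.join(tmp, "absent.json")

    settings = SketchSettings()
    settings.load([defaults, missing, user])

assert settings.timeout == 10  # overridden by the later file
assert settings.retries == 3   # key absent from the later file survives
```

The two assertions cover the behaviors the commit message calls out: a later file overrides an earlier value, and doing so does not wipe out unrelated keys.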
Tang Vu 4cd1fccaa3 ♻️ Refactor: Hardcoded relative path for database file (#2285)
* refactor: hardcoded relative path for database file

`app.config['MAIGRET_DB_FILE']` is set to a hardcoded relative path `os.path.join('maigret', 'resources', 'data.json')`. If the Flask application is executed from a different working directory (other than the repository root), it will fail to find the database file and crash.

Affected files: app.py, settings.py
2026-03-21 18:06:36 +01:00
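Note on 4cd1fccaa3: one common way to avoid the working-directory dependence described in this commit is to anchor the path at the module's own location. The sketch below assumes that approach; the helper name `resource_path` and the `/srv/project` layout are hypothetical, and the actual patch may differ.

```python
import os


def resource_path(module_file: str, *parts: str) -> str:
    """Build an absolute path anchored at the directory containing module_file."""
    base_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(base_dir, *parts)


# In a real module the first argument would be __file__; a fixed, hypothetical
# location is used here so the example does not depend on where it is run.
db_file = resource_path("/srv/project/app.py", "maigret", "resources", "data.json")
print(db_file)
```

Because the result is absolute, it stays the same regardless of the directory the process was started from.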
Olivier Cervello 7ee1872dbc Update poetry.lock to match pyproject.toml changes
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 11:06:54 +01:00
Olivier Cervello 5130206b98 Bump lxml minimum to 6.0.2 for Python 3.14 compatibility
lxml 5.x fails to build on Python 3.14 due to incompatible pointer
types in Cython-generated C code. lxml 6.0.2 compiles correctly.

Fixes #2266

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 11:04:57 +01:00
dependabot[bot] 83a9dafe55 Bump mypy from 1.18.2 to 1.19.0 (#2250)
Bumps [mypy](https://github.com/python/mypy) from 1.18.2 to 1.19.0.
- [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/python/mypy/compare/v1.18.2...v1.19.0)

---
updated-dependencies:
- dependency-name: mypy
  dependency-version: 1.19.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-06 21:13:28 +01:00
dependabot[bot] b4147d2cd3 Bump pytest from 8.4.2 to 9.0.1 (#2244)
Bumps [pytest](https://github.com/pytest-dev/pytest) from 8.4.2 to 9.0.1.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/8.4.2...9.0.1)

---
updated-dependencies:
- dependency-name: pytest
  dependency-version: 9.0.1
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-06 21:13:15 +01:00
dependabot[bot] aa591da913 Bump aiohttp from 3.13.2 to 3.13.3 (#2261)
---
updated-dependencies:
- dependency-name: aiohttp
  dependency-version: 3.13.3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-06 21:12:22 +01:00
dependabot[bot] 2d4d3ba0cc Bump pytest-asyncio from 1.2.0 to 1.3.0 (#2242)
Bumps [pytest-asyncio](https://github.com/pytest-dev/pytest-asyncio) from 1.2.0 to 1.3.0.
- [Release notes](https://github.com/pytest-dev/pytest-asyncio/releases)
- [Commits](https://github.com/pytest-dev/pytest-asyncio/compare/v1.2.0...v1.3.0)

---
updated-dependencies:
- dependency-name: pytest-asyncio
  dependency-version: 1.3.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-02 14:16:48 +01:00
dependabot[bot] ec21bbe974 Bump asgiref from 3.10.0 to 3.11.0 (#2243)
Bumps [asgiref](https://github.com/django/asgiref) from 3.10.0 to 3.11.0.
- [Changelog](https://github.com/django/asgiref/blob/main/CHANGELOG.txt)
- [Commits](https://github.com/django/asgiref/compare/3.10.0...3.11.0)

---
updated-dependencies:
- dependency-name: asgiref
  dependency-version: 3.11.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-02 14:16:27 +01:00
dependabot[bot] 1a4190ee03 Bump pypdf from 6.1.3 to 6.4.0 (#2245)
Bumps [pypdf](https://github.com/py-pdf/pypdf) from 6.1.3 to 6.4.0.
- [Release notes](https://github.com/py-pdf/pypdf/releases)
- [Changelog](https://github.com/py-pdf/pypdf/blob/main/CHANGELOG.md)
- [Commits](https://github.com/py-pdf/pypdf/compare/6.1.3...6.4.0)

---
updated-dependencies:
- dependency-name: pypdf
  dependency-version: 6.4.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-02 14:16:14 +01:00
dependabot[bot] fe60783a68 Bump werkzeug from 3.1.3 to 3.1.4 (#2248)
Bumps [werkzeug](https://github.com/pallets/werkzeug) from 3.1.3 to 3.1.4.
- [Release notes](https://github.com/pallets/werkzeug/releases)
- [Changelog](https://github.com/pallets/werkzeug/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/werkzeug/compare/3.1.3...3.1.4)

---
updated-dependencies:
- dependency-name: werkzeug
  dependency-version: 3.1.4
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-02 14:16:04 +01:00
dependabot[bot] 8aa0fab314 Bump coverage from 7.11.0 to 7.12.0 (#2241)
Bumps [coverage](https://github.com/coveragepy/coveragepy) from 7.11.0 to 7.12.0.
- [Release notes](https://github.com/coveragepy/coveragepy/releases)
- [Changelog](https://github.com/coveragepy/coveragepy/blob/main/CHANGES.rst)
- [Commits](https://github.com/coveragepy/coveragepy/compare/7.11.0...7.12.0)

---
updated-dependencies:
- dependency-name: coverage
  dependency-version: 7.12.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-28 10:31:19 +01:00
dependabot[bot] 941a5171ae Bump psutil from 7.1.0 to 7.1.3 (#2240)
Bumps [psutil](https://github.com/giampaolo/psutil) from 7.1.0 to 7.1.3.
- [Changelog](https://github.com/giampaolo/psutil/blob/master/HISTORY.rst)
- [Commits](https://github.com/giampaolo/psutil/compare/release-7.1.0...release-7.1.3)

---
updated-dependencies:
- dependency-name: psutil
  dependency-version: 7.1.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-23 21:24:11 +01:00
dependabot[bot] 9a1bd8ffdb Bump python-bidi from 0.6.6 to 0.6.7 (#2234)
Bumps [python-bidi](https://github.com/MeirKriheli/python-bidi) from 0.6.6 to 0.6.7.
- [Release notes](https://github.com/MeirKriheli/python-bidi/releases)
- [Changelog](https://github.com/MeirKriheli/python-bidi/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/MeirKriheli/python-bidi/compare/v0.6.6...v0.6.7)

---
updated-dependencies:
- dependency-name: python-bidi
  dependency-version: 0.6.7
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-23 21:24:04 +01:00
dependabot[bot] 68f586fcca Bump black from 25.9.0 to 25.11.0 (#2239)
Bumps [black](https://github.com/psf/black) from 25.9.0 to 25.11.0.
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/compare/25.9.0...25.11.0)

---
updated-dependencies:
- dependency-name: black
  dependency-version: 25.11.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-23 21:23:52 +01:00
dependabot[bot] e39476c4c7 Bump pypdf from 6.0.0 to 6.1.3 (#2233)
Bumps [pypdf](https://github.com/py-pdf/pypdf) from 6.0.0 to 6.1.3.
- [Release notes](https://github.com/py-pdf/pypdf/releases)
- [Changelog](https://github.com/py-pdf/pypdf/blob/main/CHANGELOG.md)
- [Commits](https://github.com/py-pdf/pypdf/compare/6.0.0...6.1.3)

---
updated-dependencies:
- dependency-name: pypdf
  dependency-version: 6.1.3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-23 21:23:40 +01:00
dependabot[bot] 6a7f778c80 Bump aiohttp from 3.13.0 to 3.13.2 (#2237)
---
updated-dependencies:
- dependency-name: aiohttp
  dependency-version: 3.13.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-07 12:17:34 +01:00
dependabot[bot] 7679f98e58 Bump attrs from 25.3.0 to 25.4.0 (#2226)
Bumps [attrs](https://github.com/sponsors/hynek) from 25.3.0 to 25.4.0.
- [Commits](https://github.com/sponsors/hynek/commits)

---
updated-dependencies:
- dependency-name: attrs
  dependency-version: 25.4.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-20 15:41:42 +07:00
dependabot[bot] c6dbc09ba5 Bump pytest-rerunfailures from 16.0.1 to 16.1 (#2229)
Bumps [pytest-rerunfailures](https://github.com/pytest-dev/pytest-rerunfailures) from 16.0.1 to 16.1.
- [Changelog](https://github.com/pytest-dev/pytest-rerunfailures/blob/master/CHANGES.rst)
- [Commits](https://github.com/pytest-dev/pytest-rerunfailures/compare/16.0.1...16.1)

---
updated-dependencies:
- dependency-name: pytest-rerunfailures
  dependency-version: '16.1'
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-20 15:41:30 +07:00
dependabot[bot] b8352c3406 Bump certifi from 2025.8.3 to 2025.10.5 (#2228)
Bumps [certifi](https://github.com/certifi/python-certifi) from 2025.8.3 to 2025.10.5.
- [Commits](https://github.com/certifi/python-certifi/compare/2025.08.03...2025.10.05)

---
updated-dependencies:
- dependency-name: certifi
  dependency-version: 2025.10.5
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-20 15:41:18 +07:00
dependabot[bot] 8a02ad5ed7 Bump coverage from 7.10.7 to 7.11.0 (#2230)
Bumps [coverage](https://github.com/nedbat/coveragepy) from 7.10.7 to 7.11.0.
- [Release notes](https://github.com/nedbat/coveragepy/releases)
- [Changelog](https://github.com/nedbat/coveragepy/blob/master/CHANGES.rst)
- [Commits](https://github.com/nedbat/coveragepy/compare/7.10.7...7.11.0)

---
updated-dependencies:
- dependency-name: coverage
  dependency-version: 7.11.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-20 15:41:07 +07:00
dependabot[bot] 8fda5776c6 Bump aiohttp from 3.12.15 to 3.13.0 (#2225)
---
updated-dependencies:
- dependency-name: aiohttp
  dependency-version: 3.13.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-14 19:55:53 +02:00
dependabot[bot] 2347bd2f7d Bump idna from 3.10 to 3.11 (#2227)
Bumps [idna](https://github.com/kjd/idna) from 3.10 to 3.11.
- [Release notes](https://github.com/kjd/idna/releases)
- [Changelog](https://github.com/kjd/idna/blob/master/HISTORY.rst)
- [Commits](https://github.com/kjd/idna/compare/v3.10...v3.11)

---
updated-dependencies:
- dependency-name: idna
  dependency-version: '3.11'
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-14 19:55:44 +02:00
dependabot[bot] 229472f323 Bump multidict from 6.6.4 to 6.7.0 (#2224)
Bumps [multidict](https://github.com/aio-libs/multidict) from 6.6.4 to 6.7.0.
- [Release notes](https://github.com/aio-libs/multidict/releases)
- [Changelog](https://github.com/aio-libs/multidict/blob/master/CHANGES.rst)
- [Commits](https://github.com/aio-libs/multidict/compare/v6.6.4...v6.7.0)

---
updated-dependencies:
- dependency-name: multidict
  dependency-version: 6.7.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-13 10:06:54 +02:00
dependabot[bot] 6acc22dd69 Bump markupsafe from 3.0.2 to 3.0.3 (#2209)
Bumps [markupsafe](https://github.com/pallets/markupsafe) from 3.0.2 to 3.0.3.
- [Release notes](https://github.com/pallets/markupsafe/releases)
- [Changelog](https://github.com/pallets/markupsafe/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/markupsafe/compare/3.0.2...3.0.3)

---
updated-dependencies:
- dependency-name: markupsafe
  dependency-version: 3.0.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-10 11:37:12 +02:00
dependabot[bot] 8af07b3889 Bump yarl from 1.20.1 to 1.22.0 (#2221)
---
updated-dependencies:
- dependency-name: yarl
  dependency-version: 1.22.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-10 11:37:02 +02:00
dependabot[bot] e9df40bdce Bump asgiref from 3.9.2 to 3.10.0 (#2220)
Bumps [asgiref](https://github.com/django/asgiref) from 3.9.2 to 3.10.0.
- [Changelog](https://github.com/django/asgiref/blob/main/CHANGELOG.txt)
- [Commits](https://github.com/django/asgiref/compare/3.9.2...3.10.0)

---
updated-dependencies:
- dependency-name: asgiref
  dependency-version: 3.10.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-10 11:36:53 +02:00
dependabot[bot] d5bef9e3ac Bump platformdirs from 4.4.0 to 4.5.0 (#2223)
Bumps [platformdirs](https://github.com/tox-dev/platformdirs) from 4.4.0 to 4.5.0.
- [Release notes](https://github.com/tox-dev/platformdirs/releases)
- [Changelog](https://github.com/tox-dev/platformdirs/blob/main/CHANGES.rst)
- [Commits](https://github.com/tox-dev/platformdirs/compare/4.4.0...4.5.0)

---
updated-dependencies:
- dependency-name: platformdirs
  dependency-version: 4.5.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-10 11:36:44 +02:00
dependabot[bot] 25121754bd Bump lxml from 6.0.1 to 6.0.2 (#2208)
Bumps [lxml](https://github.com/lxml/lxml) from 6.0.1 to 6.0.2.
- [Release notes](https://github.com/lxml/lxml/releases)
- [Changelog](https://github.com/lxml/lxml/blob/master/CHANGES.txt)
- [Commits](https://github.com/lxml/lxml/compare/lxml-6.0.1...lxml-6.0.2)

---
updated-dependencies:
- dependency-name: lxml
  dependency-version: 6.0.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-08 09:57:59 +02:00
dependabot[bot] 198c11b8d4 Bump asgiref from 3.9.1 to 3.9.2 (#2204)
Bumps [asgiref](https://github.com/django/asgiref) from 3.9.1 to 3.9.2.
- [Changelog](https://github.com/django/asgiref/blob/main/CHANGELOG.txt)
- [Commits](https://github.com/django/asgiref/compare/3.9.1...3.9.2)

---
updated-dependencies:
- dependency-name: asgiref
  dependency-version: 3.9.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-01 13:32:28 +02:00
dependabot[bot] bf9bc5a518 Bump psutil from 7.0.0 to 7.1.0 (#2201)
Bumps [psutil](https://github.com/giampaolo/psutil) from 7.0.0 to 7.1.0.
- [Changelog](https://github.com/giampaolo/psutil/blob/master/HISTORY.rst)
- [Commits](https://github.com/giampaolo/psutil/compare/release-7.0.0...release-7.1.0)

---
updated-dependencies:
- dependency-name: psutil
  dependency-version: 7.1.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-01 13:32:21 +02:00
dependabot[bot] 41e246f6a6 Bump coverage from 7.10.6 to 7.10.7 (#2207)
Bumps [coverage](https://github.com/nedbat/coveragepy) from 7.10.6 to 7.10.7.
- [Release notes](https://github.com/nedbat/coveragepy/releases)
- [Changelog](https://github.com/nedbat/coveragepy/blob/master/CHANGES.rst)
- [Commits](https://github.com/nedbat/coveragepy/compare/7.10.6...7.10.7)

---
updated-dependencies:
- dependency-name: coverage
  dependency-version: 7.10.7
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-01 13:32:13 +02:00
dependabot[bot] 9f58fb27ad Bump reportlab from 4.4.3 to 4.4.4 (#2206)
Bumps [reportlab](https://www.reportlab.com/) from 4.4.3 to 4.4.4.

---
updated-dependencies:
- dependency-name: reportlab
  dependency-version: 4.4.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-01 13:32:04 +02:00
dependabot[bot] b344a5d98a Bump pyinstaller from 6.15.0 to 6.16.0 (#2199)
Bumps [pyinstaller](https://github.com/pyinstaller/pyinstaller) from 6.15.0 to 6.16.0.
- [Release notes](https://github.com/pyinstaller/pyinstaller/releases)
- [Changelog](https://github.com/pyinstaller/pyinstaller/blob/develop/doc/CHANGES.rst)
- [Commits](https://github.com/pyinstaller/pyinstaller/compare/v6.15.0...v6.16.0)

---
updated-dependencies:
- dependency-name: pyinstaller
  dependency-version: 6.16.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-25 11:10:36 +03:00
dependabot[bot] d8b26181f1 Bump pytest-asyncio from 1.1.0 to 1.2.0 (#2200)
Bumps [pytest-asyncio](https://github.com/pytest-dev/pytest-asyncio) from 1.1.0 to 1.2.0.
- [Release notes](https://github.com/pytest-dev/pytest-asyncio/releases)
- [Commits](https://github.com/pytest-dev/pytest-asyncio/compare/v1.1.0...v1.2.0)

---
updated-dependencies:
- dependency-name: pytest-asyncio
  dependency-version: 1.2.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-25 11:10:27 +03:00
dependabot[bot] a60d96c7f2 Bump mypy from 1.18.1 to 1.18.2 (#2202)
Bumps [mypy](https://github.com/python/mypy) from 1.18.1 to 1.18.2.
- [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/python/mypy/compare/v1.18.1...v1.18.2)

---
updated-dependencies:
- dependency-name: mypy
  dependency-version: 1.18.2
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-25 11:10:22 +03:00
dependabot[bot] a3159b213b Bump black from 25.1.0 to 25.9.0 (#2203)
Bumps [black](https://github.com/psf/black) from 25.1.0 to 25.9.0.
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/compare/25.1.0...25.9.0)

---
updated-dependencies:
- dependency-name: black
  dependency-version: 25.9.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-25 11:10:06 +03:00
dependabot[bot] 123ead4c03 Bump mypy from 1.17.1 to 1.18.1 (#2197)
Bumps [mypy](https://github.com/python/mypy) from 1.17.1 to 1.18.1.
- [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/python/mypy/compare/v1.17.1...v1.18.1)

---
updated-dependencies:
- dependency-name: mypy
  dependency-version: 1.18.1
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-14 10:38:16 +02:00
dependabot[bot] cd7571ef57 Bump pytest-cov from 6.3.0 to 7.0.0 (#2196)
Bumps [pytest-cov](https://github.com/pytest-dev/pytest-cov) from 6.3.0 to 7.0.0.
- [Changelog](https://github.com/pytest-dev/pytest-cov/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest-cov/compare/v6.3.0...v7.0.0)

---
updated-dependencies:
- dependency-name: pytest-cov
  dependency-version: 7.0.0
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-10 13:28:44 +02:00
dependabot[bot] d922f9be25 Bump pytest-cov from 6.2.1 to 6.3.0 (#2195)
Bumps [pytest-cov](https://github.com/pytest-dev/pytest-cov) from 6.2.1 to 6.3.0.
- [Changelog](https://github.com/pytest-dev/pytest-cov/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest-cov/compare/v6.2.1...v6.3.0)

---
updated-dependencies:
- dependency-name: pytest-cov
  dependency-version: 6.3.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-08 10:06:55 +02:00
dependabot[bot] 3b20b36609 Bump pytest from 8.4.1 to 8.4.2 (#2194)
Bumps [pytest](https://github.com/pytest-dev/pytest) from 8.4.1 to 8.4.2.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/8.4.1...8.4.2)

---
updated-dependencies:
- dependency-name: pytest
  dependency-version: 8.4.2
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-05 11:59:27 +02:00
dependabot[bot] ba86981cf4 Bump pytest-rerunfailures from 15.1 to 16.0.1 (#2193)
Bumps [pytest-rerunfailures](https://github.com/pytest-dev/pytest-rerunfailures) from 15.1 to 16.0.1.
- [Changelog](https://github.com/pytest-dev/pytest-rerunfailures/blob/master/CHANGES.rst)
- [Commits](https://github.com/pytest-dev/pytest-rerunfailures/compare/15.1...16.0.1)

---
updated-dependencies:
- dependency-name: pytest-rerunfailures
  dependency-version: 16.0.1
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-04 20:26:34 +02:00
dependabot[bot] 561ced647f Bump pytest-rerunfailures from 15.1 to 16.0 (#2191)
Bumps [pytest-rerunfailures](https://github.com/pytest-dev/pytest-rerunfailures) from 15.1 to 16.0.
- [Changelog](https://github.com/pytest-dev/pytest-rerunfailures/blob/master/CHANGES.rst)
- [Commits](https://github.com/pytest-dev/pytest-rerunfailures/compare/15.1...16.0)

---
updated-dependencies:
- dependency-name: pytest-rerunfailures
  dependency-version: '16.0'
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-02 11:19:28 +02:00
dependabot[bot] 7be3ee8240 Bump coverage from 7.10.5 to 7.10.6 (#2192)
Bumps [coverage](https://github.com/nedbat/coveragepy) from 7.10.5 to 7.10.6.
- [Release notes](https://github.com/nedbat/coveragepy/releases)
- [Changelog](https://github.com/nedbat/coveragepy/blob/master/CHANGES.rst)
- [Commits](https://github.com/nedbat/coveragepy/compare/7.10.5...7.10.6)

---
updated-dependencies:
- dependency-name: coverage
  dependency-version: 7.10.6
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-02 11:19:14 +02:00
Soxoj 48ca13dc4d Make web interface accessible for Docker deployment by default (#2189) 2025-08-31 16:14:42 +02:00
dependabot[bot] 7f94e86259 Bump platformdirs from 4.3.8 to 4.4.0 (#2184)
Bumps [platformdirs](https://github.com/tox-dev/platformdirs) from 4.3.8 to 4.4.0.
- [Release notes](https://github.com/tox-dev/platformdirs/releases)
- [Changelog](https://github.com/tox-dev/platformdirs/blob/main/CHANGES.rst)
- [Commits](https://github.com/tox-dev/platformdirs/compare/4.3.8...4.4.0)

---
updated-dependencies:
- dependency-name: platformdirs
  dependency-version: 4.4.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-29 15:20:31 +02:00
dependabot[bot] c2ed1af4b4 Bump python-bidi from 0.6.3 to 0.6.6 (#2183)
Bumps [python-bidi](https://github.com/MeirKriheli/python-bidi) from 0.6.3 to 0.6.6.
- [Release notes](https://github.com/MeirKriheli/python-bidi/releases)
- [Changelog](https://github.com/MeirKriheli/python-bidi/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/MeirKriheli/python-bidi/compare/v0.6.3...v0.6.6)

---
updated-dependencies:
- dependency-name: python-bidi
  dependency-version: 0.6.6
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-29 15:20:23 +02:00
dependabot[bot] 648ba6e64c Bump typing-extensions from 4.14.1 to 4.15.0 (#2182)
Bumps [typing-extensions](https://github.com/python/typing_extensions) from 4.14.1 to 4.15.0.
- [Release notes](https://github.com/python/typing_extensions/releases)
- [Changelog](https://github.com/python/typing_extensions/blob/main/CHANGELOG.md)
- [Commits](https://github.com/python/typing_extensions/compare/4.14.1...4.15.0)

---
updated-dependencies:
- dependency-name: typing-extensions
  dependency-version: 4.15.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-29 15:20:15 +02:00
dependabot[bot] 56815d8368 Bump soupsieve from 2.7 to 2.8 (#2185)
Bumps [soupsieve](https://github.com/facelessuser/soupsieve) from 2.7 to 2.8.
- [Release notes](https://github.com/facelessuser/soupsieve/releases)
- [Commits](https://github.com/facelessuser/soupsieve/compare/2.7...2.8)

---
updated-dependencies:
- dependency-name: soupsieve
  dependency-version: '2.8'
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-29 15:20:05 +02:00
dependabot[bot] b178e97d90 Bump multidict from 6.6.3 to 6.6.4 (#2177)
Bumps [multidict](https://github.com/aio-libs/multidict) from 6.6.3 to 6.6.4.
- [Release notes](https://github.com/aio-libs/multidict/releases)
- [Changelog](https://github.com/aio-libs/multidict/blob/master/CHANGES.rst)
- [Commits](https://github.com/aio-libs/multidict/compare/v6.6.3...v6.6.4)

---
updated-dependencies:
- dependency-name: multidict
  dependency-version: 6.6.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-27 00:40:35 +02:00
dependabot[bot] a764198c2c Bump lxml from 6.0.0 to 6.0.1 (#2178)
Bumps [lxml](https://github.com/lxml/lxml) from 6.0.0 to 6.0.1.
- [Release notes](https://github.com/lxml/lxml/releases)
- [Changelog](https://github.com/lxml/lxml/blob/master/CHANGES.txt)
- [Commits](https://github.com/lxml/lxml/compare/lxml-6.0.0...lxml-6.0.1)

---
updated-dependencies:
- dependency-name: lxml
  dependency-version: 6.0.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-27 00:40:24 +02:00
dependabot[bot] 2c4684e4a9 Bump psutil from 6.1.1 to 7.0.0 (#2179)
Bumps [psutil](https://github.com/giampaolo/psutil) from 6.1.1 to 7.0.0.
- [Changelog](https://github.com/giampaolo/psutil/blob/master/HISTORY.rst)
- [Commits](https://github.com/giampaolo/psutil/compare/release-6.1.1...release-7.0.0)

---
updated-dependencies:
- dependency-name: psutil
  dependency-version: 7.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-27 00:40:15 +02:00
dependabot[bot] 8713e1a63e Bump coverage from 7.10.3 to 7.10.5 (#2180)
Bumps [coverage](https://github.com/nedbat/coveragepy) from 7.10.3 to 7.10.5.
- [Release notes](https://github.com/nedbat/coveragepy/releases)
- [Changelog](https://github.com/nedbat/coveragepy/blob/master/CHANGES.rst)
- [Commits](https://github.com/nedbat/coveragepy/compare/7.10.3...7.10.5)

---
updated-dependencies:
- dependency-name: coverage
  dependency-version: 7.10.5
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-27 00:40:08 +02:00
dependabot[bot] 55adc70d10 Bump aiohttp from 3.12.14 to 3.12.15 (#2181)
---
updated-dependencies:
- dependency-name: aiohttp
  dependency-version: 3.12.15
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-27 00:39:59 +02:00
dependabot[bot] 53fc83dbce Bump flake8 from 7.1.1 to 7.3.0 (#2171)
Bumps [flake8](https://github.com/pycqa/flake8) from 7.1.1 to 7.3.0.
- [Commits](https://github.com/pycqa/flake8/compare/7.1.1...7.3.0)

---
updated-dependencies:
- dependency-name: flake8
  dependency-version: 7.3.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-22 21:15:03 +02:00
dependabot[bot] e8bd00f013 Bump pytest from 8.3.4 to 8.4.1 (#2172)
Bumps [pytest](https://github.com/pytest-dev/pytest) from 8.3.4 to 8.4.1.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/8.3.4...8.4.1)

---
updated-dependencies:
- dependency-name: pytest
  dependency-version: 8.4.1
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-22 21:14:55 +02:00
dependabot[bot] a0ba853e64 Bump mypy from 1.14.1 to 1.17.1 (#2173)
Bumps [mypy](https://github.com/python/mypy) from 1.14.1 to 1.17.1.
- [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/python/mypy/compare/v1.14.1...v1.17.1)

---
updated-dependencies:
- dependency-name: mypy
  dependency-version: 1.17.1
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-22 21:14:48 +02:00
dependabot[bot] 54b4c7d2ab Bump pyinstaller from 6.11.1 to 6.15.0 (#2174)
Bumps [pyinstaller](https://github.com/pyinstaller/pyinstaller) from 6.11.1 to 6.15.0.
- [Release notes](https://github.com/pyinstaller/pyinstaller/releases)
- [Changelog](https://github.com/pyinstaller/pyinstaller/blob/develop/doc/CHANGES.rst)
- [Commits](https://github.com/pyinstaller/pyinstaller/compare/v6.11.1...v6.15.0)

---
updated-dependencies:
- dependency-name: pyinstaller
  dependency-version: 6.15.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-22 21:14:35 +02:00
dependabot[bot] 8791bca866 Bump flask from 3.1.1 to 3.1.2 (#2175)
Bumps [flask](https://github.com/pallets/flask) from 3.1.1 to 3.1.2.
- [Release notes](https://github.com/pallets/flask/releases)
- [Changelog](https://github.com/pallets/flask/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/flask/compare/3.1.1...3.1.2)

---
updated-dependencies:
- dependency-name: flask
  dependency-version: 3.1.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-22 21:14:24 +02:00
Soxoj fb26ccd1f6 Disabled some sites giving false positive results (#2170) 2025-08-22 03:10:47 +02:00
dependabot[bot] c22abdb834 Bump certifi from 2025.6.15 to 2025.8.3 (#2147)
Bumps [certifi](https://github.com/certifi/python-certifi) from 2025.6.15 to 2025.8.3.
- [Commits](https://github.com/certifi/python-certifi/compare/2025.06.15...2025.08.03)

---
updated-dependencies:
- dependency-name: certifi
  dependency-version: 2025.8.3
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-21 16:27:10 +02:00
dependabot[bot] 0689470506 Bump alive-progress from 3.2.0 to 3.3.0 (#2145)
Bumps [alive-progress](https://github.com/rsalmei/alive-progress) from 3.2.0 to 3.3.0.
- [Changelog](https://github.com/rsalmei/alive-progress/blob/main/CHANGELOG.md)
- [Commits](https://github.com/rsalmei/alive-progress/compare/v3.2.0...v3.3.0)

---
updated-dependencies:
- dependency-name: alive-progress
  dependency-version: 3.3.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-21 16:27:02 +02:00
dependabot[bot] 410d7568b7 Bump aiodns from 3.2.0 to 3.5.0 (#2148)
Bumps [aiodns](https://github.com/saghul/aiodns) from 3.2.0 to 3.5.0.
- [Release notes](https://github.com/saghul/aiodns/releases)
- [Changelog](https://github.com/aio-libs/aiodns/blob/master/ChangeLog)
- [Commits](https://github.com/saghul/aiodns/compare/v3.2.0...v3.5.0)

---
updated-dependencies:
- dependency-name: aiodns
  dependency-version: 3.5.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-21 10:03:57 +02:00
dependabot[bot] 7280033198 Bump lxml from 5.3.0 to 6.0.0 (#2146)
Bumps [lxml](https://github.com/lxml/lxml) from 5.3.0 to 6.0.0.
- [Release notes](https://github.com/lxml/lxml/releases)
- [Changelog](https://github.com/lxml/lxml/blob/master/CHANGES.txt)
- [Commits](https://github.com/lxml/lxml/compare/lxml-5.3.0...lxml-6.0.0)

---
updated-dependencies:
- dependency-name: lxml
  dependency-version: 6.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-21 10:03:46 +02:00
dependabot[bot] 3c6af42916 Bump requests from 2.32.4 to 2.32.5 (#2165)
Bumps [requests](https://github.com/psf/requests) from 2.32.4 to 2.32.5.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](https://github.com/psf/requests/compare/v2.32.4...v2.32.5)

---
updated-dependencies:
- dependency-name: requests
  dependency-version: 2.32.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-21 10:02:46 +02:00
dependabot[bot] cdb896ba32 Bump xhtml2pdf from 0.2.16 to 0.2.17 (#2149)
Bumps [xhtml2pdf](https://github.com/xhtml2pdf/xhtml2pdf) from 0.2.16 to 0.2.17.
- [Release notes](https://github.com/xhtml2pdf/xhtml2pdf/releases)
- [Commits](https://github.com/xhtml2pdf/xhtml2pdf/compare/v0.2.16...v0.2.17)

---
updated-dependencies:
- dependency-name: xhtml2pdf
  dependency-version: 0.2.17
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-21 09:57:49 +02:00
dependabot[bot] 6bd047fda3 Bump pytest-cov from 6.0.0 to 6.2.1 (#2115)
Bumps [pytest-cov](https://github.com/pytest-dev/pytest-cov) from 6.0.0 to 6.2.1.
- [Changelog](https://github.com/pytest-dev/pytest-cov/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest-cov/compare/v6.0.0...v6.2.1)

---
updated-dependencies:
- dependency-name: pytest-cov
  dependency-version: 6.2.1
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-16 14:07:36 +02:00
dependabot[bot] e30cf353a6 Bump pytest-asyncio from 1.0.0 to 1.1.0 (#2114)
Bumps [pytest-asyncio](https://github.com/pytest-dev/pytest-asyncio) from 1.0.0 to 1.1.0.
- [Release notes](https://github.com/pytest-dev/pytest-asyncio/releases)
- [Commits](https://github.com/pytest-dev/pytest-asyncio/compare/v1.0.0...v1.1.0)

---
updated-dependencies:
- dependency-name: pytest-asyncio
  dependency-version: 1.1.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-16 14:03:49 +02:00
dependabot[bot] bd9e48de7c Bump mock from 5.1.0 to 5.2.0 (#2116)
Bumps [mock](https://github.com/testing-cabal/mock) from 5.1.0 to 5.2.0.
- [Changelog](https://github.com/testing-cabal/mock/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/testing-cabal/mock/compare/5.1.0...5.2.0)

---
updated-dependencies:
- dependency-name: mock
  dependency-version: 5.2.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-16 14:03:40 +02:00
dependabot[bot] aec4fef8db Bump soupsieve from 2.6 to 2.7 (#2118)
Bumps [soupsieve](https://github.com/facelessuser/soupsieve) from 2.6 to 2.7.
- [Release notes](https://github.com/facelessuser/soupsieve/releases)
- [Commits](https://github.com/facelessuser/soupsieve/compare/2.6...2.7)

---
updated-dependencies:
- dependency-name: soupsieve
  dependency-version: '2.7'
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-16 14:03:27 +02:00
dependabot[bot] 1da49bd208 Bump coverage from 7.9.2 to 7.10.3 (#2117)
Bumps [coverage](https://github.com/nedbat/coveragepy) from 7.9.2 to 7.10.3.
- [Release notes](https://github.com/nedbat/coveragepy/releases)
- [Changelog](https://github.com/nedbat/coveragepy/blob/master/CHANGES.rst)
- [Commits](https://github.com/nedbat/coveragepy/compare/7.9.2...7.10.3)

---
updated-dependencies:
- dependency-name: coverage
  dependency-version: 7.10.3
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-16 14:03:19 +02:00
dependabot[bot] 6da39cf3d5 Bump pypdf from 5.1.0 to 6.0.0 (#2122)
Bumps [pypdf](https://github.com/py-pdf/pypdf) from 5.1.0 to 6.0.0.
- [Release notes](https://github.com/py-pdf/pypdf/releases)
- [Changelog](https://github.com/py-pdf/pypdf/blob/main/CHANGELOG.md)
- [Commits](https://github.com/py-pdf/pypdf/compare/5.1.0...6.0.0)

---
updated-dependencies:
- dependency-name: pypdf
  dependency-version: 6.0.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-16 14:03:08 +02:00
Soxoj f869eb49ca Updated workflows: added 3.13 to test, updated pypi upload (#2111) 2025-08-10 14:10:06 +02:00
Soxoj bebadb0362 Bump to 0.5.0 (#2108) 2025-08-10 13:10:50 +02:00
dependabot[bot] 495eef6ad5 Bump pytest-rerunfailures from 15.0 to 15.1 (#2030)
Bumps [pytest-rerunfailures](https://github.com/pytest-dev/pytest-rerunfailures) from 15.0 to 15.1.
- [Changelog](https://github.com/pytest-dev/pytest-rerunfailures/blob/master/CHANGES.rst)
- [Commits](https://github.com/pytest-dev/pytest-rerunfailures/compare/15.0...15.1)

---
updated-dependencies:
- dependency-name: pytest-rerunfailures
  dependency-version: '15.1'
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-10 12:58:35 +02:00
dependabot[bot] e1c72bfb94 Bump multidict from 6.1.0 to 6.6.3 (#2034)
Bumps [multidict](https://github.com/aio-libs/multidict) from 6.1.0 to 6.6.3.
- [Release notes](https://github.com/aio-libs/multidict/releases)
- [Changelog](https://github.com/aio-libs/multidict/blob/master/CHANGES.rst)
- [Commits](https://github.com/aio-libs/multidict/compare/v6.1.0...v6.6.3)

---
updated-dependencies:
- dependency-name: multidict
  dependency-version: 6.6.3
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-10 12:54:40 +02:00
dependabot[bot] deb13c9638 Bump asgiref from 3.8.1 to 3.9.1 (#2040)
Bumps [asgiref](https://github.com/django/asgiref) from 3.8.1 to 3.9.1.
- [Changelog](https://github.com/django/asgiref/blob/main/CHANGELOG.txt)
- [Commits](https://github.com/django/asgiref/compare/3.8.1...3.9.1)

---
updated-dependencies:
- dependency-name: asgiref
  dependency-version: 3.9.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-10 12:50:12 +02:00
dependabot[bot] 1e8e1acd58 Bump reportlab from 4.2.5 to 4.4.3 (#2063)
Bumps [reportlab](https://www.reportlab.com/) from 4.2.5 to 4.4.3.

---
updated-dependencies:
- dependency-name: reportlab
  dependency-version: 4.4.3
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-10 12:47:18 +02:00
Soxoj 5e88fd9ba8 Fixed test dialog_adds_site_negative (#2107) 2025-08-10 12:44:05 +02:00
dependabot[bot] 6bc836d6c4 Bump yarl from 1.18.3 to 1.20.1 (#2032)
Bumps [yarl](https://github.com/aio-libs/yarl) from 1.18.3 to 1.20.1.
- [Release notes](https://github.com/aio-libs/yarl/releases)
- [Changelog](https://github.com/aio-libs/yarl/blob/master/CHANGES.rst)
- [Commits](https://github.com/aio-libs/yarl/commits)

---
updated-dependencies:
- dependency-name: yarl
  dependency-version: 1.20.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-10 12:29:46 +02:00
dependabot[bot] 080611c8b9 Bump aiohttp from 3.11.11 to 3.12.14 (#2041)
Bumps [aiohttp](https://github.com/aio-libs/aiohttp) from 3.11.11 to 3.12.14.
- [Release notes](https://github.com/aio-libs/aiohttp/releases)
- [Changelog](https://github.com/aio-libs/aiohttp/blob/master/CHANGES.rst)
- [Commits](https://github.com/aio-libs/aiohttp/compare/v3.11.11...v3.12.14)

---
updated-dependencies:
- dependency-name: aiohttp
  dependency-version: 3.12.14
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-07-18 18:15:14 +02:00
dependabot[bot] c3cf589aed Bump coverage from 7.6.10 to 7.9.2 (#2039)
Bumps [coverage](https://github.com/nedbat/coveragepy) from 7.6.10 to 7.9.2.
- [Release notes](https://github.com/nedbat/coveragepy/releases)
- [Changelog](https://github.com/nedbat/coveragepy/blob/master/CHANGES.rst)
- [Commits](https://github.com/nedbat/coveragepy/compare/7.6.10...7.9.2)

---
updated-dependencies:
- dependency-name: coverage
  dependency-version: 7.9.2
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-07-15 09:39:52 +02:00
dependabot[bot] e01d5caae1 Bump platformdirs from 4.3.6 to 4.3.8 (#2033)
Bumps [platformdirs](https://github.com/tox-dev/platformdirs) from 4.3.6 to 4.3.8.
- [Release notes](https://github.com/tox-dev/platformdirs/releases)
- [Changelog](https://github.com/tox-dev/platformdirs/blob/main/CHANGES.rst)
- [Commits](https://github.com/tox-dev/platformdirs/compare/4.3.6...4.3.8)

---
updated-dependencies:
- dependency-name: platformdirs
  dependency-version: 4.3.8
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-07-13 16:19:40 +02:00
MR-VL d90d8a8ac9 Disable AskFM (#2037) 2025-07-13 16:16:49 +02:00
dependabot[bot] c3ce8a200b Bump typing-extensions from 4.12.2 to 4.14.1 (#2038)
---
updated-dependencies:
- dependency-name: typing-extensions
  dependency-version: 4.14.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-07-13 16:16:16 +02:00
dependabot[bot] 65ea5ceeb1 Bump certifi from 2024.12.14 to 2025.1.31 (#2004)
Bumps [certifi](https://github.com/certifi/python-certifi) from 2024.12.14 to 2025.1.31.
- [Commits](https://github.com/certifi/python-certifi/compare/2024.12.14...2025.01.31)

---
updated-dependencies:
- dependency-name: certifi
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-28 23:52:07 +02:00
dependabot[bot] bca1d4bfd8 Bump attrs from 24.3.0 to 25.3.0 (#2014)
Bumps [attrs](https://github.com/sponsors/hynek) from 24.3.0 to 25.3.0.
- [Commits](https://github.com/sponsors/hynek/commits)

---
updated-dependencies:
- dependency-name: attrs
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-28 23:46:19 +02:00
Darlyson Rangel c9e38632ca Disable ICQ site (#1993) 2025-06-28 23:46:09 +02:00
dependabot[bot] 5f8ce2da98 Bump urllib3 from 2.2.3 to 2.5.0 (#2027)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.2.3 to 2.5.0.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.2.3...2.5.0)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-version: 2.5.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-28 23:43:19 +02:00
dependabot[bot] bc6f7f831d Bump pytest-asyncio from 0.25.2 to 0.26.0 (#2016)
Bumps [pytest-asyncio](https://github.com/pytest-dev/pytest-asyncio) from 0.25.2 to 0.26.0.
- [Release notes](https://github.com/pytest-dev/pytest-asyncio/releases)
- [Commits](https://github.com/pytest-dev/pytest-asyncio/compare/v0.25.2...v0.26.0)

---
updated-dependencies:
- dependency-name: pytest-asyncio
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-28 23:43:08 +02:00
dependabot[bot] f95c71d009 Bump pycares from 4.5.0 to 4.9.0 (#2025)
Bumps [pycares](https://github.com/saghul/pycares) from 4.5.0 to 4.9.0.
- [Release notes](https://github.com/saghul/pycares/releases)
- [Changelog](https://github.com/saghul/pycares/blob/master/ChangeLog)
- [Commits](https://github.com/saghul/pycares/compare/v4.5.0...v4.9.0)

---
updated-dependencies:
- dependency-name: pycares
  dependency-version: 4.9.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-28 23:39:52 +02:00
dependabot[bot] 974c93f327 Bump requests from 2.32.3 to 2.32.4 (#2026)
Bumps [requests](https://github.com/psf/requests) from 2.32.3 to 2.32.4.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](https://github.com/psf/requests/compare/v2.32.3...v2.32.4)

---
updated-dependencies:
- dependency-name: requests
  dependency-version: 2.32.4
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-28 23:39:39 +02:00
dependabot[bot] ed7b65e5ed Bump flask from 3.1.0 to 3.1.1 (#2028)
Bumps [flask](https://github.com/pallets/flask) from 3.1.0 to 3.1.1.
- [Release notes](https://github.com/pallets/flask/releases)
- [Changelog](https://github.com/pallets/flask/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/flask/compare/3.1.0...3.1.1)

---
updated-dependencies:
- dependency-name: flask
  dependency-version: 3.1.1
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-28 23:33:49 +02:00
Pierre-Yves Lapersonne f76ea5d738 [#2010] Add 6 more websites to manage (#2009)
* feat: add `framapiaf.org` in supported web sites, add tag `mastodon` (#2010)

Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info>

* feat: add `write.as` in supported web sites, add tag `writefreely` (#2010)

Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info>

* feat: add `programming.dev` in supported web sites, add tag `lemmy` (#2010)

Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info>

* feat: add `mamot.fr` in supported web sites (#2010)

Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info>

* feat: add `pixelfed.social` in supported web sites, add tag `pixelfed` (#2010)

Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info>

* feat: add `Outgress` in supported web sites (#2010)

Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info>

* Updated the list of supported sites

---------

Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info>
Co-authored-by: Soxoj <soxoj@protonmail.com>
2025-06-28 23:33:29 +02:00
dependabot[bot] 960b28d454 Bump jinja2 from 3.1.5 to 3.1.6 (#2011)
Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.5 to 3.1.6.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/jinja/compare/3.1.5...3.1.6)

---
updated-dependencies:
- dependency-name: jinja2
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-28 23:24:46 +02:00
dependabot[bot] 09eef6701a Bump cryptography from 44.0.0 to 44.0.1 (#2005)
Bumps [cryptography](https://github.com/pyca/cryptography) from 44.0.0 to 44.0.1.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/44.0.0...44.0.1)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-28 23:24:32 +02:00
Sammy Folkhome 329ef27eff Update Installer.bat (#1994)
Greatly improved the installer overall and fixed an issue where newer versions of pip did not install packages
2025-06-28 23:23:11 +02:00
dependabot[bot] ccb3b3bbd1 Bump black from 24.10.0 to 25.1.0 (#2001)
Bumps [black](https://github.com/psf/black) from 24.10.0 to 25.1.0.
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/compare/24.10.0...25.1.0)

---
updated-dependencies:
- dependency-name: black
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-28 23:20:13 +02:00
pykereaper b21ac36b27 Fix usage of data.json files from web (#2020) 2025-06-28 23:20:02 +02:00
pykereaper 0f7aa2c456 Pass db_file configuration to web interface (#2019)
* pass db_file configuration to web interface
* Autoformatting

---------

Co-authored-by: Soxoj <soxoj@protonmail.com>
2025-06-28 23:15:56 +02:00
Soxoj c0e60e25b8 upload-artifact action in python test workflow updated to v4 (#2024)
* upload-artifact action in python test workflow updated to v4
* Upload artifacts for all jobs
2025-06-28 23:08:56 +02:00
Ikko Eltociear Ashimine 4195a3ca21 docs: update usage-examples.rst (#1996)
inital -> initial
2025-02-18 10:50:29 +01:00
dependabot[bot] 5b3b81b482 Bump pytest-asyncio from 0.25.1 to 0.25.2 (#1990)
Bumps [pytest-asyncio](https://github.com/pytest-dev/pytest-asyncio) from 0.25.1 to 0.25.2.
- [Release notes](https://github.com/pytest-dev/pytest-asyncio/releases)
- [Commits](https://github.com/pytest-dev/pytest-asyncio/compare/v0.25.1...v0.25.2)

---
updated-dependencies:
- dependency-name: pytest-asyncio
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-01-19 16:18:40 +01:00
dependabot[bot] 29d2c07a76 Bump mypy from 1.14.0 to 1.14.1 (#1988)
Bumps [mypy](https://github.com/python/mypy) from 1.14.0 to 1.14.1.
- [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/python/mypy/compare/v1.14.0...v1.14.1)

---
updated-dependencies:
- dependency-name: mypy
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-01-02 21:10:20 +05:00
dependabot[bot] 7ff2424de1 Bump pytest-asyncio from 0.25.0 to 0.25.1 (#1989)
Bumps [pytest-asyncio](https://github.com/pytest-dev/pytest-asyncio) from 0.25.0 to 0.25.1.
- [Release notes](https://github.com/pytest-dev/pytest-asyncio/releases)
- [Commits](https://github.com/pytest-dev/pytest-asyncio/compare/v0.25.0...v0.25.1)

---
updated-dependencies:
- dependency-name: pytest-asyncio
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-01-02 21:07:27 +05:00
dependabot[bot] fc1dd9380e Bump coverage from 7.6.9 to 7.6.10 (#1986)
Bumps [coverage](https://github.com/nedbat/coveragepy) from 7.6.9 to 7.6.10.
- [Release notes](https://github.com/nedbat/coveragepy/releases)
- [Changelog](https://github.com/nedbat/coveragepy/blob/master/CHANGES.rst)
- [Commits](https://github.com/nedbat/coveragepy/compare/7.6.9...7.6.10)

---
updated-dependencies:
- dependency-name: coverage
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-28 20:02:37 +01:00
dependabot[bot] e423d72576 Bump jinja2 from 3.1.4 to 3.1.5 (#1982)
Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.4 to 3.1.5.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/jinja/compare/3.1.4...3.1.5)

---
updated-dependencies:
- dependency-name: jinja2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-28 19:59:41 +01:00
dependabot[bot] 9bc6c3370c Bump aiohttp-socks from 0.10.0 to 0.10.1 (#1987)
Bumps [aiohttp-socks](https://github.com/romis2012/aiohttp-socks) from 0.10.0 to 0.10.1.
- [Release notes](https://github.com/romis2012/aiohttp-socks/releases)
- [Commits](https://github.com/romis2012/aiohttp-socks/compare/v0.10.0...v0.10.1)

---
updated-dependencies:
- dependency-name: aiohttp-socks
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-28 19:56:12 +01:00
dependabot[bot] b90cdb1981 Bump mypy from 1.13.0 to 1.14.0 (#1983)
Bumps [mypy](https://github.com/python/mypy) from 1.13.0 to 1.14.0.
- [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/python/mypy/compare/v1.13.0...v1.14.0)

---
updated-dependencies:
- dependency-name: mypy
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-26 12:28:44 +01:00
dependabot[bot] 21b35e3798 Bump aiohttp-socks from 0.9.1 to 0.10.0 (#1985)
Bumps [aiohttp-socks](https://github.com/romis2012/aiohttp-socks) from 0.9.1 to 0.10.0.
- [Release notes](https://github.com/romis2012/aiohttp-socks/releases)
- [Commits](https://github.com/romis2012/aiohttp-socks/compare/v0.9.1...v0.10.0)

---
updated-dependencies:
- dependency-name: aiohttp-socks
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-26 11:43:41 +01:00
dependabot[bot] b8cf91cc8b Bump psutil from 6.1.0 to 6.1.1 (#1980)
Bumps [psutil](https://github.com/giampaolo/psutil) from 6.1.0 to 6.1.1.
- [Changelog](https://github.com/giampaolo/psutil/blob/master/HISTORY.rst)
- [Commits](https://github.com/giampaolo/psutil/compare/release-6.1.0...release-6.1.1)

---
updated-dependencies:
- dependency-name: psutil
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-20 10:03:52 +01:00
dependabot[bot] 8d5e557720 Bump aiohttp from 3.11.10 to 3.11.11 (#1979)
Bumps [aiohttp](https://github.com/aio-libs/aiohttp) from 3.11.10 to 3.11.11.
- [Release notes](https://github.com/aio-libs/aiohttp/releases)
- [Changelog](https://github.com/aio-libs/aiohttp/blob/master/CHANGES.rst)
- [Commits](https://github.com/aio-libs/aiohttp/compare/v3.11.10...v3.11.11)

---
updated-dependencies:
- dependency-name: aiohttp
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-19 12:04:31 +01:00
Soxoj 97e5f600d0 Async generator-executor for site checks (#1978) 2024-12-17 22:48:11 +01:00
overcuriousity 36ce285572 make graph more meaningful (#1977)
* make graph more meaningful

If a search with multiple usernames is launched, an additional site node is created where they are both found.
Advantages:
- clearer recognition that the users have a connection with each other
- better detection of false positives when launching a search with two fake usernames (site node = definite false positive)

* fix Graph linking report.py
2024-12-17 16:51:19 +01:00
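The shared site node described in the commit message above can be illustrated with a minimal sketch. This is not Maigret's actual `report.py` code: the `build_edges` helper, the input mapping, and the sample usernames are all hypothetical, and plain Python sets stand in for whatever graph library the report uses.

```python
from collections import defaultdict


def build_edges(found):
    """Return (edges, shared_sites) for a {username: [site, ...]} mapping.

    Hypothetical helper for illustration only, not Maigret's report code.
    """
    # Group usernames by the site they were found on.
    site_to_users = defaultdict(set)
    for username, sites in found.items():
        for site in sites:
            site_to_users[site].add(username)

    # One node per site: every username found there links to that node.
    edges = set()
    for site, users in site_to_users.items():
        for username in users:
            edges.add((username, site))

    # A site reached by more than one username connects those usernames.
    shared_sites = sorted(s for s, u in site_to_users.items() if len(u) > 1)
    return edges, shared_sites


# Hypothetical search results for two usernames.
found = {
    "alice": ["GitHub", "Reddit"],
    "bob": ["GitHub"],
}
edges, shared = build_edges(found)
print(shared)
```

Under these assumptions, a site that appears in `shared` when both usernames are known to be fake would be a strong false-positive candidate, which is the second advantage the commit message lists.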
overcuriousity c2e3e96cb7 Improving the web interface (#1975)
* update web interface with commandline options
* improve web interface
* update README images of web interface
* fix bug in app.py
* fix web interface
2024-12-17 16:50:49 +01:00
imgbot[bot] 900ed840b3 [ImgBot] Optimize images (#1974)
*Total -- 2,762.12kb -> 2,382.30kb (13.75%)

/static/web_interface_screenshot_start.png -- 106.23kb -> 59.10kb (44.37%)
/docs/source/maigret_screenshot.png -- 375.21kb -> 233.98kb (37.64%)
/static/web_interface_screenshot.png -- 615.65kb -> 424.23kb (31.09%)
/static/recursive_search.svg -- 1,665.02kb -> 1,664.99kb (0%)

Signed-off-by: ImgBotApp <ImgBotHelp@gmail.com>
Co-authored-by: ImgBotApp <ImgBotHelp@gmail.com>
2024-12-16 19:24:08 +01:00
Soxoj c3dfe9cb4d Small docs and parameters fixes for web interface mode (#1973) 2024-12-16 17:18:22 +01:00
Soxoj 4894a267d7 Added web interface docs (#1972) 2024-12-16 17:06:06 +01:00
dependabot[bot] 984584f87d Bump attrs from 24.2.0 to 24.3.0 (#1970)
Bumps [attrs](https://github.com/sponsors/hynek) from 24.2.0 to 24.3.0.
- [Commits](https://github.com/sponsors/hynek/commits)

---
updated-dependencies:
- dependency-name: attrs
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-16 14:46:22 +01:00
dependabot[bot] a96d574000 Bump certifi from 2024.8.30 to 2024.12.14 (#1969)
Bumps [certifi](https://github.com/certifi/python-certifi) from 2024.8.30 to 2024.12.14.
- [Commits](https://github.com/certifi/python-certifi/compare/2024.08.30...2024.12.14)

---
updated-dependencies:
- dependency-name: certifi
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-16 14:27:20 +01:00
overcuriousity 88d68490f3 Created web frontend launched via --web flag (#1967)
Author: overcuriousity 
Co-authored-by: Soxoj <soxoj@protonmail.com>
2024-12-16 14:24:03 +01:00
68 changed files with 34656 additions and 28895 deletions
+8 -1
@@ -1,3 +1,10 @@
#!/bin/sh
echo 'Activating update_sitesmd hook script...'
poetry run update_sitesmd
poetry run update_sitesmd
echo 'Regenerating db_meta.json...'
python3 utils/generate_db_meta.py
git add maigret/resources/db_meta.json
git add maigret/resources/data.json
git add sites.md
+54 -39
@@ -2,54 +2,69 @@ name: Package exe with PyInstaller - Windows
on:
push:
branches: [ main, dev ]
branches: [main, dev]
jobs:
build:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Checkout
uses: actions/checkout@v4
- name: PyInstaller Windows Build
uses: JackMcKew/pyinstaller-action-windows@main
with:
path: pyinstaller
# Wine Python (not Linux) runs PyInstaller; altgraph needs pkg_resources — reinstall setuptools after all deps.
- name: Prepare requirements for Wine (setuptools last)
run: |
set -euo pipefail
cp pyinstaller/requirements.txt pyinstaller/requirements-wine.txt
{
echo ""
echo "# CI: setuptools last so pkg_resources exists for PyInstaller/altgraph in Wine"
echo "setuptools==70.0.0"
} >> pyinstaller/requirements-wine.txt
- name: Upload PyInstaller Binary to Workflow as Artifact
uses: actions/upload-artifact@v4
with:
name: maigret_standalone_win32
path: pyinstaller/dist/windows
- name: PyInstaller Windows Build
uses: JackMcKew/pyinstaller-action-windows@main
with:
path: pyinstaller
requirements: requirements-wine.txt
- name: Download PyInstaller Binary
uses: actions/download-artifact@v4
with:
name: maigret_standalone_win32
- name: Upload PyInstaller Binary to Workflow as Artifact
if: success()
uses: actions/upload-artifact@v4
with:
name: maigret_standalone_win32
path: pyinstaller/dist/windows
- name: Create New Release and Upload PyInstaller Binary to Release
uses: ncipollo/release-action@v1.14.0
id: create_release
with:
allowUpdates: true
draft: false
prerelease: false
artifactErrorsFailBuild: true
makeLatest: true
replacesArtifacts: true
artifacts: maigret_standalone.exe
name: Development Windows Release [${{ github.ref_name }}]
tag: ${{ github.ref_name }}
body: |
This is a development release built from the **${{ github.ref_name }}** branch.
- name: Download PyInstaller Binary
if: success()
uses: actions/download-artifact@v4
with:
name: maigret_standalone_win32
Take into account that `dev` releases may be unstable.
Please, use [the development release](https://github.com/soxoj/maigret/releases/tag/main) build from the **main** branch.
- name: Create New Release and Upload PyInstaller Binary to Release
if: success()
uses: ncipollo/release-action@v1.14.0
id: create_release
with:
allowUpdates: true
draft: false
prerelease: false
artifactErrorsFailBuild: true
makeLatest: true
replacesArtifacts: true
artifacts: maigret_standalone.exe
name: Development Windows Release [${{ github.ref_name }}]
tag: ${{ github.ref_name }}
body: |
This is a development release built from the **${{ github.ref_name }}** branch.
Instructions:
- Download the attached file `maigret_standalone.exe` to get the Windows executable.
- Video guide on how to run it: https://youtu.be/qIgwTZOmMmM
- For detailed documentation, visit: https://maigret.readthedocs.io/en/latest/
Take into account that `dev` releases may be unstable.
Please, use [the development release](https://github.com/soxoj/maigret/releases/tag/main) build from the **main** branch.
env:
GITHUB_TOKEN: ${{ github.token }}
Instructions:
- Download the attached file `maigret_standalone.exe` to get the Windows executable.
- Video guide on how to run it: https://youtu.be/qIgwTZOmMmM
- For detailed documentation, visit: https://maigret.readthedocs.io/en/latest/
env:
GITHUB_TOKEN: ${{ github.token }}
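The "Prepare requirements for Wine" step in the workflow above copies the requirements file and appends a pinned `setuptools` entry at the end. The following is a rough Python equivalent of that shell snippet, offered as a sketch only: the `write_wine_requirements` function name and the sample file contents are assumptions, while the pin and the comment text are taken from the workflow.

```python
import tempfile
from pathlib import Path


def write_wine_requirements(src, dst, pin="setuptools==70.0.0"):
    """Copy src to dst and append the setuptools pin as the last entry.

    Hypothetical helper mirroring the workflow's shell step.
    """
    text = Path(src).read_text()
    if not text.endswith("\n"):
        text += "\n"
    # Blank separator line, explanatory comment, then the pin itself.
    text += (
        "\n# CI: setuptools last so pkg_resources exists "
        "for PyInstaller/altgraph in Wine\n" + pin + "\n"
    )
    Path(dst).write_text(text)
    return text


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "requirements.txt"
    dst = Path(tmp) / "requirements-wine.txt"
    src.write_text("maigret\npyinstaller\n")  # hypothetical contents
    result = write_wine_requirements(src, dst)
    last_line = result.splitlines()[-1]

print(last_line)
```

The original file is left untouched and only the copy gains the extra entry, which matches the workflow passing `requirements-wine.txt` to the build action.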
+6 -3
@@ -13,7 +13,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.10", "3.11", "3.12"]
python-version: ["3.10", "3.11", "3.12", "3.13"]
steps:
- name: Checkout
@@ -22,6 +22,9 @@ jobs:
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
- name: Install system dependencies
run: |
sudo apt-get update && sudo apt-get install -y libcairo2-dev
- name: Install dependencies
run: |
python -m pip install --upgrade pip
@@ -33,7 +36,7 @@ jobs:
poetry run coverage report --fail-under=60
poetry run coverage html
- name: Upload coverage report
uses: actions/upload-artifact@v3
uses: actions/upload-artifact@v4
with:
name: htmlcov
name: htmlcov-${{ strategy.job-index }}
path: htmlcov
+12 -21
@@ -1,30 +1,21 @@
name: Upload Python Package to PyPI when a Release is Created
on:
release:
types: [created]
push:
tags:
- "v*"
permissions:
id-token: write
contents: read
jobs:
pypi-publish:
name: Publish release to PyPI
build-and-publish:
runs-on: ubuntu-latest
environment:
name: pypi
url: https://pypi.org/p/maigret
permissions:
id-token: write
steps:
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v4
- uses: astral-sh/setup-uv@v3
- run: uv build
- name: Publish to PyPI (Trusted Publishing)
uses: pypa/gh-action-pypi-publish@release/v1
with:
python-version: "3.x"
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install setuptools wheel
- name: Build package
run: |
python setup.py sdist bdist_wheel # Could also be python -m build
- name: Publish package distributions to PyPI
uses: pypa/gh-action-pypi-publish@release/v1
packages-dir: dist
+43 -17
@@ -1,34 +1,60 @@
name: Update sites rating and statistics
on:
pull_request:
branches: [ dev ]
types: [opened, synchronize]
push:
branches: [ main ]
concurrency:
group: update-sites-${{ github.ref }}
cancel-in-progress: true
jobs:
build:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v2.3.2
uses: actions/checkout@v4
with:
ref: ${{ github.event.pull_request.head.sha }}
ref: main
fetch-depth: 0 # otherwise, there would be errors pushing refs to the destination repository.
- name: build application
- name: Install system dependencies
run: |
sudo apt-get update && sudo apt-get install -y libcairo2-dev
- name: Build application
run: |
pip3 install .
python3 ./utils/update_site_data.py --empty-only
- name: Commit and push changes
- name: Regenerate db_meta.json
run: python3 utils/generate_db_meta.py
- name: Remove ambiguous main tag
run: git tag -d main || true
- name: Check for meaningful changes
id: check
run: |
git config --global user.name "Maigret autoupdate"
git config --global user.email "soxoj@protonmail.com"
echo `git name-rev ${{ github.event.pull_request.head.sha }} --name-only`
export BRANCH=`git name-rev ${{ github.event.pull_request.head.sha }} --name-only | sed 's/remotes\/origin\///'`
echo $BRANCH
git remote -v
git checkout $BRANCH
git add sites.md
git commit -m "Updated site list and statistics"
git push origin $BRANCH
REAL_CHANGES=$(git diff --unified=0 sites.md | grep '^[+-][^+-]' | grep -v 'The list was updated at' | wc -l)
if [ "$REAL_CHANGES" -gt 0 ]; then
echo "has_changes=true" >> $GITHUB_OUTPUT
else
echo "has_changes=false" >> $GITHUB_OUTPUT
fi
- name: Delete existing PR branch
if: steps.check.outputs.has_changes == 'true'
run: git push origin --delete auto/update-sites-list || true
- name: Create Pull Request
if: steps.check.outputs.has_changes == 'true'
uses: peter-evans/create-pull-request@v7
with:
token: ${{ secrets.GITHUB_TOKEN }}
commit-message: "Updated site list and statistics"
title: "Automated Sites List Update"
body: "Automated changes to sites.md based on new Alexa rankings/statistics."
branch: "auto/update-sites-list"
base: main
delete-branch: true
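The "Check for meaningful changes" step in the workflow above counts changed lines in `sites.md` while ignoring the regenerated timestamp line. The same filter can be sketched in Python to make the logic easier to follow. The function name `count_real_changes` and the sample diff are assumptions for illustration; only the `grep` patterns and the "The list was updated at" marker come from the workflow.

```python
def count_real_changes(diff_text, ignored="The list was updated at"):
    """Count added/removed lines in a unified diff, skipping the timestamp line.

    Hypothetical re-implementation of:
    grep '^[+-][^+-]' | grep -v 'The list was updated at' | wc -l
    """
    count = 0
    for line in diff_text.splitlines():
        # Keep lines with a +/- marker followed by a character that is
        # neither + nor -, which drops the '---' and '+++' file headers.
        if len(line) < 2 or line[0] not in "+-" or line[1] in "+-":
            continue
        # Drop the line that changes on every regeneration.
        if ignored in line:
            continue
        count += 1
    return count


# Hypothetical diff: one timestamp change plus one real change.
sample_diff = """--- a/sites.md
+++ b/sites.md
@@ -1 +1 @@
-The list was updated at (2025-01-01)
+The list was updated at (2025-01-02)
@@ -10 +10 @@
-1. Example (https://example.com), top 100
+1. Example (https://example.com), top 90
"""

timestamp_only = "\n".join(sample_diff.splitlines()[:5])

print(count_real_changes(sample_diff))
print(count_real_changes(timestamp_only))
```

A count of zero corresponds to `has_changes=false` in the workflow, so the pull request steps are skipped when only the timestamp moved.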
+1
@@ -43,3 +43,4 @@ settings.json
# other
*.egg-info
build
LLM
+249 -1
@@ -1,6 +1,254 @@
# Changelog
## [Unreleased]
## [0.5.0] - 2025-08-10
* Site Supression by @C3n7ral051nt4g3ncy in https://github.com/soxoj/maigret/pull/627
* Bump yarl from 1.7.2 to 1.8.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/626
* Streaming sites by @soxoj in https://github.com/soxoj/maigret/pull/628
* Mirrors by @fen0s in https://github.com/soxoj/maigret/pull/630
* Added Instagram scrapers by @soxoj in https://github.com/soxoj/maigret/pull/633
* Bump psutil from 5.9.1 to 5.9.2 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/624
* Bump pypdf2 from 2.10.4 to 2.10.5 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/625
* Invalid results fixes by @soxoj in https://github.com/soxoj/maigret/pull/634
* Bump pytest-httpserver from 1.0.5 to 1.0.6 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/638
* Bump pypdf2 from 2.10.5 to 2.10.8 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/641
* Bump certifi from 2022.6.15 to 2022.9.14 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/644
* Bump idna from 3.3 to 3.4 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/640
* fix false positives from bot by @fen0s in https://github.com/soxoj/maigret/pull/663
* Add pre commit hook by @fen0s in https://github.com/soxoj/maigret/pull/664
* site deletion by @C3n7ral051nt4g3ncy in https://github.com/soxoj/maigret/pull/648
* Changed docker run to interactive and remove on exit by @dr-BEat in https://github.com/soxoj/maigret/pull/675
* Corrected grammar in README.md by @Trkzi-Omar in https://github.com/soxoj/maigret/pull/674
* fix sites from issues by @fen0s in https://github.com/soxoj/maigret/pull/680
* correct username in usage examples by @LeonGr in https://github.com/soxoj/maigret/pull/673
* Update README.md by @johanburati in https://github.com/soxoj/maigret/pull/669
* Fix typos by @LorenzoSapora in https://github.com/soxoj/maigret/pull/681
* Build docker images for arm64 and amd64 by @krydos in https://github.com/soxoj/maigret/pull/687
* Bump certifi from 2022.9.14 to 2022.9.24 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/652
* Bump aiohttp from 3.8.1 to 3.8.3 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/651
* Bump arabic-reshaper from 2.1.3 to 2.1.4 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/650
* Update README.md, Repl.it -> Replit with new badge by @PeterDaveHello in https://github.com/soxoj/maigret/pull/692
* Refactor Dockerfile with best practices by @PeterDaveHello in https://github.com/soxoj/maigret/pull/691
* Improve README.md Installation section by @PeterDaveHello in https://github.com/soxoj/maigret/pull/690
* Bump pytest-cov from 3.0.0 to 4.0.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/688
* Bump stem from 1.8.0 to 1.8.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/689
* Bump typing-extensions from 4.3.0 to 4.4.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/698
* Typo fixes in error.py by @Ben-Chapman in https://github.com/soxoj/maigret/pull/711
* Fixed docs about tags by @soxoj in https://github.com/soxoj/maigret/pull/715
* Fixed lightstalking.com by @soxoj in https://github.com/soxoj/maigret/pull/716
* Fixed YouTube by @soxoj in https://github.com/soxoj/maigret/pull/717
* Bump pytest-asyncio from 0.19.0 to 0.20.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/732
* Updated snapcraft yaml by @kz6fittycent in https://github.com/soxoj/maigret/pull/720
* Bump colorama from 0.4.5 to 0.4.6 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/733
* Bump pytest from 7.1.3 to 7.2.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/734
* disable not working sites by @fen0s in https://github.com/soxoj/maigret/pull/739
* disable broken sites by @fen0s in https://github.com/soxoj/maigret/pull/756
* Bump cloudscraper from 1.2.64 to 1.2.66 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/769
* fix opensea and shutterstock, disable a few dead sites by @fen0s in https://github.com/soxoj/maigret/pull/798
* Fixed documentation URL by @soxoj in https://github.com/soxoj/maigret/pull/799
* Small readme fix by @soxoj in https://github.com/soxoj/maigret/pull/857
* docs spelling error by @Nadeem-05 in https://github.com/soxoj/maigret/pull/866
* Fix Pinterest false positive by @therealchiendat in https://github.com/soxoj/maigret/pull/862
* Added new Websites by @codyMar30 in https://github.com/soxoj/maigret/pull/838
* Update "future" package to v0.18.3 by @PeterDaveHello in https://github.com/soxoj/maigret/pull/834
* Bump certifi from 2022.9.24 to 2022.12.7 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/793
* Update dependency - networkx from v2.5.1 to v2.6 by @PeterDaveHello in https://github.com/soxoj/maigret/pull/738
* Bump reportlab from 3.6.11 to 3.6.12 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/735
* Bump typing-extensions from 4.4.0 to 4.5.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/888
* Bump psutil from 5.9.2 to 5.9.4 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/741
* Bump attrs from 22.1.0 to 22.2.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/892
* Bump multidict from 6.0.2 to 6.0.4 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/891
* Fixed false positives, updated networkx dep, some lint fixes by @soxoj in https://github.com/soxoj/maigret/pull/894
* Bump lxml from 4.9.1 to 4.9.2 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/900
* Bump yarl from 1.8.1 to 1.8.2 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/899
* Fixed false positives on Mastodon sites by @soxoj in https://github.com/soxoj/maigret/pull/901
* Added valid regex for Mastodon instances (#848) by @soxoj in https://github.com/soxoj/maigret/pull/906
* Fix missing Mastodon Regex on #906 by @therealchiendat in https://github.com/soxoj/maigret/pull/908
* Bump tqdm from 4.64.1 to 4.65.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/905
* Bump requests from 2.28.1 to 2.28.2 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/904
* Bump psutil from 5.9.4 to 5.9.5 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/910
* fix deployment of tests by @noraj in https://github.com/soxoj/maigret/pull/933
* Added 26 ENS and similar domains with tag `crypto` by @soxoj in https://github.com/soxoj/maigret/pull/942
* Bump requests from 2.28.2 to 2.31.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/957
* Update wizard.py by @engNoori in https://github.com/soxoj/maigret/pull/1016
* Improved search through UnstoppableDomains by @soxoj in https://github.com/soxoj/maigret/pull/1040
* Added memory.lol (Twitter usernames archive) by @soxoj in https://github.com/soxoj/maigret/pull/1067
* Disabled and fixed several sites by @soxoj in https://github.com/soxoj/maigret/pull/1132
* Fixed some sites (again) by @soxoj in https://github.com/soxoj/maigret/pull/1133
* fix(sec): upgrade reportlab to 3.6.13 by @realize096 in https://github.com/soxoj/maigret/pull/1051
* Add compatibility with pytest >= 7.3.0 by @tjni in https://github.com/soxoj/maigret/pull/1117
* Additionally fixed sites, win32 build fix by @soxoj in https://github.com/soxoj/maigret/pull/1148
* Sites fixes 250823 by @soxoj in https://github.com/soxoj/maigret/pull/1149
* Bump reportlab from 3.6.12 to 4.0.4 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1160
* Bump certifi from 2022.12.7 to 2023.7.22 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1070
* fix(sec): upgrade certifi to 2022.12.07 by @realize096 in https://github.com/soxoj/maigret/pull/1173
* Bump cloudscraper from 1.2.66 to 1.2.71 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/914
* Some sites fixed & cloudflare detection by @soxoj in https://github.com/soxoj/maigret/pull/1178
* EasyInstaller because everyone likes saving time :) by @CatchySmile in https://github.com/soxoj/maigret/pull/1212
* Tests fixes + last updates by @soxoj in https://github.com/soxoj/maigret/pull/1228
* Bump pypdf2 from 2.10.8 to 3.0.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/815
* Bump pyvis from 0.2.1 to 0.3.2 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/861
* Bump xhtml2pdf from 0.2.8 to 0.2.11 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/935
* Bump flake8 from 5.0.4 to 6.1.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1091
* Bump aiohttp from 3.8.3 to 3.8.6 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1222
* Specified pyinstaller version by @soxoj in https://github.com/soxoj/maigret/pull/1230
* Pyinstaller fix by @soxoj in https://github.com/soxoj/maigret/pull/1231
* Test pyinstaller on dev branch by @soxoj in https://github.com/soxoj/maigret/pull/1233
* Update main from dev again by @soxoj in https://github.com/soxoj/maigret/pull/1234
* Bump typing-extensions from 4.5.0 to 4.8.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1239
* Bump pytest-rerunfailures from 10.2 to 12.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1237
* Bump async-timeout from 4.0.2 to 4.0.3 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1238
* Changed pyinstaller dir by @soxoj in https://github.com/soxoj/maigret/pull/1245
* Bump tqdm from 4.65.0 to 4.66.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1235
* Updating site checkers, disabling suspended sites by @MeowyPouncer in https://github.com/soxoj/maigret/pull/1266
* Updated site statistics by @soxoj in https://github.com/soxoj/maigret/pull/1273
* Compat RegataOS (Opensuse) by @Jeiel0rbit in https://github.com/soxoj/maigret/pull/1308
* fix reddit by @hhhtylerw in https://github.com/soxoj/maigret/pull/1296
* Added Telegram bot link by @soxoj in https://github.com/soxoj/maigret/pull/1321
* Added SOWEL classification by @soxoj in https://github.com/soxoj/maigret/pull/1453
* Bump jinja2 from 3.1.2 to 3.1.3 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1358
* Fixed/Disabled sites. Update requirements.txt by @rly0nheart in https://github.com/soxoj/maigret/pull/1517
* Fixed 4 sites, added 6 sites, disabled 27 sites by @rly0nheart in https://github.com/soxoj/maigret/pull/1536
* Fixed 3 sites, disabed 3, added by @rly0nheart in https://github.com/soxoj/maigret/pull/1539
* Bump socid-extractor from 0.0.24 to 0.0.26 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1546
* Added code conventions to CONTRIBUTING.md by @Lord-Topa in https://github.com/soxoj/maigret/pull/1589
* Readme by @Lord-Topa in https://github.com/soxoj/maigret/pull/1588
* Update data.json by @ranlo in https://github.com/soxoj/maigret/pull/1559
* Adding permutator feature for usernames by @balestek in https://github.com/soxoj/maigret/pull/1575
* Alik.cz indirectly requests removal by @ppfeister in https://github.com/soxoj/maigret/pull/1671
* Fixed 1 site, PyInstaller workflow, Google Colab example by @Ixve in https://github.com/soxoj/maigret/pull/1558
* Bump soupsieve from 2.5 to 2.6 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1708
* Added dev documentation, fixed some sites, removed GitHub issue links… by @soxoj in https://github.com/soxoj/maigret/pull/1869
* Bump cryptography from 42.0.7 to 43.0.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1870
* Bump requests-futures from 1.0.1 to 1.0.2 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1868
* Bump werkzeug from 3.0.3 to 3.0.6 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1846
* Added .readthedocs.yaml, fixed Pyinstaller and Docker workflows by @soxoj in https://github.com/soxoj/maigret/pull/1874
* Added GitHub and BuyMeACoffee sponsorships by @soxoj in https://github.com/soxoj/maigret/pull/1875
* Bump psutil from 5.9.5 to 6.1.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1839
* Bump flake8 from 6.1.0 to 7.1.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1692
* Bump future from 0.18.3 to 1.0.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1545
* Bump urllib3 from 2.2.1 to 2.2.2 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1600
* Bump certifi from 2023.11.17 to 2024.8.30 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1840
* Fixed test for aiohttp 3.10 by @soxoj in https://github.com/soxoj/maigret/pull/1876
* Bump aiohttp from 3.9.5 to 3.10.5 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1721
* Added new badges to README by @soxoj in https://github.com/soxoj/maigret/pull/1877
* Show detailed error statistics for `-v` by @soxoj in https://github.com/soxoj/maigret/pull/1879
* Disabled unavailable sites by @soxoj in https://github.com/soxoj/maigret/pull/1880
* Added 7 sites, implemented integration with Marple, docs update by @soxoj in https://github.com/soxoj/maigret/pull/1881
* Bump pefile from 2022.5.30 to 2024.8.26 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1883
* Bump lxml from 4.9.4 to 5.3.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1884
* New sites added by @soxoj in https://github.com/soxoj/maigret/pull/1888
* Improved self-check mode, added 15 sites by @soxoj in https://github.com/soxoj/maigret/pull/1887
* Bump pyinstaller from 6.1 to 6.11.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1882
* Bump pytest-asyncio from 0.23.7 to 0.23.8 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1885
* Pyinstaller bump & pefile fix by @soxoj in https://github.com/soxoj/maigret/pull/1890
* Bump python-bidi from 0.4.2 to 0.6.3 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1886
* Sites checks fixes by @soxoj in https://github.com/soxoj/maigret/pull/1896
* Parallel execution optimization by @soxoj in https://github.com/soxoj/maigret/pull/1897
* Maigret bot support (custom progress function fixed) by @soxoj in https://github.com/soxoj/maigret/pull/1898
* Bump markupsafe from 2.1.5 to 3.0.2 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1895
* Retries set to 0 by default, refactored code of executor with progress by @soxoj in https://github.com/soxoj/maigret/pull/1899
* Bump aiohttp-socks from 0.7.1 to 0.9.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1900
* Bump pycountry from 23.12.11 to 24.6.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1903
* Bump pytest-cov from 4.1.0 to 6.0.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1902
* Bump pyvis from 0.2.1 to 0.3.2 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1893
* Close http connections (#1595) by @soxoj in https://github.com/soxoj/maigret/pull/1905
* New logo by @soxoj in https://github.com/soxoj/maigret/pull/1906
* Fixed dateutil parsing error for CDT timezone by @soxoj in https://github.com/soxoj/maigret/pull/1907
* Bump alive-progress from 2.4.1 to 3.2.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1910
* Permutator output and documentation updates by @soxoj in https://github.com/soxoj/maigret/pull/1914
* Bump aiohttp from 3.11.7 to 3.11.8 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1912
* Bump async-timeout from 4.0.3 to 5.0.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1909
* An recursive search animation in README has been updated by @soxoj in https://github.com/soxoj/maigret/pull/1915
* Bump pytest-rerunfailures from 12.0 to 15.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1911
* Bump attrs from 22.2.0 to 24.2.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1913
* Sites fixes by @soxoj in https://github.com/soxoj/maigret/pull/1917
* Update README.md by @soxoj in https://github.com/soxoj/maigret/pull/1919
* Refactored sites module, updated documentation by @soxoj in https://github.com/soxoj/maigret/pull/1918
* Bump aiohttp from 3.11.8 to 3.11.9 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1920
* Bump pytest from 7.4.4 to 8.3.4 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1923
* Bump yarl from 1.18.0 to 1.18.3 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1922
* Bump pytest-asyncio from 0.23.8 to 0.24.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1925
* Documentation update by @soxoj in https://github.com/soxoj/maigret/pull/1926
* Bump mock from 4.0.3 to 5.1.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1921
* Bump pywin32-ctypes from 0.2.1 to 0.2.3 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1924
* Installation docs update by @soxoj in https://github.com/soxoj/maigret/pull/1927
* Disabled Figma check by @soxoj in https://github.com/soxoj/maigret/pull/1928
* Put Windows executable in Releases for each dev and main commit by @soxoj in https://github.com/soxoj/maigret/pull/1929
* Updated PyInstaller workflow by @soxoj in https://github.com/soxoj/maigret/pull/1930
* Documentation update by @soxoj in https://github.com/soxoj/maigret/pull/1931
* Fixed Figma check and some bugs by @soxoj in https://github.com/soxoj/maigret/pull/1932
* Bump six from 1.16.0 to 1.17.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1933
* Activation mechanism documentation added by @soxoj in https://github.com/soxoj/maigret/pull/1935
* Readme/docs update based on GH discussions by @soxoj in https://github.com/soxoj/maigret/pull/1936
* Bump aiohttp from 3.11.9 to 3.11.10 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1937
* Weibo site check fix, activation mechanism added by @soxoj in https://github.com/soxoj/maigret/pull/1938
* Fixed Ebay and BongaCams checks by @soxoj in https://github.com/soxoj/maigret/pull/1939
* Sites fixes by @soxoj in https://github.com/soxoj/maigret/pull/1940
* Fixed Linktr and discourse.mozilla.org by @soxoj in https://github.com/soxoj/maigret/pull/1941
* Refactored self-check method, code formatting, small lint fixes by @soxoj in https://github.com/soxoj/maigret/pull/1942
* Refactoring, test coverage increased to 60% by @soxoj in https://github.com/soxoj/maigret/pull/1943
* Added a test for submitter by @soxoj in https://github.com/soxoj/maigret/pull/1944
* Update README.md by @soxoj in https://github.com/soxoj/maigret/pull/1949
* Updated OP.GG checks by @soxoj in https://github.com/soxoj/maigret/pull/1950
* Fixed ProductHunt check by @soxoj in https://github.com/soxoj/maigret/pull/1951
* Improved check feature extraction function, added tests by @soxoj in https://github.com/soxoj/maigret/pull/1952
* Submit improvements and site check fixes by @soxoj in https://github.com/soxoj/maigret/pull/1956
* chore: update submit.py by @eltociear in https://github.com/soxoj/maigret/pull/1957
* Fixed Gravatar parsing (socid_extractor) by @soxoj in https://github.com/soxoj/maigret/pull/1958
* Site check fixes by @soxoj in https://github.com/soxoj/maigret/pull/1962
* fix bad linux filename generation by @overcuriousity in https://github.com/soxoj/maigret/pull/1961
* Bump pytest-asyncio from 0.24.0 to 0.25.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1963
* Fixed flaky tests to check cookies by @soxoj in https://github.com/soxoj/maigret/pull/1965
* Preparation of 0.5.0 alpha version by @soxoj in https://github.com/soxoj/maigret/pull/1966
* Created web frontend launched via --web flag by @overcuriousity in https://github.com/soxoj/maigret/pull/1967
* Bump certifi from 2024.8.30 to 2024.12.14 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1969
* Bump attrs from 24.2.0 to 24.3.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1970
* Added web interface docs by @soxoj in https://github.com/soxoj/maigret/pull/1972
* Small docs and parameters fixes for web interface mode by @soxoj in https://github.com/soxoj/maigret/pull/1973
* [ImgBot] Optimize images by @imgbot[bot] in https://github.com/soxoj/maigret/pull/1974
* Improving the web interface by @overcuriousity in https://github.com/soxoj/maigret/pull/1975
* make graph more meaningful by @overcuriousity in https://github.com/soxoj/maigret/pull/1977
* Async generator-executor for site checks by @soxoj in https://github.com/soxoj/maigret/pull/1978
* Bump aiohttp from 3.11.10 to 3.11.11 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1979
* Bump psutil from 6.1.0 to 6.1.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1980
* Bump aiohttp-socks from 0.9.1 to 0.10.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1985
* Bump mypy from 1.13.0 to 1.14.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1983
* Bump aiohttp-socks from 0.10.0 to 0.10.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1987
* Bump jinja2 from 3.1.4 to 3.1.5 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1982
* Bump coverage from 7.6.9 to 7.6.10 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1986
* Bump pytest-asyncio from 0.25.0 to 0.25.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1989
* Bump mypy from 1.14.0 to 1.14.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1988
* Bump pytest-asyncio from 0.25.1 to 0.25.2 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1990
* docs: update usage-examples.rst by @eltociear in https://github.com/soxoj/maigret/pull/1996
* upload-artifact action in python test workflow updated to v4 by @soxoj in https://github.com/soxoj/maigret/pull/2024
* Pass db_file configuration to web interface by @pykereaper in https://github.com/soxoj/maigret/pull/2019
* Fix usage of data.json files from web by @pykereaper in https://github.com/soxoj/maigret/pull/2020
* Bump black from 24.10.0 to 25.1.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2001
* Important Update Installer.bat by @CatchySmile in https://github.com/soxoj/maigret/pull/1994
* Bump cryptography from 44.0.0 to 44.0.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2005
* Bump jinja2 from 3.1.5 to 3.1.6 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2011
* [#2010] Add 6 more websites to manage by @pylapp in https://github.com/soxoj/maigret/pull/2009
* Bump flask from 3.1.0 to 3.1.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2028
* Bump requests from 2.32.3 to 2.32.4 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2026
* Bump pycares from 4.5.0 to 4.9.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2025
* Bump pytest-asyncio from 0.25.2 to 0.26.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2016
* Bump urllib3 from 2.2.3 to 2.5.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2027
* Disable ICQ site by @Echo-Darlyson in https://github.com/soxoj/maigret/pull/1993
* Bump attrs from 24.3.0 to 25.3.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2014
* Bump certifi from 2024.12.14 to 2025.1.31 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2004
* Bump typing-extensions from 4.12.2 to 4.14.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2038
* Disable AskFM by @MR-VL in https://github.com/soxoj/maigret/pull/2037
* Bump platformdirs from 4.3.6 to 4.3.8 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2033
* Bump coverage from 7.6.10 to 7.9.2 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2039
* Bump aiohttp from 3.11.11 to 3.12.14 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2041
* Bump yarl from 1.18.3 to 1.20.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2032
* Fixed test dialog_adds_site_negative by @soxoj in https://github.com/soxoj/maigret/pull/2107
* Bump reportlab from 4.2.5 to 4.4.3 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2063
* Bump asgiref from 3.8.1 to 3.9.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2040
* Bump multidict from 6.1.0 to 6.6.3 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2034
* Bump pytest-rerunfailures from 15.0 to 15.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2030
**Full Changelog**: https://github.com/soxoj/maigret/compare/v0.4.4...v0.5.0
## [0.4.4] - 2022-09-03
* Fixed some false positives by @soxoj in https://github.com/soxoj/maigret/pull/433
+9 -7
@@ -1,16 +1,18 @@
FROM python:3.10-slim
FROM python:3.11-slim
LABEL maintainer="Soxoj <soxoj@protonmail.com>"
WORKDIR /app
RUN pip install --no-cache-dir --upgrade pip
RUN apt-get update && \
apt-get install --no-install-recommends -y \
gcc \
musl-dev \
libxml2 \
build-essential \
python3-dev \
pkg-config \
libcairo2-dev \
libxml2-dev \
libxslt-dev \
&& \
rm -rf /var/lib/apt/lists/* /tmp/*
libxslt1-dev \
&& rm -rf /var/lib/apt/lists/* /tmp/*
COPY . .
RUN YARL_NO_EXTENSIONS=1 python3 -m pip install --no-cache-dir .
# For production use, set FLASK_HOST to a specific IP address for security
ENV FLASK_HOST=0.0.0.0
ENTRYPOINT ["maigret"]
+71 -81
@@ -1,85 +1,61 @@
@echo off
REM check if running as admin
goto check_Permissions
:check_Permissions
echo Administrative permissions required. Detecting permissions...
net session >nul 2>&1
if %errorLevel% == 0 (
goto 1
echo Success: Elevated permissions granted.
) else (
cls
echo Failure: You MUST run this as administator, otherwise commands will fail.
echo Failure: Requires elevated permissions.
pause >nul
)
pause >nul
REM Step 2: Check if Python and pip3 are installed
python --version >nul 2>&1
if %errorlevel% neq 0 (
echo Python is not installed. Please install Python 3.8 or higher.
pause
exit /b
)
pip3 --version >nul 2>&1
if %errorlevel% neq 0 (
echo pip3 is not installed. Please install pip3.
pause
exit /b
)
REM Step 3: Check Python version
python -c "import sys; exit(0) if sys.version_info >= (3,8) else exit(1)"
if %errorlevel% neq 0 (
echo Python version 3.8 or higher is required.
pause
exit /b
)
:1
cls
:::===============================================================
::: ______ __ __ _ _
::: | ____| | \/ | (_) | |
::: | |__ __ _ ___ _ _ | \ / | __ _ _ __ _ _ __ ___| |_
::: | __| / _` / __| | | | | |\/| |/ _` | |/ _` | '__/ _ \ __|
::: | |___| (_| \__ \ |_| | | | | | (_| | | (_| | | | __/ |_
::: |______\__,_|___/\__, | |_| |_|\__,_|_|\__, |_| \___|\__|
::: __/ | __/ |
::: |___/ |___/
:::
:::===============================================================
echo.
for /f "delims=: tokens=*" %%A in ('findstr /b ::: "%~f0"') do @echo(%%A
echo.
echo ----------------------------------------------------------------
echo Python 3.8 or higher and pip3 required.
echo ----------------------------------------------------------------
echo Press [I] to begin installation.
echo Press [R] If already installed.
echo ----------------------------------------------------------------
echo --------------------------------------------------------
echo Python 3.8 or higher and pip3 required.
echo --------------------------------------------------------
echo Press [I] to begin installation.
echo Press [R] If already installed.
echo --------------------------------------------------------
choice /c IR
if %errorlevel%==1 goto install1
if %errorlevel%==1 goto check_python
if %errorlevel%==2 goto after
:check_python
cls
for /f "tokens=2 delims= " %%i in ('python --version 2^>nul') do (
for /f "tokens=1,2 delims=." %%j in ("%%i") do (
if %%j GEQ 3 (
if %%k GEQ 8 (
goto check_pip
)
)
)
)
echo Python 3.8 or higher is required. Please install it first.
pause
exit /b
:check_pip
pip --version 2>nul | findstr /r /c:"pip" >nul
if %errorlevel% neq 0 (
echo pip is required. Please install it first.
pause
exit /b
)
goto install1
:install1
cls
echo ========================================================
echo Maigret Installation Script
echo Maigret Installation
echo ========================================================
echo.
echo --------------------------------------------------------
echo If your pip installation is outdated, it could cause
echo cryptography to fail on installation.
echo --------------------------------------------------------
echo check for and install pip updates now?
echo Check for and install pip 23.3.2 now?
echo --------------------------------------------------------
choice /c YN
if %errorlevel%==1 goto install2
@@ -87,42 +63,56 @@ if %errorlevel%==2 goto install3
:install2
cls
python -m pip install --upgrade pip
goto:install3
python -m pip install --upgrade pip==23.3.2
if %errorlevel% neq 0 (
echo Failed to update pip to version 23.3.2. Please check your installation.
pause
exit /b
)
goto install3
:install3
cls
echo ========================================================
echo Maigret Installation Script
echo Maigret Installation
echo ========================================================
echo.
echo --------------------------------------------------------
echo Install requirements and maigret?
echo --------------------------------------------------------
choice /c YN
if %errorlevel%==1 goto install4
if %errorlevel%==2 goto 1
:install4
cls
pip install .
pip install maigret
goto:after
echo Installing Maigret...
python -m pip install maigret
if %errorlevel% neq 0 (
echo Failed to install Maigret. Please check your installation.
pause
exit /b
)
echo.
echo +------------------------------------------------------+
echo Maigret installed successfully.
echo +------------------------------------------------------+
pause
goto after
:after
cls
echo ========================================================
echo Maigret Background Search
echo Maigret Usage
echo ========================================================
echo.
echo --------------------------------------------------------
echo Please Enter Username / Email
echo --------------------------------------------------------
set /p input=
maigret %input%
echo +--------------------------------------------------------+
echo To use Maigret, you can run the following command:
echo.
echo maigret [options] [username]
echo.
echo For example, to search for a username:
echo.
echo maigret example_username
echo.
echo For more options and usage details, refer to the Maigret documentation.
echo.
echo https://github.com/soxoj/maigret/blob/5b3b81b4822f6deb2e9c31eb95039907f25beb5e/README.md
echo +--------------------------------------------------------+
echo.
cmd
pause
goto:after
exit /b
exit /b
+1 -2
@@ -1,7 +1,6 @@
MIT License
Copyright (c) 2019 Sherlock Project
Copyright (c) 2020-2021 Soxoj
Copyright (c) 2020-2026 Soxoj
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
-4
@@ -1,4 +0,0 @@
include LICENSE
include README.md
include requirements.txt
include maigret/resources/*
+1 -1
@@ -1,7 +1,7 @@
LINT_FILES=maigret wizard.py tests
test:
coverage run --source=./maigret -m pytest tests
coverage run --source=./maigret,./maigret/web -m pytest tests
coverage report -m
coverage html
+41 -2
@@ -25,7 +25,7 @@
<i>The Commissioner Jules Maigret is a fictional French police detective, created by Georges Simenon. His investigation method is based on understanding the personality of different people and their interactions.</i>
<b>👉👉👉 [Online Telegram bot](https://t.me/osint_maigret_bot)</b>
<b>👉👉👉 [Online Telegram bot](https://t.me/maigret_search_bot) | 🏢 [Commercial use & API](#commercial-use)</b>
## About
@@ -53,7 +53,7 @@ See the full description of Maigret features [in the documentation](https://maig
## Installation
‼️ Maigret is available online via [official Telegram bot](https://t.me/osint_maigret_bot). Consider using it if you don't want to install anything.
‼️ Maigret is available online via [official Telegram bot](https://t.me/maigret_search_bot). Consider using it if you don't want to install anything.
### Windows
@@ -75,6 +75,7 @@ You can launch Maigret using cloud shells and Jupyter notebooks. Press one of th
Maigret can be installed using pip or Docker, or launched directly from the cloned repo.
**NOTE**: Python 3.10 or higher and pip are required; **Python 3.11 is recommended.**
```bash
@@ -111,6 +112,10 @@ docker run -v /mydir:/app/reports soxoj/maigret:latest username --html
docker build -t maigret .
```
### Troubleshooting
If you encounter build errors during installation, check the [troubleshooting guide](https://maigret.readthedocs.io/en/latest/installation.html#troubleshooting).
## Usage examples
```bash
@@ -131,6 +136,30 @@ maigret user1 user2 user3 -a
Use `maigret --help` to get full options description. Also options [are documented](https://maigret.readthedocs.io/en/latest/command-line-options.html).
### Web interface
You can run Maigret with a web interface, where you can view the graph with results and download reports of all formats on a single page.
<details>
<summary>Web Interface Screenshots</summary>
![Web interface: how to start](https://raw.githubusercontent.com/soxoj/maigret/main/static/web_interface_screenshot_start.png)
![Web interface: results](https://raw.githubusercontent.com/soxoj/maigret/main/static/web_interface_screenshot.png)
</details>
Instructions:
1. Run Maigret with the ``--web`` flag and specify the port number.
```console
maigret --web 5000
```
2. Open http://127.0.0.1:5000 in your browser and enter one or more usernames to run a search.
3. Wait a bit for the search to complete and view the graph with results, the table with all accounts found, and download reports of all formats.
## Contributing
Maigret is open source, so you can contribute your own sites by adding them to the `data.json` file, or propose changes to its code!
@@ -167,6 +196,16 @@ The authors and developers of this tool bear no responsibility for any misuse or
If you have any questions, suggestions, or feedback, please feel free to [open an issue](https://github.com/soxoj/maigret/issues), create a [GitHub discussion](https://github.com/soxoj/maigret/discussions), or contact the author directly via [Telegram](https://t.me/soxoj).
## Commercial Use
If you need a **daily updated database** of supported sites or an **API for username checks**, feel free to reach out:
📧 [maigret@soxoj.com](mailto:maigret@soxoj.com)
Available options:
- Up-to-date site database - regularly maintained and updated list of 5K+ sites, delivered daily
- Username check API - programmatic access to Maigret's search capabilities for integration into your products
## SOWEL classification
This tool uses the following OSINT techniques:
+72 -7
@@ -31,14 +31,32 @@ two-letter country codes (**not a language!**). E.g. photo, dating, sport; jp, u
Multiple tags can be associated with one site. **Warning**: tags markup is
not stable now. Read more :doc:`in the separate section <tags>`.
``--exclude-tags`` - Exclude sites with specific tags from the search
(blacklist). E.g. ``--exclude-tags porn,dating`` will skip all sites
tagged with ``porn`` or ``dating``. Can be combined with ``--tags`` to
include certain categories while excluding others. Read more
:doc:`in the separate section <tags>`.
``-n``, ``--max-connections`` - Allowed number of concurrent connections
**(default: 100)**.
``-a``, ``--all-sites`` - Use all sites for scan **(default: top 500)**.
``--top-sites`` - Count of sites for scan ranked by Alexa Top
``--top-sites`` - Count of sites for scan ranked by Majestic Million
**(default: top 500)**.
**Mirrors:** After the top *N* sites by Majestic Million rank are chosen (respecting
``--tags``, ``--use-disabled-sites``, etc.), Maigret may add extra sites
whose database field ``source`` names a **parent platform** that itself falls
in the Majestic Million top *N* when ranking **including disabled** sites. For example,
if ``Twitter`` ranks in the first 500 by Majestic Million, a mirror such as ``memory.lol``
(with ``source: Twitter``) is included even though it has no rank and would
otherwise be cut off. The same applies to Instagram-related mirrors (e.g.
Picuki) when ``Instagram`` is in that parent top *N* by rank—even if the
official ``Instagram`` entry is disabled and not scanned by default, its
mirrors can still be pulled in. The final list is the ranked top *N* plus
these mirrors (no fixed upper bound on mirror count).
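The selection rule described above can be illustrated with a minimal sketch. It is not Maigret's actual implementation: the record layout (``name``, ``rank``, ``disabled``, ``source``) and the ``select_sites`` helper are simplified assumptions, and it assumes the top *N* scan list is filled from enabled sites only.

```python
# Hypothetical, simplified site records; the real database has more fields.
SITES = [
    {"name": "Twitter", "rank": 10, "disabled": True, "source": None},
    {"name": "GitHub", "rank": 25, "disabled": False, "source": None},
    {"name": "Example Forum", "rank": 900, "disabled": False, "source": None},
    {"name": "memory.lol", "rank": None, "disabled": False, "source": "Twitter"},
]


def select_sites(sites, top_n):
    """Return the names of the ranked top-N enabled sites plus their mirrors."""
    ranked = sorted(
        (s for s in sites if s["rank"] is not None), key=lambda s: s["rank"]
    )
    # Parent platforms are ranked with disabled sites included.
    parent_names = {s["name"] for s in ranked[:top_n]}
    # The scan list itself only takes enabled sites (assumption of this sketch).
    selected = [s for s in ranked if not s["disabled"]][:top_n]
    chosen = {s["name"] for s in selected}
    # Mirrors have no rank of their own; they are added on top of the N sites.
    mirrors = [
        s
        for s in sites
        if s["source"] in parent_names
        and not s["disabled"]
        and s["name"] not in chosen
    ]
    return [s["name"] for s in selected + mirrors]


print(select_sites(SITES, top_n=2))
```

In this toy data set ``Twitter`` is disabled, so it is not scanned itself, but its mirror ``memory.lol`` is still appended after the two ranked sites.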
``--timeout`` - Time (in seconds) to wait for responses from sites
**(default: 30)**. A longer timeout will be more likely to get results
from slow sites. On the other hand, this may cause a long delay to
@@ -88,6 +106,9 @@ username).
``-J``, ``--json`` - Generate a JSON report of specific type: simple,
ndjson (one report per username). E.g. ``--json ndjson``
``-M``, ``--md`` - Generate a Markdown report (general report on all
usernames). See :ref:`markdown-report` below.
``-fo``, ``--folderoutput`` - Results will be saved to this folder,
``results`` by default. It is created if it doesn't exist.
@@ -112,16 +133,60 @@ Other operations modes
``--version`` - Display version information and dependencies.
``--self-check`` - Do self-checking for sites and database and disable
non-working ones **for current search session** by default. Its useful
for testing new internet connection (it depends on provider/hosting on
which sites there will be censorship stub or captcha display). After
checking Maigret asks if you want to save updates, answering y/Y will
rewrite the local database.
``--self-check`` - Do self-checking for sites and database. Each site is
tested by looking up its known-claimed and known-unclaimed usernames and
verifying that the results match expectations. Individual site failures
(network errors, unexpected exceptions, etc.) are caught and logged
without stopping the overall process, so the check always runs to
completion. After checking, Maigret reports a summary of issues found.
If any sites were disabled (see ``--auto-disable``), Maigret asks if you
want to save updates; answering y/Y will rewrite the local database.
``--auto-disable`` - Used with ``--self-check``: automatically disable
sites that fail checks (incorrect detection of claimed/unclaimed
usernames, connection errors, or unexpected exceptions). Without this
flag, ``--self-check`` only **reports** issues without modifying the
database.
``--diagnose`` - Used with ``--self-check``: print detailed diagnosis
information for each failing site, including the check type, the list
of issues found, and recommendations (e.g. suggesting a different
``checkType``).
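The error-resilient behaviour of ``--self-check`` can be pictured with a small asyncio sketch. The names ``check_site``, ``safe_check`` and ``self_check`` below are illustrative stand-ins, not Maigret's real functions; the point is only that each per-site coroutine converts its own exceptions into a reported issue, so one failure cannot abort the whole run.

```python
import asyncio


async def check_site(name):
    """Hypothetical per-site check; raises for one site to simulate a failure."""
    await asyncio.sleep(0)
    if name == "Broken":
        raise RuntimeError("unexpected response")
    return {"site": name, "issues": []}


async def safe_check(name):
    """Wrap a single check so an exception becomes a reported issue."""
    try:
        return await check_site(name)
    except Exception as exc:  # broad on purpose: one site must not stop the run
        return {"site": name, "issues": [f"{type(exc).__name__}: {exc}"]}


async def self_check(names):
    """Run all checks concurrently and always return one result per site."""
    return await asyncio.gather(*(safe_check(n) for n in names))


results = asyncio.run(self_check(["GitHub", "Broken", "Reddit"]))
failing = [r["site"] for r in results if r["issues"]]
print(failing)
```

A summary step can then report ``failing`` and, only when something like ``--auto-disable`` is requested, mark those entries as disabled before asking whether to save the database.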
``--submit URL`` - Do an automatic analysis of the given account URL or
site main page URL to determine the site engine and methods to check
account presence. After checking Maigret asks if you want to add the
site, answering y/Y will rewrite the local database.
.. _markdown-report:
Markdown report (LLM-friendly)
------------------------------
The ``--md`` / ``-M`` flag generates a Markdown report designed for both human reading and analysis by AI assistants (ChatGPT, Claude, etc.).
.. code-block:: console
maigret username --md
The report includes:
- **Summary** with aggregated personal data (all fullnames, locations, bios found across accounts), country tags, website tags, first/last seen timestamps.
- **Per-account sections** with profile URL, site tags, and all extracted fields (username, bio, follower count, linked accounts, etc.).
- **Possible false positives** disclaimer explaining that accounts may belong to different people.
- **Ethical use** notice about applicable data protection laws.
**Using with AI tools:**
The Markdown format is optimized for LLM context windows. You can feed the report directly to an AI assistant for follow-up analysis:
.. code-block:: console
# Generate the report
maigret johndoe --md
# Feed it to an AI tool
cat reports/report_johndoe.md | llm "Analyze this OSINT report and summarize key findings"
The structured Markdown with per-site sections makes it easy for AI tools to extract relationships, cross-reference identities, and identify patterns across accounts.
+2 -2
@@ -3,10 +3,10 @@
# -- Project information
project = 'Maigret'
copyright = '2024, soxoj'
copyright = '2025, soxoj'
author = 'soxoj'
release = '0.5.0a1'
release = '0.5.0'
version = '0.5'
# -- General configuration
+77 -2
@@ -22,8 +22,16 @@ The supported methods (``checkType`` values in ``data.json``) are:
- ``status_code`` - checks that status code of the response is 2XX
- ``response_url`` - checks that there is no redirect and the response is 2XX
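As a rough illustration of the two status-based check types listed above, the following sketch classifies a response from its status code and a redirect flag. The ``is_claimed`` helper and its parameters are assumptions made for this example; the real logic lives in ``checking.py`` and handles more cases.

```python
def is_claimed(check_type, status, redirected):
    """Return True if the response suggests that the account exists."""
    ok = 200 <= status < 300
    if check_type == "status_code":
        # Only the status code matters.
        return ok
    if check_type == "response_url":
        # A redirect (for example to a generic "not found" page) means no account.
        return ok and not redirected
    raise ValueError(f"unsupported check type in this sketch: {check_type}")


print(is_claimed("status_code", 200, redirected=True))
print(is_claimed("response_url", 200, redirected=True))
```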
.. note::
Maigret treats certain anti-bot HTTP status codes (such as LinkedIn's ``HTTP 999``) as a regular "Not Found/Available" signal rather than as a server error, which prevents false positives.
See the details of check mechanisms in the `checking.py <https://github.com/soxoj/maigret/blob/main/maigret/checking.py#L339>`_ file.
.. note::
Maigret now uses the **Majestic Million** dataset for site popularity sorting instead of the discontinued Alexa Rank API. For backward compatibility with existing configurations and parsers, the ranking field in ``data.json`` and internal site models remains named ``alexaRank`` and ``alexa_rank``.
**Mirrors and ``--top-sites``:** When you limit scans with ``--top-sites N``, Maigret also includes *mirror* sites (entries whose ``source`` field points at a parent platform such as Twitter or Instagram) if that parent would appear in the Majestic Million top *N* when disabled sites are considered for ranking. See the **Mirrors** paragraph under ``--top-sites`` in :doc:`command-line-options`.
Testing
-------
@@ -61,6 +69,21 @@ Use the following commands to check Maigret:
make speed
Site naming conventions
-----------------------------------------------
Site names are the keys in ``data.json`` and appear in user-facing reports. Follow these rules:
- **Title Case** by default: ``Product Hunt``, ``Hacker News``.
- **Lowercase** only if the brand itself is written that way: ``kofi``, ``note``, ``hi5``.
- **No domain suffix** (``calendly.com`` → ``Calendly``), unless the domain is part of the recognized brand name: ``last.fm``, ``VC.ru``, ``Archive.org``.
- **No full UPPERCASE** unless the brand is an acronym: ``VK``, ``CNET``, ``ICQ``, ``IFTTT``.
- **No** ``www.`` **or** ``https://`` **prefix** in the name.
- **Spaces** are allowed when the brand uses them: ``Star Citizen``, ``Google Maps``.
- **{username} templates** in names are acceptable: ``{username}.tilda.ws``.
When in doubt, check how the service refers to itself on its homepage.
How to fix false-positives
-----------------------------------------------
@@ -112,6 +135,57 @@ There are few options for sites data.json helpful in various cases:
- ``headers`` - a dictionary of additional headers to be sent to the site
- ``requestHeadOnly`` - set to ``true`` if it's enough to make a HEAD request to the site
- ``regexCheck`` - a regex to check if the username is valid, in case of frequent false-positives
- ``requestMethod`` - the HTTP method to use (e.g., ``POST``). By default, Maigret uses GET or HEAD.
- ``requestPayload`` - a dictionary with the JSON payload to send for POST requests (e.g., ``{"username": "{username}"}``); useful for querying GraphQL or other JSON APIs.
- ``protection`` - a list of protection types detected on the site (see below).
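To make the ``requestMethod`` / ``requestPayload`` options more concrete, here is a minimal sketch of how a site entry could be turned into a request. The ``build_request`` and ``fill`` helpers and the ``example.com`` endpoint are hypothetical and only show the placeholder substitution; they are not taken from Maigret's code.

```python
import json


def fill(value, username):
    """Recursively replace the {username} placeholder in strings."""
    if isinstance(value, str):
        return value.replace("{username}", username)
    if isinstance(value, dict):
        return {k: fill(v, username) for k, v in value.items()}
    if isinstance(value, list):
        return [fill(v, username) for v in value]
    return value


def build_request(site, username):
    """Return (method, url, body) for a simplified site entry."""
    method = site.get("requestMethod", "GET")
    url = fill(site["url"], username)
    payload = site.get("requestPayload")
    # Serialize after substitution so special characters are escaped correctly.
    body = json.dumps(fill(payload, username)) if payload is not None else None
    return method, url, body


# Hypothetical site entry for illustration only.
site = {
    "url": "https://example.com/api/graphql",
    "requestMethod": "POST",
    "requestPayload": {"username": "{username}"},
}
method, url, body = build_request(site, "alice")
print(method, url, body)
```

Substituting before serialization, rather than on the JSON string, keeps the body valid even if a username contains quotes or backslashes.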
``protection`` (site protection tracking)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The ``protection`` field records what kind of anti-bot protection a site uses. Maigret reads this field and automatically applies the appropriate bypass mechanism.
Supported values:
- ``tls_fingerprint`` — the site fingerprints the TLS handshake (JA3/JA4) and blocks non-browser clients. Maigret automatically uses ``curl_cffi`` with Chrome browser emulation to bypass this. Requires the ``curl_cffi`` package (included as a dependency). Examples: Instagram, NPM, Codepen, Kickstarter, Letterboxd.
- ``ip_reputation`` — the site blocks requests from datacenter/cloud IPs regardless of headers or TLS. Cannot be bypassed automatically; run Maigret from a regular internet connection (not a datacenter) or use a proxy (``--proxy``). Examples: Reddit, Patreon, Figma.
- ``js_challenge`` — the site serves a JavaScript challenge page (e.g. "Just a moment...") that cannot be solved without a browser. Maigret detects challenge signatures and returns UNKNOWN instead of a false positive.
Example:
.. code-block:: json
"Instagram": {
"url": "https://www.instagram.com/{username}/",
"checkType": "message",
"presenseStrs": ["\"routePath\":\"\\/"],
"absenceStrs": ["\"routePath\":null"],
"protection": ["tls_fingerprint"]
}
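The way the ``protection`` list could drive checker choice and user hints is sketched below. This is a simplified model under stated assumptions: ``plan_for_site``, the string labels ``"curl_cffi"`` / ``"default"``, and the hint texts are hypothetical; only the three protection values come from the list above.

```python
def plan_for_site(protection, curl_cffi_available=True):
    """Return (checker_kind, hints) for a site's protection list (illustrative sketch)."""
    checker = "default"
    hints = []
    if "tls_fingerprint" in protection:
        if curl_cffi_available:
            checker = "curl_cffi"  # browser-like TLS handshake
        else:
            hints.append("install curl_cffi for TLS impersonation")
    if "ip_reputation" in protection:
        # Cannot be bypassed automatically.
        hints.append("use a non-datacenter connection or --proxy")
    if "js_challenge" in protection:
        hints.append("challenge pages are reported as UNKNOWN")
    return checker, hints
```

For example, ``plan_for_site(["tls_fingerprint"])`` selects the ``curl_cffi`` path with no hints, while an empty list keeps the default checker.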
``urlProbe`` (optional profile probe URL)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
By default Maigret performs the HTTP request to the same URL as ``url`` (the public profile link pattern).
If you set ``urlProbe`` in ``data.json``, Maigret **fetches** that URL for the presence check (API, GraphQL, JSON endpoint, etc.), while **reports and ``url_user``** still use ``url`` — the human-readable profile page users should open.
Placeholders: ``{username}``, ``{urlMain}``, ``{urlSubpath}`` (same as for ``url``). Example: GitHub uses ``url`` ``https://github.com/{username}`` and ``urlProbe`` ``https://api.github.com/users/{username}``; Picsart uses the web profile ``https://picsart.com/u/{username}`` and probes ``https://api.picsart.com/users/show/{username}.json``.
Implementation: ``make_site_result`` in `checking.py <https://github.com/soxoj/maigret/blob/main/maigret/checking.py>`_.
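The split between the reported URL and the probed URL can be pictured with the following sketch. It is not the code in ``checking.py``: ``resolve_urls`` is a hypothetical helper that takes a plain dict shaped like a ``data.json`` entry, and it skips details such as URL-quoting the username.

```python
def resolve_urls(site, username):
    """Return (profile_url, probe_url) for a data.json-like entry (illustrative sketch)."""
    values = {
        "username": username,
        "urlMain": site.get("urlMain", ""),
        "urlSubpath": site.get("urlSubpath", ""),
    }
    profile_url = site["url"].format(**values)
    # Fall back to the profile URL when no urlProbe is configured.
    probe_template = site.get("urlProbe") or site["url"]
    probe_url = probe_template.format(**values)
    return profile_url, probe_url


github = {
    "url": "https://github.com/{username}",
    "urlProbe": "https://api.github.com/users/{username}",
}
profile, probe = resolve_urls(github, "alice")
```

Here ``profile`` is what a report would show and ``probe`` is what would actually be fetched.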
Site check fixes using LLM
--------------------------
.. note::
The ``LLM/`` directory at the root of the repository contains detailed instructions for editing site checks (in Markdown format): checklist, full guide to ``checkType`` / ``data.json`` / ``urlProbe``, handling false positives, searching for public JSON APIs, and the proposal log for ``socid_extractor``.
Main files:
- `site-checks-playbook.md <https://github.com/soxoj/maigret/blob/main/LLM/site-checks-playbook.md>`_ — short checklist
- `site-checks-guide.md <https://github.com/soxoj/maigret/blob/main/LLM/site-checks-guide.md>`_ — detailed guide
- `socid_extractor_improvements.log <https://github.com/soxoj/maigret/blob/main/LLM/socid_extractor_improvements.log>`_ — template and entries for identity extractor improvements
These files should be kept up-to-date whenever changes are made to the check logic in the code or in ``data.json``.
.. _activation-mechanism:
@@ -194,9 +268,10 @@ PyPi package.
2. Update Maigret version in three files manually:
- setup.py
- pyproject.toml
- maigret/__version__.py
- docs/source/conf.py
- docs/source/conf.py
- snapcraft.yaml
3. Create a new empty text section in the beginning of the file `CHANGELOG.md` with a current date:
+57
@@ -5,6 +5,34 @@ Features
This is the list of Maigret features.
.. _web-interface:
Web Interface
-------------
You can run Maigret with a web interface, where you can view the graph with results and download reports of all formats on a single page.
.. image:: https://raw.githubusercontent.com/soxoj/maigret/main/static/web_interface_screenshot_start.png
:alt: Web interface: how to start
.. image:: https://raw.githubusercontent.com/soxoj/maigret/main/static/web_interface_screenshot.png
:alt: Web interface: results
Instructions:
1. Run Maigret with the ``--web`` flag and specify the port number.
.. code-block:: console
maigret --web 5000
2. Open http://127.0.0.1:5000 in your browser and enter one or more usernames to search for.
3. Wait for the search to complete, then view the results graph and the table of all accounts found, and download reports in any format.
Personal info gathering
-----------------------
@@ -142,6 +170,35 @@ Maigret will do retries of the requests with temporary errors got (connection fa
One attempt by default, can be changed with option ``--retries N``.
Database self-check
-------------------
Maigret includes a self-check mode (``--self-check``) that validates every site
in the database by looking up its known-claimed and known-unclaimed usernames
and verifying that the detection results match expectations.
The self-check is **error-resilient**: if an individual site check raises an
unexpected exception (e.g. a network error or a parsing failure), the error is
caught, logged, and recorded as an issue — the remaining sites continue to be
checked without interruption. This means the process always runs to completion,
even when checking hundreds of sites with ``-a --self-check``.
Use ``--auto-disable`` together with ``--self-check`` to automatically disable
sites that fail checks. Without it, issues are only reported. Use ``--diagnose``
to print detailed per-site diagnosis including the check type, specific issues,
and recommendations.
.. code-block:: console
# Report-only mode (no changes to the database)
maigret --self-check
# Automatically disable failing sites and save updates
maigret -a --self-check --auto-disable
# Show detailed diagnosis for each failing site
maigret -a --self-check --diagnose
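The error-resilient behaviour described above follows a common ``asyncio`` pattern: await each task inside its own ``try``/``except`` and turn a failure into a recorded issue. The sketch below demonstrates that pattern in isolation; ``check_site`` and ``run_self_check`` are hypothetical stand-ins, not Maigret's ``site_self_check`` / ``self_check``.

```python
import asyncio


async def check_site(name):
    """Stand-in for a per-site check; the site named 'broken' simulates a crash."""
    if name == "broken":
        raise RuntimeError("parsing failure")
    await asyncio.sleep(0)
    return {"disabled": False, "issues": [], "recommendations": []}


async def run_self_check(names):
    tasks = [(name, asyncio.ensure_future(check_site(name))) for name in names]
    results = []
    for name, task in tasks:
        try:
            result = await task
        except Exception as exc:
            # One failing site is recorded as an issue instead of aborting the run.
            result = {
                "disabled": False,
                "issues": [f"Unexpected error: {exc}"],
                "recommendations": [],
            }
        result["site_name"] = name
        results.append(result)
    return results


results = asyncio.run(run_self_check(["ok-site", "broken", "another"]))
```

All three sites produce a result entry; only the failing one carries an issue.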
Archives and mirrors checking
-----------------------------
+36
@@ -90,3 +90,39 @@ Docker
# manual build
docker build -t maigret .
Troubleshooting
---------------
If you encounter build errors during installation such as ``cannot find ft2build.h``
or errors related to ``reportlab`` / ``_renderPM``, you need to install system-level
dependencies required to compile native extensions.
**Debian/Ubuntu/Kali:**
.. code-block:: bash
sudo apt install -y libfreetype6-dev libjpeg-dev libffi-dev
**Fedora/RHEL/CentOS:**
.. code-block:: bash
sudo dnf install -y freetype-devel libjpeg-devel libffi-devel
**Arch Linux:**
.. code-block:: bash
sudo pacman -S freetype2 libjpeg-turbo libffi
**macOS (Homebrew):**
.. code-block:: bash
brew install freetype
After installing the system dependencies, retry the maigret installation.
If you continue to have issues, consider using Docker instead, which includes all
necessary dependencies.
Binary file not shown (image; before: 375 KiB, after: 234 KiB).
+74
@@ -27,3 +27,77 @@ Missing any of these files is not an error.
If a later settings file contains an option that is already set,
that option is overwritten. This makes it possible to keep
custom configurations for different users and directories.
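The layering rule above (later files win, missing files are skipped) amounts to a sequence of dictionary updates. The sketch below is a minimal model of that rule; ``read_layer`` and ``merge_settings`` are hypothetical names, and the option keys used in the usage lines are examples only.

```python
import json
from pathlib import Path


def read_layer(path):
    """Return the parsed settings file, or None if it does not exist."""
    p = Path(path)
    if not p.is_file():
        return None  # a missing settings file is not an error
    return json.loads(p.read_text(encoding="utf-8"))


def merge_settings(layers):
    """Merge layers in order; options from later layers overwrite earlier ones."""
    merged = {}
    for layer in layers:
        if layer is None:
            continue
        merged.update(layer)
    return merged


merged = merge_settings([{"timeout": 30, "retries": 1}, None, {"timeout": 10}])
```

In this example ``timeout`` ends up as ``10`` because the last layer overrides the first, while ``retries`` is inherited unchanged.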
.. _database-auto-update:
Database auto-update
--------------------
Maigret ships with a bundled site database, but it gets outdated between releases. To keep the database current, Maigret automatically checks for updates on startup.
**How it works:**
1. On startup, Maigret checks if more than 24 hours have passed since the last update check.
2. If so, it fetches a lightweight metadata file (~200 bytes) from GitHub to see if a newer database is available.
3. If a newer, compatible database exists, Maigret downloads it to ``~/.maigret/data.json`` and uses it instead of the bundled copy.
4. If the download fails or the new database is incompatible with your Maigret version, the bundled database is used as a fallback.
The downloaded database has **higher priority** than the bundled one — it replaces, not overlays.
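The decision steps above can be sketched as two small predicates: one for the check interval and one for version compatibility. This is a simplified model, not Maigret's updater: the function names are hypothetical, and the version parser assumes plain dotted numeric versions such as ``0.5.0`` (pre-release suffixes are not handled).

```python
from datetime import datetime, timedelta


def should_check(last_check, now, interval_hours=24):
    """True if no check was recorded or more than interval_hours have passed."""
    if last_check is None:
        return True
    return now - last_check > timedelta(hours=interval_hours)


def parse_version(version):
    """Assumes a plain 'X.Y.Z' numeric version string."""
    return tuple(int(part) for part in version.split("."))


def is_compatible(required, installed):
    """True if the installed version satisfies the database's minimum version."""
    return parse_version(installed) >= parse_version(required)


now = datetime(2026, 4, 8, 12, 0)
```

With ``now`` as above, a last check 25 hours earlier triggers a new check, and a database requiring ``0.6.0`` is rejected on ``0.5.0``, in which case the bundled database stays in use.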
**Status messages** are printed only when an action occurs:
.. code-block:: text
[*] DB auto-update: checking for updates...
[+] DB auto-update: database updated successfully (3180 sites)
[*] DB auto-update: database is up to date (3157 sites)
[!] DB auto-update: latest database requires maigret >= 0.6.0, you have 0.5.0
**Forcing an update:**
Use the ``--force-update`` flag to check for updates immediately, ignoring the check interval:
.. code-block:: console
maigret username --force-update
The update happens at startup, then the search continues normally with the freshly downloaded database.
**Disabling auto-update:**
Use the ``--no-autoupdate`` flag to skip the update check entirely:
.. code-block:: console
maigret username --no-autoupdate
Or set it permanently in ``~/.maigret/settings.json``:
.. code-block:: json
{
"no_autoupdate": true
}
This is recommended for **Docker containers**, **CI pipelines**, and **air-gapped environments**.
**Configuration options** (in ``settings.json``):
.. list-table::
:header-rows: 1
:widths: 35 15 50
* - Setting
- Default
- Description
* - ``no_autoupdate``
- ``false``
- Disable auto-update entirely
* - ``autoupdate_check_interval_hours``
- ``24``
- How often to check for updates (in hours)
* - ``db_update_meta_url``
- GitHub raw URL
- URL of the metadata file (for custom mirrors)
**Using a custom database** with ``--db`` always skips auto-update — you are explicitly choosing your data source.
+22 -1
@@ -10,7 +10,12 @@ The use of tags allows you to select a subset of the sites from big Maigret DB f
There are several types of tags:
1. **Country codes**: ``us``, ``jp``, ``br``... (`ISO 3166-1 alpha-2 <https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2>`_). These tags reflect the site language and regional origin of its users and are then used to locate the owner of a username. If the regional origin is difficult to establish or a site is positioned as worldwide, `no country code is given`. There could be multiple country code tags for one site.
1. **Country codes**: ``us``, ``jp``, ``br``... (`ISO 3166-1 alpha-2 <https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2>`_). A country tag means that having an account on the site implies a connection to that country — either origin or residence. The goal is attribution, not perfect accuracy.
- **Global sites** (GitHub, YouTube, Reddit, Medium, etc.) get **no country tag** — an account there says nothing about where a person is from.
- **Regional/local sites** where an account implies a specific country **must** have a country tag: ``VK`` → ``ru``, ``Naver`` → ``kr``, ``Zhihu`` → ``cn``.
- Multiple country tags are allowed when a service is used predominantly in a few countries (e.g. ``Xing`` → ``de``, ``eu``).
- Do **not** assign country tags based on traffic statistics alone — a site popular in India by traffic is not "Indian" if it is used globally.
2. **Site engines**. Most of them are forum engines now: ``uCoz``, ``vBulletin``, ``XenForo`` et al. Full list of engines stored in the Maigret database.
@@ -23,3 +28,19 @@ Usage
``--tags coding`` -- search on sites related to software development.
``--tags ucoz`` -- search on uCoz sites only (mostly CIS countries)
Blacklisting (excluding) tags
------------------------------
You can exclude sites with certain tags from the search using ``--exclude-tags``:
``--exclude-tags porn,dating`` -- skip all sites tagged with ``porn`` or ``dating``.
``--exclude-tags ru`` -- skip all Russian sites.
You can combine ``--tags`` and ``--exclude-tags`` to fine-tune your search:
``--tags forum --exclude-tags ru`` -- search on forum sites, but skip Russian ones.
In the web interface, the tag cloud supports three states per tag:
click once to **include** (green), click again to **exclude** (dark/strikethrough),
and click once more to return to **neutral** (red).
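The combination of ``--tags`` and ``--exclude-tags`` can be modelled as a two-step filter. The sketch below is illustrative: ``select_sites`` and the sample site names are hypothetical, and it assumes a site matches ``--tags`` when it carries at least one of the requested tags.

```python
def select_sites(sites, tags=(), exclude_tags=()):
    """Filter a {site_name: [tags]} mapping by included and excluded tags (sketch)."""
    include = set(tags)
    exclude = set(exclude_tags)
    selected = []
    for name, site_tags in sites.items():
        site_tags = set(site_tags)
        # Assumption: with --tags, a site needs at least one requested tag.
        if include and not (site_tags & include):
            continue
        # Any excluded tag removes the site.
        if site_tags & exclude:
            continue
        selected.append(name)
    return selected


sample = {
    "ForumA": ["forum", "ru"],
    "ForumB": ["forum", "us"],
    "CodeHub": ["coding"],
}
forums_not_ru = select_sites(sample, tags=["forum"], exclude_tags=["ru"])
```

With the sample data, ``forums_not_ru`` contains only ``ForumB``, matching the ``--tags forum --exclude-tags ru`` example.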
+12 -2
@@ -3,7 +3,17 @@
Usage examples
==============
1. Search for accounts with username ``machine42`` on top 500 sites (by default, according to Alexa rank) from the Maigret DB.
You can use Maigret as:
- a command-line tool: the original and default mode
- a `web interface <#web-interface>`_: view the graph with results and download all report formats on a single page
- a library: integrate Maigret into your own project
Use Cases
---------
1. Search for accounts with username ``machine42`` on top 500 sites (by default, according to Majestic Million rank) from the Maigret DB.
.. code-block:: console
@@ -23,7 +33,7 @@ Usage examples
If you experience many false positives, you can do the following:
- Install the latest development version of Maigret from GitHub
- Run Maigret with ``--self-check`` flag and agree on disabling of problematic sites
- Run Maigret with the ``--self-check --auto-disable`` flags and agree to disable problematic sites
3. Search for accounts with username ``machine42`` and generate HTML and PDF reports.
+1 -1
@@ -1,3 +1,3 @@
"""Maigret version file"""
__version__ = '0.5.0a1'
__version__ = '0.5.0'
+3 -14
@@ -30,17 +30,6 @@ class ParsingActivator:
jwt_token = r.json()["jwt"]
site.headers["Authorization"] = "jwt " + jwt_token
@staticmethod
def spotify(site, logger, cookies={}):
headers = dict(site.headers)
if "Authorization" in headers:
del headers["Authorization"]
import requests
r = requests.get(site.activation["url"])
bearer_token = r.json()["accessToken"]
site.headers["authorization"] = f"Bearer {bearer_token}"
@staticmethod
def weibo(site, logger):
headers = dict(site.headers)
@@ -54,7 +43,7 @@ class ParsingActivator:
logger.debug(
f"1 stage: {'success' if r.status_code == 302 else 'no 302 redirect, fail!'}"
)
location = r.headers.get("Location")
location = r.headers.get("Location", "")
# 2 stage: go to passport visitor page
headers["Referer"] = location
@@ -84,9 +73,9 @@ def import_aiohttp_cookies(cookiestxt_filename):
cookies = CookieJar()
cookies_list = []
for domain in cookies_obj._cookies.values():
for domain in cookies_obj._cookies.values(): # type: ignore[attr-defined]
for key, cookie in list(domain.values())[0].items():
c = Morsel()
c: Morsel = Morsel()
c.set(key, cookie.value, cookie.value)
c["domain"] = cookie.domain
c["path"] = cookie.path
+394 -126
@@ -6,7 +6,7 @@ import random
import re
import ssl
import sys
from typing import Dict, List, Optional, Tuple
from typing import Any, Dict, List, Optional, Tuple
from urllib.parse import quote
# Third party imports
@@ -15,7 +15,7 @@ from alive_progress import alive_bar
from aiohttp import ClientSession, TCPConnector, http_exceptions
from aiohttp.client_exceptions import ClientConnectorError, ServerDisconnectedError
from python_socks import _errors as proxy_errors
from socid_extractor import extract
from socid_extractor import extract # type: ignore[import-not-found]
try:
from mock import Mock
@@ -26,11 +26,7 @@ except ImportError:
from . import errors
from .activation import ParsingActivator, import_aiohttp_cookies
from .errors import CheckError
from .executors import (
AsyncExecutor,
AsyncioSimpleExecutor,
AsyncioProgressbarQueueExecutor,
)
from .executors import AsyncioQueueGeneratorExecutor
from .result import MaigretCheckResult, MaigretCheckStatus
from .sites import MaigretDatabase, MaigretSite
from .types import QueryOptions, QueryResultWrapper
@@ -65,30 +61,49 @@ class SimpleAiohttpChecker(CheckerBase):
self.headers = None
self.allow_redirects = True
self.timeout = 0
self.allow_redirects = True
self.timeout = 0
self.method = 'get'
self.payload = None
def prepare(self, url, headers=None, allow_redirects=True, timeout=0, method='get'):
def prepare(self, url, headers=None, allow_redirects=True, timeout=0, method='get', payload=None):
self.url = url
self.headers = headers
self.allow_redirects = allow_redirects
self.timeout = timeout
self.method = method
self.payload = payload
return None
async def close(self):
pass
async def _make_request(
self, session, url, headers, allow_redirects, timeout, method, logger
) -> Tuple[str, int, Optional[CheckError]]:
self, session, url, headers, allow_redirects, timeout, method, logger, payload=None
) -> Tuple[Optional[str], int, Optional[CheckError]]:
try:
request_method = session.get if method == 'get' else session.head
async with request_method(
url=url,
headers=headers,
allow_redirects=allow_redirects,
timeout=timeout,
) as response:
if method.lower() == 'get':
request_method = session.get
elif method.lower() == 'post':
request_method = session.post
elif method.lower() == 'head':
request_method = session.head
else:
request_method = session.get
kwargs = {
'url': url,
'headers': headers,
'allow_redirects': allow_redirects,
'timeout': timeout,
}
if payload and method.lower() == 'post':
if headers and headers.get('Content-Type') == 'application/x-www-form-urlencoded':
kwargs['data'] = payload
else:
kwargs['json'] = payload
async with request_method(**kwargs) as response:
status_code = response.status
response_content = await response.content.read()
charset = response.charset or "utf-8"
@@ -121,15 +136,21 @@ class SimpleAiohttpChecker(CheckerBase):
logger.debug(e, exc_info=True)
return None, 0, CheckError("Unexpected", str(e))
async def check(self) -> Tuple[str, int, Optional[CheckError]]:
async def check(self) -> Tuple[Optional[str], int, Optional[CheckError]]:
from aiohttp_socks import ProxyConnector
# Use a real SSL context instead of ssl=False to avoid TLS fingerprinting
# blocks by Cloudflare and similar WAFs. Certificate verification is
# disabled to handle sites with invalid/expired certs.
ssl_context = ssl.create_default_context()
ssl_context.check_hostname = False
ssl_context.verify_mode = ssl.CERT_NONE
connector = (
ProxyConnector.from_url(self.proxy)
if self.proxy
else TCPConnector(ssl=False)
else TCPConnector(ssl=ssl_context)
)
connector.verify_ssl = False
async with ClientSession(
connector=connector,
@@ -145,6 +166,7 @@ class SimpleAiohttpChecker(CheckerBase):
self.timeout,
self.method,
self.logger,
self.payload,
)
if error and str(error) == "Invalid proxy response":
@@ -169,11 +191,11 @@ class AiodnsDomainResolver(CheckerBase):
self.logger = kwargs.get('logger', Mock())
self.resolver = aiodns.DNSResolver(loop=loop)
def prepare(self, url, headers=None, allow_redirects=True, timeout=0, method='get'):
def prepare(self, url, headers=None, allow_redirects=True, timeout=0, method='get', payload=None):
self.url = url
return None
async def check(self) -> Tuple[str, int, Optional[CheckError]]:
async def check(self) -> Tuple[Optional[str], int, Optional[CheckError]]:
status = 404
error = None
text = ''
@@ -191,14 +213,84 @@ class AiodnsDomainResolver(CheckerBase):
return text, status, error
try:
from curl_cffi.requests import AsyncSession as CurlCffiAsyncSession
CURL_CFFI_AVAILABLE = True
except ImportError:
CURL_CFFI_AVAILABLE = False
class CurlCffiChecker(CheckerBase):
"""Checker using curl_cffi to emulate browser TLS fingerprint and bypass WAF."""
def __init__(self, *args, **kwargs):
self.logger = kwargs.get('logger', Mock())
self.browser_emulate = kwargs.get('browser_emulate', 'chrome')
self.url = None
self.headers = None
self.allow_redirects = True
self.timeout = 0
self.method = 'get'
self.payload = None
def prepare(self, url, headers=None, allow_redirects=True, timeout=0, method='get', payload=None):
self.url = url
self.headers = headers
self.allow_redirects = allow_redirects
self.timeout = timeout
self.method = method
self.payload = payload
return None
async def close(self):
pass
async def check(self) -> Tuple[Optional[str], int, Optional[CheckError]]:
try:
async with CurlCffiAsyncSession() as session:
kwargs = {
'url': self.url,
'headers': self.headers,
'allow_redirects': self.allow_redirects,
'timeout': self.timeout if self.timeout else 10,
'impersonate': self.browser_emulate,
}
if self.payload and self.method.lower() == 'post':
kwargs['json'] = self.payload
if self.method.lower() == 'post':
response = await session.post(**kwargs)
elif self.method.lower() == 'head':
response = await session.head(**kwargs)
else:
response = await session.get(**kwargs)
status_code = response.status_code
decoded_content = response.text
self.logger.debug(decoded_content)
error = CheckError("Connection lost") if status_code == 0 else None
return decoded_content, status_code, error
except asyncio.TimeoutError as e:
return None, 0, CheckError("Request timeout", str(e))
except KeyboardInterrupt:
return None, 0, CheckError("Interrupted")
except Exception as e:
self.logger.debug(e, exc_info=True)
return None, 0, CheckError("Unexpected", str(e))
class CheckerMock:
def __init__(self, *args, **kwargs):
pass
def prepare(self, url, headers=None, allow_redirects=True, timeout=0, method='get'):
def prepare(self, url, headers=None, allow_redirects=True, timeout=0, method='get', payload=None):
return None
async def check(self) -> Tuple[str, int, Optional[CheckError]]:
async def check(self) -> Tuple[Optional[str], int, Optional[CheckError]]:
await asyncio.sleep(0)
return '', 0, None
@@ -224,6 +316,11 @@ def detect_error_page(
if status_code == 403 and not ignore_403:
return CheckError("Access denied", "403 status code, use proxy/vpn")
elif status_code == 999:
# LinkedIn anti-bot / HTTP 999 workaround. It shouldn't trigger an infrastructure
# Server Error because it represents a valid "Not Found / Blocked" state for the username.
pass
elif status_code >= 500:
return CheckError("Server", f"{status_code} status code")
@@ -311,6 +408,12 @@ def process_site_result(
if html_text:
if not presense_flags:
if check_type == "message" and logger.isEnabledFor(logging.DEBUG):
logger.debug(
"Site %s uses checkType message with empty presenseStrs; "
"presence is treated as true for any page.",
site.name,
)
is_presense_detected = True
site.stats["presense_flag"] = None
else:
@@ -353,7 +456,7 @@ def process_site_result(
result = build_result(MaigretCheckStatus.CLAIMED)
else:
result = build_result(MaigretCheckStatus.AVAILABLE)
elif check_type in "status_code":
elif check_type == "status_code":
# Checks if the status code of the response is 2XX
if 200 <= status_code < 300:
result = build_result(MaigretCheckStatus.CLAIMED)
@@ -436,8 +539,18 @@ def make_site_result(
# workaround to prevent slash errors
url = re.sub("(?<!:)/+", "/", url)
# always clearweb_checker for now
checker = options["checkers"][site.protocol]
# Select checker: use curl_cffi for sites requiring TLS impersonation
needs_impersonation = 'tls_fingerprint' in site.protection
if needs_impersonation and CURL_CFFI_AVAILABLE:
checker = CurlCffiChecker(logger=logger, browser_emulate='chrome')
elif needs_impersonation and not CURL_CFFI_AVAILABLE:
logger.warning(
f"Site {site.name} requires TLS impersonation (curl_cffi) but it's not installed. "
"Install with: pip install curl_cffi"
)
checker = options["checkers"][site.protocol]
else:
checker = options["checkers"][site.protocol]
# site check is disabled
if site.disabled and not options['forced']:
@@ -492,7 +605,9 @@ def make_site_result(
for k, v in site.get_params.items():
url_probe += f"&{k}={v}"
if site.check_type == "status_code" and site.request_head_only:
if site.request_method:
request_method = site.request_method.lower()
elif site.check_type == "status_code" and site.request_head_only:
# In most cases when we are detecting by status code,
# it is not necessary to get the entire body: we can
# detect fine with just the HEAD response.
@@ -503,6 +618,15 @@ def make_site_result(
# not respond properly unless we request the whole page.
request_method = 'get'
payload = None
if site.request_payload:
payload = {}
for k, v in site.request_payload.items():
if isinstance(v, str):
payload[k] = v.format(username=username)
else:
payload[k] = v
if site.check_type == "response_url":
# Site forwards request to a different URL if username not
# found. Disallow the redirect so we can capture the
@@ -519,6 +643,7 @@ def make_site_result(
headers=headers,
allow_redirects=allow_redirects,
timeout=options['timeout'],
payload=payload,
)
# Store future request object in the results object
@@ -545,6 +670,39 @@ async def check_site_for_username(
return site.name, default_result
response = await checker.check()
html_text = response[0] if response and response[0] else ""
# Retry once after token-style activation (e.g. Twitter guest token refresh).
act = site.activation
if act and html_text:
marks = act.get("marks") or []
if marks and any(m in html_text for m in marks):
method = act["method"]
try:
activate_fun = getattr(ParsingActivator(), method)
activate_fun(site, logger)
except AttributeError as e:
logger.warning(
f"Activation method {method} for site {site.name} not found!",
exc_info=True,
)
except Exception as e:
logger.warning(
f"Failed activation {method} for site {site.name}: {str(e)}",
exc_info=True,
)
else:
merged = dict(checker.headers or {})
merged.update(site.headers)
checker.prepare(
url=checker.url,
headers=merged,
allow_redirects=checker.allow_redirects,
timeout=checker.timeout,
method=checker.method,
payload=getattr(checker, 'payload', None),
)
response = await checker.check()
response_result = process_site_result(
response, query_notify, logger, default_result, site
@@ -670,18 +828,13 @@ async def maigret(
await debug_ip_request(clearweb_checker, logger)
# setup parallel executor
executor: Optional[AsyncExecutor] = None
if no_progressbar:
# TODO: switch to AsyncioProgressbarQueueExecutor with progress object mock
executor = AsyncioSimpleExecutor(logger=logger)
else:
executor = AsyncioProgressbarQueueExecutor(
logger=logger,
in_parallel=max_connections,
timeout=timeout + 0.5,
*args,
**kwargs,
)
executor = AsyncioQueueGeneratorExecutor(
logger=logger,
in_parallel=max_connections,
timeout=timeout + 0.5,
*args,
**kwargs,
)
# make options objects for all the requests
options: QueryOptions = {}
@@ -728,13 +881,17 @@ async def maigret(
},
)
cur_results = await executor.run(tasks_dict.values())
# wait for executor timeout errors
await asyncio.sleep(1)
cur_results = []
with alive_bar(
len(tasks_dict), title="Searching", force_tty=True, disable=no_progressbar
) as progress:
async for result in executor.run(list(tasks_dict.values())): # type: ignore[arg-type]
cur_results.append(result)
progress()
all_results.update(cur_results)
# rerun for failed sites
sites = get_failed_sites(dict(cur_results))
attempts -= 1
@@ -793,91 +950,160 @@ async def site_self_check(
i2p_proxy=None,
skip_errors=False,
cookies=None,
auto_disable=False,
diagnose=False,
):
changes = {
"""
Self-check a site configuration.
Args:
auto_disable: If True, automatically disable sites that fail checks.
If False (default), only report issues without disabling.
diagnose: If True, print detailed diagnosis information.
"""
changes: Dict[str, Any] = {
"disabled": False,
"issues": [],
"recommendations": [],
}
check_data = [
(site.username_claimed, MaigretCheckStatus.CLAIMED),
(site.username_unclaimed, MaigretCheckStatus.AVAILABLE),
]
try:
check_data = [
(site.username_claimed, MaigretCheckStatus.CLAIMED),
(site.username_unclaimed, MaigretCheckStatus.AVAILABLE),
]
logger.info(f"Checking {site.name}...")
logger.info(f"Checking {site.name}...")
for username, status in check_data:
async with semaphore:
results_dict = await maigret(
username=username,
site_dict={site.name: site},
logger=logger,
timeout=30,
id_type=site.type,
forced=True,
no_progressbar=True,
retries=1,
proxy=proxy,
tor_proxy=tor_proxy,
i2p_proxy=i2p_proxy,
cookies=cookies,
)
results_cache = {}
# don't disable entries with other ids types
# TODO: make normal checking
if site.name not in results_dict:
logger.info(results_dict)
changes["disabled"] = True
continue
logger.debug(results_dict)
result = results_dict[site.name]["status"]
if result.error and 'Cannot connect to host' in result.error.desc:
changes["disabled"] = True
site_status = result.status
if site_status != status:
if site_status == MaigretCheckStatus.UNKNOWN:
msgs = site.absence_strs
etype = site.check_type
logger.warning(
f"Error while searching {username} in {site.name}: {result.context}, {msgs}, type {etype}"
for username, status in check_data:
async with semaphore:
results_dict = await maigret(
username=username,
site_dict={site.name: site},
logger=logger,
timeout=30,
id_type=site.type,
forced=True,
no_progressbar=True,
retries=1,
proxy=proxy,
tor_proxy=tor_proxy,
i2p_proxy=i2p_proxy,
cookies=cookies,
)
# don't disable sites after the error
# meaning that the site could be available, but returned error for the check
# e.g. many sites protected by cloudflare and available in general
if skip_errors:
pass
# don't disable in case of available username
elif status == MaigretCheckStatus.CLAIMED:
# don't disable entries with other ids types
# TODO: make normal checking
if site.name not in results_dict:
logger.info(results_dict)
changes["issues"].append(f"Site {site.name} not in results (wrong id_type?)")
if auto_disable:
changes["disabled"] = True
continue
logger.debug(results_dict)
result = results_dict[site.name]["status"]
results_cache[username] = results_dict[site.name]
if result.error and 'Cannot connect to host' in result.error.desc:
changes["issues"].append("Cannot connect to host")
if auto_disable:
changes["disabled"] = True
elif status == MaigretCheckStatus.CLAIMED:
logger.warning(
f"Not found `{username}` in {site.name}, must be claimed"
)
logger.info(results_dict[site.name])
changes["disabled"] = True
else:
logger.warning(f"Found `{username}` in {site.name}, must be available")
logger.info(results_dict[site.name])
changes["disabled"] = True
logger.info(f"Site {site.name} checking is finished")
site_status = result.status
if changes["disabled"] != site.disabled:
site.disabled = changes["disabled"]
logger.info(f"Switching property 'disabled' for {site.name} to {site.disabled}")
db.update_site(site)
if not silent:
action = "Disabled" if site.disabled else "Enabled"
print(f"{action} site {site.name}...")
if site_status != status:
if site_status == MaigretCheckStatus.UNKNOWN:
msgs = site.absence_strs
etype = site.check_type
error_msg = f"Error checking {username}: {result.context}"
changes["issues"].append(error_msg)
logger.warning(
f"Error while searching {username} in {site.name}: {result.context}, {msgs}, type {etype}"
)
# don't disable sites after the error
# meaning that the site could be available, but returned error for the check
# e.g. many sites protected by cloudflare and available in general
if skip_errors:
pass
# don't disable in case of available username
elif status == MaigretCheckStatus.CLAIMED and auto_disable:
changes["disabled"] = True
elif status == MaigretCheckStatus.CLAIMED:
changes["issues"].append(f"Claimed user '{username}' not detected as claimed")
logger.warning(
f"Not found `{username}` in {site.name}, must be claimed"
)
logger.info(results_dict[site.name])
if auto_disable:
changes["disabled"] = True
else:
changes["issues"].append(f"Unclaimed user '{username}' detected as claimed")
logger.warning(f"Found `{username}` in {site.name}, must be available")
logger.info(results_dict[site.name])
if auto_disable:
changes["disabled"] = True
# remove service tag "unchecked"
if "unchecked" in site.tags:
site.tags.remove("unchecked")
db.update_site(site)
logger.info(f"Site {site.name} checking is finished")
# Generate recommendations based on issues
if changes["issues"] and len(results_cache) == 2:
claimed_result = results_cache.get(site.username_claimed, {})
unclaimed_result = results_cache.get(site.username_unclaimed, {})
claimed_http = claimed_result.get("http_status")
unclaimed_http = unclaimed_result.get("http_status")
if claimed_http and unclaimed_http:
if claimed_http != unclaimed_http and site.check_type != "status_code":
changes["recommendations"].append(
f"Consider checkType: status_code (HTTP {claimed_http} vs {unclaimed_http})"
)
# Print diagnosis if requested
if diagnose and changes["issues"]:
print(f"\n--- {site.name} DIAGNOSIS ---")
print(f" Check type: {site.check_type}")
print(" Issues:")
for issue in changes["issues"]:
print(f" - {issue}")
if changes["recommendations"]:
print(" Recommendations:")
for rec in changes["recommendations"]:
print(f" -> {rec}")
# Only modify site if auto_disable is enabled
if auto_disable and changes["disabled"] != site.disabled:
site.disabled = changes["disabled"]
logger.info(f"Switching property 'disabled' for {site.name} to {site.disabled}")
db.update_site(site)
if not silent:
action = "Disabled" if site.disabled else "Enabled"
print(f"{action} site {site.name}...")
elif changes["issues"] and not silent and not diagnose:
# Report issues without disabling
print(f"Issues found in {site.name}: {len(changes['issues'])} (not auto-disabled)")
# remove service tag "unchecked"
if "unchecked" in site.tags:
site.tags.remove("unchecked")
db.update_site(site)
except Exception as e:
logger.warning(
f"Self-check of {site.name} failed with unexpected error: {e}",
exc_info=True,
)
changes["issues"].append(f"Unexpected error: {e}")
if auto_disable and not site.disabled:
changes["disabled"] = True
site.disabled = True
db.update_site(site)
if not silent:
print(f"Disabled site {site.name} (unexpected error)...")
return changes
@@ -891,10 +1117,25 @@ async def self_check(
proxy=None,
tor_proxy=None,
i2p_proxy=None,
) -> bool:
auto_disable=False,
diagnose=False,
no_progressbar=False,
) -> dict:
"""
Run self-check on sites.
Args:
auto_disable: If True, automatically disable sites that fail checks.
If False (default), only report issues without disabling.
diagnose: If True, print detailed diagnosis for each failing site.
no_progressbar: If True, disable the progress bar.
Returns:
dict with 'needs_update' (bool), 'results' (list of per-site check results)
and 'total_issues' (number of sites with at least one issue)
"""
sem = asyncio.Semaphore(max_connections)
tasks = []
all_sites = site_data
all_results = []
def disabled_count(lst):
return len(list(filter(lambda x: x.disabled, lst)))
@@ -906,15 +1147,29 @@ async def self_check(
for _, site in all_sites.items():
check_coro = site_self_check(
site, logger, sem, db, silent, proxy, tor_proxy, i2p_proxy, skip_errors=True
site, logger, sem, db, silent, proxy, tor_proxy, i2p_proxy,
skip_errors=True, auto_disable=auto_disable, diagnose=diagnose
)
future = asyncio.ensure_future(check_coro)
tasks.append(future)
tasks.append((site.name, future))
if tasks:
with alive_bar(len(tasks), title='Self-checking', force_tty=True) as progress:
for f in asyncio.as_completed(tasks):
await f
with alive_bar(len(tasks), title='Self-checking', force_tty=True, disable=no_progressbar) as progress:
for site_name, f in tasks:
try:
result = await f
except Exception as e:
logger.warning(
f"Self-check task for {site_name} raised unexpected error: {e}",
exc_info=True,
)
result = {
"disabled": False,
"issues": [f"Unexpected error: {e}"],
"recommendations": [],
}
result['site_name'] = site_name
all_results.append(result)
progress() # Update the progress bar
unchecked_new_count = len(
@@ -923,7 +1178,10 @@ async def self_check(
disabled_new_count = disabled_count(all_sites.values())
total_disabled = disabled_new_count - disabled_old_count
if total_disabled:
# Count issues
total_issues = sum(1 for r in all_results if r.get('issues'))
if auto_disable and total_disabled:
if total_disabled >= 0:
message = "Disabled"
else:
@@ -935,11 +1193,21 @@ async def self_check(
f"{message} {total_disabled} ({disabled_old_count} => {disabled_new_count}) checked sites. "
"Run with `--info` flag to get more information"
)
elif total_issues and not silent:
print(f"\nFound issues in {total_issues} sites (auto-disable is OFF)")
print("Use --auto-disable to automatically disable failing sites")
print("Use --diagnose to see detailed diagnosis for each site")
if unchecked_new_count != unchecked_old_count:
print(f"Unchecked sites verified: {unchecked_old_count - unchecked_new_count}")
return total_disabled != 0 or unchecked_new_count != unchecked_old_count
needs_update = total_disabled != 0 or unchecked_new_count != unchecked_old_count
return {
'needs_update': needs_update,
'results': all_results,
'total_issues': total_issues,
}
def extract_ids_data(html_text, logger, site) -> Dict:
@@ -958,7 +1226,7 @@ def parse_usernames(extracted_ids_data, logger) -> Dict:
elif "usernames" in k:
try:
tree = ast.literal_eval(v)
if type(tree) == list:
if isinstance(tree, list):
for n in tree:
new_usernames[n] = "username"
except Exception as e:
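The `self_check` hunk above swaps `asyncio.as_completed` for an ordered loop over `(site_name, future)` pairs, so a task that raises is turned into a result dict instead of aborting the whole run. A minimal sketch of that pattern follows. `fake_site_check` and `run_checks` are hypothetical stand-ins for illustration, not maigret functions.

```python
import asyncio


async def fake_site_check(name):
    # Hypothetical check: one site raises to simulate an unexpected failure.
    if name == "broken":
        raise RuntimeError("boom")
    return {"disabled": False, "issues": [], "recommendations": []}


async def run_checks(names):
    # Keep the site name next to its future so a failure can be attributed.
    tasks = [(n, asyncio.ensure_future(fake_site_check(n))) for n in names]
    results = []
    for name, fut in tasks:
        try:
            result = await fut
        except Exception as e:
            # Fallback result with the same keys as a normal one.
            result = {
                "disabled": False,
                "issues": [f"Unexpected error: {e}"],
                "recommendations": [],
            }
        result["site_name"] = name
        results.append(result)
    return results


results = asyncio.run(run_checks(["ok", "broken"]))
total_issues = sum(1 for r in results if r.get("issues"))
```

Because every future is awaited inside its own `try`, one failing site contributes an entry to `results` and is counted in `total_issues` like any other site with issues.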
+330
@@ -0,0 +1,330 @@
"""
Database auto-update logic for maigret.
Checks a lightweight meta file to determine if a newer site database is available,
downloads it if compatible, and caches it locally in ~/.maigret/.
"""
import hashlib
import json
import logging
import os
import os.path as path
import tempfile
from datetime import datetime, timezone
from typing import Optional
import requests
from colorama import Fore, Style
from .__version__ import __version__
logger = logging.getLogger("maigret")
_use_color = True
def _print_info(msg: str) -> None:
text = f"[*] {msg}"
if _use_color:
print(Style.BRIGHT + Fore.GREEN + text + Style.RESET_ALL)
else:
print(text)
def _print_success(msg: str) -> None:
text = f"[+] {msg}"
if _use_color:
print(Style.BRIGHT + Fore.GREEN + text + Style.RESET_ALL)
else:
print(text)
def _print_warning(msg: str) -> None:
text = f"[!] {msg}"
if _use_color:
print(Style.BRIGHT + Fore.YELLOW + text + Style.RESET_ALL)
else:
print(text)
DEFAULT_META_URL = (
"https://raw.githubusercontent.com/soxoj/maigret/main/maigret/resources/db_meta.json"
)
DEFAULT_CHECK_INTERVAL_HOURS = 24
MAIGRET_HOME = path.expanduser("~/.maigret")
CACHED_DB_PATH = path.join(MAIGRET_HOME, "data.json")
STATE_PATH = path.join(MAIGRET_HOME, "autoupdate_state.json")
BUNDLED_DB_PATH = path.join(path.dirname(path.realpath(__file__)), "resources", "data.json")
def _parse_version(version_str: str) -> tuple:
"""Parse a version string like '0.5.0' into a comparable tuple (0, 5, 0)."""
try:
return tuple(int(x) for x in version_str.strip().split("."))
except (ValueError, AttributeError):
return (0, 0, 0)
def _ensure_maigret_home() -> None:
os.makedirs(MAIGRET_HOME, exist_ok=True)
def _load_state() -> dict:
try:
with open(STATE_PATH, "r", encoding="utf-8") as f:
return json.load(f)
except (FileNotFoundError, json.JSONDecodeError, OSError):
return {}
def _save_state(state: dict) -> None:
_ensure_maigret_home()
tmp_path = STATE_PATH + ".tmp"
try:
with open(tmp_path, "w", encoding="utf-8") as f:
json.dump(state, f, indent=2, ensure_ascii=False)
os.replace(tmp_path, STATE_PATH)
except OSError:
try:
os.unlink(tmp_path)
except OSError:
pass
def _needs_check(state: dict, interval_hours: int) -> bool:
last_check = state.get("last_check_at")
if not last_check:
return True
try:
last_dt = datetime.fromisoformat(last_check.replace("Z", "+00:00"))
elapsed = (datetime.now(timezone.utc) - last_dt).total_seconds() / 3600
return elapsed >= interval_hours
except (ValueError, TypeError):
return True
def _fetch_meta(meta_url: str, timeout: int = 10) -> Optional[dict]:
try:
response = requests.get(meta_url, timeout=timeout)
if response.status_code == 200:
return response.json()
except Exception:
pass
return None
def _is_version_compatible(meta: dict) -> bool:
min_ver = meta.get("min_maigret_version", "0.0.0")
return _parse_version(__version__) >= _parse_version(min_ver)
def _is_update_available(meta: dict, state: dict) -> bool:
if not path.isfile(CACHED_DB_PATH):
return True
remote_date = meta.get("updated_at", "")
cached_date = state.get("last_meta", {}).get("updated_at", "")
return remote_date > cached_date
def _download_and_verify(data_url: str, expected_sha256: str, timeout: int = 60) -> Optional[str]:
_ensure_maigret_home()
tmp_fd, tmp_path = tempfile.mkstemp(dir=MAIGRET_HOME, suffix=".json")
try:
response = requests.get(data_url, timeout=timeout)
if response.status_code != 200:
return None
content = response.content
actual_sha256 = hashlib.sha256(content).hexdigest()
if actual_sha256 != expected_sha256:
_print_warning("DB auto-update: SHA-256 mismatch, download rejected")
return None
# Validate JSON structure
data = json.loads(content)
if not all(k in data for k in ("sites", "engines", "tags")):
_print_warning("DB auto-update: invalid database structure")
return None
os.write(tmp_fd, content)
os.close(tmp_fd)
tmp_fd = None
os.replace(tmp_path, CACHED_DB_PATH)
return CACHED_DB_PATH
except Exception:
return None
finally:
if tmp_fd is not None:
os.close(tmp_fd)
try:
os.unlink(tmp_path)
except OSError:
pass
def _best_local() -> str:
"""Return cached DB if it exists and is valid, otherwise bundled."""
if path.isfile(CACHED_DB_PATH):
try:
with open(CACHED_DB_PATH, "r", encoding="utf-8") as f:
data = json.load(f)
if "sites" in data:
return CACHED_DB_PATH
except (json.JSONDecodeError, OSError):
pass
return BUNDLED_DB_PATH
def _now_iso() -> str:
return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
def resolve_db_path(
db_file_arg: str,
no_autoupdate: bool = False,
meta_url: str = DEFAULT_META_URL,
check_interval_hours: int = DEFAULT_CHECK_INTERVAL_HOURS,
color: bool = True,
) -> str:
"""
Determine which database file to use, potentially downloading an update.
Returns the path to the database file that should be loaded.
"""
global _use_color
_use_color = color
default_db_name = "resources/data.json"
# User specified a custom DB — skip auto-update
is_url = db_file_arg.startswith("http://") or db_file_arg.startswith("https://")
is_default = db_file_arg == default_db_name
if is_url:
return db_file_arg
if not is_default:
return path.join(path.dirname(path.realpath(__file__)), db_file_arg)
# Auto-update disabled
if no_autoupdate:
return _best_local()
# Check interval
_ensure_maigret_home()
state = _load_state()
if not _needs_check(state, check_interval_hours):
return _best_local()
# Time to check
_print_info("DB auto-update: checking for updates...")
meta = _fetch_meta(meta_url)
if meta is None:
_print_warning("DB auto-update: could not reach update server, using local database")
state["last_check_at"] = _now_iso()
_save_state(state)
return _best_local()
# Version compatibility
if not _is_version_compatible(meta):
min_ver = meta.get("min_maigret_version", "?")
_print_warning(
f"DB auto-update: latest database requires maigret >= {min_ver}, "
f"you have {__version__}. Please upgrade with: pip install -U maigret"
)
state["last_check_at"] = _now_iso()
_save_state(state)
return _best_local()
# Check if update available
if not _is_update_available(meta, state):
sites_count = meta.get("sites_count", "?")
_print_info(f"DB auto-update: database is up to date ({sites_count} sites)")
state["last_check_at"] = _now_iso()
state["last_meta"] = meta
_save_state(state)
return _best_local()
# Download update
new_count = meta.get("sites_count", "?")
old_count = state.get("last_meta", {}).get("sites_count")
if old_count:
_print_info(f"DB auto-update: downloading updated database ({new_count} sites, was {old_count})...")
else:
_print_info(f"DB auto-update: downloading database ({new_count} sites)...")
data_url = meta.get("data_url", "")
expected_sha = meta.get("data_sha256", "")
result = _download_and_verify(data_url, expected_sha)
if result is None:
_print_warning("DB auto-update: download failed, using local database")
state["last_check_at"] = _now_iso()
_save_state(state)
return _best_local()
_print_success(f"DB auto-update: database updated successfully ({new_count} sites)")
state["last_check_at"] = _now_iso()
state["last_meta"] = meta
state["cached_db_sha256"] = expected_sha
_save_state(state)
return CACHED_DB_PATH
def force_update(
meta_url: str = DEFAULT_META_URL,
color: bool = True,
) -> bool:
"""
Force check for database updates and download if available.
Returns True if database was updated, False otherwise.
"""
global _use_color
_use_color = color
_ensure_maigret_home()
_print_info("DB update: checking for updates...")
meta = _fetch_meta(meta_url)
if meta is None:
_print_warning("DB update: could not reach update server")
return False
if not _is_version_compatible(meta):
min_ver = meta.get("min_maigret_version", "?")
_print_warning(
f"DB update: latest database requires maigret >= {min_ver}, "
f"you have {__version__}. Please upgrade with: pip install -U maigret"
)
return False
state = _load_state()
new_count = meta.get("sites_count", "?")
old_count = state.get("last_meta", {}).get("sites_count")
if not _is_update_available(meta, state):
_print_info(f"DB update: database is already up to date ({new_count} sites)")
state["last_check_at"] = _now_iso()
state["last_meta"] = meta
_save_state(state)
return False
if old_count:
_print_info(f"DB update: downloading updated database ({new_count} sites, was {old_count})...")
else:
_print_info(f"DB update: downloading database ({new_count} sites)...")
data_url = meta.get("data_url", "")
expected_sha = meta.get("data_sha256", "")
result = _download_and_verify(data_url, expected_sha)
if result is None:
_print_warning("DB update: download failed")
return False
_print_success(f"DB update: database updated successfully ({new_count} sites)")
state["last_check_at"] = _now_iso()
state["last_meta"] = meta
state["cached_db_sha256"] = expected_sha
_save_state(state)
return True
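The core of `_download_and_verify` above is a verify-then-replace sequence: hash the payload, check its JSON structure, then move a temporary file over the cached copy with `os.replace`. The sketch below isolates that sequence without any network access. `install_verified` is a hypothetical helper written for illustration and is not part of the module.

```python
import hashlib
import json
import os
import tempfile


def install_verified(content: bytes, expected_sha256: str, dest_path: str) -> bool:
    """Write content to dest_path only if its hash and structure check out."""
    if hashlib.sha256(content).hexdigest() != expected_sha256:
        return False
    try:
        data = json.loads(content)
    except ValueError:
        return False
    required = ("sites", "engines", "tags")
    if not isinstance(data, dict) or not all(k in data for k in required):
        return False
    # Write to a temp file in the destination directory, then rename over
    # the target so readers never observe a partially written file.
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".json")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(content)
        os.replace(tmp_path, dest_path)
        return True
    except OSError:
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        return False
```

Creating the temporary file in the same directory as the destination keeps the final `os.replace` on one filesystem, which is the same reason the module passes `dir=MAIGRET_HOME` to `tempfile.mkstemp`.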
+5
@@ -32,6 +32,9 @@ COMMON_ERRORS = {
'<title>Attention Required! | Cloudflare</title>': CheckError(
'Captcha', 'Cloudflare'
),
'<title>Just a moment</title>': CheckError(
'Bot protection', 'Cloudflare challenge page'
),
'Please stand by, while we are checking your browser': CheckError(
'Bot protection', 'Cloudflare'
),
@@ -55,6 +58,8 @@ COMMON_ERRORS = {
'Censorship', 'MGTS'
),
'Incapsula incident ID': CheckError('Bot protection', 'Incapsula'),
'<title>Client Challenge</title>': CheckError('Bot protection', 'Anti-bot challenge'),
'<title>DDoS-Guard</title>': CheckError('Bot protection', 'DDoS-Guard'),
'Сайт заблокирован хостинг-провайдером': CheckError(
'Site-specific', 'Site is disabled (Beget)'
),
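The new `COMMON_ERRORS` entries map a marker string found in a response body to an error description. A rough sketch of how such a marker table can be consulted is shown below. The `CheckError` named tuple, the `MARKERS` dict and `detect_error` are hypothetical simplifications for illustration; maigret's real `CheckError` class and lookup code may differ.

```python
from typing import NamedTuple, Optional


class CheckError(NamedTuple):
    # Hypothetical stand-in for maigret's CheckError class.
    kind: str
    desc: str


# Two of the markers added in the diff above.
MARKERS = {
    "<title>Just a moment</title>": CheckError(
        "Bot protection", "Cloudflare challenge page"
    ),
    "<title>DDoS-Guard</title>": CheckError("Bot protection", "DDoS-Guard"),
}


def detect_error(html_text: str) -> Optional[CheckError]:
    """Return the first error whose marker occurs in the page, else None."""
    for marker, error in MARKERS.items():
        if marker in html_text:
            return error
    return None
```

Substring matching on a full `<title>` tag is stricter than matching on the bare phrase, which reduces false positives on pages that merely mention the protection service.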
+70 -2
@@ -1,7 +1,7 @@
import asyncio
import sys
import time
from typing import Any, Iterable, List
from typing import Any, Iterable, List, Callable
import alive_progress
from alive_progress import alive_bar
@@ -19,6 +19,7 @@ def create_task_func():
class AsyncExecutor:
# Deprecated: will be removed soon, don't use it
def __init__(self, *args, **kwargs):
self.logger = kwargs['logger']
@@ -34,6 +35,7 @@ class AsyncExecutor:
class AsyncioSimpleExecutor(AsyncExecutor):
# Deprecated: will be removed soon, don't use it
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.semaphore = asyncio.Semaphore(kwargs.get('in_parallel', 100))
@@ -48,6 +50,7 @@ class AsyncioSimpleExecutor(AsyncExecutor):
class AsyncioProgressbarExecutor(AsyncExecutor):
# Deprecated: will be removed soon, don't use it
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
@@ -71,6 +74,7 @@ class AsyncioProgressbarExecutor(AsyncExecutor):
class AsyncioProgressbarSemaphoreExecutor(AsyncExecutor):
# Deprecated: will be removed soon, don't use it
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.semaphore = asyncio.Semaphore(kwargs.get('in_parallel', 1))
@@ -99,7 +103,7 @@ class AsyncioProgressbarQueueExecutor(AsyncExecutor):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.workers_count = kwargs.get('in_parallel', 10)
self.queue = asyncio.Queue(self.workers_count)
self.queue: asyncio.Queue = asyncio.Queue(self.workers_count)
self.timeout = kwargs.get('timeout')
# Pass a progress function; alive_bar by default
self.progress_func = kwargs.get('progress_func', alive_bar)
@@ -174,3 +178,67 @@ class AsyncioProgressbarQueueExecutor(AsyncExecutor):
w.cancel()
return self.results
class AsyncioQueueGeneratorExecutor:
# Deprecated: will be removed soon, don't use it
def __init__(self, *args, **kwargs):
self.workers_count = kwargs.get('in_parallel', 10)
self.queue: asyncio.Queue = asyncio.Queue()
self.timeout = kwargs.get('timeout')
self.logger = kwargs['logger']
self._results: asyncio.Queue = asyncio.Queue()
self._stop_signal = object()
async def worker(self):
"""Process tasks from the queue and put results into the results queue."""
while True:
task = await self.queue.get()
if task is self._stop_signal:
self.queue.task_done()
break
try:
f, args, kwargs = task
query_future = f(*args, **kwargs)
query_task = create_task_func()(query_future)
try:
result = await asyncio.wait_for(query_task, timeout=self.timeout)
except asyncio.TimeoutError:
result = kwargs.get('default')
await self._results.put(result)
except Exception as e:
self.logger.error(f"Error in worker: {e}", exc_info=True)
finally:
self.queue.task_done()
async def run(self, queries: Iterable[Callable[..., Any]]):
"""Run workers to process queries in parallel."""
start_time = time.time()
# Add tasks to the queue
for t in queries:
await self.queue.put(t)
# Create workers
workers = [
asyncio.create_task(self.worker()) for _ in range(self.workers_count)
]
# Add stop signals
for _ in range(self.workers_count):
await self.queue.put(self._stop_signal)
try:
while any(not w.done() for w in workers) or not self._results.empty():
try:
result = await asyncio.wait_for(self._results.get(), timeout=1)
yield result
except asyncio.TimeoutError:
pass
finally:
# Ensure all workers are awaited
await asyncio.gather(*workers)
self.execution_time = time.time() - start_time
self.logger.debug(f"Spent time: {self.execution_time}")
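`AsyncioQueueGeneratorExecutor` above combines a task queue, a fixed pool of workers, a stop sentinel per worker, and an async generator that yields results as they arrive. The sketch below is a simplified illustration of that shape and not the class itself: `run_queue`, `square` and `collect` are hypothetical names, timeouts are left out, and completion is tracked by counting results rather than polling worker state.

```python
import asyncio


async def square(x):
    await asyncio.sleep(0)
    return x * x


async def run_queue(jobs, workers_count=3):
    """Yield one result per job as workers finish (simplified sketch)."""
    jobs = list(jobs)
    tasks = asyncio.Queue()
    results = asyncio.Queue()
    stop = object()  # sentinel telling a worker to exit

    async def worker():
        while True:
            job = await tasks.get()
            if job is stop:
                break
            func, args = job
            try:
                result = await func(*args)
            except Exception as exc:
                # Surface the failure as a result so the consumer never hangs.
                result = exc
            await results.put(result)

    for job in jobs:
        tasks.put_nowait(job)
    for _ in range(workers_count):
        tasks.put_nowait(stop)

    workers = [asyncio.create_task(worker()) for _ in range(workers_count)]
    for _ in range(len(jobs)):
        yield await results.get()
    await asyncio.gather(*workers)


async def collect(jobs):
    return sorted([r async for r in run_queue(jobs)])


squares = asyncio.run(collect([(square, (n,)) for n in range(5)]))
```

Results arrive in completion order, not submission order, which is why the usage line sorts them before comparing.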
+123 -8
@@ -13,7 +13,7 @@ from argparse import ArgumentParser, RawDescriptionHelpFormatter
from typing import List, Tuple
import os.path as path
from socid_extractor import extract, parse
from socid_extractor import extract, parse # type: ignore[import-not-found]
from .__version__ import __version__
from .checking import (
@@ -37,6 +37,7 @@ from .report import (
get_plaintext_report,
sort_report_by_data_points,
save_graph_report,
save_markdown_report,
)
from .sites import MaigretDatabase
from .submit import Submitter
@@ -75,7 +76,7 @@ def extract_ids_from_page(url, logger, timeout=5) -> dict:
elif 'usernames' in k:
try:
tree = ast.literal_eval(v)
if type(tree) == list:
if isinstance(tree, list):
for n in tree:
results[n] = 'username'
except Exception as e:
@@ -201,6 +202,20 @@ def setup_arguments_parser(settings: Settings):
default=settings.sites_db_path,
help="Load Maigret database from a JSON file or HTTP web resource.",
)
parser.add_argument(
"--no-autoupdate",
action="store_true",
dest="no_autoupdate",
default=settings.no_autoupdate,
help="Disable automatic database updates on startup.",
)
parser.add_argument(
"--force-update",
action="store_true",
dest="force_update",
default=False,
help="Force check for database updates and download if available.",
)
parser.add_argument(
"--cookies-jar-file",
metavar="COOKIE_FILE",
@@ -277,6 +292,12 @@ def setup_arguments_parser(settings: Settings):
filter_group.add_argument(
"--tags", dest="tags", default='', help="Specify tags of sites (see `--stats`)."
)
filter_group.add_argument(
"--exclude-tags",
dest="exclude_tags",
default='',
help="Specify tags to exclude from search (blacklist).",
)
filter_group.add_argument(
"--site",
action="append",
@@ -316,7 +337,19 @@ def setup_arguments_parser(settings: Settings):
"--self-check",
action="store_true",
default=settings.self_check_enabled,
help="Do self check for sites and database and disable non-working ones.",
help="Do self check for sites and database. Use --auto-disable to disable failing sites.",
)
modes_group.add_argument(
"--auto-disable",
action="store_true",
default=False,
help="With --self-check: automatically disable sites that fail checks.",
)
modes_group.add_argument(
"--diagnose",
action="store_true",
default=False,
help="With --self-check: print detailed diagnosis for each failing site.",
)
modes_group.add_argument(
"--stats",
@@ -324,7 +357,15 @@ def setup_arguments_parser(settings: Settings):
default=False,
help="Show database statistics (most frequent sites engines and tags).",
)
modes_group.add_argument(
"--web",
metavar='PORT',
type=int,
nargs='?', # Optional PORT value
const=5000, # Default PORT if `--web` is provided without a value
default=None, # Explicitly set default to None
help="Launch the web interface on the specified port (default: 5000 if no PORT is provided).",
)
output_group = parser.add_argument_group(
'Output options', 'Options to change verbosity and view of the console output'
)
@@ -425,6 +466,14 @@ def setup_arguments_parser(settings: Settings):
default=settings.pdf_report,
help="Generate a PDF report (general report on all usernames).",
)
report_group.add_argument(
"-M",
"--md",
action="store_true",
dest="md",
default=settings.md_report,
help="Generate a Markdown report (general report on all usernames).",
)
report_group.add_argument(
"-G",
"--graph",
@@ -512,7 +561,26 @@ async def main():
if args.tags:
args.tags = list(set(str(args.tags).split(',')))
db_file = path.join(path.dirname(path.realpath(__file__)), args.db_file)
if args.exclude_tags:
args.exclude_tags = list(set(str(args.exclude_tags).split(',')))
else:
args.exclude_tags = []
from .db_updater import resolve_db_path, force_update, BUNDLED_DB_PATH
if args.force_update:
force_update(
meta_url=settings.db_update_meta_url,
color=not args.no_color,
)
db_file = resolve_db_path(
db_file_arg=args.db_file,
no_autoupdate=args.no_autoupdate or args.force_update,
meta_url=settings.db_update_meta_url,
check_interval_hours=settings.autoupdate_check_interval_hours,
color=not args.no_color,
)
if args.top_sites == 0 or args.all_sites:
args.top_sites = sys.maxsize
@@ -527,10 +595,19 @@ async def main():
)
# Create object with all information about sites we are aware of.
db = MaigretDatabase().load_from_path(db_file)
try:
db = MaigretDatabase().load_from_path(db_file)
except Exception as e:
logger.warning(f"Failed to load database from {db_file}: {e}")
if db_file != BUNDLED_DB_PATH:
logger.warning("Falling back to bundled database")
db = MaigretDatabase().load_from_path(BUNDLED_DB_PATH)
else:
raise
get_top_sites_for_id = lambda x: db.ranked_sites_dict(
top=args.top_sites,
tags=args.tags,
excluded_tags=args.exclude_tags,
names=args.site_list,
disabled=args.use_disabled_sites,
id_type=x,
@@ -556,7 +633,7 @@ async def main():
query_notify.success(
f'Maigret sites database self-check started for {len(site_data)} sites...'
)
is_need_update = await self_check(
check_result = await self_check(
db,
site_data,
logger,
@@ -564,7 +641,13 @@ async def main():
max_connections=args.connections,
tor_proxy=args.tor_proxy,
i2p_proxy=args.i2p_proxy,
auto_disable=args.auto_disable,
diagnose=args.diagnose,
no_progressbar=args.no_progressbar,
)
is_need_update = check_result.get('needs_update', False)
if is_need_update:
if input('Do you want to save changes permanently? [Yn]\n').lower() in (
'y',
@@ -592,6 +675,21 @@ async def main():
# Define one report filename template
report_filepath_tpl = path.join(report_dir, 'report_{username}{postfix}')
# Web interface
if args.web is not None:
from maigret.web.app import app
app.config["MAIGRET_DB_FILE"] = db_file
port = (
args.web if args.web else 5000
) # args.web is either the specified port or 5000 by default
# Host configuration: secure by default, but allow override via environment
host = os.getenv('FLASK_HOST', '127.0.0.1')
app.run(host=host, port=port)
return
if usernames == {}:
# magic params to exit after init
query_notify.warning('No usernames to check, exiting.')
@@ -714,7 +812,7 @@ async def main():
# reporting for all the result
if general_results:
if args.html or args.pdf:
if args.html or args.pdf or args.md:
query_notify.warning('Generating report info...')
report_context = generate_report_context(general_results)
# determine main username
@@ -734,6 +832,23 @@ async def main():
save_pdf_report(filename, report_context)
query_notify.warning(f'PDF report on all usernames saved in {filename}')
if args.md:
username = username.replace('/', '_')
filename = report_filepath_tpl.format(username=username, postfix='.md')
run_flags = []
if args.tags:
run_flags.append(f"--tags {args.tags}")
if args.site_list:
run_flags.append(f"--site {','.join(args.site_list)}")
if args.all_sites:
run_flags.append("--all-sites")
run_info = {
"sites_count": sum(len(d) for _, _, d in general_results),
"flags": " ".join(run_flags) if run_flags else None,
}
save_markdown_report(filename, report_context, run_info=run_info)
query_notify.warning(f'Markdown report on all usernames saved in {filename}')
if args.graph:
username = username.replace('/', '_')
filename = report_filepath_tpl.format(
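The `--web` option above relies on argparse's `nargs='?'` with `const` and `default` to tell apart three cases: flag absent, flag given without a port, and flag given with a port. The sketch below reproduces only that option and the comma-splitting applied to `--exclude-tags`. The `parse` helper is hypothetical, and it sorts the tags for a deterministic result, whereas the diff uses `list(set(...))`.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--web",
    metavar="PORT",
    type=int,
    nargs="?",     # PORT is optional
    const=5000,    # used when --web is given without a value
    default=None,  # used when --web is absent
)
parser.add_argument("--exclude-tags", dest="exclude_tags", default="")


def parse(argv):
    args = parser.parse_args(argv)
    if args.exclude_tags:
        tags = sorted(set(args.exclude_tags.split(",")))
    else:
        tags = []
    return args.web, tags
```

Since `main()` tests `args.web is not None` before starting the web interface, the `None` default is what keeps a plain command-line run from launching it.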
+1 -1
@@ -174,7 +174,7 @@ class QueryNotifyPrint(QueryNotify):
else:
return self.make_simple_terminal_notify(*args)
def start(self, message, id_type):
def start(self, message=None, id_type="username"):
"""Notify Start.
Will print the title to the standard output.
+249 -79
@@ -7,7 +7,7 @@ import os
from datetime import datetime
from typing import Dict, Any
import xmind
import xmind # type: ignore[import-untyped]
from dateutil.tz import gettz
from dateutil.parser import parse as parse_datetime_str
from jinja2 import Template
@@ -79,7 +79,7 @@ def save_pdf_report(filename: str, context: dict):
filled_template = template.render(**context)
# moved here to speed up the launch of Maigret
from xhtml2pdf import pisa
from xhtml2pdf import pisa # type: ignore[import-untyped]
with open(filename, "w+b") as f:
pisa.pisaDocument(io.StringIO(filled_template), dest=f, default_css=css)
@@ -91,28 +91,27 @@ def save_json_report(filename: str, username: str, results: dict, report_type: s
class MaigretGraph:
other_params = {'size': 10, 'group': 3}
site_params = {'size': 15, 'group': 2}
username_params = {'size': 20, 'group': 1}
other_params: dict = {'size': 10, 'group': 3}
site_params: dict = {'size': 15, 'group': 2}
username_params: dict = {'size': 20, 'group': 1}
def __init__(self, graph):
self.G = graph
def add_node(self, key, value):
def add_node(self, key, value, color=None):
node_name = f'{key}: {value}'
params = self.other_params
params = dict(self.other_params)
if key in SUPPORTED_IDS:
params = self.username_params
params = dict(self.username_params)
elif value.startswith('http'):
params = self.site_params
params = dict(self.site_params)
self.G.add_node(node_name, title=node_name, **params)
if value != value.lower():
normalized_node_name = self.add_node(key, value.lower())
self.link(node_name, normalized_node_name)
params['title'] = node_name
if color:
params['color'] = color
self.G.add_node(node_name, **params)
return node_name
def link(self, node1_name, node2_name):
@@ -120,95 +119,127 @@ class MaigretGraph:
def save_graph_report(filename: str, username_results: list, db: MaigretDatabase):
# moved here to speed up the launch of Maigret
import networkx as nx
G = nx.Graph()
G: Any = nx.Graph()
graph = MaigretGraph(G)
base_site_nodes = {}
site_account_nodes = {}
processed_values: Dict[str, Any] = {} # Track processed values to avoid duplicates
for username, id_type, results in username_results:
username_node_name = graph.add_node(id_type, username)
# Add username node, using normalized version directly if different
norm_username = username.lower()
username_node_name = graph.add_node(id_type, norm_username)
for website_name in results:
dictionary = results[website_name]
# TODO: fix no site data issue
if not dictionary:
continue
if dictionary.get("is_similar"):
for website_name, dictionary in results.items():
if not dictionary or dictionary.get("is_similar"):
continue
status = dictionary.get("status")
if not status: # FIXME: currently in case of timeout
if not status or status.status != MaigretCheckStatus.CLAIMED:
continue
if dictionary["status"].status != MaigretCheckStatus.CLAIMED:
continue
# base site node
site_base_url = website_name
if site_base_url not in base_site_nodes:
base_site_nodes[site_base_url] = graph.add_node(
'site', site_base_url, color='#28a745'
) # Green color
site_fallback_name = dictionary.get(
'url_user', f'{website_name}: {username.lower()}'
)
# site_node_name = dictionary.get('url_user', f'{website_name}: {username.lower()}')
site_node_name = graph.add_node('site', site_fallback_name)
graph.link(username_node_name, site_node_name)
site_base_node_name = base_site_nodes[site_base_url]
# account node
account_url = dictionary.get('url_user', f'{site_base_url}/{norm_username}')
account_node_id = f"{site_base_url}: {account_url}"
if account_node_id not in site_account_nodes:
site_account_nodes[account_node_id] = graph.add_node(
'account', account_url
)
account_node_name = site_account_nodes[account_node_id]
# link username → account → site
graph.link(username_node_name, account_node_name)
graph.link(account_node_name, site_base_node_name)
def process_ids(parent_node, ids):
for k, v in ids.items():
if k.endswith('_count') or k.startswith('is_') or k.endswith('_at'):
continue
if k in 'image':
if (
k.endswith('_count')
or k.startswith('is_')
or k.endswith('_at')
or k == 'image'
):
continue
v_data = v
if v.startswith('['):
try:
v_data = ast.literal_eval(v)
except Exception as e:
logging.error(e)
# Normalize value if string
norm_v = v.lower() if isinstance(v, str) else v
value_key = f"{k}:{norm_v}"
# value is a list
if isinstance(v_data, list):
list_node_name = graph.add_node(k, site_fallback_name)
for vv in v_data:
data_node_name = graph.add_node(vv, site_fallback_name)
graph.link(list_node_name, data_node_name)
if value_key in processed_values:
ids_data_name = processed_values[value_key]
else:
v_data = v
if isinstance(v, str) and v.startswith('['):
try:
v_data = ast.literal_eval(v)
except Exception as e:
logging.error(e)
continue
if isinstance(v_data, list):
list_node_name = graph.add_node(k, site_base_url)
processed_values[value_key] = list_node_name
for vv in v_data:
data_node_name = graph.add_node(vv, site_base_url)
graph.link(list_node_name, data_node_name)
add_ids = {
a: b for b, a in db.extract_ids_from_url(vv).items()
}
if add_ids:
process_ids(data_node_name, add_ids)
ids_data_name = list_node_name
else:
ids_data_name = graph.add_node(k, norm_v)
processed_values[value_key] = ids_data_name
if 'username' in k or k in SUPPORTED_IDS:
new_username_key = f"username:{norm_v}"
if new_username_key not in processed_values:
new_username_node_name = graph.add_node(
'username', norm_v
)
processed_values[new_username_key] = (
new_username_node_name
)
graph.link(ids_data_name, new_username_node_name)
add_ids = {
a: b for b, a in db.extract_ids_from_url(vv).items()
k: v for v, k in db.extract_ids_from_url(v).items()
}
if add_ids:
process_ids(data_node_name, add_ids)
else:
# value is just a string
# ids_data_name = f'{k}: {v}'
# if ids_data_name == parent_node:
# continue
process_ids(ids_data_name, add_ids)
ids_data_name = graph.add_node(k, v)
# G.add_node(ids_data_name, size=10, title=ids_data_name, group=3)
graph.link(parent_node, ids_data_name)
# check for username
if 'username' in k or k in SUPPORTED_IDS:
new_username_node_name = graph.add_node('username', v)
graph.link(ids_data_name, new_username_node_name)
add_ids = {k: v for v, k in db.extract_ids_from_url(v).items()}
if add_ids:
process_ids(ids_data_name, add_ids)
graph.link(parent_node, ids_data_name)
if status.ids_data:
process_ids(site_node_name, status.ids_data)
process_ids(account_node_name, status.ids_data)
nodes_to_remove = []
for node in G.nodes:
if len(str(node)) > 100:
nodes_to_remove.append(node)
# Remove overly long nodes
nodes_to_remove = [node for node in G.nodes if len(str(node)) > 100]
G.remove_nodes_from(nodes_to_remove)
[G.remove_node(node) for node in nodes_to_remove]
# Remove site nodes with only one connection
single_degree_sites = [
n for n, deg in G.degree() if n.startswith("site:") and deg <= 1
]
G.remove_nodes_from(single_degree_sites)
# moved here to speed up the launch of Maigret
from pyvis.network import Network
# Generate interactive visualization
from pyvis.network import Network # type: ignore[import-untyped]
nt = Network(notebook=True, height="750px", width="100%")
nt.from_nx(G)
@@ -226,6 +257,144 @@ def get_plaintext_report(context: dict) -> str:
return output.strip()
def _md_format_value(value) -> str:
"""Format a value for Markdown output, detecting links."""
if isinstance(value, list):
return ", ".join(str(v) for v in value)
s = str(value)
if s.startswith("http://") or s.startswith("https://"):
return f"[{s}]({s})"
return s
def save_markdown_report(filename: str, context: dict, run_info: dict = None):
username = context.get("username", "unknown")
generated_at = context.get("generated_at", "")
brief = context.get("brief", "")
countries = context.get("countries_tuple_list", [])
interests = context.get("interests_tuple_list", [])
first_seen = context.get("first_seen")
results = context.get("results", [])
# Collect ALL values for key fields across all accounts
all_fields: Dict[str, list] = {}
last_seen = None
for _, _, data in results:
for _, v in data.items():
if not v.get("found") or v.get("is_similar"):
continue
ids_data = v.get("ids_data", {})
# Map multiple source fields to unified output fields
field_sources = {
"fullname": ("fullname", "name"),
"location": ("location", "country", "city", "country_code", "locale", "region"),
"gender": ("gender",),
"bio": ("bio", "about", "description"),
}
for out_field, source_keys in field_sources.items():
for src in source_keys:
val = ids_data.get(src)
if val:
all_fields.setdefault(out_field, [])
val_str = str(val)
if val_str not in all_fields[out_field]:
all_fields[out_field].append(val_str)
# Track last_seen
for ts_field in ("last_online", "latest_activity_at", "updated_at"):
ts = ids_data.get(ts_field)
if ts and (last_seen is None or str(ts) > str(last_seen)):
last_seen = ts
lines = []
lines.append(f"# Report by searching on username \"{username}\"\n")
# Generated line with run info
gen_line = f"Generated at {generated_at} by [Maigret](https://github.com/soxoj/maigret)"
if run_info:
parts = []
if run_info.get("sites_count"):
parts.append(f"{run_info['sites_count']} sites checked")
if run_info.get("flags"):
parts.append(f"flags: `{run_info['flags']}`")
if parts:
gen_line += f" ({', '.join(parts)})"
lines.append(f"{gen_line}\n")
# Summary
lines.append("## Summary\n")
lines.append(f"{brief}\n")
if all_fields:
lines.append("**Information extracted from accounts:**\n")
for field, values in all_fields.items():
title = CaseConverter.snake_to_title(field)
lines.append(f"- {title}: {'; '.join(values)}")
lines.append("")
if countries:
geo = ", ".join(f"{code} (x{count})" for code, count in countries)
lines.append(f"**Country tags:** {geo}\n")
if interests:
tags = ", ".join(f"{tag} (x{count})" for tag, count in interests)
lines.append(f"**Website tags:** {tags}\n")
if first_seen:
lines.append(f"**First seen:** {first_seen}")
if last_seen:
lines.append(f"**Last seen:** {last_seen}")
if first_seen or last_seen:
lines.append("")
# Accounts found
lines.append("## Accounts found\n")
for u, id_type, data in results:
for site_name, v in data.items():
if not v.get("found") or v.get("is_similar"):
continue
lines.append(f"### {site_name}\n")
lines.append(f"- **URL:** [{v.get('url_user', '')}]({v.get('url_user', '')})")
tags = v.get("status") and v["status"].tags or []
if tags:
lines.append(f"- **Tags:** {', '.join(tags)}")
lines.append("")
ids_data = v.get("ids_data", {})
if ids_data:
for field, value in ids_data.items():
if field == "image":
continue
title = CaseConverter.snake_to_title(field)
lines.append(f"- {title}: {_md_format_value(value)}")
lines.append("")
# Possible false positives
lines.append("## Possible false positives\n")
lines.append(
f"This report was generated by searching for accounts matching the username `{username}`. "
f"Accounts listed above may belong to different people who happen to use the same "
f"or similar username. Results without extracted personal information could contain "
f"some false positive findings. Always verify findings before drawing conclusions.\n"
)
# Ethical use
lines.append("## Ethical use\n")
lines.append(
"This report is a result of a technical collection of publicly available information "
"from online accounts and does not constitute personal data processing. If you intend "
"to use this data for personal data processing or collection purposes, ensure your use "
"complies with applicable laws and regulations in your jurisdiction (such as GDPR, "
"CCPA, and similar).\n"
)
with open(filename, "w", encoding="utf-8") as f:
f.write("\n".join(lines))
"""
REPORTS GENERATING
"""
@@ -322,11 +491,12 @@ def generate_report_context(username_results: list):
if k in ["country", "locale"]:
try:
if is_country_tag(k):
tag = pycountry.countries.get(alpha_2=v).alpha_2.lower()
country = pycountry.countries.get(alpha_2=v)
tag = country.alpha_2.lower() # type: ignore[union-attr]
else:
tag = pycountry.countries.search_fuzzy(v)[
0
].alpha_2.lower()
].alpha_2.lower() # type: ignore[attr-defined]
# TODO: move countries to another struct
tags[tag] = tags.get(tag, 0) + 1
except Exception as e:
@@ -482,8 +652,8 @@ def add_xmind_subtopic(userlink, k, v, supposed_data):
def design_xmind_sheet(sheet, username, results):
alltags = {}
supposed_data = {}
alltags: Dict[str, Any] = {}
supposed_data: Dict[str, Any] = {}
sheet.setTitle("%s Analysis" % (username))
root_topic1 = sheet.getRootTopic()
+24215 -24612
File diff suppressed because it is too large.
+8
@@ -0,0 +1,8 @@
{
"version": 1,
"updated_at": "2026-04-07T16:18:18Z",
"sites_count": 3155,
"min_maigret_version": "0.5.0",
"data_sha256": "279fb90280814cd11dcd711b1b8e6c6a99fefea4ce6ef05c9d64dced6ac795c0",
"data_url": "https://raw.githubusercontent.com/soxoj/maigret/main/maigret/resources/data.json"
}
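The `data_sha256` field suggests that a downloaded database file can be checked against the metadata before it is used. The updater code itself is not part of this section, so the following is only a sketch of such a check, assuming the digest is the SHA-256 of the raw bytes of `data.json`; the function name `verify_db_payload` is hypothetical.

```python
import hashlib
import json


def verify_db_payload(payload: bytes, meta: dict) -> bool:
    """Return True if the payload's SHA-256 hex digest matches meta['data_sha256'].

    Assumption: the digest covers the raw file bytes, not a re-serialized form.
    """
    digest = hashlib.sha256(payload).hexdigest()
    return digest == meta["data_sha256"]


if __name__ == "__main__":
    # Stand-in payload and metadata; a real check would use the downloaded bytes.
    payload = json.dumps({"sites": {}}).encode("utf-8")
    meta = {"version": 1, "data_sha256": hashlib.sha256(payload).hexdigest()}

    print(verify_db_payload(payload, meta))         # matching bytes
    print(verify_db_payload(payload + b" ", meta))  # altered bytes
```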
+6 -1
@@ -53,5 +53,10 @@
"xmind_report": false,
"graph_report": false,
"pdf_report": false,
"html_report": false
"html_report": false,
"md_report": false,
"web_interface_port": 5000,
"no_autoupdate": false,
"db_update_meta_url": "https://raw.githubusercontent.com/soxoj/maigret/main/maigret/resources/db_meta.json",
"autoupdate_check_interval_hours": 24
}
+6 -1
@@ -5,7 +5,7 @@ from typing import List
SETTINGS_FILES_PATHS = [
path.join(path.dirname(path.realpath(__file__)), "resources/settings.json"),
'~/.maigret/settings.json',
path.expanduser('~/.maigret/settings.json'),
path.join(os.getcwd(), 'settings.json'),
]
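The switch to `path.expanduser` matters because Python's file APIs do not expand `~` on their own: the unexpanded string is treated as a relative path whose first component is a directory literally named `~`. A minimal illustration, using only the standard library:

```python
from os import path

raw = "~/.maigret/settings.json"
expanded = path.expanduser(raw)

# The raw string is a relative path; open(raw) would look for a directory
# named "~" under the current working directory.
print(path.isabs(raw))

# After expansion the tilde is replaced by the user's home directory
# (when one can be determined from the environment).
print(expanded)
```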
@@ -42,6 +42,11 @@ class Settings:
pdf_report: bool
html_report: bool
graph_report: bool
md_report: bool
web_interface_port: int
no_autoupdate: bool
db_update_meta_url: str
autoupdate_check_interval_hours: int
# submit mode settings
presence_strings: list
+73 -9
@@ -65,6 +65,10 @@ class MaigretSite:
url_probe = None
# Type of check to perform
check_type = ""
# HTTP request method (GET, POST, HEAD, etc.)
request_method = ""
# HTTP request payload (for POST, PUT, etc.)
request_payload: Dict[str, Any] = {}
# Whether to only send HEAD requests (GET by default)
request_head_only = ""
# GET parameters to include in requests
@@ -88,10 +92,12 @@ class MaigretSite:
# Alexa traffic rank
alexa_rank = None
# Source (in case a site is a mirror of another site)
source = None
source: Optional[str] = None
# URL protocol (http/https)
protocol = ''
# Protection types detected on this site (e.g. ["tls_fingerprint", "ddos_guard"])
protection: List[str] = []
def __init__(self, name, information):
self.name = name
@@ -137,6 +143,8 @@ class MaigretSite:
'regex_check',
'url_probe',
'check_type',
'request_method',
'request_payload',
'request_head_only',
'get_params',
'presense_strs',
@@ -167,7 +175,7 @@ class MaigretSite:
self.__dict__[CaseConverter.camel_to_snake(group)],
)
self.url_regexp = URLMatcher.make_profile_url_regexp(url, self.regex_check)
self.url_regexp = URLMatcher.make_profile_url_regexp(url, self.regex_check or "")
def detect_username(self, url: str) -> Optional[str]:
if self.url_regexp:
@@ -318,6 +326,7 @@ class MaigretDatabase:
reverse=False,
top=sys.maxsize,
tags=[],
excluded_tags=[],
names=[],
disabled=True,
id_type="username",
@@ -325,19 +334,30 @@ class MaigretDatabase:
"""
Ranking and filtering of the sites list
When ``top`` is limited (not "all sites"), **mirrors** may be appended after
the Alexa-ranked slice. A mirror is any filtered site with a non-empty
``source`` field equal to the name of a site that appears in the first
``top`` positions of a **parent ranking** that includes disabled sites.
Thus mirrors such as third-party viewers (e.g. for Twitter or Instagram)
are still scanned when their parent platform ranks highly, even if the
official site is disabled and omitted from the main list.
Args:
reverse (bool, optional): Reverse the sorting order. Defaults to False.
top (int, optional): Maximum number of sites to return. Defaults to sys.maxsize.
tags (list, optional): List of tags to filter sites by. Defaults to empty list.
tags (list, optional): List of tags to filter sites by (whitelist). Defaults to empty list.
excluded_tags (list, optional): List of tags to exclude sites by (blacklist). Defaults to empty list.
names (list, optional): List of site names (or urls, see MaigretSite.__eq__) to filter by. Defaults to empty list.
disabled (bool, optional): Whether to include disabled sites. Defaults to True.
id_type (str, optional): Type of identifier to filter by. Defaults to "username".
Returns:
dict: Dictionary of filtered and ranked sites, with site names as keys and MaigretSite objects as values
dict: Dictionary of filtered and ranked sites (base top slice plus mirrors),
with site names as keys and MaigretSite objects as values
"""
normalized_names = list(map(str.lower, names))
normalized_tags = list(map(str.lower, tags))
normalized_excluded_tags = list(map(str.lower, excluded_tags))
is_name_ok = lambda x: x.name.lower() in normalized_names
is_source_ok = lambda x: x.source and x.source.lower() in normalized_names
@@ -351,6 +371,22 @@ class MaigretDatabase:
)
is_id_type_ok = lambda x: x.type == id_type
is_excluded_by_tag = lambda x: set(
map(str.lower, x.tags)
).intersection(set(normalized_excluded_tags))
is_excluded_by_engine = lambda x: (
isinstance(x.engine, str)
and x.engine.lower() in normalized_excluded_tags
)
is_excluded_by_protocol = lambda x: (
x.protocol and x.protocol in normalized_excluded_tags
)
is_not_excluded = lambda x: not excluded_tags or not (
is_excluded_by_tag(x)
or is_excluded_by_engine(x)
or is_excluded_by_protocol(x)
)
filter_tags_engines_fun = (
lambda x: not tags
or is_engine_ok(x)
@@ -361,6 +397,7 @@ class MaigretDatabase:
filter_fun = (
lambda x: filter_tags_engines_fun(x)
and is_not_excluded(x)
and filter_names_fun(x)
and is_disabled_needed(x)
and is_id_type_ok(x)
@@ -371,6 +408,33 @@ class MaigretDatabase:
sorted_list = sorted(
filtered_list, key=lambda x: x.alexa_rank, reverse=reverse
)[:top]
# Mirrors: sites whose `source` matches a parent platform that ranks in the
# top `top` by Alexa when disabled entries are included in the ranking pool
# (so e.g. Instagram can be a parent for Picuki even if Instagram is disabled).
if top < sys.maxsize and sorted_list:
filter_fun_ranking_parents = (
lambda x: filter_tags_engines_fun(x)
and is_not_excluded(x)
and filter_names_fun(x)
and is_id_type_ok(x)
)
ranking_pool = [s for s in self.sites if filter_fun_ranking_parents(s)]
sorted_parents = sorted(
ranking_pool, key=lambda x: x.alexa_rank, reverse=reverse
)[:top]
parent_names_lower = {s.name.lower() for s in sorted_parents}
base_names = {s.name for s in sorted_list}
def is_mirror(s) -> bool:
if not s.source or s.name in base_names:
return False
return s.source.lower() in parent_names_lower
mirrors = [s for s in filtered_list if is_mirror(s)]
mirrors.sort(key=lambda x: (x.alexa_rank, x.name))
sorted_list = list(sorted_list) + mirrors
return {site.name: site for site in sorted_list}
@property
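The interaction between `excluded_tags`, the `top` slice, and mirrors is easier to follow on a small model. The sketch below is a simplified stand-in, not the `MaigretDatabase` implementation: the `Site` dataclass, the `ranked_sites` function, and the `include_disabled` parameter are hypothetical, and only tag-based exclusion is modelled (engine and protocol exclusion, name filters, and `id_type` are left out).

```python
import sys
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class Site:
    name: str
    alexa_rank: int
    tags: List[str] = field(default_factory=list)
    source: Optional[str] = None
    disabled: bool = False


def ranked_sites(
    sites: List[Site],
    top: int = sys.maxsize,
    excluded_tags: Iterable[str] = (),
    include_disabled: bool = False,
) -> Dict[str, Site]:
    excluded = {t.lower() for t in excluded_tags}

    def not_excluded(s: Site) -> bool:
        return not excluded.intersection(t.lower() for t in s.tags)

    def enabled_ok(s: Site) -> bool:
        return include_disabled or not s.disabled

    filtered = [s for s in sites if not_excluded(s) and enabled_ok(s)]
    ranked = sorted(filtered, key=lambda s: s.alexa_rank)[:top]

    if top < sys.maxsize and ranked:
        # Parent ranking ignores the disabled flag, so a disabled parent
        # can still pull in its mirrors.
        pool = [s for s in sites if not_excluded(s)]
        parents = {
            s.name.lower() for s in sorted(pool, key=lambda s: s.alexa_rank)[:top]
        }
        base_names = {s.name for s in ranked}
        mirrors = [
            s
            for s in filtered
            if s.source and s.name not in base_names and s.source.lower() in parents
        ]
        mirrors.sort(key=lambda s: (s.alexa_rank, s.name))
        ranked = ranked + mirrors

    return {s.name: s for s in ranked}


if __name__ == "__main__":
    sites = [
        Site("Instagram", 1, ["photo"], disabled=True),
        Site("GitHub", 2, ["coding"]),
        Site("Reddit", 3, ["forum"]),
        Site("AdultSite", 4, ["porn"]),
        Site("Picuki", 900, ["photo"], source="Instagram"),
    ]
    result = ranked_sites(sites, top=2, excluded_tags=["porn"])
    print(list(result))
```

In this model the disabled `Instagram` entry never appears in the result, yet it still occupies a slot in the parent ranking, which is why `Picuki` is appended after the two top-ranked enabled sites.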
@@ -400,9 +464,9 @@ class MaigretDatabase:
"tags": self._tags,
}
json_data = json.dumps(db_data, indent=4)
json_data = json.dumps(db_data, indent=4, ensure_ascii=False)
with open(filename, "w") as f:
with open(filename, "w", encoding="utf-8") as f:
f.write(json_data)
return self
@@ -502,7 +566,7 @@ class MaigretDatabase:
def get_scan_stats(self, sites_dict):
sites = sites_dict or self.sites_dict
found_flags = {}
found_flags: Dict[str, int] = {}
for _, s in sites.items():
if "presense_flag" in s.stats:
flag = s.stats["presense_flag"]
@@ -523,8 +587,8 @@ class MaigretDatabase:
def get_db_stats(self, is_markdown=False):
# Initialize counters
sites_dict = self.sites_dict
urls = {}
tags = {}
urls: Dict[str, int] = {}
tags: Dict[str, int] = {}
disabled_count = 0
message_checks_one_factor = 0
status_checks = 0
+42 -28
@@ -6,8 +6,7 @@ import logging
from typing import Any, Dict, List, Optional, Tuple
from aiohttp import ClientSession, TCPConnector
from aiohttp_socks import ProxyConnector
import cloudscraper
import cloudscraper # type: ignore[import-untyped]
from colorama import Fore, Style
from .activation import import_aiohttp_cookies
@@ -68,8 +67,10 @@ class Submitter:
else:
cookie_jar = import_aiohttp_cookies(args.cookie_file)
connector = ProxyConnector.from_url(proxy) if proxy else TCPConnector(ssl=False)
connector.verify_ssl = False
ssl_context = __import__('ssl').create_default_context()
ssl_context.check_hostname = False
ssl_context.verify_mode = __import__('ssl').CERT_NONE
connector = ProxyConnector.from_url(proxy) if proxy else TCPConnector(ssl=ssl_context)
self.session = ClientSession(
connector=connector, trust_env=True, cookie_jar=cookie_jar
)
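The SSL context built in this hunk can be written with a regular `import ssl`, which makes the ordering constraint easier to see. This is a sketch of the same configuration, not a change to the diff; the helper name `make_insecure_ssl_context` is hypothetical. Note that such a context disables certificate verification entirely.

```python
import ssl


def make_insecure_ssl_context() -> ssl.SSLContext:
    """Build a client context that skips hostname and certificate checks."""
    ctx = ssl.create_default_context()
    # Order matters: check_hostname must be disabled first, otherwise
    # assigning CERT_NONE raises ValueError.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


if __name__ == "__main__":
    ctx = make_insecure_ssl_context()
    print(ctx.check_hostname, ctx.verify_mode)
```

The resulting object can be passed as the `ssl` argument of `TCPConnector`, as the hunk above does.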
@@ -88,7 +89,9 @@ class Submitter:
alexa_rank = 0
try:
alexa_rank = int(root.find('.//REACH').attrib['RANK'])
reach_elem = root.find('.//REACH')
if reach_elem is not None:
alexa_rank = int(reach_elem.attrib['RANK'])
except Exception:
pass
@@ -127,7 +130,7 @@ class Submitter:
async def detect_known_engine(
self, url_exists, url_mainpage, session, follow_redirects, headers
) -> [List[MaigretSite], str]:
) -> Tuple[List[MaigretSite], str]:
session = session or self.session
resp_text, _ = await self.get_html_response_to_compare(
@@ -188,10 +191,12 @@ class Submitter:
)
return entered_username if entered_username else supposed_username
# TODO: replace with checking.py/SimpleAiohttpChecker call
@staticmethod
async def get_html_response_to_compare(
url: str, session: ClientSession = None, redirects=False, headers: Dict = None
url: str, session: Optional[ClientSession] = None, redirects=False, headers: Optional[Dict] = None
):
assert session is not None, "session must not be None"
async with session.get(
url, allow_redirects=redirects, headers=headers
) as response:
@@ -210,10 +215,10 @@ class Submitter:
username: str,
url_exists: str,
cookie_filename="", # TODO: use cookies
session: ClientSession = None,
session: Optional[ClientSession] = None,
follow_redirects=False,
headers: dict = None,
) -> Tuple[List[str], List[str], str, str]:
headers: Optional[dict] = None,
) -> Tuple[Optional[List[str]], Optional[List[str]], str, str]:
random_username = generate_random_username()
url_of_non_existing_account = url_exists.lower().replace(
@@ -268,11 +273,8 @@ class Submitter:
tokens_a = set(re.split(f'[{self.SEPARATORS}]', first_html_response))
tokens_b = set(re.split(f'[{self.SEPARATORS}]', second_html_response))
a_minus_b = tokens_a.difference(tokens_b)
b_minus_a = tokens_b.difference(tokens_a)
a_minus_b = list(map(lambda x: x.strip('\\'), a_minus_b))
b_minus_a = list(map(lambda x: x.strip('\\'), b_minus_a))
a_minus_b: List[str] = [x.strip('\\') for x in tokens_a.difference(tokens_b)]
b_minus_a: List[str] = [x.strip('\\') for x in tokens_b.difference(tokens_a)]
# Filter out strings containing usernames
a_minus_b = [s for s in a_minus_b if username.lower() not in s.lower()]
@@ -377,7 +379,7 @@ class Submitter:
).strip()
if field in ['tags', 'presense_strs', 'absence_strs']:
new_value = list(map(str.strip, new_value.split(',')))
new_value = list(map(str.strip, new_value.split(','))) # type: ignore[assignment]
if new_value:
setattr(site, field, new_value)
@@ -408,8 +410,13 @@ class Submitter:
self.logger.info('Domain is %s', domain_raw)
# check for existence
domain_re = re.compile(
r'://(www\.)?' + re.escape(domain_raw) + r'(/|$)'
)
matched_sites = list(
filter(lambda x: domain_raw in x.url_main + x.url, self.db.sites)
filter(
lambda x: domain_re.search(x.url_main + x.url), self.db.sites
)
)
if matched_sites:
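The move from a substring test to an anchored regular expression changes which existing sites count as duplicates of the submitted domain. The sketch below rebuilds the same pattern in a hypothetical helper, `make_domain_re`, and compares both approaches on a few made-up URLs.

```python
import re


def make_domain_re(domain: str) -> "re.Pattern[str]":
    # Same pattern as in the hunk: scheme separator, optional "www.",
    # the escaped domain, then a slash or the end of the string.
    return re.compile(r'://(www\.)?' + re.escape(domain) + r'(/|$)')


if __name__ == "__main__":
    domain = "example.com"
    domain_re = make_domain_re(domain)
    candidates = [
        "https://example.com/user/{username}",
        "https://www.example.com",
        "https://notexample.com/{username}",
        "https://example.com.evil.org/{username}",
        "https://forum.example.com/{username}",
    ]
    for url in candidates:
        substring_hit = domain in url
        regex_hit = bool(domain_re.search(url))
        print(f"{url}: substring={substring_hit} regex={regex_hit}")
```

One side effect worth noting: subdomains other than `www.` no longer match, so a site hosted on `forum.example.com` would not be reported as an existing entry for `example.com`.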
@@ -418,12 +425,12 @@ class Submitter:
f"{Fore.YELLOW}[!] Sites with domain \"{domain_raw}\" already exists in the Maigret database!{Style.RESET_ALL}"
)
status = lambda s: "(disabled)" if s.disabled else ""
site_status = lambda s: "(disabled)" if s.disabled else ""
url_block = lambda s: f"\n\t{s.url_main}\n\t{s.url}"
print(
"\n".join(
[
f"{site.name} {status(site)}{url_block(site)}"
f"{site.name} {site_status(site)}{url_block(site)}"
for site in matched_sites
]
)
@@ -447,9 +454,14 @@ class Submitter:
old_site = next(
(site for site in matched_sites if site.name == site_name), None
)
print(
f'{Fore.GREEN}[+] We will update site "{old_site.name}" in case of success.{Style.RESET_ALL}'
)
if old_site is None:
print(
f'{Fore.RED}[!] Site "{site_name}" not found in the matched list. Proceeding without updating an existing site.{Style.RESET_ALL}'
)
else:
print(
f'{Fore.GREEN}[+] We will update site "{old_site.name}" in case of success.{Style.RESET_ALL}'
)
# Check if the site check is ordinary or not
if old_site and (old_site.url_probe or old_site.activation):
@@ -486,7 +498,7 @@ class Submitter:
)
print('Detecting site engine, please wait...')
sites = []
sites: List[MaigretSite] = []
text = None
try:
sites, text = await self.detect_known_engine(
@@ -499,7 +511,7 @@ class Submitter:
except KeyboardInterrupt:
print('Engine detect process is interrupted.')
if 'cloudflare' in text.lower():
if text and 'cloudflare' in text.lower():
print(
'Cloudflare protection detected. I will use cloudscraper for further work'
)
@@ -562,6 +574,8 @@ class Submitter:
found = True
break
assert chosen_site is not None, "No sites to check"
if not found:
print(
f"{Fore.RED}[!] The check for site '{chosen_site.name}' failed!{Style.RESET_ALL}"
@@ -620,8 +634,8 @@ class Submitter:
# chosen_site.alexa_rank = rank
self.logger.info(chosen_site.json)
site_data = chosen_site.strip_engine_data()
self.logger.info(site_data.json)
stripped_site = chosen_site.strip_engine_data()
self.logger.info(stripped_site.json)
if old_site:
# Update old site with new values and log changes
@@ -640,7 +654,7 @@ class Submitter:
for field, display_name in fields_to_check.items():
old_value = getattr(old_site, field)
new_value = getattr(site_data, field)
new_value = getattr(stripped_site, field)
if field == 'tags' and not new_tags:
continue
if str(old_value) != str(new_value):
@@ -650,7 +664,7 @@ class Submitter:
old_site.__dict__[field] = new_value
# update the site
final_site = old_site if old_site else site_data
final_site = old_site if old_site else stripped_site
self.db.update_site(final_site)
# save the db in file
+5 -2
@@ -71,7 +71,10 @@ class URLMatcher:
def ascii_data_display(data: str) -> Any:
return ast.literal_eval(data)
try:
return ast.literal_eval(data)
except (ValueError, SyntaxError):
return data
def get_dict_ascii_tree(items, prepend="", new_line=True):
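The new `try`/`except` in `ascii_data_display` means strings that are not valid Python literals are returned unchanged instead of raising. A standalone copy of the function, with a few sample inputs chosen for illustration:

```python
import ast
from typing import Any


def ascii_data_display(data: str) -> Any:
    """Parse a Python literal if possible, otherwise return the input string."""
    try:
        return ast.literal_eval(data)
    except (ValueError, SyntaxError):
        return data


if __name__ == "__main__":
    print(ascii_data_display("['a', 'b']"))     # parsed into a list
    print(ascii_data_display("hello"))          # bare name: ValueError, returned as-is
    print(ascii_data_display("two words"))      # SyntaxError, returned as-is
    print(ascii_data_display("['unterminated")) # SyntaxError, returned as-is
```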
@@ -86,7 +89,7 @@ def get_dict_ascii_tree(items, prepend="", new_line=True):
new_result + new_line if num != len(items) - 1 else last_result + new_line
)
if type(item) == tuple:
if isinstance(item, tuple):
field_name, field_value = item
if field_value.startswith("['"):
is_last_item = num == len(items) - 1
+353
@@ -0,0 +1,353 @@
from flask import (
Flask,
render_template,
request,
send_file,
Response,
flash,
redirect,
url_for,
)
import logging
import os
import asyncio
from datetime import datetime
from threading import Thread
from typing import Any, Dict
import maigret
import maigret.settings
from maigret.sites import MaigretDatabase
from maigret.report import generate_report_context
app = Flask(__name__)
# Use environment variable for secret key, generate random one if not set
app.secret_key = os.getenv('FLASK_SECRET_KEY', os.urandom(24).hex())
# add background job tracking
background_jobs: Dict[str, Any] = {}
job_results = {}
# Configuration
app.config["MAIGRET_DB_FILE"] = os.path.join(os.path.dirname(os.path.dirname(__file__)), 'resources', 'data.json')
app.config["COOKIES_FILE"] = "cookies.txt"
app.config["UPLOAD_FOLDER"] = 'uploads'
app.config["REPORTS_FOLDER"] = os.path.abspath('/tmp/maigret_reports')
def setup_logger(log_level, name):
logger = logging.getLogger(name)
logger.setLevel(log_level)
return logger
async def maigret_search(username, options):
logger = setup_logger(logging.WARNING, 'maigret')
try:
db = MaigretDatabase().load_from_path(app.config["MAIGRET_DB_FILE"])
top_sites = int(options.get('top_sites') or 500)
if options.get('all_sites'):
top_sites = 999999999 # effectively all
tags = options.get('tags', [])
excluded_tags = options.get('excluded_tags', [])
site_list = options.get('site_list', [])
logger.info(f"Filtering sites by tags: {tags}, excluded: {excluded_tags}")
sites = db.ranked_sites_dict(
top=top_sites,
tags=tags,
excluded_tags=excluded_tags,
names=site_list,
disabled=False,
id_type='username',
)
logger.info(f"Found {len(sites)} sites matching the tag criteria")
results = await maigret.search(
username=username,
site_dict=sites,
timeout=int(options.get('timeout', 30)),
logger=logger,
id_type='username',
cookies=app.config["COOKIES_FILE"] if options.get('use_cookies') else None,
is_parsing_enabled=(not options.get('disable_extracting', False)),
recursive_search_enabled=(
not options.get('disable_recursive_search', False)
),
check_domains=options.get('with_domains', False),
proxy=options.get('proxy', None),
tor_proxy=options.get('tor_proxy', None),
i2p_proxy=options.get('i2p_proxy', None),
)
return results
except Exception as e:
logger.error(f"Error during search: {str(e)}")
raise
async def search_multiple_usernames(usernames, options):
results = []
for username in usernames:
try:
search_results = await maigret_search(username.strip(), options)
results.append((username.strip(), 'username', search_results))
except Exception as e:
logging.error(f"Error searching username {username}: {str(e)}")
return results
def process_search_task(usernames, options, timestamp):
try:
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
general_results = loop.run_until_complete(
search_multiple_usernames(usernames, options)
)
os.makedirs(app.config["REPORTS_FOLDER"], exist_ok=True)
session_folder = os.path.join(
app.config["REPORTS_FOLDER"], f"search_{timestamp}"
)
os.makedirs(session_folder, exist_ok=True)
graph_path = os.path.join(session_folder, "combined_graph.html")
maigret.report.save_graph_report(
graph_path,
general_results,
MaigretDatabase().load_from_path(app.config["MAIGRET_DB_FILE"]),
)
individual_reports = []
for username, id_type, results in general_results:
report_base = os.path.join(session_folder, f"report_{username}")
csv_path = f"{report_base}.csv"
json_path = f"{report_base}.json"
pdf_path = f"{report_base}.pdf"
html_path = f"{report_base}.html"
context = generate_report_context(general_results)
maigret.report.save_csv_report(csv_path, username, results)
maigret.report.save_json_report(
json_path, username, results, report_type='ndjson'
)
maigret.report.save_pdf_report(pdf_path, context)
maigret.report.save_html_report(html_path, context)
claimed_profiles = []
for site_name, site_data in results.items():
if (
site_data.get('status')
and site_data['status'].status
== maigret.result.MaigretCheckStatus.CLAIMED
):
claimed_profiles.append(
{
'site_name': site_name,
'url': site_data.get('url_user', ''),
'tags': (
site_data.get('status').tags
if site_data.get('status')
else []
),
}
)
individual_reports.append(
{
'username': username,
'csv_file': os.path.join(
f"search_{timestamp}", f"report_{username}.csv"
),
'json_file': os.path.join(
f"search_{timestamp}", f"report_{username}.json"
),
'pdf_file': os.path.join(
f"search_{timestamp}", f"report_{username}.pdf"
),
'html_file': os.path.join(
f"search_{timestamp}", f"report_{username}.html"
),
'claimed_profiles': claimed_profiles,
}
)
# save results and mark job as complete using timestamp as key
job_results[timestamp] = {
'status': 'completed',
'session_folder': f"search_{timestamp}",
'graph_file': os.path.join(f"search_{timestamp}", "combined_graph.html"),
'usernames': usernames,
'individual_reports': individual_reports,
}
except Exception as e:
logging.error(f"Error in search task for timestamp {timestamp}: {str(e)}")
job_results[timestamp] = {'status': 'failed', 'error': str(e)}
finally:
background_jobs[timestamp]['completed'] = True
@app.route('/')
def index():
# load site data for autocomplete
db = MaigretDatabase().load_from_path(app.config["MAIGRET_DB_FILE"])
site_options = []
for site in db.sites:
# add main site name
site_options.append(site.name)
# add URL if different from name
if site.url_main and site.url_main not in site_options:
site_options.append(site.url_main)
# sort and deduplicate
site_options = sorted(set(site_options))
return render_template('index.html', site_options=site_options)
# Modified search route
@app.route('/search', methods=['POST'])
def search():
usernames_input = request.form.get('usernames', '').strip()
if not usernames_input:
flash('At least one username is required', 'danger')
return redirect(url_for('index'))
usernames = [
u.strip() for u in usernames_input.replace(',', ' ').split() if u.strip()
]
# Create timestamp for this search session
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
# Get selected tags - ensure it's a list
selected_tags = request.form.getlist('tags')
excluded_tags = request.form.getlist('excluded_tags')
logging.info(f"Selected tags: {selected_tags}, Excluded tags: {excluded_tags}")
options = {
'top_sites': request.form.get('top_sites') or '500',
'timeout': request.form.get('timeout') or '30',
'use_cookies': 'use_cookies' in request.form,
'all_sites': 'all_sites' in request.form,
'disable_recursive_search': 'disable_recursive_search' in request.form,
'disable_extracting': 'disable_extracting' in request.form,
'with_domains': 'with_domains' in request.form,
'proxy': request.form.get('proxy', None) or None,
'tor_proxy': request.form.get('tor_proxy', None) or None,
'i2p_proxy': request.form.get('i2p_proxy', None) or None,
'permute': 'permute' in request.form,
'tags': selected_tags, # Pass selected tags as a list
'excluded_tags': excluded_tags, # Pass excluded tags as a list
'site_list': [
s.strip() for s in request.form.get('site', '').split(',') if s.strip()
],
}
logging.info(
f"Starting search for usernames: {usernames} with tags: {selected_tags}, excluded: {excluded_tags}"
)
# Start background job
background_jobs[timestamp] = {
'completed': False,
'thread': Thread(
target=process_search_task, args=(usernames, options, timestamp)
),
}
background_jobs[timestamp]['thread'].start() # type: ignore[union-attr]
return redirect(url_for('status', timestamp=timestamp))
@app.route('/status/<timestamp>')
def status(timestamp):
logging.info(f"Status check for timestamp: {timestamp}")
# Validate timestamp
if timestamp not in background_jobs:
flash('Invalid search session.', 'danger')
logging.error(f"Invalid search session: {timestamp}")
return redirect(url_for('index'))
# Check if job is completed
if background_jobs[timestamp]['completed']:
result = job_results.get(timestamp)
if not result:
flash('No results found for this search session.', 'warning')
logging.error(f"No results found for completed session: {timestamp}")
return redirect(url_for('index'))
if result['status'] == 'completed':
# Note: use the session_folder from the results to redirect
return redirect(url_for('results', session_id=result['session_folder']))
else:
error_msg = result.get('error', 'Unknown error occurred.')
flash(f'Search failed: {error_msg}', 'danger')
logging.error(f"Search failed for session {timestamp}: {error_msg}")
return redirect(url_for('index'))
# If job is still running, show a status page
return render_template('status.html', timestamp=timestamp)
@app.route('/results/<session_id>')
def results(session_id):
# Find completed results that match this session_folder
result_data = next(
(
r
for r in job_results.values()
if r.get('status') == 'completed' and r['session_folder'] == session_id
),
None,
)
if not result_data:
flash('No results found for this session ID.', 'danger')
logging.error(f"Results for session {session_id} not found in job_results.")
return redirect(url_for('index'))
return render_template(
'results.html',
usernames=result_data['usernames'],
graph_file=result_data['graph_file'],
individual_reports=result_data['individual_reports'],
timestamp=session_id.replace('search_', ''),
)
@app.route('/reports/<path:filename>')
def download_report(filename):
try:
os.makedirs(app.config["REPORTS_FOLDER"], exist_ok=True)
file_path = os.path.normpath(
os.path.join(app.config["REPORTS_FOLDER"], filename)
)
if not file_path.startswith(app.config["REPORTS_FOLDER"]):
raise Exception("Invalid file path")
return send_file(file_path)
except Exception as e:
logging.error(f"Error serving file {filename}: {str(e)}")
return "File not found", 404
if __name__ == '__main__':
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
)
debug_mode = os.getenv('FLASK_DEBUG', 'False').lower() in ['true', '1', 't']
# Host configuration: secure by default
# Use 127.0.0.1 for local development, 0.0.0.0 only if explicitly set
host = os.getenv('FLASK_HOST', '127.0.0.1')
port = int(os.getenv('FLASK_PORT', '5000'))
app.run(host=host, port=port, debug=debug_mode)
Binary file not shown (size: 45 KiB).
+118
@@ -0,0 +1,118 @@
<!DOCTYPE html>
<html lang="en" data-bs-theme="dark">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Maigret Web Interface</title>
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css" rel="stylesheet">
<style>
body {
min-height: 100vh;
display: flex;
flex-direction: column;
}
.main-container {
flex: 1;
padding-top: 2rem;
}
.form-container {
max-width: none;
margin: auto;
padding-bottom: 2rem;
}
[data-bs-theme="dark"] {
--bs-body-bg: #212529;
--bs-body-color: #dee2e6;
}
.header {
padding: 1rem 0;
margin-bottom: 2rem;
border-bottom: 1px solid var(--bs-border-color);
}
.header-content {
display: flex;
align-items: center;
justify-content: space-between;
}
.logo-container {
display: flex;
align-items: center;
gap: 1rem;
}
.logo {
height: 40px;
width: auto;
}
.footer {
margin-top: auto;
padding: 1rem 0;
text-align: center;
border-top: 1px solid var(--bs-border-color);
font-size: 0.9rem;
}
.footer a {
color: inherit;
text-decoration: none;
}
.footer a:hover {
text-decoration: underline;
}
</style>
</head>
<body>
<div class="header">
<div class="container">
<div class="header-content">
<div class="logo-container">
<img src="{{ url_for('static', filename='maigret.png') }}" alt="Maigret Logo" class="logo">
<h1 class="h4 mb-0">Maigret Web Interface</h1>
</div>
<button class="btn btn-outline-secondary" id="theme-toggle">
Toggle Dark/Light Mode
</button>
</div>
</div>
</div>
<div class="main-container">
<div class="container">
{% block content %}{% endblock %}
</div>
</div>
<footer class="footer">
<div class="container">
<p class="mb-0">
Powered by <a href="https://github.com/soxoj/maigret" target="_blank">Maigret</a> |
Licensed under <a href="https://github.com/soxoj/maigret/blob/main/LICENSE" target="_blank">MIT
License</a>
</p>
</div>
</footer>
<script src="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/js/bootstrap.bundle.min.js"></script>
<script>
document.getElementById('theme-toggle').addEventListener('click', function () {
const html = document.documentElement;
if (html.getAttribute('data-bs-theme') === 'dark') {
html.setAttribute('data-bs-theme', 'light');
} else {
html.setAttribute('data-bs-theme', 'dark');
}
});
</script>
</body>
</html>
+520
@@ -0,0 +1,520 @@
{% extends "base.html" %}
{% block content %}
<style>
.tag-cloud {
display: flex;
flex-wrap: wrap;
gap: 8px;
padding: 15px;
border-radius: 8px;
background: rgba(0, 0, 0, 0.05);
margin-bottom: 20px;
}
.tag {
display: inline-block;
padding: 5px 10px;
border-radius: 15px;
background-color: #dc3545;
color: white;
cursor: pointer;
font-size: 14px;
transition: all 0.3s ease;
user-select: none;
}
.tag.selected {
background-color: #28a745;
}
.tag.excluded {
background-color: #343a40;
text-decoration: line-through;
}
.tag:hover {
transform: translateY(-2px);
box-shadow: 0 2px 5px rgba(0, 0, 0, 0.2);
}
.hidden-select {
display: none !important;
}
.site-input-container {
position: relative;
}
.site-input {
width: 100%;
}
.selected-sites {
display: flex;
flex-wrap: wrap;
gap: 8px;
padding: 10px 0;
}
.selected-site {
background-color: #214e7b;
padding: 2px 8px;
border-radius: 12px;
font-size: 14px;
display: inline-flex;
align-items: center;
gap: 5px;
}
.remove-site {
cursor: pointer;
color: #dc3545;
font-weight: bold;
}
.section-header {
cursor: pointer;
padding: 1rem;
background: rgba(255, 255, 255, 0.05);
border-radius: 4px;
margin-bottom: 0.5rem;
display: flex;
justify-content: space-between;
align-items: center;
}
.section-content {
padding: 1rem;
display: none;
}
.section-content.show {
display: block;
}
.chevron::after {
content: '▼';
transition: transform 0.2s;
}
.chevron.collapsed::after {
transform: rotate(-90deg);
}
.main-search-section {
background: rgba(255, 255, 255, 0.03);
padding: 2rem;
border-radius: 8px;
margin-bottom: 2rem;
}
.search-button {
width: 100%;
padding: 1rem;
font-size: 1.2rem;
margin-top: 2rem;
}
</style>
<div class="form-container">
{% if error %}
<div class="alert alert-danger">{{ error }}</div>
{% endif %}
<form method="POST" action="{{ url_for('search') }}" class="mb-4">
<!-- Main Search Section -->
<div class="main-search-section">
<div class="mb-4">
<label for="usernames" class="form-label h5">Usernames to Search</label>
<textarea class="form-control" id="usernames" name="usernames" rows="3" required
placeholder="Enter one or more usernames (separated by spaces or commas)..."></textarea>
</div>
<div class="row align-items-center">
<div class="col-md-6">
<label for="top_sites" class="form-label">Number of Sites</label>
<input type="number" class="form-control" id="top_sites" name="top_sites" min="1" max="10000"
placeholder="Default: 500">
</div>
<div class="col-md-6">
<label for="timeout" class="form-label">Timeout (seconds)</label>
<input type="number" class="form-control" id="timeout" name="timeout" min="1"
placeholder="Default: 30">
</div>
<div class="col-12 mt-3">
<div class="form-check">
<input type="checkbox" class="form-check-input" id="all_sites" name="all_sites"
onchange="document.getElementById('top_sites').disabled = this.checked;">
<label class="form-check-label" for="all_sites">Search All Sites</label>
</div>
</div>
</div>
</div>
<!-- Filters Section -->
<div class="mb-4">
<div class="section-header" onclick="toggleSection('filters')">
<h5 class="mb-0">Filters</h5>
<span class="chevron"></span>
</div>
<div id="filters" class="section-content">
<div class="mb-3 site-input-container">
<label for="siteInput" class="form-label">Specify Sites (Optional)</label>
<input type="text" class="form-control site-input" id="siteInput"
placeholder="Type to search for sites..." list="siteOptions">
<input type="hidden" id="site" name="site">
<datalist id="siteOptions">
{% for site in site_options %}
<option value="{{ site }}">
{% endfor %}
</datalist>
<div class="selected-sites" id="selectedSites"></div>
</div>
<div class="mb-3">
<label class="form-label">Tags (click to cycle: include → exclude → neutral)</label>
<div class="mb-2">
<small class="text-muted">
<span style="display:inline-block;width:12px;height:12px;background:#28a745;border-radius:50%;"></span> Included (whitelist)
&nbsp;&nbsp;
<span style="display:inline-block;width:12px;height:12px;background:#343a40;border-radius:50%;"></span> Excluded (blacklist)
&nbsp;&nbsp;
<span style="display:inline-block;width:12px;height:12px;background:#dc3545;border-radius:50%;"></span> Neutral
</small>
</div>
<div class="tag-cloud" id="tagCloud"></div>
<select multiple class="hidden-select" id="tags" name="tags">
<option value="gaming">Gaming</option>
<option value="coding">Coding</option>
<option value="photo">Photo</option>
<option value="music">Music</option>
<option value="blog">Blog</option>
<option value="finance">Finance</option>
<option value="freelance">Freelance</option>
<option value="dating">Dating</option>
<option value="tech">Tech</option>
<option value="forum">Forum</option>
<option value="porn">Porn</option>
<option value="erotic">Erotic</option>
<option value="webcam">Webcam</option>
<option value="video">Video</option>
<option value="movies">Movies</option>
<option value="hacking">Hacking</option>
<option value="art">Art</option>
<option value="discussion">Discussion</option>
<option value="sharing">Sharing</option>
<option value="writing">Writing</option>
<option value="wiki">Wiki</option>
<option value="business">Business</option>
<option value="shopping">Shopping</option>
<option value="sport">Sport</option>
<option value="books">Books</option>
<option value="news">News</option>
<option value="documents">Documents</option>
<option value="travel">Travel</option>
<option value="maps">Maps</option>
<option value="hobby">Hobby</option>
<option value="apps">Apps</option>
<option value="classified">Classified</option>
<option value="career">Career</option>
<option value="geosocial">Geosocial</option>
<option value="streaming">Streaming</option>
<option value="education">Education</option>
<option value="networking">Networking</option>
<option value="torrent">Torrent</option>
<option value="science">Science</option>
<option value="medicine">Medicine</option>
<option value="reading">Reading</option>
<option value="stock">Stock</option>
<option value="messaging">Messaging</option>
<option value="trading">Trading</option>
<option value="links">Links</option>
<option value="fashion">Fashion</option>
<option value="tasks">Tasks</option>
<option value="military">Military</option>
<option value="auto">Auto</option>
<option value="gambling">Gambling</option>
<option value="cybercriminal">Cybercriminal</option>
<option value="review">Review</option>
<option value="bookmarks">Bookmarks</option>
<option value="design">Design</option>
<option value="tor">Tor</option>
<option value="i2p">I2P</option>
<option value="q&amp;a">Q&amp;A</option>
<option value="crypto">Crypto</option>
<option value="ai">AI</option>
<!-- Country tags -->
<option value="ae" data-group="country">AE - United Arab Emirates</option>
<option value="ao" data-group="country">AO - Angola</option>
<option value="ar" data-group="country">AR - Argentina</option>
<option value="at" data-group="country">AT - Austria</option>
<option value="au" data-group="country">AU - Australia</option>
<option value="az" data-group="country">AZ - Azerbaijan</option>
<option value="bd" data-group="country">BD - Bangladesh</option>
<option value="be" data-group="country">BE - Belgium</option>
<option value="bg" data-group="country">BG - Bulgaria</option>
<option value="br" data-group="country">BR - Brazil</option>
<option value="by" data-group="country">BY - Belarus</option>
<option value="ca" data-group="country">CA - Canada</option>
<option value="ch" data-group="country">CH - Switzerland</option>
<option value="cl" data-group="country">CL - Chile</option>
<option value="cn" data-group="country">CN - China</option>
<option value="co" data-group="country">CO - Colombia</option>
<option value="cr" data-group="country">CR - Costa Rica</option>
<option value="cz" data-group="country">CZ - Czechia</option>
<option value="de" data-group="country">DE - Germany</option>
<option value="dk" data-group="country">DK - Denmark</option>
<option value="dz" data-group="country">DZ - Algeria</option>
<option value="ee" data-group="country">EE - Estonia</option>
<option value="eg" data-group="country">EG - Egypt</option>
<option value="es" data-group="country">ES - Spain</option>
<option value="eu" data-group="country">EU - European Union</option>
<option value="fi" data-group="country">FI - Finland</option>
<option value="fr" data-group="country">FR - France</option>
<option value="gb" data-group="country">GB - United Kingdom</option>
<option value="global" data-group="country">🌍 Global</option>
<option value="gr" data-group="country">GR - Greece</option>
<option value="hk" data-group="country">HK - Hong Kong</option>
<option value="hr" data-group="country">HR - Croatia</option>
<option value="hu" data-group="country">HU - Hungary</option>
<option value="id" data-group="country">ID - Indonesia</option>
<option value="ie" data-group="country">IE - Ireland</option>
<option value="il" data-group="country">IL - Israel</option>
<option value="in" data-group="country">IN - India</option>
<option value="ir" data-group="country">IR - Iran</option>
<option value="it" data-group="country">IT - Italy</option>
<option value="jp" data-group="country">JP - Japan</option>
<option value="kg" data-group="country">KG - Kyrgyzstan</option>
<option value="kr" data-group="country">KR - Korea</option>
<option value="kz" data-group="country">KZ - Kazakhstan</option>
<option value="la" data-group="country">LA - Laos</option>
<option value="lk" data-group="country">LK - Sri Lanka</option>
<option value="lt" data-group="country">LT - Lithuania</option>
<option value="ma" data-group="country">MA - Morocco</option>
<option value="md" data-group="country">MD - Moldova</option>
<option value="mg" data-group="country">MG - Madagascar</option>
<option value="mk" data-group="country">MK - North Macedonia</option>
<option value="mx" data-group="country">MX - Mexico</option>
<option value="ng" data-group="country">NG - Nigeria</option>
<option value="nl" data-group="country">NL - Netherlands</option>
<option value="no" data-group="country">NO - Norway</option>
<option value="ph" data-group="country">PH - Philippines</option>
<option value="pk" data-group="country">PK - Pakistan</option>
<option value="pl" data-group="country">PL - Poland</option>
<option value="pt" data-group="country">PT - Portugal</option>
<option value="re" data-group="country">RE - Réunion</option>
<option value="ro" data-group="country">RO - Romania</option>
<option value="rs" data-group="country">RS - Serbia</option>
<option value="ru" data-group="country">RU - Russia</option>
<option value="sa" data-group="country">SA - Saudi Arabia</option>
<option value="sd" data-group="country">SD - Sudan</option>
<option value="se" data-group="country">SE - Sweden</option>
<option value="sg" data-group="country">SG - Singapore</option>
<option value="sk" data-group="country">SK - Slovakia</option>
<option value="sv" data-group="country">SV - El Salvador</option>
<option value="th" data-group="country">TH - Thailand</option>
<option value="tn" data-group="country">TN - Tunisia</option>
<option value="tr" data-group="country">TR - Türkiye</option>
<option value="tw" data-group="country">TW - Taiwan</option>
<option value="ua" data-group="country">UA - Ukraine</option>
<option value="uk" data-group="country">UK - United Kingdom</option>
<option value="us" data-group="country">US - United States</option>
<option value="uz" data-group="country">UZ - Uzbekistan</option>
<option value="ve" data-group="country">VE - Venezuela</option>
<option value="vi" data-group="country">VI - Virgin Islands</option>
<option value="vn" data-group="country">VN - Viet Nam</option>
<option value="za" data-group="country">ZA - South Africa</option>
</select>
<select multiple class="hidden-select" id="excludedTags" name="excluded_tags">
</select>
</div>
</div>
</div>
<!-- Advanced Options Section -->
<div class="mb-4">
<div class="section-header" onclick="toggleSection('advanced')">
<h5 class="mb-0">Advanced Options</h5>
<span class="chevron"></span>
</div>
<div id="advanced" class="section-content">
<div class="mb-3 form-check">
<input type="checkbox" class="form-check-input" id="permute" name="permute">
<label class="form-check-label" for="permute">Enable Username Permutations</label>
</div>
<div class="mb-3 form-check">
<input type="checkbox" class="form-check-input" id="disable_recursive_search"
name="disable_recursive_search">
<label class="form-check-label" for="disable_recursive_search">Disable Recursive Search</label>
</div>
<div class="mb-3 form-check">
<input type="checkbox" class="form-check-input" id="disable_extracting" name="disable_extracting">
<label class="form-check-label" for="disable_extracting">Disable Information Extraction</label>
</div>
<div class="mb-3 form-check">
<input type="checkbox" class="form-check-input" id="with_domains" name="with_domains">
<label class="form-check-label" for="with_domains">Check Domains</label>
</div>
<div class="mb-3">
<label for="proxy" class="form-label">Proxy URL</label>
<input type="text" class="form-control" id="proxy" name="proxy"
placeholder="e.g., 127.0.0.1:1080">
</div>
<div class="mb-3">
<label for="tor_proxy" class="form-label">TOR Proxy URL</label>
<input type="text" class="form-control" id="tor_proxy" name="tor_proxy"
placeholder="Default: 127.0.0.1:9050">
</div>
<div class="mb-3">
<label for="i2p_proxy" class="form-label">I2P Proxy URL</label>
<input type="text" class="form-control" id="i2p_proxy" name="i2p_proxy"
placeholder="Default: 127.0.0.1:4444">
</div>
</div>
</div>
<button type="submit" class="btn search-button" style="background-color: rgb(249, 207, 0); color: black;">
Start Search
</button>
</form>
</div>
<script>
function toggleSection(sectionId) {
const content = document.getElementById(sectionId);
const header = content.previousElementSibling;
content.classList.toggle('show');
header.querySelector('.chevron').classList.toggle('collapsed');
}
document.addEventListener('DOMContentLoaded', function () {
// Tag cloud functionality with include/exclude (whitelist/blacklist) support
const tagCloud = document.getElementById('tagCloud');
const hiddenSelect = document.getElementById('tags');
const excludedSelect = document.getElementById('excludedTags');
const allTags = Array.from(hiddenSelect.options).map(opt => ({
value: opt.value,
label: opt.text,
group: opt.dataset.group || 'category'
}));
function updateTagSelects() {
// Clear and repopulate hidden selects based on tag states
Array.from(hiddenSelect.options).forEach(opt => opt.selected = false);
// Clear excluded select
excludedSelect.innerHTML = '';
document.querySelectorAll('#tagCloud .tag').forEach(tagEl => {
const val = tagEl.dataset.value;
if (tagEl.classList.contains('selected')) {
const option = Array.from(hiddenSelect.options).find(opt => opt.value === val);
if (option) option.selected = true;
} else if (tagEl.classList.contains('excluded')) {
const opt = document.createElement('option');
opt.value = val;
opt.selected = true;
excludedSelect.appendChild(opt);
}
});
}
let lastGroup = '';
allTags.forEach(tag => {
if (tag.group !== lastGroup && tag.group === 'country') {
const separator = document.createElement('div');
separator.style.cssText = 'width:100%;margin:8px 0 4px;padding:4px 0;border-top:1px solid rgba(0,0,0,0.15);font-size:13px;color:#666;';
separator.textContent = 'Countries';
tagCloud.appendChild(separator);
}
lastGroup = tag.group;
const tagElement = document.createElement('span');
tagElement.className = 'tag';
tagElement.textContent = tag.label;
tagElement.dataset.value = tag.value;
// Single click cycles: neutral -> included -> excluded -> neutral
tagElement.addEventListener('click', function (e) {
e.preventDefault();
if (this.classList.contains('selected')) {
// included -> excluded
this.classList.remove('selected');
this.classList.add('excluded');
} else if (this.classList.contains('excluded')) {
// excluded -> neutral
this.classList.remove('excluded');
} else {
// neutral -> included
this.classList.add('selected');
}
updateTagSelects();
});
tagCloud.appendChild(tagElement);
});
// Site selection functionality
const siteInput = document.getElementById('siteInput');
const hiddenInput = document.getElementById('site');
const selectedSitesContainer = document.getElementById('selectedSites');
let selectedSites = new Set();
function updateHiddenInput() {
hiddenInput.value = Array.from(selectedSites).join(',');
}
function addSite(site) {
if (site && !selectedSites.has(site)) {
selectedSites.add(site);
updateHiddenInput();
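const siteElement = document.createElement('span');
siteElement.className = 'selected-site';
// Build the nodes with textContent so a typed or pasted site name is never parsed as HTML
siteElement.textContent = site;
const removeButton = document.createElement('span');
removeButton.className = 'remove-site';
removeButton.dataset.site = site;
removeButton.textContent = '×';
siteElement.appendChild(removeButton);
selectedSitesContainer.appendChild(siteElement);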
}
}
function removeSite(site) {
selectedSites.delete(site);
updateHiddenInput();
const siteElements = selectedSitesContainer.querySelectorAll('.selected-site');
siteElements.forEach(el => {
if (el.querySelector('.remove-site').dataset.site === site) {
el.remove();
}
});
}
siteInput.addEventListener('change', function (e) {
const value = this.value.trim();
if (value) {
addSite(value);
this.value = '';
}
});
selectedSitesContainer.addEventListener('click', function (e) {
if (e.target.classList.contains('remove-site')) {
removeSite(e.target.dataset.site);
}
});
siteInput.addEventListener('paste', function (e) {
e.preventDefault();
const paste = (e.clipboardData || window.clipboardData).getData('text');
const sites = paste.split(',').map(site => site.trim()).filter(site => site);
sites.forEach(addSite);
});
const form = document.querySelector('form');
form.addEventListener('submit', function (e) {
const selectedTags = Array.from(tagCloud.querySelectorAll('.tag.selected'));
Array.from(hiddenSelect.options).forEach(opt => {
opt.selected = selectedTags.some(tag => tag.dataset.value === opt.value);
});
updateHiddenInput();
});
});
</script>
{% endblock %}
+156
@@ -0,0 +1,156 @@
{% extends "base.html" %}
{% block content %}
<style>
.tag-badge {
background-color: #214e7b;
padding: 2px 8px;
border-radius: 12px;
font-size: 14px;
display: inline-flex;
align-items: center;
gap: 5px;
margin: 2px;
color: white;
}
.profile-list {
list-style: none;
padding: 0;
}
.profile-item {
margin-bottom: 10px;
padding: 10px;
display: flex;
justify-content: space-between;
align-items: center;
border-bottom: 1px solid rgba(255, 255, 255, 0.1);
}
.profile-link {
display: flex;
align-items: center;
gap: 8px;
}
.favicon {
width: 16px;
height: 16px;
}
.tag-container {
display: flex;
flex-wrap: wrap;
gap: 5px;
justify-content: flex-end;
}
.report-container {
margin-bottom: 1rem;
}
.report-header {
cursor: pointer;
padding: 1rem;
background: rgba(255, 255, 255, 0.05);
border-radius: 4px;
margin-bottom: 0.5rem;
}
.report-content {
display: none;
}
.report-content.show {
display: block;
}
.chevron::after {
content: '▼';
margin-left: 8px;
transition: transform 0.2s;
}
.chevron.collapsed::after {
transform: rotate(-90deg);
}
</style>
<div class="form-container">
<h1 class="mb-4">Search Results</h1>
<!-- Flash messages -->
{% with messages = get_flashed_messages() %}
{% if messages %}
{% for message in messages %}
<div class="alert alert-info">{{ message }}</div>
{% endfor %}
{% endif %}
{% endwith %}
<p>The search has completed. <a href="{{ url_for('index')}}">Back to start.</a></p>
{% if graph_file %}
<h3>Combined Graph</h3>
<iframe src="{{ url_for('download_report', filename=graph_file) }}" style="width:100%; height:600px; border:none;"></iframe>
{% endif %}
<hr>
{% if individual_reports %}
<h3>Individual Reports</h3>
<div class="reports-list">
{% for report in individual_reports %}
<div class="report-container">
<div class="report-header" onclick="toggleReport(this)" data-target="report-{{ loop.index }}">
<h5 class="mb-0 d-flex align-items-center">
<span>{{ report.username }}</span>
<span class="chevron"></span>
</h5>
</div>
<div id="report-{{ loop.index }}" class="report-content">
<p>
<a href="{{ url_for('download_report', filename=report.csv_file) }}">CSV Report</a> |
<a href="{{ url_for('download_report', filename=report.json_file) }}">JSON Report</a> |
<a href="{{ url_for('download_report', filename=report.pdf_file) }}">PDF Report</a> |
<a href="{{ url_for('download_report', filename=report.html_file) }}">HTML Report</a>
</p>
{% if report.claimed_profiles %}
<strong>Claimed Profiles:</strong>
<ul class="profile-list">
{% for profile in report.claimed_profiles %}
<li class="profile-item">
<div class="profile-link">
<img class="favicon" src="https://www.google.com/s2/favicons?domain={{ profile.url }}" onerror="this.style.display='none'" alt="">
<a href="{{ profile.url }}" target="_blank">{{ profile.site_name }}</a>
</div>
{% if profile.tags %}
<div class="tag-container">
{% for tag in profile.tags %}
<span class="tag-badge">{{ tag }}</span>
{% endfor %}
</div>
{% endif %}
</li>
{% endfor %}
</ul>
{% else %}
<p>No claimed profiles found.</p>
{% endif %}
</div>
</div>
{% endfor %}
</div>
{% else %}
<p>No individual reports available.</p>
{% endif %}
</div>
<script>
function toggleReport(header) {
const reportId = header.getAttribute('data-target');
const content = document.getElementById(reportId);
content.classList.toggle('show');
header.querySelector('.chevron').classList.toggle('collapsed');
}
</script>
{% endblock %}
+16
@@ -0,0 +1,16 @@
{% extends "base.html" %}
{% block content %}
<div class="container mt-4 text-center">
<h2>Search in progress...</h2>
<p>Your request is being processed in the background. This page will automatically redirect once the results are ready.</p>
<div class="spinner-border text-primary" role="status">
<span class="visually-hidden">Loading...</span>
</div>
<script>
// Auto-refresh the page every 5 seconds to check completion
setTimeout(function() {
window.location.reload();
}, 5000);
</script>
</div>
{% endblock %}
Generated
+1991 -1149
File diff suppressed because it is too large
+2 -2
@@ -1,5 +1,5 @@
maigret @ https://github.com/soxoj/maigret/archive/refs/heads/main.zip
pefile==2023.2.7 # do not bump while pyinstaller is 6.11.1, there is a conflict
psutil==6.1.0
pyinstaller==6.11.1
psutil==7.2.2
pyinstaller==6.19.0
pywin32-ctypes==0.2.3
+29 -25
@@ -4,7 +4,7 @@ build-backend = "poetry.core.masonry.api"
[tool.poetry]
name = "maigret"
version = "0.5.0a1"
version = "0.5.0"
description = "🕵️‍♂️ Collect a dossier on a person by username from thousands of sites."
authors = ["Soxoj <soxoj@protonmail.com>"]
readme = "README.md"
@@ -31,64 +31,68 @@ classifiers = [
# Install with dev dependencies:
# poetry install --with dev
python = "^3.10"
aiodns = "^3.0.0"
aiohttp = "^3.11.10"
aiohttp-socks = "^0.9.1"
aiodns = ">=3,<5"
aiohttp = "^3.12.14"
aiohttp-socks = ">=0.10.1,<0.12.0"
arabic-reshaper = "^3.0.0"
async-timeout = "^5.0.1"
attrs = "^24.2.0"
certifi = "^2024.8.30"
chardet = "^5.0.0"
attrs = ">=25.3,<27.0"
certifi = ">=2025.6.15,<2027.0.0"
chardet = ">=5,<8"
colorama = "^0.4.6"
future = "^1.0.0"
future-annotations= "^1.0.0"
html5lib = "^1.1"
idna = "^3.4"
Jinja2 = "^3.1.3"
lxml = "^5.3.0"
Jinja2 = "^3.1.6"
lxml = ">=6.0.2,<7.0"
MarkupSafe = "^3.0.2"
mock = "^5.1.0"
multidict = "^6.0.4"
pycountry = "^24.6.1"
multidict = "^6.6.3"
pycountry = ">=24.6.1,<27.0.0"
PyPDF2 = "^3.0.1"
PySocks = "^1.7.1"
python-bidi = "^0.6.3"
requests = "^2.31.0"
requests = "^2.32.4"
requests-futures = "^1.0.2"
requests-toolbelt = "^1.0.0"
six = "^1.17.0"
socid-extractor = "^0.0.27"
soupsieve = "^2.6"
stem = "^1.8.1"
torrequest = "^0.1.0"
alive_progress = "^3.2.0"
typing-extensions = "^4.8.0"
typing-extensions = "^4.14.1"
webencodings = "^0.5.1"
xhtml2pdf = "^0.2.11"
XMind = "^1.2.0"
yarl = "^1.18.3"
yarl = "^1.20.1"
networkx = "^2.6.3"
pyvis = "^0.3.2"
reportlab = "^4.2.0"
reportlab = "^4.4.3"
cloudscraper = "^1.2.71"
platformdirs = "^4.3.6"
flask = {extras = ["async"], version = "^3.1.1"}
asgiref = "^3.9.1"
platformdirs = "^4.3.8"
curl-cffi = ">=0.14,<1.0"
[tool.poetry.group.dev.dependencies]
# How to add a new dev dependency: poetry add black --group dev
# Install dev dependencies with: poetry install --with dev
flake8 = "^7.1.1"
pytest = "^8.3.4"
pytest-asyncio = "^0.25.0"
pytest-cov = "^6.0.0"
pytest = ">=8.3.4,<10.0.0"
pytest-asyncio = "^1.0.0"
pytest-cov = ">=6,<8"
pytest-httpserver = "^1.0.0"
pytest-rerunfailures = "^15.0"
reportlab = "^4.2.0"
mypy = "^1.13.0"
pytest-rerunfailures = ">=15.1,<17.0"
reportlab = "^4.4.3"
mypy = "^1.14.1"
tuna = "^0.5.11"
coverage = "^7.6.9"
black = "^24.10.0"
coverage = "^7.9.2"
black = ">=25.1,<27.0"
[tool.poetry.scripts]
# Run with: poetry run maigret <username>
maigret = "maigret.maigret:run"
update_sitesmd = "utils.update_site_data:main"
update_sitesmd = "utils.update_site_data:main"
+2597 -2579
File diff suppressed because it is too large
+1 -1
@@ -7,7 +7,7 @@ description: |
More than 3000 sites are currently supported; by default, the search runs against the 500 most popular sites in descending order of popularity. Checking of Tor sites, I2P sites, and domains (via DNS resolving) is also supported.
version: 0.5.0a1
version: 0.5.0
license: MIT
base: core22
confinement: strict
File diff suppressed because one or more lines are too long


Binary file not shown.


Binary file not shown.


+46 -4
@@ -5,11 +5,13 @@ from typing import Dict, Any
DEFAULT_ARGS: Dict[str, Any] = {
'all_sites': False,
'auto_disable': False,
'connections': 100,
'cookie_file': None,
'csv': False,
'db_file': 'resources/data.json',
'debug': False,
'diagnose': False,
'disable_extracting': False,
'disable_recursive_search': False,
'folderoutput': 'reports',
@@ -34,6 +36,7 @@ DEFAULT_ARGS: Dict[str, Any] = {
'site_list': [],
'stats': False,
'tags': '',
'exclude_tags': '',
'timeout': 30,
'tor_proxy': 'socks5://127.0.0.1:9050',
'i2p_proxy': 'http://127.0.0.1:4444',
@@ -42,8 +45,12 @@ DEFAULT_ARGS: Dict[str, Any] = {
'use_disabled_sites': False,
'username': [],
'verbose': False,
'web': None,
'with_domains': False,
'xmind': False,
'md': False,
'no_autoupdate': False,
'force_update': False,
}
@@ -55,7 +62,8 @@ def test_args_search_mode(argparser):
    want_args = dict(DEFAULT_ARGS)
    want_args.update({'username': ['username']})
    assert args == Namespace(**want_args)
    for arg in vars(args):
        assert getattr(args, arg) == want_args[arg]
def test_args_search_mode_several_usernames(argparser):
@@ -66,7 +74,8 @@ def test_args_search_mode_several_usernames(argparser):
    want_args = dict(DEFAULT_ARGS)
    want_args.update({'username': ['username1', 'username2']})
    assert args == Namespace(**want_args)
    for arg in vars(args):
        assert getattr(args, arg) == want_args[arg]
def test_args_self_check_mode(argparser):
@@ -81,7 +90,8 @@ def test_args_self_check_mode(argparser):
        }
    )
    assert args == Namespace(**want_args)
    for arg in vars(args):
        assert getattr(args, arg) == want_args[arg]
def test_args_multiple_sites(argparser):
@@ -97,4 +107,36 @@ def test_args_multiple_sites(argparser):
        }
    )
    assert args == Namespace(**want_args)
    for arg in vars(args):
        assert getattr(args, arg) == want_args[arg]


def test_args_exclude_tags(argparser):
    args = argparser.parse_args('--exclude-tags porn,dating username'.split())
    want_args = dict(DEFAULT_ARGS)
    want_args.update(
        {
            'exclude_tags': 'porn,dating',
            'username': ['username'],
        }
    )
    for arg in vars(args):
        assert getattr(args, arg) == want_args[arg]


def test_args_tags_with_exclude_tags(argparser):
    args = argparser.parse_args('--tags coding --exclude-tags porn username'.split())
    want_args = dict(DEFAULT_ARGS)
    want_args.update(
        {
            'tags': 'coding',
            'exclude_tags': 'porn',
            'username': ['username'],
        }
    )
    for arg in vars(args):
        assert getattr(args, arg) == want_args[arg]
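The tests above compare parsed arguments key by key rather than against a whole `Namespace`. A minimal, self-contained sketch of that pattern follows. The parser here is a hypothetical stand-in built with the standard `argparse` module, not maigret's real parser; only the flag names `--tags` and `--exclude-tags` are taken from the tests, and `build_parser` and `diff_args` are illustrative names.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in parser (NOT maigret's real one); it only
    # mirrors the two tag flags and the positional usernames from the tests.
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("username", nargs="+")
    parser.add_argument("--tags", default="")
    parser.add_argument("--exclude-tags", dest="exclude_tags", default="")
    return parser


def diff_args(args: argparse.Namespace, want: dict) -> dict:
    """Return {name: (got, wanted)} for every parsed attribute that differs."""
    return {
        name: (value, want.get(name))
        for name, value in vars(args).items()
        if value != want.get(name)
    }


args = build_parser().parse_args(
    "--tags coding --exclude-tags porn username".split()
)
want = {"tags": "coding", "exclude_tags": "porn", "username": ["username"]}
mismatches = diff_args(args, want)
```

Collecting mismatches per attribute, instead of asserting equality of two `Namespace` objects, makes a failing run name the exact option that diverged.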
+83
@@ -4,6 +4,30 @@ import pytest
from maigret.utils import is_country_tag
TOP_SITES_ALEXA_RANK_LIMIT = 50
KNOWN_SOCIAL_DOMAINS = [
    "facebook.com",
    "instagram.com",
    "twitter.com",
    "tiktok.com",
    "vk.com",
    "reddit.com",
    "pinterest.com",
    "snapchat.com",
    "linkedin.com",
    "tumblr.com",
    "threads.net",
    "bsky.app",
    "myspace.com",
    "weibo.com",
    "mastodon.social",
    "gab.com",
    "minds.com",
    "clubhouse.com",
]


@pytest.mark.slow
def test_tags_validity(default_db):
    unknown_tags = set()
@@ -19,3 +43,62 @@ def test_tags_validity(default_db):
    # if you see an "unchecked" tag error, please run
    # maigret --db `pwd`/maigret/resources/data.json --self-check --tag unchecked --use-disabled-sites
    assert unknown_tags == set()


@pytest.mark.slow
def test_top_sites_have_category_tag(default_db):
    """Top sites by alexaRank must have at least one category tag (not just country codes)."""
    sites_ranked = sorted(
        [s for s in default_db.sites if s.alexa_rank],
        key=lambda s: s.alexa_rank,
    )[:TOP_SITES_ALEXA_RANK_LIMIT]
    missing_category = []
    for site in sites_ranked:
        category_tags = [t for t in site.tags if not is_country_tag(t)]
        if not category_tags:
            missing_category.append(f"{site.name} (rank {site.alexa_rank})")
    assert missing_category == [], (
        f"{len(missing_category)} top-{TOP_SITES_ALEXA_RANK_LIMIT} sites have no category tag: "
        + ", ".join(missing_category[:20])
    )


@pytest.mark.slow
def test_no_unused_tags_in_registry(default_db):
    """Every tag in the registry should be used by at least one site."""
    all_used_tags = set()
    for site in default_db.sites:
        for tag in site.tags:
            if not is_country_tag(tag):
                all_used_tags.add(tag)
    registered_tags = set(default_db._tags)
    unused = registered_tags - all_used_tags
    assert unused == set(), f"Tags registered but not used by any site: {unused}"


@pytest.mark.slow
def test_social_networks_have_social_tag(default_db):
    """Known social network domains must have the 'social' tag."""
    from urllib.parse import urlparse

    missing_social = []
    for site in default_db.sites:
        url = site.url_main or ""
        try:
            hostname = urlparse(url).hostname or ""
        except Exception:
            continue
        for domain in KNOWN_SOCIAL_DOMAINS:
            if hostname == domain or hostname.endswith("." + domain):
                if "social" not in site.tags:
                    missing_social.append(f"{site.name} ({domain})")
                break
    assert missing_social == [], (
        f"{len(missing_social)} known social networks missing 'social' tag: "
        + ", ".join(missing_social)
    )
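The social-tag test matches a site's hostname against a known domain either exactly or as a dot-separated suffix, so subdomains match but look-alike domains do not. A minimal sketch of that check in isolation, assuming a hypothetical helper name `matches_domain` (it is not part of maigret):

```python
from urllib.parse import urlparse


def matches_domain(url: str, domain: str) -> bool:
    """True if the URL's hostname is `domain` itself or one of its subdomains."""
    # urlparse().hostname is None for empty or scheme-less input, hence the fallback.
    hostname = urlparse(url).hostname or ""
    # Requiring the leading dot keeps "notreddit.com" from matching "reddit.com".
    return hostname == domain or hostname.endswith("." + domain)


examples = {
    "https://www.reddit.com/user/demo": matches_domain(
        "https://www.reddit.com/user/demo", "reddit.com"
    ),
    "https://notreddit.com/": matches_domain("https://notreddit.com/", "reddit.com"),
}
```

A plain `hostname.endswith(domain)` would accept the second URL, which is why the comparison prepends the dot.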
+233
@@ -0,0 +1,233 @@
"""Tests for the database auto-update system."""
import json
import os
import hashlib
from datetime import datetime, timezone, timedelta
from unittest.mock import patch, MagicMock
import pytest
from maigret.db_updater import (
    _parse_version,
    _needs_check,
    _is_version_compatible,
    _is_update_available,
    _load_state,
    _save_state,
    _best_local,
    _now_iso,
    resolve_db_path,
    force_update,
    CACHED_DB_PATH,
    BUNDLED_DB_PATH,
    STATE_PATH,
    MAIGRET_HOME,
)


def test_parse_version():
    assert _parse_version("0.5.0") == (0, 5, 0)
    assert _parse_version("1.2.3") == (1, 2, 3)
    assert _parse_version("bad") == (0, 0, 0)
    assert _parse_version("") == (0, 0, 0)


def test_needs_check_no_state():
    assert _needs_check({}, 24) is True


def test_needs_check_recent():
    state = {"last_check_at": _now_iso()}
    assert _needs_check(state, 24) is False


def test_needs_check_expired():
    old_time = (datetime.now(timezone.utc) - timedelta(hours=25)).strftime("%Y-%m-%dT%H:%M:%SZ")
    state = {"last_check_at": old_time}
    assert _needs_check(state, 24) is True


def test_needs_check_corrupt():
    state = {"last_check_at": "not-a-date"}
    assert _needs_check(state, 24) is True


def test_version_compatible():
    with patch("maigret.db_updater.__version__", "0.5.0"):
        assert _is_version_compatible({"min_maigret_version": "0.5.0"}) is True
        assert _is_version_compatible({"min_maigret_version": "0.4.0"}) is True
        assert _is_version_compatible({"min_maigret_version": "0.6.0"}) is False
        assert _is_version_compatible({}) is True  # missing field = compatible


def test_update_available_no_cache(tmp_path):
    with patch("maigret.db_updater.CACHED_DB_PATH", str(tmp_path / "nonexistent.json")):
        assert _is_update_available({"updated_at": "2026-01-01T00:00:00Z"}, {}) is True


def test_update_available_newer(tmp_path):
    cache = tmp_path / "data.json"
    cache.write_text("{}")
    with patch("maigret.db_updater.CACHED_DB_PATH", str(cache)):
        state = {"last_meta": {"updated_at": "2026-01-01T00:00:00Z"}}
        meta = {"updated_at": "2026-02-01T00:00:00Z"}
        assert _is_update_available(meta, state) is True


def test_update_available_same(tmp_path):
    cache = tmp_path / "data.json"
    cache.write_text("{}")
    with patch("maigret.db_updater.CACHED_DB_PATH", str(cache)):
        state = {"last_meta": {"updated_at": "2026-01-01T00:00:00Z"}}
        meta = {"updated_at": "2026-01-01T00:00:00Z"}
        assert _is_update_available(meta, state) is False


def test_load_state_missing(tmp_path):
    with patch("maigret.db_updater.STATE_PATH", str(tmp_path / "missing.json")):
        assert _load_state() == {}


def test_load_state_corrupt(tmp_path):
    corrupt = tmp_path / "state.json"
    corrupt.write_text("not json{{{")
    with patch("maigret.db_updater.STATE_PATH", str(corrupt)):
        assert _load_state() == {}


def test_save_and_load_state(tmp_path):
    state_file = tmp_path / "state.json"
    with patch("maigret.db_updater.STATE_PATH", str(state_file)):
        with patch("maigret.db_updater.MAIGRET_HOME", str(tmp_path)):
            _save_state({"last_check_at": "2026-01-01T00:00:00Z"})
            loaded = _load_state()
            assert loaded["last_check_at"] == "2026-01-01T00:00:00Z"


def test_best_local_with_valid_cache(tmp_path):
    cache = tmp_path / "data.json"
    cache.write_text('{"sites": {}, "engines": {}, "tags": []}')
    with patch("maigret.db_updater.CACHED_DB_PATH", str(cache)):
        assert _best_local() == str(cache)


def test_best_local_with_corrupt_cache(tmp_path):
    cache = tmp_path / "data.json"
    cache.write_text("not json")
    with patch("maigret.db_updater.CACHED_DB_PATH", str(cache)):
        assert _best_local() == BUNDLED_DB_PATH


def test_best_local_no_cache(tmp_path):
    with patch("maigret.db_updater.CACHED_DB_PATH", str(tmp_path / "missing.json")):
        assert _best_local() == BUNDLED_DB_PATH


def test_resolve_db_path_custom_url():
    result = resolve_db_path("https://example.com/db.json")
    assert result == "https://example.com/db.json"


def test_resolve_db_path_custom_file():
    result = resolve_db_path("custom/path.json")
    assert result.endswith("custom/path.json")


def test_resolve_db_path_no_autoupdate(tmp_path):
    with patch("maigret.db_updater.CACHED_DB_PATH", str(tmp_path / "missing.json")):
        result = resolve_db_path("resources/data.json", no_autoupdate=True)
        assert result == BUNDLED_DB_PATH


def test_resolve_db_path_no_autoupdate_with_cache(tmp_path):
    cache = tmp_path / "data.json"
    cache.write_text('{"sites": {}, "engines": {}, "tags": []}')
    with patch("maigret.db_updater.CACHED_DB_PATH", str(cache)):
        result = resolve_db_path("resources/data.json", no_autoupdate=True)
        assert result == str(cache)


@patch("maigret.db_updater._fetch_meta")
def test_resolve_db_path_network_failure(mock_fetch, tmp_path):
    mock_fetch.return_value = None
    with patch("maigret.db_updater.MAIGRET_HOME", str(tmp_path)):
        with patch("maigret.db_updater.STATE_PATH", str(tmp_path / "state.json")):
            with patch("maigret.db_updater.CACHED_DB_PATH", str(tmp_path / "missing.json")):
                result = resolve_db_path("resources/data.json")
                assert result == BUNDLED_DB_PATH
# --- force_update tests ---
@patch("maigret.db_updater._fetch_meta")
def test_force_update_network_failure(mock_fetch, tmp_path):
mock_fetch.return_value = None
with patch("maigret.db_updater.MAIGRET_HOME", str(tmp_path)):
with patch("maigret.db_updater.STATE_PATH", str(tmp_path / "state.json")):
assert force_update() is False
@patch("maigret.db_updater._fetch_meta")
def test_force_update_incompatible_version(mock_fetch, tmp_path):
mock_fetch.return_value = {"min_maigret_version": "99.0.0", "sites_count": 100}
with patch("maigret.db_updater.MAIGRET_HOME", str(tmp_path)):
with patch("maigret.db_updater.STATE_PATH", str(tmp_path / "state.json")):
assert force_update() is False
@patch("maigret.db_updater._download_and_verify")
@patch("maigret.db_updater._fetch_meta")
def test_force_update_success(mock_fetch, mock_download, tmp_path):
mock_fetch.return_value = {
"min_maigret_version": "0.1.0",
"sites_count": 3200,
"updated_at": "2099-01-01T00:00:00Z",
"data_url": "https://example.com/data.json",
"data_sha256": "abc123",
}
mock_download.return_value = str(tmp_path / "data.json")
with patch("maigret.db_updater.MAIGRET_HOME", str(tmp_path)):
with patch("maigret.db_updater.STATE_PATH", str(tmp_path / "state.json")):
with patch("maigret.db_updater.CACHED_DB_PATH", str(tmp_path / "missing.json")):
assert force_update() is True
state = _load_state()
assert state["last_meta"]["sites_count"] == 3200
@patch("maigret.db_updater._fetch_meta")
def test_force_update_already_up_to_date(mock_fetch, tmp_path):
cache = tmp_path / "data.json"
cache.write_text('{"sites": {}, "engines": {}, "tags": []}')
state_file = tmp_path / "state.json"
state_file.write_text(json.dumps({
"last_check_at": _now_iso(),
"last_meta": {"updated_at": "2026-01-01T00:00:00Z", "sites_count": 3000},
}))
mock_fetch.return_value = {
"min_maigret_version": "0.1.0",
"sites_count": 3000,
"updated_at": "2026-01-01T00:00:00Z",
}
with patch("maigret.db_updater.MAIGRET_HOME", str(tmp_path)):
with patch("maigret.db_updater.STATE_PATH", str(state_file)):
with patch("maigret.db_updater.CACHED_DB_PATH", str(cache)):
assert force_update() is False
@patch("maigret.db_updater._download_and_verify")
@patch("maigret.db_updater._fetch_meta")
def test_force_update_download_fails(mock_fetch, mock_download, tmp_path):
mock_fetch.return_value = {
"min_maigret_version": "0.1.0",
"sites_count": 3200,
"updated_at": "2099-01-01T00:00:00Z",
"data_url": "https://example.com/data.json",
"data_sha256": "abc123",
}
mock_download.return_value = None
with patch("maigret.db_updater.MAIGRET_HOME", str(tmp_path)):
with patch("maigret.db_updater.STATE_PATH", str(tmp_path / "state.json")):
with patch("maigret.db_updater.CACHED_DB_PATH", str(tmp_path / "missing.json")):
assert force_update() is False
+2 -2
@@ -36,7 +36,7 @@ def test_notify_about_errors():
},
}
results = notify_about_errors(results, query_notify=None, show_statistics=True)
notifications = notify_about_errors(results, query_notify=None, show_statistics=True)
# Check the output
expected_output = [
@@ -55,4 +55,4 @@ def test_notify_about_errors():
('Access denied: 25.0%', '!'),
('You can see detailed site check errors with a flag `--print-errors`', '-'),
]
assert results == expected_output
assert notifications == expected_output
+38 -4
@@ -3,11 +3,13 @@
import pytest
import asyncio
import logging
from typing import Any, List, Tuple, Callable, Dict
from maigret.executors import (
AsyncioSimpleExecutor,
AsyncioProgressbarExecutor,
AsyncioProgressbarSemaphoreExecutor,
AsyncioProgressbarQueueExecutor,
AsyncioQueueGeneratorExecutor,
)
logger = logging.getLogger(__name__)
@@ -20,7 +22,7 @@ async def func(n):
@pytest.mark.asyncio
async def test_simple_asyncio_executor():
tasks = [(func, [n], {}) for n in range(10)]
tasks: List[Tuple[Callable, list, dict]] = [(func, [n], {}) for n in range(10)]
executor = AsyncioSimpleExecutor(logger=logger)
assert await executor.run(tasks) == [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
assert executor.execution_time > 0.2
@@ -29,7 +31,7 @@ async def test_simple_asyncio_executor():
@pytest.mark.asyncio
async def test_asyncio_progressbar_executor():
tasks = [(func, [n], {}) for n in range(10)]
tasks: List[Tuple[Callable, list, dict]] = [(func, [n], {}) for n in range(10)]
executor = AsyncioProgressbarExecutor(logger=logger)
# no guarantees for the results order
@@ -40,7 +42,7 @@ async def test_asyncio_progressbar_executor():
@pytest.mark.asyncio
async def test_asyncio_progressbar_semaphore_executor():
tasks = [(func, [n], {}) for n in range(10)]
tasks: List[Tuple[Callable, list, dict]] = [(func, [n], {}) for n in range(10)]
executor = AsyncioProgressbarSemaphoreExecutor(logger=logger, in_parallel=5)
# no guarantees for the results order
@@ -52,7 +54,7 @@ async def test_asyncio_progressbar_semaphore_executor():
@pytest.mark.slow
@pytest.mark.asyncio
async def test_asyncio_progressbar_queue_executor():
tasks = [(func, [n], {}) for n in range(10)]
tasks: List[Tuple[Callable, list, dict]] = [(func, [n], {}) for n in range(10)]
executor = AsyncioProgressbarQueueExecutor(logger=logger, in_parallel=2)
assert await executor.run(tasks) == [0, 1, 3, 2, 4, 6, 7, 5, 9, 8]
@@ -76,3 +78,35 @@ async def test_asyncio_progressbar_queue_executor():
assert await executor.run(tasks) == [0, 3, 6, 9, 1, 4, 7, 2, 5, 8]
assert executor.execution_time > 0.2
assert executor.execution_time < 0.4
@pytest.mark.asyncio
async def test_asyncio_queue_generator_executor():
tasks: List[Tuple[Callable, list, dict]] = [(func, [n], {}) for n in range(10)]
executor = AsyncioQueueGeneratorExecutor(logger=logger, in_parallel=2)
results = [result async for result in executor.run(tasks)] # type: ignore[arg-type]
assert results == [0, 1, 3, 2, 4, 6, 7, 5, 9, 8]
assert executor.execution_time > 0.5
assert executor.execution_time < 0.6
executor = AsyncioQueueGeneratorExecutor(logger=logger, in_parallel=3)
results = [result async for result in executor.run(tasks)] # type: ignore[arg-type]
assert results == [0, 3, 1, 4, 6, 2, 7, 9, 5, 8]
assert executor.execution_time > 0.4
assert executor.execution_time < 0.5
executor = AsyncioQueueGeneratorExecutor(logger=logger, in_parallel=5)
results = [result async for result in executor.run(tasks)] # type: ignore[arg-type]
assert results in (
[0, 3, 6, 1, 4, 7, 9, 2, 5, 8],
[0, 3, 6, 1, 4, 9, 7, 2, 5, 8],
)
assert executor.execution_time > 0.3
assert executor.execution_time < 0.4
executor = AsyncioQueueGeneratorExecutor(logger=logger, in_parallel=10)
results = [result async for result in executor.run(tasks)] # type: ignore[arg-type]
assert results == [0, 3, 6, 9, 1, 4, 7, 2, 5, 8]
assert executor.execution_time > 0.2
assert executor.execution_time < 0.3
+87 -3
@@ -2,6 +2,7 @@
import asyncio
import copy
from unittest.mock import patch
import pytest
from mock import Mock
@@ -11,7 +12,8 @@ from maigret.maigret import (
extract_ids_from_page,
extract_ids_from_results,
)
from maigret.sites import MaigretSite
from maigret.checking import site_self_check
from maigret.sites import MaigretSite, MaigretDatabase
from maigret.result import MaigretCheckResult, MaigretCheckStatus
from tests.conftest import RESULTS_EXAMPLE
@@ -27,7 +29,9 @@ async def test_self_check_db(test_db):
assert test_db.sites_dict['ValidActive'].disabled is False
assert test_db.sites_dict['InvalidInactive'].disabled is True
await self_check(test_db, test_db.sites_dict, logger, silent=False)
await self_check(
test_db, test_db.sites_dict, logger, silent=False, auto_disable=True
)
assert test_db.sites_dict['InvalidActive'].disabled is True
assert test_db.sites_dict['ValidInactive'].disabled is False
@@ -35,6 +39,86 @@ async def test_self_check_db(test_db):
assert test_db.sites_dict['InvalidInactive'].disabled is True
@pytest.mark.slow
@pytest.mark.asyncio
async def test_self_check_no_progressbar(test_db):
"""Verify that no_progressbar=True disables the alive_bar in self_check."""
logger = Mock()
with patch('maigret.checking.alive_bar') as mock_alive_bar:
mock_bar = Mock()
mock_alive_bar.return_value.__enter__ = Mock(return_value=mock_bar)
mock_alive_bar.return_value.__exit__ = Mock(return_value=False)
await self_check(
test_db, test_db.sites_dict, logger, silent=True,
no_progressbar=True,
)
# First call is the self-check progress bar; subsequent calls are
# from inner search() invocations.
self_check_call = mock_alive_bar.call_args_list[0]
_, kwargs = self_check_call
assert kwargs.get('title') == 'Self-checking'
assert kwargs.get('disable') is True
@pytest.mark.slow
@pytest.mark.asyncio
async def test_self_check_progressbar_enabled_by_default(test_db):
"""Verify that alive_bar is enabled by default (no_progressbar=False)."""
logger = Mock()
with patch('maigret.checking.alive_bar') as mock_alive_bar:
mock_bar = Mock()
mock_alive_bar.return_value.__enter__ = Mock(return_value=mock_bar)
mock_alive_bar.return_value.__exit__ = Mock(return_value=False)
await self_check(
test_db, test_db.sites_dict, logger, silent=True,
)
self_check_call = mock_alive_bar.call_args_list[0]
_, kwargs = self_check_call
assert kwargs.get('title') == 'Self-checking'
assert kwargs.get('disable') is False
@pytest.mark.asyncio
async def test_site_self_check_handles_exception(test_db):
"""Verify that site_self_check catches unexpected exceptions and returns a valid result."""
logger = Mock()
sem = asyncio.Semaphore(1)
site = test_db.sites_dict['ValidActive']
with patch('maigret.checking.maigret', side_effect=RuntimeError("test crash")):
result = await site_self_check(site, logger, sem, test_db)
assert isinstance(result, dict)
assert "issues" in result
assert len(result["issues"]) > 0
assert any("Unexpected error" in issue for issue in result["issues"])
@pytest.mark.asyncio
async def test_self_check_handles_task_exception(test_db):
"""Verify that self_check continues when individual site checks raise exceptions."""
logger = Mock()
with patch('maigret.checking.maigret', side_effect=RuntimeError("test crash")):
result = await self_check(
test_db, test_db.sites_dict, logger, silent=True,
no_progressbar=True,
)
assert isinstance(result, dict)
assert 'results' in result
assert len(result['results']) == len(test_db.sites_dict)
for r in result['results']:
assert 'site_name' in r
assert 'issues' in r
@pytest.mark.slow
@pytest.mark.skip(reason="broken, fixme")
def test_maigret_results(test_db):
@@ -110,7 +194,7 @@ def test_extract_ids_from_page(test_db):
def test_extract_ids_from_results(test_db):
TEST_EXAMPLE = copy.deepcopy(RESULTS_EXAMPLE)
TEST_EXAMPLE: dict = copy.deepcopy(RESULTS_EXAMPLE)
TEST_EXAMPLE['Reddit']['ids_usernames'] = {'test1': 'yandex_public_id'}
TEST_EXAMPLE['Reddit']['ids_links'] = ['https://www.reddit.com/user/test2']
+1 -1
@@ -6,7 +6,7 @@ import os
import pytest
from io import StringIO
import xmind
import xmind # type: ignore[import-untyped]
from jinja2 import Template
from maigret.report import (
+53
@@ -0,0 +1,53 @@
import unittest
from unittest.mock import patch, mock_open
from maigret.settings import Settings
class TestSettings(unittest.TestCase):
@patch('json.load')
@patch('builtins.open', new_callable=mock_open)
def test_settings_cascade_and_override(self, mock_file, mock_json_load):
file1_data = {"timeout": 10, "retries_count": 3, "proxy_url": "http://proxy1"}
file2_data = {"timeout": 20, "recursive_search": True}
file3_data = {"proxy_url": "http://proxy3", "print_not_found": False}
mock_json_load.side_effect = [file1_data, file2_data, file3_data]
settings = Settings()
paths = ['file1.json', 'file2.json', 'file3.json']
was_inited, msg = settings.load(paths)
self.assertTrue(was_inited)
self.assertEqual(settings.retries_count, 3)
self.assertEqual(settings.timeout, 20)
self.assertTrue(settings.recursive_search)
self.assertEqual(settings.proxy_url, "http://proxy3")
self.assertFalse(settings.print_not_found)
@patch('builtins.open')
def test_settings_file_not_found(self, mock_open_func):
mock_open_func.side_effect = FileNotFoundError()
settings = Settings()
paths = ['nonexistent.json']
was_inited, msg = settings.load(paths)
self.assertFalse(was_inited)
self.assertIn('None of the default settings files found', msg)
@patch('json.load')
@patch('builtins.open', new_callable=mock_open)
def test_settings_invalid_json(self, mock_file, mock_json_load):
mock_json_load.side_effect = ValueError("Expecting value")
settings = Settings()
paths = ['invalid.json']
was_inited, msg = settings.load(paths)
self.assertFalse(was_inited)
self.assertIsInstance(msg, ValueError)
self.assertIn('Problem with parsing json contents', str(msg))
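Review note: `test_settings_cascade_and_override` above relies on later settings files overriding keys from earlier ones while untouched keys survive. A minimal sketch of that override order with plain dictionaries follows. It is not the implementation of `Settings.load`, and `cascade` is a hypothetical helper used only for illustration.

```python
from typing import Any, Dict, List


def cascade(layers: List[Dict[str, Any]]) -> Dict[str, Any]:
    # Apply layers in order: keys from later layers replace earlier ones,
    # keys that appear only once are kept as they are.
    merged: Dict[str, Any] = {}
    for layer in layers:
        merged.update(layer)
    return merged


merged = cascade([
    {"timeout": 10, "retries_count": 3, "proxy_url": "http://proxy1"},
    {"timeout": 20, "recursive_search": True},
    {"proxy_url": "http://proxy3", "print_not_found": False},
])
print(merged)
```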
+94 -1
@@ -1,8 +1,10 @@
"""Maigret Database test functions"""
from typing import Any, Dict
from maigret.sites import MaigretDatabase, MaigretSite
EXAMPLE_DB = {
EXAMPLE_DB: Dict[str, Any] = {
'engines': {
"XenForo": {
"presenseStrs": ["XenForo"],
@@ -182,6 +184,97 @@ def test_ranked_sites_dict_id_type():
assert len(db.ranked_sites_dict(id_type='gaia_id')) == 1
def test_ranked_sites_dict_excluded_tags():
db = MaigretDatabase()
db.update_site(MaigretSite('3', {'alexaRank': 1000, 'engine': 'ucoz'}))
db.update_site(MaigretSite('1', {'alexaRank': 2, 'tags': ['forum']}))
db.update_site(MaigretSite('2', {'alexaRank': 10, 'tags': ['ru', 'forum']}))
# excluding by tag
assert list(db.ranked_sites_dict(excluded_tags=['ru']).keys()) == ['1', '3']
assert list(db.ranked_sites_dict(excluded_tags=['forum']).keys()) == ['3']
# excluding by engine
assert list(db.ranked_sites_dict(excluded_tags=['ucoz']).keys()) == ['1', '2']
# combining include and exclude tags
assert list(db.ranked_sites_dict(tags=['forum'], excluded_tags=['ru']).keys()) == ['1']
# excluding non-existent tag has no effect
assert list(db.ranked_sites_dict(excluded_tags=['nonexistent']).keys()) == ['1', '2', '3']
# exclude all
assert list(db.ranked_sites_dict(excluded_tags=['forum', 'ucoz']).keys()) == []
def test_ranked_sites_dict_excluded_tags_with_top():
"""Excluded tags should also prevent mirrors from being included."""
db = MaigretDatabase()
db.update_site(
MaigretSite('Parent', {'alexaRank': 1, 'tags': ['forum'], 'type': 'username'})
)
db.update_site(
MaigretSite('Mirror', {'alexaRank': 999999, 'source': 'Parent', 'tags': ['forum'], 'type': 'username'})
)
db.update_site(
MaigretSite('Other', {'alexaRank': 2, 'tags': ['coding'], 'type': 'username'})
)
# Without exclusion, mirror should be included
result = db.ranked_sites_dict(top=1, id_type='username')
assert 'Parent' in result
assert 'Mirror' in result
# With exclusion of 'forum', both Parent and Mirror should be excluded
result = db.ranked_sites_dict(top=2, excluded_tags=['forum'], id_type='username')
assert 'Parent' not in result
assert 'Mirror' not in result
assert 'Other' in result
def test_ranked_sites_dict_mirrors_disabled_parent():
"""Mirror is included when parent ranks in top N but parent is disabled."""
db = MaigretDatabase()
db.update_site(
MaigretSite(
'ParentPlatform',
{'alexaRank': 5, 'disabled': True, 'type': 'username'},
)
)
db.update_site(
MaigretSite(
'OtherSite',
{'alexaRank': 100, 'type': 'username'},
)
)
db.update_site(
MaigretSite(
'MirrorSite',
{
'alexaRank': 99999999,
'source': 'ParentPlatform',
'type': 'username',
},
)
)
result = db.ranked_sites_dict(top=1, disabled=False, id_type='username')
assert list(result.keys()) == ['OtherSite', 'MirrorSite']
def test_ranked_sites_dict_mirrors_no_extra_without_parent_in_top():
db = MaigretDatabase()
db.update_site(MaigretSite('A', {'alexaRank': 1, 'type': 'username'}))
db.update_site(
MaigretSite(
'B',
{'alexaRank': 2, 'source': 'NotInDb', 'type': 'username'},
)
)
assert list(db.ranked_sites_dict(top=1, id_type='username').keys()) == ['A']
def test_get_url_template():
site = MaigretSite(
"test",
+89 -7
@@ -1,9 +1,10 @@
import re
import pytest
from unittest.mock import AsyncMock, MagicMock, patch
from maigret.submit import Submitter, MaigretSite, MaigretEngine
from unittest.mock import MagicMock, patch
from maigret.submit import Submitter
from aiohttp import ClientSession
from maigret.sites import MaigretDatabase
from maigret.settings import Settings
from maigret.sites import MaigretDatabase, MaigretSite
import logging
@@ -27,7 +28,7 @@ async def test_detect_known_engine(test_db, local_test_db):
url_exists = "https://devforum.zoom.us/u/adam"
url_mainpage = "https://devforum.zoom.us/"
# Mock extract_username_dialog to return "adam"
submitter.extract_username_dialog = MagicMock(return_value="adam")
submitter.extract_username_dialog = MagicMock(return_value="adam") # type: ignore[method-assign]
sites, resp_text = await submitter.detect_known_engine(
url_exists, url_mainpage, session=None, follow_redirects=False, headers=None
@@ -110,7 +111,7 @@ async def test_check_features_manually_success(settings):
@pytest.mark.slow
@pytest.mark.asyncio
async def test_check_features_manually_success(settings):
async def test_check_features_manually_cloudflare(settings):
# Setup
db = MaigretDatabase()
logger = logging.getLogger("test_logger")
@@ -272,7 +273,88 @@ async def test_dialog_adds_site_negative(settings):
]
with patch('builtins.input', side_effect=user_inputs):
result = await submitter.dialog("https://icq.im/sokrat", None)
result = await submitter.dialog("https://icq.com/sokrat", None)
await submitter.close()
assert result is False
def test_domain_matching_exact():
"""Test that domain matching uses proper boundary checks, not substring matching.
x.com should NOT match sites like 500px.com, mix.com, etc.
"""
domain_raw = "x.com"
domain_re = re.compile(
r'://(www\.)?' + re.escape(domain_raw) + r'(/|$)'
)
# These should NOT match x.com
non_matching = [
MaigretSite("500px", {"url": "https://500px.com/p/{username}", "urlMain": "https://500px.com/"}),
MaigretSite("Mix", {"url": "https://mix.com/{username}", "urlMain": "https://mix.com"}),
MaigretSite("Screwfix", {"url": "{urlMain}{urlSubpath}/members/?username={username}", "urlMain": "https://community.screwfix.com"}),
MaigretSite("Wix", {"url": "https://{username}.wix.com", "urlMain": "https://wix.com/"}),
MaigretSite("1x", {"url": "https://1x.com/{username}", "urlMain": "https://1x.com"}),
MaigretSite("Roblox", {"url": "https://www.roblox.com/user.aspx?username={username}", "urlMain": "https://www.roblox.com/"}),
]
for site in non_matching:
assert not domain_re.search(site.url_main + site.url), \
f"x.com should NOT match site {site.name} ({site.url_main})"
def test_domain_matching_positive():
"""Test that domain matching correctly matches the exact domain."""
domain_raw = "x.com"
domain_re = re.compile(
r'://(www\.)?' + re.escape(domain_raw) + r'(/|$)'
)
# These SHOULD match x.com
matching = [
MaigretSite("X", {"url": "https://x.com/{username}", "urlMain": "https://x.com"}),
MaigretSite("X-www", {"url": "https://www.x.com/{username}", "urlMain": "https://www.x.com"}),
]
for site in matching:
assert domain_re.search(site.url_main + site.url), \
f"x.com SHOULD match site {site.name} ({site.url_main})"
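Review note: the two domain-matching tests above build the same pattern inline. A minimal sketch of that boundary check on plain URL strings, assuming the pattern is used exactly as written in the tests; `build_domain_regex` is a hypothetical helper name, not part of `maigret.submit`.

```python
import re


def build_domain_regex(domain: str) -> re.Pattern:
    # The domain must follow "://" (optionally "www.") and be followed by
    # "/" or the end of the string, so "x.com" cannot match inside "500px.com".
    return re.compile(r'://(www\.)?' + re.escape(domain) + r'(/|$)')


domain_re = build_domain_regex("x.com")

urls = [
    "https://x.com/alice",
    "https://www.x.com/alice",
    "https://500px.com/p/alice",
    "https://mix.com/alice",
    "https://x.com",
]
matches = [url for url in urls if domain_re.search(url)]
print(matches)
```

`re.escape` matters here: without it the dot in `x.com` would match any character.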
def test_dialog_nonexistent_site_name_no_crash():
"""Test that entering a site name not in the matched list doesn't crash.
This tests the fix for: AttributeError: 'NoneType' object has no attribute 'name'
The old_site should be None when user enters a name not in matched_sites,
and the code should handle it gracefully.
"""
# Simulate the logic that was crashing
matched_sites = [
MaigretSite("ValidActive", {"url": "https://example.com/{username}", "urlMain": "https://example.com"}),
MaigretSite("InvalidActive", {"url": "https://example.com/alt/{username}", "urlMain": "https://example.com"}),
]
site_name = "NonExistentSite"
old_site = next(
(site for site in matched_sites if site.name == site_name), None
)
# This is what the old code did - it would crash here
assert old_site is None
# The fix: check before accessing .name
if old_site is None:
result = "not found"
else:
result = old_site.name
assert result == "not found"
# And when site_name IS in matched_sites, it should work
site_name = "ValidActive"
old_site = next(
(site for site in matched_sites if site.name == site_name), None
)
assert old_site is not None
assert old_site.name == "ValidActive"
+63
@@ -0,0 +1,63 @@
"""Tests for the Twitter / X site entry and GraphQL probe."""
import re
import pytest
import requests
from maigret.sites import MaigretSite
def _twitter_site(site: MaigretSite) -> None:
assert site.name == "Twitter"
assert site.disabled is False
assert site.check_type == "message"
assert site.url_probe and "{username}" in site.url_probe
assert "UserByScreenName" in site.url_probe or "graphql" in site.url_probe
assert site.regex_check
assert re.fullmatch(site.regex_check, site.username_claimed)
assert re.fullmatch(site.regex_check, site.username_unclaimed)
assert site.absence_strs
assert site.activation.get("method") == "twitter"
assert site.activation.get("url")
assert "authorization" in {k.lower() for k in site.headers.keys()}
def test_twitter_site_entry_config(default_db):
"""Twitter entry in data.json must define probe URL, regex, and activation."""
site = default_db.sites_dict["Twitter"]
assert isinstance(site, MaigretSite)
_twitter_site(site)
@pytest.mark.slow
def test_twitter_graphql_probe_claimed_vs_unclaimed(default_db):
"""
Live check: guest activation + UserByScreenName GraphQL returns a user for
usernameClaimed and no user for usernameUnclaimed (same flow as urlProbe).
"""
site = default_db.sites_dict["Twitter"]
_twitter_site(site)
headers = dict(site.headers)
headers.pop("x-guest-token", None)
act = requests.post(site.activation["url"], headers=headers, timeout=45)
assert act.status_code == 200, act.text[:500]
body = act.json()
assert "guest_token" in body
headers["x-guest-token"] = body["guest_token"]
def fetch(username: str) -> dict:
url = site.url_probe.format(username=username)
resp = requests.get(url, headers=headers, timeout=45)
resp.raise_for_status()
return resp.json()
claimed_json = fetch(site.username_claimed)
assert "data" in claimed_json
assert claimed_json["data"].get("user") is not None
unclaimed_json = fetch(site.username_unclaimed)
data = unclaimed_json.get("data") or {}
assert data == {} or data.get("user") is None
+480
@@ -0,0 +1,480 @@
#!/usr/bin/env python3
"""
Mass site checking utility for Maigret development.
Check top-N sites from data.json and generate a report.
Usage:
python utils/check_top_n.py --top 100 # Check top 100 sites
python utils/check_top_n.py --top 50 --parallel 10 # Check with 10 parallel requests
python utils/check_top_n.py --top 100 --output report.json
python utils/check_top_n.py --top 100 --fix # Auto-fix simple issues
"""
import argparse
import asyncio
import json
import sys
import time
from collections import defaultdict
from dataclasses import dataclass, field, asdict
from pathlib import Path
from typing import Dict, List, Optional, Tuple
# Add parent dir for imports
sys.path.insert(0, str(Path(__file__).parent.parent))
try:
import aiohttp
except ImportError:
print("aiohttp not installed. Run: pip install aiohttp")
sys.exit(1)
class Colors:
RED = "\033[91m"
GREEN = "\033[92m"
YELLOW = "\033[93m"
BLUE = "\033[94m"
CYAN = "\033[96m"
RESET = "\033[0m"
BOLD = "\033[1m"
def color(text: str, c: str) -> str:
return f"{c}{text}{Colors.RESET}"
@dataclass
class SiteCheckResult:
"""Result of checking a single site."""
site_name: str
alexa_rank: int
disabled: bool
check_type: str
# Status
status: str = "unknown" # working, broken, timeout, error, anti_bot, disabled
# HTTP results
claimed_http_status: Optional[int] = None
unclaimed_http_status: Optional[int] = None
claimed_error: Optional[str] = None
unclaimed_error: Optional[str] = None
# Issues detected
issues: List[str] = field(default_factory=list)
warnings: List[str] = field(default_factory=list)
# Recommendations
recommendations: List[str] = field(default_factory=list)
# Timing
check_time_ms: int = 0
DEFAULT_HEADERS = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.5",
}
async def check_url(url: str, headers: dict, timeout: int = 15) -> dict:
"""Quick URL check returning status and basic info."""
result = {
"status": None,
"final_url": None,
"content_length": 0,
"error": None,
"error_type": None,
"content": None,
"markers": {},
}
try:
connector = aiohttp.TCPConnector(ssl=False)
timeout_obj = aiohttp.ClientTimeout(total=timeout)
async with aiohttp.ClientSession(connector=connector, timeout=timeout_obj) as session:
async with session.get(url, headers=headers, allow_redirects=True) as resp:
result["status"] = resp.status
result["final_url"] = str(resp.url)
try:
text = await resp.text()
result["content_length"] = len(text)
result["content"] = text
text_lower = text.lower()
result["markers"] = {
"404_text": any(m in text_lower for m in ["not found", "404", "doesn't exist"]),
"captcha": any(m in text_lower for m in ["captcha", "recaptcha", "challenge"]),
"cloudflare": "cloudflare" in text_lower,
"login": any(m in text_lower for m in ["log in", "login", "sign in"]),
}
except Exception as e:
result["error"] = f"Content error: {e}"
result["error_type"] = "content"
except asyncio.TimeoutError:
result["error"] = "Timeout"
result["error_type"] = "timeout"
except aiohttp.ClientError as e:
result["error"] = str(e)
result["error_type"] = "client"
except Exception as e:
result["error"] = str(e)
result["error_type"] = "unknown"
return result
async def check_site(site_name: str, config: dict, timeout: int = 15) -> SiteCheckResult:
"""Check a single site and return detailed result."""
start_time = time.time()
result = SiteCheckResult(
site_name=site_name,
alexa_rank=config.get("alexaRank", 999999),
disabled=config.get("disabled", False),
check_type=config.get("checkType", "status_code"),
)
# Skip disabled sites
if result.disabled:
result.status = "disabled"
return result
# Build URL
url_template = config.get("url", "")
url_main = config.get("urlMain", "")
url_subpath = config.get("urlSubpath", "")
url_template = url_template.replace("{urlMain}", url_main).replace("{urlSubpath}", url_subpath)
claimed = config.get("usernameClaimed")
unclaimed = config.get("usernameUnclaimed", "noonewouldeverusethis7")
if not claimed:
result.status = "error"
result.issues.append("No usernameClaimed defined")
return result
# Prepare headers
headers = DEFAULT_HEADERS.copy()
if config.get("headers"):
headers.update(config["headers"])
# Check both URLs
url_claimed = url_template.replace("{username}", claimed)
url_unclaimed = url_template.replace("{username}", unclaimed)
try:
claimed_result, unclaimed_result = await asyncio.gather(
check_url(url_claimed, headers, timeout),
check_url(url_unclaimed, headers, timeout),
)
except Exception as e:
result.status = "error"
result.issues.append(f"Check failed: {e}")
return result
result.claimed_http_status = claimed_result["status"]
result.unclaimed_http_status = unclaimed_result["status"]
result.claimed_error = claimed_result.get("error")
result.unclaimed_error = unclaimed_result.get("error")
# Categorize result
if claimed_result["error_type"] == "timeout" or unclaimed_result["error_type"] == "timeout":
result.status = "timeout"
result.issues.append("Request timeout")
elif claimed_result["status"] == 403 or claimed_result["status"] == 429:
result.status = "anti_bot"
result.issues.append(f"Anti-bot protection (HTTP {claimed_result['status']})")
elif claimed_result.get("markers", {}).get("captcha"):
result.status = "anti_bot"
result.issues.append("Captcha detected")
elif claimed_result.get("markers", {}).get("cloudflare"):
result.status = "anti_bot"
result.warnings.append("Cloudflare protection detected")
elif claimed_result["error"] or unclaimed_result["error"]:
result.status = "error"
if claimed_result["error"]:
result.issues.append(f"Claimed error: {claimed_result['error']}")
if unclaimed_result["error"]:
result.issues.append(f"Unclaimed error: {unclaimed_result['error']}")
else:
# Validate check type
check_type = config.get("checkType", "status_code")
if check_type == "status_code":
if claimed_result["status"] == unclaimed_result["status"]:
result.status = "broken"
result.issues.append(f"Same status code ({claimed_result['status']}) for both")
# Suggest fix
if claimed_result["final_url"] != unclaimed_result["final_url"]:
result.recommendations.append("Switch to checkType: response_url")
else:
result.status = "working"
elif check_type == "response_url":
if claimed_result["final_url"] == unclaimed_result["final_url"]:
result.status = "broken"
result.issues.append("Same final URL for both")
if claimed_result["status"] != unclaimed_result["status"]:
result.recommendations.append("Switch to checkType: status_code")
else:
result.status = "working"
elif check_type == "message":
presense_strs = config.get("presenseStrs", [])
absence_strs = config.get("absenceStrs", [])
claimed_content = claimed_result.get("content", "") or ""
unclaimed_content = unclaimed_result.get("content", "") or ""
presense_ok = not presense_strs or any(s in claimed_content for s in presense_strs)
absence_claimed = absence_strs and any(s in claimed_content for s in absence_strs)
absence_unclaimed = absence_strs and any(s in unclaimed_content for s in absence_strs)
if presense_strs and not presense_ok:
result.status = "broken"
result.issues.append(f"presenseStrs not found: {presense_strs}")
# Check if status_code would work
if claimed_result["status"] != unclaimed_result["status"]:
result.recommendations.append(f"Switch to checkType: status_code ({claimed_result['status']} vs {unclaimed_result['status']})")
elif absence_claimed:
result.status = "broken"
result.issues.append("absenceStrs found in claimed page")
elif absence_strs and not absence_unclaimed:
result.status = "broken"
result.warnings.append("absenceStrs not found in unclaimed page")
else:
result.status = "working"
else:
result.status = "unknown"
result.warnings.append(f"Unknown checkType: {check_type}")
result.check_time_ms = int((time.time() - start_time) * 1000)
return result
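Review note: the `message` branch of `check_site` could be exercised without network access if its decision logic were a pure function. The sketch below mirrors the conditions in that branch; `classify_message_check` is a hypothetical name and is not defined in this utility.

```python
from typing import List


def classify_message_check(
    claimed_content: str,
    unclaimed_content: str,
    presence_strs: List[str],
    absence_strs: List[str],
) -> str:
    # Same order of checks as the "message" branch above.
    presence_ok = not presence_strs or any(s in claimed_content for s in presence_strs)
    absence_claimed = any(s in claimed_content for s in absence_strs)
    absence_unclaimed = any(s in unclaimed_content for s in absence_strs)

    if not presence_ok:
        return "broken"  # expected markers missing from the claimed page
    if absence_claimed:
        return "broken"  # "not found" marker shows up for an existing user
    if absence_strs and not absence_unclaimed:
        return "broken"  # "not found" marker missing for a non-existent user
    return "working"


print(classify_message_check("Profile of alice", "User not found", ["Profile"], ["not found"]))
```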
def load_sites(db_path: Path) -> Dict[str, dict]:
"""Load all sites from data.json."""
with open(db_path) as f:
data = json.load(f)
return data.get("sites", {})
def get_top_sites(sites: Dict[str, dict], n: int) -> List[Tuple[str, dict]]:
"""Get top N sites by Alexa rank."""
ranked = []
for name, config in sites.items():
rank = config.get("alexaRank", 999999)
ranked.append((name, config, rank))
ranked.sort(key=lambda x: x[2])
return [(name, config) for name, config, _ in ranked[:n]]
async def check_sites_batch(sites: List[Tuple[str, dict]], parallel: int = 5,
timeout: int = 15, progress_callback=None) -> List[SiteCheckResult]:
"""Check multiple sites with parallelism control."""
results = []
semaphore = asyncio.Semaphore(parallel)
async def check_with_semaphore(name, config, index):
async with semaphore:
if progress_callback:
progress_callback(index + 1, len(sites), name)
return await check_site(name, config, timeout)
tasks = [
check_with_semaphore(name, config, i)
for i, (name, config) in enumerate(sites)
]
results = await asyncio.gather(*tasks)
return results
def print_progress(current: int, total: int, site_name: str):
"""Print progress indicator."""
pct = int(current / total * 100)
bar_width = 30
filled = int(bar_width * current / total)
bar = "#" * filled + "-" * (bar_width - filled)
print(f"\r[{bar}] {pct:3d}% ({current}/{total}) {site_name:<30}", end="", flush=True)
def generate_report(results: List[SiteCheckResult]) -> dict:
"""Generate a summary report from check results."""
report = {
"summary": {
"total": len(results),
"working": 0,
"broken": 0,
"disabled": 0,
"timeout": 0,
"anti_bot": 0,
"error": 0,
"unknown": 0,
},
"by_status": defaultdict(list),
"issues": [],
"recommendations": [],
}
for r in results:
report["summary"][r.status] = report["summary"].get(r.status, 0) + 1
report["by_status"][r.status].append(r.site_name)
if r.issues:
report["issues"].append({
"site": r.site_name,
"rank": r.alexa_rank,
"issues": r.issues,
})
if r.recommendations:
report["recommendations"].append({
"site": r.site_name,
"rank": r.alexa_rank,
"recommendations": r.recommendations,
})
return report
def print_report(report: dict, results: List[SiteCheckResult]):
"""Print a formatted report to console."""
summary = report["summary"]
print(f"\n{'='*60}")
print(f"{color('SITE CHECK REPORT', Colors.CYAN)}")
print(f"{'='*60}\n")
print(f"{color('SUMMARY:', Colors.BOLD)}")
print(f" Total sites checked: {summary['total']}")
print(f" {color('Working:', Colors.GREEN)} {summary['working']}")
print(f" {color('Broken:', Colors.RED)} {summary['broken']}")
print(f" {color('Disabled:', Colors.YELLOW)} {summary['disabled']}")
print(f" {color('Timeout:', Colors.YELLOW)} {summary['timeout']}")
print(f" {color('Anti-bot:', Colors.YELLOW)} {summary['anti_bot']}")
print(f" {color('Error:', Colors.RED)} {summary['error']}")
# Broken sites
if report["by_status"]["broken"]:
print(f"\n{color('BROKEN SITES:', Colors.RED)}")
for site in report["by_status"]["broken"][:20]:
r = next(x for x in results if x.site_name == site)
print(f" - {site} (rank {r.alexa_rank}): {', '.join(r.issues)}")
if len(report["by_status"]["broken"]) > 20:
print(f" ... and {len(report['by_status']['broken']) - 20} more")
# Timeout sites
if report["by_status"]["timeout"]:
print(f"\n{color('TIMEOUT SITES:', Colors.YELLOW)}")
for site in report["by_status"]["timeout"][:10]:
print(f" - {site}")
if len(report["by_status"]["timeout"]) > 10:
print(f" ... and {len(report['by_status']['timeout']) - 10} more")
# Anti-bot sites
if report["by_status"]["anti_bot"]:
print(f"\n{color('ANTI-BOT PROTECTED:', Colors.YELLOW)}")
for site in report["by_status"]["anti_bot"][:10]:
r = next(x for x in results if x.site_name == site)
print(f" - {site}: {', '.join(r.issues)}")
if len(report["by_status"]["anti_bot"]) > 10:
print(f" ... and {len(report['by_status']['anti_bot']) - 10} more")
# Recommendations
if report["recommendations"]:
print(f"\n{color('RECOMMENDATIONS:', Colors.CYAN)}")
for rec in report["recommendations"][:15]:
print(f" {rec['site']} (rank {rec['rank']}):")
for r in rec["recommendations"]:
print(f" -> {r}")
if len(report["recommendations"]) > 15:
print(f" ... and {len(report['recommendations']) - 15} more")
async def main():
parser = argparse.ArgumentParser(
description="Mass site checking for Maigret",
formatter_class=argparse.RawDescriptionHelpFormatter,
)
parser.add_argument("--top", "-n", type=int, default=100,
help="Check top N sites by Alexa rank (default: 100)")
parser.add_argument("--parallel", "-p", type=int, default=5,
help="Number of parallel requests (default: 5)")
parser.add_argument("--timeout", "-t", type=int, default=15,
help="Request timeout in seconds (default: 15)")
parser.add_argument("--output", "-o", help="Output JSON report to file")
parser.add_argument("--include-disabled", action="store_true",
help="Include disabled sites in results")
parser.add_argument("--only-broken", action="store_true",
help="Only show broken sites")
parser.add_argument("--json", action="store_true",
help="Output as JSON only")
args = parser.parse_args()
# Load sites
db_path = Path(__file__).parent.parent / "maigret" / "resources" / "data.json"
if not db_path.exists():
print(f"Database not found: {db_path}")
sys.exit(1)
sites = load_sites(db_path)
top_sites = get_top_sites(sites, args.top)
if not args.json:
print(f"Checking top {len(top_sites)} sites (parallel={args.parallel}, timeout={args.timeout}s)...")
print()
# Run checks
progress = print_progress if not args.json else None
results = await check_sites_batch(top_sites, args.parallel, args.timeout, progress)
if not args.json:
print() # Clear progress line
# Filter results
if not args.include_disabled:
results = [r for r in results if r.status != "disabled"]
if args.only_broken:
results = [r for r in results if r.status in ("broken", "error", "timeout")]
# Generate report
report = generate_report(results)
# Output
if args.json:
output = {
"report": report,
"results": [asdict(r) for r in results],
}
print(json.dumps(output, indent=2))
else:
print_report(report, results)
# Save to file
if args.output:
output = {
"report": report,
"results": [asdict(r) for r in results],
}
with open(args.output, "w") as f:
json.dump(output, f, indent=2)
print(f"\nReport saved to: {args.output}")
if __name__ == "__main__":
asyncio.run(main())
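The batch checker above limits concurrency by wrapping each check in an `asyncio.Semaphore` and collecting results with `asyncio.gather`. The following is a minimal standalone sketch of that pattern, not the script's own code: `fake_check` and `run_bounded` are hypothetical names, and the network call is replaced by a short sleep so the example runs offline.

```python
import asyncio
from typing import List, Tuple


async def fake_check(name: str, delay: float) -> Tuple[str, str]:
    """Hypothetical stand-in for check_site: sleeps instead of doing HTTP."""
    await asyncio.sleep(delay)
    return name, "working"


async def run_bounded(names: List[str], parallel: int = 2) -> Tuple[List[Tuple[str, str]], int]:
    """Run fake_check for every name with at most `parallel` checks in flight.

    Returns the results (in input order) and the peak number of concurrent checks.
    """
    semaphore = asyncio.Semaphore(parallel)
    active = 0
    peak = 0

    async def worker(name: str) -> Tuple[str, str]:
        nonlocal active, peak
        async with semaphore:
            # Track how many workers hold the semaphore at once.
            active += 1
            peak = max(peak, active)
            try:
                return await fake_check(name, 0.01)
            finally:
                active -= 1

    # gather() preserves the order of the awaitables it was given.
    results = await asyncio.gather(*(worker(n) for n in names))
    return list(results), peak


if __name__ == "__main__":
    results, peak = asyncio.run(run_bounded(["a", "b", "c", "d", "e"], parallel=2))
    print(results)
    print(f"peak concurrency: {peak}")
```

Because `gather` returns results in submission order, the caller can match each result to its site without extra bookkeeping, which is presumably why the script can build its report directly from the returned list.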
+223
@@ -0,0 +1,223 @@
#!/usr/bin/env python3
"""
Probe likely false-positive sites among the top-N Alexa-ranked entries.
For each of K random *distinct* usernames taken from ``usernameClaimed`` fields in
the Maigret database, runs a clean ``maigret`` scan (``--top-sites N --json simple|ndjson``).
Sites that return CLAIMED in *every* run are reported: unrelated random claimed
handles are unlikely to all exist on the same third-party site, so such sites are
candidates for broken checks.
"""
from __future__ import annotations
import argparse
import json
import random
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path
def repo_root() -> Path:
return Path(__file__).resolve().parent.parent
def load_username_claimed_pool(db_path: Path) -> list[str]:
with db_path.open(encoding="utf-8") as f:
data = json.load(f)
sites = data.get("sites") or {}
seen: set[str] = set()
pool: list[str] = []
for _name, site in sites.items():
u = (site or {}).get("usernameClaimed")
if not u or not isinstance(u, str):
continue
u = u.strip()
if not u or u in seen:
continue
seen.add(u)
pool.append(u)
return pool
def run_maigret(
*,
username: str,
db_path: Path,
out_dir: Path,
top_sites: int,
json_format: str,
quiet: bool,
) -> Path:
"""Run maigret subprocess; return path to the written JSON report."""
safe = username.replace("/", "_")
report_name = f"report_{safe}_{json_format}.json"
report_path = out_dir / report_name
cmd = [
sys.executable,
"-m",
"maigret",
username,
"--db",
str(db_path),
"--top-sites",
str(top_sites),
"--json",
json_format,
"--folderoutput",
str(out_dir),
"--no-progressbar",
"--no-color",
"--no-recursion",
"--no-extracting",
]
sink = subprocess.DEVNULL if quiet else None
proc = subprocess.run(
cmd,
cwd=str(repo_root()),
text=True,
stdout=sink,
stderr=sink,
)
if proc.returncode != 0:
raise RuntimeError(
f"maigret exited with {proc.returncode} for username {username!r}"
)
if not report_path.is_file():
raise FileNotFoundError(f"Expected report missing: {report_path}")
return report_path
def claimed_sites_from_report(path: Path, json_format: str) -> set[str]:
if json_format == "simple":
with path.open(encoding="utf-8") as f:
data = json.load(f)
if not isinstance(data, dict):
return set()
return set(data.keys())
# ndjson: one object per line, each has "sitename"
sites: set[str] = set()
with path.open(encoding="utf-8") as f:
for line in f:
line = line.strip()
if not line:
continue
obj = json.loads(line)
name = obj.get("sitename")
if isinstance(name, str) and name:
sites.add(name)
return sites
def main() -> int:
parser = argparse.ArgumentParser(
description=(
"Pick random distinct usernameClaimed values, run maigret --top-sites N "
"with JSON reports, and list sites that claimed all of them (suspicious FP)."
)
)
parser.add_argument(
"--db",
"-b",
type=Path,
default=repo_root() / "maigret" / "resources" / "data.json",
help="Path to Maigret data.json (a temp copy is used for runs).",
)
parser.add_argument(
"--top-sites",
"-n",
type=int,
default=500,
metavar="N",
help="Value for maigret --top-sites (default: 500).",
)
parser.add_argument(
"--samples",
"-k",
type=int,
default=5,
metavar="K",
help="How many distinct random usernames to draw (default: 5).",
)
parser.add_argument(
"--seed",
type=int,
default=None,
help="RNG seed for reproducible username selection.",
)
parser.add_argument(
"--json",
dest="json_format",
default="simple",
choices=["simple", "ndjson"],
help="JSON report type passed to maigret -J (default: simple).",
)
parser.add_argument(
"--verbose",
"-v",
action="store_true",
default=False,
help="Print maigret stdout/stderr (default: suppress child output).",
)
args = parser.parse_args()
quiet = not args.verbose
db_src = args.db.resolve()
if not db_src.is_file():
print(f"Database not found: {db_src}", file=sys.stderr)
return 2
pool = load_username_claimed_pool(db_src)
if len(pool) < args.samples:
print(
f"Need at least {args.samples} distinct usernameClaimed entries, "
f"found {len(pool)}.",
file=sys.stderr,
)
return 2
rng = random.Random(args.seed)
picked = rng.sample(pool, args.samples)
print(f"Database: {db_src}")
print(f"--top-sites {args.top_sites}, {args.samples} random usernameClaimed:")
for i, u in enumerate(picked, 1):
print(f" {i}. {u}")
site_sets: list[set[str]] = []
with tempfile.TemporaryDirectory(prefix="maigret_fp_probe_") as tmp:
tmp_path = Path(tmp)
db_work = tmp_path / "data.json"
shutil.copyfile(db_src, db_work)
for u in picked:
print(f"\nRunning maigret for {u!r} ...", flush=True)
report = run_maigret(
username=u,
db_path=db_work,
out_dir=tmp_path,
top_sites=args.top_sites,
json_format=args.json_format,
quiet=quiet,
)
sites = claimed_sites_from_report(report, args.json_format)
site_sets.append(sites)
print(f" -> {len(sites)} positive site(s) in JSON", flush=True)
always = set.intersection(*site_sets) if site_sets else set()
print("\n--- Sites with CLAIMED in all runs (candidates for false positives) ---")
if not always:
print("(none)")
else:
for name in sorted(always):
print(name)
return 0
if __name__ == "__main__":
raise SystemExit(main())
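The core of the probe above is a set intersection: a site that reports CLAIMED for every unrelated username is a false-positive candidate. Here is a small illustrative sketch of just that step, using made-up site names and usernames rather than real Maigret output; `always_claimed` is a hypothetical helper name.

```python
from typing import Dict, List, Set


def always_claimed(site_sets: List[Set[str]]) -> Set[str]:
    """Return the sites present in every run; empty if there were no runs."""
    if not site_sets:
        return set()
    return set.intersection(*site_sets)


# Made-up per-username results: which sites reported the handle as CLAIMED.
runs: Dict[str, Set[str]] = {
    "alice": {"SiteA", "SiteB", "SiteC"},
    "bob": {"SiteB", "SiteC"},
    "carol": {"SiteC", "SiteD"},
}

suspects = always_claimed(list(runs.values()))
print(sorted(suspects))  # ['SiteC']
```

Drawing more samples shrinks the intersection, so a larger `--samples` value should reduce the chance that a legitimately popular site is flagged by coincidence, at the cost of more scan time.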
+59
@@ -0,0 +1,59 @@
"""Generate db_meta.json from data.json for the auto-update system."""
import argparse
import hashlib
import json
import os.path as path
import sys
from datetime import datetime, timezone
RESOURCES_DIR = path.join(path.dirname(path.dirname(path.abspath(__file__))), "maigret", "resources")
DATA_JSON_PATH = path.join(RESOURCES_DIR, "data.json")
META_JSON_PATH = path.join(RESOURCES_DIR, "db_meta.json")
DEFAULT_DATA_URL = "https://raw.githubusercontent.com/soxoj/maigret/main/maigret/resources/data.json"
def get_current_version():
version_file = path.join(path.dirname(path.dirname(path.abspath(__file__))), "maigret", "__version__.py")
with open(version_file) as f:
for line in f:
if line.startswith("__version__"):
return line.split("=")[1].strip().strip("'\"")
return "0.0.0"
def main():
parser = argparse.ArgumentParser(description="Generate db_meta.json from data.json")
parser.add_argument("--min-version", default=None, help="Minimum compatible maigret version (default: current version)")
parser.add_argument("--data-url", default=DEFAULT_DATA_URL, help="URL where data.json can be downloaded")
args = parser.parse_args()
min_version = args.min_version or get_current_version()
with open(DATA_JSON_PATH, "rb") as f:
raw = f.read()
sha256 = hashlib.sha256(raw).hexdigest()
data = json.loads(raw)
sites_count = len(data.get("sites", {}))
meta = {
"version": 1,
"updated_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
"sites_count": sites_count,
"min_maigret_version": min_version,
"data_sha256": sha256,
"data_url": args.data_url,
}
with open(META_JSON_PATH, "w", encoding="utf-8") as f:
json.dump(meta, f, indent=4, ensure_ascii=False)
print(f"Generated {META_JSON_PATH}")
print(f" sites: {sites_count}")
print(f" sha256: {sha256[:16]}...")
print(f" min_version: {min_version}")
if __name__ == "__main__":
main()
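The generator above records a SHA-256 digest and a site count for `data.json`. The page does not show how the auto-update system consumes these fields, so the following is only a sketch of one plausible consumer-side check, under that assumption: `verify_db` is a hypothetical helper that recomputes the digest over the raw bytes and compares it, along with the site count, against the metadata.

```python
import hashlib
import json


def verify_db(raw: bytes, meta: dict) -> bool:
    """Return True if the raw data.json bytes match the digest and site count in meta."""
    # The digest is taken over the exact bytes, as in the generator script.
    if hashlib.sha256(raw).hexdigest() != meta.get("data_sha256"):
        return False
    sites = json.loads(raw).get("sites", {})
    return len(sites) == meta.get("sites_count")


# Illustrative data only; a real check would read data.json and db_meta.json.
raw = json.dumps({"sites": {"Example": {}}}).encode("utf-8")
meta = {
    "data_sha256": hashlib.sha256(raw).hexdigest(),
    "sites_count": 1,
}

print(verify_db(raw, meta))         # True
print(verify_db(raw + b" ", meta))  # False: any byte change alters the digest
```

Hashing the raw bytes rather than the parsed JSON means that even whitespace-only changes invalidate the digest, so the metadata has to be regenerated whenever `data.json` is rewritten.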
+808
@@ -0,0 +1,808 @@
#!/usr/bin/env python3
"""
Site check utility for Maigret development.
Quickly test site availability, find valid usernames, and diagnose check issues.
Usage:
python utils/site_check.py --site "SiteName" --check-claimed
python utils/site_check.py --site "SiteName" --maigret # Test via Maigret
python utils/site_check.py --site "SiteName" --compare-methods # aiohttp vs Maigret
python utils/site_check.py --url "https://example.com/user/{username}" --test "john"
python utils/site_check.py --site "SiteName" --find-user
python utils/site_check.py --site "SiteName" --diagnose # Full diagnosis
"""
import argparse
import asyncio
import json
import logging
import re
import sys
from pathlib import Path
from typing import Dict, List, Optional, Tuple
# Add parent dir for imports
sys.path.insert(0, str(Path(__file__).parent.parent))
try:
import aiohttp
from yarl import URL as YarlURL
except ImportError:
print("aiohttp not installed. Run: pip install aiohttp")
sys.exit(1)
# Maigret imports (optional, for --maigret mode)
MAIGRET_AVAILABLE = False
try:
from maigret.sites import MaigretDatabase, MaigretSite
from maigret.checking import (
SimpleAiohttpChecker,
check_site_for_username,
process_site_result,
make_site_result,
)
from maigret.notify import QueryNotifyPrint
from maigret.result import QueryStatus
MAIGRET_AVAILABLE = True
except ImportError:
pass
DEFAULT_HEADERS = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.5",
}
COMMON_USERNAMES = ["blue", "test", "admin", "user", "john", "alex", "david", "mike", "chris", "dan"]
class Colors:
"""ANSI color codes for terminal output."""
RED = "\033[91m"
GREEN = "\033[92m"
YELLOW = "\033[93m"
BLUE = "\033[94m"
MAGENTA = "\033[95m"
CYAN = "\033[96m"
RESET = "\033[0m"
BOLD = "\033[1m"
def color(text: str, c: str) -> str:
"""Wrap text with color codes."""
return f"{c}{text}{Colors.RESET}"
async def check_url_aiohttp(url: str, headers: dict = None, follow_redirects: bool = True,
timeout: int = 15, ssl_verify: bool = False,
method: str = "GET", payload: dict = None) -> dict:
"""Check a URL using aiohttp and return detailed response info.
Args:
method: HTTP method ("GET" or "POST").
payload: JSON payload for POST requests (dict, will be serialized).
"""
headers = headers or DEFAULT_HEADERS.copy()
result = {
"method": "aiohttp",
"url": url,
"status": None,
"final_url": None,
"redirects": [],
"content_length": 0,
"content": None,
"title": None,
"error": None,
"error_type": None,
"markers": {},
}
try:
connector = aiohttp.TCPConnector(ssl=ssl_verify)
timeout_obj = aiohttp.ClientTimeout(total=timeout)
async with aiohttp.ClientSession(connector=connector, timeout=timeout_obj) as session:
# Use encoded=True if URL contains percent-encoded chars to prevent double-encoding
request_url = YarlURL(url, encoded=True) if '%' in url else url
request_kwargs = dict(headers=headers, allow_redirects=follow_redirects)
if method.upper() == "POST" and payload is not None:
request_kwargs["json"] = payload
request_fn = session.post if method.upper() == "POST" else session.get
async with request_fn(request_url, **request_kwargs) as resp:
result["status"] = resp.status
result["final_url"] = str(resp.url)
# Get redirect history
if resp.history:
result["redirects"] = [str(r.url) for r in resp.history]
# Read content
try:
text = await resp.text()
result["content_length"] = len(text)
result["content"] = text
# Extract title
title_match = re.search(r'<title>([^<]*)</title>', text, re.IGNORECASE)
if title_match:
result["title"] = title_match.group(1).strip()[:100]
# Check common markers
text_lower = text.lower()
markers = {
"404_text": any(m in text_lower for m in ["not found", "404", "doesn't exist", "does not exist"]),
"profile_markers": any(m in text_lower for m in ["profile", "user", "member", "account"]),
"error_markers": any(m in text_lower for m in ["error", "banned", "suspended", "blocked"]),
"login_required": any(m in text_lower for m in ["log in", "login", "sign in", "signin"]),
"captcha": any(m in text_lower for m in ["captcha", "recaptcha", "challenge", "verify you"]),
"cloudflare": "cloudflare" in text_lower or "cf-ray" in text_lower,
"rate_limit": any(m in text_lower for m in ["rate limit", "too many requests", "429"]),
}
result["markers"] = markers
# First 500 chars of body for inspection
result["body_preview"] = text[:500].replace("\n", " ").strip()
except Exception as e:
result["error"] = f"Content read error: {e}"
result["error_type"] = "content_error"
except asyncio.TimeoutError:
result["error"] = "Timeout"
result["error_type"] = "timeout"
except aiohttp.ClientError as e:
result["error"] = f"Client error: {e}"
result["error_type"] = "client_error"
except Exception as e:
result["error"] = f"Error: {e}"
result["error_type"] = "unknown"
return result
async def check_url_maigret(site: 'MaigretSite', username: str, logger=None) -> dict:
"""Check a URL using Maigret's checking mechanism."""
if not MAIGRET_AVAILABLE:
return {"error": "Maigret not available", "method": "maigret"}
if logger is None:
logger = logging.getLogger("site_check")
logger.setLevel(logging.WARNING)
result = {
"method": "maigret",
"url": None,
"status": None,
"status_str": None,
"http_status": None,
"final_url": None,
"error": None,
"error_type": None,
"ids_data": None,
}
try:
# Create query options
options = {
"parsing": False,
"cookie_jar": None,
"timeout": 15,
}
# Create a simple notifier
class SilentNotify:
def start(self, msg=None): pass
def update(self, status, similar=False): pass
def finish(self, msg=None, status=None): pass
notifier = SilentNotify()
# Run the check
site_name, site_result = await check_site_for_username(
site, username, options, logger, notifier
)
result["url"] = site_result.get("url_user")
result["status"] = site_result.get("status")
result["status_str"] = str(site_result.get("status"))
result["http_status"] = site_result.get("http_status")
result["ids_data"] = site_result.get("ids_data")
# Check for errors
status = site_result.get("status")
if status and hasattr(status, 'error') and status.error:
result["error"] = f"{status.error.type}: {status.error.desc}"
result["error_type"] = str(status.error.type)
except Exception as e:
result["error"] = str(e)
result["error_type"] = "exception"
return result
async def find_valid_username(url_template: str, usernames: list = None, headers: dict = None) -> Optional[str]:
"""Try common usernames to find one that works."""
usernames = usernames or COMMON_USERNAMES
headers = headers or DEFAULT_HEADERS.copy()
print(f"Testing {len(usernames)} usernames on {url_template}...")
for username in usernames:
url = url_template.replace("{username}", username)
result = await check_url_aiohttp(url, headers)
status = result["status"]
markers = result.get("markers", {})
# Good signs: 200 status, profile markers, no 404 text
if status == 200 and not markers.get("404_text") and markers.get("profile_markers"):
print(f" {color('[+]', Colors.GREEN)} {username}: status={status}, has profile markers")
return username
elif status == 200 and not markers.get("404_text"):
print(f" {color('[?]', Colors.YELLOW)} {username}: status={status}, might work")
else:
print(f" {color('[-]', Colors.RED)} {username}: status={status}")
return None
async def compare_users_aiohttp(url_template: str, claimed: str, unclaimed: str = "noonewouldeverusethis7",
headers: dict = None) -> Tuple[dict, dict]:
"""Compare responses for claimed vs unclaimed usernames using aiohttp."""
headers = headers or DEFAULT_HEADERS.copy()
print(f"\n{'='*60}")
print(f"Comparing: {color(claimed, Colors.GREEN)} vs {color(unclaimed, Colors.RED)}")
print(f"URL template: {url_template}")
print(f"Method: aiohttp")
print(f"{'='*60}\n")
url_claimed = url_template.replace("{username}", claimed)
url_unclaimed = url_template.replace("{username}", unclaimed)
result_claimed, result_unclaimed = await asyncio.gather(
check_url_aiohttp(url_claimed, headers),
check_url_aiohttp(url_unclaimed, headers)
)
def print_result(name, r, c):
print(f"--- {color(name, c)} ---")
print(f" URL: {r['url']}")
print(f" Status: {color(str(r['status']), Colors.GREEN if r['status'] == 200 else Colors.RED)}")
if r["redirects"]:
print(f" Redirects: {' -> '.join(r['redirects'])} -> {r['final_url']}")
print(f" Final URL: {r['final_url']}")
print(f" Content length: {r['content_length']}")
print(f" Title: {r['title']}")
if r["error"]:
print(f" Error: {color(r['error'], Colors.RED)}")
print(f" Markers: {r['markers']}")
print()
print_result(f"CLAIMED ({claimed})", result_claimed, Colors.GREEN)
print_result(f"UNCLAIMED ({unclaimed})", result_unclaimed, Colors.RED)
# Analysis
print(f"--- {color('ANALYSIS', Colors.CYAN)} ---")
recommendations = []
if result_claimed["status"] != result_unclaimed["status"]:
print(f" [!] Status codes differ: {result_claimed['status']} vs {result_unclaimed['status']}")
recommendations.append(("status_code", f"Status codes: {result_claimed['status']} vs {result_unclaimed['status']}"))
if result_claimed["final_url"] != result_unclaimed["final_url"]:
print(f" [!] Final URLs differ")
recommendations.append(("response_url", "Final URLs differ"))
if result_claimed["content_length"] != result_unclaimed["content_length"]:
diff = abs(result_claimed["content_length"] - result_unclaimed["content_length"])
print(f" [!] Content length differs by {diff} bytes")
recommendations.append(("message", f"Content differs by {diff} bytes"))
if result_claimed["title"] != result_unclaimed["title"]:
print(f" [!] Titles differ:")
print(f" Claimed: {result_claimed['title']}")
print(f" Unclaimed: {result_unclaimed['title']}")
recommendations.append(("message", f"Titles differ: '{result_claimed['title']}' vs '{result_unclaimed['title']}'"))
# Check for problems
if result_claimed.get("markers", {}).get("captcha"):
print(f" {color('[WARN]', Colors.YELLOW)} Captcha detected on claimed page")
if result_claimed.get("markers", {}).get("cloudflare"):
print(f" {color('[WARN]', Colors.YELLOW)} Cloudflare protection detected")
if result_claimed.get("markers", {}).get("login_required"):
print(f" {color('[WARN]', Colors.YELLOW)} Login may be required")
if recommendations:
print(f"\n {color('Recommended checkType:', Colors.BOLD)} {recommendations[0][0]}")
else:
print(f" {color('[!]', Colors.RED)} No clear difference found - site may need special handling")
return result_claimed, result_unclaimed
async def compare_methods(site: 'MaigretSite', claimed: str, unclaimed: str) -> dict:
"""Compare aiohttp vs Maigret results for the same site."""
if not MAIGRET_AVAILABLE:
print(color("Maigret not available for comparison", Colors.RED))
return {}
print(f"\n{'='*60}")
print(f"{color('METHOD COMPARISON', Colors.CYAN)}: aiohttp vs Maigret")
print(f"Site: {site.name}")
print(f"Claimed: {claimed}, Unclaimed: {unclaimed}")
print(f"{'='*60}\n")
# Build URL template
url_template = site.url
url_template = url_template.replace("{urlMain}", site.url_main or "")
url_template = url_template.replace("{urlSubpath}", getattr(site, 'url_subpath', '') or "")
headers = DEFAULT_HEADERS.copy()
if hasattr(site, 'headers') and site.headers:
headers.update(site.headers)
# Run all checks in parallel
url_claimed = url_template.replace("{username}", claimed)
url_unclaimed = url_template.replace("{username}", unclaimed)
aiohttp_claimed, aiohttp_unclaimed, maigret_claimed, maigret_unclaimed = await asyncio.gather(
check_url_aiohttp(url_claimed, headers),
check_url_aiohttp(url_unclaimed, headers),
check_url_maigret(site, claimed),
check_url_maigret(site, unclaimed),
)
def status_icon(status):
if status == 200:
return color("200", Colors.GREEN)
elif status == 404:
return color("404", Colors.YELLOW)
elif status and status >= 400:
return color(str(status), Colors.RED)
return str(status)
def maigret_status_icon(status_str):
if "Claimed" in str(status_str):
return color("Claimed", Colors.GREEN)
elif "Available" in str(status_str):
return color("Available", Colors.YELLOW)
else:
return color(str(status_str), Colors.RED)
print(f"{'Method':<12} {'Username':<25} {'HTTP Status':<12} {'Result':<20}")
print("-" * 70)
print(f"{'aiohttp':<12} {claimed:<25} {status_icon(aiohttp_claimed['status']):<20} {'OK' if not aiohttp_claimed['error'] else aiohttp_claimed['error'][:20]}")
print(f"{'aiohttp':<12} {unclaimed:<25} {status_icon(aiohttp_unclaimed['status']):<20} {'OK' if not aiohttp_unclaimed['error'] else aiohttp_unclaimed['error'][:20]}")
print(f"{'Maigret':<12} {claimed:<25} {status_icon(maigret_claimed.get('http_status')):<20} {maigret_status_icon(maigret_claimed.get('status_str'))}")
print(f"{'Maigret':<12} {unclaimed:<25} {status_icon(maigret_unclaimed.get('http_status')):<20} {maigret_status_icon(maigret_unclaimed.get('status_str'))}")
# Check for discrepancies
print(f"\n--- {color('DISCREPANCY ANALYSIS', Colors.CYAN)} ---")
issues = []
if aiohttp_claimed['status'] != maigret_claimed.get('http_status'):
issues.append(f"HTTP status mismatch for claimed: aiohttp={aiohttp_claimed['status']}, Maigret={maigret_claimed.get('http_status')}")
if aiohttp_unclaimed['status'] != maigret_unclaimed.get('http_status'):
issues.append(f"HTTP status mismatch for unclaimed: aiohttp={aiohttp_unclaimed['status']}, Maigret={maigret_unclaimed.get('http_status')}")
# Check Maigret detection correctness
claimed_detected = "Claimed" in str(maigret_claimed.get('status_str', ''))
unclaimed_detected = "Available" in str(maigret_unclaimed.get('status_str', ''))
if not claimed_detected:
issues.append(f"Maigret did NOT detect claimed user '{claimed}' as Claimed")
if not unclaimed_detected:
issues.append(f"Maigret did NOT detect unclaimed user '{unclaimed}' as Available")
if issues:
for issue in issues:
print(f" {color('[!]', Colors.RED)} {issue}")
else:
print(f" {color('[OK]', Colors.GREEN)} Both methods agree on results")
return {
"aiohttp_claimed": aiohttp_claimed,
"aiohttp_unclaimed": aiohttp_unclaimed,
"maigret_claimed": maigret_claimed,
"maigret_unclaimed": maigret_unclaimed,
"issues": issues,
}
async def diagnose_site(site_config: dict, site_name: str) -> dict:
"""Full diagnosis of a site configuration."""
print(f"\n{'='*60}")
print(f"{color('FULL SITE DIAGNOSIS', Colors.CYAN)}: {site_name}")
print(f"{'='*60}\n")
diagnosis = {
"site_name": site_name,
"issues": [],
"warnings": [],
"recommendations": [],
"working": False,
}
# 1. Config analysis
print(f"--- {color('1. CONFIGURATION', Colors.BOLD)} ---")
check_type = site_config.get("checkType", "status_code")
url = site_config.get("url", "")
url_main = site_config.get("urlMain", "")
claimed = site_config.get("usernameClaimed")
unclaimed = site_config.get("usernameUnclaimed", "noonewouldeverusethis7")
disabled = site_config.get("disabled", False)
print(f" checkType: {check_type}")
print(f" URL: {url}")
print(f" urlMain: {url_main}")
print(f" usernameClaimed: {claimed}")
print(f" disabled: {disabled}")
if disabled:
diagnosis["issues"].append("Site is disabled")
print(f" {color('[!]', Colors.YELLOW)} Site is disabled")
if not claimed:
diagnosis["issues"].append("No usernameClaimed defined")
print(f" {color('[!]', Colors.RED)} No usernameClaimed defined")
return diagnosis
# Build full URL (display URL)
url_template = url.replace("{urlMain}", url_main).replace("{urlSubpath}", site_config.get("urlSubpath", ""))
# Build probe URL (what Maigret actually requests)
url_probe = site_config.get("urlProbe", "")
if url_probe:
probe_template = url_probe.replace("{urlMain}", url_main).replace("{urlSubpath}", site_config.get("urlSubpath", ""))
else:
probe_template = url_template
# Detect request method and payload
request_method = site_config.get("requestMethod", "GET").upper()
request_payload_template = site_config.get("requestPayload")
headers = DEFAULT_HEADERS.copy()
# For API probes (urlProbe, POST), use neutral Accept header instead of text/html
# which can cause servers to return HTML instead of JSON
if url_probe or request_method == "POST":
headers["Accept"] = "*/*"
if site_config.get("headers"):
headers.update(site_config["headers"])
if url_probe:
print(f" urlProbe: {url_probe}")
if request_method != "GET":
print(f" requestMethod: {request_method}")
if request_payload_template:
print(f" requestPayload: {request_payload_template}")
# 2. Connectivity test
print(f"\n--- {color('2. CONNECTIVITY TEST', Colors.BOLD)} ---")
probe_claimed = probe_template.replace("{username}", claimed)
probe_unclaimed = probe_template.replace("{username}", unclaimed)
# Build payloads with username substituted
payload_claimed = None
payload_unclaimed = None
if request_payload_template and request_method == "POST":
payload_claimed = json.loads(
json.dumps(request_payload_template).replace("{username}", claimed)
)
payload_unclaimed = json.loads(
json.dumps(request_payload_template).replace("{username}", unclaimed)
)
result_claimed, result_unclaimed = await asyncio.gather(
check_url_aiohttp(probe_claimed, headers, method=request_method, payload=payload_claimed),
check_url_aiohttp(probe_unclaimed, headers, method=request_method, payload=payload_unclaimed)
)
print(f" Claimed ({claimed}): status={result_claimed['status']}, error={result_claimed['error']}")
print(f" Unclaimed ({unclaimed}): status={result_unclaimed['status']}, error={result_unclaimed['error']}")
# Check for common problems
if result_claimed["error_type"] == "timeout":
diagnosis["issues"].append("Timeout on claimed username")
if result_unclaimed["error_type"] == "timeout":
diagnosis["issues"].append("Timeout on unclaimed username")
if result_claimed.get("markers", {}).get("cloudflare"):
diagnosis["warnings"].append("Cloudflare protection detected")
if result_claimed.get("markers", {}).get("captcha"):
diagnosis["warnings"].append("Captcha detected")
if result_claimed["status"] == 403:
diagnosis["issues"].append("403 Forbidden - possible anti-bot protection")
if result_claimed["status"] == 429:
diagnosis["issues"].append("429 Rate Limited")
# 3. Check type validation
print(f"\n--- {color('3. CHECK TYPE VALIDATION', Colors.BOLD)} ---")
if check_type == "status_code":
if result_claimed["status"] == result_unclaimed["status"]:
diagnosis["issues"].append(f"status_code check but same status ({result_claimed['status']}) for both")
print(f" {color('[FAIL]', Colors.RED)} Same status code for claimed and unclaimed: {result_claimed['status']}")
else:
print(f" {color('[OK]', Colors.GREEN)} Status codes differ: {result_claimed['status']} vs {result_unclaimed['status']}")
diagnosis["working"] = True
elif check_type == "response_url":
if result_claimed["final_url"] == result_unclaimed["final_url"]:
diagnosis["issues"].append("response_url check but same final URL for both")
print(f" {color('[FAIL]', Colors.RED)} Same final URL for both")
else:
print(f" {color('[OK]', Colors.GREEN)} Final URLs differ")
diagnosis["working"] = True
elif check_type == "message":
presense_strs = site_config.get("presenseStrs", [])
absence_strs = site_config.get("absenceStrs", [])
print(f" presenseStrs: {presense_strs}")
print(f" absenceStrs: {absence_strs}")
claimed_content = result_claimed.get("content", "") or ""
unclaimed_content = result_unclaimed.get("content", "") or ""
# Check presenseStrs
presense_found_claimed = any(s in claimed_content for s in presense_strs) if presense_strs else True
presense_found_unclaimed = any(s in unclaimed_content for s in presense_strs) if presense_strs else True
# Check absenceStrs
absence_found_claimed = any(s in claimed_content for s in absence_strs) if absence_strs else False
absence_found_unclaimed = any(s in unclaimed_content for s in absence_strs) if absence_strs else False
print(f" Claimed - presenseStrs found: {presense_found_claimed}, absenceStrs found: {absence_found_claimed}")
print(f" Unclaimed - presenseStrs found: {presense_found_unclaimed}, absenceStrs found: {absence_found_unclaimed}")
if presense_strs and not presense_found_claimed:
diagnosis["issues"].append(f"presenseStrs {presense_strs} not found in claimed page")
print(f" {color('[FAIL]', Colors.RED)} presenseStrs not found in claimed page")
if absence_strs and absence_found_claimed:
diagnosis["issues"].append(f"absenceStrs {absence_strs} found in claimed page (should not be)")
print(f" {color('[FAIL]', Colors.RED)} absenceStrs found in claimed page")
if absence_strs and not absence_found_unclaimed:
diagnosis["warnings"].append("absenceStrs not found in unclaimed page")
print(f" {color('[WARN]', Colors.YELLOW)} absenceStrs not found in unclaimed page")
# Check works if: claimed is detected as present AND unclaimed is detected as absent.
# Presence detection: presenseStrs found (or empty = always true).
# Absence detection: absenceStrs found in unclaimed (or empty = never, rely on presenseStrs only).
# With only presenseStrs: works if found in claimed but NOT in unclaimed.
# With only absenceStrs: works if found in unclaimed but NOT in claimed.
# With both: standard combination.
claimed_is_present = presense_found_claimed and not absence_found_claimed
unclaimed_is_absent = (
(absence_strs and absence_found_unclaimed) or
(presense_strs and not presense_found_unclaimed)
)
if claimed_is_present and unclaimed_is_absent:
print(f" {color('[OK]', Colors.GREEN)} Message check should work correctly")
diagnosis["working"] = True
# 4. Recommendations
print(f"\n--- {color('4. RECOMMENDATIONS', Colors.BOLD)} ---")
if not diagnosis["working"]:
# Suggest alternatives
if result_claimed["status"] != result_unclaimed["status"]:
diagnosis["recommendations"].append(f"Switch to checkType: status_code (status {result_claimed['status']} vs {result_unclaimed['status']})")
if result_claimed["final_url"] != result_unclaimed["final_url"]:
diagnosis["recommendations"].append("Switch to checkType: response_url")
if result_claimed["title"] != result_unclaimed["title"]:
diagnosis["recommendations"].append(f"Use title as marker: presenseStrs=['{result_claimed['title']}'] or absenceStrs=['{result_unclaimed['title']}']")
if diagnosis["recommendations"]:
for rec in diagnosis["recommendations"]:
print(f" -> {rec}")
elif diagnosis["working"]:
print(f" {color('Site appears to be working correctly', Colors.GREEN)}")
else:
print(f" {color('No clear fix found - site may need special handling or should be disabled', Colors.RED)}")
# Summary
print(f"\n--- {color('SUMMARY', Colors.BOLD)} ---")
if diagnosis["issues"]:
print(f" Issues: {len(diagnosis['issues'])}")
for issue in diagnosis["issues"]:
print(f" - {issue}")
if diagnosis["warnings"]:
print(f" Warnings: {len(diagnosis['warnings'])}")
for warn in diagnosis["warnings"]:
print(f" - {warn}")
print(f" Working: {color('YES', Colors.GREEN) if diagnosis['working'] else color('NO', Colors.RED)}")
return diagnosis
def load_site_from_db(site_name: str) -> Tuple[Optional[dict], Optional['MaigretSite']]:
"""Load site config from data.json. Returns (config_dict, MaigretSite or None)."""
db_path = Path(__file__).parent.parent / "maigret" / "resources" / "data.json"
with open(db_path, encoding="utf-8") as f:
data = json.load(f)
config = None
if site_name in data["sites"]:
config = data["sites"][site_name]
else:
# Try case-insensitive search
for name, cfg in data["sites"].items():
if name.lower() == site_name.lower():
config = cfg
site_name = name
break
if not config:
return None, None
# Also load MaigretSite if available
maigret_site = None
if MAIGRET_AVAILABLE:
try:
db = MaigretDatabase().load_from_path(db_path)
maigret_site = db.sites_dict.get(site_name)
except Exception:
pass
return config, maigret_site
async def main():
parser = argparse.ArgumentParser(
description="Site check utility for Maigret development",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
%(prog)s --site "VK" --check-claimed # Test site with aiohttp
%(prog)s --site "VK" --maigret # Test site with Maigret
%(prog)s --site "VK" --compare-methods # Compare aiohttp vs Maigret
%(prog)s --site "VK" --diagnose # Full diagnosis
%(prog)s --url "https://vk.com/{username}" --compare blue nobody123
%(prog)s --site "VK" --find-user # Find a valid username
"""
)
parser.add_argument("--site", "-s", help="Site name from data.json")
parser.add_argument("--url", "-u", help="URL template with {username}")
parser.add_argument("--test", "-t", help="Username to test")
parser.add_argument("--compare", "-c", nargs=2, metavar=("CLAIMED", "UNCLAIMED"),
help="Compare two usernames")
parser.add_argument("--find-user", "-f", action="store_true",
help="Find a valid username")
parser.add_argument("--check-claimed", action="store_true",
help="Check if claimed username still works (aiohttp)")
parser.add_argument("--maigret", "-m", action="store_true",
help="Test using Maigret's checker instead of aiohttp")
parser.add_argument("--compare-methods", action="store_true",
help="Compare aiohttp vs Maigret results")
parser.add_argument("--diagnose", "-d", action="store_true",
help="Full diagnosis of site configuration")
parser.add_argument("--headers", help="Custom headers as JSON")
parser.add_argument("--timeout", type=int, default=15, help="Request timeout in seconds")
parser.add_argument("--json", action="store_true", help="Output results as JSON")
args = parser.parse_args()
url_template = None
claimed = None
unclaimed = "noonewouldeverusethis7"
headers = DEFAULT_HEADERS.copy()
site_config = None
maigret_site = None
# Load from site name
if args.site:
site_config, maigret_site = load_site_from_db(args.site)
if not site_config:
print(f"Site '{args.site}' not found in database")
sys.exit(1)
url_template = site_config.get("url", "")
url_main = site_config.get("urlMain", "")
url_subpath = site_config.get("urlSubpath", "")
url_template = url_template.replace("{urlMain}", url_main).replace("{urlSubpath}", url_subpath)
claimed = site_config.get("usernameClaimed")
unclaimed = site_config.get("usernameUnclaimed", unclaimed)
if site_config.get("headers"):
headers.update(site_config["headers"])
if not args.json:
print(f"Loaded site: {args.site}")
print(f" URL: {url_template}")
print(f" Claimed: {claimed}")
print(f" CheckType: {site_config.get('checkType', 'unknown')}")
print(f" Disabled: {site_config.get('disabled', False)}")
# Override with explicit URL
if args.url:
url_template = args.url
# Custom headers
if args.headers:
headers.update(json.loads(args.headers))
# Actions
if args.diagnose:
if not site_config:
print("--diagnose requires --site")
sys.exit(1)
result = await diagnose_site(site_config, args.site)
if args.json:
print(json.dumps(result, indent=2, default=str))
elif args.compare_methods:
if not maigret_site:
if not MAIGRET_AVAILABLE:
print("Maigret imports not available")
else:
print("Could not load MaigretSite object")
sys.exit(1)
result = await compare_methods(maigret_site, claimed, unclaimed)
if args.json:
print(json.dumps(result, indent=2, default=str))
elif args.maigret:
if not maigret_site:
if not MAIGRET_AVAILABLE:
print("Maigret imports not available")
else:
print("Could not load MaigretSite object")
sys.exit(1)
print("\n--- Testing with Maigret ---")
for username in [claimed, unclaimed]:
result = await check_url_maigret(maigret_site, username)
print(f" {username}: status={result.get('status_str')}, http={result.get('http_status')}, error={result.get('error')}")
elif args.find_user:
if not url_template:
print("--find-user requires --site or --url")
sys.exit(1)
result = await find_valid_username(url_template, headers=headers)
if result:
print(f"\n{color('Found valid username:', Colors.GREEN)} {result}")
else:
print(f"\n{color('No valid username found', Colors.RED)}")
elif args.compare:
if not url_template:
print("--compare requires --site or --url")
sys.exit(1)
result = await compare_users_aiohttp(url_template, args.compare[0], args.compare[1], headers)
if args.json:
# Remove content field for JSON output (too large)
for r in result:
if isinstance(r, dict) and "content" in r:
del r["content"]
print(json.dumps(result, indent=2, default=str))
elif args.check_claimed and claimed:
result = await compare_users_aiohttp(url_template, claimed, unclaimed, headers)
elif args.test:
if not url_template:
print("--test requires --site or --url")
sys.exit(1)
url = url_template.replace("{username}", args.test)
result = await check_url_aiohttp(url, headers, timeout=args.timeout)
if "content" in result:
del result["content"] # Too large for display
print(json.dumps(result, indent=2, default=str))
else:
# Default: check claimed username if available
if url_template and claimed:
await compare_users_aiohttp(url_template, claimed, unclaimed, headers)
else:
parser.print_help()
if __name__ == "__main__":
asyncio.run(main())
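The message-check validation in `diagnose_site` can be hard to follow inline. The following is a minimal sketch of the same decision rule as a pure function, assuming the page contents have already been fetched. The function name `message_check_works` and its parameters are hypothetical and not part of the utility.

```python
from typing import Sequence


def message_check_works(
    claimed_content: str,
    unclaimed_content: str,
    presense_strs: Sequence[str] = (),
    absence_strs: Sequence[str] = (),
) -> bool:
    """Return True if the markers separate the claimed page from the unclaimed one."""
    # Empty presenseStrs counts as "always present"; empty absenceStrs as "never absent".
    presence_claimed = any(s in claimed_content for s in presense_strs) if presense_strs else True
    presence_unclaimed = any(s in unclaimed_content for s in presense_strs) if presense_strs else True
    absence_claimed = any(s in claimed_content for s in absence_strs)
    absence_unclaimed = any(s in unclaimed_content for s in absence_strs)

    # The claimed page must look present, and the unclaimed page must look absent
    # through at least one of the two marker lists.
    claimed_is_present = presence_claimed and not absence_claimed
    unclaimed_is_absent = bool(
        (absence_strs and absence_unclaimed)
        or (presense_strs and not presence_unclaimed)
    )
    return claimed_is_present and unclaimed_is_absent
```

With no markers configured the function returns `False`, which matches the inline logic: nothing distinguishes the two pages.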
+134 -39
@@ -4,6 +4,7 @@ This module generates the listing of supported sites in file `SITES.md`
and pretty prints file with sites data.
"""
import sys
import socket
import requests
import logging
import threading
@@ -24,36 +25,87 @@ RANKS.update({
'100000000': '100M',
})
SEMAPHORE = threading.Semaphore(20)
def get_rank(domain_to_query, site, print_errors=True):
with SEMAPHORE:
# Retrieve ranking data via alexa API
url = f"http://data.alexa.com/data?cli=10&url={domain_to_query}"
xml_data = requests.get(url).text
root = ET.fromstring(xml_data)
import csv
import io
from urllib.parse import urlparse
try:
#Get ranking for this site.
site.alexa_rank = int(root.find('.//REACH').attrib['RANK'])
# country = root.find('.//COUNTRY')
# if not country is None and country.attrib:
# country_code = country.attrib['CODE']
# tags = set(site.tags)
# if country_code:
# tags.add(country_code.lower())
# site.tags = sorted(list(tags))
# if site.type != 'username':
# site.disabled = False
except Exception as e:
if print_errors:
logging.error(e)
# We did not find the rank for some reason.
print(f"Error retrieving rank information for '{domain_to_query}'")
print(f" Returned XML is |{xml_data}|")
def fetch_majestic_million():
print("Fetching Majestic Million CSV (this may take a few seconds)...")
ranks = {}
url = "https://downloads.majestic.com/majestic_million.csv"
try:
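response = requests.get(url, timeout=60)  # response.text reads the full body, so stream=True had no effect; the 60 s timeout is an arbitrary choice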
response.raise_for_status()
csv_file = io.StringIO(response.text)
reader = csv.reader(csv_file)
next(reader) # skip headers
for row in reader:
if not row or len(row) < 3:
continue
rank = int(row[0])
domain = row[2].lower()
ranks[domain] = rank
except Exception as e:
logging.error(f"Error fetching Majestic Million: {e}")
print(f"Loaded {len(ranks)} domains from Majestic Million.")
return ranks
return
def get_base_domain(url):
try:
netloc = urlparse(url).netloc
if netloc.startswith('www.'):
netloc = netloc[4:]
return netloc.lower()
except Exception:
return ""
def check_dns(domain, timeout=5):
"""Check if a domain resolves via DNS. Returns True if it resolves."""
try:
# Note: the default timeout applies to new socket objects; it does not bound getaddrinfo() itself.
socket.setdefaulttimeout(timeout)
socket.getaddrinfo(domain, None)
return True
except (socket.gaierror, socket.timeout, OSError):
return False
def check_sites_dns(sites):
"""Check DNS resolution for all sites. Returns a set of site names that failed."""
SKIP_TLDS = ('.onion', '.i2p')
domains = {}
for site in sites:
domain = get_base_domain(site.url_main)
if domain and not any(domain.endswith(tld) for tld in SKIP_TLDS):
domains.setdefault(domain, []).append(site)
failed_sites = set()
results = {}
def resolve(domain):
results[domain] = check_dns(domain)
threads = []
for domain in domains:
t = threading.Thread(target=resolve, args=(domain,))
threads.append(t)
t.start()
for t in threads:
t.join()
for domain, resolved in results.items():
if not resolved:
for site in domains[domain]:
failed_sites.add(site.name)
logging.warning(f"DNS resolution failed for {domain}")
return failed_sites
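`check_sites_dns` starts one thread per unique domain. A possible alternative, sketched here under the assumption that a bounded pool is acceptable, is `concurrent.futures.ThreadPoolExecutor`. The names `resolves` and `check_domains` and the `max_workers` default are hypothetical; the resolver is injectable so the logic can be exercised without network access.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def resolves(domain: str) -> bool:
    """Return True if the domain resolves via DNS."""
    try:
        socket.getaddrinfo(domain, None)
        return True
    except OSError:  # socket.gaierror and socket.timeout are OSError subclasses
        return False


def check_domains(
    domains: Iterable[str],
    resolver: Callable[[str], bool] = resolves,
    max_workers: int = 20,
) -> Dict[str, bool]:
    """Resolve each unique domain once, using at most max_workers threads."""
    unique = sorted(set(domains))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each domain with its result.
        return dict(zip(unique, pool.map(resolver, unique)))
```

Example with a stub resolver: `check_domains(["a.com", "b.invalid"], resolver=lambda d: not d.endswith(".invalid"))`.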
def get_step_rank(rank):
@@ -78,6 +130,8 @@ def main():
parser.add_argument('--empty-only', help='update only sites without rating', action='store_true')
parser.add_argument('--exclude-engine', help='do not update score with certain engine',
action="append", dest="exclude_engine_list", default=[])
parser.add_argument('--dns-check', help='disable sites whose domains do not resolve via DNS',
action='store_true')
pool = list()
@@ -91,30 +145,51 @@ def main():
with open("sites.md", "w") as site_file:
site_file.write(f"""
## List of supported sites (search methods): total {len(sites_subset)}\n
Rank data fetched from Alexa by domains.
Rank data fetched from Majestic Million by domains.
""")
if args.dns_check:
print("Checking DNS resolution for all site domains...")
failed = check_sites_dns(sites_subset)
disabled_count = 0
re_enabled_count = 0
for site in sites_subset:
if site.name in failed:
if not site.disabled:
site.disabled = True
disabled_count += 1
print(f" Disabled {site.name}: DNS does not resolve ({get_base_domain(site.url_main)})")
else:
if site.disabled:
# Re-enable previously disabled site if DNS now resolves
# (only if it was likely disabled due to DNS failure)
pass
print(f"DNS check complete: {disabled_count} site(s) disabled, {len(failed)} site(s) with unresolvable domains.")
majestic_ranks = {}
if args.with_rank:
majestic_ranks = fetch_majestic_million()
for site in sites_subset:
if not args.with_rank:
break
url_main = site.url_main
if site.alexa_rank < sys.maxsize and args.empty_only:
continue
if args.exclude_engine_list and site.engine in args.exclude_engine_list:
continue
site.alexa_rank = 0
th = threading.Thread(target=get_rank, args=(url_main, site,))
pool.append((site.name, url_main, th))
th.start()
domain = get_base_domain(site.url_main)
if domain in majestic_ranks:
site.alexa_rank = majestic_ranks[domain]
else:
site.alexa_rank = sys.maxsize
# In-memory matching complete; no threads to join
if args.with_rank:
index = 1
for site_name, url_main, th in pool:
th.join()
sys.stdout.write("\r{0}".format(f"Updated {index} out of {len(sites_subset)} entries"))
sys.stdout.flush()
index = index + 1
print("Successfully updated ranks matching Majestic Million dataset.")
sites_full_list = [(s, int(s.alexa_rank)) for s in sites_subset]
@@ -142,6 +217,26 @@ Rank data fetched from Alexa by domains.
site_file.write(f'\nThe list was updated at ({datetime.now(timezone.utc).date()})\n')
db.save_to_file(args.base_file)
# Regenerate db_meta.json to stay in sync with data.json
try:
import hashlib, json, os
db_data_raw = open(args.base_file, 'rb').read()
db_data_parsed = json.loads(db_data_raw)
meta = {
"version": 1,
"updated_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
"sites_count": len(db_data_parsed.get("sites", {})),
"min_maigret_version": "0.5.0",
"data_sha256": hashlib.sha256(db_data_raw).hexdigest(),
"data_url": "https://raw.githubusercontent.com/soxoj/maigret/main/maigret/resources/data.json",
}
meta_path = os.path.join(os.path.dirname(args.base_file), "db_meta.json")
with open(meta_path, "w", encoding="utf-8") as mf:
json.dump(meta, mf, indent=4, ensure_ascii=False)
print(f"Updated {meta_path} ({meta['sites_count']} sites)")
except Exception as e:
print(f"Warning: could not regenerate db_meta.json: {e}")
statistics_text = db.get_db_stats(is_markdown=True)
site_file.write('## Statistics\n\n')
site_file.write(statistics_text)
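The rank update above replaces per-site HTTP requests with a single dictionary lookup. The following is a minimal sketch of that lookup on an in-memory CSV, assuming the same column positions the script reads (rank in column 0, domain in column 2). `SAMPLE_CSV` is made-up illustration data, and the helper names `parse_ranks`, `base_domain` and `rank_for` are hypothetical.

```python
import csv
import io
import sys
from urllib.parse import urlparse

# Made-up sample rows; the real file is downloaded by fetch_majestic_million().
SAMPLE_CSV = """GlobalRank,TldRank,Domain
1,1,example.com
2,1,example.org
"""


def parse_ranks(csv_text: str) -> dict:
    """Map lowercase domain -> rank, skipping the header and short rows."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row if present
    ranks = {}
    for row in reader:
        if len(row) < 3:
            continue
        ranks[row[2].lower()] = int(row[0])
    return ranks


def base_domain(url: str) -> str:
    """Return the host part of a URL, lowercased and without a leading 'www.'."""
    netloc = urlparse(url).netloc.lower()
    return netloc[4:] if netloc.startswith("www.") else netloc


def rank_for(url: str, ranks: dict) -> int:
    """Look up the rank for a site URL; unknown domains get sys.maxsize."""
    return ranks.get(base_domain(url), sys.maxsize)
```

The match is exact on the host name, so a site served from a subdomain other than `www.` falls through to `sys.maxsize` unless that subdomain itself is listed.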