Compare commits

..

726 Commits

Author SHA1 Message Date
copilot-swe-agent[bot] b59179b4d5 Rebase: squash branch onto main with single Virgool data.json change 2026-03-24 22:01:31 +00:00
copilot-swe-agent[bot] a2142634e1 fix(virgool): enable virgool.io via POST user-existence API
Use existing POST support from main to enable virgool.io:
- checkType: message with presenseStrs for user_exist:true
- urlProbe: POST /api/v1.4/auth/user-existence
- requestMethod: POST with requestPayload for username lookup
- Content-Type: application/json header
- absenceStrs with Persian user not found message
- disabled flag removed — POST API bypasses JS cookie protection
2026-03-24 22:01:18 +00:00
copilot-swe-agent[bot] ab01dfce92 Merge remote-tracking branch 'origin/copilot/fix-broken-site-virgool' into copilot/fix-broken-site-virgool 2026-03-24 21:33:35 +00:00
copilot-swe-agent[bot] 64cca25b12 fix(virgool): use existing POST support from main to enable virgool.io via user-existence API
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/e4d95115-25eb-44aa-b144-14d4bdc905c6
2026-03-24 21:32:45 +00:00
copilot-swe-agent[bot] 7ba2fd31ea refactor: move json import to module level and address review comments
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/e7f4ab84-917a-49fc-bfbd-9bbaf76027f8
2026-03-24 21:21:32 +00:00
copilot-swe-agent[bot] 4d70f0f7c9 feat(virgool): add POST support and use user-existence API to bypass JS cookies
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/e7f4ab84-917a-49fc-bfbd-9bbaf76027f8
2026-03-24 21:20:03 +00:00
Copilot 5aae2ee005 Fix update-site-data workflow race condition on branch push (#2366)
* Initial plan

* Fix update-site-data workflow race condition on branch push

- Add concurrency control to cancel in-progress runs on new pushes to main
- Delete existing PR branch before creating new one to avoid stale ref conflicts
- Upgrade peter-evans/create-pull-request from v5 to v7 (Node.js 20 deprecation)

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/a095d3d3-0093-43e8-9cc5-82797bd52453

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-24 22:10:03 +01:00
Soxoj b145e7b26f feat(core): add POST request support, new sites, migrate to Majestic Million ranking (#2317)
* feat(core): add POST request support, new sites, migrate to Majestic Million ranking
- Added native POST request support to the Maigret engine (requestMethod, requestPayload) to enable querying modern JSON registration endpoints.
- Replaced the discontinued Alexa rank API with the Majestic Million dataset for global popularity sorting and automated CI updates.
- Fixed multiple false positives among top 500 sites and bypassed standard anti-bot protections using custom User-Agents.
- Updated public documentation and internal playbooks to reflect the new features.

* feat(data): apply all data.json site check updates from main branch

- Added CTFtime and PentesterLab (new sites added in main)
- Removed forums.imore.com (deleted in main as dead site)
- Disabled 5 sites per main branch fixes: Librusec, MirTesen, amateurvoyeurforum.com, forums.stevehoffman.tv, vegalab
- Fixed 5 site checks per main branch: SoundCloud, Taplink, Setlist, RoyalCams, club.cnews.ru (switched from status_code to message checkType with proper markers)

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/a1d194d9-c0ff-4e2b-974c-c5e4b59548bf

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-03-24 22:08:42 +01:00
Copilot abd9aa57fe Fix domain substring matching and NoneType crash in submit dialog (#2367)
* Initial plan

* Fix domain matching and NoneType error in submit.py

- Use regex with domain boundary matching instead of substring matching
  to prevent x.com from matching 500px.com, mix.com, etc.
- Handle None old_site gracefully when user enters a site name not in
  the matched list, fixing AttributeError crash.
- Add tests for both fixes.

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/7eabc755-47fd-4b80-a38c-9d6c056c2ce9

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-24 22:04:10 +01:00
Copilot 2e430e5039 feat: add tag blacklisting via --exclude-tags (#2352)
* Initial plan

* feat: add tag blacklisting support (--exclude-tags CLI flag, web UI, docs, tests)

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/1a656af2-36bf-494f-9f03-1b5340f0357c

* fix: correct tag cloud label to match click-cycle interaction

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/1a656af2-36bf-494f-9f03-1b5340f0357c

* feat: add all country tags to web interface tag cloud

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/7e184b24-ff26-48fd-8a93-aea12b0a8d7b

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-24 22:00:59 +01:00
dependabot[bot] f5786f11ce build(deps): bump certifi from 2025.11.12 to 2026.2.25 (#2346)
Bumps [certifi](https://github.com/certifi/python-certifi) from 2025.11.12 to 2026.2.25.
- [Commits](https://github.com/certifi/python-certifi/compare/2025.11.12...2026.02.25)

---
updated-dependencies:
- dependency-name: certifi
  dependency-version: 2026.2.25
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-24 15:13:10 +01:00
Copilot 3e56c95e16 Fix SoundCloud false-positive: switch to message-based check (#2355)
* Initial plan

* Fix SoundCloud false-positive: switch from status_code to message checkType

SoundCloud returns HTTP 200 for non-existent user profiles (soft 404),
causing status_code check to report CLAIMED for random usernames.

Switch to message checkType with:
- presenseStrs: hydratable user marker in server-rendered HTML
- absenceStrs: generic page title for non-existent users

Markers sourced from WhatsMyName project's verified SoundCloud entry.

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/8aa10eef-78bf-4251-bf42-473cd94c7ef4

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-24 15:12:56 +01:00
Copilot 28f35f9a4f Fix club.cnews.ru false positive: switch from status_code to message checkType (#2342)
* Initial plan

* Fix club.cnews.ru false positive: switch from status_code to message checkType with absence strings

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/af131d2f-c7b5-4798-8ad1-86bab2673fe4

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-24 10:52:23 +01:00
Julio César Suástegui 79cea49526 feat: add CTFtime and PentesterLab site support (#2318)
Add two cybersecurity platforms for username enumeration:
- CTFtime (ctftime.org) - CTF competition platform
- PentesterLab (pentesterlab.com) - Security training platform

Both verified working with status_code check type.
Returns 200 for existing users, 404 for non-existent.

Co-authored-by: Julio César Suástegui <juliosuas@users.noreply.github.com>
2026-03-24 10:52:07 +01:00
github-actions[bot] 2d94269656 Automated Sites List Update (#2341)
* Updated site list and statistics

* Rebase and regenerate sites.md against latest main (#2351)

* Updated site list and statistics

* Initial plan

* Disable MirTesen site check (false positive) (#2350)

* Initial plan

* Disable MirTesen site check to fix false-positive probe

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/61c86064-423d-4f1b-8277-2838f747dd89

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>

* build(deps): bump attrs from 25.4.0 to 26.1.0 (#2344)

Bumps [attrs](https://github.com/sponsors/hynek) from 25.4.0 to 26.1.0.
- [Commits](https://github.com/sponsors/hynek/commits)

---
updated-dependencies:
- dependency-name: attrs
  dependency-version: 26.1.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Updated site list and statistics

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: soxoj <soxoj@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: soxoj <soxoj@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-24 10:51:21 +01:00
dependabot[bot] 829bda885a build(deps): bump attrs from 25.4.0 to 26.1.0 (#2344)
Bumps [attrs](https://github.com/sponsors/hynek) from 25.4.0 to 26.1.0.
- [Commits](https://github.com/sponsors/hynek/commits)

---
updated-dependencies:
- dependency-name: attrs
  dependency-version: 26.1.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-24 09:51:42 +01:00
Copilot eb541dcf51 Disable MirTesen site check (false positive) (#2350)
* Initial plan

* Disable MirTesen site check to fix false-positive probe

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/61c86064-423d-4f1b-8277-2838f747dd89

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-24 09:51:31 +01:00
Copilot 4c97025a32 Disable Librusec site check (false positive) (#2349)
* Initial plan

* Disable Librusec site check to fix false-positive probe

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-03-24 09:51:16 +01:00
dependabot[bot] 2775181a6a build(deps-dev): bump mypy from 1.19.0 to 1.19.1 (#2347)
Bumps [mypy](https://github.com/python/mypy) from 1.19.0 to 1.19.1.
- [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/python/mypy/compare/v1.19.0...v1.19.1)

---
updated-dependencies:
- dependency-name: mypy
  dependency-version: 1.19.1
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-24 09:35:54 +01:00
dependabot[bot] b00ef1f5dd build(deps): bump aiodns from 3.5.0 to 4.0.0 (#2345)
Bumps [aiodns](https://github.com/saghul/aiodns) from 3.5.0 to 4.0.0.
- [Release notes](https://github.com/saghul/aiodns/releases)
- [Changelog](https://github.com/aio-libs/aiodns/blob/master/ChangeLog)
- [Commits](https://github.com/saghul/aiodns/compare/v3.5.0...v4.0.0)

---
updated-dependencies:
- dependency-name: aiodns
  dependency-version: 4.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-24 09:32:12 +01:00
Copilot d3f13ac295 Fix false-positive site probe: Re-enable Taplink with message checkType (#2326)
* Initial plan

* Disable Taplink site check to fix false-positive detections

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/ef9281f4-ba67-4760-a6e2-57564ac4ea94

* Re-enable Taplink with message checkType and absenceStrs

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/db3e572e-b79b-4cec-ac7f-062e76144660

* Improve Taplink absenceStrs: add Russian variant and presenseStrs

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/28e24317-e8b9-45f6-bad5-0e549b891313

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-23 21:36:36 +01:00
github-actions[bot] 479a614d1d Automated Sites List Update (#2339)
* Updated site list and statistics

* Rebase: merge origin/main into auto/update-sites-list (#2340)

* Updated site list and statistics (#2315)

Co-authored-by: soxoj <soxoj@users.noreply.github.com>

* Initial plan

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: soxoj <soxoj@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>

---------

Co-authored-by: soxoj <soxoj@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-03-23 21:36:09 +01:00
github-actions[bot] e0559e4320 Updated site list and statistics (#2315)
Co-authored-by: soxoj <soxoj@users.noreply.github.com>
2026-03-23 20:20:53 +01:00
Copilot 00a9249229 [WIP] Fix invalid link on forums.imore.com (#2337)
* Initial plan

* Remove dead forums.imore.com site from database

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/c83530d0-d24f-45fc-aca3-ae1e46ece33c

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-23 20:20:28 +01:00
Copilot 005863c2e0 Fix Setlist site check: switch to message checkType with proper markers (#2333)
* Initial plan

* Disable Setlist site check due to false positives (soft 404)

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/8c552ca6-51e5-4e79-a791-ddd6f27d2461

* Fix Setlist check: switch to message checkType with proper markers

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/3c387df6-1dfe-451f-96d8-b4b6455f7857

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-23 20:18:33 +01:00
Copilot e3aada6aef Fix RoyalCams site check using BongaCams white-label pattern (#2334)
* Initial plan

* Disable RoyalCams site check to fix false-positive probe

The Telegram Maigret bot auto-probe reported CLAIMED for three random
usernames. The status_code checkType is unreliable as the site returns
200 for non-existent user profiles (soft 404). Disabling the site check
until a reliable detection method can be established.

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/05b3d513-fe15-477d-a455-0c9ddf0b8b51

* Fix RoyalCams: switch to message checkType using BongaCams white-label pattern

RoyalCams runs on the BongaCams platform. Applied the same fix pattern:
- Switch from status_code to message checkType
- Use Portuguese locale (pt.royalcams.com) as urlProbe
- absenceStrs matches generic title on non-existent profiles
- presenseStrs matches Portuguese profile title for existing users
- Add browser-like headers matching BongaCams config
- Remove disabled flag

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/2f6a9523-278a-4992-ba7c-c320de14bfa4

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-23 20:16:45 +01:00
Copilot 9b35fc1ab0 [WIP] Fix false-positive probe for vegalab site (#2336)
* Initial plan

* Disable vegalab site check: domain is dead (DNS does not resolve), causing false positives

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/98430e81-5dcb-4cb3-9aaa-f8c5ce86d026

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-23 20:09:46 +01:00
Copilot 146bc0481b Disable forums.stevehoffman.tv due to false positives (#2331)
* Initial plan

* Disable forums.stevehoffman.tv to fix false-positive site probe

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/39fea4a9-ec6d-4a12-b34b-1a3486d647e4

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-23 20:08:15 +01:00
Copilot 5930a3022e Disable false-positive site probe: amateurvoyeurforum.com (#2332)
* Initial plan

* Disable amateurvoyeurforum.com site check to fix false positives

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/e7fcad2b-4511-4e6d-b186-411951170e0a

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-23 20:07:42 +01:00
dependabot[bot] b4482e0ba4 build(deps): bump aiohttp-socks from 0.10.1 to 0.11.0 (#2319)
Bumps [aiohttp-socks](https://github.com/romis2012/aiohttp-socks) from 0.10.1 to 0.11.0.
- [Release notes](https://github.com/romis2012/aiohttp-socks/releases)
- [Commits](https://github.com/romis2012/aiohttp-socks/compare/v0.10.1...v0.11.0)

---
updated-dependencies:
- dependency-name: aiohttp-socks
  dependency-version: 0.11.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-23 12:16:47 +01:00
dependabot[bot] 2c55501bc2 build(deps-dev): bump pytest-cov from 7.0.0 to 7.1.0 (#2320)
Bumps [pytest-cov](https://github.com/pytest-dev/pytest-cov) from 7.0.0 to 7.1.0.
- [Changelog](https://github.com/pytest-dev/pytest-cov/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest-cov/compare/v7.0.0...v7.1.0)

---
updated-dependencies:
- dependency-name: pytest-cov
  dependency-version: 7.1.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-23 12:15:36 +01:00
dependabot[bot] 3ba07591a1 build(deps-dev): bump coverage from 7.12.0 to 7.13.5 (#2321)
Bumps [coverage](https://github.com/coveragepy/coveragepy) from 7.12.0 to 7.13.5.
- [Release notes](https://github.com/coveragepy/coveragepy/releases)
- [Changelog](https://github.com/coveragepy/coveragepy/blob/main/CHANGES.rst)
- [Commits](https://github.com/coveragepy/coveragepy/compare/7.12.0...7.13.5)

---
updated-dependencies:
- dependency-name: coverage
  dependency-version: 7.13.5
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-23 12:14:31 +01:00
dependabot[bot] a2d4373b68 build(deps): bump reportlab from 4.4.5 to 4.4.10 (#2323)
Bumps [reportlab](https://www.reportlab.com/) from 4.4.5 to 4.4.10.

---
updated-dependencies:
- dependency-name: reportlab
  dependency-version: 4.4.10
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-23 12:14:18 +01:00
Copilot b960acec10 Pin requests-toolbelt>=1.0.0 to fix urllib3 v2 incompatibility (#2316)
* Initial plan

* Add requests-toolbelt ^1.0.0 as explicit dependency to fix urllib3 v2 appengine ImportError

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/458d41b2-c135-4b51-b0b1-b1832490c808

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-22 23:53:20 +01:00
Copilot b1a211c3cd Disable forums.developer.nvidia.com (auth-gated user profiles) (#2305)
* Initial plan

* disable forums.developer.nvidia.com due to auth-locked user pages

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/b8f41f15-8588-4aac-a443-af5e2aaa1918

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-22 22:43:51 +01:00
Copilot 56d0c9f2f1 Remove dead site xxxforum.org (#2310)
* Initial plan

* Remove broken site xxxforum.org from data.json and sites.md

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/bfbd3aa8-bfb1-480a-b2e7-a2c40fc69def

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-22 22:43:21 +01:00
Copilot 01049b730d Fix Love.Mail.ru: update to numeric-only identifiers and new profile URL (#2307)
* Initial plan

* fix: update Love.Mail.ru to use numeric-only identifiers (#1264)

- Add regexCheck to enforce numeric-only IDs (^\d+$)
- Update usernameClaimed/usernameUnclaimed to numeric values
- Site remains disabled pending live verification

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/6de16097-6bc1-424a-beb1-1d2ec6b99944

* fix: update Love.Mail.ru URL to /profile/ path, enable check with verified ID

Use maintainer-provided working link https://love.mail.ru/profile/1838153357.
- Change URL pattern from /ru/{username} to /profile/{username}
- Set usernameClaimed to 1838153357
- Remove disabled flag to enable the check

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/ac07d38e-46e2-42d3-9e93-eda3e5cfbcc3

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-22 22:42:59 +01:00
github-actions[bot] 2c2d3409e2 Updated site list and statistics (#2314)
Co-authored-by: soxoj <soxoj@users.noreply.github.com>
2026-03-22 22:42:50 +01:00
Soxoj e81b50ef61 Update site data workflow fix: remove ambiguous main tag (#2313)
* feat(workflow): fix update site data workflow err

* feat(workflow): the final update side data workflow fix (hopefully)
2026-03-22 22:37:48 +01:00
Soxoj 9ac0a65914 feat(workflow): fix update site data workflow err (#2312) 2026-03-22 22:31:55 +01:00
Copilot 4f397fed1c Re-enable taplink.cc with browser User-Agent to bypass Cloudflare (#2308)
* Initial plan

* fix(taplink): re-enable taplink.cc with browser User-Agent header to bypass Cloudflare

Remove disabled flag and add a Chrome User-Agent header to help
bypass Cloudflare bot detection for taplink.cc profile checks.
If Cloudflare still blocks requests, maigret's built-in error
detection will gracefully mark results as UNKNOWN.

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/271904b6-e358-4aeb-b503-21c9b91186d9

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-22 22:10:44 +01:00
copilot-swe-agent[bot] 2def9a2014 fix(virgool): switch from status_code to message check type with errors for JS cookies
The virgool.io site uses JS-generated cookies for anti-bot protection,
causing HTTP 200 for all URLs regardless of user existence. The
status_code check type always produced false positives.

Changes:
- Switch checkType from status_code to message so presence/absence
  strings are actually used for detection
- Add presenseStrs with profile-specific markers that won't match
  the JS cookie challenge page
- Add errors field to detect the JS cookie challenge page and report
  a meaningful error message
- Keep disabled: true as the site requires JS execution

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/2df41b87-39e2-47c7-934a-f03106cef094
2026-03-22 21:00:40 +00:00
copilot-swe-agent[bot] 0615deab8b Initial plan 2026-03-22 20:36:33 +00:00
Soxoj a17e0c7a13 feat(workflow): fix update site data workflow dependency (#2306) 2026-03-22 21:34:30 +01:00
dependabot[bot] e84e394e6f Bump svglib from 1.5.1 to 1.6.0 (#2205)
* Bump svglib from 1.5.1 to 1.6.0

Bumps [svglib](https://github.com/deeplook/svglib) from 1.5.1 to 1.6.0.
- [Changelog](https://github.com/deeplook/svglib/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/deeplook/svglib/commits)

---
updated-dependencies:
- dependency-name: svglib
  dependency-version: 1.6.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Add libcairo2-dev to CI workflow for svglib 1.6.0 compatibility (#2304)

* Initial plan

* Add libcairo2-dev system dependency install step to test workflow

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/3ecab70e-d4a3-4e74-9245-bffc58d6d0a3

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
2026-03-22 21:06:10 +01:00
Soxoj b8ada1c818 Update sites list workflow (#2303) 2026-03-22 20:59:37 +01:00
Soxoj 959b2be136 feat(sites): fix false positives: disable 74 broken sites, fix 8 with API probes and better markers (#2302)
- Disable 74 sites: Cloudflare/captcha blocks, identical responses,
    dead domains, vBulletin/phpBB engine failures
  - Fix Roblox, Salon24.pl, Planetaexcel → status_code (clear 404 signal)
  - Fix en.brickimedia.org → message with "noarticletext" absenceStr
  - Fix Arduino → narrower title-based presenseStrs/absenceStrs
  - Re-enable Fandom (3 wikis) via MediaWiki api.php urlProbe
  - Re-enable Substack via /api/v1/user/{}/public_profile urlProbe
  - Re-enable hashnode via GraphQL GET urlProbe (URL-encoded query)
  - Document lessons: engine template drift, search-by-author fragility,
    always-200 sites, TLS degradation, API bypassing Cloudflare,
    GraphQL GET support, URL-encoding for template safety
2026-03-22 20:47:51 +01:00
Soxoj 97cc4b46d9 Improve site-check quality: fix broken site configs, add diagnostic utilities, and make self-check report-only by default with opt-in auto-disable. (#2301)
- Fix VK and TradingView checkType; add Reddit and Microsoft Learn API-style probes where appropriate; adjust or disable entries that are unreliable under anti-bot protection.
- Self-check: stop aggressive auto-disable; default to reporting issues only; add --auto-disable and --diagnose for optional fixes and deeper output.
- Tooling: add utils/site_check.py and utils/check_top_n.py (and related helpers) to inspect and rank site behavior against the top-N list
- Scope: aligns with fixing top-traffic / high-impact sites and making diagnostics repeatable without silently flipping disabled flags
2026-03-22 16:48:35 +01:00
Soxoj f3b741d283 Update Telegram bot link in README (#2300) 2026-03-22 12:23:35 +01:00
dependabot[bot] 33620853a1 Bump certifi from 2025.10.5 to 2025.11.12 (#2249)
Bumps [certifi](https://github.com/certifi/python-certifi) from 2025.10.5 to 2025.11.12.
- [Commits](https://github.com/certifi/python-certifi/compare/2025.10.05...2025.11.12)

---
updated-dependencies:
- dependency-name: certifi
  dependency-version: 2025.11.12
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-22 12:21:32 +01:00
dependabot[bot] 19ed03a94d build(deps): bump werkzeug from 3.1.4 to 3.1.6 (#2288)
Bumps [werkzeug](https://github.com/pallets/werkzeug) from 3.1.4 to 3.1.6.
- [Release notes](https://github.com/pallets/werkzeug/releases)
- [Changelog](https://github.com/pallets/werkzeug/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/werkzeug/compare/3.1.4...3.1.6)

---
updated-dependencies:
- dependency-name: werkzeug
  dependency-version: 3.1.6
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-22 12:20:42 +01:00
dependabot[bot] 35372446e0 Bump reportlab from 4.4.4 to 4.4.5 (#2251)
Bumps [reportlab](https://www.reportlab.com/) from 4.4.4 to 4.4.5.

---
updated-dependencies:
- dependency-name: reportlab
  dependency-version: 4.4.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-22 12:19:43 +01:00
dependabot[bot] 519bb46db6 build(deps): bump flask from 3.1.2 to 3.1.3 (#2289)
Bumps [flask](https://github.com/pallets/flask) from 3.1.2 to 3.1.3.
- [Release notes](https://github.com/pallets/flask/releases)
- [Changelog](https://github.com/pallets/flask/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/flask/compare/3.1.2...3.1.3)

---
updated-dependencies:
- dependency-name: flask
  dependency-version: 3.1.3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-22 12:19:21 +01:00
Soxoj 227a25bfa1 Twitter fixed, mirrors mechanism improvement (#2299) 2026-03-22 01:14:17 +01:00
Soxoj 5da4e78092 Pyinstaller GitHub workflow fix (#2298) 2026-03-22 00:59:17 +01:00
Soxoj e4d6b064df Update Telegram bot link in README (#2293) 2026-03-21 23:49:45 +01:00
Soxoj f99091f5f7 Fixed false positives in top-500 (#2292) 2026-03-21 23:35:59 +01:00
Soxoj f26976f1dd Dockerfile fix (#2290) 2026-03-21 20:02:35 +01:00
dependabot[bot] 83ae9c0133 Bump pypdf from 6.4.0 to 6.9.1 (#2281)
Bumps [pypdf](https://github.com/py-pdf/pypdf) from 6.4.0 to 6.9.1.
- [Release notes](https://github.com/py-pdf/pypdf/releases)
- [Changelog](https://github.com/py-pdf/pypdf/blob/main/CHANGELOG.md)
- [Commits](https://github.com/py-pdf/pypdf/compare/6.4.0...6.9.1)

---
updated-dependencies:
- dependency-name: pypdf
  dependency-version: 6.9.1
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-21 18:09:10 +01:00
dependabot[bot] 93c4fdeba9 Bump cryptography from 44.0.1 to 46.0.5 (#2270)
Bumps [cryptography](https://github.com/pyca/cryptography) from 44.0.1 to 46.0.5.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/44.0.1...46.0.5)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-version: 46.0.5
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-21 18:08:57 +01:00
dependabot[bot] 6ec3c47769 Bump black from 25.11.0 to 26.3.1 (#2280)
Bumps [black](https://github.com/psf/black) from 25.11.0 to 26.3.1.
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/compare/25.11.0...26.3.1)

---
updated-dependencies:
- dependency-name: black
  dependency-version: 26.3.1
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-21 18:08:45 +01:00
dependabot[bot] 3dc3fe9371 Bump pillow from 11.0.0 to 12.1.1 (#2271)
Bumps [pillow](https://github.com/python-pillow/Pillow) from 11.0.0 to 12.1.1.
- [Release notes](https://github.com/python-pillow/Pillow/releases)
- [Changelog](https://github.com/python-pillow/Pillow/blob/main/CHANGES.rst)
- [Commits](https://github.com/python-pillow/Pillow/compare/11.0.0...12.1.1)

---
updated-dependencies:
- dependency-name: pillow
  dependency-version: 12.1.1
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-21 18:08:18 +01:00
dependabot[bot] ebf8227bf1 Bump urllib3 from 2.5.0 to 2.6.3 (#2262)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.5.0 to 2.6.3.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.5.0...2.6.3)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-version: 2.6.3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-21 18:07:51 +01:00
Tang Vu 5b7b28e683 refactor: unexpanded tilde in file path (#2283)
The path `'~/.maigret/settings.json'` uses a tilde (`~`) which is not automatically expanded by Python's `open()` function. This will cause the settings file in the user's home directory to be silently ignored (caught by `FileNotFoundError`) because Python will look for a literal directory named `~` in the current working directory.

Affected files: settings.py
2026-03-21 18:07:23 +01:00
Tang Vu 0e95e2e3cc refactor: missing tests for settings cascade and override logic (#2287)
The `Settings.load()` method iterates through multiple configuration file paths and updates the internal `__dict__`, intending to override earlier default settings with later user-specific ones. This cascading logic is a core configuration feature but lacks explicit tests to guarantee that dictionary merging and overriding behave exactly as documented (e.g., ensuring a setting in `~/.maigret/settings.json` correctly overrides `resources/settings.json` without wiping out other keys).


Affected files: test_settings.py
2026-03-21 18:06:54 +01:00
Tang Vu 4cd1fccaa3 ♻️ Refactor: Hardcoded relative path for database file (#2285)
* refactor: hardcoded relative path for database file

`app.config['MAIGRET_DB_FILE']` is set to a hardcoded relative path `os.path.join('maigret', 'resources', 'data.json')`. If the Flask application is executed from a different working directory (other than the repository root), it will fail to find the database file and crash.

Affected files: app.py, settings.py

* refactor: hardcoded relative path for database file

`app.config['MAIGRET_DB_FILE']` is set to a hardcoded relative path `os.path.join('maigret', 'resources', 'data.json')`. If the Flask application is executed from a different working directory (other than the repository root), it will fail to find the database file and crash.

Affected files: app.py, settings.py
2026-03-21 18:06:36 +01:00
dependabot[bot] 83a9dafe55 Bump mypy from 1.18.2 to 1.19.0 (#2250)
Bumps [mypy](https://github.com/python/mypy) from 1.18.2 to 1.19.0.
- [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/python/mypy/compare/v1.18.2...v1.19.0)

---
updated-dependencies:
- dependency-name: mypy
  dependency-version: 1.19.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-06 21:13:28 +01:00
dependabot[bot] b4147d2cd3 Bump pytest from 8.4.2 to 9.0.1 (#2244)
Bumps [pytest](https://github.com/pytest-dev/pytest) from 8.4.2 to 9.0.1.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/8.4.2...9.0.1)

---
updated-dependencies:
- dependency-name: pytest
  dependency-version: 9.0.1
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-06 21:13:15 +01:00
dependabot[bot] aa591da913 Bump aiohttp from 3.13.2 to 3.13.3 (#2261)
---
updated-dependencies:
- dependency-name: aiohttp
  dependency-version: 3.13.3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-06 21:12:22 +01:00
dependabot[bot] 2d4d3ba0cc Bump pytest-asyncio from 1.2.0 to 1.3.0 (#2242)
Bumps [pytest-asyncio](https://github.com/pytest-dev/pytest-asyncio) from 1.2.0 to 1.3.0.
- [Release notes](https://github.com/pytest-dev/pytest-asyncio/releases)
- [Commits](https://github.com/pytest-dev/pytest-asyncio/compare/v1.2.0...v1.3.0)

---
updated-dependencies:
- dependency-name: pytest-asyncio
  dependency-version: 1.3.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-02 14:16:48 +01:00
dependabot[bot] ec21bbe974 Bump asgiref from 3.10.0 to 3.11.0 (#2243)
Bumps [asgiref](https://github.com/django/asgiref) from 3.10.0 to 3.11.0.
- [Changelog](https://github.com/django/asgiref/blob/main/CHANGELOG.txt)
- [Commits](https://github.com/django/asgiref/compare/3.10.0...3.11.0)

---
updated-dependencies:
- dependency-name: asgiref
  dependency-version: 3.11.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-02 14:16:27 +01:00
dependabot[bot] 1a4190ee03 Bump pypdf from 6.1.3 to 6.4.0 (#2245)
Bumps [pypdf](https://github.com/py-pdf/pypdf) from 6.1.3 to 6.4.0.
- [Release notes](https://github.com/py-pdf/pypdf/releases)
- [Changelog](https://github.com/py-pdf/pypdf/blob/main/CHANGELOG.md)
- [Commits](https://github.com/py-pdf/pypdf/compare/6.1.3...6.4.0)

---
updated-dependencies:
- dependency-name: pypdf
  dependency-version: 6.4.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-02 14:16:14 +01:00
dependabot[bot] fe60783a68 Bump werkzeug from 3.1.3 to 3.1.4 (#2248)
Bumps [werkzeug](https://github.com/pallets/werkzeug) from 3.1.3 to 3.1.4.
- [Release notes](https://github.com/pallets/werkzeug/releases)
- [Changelog](https://github.com/pallets/werkzeug/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/werkzeug/compare/3.1.3...3.1.4)

---
updated-dependencies:
- dependency-name: werkzeug
  dependency-version: 3.1.4
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-02 14:16:04 +01:00
dependabot[bot] 8aa0fab314 Bump coverage from 7.11.0 to 7.12.0 (#2241)
Bumps [coverage](https://github.com/coveragepy/coveragepy) from 7.11.0 to 7.12.0.
- [Release notes](https://github.com/coveragepy/coveragepy/releases)
- [Changelog](https://github.com/coveragepy/coveragepy/blob/main/CHANGES.rst)
- [Commits](https://github.com/coveragepy/coveragepy/compare/7.11.0...7.12.0)

---
updated-dependencies:
- dependency-name: coverage
  dependency-version: 7.12.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-28 10:31:19 +01:00
dependabot[bot] 941a5171ae Bump psutil from 7.1.0 to 7.1.3 (#2240)
Bumps [psutil](https://github.com/giampaolo/psutil) from 7.1.0 to 7.1.3.
- [Changelog](https://github.com/giampaolo/psutil/blob/master/HISTORY.rst)
- [Commits](https://github.com/giampaolo/psutil/compare/release-7.1.0...release-7.1.3)

---
updated-dependencies:
- dependency-name: psutil
  dependency-version: 7.1.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-23 21:24:11 +01:00
dependabot[bot] 9a1bd8ffdb Bump python-bidi from 0.6.6 to 0.6.7 (#2234)
Bumps [python-bidi](https://github.com/MeirKriheli/python-bidi) from 0.6.6 to 0.6.7.
- [Release notes](https://github.com/MeirKriheli/python-bidi/releases)
- [Changelog](https://github.com/MeirKriheli/python-bidi/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/MeirKriheli/python-bidi/compare/v0.6.6...v0.6.7)

---
updated-dependencies:
- dependency-name: python-bidi
  dependency-version: 0.6.7
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-23 21:24:04 +01:00
dependabot[bot] 68f586fcca Bump black from 25.9.0 to 25.11.0 (#2239)
Bumps [black](https://github.com/psf/black) from 25.9.0 to 25.11.0.
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/compare/25.9.0...25.11.0)

---
updated-dependencies:
- dependency-name: black
  dependency-version: 25.11.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-23 21:23:52 +01:00
dependabot[bot] e39476c4c7 Bump pypdf from 6.0.0 to 6.1.3 (#2233)
Bumps [pypdf](https://github.com/py-pdf/pypdf) from 6.0.0 to 6.1.3.
- [Release notes](https://github.com/py-pdf/pypdf/releases)
- [Changelog](https://github.com/py-pdf/pypdf/blob/main/CHANGELOG.md)
- [Commits](https://github.com/py-pdf/pypdf/compare/6.0.0...6.1.3)

---
updated-dependencies:
- dependency-name: pypdf
  dependency-version: 6.1.3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-23 21:23:40 +01:00
dependabot[bot] 6a7f778c80 Bump aiohttp from 3.13.0 to 3.13.2 (#2237)
---
updated-dependencies:
- dependency-name: aiohttp
  dependency-version: 3.13.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-07 12:17:34 +01:00
dependabot[bot] 7679f98e58 Bump attrs from 25.3.0 to 25.4.0 (#2226)
Bumps [attrs](https://github.com/sponsors/hynek) from 25.3.0 to 25.4.0.
- [Commits](https://github.com/sponsors/hynek/commits)

---
updated-dependencies:
- dependency-name: attrs
  dependency-version: 25.4.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-20 15:41:42 +07:00
dependabot[bot] c6dbc09ba5 Bump pytest-rerunfailures from 16.0.1 to 16.1 (#2229)
Bumps [pytest-rerunfailures](https://github.com/pytest-dev/pytest-rerunfailures) from 16.0.1 to 16.1.
- [Changelog](https://github.com/pytest-dev/pytest-rerunfailures/blob/master/CHANGES.rst)
- [Commits](https://github.com/pytest-dev/pytest-rerunfailures/compare/16.0.1...16.1)

---
updated-dependencies:
- dependency-name: pytest-rerunfailures
  dependency-version: '16.1'
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-20 15:41:30 +07:00
dependabot[bot] b8352c3406 Bump certifi from 2025.8.3 to 2025.10.5 (#2228)
Bumps [certifi](https://github.com/certifi/python-certifi) from 2025.8.3 to 2025.10.5.
- [Commits](https://github.com/certifi/python-certifi/compare/2025.08.03...2025.10.05)

---
updated-dependencies:
- dependency-name: certifi
  dependency-version: 2025.10.5
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-20 15:41:18 +07:00
dependabot[bot] 8a02ad5ed7 Bump coverage from 7.10.7 to 7.11.0 (#2230)
Bumps [coverage](https://github.com/nedbat/coveragepy) from 7.10.7 to 7.11.0.
- [Release notes](https://github.com/nedbat/coveragepy/releases)
- [Changelog](https://github.com/nedbat/coveragepy/blob/master/CHANGES.rst)
- [Commits](https://github.com/nedbat/coveragepy/compare/7.10.7...7.11.0)

---
updated-dependencies:
- dependency-name: coverage
  dependency-version: 7.11.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-20 15:41:07 +07:00
dependabot[bot] 8fda5776c6 Bump aiohttp from 3.12.15 to 3.13.0 (#2225)
---
updated-dependencies:
- dependency-name: aiohttp
  dependency-version: 3.13.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-14 19:55:53 +02:00
dependabot[bot] 2347bd2f7d Bump idna from 3.10 to 3.11 (#2227)
Bumps [idna](https://github.com/kjd/idna) from 3.10 to 3.11.
- [Release notes](https://github.com/kjd/idna/releases)
- [Changelog](https://github.com/kjd/idna/blob/master/HISTORY.rst)
- [Commits](https://github.com/kjd/idna/compare/v3.10...v3.11)

---
updated-dependencies:
- dependency-name: idna
  dependency-version: '3.11'
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-14 19:55:44 +02:00
dependabot[bot] 229472f323 Bump multidict from 6.6.4 to 6.7.0 (#2224)
Bumps [multidict](https://github.com/aio-libs/multidict) from 6.6.4 to 6.7.0.
- [Release notes](https://github.com/aio-libs/multidict/releases)
- [Changelog](https://github.com/aio-libs/multidict/blob/master/CHANGES.rst)
- [Commits](https://github.com/aio-libs/multidict/compare/v6.6.4...v6.7.0)

---
updated-dependencies:
- dependency-name: multidict
  dependency-version: 6.7.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-13 10:06:54 +02:00
dependabot[bot] 6acc22dd69 Bump markupsafe from 3.0.2 to 3.0.3 (#2209)
Bumps [markupsafe](https://github.com/pallets/markupsafe) from 3.0.2 to 3.0.3.
- [Release notes](https://github.com/pallets/markupsafe/releases)
- [Changelog](https://github.com/pallets/markupsafe/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/markupsafe/compare/3.0.2...3.0.3)

---
updated-dependencies:
- dependency-name: markupsafe
  dependency-version: 3.0.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-10 11:37:12 +02:00
dependabot[bot] 8af07b3889 Bump yarl from 1.20.1 to 1.22.0 (#2221)
---
updated-dependencies:
- dependency-name: yarl
  dependency-version: 1.22.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-10 11:37:02 +02:00
dependabot[bot] e9df40bdce Bump asgiref from 3.9.2 to 3.10.0 (#2220)
Bumps [asgiref](https://github.com/django/asgiref) from 3.9.2 to 3.10.0.
- [Changelog](https://github.com/django/asgiref/blob/main/CHANGELOG.txt)
- [Commits](https://github.com/django/asgiref/compare/3.9.2...3.10.0)

---
updated-dependencies:
- dependency-name: asgiref
  dependency-version: 3.10.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-10 11:36:53 +02:00
dependabot[bot] d5bef9e3ac Bump platformdirs from 4.4.0 to 4.5.0 (#2223)
Bumps [platformdirs](https://github.com/tox-dev/platformdirs) from 4.4.0 to 4.5.0.
- [Release notes](https://github.com/tox-dev/platformdirs/releases)
- [Changelog](https://github.com/tox-dev/platformdirs/blob/main/CHANGES.rst)
- [Commits](https://github.com/tox-dev/platformdirs/compare/4.4.0...4.5.0)

---
updated-dependencies:
- dependency-name: platformdirs
  dependency-version: 4.5.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-10 11:36:44 +02:00
dependabot[bot] 25121754bd Bump lxml from 6.0.1 to 6.0.2 (#2208)
Bumps [lxml](https://github.com/lxml/lxml) from 6.0.1 to 6.0.2.
- [Release notes](https://github.com/lxml/lxml/releases)
- [Changelog](https://github.com/lxml/lxml/blob/master/CHANGES.txt)
- [Commits](https://github.com/lxml/lxml/compare/lxml-6.0.1...lxml-6.0.2)

---
updated-dependencies:
- dependency-name: lxml
  dependency-version: 6.0.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-08 09:57:59 +02:00
dependabot[bot] 198c11b8d4 Bump asgiref from 3.9.1 to 3.9.2 (#2204)
Bumps [asgiref](https://github.com/django/asgiref) from 3.9.1 to 3.9.2.
- [Changelog](https://github.com/django/asgiref/blob/main/CHANGELOG.txt)
- [Commits](https://github.com/django/asgiref/compare/3.9.1...3.9.2)

---
updated-dependencies:
- dependency-name: asgiref
  dependency-version: 3.9.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-01 13:32:28 +02:00
dependabot[bot] bf9bc5a518 Bump psutil from 7.0.0 to 7.1.0 (#2201)
Bumps [psutil](https://github.com/giampaolo/psutil) from 7.0.0 to 7.1.0.
- [Changelog](https://github.com/giampaolo/psutil/blob/master/HISTORY.rst)
- [Commits](https://github.com/giampaolo/psutil/compare/release-7.0.0...release-7.1.0)

---
updated-dependencies:
- dependency-name: psutil
  dependency-version: 7.1.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-01 13:32:21 +02:00
dependabot[bot] 41e246f6a6 Bump coverage from 7.10.6 to 7.10.7 (#2207)
Bumps [coverage](https://github.com/nedbat/coveragepy) from 7.10.6 to 7.10.7.
- [Release notes](https://github.com/nedbat/coveragepy/releases)
- [Changelog](https://github.com/nedbat/coveragepy/blob/master/CHANGES.rst)
- [Commits](https://github.com/nedbat/coveragepy/compare/7.10.6...7.10.7)

---
updated-dependencies:
- dependency-name: coverage
  dependency-version: 7.10.7
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-01 13:32:13 +02:00
dependabot[bot] 9f58fb27ad Bump reportlab from 4.4.3 to 4.4.4 (#2206)
Bumps [reportlab](https://www.reportlab.com/) from 4.4.3 to 4.4.4.

---
updated-dependencies:
- dependency-name: reportlab
  dependency-version: 4.4.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-01 13:32:04 +02:00
dependabot[bot] b344a5d98a Bump pyinstaller from 6.15.0 to 6.16.0 (#2199)
Bumps [pyinstaller](https://github.com/pyinstaller/pyinstaller) from 6.15.0 to 6.16.0.
- [Release notes](https://github.com/pyinstaller/pyinstaller/releases)
- [Changelog](https://github.com/pyinstaller/pyinstaller/blob/develop/doc/CHANGES.rst)
- [Commits](https://github.com/pyinstaller/pyinstaller/compare/v6.15.0...v6.16.0)

---
updated-dependencies:
- dependency-name: pyinstaller
  dependency-version: 6.16.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-25 11:10:36 +03:00
dependabot[bot] d8b26181f1 Bump pytest-asyncio from 1.1.0 to 1.2.0 (#2200)
Bumps [pytest-asyncio](https://github.com/pytest-dev/pytest-asyncio) from 1.1.0 to 1.2.0.
- [Release notes](https://github.com/pytest-dev/pytest-asyncio/releases)
- [Commits](https://github.com/pytest-dev/pytest-asyncio/compare/v1.1.0...v1.2.0)

---
updated-dependencies:
- dependency-name: pytest-asyncio
  dependency-version: 1.2.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-25 11:10:27 +03:00
dependabot[bot] a60d96c7f2 Bump mypy from 1.18.1 to 1.18.2 (#2202)
Bumps [mypy](https://github.com/python/mypy) from 1.18.1 to 1.18.2.
- [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/python/mypy/compare/v1.18.1...v1.18.2)

---
updated-dependencies:
- dependency-name: mypy
  dependency-version: 1.18.2
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-25 11:10:22 +03:00
dependabot[bot] a3159b213b Bump black from 25.1.0 to 25.9.0 (#2203)
Bumps [black](https://github.com/psf/black) from 25.1.0 to 25.9.0.
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/compare/25.1.0...25.9.0)

---
updated-dependencies:
- dependency-name: black
  dependency-version: 25.9.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-25 11:10:06 +03:00
dependabot[bot] 123ead4c03 Bump mypy from 1.17.1 to 1.18.1 (#2197)
Bumps [mypy](https://github.com/python/mypy) from 1.17.1 to 1.18.1.
- [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/python/mypy/compare/v1.17.1...v1.18.1)

---
updated-dependencies:
- dependency-name: mypy
  dependency-version: 1.18.1
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-14 10:38:16 +02:00
dependabot[bot] cd7571ef57 Bump pytest-cov from 6.3.0 to 7.0.0 (#2196)
Bumps [pytest-cov](https://github.com/pytest-dev/pytest-cov) from 6.3.0 to 7.0.0.
- [Changelog](https://github.com/pytest-dev/pytest-cov/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest-cov/compare/v6.3.0...v7.0.0)

---
updated-dependencies:
- dependency-name: pytest-cov
  dependency-version: 7.0.0
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-10 13:28:44 +02:00
dependabot[bot] d922f9be25 Bump pytest-cov from 6.2.1 to 6.3.0 (#2195)
Bumps [pytest-cov](https://github.com/pytest-dev/pytest-cov) from 6.2.1 to 6.3.0.
- [Changelog](https://github.com/pytest-dev/pytest-cov/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest-cov/compare/v6.2.1...v6.3.0)

---
updated-dependencies:
- dependency-name: pytest-cov
  dependency-version: 6.3.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-08 10:06:55 +02:00
dependabot[bot] 3b20b36609 Bump pytest from 8.4.1 to 8.4.2 (#2194)
Bumps [pytest](https://github.com/pytest-dev/pytest) from 8.4.1 to 8.4.2.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/8.4.1...8.4.2)

---
updated-dependencies:
- dependency-name: pytest
  dependency-version: 8.4.2
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-05 11:59:27 +02:00
dependabot[bot] ba86981cf4 Bump pytest-rerunfailures from 15.1 to 16.0.1 (#2193)
Bumps [pytest-rerunfailures](https://github.com/pytest-dev/pytest-rerunfailures) from 15.1 to 16.0.1.
- [Changelog](https://github.com/pytest-dev/pytest-rerunfailures/blob/master/CHANGES.rst)
- [Commits](https://github.com/pytest-dev/pytest-rerunfailures/compare/15.1...16.0.1)

---
updated-dependencies:
- dependency-name: pytest-rerunfailures
  dependency-version: 16.0.1
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-04 20:26:34 +02:00
dependabot[bot] 561ced647f Bump pytest-rerunfailures from 15.1 to 16.0 (#2191)
Bumps [pytest-rerunfailures](https://github.com/pytest-dev/pytest-rerunfailures) from 15.1 to 16.0.
- [Changelog](https://github.com/pytest-dev/pytest-rerunfailures/blob/master/CHANGES.rst)
- [Commits](https://github.com/pytest-dev/pytest-rerunfailures/compare/15.1...16.0)

---
updated-dependencies:
- dependency-name: pytest-rerunfailures
  dependency-version: '16.0'
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-02 11:19:28 +02:00
dependabot[bot] 7be3ee8240 Bump coverage from 7.10.5 to 7.10.6 (#2192)
Bumps [coverage](https://github.com/nedbat/coveragepy) from 7.10.5 to 7.10.6.
- [Release notes](https://github.com/nedbat/coveragepy/releases)
- [Changelog](https://github.com/nedbat/coveragepy/blob/master/CHANGES.rst)
- [Commits](https://github.com/nedbat/coveragepy/compare/7.10.5...7.10.6)

---
updated-dependencies:
- dependency-name: coverage
  dependency-version: 7.10.6
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-02 11:19:14 +02:00
Soxoj 48ca13dc4d Make web interface accessible for Docker deployment by default (#2189) 2025-08-31 16:14:42 +02:00
dependabot[bot] 7f94e86259 Bump platformdirs from 4.3.8 to 4.4.0 (#2184)
Bumps [platformdirs](https://github.com/tox-dev/platformdirs) from 4.3.8 to 4.4.0.
- [Release notes](https://github.com/tox-dev/platformdirs/releases)
- [Changelog](https://github.com/tox-dev/platformdirs/blob/main/CHANGES.rst)
- [Commits](https://github.com/tox-dev/platformdirs/compare/4.3.8...4.4.0)

---
updated-dependencies:
- dependency-name: platformdirs
  dependency-version: 4.4.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-29 15:20:31 +02:00
dependabot[bot] c2ed1af4b4 Bump python-bidi from 0.6.3 to 0.6.6 (#2183)
Bumps [python-bidi](https://github.com/MeirKriheli/python-bidi) from 0.6.3 to 0.6.6.
- [Release notes](https://github.com/MeirKriheli/python-bidi/releases)
- [Changelog](https://github.com/MeirKriheli/python-bidi/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/MeirKriheli/python-bidi/compare/v0.6.3...v0.6.6)

---
updated-dependencies:
- dependency-name: python-bidi
  dependency-version: 0.6.6
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-29 15:20:23 +02:00
dependabot[bot] 648ba6e64c Bump typing-extensions from 4.14.1 to 4.15.0 (#2182)
Bumps [typing-extensions](https://github.com/python/typing_extensions) from 4.14.1 to 4.15.0.
- [Release notes](https://github.com/python/typing_extensions/releases)
- [Changelog](https://github.com/python/typing_extensions/blob/main/CHANGELOG.md)
- [Commits](https://github.com/python/typing_extensions/compare/4.14.1...4.15.0)

---
updated-dependencies:
- dependency-name: typing-extensions
  dependency-version: 4.15.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-29 15:20:15 +02:00
dependabot[bot] 56815d8368 Bump soupsieve from 2.7 to 2.8 (#2185)
Bumps [soupsieve](https://github.com/facelessuser/soupsieve) from 2.7 to 2.8.
- [Release notes](https://github.com/facelessuser/soupsieve/releases)
- [Commits](https://github.com/facelessuser/soupsieve/compare/2.7...2.8)

---
updated-dependencies:
- dependency-name: soupsieve
  dependency-version: '2.8'
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-29 15:20:05 +02:00
dependabot[bot] b178e97d90 Bump multidict from 6.6.3 to 6.6.4 (#2177)
Bumps [multidict](https://github.com/aio-libs/multidict) from 6.6.3 to 6.6.4.
- [Release notes](https://github.com/aio-libs/multidict/releases)
- [Changelog](https://github.com/aio-libs/multidict/blob/master/CHANGES.rst)
- [Commits](https://github.com/aio-libs/multidict/compare/v6.6.3...v6.6.4)

---
updated-dependencies:
- dependency-name: multidict
  dependency-version: 6.6.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-27 00:40:35 +02:00
dependabot[bot] a764198c2c Bump lxml from 6.0.0 to 6.0.1 (#2178)
Bumps [lxml](https://github.com/lxml/lxml) from 6.0.0 to 6.0.1.
- [Release notes](https://github.com/lxml/lxml/releases)
- [Changelog](https://github.com/lxml/lxml/blob/master/CHANGES.txt)
- [Commits](https://github.com/lxml/lxml/compare/lxml-6.0.0...lxml-6.0.1)

---
updated-dependencies:
- dependency-name: lxml
  dependency-version: 6.0.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-27 00:40:24 +02:00
dependabot[bot] 2c4684e4a9 Bump psutil from 6.1.1 to 7.0.0 (#2179)
Bumps [psutil](https://github.com/giampaolo/psutil) from 6.1.1 to 7.0.0.
- [Changelog](https://github.com/giampaolo/psutil/blob/master/HISTORY.rst)
- [Commits](https://github.com/giampaolo/psutil/compare/release-6.1.1...release-7.0.0)

---
updated-dependencies:
- dependency-name: psutil
  dependency-version: 7.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-27 00:40:15 +02:00
dependabot[bot] 8713e1a63e Bump coverage from 7.10.3 to 7.10.5 (#2180)
Bumps [coverage](https://github.com/nedbat/coveragepy) from 7.10.3 to 7.10.5.
- [Release notes](https://github.com/nedbat/coveragepy/releases)
- [Changelog](https://github.com/nedbat/coveragepy/blob/master/CHANGES.rst)
- [Commits](https://github.com/nedbat/coveragepy/compare/7.10.3...7.10.5)

---
updated-dependencies:
- dependency-name: coverage
  dependency-version: 7.10.5
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-27 00:40:08 +02:00
dependabot[bot] 55adc70d10 Bump aiohttp from 3.12.14 to 3.12.15 (#2181)
---
updated-dependencies:
- dependency-name: aiohttp
  dependency-version: 3.12.15
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-27 00:39:59 +02:00
dependabot[bot] 53fc83dbce Bump flake8 from 7.1.1 to 7.3.0 (#2171)
Bumps [flake8](https://github.com/pycqa/flake8) from 7.1.1 to 7.3.0.
- [Commits](https://github.com/pycqa/flake8/compare/7.1.1...7.3.0)

---
updated-dependencies:
- dependency-name: flake8
  dependency-version: 7.3.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-22 21:15:03 +02:00
dependabot[bot] e8bd00f013 Bump pytest from 8.3.4 to 8.4.1 (#2172)
Bumps [pytest](https://github.com/pytest-dev/pytest) from 8.3.4 to 8.4.1.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/8.3.4...8.4.1)

---
updated-dependencies:
- dependency-name: pytest
  dependency-version: 8.4.1
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-22 21:14:55 +02:00
dependabot[bot] a0ba853e64 Bump mypy from 1.14.1 to 1.17.1 (#2173)
Bumps [mypy](https://github.com/python/mypy) from 1.14.1 to 1.17.1.
- [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/python/mypy/compare/v1.14.1...v1.17.1)

---
updated-dependencies:
- dependency-name: mypy
  dependency-version: 1.17.1
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-22 21:14:48 +02:00
dependabot[bot] 54b4c7d2ab Bump pyinstaller from 6.11.1 to 6.15.0 (#2174)
Bumps [pyinstaller](https://github.com/pyinstaller/pyinstaller) from 6.11.1 to 6.15.0.
- [Release notes](https://github.com/pyinstaller/pyinstaller/releases)
- [Changelog](https://github.com/pyinstaller/pyinstaller/blob/develop/doc/CHANGES.rst)
- [Commits](https://github.com/pyinstaller/pyinstaller/compare/v6.11.1...v6.15.0)

---
updated-dependencies:
- dependency-name: pyinstaller
  dependency-version: 6.15.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-22 21:14:35 +02:00
dependabot[bot] 8791bca866 Bump flask from 3.1.1 to 3.1.2 (#2175)
Bumps [flask](https://github.com/pallets/flask) from 3.1.1 to 3.1.2.
- [Release notes](https://github.com/pallets/flask/releases)
- [Changelog](https://github.com/pallets/flask/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/flask/compare/3.1.1...3.1.2)

---
updated-dependencies:
- dependency-name: flask
  dependency-version: 3.1.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-22 21:14:24 +02:00
Soxoj fb26ccd1f6 Disabled some sites giving false positive results (#2170) 2025-08-22 03:10:47 +02:00
dependabot[bot] c22abdb834 Bump certifi from 2025.6.15 to 2025.8.3 (#2147)
Bumps [certifi](https://github.com/certifi/python-certifi) from 2025.6.15 to 2025.8.3.
- [Commits](https://github.com/certifi/python-certifi/compare/2025.06.15...2025.08.03)

---
updated-dependencies:
- dependency-name: certifi
  dependency-version: 2025.8.3
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-21 16:27:10 +02:00
dependabot[bot] 0689470506 Bump alive-progress from 3.2.0 to 3.3.0 (#2145)
Bumps [alive-progress](https://github.com/rsalmei/alive-progress) from 3.2.0 to 3.3.0.
- [Changelog](https://github.com/rsalmei/alive-progress/blob/main/CHANGELOG.md)
- [Commits](https://github.com/rsalmei/alive-progress/compare/v3.2.0...v3.3.0)

---
updated-dependencies:
- dependency-name: alive-progress
  dependency-version: 3.3.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-21 16:27:02 +02:00
dependabot[bot] 410d7568b7 Bump aiodns from 3.2.0 to 3.5.0 (#2148)
Bumps [aiodns](https://github.com/saghul/aiodns) from 3.2.0 to 3.5.0.
- [Release notes](https://github.com/saghul/aiodns/releases)
- [Changelog](https://github.com/aio-libs/aiodns/blob/master/ChangeLog)
- [Commits](https://github.com/saghul/aiodns/compare/v3.2.0...v3.5.0)

---
updated-dependencies:
- dependency-name: aiodns
  dependency-version: 3.5.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-21 10:03:57 +02:00
dependabot[bot] 7280033198 Bump lxml from 5.3.0 to 6.0.0 (#2146)
Bumps [lxml](https://github.com/lxml/lxml) from 5.3.0 to 6.0.0.
- [Release notes](https://github.com/lxml/lxml/releases)
- [Changelog](https://github.com/lxml/lxml/blob/master/CHANGES.txt)
- [Commits](https://github.com/lxml/lxml/compare/lxml-5.3.0...lxml-6.0.0)

---
updated-dependencies:
- dependency-name: lxml
  dependency-version: 6.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-21 10:03:46 +02:00
dependabot[bot] 3c6af42916 Bump requests from 2.32.4 to 2.32.5 (#2165)
Bumps [requests](https://github.com/psf/requests) from 2.32.4 to 2.32.5.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](https://github.com/psf/requests/compare/v2.32.4...v2.32.5)

---
updated-dependencies:
- dependency-name: requests
  dependency-version: 2.32.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-21 10:02:46 +02:00
dependabot[bot] cdb896ba32 Bump xhtml2pdf from 0.2.16 to 0.2.17 (#2149)
Bumps [xhtml2pdf](https://github.com/xhtml2pdf/xhtml2pdf) from 0.2.16 to 0.2.17.
- [Release notes](https://github.com/xhtml2pdf/xhtml2pdf/releases)
- [Commits](https://github.com/xhtml2pdf/xhtml2pdf/compare/v0.2.16...v0.2.17)

---
updated-dependencies:
- dependency-name: xhtml2pdf
  dependency-version: 0.2.17
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-21 09:57:49 +02:00
dependabot[bot] 6bd047fda3 Bump pytest-cov from 6.0.0 to 6.2.1 (#2115)
Bumps [pytest-cov](https://github.com/pytest-dev/pytest-cov) from 6.0.0 to 6.2.1.
- [Changelog](https://github.com/pytest-dev/pytest-cov/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest-cov/compare/v6.0.0...v6.2.1)

---
updated-dependencies:
- dependency-name: pytest-cov
  dependency-version: 6.2.1
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-16 14:07:36 +02:00
dependabot[bot] e30cf353a6 Bump pytest-asyncio from 1.0.0 to 1.1.0 (#2114)
Bumps [pytest-asyncio](https://github.com/pytest-dev/pytest-asyncio) from 1.0.0 to 1.1.0.
- [Release notes](https://github.com/pytest-dev/pytest-asyncio/releases)
- [Commits](https://github.com/pytest-dev/pytest-asyncio/compare/v1.0.0...v1.1.0)

---
updated-dependencies:
- dependency-name: pytest-asyncio
  dependency-version: 1.1.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-16 14:03:49 +02:00
dependabot[bot] bd9e48de7c Bump mock from 5.1.0 to 5.2.0 (#2116)
Bumps [mock](https://github.com/testing-cabal/mock) from 5.1.0 to 5.2.0.
- [Changelog](https://github.com/testing-cabal/mock/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/testing-cabal/mock/compare/5.1.0...5.2.0)

---
updated-dependencies:
- dependency-name: mock
  dependency-version: 5.2.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-16 14:03:40 +02:00
dependabot[bot] aec4fef8db Bump soupsieve from 2.6 to 2.7 (#2118)
Bumps [soupsieve](https://github.com/facelessuser/soupsieve) from 2.6 to 2.7.
- [Release notes](https://github.com/facelessuser/soupsieve/releases)
- [Commits](https://github.com/facelessuser/soupsieve/compare/2.6...2.7)

---
updated-dependencies:
- dependency-name: soupsieve
  dependency-version: '2.7'
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-16 14:03:27 +02:00
dependabot[bot] 1da49bd208 Bump coverage from 7.9.2 to 7.10.3 (#2117)
Bumps [coverage](https://github.com/nedbat/coveragepy) from 7.9.2 to 7.10.3.
- [Release notes](https://github.com/nedbat/coveragepy/releases)
- [Changelog](https://github.com/nedbat/coveragepy/blob/master/CHANGES.rst)
- [Commits](https://github.com/nedbat/coveragepy/compare/7.9.2...7.10.3)

---
updated-dependencies:
- dependency-name: coverage
  dependency-version: 7.10.3
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-16 14:03:19 +02:00
dependabot[bot] 6da39cf3d5 Bump pypdf from 5.1.0 to 6.0.0 (#2122)
Bumps [pypdf](https://github.com/py-pdf/pypdf) from 5.1.0 to 6.0.0.
- [Release notes](https://github.com/py-pdf/pypdf/releases)
- [Changelog](https://github.com/py-pdf/pypdf/blob/main/CHANGELOG.md)
- [Commits](https://github.com/py-pdf/pypdf/compare/5.1.0...6.0.0)

---
updated-dependencies:
- dependency-name: pypdf
  dependency-version: 6.0.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-16 14:03:08 +02:00
Soxoj f869eb49ca Updated workflows: added 3.13 to test, updated pypi upload (#2111) 2025-08-10 14:10:06 +02:00
Soxoj bebadb0362 Bump to 0.5.0 (#2108) 2025-08-10 13:10:50 +02:00
dependabot[bot] 495eef6ad5 Bump pytest-rerunfailures from 15.0 to 15.1 (#2030)
Bumps [pytest-rerunfailures](https://github.com/pytest-dev/pytest-rerunfailures) from 15.0 to 15.1.
- [Changelog](https://github.com/pytest-dev/pytest-rerunfailures/blob/master/CHANGES.rst)
- [Commits](https://github.com/pytest-dev/pytest-rerunfailures/compare/15.0...15.1)

---
updated-dependencies:
- dependency-name: pytest-rerunfailures
  dependency-version: '15.1'
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-10 12:58:35 +02:00
dependabot[bot] e1c72bfb94 Bump multidict from 6.1.0 to 6.6.3 (#2034)
Bumps [multidict](https://github.com/aio-libs/multidict) from 6.1.0 to 6.6.3.
- [Release notes](https://github.com/aio-libs/multidict/releases)
- [Changelog](https://github.com/aio-libs/multidict/blob/master/CHANGES.rst)
- [Commits](https://github.com/aio-libs/multidict/compare/v6.1.0...v6.6.3)

---
updated-dependencies:
- dependency-name: multidict
  dependency-version: 6.6.3
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-10 12:54:40 +02:00
dependabot[bot] deb13c9638 Bump asgiref from 3.8.1 to 3.9.1 (#2040)
Bumps [asgiref](https://github.com/django/asgiref) from 3.8.1 to 3.9.1.
- [Changelog](https://github.com/django/asgiref/blob/main/CHANGELOG.txt)
- [Commits](https://github.com/django/asgiref/compare/3.8.1...3.9.1)

---
updated-dependencies:
- dependency-name: asgiref
  dependency-version: 3.9.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-10 12:50:12 +02:00
dependabot[bot] 1e8e1acd58 Bump reportlab from 4.2.5 to 4.4.3 (#2063)
Bumps [reportlab](https://www.reportlab.com/) from 4.2.5 to 4.4.3.

---
updated-dependencies:
- dependency-name: reportlab
  dependency-version: 4.4.3
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-10 12:47:18 +02:00
Soxoj 5e88fd9ba8 Fixed test dialog_adds_site_negative (#2107) 2025-08-10 12:44:05 +02:00
dependabot[bot] 6bc836d6c4 Bump yarl from 1.18.3 to 1.20.1 (#2032)
Bumps [yarl](https://github.com/aio-libs/yarl) from 1.18.3 to 1.20.1.
- [Release notes](https://github.com/aio-libs/yarl/releases)
- [Changelog](https://github.com/aio-libs/yarl/blob/master/CHANGES.rst)
- [Commits](https://github.com/aio-libs/yarl/commits)

---
updated-dependencies:
- dependency-name: yarl
  dependency-version: 1.20.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-10 12:29:46 +02:00
dependabot[bot] 080611c8b9 Bump aiohttp from 3.11.11 to 3.12.14 (#2041)
Bumps [aiohttp](https://github.com/aio-libs/aiohttp) from 3.11.11 to 3.12.14.
- [Release notes](https://github.com/aio-libs/aiohttp/releases)
- [Changelog](https://github.com/aio-libs/aiohttp/blob/master/CHANGES.rst)
- [Commits](https://github.com/aio-libs/aiohttp/compare/v3.11.11...v3.12.14)

---
updated-dependencies:
- dependency-name: aiohttp
  dependency-version: 3.12.14
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-07-18 18:15:14 +02:00
dependabot[bot] c3cf589aed Bump coverage from 7.6.10 to 7.9.2 (#2039)
Bumps [coverage](https://github.com/nedbat/coveragepy) from 7.6.10 to 7.9.2.
- [Release notes](https://github.com/nedbat/coveragepy/releases)
- [Changelog](https://github.com/nedbat/coveragepy/blob/master/CHANGES.rst)
- [Commits](https://github.com/nedbat/coveragepy/compare/7.6.10...7.9.2)

---
updated-dependencies:
- dependency-name: coverage
  dependency-version: 7.9.2
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-07-15 09:39:52 +02:00
dependabot[bot] e01d5caae1 Bump platformdirs from 4.3.6 to 4.3.8 (#2033)
Bumps [platformdirs](https://github.com/tox-dev/platformdirs) from 4.3.6 to 4.3.8.
- [Release notes](https://github.com/tox-dev/platformdirs/releases)
- [Changelog](https://github.com/tox-dev/platformdirs/blob/main/CHANGES.rst)
- [Commits](https://github.com/tox-dev/platformdirs/compare/4.3.6...4.3.8)

---
updated-dependencies:
- dependency-name: platformdirs
  dependency-version: 4.3.8
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-07-13 16:19:40 +02:00
MR-VL d90d8a8ac9 Disable AskFM (#2037) 2025-07-13 16:16:49 +02:00
dependabot[bot] c3ce8a200b Bump typing-extensions from 4.12.2 to 4.14.1 (#2038)
---
updated-dependencies:
- dependency-name: typing-extensions
  dependency-version: 4.14.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-07-13 16:16:16 +02:00
dependabot[bot] 65ea5ceeb1 Bump certifi from 2024.12.14 to 2025.1.31 (#2004)
Bumps [certifi](https://github.com/certifi/python-certifi) from 2024.12.14 to 2025.1.31.
- [Commits](https://github.com/certifi/python-certifi/compare/2024.12.14...2025.01.31)

---
updated-dependencies:
- dependency-name: certifi
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-28 23:52:07 +02:00
dependabot[bot] bca1d4bfd8 Bump attrs from 24.3.0 to 25.3.0 (#2014)
Bumps [attrs](https://github.com/sponsors/hynek) from 24.3.0 to 25.3.0.
- [Commits](https://github.com/sponsors/hynek/commits)

---
updated-dependencies:
- dependency-name: attrs
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-28 23:46:19 +02:00
Darlyson Rangel c9e38632ca Disable ICQ site (#1993) 2025-06-28 23:46:09 +02:00
dependabot[bot] 5f8ce2da98 Bump urllib3 from 2.2.3 to 2.5.0 (#2027)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.2.3 to 2.5.0.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.2.3...2.5.0)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-version: 2.5.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-28 23:43:19 +02:00
dependabot[bot] bc6f7f831d Bump pytest-asyncio from 0.25.2 to 0.26.0 (#2016)
Bumps [pytest-asyncio](https://github.com/pytest-dev/pytest-asyncio) from 0.25.2 to 0.26.0.
- [Release notes](https://github.com/pytest-dev/pytest-asyncio/releases)
- [Commits](https://github.com/pytest-dev/pytest-asyncio/compare/v0.25.2...v0.26.0)

---
updated-dependencies:
- dependency-name: pytest-asyncio
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-28 23:43:08 +02:00
dependabot[bot] f95c71d009 Bump pycares from 4.5.0 to 4.9.0 (#2025)
Bumps [pycares](https://github.com/saghul/pycares) from 4.5.0 to 4.9.0.
- [Release notes](https://github.com/saghul/pycares/releases)
- [Changelog](https://github.com/saghul/pycares/blob/master/ChangeLog)
- [Commits](https://github.com/saghul/pycares/compare/v4.5.0...v4.9.0)

---
updated-dependencies:
- dependency-name: pycares
  dependency-version: 4.9.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-28 23:39:52 +02:00
dependabot[bot] 974c93f327 Bump requests from 2.32.3 to 2.32.4 (#2026)
Bumps [requests](https://github.com/psf/requests) from 2.32.3 to 2.32.4.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](https://github.com/psf/requests/compare/v2.32.3...v2.32.4)

---
updated-dependencies:
- dependency-name: requests
  dependency-version: 2.32.4
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-28 23:39:39 +02:00
dependabot[bot] ed7b65e5ed Bump flask from 3.1.0 to 3.1.1 (#2028)
Bumps [flask](https://github.com/pallets/flask) from 3.1.0 to 3.1.1.
- [Release notes](https://github.com/pallets/flask/releases)
- [Changelog](https://github.com/pallets/flask/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/flask/compare/3.1.0...3.1.1)

---
updated-dependencies:
- dependency-name: flask
  dependency-version: 3.1.1
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-28 23:33:49 +02:00
Pierre-Yves Lapersonne f76ea5d738 [#2010] Add 6 more websites to manage (#2009)
* feat: add `framapiaf.org` in supported web sites, add tag `mastodon` (#2010)

Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info>

* feat: add `write.as` in supported web sites, add tag `writefreely` (#2010)

Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info>

* feat: add `programming.dev` in supported web sites, add tag `lemmy` (#2010)

Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info>

* feat: add `mamot.fr` in supported web sites (#2010)

Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info>

* feat: add `pixelfed.social` in supported web sites, add tag `pixelfed` (#2010)

Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info>

* feat: add `Outgress` in supported web sites (#2010)

Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info>

* Updated the list of supported sites

---------

Signed-off-by: Pierre-Yves Lapersonne <dev@pylapersonne.info>
Co-authored-by: Soxoj <soxoj@protonmail.com>
2025-06-28 23:33:29 +02:00
dependabot[bot] 960b28d454 Bump jinja2 from 3.1.5 to 3.1.6 (#2011)
Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.5 to 3.1.6.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/jinja/compare/3.1.5...3.1.6)

---
updated-dependencies:
- dependency-name: jinja2
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-28 23:24:46 +02:00
dependabot[bot] 09eef6701a Bump cryptography from 44.0.0 to 44.0.1 (#2005)
Bumps [cryptography](https://github.com/pyca/cryptography) from 44.0.0 to 44.0.1.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/44.0.0...44.0.1)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-28 23:24:32 +02:00
Sammy Folkhome 329ef27eff Update Installer.bat (#1994)
Greatly improved the installer overall and fixed issue of newer versions of pip not installing packages
2025-06-28 23:23:11 +02:00
dependabot[bot] ccb3b3bbd1 Bump black from 24.10.0 to 25.1.0 (#2001)
Bumps [black](https://github.com/psf/black) from 24.10.0 to 25.1.0.
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/compare/24.10.0...25.1.0)

---
updated-dependencies:
- dependency-name: black
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-28 23:20:13 +02:00
pykereaper b21ac36b27 Fix usage of data.json files from web (#2020) 2025-06-28 23:20:02 +02:00
pykereaper 0f7aa2c456 Pass db_file configuration to web interface (#2019)
* pass db_file configuration to web interface
* Autoformatting

---------

Co-authored-by: Soxoj <soxoj@protonmail.com>
2025-06-28 23:15:56 +02:00
Soxoj c0e60e25b8 upload-artifact action in python test workflow updated to v4 (#2024)
* upload-artifact action in python test workflow updated to v4
* Upload artifacts for all jobs
2025-06-28 23:08:56 +02:00
Ikko Eltociear Ashimine 4195a3ca21 docs: update usage-examples.rst (#1996)
inital -> initial
2025-02-18 10:50:29 +01:00
dependabot[bot] 5b3b81b482 Bump pytest-asyncio from 0.25.1 to 0.25.2 (#1990)
Bumps [pytest-asyncio](https://github.com/pytest-dev/pytest-asyncio) from 0.25.1 to 0.25.2.
- [Release notes](https://github.com/pytest-dev/pytest-asyncio/releases)
- [Commits](https://github.com/pytest-dev/pytest-asyncio/compare/v0.25.1...v0.25.2)

---
updated-dependencies:
- dependency-name: pytest-asyncio
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-01-19 16:18:40 +01:00
dependabot[bot] 29d2c07a76 Bump mypy from 1.14.0 to 1.14.1 (#1988)
Bumps [mypy](https://github.com/python/mypy) from 1.14.0 to 1.14.1.
- [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/python/mypy/compare/v1.14.0...v1.14.1)

---
updated-dependencies:
- dependency-name: mypy
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-01-02 21:10:20 +05:00
dependabot[bot] 7ff2424de1 Bump pytest-asyncio from 0.25.0 to 0.25.1 (#1989)
Bumps [pytest-asyncio](https://github.com/pytest-dev/pytest-asyncio) from 0.25.0 to 0.25.1.
- [Release notes](https://github.com/pytest-dev/pytest-asyncio/releases)
- [Commits](https://github.com/pytest-dev/pytest-asyncio/compare/v0.25.0...v0.25.1)

---
updated-dependencies:
- dependency-name: pytest-asyncio
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-01-02 21:07:27 +05:00
dependabot[bot] fc1dd9380e Bump coverage from 7.6.9 to 7.6.10 (#1986)
Bumps [coverage](https://github.com/nedbat/coveragepy) from 7.6.9 to 7.6.10.
- [Release notes](https://github.com/nedbat/coveragepy/releases)
- [Changelog](https://github.com/nedbat/coveragepy/blob/master/CHANGES.rst)
- [Commits](https://github.com/nedbat/coveragepy/compare/7.6.9...7.6.10)

---
updated-dependencies:
- dependency-name: coverage
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-28 20:02:37 +01:00
dependabot[bot] e423d72576 Bump jinja2 from 3.1.4 to 3.1.5 (#1982)
Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.4 to 3.1.5.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/jinja/compare/3.1.4...3.1.5)

---
updated-dependencies:
- dependency-name: jinja2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-28 19:59:41 +01:00
dependabot[bot] 9bc6c3370c Bump aiohttp-socks from 0.10.0 to 0.10.1 (#1987)
Bumps [aiohttp-socks](https://github.com/romis2012/aiohttp-socks) from 0.10.0 to 0.10.1.
- [Release notes](https://github.com/romis2012/aiohttp-socks/releases)
- [Commits](https://github.com/romis2012/aiohttp-socks/compare/v0.10.0...v0.10.1)

---
updated-dependencies:
- dependency-name: aiohttp-socks
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-28 19:56:12 +01:00
dependabot[bot] b90cdb1981 Bump mypy from 1.13.0 to 1.14.0 (#1983)
Bumps [mypy](https://github.com/python/mypy) from 1.13.0 to 1.14.0.
- [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/python/mypy/compare/v1.13.0...v1.14.0)

---
updated-dependencies:
- dependency-name: mypy
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-26 12:28:44 +01:00
dependabot[bot] 21b35e3798 Bump aiohttp-socks from 0.9.1 to 0.10.0 (#1985)
Bumps [aiohttp-socks](https://github.com/romis2012/aiohttp-socks) from 0.9.1 to 0.10.0.
- [Release notes](https://github.com/romis2012/aiohttp-socks/releases)
- [Commits](https://github.com/romis2012/aiohttp-socks/compare/v0.9.1...v0.10.0)

---
updated-dependencies:
- dependency-name: aiohttp-socks
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-26 11:43:41 +01:00
dependabot[bot] b8cf91cc8b Bump psutil from 6.1.0 to 6.1.1 (#1980)
Bumps [psutil](https://github.com/giampaolo/psutil) from 6.1.0 to 6.1.1.
- [Changelog](https://github.com/giampaolo/psutil/blob/master/HISTORY.rst)
- [Commits](https://github.com/giampaolo/psutil/compare/release-6.1.0...release-6.1.1)

---
updated-dependencies:
- dependency-name: psutil
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-20 10:03:52 +01:00
dependabot[bot] 8d5e557720 Bump aiohttp from 3.11.10 to 3.11.11 (#1979)
Bumps [aiohttp](https://github.com/aio-libs/aiohttp) from 3.11.10 to 3.11.11.
- [Release notes](https://github.com/aio-libs/aiohttp/releases)
- [Changelog](https://github.com/aio-libs/aiohttp/blob/master/CHANGES.rst)
- [Commits](https://github.com/aio-libs/aiohttp/compare/v3.11.10...v3.11.11)

---
updated-dependencies:
- dependency-name: aiohttp
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-19 12:04:31 +01:00
Soxoj 97e5f600d0 Async generator-executor for site checks (#1978) 2024-12-17 22:48:11 +01:00
overcuriousity 36ce285572 make graph more meaningful (#1977)
* make graph more meaningful

if a search with multiple usernames is launched, it creates an additional site node where they both are found. 
advantages:
- better recognition, that users have a connection with each other
- better detection of false positives when launching a search with two fake usernames (site node = definite false positive)

* fix Graph linking report.py
2024-12-17 16:51:19 +01:00
overcuriousity c2e3e96cb7 Improving the web interface (#1975)
* update web interface with commandline options
* improve web interface
* update README images of web interface
* fix bug in app.py
* fix web interface
2024-12-17 16:50:49 +01:00
imgbot[bot] 900ed840b3 [ImgBot] Optimize images (#1974)
*Total -- 2,762.12kb -> 2,382.30kb (13.75%)

/static/web_interface_screenshot_start.png -- 106.23kb -> 59.10kb (44.37%)
/docs/source/maigret_screenshot.png -- 375.21kb -> 233.98kb (37.64%)
/static/web_interface_screenshot.png -- 615.65kb -> 424.23kb (31.09%)
/static/recursive_search.svg -- 1,665.02kb -> 1,664.99kb (0%)

Signed-off-by: ImgBotApp <ImgBotHelp@gmail.com>
Co-authored-by: ImgBotApp <ImgBotHelp@gmail.com>
2024-12-16 19:24:08 +01:00
Soxoj c3dfe9cb4d Small docs and parameters fixes for web interface mode (#1973) 2024-12-16 17:18:22 +01:00
Soxoj 4894a267d7 Added web interface docs (#1972) 2024-12-16 17:06:06 +01:00
dependabot[bot] 984584f87d Bump attrs from 24.2.0 to 24.3.0 (#1970)
Bumps [attrs](https://github.com/sponsors/hynek) from 24.2.0 to 24.3.0.
- [Commits](https://github.com/sponsors/hynek/commits)

---
updated-dependencies:
- dependency-name: attrs
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-16 14:46:22 +01:00
dependabot[bot] a96d574000 Bump certifi from 2024.8.30 to 2024.12.14 (#1969)
Bumps [certifi](https://github.com/certifi/python-certifi) from 2024.8.30 to 2024.12.14.
- [Commits](https://github.com/certifi/python-certifi/compare/2024.08.30...2024.12.14)

---
updated-dependencies:
- dependency-name: certifi
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-16 14:27:20 +01:00
overcuriousity 88d68490f3 Created web frontend launched via --web flag (#1967)
Author: overcuriousity 
Co-authored-by: Soxoj <soxoj@protonmail.com>
2024-12-16 14:24:03 +01:00
Soxoj cb01535565 Preparation of 0.5.0 alpha version (#1966) 2024-12-13 12:51:31 +01:00
Soxoj c4af0a4df0 Fixed flaky tests to check cookies (#1965) 2024-12-13 12:37:58 +01:00
Soxoj f113c3d21a Merge pull request #1963 from soxoj/dependabot/pip/pytest-asyncio-0.25.0
Bump pytest-asyncio from 0.24.0 to 0.25.0
2024-12-13 11:25:15 +01:00
dependabot[bot] 4c7552ef88 Bump pytest-asyncio from 0.24.0 to 0.25.0
Bumps [pytest-asyncio](https://github.com/pytest-dev/pytest-asyncio) from 0.24.0 to 0.25.0.
- [Release notes](https://github.com/pytest-dev/pytest-asyncio/releases)
- [Commits](https://github.com/pytest-dev/pytest-asyncio/compare/v0.24.0...v0.25.0)

---
updated-dependencies:
- dependency-name: pytest-asyncio
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-12-13 06:54:21 +00:00
Soxoj b2283a5b04 Merge pull request #1961 from overcuriousity/main
fix bad linux filename generation
2024-12-12 22:07:21 +01:00
Soxoj 1ed0c61b56 Merge pull request #1962 from soxoj/site-fixes-121224
Site check fixes
2024-12-12 21:56:45 +01:00
Soxoj f212bc9bc8 Site check fixes 2024-12-12 21:39:35 +01:00
overcuriousity b8c62f95ae fix bad linux filename generation
currently maigret parses urls as usernames related to gravatar. this leads to bad filenames of the output on my linux host, as the slashes cause it to try to write subfolders, causing the script to abort with the error "file does not exist".
Applied a simple fix to replace all "/" with "_" in output file generation.
2024-12-12 15:00:54 +01:00
Soxoj 2653c617f8 Merge pull request #1958 from soxoj/gravatar-pypi-fix
Fixed Gravatar parsing (socid_extractor)
2024-12-12 02:32:35 +01:00
Soxoj 4dd82bf4c9 Fixed Gravatar parsing (socid_extractor) 2024-12-12 02:30:29 +01:00
Soxoj 33588ff090 Merge pull request #1957 from eltociear/patch-1
chore: update submit.py
2024-12-12 01:08:44 +01:00
Ikko Eltociear Ashimine f8ab484cd2 chore: update submit.py
futher -> further
2024-12-11 23:23:45 +09:00
Soxoj 2c39cd0646 Merge pull request #1956 from soxoj/submit-improvements-sitefixes
* Fixed Vimeo, activation/probing mechanisms improvements
* Updated CNET, DailyMotion
2024-12-11 01:23:01 +01:00
Soxoj 64ae391a4a Updated Vimeo, CNET, DailyMotion 2024-12-11 01:17:20 +01:00
Soxoj 127d9032c3 Fixed Vimeo, activation/probing mechanisms improvements 2024-12-11 00:56:00 +01:00
Soxoj 81a817a39f Improved "submit new site" mode, added tests, fixed top-500 sites (#1952) 2024-12-10 18:02:43 +01:00
Soxoj 51ab988e36 Fixed ProductHunt check (#1951) 2024-12-09 17:06:03 +01:00
Soxoj 5517636850 Updated OP.GG checks (#1950)
* Updated OP.GG checks
* Finalized LoL, added Valorant, disabled Archive.org
2024-12-09 15:59:19 +01:00
Soxoj 2be6e02800 Update README.md (#1949) 2024-12-09 13:01:31 +01:00
Soxoj 4eada16b94 Added a test for submitter (#1944) 2024-12-08 13:35:27 +01:00
Soxoj c66d776f8a Refactoring, test coverage increased to 60% (#1943) 2024-12-08 02:13:28 +01:00
Soxoj 4b1317789d Refactored self-check method, code formatting, small lint fixes (#1942) 2024-12-07 18:05:30 +01:00
Soxoj 8b7d8073d9 Fixed Linktr and discourse.mozilla.org (#1941) 2024-12-07 17:11:39 +01:00
Soxoj 2aa1ea39a0 Site fixes (#1940) 2024-12-06 14:27:38 +01:00
Soxoj cd789ed138 Fixed Ebay and BongaCams checks (#1939) 2024-12-06 13:32:51 +01:00
Soxoj 5641456ba0 Weibo site check fix, activation mechanism added (#1938) 2024-12-06 11:31:20 +01:00
dependabot[bot] 29c1f56fcb Bump aiohttp from 3.11.9 to 3.11.10 (#1937)
Bumps [aiohttp](https://github.com/aio-libs/aiohttp) from 3.11.9 to 3.11.10.
- [Release notes](https://github.com/aio-libs/aiohttp/releases)
- [Changelog](https://github.com/aio-libs/aiohttp/blob/master/CHANGES.rst)
- [Commits](https://github.com/aio-libs/aiohttp/compare/v3.11.9...v3.11.10)

---
updated-dependencies:
- dependency-name: aiohttp
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-06 11:31:09 +01:00
Soxoj f4edab8946 Readme/docs update based on GH discussions (#1936) 2024-12-06 01:54:26 +01:00
Soxoj f04de78682 Activation mechanism documentation added (#1935)
Few site checks fixed
2024-12-06 01:35:19 +01:00
dependabot[bot] 260b80c2f1 Bump six from 1.16.0 to 1.17.0 (#1933)
Bumps [six](https://github.com/benjaminp/six) from 1.16.0 to 1.17.0.
- [Changelog](https://github.com/benjaminp/six/blob/main/CHANGES)
- [Commits](https://github.com/benjaminp/six/compare/1.16.0...1.17.0)

---
updated-dependencies:
- dependency-name: six
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-05 15:14:25 +01:00
Soxoj cb9f01c106 Fixed Figma check (#1932)
Fixed cookies bug
Improved self-check mode: don't disable sites because of check errors
2024-12-04 19:21:27 +01:00
Soxoj e701c881a1 Documentation update (#1931) 2024-12-04 16:58:56 +01:00
Soxoj d78aa02833 Updated workflow (#1930) 2024-12-04 15:52:27 +01:00
Soxoj 4e54a9b496 Put Windows executable in Releases for each dev and main commit (#1929) 2024-12-04 15:27:28 +01:00
Soxoj 1cb25946dd Disabled Figma check (#1928) 2024-12-04 00:27:55 +01:00
Soxoj e982be4109 Installation docs update (#1927) 2024-12-03 20:23:49 +01:00
dependabot[bot] 1a8bbe7ff8 Bump pywin32-ctypes from 0.2.1 to 0.2.3 (#1924)
Bumps [pywin32-ctypes](https://github.com/enthought/pywin32-ctypes) from 0.2.1 to 0.2.3.
- [Release notes](https://github.com/enthought/pywin32-ctypes/releases)
- [Changelog](https://github.com/enthought/pywin32-ctypes/blob/main/CHANGELOG.txt)
- [Commits](https://github.com/enthought/pywin32-ctypes/compare/v0.2.1...v0.2.3)

---
updated-dependencies:
- dependency-name: pywin32-ctypes
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-03 19:53:22 +01:00
dependabot[bot] 0ec9fc9027 Bump mock from 4.0.3 to 5.1.0 (#1921)
Bumps [mock](https://github.com/testing-cabal/mock) from 4.0.3 to 5.1.0.
- [Changelog](https://github.com/testing-cabal/mock/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/testing-cabal/mock/compare/4.0.3...5.1.0)

---
updated-dependencies:
- dependency-name: mock
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-03 19:50:58 +01:00
Soxoj 07a7a474f8 Documentation update (#1926) 2024-12-03 17:25:17 +01:00
dependabot[bot] ce84f8d046 Bump pytest-asyncio from 0.23.8 to 0.24.0 (#1925)
Bumps [pytest-asyncio](https://github.com/pytest-dev/pytest-asyncio) from 0.23.8 to 0.24.0.
- [Release notes](https://github.com/pytest-dev/pytest-asyncio/releases)
- [Commits](https://github.com/pytest-dev/pytest-asyncio/compare/v0.23.8...v0.24.0)

---
updated-dependencies:
- dependency-name: pytest-asyncio
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-03 16:50:26 +01:00
dependabot[bot] 82f494495c Bump yarl from 1.18.0 to 1.18.3 (#1922)
---
updated-dependencies:
- dependency-name: yarl
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-02 19:53:09 +01:00
dependabot[bot] 779ec87659 Bump pytest from 7.4.4 to 8.3.4 (#1923)
Bumps [pytest](https://github.com/pytest-dev/pytest) from 7.4.4 to 8.3.4.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/7.4.4...8.3.4)

---
updated-dependencies:
- dependency-name: pytest
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-02 19:00:41 +01:00
dependabot[bot] d5d4242015 Bump aiohttp from 3.11.8 to 3.11.9 (#1920)
Bumps [aiohttp](https://github.com/aio-libs/aiohttp) from 3.11.8 to 3.11.9.
- [Release notes](https://github.com/aio-libs/aiohttp/releases)
- [Changelog](https://github.com/aio-libs/aiohttp/blob/master/CHANGES.rst)
- [Commits](https://github.com/aio-libs/aiohttp/compare/v3.11.8...v3.11.9)

---
updated-dependencies:
- dependency-name: aiohttp
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-02 18:14:07 +01:00
Soxoj 2f93963a0a Refactored sites module, updated documentation (#1918) 2024-12-01 11:41:41 +01:00
Soxoj 5073ceff13 Update README.md (#1919) 2024-12-01 11:40:02 +01:00
Soxoj d15e12750b Sites fixes (#1917)
* Some sites fixes

* Sites stats updated
2024-12-01 03:19:36 +01:00
dependabot[bot] 0c7e3898e8 Bump attrs from 22.2.0 to 24.2.0 (#1913)
Bumps [attrs](https://github.com/sponsors/hynek) from 22.2.0 to 24.2.0.
- [Commits](https://github.com/sponsors/hynek/commits)

---
updated-dependencies:
- dependency-name: attrs
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-30 16:23:19 +01:00
dependabot[bot] 03089613dc Bump pytest-rerunfailures from 12.0 to 15.0 (#1911)
Bumps [pytest-rerunfailures](https://github.com/pytest-dev/pytest-rerunfailures) from 12.0 to 15.0.
- [Changelog](https://github.com/pytest-dev/pytest-rerunfailures/blob/master/CHANGES.rst)
- [Commits](https://github.com/pytest-dev/pytest-rerunfailures/compare/12.0...15.0)

---
updated-dependencies:
- dependency-name: pytest-rerunfailures
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-29 17:09:48 +01:00
Soxoj 21a8459b18 An recursive search animation in README has been updated (#1915) 2024-11-29 15:15:52 +01:00
dependabot[bot] 7f1f349300 Bump async-timeout from 4.0.3 to 5.0.1 (#1909)
Bumps [async-timeout](https://github.com/aio-libs/async-timeout) from 4.0.3 to 5.0.1.
- [Release notes](https://github.com/aio-libs/async-timeout/releases)
- [Changelog](https://github.com/aio-libs/async-timeout/blob/master/CHANGES.rst)
- [Commits](https://github.com/aio-libs/async-timeout/compare/v4.0.3...v5.0.1)

---
updated-dependencies:
- dependency-name: async-timeout
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-29 14:09:39 +01:00
dependabot[bot] 258f30ec5c Bump aiohttp from 3.11.7 to 3.11.8 (#1912)
Bumps [aiohttp](https://github.com/aio-libs/aiohttp) from 3.11.7 to 3.11.8.
- [Release notes](https://github.com/aio-libs/aiohttp/releases)
- [Changelog](https://github.com/aio-libs/aiohttp/blob/master/CHANGES.rst)
- [Commits](https://github.com/aio-libs/aiohttp/compare/v3.11.7...v3.11.8)

---
updated-dependencies:
- dependency-name: aiohttp
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-29 13:18:53 +01:00
Soxoj e96d09dee7 Permutator output and documentation updates (#1914) 2024-11-29 13:15:03 +01:00
dependabot[bot] ff06029253 Bump alive-progress from 2.4.1 to 3.2.0 (#1910)
Bumps [alive-progress](https://github.com/rsalmei/alive-progress) from 2.4.1 to 3.2.0.
- [Changelog](https://github.com/rsalmei/alive-progress/blob/main/CHANGELOG.md)
- [Commits](https://github.com/rsalmei/alive-progress/commits/v3.2.0)

---
updated-dependencies:
- dependency-name: alive-progress
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-29 12:15:34 +01:00
Soxoj 15702bd9f4 Fixed dateutil parsing error for CDT timezone (#1907) 2024-11-29 12:02:41 +01:00
Soxoj 909a7e6a91 A new logo added (#1906)
* New cool logo
* Badges updated
* Increased the size of logo
2024-11-27 15:30:41 +01:00
Soxoj 2e2a47a12b Close http connections (#1595) (#1905) 2024-11-27 15:28:10 +01:00
dependabot[bot] 6170f07154 Bump pyvis from 0.2.1 to 0.3.2 (#1893)
Bumps [pyvis](https://github.com/WestHealth/pyvis) from 0.2.1 to 0.3.2.
- [Release notes](https://github.com/WestHealth/pyvis/releases)
- [Commits](https://github.com/WestHealth/pyvis/compare/v0.2.1...v0.3.2)

---
updated-dependencies:
- dependency-name: pyvis
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-27 13:47:52 +01:00
dependabot[bot] 3ad9bb59ce Bump pytest-cov from 4.1.0 to 6.0.0 (#1902)
Bumps [pytest-cov](https://github.com/pytest-dev/pytest-cov) from 4.1.0 to 6.0.0.
- [Changelog](https://github.com/pytest-dev/pytest-cov/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest-cov/compare/v4.1.0...v6.0.0)

---
updated-dependencies:
- dependency-name: pytest-cov
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-27 13:31:32 +01:00
dependabot[bot] c00b864017 Bump pycountry from 23.12.11 to 24.6.1 (#1903)
Bumps [pycountry](https://github.com/flyingcircusio/pycountry) from 23.12.11 to 24.6.1.
- [Changelog](https://github.com/pycountry/pycountry/blob/main/HISTORY.txt)
- [Commits](https://github.com/flyingcircusio/pycountry/compare/23.12.11...24.6.1)

---
updated-dependencies:
- dependency-name: pycountry
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-27 10:38:41 +01:00
dependabot[bot] 404c0376d3 Bump aiohttp-socks from 0.7.1 to 0.9.1 (#1900)
Bumps [aiohttp-socks](https://github.com/romis2012/aiohttp-socks) from 0.7.1 to 0.9.1.
- [Release notes](https://github.com/romis2012/aiohttp-socks/releases)
- [Commits](https://github.com/romis2012/aiohttp-socks/compare/v0.7.1...v0.9.1)

---
updated-dependencies:
- dependency-name: aiohttp-socks
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-27 10:16:41 +01:00
Soxoj 8a98aa9eaa Retries set to 0 by default, refactored code of executor with progress (#1899)
* Retries set to 0 by default, refactored code of executor with progress
2024-11-26 19:07:15 +01:00
dependabot[bot] 80cf70d151 Bump markupsafe from 2.1.5 to 3.0.2 (#1895)
Bumps [markupsafe](https://github.com/pallets/markupsafe) from 2.1.5 to 3.0.2.
- [Release notes](https://github.com/pallets/markupsafe/releases)
- [Changelog](https://github.com/pallets/markupsafe/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/markupsafe/compare/2.1.5...3.0.2)

---
updated-dependencies:
- dependency-name: markupsafe
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-26 16:19:05 +01:00
Soxoj ee25c61fc2 Maigret bot support (custom progress function fixed) (#1898)
* Fixed progress/close functions
* Fixed tests: execution time increased with alive_progressbar
2024-11-26 15:54:26 +01:00
Soxoj 324c118530 Parallel execution optimization (#1897)
* Connection failure fix: removed futures, added semaphores

* Additional fixes

* Tqdm replace to alive_progress, poetry update

* Self-check mode fix, tests fixes

* Sites checks fixes (#1896)

* Fixed incorrect site names, added method to compare sites
2024-11-26 13:55:12 +01:00
Soxoj b370bc4c44 Sites checks fixes (#1896)
Fixed incorrect site names, added method to compare sites
2024-11-26 13:29:43 +01:00
dependabot[bot] f529d16c62 Bump python-bidi from 0.4.2 to 0.6.3 (#1886)
Bumps [python-bidi](https://github.com/MeirKriheli/python-bidi) from 0.4.2 to 0.6.3.
- [Release notes](https://github.com/MeirKriheli/python-bidi/releases)
- [Changelog](https://github.com/MeirKriheli/python-bidi/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/MeirKriheli/python-bidi/compare/v0.4.2...v0.6.3)

---
updated-dependencies:
- dependency-name: python-bidi
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-25 21:28:33 +01:00
Soxoj 886fdc82d6 Pyinstaller bump & pefile fix (#1890)
Pinned pefile version
2024-11-25 21:23:38 +01:00
dependabot[bot] 10950332a1 Bump pytest-asyncio from 0.23.7 to 0.23.8 (#1885)
Bumps [pytest-asyncio](https://github.com/pytest-dev/pytest-asyncio) from 0.23.7 to 0.23.8.
- [Release notes](https://github.com/pytest-dev/pytest-asyncio/releases)
- [Commits](https://github.com/pytest-dev/pytest-asyncio/compare/v0.23.7...v0.23.8)

---
updated-dependencies:
- dependency-name: pytest-asyncio
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-25 20:30:09 +01:00
dependabot[bot] 4d87adc0c8 Bump pyinstaller from 6.1 to 6.11.1 (#1882)
Bumps [pyinstaller](https://github.com/pyinstaller/pyinstaller) from 6.1 to 6.11.1.
- [Release notes](https://github.com/pyinstaller/pyinstaller/releases)
- [Changelog](https://github.com/pyinstaller/pyinstaller/blob/develop/doc/CHANGES.rst)
- [Commits](https://github.com/pyinstaller/pyinstaller/compare/v6.1.0...v6.11.1)

---
updated-dependencies:
- dependency-name: pyinstaller
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-25 19:30:17 +01:00
Soxoj 13c20afe5b Improved self-check mode (#1887) 2024-11-25 18:27:59 +01:00
Soxoj d8a05807ba New sites added (#1888) 2024-11-25 18:24:20 +01:00
dependabot[bot] 089d33b88b Bump lxml from 4.9.4 to 5.3.0 (#1884)
Bumps [lxml](https://github.com/lxml/lxml) from 4.9.4 to 5.3.0.
- [Release notes](https://github.com/lxml/lxml/releases)
- [Changelog](https://github.com/lxml/lxml/blob/master/CHANGES.txt)
- [Commits](https://github.com/lxml/lxml/compare/lxml-4.9.4...lxml-5.3.0)

---
updated-dependencies:
- dependency-name: lxml
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-25 14:42:12 +01:00
dependabot[bot] b3b84c633a Bump pefile from 2022.5.30 to 2024.8.26 (#1883)
Bumps [pefile](https://github.com/erocarrera/pefile) from 2022.5.30 to 2024.8.26.
- [Release notes](https://github.com/erocarrera/pefile/releases)
- [Commits](https://github.com/erocarrera/pefile/compare/v2022.5.30...v2024.8.26)

---
updated-dependencies:
- dependency-name: pefile
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-25 14:41:58 +01:00
Soxoj 86d51bced0 Added 7 sites, implemented integration with Marple, docs update (#1881)
* Added 5 sites, implemented integration with Marple

* Added 2 more sites, updated docs

* Updated sites list
2024-11-25 14:41:34 +01:00
Soxoj 54b864f167 Disabled unavailable sites (#1880) 2024-11-24 17:19:31 +01:00
Soxoj 54fecccbfb Show detailed error statistics for -v (#1879) 2024-11-24 04:21:24 +01:00
Soxoj 3745711b12 Added new badges to README (#1877) 2024-11-23 22:12:29 +01:00
dependabot[bot] 25bc88a438 Bump aiohttp from 3.9.5 to 3.10.5 (#1721)
Bumps [aiohttp](https://github.com/aio-libs/aiohttp) from 3.9.5 to 3.10.5.
- [Release notes](https://github.com/aio-libs/aiohttp/releases)
- [Changelog](https://github.com/aio-libs/aiohttp/blob/master/CHANGES.rst)
- [Commits](https://github.com/aio-libs/aiohttp/compare/v3.9.5...v3.10.5)

---
updated-dependencies:
- dependency-name: aiohttp
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-23 21:48:25 +01:00
Soxoj 9b0212d7c7 Fixed test for aiohttp 3.10 (#1876) 2024-11-23 21:42:34 +01:00
dependabot[bot] ceaf8cd9aa Bump certifi from 2023.11.17 to 2024.8.30 (#1840)
Bumps [certifi](https://github.com/certifi/python-certifi) from 2023.11.17 to 2024.8.30.
- [Commits](https://github.com/certifi/python-certifi/compare/2023.11.17...2024.08.30)

---
updated-dependencies:
- dependency-name: certifi
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-23 21:29:52 +01:00
dependabot[bot] 0c3ae98fd1 Bump urllib3 from 2.2.1 to 2.2.2 (#1600)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.2.1 to 2.2.2.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.2.1...2.2.2)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-23 21:18:46 +01:00
dependabot[bot] f0f64075ad Bump future from 0.18.3 to 1.0.0 (#1545)
Bumps [future](https://github.com/PythonCharmers/python-future) from 0.18.3 to 1.0.0.
- [Release notes](https://github.com/PythonCharmers/python-future/releases)
- [Changelog](https://github.com/PythonCharmers/python-future/blob/master/docs/changelog.rst)
- [Commits](https://github.com/PythonCharmers/python-future/compare/v0.18.3...v1.0.0)

---
updated-dependencies:
- dependency-name: future
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-23 21:17:43 +01:00
dependabot[bot] 2fae5bb340 Bump flake8 from 6.1.0 to 7.1.1 (#1692)
Bumps [flake8](https://github.com/pycqa/flake8) from 6.1.0 to 7.1.1.
- [Commits](https://github.com/pycqa/flake8/compare/6.1.0...7.1.1)

---
updated-dependencies:
- dependency-name: flake8
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-23 21:12:21 +01:00
dependabot[bot] 9287734a24 Bump psutil from 5.9.5 to 6.1.0 (#1839)
Bumps [psutil](https://github.com/giampaolo/psutil) from 5.9.5 to 6.1.0.
- [Changelog](https://github.com/giampaolo/psutil/blob/master/HISTORY.rst)
- [Commits](https://github.com/giampaolo/psutil/compare/release-5.9.5...release-6.1.0)

---
updated-dependencies:
- dependency-name: psutil
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-23 21:08:41 +01:00
Soxoj ff46d880cb Added GitHub and BuyMeACoffe sponsorships (#1875) 2024-11-23 19:53:34 +01:00
Soxoj f78c93eaca Added .readthedocs.yaml, fixed Pyinstaller and Docker workflows (#1874) 2024-11-23 19:11:25 +01:00
dependabot[bot] 1ff75403cd Bump werkzeug from 3.0.3 to 3.0.6 (#1846)
Bumps [werkzeug](https://github.com/pallets/werkzeug) from 3.0.3 to 3.0.6.
- [Release notes](https://github.com/pallets/werkzeug/releases)
- [Changelog](https://github.com/pallets/werkzeug/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/werkzeug/compare/3.0.3...3.0.6)

---
updated-dependencies:
- dependency-name: werkzeug
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-23 18:56:37 +01:00
dependabot[bot] 0dc8e52662 Bump requests-futures from 1.0.1 to 1.0.2 (#1868)
Bumps [requests-futures](https://github.com/ross/requests-futures) from 1.0.1 to 1.0.2.
- [Changelog](https://github.com/ross/requests-futures/blob/main/CHANGELOG.md)
- [Commits](https://github.com/ross/requests-futures/compare/v1.0.1...v1.0.2)

---
updated-dependencies:
- dependency-name: requests-futures
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-23 18:56:22 +01:00
dependabot[bot] 7c1f8a30ad Bump cryptography from 42.0.7 to 43.0.1 (#1870)
Bumps [cryptography](https://github.com/pyca/cryptography) from 42.0.7 to 43.0.1.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/42.0.7...43.0.1)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-23 18:56:12 +01:00
Soxoj 24e545b62c Added dev documentation, fixed some sites, removed GitHub issue links from reports (#1869) 2024-11-23 18:45:56 +01:00
dependabot[bot] 4331b5f532 Bump soupsieve from 2.5 to 2.6 (#1708)
Bumps [soupsieve](https://github.com/facelessuser/soupsieve) from 2.5 to 2.6.
- [Release notes](https://github.com/facelessuser/soupsieve/releases)
- [Commits](https://github.com/facelessuser/soupsieve/compare/2.5...2.6)

---
updated-dependencies:
- dependency-name: soupsieve
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-10-21 15:58:31 +02:00
synth 05db32f28f Fixed 1 site, PyInstaller workflow, Google Colab example (#1558)
* Updated example colab file (Due to latest update)

* Fix RobertsSpaceIndustries URI

* Fix PyInstaller workflow

* Fix example.ipynb (read desc.)

Currently the version installed via pip3 doesn't appear to contain the latest data.json file, resulting in many false positives..

* Fix non-existant users (read desc.)

Fixed non-existant usernames for the following:
Telegram (t.me)
TikBuddy (tikbuddy.com)
FurAffinity (furaffinity.net)
2024-10-21 15:58:16 +02:00
Paul Pfeister 1cb589eadb chore: remove alik.cz (#1671)
Alik.cz is seeing unusually high traffic on usernames julian and
noonewouldeverusethis due to its presence in both Sherlock and Maigret.
This target is permanently removed and should not be replaced.
2024-10-21 15:57:11 +02:00
jm.balestek 6fb0dc1067 Adding permutator feature for usernames (#1575)
* Adding permutator feature for usernames
("", "_", "-", ".") when id_type == username

File : maigret/permutator.py

Arg : --permute

For now, only permute from 2 elements and doesn't return single elements (element1, _element1, element1_,  element2, _element2, ...). 12 permuts for 2 elements.

To return single elements as well, Permute(usernames).gather(method="all"), but not implemented in maigrat.py. 18 permuts for 2 elements. Should we ? With another argument ?

* Update test_cli.py

permute arg added
2024-07-23 16:19:43 +02:00
ranlo e02a5571b6 Update data.json (#1559)
changed the URL for vidamora.com to www.vidamora.com

any username on https://vidamora.com/profile/{username} returns a redirect, to www.vidamora.com

on https://www.vidamora.com, you get different behavior for existing and non-existing users.
2024-06-24 12:51:54 +02:00
Topa b097a49ed5 Readme (#1588)
* Updated README

* added a link to the CONTRIBUTING file
2024-06-20 20:04:53 +02:00
Topa 45f9966b34 Added code conventions to CONTRIBUTING.md (#1589)
Added a link to code of conduct inside of CONTRIBUTING.md. Added naming conventions, indentation and import conventions. Added link to PEP 8 which I think most closely resembles the coding style used.
2024-06-20 20:04:10 +02:00
dependabot[bot] 46d8d8fc3d Bump socid-extractor from 0.0.24 to 0.0.26 (#1546)
Bumps [socid-extractor](https://github.com/soxoj/socid-extractor) from 0.0.24 to 0.0.26.
- [Release notes](https://github.com/soxoj/socid-extractor/releases)
- [Changelog](https://github.com/soxoj/socid-extractor/blob/master/CHANGELOG.md)
- [Commits](https://github.com/soxoj/socid-extractor/compare/v0.0.24...v0.0.26)

---
updated-dependencies:
- dependency-name: socid-extractor
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-31 09:17:29 +02:00
Richard Mwewa 034153791b Fixed 3 sites, disabed 3, added (#1539)
* Fixed/Disabled sites. Update requirements.txt

fixed_sites: AllRecipes, Linktree, CreativeMarket, ImgInn, Shutterstock, Contently

disabled_sites: Forums.ea.com. CrunchyRoll, Windy, MetaCritic, InfosecInstitute, Armchairgm.fandom.com, Bleach.fandom.com

Update requirements to prevent dependency conflicts.

* Update requirements.txt

Update requirements.txt to prevent dependency conflicts

* Update requirements.txt

* Update sites.md

* fixed_sites: Armchairgm.fandom.com, Bleach.fandom.com, Battleraprus. disabled_sites: MicrosoftTechNet, club.cnews.ru, Scorcher

* fixed_sites: Armchairgm.fandom.com, Bleach.fandom.com, Battleraprus. disabled_sites: MicrosoftTechNet, club.cnews.ru, Scorcher

* fixed 2 sites, disabled 22 sites, and added 1 site

* fixed 3 sites, disabled 28, added 4 sites

* update sites.md

* Added 2 more sites

* fixed 3 sites, disabled 3 sites, added 1 site

* fix Twitch. Update snapcraft.yaml. Add pyproject.toml. Remove setup.py, requirements.txt, test-requirements.txt, as they are already specified in pyproject.toml

* fix Twitch. Update snapcraft.yaml. Add pyproject.toml. Remove setup.py, requirements.txt, test-requirements.txt, as they are already specified in pyproject.toml

* fix Twitch. Update snapcraft.yaml. Add pyproject.toml. Remove setup.py, requirements.txt, test-requirements.txt, as they are already specified in pyproject.toml

* fix Twitch. Update snapcraft.yaml. Add pyproject.toml. Remove setup.py, requirements.txt, test-requirements.txt, as they are already specified in pyproject.toml

* Update sites.md

* fix Twitch. Update snapcraft.yaml. Add pyproject.toml. Remove setup.py, requirements.txt, test-requirements.txt, as they are already specified in pyproject.toml

* Update sites.md

* fix forums.drom.ru

* Add EduGeek

* Add EduGeek

* Update python-package.yml

Fix dependency installation

* Update python-package.yml

* Update python-package.yml
2024-05-24 14:51:27 +02:00
Richard Mwewa 9399737ee6 Fixed 4 sites, added 6 sites, disabled 27 sites (#1536)
* Fixed/Disabled sites. Update requirements.txt

fixed_sites: AllRecipes, Linktree, CreativeMarket, ImgInn, Shutterstock, Contently

disabled_sites: Forums.ea.com. CrunchyRoll, Windy, MetaCritic, InfosecInstitute, Armchairgm.fandom.com, Bleach.fandom.com

Update requirements to prevent dependency conflicts.

* Update requirements.txt

Update requirements.txt to prevent dependency conflicts

* Update requirements.txt

* Update sites.md

* fixed_sites: Armchairgm.fandom.com, Bleach.fandom.com, Battleraprus. disabled_sites: MicrosoftTechNet, club.cnews.ru, Scorcher

* fixed_sites: Armchairgm.fandom.com, Bleach.fandom.com, Battleraprus. disabled_sites: MicrosoftTechNet, club.cnews.ru, Scorcher

* fixed 2 sites, disabled 22 sites, and added 1 site

* fixed 3 sites, disabled 28, added 4 sites

* update sites.md

* Added 2 more sites
2024-05-18 01:50:05 +02:00
Richard Mwewa f7f77e587c Fixed/Disabled sites. Update requirements.txt (#1517)
* Fixed/Disabled sites. Update requirements.txt

fixed_sites: AllRecipes, Linktree, CreativeMarket, ImgInn, Shutterstock, Contently

disabled_sites: Forums.ea.com. CrunchyRoll, Windy, MetaCritic, InfosecInstitute, Armchairgm.fandom.com, Bleach.fandom.com

Update requirements to prevent dependency conflicts.

* Update requirements.txt

Update requirements.txt to prevent dependency conflicts

* Update requirements.txt

* Update sites.md

* fixed_sites: Armchairgm.fandom.com, Bleach.fandom.com, Battleraprus. disabled_sites: MicrosoftTechNet, club.cnews.ru, Scorcher

* fixed_sites: Armchairgm.fandom.com, Bleach.fandom.com, Battleraprus. disabled_sites: MicrosoftTechNet, club.cnews.ru, Scorcher
2024-05-14 15:11:17 +02:00
dependabot[bot] 7a8c077c57 Bump jinja2 from 3.1.2 to 3.1.3 (#1358)
Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.2 to 3.1.3.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/jinja/compare/3.1.2...3.1.3)

---
updated-dependencies:
- dependency-name: jinja2
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-10 17:17:05 +02:00
Soxoj 03900b0c26 Added SOWEL classification (#1453) 2024-04-10 11:54:41 +02:00
Soxoj 6be2f409e5 Added Telegram bot link (#1321) 2023-12-04 18:25:32 +01:00
h3x 46b13b4f23 fix reddit (#1296) 2023-11-26 18:38:06 +01:00
Jeiel be58bf0ab4 Compat RegataOS (Opensuse) (#1308)
* compat opensuse.txt

coloroma
async-timeout
Jinja2
MarkupSafe
multidict
requests
tqdm
typing-extensions
yarl
networkx
reportlab
[+] svglib

* compat opensuse.txt

* compat opensuse.txt

coloroma
async-timeout
Jinja2
MarkupSafe
multidict
requests
tqdm
typing-extensions
yarl
networkx
reportlab
[+] svglib


sudo zypper in python3-devel
sudo zypper in python3-dev
2023-11-26 18:36:20 +01:00
Soxoj 2ccef4a9f9 Updated site statistics (#1273) 2023-10-27 21:48:37 +02:00
weekend sorrow f1ea12d731 Updating site checkers, disabling suspended sites (#1266)
* Fixing checks for broken sites and repairing the ones that were changed

* little tweaks

* little tweaks

---------

Co-authored-by: Weekrow <somewherelse@yandex.ru>
2023-10-27 21:43:45 +02:00
dependabot[bot] 01121d7695 Bump tqdm from 4.65.0 to 4.66.1 (#1235)
Bumps [tqdm](https://github.com/tqdm/tqdm) from 4.65.0 to 4.66.1.
- [Release notes](https://github.com/tqdm/tqdm/releases)
- [Commits](https://github.com/tqdm/tqdm/compare/v4.65.0...v4.66.1)

---
updated-dependencies:
- dependency-name: tqdm
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-22 20:44:44 +02:00
Soxoj 3ed043993f Changed pyinstaller dir (#1245) 2023-10-18 22:56:19 +02:00
dependabot[bot] a5bdf08c1c Bump async-timeout from 4.0.2 to 4.0.3 (#1238)
Bumps [async-timeout](https://github.com/aio-libs/async-timeout) from 4.0.2 to 4.0.3.
- [Release notes](https://github.com/aio-libs/async-timeout/releases)
- [Changelog](https://github.com/aio-libs/async-timeout/blob/master/CHANGES.rst)
- [Commits](https://github.com/aio-libs/async-timeout/compare/v4.0.2...v4.0.3)

---
updated-dependencies:
- dependency-name: async-timeout
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-18 10:56:44 +02:00
dependabot[bot] 88fcf01d8f Bump pytest-rerunfailures from 10.2 to 12.0 (#1237)
Bumps [pytest-rerunfailures](https://github.com/pytest-dev/pytest-rerunfailures) from 10.2 to 12.0.
- [Changelog](https://github.com/pytest-dev/pytest-rerunfailures/blob/master/CHANGES.rst)
- [Commits](https://github.com/pytest-dev/pytest-rerunfailures/compare/10.2...12.0)

---
updated-dependencies:
- dependency-name: pytest-rerunfailures
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-18 10:56:32 +02:00
dependabot[bot] 451a858d6b Bump typing-extensions from 4.5.0 to 4.8.0 (#1239)
Bumps [typing-extensions](https://github.com/python/typing_extensions) from 4.5.0 to 4.8.0.
- [Release notes](https://github.com/python/typing_extensions/releases)
- [Changelog](https://github.com/python/typing_extensions/blob/main/CHANGELOG.md)
- [Commits](https://github.com/python/typing_extensions/compare/4.5.0...4.8.0)

---
updated-dependencies:
- dependency-name: typing-extensions
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-18 10:56:23 +02:00
Soxoj df0a0696a9 Update main from dev again (#1234)
* Specified pyinstaller version

* Switch to new branch of pyinstaller GH action

* Changed dir for pyinstaller

* Added branch for pyinstaller workflow
2023-10-18 10:56:02 +02:00
Soxoj f7341200bc Test pyinstaller on dev branch (#1233)
* Specified pyinstaller version

* Switch to new branch of pyinstaller GH action

* Changed dir for pyinstaller

* Added branch for pyinstaller workflow
2023-10-15 21:55:46 +02:00
Soxoj 9f252f6d41 Pyinstaller fix (#1231)
* Specified pyinstaller version

* Switch to new branch of pyinstaller GH action
2023-10-15 21:43:30 +02:00
Soxoj 397beebd21 Specified pyinstaller version (#1230) 2023-10-15 21:40:17 +02:00
dependabot[bot] 7c5995f165 Bump aiohttp from 3.8.3 to 3.8.6 (#1222)
* Bump aiohttp from 3.8.3 to 3.8.6

Bumps [aiohttp](https://github.com/aio-libs/aiohttp) from 3.8.3 to 3.8.6.
- [Release notes](https://github.com/aio-libs/aiohttp/releases)
- [Changelog](https://github.com/aio-libs/aiohttp/blob/master/CHANGES.rst)
- [Commits](https://github.com/aio-libs/aiohttp/compare/v3.8.3...v3.8.6)

---
updated-dependencies:
- dependency-name: aiohttp
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* Fixed problematic test after aiohttp upgrade

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Soxoj <soxoj@protonmail.com>
2023-10-15 21:34:21 +02:00
dependabot[bot] aee1773e0c Bump flake8 from 5.0.4 to 6.1.0 (#1091)
Bumps [flake8](https://github.com/pycqa/flake8) from 5.0.4 to 6.1.0.
- [Commits](https://github.com/pycqa/flake8/compare/5.0.4...6.1.0)

---
updated-dependencies:
- dependency-name: flake8
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-15 21:28:26 +02:00
dependabot[bot] ffca24435b Bump xhtml2pdf from 0.2.8 to 0.2.11 (#935)
* Bump xhtml2pdf from 0.2.8 to 0.2.11

Bumps [xhtml2pdf](https://github.com/xhtml2pdf/xhtml2pdf) from 0.2.8 to 0.2.11.
- [Release notes](https://github.com/xhtml2pdf/xhtml2pdf/releases)
- [Commits](https://github.com/xhtml2pdf/xhtml2pdf/compare/v0.2.8...v0.2.11)

---
updated-dependencies:
- dependency-name: xhtml2pdf
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* Updated libs versions

* Downgrade reportlab version

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Soxoj <soxoj@protonmail.com>
2023-10-15 21:23:19 +02:00
dependabot[bot] 2b588a2003 Bump pyvis from 0.2.1 to 0.3.2 (#861)
Bumps [pyvis](https://github.com/WestHealth/pyvis) from 0.2.1 to 0.3.2.
- [Release notes](https://github.com/WestHealth/pyvis/releases)
- [Commits](https://github.com/WestHealth/pyvis/compare/v0.2.1...v0.3.2)

---
updated-dependencies:
- dependency-name: pyvis
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Soxoj <31013580+soxoj@users.noreply.github.com>
2023-10-15 21:14:01 +02:00
dependabot[bot] 1978f24fc4 Bump pypdf2 from 2.10.8 to 3.0.1 (#815)
Bumps [pypdf2](https://github.com/py-pdf/PyPDF2) from 2.10.8 to 3.0.1.
- [Release notes](https://github.com/py-pdf/PyPDF2/releases)
- [Changelog](https://github.com/py-pdf/pypdf/blob/main/CHANGELOG.md)
- [Commits](https://github.com/py-pdf/PyPDF2/commits)

---
updated-dependencies:
- dependency-name: pypdf2
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-15 21:13:42 +02:00
Soxoj 83d5740096 Tests fixes + last updates (#1228)
* Some sites fixed & cloudflare detection

* Fixed issue with tests

* Updates GitHub test workflow and sites data
2023-10-15 21:07:21 +02:00
Sammy Folkhome 726380ee09 EasyInstaller bat added (#1212) 2023-10-15 11:49:40 +02:00
Soxoj 90599ea3c2 Some sites fixed & cloudflare detection (#1178) 2023-09-09 20:58:01 +02:00
dependabot[bot] 72a1f948ba Bump cloudscraper from 1.2.66 to 1.2.71 (#914)
Bumps [cloudscraper](https://github.com/venomous/cloudscraper) from 1.2.66 to 1.2.71.
- [Release notes](https://github.com/venomous/cloudscraper/releases)
- [Commits](https://github.com/venomous/cloudscraper/commits)

---
updated-dependencies:
- dependency-name: cloudscraper
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Soxoj <31013580+soxoj@users.noreply.github.com>
2023-09-09 20:54:59 +02:00
realize096 71f22f65c4 update certifi 2022.12.7 to 2022.12.07 (#1173)
Co-authored-by: Soxoj <31013580+soxoj@users.noreply.github.com>
2023-09-09 20:54:12 +02:00
dependabot[bot] c9039cfd07 Bump certifi from 2022.12.7 to 2023.7.22 (#1070)
Bumps [certifi](https://github.com/certifi/python-certifi) from 2022.12.7 to 2023.7.22.
- [Commits](https://github.com/certifi/python-certifi/compare/2022.12.07...2023.07.22)

---
updated-dependencies:
- dependency-name: certifi
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-29 10:00:57 +02:00
dependabot[bot] f5fe575b6b Bump reportlab from 3.6.12 to 4.0.4 (#1160)
Bumps [reportlab](http://www.reportlab.com/) from 3.6.12 to 4.0.4.

---
updated-dependencies:
- dependency-name: reportlab
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-29 10:00:50 +02:00
Soxoj c5c78b2a66 Sites fixes 250823 (#1149)
* Additionally fixed sites, win32 build fix

* Fixed and disabled some sites (again)
2023-08-24 10:51:24 +02:00
Soxoj 390f3a49ee Additionally fixed sites, win32 build fix (#1148) 2023-08-24 09:32:24 +02:00
Theodore Ni dc9b44bd14 Add compatibility with pytest >= 7.3.0 (#1117)
Starting in this version, marks are no longer ordered. Sorting by their
names still sorts slow marks to the end of the list of tests.
2023-08-24 09:32:03 +02:00
realize096 b72e9b6a0c update reportlab 3.6.12 to 3.6.13 (#1051) 2023-08-24 09:31:40 +02:00
Soxoj b8c035e564 Fixed some sites (again) (#1133) 2023-08-23 21:58:41 +02:00
Soxoj eb115a1a70 Disabled and fixed several sites (#1132) 2023-08-23 20:58:46 +02:00
Soxoj f5ca005766 Added memory.lol (Twitter usernames archive) (#1067) 2023-07-24 12:57:45 +06:00
Soxoj 656b9c19ea Improved search through UnstoppableDomains (#1040) 2023-07-07 21:24:20 +02:00
engNoori 5855cbfcc9 Update wizard.py (#1016)
This code is more readable and easier to understand than the original code. It uses more descriptive variable names, and it breaks the code into smaller, more manageable functions. The code also uses comments to explain what each part of the code is doing.

Here are some specific improvements that I made to the code:

* I renamed the variables `TOP_SITES_COUNT` and `TIMEOUT` to more descriptive names, such as `max_sites_to_search` and `timeout`.
* I broke the code into smaller, more manageable functions, such as `main()` and `search_func()`.
* I added comments to explain what each part of the code is doing.
* I used more consistent indentation.
2023-07-07 19:54:20 +02:00
dependabot[bot] 6caa08902f Bump requests from 2.28.2 to 2.31.0 (#957)
Bumps [requests](https://github.com/psf/requests) from 2.28.2 to 2.31.0.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](https://github.com/psf/requests/compare/v2.28.2...v2.31.0)

---
updated-dependencies:
- dependency-name: requests
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-07 19:52:59 +02:00
Soxoj 932e07a8ee Added 26 ENS and similar domains with tag crypto (#942) 2023-05-13 18:23:17 +08:00
Alexandre ZANNI 71d5368fea fix deployment of tests (#933)
fix #932
2023-05-08 22:25:31 +08:00
dependabot[bot] 9f2f4d5107 Bump psutil from 5.9.4 to 5.9.5 (#910)
Bumps [psutil](https://github.com/giampaolo/psutil) from 5.9.4 to 5.9.5.
- [Release notes](https://github.com/giampaolo/psutil/releases)
- [Changelog](https://github.com/giampaolo/psutil/blob/master/HISTORY.rst)
- [Commits](https://github.com/giampaolo/psutil/compare/release-5.9.4...release-5.9.5)

---
updated-dependencies:
- dependency-name: psutil
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-04-20 09:27:18 +02:00
dependabot[bot] d6003c93b8 Bump requests from 2.28.1 to 2.28.2 (#904)
Bumps [requests](https://github.com/psf/requests) from 2.28.1 to 2.28.2.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](https://github.com/psf/requests/compare/v2.28.1...v2.28.2)

---
updated-dependencies:
- dependency-name: requests
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-04-19 10:16:54 +02:00
dependabot[bot] 4055fa088d Bump tqdm from 4.64.1 to 4.65.0 (#905)
Bumps [tqdm](https://github.com/tqdm/tqdm) from 4.64.1 to 4.65.0.
- [Release notes](https://github.com/tqdm/tqdm/releases)
- [Commits](https://github.com/tqdm/tqdm/compare/v4.64.1...v4.65.0)

---
updated-dependencies:
- dependency-name: tqdm
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-04-19 10:16:47 +02:00
Chien Dat Nguyen Dinh 745a70a534 Fix missing Mastodon Regex (#908)
Co-authored-by: Dat Nguyen Dinh <dat.nguyen@liferaftinc.com>
2023-04-19 10:16:37 +02:00
Soxoj 366e9333dd Added valid regex for Mastodon instances (#848) (#906) 2023-04-18 15:25:01 +02:00
Soxoj fc1f5bfc82 Fixed false positives on Mastodon sites (#901) 2023-04-17 10:51:32 +02:00
dependabot[bot] bfe33d74d3 Bump yarl from 1.8.1 to 1.8.2 (#899)
Bumps [yarl](https://github.com/aio-libs/yarl) from 1.8.1 to 1.8.2.
- [Release notes](https://github.com/aio-libs/yarl/releases)
- [Changelog](https://github.com/aio-libs/yarl/blob/master/CHANGES.rst)
- [Commits](https://github.com/aio-libs/yarl/compare/v1.8.1...v1.8.2)

---
updated-dependencies:
- dependency-name: yarl
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-04-17 10:30:51 +02:00
dependabot[bot] 9c2746fc28 Bump lxml from 4.9.1 to 4.9.2 (#900)
Bumps [lxml](https://github.com/lxml/lxml) from 4.9.1 to 4.9.2.
- [Release notes](https://github.com/lxml/lxml/releases)
- [Changelog](https://github.com/lxml/lxml/blob/master/CHANGES.txt)
- [Commits](https://github.com/lxml/lxml/compare/lxml-4.9.1...lxml-4.9.2)

---
updated-dependencies:
- dependency-name: lxml
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-04-17 10:30:45 +02:00
Soxoj 0ad2cdef2c Fixed false positives, updated networkx dep, some lint fixes (#894)
* Fixed false positives, updated networkx dep, some lint fixes

* Downgraded networkx version
2023-04-16 18:24:29 +02:00
dependabot[bot] 0064fad85c Bump multidict from 6.0.2 to 6.0.4 (#891)
Bumps [multidict](https://github.com/aio-libs/multidict) from 6.0.2 to 6.0.4.
- [Release notes](https://github.com/aio-libs/multidict/releases)
- [Changelog](https://github.com/aio-libs/multidict/blob/master/CHANGES.rst)
- [Commits](https://github.com/aio-libs/multidict/compare/v6.0.2...v6.0.4)

---
updated-dependencies:
- dependency-name: multidict
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-04-16 11:28:43 +02:00
dependabot[bot] 16f4978b31 Bump attrs from 22.1.0 to 22.2.0 (#892)
Bumps [attrs](https://github.com/python-attrs/attrs) from 22.1.0 to 22.2.0.
- [Release notes](https://github.com/python-attrs/attrs/releases)
- [Changelog](https://github.com/python-attrs/attrs/blob/main/CHANGELOG.md)
- [Commits](https://github.com/python-attrs/attrs/compare/22.1.0...22.2.0)

---
updated-dependencies:
- dependency-name: attrs
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-04-16 11:28:35 +02:00
dependabot[bot] b0ec08d753 Bump psutil from 5.9.2 to 5.9.4 (#741)
Bumps [psutil](https://github.com/giampaolo/psutil) from 5.9.2 to 5.9.4.
- [Release notes](https://github.com/giampaolo/psutil/releases)
- [Changelog](https://github.com/giampaolo/psutil/blob/master/HISTORY.rst)
- [Commits](https://github.com/giampaolo/psutil/compare/release-5.9.2...release-5.9.4)

---
updated-dependencies:
- dependency-name: psutil
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-04-13 10:56:58 +02:00
dependabot[bot] fb8952b783 Bump typing-extensions from 4.4.0 to 4.5.0 (#888)
Bumps [typing-extensions](https://github.com/python/typing_extensions) from 4.4.0 to 4.5.0.
- [Release notes](https://github.com/python/typing_extensions/releases)
- [Changelog](https://github.com/python/typing_extensions/blob/main/CHANGELOG.md)
- [Commits](https://github.com/python/typing_extensions/compare/4.4.0...4.5.0)

---
updated-dependencies:
- dependency-name: typing-extensions
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-04-13 10:56:35 +02:00
dependabot[bot] 4216f5c028 Bump reportlab from 3.6.11 to 3.6.12 (#735)
Bumps [reportlab](http://www.reportlab.com/) from 3.6.11 to 3.6.12.

---
updated-dependencies:
- dependency-name: reportlab
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-04-12 17:08:06 +02:00
Peter Dave Hello 539a3c5000 Update dependency - networkx from v2.5.1 to v2.6 (#738)
Found a security issue via snyk:
- https://security.snyk.io/vuln/SNYK-PYTHON-NETWORKX-1062709
2023-04-12 16:54:32 +02:00
dependabot[bot] 064d5707f9 Bump certifi from 2022.9.24 to 2022.12.7 (#793)
Bumps [certifi](https://github.com/certifi/python-certifi) from 2022.9.24 to 2022.12.7.
- [Release notes](https://github.com/certifi/python-certifi/releases)
- [Commits](https://github.com/certifi/python-certifi/compare/2022.09.24...2022.12.07)

---
updated-dependencies:
- dependency-name: certifi
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-04-12 16:53:53 +02:00
Peter Dave Hello fd64f5710f Update "future" package to v0.18.3 (#834)
Reference: https://www.cve.org/CVERecord?id=CVE-2022-40899
2023-04-12 16:53:09 +02:00
codyMar30 2136a71db1 Added new Websites (#838)
- lyricstraining.com
- forums.expo.dev
- rawg.io
- schemecolor.com
- aetherhub.com
- bugbounty.gg
- universocraft.com
2023-04-12 16:52:36 +02:00
Chien Dat Nguyen Dinh 8308299367 Fix Pinterest (#862)
Co-authored-by: Dat Nguyen Dinh <dat.nguyen@liferaftinc.com>
2023-04-12 16:52:10 +02:00
Nadeem M 70bed56a8a Update philosophy.rst (#866) 2023-04-12 16:51:48 +02:00
Soxoj 4c2a21832b Small readme fix (#857) 2023-02-24 12:53:07 +03:00
Soxoj 356d7d4e49 Fixed documentation URL (#799) 2022-12-18 12:26:19 +03:00
fen0s 6020e766ce fix opensea and shutterstock, disable a few dead sites (#798)
* fix shutterstock and disable allsoft

* disable dead forums and fix opensea

* Update sites.md
2022-12-18 12:22:24 +03:00
dependabot[bot] b4e963b2b1 Bump cloudscraper from 1.2.64 to 1.2.66 (#769)
Bumps [cloudscraper](https://github.com/venomous/cloudscraper) from 1.2.64 to 1.2.66.
- [Release notes](https://github.com/venomous/cloudscraper/releases)
- [Commits](https://github.com/venomous/cloudscraper/commits)

---
updated-dependencies:
- dependency-name: cloudscraper
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-12-04 12:48:41 +03:00
fen0s aebd8539ed disable broken sites (#756)
* Update data.json

* Update sites.md
2022-11-22 23:13:52 +03:00
fen0s fea1c6b552 disable not working sites (#739)
* Update data.json
* Update sites.md

Co-authored-by: Soxoj <31013580+soxoj@users.noreply.github.com>
2022-11-08 10:47:21 +04:00
dependabot[bot] fd8f5f90fd Bump pytest from 7.1.3 to 7.2.0 (#734)
Bumps [pytest](https://github.com/pytest-dev/pytest) from 7.1.3 to 7.2.0.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/7.1.3...7.2.0)

---
updated-dependencies:
- dependency-name: pytest
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-30 11:57:10 +03:00
dependabot[bot] b06fd470cc Bump colorama from 0.4.5 to 0.4.6 (#733)
Bumps [colorama](https://github.com/tartley/colorama) from 0.4.5 to 0.4.6.
- [Release notes](https://github.com/tartley/colorama/releases)
- [Changelog](https://github.com/tartley/colorama/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/tartley/colorama/compare/0.4.5...0.4.6)

---
updated-dependencies:
- dependency-name: colorama
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-25 10:46:25 +03:00
kz6fittycent ec1aaacb41 Updated snapcraft yaml (#720)
* Update snapcraft.yaml

* Update snapcraft.yaml

* Oops...forgot home and network interfaces

* for cryin' out loud.

* cleaning things up
2022-10-24 22:23:34 +03:00
dependabot[bot] bc1035c1ec Bump pytest-asyncio from 0.19.0 to 0.20.1 (#732)
Bumps [pytest-asyncio](https://github.com/pytest-dev/pytest-asyncio) from 0.19.0 to 0.20.1.
- [Release notes](https://github.com/pytest-dev/pytest-asyncio/releases)
- [Changelog](https://github.com/pytest-dev/pytest-asyncio/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest-asyncio/compare/v0.19.0...v0.20.1)

---
updated-dependencies:
- dependency-name: pytest-asyncio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-24 22:22:41 +03:00
Soxoj 026fd98304 Fixed YouTube (#717) 2022-10-17 01:17:09 +03:00
Soxoj f03a4c81a5 Fixed lightstalking.com (#716) 2022-10-17 01:03:25 +03:00
Soxoj 79afab11c2 Fixed docs about tags (#715) 2022-10-17 00:44:00 +03:00
Ben 10ef102791 Typo fixes in error.py (#711)
Fixing two small typos in the error definition file:
 - "switch to another..." -> ""Switch to another...
    - Capitalizing this sentence 
 - "...parallel connections (e.g. --n 10)" -> "...parallel connections (e.g. -n 10)"
    - Removing the extra `-` for this option
2022-10-16 11:28:24 +03:00
dependabot[bot] 523317e760 Bump typing-extensions from 4.3.0 to 4.4.0 (#698)
Bumps [typing-extensions](https://github.com/python/typing_extensions) from 4.3.0 to 4.4.0.
- [Release notes](https://github.com/python/typing_extensions/releases)
- [Changelog](https://github.com/python/typing_extensions/blob/main/CHANGELOG.md)
- [Commits](https://github.com/python/typing_extensions/compare/4.3.0...4.4.0)

---
updated-dependencies:
- dependency-name: typing-extensions
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-08 22:34:47 +03:00
dependabot[bot] 82074d77b1 Bump stem from 1.8.0 to 1.8.1 (#689)
Bumps [stem](https://github.com/torproject/stem) from 1.8.0 to 1.8.1.
- [Release notes](https://github.com/torproject/stem/releases)
- [Commits](https://github.com/torproject/stem/compare/1.8.0...1.8.1)

---
updated-dependencies:
- dependency-name: stem
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-08 21:54:41 +03:00
dependabot[bot] 002c8359fe Bump pytest-cov from 3.0.0 to 4.0.0 (#688)
Bumps [pytest-cov](https://github.com/pytest-dev/pytest-cov) from 3.0.0 to 4.0.0.
- [Release notes](https://github.com/pytest-dev/pytest-cov/releases)
- [Changelog](https://github.com/pytest-dev/pytest-cov/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest-cov/compare/v3.0.0...v4.0.0)

---
updated-dependencies:
- dependency-name: pytest-cov
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-04 13:46:57 +03:00
Peter Dave Hello 08bba20003 Improve README.md Installation section (#690)
Clone and install manually is duplicated in 2 places and can be merged.
2022-10-04 13:46:34 +03:00
Peter Dave Hello 0a628d2b8f Refactor Dockerfile with best practices (#691)
Multiple best practices applied as below:

- Replace deprecated `MAINTAINER` with `LABEL maintainer`
- Remove additional `apt clean` as it'll be done automatically
- Use `apt-get` instead of `apt` in script, apt does not have a stable
  CLI interface, and it's for end-user.
- Put `apt-get install` & apt lists clean up in the same command
- Use `--no-install-recommends` with `apt-get install` to avoid install
  additional packages
- Use `--no-cache-dir` with `pip install` to prevent temporary cache
- Use `COPY` instead of `ADD` for files and folders
- Use spaces instead of mixing spaces with tabs to indent

Size change by the refactor, almost 100MB saved:

```
REPOSITORY   TAG      IMAGE ID       CREATED         SIZE
maigret      after    9e70c65dde32   1 minutes ago   543MB
maigret      before   a683f2b71751   7 minutes ago   635MB
```
2022-10-04 13:46:01 +03:00
Peter Dave Hello f1969a12a1 Update README.md, Repl.it -> Replit with new badge (#692)
It's changed to Replit.com about two years ago, also there's a higher quality badge can be used ;)
2022-10-04 13:43:58 +03:00
dependabot[bot] 3cb03fe09c Bump arabic-reshaper from 2.1.3 to 2.1.4 (#650)
Bumps [arabic-reshaper](https://github.com/mpcabd/python-arabic-reshaper) from 2.1.3 to 2.1.4.
- [Release notes](https://github.com/mpcabd/python-arabic-reshaper/releases)
- [Commits](https://github.com/mpcabd/python-arabic-reshaper/compare/v2.1.3...v2.1.4)

---
updated-dependencies:
- dependency-name: arabic-reshaper
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-04 10:06:36 +03:00
dependabot[bot] 5769144ac3 Bump aiohttp from 3.8.1 to 3.8.3 (#651)
Bumps [aiohttp](https://github.com/aio-libs/aiohttp) from 3.8.1 to 3.8.3.
- [Release notes](https://github.com/aio-libs/aiohttp/releases)
- [Changelog](https://github.com/aio-libs/aiohttp/blob/master/CHANGES.rst)
- [Commits](https://github.com/aio-libs/aiohttp/compare/v3.8.1...v3.8.3)

---
updated-dependencies:
- dependency-name: aiohttp
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-04 10:06:24 +03:00
dependabot[bot] 99c9b0a8ca Bump certifi from 2022.9.14 to 2022.9.24 (#652)
Bumps [certifi](https://github.com/certifi/python-certifi) from 2022.9.14 to 2022.9.24.
- [Release notes](https://github.com/certifi/python-certifi/releases)
- [Commits](https://github.com/certifi/python-certifi/compare/2022.09.14...2022.09.24)

---
updated-dependencies:
- dependency-name: certifi
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-04 10:06:12 +03:00
Ruslan Bekenev 8e9722a285 Build docker images for arm64 and amd64 (#687) 2022-10-04 09:55:23 +03:00
Lorenzo Sapora 95276b841c Fix typos (#681) 2022-10-04 09:16:41 +03:00
Johan Burati 9484d6f05e Update README.md (#669)
Add option to map volume for docker example
2022-10-04 09:15:49 +03:00
Leon G 06f94cd476 correct username in usage examples (#673) 2022-10-04 00:40:49 +03:00
fen0s d4d525647c fix sites from issues (#680)
* Update data.json
* Update sites.md
2022-10-03 23:00:48 +03:00
Omar Trkzi f988c532ec Corrected grammar in README.md (#674) 2022-10-03 19:24:57 +03:00
dr-BEat e71c8907f0 Changed docker run to interactive and remove on exit (#675) 2022-10-03 19:24:10 +03:00
OSINT Tactical 45ed832ec8 site deletion (#648)
* Update sites.md
* Update data.json
2022-10-01 13:38:31 +03:00
fen0s a57e5f1d90 Add precommit hook (#664)
* add Sherlock sites
* add precommit hook
2022-10-01 13:38:04 +03:00
fen0s d9fd6e0b29 fix false positives from bot (#663)
* fix false positives from bot

* Update data.json

* Update sites.md
2022-09-29 20:56:15 +03:00
dependabot[bot] 827c11f2e1 Bump idna from 3.3 to 3.4 (#640)
Bumps [idna](https://github.com/kjd/idna) from 3.3 to 3.4.
- [Release notes](https://github.com/kjd/idna/releases)
- [Changelog](https://github.com/kjd/idna/blob/master/HISTORY.rst)
- [Commits](https://github.com/kjd/idna/compare/v3.3...v3.4)

---
updated-dependencies:
- dependency-name: idna
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-18 16:41:11 +03:00
dependabot[bot] 647a3fabb9 Bump certifi from 2022.6.15 to 2022.9.14 (#644)
Bumps [certifi](https://github.com/certifi/python-certifi) from 2022.6.15 to 2022.9.14.
- [Release notes](https://github.com/certifi/python-certifi/releases)
- [Commits](https://github.com/certifi/python-certifi/compare/2022.06.15...2022.09.14)

---
updated-dependencies:
- dependency-name: certifi
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-18 16:40:50 +03:00
dependabot[bot] efb2a9501e Bump pypdf2 from 2.10.5 to 2.10.8 (#641)
Bumps [pypdf2](https://github.com/py-pdf/PyPDF2) from 2.10.5 to 2.10.8.
- [Release notes](https://github.com/py-pdf/PyPDF2/releases)
- [Changelog](https://github.com/py-pdf/PyPDF2/blob/2.10.8/CHANGELOG.md)
- [Commits](https://github.com/py-pdf/PyPDF2/compare/2.10.5...2.10.8)

---
updated-dependencies:
- dependency-name: pypdf2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-18 16:40:27 +03:00
dependabot[bot] 44c009e570 Bump pytest-httpserver from 1.0.5 to 1.0.6 (#638)
Bumps [pytest-httpserver](https://github.com/csernazs/pytest-httpserver) from 1.0.5 to 1.0.6.
- [Release notes](https://github.com/csernazs/pytest-httpserver/releases)
- [Changelog](https://github.com/csernazs/pytest-httpserver/blob/master/CHANGES.rst)
- [Commits](https://github.com/csernazs/pytest-httpserver/compare/1.0.5...1.0.6)

---
updated-dependencies:
- dependency-name: pytest-httpserver
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-18 16:40:12 +03:00
Soxoj eb304b6804 Invalid results fixes (#634) 2022-09-11 14:26:19 +03:00
dependabot[bot] e1b9b62c4d Bump pypdf2 from 2.10.4 to 2.10.5 (#625)
Bumps [pypdf2](https://github.com/py-pdf/PyPDF2) from 2.10.4 to 2.10.5.
- [Release notes](https://github.com/py-pdf/PyPDF2/releases)
- [Changelog](https://github.com/py-pdf/PyPDF2/blob/main/CHANGELOG.md)
- [Commits](https://github.com/py-pdf/PyPDF2/compare/2.10.4...2.10.5)

---
updated-dependencies:
- dependency-name: pypdf2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-11 14:13:45 +03:00
dependabot[bot] ad6938f068 Bump psutil from 5.9.1 to 5.9.2 (#624)
Bumps [psutil](https://github.com/giampaolo/psutil) from 5.9.1 to 5.9.2.
- [Release notes](https://github.com/giampaolo/psutil/releases)
- [Changelog](https://github.com/giampaolo/psutil/blob/master/HISTORY.rst)
- [Commits](https://github.com/giampaolo/psutil/compare/release-5.9.1...release-5.9.2)

---
updated-dependencies:
- dependency-name: psutil
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-11 14:13:26 +03:00
Soxoj 1c9ccfe77b Added Instagram scrapers (#633) 2022-09-11 14:12:38 +03:00
fen0s 1fd1e2c809 Mirrors (#630)
* Update checking.py

* Added attempts check and mirrors

Co-authored-by: Soxoj <soxoj@protonmail.com>
Co-authored-by: Soxoj <31013580+soxoj@users.noreply.github.com>
2022-09-11 14:05:32 +03:00
Soxoj c5e973bc5b Streaming sites (#628)
* Added new sites, new error solution caption
2022-09-11 01:49:46 +03:00
dependabot[bot] b288c37d91 Bump yarl from 1.7.2 to 1.8.1 (#626)
Bumps [yarl](https://github.com/aio-libs/yarl) from 1.7.2 to 1.8.1.
- [Release notes](https://github.com/aio-libs/yarl/releases)
- [Changelog](https://github.com/aio-libs/yarl/blob/master/CHANGES.rst)
- [Commits](https://github.com/aio-libs/yarl/compare/v1.7.2...v1.8.1)

---
updated-dependencies:
- dependency-name: yarl
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-10 21:40:52 +03:00
OSINT Tactical 2f76f22202 Site Supression (#627)
* Update sites.md

* Update data.json
2022-09-10 21:40:31 +03:00
Soxoj f7c7809d8d Bump to 0.4.4 (#621) 2022-09-03 14:30:24 +03:00
dependabot[bot] 80bd7f21eb Bump tqdm from 4.64.0 to 4.64.1 (#618)
Bumps [tqdm](https://github.com/tqdm/tqdm) from 4.64.0 to 4.64.1.
- [Release notes](https://github.com/tqdm/tqdm/releases)
- [Commits](https://github.com/tqdm/tqdm/compare/v4.64.0...v4.64.1)

---
updated-dependencies:
- dependency-name: tqdm
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-03 14:22:56 +03:00
fen0s 994d79244e add ProtonMail, disable 3 broken sites (#619)
* fixed false positives
2022-09-03 14:22:42 +03:00
dependabot[bot] 4b2d2c07bd Bump pycountry from 22.1.10 to 22.3.5 (#607)
Bumps [pycountry](https://github.com/flyingcircusio/pycountry) from 22.1.10 to 22.3.5.
- [Release notes](https://github.com/flyingcircusio/pycountry/releases)
- [Changelog](https://github.com/flyingcircusio/pycountry/blob/master/HISTORY.txt)
- [Commits](https://github.com/flyingcircusio/pycountry/compare/22.1.10...22.3.5)

---
updated-dependencies:
- dependency-name: pycountry
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-03 14:16:12 +03:00
dependabot[bot] 938d05f812 Bump cloudscraper from 1.2.63 to 1.2.64 (#614)
Bumps [cloudscraper](https://github.com/venomous/cloudscraper) from 1.2.63 to 1.2.64.
- [Release notes](https://github.com/venomous/cloudscraper/releases)
- [Commits](https://github.com/venomous/cloudscraper/commits)

---
updated-dependencies:
- dependency-name: cloudscraper
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-03 13:35:38 +03:00
OSINT Tactical 487c4e0dbf Update sites.md -Gitmemory.com suppression (#610)
* Update sites.md

* Add files via upload
2022-09-03 13:35:29 +03:00
dependabot[bot] 09dce2046a Bump pytest from 7.1.2 to 7.1.3 (#613)
Bumps [pytest](https://github.com/pytest-dev/pytest) from 7.1.2 to 7.1.3.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/7.1.2...7.1.3)

---
updated-dependencies:
- dependency-name: pytest
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-03 13:35:18 +03:00
dependabot[bot] 65963e5647 Bump pypdf2 from 2.5.0 to 2.10.4 (#606)
Bumps [pypdf2](https://github.com/py-pdf/PyPDF2) from 2.5.0 to 2.10.4.
- [Release notes](https://github.com/py-pdf/PyPDF2/releases)
- [Changelog](https://github.com/py-pdf/PyPDF2/blob/main/CHANGELOG.md)
- [Commits](https://github.com/py-pdf/PyPDF2/compare/2.5.0...2.10.4)

---
updated-dependencies:
- dependency-name: pypdf2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-03 13:34:53 +03:00
dependabot[bot] 69f220a7e4 Bump pytest-asyncio from 0.18.2 to 0.19.0 (#601)
Bumps [pytest-asyncio](https://github.com/pytest-dev/pytest-asyncio) from 0.18.2 to 0.19.0.
- [Release notes](https://github.com/pytest-dev/pytest-asyncio/releases)
- [Changelog](https://github.com/pytest-dev/pytest-asyncio/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest-asyncio/compare/v0.18.2...v0.19.0)

---
updated-dependencies:
- dependency-name: pytest-asyncio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-03 13:26:13 +03:00
dependabot[bot] 722d3039dc Bump attrs from 21.4.0 to 22.1.0 (#597)
Bumps [attrs](https://github.com/python-attrs/attrs) from 21.4.0 to 22.1.0.
- [Release notes](https://github.com/python-attrs/attrs/releases)
- [Changelog](https://github.com/python-attrs/attrs/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/python-attrs/attrs/compare/21.4.0...22.1.0)

---
updated-dependencies:
- dependency-name: attrs
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-03 13:26:01 +03:00
dependabot[bot] 420c29610d Bump flake8 from 4.0.1 to 5.0.4 (#598)
Bumps [flake8](https://github.com/pycqa/flake8) from 4.0.1 to 5.0.4.
- [Release notes](https://github.com/pycqa/flake8/releases)
- [Commits](https://github.com/pycqa/flake8/compare/4.0.1...5.0.4)

---
updated-dependencies:
- dependency-name: flake8
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-02 21:18:35 +03:00
dependabot[bot] 6b53fac424 Bump cloudscraper from 1.2.60 to 1.2.63 (#600)
Bumps [cloudscraper](https://github.com/venomous/cloudscraper) from 1.2.60 to 1.2.63.
- [Release notes](https://github.com/venomous/cloudscraper/releases)
- [Commits](https://github.com/venomous/cloudscraper/commits)

---
updated-dependencies:
- dependency-name: cloudscraper
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-27 17:11:12 +03:00
dependabot[bot] 37c54735f1 Bump chardet from 4.0.0 to 5.0.0 (#550)
Bumps [chardet](https://github.com/chardet/chardet) from 4.0.0 to 5.0.0.
- [Release notes](https://github.com/chardet/chardet/releases)
- [Commits](https://github.com/chardet/chardet/compare/4.0.0...5.0.0)

---
updated-dependencies:
- dependency-name: chardet
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-27 17:07:29 +03:00
dependabot[bot] 2f0a0b49f3 Bump colorama from 0.4.4 to 0.4.5 (#548)
Bumps [colorama](https://github.com/tartley/colorama) from 0.4.4 to 0.4.5.
- [Release notes](https://github.com/tartley/colorama/releases)
- [Changelog](https://github.com/tartley/colorama/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/tartley/colorama/compare/0.4.4...0.4.5)

---
updated-dependencies:
- dependency-name: colorama
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-27 17:03:30 +03:00
dependabot[bot] 1a8b06385a Bump typing-extensions from 4.2.0 to 4.3.0 (#549)
Bumps [typing-extensions](https://github.com/python/typing_extensions) from 4.2.0 to 4.3.0.
- [Release notes](https://github.com/python/typing_extensions/releases)
- [Changelog](https://github.com/python/typing_extensions/blob/main/CHANGELOG.md)
- [Commits](https://github.com/python/typing_extensions/compare/4.2.0...4.3.0)

---
updated-dependencies:
- dependency-name: typing-extensions
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-27 17:03:19 +03:00
dependabot[bot] 22d7c204f8 Bump pytest-httpserver from 1.0.4 to 1.0.5 (#583)
Bumps [pytest-httpserver](https://github.com/csernazs/pytest-httpserver) from 1.0.4 to 1.0.5.
- [Release notes](https://github.com/csernazs/pytest-httpserver/releases)
- [Changelog](https://github.com/csernazs/pytest-httpserver/blob/master/CHANGES.rst)
- [Commits](https://github.com/csernazs/pytest-httpserver/compare/1.0.4...1.0.5)

---
updated-dependencies:
- dependency-name: pytest-httpserver
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-27 17:03:09 +03:00
fen0s a6ae0723f9 False positives fixes (#591) 2022-08-24 18:26:01 +03:00
dependabot[bot] aa4f94ac01 Bump certifi from 2022.5.18.1 to 2022.6.15 (#551)
Bumps [certifi](https://github.com/certifi/python-certifi) from 2022.5.18.1 to 2022.6.15.
- [Release notes](https://github.com/certifi/python-certifi/releases)
- [Commits](https://github.com/certifi/python-certifi/compare/2022.05.18.1...2022.06.15)

---
updated-dependencies:
- dependency-name: certifi
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-20 12:13:32 +03:00
fen0s 1153a9bf01 disable Instagram, fix two false positives (#578)
* Update data.json

* Update data.json

* Update data.json
2022-08-15 15:45:53 +03:00
fen0s 3d878131b9 fix false positives (#577) 2022-08-13 13:12:22 +03:00
fen0s 20746a0fc3 disable yandex music + set utf8 encoding (#562)
* Update report.py

* Update data.json

* Update data.json
2022-07-26 02:37:26 +03:00
dependabot[bot] ce062d915e Bump lxml from 4.9.0 to 4.9.1 (#538)
Bumps [lxml](https://github.com/lxml/lxml) from 4.9.0 to 4.9.1.
- [Release notes](https://github.com/lxml/lxml/releases)
- [Changelog](https://github.com/lxml/lxml/blob/master/CHANGES.txt)
- [Commits](https://github.com/lxml/lxml/compare/lxml-4.9.0...lxml-4.9.1)

---
updated-dependencies:
- dependency-name: lxml
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-07-17 12:45:37 +03:00
dependabot[bot] c057c5c478 Bump xhtml2pdf from 0.2.7 to 0.2.8 (#522)
Bumps [xhtml2pdf](https://github.com/xhtml2pdf/xhtml2pdf) from 0.2.7 to 0.2.8.
- [Release notes](https://github.com/xhtml2pdf/xhtml2pdf/releases)
- [Commits](https://github.com/xhtml2pdf/xhtml2pdf/compare/v0.2.7...v0.2.8)

---
updated-dependencies:
- dependency-name: xhtml2pdf
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-07-17 12:45:29 +03:00
dependabot[bot] eab0ec48da Bump pypdf2 from 2.0.0 to 2.5.0 (#542)
Bumps [pypdf2](https://github.com/py-pdf/PyPDF2) from 2.0.0 to 2.5.0.
- [Release notes](https://github.com/py-pdf/PyPDF2/releases)
- [Changelog](https://github.com/py-pdf/PyPDF2/blob/main/CHANGELOG.md)
- [Commits](https://github.com/py-pdf/PyPDF2/compare/2.0.0...2.5.0)

---
updated-dependencies:
- dependency-name: pypdf2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-07-17 12:45:24 +03:00
dependabot[bot] 5b40eac230 Bump requests from 2.27.1 to 2.28.1 (#530)
Bumps [requests](https://github.com/psf/requests) from 2.27.1 to 2.28.1.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](https://github.com/psf/requests/compare/v2.27.1...v2.28.1)

---
updated-dependencies:
- dependency-name: requests
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-07-17 12:45:15 +03:00
dependabot[bot] 2d782379ab Bump reportlab from 3.6.9 to 3.6.11 (#543)
Bumps [reportlab](http://www.reportlab.com/) from 3.6.9 to 3.6.11.

---
updated-dependencies:
- dependency-name: reportlab
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-07-17 12:45:05 +03:00
fen0s 042981d8bb Update data.json (#540)
* Update data.json

* disable false positives

* Update data.json

* Update data.json
2022-07-12 14:31:22 +03:00
fen0s 2c2017c7db Update data.json (#539) 2022-07-10 12:49:03 +03:00
fen0s 4aeba4d648 Fixes july third (#535)
* fix falsepositives on megafon

* token spotify
2022-07-06 23:57:10 +03:00
fen0s de34e29188 yazbel, aboutcar, zhihu (#531)
* fix some sites and delete abandoned

* disable aboutcar, fix zhihu, add yazbel

* yazbel quickfix

* Squashed commit of the following:

commit 932152edac2765391e0203d6e75f6bffda73d643
Author: fen0s <37670363+fen0s@users.noreply.github.com>
Date:   Fri Jul 1 17:36:58 2022 +0300

    Update data.json

* fix forumsmotri,  teamtreehouse, sourceforge, tomshardware, disable codeby

* 2 sites disasbled, 6 fixed

disabled echomsk (dead), disabled chipmaker (weird search-based username detection while usernames are searched not by exact match), fixed rutracker, kloomba, mobypicture, gamefaqs, eporner, 1337x, tried to fix myfitnesspal but didn't work
2022-07-03 11:49:48 +03:00
fen0s 0c127a97d5 Fixesjulyfirst (#533)
* fix some sites and delete abandoned

* disable aboutcar, fix zhihu, add yazbel

* yazbel quickfix

* Squashed commit of the following:

commit 932152edac2765391e0203d6e75f6bffda73d643
Author: fen0s <37670363+fen0s@users.noreply.github.com>
Date:   Fri Jul 1 17:36:58 2022 +0300

    Update data.json

* fix forumsmotri,  teamtreehouse, sourceforge, tomshardware, disable codeby

* 2 sites disasbled, 6 fixed

disabled echomsk (dead), disabled chipmaker (weird search-based username detection while usernames are searched not by exact match), fixed rutracker, kloomba, mobypicture, gamefaqs, eporner, 1337x, tried to fix myfitnesspal but didn't work
2022-07-02 18:11:05 +03:00
fen0s 11f047b1ae fix some sites and delete abandoned (#526) 2022-06-23 13:18:02 +03:00
Soxoj 43f8adef66 Downgrade pycountry due to problems with wheels
Python 3.10 related problem
2022-06-18 20:07:31 +03:00
Sergey Mamadjanov 2ffb77823d feat: add *.log & *.bak files to gitignore (#511) 2022-06-08 01:45:50 +03:00
Soxoj 7ba8af0247 Compatibility with Python 10 (#509) 2022-06-05 01:12:54 +03:00
dependabot[bot] 814544e1a0 Bump lxml from 4.8.0 to 4.9.0
Bumps [lxml](https://github.com/lxml/lxml) from 4.8.0 to 4.9.0.
- [Release notes](https://github.com/lxml/lxml/releases)
- [Changelog](https://github.com/lxml/lxml/blob/master/CHANGES.txt)
- [Commits](https://github.com/lxml/lxml/compare/lxml-4.8.0...lxml-4.9.0)

---
updated-dependencies:
- dependency-name: lxml
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-06-04 17:38:36 -04:00
Soxoj 477e62a5c5 Updated sites list, added disabled Anilist 2022-06-04 17:38:28 -04:00
dependabot[bot] 0a629614c2 Bump pefile from 2021.9.3 to 2022.5.30
Bumps [pefile](https://github.com/erocarrera/pefile) from 2021.9.3 to 2022.5.30.
- [Release notes](https://github.com/erocarrera/pefile/releases)
- [Commits](https://github.com/erocarrera/pefile/compare/v2021.9.3...v2022.5.30)

---
updated-dependencies:
- dependency-name: pefile
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-06-04 17:38:17 -04:00
dependabot[bot] e2d623f0d7 Bump pypdf2 from 1.28.2 to 2.0.0
Bumps [pypdf2](https://github.com/py-pdf/PyPDF2) from 1.28.2 to 2.0.0.
- [Release notes](https://github.com/py-pdf/PyPDF2/releases)
- [Changelog](https://github.com/py-pdf/PyPDF2/blob/main/CHANGELOG)
- [Commits](https://github.com/py-pdf/PyPDF2/compare/1.28.2...2.0.0)

---
updated-dependencies:
- dependency-name: pypdf2
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-06-04 17:38:01 -04:00
kustermariocoding 5145bfe820 added regexchecks for realmeye and realmeye-graveyard to prevent false positives. 2022-06-01 02:12:02 +03:00
kustermariocoding 58f66f5c3c added Bezuzyteczna and Znanylekarz.pl 2022-06-01 02:12:02 +03:00
kustermariocoding 746b74238b added forum.dangerousthings.com 2022-06-01 02:12:02 +03:00
kustermariocoding ae56a927cf added Wiki.vg 2022-06-01 02:12:02 +03:00
kustermariocoding 40ed0a7535 added Watchmemore.com 2022-06-01 02:12:02 +03:00
kustermariocoding beb4d740c7 removed Anilist because it's not working properly 2022-06-01 02:12:02 +03:00
dependabot[bot] a47b6a705e Bump pypdf2 from 1.28.1 to 1.28.2 (#493)
Bumps [pypdf2](https://github.com/py-pdf/PyPDF2) from 1.28.1 to 1.28.2.
- [Release notes](https://github.com/py-pdf/PyPDF2/releases)
- [Changelog](https://github.com/py-pdf/PyPDF2/blob/1.28.2/CHANGELOG)
- [Commits](https://github.com/py-pdf/PyPDF2/compare/1.28.1...1.28.2)

---
updated-dependencies:
- dependency-name: pypdf2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-26 00:59:55 +03:00
dependabot[bot] 3bfb2db6df Bump pypdf2 from 1.27.12 to 1.28.1 (#491)
Bumps [pypdf2](https://github.com/py-pdf/PyPDF2) from 1.27.12 to 1.28.1.
- [Release notes](https://github.com/py-pdf/PyPDF2/releases)
- [Changelog](https://github.com/py-pdf/PyPDF2/blob/main/CHANGELOG)
- [Commits](https://github.com/py-pdf/PyPDF2/compare/1.27.12...1.28.1)

---
updated-dependencies:
- dependency-name: pypdf2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-23 17:16:11 +03:00
dependabot[bot] d30ef15a79 Bump psutil from 5.9.0 to 5.9.1 (#490)
Bumps [psutil](https://github.com/giampaolo/psutil) from 5.9.0 to 5.9.1.
- [Release notes](https://github.com/giampaolo/psutil/releases)
- [Changelog](https://github.com/giampaolo/psutil/blob/master/HISTORY.rst)
- [Commits](https://github.com/giampaolo/psutil/compare/release-5.9.0...release-5.9.1)

---
updated-dependencies:
- dependency-name: psutil
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-23 17:16:03 +03:00
dependabot[bot] 1ebf0ca5cf Bump certifi from 2021.10.8 to 2022.5.18.1 (#488)
Bumps [certifi](https://github.com/certifi/python-certifi) from 2021.10.8 to 2022.5.18.1.
- [Release notes](https://github.com/certifi/python-certifi/releases)
- [Commits](https://github.com/certifi/python-certifi/compare/2021.10.08...2022.05.18.1)

---
updated-dependencies:
- dependency-name: certifi
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-22 23:17:45 +03:00
Soxoj eaa545a2c4 Disabled sites with false positives results (#482) 2022-05-14 20:13:31 +03:00
Soxoj cbe1f09536 Added new forums, updated ranks, some utils improvements (#481)
* Added new forums, updated ranks, some utils improvements

* Updated requirements
2022-05-14 13:29:48 +03:00
Soxoj 246c770d5c Added new sites (#480) 2022-05-14 11:51:15 +03:00
Soxoj e88d71d792 New sites added, some tags/rank update (#477) 2022-05-14 10:58:27 +03:00
Soxoj 929366cc81 Improved usability of external progressbar func (#476) 2022-05-14 02:06:33 +03:00
Soxoj bb6ed59e44 Updated logic of false positive risk estimating (#475) 2022-05-10 14:54:09 +03:00
fen0s 6400d83a46 Social analyzer websites, also fixing presense strs (#471)
* add a lot of new sites from social analyzer, fix presenceStr

* add social-analyzer sites

* fix username claimed

* update site list

* Update data.json
2022-05-10 12:37:23 +03:00
dependabot[bot] 507d0dac3a Bump pyvis from 0.2.0 to 0.2.1 (#472)
Bumps [pyvis](https://github.com/WestHealth/pyvis) from 0.2.0 to 0.2.1.
- [Release notes](https://github.com/WestHealth/pyvis/releases)
- [Commits](https://github.com/WestHealth/pyvis/compare/v0.2.0...v0.2.1)

---
updated-dependencies:
- dependency-name: pyvis
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-10 11:06:26 +03:00
Soxoj f058ee0daf Fixed new false positives, updated sites list (#469) 2022-05-05 02:16:29 +03:00
dependabot[bot] a66c25452a Bump pypdf2 from 1.27.10 to 1.27.12 (#466)
Bumps [pypdf2](https://github.com/py-pdf/PyPDF2) from 1.27.10 to 1.27.12.
- [Release notes](https://github.com/py-pdf/PyPDF2/releases)
- [Changelog](https://github.com/py-pdf/PyPDF2/blob/main/CHANGELOG)
- [Commits](https://github.com/py-pdf/PyPDF2/compare/1.27.10...1.27.12)

---
updated-dependencies:
- dependency-name: pypdf2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-03 13:29:43 +03:00
dependabot[bot] bfc682f758 Bump pypdf2 from 1.27.9 to 1.27.10 (#465)
Bumps [pypdf2](https://github.com/py-pdf/PyPDF2) from 1.27.9 to 1.27.10.
- [Release notes](https://github.com/py-pdf/PyPDF2/releases)
- [Changelog](https://github.com/py-pdf/PyPDF2/blob/main/CHANGELOG)
- [Commits](https://github.com/py-pdf/PyPDF2/compare/1.27.9...1.27.10)

---
updated-dependencies:
- dependency-name: pypdf2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-02 19:10:17 +03:00
fen0s aedbe927cb fix Figma username definition, add a bunch of sites (#464)
* Add files via upload

Co-authored-by: fen0s <fen0s@example.com>
2022-05-01 19:52:20 +03:00
fen0s 340d8b45fe Add BYOND, Figma, BeatStars (#462)
* Add files via upload

* fix forums

* Add BYOND, Figma, BeatStars

Co-authored-by: fen0s <fen0s@example.com>
2022-05-01 00:45:57 +03:00
fen0s c95f0fdfbb Ubisoft forums addition (#461)
* Add files via upload

* fix forums

Co-authored-by: fen0s <fen0s@example.com>
2022-04-30 16:34:27 +03:00
dependabot[bot] a5b73d1108 Bump jinja2 from 3.1.1 to 3.1.2 (#460)
Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.1 to 3.1.2.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/jinja/compare/3.1.1...3.1.2)

---
updated-dependencies:
- dependency-name: jinja2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-04-29 15:27:02 +03:00
dependabot[bot] 6157c5ff3d Bump pytest from 7.0.1 to 7.1.2 (#457)
Bumps [pytest](https://github.com/pytest-dev/pytest) from 7.0.1 to 7.1.2.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/7.0.1...7.1.2)

---
updated-dependencies:
- dependency-name: pytest
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-04-27 01:15:36 +03:00
dependabot[bot] e0f0dd5d4d Bump pypdf2 from 1.27.8 to 1.27.9 (#456)
Bumps [pypdf2](https://github.com/py-pdf/PyPDF2) from 1.27.8 to 1.27.9.
- [Release notes](https://github.com/py-pdf/PyPDF2/releases)
- [Changelog](https://github.com/py-pdf/PyPDF2/blob/main/CHANGELOG)
- [Commits](https://github.com/py-pdf/PyPDF2/compare/1.27.8...1.27.9)

---
updated-dependencies:
- dependency-name: pypdf2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-04-27 01:15:28 +03:00
Soxoj 059c8198a1 False positive fixes 24.04.22 (#455)
* Fixed some false positives
2022-04-24 17:14:07 +03:00
Soxoj 34073d12f4 XMind 8 report warning and some docs update (#452) 2022-04-23 01:28:31 +03:00
dependabot[bot] d24d80ab43 Bump pypdf2 from 1.27.7 to 1.27.8 (#450)
Bumps [pypdf2](https://github.com/py-pdf/PyPDF2) from 1.27.7 to 1.27.8.
- [Release notes](https://github.com/py-pdf/PyPDF2/releases)
- [Changelog](https://github.com/py-pdf/PyPDF2/blob/main/CHANGELOG)
- [Commits](https://github.com/py-pdf/PyPDF2/compare/1.27.7...1.27.8)

---
updated-dependencies:
- dependency-name: pypdf2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-04-23 01:01:19 +03:00
Soxoj 123ec35569 Update bug.md 2022-04-21 10:34:57 +03:00
dependabot[bot] 73aa8b649b Bump pypdf2 from 1.27.6 to 1.27.7 (#449)
Bumps [pypdf2](https://github.com/py-pdf/PyPDF2) from 1.27.6 to 1.27.7.
- [Release notes](https://github.com/py-pdf/PyPDF2/releases)
- [Changelog](https://github.com/py-pdf/PyPDF2/blob/main/CHANGELOG)
- [Commits](https://github.com/py-pdf/PyPDF2/compare/1.27.6...1.27.7)

---
updated-dependencies:
- dependency-name: pypdf2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-04-21 01:15:39 +03:00
dependabot[bot] 28aa74d83a Bump soupsieve from 2.3.2 to 2.3.2.post1 (#444)
Bumps [soupsieve](https://github.com/facelessuser/soupsieve) from 2.3.2 to 2.3.2.post1.
- [Release notes](https://github.com/facelessuser/soupsieve/releases)
- [Commits](https://github.com/facelessuser/soupsieve/compare/2.3.2...2.3.2.post1)

---
updated-dependencies:
- dependency-name: soupsieve
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-04-19 23:40:35 +03:00
dependabot[bot] d4780d2840 Bump typing-extensions from 4.1.1 to 4.2.0 (#447)
Bumps [typing-extensions](https://github.com/python/typing) from 4.1.1 to 4.2.0.
- [Release notes](https://github.com/python/typing/releases)
- [Changelog](https://github.com/python/typing/blob/master/typing_extensions/CHANGELOG)
- [Commits](https://github.com/python/typing/compare/4.1.1...4.2.0)

---
updated-dependencies:
- dependency-name: typing-extensions
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-04-19 23:40:24 +03:00
dependabot[bot] 4c7b6d82cf Bump pypdf2 from 1.27.4 to 1.27.6 (#448)
Bumps [pypdf2](https://github.com/py-pdf/PyPDF2) from 1.27.4 to 1.27.6.
- [Release notes](https://github.com/py-pdf/PyPDF2/releases)
- [Changelog](https://github.com/py-pdf/PyPDF2/blob/main/CHANGELOG)
- [Commits](https://github.com/py-pdf/PyPDF2/compare/1.27.4...1.27.6)

---
updated-dependencies:
- dependency-name: pypdf2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-04-19 23:40:15 +03:00
dependabot[bot] 37d6b9a949 Bump pyvis from 0.1.9 to 0.2.0 (#443)
Bumps [pyvis](https://github.com/WestHealth/pyvis) from 0.1.9 to 0.2.0.
- [Release notes](https://github.com/WestHealth/pyvis/releases)
- [Commits](https://github.com/WestHealth/pyvis/commits/v0.2.0)

---
updated-dependencies:
- dependency-name: pyvis
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-04-19 23:39:55 +03:00
dependabot[bot] 2664094f65 Bump pypdf2 from 1.26.0 to 1.27.4 (#442)
Bumps [pypdf2](https://github.com/py-pdf/PyPDF2) from 1.26.0 to 1.27.4.
- [Release notes](https://github.com/py-pdf/PyPDF2/releases)
- [Changelog](https://github.com/py-pdf/PyPDF2/blob/main/CHANGELOG)
- [Commits](https://github.com/py-pdf/PyPDF2/compare/1.26.0...1.27.4)

---
updated-dependencies:
- dependency-name: pypdf2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-04-14 01:51:34 +03:00
dependabot[bot] d884fea00b Bump soupsieve from 2.3.1 to 2.3.2 (#436)
Bumps [soupsieve](https://github.com/facelessuser/soupsieve) from 2.3.1 to 2.3.2.
- [Release notes](https://github.com/facelessuser/soupsieve/releases)
- [Commits](https://github.com/facelessuser/soupsieve/compare/2.3.1...2.3.2)

---
updated-dependencies:
- dependency-name: soupsieve
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-04-14 01:50:28 +03:00
dependabot[bot] 4a4fa69e93 Bump jinja2 from 3.0.3 to 3.1.1 (#441)
Bumps [jinja2](https://github.com/pallets/jinja) from 3.0.3 to 3.1.1.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/jinja/compare/3.0.3...3.1.1)

---
updated-dependencies:
- dependency-name: jinja2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-04-14 01:50:14 +03:00
dependabot[bot] 801bc388e4 Bump tqdm from 4.63.0 to 4.64.0 (#440)
Bumps [tqdm](https://github.com/tqdm/tqdm) from 4.63.0 to 4.64.0.
- [Release notes](https://github.com/tqdm/tqdm/releases)
- [Commits](https://github.com/tqdm/tqdm/compare/v4.63.0...v4.64.0)

---
updated-dependencies:
- dependency-name: tqdm
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-04-14 01:48:13 +03:00
Soxoj 48fcfcb89b Update GH actions (#439) 2022-04-14 01:46:50 +03:00
dependabot[bot] 07db3ce463 Bump pypdf2 from 1.26.0 to 1.27.4 (#438)
Bumps [pypdf2](https://github.com/py-pdf/PyPDF2) from 1.26.0 to 1.27.4.
- [Release notes](https://github.com/py-pdf/PyPDF2/releases)
- [Changelog](https://github.com/py-pdf/PyPDF2/blob/main/CHANGELOG)
- [Commits](https://github.com/py-pdf/PyPDF2/compare/1.26.0...1.27.4)

---
updated-dependencies:
- dependency-name: pypdf2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Soxoj <31013580+soxoj@users.noreply.github.com>
2022-04-14 01:46:03 +03:00
dependabot[bot] f9f4449079 Bump pycountry from 22.1.10 to 22.3.5 (#384)
Bumps [pycountry](https://github.com/flyingcircusio/pycountry) from 22.1.10 to 22.3.5.
- [Release notes](https://github.com/flyingcircusio/pycountry/releases)
- [Changelog](https://github.com/flyingcircusio/pycountry/blob/master/HISTORY.txt)
- [Commits](https://github.com/flyingcircusio/pycountry/compare/22.1.10...22.3.5)

---
updated-dependencies:
- dependency-name: pycountry
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-04-14 01:36:30 +03:00
dependabot[bot] 0d4236e2d4 Bump markupsafe from 2.0.1 to 2.1.1 (#389)
Bumps [markupsafe](https://github.com/pallets/markupsafe) from 2.0.1 to 2.1.1.
- [Release notes](https://github.com/pallets/markupsafe/releases)
- [Changelog](https://github.com/pallets/markupsafe/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/markupsafe/compare/2.0.1...2.1.1)

---
updated-dependencies:
- dependency-name: markupsafe
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-04-14 01:36:22 +03:00
dependabot[bot] b2db783620 Bump reportlab from 3.6.6 to 3.6.9 (#403)
Bumps [reportlab](http://www.reportlab.com/) from 3.6.6 to 3.6.9.

---
updated-dependencies:
- dependency-name: reportlab
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-04-14 01:36:09 +03:00
dependabot[bot] b27c53b5b6 Bump xhtml2pdf from 0.2.5 to 0.2.7 (#409)
Bumps [xhtml2pdf](https://github.com/xhtml2pdf/xhtml2pdf) from 0.2.5 to 0.2.7.
- [Release notes](https://github.com/xhtml2pdf/xhtml2pdf/releases)
- [Commits](https://github.com/xhtml2pdf/xhtml2pdf/compare/0.2.5...v0.2.7)

---
updated-dependencies:
- dependency-name: xhtml2pdf
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-04-14 01:35:59 +03:00
Soxoj 6691b26674 Drop Python 3.6 support (#434) 2022-04-14 01:32:30 +03:00
Soxoj 131b96ddb3 Fixed some false positives (#433)
* Fixed some false positives

* Updated site list and statistics
2022-04-14 01:11:42 +03:00
Soxoj 0803d8ebaa Merge pull request #432 from soxoj/0.4.3
Bump to 0.4.3
2022-04-13 22:58:45 +03:00
Soxoj 19956f74ca Bump to 0.4.3 2022-04-13 22:58:21 +03:00
Soxoj dd57019c80 Merge pull request #431 from soxoj/fix-false-positives-13-04-22
Fixed actual false positives
2022-04-13 22:51:15 +03:00
Maigret autoupdate 9fb265ea85 Updated site list and statistics 2022-04-13 19:50:19 +00:00
Soxoj 0f9fdfc639 Fixed actual false positives 2022-04-13 22:47:02 +03:00
Soxoj 0de087d751 Merge pull request #424 from soxoj/false-positive-fixes-08-04-22
Fixed last false positives
2022-04-08 00:19:04 +03:00
Maigret autoupdate 600e58f8ef Updated site list and statistics 2022-04-07 21:18:14 +00:00
Soxoj 16131c58f9 Fixed last false positives 2022-04-08 00:17:05 +03:00
Soxoj 5106d32342 Merge pull request #422 from soxoj/houzz
Disabled houzz.com, updated sites statistics
2022-04-08 00:07:09 +03:00
Maigret autoupdate 1456ff6bc1 Updated site list and statistics 2022-04-07 21:04:32 +00:00
Soxoj b94fb65809 Disabled houzz.com, updated sites statistics 2022-04-08 00:03:28 +03:00
Soxoj e283d8b561 Merge pull request #413 from kustermariocoding/main
Added new Websites to data.json
2022-04-08 00:02:19 +03:00
kustermariocoding 7cd727bbff updated sites.md 2022-04-05 13:19:13 +02:00
kustermariocoding 5532c00b04 Merge branch 'main' into site_adds 2022-04-05 11:28:56 +02:00
kustermariocoding 8846b8b225 added Traktrain 2022-04-05 11:26:54 +02:00
kustermariocoding 7307c98029 added Sportlerfrage 2022-04-05 11:24:13 +02:00
kustermariocoding 4d129c2c6b added Splice 2022-04-05 11:22:17 +02:00
kustermariocoding 1e772b7dd4 added Swapd 2022-04-05 11:19:58 +02:00
kustermariocoding 81bb0a01b2 added Reisefrage 2022-04-05 11:17:16 +02:00
kustermariocoding 7ae8b58e1a added RcloneForum 2022-04-05 11:14:09 +02:00
kustermariocoding dde8bf8af0 added Polymart 2022-04-05 11:10:26 +02:00
kustermariocoding dc4addd985 added Needrom 2022-04-05 11:01:39 +02:00
kustermariocoding 803f62f7b7 added Motorradfrage 2022-04-05 10:58:12 +02:00
kustermariocoding 91596b31ec added Mapify.travel 2022-04-05 10:55:31 +02:00
kustermariocoding a27fea4ba4 added Lottiefiles 2022-04-05 10:43:23 +02:00
kustermariocoding ba9a94debc added Listed.to 2022-04-05 10:40:46 +02:00
kustermariocoding ac80d26cab added Lesswrong 2022-04-05 10:34:06 +02:00
kustermariocoding e4aea719fa added Keakr 2022-04-05 10:30:48 +02:00
kustermariocoding 4b18ecbd4b added JoplinApp 2022-04-05 10:28:44 +02:00
kustermariocoding c2a4c64640 added IonicFrameWorks 2022-04-05 10:24:22 +02:00
kustermariocoding 47045dd653 added Grailed 2022-04-05 10:04:24 +02:00
kustermariocoding b65a85368b added Gitbook 2022-04-05 10:02:25 +02:00
kustermariocoding daf483b097 added Gesundheitsfrage 2022-04-05 09:59:17 +02:00
kustermariocoding 838a0c5e0c added GeniusArtists 2022-04-05 09:55:26 +02:00
kustermariocoding 0ccaccfcde added G2g.com 2022-04-05 09:51:35 +02:00
kustermariocoding d1e7f5c113 added Finanzfrage 2022-04-05 09:46:42 +02:00
kustermariocoding bfb5b85c41 added Fameswap 2022-04-05 09:44:35 +02:00
kustermariocoding effd753512 added Cryptomator Forum 2022-04-05 09:42:05 +02:00
kustermariocoding cfc777d45d added Bikemap 2022-04-05 09:39:45 +02:00
kustermariocoding 422f65afbe added Autofrage 2022-04-05 09:37:33 +02:00
kustermariocoding 135b554030 added Airbit 2022-04-05 09:35:06 +02:00
kustermariocoding 47edb4427a added Buzznet 2022-04-04 11:27:11 +02:00
kustermariocoding bda6c7c390 added Patronite 2022-04-04 11:17:16 +02:00
kustermariocoding f0f7334f31 added Archive.org Parler Posts 2022-04-04 11:11:38 +02:00
kustermariocoding 669f92c34b added Archiver.org Parler Profiles 2022-04-04 11:09:19 +02:00
kustermariocoding b657c1323d added Ow.ly 2022-04-04 11:02:21 +02:00
Soxoj 692f401043 Merge pull request #406 from soxoj/update-stats
Updated statistics
2022-03-31 00:08:50 +03:00
Soxoj 27f91ddbe3 Updated statistics 2022-03-31 00:07:45 +03:00
Soxoj 72fccb2868 Merge pull request #404 from kustermariocoding/main
Added new Websites to data.json
2022-03-31 00:00:48 +03:00
kustermariocoding a959243282 added Iconfinder 2022-03-30 14:24:19 +02:00
kustermariocoding 42895e81a8 Merge branch 'main' into site_adds
to add new websites to data.json
2022-03-29 10:19:50 +02:00
Soxoj fb9663599e Merge pull request #401 from kustermariocoding/main
Added new Websites to data.json
2022-03-29 00:10:09 +03:00
kustermariocoding 005685e69a added zmarsa.com 2022-03-28 14:30:23 +02:00
kustermariocoding eb70f91db9 added zbiornik.com 2022-03-28 14:18:44 +02:00
kustermariocoding a3eaf6130e added zatrybi.pl 2022-03-28 14:10:25 +02:00
kustermariocoding 2ce65ca45a added xvideos models 2022-03-28 13:58:17 +02:00
kustermariocoding 46a14631ea added xanga 2022-03-28 13:48:58 +02:00
kustermariocoding 2699cd221f added Wordpress Support 2022-03-28 13:45:10 +02:00
kustermariocoding 2a7851c814 added Wordnik 2022-03-28 12:03:17 +02:00
kustermariocoding 1356cc8e3a added WolniSlowianie 2022-03-28 12:00:16 +02:00
kustermariocoding 523966eaf2 added Wimkin Public Profiles 2022-03-28 11:56:14 +02:00
kustermariocoding 21f5db5661 added wicgforum 2022-03-28 11:30:19 +02:00
kustermariocoding 6b52c41b97 added wego.social 2022-03-28 11:17:53 +02:00
kustermariocoding 8c898bd356 added Voice123 2022-03-28 11:01:20 +02:00
kustermariocoding e725a73c8f added vizjer.pl 2022-03-28 10:53:24 +02:00
kustermariocoding 645abfe72c added Vine 2022-03-28 10:45:05 +02:00
kustermariocoding 17886bb9fa added Viddler 2022-03-28 10:42:29 +02:00
kustermariocoding 5b6cf4f15a added usa.life 2022-03-28 10:02:23 +02:00
kustermariocoding ca1d5e3a76 added ulub.pl 2022-03-28 09:57:15 +02:00
kustermariocoding 52789abda7 added ultrasdiary.pl 2022-03-28 09:48:55 +02:00
kustermariocoding 54f1f1feaa added twpro.jp 2022-03-28 09:37:38 +02:00
kustermariocoding ea33f4150f added Archive.org Twitter Tweets 2022-03-28 09:32:07 +02:00
kustermariocoding 7ff52e60a2 added Archive.org TwitterProfiles 2022-03-28 09:29:20 +02:00
kustermariocoding e5420e4639 added Twitcasting 2022-03-28 09:21:06 +02:00
kustermariocoding 393469ddfd added tunefind 2022-03-28 08:58:39 +02:00
kustermariocoding 0b03a7ab00 added tldrlegal.com 2022-03-25 14:15:14 +01:00
kustermariocoding dd13010bb5 added thetattooforum 2022-03-25 14:06:15 +01:00
kustermariocoding e3bd89c9e4 added thegatewaypundit 2022-03-25 14:02:46 +01:00
kustermariocoding 00865db0f6 added tfl.net.pl 2022-03-25 13:59:54 +01:00
kustermariocoding 8635abe79f added tf2items.com 2022-03-25 13:56:57 +01:00
kustermariocoding 8fbe6b42de added tetr.io 2022-03-25 13:42:41 +01:00
kustermariocoding db12e7b563 added tenor.com 2022-03-25 13:34:43 +01:00
kustermariocoding 77c9bda3e5 added teknik.io 2022-03-25 13:24:00 +01:00
kustermariocoding 54547c797a added taskrabbit 2022-03-25 11:43:38 +01:00
kustermariocoding 7e0b20e8fb added tanuki.pl 2022-03-25 11:33:58 +01:00
kustermariocoding 85288dccb5 added szmer.info 2022-03-25 09:34:22 +01:00
kustermariocoding d973831dc1 added szerokikadr.pl 2022-03-25 09:30:10 +01:00
kustermariocoding 12502c020c added suzuri.jp 2022-03-25 09:25:24 +01:00
kustermariocoding ce48c317b2 fixed headers for vimeo -merge conflict 2022-03-22 14:49:57 +01:00
kustermariocoding 41a277237c added Spankpay 2022-03-22 14:32:34 +01:00
kustermariocoding 721ff2874f added Solikick 2022-03-22 14:20:08 +01:00
kustermariocoding 3cdca22b9d added Citizen4 2022-03-22 14:17:07 +01:00
kustermariocoding 346611c5da added slant.co and fixed usernameClaimed for skeb.jp 2022-03-22 14:13:17 +01:00
kustermariocoding a8e538ad29 added Skeb.jp 2022-03-22 14:10:06 +01:00
kustermariocoding 95ff061cf6 added Shanii Writes 2022-03-22 14:05:17 +01:00
kustermariocoding 5bb5e29ffb added Sfd.pl 2022-03-22 14:02:17 +01:00
kustermariocoding ac3e0b16e4 added Seneporno 2022-03-22 13:57:13 +01:00
kustermariocoding 970b75b88d added regexcheck für Hackerrank 2022-03-18 15:05:04 +01:00
kustermariocoding 8f6b40c8d0 added a regexcheck for gumroad 2022-03-18 15:03:08 +01:00
kustermariocoding ccebd677e3 updated data.json 2022-03-18 08:24:24 +01:00
Soxoj 75625f72f8 Merge pull request #397 from soxoj/skip-broken-tests
Skipped broken tests
2022-03-18 01:54:59 +03:00
Maigret autoupdate f6dbe1a6bd Updated site list and statistics 2022-03-17 22:52:22 +00:00
Soxoj a914283a15 Skipped broken tests 2022-03-18 01:51:14 +03:00
Soxoj 2a4f4d47e2 Merge pull request #390 from kustermariocoding/main
added new Websites to data.json
2022-03-18 01:18:20 +03:00
kustermariocoding 50350972a5 fixed url and absense/presence strings for friendfinder-x.com 2022-03-17 11:52:14 +01:00
kustermariocoding cdb69f99a1 added Scoutwiki 2022-03-16 15:04:26 +01:00
kustermariocoding 4786822e6d added Saracartershow 2022-03-16 12:07:12 +01:00
kustermariocoding 9c56f29267 added Salon24.pl 2022-03-16 11:51:03 +01:00
kustermariocoding 1ee4f4c93b added runescape 2022-03-16 11:42:00 +01:00
kustermariocoding 9e302542ed Merge branch 'main' into site_adds 2022-03-16 11:14:55 +01:00
kustermariocoding 3409f8a726 added RumbleUser 2022-03-16 11:14:20 +01:00
kustermariocoding 94bfa4233d added Rumblechannel 2022-03-16 11:11:28 +01:00
kustermariocoding 9c08c34007 added Ourfreedombook 2022-03-16 11:05:13 +01:00
kustermariocoding 880ffb4bf1 added lowcygier.pl 2022-03-16 11:02:01 +01:00
kustermariocoding d987c681b7 added line.me 2022-03-16 10:55:58 +01:00
kustermariocoding 2ef141a5c5 added d3.ru 2022-03-16 10:43:42 +01:00
kustermariocoding 809b97d4f9 changed usernameClaimed for Bugcrowd to a working one 2022-03-16 10:29:40 +01:00
kustermariocoding 4a1342b654 added Justforfans 2022-03-16 10:25:45 +01:00
kustermariocoding fb200875d3 added engadget 2022-03-16 10:15:02 +01:00
kustermariocoding 53bc79938c added elftown 2022-03-16 10:11:51 +01:00
kustermariocoding 3866c1be9e added chamsko.pl 2022-03-16 09:47:46 +01:00
kustermariocoding ca65ffe864 added cda.pl and changed usernameClaimed of cdaction.pl 2022-03-16 09:44:14 +01:00
kustermariocoding c9638f704f added cd-action 2022-03-16 09:34:17 +01:00
kustermariocoding 39c57e7925 added Cash.app 2022-03-16 09:31:30 +01:00
kustermariocoding 1b5c39dc1b added carrd.co 2022-03-16 08:57:09 +01:00
kustermariocoding 379fca8602 added Americanthinker 2022-03-16 08:47:50 +01:00
kustermariocoding 9716f40140 added anonup 2022-03-16 08:31:49 +01:00
kustermariocoding 61d346dd0a added ApexLegends 2022-03-16 08:30:21 +01:00
kustermariocoding 5edfc00b2d added ruby.dating 2022-03-14 14:20:17 +01:00
kustermariocoding 5905dcf384 added rigcz.club 2022-03-14 14:09:33 +01:00
kustermariocoding 67046273c7 added quizlet.com 2022-03-14 14:01:34 +01:00
kustermariocoding b4fd2fe40f added quitter.pl 2022-03-14 13:47:33 +01:00
kustermariocoding 7113824c59 added prv.pl 2022-03-14 13:32:39 +01:00
Soxoj a2e782d07c Merge pull request #386 from kustermariocoding/main
Added Sites to data.json
2022-03-14 01:56:14 +03:00
kustermariocoding 4b2d030d7a added poshmark 2022-03-11 10:50:09 +01:00
kustermariocoding e98c97dbb1 added Pornhub Pornstars 2022-03-11 10:43:44 +01:00
kustermariocoding fd4d570b59 added Polleverywhere 2022-03-11 10:13:20 +01:00
kustermariocoding 9892532aae added policja2009 2022-03-10 12:01:56 +01:00
kustermariocoding 66422332c4 added Polczat.pl 2022-03-10 11:36:41 +01:00
kustermariocoding 8b1eb15939 added pol.social 2022-03-10 11:32:44 +01:00
kustermariocoding 06df4661bc added Piekielni 2022-03-10 11:26:21 +01:00
kustermariocoding eaa126906f added pewex.pl 2022-03-10 11:19:29 +01:00
kustermariocoding 1c7cbbc27d added olx.pl 2022-03-10 10:30:59 +01:00
kustermariocoding 0eed5ced7d added oglaszamy24h 2022-03-10 10:18:53 +01:00
kustermariocoding 30f3ac4889 added nyaa.si 2022-03-10 10:02:08 +01:00
kustermariocoding 0212796696 Merge remote-tracking branch 'origin' into site_adds 2022-03-09 14:21:14 +01:00
kustermariocoding 6c723f8329 added ninjakiwi 2022-03-09 14:18:32 +01:00
kustermariocoding b1bfbbc371 added Naturalnews.com 2022-03-09 14:11:39 +01:00
kustermariocoding ee8eabc5ed added mym.fans 2022-03-09 14:05:33 +01:00
kustermariocoding cf6bb0bd7a added Motokiller.pl 2022-03-09 13:37:44 +01:00
kustermariocoding 93b542dad2 added Mistrzowie 2022-03-09 12:22:57 +01:00
kustermariocoding ec6324473a added Minecraftlist 2022-03-09 12:16:48 +01:00
kustermariocoding 263afb8990 added megamodels.pl 2022-03-09 11:55:37 +01:00
kustermariocoding 7016161206 added medyczka.pl 2022-03-09 11:34:14 +01:00
Soxoj 470ef5721f Merge pull request #385 from soxoj/v0.4.2
Bump to 0.4.2
2022-03-07 20:12:59 +03:00
Maigret autoupdate fd2c8afd33 Updated site list and statistics 2022-03-07 16:44:19 +00:00
cyb3rk0tik 8c007219f5 Bump to 0.4.2 2022-03-07 21:42:34 +05:00
Soxoj a425e5ceff Merge pull request #380 from soxoj/dependabot/pip/pytest-asyncio-0.18.2
Bump pytest-asyncio from 0.18.1 to 0.18.2
2022-03-07 15:36:39 +03:00
Soxoj d0fd3533b5 Merge pull request #374 from soxoj/dependabot/pip/tqdm-4.63.0
Bump tqdm from 4.62.3 to 4.63.0
2022-03-07 15:36:25 +03:00
kustermariocoding 7d225750ac added Mcuuid(Minecraft) 2022-03-07 11:02:06 +01:00
kustermariocoding 286319b6ec added MassageAnywhere 2022-03-07 10:28:44 +01:00
kustermariocoding fef323ab7d added martech 2022-03-07 10:09:47 +01:00
kustermariocoding 05c29c8c77 added marshmallow 2022-03-07 09:58:08 +01:00
kustermariocoding d18d5c96d9 added MapMyTracks 2022-03-07 09:49:02 +01:00
kustermariocoding 1da4345a50 added magabook 2022-03-07 09:40:27 +01:00
kustermariocoding c5b9f4e0fa added maga-chat 2022-03-07 09:28:04 +01:00
Soxoj 5bf361a1ac Merge pull request #382 from soxoj/fix-alexa-rank
Fixed issue with str alexaRank
2022-03-06 16:23:19 +03:00
Maigret autoupdate e07d3b60ba Updated site list and statistics 2022-03-06 13:20:31 +00:00
Soxoj 1e2d5cf742 Fixed issue with str alexaRank 2022-03-06 16:19:25 +03:00
Soxoj 694e024ba1 Merge pull request #375 from kustermariocoding/main
Added new sites to data.json
2022-03-06 16:17:50 +03:00
dependabot[bot] 6862425215 Bump pytest-asyncio from 0.18.1 to 0.18.2
Bumps [pytest-asyncio](https://github.com/pytest-dev/pytest-asyncio) from 0.18.1 to 0.18.2.
- [Release notes](https://github.com/pytest-dev/pytest-asyncio/releases)
- [Commits](https://github.com/pytest-dev/pytest-asyncio/compare/v0.18.1...v0.18.2)

---
updated-dependencies:
- dependency-name: pytest-asyncio
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-03-04 06:05:57 +00:00
kustermariocoding 54c8074e51 tried to fix merge conflicts 2022-02-28 11:41:38 +01:00
dependabot[bot] 71e1fb6dcf Bump tqdm from 4.62.3 to 4.63.0
Bumps [tqdm](https://github.com/tqdm/tqdm) from 4.62.3 to 4.63.0.
- [Release notes](https://github.com/tqdm/tqdm/releases)
- [Commits](https://github.com/tqdm/tqdm/compare/v4.62.3...v4.63.0)

---
updated-dependencies:
- dependency-name: tqdm
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-02-28 06:07:15 +00:00
Soxoj 364187861d Fix false positive and CI (#372)
* Fix false positive and CI
* Updated site list and statistics
2022-02-27 04:44:15 +03:00
Soxoj 8a53a38543 Fixed the rest of false positives for now (#371)
* Fixed the rest of false positives for now

* Fixed tag

* Updated site list and statistics
2022-02-26 16:43:40 +03:00
Soxoj bc787cdf51 Fix false positives (#370)
* Fixed several false positives, improved statistics info

* Disabled some sites, fixed fp percent count method

* Updated site list and statistics
2022-02-26 16:01:22 +03:00
Soxoj dcf5181e28 Fixed several false positives, improved statistics info (#368)
* Fixed several false positives, improved statistics info

* Updated site list and statistics
2022-02-26 15:31:15 +03:00
Soxoj 61452d56d3 Disabled Netvibes and LeetCode (#366)
* Disabled Netvibes and LeetCode

* Specified types of PR for tests in CI

* Updated site list and statistics
2022-02-26 14:49:43 +03:00
Soxoj be204ff119 Wikipedia fix (#365)
* Fixed op.gg sites

* Added testing docs, fixed some error

* Fixed Wikipedia
2022-02-26 14:27:08 +03:00
Soxoj 8a865a1ce6 Op.gg fixes (#363)
* Fixed op.gg sites

* Added testing docs, fixed some error

* Updated site list and statistics
2022-02-26 14:16:13 +03:00
Soxoj a29c3c6abe CI autoupdate (#359)
* CI autoupdate

* Updated site list and statistics
2022-02-26 13:38:15 +03:00
kustermariocoding ea6fd30a30 added kotburger.pl 2022-02-24 11:51:00 +01:00
kustermariocoding 8dbe9a415c added karab.in 2022-02-24 11:45:23 +01:00
kustermariocoding 222398154e added joemonster 2022-02-24 11:25:18 +01:00
kustermariocoding 3030025ea3 added jellyfin weblate 2022-02-24 11:11:40 +01:00
kustermariocoding 40233e66cb added jeja.pl 2022-02-24 10:57:13 +01:00
kustermariocoding 2ea75f7f76 added jbzd 2022-02-24 10:50:33 +01:00
kustermariocoding dbd393da58 added ipolska.pl 2022-02-24 10:34:03 +01:00
kustermariocoding b9f72151ea added Inkbunny 2022-02-24 10:08:59 +01:00
kustermariocoding dc2989a47d added hexrpg 2022-02-24 09:53:56 +01:00
kustermariocoding c86e558a57 added hackerrank 2022-02-24 09:41:14 +01:00
kustermariocoding 3c8c1d1f5a Merge branch 'main' of https://github.com/soxoj/maigret into site_adds 2022-02-24 09:39:58 +01:00
Soxoj 1683e5b744 Added DB statistics autoupdate and write to sites.md (#357) 2022-02-23 18:01:42 +03:00
Soxoj 31fc656721 Added package publishing instruction (#356) 2022-02-23 16:46:58 +03:00
Soxoj 79f872c77c Added some scripts (#355) 2022-02-23 14:33:37 +03:00
kustermariocoding 22f158e749 added gradle 2022-02-22 11:42:39 +01:00
kustermariocoding ff1eac0b20 added gnome vcs 2022-02-22 11:23:16 +01:00
kustermariocoding f2d3fed9c7 added Furaffinity 2022-02-22 10:26:58 +01:00
kustermariocoding cbbdc5a820 added friendfinder-x 2022-02-22 10:15:56 +01:00
kustermariocoding 8a614001fd added friendfinder 2022-02-22 09:49:37 +01:00
kustermariocoding 7a50f2922a Merge branch 'main' of https://github.com/soxoj/maigret into site_adds 2022-02-22 09:15:39 +01:00
kustermariocoding da0f4ae7cf added fotka 2022-02-22 09:15:11 +01:00
kustermariocoding d12310bb53 added fosstodon 2022-02-22 08:59:12 +01:00
cyberkotik 211b8ccfd0 Merge pull request #352 from soxoj/cyb3rk0tik-patch-1
Fix reportlab not only for testing
2022-02-21 23:52:58 +05:00
cyberkotik f352f9f58b Fix reportlab not only for testing 2022-02-21 23:42:49 +05:00
kustermariocoding 0d70ee1abc added forumprawne.org 2022-02-21 14:43:08 +01:00
kustermariocoding 032ca8141a added fedi.lewactwo.pl 2022-02-21 14:28:48 +01:00
kustermariocoding 3acf6e5180 added fansly 2022-02-21 14:20:54 +01:00
kustermariocoding 14f2b0c756 added fancentro.com 2022-02-21 12:50:41 +01:00
cyberkotik e0a4775205 Merge pull request #351 from soxoj/cyb3rk0tik-patch-1
Pin reportlab version
2022-02-21 16:47:25 +05:00
cyberkotik d056eb545f Pin reportlab version 2022-02-21 16:39:56 +05:00
kustermariocoding 10f8e1f597 added faktopedia.pl 2022-02-21 12:12:27 +01:00
kustermariocoding 6cc789d800 added fabswingers 2022-02-21 11:59:34 +01:00
kustermariocoding c214f38841 Merge branch 'main' of https://github.com/soxoj/maigret into site_adds 2022-02-21 11:56:35 +01:00
cyberkotik 392b83c230 Merge pull request #350 from soxoj/dependabot/pip/lxml-4.8.0
Bump lxml from 4.7.1 to 4.8.0
2022-02-21 15:23:31 +05:00
cyberkotik 96bebd49d3 Merge pull request #346 from soxoj/dependabot/pip/typing-extensions-4.1.1
Bump typing-extensions from 4.0.1 to 4.1.1
2022-02-21 15:23:19 +05:00
cyberkotik 92950f1b88 Merge pull request #345 from soxoj/dependabot/pip/pytest-7.0.1
Bump pytest from 7.0.0 to 7.0.1
2022-02-21 15:21:53 +05:00
cyberkotik 07b5874802 Merge pull request #343 from soxoj/dependabot/pip/pytest-asyncio-0.18.1
Bump pytest-asyncio from 0.18.0 to 0.18.1
2022-02-21 15:21:42 +05:00
kustermariocoding 6a62586a59 added dojoverse 2022-02-18 15:00:36 +01:00
kustermariocoding 883abe7877 added demotywatory.pl 2022-02-18 13:53:34 +01:00
kustermariocoding fc58046a34 added cytoid.io 2022-02-18 11:50:54 +01:00
kustermariocoding b6a1eb26e7 added Cults3d 2022-02-18 11:37:16 +01:00
kustermariocoding 42169397fe added chomukij.pl and crowdin.com 2022-02-18 11:18:57 +01:00
kustermariocoding 870d68ec1c added site castingcallclub 2022-02-18 09:22:18 +01:00
kustermariocoding 12ef7f62c2 added site caringbridge 2022-02-18 09:05:16 +01:00
dependabot[bot] 8b7ea67edc Bump lxml from 4.7.1 to 4.8.0
Bumps [lxml](https://github.com/lxml/lxml) from 4.7.1 to 4.8.0.
- [Release notes](https://github.com/lxml/lxml/releases)
- [Changelog](https://github.com/lxml/lxml/blob/master/CHANGES.txt)
- [Commits](https://github.com/lxml/lxml/compare/lxml-4.7.1...lxml-4.8.0)

---
updated-dependencies:
- dependency-name: lxml
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-02-18 06:07:51 +00:00
dependabot[bot] 182a493b6a Bump typing-extensions from 4.0.1 to 4.1.1
Bumps [typing-extensions](https://github.com/python/typing) from 4.0.1 to 4.1.1.
- [Release notes](https://github.com/python/typing/releases)
- [Changelog](https://github.com/python/typing/blob/master/typing_extensions/CHANGELOG)
- [Commits](https://github.com/python/typing/compare/4.0.1...4.1.1)

---
updated-dependencies:
- dependency-name: typing-extensions
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-02-14 06:09:08 +00:00
dependabot[bot] 4f7781b7a2 Bump pytest from 7.0.0 to 7.0.1
Bumps [pytest](https://github.com/pytest-dev/pytest) from 7.0.0 to 7.0.1.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/7.0.0...7.0.1)

---
updated-dependencies:
- dependency-name: pytest
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-02-14 06:09:03 +00:00
dependabot[bot] 3579f2fd09 Bump pytest-asyncio from 0.18.0 to 0.18.1
Bumps [pytest-asyncio](https://github.com/pytest-dev/pytest-asyncio) from 0.18.0 to 0.18.1.
- [Release notes](https://github.com/pytest-dev/pytest-asyncio/releases)
- [Commits](https://github.com/pytest-dev/pytest-asyncio/compare/v0.18.0...v0.18.1)

---
updated-dependencies:
- dependency-name: pytest-asyncio
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-02-11 06:14:37 +00:00
kustermariocoding 34b8d938f7 added site blogi.pl 2022-02-10 14:32:47 +01:00
kustermariocoding ea963af29b added Bitwarden Forum 2022-02-10 14:17:20 +01:00
kustermariocoding 5ea5f6337d added site Biggerpockets 2022-02-10 14:00:05 +01:00
kustermariocoding 292d0a2665 added site Bentbox 2022-02-10 13:42:33 +01:00
kustermariocoding 057bdce751 added site Bandlab 2022-02-10 13:24:29 +01:00
kustermariocoding f051cc768e added AvidCommunity Site 2022-02-10 12:06:56 +01:00
kustermariocoding 985f4075f4 added site Artistsnclients 2022-02-10 11:29:14 +01:00
kustermariocoding d88abc6271 added site arduino.cc 2022-02-10 11:14:27 +01:00
kustermariocoding 63b99338d7 added new site appian 2022-02-10 10:35:50 +01:00
kustermariocoding bd3503f3c8 added 101010.pl website to data.json 2022-02-08 14:51:59 +01:00
dependabot[bot] d7f94076bf Bump pytest-asyncio from 0.17.2 to 0.18.0 (#340)
Bumps [pytest-asyncio](https://github.com/pytest-dev/pytest-asyncio) from 0.17.2 to 0.18.0.
- [Release notes](https://github.com/pytest-dev/pytest-asyncio/releases)
- [Commits](https://github.com/pytest-dev/pytest-asyncio/compare/v0.17.2...v0.18.0)

---
updated-dependencies:
- dependency-name: pytest-asyncio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-02-08 12:28:58 +03:00
dependabot[bot] 10879c8bf3 Bump pytest from 6.2.5 to 7.0.0 (#339)
Bumps [pytest](https://github.com/pytest-dev/pytest) from 6.2.5 to 7.0.0.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/6.2.5...7.0.0)

---
updated-dependencies:
- dependency-name: pytest
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-02-07 12:42:00 +03:00
dependabot[bot] b48d126118 Bump pytest-httpserver from 1.0.3 to 1.0.4 (#334)
Bumps [pytest-httpserver](https://github.com/csernazs/pytest-httpserver) from 1.0.3 to 1.0.4.
- [Release notes](https://github.com/csernazs/pytest-httpserver/releases)
- [Changelog](https://github.com/csernazs/pytest-httpserver/blob/master/CHANGES.rst)
- [Commits](https://github.com/csernazs/pytest-httpserver/compare/1.0.3...1.0.4)

---
updated-dependencies:
- dependency-name: pytest-httpserver
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-01-29 00:10:17 +03:00
dependabot[bot] c2c2707fb6 Bump multidict from 6.0.1 to 6.0.2 (#333)
Bumps [multidict](https://github.com/aio-libs/multidict) from 6.0.1 to 6.0.2.
- [Release notes](https://github.com/aio-libs/multidict/releases)
- [Changelog](https://github.com/aio-libs/multidict/blob/master/CHANGES.rst)
- [Commits](https://github.com/aio-libs/multidict/compare/v6.0.1...v6.0.2)

---
updated-dependencies:
- dependency-name: multidict
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-01-25 23:05:58 +03:00
dependabot[bot] 5e16edc003 Bump multidict from 5.2.0 to 6.0.1 (#332)
* Bump multidict from 5.2.0 to 6.0.1

Bumps [multidict](https://github.com/aio-libs/multidict) from 5.2.0 to 6.0.1.
- [Release notes](https://github.com/aio-libs/multidict/releases)
- [Changelog](https://github.com/aio-libs/multidict/blob/master/CHANGES.rst)
- [Commits](https://github.com/aio-libs/multidict/compare/v5.2.0...v6.0.1)

---
updated-dependencies:
- dependency-name: multidict
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* Fixed Python 3.6 compatibility

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Soxoj <soxoj@protonmail.com>
2022-01-25 00:52:11 +03:00
Soxoj e84b5e3d5d Disable kinooh, sites list update workflow added (#329)
* Disable kinooh, sites list update workflow added

* Workflow update
2022-01-22 00:37:49 +03:00
Soxoj 4d65d03074 Disabled Ruboard (#327) 2022-01-21 02:11:08 +03:00
Soxoj 222e8d3d09 Update logo 2022-01-18 23:36:02 +03:00
dependabot[bot] 92c7e41439 Bump pytest-asyncio from 0.17.1 to 0.17.2 (#323)
Bumps [pytest-asyncio](https://github.com/pytest-dev/pytest-asyncio) from 0.17.1 to 0.17.2.
- [Release notes](https://github.com/pytest-dev/pytest-asyncio/releases)
- [Commits](https://github.com/pytest-dev/pytest-asyncio/compare/v0.17.1...v0.17.2)

---
updated-dependencies:
- dependency-name: pytest-asyncio
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-01-18 17:34:08 +03:00
dependabot[bot] 55f941cf18 Bump pytest-asyncio from 0.17.0 to 0.17.1 (#321)
Bumps [pytest-asyncio](https://github.com/pytest-dev/pytest-asyncio) from 0.17.0 to 0.17.1.
- [Release notes](https://github.com/pytest-dev/pytest-asyncio/releases)
- [Commits](https://github.com/pytest-dev/pytest-asyncio/compare/v0.17.0...v0.17.1)

---
updated-dependencies:
- dependency-name: pytest-asyncio
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-01-17 19:29:03 +03:00
imgbot[bot] fa6bb1ee17 [ImgBot] Optimize images (#319)
*Total -- 1,424.29kb -> 846.97kb (40.53%)

/static/report_alexaimephotography_xmind_screenshot.png -- 772.80kb -> 351.39kb (54.53%)
/static/report_alexaimephotography_html_screenshot.png -- 606.94kb -> 451.06kb (25.68%)
/static/recursive_search.svg -- 44.55kb -> 44.52kb (0.07%)

Signed-off-by: ImgBotApp <ImgBotHelp@gmail.com>

Co-authored-by: ImgBotApp <ImgBotHelp@gmail.com>
2022-01-15 15:33:42 +03:00
103 changed files with 47306 additions and 28791 deletions
+3
View File
@@ -0,0 +1,3 @@
#!/bin/sh
echo 'Activating update_sitesmd hook script...'
poetry run update_sitesmd
+2
View File
@@ -1,3 +1,5 @@
# These are supported funding model platforms
patreon: soxoj
github: soxoj
buy_me_a_coffee: soxoj
+5 -1
View File
@@ -15,10 +15,14 @@ assignees: soxoj
## Description
Info about Maigret version you are running and environment (`--version`, operation system, ISP provuder):
Info about Maigret version you are running and environment (`--version`, operation system, ISP provider):
<INSERT VERSION INFO HERE>
How to reproduce this bug (commandline options / conditions):
<INSERT EXAMPLE OF CLI COMMAND HERE>
<DESCRIPTION>
<PASTE SCREENSHOT>
<ATTACH LOG FILE>
+1
View File
@@ -27,6 +27,7 @@ jobs:
with:
push: true
tags: ${{ secrets.DOCKER_HUB_USERNAME }}/maigret:latest
platforms: linux/amd64,linux/arm64
-
name: Image digest
run: echo ${{ steps.docker_build.outputs.digest }}
+60 -14
View File
@@ -2,23 +2,69 @@ name: Package exe with PyInstaller - Windows
on:
push:
branches: [ main ]
pull_request:
branches: [ main ]
branches: [main, dev]
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: PyInstaller Windows
uses: JackMcKew/pyinstaller-action-windows@main
with:
path: pyinstaller
- name: Checkout
uses: actions/checkout@v4
- uses: actions/upload-artifact@v2
with:
name: maigret_standalone_win32
path: pyinstaller/dist/windows # or path/to/artifact
# Wine Python (not Linux) runs PyInstaller; altgraph needs pkg_resources — reinstall setuptools after all deps.
- name: Prepare requirements for Wine (setuptools last)
run: |
set -euo pipefail
cp pyinstaller/requirements.txt pyinstaller/requirements-wine.txt
{
echo ""
echo "# CI: setuptools last so pkg_resources exists for PyInstaller/altgraph in Wine"
echo "setuptools==70.0.0"
} >> pyinstaller/requirements-wine.txt
- name: PyInstaller Windows Build
uses: JackMcKew/pyinstaller-action-windows@main
with:
path: pyinstaller
requirements: requirements-wine.txt
- name: Upload PyInstaller Binary to Workflow as Artifact
if: success()
uses: actions/upload-artifact@v4
with:
name: maigret_standalone_win32
path: pyinstaller/dist/windows
- name: Download PyInstaller Binary
if: success()
uses: actions/download-artifact@v4
with:
name: maigret_standalone_win32
- name: Create New Release and Upload PyInstaller Binary to Release
if: success()
uses: ncipollo/release-action@v1.14.0
id: create_release
with:
allowUpdates: true
draft: false
prerelease: false
artifactErrorsFailBuild: true
makeLatest: true
replacesArtifacts: true
artifacts: maigret_standalone.exe
name: Development Windows Release [${{ github.ref_name }}]
tag: ${{ github.ref_name }}
body: |
This is a development release built from the **${{ github.ref_name }}** branch.
Take into account that `dev` releases may be unstable.
Please, use [the development release](https://github.com/soxoj/maigret/releases/tag/main) build from the **main** branch.
Instructions:
- Download the attached file `maigret_standalone.exe` to get the Windows executable.
- Video guide on how to run it: https://youtu.be/qIgwTZOmMmM
- For detailed documentation, visit: https://maigret.readthedocs.io/en/latest/
env:
GITHUB_TOKEN: ${{ github.token }}
+19 -10
View File
@@ -1,13 +1,11 @@
# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions
name: Python package
name: Linting and testing
on:
push:
branches: [ main ]
pull_request:
branches: [ main ]
types: [opened, synchronize, reopened]
jobs:
build:
@@ -15,19 +13,30 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: [3.6.9, 3.7, 3.8, 3.9]
python-version: ["3.10", "3.11", "3.12", "3.13"]
steps:
- uses: actions/checkout@v2
- name: Checkout
uses: actions/checkout@v2
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
- name: Install system dependencies
run: |
sudo apt-get update && sudo apt-get install -y libcairo2-dev
- name: Install dependencies
run: |
python -m pip install --upgrade pip
python -m pip install -r test-requirements.txt
if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
- name: Test with pytest
python -m pip install poetry
python -m poetry install --with dev
- name: Test with Coverage and Pytest (Fail if coverage is low)
run: |
pytest --reruns 3 --reruns-delay 5
poetry run coverage run --source=./maigret -m pytest --reruns 3 --reruns-delay 5 tests
poetry run coverage report --fail-under=60
poetry run coverage html
- name: Upload coverage report
uses: actions/upload-artifact@v4
with:
name: htmlcov-${{ strategy.job-index }}
path: htmlcov
+15 -25
View File
@@ -1,31 +1,21 @@
# This workflow will upload a Python Package using Twine when a release is created
# For more information see: https://help.github.com/en/actions/language-and-framework-guides/using-python-with-github-actions#publishing-to-package-registries
name: Upload Python Package
name: Upload Python Package to PyPI when a Release is Created
on:
release:
types: [created]
push:
tags:
- "v*"
permissions:
id-token: write
contents: read
jobs:
deploy:
build-and-publish:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Set up Python
uses: actions/setup-python@v2
with:
python-version: '3.x'
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install setuptools wheel twine
- name: Build and publish
env:
TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
run: |
python setup.py sdist bdist_wheel
twine upload dist/*
- uses: actions/checkout@v4
- uses: astral-sh/setup-uv@v3
- run: uv build
- name: Publish to PyPI (Trusted Publishing)
uses: pypa/gh-action-pypi-publish@release/v1
with:
packages-dir: dist
+57
View File
@@ -0,0 +1,57 @@
name: Update sites rating and statistics
on:
push:
branches: [ main ]
concurrency:
group: update-sites-${{ github.ref }}
cancel-in-progress: true
jobs:
build:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v4
with:
ref: main
fetch-depth: 0 # otherwise, there would be errors pushing refs to the destination repository.
- name: Install system dependencies
run: |
sudo apt-get update && sudo apt-get install -y libcairo2-dev
- name: Build application
run: |
pip3 install .
python3 ./utils/update_site_data.py --empty-only
- name: Remove ambiguous main tag
run: git tag -d main || true
- name: Check for meaningful changes
id: check
run: |
REAL_CHANGES=$(git diff --unified=0 sites.md | grep '^[+-][^+-]' | grep -v 'The list was updated at' | wc -l)
if [ "$REAL_CHANGES" -gt 0 ]; then
echo "has_changes=true" >> $GITHUB_OUTPUT
else
echo "has_changes=false" >> $GITHUB_OUTPUT
fi
- name: Delete existing PR branch
if: steps.check.outputs.has_changes == 'true'
run: git push origin --delete auto/update-sites-list || true
- name: Create Pull Request
if: steps.check.outputs.has_changes == 'true'
uses: peter-evans/create-pull-request@v7
with:
token: ${{ secrets.GITHUB_TOKEN }}
commit-message: "Updated site list and statistics"
title: "Automated Sites List Update"
body: "Automated changes to sites.md based on new Alexa rankings/statistics."
branch: "auto/update-sites-list"
base: main
delete-branch: true
+9
View File
@@ -1,5 +1,6 @@
# Virtual Environment
venv/
.venv/
# Editor Configurations
.vscode/
@@ -15,6 +16,10 @@ src/
.ipynb_checkpoints
*.ipynb
# Logs and backups
*.log
*.bak
# Output files, except requirements.txt
*.txt
!requirements.txt
@@ -34,3 +39,7 @@ htmlcov/
# Maigret files
settings.json
# other
*.egg-info
build
+16
View File
@@ -0,0 +1,16 @@
version: 2
build:
os: ubuntu-22.04
tools:
python: "3.10"
sphinx:
configuration: docs/source/conf.py
formats:
- pdf
python:
install:
- requirements: docs/requirements.txt
+383 -1
View File
@@ -1,6 +1,388 @@
# Changelog
## [Unreleased]
## [0.5.0] - 2025-08-10
* Site Supression by @C3n7ral051nt4g3ncy in https://github.com/soxoj/maigret/pull/627
* Bump yarl from 1.7.2 to 1.8.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/626
* Streaming sites by @soxoj in https://github.com/soxoj/maigret/pull/628
* Mirrors by @fen0s in https://github.com/soxoj/maigret/pull/630
* Added Instagram scrapers by @soxoj in https://github.com/soxoj/maigret/pull/633
* Bump psutil from 5.9.1 to 5.9.2 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/624
* Bump pypdf2 from 2.10.4 to 2.10.5 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/625
* Invalid results fixes by @soxoj in https://github.com/soxoj/maigret/pull/634
* Bump pytest-httpserver from 1.0.5 to 1.0.6 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/638
* Bump pypdf2 from 2.10.5 to 2.10.8 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/641
* Bump certifi from 2022.6.15 to 2022.9.14 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/644
* Bump idna from 3.3 to 3.4 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/640
* fix false positives from bot by @fen0s in https://github.com/soxoj/maigret/pull/663
* Add pre commit hook by @fen0s in https://github.com/soxoj/maigret/pull/664
* site deletion by @C3n7ral051nt4g3ncy in https://github.com/soxoj/maigret/pull/648
* Changed docker run to interactive and remove on exit by @dr-BEat in https://github.com/soxoj/maigret/pull/675
* Corrected grammar in README.md by @Trkzi-Omar in https://github.com/soxoj/maigret/pull/674
* fix sites from issues by @fen0s in https://github.com/soxoj/maigret/pull/680
* correct username in usage examples by @LeonGr in https://github.com/soxoj/maigret/pull/673
* Update README.md by @johanburati in https://github.com/soxoj/maigret/pull/669
* Fix typos by @LorenzoSapora in https://github.com/soxoj/maigret/pull/681
* Build docker images for arm64 and amd64 by @krydos in https://github.com/soxoj/maigret/pull/687
* Bump certifi from 2022.9.14 to 2022.9.24 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/652
* Bump aiohttp from 3.8.1 to 3.8.3 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/651
* Bump arabic-reshaper from 2.1.3 to 2.1.4 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/650
* Update README.md, Repl.it -> Replit with new badge by @PeterDaveHello in https://github.com/soxoj/maigret/pull/692
* Refactor Dockerfile with best practices by @PeterDaveHello in https://github.com/soxoj/maigret/pull/691
* Improve README.md Installation section by @PeterDaveHello in https://github.com/soxoj/maigret/pull/690
* Bump pytest-cov from 3.0.0 to 4.0.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/688
* Bump stem from 1.8.0 to 1.8.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/689
* Bump typing-extensions from 4.3.0 to 4.4.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/698
* Typo fixes in error.py by @Ben-Chapman in https://github.com/soxoj/maigret/pull/711
* Fixed docs about tags by @soxoj in https://github.com/soxoj/maigret/pull/715
* Fixed lightstalking.com by @soxoj in https://github.com/soxoj/maigret/pull/716
* Fixed YouTube by @soxoj in https://github.com/soxoj/maigret/pull/717
* Bump pytest-asyncio from 0.19.0 to 0.20.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/732
* Updated snapcraft yaml by @kz6fittycent in https://github.com/soxoj/maigret/pull/720
* Bump colorama from 0.4.5 to 0.4.6 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/733
* Bump pytest from 7.1.3 to 7.2.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/734
* disable not working sites by @fen0s in https://github.com/soxoj/maigret/pull/739
* disable broken sites by @fen0s in https://github.com/soxoj/maigret/pull/756
* Bump cloudscraper from 1.2.64 to 1.2.66 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/769
* fix opensea and shutterstock, disable a few dead sites by @fen0s in https://github.com/soxoj/maigret/pull/798
* Fixed documentation URL by @soxoj in https://github.com/soxoj/maigret/pull/799
* Small readme fix by @soxoj in https://github.com/soxoj/maigret/pull/857
* docs spelling error by @Nadeem-05 in https://github.com/soxoj/maigret/pull/866
* Fix Pinterest false positive by @therealchiendat in https://github.com/soxoj/maigret/pull/862
* Added new Websites by @codyMar30 in https://github.com/soxoj/maigret/pull/838
* Update "future" package to v0.18.3 by @PeterDaveHello in https://github.com/soxoj/maigret/pull/834
* Bump certifi from 2022.9.24 to 2022.12.7 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/793
* Update dependency - networkx from v2.5.1 to v2.6 by @PeterDaveHello in https://github.com/soxoj/maigret/pull/738
* Bump reportlab from 3.6.11 to 3.6.12 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/735
* Bump typing-extensions from 4.4.0 to 4.5.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/888
* Bump psutil from 5.9.2 to 5.9.4 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/741
* Bump attrs from 22.1.0 to 22.2.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/892
* Bump multidict from 6.0.2 to 6.0.4 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/891
* Fixed false positives, updated networkx dep, some lint fixes by @soxoj in https://github.com/soxoj/maigret/pull/894
* Bump lxml from 4.9.1 to 4.9.2 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/900
* Bump yarl from 1.8.1 to 1.8.2 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/899
* Fixed false positives on Mastodon sites by @soxoj in https://github.com/soxoj/maigret/pull/901
* Added valid regex for Mastodon instances (#848) by @soxoj in https://github.com/soxoj/maigret/pull/906
* Fix missing Mastodon Regex on #906 by @therealchiendat in https://github.com/soxoj/maigret/pull/908
* Bump tqdm from 4.64.1 to 4.65.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/905
* Bump requests from 2.28.1 to 2.28.2 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/904
* Bump psutil from 5.9.4 to 5.9.5 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/910
* fix deployment of tests by @noraj in https://github.com/soxoj/maigret/pull/933
* Added 26 ENS and similar domains with tag `crypto` by @soxoj in https://github.com/soxoj/maigret/pull/942
* Bump requests from 2.28.2 to 2.31.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/957
* Update wizard.py by @engNoori in https://github.com/soxoj/maigret/pull/1016
* Improved search through UnstoppableDomains by @soxoj in https://github.com/soxoj/maigret/pull/1040
* Added memory.lol (Twitter usernames archive) by @soxoj in https://github.com/soxoj/maigret/pull/1067
* Disabled and fixed several sites by @soxoj in https://github.com/soxoj/maigret/pull/1132
* Fixed some sites (again) by @soxoj in https://github.com/soxoj/maigret/pull/1133
* fix(sec): upgrade reportlab to 3.6.13 by @realize096 in https://github.com/soxoj/maigret/pull/1051
* Add compatibility with pytest >= 7.3.0 by @tjni in https://github.com/soxoj/maigret/pull/1117
* Additionally fixed sites, win32 build fix by @soxoj in https://github.com/soxoj/maigret/pull/1148
* Sites fixes 250823 by @soxoj in https://github.com/soxoj/maigret/pull/1149
* Bump reportlab from 3.6.12 to 4.0.4 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1160
* Bump certifi from 2022.12.7 to 2023.7.22 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1070
* fix(sec): upgrade certifi to 2022.12.07 by @realize096 in https://github.com/soxoj/maigret/pull/1173
* Bump cloudscraper from 1.2.66 to 1.2.71 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/914
* Some sites fixed & cloudflare detection by @soxoj in https://github.com/soxoj/maigret/pull/1178
* EasyInstaller because everyone likes saving time :) by @CatchySmile in https://github.com/soxoj/maigret/pull/1212
* Tests fixes + last updates by @soxoj in https://github.com/soxoj/maigret/pull/1228
* Bump pypdf2 from 2.10.8 to 3.0.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/815
* Bump pyvis from 0.2.1 to 0.3.2 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/861
* Bump xhtml2pdf from 0.2.8 to 0.2.11 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/935
* Bump flake8 from 5.0.4 to 6.1.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1091
* Bump aiohttp from 3.8.3 to 3.8.6 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1222
* Specified pyinstaller version by @soxoj in https://github.com/soxoj/maigret/pull/1230
* Pyinstaller fix by @soxoj in https://github.com/soxoj/maigret/pull/1231
* Test pyinstaller on dev branch by @soxoj in https://github.com/soxoj/maigret/pull/1233
* Update main from dev again by @soxoj in https://github.com/soxoj/maigret/pull/1234
* Bump typing-extensions from 4.5.0 to 4.8.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1239
* Bump pytest-rerunfailures from 10.2 to 12.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1237
* Bump async-timeout from 4.0.2 to 4.0.3 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1238
* Changed pyinstaller dir by @soxoj in https://github.com/soxoj/maigret/pull/1245
* Bump tqdm from 4.65.0 to 4.66.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1235
* Updating site checkers, disabling suspended sites by @MeowyPouncer in https://github.com/soxoj/maigret/pull/1266
* Updated site statistics by @soxoj in https://github.com/soxoj/maigret/pull/1273
* Compat RegataOS (Opensuse) by @Jeiel0rbit in https://github.com/soxoj/maigret/pull/1308
* fix reddit by @hhhtylerw in https://github.com/soxoj/maigret/pull/1296
* Added Telegram bot link by @soxoj in https://github.com/soxoj/maigret/pull/1321
* Added SOWEL classification by @soxoj in https://github.com/soxoj/maigret/pull/1453
* Bump jinja2 from 3.1.2 to 3.1.3 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1358
* Fixed/Disabled sites. Update requirements.txt by @rly0nheart in https://github.com/soxoj/maigret/pull/1517
* Fixed 4 sites, added 6 sites, disabled 27 sites by @rly0nheart in https://github.com/soxoj/maigret/pull/1536
* Fixed 3 sites, disabed 3, added by @rly0nheart in https://github.com/soxoj/maigret/pull/1539
* Bump socid-extractor from 0.0.24 to 0.0.26 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1546
* Added code conventions to CONTRIBUTING.md by @Lord-Topa in https://github.com/soxoj/maigret/pull/1589
* Readme by @Lord-Topa in https://github.com/soxoj/maigret/pull/1588
* Update data.json by @ranlo in https://github.com/soxoj/maigret/pull/1559
* Adding permutator feature for usernames by @balestek in https://github.com/soxoj/maigret/pull/1575
* Alik.cz indirectly requests removal by @ppfeister in https://github.com/soxoj/maigret/pull/1671
* Fixed 1 site, PyInstaller workflow, Google Colab example by @Ixve in https://github.com/soxoj/maigret/pull/1558
* Bump soupsieve from 2.5 to 2.6 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1708
* Added dev documentation, fixed some sites, removed GitHub issue links… by @soxoj in https://github.com/soxoj/maigret/pull/1869
* Bump cryptography from 42.0.7 to 43.0.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1870
* Bump requests-futures from 1.0.1 to 1.0.2 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1868
* Bump werkzeug from 3.0.3 to 3.0.6 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1846
* Added .readthedocs.yaml, fixed Pyinstaller and Docker workflows by @soxoj in https://github.com/soxoj/maigret/pull/1874
* Added GitHub and BuyMeACoffee sponsorships by @soxoj in https://github.com/soxoj/maigret/pull/1875
* Bump psutil from 5.9.5 to 6.1.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1839
* Bump flake8 from 6.1.0 to 7.1.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1692
* Bump future from 0.18.3 to 1.0.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1545
* Bump urllib3 from 2.2.1 to 2.2.2 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1600
* Bump certifi from 2023.11.17 to 2024.8.30 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1840
* Fixed test for aiohttp 3.10 by @soxoj in https://github.com/soxoj/maigret/pull/1876
* Bump aiohttp from 3.9.5 to 3.10.5 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1721
* Added new badges to README by @soxoj in https://github.com/soxoj/maigret/pull/1877
* Show detailed error statistics for `-v` by @soxoj in https://github.com/soxoj/maigret/pull/1879
* Disabled unavailable sites by @soxoj in https://github.com/soxoj/maigret/pull/1880
* Added 7 sites, implemented integration with Marple, docs update by @soxoj in https://github.com/soxoj/maigret/pull/1881
* Bump pefile from 2022.5.30 to 2024.8.26 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1883
* Bump lxml from 4.9.4 to 5.3.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1884
* New sites added by @soxoj in https://github.com/soxoj/maigret/pull/1888
* Improved self-check mode, added 15 sites by @soxoj in https://github.com/soxoj/maigret/pull/1887
* Bump pyinstaller from 6.1 to 6.11.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1882
* Bump pytest-asyncio from 0.23.7 to 0.23.8 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1885
* Pyinstaller bump & pefile fix by @soxoj in https://github.com/soxoj/maigret/pull/1890
* Bump python-bidi from 0.4.2 to 0.6.3 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1886
* Sites checks fixes by @soxoj in https://github.com/soxoj/maigret/pull/1896
* Parallel execution optimization by @soxoj in https://github.com/soxoj/maigret/pull/1897
* Maigret bot support (custom progress function fixed) by @soxoj in https://github.com/soxoj/maigret/pull/1898
* Bump markupsafe from 2.1.5 to 3.0.2 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1895
* Retries set to 0 by default, refactored code of executor with progress by @soxoj in https://github.com/soxoj/maigret/pull/1899
* Bump aiohttp-socks from 0.7.1 to 0.9.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1900
* Bump pycountry from 23.12.11 to 24.6.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1903
* Bump pytest-cov from 4.1.0 to 6.0.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1902
* Bump pyvis from 0.2.1 to 0.3.2 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1893
* Close http connections (#1595) by @soxoj in https://github.com/soxoj/maigret/pull/1905
* New logo by @soxoj in https://github.com/soxoj/maigret/pull/1906
* Fixed dateutil parsing error for CDT timezone by @soxoj in https://github.com/soxoj/maigret/pull/1907
* Bump alive-progress from 2.4.1 to 3.2.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1910
* Permutator output and documentation updates by @soxoj in https://github.com/soxoj/maigret/pull/1914
* Bump aiohttp from 3.11.7 to 3.11.8 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1912
* Bump async-timeout from 4.0.3 to 5.0.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1909
* An recursive search animation in README has been updated by @soxoj in https://github.com/soxoj/maigret/pull/1915
* Bump pytest-rerunfailures from 12.0 to 15.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1911
* Bump attrs from 22.2.0 to 24.2.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1913
* Sites fixes by @soxoj in https://github.com/soxoj/maigret/pull/1917
* Update README.md by @soxoj in https://github.com/soxoj/maigret/pull/1919
* Refactored sites module, updated documentation by @soxoj in https://github.com/soxoj/maigret/pull/1918
* Bump aiohttp from 3.11.8 to 3.11.9 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1920
* Bump pytest from 7.4.4 to 8.3.4 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1923
* Bump yarl from 1.18.0 to 1.18.3 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1922
* Bump pytest-asyncio from 0.23.8 to 0.24.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1925
* Documentation update by @soxoj in https://github.com/soxoj/maigret/pull/1926
* Bump mock from 4.0.3 to 5.1.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1921
* Bump pywin32-ctypes from 0.2.1 to 0.2.3 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1924
* Installation docs update by @soxoj in https://github.com/soxoj/maigret/pull/1927
* Disabled Figma check by @soxoj in https://github.com/soxoj/maigret/pull/1928
* Put Windows executable in Releases for each dev and main commit by @soxoj in https://github.com/soxoj/maigret/pull/1929
* Updated PyInstaller workflow by @soxoj in https://github.com/soxoj/maigret/pull/1930
* Documentation update by @soxoj in https://github.com/soxoj/maigret/pull/1931
* Fixed Figma check and some bugs by @soxoj in https://github.com/soxoj/maigret/pull/1932
* Bump six from 1.16.0 to 1.17.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1933
* Activation mechanism documentation added by @soxoj in https://github.com/soxoj/maigret/pull/1935
* Readme/docs update based on GH discussions by @soxoj in https://github.com/soxoj/maigret/pull/1936
* Bump aiohttp from 3.11.9 to 3.11.10 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1937
* Weibo site check fix, activation mechanism added by @soxoj in https://github.com/soxoj/maigret/pull/1938
* Fixed Ebay and BongaCams checks by @soxoj in https://github.com/soxoj/maigret/pull/1939
* Sites fixes by @soxoj in https://github.com/soxoj/maigret/pull/1940
* Fixed Linktr and discourse.mozilla.org by @soxoj in https://github.com/soxoj/maigret/pull/1941
* Refactored self-check method, code formatting, small lint fixes by @soxoj in https://github.com/soxoj/maigret/pull/1942
* Refactoring, test coverage increased to 60% by @soxoj in https://github.com/soxoj/maigret/pull/1943
* Added a test for submitter by @soxoj in https://github.com/soxoj/maigret/pull/1944
* Update README.md by @soxoj in https://github.com/soxoj/maigret/pull/1949
* Updated OP.GG checks by @soxoj in https://github.com/soxoj/maigret/pull/1950
* Fixed ProductHunt check by @soxoj in https://github.com/soxoj/maigret/pull/1951
* Improved check feature extraction function, added tests by @soxoj in https://github.com/soxoj/maigret/pull/1952
* Submit improvements and site check fixes by @soxoj in https://github.com/soxoj/maigret/pull/1956
* chore: update submit.py by @eltociear in https://github.com/soxoj/maigret/pull/1957
* Fixed Gravatar parsing (socid_extractor) by @soxoj in https://github.com/soxoj/maigret/pull/1958
* Site check fixes by @soxoj in https://github.com/soxoj/maigret/pull/1962
* fix bad linux filename generation by @overcuriousity in https://github.com/soxoj/maigret/pull/1961
* Bump pytest-asyncio from 0.24.0 to 0.25.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1963
* Fixed flaky tests to check cookies by @soxoj in https://github.com/soxoj/maigret/pull/1965
* Preparation of 0.5.0 alpha version by @soxoj in https://github.com/soxoj/maigret/pull/1966
* Created web frontend launched via --web flag by @overcuriousity in https://github.com/soxoj/maigret/pull/1967
* Bump certifi from 2024.8.30 to 2024.12.14 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1969
* Bump attrs from 24.2.0 to 24.3.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1970
* Added web interface docs by @soxoj in https://github.com/soxoj/maigret/pull/1972
* Small docs and parameters fixes for web interface mode by @soxoj in https://github.com/soxoj/maigret/pull/1973
* [ImgBot] Optimize images by @imgbot[bot] in https://github.com/soxoj/maigret/pull/1974
* Improving the web interface by @overcuriousity in https://github.com/soxoj/maigret/pull/1975
* make graph more meaningful by @overcuriousity in https://github.com/soxoj/maigret/pull/1977
* Async generator-executor for site checks by @soxoj in https://github.com/soxoj/maigret/pull/1978
* Bump aiohttp from 3.11.10 to 3.11.11 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1979
* Bump psutil from 6.1.0 to 6.1.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1980
* Bump aiohttp-socks from 0.9.1 to 0.10.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1985
* Bump mypy from 1.13.0 to 1.14.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1983
* Bump aiohttp-socks from 0.10.0 to 0.10.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1987
* Bump jinja2 from 3.1.4 to 3.1.5 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1982
* Bump coverage from 7.6.9 to 7.6.10 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1986
* Bump pytest-asyncio from 0.25.0 to 0.25.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1989
* Bump mypy from 1.14.0 to 1.14.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1988
* Bump pytest-asyncio from 0.25.1 to 0.25.2 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/1990
* docs: update usage-examples.rst by @eltociear in https://github.com/soxoj/maigret/pull/1996
* upload-artifact action in python test workflow updated to v4 by @soxoj in https://github.com/soxoj/maigret/pull/2024
* Pass db_file configuration to web interface by @pykereaper in https://github.com/soxoj/maigret/pull/2019
* Fix usage of data.json files from web by @pykereaper in https://github.com/soxoj/maigret/pull/2020
* Bump black from 24.10.0 to 25.1.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2001
* Important Update Installer.bat by @CatchySmile in https://github.com/soxoj/maigret/pull/1994
* Bump cryptography from 44.0.0 to 44.0.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2005
* Bump jinja2 from 3.1.5 to 3.1.6 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2011
* [#2010] Add 6 more websites to manage by @pylapp in https://github.com/soxoj/maigret/pull/2009
* Bump flask from 3.1.0 to 3.1.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2028
* Bump requests from 2.32.3 to 2.32.4 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2026
* Bump pycares from 4.5.0 to 4.9.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2025
* Bump pytest-asyncio from 0.25.2 to 0.26.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2016
* Bump urllib3 from 2.2.3 to 2.5.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2027
* Disable ICQ site by @Echo-Darlyson in https://github.com/soxoj/maigret/pull/1993
* Bump attrs from 24.3.0 to 25.3.0 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2014
* Bump certifi from 2024.12.14 to 2025.1.31 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2004
* Bump typing-extensions from 4.12.2 to 4.14.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2038
* Disable AskFM by @MR-VL in https://github.com/soxoj/maigret/pull/2037
* Bump platformdirs from 4.3.6 to 4.3.8 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2033
* Bump coverage from 7.6.10 to 7.9.2 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2039
* Bump aiohttp from 3.11.11 to 3.12.14 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2041
* Bump yarl from 1.18.3 to 1.20.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2032
* Fixed test dialog_adds_site_negative by @soxoj in https://github.com/soxoj/maigret/pull/2107
* Bump reportlab from 4.2.5 to 4.4.3 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2063
* Bump asgiref from 3.8.1 to 3.9.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2040
* Bump multidict from 6.1.0 to 6.6.3 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2034
* Bump pytest-rerunfailures from 15.0 to 15.1 by @dependabot[bot] in https://github.com/soxoj/maigret/pull/2030
**Full Changelog**: https://github.com/soxoj/maigret/compare/v0.4.4...v0.5.0
## [0.4.4] - 2022-09-03
* Fixed some false positives by @soxoj in https://github.com/soxoj/maigret/pull/433
* Drop Python 3.6 support by @soxoj in https://github.com/soxoj/maigret/pull/434
* Bump xhtml2pdf from 0.2.5 to 0.2.7 by @dependabot in https://github.com/soxoj/maigret/pull/409
* Bump reportlab from 3.6.6 to 3.6.9 by @dependabot in https://github.com/soxoj/maigret/pull/403
* Bump markupsafe from 2.0.1 to 2.1.1 by @dependabot in https://github.com/soxoj/maigret/pull/389
* Bump pycountry from 22.1.10 to 22.3.5 by @dependabot in https://github.com/soxoj/maigret/pull/384
* Bump pypdf2 from 1.26.0 to 1.27.4 by @dependabot in https://github.com/soxoj/maigret/pull/438
* Update GH actions by @soxoj in https://github.com/soxoj/maigret/pull/439
* Bump tqdm from 4.63.0 to 4.64.0 by @dependabot in https://github.com/soxoj/maigret/pull/440
* Bump jinja2 from 3.0.3 to 3.1.1 by @dependabot in https://github.com/soxoj/maigret/pull/441
* Bump soupsieve from 2.3.1 to 2.3.2 by @dependabot in https://github.com/soxoj/maigret/pull/436
* Bump pypdf2 from 1.26.0 to 1.27.4 by @dependabot in https://github.com/soxoj/maigret/pull/442
* Bump pyvis from 0.1.9 to 0.2.0 by @dependabot in https://github.com/soxoj/maigret/pull/443
* Bump pypdf2 from 1.27.4 to 1.27.6 by @dependabot in https://github.com/soxoj/maigret/pull/448
* Bump typing-extensions from 4.1.1 to 4.2.0 by @dependabot in https://github.com/soxoj/maigret/pull/447
* Bump soupsieve from 2.3.2 to 2.3.2.post1 by @dependabot in https://github.com/soxoj/maigret/pull/444
* Bump pypdf2 from 1.27.6 to 1.27.7 by @dependabot in https://github.com/soxoj/maigret/pull/449
* Bump pypdf2 from 1.27.7 to 1.27.8 by @dependabot in https://github.com/soxoj/maigret/pull/450
* XMind 8 report warning and some docs update by @soxoj in https://github.com/soxoj/maigret/pull/452
* False positive fixes 24.04.22 by @soxoj in https://github.com/soxoj/maigret/pull/455
* Bump pypdf2 from 1.27.8 to 1.27.9 by @dependabot in https://github.com/soxoj/maigret/pull/456
* Bump pytest from 7.0.1 to 7.1.2 by @dependabot in https://github.com/soxoj/maigret/pull/457
* Bump jinja2 from 3.1.1 to 3.1.2 by @dependabot in https://github.com/soxoj/maigret/pull/460
* Ubisoft forums addition by @fen0s in https://github.com/soxoj/maigret/pull/461
* Add BYOND, Figma, BeatStars by @fen0s in https://github.com/soxoj/maigret/pull/462
* fix Figma username definition, add a bunch of sites by @fen0s in https://github.com/soxoj/maigret/pull/464
* Bump pypdf2 from 1.27.9 to 1.27.10 by @dependabot in https://github.com/soxoj/maigret/pull/465
* Bump pypdf2 from 1.27.10 to 1.27.12 by @dependabot in https://github.com/soxoj/maigret/pull/466
* Sites fixes 05 05 22 by @soxoj in https://github.com/soxoj/maigret/pull/469
* Bump pyvis from 0.2.0 to 0.2.1 by @dependabot in https://github.com/soxoj/maigret/pull/472
* Social analyzer websites, also fixing presense strs by @fen0s in https://github.com/soxoj/maigret/pull/471
* Updated logic of false positive risk estimating by @soxoj in https://github.com/soxoj/maigret/pull/475
* Improved usability of external progressbar func by @soxoj in https://github.com/soxoj/maigret/pull/476
* New sites added, some tags/rank update by @soxoj in https://github.com/soxoj/maigret/pull/477
* Added new sites by @soxoj in https://github.com/soxoj/maigret/pull/480
* Added new forums, updated ranks, some utils improvements by @soxoj in https://github.com/soxoj/maigret/pull/481
* Disabled sites with false positives results by @soxoj in https://github.com/soxoj/maigret/pull/482
* Bump certifi from 2021.10.8 to 2022.5.18.1 by @dependabot in https://github.com/soxoj/maigret/pull/488
* Bump psutil from 5.9.0 to 5.9.1 by @dependabot in https://github.com/soxoj/maigret/pull/490
* Bump pypdf2 from 1.27.12 to 1.28.1 by @dependabot in https://github.com/soxoj/maigret/pull/491
* Bump pypdf2 from 1.28.1 to 1.28.2 by @dependabot in https://github.com/soxoj/maigret/pull/493
* added and fixed some websites in data.json by @kustermariocoding in https://github.com/soxoj/maigret/pull/494
* Bump pypdf2 from 1.28.2 to 2.0.0 by @dependabot in https://github.com/soxoj/maigret/pull/504
* Bump pefile from 2021.9.3 to 2022.5.30 by @dependabot in https://github.com/soxoj/maigret/pull/499
* Updated sites list, added disabled Anilist by @soxoj in https://github.com/soxoj/maigret/pull/502
* Bump lxml from 4.8.0 to 4.9.0 by @dependabot in https://github.com/soxoj/maigret/pull/503
* Compatibility with Python 10 by @soxoj in https://github.com/soxoj/maigret/pull/509
* feat: add .log & .bak files to gitignore in https://github.com/soxoj/maigret/pull/511
* fix some sites and delete abandoned by @fen0s in https://github.com/soxoj/maigret/pull/526
* Fixesjulyfirst by @fen0s in https://github.com/soxoj/maigret/pull/533
* yazbel, aboutcar, zhihu by @fen0s in https://github.com/soxoj/maigret/pull/531
* Fixes july third by @fen0s in https://github.com/soxoj/maigret/pull/535
* Update data.json by @fen0s in https://github.com/soxoj/maigret/pull/539
* Update data.json by @fen0s in https://github.com/soxoj/maigret/pull/540
* Bump reportlab from 3.6.9 to 3.6.11 by @dependabot in https://github.com/soxoj/maigret/pull/543
* Bump requests from 2.27.1 to 2.28.1 by @dependabot in https://github.com/soxoj/maigret/pull/530
* Bump pypdf2 from 2.0.0 to 2.5.0 by @dependabot in https://github.com/soxoj/maigret/pull/542
* Bump xhtml2pdf from 0.2.7 to 0.2.8 by @dependabot in https://github.com/soxoj/maigret/pull/522
* Bump lxml from 4.9.0 to 4.9.1 by @dependabot in https://github.com/soxoj/maigret/pull/538
* disable yandex music + set utf8 encoding by @fen0s in https://github.com/soxoj/maigret/pull/562
* fix false positives by @fen0s in https://github.com/soxoj/maigret/pull/577
* disable Instagram, fix two false positives by @fen0s in https://github.com/soxoj/maigret/pull/578
* Bump certifi from 2022.5.18.1 to 2022.6.15 by @dependabot in https://github.com/soxoj/maigret/pull/551
* August15 by @fen0s in https://github.com/soxoj/maigret/pull/591
* Bump pytest-httpserver from 1.0.4 to 1.0.5 by @dependabot in https://github.com/soxoj/maigret/pull/583
* Bump typing-extensions from 4.2.0 to 4.3.0 by @dependabot in https://github.com/soxoj/maigret/pull/549
* Bump colorama from 0.4.4 to 0.4.5 by @dependabot in https://github.com/soxoj/maigret/pull/548
* Bump chardet from 4.0.0 to 5.0.0 by @dependabot in https://github.com/soxoj/maigret/pull/550
* Bump cloudscraper from 1.2.60 to 1.2.63 by @dependabot in https://github.com/soxoj/maigret/pull/600
* Bump flake8 from 4.0.1 to 5.0.4 by @dependabot in https://github.com/soxoj/maigret/pull/598
* Bump attrs from 21.4.0 to 22.1.0 by @dependabot in https://github.com/soxoj/maigret/pull/597
* Bump pytest-asyncio from 0.18.2 to 0.19.0 by @dependabot in https://github.com/soxoj/maigret/pull/601
* Bump pypdf2 from 2.5.0 to 2.10.4 by @dependabot in https://github.com/soxoj/maigret/pull/606
* Bump pytest from 7.1.2 to 7.1.3 by @dependabot in https://github.com/soxoj/maigret/pull/613
* Update sites.md -Gitmemory.com suppression by @C3n7ral051nt4g3ncy in https://github.com/soxoj/maigret/pull/610
* Bump cloudscraper from 1.2.63 to 1.2.64 by @dependabot in https://github.com/soxoj/maigret/pull/614
* Bump pycountry from 22.1.10 to 22.3.5 by @dependabot in https://github.com/soxoj/maigret/pull/607
* add ProtonMail, disable 3 broken sites by @fen0s in https://github.com/soxoj/maigret/pull/619
* Bump tqdm from 4.64.0 to 4.64.1 by @dependabot in https://github.com/soxoj/maigret/pull/618
**Full Changelog**: https://github.com/soxoj/maigret/compare/v0.4.3...v0.4.4
## [0.4.3] - 2022-04-13
* Added Sites to data.json by @kustermariocoding in https://github.com/soxoj/maigret/pull/386
* added new Websites to data.json by @kustermariocoding in https://github.com/soxoj/maigret/pull/390
* Skipped broken tests by @soxoj in https://github.com/soxoj/maigret/pull/397
* Added new Websites to data.json by @kustermariocoding in https://github.com/soxoj/maigret/pull/401
* Added new Websites to data.json by @kustermariocoding in https://github.com/soxoj/maigret/pull/404
* Updated statistics by @soxoj in https://github.com/soxoj/maigret/pull/406
* Added new Websites to data.json by @kustermariocoding in https://github.com/soxoj/maigret/pull/413
* Disabled houzz.com, updated sites statistics by @soxoj in https://github.com/soxoj/maigret/pull/422
* Fixed last false positives by @soxoj in https://github.com/soxoj/maigret/pull/424
* Fixed actual false positives by @soxoj in https://github.com/soxoj/maigret/pull/431
**Full Changelog**: https://github.com/soxoj/maigret/compare/v0.4.2...v0.4.3
## [0.4.2] - 2022-03-07
* [ImgBot] Optimize images by @imgbot in https://github.com/soxoj/maigret/pull/319
* Bump pytest-asyncio from 0.17.0 to 0.17.1 by @dependabot in https://github.com/soxoj/maigret/pull/321
* Bump pytest-asyncio from 0.17.1 to 0.17.2 by @dependabot in https://github.com/soxoj/maigret/pull/323
* Disabled Ruboard by @soxoj in https://github.com/soxoj/maigret/pull/327
* Disable kinooh, sites list update workflow added by @soxoj in https://github.com/soxoj/maigret/pull/329
* Bump multidict from 5.2.0 to 6.0.1 by @dependabot in https://github.com/soxoj/maigret/pull/332
* Bump multidict from 6.0.1 to 6.0.2 by @dependabot in https://github.com/soxoj/maigret/pull/333
* Bump pytest-httpserver from 1.0.3 to 1.0.4 by @dependabot in https://github.com/soxoj/maigret/pull/334
* Bump pytest from 6.2.5 to 7.0.0 by @dependabot in https://github.com/soxoj/maigret/pull/339
* Bump pytest-asyncio from 0.17.2 to 0.18.0 by @dependabot in https://github.com/soxoj/maigret/pull/340
* Bump pytest-asyncio from 0.18.0 to 0.18.1 by @dependabot in https://github.com/soxoj/maigret/pull/343
* Bump pytest from 7.0.0 to 7.0.1 by @dependabot in https://github.com/soxoj/maigret/pull/345
* Bump typing-extensions from 4.0.1 to 4.1.1 by @dependabot in https://github.com/soxoj/maigret/pull/346
* Bump lxml from 4.7.1 to 4.8.0 by @dependabot in https://github.com/soxoj/maigret/pull/350
* Pin reportlab version by @cyb3rk0tik in https://github.com/soxoj/maigret/pull/351
* Fix reportlab not only for testing by @cyb3rk0tik in https://github.com/soxoj/maigret/pull/352
* Added some scripts by @soxoj in https://github.com/soxoj/maigret/pull/355
* Added package publishing instruction by @soxoj in https://github.com/soxoj/maigret/pull/356
* Added DB statistics autoupdate and write to sites.md by @soxoj in https://github.com/soxoj/maigret/pull/357
* CI autoupdate by @soxoj in https://github.com/soxoj/maigret/pull/359
* Op.gg fixes by @soxoj in https://github.com/soxoj/maigret/pull/363
* Wikipedia fix by @soxoj in https://github.com/soxoj/maigret/pull/365
* Disabled Netvibes and LeetCode by @soxoj in https://github.com/soxoj/maigret/pull/366
* Fixed several false positives, improved statistics info by @soxoj in https://github.com/soxoj/maigret/pull/368
* Fix false positives by @soxoj in https://github.com/soxoj/maigret/pull/370
* Fixed the rest of false positives for now by @soxoj in https://github.com/soxoj/maigret/pull/371
* Fix false positive and CI by @soxoj in https://github.com/soxoj/maigret/pull/372
* Added new sites to data.json by @kustermariocoding in https://github.com/soxoj/maigret/pull/375
* Fixed issue with str alexaRank by @soxoj in https://github.com/soxoj/maigret/pull/382
* Bump tqdm from 4.62.3 to 4.63.0 by @dependabot in https://github.com/soxoj/maigret/pull/374
* Bump pytest-asyncio from 0.18.1 to 0.18.2 by @dependabot in https://github.com/soxoj/maigret/pull/380
* @imgbot made their first contribution in https://github.com/soxoj/maigret/pull/319
* @kustermariocoding made their first contribution in https://github.com/soxoj/maigret/pull/375
**Full Changelog**: https://github.com/soxoj/maigret/compare/v0.4.1...v0.4.2
## [0.4.1] - 2022-01-15
* Added dozen of sites, improved submit mode by @soxoj in https://github.com/soxoj/maigret/pull/288
+24 -1
View File
@@ -2,6 +2,10 @@
Hey! I'm really glad you're reading this. Maigret contains a lot of sites, and it is very hard to keep all the sites operational. That's why any fix is important.
## Code of Conduct
Please read and follow the [Code of Conduct](CODE_OF_CONDUCT.md) to foster a welcoming and inclusive community.
## How to add a new site
#### Beginner level
@@ -27,4 +31,23 @@ Always write a clear log message for your commits. One-line messages are fine fo
## Coding conventions
Start reading the code and you'll get the hang of it. ;)
### General Guidelines
- Try to follow [PEP 8](https://www.python.org/dev/peps/pep-0008/) for Python code style.
- Ensure your code passes all tests before submitting a pull request.
### Code Style
- **Indentation**: Use 4 spaces per indentation level.
- **Imports**:
- Standard library imports should be placed at the top.
- Third-party imports should follow.
- Group imports logically.
### Naming Conventions
- **Variables and Functions**: Use `snake_case`.
- **Classes**: Use `CamelCase`.
- **Constants**: Use `UPPER_CASE`.
Start reading the code and you'll get the hang of it. ;)
+14 -12
View File
@@ -1,16 +1,18 @@
FROM python:3.9-slim
MAINTAINER Soxoj <soxoj@protonmail.com>
FROM python:3.11-slim
LABEL maintainer="Soxoj <soxoj@protonmail.com>"
WORKDIR /app
RUN pip install --upgrade pip
RUN apt update && \
apt install -y \
gcc \
musl-dev \
libxml2 \
RUN pip install --no-cache-dir --upgrade pip
RUN apt-get update && \
apt-get install --no-install-recommends -y \
build-essential \
python3-dev \
pkg-config \
libcairo2-dev \
libxml2-dev \
libxslt-dev
RUN apt clean \
libxslt1-dev \
&& rm -rf /var/lib/apt/lists/* /tmp/*
ADD . .
RUN YARL_NO_EXTENSIONS=1 python3 -m pip install .
COPY . .
RUN YARL_NO_EXTENSIONS=1 python3 -m pip install --no-cache-dir .
# For production use, set FLASK_HOST to a specific IP address for security
ENV FLASK_HOST=0.0.0.0
ENTRYPOINT ["maigret"]
+118
View File
@@ -0,0 +1,118 @@
@echo off
goto check_Permissions
:check_Permissions
net session >nul 2>&1
if %errorLevel% == 0 (
echo Success: Elevated permissions granted.
) else (
echo Failure: Requires elevated permissions.
pause >nul
)
cls
echo --------------------------------------------------------
echo Python 3.8 or higher and pip3 required.
echo --------------------------------------------------------
echo Press [I] to begin installation.
echo Press [R] If already installed.
echo --------------------------------------------------------
choice /c IR
if %errorlevel%==1 goto check_python
if %errorlevel%==2 goto after
:check_python
cls
for /f "tokens=2 delims= " %%i in ('python --version 2^>nul') do (
for /f "tokens=1,2 delims=." %%j in ("%%i") do (
if %%j GEQ 3 (
if %%k GEQ 8 (
goto check_pip
)
)
)
)
echo Python 3.8 or higher is required. Please install it first.
pause
exit /b
:check_pip
pip --version 2>nul | findstr /r /c:"pip" >nul
if %errorlevel% neq 0 (
echo pip is required. Please install it first.
pause
exit /b
)
goto install1
:install1
cls
echo ========================================================
echo Maigret Installation
echo ========================================================
echo.
echo --------------------------------------------------------
echo If your pip installation is outdated, it could cause
echo cryptography to fail on installation.
echo --------------------------------------------------------
echo Check for and install pip 23.3.2 now?
echo --------------------------------------------------------
choice /c YN
if %errorlevel%==1 goto install2
if %errorlevel%==2 goto install3
:install2
cls
python -m pip install --upgrade pip==23.3.2
if %errorlevel% neq 0 (
echo Failed to update pip to version 23.3.2. Please check your installation.
pause
exit /b
)
goto install3
:install3
cls
echo ========================================================
echo Maigret Installation
echo ========================================================
echo.
echo --------------------------------------------------------
echo Installing Maigret...
python -m pip install maigret
if %errorlevel% neq 0 (
echo Failed to install Maigret. Please check your installation.
pause
exit /b
)
echo.
echo +------------------------------------------------------+
echo Maigret installed successfully.
echo +------------------------------------------------------+
pause
goto after
:after
cls
echo ========================================================
echo Maigret Usage
echo ========================================================
echo.
echo +--------------------------------------------------------+
echo To use Maigret, you can run the following command:
echo.
echo maigret [options] [username]
echo.
echo For example, to search for a username:
echo.
echo maigret example_username
echo.
echo For more options and usage details, refer to the Maigret documentation.
echo.
echo https://github.com/soxoj/maigret/blob/5b3b81b4822f6deb2e9c31eb95039907f25beb5e/README.md
echo +--------------------------------------------------------+
echo.
cmd
pause
exit /b
exit /b
+452
View File
@@ -0,0 +1,452 @@
# Site checks — guide (Maigret)
Working document for future changes: workflow, findings from reviews, and practical steps. See also [`site-checks-playbook.md`](site-checks-playbook.md) (short checklist), [`socid_extractor_improvements.log`](socid_extractor_improvements.log) (proposals for upstream identity extraction), and the code in [`maigret/checking.py`](../maigret/checking.py).
**Documentation maintenance:** whenever you improve Maigret, add search tooling, or change check logic, update **this file** and [`site-checks-playbook.md`](site-checks-playbook.md) in sync (see the section at the end). If you change rules about the JSON API check or the `socid_extractor` log format, update **[`socid_extractor_improvements.log`](socid_extractor_improvements.log)** (template / header) together with this guide.
---
## 1. How checks work
Logic lives in `process_site_result` ([`maigret/checking.py`](../maigret/checking.py)):
| `checkType` | Meaning |
|-------------|---------|
| `message` | Profile is “found” if the HTML contains **none** of the `absenceStrs` substrings **and** at least one `presenseStrs` marker matches. If `presenseStrs` is **empty**, presence is treated as true for **any** page (risky configuration). |
| `status_code` | HTTP **2xx** is enough — only safe if the server does **not** return 200 for “user not found”. |
| `response_url` | Custom flow with **redirects disabled** so the status/URL of the *first* response can be used. |
For other `checkType` values, [`make_site_result`](../maigret/checking.py) sets **`allow_redirects=True`**: the client follows redirects and `process_site_result` sees the **final** response body and status (not the pre-redirect hop). You do **not** need to “turn on” follow-redirect separately for most sites.
Sites with an `engine` field (e.g. XenForo) are merged with a template from the `engines` section in [`maigret/resources/data.json`](../maigret/resources/data.json) ([`MaigretSite.update_from_engine`](../maigret/sites.py)).
### `urlProbe`: probe URL vs reported profile URL
- **`url`** — pattern for the **public profile page** users should open (what appears in reports as `url_user`). Supports `{username}`, `{urlMain}`, `{urlSubpath}`; the username segment is URL-encoded when the string is built ([`make_site_result`](../maigret/checking.py)).
- **`urlProbe`** (optional) — if set, Maigret sends the HTTP **GET** (or HEAD where applicable) to **this** URL for the check, instead of to `url`. Same placeholders. Use it when the reliable signal is a **JSON/API** endpoint but the human-facing link must stay on the main site (e.g. `https://picsart.com/u/{username}` + probe `https://api.picsart.com/users/show/{username}.json`, or GitHubs `https://github.com/{username}` + `https://api.github.com/users/{username}`).
If `urlProbe` is omitted, the probe URL defaults to `url`.
### Redirects and final URL as a signal
If the **HTML shell** looks the same for “user exists” and “user does not exist” (typical SPA), it is still worth checking whether the **server** behaves differently:
- **Final URL** after redirects (e.g. profile canonical URL vs `/404` path).
- **Redirect chain** length or target host (e.g. lander vs profile).
If that differs reliably, you may be able to use **`checkType`: `response_url`** in [`data.json`](../maigret/resources/data.json) (no auto-follow) or extend logic — but only when the difference is stable.
**Server-side HTTP vs client-side navigation.** Maigret follows **HTTP** redirects only; it does **not** run JavaScript. If the browser shows a navigation to `/u/name/posts` or `/not-found` **after** the SPA bundle loads, that may never appear as an extra hop in `curl`/aiohttp — only a **trailing-slash** `301` might show up. Always confirm with `curl -sIL` / a small script whether the **Location** chain differs for real vs fake users before relying on URL-based rules.
**Empirical check (claimed vs non-existent usernames, `GET` with follow redirects, no JS):**
| Site | Result |
|------|--------|
| **Kaskus** | No HTTP redirects beyond the request path; same generic `<title>` and near-identical body length — **no** discriminating signal from redirects alone. |
| **Bibsonomy** | Both requests redirect to **`/pow-challenge/?return=/user/...`** (proof-of-work). Only the `return` path changes with the username; **both** existing and fake hit the same challenge flow — not a profile-vs-missing distinction. |
| **Picsart (web UI `https://picsart.com/u/{username}`)** | Only a **trailing-slash** `301`; the first HTML is the same empty app shell (~3 KiB) for real and fake users. Browser-only routes such as `…/posts` vs `…/not-found` are **not** visible as additional HTTP redirects in this pipeline. |
**Picsart — workable check via public API.** The site exposes **`https://api.picsart.com/users/show/{username}.json`**: JSON with `"status":"success"` and a user object when the account exists, and `"reason":"user_not_found"` when it does not. Put that URL in **`urlProbe`**, set **`url`** to the web profile pattern **`https://picsart.com/u/{username}`**, and use **`checkType`: `message`** with narrow `presenseStrs` / `absenceStrs` so reports show the human link while the request hits the API (see **`urlProbe`** above).
For **Kaskus** and **Bibsonomy**, HTTP-level comparison still does **not** unlock a safe check without PoW / richer signals; keep **`disabled: true`** until something stable appears (API, SSR markers, etc.).
---
## 2. Standard checks: public JSON API and `socid_extractor` log
### 2.1 Public JSON API (always)
When diagnosing a site—especially **SPAs**, **soft 404s**, or **near-identical HTML** for real vs fake users—**routinely look for a public JSON (or JSON-like) API** used for profile or user lookup. Typical leads: paths containing `/api/`, `/v1/`, `graphql`, `users/show`, `.json` suffixes, or the same endpoints mobile apps use. Verify with `curl` (or the Maigret request path) that **claimed** and **unclaimed** usernames produce **reliably different** bodies or status codes. If such an endpoint is more stable than HTML, put it in **`urlProbe`** and keep **`url`** as the canonical profile page on the main site (see **`urlProbe`** in section 1). If there is no separate public URL for humans, you may still point **`url`** at the API only (reports will show that URL).
This is a **standard** part of site-check work, not an optional extra.
### 2.2 Mandatory: [`LLM/socid_extractor_improvements.log`](socid_extractor_improvements.log)
If you discover **either**:
1. **JSON embedded in HTML** with user/profile fields (inline scripts, `__NEXT_DATA__`, `application/ld+json`, hydration blobs, etc.), or
2. A **standalone JSON HTTP response** (public API) with user/profile data for that service,
you **must append** a proposal block to **[`LLM/socid_extractor_improvements.log`](socid_extractor_improvements.log)**.
**Why:** Maigret calls [`socid_extractor.extract`](https://pypi.org/project/socid-extractor/) on the response body ([`extract_ids_data` in `checking.py`](../maigret/checking.py)) to fill `ids_data`. New payloads usually need a **new scheme** upstream (`flags`, `regex`, optional `extract_json`, `fields`, optional `url_mutations` / `transforms`), matching patterns such as **`GitHub API`** or **`Gitlab API`** in `socid_extractor`s `schemes.py`.
**Each log entry must include:**
- **Date** — ISO `YYYY-MM-DD` (day you add the entry).
- **Example username** — Prefer the sites `usernameClaimed` from `data.json`, or any account that reproduces the payload.
- **Proposal** — Use the **block template** in the log file: detection idea, optional URL mutation, and field mappings in the same style as existing schemes.
If the service is **already covered** by an existing `socid_extractor` scheme, add a **short** entry anyway (date, example username, scheme name, “already implemented”) so there is an audit trail.
Do **not** paste secrets, cookies, or full private JSON; short key names and structure hints are enough.
---
## 3. Improvement workflow
### Phase A — Reproduce
1. Targeted run:
```bash
maigret --db /path/to/maigret/resources/data.json \
TEST_USERNAME \
--site "SiteName" \
--print-not-found --print-errors \
--no-progressbar -vv
```
2. Run separately with a **real** existing username and a **definitely non-existent** one (as `usernameClaimed` / `usernameUnclaimed` in JSON).
3. If needed: `-vvv` and `debug.log` (raw response).
4. Automated pair check:
```bash
maigret --db ... --self-check --site "SiteName" --no-progressbar
```
### Phase B — Classify the cause
| Symptom | Likely cause |
|---------|----------------|
| False “found” with `status_code` | Soft 404 (200 on a “not found” page). |
| False “found” with `message` | Overly broad `presenseStrs` (`name`, `email`, JSON keys) or stale `absenceStrs`. |
| Same HTML for different users | SPA / skeleton shell before hydration — also compare **final URL / redirect chain** (see above); if still identical, often `disabled`. |
| Login page instead of profile | XenForo etc.: guest, `ignore403`, “must be logged in” strings. |
| reCAPTCHA / “Checking your browser” / “not a bot” | Bot protection; Maigrets default User-Agent may worsen the response. |
| Redirect to another domain / lander | Stale URL template. |
### Phase C — Edits in [`data.json`](../maigret/resources/data.json)
1. Update `url` / `urlMain` if needed (HTTPS, new profile path).
2. Replace inappropriate `status_code` with `message` (or `response_url`), choosing:
- **`absenceStrs`** — only what reliably appears on the “user does not exist” page;
- **`presenseStrs`** — narrow markers of a real profile (avoid generic words).
3. For XenForo: override only fields that differ in the site entry; do not break the global `engines` template.
4. Refresh `usernameClaimed` / `usernameUnclaimed` if reference accounts disappeared.
5. Set **`headers`** (e.g. another `User-Agent`) if the site serves a captcha only to “suspicious” clients.
6. Use **`errors`**: HTML substring → meaningful check error (UNKNOWN), so it is not confused with “available”.
### Phase D — Decision criteria
| Outcome | When to use |
|---------|-------------|
| **Check fixed** | The `claimed` / `unclaimed` pair behaves predictably, `--self-check` passes, no regression on a similar site with the same engine. |
| **Check disabled** (`disabled: true`) | Cloudflare / anti-bot / login required / indistinguishable SPA without stable markers. |
| **Entry removed** | **Only** if the domain/service is gone (NXDOMAIN, clearly dead project), not “because it is hard to fix”. |
### Phase E — Before commit
- `maigret --self-check` for affected sites.
- `make test`.
---
## 4. Findings from reviews (concrete site batch)
Summary from an earlier false-positive review for: OpenSea, Mercado Livre, Redtube, Toms Guide, Kaggle, Kaskus, Livemaster, TechPowerUp, authorSTREAM, Bibsonomy, Bulbagarden, iXBT, Serebii, Picsart, Hashnode, hi5.
### What most often broke checks
1. **`status_code` where content checks are needed** — soft 404 with status 200.
2. **Broad `presenseStrs`** — matches on error pages or generic SPA shells.
3. **XenForo + guest** — HTML includes strings like “You must be logged in” that overlap the engine template.
4. **User-Agent** — on some sites (e.g. Kaggle) the default UA triggered a reCAPTCHA page instead of profile HTML; a deliberate `User-Agent` in site `headers` helped.
5. **SPAs and redirects** — identical first HTML, redirect to lander / another product (hi5 → Tagged), URL format changes by region (Mercado Livre).
### What worked as a fix
- Switching to **`message`** with narrow strings from **`<title>`** or unique markup where stable (**Kaggle**, **Mercado Livre**, **Hashnode**).
- For **Kaggle**, additionally: **`headers`**, **`errors`** for browser-check text.
- **Redtube** stayed valid on **`status_code`** with a stable **404** for non-existent users.
- **Picsart**: the web profile URL is a thin SPA shell; use the **JSON API** (`api.picsart.com/users/show/{username}.json`) in **`url`** with **`message`**-style markers (`"status":"success"` vs `user_not_found`), not the browser-only `/posts` vs `/not-found` navigation.
- For **Weblate / Anubis Anti-Bot**: Setting `headers` with a basic script User-Agent (e.g. `python-requests/2.25.1`) rather than the default browser UA completely bypassed the Anubis Proof-of-Work challenge HTTP 307 redirect, instantly recovering the native HTTP 404 framework.
### What required disabling checks
Where you **cannot** reliably tell “profile exists” from “no profile” without bypassing protection, login, or full JS:
- Anti-bot / captcha / “not a bot” page;
- Guest-only access to the needed page;
- SPA with indistinguishable first response;
- Forums returning **403** and a login page instead of a member profile for the member-search URL;
- Stale URLs that redirect to a stub.
In those cases **`disabled: true`** is better than false “found”; remove the DB entry only on **actual** domain death.
### Code notes
- For the `status_code` branch in `process_site_result`, use **strict** comparison `check_type == "status_code"`, not a substring match inside `"status_code"`.
- Treat empty `presenseStrs` with `message` as risky: when debugging, watch DEBUG-level logs if that diagnostics exists in code.
---
## 5. Future ideas (Maigret improvements)
- A mode or script: one site, two usernames, print statuses and first N bytes of the response (wrapper around `maigret()`).
- Document in CLI help that **`--use-disabled-sites`** is needed to analyze disabled entries.
---
## 6. Development utilities
### 6.1 `utils/site_check.py` — Single site diagnostics
A comprehensive utility for testing individual sites with multiple modes:
```bash
# Basic comparison of claimed vs unclaimed (aiohttp)
python utils/site_check.py --site "VK" --check-claimed
# Test via Maigret's checker directly
python utils/site_check.py --site "VK" --maigret
# Compare aiohttp vs Maigret results (find discrepancies)
python utils/site_check.py --site "VK" --compare-methods
# Full diagnosis with recommendations
python utils/site_check.py --site "VK" --diagnose
# Test with custom URL
python utils/site_check.py --url "https://example.com/{username}" --compare user1 user2
# Find a valid username for a site
python utils/site_check.py --site "VK" --find-user
```
**Key features:**
- `--maigret` — Uses Maigret's actual checking code, not raw aiohttp
- `--compare-methods` — Shows if aiohttp and Maigret see different results (useful for debugging)
- `--diagnose` — Validates checkType against actual responses, suggests fixes
- Color output with markers detection (captcha, cloudflare, login, etc.)
- `--json` flag for machine-readable output
**When to use each mode:**
| Mode | Use case |
|------|----------|
| `--check-claimed` | Quick sanity check: do claimed/unclaimed still differ? |
| `--maigret` | Verify Maigret's actual behavior matches expectations |
| `--compare-methods` | Debug "works in curl but fails in Maigret" issues |
| `--diagnose` | Full analysis when a site is broken, get fix recommendations |
### 6.2 `utils/check_top_n.py` — Mass site checking
Batch-check top N sites by Alexa rank with categorized reporting:
```bash
# Check top 100 sites
python utils/check_top_n.py --top 100
# Faster with more parallelism
python utils/check_top_n.py --top 100 --parallel 10
# Output JSON report
python utils/check_top_n.py --top 100 --output report.json
# Only show broken sites
python utils/check_top_n.py --top 100 --only-broken
```
**Output categories:**
- `working` — Site check passes
- `broken` — Check fails (wrong status, missing markers)
- `timeout` — Request timed out
- `anti_bot` — 403/429 or captcha detected
- `error` — Connection or other errors
- `disabled` — Already disabled in data.json
**Report includes:**
- Summary counts by category
- List of broken sites with issues
- Recommendations for fixes (e.g., "Switch to checkType: status_code")
### 6.3 Self-check behavior (`--self-check`)
The self-check command has been improved to be less aggressive:
```bash
# Check sites WITHOUT auto-disabling (default)
maigret --self-check --site "VK"
# Auto-disable failing sites (old behavior)
maigret --self-check --site "VK" --auto-disable
# Show detailed diagnosis for each failure
maigret --self-check --site "VK" --diagnose
```
**Behavior changes:**
| Flag | Effect |
|------|--------|
| `--self-check` alone | Reports issues but does NOT disable sites |
| `--auto-disable` | Automatically disables sites that fail (opt-in) |
| `--diagnose` | Prints detailed diagnosis with recommendations |
**Why this matters:**
- Old behavior was too aggressive — sites got disabled without explanation
- New behavior reports issues and suggests fixes
- Explicit `--auto-disable` required to modify database
---
## 7. Lessons learned (practical observations)
Collected from hands-on work fixing top-ranked sites (Reddit, Wikipedia, Microsoft Learn, Baidu, etc.).
### 7.1 JSON API is the first thing to look for
Both Reddit and Microsoft Learn had working public APIs that solved the problem entirely. The web pages were SPAs or blocked by anti-bot measures, but the APIs worked reliably:
- **Reddit**: `https://api.reddit.com/user/{username}/about` — returns JSON with user data or `{"message": "Not Found", "error": 404}`.
- **Microsoft Learn**: `https://learn.microsoft.com/api/profiles/{username}` — returns JSON with `userName` field or HTTP 404.
This confirms the playbook recommendation: always check for `/api/`, `.json`, GraphQL endpoints before giving up on a site.
### 7.2 `urlProbe` is a powerful tool
It separates "what we check" (API) from "what we show the user" (human-readable profile URL). Reddit is a perfect example:
```json
{
"url": "https://www.reddit.com/user/{username}",
"urlProbe": "https://api.reddit.com/user/{username}/about",
"checkType": "message",
"presenseStrs": ["\"name\":"],
"absenceStrs": ["Not Found"]
}
```
The check hits the API, but reports display `www.reddit.com/user/blue`.
### 7.3 aiohttp ≠ curl ≠ requests
Wikipedia returned HTTP 200 for `curl` and Python `requests`, but HTTP 403 for `aiohttp`. This is **TLS fingerprinting** — the server identifies the HTTP library by cryptographic characteristics of the TLS handshake, not by headers.
**Key insight:** Changing `User-Agent` does **not** help against TLS fingerprinting. Always test with aiohttp directly (or via Maigret with `-vvv` and `debug.log`), not just `curl`.
```python
# This returns 403 for Wikipedia even with browser UA:
async with aiohttp.ClientSession() as session:
async with session.get(url, headers={"User-Agent": "Mozilla/5.0 ..."}) as resp:
print(resp.status) # 403
```
### 7.4 HTTP 403 in Maigret can mean different things
Initially it seemed Wikipedia was returning 403, but `curl` showed 200. Only `debug.log` revealed the real picture — aiohttp was getting blocked at TLS level.
**Lesson:** Use `-vvv` flag and inspect `debug.log` for raw response status and body. The warning message alone may be misleading.
### 7.5 Dead services migrate, not disappear
MSDN Social and TechNet profiles redirected to Microsoft Learn. Instead of deleting old entries:
1. Keep old entries with `disabled: true` as historical record.
2. Create a new entry for the current service with working API.
This preserves audit trail and avoids breaking existing workflows.
### 7.6 `status_code` is more reliable than `message` for APIs
Microsoft Learn API returns HTTP 404 for non-existent users — a clean signal without HTML parsing. For JSON APIs that return proper HTTP status codes, `status_code` is often the best choice:
```json
{
"checkType": "status_code",
"urlProbe": "https://learn.microsoft.com/api/profiles/{username}"
}
```
No need for fragile string matching when the API speaks HTTP correctly.
### 7.8 Engine templates can silently break across many sites
The **vBulletin** engine template has `absenceStrs` in five languages ("This user has not registered…", "Пользователь не зарегистрирован…", etc.). In a batch review of ~12 vBulletin forums (oneclickchicks, mirf, Pesiq, VKMOnline, forum.zone-game.info, etc.), **none** of the absence strings matched — the forums returned identical pages for both claimed and unclaimed usernames. Root cause: many of these forums require login to view member profiles, so they serve a generic page (no "user not registered" message at all) instead of an informative error.
**Lesson:** When a whole engine class shows false positives, do not patch sites one by one — check whether the **engine template** itself still matches the actual error pages. A template written for one version/language pack may silently stop working after a forum upgrade or config change.
### 7.9 Search-by-author URLs are architecturally unreliable
Several sites (OnanistovNet, Shoppingzone, Pogovorim, Astrogalaxy, Sexwin) used a phpBB-style `search.php?keywords=&terms=all&author={username}` URL as the check endpoint. This searches for **posts** by that author, not for the user account itself. Even if the markers worked, a user who exists but has zero posts would be indistinguishable from a non-existent user. And in practice, the sites changed their response format — some now return HTTP 404, others dropped the expected Russian absence text altogether.
**Lesson:** Avoid author-search URLs as the check endpoint; they test "has posts" rather than "account exists" and are doubly fragile (both logic mismatch and format drift).
### 7.10 Some sites generate a page for any path — permanent false positives
Two distinct patterns:
- **Pbase** creates a stub page titled "pbase Artist {username}" for **every** URL, real or fake. Both return HTTP 200 with nearly identical content (~3.3 KB). No markers can distinguish them.
- **ffm.bio** is even trickier: for the non-existent username `a.slomkoowski` it generated a page titled "mr.a" with description "a is a", apparently fuzzy-matching the path to the closest real entry. Both return HTTP 200 with large, content-rich pages.
**Lesson:** Before writing markers for a site, verify that the "unclaimed" URL actually produces an **error-like** response (different status, different title, unique error text). If the site always returns a plausible-looking page, no combination of `presenseStrs` / `absenceStrs` will help — `disabled: true` is the only safe option.
### 7.11 TLS fingerprinting can degrade over time (Kaggle)
Kaggle was previously fixed with a custom `User-Agent` header and `errors` for the "Checking your browser" captcha page. In the latest batch review, aiohttp receives HTTP 404 with identical content for **both** claimed and unclaimed usernames — the site now blocks the entire request before it reaches the profile page. This matches the TLS fingerprinting pattern seen earlier with Wikipedia (section 7.3), but here the degradation happened **after** a working fix was already in place.
**Lesson:** Sites that rely on bot-detection can tighten their rules at any time. A working `User-Agent` override today may fail tomorrow. When a previously fixed site starts returning identical responses for both usernames, suspect TLS fingerprinting first, and accept `disabled: true` if no public API is available.
### 7.12 API endpoints may bypass Cloudflare even when the main site is blocked
All four Fandom wikis returned HTTP 403 with a Cloudflare "Just a moment..." challenge when aiohttp accessed the user profile page (`/wiki/User:{username}`). However, the **MediaWiki API** on the same domain (`/api.php?action=query&list=users&ususers={username}&format=json`) returned clean JSON without any challenge. Similarly, **Substack** served a captcha-laden SPA for `/@{username}`, but its `public_profile` API (`/api/v1/user/{username}/public_profile`) responded with proper JSON and correct HTTP 404 for missing users.
This is likely because API routes are excluded from the Cloudflare WAF rules or use a different pipeline than the HTML-serving paths.
**Lesson:** When a site's main pages are blocked by Cloudflare or similar WAF, still check API endpoints on the **same domain** — they may not go through the same protection layer. This is especially true for:
- MediaWiki's `api.php` on wiki farms (Fandom, Wikia, self-hosted MediaWiki)
- REST API paths (`/api/v1/`, `/api/v2/`) on SPA-heavy sites
- Internal data endpoints that the SPA itself calls
### 7.13 GraphQL APIs often support GET, not just POST
**hashnode** exposes a GraphQL endpoint at `https://gql.hashnode.com`. While GraphQL is typically associated with POST requests, many implementations also support **GET** with the query passed as a URL parameter. This is critical for Maigret, which only supports GET/HEAD for `urlProbe`.
```
GET https://gql.hashnode.com?query=%7Buser(username%3A%20%22melwinalm%22)%20%7B%20name%20username%20%7D%7D
→ {"data":{"user":{"name":"Melwin D'Almeida","username":"melwinalm"}}}
GET https://gql.hashnode.com?query=%7Buser(username%3A%20%22a.slomkoowski%22)%20%7B%20name%20username%20%7D%7D
→ {"data":{"user":null}}
```
**Lesson:** Before giving up on a GraphQL-only site, try the same query via GET with `?query=...` (URL-encoded). Many GraphQL servers accept both methods.
### 7.14 URL-encoding resolves template placeholder conflicts
The hashnode GraphQL query `{user(username: "{username}") { name }}` contains curly braces that conflict with Maigret's `{username}` placeholder — Python's `str.format()` would raise a `KeyError` on `{user(username...}`.
The fix: URL-encode the GraphQL braces (`{` → `%7B`, `}` → `%7D`) but leave `{username}` as-is. Python's `.format()` only interprets literal `{…}` as placeholders, not `%7B…%7D`, and the GraphQL server decodes the percent-encoding on its end:
```
urlProbe: https://gql.hashnode.com?query=%7Buser(username%3A%20%22{username}%22)%20%7B%20name%20username%20%7D%7D
```
After `.format(username="melwinalm")`:
```
https://gql.hashnode.com?query=%7Buser(username%3A%20%22melwinalm%22)%20%7B%20name%20username%20%7D%7D
```
**Lesson:** When a `urlProbe` needs literal curly braces (GraphQL, JSON in URL, etc.), percent-encode them. This is a general technique for any `data.json` URL field processed by `.format()`.
### 7.7 The playbook classification works
The decision tree from the documentation accurately describes real-world cases:
| Situation | Playbook says | Actual result |
|-----------|---------------|---------------|
| Captcha (Baidu) | `disabled: true` | Correct |
| TLS fingerprinting (Wikipedia) | `disabled: true` (anti-bot) | Correct |
| Working API available (Reddit, MS Learn) | Use `urlProbe` | Correct |
| Service migrated (MSDN → MS Learn) | Update URL or create new entry | Correct |
---
## Documentation maintenance
For any of the changes below, **always** keep these artifacts in sync — this file ([`site-checks-guide.md`](site-checks-guide.md)), [`site-checks-playbook.md`](site-checks-playbook.md), and (when rules or templates change) the header/template in [`socid_extractor_improvements.log`](socid_extractor_improvements.log):
- Maigret code changes (including [`maigret/checking.py`](../maigret/checking.py), request executors, CLI);
- New or changed search tools / helper utilities for site checks;
- Changes to rules or semantics of `checkType`, `data.json` fields, self-check, etc.;
- Changes to the **public JSON API** diagnostic step or **mandatory** `socid_extractor` logging rules.
Prefer updating the guide, playbook, and log template in one commit or in the same task so instructions do not diverge. **Append-only:** new proposals go at the bottom of `socid_extractor_improvements.log`; do not delete historical entries when editing the template.
+87
View File
@@ -0,0 +1,87 @@
# Site checks — playbook (Maigret)
Short checklist for edits to [`maigret/resources/data.json`](../maigret/resources/data.json) and, when needed, [`maigret/checking.py`](../maigret/checking.py). Full guide: [`site-checks-guide.md`](site-checks-guide.md). Upstream extraction proposals: [`socid_extractor_improvements.log`](socid_extractor_improvements.log).
**Documentation maintenance:** whenever you improve Maigret, add search tooling, or change check logic, update **both** this file and [`site-checks-guide.md`](site-checks-guide.md) (see the “Documentation maintenance” section at the end of that file). When JSON API / `socid_extractor` logging rules change, update the **template header** in [`socid_extractor_improvements.log`](socid_extractor_improvements.log) in the same change.
## 0. Standard checks (do alongside reproduce / classify)
- **Public JSON API:** always look for a stable JSON (or GraphQL JSON) profile endpoint (`/api/`, `.json`, mobile-style URLs). When the API is more reliable than HTML, set **`urlProbe`** to that endpoint and keep **`url`** as the human-readable profile link (e.g. `https://picsart.com/u/{username}`). If there is no separate profile URL, use the API as `url` only. Details: **`urlProbe`** and section **2.1** in [`site-checks-guide.md`](site-checks-guide.md).
- **`socid_extractor` log (mandatory):** if you find **embedded user JSON in HTML** or a **standalone JSON profile API**, append a dated entry (with **example username**) to [`socid_extractor_improvements.log`](socid_extractor_improvements.log). Details: section **2.2** in [`site-checks-guide.md`](site-checks-guide.md).
## 1. Reproduce
- Run a targeted check:
`maigret USER --db /path/to/maigret/resources/data.json --site "SiteName" --print-not-found --print-errors --no-progressbar -vv`
- Compare an **existing** and a **non-existent** username (as `usernameClaimed` / `usernameUnclaimed` in JSON).
- With `-vvv`, inspect `debug.log` (raw response in the log).
## 2. Classify the cause
| Symptom | Typical cause | Action |
|--------|-----------------|--------|
| HTTP 200 for “user does not exist” | Soft 404 | Move from `status_code` to `message` or `response_url`; add `absenceStrs` / narrow `presenseStrs` |
| Generic words match (`name`, `email`) | `presenseStrs` too broad | Remove generic markers; add profile-specific ones |
| Same HTML without JS | SPA / skeleton shell | Compare **final URL and HTTP redirects** (Maigret already follows redirects by default). If the browser shows extra routes (`/posts`, `/not-found`) only **after JS**, they will **not** appear to Maigret — try a **public JSON/API** endpoint for the same site if one exists. See **Redirects and final URL** and **Picsart** in [`site-checks-guide.md`](site-checks-guide.md). |
| 403 / “Log in” / guest-only | Auth or anti-bot required | `disabled: true` |
| reCAPTCHA / “Checking your browser” | Bot protection | Try a reasonable `User-Agent` in `headers`; else `errors` + UNKNOWN or `disabled` |
| Domain does not resolve / persistent timeout | Dead service | Remove entry **only** after confirming the domain is dead |
## 3. Data edits
1. Update `url` / `urlMain` if needed (HTTPS redirects). Use optional **`urlProbe`** when the HTTP check should hit a different URL than the profile link shown in reports (API vs web UI).
2. For `message`: **always** tune string pairs so `absenceStrs` fire on “no user” pages and `presenseStrs` fire on real profiles without false absence hits.
3. Engine (`engine`, e.g. XenForo): override only differing fields in the site entry so other sites are not broken.
4. Keep `status_code` only if the response **reliably** differs by status code without soft 404.
## 4. Verify
- `maigret --self-check --site "SiteName" --db ...` for touched entries.
- `make test` before commit.
## 5. Code notes
- `process_site_result` uses strict comparison to `"status_code"` for `checkType` (not a substring trick).
- Empty `presenseStrs` with `message` means “presence always true”; a debug line is logged only at DEBUG level.
## 6. Development utilities
Quick reference for site check utilities. Full details: section **6** in [`site-checks-guide.md`](site-checks-guide.md).
| Command | Purpose |
|---------|---------|
| `python utils/site_check.py --site "X" --check-claimed` | Quick aiohttp comparison |
| `python utils/site_check.py --site "X" --maigret` | Test via Maigret checker |
| `python utils/site_check.py --site "X" --compare-methods` | Find aiohttp vs Maigret discrepancies |
| `python utils/site_check.py --site "X" --diagnose` | Full diagnosis with fix recommendations |
| `python utils/check_top_n.py --top 100` | Mass-check top 100 sites |
| `maigret --self-check --site "X"` | Self-check (reports only, no auto-disable) |
| `maigret --self-check --site "X" --auto-disable` | Self-check with auto-disable |
| `maigret --self-check --site "X" --diagnose` | Self-check with detailed diagnosis |
## 7. Quick tips (lessons learned)
Practical observations from fixing top-ranked sites. Full details: section **7** in [`site-checks-guide.md`](site-checks-guide.md).
| Tip | Why it matters |
|-----|----------------|
| **API first** | Reddit, Microsoft Learn — APIs worked when web pages were blocked. Always check `/api/`, `.json` endpoints. |
| **`urlProbe` separates check from display** | Check via API, show human URL in reports. Example: Reddit API → `www.reddit.com/user/` link. |
| **aiohttp ≠ curl** | Wikipedia returned 200 for curl, 403 for aiohttp (TLS fingerprinting). Always test with Maigret directly. |
| **Use `debug.log`** | Run with `-vvv` to see raw response. Warning messages alone can be misleading. |
| **`status_code` for clean APIs** | If API returns proper 404 for missing users, prefer `status_code` over `message`. |
| **Migrate, don't delete** | MSDN → Microsoft Learn: keep old entry disabled, create new one for current service. |
| **Engine templates break silently** | vBulletin `absenceStrs` failed on ~12 forums at once — many require login, showing a generic page with no error text. Check the engine template first. |
| **Search-by-author is unreliable** | phpBB `search.php?author=` checks for posts, not accounts. A user with zero posts looks identical to a non-existent user. Avoid these URLs. |
| **Some sites always generate a page** | Pbase stubs "pbase Artist {name}" for any path; ffm.bio fuzzy-matches to the nearest real entry. No markers can help — `disabled: true`. |
| **TLS fingerprinting degrades over time** | Kaggle's custom `User-Agent` fix stopped working — aiohttp now gets 404 for both usernames. Accept `disabled: true` when no API exists. |
| **API endpoints bypass Cloudflare** | Fandom `api.php` and Substack `/api/v1/` returned clean JSON while main pages were blocked by Cloudflare. Always try API paths on the same domain. |
| **Inspect Network tab for POST APIs** | Many modern platforms (e.g., Discord) heavily protect HTML profiles but expose unauthenticated `POST` endpoints for username checks. Maigret supports this natively: define `"request_method": "POST"` and `"request_payload": {"username": "{username}"}` in `data.json` to query them! |
| **Strict JSON markers are bulletproof** | When probing APIs, use `checkType: "message"` with exact JSON substrings (like `"{\"taken\": false}"`). Unlike HTML layout checks, this approach is immune to UI redesigns, A/B testing, and language translations. |
| **GraphQL supports GET too** | hashnode GraphQL works via `GET ?query=...` (URL-encoded). You can use either native POST payloads or GET `urlProbe` for GraphQL. |
| **URL-encode braces for template safety** | GraphQL `{...}` conflicts with Maigret's `{username}`. Use `%7B`/`%7D` for literal braces in `urlProbe``.format()` ignores percent-encoded chars. |
| **Anti-bot bypass via simple UA** | "Anubis" anti-bot PoW screens (like on Weblate) intercept modern browser UAs via HTTP 307. Hardcoding `"headers": {"User-Agent": "python-requests/2.25.1"}` circumvents the scraper filter and restores default detection logic. |
## 8. Documentation maintenance
When you change Maigret, add search tools, or change check logic, keep **this playbook**, [`site-checks-guide.md`](site-checks-guide.md), and (when applicable) the template in [`socid_extractor_improvements.log`](socid_extractor_improvements.log) aligned. New log **entries** are append-only at the bottom of that file.
+5 -5
View File
@@ -1,7 +1,7 @@
LINT_FILES=maigret wizard.py tests
test:
coverage run --source=./maigret -m pytest tests
coverage run --source=./maigret,./maigret/web -m pytest tests
coverage report -m
coverage html
@@ -10,16 +10,16 @@ rerun-tests:
lint:
@echo 'syntax errors or undefined names'
flake8 --count --select=E9,F63,F7,F82 --show-source --statistics ${LINT_FILES} maigret.py
flake8 --count --select=E9,F63,F7,F82 --show-source --statistics ${LINT_FILES}
@echo 'warning'
flake8 --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics --ignore=E731,W503,E501 ${LINT_FILES} maigret.py
flake8 --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics --ignore=E731,W503,E501 ${LINT_FILES}
@echo 'mypy'
mypy ${LINT_FILES}
mypy --check-untyped-defs ${LINT_FILES}
speed:
time python3 ./maigret.py --version
time python3 -m maigret --version
python3 -c "import timeit; t = timeit.Timer('import maigret'); print(t.timeit(number = 1000000))"
python3 -X importtime -c "import maigret" 2> maigret-import.log
python3 -m tuna maigret-import.log
+112 -30
View File
@@ -3,64 +3,85 @@
<p align="center">
<p align="center">
<a href="https://pypi.org/project/maigret/">
<img alt="PyPI" src="https://img.shields.io/pypi/v/maigret?style=flat-square">
<img alt="PyPI version badge for Maigret" src="https://img.shields.io/pypi/v/maigret?style=flat-square" />
</a>
<a href="https://pypi.org/project/maigret/">
<img alt="PyPI - Downloads" src="https://img.shields.io/pypi/dw/maigret?style=flat-square">
<a href="https://pypi.org/project/maigret/">
<img alt="PyPI download count for Maigret" src="https://img.shields.io/pypi/dw/maigret?style=flat-square" />
</a>
<a href="https://pypi.org/project/maigret/">
<img alt="Views" src="https://komarev.com/ghpvc/?username=maigret&color=brightgreen&label=views&style=flat-square">
<a href="https://github.com/soxoj/maigret">
<img alt="Minimum Python version required: 3.10+" src="https://img.shields.io/badge/Python-3.10%2B-brightgreen?style=flat-square" />
</a>
<a href="https://github.com/soxoj/maigret/blob/main/LICENSE">
<img alt="License badge for Maigret" src="https://img.shields.io/github/license/soxoj/maigret?style=flat-square" />
</a>
<a href="https://github.com/soxoj/maigret">
<img alt="View count for Maigret project" src="https://komarev.com/ghpvc/?username=maigret&color=brightgreen&label=views&style=flat-square" />
</a>
</p>
<p align="center">
<img src="https://raw.githubusercontent.com/soxoj/maigret/main/static/maigret.png" height="200"/>
<img src="https://raw.githubusercontent.com/soxoj/maigret/main/static/maigret.png" height="300"/>
</p>
</p>
<i>The Commissioner Jules Maigret is a fictional French police detective, created by Georges Simenon. His investigation method is based on understanding the personality of different people and their interactions.</i>
<b>👉👉👉 [Online Telegram bot](https://t.me/maigret_search_bot)</b>
## About
**Maigret** collect a dossier on a person **by username only**, checking for accounts on a huge number of sites and gathering all the available information from web pages. No API keys required. Maigret is an easy-to-use and powerful fork of [Sherlock](https://github.com/sherlock-project/sherlock).
**Maigret** collects a dossier on a person **by username only**, checking for accounts on a huge number of sites and gathering all the available information from web pages. No API keys are required. Maigret is an easy-to-use and powerful fork of [Sherlock](https://github.com/sherlock-project/sherlock).
Currently supported more than 2500 sites ([full list](https://github.com/soxoj/maigret/blob/main/sites.md)), search is launched against 500 popular sites in descending order of popularity by default. Also supported checking of Tor sites, I2P sites, and domains (via DNS resolving).
Currently supports more than 3000 sites ([full list](https://github.com/soxoj/maigret/blob/main/sites.md)), search is launched against 500 popular sites in descending order of popularity by default. Also supported checking Tor sites, I2P sites, and domains (via DNS resolving).
## Powered By Maigret
These are professional tools for social media content analysis and OSINT investigations that use Maigret (banners are clickable).
<a href="https://github.com/SocialLinks-IO/sociallinks-api"><img height="60" alt="Social Links API" src="https://github.com/user-attachments/assets/789747b2-d7a0-4d4e-8868-ffc4427df660"></a>
<a href="https://sociallinks.io/products/sl-crimewall"><img height="60" alt="Social Links Crimewall" src="https://github.com/user-attachments/assets/0b18f06c-2f38-477b-b946-1be1a632a9d1"></a>
<a href="https://usersearch.ai/"><img height="60" alt="UserSearch" src="https://github.com/user-attachments/assets/66daa213-cf7d-40cf-9267-42f97cf77580"></a>
## Main features
* Profile pages parsing, [extraction](https://github.com/soxoj/socid_extractor) of personal info, links to other profiles, etc.
* Recursive search by new usernames and other ids found
* Profile page parsing, [extraction](https://github.com/soxoj/socid_extractor) of personal info, links to other profiles, etc.
* Recursive search by new usernames and other IDs found
* Search by tags (site categories, countries)
* Censorship and captcha detection
* Requests retries
See full description of Maigret features [in the documentation](https://maigret.readthedocs.io/en/latest/features.html).
See the full description of Maigret features [in the documentation](https://maigret.readthedocs.io/en/latest/features.html).
## Installation
Maigret can be installed using pip, Docker, or simply can be launched from the cloned repo.
‼️ Maigret is available online via [official Telegram bot](https://t.me/maigret_search_bot). Consider using it if you don't want to install anything.
### Windows
Standalone EXE-binaries for Windows are located in [Releases section](https://github.com/soxoj/maigret/releases) of GitHub repository.
Also you can run Maigret using cloud shells and Jupyter notebooks (see buttons below).
Video guide on how to run it: https://youtu.be/qIgwTZOmMmM.
### Installation in Cloud Shells
You can launch Maigret using cloud shells and Jupyter notebooks. Press one of the buttons below and follow the instructions to launch it in your browser.
[![Open in Cloud Shell](https://user-images.githubusercontent.com/27065646/92304704-8d146d80-ef80-11ea-8c29-0deaabb1c702.png)](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/soxoj/maigret&tutorial=README.md)
<a href="https://repl.it/github/soxoj/maigret"><img src="https://user-images.githubusercontent.com/27065646/92304596-bf719b00-ef7f-11ea-987f-2c1f3c323088.png" alt="Run on Repl.it" height="50"></a>
<a href="https://repl.it/github/soxoj/maigret"><img src="https://replit.com/badge/github/soxoj/maigret" alt="Run on Replit" height="50"></a>
<a href="https://colab.research.google.com/gist/soxoj/879b51bc3b2f8b695abb054090645000/maigret-collab.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" height="45"></a>
<a href="https://mybinder.org/v2/gist/soxoj/9d65c2f4d3bec5dd25949197ea73cf3a/HEAD"><img src="https://mybinder.org/badge_logo.svg" alt="Open In Binder" height="45"></a>
### Package installing
### Local installation
**NOTE**: Python 3.6 or higher and pip is required, **Python 3.8 is recommended.**
Maigret can be installed using pip, Docker, or simply can be launched from the cloned repo.
**NOTE**: Python 3.10 or higher and pip is required, **Python 3.11 is recommended.**
```bash
# install from pypi
pip3 install maigret
# or clone and install manually
git clone https://github.com/soxoj/maigret && cd maigret
pip3 install .
# usage
maigret username
```
@@ -68,11 +89,14 @@ maigret username
### Cloning a repository
```bash
# or clone and install manually
git clone https://github.com/soxoj/maigret && cd maigret
pip3 install -r requirements.txt
# build and install
pip3 install .
# usage
./maigret.py username
maigret username
```
### Docker
@@ -82,7 +106,7 @@ pip3 install -r requirements.txt
docker pull soxoj/maigret
# usage
docker run soxoj/maigret:latest username
docker run -v /mydir:/app/reports soxoj/maigret:latest username --html
# manual build
docker build -t maigret .
@@ -91,32 +115,90 @@ docker build -t maigret .
## Usage examples
```bash
# make HTML and PDF reports
maigret user --html --pdf
# make HTML, PDF, and Xmind8 reports
maigret user --html
maigret user --pdf
maigret user --xmind #Output not compatible with xmind 2022+
# search on sites marked with tags photo & dating
maigret user --tags photo,dating
# search on sites marked with tag us
maigret user --tags us
# search for three usernames on all available sites
maigret user1 user2 user3 -a
```
Use `maigret --help` to get full options description. Also options are documented in [the Maigret Wiki](https://github.com/soxoj/maigret/wiki/Command-line-options).
Use `maigret --help` to get full options description. Also options [are documented](https://maigret.readthedocs.io/en/latest/command-line-options.html).
### Web interface
You can run Maigret with a web interface, where you can view the graph with results and download reports of all formats on a single page.
<details>
<summary>Web Interface Screenshots</summary>
![Web interface: how to start](https://raw.githubusercontent.com/soxoj/maigret/main/static/web_interface_screenshot_start.png)
![Web interface: results](https://raw.githubusercontent.com/soxoj/maigret/main/static/web_interface_screenshot.png)
</details>
Instructions:
1. Run Maigret with the ``--web`` flag and specify the port number.
```console
maigret --web 5000
```
2. Open http://127.0.0.1:5000 in your browser and enter one or more usernames to make a search.
3. Wait a bit for the search to complete and view the graph with results, the table with all accounts found, and download reports of all formats.
## Contributing
Maigret has open-source code, so you may contribute your own sites by adding them to `data.json` file, or bring changes to it's code!
For more information about development and contribution, please read the [development documentation](https://maigret.readthedocs.io/en/latest/development.html).
## Demo with page parsing and recursive username search
[PDF report](https://raw.githubusercontent.com/soxoj/maigret/main/static/report_alexaimephotographycars.pdf), [HTML report](https://htmlpreview.github.io/?https://raw.githubusercontent.com/soxoj/maigret/main/static/report_alexaimephotographycars.html)
### Video (asciinema)
![animation of recursive search](https://raw.githubusercontent.com/soxoj/maigret/main/static/recursive_search.svg)
<a href="https://asciinema.org/a/Ao0y7N0TTxpS0pisoprQJdylZ">
<img src="https://asciinema.org/a/Ao0y7N0TTxpS0pisoprQJdylZ.svg" alt="asciicast" width="600">
</a>
### Reports
[PDF report](https://raw.githubusercontent.com/soxoj/maigret/main/static/report_alexaimephotographycars.pdf), [HTML report](https://htmlpreview.github.io/?https://raw.githubusercontent.com/soxoj/maigret/main/static/report_alexaimephotographycars.html)
![HTML report screenshot](https://raw.githubusercontent.com/soxoj/maigret/main/static/report_alexaimephotography_html_screenshot.png)
![XMind report screenshot](https://raw.githubusercontent.com/soxoj/maigret/main/static/report_alexaimephotography_xmind_screenshot.png)
![XMind 8 report screenshot](https://raw.githubusercontent.com/soxoj/maigret/main/static/report_alexaimephotography_xmind_screenshot.png)
[Full console output](https://raw.githubusercontent.com/soxoj/maigret/main/static/recursive_search.md)
## Disclaimer
**This tool is intended for educational and lawful purposes only.** The developers do not endorse or encourage any illegal activities or misuse of this tool. Regulations regarding the collection and use of personal data vary by country and region, including but not limited to GDPR in the EU, CCPA in the USA, and similar laws worldwide.
It is your sole responsibility to ensure that your use of this tool complies with all applicable laws and regulations in your jurisdiction. Any illegal use of this tool is strictly prohibited, and you are fully accountable for your actions.
The authors and developers of this tool bear no responsibility for any misuse or unlawful activities conducted by its users.
## Feedback
If you have any questions, suggestions, or feedback, please feel free to [open an issue](https://github.com/soxoj/maigret/issues), create a [GitHub discussion](https://github.com/soxoj/maigret/discussions), or contact the author directly via [Telegram](https://t.me/soxoj).
## SOWEL classification
This tool uses the following OSINT techniques:
- [SOTL-2.2. Search For Accounts On Other Platforms](https://sowel.soxoj.com/other-platform-accounts)
- [SOTL-6.1. Check Logins Reuse To Find Another Account](https://sowel.soxoj.com/logins-reuse)
- [SOTL-6.2. Check Nicknames Reuse To Find Another Account](https://sowel.soxoj.com/nicknames-reuse)
## License
MIT © [Maigret](https://github.com/soxoj/maigret)<br/>
-18
View File
@@ -1,18 +0,0 @@
#!/usr/bin/env python3
import asyncio
import sys
from maigret.maigret import main
def run():
try:
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
except KeyboardInterrupt:
print('Maigret is interrupted.')
sys.exit(1)
if __name__ == "__main__":
run()
+1 -1
View File
@@ -10,4 +10,4 @@
pixabay.com FALSE / FALSE 0 anonymous_user_id c1e4ee09-5674-4252-aa94-8c47b1ea80ab
pixabay.com FALSE / FALSE 1647214439 csrftoken vfetTSvIul7gBlURt6s985JNM18GCdEwN5MWMKqX4yI73xoPgEj42dbNefjGx5fr
pixabay.com FALSE / FALSE 1647300839 client_width 1680
pixabay.com FALSE / FALSE 748111764839 is_human 1
pixabay.com FALSE / FALSE 748111764839 is_human 1
+1
View File
@@ -1 +1,2 @@
sphinx-copybutton
sphinx_rtd_theme
+23 -5
View File
@@ -18,7 +18,7 @@ Parsing of account pages and online documents
Maigret will try to extract information about the document/account owner
(including username and other ids) and will make a search by the
extracted username and ids. :doc:`Examples <extracting-information-from-pages>`.
extracted username and ids. See examples in the :ref:`extracting-information-from-pages` section.
Main options
------------
@@ -27,18 +27,36 @@ Options are also configurable through settings files, see
:doc:`settings section <settings>`.
``--tags`` - Filter sites for searching by tags: sites categories and
two-letter country codes. E.g. photo, dating, sport; jp, us, global.
Multiple tags can be associated with one site. **Warning: tags markup is
not stable now.**
two-letter country codes (**not a language!**). E.g. photo, dating, sport; jp, us, global.
Multiple tags can be associated with one site. **Warning**: tags markup is
not stable now. Read more :doc:`in the separate section <tags>`.
``--exclude-tags`` - Exclude sites with specific tags from the search
(blacklist). E.g. ``--exclude-tags porn,dating`` will skip all sites
tagged with ``porn`` or ``dating``. Can be combined with ``--tags`` to
include certain categories while excluding others. Read more
:doc:`in the separate section <tags>`.
``-n``, ``--max-connections`` - Allowed number of concurrent connections
**(default: 100)**.
``-a``, ``--all-sites`` - Use all sites for scan **(default: top 500)**.
``--top-sites`` - Count of sites for scan ranked by Alexa Top
``--top-sites`` - Count of sites for scan ranked by Majestic Million
**(default: top 500)**.
**Mirrors:** After the top *N* sites by Majestic Million rank are chosen (respecting
``--tags``, ``--use-disabled-sites``, etc.), Maigret may add extra sites
whose database field ``source`` names a **parent platform** that itself falls
in the Majestic Million top *N* when ranking **including disabled** sites. For example,
if ``Twitter`` ranks in the first 500 by Majestic Million, a mirror such as ``memory.lol``
(with ``source: Twitter``) is included even though it has no rank and would
otherwise be cut off. The same applies to Instagram-related mirrors (e.g.
Picuki) when ``Instagram`` is in that parent top *N* by rank—even if the
official ``Instagram`` entry is disabled and not scanned by default, its
mirrors can still be pulled in. The final list is the ranked top *N* plus
these mirrors (no fixed upper bound on mirror count).
``--timeout`` - Time (in seconds) to wait for responses from sites
**(default: 30)**. A longer timeout will be more likely to get results
from slow sites. On the other hand, this may cause a long delay to
+3 -3
View File
@@ -3,11 +3,11 @@
# -- Project information
project = 'Maigret'
copyright = '2021, soxoj'
copyright = '2025, soxoj'
author = 'soxoj'
release = '0.4.1'
version = '0.4.1'
release = '0.5.0'
version = '0.5'
# -- General configuration
+298
View File
@@ -0,0 +1,298 @@
.. _development:
Development
==============
Frequently Asked Questions
--------------------------
1. Where to find the list of supported sites?
The human-readable list of supported sites is available in the `sites.md <https://github.com/soxoj/maigret/blob/main/sites.md>`_ file in the repository.
It's been generated automatically from the main JSON file with the list of supported sites.
The machine-readable JSON file with the list of supported sites is available in the
`data.json <https://github.com/soxoj/maigret/blob/main/maigret/resources/data.json>`_ file in the directory `resources`.
2. Which methods to check the account presence are supported?
The supported methods (``checkType`` values in ``data.json``) are:
- ``message`` - the most reliable method, checks if any string from ``presenceStrs`` is present and none of the strings from ``absenceStrs`` are present in the HTML response
- ``status_code`` - checks that status code of the response is 2XX
- ``response_url`` - check if there is not redirect and the response is 2XX
.. note::
Maigret natively treats specific anti-bot HTTP status codes (like LinkedIn's ``HTTP 999``) as a standard "Not Found/Available" signal instead of throwing an infrastructure Server Error, gracefully preventing false positives.
See the details of check mechanisms in the `checking.py <https://github.com/soxoj/maigret/blob/main/maigret/checking.py#L339>`_ file.
.. note::
Maigret now uses the **Majestic Million** dataset for site popularity sorting instead of the discontinued Alexa Rank API. For backward compatibility with existing configurations and parsers, the ranking field in `data.json` and internal site models remains named ``alexaRank`` and ``alexa_rank``.
**Mirrors and ``--top-sites``:** When you limit scans with ``--top-sites N``, Maigret also includes *mirror* sites (entries whose ``source`` field points at a parent platform such as Twitter or Instagram) if that parent would appear in the Majestic Million top *N* when disabled sites are considered for ranking. See the **Mirrors** paragraph under ``--top-sites`` in :doc:`command-line-options`.
Testing
-------
It is recommended use Python 3.10 for testing.
Install test requirements:
.. code-block:: console
poetry install --with dev
Use the following commands to check Maigret:
.. code-block:: console
# run linter and typing checks
# order of checks:
# - critical syntax errors or undefined names
# - flake checks
# - mypy checks
make lint
# run black formatter
make format
# run testing with coverage html report
# current test coverage is 58%
make test
# open html report
open htmlcov/index.html
# get flamechart of imports to estimate startup time
make speed
How to fix false-positives
-----------------------------------------------
If you want to work with sites database, don't forget to activate statistics update git hook, command for it would look like this: ``git config --local core.hooksPath .githooks/``.
You should make your git commits from your maigret git repo folder, or else the hook wouldn't find the statistics update script.
1. Determine the problematic site.
If you already know which site has a false-positive and want to fix it specifically, go to the next step.
Otherwise, simply run a search with a random username (e.g. `laiuhi3h4gi3u4hgt`) and check the results.
Alternatively, you can use `the Telegram bot <https://t.me/osint_maigret_bot>`_.
2. Open the account link in your browser and check:
- If the site is completely gone, remove it from the list
- If the site still works but looks different, update in data.json how we check it
- If the site requires login to view profiles, disable checking it
3. Find the site in the `data.json <https://github.com/soxoj/maigret/blob/main/maigret/resources/data.json>`_ file.
If the ``checkType`` method is not ``message`` and you are going to fix check, update it:
- put ``message`` in ``checkType``
- put in ``absenceStrs`` a keyword that is present in the HTML response for an non-existing account
- put in ``presenceStrs`` a keyword that is present in the HTML response for an existing account
If you have trouble determining the right keywords, you can use automatic detection by passing the account URL with the ``--submit`` option:
.. code-block:: console
maigret --submit https://my.mail.ru/bk/alex
To disable checking, set ``disabled`` to ``true`` or simply run:
.. code-block:: console
maigret --self-check --site My.Mail.ru@bk.ru
To debug the check method using the response HTML, you can run:
.. code-block:: console
maigret soxoj --site My.Mail.ru@bk.ru -d 2> response.txt
There are few options for sites data.json helpful in various cases:
- ``engine`` - a predefined check for the sites of certain type (e.g. forums), see the ``engines`` section in the JSON file
- ``headers`` - a dictionary of additional headers to be sent to the site
- ``requestHeadOnly`` - set to ``true`` if it's enough to make a HEAD request to the site
- ``regexCheck`` - a regex to check if the username is valid, in case of frequent false-positives
- ``requestMethod`` - set the HTTP method to use (e.g., ``POST``). By default, Maigret natively defaults to GET or HEAD.
- ``requestPayload`` - a dictionary with the JSON payload to send for POST requests (e.g., ``{"username": "{username}"}``), extremely useful for parsing GraphQL or modern JSON APIs.
``urlProbe`` (optional profile probe URL)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
By default Maigret performs the HTTP request to the same URL as ``url`` (the public profile link pattern).
If you set ``urlProbe`` in ``data.json``, Maigret **fetches** that URL for the presence check (API, GraphQL, JSON endpoint, etc.), while **reports and ``url_user``** still use ``url`` — the human-readable profile page users should open.
Placeholders: ``{username}``, ``{urlMain}``, ``{urlSubpath}`` (same as for ``url``). Example: GitHub uses ``url`` ``https://github.com/{username}`` and ``urlProbe`` ``https://api.github.com/users/{username}``; Picsart uses the web profile ``https://picsart.com/u/{username}`` and probes ``https://api.picsart.com/users/show/{username}.json``.
Implementation: ``make_site_result`` in `checking.py <https://github.com/soxoj/maigret/blob/main/maigret/checking.py>`_.
Site check fixes using LLM
--------------------------
.. note::
The ``LLM/`` directory at the root of the repository contains detailed instructions for editing site checks (in Markdown format): checklist, full guide to ``checkType`` / ``data.json`` / ``urlProbe``, handling false positives, searching for public JSON APIs, and the proposal log for ``socid_extractor``.
Main files:
- `site-checks-playbook.md <https://github.com/soxoj/maigret/blob/main/LLM/site-checks-playbook.md>`_ — short checklist
- `site-checks-guide.md <https://github.com/soxoj/maigret/blob/main/LLM/site-checks-guide.md>`_ — detailed guide
- `socid_extractor_improvements.log <https://github.com/soxoj/maigret/blob/main/LLM/socid_extractor_improvements.log>`_ — template and entries for identity extractor improvements
These files should be kept up-to-date whenever changes are made to the check logic in the code or in ``data.json``.
.. _activation-mechanism:
Activation mechanism
--------------------
The activation mechanism helps make requests to sites requiring additional authentication like cookies, JWT tokens, or custom headers.
Let's study the Vimeo site check record from the Maigret database:
.. code-block:: json
"Vimeo": {
"tags": [
"us",
"video"
],
"headers": {
"Authorization": "jwt eyJ0..."
},
"activation": {
"url": "https://vimeo.com/_rv/viewer",
"marks": [
"Something strange occurred. Please get in touch with the app's creator."
],
"method": "vimeo"
},
"urlProbe": "https://api.vimeo.com/users/{username}?fields=name...",
"checkType": "status_code",
"alexaRank": 148,
"urlMain": "https://vimeo.com/",
"url": "https://vimeo.com/{username}",
"usernameClaimed": "blue",
"usernameUnclaimed": "noonewouldeverusethis7"
},
The activation method is:
.. code-block:: python
def vimeo(site, logger, cookies={}):
headers = dict(site.headers)
if "Authorization" in headers:
del headers["Authorization"]
import requests
r = requests.get(site.activation["url"], headers=headers)
jwt_token = r.json()["jwt"]
site.headers["Authorization"] = "jwt " + jwt_token
Here's how the activation process works when a JWT token becomes invalid:
1. The site check makes an HTTP request to ``urlProbe`` with the invalid token
2. The response contains an error message specified in the ``activation``/``marks`` field
3. When this error is detected, the ``vimeo`` activation function is triggered
4. The activation function obtains a new JWT token and updates it in the site check record
5. On the next site check (either through retry or a new Maigret run), the valid token is used and the check succeeds
Examples of activation mechanism implementation are available in `activation.py <https://github.com/soxoj/maigret/blob/main/maigret/activation.py>`_ file.
How to publish new version of Maigret
-------------------------------------
**Collaborats rights are requires, write Soxoj to get them**.
For new version publishing you must create a new branch in repository
with a bumped version number and actual changelog first. After it you
must create a release, and GitHub action automatically create a new
PyPi package.
- New branch example: https://github.com/soxoj/maigret/commit/e520418f6a25d7edacde2d73b41a8ae7c80ddf39
- Release example: https://github.com/soxoj/maigret/releases/tag/v0.4.1
1. Make a new branch locally with a new version name. Check the current version number here: https://pypi.org/project/maigret/.
**Increase only patch version (third number)** if there are no breaking changes.
.. code-block:: console
git checkout -b 0.4.0
2. Update Maigret version in three files manually:
- pyproject.toml
- maigret/__version__.py
- docs/source/conf.py
- snapcraft.yaml
3. Create a new empty text section in the beginning of the file `CHANGELOG.md` with a current date:
.. code-block:: console
## [0.4.0] - 2022-01-03
4. Get auto-generate release notes:
- Open https://github.com/soxoj/maigret/releases/new
- Click `Choose a tag`, enter `v0.4.0` (your version)
- Click `Create new tag`
- Press `+ Auto-generate release notes`
- Copy all the text from description text field below
- Paste it to empty text section in `CHANGELOG.txt`
- Remove redundant lines `## What's Changed` and `## New Contributors` section if it exists
- *Close the new release page*
5. Commit all the changes, push, make pull request
.. code-block:: console
git add -p
git commit -m 'Bump to YOUR VERSION'
git push origin head
6. Merge pull request
7. Create new release
- Open https://github.com/soxoj/maigret/releases/new again
- Click `Choose a tag`
- Enter actual version in format `v0.4.0`
- Also enter actual version in the field `Release title`
- Click `Create new tag`
- Press `+ Auto-generate release notes`
- **Press "Publish release" button**
8. That's all, now you can simply wait push to PyPi. You can monitor it in Action page: https://github.com/soxoj/maigret/actions/workflows/python-publish.yml
Documentation updates
---------------------
Documentations is auto-generated and auto-deployed from the ``docs`` directory.
To manually update documentation:
1. Change something in the ``.rst`` files in the ``docs/source`` directory.
2. Install ``pip install -r requirements.txt`` in the docs directory.
3. Run ``make singlehtml`` in the terminal in the docs directory.
4. Open ``build/singlehtml/index.html`` in your browser to see the result.
5. If everything is ok, commit and push your changes to GitHub.
Roadmap
-------
.. warning::
This roadmap requires updating to reflect the current project status and future plans.
.. figure:: https://i.imgur.com/kk8cFdR.png
:target: https://i.imgur.com/kk8cFdR.png
:align: center
@@ -1,35 +0,0 @@
.. _extracting-information-from-pages:
Extracting information from pages
=================================
Maigret can parse URLs and content of web pages by URLs to extract info about account owner and other meta information.
You must specify the URL with the option ``--parse``, it's can be a link to an account or an online document. List of supported sites `see here <https://github.com/soxoj/socid-extractor#sites>`_.
After the end of the parsing phase, Maigret will start the search phase by :doc:`supported identifiers <supported-identifier-types>` found (usernames, ids, etc.).
Examples
--------
.. code-block:: console
$ maigret --parse https://docs.google.com/spreadsheets/d/1HtZKMLRXNsZ0HjtBmo0Gi03nUPiJIA4CC4jTYbCAnXw/edit\#gid\=0
Scanning webpage by URL https://docs.google.com/spreadsheets/d/1HtZKMLRXNsZ0HjtBmo0Gi03nUPiJIA4CC4jTYbCAnXw/edit#gid=0...
┣╸org_name: Gooten
┗╸mime_type: application/vnd.google-apps.ritz
Scanning webpage by URL https://clients6.google.com/drive/v2beta/files/1HtZKMLRXNsZ0HjtBmo0Gi03nUPiJIA4CC4jTYbCAnXw?fields=alternateLink%2CcopyRequiresWriterPermission%2CcreatedDate%2Cdescription%2CdriveId%2CfileSize%2CiconLink%2Cid%2Clabels(starred%2C%20trashed)%2ClastViewedByMeDate%2CmodifiedDate%2Cshared%2CteamDriveId%2CuserPermission(id%2Cname%2CemailAddress%2Cdomain%2Crole%2CadditionalRoles%2CphotoLink%2Ctype%2CwithLink)%2Cpermissions(id%2Cname%2CemailAddress%2Cdomain%2Crole%2CadditionalRoles%2CphotoLink%2Ctype%2CwithLink)%2Cparents(id)%2Ccapabilities(canMoveItemWithinDrive%2CcanMoveItemOutOfDrive%2CcanMoveItemOutOfTeamDrive%2CcanAddChildren%2CcanEdit%2CcanDownload%2CcanComment%2CcanMoveChildrenWithinDrive%2CcanRename%2CcanRemoveChildren%2CcanMoveItemIntoTeamDrive)%2Ckind&supportsTeamDrives=true&enforceSingleParent=true&key=AIzaSyC1eQ1xj69IdTMeii5r7brs3R90eck-m7k...
┣╸created_at: 2016-02-16T18:51:52.021Z
┣╸updated_at: 2019-10-23T17:15:47.157Z
┣╸gaia_id: 15696155517366416778
┣╸fullname: Nadia Burgess
┣╸email: nadia@gooten.com
┣╸image: https://lh3.googleusercontent.com/a-/AOh14GheZe1CyNa3NeJInWAl70qkip4oJ7qLsD8vDy6X=s64
┗╸email_username: nadia
.. code-block:: console
$ maigret.py --parse https://steamcommunity.com/profiles/76561199113454789
Scanning webpage by URL https://steamcommunity.com/profiles/76561199113454789...
┣╸steam_id: 76561199113454789
┣╸nickname: Pok
┗╸username: Machine42
+170 -5
View File
@@ -5,6 +5,34 @@ Features
This is the list of Maigret features.
.. _web-interface:
Web Interface
-------------
You can run Maigret with a web interface, where you can view the graph with results and download reports of all formats on a single page.
.. image:: https://raw.githubusercontent.com/soxoj/maigret/main/static/web_interface_screenshot_start.png
:alt: Web interface: how to start
.. image:: https://raw.githubusercontent.com/soxoj/maigret/main/static/web_interface_screenshot.png
:alt: Web interface: results
Instructions:
1. Run Maigret with the ``--web`` flag and specify the port number.
.. code-block:: console
maigret --web 5000
2. Open http://127.0.0.1:5000 in your browser and enter one or more usernames to make a search.
3. Wait a bit for the search to complete and view the graph with results, the table with all accounts found, and download reports of all formats.
Personal info gathering
-----------------------
@@ -14,17 +42,99 @@ Also, Maigret use found ids and usernames from links to start a recursive search
Enabled by default, can be disabled with ``--no extracting``.
.. code-block:: text
$ python3 -m maigret soxoj --timeout 5
[-] Starting a search on top 500 sites from the Maigret database...
[!] You can run search by full list of sites with flag `-a`
[*] Checking username soxoj on:
...
[+] GitHub: https://github.com/soxoj
├─uid: 31013580
├─image: https://avatars.githubusercontent.com/u/31013580?v=4
├─created_at: 2017-08-14T17:03:07Z
├─location: Amsterdam, Netherlands
├─follower_count: 1304
├─following_count: 54
├─fullname: Soxoj
├─public_gists_count: 3
├─public_repos_count: 88
├─twitter_username: sox0j
├─bio: Head of OSINT Center of Excellence in @SocialLinks-IO
├─is_company: Social Links
└─blog_url: soxoj.com
...
Recursive search
----------------
Maigret can extract some :ref:`common ids <supported-identifier-types>` and usernames from links on the account page (often people placed links to their other accounts) and immediately start new searches. All the gathered information will be displayed in CLI output and reports.
Maigret has the ability to scan account pages for :ref:`common identifiers <supported-identifier-types>` and usernames found in links.
When people include links to their other social media accounts, Maigret can automatically detect and initiate new searches for those profiles.
Any information discovered through this process will be shown in both the command-line interface output and generated reports.
Enabled by default, can be disabled with ``--no-recursion``.
Reports
.. code-block:: text
$ python3 -m maigret soxoj --timeout 5
[-] Starting a search on top 500 sites from the Maigret database...
[!] You can run search by full list of sites with flag `-a`
[*] Checking username soxoj on:
...
[+] GitHub: https://github.com/soxoj
├─uid: 31013580
├─image: https://avatars.githubusercontent.com/u/31013580?v=4
├─created_at: 2017-08-14T17:03:07Z
├─location: Amsterdam, Netherlands
├─follower_count: 1304
├─following_count: 54
├─fullname: Soxoj
├─public_gists_count: 3
├─public_repos_count: 88
├─twitter_username: sox0j <===== another username found here
├─bio: Head of OSINT Center of Excellence in @SocialLinks-IO
├─is_company: Social Links
└─blog_url: soxoj.com
...
Searching |████████████████████████████████████████| 500/500 [100%] in 9.1s (54.85/s)
[-] You can see detailed site check errors with a flag `--print-errors`
[*] Checking username sox0j on:
[+] Telegram: https://t.me/sox0j
├─fullname: @Sox0j
...
Username permutations
---------------------
Maigret can generate permutations of usernames. Just pass a few usernames in the CLI and use ``--permute`` flag.
Thanks to `@balestek <https://github.com/balestek>`_ for the idea and implementation.
.. code-block:: text
$ python3 -m maigret --permute hope dream --timeout 5
[-] 12 permutations from hope dream to check...
├─ hopedream
├─ _hopedream
├─ hopedream_
├─ hope_dream
├─ hope-dream
├─ hope.dream
├─ dreamhope
├─ _dreamhope
├─ dreamhope_
├─ dream_hope
├─ dream-hope
└─ dream.hope
[-] Starting a search on top 500 sites from the Maigret database...
[!] You can run search by full list of sites with flag `-a`
[*] Checking username hopedream on:
...
Reports
-------
Maigret currently supports HTML, PDF, TXT, XMind mindmap, and JSON reports.
Maigret currently supports HTML, PDF, TXT, XMind 8 mindmap, and JSON reports.
HTML/PDF reports contain:
@@ -34,6 +144,9 @@ HTML/PDF reports contain:
Also, there is a short text report in the CLI output after the end of a searching phase.
.. warning::
XMind 8 mindmaps are incompatible with XMind 2022!
Tags
----
@@ -62,12 +175,64 @@ Archives and mirrors checking
The Maigret database contains not only the original websites, but also mirrors, archives, and aggregators. For example:
- `Reddit BigData search <https://camas.github.io/reddit-search/>`_
- `Picuki <https://www.picuki.com/>`_, Instagram mirror
- `Twitter shadowban <https://shadowban.eu/>`_ checker
- (no longer available) `Reddit BigData search <https://camas.github.io/reddit-search/>`_
- (no longer available) `Twitter shadowban <https://shadowban.eu/>`_ checker
It allows getting additional info about the person and checking the existence of the account even if the main site is unavailable (bot protection, captcha, etc.)
Activation
----------
The activation mechanism helps make requests to sites requiring additional authentication like cookies, JWT tokens, or custom headers.
It works by implementing a custom function that:
1. Makes a specialized HTTP request to a specific website endpoint
2. Processes the response
3. Updates the headers/cookies for that site in the local Maigret database
Since activation only triggers after encountering specific errors, a retry (or another Maigret run) is needed to obtain a valid response with the updated authentication.
The activation mechanism is enabled by default, and cannot be disabled at the moment.
See for more details in Development section :ref:`activation-mechanism`.
.. _extracting-information-from-pages:
Extraction of information from account pages
--------------------------------------------
Maigret can parse URLs and content of web pages by URLs to extract info about account owner and other meta information.
You must specify the URL with the option ``--parse``, it's can be a link to an account or an online document. List of supported sites `see here <https://github.com/soxoj/socid-extractor#sites>`_.
After the end of the parsing phase, Maigret will start the search phase by :doc:`supported identifiers <supported-identifier-types>` found (usernames, ids, etc.).
.. code-block:: console
$ maigret --parse https://docs.google.com/spreadsheets/d/1HtZKMLRXNsZ0HjtBmo0Gi03nUPiJIA4CC4jTYbCAnXw/edit\#gid\=0
Scanning webpage by URL https://docs.google.com/spreadsheets/d/1HtZKMLRXNsZ0HjtBmo0Gi03nUPiJIA4CC4jTYbCAnXw/edit#gid=0...
┣╸org_name: Gooten
┗╸mime_type: application/vnd.google-apps.ritz
Scanning webpage by URL https://clients6.google.com/drive/v2beta/files/1HtZKMLRXNsZ0HjtBmo0Gi03nUPiJIA4CC4jTYbCAnXw?fields=alternateLink%2CcopyRequiresWriterPermission%2CcreatedDate%2Cdescription%2CdriveId%2CfileSize%2CiconLink%2Cid%2Clabels(starred%2C%20trashed)%2ClastViewedByMeDate%2CmodifiedDate%2Cshared%2CteamDriveId%2CuserPermission(id%2Cname%2CemailAddress%2Cdomain%2Crole%2CadditionalRoles%2CphotoLink%2Ctype%2CwithLink)%2Cpermissions(id%2Cname%2CemailAddress%2Cdomain%2Crole%2CadditionalRoles%2CphotoLink%2Ctype%2CwithLink)%2Cparents(id)%2Ccapabilities(canMoveItemWithinDrive%2CcanMoveItemOutOfDrive%2CcanMoveItemOutOfTeamDrive%2CcanAddChildren%2CcanEdit%2CcanDownload%2CcanComment%2CcanMoveChildrenWithinDrive%2CcanRename%2CcanRemoveChildren%2CcanMoveItemIntoTeamDrive)%2Ckind&supportsTeamDrives=true&enforceSingleParent=true&key=AIzaSyC1eQ1xj69IdTMeii5r7brs3R90eck-m7k...
┣╸created_at: 2016-02-16T18:51:52.021Z
┣╸updated_at: 2019-10-23T17:15:47.157Z
┣╸gaia_id: 15696155517366416778
┣╸fullname: Nadia Burgess
┣╸email: nadia@gooten.com
┣╸image: https://lh3.googleusercontent.com/a-/AOh14GheZe1CyNa3NeJInWAl70qkip4oJ7qLsD8vDy6X=s64
┗╸email_username: nadia
.. code-block:: console
$ maigret.py --parse https://steamcommunity.com/profiles/76561199113454789
Scanning webpage by URL https://steamcommunity.com/profiles/76561199113454789...
┣╸steam_id: 76561199113454789
┣╸nickname: Pok
┗╸username: Machine42
Simple API
----------
+23 -7
View File
@@ -3,28 +3,44 @@
Welcome to the Maigret docs!
============================
**Maigret** is an easy-to-use and powerful OSINT tool for collecting a dossier on a person by username only.
**Maigret** is an easy-to-use and powerful OSINT tool for collecting a dossier on a person by a username (alias) only.
This is achieved by checking for accounts on a huge number of sites and gathering all the available information from web pages.
The project's main goal - give to OSINT researchers and pentesters a **universal tool** to get maximum information about a subject and integrate it with other tools in automatization pipelines.
The project's main goal give to OSINT researchers and pentesters a **universal tool** to get maximum information
about a person of interest by a username and integrate it with other tools in automatization pipelines.
.. warning::
**This tool is intended for educational and lawful purposes only.**
The developers do not endorse or encourage any illegal activities or misuse of this tool.
Regulations regarding the collection and use of personal data vary by country and region,
including but not limited to GDPR in the EU, CCPA in the USA, and similar laws worldwide.
It is your sole responsibility to ensure that your use of this tool complies with all applicable laws
and regulations in your jurisdiction. Any illegal use of this tool is strictly prohibited,
and you are fully accountable for your actions.
The authors and developers of this tool bear no responsibility for any misuse
or unlawful activities conducted by its users.
You may be interested in:
-------------------------
- :doc:`Command line options description <command-line-options>` and :doc:`usage examples <usage-examples>`
- :doc:`Quick start <quick-start>`
- :doc:`Usage examples <usage-examples>`
- :doc:`Command line options <command-line-options>`
- :doc:`Features list <features>`
- :doc:`Project roadmap <roadmap>`
.. toctree::
:hidden:
:caption: Sections
quick-start
installation
usage-examples
command-line-options
extracting-information-from-pages
features
philosophy
roadmap
supported-identifier-types
tags
usage-examples
settings
development
+92
View File
@@ -0,0 +1,92 @@
.. _installation:
Installation
============
Maigret can be installed using pip, Docker, or simply can be launched from the cloned repo.
Also, it is available online via `official Telegram bot <https://t.me/osint_maigret_bot>`_,
source code of a bot is `available on GitHub <https://github.com/soxoj/maigret-tg-bot>`_.
Windows Standalone EXE-binaries
-------------------------------
Standalone EXE-binaries for Windows are located in the `Releases section <https://github.com/soxoj/maigret/releases>`_ of GitHub repository.
Currently, the new binary is created automatically after each commit to **main** and **dev** branches.
Video guide on how to run it: https://youtu.be/qIgwTZOmMmM.
Cloud Shells and Jupyter notebooks
----------------------------------
In case you don't want to install Maigret locally, you can use cloud shells and Jupyter notebooks.
Press one of the buttons below and follow the instructions to launch it in your browser.
.. image:: https://user-images.githubusercontent.com/27065646/92304704-8d146d80-ef80-11ea-8c29-0deaabb1c702.png
:target: https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/soxoj/maigret&tutorial=README.md
:alt: Open in Cloud Shell
.. image:: https://replit.com/badge/github/soxoj/maigret
:target: https://repl.it/github/soxoj/maigret
:alt: Run on Replit
:height: 50
.. image:: https://colab.research.google.com/assets/colab-badge.svg
:target: https://colab.research.google.com/gist/soxoj/879b51bc3b2f8b695abb054090645000/maigret-collab.ipynb
:alt: Open In Colab
:height: 45
.. image:: https://mybinder.org/badge_logo.svg
:target: https://mybinder.org/v2/gist/soxoj/9d65c2f4d3bec5dd25949197ea73cf3a/HEAD
:alt: Open In Binder
:height: 45
Local installation from PyPi
----------------------------
Please note that the sites database in the PyPI package may be outdated.
If you encounter frequent false positive results, we recommend installing the latest development version from GitHub instead.
.. note::
Python 3.10 or higher and pip is required, **Python 3.11 is recommended.**
.. code-block:: bash
# install from pypi
pip3 install maigret
# usage
maigret username
Development version (GitHub)
----------------------------
.. code-block:: bash
git clone https://github.com/soxoj/maigret && cd maigret
pip3 install .
# OR
pip3 install git+https://github.com/soxoj/maigret.git
# usage
maigret username
# OR use poetry in case you plan to develop Maigret
pip3 install poetry
poetry run maigret
Docker
------
.. code-block:: bash
# official image of the development version, updated from the github repo
docker pull soxoj/maigret
# usage
docker run -v /mydir:/app/reports soxoj/maigret:latest username --html
# manual build
docker build -t maigret .
Binary file not shown.

After

Width:  |  Height:  |  Size: 234 KiB

+12 -1
View File
@@ -3,4 +3,15 @@
Philosophy
==========
Username => Dossier
TL;DR: Username => Dossier
Maigret is designed to gather all the available information about person by his username.
What kind of information is this? First, links to person accounts. Secondly, all the machine-extractable
pieces of info, such as: other usernames, full name, URLs to people's images, birthday, location (country,
city, etc.), gender.
All this information forms some dossier, but it also useful for other tools and analytical purposes.
Each collected piece of data has a label of a certain format (for example, ``follower_count`` for the number
of subscribers or ``created_at`` for account creation time) so that it can be parsed and analyzed by various
systems and stored in databases.
+15
View File
@@ -0,0 +1,15 @@
.. _quick-start:
Quick start
===========
After :doc:`installing Maigret <installation>`, you can begin searching by providing one or more usernames to look up:
``maigret username1 username2 ...``
Maigret will search for accounts with the specified usernames across a vast number of websites. It will provide you with a list
of URLs to any discovered accounts, along with relevant information extracted from those profiles.
.. image:: maigret_screenshot.png
:alt: Maigret search results screenshot
:align: center
-18
View File
@@ -1,18 +0,0 @@
.. _roadmap:
Roadmap
=======
.. figure:: https://i.imgur.com/kk8cFdR.png
:target: https://i.imgur.com/kk8cFdR.png
:align: center
Current status
--------------
- Sites DB stats - ok
- Scan sessions stats - ok
- Site engine autodetect - ok
- Engines for all the sites - WIP
- Unified reporting flow - ok
- Retries - ok
+3
View File
@@ -3,6 +3,9 @@
Settings
==============
.. warning::
The settings system is under development and may be subject to change.
Options are also configurable through settings files. See
`settings JSON file <https://github.com/soxoj/maigret/blob/main/maigret/resources/settings.json>`_
for the list of currently supported options.
+19 -2
View File
@@ -5,7 +5,8 @@ Tags
The use of tags allows you to select a subset of the sites from big Maigret DB for search.
**Warning: tags markup is not stable now.**
.. warning::
Tags markup is still not stable.
There are several types of tags:
@@ -17,8 +18,24 @@ There are several types of tags:
Usage
-----
``--tags en,jp`` -- search on US and Japanese sites (actually marked as such in the Maigret database)
``--tags us,jp`` -- search on US and Japanese sites (actually marked as such in the Maigret database)
``--tags coding`` -- search on sites related to software development.
``--tags ucoz`` -- search on uCoz sites only (mostly CIS countries)
Blacklisting (excluding) tags
------------------------------
You can exclude sites with certain tags from the search using ``--exclude-tags``:
``--exclude-tags porn,dating`` -- skip all sites tagged with ``porn`` or ``dating``.
``--exclude-tags ru`` -- skip all Russian sites.
You can combine ``--tags`` and ``--exclude-tags`` to fine-tune your search:
``--tags forum --exclude-tags ru`` -- search on forum sites, but skip Russian ones.
In the web interface, the tag cloud supports three states per tag:
click once to **include** (green), click again to **exclude** (dark/strikethrough),
and click once more to return to **neutral** (red).
+39 -12
View File
@@ -3,51 +3,78 @@
Usage examples
==============
Start a search for accounts with username ``machine42`` on top 500 sites from the Maigret DB.
You can use Maigret as:
- a command line tool: initial and a default mode
- a `web interface <#web-interface>`_: view the graph with results and download all report formats on a single page
- a library: integrate Maigret into your own project
Use Cases
---------
1. Search for accounts with username ``machine42`` on top 500 sites (by default, according to Majestic Million rank) from the Maigret DB.
.. code-block:: console
maigret machine42
Start a search for accounts with username ``machine42`` on **all sites** from the Maigret DB.
2. Search for accounts with username ``machine42`` on **all sites** from the Maigret DB.
.. code-block:: console
maigret machine42 -a
Start a search [...] and generate HTML and PDF reports.
.. note::
Maigret will search for accounts on a huge number of sites,
and some of them may return false positive results. At the moment, we are working on autorepair mode to deliver
the most accurate results.
If you experience many false positives, you can do the following:
- Install the last development version of Maigret from GitHub
- Run Maigret with ``--self-check`` flag and agree on disabling of problematic sites
3. Search for accounts with username ``machine42`` and generate HTML and PDF reports.
.. code-block:: console
maigret machine42 -a -HP
maigret machine42 -HP
Start a search for accounts with username ``machine42`` only on Facebook.
or
.. code-block:: console
maigret machine42 -a --html --pdf
4. Search for accounts with username ``machine42`` on Facebook only.
.. code-block:: console
maigret machine42 --site Facebook
Extract information from the Steam page by URL and start a search for accounts with found username ``machine42``.
5. Extract information from the Steam page by URL and start a search for accounts with found username ``machine42``.
.. code-block:: console
maigret --parse https://steamcommunity.com/profiles/76561199113454789
Start a search for accounts with username ``machine42`` only on US and Japanese sites.
6. Search for accounts with username ``machine42`` only on US and Japanese sites.
.. code-block:: console
maigret michael --tags en,jp
maigret machine42 --tags us,jp
Start a search for accounts with username ``machine42`` only on sites related to software development.
7. Search for accounts with username ``machine42`` only on sites related to software development.
.. code-block:: console
maigret michael --tags coding
maigret machine42 --tags coding
Start a search for accounts with username ``machine42`` on uCoz sites only (mostly CIS countries).
8. Search for accounts with username ``machine42`` on uCoz sites only (mostly CIS countries).
.. code-block:: console
maigret michael --tags ucoz
maigret machine42 --tags ucoz
+40 -65
View File
@@ -1,68 +1,43 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "8v6PEfyXb0Gx"
},
"outputs": [],
"source": [
"# clone the repo\n",
"!git clone https://github.com/soxoj/maigret\n",
"!pip3 install -r maigret/requirements.txt"
]
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "cXOQUAhDchkl"
},
"outputs": [],
"source": [
"# help\n",
"!python3 maigret/maigret.py --help"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "SjDmpN4QGnJu"
},
"outputs": [],
"source": [
"# search\n",
"!python3 maigret/maigret.py user"
]
}
],
"metadata": {
"colab": {
"collapsed_sections": [],
"include_colab_link": true,
"name": "maigret.ipynb",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.10"
}
},
"nbformat": 4,
"nbformat_minor": 1
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "acxNWJOUmLc4"
},
"outputs": [],
"source": [
"!git clone https://github.com/soxoj/maigret\n",
"!pip3 install ./maigret/\n",
"from IPython.display import clear_output\n",
"clear_output()\n",
"username = str(input(\"Username >> \"))\n",
"!maigret {username} -a -n 10"
]
},
{
"cell_type": "code",
"source": [],
"metadata": {
"id": "S3SmapMHmOoD"
},
"execution_count": null,
"outputs": []
}
]
}
-18
View File
@@ -1,18 +0,0 @@
#!/usr/bin/env python3
import asyncio
import sys
from maigret.maigret import main
def run():
try:
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
except KeyboardInterrupt:
print('Maigret is interrupted.')
sys.exit(1)
if __name__ == "__main__":
run()
+1 -1
View File
@@ -1,3 +1,3 @@
"""Maigret version file"""
__version__ = '0.4.1'
__version__ = '0.5.0'
+37
View File
@@ -1,3 +1,4 @@
import json
from http.cookiejar import MozillaCookieJar
from http.cookies import Morsel
@@ -25,6 +26,7 @@ class ParsingActivator:
import requests
r = requests.get(site.activation["url"], headers=headers)
logger.debug(f"Vimeo viewer activation: {json.dumps(r.json(), indent=4)}")
jwt_token = r.json()["jwt"]
site.headers["Authorization"] = "jwt " + jwt_token
@@ -39,6 +41,41 @@ class ParsingActivator:
bearer_token = r.json()["accessToken"]
site.headers["authorization"] = f"Bearer {bearer_token}"
@staticmethod
def weibo(site, logger):
headers = dict(site.headers)
import requests
session = requests.Session()
# 1 stage: get the redirect URL
r = session.get(
"https://weibo.com/clairekuo", headers=headers, allow_redirects=False
)
logger.debug(
f"1 stage: {'success' if r.status_code == 302 else 'no 302 redirect, fail!'}"
)
location = r.headers.get("Location")
# 2 stage: go to passport visitor page
headers["Referer"] = location
r = session.get(location, headers=headers)
logger.debug(
f"2 stage: {'success' if r.status_code == 200 else 'no 200 response, fail!'}"
)
# 3 stage: gen visitor token
headers["Referer"] = location
r = session.post(
"https://passport.weibo.com/visitor/genvisitor2",
headers=headers,
data={'cb': 'visitor_gray_callback', 'tid': '', 'from': 'weibo'},
)
cookies = r.headers.get('set-cookie')
logger.debug(
f"3 stage: {'success' if r.status_code == 200 and cookies else 'no 200 response and cookies, fail!'}"
)
site.headers["Cookie"] = cookies
def import_aiohttp_cookies(cookiestxt_filename):
cookies_obj = MozillaCookieJar(cookiestxt_filename)
+433 -187
View File
@@ -1,38 +1,36 @@
# Standard library imports
import ast
import asyncio
import logging
import random
import re
import ssl
import sys
from typing import Dict, List, Optional, Tuple
from urllib.parse import quote
# Third party imports
import aiodns
from alive_progress import alive_bar
from aiohttp import ClientSession, TCPConnector, http_exceptions
from aiohttp.client_exceptions import ClientConnectorError, ServerDisconnectedError
from python_socks import _errors as proxy_errors
from socid_extractor import extract
try:
from mock import Mock
except ImportError:
from unittest.mock import Mock
import re
import ssl
import sys
import tqdm
from typing import Tuple, Optional, Dict, List
from urllib.parse import quote
import aiodns
import tqdm.asyncio
from python_socks import _errors as proxy_errors
from socid_extractor import extract
from aiohttp import TCPConnector, ClientSession, http_exceptions
from aiohttp.client_exceptions import ServerDisconnectedError, ClientConnectorError
from .activation import ParsingActivator, import_aiohttp_cookies
# Local imports
from . import errors
from .activation import ParsingActivator, import_aiohttp_cookies
from .errors import CheckError
from .executors import (
AsyncExecutor,
AsyncioSimpleExecutor,
AsyncioProgressbarQueueExecutor,
)
from .result import QueryResult, QueryStatus
from .executors import AsyncioQueueGeneratorExecutor
from .result import MaigretCheckResult, MaigretCheckStatus
from .sites import MaigretDatabase, MaigretSite
from .types import QueryOptions, QueryResultWrapper
from .utils import get_random_user_agent, ascii_data_display
from .utils import ascii_data_display, get_random_user_agent
SUPPORTED_IDS = (
@@ -56,119 +54,148 @@ class CheckerBase:
class SimpleAiohttpChecker(CheckerBase):
def __init__(self, *args, **kwargs):
proxy = kwargs.get('proxy')
cookie_jar = kwargs.get('cookie_jar')
self.proxy = kwargs.get('proxy')
self.cookie_jar = kwargs.get('cookie_jar')
self.logger = kwargs.get('logger', Mock())
self.url = None
self.headers = None
self.allow_redirects = True
self.timeout = 0
self.allow_redirects = True
self.timeout = 0
self.method = 'get'
self.payload = None
# moved here to speed up the launch of Maigret
from aiohttp_socks import ProxyConnector
# make http client session
connector = ProxyConnector.from_url(proxy) if proxy else TCPConnector(ssl=False)
connector.verify_ssl = False
self.session = ClientSession(
connector=connector, trust_env=True, cookie_jar=cookie_jar
)
def prepare(self, url, headers=None, allow_redirects=True, timeout=0, method='get'):
if method == 'get':
request_method = self.session.get
else:
request_method = self.session.head
future = request_method(
url=url,
headers=headers,
allow_redirects=allow_redirects,
timeout=timeout,
)
return future
def prepare(self, url, headers=None, allow_redirects=True, timeout=0, method='get', payload=None):
self.url = url
self.headers = headers
self.allow_redirects = allow_redirects
self.timeout = timeout
self.method = method
self.payload = payload
return None
async def close(self):
await self.session.close()
async def check(self, future) -> Tuple[str, int, Optional[CheckError]]:
html_text = None
status_code = 0
error: Optional[CheckError] = CheckError("Unknown")
pass
async def _make_request(
self, session, url, headers, allow_redirects, timeout, method, logger, payload=None
) -> Tuple[str, int, Optional[CheckError]]:
try:
response = await future
if method.lower() == 'get':
request_method = session.get
elif method.lower() == 'post':
request_method = session.post
elif method.lower() == 'head':
request_method = session.head
else:
request_method = session.get
status_code = response.status
response_content = await response.content.read()
charset = response.charset or "utf-8"
decoded_content = response_content.decode(charset, "ignore")
html_text = decoded_content
kwargs = {
'url': url,
'headers': headers,
'allow_redirects': allow_redirects,
'timeout': timeout,
}
if payload and method.lower() == 'post':
if headers and headers.get('Content-Type') == 'application/x-www-form-urlencoded':
kwargs['data'] = payload
else:
kwargs['json'] = payload
error = None
if status_code == 0:
error = CheckError("Connection lost")
async with request_method(**kwargs) as response:
status_code = response.status
response_content = await response.content.read()
charset = response.charset or "utf-8"
decoded_content = response_content.decode(charset, "ignore")
self.logger.debug(html_text)
error = CheckError("Connection lost") if status_code == 0 else None
logger.debug(decoded_content)
return decoded_content, status_code, error
except asyncio.TimeoutError as e:
error = CheckError("Request timeout", str(e))
return None, 0, CheckError("Request timeout", str(e))
except ClientConnectorError as e:
error = CheckError("Connecting failure", str(e))
return None, 0, CheckError("Connecting failure", str(e))
except ServerDisconnectedError as e:
error = CheckError("Server disconnected", str(e))
return None, 0, CheckError("Server disconnected", str(e))
except http_exceptions.BadHttpMessage as e:
error = CheckError("HTTP", str(e))
return None, 0, CheckError("HTTP", str(e))
except proxy_errors.ProxyError as e:
error = CheckError("Proxy", str(e))
return None, 0, CheckError("Proxy", str(e))
except KeyboardInterrupt:
error = CheckError("Interrupted")
return None, 0, CheckError("Interrupted")
except Exception as e:
# python-specific exceptions
if sys.version_info.minor > 6 and (
isinstance(e, ssl.SSLCertVerificationError)
or isinstance(e, ssl.SSLError)
):
error = CheckError("SSL", str(e))
return None, 0, CheckError("SSL", str(e))
else:
self.logger.debug(e, exc_info=True)
error = CheckError("Unexpected", str(e))
logger.debug(e, exc_info=True)
return None, 0, CheckError("Unexpected", str(e))
if error == "Invalid proxy response":
self.logger.debug(e, exc_info=True)
async def check(self) -> Tuple[str, int, Optional[CheckError]]:
from aiohttp_socks import ProxyConnector
return str(html_text), status_code, error
connector = (
ProxyConnector.from_url(self.proxy)
if self.proxy
else TCPConnector(ssl=False)
)
connector.verify_ssl = False
async with ClientSession(
connector=connector,
trust_env=True,
# TODO: tests
cookie_jar=self.cookie_jar if self.cookie_jar else None,
) as session:
html_text, status_code, error = await self._make_request(
session,
self.url,
self.headers,
self.allow_redirects,
self.timeout,
self.method,
self.logger,
self.payload,
)
if error and str(error) == "Invalid proxy response":
self.logger.debug(error, exc_info=True)
return str(html_text) if html_text else '', status_code, error
class ProxiedAiohttpChecker(SimpleAiohttpChecker):
def __init__(self, *args, **kwargs):
proxy = kwargs.get('proxy')
cookie_jar = kwargs.get('cookie_jar')
self.proxy = kwargs.get('proxy')
self.cookie_jar = kwargs.get('cookie_jar')
self.logger = kwargs.get('logger', Mock())
# moved here to speed up the launch of Maigret
from aiohttp_socks import ProxyConnector
connector = ProxyConnector.from_url(proxy)
connector.verify_ssl = False
self.session = ClientSession(
connector=connector, trust_env=True, cookie_jar=cookie_jar
)
class AiodnsDomainResolver(CheckerBase):
if sys.platform == 'win32': # Temporary workaround for Windows
asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
def __init__(self, *args, **kwargs):
loop = asyncio.get_event_loop()
self.logger = kwargs.get('logger', Mock())
self.resolver = aiodns.DNSResolver(loop=loop)
def prepare(self, url, headers=None, allow_redirects=True, timeout=0, method='get'):
return self.resolver.query(url, 'A')
def prepare(self, url, headers=None, allow_redirects=True, timeout=0, method='get', payload=None):
self.url = url
return None
async def check(self, future) -> Tuple[str, int, Optional[CheckError]]:
async def check(self) -> Tuple[str, int, Optional[CheckError]]:
status = 404
error = None
text = ''
try:
res = await future
res = await self.resolver.query(self.url, 'A')
text = str(res[0].host)
status = 200
except aiodns.error.DNSError:
@@ -184,10 +211,10 @@ class CheckerMock:
def __init__(self, *args, **kwargs):
pass
def prepare(self, url, headers=None, allow_redirects=True, timeout=0, method='get'):
def prepare(self, url, headers=None, allow_redirects=True, timeout=0, method='get', payload=None):
return None
async def check(self, future) -> Tuple[str, int, Optional[CheckError]]:
async def check(self) -> Tuple[str, int, Optional[CheckError]]:
await asyncio.sleep(0)
return '', 0, None
@@ -213,6 +240,11 @@ def detect_error_page(
if status_code == 403 and not ignore_403:
return CheckError("Access denied", "403 status code, use proxy/vpn")
elif status_code == 999:
# LinkedIn anti-bot / HTTP 999 workaround. It shouldn't trigger an infrastructure
# Server Error because it represents a valid "Not Found / Blocked" state for the username.
pass
elif status_code >= 500:
return CheckError("Server", f"{status_code} status code")
@@ -274,14 +306,16 @@ def process_site_result(
)
if site.activation and html_text and is_need_activation:
logger.debug(f"Activation for {site.name}")
method = site.activation["method"]
try:
activate_fun = getattr(ParsingActivator(), method)
# TODO: async call
activate_fun(site, logger)
except AttributeError:
except AttributeError as e:
logger.warning(
f"Activation method {method} for site {site.name} not found!"
f"Activation method {method} for site {site.name} not found!",
exc_info=True,
)
except Exception as e:
logger.warning(
@@ -298,6 +332,12 @@ def process_site_result(
if html_text:
if not presense_flags:
if check_type == "message" and logger.isEnabledFor(logging.DEBUG):
logger.debug(
"Site %s uses checkType message with empty presenseStrs; "
"presence is treated as true for any page.",
site.name,
)
is_presense_detected = True
site.stats["presense_flag"] = None
else:
@@ -309,7 +349,7 @@ def process_site_result(
break
def build_result(status, **kwargs):
return QueryResult(
return MaigretCheckResult(
username,
site_name,
url,
@@ -321,11 +361,11 @@ def process_site_result(
if check_error:
logger.warning(check_error)
result = QueryResult(
result = MaigretCheckResult(
username,
site_name,
url,
QueryStatus.UNKNOWN,
MaigretCheckStatus.UNKNOWN,
query_time=response_time,
error=check_error,
context=str(CheckError),
@@ -337,15 +377,15 @@ def process_site_result(
[(absence_flag in html_text) for absence_flag in site.absence_strs]
)
if not is_absence_detected and is_presense_detected:
result = build_result(QueryStatus.CLAIMED)
result = build_result(MaigretCheckStatus.CLAIMED)
else:
result = build_result(QueryStatus.AVAILABLE)
elif check_type in "status_code":
result = build_result(MaigretCheckStatus.AVAILABLE)
elif check_type == "status_code":
# Checks if the status code of the response is 2XX
if 200 <= status_code < 300:
result = build_result(QueryStatus.CLAIMED)
result = build_result(MaigretCheckStatus.CLAIMED)
else:
result = build_result(QueryStatus.AVAILABLE)
result = build_result(MaigretCheckStatus.AVAILABLE)
elif check_type == "response_url":
# For this detection method, we have turned off the redirect.
# So, there is no need to check the response URL: it will always
@@ -353,9 +393,9 @@ def process_site_result(
# code indicates that the request was successful (i.e. no 404, or
# forward to some odd redirect).
if 200 <= status_code < 300 and is_presense_detected:
result = build_result(QueryStatus.CLAIMED)
result = build_result(MaigretCheckStatus.CLAIMED)
else:
result = build_result(QueryStatus.AVAILABLE)
result = build_result(MaigretCheckStatus.AVAILABLE)
else:
# It should be impossible to ever get here...
raise ValueError(
@@ -364,25 +404,13 @@ def process_site_result(
extracted_ids_data = {}
if is_parsing_enabled and result.status == QueryStatus.CLAIMED:
try:
extracted_ids_data = extract(html_text)
except Exception as e:
logger.warning(f"Error while parsing {site.name}: {e}", exc_info=True)
if is_parsing_enabled and result.status == MaigretCheckStatus.CLAIMED:
extracted_ids_data = extract_ids_data(html_text, logger, site)
if extracted_ids_data:
new_usernames = {}
for k, v in extracted_ids_data.items():
if "username" in k:
new_usernames[v] = "username"
if k in SUPPORTED_IDS:
new_usernames[v] = k
results_info["ids_usernames"] = new_usernames
links = ascii_data_display(extracted_ids_data.get("links", "[]"))
if "website" in extracted_ids_data:
links.append(extracted_ids_data["website"])
results_info["ids_links"] = links
new_usernames = parse_usernames(extracted_ids_data, logger)
results_info = update_results_info(
results_info, extracted_ids_data, new_usernames
)
result.ids_data = extracted_ids_data
# Save status of request
@@ -397,7 +425,7 @@ def process_site_result(
def make_site_result(
site: MaigretSite, username: str, options: QueryOptions, logger
site: MaigretSite, username: str, options: QueryOptions, logger, *args, **kwargs
) -> QueryResultWrapper:
results_site: QueryResultWrapper = {}
@@ -414,6 +442,8 @@ def make_site_result(
headers = {
"User-Agent": get_random_user_agent(),
# tell server that we want to close connection after request
"Connection": "close",
}
headers.update(site.headers)
@@ -421,6 +451,10 @@ def make_site_result(
if "url" not in site.__dict__:
logger.error("No URL for site %s", site.name)
if kwargs.get('retry') and hasattr(site, "mirrors"):
site.url_main = random.choice(site.mirrors)
logger.info(f"Use {site.url_main} as a main url of site {site}")
# URL of user on site (if it exists)
url = site.url.format(
urlMain=site.url_main, urlSubpath=site.url_subpath, username=quote(username)
@@ -435,29 +469,29 @@ def make_site_result(
# site check is disabled
if site.disabled and not options['forced']:
logger.debug(f"Site {site.name} is disabled, skipping...")
results_site["status"] = QueryResult(
results_site["status"] = MaigretCheckResult(
username,
site.name,
url,
QueryStatus.ILLEGAL,
MaigretCheckStatus.ILLEGAL,
error=CheckError("Check is disabled"),
)
# current username type could not be applied
elif site.type != options["id_type"]:
results_site["status"] = QueryResult(
results_site["status"] = MaigretCheckResult(
username,
site.name,
url,
QueryStatus.ILLEGAL,
MaigretCheckStatus.ILLEGAL,
error=CheckError('Unsupported identifier type', f'Want "{site.type}"'),
)
# username is not allowed.
elif site.regex_check and re.search(site.regex_check, username) is None:
results_site["status"] = QueryResult(
results_site["status"] = MaigretCheckResult(
username,
site.name,
url,
QueryStatus.ILLEGAL,
MaigretCheckStatus.ILLEGAL,
error=CheckError(
'Unsupported username format', f'Want "{site.regex_check}"'
),
@@ -485,7 +519,9 @@ def make_site_result(
for k, v in site.get_params.items():
url_probe += f"&{k}={v}"
if site.check_type == "status_code" and site.request_head_only:
if site.request_method:
request_method = site.request_method.lower()
elif site.check_type == "status_code" and site.request_head_only:
# In most cases when we are detecting by status code,
# it is not necessary to get the entire body: we can
# detect fine with just the HEAD response.
@@ -496,6 +532,15 @@ def make_site_result(
# not respond properly unless we request the whole page.
request_method = 'get'
payload = None
if site.request_payload:
payload = {}
for k, v in site.request_payload.items():
if isinstance(v, str):
payload[k] = v.format(username=username)
else:
payload[k] = v
if site.check_type == "response_url":
# Site forwards request to a different URL if username not
# found. Disallow the redirect so we can capture the
@@ -512,11 +557,13 @@ def make_site_result(
headers=headers,
allow_redirects=allow_redirects,
timeout=options['timeout'],
payload=payload,
)
# Store future request object in the results object
results_site["future"] = future
results_site["checker"] = checker
results_site["checker"] = checker
return results_site
@@ -524,14 +571,52 @@ def make_site_result(
async def check_site_for_username(
site, username, options: QueryOptions, logger, query_notify, *args, **kwargs
) -> Tuple[str, QueryResultWrapper]:
default_result = make_site_result(site, username, options, logger)
future = default_result.get("future")
if not future:
default_result = make_site_result(
site, username, options, logger, retry=kwargs.get('retry')
)
# future = default_result.get("future")
# if not future:
# return site.name, default_result
checker = default_result.get("checker")
if not checker:
print(f"error, no checker for {site.name}")
return site.name, default_result
checker = default_result["checker"]
response = await checker.check()
html_text = response[0] if response and response[0] else ""
response = await checker.check(future=future)
# Retry once after token-style activation (e.g. Twitter guest token refresh).
act = site.activation
if act and html_text:
marks = act.get("marks") or []
if marks and any(m in html_text for m in marks):
method = act["method"]
try:
activate_fun = getattr(ParsingActivator(), method)
activate_fun(site, logger)
except AttributeError as e:
logger.warning(
f"Activation method {method} for site {site.name} not found!",
exc_info=True,
)
except Exception as e:
logger.warning(
f"Failed activation {method} for site {site.name}: {str(e)}",
exc_info=True,
)
else:
merged = dict(checker.headers or {})
merged.update(site.headers)
checker.prepare(
url=checker.url,
headers=merged,
allow_redirects=checker.allow_redirects,
timeout=checker.timeout,
method=checker.method,
payload=getattr(checker, 'payload', None),
)
response = await checker.check()
response_result = process_site_result(
response, query_notify, logger, default_result, site
@@ -543,8 +628,8 @@ async def check_site_for_username(
async def debug_ip_request(checker, logger):
future = checker.prepare(url="https://icanhazip.com")
ip, status, check_error = await checker.check(future)
checker.prepare(url="https://icanhazip.com")
ip, status, check_error = await checker.check()
if ip:
logger.debug(f"My IP is: {ip.strip()}")
else:
@@ -580,6 +665,8 @@ async def maigret(
cookies=None,
retries=0,
check_domains=False,
*args,
**kwargs,
) -> QueryResultWrapper:
"""Main search func
@@ -597,7 +684,7 @@ async def maigret(
is_parsing_enabled -- Extract additional info from account pages.
id_type -- Type of username to search.
Default is 'username', see all supported here:
https://github.com/soxoj/maigret/wiki/Supported-identifier-types
https://maigret.readthedocs.io/en/latest/supported-identifier-types.html
max_connections -- Maximum number of concurrent connections allowed.
Default is 100.
no_progressbar -- Displaying of ASCII progressbar during scanner.
@@ -655,13 +742,13 @@ async def maigret(
await debug_ip_request(clearweb_checker, logger)
# setup parallel executor
executor: Optional[AsyncExecutor] = None
if no_progressbar:
executor = AsyncioSimpleExecutor(logger=logger)
else:
executor = AsyncioProgressbarQueueExecutor(
logger=logger, in_parallel=max_connections, timeout=timeout + 0.5
)
executor = AsyncioQueueGeneratorExecutor(
logger=logger,
in_parallel=max_connections,
timeout=timeout + 0.5,
*args,
**kwargs,
)
# make options objects for all the requests
options: QueryOptions = {}
@@ -691,27 +778,34 @@ async def maigret(
continue
default_result: QueryResultWrapper = {
'site': site,
'status': QueryResult(
'status': MaigretCheckResult(
username,
sitename,
'',
QueryStatus.UNKNOWN,
MaigretCheckStatus.UNKNOWN,
error=CheckError('Request failed'),
),
}
tasks_dict[sitename] = (
check_site_for_username,
[site, username, options, logger, query_notify],
{'default': (sitename, default_result)},
{
'default': (sitename, default_result),
'retry': retries - attempts + 1,
},
)
cur_results = await executor.run(tasks_dict.values())
# wait for executor timeout errors
await asyncio.sleep(1)
cur_results = []
with alive_bar(
len(tasks_dict), title="Searching", force_tty=True, disable=no_progressbar
) as progress:
async for result in executor.run(tasks_dict.values()):
cur_results.append(result)
progress()
all_results.update(cur_results)
# rerun for failed sites
sites = get_failed_sites(dict(cur_results))
attempts -= 1
@@ -725,10 +819,8 @@ async def maigret(
# closing http client session
await clearweb_checker.close()
if tor_proxy:
await tor_checker.close()
if i2p_proxy:
await i2p_checker.close()
await tor_checker.close()
await i2p_checker.close()
# notify caller that all queries are finished
query_notify.finish()
@@ -763,25 +855,41 @@ def timeout_check(value):
async def site_self_check(
site: MaigretSite,
logger,
logger: logging.Logger,
semaphore,
db: MaigretDatabase,
silent=False,
proxy=None,
tor_proxy=None,
i2p_proxy=None,
skip_errors=False,
cookies=None,
auto_disable=False,
diagnose=False,
):
"""
Self-check a site configuration.
Args:
auto_disable: If True, automatically disable sites that fail checks.
If False (default), only report issues without disabling.
diagnose: If True, print detailed diagnosis information.
"""
changes = {
"disabled": False,
"issues": [],
"recommendations": [],
}
check_data = [
(site.username_claimed, QueryStatus.CLAIMED),
(site.username_unclaimed, QueryStatus.AVAILABLE),
(site.username_claimed, MaigretCheckStatus.CLAIMED),
(site.username_unclaimed, MaigretCheckStatus.AVAILABLE),
]
logger.info(f"Checking {site.name}...")
results_cache = {}
for username, status in check_data:
async with semaphore:
results_dict = await maigret(
@@ -796,50 +904,106 @@ async def site_self_check(
proxy=proxy,
tor_proxy=tor_proxy,
i2p_proxy=i2p_proxy,
cookies=cookies,
)
# don't disable entries with other ids types
# TODO: make normal checking
if site.name not in results_dict:
logger.info(results_dict)
changes["disabled"] = True
changes["issues"].append(f"Site {site.name} not in results (wrong id_type?)")
if auto_disable:
changes["disabled"] = True
continue
logger.debug(results_dict)
result = results_dict[site.name]["status"]
results_cache[username] = results_dict[site.name]
if result.error and 'Cannot connect to host' in result.error.desc:
changes["issues"].append(f"Cannot connect to host")
if auto_disable:
changes["disabled"] = True
site_status = result.status
if site_status != status:
if site_status == QueryStatus.UNKNOWN:
if site_status == MaigretCheckStatus.UNKNOWN:
msgs = site.absence_strs
etype = site.check_type
error_msg = f"Error checking {username}: {result.context}"
changes["issues"].append(error_msg)
logger.warning(
f"Error while searching {username} in {site.name}: {result.context}, {msgs}, type {etype}"
)
# don't disable sites after the error
# meaning that the site could be available, but returned error for the check
# e.g. many sites protected by cloudflare and available in general
if skip_errors:
pass
# don't disable in case of available username
if status == QueryStatus.CLAIMED:
elif status == MaigretCheckStatus.CLAIMED and auto_disable:
changes["disabled"] = True
elif status == QueryStatus.CLAIMED:
elif status == MaigretCheckStatus.CLAIMED:
changes["issues"].append(f"Claimed user '{username}' not detected as claimed")
logger.warning(
f"Not found `{username}` in {site.name}, must be claimed"
)
logger.info(results_dict[site.name])
changes["disabled"] = True
if auto_disable:
changes["disabled"] = True
else:
changes["issues"].append(f"Unclaimed user '{username}' detected as claimed")
logger.warning(f"Found `{username}` in {site.name}, must be available")
logger.info(results_dict[site.name])
changes["disabled"] = True
if auto_disable:
changes["disabled"] = True
logger.info(f"Site {site.name} checking is finished")
if changes["disabled"] != site.disabled:
# Generate recommendations based on issues
if changes["issues"] and len(results_cache) == 2:
claimed_result = results_cache.get(site.username_claimed, {})
unclaimed_result = results_cache.get(site.username_unclaimed, {})
claimed_http = claimed_result.get("http_status")
unclaimed_http = unclaimed_result.get("http_status")
if claimed_http and unclaimed_http:
if claimed_http != unclaimed_http and site.check_type != "status_code":
changes["recommendations"].append(
f"Consider checkType: status_code (HTTP {claimed_http} vs {unclaimed_http})"
)
# Print diagnosis if requested
if diagnose and changes["issues"]:
print(f"\n--- {site.name} DIAGNOSIS ---")
print(f" Check type: {site.check_type}")
print(f" Issues:")
for issue in changes["issues"]:
print(f" - {issue}")
if changes["recommendations"]:
print(f" Recommendations:")
for rec in changes["recommendations"]:
print(f" -> {rec}")
# Only modify site if auto_disable is enabled
if auto_disable and changes["disabled"] != site.disabled:
site.disabled = changes["disabled"]
logger.info(f"Switching property 'disabled' for {site.name} to {site.disabled}")
db.update_site(site)
if not silent:
action = "Disabled" if site.disabled else "Enabled"
print(f"{action} site {site.name}...")
elif changes["issues"] and not silent and not diagnose:
# Report issues without disabling
print(f"Issues found in {site.name}: {len(changes['issues'])} (not auto-disabled)")
# remove service tag "unchecked"
if "unchecked" in site.tags:
site.tags.remove("unchecked")
db.update_site(site)
return changes
@@ -847,45 +1011,127 @@ async def site_self_check(
async def self_check(
db: MaigretDatabase,
site_data: dict,
logger,
logger: logging.Logger,
silent=False,
max_connections=10,
proxy=None,
tor_proxy=None,
i2p_proxy=None,
) -> bool:
auto_disable=False,
diagnose=False,
) -> dict:
"""
Run self-check on sites.
Args:
auto_disable: If True, automatically disable sites that fail checks.
If False (default), only report issues without disabling.
diagnose: If True, print detailed diagnosis for each failing site.
Returns:
dict with 'needs_update' bool and 'results' list of check results
"""
sem = asyncio.Semaphore(max_connections)
tasks = []
all_sites = site_data
all_results = []
def disabled_count(lst):
return len(list(filter(lambda x: x.disabled, lst)))
unchecked_old_count = len(
[site for site in all_sites.values() if "unchecked" in site.tags]
)
disabled_old_count = disabled_count(all_sites.values())
for _, site in all_sites.items():
check_coro = site_self_check(
site, logger, sem, db, silent, proxy, tor_proxy, i2p_proxy
site, logger, sem, db, silent, proxy, tor_proxy, i2p_proxy,
skip_errors=True, auto_disable=auto_disable, diagnose=diagnose
)
future = asyncio.ensure_future(check_coro)
tasks.append(future)
tasks.append((site.name, future))
for f in tqdm.asyncio.tqdm.as_completed(tasks):
await f
if tasks:
with alive_bar(len(tasks), title='Self-checking', force_tty=True) as progress:
for site_name, f in tasks:
result = await f
result['site_name'] = site_name
all_results.append(result)
progress() # Update the progress bar
unchecked_new_count = len(
[site for site in all_sites.values() if "unchecked" in site.tags]
)
disabled_new_count = disabled_count(all_sites.values())
total_disabled = disabled_new_count - disabled_old_count
if total_disabled >= 0:
message = "Disabled"
else:
message = "Enabled"
total_disabled *= -1
# Count issues
total_issues = sum(1 for r in all_results if r.get('issues'))
if not silent:
print(
f"{message} {total_disabled} ({disabled_old_count} => {disabled_new_count}) checked sites. "
"Run with `--info` flag to get more information"
)
if auto_disable and total_disabled:
if total_disabled >= 0:
message = "Disabled"
else:
message = "Enabled"
total_disabled *= -1
return total_disabled != 0
if not silent:
print(
f"{message} {total_disabled} ({disabled_old_count} => {disabled_new_count}) checked sites. "
"Run with `--info` flag to get more information"
)
elif total_issues and not silent:
print(f"\nFound issues in {total_issues} sites (auto-disable is OFF)")
print("Use --auto-disable to automatically disable failing sites")
print("Use --diagnose to see detailed diagnosis for each site")
if unchecked_new_count != unchecked_old_count:
print(f"Unchecked sites verified: {unchecked_old_count - unchecked_new_count}")
needs_update = total_disabled != 0 or unchecked_new_count != unchecked_old_count
# For backwards compatibility, return bool if auto_disable is True
if auto_disable:
return needs_update
return {
'needs_update': needs_update,
'results': all_results,
'total_issues': total_issues,
}
def extract_ids_data(html_text, logger, site) -> Dict:
try:
return extract(html_text)
except Exception as e:
logger.warning(f"Error while parsing {site.name}: {e}", exc_info=True)
return {}
def parse_usernames(extracted_ids_data, logger) -> Dict:
new_usernames = {}
for k, v in extracted_ids_data.items():
if "username" in k and not "usernames" in k:
new_usernames[v] = "username"
elif "usernames" in k:
try:
tree = ast.literal_eval(v)
if type(tree) == list:
for n in tree:
new_usernames[n] = "username"
except Exception as e:
logger.warning(e)
if k in SUPPORTED_IDS:
new_usernames[v] = k
return new_usernames
def update_results_info(results_info, extracted_ids_data, new_usernames):
results_info["ids_usernames"] = new_usernames
links = ascii_data_display(extracted_ids_data.get("links", "[]"))
if "website" in extracted_ids_data:
links.append(extracted_ids_data["website"])
results_info["ids_links"] = links
return results_info
+54 -4
View File
@@ -1,6 +1,6 @@
from typing import Dict, List, Any
from typing import Dict, List, Any, Tuple
from .result import QueryResult
from .result import MaigretCheckResult
from .types import QueryResultWrapper
@@ -32,6 +32,9 @@ COMMON_ERRORS = {
'<title>Attention Required! | Cloudflare</title>': CheckError(
'Captcha', 'Cloudflare'
),
'<title>Just a moment</title>': CheckError(
'Bot protection', 'Cloudflare challenge page'
),
'Please stand by, while we are checking your browser': CheckError(
'Bot protection', 'Cloudflare'
),
@@ -58,13 +61,18 @@ COMMON_ERRORS = {
'Сайт заблокирован хостинг-провайдером': CheckError(
'Site-specific', 'Site is disabled (Beget)'
),
'Generated by cloudfront (CloudFront)': CheckError('Request blocked', 'Cloudflare'),
'/cdn-cgi/challenge-platform/h/b/orchestrate/chl_page': CheckError(
'Just a moment: bot redirect challenge', 'Cloudflare'
),
}
ERRORS_TYPES = {
'Captcha': 'Try to switch to another IP address or to use service cookies',
'Bot protection': 'Try to switch to another IP address',
'Censorship': 'switch to another internet service provider',
'Censorship': 'Switch to another internet service provider',
'Request timeout': 'Try to increase timeout or to switch to another internet service provider',
'Connecting failure': 'Try to decrease number of parallel connections (e.g. -n 10)',
}
# TODO: checking for reason
@@ -109,7 +117,7 @@ def extract_and_group(search_res: QueryResultWrapper) -> List[Dict[str, Any]]:
errors_counts: Dict[str, int] = {}
for r in search_res.values():
if r and isinstance(r, dict) and r.get('status'):
if not isinstance(r['status'], QueryResult):
if not isinstance(r['status'], MaigretCheckResult):
continue
err = r['status'].error
@@ -128,3 +136,45 @@ def extract_and_group(search_res: QueryResultWrapper) -> List[Dict[str, Any]]:
)
return counts
def notify_about_errors(
search_results: QueryResultWrapper, query_notify, show_statistics=False
) -> List[Tuple]:
"""
Prepare error notifications in search results, text + symbol,
to be displayed by notify object.
Example:
[
("Too many errors of type "timeout" (50.0%)", "!")
("Verbose error statistics:", "-")
]
"""
results = []
errs = extract_and_group(search_results)
was_errs_displayed = False
for e in errs:
if not is_important(e):
continue
text = f'Too many errors of type "{e["err"]}" ({round(e["perc"],2)}%)'
solution = solution_of(e['err'])
if solution:
text = '. '.join([text, solution.capitalize()])
results.append((text, '!'))
was_errs_displayed = True
if show_statistics:
results.append(('Verbose error statistics:', '-'))
for e in errs:
text = f'{e["err"]}: {round(e["perc"],2)}%'
results.append((text, '!'))
if was_errs_displayed:
results.append(
('You can see detailed site check errors with a flag `--print-errors`', '-')
)
return results
+146 -20
View File
@@ -1,8 +1,10 @@
import asyncio
import time
import tqdm
import sys
from typing import Iterable, Any, List
import time
from typing import Any, Iterable, List, Callable
import alive_progress
from alive_progress import alive_bar
from .types import QueryDraft
@@ -17,6 +19,7 @@ def create_task_func():
class AsyncExecutor:
# Deprecated: will be removed soon, don't use it
def __init__(self, *args, **kwargs):
self.logger = kwargs['logger']
@@ -32,27 +35,46 @@ class AsyncExecutor:
class AsyncioSimpleExecutor(AsyncExecutor):
# Deprecated: will be removed soon, don't use it
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.semaphore = asyncio.Semaphore(kwargs.get('in_parallel', 100))
async def _run(self, tasks: Iterable[QueryDraft]):
futures = [f(*args, **kwargs) for f, args, kwargs in tasks]
async def sem_task(f, args, kwargs):
async with self.semaphore:
return await f(*args, **kwargs)
futures = [sem_task(f, args, kwargs) for f, args, kwargs in tasks]
return await asyncio.gather(*futures)
class AsyncioProgressbarExecutor(AsyncExecutor):
# Deprecated: will be removed soon, don't use it
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
async def _run(self, tasks: Iterable[QueryDraft]):
futures = [f(*args, **kwargs) for f, args, kwargs in tasks]
total_tasks = len(futures)
results = []
for f in tqdm.asyncio.tqdm.as_completed(futures):
results.append(await f)
# Use alive_bar for progress tracking
with alive_bar(total_tasks, title='Searching', force_tty=True) as progress:
# Chunk progress updates for efficiency
async def track_task(task):
result = await task
progress() # Update progress bar once task completes
return result
# Use gather to run tasks concurrently and track progress
results = await asyncio.gather(*(track_task(f) for f in futures))
return results
class AsyncioProgressbarSemaphoreExecutor(AsyncExecutor):
# Deprecated: will be removed soon, don't use it
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.semaphore = asyncio.Semaphore(kwargs.get('in_parallel', 1))
@@ -66,8 +88,12 @@ class AsyncioProgressbarSemaphoreExecutor(AsyncExecutor):
async def semaphore_gather(tasks: Iterable[QueryDraft]):
coros = [_wrap_query(q) for q in tasks]
results = []
for f in tqdm.asyncio.tqdm.as_completed(coros):
results.append(await f)
# Use alive_bar correctly as a context manager
with alive_bar(len(coros), title='Searching', force_tty=True) as progress:
for f in asyncio.as_completed(coros):
results.append(await f)
progress() # Update the progress bar
return results
return await semaphore_gather(tasks)
@@ -77,11 +103,35 @@ class AsyncioProgressbarQueueExecutor(AsyncExecutor):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.workers_count = kwargs.get('in_parallel', 10)
self.progress_func = kwargs.get('progress_func', tqdm.tqdm)
self.queue = asyncio.Queue(self.workers_count)
self.timeout = kwargs.get('timeout')
# Pass a progress function; alive_bar by default
self.progress_func = kwargs.get('progress_func', alive_bar)
self.progress = None
# TODO: tests
async def increment_progress(self, count):
"""Update progress by calling the provided progress function."""
if self.progress:
if asyncio.iscoroutinefunction(self.progress):
await self.progress(count)
else:
self.progress(count)
await asyncio.sleep(0)
# TODO: tests
async def stop_progress(self):
"""Stop the progress tracking."""
if hasattr(self.progress, "close") and self.progress:
close_func = self.progress.close
if asyncio.iscoroutinefunction(close_func):
await close_func()
else:
close_func()
await asyncio.sleep(0)
async def worker(self):
"""Consume tasks from the queue and process them."""
while True:
try:
f, args, kwargs = self.queue.get_nowait()
@@ -96,23 +146,99 @@ class AsyncioProgressbarQueueExecutor(AsyncExecutor):
result = kwargs.get('default')
self.results.append(result)
self.progress.update(1)
if self.progress:
await self.increment_progress(1)
self.queue.task_done()
async def _run(self, queries: Iterable[QueryDraft]):
"""Main runner function to execute tasks with progress tracking."""
self.results: List[Any] = []
queries_list = list(queries)
min_workers = min(len(queries_list), self.workers_count)
workers = [create_task_func()(self.worker()) for _ in range(min_workers)]
self.progress = self.progress_func(total=len(queries_list))
for t in queries_list:
await self.queue.put(t)
await self.queue.join()
for w in workers:
w.cancel()
self.progress.close()
# Initialize the progress bar
if self.progress_func:
with self.progress_func(
len(queries_list), title="Searching", force_tty=True
) as bar:
self.progress = bar # Assign alive_bar's callable to self.progress
# Add tasks to the queue
for t in queries_list:
await self.queue.put(t)
# Wait for tasks to complete
await self.queue.join()
# Cancel any remaining workers
for w in workers:
w.cancel()
return self.results
class AsyncioQueueGeneratorExecutor:
# Deprecated: will be removed soon, don't use it
def __init__(self, *args, **kwargs):
self.workers_count = kwargs.get('in_parallel', 10)
self.queue = asyncio.Queue()
self.timeout = kwargs.get('timeout')
self.logger = kwargs['logger']
self._results = asyncio.Queue()
self._stop_signal = object()
async def worker(self):
"""Process tasks from the queue and put results into the results queue."""
while True:
task = await self.queue.get()
if task is self._stop_signal:
self.queue.task_done()
break
try:
f, args, kwargs = task
query_future = f(*args, **kwargs)
query_task = create_task_func()(query_future)
try:
result = await asyncio.wait_for(query_task, timeout=self.timeout)
except asyncio.TimeoutError:
result = kwargs.get('default')
await self._results.put(result)
except Exception as e:
self.logger.error(f"Error in worker: {e}")
finally:
self.queue.task_done()
async def run(self, queries: Iterable[Callable[..., Any]]):
"""Run workers to process queries in parallel."""
start_time = time.time()
# Add tasks to the queue
for t in queries:
await self.queue.put(t)
# Create workers
workers = [
asyncio.create_task(self.worker()) for _ in range(self.workers_count)
]
# Add stop signals
for _ in range(self.workers_count):
await self.queue.put(self._stop_signal)
try:
while any(w.done() is False for w in workers) or not self._results.empty():
try:
result = await asyncio.wait_for(self._results.get(), timeout=1)
yield result
except asyncio.TimeoutError:
pass
finally:
# Ensure all workers are awaited
await asyncio.gather(*workers)
self.execution_time = time.time() - start_time
self.logger.debug(f"Spent time: {self.execution_time}")
+128 -33
View File
@@ -1,11 +1,14 @@
"""
Maigret main module
"""
import ast
import asyncio
import logging
import os
import sys
import platform
import re
from argparse import ArgumentParser, RawDescriptionHelpFormatter
from typing import List, Tuple
import os.path as path
@@ -40,26 +43,7 @@ from .submit import Submitter
from .types import QueryResultWrapper
from .utils import get_dict_ascii_tree
from .settings import Settings
def notify_about_errors(search_results: QueryResultWrapper, query_notify):
errs = errors.extract_and_group(search_results)
was_errs_displayed = False
for e in errs:
if not errors.is_important(e):
continue
text = f'Too many errors of type "{e["err"]}" ({e["perc"]}%)'
solution = errors.solution_of(e['err'])
if solution:
text = '. '.join([text, solution.capitalize()])
query_notify.warning(text, '!')
was_errs_displayed = True
if was_errs_displayed:
query_notify.warning(
'You can see detailed site check errors with a flag `--print-errors`'
)
from .permutator import Permute
def extract_ids_from_page(url, logger, timeout=5) -> dict:
@@ -85,8 +69,17 @@ def extract_ids_from_page(url, logger, timeout=5) -> dict:
else:
print(get_dict_ascii_tree(info.items(), new_line=False), ' ')
for k, v in info.items():
if 'username' in k:
# TODO: merge with the same functionality in checking module
if 'username' in k and not 'usernames' in k:
results[v] = 'username'
elif 'usernames' in k:
try:
tree = ast.literal_eval(v)
if type(tree) == list:
for n in tree:
results[n] = 'username'
except Exception as e:
logger.warning(e)
if k in SUPPORTED_IDS:
results[v] = k
@@ -172,7 +165,7 @@ def setup_arguments_parser(settings: Settings):
type=int,
dest="connections",
default=settings.max_connections,
help="Allowed number of concurrent connections.",
help=f"Allowed number of concurrent connections (default {settings.max_connections}).",
)
parser.add_argument(
"--no-recursion",
@@ -195,6 +188,12 @@ def setup_arguments_parser(settings: Settings):
choices=SUPPORTED_IDS,
help="Specify identifier(s) type (default: username).",
)
parser.add_argument(
"--permute",
action="store_true",
default=False,
help="Permute at least 2 usernames to generate more possible usernames.",
)
parser.add_argument(
"--db",
metavar="DB_FILE",
@@ -278,6 +277,12 @@ def setup_arguments_parser(settings: Settings):
filter_group.add_argument(
"--tags", dest="tags", default='', help="Specify tags of sites (see `--stats`)."
)
filter_group.add_argument(
"--exclude-tags",
dest="exclude_tags",
default='',
help="Specify tags to exclude from search (blacklist).",
)
filter_group.add_argument(
"--site",
action="append",
@@ -317,7 +322,19 @@ def setup_arguments_parser(settings: Settings):
"--self-check",
action="store_true",
default=settings.self_check_enabled,
help="Do self check for sites and database and disable non-working ones.",
help="Do self check for sites and database. Use --auto-disable to disable failing sites.",
)
modes_group.add_argument(
"--auto-disable",
action="store_true",
default=False,
help="With --self-check: automatically disable sites that fail checks.",
)
modes_group.add_argument(
"--diagnose",
action="store_true",
default=False,
help="With --self-check: print detailed diagnosis for each failing site.",
)
modes_group.add_argument(
"--stats",
@@ -325,7 +342,15 @@ def setup_arguments_parser(settings: Settings):
default=False,
help="Show database statistics (most frequent sites engines and tags).",
)
modes_group.add_argument(
"--web",
metavar='PORT',
type=int,
nargs='?', # Optional PORT value
const=5000, # Default PORT if `--web` is provided without a value
default=None, # Explicitly set default to None
help="Launch the web interface on the specified port (default: 5000 if no PORT is provided).",
)
output_group = parser.add_argument_group(
'Output options', 'Options to change verbosity and view of the console output'
)
@@ -477,7 +502,7 @@ async def main():
arg_parser = setup_arguments_parser(settings)
args = arg_parser.parse_args()
# Re-set loggging level based on args
# Re-set logging level based on args
if args.debug:
log_level = logging.DEBUG
elif args.info:
@@ -492,6 +517,10 @@ async def main():
for u in args.username
if u and u not in ['-'] and u not in args.ignore_ids_list
}
original_usernames = ""
if args.permute and len(usernames) > 1 and args.id_type == 'username':
original_usernames = " ".join(usernames.keys())
usernames = Permute(usernames).gather(method='strict')
parsing_enabled = not args.disable_extracting
recursive_search_enabled = not args.disable_recursive_search
@@ -509,7 +538,14 @@ async def main():
if args.tags:
args.tags = list(set(str(args.tags).split(',')))
db_file = path.join(path.dirname(path.realpath(__file__)), args.db_file)
if args.exclude_tags:
args.exclude_tags = list(set(str(args.exclude_tags).split(',')))
else:
args.exclude_tags = []
db_file = args.db_file \
if (args.db_file.startswith("http://") or args.db_file.startswith("https://")) \
else path.join(path.dirname(path.realpath(__file__)), args.db_file)
if args.top_sites == 0 or args.all_sites:
args.top_sites = sys.maxsize
@@ -528,6 +564,7 @@ async def main():
get_top_sites_for_id = lambda x: db.ranked_sites_dict(
top=args.top_sites,
tags=args.tags,
excluded_tags=args.exclude_tags,
names=args.site_list,
disabled=args.use_disabled_sites,
id_type=x,
@@ -540,11 +577,20 @@ async def main():
is_submitted = await submitter.dialog(args.new_site_to_submit, args.cookie_file)
if is_submitted:
db.save_to_file(db_file)
await submitter.close()
# Database self-checking
if args.self_check:
print('Maigret sites database self-checking...')
is_need_update = await self_check(
if len(site_data) == 0:
query_notify.warning(
'No sites to self-check with the current filters! Exiting...'
)
return
query_notify.success(
f'Maigret sites database self-check started for {len(site_data)} sites...'
)
check_result = await self_check(
db,
site_data,
logger,
@@ -552,7 +598,16 @@ async def main():
max_connections=args.connections,
tor_proxy=args.tor_proxy,
i2p_proxy=args.i2p_proxy,
auto_disable=args.auto_disable,
diagnose=args.diagnose,
)
# Handle both old (bool) and new (dict) return types
if isinstance(check_result, dict):
is_need_update = check_result.get('needs_update', False)
else:
is_need_update = check_result
if is_need_update:
if input('Do you want to save changes permanently? [Yn]\n').lower() in (
'y',
@@ -562,11 +617,15 @@ async def main():
print('Database was successfully updated.')
else:
print('Updates will be applied only for current search session.')
print('Scan sessions flags stats: ' + str(db.get_scan_stats(site_data)))
if args.verbose or args.debug:
query_notify.info(
'Scan sessions flags stats: ' + str(db.get_scan_stats(site_data))
)
# Database statistics
if args.stats:
print(db.get_db_stats(db.sites_dict))
print(db.get_db_stats())
report_dir = path.join(os.getcwd(), args.folderoutput)
@@ -576,11 +635,32 @@ async def main():
# Define one report filename template
report_filepath_tpl = path.join(report_dir, 'report_{username}{postfix}')
# Web interface
if args.web is not None:
from maigret.web.app import app
app.config["MAIGRET_DB_FILE"] = db_file
port = (
args.web if args.web else 5000
) # args.web is either the specified port or 5000 by default
# Host configuration: secure by default, but allow override via environment
host = os.getenv('FLASK_HOST', '127.0.0.1')
app.run(host=host, port=port)
return
if usernames == {}:
# magic params to exit after init
query_notify.warning('No usernames to check, exiting.')
sys.exit(0)
if len(usernames) > 1 and args.permute and args.id_type == 'username':
query_notify.warning(
f"{len(usernames)} permutations from {original_usernames} to check..."
+ get_dict_ascii_tree(usernames, prepend="\t")
)
if not site_data:
query_notify.warning('No sites to check, exiting!')
sys.exit(2)
@@ -644,7 +724,11 @@ async def main():
check_domains=args.with_domains,
)
notify_about_errors(results, query_notify)
errs = errors.notify_about_errors(
results, query_notify, show_statistics=args.verbose
)
for e in errs:
query_notify.warning(*e)
if args.reports_sorting == "data":
results = sort_report_by_data_points(results)
@@ -654,25 +738,30 @@ async def main():
# TODO: tests
if recursive_search_enabled:
extracted_ids = extract_ids_from_results(results, db)
query_notify.warning(f'Extracted IDs: {extracted_ids}')
usernames.update(extracted_ids)
# reporting for a one username
if args.xmind:
username = username.replace('/', '_')
filename = report_filepath_tpl.format(username=username, postfix='.xmind')
save_xmind_report(filename, username, results)
query_notify.warning(f'XMind report for {username} saved in {filename}')
if args.csv:
username = username.replace('/', '_')
filename = report_filepath_tpl.format(username=username, postfix='.csv')
save_csv_report(filename, username, results)
query_notify.warning(f'CSV report for {username} saved in {filename}')
if args.txt:
username = username.replace('/', '_')
filename = report_filepath_tpl.format(username=username, postfix='.txt')
save_txt_report(filename, username, results)
query_notify.warning(f'TXT report for {username} saved in {filename}')
if args.json:
username = username.replace('/', '_')
filename = report_filepath_tpl.format(
username=username, postfix=f'_{args.json}.json'
)
@@ -690,6 +779,7 @@ async def main():
username = report_context['username']
if args.html:
username = username.replace('/', '_')
filename = report_filepath_tpl.format(
username=username, postfix='_plain.html'
)
@@ -697,11 +787,13 @@ async def main():
query_notify.warning(f'HTML report on all usernames saved in {filename}')
if args.pdf:
username = username.replace('/', '_')
filename = report_filepath_tpl.format(username=username, postfix='.pdf')
save_pdf_report(filename, report_context)
query_notify.warning(f'PDF report on all usernames saved in {filename}')
if args.graph:
username = username.replace('/', '_')
filename = report_filepath_tpl.format(
username=username, postfix='_graph.html'
)
@@ -719,8 +811,11 @@ async def main():
def run():
try:
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
if sys.version_info.minor >= 10:
asyncio.run(main())
else:
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
except KeyboardInterrupt:
print('Maigret is interrupted.')
sys.exit(1)
+10 -5
View File
@@ -3,11 +3,12 @@
This module defines the objects for notifying the caller about the
results of queries.
"""
import sys
from colorama import Fore, Style, init
from .result import QueryStatus
from .result import MaigretCheckStatus
from .utils import get_dict_ascii_tree
@@ -211,6 +212,10 @@ class QueryNotifyPrint(QueryNotify):
else:
print(msg)
def success(self, message, symbol="+"):
msg = f"[{symbol}] {message}"
self._colored_print(Fore.GREEN, msg)
def warning(self, message, symbol="-"):
msg = f"[{symbol}] {message}"
self._colored_print(Fore.YELLOW, msg)
@@ -240,7 +245,7 @@ class QueryNotifyPrint(QueryNotify):
ids_data_text = get_dict_ascii_tree(self.result.ids_data.items(), " ")
# Output to the terminal is desired.
if result.status == QueryStatus.CLAIMED:
if result.status == MaigretCheckStatus.CLAIMED:
color = Fore.BLUE if is_similar else Fore.GREEN
status = "?" if is_similar else "+"
notify = self.make_terminal_notify(
@@ -250,7 +255,7 @@ class QueryNotifyPrint(QueryNotify):
color,
result.site_url_user + ids_data_text,
)
elif result.status == QueryStatus.AVAILABLE:
elif result.status == MaigretCheckStatus.AVAILABLE:
if not self.print_found_only:
notify = self.make_terminal_notify(
"-",
@@ -259,7 +264,7 @@ class QueryNotifyPrint(QueryNotify):
Fore.YELLOW,
"Not found!" + ids_data_text,
)
elif result.status == QueryStatus.UNKNOWN:
elif result.status == MaigretCheckStatus.UNKNOWN:
if not self.skip_check_errors:
notify = self.make_terminal_notify(
"?",
@@ -268,7 +273,7 @@ class QueryNotifyPrint(QueryNotify):
Fore.RED,
str(self.result.error) + ids_data_text,
)
elif result.status == QueryStatus.ILLEGAL:
elif result.status == MaigretCheckStatus.ILLEGAL:
if not self.print_found_only:
text = "Illegal Username Format For This Site!"
notify = self.make_terminal_notify(
+26
View File
@@ -0,0 +1,26 @@
# License MIT. by balestek https://github.com/balestek
from itertools import permutations
class Permute:
def __init__(self, elements: dict):
self.separators = ["", "_", "-", "."]
self.elements = elements
def gather(self, method: str = "strict" or "all") -> dict:
permutations_dict = {}
for i in range(1, len(self.elements) + 1):
for subset in permutations(self.elements, i):
if i == 1:
if method == "all":
permutations_dict[subset[0]] = self.elements[subset[0]]
permutations_dict["_" + subset[0]] = self.elements[subset[0]]
permutations_dict[subset[0] + "_"] = self.elements[subset[0]]
else:
for separator in self.separators:
perm = separator.join(subset)
permutations_dict[perm] = self.elements[subset[0]]
if separator == "":
permutations_dict["_" + perm] = self.elements[subset[0]]
permutations_dict[perm + "_"] = self.elements[subset[0]]
return permutations_dict
+115 -76
View File
@@ -8,14 +8,17 @@ from datetime import datetime
from typing import Dict, Any
import xmind
from dateutil.tz import gettz
from dateutil.parser import parse as parse_datetime_str
from jinja2 import Template
from .checking import SUPPORTED_IDS
from .result import QueryStatus
from .result import MaigretCheckStatus
from .sites import MaigretDatabase
from .utils import is_country_tag, CaseConverter, enrich_link_str
ADDITIONAL_TZINFO = {"CDT": gettz("America/Chicago")}
SUPPORTED_JSON_REPORT_FORMATS = [
"simple",
"ndjson",
@@ -67,7 +70,7 @@ def save_txt_report(filename: str, username: str, results: dict):
def save_html_report(filename: str, context: dict):
template, _ = generate_report_template(is_pdf=False)
filled_template = template.render(**context)
with open(filename, "w") as f:
with open(filename, "w", encoding="utf-8") as f:
f.write(filled_template)
@@ -95,21 +98,20 @@ class MaigretGraph:
def __init__(self, graph):
self.G = graph
def add_node(self, key, value):
def add_node(self, key, value, color=None):
node_name = f'{key}: {value}'
params = self.other_params
params = dict(self.other_params)
if key in SUPPORTED_IDS:
params = self.username_params
params = dict(self.username_params)
elif value.startswith('http'):
params = self.site_params
params = dict(self.site_params)
self.G.add_node(node_name, title=node_name, **params)
if value != value.lower():
normalized_node_name = self.add_node(key, value.lower())
self.link(node_name, normalized_node_name)
params['title'] = node_name
if color:
params['color'] = color
self.G.add_node(node_name, **params)
return node_name
def link(self, node1_name, node2_name):
@@ -117,94 +119,126 @@ class MaigretGraph:
def save_graph_report(filename: str, username_results: list, db: MaigretDatabase):
# moved here to speed up the launch of Maigret
import networkx as nx
G = nx.Graph()
graph = MaigretGraph(G)
base_site_nodes = {}
site_account_nodes = {}
processed_values = {} # Track processed values to avoid duplicates
for username, id_type, results in username_results:
username_node_name = graph.add_node(id_type, username)
# Add username node, using normalized version directly if different
norm_username = username.lower()
username_node_name = graph.add_node(id_type, norm_username)
for website_name in results:
dictionary = results[website_name]
# TODO: fix no site data issue
if not dictionary:
continue
if dictionary.get("is_similar"):
for website_name, dictionary in results.items():
if not dictionary or dictionary.get("is_similar"):
continue
status = dictionary.get("status")
if not status: # FIXME: currently in case of timeout
if not status or status.status != MaigretCheckStatus.CLAIMED:
continue
if dictionary["status"].status != QueryStatus.CLAIMED:
continue
# base site node
site_base_url = website_name
if site_base_url not in base_site_nodes:
base_site_nodes[site_base_url] = graph.add_node(
'site', site_base_url, color='#28a745'
) # Green color
site_fallback_name = dictionary.get(
'url_user', f'{website_name}: {username.lower()}'
)
# site_node_name = dictionary.get('url_user', f'{website_name}: {username.lower()}')
site_node_name = graph.add_node('site', site_fallback_name)
graph.link(username_node_name, site_node_name)
site_base_node_name = base_site_nodes[site_base_url]
# account node
account_url = dictionary.get('url_user', f'{site_base_url}/{norm_username}')
account_node_id = f"{site_base_url}: {account_url}"
if account_node_id not in site_account_nodes:
site_account_nodes[account_node_id] = graph.add_node(
'account', account_url
)
account_node_name = site_account_nodes[account_node_id]
# link username → account → site
graph.link(username_node_name, account_node_name)
graph.link(account_node_name, site_base_node_name)
def process_ids(parent_node, ids):
for k, v in ids.items():
if k.endswith('_count') or k.startswith('is_') or k.endswith('_at'):
continue
if k in 'image':
if (
k.endswith('_count')
or k.startswith('is_')
or k.endswith('_at')
or k in 'image'
):
continue
v_data = v
if v.startswith('['):
try:
v_data = ast.literal_eval(v)
except Exception as e:
logging.error(e)
# Normalize value if string
norm_v = v.lower() if isinstance(v, str) else v
value_key = f"{k}:{norm_v}"
# value is a list
if isinstance(v_data, list):
list_node_name = graph.add_node(k, site_fallback_name)
for vv in v_data:
data_node_name = graph.add_node(vv, site_fallback_name)
graph.link(list_node_name, data_node_name)
if value_key in processed_values:
ids_data_name = processed_values[value_key]
else:
v_data = v
if isinstance(v, str) and v.startswith('['):
try:
v_data = ast.literal_eval(v)
except Exception as e:
logging.error(e)
continue
if isinstance(v_data, list):
list_node_name = graph.add_node(k, site_base_url)
processed_values[value_key] = list_node_name
for vv in v_data:
data_node_name = graph.add_node(vv, site_base_url)
graph.link(list_node_name, data_node_name)
add_ids = {
a: b for b, a in db.extract_ids_from_url(vv).items()
}
if add_ids:
process_ids(data_node_name, add_ids)
ids_data_name = list_node_name
else:
ids_data_name = graph.add_node(k, norm_v)
processed_values[value_key] = ids_data_name
if 'username' in k or k in SUPPORTED_IDS:
new_username_key = f"username:{norm_v}"
if new_username_key not in processed_values:
new_username_node_name = graph.add_node(
'username', norm_v
)
processed_values[new_username_key] = (
new_username_node_name
)
graph.link(ids_data_name, new_username_node_name)
add_ids = {
a: b for b, a in db.extract_ids_from_url(vv).items()
k: v for v, k in db.extract_ids_from_url(v).items()
}
if add_ids:
process_ids(data_node_name, add_ids)
else:
# value is just a string
# ids_data_name = f'{k}: {v}'
# if ids_data_name == parent_node:
# continue
process_ids(ids_data_name, add_ids)
ids_data_name = graph.add_node(k, v)
# G.add_node(ids_data_name, size=10, title=ids_data_name, group=3)
graph.link(parent_node, ids_data_name)
# check for username
if 'username' in k or k in SUPPORTED_IDS:
new_username_node_name = graph.add_node('username', v)
graph.link(ids_data_name, new_username_node_name)
add_ids = {k: v for v, k in db.extract_ids_from_url(v).items()}
if add_ids:
process_ids(ids_data_name, add_ids)
graph.link(parent_node, ids_data_name)
if status.ids_data:
process_ids(site_node_name, status.ids_data)
process_ids(account_node_name, status.ids_data)
nodes_to_remove = []
for node in G.nodes:
if len(str(node)) > 100:
nodes_to_remove.append(node)
# Remove overly long nodes
nodes_to_remove = [node for node in G.nodes if len(str(node)) > 100]
G.remove_nodes_from(nodes_to_remove)
[G.remove_node(node) for node in nodes_to_remove]
# Remove site nodes with only one connection
single_degree_sites = [
n for n, deg in G.degree() if n.startswith("site:") and deg <= 1
]
G.remove_nodes_from(single_degree_sites)
# moved here to speed up the launch of Maigret
# Generate interactive visualization
from pyvis.network import Network
nt = Network(notebook=True, height="750px", width="100%")
@@ -292,8 +326,12 @@ def generate_report_context(username_results: list):
first_seen = created_at
else:
try:
known_time = parse_datetime_str(first_seen)
new_time = parse_datetime_str(created_at)
known_time = parse_datetime_str(
first_seen, tzinfos=ADDITIONAL_TZINFO
)
new_time = parse_datetime_str(
created_at, tzinfos=ADDITIONAL_TZINFO
)
if new_time < known_time:
first_seen = created_at
except Exception as e:
@@ -302,6 +340,7 @@ def generate_report_context(username_results: list):
first_seen,
created_at,
str(e),
exc_info=True,
)
for k, v in status.ids_data.items():
@@ -333,7 +372,7 @@ def generate_report_context(username_results: list):
new_ids.append((u, utype))
usernames[u] = {"type": utype}
if status.status == QueryStatus.CLAIMED:
if status.status == MaigretCheckStatus.CLAIMED:
found_accounts += 1
dictionary["found"] = True
else:
@@ -413,7 +452,7 @@ def generate_txt_report(username: str, results: dict, file):
continue
if (
dictionary.get("status")
and dictionary["status"].status == QueryStatus.CLAIMED
and dictionary["status"].status == MaigretCheckStatus.CLAIMED
):
exists_counter += 1
file.write(dictionary["url_user"] + "\n")
@@ -430,7 +469,7 @@ def generate_json_report(username: str, results: dict, file, report_type):
if not site_result or not site_result.get("status"):
continue
if site_result["status"].status != QueryStatus.CLAIMED:
if site_result["status"].status != MaigretCheckStatus.CLAIMED:
continue
data = dict(site_result)
@@ -491,7 +530,7 @@ def design_xmind_sheet(sheet, username, results):
continue
result_status = dictionary.get("status")
# TODO: fix the reason
if not result_status or result_status.status != QueryStatus.CLAIMED:
if not result_status or result_status.status != MaigretCheckStatus.CLAIMED:
continue
stripped_tags = list(map(lambda x: x.strip(), result_status.tags))
+32581 -25368
View File
File diff suppressed because it is too large Load Diff
+13 -3
View File
@@ -1,21 +1,30 @@
{
"presence_strings": [
"user not found",
"404",
"Page not found",
"error 404",
"username",
"not found",
"пользователь",
"profile",
"lastname",
"firstname",
"DisplayName",
"biography",
"title",
"birthday",
"репутация",
"информация",
"e-mail"
"e-mail",
"body",
"html",
"style"
],
"supposed_usernames": [
"alex", "god", "admin", "red", "blue", "john"
],
"retries_count": 1,
"retries_count": 0,
"sites_db_path": "resources/data.json",
"timeout": 30,
"max_connections": 100,
@@ -44,5 +53,6 @@
"xmind_report": false,
"graph_report": false,
"pdf_report": false,
"html_report": false
"html_report": false,
"web_interface_port": 5000
}
-1
View File
@@ -68,7 +68,6 @@
<div class="row-mb">
<div class="col-md">
<div class="card flex-md-row mb-4 box-shadow h-md-250">
<span style="position: absolute; right: 10px;"><a href="https://github.com/soxoj/maigret/issues/new?assignees=soxoj&amp;labels=bug&amp;template=report-false-result.md&amp;title=Invalid%20result%20{{ v.url_user }}">Invalid?</a></span>
<img class="card-img-right flex-auto d-md-block" alt="Photo" style="width: 200px; height: 200px; object-fit: scale-down;" src="{{ v.status and v.status.ids_data and v.status.ids_data.image or 'https://i.imgur.com/040fmbw.png' }}" data-holder-rendered="true">
<div class="card-body d-flex flex-column align-items-start" style="padding-top: 0;">
<h3 class="mb-0" style="padding-top: 1rem;">
-1
View File
@@ -64,7 +64,6 @@
<div class="sitebox" style="margin-top: 20px;" >
<div>
<div>
<span class="invalid-button"><a href="https://github.com/soxoj/maigret/issues/new?assignees=soxoj&amp;labels=bug&amp;template=report-false-result.md&amp;title=Invalid%20result%20{{ v.url_user }}">Invalid?</a></span>
<table>
<tr>
<td valign="top">
+10 -11
View File
@@ -2,10 +2,11 @@
This module defines various objects for recording the results of queries.
"""
from enum import Enum
class QueryStatus(Enum):
class MaigretCheckStatus(Enum):
"""Query Status Enumeration.
Describes status of query about a given username.
@@ -28,10 +29,9 @@ class QueryStatus(Enum):
return self.value
class QueryResult:
"""Query Result Object.
Describes result of query about a given username.
class MaigretCheckResult:
"""
Describes result of checking a given username on a given site
"""
def __init__(
@@ -46,11 +46,7 @@ class QueryResult:
error=None,
tags=[],
):
"""Create Query Result Object.
Contains information about a specific method of detecting usernames on
a given type of web sites.
"""
Keyword Arguments:
self -- This object.
username -- String indicating username that query result
@@ -97,7 +93,10 @@ class QueryResult:
}
def is_found(self):
return self.status == QueryStatus.CLAIMED
return self.status == MaigretCheckStatus.CLAIMED
def __repr__(self):
return f"<{self.__str__()}>"
def __str__(self):
"""Convert Object To String.
+2 -1
View File
@@ -5,7 +5,7 @@ from typing import List
SETTINGS_FILES_PATHS = [
path.join(path.dirname(path.realpath(__file__)), "resources/settings.json"),
'~/.maigret/settings.json',
path.expanduser('~/.maigret/settings.json'),
path.join(os.getcwd(), 'settings.json'),
]
@@ -42,6 +42,7 @@ class Settings:
pdf_report: bool
html_report: bool
graph_report: bool
web_interface_port: int
# submit mode settings
presence_strings: list
+234 -20
View File
@@ -21,6 +21,7 @@ class MaigretEngine:
class MaigretSite:
# Fields that should not be serialized when converting site to JSON
NOT_SERIALIZABLE_FIELDS = [
"name",
"engineData",
@@ -31,37 +32,69 @@ class MaigretSite:
"urlRegexp",
]
# Username known to exist on the site
username_claimed = ""
# Username known to not exist on the site
username_unclaimed = ""
# Additional URL path component, e.g. /forum in https://example.com/forum/users/{username}
url_subpath = ""
# Main site URL (the main page)
url_main = ""
# Full URL pattern for username page, e.g. https://example.com/forum/users/{username}
url = ""
# Whether site is disabled. Not used by Maigret without --use-disabled argument
disabled = False
# Whether a positive result indicates accounts with similar usernames rather than exact matches
similar_search = False
# Whether to ignore 403 status codes
ignore403 = False
# Site category tags
tags: List[str] = []
# Type of identifier (username, gaia_id etc); see SUPPORTED_IDS in checking.py
type = "username"
# Custom HTTP headers
headers: Dict[str, str] = {}
# Error message substrings
errors: Dict[str, str] = {}
# Site activation requirements
activation: Dict[str, Any] = {}
# Regular expression for username validation
regex_check = None
# URL to probe site status
url_probe = None
# Type of check to perform
check_type = ""
# HTTP request method (GET, POST, HEAD, etc.)
request_method = ""
# HTTP request payload (for POST, PUT, etc.)
request_payload: Dict[str, Any] = {}
# Whether to only send HEAD requests (GET by default)
request_head_only = ""
# GET parameters to include in requests
get_params: Dict[str, Any] = {}
# Substrings in HTML response that indicate profile exists
presense_strs: List[str] = []
# Substrings in HTML response that indicate profile doesn't exist
absence_strs: List[str] = []
# Site statistics
stats: Dict[str, Any] = {}
# Site engine name
engine = None
# Engine-specific configuration
engine_data: Dict[str, Any] = {}
# Engine instance
engine_obj: Optional["MaigretEngine"] = None
# Future for async requests
request_future = None
# Alexa traffic rank
alexa_rank = None
# Source (in case a site is a mirror of another site)
source = None
# URL protocol (http/https)
protocol = ''
def __init__(self, name, information):
@@ -80,6 +113,56 @@ class MaigretSite:
def __str__(self):
return f"{self.name} ({self.url_main})"
def __is_equal_by_url_or_name(self, url_or_name_str: str):
lower_url_or_name_str = url_or_name_str.lower()
lower_url = self.url.lower()
lower_name = self.name.lower()
lower_url_main = self.url_main.lower()
return (
lower_name == lower_url_or_name_str
or (lower_url_main and lower_url_main == lower_url_or_name_str)
or (lower_url_main and lower_url_main in lower_url_or_name_str)
or (lower_url_main and lower_url_or_name_str in lower_url_main)
or (lower_url and lower_url_or_name_str in lower_url)
)
def __eq__(self, other):
if isinstance(other, MaigretSite):
# Compare only relevant attributes, not internal state like request_future
attrs_to_compare = [
'name',
'url_main',
'url_subpath',
'type',
'headers',
'errors',
'activation',
'regex_check',
'url_probe',
'check_type',
'request_method',
'request_payload',
'request_head_only',
'get_params',
'presense_strs',
'absence_strs',
'stats',
'engine',
'engine_data',
'alexa_rank',
'source',
'protocol',
]
return all(
getattr(self, attr) == getattr(other, attr) for attr in attrs_to_compare
)
elif isinstance(other, str):
# Compare only by name (exactly) or url_main (partial similarity)
return self.__is_equal_by_url_or_name(other)
return False
def update_detectors(self):
if "url" in self.__dict__:
url = self.url
@@ -101,6 +184,10 @@ class MaigretSite:
return None
def extract_id_from_url(self, url: str) -> Optional[Tuple[str, str]]:
"""
Extracts username from url.
It's outdated, detects only a format of https://example.com/{username}
"""
if not self.url_regexp:
return None
@@ -223,20 +310,52 @@ class MaigretDatabase:
def sites_dict(self):
return {site.name: site for site in self._sites}
def has_site(self, site: MaigretSite):
for s in self._sites:
if site == s:
return True
return False
def __contains__(self, site):
return self.has_site(site)
def ranked_sites_dict(
self,
reverse=False,
top=sys.maxsize,
tags=[],
excluded_tags=[],
names=[],
disabled=True,
id_type="username",
):
"""
Ranking and filtering of the sites list
When ``top`` is limited (not "all sites"), **mirrors** may be appended after
the Alexa-ranked slice. A mirror is any filtered site with a non-empty
``source`` field equal to the name of a site that appears in the first
``top`` positions of a **parent ranking** that includes disabled sites.
Thus mirrors such as third-party viewers (e.g. for Twitter or Instagram)
are still scanned when their parent platform ranks highly, even if the
official site is disabled and omitted from the main list.
Args:
reverse (bool, optional): Reverse the sorting order. Defaults to False.
top (int, optional): Maximum number of sites to return. Defaults to sys.maxsize.
tags (list, optional): List of tags to filter sites by (whitelist). Defaults to empty list.
excluded_tags (list, optional): List of tags to exclude sites by (blacklist). Defaults to empty list.
names (list, optional): List of site names (or urls, see MaigretSite.__eq__) to filter by. Defaults to empty list.
disabled (bool, optional): Whether to include disabled sites. Defaults to True.
id_type (str, optional): Type of identifier to filter by. Defaults to "username".
Returns:
dict: Dictionary of filtered and ranked sites (base top slice plus mirrors),
with site names as keys and MaigretSite objects as values
"""
normalized_names = list(map(str.lower, names))
normalized_tags = list(map(str.lower, tags))
normalized_excluded_tags = list(map(str.lower, excluded_tags))
is_name_ok = lambda x: x.name.lower() in normalized_names
is_source_ok = lambda x: x.source and x.source.lower() in normalized_names
@@ -250,6 +369,22 @@ class MaigretDatabase:
)
is_id_type_ok = lambda x: x.type == id_type
is_excluded_by_tag = lambda x: set(
map(str.lower, x.tags)
).intersection(set(normalized_excluded_tags))
is_excluded_by_engine = lambda x: (
isinstance(x.engine, str)
and x.engine.lower() in normalized_excluded_tags
)
is_excluded_by_protocol = lambda x: (
x.protocol and x.protocol in normalized_excluded_tags
)
is_not_excluded = lambda x: not excluded_tags or not (
is_excluded_by_tag(x)
or is_excluded_by_engine(x)
or is_excluded_by_protocol(x)
)
filter_tags_engines_fun = (
lambda x: not tags
or is_engine_ok(x)
@@ -260,6 +395,7 @@ class MaigretDatabase:
filter_fun = (
lambda x: filter_tags_engines_fun(x)
and is_not_excluded(x)
and filter_names_fun(x)
and is_disabled_needed(x)
and is_id_type_ok(x)
@@ -270,6 +406,33 @@ class MaigretDatabase:
sorted_list = sorted(
filtered_list, key=lambda x: x.alexa_rank, reverse=reverse
)[:top]
# Mirrors: sites whose `source` matches a parent platform that ranks in the
# top `top` by Alexa when disabled entries are included in the ranking pool
# (so e.g. Instagram can be a parent for Picuki even if Instagram is disabled).
if top < sys.maxsize and sorted_list:
filter_fun_ranking_parents = (
lambda x: filter_tags_engines_fun(x)
and is_not_excluded(x)
and filter_names_fun(x)
and is_id_type_ok(x)
)
ranking_pool = [s for s in self.sites if filter_fun_ranking_parents(s)]
sorted_parents = sorted(
ranking_pool, key=lambda x: x.alexa_rank, reverse=reverse
)[:top]
parent_names_lower = {s.name.lower() for s in sorted_parents}
base_names = {s.name for s in sorted_list}
def is_mirror(s) -> bool:
if not s.source or s.name in base_names:
return False
return s.source.lower() in parent_names_lower
mirrors = [s for s in filtered_list if is_mirror(s)]
mirrors.sort(key=lambda x: (x.alexa_rank, x.name))
sorted_list = list(sorted_list) + mirrors
return {site.name: site for site in sorted_list}
@property
@@ -419,41 +582,92 @@ class MaigretDatabase:
results[_id] = _type
return results
def get_db_stats(self, sites_dict):
if not sites_dict:
sites_dict = self.sites_dict()
def get_db_stats(self, is_markdown=False):
# Initialize counters
sites_dict = self.sites_dict
urls = {}
tags = {}
output = ""
disabled_count = 0
total_count = len(sites_dict)
message_checks_one_factor = 0
status_checks = 0
for _, site in sites_dict.items():
# Collect statistics
for site in sites_dict.values():
# Count disabled sites
if site.disabled:
disabled_count += 1
# Count URL types
url_type = site.get_url_template()
urls[url_type] = urls.get(url_type, 0) + 1
# Count check types for enabled sites
if not site.disabled:
if site.check_type == 'message':
if not (site.absence_strs and site.presense_strs):
message_checks_one_factor += 1
elif site.check_type == 'status_code':
status_checks += 1
# Count tags
if not site.tags:
tags["NO_TAGS"] = tags.get("NO_TAGS", 0) + 1
for tag in filter(lambda x: not is_country_tag(x), site.tags):
tags[tag] = tags.get(tag, 0) + 1
output += f"Enabled/total sites: {total_count - disabled_count}/{total_count}\n"
output += "Top profile URLs:\n"
for url, count in sorted(urls.items(), key=lambda x: x[1], reverse=True)[:20]:
# Calculate percentages
total_count = len(sites_dict)
enabled_count = total_count - disabled_count
enabled_perc = round(100 * enabled_count / total_count, 2)
checks_perc = round(100 * message_checks_one_factor / enabled_count, 2)
status_checks_perc = round(100 * status_checks / enabled_count, 2)
# Sites with probing and activation (kinda special cases, let's watch them)
site_with_probing = []
site_with_activation = []
for site in sites_dict.values():
def get_site_label(site):
return f"{site.name}{' (disabled)' if site.disabled else ''}"
if site.url_probe:
site_with_probing.append(get_site_label(site))
if site.activation:
site_with_activation.append(get_site_label(site))
# Format output
separator = "\n\n"
output = [
f"Enabled/total sites: {enabled_count}/{total_count} = {enabled_perc}%",
f"Incomplete message checks: {message_checks_one_factor}/{enabled_count} = {checks_perc}% (false positive risks)",
f"Status code checks: {status_checks}/{enabled_count} = {status_checks_perc}% (false positive risks)",
f"False positive risk (total): {checks_perc + status_checks_perc:.2f}%",
f"Sites with probing: {', '.join(sorted(site_with_probing))}",
f"Sites with activation: {', '.join(sorted(site_with_activation))}",
self._format_top_items("profile URLs", urls, 20, is_markdown),
self._format_top_items("tags", tags, 20, is_markdown, self._tags),
]
return separator.join(output)
def _format_top_items(
self, title, items_dict, limit, is_markdown, valid_items=None
):
"""Helper method to format top items lists"""
output = f"Top {limit} {title}:\n"
for item, count in sorted(items_dict.items(), key=lambda x: x[1], reverse=True)[
:limit
]:
if count == 1:
break
output += f"{count}\t{url}\n"
output += "Top tags:\n"
for tag, count in sorted(tags.items(), key=lambda x: x[1], reverse=True)[:200]:
mark = ""
if tag not in self._tags:
mark = " (non-standard)"
output += f"{count}\t{tag}{mark}\n"
mark = (
" (non-standard)"
if valid_items is not None and item not in valid_items
else ""
)
output += (
f"- ({count})\t`{item}`{mark}\n"
if is_markdown
else f"{count}\t{item}{mark}\n"
)
return output
+454 -187
View File
@@ -1,17 +1,44 @@
import asyncio
import json
import re
from typing import List
import xml.etree.ElementTree as ET
from aiohttp import TCPConnector, ClientSession
import requests
import os
import logging
from typing import Any, Dict, List, Optional, Tuple
from aiohttp import ClientSession, TCPConnector
from aiohttp_socks import ProxyConnector
import cloudscraper
from colorama import Fore, Style
from .activation import import_aiohttp_cookies
from .checking import maigret
from .result import QueryStatus
from .result import MaigretCheckResult
from .settings import Settings
from .sites import MaigretDatabase, MaigretSite, MaigretEngine
from .utils import get_random_user_agent, get_match_ratio
from .sites import MaigretDatabase, MaigretEngine, MaigretSite
from .utils import get_random_user_agent
from .checking import site_self_check
from .utils import get_match_ratio, generate_random_username
class CloudflareSession:
def __init__(self):
self.scraper = cloudscraper.create_scraper()
async def get(self, *args, **kwargs):
await asyncio.sleep(0)
res = self.scraper.get(*args, **kwargs)
self.last_text = res.text
self.status = res.status_code
return self
def status_code(self):
return self.status
async def text(self):
await asyncio.sleep(0)
return self.last_text
async def close(self):
pass
class Submitter:
@@ -19,7 +46,7 @@ class Submitter:
"User-Agent": get_random_user_agent(),
}
SEPARATORS = "\"'"
SEPARATORS = "\"'\n"
RATIO = 0.6
TOP_FEATURES = 5
@@ -32,10 +59,14 @@ class Submitter:
self.logger = logger
from aiohttp_socks import ProxyConnector
proxy = self.args.proxy
cookie_jar = None
if args.cookie_file:
cookie_jar = import_aiohttp_cookies(args.cookie_file)
if not os.path.exists(args.cookie_file):
logger.error(f"Cookie file {args.cookie_file} does not exist!")
else:
cookie_jar = import_aiohttp_cookies(args.cookie_file)
connector = ProxyConnector.from_url(proxy) if proxy else TCPConnector(ssl=False)
connector.verify_ssl = False
@@ -43,11 +74,17 @@ class Submitter:
connector=connector, trust_env=True, cookie_jar=cookie_jar
)
async def close(self):
await self.session.close()
@staticmethod
def get_alexa_rank(site_url_main):
import requests
import xml.etree.ElementTree as ElementTree
url = f"http://data.alexa.com/data?cli=10&url={site_url_main}"
xml_data = requests.get(url).text
root = ET.fromstring(xml_data)
root = ElementTree.fromstring(xml_data)
alexa_rank = 0
try:
@@ -62,71 +99,18 @@ class Submitter:
return "/".join(url.split("/", 3)[:3])
async def site_self_check(self, site, semaphore, silent=False):
changes = {
"disabled": False,
}
check_data = [
(site.username_claimed, QueryStatus.CLAIMED),
(site.username_unclaimed, QueryStatus.AVAILABLE),
]
self.logger.info(f"Checking {site.name}...")
for username, status in check_data:
results_dict = await maigret(
username=username,
site_dict={site.name: site},
proxy=self.args.proxy,
logger=self.logger,
cookies=self.args.cookie_file,
timeout=30,
id_type=site.type,
forced=True,
no_progressbar=True,
)
# don't disable entries with other ids types
# TODO: make normal checking
if site.name not in results_dict:
self.logger.info(results_dict)
changes["disabled"] = True
continue
result = results_dict[site.name]["status"]
site_status = result.status
if site_status != status:
if site_status == QueryStatus.UNKNOWN:
msgs = site.absence_strs
etype = site.check_type
self.logger.warning(
"Error while searching '%s' in %s: %s, %s, check type %s",
username,
site.name,
result.context,
msgs,
etype,
)
# don't disable in case of available username
if status == QueryStatus.CLAIMED:
changes["disabled"] = True
elif status == QueryStatus.CLAIMED:
self.logger.warning(
f"Not found `{username}` in {site.name}, must be claimed"
)
self.logger.info(results_dict[site.name])
changes["disabled"] = True
else:
self.logger.warning(
f"Found `{username}` in {site.name}, must be available"
)
self.logger.info(results_dict[site.name])
changes["disabled"] = True
self.logger.info(f"Site {site.name} checking is finished")
# Call the general function from the checking.py
changes = await site_self_check(
site=site,
logger=self.logger,
semaphore=semaphore,
db=self.db,
silent=silent,
proxy=self.args.proxy,
cookies=self.args.cookie_file,
# Don't skip errors in submit mode - we need check both false positives/true negatives
skip_errors=False,
)
return changes
def generate_additional_fields_dialog(self, engine: MaigretEngine, dialog):
@@ -141,16 +125,14 @@ class Submitter:
fields['urlSubpath'] = f'/{subpath}'
return fields
async def detect_known_engine(self, url_exists, url_mainpage) -> List[MaigretSite]:
resp_text = ''
try:
r = await self.session.get(url_mainpage)
resp_text = await r.text()
self.logger.debug(resp_text)
except Exception as e:
self.logger.warning(e)
print("Some error while checking main page")
return []
async def detect_known_engine(
self, url_exists, url_mainpage, session, follow_redirects, headers
) -> [List[MaigretSite], str]:
session = session or self.session
resp_text, _ = await self.get_html_response_to_compare(
url_exists, session, follow_redirects, headers
)
for engine in self.db.engines:
strs_to_check = engine.__dict__.get("presenseStrs")
@@ -177,7 +159,7 @@ class Submitter:
for u in usernames_to_check:
site_data = {
"urlMain": url_mainpage,
"name": url_mainpage.split("//")[1],
"name": url_mainpage.split("//")[1].split("/")[0],
"engine": engine_name,
"usernameClaimed": u,
"usernameUnclaimed": "noonewouldeverusethis7",
@@ -193,132 +175,255 @@ class Submitter:
)
sites.append(maigret_site)
return sites
return sites, resp_text
return []
return [], resp_text
def extract_username_dialog(self, url):
@staticmethod
def extract_username_dialog(url):
url_parts = url.rstrip("/").split("/")
supposed_username = url_parts[-1].strip('@')
entered_username = input(
f'Is "{supposed_username}" a valid username? If not, write it manually: '
f"{Fore.GREEN}[?] Is \"{supposed_username}\" a valid username? If not, write it manually: {Style.RESET_ALL}"
)
return entered_username if entered_username else supposed_username
async def check_features_manually(
self, url_exists, url_mainpage, cookie_file, redirects=False
# TODO: replace with checking.py/SimpleAiohttpChecker call
@staticmethod
async def get_html_response_to_compare(
url: str, session: ClientSession = None, redirects=False, headers: Dict = None
):
custom_headers = {}
while self.args.verbose:
header_key = input(
'Specify custom header if you need or just press Enter to skip. Header name: '
async with session.get(
url, allow_redirects=redirects, headers=headers
) as response:
# Try different encodings or fallback to 'ignore' errors
try:
html_response = await response.text(encoding='utf-8')
except UnicodeDecodeError:
try:
html_response = await response.text(encoding='latin1')
except UnicodeDecodeError:
html_response = await response.text(errors='ignore')
return html_response, response.status
async def check_features_manually(
self,
username: str,
url_exists: str,
cookie_filename="", # TODO: use cookies
session: ClientSession = None,
follow_redirects=False,
headers: dict = None,
) -> Tuple[List[str], List[str], str, str]:
random_username = generate_random_username()
url_of_non_existing_account = url_exists.lower().replace(
username.lower(), random_username
)
try:
session = session or self.session
first_html_response, first_status = await self.get_html_response_to_compare(
url_exists, session, follow_redirects, headers
)
if not header_key:
break
header_value = input('Header value: ')
custom_headers[header_key.strip()] = header_value.strip()
second_html_response, second_status = (
await self.get_html_response_to_compare(
url_of_non_existing_account, session, follow_redirects, headers
)
)
await session.close()
except Exception as e:
self.logger.error(
f"Error while getting HTTP response for username {username}: {e}",
exc_info=True,
)
return None, None, str(e), random_username
supposed_username = self.extract_username_dialog(url_exists)
non_exist_username = "noonewouldeverusethis7"
url_user = url_exists.replace(supposed_username, "{username}")
url_not_exists = url_exists.replace(supposed_username, non_exist_username)
headers = dict(self.HEADERS)
headers.update(custom_headers)
exists_resp = await self.session.get(
url_exists,
headers=headers,
allow_redirects=redirects,
self.logger.info(f"URL with existing account: {url_exists}")
self.logger.info(
f"HTTP response status for URL with existing account: {first_status}"
)
exists_resp_text = await exists_resp.text()
self.logger.debug(url_exists)
self.logger.debug(exists_resp.status)
self.logger.debug(exists_resp_text)
non_exists_resp = await self.session.get(
url_not_exists,
headers=headers,
allow_redirects=redirects,
self.logger.info(
f"HTTP response length URL with existing account: {len(first_html_response)}"
)
non_exists_resp_text = await non_exists_resp.text()
self.logger.debug(url_not_exists)
self.logger.debug(non_exists_resp.status)
self.logger.debug(non_exists_resp_text)
self.logger.debug(first_html_response)
a = exists_resp_text
b = non_exists_resp_text
self.logger.info(f"URL with existing account: {url_of_non_existing_account}")
self.logger.info(
f"HTTP response status for URL with non-existing account: {second_status}"
)
self.logger.info(
f"HTTP response length URL with non-existing account: {len(second_html_response)}"
)
self.logger.debug(second_html_response)
tokens_a = set(re.split(f'[{self.SEPARATORS}]', a))
tokens_b = set(re.split(f'[{self.SEPARATORS}]', b))
# TODO: filter by errors, move to dialog function
if (
"/cdn-cgi/challenge-platform" in first_html_response
or "\t\t\t\tnow: " in first_html_response
or "Sorry, you have been blocked" in first_html_response
):
self.logger.info("Cloudflare detected, skipping")
return None, None, "Cloudflare detected, skipping", random_username
tokens_a = set(re.split(f'[{self.SEPARATORS}]', first_html_response))
tokens_b = set(re.split(f'[{self.SEPARATORS}]', second_html_response))
a_minus_b = tokens_a.difference(tokens_b)
b_minus_a = tokens_b.difference(tokens_a)
if len(a_minus_b) == len(b_minus_a) == 0:
print("The pages for existing and non-existing account are the same!")
a_minus_b = list(map(lambda x: x.strip('\\'), a_minus_b))
b_minus_a = list(map(lambda x: x.strip('\\'), b_minus_a))
top_features_count = int(
input(
f"Specify count of features to extract [default {self.TOP_FEATURES}]: "
# Filter out strings containing usernames
a_minus_b = [s for s in a_minus_b if username.lower() not in s.lower()]
b_minus_a = [s for s in b_minus_a if random_username.lower() not in s.lower()]
def filter_tokens(token: str, html_response: str) -> bool:
is_in_html = token in html_response
is_long_str = len(token) >= 50
is_number = re.match(r'^\d\.?\d+$', token) or re.match(r':^\d+$', token)
is_whitelisted_number = token in ['200', '404', '403']
return not (
is_in_html or is_long_str or (is_number and not is_whitelisted_number)
)
or self.TOP_FEATURES
a_minus_b = list(
filter(lambda t: filter_tokens(t, second_html_response), a_minus_b)
)
b_minus_a = list(
filter(lambda t: filter_tokens(t, first_html_response), b_minus_a)
)
if len(a_minus_b) == len(b_minus_a) == 0:
return (
None,
None,
"HTTP responses for pages with existing and non-existing accounts are the same",
random_username,
)
match_fun = get_match_ratio(self.settings.presence_strings)
presence_list = sorted(a_minus_b, key=match_fun, reverse=True)[
:top_features_count
: self.TOP_FEATURES
]
print("Detected text features of existing account: " + ", ".join(presence_list))
features = input("If features was not detected correctly, write it manually: ")
if features:
presence_list = list(map(str.strip, features.split(",")))
absence_list = sorted(b_minus_a, key=match_fun, reverse=True)[
:top_features_count
: self.TOP_FEATURES
]
self.logger.info(f"Detected presence features: {presence_list}")
self.logger.info(f"Detected absence features: {absence_list}")
return presence_list, absence_list, "Found", random_username
async def add_site(self, site):
sem = asyncio.Semaphore(1)
print(
"Detected text features of non-existing account: " + ", ".join(absence_list)
f"{Fore.BLUE}{Style.BRIGHT}[*] Adding site {site.name}, let's check it...{Style.RESET_ALL}"
)
features = input("If features was not detected correctly, write it manually: ")
if features:
absence_list = list(map(str.strip, features.split(",")))
result = await self.site_self_check(site, sem)
if result["disabled"]:
print(f"Checks failed for {site.name}, please, verify them manually.")
return {
"valid": False,
"reason": "checks_failed",
}
site_data = {
"absenceStrs": absence_list,
"presenseStrs": presence_list,
"url": url_user,
"urlMain": url_mainpage,
"usernameClaimed": supposed_username,
"usernameUnclaimed": non_exist_username,
"checkType": "message",
while True:
print("\nAvailable fields to edit:")
editable_fields = {
'1': 'name',
'2': 'tags',
'3': 'url',
'4': 'url_main',
'5': 'username_claimed',
'6': 'username_unclaimed',
'7': 'presense_strs',
'8': 'absence_strs',
}
for num, field in editable_fields.items():
current_value = getattr(site, field)
print(f"{num}. {field} (current: {current_value})")
print("0. finish editing")
print("10. reject and block domain")
print("11. invalid params, remove")
choice = input("\nSelect field number to edit (0-8): ").strip()
if choice == '0':
break
if choice == '10':
return {
"valid": False,
"reason": "manual block",
}
if choice == '11':
return {
"valid": False,
"reason": "remove",
}
if choice in editable_fields:
field = editable_fields[choice]
current_value = getattr(site, field)
new_value = input(
f"Enter new value for {field} (current: {current_value}): "
).strip()
if field in ['tags', 'presense_strs', 'absence_strs']:
new_value = list(map(str.strip, new_value.split(',')))
if new_value:
setattr(site, field, new_value)
print(f"Updated {field} to: {new_value}")
self.logger.info(site.json)
self.db.update_site(site)
return {
"valid": True,
}
if headers != self.HEADERS:
site_data['headers'] = headers
site = MaigretSite(url_mainpage.split("/")[-1], site_data)
return site
async def dialog(self, url_exists, cookie_file):
"""
An implementation of the submit mode:
- User provides a URL of a existing social media account
- Maigret tries to detect the site engine and understand how to check
for account presence with HTTP responses analysis
- If detection succeeds, Maigret generates a new site entry/replace old one in the database
"""
old_site = None
additional_options_enabled = self.logger.level in (
logging.DEBUG,
logging.WARNING,
)
domain_raw = self.URL_RE.sub("", url_exists).strip().strip("/")
domain_raw = domain_raw.split("/")[0]
self.logger.info('Domain is %s', domain_raw)
# check for existence
domain_re = re.compile(
r'://(www\.)?' + re.escape(domain_raw) + r'(/|$)'
)
matched_sites = list(
filter(lambda x: domain_raw in x.url_main + x.url, self.db.sites)
filter(
lambda x: domain_re.search(x.url_main + x.url), self.db.sites
)
)
if matched_sites:
# TODO: update the existing site
print(
f'Sites with domain "{domain_raw}" already exists in the Maigret database!'
f"{Fore.YELLOW}[!] Sites with domain \"{domain_raw}\" already exists in the Maigret database!{Style.RESET_ALL}"
)
status = lambda s: "(disabled)" if s.disabled else ""
url_block = lambda s: f"\n\t{s.url_main}\n\t{s.url}"
print(
@@ -330,36 +435,135 @@ class Submitter:
)
)
if input("Do you want to continue? [yN] ").lower() in "n":
if (
input(
f"{Fore.GREEN}[?] Do you want to continue? [yN] {Style.RESET_ALL}"
).lower()
in "n"
):
return False
site_names = [site.name for site in matched_sites]
site_name = (
input(
f"{Fore.GREEN}[?] Which site do you want to update in case of success? 1st by default. [{', '.join(site_names)}] {Style.RESET_ALL}"
)
or matched_sites[0].name
)
old_site = next(
(site for site in matched_sites if site.name == site_name), None
)
if old_site is None:
print(
f'{Fore.RED}[!] Site "{site_name}" not found in the matched list. Proceeding without updating an existing site.{Style.RESET_ALL}'
)
else:
print(
f'{Fore.GREEN}[+] We will update site "{old_site.name}" in case of success.{Style.RESET_ALL}'
)
# Check if the site check is ordinary or not
if old_site and (old_site.url_probe or old_site.activation):
skip = input(
f"{Fore.RED}[!] The site check depends on activation / probing mechanism! Consider to update it manually. Continue? [yN]{Style.RESET_ALL}"
)
if skip.lower() in ['n', '']:
return False
# TODO: urlProbe support
# TODO: activation support
url_mainpage = self.extract_mainpage_url(url_exists)
# headers update
custom_headers = dict(self.HEADERS)
while additional_options_enabled:
header_key = input(
f'{Fore.GREEN}[?] Specify custom header if you need or just press Enter to skip. Header name: {Style.RESET_ALL}'
)
if not header_key:
break
header_value = input(f'{Fore.GREEN}[?] Header value: {Style.RESET_ALL}')
custom_headers[header_key.strip()] = header_value.strip()
# redirects settings update
redirects = False
if additional_options_enabled:
redirects = (
'y'
in input(
f'{Fore.GREEN}[?] Should we do redirects automatically? [yN] {Style.RESET_ALL}'
).lower()
)
print('Detecting site engine, please wait...')
sites = []
text = None
try:
sites = await self.detect_known_engine(url_exists, url_mainpage)
sites, text = await self.detect_known_engine(
url_exists,
url_exists,
session=None,
follow_redirects=redirects,
headers=custom_headers,
)
except KeyboardInterrupt:
print('Engine detect process is interrupted.')
if 'cloudflare' in text.lower():
print(
'Cloudflare protection detected. I will use cloudscraper for further work'
)
# self.session = CloudflareSession()
if not sites:
print("Unable to detect site engine, lets generate checking features")
redirects = False
if self.args.verbose:
redirects = 'y' in input('Should we do redirects automatically? [yN] ').lower()
supposed_username = self.extract_username_dialog(url_exists)
self.logger.info(f"Supposed username: {supposed_username}")
sites = [
# TODO: pass status_codes
# check it here and suggest to enable / auto-enable redirects
presence_list, absence_list, status, non_exist_username = (
await self.check_features_manually(
url_exists, url_mainpage, cookie_file, redirects,
username=supposed_username,
url_exists=url_exists,
cookie_filename=cookie_file,
follow_redirects=redirects,
headers=custom_headers,
)
]
)
if status == "Found":
site_data = {
"absenceStrs": absence_list,
"presenseStrs": presence_list,
"url": url_exists.replace(supposed_username, '{username}'),
"urlMain": url_mainpage,
"usernameClaimed": supposed_username,
"usernameUnclaimed": non_exist_username,
"headers": custom_headers,
"checkType": "message",
}
self.logger.info(json.dumps(site_data, indent=4))
if custom_headers != self.HEADERS:
site_data['headers'] = custom_headers
site = MaigretSite(url_mainpage.split("/")[-1], site_data)
sites.append(site)
else:
print(
f"{Fore.RED}[!] The check for site failed! Reason: {status}{Style.RESET_ALL}"
)
return False
self.logger.debug(sites[0].__dict__)
sem = asyncio.Semaphore(1)
print("Checking, please wait...")
print(f"{Fore.GREEN}[*] Checking, please wait...{Style.RESET_ALL}")
found = False
chosen_site = None
for s in sites:
@@ -371,7 +575,7 @@ class Submitter:
if not found:
print(
f"Sorry, we couldn't find params to detect account presence/absence in {chosen_site.name}."
f"{Fore.RED}[!] The check for site '{chosen_site.name}' failed!{Style.RESET_ALL}"
)
print(
"Try to run this mode again and increase features count or choose others."
@@ -381,7 +585,7 @@ class Submitter:
else:
if (
input(
f"Site {chosen_site.name} successfully checked. Do you want to save it in the Maigret DB? [Yn] "
f"{Fore.GREEN}[?] Site {chosen_site.name} successfully checked. Do you want to save it in the Maigret DB? [Yn] {Style.RESET_ALL}"
)
.lower()
.strip("y")
@@ -389,19 +593,82 @@ class Submitter:
return False
if self.args.verbose:
source = input("Name the source site if it is mirror: ")
self.logger.info(
"Verbose mode is enabled, additional settings are available"
)
source = input(
f"{Fore.GREEN}[?] Name the source site if it is mirror: {Style.RESET_ALL}"
)
if source:
chosen_site.source = source
chosen_site.name = input("Change site name if you want: ") or chosen_site.name
chosen_site.tags = list(map(str.strip, input("Site tags: ").split(',')))
rank = Submitter.get_alexa_rank(chosen_site.url_main)
if rank:
print(f'New alexa rank: {rank}')
chosen_site.alexa_rank = rank
default_site_name = old_site.name if old_site else chosen_site.name
new_name = (
input(
f"{Fore.GREEN}[?] Change site name if you want [{default_site_name}]: {Style.RESET_ALL}"
)
or default_site_name
)
if new_name != default_site_name:
self.logger.info(f"New site name is {new_name}")
chosen_site.name = new_name
self.logger.debug(chosen_site.json)
default_tags_str = ""
if old_site:
default_tags_str = f' [{", ".join(old_site.tags)}]'
new_tags = input(
f"{Fore.GREEN}[?] Site tags{default_tags_str}: {Style.RESET_ALL}"
)
if new_tags:
chosen_site.tags = list(map(str.strip, new_tags.split(',')))
else:
chosen_site.tags = []
self.logger.info(f"Site tags are: {', '.join(chosen_site.tags)}")
# rank = Submitter.get_alexa_rank(chosen_site.url_main)
# if rank:
# print(f'New alexa rank: {rank}')
# chosen_site.alexa_rank = rank
self.logger.info(chosen_site.json)
site_data = chosen_site.strip_engine_data()
self.logger.debug(site_data.json)
self.db.update_site(site_data)
self.logger.info(site_data.json)
if old_site:
# Update old site with new values and log changes
fields_to_check = {
'url': 'URL',
'url_main': 'Main URL',
'username_claimed': 'Username claimed',
'username_unclaimed': 'Username unclaimed',
'check_type': 'Check type',
'presense_strs': 'Presence strings',
'absence_strs': 'Absence strings',
'tags': 'Tags',
'source': 'Source',
'headers': 'Headers',
}
for field, display_name in fields_to_check.items():
old_value = getattr(old_site, field)
new_value = getattr(site_data, field)
if field == 'tags' and not new_tags:
continue
if str(old_value) != str(new_value):
print(
f"{Fore.YELLOW}[*] '{display_name}' updated: {Fore.RED}{old_value} {Fore.YELLOW}to {Fore.GREEN}{new_value}{Style.RESET_ALL}"
)
old_site.__dict__[field] = new_value
# update the site
final_site = old_site if old_site else site_data
self.db.update_site(final_site)
# save the db in file
if self.args.db_file != self.settings.sites_db_path:
print(
f"{Fore.GREEN}[+] Maigret DB is saved to {self.args.db}.{Style.RESET_ALL}"
)
self.db.save_to_file(self.args.db)
return True
+5
View File
@@ -3,6 +3,7 @@ import ast
import difflib
import re
import random
import string
from typing import Any
@@ -119,3 +120,7 @@ def get_match_ratio(base_strs: list):
)
return get_match_inner
def generate_random_username():
return ''.join(random.choices(string.ascii_lowercase, k=10))
+352
View File
@@ -0,0 +1,352 @@
from flask import (
Flask,
render_template,
request,
send_file,
Response,
flash,
redirect,
url_for,
)
import logging
import os
import asyncio
from datetime import datetime
from threading import Thread
import maigret
import maigret.settings
from maigret.sites import MaigretDatabase
from maigret.report import generate_report_context
app = Flask(__name__)
# Use environment variable for secret key, generate random one if not set
app.secret_key = os.getenv('FLASK_SECRET_KEY', os.urandom(24).hex())
# add background job tracking
background_jobs = {}
job_results = {}
# Configuration
app.config["MAIGRET_DB_FILE"] = os.path.join(os.path.dirname(os.path.dirname(__file__)), 'resources', 'data.json')
app.config["COOKIES_FILE"] = "cookies.txt"
app.config["UPLOAD_FOLDER"] = 'uploads'
app.config["REPORTS_FOLDER"] = os.path.abspath('/tmp/maigret_reports')
def setup_logger(log_level, name):
logger = logging.getLogger(name)
logger.setLevel(log_level)
return logger
async def maigret_search(username, options):
logger = setup_logger(logging.WARNING, 'maigret')
try:
db = MaigretDatabase().load_from_path(app.config["MAIGRET_DB_FILE"])
top_sites = int(options.get('top_sites') or 500)
if options.get('all_sites'):
top_sites = 999999999 # effectively all
tags = options.get('tags', [])
excluded_tags = options.get('excluded_tags', [])
site_list = options.get('site_list', [])
logger.info(f"Filtering sites by tags: {tags}, excluded: {excluded_tags}")
sites = db.ranked_sites_dict(
top=top_sites,
tags=tags,
excluded_tags=excluded_tags,
names=site_list,
disabled=False,
id_type='username',
)
logger.info(f"Found {len(sites)} sites matching the tag criteria")
results = await maigret.search(
username=username,
site_dict=sites,
timeout=int(options.get('timeout', 30)),
logger=logger,
id_type='username',
cookies=app.config["COOKIES_FILE"] if options.get('use_cookies') else None,
is_parsing_enabled=(not options.get('disable_extracting', False)),
recursive_search_enabled=(
not options.get('disable_recursive_search', False)
),
check_domains=options.get('with_domains', False),
proxy=options.get('proxy', None),
tor_proxy=options.get('tor_proxy', None),
i2p_proxy=options.get('i2p_proxy', None),
)
return results
except Exception as e:
logger.error(f"Error during search: {str(e)}")
raise
async def search_multiple_usernames(usernames, options):
results = []
for username in usernames:
try:
search_results = await maigret_search(username.strip(), options)
results.append((username.strip(), 'username', search_results))
except Exception as e:
logging.error(f"Error searching username {username}: {str(e)}")
return results
def process_search_task(usernames, options, timestamp):
try:
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
general_results = loop.run_until_complete(
search_multiple_usernames(usernames, options)
)
os.makedirs(app.config["REPORTS_FOLDER"], exist_ok=True)
session_folder = os.path.join(
app.config["REPORTS_FOLDER"], f"search_{timestamp}"
)
os.makedirs(session_folder, exist_ok=True)
graph_path = os.path.join(session_folder, "combined_graph.html")
maigret.report.save_graph_report(
graph_path,
general_results,
MaigretDatabase().load_from_path(app.config["MAIGRET_DB_FILE"]),
)
individual_reports = []
for username, id_type, results in general_results:
report_base = os.path.join(session_folder, f"report_{username}")
csv_path = f"{report_base}.csv"
json_path = f"{report_base}.json"
pdf_path = f"{report_base}.pdf"
html_path = f"{report_base}.html"
context = generate_report_context(general_results)
maigret.report.save_csv_report(csv_path, username, results)
maigret.report.save_json_report(
json_path, username, results, report_type='ndjson'
)
maigret.report.save_pdf_report(pdf_path, context)
maigret.report.save_html_report(html_path, context)
claimed_profiles = []
for site_name, site_data in results.items():
if (
site_data.get('status')
and site_data['status'].status
== maigret.result.MaigretCheckStatus.CLAIMED
):
claimed_profiles.append(
{
'site_name': site_name,
'url': site_data.get('url_user', ''),
'tags': (
site_data.get('status').tags
if site_data.get('status')
else []
),
}
)
individual_reports.append(
{
'username': username,
'csv_file': os.path.join(
f"search_{timestamp}", f"report_{username}.csv"
),
'json_file': os.path.join(
f"search_{timestamp}", f"report_{username}.json"
),
'pdf_file': os.path.join(
f"search_{timestamp}", f"report_{username}.pdf"
),
'html_file': os.path.join(
f"search_{timestamp}", f"report_{username}.html"
),
'claimed_profiles': claimed_profiles,
}
)
# save results and mark job as complete using timestamp as key
job_results[timestamp] = {
'status': 'completed',
'session_folder': f"search_{timestamp}",
'graph_file': os.path.join(f"search_{timestamp}", "combined_graph.html"),
'usernames': usernames,
'individual_reports': individual_reports,
}
except Exception as e:
logging.error(f"Error in search task for timestamp {timestamp}: {str(e)}")
job_results[timestamp] = {'status': 'failed', 'error': str(e)}
finally:
background_jobs[timestamp]['completed'] = True
@app.route('/')
def index():
# load site data for autocomplete
db = MaigretDatabase().load_from_path(app.config["MAIGRET_DB_FILE"])
site_options = []
for site in db.sites:
# add main site name
site_options.append(site.name)
# add URL if different from name
if site.url_main and site.url_main not in site_options:
site_options.append(site.url_main)
# sort and deduplicate
site_options = sorted(set(site_options))
return render_template('index.html', site_options=site_options)
# Modified search route
@app.route('/search', methods=['POST'])
def search():
usernames_input = request.form.get('usernames', '').strip()
if not usernames_input:
flash('At least one username is required', 'danger')
return redirect(url_for('index'))
usernames = [
u.strip() for u in usernames_input.replace(',', ' ').split() if u.strip()
]
# Create timestamp for this search session
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
# Get selected tags - ensure it's a list
selected_tags = request.form.getlist('tags')
excluded_tags = request.form.getlist('excluded_tags')
logging.info(f"Selected tags: {selected_tags}, Excluded tags: {excluded_tags}")
options = {
'top_sites': request.form.get('top_sites') or '500',
'timeout': request.form.get('timeout') or '30',
'use_cookies': 'use_cookies' in request.form,
'all_sites': 'all_sites' in request.form,
'disable_recursive_search': 'disable_recursive_search' in request.form,
'disable_extracting': 'disable_extracting' in request.form,
'with_domains': 'with_domains' in request.form,
'proxy': request.form.get('proxy', None) or None,
'tor_proxy': request.form.get('tor_proxy', None) or None,
'i2p_proxy': request.form.get('i2p_proxy', None) or None,
'permute': 'permute' in request.form,
'tags': selected_tags, # Pass selected tags as a list
'excluded_tags': excluded_tags, # Pass excluded tags as a list
'site_list': [
s.strip() for s in request.form.get('site', '').split(',') if s.strip()
],
}
logging.info(
f"Starting search for usernames: {usernames} with tags: {selected_tags}, excluded: {excluded_tags}"
)
# Start background job
background_jobs[timestamp] = {
'completed': False,
'thread': Thread(
target=process_search_task, args=(usernames, options, timestamp)
),
}
background_jobs[timestamp]['thread'].start()
return redirect(url_for('status', timestamp=timestamp))
@app.route('/status/<timestamp>')
def status(timestamp):
logging.info(f"Status check for timestamp: {timestamp}")
# Validate timestamp
if timestamp not in background_jobs:
flash('Invalid search session.', 'danger')
logging.error(f"Invalid search session: {timestamp}")
return redirect(url_for('index'))
# Check if job is completed
if background_jobs[timestamp]['completed']:
result = job_results.get(timestamp)
if not result:
flash('No results found for this search session.', 'warning')
logging.error(f"No results found for completed session: {timestamp}")
return redirect(url_for('index'))
if result['status'] == 'completed':
# Note: use the session_folder from the results to redirect
return redirect(url_for('results', session_id=result['session_folder']))
else:
error_msg = result.get('error', 'Unknown error occurred.')
flash(f'Search failed: {error_msg}', 'danger')
logging.error(f"Search failed for session {timestamp}: {error_msg}")
return redirect(url_for('index'))
# If job is still running, show a status page
return render_template('status.html', timestamp=timestamp)
@app.route('/results/<session_id>')
def results(session_id):
# Find completed results that match this session_folder
result_data = next(
(
r
for r in job_results.values()
if r.get('status') == 'completed' and r['session_folder'] == session_id
),
None,
)
if not result_data:
flash('No results found for this session ID.', 'danger')
logging.error(f"Results for session {session_id} not found in job_results.")
return redirect(url_for('index'))
return render_template(
'results.html',
usernames=result_data['usernames'],
graph_file=result_data['graph_file'],
individual_reports=result_data['individual_reports'],
timestamp=session_id.replace('search_', ''),
)
@app.route('/reports/<path:filename>')
def download_report(filename):
try:
os.makedirs(app.config["REPORTS_FOLDER"], exist_ok=True)
file_path = os.path.normpath(
os.path.join(app.config["REPORTS_FOLDER"], filename)
)
if not file_path.startswith(app.config["REPORTS_FOLDER"]):
raise Exception("Invalid file path")
return send_file(file_path)
except Exception as e:
logging.error(f"Error serving file {filename}: {str(e)}")
return "File not found", 404
if __name__ == '__main__':
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
)
debug_mode = os.getenv('FLASK_DEBUG', 'False').lower() in ['true', '1', 't']
# Host configuration: secure by default
# Use 127.0.0.1 for local development, 0.0.0.0 only if explicitly set
host = os.getenv('FLASK_HOST', '127.0.0.1')
port = int(os.getenv('FLASK_PORT', '5000'))
app.run(host=host, port=port, debug=debug_mode)
Binary file not shown.

After

Width:  |  Height:  |  Size: 45 KiB

+118
View File
@@ -0,0 +1,118 @@
<!DOCTYPE html>
<html lang="en" data-bs-theme="dark">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Maigret Web Interface</title>
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css" rel="stylesheet">
<style>
body {
min-height: 100vh;
display: flex;
flex-direction: column;
}
.main-container {
flex: 1;
padding-top: 2rem;
}
.form-container {
max-width: auto;
margin: auto;
padding-bottom: 2rem;
}
[data-bs-theme="dark"] {
--bs-body-bg: #212529;
--bs-body-color: #dee2e6;
}
.header {
padding: 1rem 0;
margin-bottom: 2rem;
border-bottom: 1px solid var(--bs-border-color);
}
.header-content {
display: flex;
align-items: center;
justify-content: space-between;
}
.logo-container {
display: flex;
align-items: center;
gap: 1rem;
}
.logo {
height: 40px;
width: auto;
}
.footer {
margin-top: auto;
padding: 1rem 0;
text-align: center;
border-top: 1px solid var(--bs-border-color);
font-size: 0.9rem;
}
.footer a {
color: inherit;
text-decoration: none;
}
.footer a:hover {
text-decoration: underline;
}
</style>
</head>
<body>
<div class="header">
<div class="container">
<div class="header-content">
<div class="logo-container">
<img src="{{ url_for('static', filename='maigret.png') }}" alt="Maigret Logo" class="logo">
<h1 class="h4 mb-0">Maigret Web Interface</h1>
</div>
<button class="btn btn-outline-secondary" id="theme-toggle">
Toggle Dark/Light Mode
</button>
</div>
</div>
</div>
<div class="main-container">
<div class="container">
{% block content %}{% endblock %}
</div>
</div>
<footer class="footer">
<div class="container">
<p class="mb-0">
Powered by <a href="https://github.com/soxoj/maigret" target="_blank">Maigret</a> |
Licensed under <a href="https://github.com/soxoj/maigret/blob/main/LICENSE" target="_blank">MIT
License</a>
</p>
</div>
</footer>
<script src="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/js/bootstrap.bundle.min.js"></script>
<script>
document.getElementById('theme-toggle').addEventListener('click', function () {
const html = document.documentElement;
if (html.getAttribute('data-bs-theme') === 'dark') {
html.setAttribute('data-bs-theme', 'light');
} else {
html.setAttribute('data-bs-theme', 'dark');
}
});
</script>
</body>
</html>
+520
View File
@@ -0,0 +1,520 @@
{% extends "base.html" %}
{% block content %}
<style>
.tag-cloud {
display: flex;
flex-wrap: wrap;
gap: 8px;
padding: 15px;
border-radius: 8px;
background: rgba(0, 0, 0, 0.05);
margin-bottom: 20px;
}
.tag {
display: inline-block;
padding: 5px 10px;
border-radius: 15px;
background-color: #dc3545;
color: white;
cursor: pointer;
font-size: 14px;
transition: all 0.3s ease;
user-select: none;
}
.tag.selected {
background-color: #28a745;
}
.tag.excluded {
background-color: #343a40;
text-decoration: line-through;
}
.tag:hover {
transform: translateY(-2px);
box-shadow: 0 2px 5px rgba(0, 0, 0, 0.2);
}
.hidden-select {
display: none !important;
}
.site-input-container {
position: relative;
}
.site-input {
width: 100%;
}
.selected-sites {
display: flex;
flex-wrap: wrap;
gap: 8px;
padding: 10px 0;
}
.selected-site {
background-color: #214e7b;
padding: 2px 8px;
border-radius: 12px;
font-size: 14px;
display: inline-flex;
align-items: center;
gap: 5px;
}
.remove-site {
cursor: pointer;
color: #dc3545;
font-weight: bold;
}
.section-header {
cursor: pointer;
padding: 1rem;
background: rgba(255, 255, 255, 0.05);
border-radius: 4px;
margin-bottom: 0.5rem;
display: flex;
justify-content: space-between;
align-items: center;
}
.section-content {
padding: 1rem;
display: none;
}
.section-content.show {
display: block;
}
.chevron::after {
content: '▼';
transition: transform 0.2s;
}
.chevron.collapsed::after {
transform: rotate(-90deg);
}
.main-search-section {
background: rgba(255, 255, 255, 0.03);
padding: 2rem;
border-radius: 8px;
margin-bottom: 2rem;
}
.search-button {
width: 100%;
padding: 1rem;
font-size: 1.2rem;
margin-top: 2rem;
}
</style>
<div class="form-container">
{% if error %}
<div class="alert alert-danger">{{ error }}</div>
{% endif %}
<form method="POST" action="{{ url_for('search') }}" class="mb-4">
<!-- Main Search Section -->
<div class="main-search-section">
<div class="mb-4">
<label for="usernames" class="form-label h5">Usernames to Search</label>
<textarea class="form-control" id="usernames" name="usernames" rows="3" required
placeholder="Enter one or more usernames (separated by spaces or commas)..."></textarea>
</div>
<div class="row align-items-center">
<div class="col-md-6">
<label for="top_sites" class="form-label">Number of Sites</label>
<input type="number" class="form-control" id="top_sites" name="top_sites" min="1" max="10000"
placeholder="Default: 500">
</div>
<div class="col-md-6">
<label for="timeout" class="form-label">Timeout (seconds)</label>
<input type="number" class="form-control" id="timeout" name="timeout" min="1"
placeholder="Default: 30">
</div>
<div class="col-12 mt-3">
<div class="form-check">
<input type="checkbox" class="form-check-input" id="all_sites" name="all_sites"
onchange="document.getElementById('top_sites').disabled = this.checked;">
<label class="form-check-label" for="all_sites">Search All Sites</label>
</div>
</div>
</div>
</div>
<!-- Filters Section -->
<div class="mb-4">
<div class="section-header" onclick="toggleSection('filters')">
<h5 class="mb-0">Filters</h5>
<span class="chevron"></span>
</div>
<div id="filters" class="section-content">
<div class="mb-3 site-input-container">
<label for="site" class="form-label">Specify Sites (Optional)</label>
<input type="text" class="form-control site-input" id="siteInput"
placeholder="Type to search for sites..." list="siteOptions">
<input type="hidden" id="site" name="site">
<datalist id="siteOptions">
{% for site in site_options %}
<option value="{{ site }}">
{% endfor %}
</datalist>
<div class="selected-sites" id="selectedSites"></div>
</div>
<div class="mb-3">
<label class="form-label">Tags (click to cycle: include → exclude → neutral)</label>
<div class="mb-2">
<small class="text-muted">
<span style="display:inline-block;width:12px;height:12px;background:#28a745;border-radius:50%;"></span> Included (whitelist)
&nbsp;&nbsp;
<span style="display:inline-block;width:12px;height:12px;background:#343a40;border-radius:50%;"></span> Excluded (blacklist)
&nbsp;&nbsp;
<span style="display:inline-block;width:12px;height:12px;background:#dc3545;border-radius:50%;"></span> Neutral
</small>
</div>
<div class="tag-cloud" id="tagCloud"></div>
<select multiple class="hidden-select" id="tags" name="tags">
<option value="gaming">Gaming</option>
<option value="coding">Coding</option>
<option value="photo">Photo</option>
<option value="music">Music</option>
<option value="blog">Blog</option>
<option value="finance">Finance</option>
<option value="freelance">Freelance</option>
<option value="dating">Dating</option>
<option value="tech">Tech</option>
<option value="forum">Forum</option>
<option value="porn">Porn</option>
<option value="erotic">Erotic</option>
<option value="webcam">Webcam</option>
<option value="video">Video</option>
<option value="movies">Movies</option>
<option value="hacking">Hacking</option>
<option value="art">Art</option>
<option value="discussion">Discussion</option>
<option value="sharing">Sharing</option>
<option value="writing">Writing</option>
<option value="wiki">Wiki</option>
<option value="business">Business</option>
<option value="shopping">Shopping</option>
<option value="sport">Sport</option>
<option value="books">Books</option>
<option value="news">News</option>
<option value="documents">Documents</option>
<option value="travel">Travel</option>
<option value="maps">Maps</option>
<option value="hobby">Hobby</option>
<option value="apps">Apps</option>
<option value="classified">Classified</option>
<option value="career">Career</option>
<option value="geosocial">Geosocial</option>
<option value="streaming">Streaming</option>
<option value="education">Education</option>
<option value="networking">Networking</option>
<option value="torrent">Torrent</option>
<option value="science">Science</option>
<option value="medicine">Medicine</option>
<option value="reading">Reading</option>
<option value="stock">Stock</option>
<option value="messaging">Messaging</option>
<option value="trading">Trading</option>
<option value="links">Links</option>
<option value="fashion">Fashion</option>
<option value="tasks">Tasks</option>
<option value="military">Military</option>
<option value="auto">Auto</option>
<option value="gambling">Gambling</option>
<option value="cybercriminal">Cybercriminal</option>
<option value="review">Review</option>
<option value="bookmarks">Bookmarks</option>
<option value="design">Design</option>
<option value="tor">Tor</option>
<option value="i2p">I2P</option>
<option value="q&a">Q&A</option>
<option value="crypto">Crypto</option>
<option value="ai">AI</option>
<!-- Country tags -->
<option value="ae" data-group="country">AE - United Arab Emirates</option>
<option value="ao" data-group="country">AO - Angola</option>
<option value="ar" data-group="country">AR - Argentina</option>
<option value="at" data-group="country">AT - Austria</option>
<option value="au" data-group="country">AU - Australia</option>
<option value="az" data-group="country">AZ - Azerbaijan</option>
<option value="bd" data-group="country">BD - Bangladesh</option>
<option value="be" data-group="country">BE - Belgium</option>
<option value="bg" data-group="country">BG - Bulgaria</option>
<option value="br" data-group="country">BR - Brazil</option>
<option value="by" data-group="country">BY - Belarus</option>
<option value="ca" data-group="country">CA - Canada</option>
<option value="ch" data-group="country">CH - Switzerland</option>
<option value="cl" data-group="country">CL - Chile</option>
<option value="cn" data-group="country">CN - China</option>
<option value="co" data-group="country">CO - Colombia</option>
<option value="cr" data-group="country">CR - Costa Rica</option>
<option value="cz" data-group="country">CZ - Czechia</option>
<option value="de" data-group="country">DE - Germany</option>
<option value="dk" data-group="country">DK - Denmark</option>
<option value="dz" data-group="country">DZ - Algeria</option>
<option value="ee" data-group="country">EE - Estonia</option>
<option value="eg" data-group="country">EG - Egypt</option>
<option value="es" data-group="country">ES - Spain</option>
<option value="eu" data-group="country">EU - European Union</option>
<option value="fi" data-group="country">FI - Finland</option>
<option value="fr" data-group="country">FR - France</option>
<option value="gb" data-group="country">GB - United Kingdom</option>
<option value="global" data-group="country">🌍 Global</option>
<option value="gr" data-group="country">GR - Greece</option>
<option value="hk" data-group="country">HK - Hong Kong</option>
<option value="hr" data-group="country">HR - Croatia</option>
<option value="hu" data-group="country">HU - Hungary</option>
<option value="id" data-group="country">ID - Indonesia</option>
<option value="ie" data-group="country">IE - Ireland</option>
<option value="il" data-group="country">IL - Israel</option>
<option value="in" data-group="country">IN - India</option>
<option value="ir" data-group="country">IR - Iran</option>
<option value="it" data-group="country">IT - Italy</option>
<option value="jp" data-group="country">JP - Japan</option>
<option value="kg" data-group="country">KG - Kyrgyzstan</option>
<option value="kr" data-group="country">KR - Korea</option>
<option value="kz" data-group="country">KZ - Kazakhstan</option>
<option value="la" data-group="country">LA - Laos</option>
<option value="lk" data-group="country">LK - Sri Lanka</option>
<option value="lt" data-group="country">LT - Lithuania</option>
<option value="ma" data-group="country">MA - Morocco</option>
<option value="md" data-group="country">MD - Moldova</option>
<option value="mg" data-group="country">MG - Madagascar</option>
<option value="mk" data-group="country">MK - North Macedonia</option>
<option value="mx" data-group="country">MX - Mexico</option>
<option value="ng" data-group="country">NG - Nigeria</option>
<option value="nl" data-group="country">NL - Netherlands</option>
<option value="no" data-group="country">NO - Norway</option>
<option value="ph" data-group="country">PH - Philippines</option>
<option value="pk" data-group="country">PK - Pakistan</option>
<option value="pl" data-group="country">PL - Poland</option>
<option value="pt" data-group="country">PT - Portugal</option>
<option value="re" data-group="country">RE - Réunion</option>
<option value="ro" data-group="country">RO - Romania</option>
<option value="rs" data-group="country">RS - Serbia</option>
<option value="ru" data-group="country">RU - Russia</option>
<option value="sa" data-group="country">SA - Saudi Arabia</option>
<option value="sd" data-group="country">SD - Sudan</option>
<option value="se" data-group="country">SE - Sweden</option>
<option value="sg" data-group="country">SG - Singapore</option>
<option value="sk" data-group="country">SK - Slovakia</option>
<option value="sv" data-group="country">SV - El Salvador</option>
<option value="th" data-group="country">TH - Thailand</option>
<option value="tn" data-group="country">TN - Tunisia</option>
<option value="tr" data-group="country">TR - Türkiye</option>
<option value="tw" data-group="country">TW - Taiwan</option>
<option value="ua" data-group="country">UA - Ukraine</option>
<option value="uk" data-group="country">UK - United Kingdom</option>
<option value="us" data-group="country">US - United States</option>
<option value="uz" data-group="country">UZ - Uzbekistan</option>
<option value="ve" data-group="country">VE - Venezuela</option>
<option value="vi" data-group="country">VI - Virgin Islands</option>
<option value="vn" data-group="country">VN - Viet Nam</option>
<option value="za" data-group="country">ZA - South Africa</option>
</select>
<select multiple class="hidden-select" id="excludedTags" name="excluded_tags">
</select>
</div>
</div>
</div>
<!-- Advanced Options Section -->
<div class="mb-4">
<div class="section-header" onclick="toggleSection('advanced')">
<h5 class="mb-0">Advanced Options</h5>
<span class="chevron"></span>
</div>
<div id="advanced" class="section-content">
<div class="mb-3 form-check">
<input type="checkbox" class="form-check-input" id="permute" name="permute">
<label class="form-check-label" for="permute">Enable Username Permutations</label>
</div>
<div class="mb-3 form-check">
<input type="checkbox" class="form-check-input" id="disable_recursive_search"
name="disable_recursive_search">
<label class="form-check-label" for="disable_recursive_search">Disable Recursive Search</label>
</div>
<div class="mb-3 form-check">
<input type="checkbox" class="form-check-input" id="disable_extracting" name="disable_extracting">
<label class="form-check-label" for="disable_extracting">Disable Information Extraction</label>
</div>
<div class="mb-3 form-check">
<input type="checkbox" class="form-check-input" id="with_domains" name="with_domains">
<label class="form-check-label" for="with_domains">Check Domains</label>
</div>
<div class="mb-3">
<label for="proxy" class="form-label">Proxy URL</label>
<input type="text" class="form-control" id="proxy" name="proxy"
placeholder="e.g., 127.0.0.1:1080">
</div>
<div class="mb-3">
<label for="tor_proxy" class="form-label">TOR Proxy URL</label>
<input type="text" class="form-control" id="tor_proxy" name="tor_proxy"
placeholder="Default: 127.0.0.1:9050">
</div>
<div class="mb-3">
<label for="i2p_proxy" class="form-label">I2P Proxy URL</label>
<input type="text" class="form-control" id="i2p_proxy" name="i2p_proxy"
placeholder="Default: 127.0.0.1:4444">
</div>
</div>
</div>
<button type="submit" class="btn search-button" style="background-color: rgb(249, 207, 0); color: black;">
Start Search
</button>
</form>
</div>
<script>
function toggleSection(sectionId) {
const content = document.getElementById(sectionId);
const header = content.previousElementSibling;
content.classList.toggle('show');
header.querySelector('.chevron').classList.toggle('collapsed');
}
document.addEventListener('DOMContentLoaded', function () {
// Tag cloud functionality with include/exclude (whitelist/blacklist) support
const tagCloud = document.getElementById('tagCloud');
const hiddenSelect = document.getElementById('tags');
const excludedSelect = document.getElementById('excludedTags');
const allTags = Array.from(hiddenSelect.options).map(opt => ({
value: opt.value,
label: opt.text,
group: opt.dataset.group || 'category'
}));
function updateTagSelects() {
// Clear and repopulate hidden selects based on tag states
Array.from(hiddenSelect.options).forEach(opt => opt.selected = false);
// Clear excluded select
excludedSelect.innerHTML = '';
document.querySelectorAll('#tagCloud .tag').forEach(tagEl => {
const val = tagEl.dataset.value;
if (tagEl.classList.contains('selected')) {
const option = Array.from(hiddenSelect.options).find(opt => opt.value === val);
if (option) option.selected = true;
} else if (tagEl.classList.contains('excluded')) {
const opt = document.createElement('option');
opt.value = val;
opt.selected = true;
excludedSelect.appendChild(opt);
}
});
}
let lastGroup = '';
allTags.forEach(tag => {
if (tag.group !== lastGroup && tag.group === 'country') {
const separator = document.createElement('div');
separator.style.cssText = 'width:100%;margin:8px 0 4px;padding:4px 0;border-top:1px solid rgba(0,0,0,0.15);font-size:13px;color:#666;';
separator.textContent = 'Countries';
tagCloud.appendChild(separator);
}
lastGroup = tag.group;
const tagElement = document.createElement('span');
tagElement.className = 'tag';
tagElement.textContent = tag.label;
tagElement.dataset.value = tag.value;
// Single click cycles: neutral -> included -> excluded -> neutral
tagElement.addEventListener('click', function (e) {
e.preventDefault();
if (this.classList.contains('selected')) {
// included -> excluded
this.classList.remove('selected');
this.classList.add('excluded');
} else if (this.classList.contains('excluded')) {
// excluded -> neutral
this.classList.remove('excluded');
} else {
// neutral -> included
this.classList.add('selected');
}
updateTagSelects();
});
tagCloud.appendChild(tagElement);
});
// Site selection functionality
const siteInput = document.getElementById('siteInput');
const hiddenInput = document.getElementById('site');
const selectedSitesContainer = document.getElementById('selectedSites');
let selectedSites = new Set();
function updateHiddenInput() {
hiddenInput.value = Array.from(selectedSites).join(',');
}
function addSite(site) {
if (site && !selectedSites.has(site)) {
selectedSites.add(site);
updateHiddenInput();
const siteElement = document.createElement('span');
siteElement.className = 'selected-site';
siteElement.innerHTML = `${site}<span class="remove-site" data-site="${site}">&times;</span>`;
selectedSitesContainer.appendChild(siteElement);
}
}
function removeSite(site) {
selectedSites.delete(site);
updateHiddenInput();
const siteElements = selectedSitesContainer.querySelectorAll('.selected-site');
siteElements.forEach(el => {
if (el.querySelector('.remove-site').dataset.site === site) {
el.remove();
}
});
}
siteInput.addEventListener('change', function (e) {
const value = this.value.trim();
if (value) {
addSite(value);
this.value = '';
}
});
selectedSitesContainer.addEventListener('click', function (e) {
if (e.target.classList.contains('remove-site')) {
removeSite(e.target.dataset.site);
}
});
siteInput.addEventListener('paste', function (e) {
e.preventDefault();
const paste = (e.clipboardData || window.clipboardData).getData('text');
const sites = paste.split(',').map(site => site.trim()).filter(site => site);
sites.forEach(addSite);
});
const form = document.querySelector('form');
form.addEventListener('submit', function (e) {
const selectedTags = Array.from(tagCloud.querySelectorAll('.tag.selected'));
Array.from(hiddenSelect.options).forEach(opt => {
opt.selected = selectedTags.some(tag => tag.dataset.value === opt.value);
});
updateHiddenInput();
});
});
</script>
{% endblock %}
+156
View File
@@ -0,0 +1,156 @@
{% extends "base.html" %}
{% block content %}
<style>
.tag-badge {
background-color: #214e7b;
padding: 2px 8px;
border-radius: 12px;
font-size: 14px;
display: inline-flex;
align-items: center;
gap: 5px;
margin: 2px;
color: white;
}
.profile-list {
list-style: none;
padding: 0;
}
.profile-item {
margin-bottom: 10px;
padding: 10px;
display: flex;
justify-content: space-between;
align-items: center;
border-bottom: 1px solid rgba(255, 255, 255, 0.1);
}
.profile-link {
display: flex;
align-items: center;
gap: 8px;
}
.favicon {
width: 16px;
height: 16px;
}
.tag-container {
display: flex;
flex-wrap: wrap;
gap: 5px;
justify-content: flex-end;
}
.report-container {
margin-bottom: 1rem;
}
.report-header {
cursor: pointer;
padding: 1rem;
background: rgba(255, 255, 255, 0.05);
border-radius: 4px;
margin-bottom: 0.5rem;
}
.report-content {
display: none;
}
.report-content.show {
display: block;
}
.chevron::after {
content: '▼';
margin-left: 8px;
transition: transform 0.2s;
}
.chevron.collapsed::after {
transform: rotate(-90deg);
}
</style>
<div class="form-container">
<h1 class="mb-4">Search Results</h1>
<!-- Flash messages -->
{% with messages = get_flashed_messages() %}
{% if messages %}
{% for message in messages %}
<div class="alert alert-info">{{ message }}</div>
{% endfor %}
{% endif %}
{% endwith %}
<p>The search has completed. <a href="{{ url_for('index')}}">Back to start.</a></p>
{% if graph_file %}
<h3>Combined Graph</h3>
<iframe src="{{ url_for('download_report', filename=graph_file) }}" style="width:100%; height:600px; border:none;"></iframe>
{% endif %}
<hr>
{% if individual_reports %}
<h3>Individual Reports</h3>
<div class="reports-list">
{% for report in individual_reports %}
<div class="report-container">
<div class="report-header" onclick="toggleReport(this)" data-target="report-{{ loop.index }}">
<h5 class="mb-0 d-flex align-items-center">
<span>{{ report.username }}</span>
<span class="chevron"></span>
</h5>
</div>
<div id="report-{{ loop.index }}" class="report-content">
<p>
<a href="{{ url_for('download_report', filename=report.csv_file) }}">CSV Report</a> |
<a href="{{ url_for('download_report', filename=report.json_file) }}">JSON Report</a> |
<a href="{{ url_for('download_report', filename=report.pdf_file) }}">PDF Report</a> |
<a href="{{ url_for('download_report', filename=report.html_file) }}">HTML Report</a>
</p>
{% if report.claimed_profiles %}
<strong>Claimed Profiles:</strong>
<ul class="profile-list">
{% for profile in report.claimed_profiles %}
<li class="profile-item">
<div class="profile-link">
<img class="favicon" src="https://www.google.com/s2/favicons?domain={{ profile.url }}" onerror="this.style.display='none'" alt="">
<a href="{{ profile.url }}" target="_blank">{{ profile.site_name }}</a>
</div>
{% if profile.tags %}
<div class="tag-container">
{% for tag in profile.tags %}
<span class="tag-badge">{{ tag }}</span>
{% endfor %}
</div>
{% endif %}
</li>
{% endfor %}
</ul>
{% else %}
<p>No claimed profiles found.</p>
{% endif %}
</div>
</div>
{% endfor %}
</div>
{% else %}
<p>No individual reports available.</p>
{% endif %}
</div>
<script>
function toggleReport(header) {
const reportId = header.getAttribute('data-target');
const content = document.getElementById(reportId);
content.classList.toggle('show');
header.querySelector('.chevron').classList.toggle('collapsed');
}
</script>
{% endblock %}
+16
View File
@@ -0,0 +1,16 @@
{% extends "base.html" %}
{% block content %}
<div class="container mt-4 text-center">
<h2>Search in progress...</h2>
<p>Your request is being processed in the background. This page will automatically redirect once the results are ready.</p>
<div class="spinner-border text-primary" role="status">
<span class="visually-hidden">Loading...</span>
</div>
<script>
// Auto-refresh the page every 5 seconds to check completion
setTimeout(function() {
window.location.reload();
}, 5000);
</script>
</div>
{% endblock %}
+47
View File
@@ -0,0 +1,47 @@
# Download this first to avoid compatibility issues:
#
# sudo zypper in python3-devel
# sudo zypper in python3-dev
#
# Then run 'pip3 install -r opensuse.txt' as usual.
#
aiodns>=3.0.0
aiohttp>=3.8.6
aiohttp-socks>=0.7.1
arabic-reshaper~=3.0.0
async-timeout
attrs>=22.2.0
certifi>=2023.7.22
chardet>=5.0.0
colorama
future>=0.18.3
future-annotations>=1.0.0
html5lib>=1.1
idna>=3.4
Jinja2
lxml>=4.9.2
MarkupSafe
mock>=4.0.3
multidict
pycountry>=22.3.5
PyPDF2>=3.0.1
PySocks>=1.7.1
python-bidi>=0.4.2
requests
requests-futures>=1.0.0
six>=1.16.0
socid-extractor>=0.0.24
soupsieve>=2.3.2.post1
stem>=1.8.1
torrequest>=0.1.0
tqdm
typing-extensions
webencodings>=0.5.1
svglib
xhtml2pdf~=0.2.11
XMind>=1.2.0
yarl
networkx
pyvis>=0.2.1
reportlab
cloudscraper>=1.2.71
Generated
+3654
View File
File diff suppressed because it is too large Load Diff
+4 -4
View File
@@ -1,5 +1,5 @@
maigret @ https://github.com/soxoj/maigret/archive/refs/heads/main.zip
pefile==2021.9.3
psutil==5.9.0
pyinstaller @ https://github.com/pyinstaller/pyinstaller/archive/develop.zip
pywin32-ctypes==0.2.0
pefile==2023.2.7 # do not bump while pyinstaller is 6.11.1, there is a conflict
psutil==7.1.3
pyinstaller==6.16.0
pywin32-ctypes==0.2.3
+97
View File
@@ -0,0 +1,97 @@
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
[tool.poetry]
name = "maigret"
version = "0.5.0"
description = "🕵️‍♂️ Collect a dossier on a person by username from thousands of sites."
authors = ["Soxoj <soxoj@protonmail.com>"]
readme = "README.md"
license = "MIT License"
homepage = "https://pypi.org/project/maigret"
documentation = "https://maigret.readthedocs.io"
repository = "https://github.com/soxoj/maigret"
classifiers = [
"Development Status :: 5 - Production/Stable",
"Programming Language :: Python :: 3",
"Intended Audience :: Information Technology",
"Operating System :: OS Independent",
"License :: OSI Approved :: MIT License",
"Natural Language :: English"
]
[tool.poetry.urls]
"Bug Tracker" = "https://github.com/soxoj/maigret/issues"
[tool.poetry.dependencies]
# poetry install
# Install only production dependencies:
# poetry install --without dev
# Install with dev dependencies:
# poetry install --with dev
python = "^3.10"
aiodns = ">=3,<5"
aiohttp = "^3.12.14"
aiohttp-socks = ">=0.10.1,<0.12.0"
arabic-reshaper = "^3.0.0"
async-timeout = "^5.0.1"
attrs = ">=25.3,<27.0"
certifi = ">=2025.6.15,<2027.0.0"
chardet = "^5.0.0"
colorama = "^0.4.6"
future = "^1.0.0"
future-annotations= "^1.0.0"
html5lib = "^1.1"
idna = "^3.4"
Jinja2 = "^3.1.6"
lxml = ">=5.3,<7.0"
MarkupSafe = "^3.0.2"
mock = "^5.1.0"
multidict = "^6.6.3"
pycountry = "^24.6.1"
PyPDF2 = "^3.0.1"
PySocks = "^1.7.1"
python-bidi = "^0.6.3"
requests = "^2.32.4"
requests-futures = "^1.0.2"
requests-toolbelt = "^1.0.0"
six = "^1.17.0"
socid-extractor = "^0.0.27"
soupsieve = "^2.6"
stem = "^1.8.1"
torrequest = "^0.1.0"
alive_progress = "^3.2.0"
typing-extensions = "^4.14.1"
webencodings = "^0.5.1"
xhtml2pdf = "^0.2.11"
XMind = "^1.2.0"
yarl = "^1.20.1"
networkx = "^2.6.3"
pyvis = "^0.3.2"
reportlab = "^4.4.3"
cloudscraper = "^1.2.71"
flask = {extras = ["async"], version = "^3.1.1"}
asgiref = "^3.9.1"
platformdirs = "^4.3.8"
[tool.poetry.group.dev.dependencies]
# How to add a new dev dependency: poetry add black --group dev
# Install dev dependencies with: poetry install --with dev
flake8 = "^7.1.1"
pytest = ">=8.3.4,<10.0.0"
pytest-asyncio = "^1.0.0"
pytest-cov = ">=6,<8"
pytest-httpserver = "^1.0.0"
pytest-rerunfailures = ">=15.1,<17.0"
reportlab = "^4.4.3"
mypy = "^1.14.1"
tuna = "^0.5.11"
coverage = "^7.9.2"
black = ">=25.1,<27.0"
[tool.poetry.scripts]
# Run with: poetry run maigret <username>
maigret = "maigret.maigret:run"
update_sitesmd = "utils.update_site_data:main"
-37
View File
@@ -1,37 +0,0 @@
aiodns==3.0.0
aiohttp==3.8.1
aiohttp-socks==0.7.1
arabic-reshaper==2.1.3
async-timeout==4.0.2
attrs==21.4.0
certifi==2021.10.8
chardet==4.0.0
colorama==0.4.4
future==0.18.2
future-annotations==1.0.0
html5lib==1.1
idna==3.3
Jinja2==3.0.3
lxml==4.7.1
MarkupSafe==2.0.1
mock==4.0.3
multidict==5.2.0
pycountry==22.1.10
PyPDF2==1.26.0
PySocks==1.7.1
python-bidi==0.4.2
requests==2.27.1
requests-futures==1.0.0
six==1.16.0
socid-extractor>=0.0.21
soupsieve==2.3.1
stem==1.8.0
torrequest==0.1.0
tqdm==4.62.3
typing-extensions==4.0.1
webencodings==0.5.1
xhtml2pdf==0.2.5
XMind==1.2.0
yarl==1.7.2
networkx==2.5.1
pyvis==0.1.9
+3 -9
View File
@@ -1,9 +1,3 @@
[egg_info]
tag_build =
tag_date = 0
[flake8]
per-file-ignores = __init__.py:F401
[mypy]
ignore_missing_imports = True
[mutmut]
paths_to_mutate=maigret/
tests_dir=tests/
-26
View File
@@ -1,26 +0,0 @@
from setuptools import (
setup,
find_packages,
)
with open('README.md') as fh:
long_description = fh.read()
with open('requirements.txt') as rf:
requires = rf.read().splitlines()
setup(name='maigret',
version='0.4.1',
description='Collect a dossier on a person by username from a huge number of sites',
long_description=long_description,
long_description_content_type="text/markdown",
url='https://github.com/soxoj/maigret',
install_requires=requires,
entry_points={'console_scripts': ['maigret = maigret.maigret:run']},
packages=find_packages(),
include_package_data=True,
author='Soxoj',
author_email='soxoj@protonmail.com',
license='MIT',
zip_safe=False)
+2994 -2276
View File
File diff suppressed because it is too large Load Diff
+22 -20
View File
@@ -1,30 +1,32 @@
name: maigret2
version: git
summary: SOCMINT / Instagram
title: Maigret
icon: static/maigret.png
name: maigret
summary: 🕵️‍♂️ Collect a dossier on a person by username from thousands of sites.
description: |
Test Test Test
base: core18
**Maigret** collects a dossier on a person **by username only**, checking for accounts on a huge number of sites and gathering all the available information from web pages. No API keys required. Maigret is an easy-to-use and powerful fork of Sherlock.
Currently supported more than 3000 sites, search is launched against 500 popular sites in descending order of popularity by default. Also supported checking of Tor sites, I2P sites, and domains (via DNS resolving).
version: 0.5.0
license: MIT
base: core22
confinement: strict
source-code: https://github.com/soxoj/maigret
issues:
- https://github.com/soxoj/maigret/issues
donation:
- https://patreon.com/soxoj
contact:
- mailto:soxoj@protonmail.com
parts:
maigret2:
maigret:
plugin: python
python-version: python3
source: .
stage-packages:
- python-six
type: app
apps:
maigret2:
maigret:
command: bin/maigret
architectures:
- build-on: amd64
- build-on: i386
plugs: [ network, network-bind, home ]
Binary file not shown.

Before

Width:  |  Height:  |  Size: 15 KiB

After

Width:  |  Height:  |  Size: 45 KiB

File diff suppressed because one or more lines are too long

Before

Width:  |  Height:  |  Size: 44 KiB

After

Width:  |  Height:  |  Size: 1.6 MiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 607 KiB

After

Width:  |  Height:  |  Size: 451 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 773 KiB

After

Width:  |  Height:  |  Size: 351 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 501 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 312 KiB

-7
View File
@@ -1,7 +0,0 @@
flake8==4.0.1
pytest==6.2.5
pytest-asyncio==0.16.0;python_version<"3.7"
pytest-asyncio==0.17.0;python_version>="3.7"
pytest-cov==3.0.0
pytest-httpserver==1.0.3
pytest-rerunfailures==10.2
+48 -1
View File
@@ -8,8 +8,11 @@ from _pytest.mark import Mark
from maigret.sites import MaigretDatabase
from maigret.maigret import setup_arguments_parser
from maigret.settings import Settings
from aiohttp import web
LOCAL_SERVER_PORT = 8080
CUR_PATH = os.path.dirname(os.path.realpath(__file__))
JSON_FILE = os.path.join(CUR_PATH, '../maigret/resources/data.json')
SETTINGS_FILE = os.path.join(CUR_PATH, '../maigret/resources/settings.json')
@@ -18,8 +21,28 @@ LOCAL_TEST_JSON_FILE = os.path.join(CUR_PATH, 'local.json')
empty_mark = Mark('', (), {})
RESULTS_EXAMPLE = {
'Reddit': {
'cookies': None,
'parsing_enabled': False,
'url_main': 'https://www.reddit.com/',
'username': 'Skyeng',
},
'GooglePlayStore': {
'cookies': None,
'http_status': 200,
'is_similar': False,
'parsing_enabled': False,
'rank': 1,
'url_main': 'https://play.google.com/store',
'url_user': 'https://play.google.com/store/apps/developer?id=Skyeng',
'username': 'Skyeng',
},
}
def by_slow_marker(item):
return item.get_closest_marker('slow', default=empty_mark)
return item.get_closest_marker('slow', default=empty_mark).name
def pytest_collection_modifyitems(items):
@@ -59,6 +82,13 @@ def reports_autoclean():
remove_test_reports()
@pytest.fixture(scope='session')
def settings():
settings = Settings()
settings.load([SETTINGS_FILE])
return settings
@pytest.fixture(scope='session')
def argparser():
settings = Settings()
@@ -69,3 +99,20 @@ def argparser():
@pytest.fixture(scope="session")
def httpserver_listen_address():
return ("localhost", 8989)
@pytest.fixture
async def cookie_test_server():
async def handle_cookies(request):
print(f"Received cookies: {request.cookies}")
cookies_dict = {k: v for k, v in request.cookies.items()}
return web.json_response({'cookies': cookies_dict})
app = web.Application()
app.router.add_get('/cookies', handle_cookies)
runner = web.AppRunner(app)
await runner.setup()
server = web.TCPSite(runner, port=LOCAL_SERVER_PORT)
await server.start()
yield server
await runner.cleanup()
+47 -10
View File
@@ -1,25 +1,62 @@
{
"engines": {},
"engines": {
"Discourse": {
"name": "Discourse",
"site": {
"presenseStrs": [
"<meta name=\"generator\" content=\"Discourse"
],
"absenceStrs": [
"Oops! That page doesn\u2019t exist or is private.",
"wrap not-found-container"
],
"checkType": "message",
"url": "{urlMain}/u/{username}/summary"
},
"presenseStrs": [
"<meta name=\"generator\" content=\"Discourse"
]
}
},
"sites": {
"GooglePlayStore": {
"ValidActive": {
"tags": ["global", "us"],
"disabled": false,
"checkType": "status_code",
"alexaRank": 1,
"url": "https://play.google.com/store/apps/developer?id={username}",
"urlMain": "https://play.google.com/store",
"usernameClaimed": "Facebook_nosuchname",
"usernameClaimed": "KONAMI",
"usernameUnclaimed": "noonewouldeverusethis7"
},
"Reddit": {
"tags": ["news", "social", "us"],
"InvalidActive": {
"tags": ["global", "us"],
"disabled": false,
"checkType": "status_code",
"presenseStrs": ["totalKarma"],
"alexaRank": 1,
"url": "https://play.google.com/store/apps/dev?id={username}",
"urlMain": "https://play.google.com/store",
"usernameClaimed": "KONAMI",
"usernameUnclaimed": "noonewouldeverusethis7"
},
"ValidInactive": {
"tags": ["global", "us"],
"disabled": true,
"alexaRank": 17,
"url": "https://www.reddit.com/user/{username}",
"urlMain": "https://www.reddit.com/",
"usernameClaimed": "blue",
"checkType": "status_code",
"alexaRank": 1,
"url": "https://play.google.com/store/apps/developer?id={username}",
"urlMain": "https://play.google.com/store",
"usernameClaimed": "KONAMI",
"usernameUnclaimed": "noonewouldeverusethis7"
},
"InvalidInactive": {
"tags": ["global", "us"],
"disabled": true,
"checkType": "status_code",
"alexaRank": 1,
"url": "https://play.google.com/store/apps/dev?id={username}",
"urlMain": "https://play.google.com/store",
"usernameClaimed": "KONAMI",
"usernameUnclaimed": "noonewouldeverusethis7"
}
}
+19 -17
View File
@@ -1,10 +1,13 @@
"""Maigret activation test functions"""
import json
import yarl
import aiohttp
import pytest
from mock import Mock
from tests.conftest import LOCAL_SERVER_PORT
from maigret.activation import ParsingActivator, import_aiohttp_cookies
COOKIES_TXT = """# HTTP Cookie File downloaded with cookies.txt by Genuinous @genuinous
@@ -18,39 +21,38 @@ xss.is FALSE / TRUE 0 xf_csrf test
xss.is FALSE / TRUE 1642709308 xf_user tset
.xss.is TRUE / FALSE 0 muchacho_cache test
.xss.is TRUE / FALSE 1924905600 132_evc test
httpbin.org FALSE / FALSE 0 a b
localhost FALSE / FALSE 0 a b
"""
@pytest.mark.skip(reason="periodically fails")
@pytest.mark.skip("captcha")
@pytest.mark.slow
def test_twitter_activation(default_db):
twitter_site = default_db.sites_dict['Twitter']
token1 = twitter_site.headers['x-guest-token']
def test_vimeo_activation(default_db):
vimeo_site = default_db.sites_dict['Vimeo']
token1 = vimeo_site.headers['Authorization']
ParsingActivator.twitter(twitter_site, Mock())
token2 = twitter_site.headers['x-guest-token']
ParsingActivator.vimeo(vimeo_site, Mock())
token2 = vimeo_site.headers['Authorization']
assert token1 != token2
@pytest.mark.slow
@pytest.mark.asyncio
async def test_import_aiohttp_cookies():
async def test_import_aiohttp_cookies(cookie_test_server):
cookies_filename = 'cookies_test.txt'
with open(cookies_filename, 'w') as f:
f.write(COOKIES_TXT)
cookie_jar = import_aiohttp_cookies(cookies_filename)
assert list(cookie_jar._cookies.keys()) == ['xss.is', 'httpbin.org']
url = f'http://localhost:{LOCAL_SERVER_PORT}/cookies'
url = 'https://httpbin.org/cookies'
connector = aiohttp.TCPConnector(ssl=False)
session = aiohttp.ClientSession(
connector=connector, trust_env=True, cookie_jar=cookie_jar
)
cookies = cookie_jar.filter_cookies(yarl.URL(url))
assert cookies['a'].value == 'b'
response = await session.get(url=url)
result = json.loads(await response.content.read())
await session.close()
async with aiohttp.ClientSession(cookie_jar=cookie_jar) as session:
async with session.get(url=url) as response:
result = await response.json()
print(f"Server response: {result}")
assert result == {'cookies': {'a': 'b'}}
+46 -5
View File
@@ -1,14 +1,17 @@
"""Maigret command-line arguments parsing tests"""
from argparse import Namespace
from typing import Dict, Any
DEFAULT_ARGS: Dict[str, Any] = {
'all_sites': False,
'auto_disable': False,
'connections': 100,
'cookie_file': None,
'csv': False,
'db_file': 'resources/data.json',
'debug': False,
'diagnose': False,
'disable_extracting': False,
'disable_recursive_search': False,
'folderoutput': 'reports',
@@ -23,15 +26,17 @@ DEFAULT_ARGS: Dict[str, Any] = {
'no_progressbar': False,
'parse_url': '',
'pdf': False,
'permute': False,
'print_check_errors': False,
'print_not_found': False,
'proxy': None,
'reports_sorting': 'default',
'retries': 1,
'retries': 0,
'self_check': False,
'site_list': [],
'stats': False,
'tags': '',
'exclude_tags': '',
'timeout': 30,
'tor_proxy': 'socks5://127.0.0.1:9050',
'i2p_proxy': 'http://127.0.0.1:4444',
@@ -40,6 +45,7 @@ DEFAULT_ARGS: Dict[str, Any] = {
'use_disabled_sites': False,
'username': [],
'verbose': False,
'web': None,
'with_domains': False,
'xmind': False,
}
@@ -53,7 +59,8 @@ def test_args_search_mode(argparser):
want_args = dict(DEFAULT_ARGS)
want_args.update({'username': ['username']})
assert args == Namespace(**want_args)
for arg in vars(args):
assert getattr(args, arg) == want_args[arg]
def test_args_search_mode_several_usernames(argparser):
@@ -64,7 +71,8 @@ def test_args_search_mode_several_usernames(argparser):
want_args = dict(DEFAULT_ARGS)
want_args.update({'username': ['username1', 'username2']})
assert args == Namespace(**want_args)
for arg in vars(args):
assert getattr(args, arg) == want_args[arg]
def test_args_self_check_mode(argparser):
@@ -79,7 +87,8 @@ def test_args_self_check_mode(argparser):
}
)
assert args == Namespace(**want_args)
for arg in vars(args):
assert getattr(args, arg) == want_args[arg]
def test_args_multiple_sites(argparser):
@@ -95,4 +104,36 @@ def test_args_multiple_sites(argparser):
}
)
assert args == Namespace(**want_args)
for arg in vars(args):
assert getattr(args, arg) == want_args[arg]
def test_args_exclude_tags(argparser):
args = argparser.parse_args('--exclude-tags porn,dating username'.split())
want_args = dict(DEFAULT_ARGS)
want_args.update(
{
'exclude_tags': 'porn,dating',
'username': ['username'],
}
)
for arg in vars(args):
assert getattr(args, arg) == want_args[arg]
def test_args_tags_with_exclude_tags(argparser):
args = argparser.parse_args('--tags coding --exclude-tags porn username'.split())
want_args = dict(DEFAULT_ARGS)
want_args.update(
{
'tags': 'coding',
'exclude_tags': 'porn',
'username': ['username'],
}
)
for arg in vars(args):
assert getattr(args, arg) == want_args[arg]
+5
View File
@@ -1,8 +1,10 @@
"""Maigret data test functions"""
import pytest
from maigret.utils import is_country_tag
@pytest.mark.slow
def test_tags_validity(default_db):
unknown_tags = set()
@@ -13,4 +15,7 @@ def test_tags_validity(default_db):
if tag not in tags:
unknown_tags.add(tag)
# make sure all tags are known
# if you see "unchecked" tag error, please, do
# maigret --db `pwd`/maigret/resources/data.json --self-check --tag unchecked --use-disabled-sites
assert unknown_tags == set()
+58
View File
@@ -0,0 +1,58 @@
import pytest
from maigret.errors import notify_about_errors, CheckError
from maigret.types import QueryResultWrapper
from maigret.result import MaigretCheckResult, MaigretCheckStatus
def test_notify_about_errors():
results = {
'site1': {
'status': MaigretCheckResult(
'', '', '', MaigretCheckStatus.UNKNOWN, error=CheckError('Captcha')
)
},
'site2': {
'status': MaigretCheckResult(
'',
'',
'',
MaigretCheckStatus.UNKNOWN,
error=CheckError('Bot protection'),
)
},
'site3': {
'status': MaigretCheckResult(
'',
'',
'',
MaigretCheckStatus.UNKNOWN,
error=CheckError('Access denied'),
)
},
'site4': {
'status': MaigretCheckResult(
'', '', '', MaigretCheckStatus.CLAIMED, error=None
)
},
}
results = notify_about_errors(results, query_notify=None, show_statistics=True)
# Check the output
expected_output = [
(
'Too many errors of type "Captcha" (25.0%). Try to switch to another ip address or to use service cookies',
'!',
),
(
'Too many errors of type "Bot protection" (25.0%). Try to switch to another ip address',
'!',
),
('Too many errors of type "Access denied" (25.0%)', '!'),
('Verbose error statistics:', '-'),
('Captcha: 25.0%', '!'),
('Bot protection: 25.0%', '!'),
('Access denied: 25.0%', '!'),
('You can see detailed site check errors with a flag `--print-errors`', '-'),
]
assert results == expected_output
+38 -3
View File
@@ -1,4 +1,5 @@
"""Maigret checking logic test functions"""
import pytest
import asyncio
import logging
@@ -7,6 +8,7 @@ from maigret.executors import (
AsyncioProgressbarExecutor,
AsyncioProgressbarSemaphoreExecutor,
AsyncioProgressbarQueueExecutor,
AsyncioQueueGeneratorExecutor,
)
logger = logging.getLogger(__name__)
@@ -48,6 +50,7 @@ async def test_asyncio_progressbar_semaphore_executor():
assert executor.execution_time < 0.4
@pytest.mark.slow
@pytest.mark.asyncio
async def test_asyncio_progressbar_queue_executor():
tasks = [(func, [n], {}) for n in range(10)]
@@ -55,12 +58,12 @@ async def test_asyncio_progressbar_queue_executor():
executor = AsyncioProgressbarQueueExecutor(logger=logger, in_parallel=2)
assert await executor.run(tasks) == [0, 1, 3, 2, 4, 6, 7, 5, 9, 8]
assert executor.execution_time > 0.5
assert executor.execution_time < 0.6
assert executor.execution_time < 0.7
executor = AsyncioProgressbarQueueExecutor(logger=logger, in_parallel=3)
assert await executor.run(tasks) == [0, 3, 1, 4, 6, 2, 7, 9, 5, 8]
assert executor.execution_time > 0.4
assert executor.execution_time < 0.5
assert executor.execution_time < 0.6
executor = AsyncioProgressbarQueueExecutor(logger=logger, in_parallel=5)
assert await executor.run(tasks) in (
@@ -68,9 +71,41 @@ async def test_asyncio_progressbar_queue_executor():
[0, 3, 6, 1, 4, 9, 7, 2, 5, 8],
)
assert executor.execution_time > 0.3
assert executor.execution_time < 0.4
assert executor.execution_time < 0.5
executor = AsyncioProgressbarQueueExecutor(logger=logger, in_parallel=10)
assert await executor.run(tasks) == [0, 3, 6, 9, 1, 4, 7, 2, 5, 8]
assert executor.execution_time > 0.2
assert executor.execution_time < 0.4
@pytest.mark.asyncio
async def test_asyncio_queue_generator_executor():
tasks = [(func, [n], {}) for n in range(10)]
executor = AsyncioQueueGeneratorExecutor(logger=logger, in_parallel=2)
results = [result async for result in executor.run(tasks)]
assert results == [0, 1, 3, 2, 4, 6, 7, 5, 9, 8]
assert executor.execution_time > 0.5
assert executor.execution_time < 0.6
executor = AsyncioQueueGeneratorExecutor(logger=logger, in_parallel=3)
results = [result async for result in executor.run(tasks)]
assert results == [0, 3, 1, 4, 6, 2, 7, 9, 5, 8]
assert executor.execution_time > 0.4
assert executor.execution_time < 0.5
executor = AsyncioQueueGeneratorExecutor(logger=logger, in_parallel=5)
results = [result async for result in executor.run(tasks)]
assert results in (
[0, 3, 6, 1, 4, 7, 9, 2, 5, 8],
[0, 3, 6, 1, 4, 9, 7, 2, 5, 8],
)
assert executor.execution_time > 0.3
assert executor.execution_time < 0.4
executor = AsyncioQueueGeneratorExecutor(logger=logger, in_parallel=10)
results = [result async for result in executor.run(tasks)]
assert results == [0, 3, 6, 9, 1, 4, 7, 2, 5, 8]
assert executor.execution_time > 0.2
assert executor.execution_time < 0.3
+22 -77
View File
@@ -1,4 +1,5 @@
"""Maigret main module test functions"""
import asyncio
import copy
@@ -11,90 +12,33 @@ from maigret.maigret import (
extract_ids_from_results,
)
from maigret.sites import MaigretSite
from maigret.result import QueryResult, QueryStatus
RESULTS_EXAMPLE = {
'Reddit': {
'cookies': None,
'parsing_enabled': False,
'url_main': 'https://www.reddit.com/',
'username': 'Skyeng',
},
'GooglePlayStore': {
'cookies': None,
'http_status': 200,
'is_similar': False,
'parsing_enabled': False,
'rank': 1,
'url_main': 'https://play.google.com/store',
'url_user': 'https://play.google.com/store/apps/developer?id=Skyeng',
'username': 'Skyeng',
},
}
from maigret.result import MaigretCheckResult, MaigretCheckStatus
from tests.conftest import RESULTS_EXAMPLE
@pytest.mark.slow
def test_self_check_db_positive_disable(test_db):
logger = Mock()
assert test_db.sites[0].disabled is False
loop = asyncio.get_event_loop()
loop.run_until_complete(
self_check(test_db, test_db.sites_dict, logger, silent=True)
)
assert test_db.sites[0].disabled is True
@pytest.mark.slow
def test_self_check_db_positive_enable(test_db):
@pytest.mark.asyncio
async def test_self_check_db(test_db):
# initalize logger to debug
logger = Mock()
test_db.sites[0].disabled = True
test_db.sites[0].username_claimed = 'Skyeng'
assert test_db.sites[0].disabled is True
assert test_db.sites_dict['InvalidActive'].disabled is False
assert test_db.sites_dict['ValidInactive'].disabled is True
assert test_db.sites_dict['ValidActive'].disabled is False
assert test_db.sites_dict['InvalidInactive'].disabled is True
loop = asyncio.get_event_loop()
loop.run_until_complete(
self_check(test_db, test_db.sites_dict, logger, silent=True)
await self_check(
test_db, test_db.sites_dict, logger, silent=False, auto_disable=True
)
assert test_db.sites[0].disabled is False
@pytest.mark.slow
def test_self_check_db_negative_disabled(test_db):
logger = Mock()
test_db.sites[0].disabled = True
assert test_db.sites[0].disabled is True
loop = asyncio.get_event_loop()
loop.run_until_complete(
self_check(test_db, test_db.sites_dict, logger, silent=True)
)
assert test_db.sites[0].disabled is True
@pytest.mark.slow
def test_self_check_db_negative_enabled(test_db):
logger = Mock()
test_db.sites[0].disabled = False
test_db.sites[0].username_claimed = 'Skyeng'
assert test_db.sites[0].disabled is False
loop = asyncio.get_event_loop()
loop.run_until_complete(
self_check(test_db, test_db.sites_dict, logger, silent=True)
)
assert test_db.sites[0].disabled is False
assert test_db.sites_dict['InvalidActive'].disabled is True
assert test_db.sites_dict['ValidInactive'].disabled is False
assert test_db.sites_dict['ValidActive'].disabled is False
assert test_db.sites_dict['InvalidInactive'].disabled is True
@pytest.mark.slow
@pytest.mark.skip(reason="broken, fixme")
def test_maigret_results(test_db):
logger = Mock()
@@ -125,12 +69,12 @@ def test_maigret_results(test_db):
del results['GooglePlayStore']['site']
reddit_status = results['Reddit']['status']
assert isinstance(reddit_status, QueryResult)
assert reddit_status.status == QueryStatus.ILLEGAL
assert isinstance(reddit_status, MaigretCheckResult)
assert reddit_status.status == MaigretCheckStatus.ILLEGAL
playstore_status = results['GooglePlayStore']['status']
assert isinstance(playstore_status, QueryResult)
assert playstore_status.status == QueryStatus.CLAIMED
assert isinstance(playstore_status, MaigretCheckResult)
assert playstore_status.status == MaigretCheckStatus.CLAIMED
del results['Reddit']['status']
del results['GooglePlayStore']['status']
@@ -142,6 +86,7 @@ def test_maigret_results(test_db):
assert results == RESULTS_EXAMPLE
@pytest.mark.slow
def test_extract_ids_from_url(default_db):
assert default_db.extract_ids_from_url('https://www.reddit.com/user/test') == {
'test': 'username'
+9 -9
View File
@@ -1,6 +1,6 @@
from maigret.errors import CheckError
from maigret.notify import QueryNotifyPrint
from maigret.result import QueryStatus, QueryResult
from maigret.result import MaigretCheckStatus, MaigretCheckResult
def test_notify_illegal():
@@ -8,9 +8,9 @@ def test_notify_illegal():
assert (
n.update(
QueryResult(
MaigretCheckResult(
username="test",
status=QueryStatus.ILLEGAL,
status=MaigretCheckStatus.ILLEGAL,
site_name="TEST_SITE",
site_url_user="http://example.com/test",
)
@@ -24,9 +24,9 @@ def test_notify_claimed():
assert (
n.update(
QueryResult(
MaigretCheckResult(
username="test",
status=QueryStatus.CLAIMED,
status=MaigretCheckStatus.CLAIMED,
site_name="TEST_SITE",
site_url_user="http://example.com/test",
)
@@ -40,9 +40,9 @@ def test_notify_available():
assert (
n.update(
QueryResult(
MaigretCheckResult(
username="test",
status=QueryStatus.AVAILABLE,
status=MaigretCheckStatus.AVAILABLE,
site_name="TEST_SITE",
site_url_user="http://example.com/test",
)
@@ -53,9 +53,9 @@ def test_notify_available():
def test_notify_unknown():
n = QueryNotifyPrint(color=False)
result = QueryResult(
result = MaigretCheckResult(
username="test",
status=QueryStatus.UNKNOWN,
status=MaigretCheckStatus.UNKNOWN,
site_name="TEST_SITE",
site_url_user="http://example.com/test",
)
+50
View File
@@ -0,0 +1,50 @@
import pytest
from maigret.permutator import Permute
def test_gather_strict():
elements = {'a': 1, 'b': 2}
permute = Permute(elements)
result = permute.gather(method="strict")
expected = {
'a_b': 1,
'b_a': 2,
'a-b': 1,
'b-a': 2,
'a.b': 1,
'b.a': 2,
'ab': 1,
'ba': 2,
'_ab': 1,
'ab_': 1,
'_ba': 2,
'ba_': 2,
}
assert result == expected
def test_gather_all():
elements = {'a': 1, 'b': 2}
permute = Permute(elements)
result = permute.gather(method="all")
expected = {
'a': 1,
'_a': 1,
'a_': 1,
'b': 2,
'_b': 2,
'b_': 2,
'a_b': 1,
'b_a': 2,
'a-b': 1,
'b-a': 2,
'a.b': 1,
'b.a': 2,
'ab': 1,
'ba': 2,
'_ab': 1,
'ab_': 1,
'_ba': 2,
'ba_': 2,
}
assert result == expected
+8 -5
View File
@@ -1,7 +1,9 @@
"""Maigret reports test functions"""
import copy
import json
import os
import pytest
from io import StringIO
import xmind
@@ -18,12 +20,12 @@ from maigret.report import (
generate_json_report,
get_plaintext_report,
)
from maigret.result import QueryResult, QueryStatus
from maigret.result import MaigretCheckResult, MaigretCheckStatus
from maigret.sites import MaigretSite
GOOD_RESULT = QueryResult('', '', '', QueryStatus.CLAIMED)
BAD_RESULT = QueryResult('', '', '', QueryStatus.AVAILABLE)
GOOD_RESULT = MaigretCheckResult('', '', '', MaigretCheckStatus.CLAIMED)
BAD_RESULT = MaigretCheckResult('', '', '', MaigretCheckStatus.AVAILABLE)
EXAMPLE_RESULTS = {
'GitHub': {
@@ -31,11 +33,11 @@ EXAMPLE_RESULTS = {
'parsing_enabled': True,
'url_main': 'https://www.github.com/',
'url_user': 'https://www.github.com/test',
'status': QueryResult(
'status': MaigretCheckResult(
'test',
'GitHub',
'https://www.github.com/test',
QueryStatus.CLAIMED,
MaigretCheckStatus.CLAIMED,
tags=['test_tag'],
),
'http_status': 200,
@@ -424,6 +426,7 @@ def test_html_report_broken():
assert SUPPOSED_BROKEN_INTERESTS in report_text
@pytest.mark.skip(reason='connection reset, fixme')
def test_pdf_report():
report_name = 'report_test.pdf'
context = generate_report_context(TEST)
+53
View File
@@ -0,0 +1,53 @@
import unittest
from unittest.mock import patch, mock_open
from maigret.settings import Settings
class TestSettings(unittest.TestCase):
@patch('json.load')
@patch('builtins.open', new_callable=mock_open)
def test_settings_cascade_and_override(self, mock_file, mock_json_load):
file1_data = {"timeout": 10, "retries_count": 3, "proxy_url": "http://proxy1"}
file2_data = {"timeout": 20, "recursive_search": True}
file3_data = {"proxy_url": "http://proxy3", "print_not_found": False}
mock_json_load.side_effect = [file1_data, file2_data, file3_data]
settings = Settings()
paths = ['file1.json', 'file2.json', 'file3.json']
was_inited, msg = settings.load(paths)
self.assertTrue(was_inited)
self.assertEqual(settings.retries_count, 3)
self.assertEqual(settings.timeout, 20)
self.assertTrue(settings.recursive_search)
self.assertEqual(settings.proxy_url, "http://proxy3")
self.assertFalse(settings.print_not_found)
@patch('builtins.open')
def test_settings_file_not_found(self, mock_open_func):
mock_open_func.side_effect = FileNotFoundError()
settings = Settings()
paths = ['nonexistent.json']
was_inited, msg = settings.load(paths)
self.assertFalse(was_inited)
self.assertIn('None of the default settings files found', msg)
@patch('json.load')
@patch('builtins.open', new_callable=mock_open)
def test_settings_invalid_json(self, mock_file, mock_json_load):
mock_json_load.side_effect = ValueError("Expecting value")
settings = Settings()
paths = ['invalid.json']
was_inited, msg = settings.load(paths)
self.assertFalse(was_inited)
self.assertIsInstance(msg, ValueError)
self.assertIn('Problem with parsing json contents', str(msg))
+109
View File
@@ -1,4 +1,5 @@
"""Maigret Database test functions"""
from maigret.sites import MaigretDatabase, MaigretSite
EXAMPLE_DB = {
@@ -181,6 +182,97 @@ def test_ranked_sites_dict_id_type():
assert len(db.ranked_sites_dict(id_type='gaia_id')) == 1
def test_ranked_sites_dict_excluded_tags():
db = MaigretDatabase()
db.update_site(MaigretSite('3', {'alexaRank': 1000, 'engine': 'ucoz'}))
db.update_site(MaigretSite('1', {'alexaRank': 2, 'tags': ['forum']}))
db.update_site(MaigretSite('2', {'alexaRank': 10, 'tags': ['ru', 'forum']}))
# excluding by tag
assert list(db.ranked_sites_dict(excluded_tags=['ru']).keys()) == ['1', '3']
assert list(db.ranked_sites_dict(excluded_tags=['forum']).keys()) == ['3']
# excluding by engine
assert list(db.ranked_sites_dict(excluded_tags=['ucoz']).keys()) == ['1', '2']
# combining include and exclude tags
assert list(db.ranked_sites_dict(tags=['forum'], excluded_tags=['ru']).keys()) == ['1']
# excluding non-existent tag has no effect
assert list(db.ranked_sites_dict(excluded_tags=['nonexistent']).keys()) == ['1', '2', '3']
# exclude all
assert list(db.ranked_sites_dict(excluded_tags=['forum', 'ucoz']).keys()) == []
def test_ranked_sites_dict_excluded_tags_with_top():
"""Excluded tags should also prevent mirrors from being included."""
db = MaigretDatabase()
db.update_site(
MaigretSite('Parent', {'alexaRank': 1, 'tags': ['forum'], 'type': 'username'})
)
db.update_site(
MaigretSite('Mirror', {'alexaRank': 999999, 'source': 'Parent', 'tags': ['forum'], 'type': 'username'})
)
db.update_site(
MaigretSite('Other', {'alexaRank': 2, 'tags': ['coding'], 'type': 'username'})
)
# Without exclusion, mirror should be included
result = db.ranked_sites_dict(top=1, id_type='username')
assert 'Parent' in result
assert 'Mirror' in result
# With exclusion of 'forum', both Parent and Mirror should be excluded
result = db.ranked_sites_dict(top=2, excluded_tags=['forum'], id_type='username')
assert 'Parent' not in result
assert 'Mirror' not in result
assert 'Other' in result
def test_ranked_sites_dict_mirrors_disabled_parent():
"""Mirror is included when parent ranks in top N but parent is disabled."""
db = MaigretDatabase()
db.update_site(
MaigretSite(
'ParentPlatform',
{'alexaRank': 5, 'disabled': True, 'type': 'username'},
)
)
db.update_site(
MaigretSite(
'OtherSite',
{'alexaRank': 100, 'type': 'username'},
)
)
db.update_site(
MaigretSite(
'MirrorSite',
{
'alexaRank': 99999999,
'source': 'ParentPlatform',
'type': 'username',
},
)
)
result = db.ranked_sites_dict(top=1, disabled=False, id_type='username')
assert list(result.keys()) == ['OtherSite', 'MirrorSite']
def test_ranked_sites_dict_mirrors_no_extra_without_parent_in_top():
db = MaigretDatabase()
db.update_site(MaigretSite('A', {'alexaRank': 1, 'type': 'username'}))
db.update_site(
MaigretSite(
'B',
{'alexaRank': 2, 'source': 'NotInDb', 'type': 'username'},
)
)
assert list(db.ranked_sites_dict(top=1, id_type='username').keys()) == ['A']
def test_get_url_template():
site = MaigretSite(
"test",
@@ -202,3 +294,20 @@ def test_get_url_template():
},
)
assert site.get_url_template() == "SUBDOMAIN"
def test_has_site_url_or_name(default_db):
# by the same url or partial match
assert default_db.has_site("https://aback.com.ua/user/") == True
assert default_db.has_site("https://aback.com.ua") == True
# acceptable partial match
assert default_db.has_site("https://aback.com.ua/use") == True
assert default_db.has_site("https://aback.com") == True
# by name
assert default_db.has_site("Aback") == True
# false
assert default_db.has_site("https://aeifgoai3h4g8a3u4g5") == False
assert default_db.has_site("aeifgoai3h4g8a3u4g5") == False
+360
View File
@@ -0,0 +1,360 @@
import re
import pytest
from unittest.mock import MagicMock, patch
from maigret.submit import Submitter
from aiohttp import ClientSession
from maigret.sites import MaigretDatabase, MaigretSite
import logging
@pytest.mark.slow
@pytest.mark.asyncio
async def test_detect_known_engine(test_db, local_test_db):
# Use the database fixture instead of mocking
mock_db = test_db
mock_settings = MagicMock()
mock_logger = MagicMock()
mock_args = MagicMock()
mock_args.cookie_file = ""
mock_args.proxy = ""
# Mock the supposed usernames
mock_settings.supposed_usernames = ["adam"]
# Create the Submitter instance
submitter = Submitter(test_db, mock_settings, mock_logger, mock_args)
# Call the method with test URLs
url_exists = "https://devforum.zoom.us/u/adam"
url_mainpage = "https://devforum.zoom.us/"
# Mock extract_username_dialog to return "adam"
submitter.extract_username_dialog = MagicMock(return_value="adam")
sites, resp_text = await submitter.detect_known_engine(
url_exists, url_mainpage, session=None, follow_redirects=False, headers=None
)
# Assertions
assert len(sites) == 2
assert sites[0].name == "devforum.zoom.us"
assert sites[0].url_main == "https://devforum.zoom.us/"
assert sites[0].engine == "Discourse"
assert sites[0].username_claimed == "adam"
assert sites[0].username_unclaimed == "noonewouldeverusethis7"
assert resp_text != ""
await submitter.close()
# Create the Submitter instance without engines
submitter = Submitter(local_test_db, mock_settings, mock_logger, mock_args)
sites, resp_text = await submitter.detect_known_engine(
url_exists, url_mainpage, session=None, follow_redirects=False, headers=None
)
assert len(sites) == 0
await submitter.close()
@pytest.mark.slow
@pytest.mark.asyncio
async def test_check_features_manually_success(settings):
# Setup
db = MaigretDatabase()
logger = logging.getLogger("test_logger")
args = type(
'Args', (object,), {'proxy': None, 'cookie_file': None, 'verbose': False}
)()
submitter = Submitter(db, settings, logger, args)
username = "KONAMI"
url_exists = "https://play.google.com/store/apps/developer?id=KONAMI"
# Execute
presence_list, absence_list, status, random_username = (
await submitter.check_features_manually(
username=username,
url_exists=url_exists,
session=ClientSession(),
follow_redirects=False,
headers=None,
)
)
await submitter.close()
# Assert
assert status == "Found", "Expected status to be 'Found'"
assert isinstance(presence_list, list), "Presence list should be a list"
assert isinstance(absence_list, list), "Absence list should be a list"
assert isinstance(random_username, str), "Random username should be a string"
assert (
random_username != username
), "Random username should not be the same as the input username"
assert sorted(presence_list) == sorted(
[
' title=',
'og:title',
'display: none;',
'4;0',
'main-title',
]
)
assert sorted(absence_list) == sorted(
[
' body {',
' </style>',
'><title>Not Found</title>',
' <style nonce=',
' .rounded {',
]
)
@pytest.mark.slow
@pytest.mark.asyncio
async def test_check_features_manually_success(settings):
# Setup
db = MaigretDatabase()
logger = logging.getLogger("test_logger")
args = type(
'Args', (object,), {'proxy': None, 'cookie_file': None, 'verbose': False}
)()
submitter = Submitter(db, settings, logger, args)
username = "abel"
url_exists = "https://community.cloudflare.com/badges/1/basic?username=abel"
# Execute
presence_list, absence_list, status, random_username = (
await submitter.check_features_manually(
username=username,
url_exists=url_exists,
session=ClientSession(),
follow_redirects=False,
headers=None,
)
)
await submitter.close()
# Assert
assert status == "Cloudflare detected, skipping"
assert presence_list is None
assert absence_list is None
assert random_username != username
@pytest.mark.slow
@pytest.mark.asyncio
async def test_dialog_adds_site_positive(settings):
# Initialize necessary objects
db = MaigretDatabase()
logger = logging.getLogger("test_logger")
logger.setLevel(logging.INFO)
args = type(
'Args',
(object,),
{
'proxy': None,
'cookie_file': None,
'verbose': False,
'db_file': 'test_db.json',
'db': 'test_db.json',
},
)()
submitter = Submitter(db, settings, logger, args)
# Mock user inputs
user_inputs = [
'KONAMI', # Manually input username
'y', # Save the site in the Maigret DB
'GooglePlayStore', # Custom site name
'', # no custom tags
]
with patch('builtins.input', side_effect=user_inputs):
result = await submitter.dialog(
"https://play.google.com/store/apps/developer?id=KONAMI", None
)
await submitter.close()
assert result is True
assert len(db.sites) == 1
site = db.sites[0]
assert site.url_main == "https://play.google.com"
assert site.name == "GooglePlayStore"
assert site.tags == []
assert site.presense_strs != []
assert site.absence_strs != []
assert site.username_claimed == "KONAMI"
assert site.check_type == "message"
@pytest.mark.slow
@pytest.mark.asyncio
async def test_dialog_replace_site(settings, test_db):
# Initialize necessary objects
db = test_db
logger = logging.getLogger("test_logger")
logger.setLevel(logging.DEBUG)
args = type(
'Args',
(object,),
{
'proxy': None,
'cookie_file': None,
'verbose': False,
'db_file': 'test_db.json',
'db': 'test_db.json',
},
)()
assert len(db.sites) == 4
submitter = Submitter(db, settings, logger, args)
# Mock user inputs
user_inputs = [
'y', # Similar sites found, continue
'InvalidActive', # Choose site to replace
'', # Custom headers
'y', # Should we do redirects automatically?
'KONAMI', # Manually input username
'y', # Save the site in the Maigret DB
'', # Custom site name
'', # no custom tags
]
with patch('builtins.input', side_effect=user_inputs):
result = await submitter.dialog(
"https://play.google.com/store/apps/developer?id=KONAMI", None
)
await submitter.close()
assert result is True
assert len(db.sites) == 4
site = db.sites_dict["InvalidActive"]
assert site.name == "InvalidActive"
assert site.url_main == "https://play.google.com"
assert site.tags == ['global', 'us']
assert site.presense_strs != []
assert site.absence_strs != []
assert site.username_claimed == "KONAMI"
assert site.check_type == "message"
@pytest.mark.slow
@pytest.mark.asyncio
async def test_dialog_adds_site_negative(settings):
# Initialize necessary objects
db = MaigretDatabase()
logger = logging.getLogger("test_logger")
logger.setLevel(logging.INFO)
args = type(
'Args',
(object,),
{
'proxy': None,
'cookie_file': None,
'verbose': False,
'db_file': 'test_db.json',
'db': 'test_db.json',
},
)()
submitter = Submitter(db, settings, logger, args)
# Mock user inputs
user_inputs = [
'sokrat', # Manually input username
'y', # Save the site in the Maigret DB
]
with patch('builtins.input', side_effect=user_inputs):
result = await submitter.dialog("https://icq.com/sokrat", None)
await submitter.close()
assert result is False
def test_domain_matching_exact():
"""Test that domain matching uses proper boundary checks, not substring matching.
x.com should NOT match sites like 500px.com, mix.com, etc.
"""
domain_raw = "x.com"
domain_re = re.compile(
r'://(www\.)?' + re.escape(domain_raw) + r'(/|$)'
)
# These should NOT match x.com
non_matching = [
MaigretSite("500px", {"url": "https://500px.com/p/{username}", "urlMain": "https://500px.com/"}),
MaigretSite("Mix", {"url": "https://mix.com/{username}", "urlMain": "https://mix.com"}),
MaigretSite("Screwfix", {"url": "{urlMain}{urlSubpath}/members/?username={username}", "urlMain": "https://community.screwfix.com"}),
MaigretSite("Wix", {"url": "https://{username}.wix.com", "urlMain": "https://wix.com/"}),
MaigretSite("1x", {"url": "https://1x.com/{username}", "urlMain": "https://1x.com"}),
MaigretSite("Roblox", {"url": "https://www.roblox.com/user.aspx?username={username}", "urlMain": "https://www.roblox.com/"}),
]
for site in non_matching:
assert not domain_re.search(site.url_main + site.url), \
f"x.com should NOT match site {site.name} ({site.url_main})"
def test_domain_matching_positive():
"""Test that domain matching correctly matches the exact domain."""
domain_raw = "x.com"
domain_re = re.compile(
r'://(www\.)?' + re.escape(domain_raw) + r'(/|$)'
)
# These SHOULD match x.com
matching = [
MaigretSite("X", {"url": "https://x.com/{username}", "urlMain": "https://x.com"}),
MaigretSite("X-www", {"url": "https://www.x.com/{username}", "urlMain": "https://www.x.com"}),
]
for site in matching:
assert domain_re.search(site.url_main + site.url), \
f"x.com SHOULD match site {site.name} ({site.url_main})"
def test_dialog_nonexistent_site_name_no_crash():
"""Test that entering a site name not in the matched list doesn't crash.
This tests the fix for: AttributeError: 'NoneType' object has no attribute 'name'
The old_site should be None when user enters a name not in matched_sites,
and the code should handle it gracefully.
"""
# Simulate the logic that was crashing
matched_sites = [
MaigretSite("ValidActive", {"url": "https://example.com/{username}", "urlMain": "https://example.com"}),
MaigretSite("InvalidActive", {"url": "https://example.com/alt/{username}", "urlMain": "https://example.com"}),
]
site_name = "NonExistentSite"
old_site = next(
(site for site in matched_sites if site.name == site_name), None
)
# This is what the old code did - it would crash here
assert old_site is None
# The fix: check before accessing .name
if old_site is None:
result = "not found"
else:
result = old_site.name
assert result == "not found"
# And when site_name IS in matched_sites, it should work
site_name = "ValidActive"
old_site = next(
(site for site in matched_sites if site.name == site_name), None
)
assert old_site is not None
assert old_site.name == "ValidActive"
+63
View File
@@ -0,0 +1,63 @@
"""Tests for the Twitter / X site entry and GraphQL probe."""
import re
import pytest
import requests
from maigret.sites import MaigretSite
def _twitter_site(site: MaigretSite) -> None:
assert site.name == "Twitter"
assert site.disabled is False
assert site.check_type == "message"
assert site.url_probe and "{username}" in site.url_probe
assert "UserByScreenName" in site.url_probe or "graphql" in site.url_probe
assert site.regex_check
assert re.fullmatch(site.regex_check, site.username_claimed)
assert re.fullmatch(site.regex_check, site.username_unclaimed)
assert site.absence_strs
assert site.activation.get("method") == "twitter"
assert site.activation.get("url")
assert "authorization" in {k.lower() for k in site.headers.keys()}
def test_twitter_site_entry_config(default_db):
"""Twitter entry in data.json must define probe URL, regex, and activation."""
site = default_db.sites_dict["Twitter"]
assert isinstance(site, MaigretSite)
_twitter_site(site)
@pytest.mark.slow
def test_twitter_graphql_probe_claimed_vs_unclaimed(default_db):
"""
Live check: guest activation + UserByScreenName GraphQL returns a user for
usernameClaimed and no user for usernameUnclaimed (same flow as urlProbe).
"""
site = default_db.sites_dict["Twitter"]
_twitter_site(site)
headers = dict(site.headers)
headers.pop("x-guest-token", None)
act = requests.post(site.activation["url"], headers=headers, timeout=45)
assert act.status_code == 200, act.text[:500]
body = act.json()
assert "guest_token" in body
headers["x-guest-token"] = body["guest_token"]
def fetch(username: str) -> dict:
url = site.url_probe.format(username=username)
resp = requests.get(url, headers=headers, timeout=45)
resp.raise_for_status()
return resp.json()
claimed_json = fetch(site.username_claimed)
assert "data" in claimed_json
assert claimed_json["data"].get("user") is not None
unclaimed_json = fetch(site.username_unclaimed)
data = unclaimed_json.get("data") or {}
assert data == {} or data.get("user") is None
+1
View File
@@ -1,4 +1,5 @@
"""Maigret utils test functions"""
import itertools
import re
View File
+11 -5
View File
@@ -3,7 +3,7 @@ import random
from argparse import ArgumentParser, RawDescriptionHelpFormatter
from maigret.maigret import MaigretDatabase
from maigret.submit import get_alexa_rank
from maigret.submit import Submitter
def update_tags(site):
@@ -22,7 +22,7 @@ def update_tags(site):
site.disabled = True
print(f'Old alexa rank: {site.alexa_rank}')
rank = get_alexa_rank(site.url_main)
rank = Submitter.get_alexa_rank(site.url_main)
if rank:
print(f'New alexa rank: {rank}')
site.alexa_rank = rank
@@ -36,6 +36,7 @@ if __name__ == '__main__':
parser.add_argument("--base","-b", metavar="BASE_FILE",
dest="base_file", default="maigret/resources/data.json",
help="JSON file with sites data to update.")
parser.add_argument("--name", help="Name of site to check")
pool = list()
@@ -45,12 +46,17 @@ if __name__ == '__main__':
db.load_from_file(args.base_file).sites
while True:
site = random.choice(db.sites)
if args.name:
sites = list(db.ranked_sites_dict(names=[args.name]).values())
site = random.choice(sites)
else:
site = random.choice(db.sites)
if site.engine == 'uCoz':
continue
if not 'in' in site.tags:
continue
# if not 'in' in site.tags:
# continue
update_tags(site)
+144
View File
@@ -0,0 +1,144 @@
#!/usr/bin/env python3
"""Maigret: Supported Site Listing with Alexa ranking and country tags
This module generates the listing of supported sites in file `SITES.md`
and pretty prints file with sites data.
"""
import asyncio
import json
import logging
from argparse import ArgumentParser, RawDescriptionHelpFormatter
from maigret.maigret import get_response
from maigret.sites import MaigretDatabase, MaigretEngine
async def check_engine_of_site(site_name, sites_with_engines, future, engine_name, semaphore, logger):
async with semaphore:
response = await get_response(request_future=future,
site_name=site_name,
logger=logger)
html_text, status_code, error_text, expection_text = response
if html_text and engine_name in html_text:
sites_with_engines.append(site_name)
return True
return False
if __name__ == '__main__':
parser = ArgumentParser(formatter_class=RawDescriptionHelpFormatter
)
parser.add_argument("--base","-b", metavar="BASE_FILE",
dest="base_file", default="maigret/resources/data.json",
help="JSON file with sites data to update.")
parser.add_argument('--engine', '-e', help='check only selected engine', type=str)
args = parser.parse_args()
log_level = logging.INFO
logging.basicConfig(
format='[%(filename)s:%(lineno)d] %(levelname)-3s %(asctime)s %(message)s',
datefmt='%H:%M:%S',
level=log_level
)
logger = logging.getLogger('engines-check')
logger.setLevel(log_level)
db = MaigretDatabase()
sites_subset = db.load_from_file(args.base_file).sites
sites = {site.name: site for site in sites_subset}
with open(args.base_file, "r", encoding="utf-8") as data_file:
sites_info = json.load(data_file)
engines = sites_info['engines']
for engine_name, engine_data in engines.items():
if args.engine and args.engine != engine_name:
continue
if not 'presenseStrs' in engine_data:
print(f'No features to automatically detect sites on engine {engine_name}')
continue
engine_obj = MaigretEngine(engine_name, engine_data)
# setup connections for checking both engine and usernames
connector = aiohttp.TCPConnector(ssl=False)
connector.verify_ssl=False
session = aiohttp.ClientSession(connector=connector)
sem = asyncio.Semaphore(100)
loop = asyncio.get_event_loop()
tasks = []
# check sites without engine if they look like sites on this engine
new_engine_sites = []
for site_name, site_data in sites.items():
if site_data.engine:
continue
future = session.get(url=site_data.url_main,
allow_redirects=True,
timeout=10,
)
check_engine_coro = check_engine_of_site(site_name, new_engine_sites, future, engine_name, sem, logger)
future = asyncio.ensure_future(check_engine_coro)
tasks.append(future)
# progress bar
with alive_progress(len(tasks), title='Checking sites') as progress:
for f in asyncio.as_completed(tasks):
loop.run_until_complete(f)
progress()
print(f'Total detected {len(new_engine_sites)} sites on engine {engine_name}')
# dict with new found engine sites
new_sites = {site_name: sites[site_name] for site_name in new_engine_sites}
# update sites obj from engine
for site_name, site in new_sites.items():
site.request_future = None
site.engine = engine_name
site.update_from_engine(engine_obj)
async def update_site_data(site_name, site_data, all_sites, logger, no_progressbar):
updates = await site_self_check(site_name, site_data, logger, no_progressbar)
all_sites[site_name].update(updates)
tasks = []
# for new_site_name, new_site_data in new_sites.items():
# coro = update_site_data(new_site_name, new_site_data, new_sites, logger)
# future = asyncio.ensure_future(coro)
# tasks.append(future)
# asyncio.gather(*tasks)
for new_site_name, new_site_data in new_sites.items():
coro = update_site_data(new_site_name, new_site_data, new_sites, logger, no_progressbar=True)
loop.run_until_complete(coro)
updated_sites_count = 0
for s in new_sites:
site = new_sites[s]
site.request_future = None
if site.disabled:
print(f'{site.name} failed username checking of engine {engine_name}')
continue
site = site.strip_engine_data()
db.update_site(site)
updated_sites_count += 1
db.save_to_file(args.base_file)
print(f'Site "{s}": ' + json.dumps(site.json, indent=4))
print(f'Updated total {updated_sites_count} sites!')
print(f'Checking all sites on engine {engine_name}')
loop.run_until_complete(session.close())
print("\nFinished updating supported site listing!")
+480
View File
@@ -0,0 +1,480 @@
#!/usr/bin/env python3
"""
Mass site checking utility for Maigret development.
Check top-N sites from data.json and generate a report.
Usage:
python utils/check_top_n.py --top 100 # Check top 100 sites
python utils/check_top_n.py --top 50 --parallel 10 # Check with 10 parallel requests
python utils/check_top_n.py --top 100 --output report.json
python utils/check_top_n.py --top 100 --fix # Auto-fix simple issues
"""
import argparse
import asyncio
import json
import sys
import time
from collections import defaultdict
from dataclasses import dataclass, field, asdict
from pathlib import Path
from typing import Dict, List, Optional, Tuple
# Add parent dir for imports
sys.path.insert(0, str(Path(__file__).parent.parent))
try:
import aiohttp
except ImportError:
print("aiohttp not installed. Run: pip install aiohttp")
sys.exit(1)
class Colors:
RED = "\033[91m"
GREEN = "\033[92m"
YELLOW = "\033[93m"
BLUE = "\033[94m"
CYAN = "\033[96m"
RESET = "\033[0m"
BOLD = "\033[1m"
def color(text: str, c: str) -> str:
return f"{c}{text}{Colors.RESET}"
@dataclass
class SiteCheckResult:
"""Result of checking a single site."""
site_name: str
alexa_rank: int
disabled: bool
check_type: str
# Status
status: str = "unknown" # working, broken, timeout, error, anti_bot, disabled
# HTTP results
claimed_http_status: Optional[int] = None
unclaimed_http_status: Optional[int] = None
claimed_error: Optional[str] = None
unclaimed_error: Optional[str] = None
# Issues detected
issues: List[str] = field(default_factory=list)
warnings: List[str] = field(default_factory=list)
# Recommendations
recommendations: List[str] = field(default_factory=list)
# Timing
check_time_ms: int = 0
DEFAULT_HEADERS = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.5",
}
async def check_url(url: str, headers: dict, timeout: int = 15) -> dict:
"""Quick URL check returning status and basic info."""
result = {
"status": None,
"final_url": None,
"content_length": 0,
"error": None,
"error_type": None,
"content": None,
"markers": {},
}
try:
connector = aiohttp.TCPConnector(ssl=False)
timeout_obj = aiohttp.ClientTimeout(total=timeout)
async with aiohttp.ClientSession(connector=connector, timeout=timeout_obj) as session:
async with session.get(url, headers=headers, allow_redirects=True) as resp:
result["status"] = resp.status
result["final_url"] = str(resp.url)
try:
text = await resp.text()
result["content_length"] = len(text)
result["content"] = text
text_lower = text.lower()
result["markers"] = {
"404_text": any(m in text_lower for m in ["not found", "404", "doesn't exist"]),
"captcha": any(m in text_lower for m in ["captcha", "recaptcha", "challenge"]),
"cloudflare": "cloudflare" in text_lower,
"login": any(m in text_lower for m in ["log in", "login", "sign in"]),
}
except Exception as e:
result["error"] = f"Content error: {e}"
result["error_type"] = "content"
except asyncio.TimeoutError:
result["error"] = "Timeout"
result["error_type"] = "timeout"
except aiohttp.ClientError as e:
result["error"] = str(e)
result["error_type"] = "client"
except Exception as e:
result["error"] = str(e)
result["error_type"] = "unknown"
return result
async def check_site(site_name: str, config: dict, timeout: int = 15) -> SiteCheckResult:
"""Check a single site and return detailed result."""
start_time = time.time()
result = SiteCheckResult(
site_name=site_name,
alexa_rank=config.get("alexaRank", 999999),
disabled=config.get("disabled", False),
check_type=config.get("checkType", "status_code"),
)
# Skip disabled sites
if result.disabled:
result.status = "disabled"
return result
# Build URL
url_template = config.get("url", "")
url_main = config.get("urlMain", "")
url_subpath = config.get("urlSubpath", "")
url_template = url_template.replace("{urlMain}", url_main).replace("{urlSubpath}", url_subpath)
claimed = config.get("usernameClaimed")
unclaimed = config.get("usernameUnclaimed", "noonewouldeverusethis7")
if not claimed:
result.status = "error"
result.issues.append("No usernameClaimed defined")
return result
# Prepare headers
headers = DEFAULT_HEADERS.copy()
if config.get("headers"):
headers.update(config["headers"])
# Check both URLs
url_claimed = url_template.replace("{username}", claimed)
url_unclaimed = url_template.replace("{username}", unclaimed)
try:
claimed_result, unclaimed_result = await asyncio.gather(
check_url(url_claimed, headers, timeout),
check_url(url_unclaimed, headers, timeout),
)
except Exception as e:
result.status = "error"
result.issues.append(f"Check failed: {e}")
return result
result.claimed_http_status = claimed_result["status"]
result.unclaimed_http_status = unclaimed_result["status"]
result.claimed_error = claimed_result.get("error")
result.unclaimed_error = unclaimed_result.get("error")
# Categorize result
if claimed_result["error_type"] == "timeout" or unclaimed_result["error_type"] == "timeout":
result.status = "timeout"
result.issues.append("Request timeout")
elif claimed_result["status"] == 403 or claimed_result["status"] == 429:
result.status = "anti_bot"
result.issues.append(f"Anti-bot protection (HTTP {claimed_result['status']})")
elif claimed_result.get("markers", {}).get("captcha"):
result.status = "anti_bot"
result.issues.append("Captcha detected")
elif claimed_result.get("markers", {}).get("cloudflare"):
result.status = "anti_bot"
result.warnings.append("Cloudflare protection detected")
elif claimed_result["error"] or unclaimed_result["error"]:
result.status = "error"
if claimed_result["error"]:
result.issues.append(f"Claimed error: {claimed_result['error']}")
if unclaimed_result["error"]:
result.issues.append(f"Unclaimed error: {unclaimed_result['error']}")
else:
# Validate check type
check_type = config.get("checkType", "status_code")
if check_type == "status_code":
if claimed_result["status"] == unclaimed_result["status"]:
result.status = "broken"
result.issues.append(f"Same status code ({claimed_result['status']}) for both")
# Suggest fix
if claimed_result["final_url"] != unclaimed_result["final_url"]:
result.recommendations.append("Switch to checkType: response_url")
else:
result.status = "working"
elif check_type == "response_url":
if claimed_result["final_url"] == unclaimed_result["final_url"]:
result.status = "broken"
result.issues.append("Same final URL for both")
if claimed_result["status"] != unclaimed_result["status"]:
result.recommendations.append("Switch to checkType: status_code")
else:
result.status = "working"
elif check_type == "message":
presense_strs = config.get("presenseStrs", [])
absence_strs = config.get("absenceStrs", [])
claimed_content = claimed_result.get("content", "") or ""
unclaimed_content = unclaimed_result.get("content", "") or ""
presense_ok = not presense_strs or any(s in claimed_content for s in presense_strs)
absence_claimed = absence_strs and any(s in claimed_content for s in absence_strs)
absence_unclaimed = absence_strs and any(s in unclaimed_content for s in absence_strs)
if presense_strs and not presense_ok:
result.status = "broken"
result.issues.append(f"presenseStrs not found: {presense_strs}")
# Check if status_code would work
if claimed_result["status"] != unclaimed_result["status"]:
result.recommendations.append(f"Switch to checkType: status_code ({claimed_result['status']} vs {unclaimed_result['status']})")
elif absence_claimed:
result.status = "broken"
result.issues.append(f"absenceStrs found in claimed page")
elif absence_strs and not absence_unclaimed:
result.status = "broken"
result.warnings.append("absenceStrs not found in unclaimed page")
else:
result.status = "working"
else:
result.status = "unknown"
result.warnings.append(f"Unknown checkType: {check_type}")
result.check_time_ms = int((time.time() - start_time) * 1000)
return result
def load_sites(db_path: Path) -> Dict[str, dict]:
"""Load all sites from data.json."""
with open(db_path) as f:
data = json.load(f)
return data.get("sites", {})
def get_top_sites(sites: Dict[str, dict], n: int) -> List[Tuple[str, dict]]:
"""Get top N sites by Alexa rank."""
ranked = []
for name, config in sites.items():
rank = config.get("alexaRank", 999999)
ranked.append((name, config, rank))
ranked.sort(key=lambda x: x[2])
return [(name, config) for name, config, _ in ranked[:n]]
async def check_sites_batch(sites: List[Tuple[str, dict]], parallel: int = 5,
timeout: int = 15, progress_callback=None) -> List[SiteCheckResult]:
"""Check multiple sites with parallelism control."""
results = []
semaphore = asyncio.Semaphore(parallel)
async def check_with_semaphore(name, config, index):
async with semaphore:
if progress_callback:
progress_callback(index, len(sites), name)
return await check_site(name, config, timeout)
tasks = [
check_with_semaphore(name, config, i)
for i, (name, config) in enumerate(sites)
]
results = await asyncio.gather(*tasks)
return results
def print_progress(current: int, total: int, site_name: str):
"""Print progress indicator."""
pct = int(current / total * 100)
bar_width = 30
filled = int(bar_width * current / total)
bar = "" * filled + "" * (bar_width - filled)
print(f"\r[{bar}] {pct:3d}% ({current}/{total}) {site_name:<30}", end="", flush=True)
def generate_report(results: List[SiteCheckResult]) -> dict:
"""Generate a summary report from check results."""
report = {
"summary": {
"total": len(results),
"working": 0,
"broken": 0,
"disabled": 0,
"timeout": 0,
"anti_bot": 0,
"error": 0,
"unknown": 0,
},
"by_status": defaultdict(list),
"issues": [],
"recommendations": [],
}
for r in results:
report["summary"][r.status] = report["summary"].get(r.status, 0) + 1
report["by_status"][r.status].append(r.site_name)
if r.issues:
report["issues"].append({
"site": r.site_name,
"rank": r.alexa_rank,
"issues": r.issues,
})
if r.recommendations:
report["recommendations"].append({
"site": r.site_name,
"rank": r.alexa_rank,
"recommendations": r.recommendations,
})
return report
def print_report(report: dict, results: List[SiteCheckResult]):
"""Print a formatted report to console."""
summary = report["summary"]
print(f"\n{'='*60}")
print(f"{color('SITE CHECK REPORT', Colors.CYAN)}")
print(f"{'='*60}\n")
print(f"{color('SUMMARY:', Colors.BOLD)}")
print(f" Total sites checked: {summary['total']}")
print(f" {color('Working:', Colors.GREEN)} {summary['working']}")
print(f" {color('Broken:', Colors.RED)} {summary['broken']}")
print(f" {color('Disabled:', Colors.YELLOW)} {summary['disabled']}")
print(f" {color('Timeout:', Colors.YELLOW)} {summary['timeout']}")
print(f" {color('Anti-bot:', Colors.YELLOW)} {summary['anti_bot']}")
print(f" {color('Error:', Colors.RED)} {summary['error']}")
# Broken sites
if report["by_status"]["broken"]:
print(f"\n{color('BROKEN SITES:', Colors.RED)}")
for site in report["by_status"]["broken"][:20]:
r = next(x for x in results if x.site_name == site)
print(f" - {site} (rank {r.alexa_rank}): {', '.join(r.issues)}")
if len(report["by_status"]["broken"]) > 20:
print(f" ... and {len(report['by_status']['broken']) - 20} more")
# Timeout sites
if report["by_status"]["timeout"]:
print(f"\n{color('TIMEOUT SITES:', Colors.YELLOW)}")
for site in report["by_status"]["timeout"][:10]:
print(f" - {site}")
if len(report["by_status"]["timeout"]) > 10:
print(f" ... and {len(report['by_status']['timeout']) - 10} more")
# Anti-bot sites
if report["by_status"]["anti_bot"]:
print(f"\n{color('ANTI-BOT PROTECTED:', Colors.YELLOW)}")
for site in report["by_status"]["anti_bot"][:10]:
r = next(x for x in results if x.site_name == site)
print(f" - {site}: {', '.join(r.issues)}")
if len(report["by_status"]["anti_bot"]) > 10:
print(f" ... and {len(report['by_status']['anti_bot']) - 10} more")
# Recommendations
if report["recommendations"]:
print(f"\n{color('RECOMMENDATIONS:', Colors.CYAN)}")
for rec in report["recommendations"][:15]:
print(f" {rec['site']} (rank {rec['rank']}):")
for r in rec["recommendations"]:
print(f" -> {r}")
if len(report["recommendations"]) > 15:
print(f" ... and {len(report['recommendations']) - 15} more")
async def main():
parser = argparse.ArgumentParser(
description="Mass site checking for Maigret",
formatter_class=argparse.RawDescriptionHelpFormatter,
)
parser.add_argument("--top", "-n", type=int, default=100,
help="Check top N sites by Alexa rank (default: 100)")
parser.add_argument("--parallel", "-p", type=int, default=5,
help="Number of parallel requests (default: 5)")
parser.add_argument("--timeout", "-t", type=int, default=15,
help="Request timeout in seconds (default: 15)")
parser.add_argument("--output", "-o", help="Output JSON report to file")
parser.add_argument("--include-disabled", action="store_true",
help="Include disabled sites in results")
parser.add_argument("--only-broken", action="store_true",
help="Only show broken sites")
parser.add_argument("--json", action="store_true",
help="Output as JSON only")
args = parser.parse_args()
# Load sites
db_path = Path(__file__).parent.parent / "maigret" / "resources" / "data.json"
if not db_path.exists():
print(f"Database not found: {db_path}")
sys.exit(1)
sites = load_sites(db_path)
top_sites = get_top_sites(sites, args.top)
if not args.json:
print(f"Checking top {len(top_sites)} sites (parallel={args.parallel}, timeout={args.timeout}s)...")
print()
# Run checks
progress = print_progress if not args.json else None
results = await check_sites_batch(top_sites, args.parallel, args.timeout, progress)
if not args.json:
print() # Clear progress line
# Filter results
if not args.include_disabled:
results = [r for r in results if r.status != "disabled"]
if args.only_broken:
results = [r for r in results if r.status in ("broken", "error", "timeout")]
# Generate report
report = generate_report(results)
# Output
if args.json:
output = {
"report": report,
"results": [asdict(r) for r in results],
}
print(json.dumps(output, indent=2))
else:
print_report(report, results)
# Save to file
if args.output:
output = {
"report": report,
"results": [asdict(r) for r in results],
}
with open(args.output, "w") as f:
json.dump(output, f, indent=2)
print(f"\nReport saved to: {args.output}")
if __name__ == "__main__":
asyncio.run(main())
+223
View File
@@ -0,0 +1,223 @@
#!/usr/bin/env python3
"""
Probe likely false-positive sites among the top-N Alexa-ranked entries.
For each of K random *distinct* usernames taken from ``usernameClaimed`` fields in
the Maigret database, runs a clean ``maigret`` scan (``--top-sites N --json simple|ndjson``).
Sites that return CLAIMED in *every* run are reported: unrelated random claimed
handles are unlikely to all exist on the same third-party site, so such sites are
candidates for broken checks.
"""
from __future__ import annotations
import argparse
import json
import random
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path
def repo_root() -> Path:
return Path(__file__).resolve().parent.parent
def load_username_claimed_pool(db_path: Path) -> list[str]:
with db_path.open(encoding="utf-8") as f:
data = json.load(f)
sites = data.get("sites") or {}
seen: set[str] = set()
pool: list[str] = []
for _name, site in sites.items():
u = (site or {}).get("usernameClaimed")
if not u or not isinstance(u, str):
continue
u = u.strip()
if not u or u in seen:
continue
seen.add(u)
pool.append(u)
return pool
def run_maigret(
*,
username: str,
db_path: Path,
out_dir: Path,
top_sites: int,
json_format: str,
quiet: bool,
) -> Path:
"""Run maigret subprocess; return path to the written JSON report."""
safe = username.replace("/", "_")
report_name = f"report_{safe}_{json_format}.json"
report_path = out_dir / report_name
cmd = [
sys.executable,
"-m",
"maigret",
username,
"--db",
str(db_path),
"--top-sites",
str(top_sites),
"--json",
json_format,
"--folderoutput",
str(out_dir),
"--no-progressbar",
"--no-color",
"--no-recursion",
"--no-extracting",
]
sink = subprocess.DEVNULL if quiet else None
proc = subprocess.run(
cmd,
cwd=str(repo_root()),
text=True,
stdout=sink,
stderr=sink,
)
if proc.returncode != 0:
raise RuntimeError(
f"maigret exited with {proc.returncode} for username {username!r}"
)
if not report_path.is_file():
raise FileNotFoundError(f"Expected report missing: {report_path}")
return report_path
def claimed_sites_from_report(path: Path, json_format: str) -> set[str]:
if json_format == "simple":
with path.open(encoding="utf-8") as f:
data = json.load(f)
if not isinstance(data, dict):
return set()
return set(data.keys())
# ndjson: one object per line, each has "sitename"
sites: set[str] = set()
with path.open(encoding="utf-8") as f:
for line in f:
line = line.strip()
if not line:
continue
obj = json.loads(line)
name = obj.get("sitename")
if isinstance(name, str) and name:
sites.add(name)
return sites
def main() -> int:
parser = argparse.ArgumentParser(
description=(
"Pick random distinct usernameClaimed values, run maigret --top-sites N "
"with JSON reports, and list sites that claimed all of them (suspicious FP)."
)
)
parser.add_argument(
"--db",
"-b",
type=Path,
default=repo_root() / "maigret" / "resources" / "data.json",
help="Path to Maigret data.json (a temp copy is used for runs).",
)
parser.add_argument(
"--top-sites",
"-n",
type=int,
default=500,
metavar="N",
help="Value for maigret --top-sites (default: 500).",
)
parser.add_argument(
"--samples",
"-k",
type=int,
default=5,
metavar="K",
help="How many distinct random usernames to draw (default: 5).",
)
parser.add_argument(
"--seed",
type=int,
default=None,
help="RNG seed for reproducible username selection.",
)
parser.add_argument(
"--json",
dest="json_format",
default="simple",
choices=["simple", "ndjson"],
help="JSON report type passed to maigret -J (default: simple).",
)
parser.add_argument(
"--verbose",
"-v",
action="store_true",
default=False,
help="Print maigret stdout/stderr (default: suppress child output).",
)
args = parser.parse_args()
quiet = not args.verbose
db_src = args.db.resolve()
if not db_src.is_file():
print(f"Database not found: {db_src}", file=sys.stderr)
return 2
pool = load_username_claimed_pool(db_src)
if len(pool) < args.samples:
print(
f"Need at least {args.samples} distinct usernameClaimed entries, "
f"found {len(pool)}.",
file=sys.stderr,
)
return 2
rng = random.Random(args.seed)
picked = rng.sample(pool, args.samples)
print(f"Database: {db_src}")
print(f"--top-sites {args.top_sites}, {args.samples} random usernameClaimed:")
for i, u in enumerate(picked, 1):
print(f" {i}. {u}")
site_sets: list[set[str]] = []
with tempfile.TemporaryDirectory(prefix="maigret_fp_probe_") as tmp:
tmp_path = Path(tmp)
db_work = tmp_path / "data.json"
shutil.copyfile(db_src, db_work)
for u in picked:
print(f"\nRunning maigret for {u!r} ...", flush=True)
report = run_maigret(
username=u,
db_path=db_work,
out_dir=tmp_path,
top_sites=args.top_sites,
json_format=args.json_format,
quiet=quiet,
)
sites = claimed_sites_from_report(report, args.json_format)
site_sets.append(sites)
print(f" -> {len(sites)} positive site(s) in JSON", flush=True)
always = set.intersection(*site_sets) if site_sets else set()
print("\n--- Sites with CLAIMED in all runs (candidates for false positives) ---")
if not always:
print("(none)")
else:
for name in sorted(always):
print(name)
return 0
if __name__ == "__main__":
raise SystemExit(main())
+282
View File
@@ -0,0 +1,282 @@
#!/usr/bin/env python3
import json
import random
import re
import alive_progress
from mock import Mock
import requests
from maigret.maigret import *
from maigret.result import MaigretCheckStatus
from maigret.sites import MaigretSite
URL_RE = re.compile(r"https?://(www\.)?")
TIMEOUT = 200
async def maigret_check(site, site_data, username, status, logger):
query_notify = Mock()
logger.debug(f'Checking {site}...')
for username, status in [(username, status)]:
results = await maigret(
username,
{site: site_data},
logger,
query_notify,
timeout=TIMEOUT,
forced=True,
no_progressbar=True,
)
if results[site]['status'].status != status:
if results[site]['status'].status == MaigretCheckStatus.UNKNOWN:
msg = site_data.absence_strs
etype = site_data.check_type
context = results[site]['status'].context
logger.debug(f'Error while searching {username} in {site}, must be claimed. Context: {context}')
# if site_data.get('errors'):
# continue
return False
if status == MaigretCheckStatus.CLAIMED:
logger.debug(f'Not found {username} in {site}, must be claimed')
logger.debug(results[site])
pass
else:
logger.debug(f'Found {username} in {site}, must be available')
logger.debug(results[site])
pass
return False
return site_data
async def check_and_add_maigret_site(site_data, semaphore, logger, ok_usernames, bad_usernames):
async with semaphore:
sitename = site_data.name
positive = False
negative = False
for ok_username in ok_usernames:
site_data.username_claimed = ok_username
status = MaigretCheckStatus.CLAIMED
if await maigret_check(sitename, site_data, ok_username, status, logger):
# print(f'{sitename} positive case is okay')
positive = True
break
for bad_username in bad_usernames:
site_data.username_unclaimed = bad_username
status = MaigretCheckStatus.AVAILABLE
if await maigret_check(sitename, site_data, bad_username, status, logger):
# print(f'{sitename} negative case is okay')
negative = True
break
if positive and negative:
site_data = site_data.strip_engine_data()
db.update_site(site_data)
print(site_data.json)
try:
db.save_to_file(args.base_file)
except Exception as e:
logging.error(e, exc_info=True)
print(f'Saved new site {sitename}...')
ok_sites.append(site_data)
if __name__ == '__main__':
parser = ArgumentParser(formatter_class=RawDescriptionHelpFormatter
)
parser.add_argument("--base", "-b", metavar="BASE_FILE",
dest="base_file", default="maigret/resources/data.json",
help="JSON file with sites data to update.")
parser.add_argument("--add-engine", dest="add_engine", help="Additional engine to check")
parser.add_argument("--only-engine", dest="only_engine", help="Use only this engine from detected to check")
parser.add_argument('--check', help='only check sites in database', action='store_true')
parser.add_argument('--random', help='shuffle list of urls', action='store_true', default=False)
parser.add_argument('--top', help='top count of records in file', type=int, default=10000)
parser.add_argument('--filter', help='substring to filter input urls', type=str, default='')
parser.add_argument('--username', help='preferable username to check with', type=str)
parser.add_argument(
"--info",
"-vv",
action="store_true",
dest="info",
default=False,
help="Display service information.",
)
parser.add_argument(
"--verbose",
"-v",
action="store_true",
dest="verbose",
default=False,
help="Display extra information and metrics.",
)
parser.add_argument(
"-d",
"--debug",
"-vvv",
action="store_true",
dest="debug",
default=False,
help="Saving debugging information and sites responses in debug.txt.",
)
parser.add_argument("urls_file",
metavar='URLS_FILE',
action="store",
help="File with base site URLs"
)
args = parser.parse_args()
log_level = logging.ERROR
if args.debug:
log_level = logging.DEBUG
elif args.info:
log_level = logging.INFO
elif args.verbose:
log_level = logging.WARNING
logging.basicConfig(
format='[%(filename)s:%(lineno)d] %(levelname)-3s %(asctime)s %(message)s',
datefmt='%H:%M:%S',
level=log_level
)
logger = logging.getLogger('engines-check')
logger.setLevel(log_level)
db = MaigretDatabase()
sites_subset = db.load_from_file(args.base_file).sites
sites = {site.name: site for site in sites_subset}
engines = db.engines
# TODO: usernames extractors
ok_usernames = ['alex', 'god', 'admin', 'red', 'blue', 'john']
if args.username:
ok_usernames = [args.username] + ok_usernames
bad_usernames = ['noonewouldeverusethis7']
with open(args.urls_file, 'r') as urls_file:
urls = urls_file.read().splitlines()
if args.random:
random.shuffle(urls)
urls = urls[:args.top]
raw_maigret_data = json.dumps({site.name: site.json for site in sites_subset})
new_sites = []
for site in alive_progress.alive_it(urls):
site_lowercase = site.lower()
domain_raw = URL_RE.sub('', site_lowercase).strip().strip('/')
domain_raw = domain_raw.split('/')[0]
if args.filter and args.filter not in domain_raw:
logger.debug('Site %s skipped due to filtering by "%s"', domain_raw, args.filter)
continue
if domain_raw in raw_maigret_data:
logger.debug(f'Site {domain_raw} already exists in the Maigret database!')
continue
if '"' in domain_raw:
logger.debug(f'Invalid site {domain_raw}')
continue
main_page_url = '/'.join(site.split('/', 3)[:3])
site_data = {
'url': site,
'urlMain': main_page_url,
'name': domain_raw,
}
try:
r = requests.get(main_page_url, timeout=5)
except:
r = None
pass
detected_engines = []
for e in engines:
strs_to_check = e.__dict__.get('presenseStrs')
if strs_to_check and r and r.text:
all_strs_in_response = True
for s in strs_to_check:
if not s in r.text:
all_strs_in_response = False
if all_strs_in_response:
engine_name = e.__dict__.get('name')
detected_engines.append(engine_name)
logger.info(f'Detected engine {engine_name} for site {main_page_url}')
if args.only_engine and args.only_engine in detected_engines:
detected_engines = [args.only_engine]
elif not detected_engines and args.add_engine:
logging.debug('Could not detect any engine, applying default engine %s...', args.add_engine)
detected_engines = [args.add_engine]
def create_site_from_engine(sitename, data, e):
site = MaigretSite(sitename, data)
site.update_from_engine(db.engines_dict[e])
site.engine = e
return site
for engine_name in detected_engines:
site = create_site_from_engine(domain_raw, site_data, engine_name)
new_sites.append(site)
logger.debug(site.json)
# if engine_name == "phpBB":
# site_data_with_subpath = dict(site_data)
# site_data_with_subpath["urlSubpath"] = "/forum"
# site = create_site_from_engine(domain_raw, site_data_with_subpath, engine_name)
# new_sites.append(site)
# except Exception as e:
# print(f'Error: {str(e)}')
# pass
print(f'Found {len(new_sites)}/{len(urls)} new sites')
if args.check:
for s in new_sites:
print(s.url_main)
sys.exit(0)
sem = asyncio.Semaphore(20)
loop = asyncio.get_event_loop()
ok_sites = []
tasks = []
for site in new_sites:
check_coro = check_and_add_maigret_site(site, sem, logger, ok_usernames, bad_usernames)
future = asyncio.ensure_future(check_coro)
tasks.append(future)
with alive_progress(len(tasks), title='Checking sites') as progress:
for f in asyncio.as_completed(tasks):
progress()
try:
loop.run_until_complete(f)
except asyncio.exceptions.TimeoutError:
pass
print(f'Found and saved {len(ok_sites)} sites!')
+750
View File
@@ -0,0 +1,750 @@
#!/usr/bin/env python3
"""
Site check utility for Maigret development.
Quickly test site availability, find valid usernames, and diagnose check issues.
Usage:
python utils/site_check.py --site "SiteName" --check-claimed
python utils/site_check.py --site "SiteName" --maigret # Test via Maigret
python utils/site_check.py --site "SiteName" --compare-methods # aiohttp vs Maigret
python utils/site_check.py --url "https://example.com/user/{username}" --test "john"
python utils/site_check.py --site "SiteName" --find-user
python utils/site_check.py --site "SiteName" --diagnose # Full diagnosis
"""
import argparse
import asyncio
import json
import logging
import re
import sys
from pathlib import Path
from typing import Dict, List, Optional, Tuple
# Add parent dir for imports
sys.path.insert(0, str(Path(__file__).parent.parent))
try:
import aiohttp
except ImportError:
print("aiohttp not installed. Run: pip install aiohttp")
sys.exit(1)
# Maigret imports (optional, for --maigret mode)
MAIGRET_AVAILABLE = False
try:
from maigret.sites import MaigretDatabase, MaigretSite
from maigret.checking import (
SimpleAiohttpChecker,
check_site_for_username,
process_site_result,
make_site_result,
)
from maigret.notify import QueryNotifyPrint
from maigret.result import QueryStatus
MAIGRET_AVAILABLE = True
except ImportError:
pass
DEFAULT_HEADERS = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.5",
}
COMMON_USERNAMES = ["blue", "test", "admin", "user", "john", "alex", "david", "mike", "chris", "dan"]
class Colors:
"""ANSI color codes for terminal output."""
RED = "\033[91m"
GREEN = "\033[92m"
YELLOW = "\033[93m"
BLUE = "\033[94m"
MAGENTA = "\033[95m"
CYAN = "\033[96m"
RESET = "\033[0m"
BOLD = "\033[1m"
def color(text: str, c: str) -> str:
"""Wrap text with color codes."""
return f"{c}{text}{Colors.RESET}"
async def check_url_aiohttp(url: str, headers: dict = None, follow_redirects: bool = True,
timeout: int = 15, ssl_verify: bool = False) -> dict:
"""Check a URL using aiohttp and return detailed response info."""
headers = headers or DEFAULT_HEADERS.copy()
result = {
"method": "aiohttp",
"url": url,
"status": None,
"final_url": None,
"redirects": [],
"content_length": 0,
"content": None,
"title": None,
"error": None,
"error_type": None,
"markers": {},
}
try:
connector = aiohttp.TCPConnector(ssl=ssl_verify)
timeout_obj = aiohttp.ClientTimeout(total=timeout)
async with aiohttp.ClientSession(connector=connector, timeout=timeout_obj) as session:
async with session.get(url, headers=headers, allow_redirects=follow_redirects) as resp:
result["status"] = resp.status
result["final_url"] = str(resp.url)
# Get redirect history
if resp.history:
result["redirects"] = [str(r.url) for r in resp.history]
# Read content
try:
text = await resp.text()
result["content_length"] = len(text)
result["content"] = text
# Extract title
title_match = re.search(r'<title>([^<]*)</title>', text, re.IGNORECASE)
if title_match:
result["title"] = title_match.group(1).strip()[:100]
# Check common markers
text_lower = text.lower()
markers = {
"404_text": any(m in text_lower for m in ["not found", "404", "doesn't exist", "does not exist"]),
"profile_markers": any(m in text_lower for m in ["profile", "user", "member", "account"]),
"error_markers": any(m in text_lower for m in ["error", "banned", "suspended", "blocked"]),
"login_required": any(m in text_lower for m in ["log in", "login", "sign in", "signin"]),
"captcha": any(m in text_lower for m in ["captcha", "recaptcha", "challenge", "verify you"]),
"cloudflare": "cloudflare" in text_lower or "cf-ray" in text_lower,
"rate_limit": any(m in text_lower for m in ["rate limit", "too many requests", "429"]),
}
result["markers"] = markers
# First 500 chars of body for inspection
result["body_preview"] = text[:500].replace("\n", " ").strip()
except Exception as e:
result["error"] = f"Content read error: {e}"
result["error_type"] = "content_error"
except asyncio.TimeoutError:
result["error"] = "Timeout"
result["error_type"] = "timeout"
except aiohttp.ClientError as e:
result["error"] = f"Client error: {e}"
result["error_type"] = "client_error"
except Exception as e:
result["error"] = f"Error: {e}"
result["error_type"] = "unknown"
return result
async def check_url_maigret(site: 'MaigretSite', username: str, logger=None) -> dict:
"""Check a URL using Maigret's checking mechanism."""
if not MAIGRET_AVAILABLE:
return {"error": "Maigret not available", "method": "maigret"}
if logger is None:
logger = logging.getLogger("site_check")
logger.setLevel(logging.WARNING)
result = {
"method": "maigret",
"url": None,
"status": None,
"status_str": None,
"http_status": None,
"final_url": None,
"error": None,
"error_type": None,
"ids_data": None,
}
try:
# Create query options
options = {
"parsing": False,
"cookie_jar": None,
"timeout": 15,
}
# Create a simple notifier
class SilentNotify:
def start(self, msg=None): pass
def update(self, status, similar=False): pass
def finish(self, msg=None, status=None): pass
notifier = SilentNotify()
# Run the check
site_name, site_result = await check_site_for_username(
site, username, options, logger, notifier
)
result["url"] = site_result.get("url_user")
result["status"] = site_result.get("status")
result["status_str"] = str(site_result.get("status"))
result["http_status"] = site_result.get("http_status")
result["ids_data"] = site_result.get("ids_data")
# Check for errors
status = site_result.get("status")
if status and hasattr(status, 'error') and status.error:
result["error"] = f"{status.error.type}: {status.error.desc}"
result["error_type"] = str(status.error.type)
except Exception as e:
result["error"] = str(e)
result["error_type"] = "exception"
return result
async def find_valid_username(url_template: str, usernames: list = None, headers: dict = None) -> Optional[str]:
"""Try common usernames to find one that works."""
usernames = usernames or COMMON_USERNAMES
headers = headers or DEFAULT_HEADERS.copy()
print(f"Testing {len(usernames)} usernames on {url_template}...")
for username in usernames:
url = url_template.replace("{username}", username)
result = await check_url_aiohttp(url, headers)
status = result["status"]
markers = result.get("markers", {})
# Good signs: 200 status, profile markers, no 404 text
if status == 200 and not markers.get("404_text") and markers.get("profile_markers"):
print(f" {color('[+]', Colors.GREEN)} {username}: status={status}, has profile markers")
return username
elif status == 200 and not markers.get("404_text"):
print(f" {color('[?]', Colors.YELLOW)} {username}: status={status}, might work")
else:
print(f" {color('[-]', Colors.RED)} {username}: status={status}")
return None
async def compare_users_aiohttp(url_template: str, claimed: str, unclaimed: str = "noonewouldeverusethis7",
headers: dict = None) -> Tuple[dict, dict]:
"""Compare responses for claimed vs unclaimed usernames using aiohttp."""
headers = headers or DEFAULT_HEADERS.copy()
print(f"\n{'='*60}")
print(f"Comparing: {color(claimed, Colors.GREEN)} vs {color(unclaimed, Colors.RED)}")
print(f"URL template: {url_template}")
print(f"Method: aiohttp")
print(f"{'='*60}\n")
url_claimed = url_template.replace("{username}", claimed)
url_unclaimed = url_template.replace("{username}", unclaimed)
result_claimed, result_unclaimed = await asyncio.gather(
check_url_aiohttp(url_claimed, headers),
check_url_aiohttp(url_unclaimed, headers)
)
def print_result(name, r, c):
print(f"--- {color(name, c)} ---")
print(f" URL: {r['url']}")
print(f" Status: {color(str(r['status']), Colors.GREEN if r['status'] == 200 else Colors.RED)}")
if r["redirects"]:
print(f" Redirects: {' -> '.join(r['redirects'])} -> {r['final_url']}")
print(f" Final URL: {r['final_url']}")
print(f" Content length: {r['content_length']}")
print(f" Title: {r['title']}")
if r["error"]:
print(f" Error: {color(r['error'], Colors.RED)}")
print(f" Markers: {r['markers']}")
print()
print_result(f"CLAIMED ({claimed})", result_claimed, Colors.GREEN)
print_result(f"UNCLAIMED ({unclaimed})", result_unclaimed, Colors.RED)
# Analysis
print(f"--- {color('ANALYSIS', Colors.CYAN)} ---")
recommendations = []
if result_claimed["status"] != result_unclaimed["status"]:
print(f" [!] Status codes differ: {result_claimed['status']} vs {result_unclaimed['status']}")
recommendations.append(("status_code", f"Status codes: {result_claimed['status']} vs {result_unclaimed['status']}"))
if result_claimed["final_url"] != result_unclaimed["final_url"]:
print(f" [!] Final URLs differ")
recommendations.append(("response_url", "Final URLs differ"))
if result_claimed["content_length"] != result_unclaimed["content_length"]:
diff = abs(result_claimed["content_length"] - result_unclaimed["content_length"])
print(f" [!] Content length differs by {diff} bytes")
recommendations.append(("message", f"Content differs by {diff} bytes"))
if result_claimed["title"] != result_unclaimed["title"]:
print(f" [!] Titles differ:")
print(f" Claimed: {result_claimed['title']}")
print(f" Unclaimed: {result_unclaimed['title']}")
recommendations.append(("message", f"Titles differ: '{result_claimed['title']}' vs '{result_unclaimed['title']}'"))
# Check for problems
if result_claimed.get("markers", {}).get("captcha"):
print(f" {color('[WARN]', Colors.YELLOW)} Captcha detected on claimed page")
if result_claimed.get("markers", {}).get("cloudflare"):
print(f" {color('[WARN]', Colors.YELLOW)} Cloudflare protection detected")
if result_claimed.get("markers", {}).get("login_required"):
print(f" {color('[WARN]', Colors.YELLOW)} Login may be required")
if recommendations:
print(f"\n {color('Recommended checkType:', Colors.BOLD)} {recommendations[0][0]}")
else:
print(f" {color('[!]', Colors.RED)} No clear difference found - site may need special handling")
return result_claimed, result_unclaimed
async def compare_methods(site: 'MaigretSite', claimed: str, unclaimed: str) -> dict:
"""Compare aiohttp vs Maigret results for the same site."""
if not MAIGRET_AVAILABLE:
print(color("Maigret not available for comparison", Colors.RED))
return {}
print(f"\n{'='*60}")
print(f"{color('METHOD COMPARISON', Colors.CYAN)}: aiohttp vs Maigret")
print(f"Site: {site.name}")
print(f"Claimed: {claimed}, Unclaimed: {unclaimed}")
print(f"{'='*60}\n")
# Build URL template
url_template = site.url
url_template = url_template.replace("{urlMain}", site.url_main or "")
url_template = url_template.replace("{urlSubpath}", getattr(site, 'url_subpath', '') or "")
headers = DEFAULT_HEADERS.copy()
if hasattr(site, 'headers') and site.headers:
headers.update(site.headers)
# Run all checks in parallel
url_claimed = url_template.replace("{username}", claimed)
url_unclaimed = url_template.replace("{username}", unclaimed)
aiohttp_claimed, aiohttp_unclaimed, maigret_claimed, maigret_unclaimed = await asyncio.gather(
check_url_aiohttp(url_claimed, headers),
check_url_aiohttp(url_unclaimed, headers),
check_url_maigret(site, claimed),
check_url_maigret(site, unclaimed),
)
def status_icon(status):
if status == 200:
return color("200", Colors.GREEN)
elif status == 404:
return color("404", Colors.YELLOW)
elif status and status >= 400:
return color(str(status), Colors.RED)
return str(status)
def maigret_status_icon(status_str):
if "Claimed" in str(status_str):
return color("Claimed", Colors.GREEN)
elif "Available" in str(status_str):
return color("Available", Colors.YELLOW)
else:
return color(str(status_str), Colors.RED)
print(f"{'Method':<12} {'Username':<25} {'HTTP Status':<12} {'Result':<20}")
print("-" * 70)
print(f"{'aiohttp':<12} {claimed:<25} {status_icon(aiohttp_claimed['status']):<20} {'OK' if not aiohttp_claimed['error'] else aiohttp_claimed['error'][:20]}")
print(f"{'aiohttp':<12} {unclaimed:<25} {status_icon(aiohttp_unclaimed['status']):<20} {'OK' if not aiohttp_unclaimed['error'] else aiohttp_unclaimed['error'][:20]}")
print(f"{'Maigret':<12} {claimed:<25} {status_icon(maigret_claimed.get('http_status')):<20} {maigret_status_icon(maigret_claimed.get('status_str'))}")
print(f"{'Maigret':<12} {unclaimed:<25} {status_icon(maigret_unclaimed.get('http_status')):<20} {maigret_status_icon(maigret_unclaimed.get('status_str'))}")
# Check for discrepancies
print(f"\n--- {color('DISCREPANCY ANALYSIS', Colors.CYAN)} ---")
issues = []
if aiohttp_claimed['status'] != maigret_claimed.get('http_status'):
issues.append(f"HTTP status mismatch for claimed: aiohttp={aiohttp_claimed['status']}, Maigret={maigret_claimed.get('http_status')}")
if aiohttp_unclaimed['status'] != maigret_unclaimed.get('http_status'):
issues.append(f"HTTP status mismatch for unclaimed: aiohttp={aiohttp_unclaimed['status']}, Maigret={maigret_unclaimed.get('http_status')}")
# Check Maigret detection correctness
claimed_detected = "Claimed" in str(maigret_claimed.get('status_str', ''))
unclaimed_detected = "Available" in str(maigret_unclaimed.get('status_str', ''))
if not claimed_detected:
issues.append(f"Maigret did NOT detect claimed user '{claimed}' as Claimed")
if not unclaimed_detected:
issues.append(f"Maigret did NOT detect unclaimed user '{unclaimed}' as Available")
if issues:
for issue in issues:
print(f" {color('[!]', Colors.RED)} {issue}")
else:
print(f" {color('[OK]', Colors.GREEN)} Both methods agree on results")
return {
"aiohttp_claimed": aiohttp_claimed,
"aiohttp_unclaimed": aiohttp_unclaimed,
"maigret_claimed": maigret_claimed,
"maigret_unclaimed": maigret_unclaimed,
"issues": issues,
}
async def diagnose_site(site_config: dict, site_name: str) -> dict:
"""Full diagnosis of a site configuration."""
print(f"\n{'='*60}")
print(f"{color('FULL SITE DIAGNOSIS', Colors.CYAN)}: {site_name}")
print(f"{'='*60}\n")
diagnosis = {
"site_name": site_name,
"issues": [],
"warnings": [],
"recommendations": [],
"working": False,
}
# 1. Config analysis
print(f"--- {color('1. CONFIGURATION', Colors.BOLD)} ---")
check_type = site_config.get("checkType", "status_code")
url = site_config.get("url", "")
url_main = site_config.get("urlMain", "")
claimed = site_config.get("usernameClaimed")
unclaimed = site_config.get("usernameUnclaimed", "noonewouldeverusethis7")
disabled = site_config.get("disabled", False)
print(f" checkType: {check_type}")
print(f" URL: {url}")
print(f" urlMain: {url_main}")
print(f" usernameClaimed: {claimed}")
print(f" disabled: {disabled}")
if disabled:
diagnosis["issues"].append("Site is disabled")
print(f" {color('[!]', Colors.YELLOW)} Site is disabled")
if not claimed:
diagnosis["issues"].append("No usernameClaimed defined")
print(f" {color('[!]', Colors.RED)} No usernameClaimed defined")
return diagnosis
# Build full URL
url_template = url.replace("{urlMain}", url_main).replace("{urlSubpath}", site_config.get("urlSubpath", ""))
headers = DEFAULT_HEADERS.copy()
if site_config.get("headers"):
headers.update(site_config["headers"])
# 2. Connectivity test
print(f"\n--- {color('2. CONNECTIVITY TEST', Colors.BOLD)} ---")
url_claimed = url_template.replace("{username}", claimed)
url_unclaimed = url_template.replace("{username}", unclaimed)
result_claimed, result_unclaimed = await asyncio.gather(
check_url_aiohttp(url_claimed, headers),
check_url_aiohttp(url_unclaimed, headers)
)
print(f" Claimed ({claimed}): status={result_claimed['status']}, error={result_claimed['error']}")
print(f" Unclaimed ({unclaimed}): status={result_unclaimed['status']}, error={result_unclaimed['error']}")
# Check for common problems
if result_claimed["error_type"] == "timeout":
diagnosis["issues"].append("Timeout on claimed username")
if result_unclaimed["error_type"] == "timeout":
diagnosis["issues"].append("Timeout on unclaimed username")
if result_claimed.get("markers", {}).get("cloudflare"):
diagnosis["warnings"].append("Cloudflare protection detected")
if result_claimed.get("markers", {}).get("captcha"):
diagnosis["warnings"].append("Captcha detected")
if result_claimed["status"] == 403:
diagnosis["issues"].append("403 Forbidden - possible anti-bot protection")
if result_claimed["status"] == 429:
diagnosis["issues"].append("429 Rate Limited")
# 3. Check type validation
print(f"\n--- {color('3. CHECK TYPE VALIDATION', Colors.BOLD)} ---")
if check_type == "status_code":
if result_claimed["status"] == result_unclaimed["status"]:
diagnosis["issues"].append(f"status_code check but same status ({result_claimed['status']}) for both")
print(f" {color('[FAIL]', Colors.RED)} Same status code for claimed and unclaimed: {result_claimed['status']}")
else:
print(f" {color('[OK]', Colors.GREEN)} Status codes differ: {result_claimed['status']} vs {result_unclaimed['status']}")
diagnosis["working"] = True
elif check_type == "response_url":
if result_claimed["final_url"] == result_unclaimed["final_url"]:
diagnosis["issues"].append("response_url check but same final URL for both")
print(f" {color('[FAIL]', Colors.RED)} Same final URL for both")
else:
print(f" {color('[OK]', Colors.GREEN)} Final URLs differ")
diagnosis["working"] = True
elif check_type == "message":
presense_strs = site_config.get("presenseStrs", [])
absence_strs = site_config.get("absenceStrs", [])
print(f" presenseStrs: {presense_strs}")
print(f" absenceStrs: {absence_strs}")
claimed_content = result_claimed.get("content", "") or ""
unclaimed_content = result_unclaimed.get("content", "") or ""
# Check presenseStrs
presense_found_claimed = any(s in claimed_content for s in presense_strs) if presense_strs else True
presense_found_unclaimed = any(s in unclaimed_content for s in presense_strs) if presense_strs else True
# Check absenceStrs
absence_found_claimed = any(s in claimed_content for s in absence_strs) if absence_strs else False
absence_found_unclaimed = any(s in unclaimed_content for s in absence_strs) if absence_strs else False
print(f" Claimed - presenseStrs found: {presense_found_claimed}, absenceStrs found: {absence_found_claimed}")
print(f" Unclaimed - presenseStrs found: {presense_found_unclaimed}, absenceStrs found: {absence_found_unclaimed}")
if presense_strs and not presense_found_claimed:
diagnosis["issues"].append(f"presenseStrs {presense_strs} not found in claimed page")
print(f" {color('[FAIL]', Colors.RED)} presenseStrs not found in claimed page")
if absence_strs and absence_found_claimed:
diagnosis["issues"].append(f"absenceStrs {absence_strs} found in claimed page (should not be)")
print(f" {color('[FAIL]', Colors.RED)} absenceStrs found in claimed page")
if absence_strs and not absence_found_unclaimed:
diagnosis["warnings"].append(f"absenceStrs not found in unclaimed page")
print(f" {color('[WARN]', Colors.YELLOW)} absenceStrs not found in unclaimed page")
if presense_found_claimed and not absence_found_claimed and absence_found_unclaimed:
print(f" {color('[OK]', Colors.GREEN)} Message check should work correctly")
diagnosis["working"] = True
# 4. Recommendations
print(f"\n--- {color('4. RECOMMENDATIONS', Colors.BOLD)} ---")
if not diagnosis["working"]:
# Suggest alternatives
if result_claimed["status"] != result_unclaimed["status"]:
diagnosis["recommendations"].append(f"Switch to checkType: status_code (status {result_claimed['status']} vs {result_unclaimed['status']})")
if result_claimed["final_url"] != result_unclaimed["final_url"]:
diagnosis["recommendations"].append("Switch to checkType: response_url")
if result_claimed["title"] != result_unclaimed["title"]:
diagnosis["recommendations"].append(f"Use title as marker: presenseStrs=['{result_claimed['title']}'] or absenceStrs=['{result_unclaimed['title']}']")
if diagnosis["recommendations"]:
for rec in diagnosis["recommendations"]:
print(f" -> {rec}")
elif diagnosis["working"]:
print(f" {color('Site appears to be working correctly', Colors.GREEN)}")
else:
print(f" {color('No clear fix found - site may need special handling or should be disabled', Colors.RED)}")
# Summary
print(f"\n--- {color('SUMMARY', Colors.BOLD)} ---")
if diagnosis["issues"]:
print(f" Issues: {len(diagnosis['issues'])}")
for issue in diagnosis["issues"]:
print(f" - {issue}")
if diagnosis["warnings"]:
print(f" Warnings: {len(diagnosis['warnings'])}")
for warn in diagnosis["warnings"]:
print(f" - {warn}")
print(f" Working: {color('YES', Colors.GREEN) if diagnosis['working'] else color('NO', Colors.RED)}")
return diagnosis
def load_site_from_db(site_name: str) -> Tuple[Optional[dict], Optional['MaigretSite']]:
"""Load site config from data.json. Returns (config_dict, MaigretSite or None)."""
db_path = Path(__file__).parent.parent / "maigret" / "resources" / "data.json"
with open(db_path) as f:
data = json.load(f)
config = None
if site_name in data["sites"]:
config = data["sites"][site_name]
else:
# Try case-insensitive search
for name, cfg in data["sites"].items():
if name.lower() == site_name.lower():
config = cfg
site_name = name
break
if not config:
return None, None
# Also load MaigretSite if available
maigret_site = None
if MAIGRET_AVAILABLE:
try:
db = MaigretDatabase().load_from_path(db_path)
maigret_site = db.sites_dict.get(site_name)
except Exception:
pass
return config, maigret_site
async def main():
parser = argparse.ArgumentParser(
description="Site check utility for Maigret development",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
%(prog)s --site "VK" --check-claimed # Test site with aiohttp
%(prog)s --site "VK" --maigret # Test site with Maigret
%(prog)s --site "VK" --compare-methods # Compare aiohttp vs Maigret
%(prog)s --site "VK" --diagnose # Full diagnosis
%(prog)s --url "https://vk.com/{username}" --compare blue nobody123
%(prog)s --site "VK" --find-user # Find a valid username
"""
)
parser.add_argument("--site", "-s", help="Site name from data.json")
parser.add_argument("--url", "-u", help="URL template with {username}")
parser.add_argument("--test", "-t", help="Username to test")
parser.add_argument("--compare", "-c", nargs=2, metavar=("CLAIMED", "UNCLAIMED"),
help="Compare two usernames")
parser.add_argument("--find-user", "-f", action="store_true",
help="Find a valid username")
parser.add_argument("--check-claimed", action="store_true",
help="Check if claimed username still works (aiohttp)")
parser.add_argument("--maigret", "-m", action="store_true",
help="Test using Maigret's checker instead of aiohttp")
parser.add_argument("--compare-methods", action="store_true",
help="Compare aiohttp vs Maigret results")
parser.add_argument("--diagnose", "-d", action="store_true",
help="Full diagnosis of site configuration")
parser.add_argument("--headers", help="Custom headers as JSON")
parser.add_argument("--timeout", type=int, default=15, help="Request timeout in seconds")
parser.add_argument("--json", action="store_true", help="Output results as JSON")
args = parser.parse_args()
url_template = None
claimed = None
unclaimed = "noonewouldeverusethis7"
headers = DEFAULT_HEADERS.copy()
site_config = None
maigret_site = None
# Load from site name
if args.site:
site_config, maigret_site = load_site_from_db(args.site)
if not site_config:
print(f"Site '{args.site}' not found in database")
sys.exit(1)
url_template = site_config.get("url", "")
url_main = site_config.get("urlMain", "")
url_subpath = site_config.get("urlSubpath", "")
url_template = url_template.replace("{urlMain}", url_main).replace("{urlSubpath}", url_subpath)
claimed = site_config.get("usernameClaimed")
unclaimed = site_config.get("usernameUnclaimed", unclaimed)
if site_config.get("headers"):
headers.update(site_config["headers"])
if not args.json:
print(f"Loaded site: {args.site}")
print(f" URL: {url_template}")
print(f" Claimed: {claimed}")
print(f" CheckType: {site_config.get('checkType', 'unknown')}")
print(f" Disabled: {site_config.get('disabled', False)}")
# Override with explicit URL
if args.url:
url_template = args.url
# Custom headers
if args.headers:
headers.update(json.loads(args.headers))
# Actions
if args.diagnose:
if not site_config:
print("--diagnose requires --site")
sys.exit(1)
result = await diagnose_site(site_config, args.site)
if args.json:
print(json.dumps(result, indent=2, default=str))
elif args.compare_methods:
if not maigret_site:
if not MAIGRET_AVAILABLE:
print("Maigret imports not available")
else:
print("Could not load MaigretSite object")
sys.exit(1)
result = await compare_methods(maigret_site, claimed, unclaimed)
if args.json:
print(json.dumps(result, indent=2, default=str))
elif args.maigret:
if not maigret_site:
if not MAIGRET_AVAILABLE:
print("Maigret imports not available")
else:
print("Could not load MaigretSite object")
sys.exit(1)
print(f"\n--- Testing with Maigret ---")
for username in [claimed, unclaimed]:
result = await check_url_maigret(maigret_site, username)
print(f" {username}: status={result.get('status_str')}, http={result.get('http_status')}, error={result.get('error')}")
elif args.find_user:
if not url_template:
print("--find-user requires --site or --url")
sys.exit(1)
result = await find_valid_username(url_template, headers=headers)
if result:
print(f"\n{color('Found valid username:', Colors.GREEN)} {result}")
else:
print(f"\n{color('No valid username found', Colors.RED)}")
elif args.compare:
if not url_template:
print("--compare requires --site or --url")
sys.exit(1)
result = await compare_users_aiohttp(url_template, args.compare[0], args.compare[1], headers)
if args.json:
# Remove content field for JSON output (too large)
for r in result:
if isinstance(r, dict) and "content" in r:
del r["content"]
print(json.dumps(result, indent=2, default=str))
elif args.check_claimed and claimed:
result = await compare_users_aiohttp(url_template, claimed, unclaimed, headers)
elif args.test:
if not url_template:
print("--test requires --site or --url")
sys.exit(1)
url = url_template.replace("{username}", args.test)
result = await check_url_aiohttp(url, headers, timeout=args.timeout)
if "content" in result:
del result["content"] # Too large for display
print(json.dumps(result, indent=2, default=str))
else:
# Default: check claimed username if available
if url_template and claimed:
await compare_users_aiohttp(url_template, claimed, unclaimed, headers)
else:
parser.print_help()
if __name__ == "__main__":
asyncio.run(main())

Some files were not shown because too many files have changed in this diff Show More