the <0.3/<0.4/etc upper bounds don't leave room for darwin or
emulated/aarch64 runners, which have been seeing 0.7s+ on tests
that expected <0.3s.
bumped each upper bound by +0.7s. lower bounds unchanged — they
still validate that tasks ran in parallel rather than serially.
refs #679
Co-authored-by: Julio César Suástegui <juliosuas@users.noreply.github.com>
self.allow_redirects and self.timeout were each initialized twice in
SimpleAiohttpChecker.__init__, which is redundant code.
Co-authored-by: zocomputer <help@zocomputer.com>
* Bump lxml minimum to 6.0.2 for Python 3.14 compatibility
lxml 5.x fails to build on Python 3.14 due to incompatible pointer
types in Cython-generated C code. lxml 6.0.2 compiles correctly.
Fixes #2266
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Update poetry.lock to match pyproject.toml changes
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Soxoj <31013580+soxoj@users.noreply.github.com>
The previous absence string 'The requested user does not exist or is inactive'
no longer matches the live site response. InterPals now returns 'User not found'
for non-existent profiles, causing false positives for all username searches.
Tested against interpals.net/noneownsthisusername (non-existent) and
interpals.net/blue (claimed) to confirm detection accuracy.
Closes #2433
Co-authored-by: Julio César Suástegui <juliosuas@users.noreply.github.com>
* Overhaul site tags and naming: add social tag to 33 networks, fill missing tags for 213 top-1000 sites, clean up false us/in country tags (~374 sites), normalize site names to Title Case, add tag validation tests, document tagging and naming rules
Remove LLM folder: ask @soxoj for the up-to-date version!
* Remove LLM/ from version control
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Added social tag to social networks (33 sites)
- Fixed wrong tags (8 sites)
- Filled empty tags for 213 sites in top-1000
- Country tag cleanup (~374 sites)
- Site naming normalization (75 sites)
- New tests (3)
- Documentation updates
* feat(core): add POST request support, new sites, migrate to Majestic Million ranking
- Added native POST request support to the Maigret engine (requestMethod, requestPayload) to enable querying modern JSON registration endpoints.
- Replaced the discontinued Alexa rank API with the Majestic Million dataset for global popularity sorting and automated CI updates.
- Fixed multiple false positives among top 500 sites and bypassed standard anti-bot protections using custom User-Agents.
- Updated public documentation and internal playbooks to reflect the new features.
* feat(data): apply all data.json site check updates from main branch
- Added CTFtime and PentesterLab (new sites added in main)
- Removed forums.imore.com (deleted in main as dead site)
- Disabled 5 sites per main branch fixes: Librusec, MirTesen, amateurvoyeurforum.com, forums.stevehoffman.tv, vegalab
- Fixed 5 site checks per main branch: SoundCloud, Taplink, Setlist, RoyalCams, club.cnews.ru (switched from status_code to message checkType with proper markers)
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/a1d194d9-c0ff-4e2b-974c-c5e4b59548bf
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Add two cybersecurity platforms for username enumeration:
- CTFtime (ctftime.org) - CTF competition platform
- PentesterLab (pentesterlab.com) - Security training platform
Both verified working with status_code check type.
Returns 200 for existing users, 404 for non-existent.
Co-authored-by: Julio César Suástegui <juliosuas@users.noreply.github.com>
* Initial plan
* Disable RoyalCams site check to fix false-positive probe
The Telegram Maigret bot auto-probe reported CLAIMED for three random
usernames. The status_code checkType is unreliable as the site returns
200 for non-existent user profiles (soft 404). Disabling the site check
until a reliable detection method can be established.
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/05b3d513-fe15-477d-a455-0c9ddf0b8b51
* Fix RoyalCams: switch to message checkType using BongaCams white-label pattern
RoyalCams runs on the BongaCams platform. Applied the same fix pattern:
- Switch from status_code to message checkType
- Use Portuguese locale (pt.royalcams.com) as urlProbe
- absenceStrs matches generic title on non-existent profiles
- presenseStrs matches Portuguese profile title for existing users
- Add browser-like headers matching BongaCams config
- Remove disabled flag
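The difference between the two check types can be sketched as follows. This is a simplified illustration, not Maigret's actual checking code: `is_claimed`, its snake_case parameters, and the marker strings in the usage note are hypothetical.

```python
def is_claimed(check_type, status, body, absence_strs=(), presence_strs=()):
    """Decide whether a profile exists (simplified, hypothetical logic)."""
    if check_type == "status_code":
        # Trusts the HTTP status alone, so a soft 404 that returns 200
        # is reported as an existing profile.
        return 200 <= status < 300
    if check_type == "message":
        # Any absence marker in the page means the profile does not exist.
        if any(marker in body for marker in absence_strs):
            return False
        # If presence markers are configured, at least one must appear.
        return not presence_strs or any(m in body for m in presence_strs)
    raise ValueError(f"unknown check type: {check_type}")
```

With a soft-404 page such as `<title>Generic page</title>` served with status 200, the `status_code` branch reports the profile as claimed, while the `message` branch with `absence_strs=("Generic page",)` reports it as absent.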
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/2f6a9523-278a-4992-ba7c-c320de14bfa4
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
- Fix VK and TradingView checkType; add Reddit and Microsoft Learn API-style probes where appropriate; adjust or disable entries that are unreliable under anti-bot protection.
- Self-check: stop aggressive auto-disable; default to reporting issues only; add --auto-disable and --diagnose for optional fixes and deeper output.
- Tooling: add utils/site_check.py and utils/check_top_n.py (and related helpers) to inspect and rank site behavior against the top-N list
- Scope: aligns with fixing top-traffic / high-impact sites and making diagnostics repeatable without silently flipping disabled flags
The path `'~/.maigret/settings.json'` uses a tilde (`~`) which is not automatically expanded by Python's `open()` function. This will cause the settings file in the user's home directory to be silently ignored (caught by `FileNotFoundError`) because Python will look for a literal directory named `~` in the current working directory.
Affected files: settings.py
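A minimal sketch of the fix, assuming the settings file is read with plain `open()`. `load_json_settings` is a hypothetical helper used for illustration, not a function from Maigret's `settings.py`.

```python
import json
import os


def load_json_settings(path: str) -> dict:
    """Load a JSON settings file, expanding a leading '~' first.

    Hypothetical helper; returns an empty dict when the file is absent.
    """
    # open() does not expand '~', so do it explicitly.
    expanded = os.path.expanduser(path)
    try:
        with open(expanded, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}
```

Without the `os.path.expanduser` call, `open('~/.maigret/settings.json')` looks for a directory literally named `~` under the current working directory.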
The `Settings.load()` method iterates through multiple configuration file paths and updates the internal `__dict__`, intending to override earlier default settings with later user-specific ones. This cascading logic is a core configuration feature but lacks explicit tests to guarantee that dictionary merging and overriding behave exactly as documented (e.g., ensuring a setting in `~/.maigret/settings.json` correctly overrides `resources/settings.json` without wiping out other keys).
Affected files: test_settings.py
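A test for this behaviour could look roughly like the sketch below. `load_cascade` is a hypothetical stand-in for the cascading logic (it is not `Settings.load()`), and the `timeout` / `retries` keys are illustrative only.

```python
import json
import os
import tempfile


def load_cascade(paths):
    """Merge JSON settings files in order; later files override earlier keys.

    Hypothetical stand-in for the cascade described above. Missing files
    are skipped, and the merge is shallow (top-level keys only).
    """
    merged = {}
    for path in paths:
        try:
            with open(os.path.expanduser(path), encoding="utf-8") as f:
                merged.update(json.load(f))
        except FileNotFoundError:
            continue
    return merged


def test_user_settings_override_defaults():
    with tempfile.TemporaryDirectory() as tmp:
        defaults = os.path.join(tmp, "defaults.json")
        user = os.path.join(tmp, "user.json")
        with open(defaults, "w", encoding="utf-8") as f:
            json.dump({"timeout": 30, "retries": 1}, f)
        with open(user, "w", encoding="utf-8") as f:
            json.dump({"timeout": 60}, f)
        missing = os.path.join(tmp, "missing.json")
        settings = load_cascade([defaults, missing, user])
    # The user file overrides "timeout" but leaves "retries" intact.
    assert settings == {"timeout": 60, "retries": 1}
```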
* refactor: hardcoded relative path for database file
`app.config['MAIGRET_DB_FILE']` is set to a hardcoded relative path `os.path.join('maigret', 'resources', 'data.json')`. If the Flask application is executed from a different working directory (other than the repository root), it will fail to find the database file and crash.
Affected files: app.py, settings.py
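One common way to avoid the dependency on the working directory is to resolve the path against the module's own location. The sketch below assumes that approach; `resource_path` is a hypothetical helper and the directory layout in the usage note is an assumption.

```python
import os


def resource_path(module_file: str, *parts: str) -> str:
    """Build a path relative to the directory that contains module_file."""
    base_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(base_dir, *parts)
```

Inside a module this would be called as `resource_path(__file__, 'resources', 'data.json')`, which yields the same absolute path no matter where the process was started.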
* make graph more meaningful
if a search with multiple usernames is launched, it creates an additional site node where they both are found.
advantages:
- better recognition, that users have a connection with each other
- better detection of false positives when launching a search with two fake usernames (site node = definite false positive)
* fix Graph linking report.py
* update web interface with commandline options
* improve web interface
* update README images of web interface
* fix bug in app.py
* fix web interface
currently maigret parses urls as usernames related to gravatar. this leads to bad filenames of the output on my linux host, as the slashes cause it to try to write subfolders, causing the script to abort with the error "file does not exist".
Applied a simple fix to replace all "/" with "_" in output file generation.
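The replacement described above amounts to something like the following sketch. `safe_report_name` is a hypothetical name for illustration; the actual change lives in the output file generation code.

```python
def safe_report_name(username: str) -> str:
    """Replace '/' so the name cannot be interpreted as nested folders."""
    return username.replace("/", "_")
```

For example, `safe_report_name("https://gravatar.com/user")` returns `https:__gravatar.com_user`, which can be used as a single file name component.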
* Updated example colab file (Due to latest update)
* Fix RobertsSpaceIndustries URI
* Fix PyInstaller workflow
* Fix example.ipynb (read desc.)
Currently the version installed via pip3 doesn't appear to contain the latest data.json file, resulting in many false positives.
* Fix non-existent users (read desc.)
Fixed non-existent usernames for the following:
Telegram (t.me)
TikBuddy (tikbuddy.com)
FurAffinity (furaffinity.net)
Alik.cz is seeing unusually high traffic on usernames julian and
noonewouldeverusethis due to its presence in both Sherlock and Maigret.
This target is permanently removed and should not be replaced.
* Adding permutator feature for usernames
("", "_", "-", ".") when id_type == username
File : maigret/permutator.py
Arg : --permute
For now, it only permutes pairs of elements and does not return single elements (element1, _element1, element1_, element2, _element2, ...): 12 permutations for 2 elements.
To return single elements as well, use Permute(usernames).gather(method="all"); this is not yet wired into maigret.py and gives 18 permutations for 2 elements. Should we add it, perhaps behind another argument?
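The sketch below shows one scheme that reproduces the counts quoted above (12 without single elements, 18 with them). It is a reconstruction under assumptions, not the code in `maigret/permutator.py`: the function name, the `include_singles` flag, and the underscore-wrapped variants of the plain concatenation are all guesses made to match those numbers.

```python
from itertools import permutations

SEPARATORS = ("", "_", "-", ".")


def permute_usernames(elements, include_singles=False):
    """Join every ordered pair of elements with each separator (hypothetical)."""
    results = []
    for pair in permutations(elements, 2):
        for sep in SEPARATORS:
            joined = sep.join(pair)
            results.append(joined)
            if sep == "":
                # Assumed extra variants: underscore before and after
                # the plain concatenation.
                results.append("_" + joined)
                results.append(joined + "_")
    if include_singles:
        for element in elements:
            results.extend([element, "_" + element, element + "_"])
    return results
```

For two elements this gives 2 orderings times 6 variants, that is 12 results, and 6 more when single elements are included.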
* Update test_cli.py
permute arg added
Added a link to code of conduct inside of CONTRIBUTING.md. Added naming conventions, indentation and import conventions. Added link to PEP 8 which I think most closely resembles the coding style used.
* Fixing checks for broken sites and repairing the ones that were changed
* little tweaks
* little tweaks
---------
Co-authored-by: Weekrow <somewherelse@yandex.ru>
This code is more readable and easier to understand than the original code. It uses more descriptive variable names, and it breaks the code into smaller, more manageable functions. The code also uses comments to explain what each part of the code is doing.
Here are some specific improvements that I made to the code:
* I renamed the variables `TOP_SITES_COUNT` and `TIMEOUT` to more descriptive names, such as `max_sites_to_search` and `timeout`.
* I broke the code into smaller, more manageable functions, such as `main()` and `search_func()`.
* I added comments to explain what each part of the code is doing.
* I used more consistent indentation.
Fixing two small typos in the error definition file:
- "switch to another..." -> "Switch to another..."
- Capitalizing this sentence
- "...parallel connections (e.g. --n 10)" -> "...parallel connections (e.g. -n 10)"
- Removing the extra `-` for this option
Multiple best practices applied as below:
- Replace deprecated `MAINTAINER` with `LABEL maintainer`
- Remove additional `apt clean` as it'll be done automatically
- Use `apt-get` instead of `apt` in script, apt does not have a stable
CLI interface, and it's for end-user.
- Put `apt-get install` & apt lists clean up in the same command
- Use `--no-install-recommends` with `apt-get install` to avoid install
additional packages
- Use `--no-cache-dir` with `pip install` to prevent temporary cache
- Use `COPY` instead of `ADD` for files and folders
- Use spaces instead of mixing spaces with tabs to indent
Size change by the refactor, almost 100MB saved:
```
REPOSITORY TAG IMAGE ID CREATED SIZE
maigret after 9e70c65dde32 1 minutes ago 543MB
maigret before a683f2b71751 7 minutes ago 635MB
```
* add a lot of new sites from social analyzer, fix presenceStr
* add social-analyzer sites
* fix username claimed
* update site list
* Update data.json
* changed Bayoushooter to use XenForo and foursquare to use correct checkType
* fix: removed disable from Bayoushooter
Co-authored-by: Antonio Marco <antonio.marco@liferaftinc.com>
# This workflow will upload a Python Package using Twine when a release is created
# For more information see: https://help.github.com/en/actions/language-and-framework-guides/using-python-with-github-actions#publishing-to-package-registries
name: Upload Python Package
name: Upload Python Package to PyPI when a Release is Published
Hey! I'm really glad you're reading this. Maigret contains a lot of sites, and it is very hard to keep all the sites operational. That's why any fix is important.
## Code of Conduct
Please read and follow the [Code of Conduct](CODE_OF_CONDUCT.md) to foster a welcoming and inclusive community.
## How to add a new site
#### Beginner level
You can use Maigret **submit mode** (`maigret --submit URL`) to add a new site or update an existing site. In this mode, Maigret automatically analyzes the given account URL or site main page URL to determine the site engine and the methods for checking account presence. After the analysis, Maigret asks whether you want to add the site; answering y/Y rewrites the local database.
#### Advanced level
You can edit [the database JSON file](https://github.com/soxoj/maigret/blob/main/maigret/resources/data.json) (`./maigret/resources/data.json`) manually.
## Testing
CI checks run on every PR to the Maigret repository, but it is better to run `make format`, `make lint` and `make test` locally first to make sure your changes are correct.
## Submitting changes
To submit your changes, [send a GitHub PR](https://github.com/soxoj/maigret/pulls) to the Maigret project.
Always write a clear log message for your commits. One-line messages are fine for small changes, but bigger changes should look like this:
    $ git commit -m "A brief summary of the commit
    >
    > A paragraph describing what changed and its impact."
## Coding conventions
### General Guidelines
- Try to follow [PEP 8](https://www.python.org/dev/peps/pep-0008/) for Python code style.
- Ensure your code passes all tests before submitting a pull request.
### Code Style
- **Indentation**: Use 4 spaces per indentation level.
- **Imports**:
  - Standard library imports should be placed at the top.
  - Third-party imports should follow.
  - Group imports logically.
### Naming Conventions
- **Variables and Functions**: Use `snake_case`.
- **Classes**: Use `CamelCase`.
- **Constants**: Use `UPPER_CASE`.
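As a quick illustration of the conventions above, here is a small sketch. Every name in it (`DEFAULT_TIMEOUT`, `SiteCheck`, `build_report_path`) is made up for the example and does not come from the Maigret codebase.

```python
# Standard library imports first.
import os
from dataclasses import dataclass

# Third-party imports would follow here (none are needed in this sketch).

DEFAULT_TIMEOUT = 30  # constant: UPPER_CASE


@dataclass
class SiteCheck:  # class: CamelCase
    site_name: str
    timeout: int = DEFAULT_TIMEOUT


def build_report_path(output_dir: str, user_name: str) -> str:  # function: snake_case
    file_name = f"report_{user_name}.txt"  # variable: snake_case
    return os.path.join(output_dir, file_name)
```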
Start reading the code and you'll get the hang of it. ;)
**Maigret** collects a dossier on a person **by username only**, checking for accounts on a huge number of sites and gathering all the available information from web pages. No API keys required.
<i>The Commissioner Jules Maigret is a fictional French police detective, created by Georges Simenon. His investigation method is based on understanding the personality of different people and their interactions.</i>
## Contents
## About
- [In one minute](#in-one-minute)
- [Main features](#main-features)
- [Demo](#demo)
- [Installation](#installation)
- [Usage](#usage)
- [Contributing](#contributing)
- [Commercial Use](#commercial-use)
- [About](#about)
Purpose of Maigret - **collect a dossier on a person by username only**, checking for accounts on a huge number of sites.
<a id="one-minute"></a>
## In one minute
This is a [sherlock](https://github.com/sherlock-project/) fork with cool features under heavy development.
*Don't forget to regularly update source code from repo*.
Ensure you have Python 3.10 or higher.
Maigret currently supports more than 2000 sites ([full list](./sites.md)). By default, the search runs against the 500 most popular sites, in descending order of popularity.
```bash
pip install maigret
maigret YOUR_USERNAME
```
No install? Try the [Telegram bot](https://t.me/maigret_search_bot) or a [Cloud Shell](#cloud-shells).
Want a web UI? See [how to launch it](#web-interface).
See also: [Quick start](https://maigret.readthedocs.io/en/latest/quick-start.html).
## Main features
* Profile pages parsing, [extracting](https://github.com/soxoj/socid_extractor) personal info, links to other profiles, etc.
* Recursive search by new usernames found
* Search by tags (site categories, countries)
* Censorship and captcha detection
* Very few false positives
- Supports 3,000+ sites ([see full list](https://github.com/soxoj/maigret/blob/main/sites.md)). A default run checks the 500 highest-ranked sites by traffic; pass `-a` to scan everything, or `--tags` to narrow by category/country.
- Embeddable in Python projects — import `maigret` and run searches programmatically (see [library usage](https://maigret.readthedocs.io/en/latest/library-usage.html)).
- [Extracts](https://github.com/soxoj/socid_extractor) all available information about the account owner from profile pages and site APIs, including links to other accounts.
- Performs recursive search using discovered usernames and other IDs.
- Allows filtering by tags (site categories, countries).
- Detects and partially bypasses blocks, censorship, and CAPTCHA.
- Fetches an [auto-updated site database](https://maigret.readthedocs.io/en/latest/settings.html#database-auto-update) from GitHub each run (once per 24 hours), and falls back to the built-in database if offline.
- Works with Tor and I2P websites; able to check domains.
- Ships with a [web interface](#web-interface) for browsing results as a graph and downloading reports in every format from a single page.
For the complete feature list, see the [features documentation](https://maigret.readthedocs.io/en/latest/features.html).
### Used by
Professional OSINT and social-media analysis tools built on Maigret:
```bash
docker build --target web -t maigret-web .   # Web UI image
```
You can use a free virtual machine; the repo will be cloned automatically:
### Troubleshooting
[Open in Cloud Shell](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/soxoj/maigret&tutorial=README.md) [Run on Repl.it](https://repl.it/github/soxoj/maigret)
<a href="https://colab.research.google.com/gist//soxoj/879b51bc3b2f8b695abb054090645000/maigret.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" height="40"></a>
Build errors? See the [troubleshooting guide](https://maigret.readthedocs.io/en/latest/installation.html#troubleshooting).
## Usage
### Examples
```bash
pip3 install -r requirements.txt
```
# make HTML, PDF, and Xmind8 reports
maigret user --html
maigret user --pdf
maigret user --xmind #Output not compatible with xmind 2022+
## Using examples
```bash
# for a cloned repo
./maigret.py user
# for a package
maigret user
```
Features:
```bash
# make HTML and PDF reports
maigret user --html --pdf
# machine-readable exports
maigret user --json ndjson # newline-delimited JSON (also: --json simple)
maigret user --csv
maigret user --txt
maigret user --graph # interactive D3 graph (HTML)
# search on sites marked with tags photo & dating
maigret user --tags photo,dating
# search on sites marked with tag us
maigret user --tags us
# search for three usernames on all available sites
maigret user1 user2 user3 -a
```
Run `maigret --help` to get a description of the arguments. Options are also documented in [the Maigret Wiki](https://github.com/soxoj/maigret/wiki/Command-line-options).
Run `maigret --help` for all options. Docs: [CLI options](https://maigret.readthedocs.io/en/latest/command-line-options.html), [more examples](https://maigret.readthedocs.io/en/latest/usage-examples.html). Running into 403s or timeouts? See [TROUBLESHOOTING.md](TROUBLESHOOTING.md).
With Docker:
```
# manual build
docker build -t maigret . && docker run maigret user
# official image
docker run soxoj/maigret:latest user
```
<a id="web-interface"></a>
### Web interface
Maigret has a built-in web UI with a results graph and downloadable reports.
<details>
<summary>Web Interface Screenshots</summary>


**Maigret can be embedded in your own Python projects.** The CLI is a thin wrapper around an async function you can call directly — build custom pipelines, feed results into your own tooling, or run it inside a larger OSINT workflow.
See the full [library usage guide](https://maigret.readthedocs.io/en/latest/library-usage.html) for a working example, async patterns, and how to filter sites by tag.
Original Creator of Sherlock Project - [Siddharth Dushantha](https://github.com/sdushantha)
```bash
# any HTTP/SOCKS proxy
maigret user --proxy socks5://127.0.0.1:1080
# Tor (default gateway socks5://127.0.0.1:9050)
maigret user --tor-proxy socks5://127.0.0.1:9050
# I2P (default gateway http://127.0.0.1:4444)
maigret user --i2p-proxy http://127.0.0.1:4444
```
Start your Tor / I2P daemon before running the command — Maigret does not manage these gateways.
## Contributing
Add or fix new sites surgically in `data.json` (no `json.load`/`json.dump`), then run `./utils/update_site_data.py` to regenerate `sites.md` and the database metadata, and open a pull request. For more details, see the [CONTRIBUTING guide](https://github.com/soxoj/maigret/blob/main/CONTRIBUTING.md) and [development docs](https://maigret.readthedocs.io/en/latest/development.html). Release history: [CHANGELOG.md](CHANGELOG.md).
## Commercial Use
The open-source Maigret is MIT-licensed and free for commercial use without restriction — but site checks break over time and need active maintenance.
For serious commercial use — with a **daily-updated site database** or a **username-check API** — reach out: 📧 [maigret@soxoj.com](mailto:maigret@soxoj.com)
- Private site database — 5 000+ sites, updated daily (separate from the public open-source database)
- Username check API — integrate Maigret into your product
## About
### Disclaimer
**For educational and lawful purposes only.** You are responsible for complying with all applicable laws (GDPR, CCPA, etc.) in your jurisdiction. The authors bear no responsibility for misuse.
### Feedback
[Open an issue](https://github.com/soxoj/maigret/issues) · [GitHub Discussions](https://github.com/soxoj/maigret/discussions) · [Telegram](https://t.me/soxoj)
### SOWEL classification
OSINT techniques used:
- [SOTL-2.2. Search For Accounts On Other Platforms](https://sowel.soxoj.com/other-platform-accounts)
- [SOTL-6.1. Check Logins Reuse To Find Another Account](https://sowel.soxoj.com/logins-reuse)
- [SOTL-6.2. Check Nicknames Reuse To Find Another Account](https://sowel.soxoj.com/nicknames-reuse)
Common issues when running Maigret and how to fix them. If none of this helps, [open an issue](https://github.com/soxoj/maigret/issues) with the output of `maigret --version` and the exact command you ran.
## "Lots of sites fail / timeout / return 403"
This is by far the most common report. It almost always comes from anti-bot protection (Cloudflare, DDoS-Guard, Akamai, etc.) or a slow network — not from a bug in Maigret.
**Results vary a lot depending on where you run from.** The same command on the same username can produce very different output on:
- **Mobile internet** (4G/5G) — usually the best results. Carrier NAT shares your IP with thousands of real users, so WAFs rarely block it.
- **Home broadband** — generally good, though some ISPs are reputation-flagged.
- **Hosting / cloud / VPS infrastructure** (AWS, GCP, DigitalOcean, Hetzner, etc.) — the worst case. Datacenter IP ranges are blanket-blocked or challenged by most WAFs, so you will see many false negatives and 403s.
If a run looks suspiciously empty, **try a different network before assuming Maigret is broken**: tether from your phone, switch between Wi-Fi and mobile, or move the run off a VPS onto a residential machine. Comparing results across two networks is also the fastest way to tell whether a missing account is genuinely missing or just blocked on the current IP.
Once you have a sense of the baseline, try these tweaks in order:
1. **Raise the timeout.** The default is 30 seconds. On mobile networks or for slow sites, bump it:
```bash
maigret user --timeout 60
```
2. **Retry failed checks.** Transient 5xx / timeouts often clear on a second try:
```bash
maigret user --retries 2
```
3. **Lower parallelism.** Some WAFs rate-limit aggressively. Maigret defaults to 100 concurrent connections (`-n` / `--max-connections`) — dropping this makes you look less like a scanner:
```bash
maigret user -n 20
```
4. **Route through a residential proxy.** Datacenter IPs (AWS, GCP, DigitalOcean) are blanket-blocked by many WAFs. A residential / mobile proxy usually fixes this:
```bash
maigret user --proxy http://user:pass@residential-proxy:port
```
Note: Tor (`--tor-proxy`) rarely helps here — most WAFs block Tor exit nodes just as aggressively as datacenter IPs. Use Tor only when you actually need to reach `.onion` sites (see below).
If specific sites *always* fail regardless of the above, they are likely broken in the database (stale markers, new WAF, site redesign). Report them with `--print-errors` output so a maintainer can look at the check config.
## "No results at all" / "maigret: command not found"
- **`command not found`** — `pip install maigret` put the binary under `~/.local/bin` (Linux/macOS) or `%APPDATA%\Python\Scripts` (Windows). Add that directory to `PATH`, or run `python3 -m maigret user` instead.
- **Empty output** — check that you actually passed a username; `maigret` alone prints help. Also confirm Python 3.10+ with `python3 --version`.
## "SSL / certificate errors"
Usually caused by a corporate MITM proxy or an outdated `certifi` bundle.
```bash
pip install --upgrade certifi
```
If you are behind a corporate proxy, set `HTTPS_PROXY` / `HTTP_PROXY` environment variables and pass `--proxy "$HTTPS_PROXY"` so Maigret uses the same route.
## ".onion / .i2p sites are skipped"
These sites only load through the matching gateway. Start your Tor or I2P daemon first, then:
```bash
# Tor
maigret user --tor-proxy socks5://127.0.0.1:9050
# I2P
maigret user --i2p-proxy http://127.0.0.1:4444
```
Maigret does not launch or manage these daemons — they must already be running.
## "The PDF / XMind / HTML report looks wrong"
- **PDF** — requires `weasyprint` and its system dependencies (Pango, Cairo, GDK-PixBuf). On Debian/Ubuntu: `apt install libpango-1.0-0 libpangoft2-1.0-0`. macOS: `brew install pango`.
- **XMind** — the `--xmind` flag generates **XMind 8** files. XMind 2022+ (Zen / XMind 2023) uses a different format and will not open them. Use XMind 8 or convert via `--html`.
- **HTML** looks unstyled — open it through a local file path (`file:///...`), not via a preview pane that strips CSS.
## "The site database is out of date"
Maigret auto-fetches a fresh `data.json` from GitHub once every 24 hours. To force-refresh now:
```bash
maigret user --force-update
```
To run entirely against the local built-in copy (e.g. offline):
```bash
maigret user --no-autoupdate
```
## Still stuck?
- [Open an issue](https://github.com/soxoj/maigret/issues) — include your OS, Python version, Maigret version, and the full command.
- Ask in [GitHub Discussions](https://github.com/soxoj/maigret/discussions) or the [Telegram](https://t.me/soxoj) channel.
**Maigret** collects a dossier on a person **by username only**, checking for accounts on a huge number of sites and gathering all the available information from web pages. No API keys required.
## Installation
Google Cloud Shell does not ship with all the system libraries Maigret needs (`libcairo2-dev`, `pkg-config`). The helper script below installs them and then builds Maigret from the cloned source.
Copy the command and run it in the Cloud Shell terminal:
```bash
./utils/cloudshell_install.sh
```
When the script finishes, verify the install:
```bash
maigret --version
```
## Usage examples
Run a basic search for a username. By default Maigret checks the **500 highest-ranked sites by traffic** — pass `-a` to scan the full 3,000+ database.
```bash
maigret soxoj
```
Search several usernames at once:
```bash
maigret user1 user2 user3
```
Narrow the run to sites related to cryptocurrency via the `crypto` tag (you can also use country tags):
```bash
maigret vitalik.eth --tags crypto
```
Generate reports in HTML, PDF, and XMind 8 formats:
```bash
maigret soxoj --html
maigret soxoj --pdf
maigret soxoj --xmind
```
Download a generated report from Cloud Shell to your local machine:
```bash
cloudshell download reports/report_soxoj.pdf
```
Tune reliability on flaky networks — raise the timeout and retry failed checks:
```bash
maigret soxoj --timeout 60 --retries 2
```
For the full list of options see `maigret --help` or the [CLI documentation](https://maigret.readthedocs.io/en/latest/command-line-options.html).
## Further reading
Full project documentation: [maigret.readthedocs.io](https://maigret.readthedocs.io/)
You can specify several usernames separated by space. Usernames are
**not** mandatory as there are other operations modes (see below).
Parsing of account pages and online documents
---------------------------------------------
``maigret --parse URL``
Maigret will try to extract information about the document/account owner
(including username and other ids) and will make a search by the
extracted username and ids. See examples in the :ref:`extracting-information-from-pages` section.
Main options
------------
Options are also configurable through settings files, see
:doc:`settings section <settings>`.
``--tags`` - Filter sites for searching by tags: sites categories and
two-letter country codes (**not a language!**). E.g. photo, dating, sport; jp, us, global.
Multiple tags can be associated with one site. **Warning**: tags markup is
not stable now. Read more :doc:`in the separate section <tags>`.
``--exclude-tags`` - Exclude sites with specific tags from the search
(blacklist). E.g. ``--exclude-tags porn,dating`` will skip all sites
tagged with ``porn`` or ``dating``. Can be combined with ``--tags`` to
include certain categories while excluding others. Read more
:doc:`in the separate section <tags>`.
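The interaction between ``--tags`` and ``--exclude-tags`` can be pictured with the sketch below. It is an illustration under assumptions, not Maigret's filtering code: `filter_by_tags` is a hypothetical function, and it assumes a site is kept when it carries at least one of the requested tags and none of the excluded ones.

```python
def filter_by_tags(sites, tags=(), exclude_tags=()):
    """Return site names that match the include list and avoid the exclude list.

    sites maps a site name to an iterable of its tags (hypothetical shape).
    """
    include, exclude = set(tags), set(exclude_tags)
    selected = []
    for name, site_tags in sites.items():
        site_tags = set(site_tags)
        if site_tags & exclude:
            continue  # blacklisted tag present
        if include and not (site_tags & include):
            continue  # none of the requested tags present
        selected.append(name)
    return selected
```

With no tags given, every site that is not excluded passes through unchanged.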
``-n``, ``--max-connections`` - Allowed number of concurrent connections
**(default: 100)**.
``-a``, ``--all-sites`` - Use all sites for scan **(default: top 500)**.
``--top-sites`` - Count of sites for scan ranked by Majestic Million
**(default: top 500)**.
**Mirrors:** After the top *N* sites by Majestic Million rank are chosen (respecting
``--tags``, ``--use-disabled-sites``, etc.), Maigret may add extra sites
whose database field ``source`` names a **parent platform** that itself falls
in the Majestic Million top *N* when ranking **including disabled** sites. For example,
if ``Twitter`` ranks in the first 500 by Majestic Million, a mirror such as ``memory.lol``
(with ``source: Twitter``) is included even though it has no rank and would
otherwise be cut off. The same applies to Instagram-related mirrors (e.g.
Picuki) when ``Instagram`` is in that parent top *N* by rank—even if the
official ``Instagram`` entry is disabled and not scanned by default, its
mirrors can still be pulled in. The final list is the ranked top *N* plus
these mirrors (no fixed upper bound on mirror count).
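The mirror rule can be sketched roughly as follows. This is a simplified illustration, not Maigret's implementation: `select_sites` and the dictionary keys (`name`, `rank`, `disabled`, `source`) are assumptions, tag filtering is left out, and the sketch assumes a disabled mirror is itself skipped.

```python
def select_sites(sites, top_n):
    """Pick the ranked top N enabled sites, then append their mirrors.

    sites is a list of dicts with 'name' and optional 'rank', 'disabled'
    and 'source' keys (hypothetical shape).
    """
    ranked = sorted((s for s in sites if s.get("rank")), key=lambda s: s["rank"])
    # Sites scanned by default: top N among enabled, ranked sites.
    top = [s for s in ranked if not s.get("disabled")][:top_n]
    # Parent platforms: top N by rank, disabled entries included.
    parents = {s["name"] for s in ranked[:top_n]}
    chosen = {s["name"] for s in top}
    mirrors = [
        s for s in sites
        if s["name"] not in chosen
        and not s.get("disabled")
        and s.get("source") in parents
    ]
    return top + mirrors
```

In this sketch a mirror whose `source` is a disabled but highly ranked parent is still selected, matching the Instagram example above.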
``--timeout`` - Time (in seconds) to wait for responses from sites
**(default: 30)**. A longer timeout will be more likely to get results
from slow sites. On the other hand, this may cause a long delay to
gather all results. The choice of the right timeout should be carried
out taking into account the bandwidth of the Internet connection.
``--cookies-jar-file`` - File with custom cookies in Netscape format
(aka cookies.txt). You can install an extension to your browser to
download own cookies (`Chrome <https://chrome.google.com/webstore/detail/get-cookiestxt/bgaddhkoddajcdgocldbbfleckgcbcid>`_, `Firefox <https://addons.mozilla.org/en-US/firefox/addon/cookies-txt/>`_).
``--no-recursion`` - Disable parsing pages for other usernames and
recursive search by them.
``--use-disabled-sites`` - Use disabled sites to search (may cause many
false positives).
``--id-type`` - Specify identifier(s) type (default: username).
Currently, you must add ``-a`` flag to run a scan on sites with custom
id types, sites will be filtered automatically.
``--ignore-ids`` - Do not make search by the specified username or other
ids. Useful for repeated scanning with found known irrelevant usernames.
``--db`` - Load Maigret database from a JSON file or an online, valid,
JSON file. See :ref:`custom-database` below.
``--no-autoupdate`` - Disable the automatic database update check that
runs at startup. The currently cached (or bundled) database is used
as-is.
``--force-update`` - Force a database update check at startup, ignoring
the usual check interval. Implies ``--no-autoupdate`` for the rest of
the run after the explicit update finishes.
``--retries RETRIES`` - Number of retry attempts for requests that
failed temporarily.
.. _custom-database:
Using a custom sites database
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The ``--db`` flag accepts three forms:
1. **HTTP(S) URL**: fetched as-is, e.g.
   ``--db https://example.com/my_db.json``.
2. **Local file path**: absolute (``--db /tmp/private.json``) or
   relative to the current working directory
   (``--db LLM/maigret_private_db.json``).
3. **Module-relative path**: kept for backwards compatibility, resolved
   against the installed ``maigret/`` package directory (e.g. the
   default ``resources/data.json``).
Resolution order for local paths: the path is first tried as given
(absolute or cwd-relative); if that file does not exist, Maigret falls
back to the legacy module-relative resolution. If neither location
contains the file, Maigret exits with an error rather than silently
loading the bundled database.
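The resolution order described above can be sketched as follows. This is a minimal sketch under stated assumptions, not Maigret's code: ``resolve_db_path`` and its ``package_dir`` parameter are hypothetical names used only for illustration.

```python
import os


def resolve_db_path(db: str, package_dir: str) -> str:
    """Hypothetical helper mirroring the documented lookup order."""
    # HTTP(S) URLs are used as given.
    if db.startswith(("http://", "https://")):
        return db
    # First try the path as given (absolute or relative to the cwd).
    if os.path.isfile(db):
        return os.path.abspath(db)
    # Then fall back to the legacy module-relative location.
    legacy = os.path.join(package_dir, db)
    if os.path.isfile(legacy):
        return legacy
    # Neither location has the file: fail instead of loading a default.
    raise FileNotFoundError(f"sites database not found: {db}")
```

Raising an error in the last step matches the behaviour described above: a mistyped path should never silently result in a scan against the bundled database.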
When ``--db`` points to a custom file, automatic database updates are
skipped — the file is used exactly as provided.
On every run Maigret prints the database it actually loaded, for
example::

  [+] Using sites database: /path/to/maigret_private_db.json (6 sites)
If loading the requested database fails for any other reason (corrupt
JSON, missing required keys, …), Maigret prints a warning, falls back
to the bundled database, and reports the fallback explicitly::

  [-] Falling back to bundled database: /…/maigret/resources/data.json
  [+] Using sites database: /…/maigret/resources/data.json (3154 sites)
A typical invocation against a private database, with auto-update
disabled and all sites scanned, looks like::

  python3 -m maigret username \
      --db LLM/maigret_private_db.json \
      --no-autoupdate -a
Reports
-------
``-P``, ``--pdf`` - Generate a PDF report (general report on all
usernames).
``-H``, ``--html`` - Generate an HTML report file (general report on all
usernames).
``-X``, ``--xmind`` - Generate an XMind 8 mindmap (one report per
username).
``-C``, ``--csv`` - Generate a CSV report (one report per username).
``-T``, ``--txt`` - Generate a TXT report (one report per username).
``-J``, ``--json`` - Generate a JSON report of specific type: simple,
ndjson (one report per username). E.g. ``--json ndjson``
``-M``, ``--md`` - Generate a Markdown report (general report on all
usernames). See :ref:`markdown-report` below.
``-fo``, ``--folderoutput`` - Folder where the results are saved
(``results`` by default). It is created if it doesn't exist.
Output options
--------------
``-v``, ``--verbose`` - Display extra information and metrics.
*(loglevel=WARNING)*
``-vv``, ``--info`` - Display service information. *(loglevel=INFO)*
``-vvv``, ``--debug``, ``-d`` - Display debugging information and site
responses. *(loglevel=DEBUG)*
``--print-not-found`` - Print sites where the username was not found.
``--print-errors`` - Print error messages: connection, captcha, site
country ban, etc.
Other operation modes
----------------------
``--version`` - Display version information and dependencies.
``--self-check`` - Do self-checking for sites and database. Each site is
tested by looking up its known-claimed and known-unclaimed usernames and
verifying that the results match expectations. Individual site failures
(network errors, unexpected exceptions, etc.) are caught and logged
without stopping the overall process, so the check always runs to
completion. After checking, Maigret reports a summary of issues found.
If any sites were disabled (see ``--auto-disable``), Maigret asks if you
want to save updates; answering y/Y will rewrite the local database.
``--auto-disable`` - Used with ``--self-check``: automatically disable
sites that fail checks (incorrect detection of claimed/unclaimed
usernames, connection errors, or unexpected exceptions). Without this
flag, ``--self-check`` only **reports** issues without modifying the
database.
``--diagnose`` - Used with ``--self-check``: print detailed diagnosis
information for each failing site, including the check type, the list
of issues found, and recommendations (e.g. suggesting a different
``checkType``).
``--submit URL`` - Automatically analyze the given account URL or
site main page URL to determine the site engine and the methods for checking
account presence. After the check, Maigret asks whether you want to add the
site; answering y/Y rewrites the local database.
.. _markdown-report:
Markdown report (LLM-friendly)
------------------------------
The ``--md`` / ``-M`` flag generates a Markdown report designed for both human reading and analysis by AI assistants (ChatGPT, Claude, etc.).
.. code-block:: console

  maigret username --md
The report includes:
- **Summary** with aggregated personal data (all fullnames, locations, bios found across accounts), country tags, website tags, first/last seen timestamps.
- **Per-account sections** with profile URL, site tags, and all extracted fields (username, bio, follower count, linked accounts, etc.).
- **Possible false positives** disclaimer explaining that accounts may belong to different people.
- **Ethical use** notice about applicable data protection laws.
**Using with AI tools:**
The Markdown format is optimized for LLM context windows. You can feed the report directly to an AI assistant for follow-up analysis:
.. code-block:: console

  # Generate the report
  maigret johndoe --md

  # Feed it to an AI tool
  cat reports/report_johndoe.md | llm "Analyze this OSINT report and summarize key findings"
The structured Markdown with per-site sections makes it easy for AI tools to extract relationships, cross-reference identities, and identify patterns across accounts.
The human-readable list of supported sites is available in the `sites.md <https://github.com/soxoj/maigret/blob/main/sites.md>`_ file in the repository.
It is generated automatically from the main JSON file with the list of supported sites.
The machine-readable JSON file with the list of supported sites is available as
`data.json <https://github.com/soxoj/maigret/blob/main/maigret/resources/data.json>`_ in the ``resources`` directory.
2. Which methods for checking account presence are supported?
The supported methods (``checkType`` values in ``data.json``) are:
- ``message`` - the most reliable method; checks that at least one string from ``presenceStrs`` is present and none of the strings from ``absenceStrs`` is present in the HTML response
- ``status_code`` - checks that the status code of the response is 2XX
- ``response_url`` - checks that there is no redirect and the response status is 2XX
.. note::

  Maigret natively treats specific anti-bot HTTP status codes (like LinkedIn's ``HTTP 999``) as a standard "Not Found/Available" signal instead of throwing an infrastructure Server Error, gracefully preventing false positives.
See the details of check mechanisms in the `checking.py <https://github.com/soxoj/maigret/blob/main/maigret/checking.py#L339>`_ file.
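As a rough illustration of the ``message`` and ``status_code`` checks listed above, consider the sketch below. It is not the code from ``checking.py``: the function names are hypothetical, and special cases (for example an empty presence list) are deliberately left out.

```python
from typing import Iterable


def check_message(html: str, presence_strs: Iterable[str], absence_strs: Iterable[str]) -> bool:
    """Hypothetical sketch of the ``message`` check type."""
    has_presence = any(s in html for s in presence_strs)
    has_absence = any(s in html for s in absence_strs)
    # Claimed only if a presence marker is there and no absence marker is.
    return has_presence and not has_absence


def check_status_code(status: int) -> bool:
    """Hypothetical sketch of the ``status_code`` check type (2XX only)."""
    return 200 <= status < 300
```

Under this sketch an anti-bot code such as 999 is simply not a 2XX status, so it never counts as a claimed account.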
.. note::

  Maigret now uses the **Majestic Million** dataset for site popularity sorting instead of the discontinued Alexa Rank API. For backward compatibility with existing configurations and parsers, the ranking field in ``data.json`` and internal site models remains named ``alexaRank`` and ``alexa_rank``.
**Mirrors and ``--top-sites``:** When you limit scans with ``--top-sites N``, Maigret also includes *mirror* sites (entries whose ``source`` field points at a parent platform such as Twitter or Instagram) if that parent would appear in the Majestic Million top *N* when disabled sites are considered for ranking. See the **Mirrors** paragraph under ``--top-sites`` in :doc:`command-line-options`.
Testing
-------
It is recommended to use Python 3.10 for testing.
Install test requirements:
.. code-block:: console

  poetry install --with dev
Use the following commands to check Maigret:
.. code-block:: console

  # run linter and typing checks
  # order of checks:
  # - critical syntax errors or undefined names
  # - flake checks
  # - mypy checks
  make lint

  # run black formatter
  make format

  # run testing with coverage html report
  # current test coverage is 58%
  make test

  # open html report
  open htmlcov/index.html

  # get flamechart of imports to estimate startup time
  make speed
Site naming conventions
-----------------------------------------------
Site names are the keys in ``data.json`` and appear in user-facing reports. Follow these rules:
- **Title Case** by default: ``Product Hunt``, ``Hacker News``.
- **Lowercase** only if the brand itself is written that way: ``kofi``, ``note``, ``hi5``.
- **No domain suffix** (``calendly.com`` → ``Calendly``), unless the domain is part of the recognized brand name: ``last.fm``, ``VC.ru``, ``Archive.org``.
- **No full UPPERCASE** unless the brand is an acronym: ``VK``, ``CNET``, ``ICQ``, ``IFTTT``.
- **No** ``www.`` **or** ``https://`` **prefix** in the name.
- **Spaces** are allowed when the brand uses them: ``Star Citizen``, ``Google Maps``.
- **{username} templates** in names are acceptable: ``{username}.tilda.ws``.
When in doubt, check how the service refers to itself on its homepage.
How to fix false-positives
-----------------------------------------------
If you want to work with the sites database, don't forget to activate the git hook that updates statistics: ``git config --local core.hooksPath .githooks/``.
Make your git commits from your maigret git repo folder; otherwise the hook will not find the statistics update script.
1. Determine the problematic site.
If you already know which site has a false-positive and want to fix it specifically, go to the next step.
Otherwise, simply run a search with a random username (e.g. ``laiuhi3h4gi3u4hgt``) and check the results.
Alternatively, you can use `the Telegram bot <https://t.me/maigret_search_bot>`_.
2. Open the account link in your browser and check:
- If the site is completely gone, remove it from the list
- If the site still works but looks different, update in data.json how we check it
- If the site requires login to view profiles, disable checking it
3. Find the site in the `data.json <https://github.com/soxoj/maigret/blob/main/maigret/resources/data.json>`_ file.
If the ``checkType`` method is not ``message`` and you are going to fix the check, update it:
- put ``message`` in ``checkType``
- put in ``absenceStrs`` a keyword that is present in the HTML response for a non-existing account
- put in ``presenceStrs`` a keyword that is present in the HTML response for an existing account
If you have trouble determining the right keywords, you can use automatic detection by passing the account URL with the ``--submit`` option:
.. code-block:: console

  maigret --submit https://my.mail.ru/bk/alex
To disable checking, set ``disabled`` to ``true`` or simply run:
.. code-block:: console

  maigret --self-check --site My.Mail.ru@bk.ru
To debug the check method using the response HTML, you can run:
There are a few ``data.json`` site options that are helpful in various cases:
- ``engine`` - a predefined check for sites of a certain type (e.g. forums); see the ``engines`` section in the JSON file
- ``headers`` - a dictionary of additional headers to be sent to the site
- ``requestHeadOnly`` - set to ``true`` if it's enough to make a HEAD request to the site
- ``regexCheck`` - a regex to check if the username is valid, in case of frequent false-positives
- ``requestMethod`` - set the HTTP method to use (e.g., ``POST``). By default, Maigret uses GET or HEAD.
- ``requestPayload`` - a dictionary with the JSON payload to send for POST requests (e.g., ``{"username": "{username}"}``), useful for querying GraphQL or modern JSON APIs.
- ``protection`` - a list of protection types detected on the site (see below).
``protection`` (site protection tracking)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The ``protection`` field records what kind of anti-bot protection a site uses. Maigret reads this field and automatically applies the appropriate bypass mechanism where one exists.
Two categories of tag:
- **Load-bearing.** Maigret changes its HTTP client or headers based on the tag. Currently only ``tls_fingerprint`` (switches to ``curl_cffi`` with Chrome-class TLS).
- **Documentation-only.** Maigret does **not** change behavior based on the tag; it records *why* the site is hard so a future solver can target the right set of sites without re-auditing.
Within the documentation-only tags, there is a further split that dictates whether the site is ``disabled: true``:
- ``ip_reputation`` is the **only** doc-tag that **keeps the site enabled**. It means "works for most users, fails from datacenter/cloud IPs." Disabling would silently hide a working site from anyone with a clean IP. The fix is **external** to Maigret (residential IP or ``--proxy``).
- ``cf_js_challenge``, ``cf_firewall``, ``aws_waf_js_challenge``, ``ddos_guard_challenge``, ``custom_bot_protection``, ``js_challenge`` all pair with ``disabled: true``. They mean "does not work for anyone right now"; the tag identifies the provider so that when a bypass ships, every site with that tag can be re-enabled in one pass.
Supported values:
- ``tls_fingerprint`` *(load-bearing; site stays enabled)*: the site fingerprints the TLS handshake (JA3/JA4) and blocks non-browser clients. Maigret automatically uses ``curl_cffi`` with Chrome browser emulation to bypass this. Requires the ``curl_cffi`` package (included as a dependency). Examples: Instagram, NPM, Codepen, Kickstarter, Letterboxd.
- ``ip_reputation`` *(documentation-only; site stays enabled)*: the site blocks requests from datacenter/cloud IPs regardless of headers or TLS. Cannot be bypassed automatically; run Maigret from a regular internet connection (not a datacenter) or use a proxy (``--proxy``). The site is **not** marked ``disabled`` because it continues to work for users on residential IPs. Examples: Reddit, Patreon, Figma, OnlyFans.
- ``cf_js_challenge`` *(documentation-only; pair with ``disabled: true``)*: Cloudflare Managed Challenge / Turnstile JS challenge. Symptom: HTTP 403 with ``cf-mitigated: challenge`` header; body contains ``challenges.cloudflare.com``, ``_cf_chl_opt``, ``window._cf_chl``, or "Just a moment". Not bypassable via ``curl_cffi`` TLS impersonation (verified across Chrome 123/124/131, Safari 17/18, Firefox 133/135, Edge 101; all return the same 403 challenge page); a real browser executing the challenge JS is required to obtain the clearance cookie. Sites stay ``disabled: true`` until a CF-challenge solver is integrated. Examples: DMOJ, Elakiri, Fanlore, Bdoutdoors, TheStudentRoom, forum.hr.
- ``cf_firewall`` *(documentation-only; pair with ``disabled: true``)*: Cloudflare firewall rule / bot score block (WAF action=block, **not** action=challenge). Symptom: HTTP 403 served by Cloudflare (``server: cloudflare``, ``cf-ray`` header) **without** JS-challenge markers; the body typically shows "Access denied", "Attention Required", or just a bare 1015/1016/1020 error page. Unlike ``ip_reputation``, residential IPs are **not** sufficient to bypass: Cloudflare decides based on a composite of bot score, TLS fingerprint, UA, ASN, and custom site-owner rules, so ``curl_cffi`` Chrome impersonation from a residential line still returns 403. Sites stay ``disabled: true`` until a per-site bypass (cookies, real browser, or residential+clean session) is found. Examples: Fark, Fodors, Huntingnet, Hunttalk.
- ``aws_waf_js_challenge`` *(documentation-only; pair with ``disabled: true``)*: the site is protected by AWS WAF with a JavaScript challenge. Symptom: HTTP 202 with empty body and ``x-amzn-waf-action: challenge`` header (a token-granting challenge that requires executing the CAPTCHA/challenge JS bundle). Neither ``curl_cffi`` TLS impersonation nor User-Agent changes bypass this; a real browser or the official AWS WAF challenge-solver SDK is required. Sites stay ``disabled: true`` until a solver is integrated. Example: Dreamwidth.
- ``ddos_guard_challenge`` *(documentation-only; pair with ``disabled: true``)*: DDoS-Guard (ddos-guard.net) anti-bot page. Symptom: HTTP 403 with ``server: ddos-guard`` header; body contains "DDoS-Guard". DDoS-Guard fingerprints different UAs per source IP, so a single User-Agent override does not work across environments; a JS-capable bypass or DDoS-Guard-aware solver is required. Sites stay ``disabled: true`` until a solver is integrated. Example: ForumHouse.
- ``js_challenge`` *(documentation-only; pair with ``disabled: true``)*: **fallback** for JavaScript-challenge systems whose provider cannot be identified (custom in-house challenge pages that are not Cloudflare, AWS WAF, or any other recognized vendor). Prefer a provider-specific tag whenever the provider can be pinned down from response headers or body signatures.
- ``custom_bot_protection`` *(documentation-only; pair with ``disabled: true``)*: **fallback** for non-JS-challenge bot protection served by a custom/in-house system (not Cloudflare, not AWS WAF, not DDoS-Guard). Typical symptom: HTTP 403 from the site's own origin server (``server: nginx``, AWS ELB, etc.) with a branded block page, returned regardless of TLS fingerprint or residential IP. Not generically bypassable; investigate per site (cookies, session, proxy geography). Examples: Hackerearth ("HackerEarth Guardian"), FreelanceJob (nginx-level block).
**Rule: prefer provider-specific protection tags.** When a site is blocked by an identifiable anti-bot vendor, always record the vendor in the tag (``cf_js_challenge``, ``cf_firewall``, ``aws_waf_js_challenge``, ``ddos_guard_challenge``, and future additions such as ``sucuri_challenge``, ``incapsula_challenge``). The generic ``js_challenge`` and ``custom_bot_protection`` tags are reserved for custom/unknown systems. Rationale: bypass solvers are inherently provider-specific (a Cloudflare Turnstile solver does not help with AWS WAF); recording the provider in advance lets us fan out fixes the moment a per-provider solver is added, without re-auditing every disabled site. The same principle applies to other protection categories when the provider is identifiable.
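The symptoms listed for the provider-specific tags can be turned into a rough triage helper when auditing a site by hand. The sketch below is an assumption-laden illustration, not part of Maigret: ``guess_protection_tag`` is a hypothetical name, it only covers the header and body signatures quoted above, and treating either the ``cf-mitigated`` header or a body marker as sufficient is a simplification.

```python
from typing import Mapping, Optional

# Body markers quoted in the ``cf_js_challenge`` description.
CF_CHALLENGE_MARKERS = (
    "challenges.cloudflare.com",
    "_cf_chl_opt",
    "window._cf_chl",
    "Just a moment",
)


def guess_protection_tag(status: int, headers: Mapping[str, str], body: str) -> Optional[str]:
    """Hypothetical helper: map a blocked response to a protection tag."""
    h = {k.lower(): v.lower() for k, v in headers.items()}
    if status == 202 and h.get("x-amzn-waf-action") == "challenge":
        return "aws_waf_js_challenge"
    if status == 403:
        if h.get("server") == "ddos-guard":
            return "ddos_guard_challenge"
        if h.get("cf-mitigated") == "challenge" or any(m in body for m in CF_CHALLENGE_MARKERS):
            return "cf_js_challenge"
        # Served by Cloudflare but without JS-challenge markers.
        if h.get("server") == "cloudflare" and "cf-ray" in h:
            return "cf_firewall"
    # Unknown provider: choose a fallback tag manually.
    return None
```

A ``None`` result means the response matches none of the documented signatures, which is when the generic ``js_challenge`` or ``custom_bot_protection`` tags come into question.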
Example:
.. code-block:: json

  "Instagram": {
      "url": "https://www.instagram.com/{username}/",
      "checkType": "message",
      "presenseStrs": ["\"routePath\":\"\\/"],
      "absenceStrs": ["\"routePath\":null"],
      "protection": ["tls_fingerprint"]
  }
``urlProbe`` (optional profile probe URL)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
By default Maigret performs the HTTP request to the same URL as ``url`` (the public profile link pattern).
If you set ``urlProbe`` in ``data.json``, Maigret **fetches** that URL for the presence check (API, GraphQL, JSON endpoint, etc.), while **reports and ``url_user``** still use ``url`` — the human-readable profile page users should open.
Placeholders: ``{username}``, ``{urlMain}``, ``{urlSubpath}`` (same as for ``url``). Example: GitHub uses ``url`` ``https://github.com/{username}`` and ``urlProbe`` ``https://api.github.com/users/{username}``; Picsart uses the web profile ``https://picsart.com/u/{username}`` and probes ``https://api.picsart.com/users/show/{username}.json``.
Implementation: ``make_site_result`` in `checking.py <https://github.com/soxoj/maigret/blob/main/maigret/checking.py>`_.
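The split between the reported URL and the probed URL can be pictured with the sketch below. It is an illustration only, not the logic of ``make_site_result``: ``build_urls`` is a hypothetical helper, and the site entry is modelled as a plain dictionary.

```python
from typing import Tuple


def build_urls(site: dict, username: str) -> Tuple[str, str]:
    """Hypothetical helper: return (report_url, probe_url) for a site entry."""
    fields = {
        "username": username,
        "urlMain": site.get("urlMain", ""),
        "urlSubpath": site.get("urlSubpath", ""),
    }

    def fill(template: str) -> str:
        # Plain replacement, so other braces in the URL are left untouched.
        for key, value in fields.items():
            template = template.replace("{" + key + "}", value)
        return template

    report_url = fill(site["url"])
    # Without ``urlProbe`` the request goes to the same URL as ``url``.
    probe_url = fill(site.get("urlProbe") or site["url"])
    return report_url, probe_url
```

For the GitHub example above, the first element is the profile page shown in reports and the second is the API endpoint that is actually requested.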
Site check fixes using LLM
--------------------------
.. note::

  The ``LLM/`` directory at the root of the repository contains detailed instructions for editing site checks (in Markdown format): checklist, full guide to ``checkType`` / ``data.json`` / ``urlProbe``, handling false positives, searching for public JSON APIs, and the proposal log for ``socid_extractor``.
Main files:
- `site-checks-playbook.md <https://github.com/soxoj/maigret/blob/main/LLM/site-checks-playbook.md>`_: short checklist
- `socid_extractor_improvements.log <https://github.com/soxoj/maigret/blob/main/LLM/socid_extractor_improvements.log>`_: template and entries for identity extractor improvements
These files should be kept up-to-date whenever changes are made to the check logic in the code or in ``data.json``.
.. _activation-mechanism:
Activation mechanism
--------------------
The activation mechanism helps make requests to sites requiring additional authentication like cookies, JWT tokens, or custom headers.
Let's study the Vimeo site check record from the Maigret database:
.. code-block:: json

  "Vimeo": {
      "tags": [
          "us",
          "video"
      ],
      "headers": {
          "Authorization": "jwt eyJ0..."
      },
      "activation": {
          "url": "https://vimeo.com/_rv/viewer",
          "marks": [
              "Something strange occurred. Please get in touch with the app's creator."
Here's how the activation process works when a JWT token becomes invalid:
1. The site check makes an HTTP request to ``urlProbe`` with the invalid token
2. The response contains an error message specified in the ``activation``/``marks`` field
3. When this error is detected, the ``vimeo`` activation function is triggered
4. The activation function obtains a new JWT token and updates it in the site check record
5. On the next site check (either through retry or a new Maigret run), the valid token is used and the check succeeds
Examples of activation mechanism implementation are available in `activation.py <https://github.com/soxoj/maigret/blob/main/maigret/activation.py>`_ file.
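The five steps above can be condensed into a small sketch. It is not the code from ``activation.py``: ``activate_if_needed`` and the ``fetch_token`` callback are hypothetical names, the site record is a plain dictionary, and the ``jwt`` header prefix simply follows the Vimeo record shown earlier.

```python
from typing import Callable


def activate_if_needed(site: dict, response_text: str, fetch_token: Callable[[dict], str]) -> bool:
    """Hypothetical sketch: refresh the token when an activation mark is seen."""
    marks = site.get("activation", {}).get("marks", [])
    # Steps 2-3: an error message from ``marks`` triggers activation.
    if not any(mark in response_text for mark in marks):
        return False
    # Step 4: obtain a new token and store it in the site record.
    new_token = fetch_token(site)
    site.setdefault("headers", {})["Authorization"] = f"jwt {new_token}"
    # Step 5: the caller retries, now with the updated header.
    return True
```

The function only updates the record; the check itself is expected to succeed on the retry or on the next run, as described in step 5.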
How to publish new version of Maigret
-------------------------------------
**Collaborator rights are required; write to Soxoj to get them**.
To publish a new version, first create a new branch in the repository
with a bumped version number and an up-to-date changelog. Then create
a release, and a GitHub Action will automatically build and publish a
new PyPI package.
- New branch example: https://github.com/soxoj/maigret/commit/e520418f6a25d7edacde2d73b41a8ae7c80ddf39
1. Make a new branch locally with a new version name. Check the current version number here: https://pypi.org/project/maigret/.
**Increase only the patch version (third number)** if there are no breaking changes.
.. code-block:: console

  git checkout -b 0.4.0

2. Update the Maigret version in four files manually:
- pyproject.toml
- maigret/__version__.py
- docs/source/conf.py
- snapcraft.yaml
3. Create a new empty section at the beginning of the ``CHANGELOG.md`` file with the current date:
.. code-block:: console

  ## [0.4.0] - 2022-01-03
4. Get auto-generated release notes:
- Open https://github.com/soxoj/maigret/releases/new
- Click ``Choose a tag``, enter ``v0.4.0`` (your version)
- Click ``Create new tag``
- Press ``+ Auto-generate release notes``
- Copy all the text from the description field below
- Paste it into the empty section in ``CHANGELOG.md``
- Remove the redundant ``## What's Changed`` line and the ``## New Contributors`` section, if it exists
- *Close the new release page*
5. Commit all the changes, push, and make a pull request
.. code-block:: console

  git add -p
  git commit -m 'Bump to YOUR VERSION'
  git push origin HEAD
6. Merge pull request
7. Create a new release
- Open https://github.com/soxoj/maigret/releases/new again
- Click ``Choose a tag``
- Enter the actual version in the format ``v0.4.0``
- Also enter the actual version in the ``Release title`` field
- Click ``Create new tag``
- Press ``+ Auto-generate release notes``
- **Press the "Publish release" button**
8. That's all; now you only need to wait for the push to PyPI. You can monitor it on the Actions page: https://github.com/soxoj/maigret/actions/workflows/python-publish.yml
Documentation updates
---------------------
The documentation is auto-generated and auto-deployed from the ``docs`` directory.
To manually update documentation:
1. Change something in the ``.rst`` files in the ``docs/source`` directory.
2. Install the requirements with ``pip install -r requirements.txt`` in the docs directory.
3. Run ``make singlehtml`` in the terminal in the docs directory.
4. Open ``build/singlehtml/index.html`` in your browser to see the result.
5. If everything is ok, commit and push your changes to GitHub.
Roadmap
-------
.. warning::

  This roadmap requires updating to reflect the current project status and future plans.
1. Run Maigret with the ``--web`` flag and specify the port number.
.. code-block:: console

  maigret --web 5000
2. Open http://127.0.0.1:5000 in your browser and enter one or more usernames to make a search.
3. Wait a bit for the search to complete and view the graph with results, the table with all accounts found, and download reports of all formats.
Personal info gathering
-----------------------
Maigret `parses account webpages and extracts <https://github.com/soxoj/socid-extractor>`_ personal info, links to other profiles, etc.
The extracted info is displayed as an additional result in the CLI output and as tables in HTML and PDF reports.
Maigret also uses ids and usernames found in links to start a recursive search.
This is enabled by default and can be disabled with ``--no-extracting``.
.. code-block:: text

  $ python3 -m maigret soxoj --timeout 5
  [-] Starting a search on top 500 sites from the Maigret database...
  [!] You can run search by full list of sites with flag `-a`
  [-] Starting a search on top 500 sites from the Maigret database...
  [!] You can run search by full list of sites with flag `-a`
  [*] Checking username hopedream on:
  ...
Reports
-------
Maigret currently supports HTML, PDF, TXT, XMind 8 mindmap, and JSON reports.
HTML/PDF reports contain:
- profile photo
- all the gathered personal info
- additional information about supposed personal data (full name, gender, location), resulting from statistics of all found accounts
Also, there is a short text report in the CLI output after the end of a searching phase.
.. warning::

  XMind 8 mindmaps are incompatible with XMind 2022!
Tags
----
The Maigret sites database is very big (and will keep growing), so running a search across all the sites may be excessive.
Also, it is often hard to tell which sites are most relevant for a particular person.
Tags allow selecting a subset of sites by interest (photo, messaging, finance, etc.) or by country. Tags of found accounts are grouped and displayed in the reports.
See full description :doc:`in the Tags Wiki page <tags>`.
Censorship and captcha detection
--------------------------------
Maigret can detect common errors such as censorship stub pages, CloudFlare captcha pages, and others.
If errors of a certain type exceed 3% in a session, Maigret prints a warning in the CLI output with recommendations to improve performance and avoid problems.
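The 3% rule can be illustrated with a short sketch. It is an assumption-based example rather than Maigret's code: ``frequent_error_types`` is a hypothetical name, and the share is computed against all checks in the session, which the text above does not spell out.

```python
from collections import Counter
from typing import Iterable, List, Optional


def frequent_error_types(
    errors: Iterable[Optional[str]], threshold_percent: int = 3
) -> List[str]:
    """Hypothetical helper: error types seen in more than N% of checks.

    ``errors`` holds one entry per check: an error type, or None on success.
    """
    errors = list(errors)
    total = len(errors)
    if total == 0:
        return []
    counts = Counter(e for e in errors if e is not None)
    # Integer arithmetic avoids floating-point surprises at the boundary.
    return sorted(t for t, c in counts.items() if c * 100 > threshold_percent * total)
```

An error type that makes up exactly 3% of the checks does not trigger a warning in this sketch; only a strictly larger share does.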
Retries
-------
Maigret retries requests that failed with temporary errors (connection failures, proxy errors, etc.).
One attempt by default; this can be changed with the ``--retries N`` option.
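A retry loop of this kind can be sketched as follows. This is a generic illustration, not Maigret's request code: ``fetch_with_retries`` is a hypothetical name, ``retries`` is interpreted here as the number of additional attempts, and only ``ConnectionError`` and ``TimeoutError`` are treated as temporary.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def fetch_with_retries(fetch: Callable[[], Awaitable[T]], retries: int = 1) -> T:
    """Hypothetical helper: call ``fetch`` again after temporary failures."""
    last_error: Exception = RuntimeError("no attempts were made")
    # One initial attempt plus ``retries`` additional ones.
    for _ in range(retries + 1):
        try:
            return await fetch()
        except (ConnectionError, TimeoutError) as exc:
            last_error = exc
    raise last_error
```

Errors that are not considered temporary propagate immediately, so a site that is permanently broken does not consume the retry budget.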
Database self-check
-------------------
Maigret includes a self-check mode (``--self-check``) that validates every site
in the database by looking up its known-claimed and known-unclaimed usernames
and verifying that the detection results match expectations.
The self-check is **error-resilient**: if an individual site check raises an
unexpected exception (e.g. a network error or a parsing failure), the error is
caught, logged, and recorded as an issue — the remaining sites continue to be
checked without interruption. This means the process always runs to completion,
even when checking hundreds of sites with ``-a --self-check``.
Use ``--auto-disable`` together with ``--self-check`` to automatically disable
sites that fail checks. Without it, issues are only reported. Use ``--diagnose``
to print detailed per-site diagnosis including the check type, specific issues,
and recommendations.
.. code-block:: console

  # Report-only mode (no changes to the database)
  maigret --self-check

  # Automatically disable failing sites and save updates
  maigret -a --self-check --auto-disable

  # Show detailed diagnosis for each failing site
  maigret -a --self-check --diagnose
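The error-resilient behaviour described in this section boils down to catching exceptions per site. The sketch below shows the idea only; ``run_self_check`` and the ``check_site`` callback are hypothetical names and do not reflect Maigret's internal API.

```python
from typing import Callable, Dict, Iterable, List


def run_self_check(
    site_names: Iterable[str], check_site: Callable[[str], List[str]]
) -> Dict[str, List[str]]:
    """Hypothetical sketch: collect issues per site without ever aborting."""
    issues: Dict[str, List[str]] = {}
    for name in site_names:
        try:
            found = check_site(name)
        except Exception as exc:  # any failure becomes a recorded issue
            found = [f"unexpected error: {exc}"]
        if found:
            issues[name] = found
    return issues
```

Because every exception is converted into an issue entry, one broken site cannot prevent the remaining sites from being checked.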
Archives and mirrors checking
-----------------------------
The Maigret database contains not only the original websites, but also mirrors, archives, and aggregators. For example:
- (no longer available) `Reddit BigData search <https://camas.github.io/reddit-search/>`_
- (no longer available) `Twitter shadowban <https://shadowban.eu/>`_ checker
It allows getting additional info about the person and checking the existence of the account even if the main site is unavailable (bot protection, captcha, etc.)
Activation
----------
The activation mechanism helps make requests to sites requiring additional authentication like cookies, JWT tokens, or custom headers.
It works by implementing a custom function that:
1. Makes a specialized HTTP request to a specific website endpoint
2. Processes the response
3. Updates the headers/cookies for that site in the local Maigret database
Since activation only triggers after encountering specific errors, a retry (or another Maigret run) is needed to obtain a valid response with the updated authentication.
The activation mechanism is enabled by default, and cannot be disabled at the moment.
For more details, see :ref:`activation-mechanism` in the Development section.
.. _extracting-information-from-pages:
Extraction of information from account pages
--------------------------------------------
Maigret can parse URLs and the content of the web pages behind them to extract information about the account owner and other meta information.
Specify the URL with the ``--parse`` option; it can be a link to an account or to an online document. See the `list of supported sites <https://github.com/soxoj/socid-extractor#sites>`_.
After the parsing phase ends, Maigret starts the search phase using the :doc:`supported identifiers <supported-identifier-types>` it found (usernames, ids, etc.).
Scanning webpage by URL https://docs.google.com/spreadsheets/d/1HtZKMLRXNsZ0HjtBmo0Gi03nUPiJIA4CC4jTYbCAnXw/edit#gid=0...
┣╸org_name: Gooten
┗╸mime_type: application/vnd.google-apps.ritz
Scanning webpage by URL https://clients6.google.com/drive/v2beta/files/1HtZKMLRXNsZ0HjtBmo0Gi03nUPiJIA4CC4jTYbCAnXw?fields=alternateLink%2CcopyRequiresWriterPermission%2CcreatedDate%2Cdescription%2CdriveId%2CfileSize%2CiconLink%2Cid%2Clabels(starred%2C%20trashed)%2ClastViewedByMeDate%2CmodifiedDate%2Cshared%2CteamDriveId%2CuserPermission(id%2Cname%2CemailAddress%2Cdomain%2Crole%2CadditionalRoles%2CphotoLink%2Ctype%2CwithLink)%2Cpermissions(id%2Cname%2CemailAddress%2Cdomain%2Crole%2CadditionalRoles%2CphotoLink%2Ctype%2CwithLink)%2Cparents(id)%2Ccapabilities(canMoveItemWithinDrive%2CcanMoveItemOutOfDrive%2CcanMoveItemOutOfTeamDrive%2CcanAddChildren%2CcanEdit%2CcanDownload%2CcanComment%2CcanMoveChildrenWithinDrive%2CcanRename%2CcanRemoveChildren%2CcanMoveItemIntoTeamDrive)%2Ckind&supportsTeamDrives=true&enforceSingleParent=true&key=AIzaSyC1eQ1xj69IdTMeii5r7brs3R90eck-m7k...
Maigret ships with a bundled site database. After installation from PyPI (or any other method), it can **automatically fetch a newer compatible database from GitHub** when you run it—see :ref:`database-auto-update` in :doc:`settings`.
.. note::

  Python 3.10 or higher and pip are required; **Python 3.11 is recommended.**
Maigret's CLI is a thin wrapper around an async Python API. You can embed Maigret in your own tools, pipelines, and OSINT workflows — no need to shell out.
This page covers the common patterns. For the full argument list of the underlying function, see ``maigret.checking.maigret`` in the source.
Installation
------------
.. code-block:: bash

  pip install maigret
Minimal example
---------------
A working end-to-end search against the top 500 sites:
.. code-block:: python

  import asyncio
  import logging

  from maigret import search as maigret_search
  from maigret.sites import MaigretDatabase

  # Load the bundled site database
  db = MaigretDatabase().load_from_path(
      "maigret/resources/data.json"
  )

  # Pick which sites to scan (same filtering the CLI uses)
  sites = db.ranked_sites_dict(top=500)

  results = asyncio.run(
      maigret_search(
          username="soxoj",
          site_dict=sites,
          logger=logging.getLogger("maigret"),
          timeout=30,
          is_parsing_enabled=True,
      )
  )

  for site_name, result in results.items():
      if result["status"].is_found():
          print(site_name, result["url_user"])
Key points:
- ``maigret_search`` is an ``async`` function — wrap it with ``asyncio.run(...)`` or ``await`` it from inside your own event loop.
- ``is_parsing_enabled=True`` turns on ``socid_extractor`` so ``result["ids_data"]`` is populated with profile fields (bio, linked accounts, uids, etc.).
- Each entry in the returned dict has a ``"status"`` object with ``is_found()``, plus ``url_user``, ``http_status``, ``rank``, ``ids_data``, and more.
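The result handling described in the key points can be tried without running a search. The sketch below uses stand-in data: ``FakeStatus``, ``summarize`` and the sample site names are hypothetical and not part of Maigret; only the keys ``status``, ``url_user`` and ``ids_data`` come from the description above.

```python
from typing import Any, Dict


class FakeStatus:
    """Hypothetical stand-in for the ``status`` object of a result entry."""

    def __init__(self, found: bool) -> None:
        self._found = found

    def is_found(self) -> bool:
        return self._found


def summarize(results: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Keep only found profiles, with their URL and any extracted fields."""
    summary: Dict[str, Dict[str, Any]] = {}
    for site_name, entry in results.items():
        if not entry["status"].is_found():
            continue
        summary[site_name] = {
            "url": entry["url_user"],
            # ids_data may be absent or empty, e.g. when parsing is disabled
            "ids": entry.get("ids_data") or {},
        }
    return summary


# Sample data in the same shape; names and values are illustrative only
sample = {
    "SiteA": {
        "status": FakeStatus(True),
        "url_user": "https://site-a.example/u/alice",
        "ids_data": {"fullname": "Alice Example"},
    },
    "SiteB": {
        "status": FakeStatus(False),
        "url_user": "https://site-b.example/alice",
    },
    "SiteC": {
        "status": FakeStatus(True),
        "url_user": "https://site-c.example/alice",
    },
}

found = summarize(sample)
for name, info in found.items():
    print(name, info["url"], info["ids"])
```

In real use, the dict returned by ``maigret_search`` would take the place of ``sample``.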
Filtering sites
---------------
``ranked_sites_dict`` accepts the same filters as the CLI:

.. code-block:: python

   # Include disabled sites (useful for maintenance / self-check)
   sites = db.ranked_sites_dict(disabled=True)
Running inside an existing event loop
-------------------------------------
If your application already runs an asyncio loop (FastAPI, aiohttp server, a Discord bot, etc.), ``await`` ``maigret_search`` directly instead of calling ``asyncio.run``:
.. code-block:: python

   # ``sites`` and ``logger`` are prepared as in the minimal example
   async def check_username(username: str) -> dict:
       results = await maigret_search(
           username=username,
           site_dict=sites,
           logger=logger,
           timeout=30,
       )
       return {
           name: r["url_user"]
           for name, r in results.items()
           if r["status"].is_found()
       }
Routing through a proxy
-----------------------
The same proxy / Tor / I2P flags the CLI exposes are plain keyword arguments:
.. code-block:: python

   results = await maigret_search(
       username="soxoj",
       site_dict=sites,
       logger=logger,
       proxy="socks5://127.0.0.1:1080",
       tor_proxy="socks5://127.0.0.1:9050",  # used for .onion sites
       i2p_proxy="http://127.0.0.1:4444",  # used for .i2p sites
       timeout=30,
   )
Full function signature
-----------------------
.. code-block:: python

   async def maigret(
       username: str,
       site_dict: Dict[str, MaigretSite],
       logger,
       query_notify=None,
       proxy=None,
       tor_proxy=None,
       i2p_proxy=None,
       timeout=30,
       is_parsing_enabled=False,
       id_type="username",
       debug=False,
       forced=False,
       max_connections=100,
       no_progressbar=False,
       cookies=None,
       retries=0,
       check_domains=False,
   ) -> QueryResultWrapper
See :doc:`command-line-options` for a description of each option — the semantics match the CLI flags one-to-one.
On startup, Maigret tries to load its configuration from the following sources, in this order:
.. code-block:: console

   # relative path, based on installed package path
   resources/settings.json

   # absolute path, configuration file in home directory
   ~/.maigret/settings.json

   # relative path, based on current working directory
   settings.json
It is not an error if any of these files is missing.
If a later settings file defines an option that is already set,
the later value overrides the earlier one. This makes it possible to keep
custom configurations for different users and directories.
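The override order can be illustrated with a minimal sketch. It assumes that settings are flat JSON objects and that a later file simply overwrites top-level keys; ``load_layered_settings`` is a hypothetical helper, not Maigret's actual loader, and the file names and the ``timeout`` key are illustrative.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterable


def load_layered_settings(paths: Iterable[Path]) -> Dict[str, Any]:
    """Merge JSON settings files in order; later files override earlier ones."""
    merged: Dict[str, Any] = {}
    for path in paths:
        if not path.is_file():
            # A missing file is not an error, it is simply skipped
            continue
        with path.open(encoding="utf-8") as fh:
            merged.update(json.load(fh))
    return merged


with tempfile.TemporaryDirectory() as tmp:
    package_file = Path(tmp) / "package_settings.json"  # stands in for resources/settings.json
    home_file = Path(tmp) / "home_settings.json"        # stands in for ~/.maigret/settings.json
    cwd_file = Path(tmp) / "cwd_settings.json"          # intentionally never created

    package_file.write_text(json.dumps({"timeout": 30, "no_autoupdate": False}))
    home_file.write_text(json.dumps({"no_autoupdate": True}))

    settings = load_layered_settings([package_file, home_file, cwd_file])

print(settings)
```

Here the value from the second file wins for ``no_autoupdate``, while ``timeout`` is inherited from the first file and the missing third file is ignored.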
.. _database-auto-update:
Database auto-update
--------------------
Maigret ships with a bundled site database, but it gets outdated between releases. To keep the database current, Maigret automatically checks for updates on startup.
**How it works:**
1. On startup, Maigret checks if more than 24 hours have passed since the last update check.
2. If so, it fetches a lightweight metadata file (~200 bytes) from GitHub to see if a newer database is available.
3. If a newer, compatible database exists, Maigret downloads it to ``~/.maigret/data.json`` and uses it instead of the bundled copy.
4. If the download fails or the new database is incompatible with your Maigret version, the bundled database is used as a fallback.
The downloaded database has **higher priority** than the bundled one — it replaces, not overlays.
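The decision flow above can be sketched roughly as follows. This is an illustration under stated assumptions, not Maigret's implementation: the function names are hypothetical, versions are assumed to be plain dotted integers, and compatibility is assumed to mean "installed version is at least the version required by the database".

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def should_check(
    last_check: Optional[datetime],
    now: datetime,
    interval_hours: int = 24,
    force: bool = False,
) -> bool:
    """Step 1: check only if forced, never checked, or the interval has elapsed."""
    if force or last_check is None:
        return True
    return now - last_check > timedelta(hours=interval_hours)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '0.5.0' into (0, 5, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def pick_database(downloaded_ok: bool, required: str, installed: str) -> str:
    """Steps 3-4: use the download only if it succeeded and is compatible."""
    compatible = parse_version(installed) >= parse_version(required)
    return "downloaded" if downloaded_ok and compatible else "bundled"


now = datetime(2024, 1, 2, 12, 0)
print(should_check(datetime(2024, 1, 1, 11, 0), now))  # 25 hours since last check
print(should_check(datetime(2024, 1, 2, 0, 0), now))   # 12 hours since last check
print(pick_database(True, required="0.6.0", installed="0.5.0"))
```

Because the downloaded database replaces the bundled one rather than overlaying it, ``pick_database`` returns a single choice instead of a merge.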
**Status messages** are printed only when an action occurs:
.. code-block:: text

   [*] DB auto-update: checking for updates...
   [+] DB auto-update: database updated successfully (3180 sites)
   [*] DB auto-update: database is up to date (3157 sites)
   [!] DB auto-update: latest database requires maigret >= 0.6.0, you have 0.5.0
**Forcing an update:**
Use the ``--force-update`` flag to check for updates immediately, ignoring the check interval:
.. code-block:: console

   maigret username --force-update
The update happens at startup, then the search continues normally with the freshly downloaded database.
**Disabling auto-update:**
Use the ``--no-autoupdate`` flag to skip the update check entirely:
.. code-block:: console

   maigret username --no-autoupdate
Or set it permanently in ``~/.maigret/settings.json``:
.. code-block:: json

   {
       "no_autoupdate": true
   }
This is recommended for **Docker containers**, **CI pipelines**, and **air-gapped environments**.
**Configuration options** (in ``settings.json``):
.. list-table::
   :header-rows: 1
   :widths: 35 15 50

   * - Setting
     - Default
     - Description
   * - ``no_autoupdate``
     - ``false``
     - Disable auto-update entirely
   * - ``autoupdate_check_interval_hours``
     - ``24``
     - How often to check for updates (in hours)
   * - ``db_update_meta_url``
     - GitHub raw URL
     - URL of the metadata file (for custom mirrors)
**Using a custom database** with ``--db`` always skips auto-update — you are explicitly choosing your data source.
Maigret can search not only by ordinary usernames, but also by certain common identifiers. The currently supported identifiers are:

- **gaia_id** - Google internal numeric user identifier; it used to appear in Google Plus account URLs.
- **steam_id** - Steam internal numeric user identifier.
- **wikimapia_uid** - Wikimapia.org internal numeric user identifier.
- **uidme_uguid** - uID.me internal numeric user identifier.
- **yandex_public_id** - Yandex sites internal alphabetic user identifier. See also: `YaSeeker <https://github.com/HowToFind-bot/YaSeeker>`_.
- **vk_id** - VK.com internal numeric user identifier.
Tags let you restrict a search to a subset of the sites in the large Maigret database.
.. warning::

   Tag markup is not yet stable.
There are several types of tags:
1. **Country codes**: ``us``, ``jp``, ``br``... (`ISO 3166-1 alpha-2 <https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2>`_). A country tag means that having an account on the site implies a connection to that country — either origin or residence. The goal is attribution, not perfect accuracy.

   - **Global sites** (GitHub, YouTube, Reddit, Medium, etc.) get **no country tag** — an account there says nothing about where a person is from.
   - **Regional/local sites** where an account implies a specific country **must** have a country tag: ``VK`` → ``ru``, ``Naver`` → ``kr``, ``Zhihu`` → ``cn``.
   - Multiple country tags are allowed when a service is used predominantly in a few countries (e.g. ``Xing`` → ``de``, ``eu``).
   - Do **not** assign country tags based on traffic statistics alone — a site popular in India by traffic is not "Indian" if it is used globally.

2. **Site engines**. Most of them are currently forum engines: ``uCoz``, ``vBulletin``, ``XenForo`` et al. The full list of engines is stored in the Maigret database.

3. **Sites' subject/type and interests of its users**. For now, the full list of "standard" tags is available only `in the source code <https://github.com/soxoj/maigret/blob/main/maigret/sites.py#L13>`_.
Usage
-----
``--tags us,jp`` -- search on US and Japanese sites (actually marked as such in the Maigret database)
``--tags coding`` -- search on sites related to software development.
``--tags ucoz`` -- search on uCoz sites only (mostly CIS countries)
Blacklisting (excluding) tags
------------------------------
You can exclude sites with certain tags from the search using ``--exclude-tags``:
``--exclude-tags porn,dating`` -- skip all sites tagged with ``porn`` or ``dating``.
``--exclude-tags ru`` -- skip all Russian sites.
You can combine ``--tags`` and ``--exclude-tags`` to fine-tune your search:
``--tags forum --exclude-tags ru`` -- search on forum sites, but skip Russian ones.
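The combined semantics can be sketched in a few lines. This is an illustration only: it assumes that a site matches ``--tags`` when it carries at least one of the listed tags and that an excluded tag always wins; ``filter_sites`` and the sample site names are hypothetical and do not reflect Maigret's internal code.

```python
from typing import Dict, Iterable, List, Set


def filter_sites(
    sites: Dict[str, Set[str]],
    include: Iterable[str] = (),
    exclude: Iterable[str] = (),
) -> List[str]:
    """Return site names matching any included tag and no excluded tag."""
    include_set = {tag.lower() for tag in include}
    exclude_set = {tag.lower() for tag in exclude}
    selected = []
    for name, tags in sites.items():
        site_tags = {tag.lower() for tag in tags}
        # With no include tags given, every site is a candidate
        if include_set and not (site_tags & include_set):
            continue
        # An excluded tag removes the site even if it was included
        if site_tags & exclude_set:
            continue
        selected.append(name)
    return selected


# Illustrative site-to-tags mapping, not taken from the Maigret database
sites = {
    "ForumRU": {"forum", "ru"},
    "ForumUS": {"forum", "us"},
    "CodeHub": {"coding"},
}

print(filter_sites(sites, include=["forum"], exclude=["ru"]))
```

The call mirrors ``--tags forum --exclude-tags ru``: both forums match the include tag, and the Russian one is then dropped.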
In the web interface, the tag cloud supports three states per tag:
click once to **include** (green), click again to **exclude** (dark/strikethrough),
and click once more to return to **neutral** (red).
Blockchain transactions are public, but the people behind wallets are not. Maigret helps bridge this gap by finding Web3 accounts tied to a username, revealing the person behind a pseudonymous crypto persona.
Why it matters
--------------
Crypto investigations often start with a wallet address or an ENS name but hit a wall — the blockchain tells you *what* happened, not *who* did it. A username, however, is reused across platforms. If someone trades on OpenSea as ``zachxbt`` and posts on Warpcast as ``zachxbt``, Maigret connects the dots and builds a full profile.
Common scenarios:
- **Scam attribution.** A rug-pull promoter uses the same alias on Fragment (Telegram username marketplace), OpenSea, and a personal blog.
- **Sanctions compliance.** Verifying whether a counterparty's online footprint matches known sanctioned individuals.
- **Due diligence.** Before an OTC deal or DAO vote, checking whether the other party has a consistent online presence or is a freshly created sockpuppet.
- **Stolen funds tracing.** A stolen NFT appears on OpenSea under a new account — but the username matches a Warpcast profile with real-world links.
Supported sites
---------------
Maigret currently checks the following crypto and Web3 platforms:
- Payment handle: first/last name, country code, base currency, supported payment methods
- Not strictly Web3, but widely used by crypto OTC traders for fiat off-ramps; the public API returns structured KYC-adjacent data
Real-world example: zachxbt
---------------------------
`ZachXBT <https://twitter.com/zachxbt>`_ is a well-known on-chain investigator. Let's see what Maigret can find from just the username ``zachxbt``:
.. code-block:: console

   maigret zachxbt --tags crypto
Maigret finds 5 accounts and automatically extracts structured data from each:
**Fragment** — confirms the Telegram username ``@zachxbt`` is claimed, reveals the TON wallet address (``EQBisZrk...``), purchase price (10 TON), and date (January 2023).
**Paragraph** — the richest result. Returns the real name used on the platform (``ZachXBT``), bio (``Scam survivor turned 2D investigator``), an Ethereum wallet address (``0x23dBf066...``), and a linked Twitter handle (``zachxbt``). The ``wallet_address`` field is especially valuable — it directly links the pseudonym to an on-chain identity.
**Warpcast** — Farcaster profile with a Farcaster ID (``fid: 20931``), profile image, and social graph (33K followers). Every Farcaster ID is tied to an Ethereum address via the on-chain ID registry, so this is another on-chain anchor.
**OpenSea** — NFT marketplace profile with bio (``On-chain sleuth | 10x rug pull survivor``), avatar (hosted on ``seadn.io`` with an Ethereum address in the URL path), and a link to an external investigations page.
**Hive Blog** — blockchain-based blog account created in March 2025. Low activity (1 post), but confirms the username is claimed across blockchain ecosystems.
From a single username, Maigret produces:
- **2 wallet addresses** — one TON (from Fragment), one Ethereum (from Paragraph)
- **Social graph data** — 33K Farcaster followers, blog activity timestamps
This is enough to pivot into blockchain analysis tools (Etherscan, Arkham, Nansen) using the wallet addresses, or into social media analysis using the Twitter handle.
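As a rough sketch of that pivot step, the snippet below walks extracted profile data and collects anything that looks like an Ethereum address (``0x`` followed by 40 hexadecimal characters), including addresses embedded in URLs. The helper name, the ``image`` key and the sample values are hypothetical placeholders; only the ``wallet_address`` field name appears in the example above.

```python
import re
from typing import Any, Dict, List

# An Ethereum address is "0x" followed by exactly 40 hex characters
ETH_ADDRESS = re.compile(r"\b0x[0-9a-fA-F]{40}\b")


def find_eth_addresses(data: Any) -> List[str]:
    """Recursively collect Ethereum-looking addresses from nested data."""
    found: List[str] = []
    if isinstance(data, str):
        found.extend(ETH_ADDRESS.findall(data))
    elif isinstance(data, dict):
        for value in data.values():
            found.extend(find_eth_addresses(value))
    elif isinstance(data, (list, tuple)):
        for item in data:
            found.extend(find_eth_addresses(item))
    # Deduplicate case-insensitively, keeping the first spelling seen
    unique: Dict[str, str] = {}
    for address in found:
        unique.setdefault(address.lower(), address)
    return list(unique.values())


# Dummy placeholder address, not a real wallet
dummy = "0x" + "ab" * 20
profiles = {
    "Paragraph": {"wallet_address": dummy, "twitter": "example"},
    "OpenSea": {"image": "https://cdn.example/files/0x" + "AB" * 20 + "/avatar.png"},
}

addresses = find_eth_addresses(profiles)
print(addresses)
```

The same address found in two profiles is reported once, which is itself a useful signal that the profiles belong together.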
Workflow: from username to wallet
---------------------------------
**Step 1: Search crypto platforms**
.. code-block:: console

   maigret <username> --tags crypto -v
Review the results. Pay attention to:
- **Fragment** — if the username is claimed, you get a TON wallet address directly.
- **Paragraph** — blog profiles often contain an ETH address and a Twitter handle.
- **Warpcast** — Farcaster IDs map to Ethereum addresses via the on-chain registry.
- **OpenSea** — avatar URLs sometimes contain wallet addresses in the path.
**Step 2: Expand with extracted identifiers**
Maigret automatically extracts additional identifiers from found profiles (real names, linked accounts, profile URLs) and recursively searches for them. This is enabled by default. If Maigret finds a linked Twitter handle on a Paragraph profile, it will automatically search for that handle across all sites.
**Step 3: Cross-reference with non-crypto platforms**
The real power is connecting crypto personas to mainstream accounts. Drop the tag filter:
.. code-block:: console

   maigret <username> -a
This checks all 3000+ sites. A match on GitHub, Reddit, or a forum can reveal the person behind the wallet.
Workflow: from wallet to identity
---------------------------------
If you start with a wallet address rather than a username, you can use complementary tools to get a username first:
1. **ENS / Unstoppable Domains** — resolve the wallet address to a human-readable name (``vitalik.eth``). Then search that name in Maigret.
2. **Etherscan labels** — check if the address has a public label (exchange, known entity).
3. **Fragment** — search the TON wallet address to find which Telegram usernames it purchased.
4. **Arkham Intelligence / Nansen** — blockchain attribution platforms that may tag the address with a known identity.
Once you have a username candidate, feed it to Maigret.
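A resolved name such as ``alice.eth`` is rarely used verbatim on every platform, so it can help to derive a few candidates before searching. The helper below is a hypothetical sketch of that idea; the set of variations is an assumption and can be extended as needed.

```python
from typing import List


def username_candidates(name: str) -> List[str]:
    """Derive username candidates from a resolved name such as 'alice.eth'."""
    name = name.strip().lower()
    if "." not in name:
        return [name]
    # Split off the last dotted part, e.g. 'alice.eth' -> ('alice', 'eth')
    label, suffix = name.rsplit(".", 1)
    candidates = [label, f"{label}_{suffix}", f"{label}{suffix}", name]
    # Drop empty strings and duplicates while keeping order
    return list(dict.fromkeys(c for c in candidates if c))


for candidate in username_candidates("alice.eth"):
    print(candidate)
```

Each candidate can then be passed to ``maigret <username> --tags crypto`` in turn.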
Tips
----
- **Username reuse is the #1 signal.** Crypto-native users often reuse their ENS name (``alice.eth``) or a variation (``alice_eth``, ``aliceeth``) across platforms. Try all variations.
- **Fragment is uniquely valuable** because it directly links Telegram usernames to TON wallet addresses — a rare on-chain / off-chain bridge.
- **Warpcast profiles are Ethereum-native.** Every Farcaster account is tied to an Ethereum address via the ID registry contract. If you find a Warpcast profile, you implicitly have a wallet address.
- **Paragraph often has the richest data** — wallet address, Twitter handle, bio, and activity timestamps in a single API response.
- **Use** ``--exclude-tags`` **to skip irrelevant sites** when you're focused on crypto:
You are an OSINT analyst that converts raw username-investigation reports into a short, clean human-readable summary.
Your task:
Read the attached account-discovery report and produce a concise report in exactly this style:
# Investigation Summary
Name: <most likely real full name>
Location: <most likely current location>
Occupation: <short combined description based only on strong signals>
Interests: <3–6 broad interests inferred from platform types, bios, and activity>
Languages: <languages supported by strong evidence only>
Website: <main personal website if clearly present>
Username: <main username> (variant: <variant usernames if any>)
Platforms: <number> profiles, active from <first year> to <last year>
Confidence: <High / Medium / Low> — <one short explanation why>
# Other leads
- <lead 1>
- <lead 2>
- <lead 3 if needed>
Rules:
1. Use only information supported by the report.
2. Resolve identity using consistency of username, full name, bio, links, company, and location.
3. Prefer strong repeated signals over one-off weak signals.
4. If one profile clearly conflicts with the rest, mention it in "Other leads" as a likely false positive instead of mixing it into the main identity.
5. Keep the tone analytical and neutral.
6. Do not mention every platform individually.
7. Do not include raw URLs except for the main website.
8. Do not mention NSFW/adult platforms in the main summary unless they are the only source for a critical lead; if such a profile looks inconsistent, mention it only as a likely false positive.
9. "Occupation" should be a compact merged description, for example: "Chief Product Officer (CPO) at ..., entrepreneur, OSINT community founder".
10. "Interests" should be broad categories, not noisy tags. Convert raw platform/tag evidence into natural categories like OSINT, software development, blogging, gaming, streaming, etc.
11. "Languages" should only include languages clearly supported by bios, texts, country tags, or profile content.
12. For "Platforms", count the profiles reported as found by the report summary, not manually deduplicated.
13. For active years, use the earliest and latest reliable dates from the consistent identity cluster. Ignore obvious outlier dates if they belong to likely false positives or weak profiles.
14. For confidence:
- High = strong consistency across username, name, bio, links, location, and/or company
- Medium = partial consistency with some gaps
- Low = mostly username-only matches
15. If some field is not reliably known, omit speculation and use the best cautious wording possible.
16. For "Name", output only the most likely real personal name in clean canonical form.
- Remove nicknames, handles, aliases, or bracketed parts such as "(Soxoj)".
summary: 🕵️♂️ Collect a dossier on a person by username from thousands of sites.
description: |
  **Maigret** collects a dossier on a person **by username only**, checking for accounts on a huge number of sites and gathering all the available information from web pages. No API keys required.
  More than 3000 sites are currently supported; by default, the search runs against the 500 most popular sites in descending order of popularity. Tor sites, I2P sites, and domains (via DNS resolution) can also be checked.