In process_site_result(), when a check_error is present, the context
field was set to str(CheckError) (the class itself) instead of
str(check_error) (the error instance). This caused the context to
contain the string representation of the class rather than the actual
error message.
Before fix: context = "<class 'maigret.errors.CheckError'>"
After fix: context = "Request timeout error: slow server"
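The mistake is easy to reproduce in isolation; a minimal sketch with a stand-in exception class (not the actual `maigret.errors.CheckError`):

```python
class CheckError(Exception):
    """Stand-in for maigret.errors.CheckError, illustrative only."""

check_error = CheckError("Request timeout error: slow server")

# Bug: stringifying the class yields its repr, not the message
assert str(CheckError).startswith("<class '")

# Fix: stringifying the instance yields the actual error message
assert str(check_error) == "Request timeout error: slow server"
```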
* Improve startup error message for missing dependencies
* Enhance error message for missing dependencies
Updated import error message to include installation instructions for PyPI and cloned repository.
* Enhance missing dependency error message
Updated error message for missing dependency to include installation instructions for both PyPI and local repository.
* Fix ID extraction crash when regex groups are optional
Handle None capture groups in username/id extraction and add regression coverage for optional trailing groups.
* Remove leftover line that overwrote safe _id in extract_id_from_url
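The failure mode with optional trailing groups can be sketched as follows (the regex and helper are illustrative, not the actual `extract_id_from_url`):

```python
import re

# An optional trailing group matches the URL but may capture None
pattern = re.compile(r"/user/(\w+)(?:/(\d+))?")

def extract_id(url):
    m = pattern.search(url)
    if not m:
        return None
    username, _id = m.group(1), m.group(2)
    # Guard against None before using the optional group:
    # e.g. calling a string method on None would raise AttributeError
    return _id if _id is not None else username

assert extract_id("https://x.com/user/alice/42") == "42"
assert extract_id("https://x.com/user/alice") == "alice"  # optional group absent
```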
the <0.3/<0.4/etc upper bounds don't leave room for darwin or
emulated/aarch64 runners, which have been seeing 0.7s+ on tests
that expected <0.3s.
bumped each upper bound by +0.7s. lower bounds unchanged — they
still validate that tasks ran in parallel rather than serially.
refs #679
Co-authored-by: Julio César Suástegui <juliosuas@users.noreply.github.com>
self.allow_redirects and self.timeout were each initialized twice in
SimpleAiohttpChecker.__init__, which is redundant code.
Co-authored-by: zocomputer <help@zocomputer.com>
* Bump lxml minimum to 6.0.2 for Python 3.14 compatibility
lxml 5.x fails to build on Python 3.14 due to incompatible pointer
types in Cython-generated C code. lxml 6.0.2 compiles correctly.
Fixes #2266
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Update poetry.lock to match pyproject.toml changes
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Soxoj <31013580+soxoj@users.noreply.github.com>
The previous absence string 'The requested user does not exist or is inactive'
no longer matches the live site response. InterPals now returns 'User not found'
for non-existent profiles, causing false positives for all username searches.
Tested against interpals.net/noneownsthisusername (non-existent) and
interpals.net/blue (claimed) to confirm detection accuracy.
Closes #2433
Co-authored-by: Julio César Suástegui <juliosuas@users.noreply.github.com>
* Overhaul site tags and naming: add social tag to 33 networks, fill missing tags for 213 top-1000 sites, clean up false us/in country tags (~374 sites), normalize site names to Title Case, add tag validation tests, document tagging and naming rules
Remove LLM folder: ask @soxoj for the up-to-date version!
* Remove LLM/ from version control
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Added social tag to social networks (33 sites)
- Fixed wrong tags (8 sites)
- Filled empty tags for 213 sites in top-1000
- Country tag cleanup (~374 sites)
- Site naming normalization (75 sites)
- New tests (3)
- Documentation updates
* feat(core): add POST request support, new sites, migrate to Majestic Million ranking
- Added native POST request support to the Maigret engine (requestMethod, requestPayload) to enable querying modern JSON registration endpoints.
- Replaced the discontinued Alexa rank API with the Majestic Million dataset for global popularity sorting and automated CI updates.
- Fixed multiple false positives among top 500 sites and bypassed standard anti-bot protections using custom User-Agents.
- Updated public documentation and internal playbooks to reflect the new features.
* feat(data): apply all data.json site check updates from main branch
- Added CTFtime and PentesterLab (new sites added in main)
- Removed forums.imore.com (deleted in main as dead site)
- Disabled 5 sites per main branch fixes: Librusec, MirTesen, amateurvoyeurforum.com, forums.stevehoffman.tv, vegalab
- Fixed 5 site checks per main branch: SoundCloud, Taplink, Setlist, RoyalCams, club.cnews.ru (switched from status_code to message checkType with proper markers)
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/a1d194d9-c0ff-4e2b-974c-c5e4b59548bf
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Add two cybersecurity platforms for username enumeration:
- CTFtime (ctftime.org) - CTF competition platform
- PentesterLab (pentesterlab.com) - Security training platform
Both verified working with status_code check type.
Returns 200 for existing users, 404 for non-existent.
Co-authored-by: Julio César Suástegui <juliosuas@users.noreply.github.com>
* Initial plan
* Disable RoyalCams site check to fix false-positive probe
The Telegram Maigret bot auto-probe reported CLAIMED for three random
usernames. The status_code checkType is unreliable as the site returns
200 for non-existent user profiles (soft 404). Disabling the site check
until a reliable detection method can be established.
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/05b3d513-fe15-477d-a455-0c9ddf0b8b51
* Fix RoyalCams: switch to message checkType using BongaCams white-label pattern
RoyalCams runs on the BongaCams platform. Applied the same fix pattern:
- Switch from status_code to message checkType
- Use Portuguese locale (pt.royalcams.com) as urlProbe
- absenceStrs matches generic title on non-existent profiles
- presenseStrs matches Portuguese profile title for existing users
- Add browser-like headers matching BongaCams config
- Remove disabled flag
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/2f6a9523-278a-4992-ba7c-c320de14bfa4
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
- Fix VK and TradingView checkType; add Reddit and Microsoft Learn API-style probes where appropriate; adjust or disable entries that are unreliable under anti-bot protection.
- Self-check: stop aggressive auto-disable; default to reporting issues only; add --auto-disable and --diagnose for optional fixes and deeper output.
- Tooling: add utils/site_check.py and utils/check_top_n.py (and related helpers) to inspect and rank site behavior against the top-N list
- Scope: aligns with fixing top-traffic / high-impact sites and making diagnostics repeatable without silently flipping disabled flags
The path `'~/.maigret/settings.json'` uses a tilde (`~`) which is not automatically expanded by Python's `open()` function. This will cause the settings file in the user's home directory to be silently ignored (caught by `FileNotFoundError`) because Python will look for a literal directory named `~` in the current working directory.
Affected files: settings.py
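A quick illustration of the pitfall and the standard fix (`os.path.expanduser`); the path is taken from the description above:

```python
import os

raw_path = "~/.maigret/settings.json"

# open(raw_path) would look for a literal "~" directory under the
# current working directory, raising FileNotFoundError for most users
expanded = os.path.expanduser(raw_path)

assert raw_path.startswith("~")      # unexpanded: literal tilde
assert not expanded.startswith("~")  # expanded: resolved against the home dir
assert expanded.endswith("settings.json")
```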
The `Settings.load()` method iterates through multiple configuration file paths and updates the internal `__dict__`, intending to override earlier default settings with later user-specific ones. This cascading logic is a core configuration feature but lacks explicit tests to guarantee that dictionary merging and overriding behave exactly as documented (e.g., ensuring a setting in `~/.maigret/settings.json` correctly overrides `resources/settings.json` without wiping out other keys).
Affected files: test_settings.py
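The override behaviour described above can be pinned down with a tiny dict-merge sketch (hypothetical keys, not the real `Settings` class):

```python
defaults = {"timeout": 30, "top_sites": 500, "proxy": None}
user = {"timeout": 10}  # e.g. from ~/.maigret/settings.json

merged = dict(defaults)
merged.update(user)  # later files override earlier ones key-by-key

assert merged["timeout"] == 10     # user value wins
assert merged["top_sites"] == 500  # untouched defaults survive
```

A test along these lines would guarantee that a later file overrides individual keys without wiping out the rest.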
* refactor: hardcoded relative path for database file
`app.config['MAIGRET_DB_FILE']` is set to a hardcoded relative path `os.path.join('maigret', 'resources', 'data.json')`. If the Flask application is executed from a different working directory (other than the repository root), it will fail to find the database file and crash.
Affected files: app.py, settings.py
* make graph more meaningful
If a search with multiple usernames is launched, an additional site node is created for every site where both are found.
Advantages:
- better recognition that the users have a connection with each other
- better detection of false positives when launching a search with two fake usernames (a shared site node is a definite false positive)
* fix Graph linking report.py
* update web interface with commandline options
* improve web interface
* update README images of web interface
* fix bug in app.py
* fix web interface
Currently Maigret parses URLs as usernames related to Gravatar. This leads to bad output filenames on my Linux host, as the slashes cause it to try to write subfolders, making the script abort with the error "file does not exist".
Applied a simple fix that replaces all "/" with "_" in output file generation.
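The fix boils down to a one-line substitution before building the report path (a sketch, not the exact Maigret code):

```python
def sanitize_filename(username: str) -> str:
    # A URL-shaped "username" like "gravatar.com/avatar/x" contains slashes,
    # which open() would treat as directory separators
    return username.replace("/", "_")

assert sanitize_filename("gravatar.com/avatar/abc") == "gravatar.com_avatar_abc"
assert sanitize_filename("plain_user") == "plain_user"
```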
* Updated example colab file (Due to latest update)
* Fix RobertsSpaceIndustries URI
* Fix PyInstaller workflow
* Fix example.ipynb (read desc.)
Currently the version installed via pip3 doesn't appear to contain the latest data.json file, resulting in many false positives.
* Fix non-existent users (read desc.)
Fixed non-existent usernames for the following:
Telegram (t.me)
TikBuddy (tikbuddy.com)
FurAffinity (furaffinity.net)
Alik.cz is seeing unusually high traffic on usernames julian and
noonewouldeverusethis due to its presence in both Sherlock and Maigret.
This target is permanently removed and should not be replaced.
* Adding permutator feature for usernames
("", "_", "-", ".") when id_type == username
File : maigret/permutator.py
Arg : --permute
For now, it only permutes pairs of elements and doesn't return single elements (element1, _element1, element1_, element2, _element2, ...): 12 permutations for 2 elements.
To return single elements as well, use Permute(usernames).gather(method="all"), but this is not wired into maigret.py yet: 18 permutations for 2 elements. Should we? With another argument?
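A generic sketch of separator-joined permutations; this is not the actual `maigret/permutator.py` logic, and the counts differ from the 12/18 figures quoted above:

```python
from itertools import permutations

SEPARATORS = ("", "_", "-", ".")

def permute(usernames, include_singles=False):
    # Join every ordered pair with every separator; optionally add
    # single-element variants with leading/trailing underscores.
    results = [a + sep + b
               for a, b in permutations(usernames, 2)
               for sep in SEPARATORS]
    if include_singles:
        results += [v for name in usernames
                    for v in (name, "_" + name, name + "_")]
    return results

combos = permute(["alice", "bob"])
assert "alice_bob" in combos and "bob.alice" in combos
assert len(combos) == 8  # 2 ordered pairs x 4 separators
```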
* Update test_cli.py
permute arg added
Added a link to code of conduct inside of CONTRIBUTING.md. Added naming conventions, indentation and import conventions. Added link to PEP 8 which I think most closely resembles the coding style used.
* Fixing checks for broken sites and repairing the ones that were changed
* little tweaks
* little tweaks
---------
Co-authored-by: Weekrow <somewherelse@yandex.ru>
This code is more readable and easier to understand than the original. It uses more descriptive variable names, breaks the logic into smaller, more manageable functions, and includes comments explaining what each part does.
Here are some specific improvements that I made to the code:
* I renamed the variables `TOP_SITES_COUNT` and `TIMEOUT` to more descriptive names, such as `max_sites_to_search` and `timeout`.
* I broke the code into smaller, more manageable functions, such as `main()` and `search_func()`.
* I added comments to explain what each part of the code is doing.
* I used more consistent indentation.
Fixing two small typos in the error definition file:
- "switch to another..." -> ""Switch to another...
- Capitalizing this sentence
- "...parallel connections (e.g. --n 10)" -> "...parallel connections (e.g. -n 10)"
- Removing the extra `-` for this option
Multiple best practices applied as below:
- Replace deprecated `MAINTAINER` with `LABEL maintainer`
- Remove additional `apt clean` as it'll be done automatically
- Use `apt-get` instead of `apt` in script, apt does not have a stable
CLI interface, and it's for end-user.
- Put `apt-get install` & apt lists clean up in the same command
- Use `--no-install-recommends` with `apt-get install` to avoid installing additional packages
- Use `--no-cache-dir` with `pip install` to prevent temporary cache
- Use `COPY` instead of `ADD` for files and folders
- Use spaces instead of mixing spaces with tabs to indent
Size change by the refactor, almost 100MB saved:
```
REPOSITORY TAG IMAGE ID CREATED SIZE
maigret after 9e70c65dde32 1 minutes ago 543MB
maigret before a683f2b71751 7 minutes ago 635MB
```
* add a lot of new sites from social analyzer, fix presenceStr
* add social-analyzer sites
* fix username claimed
* update site list
* Update data.json
* changed Bayoushooter to use XenForo and foursquare to use correct checkType
* fix: removed disable from Bayoushooter
Co-authored-by: Antonio Marco <antonio.marco@liferaftinc.com>
# This workflow will upload a Python Package using Twine when a release is created
# For more information see: https://help.github.com/en/actions/language-and-framework-guides/using-python-with-github-actions#publishing-to-package-registries
name: Upload Python Package
name: Upload Python Package to PyPI when a Release is Published
Hey! I'm really glad you're reading this. Maigret contains a lot of sites, and it is very hard to keep all the sites operational. That's why any fix is important.
## Code of Conduct
Please read and follow the [Code of Conduct](CODE_OF_CONDUCT.md) to foster a welcoming and inclusive community.
## Local setup
Install Maigret with development dependencies via [Poetry](https://python-poetry.org/):
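The install command appears to have been elided here; with a standard Poetry workflow it would be (check `pyproject.toml` for the project's actual dependency groups):

```shell
poetry install
```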
Activate the repo's git hooks **once after cloning**:
```bash
git config --local core.hooksPath .githooks/
```
The pre-commit hook does two things every time you commit changes that touch the site database:
- regenerates the database signature `maigret/resources/db_meta.json` (used to detect compatible auto-updates), and
- regenerates `sites.md` (the human-readable list of supported sites with per-engine statistics).
It also auto-stages the regenerated files so they land in the same commit as your edits. **Always run `git commit` from inside the repo so the hook can fire** — without it, your PR will land with a stale signature and a stale `sites.md`, and database auto-update will misbehave for users on your branch.
## How to contribute
There are two main ways to help.
### 1. Add a new site
**Beginner.** Use the `--submit` mode — Maigret takes a single existing-account URL, auto-detects the site engine, picks `presenseStrs` / `absenceStrs`, and offers to add the entry:
```bash
maigret --submit https://example.com/users/alice
```
`--submit` works well when the site has clean status codes and no anti-bot protection. It will *not* discover a public JSON API (`urlProbe`), classify protection (`tls_fingerprint`, `cf_js_challenge`, `ip_reputation`, ...), or recognise SPA / soft-404 pages. For those, fall back to manual editing.
**Advanced.** Edit `maigret/resources/data.json` by hand — see *Editing `data.json` safely* below. There is also an `add-a-site` issue template if you want a maintainer to do it for you.
### 2. Fix existing sites
The most useful work in this project is keeping checks accurate over time. Sites change layout, switch engines, add Cloudflare, redirect to login walls — every fix is welcome.
**Where to start.** Good candidates:
- Issues with the `false-positive` label, especially those opened automatically by the Telegram bot.
- Sites currently `disabled: true` in `data.json` — many were disabled on a transient symptom and have since healed.
- Sites for which `--self-check --diagnose` reports a problem.
- A focused audit of one engine (vBulletin, XenForo, phpBB, Discourse, Flarum, ...). Engine-wide breakage usually has a single root cause and several sites can be fixed in one PR.
**Diagnose with built-in tools.**
> By default, Maigret skips entries with `disabled: true` in every mode (`--self-check`, `--site`, plain search). Whenever your target is a disabled site — diagnosing it, validating a fix, running the two-filter check below — pass **`--use-disabled-sites`** explicitly. Without the flag, the site is silently dropped from the run and you get an empty result that looks like "everything's fine".
- Per-site diagnosis with recommendations:
```bash
maigret --self-check --site "SiteName" --diagnose
# add --use-disabled-sites if the entry is currently disabled
```
Without `--auto-disable`, this only reports — it never edits the database. Add `--auto-disable` only when you really want to write the result back.
- Single-site comparison of claimed vs unclaimed responses (status, markers, headers):
Each site entry uses one of three `checkType` modes to decide whether a profile exists. Picking the right one for your site is the most important data-modeling decision in `data.json`:
- **`message`** (most common, most flexible) — Maigret fetches the page and inspects the HTML body. The profile is reported as found when the body contains at least one substring from `presenseStrs` **and** none of the substrings from `absenceStrs`. Pick narrow, profile-specific markers: a `<title>` fragment unique to profile pages, a CSS class only rendered on profiles (e.g. `"profile-card"`), or a JSON field name from an embedded data blob (`"displayName":`). Avoid generic words (`name`, `email`) and HTML/ARIA boilerplate (`polite`, `alert`, `navigation`, `status`) — they match on every page including error and anti-bot challenge pages, and produce false positives. If the marker contains non-ASCII text, double-check the page is UTF-8 (some legacy sites serve KOI8-R or Windows-1251, in which case byte-level matching silently fails — prefer ASCII markers or a JSON API).
- **`status_code`** — Maigret only looks at the HTTP status code; 2xx means "found", anything else means "not found". Use this only when the site reliably returns proper status codes — typically clean JSON APIs that return HTTP 200 for real users and HTTP 404 for missing ones. Don't use it for sites that return HTTP 200 with a soft "user not found" page (this is the single most common cause of false-positive checks).
- **`response_url`** — Maigret follows the redirect chain and inspects the final URL. Useful when the server reliably redirects missing-user URLs to a different path (e.g. `/login`, `/404`, the homepage) while existing-user URLs stay put. For most sites `message` is a better fit; reach for `response_url` only when a redirect-based signal is genuinely the most stable one.
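The three modes above can be summarised in a few lines of decision logic (a simplified sketch, not Maigret's actual checker; the redirect markers are examples):

```python
def profile_found(check_type, status, body="", final_url="",
                  presense_strs=(), absence_strs=()):
    """Simplified decision logic for the three checkType modes."""
    if check_type == "status_code":
        return 200 <= status < 300
    if check_type == "message":
        # At least one presence marker AND no absence markers
        return (any(s in body for s in presense_strs)
                and not any(s in body for s in absence_strs))
    if check_type == "response_url":
        # Found if we were NOT redirected to a known "missing user" path
        return not any(m in final_url for m in ("/login", "/404"))
    raise ValueError(f"unknown checkType: {check_type}")

assert profile_found("status_code", 200)
assert not profile_found("status_code", 404)
assert profile_found("message", 200, body='<div class="profile-card">',
                     presense_strs=("profile-card",),
                     absence_strs=("User not found",))
assert not profile_found("response_url", 200, final_url="https://x.com/login")
```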
**`urlProbe` (optional, works with any `checkType`).** If the most reliable signal lives at a different URL than the public profile page — a JSON API, a GraphQL endpoint, a mobile-app route — set `urlProbe` to that URL. Maigret fetches `urlProbe` for the check, but reports continue to show the human-readable `url` so users see a profile link they can click. Examples: GitHub uses `https://github.com/{username}` as `url` and `https://api.github.com/users/{username}` as `urlProbe`; Picsart uses the web profile as `url` and `https://api.picsart.com/users/show/{username}.json` as `urlProbe`. A clean public API is almost always more stable than parsing HTML — it's worth probing for one before settling on `message` against the SPA shell.
**Errors vs absence.** Anything that means "the server can't answer right now" — rate limits, captchas, "Checking your browser", "unusual traffic", maintenance pages — belongs in `errors` (mapping the substring to a human-readable error string), not in `absenceStrs`. The `errors` mechanism produces an UNKNOWN result instead of a false CLAIMED or false AVAILABLE.
Full reference for `checkType`, `urlProbe`, `engine`, and the rest of the `data.json` schema is in the [development guide](docs/source/development.rst), section *How to fix false-positives*.
### Editing `data.json` safely
`data.json` is a single ~36 000-line JSON file. **Make surgical, line-level edits only.** Never rewrite it by reading it into a Python dict and dumping it back — `json.load` + `json.dump` reformats every entry and produces an unreviewable 70 000-line diff. The same rule applies to any helper script that touches the file: it must preserve the original formatting of untouched entries.
If your editor reformats JSON on save, disable that for `data.json` before editing.
### Two-filter validation when re-enabling a site
Removing `disabled: true` requires **two** independent checks. `--self-check` alone is not sufficient — it only verifies the two specific usernames recorded in the entry, so a site that returns CLAIMED for *any* arbitrary username will still pass the self-check.
```bash
# Filter 1: self-check on the recorded claimed/unclaimed pair
maigret --self-check --site "SiteName" --use-disabled-sites

# Filter 2: search a deliberately fake username that must come back not found
maigret noonewouldeverusethisname --site "SiteName" --use-disabled-sites
```
Both filters need `--use-disabled-sites`, since a candidate for re-enable still has `disabled: true` in the working tree until your edit lands. If you forget the flag, both commands silently no-op.
If the second command reports `[+]` for the fake username, the check is a false positive — do not enable. This step takes seconds and is non-negotiable for any re-enable PR.
## Site naming, tags, and protection
- **Site naming conventions** (Title Case by default, brand-specific exceptions, no `www.` prefix, etc.) are documented in the [development guide](docs/source/development.rst), section *Site naming conventions*.
- **Country tags** (`us`, `ru`, `kr`, ...) attribute an account to a country of origin or residence — they're not a traffic-share label. Global services (GitHub, YouTube, Reddit) get **no** country tag; regional services (VK → `ru`, Naver → `kr`) **must** have one. Don't assign a country tag from Alexa/SimilarWeb audience stats.
- **Category tags** must come from the canonical `"tags"` array at the bottom of `data.json`. The `test_tags_validity` test fails if you introduce an unregistered tag. If no existing tag fits well, either pick the closest reasonable match or add the new tag to the canonical list as an explicit, separate change. Don't use platform names (`writefreely`, `pixelfed`) — use category names (`blog`, `photo`).
- **Protection tags** (`tls_fingerprint`, `ip_reputation`, `cf_js_challenge`, `cf_firewall`, `aws_waf_js_challenge`, `ddos_guard_challenge`, `js_challenge`, `custom_bot_protection`) describe the kind of anti-bot protection a site uses. One of them — **`tls_fingerprint`** — is load-bearing: when a site fingerprints the TLS handshake (JA3/JA4) and blocks non-browser clients, tagging it with `tls_fingerprint` makes Maigret automatically swap its HTTP client to [`curl_cffi`](https://github.com/lexiforest/curl_cffi) with Chrome browser emulation, which is usually enough to pass. The site stays `enabled` — no `disabled: true` is needed. Examples: Instagram, NPM, Codepen, Kickstarter, Letterboxd. The remaining tags are documentation-only and pair with `disabled: true` until a per-provider solver is integrated. The full taxonomy and the rules for picking the right tag are in the [development guide](docs/source/development.rst), section *protection (site protection tracking)*. Don't add a protection tag without empirical evidence it applies in the current environment.
## Testing
CI runs the same checks on every PR, but please run them locally first:
```bash
make format # auto-format with black
make lint # flake / mypy
make test # pytest with coverage
```
## Submitting changes
Open a [GitHub PR](https://github.com/soxoj/maigret/pulls) against `main`. Always write a clear log message:
```
$ git commit -m "A brief summary of the commit
>
> A paragraph describing what changed and its impact."
```
One-line messages are fine for small changes; bigger changes should explain the *why* in the body.
## Coding conventions
### General
- Follow [PEP 8](https://www.python.org/dev/peps/pep-0008/) for Python.
- Make sure all tests pass before opening the PR.
### Code style
- **Indentation**: 4 spaces per level.
- **Imports**: standard library first, third-party next, project-local last; group them logically.
### Naming
- **Variables and functions**: `snake_case`.
- **Classes**: `CamelCase`.
- **Constants**: `UPPER_CASE`.
Start reading the code and you'll get the hang of it.
## Getting help
If you're stuck on something — a check that won't behave, a setup error, an unclear field in `data.json`, or just want to discuss an approach before opening a PR — there are two places to ask:
- [GitHub Discussions](https://github.com/soxoj/maigret/discussions) — searchable, public, good for technical questions and design ideas. Prefer this for anything other contributors might run into too.
- Telegram: [@soxoj](https://t.me/soxoj) — direct channel to the maintainer, good for quick questions and informal chat.
Bug reports and feature requests still belong in [GitHub Issues](https://github.com/soxoj/maigret/issues).
## License
Maigret is MIT-licensed; by submitting a contribution you agree to publish it under the same license. There is no CLA.
**Maigret** collects a dossier on a person **by username only**, checking for accounts on a huge number of sites and gathering all the available information from web pages. No API keys required.
<i>The Commissioner Jules Maigret is a fictional French police detective, created by Georges Simenon. His investigation method is based on understanding the personality of different people and their interactions.</i>
## Contents
- [In one minute](#in-one-minute)
- [Main features](#main-features)
- [Demo](#demo)
- [Installation](#installation)
- [Usage](#usage)
- [Contributing](#contributing)
- [Commercial Use](#commercial-use)
- [About](#about)
The purpose of Maigret is to **collect a dossier on a person by username only**, checking for accounts on a huge number of sites.
<a id="one-minute"></a>
## In one minute
This is a [sherlock](https://github.com/sherlock-project/) fork with cool features under heavy development.
*Don't forget to regularly update the source code from the repo.*
Ensure you have Python 3.10 or higher.
More than 2000 sites are currently supported ([full list](./sites.md)); by default, the search is launched against the 500 most popular sites in descending order of popularity.
```bash
pip install maigret
maigret YOUR_USERNAME
```
No install? Try the [Telegram bot](https://t.me/maigret_search_bot) or a [Cloud Shell](#cloud-shells).
Want a web UI? See [how to launch it](#web-interface).
See also: [Quick start](https://maigret.readthedocs.io/en/latest/quick-start.html).
## Main features
- Very few false positives
- Supports 3,000+ sites ([see full list](https://github.com/soxoj/maigret/blob/main/sites.md)). A default run checks the 500 highest-ranked sites by traffic; pass `-a` to scan everything, or `--tags` to narrow by category/country.
- Embeddable in Python projects — import `maigret` and run searches programmatically (see [library usage](https://maigret.readthedocs.io/en/latest/library-usage.html)).
- [Extracts](https://github.com/soxoj/socid_extractor) all available information about the account owner from profile pages and site APIs, including links to other accounts.
- Performs recursive search using discovered usernames and other IDs.
- Allows filtering by tags (site categories, countries).
- Detects and partially bypasses blocks, censorship, and CAPTCHA.
- Fetches an [auto-updated site database](https://maigret.readthedocs.io/en/latest/settings.html#database-auto-update) from GitHub each run (once per 24 hours), and falls back to the built-in database if offline.
- Works with Tor and I2P websites; able to check domains.
- Ships with a [web interface](#web-interface) for browsing results as a graph and downloading reports in every format from a single page.
- Optional [AI analysis mode](#ai-analysis) (`--ai`) that turns raw findings into a short investigation summary using an OpenAI-compatible API.
For the complete feature list, see the [features documentation](https://maigret.readthedocs.io/en/latest/features.html).
### Used by
Professional OSINT and social-media analysis tools built on Maigret:
```bash
docker build --target web -t maigret-web .  # Web UI image
```
You can use a free virtual machine; the repo will be automatically cloned:
[](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/soxoj/maigret&tutorial=README.md) [](https://repl.it/github/soxoj/maigret)
<a href="https://colab.research.google.com/gist//soxoj/879b51bc3b2f8b695abb054090645000/maigret.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" height="40"></a>

### Troubleshooting

Build errors? See the [troubleshooting guide](https://maigret.readthedocs.io/en/latest/installation.html#troubleshooting).
## Usage
### Examples
```bash
pip3 install -r requirements.txt
```
```bash
# make HTML, PDF, and Xmind8 reports
maigret user --html
maigret user --pdf
maigret user --xmind  # output not compatible with XMind 2022+
```
## Using examples
```bash
# for a cloned repo
./maigret.py user
# for a package
maigret user
```
Features:
```bash
# make HTML and PDF reports
maigret user --html --pdf
# machine-readable exports
maigret user --json ndjson # newline-delimited JSON (also: --json simple)
maigret user --csv
maigret user --txt
maigret user --graph # interactive D3 graph (HTML)
# search on sites marked with tags photo & dating
maigret user --tags photo,dating
# search on sites marked with tag us
maigret user --tags us
# search for three usernames on all available sites
maigret user1 user2 user3 -a
```
Run `maigret --help` for all options. They are also documented in [the Maigret Wiki](https://github.com/soxoj/maigret/wiki/Command-line-options) and the docs: [CLI options](https://maigret.readthedocs.io/en/latest/command-line-options.html), [more examples](https://maigret.readthedocs.io/en/latest/usage-examples.html). Running into 403s or timeouts? See [TROUBLESHOOTING.md](TROUBLESHOOTING.md).
With Docker:
```bash
# manual build
docker build -t maigret . && docker run maigret user

# official image
docker run soxoj/maigret:latest user
```

<a id="web-interface"></a>
### Web interface
Maigret has a built-in web UI with a results graph and downloadable reports.
<details>
<summary>Web Interface Screenshots</summary>


**Maigret can be embedded in your own Python projects.** The CLI is a thin wrapper around an async function you can call directly — build custom pipelines, feed results into your own tooling, or run it inside a larger OSINT workflow.
See the full [library usage guide](https://maigret.readthedocs.io/en/latest/library-usage.html) for a working example, async patterns, and how to filter sites by tag.
`--ai` collects the search results, builds an internal Markdown report, and sends it to an OpenAI-compatible chat completion endpoint to produce a short, neutral investigation summary (likely real name, location, occupation, interests, languages, confidence, follow-up leads). Per-site progress is suppressed and the model's output is streamed to stdout.
Original Creator of Sherlock Project - [Siddharth Dushantha](https://github.com/sdushantha)
```bash
export OPENAI_API_KEY=sk-...
maigret user --ai
# pick a different model
maigret user --ai --ai-model gpt-4o-mini
```
The key can also be set as `openai_api_key` in `settings.json`. The endpoint defaults to `https://api.openai.com/v1`, but `openai_api_base_url` in `settings.json` can point to any OpenAI-compatible API (Azure OpenAI, OpenRouter, a local server, …). See the [settings docs](https://maigret.readthedocs.io/en/latest/settings.html) for the full list of options.
### Tor / I2P / proxies
Maigret can route checks through a proxy, Tor, or I2P — useful for `.onion` / `.i2p` sites and for bypassing WAFs that block datacenter IPs.
```bash
# any HTTP/SOCKS proxy
maigret user --proxy socks5://127.0.0.1:1080
# Tor (default gateway socks5://127.0.0.1:9050)
maigret user --tor-proxy socks5://127.0.0.1:9050
# I2P (default gateway http://127.0.0.1:4444)
maigret user --i2p-proxy http://127.0.0.1:4444
```
Start your Tor / I2P daemon before running the command — Maigret does not manage these gateways.
### Cloudflare bypass
> **Experimental.** The Cloudflare webgate is under active development; the configuration schema, CLI behaviour, and the set of routed sites may change without backwards-compatibility guarantees.
A subset of sites in the database require a real browser to solve a JavaScript challenge. Maigret can offload these checks to a local [FlareSolverr](https://github.com/FlareSolverr/FlareSolverr) instance:
```bash
docker run -d -p 8191:8191 --name flaresolverr ghcr.io/flaresolverr/flaresolverr:latest
maigret --cloudflare-bypass <username>
```
The bypass is opt-in (`--cloudflare-bypass` or `cloudflare_bypass.enabled` in `settings.json`) and only fires for sites whose `protection` field matches. See the [feature docs](https://maigret.readthedocs.io/en/latest/features.html#cloudflare-bypass) for backend options and configuration.
## Contributing
Add or fix sites by editing `data.json` surgically (don't round-trip the file through `json.load`/`json.dump`, which rewrites unrelated entries), then run `./utils/update_site_data.py` to regenerate `sites.md` and the database metadata, and open a pull request. For more details, see the [CONTRIBUTING guide](https://github.com/soxoj/maigret/blob/main/CONTRIBUTING.md) and [development docs](https://maigret.readthedocs.io/en/latest/development.html). Release history: [CHANGELOG.md](CHANGELOG.md).
## Commercial Use
The open-source Maigret is MIT-licensed and free for commercial use without restriction — but site checks break over time and need active maintenance.
For serious commercial use — with a **daily-updated site database** or a **username-check API** — reach out: 📧 [maigret@soxoj.com](mailto:maigret@soxoj.com)
- Private site database — 5 000+ sites, updated daily (separate from the public open-source database)
- Username check API — integrate Maigret into your product
## About
### Disclaimer
**For educational and lawful purposes only.** You are responsible for complying with all applicable laws (GDPR, CCPA, etc.) in your jurisdiction. The authors bear no responsibility for misuse.
### Feedback
[Open an issue](https://github.com/soxoj/maigret/issues) · [GitHub Discussions](https://github.com/soxoj/maigret/discussions) · [Telegram](https://t.me/soxoj)
### SOWEL classification
OSINT techniques used:
- [SOTL-2.2. Search For Accounts On Other Platforms](https://sowel.soxoj.com/other-platform-accounts)
- [SOTL-6.1. Check Logins Reuse To Find Another Account](https://sowel.soxoj.com/logins-reuse)
- [SOTL-6.2. Check Nicknames Reuse To Find Another Account](https://sowel.soxoj.com/nicknames-reuse)
Common issues when running Maigret and how to fix them. If none of this helps, [open an issue](https://github.com/soxoj/maigret/issues) with the output of `maigret --version` and the exact command you ran.
## "Lots of sites fail / timeout / return 403"
This is by far the most common report. It almost always comes from anti-bot protection (Cloudflare, DDoS-Guard, Akamai, etc.) or a slow network — not from a bug in Maigret.
**Results vary a lot depending on where you run from.** The same command on the same username can produce very different output on:
- **Mobile internet** (4G/5G) — usually the best results. Carrier NAT shares your IP with thousands of real users, so WAFs rarely block it.
- **Home broadband** — generally good, though some ISPs are reputation-flagged.
- **Hosting / cloud / VPS infrastructure** (AWS, GCP, DigitalOcean, Hetzner, etc.) — the worst case. Datacenter IP ranges are blanket-blocked or challenged by most WAFs, so you will see many false negatives and 403s.
If a run looks suspiciously empty, **try a different network before assuming Maigret is broken**: tether from your phone, switch between Wi-Fi and mobile, or move the run off a VPS onto a residential machine. Comparing results across two networks is also the fastest way to tell whether a missing account is genuinely missing or just blocked on the current IP.
Once you have a sense of the baseline, try these tweaks in order:
1. **Raise the timeout.** The default is 30 seconds. On mobile networks or for slow sites, bump it:
```bash
maigret user --timeout 60
```
2. **Retry failed checks.** Transient 5xx / timeouts often clear on a second try:
```bash
maigret user --retries 2
```
3. **Lower parallelism.** Some WAFs rate-limit aggressively. Maigret defaults to 100 concurrent connections (`-n` / `--max-connections`) — dropping this makes you look less like a scanner:
```bash
maigret user -n 20
```
4. **Route through a residential proxy.** Datacenter IPs (AWS, GCP, DigitalOcean) are blanket-blocked by many WAFs. A residential / mobile proxy usually fixes this:
```bash
maigret user --proxy http://user:pass@residential-proxy:port
```
Note: Tor (`--tor-proxy`) rarely helps here — most WAFs block Tor exit nodes just as aggressively as datacenter IPs. Use Tor only when you actually need to reach `.onion` sites (see below).
If specific sites *always* fail regardless of the above, they are likely broken in the database (stale markers, new WAF, site redesign). Report them with `--print-errors` output so a maintainer can look at the check config.
## "No results at all" / "maigret: command not found"
- **`command not found`** — `pip install maigret` put the binary under `~/.local/bin` (Linux/macOS) or `%APPDATA%\Python\Scripts` (Windows). Add that directory to `PATH`, or run `python3 -m maigret user` instead.
- **Empty output** — check that you actually passed a username; `maigret` alone prints help. Also confirm Python 3.10+ with `python3 --version`.
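A minimal fix for the `PATH` case (assuming the default pip user-install location on Linux/macOS):

```shell
# make pip's user-install bin dir visible in the current shell
export PATH="$HOME/.local/bin:$PATH"
# to persist it, add the line above to your shell rc file (~/.bashrc, ~/.zshrc)
```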
## "SSL / certificate errors"
Usually caused by a corporate MITM proxy or an outdated `certifi` bundle.
```bash
pip install --upgrade certifi
```
If you are behind a corporate proxy, set `HTTPS_PROXY` / `HTTP_PROXY` environment variables and pass `--proxy "$HTTPS_PROXY"` so Maigret uses the same route.
## ".onion / .i2p sites are skipped"
These sites only load through the matching gateway. Start your Tor or I2P daemon first, then:
```bash
# Tor
maigret user --tor-proxy socks5://127.0.0.1:9050
# I2P
maigret user --i2p-proxy http://127.0.0.1:4444
```
Maigret does not launch or manage these daemons — they must already be running.
## "The PDF / XMind / HTML report looks wrong"
- **PDF** — requires `weasyprint` and its system dependencies (Pango, Cairo, GDK-PixBuf). On Debian/Ubuntu: `apt install libpango-1.0-0 libpangoft2-1.0-0`. macOS: `brew install pango`.
- **XMind** — the `--xmind` flag generates **XMind 8** files. XMind 2022+ (Zen / XMind 2023) uses a different format and will not open them. Use XMind 8 or convert via `--html`.
- **HTML** looks unstyled — open it through a local file path (`file:///...`), not via a preview pane that strips CSS.
## "The site database is out of date"
Maigret auto-fetches a fresh `data.json` from GitHub once every 24 hours. To force-refresh now:
```bash
maigret user --force-update
```
To run entirely against the local built-in copy (e.g. offline):
```bash
maigret user --no-autoupdate
```
## Still stuck?
- [Open an issue](https://github.com/soxoj/maigret/issues) — include your OS, Python version, Maigret version, and the full command.
- Ask in [GitHub Discussions](https://github.com/soxoj/maigret/discussions) or the [Telegram](https://t.me/soxoj) channel.
**Maigret** collects a dossier on a person **by username only**, checking for accounts on a huge number of sites and gathering all the available information from web pages. No API keys required.
## Installation
Google Cloud Shell does not ship with all the system libraries Maigret needs (`libcairo2-dev`, `pkg-config`). The helper script below installs them and then builds Maigret from the cloned source.
Copy the command and run it in the Cloud Shell terminal:
```bash
./utils/cloudshell_install.sh
```
When the script finishes, verify the install:
```bash
maigret --version
```
## Usage examples
Run a basic search for a username. By default Maigret checks the **500 highest-ranked sites by traffic** — pass `-a` to scan the full 3,000+ database.
```bash
maigret soxoj
```
Search several usernames at once:
```bash
maigret user1 user2 user3
```
Narrow the run to sites related to cryptocurrency via the `crypto` tag (you can also use country tags):
```bash
maigret vitalik.eth --tags crypto
```
Generate reports in HTML, PDF, and XMind 8 formats:
```bash
maigret soxoj --html
maigret soxoj --pdf
maigret soxoj --xmind
```
Download a generated report from Cloud Shell to your local machine:
```bash
cloudshell download reports/report_soxoj.pdf
```
Tune reliability on flaky networks — raise the timeout and retry failed checks:
```bash
maigret soxoj --timeout 60 --retries 2
```
For the full list of options see `maigret --help` or the [CLI documentation](https://maigret.readthedocs.io/en/latest/command-line-options.html).
## Further reading
Full project documentation: [maigret.readthedocs.io](https://maigret.readthedocs.io/)
You can specify several usernames separated by space. Usernames are
**not** mandatory as there are other operations modes (see below).
Parsing of account pages and online documents
---------------------------------------------
``maigret --parse URL``
Maigret will try to extract information about the document/account owner
(including username and other ids) and will make a search by the
extracted username and ids. See examples in the :ref:`extracting-information-from-pages` section.
Main options
------------
Options are also configurable through settings files, see
:doc:`settings section <settings>`.
``--tags`` - Filter sites for searching by tags: site categories and
two-letter country codes (**not a language!**). E.g. photo, dating, sport; jp, us, global.
Multiple tags can be associated with one site. **Warning**: tags markup is
not stable now. Read more :doc:`in the separate section <tags>`.
``--exclude-tags`` - Exclude sites with specific tags from the search
(blacklist). E.g. ``--exclude-tags porn,dating`` will skip all sites
tagged with ``porn`` or ``dating``. Can be combined with ``--tags`` to
include certain categories while excluding others. Read more
:doc:`in the separate section <tags>`.
``-n``, ``--max-connections`` - Allowed number of concurrent connections
**(default: 100)**.
``-a``, ``--all-sites`` - Use all sites for scan **(default: top 500)**.
``--top-sites`` - Count of sites for scan ranked by Majestic Million
**(default: top 500)**.
**Mirrors:** After the top *N* sites by Majestic Million rank are chosen (respecting
``--tags``, ``--use-disabled-sites``, etc.), Maigret may add extra sites
whose database field ``source`` names a **parent platform** that itself falls
in the Majestic Million top *N* when ranking **including disabled** sites. For example,
if ``Twitter`` ranks in the first 500 by Majestic Million, a mirror such as ``memory.lol``
(with ``source: Twitter``) is included even though it has no rank and would
otherwise be cut off. The same applies to Instagram-related mirrors (e.g.
Picuki) when ``Instagram`` is in that parent top *N* by rank—even if the
official ``Instagram`` entry is disabled and not scanned by default, its
mirrors can still be pulled in. The final list is the ranked top *N* plus
these mirrors (no fixed upper bound on mirror count).
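A rough sketch of this selection rule (illustrative only; ``source`` and ``disabled`` mirror the database fields, while ``rank`` stands in for the Majestic Million rank and the function name is hypothetical):

```python
def select_sites(sites, top_n):
    """Top-N ranked sites plus mirrors whose parent is in that top N."""
    # rank *including* disabled sites to decide which parents qualify
    ranked = sorted(sites, key=lambda s: s.get("rank", float("inf")))
    top_parents = {s["name"] for s in ranked[:top_n]}
    # the scan list itself is the ranked top N of enabled sites
    chosen = [s for s in ranked if not s.get("disabled")][:top_n]
    names = {s["name"] for s in chosen}
    # pull in mirrors of any top-N parent, ranked or not
    for site in sites:
        if site.get("source") in top_parents and site["name"] not in names:
            chosen.append(site)
            names.add(site["name"])
    return chosen
```

Note that a disabled parent (e.g. ``Instagram``) still counts for the parent top-N test, so its mirrors are scanned even though the parent itself is not.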
``--timeout`` - Time (in seconds) to wait for responses from sites
**(default: 30)**. A longer timeout will be more likely to get results
from slow sites. On the other hand, this may cause a long delay to
gather all results. Choose the timeout with the bandwidth of your
Internet connection in mind.
``--cookies-jar-file`` - File with custom cookies in Netscape format
(aka cookies.txt). You can install a browser extension to download
your own cookies (`Chrome <https://chrome.google.com/webstore/detail/get-cookiestxt/bgaddhkoddajcdgocldbbfleckgcbcid>`_, `Firefox <https://addons.mozilla.org/en-US/firefox/addon/cookies-txt/>`_).
``--no-recursion`` - Disable parsing pages for other usernames and
recursive search by them.
``--use-disabled-sites`` - Use disabled sites to search (may cause many
false positives).
``--id-type`` - Specify identifier(s) type (default: username).
``--cloudflare-bypass`` - Route checks for Cloudflare-protected sites
through a local Chrome-based solver (FlareSolverr by default). The bypass
is opt-in — without this flag (or
``settings.cloudflare_bypass.enabled = true``) those sites are checked
the usual way, which Cloudflare almost always blocks: you get an UNKNOWN
status with a JS-challenge / firewall error rather than a real result.
Configure the backend in ``settings.cloudflare_bypass.modules``.
See :ref:`cloudflare-bypass`. **Experimental** — the flag, schema and
routing rules may change without backwards-compatibility guarantees.
.. _custom-database:
Using a custom sites database
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The ``--db`` flag accepts three forms:
1. **HTTP(S) URL** — fetched as-is, e.g.
   ``--db https://example.com/my_db.json``.
2. **Local file path** — absolute (``--db /tmp/private.json``) or
   relative to the current working directory
   (``--db LLM/maigret_private_db.json``).
3. **Module-relative path** — kept for backwards compatibility, resolved
   against the installed ``maigret/`` package directory (e.g. the
   default ``resources/data.json``).
Resolution order for local paths: the path is first tried as given
(absolute or cwd-relative); if that file does not exist, Maigret falls
back to the legacy module-relative resolution. If neither location
contains the file, Maigret exits with an error rather than silently
loading the bundled database.
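The resolution order can be sketched as follows (a simplified illustration; ``resolve_db_path`` is a hypothetical name, not Maigret's actual function):

```python
import os

def resolve_db_path(path, package_dir):
    # 1. as given: absolute, or relative to the current working directory
    if os.path.exists(path):
        return os.path.abspath(path)
    # 2. legacy fallback: relative to the installed maigret/ package dir
    legacy = os.path.join(package_dir, path)
    if os.path.exists(legacy):
        return legacy
    # 3. neither exists: fail loudly instead of using the bundled DB
    raise FileNotFoundError(f"Sites database not found: {path}")
```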
When ``--db`` points to a custom file, automatic database updates are
skipped — the file is used exactly as provided.
On every run Maigret prints the database it actually loaded, for
example::
   [+] Using sites database: /path/to/maigret_private_db.json (6 sites)
If loading the requested database fails for any other reason (corrupt
JSON, missing required keys, …), Maigret prints a warning, falls back
to the bundled database, and reports the fallback explicitly::
   [-] Falling back to bundled database: /…/maigret/resources/data.json
   [+] Using sites database: /…/maigret/resources/data.json (3154 sites)
A typical invocation against a private database, with auto-update
disabled and all sites scanned, looks like::
   python3 -m maigret username \
       --db LLM/maigret_private_db.json \
       --no-autoupdate -a
Reports
-------
``-P``, ``--pdf`` - Generate a PDF report (general report on all
usernames).
``-H``, ``--html`` - Generate an HTML report file (general report on all
usernames).
``-X``, ``--xmind`` - Generate an XMind 8 mindmap (one report per
username).
``-C``, ``--csv`` - Generate a CSV report (one report per username).
``-T``, ``--txt`` - Generate a TXT report (one report per username).
``-J``, ``--json`` - Generate a JSON report of specific type: simple,
ndjson (one report per username). E.g. ``--json ndjson``
``-M``, ``--md`` - Generate a Markdown report (general report on all
usernames). See :ref:`markdown-report` below.
``--ai`` - Run an AI-powered analysis of the search results using an
OpenAI-compatible chat completion API. The internal Markdown report is
sent to the model, which returns a short investigation summary that is
streamed to the terminal. See :ref:`ai-analysis` below.
``--ai-model`` - Model name to use with ``--ai``. Defaults to
``openai_model`` from settings (``gpt-4o`` out of the box).
``-fo``, ``--folderoutput`` - Results will be saved to this folder,
``results`` by default. It will be created if it doesn't exist.
Output options
--------------
``-v``, ``--verbose`` - Display extra information and metrics.
*(loglevel=WARNING)*
``-vv``, ``--info`` - Display service information. *(loglevel=INFO)*
``-vvv``, ``--debug``, ``-d`` - Display debugging information and site
responses. *(loglevel=DEBUG)*
``--print-not-found`` - Print sites where the username was not found.
``--print-errors`` - Print error messages: connection, captcha, site
country ban, etc.
Other operations modes
----------------------
``--version`` - Display version information and dependencies.
``--self-check`` - Do self-checking for sites and database. Each site is
tested by looking up its known-claimed and known-unclaimed usernames and
verifying that the results match expectations. Individual site failures
(network errors, unexpected exceptions, etc.) are caught and logged
without stopping the overall process, so the check always runs to
completion. After checking, Maigret reports a summary of issues found.
If any sites were disabled (see ``--auto-disable``), Maigret asks if you
want to save updates; answering y/Y will rewrite the local database.
``--auto-disable`` - Used with ``--self-check``: automatically disable
sites that fail checks (incorrect detection of claimed/unclaimed
usernames, connection errors, or unexpected exceptions). Without this
flag, ``--self-check`` only **reports** issues without modifying the
database.
``--diagnose`` - Used with ``--self-check``: print detailed diagnosis
information for each failing site, including the check type, the list
of issues found, and recommendations (e.g. suggesting a different
``checkType``).
``--submit URL`` - Do an automatic analysis of the given account URL or
site main page URL to determine the site engine and methods to check
account presence. After checking, Maigret asks if you want to add the
site; answering y/Y will rewrite the local database.
.. _markdown-report:
Markdown report (LLM-friendly)
------------------------------
The ``--md`` / ``-M`` flag generates a Markdown report designed for both human reading and analysis by AI assistants (ChatGPT, Claude, etc.).
.. code-block:: console

   maigret username --md
The report includes:
- **Summary** with aggregated personal data (all fullnames, locations, bios found across accounts), country tags, website tags, first/last seen timestamps.
- **Per-account sections** with profile URL, site tags, and all extracted fields (username, bio, follower count, linked accounts, etc.).
- **Possible false positives** disclaimer explaining that accounts may belong to different people.
- **Ethical use** notice about applicable data protection laws.
**Using with AI tools:**
The Markdown format is optimized for LLM context windows. You can feed the report directly to an AI assistant for follow-up analysis:
.. code-block:: console

   # Generate the report
   maigret johndoe --md

   # Feed it to an AI tool
   cat reports/report_johndoe.md | llm "Analyze this OSINT report and summarize key findings"
The structured Markdown with per-site sections makes it easy for AI tools to extract relationships, cross-reference identities, and identify patterns across accounts.
For a built-in alternative that calls the model for you and prints the
summary directly, see :ref:`ai-analysis` below.
.. _ai-analysis:
AI analysis (built-in)
----------------------
The ``--ai`` flag turns the search results into a short investigation
summary by sending the internal Markdown report to an OpenAI-compatible
chat completion API and streaming the model's reply to the terminal.
.. code-block:: console

   export OPENAI_API_KEY=sk-...
   maigret username --ai

   # use a smaller / cheaper model
   maigret username --ai --ai-model gpt-4o-mini
While ``--ai`` is active, per-site progress lines and the short text
report at the end are suppressed so the streamed summary is the main
output. The Markdown report itself is built in memory and is **not**
written to disk by ``--ai`` alone — combine with ``--md`` if you also
want the file on disk.
The summary follows a fixed format with sections for the most likely
real name, location, occupation, interests, languages, main website,
username variants, number of platforms, active years, a confidence
rating, and a short list of follow-up leads. The model is instructed
to rely only on what is supported by the report and to avoid mixing
clearly unrelated profiles into the main identity.
**Configuration.** The API key is resolved from
``settings.openai_api_key`` first, then from the ``OPENAI_API_KEY``
environment variable. The endpoint defaults to
``https://api.openai.com/v1`` and can be redirected to any
OpenAI-compatible service (Azure OpenAI, OpenRouter, a local server,
…) by setting ``openai_api_base_url`` in ``settings.json``. See
:ref:`settings` for the full list of options.
.. note::
   ``--ai`` makes a network request to the configured chat completion
   endpoint and sends the full Markdown report (which contains the
   gathered profile data). Use it only with providers and accounts
   you trust with that data.
The human-readable list of supported sites is available in the `sites.md <https://github.com/soxoj/maigret/blob/main/sites.md>`_ file in the repository.
It's been generated automatically from the main JSON file with the list of supported sites.
The machine-readable JSON file with the list of supported sites is available in the
`data.json <https://github.com/soxoj/maigret/blob/main/maigret/resources/data.json>`_ file in the ``resources`` directory.
2. Which methods of checking account presence are supported?

The supported methods (``checkType`` values in ``data.json``) are:

- ``message`` - the most reliable method; checks that at least one string from ``presenceStrs`` is present and none of the strings from ``absenceStrs`` are present in the HTML response
- ``status_code`` - checks that the status code of the response is 2XX
- ``response_url`` - checks that there is no redirect and the response is 2XX
.. note::
   Maigret natively treats specific anti-bot HTTP status codes (like LinkedIn's ``HTTP 999``) as a standard "Not Found/Available" signal instead of throwing an infrastructure Server Error, gracefully preventing false positives.
See the details of check mechanisms in the `checking.py <https://github.com/soxoj/maigret/blob/main/maigret/checking.py#L339>`_ file.
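The ``message`` logic can be sketched in a few lines (a simplified illustration, not the actual implementation in ``checking.py``):

```python
def message_check(html, presence_strs, absence_strs):
    """Claimed iff a presence marker is found and no absence marker is."""
    found = any(s in html for s in presence_strs)
    absent = any(s in html for s in absence_strs)
    return "CLAIMED" if found and not absent else "AVAILABLE"
```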
.. note::
   Maigret now uses the **Majestic Million** dataset for site popularity sorting instead of the discontinued Alexa Rank API. For backward compatibility with existing configurations and parsers, the ranking field in ``data.json`` and internal site models remains named ``alexaRank`` and ``alexa_rank``.
**Mirrors and ``--top-sites``:** When you limit scans with ``--top-sites N``, Maigret also includes *mirror* sites (entries whose ``source`` field points at a parent platform such as Twitter or Instagram) if that parent would appear in the Majestic Million top *N* when disabled sites are considered for ranking. See the **Mirrors** paragraph under ``--top-sites`` in :doc:`command-line-options`.
Testing
-------
It is recommended to use Python 3.10 for testing.
Install test requirements:
.. code-block:: console

   poetry install --with dev
Use the following commands to check Maigret:
.. code-block:: console

   # run linter and typing checks
   # order of checks:
   # - critical syntax errors or undefined names
   # - flake checks
   # - mypy checks
   make lint

   # run black formatter
   make format

   # run testing with coverage html report
   # current test coverage is 58%
   make test

   # open html report
   open htmlcov/index.html

   # get flamechart of imports to estimate startup time
   make speed
Site naming conventions
-----------------------------------------------
Site names are the keys in ``data.json`` and appear in user-facing reports. Follow these rules:
- **Title Case** by default: ``Product Hunt``, ``Hacker News``.
- **Lowercase** only if the brand itself is written that way: ``kofi``, ``note``, ``hi5``.
- **No domain suffix** (``calendly.com`` → ``Calendly``), unless the domain is part of the recognized brand name: ``last.fm``, ``VC.ru``, ``Archive.org``.
- **No full UPPERCASE** unless the brand is an acronym: ``VK``, ``CNET``, ``ICQ``, ``IFTTT``.
- **No** ``www.`` **or** ``https://`` **prefix** in the name.
- **Spaces** are allowed when the brand uses them: ``Star Citizen``, ``Google Maps``.
- **{username} templates** in names are acceptable: ``{username}.tilda.ws``.
When in doubt, check how the service refers to itself on its homepage.
How to fix false-positives
-----------------------------------------------
If you want to work with the sites database, activate the statistics update git hook first: ``git config --local core.hooksPath .githooks/``.
Make your git commits from your maigret repo folder; otherwise the hook won't find the statistics update script.
1. Determine the problematic site.
If you already know which site has a false-positive and want to fix it specifically, go to the next step.
Otherwise, simply run a search with a random username (e.g. ``laiuhi3h4gi3u4hgt``) and check the results.
Alternatively, you can use `the Telegram bot <https://t.me/maigret_search_bot>`_.
2. Open the account link in your browser and check:
- If the site is completely gone, remove it from the list
- If the site still works but looks different, update how Maigret checks it in ``data.json``
- If the site requires login to view profiles, disable checking it
3. Find the site in the `data.json <https://github.com/soxoj/maigret/blob/main/maigret/resources/data.json>`_ file.
If the ``checkType`` method is not ``message`` and you are going to fix the check, update it:
- put ``message`` in ``checkType``
- put in ``absenceStrs`` a keyword that is present in the HTML response for a non-existing account
- put in ``presenceStrs`` a keyword that is present in the HTML response for an existing account
If you have trouble determining the right keywords, you can use automatic detection by passing the account URL with the ``--submit`` option:
.. code-block:: console

   maigret --submit https://my.mail.ru/bk/alex
To disable checking, set ``disabled`` to ``true`` or simply run:
.. code-block:: console

   maigret --self-check --site My.Mail.ru@bk.ru
To debug the check method using the response HTML, you can run:
There are a few options in the sites ``data.json`` that are helpful in various cases:

- ``engine`` - a predefined check for sites of a certain type (e.g. forums), see the ``engines`` section in the JSON file
- ``headers`` - a dictionary of additional headers to be sent to the site
- ``requestHeadOnly`` - set to ``true`` if it's enough to make a HEAD request to the site
- ``regexCheck`` - a regex to check if the username is valid, in case of frequent false-positives
- ``requestMethod`` - set the HTTP method to use (e.g., ``POST``); by default Maigret uses GET or HEAD
- ``requestPayload`` - a dictionary with the JSON payload to send for POST requests (e.g., ``{"username": "{username}"}``), extremely useful for parsing GraphQL or modern JSON APIs
- ``protection`` - a list of protection types detected on the site (see below)
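For example, ``regexCheck`` lets Maigret skip a site without any HTTP request when a username cannot exist there. A minimal sketch of that pre-filter (illustrative; the function name and pattern are hypothetical):

```python
import re

def username_is_valid(username, regex_check):
    # skip the network check if the username can't exist on this site
    return re.fullmatch(regex_check, username) is not None
```

With a pattern like ``[a-zA-Z0-9_]{3,20}``, a username such as ``bad name!`` is rejected before any request is made.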
``protection`` (site protection tracking)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The ``protection`` field records what kind of anti-bot protection a site uses. Maigret reads this field and automatically applies the appropriate bypass mechanism where one exists.
Two categories of tag:

- **Load-bearing.** Maigret changes its HTTP client or headers based on the tag. Currently only ``tls_fingerprint`` (switches to ``curl_cffi`` with Chrome-class TLS).
- **Documentation-only.** Maigret does **not** change behavior based on the tag; it records *why* the site is hard so a future solver can target the right set of sites without re-auditing.

Within the documentation-only tags, there is a further split that dictates whether the site is ``disabled: true``:

- ``ip_reputation`` is the **only** doc-tag that **keeps the site enabled**. It means "works for most users, fails from datacenter/cloud IPs." Disabling would silently hide a working site from anyone with a clean IP. The fix is **external** to Maigret (residential IP or ``--proxy``).
- ``cf_js_challenge``, ``cf_firewall``, ``aws_waf_js_challenge``, ``ddos_guard_challenge``, ``custom_bot_protection``, ``js_challenge`` all pair with ``disabled: true``. They mean "does not work for anyone right now"; the tag identifies the provider so that when a bypass ships, every site with that tag can be re-enabled in one pass.
Supported values:
- ``tls_fingerprint`` *(load-bearing; site stays enabled)* — the site fingerprints the TLS handshake (JA3/JA4) and blocks non-browser clients. Maigret automatically uses ``curl_cffi`` with Chrome browser emulation to bypass this. Requires the ``curl_cffi`` package (included as a dependency). Examples: Instagram, NPM, Codepen, Kickstarter, Letterboxd.
- ``ip_reputation`` *(documentation-only; site stays enabled)* — the site blocks requests from datacenter/cloud IPs regardless of headers or TLS. Cannot be bypassed automatically; run Maigret from a regular internet connection (not a datacenter) or use a proxy (``--proxy``). The site is **not** marked ``disabled`` because it continues to work for users on residential IPs. Examples: Reddit, Patreon, Figma, OnlyFans.
- ``cf_js_challenge`` *(documentation-only; pair with ``disabled: true``)* — Cloudflare Managed Challenge / Turnstile JS challenge. Symptom: HTTP 403 with ``cf-mitigated: challenge`` header; body contains ``challenges.cloudflare.com``, ``_cf_chl_opt``, ``window._cf_chl``, or "Just a moment". Not bypassable via ``curl_cffi`` TLS impersonation (verified across Chrome 123/124/131, Safari 17/18, Firefox 133/135, Edge 101 — all return the same 403 challenge page); a real browser executing the challenge JS is required to obtain the clearance cookie. Sites stay ``disabled: true`` until a CF-challenge solver is integrated. Examples: DMOJ, Elakiri, Fanlore, Bdoutdoors, TheStudentRoom, forum.hr.
- ``cf_firewall`` *(documentation-only; pair with ``disabled: true``)* — Cloudflare firewall rule / bot score block (WAF action=block, **not** action=challenge). Symptom: HTTP 403 served by Cloudflare (``server: cloudflare``, ``cf-ray`` header) **without** JS-challenge markers — body typically shows "Access denied", "Attention Required", or just a bare 1015/1016/1020 error page. Unlike ``ip_reputation``, residential IPs are **not** sufficient to bypass — Cloudflare decides based on a composite of bot score, TLS fingerprint, UA, ASN, and custom site-owner rules, so ``curl_cffi`` Chrome impersonation from a residential line still returns 403. Sites stay ``disabled: true`` until a per-site bypass (cookies, real browser, or residential+clean session) is found. Examples: Fark, Fodors, Huntingnet, Hunttalk.
- ``aws_waf_js_challenge`` *(documentation-only; pair with ``disabled: true``)* — the site is protected by AWS WAF with a JavaScript challenge. Symptom: HTTP 202 with empty body and ``x-amzn-waf-action: challenge`` header (a token-granting challenge that requires executing the CAPTCHA/challenge JS bundle). Neither ``curl_cffi`` TLS impersonation nor User-Agent changes bypass this — a real browser or the official AWS WAF challenge-solver SDK is required. Sites stay ``disabled: true`` until a solver is integrated. Example: Dreamwidth.
- ``ddos_guard_challenge`` *(documentation-only; pair with ``disabled: true``)* — DDoS-Guard (ddos-guard.net) anti-bot page. Symptom: HTTP 403 with ``server: ddos-guard`` header; body contains "DDoS-Guard". DDoS-Guard fingerprints different UAs per source IP, so a single User-Agent override does not work across environments; a JS-capable bypass or DDoS-Guard-aware solver is required. Sites stay ``disabled: true`` until a solver is integrated. Example: ForumHouse.
- ``js_challenge`` *(documentation-only; pair with ``disabled: true``)* — **fallback** for JavaScript-challenge systems whose provider cannot be identified (custom in-house challenge pages that are not Cloudflare, AWS WAF, or any other recognized vendor). Prefer a provider-specific tag whenever the provider can be pinned down from response headers or body signatures.
- ``custom_bot_protection`` *(documentation-only; pair with ``disabled: true``)* — **fallback** for non-JS-challenge bot protection served by a custom/in-house system (not Cloudflare, not AWS WAF, not DDoS-Guard). Typical symptom: HTTP 403 from the site's own origin server (``server: nginx``, AWS ELB, etc.) with a branded block page, returned regardless of TLS fingerprint or residential IP. Not generically bypassable; investigate per site (cookies, session, proxy geography). Examples: Hackerearth ("HackerEarth Guardian"), FreelanceJob (nginx-level block).
**Rule: prefer provider-specific protection tags.** When a site is blocked by an identifiable anti-bot vendor, always record the vendor in the tag (``cf_js_challenge``, ``cf_firewall``, ``aws_waf_js_challenge``, ``ddos_guard_challenge``, and future additions such as ``sucuri_challenge``, ``incapsula_challenge``). The generic ``js_challenge`` and ``custom_bot_protection`` tags are reserved for custom/unknown systems. Rationale: bypass solvers are inherently provider-specific (a Cloudflare Turnstile solver does not help with AWS WAF); recording the provider in advance lets us fan out fixes the moment a per-provider solver is added, without re-auditing every disabled site. The same principle applies to other protection categories when the provider is identifiable.
Example:

.. code-block:: json

   "Instagram": {
       "url": "https://www.instagram.com/{username}/",
       "checkType": "message",
       "presenseStrs": ["\"routePath\":\"\\/"],
       "absenceStrs": ["\"routePath\":null"],
       "protection": ["tls_fingerprint"]
   }
``urlProbe`` (optional profile probe URL)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
By default Maigret performs the HTTP request to the same URL as ``url`` (the public profile link pattern).
If you set ``urlProbe`` in ``data.json``, Maigret **fetches** that URL for the presence check (API, GraphQL, JSON endpoint, etc.), while **reports and ``url_user``** still use ``url`` — the human-readable profile page users should open.
Placeholders: ``{username}``, ``{urlMain}``, ``{urlSubpath}`` (same as for ``url``). Example: GitHub has ``url`` set to ``https://github.com/{username}`` and ``urlProbe`` set to ``https://api.github.com/users/{username}``; Picsart uses the web profile ``https://picsart.com/u/{username}`` and probes ``https://api.picsart.com/users/show/{username}.json``.
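For illustration, a site record combining both fields might look like this (a hypothetical sketch modelled on the GitHub example above — the actual record in ``data.json`` contains more fields, and the ``checkType`` shown here is an assumption):

```json
"GitHub": {
    "url": "https://github.com/{username}",
    "urlProbe": "https://api.github.com/users/{username}",
    "checkType": "status_code"
}
```

The probe hits the API endpoint, while reports still link to the human-readable ``url``.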
Implementation: ``make_site_result`` in `checking.py <https://github.com/soxoj/maigret/blob/main/maigret/checking.py>`_.
Site check fixes using LLM
--------------------------
.. note::

   The ``LLM/`` directory at the root of the repository contains detailed instructions for editing site checks (in Markdown format): checklist, full guide to ``checkType`` / ``data.json`` / ``urlProbe``, handling false positives, searching for public JSON APIs, and the proposal log for ``socid_extractor``.

Main files:

- `site-checks-playbook.md <https://github.com/soxoj/maigret/blob/main/LLM/site-checks-playbook.md>`_ — short checklist
- `socid_extractor_improvements.log <https://github.com/soxoj/maigret/blob/main/LLM/socid_extractor_improvements.log>`_ — template and entries for identity extractor improvements

These files should be kept up to date whenever the check logic changes in the code or in ``data.json``.
.. _activation-mechanism:
Activation mechanism
--------------------
The activation mechanism helps make requests to sites requiring additional authentication like cookies, JWT tokens, or custom headers.
Let's study the Vimeo site check record from the Maigret database:
.. code-block:: json

   "Vimeo": {
       "tags": [
           "us",
           "video"
       ],
       "headers": {
           "Authorization": "jwt eyJ0..."
       },
       "activation": {
           "url": "https://vimeo.com/_rv/viewer",
           "marks": [
               "Something strange occurred. Please get in touch with the app's creator."
           ]
       }
   }
Here's how the activation process works when a JWT token becomes invalid:
1. The site check makes an HTTP request to ``urlProbe`` with the invalid token
2. The response contains an error message specified in the ``activation``/``marks`` field
3. When this error is detected, the ``vimeo`` activation function is triggered
4. The activation function obtains a new JWT token and updates it in the site check record
5. On the next site check (either through retry or a new Maigret run), the valid token is used and the check succeeds
Examples of activation mechanism implementation are available in the `activation.py <https://github.com/soxoj/maigret/blob/main/maigret/activation.py>`_ file.
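Step 4 can be sketched as a small helper (hypothetical names; the real functions live in ``activation.py`` and also make the HTTP request from step 1):

```python
def apply_new_token(site_headers, viewer_response):
    """Hypothetical sketch of step 4: refresh the Authorization header
    from the JSON returned by the activation endpoint."""
    updated = dict(site_headers)
    updated["Authorization"] = f"jwt {viewer_response['jwt']}"
    return updated

# The next site check (retry or a new Maigret run) sends the refreshed header
headers = apply_new_token(
    {"Authorization": "jwt expired..."},
    {"jwt": "eyJ0NEW..."},
)
```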
How to publish new version of Maigret
-------------------------------------
**Collaborator rights are required; write to Soxoj to get them.**

To publish a new version, first create a new branch in the repository
with a bumped version number and an updated changelog. Then create a
release, and a GitHub Action automatically creates a new PyPI package.
- New branch example: https://github.com/soxoj/maigret/commit/e520418f6a25d7edacde2d73b41a8ae7c80ddf39
1. Make a new branch locally with a new version name. Check the current version number here: https://pypi.org/project/maigret/.
**Increase only patch version (third number)** if there are no breaking changes.
.. code-block:: console

   git checkout -b 0.4.0
2. Update the Maigret version in four files manually:
- pyproject.toml
- maigret/__version__.py
- docs/source/conf.py
- snapcraft.yaml
3. Create a new empty text section at the beginning of the `CHANGELOG.md` file with the current date:

.. code-block:: console

   ## [0.4.0] - 2022-01-03
4. Get auto-generated release notes:
- Open https://github.com/soxoj/maigret/releases/new
- Click `Choose a tag`, enter `v0.4.0` (your version)
- Click `Create new tag`
- Press `+ Auto-generate release notes`
- Copy all the text from description text field below
- Paste it into the empty text section in `CHANGELOG.md`
- Remove the redundant line `## What's Changed` and the `## New Contributors` section if it exists
- *Close the new release page*
5. Commit all the changes, push, make pull request
.. code-block:: console

   git add -p
   git commit -m 'Bump to YOUR VERSION'
   git push origin head
6. Merge pull request
7. Create new release
- Open https://github.com/soxoj/maigret/releases/new again
- Click `Choose a tag`
- Enter the actual version in the format `v0.4.0`
- Also enter the actual version in the `Release title` field
- Click `Create new tag`
- Press `+ Auto-generate release notes`
- **Press the "Publish release" button**
8. That's all; now you can simply wait for the push to PyPI. You can monitor it on the Actions page: https://github.com/soxoj/maigret/actions/workflows/python-publish.yml
Documentation updates
---------------------
Documentation is auto-generated and auto-deployed from the ``docs`` directory.
To manually update documentation:
1. Change something in the ``.rst`` files in the ``docs/source`` directory.
2. Run ``python -m pip install -e .`` to install the package in editable mode.
3. Run ``make singlehtml`` in the terminal in the docs directory.
4. Open ``build/singlehtml/index.html`` in your browser to see the result.
5. If everything is ok, commit and push your changes to GitHub.
Roadmap
-------
.. warning::

   This roadmap requires updating to reflect the current project status and future plans.
1. Run Maigret with the ``--web`` flag and specify the port number.
.. code-block:: console

   maigret --web 5000
2. Open http://127.0.0.1:5000 in your browser and enter one or more usernames to make a search.
3. Wait a bit for the search to complete and view the graph with results, the table with all accounts found, and download reports of all formats.
Personal info gathering
-----------------------
Maigret does the `parsing of accounts webpages and extraction <https://github.com/soxoj/socid-extractor>`_ of personal info, links to other profiles, etc.
Extracted info is displayed as an additional result in the CLI output and as tables in HTML and PDF reports.
Maigret also uses the ids and usernames found in links to start a recursive search.
Enabled by default; can be disabled with ``--no-extracting``.
.. code-block:: text

   $ python3 -m maigret soxoj --timeout 5
   [-] Starting a search on top 500 sites from the Maigret database...
   [!] You can run search by full list of sites with flag `-a`
   [*] Checking username soxoj on:
   ...
Reports
-------
Maigret currently supports HTML, PDF, TXT, XMind 8 mindmap, and JSON reports.
HTML/PDF reports contain:
- profile photo
- all the gathered personal info
- additional information about supposed personal data (full name, gender, location), resulting from statistics of all found accounts
Also, there is a short text report in the CLI output after the end of the search phase.
.. warning::

   XMind 8 mindmaps are incompatible with XMind 2022!
AI analysis
-----------
Maigret can produce a short, human-readable investigation summary on top
of the raw search results using the ``--ai`` flag. It builds the
internal Markdown report, sends it to an OpenAI-compatible chat
completion endpoint, and streams the model's reply directly to the
terminal.
.. code-block:: console

   export OPENAI_API_KEY=sk-...
   maigret username --ai
The summary uses a fixed format with the most likely real name,
location, occupation, interests, languages, main website, username
variants, number of platforms, active years, a confidence rating, and a
short list of follow-up leads. While ``--ai`` is active, per-site
progress and the short text report are suppressed so the streamed
summary is the main output.
The endpoint, model, and API key are configured via ``settings.json``
(``openai_api_key``, ``openai_model``, ``openai_api_base_url``) or the
``OPENAI_API_KEY`` environment variable. Any OpenAI-compatible API can
be used (Azure OpenAI, OpenRouter, a local server, …). See
:ref:`ai-analysis` and :ref:`settings` for details.
Tags
----
The Maigret sites database is very big (and keeps growing), so running a search across all sites is often overkill.
It is also often hard to tell which sites are the most interesting in the case of a certain person.
Tags markup allows selecting a subset of sites by interests (photo, messaging, finance, etc.) or by country. Tags of found accounts are grouped and displayed in the reports.
See full description :doc:`in the Tags Wiki page <tags>`.
Censorship and captcha detection
--------------------------------
Maigret can detect common errors such as censorship stub pages, Cloudflare captcha pages, and others.
If more than 3% of the errors in a session are of a certain type, you get a warning message in the CLI output with recommendations to improve performance and avoid problems.
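The threshold logic can be sketched as follows (a hypothetical helper for illustration, not the actual implementation):

```python
def error_types_over_threshold(errors_by_type, total_checks, threshold=0.03):
    """Return the error types whose share of a session's checks
    exceeds the warning threshold (3% by default)."""
    return sorted(
        err_type
        for err_type, count in errors_by_type.items()
        if count / total_checks > threshold
    )

warnings = error_types_over_threshold(
    {"captcha": 20, "timeout": 2}, total_checks=500
)
# "captcha" is 4% of checks and triggers a warning; "timeout" is only 0.4%
```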
Retries
-------
Maigret retries requests that failed with temporary errors (connection failures, proxy errors, etc.).
One attempt by default; this can be changed with the ``--retries N`` option.
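A minimal sketch of the retry loop (illustrative only; the real logic lives in ``checking.py``):

```python
import asyncio

# Error classes treated as temporary for this sketch
TEMPORARY_ERRORS = (ConnectionError, TimeoutError)

async def check_with_retries(check, retries=1):
    """Re-run an async site check up to `retries` extra times
    when it fails with a temporary error."""
    for attempt in range(retries + 1):
        try:
            return await check()
        except TEMPORARY_ERRORS:
            if attempt == retries:
                raise  # out of attempts: propagate the error

calls = {"count": 0}

async def flaky_check():
    # Fails once with a temporary error, then succeeds
    calls["count"] += 1
    if calls["count"] < 2:
        raise ConnectionError("proxy error")
    return "FOUND"

result = asyncio.run(check_with_retries(flaky_check, retries=2))
```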
Database self-check
-------------------
Maigret includes a self-check mode (``--self-check``) that validates every site
in the database by looking up its known-claimed and known-unclaimed usernames
and verifying that the detection results match expectations.
The self-check is **error-resilient**: if an individual site check raises an
unexpected exception (e.g. a network error or a parsing failure), the error is
caught, logged, and recorded as an issue — the remaining sites continue to be
checked without interruption. This means the process always runs to completion,
even when checking hundreds of sites with ``-a --self-check``.
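The error-resilient loop can be sketched roughly like this (a hypothetical helper; the real code lives in Maigret's self-check module):

```python
def run_self_check(sites, check_fn):
    """Check every site; record failures as issues instead of aborting,
    so the run always completes."""
    issues = []
    for site in sites:
        try:
            if not check_fn(site):
                issues.append((site, "detection mismatch"))
        except Exception as exc:
            # An unexpected error is caught and recorded; the loop continues
            issues.append((site, f"check error: {exc}"))
    return issues

def fake_check(site):
    # Stand-in for a real site check: one site errors, one site fails
    if site == "BrokenSite":
        raise RuntimeError("network error")
    return site != "StaleSite"

issues = run_self_check(["GoodSite", "BrokenSite", "StaleSite"], fake_check)
```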
Use ``--auto-disable`` together with ``--self-check`` to automatically disable
sites that fail checks. Without it, issues are only reported. Use ``--diagnose``
to print detailed per-site diagnosis including the check type, specific issues,
and recommendations.
.. code-block:: console

   # Report-only mode (no changes to the database)
   maigret --self-check

   # Automatically disable failing sites and save updates
   maigret -a --self-check --auto-disable

   # Show detailed diagnosis for each failing site
   maigret -a --self-check --diagnose
Archives and mirrors checking
-----------------------------
The Maigret database contains not only the original websites, but also mirrors, archives, and aggregators. For example:
- (no longer available) `Reddit BigData search <https://camas.github.io/reddit-search/>`_
- (no longer available) `Twitter shadowban <https://shadowban.eu/>`_ checker
It allows getting additional info about the person and checking the existence of the account even if the main site is unavailable (bot protection, captcha, etc.)
.. _cloudflare-bypass:
Cloudflare webgate bypass
-------------------------
.. warning::

   **Experimental feature.** The Cloudflare webgate is under active
   development. The configuration schema, CLI flag behaviour, and the set
   of sites that route through it may change without backwards-compatibility
   guarantees. Please try it, expect occasional failures, and report issues
   so they can be ironed out.
Some sites sit behind a full Cloudflare JavaScript challenge or a CF firewall
hard block — these are tagged ``protection: ["cf_js_challenge"]`` or
``protection: ["cf_firewall"]`` in the database and are normally kept disabled
because neither aiohttp nor curl_cffi can solve the JS challenge on their own.
Maigret can offload these checks to a local Chrome-based solver. Two backends
are supported, configured in ``settings.json`` under
``cloudflare_bypass.modules`` (the first reachable module wins; subsequent
ones are tried as a fallback chain):
* **FlareSolverr** (recommended). Runs a real Chrome instance and exposes a
  JSON API. The upstream HTTP status, headers and final URL are preserved, so
  ``checkType: status_code`` and ``checkType: response_url`` keep working
  through the bypass.
  .. code-block:: console

     docker run -d -p 8191:8191 --name flaresolverr ghcr.io/flaresolverr/flaresolverr:latest
* **CloudflareBypassForScraping** (legacy fallback). Returns rendered HTML
  only, so the upstream status code is lost — ``checkType: message`` keeps
  working but ``status_code`` checks misfire (treated as 200 on success).
Activate the bypass either with the CLI flag::

   maigret --cloudflare-bypass <username>
or by setting ``cloudflare_bypass.enabled`` to ``true`` in ``settings.json``.
The bypass only fires for sites whose ``protection`` field intersects
``cloudflare_bypass.trigger_protection`` (default
``["cf_js_challenge", "cf_firewall", "webgate"]``); all other sites use the
normal aiohttp / curl_cffi path.
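The routing decision reduces to a simple set intersection (a hypothetical helper illustrating the rule above):

```python
DEFAULT_TRIGGERS = ["cf_js_challenge", "cf_firewall", "webgate"]

def routes_through_bypass(site_protection, triggers=DEFAULT_TRIGGERS):
    """A site is routed through the Cloudflare bypass only when its
    `protection` field intersects the configured trigger set."""
    return bool(set(site_protection or []) & set(triggers))

uses_bypass = routes_through_bypass(["cf_js_challenge"])
normal_path = routes_through_bypass(["tls_fingerprint"])
```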
If all configured modules are unreachable, affected sites get an UNKNOWN
status with an actionable error pointing at the first module's URL — the
fix is almost always to start the FlareSolverr container.
FlareSolverr session reuse is automatic: Maigret pins a single
``session: <session_prefix>-<pid>`` per run, so cf_clearance cookies are
shared between checks of the same domain (5–10× faster on subsequent
requests to that host).
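For reference, a single FlareSolverr request could be built like this (a sketch against FlareSolverr's v1 JSON API; the payload is POSTed to the module's URL, e.g. ``http://localhost:8191/v1``, and the session naming mirrors the per-run pinning described above):

```python
import os

def flaresolverr_payload(url, session_prefix="maigret", max_timeout_ms=60000):
    """Build a FlareSolverr `request.get` payload that pins one
    session per process, so cf_clearance cookies are reused."""
    return {
        "cmd": "request.get",
        "url": url,
        "session": f"{session_prefix}-{os.getpid()}",
        "maxTimeout": max_timeout_ms,
    }

payload = flaresolverr_payload("https://example.com/soxoj")
```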
Activation
----------
The activation mechanism helps make requests to sites requiring additional authentication like cookies, JWT tokens, or custom headers.
It works by implementing a custom function that:
1. Makes a specialized HTTP request to a specific website endpoint
2. Processes the response
3. Updates the headers/cookies for that site in the local Maigret database
Since activation only triggers after encountering specific errors, a retry (or another Maigret run) is needed to obtain a valid response with the updated authentication.
The activation mechanism is enabled by default, and cannot be disabled at the moment.
See the Development section for more details: :ref:`activation-mechanism`.
.. _extracting-information-from-pages:
Extraction of information from account pages
--------------------------------------------
Maigret can parse URLs and the content of web pages to extract info about the account owner and other meta information.
You must specify the URL with the ``--parse`` option; it can be a link to an account or to an online document. A list of supported sites is `available here <https://github.com/soxoj/socid-extractor#sites>`_.
After the end of the parsing phase, Maigret will start the search phase by :doc:`supported identifiers <supported-identifier-types>` found (usernames, ids, etc.).
.. code-block:: text

   Scanning webpage by URL https://docs.google.com/spreadsheets/d/1HtZKMLRXNsZ0HjtBmo0Gi03nUPiJIA4CC4jTYbCAnXw/edit#gid=0...
   ┣╸org_name: Gooten
   ┗╸mime_type: application/vnd.google-apps.ritz
   Scanning webpage by URL https://clients6.google.com/drive/v2beta/files/1HtZKMLRXNsZ0HjtBmo0Gi03nUPiJIA4CC4jTYbCAnXw?fields=alternateLink%2CcopyRequiresWriterPermission%2CcreatedDate%2Cdescription%2CdriveId%2CfileSize%2CiconLink%2Cid%2Clabels(starred%2C%20trashed)%2ClastViewedByMeDate%2CmodifiedDate%2Cshared%2CteamDriveId%2CuserPermission(id%2Cname%2CemailAddress%2Cdomain%2Crole%2CadditionalRoles%2CphotoLink%2Ctype%2CwithLink)%2Cpermissions(id%2Cname%2CemailAddress%2Cdomain%2Crole%2CadditionalRoles%2CphotoLink%2Ctype%2CwithLink)%2Cparents(id)%2Ccapabilities(canMoveItemWithinDrive%2CcanMoveItemOutOfDrive%2CcanMoveItemOutOfTeamDrive%2CcanAddChildren%2CcanEdit%2CcanDownload%2CcanComment%2CcanMoveChildrenWithinDrive%2CcanRename%2CcanRemoveChildren%2CcanMoveItemIntoTeamDrive)%2Ckind&supportsTeamDrives=true&enforceSingleParent=true&key=AIzaSyC1eQ1xj69IdTMeii5r7brs3R90eck-m7k...
Maigret ships with a bundled site database. After installation from PyPI (or any other method), it can **automatically fetch a newer compatible database from GitHub** when you run it—see :ref:`database-auto-update` in :doc:`settings`.
.. note::

   Python 3.10 or higher and pip are required; **Python 3.11 is recommended.**
Maigret's CLI is a thin wrapper around an async Python API. You can embed Maigret in your own tools, pipelines, and OSINT workflows — no need to shell out.
This page covers the common patterns. For the full argument list of the underlying function, see ``maigret.checking.maigret`` in the source.
Installation
------------
.. code-block:: bash

   pip install maigret
Minimal example
---------------
A working end-to-end search against the top 500 sites:
.. code-block:: python

   import asyncio
   import logging

   from maigret import search as maigret_search
   from maigret.sites import MaigretDatabase

   # Load the bundled site database
   db = MaigretDatabase().load_from_path(
       "maigret/resources/data.json"
   )

   # Pick which sites to scan (same filtering the CLI uses)
   sites = db.ranked_sites_dict(top=500)

   results = asyncio.run(
       maigret_search(
           username="soxoj",
           site_dict=sites,
           logger=logging.getLogger("maigret"),
           timeout=30,
           is_parsing_enabled=True,
       )
   )

   for site_name, result in results.items():
       if result["status"].is_found():
           print(site_name, result["url_user"])
Key points:
- ``maigret_search`` is an ``async`` function — wrap it with ``asyncio.run(...)`` or ``await`` it from inside your own event loop.
- ``is_parsing_enabled=True`` turns on ``socid_extractor`` so ``result["ids_data"]`` is populated with profile fields (bio, linked accounts, uids, etc.).
- Each entry in the returned dict has a ``"status"`` object with ``is_found()``, plus ``url_user``, ``http_status``, ``rank``, ``ids_data``, and more.
Filtering sites
---------------
``ranked_sites_dict`` accepts the same filters as the CLI:
.. code-block:: python

   # Include disabled sites (useful for maintenance / self-check)
   sites = db.ranked_sites_dict(disabled=True)
Running inside an existing event loop
-------------------------------------
If your application already runs an asyncio loop (FastAPI, aiohttp server, a Discord bot, etc.), ``await`` ``maigret_search`` directly instead of calling ``asyncio.run``:
.. code-block:: python

   async def check_username(username: str) -> dict:
       results = await maigret_search(
           username=username,
           site_dict=sites,
           logger=logger,
           timeout=30,
       )
       return {
           name: r["url_user"]
           for name, r in results.items()
           if r["status"].is_found()
       }
Routing through a proxy
-----------------------
The same proxy / Tor / I2P flags the CLI exposes are plain keyword arguments:
.. code-block:: python

   results = await maigret_search(
       username="soxoj",
       site_dict=sites,
       logger=logger,
       proxy="socks5://127.0.0.1:1080",
       tor_proxy="socks5://127.0.0.1:9050",  # used for .onion sites
       i2p_proxy="http://127.0.0.1:4444",  # used for .i2p sites
       timeout=30,
   )
Full function signature
-----------------------
.. code-block:: python

   async def maigret(
       username: str,
       site_dict: Dict[str, MaigretSite],
       logger,
       query_notify=None,
       proxy=None,
       tor_proxy=None,
       i2p_proxy=None,
       timeout=30,
       is_parsing_enabled=False,
       id_type="username",
       debug=False,
       forced=False,
       max_connections=100,
       no_progressbar=False,
       cookies=None,
       retries=0,
       check_domains=False,
   ) -> QueryResultWrapper
See :doc:`command-line-options` for a description of each option — the semantics match the CLI flags one-to-one.
On startup, Maigret tries to load configuration from the following sources, in exactly this order:
.. code-block:: console

   # relative path, based on installed package path
   resources/settings.json

   # absolute path, configuration file in home directory
   ~/.maigret/settings.json

   # relative path, based on current working directory
   settings.json
Missing any of these files is not an error.
If a later settings file contains an already-known option,
that option is overwritten. This makes it possible to keep
custom configurations for different users and directories.
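The layering behaviour amounts to a key-by-key dictionary merge, which can be sketched as follows (an illustrative helper, not Maigret's actual loader):

```python
import json
import os

def load_layered_settings(paths):
    """Merge settings files in order; later files override earlier
    ones key by key, and missing files are silently skipped."""
    merged = {}
    for path in paths:
        if os.path.exists(path):
            with open(path) as f:
                merged.update(json.load(f))
    return merged
```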
.. _database-auto-update:
Database auto-update
--------------------
Maigret ships with a bundled site database, but it gets outdated between releases. To keep the database current, Maigret automatically checks for updates on startup.
**How it works:**
1. On startup, Maigret checks if more than 24 hours have passed since the last update check.
2. If so, it fetches a lightweight metadata file (~200 bytes) from GitHub to see if a newer database is available.
3. If a newer, compatible database exists, Maigret downloads it to ``~/.maigret/data.json`` and uses it instead of the bundled copy.
4. If the download fails or the new database is incompatible with your Maigret version, the bundled database is used as a fallback.
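The first step (the interval check) is simple to sketch (a hypothetical helper; forcing corresponds to the ``--force-update`` flag):

```python
import time

def should_check_for_updates(last_check_ts, interval_hours=24, force=False):
    """Return True when the update check should run: either it is
    forced, or the configured interval has elapsed since the last check."""
    if force:
        return True
    return (time.time() - last_check_ts) >= interval_hours * 3600
```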
The downloaded database has **higher priority** than the bundled one — it replaces, not overlays.
**Status messages** are printed only when an action occurs:
.. code-block:: text

   [*] DB auto-update: checking for updates...
   [+] DB auto-update: database updated successfully (3180 sites)
   [*] DB auto-update: database is up to date (3157 sites)
   [!] DB auto-update: latest database requires maigret >= 0.6.0, you have 0.5.0
**Forcing an update:**
Use the ``--force-update`` flag to check for updates immediately, ignoring the check interval:
.. code-block:: console

   maigret username --force-update
The update happens at startup, then the search continues normally with the freshly downloaded database.
**Disabling auto-update:**
Use the ``--no-autoupdate`` flag to skip the update check entirely:
.. code-block:: console

   maigret username --no-autoupdate
Or set it permanently in ``~/.maigret/settings.json``:
.. code-block:: json

   {
       "no_autoupdate": true
   }
This is recommended for **Docker containers**, **CI pipelines**, and **air-gapped environments**.
**Configuration options** (in ``settings.json``):
.. list-table::
   :header-rows: 1
   :widths: 35 15 50

   * - Setting
     - Default
     - Description
   * - ``no_autoupdate``
     - ``false``
     - Disable auto-update entirely
   * - ``autoupdate_check_interval_hours``
     - ``24``
     - How often to check for updates (in hours)
   * - ``db_update_meta_url``
     - GitHub raw URL
     - URL of the metadata file (for custom mirrors)
**Using a custom database** with ``--db`` always skips auto-update — you are explicitly choosing your data source.
Cloudflare webgate
------------------
.. warning::

   **Experimental.** The ``cloudflare_bypass`` block is under active
   development; field names, defaults, and the trigger-protection routing
   rules may change without backwards-compatibility guarantees.
The ``cloudflare_bypass`` block in ``settings.json`` configures the optional
bypass described in :ref:`cloudflare-bypass`. Default value:
Maigret can search not only by ordinary usernames, but also by certain common identifiers. Below is the list of all currently supported identifiers.
- **gaia_id** - Google inner numeric user identifier, formerly placed in a Google Plus account URL.
- **steam_id** - Steam inner numeric user identifier.
- **wikimapia_uid** - Wikimapia.org inner numeric user identifier.
- **uidme_uguid** - uID.me inner numeric user identifier.
- **yandex_public_id** - Yandex sites inner letter user identifier. See also: `YaSeeker <https://github.com/HowToFind-bot/YaSeeker>`_.
- **vk_id** - VK.com inner numeric user identifier.
The use of tags allows you to select a subset of the sites from big Maigret DB for search.
.. warning::

   Tags markup is still not stable.
There are several types of tags:
1. **Country codes**: ``us``, ``jp``, ``br``... (`ISO 3166-1 alpha-2 <https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2>`_). A country tag means that having an account on the site implies a connection to that country — either origin or residence. The goal is attribution, not perfect accuracy.

   - **Global sites** (GitHub, YouTube, Reddit, Medium, etc.) get **no country tag** — an account there says nothing about where a person is from.
   - **Regional/local sites** where an account implies a specific country **must** have a country tag: ``VK`` → ``ru``, ``Naver`` → ``kr``, ``Zhihu`` → ``cn``.
   - Multiple country tags are allowed when a service is used predominantly in a few countries (e.g. ``Xing`` → ``de``, ``eu``).
   - Do **not** assign country tags based on traffic statistics alone — a site popular in India by traffic is not "Indian" if it is used globally.

2. **Site engines**. Most of them are forum engines now: ``uCoz``, ``vBulletin``, ``XenForo`` et al. The full list of engines is stored in the Maigret database.
3. **Sites' subject/type and the interests of their users**. The full list of "standard" tags is, for the moment, `present only in the source code <https://github.com/soxoj/maigret/blob/main/maigret/sites.py#L13>`_.
Usage
-----
``--tags us,jp`` -- search on US and Japanese sites (actually marked as such in the Maigret database)
``--tags coding`` -- search on sites related to software development.
``--tags ucoz`` -- search on uCoz sites only (mostly CIS countries)
Blacklisting (excluding) tags
------------------------------
You can exclude sites with certain tags from the search using ``--exclude-tags``:
``--exclude-tags porn,dating`` -- skip all sites tagged with ``porn`` or ``dating``.
``--exclude-tags ru`` -- skip all Russian sites.
You can combine ``--tags`` and ``--exclude-tags`` to fine-tune your search:
``--tags forum --exclude-tags ru`` -- search on forum sites, but skip Russian ones.
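The combined include/exclude semantics can be sketched as a small predicate (a hypothetical helper illustrating the filter behaviour, not Maigret's actual implementation):

```python
def matches_filters(site_tags, include=(), exclude=()):
    """A site is selected when it has none of the excluded tags and,
    if an include list is given, at least one of the included tags."""
    tags = set(site_tags)
    if exclude and tags & set(exclude):
        return False
    if include and not tags & set(include):
        return False
    return True

# `--tags forum --exclude-tags ru`: a Russian forum is skipped,
# a US forum is kept
skip_ru_forum = matches_filters(["forum", "ru"], include=["forum"], exclude=["ru"])
keep_us_forum = matches_filters(["forum", "us"], include=["forum"], exclude=["ru"])
```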
In the web interface, the tag cloud supports three states per tag:
click once to **include** (green), click again to **exclude** (dark/strikethrough),
and click once more to return to **neutral** (red).
Blockchain transactions are public, but the people behind wallets are not. Maigret helps bridge this gap by finding Web3 accounts tied to a username, revealing the person behind a pseudonymous crypto persona.
Why it matters
--------------
Crypto investigations often start with a wallet address or an ENS name but hit a wall — the blockchain tells you *what* happened, not *who* did it. A username, however, is reused across platforms. If someone trades on OpenSea as ``zachxbt`` and posts on Warpcast as ``zachxbt``, Maigret connects the dots and builds a full profile.
Common scenarios:
- **Scam attribution.** A rug-pull promoter uses the same alias on Fragment (Telegram username marketplace), OpenSea, and a personal blog.
- **Sanctions compliance.** Verifying whether a counterparty's online footprint matches known sanctioned individuals.
- **Due diligence.** Before an OTC deal or DAO vote, checking whether the other party has a consistent online presence or is a freshly created sockpuppet.
- **Stolen funds tracing.** A stolen NFT appears on OpenSea under a new account — but the username matches a Warpcast profile with real-world links.
Supported sites
---------------
Maigret currently checks the following crypto and Web3 platforms:
- Payment handle: first/last name, country code, base currency, supported payment methods
- Not strictly Web3, but widely used by crypto OTC traders for fiat off-ramps; the public API returns structured KYC-adjacent data
Real-world example: zachxbt
---------------------------
`ZachXBT <https://twitter.com/zachxbt>`_ is a well-known on-chain investigator. Let's see what Maigret can find from just the username ``zachxbt``:
.. code-block:: console

   maigret zachxbt --tags crypto
Maigret finds 5 accounts and automatically extracts structured data from each:
**Fragment** — confirms the Telegram username ``@zachxbt`` is claimed, reveals the TON wallet address (``EQBisZrk...``), purchase price (10 TON), and date (January 2023).
**Paragraph** — the richest result. Returns the real name used on the platform (``ZachXBT``), bio (``Scam survivor turned 2D investigator``), an Ethereum wallet address (``0x23dBf066...``), and a linked Twitter handle (``zachxbt``). The ``wallet_address`` field is especially valuable — it directly links the pseudonym to an on-chain identity.
**Warpcast** — Farcaster profile with a Farcaster ID (``fid: 20931``), profile image, and social graph (33K followers). Every Farcaster ID is tied to an Ethereum address via the on-chain ID registry, so this is another on-chain anchor.
**OpenSea** — NFT marketplace profile with bio (``On-chain sleuth | 10x rug pull survivor``), avatar (hosted on ``seadn.io`` with an Ethereum address in the URL path), and a link to an external investigations page.
**Hive Blog** — blockchain-based blog account created in March 2025. Low activity (1 post), but confirms the username is claimed across blockchain ecosystems.
From a single username, Maigret produces:
- **2 wallet addresses** — one TON (from Fragment), one Ethereum (from Paragraph)
- **Social graph data** — 33K Farcaster followers, blog activity timestamps
This is enough to pivot into blockchain analysis tools (Etherscan, Arkham, Nansen) using the wallet addresses, or into social media analysis using the Twitter handle.
Workflow: from username to wallet
---------------------------------
**Step 1: Search crypto platforms**
.. code-block:: console

   maigret <username> --tags crypto -v
Review the results. Pay attention to:
- **Fragment** — if the username is claimed, you get a TON wallet address directly.
- **Paragraph** — blog profiles often contain an ETH address and a Twitter handle.
- **Warpcast** — Farcaster IDs map to Ethereum addresses via the on-chain registry.
- **OpenSea** — avatar URLs sometimes contain wallet addresses in the path.
**Step 2: Expand with extracted identifiers**
Maigret automatically extracts additional identifiers from found profiles (real names, linked accounts, profile URLs) and recursively searches for them. This is enabled by default. If Maigret finds a linked Twitter handle on a Paragraph profile, it will automatically search for that handle across all sites.
**Step 3: Cross-reference with non-crypto platforms**
The real power is connecting crypto personas to mainstream accounts. Drop the tag filter:
.. code-block:: console

   maigret <username> -a
This checks all 3000+ sites. A match on GitHub, Reddit, or a forum can reveal the person behind the wallet.
Workflow: from wallet to identity
---------------------------------
If you start with a wallet address rather than a username, you can use complementary tools to get a username first:
1. **ENS / Unstoppable Domains** — resolve the wallet address to a human-readable name (``vitalik.eth``). Then search that name in Maigret.
2. **Etherscan labels** — check if the address has a public label (exchange, known entity).
3. **Fragment** — search the TON wallet address to find which Telegram usernames it purchased.
4. **Arkham Intelligence / Nansen** — blockchain attribution platforms that may tag the address with a known identity.
Once you have a username candidate, feed it to Maigret.
Tips
----
- **Username reuse is the #1 signal.** Crypto-native users often reuse their ENS name (``alice.eth``) or a variation (``alice_eth``, ``aliceeth``) across platforms. Try all variations.
- **Fragment is uniquely valuable** because it directly links Telegram usernames to TON wallet addresses — a rare on-chain / off-chain bridge.
- **Warpcast profiles are Ethereum-native.** Every Farcaster account is tied to an Ethereum address via the ID registry contract. If you find a Warpcast profile, you implicitly have a wallet address.
- **Paragraph often has the richest data** — wallet address, Twitter handle, bio, and activity timestamps in a single API response.
- **Use** ``--exclude-tags`` **to skip irrelevant sites** when you're focused on crypto:
You are an OSINT analyst that converts raw username-investigation reports into a short, clean human-readable summary.
Your task:
Read the attached account-discovery report and produce a concise report in exactly this style:
# Investigation Summary
Name: <most likely real full name>
Location: <most likely current location>
Occupation: <short combined description based only on strong signals>
Interests: <3–6 broad interests inferred from platform types, bios, and activity>
Languages: <languages supported by strong evidence only>
Website: <main personal website if clearly present>
Username: <main username> (variant: <variant usernames if any>)
Platforms: <number> profiles, active from <first year> to <last year>
Confidence: <High / Medium / Low> — <one short explanation why>
# Other leads
- <lead 1>
- <lead 2>
- <lead 3 if needed>
Rules:
1. Use only information supported by the report.
2. Resolve identity using consistency of username, full name, bio, links, company, and location.
3. Prefer strong repeated signals over one-off weak signals.
4. If one profile clearly conflicts with the rest, mention it in "Other leads" as a likely false positive instead of mixing it into the main identity.
5. Keep the tone analytical and neutral.
6. Do not mention every platform individually.
7. Do not include raw URLs except for the main website.
8. Do not mention NSFW/adult platforms in the main summary unless they are the only source for a critical lead; if such a profile looks inconsistent, mention it only as a likely false positive.
9. "Occupation" should be a compact merged description, for example: "Chief Product Officer (CPO) at ..., entrepreneur, OSINT community founder".
10. "Interests" should be broad categories, not noisy tags. Convert raw platform/tag evidence into natural categories like OSINT, software development, blogging, gaming, streaming, etc.
11. "Languages" should only include languages clearly supported by bios, texts, country tags, or profile content.
12. For "Platforms", count the profiles reported as found by the report summary, not manually deduplicated.
13. For active years, use the earliest and latest reliable dates from the consistent identity cluster. Ignore obvious outlier dates if they belong to likely false positives or weak profiles.
14. For confidence:
- High = strong consistency across username, name, bio, links, location, and/or company
- Medium = partial consistency with some gaps
- Low = mostly username-only matches
15. If some field is not reliably known, omit speculation and use the best cautious wording possible.
16. For "Name", output only the most likely real personal name in clean canonical form.
- Remove nicknames, handles, aliases, or bracketed parts such as "(Soxoj)".
summary: 🕵️‍♂️ Collect a dossier on a person by username from thousands of sites.
description: |
**Maigret** collects a dossier on a person **by username only**, checking for accounts on a huge number of sites and gathering all the available information from web pages. No API keys required.
More than 3000 sites are currently supported; by default, the search is launched against the 500 most popular sites in descending order of popularity. Checking of Tor sites, I2P sites, and domains (via DNS resolving) is also supported.