From 79e93ab715e4660429cf1f556b7697c2ec846ab4 Mon Sep 17 00:00:00 2001
From: Soxoj <31013580+soxoj@users.noreply.github.com>
Date: Tue, 5 May 2026 22:21:00 +0200
Subject: [PATCH] AI mode documentation (#2620)

---
 README.md                            | 20 ++++++++++
 README.zh-CN.md                      | 20 ++++++++++
 docs/source/command-line-options.rst | 56 ++++++++++++++++++++++++++++
 docs/source/features.rst             | 27 ++++++++++++++
 docs/source/settings.rst             | 48 ++++++++++++++++++++++++
 maigret/resources/db_meta.json       |  4 +-
 6 files changed, 173 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index fff26d2..aea33c0 100644
--- a/README.md
+++ b/README.md
@@ -69,6 +69,7 @@ See also: [Quick start](https://maigret.readthedocs.io/en/latest/quick-start.htm
 - Fetches an [auto-updated site database](https://maigret.readthedocs.io/en/latest/settings.html#database-auto-update) from GitHub each run (once per 24 hours), and falls back to the built-in database if offline.
 - Works with Tor and I2P websites; able to check domains.
 - Ships with a [web interface](#web-interface) for browsing results as a graph and downloading reports in every format from a single page.
+- Optional [AI analysis mode](#ai-analysis) (`--ai`) that turns raw findings into a short investigation summary using an OpenAI-compatible API.

 For the complete feature list, see the [features documentation](https://maigret.readthedocs.io/en/latest/features.html).

@@ -195,6 +196,9 @@ maigret user --tags us

 # search for three usernames on all available sites
 maigret user1 user2 user3 -a
+
+# AI-assisted investigation summary (needs OPENAI_API_KEY)
+maigret user --ai
 ```

 Run `maigret --help` for all options. Docs: [CLI options](https://maigret.readthedocs.io/en/latest/command-line-options.html), [more examples](https://maigret.readthedocs.io/en/latest/usage-examples.html). Running into 403s or timeouts? See [TROUBLESHOOTING.md](TROUBLESHOOTING.md).
@@ -230,6 +234,22 @@ See the full [library usage guide](https://maigret.readthedocs.io/en/latest/libr
 - `--parse URL` — parse a profile page, extract IDs/usernames, and use them to kick off a recursive search.
 - `--permute` — generate likely username variants from two or more inputs (e.g. `john doe` → `johndoe`, `j.doe`, …) and search for all of them.
 - `--self-check [--auto-disable]` — verify `usernameClaimed` / `usernameUnclaimed` pairs against live sites for maintainers auditing the database.
+- `--ai` / `--ai-model` — run the [AI analysis](#ai-analysis) over the search results and stream a short investigation summary to the terminal.
+
+
+### AI analysis
+
+`--ai` collects the search results, builds an internal Markdown report, and sends it to an OpenAI-compatible chat completion endpoint to produce a short, neutral investigation summary (likely real name, location, occupation, interests, languages, confidence, follow-up leads). Per-site progress is suppressed and the model's output is streamed to stdout.
+
+```bash
+export OPENAI_API_KEY=sk-...
+maigret user --ai
+
+# pick a different model
+maigret user --ai --ai-model gpt-4o-mini
+```
+
+The key can also be set as `openai_api_key` in `settings.json`. The endpoint defaults to `https://api.openai.com/v1`, but `openai_api_base_url` in `settings.json` can point to any OpenAI-compatible API (Azure OpenAI, OpenRouter, a local server, …). See the [settings docs](https://maigret.readthedocs.io/en/latest/settings.html) for the full list of options.
 ### Tor / I2P / proxies

diff --git a/README.zh-CN.md b/README.zh-CN.md
index fc1e0d2..f2367f0 100644
--- a/README.zh-CN.md
+++ b/README.zh-CN.md
@@ -70,6 +70,7 @@ maigret YOUR_USERNAME
 - 每次运行时(每 24 小时一次)从 GitHub 拉取一份[自动更新的站点数据库](https://maigret.readthedocs.io/en/latest/settings.html#database-auto-update);离线时会回退到内置数据库。
 - 可访问 Tor 与 I2P 站点;支持检查域名。
 - 自带一个 [Web 界面](#web-interface),可在同一页面将结果以图谱方式浏览,并下载各种格式的报告。
+- 可选的 [AI 分析模式](#ai-analysis)(`--ai`),通过 OpenAI 兼容 API 将原始搜索结果整理成一份简短的调查摘要。

 完整特性列表请见[特性文档](https://maigret.readthedocs.io/en/latest/features.html)。

@@ -199,6 +200,9 @@ maigret user --tags us

 # 同时在所有站点上搜索三个用户名
 maigret user1 user2 user3 -a
+
+# AI 辅助调查摘要(需要 OPENAI_API_KEY)
+maigret user --ai
 ```

 完整选项请运行 `maigret --help`。文档:[命令行选项](https://maigret.readthedocs.io/en/latest/command-line-options.html)、[更多示例](https://maigret.readthedocs.io/en/latest/usage-examples.html)。遇到 403 或超时?参见 [TROUBLESHOOTING.md](TROUBLESHOOTING.md)。
@@ -234,6 +238,22 @@ maigret --web 5000
 - `--parse URL` —— 解析一个个人主页,从中提取 ID/用户名,并以此为起点发起递归搜索。
 - `--permute` —— 基于两个或更多输入生成可能的用户名变体(例如 `john doe` → `johndoe`、`j.doe` …)并对其逐一搜索。
 - `--self-check [--auto-disable]` —— 维护者用于核对数据库的工具:针对线上站点验证 `usernameClaimed` / `usernameUnclaimed` 配对是否仍然有效。
+- `--ai` / `--ai-model` —— 启用 [AI 分析](#ai-analysis),将搜索结果交给 OpenAI 兼容 API,并把简短的调查摘要流式输出到终端。
+
+
+### AI 分析
+
+`--ai` 会先收集搜索结果、在内存中构建 Markdown 报告,再将其发送到一个 OpenAI 兼容的 chat completion 接口,生成一份简短、克制的调查摘要(最可能的真实姓名、所在地、职业、兴趣、语言、置信度以及后续线索)。开启该模式后,逐站点的进度输出会被静默,模型的输出会以流式方式打印到 stdout。
+
+```bash
+export OPENAI_API_KEY=sk-...
+maigret user --ai
+
+# 切换到其它模型
+maigret user --ai --ai-model gpt-4o-mini
+```
+
+API key 也可以写入 `settings.json` 的 `openai_api_key` 字段。接口地址默认为 `https://api.openai.com/v1`,通过在 `settings.json` 中设置 `openai_api_base_url`,可以指向任何 OpenAI 兼容的服务(Azure OpenAI、OpenRouter、本地推理服务等)。完整选项见[配置文档](https://maigret.readthedocs.io/en/latest/settings.html)。

 ### Tor / I2P / 代理

diff --git a/docs/source/command-line-options.rst b/docs/source/command-line-options.rst
index b111d91..ef8a9b3 100644
--- a/docs/source/command-line-options.rst
+++ b/docs/source/command-line-options.rst
@@ -161,6 +161,14 @@ ndjson (one report per username). E.g. ``--json ndjson``
 ``-M``, ``--md`` - Generate a Markdown report (general report on all usernames). See :ref:`markdown-report` below.

+``--ai`` - Run an AI-powered analysis of the search results using an
+OpenAI-compatible chat completion API. The internal Markdown report is
+sent to the model, which returns a short investigation summary that is
+streamed to the terminal. See :ref:`ai-analysis` below.
+
+``--ai-model`` - Model name to use with ``--ai``. Defaults to
+``openai_model`` from settings (``gpt-4o`` out of the box).
+
 ``-fo``, ``--folderoutput`` - Results will be saved to this folder, ``results`` by default. Will be created if doesn’t exist.

@@ -242,3 +250,51 @@ The Markdown format is optimized for LLM context windows. You can feed the repor
 The structured Markdown with per-site sections makes it easy for AI tools to extract relationships, cross-reference identities, and identify patterns across accounts.

+For a built-in alternative that calls the model for you and prints the
+summary directly, see :ref:`ai-analysis` below.
+
+.. _ai-analysis:
+
+AI analysis (built-in)
+----------------------
+
+The ``--ai`` flag turns the search results into a short investigation
+summary by sending the internal Markdown report to an OpenAI-compatible
+chat completion API and streaming the model's reply to the terminal.
+
+.. code-block:: console
+
+   export OPENAI_API_KEY=sk-...
+   maigret username --ai
+
+   # use a smaller / cheaper model
+   maigret username --ai --ai-model gpt-4o-mini
+
+While ``--ai`` is active, per-site progress lines and the short text
+report at the end are suppressed so the streamed summary is the main
+output. The Markdown report itself is built in memory and is **not**
+written to disk by ``--ai`` alone — combine with ``--md`` if you also
+want the file on disk.
+
+The summary follows a fixed format with sections for the most likely
+real name, location, occupation, interests, languages, main website,
+username variants, number of platforms, active years, a confidence
+rating, and a short list of follow-up leads. The model is instructed
+to rely only on what is supported by the report and to avoid mixing
+clearly unrelated profiles into the main identity.
+
+**Configuration.** The API key is resolved from
+``settings.openai_api_key`` first, then from the ``OPENAI_API_KEY``
+environment variable. The endpoint defaults to
+``https://api.openai.com/v1`` and can be redirected to any
+OpenAI-compatible service (Azure OpenAI, OpenRouter, a local server,
+…) by setting ``openai_api_base_url`` in ``settings.json``. See
+:ref:`settings` for the full list of options.
+
+.. note::
+
+   ``--ai`` makes a network request to the configured chat completion
+   endpoint and sends the full Markdown report (which contains the
+   gathered profile data). Use it only with providers and accounts
+   you trust with that data.
+
diff --git a/docs/source/features.rst b/docs/source/features.rst
index 00e3c45..2fa7387 100644
--- a/docs/source/features.rst
+++ b/docs/source/features.rst
@@ -147,6 +147,33 @@ Also, there is a short text report in the CLI output after the end of a searchin
 .. warning:: XMind 8 mindmaps are incompatible with XMind 2022!
+AI analysis
+-----------
+
+Maigret can produce a short, human-readable investigation summary on top
+of the raw search results using the ``--ai`` flag. It builds the
+internal Markdown report, sends it to an OpenAI-compatible chat
+completion endpoint, and streams the model's reply directly to the
+terminal.
+
+.. code-block:: console
+
+   export OPENAI_API_KEY=sk-...
+   maigret username --ai
+
+The summary uses a fixed format with the most likely real name,
+location, occupation, interests, languages, main website, username
+variants, number of platforms, active years, a confidence rating, and a
+short list of follow-up leads. While ``--ai`` is active, per-site
+progress and the short text report are suppressed so the streamed
+summary is the main output.
+
+The endpoint, model, and API key are configured via ``settings.json``
+(``openai_api_key``, ``openai_model``, ``openai_api_base_url``) or the
+``OPENAI_API_KEY`` environment variable. Any OpenAI-compatible API can
+be used (Azure OpenAI, OpenRouter, a local server, …). See
+:ref:`ai-analysis` and :ref:`settings` for details.
+
 Tags
 ----

diff --git a/docs/source/settings.rst b/docs/source/settings.rst
index dc36000..09449f0 100644
--- a/docs/source/settings.rst
+++ b/docs/source/settings.rst
@@ -101,3 +101,51 @@ This is recommended for **Docker containers**, **CI pipelines**, and **air-gappe
 - URL of the metadata file (for custom mirrors)

 **Using a custom database** with ``--db`` always skips auto-update — you are explicitly choosing your data source.
+
+.. _ai-analysis-settings:
+
+AI analysis
+-----------
+
+The ``--ai`` flag (see :ref:`ai-analysis`) talks to an OpenAI-compatible
+chat completion API. Three settings control how that request is made:
+
+.. list-table::
+   :header-rows: 1
+   :widths: 35 25 40
+
+   * - Setting
+     - Default
+     - Description
+   * - ``openai_api_key``
+     - ``""`` (empty)
+     - API key. If empty, Maigret falls back to the ``OPENAI_API_KEY``
+       environment variable.
+   * - ``openai_model``
+     - ``gpt-4o``
+     - Default model name. Overridable per-run with ``--ai-model``.
+   * - ``openai_api_base_url``
+     - ``https://api.openai.com/v1``
+     - Base URL of the chat completion API. Point this at any
+       OpenAI-compatible service (Azure OpenAI, OpenRouter, a local
+       server, …) to use it instead of OpenAI directly.
+
+Example ``~/.maigret/settings.json`` snippet using a non-OpenAI
+endpoint:
+
+.. code-block:: json
+
+   {
+     "openai_api_key": "sk-...",
+     "openai_model": "gpt-4o-mini",
+     "openai_api_base_url": "https://openrouter.ai/api/v1"
+   }
+
+The key resolution order is ``settings.openai_api_key`` → ``OPENAI_API_KEY``
+environment variable; the first non-empty value wins.
+
+.. note::
+
+   ``--ai`` sends the full internal Markdown report (which contains the
+   gathered profile data) to the configured endpoint. Only use providers
+   and accounts you trust with that data.
diff --git a/maigret/resources/db_meta.json b/maigret/resources/db_meta.json
index d563902..e372d27 100644
--- a/maigret/resources/db_meta.json
+++ b/maigret/resources/db_meta.json
@@ -1,7 +1,7 @@
 {
   "version": 1,
-  "updated_at": "2026-05-05T17:17:59Z",
-  "sites_count": 3155,
+  "updated_at": "2026-05-05T20:17:24Z",
+  "sites_count": 3154,
   "min_maigret_version": "0.6.0",
   "data_sha256": "acf9d9fef8412bf05fa09d50c1ae363e5c8394597b1aaa3f98a9a1c4e31ca356",
   "data_url": "https://raw.githubusercontent.com/soxoj/maigret/main/maigret/resources/data.json"
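
Reviewer note: the configuration rules documented in this patch (API key taken from settings first, then from `OPENAI_API_KEY`; model overridable with `--ai-model`; base URL defaulting to `https://api.openai.com/v1`) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not Maigret's actual code: the function name `resolve_ai_config`, the `DEFAULTS` dict, and the shape of the returned dict are hypothetical. Only the setting names, their defaults, and the precedence come from the documentation text in the patch.

```python
import os
from typing import Mapping, Optional

# Defaults as listed in the settings table of this patch.
DEFAULTS = {
    "openai_api_key": "",
    "openai_model": "gpt-4o",
    "openai_api_base_url": "https://api.openai.com/v1",
}


def resolve_ai_config(
    settings: Mapping[str, str],
    cli_model: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> dict:
    """Hypothetical helper: merge settings.json values, CLI flag, and env."""
    env = os.environ if environ is None else environ
    merged = {**DEFAULTS, **settings}

    # Key resolution order: settings value first, then the environment
    # variable; the first non-empty value wins.
    api_key = merged["openai_api_key"] or env.get("OPENAI_API_KEY", "")

    # --ai-model overrides openai_model for a single run.
    model = cli_model or merged["openai_model"]

    return {
        "api_key": api_key,
        "model": model,
        "base_url": merged["openai_api_base_url"],
    }


if __name__ == "__main__":
    # Example: OpenRouter endpoint from settings, key from the environment,
    # model chosen on the command line.
    print(
        resolve_ai_config(
            {"openai_api_base_url": "https://openrouter.ai/api/v1"},
            cli_model="gpt-4o-mini",
            environ={"OPENAI_API_KEY": "sk-from-env"},
        )
    )
```

Passing `environ` explicitly keeps the sketch testable without touching the real process environment; a real implementation may read `os.environ` directly.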