Make xhtml2pdf optional, fix install on Linux without libcairo

Move xhtml2pdf to the new [pdf] extra so that a default `pip install maigret`
no longer pulls in pycairo (which has no Linux/macOS wheels and breaks the
build without libcairo2-dev). save_pdf_report now raises a clear
RuntimeError pointing to `pip install 'maigret[pdf]'`, and the CLI
turns it into a friendly warning instead of a crash. Adds tests
covering the missing-extra path, plus per-OS install docs.

Fix for #2657, #2534
Soxoj
2026-05-15 12:17:10 +02:00
parent 1e99b6a07c
commit ffb4c1856c
8 changed files with 236 additions and 8 deletions
+2
@@ -173,6 +173,8 @@ docker build --target web -t maigret-web . # Web UI image
Build errors? See the [troubleshooting guide](https://maigret.readthedocs.io/en/latest/installation.html#troubleshooting).
PDF reports (`--pdf`) are an optional extra — install with `pip install 'maigret[pdf]'`. They need system-level graphics libraries on Linux/macOS; see the [PDF reports section](https://maigret.readthedocs.io/en/latest/installation.html#optional-pdf-reports-maigret-pdf) for per-OS install steps.
## Usage
### Examples
+137
@@ -58,6 +58,17 @@ Maigret ships with a bundled site database. After installation from PyPI (or any
    # usage
    maigret username

PDF report support is shipped as an **optional extra** because it relies on
system-level graphics libraries that pip cannot install for you. If you plan to
use ``--pdf``, install Maigret with the ``pdf`` extra:

.. code-block:: bash

    pip3 install 'maigret[pdf]'

See :ref:`pdf-extra` below for the full background on why PDF support is
optional and how to fix the most common build errors.

Development version (GitHub)
----------------------------
@@ -126,6 +137,132 @@ After installing the system dependencies, retry the maigret installation.
If you continue to have issues, consider using Docker instead, which includes all
necessary dependencies.

.. _pdf-extra:

Optional: PDF reports (``maigret[pdf]``)
----------------------------------------

The ``--pdf`` report format is shipped as an optional extra. To enable it:

.. code-block:: bash

    pip3 install 'maigret[pdf]'

If PDF support is not installed and you pass ``--pdf``, Maigret prints a
warning and continues without crashing; every other output format
(``--html``, ``--json``, ``--csv``, ``--txt``, ``--xmind``, ``--graph``)
keeps working.
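
A wrapper script could check up front whether the extra is importable before requesting a PDF. The following is a minimal sketch, assuming only that the ``pdf`` extra provides the ``xhtml2pdf`` package as described here; the helper name ``pdf_support_available`` is hypothetical and not part of Maigret.

```python
import importlib.util


def pdf_support_available(module_name: str = "xhtml2pdf") -> bool:
    """Return True if the named top-level module can be located.

    Hypothetical helper, not part of Maigret. It only looks the module up;
    it does not import it, so it stays cheap even when the module is heavy.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for broken parents or malformed sys.modules entries.
        return False
```

Calling ``pdf_support_available()`` before passing ``--pdf`` lets a script fall back to ``--html`` on its own instead of relying on the warning.
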

Why is PDF optional?
~~~~~~~~~~~~~~~~~~~~

Maigret renders PDFs by converting an HTML template, and that conversion
pipeline ultimately depends on the ``cairo`` graphics library through a
chain of Python packages roughly shaped like::

    maigret[pdf] → xhtml2pdf → svglib → rlPyCairo → pycairo → libcairo2 (system)

The bottom of that chain is a C library, ``libcairo2``, that has to exist
on the host *before* pip can build the Python bindings. The Python binding
package (``pycairo``) currently ships **only Windows wheels** on PyPI; on
Linux and macOS pip falls back to building from source, and the build fails
the moment ``pkg-config`` cannot find ``cairo``. The error looks like::

    ../cairo/meson.build:31:12: ERROR: Dependency "cairo" not found (tried pkg-config)
    note: This error originates from a subprocess, and is likely not a problem with pip.
    error: metadata-generation-failed

Pulling this whole chain for every Maigret install just so the much smaller
group of users who actually want PDFs can have them is a poor trade, so
``xhtml2pdf`` is gated behind the ``pdf`` extra.

Installing the system prerequisites
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Install the cairo headers, ``pkg-config``, and a working C toolchain
*before* running ``pip install 'maigret[pdf]'``.

**Debian / Ubuntu / Linux Mint / Kali:**

.. code-block:: bash

    sudo apt update
    sudo apt install -y libcairo2-dev pkg-config python3-dev build-essential
    pip3 install --upgrade pip setuptools wheel
    pip3 install 'maigret[pdf]'

**Fedora / RHEL / CentOS:**

.. code-block:: bash

    sudo dnf install -y cairo-devel pkgconfig python3-devel gcc
    pip3 install 'maigret[pdf]'

**Arch Linux:**

.. code-block:: bash

    sudo pacman -S cairo pkgconf base-devel
    pip3 install 'maigret[pdf]'

**Alpine Linux:**

.. code-block:: bash

    sudo apk add cairo-dev pkgconf python3-dev build-base
    pip3 install 'maigret[pdf]'

**macOS (Homebrew):**

.. code-block:: bash

    brew install cairo pkg-config
    pip3 install --upgrade pip setuptools wheel
    pip3 install 'maigret[pdf]'
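
**Windows:**

No system packages are needed: ``pycairo`` ships prebuilt wheels for
Windows. Just run the following (double quotes are used because ``cmd.exe``
passes single quotes through literally; they also work in PowerShell):

.. code-block:: bash

    pip install "maigret[pdf]"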

**Google Cloud Shell / Colab / Replit / generic CI:**

These environments behave like Debian/Ubuntu: install the same
``libcairo2-dev pkg-config python3-dev build-essential`` packages before
running ``pip install 'maigret[pdf]'``. If you do not control the base image and
cannot ``apt install``, skip the extra and use ``--html`` reports instead;
HTML reports contain the same data and open in any browser.

``maigret: command not found`` after install
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If pip prints warnings like::

    WARNING: The scripts maigret and update_sitesmd are installed in
    '/home/<user>/.local/bin' which is not on PATH.

…and ``maigret --version`` then fails with ``command not found``, your
``--user`` install put the entry-point script in a directory the shell does
not search. Add it to ``PATH``:

.. code-block:: bash

    echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
    source ~/.bashrc

Or install into a virtual environment, where the entry point lands in the
venv's ``bin/`` automatically:

.. code-block:: bash

    python3 -m venv ~/.venvs/maigret
    source ~/.venvs/maigret/bin/activate
    pip install 'maigret[pdf]'   # or just `pip install maigret`
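
To confirm from Python whether the shell would find the entry point, the standard library's ``shutil.which`` performs the same ``PATH`` lookup. This is a small sketch; the helper name ``command_on_path`` is hypothetical and only wraps that call.

```python
import shutil


def command_on_path(command: str = "maigret") -> bool:
    """Return True if `command` resolves to an executable on the current PATH.

    Hypothetical helper for illustration; shutil.which returns the full path
    of the executable, or None when no PATH directory contains it.
    """
    return shutil.which(command) is not None
```

If ``command_on_path()`` returns ``False`` right after a ``--user`` install, the directory named in pip's warning is the one missing from ``PATH``.
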
Optional: Cloudflare bypass solver
----------------------------------
+7 -1
@@ -908,8 +908,14 @@ async def main():
     if args.pdf:
         username = username.replace('/', '_')
         filename = report_filepath_tpl.format(username=username, postfix='.pdf')
+        try:
             save_pdf_report(filename, report_context)
-        query_notify.warning(f'PDF report on all usernames saved in {filename}')
+        except RuntimeError as e:
+            query_notify.warning(str(e))
+        else:
+            query_notify.warning(
+                f'PDF report on all usernames saved in {filename}'
+            )
 
     if args.md:
         username = username.replace('/', '_')
+13 -3
@@ -78,13 +78,23 @@ def save_html_report(filename: str, context: dict):
         f.write(filled_template)
 
 
+PDF_EXTRA_HINT = (
+    "PDF reports require the optional 'pdf' extra. "
+    "Install it with: pip install 'maigret[pdf]'"
+)
+
+
 def save_pdf_report(filename: str, context: dict):
+    # Imported lazily so that users without the optional 'pdf' extra
+    # can still import maigret.report and use other report formats.
+    try:
+        from xhtml2pdf import pisa  # type: ignore[import-untyped]
+    except ImportError as e:
+        raise RuntimeError(PDF_EXTRA_HINT) from e
+
     template, css = generate_report_template(is_pdf=True)
     filled_template = template.render(**context)
 
-    # moved here to speed up the launch of Maigret
-    from xhtml2pdf import pisa  # type: ignore[import-untyped]
-
     with open(filename, "w+b") as f:
         pisa.pisaDocument(io.StringIO(filled_template), dest=f, default_css=css)
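
The two hunks above combine a lazy import that raises a hinting `RuntimeError` with a caller that downgrades it to a warning. The pattern can be sketched in isolation as follows. This is a generic illustration rather than Maigret's code: `load_optional` and `run_feature` are hypothetical names, and the module to import is a parameter so the sketch runs without `xhtml2pdf` installed.

```python
import importlib

# Same wording as the hint introduced in this commit.
PDF_EXTRA_HINT = (
    "PDF reports require the optional 'pdf' extra. "
    "Install it with: pip install 'maigret[pdf]'"
)


def load_optional(module_name: str, hint: str):
    """Import a module on demand; turn a missing module into a hinting RuntimeError."""
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        # Chain the original error so the real cause stays visible in tracebacks.
        raise RuntimeError(hint) from e


def run_feature(module_name: str) -> str:
    """Caller side: report the hint as a warning string instead of crashing."""
    try:
        load_optional(module_name, PDF_EXTRA_HINT)
    except RuntimeError as e:
        return f"warning: {e}"
    return "ok"
```

Keeping the import inside the function means the module that defines it can always be imported, and only the code path that needs the optional dependency can fail.
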
+1 -1
@@ -1,6 +1,6 @@
 {
     "version": 1,
-    "updated_at": "2026-05-11T17:38:18Z",
+    "updated_at": "2026-05-15T10:17:13Z",
     "sites_count": 3154,
     "min_maigret_version": "0.6.0",
     "data_sha256": "1787a341c90d91a56507ae704c8471743709b56d85d6c3dfa8c56189dccbc6dd",
+9 -1
@@ -69,7 +69,7 @@ torrequest = "^0.1.0"
 alive_progress = "^3.2.0"
 typing-extensions = "^4.14.1"
 webencodings = "^0.5.1"
-xhtml2pdf = "^0.2.11"
+xhtml2pdf = {version = "^0.2.11", optional = true}
 XMind = "^1.2.0"
 yarl = "^1.20.1"
 networkx = "^2.6.3"
@@ -82,6 +82,13 @@ platformdirs = "^4.3.8"
 curl-cffi = ">=0.14,<1.0"
 
+[tool.poetry.extras]
+# Install PDF support with: pip install 'maigret[pdf]'
+# Skipped by default because the underlying `pycairo` has no Linux/macOS
+# wheels on PyPI and requires system libcairo + pkg-config to build.
+pdf = ["xhtml2pdf"]
+
 [tool.poetry.group.dev.dependencies]
 # How to add a new dev dependency: poetry add black --group dev
 # Install dev dependencies with: poetry install --with dev
@@ -92,6 +99,7 @@ pytest-cov = ">=6,<8"
 pytest-httpserver = "^1.0.0"
 pytest-rerunfailures = ">=15.1,<17.0"
 reportlab = "^4.4.3"
+xhtml2pdf = "^0.2.11"
 mypy = ">=1.14.1,<3.0.0"
 tuna = "^0.5.11"
 coverage = "^7.9.2"
+1 -1
@@ -3158,7 +3158,7 @@ Rank data fetched from Majestic Million by domains.
 1. ![](https://www.google.com/s2/favicons?domain=https://app.airnfts.com) [AirNFTs (https://app.airnfts.com)](https://app.airnfts.com)*: top 100M, crypto, nft*
 1. ![](https://www.google.com/s2/favicons?domain=https://greasyfork.org) [GreasyFork (https://greasyfork.org)](https://greasyfork.org)*: top 100M, coding*
 
-The list was updated at (2026-05-11)
+The list was updated at (2026-05-15)
 ## Statistics
 
 Enabled/total sites: 2524/3154 = 80.03%
+65
@@ -3,6 +3,9 @@
 import copy
 import json
 import os
+import subprocess
+import sys
+import textwrap
 
 import pytest
 from io import StringIO
@@ -442,6 +445,68 @@ def test_pdf_report():
     assert os.path.exists(report_name)
 
 
+def test_save_pdf_report_raises_helpful_error_without_xhtml2pdf(
+    monkeypatch, tmp_path
+):
+    # Setting an entry to None makes a subsequent `import` raise ImportError;
+    # this simulates the optional 'pdf' extra not being installed without
+    # actually uninstalling xhtml2pdf from the test environment.
+    monkeypatch.setitem(sys.modules, 'xhtml2pdf', None)
+    monkeypatch.setitem(sys.modules, 'xhtml2pdf.pisa', None)
+
+    context = generate_report_context(TEST)
+    target = tmp_path / "report.pdf"
+
+    with pytest.raises(RuntimeError) as excinfo:
+        save_pdf_report(str(target), context)
+
+    msg = str(excinfo.value)
+    assert "maigret[pdf]" in msg
+    assert "pip install" in msg
+    assert not target.exists()
+
+
+def test_xhtml2pdf_is_not_module_level_dependency():
+    # Guard against a regression where someone hoists `import xhtml2pdf` /
+    # `from xhtml2pdf import pisa` to the top of maigret/report.py; that
+    # would force every Maigret user to install the optional extra.
+    import maigret.report as report_module
+
+    module_globals = vars(report_module)
+    assert 'xhtml2pdf' not in module_globals
+    assert 'pisa' not in module_globals
+
+
+def test_import_maigret_without_xhtml2pdf():
+    # End-to-end check: spawn a fresh interpreter where xhtml2pdf is blocked
+    # before any maigret module is loaded, and confirm the package, the
+    # report module, and save_pdf_report itself all import cleanly. Mirrors
+    # what a user without the [pdf] extra installed would experience.
+    code = textwrap.dedent(
+        """
+        import sys
+        sys.modules['xhtml2pdf'] = None
+        sys.modules['xhtml2pdf.pisa'] = None
+        import maigret
+        import maigret.report
+        from maigret.report import save_pdf_report
+        assert callable(save_pdf_report)
+        print("OK")
+        """
+    )
+    result = subprocess.run(
+        [sys.executable, "-c", code],
+        capture_output=True,
+        text=True,
+    )
+    assert result.returncode == 0, (
+        f"stdout={result.stdout!r} stderr={result.stderr!r}"
+    )
+    assert "OK" in result.stdout
+
+
 def test_text_report():
     context = generate_report_context(TEST)
     report_text = get_plaintext_report(context)
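
The new tests rely on a documented property of Python's import system: when `sys.modules[name]` is `None`, importing `name` raises `ModuleNotFoundError`, which is a subclass of `ImportError`. The sketch below shows that behavior on its own, using a standard-library module so it runs anywhere; the helper name `import_is_blocked` is hypothetical.

```python
import importlib
import sys

_ABSENT = object()  # sentinel: the name had no sys.modules entry at all


def import_is_blocked(name: str) -> bool:
    """Temporarily set sys.modules[name] to None and report whether import fails."""
    previous = sys.modules.get(name, _ABSENT)
    sys.modules[name] = None
    try:
        importlib.import_module(name)
    except ImportError:
        # ModuleNotFoundError is an ImportError subclass, so it is caught here.
        return True
    finally:
        # Restore the original state so later imports behave normally.
        if previous is _ABSENT:
            del sys.modules[name]
        else:
            sys.modules[name] = previous
    return False
```

In the tests, `monkeypatch.setitem` plays the role of the `finally` block here by restoring `sys.modules` once the test ends.
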