mirror of
https://github.com/soxoj/maigret.git
synced 2026-05-06 22:19:01 +00:00
Fix crash on -a --self-check by adding exception handling to site check coroutines (#2466)
* Initial plan

* Fix crash on -a --self-check by adding exception handling in site_self_check and self_check

Wrap the body of site_self_check in try/except to catch unexpected errors
and always return a valid changes dict. Also add a safety-net try/except in
self_check around awaiting individual site check futures so that a single
site failure doesn't crash the entire self-check process.

Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/5e27d620-5cbb-43d2-a9f9-ecb53a29904d

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>

* Restore @pytest.mark.slow on test_maigret_results

Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/5e27d620-5cbb-43d2-a9f9-ecb53a29904d

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>

* Document --self-check error resilience, --auto-disable, and --diagnose in docs/

Update command-line-options.rst with expanded --self-check description and
new --auto-disable and --diagnose entries. Add a "Database self-check"
section to features.rst explaining error-resilient behaviour and usage
examples. Update usage-examples.rst to reference --auto-disable.

Agent-Logs-Url: https://github.com/soxoj/maigret/sessions/af1f0f09-9112-4902-8475-e81d235ff3ed

Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
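The two layers of protection described in the commit message can be illustrated with a minimal, self-contained sketch. It is not Maigret's code: `check_site`, `raising_check`, and `run_checks` are hypothetical stand-ins, and only the shape of the result dict (`disabled`, `issues`, `recommendations`, `site_name`) and the "Unexpected error" message follow the diff. The inner `try/except` keeps a per-site check from raising, and the outer one guards the `await` in case something escapes anyway.

```python
import asyncio
import logging

logger = logging.getLogger("self_check_sketch")


async def check_site(name: str, auto_disable: bool = False) -> dict:
    """Hypothetical per-site check that always returns a changes dict."""
    changes = {"disabled": False, "issues": [], "recommendations": []}
    try:
        if name == "broken":
            # Simulate an unexpected failure inside the check body.
            raise RuntimeError("test crash")
    except Exception as e:
        # Inner guard: record the failure instead of propagating it.
        logger.warning("Self-check of %s failed with unexpected error: %s", name, e)
        changes["issues"].append(f"Unexpected error: {e}")
        if auto_disable:
            changes["disabled"] = True
    return changes


async def raising_check(name: str) -> dict:
    """Hypothetical check with no inner guard, to exercise the outer safety net."""
    raise ValueError(f"unguarded failure in {name}")


async def run_checks(names, check=check_site) -> list:
    """Await every scheduled check; one failing task must not stop the loop."""
    tasks = [(name, asyncio.ensure_future(check(name))) for name in names]
    results = []
    for name, fut in tasks:
        try:
            result = await fut
        except Exception as e:
            # Outer guard: substitute a well-formed result for the failed task.
            logger.warning("Self-check task for %s raised unexpected error: %s", name, e)
            result = {
                "disabled": False,
                "issues": [f"Unexpected error: {e}"],
                "recommendations": [],
            }
        result["site_name"] = name
        results.append(result)
    return results


results = asyncio.run(run_checks(["ok", "broken"]))
```

Passing `check=raising_check` to `run_checks` exercises the outer guard alone: every name still yields one result entry, each carrying an "Unexpected error" issue.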
@@ -133,12 +133,25 @@ Other operations modes
 
 ``--version`` - Display version information and dependencies.
 
-``--self-check`` - Do self-checking for sites and database and disable
-non-working ones **for current search session** by default. It’s useful
-for testing new internet connection (it depends on provider/hosting on
-which sites there will be censorship stub or captcha display). After
-checking Maigret asks if you want to save updates, answering y/Y will
-rewrite the local database.
+``--self-check`` - Do self-checking for sites and database. Each site is
+tested by looking up its known-claimed and known-unclaimed usernames and
+verifying that the results match expectations. Individual site failures
+(network errors, unexpected exceptions, etc.) are caught and logged
+without stopping the overall process, so the check always runs to
+completion. After checking, Maigret reports a summary of issues found.
+If any sites were disabled (see ``--auto-disable``), Maigret asks if you
+want to save updates; answering y/Y will rewrite the local database.
+
+``--auto-disable`` - Used with ``--self-check``: automatically disable
+sites that fail checks (incorrect detection of claimed/unclaimed
+usernames, connection errors, or unexpected exceptions). Without this
+flag, ``--self-check`` only **reports** issues without modifying the
+database.
+
+``--diagnose`` - Used with ``--self-check``: print detailed diagnosis
+information for each failing site, including the check type, the list
+of issues found, and recommendations (e.g. suggesting a different
+``checkType``).
 
 ``--submit URL`` - Do an automatic analysis of the given account URL or
 site main page URL to determine the site engine and methods to check
@@ -170,6 +170,35 @@ Maigret will do retries of the requests with temporary errors got (connection fa
 
 One attempt by default, can be changed with option ``--retries N``.
 
+Database self-check
+-------------------
+
+Maigret includes a self-check mode (``--self-check``) that validates every site
+in the database by looking up its known-claimed and known-unclaimed usernames
+and verifying that the detection results match expectations.
+
+The self-check is **error-resilient**: if an individual site check raises an
+unexpected exception (e.g. a network error or a parsing failure), the error is
+caught, logged, and recorded as an issue — the remaining sites continue to be
+checked without interruption. This means the process always runs to completion,
+even when checking hundreds of sites with ``-a --self-check``.
+
+Use ``--auto-disable`` together with ``--self-check`` to automatically disable
+sites that fail checks. Without it, issues are only reported. Use ``--diagnose``
+to print detailed per-site diagnosis including the check type, specific issues,
+and recommendations.
+
+.. code-block:: console
+
+  # Report-only mode (no changes to the database)
+  maigret --self-check
+
+  # Automatically disable failing sites and save updates
+  maigret -a --self-check --auto-disable
+
+  # Show detailed diagnosis for each failing site
+  maigret -a --self-check --diagnose
+
 Archives and mirrors checking
 -----------------------------
 
@@ -33,7 +33,7 @@ Use Cases
 If you experience many false positives, you can do the following:
 
 - Install the last development version of Maigret from GitHub
-- Run Maigret with ``--self-check`` flag and agree on disabling of problematic sites
+- Run Maigret with ``--self-check --auto-disable`` flag and agree on disabling of problematic sites
 
 3. Search for accounts with username ``machine42`` and generate HTML and PDF reports.
 
@@ -967,6 +967,7 @@ async def site_self_check(
         "recommendations": [],
     }
 
+    try:
         check_data = [
             (site.username_claimed, MaigretCheckStatus.CLAIMED),
             (site.username_unclaimed, MaigretCheckStatus.AVAILABLE),
@@ -1091,6 +1092,19 @@ async def site_self_check(
             site.tags.remove("unchecked")
             db.update_site(site)
 
+    except Exception as e:
+        logger.warning(
+            f"Self-check of {site.name} failed with unexpected error: {e}",
+            exc_info=True,
+        )
+        changes["issues"].append(f"Unexpected error: {e}")
+        if auto_disable and not site.disabled:
+            changes["disabled"] = True
+            site.disabled = True
+            db.update_site(site)
+            if not silent:
+                print(f"Disabled site {site.name} (unexpected error)...")
+
     return changes
 
 
@@ -1142,7 +1156,18 @@ async def self_check(
     if tasks:
         with alive_bar(len(tasks), title='Self-checking', force_tty=True, disable=no_progressbar) as progress:
             for site_name, f in tasks:
+                try:
                     result = await f
+                except Exception as e:
+                    logger.warning(
+                        f"Self-check task for {site_name} raised unexpected error: {e}",
+                        exc_info=True,
+                    )
+                    result = {
+                        "disabled": False,
+                        "issues": [f"Unexpected error: {e}"],
+                        "recommendations": [],
+                    }
                 result['site_name'] = site_name
                 all_results.append(result)
                 progress()  # Update the progress bar
+37
-1
@@ -12,7 +12,8 @@ from maigret.maigret import (
     extract_ids_from_page,
     extract_ids_from_results,
 )
-from maigret.sites import MaigretSite
+from maigret.checking import site_self_check
+from maigret.sites import MaigretSite, MaigretDatabase
 from maigret.result import MaigretCheckResult, MaigretCheckStatus
 from tests.conftest import RESULTS_EXAMPLE
 
@@ -83,6 +84,41 @@ async def test_self_check_progressbar_enabled_by_default(test_db):
     assert kwargs.get('disable') is False
 
 
+@pytest.mark.asyncio
+async def test_site_self_check_handles_exception(test_db):
+    """Verify that site_self_check catches unexpected exceptions and returns a valid result."""
+    logger = Mock()
+    sem = asyncio.Semaphore(1)
+    site = test_db.sites_dict['ValidActive']
+
+    with patch('maigret.checking.maigret', side_effect=RuntimeError("test crash")):
+        result = await site_self_check(site, logger, sem, test_db)
+
+    assert isinstance(result, dict)
+    assert "issues" in result
+    assert len(result["issues"]) > 0
+    assert any("Unexpected error" in issue for issue in result["issues"])
+
+
+@pytest.mark.asyncio
+async def test_self_check_handles_task_exception(test_db):
+    """Verify that self_check continues when individual site checks raise exceptions."""
+    logger = Mock()
+
+    with patch('maigret.checking.maigret', side_effect=RuntimeError("test crash")):
+        result = await self_check(
+            test_db, test_db.sites_dict, logger, silent=True,
+            no_progressbar=True,
+        )
+
+    assert isinstance(result, dict)
+    assert 'results' in result
+    assert len(result['results']) == len(test_db.sites_dict)
+    for r in result['results']:
+        assert 'site_name' in r
+        assert 'issues' in r
+
+
 @pytest.mark.slow
 @pytest.mark.skip(reason="broken, fixme")
 def test_maigret_results(test_db):