mirror of https://github.com/soxoj/maigret.git
synced 2026-05-17 11:55:36 +00:00

Compare commits (2 commits)

| Author | SHA1 | Date |
|---|---|---|
|  | ceed9aa9cc |  |
|  | 51a5169987 |  |
```diff
@@ -95,13 +95,6 @@ Each site entry uses one of three `checkType` modes to decide whether a profile
 
 **Errors vs absence.** Anything that means "the server can't answer right now" — rate limits, captchas, "Checking your browser", "unusual traffic", maintenance pages — belongs in `errors` (mapping the substring to a human-readable error string), not in `absenceStrs`. The `errors` mechanism produces an UNKNOWN result instead of a false CLAIMED or false AVAILABLE.
 
-**`regexCheck` and non-ASCII usernames.** When `{username}` is interpolated into a URL **path segment** and the username contains characters that need percent-encoding (Cyrillic, Chinese, Korean, spaces, etc.), Maigret skips the site with an `URL-incompatible username` error rather than send a request that would land on a generic listing/homepage and trip overly-broad `presenseStrs`. This default avoids the cascade of false-positives observed in [#459](https://github.com/soxoj/maigret/issues/459) and [#2633](https://github.com/soxoj/maigret/issues/2633). Two corollaries for site entries:
-
-- If your site legitimately accepts non-ASCII characters in the URL path (a wiki that mounts Unicode usernames, a Russian forum that serves Cyrillic slugs, etc.), declare the actual format with an explicit `regexCheck`. For example, a MediaWiki-style wiki could use `"regexCheck": "^[^\\/\\\\#<>\\[\\]\\|{}]+$"`; a Japanese blog platform might use `"regexCheck": "^[\\w\\-_\\.]+$"` (Python's `\w` matches Unicode letters). Don't paper this over with `regexCheck: "."` — pick a regex that reflects what the site actually accepts.
-- If `{username}` is in a query string (`?name={username}`) or only in `requestPayload`, the default has no effect — query/body values are URL-encoded as parameters and most APIs handle that fine.
-
-The default kicks in *only* when no per-site `regexCheck` is set. Existing per-site regexes always win.
-
 Full reference for `checkType`, `urlProbe`, `engine`, and the rest of the `data.json` schema is in the [development guide](docs/source/development.rst), section *How to fix false-positives*.
 
 ### Editing `data.json` safely
```
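Not part of the diff, but for orientation: the shape the *Errors vs absence* paragraph describes, sketched as a Python literal (data.json itself carries the same structure in JSON; the marker strings below are invented for illustration):

```python
# Hypothetical fragment of a site entry: `errors` maps page substrings that
# mean "the server can't answer" to readable error names, yielding UNKNOWN;
# `absenceStrs` stays reserved for genuine "no such profile" markers.
site_entry = {
    "errors": {
        "Checking your browser": "Cloudflare anti-bot challenge",
        "unusual traffic": "Rate limit page",
    },
    "absenceStrs": ["Profile not found"],
}
```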
```diff
@@ -134,50 +134,11 @@ There are few options for sites data.json helpful in various cases:
 - ``engine`` - a predefined check for the sites of certain type (e.g. forums), see the ``engines`` section in the JSON file
 - ``headers`` - a dictionary of additional headers to be sent to the site
 - ``requestHeadOnly`` - set to ``true`` if it's enough to make a HEAD request to the site
-- ``regexCheck`` - a regex to check if the username is valid, in case of frequent false-positives (see ``regexCheck`` and the non-ASCII default below)
+- ``regexCheck`` - a regex to check if the username is valid, in case of frequent false-positives
 - ``requestMethod`` - set the HTTP method to use (e.g., ``POST``). By default, Maigret natively defaults to GET or HEAD.
 - ``requestPayload`` - a dictionary with the JSON payload to send for POST requests (e.g., ``{"username": "{username}"}``), extremely useful for parsing GraphQL or modern JSON APIs.
 - ``protection`` - a list of protection types detected on the site (see below).
 
-``regexCheck`` and non-ASCII usernames
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-When ``{username}`` is interpolated into a URL **path segment** and the user-supplied username contains characters that would be percent-encoded by :py:func:`urllib.parse.quote` (Cyrillic, Chinese, Korean, Arabic, spaces, etc.), Maigret skips the site with an ``URL-incompatible username`` error rather than send a request that would land on a generic listing/homepage and trip overly-broad ``presenseStrs``. This default closes the cascade of false-positives observed in `issue #459 <https://github.com/soxoj/maigret/issues/459>`_ and `issue #2633 <https://github.com/soxoj/maigret/issues/2633>`_.
-
-Scope of the default:
-
-- Active **only** when ``{username}`` is in the URL path of ``url`` (or ``urlProbe`` if set), e.g. ``https://example.com/u/{username}``.
-- **Not** active when ``{username}`` is in the query string (``?name={username}``) or only in ``requestPayload`` — those values are URL-encoded as parameters and most APIs handle them fine.
-- **Always** deferred when the site has its own ``regexCheck`` — an explicit per-site rule wins.
-
-Opting a site into broader matching:
-
-If a site genuinely accepts non-ASCII characters in the URL path (a wiki that mounts Unicode usernames, a Russian forum that serves Cyrillic slugs, etc.), declare the actual accepted format with an explicit ``regexCheck`` that matches your reality. A few worked examples:
-
-- A MediaWiki-style wiki that allows any character except the MediaWiki-forbidden punctuation:
-
-  .. code-block:: json
-
-     {
-        "url": "https://wiki.example/wiki/User:{username}",
-        "regexCheck": "^[^\\/\\\\#<>\\[\\]\\|{}]+$"
-     }
-
-- A Japanese blog platform that allows Unicode word characters + dash + dot:
-
-  .. code-block:: json
-
-     {
-        "url": "https://blog.example/{username}",
-        "regexCheck": "^[\\w\\-_\\.]+$"
-     }
-
-In Python's regex engine, ``\\w`` against a ``str`` pattern matches Unicode letters by default, so Hiragana / Hangul / Cyrillic / etc. all pass.
-
-**Do not** paper this over with ``"regexCheck": "."`` — that's a placeholder, not a description of what the site accepts; it will let any string through, including URLs and emails that other parts of Maigret may pick up and feed back into recursive search (see ``parse_usernames`` in ``checking.py``).
-
-The complementary direction also matters: if you notice an existing site with a too-permissive ``regexCheck`` (e.g. ``"^[^\\.]+$"``, which means "anything but a dot" — that gladly lets non-ASCII through), tighten it to the actual accepted character class for the site (typically ``"^[A-Za-z0-9_-]+$"`` for ASCII slugs) when fixing related false-positives.
-
 ``protection`` (site protection tracking)
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
```
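The ``\w`` behaviour cited in the removed section is easy to verify with the standard library alone; a quick illustrative check, not code from the commit:

```python
# Python's re module: \w in a str pattern matches Unicode word characters,
# which is why the permissive JSON pattern "^[\\w\\-_\\.]+$" admits
# Hiragana / Hangul / Cyrillic names.
import re

pattern = re.compile(r"^[\w\-_\.]+$")
for name in ("alice", "Александр", "홍길동", "ひらがな", "alice bob"):
    print(f"{name!r}: {bool(pattern.fullmatch(name))}")
# Everything above matches except 'alice bob' (the space is not in the class).
```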
+11 -49

```diff
@@ -31,7 +31,7 @@ from .executors import AsyncioQueueGeneratorExecutor
 from .result import MaigretCheckResult, MaigretCheckStatus
 from .sites import MaigretDatabase, MaigretSite
 from .types import QueryOptions, QueryResultWrapper
-from .utils import ascii_data_display, get_random_user_agent
+from .utils import ascii_data_display, get_random_user_agent, is_plausible_username
 
 
 SUPPORTED_IDS = (
@@ -49,34 +49,6 @@ SUPPORTED_IDS = (
 BAD_CHARS = "#"
 
 
-def _username_fits_url_template(site: MaigretSite, username: str) -> bool:
-    """Decide whether a username can be safely substituted into a site's URL
-    path segment without producing a percent-encoded slug that the site cannot match.
-
-    Rationale: most sites that interpolate ``{username}`` into a URL path
-    segment treat the slug as an ASCII identifier. When a username contains
-    non-ASCII characters (or other reserved characters), ``urllib.parse.quote``
-    percent-encodes the bytes; the site typically cannot resolve such a slug
-    and falls back to a generic listing/homepage that trips overly-broad
-    ``presenseStrs`` markers, producing a false CLAIMED. See issues #459 and
-    #2633. Sites that genuinely accept broader character sets (e.g. wikis
-    that allow Unicode usernames) opt into permissive matching by setting
-    their own ``regexCheck``; in that case this helper is bypassed entirely.
-
-    Returns True when the check should proceed, False when the result is
-    inherently unreliable and the site should be skipped (ILLEGAL).
-    """
-    if site.regex_check:
-        return True
-    template = site.url_probe or site.url or ""
-    if "{username}" not in template:
-        return True
-    path_part, _sep, _query = template.partition("?")
-    if "{username}" not in path_part:
-        return True
-    return quote(username, safe='') == username
-
-
 def build_cloudflare_bypass_config(
     settings_obj: Optional[Any], force_enable: bool = False
 ) -> Optional[Dict[str, Any]]:
@@ -667,7 +639,6 @@ def process_site_result(
 
     html_text, status_code, check_error = response
 
-    # TODO: add elapsed request time counting
     response_time = None
 
     if logger.level == logging.DEBUG:
@@ -701,7 +672,6 @@ def process_site_result(
                 f"Failed activation {method} for site {site.name}: {str(e)}",
                 exc_info=True,
             )
-            # TODO: temporary check error
 
     site_name = site.pretty_name
     # presense flags
@@ -908,23 +878,6 @@ def make_site_result(
         results_site["http_status"] = ""
         results_site["response_text"] = ""
         # query_notify.update(results_site["status"])
-    # username would be percent-encoded into a path segment — see #459/#2633.
-    elif not _username_fits_url_template(site, username):
-        results_site["status"] = MaigretCheckResult(
-            username,
-            site.name,
-            url,
-            MaigretCheckStatus.ILLEGAL,
-            error=CheckError(
-                'URL-incompatible username',
-                'username contains characters that would be percent-encoded '
-                'in this site\'s URL path; result would be unreliable. Add a '
-                '`regexCheck` to opt this site in if it accepts these chars.'
-            ),
-        )
-        results_site["url_user"] = ""
-        results_site["http_status"] = ""
-        results_site["response_text"] = ""
     else:
         # URL of user on site (if it exists)
         results_site["url_user"] = url
@@ -1341,7 +1294,6 @@ async def site_self_check(
         )
 
         # don't disable entries with other ids types
-        # TODO: make normal checking
         if site.name not in results_dict:
             logger.info(results_dict)
             changes["issues"].append(f"Site {site.name} not in results (wrong id_type?)")
@@ -1570,13 +1522,23 @@ def parse_usernames(extracted_ids_data, logger) -> Dict:
     new_usernames = {}
     for k, v in extracted_ids_data.items():
         if "username" in k and not "usernames" in k:
+            if is_plausible_username(v):
                 new_usernames[v] = "username"
+            else:
+                logger.debug(
+                    f"Rejected non-username value extracted under key {k!r}: {v!r}"
+                )
         elif "usernames" in k:
             try:
                 tree = ast.literal_eval(v)
                 if isinstance(tree, list):
                     for n in tree:
+                        if is_plausible_username(n):
                             new_usernames[n] = "username"
+                        else:
+                            logger.debug(
+                                f"Rejected non-username item from list under key {k!r}: {n!r}"
+                            )
             except Exception as e:
                 logger.warning(e)
         if k in SUPPORTED_IDS:
```
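The removed helper's decision reduces to a single standard-library call. A dependency-free sketch of the same rule (the function name `fits` is mine, not maigret's; the explicit `regexCheck` bypass is omitted):

```python
# Sketch of the removed path-safety rule, with no maigret imports:
# a username may be substituted into a URL *path* only when percent-encoding
# it is a no-op; query-string and payload-only templates are unconstrained.
from urllib.parse import quote

def fits(template: str, username: str) -> bool:
    path_part = template.partition("?")[0]
    if "{username}" not in path_part:
        return True  # query-string / payload-only site: nothing to validate
    return quote(username, safe='') == username

print(fits("https://example.com/u/{username}", "Александр"))         # False
print(fits("https://example.com/api?name={username}", "Александр"))  # True
```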
```diff
@@ -77,7 +77,6 @@ ERRORS_TYPES = {
     'Connecting failure': 'Try to decrease number of parallel connections (e.g. -n 10)',
 }
 
-# TODO: checking for reason
 ERRORS_REASONS = {
     'Login required': 'Add authorization cookies through `--cookies-jar-file` (see cookies.txt)',
 }
```
+11 -1

```diff
@@ -55,7 +55,7 @@ from .report import (
 from .sites import MaigretDatabase
 from .submit import Submitter
 from .types import QueryResultWrapper
-from .utils import get_dict_ascii_tree
+from .utils import get_dict_ascii_tree, is_plausible_username
 from .settings import Settings
 from .permutator import Permute
 
@@ -85,13 +85,23 @@ def extract_ids_from_page(url, logger, timeout=5) -> dict:
     for k, v in info.items():
         # TODO: merge with the same functionality in checking module
         if 'username' in k and not 'usernames' in k:
+            if is_plausible_username(v):
                 results[v] = 'username'
+            else:
+                logger.debug(
+                    f"Rejected non-username value extracted under key {k!r}: {v!r}"
+                )
         elif 'usernames' in k:
             try:
                 tree = ast.literal_eval(v)
                 if isinstance(tree, list):
                     for n in tree:
+                        if is_plausible_username(n):
                             results[n] = 'username'
+                        else:
+                            logger.debug(
+                                f"Rejected non-username item from list under key {k!r}: {n!r}"
+                            )
             except Exception as e:
                 logger.warning(e)
         if k in SUPPORTED_IDS:
```
```diff
@@ -516,7 +516,6 @@ def generate_report_context(username_results: list):
                     tag = pycountry.countries.search_fuzzy(v)[
                         0
                     ].alpha_2.lower()  # type: ignore[attr-defined]
-                    # TODO: move countries to another struct
                     tags[tag] = tags.get(tag, 0) + 1
                 except Exception as e:
                     logging.debug(
@@ -568,7 +567,6 @@ def generate_report_context(username_results: list):
 
     return {
         "username": first_username,
-        # TODO: return brief list
         "brief": brief,
         "results": username_results,
         "first_seen": first_seen,
```
```diff
@@ -3767,7 +3767,7 @@
         "absenceStrs": [
             "Couldn't find any profile with name"
         ],
-        "regexCheck": "^[A-Za-z0-9_]{3,16}$",
+        "regexCheck": "^.{1,25}$",
        "usernameClaimed": "blue",
        "usernameUnclaimed": "noonewouldeverusethis7",
        "alexaRank": 1635,
@@ -8218,17 +8218,7 @@
     "Namuwiki": {
         "url": "https://namu.wiki/w/%EC%82%AC%EC%9A%A9%EC%9E%90:{username}",
         "urlMain": "https://namu.wiki/",
-        "checkType": "message",
-        "presenseStrs": [
-            "<meta property=\"og:title\""
-        ],
-        "absenceStrs": [
-            "새 문서 만들기"
-        ],
-        "regexCheck": "^[\\w\\-_.]+$",
-        "protection": [
-            "cf_js_challenge"
-        ],
+        "checkType": "status_code",
        "usernameClaimed": "namu",
        "usernameUnclaimed": "noonewouldeverusethis7",
        "alexaRank": 7047,
@@ -13252,7 +13242,7 @@
            "ru"
        ],
        "checkType": "response_url",
-        "regexCheck": "^[A-Za-z0-9_.]+$",
+        "regexCheck": "^[^-]+$",
        "alexaRank": 29071,
        "urlMain": "https://studfile.net",
        "url": "https://studfile.net/users/{username}/",
@@ -15613,7 +15603,7 @@
        "tags": [
            "coding"
        ],
-        "regexCheck": "^[A-Za-z0-9_-]+$",
+        "regexCheck": "^[^\\.]+$",
        "checkType": "message",
        "absenceStrs": [
            "<title>Users - Hacking with Swift</title>"
@@ -17106,7 +17096,7 @@
        "tags": [
            "hacking"
        ],
-        "regexCheck": "^[A-Za-z0-9_-]+$",
+        "regexCheck": "^[^\\.]+$",
        "checkType": "message",
        "absenceStrs": [
            "Cannot Retrieve Information For The Specified Username"
@@ -17566,7 +17556,7 @@
        "errors": {
            "An error has occurred.": "Site error"
        },
-        "regexCheck": "^[A-Za-z0-9_-]+$",
+        "regexCheck": "^[^\\.]+$",
        "checkType": "message",
        "absenceStrs": [
            "No such user."
@@ -20690,7 +20680,7 @@
        "tags": [
            "ru"
        ],
-        "regexCheck": "^[A-Za-z0-9_-]+$",
+        "regexCheck": "^[^\\.]+$",
        "checkType": "message",
        "absenceStrs": [
            "Указанный пользователь не найден"
@@ -20822,7 +20812,7 @@
        "tags": [
            "hu"
        ],
-        "regexCheck": "^[A-Za-z0-9_-]+$",
+        "regexCheck": "^[^\\.]+$",
        "checkType": "message",
        "absenceStrs": [
            "<title>Log in - Chan4Chan</title>"
```
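For context on these `regexCheck` swaps, a small illustrative comparison of what each variant accepts (not part of the commit):

```python
# "Anything but a dot" passes non-ASCII and space-containing strings that
# the explicit ASCII character class rejects.
import re

loose = re.compile(r"^[^\.]+$")          # "^[^\\.]+$" in JSON escaping
tight = re.compile(r"^[A-Za-z0-9_-]+$")  # "^[A-Za-z0-9_-]+$"
for name in ("alice", "Александр", "alice bob", "alice.bob"):
    print(f"{name!r}: loose={bool(loose.fullmatch(name))} tight={bool(tight.fullmatch(name))}")
# 'alice' passes both; 'Александр' and 'alice bob' pass only the loose form;
# 'alice.bob' fails both (the dot is excluded / not in the class).
```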
```diff
@@ -1,8 +1,8 @@
 {
     "version": 1,
-    "updated_at": "2026-05-17T08:44:03Z",
+    "updated_at": "2026-05-16T16:00:20Z",
     "sites_count": 3155,
     "min_maigret_version": "0.6.1",
-    "data_sha256": "896a15cfb0de131848de5ae915a81d60d9d86a3e4537dc1004adeab29ceb4b43",
+    "data_sha256": "0997b68c05eedb6e714432ed79580688d4923c56ef1ebf46db69b90039ef00d7",
     "data_url": "https://raw.githubusercontent.com/soxoj/maigret/main/maigret/resources/data.json"
 }
```
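The manifest pairs `data_url` with `data_sha256`; a consumer could verify a downloaded database like this (an illustrative sketch, not code from the repository):

```python
# Fetch data.json and compare its SHA-256 against the manifest's data_sha256
# for the commit you checked out.
import hashlib
import urllib.request

URL = "https://raw.githubusercontent.com/soxoj/maigret/main/maigret/resources/data.json"
with urllib.request.urlopen(URL) as resp:
    digest = hashlib.sha256(resp.read()).hexdigest()
print(digest)  # should equal the manifest's data_sha256 field
```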
```diff
@@ -127,3 +127,29 @@ def get_match_ratio(base_strs: list):
 
 def generate_random_username():
     return ''.join(random.choices(string.ascii_lowercase, k=10))
+
+
+def is_plausible_username(value: Any) -> bool:
+    """Reject obviously non-username strings extracted from sites' identity data.
+
+    Extractor schemes occasionally populate fields named like ``*_username``
+    with URLs (e.g. ``instagram_username`` -> ``https://instagram.com/X``) or
+    emails (e.g. ``your_username`` -> ``user@example.com``). Feeding such a
+    value back into a site URL template produces broken requests on every
+    subsequent site, which manifests as a cascade of false errors and the
+    "wrong username" symptom in #1403.
+    """
+    if not isinstance(value, str):
+        return False
+    s = value.strip()
+    if not s:
+        return False
+    if "://" in s or s.startswith(("http://", "https://", "www.", "//")):
+        return False
+    if "/" in s:
+        return False
+    if any(c.isspace() for c in s):
+        return False
+    if "@" in s and "." in s:
+        return False
+    return True
```
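Observed behaviour of the new helper, assuming maigret is installed so it can be imported from `maigret.utils` exactly as the diff adds it (values taken from the tests later in this comparison):

```python
from maigret.utils import is_plausible_username

assert is_plausible_username("alice")                           # plain handle
assert is_plausible_username("Алиса")                           # non-ASCII is allowed here
assert not is_plausible_username("https://gravatar.com/alice")  # leaked URL (#1403)
assert not is_plausible_username("alice@example.com")           # leaked email
print("is_plausible_username behaves as the tests specify")
```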
```diff
@@ -3159,16 +3159,16 @@ Rank data fetched from Majestic Million by domains.
 1.  [GreasyFork (https://greasyfork.org)](https://greasyfork.org)*: top 100M, coding*
 1.  [Faceit (https://faceit.com/)](https://faceit.com/)*: top 100M, gaming*
 
-The list was updated at (2026-05-17)
+The list was updated at (2026-05-15)
 ## Statistics
 
 Enabled/total sites: 2522/3155 = 79.94%
 
 Incomplete message checks: 311/2522 = 12.33% (false positive risks)
 
-Status code checks: 634/2522 = 25.14% (false positive risks)
+Status code checks: 635/2522 = 25.18% (false positive risks)
 
-False positive risk (total): 37.47%
+False positive risk (total): 37.51%
 
 Sites with probing: 500px, Armchairgm, BinarySearch (disabled), BleachFandom, Bluesky, BongaCams, Boosty, BuyMeACoffee, Calendly, Cent, Chess, Code Sandbox (disabled), Code Snippet Wiki, DailyMotion, Discord, Diskusjon.no, Disqus, Docker Hub, Duolingo, Faceit, FandomCommunityCentral, GitHub, GitLab, Google Plus (archived), Gravatar, HackTheBox, Hackerrank, Hashnode, Holopin, Imgur, Issuu, Keybase, Kick, Kvinneguiden, LeetCode, Lesswrong, Livejasmin, LocalCryptos (disabled), Medium, MicrosoftLearn, MixCloud, Monkeytype, NPM, Niftygateway, Omg.lol, OnlyFans, Paragraph, Picsart, Plurk, Polarsteps, Rarible, Reddit, Reddit Search (Pushshift) (disabled), Revolut.me, RoyalCams, Scratch, Soop, SportsTracker, Spotify, StackOverflow, Substack, TAP'D, Topcoder, Trello, Twitch, Twitter, Twitter Shadowban (disabled), UnstoppableDomains, Vimeo, Vivino, Warframe Market, Warpcast, Weibo, Wikipedia, Yapisal (disabled), YouNow, en.brickimedia.org, forums.grandstream.com, nightbot, notabug.org, qiwi.me (disabled)
 
```
+27 -74

```diff
@@ -13,7 +13,6 @@ from maigret.checking import (
     timeout_check,
     debug_response_logging,
     process_site_result,
-    _username_fits_url_template,
 )
 from maigret.errors import CheckError
 from maigret.result import MaigretCheckResult, MaigretCheckStatus
@@ -145,79 +144,6 @@ def test_detect_error_page_instagram_login_wall():
     assert "rate-limited" in err.desc
 
 
-def _site_for_url(url_pattern, regex_check=None, url_probe=None):
-    """Build a minimal MaigretSite stub for the URL-template helper tests."""
-    raw = {
-        "url": url_pattern,
-        "urlMain": "https://example.com/",
-        "checkType": "message",
-        "usernameClaimed": "alice",
-        "usernameUnclaimed": "noone",
-    }
-    if regex_check is not None:
-        raw["regexCheck"] = regex_check
-    if url_probe is not None:
-        raw["urlProbe"] = url_probe
-    return MaigretSite("Example", raw)
-
-
-# Regression tests for #459 / #2633 — usernames that would be percent-encoded
-# into a URL path segment trip generic presence markers on fallback pages.
-def test_username_fits_path_segment_ascii_slug_passes():
-    site = _site_for_url("https://example.com/u/{username}")
-    assert _username_fits_url_template(site, "alice") is True
-    assert _username_fits_url_template(site, "alice-bob") is True
-    assert _username_fits_url_template(site, "alice.bob_42") is True
-
-
-def test_username_fits_path_segment_non_ascii_blocked():
-    site = _site_for_url("https://example.com/u/{username}")
-    # Cyrillic
-    assert _username_fits_url_template(site, "Александр") is False
-    # Chinese
-    assert _username_fits_url_template(site, "快嘴摩卡酱") is False
-    # Korean
-    assert _username_fits_url_template(site, "홍길동") is False
-    # Space (also percent-encoded)
-    assert _username_fits_url_template(site, "alice bob") is False
-
-
-def test_username_fits_query_string_is_unconstrained():
-    """If {username} sits in the query string, the value is URL-encoded as a
-    parameter and most APIs handle that fine — don't block."""
-    site = _site_for_url("https://example.com/api/users?name={username}")
-    assert _username_fits_url_template(site, "快嘴摩卡酱") is True
-    assert _username_fits_url_template(site, "Александр") is True
-
-
-def test_username_fits_explicit_regex_check_bypasses_helper():
-    """When the site declares its own regexCheck, the helper defers entirely."""
-    # Permissive site: accepts anything via Unicode-friendly regex.
-    site = _site_for_url(
-        "https://wiki.example/User:{username}", regex_check=r"^[\w\- .]+$"
-    )
-    assert _username_fits_url_template(site, "Александр") is True
-    assert _username_fits_url_template(site, "快嘴摩卡酱") is True
-
-
-def test_username_fits_url_probe_overrides_url():
-    """urlProbe is the actual request URL; the helper must use it when set."""
-    # Path-segment url, but urlProbe is a clean query API → no validation
-    site = _site_for_url(
-        "https://example.com/u/{username}",
-        url_probe="https://example.com/api/u?name={username}",
-    )
-    assert _username_fits_url_template(site, "快嘴摩卡酱") is True
-
-
-def test_username_fits_post_payload_sites_skipped():
-    """Sites with {username} only in requestPayload (no {username} in URL
-    template at all) should pass unconditionally — payload is JSON-encoded,
-    not URL-path-encoded."""
-    site = _site_for_url("https://api.example.com/check")
-    assert _username_fits_url_template(site, "快嘴摩卡酱") is True
-
-
 def test_detect_error_page_instagram_marker_no_false_positive_on_profile():
     """The login-wall marker must NOT match a real profile page. On a claimed
     user page, `routePath` carries the user-route template
@@ -254,6 +180,33 @@ def test_parse_usernames_malformed_list():
     assert logger.warning.called
 
 
+def test_parse_usernames_rejects_url_value():
+    """Regression for #1403: extractors sometimes return a URL under a *_username
+    key; that URL must not be fed back as a candidate username."""
+    logger = Mock()
+    result = parse_usernames(
+        {"instagram_username": "https://instagram.com/zuck"}, logger
+    )
+    assert result == {}
+
+
+def test_parse_usernames_rejects_email_value():
+    """Regression for #1403: e.g. socid_extractor's 'your_username' returns an
+    email under a key matching the username heuristic."""
+    logger = Mock()
+    result = parse_usernames({"your_username": "alice@example.com"}, logger)
+    assert result == {}
+
+
+def test_parse_usernames_filters_urls_inside_list():
+    logger = Mock()
+    result = parse_usernames(
+        {"other_usernames": "['alice', 'https://example.com/bob']"}, logger
+    )
+    # 'alice' should survive; the URL should be dropped.
+    assert result == {"alice": "username"}
+
+
 def test_parse_usernames_supported_id():
     logger = Mock()
     # "telegram" is in SUPPORTED_IDS per socid_extractor
```
```diff
@@ -10,6 +10,7 @@ from maigret.utils import (
     URLMatcher,
     get_dict_ascii_tree,
     get_match_ratio,
+    is_plausible_username,
 )
 
 
@@ -144,3 +145,52 @@ def test_get_match_ratio():
     fun = get_match_ratio(["test", "maigret", "username"])
 
     assert fun("test") == 1
+
+
+# Regression tests for #1403 — Gravatar URL leaking into next-iteration username.
+# Extractor schemes occasionally store URLs/emails under '*_username' keys; without
+# validation these were fed back into the search loop and produced cascades of false
+# errors. See maigret/utils.py::is_plausible_username.
+def test_is_plausible_username_accepts_bare_usernames():
+    assert is_plausible_username("alice")
+    assert is_plausible_username("alice.bob")
+    assert is_plausible_username("alice_bob-42")
+    assert is_plausible_username("Алиса")
+
+
+def test_is_plausible_username_rejects_urls():
+    assert not is_plausible_username("https://gravatar.com/alice")
+    assert not is_plausible_username("http://example.com/user/alice")
+    assert not is_plausible_username("//example.com/alice")
+    assert not is_plausible_username("www.facebook.com/zuck")
+
+
+def test_is_plausible_username_accepts_http_prefixed_handles():
+    """Don't over-match: bare names that just happen to start with 'http' or 'www'
+    are legitimate (e.g. the httpie CLI maintainer's handle)."""
+    assert is_plausible_username("httpie")
+    assert is_plausible_username("http_user")
+    assert is_plausible_username("wwwsuperstar")
+
+
+def test_is_plausible_username_rejects_path_like():
+    assert not is_plausible_username("user/alice")
+    assert not is_plausible_username("alice/")
+
+
+def test_is_plausible_username_rejects_emails():
+    assert not is_plausible_username("alice@example.com")
+    assert not is_plausible_username("user@maigret.io")
+
+
+def test_is_plausible_username_rejects_whitespace_and_empty():
+    assert not is_plausible_username("")
+    assert not is_plausible_username(" ")
+    assert not is_plausible_username("alice bob")
+    assert not is_plausible_username("alice\nbob")
+
+
+def test_is_plausible_username_rejects_non_strings():
+    assert not is_plausible_username(None)
+    assert not is_plausible_username(42)
+    assert not is_plausible_username(["alice"])
```
```diff
@@ -165,7 +165,6 @@ if __name__ == '__main__':
     sites = {site.name: site for site in sites_subset}
     engines = db.engines
 
-    # TODO: usernames extractors
     ok_usernames = ['alex', 'god', 'admin', 'red', 'blue', 'john']
     if args.username:
         ok_usernames = [args.username] + ok_usernames
```