Mirror of https://github.com/soxoj/maigret.git, synced 2026-05-07 14:34:33 +00:00.
Compare commits
95 Commits
| SHA1 |
|---|
| 1afdda7336 |
| 252d12ff9e |
| 6afb17e24f |
| 7fdd965bb2 |
| 8e30e969f9 |
| 5ee91f6659 |
| 7fd4a2c516 |
| bfa6afac32 |
| bfaf276f6e |
| c9194b20ba |
| a30a012550 |
| 2cdc9bb276 |
| 99fc6c8a8f |
| b269c4a8e0 |
| f43dc5bd6f |
| 83cda9e37f |
| cc3df85690 |
| 8007e92021 |
| daaddbde4e |
| cea5073962 |
| b345512489 |
| 786cb59145 |
| 481baddec6 |
| ecb3d76581 |
| 8a8fab5bed |
| 2fee65fe4e |
| dabba859f3 |
| 74d4d40abd |
| d6f6d78d3f |
| 1b61c5085e |
| 01e20518c1 |
| 8477385289 |
| 491dd8f166 |
| c64b7a1c85 |
| 03511a7a8f |
| 7f1a0fae03 |
| b0de174df2 |
| b5db3f0035 |
| 53d698bb7b |
| 23fff42ca7 |
| 51d9e6f5f6 |
| 640c04f20b |
| 69f78e331b |
| 69c315b00e |
| b755628a1d |
| 7490a412db |
| 2741680d4a |
| e5fc221ce2 |
| a044e3dd79 |
| 6da4ff1e7b |
| eb2442401d |
| d23d24eeca |
| a2ddb15f09 |
| e90e85d2a9 |
| 2bb01f7019 |
| b586a4cd06 |
| 28733282ab |
| 0a7a7ad70d |
| c895f6b418 |
| a6286a0286 |
| 314eb25d1f |
| fbbc8b49f3 |
| faa03b62e5 |
| d676f7bb94 |
| d4757aab78 |
| 908176be85 |
| 940f408da3 |
| 8c700b9810 |
| f9c9af5f41 |
| 57a9a82102 |
| 9bbca995e9 |
| 39b713497d |
| 6a84875775 |
| 84f7d93478 |
| 17870ef5c8 |
| d3cd5e45a1 |
| 9a3f2f0aa7 |
| 4b7d344b41 |
| ac9cfe7885 |
| 6058a4b70c |
| 3aa225bda4 |
| c6661e22ff |
| fdb68b5e80 |
| 9fe6b99239 |
| b9d303fde3 |
| d29e88d96f |
| 731a8e01f9 |
| cf7acfd8c8 |
| 9e6bd05acc |
| 6ea1dc33f7 |
| d5bc92d26a |
| f7263c9b3c |
| e6f82a8ba3 |
| ba7a38092c |
| 92a1677213 |
@@ -15,7 +15,7 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        python-version: [3.6, 3.7, 3.8, 3.9]
+        python-version: [3.6.9, 3.7, 3.8, 3.9]

     steps:
     - uses: actions/checkout@v2
@@ -26,8 +26,8 @@ jobs:
     - name: Install dependencies
       run: |
         python -m pip install --upgrade pip
-        python -m pip install flake8 pytest
+        python -m pip install flake8 pytest pytest-rerunfailures
        if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
     - name: Test with pytest
       run: |
-        pytest
+        pytest --reruns 3 --reruns-delay 5
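The new `--reruns`/`--reruns-delay` flags come from the `pytest-rerunfailures` plugin installed above. Besides the CLI flags, individual flaky tests can be marked in code; a minimal sketch (the test itself is illustrative, not from the repo):

```python
# Illustrative test; requires `pip install pytest pytest-rerunfailures`.
import random

import pytest


@pytest.mark.flaky(reruns=3, reruns_delay=5)  # per-test equivalent of the CLI flags
def test_unstable_network_check():
    # Simulates a network-dependent check that occasionally fails
    # and succeeds on a rerun.
    assert random.random() > 0.1
```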
@@ -2,6 +2,37 @@

 ## [Unreleased]

+## [0.1.20] - 2021-05-02
+* added `--retries` option
+* added `source` feature for sites' mirrors
+* improved `submit` mode
+* lot of style and logic fixes
+
+## [0.1.19] - 2021-04-14
+* added `--no-progressbar` option
+* fixed ascii tree bug
+* fixed `python -m maigret` run
+* fixed requests freeze with timeout async tasks
+
+## [0.1.18] - 2021-03-30
+* some API improvements
+
+## [0.1.17] - 2021-03-30
+* simplified maigret search API
+* improved documentation
+* fixed 403 response code ignoring bug
+
+## [0.1.16] - 2021-03-21
+* improved URL parsing mode
+* improved sites submit mode
+* added uID.me uguid support
+* improved requests processing
+
+## [0.1.15] - 2021-03-14
+* improved HTML reports
+* fixed python-3.6-specific error
+* false positives fixes
+
 ## [0.1.14] - 2021-02-25
 * added JSON export formats
 * improved tags markup
+6 -6
@@ -1,21 +1,21 @@
-FROM python:3.7-alpine
+FROM python:3.7
 LABEL maintainer="Soxoj <soxoj@protonmail.com>"

 WORKDIR /app

 ADD requirements.txt .

-RUN pip install --upgrade pip \
-    && apk add --update --virtual .build-dependencies \
-    build-base \
+RUN pip install --upgrade pip
+
+RUN apt update -y
+
+RUN apt install -y\
     gcc \
     musl-dev \
     libxml2 \
     libxml2-dev \
     libxslt-dev \
-    jpeg-dev \
     && YARL_NO_EXTENSIONS=1 python3 -m pip install maigret \
-    && apk del .build-dependencies \
     && rm -rf /var/cache/apk/* \
     /tmp/* \
     /var/tmp/*
@@ -26,6 +26,7 @@ Currently supported more than 2000 sites ([full list](./sites.md)), by default s
 * Search by tags (site categories, countries)
 * Censorship and captcha detection
 * Very few false positives
+* Failed requests' restarts

 ## Installation

@@ -33,20 +34,43 @@ Currently supported more than 2000 sites ([full list](./sites.md)), by default s

 **Python 3.8 is recommended.**

+### Package installing
+
 ```bash
 # install from pypi
-$ pip3 install maigret
+pip3 install maigret

 # or clone and install manually
-$ git clone https://github.com/soxoj/maigret && cd maigret
-$ pip3 install .
+git clone https://github.com/soxoj/maigret && cd maigret
+pip3 install .
+```
+
+### Cloning a repository
+
+```bash
+git clone https://github.com/soxoj/maigret && cd maigret
+```
+
+You can use a free virtual machine, the repo will be automatically cloned:
+
+[](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/soxoj/maigret&tutorial=README.md) [](https://repl.it/github/soxoj/maigret)
+
+<a href="https://colab.research.google.com/gist//soxoj/879b51bc3b2f8b695abb054090645000/maigret.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" height="40"></a>
+
+```bash
+pip3 install -r requirements.txt
 ```

 ## Using examples

 ```bash
-maigret user
+# for a cloned repo
+./maigret.py user
+
+# for a package
+maigret user
+```
+
+Features:
+```bash
 # make HTML and PDF reports
 maigret user --html --pdf

@@ -63,19 +87,17 @@ Run `maigret --help` to get arguments description. Also options are documented i

 With Docker:
 ```
-docker build -t maigret .
+# manual build
+docker build -t maigret . && docker run maigret user

-docker run maigret user
+# official image
+docker run soxoj/maigret:latest user
 ```

 ## Demo with page parsing and recursive username search

 [PDF report](./static/report_alexaimephotographycars.pdf), [HTML report](https://htmlpreview.github.io/?https://raw.githubusercontent.com/soxoj/maigret/main/static/report_alexaimephotographycars.html)

-```bash
-maigret alexaimephotographycars
-```

 ![](./static/recursive_search.gif)

 ![](./static/report_alexaimephotographycars_html_screenshot.png)
+9 -11
@@ -1,15 +1,13 @@
 # HTTP Cookie File downloaded with cookies.txt by Genuinous @genuinous
 # This file can be used by wget, curl, aria2c and other standard compliant tools.
 # Usage Examples:
-# 1) wget -x --load-cookies cookies.txt "https://xss.is/search/"
-# 2) curl --cookie cookies.txt "https://xss.is/search/"
-# 3) aria2c --load-cookies cookies.txt "https://xss.is/search/"
+# 1) wget -x --load-cookies cookies.txt "https://pixabay.com/users/blue-156711/"
+# 2) curl --cookie cookies.txt "https://pixabay.com/users/blue-156711/"
+# 3) aria2c --load-cookies cookies.txt "https://pixabay.com/users/blue-156711/"
 #
-xss.is FALSE / TRUE 0 xf_csrf PMnZNsr42HETwYEr
-xss.is FALSE / TRUE 0 xf_from_search google
-xss.is FALSE / TRUE 1642709308 xf_user 215268%2CZNKB_-64Wk-BOpsdtLYy-1UxfS5zGpxWaiEGUhmX
-xss.is FALSE / TRUE 0 xf_session sGdxJtP_sKV0LCG8vUQbr6cL670_EFWM
-.xss.is TRUE / FALSE 0 muchacho_cache ["00fbb0f2772c9596b0483d6864563cce"]
-.xss.is TRUE / FALSE 0 muchacho_png ["00fbb0f2772c9596b0483d6864563cce"]
-.xss.is TRUE / FALSE 0 muchacho_etag ["00fbb0f2772c9596b0483d6864563cce"]
-.xss.is TRUE / FALSE 1924905600 2e66e4dd94a7a237d0d1b4d50f01e179_evc ["00fbb0f2772c9596b0483d6864563cce"]
+.pixabay.com TRUE / TRUE 1618356838 __cfduid d56929cd50d11474f421b849df5758a881615764837
+.pixabay.com TRUE / TRUE 1615766638 __cf_bm ea8f7c565b44d749f65500f0e45176cebccaeb09-1615764837-1800-AYJIXh2boDJ6HPf44JI9fnteWABHOVvkxiSccACP9EiS1E58UDTGhViXtqjFfVE0QRj1WowP4ss2DzCs+pW+qUc=
+pixabay.com FALSE / FALSE 0 anonymous_user_id c1e4ee09-5674-4252-aa94-8c47b1ea80ab
+pixabay.com FALSE / FALSE 1647214439 csrftoken vfetTSvIul7gBlURt6s985JNM18GCdEwN5MWMKqX4yI73xoPgEj42dbNefjGx5fr
+pixabay.com FALSE / FALSE 1647300839 client_width 1680
+pixabay.com FALSE / FALSE 748111764839 is_human 1
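The file above is in the Netscape cookies.txt format, which Python can read through the standard `http.cookiejar` module (the same `MozillaCookieJar` import appears in the activation diff below). A minimal loading sketch; the file name is illustrative:

```python
from http.cookiejar import MozillaCookieJar

# Parse a Netscape/Mozilla-format cookies.txt like the one above.
jar = MozillaCookieJar("cookies.txt")
jar.load(ignore_discard=True, ignore_expires=True)

for cookie in jar:
    print(cookie.domain, cookie.name, cookie.value)
```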
@@ -0,0 +1,5 @@
+#!/bin/sh
+FILES="maigret wizard.py maigret.py tests"
+
+echo 'black'
+black --skip-string-normalization $FILES

@@ -0,0 +1,11 @@
+#!/bin/sh
+FILES="maigret wizard.py maigret.py tests"
+
+echo 'syntax errors or undefined names'
+flake8 --count --select=E9,F63,F7,F82 --show-source --statistics $FILES
+
+echo 'warning'
+flake8 --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics --ignore=E731,W503 $FILES
+
+echo 'mypy'
+mypy ./maigret ./wizard.py ./tests
+2 -2
@@ -1,4 +1,4 @@
-#! /usr/bin/env python3
+#!/usr/bin/env python3
 import asyncio
 import sys

@@ -15,4 +15,4 @@ def run():


 if __name__ == "__main__":
     run()
@@ -1 +1,5 @@
 """Maigret"""
+
+from .checking import maigret as search
+from .sites import MaigretEngine, MaigretSite, MaigretDatabase
+from .notify import QueryNotifyPrint as Notifier
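These re-exports give the package a compact public API (`maigret.search`, `maigret.Notifier`, and the database/site classes). A usage sketch, assuming the keyword names of the `maigret()` coroutine shown in the checking diff further below; the database path, loader name, and `db.sites` attribute are assumptions about the database API, not confirmed by this diff:

```python
import asyncio
import logging

import maigret

async def main():
    logger = logging.getLogger("maigret")

    # Assumption: MaigretDatabase can be loaded from the bundled JSON file;
    # the path and `load_from_file` name are illustrative.
    db = maigret.MaigretDatabase().load_from_file("maigret/resources/data.json")
    sites = {site.name: site for site in db.sites[:100]}  # assumed attribute

    results = await maigret.search(
        username="soxoj",
        site_dict=sites,
        logger=logger,
        timeout=10,
    )
    for name, wrapper in results.items():
        print(name, wrapper["status"])

asyncio.run(main())
```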
+2 -2
@@ -6,7 +6,7 @@ Maigret entrypoint

 import asyncio

-import maigret
+from .maigret import main

 if __name__ == "__main__":
-    asyncio.run(maigret.main())
+    asyncio.run(main())
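With the entrypoint importing `main` from the package's own module, the module invocation fixed in 0.1.19 above works as expected (username is illustrative):

```bash
# run maigret as a module; equivalent to the installed `maigret` console script
python3 -m maigret user
```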
+27 -27
@@ -1,56 +1,56 @@
-import aiohttp
-from aiohttp import CookieJar
-import asyncio
-import json
 from http.cookiejar import MozillaCookieJar
 from http.cookies import Morsel

 import requests
+from aiohttp import CookieJar


 class ParsingActivator:
     @staticmethod
     def twitter(site, logger, cookies={}):
         headers = dict(site.headers)
-        del headers['x-guest-token']
-        r = requests.post(site.activation['url'], headers=headers)
+        del headers["x-guest-token"]
+        r = requests.post(site.activation["url"], headers=headers)
         logger.info(r)
         j = r.json()
-        guest_token = j[site.activation['src']]
-        site.headers['x-guest-token'] = guest_token
+        guest_token = j[site.activation["src"]]
+        site.headers["x-guest-token"] = guest_token

     @staticmethod
     def vimeo(site, logger, cookies={}):
         headers = dict(site.headers)
-        if 'Authorization' in headers:
-            del headers['Authorization']
-        r = requests.get(site.activation['url'], headers=headers)
-        jwt_token = r.json()['jwt']
-        site.headers['Authorization'] = 'jwt ' + jwt_token
+        if "Authorization" in headers:
+            del headers["Authorization"]
+        r = requests.get(site.activation["url"], headers=headers)
+        jwt_token = r.json()["jwt"]
+        site.headers["Authorization"] = "jwt " + jwt_token

     @staticmethod
     def spotify(site, logger, cookies={}):
         headers = dict(site.headers)
-        if 'Authorization' in headers:
-            del headers['Authorization']
-        r = requests.get(site.activation['url'])
-        bearer_token = r.json()['accessToken']
-        site.headers['authorization'] = f'Bearer {bearer_token}'
+        if "Authorization" in headers:
+            del headers["Authorization"]
+        r = requests.get(site.activation["url"])
+        bearer_token = r.json()["accessToken"]
+        site.headers["authorization"] = f"Bearer {bearer_token}"

     @staticmethod
     def xssis(site, logger, cookies={}):
         if not cookies:
-            logger.debug('You must have cookies to activate xss.is parsing!')
+            logger.debug("You must have cookies to activate xss.is parsing!")
             return

         headers = dict(site.headers)
         post_data = {
-            '_xfResponseType': 'json',
-            '_xfToken': '1611177919,a2710362e45dad9aa1da381e21941a38'
+            "_xfResponseType": "json",
+            "_xfToken": "1611177919,a2710362e45dad9aa1da381e21941a38",
         }
-        headers['content-type'] = 'application/x-www-form-urlencoded; charset=UTF-8'
-        r = requests.post(site.activation['url'], headers=headers, cookies=cookies, data=post_data)
-        csrf = r.json()['csrf']
-        site.get_params['_xfToken'] = csrf
+        headers["content-type"] = "application/x-www-form-urlencoded; charset=UTF-8"
+        r = requests.post(
+            site.activation["url"], headers=headers, cookies=cookies, data=post_data
+        )
+        csrf = r.json()["csrf"]
+        site.get_params["_xfToken"] = csrf


 async def import_aiohttp_cookies(cookiestxt_filename):
@@ -64,8 +64,8 @@ async def import_aiohttp_cookies(cookiestxt_filename):
         for key, cookie in list(domain.values())[0].items():
             c = Morsel()
             c.set(key, cookie.value, cookie.value)
-            c['domain'] = cookie.domain
-            c['path'] = cookie.path
+            c["domain"] = cookie.domain
+            c["path"] = cookie.path
             cookies_list.append((key, c))

     cookies.update_cookies(cookies_list)
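`import_aiohttp_cookies` converts a cookies.txt file into an `aiohttp`-compatible jar, which `maigret()` then hands to its `ClientSession` (see the checking diff below, where `from .activation import ParsingActivator, import_aiohttp_cookies` confirms the module path). A usage sketch; the file name and target URL are illustrative:

```python
import asyncio

import aiohttp

from maigret.activation import import_aiohttp_cookies

async def main():
    # Build an aiohttp CookieJar from a Netscape cookies.txt file.
    cookie_jar = await import_aiohttp_cookies("cookies.txt")
    async with aiohttp.ClientSession(cookie_jar=cookie_jar) as session:
        async with session.get("https://pixabay.com/users/blue-156711/") as resp:
            print(resp.status)

asyncio.run(main())
```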
+433 -314
@@ -1,133 +1,125 @@
 import asyncio
 import logging
+from mock import Mock
 import re
 import ssl
+import sys
+import tqdm
+from typing import Tuple, Optional, Dict, List

 import aiohttp
 import tqdm.asyncio
 from aiohttp_socks import ProxyConnector
-from mock import Mock
 from python_socks import _errors as proxy_errors
 from socid_extractor import extract

 from .activation import ParsingActivator, import_aiohttp_cookies
+from . import errors
+from .errors import CheckError
+from .executors import (
+    AsyncExecutor,
+    AsyncioSimpleExecutor,
+    AsyncioProgressbarQueueExecutor,
+)
 from .result import QueryResult, QueryStatus
 from .sites import MaigretDatabase, MaigretSite
+from .types import QueryOptions, QueryResultWrapper
+from .utils import get_random_user_agent


 supported_recursive_search_ids = (
-    'yandex_public_id',
-    'gaia_id',
-    'vk_id',
-    'ok_id',
-    'wikimapia_uid',
-    'steam_id',
+    "yandex_public_id",
+    "gaia_id",
+    "vk_id",
+    "ok_id",
+    "wikimapia_uid",
+    "steam_id",
+    "uidme_uguid",
 )

-common_errors = {
-    '<title>Attention Required! | Cloudflare</title>': 'Cloudflare captcha',
-    'Please stand by, while we are checking your browser': 'Cloudflare captcha',
-    '<title>Доступ ограничен</title>': 'Rostelecom censorship',
-    'document.getElementById(\'validate_form_submit\').disabled=true': 'Mail.ru captcha',
-    'Verifying your browser, please wait...<br>DDoS Protection by</font> Blazingfast.io': 'Blazingfast protection',
-    '404</h1><p class="error-card__description">Мы не нашли страницу': 'MegaFon 404 page',
-    'Доступ к информационному ресурсу ограничен на основании Федерального закона': 'MGTS censorship',
-    'Incapsula incident ID': 'Incapsula antibot protection',
-}
-
-unsupported_characters = '#'
+unsupported_characters = "#"


-async def get_response(request_future, site_name, logger):
+async def get_response(request_future, logger) -> Tuple[str, int, Optional[CheckError]]:
     html_text = None
     status_code = 0
-
-    error_text = "General Unknown Error"
-    expection_text = None
+    error: Optional[CheckError] = CheckError("Unknown")

     try:
         response = await request_future

         status_code = response.status
         response_content = await response.content.read()
-        charset = response.charset or 'utf-8'
-        decoded_content = response_content.decode(charset, 'ignore')
+        charset = response.charset or "utf-8"
+        decoded_content = response_content.decode(charset, "ignore")
         html_text = decoded_content

-        if status_code > 0:
-            error_text = None
+        if status_code == 0:
+            error = CheckError("Connection lost")
+        else:
+            error = None

         logger.debug(html_text)

-    except asyncio.TimeoutError as errt:
-        error_text = "Timeout Error"
-        expection_text = str(errt)
-    except (ssl.SSLCertVerificationError, ssl.SSLError) as err:
-        error_text = "SSL Error"
-        expection_text = str(err)
-    except aiohttp.client_exceptions.ClientConnectorError as err:
-        error_text = "Error Connecting"
-        expection_text = str(err)
-    except aiohttp.http_exceptions.BadHttpMessage as err:
-        error_text = "HTTP Error"
-        expection_text = str(err)
-    except proxy_errors.ProxyError as err:
-        error_text = "Proxy Error"
-        expection_text = str(err)
-    except Exception as err:
-        logger.warning(f'Unhandled error while requesting {site_name}: {err}')
-        logger.debug(err, exc_info=True)
-        error_text = "Some Error"
-        expection_text = str(err)
-
-    # TODO: return only needed information
-    return html_text, status_code, error_text, expection_text
-
-
-async def update_site_dict_from_response(sitename, site_dict, results_info, semaphore, logger, query_notify):
-    async with semaphore:
-        site_obj = site_dict[sitename]
-        future = site_obj.request_future
-        if not future:
-            # ignore: search by incompatible id type
-            return
-
-        response = await get_response(request_future=future,
-                                      site_name=sitename,
-                                      logger=logger)
-
-        site_dict[sitename] = process_site_result(response, query_notify, logger, results_info, site_obj)
+    except asyncio.TimeoutError as e:
+        error = CheckError("Request timeout", str(e))
+    except aiohttp.client_exceptions.ClientConnectorError as e:
+        error = CheckError("Connecting failure", str(e))
+    except aiohttp.http_exceptions.BadHttpMessage as e:
+        error = CheckError("HTTP", str(e))
+    except proxy_errors.ProxyError as e:
+        error = CheckError("Proxy", str(e))
+    except KeyboardInterrupt:
+        error = CheckError("Interrupted")
+    except Exception as e:
+        # python-specific exceptions
+        if sys.version_info.minor > 6:
+            if isinstance(e, ssl.SSLCertVerificationError) or isinstance(
+                e, ssl.SSLError
+            ):
+                error = CheckError("SSL", str(e))
+        else:
+            logger.debug(e, exc_info=True)
+            error = CheckError("Unexpected", str(e))

+    return str(html_text), status_code, error


 # TODO: move to separate class
-def detect_error_page(html_text, status_code, fail_flags, ignore_403):
+def detect_error_page(
+    html_text, status_code, fail_flags, ignore_403
+) -> Optional[CheckError]:
     # Detect service restrictions such as a country restriction
     for flag, msg in fail_flags.items():
         if flag in html_text:
-            return 'Some site error', msg
+            return CheckError("Site-specific", msg)

     # Detect common restrictions such as provider censorship and bot protection
-    for flag, msg in common_errors.items():
-        if flag in html_text:
-            return 'Error', msg
+    err = errors.detect(html_text)
+    if err:
+        return err

     # Detect common site errors
     if status_code == 403 and not ignore_403:
-        return 'Access denied', 'Access denied, use proxy/vpn'
+        return CheckError("Access denied", "403 status code, use proxy/vpn")

     elif status_code >= 500:
-        return f'Error {status_code}', f'Site error {status_code}'
+        return CheckError("Server", f"{status_code} status code")

-    return None, None
+    return None


-def process_site_result(response, query_notify, logger, results_info, site: MaigretSite):
+def process_site_result(
+    response, query_notify, logger, results_info: QueryResultWrapper, site: MaigretSite
+):
     if not response:
         return results_info

     fulltags = site.tags

     # Retrieve other site information again
-    username = results_info['username']
-    is_parsing_enabled = results_info['parsing_enabled']
+    username = results_info["username"]
+    is_parsing_enabled = results_info["parsing_enabled"]
     url = results_info.get("url_user")
     logger.debug(url)
@@ -139,42 +131,47 @@ def process_site_result(response, query_notify, logger, results_info, site: Maig
     # Get the expected check type
     check_type = site.check_type

-    # Get the failure messages and comments
-    failure_errors = site.errors
-
     # TODO: refactor
     if not response:
-        logger.error(f'No response for {site.name}')
+        logger.error(f"No response for {site.name}")
        return results_info

-    html_text, status_code, error_text, expection_text = response
-    site_error_text = '?'
+    html_text, status_code, check_error = response

     # TODO: add elapsed request time counting
     response_time = None

     if logger.level == logging.DEBUG:
-        with open('debug.txt', 'a') as f:
-            status = status_code or 'No response'
-            f.write(f'url: {url}\nerror: {str(error_text)}\nr: {status}\n')
+        with open("debug.txt", "a") as f:
+            status = status_code or "No response"
+            f.write(f"url: {url}\nerror: {check_error}\nr: {status}\n")
             if html_text:
-                f.write(f'code: {status}\nresponse: {str(html_text)}\n')
+                f.write(f"code: {status}\nresponse: {str(html_text)}\n")

-    if status_code and not error_text:
-        error_text, site_error_text = detect_error_page(html_text, status_code, failure_errors,
-                                                        site.ignore_403)
+    # additional check for errors
+    if status_code and not check_error:
+        check_error = detect_error_page(
+            html_text, status_code, site.errors, site.ignore403
+        )

     if site.activation and html_text:
-        is_need_activation = any([s for s in site.activation['marks'] if s in html_text])
+        is_need_activation = any(
+            [s for s in site.activation["marks"] if s in html_text]
+        )
         if is_need_activation:
-            method = site.activation['method']
+            method = site.activation["method"]
             try:
                 activate_fun = getattr(ParsingActivator(), method)
                 # TODO: async call
                 activate_fun(site, logger)
             except AttributeError:
-                logger.warning(f'Activation method {method} for site {site.name} not found!')
+                logger.warning(
+                    f"Activation method {method} for site {site.name} not found!"
+                )
+            except Exception as e:
+                logger.warning(f"Failed activation {method} for site {site.name}: {e}")

+    site_name = site.pretty_name
     # presense flags
     # True by default
     presense_flags = site.presense_strs
@@ -182,55 +179,53 @@ def process_site_result(response, query_notify, logger, results_info, site: Maig
     if html_text:
         if not presense_flags:
             is_presense_detected = True
-            site.stats['presense_flag'] = None
+            site.stats["presense_flag"] = None
         else:
             for presense_flag in presense_flags:
                 if presense_flag in html_text:
                     is_presense_detected = True
-                    site.stats['presense_flag'] = presense_flag
-                    logger.info(presense_flag)
+                    site.stats["presense_flag"] = presense_flag
+                    logger.debug(presense_flag)
                     break

-    if error_text is not None:
-        logger.debug(error_text)
-        result = QueryResult(username,
-                             site.name,
-                             url,
-                             QueryStatus.UNKNOWN,
-                             query_time=response_time,
-                             context=f'{error_text}: {site_error_text}', tags=fulltags)
+    def build_result(status, **kwargs):
+        return QueryResult(
+            username,
+            site_name,
+            url,
+            status,
+            query_time=response_time,
+            tags=fulltags,
+            **kwargs,
+        )
+
+    if check_error:
+        logger.debug(check_error)
+        result = QueryResult(
+            username,
+            site_name,
+            url,
+            QueryStatus.UNKNOWN,
+            query_time=response_time,
+            error=check_error,
+            context=str(CheckError),
+            tags=fulltags,
+        )
     elif check_type == "message":
-        absence_flags = site.absence_strs
-        is_absence_flags_list = isinstance(absence_flags, list)
-        absence_flags_set = set(absence_flags) if is_absence_flags_list else {absence_flags}
         # Checks if the error message is in the HTML
-        is_absence_detected = any([(absence_flag in html_text) for absence_flag in absence_flags_set])
+        is_absence_detected = any(
+            [(absence_flag in html_text) for absence_flag in site.absence_strs]
+        )
         if not is_absence_detected and is_presense_detected:
-            result = QueryResult(username,
-                                 site.name,
-                                 url,
-                                 QueryStatus.CLAIMED,
-                                 query_time=response_time, tags=fulltags)
+            result = build_result(QueryStatus.CLAIMED)
         else:
-            result = QueryResult(username,
-                                 site.name,
-                                 url,
-                                 QueryStatus.AVAILABLE,
-                                 query_time=response_time, tags=fulltags)
+            result = build_result(QueryStatus.AVAILABLE)
     elif check_type == "status_code":
         # Checks if the status code of the response is 2XX
-        if (not status_code >= 300 or status_code < 200) and is_presense_detected:
-            result = QueryResult(username,
-                                 site.name,
-                                 url,
-                                 QueryStatus.CLAIMED,
-                                 query_time=response_time, tags=fulltags)
+        if is_presense_detected and (not status_code >= 300 or status_code < 200):
+            result = build_result(QueryStatus.CLAIMED)
         else:
-            result = QueryResult(username,
-                                 site.name,
-                                 url,
-                                 QueryStatus.AVAILABLE,
-                                 query_time=response_time, tags=fulltags)
+            result = build_result(QueryStatus.AVAILABLE)
     elif check_type == "response_url":
         # For this detection method, we have turned off the redirect.
         # So, there is no need to check the response URL: it will always
@@ -238,21 +233,14 @@ def process_site_result(response, query_notify, logger, results_info, site: Maig
         # code indicates that the request was successful (i.e. no 404, or
         # forward to some odd redirect).
         if 200 <= status_code < 300 and is_presense_detected:
-            result = QueryResult(username,
-                                 site.name,
-                                 url,
-                                 QueryStatus.CLAIMED,
-                                 query_time=response_time, tags=fulltags)
+            result = build_result(QueryStatus.CLAIMED)
         else:
-            result = QueryResult(username,
-                                 site.name,
-                                 url,
-                                 QueryStatus.AVAILABLE,
-                                 query_time=response_time, tags=fulltags)
+            result = build_result(QueryStatus.AVAILABLE)
     else:
         # It should be impossible to ever get here...
-        raise ValueError(f"Unknown check type '{check_type}' for "
-                         f"site '{site.name}'")
+        raise ValueError(
+            f"Unknown check type '{check_type}' for " f"site '{site.name}'"
+        )

     extracted_ids_data = {}
@@ -260,54 +248,230 @@ def process_site_result(response, query_notify, logger, results_info, site: Maig
        try:
            extracted_ids_data = extract(html_text)
        except Exception as e:
-            logger.warning(f'Error while parsing {site.name}: {e}', exc_info=True)
+            logger.warning(f"Error while parsing {site.name}: {e}", exc_info=True)

        if extracted_ids_data:
            new_usernames = {}
            for k, v in extracted_ids_data.items():
-                if 'username' in k:
-                    new_usernames[v] = 'username'
+                if "username" in k:
+                    new_usernames[v] = "username"
                if k in supported_recursive_search_ids:
                    new_usernames[v] = k

-            results_info['ids_usernames'] = new_usernames
-            results_info['ids_links'] = eval(extracted_ids_data.get('links', '[]'))
+            results_info["ids_usernames"] = new_usernames
+            results_info["ids_links"] = eval(extracted_ids_data.get("links", "[]"))
            result.ids_data = extracted_ids_data

    # Notify caller about results of query.
    query_notify.update(result, site.similar_search)

    # Save status of request
-    results_info['status'] = result
+    results_info["status"] = result

    # Save results from request
-    results_info['http_status'] = status_code
-    results_info['is_similar'] = site.similar_search
+    results_info["http_status"] = status_code
+    results_info["is_similar"] = site.similar_search
    # results_site['response_text'] = html_text
-    results_info['rank'] = site.alexa_rank
+    results_info["rank"] = site.alexa_rank
    return results_info


-async def maigret(username, site_dict, query_notify, logger,
-                  proxy=None, timeout=None, recursive_search=False,
-                  id_type='username', debug=False, forced=False,
-                  max_connections=100, no_progressbar=False,
-                  cookies=None):
+def make_site_result(
+    site: MaigretSite, username: str, options: QueryOptions, logger
+) -> QueryResultWrapper:
+    results_site: QueryResultWrapper = {}
+
+    # Record URL of main site and username
+    results_site["site"] = site
+    results_site["username"] = username
+    results_site["parsing_enabled"] = options["parsing"]
+    results_site["url_main"] = site.url_main
+    results_site["cookies"] = (
+        options.get("cookie_jar")
+        and options["cookie_jar"].filter_cookies(site.url_main)
+        or None
+    )
+
+    headers = {
+        "User-Agent": get_random_user_agent(),
+    }
+
+    headers.update(site.headers)
+
+    if "url" not in site.__dict__:
+        logger.error("No URL for site %s", site.name)
+
+    # URL of user on site (if it exists)
+    url = site.url.format(
+        urlMain=site.url_main, urlSubpath=site.url_subpath, username=username
+    )
+
+    # workaround to prevent slash errors
+    url = re.sub("(?<!:)/+", "/", url)
+
+    session = options['session']
+
+    # site check is disabled
+    if site.disabled and not options['forced']:
+        logger.debug(f"Site {site.name} is disabled, skipping...")
+        results_site["status"] = QueryResult(
+            username,
+            site.name,
+            url,
+            QueryStatus.ILLEGAL,
+            error=CheckError("Check is disabled"),
+        )
+    # current username type could not be applied
+    elif site.type != options["id_type"]:
+        results_site["status"] = QueryResult(
+            username,
+            site.name,
+            url,
+            QueryStatus.ILLEGAL,
+            error=CheckError('Unsupported identifier type', f'Want "{site.type}"'),
+        )
+    # username is not allowed.
+    elif site.regex_check and re.search(site.regex_check, username) is None:
+        results_site["status"] = QueryResult(
+            username,
+            site.name,
+            url,
+            QueryStatus.ILLEGAL,
+            error=CheckError(
+                'Unsupported username format', f'Want "{site.regex_check}"'
+            ),
+        )
+        results_site["url_user"] = ""
+        results_site["http_status"] = ""
+        results_site["response_text"] = ""
+        # query_notify.update(results_site["status"])
+    else:
+        # URL of user on site (if it exists)
+        results_site["url_user"] = url
+        url_probe = site.url_probe
+        if url_probe is None:
+            # Probe URL is normal one seen by people out on the web.
+            url_probe = url
+        else:
+            # There is a special URL for probing existence separate
+            # from where the user profile normally can be found.
+            url_probe = url_probe.format(
+                urlMain=site.url_main,
+                urlSubpath=site.url_subpath,
+                username=username,
+            )
+
+        for k, v in site.get_params.items():
+            url_probe += f"&{k}={v}"
+
+        if site.check_type == "status_code" and site.request_head_only:
+            # In most cases when we are detecting by status code,
+            # it is not necessary to get the entire body: we can
+            # detect fine with just the HEAD response.
+            request_method = session.head
+        else:
+            # Either this detect method needs the content associated
+            # with the GET response, or this specific website will
+            # not respond properly unless we request the whole page.
+            request_method = session.get
+
+        if site.check_type == "response_url":
+            # Site forwards request to a different URL if username not
+            # found. Disallow the redirect so we can capture the
+            # http status from the original URL request.
+            allow_redirects = False
+        else:
+            # Allow whatever redirect that the site wants to do.
+            # The final result of the request will be what is available.
+            allow_redirects = True
+
+        future = request_method(
+            url=url_probe,
+            headers=headers,
+            allow_redirects=allow_redirects,
+            timeout=options['timeout'],
+        )
+
+        # Store future request object in the results object
+        results_site["future"] = future
+
+    return results_site
+
+
+async def check_site_for_username(
+    site, username, options: QueryOptions, logger, query_notify, *args, **kwargs
+) -> Tuple[str, QueryResultWrapper]:
+    default_result = make_site_result(site, username, options, logger)
+    future = default_result.get("future")
+    if not future:
+        return site.name, default_result
+
+    response = await get_response(request_future=future, logger=logger)
+
+    response_result = process_site_result(
+        response, query_notify, logger, default_result, site
+    )
+
+    return site.name, response_result
+
+
+async def debug_ip_request(session, logger):
+    future = session.get(url="https://icanhazip.com")
+    ip, status, check_error = await get_response(future, logger)
+    if ip:
+        logger.debug(f"My IP is: {ip.strip()}")
+    else:
+        logger.debug(f"IP requesting {check_error.type}: {check_error.desc}")
+
+
+def get_failed_sites(results: Dict[str, QueryResultWrapper]) -> List[str]:
+    sites = []
+    for sitename, r in results.items():
+        status = r.get('status', {})
+        if status and status.error:
+            if errors.is_permanent(status.error.type):
+                continue
+            sites.append(sitename)
+    return sites
+
+
+async def maigret(
+    username: str,
+    site_dict: Dict[str, MaigretSite],
+    logger,
+    query_notify=None,
+    proxy=None,
+    timeout=None,
+    is_parsing_enabled=False,
+    id_type="username",
+    debug=False,
+    forced=False,
+    max_connections=100,
+    no_progressbar=False,
+    cookies=None,
+    retries=0,
+) -> QueryResultWrapper:
     """Main search func

-    Checks for existence of username on various social media sites.
+    Checks for existence of username on certain sites.

     Keyword Arguments:
-    username               -- String indicating username that report
-                              should be created against.
-    site_dict              -- Dictionary containing all of the site data.
+    username               -- Username string will be used for search.
+    site_dict              -- Dictionary containing sites data in MaigretSite objects.
     query_notify           -- Object with base type of QueryNotify().
                               This will be used to notify the caller about
                               query results.
-    proxy                  -- String indicating the proxy URL
+    logger                 -- Standard Python logger object.
     timeout                -- Time in seconds to wait before timing out request.
                               Default is no timeout.
-    recursive_search       -- Search for other usernames in website pages & recursive search by them.
+    is_parsing_enabled     -- Extract additional info from account pages.
+    id_type                -- Type of username to search.
+                              Default is 'username', see all supported here:
+                              https://github.com/soxoj/maigret/wiki/Supported-identifier-types
+    max_connections        -- Maximum number of concurrent connections allowed.
+                              Default is 100.
+    no_progressbar         -- Displaying of ASCII progressbar during scanner.
+    cookies                -- Filename of a cookie jar file to use for each request.

     Return Value:
     Dictionary containing results from report. Key of dictionary is the name
@@ -323,153 +487,101 @@ async def maigret(username, site_dict, query_notify, logger,
     there was an HTTP error when checking for existence.
     """

-    # Notify caller that we are starting the query.
+    # notify caller that we are starting the query.
+    if not query_notify:
+        query_notify = Mock()
+
     query_notify.start(username, id_type)

-    # TODO: connector
-    connector = ProxyConnector.from_url(proxy) if proxy else aiohttp.TCPConnector(ssl=False)
-    # connector = aiohttp.TCPConnector(ssl=False)
+    # make http client session
+    connector = (
+        ProxyConnector.from_url(proxy) if proxy else aiohttp.TCPConnector(ssl=False)
+    )
     connector.verify_ssl = False

     cookie_jar = None
     if cookies:
-        logger.debug(f'Using cookies jar file {cookies}')
+        logger.debug(f"Using cookies jar file {cookies}")
         cookie_jar = await import_aiohttp_cookies(cookies)

-    session = aiohttp.ClientSession(connector=connector, trust_env=True, cookie_jar=cookie_jar)
+    session = aiohttp.ClientSession(
+        connector=connector, trust_env=True, cookie_jar=cookie_jar
+    )

     if logger.level == logging.DEBUG:
-        future = session.get(url='https://icanhazip.com')
-        ip, status, error, expection = await get_response(future, None, logger)
-        if ip:
-            logger.debug(f'My IP is: {ip.strip()}')
-        else:
-            logger.debug(f'IP requesting {error}: {expection}')
-
-    # Results from analysis of all sites
-    results_total = {}
-
-    # First create futures for all requests. This allows for the requests to run in parallel
-    for site_name, site in site_dict.items():
-
-        if site.type != id_type:
-            continue
-
-        if site.disabled and not forced:
-            logger.debug(f'Site {site.name} is disabled, skipping...')
-            continue
-
-        # Results from analysis of this specific site
-        results_site = {}
-
-        # Record URL of main site and username
-        results_site['username'] = username
-        results_site['parsing_enabled'] = recursive_search
-        results_site['url_main'] = site.url_main
-        results_site['cookies'] = cookie_jar and cookie_jar.filter_cookies(site.url_main) or None
-
-        headers = {
-            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 11.1; rv:55.0) Gecko/20100101 Firefox/55.0',
-        }
-
-        headers.update(site.headers)
-
-        if not 'url' in site.__dict__:
-            logger.error('No URL for site %s', site.name)
-        # URL of user on site (if it exists)
-        url = site.url.format(
-            urlMain=site.url_main,
-            urlSubpath=site.url_subpath,
-            username=username
-        )
-        # workaround to prevent slash errors
-        url = re.sub('(?<!:)/+', '/', url)
-
-        # Don't make request if username is invalid for the site
-        if site.regex_check and re.search(site.regex_check, username) is None:
-            # No need to do the check at the site: this user name is not allowed.
-            results_site['status'] = QueryResult(username,
-                                                 site_name,
-                                                 url,
-                                                 QueryStatus.ILLEGAL)
-            results_site["url_user"] = ""
-            results_site['http_status'] = ""
-            results_site['response_text'] = ""
-            query_notify.update(results_site['status'])
-        else:
-            # URL of user on site (if it exists)
-            results_site["url_user"] = url
-            url_probe = site.url_probe
-            if url_probe is None:
-                # Probe URL is normal one seen by people out on the web.
-                url_probe = url
-            else:
-                # There is a special URL for probing existence separate
-                # from where the user profile normally can be found.
-                url_probe = url_probe.format(
-                    urlMain=site.url_main,
-                    urlSubpath=site.url_subpath,
-                    username=username,
-                )
-
-            for k, v in site.get_params.items():
-                url_probe += f'&{k}={v}'
-
-            if site.check_type == 'status_code' and site.request_head_only:
-                # In most cases when we are detecting by status code,
-                # it is not necessary to get the entire body: we can
-                # detect fine with just the HEAD response.
-                request_method = session.head
-            else:
-                # Either this detect method needs the content associated
-                # with the GET response, or this specific website will
-                # not respond properly unless we request the whole page.
-                request_method = session.get
-
-            if site.check_type == "response_url":
-                # Site forwards request to a different URL if username not
-                # found. Disallow the redirect so we can capture the
-                # http status from the original URL request.
-                allow_redirects = False
-            else:
-                # Allow whatever redirect that the site wants to do.
-                # The final result of the request will be what is available.
-                allow_redirects = True
-
-            future = request_method(url=url_probe, headers=headers,
-                                    allow_redirects=allow_redirects,
-                                    timeout=timeout,
-                                    )
-
-            # Store future in data for access later
-            # TODO: move to separate obj
-            site.request_future = future
-
-        # Add this site's results into final dictionary with all of the other results.
-        results_total[site_name] = results_site
-
-    # TODO: move into top-level function
-
-    sem = asyncio.Semaphore(max_connections)
-
-    tasks = []
-    for sitename, result_obj in results_total.items():
-        update_site_coro = update_site_dict_from_response(sitename, site_dict, result_obj, sem, logger, query_notify)
-        future = asyncio.ensure_future(update_site_coro)
-        tasks.append(future)
-
+        await debug_ip_request(session, logger)
+
+    # setup parallel executor
+    executor: Optional[AsyncExecutor] = None
     if no_progressbar:
-        await asyncio.gather(*tasks)
+        executor = AsyncioSimpleExecutor(logger=logger)
     else:
-        for f in tqdm.asyncio.tqdm.as_completed(tasks):
-            await f
+        executor = AsyncioProgressbarQueueExecutor(
+            logger=logger, in_parallel=max_connections, timeout=timeout + 0.5
+        )
+
+    # make options objects for all the requests
+    options: QueryOptions = {}
+    options["cookies"] = cookie_jar
+    options["session"] = session
+    options["parsing"] = is_parsing_enabled
+    options["timeout"] = timeout
+    options["id_type"] = id_type
+    options["forced"] = forced
+
+    # results from analysis of all sites
+    all_results: Dict[str, QueryResultWrapper] = {}
+
+    sites = list(site_dict.keys())
+
+    attempts = retries + 1
+    while attempts:
+        tasks_dict = {}
+
+        for sitename, site in site_dict.items():
+            if sitename not in sites:
+                continue
+            default_result: QueryResultWrapper = {
+                'site': site,
+                'status': QueryResult(
+                    username,
+                    sitename,
+                    '',
+                    QueryStatus.UNKNOWN,
+                    error=CheckError('Request failed'),
+                ),
+            }
+            tasks_dict[sitename] = (
+                check_site_for_username,
+                [site, username, options, logger, query_notify],
+                {'default': (sitename, default_result)},
+            )
+
+        cur_results = await executor.run(tasks_dict.values())
+
+        # wait for executor timeout errors
+        await asyncio.sleep(1)
+
+        all_results.update(cur_results)
+
+        sites = get_failed_sites(dict(cur_results))
+        attempts -= 1
+
+        if not sites:
+            break
+
+        if attempts:
+            query_notify.warning(
+                f'Restarting checks for {len(sites)} sites... ({attempts} attempts left)'
+            )
+
+    # closing http client session
     await session.close()

-    # Notify caller that all queries are finished.
+    # notify caller that all queries are finished
     query_notify.finish()

-    return results_total
+    return all_results


 def timeout_check(value):
@@ -497,10 +609,11 @@
     return timeout


-async def site_self_check(site, logger, semaphore, db: MaigretDatabase, silent=False):
-    query_notify = Mock()
+async def site_self_check(
+    site: MaigretSite, logger, semaphore, db: MaigretDatabase, silent=False
+):
     changes = {
-        'disabled': False,
+        "disabled": False,
     }

     try:
@@ -513,29 +626,29 @@ async def site_self_check(site, logger, semaphore, db: MaigretDatabase, silent=F
         logger.error(site.__dict__)
         check_data = []

-    logger.info(f'Checking {site.name}...')
+    logger.info(f"Checking {site.name}...")

     for username, status in check_data:
         async with semaphore:
             results_dict = await maigret(
-                username,
-                {site.name: site},
-                query_notify,
-                logger,
+                username=username,
+                site_dict={site.name: site},
+                logger=logger,
                 timeout=30,
                 id_type=site.type,
                 forced=True,
                 no_progressbar=True,
+                retries=1,
             )

         # don't disable entries with other ids types
         # TODO: make normal checking
         if site.name not in results_dict:
             logger.info(results_dict)
-            changes['disabled'] = True
+            changes["disabled"] = True
             continue

-        result = results_dict[site.name]['status']
+        result = results_dict[site.name]["status"]

         site_status = result.status
@@ -544,33 +657,37 @@ async def site_self_check(site, logger, semaphore, db: MaigretDatabase, silent=F
                 msgs = site.absence_strs
                 etype = site.check_type
                 logger.warning(
-                    f'Error while searching {username} in {site.name}: {result.context}, {msgs}, type {etype}')
+                    f"Error while searching {username} in {site.name}: {result.context}, {msgs}, type {etype}"
+                )
                 # don't disable in case of available username
                 if status == QueryStatus.CLAIMED:
-                    changes['disabled'] = True
+                    changes["disabled"] = True
             elif status == QueryStatus.CLAIMED:
-                logger.warning(f'Not found `{username}` in {site.name}, must be claimed')
+                logger.warning(
+                    f"Not found `{username}` in {site.name}, must be claimed"
+                )
                 logger.info(results_dict[site.name])
-                changes['disabled'] = True
+                changes["disabled"] = True
             else:
-                logger.warning(f'Found `{username}` in {site.name}, must be available')
+                logger.warning(f"Found `{username}` in {site.name}, must be available")
                 logger.info(results_dict[site.name])
-                changes['disabled'] = True
+                changes["disabled"] = True

-    logger.info(f'Site {site.name} checking is finished')
+    logger.info(f"Site {site.name} checking is finished")

-    if changes['disabled'] != site.disabled:
-        site.disabled = changes['disabled']
+    if changes["disabled"] != site.disabled:
+        site.disabled = changes["disabled"]
         db.update_site(site)
         if not silent:
-            action = 'Disabled' if site.disabled else 'Enabled'
-            print(f'{action} site {site.name}...')
+            action = "Disabled" if site.disabled else "Enabled"
+            print(f"{action} site {site.name}...")

     return changes


-async def self_check(db: MaigretDatabase, site_data: dict, logger, silent=False,
-                     max_connections=10) -> bool:
+async def self_check(
+    db: MaigretDatabase, site_data: dict, logger, silent=False, max_connections=10
+) -> bool:
     sem = asyncio.Semaphore(max_connections)
     tasks = []
     all_sites = site_data
@@ -592,13 +709,15 @@ async def self_check(db: MaigretDatabase, site_data: dict, logger, silent=False,
|
|||||||
total_disabled = disabled_new_count - disabled_old_count
|
total_disabled = disabled_new_count - disabled_old_count
|
||||||
|
|
||||||
if total_disabled >= 0:
|
if total_disabled >= 0:
|
||||||
message = 'Disabled'
|
message = "Disabled"
|
||||||
else:
|
else:
|
||||||
message = 'Enabled'
|
message = "Enabled"
|
||||||
total_disabled *= -1
|
total_disabled *= -1
|
||||||
|
|
||||||
if not silent:
|
if not silent:
|
||||||
print(
|
print(
|
||||||
f'{message} {total_disabled} ({disabled_old_count} => {disabled_new_count}) checked sites. Run with `--info` flag to get more information')
|
f"{message} {total_disabled} ({disabled_old_count} => {disabled_new_count}) checked sites. "
|
||||||
|
"Run with `--info` flag to get more information"
|
||||||
|
)
|
||||||
|
|
||||||
return total_disabled != 0
|
return total_disabled != 0
|
||||||
|
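The keyword-only call style introduced above makes the search coroutine easy to drive from a script. A minimal sketch, mirroring the arguments `site_self_check()` passes; the username and database path are placeholders, and the `ranked_sites_dict()` keyword values just mirror the CLI code further down, so treat them as assumptions rather than the one true invocation:

```python
import asyncio
import logging

from maigret.checking import maigret
from maigret.sites import MaigretDatabase

async def demo():
    logger = logging.getLogger('maigret')
    # the bundled database; path is a placeholder for your local checkout
    db = MaigretDatabase().load_from_file('maigret/resources/data.json')
    sites = db.ranked_sites_dict(
        top=10, tags=[], names=[], disabled=False, id_type='username'
    )
    results = await maigret(
        username='soxoj',   # placeholder username
        site_dict=sites,
        logger=logger,
        timeout=30,
        retries=1,          # new in 0.1.20: re-run temporarily failed requests once
        no_progressbar=True,
    )
    for site_name, data in results.items():
        print(site_name, data['status'])

asyncio.run(demo())
```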
--- /dev/null
+++ b/maigret/errors.py
@@ -0,0 +1,115 @@
+from typing import Dict, List, Any
+
+from .result import QueryResult
+
+
+# error got as a result of completed search query
+class CheckError:
+    _type = 'Unknown'
+    _desc = ''
+
+    def __init__(self, typename, desc=''):
+        self._type = typename
+        self._desc = desc
+
+    def __str__(self):
+        if not self._desc:
+            return f'{self._type} error'
+
+        return f'{self._type} error: {self._desc}'
+
+    @property
+    def type(self):
+        return self._type
+
+    @property
+    def desc(self):
+        return self._desc
+
+
+COMMON_ERRORS = {
+    '<title>Attention Required! | Cloudflare</title>': CheckError(
+        'Captcha', 'Cloudflare'
+    ),
+    'Please stand by, while we are checking your browser': CheckError(
+        'Bot protection', 'Cloudflare'
+    ),
+    '<title>Доступ ограничен</title>': CheckError('Censorship', 'Rostelecom'),
+    'document.getElementById(\'validate_form_submit\').disabled=true': CheckError(
+        'Captcha', 'Mail.ru'
+    ),
+    'Verifying your browser, please wait...<br>DDoS Protection by</font> Blazingfast.io': CheckError(
+        'Bot protection', 'Blazingfast'
+    ),
+    '404</h1><p class="error-card__description">Мы не нашли страницу': CheckError(
+        'Resolving', 'MegaFon 404 page'
+    ),
+    'Доступ к информационному ресурсу ограничен на основании Федерального закона': CheckError(
+        'Censorship', 'MGTS'
+    ),
+    'Incapsula incident ID': CheckError('Bot protection', 'Incapsula'),
+}
+
+ERRORS_TYPES = {
+    'Captcha': 'Try to switch to another IP address or to use service cookies',
+    'Bot protection': 'Try to switch to another IP address',
+    'Censorship': 'Switch to another internet service provider',
+    'Request timeout': 'Try to increase timeout or to switch to another internet service provider',
+}
+
+TEMPORARY_ERRORS_TYPES = [
+    'Request timeout',
+    'Unknown',
+    'Request failed',
+    'Connecting failure',
+    'HTTP',
+    'Proxy',
+    'Interrupted',
+    'Connection lost',
+]
+
+THRESHOLD = 3  # percent
+
+
+def is_important(err_data):
+    return err_data['perc'] >= THRESHOLD
+
+
+def is_permanent(err_type):
+    return err_type not in TEMPORARY_ERRORS_TYPES
+
+
+def detect(text):
+    for flag, err in COMMON_ERRORS.items():
+        if flag in text:
+            return err
+    return None
+
+
+def solution_of(err_type) -> str:
+    return ERRORS_TYPES.get(err_type, '')
+
+
+def extract_and_group(search_res: dict) -> List[Dict[str, Any]]:
+    errors_counts: Dict[str, int] = {}
+    for r in search_res:
+        if r and isinstance(r, dict) and r.get('status'):
+            if not isinstance(r['status'], QueryResult):
+                continue
+
+            err = r['status'].error
+            if not err:
+                continue
+            errors_counts[err.type] = errors_counts.get(err.type, 0) + 1
+
+    counts = []
+    for err, count in sorted(errors_counts.items(), key=lambda x: x[1], reverse=True):
+        counts.append(
+            {
+                'err': err,
+                'count': count,
+                'perc': round(count / len(search_res), 2) * 100,
+            }
+        )
+
+    return counts
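Since the main module below imports this as `from . import errors`, the helpers are also usable on their own. A small sketch of the detection flow with a made-up response body:

```python
from maigret import errors

# a response snippet that trips one of the COMMON_ERRORS flags above
body = "<html><title>Attention Required! | Cloudflare</title></html>"

err = errors.detect(body)
if err:
    print(err)                            # Captcha error: Cloudflare
    print(errors.is_permanent(err.type))  # True: 'Captcha' is not a temporary type
    print(errors.solution_of(err.type))   # suggested workaround for this error type
```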
--- /dev/null
+++ b/maigret/executors.py
@@ -0,0 +1,118 @@
+import asyncio
+import time
+import tqdm
+import sys
+from typing import Iterable, Any, List
+
+from .types import QueryDraft
+
+
+def create_task_func():
+    if sys.version_info.minor > 6:
+        create_asyncio_task = asyncio.create_task
+    else:
+        loop = asyncio.get_event_loop()
+        create_asyncio_task = loop.create_task
+    return create_asyncio_task
+
+
+class AsyncExecutor:
+    def __init__(self, *args, **kwargs):
+        self.logger = kwargs['logger']
+
+    async def run(self, tasks: Iterable[QueryDraft]):
+        start_time = time.time()
+        results = await self._run(tasks)
+        self.execution_time = time.time() - start_time
+        self.logger.debug(f'Spent time: {self.execution_time}')
+        return results
+
+    async def _run(self, tasks: Iterable[QueryDraft]):
+        await asyncio.sleep(0)
+
+
+class AsyncioSimpleExecutor(AsyncExecutor):
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+
+    async def _run(self, tasks: Iterable[QueryDraft]):
+        futures = [f(*args, **kwargs) for f, args, kwargs in tasks]
+        return await asyncio.gather(*futures)
+
+
+class AsyncioProgressbarExecutor(AsyncExecutor):
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+
+    async def _run(self, tasks: Iterable[QueryDraft]):
+        futures = [f(*args, **kwargs) for f, args, kwargs in tasks]
+        results = []
+        for f in tqdm.asyncio.tqdm.as_completed(futures):
+            results.append(await f)
+        return results
+
+
+class AsyncioProgressbarSemaphoreExecutor(AsyncExecutor):
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        self.semaphore = asyncio.Semaphore(kwargs.get('in_parallel', 1))
+
+    async def _run(self, tasks: Iterable[QueryDraft]):
+        async def _wrap_query(q: QueryDraft):
+            async with self.semaphore:
+                f, args, kwargs = q
+                return await f(*args, **kwargs)
+
+        async def semaphore_gather(tasks: Iterable[QueryDraft]):
+            coros = [_wrap_query(q) for q in tasks]
+            results = []
+            for f in tqdm.asyncio.tqdm.as_completed(coros):
+                results.append(await f)
+            return results
+
+        return await semaphore_gather(tasks)
+
+
+class AsyncioProgressbarQueueExecutor(AsyncExecutor):
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        self.workers_count = kwargs.get('in_parallel', 10)
+        self.progress_func = kwargs.get('progress_func', tqdm.tqdm)
+        self.queue = asyncio.Queue(self.workers_count)
+        self.timeout = kwargs.get('timeout')
+
+    async def worker(self):
+        while True:
+            try:
+                f, args, kwargs = self.queue.get_nowait()
+            except asyncio.QueueEmpty:
+                return
+
+            query_future = f(*args, **kwargs)
+            query_task = create_task_func()(query_future)
+            try:
+                result = await asyncio.wait_for(query_task, timeout=self.timeout)
+            except asyncio.TimeoutError:
+                result = kwargs.get('default')
+
+            self.results.append(result)
+            self.progress.update(1)
+            self.queue.task_done()
+
+    async def _run(self, queries: Iterable[QueryDraft]):
+        self.results: List[Any] = []
+
+        queries_list = list(queries)
+
+        min_workers = min(len(queries_list), self.workers_count)
+
+        workers = [create_task_func()(self.worker()) for _ in range(min_workers)]
+
+        self.progress = self.progress_func(total=len(queries_list))
+        for t in queries_list:
+            await self.queue.put(t)
+        await self.queue.join()
+        for w in workers:
+            w.cancel()
+        self.progress.close()
+        return self.results
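Every executor consumes `QueryDraft` tuples of `(coroutine_function, args, kwargs)`, exactly as the `_run()` implementations above unpack them. A runnable sketch, assuming the file lands as `maigret/executors.py` (the module path is not shown in this compare view):

```python
import asyncio
import logging

from maigret.executors import AsyncioProgressbarQueueExecutor  # assumed module path

async def check_site(name: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for a real HTTP check
    return f'{name}: ok'

async def demo():
    executor = AsyncioProgressbarQueueExecutor(
        logger=logging.getLogger('example'),
        in_parallel=2,  # queue size and worker count
        timeout=5,      # per-task limit enforced via asyncio.wait_for()
    )
    drafts = [(check_site, (site,), {}) for site in ('GitHub', 'Reddit', 'Habr')]
    print(await executor.run(drafts))

asyncio.run(demo())
```

The queue-based variant differs from the semaphore one in that a slow task only occupies its own worker, and a per-task timeout falls back to `kwargs.get('default')` instead of failing the whole gather.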
+429 -215
--- a/maigret/maigret.py
+++ b/maigret/maigret.py
@@ -1,188 +1,360 @@
 """
 Maigret main module
 """
+import aiohttp
+import asyncio
+import logging
 import os
-import platform
 import sys
+import platform
 from argparse import ArgumentParser, RawDescriptionHelpFormatter
 
 import requests
-from socid_extractor import parse, __version__ as socid_version
+from socid_extractor import extract, parse, __version__ as socid_version
 
-from .checking import *
+from .checking import (
+    timeout_check,
+    supported_recursive_search_ids,
+    self_check,
+    unsupported_characters,
+    maigret,
+)
+from . import errors
 from .notify import QueryNotifyPrint
-from .report import save_csv_report, save_xmind_report, save_html_report, save_pdf_report, \
-    generate_report_context, save_txt_report, SUPPORTED_JSON_REPORT_FORMATS, check_supported_json_format, \
-    save_json_report
+from .report import (
+    save_csv_report,
+    save_xmind_report,
+    save_html_report,
+    save_pdf_report,
+    generate_report_context,
+    save_txt_report,
+    SUPPORTED_JSON_REPORT_FORMATS,
+    check_supported_json_format,
+    save_json_report,
+)
+from .sites import MaigretDatabase
 from .submit import submit_dialog
+from .utils import get_dict_ascii_tree
 
-__version__ = '0.1.14'
+__version__ = '0.1.20'
+
+
+def notify_about_errors(search_results, query_notify):
+    errs = errors.extract_and_group(search_results.values())
+    was_errs_displayed = False
+    for e in errs:
+        if not errors.is_important(e):
+            continue
+        text = f'Too many errors of type "{e["err"]}" ({e["perc"]}%)'
+        solution = errors.solution_of(e['err'])
+        if solution:
+            text = '. '.join([text, solution])
+
+        query_notify.warning(text, '!')
+        was_errs_displayed = True
+
+    if was_errs_displayed:
+        query_notify.warning(
+            'You can see detailed site check errors with a flag `--print-errors`'
+        )
+
+
+def setup_arguments_parser():
+    version_string = '\n'.join(
+        [
+            f'%(prog)s {__version__}',
+            f'Socid-extractor: {socid_version}',
+            f'Aiohttp: {aiohttp.__version__}',
+            f'Requests: {requests.__version__}',
+            f'Python: {platform.python_version()}',
+        ]
+    )
+
+    parser = ArgumentParser(
+        formatter_class=RawDescriptionHelpFormatter,
+        description=f"Maigret v{__version__}",
+    )
+    parser.add_argument(
+        "--version",
+        action="version",
+        version=version_string,
+        help="Display version information and dependencies.",
+    )
+    parser.add_argument(
+        "--info",
+        "-vv",
+        action="store_true",
+        dest="info",
+        default=False,
+        help="Display service information.",
+    )
+    parser.add_argument(
+        "--verbose",
+        "-v",
+        action="store_true",
+        dest="verbose",
+        default=False,
+        help="Display extra information and metrics.",
+    )
+    parser.add_argument(
+        "-d",
+        "--debug",
+        "-vvv",
+        action="store_true",
+        dest="debug",
+        default=False,
+        help="Saving debugging information and sites responses in debug.txt.",
+    )
+    parser.add_argument(
+        "--site",
+        action="append",
+        metavar='SITE_NAME',
+        dest="site_list",
+        default=[],
+        help="Limit analysis to just the listed sites (use several times to specify more than one)",
+    )
+    parser.add_argument(
+        "--proxy",
+        "-p",
+        metavar='PROXY_URL',
+        action="store",
+        dest="proxy",
+        default=None,
+        help="Make requests over a proxy. e.g. socks5://127.0.0.1:1080",
+    )
+    parser.add_argument(
+        "--db",
+        metavar="DB_FILE",
+        dest="db_file",
+        default=None,
+        help="Load Maigret database from a JSON file or an online, valid, JSON file.",
+    )
+    parser.add_argument(
+        "--cookies-jar-file",
+        metavar="COOKIE_FILE",
+        dest="cookie_file",
+        default=None,
+        help="File with cookies.",
+    )
+    parser.add_argument(
+        "--timeout",
+        action="store",
+        metavar='TIMEOUT',
+        dest="timeout",
+        type=timeout_check,
+        default=30,
+        help="Time (in seconds) to wait for response to requests. "
+        "Default timeout of 30.0s. "
+        "A longer timeout will be more likely to get results from slow sites. "
+        "On the other hand, this may cause a long delay to gather all results. ",
+    )
+    parser.add_argument(
+        "--retries",
+        action="store",
+        type=int,
+        metavar='RETRIES',
+        default=1,
+        help="Attempts to restart temporary failed requests.",
+    )
+    parser.add_argument(
+        "-n",
+        "--max-connections",
+        action="store",
+        type=int,
+        dest="connections",
+        default=100,
+        help="Allowed number of concurrent connections.",
+    )
+    parser.add_argument(
+        "-a",
+        "--all-sites",
+        action="store_true",
+        dest="all_sites",
+        default=False,
+        help="Use all sites for scan.",
+    )
+    parser.add_argument(
+        "--top-sites",
+        action="store",
+        default=500,
+        type=int,
+        help="Count of sites for scan ranked by Alexa Top (default: 500).",
+    )
+    parser.add_argument(
+        "--print-not-found",
+        action="store_true",
+        dest="print_not_found",
+        default=False,
+        help="Print sites where the username was not found.",
+    )
+    parser.add_argument(
+        "--print-errors",
+        action="store_true",
+        dest="print_check_errors",
+        default=False,
+        help="Print errors messages: connection, captcha, site country ban, etc.",
+    )
+    parser.add_argument(
+        "--submit",
+        metavar='EXISTING_USER_URL',
+        type=str,
+        dest="new_site_to_submit",
+        default=False,
+        help="URL of existing profile in new site to submit.",
+    )
+    parser.add_argument(
+        "--no-color",
+        action="store_true",
+        dest="no_color",
+        default=False,
+        help="Don't color terminal output",
+    )
+    parser.add_argument(
+        "--no-progressbar",
+        action="store_true",
+        dest="no_progressbar",
+        default=False,
+        help="Don't show progressbar.",
+    )
+    parser.add_argument(
+        "--browse",
+        "-b",
+        action="store_true",
+        dest="browse",
+        default=False,
+        help="Browse to all results on default bowser.",
+    )
+    parser.add_argument(
+        "--no-recursion",
+        action="store_true",
+        dest="disable_recursive_search",
+        default=False,
+        help="Disable recursive search by additional data extracted from pages.",
+    )
+    parser.add_argument(
+        "--no-extracting",
+        action="store_true",
+        dest="disable_extracting",
+        default=False,
+        help="Disable parsing pages for additional data and other usernames.",
+    )
+    parser.add_argument(
+        "--self-check",
+        action="store_true",
+        default=False,
+        help="Do self check for sites and database and disable non-working ones.",
+    )
+    parser.add_argument(
+        "--stats", action="store_true", default=False, help="Show database statistics."
+    )
+    parser.add_argument(
+        "--use-disabled-sites",
+        action="store_true",
+        default=False,
+        help="Use disabled sites to search (may cause many false positives).",
+    )
+    parser.add_argument(
+        "--parse",
+        dest="parse_url",
+        default='',
+        help="Parse page by URL and extract username and IDs to use for search.",
+    )
+    parser.add_argument(
+        "--id-type",
+        dest="id_type",
+        default='username',
+        help="Specify identifier(s) type (default: username).",
+    )
+    parser.add_argument(
+        "--ignore-ids",
+        action="append",
+        metavar='IGNORED_IDS',
+        dest="ignore_ids_list",
+        default=[],
+        help="Do not make search by the specified username or other ids.",
+    )
+    parser.add_argument(
+        "username",
+        nargs='+',
+        metavar='USERNAMES',
+        action="store",
+        help="One or more usernames to check with social networks.",
+    )
+    parser.add_argument(
+        "--tags", dest="tags", default='', help="Specify tags of sites."
+    )
+    # reports options
+    parser.add_argument(
+        "--folderoutput",
+        "-fo",
+        dest="folderoutput",
+        default="reports",
+        help="If using multiple usernames, the output of the results will be saved to this folder.",
+    )
+    parser.add_argument(
+        "-T",
+        "--txt",
+        action="store_true",
+        dest="txt",
+        default=False,
+        help="Create a TXT report (one report per username).",
+    )
+    parser.add_argument(
+        "-C",
+        "--csv",
+        action="store_true",
+        dest="csv",
+        default=False,
+        help="Create a CSV report (one report per username).",
+    )
+    parser.add_argument(
+        "-H",
+        "--html",
+        action="store_true",
+        dest="html",
+        default=False,
+        help="Create an HTML report file (general report on all usernames).",
+    )
+    parser.add_argument(
+        "-X",
+        "--xmind",
+        action="store_true",
+        dest="xmind",
+        default=False,
+        help="Generate an XMind 8 mindmap report (one report per username).",
+    )
+    parser.add_argument(
+        "-P",
+        "--pdf",
+        action="store_true",
+        dest="pdf",
+        default=False,
+        help="Generate a PDF report (general report on all usernames).",
+    )
+    parser.add_argument(
+        "-J",
+        "--json",
+        action="store",
+        metavar='REPORT_TYPE',
+        dest="json",
+        default='',
+        type=check_supported_json_format,
+        help=f"Generate a JSON report of specific type: {', '.join(SUPPORTED_JSON_REPORT_FORMATS)}"
+        " (one report per username).",
+    )
+    return parser
+
+
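A quick way to see the new 0.1.20 flags wired up above, before the `main()` diff continues below: feed `setup_arguments_parser()` a sample argv instead of the real command line. The username is a placeholder; the dest names are the ones declared in the diff:

```python
from maigret.maigret import setup_arguments_parser

parser = setup_arguments_parser()
args = parser.parse_args(
    ['placeholder_user', '--retries', '3', '--no-progressbar', '--no-extracting']
)
print(args.retries)             # 3
print(args.no_progressbar)      # True
print(args.disable_extracting)  # True (dest of --no-extracting)
```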
 async def main():
-    version_string = '\n'.join([
-        f'%(prog)s {__version__}',
-        f'Socid-extractor: {socid_version}',
-        f'Aiohttp: {aiohttp.__version__}',
-        f'Requests: {requests.__version__}',
-        f'Python: {platform.python_version()}',
-    ])
-
-    parser = ArgumentParser(formatter_class=RawDescriptionHelpFormatter,
-                            description=f"Maigret v{__version__}"
-                            )
-    parser.add_argument("--version",
-                        action="version", version=version_string,
-                        help="Display version information and dependencies."
-                        )
-    parser.add_argument("--info", "-vv",
-                        action="store_true", dest="info", default=False,
-                        help="Display service information."
-                        )
-    parser.add_argument("--verbose", "-v",
-                        action="store_true", dest="verbose", default=False,
-                        help="Display extra information and metrics."
-                        )
-    parser.add_argument("-d", "--debug", "-vvv",
-                        action="store_true", dest="debug", default=False,
-                        help="Saving debugging information and sites responses in debug.txt."
-                        )
-    parser.add_argument("--site",
-                        action="append", metavar='SITE_NAME',
-                        dest="site_list", default=[],
-                        help="Limit analysis to just the listed sites (use several times to specify more than one)"
-                        )
-    parser.add_argument("--proxy", "-p", metavar='PROXY_URL',
-                        action="store", dest="proxy", default=None,
-                        help="Make requests over a proxy. e.g. socks5://127.0.0.1:1080"
-                        )
-    parser.add_argument("--db", metavar="DB_FILE",
-                        dest="db_file", default=None,
-                        help="Load Maigret database from a JSON file or an online, valid, JSON file.")
-    parser.add_argument("--cookies-jar-file", metavar="COOKIE_FILE",
-                        dest="cookie_file", default=None,
-                        help="File with cookies.")
-    parser.add_argument("--timeout",
-                        action="store", metavar='TIMEOUT',
-                        dest="timeout", type=timeout_check, default=10,
-                        help="Time (in seconds) to wait for response to requests."
-                             "Default timeout of 10.0s. "
-                             "A longer timeout will be more likely to get results from slow sites."
-                             "On the other hand, this may cause a long delay to gather all results."
-                        )
-    parser.add_argument("-n", "--max-connections",
-                        action="store", type=int,
-                        dest="connections", default=100,
-                        help="Allowed number of concurrent connections."
-                        )
-    parser.add_argument("-a", "--all-sites",
-                        action="store_true", dest="all_sites", default=False,
-                        help="Use all sites for scan."
-                        )
-    parser.add_argument("--top-sites",
-                        action="store", default=500, type=int,
-                        help="Count of sites for scan ranked by Alexa Top (default: 500)."
-                        )
-    parser.add_argument("--print-not-found",
-                        action="store_true", dest="print_not_found", default=False,
-                        help="Print sites where the username was not found."
-                        )
-    parser.add_argument("--print-errors",
-                        action="store_true", dest="print_check_errors", default=False,
-                        help="Print errors messages: connection, captcha, site country ban, etc."
-                        )
-    parser.add_argument("--submit", metavar='EXISTING_USER_URL',
-                        type=str, dest="new_site_to_submit", default=False,
-                        help="URL of existing profile in new site to submit."
-                        )
-    parser.add_argument("--no-color",
-                        action="store_true", dest="no_color", default=False,
-                        help="Don't color terminal output"
-                        )
-    parser.add_argument("--browse", "-b",
-                        action="store_true", dest="browse", default=False,
-                        help="Browse to all results on default bowser."
-                        )
-    parser.add_argument("--no-recursion",
-                        action="store_true", dest="disable_recursive_search", default=False,
-                        help="Disable parsing pages for other usernames and recursive search by them."
-                        )
-    parser.add_argument("--self-check",
-                        action="store_true", default=False,
-                        help="Do self check for sites and database and disable non-working ones."
-                        )
-    parser.add_argument("--stats",
-                        action="store_true", default=False,
-                        help="Show database statistics."
-                        )
-    parser.add_argument("--use-disabled-sites",
-                        action="store_true", default=False,
-                        help="Use disabled sites to search (may cause many false positives)."
-                        )
-    parser.add_argument("--parse",
-                        dest="parse_url", default='',
-                        help="Parse page by URL and extract username and IDs to use for search."
-                        )
-    parser.add_argument("--id-type",
-                        dest="id_type", default='username',
-                        help="Specify identifier(s) type (default: username)."
-                        )
-    parser.add_argument("--ignore-ids",
-                        action="append", metavar='IGNORED_IDS',
-                        dest="ignore_ids_list", default=[],
-                        help="Do not make search by the specified username or other ids."
-                        )
-    parser.add_argument("username",
-                        nargs='+', metavar='USERNAMES',
-                        action="store",
-                        help="One or more usernames to check with social networks."
-                        )
-    parser.add_argument("--tags",
-                        dest="tags", default='',
-                        help="Specify tags of sites."
-                        )
-    # reports options
-    parser.add_argument("--folderoutput", "-fo", dest="folderoutput", default="reports",
-                        help="If using multiple usernames, the output of the results will be saved to this folder."
-                        )
-    parser.add_argument("-T", "--txt",
-                        action="store_true", dest="txt", default=False,
-                        help="Create a TXT report (one report per username)."
-                        )
-    parser.add_argument("-C", "--csv",
-                        action="store_true", dest="csv", default=False,
-                        help="Create a CSV report (one report per username)."
-                        )
-    parser.add_argument("-H", "--html",
-                        action="store_true", dest="html", default=False,
-                        help="Create an HTML report file (general report on all usernames)."
-                        )
-    parser.add_argument("-X", "--xmind",
-                        action="store_true",
-                        dest="xmind", default=False,
-                        help="Generate an XMind 8 mindmap report (one report per username)."
-                        )
-    parser.add_argument("-P", "--pdf",
-                        action="store_true",
-                        dest="pdf", default=False,
-                        help="Generate a PDF report (general report on all usernames)."
-                        )
-    parser.add_argument("-J", "--json",
-                        action="store", metavar='REPORT_TYPE',
-                        dest="json", default='', type=check_supported_json_format,
-                        help=f"Generate a JSON report of specific type: {', '.join(SUPPORTED_JSON_REPORT_FORMATS)}"
-                             " (one report per username)."
-                        )
-
-    args = parser.parse_args()
+    arg_parser = setup_arguments_parser()
+    args = arg_parser.parse_args()
 
     # Logging
     log_level = logging.ERROR
     logging.basicConfig(
         format='[%(filename)s:%(lineno)d] %(levelname)-3s %(asctime)s %(message)s',
         datefmt='%H:%M:%S',
-        level=log_level
+        level=log_level,
     )
 
     if args.debug:
@@ -199,10 +371,10 @@ async def main():
     usernames = {
         u: args.id_type
         for u in args.username
-        if u not in ['-']
-        and u not in args.ignore_ids_list
+        if u not in ['-'] and u not in args.ignore_ids_list
     }
 
+    parsing_enabled = not args.disable_extracting
     recursive_search_enabled = not args.disable_recursive_search
 
     # Make prompts
@@ -210,54 +382,79 @@ async def main():
         print("Using the proxy: " + args.proxy)
 
     if args.parse_url:
-        page, _ = parse(args.parse_url, cookies_str='')
-        info = extract(page)
-        text = 'Extracted ID data from webpage: ' + ', '.join([f'{a}: {b}' for a, b in info.items()])
-        print(text)
-        for k, v in info.items():
-            if 'username' in k:
-                usernames[v] = 'username'
-            if k in supported_recursive_search_ids:
-                usernames[v] = k
+        # url, headers
+        reqs = [(args.parse_url, set())]
+        try:
+            # temporary workaround for URL mutations MVP
+            from socid_extractor import mutate_url
+
+            reqs += list(mutate_url(args.parse_url))
+        except Exception as e:
+            logger.warning(e)
+            pass
+
+        for req in reqs:
+            url, headers = req
+            print(f'Scanning webpage by URL {url}...')
+            page, _ = parse(url, cookies_str='', headers=headers)
+            info = extract(page)
+            if not info:
+                print('Nothing extracted')
+            else:
+                print(get_dict_ascii_tree(info.items(), new_line=False), ' ')
+            for k, v in info.items():
+                if 'username' in k:
+                    usernames[v] = 'username'
+                if k in supported_recursive_search_ids:
+                    usernames[v] = k
 
     if args.tags:
         args.tags = list(set(str(args.tags).split(',')))
 
     if args.db_file is None:
-        args.db_file = \
-            os.path.join(os.path.dirname(os.path.realpath(__file__)),
-                         "resources/data.json"
-                         )
+        args.db_file = os.path.join(
+            os.path.dirname(os.path.realpath(__file__)), "resources/data.json"
+        )
 
     if args.top_sites == 0 or args.all_sites:
         args.top_sites = sys.maxsize
 
     # Create notify object for query results.
-    query_notify = QueryNotifyPrint(result=None,
-                                    verbose=args.verbose,
-                                    print_found_only=not args.print_not_found,
-                                    skip_check_errors=not args.print_check_errors,
-                                    color=not args.no_color)
+    query_notify = QueryNotifyPrint(
+        result=None,
+        verbose=args.verbose,
+        print_found_only=not args.print_not_found,
+        skip_check_errors=not args.print_check_errors,
+        color=not args.no_color,
+    )
 
     # Create object with all information about sites we are aware of.
     db = MaigretDatabase().load_from_file(args.db_file)
-    get_top_sites_for_id = lambda x: db.ranked_sites_dict(top=args.top_sites, tags=args.tags,
-                                                          names=args.site_list,
-                                                          disabled=False, id_type=x)
+    get_top_sites_for_id = lambda x: db.ranked_sites_dict(
+        top=args.top_sites,
+        tags=args.tags,
+        names=args.site_list,
+        disabled=False,
+        id_type=x,
+    )
 
     site_data = get_top_sites_for_id(args.id_type)
 
     if args.new_site_to_submit:
-        is_submitted = await submit_dialog(db, args.new_site_to_submit)
+        is_submitted = await submit_dialog(
+            db, args.new_site_to_submit, args.cookie_file, logger
+        )
         if is_submitted:
             db.save_to_file(args.db_file)
 
     # Database self-checking
     if args.self_check:
         print('Maigret sites database self-checking...')
-        is_need_update = await self_check(db, site_data, logger, max_connections=args.connections)
+        is_need_update = await self_check(
+            db, site_data, logger, max_connections=args.connections
+        )
         if is_need_update:
-            if input('Do you want to save changes permanently? [yYnN]\n').lower() == 'y':
+            if input('Do you want to save changes permanently? [Yn]\n').lower() == 'y':
                 db.save_to_file(args.db_file)
                 print('Database was successfully updated.')
             else:
@@ -269,7 +466,6 @@ async def main():
 
     # Make reports folder is not exists
     os.makedirs(args.folderoutput, exist_ok=True)
-    report_path = args.folderoutput
 
     # Define one report filename template
     report_filepath_tpl = os.path.join(args.folderoutput, 'report_{username}{postfix}')
@@ -288,9 +484,13 @@ async def main():
         query_notify.warning('No sites to check, exiting!')
         sys.exit(2)
     else:
-        query_notify.warning(f'Starting a search on top {len(site_data)} sites from the Maigret database...')
+        query_notify.warning(
+            f'Starting a search on top {len(site_data)} sites from the Maigret database...'
+        )
         if not args.all_sites:
-            query_notify.warning(f'You can run search by full list of sites with flag `-a`', '!')
+            query_notify.warning(
+                'You can run search by full list of sites with flag `-a`', '!'
+            )
 
     already_checked = set()
     general_results = []
@@ -305,42 +505,53 @@ async def main():
         already_checked.add(username.lower())
 
         if username in args.ignore_ids_list:
-            query_notify.warning(f'Skip a search by username {username} cause it\'s marked as ignored.')
+            query_notify.warning(
+                f'Skip a search by username {username} cause it\'s marked as ignored.'
+            )
             continue
 
         # check for characters do not supported by sites generally
-        found_unsupported_chars = set(unsupported_characters).intersection(set(username))
+        found_unsupported_chars = set(unsupported_characters).intersection(
+            set(username)
+        )
 
         if found_unsupported_chars:
-            pretty_chars_str = ','.join(map(lambda s: f'"{s}"', found_unsupported_chars))
+            pretty_chars_str = ','.join(
+                map(lambda s: f'"{s}"', found_unsupported_chars)
+            )
             query_notify.warning(
-                f'Found unsupported URL characters: {pretty_chars_str}, skip search by username "{username}"')
+                f'Found unsupported URL characters: {pretty_chars_str}, skip search by username "{username}"'
+            )
             continue
 
         sites_to_check = get_top_sites_for_id(id_type)
 
-        results = await maigret(username,
-                                dict(sites_to_check),
-                                query_notify,
-                                proxy=args.proxy,
-                                timeout=args.timeout,
-                                recursive_search=recursive_search_enabled,
-                                id_type=id_type,
-                                debug=args.verbose,
-                                logger=logger,
-                                cookies=args.cookie_file,
-                                forced=args.use_disabled_sites,
-                                max_connections=args.connections,
-                                )
+        results = await maigret(
+            username=username,
+            site_dict=dict(sites_to_check),
+            query_notify=query_notify,
+            proxy=args.proxy,
+            timeout=args.timeout,
+            is_parsing_enabled=parsing_enabled,
+            id_type=id_type,
+            debug=args.verbose,
+            logger=logger,
+            cookies=args.cookie_file,
+            forced=args.use_disabled_sites,
+            max_connections=args.connections,
+            no_progressbar=args.no_progressbar,
+            retries=args.retries,
+        )
 
+        notify_about_errors(results, query_notify)
 
-        username_result = (username, id_type, results)
         general_results.append((username, id_type, results))
 
         # TODO: tests
         for website_name in results:
             dictionary = results[website_name]
             # TODO: fix no site data issue
-            if not dictionary:
+            if not dictionary or not recursive_search_enabled:
                 continue
 
             new_usernames = dictionary.get('ids_usernames')
@@ -371,10 +582,13 @@ async def main():
             query_notify.warning(f'TXT report for {username} saved in {filename}')
 
         if args.json:
-            filename = report_filepath_tpl.format(username=username, postfix=f'_{args.json}.json')
+            filename = report_filepath_tpl.format(
+                username=username, postfix=f'_{args.json}.json'
+            )
             save_json_report(filename, username, results, report_type=args.json)
-            query_notify.warning(f'JSON {args.json} report for {username} saved in {filename}')
+            query_notify.warning(
+                f'JSON {args.json} report for {username} saved in {filename}'
+            )
 
         # reporting for all the result
         if general_results:
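The reworked `--parse` branch above fans a single profile URL out into several candidate requests via `socid_extractor.mutate_url`. The same flow, condensed outside the CLI; the URL is a placeholder and the calls perform real network requests:

```python
from socid_extractor import parse, extract, mutate_url

url = 'https://www.reddit.com/user/example/'  # placeholder profile URL
reqs = [(url, set())]          # (url, headers) pairs, as in main() above
reqs += list(mutate_url(url))  # known URL variants of the same profile

for u, headers in reqs:
    page, _ = parse(u, cookies_str='', headers=headers)
    info = extract(page)
    if info:
        print(u, '->', info)   # IDs the CLI would feed back into the search
```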
+63 -52
--- a/maigret/notify.py
+++ b/maigret/notify.py
@@ -4,12 +4,14 @@ This module defines the objects for notifying the caller about the
 results of queries.
 """
 import sys
 
 from colorama import Fore, Style, init
 
 from .result import QueryStatus
+from .utils import get_dict_ascii_tree
 
 
-class QueryNotify():
+class QueryNotify:
     """Query Notify Object.
 
     Base class that describes methods available to notify the results of
@@ -37,7 +39,7 @@ class QueryNotify():
 
         return
 
-    def start(self, message=None, id_type='username'):
+    def start(self, message=None, id_type="username"):
         """Notify Start.
 
         Notify method for start of query. This method will be called before
@@ -114,8 +116,14 @@ class QueryNotifyPrint(QueryNotify):
     Query notify class that prints results.
     """
 
-    def __init__(self, result=None, verbose=False, print_found_only=False,
-                 skip_check_errors=False, color=True):
+    def __init__(
+        self,
+        result=None,
+        verbose=False,
+        print_found_only=False,
+        skip_check_errors=False,
+        color=True,
+    ):
         """Create Query Notify Print Object.
 
         Contains information about a specific method of notifying the results
@@ -160,38 +168,29 @@ class QueryNotifyPrint(QueryNotify):
 
         title = f"Checking {id_type}"
         if self.color:
-            print(Style.BRIGHT + Fore.GREEN + "[" +
-                  Fore.YELLOW + "*" +
-                  Fore.GREEN + f"] {title}" +
-                  Fore.WHITE + f" {message}" +
-                  Fore.GREEN + " on:")
+            print(
+                Style.BRIGHT
+                + Fore.GREEN
+                + "["
+                + Fore.YELLOW
+                + "*"
+                + Fore.GREEN
+                + f"] {title}"
+                + Fore.WHITE
+                + f" {message}"
+                + Fore.GREEN
+                + " on:"
+            )
         else:
             print(f"[*] {title} {message} on:")
 
-    def warning(self, message, symbol='-'):
-        msg = f'[{symbol}] {message}'
+    def warning(self, message, symbol="-"):
+        msg = f"[{symbol}] {message}"
         if self.color:
             print(Style.BRIGHT + Fore.YELLOW + msg)
         else:
             print(msg)
 
-    def get_additional_data_text(self, items, prepend=''):
-        text = ''
-        for num, item in enumerate(items):
-            box_symbol = '┣╸' if num != len(items) - 1 else '┗╸'
-
-            if type(item) == tuple:
-                field_name, field_value = item
-                if field_value.startswith('[\''):
-                    is_last_item = num == len(items) - 1
-                    prepend_symbols = ' ' * 3 if is_last_item else ' ┃ '
-                    field_value = self.get_additional_data_text(eval(field_value), prepend_symbols)
-                text += f'\n{prepend}{box_symbol}{field_name}: {field_value}'
-            else:
-                text += f'\n{prepend}{box_symbol} {item}'
-
-        return text
-
     def update(self, result, is_similar=False):
         """Notify Update.
 
@@ -210,18 +209,20 @@ class QueryNotifyPrint(QueryNotify):
         if not self.result.ids_data:
             ids_data_text = ""
         else:
-            ids_data_text = self.get_additional_data_text(self.result.ids_data.items(), ' ')
+            ids_data_text = get_dict_ascii_tree(self.result.ids_data.items(), " ")
 
-        def make_colored_terminal_notify(status, text, status_color, text_color, appendix):
+        def make_colored_terminal_notify(
+            status, text, status_color, text_color, appendix
+        ):
             text = [
-                f'{Style.BRIGHT}{Fore.WHITE}[{status_color}{status}{Fore.WHITE}]' +
-                f'{text_color} {text}: {Style.RESET_ALL}' +
-                f'{appendix}'
+                f"{Style.BRIGHT}{Fore.WHITE}[{status_color}{status}{Fore.WHITE}]"
+                + f"{text_color} {text}: {Style.RESET_ALL}"
+                + f"{appendix}"
             ]
-            return ''.join(text)
+            return "".join(text)
 
         def make_simple_terminal_notify(status, text, appendix):
-            return f'[{status}] {text}: {appendix}'
+            return f"[{status}] {text}: {appendix}"
 
         def make_terminal_notify(is_colored=True, *args):
             if is_colored:
@@ -234,45 +235,55 @@ class QueryNotifyPrint(QueryNotify):
         # Output to the terminal is desired.
         if result.status == QueryStatus.CLAIMED:
             color = Fore.BLUE if is_similar else Fore.GREEN
-            status = '?' if is_similar else '+'
+            status = "?" if is_similar else "+"
             notify = make_terminal_notify(
                 self.color,
-                status, result.site_name,
-                color, color,
-                result.site_url_user + ids_data_text
+                status,
+                result.site_name,
+                color,
+                color,
+                result.site_url_user + ids_data_text,
             )
         elif result.status == QueryStatus.AVAILABLE:
             if not self.print_found_only:
                 notify = make_terminal_notify(
                     self.color,
-                    '-', result.site_name,
-                    Fore.RED, Fore.YELLOW,
-                    'Not found!' + ids_data_text
+                    "-",
+                    result.site_name,
+                    Fore.RED,
+                    Fore.YELLOW,
+                    "Not found!" + ids_data_text,
                 )
         elif result.status == QueryStatus.UNKNOWN:
             if not self.skip_check_errors:
                 notify = make_terminal_notify(
                     self.color,
-                    '?', result.site_name,
-                    Fore.RED, Fore.RED,
-                    self.result.context + ids_data_text
+                    "?",
+                    result.site_name,
+                    Fore.RED,
+                    Fore.RED,
+                    str(self.result.error) + ids_data_text,
                 )
         elif result.status == QueryStatus.ILLEGAL:
             if not self.print_found_only:
-                text = 'Illegal Username Format For This Site!'
+                text = "Illegal Username Format For This Site!"
                 notify = make_terminal_notify(
                     self.color,
-                    '-', result.site_name,
-                    Fore.RED, Fore.YELLOW,
-                    text + ids_data_text
+                    "-",
+                    result.site_name,
+                    Fore.RED,
+                    Fore.YELLOW,
+                    text + ids_data_text,
                )
        else:
            # It should be impossible to ever get here...
-            raise ValueError(f"Unknown Query Status '{str(result.status)}' for "
-                             f"site '{self.result.site_name}'")
+            raise ValueError(
+                f"Unknown Query Status '{str(result.status)}' for "
+                f"site '{self.result.site_name}'"
+            )
 
        if notify:
-            sys.stdout.write('\x1b[1K\r')
+            sys.stdout.write("\x1b[1K\r")
            print(notify)
 
        return
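With the constructor now spelled one keyword per line, the printer is simple to instantiate standalone. A minimal sketch using only arguments and methods visible in this diff:

```python
from maigret.notify import QueryNotifyPrint

notifier = QueryNotifyPrint(
    result=None,
    verbose=False,
    print_found_only=True,   # suppress AVAILABLE ("Not found!") lines
    skip_check_errors=True,  # suppress UNKNOWN (check error) lines
    color=False,             # plain text instead of colorama styling
)
notifier.warning('Starting a demo search', symbol='!')  # prints: [!] Starting a demo search
```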
+125 -103
--- a/maigret/report.py
+++ b/maigret/report.py
@@ -1,90 +1,101 @@
 import csv
-import json
 import io
+import json
 import logging
 import os
+from argparse import ArgumentTypeError
+from datetime import datetime
+from typing import Dict, Any
+
 import pycountry
 import xmind
-from datetime import datetime
+from dateutil.parser import parse as parse_datetime_str
 from jinja2 import Template
 from xhtml2pdf import pisa
-from argparse import ArgumentTypeError
-from dateutil.parser import parse as parse_datetime_str
 
 from .result import QueryStatus
 from .utils import is_country_tag, CaseConverter, enrich_link_str
 
 SUPPORTED_JSON_REPORT_FORMATS = [
-    'simple',
-    'ndjson',
+    "simple",
+    "ndjson",
 ]
 
-'''
+"""
 UTILS
-'''
+"""
 
 
 def filter_supposed_data(data):
-    ### interesting fields
-    allowed_fields = ['fullname', 'gender', 'location', 'age']
-    filtered_supposed_data = {CaseConverter.snake_to_title(k): v[0]
-                              for k, v in data.items()
-                              if k in allowed_fields}
+    # interesting fields
+    allowed_fields = ["fullname", "gender", "location", "age"]
+    filtered_supposed_data = {
+        CaseConverter.snake_to_title(k): v[0]
+        for k, v in data.items()
+        if k in allowed_fields
+    }
     return filtered_supposed_data
 
 
-'''
+"""
 REPORTS SAVING
-'''
+"""
 
 
 def save_csv_report(filename: str, username: str, results: dict):
-    with open(filename, 'w', newline='', encoding='utf-8') as f:
+    with open(filename, "w", newline="", encoding="utf-8") as f:
         generate_csv_report(username, results, f)
 
 
 def save_txt_report(filename: str, username: str, results: dict):
-    with open(filename, 'w', encoding='utf-8') as f:
+    with open(filename, "w", encoding="utf-8") as f:
         generate_txt_report(username, results, f)
 
 
 def save_html_report(filename: str, context: dict):
     template, _ = generate_report_template(is_pdf=False)
     filled_template = template.render(**context)
-    with open(filename, 'w') as f:
+    with open(filename, "w") as f:
         f.write(filled_template)
 
 
 def save_pdf_report(filename: str, context: dict):
     template, css = generate_report_template(is_pdf=True)
     filled_template = template.render(**context)
-    with open(filename, 'w+b') as f:
+    with open(filename, "w+b") as f:
         pisa.pisaDocument(io.StringIO(filled_template), dest=f, default_css=css)
 
 
 def save_json_report(filename: str, username: str, results: dict, report_type: str):
-    with open(filename, 'w', encoding='utf-8') as f:
+    with open(filename, "w", encoding="utf-8") as f:
         generate_json_report(username, results, f, report_type=report_type)
 
 
-'''
+"""
 REPORTS GENERATING
-'''
+"""
 
 
 def generate_report_template(is_pdf: bool):
     """
     HTML/PDF template generation
     """
 
     def get_resource_content(filename):
-        return open(os.path.join(maigret_path, 'resources', filename)).read()
+        return open(os.path.join(maigret_path, "resources", filename)).read()
 
     maigret_path = os.path.dirname(os.path.realpath(__file__))
 
     if is_pdf:
-        template_content = get_resource_content('simple_report_pdf.tpl')
-        css_content = get_resource_content('simple_report_pdf.css')
+        template_content = get_resource_content("simple_report_pdf.tpl")
+        css_content = get_resource_content("simple_report_pdf.css")
     else:
-        template_content = get_resource_content('simple_report.tpl')
+        template_content = get_resource_content("simple_report.tpl")
         css_content = None
 
     template = Template(template_content)
-    template.globals['title'] = CaseConverter.snake_to_title
-    template.globals['detect_link'] = enrich_link_str
+    template.globals["title"] = CaseConverter.snake_to_title  # type: ignore
+    template.globals["detect_link"] = enrich_link_str  # type: ignore
     return template, css_content
 
 
@@ -92,15 +103,15 @@ def generate_report_context(username_results: list):
     brief_text = []
     usernames = {}
     extended_info_count = 0
-    tags = {}
-    supposed_data = {}
+    tags: Dict[str, int] = {}
+    supposed_data: Dict[str, Any] = {}
 
     first_seen = None
 
     for username, id_type, results in username_results:
         found_accounts = 0
         new_ids = []
-        usernames[username] = {'type': id_type}
+        usernames[username] = {"type": id_type}
 
         for website_name in results:
             dictionary = results[website_name]
@@ -108,16 +119,19 @@ def generate_report_context(username_results: list):
             if not dictionary:
                 continue
 
-            if dictionary.get('is_similar'):
+            if dictionary.get("is_similar"):
+                continue
+
+            status = dictionary.get("status")
+            if not status:  # FIXME: currently in case of timeout
                 continue
 
-            status = dictionary.get('status')
             if status.ids_data:
-                dictionary['ids_data'] = status.ids_data
+                dictionary["ids_data"] = status.ids_data
                 extended_info_count += 1
 
                 # detect first seen
-                created_at = status.ids_data.get('created_at')
+                created_at = status.ids_data.get("created_at")
                 if created_at:
                     if first_seen is None:
                         first_seen = created_at
@@ -127,37 +141,46 @@ def generate_report_context(username_results: list):
                         new_time = parse_datetime_str(created_at)
                         if new_time < known_time:
                             first_seen = created_at
-                    except:
-                        logging.debug('Problems with converting datetime %s/%s', first_seen, created_at)
+                    except Exception as e:
+                        logging.debug(
+                            "Problems with converting datetime %s/%s: %s",
+                            first_seen,
+                            created_at,
+                            str(e),
+                        )
 
                 for k, v in status.ids_data.items():
                     # suppose target data
-                    field = 'fullname' if k == 'name' else k
-                    if not field in supposed_data:
+                    field = "fullname" if k == "name" else k
+                    if field not in supposed_data:
                         supposed_data[field] = []
                     supposed_data[field].append(v)
                     # suppose country
-                    if k in ['country', 'locale']:
+                    if k in ["country", "locale"]:
                         try:
                             if is_country_tag(k):
                                 tag = pycountry.countries.get(alpha_2=v).alpha_2.lower()
                             else:
-                                tag = pycountry.countries.search_fuzzy(v)[0].alpha_2.lower()
+                                tag = pycountry.countries.search_fuzzy(v)[
+                                    0
+                                ].alpha_2.lower()
                             # TODO: move countries to another struct
                             tags[tag] = tags.get(tag, 0) + 1
                         except Exception as e:
-                            logging.debug('pycountry exception', exc_info=True)
+                            logging.debug(
+                                "Pycountry exception: %s", str(e), exc_info=True
+                            )
 
-            new_usernames = dictionary.get('ids_usernames')
+            new_usernames = dictionary.get("ids_usernames")
             if new_usernames:
                 for u, utype in new_usernames.items():
-                    if not u in usernames:
+                    if u not in usernames:
                         new_ids.append((u, utype))
-                        usernames[u] = {'type': utype}
+                        usernames[u] = {"type": utype}
 
             if status.status == QueryStatus.CLAIMED:
                 found_accounts += 1
-                dictionary['found'] = True
+                dictionary["found"] = True
             else:
                 continue
 
@@ -166,25 +189,24 @@ def generate_report_context(username_results: list):
             for t in status.tags:
                 tags[t] = tags.get(t, 0) + 1
 
-        brief_text.append(f'Search by {id_type} {username} returned {found_accounts} accounts.')
+        brief_text.append(
+            f"Search by {id_type} {username} returned {found_accounts} accounts."
+        )
 
         if new_ids:
             ids_list = []
             for u, t in new_ids:
                 ids_list.append(f'{u} ({t})' if t != 'username' else u)
|
ids_list.append(f"{u} ({t})" if t != "username" else u)
|
||||||
brief_text.append(f'Found target\'s other IDs: ' + ', '.join(ids_list) + '.')
|
brief_text.append("Found target's other IDs: " + ", ".join(ids_list) + ".")
|
||||||
|
|
||||||
brief_text.append(f'Extended info extracted from {extended_info_count} accounts.')
|
brief_text.append(f"Extended info extracted from {extended_info_count} accounts.")
|
||||||
|
|
||||||
|
brief = " ".join(brief_text).strip()
|
||||||
|
|
||||||
brief = ' '.join(brief_text).strip()
|
|
||||||
tuple_sort = lambda d: sorted(d, key=lambda x: x[1], reverse=True)
|
tuple_sort = lambda d: sorted(d, key=lambda x: x[1], reverse=True)
|
||||||
|
|
||||||
if 'global' in tags:
|
if "global" in tags:
|
||||||
# remove tag 'global' useless for country detection
|
# remove tag 'global' useless for country detection
|
||||||
del tags['global']
|
del tags["global"]
|
||||||
|
|
||||||
first_username = username_results[0][0]
|
first_username = username_results[0][0]
|
||||||
countries_lists = list(filter(lambda x: is_country_tag(x[0]), tags.items()))
|
countries_lists = list(filter(lambda x: is_country_tag(x[0]), tags.items()))
|
||||||
@@ -193,35 +215,33 @@ def generate_report_context(username_results: list):
|
|||||||
filtered_supposed_data = filter_supposed_data(supposed_data)
|
filtered_supposed_data = filter_supposed_data(supposed_data)
|
||||||
|
|
||||||
return {
|
return {
|
||||||
'username': first_username,
|
"username": first_username,
|
||||||
'brief': brief,
|
"brief": brief,
|
||||||
'results': username_results,
|
"results": username_results,
|
||||||
'first_seen': first_seen,
|
"first_seen": first_seen,
|
||||||
'interests_tuple_list': tuple_sort(interests_list),
|
"interests_tuple_list": tuple_sort(interests_list),
|
||||||
'countries_tuple_list': tuple_sort(countries_lists),
|
"countries_tuple_list": tuple_sort(countries_lists),
|
||||||
'supposed_data': filtered_supposed_data,
|
"supposed_data": filtered_supposed_data,
|
||||||
'generated_at': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
|
"generated_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
def generate_csv_report(username: str, results: dict, csvfile):
|
def generate_csv_report(username: str, results: dict, csvfile):
|
||||||
writer = csv.writer(csvfile)
|
writer = csv.writer(csvfile)
|
||||||
writer.writerow(['username',
|
writer.writerow(
|
||||||
'name',
|
["username", "name", "url_main", "url_user", "exists", "http_status"]
|
||||||
'url_main',
|
)
|
||||||
'url_user',
|
|
||||||
'exists',
|
|
||||||
'http_status'
|
|
||||||
]
|
|
||||||
)
|
|
||||||
for site in results:
|
for site in results:
|
||||||
writer.writerow([username,
|
writer.writerow(
|
||||||
site,
|
[
|
||||||
results[site]['url_main'],
|
username,
|
||||||
results[site]['url_user'],
|
site,
|
||||||
str(results[site]['status'].status),
|
results[site]["url_main"],
|
||||||
results[site]['http_status'],
|
results[site]["url_user"],
|
||||||
])
|
str(results[site]["status"].status),
|
||||||
|
results[site]["http_status"],
|
||||||
|
]
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
def generate_txt_report(username: str, results: dict, file):
|
def generate_txt_report(username: str, results: dict, file):
|
||||||
@@ -234,12 +254,11 @@ def generate_txt_report(username: str, results: dict, file):
|
|||||||
if dictionary.get("status").status == QueryStatus.CLAIMED:
|
if dictionary.get("status").status == QueryStatus.CLAIMED:
|
||||||
exists_counter += 1
|
exists_counter += 1
|
||||||
file.write(dictionary["url_user"] + "\n")
|
file.write(dictionary["url_user"] + "\n")
|
||||||
file.write(f'Total Websites Username Detected On : {exists_counter}')
|
file.write(f"Total Websites Username Detected On : {exists_counter}")
|
||||||
|
|
||||||
|
|
||||||
def generate_json_report(username: str, results: dict, file, report_type):
|
def generate_json_report(username: str, results: dict, file, report_type):
|
||||||
exists_counter = 0
|
is_report_per_line = report_type.startswith("ndjson")
|
||||||
is_report_per_line = report_type.startswith('ndjson')
|
|
||||||
all_json = {}
|
all_json = {}
|
||||||
|
|
||||||
for sitename in results:
|
for sitename in results:
|
||||||
@@ -249,20 +268,23 @@ def generate_json_report(username: str, results: dict, file, report_type):
|
|||||||
continue
|
continue
|
||||||
|
|
||||||
data = dict(site_result)
|
data = dict(site_result)
|
||||||
data['status'] = data['status'].json()
|
data["status"] = data["status"].json()
|
||||||
|
|
||||||
if is_report_per_line:
|
if is_report_per_line:
|
||||||
data['sitename'] = sitename
|
data["sitename"] = sitename
|
||||||
file.write(json.dumps(data)+'\n')
|
file.write(json.dumps(data) + "\n")
|
||||||
else:
|
else:
|
||||||
all_json[sitename] = data
|
all_json[sitename] = data
|
||||||
|
|
||||||
if not is_report_per_line:
|
if not is_report_per_line:
|
||||||
file.write(json.dumps(all_json))
|
file.write(json.dumps(all_json))
|
||||||
|
|
||||||
'''
|
|
||||||
|
"""
|
||||||
XMIND 8 Functions
|
XMIND 8 Functions
|
||||||
'''
|
"""
|
||||||
|
|
||||||
|
|
||||||
def save_xmind_report(filename, username, results):
|
def save_xmind_report(filename, username, results):
|
||||||
if os.path.exists(filename):
|
if os.path.exists(filename):
|
||||||
os.remove(filename)
|
os.remove(filename)
|
||||||
@@ -273,13 +295,12 @@ def save_xmind_report(filename, username, results):
|
|||||||
|
|
||||||
|
|
||||||
def design_sheet(sheet, username, results):
|
def design_sheet(sheet, username, results):
|
||||||
##all tag list
|
|
||||||
alltags = {}
|
alltags = {}
|
||||||
supposed_data = {}
|
supposed_data = {}
|
||||||
|
|
||||||
sheet.setTitle("%s Analysis"%(username))
|
sheet.setTitle("%s Analysis" % (username))
|
||||||
root_topic1 = sheet.getRootTopic()
|
root_topic1 = sheet.getRootTopic()
|
||||||
root_topic1.setTitle("%s"%(username))
|
root_topic1.setTitle("%s" % (username))
|
||||||
|
|
||||||
undefinedsection = root_topic1.addSubTopic()
|
undefinedsection = root_topic1.addSubTopic()
|
||||||
undefinedsection.setTitle("Undefined")
|
undefinedsection.setTitle("Undefined")
|
||||||
@@ -289,7 +310,7 @@ def design_sheet(sheet, username, results):
|
|||||||
dictionary = results[website_name]
|
dictionary = results[website_name]
|
||||||
|
|
||||||
if dictionary.get("status").status == QueryStatus.CLAIMED:
|
if dictionary.get("status").status == QueryStatus.CLAIMED:
|
||||||
## firsttime I found that entry
|
# firsttime I found that entry
|
||||||
for tag in dictionary.get("status").tags:
|
for tag in dictionary.get("status").tags:
|
||||||
if tag.strip() == "":
|
if tag.strip() == "":
|
||||||
continue
|
continue
|
||||||
@@ -318,22 +339,22 @@ def design_sheet(sheet, username, results):
|
|||||||
# suppose target data
|
# suppose target data
|
||||||
if not isinstance(v, list):
|
if not isinstance(v, list):
|
||||||
currentsublabel = userlink.addSubTopic()
|
currentsublabel = userlink.addSubTopic()
|
||||||
field = 'fullname' if k == 'name' else k
|
field = "fullname" if k == "name" else k
|
||||||
if not field in supposed_data:
|
if field not in supposed_data:
|
||||||
supposed_data[field] = []
|
supposed_data[field] = []
|
||||||
supposed_data[field].append(v)
|
supposed_data[field].append(v)
|
||||||
currentsublabel.setTitle("%s: %s" % (k, v))
|
currentsublabel.setTitle("%s: %s" % (k, v))
|
||||||
else:
|
else:
|
||||||
for currentval in v:
|
for currentval in v:
|
||||||
currentsublabel = userlink.addSubTopic()
|
currentsublabel = userlink.addSubTopic()
|
||||||
field = 'fullname' if k == 'name' else k
|
field = "fullname" if k == "name" else k
|
||||||
if not field in supposed_data:
|
if field not in supposed_data:
|
||||||
supposed_data[field] = []
|
supposed_data[field] = []
|
||||||
supposed_data[field].append(currentval)
|
supposed_data[field].append(currentval)
|
||||||
currentsublabel.setTitle("%s: %s" % (k, currentval))
|
currentsublabel.setTitle("%s: %s" % (k, currentval))
|
||||||
### Add Supposed DATA
|
# add supposed data
|
||||||
filterede_supposed_data = filter_supposed_data(supposed_data)
|
filterede_supposed_data = filter_supposed_data(supposed_data)
|
||||||
if(len(filterede_supposed_data) >0):
|
if len(filterede_supposed_data) > 0:
|
||||||
undefinedsection = root_topic1.addSubTopic()
|
undefinedsection = root_topic1.addSubTopic()
|
||||||
undefinedsection.setTitle("SUPPOSED DATA")
|
undefinedsection.setTitle("SUPPOSED DATA")
|
||||||
for k, v in filterede_supposed_data.items():
|
for k, v in filterede_supposed_data.items():
|
||||||
@@ -342,8 +363,9 @@ def design_sheet(sheet, username, results):
|
|||||||
|
|
||||||
|
|
||||||
def check_supported_json_format(value):
|
def check_supported_json_format(value):
|
||||||
if value and not value in SUPPORTED_JSON_REPORT_FORMATS:
|
if value and value not in SUPPORTED_JSON_REPORT_FORMATS:
|
||||||
raise ArgumentTypeError(f'JSON report type must be one of the following types: '
|
raise ArgumentTypeError(
|
||||||
+ ', '.join(SUPPORTED_JSON_REPORT_FORMATS))
|
"JSON report type must be one of the following types: "
|
||||||
|
+ ", ".join(SUPPORTED_JSON_REPORT_FORMATS)
|
||||||
|
)
|
||||||
return value
|
return value
|
||||||
|
|
||||||
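Editor's note on the generate_json_report change above: the unused exists_counter was dropped, and report_type now switches between one JSON object for the whole run and newline-delimited JSON (ndjson) with one object per site. A minimal self-contained sketch of the same pattern, using plain dicts instead of maigret's result objects (the function name here is illustrative, not part of maigret's API):

    import json

    def write_json_report(results: dict, file, report_type: str):
        # ndjson: one JSON object per line, streamable; plain json: one big object
        per_line = report_type.startswith("ndjson")
        all_json = {}
        for sitename, data in results.items():
            if per_line:
                file.write(json.dumps({"sitename": sitename, **data}) + "\n")
            else:
                all_json[sitename] = data
        if not per_line:
            file.write(json.dumps(all_json))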
+5608 -3847
File diff suppressed because it is too large
@@ -68,7 +68,7 @@
 <div class="row-mb">
   <div class="col-md">
     <div class="card flex-md-row mb-4 box-shadow h-md-250">
-      <img class="card-img-right flex-auto d-none d-md-block" alt="Photo" style="width: 200px; height: 200px; object-fit: scale-down;" src="{{ v.status.ids_data.image or 'https://i.imgur.com/040fmbw.png' }}" data-holder-rendered="true">
+      <img class="card-img-right flex-auto d-md-block" alt="Photo" style="width: 200px; height: 200px; object-fit: scale-down;" src="{{ v.status.ids_data.image or 'https://i.imgur.com/040fmbw.png' }}" data-holder-rendered="true">
       <div class="card-body d-flex flex-column align-items-start" style="padding-top: 0;">
         <h3 class="mb-0" style="padding-top: 1rem;">
           <a class="text-dark" href="{{ v.url_main }}" target="_blank">{{ k }}</a>
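Note on the template change above: dropping the Bootstrap d-none class means the profile photo is no longer hidden below the md breakpoint; it is now rendered at every viewport size instead of only on medium screens and up.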
+24 -9
@@ -10,6 +10,7 @@ class QueryStatus(Enum):

     Describes status of query about a given username.
     """
+
     CLAIMED = "Claimed"  # Username Detected
     AVAILABLE = "Available"  # Username Not Detected
     UNKNOWN = "Unknown"  # Error Occurred While Trying To Detect Username
@@ -27,14 +28,24 @@ class QueryStatus(Enum):
         return self.value


-class QueryResult():
+class QueryResult:
     """Query Result Object.

     Describes result of query about a given username.
     """

-    def __init__(self, username, site_name, site_url_user, status, ids_data=None,
-                 query_time=None, context=None, tags=[]):
+    def __init__(
+        self,
+        username,
+        site_name,
+        site_url_user,
+        status,
+        ids_data=None,
+        query_time=None,
+        context=None,
+        error=None,
+        tags=[],
+    ):
         """Create Query Result Object.

         Contains information about a specific method of detecting usernames on
@@ -73,17 +84,21 @@ class QueryResult():
         self.context = context
         self.ids_data = ids_data
         self.tags = tags
+        self.error = error

     def json(self):
         return {
-            'username': self.username,
-            'site_name': self.site_name,
-            'url': self.site_url_user,
-            'status': str(self.status),
-            'ids': self.ids_data or {},
-            'tags': self.tags,
+            "username": self.username,
+            "site_name": self.site_name,
+            "url": self.site_url_user,
+            "status": str(self.status),
+            "ids": self.ids_data or {},
+            "tags": self.tags,
         }

+    def is_found(self):
+        return self.status == QueryStatus.CLAIMED
+
     def __str__(self):
         """Convert Object To String.

+190 -119
-119
@@ -1,9 +1,9 @@
|
|||||||
# -*- coding: future_annotations -*-
|
# ****************************** -*-
|
||||||
"""Maigret Sites Information"""
|
"""Maigret Sites Information"""
|
||||||
import copy
|
import copy
|
||||||
import json
|
import json
|
||||||
import re
|
|
||||||
import sys
|
import sys
|
||||||
|
from typing import Optional, List, Dict, Any
|
||||||
|
|
||||||
import requests
|
import requests
|
||||||
|
|
||||||
@@ -11,18 +11,56 @@ from .utils import CaseConverter, URLMatcher, is_country_tag
|
|||||||
|
|
||||||
# TODO: move to data.json
|
# TODO: move to data.json
|
||||||
SUPPORTED_TAGS = [
|
SUPPORTED_TAGS = [
|
||||||
'gaming', 'coding', 'photo', 'music', 'blog', 'finance', 'freelance', 'dating',
|
"gaming",
|
||||||
'tech', 'forum', 'porn', 'erotic', 'webcam', 'video', 'movies', 'hacking', 'art',
|
"coding",
|
||||||
'discussion', 'sharing', 'writing', 'wiki', 'business', 'shopping', 'sport',
|
"photo",
|
||||||
'books', 'news', 'documents', 'travel', 'maps', 'hobby', 'apps', 'classified',
|
"music",
|
||||||
'career', 'geosocial', 'streaming', 'education', 'networking', 'torrent',
|
"blog",
|
||||||
|
"finance",
|
||||||
|
"freelance",
|
||||||
|
"dating",
|
||||||
|
"tech",
|
||||||
|
"forum",
|
||||||
|
"porn",
|
||||||
|
"erotic",
|
||||||
|
"webcam",
|
||||||
|
"video",
|
||||||
|
"movies",
|
||||||
|
"hacking",
|
||||||
|
"art",
|
||||||
|
"discussion",
|
||||||
|
"sharing",
|
||||||
|
"writing",
|
||||||
|
"wiki",
|
||||||
|
"business",
|
||||||
|
"shopping",
|
||||||
|
"sport",
|
||||||
|
"books",
|
||||||
|
"news",
|
||||||
|
"documents",
|
||||||
|
"travel",
|
||||||
|
"maps",
|
||||||
|
"hobby",
|
||||||
|
"apps",
|
||||||
|
"classified",
|
||||||
|
"career",
|
||||||
|
"geosocial",
|
||||||
|
"streaming",
|
||||||
|
"education",
|
||||||
|
"networking",
|
||||||
|
"torrent",
|
||||||
|
"science",
|
||||||
|
"medicine",
|
||||||
|
"reading",
|
||||||
|
"stock",
|
||||||
]
|
]
|
||||||
|
|
||||||
|
|
||||||
class MaigretEngine:
|
class MaigretEngine:
|
||||||
|
site: Dict[str, Any] = {}
|
||||||
|
|
||||||
def __init__(self, name, data):
|
def __init__(self, name, data):
|
||||||
self.name = name
|
self.name = name
|
||||||
self.site = {}
|
|
||||||
self.__dict__.update(data)
|
self.__dict__.update(data)
|
||||||
|
|
||||||
@property
|
@property
|
||||||
@@ -32,43 +70,49 @@ class MaigretEngine:
|
|||||||
|
|
||||||
class MaigretSite:
|
class MaigretSite:
|
||||||
NOT_SERIALIZABLE_FIELDS = [
|
NOT_SERIALIZABLE_FIELDS = [
|
||||||
'name',
|
"name",
|
||||||
'engineData',
|
"engineData",
|
||||||
'requestFuture',
|
"requestFuture",
|
||||||
'detectedEngine',
|
"detectedEngine",
|
||||||
'engineObj',
|
"engineObj",
|
||||||
'stats',
|
"stats",
|
||||||
'urlRegexp',
|
"urlRegexp",
|
||||||
]
|
]
|
||||||
|
|
||||||
|
username_claimed = ""
|
||||||
|
username_unclaimed = ""
|
||||||
|
url_subpath = ""
|
||||||
|
url_main = ""
|
||||||
|
url = ""
|
||||||
|
disabled = False
|
||||||
|
similar_search = False
|
||||||
|
ignore403 = False
|
||||||
|
tags: List[str] = []
|
||||||
|
|
||||||
|
type = "username"
|
||||||
|
headers: Dict[str, str] = {}
|
||||||
|
errors: Dict[str, str] = {}
|
||||||
|
activation: Dict[str, Any] = {}
|
||||||
|
regex_check = None
|
||||||
|
url_probe = None
|
||||||
|
check_type = ""
|
||||||
|
request_head_only = ""
|
||||||
|
get_params: Dict[str, Any] = {}
|
||||||
|
|
||||||
|
presense_strs: List[str] = []
|
||||||
|
absence_strs: List[str] = []
|
||||||
|
stats: Dict[str, Any] = {}
|
||||||
|
|
||||||
|
engine = None
|
||||||
|
engine_data: Dict[str, Any] = {}
|
||||||
|
engine_obj: Optional["MaigretEngine"] = None
|
||||||
|
request_future = None
|
||||||
|
alexa_rank = None
|
||||||
|
source = None
|
||||||
|
|
||||||
def __init__(self, name, information):
|
def __init__(self, name, information):
|
||||||
self.name = name
|
self.name = name
|
||||||
|
self.url_subpath = ""
|
||||||
self.disabled = False
|
|
||||||
self.similar_search = False
|
|
||||||
self.ignore_403 = False
|
|
||||||
self.tags = []
|
|
||||||
|
|
||||||
self.type = 'username'
|
|
||||||
self.headers = {}
|
|
||||||
self.errors = {}
|
|
||||||
self.activation = {}
|
|
||||||
self.url_subpath = ''
|
|
||||||
self.regex_check = None
|
|
||||||
self.url_probe = None
|
|
||||||
self.check_type = ''
|
|
||||||
self.request_head_only = ''
|
|
||||||
self.get_params = {}
|
|
||||||
|
|
||||||
self.presense_strs = []
|
|
||||||
self.absence_strs = []
|
|
||||||
self.stats = {}
|
|
||||||
|
|
||||||
self.engine = None
|
|
||||||
self.engine_data = {}
|
|
||||||
self.engine_obj = None
|
|
||||||
self.request_future = None
|
|
||||||
self.alexa_rank = None
|
|
||||||
|
|
||||||
for k, v in information.items():
|
for k, v in information.items():
|
||||||
self.__dict__[CaseConverter.camel_to_snake(k)] = v
|
self.__dict__[CaseConverter.camel_to_snake(k)] = v
|
||||||
@@ -83,23 +127,31 @@ class MaigretSite:
|
|||||||
return f"{self.name} ({self.url_main})"
|
return f"{self.name} ({self.url_main})"
|
||||||
|
|
||||||
def update_detectors(self):
|
def update_detectors(self):
|
||||||
if 'url' in self.__dict__:
|
if "url" in self.__dict__:
|
||||||
url = self.url
|
url = self.url
|
||||||
for group in ['urlMain', 'urlSubpath']:
|
for group in ["urlMain", "urlSubpath"]:
|
||||||
if group in url:
|
if group in url:
|
||||||
url = url.replace('{'+group+'}', self.__dict__[CaseConverter.camel_to_snake(group)])
|
url = url.replace(
|
||||||
|
"{" + group + "}",
|
||||||
|
self.__dict__[CaseConverter.camel_to_snake(group)],
|
||||||
|
)
|
||||||
|
|
||||||
self.url_regexp = URLMatcher.make_profile_url_regexp(url, self.regex_check)
|
self.url_regexp = URLMatcher.make_profile_url_regexp(url, self.regex_check)
|
||||||
|
|
||||||
def detect_username(self, url: str) -> str:
|
def detect_username(self, url: str) -> Optional[str]:
|
||||||
if self.url_regexp:
|
if self.url_regexp:
|
||||||
import logging
|
|
||||||
match_groups = self.url_regexp.match(url)
|
match_groups = self.url_regexp.match(url)
|
||||||
if match_groups:
|
if match_groups:
|
||||||
return match_groups.groups()[-1].rstrip('/')
|
return match_groups.groups()[-1].rstrip("/")
|
||||||
|
|
||||||
return None
|
return None
|
||||||
|
|
||||||
|
@property
|
||||||
|
def pretty_name(self):
|
||||||
|
if self.source:
|
||||||
|
return f"{self.name} [{self.source}]"
|
||||||
|
return self.name
|
||||||
|
|
||||||
@property
|
@property
|
||||||
def json(self):
|
def json(self):
|
||||||
result = {}
|
result = {}
|
||||||
@@ -107,7 +159,7 @@ class MaigretSite:
|
|||||||
# convert to camelCase
|
# convert to camelCase
|
||||||
field = CaseConverter.snake_to_camel(k)
|
field = CaseConverter.snake_to_camel(k)
|
||||||
# strip empty elements
|
# strip empty elements
|
||||||
if v in (False, '', [], {}, None, sys.maxsize, 'username'):
|
if v in (False, "", [], {}, None, sys.maxsize, "username"):
|
||||||
continue
|
continue
|
||||||
if field in self.NOT_SERIALIZABLE_FIELDS:
|
if field in self.NOT_SERIALIZABLE_FIELDS:
|
||||||
continue
|
continue
|
||||||
@@ -115,13 +167,13 @@ class MaigretSite:
|
|||||||
|
|
||||||
return result
|
return result
|
||||||
|
|
||||||
def update(self, updates: dict) -> MaigretSite:
|
def update(self, updates: "dict") -> "MaigretSite":
|
||||||
self.__dict__.update(updates)
|
self.__dict__.update(updates)
|
||||||
self.update_detectors()
|
self.update_detectors()
|
||||||
|
|
||||||
return self
|
return self
|
||||||
|
|
||||||
def update_from_engine(self, engine: MaigretEngine) -> MaigretSite:
|
def update_from_engine(self, engine: MaigretEngine) -> "MaigretSite":
|
||||||
engine_data = engine.site
|
engine_data = engine.site
|
||||||
for k, v in engine_data.items():
|
for k, v in engine_data.items():
|
||||||
field = CaseConverter.camel_to_snake(k)
|
field = CaseConverter.camel_to_snake(k)
|
||||||
@@ -139,7 +191,7 @@ class MaigretSite:
|
|||||||
|
|
||||||
return self
|
return self
|
||||||
|
|
||||||
def strip_engine_data(self) -> MaigretSite:
|
def strip_engine_data(self) -> "MaigretSite":
|
||||||
if not self.engine_obj:
|
if not self.engine_obj:
|
||||||
return self
|
return self
|
||||||
|
|
||||||
@@ -147,7 +199,7 @@ class MaigretSite:
|
|||||||
self.url_regexp = None
|
self.url_regexp = None
|
||||||
|
|
||||||
self_copy = copy.deepcopy(self)
|
self_copy = copy.deepcopy(self)
|
||||||
engine_data = self_copy.engine_obj.site
|
engine_data = self_copy.engine_obj and self_copy.engine_obj.site or {}
|
||||||
site_data_keys = list(self_copy.__dict__.keys())
|
site_data_keys = list(self_copy.__dict__.keys())
|
||||||
|
|
||||||
for k in engine_data.keys():
|
for k in engine_data.keys():
|
||||||
@@ -156,7 +208,8 @@ class MaigretSite:
|
|||||||
# remove dict keys
|
# remove dict keys
|
||||||
if isinstance(engine_data[k], dict) and is_exists:
|
if isinstance(engine_data[k], dict) and is_exists:
|
||||||
for f in engine_data[k].keys():
|
for f in engine_data[k].keys():
|
||||||
del self_copy.__dict__[field][f]
|
if f in self_copy.__dict__[field]:
|
||||||
|
del self_copy.__dict__[field][f]
|
||||||
continue
|
continue
|
||||||
# remove list items
|
# remove list items
|
||||||
if isinstance(engine_data[k], list) and is_exists:
|
if isinstance(engine_data[k], list) and is_exists:
|
||||||
@@ -183,29 +236,47 @@ class MaigretDatabase:
|
|||||||
def sites_dict(self):
|
def sites_dict(self):
|
||||||
return {site.name: site for site in self._sites}
|
return {site.name: site for site in self._sites}
|
||||||
|
|
||||||
def ranked_sites_dict(self, reverse=False, top=sys.maxsize, tags=[], names=[],
|
def ranked_sites_dict(
|
||||||
disabled=True, id_type='username'):
|
self,
|
||||||
|
reverse=False,
|
||||||
|
top=sys.maxsize,
|
||||||
|
tags=[],
|
||||||
|
names=[],
|
||||||
|
disabled=True,
|
||||||
|
id_type="username",
|
||||||
|
):
|
||||||
"""
|
"""
|
||||||
Ranking and filtering of the sites list
|
Ranking and filtering of the sites list
|
||||||
"""
|
"""
|
||||||
normalized_names = list(map(str.lower, names))
|
normalized_names = list(map(str.lower, names))
|
||||||
normalized_tags = list(map(str.lower, tags))
|
normalized_tags = list(map(str.lower, tags))
|
||||||
|
|
||||||
is_name_ok = lambda x: x.name.lower() in normalized_names
|
is_name_ok = lambda x: x.name.lower() in normalized_names
|
||||||
is_engine_ok = lambda x: isinstance(x.engine, str) and x.engine.lower() in normalized_tags
|
is_source_ok = lambda x: x.source and x.source.lower() in normalized_names
|
||||||
|
is_engine_ok = (
|
||||||
|
lambda x: isinstance(x.engine, str) and x.engine.lower() in normalized_tags
|
||||||
|
)
|
||||||
is_tags_ok = lambda x: set(x.tags).intersection(set(normalized_tags))
|
is_tags_ok = lambda x: set(x.tags).intersection(set(normalized_tags))
|
||||||
is_disabled_needed = lambda x: not x.disabled or ('disabled' in tags or disabled)
|
is_disabled_needed = lambda x: not x.disabled or (
|
||||||
|
"disabled" in tags or disabled
|
||||||
|
)
|
||||||
is_id_type_ok = lambda x: x.type == id_type
|
is_id_type_ok = lambda x: x.type == id_type
|
||||||
|
|
||||||
filter_tags_engines_fun = lambda x: not tags or is_engine_ok(x) or is_tags_ok(x)
|
filter_tags_engines_fun = lambda x: not tags or is_engine_ok(x) or is_tags_ok(x)
|
||||||
filter_names_fun = lambda x: not names or is_name_ok(x)
|
filter_names_fun = lambda x: not names or is_name_ok(x) or is_source_ok(x)
|
||||||
|
|
||||||
filter_fun = lambda x: filter_tags_engines_fun(x) and filter_names_fun(x) \
|
filter_fun = (
|
||||||
and is_disabled_needed(x) and is_id_type_ok(x)
|
lambda x: filter_tags_engines_fun(x)
|
||||||
|
and filter_names_fun(x)
|
||||||
|
and is_disabled_needed(x)
|
||||||
|
and is_id_type_ok(x)
|
||||||
|
)
|
||||||
|
|
||||||
filtered_list = [s for s in self.sites if filter_fun(s)]
|
filtered_list = [s for s in self.sites if filter_fun(s)]
|
||||||
|
|
||||||
sorted_list = sorted(filtered_list, key=lambda x: x.alexa_rank, reverse=reverse)[:top]
|
sorted_list = sorted(
|
||||||
|
filtered_list, key=lambda x: x.alexa_rank, reverse=reverse
|
||||||
|
)[:top]
|
||||||
return {site.name: site for site in sorted_list}
|
return {site.name: site for site in sorted_list}
|
||||||
|
|
||||||
@property
|
@property
|
||||||
@@ -216,7 +287,7 @@ class MaigretDatabase:
|
|||||||
def engines_dict(self):
|
def engines_dict(self):
|
||||||
return {engine.name: engine for engine in self._engines}
|
return {engine.name: engine for engine in self._engines}
|
||||||
|
|
||||||
def update_site(self, site: MaigretSite) -> MaigretDatabase:
|
def update_site(self, site: MaigretSite) -> "MaigretDatabase":
|
||||||
for s in self._sites:
|
for s in self._sites:
|
||||||
if s.name == site.name:
|
if s.name == site.name:
|
||||||
s = site
|
s = site
|
||||||
@@ -225,21 +296,20 @@ class MaigretDatabase:
|
|||||||
self._sites.append(site)
|
self._sites.append(site)
|
||||||
return self
|
return self
|
||||||
|
|
||||||
def save_to_file(self, filename: str) -> MaigretDatabase:
|
def save_to_file(self, filename: str) -> "MaigretDatabase":
|
||||||
db_data = {
|
db_data = {
|
||||||
'sites': {site.name: site.strip_engine_data().json for site in self._sites},
|
"sites": {site.name: site.strip_engine_data().json for site in self._sites},
|
||||||
'engines': {engine.name: engine.json for engine in self._engines},
|
"engines": {engine.name: engine.json for engine in self._engines},
|
||||||
}
|
}
|
||||||
|
|
||||||
json_data = json.dumps(db_data, indent=4)
|
json_data = json.dumps(db_data, indent=4)
|
||||||
|
|
||||||
with open(filename, 'w') as f:
|
with open(filename, "w") as f:
|
||||||
f.write(json_data)
|
f.write(json_data)
|
||||||
|
|
||||||
return self
|
return self
|
||||||
|
|
||||||
|
def load_from_json(self, json_data: dict) -> "MaigretDatabase":
|
||||||
def load_from_json(self, json_data: dict) -> MaigretDatabase:
|
|
||||||
# Add all of site information from the json file to internal site list.
|
# Add all of site information from the json file to internal site list.
|
||||||
site_data = json_data.get("sites", {})
|
site_data = json_data.get("sites", {})
|
||||||
engines_data = json_data.get("engines", {})
|
engines_data = json_data.get("engines", {})
|
||||||
@@ -251,32 +321,32 @@ class MaigretDatabase:
|
|||||||
try:
|
try:
|
||||||
maigret_site = MaigretSite(site_name, site_data[site_name])
|
maigret_site = MaigretSite(site_name, site_data[site_name])
|
||||||
|
|
||||||
engine = site_data[site_name].get('engine')
|
engine = site_data[site_name].get("engine")
|
||||||
if engine:
|
if engine:
|
||||||
maigret_site.update_from_engine(self.engines_dict[engine])
|
maigret_site.update_from_engine(self.engines_dict[engine])
|
||||||
|
|
||||||
self._sites.append(maigret_site)
|
self._sites.append(maigret_site)
|
||||||
except KeyError as error:
|
except KeyError as error:
|
||||||
raise ValueError(f"Problem parsing json content for site {site_name}: "
|
raise ValueError(
|
||||||
f"Missing attribute {str(error)}."
|
f"Problem parsing json content for site {site_name}: "
|
||||||
)
|
f"Missing attribute {str(error)}."
|
||||||
|
)
|
||||||
|
|
||||||
return self
|
return self
|
||||||
|
|
||||||
|
def load_from_str(self, db_str: "str") -> "MaigretDatabase":
|
||||||
def load_from_str(self, db_str: str) -> MaigretDatabase:
|
|
||||||
try:
|
try:
|
||||||
data = json.loads(db_str)
|
data = json.loads(db_str)
|
||||||
except Exception as error:
|
except Exception as error:
|
||||||
raise ValueError(f"Problem parsing json contents from str"
|
raise ValueError(
|
||||||
f"'{db_str[:50]}'...: {str(error)}."
|
f"Problem parsing json contents from str"
|
||||||
)
|
f"'{db_str[:50]}'...: {str(error)}."
|
||||||
|
)
|
||||||
|
|
||||||
return self.load_from_json(data)
|
return self.load_from_json(data)
|
||||||
|
|
||||||
|
def load_from_url(self, url: str) -> "MaigretDatabase":
|
||||||
def load_from_url(self, url: str) -> MaigretDatabase:
|
is_url_valid = url.startswith("http://") or url.startswith("https://")
|
||||||
is_url_valid = url.startswith('http://') or url.startswith('https://')
|
|
||||||
|
|
||||||
if not is_url_valid:
|
if not is_url_valid:
|
||||||
raise FileNotFoundError(f"Invalid data file URL '{url}'.")
|
raise FileNotFoundError(f"Invalid data file URL '{url}'.")
|
||||||
@@ -284,39 +354,40 @@ class MaigretDatabase:
|
|||||||
try:
|
try:
|
||||||
response = requests.get(url=url)
|
response = requests.get(url=url)
|
||||||
except Exception as error:
|
except Exception as error:
|
||||||
raise FileNotFoundError(f"Problem while attempting to access "
|
raise FileNotFoundError(
|
||||||
f"data file URL '{url}': "
|
f"Problem while attempting to access "
|
||||||
f"{str(error)}"
|
f"data file URL '{url}': "
|
||||||
)
|
f"{str(error)}"
|
||||||
|
)
|
||||||
|
|
||||||
if response.status_code == 200:
|
if response.status_code == 200:
|
||||||
try:
|
try:
|
||||||
data = response.json()
|
data = response.json()
|
||||||
except Exception as error:
|
except Exception as error:
|
||||||
raise ValueError(f"Problem parsing json contents at "
|
raise ValueError(
|
||||||
f"'{url}': {str(error)}."
|
f"Problem parsing json contents at " f"'{url}': {str(error)}."
|
||||||
)
|
)
|
||||||
else:
|
else:
|
||||||
raise FileNotFoundError(f"Bad response while accessing "
|
raise FileNotFoundError(
|
||||||
f"data file URL '{url}'."
|
f"Bad response while accessing " f"data file URL '{url}'."
|
||||||
)
|
)
|
||||||
|
|
||||||
return self.load_from_json(data)
|
return self.load_from_json(data)
|
||||||
|
|
||||||
|
def load_from_file(self, filename: "str") -> "MaigretDatabase":
|
||||||
def load_from_file(self, filename: str) -> MaigretDatabase:
|
|
||||||
try:
|
try:
|
||||||
with open(filename, 'r', encoding='utf-8') as file:
|
with open(filename, "r", encoding="utf-8") as file:
|
||||||
try:
|
try:
|
||||||
data = json.load(file)
|
data = json.load(file)
|
||||||
except Exception as error:
|
except Exception as error:
|
||||||
raise ValueError(f"Problem parsing json contents from "
|
raise ValueError(
|
||||||
f"file '{filename}': {str(error)}."
|
f"Problem parsing json contents from "
|
||||||
)
|
f"file '{filename}': {str(error)}."
|
||||||
|
)
|
||||||
except FileNotFoundError as error:
|
except FileNotFoundError as error:
|
||||||
raise FileNotFoundError(f"Problem while attempting to access "
|
raise FileNotFoundError(
|
||||||
f"data file '{filename}'."
|
f"Problem while attempting to access " f"data file '{filename}'."
|
||||||
)
|
) from error
|
||||||
|
|
||||||
return self.load_from_json(data)
|
return self.load_from_json(data)
|
||||||
|
|
||||||
@@ -324,8 +395,8 @@ class MaigretDatabase:
|
|||||||
sites = sites_dict or self.sites_dict
|
sites = sites_dict or self.sites_dict
|
||||||
found_flags = {}
|
found_flags = {}
|
||||||
for _, s in sites.items():
|
for _, s in sites.items():
|
||||||
if 'presense_flag' in s.stats:
|
if "presense_flag" in s.stats:
|
||||||
flag = s.stats['presense_flag']
|
flag = s.stats["presense_flag"]
|
||||||
found_flags[flag] = found_flags.get(flag, 0) + 1
|
found_flags[flag] = found_flags.get(flag, 0) + 1
|
||||||
|
|
||||||
return found_flags
|
return found_flags
|
||||||
@@ -334,7 +405,7 @@ class MaigretDatabase:
|
|||||||
if not sites_dict:
|
if not sites_dict:
|
||||||
sites_dict = self.sites_dict()
|
sites_dict = self.sites_dict()
|
||||||
|
|
||||||
output = ''
|
output = ""
|
||||||
disabled_count = 0
|
disabled_count = 0
|
||||||
total_count = len(sites_dict)
|
total_count = len(sites_dict)
|
||||||
urls = {}
|
urls = {}
|
||||||
@@ -345,18 +416,18 @@ class MaigretDatabase:
|
|||||||
disabled_count += 1
|
disabled_count += 1
|
||||||
|
|
||||||
url = URLMatcher.extract_main_part(site.url)
|
url = URLMatcher.extract_main_part(site.url)
|
||||||
if url.startswith('{username}'):
|
if url.startswith("{username}"):
|
||||||
url = 'SUBDOMAIN'
|
url = "SUBDOMAIN"
|
||||||
elif url == '':
|
elif url == "":
|
||||||
url = f'{site.url} ({site.engine})'
|
url = f"{site.url} ({site.engine})"
|
||||||
else:
|
else:
|
||||||
parts = url.split('/')
|
parts = url.split("/")
|
||||||
url = '/' + '/'.join(parts[1:])
|
url = "/" + "/".join(parts[1:])
|
||||||
|
|
||||||
urls[url] = urls.get(url, 0) + 1
|
urls[url] = urls.get(url, 0) + 1
|
||||||
|
|
||||||
if not site.tags:
|
if not site.tags:
|
||||||
tags['NO_TAGS'] = tags.get('NO_TAGS', 0) + 1
|
tags["NO_TAGS"] = tags.get("NO_TAGS", 0) + 1
|
||||||
|
|
||||||
for tag in site.tags:
|
for tag in site.tags:
|
||||||
if is_country_tag(tag):
|
if is_country_tag(tag):
|
||||||
@@ -364,17 +435,17 @@ class MaigretDatabase:
|
|||||||
continue
|
continue
|
||||||
tags[tag] = tags.get(tag, 0) + 1
|
tags[tag] = tags.get(tag, 0) + 1
|
||||||
|
|
||||||
output += f'Enabled/total sites: {total_count-disabled_count}/{total_count}\n'
|
output += f"Enabled/total sites: {total_count - disabled_count}/{total_count}\n"
|
||||||
output += 'Top sites\' profile URLs:\n'
|
output += "Top sites' profile URLs:\n"
|
||||||
for url, count in sorted(urls.items(), key=lambda x: x[1], reverse=True)[:20]:
|
for url, count in sorted(urls.items(), key=lambda x: x[1], reverse=True)[:20]:
|
||||||
if count == 1:
|
if count == 1:
|
||||||
break
|
break
|
||||||
output += f'{count}\t{url}\n'
|
output += f"{count}\t{url}\n"
|
||||||
output += 'Top sites\' tags:\n'
|
output += "Top sites' tags:\n"
|
||||||
for tag, count in sorted(tags.items(), key=lambda x: x[1], reverse=True):
|
for tag, count in sorted(tags.items(), key=lambda x: x[1], reverse=True):
|
||||||
mark = ''
|
mark = ""
|
||||||
if not tag in SUPPORTED_TAGS:
|
if tag not in SUPPORTED_TAGS:
|
||||||
mark = ' (non-standard)'
|
mark = " (non-standard)"
|
||||||
output += f'{count}\t{tag}{mark}\n'
|
output += f"{count}\t{tag}{mark}\n"
|
||||||
|
|
||||||
return output
|
return output
|
||||||
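Editor's note: the ranked_sites_dict rework above adds an is_source_ok predicate so that site mirrors (the new source field, part of the 0.1.20 "source feature for sites' mirrors") match name filters through their primary site's name. A standalone sketch of the same filter-and-rank idea over plain objects (the class and function names here are illustrative, not the maigret API):

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class Site:
        name: str
        alexa_rank: int
        tags: List[str] = field(default_factory=list)
        source: Optional[str] = None  # for mirrors: name of the primary site

    def rank_sites(sites, names=(), tags=(), top=10):
        # a mirror matches a name filter through its source, like is_source_ok above
        wanted = [n.lower() for n in names]
        name_ok = lambda s: not wanted or s.name.lower() in wanted or (
            s.source and s.source.lower() in wanted
        )
        tags_ok = lambda s: not tags or set(s.tags) & set(tags)
        chosen = [s for s in sites if name_ok(s) and tags_ok(s)]
        return sorted(chosen, key=lambda s: s.alexa_rank)[:top]

    sites = [
        Site("Example", 100, ["blog"]),
        Site("ExampleMirror", 2000, ["blog"], source="Example"),
    ]
    print([s.name for s in rank_sites(sites, names=["example"])])
    # ['Example', 'ExampleMirror']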
+256 -92
@@ -1,34 +1,58 @@
+import asyncio
 import difflib
-import json
+import re
+from typing import List

 import requests
-from mock import Mock

-from .checking import *
+from .activation import import_aiohttp_cookies
+from .checking import maigret
+from .result import QueryStatus
+from .sites import MaigretDatabase, MaigretSite, MaigretEngine
+from .utils import get_random_user_agent

-DESIRED_STRINGS = ["username", "not found", "пользователь", "profile", "lastname", "firstname", "biography",
-                   "birthday", "репутация", "информация", "e-mail"]
+DESIRED_STRINGS = [
+    "username",
+    "not found",
+    "пользователь",
+    "profile",
+    "lastname",
+    "firstname",
+    "biography",
+    "birthday",
+    "репутация",
+    "информация",
+    "e-mail",
+]
+
+SUPPOSED_USERNAMES = ["alex", "god", "admin", "red", "blue", "john"]
+
+HEADERS = {
+    "User-Agent": get_random_user_agent(),
+}

 RATIO = 0.6
 TOP_FEATURES = 5
-URL_RE = re.compile(r'https?://(www\.)?')
+URL_RE = re.compile(r"https?://(www\.)?")


 def get_match_ratio(x):
-    return round(max([
-        difflib.SequenceMatcher(a=x.lower(), b=y).ratio()
-        for y in DESIRED_STRINGS
-    ]), 2)
+    return round(
+        max(
+            [difflib.SequenceMatcher(a=x.lower(), b=y).ratio() for y in DESIRED_STRINGS]
+        ),
+        2,
+    )


-def extract_domain(url):
-    return '/'.join(url.split('/', 3)[:3])
+def extract_mainpage_url(url):
+    return "/".join(url.split("/", 3)[:3])


 async def site_self_check(site, logger, semaphore, db: MaigretDatabase, silent=False):
-    query_notify = Mock()
     changes = {
-        'disabled': False,
+        "disabled": False,
     }

     check_data = [
@@ -36,29 +60,27 @@ async def site_self_check(site, logger, semaphore, db: MaigretDatabase, silent=F
         (site.username_unclaimed, QueryStatus.AVAILABLE),
     ]

-    logger.info(f'Checking {site.name}...')
+    logger.info(f"Checking {site.name}...")

     for username, status in check_data:
-        async with semaphore:
-            results_dict = await maigret(
-                username,
-                {site.name: site},
-                query_notify,
-                logger,
-                timeout=30,
-                id_type=site.type,
-                forced=True,
-                no_progressbar=True,
-            )
+        results_dict = await maigret(
+            username=username,
+            site_dict={site.name: site},
+            logger=logger,
+            timeout=30,
+            id_type=site.type,
+            forced=True,
+            no_progressbar=True,
+        )

         # don't disable entries with other ids types
         # TODO: make normal checking
         if site.name not in results_dict:
             logger.info(results_dict)
-            changes['disabled'] = True
+            changes["disabled"] = True
             continue

-        result = results_dict[site.name]['status']
+        result = results_dict[site.name]["status"]

         site_status = result.status

@@ -67,48 +89,133 @@ async def site_self_check(site, logger, semaphore, db: MaigretDatabase, silent=F
                 msgs = site.absence_strs
                 etype = site.check_type
                 logger.warning(
-                    f'Error while searching {username} in {site.name}: {result.context}, {msgs}, type {etype}')
+                    "Error while searching '%s' in %s: %s, %s, check type %s",
+                    username,
+                    site.name,
+                    result.context,
+                    msgs,
+                    etype,
+                )
                 # don't disable in case of available username
                 if status == QueryStatus.CLAIMED:
-                    changes['disabled'] = True
+                    changes["disabled"] = True
             elif status == QueryStatus.CLAIMED:
-                logger.warning(f'Not found `{username}` in {site.name}, must be claimed')
+                logger.warning(
+                    f"Not found `{username}` in {site.name}, must be claimed"
+                )
                 logger.info(results_dict[site.name])
-                changes['disabled'] = True
+                changes["disabled"] = True
             else:
-                logger.warning(f'Found `{username}` in {site.name}, must be available')
+                logger.warning(f"Found `{username}` in {site.name}, must be available")
                 logger.info(results_dict[site.name])
-                changes['disabled'] = True
+                changes["disabled"] = True

-    logger.info(f'Site {site.name} checking is finished')
+    logger.info(f"Site {site.name} checking is finished")

     return changes


-async def submit_dialog(db, url_exists):
-    domain_raw = URL_RE.sub('', url_exists).strip().strip('/')
-    domain_raw = domain_raw.split('/')[0]
-
-    matched_sites = list(filter(lambda x: domain_raw in x.url_main+x.url, db.sites))
-    if matched_sites:
-        print(f'Sites with domain "{domain_raw}" already exists in the Maigret database!')
-        status = lambda s: '(disabled)' if s.disabled else ''
-        url_block = lambda s: f'\n\t{s.url_main}\n\t{s.url}'
-        print('\n'.join([f'{site.name} {status(site)}{url_block(site)}' for site in matched_sites]))
-        return False
-
-    url_parts = url_exists.split('/')
+def generate_additional_fields_dialog(engine: MaigretEngine, dialog):
+    fields = {}
+    if 'urlSubpath' in engine.site.get('url', ''):
+        msg = (
+            'Detected engine suppose additional URL subpath using (/forum/, /blog/, etc). '
+            'Enter in manually if it exists: '
+        )
+        subpath = input(msg).strip('/')
+        if subpath:
+            fields['urlSubpath'] = f'/{subpath}'
+    return fields
+
+
+async def detect_known_engine(
+    db, url_exists, url_mainpage, logger
+) -> List[MaigretSite]:
+    try:
+        r = requests.get(url_mainpage)
+    except Exception as e:
+        logger.warning(e)
+        print("Some error while checking main page")
+        return []
+
+    for engine in db.engines:
+        strs_to_check = engine.__dict__.get("presenseStrs")
+        if strs_to_check and r and r.text:
+            all_strs_in_response = True
+            for s in strs_to_check:
+                if s not in r.text:
+                    all_strs_in_response = False
+            sites = []
+            if all_strs_in_response:
+                engine_name = engine.__dict__.get("name")
+
+                print(f"Detected engine {engine_name} for site {url_mainpage}")
+
+                usernames_to_check = SUPPOSED_USERNAMES
+                supposed_username = extract_username_dialog(url_exists)
+                if supposed_username:
+                    usernames_to_check = [supposed_username] + usernames_to_check
+
+                add_fields = generate_additional_fields_dialog(engine, url_exists)
+
+                for u in usernames_to_check:
+                    site_data = {
+                        "urlMain": url_mainpage,
+                        "name": url_mainpage.split("//")[1],
+                        "engine": engine_name,
+                        "usernameClaimed": u,
+                        "usernameUnclaimed": "noonewouldeverusethis7",
+                        **add_fields,
+                    }
+                    logger.info(site_data)
+
+                    maigret_site = MaigretSite(url_mainpage.split("/")[-1], site_data)
+                    maigret_site.update_from_engine(db.engines_dict[engine_name])
+                    sites.append(maigret_site)
+
+                return sites
+
+    return []
+
+
+def extract_username_dialog(url):
+    url_parts = url.rstrip("/").split("/")
     supposed_username = url_parts[-1]
-    new_name = input(f'Is "{supposed_username}" a valid username? If not, write it manually: ')
-    if new_name:
-        supposed_username = new_name
-    non_exist_username = 'noonewouldeverusethis7'
+    entered_username = input(
+        f'Is "{supposed_username}" a valid username? If not, write it manually: '
+    )
+    return entered_username if entered_username else supposed_username

-    url_user = url_exists.replace(supposed_username, '{username}')
+
+async def check_features_manually(
+    db, url_exists, url_mainpage, cookie_file, logger, redirects=True
+):
+    supposed_username = extract_username_dialog(url_exists)
+    non_exist_username = "noonewouldeverusethis7"
+
+    url_user = url_exists.replace(supposed_username, "{username}")
     url_not_exists = url_exists.replace(supposed_username, non_exist_username)

-    a = requests.get(url_exists).text
-    b = requests.get(url_not_exists).text
+    # cookies
+    cookie_dict = None
+    if cookie_file:
+        cookie_jar = await import_aiohttp_cookies(cookie_file)
+        cookie_dict = {c.key: c.value for c in cookie_jar}
+
+    exists_resp = requests.get(
+        url_exists, cookies=cookie_dict, headers=HEADERS, allow_redirects=redirects
+    )
+    logger.debug(exists_resp.status_code)
+    logger.debug(exists_resp.text)
+
+    non_exists_resp = requests.get(
+        url_not_exists, cookies=cookie_dict, headers=HEADERS, allow_redirects=redirects
+    )
+    logger.debug(non_exists_resp.status_code)
+    logger.debug(non_exists_resp.text)
+
+    a = exists_resp.text
+    b = non_exists_resp.text

     tokens_a = set(a.split('"'))
     tokens_b = set(b.split('"'))
@@ -116,57 +223,114 @@ async def submit_dialog(db, url_exists):
     a_minus_b = tokens_a.difference(tokens_b)
     b_minus_a = tokens_b.difference(tokens_a)

-    top_features_count = int(input(f'Specify count of features to extract [default {TOP_FEATURES}]: ') or TOP_FEATURES)
+    if len(a_minus_b) == len(b_minus_a) == 0:
+        print("The pages for existing and non-existing account are the same!")

-    presence_list = sorted(a_minus_b, key=get_match_ratio, reverse=True)[:top_features_count]
+    top_features_count = int(
+        input(f"Specify count of features to extract [default {TOP_FEATURES}]: ")
+        or TOP_FEATURES
+    )

-    print('Detected text features of existing account: ' + ', '.join(presence_list))
-    features = input('If features was not detected correctly, write it manually: ')
+    presence_list = sorted(a_minus_b, key=get_match_ratio, reverse=True)[
+        :top_features_count
+    ]
+
+    print("Detected text features of existing account: " + ", ".join(presence_list))
+    features = input("If features was not detected correctly, write it manually: ")

     if features:
-        presence_list = features.split(',')
+        presence_list = features.split(",")

-    absence_list = sorted(b_minus_a, key=get_match_ratio, reverse=True)[:top_features_count]
-    print('Detected text features of non-existing account: ' + ', '.join(absence_list))
-    features = input('If features was not detected correctly, write it manually: ')
+    absence_list = sorted(b_minus_a, key=get_match_ratio, reverse=True)[
+        :top_features_count
+    ]
+    print("Detected text features of non-existing account: " + ", ".join(absence_list))
+    features = input("If features was not detected correctly, write it manually: ")

     if features:
-        absence_list = features.split(',')
+        absence_list = features.split(",")

-    url_main = extract_domain(url_exists)
-
     site_data = {
-        'absenceStrs': absence_list,
-        'presenseStrs': presence_list,
-        'url': url_user,
-        'urlMain': url_main,
-        'usernameClaimed': supposed_username,
-        'usernameUnclaimed': non_exist_username,
-        'checkType': 'message',
+        "absenceStrs": absence_list,
+        "presenseStrs": presence_list,
+        "url": url_user,
+        "urlMain": url_mainpage,
+        "usernameClaimed": supposed_username,
+        "usernameUnclaimed": non_exist_username,
+        "checkType": "message",
     }

-    site = MaigretSite(url_main.split('/')[-1], site_data)
+    site = MaigretSite(url_mainpage.split("/")[-1], site_data)
+    return site

-    print(site.__dict__)
+
+async def submit_dialog(db, url_exists, cookie_file, logger):
+    domain_raw = URL_RE.sub("", url_exists).strip().strip("/")
+    domain_raw = domain_raw.split("/")[0]
+
+    # check for existence
+    matched_sites = list(filter(lambda x: domain_raw in x.url_main + x.url, db.sites))
+
+    if matched_sites:
+        print(
+            f'Sites with domain "{domain_raw}" already exists in the Maigret database!'
+        )
+        status = lambda s: "(disabled)" if s.disabled else ""
+        url_block = lambda s: f"\n\t{s.url_main}\n\t{s.url}"
+        print(
+            "\n".join(
+                [
+                    f"{site.name} {status(site)}{url_block(site)}"
+                    for site in matched_sites
+                ]
+            )
+        )
+
+        if input("Do you want to continue? [yN] ").lower() in "n":
+            return False
+
+    url_mainpage = extract_mainpage_url(url_exists)
+
+    sites = await detect_known_engine(db, url_exists, url_mainpage, logger)
+    if not sites:
+        print("Unable to detect site engine, lets generate checking features")
+        sites = [
+            await check_features_manually(
+                db, url_exists, url_mainpage, cookie_file, logger
+            )
+        ]
+
+    logger.debug(sites[0].__dict__)

     sem = asyncio.Semaphore(1)
-    log_level = logging.INFO
-    logging.basicConfig(
-        format='[%(filename)s:%(lineno)d] %(levelname)-3s %(asctime)s %(message)s',
-        datefmt='%H:%M:%S',
-        level=log_level
-    )
-    logger = logging.getLogger('site-submit')
-    logger.setLevel(log_level)

-    result = await site_self_check(site, logger, sem, db)
+    found = False
+    chosen_site = None
+    for s in sites:
+        chosen_site = s
+        result = await site_self_check(s, logger, sem, db)
+        if not result["disabled"]:
+            found = True
+            break

-    if result['disabled']:
-        print(f'Sorry, we couldn\'t find params to detect account presence/absence in {site.name}.')
-        print('Try to run this mode again and increase features count or choose others.')
+    if not found:
+        print(
+            f"Sorry, we couldn't find params to detect account presence/absence in {chosen_site.name}."
+        )
+        print(
+            "Try to run this mode again and increase features count or choose others."
+        )
     else:
-        if input(f'Site {site.name} successfully checked. Do you want to save it in the Maigret DB? [yY] ') in 'yY':
-            db.update_site(site)
+        if (
+            input(
+                f"Site {chosen_site.name} successfully checked. Do you want to save it in the Maigret DB? [Yn] "
+            ).lower()
+            in "y"
+        ):
+            logger.debug(chosen_site.json)
+            site_data = chosen_site.strip_engine_data()
+            logger.debug(site_data.json)
+            db.update_site(site_data)
         return True

     return False
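Editor's note: the submit-mode rewrite above splits the old monolithic dialog into engine detection (detect_known_engine) and manual feature extraction (check_features_manually). The core trick, comparing the page of an existing account with the page of a certainly-missing one and keeping the most "profile-like" tokens, can be sketched independently of maigret like this (the sample strings stand in for real HTTP response bodies):

    import difflib

    DESIRED = ["username", "not found", "profile"]

    def match_ratio(token: str) -> float:
        return max(difflib.SequenceMatcher(a=token.lower(), b=y).ratio() for y in DESIRED)

    def split_features(page_exists: str, page_missing: str, top: int = 5):
        # tokenize on double quotes, like the diff does for HTML attributes
        a, b = set(page_exists.split('"')), set(page_missing.split('"'))
        presence = sorted(a - b, key=match_ratio, reverse=True)[:top]  # only on real profiles
        absence = sorted(b - a, key=match_ratio, reverse=True)[:top]  # only on "not found" pages
        return presence, absence

    presence, absence = split_features(
        '<div class="profile">alice</div>', '<div class="error">not found</div>'
    )
    print(presence, absence)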
@@ -0,0 +1,11 @@
+from typing import Callable, List, Dict, Tuple, Any
+
+
+# search query
+QueryDraft = Tuple[Callable, List, Dict]
+
+# options dict
+QueryOptions = Dict[str, Any]
+
+# TODO: throw out
+QueryResultWrapper = Dict[str, Any]
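Editor's note: the new typing module above only declares aliases. A quick hedged illustration of what a QueryDraft is meant to carry, a callable plus its positional and keyword arguments to be applied later by the checker (the runner function below is illustrative, not maigret code):

    from typing import Any, Callable, Dict, List, Tuple

    QueryDraft = Tuple[Callable, List, Dict]

    def run_draft(draft: QueryDraft) -> Any:
        func, args, kwargs = draft
        return func(*args, **kwargs)

    draft: QueryDraft = (print, ["checking", "alice"], {"sep": ": "})
    run_draft(draft)  # prints "checking: alice"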
+47 -17
@@ -1,58 +1,88 @@
 import re
-import sys
+import random
+
+
+DEFAULT_USER_AGENTS = [
+    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36",
+]


 class CaseConverter:
     @staticmethod
     def camel_to_snake(camelcased_string: str) -> str:
-        return re.sub(r'(?<!^)(?=[A-Z])', '_', camelcased_string).lower()
+        return re.sub(r"(?<!^)(?=[A-Z])", "_", camelcased_string).lower()

     @staticmethod
     def snake_to_camel(snakecased_string: str) -> str:
-        formatted = ''.join(word.title() for word in snakecased_string.split('_'))
+        formatted = "".join(word.title() for word in snakecased_string.split("_"))
         result = formatted[0].lower() + formatted[1:]
         return result

     @staticmethod
     def snake_to_title(snakecased_string: str) -> str:
-        words = snakecased_string.split('_')
+        words = snakecased_string.split("_")
         words[0] = words[0].title()
-        return ' '.join(words)
+        return " ".join(words)


 def is_country_tag(tag: str) -> bool:
     """detect if tag represent a country"""
-    return bool(re.match("^([a-zA-Z]){2}$", tag)) or tag == 'global'
+    return bool(re.match("^([a-zA-Z]){2}$", tag)) or tag == "global"


 def enrich_link_str(link: str) -> str:
     link = link.strip()
-    if link.startswith('www.') or (link.startswith('http') and '//' in link):
+    if link.startswith("www.") or (link.startswith("http") and "//" in link):
         return f'<a class="auto-link" href="{link}">{link}</a>'
     return link


 class URLMatcher:
-    _HTTP_URL_RE_STR = '^https?://(www.)?(.+)$'
+    _HTTP_URL_RE_STR = "^https?://(www.)?(.+)$"
     HTTP_URL_RE = re.compile(_HTTP_URL_RE_STR)
-    UNSAFE_SYMBOLS = '.?'
+    UNSAFE_SYMBOLS = ".?"

     @classmethod
     def extract_main_part(self, url: str) -> str:
         match = self.HTTP_URL_RE.search(url)
         if match and match.group(2):
-            return match.group(2).rstrip('/')
+            return match.group(2).rstrip("/")

-        return ''
+        return ""

     @classmethod
-    def make_profile_url_regexp(self, url: str, username_regexp: str = ''):
+    def make_profile_url_regexp(self, url: str, username_regexp: str = ""):
         url_main_part = self.extract_main_part(url)
         for c in self.UNSAFE_SYMBOLS:
-            url_main_part = url_main_part.replace(c, f'\\{c}')
-        username_regexp = username_regexp or '.+?'
+            url_main_part = url_main_part.replace(c, f"\\{c}")
+        username_regexp = username_regexp or ".+?"

-        url_regexp = url_main_part.replace('{username}', f'({username_regexp})')
-        regexp_str = self._HTTP_URL_RE_STR.replace('(.+)', url_regexp)
+        url_regexp = url_main_part.replace("{username}", f"({username_regexp})")
+        regexp_str = self._HTTP_URL_RE_STR.replace("(.+)", url_regexp)

         return re.compile(regexp_str)


+def get_dict_ascii_tree(items, prepend="", new_line=True):
+    text = ""
+    for num, item in enumerate(items):
+        box_symbol = "┣╸" if num != len(items) - 1 else "┗╸"
+
+        if type(item) == tuple:
+            field_name, field_value = item
+            if field_value.startswith("['"):
+                is_last_item = num == len(items) - 1
+                prepend_symbols = " " * 3 if is_last_item else " ┃ "
+                field_value = get_dict_ascii_tree(eval(field_value), prepend_symbols)
+            text += f"\n{prepend}{box_symbol}{field_name}: {field_value}"
+        else:
+            text += f"\n{prepend}{box_symbol} {item}"
+
+    if not new_line:
+        text = text[1:]
+
+    return text
+
+
+def get_random_user_agent():
+    return random.choice(DEFAULT_USER_AGENTS)
|
||||||
|
|||||||
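A short usage sketch of the helpers above; the 500px URL pattern is taken from the tests later in this changeset, the rest is illustrative:

from maigret.utils import URLMatcher, get_random_user_agent

pattern = URLMatcher.make_profile_url_regexp('https://500px.com/p/{username}')
match = pattern.match('https://500px.com/p/alexaimephotography')
assert match and match.group(2) == 'alexaimephotography'  # group(2) is the username

headers = {'User-Agent': get_random_user_agent()}  # rotate the UA per request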
+4 -6
@@ -1,4 +1,4 @@
-aiohttp==3.7.3
+aiohttp==3.7.4
 aiohttp-socks==0.5.5
 arabic-reshaper==2.1.1
 async-timeout==3.0.1
@@ -13,22 +13,20 @@ future==0.18.2
 future-annotations==1.0.0
 html5lib==1.1
 idna==2.10
-Jinja2==2.11.2
+Jinja2==2.11.3
-lxml==4.6.2
+lxml==4.6.3
 MarkupSafe==1.1.1
 mock==4.0.2
 multidict==5.1.0
-Pillow==8.1.0
 pycountry==20.7.3
 PyPDF2==1.26.0
 PySocks==1.7.1
 python-bidi==0.4.2
 python-socks==1.1.2
-reportlab==3.5.59
 requests>=2.24.0
 requests-futures==1.0.0
 six==1.15.0
-socid-extractor>=0.0.12
+socid-extractor>=0.0.16
 soupsieve==2.1
 stem==1.8.0
 torrequest==0.1.0
@@ -1,3 +1,9 @@
 [egg_info]
 tag_build =
 tag_date = 0
+
+[flake8]
+per-file-ignores = __init__.py:F401
+
+[mypy]
+ignore_missing_imports = True
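For context: F401 is flake8's "imported but unused" warning, which fires on intentional re-exports in a package's __init__.py; the per-file ignore silences it only there. A typical line it would otherwise flag (illustrative, not quoted from the repo):

# maigret/__init__.py
from .maigret import search  # intentional re-export; unused here, so flake8 reports F401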
@@ -12,7 +12,7 @@ with open('requirements.txt') as rf:
     requires = rf.read().splitlines()
 
 setup(name='maigret',
-      version='0.1.14',
+      version='0.1.20',
      description='Collect a dossier on a person by username from a huge number of sites',
      long_description=long_description,
      long_description_content_type="text/markdown",
+4 -3
@@ -1,11 +1,11 @@
 import glob
 import logging
 import os
 
 import pytest
 from _pytest.mark import Mark
-from mock import Mock
 
-from maigret.sites import MaigretDatabase, MaigretSite
+from maigret.sites import MaigretDatabase
 
 CUR_PATH = os.path.dirname(os.path.realpath(__file__))
 JSON_FILE = os.path.join(CUR_PATH, '../maigret/resources/data.json')
@@ -26,7 +26,8 @@ def get_test_reports_filenames():
 
 def remove_test_reports():
     reports_list = get_test_reports_filenames()
-    for f in reports_list: os.remove(f)
+    for f in reports_list:
+        os.remove(f)
     logging.error(f'Removed test reports {reports_list}')
@@ -1,5 +1,6 @@
 """Maigret activation test functions"""
 import json
+
 import aiohttp
 import pytest
 from mock import Mock
@@ -43,8 +44,9 @@ async def test_import_aiohttp_cookies():
 
     url = 'https://httpbin.org/cookies'
     connector = aiohttp.TCPConnector(ssl=False)
-    session = aiohttp.ClientSession(connector=connector, trust_env=True,
-                                    cookie_jar=cookie_jar)
+    session = aiohttp.ClientSession(
+        connector=connector, trust_env=True, cookie_jar=cookie_jar
+    )
 
     response = await session.get(url=url)
     result = json.loads(await response.content.read())
@@ -0,0 +1,73 @@
+"""Maigret checking logic test functions"""
+import pytest
+import asyncio
+import logging
+from maigret.executors import (
+    AsyncioSimpleExecutor,
+    AsyncioProgressbarExecutor,
+    AsyncioProgressbarSemaphoreExecutor,
+    AsyncioProgressbarQueueExecutor,
+)
+
+logger = logging.getLogger(__name__)
+
+
+async def func(n):
+    await asyncio.sleep(0.1 * (n % 3))
+    return n
+
+
+@pytest.mark.asyncio
+async def test_simple_asyncio_executor():
+    tasks = [(func, [n], {}) for n in range(10)]
+    executor = AsyncioSimpleExecutor(logger=logger)
+    assert await executor.run(tasks) == [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
+    assert executor.execution_time > 0.2
+    assert executor.execution_time < 0.3
+
+
+@pytest.mark.asyncio
+async def test_asyncio_progressbar_executor():
+    tasks = [(func, [n], {}) for n in range(10)]
+
+    executor = AsyncioProgressbarExecutor(logger=logger)
+    # no guarantees for the results order
+    assert sorted(await executor.run(tasks)) == [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
+    assert executor.execution_time > 0.2
+    assert executor.execution_time < 0.3
+
+
+@pytest.mark.asyncio
+async def test_asyncio_progressbar_semaphore_executor():
+    tasks = [(func, [n], {}) for n in range(10)]
+
+    executor = AsyncioProgressbarSemaphoreExecutor(logger=logger, in_parallel=5)
+    # no guarantees for the results order
+    assert sorted(await executor.run(tasks)) == [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
+    assert executor.execution_time > 0.2
+    assert executor.execution_time < 0.4
+
+
+@pytest.mark.asyncio
+async def test_asyncio_progressbar_queue_executor():
+    tasks = [(func, [n], {}) for n in range(10)]
+
+    executor = AsyncioProgressbarQueueExecutor(logger=logger, in_parallel=2)
+    assert await executor.run(tasks) == [0, 1, 3, 2, 4, 6, 7, 5, 9, 8]
+    assert executor.execution_time > 0.5
+    assert executor.execution_time < 0.6
+
+    executor = AsyncioProgressbarQueueExecutor(logger=logger, in_parallel=3)
+    assert await executor.run(tasks) == [0, 3, 1, 4, 6, 2, 7, 9, 5, 8]
+    assert executor.execution_time > 0.4
+    assert executor.execution_time < 0.5
+
+    executor = AsyncioProgressbarQueueExecutor(logger=logger, in_parallel=5)
+    assert await executor.run(tasks) == [0, 3, 6, 1, 4, 7, 9, 2, 5, 8]
+    assert executor.execution_time > 0.3
+    assert executor.execution_time < 0.4
+
+    executor = AsyncioProgressbarQueueExecutor(logger=logger, in_parallel=10)
+    assert await executor.run(tasks) == [0, 3, 6, 9, 1, 4, 7, 2, 5, 8]
+    assert executor.execution_time > 0.2
+    assert executor.execution_time < 0.3
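The task tuples in these tests follow exactly the (callable, args, kwargs) QueryDraft shape introduced earlier. The timing windows also check out arithmetically: each task sleeps 0, 0.1, or 0.2 seconds, so a fully concurrent executor is bounded by the longest sleep (just over 0.2 s), while the queue executor with 2 workers must split roughly 0.9 s of total sleep across two lanes (a floor of about 0.45 s, matching the 0.5-0.6 s assertion). A simplified sketch of what an executor's run() does with such tuples, not the real implementation:

import asyncio

async def run_all(tasks):
    # Unpack each (callable, args, kwargs) draft and await them concurrently.
    return await asyncio.gather(*(f(*args, **kwargs) for f, args, kwargs in tasks))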
+9 -18
@@ -1,46 +1,37 @@
 """Maigret main module test functions"""
 import asyncio
 
 import pytest
 from mock import Mock
 
 from maigret.maigret import self_check
-from maigret.sites import MaigretDatabase, MaigretSite
+from maigret.sites import MaigretDatabase
 
 EXAMPLE_DB = {
-    'engines': {
-    },
+    'engines': {},
     'sites': {
         "GooglePlayStore": {
-            "tags": [
-                "global",
-                "us"
-            ],
+            "tags": ["global", "us"],
             "disabled": False,
             "checkType": "status_code",
             "alexaRank": 1,
             "url": "https://play.google.com/store/apps/developer?id={username}",
             "urlMain": "https://play.google.com/store",
             "usernameClaimed": "Facebook_nosuchname",
-            "usernameUnclaimed": "noonewouldeverusethis7"
+            "usernameUnclaimed": "noonewouldeverusethis7",
         },
         "Reddit": {
-            "tags": [
-                "news",
-                "social",
-                "us"
-            ],
+            "tags": ["news", "social", "us"],
             "checkType": "status_code",
-            "presenseStrs": [
-                "totalKarma"
-            ],
+            "presenseStrs": ["totalKarma"],
             "disabled": True,
             "alexaRank": 17,
             "url": "https://www.reddit.com/user/{username}",
             "urlMain": "https://www.reddit.com/",
             "usernameClaimed": "blue",
-            "usernameUnclaimed": "noonewouldeverusethis7"
+            "usernameUnclaimed": "noonewouldeverusethis7",
        },
-    }
+    },
 }
+202 -67
@@ -7,8 +7,16 @@ from io import StringIO
 import xmind
 from jinja2 import Template
 
-from maigret.report import generate_csv_report, generate_txt_report, save_xmind_report, save_html_report, \
-    save_pdf_report, generate_report_template, generate_report_context, generate_json_report
+from maigret.report import (
+    generate_csv_report,
+    generate_txt_report,
+    save_xmind_report,
+    save_html_report,
+    save_pdf_report,
+    generate_report_template,
+    generate_report_context,
+    generate_json_report,
+)
 from maigret.result import QueryResult, QueryStatus
 
 EXAMPLE_RESULTS = {
@@ -17,14 +25,16 @@ EXAMPLE_RESULTS = {
         'parsing_enabled': True,
         'url_main': 'https://www.github.com/',
         'url_user': 'https://www.github.com/test',
-        'status': QueryResult('test',
-                              'GitHub',
-                              'https://www.github.com/test',
-                              QueryStatus.CLAIMED,
-                              tags=['test_tag']),
+        'status': QueryResult(
+            'test',
+            'GitHub',
+            'https://www.github.com/test',
+            QueryStatus.CLAIMED,
+            tags=['test_tag'],
+        ),
         'http_status': 200,
         'is_similar': False,
-        'rank': 78
+        'rank': 78,
     }
 }
@@ -33,74 +43,196 @@ BAD_RESULT = QueryResult('', '', '', QueryStatus.AVAILABLE)
 
 GOOD_500PX_RESULT = copy.deepcopy(GOOD_RESULT)
 GOOD_500PX_RESULT.tags = ['photo', 'us', 'global']
-GOOD_500PX_RESULT.ids_data = {"uid": "dXJpOm5vZGU6VXNlcjoyNjQwMzQxNQ==", "legacy_id": "26403415",
-                              "username": "alexaimephotographycars", "name": "Alex Aim\u00e9",
-                              "website": "www.flickr.com/photos/alexaimephotography/",
-                              "facebook_link": " www.instagram.com/street.reality.photography/",
-                              "instagram_username": "alexaimephotography", "twitter_username": "Alexaimephotogr"}
+GOOD_500PX_RESULT.ids_data = {
+    "uid": "dXJpOm5vZGU6VXNlcjoyNjQwMzQxNQ==",
+    "legacy_id": "26403415",
+    "username": "alexaimephotographycars",
+    "name": "Alex Aim\u00e9",
+    "website": "www.flickr.com/photos/alexaimephotography/",
+    "facebook_link": " www.instagram.com/street.reality.photography/",
+    "instagram_username": "alexaimephotography",
+    "twitter_username": "Alexaimephotogr",
+}
 
 GOOD_REDDIT_RESULT = copy.deepcopy(GOOD_RESULT)
 GOOD_REDDIT_RESULT.tags = ['news', 'us']
-GOOD_REDDIT_RESULT.ids_data = {"reddit_id": "t5_1nytpy", "reddit_username": "alexaimephotography",
-                               "fullname": "alexaimephotography",
-                               "image": "https://styles.redditmedia.com/t5_1nytpy/styles/profileIcon_7vmhdwzd3g931.jpg?width=256&height=256&crop=256:256,smart&frame=1&s=4f355f16b4920844a3f4eacd4237a7bf76b2e97e",
-                               "is_employee": "False", "is_nsfw": "False", "is_mod": "True", "is_following": "True",
-                               "has_user_profile": "True", "hide_from_robots": "False",
-                               "created_at": "2019-07-10 12:20:03", "total_karma": "53959", "post_karma": "52738"}
+GOOD_REDDIT_RESULT.ids_data = {
+    "reddit_id": "t5_1nytpy",
+    "reddit_username": "alexaimephotography",
+    "fullname": "alexaimephotography",
+    "image": "https://styles.redditmedia.com/t5_1nytpy/styles/profileIcon_7vmhdwzd3g931.jpg?width=256&height=256&crop=256:256,smart&frame=1&s=4f355f16b4920844a3f4eacd4237a7bf76b2e97e",
+    "is_employee": "False",
+    "is_nsfw": "False",
+    "is_mod": "True",
+    "is_following": "True",
+    "has_user_profile": "True",
+    "hide_from_robots": "False",
+    "created_at": "2019-07-10 12:20:03",
+    "total_karma": "53959",
+    "post_karma": "52738",
+}
 
 GOOD_IG_RESULT = copy.deepcopy(GOOD_RESULT)
 GOOD_IG_RESULT.tags = ['photo', 'global']
-GOOD_IG_RESULT.ids_data = {"instagram_username": "alexaimephotography", "fullname": "Alexaimephotography",
-                           "id": "6828488620",
-                           "image": "https://scontent-hel3-1.cdninstagram.com/v/t51.2885-19/s320x320/95420076_1169632876707608_8741505804647006208_n.jpg?_nc_ht=scontent-hel3-1.cdninstagram.com&_nc_ohc=jd87OUGsX4MAX_Ym5GX&tp=1&oh=0f42badd68307ba97ec7fb1ef7b4bfd4&oe=601E5E6F",
-                           "bio": "Photographer \nChild of fine street arts",
-                           "external_url": "https://www.flickr.com/photos/alexaimephotography2020/"}
+GOOD_IG_RESULT.ids_data = {
+    "instagram_username": "alexaimephotography",
+    "fullname": "Alexaimephotography",
+    "id": "6828488620",
+    "image": "https://scontent-hel3-1.cdninstagram.com/v/t51.2885-19/s320x320/95420076_1169632876707608_8741505804647006208_n.jpg?_nc_ht=scontent-hel3-1.cdninstagram.com&_nc_ohc=jd87OUGsX4MAX_Ym5GX&tp=1&oh=0f42badd68307ba97ec7fb1ef7b4bfd4&oe=601E5E6F",
+    "bio": "Photographer \nChild of fine street arts",
+    "external_url": "https://www.flickr.com/photos/alexaimephotography2020/",
+}
 
 GOOD_TWITTER_RESULT = copy.deepcopy(GOOD_RESULT)
 GOOD_TWITTER_RESULT.tags = ['social', 'us']
 
-TEST = [('alexaimephotographycars', 'username', {
-    '500px': {'username': 'alexaimephotographycars', 'parsing_enabled': True, 'url_main': 'https://500px.com/',
-              'url_user': 'https://500px.com/p/alexaimephotographycars',
-              'ids_usernames': {'alexaimephotographycars': 'username', 'alexaimephotography': 'username',
-                                'Alexaimephotogr': 'username'}, 'status': GOOD_500PX_RESULT, 'http_status': 200,
-              'is_similar': False, 'rank': 2981},
-    'Reddit': {'username': 'alexaimephotographycars', 'parsing_enabled': True, 'url_main': 'https://www.reddit.com/',
-               'url_user': 'https://www.reddit.com/user/alexaimephotographycars', 'status': BAD_RESULT,
-               'http_status': 404, 'is_similar': False, 'rank': 17},
-    'Twitter': {'username': 'alexaimephotographycars', 'parsing_enabled': True, 'url_main': 'https://www.twitter.com/',
-                'url_user': 'https://twitter.com/alexaimephotographycars', 'status': BAD_RESULT, 'http_status': 400,
-                'is_similar': False, 'rank': 55},
-    'Instagram': {'username': 'alexaimephotographycars', 'parsing_enabled': True,
-                  'url_main': 'https://www.instagram.com/',
-                  'url_user': 'https://www.instagram.com/alexaimephotographycars', 'status': BAD_RESULT,
-                  'http_status': 404, 'is_similar': False, 'rank': 29}}), ('alexaimephotography', 'username', {
-    '500px': {'username': 'alexaimephotography', 'parsing_enabled': True, 'url_main': 'https://500px.com/',
-              'url_user': 'https://500px.com/p/alexaimephotography', 'status': BAD_RESULT, 'http_status': 200,
-              'is_similar': False, 'rank': 2981},
-    'Reddit': {'username': 'alexaimephotography', 'parsing_enabled': True, 'url_main': 'https://www.reddit.com/',
-               'url_user': 'https://www.reddit.com/user/alexaimephotography',
-               'ids_usernames': {'alexaimephotography': 'username'}, 'status': GOOD_REDDIT_RESULT, 'http_status': 200,
-               'is_similar': False, 'rank': 17},
-    'Twitter': {'username': 'alexaimephotography', 'parsing_enabled': True, 'url_main': 'https://www.twitter.com/',
-                'url_user': 'https://twitter.com/alexaimephotography', 'status': BAD_RESULT, 'http_status': 400,
-                'is_similar': False, 'rank': 55},
-    'Instagram': {'username': 'alexaimephotography', 'parsing_enabled': True, 'url_main': 'https://www.instagram.com/',
-                  'url_user': 'https://www.instagram.com/alexaimephotography',
-                  'ids_usernames': {'alexaimephotography': 'username'}, 'status': GOOD_IG_RESULT, 'http_status': 200,
-                  'is_similar': False, 'rank': 29}}), ('Alexaimephotogr', 'username', {
-    '500px': {'username': 'Alexaimephotogr', 'parsing_enabled': True, 'url_main': 'https://500px.com/',
-              'url_user': 'https://500px.com/p/Alexaimephotogr', 'status': BAD_RESULT, 'http_status': 200,
-              'is_similar': False, 'rank': 2981},
-    'Reddit': {'username': 'Alexaimephotogr', 'parsing_enabled': True, 'url_main': 'https://www.reddit.com/',
-               'url_user': 'https://www.reddit.com/user/Alexaimephotogr', 'status': BAD_RESULT, 'http_status': 404,
-               'is_similar': False, 'rank': 17},
-    'Twitter': {'username': 'Alexaimephotogr', 'parsing_enabled': True, 'url_main': 'https://www.twitter.com/',
-                'url_user': 'https://twitter.com/Alexaimephotogr', 'status': GOOD_TWITTER_RESULT, 'http_status': 400,
-                'is_similar': False, 'rank': 55},
-    'Instagram': {'username': 'Alexaimephotogr', 'parsing_enabled': True, 'url_main': 'https://www.instagram.com/',
-                  'url_user': 'https://www.instagram.com/Alexaimephotogr', 'status': BAD_RESULT, 'http_status': 404,
-                  'is_similar': False, 'rank': 29}})]
+TEST = [
+    (
+        'alexaimephotographycars',
+        'username',
+        {
+            '500px': {
+                'username': 'alexaimephotographycars',
+                'parsing_enabled': True,
+                'url_main': 'https://500px.com/',
+                'url_user': 'https://500px.com/p/alexaimephotographycars',
+                'ids_usernames': {
+                    'alexaimephotographycars': 'username',
+                    'alexaimephotography': 'username',
+                    'Alexaimephotogr': 'username',
+                },
+                'status': GOOD_500PX_RESULT,
+                'http_status': 200,
+                'is_similar': False,
+                'rank': 2981,
+            },
+            'Reddit': {
+                'username': 'alexaimephotographycars',
+                'parsing_enabled': True,
+                'url_main': 'https://www.reddit.com/',
+                'url_user': 'https://www.reddit.com/user/alexaimephotographycars',
+                'status': BAD_RESULT,
+                'http_status': 404,
+                'is_similar': False,
+                'rank': 17,
+            },
+            'Twitter': {
+                'username': 'alexaimephotographycars',
+                'parsing_enabled': True,
+                'url_main': 'https://www.twitter.com/',
+                'url_user': 'https://twitter.com/alexaimephotographycars',
+                'status': BAD_RESULT,
+                'http_status': 400,
+                'is_similar': False,
+                'rank': 55,
+            },
+            'Instagram': {
+                'username': 'alexaimephotographycars',
+                'parsing_enabled': True,
+                'url_main': 'https://www.instagram.com/',
+                'url_user': 'https://www.instagram.com/alexaimephotographycars',
+                'status': BAD_RESULT,
+                'http_status': 404,
+                'is_similar': False,
+                'rank': 29,
+            },
+        },
+    ),
+    (
+        'alexaimephotography',
+        'username',
+        {
+            '500px': {
+                'username': 'alexaimephotography',
+                'parsing_enabled': True,
+                'url_main': 'https://500px.com/',
+                'url_user': 'https://500px.com/p/alexaimephotography',
+                'status': BAD_RESULT,
+                'http_status': 200,
+                'is_similar': False,
+                'rank': 2981,
+            },
+            'Reddit': {
+                'username': 'alexaimephotography',
+                'parsing_enabled': True,
+                'url_main': 'https://www.reddit.com/',
+                'url_user': 'https://www.reddit.com/user/alexaimephotography',
+                'ids_usernames': {'alexaimephotography': 'username'},
+                'status': GOOD_REDDIT_RESULT,
+                'http_status': 200,
+                'is_similar': False,
+                'rank': 17,
+            },
+            'Twitter': {
+                'username': 'alexaimephotography',
+                'parsing_enabled': True,
+                'url_main': 'https://www.twitter.com/',
+                'url_user': 'https://twitter.com/alexaimephotography',
+                'status': BAD_RESULT,
+                'http_status': 400,
+                'is_similar': False,
+                'rank': 55,
+            },
+            'Instagram': {
+                'username': 'alexaimephotography',
+                'parsing_enabled': True,
+                'url_main': 'https://www.instagram.com/',
+                'url_user': 'https://www.instagram.com/alexaimephotography',
+                'ids_usernames': {'alexaimephotography': 'username'},
+                'status': GOOD_IG_RESULT,
+                'http_status': 200,
+                'is_similar': False,
+                'rank': 29,
+            },
+        },
+    ),
+    (
+        'Alexaimephotogr',
+        'username',
+        {
+            '500px': {
+                'username': 'Alexaimephotogr',
+                'parsing_enabled': True,
+                'url_main': 'https://500px.com/',
+                'url_user': 'https://500px.com/p/Alexaimephotogr',
+                'status': BAD_RESULT,
+                'http_status': 200,
+                'is_similar': False,
+                'rank': 2981,
+            },
+            'Reddit': {
+                'username': 'Alexaimephotogr',
+                'parsing_enabled': True,
+                'url_main': 'https://www.reddit.com/',
+                'url_user': 'https://www.reddit.com/user/Alexaimephotogr',
+                'status': BAD_RESULT,
+                'http_status': 404,
+                'is_similar': False,
+                'rank': 17,
+            },
+            'Twitter': {
+                'username': 'Alexaimephotogr',
+                'parsing_enabled': True,
+                'url_main': 'https://www.twitter.com/',
+                'url_user': 'https://twitter.com/Alexaimephotogr',
+                'status': GOOD_TWITTER_RESULT,
+                'http_status': 400,
+                'is_similar': False,
+                'rank': 55,
+            },
+            'Instagram': {
+                'username': 'Alexaimephotogr',
+                'parsing_enabled': True,
+                'url_main': 'https://www.instagram.com/',
+                'url_user': 'https://www.instagram.com/Alexaimephotogr',
+                'status': BAD_RESULT,
+                'http_status': 404,
+                'is_similar': False,
+                'rank': 29,
+            },
+        },
+    ),
+]
 
 SUPPOSED_BRIEF = """Search by username alexaimephotographycars returned 1 accounts. Found target's other IDs: alexaimephotography, Alexaimephotogr. Search by username alexaimephotography returned 2 accounts. Search by username Alexaimephotogr returned 1 accounts. Extended info extracted from 3 accounts."""
@@ -187,7 +319,10 @@ def test_save_xmind_report():
     assert data['topic']['topics'][0]['title'] == 'Undefined'
     assert data['topic']['topics'][1]['title'] == 'test_tag'
     assert len(data['topic']['topics'][1]['topics']) == 1
-    assert data['topic']['topics'][1]['topics'][0]['label'] == 'https://www.github.com/test'
+    assert (
+        data['topic']['topics'][1]['topics'][0]['label']
+        == 'https://www.github.com/test'
+    )
 
 
 def test_html_report():
+24 -22
@@ -1,35 +1,30 @@
 """Maigret Database test functions"""
 from maigret.sites import MaigretDatabase, MaigretSite
 
 
 EXAMPLE_DB = {
     'engines': {
         "XenForo": {
             "presenseStrs": ["XenForo"],
             "site": {
                 "absenceStrs": [
                     "The specified member cannot be found. Please enter a member's entire name.",
                 ],
                 "checkType": "message",
-                "errors": {
-                    "You must be logged-in to do that.": "Login required"
-                },
-                "url": "{urlMain}{urlSubpath}/members/?username={username}"
-            }
+                "errors": {"You must be logged-in to do that.": "Login required"},
+                "url": "{urlMain}{urlSubpath}/members/?username={username}",
+            },
         },
     },
     'sites': {
         "Amperka": {
             "engine": "XenForo",
             "rank": 121613,
-            "tags": [
-                "ru"
-            ],
+            "tags": ["ru"],
             "urlMain": "http://forum.amperka.ru",
             "usernameClaimed": "adam",
-            "usernameUnclaimed": "noonewouldeverusethis7"
+            "usernameUnclaimed": "noonewouldeverusethis7",
         },
-    }
+    },
 }
@@ -117,8 +112,14 @@ def test_site_url_detector():
     db = MaigretDatabase()
     db.load_from_json(EXAMPLE_DB)
 
-    assert db.sites[0].url_regexp.pattern == r'^https?://(www.)?forum\.amperka\.ru/members/\?username=(.+?)$'
-    assert db.sites[0].detect_username('http://forum.amperka.ru/members/?username=test') == 'test'
+    assert (
+        db.sites[0].url_regexp.pattern
+        == r'^https?://(www.)?forum\.amperka\.ru/members/\?username=(.+?)$'
+    )
+    assert (
+        db.sites[0].detect_username('http://forum.amperka.ru/members/?username=test')
+        == 'test'
+    )
 
 
 def test_ranked_sites_dict():
@@ -167,6 +168,7 @@ def test_ranked_sites_dict_disabled():
     assert len(db.ranked_sites_dict()) == 2
     assert len(db.ranked_sites_dict(disabled=False)) == 1
 
+
 def test_ranked_sites_dict_id_type():
     db = MaigretDatabase()
     db.update_site(MaigretSite('1', {}))
+99 -39
@@ -1,66 +1,126 @@
 """Maigret utils test functions"""
 import itertools
 import re
-from maigret.utils import CaseConverter, is_country_tag, enrich_link_str, URLMatcher
+
+from maigret.utils import (
+    CaseConverter,
+    is_country_tag,
+    enrich_link_str,
+    URLMatcher,
+    get_dict_ascii_tree,
+)
 
 
 def test_case_convert_camel_to_snake():
     a = 'SnakeCasedString'
     b = CaseConverter.camel_to_snake(a)
 
     assert b == 'snake_cased_string'
 
 
 def test_case_convert_snake_to_camel():
     a = 'camel_cased_string'
     b = CaseConverter.snake_to_camel(a)
 
     assert b == 'camelCasedString'
 
 
 def test_case_convert_snake_to_title():
     a = 'camel_cased_string'
     b = CaseConverter.snake_to_title(a)
 
     assert b == 'Camel cased string'
 
 
+def test_case_convert_camel_with_digits_to_snake():
+    a = 'ignore403'
+    b = CaseConverter.camel_to_snake(a)
+
+    assert b == 'ignore403'
+
+
 def test_is_country_tag():
     assert is_country_tag('ru') == True
     assert is_country_tag('FR') == True
 
     assert is_country_tag('a1') == False
     assert is_country_tag('dating') == False
 
     assert is_country_tag('global') == True
 
 
 def test_enrich_link_str():
     assert enrich_link_str('test') == 'test'
-    assert enrich_link_str(' www.flickr.com/photos/alexaimephotography/') == '<a class="auto-link" href="www.flickr.com/photos/alexaimephotography/">www.flickr.com/photos/alexaimephotography/</a>'
+    assert (
+        enrich_link_str(' www.flickr.com/photos/alexaimephotography/')
+        == '<a class="auto-link" href="www.flickr.com/photos/alexaimephotography/">www.flickr.com/photos/alexaimephotography/</a>'
+    )
 
 
 def test_url_extract_main_part():
     url_main_part = 'flickr.com/photos/alexaimephotography'
 
     parts = [
         ['http://', 'https://'],
         ['www.', ''],
         [url_main_part],
         ['/', ''],
     ]
 
     url_regexp = re.compile('^https?://(www.)?flickr.com/photos/(.+?)$')
     for url_parts in itertools.product(*parts):
         url = ''.join(url_parts)
         assert URLMatcher.extract_main_part(url) == url_main_part
         assert not url_regexp.match(url) is None
 
 
 def test_url_make_profile_url_regexp():
     url_main_part = 'flickr.com/photos/{username}'
 
     parts = [
         ['http://', 'https://'],
         ['www.', ''],
         [url_main_part],
         ['/', ''],
     ]
 
     for url_parts in itertools.product(*parts):
         url = ''.join(url_parts)
-        assert URLMatcher.make_profile_url_regexp(url).pattern == r'^https?://(www.)?flickr\.com/photos/(.+?)$'
+        assert (
+            URLMatcher.make_profile_url_regexp(url).pattern
+            == r'^https?://(www.)?flickr\.com/photos/(.+?)$'
+        )
+
+
+def test_get_dict_ascii_tree():
+    data = {
+        'uid': 'dXJpOm5vZGU6VXNlcjoyNjQwMzQxNQ==',
+        'legacy_id': '26403415',
+        'username': 'alexaimephotographycars',
+        'name': 'Alex Aimé',
+        'created_at': '2018-05-04T10:17:01.000+0000',
+        'image': 'https://drscdn.500px.org/user_avatar/26403415/q%3D85_w%3D300_h%3D300/v2?webp=true&v=2&sig=0235678a4f7b65e007e864033ebfaf5ef6d87fad34f80a8639d985320c20fe3b',
+        'image_bg': 'https://drscdn.500px.org/user_cover/26403415/q%3D65_m%3D2048/v2?webp=true&v=1&sig=bea411fb158391a4fdad498874ff17088f91257e59dfb376ff67e3a44c3a4201',
+        'website': 'www.instagram.com/street.reality.photography/',
+        'facebook_link': ' www.instagram.com/street.reality.photography/',
+        'instagram_username': 'Street.Reality.Photography',
+        'twitter_username': 'Alexaimephotogr',
+    }
+
+    ascii_tree = get_dict_ascii_tree(data.items())
+
+    assert (
+        ascii_tree
+        == """
+┣╸uid: dXJpOm5vZGU6VXNlcjoyNjQwMzQxNQ==
+┣╸legacy_id: 26403415
+┣╸username: alexaimephotographycars
+┣╸name: Alex Aimé
+┣╸created_at: 2018-05-04T10:17:01.000+0000
+┣╸image: https://drscdn.500px.org/user_avatar/26403415/q%3D85_w%3D300_h%3D300/v2?webp=true&v=2&sig=0235678a4f7b65e007e864033ebfaf5ef6d87fad34f80a8639d985320c20fe3b
+┣╸image_bg: https://drscdn.500px.org/user_cover/26403415/q%3D65_m%3D2048/v2?webp=true&v=1&sig=bea411fb158391a4fdad498874ff17088f91257e59dfb376ff67e3a44c3a4201
+┣╸website: www.instagram.com/street.reality.photography/
+┣╸facebook_link:  www.instagram.com/street.reality.photography/
+┣╸instagram_username: Street.Reality.Photography
+┗╸twitter_username: Alexaimephotogr"""
+    )
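One behavior worth knowing when reading the expected tree above: get_dict_ascii_tree treats a string value that starts with "['" as a stringified list and parses it with eval() to render a nested branch, so it should only ever be fed trusted, already-extracted data. A tiny illustration with made-up values:

from maigret.utils import get_dict_ascii_tree

# 'ids' is rendered as a nested branch (parsed via eval), 'name' as a plain leaf
print(get_dict_ascii_tree([('ids', "['a', 'b']"), ('name', 'soxoj')]))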
@@ -20,8 +20,9 @@ RANKS.update({
     '5000': '5K',
     '10000': '10K',
     '100000': '100K',
-    '10000000': '1M',
-    '50000000': '10M',
+    '10000000': '10M',
+    '50000000': '50M',
+    '100000000': '100M',
 })
 
 SEMAPHORE = threading.Semaphore(10)
@@ -58,8 +59,9 @@ def get_rank(domain_to_query, site, print_errors=True):
 def get_step_rank(rank):
     def get_readable_rank(r):
         return RANKS[str(r)]
+
     valid_step_ranks = sorted(map(int, RANKS.keys()))
-    if rank == 0:
+    if rank == 0 or rank == sys.maxsize:
         return get_readable_rank(valid_step_ranks[-1])
     else:
         return get_readable_rank(list(filter(lambda x: x >= rank, valid_step_ranks))[0])
@@ -73,6 +75,8 @@ if __name__ == '__main__':
                         help="JSON file with sites data to update.")
 
     parser.add_argument('--empty-only', help='update only sites without rating', action='store_true')
+    parser.add_argument('--exclude-engine', help='do not update score with certain engine',
+                        action="append", dest="exclude_engine_list", default=[])
 
     pool = list()
 
@@ -92,6 +96,8 @@ Rank data fetched from Alexa by domains.
     url_main = site.url_main
     if site.alexa_rank < sys.maxsize and args.empty_only:
         continue
+    if args.exclude_engine_list and site.engine in args.exclude_engine_list:
+        continue
     site.alexa_rank = 0
     th = threading.Thread(target=get_rank, args=(url_main, site))
     pool.append((site.name, url_main, th))
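Because the new --exclude-engine option uses action="append", it can be passed several times and every value lands in exclude_engine_list. A hypothetical invocation; the script path is assumed here, not stated in the diff:

python3 utils/update_site_data.py --empty-only --exclude-engine XenForo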
@@ -0,0 +1,71 @@
+#!/usr/bin/env python3
+import asyncio
+import logging
+import maigret
+
+
+# top popular sites from the Maigret database
+TOP_SITES_COUNT = 300
+# Maigret HTTP requests timeout
+TIMEOUT = 10
+# max parallel requests
+MAX_CONNECTIONS = 50
+
+
+if __name__ == '__main__':
+    # setup logging and asyncio
+    logger = logging.getLogger('maigret')
+    logger.setLevel(logging.WARNING)
+    loop = asyncio.get_event_loop()
+
+    # setup Maigret
+    db = maigret.MaigretDatabase().load_from_file('./maigret/resources/data.json')
+    # also can be downloaded from web
+    # db = MaigretDatabase().load_from_url(MAIGRET_DB_URL)
+
+    # user input
+    username = input('Enter username to search: ')
+
+    sites_count_raw = input(
+        f'Select the number of sites to search ({TOP_SITES_COUNT} for default, {len(db.sites_dict)} max): '
+    )
+    sites_count = int(sites_count_raw) or TOP_SITES_COUNT
+
+    sites = db.ranked_sites_dict(top=sites_count)
+
+    show_progressbar_raw = input('Do you want to show a progressbar? [Yn] ')
+    show_progressbar = show_progressbar_raw.lower() != 'n'
+
+    extract_info_raw = input(
+        'Do you want to extract additional info from accounts\' pages? [Yn] '
+    )
+    extract_info = extract_info_raw.lower() != 'n'
+
+    use_notifier_raw = input(
+        'Do you want to use notifier for displaying results while searching? [Yn] '
+    )
+    use_notifier = use_notifier_raw.lower() != 'n'
+
+    notifier = None
+    if use_notifier:
+        notifier = maigret.Notifier(print_found_only=True, skip_check_errors=True)
+
+    # search!
+    search_func = maigret.search(
+        username=username,
+        site_dict=sites,
+        timeout=TIMEOUT,
+        logger=logger,
+        max_connections=MAX_CONNECTIONS,
+        query_notify=notifier,
+        no_progressbar=(not show_progressbar),
+        is_parsing_enabled=extract_info,
+    )
+
+    results = loop.run_until_complete(search_func)
+
+    input('Search completed. Press any key to show results.')
+
+    for sitename, data in results.items():
+        is_found = data['status'].is_found()
+        print(f'{sitename} - {"Found!" if is_found else "Not found"}')
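One caveat in the example script as committed: int(sites_count_raw) raises ValueError when the user simply presses Enter, so the `or TOP_SITES_COUNT` fallback is never reached for empty input. A defensive variant, illustrative and not part of the changeset:

# fall back to the default for empty or non-numeric input
sites_count = int(sites_count_raw) if sites_count_raw.strip().isdigit() else TOP_SITES_COUNT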