Compare commits

...

33 Commits

Author SHA1 Message Date
soxoj 37854a867b Merge pull request #188 from soxoj/fp-fixes-speed-up
Accelerated start time & fixed some false positives
2021-10-31 18:35:26 +03:00
Soxoj 6480eebbdf Accelerated start time & fixed some false positives 2021-10-31 18:25:01 +03:00
soxoj aad862b2ed Merge pull request #187 from soxoj/false-positive-fixes-04-09
Fixed some false positives from telegram bot
2021-09-04 18:26:29 +03:00
Soxoj c6d0f332bd Fixed some false positives from telegram bot 2021-09-04 18:10:56 +03:00
soxoj f1c006159e Merge pull request #186 from soxoj/reddit-fix
Reddit search fixed
2021-09-04 18:02:02 +03:00
Soxoj 69a09fcd94 Reddit search fixed 2021-09-04 17:50:35 +03:00
soxoj 9f948928e6 Merge pull request #181 from soxoj/sites-fixes-18-07-21
False positives fixes
2021-07-18 23:00:32 +03:00
Soxoj a3034c11ff False positives fixes 2021-07-18 20:25:21 +03:00
soxoj d47c72b972 Merge pull request #179 from soxoj/sites-fixes-04-07-21
Added new sites and fixed some fp
2021-07-05 00:23:02 +03:00
Soxoj 8062ec30e9 Added new sites and fixed some fp 2021-07-05 00:17:05 +03:00
soxoj 32000a1cfd Merge pull request #178 from soxoj/url-source-fix
Fixed DB loading from URL
2021-07-02 23:33:09 +03:00
Soxoj 8af6ce3af5 Fixed DB loading from URL 2021-07-02 23:30:44 +03:00
soxoj 0dd1dd5d76 Merge pull request #176 from soxoj/google-plus
Added checking of Google Plus account through Wayback machine
2021-06-27 13:36:06 +03:00
Soxoj 4aab21046b Added checking of Google Plus account through Wayback machine 2021-06-27 13:33:25 +03:00
soxoj 92ac9ec8b7 Merge pull request #175 from soxoj/new-sites-26-06-21
Added new sites
2021-06-26 13:16:18 +03:00
Soxoj ca2c8b3502 Added new sites 2021-06-26 13:12:45 +03:00
soxoj 4362a41fca Merge pull request #174 from soxoj/graph-report-draft
Draft of graph report
2021-06-21 22:27:40 +03:00
Soxoj c7977f1cdf Draft of graph report 2021-06-21 22:20:51 +03:00
soxoj 49708da980 Update README.md 2021-06-13 03:18:36 +03:00
soxoj bc1398061f Merge pull request #172 from soxoj/dockerfile-update
Dockerfile update
2021-06-13 01:52:35 +03:00
Soxoj e8634c8c56 Dockerfile update 2021-06-13 01:50:36 +03:00
soxoj dc59b93f38 Merge pull request #171 from soxoj/add-code-of-conduct-1
Create CODE_OF_CONDUCT.md
2021-06-13 01:25:59 +03:00
soxoj c727cbae27 Create CODE_OF_CONDUCT.md 2021-06-13 01:25:50 +03:00
soxoj e6c6cc8f6d Update issue templates 2021-06-13 01:22:55 +03:00
soxoj c80e8b1207 Create CONTRIBUTING.md 2021-06-13 01:17:28 +03:00
soxoj 6e78fdeb81 Merge pull request #170 from soxoj/readme-links-fixes
Fixed links to static files in README
2021-06-13 00:51:04 +03:00
Soxoj 9c22e09808 Fixed links to static files in README 2021-06-13 00:49:43 +03:00
soxoj f057fd3a68 Merge pull request #169 from soxoj/submit-mode-refactoring
Refactoring of submit module, some fixes
2021-06-13 00:45:37 +03:00
Soxoj 9b0acc092a Refactoring of submit module, some fixes 2021-06-13 00:43:28 +03:00
soxoj e6b4cdfa77 Added README views counter 2021-06-06 18:25:06 +03:00
Soxoj eb721dc7e3 Makefile, some fixes 2021-06-06 17:32:04 +03:00
soxoj eba0c4531c Merge pull request #167 from soxoj/fixes-04-06-21
Fixed some false positives
2021-06-04 02:59:53 +03:00
Soxoj b4a26c03fe Fixed some false positives 2021-06-04 02:57:32 +03:00
28 changed files with 1241 additions and 553 deletions
+13
@@ -0,0 +1,13 @@
---
name: Add a site
about: I want to add a new site for Maigret checks
title: New site
labels: new-site
assignees: soxoj
---
Link to the site main page: https://example.com
Link to an existing account: https://example.com/users/john
Link to a nonexistent account: https://example.com/users/noonewouldeverusethis7
Tags: photo, us, ...
+4
@@ -2,6 +2,10 @@
## [Unreleased]
## [0.3.1] - 2021-10-31
* fixed false positives
* accelerated Maigret start time threefold
## [0.3.0] - 2021-06-02
* added support of Tor and I2P sites
* added experimental DNS checking feature
+128
@@ -0,0 +1,128 @@
# Contributor Covenant Code of Conduct
## Our Pledge
We as members, contributors, and leaders pledge to make participation in our
community a harassment-free experience for everyone, regardless of age, body
size, visible or invisible disability, ethnicity, sex characteristics, gender
identity and expression, level of experience, education, socio-economic status,
nationality, personal appearance, race, religion, or sexual identity
and orientation.
We pledge to act and interact in ways that contribute to an open, welcoming,
diverse, inclusive, and healthy community.
## Our Standards
Examples of behavior that contributes to a positive environment for our
community include:
* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes,
and learning from the experience
* Focusing on what is best not just for us as individuals, but for the
overall community
Examples of unacceptable behavior include:
* The use of sexualized language or imagery, and sexual attention or
advances of any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email
address, without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Enforcement Responsibilities
Community leaders are responsible for clarifying and enforcing our standards of
acceptable behavior and will take appropriate and fair corrective action in
response to any behavior that they deem inappropriate, threatening, offensive,
or harmful.
Community leaders have the right and responsibility to remove, edit, or reject
comments, commits, code, wiki edits, issues, and other contributions that are
not aligned to this Code of Conduct, and will communicate reasons for moderation
decisions when appropriate.
## Scope
This Code of Conduct applies within all community spaces, and also applies when
an individual is officially representing the community in public spaces.
Examples of representing our community include using an official e-mail address,
posting via an official social media account, or acting as an appointed
representative at an online or offline event.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement at
https://t.me/soxoj.
All complaints will be reviewed and investigated promptly and fairly.
All community leaders are obligated to respect the privacy and security of the
reporter of any incident.
## Enforcement Guidelines
Community leaders will follow these Community Impact Guidelines in determining
the consequences for any action they deem in violation of this Code of Conduct:
### 1. Correction
**Community Impact**: Use of inappropriate language or other behavior deemed
unprofessional or unwelcome in the community.
**Consequence**: A private, written warning from community leaders, providing
clarity around the nature of the violation and an explanation of why the
behavior was inappropriate. A public apology may be requested.
### 2. Warning
**Community Impact**: A violation through a single incident or series
of actions.
**Consequence**: A warning with consequences for continued behavior. No
interaction with the people involved, including unsolicited interaction with
those enforcing the Code of Conduct, for a specified period of time. This
includes avoiding interactions in community spaces as well as external channels
like social media. Violating these terms may lead to a temporary or
permanent ban.
### 3. Temporary Ban
**Community Impact**: A serious violation of community standards, including
sustained inappropriate behavior.
**Consequence**: A temporary ban from any sort of interaction or public
communication with the community for a specified period of time. No public or
private interaction with the people involved, including unsolicited interaction
with those enforcing the Code of Conduct, is allowed during this period.
Violating these terms may lead to a permanent ban.
### 4. Permanent Ban
**Community Impact**: Demonstrating a pattern of violation of community
standards, including sustained inappropriate behavior, harassment of an
individual, or aggression toward or disparagement of classes of individuals.
**Consequence**: A permanent ban from any sort of public interaction within
the community.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 2.0, available at
https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
Community Impact Guidelines were inspired by [Mozilla's code of conduct
enforcement ladder](https://github.com/mozilla/diversity).
[homepage]: https://www.contributor-covenant.org
For answers to common questions about this code of conduct, see the FAQ at
https://www.contributor-covenant.org/faq. Translations are available at
https://www.contributor-covenant.org/translations.
+30
@@ -0,0 +1,30 @@
# How to contribute
Hey! I'm really glad you're reading this. Maigret supports a lot of sites, and it is very hard to keep them all operational. That's why any fix is important.
## How to add a new site
#### Beginner level
You can use Maigret **submit mode** (`maigret --submit URL`) to add a new site or update an existing one. In this mode Maigret does an automatic analysis of the given account URL or site main page URL to determine the site engine and the methods to check for account presence. After the check, Maigret asks if you want to add the site; answering y/Y will rewrite the local database.
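For example, to analyze an existing account page and interactively add its site (the URL here is illustrative):

```sh
maigret --submit https://example.com/users/john
```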
#### Advanced level
You can edit [the database JSON file](https://github.com/soxoj/maigret/blob/main/maigret/resources/data.json) (`./maigret/resources/data.json`) manually.
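A minimal site entry looks like this (illustrative site name and values; field names as actually used in `data.json`, including the `presenseStrs` spelling):

```json
"ExampleSite": {
    "absenceStrs": ["Page Not Found"],
    "presenseStrs": ["profile"],
    "url": "https://example.com/users/{username}",
    "urlMain": "https://example.com",
    "usernameClaimed": "john",
    "usernameUnclaimed": "noonewouldeverusethis7",
    "checkType": "message",
    "tags": ["forum"]
}
```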
## Testing
There are CI checks for every PR to the Maigret repository, but it is better to run `make format`, `make lint`, and `make test` locally first to make sure your changes are correct.
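A typical pre-PR run, using the Makefile targets added in this change set:

```sh
make format   # auto-format with black
make lint     # flake8 + mypy checks
make test     # pytest with coverage
```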
## Submitting changes
To submit your changes, [send a GitHub PR](https://github.com/soxoj/maigret/pulls) to the Maigret project.
Always write a clear log message for your commits. One-line messages are fine for small changes, but bigger changes should look like this:
$ git commit -m "A brief summary of the commit
>
> A paragraph describing what changed and its impact."
## Coding conventions
Start reading the code and you'll get the hang of it. ;)
+8 -17
@@ -1,25 +1,16 @@
FROM python:3.7
LABEL maintainer="Soxoj <soxoj@protonmail.com>"
FROM python:3.9
MAINTAINER Soxoj <soxoj@protonmail.com>
WORKDIR /app
ADD requirements.txt .
RUN pip install --upgrade pip
RUN apt update -y
RUN apt install -y\
RUN apt update && \
apt install -y \
gcc \
musl-dev \
libxml2 \
libxml2-dev \
libxslt-dev \
&& YARL_NO_EXTENSIONS=1 python3 -m pip install maigret \
&& rm -rf /var/cache/apk/* \
/tmp/* \
/var/tmp/*
libxslt-dev
RUN apt clean \
&& rm -rf /var/lib/apt/lists/* /tmp/*
ADD . .
RUN YARL_NO_EXTENSIONS=1 python3 -m pip install .
ENTRYPOINT ["maigret"]
+35
@@ -0,0 +1,35 @@
LINT_FILES=maigret wizard.py tests

test:
	coverage run --source=./maigret -m pytest tests
	coverage report -m
	coverage html

rerun-tests:
	pytest --lf -vv

lint:
	@echo 'syntax errors or undefined names'
	flake8 --count --select=E9,F63,F7,F82 --show-source --statistics ${LINT_FILES} maigret.py
	@echo 'warning'
	flake8 --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics --ignore=E731,W503 ${LINT_FILES} maigret.py
	@echo 'mypy'
	mypy ${LINT_FILES}

format:
	@echo 'black'
	black --skip-string-normalization ${LINT_FILES}

pull:
	git stash
	git checkout main
	git pull origin main
	git stash pop

clean:
	rm -rf reports htmlcov dist

install:
	pip3 install .
+16 -10
@@ -8,9 +8,12 @@
<a href="https://pypi.org/project/maigret/">
<img alt="PyPI - Downloads" src="https://img.shields.io/pypi/dw/maigret?style=flat-square">
</a>
<a href="https://pypi.org/project/maigret/">
<img alt="Views" src="https://komarev.com/ghpvc/?username=maigret&color=brightgreen&label=views&style=flat-square">
</a>
</p>
<p align="center">
<img src="./static/maigret.png" height="200"/>
<img src="https://raw.githubusercontent.com/soxoj/maigret/main/static/maigret.png" height="200"/>
</p>
</p>
@@ -20,7 +23,7 @@
**Maigret** collects a dossier on a person **by username only**, checking for accounts on a huge number of sites and gathering all the available information from web pages. No API keys required. Maigret is an easy-to-use and powerful fork of [Sherlock](https://github.com/sherlock-project/sherlock).
More than 2000 sites are currently supported ([full list](./sites.md)); by default, the search is launched against the 500 most popular sites in descending order of popularity. Checking of Tor sites, I2P sites, and domains (via DNS resolving) is also supported.
More than 2000 sites are currently supported ([full list](https://raw.githubusercontent.com/soxoj/maigret/main/sites.md)); by default, the search is launched against the 500 most popular sites in descending order of popularity. Checking of Tor sites, I2P sites, and domains (via DNS resolving) is also supported.
## Main features
@@ -35,10 +38,13 @@ See full description of Maigret features [in the Wiki](https://github.com/soxoj/
## Installation
Maigret can be installed using pip or Docker, or simply launched from the cloned repo.
You can also run Maigret using cloud shells (see the buttons below).
You can also run Maigret using cloud shells and Jupyter notebooks (see the buttons below).
[![Open in Cloud Shell](https://user-images.githubusercontent.com/27065646/92304704-8d146d80-ef80-11ea-8c29-0deaabb1c702.png)](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/soxoj/maigret&tutorial=README.md) [![Run on Repl.it](https://user-images.githubusercontent.com/27065646/92304596-bf719b00-ef7f-11ea-987f-2c1f3c323088.png)](https://repl.it/github/soxoj/maigret)
<a href="https://colab.research.google.com/gist//soxoj/879b51bc3b2f8b695abb054090645000/maigret.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" height="40"></a>
[![Open in Cloud Shell](https://user-images.githubusercontent.com/27065646/92304704-8d146d80-ef80-11ea-8c29-0deaabb1c702.png)](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/soxoj/maigret&tutorial=README.md)
<a href="https://repl.it/github/soxoj/maigret"><img src="https://user-images.githubusercontent.com/27065646/92304596-bf719b00-ef7f-11ea-987f-2c1f3c323088.png" alt="Run on Repl.it" height="50"></a>
<a href="https://colab.research.google.com/gist/soxoj/879b51bc3b2f8b695abb054090645000/maigret-collab.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" height="45"></a>
<a href="https://mybinder.org/v2/gist/soxoj/9d65c2f4d3bec5dd25949197ea73cf3a/HEAD"><img src="https://mybinder.org/badge_logo.svg" alt="Open In Binder" height="45"></a>
### Package installing
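For reference, a minimal install is a one-liner (package name as published on [PyPI](https://pypi.org/project/maigret/), consistent with the `pip install maigret` line in the old Dockerfile earlier in this change set):

```sh
pip3 install maigret
```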
@@ -97,16 +103,16 @@ Use `maigret --help` to get full options description. Also options are documente
## Demo with page parsing and recursive username search
[PDF report](./static/report_alexaimephotographycars.pdf), [HTML report](https://htmlpreview.github.io/?https://raw.githubusercontent.com/soxoj/maigret/main/static/report_alexaimephotographycars.html)
[PDF report](https://raw.githubusercontent.com/soxoj/maigret/main/static/report_alexaimephotographycars.pdf), [HTML report](https://htmlpreview.github.io/?https://raw.githubusercontent.com/soxoj/maigret/main/static/report_alexaimephotographycars.html)
![animation of recursive search](./static/recursive_search.svg)
![animation of recursive search](https://raw.githubusercontent.com/soxoj/maigret/main/static/recursive_search.svg)
![HTML report screenshot](./static/report_alexaimephotography_html_screenshot.png)
![HTML report screenshot](https://raw.githubusercontent.com/soxoj/maigret/main/static/report_alexaimephotography_html_screenshot.png)
![XMind report screenshot](./static/report_alexaimephotography_xmind_screenshot.png)
![XMind report screenshot](https://raw.githubusercontent.com/soxoj/maigret/main/static/report_alexaimephotography_xmind_screenshot.png)
[Full console output](./static/recursive_search.md)
[Full console output](https://raw.githubusercontent.com/soxoj/maigret/main/static/recursive_search.md)
## License
+68
@@ -0,0 +1,68 @@
{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "8v6PEfyXb0Gx"
      },
      "outputs": [],
      "source": [
        "# clone the repo\n",
        "!git clone https://github.com/soxoj/maigret\n",
        "!pip3 install -r maigret/requirements.txt"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "cXOQUAhDchkl"
      },
      "outputs": [],
      "source": [
        "# help\n",
        "!python3 maigret/maigret.py --help"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "SjDmpN4QGnJu"
      },
      "outputs": [],
      "source": [
        "# search\n",
        "!python3 maigret/maigret.py user"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [],
      "include_colab_link": true,
      "name": "maigret.ipynb",
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.7.10"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 1
}
-5
@@ -1,5 +0,0 @@
#!/bin/sh
FILES="maigret wizard.py maigret.py tests"
echo 'black'
black --skip-string-normalization $FILES
-11
@@ -1,11 +0,0 @@
#!/bin/sh
FILES="maigret wizard.py maigret.py tests"
echo 'syntax errors or undefined names'
flake8 --count --select=E9,F63,F7,F82 --show-source --statistics $FILES
echo 'warning'
flake8 --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics --ignore=E731,W503 $FILES
echo 'mypy'
mypy ./maigret ./wizard.py ./tests
+1 -1
@@ -1,3 +1,3 @@
"""Maigret version file"""
__version__ = '0.3.0'
__version__ = '0.3.1'
+18 -7
@@ -1,6 +1,11 @@
import asyncio
import logging
from mock import Mock
try:
from mock import Mock
except ImportError:
from unittest.mock import Mock
import re
import ssl
import sys
@@ -8,12 +13,11 @@ import tqdm
from typing import Tuple, Optional, Dict, List
from urllib.parse import quote
import aiohttp
import aiodns
import tqdm.asyncio
from aiohttp_socks import ProxyConnector
from python_socks import _errors as proxy_errors
from socid_extractor import extract
from aiohttp import TCPConnector, ClientSession, http_exceptions
from aiohttp.client_exceptions import ServerDisconnectedError, ClientConnectorError
from .activation import ParsingActivator, import_aiohttp_cookies
@@ -31,6 +35,7 @@ from .utils import get_random_user_agent, ascii_data_display
SUPPORTED_IDS = (
"username",
"yandex_public_id",
"gaia_id",
"vk_id",
@@ -54,12 +59,15 @@ class SimpleAiohttpChecker(CheckerBase):
cookie_jar = kwargs.get('cookie_jar')
self.logger = kwargs.get('logger', Mock())
# moved here to speed up the launch of Maigret
from aiohttp_socks import ProxyConnector
# make http client session
connector = (
ProxyConnector.from_url(proxy) if proxy else aiohttp.TCPConnector(ssl=False)
ProxyConnector.from_url(proxy) if proxy else TCPConnector(ssl=False)
)
connector.verify_ssl = False
self.session = aiohttp.ClientSession(
self.session = ClientSession(
connector=connector, trust_env=True, cookie_jar=cookie_jar
)
@@ -107,7 +115,7 @@ class SimpleAiohttpChecker(CheckerBase):
error = CheckError("Connecting failure", str(e))
except ServerDisconnectedError as e:
error = CheckError("Server disconnected", str(e))
except aiohttp.http_exceptions.BadHttpMessage as e:
except http_exceptions.BadHttpMessage as e:
error = CheckError("HTTP", str(e))
except proxy_errors.ProxyError as e:
error = CheckError("Proxy", str(e))
@@ -133,9 +141,12 @@ class ProxiedAiohttpChecker(SimpleAiohttpChecker):
cookie_jar = kwargs.get('cookie_jar')
self.logger = kwargs.get('logger', Mock())
# moved here to speed up the launch of Maigret
from aiohttp_socks import ProxyConnector
connector = ProxyConnector.from_url(proxy)
connector.verify_ssl = False
self.session = aiohttp.ClientSession(
self.session = ClientSession(
connector=connector, trust_env=True, cookie_jar=cookie_jar
)
+34 -23
@@ -1,7 +1,6 @@
"""
Maigret main module
"""
import aiohttp
import asyncio
import logging
import os
@@ -10,8 +9,7 @@ import platform
from argparse import ArgumentParser, RawDescriptionHelpFormatter
from typing import List, Tuple
import requests
from socid_extractor import extract, parse, __version__ as socid_version
from socid_extractor import extract, parse
from .__version__ import __version__
from .checking import (
@@ -34,11 +32,13 @@ from .report import (
save_json_report,
get_plaintext_report,
sort_report_by_data_points,
save_graph_report,
)
from .sites import MaigretDatabase
from .submit import submit_dialog
from .submit import Submitter
from .types import QueryResultWrapper
from .utils import get_dict_ascii_tree
from .settings import Settings
def notify_about_errors(search_results: QueryResultWrapper, query_notify):
@@ -61,17 +61,6 @@ def notify_about_errors(search_results: QueryResultWrapper, query_notify):
)
def extract_ids_from_url(url: str, db: MaigretDatabase) -> dict:
results = {}
for s in db.sites:
result = s.extract_id_from_url(url)
if not result:
continue
_id, _type = result
results[_id] = _type
return results
def extract_ids_from_page(url, logger, timeout=5) -> dict:
results = {}
# url, headers
@@ -117,18 +106,22 @@ def extract_ids_from_results(results: QueryResultWrapper, db: MaigretDatabase) -
ids_results[u] = utype
for url in dictionary.get('ids_links', []):
ids_results.update(extract_ids_from_url(url, db))
ids_results.update(db.extract_ids_from_url(url))
return ids_results
def setup_arguments_parser():
from aiohttp import __version__ as aiohttp_version
from requests import __version__ as requests_version
from socid_extractor import __version__ as socid_version
version_string = '\n'.join(
[
f'%(prog)s {__version__}',
f'Socid-extractor: {socid_version}',
f'Aiohttp: {aiohttp.__version__}',
f'Requests: {requests.__version__}',
f'Aiohttp: {aiohttp_version}',
f'Requests: {requests_version}',
f'Python: {platform.python_version()}',
]
)
@@ -204,7 +197,7 @@ def setup_arguments_parser():
metavar="DB_FILE",
dest="db_file",
default=None,
help="Load Maigret database from a JSON file or an online, valid, JSON file.",
help="Load Maigret database from a JSON file or HTTP web resource.",
)
parser.add_argument(
"--cookies-jar-file",
@@ -430,6 +423,14 @@ def setup_arguments_parser():
default=False,
help="Generate a PDF report (general report on all usernames).",
)
report_group.add_argument(
"-G",
"--graph",
action="store_true",
dest="graph",
default=False,
help="Generate a graph report (general report on all usernames).",
)
report_group.add_argument(
"-J",
"--json",
@@ -496,6 +497,12 @@ async def main():
if args.tags:
args.tags = list(set(str(args.tags).split(',')))
settings = Settings(
os.path.join(
os.path.dirname(os.path.realpath(__file__)), "resources/settings.json"
)
)
if args.db_file is None:
args.db_file = os.path.join(
os.path.dirname(os.path.realpath(__file__)), "resources/data.json"
@@ -514,7 +521,7 @@ async def main():
)
# Create object with all information about sites we are aware of.
db = MaigretDatabase().load_from_file(args.db_file)
db = MaigretDatabase().load_from_path(args.db_file)
get_top_sites_for_id = lambda x: db.ranked_sites_dict(
top=args.top_sites,
tags=args.tags,
@@ -526,9 +533,8 @@ async def main():
site_data = get_top_sites_for_id(args.id_type)
if args.new_site_to_submit:
is_submitted = await submit_dialog(
db, args.new_site_to_submit, args.cookie_file, logger
)
submitter = Submitter(db=db, logger=logger, settings=settings)
is_submitted = await submitter.dialog(args.new_site_to_submit, args.cookie_file)
if is_submitted:
db.save_to_file(args.db_file)
@@ -687,6 +693,11 @@ async def main():
save_pdf_report(filename, report_context)
query_notify.warning(f'PDF report on all usernames saved in {filename}')
if args.graph:
filename = report_filepath_tpl.format(username=username, postfix='.html')
save_graph_report(filename, general_results, db)
query_notify.warning(f'Graph report on all usernames saved in {filename}')
text_report = get_plaintext_report(report_context)
if text_report:
query_notify.info('Short text report:')
+135 -2
@@ -1,3 +1,4 @@
import ast
import csv
import io
import json
@@ -6,13 +7,13 @@ import os
from datetime import datetime
from typing import Dict, Any
import pycountry
import xmind
from dateutil.parser import parse as parse_datetime_str
from jinja2 import Template
from xhtml2pdf import pisa
from .checking import SUPPORTED_IDS
from .result import QueryStatus
from .sites import MaigretDatabase
from .utils import is_country_tag, CaseConverter, enrich_link_str
SUPPORTED_JSON_REPORT_FORMATS = [
@@ -73,6 +74,10 @@ def save_html_report(filename: str, context: dict):
def save_pdf_report(filename: str, context: dict):
template, css = generate_report_template(is_pdf=True)
filled_template = template.render(**context)
# moved here to speed up the launch of Maigret
from xhtml2pdf import pisa
with open(filename, "w+b") as f:
pisa.pisaDocument(io.StringIO(filled_template), dest=f, default_css=css)
@@ -82,6 +87,131 @@ def save_json_report(filename: str, username: str, results: dict, report_type: s
generate_json_report(username, results, f, report_type=report_type)
class MaigretGraph:
other_params = {'size': 10, 'group': 3}
site_params = {'size': 15, 'group': 2}
username_params = {'size': 20, 'group': 1}
def __init__(self, graph):
self.G = graph
def add_node(self, key, value):
node_name = f'{key}: {value}'
params = self.other_params
if key in SUPPORTED_IDS:
params = self.username_params
elif value.startswith('http'):
params = self.site_params
self.G.add_node(node_name, title=node_name, **params)
if value != value.lower():
normalized_node_name = self.add_node(key, value.lower())
self.link(node_name, normalized_node_name)
return node_name
def link(self, node1_name, node2_name):
self.G.add_edge(node1_name, node2_name, weight=2)
def save_graph_report(filename: str, username_results: list, db: MaigretDatabase):
# moved here to speed up the launch of Maigret
import networkx as nx
G = nx.Graph()
graph = MaigretGraph(G)
for username, id_type, results in username_results:
username_node_name = graph.add_node(id_type, username)
for website_name in results:
dictionary = results[website_name]
# TODO: fix no site data issue
if not dictionary:
continue
if dictionary.get("is_similar"):
continue
status = dictionary.get("status")
if not status: # FIXME: currently in case of timeout
continue
if dictionary["status"].status != QueryStatus.CLAIMED:
continue
site_fallback_name = dictionary.get(
'url_user', f'{website_name}: {username.lower()}'
)
# site_node_name = dictionary.get('url_user', f'{website_name}: {username.lower()}')
site_node_name = graph.add_node('site', site_fallback_name)
graph.link(username_node_name, site_node_name)
def process_ids(parent_node, ids):
for k, v in ids.items():
if k.endswith('_count') or k.startswith('is_') or k.endswith('_at'):
continue
if k in 'image':
continue
v_data = v
if v.startswith('['):
try:
v_data = ast.literal_eval(v)
except Exception as e:
logging.error(e)
# value is a list
if isinstance(v_data, list):
list_node_name = graph.add_node(k, site_fallback_name)
for vv in v_data:
data_node_name = graph.add_node(vv, site_fallback_name)
graph.link(list_node_name, data_node_name)
add_ids = {
a: b for b, a in db.extract_ids_from_url(vv).items()
}
if add_ids:
process_ids(data_node_name, add_ids)
else:
# value is just a string
# ids_data_name = f'{k}: {v}'
# if ids_data_name == parent_node:
# continue
ids_data_name = graph.add_node(k, v)
# G.add_node(ids_data_name, size=10, title=ids_data_name, group=3)
graph.link(parent_node, ids_data_name)
# check for username
if 'username' in k or k in SUPPORTED_IDS:
new_username_node_name = graph.add_node('username', v)
graph.link(ids_data_name, new_username_node_name)
add_ids = {k: v for v, k in db.extract_ids_from_url(v).items()}
if add_ids:
process_ids(ids_data_name, add_ids)
if status.ids_data:
process_ids(site_node_name, status.ids_data)
nodes_to_remove = []
for node in G.nodes:
if len(str(node)) > 100:
nodes_to_remove.append(node)
[G.remove_node(node) for node in nodes_to_remove]
# moved here to speed up the launch of Maigret
from pyvis.network import Network
nt = Network(notebook=True, height="750px", width="100%")
nt.from_nx(G)
nt.show(filename)
def get_plaintext_report(context: dict) -> str:
output = (context['brief'] + " ").replace('. ', '.\n')
interests = list(map(lambda x: x[0], context.get('interests_tuple_list', [])))
@@ -130,6 +260,9 @@ def generate_report_context(username_results: list):
first_seen = None
# moved here to speed up the launch of Maigret
import pycountry
for username, id_type, results in username_results:
found_accounts = 0
new_ids = []
+258 -33
@@ -171,6 +171,7 @@
"usernameUnclaimed": "noonewouldeverusethis7"
},
"2fast4u": {
"disabled": true,
"tags": [
"nl"
],
@@ -269,7 +270,7 @@
"forum",
"ru"
],
"engine": "vBulletin",
"engine": "XenForo",
"alexaRank": 221253,
"urlMain": "https://4cheat.ru",
"usernameClaimed": "adam",
@@ -1158,7 +1159,8 @@
],
"checkType": "message",
"absenceStrs": [
"does not exist"
"does not exist",
"This user has not filled out their profile page yet."
],
"alexaRank": 80,
"urlMain": "https://armchairgm.fandom.com/",
@@ -1603,7 +1605,10 @@
],
"checkType": "message",
"absenceStrs": [
"\u0423\u043f\u0441, \u0441\u0442\u0440\u0430\u043d\u0438\u0446\u0430, \u043a\u043e\u0442\u043e\u0440\u0443\u044e \u0432\u044b \u0438\u0441\u043a\u0430\u043b\u0438, \u043d\u0435 \u0441\u0443\u0449\u0435\u0441\u0442\u0432\u0443\u0435\u0442"
"error-page__title"
],
"presenseStrs": [
"user-name"
],
"alexaRank": 5852,
"urlMain": "https://www.baby.ru/",
@@ -2035,7 +2040,11 @@
"ru",
"wiki"
],
"checkType": "status_code",
"checkType": "message",
"absenceStrs": [
"does not exist",
"\u042d\u0442\u043e\u0442 \u0443\u0447\u0430\u0441\u0442\u043d\u0438\u043a \u043f\u043e\u043a\u0430 \u043d\u0435 \u0437\u0430\u043f\u043e\u043b\u043d\u0438\u043b \u0441\u0432\u043e\u0439 \u043f\u0440\u043e\u0444\u0438\u043b\u044c."
],
"alexaRank": 80,
"urlMain": "https://bleach.fandom.com/ru",
"url": "https://bleach.fandom.com/ru/wiki/%D0%A3%D1%87%D0%B0%D1%81%D1%82%D0%BD%D0%B8%D0%BA:{username}",
@@ -2410,11 +2419,12 @@
},
"CapitalcityCombats": {
"tags": [
"az",
"it",
"ru"
],
"checkType": "message",
"errors": {
"http://img.combats.com/errs/503.png": "Maintenance"
},
"absenceStrs": [
"<TITLE>\u041f\u0440\u043e\u0438\u0437\u043e\u0448\u043b\u0430 \u043e\u0448\u0438\u0431\u043a\u0430</TITLE>"
],
@@ -3402,11 +3412,12 @@
},
"Demonscity": {
"tags": [
"az",
"it",
"pa"
"ru"
],
"checkType": "message",
"errors": {
"http://img.combats.com/errs/503.png": "Maintenance"
},
"absenceStrs": [
"\u043d\u0435 \u043d\u0430\u0439\u0434\u0435\u043d"
],
@@ -3641,6 +3652,7 @@
"errors": {
"Invalid API key": "New API key needed"
},
"regexCheck": "^[^/]+$",
"urlProbe": "https://disqus.com/api/3.0/users/details?user=username%3A{username}&attach=userFlaggedUser&api_key=E8Uh5l5fHZ6gD8U3KycjAIAk46f68Zw7C6eW8WSjZvCLXebZ7p0r1yrYDrLilk2F",
"checkType": "status_code",
"presenseStrs": [
@@ -3924,7 +3936,10 @@
],
"checkType": "message",
"absenceStrs": [
"<title></title>"
"<title> Not Found"
],
"presenseStrs": [
"<title> User"
],
"alexaRank": 22598,
"urlMain": "https://e621.net",
@@ -4449,7 +4464,8 @@
],
"checkType": "message",
"absenceStrs": [
"does not exist"
"does not exist",
"This user has not filled out their profile page yet."
],
"alexaRank": 80,
"urlMain": "https://community.fandom.com",
@@ -4603,6 +4619,7 @@
},
"Fifasoccer": {
"urlSubpath": "/forum",
"disabled": true,
"tags": [
"forum",
"ru",
@@ -4621,7 +4638,7 @@
],
"checkType": "message",
"absenceStrs": [
"<head><script>top.location.href='/404'"
"top.location.href = '/404';"
],
"alexaRank": 4157,
"urlMain": "https://www.filmweb.pl/user/adam",
@@ -4900,6 +4917,7 @@
"usernameUnclaimed": "noonewouldeverusethis7"
},
"ForexDengi": {
"disabled": true,
"tags": [
"forum",
"ru"
@@ -5581,7 +5599,8 @@
"regexCheck": "^\\S+$",
"errors": {
"Are You a Robot?": "Captcha detected",
"Your IP address has been temporarily blocked due to a large number of HTTP requests": "Too many requests"
"Your IP address has been temporarily blocked due to a large number of HTTP requests": "Too many requests",
"your IP was banned": "IP ban"
},
"checkType": "message",
"absenceStrs": [
@@ -6032,7 +6051,7 @@
"usernameClaimed": "blue",
"usernameUnclaimed": "noonewouldeverusethis7"
},
"GoogleMaps": {
"Google Maps": {
"tags": [
"maps",
"us"
@@ -6051,6 +6070,22 @@
"usernameClaimed": "105054951427011407574",
"usernameUnclaimed": "noonewouldeverusethis7"
},
"Google Plus (archived)": {
"checkType": "message",
"type": "gaia_id",
"alexaRank": 1,
"presenseStrs": [
"original"
],
"absenceStrs": [
"[]"
],
"urlMain": "https://plus.google.com",
"urlProbe": "https://web.archive.org/web/timemap/?url=http%3A%2F%2Fplus.google.com%2F{username}&matchType=prefix&collapse=urlkey&output=json&fl=original%2Cmimetype%2Ctimestamp%2Cendtimestamp%2Cgroupcount%2Cuniqcount&filter=!statuscode%3A%5B45%5D..&limit=100000&_=1624789582128",
"url": "https://web.archive.org/web/*/plus.google.com/{username}*",
"usernameClaimed": "117522081019092547227",
"usernameUnclaimed": "noonewouldeverusethis7"
},
"GooglePlayStore": {
"tags": [
"apps",
@@ -7099,6 +7134,9 @@
"ru"
],
"checkType": "status_code",
"errors": {
"The script encountered an error and will be aborted": "Site error"
},
"alexaRank": 3405363,
"urlMain": "http://ispdn.ru",
"url": "http://ispdn.ru/forum/user/{username}/",
@@ -7323,6 +7361,9 @@
"forum",
"in"
],
"errors": {
"You are not logged in or you do not have permission to access this page.": "Auth required"
},
"engine": "vBulletin",
"alexaRank": 9210,
"urlMain": "https://forums.kali.org/",
@@ -7877,11 +7918,14 @@
],
"checkType": "message",
"absenceStrs": [
"@context\": \"https://schema.org"
"https://likee.video/@/"
],
"alexaRank": 1072749,
"urlMain": "https://likee.com",
"url": "https://likee.com/user/@{username}/",
"presenseStrs": [
"user_name"
],
"alexaRank": 38032,
"url": "https://likee.video/@{username}",
"urlMain": "https://likee.video",
"usernameClaimed": "adam",
"usernameUnclaimed": "noonewouldeverusethis7"
},
@@ -8217,6 +8261,9 @@
"ru"
],
"checkType": "message",
"errors": {
"has been temporarily blocked": "IP ban"
},
"absenceStrs": [
"\u0417\u0430\u043f\u0440\u043e\u0448\u0435\u043d\u043d\u0430\u044f \u0432\u0430\u043c\u0438 \u0441\u0442\u0440\u0430\u043d\u0438\u0446\u0430 \u043d\u0435 \u043d\u0430\u0439\u0434\u0435\u043d\u0430.",
"\u0414\u0430\u043d\u043d\u044b\u0435 \u043e \u0432\u044b\u0431\u0440\u0430\u043d\u043d\u043e\u043c \u043f\u043e\u043b\u044c\u0437\u043e\u0432\u0430\u0442\u0435\u043b\u0435 \u043d\u0435 \u0441\u0443\u0449\u0435\u0441\u0442\u0432\u0443\u044e\u0442",
@@ -8679,6 +8726,7 @@
"usernameUnclaimed": "noonewouldeverusethis7"
},
"Metacafe": {
"disabled": true,
"tags": [
"in",
"us"
@@ -11609,9 +11657,12 @@
"discussion",
"news"
],
"checkType": "status_code",
"checkType": "message",
"absenceStrs": [
"Sorry, nobody on Reddit goes by that name."
],
"presenseStrs": [
"totalKarma"
"Karma</h5>"
],
"alexaRank": 19,
"urlMain": "https://www.reddit.com/",
@@ -12328,10 +12379,13 @@
"tags": [
"ru"
],
"checkType": "status_code",
"checkType": "message",
"absenceStrs": [
"Not Found"
],
"alexaRank": 166278,
"urlMain": "https://serveradmin.ru/",
"url": "https://serveradmin.ru/forum/profile/{username}/",
"url": "https://serveradmin.ru/author/{username}",
"usernameClaimed": "fedor",
"usernameUnclaimed": "noonewouldeverusethis7"
},
@@ -13024,7 +13078,7 @@
"us"
],
"headers": {
"authorization": "Bearer BQCypIuUtz7zDFov8xN86mj1BelLf7Apf9WBaC5yYfNkmGe4r7Hz4Awp6dqPuCAP9K9F5yYtjbyZX_vlr4I"
"authorization": "Bearer BQB8QPkkvz_PhWGy4sSY4ijssYjumEHJgJJBFu3VX2Sm4XIoT9jp0eFZrYL3TayY4QZGHmMiz3BCPLcAth4"
},
"errors": {
"Spotify is currently not available in your country.": "Access denied in your country, use proxy/vpn"
@@ -13978,7 +14032,8 @@
"us"
],
"errors": {
"Website unavailable": "Site error"
"Website unavailable": "Site error",
"is currently offline": "Site error"
},
"checkType": "message",
"absenceStrs": [
@@ -14450,7 +14505,7 @@
"sec-ch-ua": "Google Chrome\";v=\"87\", \" Not;A Brand\";v=\"99\", \"Chromium\";v=\"87\"",
"authorization": "Bearer AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu4FA33AGWWjCpTnA",
"user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36",
"x-guest-token": "1400174453577900043"
"x-guest-token": "1411741418192883712"
},
"errors": {
"Bad guest token": "x-guest-token update required"
@@ -14813,9 +14868,10 @@
"usernameClaimed": "red",
"usernameUnclaimed": "noonewouldeverusethis7"
},
"Vgtimes": {
"Vgtimes/Games": {
"tags": [
"ru"
"ru",
"forum"
],
"checkType": "status_code",
"alexaRank": 17926,
@@ -14857,7 +14913,7 @@
"video"
],
"headers": {
"Authorization": "jwt eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJleHAiOjE2MjI2NjcxMjAsInVzZXJfaWQiOm51bGwsImFwcF9pZCI6NTg0NzksInNjb3BlcyI6InB1YmxpYyIsInRlYW1fdXNlcl9pZCI6bnVsbH0.V4VVbLzNwPU21rNP5moSxrPcPw--C7_Qz9VHgcJc1CA"
"Authorization": "jwt eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJleHAiOjE2MzU2OTI0NjAsInVzZXJfaWQiOm51bGwsImFwcF9pZCI6NTg0NzksInNjb3BlcyI6InB1YmxpYyIsInRlYW1fdXNlcl9pZCI6bnVsbH0.KZHo96wUe5__rTqZQqAWiJKPKOy2-sjyxRjhOuuhyEc"
},
"activation": {
"url": "https://vimeo.com/_rv/viewer",
@@ -15030,7 +15086,11 @@
"tags": [
"in"
],
"checkType": "response_url",
"checkType": "message",
"absenceStrs": [
" looking for. Perhaps searching can help.",
"<a href=\"https://www.votetags.info/author/\" title=\"\">"
],
"alexaRank": 39522,
"urlMain": "https://www.votetags.info/",
"url": "https://www.votetags.info/author/{username}/",
@@ -16134,6 +16194,7 @@
]
},
"allgaz": {
"disabled": true,
"tags": [
"forum",
"ru"
@@ -18991,6 +19052,7 @@
"ru",
"ua"
],
"disabled": true,
"engine": "Discourse",
"alexaRank": 718392,
"urlMain": "https://forum.reverse4you.org",
@@ -19157,11 +19219,17 @@
"tags": [
"cn"
],
"checkType": "status_code",
"checkType": "message",
"absenceStrs": [
"message\":\"Not Found\""
],
"presenseStrs": [
"- SegmentFault \u601d\u5426</title>"
],
"alexaRank": 2697,
"urlMain": "https://segmentfault.com/",
"url": "https://segmentfault.com/u/{username}",
"usernameClaimed": "bule",
"usernameClaimed": "john",
"usernameUnclaimed": "noonewouldeverusethis7"
},
"shadow-belgorod.ucoz.ru": {
@@ -19314,7 +19382,8 @@
],
"checkType": "message",
"absenceStrs": [
"We couldn't find that user"
"We couldn't find that user",
"Page Not Found"
],
"alexaRank": 24562,
"urlMain": "https://www.sparkpeople.com",
@@ -28239,6 +28308,104 @@
"tags": [
"music"
]
},
"Dev.by": {
"absenceStrs": [
"error-page"
],
"presenseStrs": [
"profile__info"
],
"url": "https://id.dev.by/users/{username}",
"urlMain": "https://id.dev.by",
"usernameClaimed": "admin",
"usernameUnclaimed": "noonewouldeverusethis7",
"checkType": "message",
"alexaRank": 50263,
"tags": [
"news",
"tech",
"by"
]
},
"Vgtimes": {
"absenceStrs": [
"\u041f\u043e\u043b\u044c\u0437\u043e\u0432\u0430\u0442\u0435\u043b\u044c \u0441 \u0442\u0430\u043a\u0438\u043c \u0438\u043c\u0435\u043d\u0435\u043c \u043d\u0435 \u043d\u0430\u0439\u0434\u0435\u043d"
],
"presenseStrs": [
"user_profile"
],
"url": "https://vgtimes.ru/user/{username}",
"urlMain": "https://vgtimes.ru",
"usernameClaimed": "admin",
"usernameUnclaimed": "noonewouldeverusethis7",
"checkType": "message",
"alexaRank": 16751,
"tags": [
"gaming",
"news",
"ru"
]
},
"Onlyfinder": {
"absenceStrs": [
"\"rows\":[]"
],
"presenseStrs": [
"Username"
],
"url": "https://onlyfinder.com/json/search?q={username}&start=0",
"urlMain": "https://onlyfinder.com",
"usernameClaimed": "wilaribeiro",
"usernameUnclaimed": "noonewouldeverusethis7",
"checkType": "message",
"headers": {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36",
"accept": "application/json, text/javascript, */*; q=0.01",
"x-requested-with": "XMLHttpRequest",
"cookie": "t=93712308; __cflb=02DiuFyCGPVyrmPMNwK31DjBY5udTKcbYh9HYtAX6rR1n"
},
"alexaRank": 286487,
"tags": [
"webcam"
]
},
"partnerkin.com": {
"absenceStrs": [
"<title></title>"
],
"presenseStrs": [
"<title>\u041f\u0440\u043e\u0444\u0438\u043b\u044c"
],
"url": "https://partnerkin.com/user/{username}",
"urlMain": "https://partnerkin.com",
"usernameClaimed": "test",
"usernameUnclaimed": "noonewouldeverusethis7",
"checkType": "message",
"tags": [
"finance"
]
},
"hozpitality": {
"presenseStrs": [
"USERNAME"
],
"url": "https://www.hozpitality.com/{username}/profile",
"urlMain": "https://www.hozpitality.com",
"usernameClaimed": "admin",
"usernameUnclaimed": "noonewouldeverusethis7",
"checkType": "response_url"
},
"blogs.klerk.ru": {
"presenseStrs": [
"profile-links"
],
"url": "https://blogs.klerk.ru/users/{username}/",
"urlMain": "https://blogs.klerk.ru",
"usernameClaimed": "admin",
"usernameUnclaimed": "noonewouldeverusethis7",
"checkType": "message",
"alexaRank": 6859
}
},
"engines": {
@@ -28445,5 +28612,63 @@
]
}
}
}
},
"tags": [
"gaming",
"coding",
"photo",
"music",
"blog",
"finance",
"freelance",
"dating",
"tech",
"forum",
"porn",
"erotic",
"webcam",
"video",
"movies",
"hacking",
"art",
"discussion",
"sharing",
"writing",
"wiki",
"business",
"shopping",
"sport",
"books",
"news",
"documents",
"travel",
"maps",
"hobby",
"apps",
"classified",
"career",
"geosocial",
"streaming",
"education",
"networking",
"torrent",
"science",
"medicine",
"reading",
"stock",
"messaging",
"trading",
"links",
"fashion",
"tasks",
"military",
"auto",
"gambling",
"cybercriminal",
"review",
"bookmarks",
"design",
"tor",
"i2p"
]
}
+17
@@ -0,0 +1,17 @@
{
    "presence_strings": [
        "username",
        "not found",
        "пользователь",
        "profile",
        "lastname",
        "firstname",
        "biography",
        "birthday",
        "репутация",
        "информация",
        "e-mail"
    ],
    "supposed_usernames": ["alex", "god", "admin", "red", "blue", "john"]
}
+29
@@ -0,0 +1,29 @@
import json


class Settings:
    presence_strings: list
    supposed_usernames: list

    def __init__(self, filename):
        data = {}
        try:
            with open(filename, "r", encoding="utf-8") as file:
                try:
                    data = json.load(file)
                except Exception as error:
                    raise ValueError(
                        f"Problem with parsing json contents of "
                        f"settings file '{filename}': {str(error)}."
                    )
        except FileNotFoundError as error:
            raise FileNotFoundError(
                f"Problem while attempting to access settings file '{filename}'."
            ) from error

        self.__dict__.update(data)

    @property
    def json(self):
        return self.__dict__
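For context, a minimal usage sketch of this class, mirroring how `maigret.py` constructs `Settings` in this change set (there, `__file__` is `maigret.py`'s own location inside the package):

```python
import os

from maigret.settings import Settings  # assumes the maigret package is installed

# maigret.py builds the path relative to its own location inside the package;
# in a standalone script __file__ points at the script, so adjust accordingly
settings_path = os.path.join(
    os.path.dirname(os.path.realpath(__file__)), "resources/settings.json"
)
settings = Settings(settings_path)

# attributes come straight from the JSON keys via __dict__.update()
print(settings.supposed_usernames)  # ["alex", "god", "admin", "red", "blue", "john"]
```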
+31 -67
@@ -9,66 +9,6 @@ import requests
from .utils import CaseConverter, URLMatcher, is_country_tag
# TODO: move to data.json
SUPPORTED_TAGS = [
"gaming",
"coding",
"photo",
"music",
"blog",
"finance",
"freelance",
"dating",
"tech",
"forum",
"porn",
"erotic",
"webcam",
"video",
"movies",
"hacking",
"art",
"discussion",
"sharing",
"writing",
"wiki",
"business",
"shopping",
"sport",
"books",
"news",
"documents",
"travel",
"maps",
"hobby",
"apps",
"classified",
"career",
"geosocial",
"streaming",
"education",
"networking",
"torrent",
"science",
"medicine",
"reading",
"stock",
"messaging",
"trading",
"links",
"fashion",
"tasks",
"military",
"auto",
"gambling",
"cybercriminal",
"review",
"bookmarks",
"design",
"tor",
"i2p",
]
class MaigretEngine:
site: Dict[str, Any] = {}
@@ -204,12 +144,12 @@ class MaigretSite:
errors.update(self.errors)
return errors
def get_url_type(self) -> str:
def get_url_template(self) -> str:
url = URLMatcher.extract_main_part(self.url)
if url.startswith("{username}"):
url = "SUBDOMAIN"
elif url == "":
url = f"{self.url} ({self.engine})"
url = f"{self.url} ({self.engine or 'no engine'})"
else:
parts = url.split("/")
url = "/" + "/".join(parts[1:])
@@ -273,8 +213,9 @@ class MaigretSite:
class MaigretDatabase:
def __init__(self):
self._sites = []
self._engines = []
self._tags: list = []
self._sites: list = []
self._engines: list = []
@property
def sites(self):
@@ -351,9 +292,13 @@ class MaigretDatabase:
return self
def save_to_file(self, filename: str) -> "MaigretDatabase":
if '://' in filename:
return self
db_data = {
"sites": {site.name: site.strip_engine_data().json for site in self._sites},
"engines": {engine.name: engine.json for engine in self._engines},
"tags": self._tags,
}
json_data = json.dumps(db_data, indent=4)
@@ -367,6 +312,9 @@ class MaigretDatabase:
# Add all of site information from the json file to internal site list.
site_data = json_data.get("sites", {})
engines_data = json_data.get("engines", {})
tags = json_data.get("tags", [])
self._tags += tags
for engine_name in engines_data:
self._engines.append(MaigretEngine(engine_name, engines_data[engine_name]))
@@ -399,7 +347,13 @@ class MaigretDatabase:
return self.load_from_json(data)
def load_from_url(self, url: str) -> "MaigretDatabase":
def load_from_path(self, path: str) -> "MaigretDatabase":
if '://' in path:
return self.load_from_http(path)
else:
return self.load_from_file(path)
def load_from_http(self, url: str) -> "MaigretDatabase":
is_url_valid = url.startswith("http://") or url.startswith("https://")
if not is_url_valid:
@@ -455,6 +409,16 @@ class MaigretDatabase:
return found_flags
def extract_ids_from_url(self, url: str) -> dict:
results = {}
for s in self._sites:
result = s.extract_id_from_url(url)
if not result:
continue
_id, _type = result
results[_id] = _type
return results
def get_db_stats(self, sites_dict):
if not sites_dict:
sites_dict = self.sites_dict()
@@ -469,7 +433,7 @@ class MaigretDatabase:
if site.disabled:
disabled_count += 1
url_type = site.get_url_type()
url_type = site.get_url_template()
urls[url_type] = urls.get(url_type, 0) + 1
if not site.tags:
@@ -488,7 +452,7 @@ class MaigretDatabase:
output += "Top tags:\n"
for tag, count in sorted(tags.items(), key=lambda x: x[1], reverse=True)[:200]:
mark = ""
if tag not in SUPPORTED_TAGS:
if tag not in self._tags:
mark = " (non-standard)"
output += f"{count}\t{tag}{mark}\n"
+354 -360
@@ -1,5 +1,5 @@
import asyncio
import difflib
import json
import re
from typing import List
import xml.etree.ElementTree as ET
@@ -8,382 +8,376 @@ import requests
from .activation import import_aiohttp_cookies
from .checking import maigret
from .result import QueryStatus
from .settings import Settings
from .sites import MaigretDatabase, MaigretSite, MaigretEngine
from .utils import get_random_user_agent
from .utils import get_random_user_agent, get_match_ratio
DESIRED_STRINGS = [
"username",
"not found",
"пользователь",
"profile",
"lastname",
"firstname",
"biography",
"birthday",
"репутация",
"информация",
"e-mail",
]
SUPPOSED_USERNAMES = ["alex", "god", "admin", "red", "blue", "john"]
HEADERS = {
"User-Agent": get_random_user_agent(),
}
SEPARATORS = "\"'"
RATIO = 0.6
TOP_FEATURES = 5
URL_RE = re.compile(r"https?://(www\.)?")
def get_match_ratio(x):
return round(
max(
[difflib.SequenceMatcher(a=x.lower(), b=y).ratio() for y in DESIRED_STRINGS]
),
2,
)
def get_alexa_rank(site_url_main):
url = f"http://data.alexa.com/data?cli=10&url={site_url_main}"
xml_data = requests.get(url).text
root = ET.fromstring(xml_data)
alexa_rank = 0
try:
alexa_rank = int(root.find('.//REACH').attrib['RANK'])
except Exception:
pass
return alexa_rank
def extract_mainpage_url(url):
return "/".join(url.split("/", 3)[:3])
async def site_self_check(site, logger, semaphore, db: MaigretDatabase, silent=False):
changes = {
"disabled": False,
class Submitter:
HEADERS = {
"User-Agent": get_random_user_agent(),
}
check_data = [
(site.username_claimed, QueryStatus.CLAIMED),
(site.username_unclaimed, QueryStatus.AVAILABLE),
]
SEPARATORS = "\"'"
logger.info(f"Checking {site.name}...")
RATIO = 0.6
TOP_FEATURES = 5
URL_RE = re.compile(r"https?://(www\.)?")
for username, status in check_data:
results_dict = await maigret(
username=username,
site_dict={site.name: site},
logger=logger,
timeout=30,
id_type=site.type,
forced=True,
no_progressbar=True,
)
def __init__(self, db: MaigretDatabase, settings: Settings, logger):
self.settings = settings
self.db = db
self.logger = logger
# don't disable entries with other ids types
# TODO: make normal checking
if site.name not in results_dict:
logger.info(results_dict)
changes["disabled"] = True
continue
@staticmethod
def get_alexa_rank(site_url_main):
url = f"http://data.alexa.com/data?cli=10&url={site_url_main}"
xml_data = requests.get(url).text
root = ET.fromstring(xml_data)
alexa_rank = 0
result = results_dict[site.name]["status"]
try:
alexa_rank = int(root.find('.//REACH').attrib['RANK'])
except Exception:
pass
site_status = result.status
return alexa_rank
if site_status != status:
if site_status == QueryStatus.UNKNOWN:
msgs = site.absence_strs
etype = site.check_type
logger.warning(
"Error while searching '%s' in %s: %s, %s, check type %s",
username,
site.name,
result.context,
msgs,
etype,
)
# don't disable in case of available username
if status == QueryStatus.CLAIMED:
changes["disabled"] = True
elif status == QueryStatus.CLAIMED:
logger.warning(
f"Not found `{username}` in {site.name}, must be claimed"
)
logger.info(results_dict[site.name])
changes["disabled"] = True
else:
logger.warning(f"Found `{username}` in {site.name}, must be available")
logger.info(results_dict[site.name])
changes["disabled"] = True
@staticmethod
def extract_mainpage_url(url):
return "/".join(url.split("/", 3)[:3])
logger.info(f"Site {site.name} checking is finished")
async def site_self_check(self, site, semaphore, silent=False):
changes = {
"disabled": False,
}
return changes
def generate_additional_fields_dialog(engine: MaigretEngine, dialog):
fields = {}
if 'urlSubpath' in engine.site.get('url', ''):
msg = (
'Detected engine suppose additional URL subpath using (/forum/, /blog/, etc). '
'Enter in manually if it exists: '
)
subpath = input(msg).strip('/')
if subpath:
fields['urlSubpath'] = f'/{subpath}'
return fields
async def detect_known_engine(
db, url_exists, url_mainpage, logger
) -> List[MaigretSite]:
try:
r = requests.get(url_mainpage)
logger.debug(r.text)
except Exception as e:
logger.warning(e)
print("Some error while checking main page")
return []
for engine in db.engines:
strs_to_check = engine.__dict__.get("presenseStrs")
if strs_to_check and r and r.text:
all_strs_in_response = True
for s in strs_to_check:
if s not in r.text:
all_strs_in_response = False
sites = []
if all_strs_in_response:
engine_name = engine.__dict__.get("name")
print(f"Detected engine {engine_name} for site {url_mainpage}")
usernames_to_check = SUPPOSED_USERNAMES
supposed_username = extract_username_dialog(url_exists)
if supposed_username:
usernames_to_check = [supposed_username] + usernames_to_check
add_fields = generate_additional_fields_dialog(engine, url_exists)
for u in usernames_to_check:
site_data = {
"urlMain": url_mainpage,
"name": url_mainpage.split("//")[1],
"engine": engine_name,
"usernameClaimed": u,
"usernameUnclaimed": "noonewouldeverusethis7",
**add_fields,
}
logger.info(site_data)
maigret_site = MaigretSite(url_mainpage.split("/")[-1], site_data)
maigret_site.update_from_engine(db.engines_dict[engine_name])
sites.append(maigret_site)
return sites
return []
def extract_username_dialog(url):
url_parts = url.rstrip("/").split("/")
supposed_username = url_parts[-1].strip('@')
entered_username = input(
f'Is "{supposed_username}" a valid username? If not, write it manually: '
)
return entered_username if entered_username else supposed_username
async def check_features_manually(
db, url_exists, url_mainpage, cookie_file, logger, redirects=False
):
custom_headers = {}
while True:
header_key = input(
'Specify custom header if you need or just press Enter to skip. Header name: '
)
if not header_key:
break
header_value = input('Header value: ')
custom_headers[header_key.strip()] = header_value.strip()
supposed_username = extract_username_dialog(url_exists)
non_exist_username = "noonewouldeverusethis7"
url_user = url_exists.replace(supposed_username, "{username}")
url_not_exists = url_exists.replace(supposed_username, non_exist_username)
headers = dict(HEADERS)
headers.update(custom_headers)
# cookies
cookie_dict = None
if cookie_file:
logger.info(f'Use {cookie_file} for cookies')
cookie_jar = import_aiohttp_cookies(cookie_file)
cookie_dict = {c.key: c.value for c in cookie_jar}
exists_resp = requests.get(
url_exists, cookies=cookie_dict, headers=headers, allow_redirects=redirects
)
logger.debug(url_exists)
logger.debug(exists_resp.status_code)
logger.debug(exists_resp.text)
non_exists_resp = requests.get(
url_not_exists, cookies=cookie_dict, headers=headers, allow_redirects=redirects
)
logger.debug(url_not_exists)
logger.debug(non_exists_resp.status_code)
logger.debug(non_exists_resp.text)
a = exists_resp.text
b = non_exists_resp.text
tokens_a = set(re.split(f'[{SEPARATORS}]', a))
tokens_b = set(re.split(f'[{SEPARATORS}]', b))
a_minus_b = tokens_a.difference(tokens_b)
b_minus_a = tokens_b.difference(tokens_a)
if len(a_minus_b) == len(b_minus_a) == 0:
print("The pages for existing and non-existing account are the same!")
top_features_count = int(
input(f"Specify count of features to extract [default {TOP_FEATURES}]: ")
or TOP_FEATURES
)
presence_list = sorted(a_minus_b, key=get_match_ratio, reverse=True)[
:top_features_count
]
print("Detected text features of existing account: " + ", ".join(presence_list))
features = input("If features was not detected correctly, write it manually: ")
if features:
presence_list = list(map(str.strip, features.split(",")))
absence_list = sorted(b_minus_a, key=get_match_ratio, reverse=True)[
:top_features_count
]
print("Detected text features of non-existing account: " + ", ".join(absence_list))
features = input("If features was not detected correctly, write it manually: ")
if features:
absence_list = list(map(str.strip, features.split(",")))
site_data = {
"absenceStrs": absence_list,
"presenseStrs": presence_list,
"url": url_user,
"urlMain": url_mainpage,
"usernameClaimed": supposed_username,
"usernameUnclaimed": non_exist_username,
"checkType": "message",
}
if headers != HEADERS:
site_data['headers'] = headers
site = MaigretSite(url_mainpage.split("/")[-1], site_data)
return site
async def submit_dialog(db, url_exists, cookie_file, logger):
domain_raw = URL_RE.sub("", url_exists).strip().strip("/")
domain_raw = domain_raw.split("/")[0]
logger.info('Domain is %s', domain_raw)
# check for existence
matched_sites = list(filter(lambda x: domain_raw in x.url_main + x.url, db.sites))
if matched_sites:
print(
f'Sites with domain "{domain_raw}" already exists in the Maigret database!'
)
status = lambda s: "(disabled)" if s.disabled else ""
url_block = lambda s: f"\n\t{s.url_main}\n\t{s.url}"
print(
"\n".join(
[
f"{site.name} {status(site)}{url_block(site)}"
for site in matched_sites
]
)
)
if input("Do you want to continue? [yN] ").lower() in "n":
return False
url_mainpage = extract_mainpage_url(url_exists)
print('Detecting site engine, please wait...')
sites = []
try:
sites = await detect_known_engine(db, url_exists, url_mainpage, logger)
except KeyboardInterrupt:
print('Engine detect process is interrupted.')
if not sites:
print("Unable to detect site engine, lets generate checking features")
sites = [
await check_features_manually(
db, url_exists, url_mainpage, cookie_file, logger
)
check_data = [
(site.username_claimed, QueryStatus.CLAIMED),
(site.username_unclaimed, QueryStatus.AVAILABLE),
]
logger.debug(sites[0].__dict__)
self.logger.info(f"Checking {site.name}...")
sem = asyncio.Semaphore(1)
print("Checking, please wait...")
found = False
chosen_site = None
for s in sites:
chosen_site = s
result = await site_self_check(s, logger, sem, db)
if not result["disabled"]:
found = True
break
if not found:
print(
f"Sorry, we couldn't find params to detect account presence/absence in {chosen_site.name}."
)
print(
"Try to run this mode again and increase features count or choose others."
)
return False
else:
if (
input(
f"Site {chosen_site.name} successfully checked. Do you want to save it in the Maigret DB? [Yn] "
for username, status in check_data:
results_dict = await maigret(
username=username,
site_dict={site.name: site},
logger=self.logger,
timeout=30,
id_type=site.type,
forced=True,
no_progressbar=True,
)
.lower()
.strip("y")
):
# don't disable entries with other ids types
# TODO: make normal checking
if site.name not in results_dict:
self.logger.info(results_dict)
changes["disabled"] = True
continue
result = results_dict[site.name]["status"]
site_status = result.status
if site_status != status:
if site_status == QueryStatus.UNKNOWN:
msgs = site.absence_strs
etype = site.check_type
self.logger.warning(
"Error while searching '%s' in %s: %s, %s, check type %s",
username,
site.name,
result.context,
msgs,
etype,
)
# don't disable in case of available username
if status == QueryStatus.CLAIMED:
changes["disabled"] = True
elif status == QueryStatus.CLAIMED:
self.logger.warning(
f"Not found `{username}` in {site.name}, must be claimed"
)
self.logger.info(results_dict[site.name])
changes["disabled"] = True
else:
self.logger.warning(
f"Found `{username}` in {site.name}, must be available"
)
self.logger.info(results_dict[site.name])
changes["disabled"] = True
self.logger.info(f"Site {site.name} checking is finished")
return changes
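The self-check above reduces to a small decision table: for each (username, expected status) pair it runs a real Maigret query and disables the site entry on a mismatch, except when an available-username check merely errored out. A condensed sketch of that branch logic; `should_disable` and the `Status` enum are hypothetical stand-ins, not part of Maigret:

```python
from enum import Enum

class Status(Enum):  # simplified stand-in for maigret.result.QueryStatus
    CLAIMED = "claimed"
    AVAILABLE = "available"
    UNKNOWN = "unknown"

def should_disable(expected: Status, actual: Status) -> bool:
    """Mirrors the mismatch handling in site_self_check."""
    if actual == expected:
        return False
    if actual == Status.UNKNOWN:
        # request/parsing error: disable only if the claimed-account check failed
        return expected == Status.CLAIMED
    return True  # claimed but not found, or available but found

assert should_disable(Status.CLAIMED, Status.AVAILABLE) is True
assert should_disable(Status.AVAILABLE, Status.UNKNOWN) is False
```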
def generate_additional_fields_dialog(self, engine: MaigretEngine, dialog):
fields = {}
if 'urlSubpath' in engine.site.get('url', ''):
msg = (
'The detected engine supposes an additional URL subpath (/forum/, /blog/, etc.). '
'Enter it manually if it exists: '
)
subpath = input(msg).strip('/')
if subpath:
fields['urlSubpath'] = f'/{subpath}'
return fields
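`generate_additional_fields_dialog` only matters for engines whose URL template contains a `{urlSubpath}` placeholder; whatever the user enters is substituted into the template later. A hypothetical illustration (the domain and template are made up, but the template shape matches the engine URLs used elsewhere in this diff):

```python
# Hypothetical: how an entered subpath ends up in a site's URL template.
url_template = "{urlMain}{urlSubpath}/members/?username={username}"
fields = {"urlSubpath": "/forum"}  # as returned by the dialog for the input "forum/"
url = url_template.format(urlMain="https://example.com", username="john", **fields)
assert url == "https://example.com/forum/members/?username=john"
```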
async def detect_known_engine(self, url_exists, url_mainpage) -> List[MaigretSite]:
try:
r = requests.get(url_mainpage)
self.logger.debug(r.text)
except Exception as e:
self.logger.warning(e)
print("Some error while checking main page")
return []
for engine in self.db.engines:
strs_to_check = engine.__dict__.get("presenseStrs")
if strs_to_check and r and r.text:
all_strs_in_response = True
for s in strs_to_check:
if s not in r.text:
all_strs_in_response = False
sites = []
if all_strs_in_response:
engine_name = engine.__dict__.get("name")
print(f"Detected engine {engine_name} for site {url_mainpage}")
usernames_to_check = self.settings.supposed_usernames
supposed_username = self.extract_username_dialog(url_exists)
if supposed_username:
usernames_to_check = [supposed_username] + usernames_to_check
add_fields = self.generate_additional_fields_dialog(
engine, url_exists
)
for u in usernames_to_check:
site_data = {
"urlMain": url_mainpage,
"name": url_mainpage.split("//")[1],
"engine": engine_name,
"usernameClaimed": u,
"usernameUnclaimed": "noonewouldeverusethis7",
**add_fields,
}
self.logger.info(site_data)
maigret_site = MaigretSite(
url_mainpage.split("/")[-1], site_data
)
maigret_site.update_from_engine(
self.db.engines_dict[engine_name]
)
sites.append(maigret_site)
return sites
return []
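The engine detection above is plain substring matching: an engine is assumed when every one of its `presenseStrs` markers occurs in the main-page HTML. A minimal sketch of that predicate; the marker strings below are invented for illustration, not taken from the Maigret DB:

```python
def matches_engine(html: str, presence_strs: list) -> bool:
    # an engine matches only if all of its marker strings are in the page
    return all(s in html for s in presence_strs)

markers = ["Powered by uCoz", "ucoz.net"]  # hypothetical engine markers
assert matches_engine("<html>Powered by uCoz - ucoz.net</html>", markers)
assert not matches_engine("<html>WordPress</html>", markers)
```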
def extract_username_dialog(self, url):
url_parts = url.rstrip("/").split("/")
supposed_username = url_parts[-1].strip('@')
entered_username = input(
f'Is "{supposed_username}" a valid username? If not, write it manually: '
)
return entered_username if entered_username else supposed_username
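The username suggestion is simply the last path segment with any leading `@` stripped, e.g. (hypothetical URL):

```python
# Mirrors the two extraction lines above.
url = "https://example.com/@john/"
assert url.rstrip("/").split("/")[-1].strip("@") == "john"
```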
async def check_features_manually(
self, url_exists, url_mainpage, cookie_file, redirects=False
):
custom_headers = {}
while True:
header_key = input(
'Specify a custom header if you need one, or just press Enter to skip. Header name: '
)
if not header_key:
break
header_value = input('Header value: ')
custom_headers[header_key.strip()] = header_value.strip()
supposed_username = self.extract_username_dialog(url_exists)
non_exist_username = "noonewouldeverusethis7"
url_user = url_exists.replace(supposed_username, "{username}")
url_not_exists = url_exists.replace(supposed_username, non_exist_username)
headers = dict(self.HEADERS)
headers.update(custom_headers)
# cookies
cookie_dict = None
if cookie_file:
self.logger.info(f'Use {cookie_file} for cookies')
cookie_jar = import_aiohttp_cookies(cookie_file)
cookie_dict = {c.key: c.value for c in cookie_jar}
exists_resp = requests.get(
url_exists, cookies=cookie_dict, headers=headers, allow_redirects=redirects
)
self.logger.debug(url_exists)
self.logger.debug(exists_resp.status_code)
self.logger.debug(exists_resp.text)
non_exists_resp = requests.get(
url_not_exists,
cookies=cookie_dict,
headers=headers,
allow_redirects=redirects,
)
self.logger.debug(url_not_exists)
self.logger.debug(non_exists_resp.status_code)
self.logger.debug(non_exists_resp.text)
a = exists_resp.text
b = non_exists_resp.text
tokens_a = set(re.split(f'[{self.SEPARATORS}]', a))
tokens_b = set(re.split(f'[{self.SEPARATORS}]', b))
a_minus_b = tokens_a.difference(tokens_b)
b_minus_a = tokens_b.difference(tokens_a)
if len(a_minus_b) == len(b_minus_a) == 0:
print("The pages for existing and non-existing account are the same!")
top_features_count = int(
input(
f"Specify count of features to extract [default {self.TOP_FEATURES}]: "
)
or self.TOP_FEATURES
)
match_fun = get_match_ratio(self.settings.presence_strings)
presence_list = sorted(a_minus_b, key=match_fun, reverse=True)[
:top_features_count
]
print("Detected text features of existing account: " + ", ".join(presence_list))
features = input("If the features were not detected correctly, enter them manually: ")
if features:
presence_list = list(map(str.strip, features.split(",")))
absence_list = sorted(b_minus_a, key=match_fun, reverse=True)[
:top_features_count
]
print(
"Detected text features of non-existing account: " + ", ".join(absence_list)
)
features = input("If the features were not detected correctly, enter them manually: ")
if features:
absence_list = list(map(str.strip, features.split(",")))
site_data = {
"absenceStrs": absence_list,
"presenseStrs": presence_list,
"url": url_user,
"urlMain": url_mainpage,
"usernameClaimed": supposed_username,
"usernameUnclaimed": non_exist_username,
"checkType": "message",
}
if headers != self.HEADERS:
site_data['headers'] = headers
site = MaigretSite(url_mainpage.split("/")[-1], site_data)
return site
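The core trick in `check_features_manually` is a token-set diff: tokenize the page of an existing account and the page of a non-existing one, and the tokens unique to each side become candidate presence/absence markers (then ranked by `get_match_ratio`). A condensed sketch, assuming a simplified separator set (the real one lives in `self.SEPARATORS`):

```python
import re

SEPARATORS = "\"',"  # assumption: stand-in for Submitter.SEPARATORS

def tokens(html: str) -> set:
    return set(re.split(f"[{SEPARATORS}]", html))

exists_page = '"profile","followers","john"'
missing_page = '"error","user not found"'
presence = tokens(exists_page) - tokens(missing_page)  # only on real accounts
absence = tokens(missing_page) - tokens(exists_page)   # only on missing accounts
assert "followers" in presence and "error" in absence
```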
async def dialog(self, url_exists, cookie_file):
domain_raw = self.URL_RE.sub("", url_exists).strip().strip("/")
domain_raw = domain_raw.split("/")[0]
self.logger.info('Domain is %s', domain_raw)
# check for existence
matched_sites = list(
filter(lambda x: domain_raw in x.url_main + x.url, self.db.sites)
)
if matched_sites:
print(
f'Sites with domain "{domain_raw}" already exist in the Maigret database!'
)
status = lambda s: "(disabled)" if s.disabled else ""
url_block = lambda s: f"\n\t{s.url_main}\n\t{s.url}"
print(
"\n".join(
[
f"{site.name} {status(site)}{url_block(site)}"
for site in matched_sites
]
)
)
if input("Do you want to continue? [yN] ").lower() in "n":
return False
url_mainpage = self.extract_mainpage_url(url_exists)
print('Detecting site engine, please wait...')
sites = []
try:
sites = await self.detect_known_engine(url_exists, url_mainpage)
except KeyboardInterrupt:
print('Engine detection was interrupted.')
if not sites:
print("Unable to detect site engine, lets generate checking features")
sites = [
await self.check_features_manually(
url_exists, url_mainpage, cookie_file
)
]
self.logger.debug(sites[0].__dict__)
sem = asyncio.Semaphore(1)
print("Checking, please wait...")
found = False
chosen_site = None
for s in sites:
chosen_site = s
result = await self.site_self_check(s, sem)
if not result["disabled"]:
found = True
break
if not found:
print(
f"Sorry, we couldn't find params to detect account presence/absence in {chosen_site.name}."
)
print(
"Try to run this mode again and increase the features count or choose other features."
)
self.logger.debug(json.dumps(chosen_site.json))
return False
else:
if (
input(
f"Site {chosen_site.name} successfully checked. Do you want to save it in the Maigret DB? [Yn] "
)
.lower()
.strip("y")
):
return False
chosen_site.name = input("Change site name if you want: ") or chosen_site.name
chosen_site.tags = list(map(str.strip, input("Site tags: ").split(',')))
rank = Submitter.get_alexa_rank(chosen_site.url_main)
if rank:
print(f'New alexa rank: {rank}')
chosen_site.alexa_rank = rank
self.logger.debug(chosen_site.json)
site_data = chosen_site.strip_engine_data()
self.logger.debug(site_data.json)
self.db.update_site(site_data)
return True
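For context, a hedged usage sketch of the refactored class: the constructor arguments below (`db`, `logger`, `settings`) are assumptions inferred from the attributes the methods use, not a documented signature, and the settings object is left abstract:

```python
import asyncio
import logging

from maigret.sites import MaigretDatabase

async def main():
    db = MaigretDatabase().load_from_file("maigret/resources/data.json")
    # `settings` is assumed: an object exposing supposed_usernames and
    # presence_strings, as used by the methods above (not defined here)
    submitter = Submitter(db=db, logger=logging.getLogger("maigret"), settings=settings)
    # interactive dialog; returns True if the site was added to the DB
    if await submitter.dialog("https://example.com/users/john", cookie_file=None):
        db.save_to_file("maigret/resources/data.json")

asyncio.run(main())
```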
+16
View File
@@ -1,4 +1,5 @@
import ast
import difflib
import re
import random
from typing import Any
@@ -95,3 +96,18 @@ def get_dict_ascii_tree(items, prepend="", new_line=True):
def get_random_user_agent():
return random.choice(DEFAULT_USER_AGENTS)
def get_match_ratio(base_strs: list):
def get_match_inner(s: str):
return round(
max(
[
difflib.SequenceMatcher(a=s.lower(), b=s2.lower()).ratio()
for s2 in base_strs
]
),
2,
)
return get_match_inner
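`get_match_ratio` is a closure factory: it scores a candidate token by its best `difflib.SequenceMatcher` ratio against any of the base strings, rounded to two places, which is what lets the submit dialog sort raw page tokens by how much they resemble known status markers. For example:

```python
score = get_match_ratio(["profile", "followers"])
assert score("followers") == 1.0  # exact match with a base string
assert score("zzz") < 0.3         # unrelated token scores near zero
```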
+2
View File
@@ -37,3 +37,5 @@ webencodings==0.5.1
xhtml2pdf==0.2.5
XMind==1.2.0
yarl==1.6.3
networkx==2.5.1
pyvis==0.1.9
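These two new dependencies back the graph report draft: networkx for the graph structure and pyvis for rendering it as interactive HTML. This is not the report code itself, just a minimal pyvis sketch of the kind of username-to-account graph it enables (node names are invented):

```python
from pyvis.network import Network

net = Network(height="500px", width="100%")
net.add_node("john", label="username: john")
net.add_node("example-site", label="https://example.com/users/john")
net.add_edge("john", "example-site")
net.save_graph("graph.html")  # writes a self-contained interactive HTML file
```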
+2 -3
View File
@@ -5,14 +5,13 @@ from setuptools import (
with open('README.md') as fh:
readme = fh.read()
long_description = readme.replace('./', 'https://raw.githubusercontent.com/soxoj/maigret/main/')
long_description = fh.read()
with open('requirements.txt') as rf:
requires = rf.read().splitlines()
setup(name='maigret',
version='0.3.0',
version='0.3.1',
description='Collect a dossier on a person by username from a huge number of sites',
long_description=long_description,
long_description_content_type="text/markdown",
-4
View File
@@ -1,4 +0,0 @@
#!/bin/sh
coverage run --source=./maigret -m pytest tests
coverage report -m
coverage html
+1
View File
@@ -13,6 +13,7 @@ DEFAULT_ARGS: Dict[str, Any] = {
'disable_recursive_search': False,
'folderoutput': 'reports',
'html': False,
'graph': False,
'id_type': 'username',
'ignore_ids_list': [],
'info': False,
+3 -2
View File
@@ -1,15 +1,16 @@
"""Maigret data test functions"""
from maigret.utils import is_country_tag
from maigret.sites import SUPPORTED_TAGS
def test_tags_validity(default_db):
unknown_tags = set()
tags = default_db._tags
for site in default_db.sites:
for tag in filter(lambda x: not is_country_tag(x), site.tags):
if tag not in SUPPORTED_TAGS:
if tag not in tags:
unknown_tags.add(tag)
assert unknown_tags == set()
+7 -8
View File
@@ -9,7 +9,6 @@ from maigret.maigret import self_check, maigret
from maigret.maigret import (
extract_ids_from_page,
extract_ids_from_results,
extract_ids_from_url,
)
from maigret.sites import MaigretSite
from maigret.result import QueryResult, QueryStatus
@@ -144,18 +143,18 @@ def test_maigret_results(test_db):
def test_extract_ids_from_url(default_db):
assert extract_ids_from_url('https://www.reddit.com/user/test', default_db) == {
assert default_db.extract_ids_from_url('https://www.reddit.com/user/test') == {
'test': 'username'
}
assert extract_ids_from_url('https://vk.com/id123', default_db) == {'123': 'vk_id'}
assert extract_ids_from_url('https://vk.com/ida123', default_db) == {
assert default_db.extract_ids_from_url('https://vk.com/id123') == {'123': 'vk_id'}
assert default_db.extract_ids_from_url('https://vk.com/ida123') == {
'ida123': 'username'
}
assert extract_ids_from_url(
'https://my.mail.ru/yandex.ru/dipres8904/', default_db
assert default_db.extract_ids_from_url(
'https://my.mail.ru/yandex.ru/dipres8904/'
) == {'dipres8904': 'username'}
assert extract_ids_from_url(
'https://reviews.yandex.ru/user/adbced123', default_db
assert default_db.extract_ids_from_url(
'https://reviews.yandex.ru/user/adbced123'
) == {'adbced123': 'yandex_public_id'}
+24
View File
@@ -1,5 +1,6 @@
"""Maigret Database test functions"""
from maigret.sites import MaigretDatabase, MaigretSite
from maigret.utils import URLMatcher
EXAMPLE_DB = {
'engines': {
@@ -179,3 +180,26 @@ def test_ranked_sites_dict_id_type():
assert len(db.ranked_sites_dict()) == 2
assert len(db.ranked_sites_dict(id_type='username')) == 2
assert len(db.ranked_sites_dict(id_type='gaia_id')) == 1
def test_get_url_template():
site = MaigretSite(
"test",
{
"urlMain": "https://ya.ru/",
"url": "{urlMain}{urlSubpath}/members/?username={username}",
},
)
assert (
site.get_url_template()
== "{urlMain}{urlSubpath}/members/?username={username} (no engine)"
)
site = MaigretSite(
"test",
{
"urlMain": "https://ya.ru/",
"url": "https://{username}.ya.ru",
},
)
assert site.get_url_template() == "SUBDOMAIN"
+7
View File
@@ -8,6 +8,7 @@ from maigret.utils import (
enrich_link_str,
URLMatcher,
get_dict_ascii_tree,
get_match_ratio,
)
@@ -136,3 +137,9 @@ def test_get_dict_ascii_tree():
instagram_username: Street.Reality.Photography
twitter_username: Alexaimephotogr"""
)
def test_get_match_ratio():
fun = get_match_ratio(["test", "maigret", "username"])
assert fun("test") == 1