From eeab6ba82cc07ac48f8a6b2eeeae9a70a26995dc Mon Sep 17 00:00:00 2001
From: Egor Nagornov
Date: Tue, 2 Nov 2021 17:21:27 +0700
Subject: [PATCH] Move wiki pages to ReadTheDocs

---
 docs/Makefile | 20 +++
 docs/make.bat | 35 +++++
 docs/source/command-line-options.rst | 124 ++++++++++++++++++
 docs/source/conf.py | 36 +++++
 .../extracting-information-from-pages.rst | 35 +++++
 docs/source/features.rst | 76 +++++++++++
 docs/source/index.rst | 29 ++++
 docs/source/philosophy.rst | 6 +
 docs/source/roadmap.rst | 18 +++
 docs/source/supported-identifier-types.rst | 15 +++
 docs/source/tags.rst | 24 ++++
 docs/source/usage-examples.rst | 53 ++++++++
 12 files changed, 471 insertions(+)
 create mode 100644 docs/Makefile
 create mode 100644 docs/make.bat
 create mode 100644 docs/source/command-line-options.rst
 create mode 100644 docs/source/conf.py
 create mode 100644 docs/source/extracting-information-from-pages.rst
 create mode 100644 docs/source/features.rst
 create mode 100644 docs/source/index.rst
 create mode 100644 docs/source/philosophy.rst
 create mode 100644 docs/source/roadmap.rst
 create mode 100644 docs/source/supported-identifier-types.rst
 create mode 100644 docs/source/tags.rst
 create mode 100644 docs/source/usage-examples.rst

diff --git a/docs/Makefile b/docs/Makefile
new file mode 100644
index 0000000..d0c3cbf
--- /dev/null
+++ b/docs/Makefile
@@ -0,0 +1,20 @@
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line, and also
+# from the environment for the first two.
+SPHINXOPTS ?=
+SPHINXBUILD ?= sphinx-build
+SOURCEDIR = source
+BUILDDIR = build
+
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
diff --git a/docs/make.bat b/docs/make.bat
new file mode 100644
index 0000000..9534b01
--- /dev/null
+++ b/docs/make.bat
@@ -0,0 +1,35 @@
+@ECHO OFF
+
+pushd %~dp0
+
+REM Command file for Sphinx documentation
+
+if "%SPHINXBUILD%" == "" (
+	set SPHINXBUILD=sphinx-build
+)
+set SOURCEDIR=source
+set BUILDDIR=build
+
+if "%1" == "" goto help
+
+%SPHINXBUILD% >NUL 2>NUL
+if errorlevel 9009 (
+	echo.
+	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
+	echo.installed, then set the SPHINXBUILD environment variable to point
+	echo.to the full path of the 'sphinx-build' executable. Alternatively you
+	echo.may add the Sphinx directory to PATH.
+	echo.
+	echo.If you don't have Sphinx installed, grab it from
+	echo.http://sphinx-doc.org/
+	exit /b 1
+)
+
+%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+goto end
+
+:help
+%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+
+:end
+popd
diff --git a/docs/source/command-line-options.rst b/docs/source/command-line-options.rst
new file mode 100644
index 0000000..44dcd8f
--- /dev/null
+++ b/docs/source/command-line-options.rst
@@ -0,0 +1,124 @@
+.. _command-line-options:
+
+Command line options
+====================
+
+Usernames
+---------
+
+``maigret username1 username2 ...``
+
+You can specify several usernames separated by spaces. Usernames are
+**not** mandatory as there are other operation modes (see below).
+
+Parsing of account pages and online documents
+---------------------------------------------
+
+``maigret --parse URL``
+
+Maigret will try to extract information about the document/account owner
+(including username and other ids) and will then search by the
+extracted username and ids. :doc:`Examples `.
+
+Main options
+------------
+
+``--tags`` - Filter sites for searching by tags: site categories and
+two-letter country codes. E.g. photo, dating, sport; jp, us, global.
+Multiple tags can be associated with one site. **Warning: tags markup is
+not stable yet.**
+
+``-n``, ``--max-connections`` - Maximum number of concurrent connections
+**(default: 100)**.
+
+``-a``, ``--all-sites`` - Scan all sites **(default: top 500)**.
+
+``--top-sites`` - Number of sites to scan, ranked by Alexa Top
+**(default: top 500)**.
+
+``--timeout`` - Time (in seconds) to wait for responses from sites
+**(default: 30)**. A longer timeout is more likely to get results
+from slow sites, but it may also delay gathering all the results.
+Choose the timeout taking into account the bandwidth of your
+Internet connection.
+
+``--cookies-jar-file`` - File with custom cookies in Netscape format
+(aka cookies.txt). You can install a browser extension to
+download your own cookies (`Chrome `_, `Firefox `_).
+
+``--no-recursion`` - Disable parsing pages for other usernames and
+recursive search by them.
+
+``--use-disabled-sites`` - Use disabled sites to search (may cause many
+false positives).
+
+``--id-type`` - Specify identifier(s) type (default: username).
+Supported types: gaia_id, vk_id, yandex_public_id, ok_id, wikimapia_uid.
+Currently, you must add the ``-a`` flag to run a scan on sites with custom
+id types; sites will be filtered automatically.
+
+``--ignore-ids`` - Do not search by the specified usernames or other
+ids. Useful for repeated scans when known irrelevant usernames were found.
+
+``--db`` - Load the Maigret database from a local JSON file or from a
+valid online JSON file.
+
+``--retries RETRIES`` - Number of attempts to retry temporarily failed
+requests.
+
+Reports
+-------
+
+``-P``, ``--pdf`` - Generate a PDF report (general report on all
+usernames).
+
+``-H``, ``--html`` - Generate an HTML report file (general report on all
+usernames).
+
+``-X``, ``--xmind`` - Generate an XMind 8 mindmap (one report per
+username).
+
+``-C``, ``--csv`` - Generate a CSV report (one report per username).
+
+``-T``, ``--txt`` - Generate a TXT report (one report per username).
+
+``-J``, ``--json`` - Generate a JSON report of a specific type: simple or
+ndjson (one report per username). E.g. ``--json ndjson``
+
+``-fo``, ``--folderoutput`` - Results will be saved to this folder,
+``results`` by default. It will be created if it doesn’t exist.
+
+Output options
+--------------
+
+``-v``, ``--verbose`` - Display extra information and metrics.
+*(loglevel=WARNING)*
+
+``-vv``, ``--info`` - Display service information. *(loglevel=INFO)*
+
+``-vvv``, ``--debug``, ``-d`` - Display debugging information and site
+responses. *(loglevel=DEBUG)*
+
+``--print-not-found`` - Print sites where the username was not found.
+
+``--print-errors`` - Print error messages: connection, captcha, site
+country ban, etc.
+
+Other operation modes
+---------------------
+
+``--version`` - Display version information and dependencies.
+
+``--self-check`` - Self-check sites and the database and disable
+non-working ones **for the current search session** by default. It’s useful
+for testing a new internet connection (which sites show a censorship stub
+or a captcha depends on the provider/hosting). After
+checking, Maigret asks if you want to save updates; answering y/Y will
+rewrite the local database.
+
+``--submit URL`` - Automatically analyze the given account URL or
+site main page URL to determine the site engine and methods to check
+account presence. After checking, Maigret asks if you want to add the
+site; answering y/Y will rewrite the local database.
+
+
diff --git a/docs/source/conf.py b/docs/source/conf.py
new file mode 100644
index 0000000..a0d4a62
--- /dev/null
+++ b/docs/source/conf.py
@@ -0,0 +1,36 @@
+# Configuration file for the Sphinx documentation builder.
+
+# -- Project information
+
+project = 'Maigret'
+copyright = '2021, soxoj'
+author = 'soxoj'
+
+release = '0.3.1'
+version = '0.3.1'
+
+# -- General configuration
+
+extensions = [
+    'sphinx.ext.duration',
+    'sphinx.ext.doctest',
+    'sphinx.ext.autodoc',
+    'sphinx.ext.autosummary',
+    'sphinx.ext.intersphinx',
+    'sphinx_copybutton'
+]
+
+intersphinx_mapping = {
+    'python': ('https://docs.python.org/3/', None),
+    'sphinx': ('https://www.sphinx-doc.org/en/master/', None),
+}
+intersphinx_disabled_domains = ['std']
+
+templates_path = ['_templates']
+
+# -- Options for HTML output
+
+html_theme = 'sphinx_rtd_theme'
+
+# -- Options for EPUB output
+epub_show_urls = 'footnote'
diff --git a/docs/source/extracting-information-from-pages.rst b/docs/source/extracting-information-from-pages.rst
new file mode 100644
index 0000000..c24e6d5
--- /dev/null
+++ b/docs/source/extracting-information-from-pages.rst
@@ -0,0 +1,35 @@
+.. _extracting-information-from-pages:
+
+Extracting information from pages
+=================================
+Maigret can parse URLs and the content of the web pages behind them to extract info about the account owner and other meta information.
+
+You must specify the URL with the ``--parse`` option; it can be a link to an account or an online document. For the list of supported sites, `see here `_.
+
+After the end of the parsing phase, Maigret will start the search phase by the :doc:`supported identifiers ` found (usernames, ids, etc.).
+
+Examples
+--------
+.. code-block:: console
+
+   $ maigret --parse https://docs.google.com/spreadsheets/d/1HtZKMLRXNsZ0HjtBmo0Gi03nUPiJIA4CC4jTYbCAnXw/edit\#gid\=0
+
+   Scanning webpage by URL https://docs.google.com/spreadsheets/d/1HtZKMLRXNsZ0HjtBmo0Gi03nUPiJIA4CC4jTYbCAnXw/edit#gid=0...
+   ┣╸org_name: Gooten
+   ┗╸mime_type: application/vnd.google-apps.ritz
+   Scanning webpage by URL https://clients6.google.com/drive/v2beta/files/1HtZKMLRXNsZ0HjtBmo0Gi03nUPiJIA4CC4jTYbCAnXw?fields=alternateLink%2CcopyRequiresWriterPermission%2CcreatedDate%2Cdescription%2CdriveId%2CfileSize%2CiconLink%2Cid%2Clabels(starred%2C%20trashed)%2ClastViewedByMeDate%2CmodifiedDate%2Cshared%2CteamDriveId%2CuserPermission(id%2Cname%2CemailAddress%2Cdomain%2Crole%2CadditionalRoles%2CphotoLink%2Ctype%2CwithLink)%2Cpermissions(id%2Cname%2CemailAddress%2Cdomain%2Crole%2CadditionalRoles%2CphotoLink%2Ctype%2CwithLink)%2Cparents(id)%2Ccapabilities(canMoveItemWithinDrive%2CcanMoveItemOutOfDrive%2CcanMoveItemOutOfTeamDrive%2CcanAddChildren%2CcanEdit%2CcanDownload%2CcanComment%2CcanMoveChildrenWithinDrive%2CcanRename%2CcanRemoveChildren%2CcanMoveItemIntoTeamDrive)%2Ckind&supportsTeamDrives=true&enforceSingleParent=true&key=AIzaSyC1eQ1xj69IdTMeii5r7brs3R90eck-m7k...
+   ┣╸created_at: 2016-02-16T18:51:52.021Z
+   ┣╸updated_at: 2019-10-23T17:15:47.157Z
+   ┣╸gaia_id: 15696155517366416778
+   ┣╸fullname: Nadia Burgess
+   ┣╸email: nadia@gooten.com
+   ┣╸image: https://lh3.googleusercontent.com/a-/AOh14GheZe1CyNa3NeJInWAl70qkip4oJ7qLsD8vDy6X=s64
+   ┗╸email_username: nadia
+
+.. code-block:: console
+
+   $ maigret.py --parse https://steamcommunity.com/profiles/76561199113454789
+   Scanning webpage by URL https://steamcommunity.com/profiles/76561199113454789...
+   ┣╸steam_id: 76561199113454789
+   ┣╸nickname: Pok
+   ┗╸username: Machine42
diff --git a/docs/source/features.rst b/docs/source/features.rst
new file mode 100644
index 0000000..6367d5d
--- /dev/null
+++ b/docs/source/features.rst
@@ -0,0 +1,76 @@
+.. _features:
+
+Features
+========
+
+This is the list of Maigret features.
+
+Personal info gathering
+-----------------------
+
+Maigret does the `parsing of account webpages and extraction `_ of personal info, links to other profiles, etc.
+Extracted info is displayed as an additional result in CLI output and as tables in HTML and PDF reports.
+Also, Maigret uses ids and usernames found in links to start a recursive search.
+
+Enabled by default, can be disabled with ``--no-extracting``.
+
+Recursive search
+----------------
+
+Maigret can extract some :ref:`common ids ` and usernames from links on the account page (people often place links to their other accounts) and immediately start new searches. All the gathered information will be displayed in CLI output and reports.
+
+Enabled by default, can be disabled with ``--no-recursion``.
+
+Reports
+-------
+
+Maigret currently supports HTML, PDF, TXT, CSV, XMind mindmap, and JSON reports.
+
+HTML/PDF reports contain:
+
+- profile photo
+- all the gathered personal info
+- additional information about supposed personal data (full name, gender, location), inferred from statistics of all found accounts
+
+Also, there is a short text report in the CLI output after the end of the search phase.
+
+Tags
+----
+
+The Maigret sites database is very big (and will grow), so searching all the sites may be unnecessary overhead.
+Also, it is often hard to tell which sites are more interesting in the case of a certain person.
+
+Tags markup allows selecting a subset of sites by interests (photo, messaging, finance, etc.) or by country. Tags of found accounts are grouped and displayed in the reports.
+
+See the full description :doc:`in the Tags Wiki page `.
+
+Censorship and captcha detection
+--------------------------------
+
+Maigret can detect common errors such as censorship stub pages, CloudFlare captcha pages, and others.
+If you get more than 3% errors of a certain type in a session, you'll get a warning message in the CLI output with recommendations to improve performance and avoid problems.
+
+Retries
+-------
+
+Maigret will retry requests that failed with temporary errors (connection failures, proxy errors, etc.).
+
+One attempt by default, can be changed with option ``--retries N``.
+
+Archives and mirrors checking
+-----------------------------
+
+The Maigret database contains not only the original websites, but also mirrors, archives, and aggregators. For example:
+
+- `Reddit BigData search `_
+- `Picuki `_, Instagram mirror
+- `Twitter shadowban `_ checker
+
+It allows getting additional info about the person and checking the existence of the account even if the main site is unavailable (bot protection, captcha, etc.).
+
+Simple API
+----------
+
+Maigret can be easily integrated into other tools via the Python package `maigret `_.
+
+Example: the official `Telegram bot `_
diff --git a/docs/source/index.rst b/docs/source/index.rst
new file mode 100644
index 0000000..ec6bda6
--- /dev/null
+++ b/docs/source/index.rst
@@ -0,0 +1,29 @@
+.. _index:
+
+Welcome to the Maigret docs!
+============================
+
+**Maigret** is an easy-to-use and powerful OSINT tool for collecting a dossier on a person by username only.
+
+This is achieved by checking for accounts on a huge number of sites and gathering all the available information from web pages.
+
+The project's main goal is to give OSINT researchers and pentesters a **universal tool** to get maximum information about a subject and integrate it with other tools in automation pipelines.
+
+You may be interested in:
+-------------------------
+- :doc:`Command line options description ` and :doc:`usage examples `
+- :doc:`Features list `
+- :doc:`Project roadmap `
+
+.. toctree::
+   :hidden:
+   :caption: Sections
+
+   command-line-options
+   extracting-information-from-pages
+   features
+   philosophy
+   roadmap
+   supported-identifier-types
+   tags
+   usage-examples
diff --git a/docs/source/philosophy.rst b/docs/source/philosophy.rst
new file mode 100644
index 0000000..bd53312
--- /dev/null
+++ b/docs/source/philosophy.rst
@@ -0,0 +1,6 @@
+.. _philosophy:
+
+Philosophy
+==========
+
+Username => Dossier
diff --git a/docs/source/roadmap.rst b/docs/source/roadmap.rst
new file mode 100644
index 0000000..c1dca39
--- /dev/null
+++ b/docs/source/roadmap.rst
@@ -0,0 +1,18 @@
+.. _roadmap:
+
+Roadmap
+=======
+
+.. figure:: https://i.imgur.com/kk8cFdR.png
+   :target: https://i.imgur.com/kk8cFdR.png
+   :align: center
+
+Current status
+--------------
+
+- Sites DB stats - ok
+- Scan sessions stats - ok
+- Site engine autodetect - ok
+- Engines for all the sites - WIP
+- Unified reporting flow - ok
+- Retries - ok
diff --git a/docs/source/supported-identifier-types.rst b/docs/source/supported-identifier-types.rst
new file mode 100644
index 0000000..312a223
--- /dev/null
+++ b/docs/source/supported-identifier-types.rst
@@ -0,0 +1,15 @@
+.. _supported-identifier-types:
+
+Supported identifier types
+==========================
+
+Maigret can search not only by ordinary usernames, but also by certain common identifiers. Here is a list of all currently supported identifiers.
+
+- **gaia_id** - Google internal numeric user identifier; it used to appear in Google Plus account URLs.
+- **steam_id** - Steam internal numeric user identifier.
+- **wikimapia_uid** - Wikimapia.org internal numeric user identifier.
+- **uidme_uguid** - uID.me internal numeric user identifier.
+- **yandex_public_id** - Yandex sites internal letter-based user identifier. See also: `YaSeeker `_.
+- **vk_id** - VK.com internal numeric user identifier.
+- **ok_id** - OK.ru internal numeric user identifier.
+- **yelp_userid** - Yelp internal user identifier.
diff --git a/docs/source/tags.rst b/docs/source/tags.rst
new file mode 100644
index 0000000..b3d3f4b
--- /dev/null
+++ b/docs/source/tags.rst
@@ -0,0 +1,24 @@
+.. _tags:
+
+Tags
+====
+
+Tags allow you to select a subset of sites from the big Maigret DB for a search.
+
+**Warning: tags markup is not stable yet.**
+
+There are several types of tags:
+
+1. **Country codes**: ``us``, ``jp``, ``br``... (`ISO 3166-1 alpha-2 `_). These tags reflect the site language and regional origin of its users and are then used to locate the owner of a username. If the regional origin is difficult to establish or a site is positioned as worldwide, `no country code is given`. There could be multiple country code tags for one site.
+
+2. **Site engines**. Most of them are forum engines now: ``uCoz``, ``vBulletin``, ``XenForo`` et al. The full list of engines is stored in the Maigret database.
+
+3. **Sites' subject/type and interests of its users**. The full list of "standard" tags is `present in the source code `_ only, for the moment.
+
+Usage
+-----
+``--tags us,jp`` -- search on US and Japanese sites (those actually marked as such in the Maigret database)
+
+``--tags coding`` -- search on sites related to software development.
+
+``--tags ucoz`` -- search on uCoz sites only (mostly CIS countries)
diff --git a/docs/source/usage-examples.rst b/docs/source/usage-examples.rst
new file mode 100644
index 0000000..4aa6fa6
--- /dev/null
+++ b/docs/source/usage-examples.rst
@@ -0,0 +1,53 @@
+.. _usage-examples:
+
+Usage examples
+==============
+
+Start a search for accounts with username ``machine42`` on top 500 sites from the Maigret DB.
+
+.. code-block:: console
+
+   maigret machine42
+
+Start a search for accounts with username ``machine42`` on **all sites** from the Maigret DB.
+
+.. code-block:: console
+
+   maigret machine42 -a
+
+Start a search [...] and generate HTML and PDF reports.
+
+.. code-block:: console
+
+   maigret machine42 -a -HP
+
+Start a search for accounts with username ``machine42`` only on Facebook.
+
+.. code-block:: console
+
+   maigret machine42 --site Facebook
+
+Extract information from the Steam page by URL and start a search for accounts with found username ``machine42``.
+
+.. code-block:: console
+
+   maigret --parse https://steamcommunity.com/profiles/76561199113454789
+
+Start a search for accounts with username ``machine42`` only on US and Japanese sites.
+
+.. code-block:: console
+
+   maigret machine42 --tags us,jp
+
+Start a search for accounts with username ``machine42`` only on sites related to software development.
+
+.. code-block:: console
+
+   maigret machine42 --tags coding
+
+Start a search for accounts with username ``machine42`` on uCoz sites only (mostly CIS countries).
+
+.. code-block:: console
+
+   maigret machine42 --tags ucoz
+
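The usage examples in this patch can also be assembled from a script. The following is only a minimal sketch, not part of Maigret: the helper name ``build_maigret_args`` and its parameters are hypothetical, and it uses only flags documented in command-line-options.rst (``-a``, ``--tags``, ``--timeout``, ``-H``, ``-P``). It builds the argument list and does not run anything.

```python
import shlex


def build_maigret_args(usernames, tags=None, all_sites=False,
                       timeout=None, html=False, pdf=False):
    """Assemble an argument list for the ``maigret`` CLI.

    Hypothetical helper for illustration; it only strings together
    flags described in the documentation added by this patch.
    """
    args = ["maigret", *usernames]
    if all_sites:
        args.append("-a")  # scan all sites instead of the default top 500
    if tags:
        args += ["--tags", ",".join(tags)]  # comma-separated tag list
    if timeout is not None:
        args += ["--timeout", str(timeout)]  # seconds to wait per site
    if html:
        args.append("-H")  # general HTML report
    if pdf:
        args.append("-P")  # general PDF report
    return args


if __name__ == "__main__":
    # Print a copy-pasteable command line; nothing is executed here.
    print(shlex.join(build_maigret_args(["machine42"], tags=["coding"])))
```

Passing the resulting list to ``subprocess.run`` would start the scan; the list form avoids shell quoting issues with usernames that contain special characters.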