diff --git a/.gitignore b/.gitignore index 16b131f..4b887a9 100644 --- a/.gitignore +++ b/.gitignore @@ -1,5 +1,6 @@ # Virtual Environment venv/ +.venv/ # Editor Configurations .vscode/ @@ -38,3 +39,7 @@ htmlcov/ # Maigret files settings.json + +# other +*.egg-info +build diff --git a/Makefile b/Makefile index fa161a6..d091c83 100644 --- a/Makefile +++ b/Makefile @@ -10,10 +10,10 @@ rerun-tests: lint: @echo 'syntax errors or undefined names' - flake8 --count --select=E9,F63,F7,F82 --show-source --statistics ${LINT_FILES} maigret.py + flake8 --count --select=E9,F63,F7,F82 --show-source --statistics ${LINT_FILES} @echo 'warning' - flake8 --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics --ignore=E731,W503,E501 ${LINT_FILES} maigret.py + flake8 --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics --ignore=E731,W503,E501 ${LINT_FILES} @echo 'mypy' mypy ${LINT_FILES} diff --git a/docs/requirements.txt b/docs/requirements.txt index 787317a..57edd76 100644 --- a/docs/requirements.txt +++ b/docs/requirements.txt @@ -1 +1,2 @@ sphinx-copybutton +sphinx_rtd_theme \ No newline at end of file diff --git a/docs/source/development.rst b/docs/source/development.rst index 972c737..618e678 100644 --- a/docs/source/development.rst +++ b/docs/source/development.rst @@ -3,10 +3,31 @@ Development ============== +Frequently Asked Questions +-------------------------- + +1. Where can I find the list of supported sites? + +The human-readable list of supported sites is available in the `sites.md `_ file in the repository. +It is generated automatically from the main JSON file with the list of supported sites. + +The machine-readable JSON file with the list of supported sites is available in the +`data.json `_ file in the ``resources`` directory. + +2. Which methods of checking account presence are supported? 
+ +The supported methods (``checkType`` values in ``data.json``) are: + +- ``message`` - the most reliable method; checks that at least one string from ``presenceStrs`` is present and no string from ``absenceStrs`` is present in the HTML response +- ``status_code`` - checks that the status code of the response is 2XX +- ``response_url`` - checks that there is no redirect and the response status code is 2XX + +See the details of the check mechanisms in the `checking.py `_ file. + Testing ------- -It is recommended use Python 3.7/3.8 for test due to some conflicts in 3.9. +It is recommended to use Python 3.10 for testing. Install test requirements: @@ -20,20 +41,68 @@ Use the following commands to check Maigret: .. code-block:: console # run linter and typing checks - # order of checks% + # order of checks: # - critical syntax errors or undefined names # - flake checks # - mypy checks make lint # run testing with coverage html report - # current test coverage is 60% - make text + # current test coverage is 58% + make test # open html report open htmlcov/index.html +How to fix false-positives +----------------------------------------------- + +1. Determine the problematic site. + +If you already know which site has a false-positive and want to fix it specifically, go to the next step. + +Otherwise, simply run a search with a random username (e.g. ``laiuhi3h4gi3u4hgt``) and check the results. +Alternatively, you can use `the Telegram bot `_. + +2. Open the account link in your browser and check: + +- If the site is completely gone, remove it from the list +- If the site still works but looks different, update how it is checked in ``data.json`` +- If the site requires login to view profiles, disable the check for it + +3. Find the site in the `data.json `_ file. 
+ +If the ``checkType`` method is not ``message`` and you are going to fix the check, set ``checkType`` to ``message``, then: + +- put in ``absenceStrs`` a keyword that is present in the HTML response for a non-existing account +- put in ``presenceStrs`` a keyword that is present in the HTML response for an existing account + +If you have trouble determining the right keywords, you can use automatic detection by passing the account URL with the ``--submit`` option: + +.. code-block:: console + + maigret --submit https://my.mail.ru/bk/alex + +To disable checking, set ``disabled`` to ``true`` or simply run: + +.. code-block:: console + + maigret --self-check --site My.Mail.ru@bk.ru + +To debug the check method using the response HTML, you can run: + +.. code-block:: console + + maigret soxoj --site My.Mail.ru@bk.ru -d 2> response.txt + +A few site options in ``data.json`` are helpful in various cases: + +- ``engine`` - a predefined check for sites of a certain type (e.g. forums); see the ``engines`` section in the JSON file +- ``headers`` - a dictionary of additional headers to be sent to the site +- ``requestHeadOnly`` - set to ``true`` if it's enough to make a HEAD request to the site +- ``regexCheck`` - a regex to check if the username is valid, in case of frequent false-positives + How to publish new version of Maigret ------------------------------------- @@ -98,4 +167,17 @@ PyPi package. - Press `+ Auto-generate release notes` - **Press "Publish release" button** -8. That's all, now you can simply wait push to PyPi. You can monitor it in Action page: https://github.com/soxoj/maigret/actions/workflows/python-publish.yml \ No newline at end of file +8. That's all, now you can simply wait push to PyPi. You can monitor it in Action page: https://github.com/soxoj/maigret/actions/workflows/python-publish.yml + +Documentation updates +--------------------- + +Documentation is auto-generated and auto-deployed from the ``docs`` directory. 
+ +To update the documentation manually: + +1. Change something in the ``.rst`` files in the ``docs/source`` directory. +2. Install the requirements with ``pip install -r requirements.txt`` in the ``docs`` directory. +3. Run ``make singlehtml`` in the terminal in the ``docs`` directory. +4. Open ``build/singlehtml/index.html`` in your browser to see the result. +5. If everything is OK, commit and push your changes to GitHub. diff --git a/docs/source/index.rst b/docs/source/index.rst index 85a5562..420483f 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -3,11 +3,12 @@ Welcome to the Maigret docs! ============================ -**Maigret** is an easy-to-use and powerful OSINT tool for collecting a dossier on a person by username only. +**Maigret** is an easy-to-use and powerful OSINT tool for collecting a dossier on a person by a username (alias) only. This is achieved by checking for accounts on a huge number of sites and gathering all the available information from web pages. -The project's main goal - give to OSINT researchers and pentesters a **universal tool** to get maximum information about a subject and integrate it with other tools in automatization pipelines. +The project's main goal is to give OSINT researchers and pentesters a **universal tool** to get maximum information +about a person of interest by username and to integrate it with other tools in automation pipelines. You may be interested in: ------------------------- diff --git a/docs/source/roadmap.rst b/docs/source/roadmap.rst index c1dca39..eb989dd 100644 --- a/docs/source/roadmap.rst +++ b/docs/source/roadmap.rst @@ -3,6 +3,8 @@ Roadmap ======= +**This roadmap is outdated and needs to be updated.** + .. 
figure:: https://i.imgur.com/kk8cFdR.png :target: https://i.imgur.com/kk8cFdR.png :align: center diff --git a/maigret/resources/data.json b/maigret/resources/data.json index a887115..4f85979 100644 --- a/maigret/resources/data.json +++ b/maigret/resources/data.json @@ -10609,9 +10609,11 @@ ], "type": "ok_id", "checkType": "message", + "presenceStrs": [ + "profile__menu" + ], "absenceStrs": [ - "/invite?mna=&mnb=", - "mm-error-404__head" + "mm-profile_not-found_content" ], "alexaRank": 49, "urlMain": "https://my.mail.ru/", @@ -10625,9 +10627,11 @@ ], "type": "vk_id", "checkType": "message", + "presenceStrs": [ + "profile__menu" + ], "absenceStrs": [ - "/invite?mna=&mnb=", - "mm-error-404__head" + "mm-profile_not-found_content" ], "alexaRank": 49, "urlMain": "https://my.mail.ru/", @@ -10640,9 +10644,11 @@ "ru" ], "checkType": "message", + "presenceStrs": [ + "profile__menu" + ], "absenceStrs": [ - "/invite?mna=&mnb=", - "mm-error-404__head" + "mm-profile_not-found_content" ], "alexaRank": 49, "urlMain": "https://my.mail.ru/", @@ -10655,9 +10661,11 @@ "ru" ], "checkType": "message", + "presenceStrs": [ + "profile__menu" + ], "absenceStrs": [ - "/invite?mna=&mnb=", - "mm-error-404__head" + "mm-profile_not-found_content" ], "alexaRank": 49, "urlMain": "https://my.mail.ru/", @@ -10670,9 +10678,11 @@ "ru" ], "checkType": "message", + "presenceStrs": [ + "profile__menu" + ], "absenceStrs": [ - "/invite?mna=&mnb=", - "mm-error-404__head" + "mm-profile_not-found_content" ], "alexaRank": 49, "urlMain": "https://my.mail.ru/", @@ -10685,9 +10695,11 @@ "ru" ], "checkType": "message", + "presenceStrs": [ + "profile__menu" + ], "absenceStrs": [ - "/invite?mna=&mnb=", - "mm-error-404__head" + "mm-profile_not-found_content" ], "alexaRank": 49, "urlMain": "https://my.mail.ru/", @@ -10700,9 +10712,11 @@ "ru" ], "checkType": "message", + "presenceStrs": [ + "profile__menu" + ], "absenceStrs": [ - "/invite?mna=&mnb=", - "mm-error-404__head" + "mm-profile_not-found_content" ], 
"alexaRank": 49, "urlMain": "https://my.mail.ru/", @@ -10715,9 +10729,11 @@ "ru" ], "checkType": "message", + "presenceStrs": [ + "profile__menu" + ], "absenceStrs": [ - "/invite?mna=&mnb=", - "mm-error-404__head" + "mm-profile_not-found_content" ], "alexaRank": 49, "urlMain": "https://my.mail.ru/", @@ -34975,4 +34991,4 @@ "crypto", "ai" ] -} +} \ No newline at end of file diff --git a/maigret/resources/simple_report.tpl b/maigret/resources/simple_report.tpl index 7c3c48e..c2e3322 100644 --- a/maigret/resources/simple_report.tpl +++ b/maigret/resources/simple_report.tpl @@ -68,7 +68,6 @@
- Invalid? Photo

diff --git a/maigret/resources/simple_report_pdf.tpl b/maigret/resources/simple_report_pdf.tpl index 6a532aa..f3db395 100644 --- a/maigret/resources/simple_report_pdf.tpl +++ b/maigret/resources/simple_report_pdf.tpl @@ -64,7 +64,6 @@
- Invalid?
diff --git a/sites.md b/sites.md index 6f5966b..ade310a 100644 --- a/sites.md +++ b/sites.md @@ -1,5 +1,5 @@ -## List of supported sites (search methods): total 3104 +## List of supported sites (search methods): total 3103 Rank data fetched from Alexa by domains. @@ -1367,7 +1367,6 @@ Rank data fetched from Alexa by domains. 1. ![](https://www.google.com/s2/favicons?domain=https://www.w7forums.com) [W7forums (https://www.w7forums.com)](https://www.w7forums.com)*: top 10M, forum* 1. ![](https://www.google.com/s2/favicons?domain=http://kik.me/) [Kik (http://kik.me/)](http://kik.me/)*: top 10M, us* 1. ![](https://www.google.com/s2/favicons?domain=https://www.arrse.co.uk/) [Arrse (https://www.arrse.co.uk/)](https://www.arrse.co.uk/)*: top 10M, ca, forum, gb, in, pk* -1. ![](https://www.google.com/s2/favicons?domain=https://www.alik.cz/) [Alik.cz (https://www.alik.cz/)](https://www.alik.cz/)*: top 10M, cz* 1. ![](https://www.google.com/s2/favicons?domain=http://profile.hatena.com) [Hatena (http://profile.hatena.com)](http://profile.hatena.com)*: top 10M, bookmarks, jp* 1. ![](https://www.google.com/s2/favicons?domain=https://forum.shopsmith.com) [forum.shopsmith.com (https://forum.shopsmith.com)](https://forum.shopsmith.com)*: top 10M, forum, pk* 1. ![](https://www.google.com/s2/favicons?domain=http://southparkz.net) [southparkz.net (http://southparkz.net)](http://southparkz.net)*: top 10M, ru* @@ -3108,16 +3107,16 @@ Rank data fetched from Alexa by domains. 1. ![](https://www.google.com/s2/favicons?domain=https://ngl.link) [ngl.link (https://ngl.link)](https://ngl.link)*: top 100M, q&a* 1. 
![](https://www.google.com/s2/favicons?domain=https://bitpapa.com) [bitpapa.com (https://bitpapa.com)](https://bitpapa.com)*: top 100M, crypto* -The list was updated at (2024-05-23 19:05:28.745384+00:00 UTC) +The list was updated at (2024-11-23 17:40:24.439883+00:00 UTC) ## Statistics -Enabled/total sites: 2770/3104 = 89.24% +Enabled/total sites: 2769/3103 = 89.24% -Incomplete message checks: 424/2770 = 15.31% (false positive risks) +Incomplete message checks: 424/2769 = 15.31% (false positive risks) -Status code checks: 721/2770 = 26.03% (false positive risks) +Status code checks: 720/2769 = 26.0% (false positive risks) -False positive risk (total): 41.34% +False positive risk (total): 41.31% Top 20 profile URLs: - (796) `{urlMain}/index/8-0-{username} (uCoz)` @@ -3127,7 +3126,7 @@ Top 20 profile URLs: - (133) `{urlMain}{urlSubpath}/member.php?username={username} (vBulletin)` - (127) `{urlMain}{urlSubpath}/search.php?author={username} (phpBB/Search)` - (117) `/profile/{username}` -- (109) `/u/{username}` +- (108) `/u/{username}` - (88) `/users/{username}` - (87) `{urlMain}/u/{username}/summary (Discourse)` - (54) `/wiki/User:{username}`
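
The `message` check rule documented in the `development.rst` changes of this diff can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code in `checking.py`: the function name `message_check` and the sample HTML snippets are hypothetical, while the marker strings are the ones this diff sets for the My.Mail.ru entries in `data.json`.

```python
from typing import Iterable


def message_check(html: str,
                  presence_strs: Iterable[str],
                  absence_strs: Iterable[str]) -> bool:
    """Return True if the page looks like an existing account.

    Hypothetical helper: it mirrors the documented rule for
    checkType "message", not Maigret's actual implementation.
    """
    # Any absence marker means the site reported "no such profile".
    if any(marker in html for marker in absence_strs):
        return False
    # At least one presence marker must appear in the response.
    return any(marker in html for marker in presence_strs)


# Marker strings taken from the data.json hunk; the HTML is made up.
PRESENCE = ["profile__menu"]
ABSENCE = ["mm-profile_not-found_content"]

existing_page = '<div class="profile__menu">...</div>'
missing_page = '<div class="mm-profile_not-found_content">...</div>'

print(message_check(existing_page, PRESENCE, ABSENCE))
print(message_check(missing_page, PRESENCE, ABSENCE))
```

Checking the absence markers first means a page that contains both kinds of marker is treated as "not found". With an empty `presence_strs` this sketch always returns `False`; how Maigret itself treats such incomplete entries is not shown in this diff.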