Compare commits


2 Commits

Author SHA1 Message Date
dezort12 b62aec4882 disable-sites 2023-01-31 13:18:56 -05:00
dezort12 4062dab288 disable-donationalerts 2023-01-27 14:30:40 -05:00
60 changed files with 1100 additions and 5898 deletions
-2
@@ -1,5 +1,3 @@
# These are supported funding model platforms
patreon: soxoj
github: soxoj
buy_me_a_coffee: soxoj
+3 -3
@@ -2,7 +2,7 @@ name: Package exe with PyInstaller - Windows
on:
push:
branches: [ main, dev ]
branches: [ main ]
jobs:
build:
@@ -10,13 +10,13 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v2
- name: PyInstaller Windows
uses: JackMcKew/pyinstaller-action-windows@main
with:
path: pyinstaller
- uses: actions/upload-artifact@v4
- uses: actions/upload-artifact@v2
with:
name: maigret_standalone_win32
path: pyinstaller/dist/windows # or path/to/artifact
+4 -4
@@ -13,7 +13,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.10", "3.11", "3.12"]
python-version: [3.7, 3.8, 3.9]
steps:
- uses: actions/checkout@v2
@@ -24,8 +24,8 @@ jobs:
- name: Install dependencies
run: |
python -m pip install --upgrade pip
python -m pip install poetry
python -m poetry install --with dev
python -m pip install -r test-requirements.txt
if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
- name: Test with pytest
run: |
poetry run pytest --reruns 3 --reruns-delay 5
pytest --reruns 3 --reruns-delay 5
-5
@@ -1,6 +1,5 @@
# Virtual Environment
venv/
.venv/
# Editor Configurations
.vscode/
@@ -39,7 +38,3 @@ htmlcov/
# Maigret files
settings.json
# other
*.egg-info
build
-16
@@ -1,16 +0,0 @@
version: 2
build:
os: ubuntu-22.04
tools:
python: "3.10"
sphinx:
configuration: docs/source/conf.py
formats:
- pdf
python:
install:
- requirements: docs/requirements.txt
+1 -24
@@ -2,10 +2,6 @@
Hey! I'm really glad you're reading this. Maigret contains a lot of sites, and it is very hard to keep all the sites operational. That's why any fix is important.
## Code of Conduct
Please read and follow the [Code of Conduct](CODE_OF_CONDUCT.md) to foster a welcoming and inclusive community.
## How to add a new site
#### Beginner level
@@ -31,23 +27,4 @@ Always write a clear log message for your commits. One-line messages are fine fo
## Coding conventions
### General Guidelines
- Try to follow [PEP 8](https://www.python.org/dev/peps/pep-0008/) for Python code style.
- Ensure your code passes all tests before submitting a pull request.
### Code Style
- **Indentation**: Use 4 spaces per indentation level.
- **Imports**:
- Standard library imports should be placed at the top.
- Third-party imports should follow.
- Group imports logically.
### Naming Conventions
- **Variables and Functions**: Use `snake_case`.
- **Classes**: Use `CamelCase`.
- **Constants**: Use `UPPER_CASE`.
Start reading the code and you'll get the hang of it. ;)
Start reading the code and you'll get the hang of it. ;)
+1 -1
@@ -1,4 +1,4 @@
FROM python:3.10-slim
FROM python:3.9-slim
LABEL maintainer="Soxoj <soxoj@protonmail.com>"
WORKDIR /app
RUN pip install --no-cache-dir --upgrade pip
-128
@@ -1,128 +0,0 @@
@echo off
REM check if running as admin
goto check_Permissions
:check_Permissions
echo Administrative permissions required. Detecting permissions...
net session >nul 2>&1
if %errorLevel% == 0 (
goto 1
) else (
cls
echo Failure: You MUST run this as administrator, otherwise commands will fail.
)
pause >nul
REM Step 2: Check if Python and pip3 are installed
python --version >nul 2>&1
if %errorlevel% neq 0 (
echo Python is not installed. Please install Python 3.8 or higher.
pause
exit /b
)
pip3 --version >nul 2>&1
if %errorlevel% neq 0 (
echo pip3 is not installed. Please install pip3.
pause
exit /b
)
REM Step 3: Check Python version
python -c "import sys; exit(0) if sys.version_info >= (3,8) else exit(1)"
if %errorlevel% neq 0 (
echo Python version 3.8 or higher is required.
pause
exit /b
)
:1
cls
:::===============================================================
::: ______ __ __ _ _
::: | ____| | \/ | (_) | |
::: | |__ __ _ ___ _ _ | \ / | __ _ _ __ _ _ __ ___| |_
::: | __| / _` / __| | | | | |\/| |/ _` | |/ _` | '__/ _ \ __|
::: | |___| (_| \__ \ |_| | | | | | (_| | | (_| | | | __/ |_
::: |______\__,_|___/\__, | |_| |_|\__,_|_|\__, |_| \___|\__|
::: __/ | __/ |
::: |___/ |___/
:::
:::===============================================================
echo.
for /f "delims=: tokens=*" %%A in ('findstr /b ::: "%~f0"') do @echo(%%A
echo.
echo ----------------------------------------------------------------
echo Python 3.8 or higher and pip3 required.
echo ----------------------------------------------------------------
echo Press [I] to begin installation.
echo Press [R] If already installed.
echo ----------------------------------------------------------------
choice /c IR
if %errorlevel%==1 goto install1
if %errorlevel%==2 goto after
:install1
cls
echo ========================================================
echo Maigret Installation Script
echo ========================================================
echo.
echo --------------------------------------------------------
echo If your pip installation is outdated, it could cause
echo cryptography to fail on installation.
echo --------------------------------------------------------
echo check for and install pip updates now?
echo --------------------------------------------------------
choice /c YN
if %errorlevel%==1 goto install2
if %errorlevel%==2 goto install3
:install2
cls
python -m pip install --upgrade pip
goto:install3
:install3
cls
echo ========================================================
echo Maigret Installation Script
echo ========================================================
echo.
echo --------------------------------------------------------
echo Install requirements and maigret?
echo --------------------------------------------------------
choice /c YN
if %errorlevel%==1 goto install4
if %errorlevel%==2 goto 1
:install4
cls
pip install .
pip install maigret
goto:after
:after
cls
echo ========================================================
echo Maigret Background Search
echo ========================================================
echo.
echo --------------------------------------------------------
echo Please Enter Username / Email
echo --------------------------------------------------------
set /p input=
maigret %input%
echo.
echo.
echo.
echo.
pause
goto:after
+2 -2
@@ -10,10 +10,10 @@ rerun-tests:
lint:
@echo 'syntax errors or undefined names'
flake8 --count --select=E9,F63,F7,F82 --show-source --statistics ${LINT_FILES}
flake8 --count --select=E9,F63,F7,F82 --show-source --statistics ${LINT_FILES} maigret.py
@echo 'warning'
flake8 --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics --ignore=E731,W503,E501 ${LINT_FILES}
flake8 --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics --ignore=E731,W503,E501 ${LINT_FILES} maigret.py
@echo 'mypy'
mypy ${LINT_FILES}
+17 -41
@@ -3,35 +3,27 @@
<p align="center">
<p align="center">
<a href="https://pypi.org/project/maigret/">
<img alt="PyPI version badge for Maigret" src="https://img.shields.io/pypi/v/maigret?style=flat-square" />
<img alt="PyPI" src="https://img.shields.io/pypi/v/maigret?style=flat-square">
</a>
<a href="https://pypi.org/project/maigret/">
<img alt="PyPI download count for Maigret" src="https://img.shields.io/pypi/dw/maigret?style=flat-square" />
<a href="https://pypi.org/project/maigret/">
<img alt="PyPI - Downloads" src="https://img.shields.io/pypi/dw/maigret?style=flat-square">
</a>
<a href="https://github.com/soxoj/maigret">
<img alt="Minimum Python version required: 3.10+" src="https://img.shields.io/badge/Python-3.10%2B-brightgreen?style=flat-square" />
</a>
<a href="https://github.com/soxoj/maigret/blob/main/LICENSE">
<img alt="License badge for Maigret" src="https://img.shields.io/github/license/soxoj/maigret?style=flat-square" />
</a>
<a href="https://github.com/soxoj/maigret">
<img alt="View count for Maigret project" src="https://komarev.com/ghpvc/?username=maigret&color=brightgreen&label=views&style=flat-square" />
<a href="https://pypi.org/project/maigret/">
<img alt="Views" src="https://komarev.com/ghpvc/?username=maigret&color=brightgreen&label=views&style=flat-square">
</a>
</p>
<p align="center">
<img src="https://raw.githubusercontent.com/soxoj/maigret/main/static/maigret.png" height="300"/>
<img src="https://raw.githubusercontent.com/soxoj/maigret/main/static/maigret.png" height="200"/>
</p>
</p>
<i>The Commissioner Jules Maigret is a fictional French police detective, created by Georges Simenon. His investigation method is based on understanding the personality of different people and their interactions.</i>
<b>👉👉👉 [Online Telegram bot](https://t.me/osint_maigret_bot)</b>
## About
**Maigret** collects a dossier on a person **by username only**, checking for accounts on a huge number of sites and gathering all the available information from web pages. No API keys required. Maigret is an easy-to-use and powerful fork of [Sherlock](https://github.com/sherlock-project/sherlock).
More than 3000 sites are currently supported ([full list](https://github.com/soxoj/maigret/blob/main/sites.md)); by default, the search is launched against the top 500 sites in descending order of popularity. Checking of Tor sites, I2P sites, and domains (via DNS resolving) is also supported.
More than 2500 sites are currently supported ([full list](https://github.com/soxoj/maigret/blob/main/sites.md)); by default, the search is launched against the top 500 sites in descending order of popularity. Checking of Tor sites, I2P sites, and domains (via DNS resolving) is also supported.
## Main features
@@ -45,13 +37,11 @@ See full description of Maigret features [in the documentation](https://maigret.
## Installation
‼️ Maigret is available online via [official Telegram bot](https://t.me/osint_maigret_bot).
Maigret can be installed using pip or Docker, or simply launched from the cloned repo.
Standalone EXE-binaries for Windows are located in [Releases section](https://github.com/soxoj/maigret/releases) of GitHub repository.
Also, you can run Maigret using cloud shells and Jupyter notebooks (see buttons below).
Also you can run Maigret using cloud shells and Jupyter notebooks (see buttons below).
[![Open in Cloud Shell](https://user-images.githubusercontent.com/27065646/92304704-8d146d80-ef80-11ea-8c29-0deaabb1c702.png)](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/soxoj/maigret&tutorial=README.md)
<a href="https://repl.it/github/soxoj/maigret"><img src="https://replit.com/badge/github/soxoj/maigret" alt="Run on Replit" height="50"></a>
@@ -61,7 +51,7 @@ Also, you can run Maigret using cloud shells and Jupyter notebooks (see buttons
### Package installing
**NOTE**: Python 3.10 or higher and pip are required; **Python 3.11 is recommended.**
**NOTE**: Python 3.7 or higher and pip are required; **Python 3.8 is recommended.**
```bash
# install from pypi
@@ -76,12 +66,10 @@ maigret username
```bash
# or clone and install manually
git clone https://github.com/soxoj/maigret && cd maigret
# build and install
pip3 install .
pip3 install -r requirements.txt
# usage
maigret username
./maigret.py username
```
### Docker
@@ -100,17 +88,12 @@ docker build -t maigret .
## Usage examples
```bash
# make HTML, PDF, and Xmind8 reports
maigret user --html
maigret user --pdf
maigret user --xmind  # output is not compatible with XMind 2022+
# make HTML and PDF reports
maigret user --html --pdf
# search on sites marked with tags photo & dating
maigret user --tags photo,dating
# search on sites marked with tag us
maigret user --tags us
# search for three usernames on all available sites
maigret user1 user2 user3 -a
```
@@ -120,29 +103,22 @@ Use `maigret --help` to get full options description. Also options [are document
## Contributing
Maigret is open-source, so you may contribute your own sites by adding them to the `data.json` file, or contribute changes to its code!
For more information about development and contribution, please read the [development documentation](https://maigret.readthedocs.io/en/latest/development.html).
If you want to contribute, don't forget to activate the statistics update hook with this command: `git config --local core.hooksPath .githooks/`
Make your git commits from inside your maigret repository folder; otherwise, the hook won't find the statistics update script.
## Demo with page parsing and recursive username search
[PDF report](https://raw.githubusercontent.com/soxoj/maigret/main/static/report_alexaimephotographycars.pdf), [HTML report](https://htmlpreview.github.io/?https://raw.githubusercontent.com/soxoj/maigret/main/static/report_alexaimephotographycars.html)
![animation of recursive search](https://raw.githubusercontent.com/soxoj/maigret/main/static/recursive_search.gif)
![animation of recursive search](https://raw.githubusercontent.com/soxoj/maigret/main/static/recursive_search.svg)
![HTML report screenshot](https://raw.githubusercontent.com/soxoj/maigret/main/static/report_alexaimephotography_html_screenshot.png)
![XMind 8 report screenshot](https://raw.githubusercontent.com/soxoj/maigret/main/static/report_alexaimephotography_xmind_screenshot.png)
[Full console output](https://raw.githubusercontent.com/soxoj/maigret/main/static/recursive_search.md)
### SOWEL classification
This tool uses the following OSINT techniques:
- [SOTL-2.2. Search For Accounts On Other Platforms](https://sowel.soxoj.com/other-platform-accounts)
- [SOTL-6.1. Check Logins Reuse To Find Another Account](https://sowel.soxoj.com/logins-reuse)
- [SOTL-6.2. Check Nicknames Reuse To Find Another Account](https://sowel.soxoj.com/nicknames-reuse)
## License
MIT © [Maigret](https://github.com/soxoj/maigret)<br/>
Executable
+18
@@ -0,0 +1,18 @@
#!/usr/bin/env python3
import asyncio
import sys
from maigret.maigret import main
def run():
try:
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
except KeyboardInterrupt:
print('Maigret is interrupted.')
sys.exit(1)
if __name__ == "__main__":
run()
+1 -1
@@ -10,4 +10,4 @@
pixabay.com FALSE / FALSE 0 anonymous_user_id c1e4ee09-5674-4252-aa94-8c47b1ea80ab
pixabay.com FALSE / FALSE 1647214439 csrftoken vfetTSvIul7gBlURt6s985JNM18GCdEwN5MWMKqX4yI73xoPgEj42dbNefjGx5fr
pixabay.com FALSE / FALSE 1647300839 client_width 1680
pixabay.com FALSE / FALSE 748111764839 is_human 1
pixabay.com FALSE / FALSE 748111764839 is_human 1
-1
@@ -1,2 +1 @@
sphinx-copybutton
sphinx_rtd_theme
+3 -3
@@ -18,7 +18,7 @@ Parsing of account pages and online documents
Maigret will try to extract information about the document/account owner
(including username and other ids) and will make a search by the
extracted username and ids. See examples :doc:`in the separate section <extracting-information-from-pages>`.
extracted username and ids. :doc:`Examples <extracting-information-from-pages>`.
Main options
------------
@@ -28,8 +28,8 @@ Options are also configurable through settings files, see
``--tags`` - Filter sites for searching by tags: site categories and
two-letter country codes (**not a language!**). E.g. photo, dating, sport; jp, us, global.
Multiple tags can be associated with one site. **Warning**: tags markup is
not stable now. Read more :doc:`in the separate section <tags>`.
Multiple tags can be associated with one site. **Warning: tags markup is
not stable now.**
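A minimal Python sketch of the tag filter described above. It is an illustration under assumptions, not Maigret's code: the ``SITES`` mapping and its tags are invented, and it assumes a site is selected when it carries *any* of the comma-separated tags passed to ``--tags``.

```python
from typing import Dict, List

# Hypothetical mini database: site name -> list of tags.
SITES: Dict[str, List[str]] = {
    "Flickr": ["photo", "us"],
    "Tinder": ["dating"],
    "GitHub": ["coding", "global"],
}


def parse_tags(raw: str) -> List[str]:
    """Split a comma-separated CLI value such as 'photo,dating'."""
    return [tag.strip().lower() for tag in raw.split(",") if tag.strip()]


def filter_sites_by_tags(sites: Dict[str, List[str]], raw_tags: str) -> List[str]:
    """Return names of sites carrying at least one requested tag (assumed OR semantics)."""
    wanted = set(parse_tags(raw_tags))
    return [name for name, tags in sites.items() if wanted & {t.lower() for t in tags}]


print(filter_sites_by_tags(SITES, "photo,dating"))
```

Matching is case-insensitive in this sketch so that ``US`` and ``us`` select the same sites; whether Maigret does the same is not stated here.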
``-n``, ``--max-connections`` - Allowed number of concurrent connections
**(default: 100)**.
+5 -91
@@ -3,31 +3,10 @@
Development
==============
Frequently Asked Questions
--------------------------
1. Where to find the list of supported sites?
The human-readable list of supported sites is available in the `sites.md <https://github.com/soxoj/maigret/blob/main/sites.md>`_ file in the repository.
It's been generated automatically from the main JSON file with the list of supported sites.
The machine-readable JSON file with the list of supported sites is available in the
`data.json <https://github.com/soxoj/maigret/blob/main/maigret/resources/data.json>`_ file in the directory `resources`.
2. Which methods to check the account presence are supported?
The supported methods (``checkType`` values in ``data.json``) are:
- ``message`` - the most reliable method, checks if any string from ``presenceStrs`` is present and none of the strings from ``absenceStrs`` are present in the HTML response
- ``status_code`` - checks that the status code of the response is 2XX
- ``response_url`` - checks that there is no redirect and the response is 2XX
See the details of check mechanisms in the `checking.py <https://github.com/soxoj/maigret/blob/main/maigret/checking.py#L339>`_ file.
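The three ``checkType`` methods listed above can be sketched as a single predicate. This is a simplified illustration, not the code in ``checking.py``: the ``Response`` container and the ``account_exists`` helper are hypothetical, and treating an empty ``presenceStrs`` list as "present" is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Response:
    """Hypothetical container for the parts of a response the checks use."""
    status_code: int
    html: str
    requested_url: str
    final_url: str  # URL after any redirects were followed


def account_exists(check_type: str, resp: Response,
                   presence_strs: List[str], absence_strs: List[str]) -> bool:
    is_2xx = 200 <= resp.status_code < 300
    if check_type == "message":
        # Assumption: an empty presenceStrs list does not block a positive result.
        present = not presence_strs or any(s in resp.html for s in presence_strs)
        absent = any(s in resp.html for s in absence_strs)
        return present and not absent
    if check_type == "status_code":
        return is_2xx
    if check_type == "response_url":
        # No redirect happened and the final response is 2XX.
        return resp.final_url == resp.requested_url and is_2xx
    raise ValueError(f"unknown checkType: {check_type!r}")


found = Response(200, "<p>Joined on 2020</p>", "https://example.com/alex", "https://example.com/alex")
print(account_exists("message", found, ["Joined on"], ["User not found"]))
```

This shows why ``message`` is the most reliable method: a site that answers 200 with a "User not found" page still passes ``status_code`` but fails ``message``.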
Testing
-------
It is recommended to use Python 3.10 for testing.
It is recommended to use Python 3.7/3.8 for testing due to some conflicts in 3.9.
Install test requirements:
@@ -41,72 +20,20 @@ Use the following commands to check Maigret:
.. code-block:: console
# run linter and typing checks
# order of checks:
# order of checks%
# - critical syntax errors or undefined names
# - flake checks
# - mypy checks
make lint
# run testing with coverage html report
# current test coverage is 58%
make test
# current test coverage is 60%
make text
# open html report
open htmlcov/index.html
How to fix false-positives
-----------------------------------------------
If you want to work with the sites database, don't forget to activate the statistics update git hook with this command: ``git config --local core.hooksPath .githooks/``.
Make your git commits from inside your maigret repository folder; otherwise, the hook won't find the statistics update script.
1. Determine the problematic site.
If you already know which site has a false-positive and want to fix it specifically, go to the next step.
Otherwise, simply run a search with a random username (e.g. ``laiuhi3h4gi3u4hgt``) and check the results.
Alternatively, you can use `the Telegram bot <https://t.me/osint_maigret_bot>`_.
2. Open the account link in your browser and check:
- If the site is completely gone, remove it from the list
- If the site still works but looks different, update how it is checked in data.json
- If the site requires login to view profiles, disable checking it
3. Find the site in the `data.json <https://github.com/soxoj/maigret/blob/main/maigret/resources/data.json>`_ file.
If the ``checkType`` method is not ``message`` and you are going to fix the check, update it:
- put ``message`` in ``checkType``
- put in ``absenceStrs`` a keyword that is present in the HTML response for a non-existing account
- put in ``presenceStrs`` a keyword that is present in the HTML response for an existing account
If you have trouble determining the right keywords, you can use automatic detection by passing the account URL with the ``--submit`` option:
.. code-block:: console
maigret --submit https://my.mail.ru/bk/alex
To disable checking, set ``disabled`` to ``true`` or simply run:
.. code-block:: console
maigret --self-check --site My.Mail.ru@bk.ru
To debug the check method using the response HTML, you can run:
.. code-block:: console
maigret soxoj --site My.Mail.ru@bk.ru -d 2> response.txt
There are a few other site options in data.json that are helpful in various cases:
- ``engine`` - a predefined check for the sites of certain type (e.g. forums), see the ``engines`` section in the JSON file
- ``headers`` - a dictionary of additional headers to be sent to the site
- ``requestHeadOnly`` - set to ``true`` if it's enough to make a HEAD request to the site
- ``regexCheck`` - a regex to check if the username is valid, in case of frequent false-positives
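A sketch of the ``data.json`` edits from step 3, expressed as Python dict manipulation. The helpers are hypothetical, the ``url`` key with a ``{username}`` placeholder is an assumed entry shape, and applying ``regexCheck`` with ``re.search`` to skip invalid usernames is an assumption about how the option is used.

```python
import json
import re
from typing import Optional


def switch_to_message_check(site: dict, presence: str, absence: str) -> dict:
    """Return a copy of a site entry that uses the ``message`` check."""
    updated = dict(site)  # shallow copy so the original entry is untouched
    updated["checkType"] = "message"
    updated["presenceStrs"] = [presence]
    updated["absenceStrs"] = [absence]
    return updated


def disable_site(site: dict) -> dict:
    """Return a copy with ``disabled`` set, as when login is required."""
    return {**site, "disabled": True}


def username_allowed(site: dict, username: str) -> bool:
    """Assumed semantics: skip the site when ``regexCheck`` does not match."""
    pattern: Optional[str] = site.get("regexCheck")
    return pattern is None or re.search(pattern, username) is not None


# Hypothetical entry; the "url" key and "{username}" placeholder are assumptions.
site = {
    "url": "https://example.com/users/{username}",
    "checkType": "status_code",
    "regexCheck": r"^[a-zA-Z0-9_]{3,20}$",
}
fixed = switch_to_message_check(site, "Joined on", "User not found")
print(json.dumps(fixed, indent=2))
print(username_allowed(fixed, "soxoj"), username_allowed(fixed, "a.b"))
```

Serializing with ``json.dumps`` turns Python ``True`` into the JSON ``true`` that the ``disabled`` field expects.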
How to publish new version of Maigret
-------------------------------------
@@ -171,17 +98,4 @@ PyPi package.
- Press `+ Auto-generate release notes`
- **Press "Publish release" button**
8. That's all; now simply wait for the package to be pushed to PyPI. You can monitor it on the Actions page: https://github.com/soxoj/maigret/actions/workflows/python-publish.yml
Documentation updates
---------------------
Documentation is auto-generated and auto-deployed from the ``docs`` directory.
To manually update documentation:
1. Change something in the ``.rst`` files in the ``docs/source`` directory.
2. Run ``pip install -r requirements.txt`` in the docs directory.
3. Run ``make singlehtml`` in the terminal in the docs directory.
4. Open ``build/singlehtml/index.html`` in your browser to see the result.
5. If everything is ok, commit and push your changes to GitHub.
8. That's all; now simply wait for the package to be pushed to PyPI. You can monitor it on the Actions page: https://github.com/soxoj/maigret/actions/workflows/python-publish.yml
+3 -86
@@ -14,96 +14,14 @@ Also, Maigret use found ids and usernames from links to start a recursive search
Enabled by default, can be disabled with ``--no-extracting``.
.. code-block:: text
$ python3 -m maigret soxoj --timeout 5
[-] Starting a search on top 500 sites from the Maigret database...
[!] You can run search by full list of sites with flag `-a`
[*] Checking username soxoj on:
...
[+] GitHub: https://github.com/soxoj
├─uid: 31013580
├─image: https://avatars.githubusercontent.com/u/31013580?v=4
├─created_at: 2017-08-14T17:03:07Z
├─location: Amsterdam, Netherlands
├─follower_count: 1304
├─following_count: 54
├─fullname: Soxoj
├─public_gists_count: 3
├─public_repos_count: 88
├─twitter_username: sox0j
├─bio: Head of OSINT Center of Excellence in @SocialLinks-IO
├─is_company: Social Links
└─blog_url: soxoj.com
...
Recursive search
----------------
Maigret has the ability to scan account pages for :ref:`common identifiers <supported-identifier-types>` and usernames found in links.
When people include links to their other social media accounts, Maigret can automatically detect and initiate new searches for those profiles.
Any information discovered through this process will be shown in both the command-line interface output and generated reports.
Maigret can extract some :ref:`common ids <supported-identifier-types>` and usernames from links on the account page (people often place links to their other accounts there) and immediately start new searches. All the gathered information will be displayed in CLI output and reports.
Enabled by default, can be disabled with ``--no-recursion``.
.. code-block:: text
$ python3 -m maigret soxoj --timeout 5
[-] Starting a search on top 500 sites from the Maigret database...
[!] You can run search by full list of sites with flag `-a`
[*] Checking username soxoj on:
...
[+] GitHub: https://github.com/soxoj
├─uid: 31013580
├─image: https://avatars.githubusercontent.com/u/31013580?v=4
├─created_at: 2017-08-14T17:03:07Z
├─location: Amsterdam, Netherlands
├─follower_count: 1304
├─following_count: 54
├─fullname: Soxoj
├─public_gists_count: 3
├─public_repos_count: 88
├─twitter_username: sox0j <===== another username found here
├─bio: Head of OSINT Center of Excellence in @SocialLinks-IO
├─is_company: Social Links
└─blog_url: soxoj.com
...
Searching |████████████████████████████████████████| 500/500 [100%] in 9.1s (54.85/s)
[-] You can see detailed site check errors with a flag `--print-errors`
[*] Checking username sox0j on:
[+] Telegram: https://t.me/sox0j
├─fullname: @Sox0j
...
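The recursive search described above boils down to a breadth-first walk over usernames found in extracted fields. This sketch is hypothetical: ``search`` stands in for a real Maigret run, ``USERNAME_FIELDS`` is an assumed list of field names (``twitter_username`` is taken from the output above), and the depth limit is an illustrative safeguard.

```python
from collections import deque
from typing import Callable, Dict, List

# Assumed names of extracted fields that hold another username.
USERNAME_FIELDS = ("username", "twitter_username")


def recursive_search(
    start: str,
    search: Callable[[str], Dict[str, str]],
    max_depth: int = 2,
) -> List[str]:
    """Breadth-first search over usernames discovered in extracted fields."""
    seen = {start}
    checked: List[str] = []
    queue = deque([(start, 0)])
    while queue:
        username, depth = queue.popleft()
        checked.append(username)
        fields = search(username)
        if depth >= max_depth:
            continue  # do not follow new usernames past the depth limit
        for key in USERNAME_FIELDS:
            found = fields.get(key)
            if found and found not in seen:
                seen.add(found)
                queue.append((found, depth + 1))
    return checked


# Fake search results standing in for real site checks.
FAKE_RESULTS = {
    "soxoj": {"fullname": "Soxoj", "twitter_username": "sox0j"},
    "sox0j": {"fullname": "@Sox0j"},
}
print(recursive_search("soxoj", lambda u: FAKE_RESULTS.get(u, {})))
```

The ``seen`` set keeps two accounts that link to each other from being searched forever.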
Username permutations
---------------------
Maigret can generate permutations of usernames. Just pass a few usernames in the CLI and use the ``--permute`` flag.
Thanks to `@balestek <https://github.com/balestek>`_ for the idea and implementation.
.. code-block:: text
$ python3 -m maigret --permute hope dream --timeout 5
[-] 12 permutations from hope dream to check...
├─ hopedream
├─ _hopedream
├─ hopedream_
├─ hope_dream
├─ hope-dream
├─ hope.dream
├─ dreamhope
├─ _dreamhope
├─ dreamhope_
├─ dream_hope
├─ dream-hope
└─ dream.hope
[-] Starting a search on top 500 sites from the Maigret database...
[!] You can run search by full list of sites with flag `-a`
[*] Checking username hopedream on:
...
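The 12 permutations in the output above can be reproduced with ``itertools.permutations``. This is a sketch that matches the two-username example only; ``permute_usernames`` is a hypothetical helper, and Maigret's actual implementation (and its output for three or more usernames) may differ.

```python
from itertools import permutations
from typing import List

SEPARATORS = ("_", "-", ".")


def permute_usernames(*parts: str) -> List[str]:
    """Join every ordering of the parts plainly, with _ affixes, and with separators."""
    results: List[str] = []
    for order in permutations(parts):
        joined = "".join(order)
        results += [joined, f"_{joined}", f"{joined}_"]
        results += [sep.join(order) for sep in SEPARATORS]
    return results


for candidate in permute_usernames("hope", "dream"):
    print(candidate)
```

Each ordering yields six variants, so two usernames give 2 × 6 = 12 candidates, the same count reported by ``--permute hope dream``.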
Reports
Reports
-------
Maigret currently supports HTML, PDF, TXT, XMind 8 mindmap, and JSON reports.
@@ -116,8 +34,7 @@ HTML/PDF reports contain:
Also, there is a short text report in the CLI output after the end of the search phase.
.. warning::
XMind 8 mindmaps are incompatible with XMind 2022!
**Warning**: XMind 8 mindmaps are incompatible with XMind 2022!
Tags
----
+5 -6
@@ -3,12 +3,11 @@
Welcome to the Maigret docs!
============================
**Maigret** is an easy-to-use and powerful OSINT tool for collecting a dossier on a person by a username (alias) only.
**Maigret** is an easy-to-use and powerful OSINT tool for collecting a dossier on a person by username only.
This is achieved by checking for accounts on a huge number of sites and gathering all the available information from web pages.
The project's main goal is to give OSINT researchers and pentesters a **universal tool** to get maximum information
about a person of interest by a username and to integrate it with other tools in automation pipelines.
The project's main goal is to give OSINT researchers and pentesters a **universal tool** to get maximum information about a subject and to integrate it with other tools in automation pipelines.
You may be interested in:
-------------------------
@@ -21,12 +20,12 @@ You may be interested in:
:caption: Sections
command-line-options
usage-examples
extracting-information-from-pages
features
philosophy
extracting-information-from-pages
roadmap
supported-identifier-types
tags
usage-examples
settings
development
roadmap
+1 -1
@@ -5,7 +5,7 @@ Philosophy
TL;DR: Username => Dossier
Maigret is designed to gather all the available information about person by his username.
Maigret is designed to gather all the available information about person by his usernname.
What kind of information is this? First, links to the person's accounts. Second, all the machine-extractable
pieces of info, such as: other usernames, full name, URLs to people's images, birthday, location (country,
-3
@@ -3,9 +3,6 @@
Roadmap
=======
.. warning::
This roadmap is outdated and needs to be updated.
.. figure:: https://i.imgur.com/kk8cFdR.png
:target: https://i.imgur.com/kk8cFdR.png
:align: center
+1 -2
@@ -5,8 +5,7 @@ Tags
The use of tags allows you to select a subset of sites from the big Maigret DB for a search.
.. warning::
Tags markup is still not stable.
**Warning: tags markup is not stable now.**
There are several types of tags:
+65 -40
@@ -1,43 +1,68 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "8v6PEfyXb0Gx"
},
"outputs": [],
"source": [
"# clone the repo\n",
"!git clone https://github.com/soxoj/maigret\n",
"!pip3 install -r maigret/requirements.txt"
]
},
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "acxNWJOUmLc4"
},
"outputs": [],
"source": [
"!git clone https://github.com/soxoj/maigret\n",
"!pip3 install ./maigret/\n",
"from IPython.display import clear_output\n",
"clear_output()\n",
"username = str(input(\"Username >> \"))\n",
"!maigret {username} -a -n 10"
]
},
{
"cell_type": "code",
"source": [],
"metadata": {
"id": "S3SmapMHmOoD"
},
"execution_count": null,
"outputs": []
}
]
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "cXOQUAhDchkl"
},
"outputs": [],
"source": [
"# help\n",
"!python3 maigret/maigret.py --help"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "SjDmpN4QGnJu"
},
"outputs": [],
"source": [
"# search\n",
"!python3 maigret/maigret.py user"
]
}
],
"metadata": {
"colab": {
"collapsed_sections": [],
"include_colab_link": true,
"name": "maigret.ipynb",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.10"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
Executable
+21
@@ -0,0 +1,21 @@
#!/usr/bin/env python3
import asyncio
import sys
from maigret.maigret import main
def run():
try:
if sys.version_info >= (3, 10):
asyncio.run(main())
else:
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
except KeyboardInterrupt:
print('Maigret is interrupted.')
sys.exit(1)
if __name__ == "__main__":
run()
+120 -160
@@ -1,40 +1,39 @@
# Standard library imports
import ast
import asyncio
import logging
import random
import re
import ssl
import sys
from typing import Dict, List, Optional, Tuple
from urllib.parse import quote
# Third party imports
import aiodns
import alive_progress
from alive_progress import alive_bar
from aiohttp import ClientSession, TCPConnector, http_exceptions
from aiohttp.client_exceptions import ClientConnectorError, ServerDisconnectedError
from python_socks import _errors as proxy_errors
from socid_extractor import extract
try:
from mock import Mock
except ImportError:
from unittest.mock import Mock
# Local imports
from . import errors
import re
import ssl
import sys
import tqdm
import random
from typing import Tuple, Optional, Dict, List
from urllib.parse import quote
import aiodns
import tqdm.asyncio
from python_socks import _errors as proxy_errors
from socid_extractor import extract
from aiohttp import TCPConnector, ClientSession, http_exceptions
from aiohttp.client_exceptions import ServerDisconnectedError, ClientConnectorError
from .activation import ParsingActivator, import_aiohttp_cookies
from . import errors
from .errors import CheckError
from .executors import (
AsyncExecutor,
AsyncioSimpleExecutor,
AsyncioProgressbarQueueExecutor,
)
from .result import QueryResult, QueryStatus
from .sites import MaigretDatabase, MaigretSite
from .types import QueryOptions, QueryResultWrapper
from .utils import ascii_data_display, get_random_user_agent
from .utils import get_random_user_agent, ascii_data_display
SUPPORTED_IDS = (
@@ -58,120 +57,119 @@ class CheckerBase:
class SimpleAiohttpChecker(CheckerBase):
def __init__(self, *args, **kwargs):
self.proxy = kwargs.get('proxy')
self.cookie_jar = kwargs.get('cookie_jar')
proxy = kwargs.get('proxy')
cookie_jar = kwargs.get('cookie_jar')
self.logger = kwargs.get('logger', Mock())
self.url = None
self.headers = None
self.allow_redirects = True
self.timeout = 0
self.method = 'get'
# moved here to speed up the launch of Maigret
from aiohttp_socks import ProxyConnector
# make http client session
connector = ProxyConnector.from_url(proxy) if proxy else TCPConnector(ssl=False)
connector.verify_ssl = False
self.session = ClientSession(
connector=connector, trust_env=True, cookie_jar=cookie_jar
)
def prepare(self, url, headers=None, allow_redirects=True, timeout=0, method='get'):
self.url = url
self.headers = headers
self.allow_redirects = allow_redirects
self.timeout = timeout
self.method = method
return None
if method == 'get':
request_method = self.session.get
else:
request_method = self.session.head
future = request_method(
url=url,
headers=headers,
allow_redirects=allow_redirects,
timeout=timeout,
)
return future
async def close(self):
pass
await self.session.close()
async def check(self, future) -> Tuple[str, int, Optional[CheckError]]:
html_text = None
status_code = 0
error: Optional[CheckError] = CheckError("Unknown")
async def _make_request(self, session, url, headers, allow_redirects, timeout, method, logger) -> Tuple[str, int, Optional[CheckError]]:
try:
request_method = session.get if method == 'get' else session.head
async with request_method(
url=url,
headers=headers,
allow_redirects=allow_redirects,
timeout=timeout,
) as response:
status_code = response.status
response_content = await response.content.read()
charset = response.charset or "utf-8"
decoded_content = response_content.decode(charset, "ignore")
response = await future
error = CheckError("Connection lost") if status_code == 0 else None
logger.debug(decoded_content)
status_code = response.status
response_content = await response.content.read()
charset = response.charset or "utf-8"
decoded_content = response_content.decode(charset, "ignore")
html_text = decoded_content
return decoded_content, status_code, error
error = None
if status_code == 0:
error = CheckError("Connection lost")
self.logger.debug(html_text)
except asyncio.TimeoutError as e:
return None, 0, CheckError("Request timeout", str(e))
error = CheckError("Request timeout", str(e))
except ClientConnectorError as e:
return None, 0, CheckError("Connecting failure", str(e))
error = CheckError("Connecting failure", str(e))
except ServerDisconnectedError as e:
return None, 0, CheckError("Server disconnected", str(e))
error = CheckError("Server disconnected", str(e))
except http_exceptions.BadHttpMessage as e:
return None, 0, CheckError("HTTP", str(e))
error = CheckError("HTTP", str(e))
except proxy_errors.ProxyError as e:
return None, 0, CheckError("Proxy", str(e))
error = CheckError("Proxy", str(e))
except KeyboardInterrupt:
return None, 0, CheckError("Interrupted")
error = CheckError("Interrupted")
except Exception as e:
# python-specific exceptions
if sys.version_info.minor > 6 and (
isinstance(e, ssl.SSLCertVerificationError)
or isinstance(e, ssl.SSLError)
):
return None, 0, CheckError("SSL", str(e))
error = CheckError("SSL", str(e))
else:
logger.debug(e, exc_info=True)
return None, 0, CheckError("Unexpected", str(e))
self.logger.debug(e, exc_info=True)
error = CheckError("Unexpected", str(e))
async def check(self) -> Tuple[str, int, Optional[CheckError]]:
from aiohttp_socks import ProxyConnector
connector = ProxyConnector.from_url(self.proxy) if self.proxy else TCPConnector(ssl=False)
connector.verify_ssl = False
if error == "Invalid proxy response":
self.logger.debug(error, exc_info=True)
async with ClientSession(
connector=connector,
trust_env=True,
cookie_jar=self.cookie_jar.copy() if self.cookie_jar else None
) as session:
html_text, status_code, error = await self._make_request(
session,
self.url,
self.headers,
self.allow_redirects,
self.timeout,
self.method,
self.logger
)
if error and str(error) == "Invalid proxy response":
self.logger.debug(error, exc_info=True)
return str(html_text) if html_text else '', status_code, error
return str(html_text), status_code, error
class ProxiedAiohttpChecker(SimpleAiohttpChecker):
def __init__(self, *args, **kwargs):
self.proxy = kwargs.get('proxy')
self.cookie_jar = kwargs.get('cookie_jar')
proxy = kwargs.get('proxy')
cookie_jar = kwargs.get('cookie_jar')
self.logger = kwargs.get('logger', Mock())
# moved here to speed up the launch of Maigret
from aiohttp_socks import ProxyConnector
connector = ProxyConnector.from_url(proxy)
connector.verify_ssl = False
self.session = ClientSession(
connector=connector, trust_env=True, cookie_jar=cookie_jar
)
class AiodnsDomainResolver(CheckerBase):
if sys.platform == 'win32': # Temporary workaround for Windows
asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
def __init__(self, *args, **kwargs):
loop = asyncio.get_event_loop()
self.logger = kwargs.get('logger', Mock())
self.resolver = aiodns.DNSResolver(loop=loop)
def prepare(self, url, headers=None, allow_redirects=True, timeout=0, method='get'):
self.url = url
return None
return self.resolver.query(url, 'A')
async def check(self) -> Tuple[str, int, Optional[CheckError]]:
async def check(self, future) -> Tuple[str, int, Optional[CheckError]]:
status = 404
error = None
text = ''
try:
res = await self.resolver.query(self.url, 'A')
res = await future
text = str(res[0].host)
status = 200
except aiodns.error.DNSError:
@@ -190,7 +188,7 @@ class CheckerMock:
def prepare(self, url, headers=None, allow_redirects=True, timeout=0, method='get'):
return None
async def check(self) -> Tuple[str, int, Optional[CheckError]]:
async def check(self, future) -> Tuple[str, int, Optional[CheckError]]:
await asyncio.sleep(0)
return '', 0, None
@@ -376,16 +374,8 @@ def process_site_result(
if extracted_ids_data:
new_usernames = {}
for k, v in extracted_ids_data.items():
if "username" in k and not "usernames" in k:
if "username" in k:
new_usernames[v] = "username"
elif "usernames" in k:
try:
tree = ast.literal_eval(v)
if type(tree) == list:
for n in tree:
new_usernames[n] = "username"
except Exception as e:
logger.warning(e)
if k in SUPPORTED_IDS:
new_usernames[v] = k
@@ -425,8 +415,6 @@ def make_site_result(
headers = {
"User-Agent": get_random_user_agent(),
# tell server that we want to close connection after request
"Connection": "close",
}
headers.update(site.headers)
@@ -533,8 +521,7 @@ def make_site_result(
# Store future request object in the results object
results_site["future"] = future
results_site["checker"] = checker
results_site["checker"] = checker
return results_site
@@ -542,19 +529,14 @@ def make_site_result(
async def check_site_for_username(
site, username, options: QueryOptions, logger, query_notify, *args, **kwargs
) -> Tuple[str, QueryResultWrapper]:
default_result = make_site_result(
site, username, options, logger, retry=kwargs.get('retry')
)
# future = default_result.get("future")
# if not future:
# return site.name, default_result
checker = default_result.get("checker")
if not checker:
print(f"error, no checker for {site.name}")
default_result = make_site_result(site, username, options, logger, retry=kwargs.get('retry'))
future = default_result.get("future")
if not future:
return site.name, default_result
response = await checker.check()
checker = default_result["checker"]
response = await checker.check(future=future)
response_result = process_site_result(
response, query_notify, logger, default_result, site
@@ -566,8 +548,8 @@ async def check_site_for_username(
async def debug_ip_request(checker, logger):
checker.prepare(url="https://icanhazip.com")
ip, status, check_error = await checker.check()
future = checker.prepare(url="https://icanhazip.com")
ip, status, check_error = await checker.check(future)
if ip:
logger.debug(f"My IP is: {ip.strip()}")
else:
@@ -685,11 +667,8 @@ async def maigret(
executor = AsyncioSimpleExecutor(logger=logger)
else:
executor = AsyncioProgressbarQueueExecutor(
logger=logger,
in_parallel=max_connections,
timeout=timeout + 0.5,
*args,
**kwargs,
logger=logger, in_parallel=max_connections, timeout=timeout + 0.5,
*args, **kwargs
)
# make options objects for all the requests
@@ -731,10 +710,7 @@ async def maigret(
tasks_dict[sitename] = (
check_site_for_username,
[site, username, options, logger, query_notify],
{
'default': (sitename, default_result),
'retry': retries - attempts + 1,
},
{'default': (sitename, default_result), 'retry': retries-attempts+1},
)
cur_results = await executor.run(tasks_dict.values())
@@ -757,8 +733,10 @@ async def maigret(
# closing http client session
await clearweb_checker.close()
await tor_checker.close()
await i2p_checker.close()
if tor_proxy:
await tor_checker.close()
if i2p_proxy:
await i2p_checker.close()
# notify caller that all queries are finished
query_notify.finish()
@@ -793,7 +771,7 @@ def timeout_check(value):
async def site_self_check(
site: MaigretSite,
logger: logging.Logger,
logger,
semaphore,
db: MaigretDatabase,
silent=False,
@@ -839,9 +817,6 @@ async def site_self_check(
result = results_dict[site.name]["status"]
if result.error and 'Cannot connect to host' in result.error.desc:
changes["disabled"] = True
site_status = result.status
if site_status != status:
@@ -869,24 +844,18 @@ async def site_self_check(
if changes["disabled"] != site.disabled:
site.disabled = changes["disabled"]
logger.info(f"Switching disabled status of {site.name} to {site.disabled}")
db.update_site(site)
if not silent:
action = "Disabled" if site.disabled else "Enabled"
print(f"{action} site {site.name}...")
# remove service tag "unchecked"
if "unchecked" in site.tags:
site.tags.remove("unchecked")
db.update_site(site)
return changes
async def self_check(
db: MaigretDatabase,
site_data: dict,
logger: logging.Logger,
logger,
silent=False,
max_connections=10,
proxy=None,
@@ -900,7 +869,6 @@ async def self_check(
def disabled_count(lst):
return len(list(filter(lambda x: x.disabled, lst)))
unchecked_old_count = len([site for site in all_sites.values() if "unchecked" in site.tags])
disabled_old_count = disabled_count(all_sites.values())
for _, site in all_sites.items():
@@ -910,30 +878,22 @@ async def self_check(
future = asyncio.ensure_future(check_coro)
tasks.append(future)
if tasks:
with alive_bar(len(tasks), title='Self-checking', force_tty=True) as progress:
for f in asyncio.as_completed(tasks):
await f
progress() # Update the progress bar
for f in tqdm.asyncio.tqdm.as_completed(tasks):
await f
unchecked_new_count = len([site for site in all_sites.values() if "unchecked" in site.tags])
disabled_new_count = disabled_count(all_sites.values())
total_disabled = disabled_new_count - disabled_old_count
if total_disabled:
if total_disabled >= 0:
message = "Disabled"
else:
message = "Enabled"
total_disabled *= -1
if total_disabled >= 0:
message = "Disabled"
else:
message = "Enabled"
total_disabled *= -1
if not silent:
print(
f"{message} {total_disabled} ({disabled_old_count} => {disabled_new_count}) checked sites. "
"Run with `--info` flag to get more information"
)
if not silent:
print(
f"{message} {total_disabled} ({disabled_old_count} => {disabled_new_count}) checked sites. "
"Run with `--info` flag to get more information"
)
if unchecked_new_count != unchecked_old_count:
print(f"Unchecked sites verified: {unchecked_old_count - unchecked_new_count}")
return total_disabled != 0 or unchecked_new_count != unchecked_old_count
return total_disabled != 0
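One side of the `process_site_result` hunk above handles an extracted `usernames` field specially. That field holds a Python-style list serialized as a string, and the code expands it into individual usernames with `ast.literal_eval`. Below is a minimal standalone sketch of that branch. It assumes `extracted_ids_data` maps field names to string values, as in the diff. `collect_usernames` is a hypothetical name, and the `SUPPORTED_IDS` branch is left out because that tuple is truncated in this view.

```python
import ast
import logging
from typing import Dict

logger = logging.getLogger(__name__)


def collect_usernames(extracted_ids_data: Dict[str, str]) -> Dict[str, str]:
    """Map every discovered username to the id type "username" (hypothetical helper)."""
    new_usernames: Dict[str, str] = {}
    for k, v in extracted_ids_data.items():
        if "username" in k and "usernames" not in k:
            # A single-value field such as "username"
            new_usernames[v] = "username"
        elif "usernames" in k:
            # A list serialized as a string, e.g. "['alice', 'bob']"
            try:
                tree = ast.literal_eval(v)
            except (ValueError, SyntaxError) as e:
                # Malformed literal: log it and skip, like the diff's warning
                logger.warning(e)
                continue
            if isinstance(tree, list):
                for n in tree:
                    # Only strings are kept, so nested lists cannot become dict keys
                    if isinstance(n, str):
                        new_usernames[n] = "username"
    return new_usernames
```

`ast.literal_eval` only evaluates literals, so an untrusted page value cannot execute code. Catching `ValueError` and `SyntaxError` is narrower than the diff's `except Exception`. This is a deliberate choice in the sketch.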
-6
@@ -58,12 +58,6 @@ COMMON_ERRORS = {
'Сайт заблокирован хостинг-провайдером': CheckError(
'Site-specific', 'Site is disabled (Beget)'
),
'Generated by cloudfront (CloudFront)': CheckError(
'Request blocked', 'Cloudflare'
),
'/cdn-cgi/challenge-platform/h/b/orchestrate/chl_page': CheckError(
'Just a moment: bot redirect challenge', 'Cloudflare'
)
}
ERRORS_TYPES = {
+34 -69
@@ -1,13 +1,12 @@
import asyncio
import sys
import time
from typing import Any, Iterable, List
import alive_progress
from alive_progress import alive_bar
import tqdm
import sys
from typing import Iterable, Any, List
from .types import QueryDraft
def create_task_func():
if sys.version_info.minor > 6:
create_asyncio_task = asyncio.create_task
@@ -35,14 +34,9 @@ class AsyncExecutor:
class AsyncioSimpleExecutor(AsyncExecutor):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.semaphore = asyncio.Semaphore(kwargs.get('in_parallel', 100))
async def _run(self, tasks: Iterable[QueryDraft]):
async def sem_task(f, args, kwargs):
async with self.semaphore:
return await f(*args, **kwargs)
futures = [sem_task(f, args, kwargs) for f, args, kwargs in tasks]
futures = [f(*args, **kwargs) for f, args, kwargs in tasks]
return await asyncio.gather(*futures)
@@ -52,20 +46,9 @@ class AsyncioProgressbarExecutor(AsyncExecutor):
async def _run(self, tasks: Iterable[QueryDraft]):
futures = [f(*args, **kwargs) for f, args, kwargs in tasks]
total_tasks = len(futures)
results = []
# Use alive_bar for progress tracking
with alive_bar(total_tasks, title='Searching', force_tty=True) as progress:
# Chunk progress updates for efficiency
async def track_task(task):
result = await task
progress() # Update progress bar once task completes
return result
# Use gather to run tasks concurrently and track progress
results = await asyncio.gather(*(track_task(f) for f in futures))
for f in tqdm.asyncio.tqdm.as_completed(futures):
results.append(await f)
return results
@@ -83,12 +66,8 @@ class AsyncioProgressbarSemaphoreExecutor(AsyncExecutor):
async def semaphore_gather(tasks: Iterable[QueryDraft]):
coros = [_wrap_query(q) for q in tasks]
results = []
# Use alive_bar correctly as a context manager
with alive_bar(len(coros), title='Searching', force_tty=True) as progress:
for f in asyncio.as_completed(coros):
results.append(await f)
progress() # Update the progress bar
for f in tqdm.asyncio.tqdm.as_completed(coros):
results.append(await f)
return results
return await semaphore_gather(tasks)
@@ -98,35 +77,27 @@ class AsyncioProgressbarQueueExecutor(AsyncExecutor):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.workers_count = kwargs.get('in_parallel', 10)
self.progress_func = kwargs.get('progress_func', tqdm.tqdm)
self.queue = asyncio.Queue(self.workers_count)
self.timeout = kwargs.get('timeout')
# Pass a progress function; alive_bar by default
self.progress_func = kwargs.get('progress_func', alive_bar)
self.progress = None
# TODO: tests
async def increment_progress(self, count):
"""Update progress by calling the provided progress function."""
if self.progress:
if asyncio.iscoroutinefunction(self.progress):
await self.progress(count)
else:
self.progress(count)
await asyncio.sleep(0)
update_func = self.progress.update
if asyncio.iscoroutinefunction(update_func):
await update_func(count)
else:
update_func(count)
await asyncio.sleep(0)
# TODO: tests
async def stop_progress(self):
"""Stop the progress tracking."""
if hasattr(self.progress, "close") and self.progress:
close_func = self.progress.close
if asyncio.iscoroutinefunction(close_func):
await close_func()
else:
close_func()
await asyncio.sleep(0)
stop_func = self.progress.close
if asyncio.iscoroutinefunction(stop_func):
await stop_func()
else:
stop_func()
await asyncio.sleep(0)
async def worker(self):
"""Consume tasks from the queue and process them."""
while True:
try:
f, args, kwargs = self.queue.get_nowait()
@@ -141,33 +112,27 @@ class AsyncioProgressbarQueueExecutor(AsyncExecutor):
result = kwargs.get('default')
self.results.append(result)
if self.progress:
await self.increment_progress(1)
await self.increment_progress(1)
self.queue.task_done()
async def _run(self, queries: Iterable[QueryDraft]):
"""Main runner function to execute tasks with progress tracking."""
self.results: List[Any] = []
queries_list = list(queries)
min_workers = min(len(queries_list), self.workers_count)
workers = [create_task_func()(self.worker()) for _ in range(min_workers)]
# Initialize the progress bar
if self.progress_func:
with self.progress_func(len(queries_list), title="Searching", force_tty=True) as bar:
self.progress = bar # Assign alive_bar's callable to self.progress
self.progress = self.progress_func(total=len(queries_list))
# Add tasks to the queue
for t in queries_list:
await self.queue.put(t)
for t in queries_list:
await self.queue.put(t)
# Wait for tasks to complete
await self.queue.join()
await self.queue.join()
# Cancel any remaining workers
for w in workers:
w.cancel()
for w in workers:
w.cancel()
return self.results
await self.stop_progress()
return self.results
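The semaphore-wrapped variant of `AsyncioSimpleExecutor._run` above caps how many queries run at once while still collecting every result with `asyncio.gather`. Below is a minimal standalone sketch of that pattern. It assumes plain `(function, args, kwargs)` triples in place of Maigret's `QueryDraft` type. `run_bounded` and `ConcurrencyProbe` are hypothetical names used only for illustration.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable, List, Tuple

# (function, args, kwargs) triples, mirroring the task shape in the diff
Task = Tuple[Callable[..., Awaitable[Any]], list, dict]


async def run_bounded(tasks: Iterable[Task], in_parallel: int = 100) -> List[Any]:
    """Run all tasks concurrently, but never more than `in_parallel` at a time."""
    semaphore = asyncio.Semaphore(in_parallel)

    async def sem_task(f, args, kwargs):
        # Each coroutine waits for a free slot before doing its work
        async with semaphore:
            return await f(*args, **kwargs)

    # gather() returns results in the same order as the input tasks
    return await asyncio.gather(*(sem_task(f, a, kw) for f, a, kw in tasks))


class ConcurrencyProbe:
    """Hypothetical fake query that records how many calls overlap."""

    def __init__(self) -> None:
        self.running = 0
        self.peak = 0

    async def query(self, name: str) -> str:
        self.running += 1
        self.peak = max(self.peak, self.running)
        await asyncio.sleep(0.01)  # simulate network latency
        self.running -= 1
        return name


if __name__ == "__main__":
    probe = ConcurrencyProbe()
    names = [f"site{i}" for i in range(10)]
    results = asyncio.run(run_bounded([(probe.query, [n], {}) for n in names], in_parallel=3))
    print(results, "peak concurrency:", probe.peak)
```

Without the semaphore, all ten fake queries would overlap. With `in_parallel=3`, the peak stays at three and results keep their input order.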
+7 -45
@@ -1,7 +1,6 @@
"""
Maigret main module
"""
import ast
import asyncio
import logging
import os
@@ -41,10 +40,9 @@ from .submit import Submitter
from .types import QueryResultWrapper
from .utils import get_dict_ascii_tree
from .settings import Settings
from .permutator import Permute
def notify_about_errors(search_results: QueryResultWrapper, query_notify, show_statistics=False):
def notify_about_errors(search_results: QueryResultWrapper, query_notify):
errs = errors.extract_and_group(search_results)
was_errs_displayed = False
for e in errs:
@@ -58,17 +56,12 @@ def notify_about_errors(search_results: QueryResultWrapper, query_notify, show_s
query_notify.warning(text, '!')
was_errs_displayed = True
if show_statistics:
query_notify.warning(f'Verbose error statistics:')
for e in errs:
text = f'{e["err"]}: {round(e["perc"],2)}%'
query_notify.warning(text, '!')
if was_errs_displayed:
query_notify.warning(
'You can see detailed site check errors with a flag `--print-errors`'
)
def extract_ids_from_page(url, logger, timeout=5) -> dict:
results = {}
# url, headers
@@ -92,17 +85,8 @@ def extract_ids_from_page(url, logger, timeout=5) -> dict:
else:
print(get_dict_ascii_tree(info.items(), new_line=False), ' ')
for k, v in info.items():
# TODO: merge with the same functionality in checking module
if 'username' in k and not 'usernames' in k:
if 'username' in k:
results[v] = 'username'
elif 'usernames' in k:
try:
tree = ast.literal_eval(v)
if type(tree) == list:
for n in tree:
results[n] = 'username'
except Exception as e:
logger.warning(e)
if k in SUPPORTED_IDS:
results[v] = k
@@ -211,12 +195,6 @@ def setup_arguments_parser(settings: Settings):
choices=SUPPORTED_IDS,
help="Specify identifier(s) type (default: username).",
)
parser.add_argument(
"--permute",
action="store_true",
default=False,
help="Permute at least 2 usernames to generate more possible usernames.",
)
parser.add_argument(
"--db",
metavar="DB_FILE",
@@ -499,7 +477,7 @@ async def main():
arg_parser = setup_arguments_parser(settings)
args = arg_parser.parse_args()
# Re-set logging level based on args
# Re-set loggging level based on args
if args.debug:
log_level = logging.DEBUG
elif args.info:
@@ -514,10 +492,6 @@ async def main():
for u in args.username
if u and u not in ['-'] and u not in args.ignore_ids_list
}
original_usernames = ""
if args.permute and len(usernames) > 1 and args.id_type == 'username':
original_usernames = " ".join(usernames.keys())
usernames = Permute(usernames).gather(method='strict')
parsing_enabled = not args.disable_extracting
recursive_search_enabled = not args.disable_recursive_search
@@ -569,11 +543,7 @@ async def main():
# Database self-checking
if args.self_check:
if len(site_data) == 0:
query_notify.warning('No sites to self-check with the current filters! Exiting...')
return
query_notify.success(f'Maigret sites database self-check started for {len(site_data)} sites...')
print('Maigret sites database self-checking...')
is_need_update = await self_check(
db,
site_data,
@@ -592,9 +562,7 @@ async def main():
print('Database was successfully updated.')
else:
print('Updates will be applied only for current search session.')
if args.verbose or args.debug:
query_notify.info('Scan sessions flags stats: ' + str(db.get_scan_stats(site_data)))
print('Scan sessions flags stats: ' + str(db.get_scan_stats(site_data)))
# Database statistics
if args.stats:
@@ -613,12 +581,6 @@ async def main():
query_notify.warning('No usernames to check, exiting.')
sys.exit(0)
if len(usernames) > 1 and args.permute and args.id_type == 'username':
query_notify.warning(
f"{len(usernames)} permutations from {original_usernames} to check..." +
get_dict_ascii_tree(usernames, prepend="\t")
)
if not site_data:
query_notify.warning('No sites to check, exiting!')
sys.exit(2)
@@ -682,7 +644,7 @@ async def main():
check_domains=args.with_domains,
)
notify_about_errors(results, query_notify, show_statistics=args.verbose)
notify_about_errors(results, query_notify)
if args.reports_sorting == "data":
results = sort_report_by_data_points(results)
-4
@@ -211,10 +211,6 @@ class QueryNotifyPrint(QueryNotify):
else:
print(msg)
def success(self, message, symbol="+"):
msg = f"[{symbol}] {message}"
self._colored_print(Fore.GREEN, msg)
def warning(self, message, symbol="-"):
msg = f"[{symbol}] {message}"
self._colored_print(Fore.YELLOW, msg)
-26
@@ -1,26 +0,0 @@
# License MIT. by balestek https://github.com/balestek
from itertools import permutations
class Permute:
def __init__(self, elements: dict):
self.separators = ["", "_", "-", "."]
self.elements = elements
def gather(self, method: str = "strict" or "all") -> dict:
permutations_dict = {}
for i in range(1, len(self.elements) + 1):
for subset in permutations(self.elements, i):
if i == 1:
if method == "all":
permutations_dict[subset[0]] = self.elements[subset[0]]
permutations_dict["_" + subset[0]] = self.elements[subset[0]]
permutations_dict[subset[0] + "_"] = self.elements[subset[0]]
else:
for separator in self.separators:
perm = separator.join(subset)
permutations_dict[perm] = self.elements[subset[0]]
if separator == "":
permutations_dict["_" + perm] = self.elements[subset[0]]
permutations_dict[perm + "_"] = self.elements[subset[0]]
return permutations_dict
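The removed `Permute.gather` builds candidate usernames from every ordered subset of the input names, joined with each separator in `["", "_", "-", "."]`. Only the unseparated join also gets `_` prefix and suffix variants. Below is a minimal functional sketch of that logic. It assumes the `else` branch pairs with `if i == 1`, since indentation is lost in this view. Under that reading, "strict" mode skips single names and "all" mode adds them.

Note that the original default `method: str = "strict" or "all"` always evaluates to `"strict"`. For that reason the sketch uses a `Literal` type instead. `gather_usernames` is a hypothetical name.

```python
from itertools import permutations
from typing import Dict, Literal

SEPARATORS = ["", "_", "-", "."]


def gather_usernames(elements: Dict[str, str], method: Literal["strict", "all"] = "strict") -> Dict[str, str]:
    """Generate candidate usernames from ordered combinations of `elements`."""
    result: Dict[str, str] = {}
    for i in range(1, len(elements) + 1):
        for subset in permutations(elements, i):
            # The id type is taken from the first name in the subset
            id_type = elements[subset[0]]
            if i == 1:
                # Single names (plus _ prefix/suffix) are only kept in "all" mode
                if method == "all":
                    name = subset[0]
                    result[name] = id_type
                    result["_" + name] = id_type
                    result[name + "_"] = id_type
                continue
            for sep in SEPARATORS:
                perm = sep.join(subset)
                result[perm] = id_type
                if sep == "":
                    # Only the unseparated join gets _ prefix/suffix variants
                    result["_" + perm] = id_type
                    result[perm + "_"] = id_type
    return result
```

For two names, each of the two orderings yields six candidates in "strict" mode: four separator joins plus two underscore variants. That gives 12 in total, and "all" mode adds three per single name.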
+2 -6
View File
@@ -8,7 +8,6 @@ from datetime import datetime
from typing import Dict, Any
import xmind
from dateutil.tz import gettz
from dateutil.parser import parse as parse_datetime_str
from jinja2 import Template
@@ -17,8 +16,6 @@ from .result import QueryStatus
from .sites import MaigretDatabase
from .utils import is_country_tag, CaseConverter, enrich_link_str
ADDITIONAL_TZINFO = {"CDT": gettz("America/Chicago")}
SUPPORTED_JSON_REPORT_FORMATS = [
"simple",
"ndjson",
@@ -295,8 +292,8 @@ def generate_report_context(username_results: list):
first_seen = created_at
else:
try:
known_time = parse_datetime_str(first_seen, tzinfos=ADDITIONAL_TZINFO)
new_time = parse_datetime_str(created_at, tzinfos=ADDITIONAL_TZINFO)
known_time = parse_datetime_str(first_seen)
new_time = parse_datetime_str(created_at)
if new_time < known_time:
first_seen = created_at
except Exception as e:
@@ -305,7 +302,6 @@ def generate_report_context(username_results: list):
first_seen,
created_at,
str(e),
exc_info=True,
)
for k, v in status.ids_data.items():
+286 -1593
File diff suppressed because it is too large
+2 -11
@@ -1,30 +1,21 @@
{
"presence_strings": [
"user not found",
"404",
"Page not found",
"error 404",
"username",
"not found",
"пользователь",
"profile",
"lastname",
"firstname",
"DisplayName",
"biography",
"title",
"birthday",
"репутация",
"информация",
"e-mail",
"body",
"html",
"style"
"e-mail"
],
"supposed_usernames": [
"alex", "god", "admin", "red", "blue", "john"
],
"retries_count": 0,
"retries_count": 1,
"sites_db_path": "resources/data.json",
"timeout": 30,
"max_connections": 100,
+1
@@ -68,6 +68,7 @@
<div class="row-mb">
<div class="col-md">
<div class="card flex-md-row mb-4 box-shadow h-md-250">
<span style="position: absolute; right: 10px;"><a href="https://github.com/soxoj/maigret/issues/new?assignees=soxoj&amp;labels=bug&amp;template=report-false-result.md&amp;title=Invalid%20result%20{{ v.url_user }}">Invalid?</a></span>
<img class="card-img-right flex-auto d-md-block" alt="Photo" style="width: 200px; height: 200px; object-fit: scale-down;" src="{{ v.status and v.status.ids_data and v.status.ids_data.image or 'https://i.imgur.com/040fmbw.png' }}" data-holder-rendered="true">
<div class="card-body d-flex flex-column align-items-start" style="padding-top: 0;">
<h3 class="mb-0" style="padding-top: 1rem;">
+1
@@ -64,6 +64,7 @@
<div class="sitebox" style="margin-top: 20px;" >
<div>
<div>
<span class="invalid-button"><a href="https://github.com/soxoj/maigret/issues/new?assignees=soxoj&amp;labels=bug&amp;template=report-false-result.md&amp;title=Invalid%20result%20{{ v.url_user }}">Invalid?</a></span>
<table>
<tr>
<td valign="top">
+9 -65
@@ -80,36 +80,6 @@ class MaigretSite:
def __str__(self):
return f"{self.name} ({self.url_main})"
def __is_equal_by_url_or_name(self, url_or_name_str: str):
lower_url_or_name_str = url_or_name_str.lower()
lower_url = self.url.lower()
lower_name = self.name.lower()
lower_url_main = self.url_main.lower()
return \
lower_name == lower_url_or_name_str or \
(lower_url_main and lower_url_main == lower_url_or_name_str) or \
(lower_url_main and lower_url_main in lower_url_or_name_str) or \
(lower_url_main and lower_url_or_name_str in lower_url_main) or \
(lower_url and lower_url_or_name_str in lower_url)
def __eq__(self, other):
if isinstance(other, MaigretSite):
# Compare only relevant attributes, not internal state like request_future
attrs_to_compare = ['name', 'url_main', 'url_subpath', 'type', 'headers',
'errors', 'activation', 'regex_check', 'url_probe',
'check_type', 'request_head_only', 'get_params',
'presense_strs', 'absence_strs', 'stats', 'engine',
'engine_data', 'alexa_rank', 'source', 'protocol']
return all(getattr(self, attr) == getattr(other, attr)
for attr in attrs_to_compare)
elif isinstance(other, str):
# Compare only by name (exactly) or url_main (partial similarity)
return self.__is_equal_by_url_or_name(other)
return False
def update_detectors(self):
if "url" in self.__dict__:
url = self.url
@@ -131,10 +101,6 @@ class MaigretSite:
return None
def extract_id_from_url(self, url: str) -> Optional[Tuple[str, str]]:
"""
Extracts username from url.
It's outdated, detects only a format of https://example.com/{username}
"""
if not self.url_regexp:
return None
@@ -257,16 +223,6 @@ class MaigretDatabase:
def sites_dict(self):
return {site.name: site for site in self._sites}
def has_site(self, site: MaigretSite):
for s in self._sites:
if site == s:
print(f"input == site: {site} == {s}")
return True
return False
def __contains__(self, site):
return self.has_site(site)
def ranked_sites_dict(
self,
reverse=False,
@@ -499,43 +455,31 @@ class MaigretDatabase:
for tag in filter(lambda x: not is_country_tag(x), site.tags):
tags[tag] = tags.get(tag, 0) + 1
enabled_count = total_count - disabled_count
enabled_perc = round(100 * enabled_count / total_count, 2)
output += (
f"Enabled/total sites: {enabled_count}/{total_count} = {enabled_perc}%\n\n"
)
enabled_count = total_count-disabled_count
enabled_perc = round(100*enabled_count/total_count, 2)
output += f"Enabled/total sites: {enabled_count}/{total_count} = {enabled_perc}%\n\n"
checks_perc = round(100 * message_checks_one_factor / enabled_count, 2)
checks_perc = round(100*message_checks_one_factor/enabled_count, 2)
output += f"Incomplete message checks: {message_checks_one_factor}/{enabled_count} = {checks_perc}% (false positive risks)\n\n"
status_checks_perc = round(100 * status_checks / enabled_count, 2)
status_checks_perc = round(100*status_checks/enabled_count, 2)
output += f"Status code checks: {status_checks}/{enabled_count} = {status_checks_perc}% (false positive risks)\n\n"
output += (
f"False positive risk (total): {checks_perc+status_checks_perc:.2f}%\n\n"
)
output += f"False positive risk (total): {checks_perc+status_checks_perc:.2f}%\n\n"
top_urls_count = 20
output += f"Top {top_urls_count} profile URLs:\n"
for url, count in sorted(urls.items(), key=lambda x: x[1], reverse=True)[
:top_urls_count
]:
for url, count in sorted(urls.items(), key=lambda x: x[1], reverse=True)[:top_urls_count]:
if count == 1:
break
output += f"- ({count})\t`{url}`\n" if is_markdown else f"{count}\t{url}\n"
top_tags_count = 20
output += f"\nTop {top_tags_count} tags:\n"
for tag, count in sorted(tags.items(), key=lambda x: x[1], reverse=True)[
:top_tags_count
]:
for tag, count in sorted(tags.items(), key=lambda x: x[1], reverse=True)[:top_tags_count]:
mark = ""
if tag not in self._tags:
mark = " (non-standard)"
output += (
f"- ({count})\t`{tag}`{mark}\n"
if is_markdown
else f"{count}\t{tag}{mark}\n"
)
output += f"- ({count})\t`{tag}`{mark}\n" if is_markdown else f"{count}\t{tag}{mark}\n"
return output
+21 -124
@@ -1,12 +1,11 @@
import asyncio
import json
import re
from typing import List
from xml.etree import ElementTree
from typing import List, Tuple
import xml.etree.ElementTree as ET
from aiohttp import TCPConnector, ClientSession
import requests
import cloudscraper
from colorama import Fore, Style
from .activation import import_aiohttp_cookies
from .checking import maigret
@@ -37,13 +36,12 @@ class CloudflareSession:
async def close(self):
pass
class Submitter:
HEADERS = {
"User-Agent": get_random_user_agent(),
}
SEPARATORS = "\"'\n"
SEPARATORS = "\"'"
RATIO = 0.6
TOP_FEATURES = 5
@@ -56,7 +54,6 @@ class Submitter:
self.logger = logger
from aiohttp_socks import ProxyConnector
proxy = self.args.proxy
cookie_jar = None
if args.cookie_file:
@@ -72,7 +69,7 @@ class Submitter:
def get_alexa_rank(site_url_main):
url = f"http://data.alexa.com/data?cli=10&url={site_url_main}"
xml_data = requests.get(url).text
root = ElementTree.fromstring(xml_data)
root = ET.fromstring(xml_data)
alexa_rank = 0
try:
@@ -138,27 +135,20 @@ class Submitter:
if status == QueryStatus.CLAIMED:
changes["disabled"] = True
elif status == QueryStatus.CLAIMED:
print(
f"{Fore.YELLOW}[!] Not found `{username}` in {site.name}, must be claimed{Style.RESET_ALL}"
self.logger.warning(
f"Not found `{username}` in {site.name}, must be claimed"
)
self.logger.warning(site.json)
self.logger.info(results_dict[site.name])
changes["disabled"] = True
else:
print(
f"{Fore.YELLOW}[!] Found `{username}` in {site.name}, must be available{Style.RESET_ALL}"
self.logger.warning(
f"Found `{username}` in {site.name}, must be available"
)
self.logger.warning(site.json)
self.logger.info(results_dict[site.name])
changes["disabled"] = True
else:
print(f"{Fore.GREEN}[+] {username} is successfully checked: {status} in {site.name}{Style.RESET_ALL}")
self.logger.info(f"Site {site.name} checking is finished")
# remove service tag "unchecked"
if "unchecked" in site.tags:
site.tags.remove("unchecked")
changes["tags"] = site.tags
return changes
def generate_additional_fields_dialog(self, engine: MaigretEngine, dialog):
@@ -173,9 +163,7 @@ class Submitter:
fields['urlSubpath'] = f'/{subpath}'
return fields
async def detect_known_engine(
self, url_exists, url_mainpage
) -> [List[MaigretSite], str]:
async def detect_known_engine(self, url_exists, url_mainpage) -> [List[MaigretSite], str]:
resp_text = ''
try:
r = await self.session.get(url_mainpage)
@@ -233,8 +221,7 @@ class Submitter:
return [], resp_text
@staticmethod
def extract_username_dialog(url):
def extract_username_dialog(self, url):
url_parts = url.rstrip("/").split("/")
supposed_username = url_parts[-1].strip('@')
entered_username = input(
@@ -293,10 +280,6 @@ class Submitter:
a_minus_b = tokens_a.difference(tokens_b)
b_minus_a = tokens_b.difference(tokens_a)
# additional filtering by html response
a_minus_b = [t for t in a_minus_b if not t in non_exists_resp_text]
b_minus_a = [t for t in b_minus_a if not t in exists_resp_text]
if len(a_minus_b) == len(b_minus_a) == 0:
print("The pages for existing and non-existing account are the same!")
@@ -313,8 +296,6 @@ class Submitter:
:top_features_count
]
self.logger.debug([(keyword, match_fun(keyword)) for keyword in presence_list])
print("Detected text features of existing account: " + ", ".join(presence_list))
features = input("If features was not detected correctly, write it manually: ")
@@ -324,8 +305,6 @@ class Submitter:
absence_list = sorted(b_minus_a, key=match_fun, reverse=True)[
:top_features_count
]
self.logger.debug([(keyword, match_fun(keyword)) for keyword in absence_list])
print(
"Detected text features of non-existing account: " + ", ".join(absence_list)
)
@@ -350,76 +329,6 @@ class Submitter:
site = MaigretSite(url_mainpage.split("/")[-1], site_data)
return site
async def add_site(self, site):
sem = asyncio.Semaphore(1)
print(f"{Fore.BLUE}{Style.BRIGHT}[*] Adding site {site.name}, let's check it...{Style.RESET_ALL}")
result = await self.site_self_check(site, sem)
if result["disabled"]:
print(
f"Checks failed for {site.name}, please, verify them manually."
)
return {
"valid": False,
"reason": "checks_failed",
}
while True:
print("\nAvailable fields to edit:")
editable_fields = {
'1': 'name',
'2': 'tags',
'3': 'url',
'4': 'url_main',
'5': 'username_claimed',
'6': 'username_unclaimed',
'7': 'presense_strs',
'8': 'absence_strs',
}
for num, field in editable_fields.items():
current_value = getattr(site, field)
print(f"{num}. {field} (current: {current_value})")
print("0. finish editing")
print("10. reject and block domain")
print("11. invalid params, remove")
choice = input("\nSelect field number to edit (0-8): ").strip()
if choice == '0':
break
if choice == '10':
return {
"valid": False,
"reason": "manual block",
}
if choice == '11':
return {
"valid": False,
"reason": "remove",
}
if choice in editable_fields:
field = editable_fields[choice]
current_value = getattr(site, field)
new_value = input(f"Enter new value for {field} (current: {current_value}): ").strip()
if field in ['tags', 'presense_strs', 'absence_strs']:
new_value = list(map(str.strip, new_value.split(',')))
if new_value:
setattr(site, field, new_value)
print(f"Updated {field} to: {new_value}")
self.logger.info(site.json)
self.db.update_site(site)
return {
"valid": True,
}
async def dialog(self, url_exists, cookie_file):
domain_raw = self.URL_RE.sub("", url_exists).strip().strip("/")
domain_raw = domain_raw.split("/")[0]
@@ -452,16 +361,14 @@ class Submitter:
print('Detecting site engine, please wait...')
sites = []
text = None
try:
sites, text = await self.detect_known_engine(url_exists, url_exists)
except KeyboardInterrupt:
print('Engine detect process is interrupted.')
if 'cloudflare' in text.lower():
print(
'Cloudflare protection detected. I will use cloudscraper for futher work'
)
print('Cloudflare protection detected. I will use cloudscraper for futher work')
# self.session = CloudflareSession()
if not sites:
@@ -469,16 +376,11 @@ class Submitter:
redirects = False
if self.args.verbose:
redirects = (
'y' in input('Should we do redirects automatically? [yN] ').lower()
)
redirects = 'y' in input('Should we do redirects automatically? [yN] ').lower()
sites = [
await self.check_features_manually(
url_exists,
url_mainpage,
cookie_file,
redirects,
url_exists, url_mainpage, cookie_file, redirects,
)
]
@@ -498,7 +400,7 @@ class Submitter:
if not found:
print(
f"{Fore.RED}[!] The check for site '{chosen_site.name}' failed!{Style.RESET_ALL}"
f"Sorry, we couldn't find params to detect account presence/absence in {chosen_site.name}."
)
print(
"Try to run this mode again and increase features count or choose others."
@@ -522,18 +424,13 @@ class Submitter:
chosen_site.name = input("Change site name if you want: ") or chosen_site.name
chosen_site.tags = list(map(str.strip, input("Site tags: ").split(',')))
# rank = Submitter.get_alexa_rank(chosen_site.url_main)
# if rank:
# print(f'New alexa rank: {rank}')
# chosen_site.alexa_rank = rank
rank = Submitter.get_alexa_rank(chosen_site.url_main)
if rank:
print(f'New alexa rank: {rank}')
chosen_site.alexa_rank = rank
self.logger.debug(chosen_site.json)
site_data = chosen_site.strip_engine_data()
self.logger.debug(site_data.json)
self.db.update_site(site_data)
if self.args.db:
print(f"{Fore.GREEN}[+] Maigret DB is saved to {self.args.db}.{Style.RESET_ALL}")
self.db.save_to_file(self.args.db)
return True
-47
@@ -1,47 +0,0 @@
# Download this first to avoid compatibility issues:
#
# sudo zypper in python3-devel
# sudo zypper in python3-dev
#
# Then run 'pip3 install -r opensuse.txt' as usual.
#
aiodns>=3.0.0
aiohttp>=3.8.6
aiohttp-socks>=0.7.1
arabic-reshaper~=3.0.0
async-timeout
attrs>=22.2.0
certifi>=2023.7.22
chardet>=5.0.0
colorama
future>=0.18.3
future-annotations>=1.0.0
html5lib>=1.1
idna>=3.4
Jinja2
lxml>=4.9.2
MarkupSafe
mock>=4.0.3
multidict
pycountry>=22.3.5
PyPDF2>=3.0.1
PySocks>=1.7.1
python-bidi>=0.4.2
requests
requests-futures>=1.0.0
six>=1.16.0
socid-extractor>=0.0.24
soupsieve>=2.3.2.post1
stem>=1.8.1
torrequest>=0.1.0
tqdm
typing-extensions
webencodings>=0.5.1
svglib
xhtml2pdf~=0.2.11
XMind>=1.2.0
yarl
networkx
pyvis>=0.2.1
reportlab
cloudscraper>=1.2.71
Generated
-2794
File diff suppressed because it is too large
+4 -4
@@ -1,5 +1,5 @@
maigret @ https://github.com/soxoj/maigret/archive/refs/heads/main.zip
pefile==2023.2.7 # do not bump while pyinstaller is 6.11.1, there is a conflict
psutil==6.1.0
pyinstaller==6.11.1
pywin32-ctypes==0.2.1
pefile==2022.5.30
psutil==5.9.2
pyinstaller @ https://github.com/pyinstaller/pyinstaller/archive/develop.zip
pywin32-ctypes==0.2.0
-80
@@ -1,80 +0,0 @@
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
[tool.poetry]
name = "maigret"
version = "0.4.4"
description = "🕵️‍♂️ Collect a dossier on a person by username from thousands of sites."
authors = ["Soxoj <soxoj@protonmail.com>"]
readme = "README.md"
license = "MIT License"
homepage = "https://pypi.org/project/maigret"
documentation = "https://maigret.readthedocs.io"
repository = "https://github.com/soxoj/maigret"
classifiers = [
"Development Status :: 5 - Production/Stable",
"Programming Language :: Python :: 3",
"Intended Audience :: Information Technology",
"Operating System :: OS Independent",
"License :: OSI Approved :: MIT License",
"Natural Language :: English"
]
[tool.poetry.urls]
"Bug Tracker" = "https://github.com/soxoj/maigret/issues"
[tool.poetry.dependencies]
python = "^3.10"
aiodns = "^3.0.0"
aiohttp = "^3.11.8"
aiohttp-socks = "^0.9.1"
arabic-reshaper = "^3.0.0"
async-timeout = "^5.0.1"
attrs = "^24.2.0"
certifi = "^2024.8.30"
chardet = "^5.0.0"
colorama = "^0.4.6"
future = "^1.0.0"
future-annotations= "^1.0.0"
html5lib = "^1.1"
idna = "^3.4"
Jinja2 = "^3.1.3"
lxml = "^5.3.0"
MarkupSafe = "^3.0.2"
mock = "^4.0.3"
multidict = "^6.0.4"
pycountry = "^24.6.1"
PyPDF2 = "^3.0.1"
PySocks = "^1.7.1"
python-bidi = "^0.6.3"
requests = "^2.31.0"
requests-futures = "^1.0.2"
six = "^1.16.0"
socid-extractor = "^0.0.26"
soupsieve = "^2.6"
stem = "^1.8.1"
torrequest = "^0.1.0"
alive_progress = "^3.2.0"
typing-extensions = "^4.8.0"
webencodings = "^0.5.1"
xhtml2pdf = "^0.2.11"
XMind = "^1.2.0"
yarl = "^1.8.2"
networkx = "^2.6.3"
pyvis = "^0.3.2"
reportlab = "^4.2.0"
cloudscraper = "^1.2.71"
[tool.poetry.group.dev.dependencies]
flake8 = "^7.1.1"
pytest = "^7.2.0"
pytest-asyncio = "^0.23.8"
pytest-cov = "^6.0.0"
pytest-httpserver = "^1.0.0"
pytest-rerunfailures = "^15.0"
reportlab = "^4.2.0"
[tool.poetry.scripts]
maigret = "maigret.maigret:run"
+39
@@ -0,0 +1,39 @@
aiodns==3.0.0
aiohttp==3.8.3
aiohttp-socks==0.7.1
arabic-reshaper==2.1.4
async-timeout==4.0.2
attrs==22.1.0
certifi==2022.9.24
chardet==5.0.0
colorama==0.4.6
future==0.18.2
future-annotations==1.0.0
html5lib==1.1
idna==3.4
Jinja2==3.1.2
lxml==4.9.1
MarkupSafe==2.1.1
mock==4.0.3
multidict==6.0.2
pycountry==22.3.5
PyPDF2==2.10.8
PySocks==1.7.1
python-bidi==0.4.2
requests==2.28.1
requests-futures==1.0.0
six==1.16.0
socid-extractor>=0.0.21
soupsieve==2.3.2.post1
stem==1.8.1
torrequest==0.1.0
tqdm==4.64.1
typing-extensions==4.4.0
webencodings==0.5.1
xhtml2pdf==0.2.8
XMind==1.2.0
yarl==1.8.1
networkx==2.5.1
pyvis==0.2.1
reportlab==3.6.11
cloudscraper==1.2.66
+9
@@ -0,0 +1,9 @@
[egg_info]
tag_build =
tag_date = 0
[flake8]
per-file-ignores = __init__.py:F401
[mypy]
ignore_missing_imports = True
+26
@@ -0,0 +1,26 @@
from setuptools import (
setup,
find_packages,
)
with open('README.md') as fh:
long_description = fh.read()
with open('requirements.txt') as rf:
requires = rf.read().splitlines()
setup(name='maigret',
version='0.4.4',
description='Collect a dossier on a person by username from a huge number of sites',
long_description=long_description,
long_description_content_type="text/markdown",
url='https://github.com/soxoj/maigret',
install_requires=requires,
entry_points={'console_scripts': ['maigret = maigret.maigret:run']},
packages=find_packages(),
include_package_data=True,
author='Soxoj',
author_email='soxoj@protonmail.com',
license='MIT',
zip_safe=False)
+225 -283
File diff suppressed because it is too large
+35 -24
@@ -1,32 +1,43 @@
title: Maigret
icon: static/maigret.png
name: maigret
summary: 🕵️‍♂️ Collect a dossier on a person by username from thousands of sites.
name: maigret2
adopt-info: maigret2
summary: SOCMINT / Instagram
description: |
**Maigret** collects a dossier on a person **by username only**, checking for accounts on a huge number of sites and gathering all the available information from web pages. No API keys required. Maigret is an easy-to-use and powerful fork of Sherlock.
Currently supported more than 3000 sites, search is launched against 500 popular sites in descending order of popularity by default. Also supported checking of Tor sites, I2P sites, and domains (via DNS resolving).
Test Test Test
version: 0.4.4
license: MIT
base: core22
base: core20
grade: stable
confinement: strict
compression: lzo
source-code: https://github.com/soxoj/maigret
issues:
- https://github.com/soxoj/maigret/issues
donation:
- https://patreon.com/soxoj
contact:
- mailto:soxoj@protonmail.com
architectures:
- build-on: amd64
parts:
maigret:
plugin: python
source: .
type: app
apps:
maigret:
maigret2:
command: bin/maigret
plugs: [ network, network-bind, home ]
environment:
LC_ALL: C.UTF-8
plugs:
- home
- network
parts:
maigret2:
plugin: python
source: https://github.com/soxoj/maigret
source-type: git
build-packages:
- python3-pip
- python3-six
- python3
stage-packages:
- python3
- python3-six
override-pull: |
snapcraftctl pull
snapcraftctl set-version "$(git describe --tags | sed 's/^v//' | cut -d "-" -f1)"
Binary file not shown (image; before: 45 KiB, after: 9.0 KiB).
Binary file not shown (image; before: 1.8 MiB, after: 44 KiB). File diff suppressed because one or more lines are too long.
+8
@@ -0,0 +1,8 @@
reportlab==3.6.11
flake8==5.0.4
pytest==7.2.0
pytest-asyncio==0.16.0;python_version<"3.7"
pytest-asyncio==0.20.1;python_version>="3.7"
pytest-cov==4.0.0
pytest-httpserver==1.0.6
pytest-rerunfailures==10.2
+1 -1
@@ -19,7 +19,7 @@ empty_mark = Mark('', (), {})
def by_slow_marker(item):
return item.get_closest_marker('slow', default=empty_mark).name
return item.get_closest_marker('slow', default=empty_mark)
def pytest_collection_modifyitems(items):
+9 -28
@@ -1,44 +1,25 @@
{
"engines": {},
"sites": {
"ValidActive": {
"GooglePlayStore": {
"tags": ["global", "us"],
"disabled": false,
"checkType": "status_code",
"alexaRank": 1,
"url": "https://play.google.com/store/apps/developer?id={username}",
"urlMain": "https://play.google.com/store",
"usernameClaimed": "OpenAI",
"usernameClaimed": "Facebook_nosuchname",
"usernameUnclaimed": "noonewouldeverusethis7"
},
"InvalidActive": {
"tags": ["global", "us"],
"disabled": false,
"Reddit": {
"tags": ["news", "social", "us"],
"checkType": "status_code",
"alexaRank": 1,
"url": "https://play.google.com/store/apps/dev?id={username}",
"urlMain": "https://play.google.com/store",
"usernameClaimed": "OpenAI",
"usernameUnclaimed": "noonewouldeverusethis7"
},
"ValidInactive": {
"tags": ["global", "us"],
"presenseStrs": ["totalKarma"],
"disabled": true,
"checkType": "status_code",
"alexaRank": 1,
"url": "https://play.google.com/store/apps/developer?id={username}",
"urlMain": "https://play.google.com/store",
"usernameClaimed": "OpenAI",
"usernameUnclaimed": "noonewouldeverusethis7"
},
"InvalidInactive": {
"tags": ["global", "us"],
"disabled": true,
"checkType": "status_code",
"alexaRank": 1,
"url": "https://play.google.com/store/apps/dev?id={username}",
"urlMain": "https://play.google.com/store",
"usernameClaimed": "OpenAI",
"alexaRank": 17,
"url": "https://www.reddit.com/user/{username}",
"urlMain": "https://www.reddit.com/",
"usernameClaimed": "blue",
"usernameUnclaimed": "noonewouldeverusethis7"
}
}
+1 -2
@@ -41,8 +41,7 @@ async def test_import_aiohttp_cookies():
f.write(COOKIES_TXT)
cookie_jar = import_aiohttp_cookies(cookies_filename)
# new aiohttp support
assert list(cookie_jar._cookies.keys()) in (['xss.is', 'httpbin.org'], [('xss.is', '/'), ('httpbin.org', '/')], [('xss.is', ''), ('httpbin.org', '')])
assert list(cookie_jar._cookies.keys()) == ['xss.is', 'httpbin.org']
url = 'https://httpbin.org/cookies'
connector = aiohttp.TCPConnector(ssl=False)
+1 -2
@@ -23,12 +23,11 @@ DEFAULT_ARGS: Dict[str, Any] = {
'no_progressbar': False,
'parse_url': '',
'pdf': False,
'permute': False,
'print_check_errors': False,
'print_not_found': False,
'proxy': None,
'reports_sorting': 'default',
'retries': 0,
'retries': 1,
'self_check': False,
'site_list': [],
'stats': False,
-3
@@ -13,7 +13,4 @@ def test_tags_validity(default_db):
if tag not in tags:
unknown_tags.add(tag)
# make sure all tags are known
# if you see "unchecked" tag error, please, do
# maigret --db `pwd`/maigret/resources/data.json --self-check --tag unchecked --use-disabled-sites
assert unknown_tags == set()
+4 -4
@@ -55,12 +55,12 @@ async def test_asyncio_progressbar_queue_executor():
executor = AsyncioProgressbarQueueExecutor(logger=logger, in_parallel=2)
assert await executor.run(tasks) == [0, 1, 3, 2, 4, 6, 7, 5, 9, 8]
assert executor.execution_time > 0.5
assert executor.execution_time < 0.7
assert executor.execution_time < 0.6
executor = AsyncioProgressbarQueueExecutor(logger=logger, in_parallel=3)
assert await executor.run(tasks) == [0, 3, 1, 4, 6, 2, 7, 9, 5, 8]
assert executor.execution_time > 0.4
assert executor.execution_time < 0.6
assert executor.execution_time < 0.5
executor = AsyncioProgressbarQueueExecutor(logger=logger, in_parallel=5)
assert await executor.run(tasks) in (
@@ -68,9 +68,9 @@ async def test_asyncio_progressbar_queue_executor():
[0, 3, 6, 1, 4, 9, 7, 2, 5, 8],
)
assert executor.execution_time > 0.3
assert executor.execution_time < 0.5
assert executor.execution_time < 0.4
executor = AsyncioProgressbarQueueExecutor(logger=logger, in_parallel=10)
assert await executor.run(tasks) == [0, 3, 6, 9, 1, 4, 7, 2, 5, 8]
assert executor.execution_time > 0.2
assert executor.execution_time < 0.4
assert executor.execution_time < 0.3
+55 -12
@@ -35,22 +35,65 @@ RESULTS_EXAMPLE = {
@pytest.mark.slow
@pytest.mark.asyncio
async def test_self_check_db(test_db):
# initalize logger to debug
def test_self_check_db_positive_disable(test_db):
logger = Mock()
assert test_db.sites[0].disabled is False
loop = asyncio.get_event_loop()
loop.run_until_complete(
self_check(test_db, test_db.sites_dict, logger, silent=True)
)
assert test_db.sites[0].disabled is True
@pytest.mark.slow
@pytest.mark.skip(reason="broken, fixme")
def test_self_check_db_positive_enable(test_db):
logger = Mock()
assert test_db.sites_dict['InvalidActive'].disabled is False
assert test_db.sites_dict['ValidInactive'].disabled is True
assert test_db.sites_dict['ValidActive'].disabled is False
assert test_db.sites_dict['InvalidInactive'].disabled is True
test_db.sites[0].disabled = True
test_db.sites[0].username_claimed = 'Skyeng'
assert test_db.sites[0].disabled is True
await self_check(test_db, test_db.sites_dict, logger, silent=False)
loop = asyncio.get_event_loop()
loop.run_until_complete(
self_check(test_db, test_db.sites_dict, logger, silent=True)
)
assert test_db.sites_dict['InvalidActive'].disabled is True
assert test_db.sites_dict['ValidInactive'].disabled is False
assert test_db.sites_dict['ValidActive'].disabled is False
assert test_db.sites_dict['InvalidInactive'].disabled is True
assert test_db.sites[0].disabled is False
@pytest.mark.slow
def test_self_check_db_negative_disabled(test_db):
logger = Mock()
test_db.sites[0].disabled = True
assert test_db.sites[0].disabled is True
loop = asyncio.get_event_loop()
loop.run_until_complete(
self_check(test_db, test_db.sites_dict, logger, silent=True)
)
assert test_db.sites[0].disabled is True
@pytest.mark.skip(reason='broken, fixme')
@pytest.mark.slow
def test_self_check_db_negative_enabled(test_db):
logger = Mock()
test_db.sites[0].disabled = False
test_db.sites[0].username_claimed = 'Skyeng'
assert test_db.sites[0].disabled is False
loop = asyncio.get_event_loop()
loop.run_until_complete(
self_check(test_db, test_db.sites_dict, logger, silent=True)
)
assert test_db.sites[0].disabled is False
@pytest.mark.slow
-17
@@ -202,20 +202,3 @@ def test_get_url_template():
},
)
assert site.get_url_template() == "SUBDOMAIN"
def test_has_site_url_or_name(default_db):
# by the same url or partial match
assert default_db.has_site("https://aback.com.ua/user/") == True
assert default_db.has_site("https://aback.com.ua") == True
# acceptable partial match
assert default_db.has_site("https://aback.com.ua/use") == True
assert default_db.has_site("https://aback.com") == True
# by name
assert default_db.has_site("Aback") == True
# false
assert default_db.has_site("https://aeifgoai3h4g8a3u4g5") == False
assert default_db.has_site("aeifgoai3h4g8a3u4g5") == False
+14 -6
@@ -3,13 +3,23 @@
This module generates the listing of supported sites in file `SITES.md`
and pretty prints file with sites data.
"""
import aiohttp
import asyncio
import json
import sys
import requests
import logging
import threading
import xml.etree.ElementTree as ET
from datetime import datetime
from argparse import ArgumentParser, RawDescriptionHelpFormatter
from maigret.maigret import get_response
from maigret.sites import MaigretDatabase, MaigretEngine
import tqdm.asyncio
from maigret.maigret import get_response, site_self_check
from maigret.sites import MaigretSite, MaigretDatabase, MaigretEngine
from maigret.utils import CaseConverter
async def check_engine_of_site(site_name, sites_with_engines, future, engine_name, semaphore, logger):
async with semaphore:
@@ -88,10 +98,8 @@ if __name__ == '__main__':
tasks.append(future)
# progress bar
with alive_progress(len(tasks), title='Checking sites') as progress:
for f in asyncio.as_completed(tasks):
loop.run_until_complete(f)
progress()
for f in tqdm.asyncio.tqdm.as_completed(tasks):
loop.run_until_complete(f)
print(f'Total detected {len(new_engine_sites)} sites on engine {engine_name}')
# dict with new found engine sites
+3 -5
@@ -3,7 +3,7 @@ import json
import random
import re
import alive_progress
import tqdm.asyncio
from mock import Mock
import requests
@@ -181,7 +181,7 @@ if __name__ == '__main__':
raw_maigret_data = json.dumps({site.name: site.json for site in sites_subset})
new_sites = []
for site in alive_progress.alive_it(urls):
for site in tqdm.asyncio.tqdm(urls):
site_lowercase = site.lower()
domain_raw = URL_RE.sub('', site_lowercase).strip().strip('/')
@@ -271,9 +271,7 @@ if __name__ == '__main__':
future = asyncio.ensure_future(check_coro)
tasks.append(future)
with alive_progress(len(tasks), title='Checking sites') as progress:
for f in asyncio.as_completed(tasks):
progress()
for f in tqdm.asyncio.tqdm.as_completed(tasks, timeout=TIMEOUT):
try:
loop.run_until_complete(f)
except asyncio.exceptions.TimeoutError:
+4 -4
@@ -3,12 +3,13 @@
This module generates the listing of supported sites in file `SITES.md`
and pretty prints file with sites data.
"""
import json
import sys
import requests
import logging
import threading
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from datetime import datetime
from argparse import ArgumentParser, RawDescriptionHelpFormatter
from maigret.maigret import MaigretDatabase
@@ -26,10 +27,9 @@ RANKS.update({
SEMAPHORE = threading.Semaphore(20)
def get_rank(domain_to_query, site, print_errors=True):
with SEMAPHORE:
# Retrieve ranking data via alexa API
#Retrieve ranking data via alexa API
url = f"http://data.alexa.com/data?cli=10&url={domain_to_query}"
xml_data = requests.get(url).text
root = ET.fromstring(xml_data)
@@ -137,7 +137,7 @@ Rank data fetched from Alexa by domains.
site_file.write(f'1. {favicon} [{site}]({url_main})*: top {valid_rank}{tags}*{note}\n')
db.update_site(site)
site_file.write(f'\nThe list was updated at ({datetime.now(timezone.utc).date()} UTC)\n')
site_file.write(f'\nThe list was updated at ({datetime.utcnow()} UTC)\n')
db.save_to_file(args.base_file)
statistics_text = db.get_db_stats(is_markdown=True)
+27 -13
@@ -1,38 +1,56 @@
#!/usr/bin/env python3
import asyncio
import logging
import maigret
# top popular sites from the Maigret database
TOP_SITES_COUNT = 300
# Maigret HTTP requests timeout
TIMEOUT = 10
# max parallel requests
MAX_CONNECTIONS = 50
def main():
if __name__ == '__main__':
# setup logging and asyncio
logger = logging.getLogger('maigret')
logger.setLevel(logging.WARNING)
loop = asyncio.get_event_loop()
# setup Maigret
db = maigret.MaigretDatabase().load_from_file('./maigret/resources/data.json')
# also can be downloaded from web
# db = MaigretDatabase().load_from_url(MAIGRET_DB_URL)
# user input
username = input('Enter username to search: ')
sites_count = int(input(
sites_count_raw = input(
f'Select the number of sites to search ({TOP_SITES_COUNT} for default, {len(db.sites_dict)} max): '
)) or TOP_SITES_COUNT
)
sites_count = int(sites_count_raw) or TOP_SITES_COUNT
sites = db.ranked_sites_dict(top=sites_count)
show_progressbar = input('Do you want to show a progressbar? [Yn] ').lower() != 'n'
extract_info = input(
show_progressbar_raw = input('Do you want to show a progressbar? [Yn] ')
show_progressbar = show_progressbar_raw.lower() != 'n'
extract_info_raw = input(
'Do you want to extract additional info from accounts\' pages? [Yn] '
).lower() != 'n'
use_notifier = input(
)
extract_info = extract_info_raw.lower() != 'n'
use_notifier_raw = input(
'Do you want to use notifier for displaying results while searching? [Yn] '
).lower() != 'n'
)
use_notifier = use_notifier_raw.lower() != 'n'
notifier = None
if use_notifier:
notifier = maigret.Notifier(print_found_only=True, skip_check_errors=True)
# search!
search_func = maigret.search(
username=username,
site_dict=sites,
@@ -40,7 +58,7 @@ def main():
logger=logger,
max_connections=MAX_CONNECTIONS,
query_notify=notifier,
no_progressbar=not show_progressbar,
no_progressbar=(not show_progressbar),
is_parsing_enabled=extract_info,
)
@@ -51,7 +69,3 @@ def main():
for sitename, data in results.items():
is_found = data['status'].is_found()
print(f'{sitename} - {"Found!" if is_found else "Not found"}')
if __name__ == '__main__':
main()