Mirror of https://github.com/soxoj/maigret.git, synced 2026-05-06 22:19:01 +00:00
Tags and site names improvements (#2427)
- Added social tag to social networks (33 sites)
- Fixed wrong tags (8 sites)
- Filled empty tags for 213 sites in top-1000
- Country tag cleanup (~374 sites)
- Site naming normalization (75 sites)
- New tests (3)
- Documentation updates
@@ -69,6 +69,21 @@ Use the following commands to check Maigret:
 make speed


+Site naming conventions
+-----------------------------------------------
+
+Site names are the keys in ``data.json`` and appear in user-facing reports. Follow these rules:
+
+- **Title Case** by default: ``Product Hunt``, ``Hacker News``.
+- **Lowercase** only if the brand itself is written that way: ``kofi``, ``note``, ``hi5``.
+- **No domain suffix** (``calendly.com`` → ``Calendly``), unless the domain is part of the recognized brand name: ``last.fm``, ``VC.ru``, ``Archive.org``.
+- **No full UPPERCASE** unless the brand is an acronym: ``VK``, ``CNET``, ``ICQ``, ``IFTTT``.
+- **No** ``www.`` **or** ``https://`` **prefix** in the name.
+- **Spaces** are allowed when the brand uses them: ``Star Citizen``, ``Google Maps``.
+- **{username} templates** in names are acceptable: ``{username}.tilda.ws``.
+
+When in doubt, check how the service refers to itself on its homepage.
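Review note: the naming rules above could be checked mechanically before a site is added. The following is a minimal sketch, not part of this commit or of Maigret itself. The function name ``check_site_name``, the two allowlists, and the warning strings are all hypothetical, and the allowlists only contain the examples quoted in the rules. Title Case and brand spelling are not checked, since they need human judgment.

```python
import re
from typing import List

# Hypothetical allowlists, filled only with the examples from the rules above.
BRAND_DOMAINS = {"last.fm", "VC.ru", "Archive.org"}
ACRONYMS = {"VK", "CNET", "ICQ", "IFTTT"}

# A trailing ".xx" or longer alphabetic suffix is treated as a domain suffix.
DOMAIN_SUFFIX = re.compile(r"\.[a-z]{2,}$", re.IGNORECASE)


def check_site_name(name: str) -> List[str]:
    """Return naming-rule warnings for a candidate data.json site key."""
    warnings = []

    # Rule: no "www." or URL scheme prefix in the name.
    if name.lower().startswith(("www.", "http://", "https://")):
        warnings.append("www./URL prefix")

    # Rule: no domain suffix, unless it is part of the brand name
    # or the name is a {username} template.
    if (
        "{username}" not in name
        and name not in BRAND_DOMAINS
        and DOMAIN_SUFFIX.search(name)
    ):
        warnings.append("domain suffix")

    # Rule: no full uppercase unless the brand is a known acronym.
    if name.isupper() and name not in ACRONYMS:
        warnings.append("full uppercase")

    return warnings
```

Under these assumptions, ``check_site_name("calendly.com")`` reports a domain suffix, while ``Product Hunt``, ``last.fm``, ``VK`` and ``{username}.tilda.ws`` pass without warnings.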
+
 How to fix false-positives
 -----------------------------------------------
@@ -10,7 +10,12 @@ The use of tags allows you to select a subset of the sites from big Maigret DB f

 There are several types of tags:

-1. **Country codes**: ``us``, ``jp``, ``br``... (`ISO 3166-1 alpha-2 <https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2>`_). These tags reflect the site language and regional origin of its users and are then used to locate the owner of a username. If the regional origin is difficult to establish or a site is positioned as worldwide, `no country code is given`. There could be multiple country code tags for one site.
+1. **Country codes**: ``us``, ``jp``, ``br``... (`ISO 3166-1 alpha-2 <https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2>`_). A country tag means that having an account on the site implies a connection to that country, either origin or residence. The goal is attribution, not perfect accuracy.
+
+- **Global sites** (GitHub, YouTube, Reddit, Medium, etc.) get **no country tag**, because an account there says nothing about where a person is from.
+- **Regional/local sites** where an account implies a specific country **must** have a country tag: ``VK`` → ``ru``, ``Naver`` → ``kr``, ``Zhihu`` → ``cn``.
+- Multiple country tags are allowed when a service is used predominantly in a few countries (e.g. ``Xing`` → ``de``, ``eu``).
+- Do **not** assign country tags based on traffic statistics alone: a site popular in India by traffic is not "Indian" if it is used globally.

 2. **Site engines**. Most of them are forum engines now: ``uCoz``, ``vBulletin``, ``XenForo`` and others. The full list of engines is stored in the Maigret database.
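Review note: the country-tag rules in item 1 can be illustrated with a small sketch. It assumes each site entry carries a ``tags`` list, as the ``data.json`` entries discussed here do. The ``SITES`` excerpt, the ``COUNTRY_TAGS`` subset, the ``coding`` tag on GitHub, and the helper names are hypothetical and only mirror the examples quoted above.

```python
from typing import Dict, List

# Hypothetical excerpt shaped like data.json entries; only "tags" is used.
SITES: Dict[str, dict] = {
    "VK": {"tags": ["ru", "social"]},
    "Naver": {"tags": ["kr"]},
    "Xing": {"tags": ["de", "eu"]},
    "GitHub": {"tags": ["coding"]},  # global site: no country tag
}

# Illustrative subset of country/region tags, not the full list.
COUNTRY_TAGS = {"us", "jp", "br", "ru", "kr", "cn", "de", "eu"}


def country_tags(entry: dict) -> List[str]:
    """Return the country tags of one site entry, sorted."""
    return sorted(t for t in entry.get("tags", []) if t in COUNTRY_TAGS)


def sites_for_country(sites: Dict[str, dict], code: str) -> List[str]:
    """Return names of sites whose account implies a link to `code`."""
    return sorted(n for n, e in sites.items() if code in country_tags(e))


def global_sites(sites: Dict[str, dict]) -> List[str]:
    """Return names of sites that carry no country tag at all."""
    return sorted(n for n, e in sites.items() if not country_tags(e))
```

With this sample data, ``sites_for_country(SITES, "de")`` selects only ``Xing``, and ``global_sites(SITES)`` selects only ``GitHub``.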