Minor adjustments to README
Includes removal of references to setup.py, to prepare for its eventual
removal, and a new pip invocation example.
kjd committed Sep 14, 2022
1 parent 9234d29 commit ff093ca
Showing 1 changed file with 92 additions and 86 deletions.
178 changes: 92 additions & 86 deletions README.rst
@@ -1,17 +1,19 @@
Internationalized Domain Names in Applications (IDNA)
=====================================================

Support for the Internationalised Domain Names in Applications
(IDNA) protocol as specified in `RFC 5891 <https://tools.ietf.org/html/rfc5891>`_.
This is the latest version of the protocol and is sometimes referred to as
“IDNA 2008”.
Support for the Internationalized Domain Names in
Applications (IDNA) protocol as specified in `RFC 5891
<https://tools.ietf.org/html/rfc5891>`_. This is the latest version of
the protocol and is sometimes referred to as “IDNA 2008”.

This library also provides support for Unicode Technical Standard 46,
`Unicode IDNA Compatibility Processing <https://unicode.org/reports/tr46/>`_.
This library also provides support for Unicode Technical
Standard 46, `Unicode IDNA Compatibility Processing
<https://unicode.org/reports/tr46/>`_.

This acts as a suitable replacement for the “encodings.idna” module that
comes with the Python standard library, but which only supports the
older superseded IDNA specification (`RFC 3490 <https://tools.ietf.org/html/rfc3490>`_).
This acts as a suitable replacement for the “encodings.idna”
module that comes with the Python standard library, but which
only supports the older superseded IDNA specification (`RFC 3490
<https://tools.ietf.org/html/rfc3490>`_).

Basic functions are simply executed:

@@ -27,24 +29,19 @@ Basic functions are simply executed:
Installation
------------

To install this library, you can use pip:
This package is available for installation from PyPI:

.. code-block:: bash
$ pip install idna
Alternatively, you can install the package using the bundled setup script:

.. code-block:: bash
$ python setup.py install
$ python3 -m pip install idna
Usage
-----

For typical usage, the ``encode`` and ``decode`` functions will take a domain
name argument and perform a conversion to A-labels or U-labels respectively.
For typical usage, the ``encode`` and ``decode`` functions will take a
domain name argument and perform a conversion to A-labels or U-labels
respectively.

.. code-block:: pycon
@@ -65,8 +62,8 @@ You may use the codec encoding and decoding methods using the
>>> print(b'xn--d1acufc.xn--80akhbyknj4f'.decode('idna'))
домен.испытание
Conversions can be applied at a per-label basis using the ``ulabel`` or ``alabel``
functions if necessary:
Conversions can be applied at a per-label basis using the ``ulabel`` or
``alabel`` functions if necessary:

.. code-block:: pycon
@@ -76,20 +73,22 @@ functions if necessary:
Compatibility Mapping (UTS #46)
+++++++++++++++++++++++++++++++

As described in `RFC 5895 <https://tools.ietf.org/html/rfc5895>`_, the IDNA
specification does not normalize input from different potential ways a user
may input a domain name. This functionality, known as a “mapping”, is
considered by the specification to be a local user-interface issue distinct
from IDNA conversion functionality.
As described in `RFC 5895 <https://tools.ietf.org/html/rfc5895>`_, the
IDNA specification does not normalize input from different potential
ways a user may input a domain name. This functionality, known as
a “mapping”, is considered by the specification to be a local
user-interface issue distinct from IDNA conversion functionality.

This library provides one such mapping, that was developed by the Unicode
Consortium. Known as `Unicode IDNA Compatibility Processing <https://unicode.org/reports/tr46/>`_,
it provides for both a regular mapping for typical applications, as well as
a transitional mapping to help migrate from older IDNA 2003 applications.
This library provides one such mapping, that was developed by the
Unicode Consortium. Known as `Unicode IDNA Compatibility Processing
<https://unicode.org/reports/tr46/>`_, it provides for both a regular
mapping for typical applications, as well as a transitional mapping to
help migrate from older IDNA 2003 applications.

For example, “Königsgäßchen” is not a permissible label as *LATIN CAPITAL
LETTER K* is not allowed (nor are capital letters in general). UTS 46 will
convert this into lower case prior to applying the IDNA conversion.
For example, “Königsgäßchen” is not a permissible label as *LATIN
CAPITAL LETTER K* is not allowed (nor are capital letters in general).
UTS 46 will convert this into lower case prior to applying the IDNA
conversion.

.. code-block:: pycon
@@ -102,36 +101,38 @@ convert this into lower case prior to applying the IDNA conversion.
>>> print(idna.decode('xn--knigsgchen-b4a3dun'))
königsgäßchen
Transitional processing provides conversions to help transition from the older
2003 standard to the current standard. For example, in the original IDNA
specification, the *LATIN SMALL LETTER SHARP S* (ß) was converted into two
*LATIN SMALL LETTER S* (ss), whereas in the current IDNA specification this
conversion is not performed.
Transitional processing provides conversions to help transition from
the older 2003 standard to the current standard. For example, in the
original IDNA specification, the *LATIN SMALL LETTER SHARP S* (ß) was
converted into two *LATIN SMALL LETTER S* (ss), whereas in the current
IDNA specification this conversion is not performed.

.. code-block:: pycon
>>> idna.encode('Königsgäßchen', uts46=True, transitional=True)
'xn--knigsgsschen-lcb0w'
Implementors should use transitional processing with caution, only in rare
cases where conversion from legacy labels to current labels must be performed
(i.e. IDNA implementations that pre-date 2008). For typical applications
that just need to convert labels, transitional processing is unlikely to be
beneficial and could produce unexpected incompatible results.
Implementors should use transitional processing with caution, only in
rare cases where conversion from legacy labels to current labels must be
performed (i.e. IDNA implementations that pre-date 2008). For typical
applications that just need to convert labels, transitional processing
is unlikely to be beneficial and could produce unexpected incompatible
results.
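To see what a legacy IDNA 2003 implementation produces for the same label, the following minimal sketch calls the standard library's ``encodings.idna`` codec class directly. It is an illustration only and not part of this library; the variable names are made up for the example. The result should match the transitional output shown above, which is the kind of legacy label that transitional processing exists to reproduce.

```python
# Sketch: what an IDNA 2003 (RFC 3490) implementation does with the label.
# Uses only the standard library. The codec class is called directly so the
# result does not depend on which codec is registered under the name "idna".
import encodings.idna

legacy_codec = encodings.idna.Codec()

# Codec.encode returns a (bytes, characters_consumed) tuple.
legacy_alabel, _ = legacy_codec.encode('Königsgäßchen')

# IDNA 2003 lower-cases the label and maps the sharp s to "ss"
# before the Punycode conversion is applied.
print(legacy_alabel)
```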

``encodings.idna`` Compatibility
++++++++++++++++++++++++++++++++

Function calls from the Python built-in ``encodings.idna`` module are
mapped to their IDNA 2008 equivalents using the ``idna.compat`` module.
Simply substitute the ``import`` clause in your code to refer to the
new module name.
Simply substitute the ``import`` clause in your code to refer to the new
module name.
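As a rough sketch of that substitution, only the import line changes while the call sites stay the same. The ``try``/``except`` fallback to the standard library is an assumption added here so the example still runs where ``idna`` is not installed; it is not something the library asks for, and real code would normally import from ``idna.compat`` unconditionally.

```python
# Sketch: swap the import, keep the call sites unchanged.
try:
    # IDNA 2008 equivalents provided by this library.
    from idna.compat import ToASCII, ToUnicode
except ImportError:
    # Assumption for this sketch only: fall back to the standard library
    # (IDNA 2003) so the example still runs when idna is not installed.
    from encodings.idna import ToASCII, ToUnicode

# Both functions work on a single label, not on a full domain name.
alabel = ToASCII('домен')     # U-label in, A-label bytes out
ulabel = ToUnicode(alabel)    # and back to the U-label
print(alabel, ulabel)
```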

Exceptions
----------

All errors raised during the conversion following the specification should
raise an exception derived from the ``idna.IDNAError`` base class.
All errors raised during the conversion following the specification
should raise an exception derived from the ``idna.IDNAError`` base
class.

More specific exceptions that may be generated as ``idna.IDNABidiError``
when the error reflects an illegal combination of left-to-right and
@@ -149,29 +150,31 @@ tables for performance. These tables are derived from computing against
eligibility criteria in the respective standards. These tables are
computed using the command-line script ``tools/idna-data``.

This tool will fetch relevant codepoint data from the Unicode repository
and perform the required calculations to identify eligibility. There are
This tool will fetch relevant codepoint data from the Unicode repository
and perform the required calculations to identify eligibility. There are
three main modes:

* ``idna-data make-libdata``. Generates ``idnadata.py`` and ``uts46data.py``,
the pre-calculated lookup tables used for IDNA and UTS 46 conversions. Implementors
who wish to track this library against a different Unicode version may use this tool
to manually generate a different version of the ``idnadata.py`` and ``uts46data.py``
files.
* ``idna-data make-libdata``. Generates ``idnadata.py`` and
``uts46data.py``, the pre-calculated lookup tables used for IDNA and
UTS 46 conversions. Implementors who wish to track this library against
a different Unicode version may use this tool to manually generate a
different version of the ``idnadata.py`` and ``uts46data.py`` files.

* ``idna-data make-table``. Generate a table of the IDNA disposition
(e.g. PVALID, CONTEXTJ, CONTEXTO) in the format found in Appendix B.1 of RFC
5892 and the pre-computed tables published by `IANA <https://www.iana.org/>`_.
(e.g. PVALID, CONTEXTJ, CONTEXTO) in the format found in Appendix
B.1 of RFC 5892 and the pre-computed tables published by `IANA
<https://www.iana.org/>`_.

* ``idna-data U+0061``. Prints debugging output on the various properties
associated with an individual Unicode codepoint (in this case, U+0061), that are
used to assess the IDNA and UTS 46 status of a codepoint. This is helpful in debugging
or analysis.
* ``idna-data U+0061``. Prints debugging output on the various
properties associated with an individual Unicode codepoint (in this
case, U+0061), that are used to assess the IDNA and UTS 46 status of a
codepoint. This is helpful in debugging or analysis.

The tool accepts a number of arguments, described using ``idna-data -h``. Most notably,
the ``--version`` argument allows the specification of the version of Unicode to use
in computing the table data. For example, ``idna-data --version 9.0.0 make-libdata``
will generate library data against Unicode 9.0.0.
The tool accepts a number of arguments, described using ``idna-data
-h``. Most notably, the ``--version`` argument allows the specification
of the version of Unicode to use in computing the table data. For
example, ``idna-data --version 9.0.0 make-libdata`` will generate
library data against Unicode 9.0.0.


Additional Notes
@@ -180,25 +183,28 @@ Additional Notes
* **Packages**. The latest tagged release version is published in the
`Python Package Index <https://pypi.org/project/idna/>`_.

* **Version support**. This library supports Python 3.5 and higher. As this library
serves as a low-level toolkit for a variety of applications, many of which strive
for broad compatibility with older Python versions, there is no rush to remove
older interpreter support. Removing support for older versions should be well
justified in that the maintenance burden has become too high.

* **Python 2**. Python 2 is supported by version 2.x of this library. While active
development of the version 2.x series has ended, notable issues being corrected
may be backported to 2.x. Use "idna<3" in your requirements file if you need this
library for a Python 2 application.

* **Testing**. The library has a test suite based on each rule of the IDNA specification, as
well as tests that are provided as part of the Unicode Technical Standard 46,
`Unicode IDNA Compatibility Processing <https://unicode.org/reports/tr46/>`_.

* **Emoji**. It is an occasional request to support emoji domains in this library. Encoding
of symbols like emoji is expressly prohibited by the technical standard IDNA 2008 and
emoji domains are broadly phased out across the domain industry due to associated security
risks. For now, applications that need to support these non-compliant labels may
wish to consider trying the encode/decode operation in this library first, and then falling
back to using ``encodings.idna``. See `the GitHub project <https://github.com/kjd/idna/issues/18>`_
for more discussion.
* **Version support**. This library supports Python 3.5 and higher.
As this library serves as a low-level toolkit for a variety of
applications, many of which strive for broad compatibility with older
Python versions, there is no rush to remove older interpreter support.
Removing support for older versions should be well justified in that the
maintenance burden has become too high.

* **Python 2**. Python 2 is supported by version 2.x of this library.
While active development of the version 2.x series has ended, notable
issues being corrected may be backported to 2.x. Use "idna<3" in your
requirements file if you need this library for a Python 2 application.

* **Testing**. The library has a test suite based on each rule of the
IDNA specification, as well as tests that are provided as part of the
Unicode Technical Standard 46, `Unicode IDNA Compatibility Processing
<https://unicode.org/reports/tr46/>`_.

* **Emoji**. It is an occasional request to support emoji domains in
this library. Encoding of symbols like emoji is expressly prohibited by
the technical standard IDNA 2008 and emoji domains are broadly phased
out across the domain industry due to associated security risks. For
now, applications that need to support these non-compliant labels
may wish to consider trying the encode/decode operation in this library
first, and then falling back to using ``encodings.idna``. See `the GitHub
project <https://github.com/kjd/idna/issues/18>`_ for more discussion.
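The fallback described in the emoji note could look roughly like the sketch below. ``encode_with_fallback`` is a hypothetical helper, not part of this library, and the optional import is an assumption made only so the example runs where ``idna`` is not installed. The legacy path uses the standard library's IDNA 2003 codec class, so any label it accepts that IDNA 2008 rejects remains non-compliant.

```python
# Sketch of the "try IDNA 2008 first, then fall back" approach.
import encodings.idna

try:
    import idna
except ImportError:  # assumption: keep the sketch runnable without idna
    idna = None


def encode_with_fallback(domain):
    """Hypothetical helper: return A-label bytes for ``domain``."""
    if idna is not None:
        try:
            return idna.encode(domain)
        except idna.IDNAError:
            pass  # not valid IDNA 2008, e.g. a symbol or emoji label
    # Legacy IDNA 2003 conversion. The codec class is called directly so the
    # result does not depend on which codec is registered as "idna".
    legacy_bytes, _ = encodings.idna.Codec().encode(domain)
    return legacy_bytes


# A compliant name takes the IDNA 2008 path when idna is available.
print(encode_with_fallback('домен.испытание'))
# A symbol label is rejected by IDNA 2008 and falls through to the legacy codec.
print(encode_with_fallback('☃.net'))
```

Applications using this pattern may want to log or flag names that only the legacy path accepts, since those are the labels the note above describes as carrying security risks.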
