PEP: 3120
Title: Using UTF-8 as the default source encoding
Author: Martin von Löwis <martin@v.loewis.de>
Created: 15-Apr-2007
Python-Version: 3.0
This PEP proposes to change the default source encoding from ASCII to UTF-8.
(...)
In Python 1, the source encoding was unspecified, except that the source encoding had to be a superset of the system's basic execution character set (i.e. an ASCII superset, on most systems).
(...)
In Python 2.0, the source encoding changed to Latin-1 as a side effect of introducing Unicode.
(...)
PEP 263 identified the problem that you can use only those Unicode characters in a Unicode literal which are also in Latin-1,
and introduced a syntax for declaring the source encoding. If no source encoding was given, the default should be ASCII.
For compatibility with Python 2.0 and 2.1, files were interpreted as Latin-1 for a transitional period. This transition ended with Python 2.5, which gives an error if non-ASCII characters are encountered and no source encoding is declared.
(...)
With PEP 263, using arbitrary non-ASCII characters in a Python file is possible, but tedious. One has to explicitly add an encoding declaration.
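To illustrate what such a declaration looks like, the sketch below asks the standard library's tokenize.detect_encoding which encoding a Python 3 interpreter would use for two small pieces of source. The helper name declared_encoding and both sample sources are invented for this example; the second result reflects the UTF-8 default this PEP proposes, not the ASCII default of Python 2.5.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding Python 3 would use to decode source_bytes."""
    readline = io.BytesIO(source_bytes).readline
    encoding, _lines_read = tokenize.detect_encoding(readline)
    return encoding


# Hypothetical source with a PEP 263 declaration on its first line.
with_declaration = b"# -*- coding: latin-1 -*-\nname = 'L\xf6wis'\n"

# Hypothetical pure-ASCII source with no declaration at all.
without_declaration = b"name = 'Loewis'\n"

print(declared_encoding(with_declaration))
print(declared_encoding(without_declaration))
```

detect_encoding reports normalized codec names, so a "latin-1" declaration may be printed under its ISO name.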
(...)
For Python 2, an important reason for using non-UTF-8 encodings was that byte string literals would be in the source encoding at run-time, making it possible to output them to a file or render them to the user as-is.
With Python 3, all strings will be Unicode strings, so the
original encoding of the source will have no impact at run-time.
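A small sketch of that point, assuming a Python 3 interpreter: the same one-line program is stored once as Latin-1 (with a declaration) and once as UTF-8 (without one), and both yield the same string at run-time. The helper run and the variable names are made up for this illustration.

```python
# The same program text, stored on "disk" in two different encodings.
program = "name = 'L\u00f6wis'\n"
latin1_source = ("# -*- coding: latin-1 -*-\n" + program).encode("latin-1")
utf8_source = program.encode("utf-8")


def run(source_bytes):
    """Compile and execute source bytes, returning the value bound to `name`."""
    namespace = {}
    exec(compile(source_bytes, "<example>", "exec"), namespace)
    return namespace["name"]


name_from_latin1 = run(latin1_source)
name_from_utf8 = run(utf8_source)

# The byte sequences differ, but the resulting Unicode strings do not.
print(latin1_source == utf8_source)
print(name_from_latin1 == name_from_utf8)
```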
The parser needs to be changed to accept bytes > 127 if no source encoding is specified; instead of giving an error, it needs to check that the bytes are well-formed UTF-8 (...)
IDLE needs to be changed to use UTF-8 as the default encoding.
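The parser check described above can be approximated in pure Python. This is only a sketch of the idea, not the interpreter's actual implementation: the function name decode_undeclared_source is hypothetical, and the code assumes the caller has already established that the file carries no encoding declaration.

```python
def decode_undeclared_source(source_bytes):
    """Decode source that has no encoding declaration.

    Bytes above 127 are accepted as long as the whole input is
    well-formed UTF-8; otherwise a SyntaxError is raised.
    """
    try:
        # The strict UTF-8 codec rejects any malformed byte sequence.
        return source_bytes.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise SyntaxError(
            "source is not well-formed UTF-8 and declares no encoding"
        ) from exc


# Well-formed UTF-8 with a non-ASCII character: accepted.
good_source = "name = 'L\u00f6wis'\n".encode("utf-8")
print(decode_undeclared_source(good_source))

# The same text saved as Latin-1: 0xF6 is not valid UTF-8 here, so it is rejected.
bad_source = b"name = 'L\xf6wis'\n"
try:
    decode_undeclared_source(bad_source)
except SyntaxError as err:
    print("rejected:", err)
```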
http://www.python.org/dev/peps/pep-3120/