Skip to content

Latest commit

 

History

History
440 lines (365 loc) · 14.6 KB

File metadata and controls

440 lines (365 loc) · 14.6 KB

类型系统

Python 中的类型系统:

 is instance of
        +-----+  +---------------------+ +------------------------+
        |     |  |    is instance of   | |     is instance of     |
        |     |  |                     | |                        |
        |   +-+--v------+           +--+-v-----+            +-----+-------+
        +--->           |           |          |            |             |
            |   type    |           |  Class A |            |  instance a |
            |           |           |          |            |             |
            +---+----^--+           +-----+----+            +-------------+
                |    |                    |
is sub-class of |    |is instance of      |
                |    |                    |
                |    |                    |
            +---v----+--+                 |
            |           |  is sub-class of|
     +----->|  object   <-----------------+
     |      |           |
     |      +-----------+
     |             |
     +-------------+
        is sub-class of

上图中有两个比较特殊的对象:object 和 type。

object 是所有类的基类. 为了类型系统的自恰, object 是自己的基类, type 的基类是 object .

>>> issubclass(int, object)
True
>>> issubclass(str, object)
True
>>> issubclass(object, object)
True
>>> issubclass(type, object)
True

type 是 metaclass,class 是通过 metaclass 创建的,所以 class 是 type 的实例. 为了类型系统的自恰, type 的类型属于自己, object 的类型属于 type.

>>> isinstance(int, type)
True
>>> isinstance(object, type)
True
>>> isinstance(type, type)
True

数据结构

来自 Include/object.h 的注释:

/* Object and type object interface */

/*
Objects are structures allocated on the heap.  Special rules apply to
the use of objects to ensure they are properly garbage-collected.
Objects are never allocated statically or on the stack; they must be
accessed through special macros and functions only.  (Type objects are
exceptions to the first rule; the standard types are represented by
statically initialized type objects, although work on type/class unification
for Python 2.2 made it possible to have heap-allocated type objects too).

An object has a 'reference count' that is increased or decreased when a
pointer to the object is copied or deleted; when the reference count
reaches zero there are no references to the object left and it can be
removed from the heap.

An object has a 'type' that determines what it represents and what kind
of data it contains.  An object's type is fixed when it is created.
Types themselves are represented as objects; an object contains a
pointer to the corresponding type object.  The type itself has a type
pointer pointing to the object representing the type 'type', which
contains a pointer to itself!.

Objects do not float around in memory; once allocated an object keeps
the same size and address.  Objects that must hold variable-size data
can contain pointers to variable-size parts of the object.  Not all
objects of the same type have the same size; but the size cannot change
after allocation.  (These restrictions are made so a reference to an
object can be simply a pointer -- moving an object would require
updating all the pointers, and changing an object's size would require
moving it if there was another object right next to it.)

Objects are always accessed through pointers of the type 'PyObject *'.
The type 'PyObject' is a structure that only contains the reference count
and the type pointer.  The actual memory allocated for an object
contains other data that can only be accessed after casting the pointer
to a pointer to a longer structure type.  This longer type must start
with the reference count and type fields; the macro PyObject_HEAD should be
used for this (to accommodate for future changes).  The implementation
of a particular object type can cast the object pointer to the proper
type and back.

A standard interface exists for objects that contain an array of items
whose size is determined when the object is allocated.
*/

PyObject

最最基础的 PyObject:

// Include/object.h 
typedef struct _object {
    Py_ssize_t ob_refcnt;
    PyTypeObject *ob_type;
} PyObject;

Python 中所有对象结构体的开头都是 PyObject.

PyVarObject 表示变长对象, 其中的 ob_size 一般表示该对象包含多少个元素, 并不一定是字节长度.

// Include/object.h 
typedef struct {
    PyObject ob_base;
    Py_ssize_t ob_size; /* Number of items in variable part */
} PyVarObject;

PyTypeObject

// Include/cpython/object.h
struct _typeobject {
    PyObject_VAR_HEAD
    const char *tp_name; /* For printing, in format "<module>.<name>" */
    Py_ssize_t tp_basicsize, tp_itemsize; /* For allocation */

    /* Methods to implement standard operations */

    destructor tp_dealloc;
    Py_ssize_t tp_vectorcall_offset;
    getattrfunc tp_getattr;
    setattrfunc tp_setattr;
    PyAsyncMethods *tp_as_async; /* formerly known as tp_compare (Python 2)
                                    or tp_reserved (Python 3) */
    reprfunc tp_repr;

    /* Method suites for standard classes */

    PyNumberMethods *tp_as_number;
    PySequenceMethods *tp_as_sequence;
    PyMappingMethods *tp_as_mapping;

    /* More standard operations (here for binary compatibility) */

    hashfunc tp_hash;
    ternaryfunc tp_call;
    reprfunc tp_str;
    getattrofunc tp_getattro;
    setattrofunc tp_setattro;

    /* Functions to access object as input/output buffer */
    PyBufferProcs *tp_as_buffer;

    /* Flags to define presence of optional/expanded features */
    unsigned long tp_flags;

    const char *tp_doc; /* Documentation string */

    /* Assigned meaning in release 2.0 */
    /* call function for all accessible objects */
    traverseproc tp_traverse;

    /* delete references to contained objects */
    inquiry tp_clear;

    /* Assigned meaning in release 2.1 */
    /* rich comparisons */
    richcmpfunc tp_richcompare;

    /* weak reference enabler */
    Py_ssize_t tp_weaklistoffset;

    /* Iterators */
    getiterfunc tp_iter;
    iternextfunc tp_iternext;

    /* Attribute descriptor and subclassing stuff */
    struct PyMethodDef *tp_methods;
    struct PyMemberDef *tp_members;
    struct PyGetSetDef *tp_getset;
    struct _typeobject *tp_base;
    PyObject *tp_dict;
    descrgetfunc tp_descr_get;
    descrsetfunc tp_descr_set;
    Py_ssize_t tp_dictoffset;
    initproc tp_init;
    allocfunc tp_alloc;
    newfunc tp_new;
    freefunc tp_free; /* Low-level free-memory routine */
    inquiry tp_is_gc; /* For PyObject_IS_GC */
    PyObject *tp_bases;
    PyObject *tp_mro; /* method resolution order */
    PyObject *tp_cache;
    PyObject *tp_subclasses;
    PyObject *tp_weaklist;
    destructor tp_del;

    /* Type attribute cache version tag. Added in version 2.6 */
    unsigned int tp_version_tag;

    destructor tp_finalize;
    vectorcallfunc tp_vectorcall;
};

/* The *real* layout of a type object when allocated on the heap */
typedef struct _heaptypeobject {
    /* Note: there's a dependency on the order of these members
       in slotptr() in typeobject.c . */
    PyTypeObject ht_type;
    PyAsyncMethods as_async;
    PyNumberMethods as_number;
    PyMappingMethods as_mapping;
    PySequenceMethods as_sequence; /* as_sequence comes after as_mapping,
                                      so that the mapping wins when both
                                      the mapping and the sequence define
                                      a given operator (e.g. __getitem__).
                                      see add_operators() in typeobject.c . */
    PyBufferProcs as_buffer;
    PyObject *ht_name, *ht_slots, *ht_qualname;
    struct _dictkeysobject *ht_cached_keys;
    PyObject *ht_module;
    /* here are optional user slots, followed by the members. */
} PyHeapTypeObject;

PyBaseObject_Type(object)

PyBaseObject_Type 的定义如下:

// Objects/typeobject.c

PyTypeObject PyBaseObject_Type = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "object",                                   /* tp_name */
    sizeof(PyObject),                           /* tp_basicsize */
    0,                                          /* tp_itemsize */
    object_dealloc,                             /* tp_dealloc */
    0,                                          /* tp_vectorcall_offset */
    0,                                          /* tp_getattr */
    0,                                          /* tp_setattr */
    0,                                          /* tp_as_async */
    object_repr,                                /* tp_repr */
    0,                                          /* tp_as_number */
    0,                                          /* tp_as_sequence */
    0,                                          /* tp_as_mapping */
    (hashfunc)_Py_HashPointer,                  /* tp_hash */
    0,                                          /* tp_call */
    object_str,                                 /* tp_str */
    PyObject_GenericGetAttr,                    /* tp_getattro */
    PyObject_GenericSetAttr,                    /* tp_setattro */
    0,                                          /* tp_as_buffer */
    Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,   /* tp_flags */
    object_doc,                                 /* tp_doc */
    0,                                          /* tp_traverse */
    0,                                          /* tp_clear */
    object_richcompare,                         /* tp_richcompare */
    0,                                          /* tp_weaklistoffset */
    0,                                          /* tp_iter */
    0,                                          /* tp_iternext */
    object_methods,                             /* tp_methods */
    0,                                          /* tp_members */
    object_getsets,                             /* tp_getset */
    0,                                          /* tp_base */
    0,                                          /* tp_dict */
    0,                                          /* tp_descr_get */
    0,                                          /* tp_descr_set */
    0,                                          /* tp_dictoffset */
    object_init,                                /* tp_init */
    PyType_GenericAlloc,                        /* tp_alloc */
    object_new,                                 /* tp_new */
    PyObject_Del,                               /* tp_free */
};

内部类

内部类的实现可以参考 tuple, list 等对象的代码.

基本原理:

  • 定义对象结构体

    typedef struct {
        PyObject_VAR_HEAD
        int foo;
    } PyFooObject;
  • 定义类型结构体

    PyTypeObject PyFoo_Type = {
        PyVarObject_HEAD_INIT(&PyType_Type, 0) 
        "foo",                                    /* tp_name */
        // 定义剩下的 tp_xxx
    }
  • 实现 tp_new, tp_alloc, tp_dealloc 等关键方法.

  • 实现其它可选方法和属性(PyMethodDef, PyMemberDef).

用户类

对于用户自定义的类,Python 并不知道用户会在类中定义多少变量和属性,所以必须使用另外一种结构来表示用户自定义的类,那就是 PyHeapTypeObject。

类的创建

类的创建流程见类型系统之元类.

实例的创建

用于测试的 Python 代码如下:

class C:
    pass

C()

对应的字节码如下:

28 LOAD_NAME                2 (C)
30 CALL_FUNCTION            0
32 POP_TOP

CALL_FUNCTION 的调用路径:call_function -> PyObject_Vectorcall -> _PyObject_VectorcallTstate -> _PyObject_MakeTpCall -> tp_call(type(C).__call__) -> tp_new(C.__new__) -> tp_init(C.__init__).

如果没有自定义 __new__ 方法, 那么 tp_new 继承自 object.

PyBaseObject_Type.tp_new 的代码如下:

static PyObject *
object_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
    if (excess_args(args, kwds)) {
        if (type->tp_new != object_new) {
            PyErr_SetString(PyExc_TypeError,
                            "object.__new__() takes exactly one argument (the type to instantiate)");
            return NULL;
        }
        if (type->tp_init == object_init) {
            PyErr_Format(PyExc_TypeError, "%.200s() takes no arguments",
                         type->tp_name);
            return NULL;
        }
    }

    // 如果 type 是抽象类, 那么抛出异常.
    if (type->tp_flags & Py_TPFLAGS_IS_ABSTRACT) {
        PyObject *abstract_methods;
        PyObject *sorted_methods;
        PyObject *joined;
        PyObject *comma;
        _Py_static_string(comma_id, ", ");
        Py_ssize_t method_count;

        /* Compute ", ".join(sorted(type.__abstractmethods__))
           into joined. */
        abstract_methods = type_abstractmethods(type, NULL);
        if (abstract_methods == NULL)
            return NULL;
        sorted_methods = PySequence_List(abstract_methods);
        Py_DECREF(abstract_methods);
        if (sorted_methods == NULL)
            return NULL;
        if (PyList_Sort(sorted_methods)) {
            Py_DECREF(sorted_methods);
            return NULL;
        }
        comma = _PyUnicode_FromId(&comma_id);
        if (comma == NULL) {
            Py_DECREF(sorted_methods);
            return NULL;
        }
        joined = PyUnicode_Join(comma, sorted_methods);
        method_count = PyObject_Length(sorted_methods);
        Py_DECREF(sorted_methods);
        if (joined == NULL)
            return NULL;
        if (method_count == -1)
            return NULL;

        PyErr_Format(PyExc_TypeError,
                     "Can't instantiate abstract class %s "
                     "with abstract method%s %U",
                     type->tp_name,
                     method_count > 1 ? "s" : "",
                     joined);
        Py_DECREF(joined);
        return NULL;
    }
    return type->tp_alloc(type, 0);
}

如果用户没有自定义 __init__ 方法, 那么 tp_init 继承自 object.

PyBaseObject_Type.tp_init 的代码如下:

static int
object_init(PyObject *self, PyObject *args, PyObject *kwds)
{
    PyTypeObject *type = Py_TYPE(self);
    if (excess_args(args, kwds)) {
        if (type->tp_init != object_init) {
            PyErr_SetString(PyExc_TypeError,
                            "object.__init__() takes exactly one argument (the instance to initialize)");
            return -1;
        }
        if (type->tp_new == object_new) {
            PyErr_Format(PyExc_TypeError,
                         "%.200s.__init__() takes exactly one argument (the instance to initialize)",
                         type->tp_name);
            return -1;
        }
    }
    return 0;
}