Discussion:
[Cython] Py_New still good way to create classes?
Hoyt Koepke
2009-04-05 19:26:15 UTC
Hello,

I'm using an extension class where I need to create a number of
instances quickly, and I'm looking at the section "Can Cython create
objects or apply operators to locally created objects as pure C code?"
on http://wiki.cython.org/FAQ. Is this information still correct?

In particular, I'm wondering about it in reference to supporting
cyclical garbage collection and creating objects via PyObject_GC_New
etc. IIRC, Cython automatically supports cyclical garbage collection
on classes that have Python members, and my extension class falls into
this category. Should I

a) Use PY_NEW, as specified in the FAQ?
b) Replace PY_NEW with PyObject_GC_New, and do a?
c) Ignore this and just go with standard creation and deletion,
ignoring the fact that I'll be calling the constructors through
python.

Thanks!
--Hoyt

++++++++++++++++++++++++++++++++++++++++++++++++
+ Hoyt Koepke
+ University of Washington Department of Statistics
+ http://www.stat.washington.edu/~hoytak/
+ hoytak-***@public.gmane.org
++++++++++++++++++++++++++++++++++++++++++
Robert Bradshaw
2009-04-15 09:31:42 UTC
Post by Hoyt Koepke
Hello,
I'm using an extension class where I need to create a number of
instances quickly, and I'm looking at the section "Can Cython create
objects or apply operators to locally created objects as pure C code?"
on http://wiki.cython.org/FAQ. Is this information still correct?
In particular, I'm wondering about it in reference to supporting
cyclical garbage collection and creating objects via PyObject_GC_New
etc. IIRC, Cython automatically supports cyclical garbage collection
on classes that have Python members, and my extension class falls into
this category. Should I
a) Use PY_NEW, as specified in the FAQ?
Yes, if one has an absolute need for speed. Cyclic garbage collection
is supported via a flag on the type.
Post by Hoyt Koepke
b) Replace PY_NEW with PyObject_GC_New, and do a?
PyObject_GC_New will allocate the space, but will not call the
__cinit__ functions, and in particular will not initialize the base
fields, so don't use this unless you really know what you're doing.
Post by Hoyt Koepke
c) Ignore this and just go with standard creation and deletion,
ignoring the fact that I'll be calling the constructors through
python.
This is what I would do, at least until I determined object creation
was a large bottleneck. PY_NEW as stated on the wiki calls the
__cinit__ functions, but avoids calling the __init__ functions, so
depending on your use case it can be a savings. Also, it would be
desirable and feasible to call __init__ via a C call rather than a
Python call (maybe, if the signature is simple enough, using the cpdef
trick to avoid the Pythonic argument packing/unpacking overhead).
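
The split between allocation and __init__ can be illustrated in plain Python. This is only an analogy, not the PY_NEW macro itself, and the `Interval` class and its `init_calls` counter are hypothetical names made up for the sketch: calling `Interval.__new__(Interval)` allocates an instance without running `__init__`, roughly as PY_NEW runs `__cinit__` but skips `__init__`.

```python
class Interval:
    """Hypothetical class used only for illustration."""

    init_calls = 0  # counts how often __init__ actually runs

    def __init__(self, start=0, end=0):
        Interval.init_calls += 1
        self.start = start
        self.end = end


# Normal construction: __new__ followed by __init__.
a = Interval(1, 5)

# Allocation only: __init__ is skipped, so the fields are set by hand.
b = Interval.__new__(Interval)
b.start, b.end = 2, 7

print(Interval.init_calls)  # 1
```

Because `__init__` never ran for `b`, any invariant it normally establishes has to be set up by the caller.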

- Robert
Brent Pedersen
2009-04-15 17:39:01 UTC
_______________________________________________
Cython-dev mailing list
http://codespeak.net/mailman/listinfo/cython-dev
hi, i have a use-case where i'm creating lots of objects as well. so i
wrote up a quick test to see the speed of various methods:
http://gist.github.com/95916

assuming i haven't done anything stupid, results in seconds for that case are:

PY_NEW on Cython class 0.652909994125
__init__ on Python class 1.51110291481
batch PY_NEW 0.434900999069
__init__ on Cython class 0.673034906387
batch __init__ on Cython class 0.532235145569

-b
Stefan Behnel
2009-04-15 18:25:44 UTC
Hi,

thanks for sharing that.
Post by Brent Pedersen
assuming i haven't done anything stupid
No, nothing stupid, but something that can reduce the comparability of the
timings. You are creating a 1000000 item list on each benchmark, using a
call to range() in some cases and a list comprehension in others.

It's usually better to move initialisations out of the timings, e.g. by
creating a large range() object once and re-using it. That reduces the
impact of unrelated operations on the absolute numbers.
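
A minimal sketch of that advice, using the standard `timeit` module. The `Point` class, `N`, and the two builder functions are hypothetical and are not taken from the gist. Note that under Python 2 (as used in this thread) `range()` builds a full list, so the effect is larger there than under Python 3, where `range()` is lazy.

```python
import timeit

N = 20_000
indices = range(N)  # built once, outside the timed code


class Point:
    def __init__(self, i):
        self.i = i


def build_inside():
    # range() is created inside the timed call, so its cost is measured too.
    return [Point(i) for i in range(N)]


def build_outside():
    # Reuses the prebuilt range; only iteration and construction are timed.
    return [Point(i) for i in indices]


# Take the best of several repeats to reduce noise from other processes.
t_inside = min(timeit.repeat(build_inside, number=5, repeat=3))
t_outside = min(timeit.repeat(build_outside, number=5, repeat=3))
print(f"inside: {t_inside:.3f}s  outside: {t_outside:.3f}s")
```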

Stefan
Brent Pedersen
2009-04-16 02:31:29 UTC
Post by Stefan Behnel
Hi,
thanks for sharing that.
Post by Brent Pedersen
assuming i haven't done anything stupid
No, nothing stupid, but something that can reduce the comparability of the
timings. You are creating a 1000000 item list on each benchmark, using a
call to range() in some cases and a list comprehension in others.
It's usually better to move initialisations out of the timings, e.g. by
creating a large range() object once and re-using it. That reduces the
impact of unrelated operations on the absolute numbers.
ah, i see, updated that. fixing that makes the python constructor look
even slower.
now it assumes that creating a list comprehension without assigning it to
a variable is the same as calling a function that returns an array, also
without assigning.

here are the new timings:

PY_NEW on Cython class: 1.137
__init__ on Python class: 28.468
__init__ on Python class with slots: 9.936
batch PY_NEW total: 0.821 , interval only: 0.363
batch __init__ on Cython class total 0.975 , interval_only: 0.524
__init__ on Cython class 1.154

so for this case using the PY_NEW macro actually doesn't improve speed that
much over a cdef'ed class,
especially if using a "batch" method is applicable (as it is for my use-case).

i didn't realize slots would affect object creation time that much
-- in this case it's 3x faster with slots!
(and then another 6-10x with a cdef'ed class)

here's my updated gist:
http://gist.github.com/95916
Stefan Behnel
2009-04-16 05:05:13 UTC
Hi,

now, those are interesting timings. Which versions of Cython and Python did
you use?
Post by Brent Pedersen
PY_NEW on Cython class: 1.137
__init__ on Cython class 1.154
You're right, that's almost negligible. However, the __init__ benchmark
involves the overhead of a Python call to create_interval, which uses the
same Python calling convention as the constructor. So it's more or less
doing the same thing in both cases, just additionally calling __init__
through a C struct member dereference in the second case (but without
repacking the arguments).

Also note that it can actually be a feature that PY_NEW doesn't call
__init__(). I happily misuse that in lxml.etree to assign different
behaviour to a user class instantiation and an internal proxy creation.
Post by Brent Pedersen
batch PY_NEW total: 0.821 , interval only: 0.363
batch __init__ on Cython class total 0.975 , interval_only: 0.524
As expected, the difference is a lot larger here. Ignoring the setup
overhead, that's some 30% faster, as PY_NEW does the field initialisation
inside a plain C call, whereas the call to __init__ requires the Python
calling convention (i.e. tuple packing and unpacking).
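
The "some 30%" figure follows directly from the two "interval only" timings quoted above; as a quick check (variable names are made up for the sketch):

```python
init_interval = 0.524    # batch __init__ on Cython class, interval only (s)
py_new_interval = 0.363  # batch PY_NEW, interval only (s)

# Fraction of the __init__ time that PY_NEW saves.
saving = (init_interval - py_new_interval) / init_interval
print(f"{saving:.1%}")  # 30.7%
```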
Post by Brent Pedersen
__init__ on Python class: 28.468
__init__ on Python class with slots: 9.936
Ok, that's a completely different performance league. The first benchmark
requires setting up a dictionary for the instance and putting the
attributes there, whereas the second happily assigns the attributes to
predefined slots, without the overhead of a dict.
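
A small Python sketch of the difference, assuming two hypothetical classes `DictPoint` and `SlotPoint` (not the ones from the gist). It shows that a slotted instance carries no per-instance `__dict__`; the timing lines are there so the effect can be measured locally, and no particular ratio is implied.

```python
import timeit


class DictPoint:
    """Regular class: attributes live in a per-instance dict."""

    def __init__(self, start, end):
        self.start = start
        self.end = end


class SlotPoint:
    """Slotted class: attributes live in fixed, predefined slots."""

    __slots__ = ("start", "end")

    def __init__(self, start, end):
        self.start = start
        self.end = end


d = DictPoint(1, 5)
s = SlotPoint(1, 5)
print(hasattr(d, "__dict__"))  # True
print(hasattr(s, "__dict__"))  # False

# Measure instantiation cost for both variants on the local machine.
t_dict = timeit.timeit(lambda: DictPoint(1, 5), number=200_000)
t_slot = timeit.timeit(lambda: SlotPoint(1, 5), number=200_000)
print(f"dict: {t_dict:.3f}s  slots: {t_slot:.3f}s")
```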

To sum it up, using PY_NEW from the beginning looks like a premature
optimisation to me. If you notice that class creation becomes a bottleneck
later on, *and* your classes are instantiated internally in the relevant
cases, *and* the setup involves working with plain C field values (such as
the 'start' and 'end' ints in your case), switching to PY_NEW can give you
about 30% in plain class instantiation time (YMWV, don't forget to
benchmark your own code).

Thanks for sharing,

Stefan
Brent Pedersen
2009-04-16 15:04:19 UTC
Post by Stefan Behnel
Hi,
now, those are interesting timings. Which versions of Cython and Python did
you use?
hi,
i am using python 2.5 with cython 0.10 and 0.11 -- both give timings in the same ballpark.
thanks for the explanation.
-brent
Hoyt Koepke
2009-04-16 17:24:34 UTC
Thanks everyone, this discussion is quite informative. My classes all
fall in the potential-30% class, and I've already written it in, so I
think I'll keep things in.

-- Hoyt

++++++++++++++++++++++++++++++++++++++++++++++++
+ Hoyt Koepke
+ University of Washington Department of Statistics
+ http://www.stat.washington.edu/~hoytak/
+ hoytak-***@public.gmane.org
++++++++++++++++++++++++++++++++++++++++++
Robert Bradshaw
2009-04-18 08:15:29 UTC
Here are some more data points, specific to Sage (Python 2.5, Cython 0.11)

-------------------

%cython

from sage.rings.integer cimport Integer

cdef class A:
    pass

def time_py_new(long N):
    cdef long i
    for i from 0 <= i < N:
        z = PY_NEW(Integer)

def time_py_init(long N):
    cdef long i
    for i from 0 <= i < N:
        z = Integer()

def time_py_new_A(long N):
    cdef long i
    for i from 0 <= i < N:
        z = PY_NEW(A)

def time_py_init_A(long N):
    cdef long i
    for i from 0 <= i < N:
        z = A()

--------------------

sage: time time_py_init(10**7)
Time: CPU 0.67 s, Wall: 0.68 s
sage: time time_py_new(10**7)
Time: CPU 0.16 s, Wall: 0.17 s
sage: time time_py_init_A(10**7)
Time: CPU 0.66 s, Wall: 0.67 s
sage: time time_py_new_A(10**7)
Time: CPU 0.47 s, Wall: 0.48 s

Note that I have an empty constructor in both cases. In summary,
Integer has a custom tp_new slot with a pool to avoid allocation/deallocation
overhead, and PY_NEW here saves quite a bit. The 30% speed difference for a
standard cdef class seems to hold about right; probably two thirds of the
time is spent allocating memory from the heap (and also releasing it, in my
test).
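
For reference, the relative savings implied by the CPU timings above can be worked out as follows. This is just arithmetic on the reported numbers; the dictionary layout and names are made up for the sketch.

```python
# (constructor call, PY_NEW) CPU seconds for 10**7 instantiations
timings = {
    "Integer": (0.67, 0.16),
    "A": (0.66, 0.47),
}

savings = {}
for name, (t_init, t_new) in timings.items():
    savings[name] = (t_init - t_new) / t_init  # fraction of time saved
    speedup = t_init / t_new
    print(f"{name}: {savings[name]:.0%} less time, {speedup:.1f}x")

# Integer: 76% less time, 4.2x
# A: 29% less time, 1.4x
```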

- Robert
