Discussion:
Rewriting/compiling parts of CPython's stdlib in Cython
Stefan Behnel
2011-03-22 06:10:09 UTC
Hi,

There seems to be quite a bit of interest in a project to get parts of
CPython, and specifically its stdlib, rewritten in Cython. I've copied the
latest python-dev mail below. The relevant part of the thread is here:

http://thread.gmane.org/gmane.comp.python.devel/122273/focus=122798

In short, we have strong supporters, but Guido has understandable doubts
about a new (and quite large) dependency and potential semantic
deviations. There do seem to be cases, though, where the slight changes that
Cython-compiled modules might introduce would be acceptable, such as
emitting different exception messages, changing Python classes into
extension classes, or even preventing monkey patching in modules that are
backed by C modules anyway.

It would be helpful to get support from the side of external distributors
that use Cython already, e.g. Sage, Enthought/SciPy, ActiveState, etc. If
they agreed to test the Cython-generated stdlib modules in their
distributions, we could get user feedback that would allow python-dev to
make a well-founded decision.

Do we have any volunteers for trying this out? Both on the side of
distributors and implementors?

As things stand, the implementation could still be financed
by a Python-backed GSoC project, although it would be cool if more users
could just step up and simply try to compile and optimise stdlib modules
(preferably without major changes to the code). It's certainly a great way
to show off your Cython skills :). I gave it a try with difflib and it
turned out to be quite easy.

http://blog.behnel.de/index.php?p=155

Reimplementing existing C modules in Cython might, however, be more
interesting for python-dev, but it would also be a larger undertaking. So a
GSoC project might be worth running on that.

Note that the latest Cython release does not have generator support yet,
and Vitja's branch on github is not very stable. We will try to get it up
to speed and merged during the workshop next week, at which point it will
make more sense to get this project started than right now.

Stefan
Thanks for the clarifications. I now have a much better understanding
of what Cython is. But I'm not sold. For one, your attitude about
strict language compatibility worries me when it comes to the stdlib.
Not sure what you mean exactly. Given our large user base, we do worry a
lot about things like backwards compatibility, for example.
If you are referring to compatibility with Python, I don't think anyone
in the project really targets Cython as a drop-in replacement for a
Python runtime. We aim to compile Python code, yes, and there's a
hand-wavy idea in the back of our heads that we may want a plain Python
compatibility mode at some point that will disable several important
optimisations.
I think that's the attitude Guido worries about: if you don't have the
desire to provide 100% Python compatibility under all circumstances
(i.e. including if someone passes parameters of "incorrect" types),
then there is very little chance that we would replace a Python module
with a Cython-compiled one.
The only exception would be cases where the Python semantics is murky
(e.g. where Jython or so actually behaves differently for the same
Python code, and still claims language conformance). E.g. the exact
message on a TypeError might change when compiling with Cython,
but the cases in which you get a TypeError must not change.
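
The rule stated above (the exception type and the cases that trigger it must match, the message may differ) could be checked mechanically. The following is only a minimal sketch under stated assumptions: the three `add_*` functions are hypothetical stand-ins for a reference module and two compiled candidates, not code from the stdlib or from Cython.

```python
def outcome(func, *args):
    """Return ('ok', value) or ('raised', exception type), ignoring the message."""
    try:
        return ("ok", func(*args))
    except Exception as exc:
        return ("raised", type(exc))


def behaves_the_same(reference, candidate, cases):
    """True if both callables agree on value or exception type for every case."""
    return all(outcome(reference, *args) == outcome(candidate, *args)
               for args in cases)


# Hypothetical reference implementation.
def add_reference(a, b):
    if not isinstance(a, int) or not isinstance(b, int):
        raise TypeError("both arguments must be integers")
    return a + b


# Acceptable deviation: same TypeError cases, different message text.
def add_candidate(a, b):
    if not isinstance(a, int) or not isinstance(b, int):
        raise TypeError("an integer is required")
    return a + b


# Unacceptable deviation: one TypeError case silently disappears.
def add_lenient(a, b):
    return a + b


CASES = [(1, 2), (1, "x"), ("x", "y")]
print(behaves_the_same(add_reference, add_candidate, CASES))  # True
print(behaves_the_same(add_reference, add_lenient, CASES))    # False
```

The second comparison fails on `("x", "y")`, where the lenient version returns `"xy"` instead of raising.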
One other significant use case is the situation where we have an
optional replacement module written in C (e.g. heapqmodule.c vs.
heapq.py). There are usually many semantic differences between the C
and pure-python module that we don't care about (e.g. monkeypatching
won't work).
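
As an illustration of the monkeypatching difference mentioned above, here is a small sketch that assumes CPython. `PurePythonQueue` and `can_patch_type` are made-up names for the demonstration, and `dict` merely stands in for any type implemented in C.

```python
class PurePythonQueue:
    """Stand-in for a class defined in a .py module."""

    def size(self):
        return 0


# Attributes of a class written in Python can be replaced at runtime.
PurePythonQueue.size = lambda self: 42
print(PurePythonQueue().size())  # 42


def can_patch_type(tp):
    """Try to attach a new attribute to a type and report whether it worked."""
    try:
        tp.injected_for_demo = None
    except TypeError:
        return False
    del tp.injected_for_demo
    return True


# dict is implemented in C; CPython rejects attribute assignment on such types.
print(can_patch_type(PurePythonQueue))  # True
print(can_patch_type(dict))             # False
```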
The size of Cython as a dependency and its development speed are still
problems though. In general for the core I don't think we want the
repo to contain generated code that can only be regenerated using a
3rd party dependency. (True, we have a few generated files, e.g.
configure; but in that case the generator -- autoconf -- is a
standard installed tool on Linux and is used by most open source
projects.)
Still, I think it would be great if someone tried something like this
for a specific stdlib module and came back with a story about the
experience, rather than having a theoretical discussion about possible
pros and cons.
Stefan Behnel
2011-03-22 06:38:59 UTC
Post by Stefan Behnel
there seems to be quite some interest in a project to get parts of CPython
and specifically its stdlib rewritten in Cython.
[...] I gave it a try with difflib and it turned out to be quite easy.
http://blog.behnel.de/index.php?p=155
BTW, given how short that patch is, wouldn't that make a nice little
presentation for the Open Cython Day? Show off how to quickly optimise a
stdlib module with Cython?

Stefan
Vitja Makarov
2011-03-22 06:57:15 UTC
Post by Stefan Behnel
Post by Stefan Behnel
there seems to be quite some interest in a project to get parts of CPython
and specifically its stdlib rewritten in Cython.
[...] I gave it a try with difflib and it turned out to be quite easy.
http://blog.behnel.de/index.php?p=155
BTW, given how short that patch is, wouldn't that make a nice little
presentation for the Open Cython Day? Show off how to quickly optimise a
stdlib module with Cython?
That would be nice ;) A great step forward for Cython and Python.

Btw, I think optimising a stdlib module with Cython should be as simple
as writing a .pxd file, with minimal or no modifications to the .py file,
as was done for difflib.

Also it would be good to have:
- performance comparison/monitoring infrastructure
- more tests on stdlib
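
A performance comparison harness of the kind listed above might start out as small as the following sketch. It is an assumption about what such infrastructure could look like, not an existing tool. Only the interpreted `difflib` is timed, and the workload strings are arbitrary.

```python
import difflib
import timeit


def best_time(func, repeat=3, number=10):
    """Best per-call wall-clock time in seconds over several timing runs."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


def compare(implementations, workload):
    """Time each named implementation on the same workload."""
    return {name: best_time(lambda impl=impl: impl(*workload))
            for name, impl in implementations.items()}


def ratio(a, b):
    """The operation under test: similarity ratio of two strings."""
    return difflib.SequenceMatcher(None, a, b).ratio()


# Arbitrary workload: two similar strings of a few hundred characters.
WORKLOAD = ("abcdefghij" * 30, "abcdefghiX" * 30)

# A compiled build would be added as a second dictionary entry under its own
# name; none is assumed to exist here.
results = compare({"pure-python": ratio}, WORKLOAD)
for name, seconds in results.items():
    print(f"{name}: {seconds * 1e3:.3f} ms per call")
```

Taking the minimum over several runs reduces noise from other processes, which matters when the numbers are tracked over time.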
--
vitja.
Robert Bradshaw
2011-03-22 07:14:45 UTC
Post by Stefan Behnel
Hi,
there seems to be quite some interest in a project to get parts of CPython
and specifically its stdlib rewritten in Cython. I've copied the latest
http://thread.gmane.org/gmane.comp.python.devel/122273/focus=122798
Interesting.
Post by Stefan Behnel
In short, we have strong supporters, but Guido has understandable doubts
against a new (and quite large) dependency and potential semantic
deviations.
Reading the list, I think others on the list overestimate the semantic
differences. Mostly we're talking about things like is vs. equality
for floating point numbers and tracebacks (at least for un-annotated
code). It's a valid point that Cython is still under such active
development.
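
One way to read the "is vs. equality" point for floats is sketched below in plain Python. The `rebox` helper is hypothetical: it only imitates what happens when a value passes through a C double and comes back as a new Python float object, which is an assumption about where such a difference would come from.

```python
nan = float("nan")

# Equality on floats follows IEEE 754: NaN never equals itself.
print(nan == nan)  # False

# Identity is a property of the object, not of the number it holds.
print(nan is nan)  # True

# Containers check identity before equality, so the same object is found.
print(nan in [nan])  # True


def rebox(x):
    """Imitate a round trip through a C double: same value, new object."""
    return float(repr(x))


# After the round trip the identity shortcut no longer applies.
print(rebox(nan) in [nan])  # False
```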
Post by Stefan Behnel
But there seem to be cases where slight changes would be
acceptable that Cython compiled modules might introduce, such as emitting
different exception messages, changing Python classes into extension
classes, or even preventing monkey patching in modules that are backed by C
modules anyway.
It would be helpful to get support from the side of external distributors
that use Cython already, e.g. Sage, Enthought/SciPy, ActiveState, etc. If
they agreed to test the Cython generated stdlib modules in their
distributions, we could get user feedback that would allow python-dev to
take a well founded decision.
Do we have any volunteers for trying this out? Both on the side of
distributors and implementors?
I think Sage might be willing to give it a try. I'll ask tomorrow as
part of a talk I'm giving. Note that due to the way Python's import
mechanism works, it would be easy (as a first pass) to make a
"cythonize this Python install" which would just compile (a subset of)
the .py files and drop .so files next to them. This would require no
messing with the Python build system or distribution, would be easy to test
and benchmark, and would be easy to clean up.
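
The bookkeeping for such a "cythonize this Python install" pass could be sketched as follows. This is only an outline under stated assumptions: the function names are invented for illustration, the compile step itself is left out, and the suffix is whatever the running interpreter reports for extension modules.

```python
import sysconfig
import tempfile
from pathlib import Path


def extension_suffix():
    """Filename suffix the running interpreter expects for extension modules."""
    return sysconfig.get_config_var("EXT_SUFFIX") or ".so"


def plan(root, only=None):
    """Map each selected .py file under root to the extension file next to it."""
    selected = {}
    for source in sorted(Path(root).rglob("*.py")):
        if source.name == "__init__.py":
            continue  # leave package markers interpreted
        if only is not None and source.stem not in only:
            continue
        selected[source] = source.with_name(source.stem + extension_suffix())
    return selected


def clean(root):
    """Undo the experiment: delete extension files that shadow a .py source."""
    suffix = extension_suffix()
    removed = []
    for built in sorted(Path(root).rglob("*" + suffix)):
        stem = built.name[: -len(suffix)]
        if built.with_name(stem + ".py").exists():
            built.unlink()
            removed.append(built)
    return removed


# Small demonstration on a throwaway directory instead of a real install.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "difflib.py").write_text("x = 1\n")
    for source, target in plan(tmp, only={"difflib"}).items():
        print(source.name, "->", target.name)
```

Because `clean` only removes files that sit next to a same-named `.py` source, extension modules that ship with the install are left alone.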
Post by Stefan Behnel
At the current state of affairs, the implementation could still be financed
by a Python backed GSoC project, although it would be cool if more users
could just step up and simply try to compile and optimise stdlib modules
(preferably without major changes to the code). It's certainly a great way
to show off your Cython skills :). I gave it a try with difflib and it
turned out to be quite easy.
http://blog.behnel.de/index.php?p=155
Reimplementing existing C modules in Cython might, however, be more
interesting for python-dev, but also be a larger undertaking. So a GSoC
might be worth running on that.
I think that's a great idea. Would you be willing to mentor such a project?
Post by Stefan Behnel
Note that the latest Cython release does not have generator support yet, and
Vitja's branch on github is not very stable. We will try to get it up to
speed and merged during the workshop next week, at which point it will make
more sense to get this project started than right now.
Stefan
_______________________________________________
cython-devel mailing list
http://mail.python.org/mailman/listinfo/cython-devel
Stefan Behnel
2011-03-22 07:59:04 UTC
Post by Robert Bradshaw
Post by Stefan Behnel
there seems to be quite some interest in a project to get parts of CPython
and specifically its stdlib rewritten in Cython. [...]
In short, we have strong supporters, but Guido has understandable doubts
against a new (and quite large) dependency and potential semantic
deviations.
Reading the list, I think others on the list overestimate the semantic
differences. Mostly we're talking about things like is vs. equality
for floating point numbers and tracebacks (at least for un-annotated
code).
I think so too, that's what I tried to make clearer with my last reply. I
think Cython is actually pretty close to Python semantics overall, and
almost all deviations are explicitly triggered by type annotations in the code.
Post by Robert Bradshaw
It's a valid point that Cython is still under such active development.
Absolutely. Eventually, we'd have to settle on a specific version for the
compiler used in the stdlib, and support that at least as long as the
CPython version that uses it.
Post by Robert Bradshaw
Post by Stefan Behnel
It would be helpful to get support from the side of external distributors
that use Cython already, e.g. Sage, Enthought/SciPy, ActiveState, etc. If
they agreed to test the Cython generated stdlib modules in their
distributions, we could get user feedback that would allow python-dev to
take a well founded decision.
Do we have any volunteers for trying this out? Both on the side of
distributors and implementors?
I think Sage might be willing to give it a try. I'll ask tomorrow as
part of a talk I'm giving.
Cool.
Post by Robert Bradshaw
Note that due to the way Python's import
mechanism works, it would be easy (as a first pass) to make a
"cythonize this Python install" which would just compile (a subset of)
the .py files and drop .so files next to them. This would require no
messing with the Python build system or distribution, would be easy to test
and benchmark, and would be easy to clean up.
Sure, I implemented that in pyximport ages ago, when Cython was really far
from being able to compile much in the stdlib. We should totally give it a
try once Vitja's branch is in shape.
Post by Robert Bradshaw
Post by Stefan Behnel
At the current state of affairs, the implementation could still be financed
by a Python backed GSoC project, although it would be cool if more users
could just step up and simply try to compile and optimise stdlib modules
(preferably without major changes to the code). It's certainly a great way
to show off your Cython skills :). I gave it a try with difflib and it
turned out to be quite easy.
http://blog.behnel.de/index.php?p=155
Reimplementing existing C modules in Cython might, however, be more
interesting for python-dev, but also be a larger undertaking. So a GSoC
might be worth running on that.
I think that's a great idea. Would you be willing to mentor such a project?
As usual, I'm not sure I'll have the time, but if no-one else steps up, I'd
consider it.

Stefan
Stefan Behnel
2011-03-24 17:42:55 UTC
Post by Stefan Behnel
Post by Robert Bradshaw
Post by Stefan Behnel
Reimplementing existing C modules in Cython might, however, be more
interesting for python-dev, but also be a larger undertaking. So a GSoC
might be worth running on that.
I think that's a great idea. Would you be willing to mentor such a project?
As usual, I'm not sure I'll have the time, but if no-one else steps up, I'd
consider it.
Should we sign up with the PSF GSoC umbrella to officially propose this
project?

https://spreadsheets.google.com/viewform?formkey=dHh3WFNGYzkyWWE0ZjM1eFFoUUVGWmc6MQ

Do we have any other topics for the GSoC this year? We'd be just in time if
we discuss this at the workshop, although that is somewhat late already. The
student application period runs from March 28 to April 8.

http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2011/timeline

Stefan
Robert Bradshaw
2011-03-24 19:18:06 UTC
Post by Stefan Behnel
Post by Stefan Behnel
Post by Robert Bradshaw
Post by Stefan Behnel
Reimplementing existing C modules in Cython might, however, be more
interesting for python-dev, but also be a larger undertaking. So a GSoC
might be worth running on that.
I think that's a great idea. Would you be willing to mentor such a project?
As usual, I'm not sure I'll have the time, but if no-one else steps up, I'd
consider it.
Should we sign up with the PSF GSoC umbrella to officially propose this
project?
https://spreadsheets.google.com/viewform?formkey=dHh3WFNGYzkyWWE0ZjM1eFFoUUVGWmc6MQ
Do we have any other topics for the GSoC this year? We'd be just in time if
we discuss this at the workshop, although somewhat late already. The student
application deadline is from March 28 to April 8.
http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2011/timeline
There's always our enhancements page, but some of those are fairly
advanced. I think this would be a really good GSoC project: it is very
modular and doesn't require getting to know the internals of the compiler
to start making a serious contribution (at least not right away; I'm sure
bugs and optimizations will be submitted).

I'm willing to (co?)-mentor. Stefan? Craig? With the three of us we
could easily mentor at least one student, and perhaps two (if another
really solid proposal/student comes up). Anyone else willing to
mentor? I haven't pushed on GSoC much this year yet because no one's
stepped up to mentor, but there's still ample time on our side.

- Robert
Stefan Behnel
2011-03-24 19:58:37 UTC
Post by Robert Bradshaw
Post by Stefan Behnel
Post by Stefan Behnel
Post by Robert Bradshaw
Post by Stefan Behnel
Reimplementing existing C modules in Cython might, however, be more
interesting for python-dev, but also be a larger undertaking. So a GSoC
might be worth running on that.
I think that's a great idea. Would you be willing to mentor such a project?
As usual, I'm not sure I'll have the time, but if no-one else steps up, I'd
consider it.
Should we sign up with the PSF GSoC umbrella to officially propose this
project?
https://spreadsheets.google.com/viewform?formkey=dHh3WFNGYzkyWWE0ZjM1eFFoUUVGWmc6MQ
Do we have any other topics for the GSoC this year? We'd be just in time if
we discuss this at the workshop, although somewhat late already. The student
application deadline is from March 28 to April 8.
http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2011/timeline
There's always our enhancements page, but some of those are fairly
advanced. I think this'd be a really good GSoC project, very modular,
and doesn't require getting to know the internals of the compiler to
start making a serious contribution (at least not right away, I'm sure
bugs and optimizations will be submitted).
Well, bugs will almost certainly be found, at least, and if we get a fix as
well, I'd be all the happier. :)
Post by Robert Bradshaw
I'm willing to (co?)-mentor. Stefan? Craig? With the three of us we
could easily mentor at least one student, and perhaps two (if another
really solid proposal/student comes up).
+1, splitting this will make it easier for everyone. And there will be
support from the lists anyway.
Post by Robert Bradshaw
Anyone else willing to
mentor? I haven't pushed on GSoC much this year yet because no one's
stepped up to mentor, but there's still ample time on our side.
Ok, want to sign up the Cython project then?

What became of the Sage proposal, BTW? Will they give a compiled stdlib a try?

Stefan
Robert Bradshaw
2011-03-24 20:03:45 UTC
Post by Stefan Behnel
Post by Robert Bradshaw
Post by Stefan Behnel
Post by Stefan Behnel
Post by Robert Bradshaw
Post by Stefan Behnel
Reimplementing existing C modules in Cython might, however, be more
interesting for python-dev, but also be a larger undertaking. So a GSoC
might be worth running on that.
I think that's a great idea. Would you be willing to mentor such a project?
As usual, I'm not sure I'll have the time, but if no-one else steps up, I'd
consider it.
Should we sign up with the PSF GSoC umbrella to officially propose this
project?
https://spreadsheets.google.com/viewform?formkey=dHh3WFNGYzkyWWE0ZjM1eFFoUUVGWmc6MQ
Do we have any other topics for the GSoC this year? We'd be just in time if
we discuss this at the workshop, although somewhat late already. The student
application deadline is from March 28 to April 8.
http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2011/timeline
There's always our enhancements page, but some of those are fairly
advanced. I think this'd be a really good GSoC project, very modular,
and doesn't require getting to know the internals of the compiler to
start making a serious contribution (at least not right away, I'm sure
bugs and optimizations will be submitted).
Well, almost certainly bugs will be found, at least, and if we get a fix as
well, I'd be the more happy. :)
Post by Robert Bradshaw
I'm willing to (co?)-mentor. Stefan? Craig? With the three of us we
could easily mentor at least one student, and perhaps two (if another
really solid proposal/student comes up).
+1, splitting this will make it easier for everyone. And there will be
support from the lists anyway.
Post by Robert Bradshaw
Anyone else willing to
mentor? I haven't pushed on GSoC much this year yet because no one's
stepped up to mentor, but there's still ample time on our side.
Ok, want to sign up the Cython project then?
Well, I'm talking about under the Python foundation umbrella.
Post by Stefan Behnel
What became of the Sage proposal, BTW? Will they give a compiled stdlib a try?
I completely forgot to ask (too many other things on my mind). We have
a fairly extensive test suite, so I think the thing to do would be to
try it out and see if anything breaks (which will be a good confidence
builder for both Cython and Sage).

- Robert
Stefan Behnel
2011-03-24 20:26:16 UTC
Post by Robert Bradshaw
Post by Stefan Behnel
Post by Robert Bradshaw
Anyone else willing to
mentor? I haven't pushed on GSoC much this year yet because no one's
stepped up to mentor, but there's still ample time on our side.
Ok, want to sign up the Cython project then?
Well, I'm talking about under the Python foundation umbrella.
That's what I meant.

https://spreadsheets.google.com/viewform?formkey=dHh3WFNGYzkyWWE0ZjM1eFFoUUVGWmc6MQ
Post by Robert Bradshaw
Post by Stefan Behnel
What became of the Sage proposal, BTW? Will they give a compiled stdlib a try?
I completely forgot to ask (too many other things on my mind). We have
a fairly extensive test suite, so I think the thing to do would be to
try it out and see if anything breaks (which will be a good confidence
builder for both Cython and Sage).
Sure. I think the first step is to show results, then we can see who likes
them well enough to take the risk.

Stefan
Dan Stromberg
2011-03-22 23:09:44 UTC
I think it's a good idea, but I think it'd be better to use pure mode to get
code that runs either way, or some sort of preprocessor (I've used m4 with
good results for this, though it doesn't syntax highlight nicely) to
automatically derive pure Python and Cython from the same source file.
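
For reference, a pure-mode module that runs either way might look like the sketch below. The `_NoCython` fallback class is a hypothetical shim for machines without Cython installed, not part of Cython itself. When the file is compiled, `cython.locals` is what lets the loop use C integers.

```python
try:
    import cython  # Cython's pure-Python mode helpers
except ImportError:
    class _NoCython:
        """Tiny fallback so the module also imports where Cython is absent."""
        compiled = False
        int = long = object

        @staticmethod
        def locals(**_types):
            # Ignore the type declarations and return the function unchanged.
            return lambda func: func

    cython = _NoCython()


@cython.locals(n=cython.int, i=cython.int, a=cython.long, b=cython.long)
def fib(n):
    """n-th Fibonacci number; the declared names become C integers when compiled."""
    a, b = 0, 1
    for i in range(n):
        a, b = b, a + b
    return a


print(fib(30), "compiled" if cython.compiled else "interpreted")
```

The source file stays valid Python, so the same `.py` can be imported directly or compiled without a preprocessing step.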

For me at least, the branch of Cython that supports generators has worked
flawlessly - except for one buglet that prevented use on 2.5.x. It was
quickly fixed when reported though.

Cython should probably get out of the tiny-version-number phase first,
though. I often feel that open source projects use tiny version numbers for
years as a sort of cop-out, well after people have begun relying on the code
for production use.
Robert Bradshaw
2011-03-22 23:54:30 UTC
Post by Dan Stromberg
I think it's a good idea, but I think it'd be better to use pure mode to get
code that runs either way, or some sort of preprocessor (I've used m4 with
good luck for this, though it doesn't syntax highlight nicely) to
automatically derive pure python and cython from the same source file.
It doesn't hurt to explore the potential before coming up with the
actual solution. Ideally, the .py files would not have to be modified
at all.
Post by Dan Stromberg
For me at least, the branch of Cython that supports generators has worked
flawlessly - except for one buglet that prevented use on 2.5.x.  It was
quickly fixed when reported though.
Cython should probably get out of the tiny-version-number phase first,
though. I often feel that open source projects use tiny version numbers for
years as a sort of cop-out, well after people have begun relying on the code
for production use.
We have a clear 1.0 goal, being able to compile the full Python
language. We're not there yet, but very close. It may make sense at
that point to also clean up any cruft we don't want to continue
supporting forever. I agree, until that point, there's no way we would
be a Python development dependency.

- Robert
Post by Dan Stromberg
Post by Stefan Behnel
Hi,
there seems to be quite some interest in a project to get parts of CPython
and specifically its stdlib rewritten in Cython. I've copied the latest
http://thread.gmane.org/gmane.comp.python.devel/122273/focus=122798
In short, we have strong supporters, but Guido has understandable doubts
against a new (and quite large) dependency and potential semantic
deviations. But there seem to be cases where slight changes would be
acceptable that Cython compiled modules might introduce, such as emitting
different exception messages, changing Python classes into extension
classes, or even preventing monkey patching in modules that are backed by C
modules anyway.
It would be helpful to get support from the side of external distributors
that use Cython already, e.g. Sage, Enthought/SciPy, ActiveState, etc. If
they agreed to test the Cython generated stdlib modules in their
distributions, we could get user feedback that would allow python-dev to
take a well founded decision.
Do we have any volunteers for trying this out? Both on the side of
distributors and implementors?
At the current state of affairs, the implementation could still be
financed by a Python backed GSoC project, although it would be cool if more
users could just step up and simply try to compile and optimise stdlib
modules (preferably without major changes to the code). It's certainly a
great way to show off your Cython skills :). I gave it a try with difflib
and it turned out to be quite easy.
http://blog.behnel.de/index.php?p=155
Reimplementing existing C modules in Cython might, however, be more
interesting for python-dev, but also be a larger undertaking. So a GSoC
might be worth running on that.
Note that the latest Cython release does not have generator support yet,
and Vitja's branch on github is not very stable. We will try to get it up to
speed and merged during the workshop next week, at which point it will make
more sense to get this project started than right now.
Stefan
Thanks for the clarifications. I now have a much better understanding
of what Cython is. But I'm not sold. For one, your attitude about
strict language compatibility worries me when it comes to the stdlib.
Not sure what you mean exactly. Given our large user base, we do worry a
lot about things like backwards compatibility, for example.
If you are referring to compatibility with Python, I don't think anyone
in the project really targets Cython as a a drop-in replacement for a
Python runtime. We aim to compile Python code, yes, and there's a
hand-wavy idea in the back of our head that we may want a plain Python
compatibility mode at some point that will disable several important
optimisations.
I think that's the attitude Guido worries about: if you don't have the
desire to provide 100% Python compatibility under all circumstances
(i.e. including if someone passes parameters of "incorrect" types),
then there is very little chance that we would replace a Python module
with a Cython-compiled one.
The only exception would be cases where the Python semantics is murky
(e.g. where Jython or so actually behaves differently for the same
 Python code, and still claims language conformance). E.g. the exact
message on a TypeError might change when compiling with Cython,
but the cases in which you get a TypeError must not change.
One other significant use case is the situation where we have an
optional replacement module written in C (e.g. heapqmodule.c vs.
heapq.py). There are usually many semantic differences between the C
and pure-python module that we don't care about (e.g. monkeypatching
won't work).
The size of Cython as a dependency and its development speed are still
problems though. In general for the core I don't think we want the
repo to contain generated code that can only be regenerated using a
3rd party dependency. (True, we have a few generated files, e.g.
configure; but in that case the generator -- autoconf -- is a
standard installed tool on Linux and is used by most open source
projects.)
Still, I think it would be great if someone tried something like this
for a specific stdlib module and came back with a story about the
experience, rather than having a theoretical discussion about possible
pros and cons.
_______________________________________________
cython-devel mailing list
http://mail.python.org/mailman/listinfo/cython-devel
Craig Citro
2011-03-23 07:11:09 UTC
Permalink
Post by Robert Bradshaw
We have a clear 1.0 goal, being able to compile the full Python
language. We're not there yet, but very close. It may make sense at
that point to also clean up any cruft we don't want to continue
supporting forever. I agree, until that point, there's no way we would
be a Python development dependency.
Are we aiming for 100% compatibility, or 99.9%? For instance, I seem to
recall a few dark corners we aren't planning on covering -- maybe some
details of traceback construction? (I want to say multiple inheritance, too,
but I think that's only an issue for cdef classes, right?) I think it would
be good to have this written down -- in particular, it seems like there's
some momentum right now for clearly delineating "Python language semantics"
vs. "CPython implementation detail" in the python-devel community, so it
might be a particularly good time to raise these questions.

-cc
Stefan Behnel
2011-03-23 07:45:01 UTC
Permalink
Post by Craig Citro
Post by Robert Bradshaw
We have a clear 1.0 goal, being able to compile the full Python
language. We're not there yet, but very close. It may make sense at
that point to also clean up any cruft we don't want to continue
supporting forever. I agree, until that point, there's no way we would
be a Python development dependency.
Are we aiming for 100% compatibility, or 99.9%? For instance, I seem to
recall a few dark corners we aren't planning on covering
I think it's more like 98%. CPython, PyPy and friends consider the more
obscure features like frame access a core Python language feature, but I
don't see an interest in making that work in Cython. If someone wants to
write the code, fine, as long as it's switched off outside of a future
"strict compatibility mode". I've never seen this used in any code that I
would have wanted to compile with Cython, and even in Python code it's only
really used for more or less "clever hacks".
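
For readers unfamiliar with the feature, a minimal sketch of the kind of
frame access meant here. It relies on sys._getframe, which the Python
documentation itself describes as a CPython implementation detail. The
function names are made up for illustration.

```python
import sys


def current_caller():
    """Return the name of the function that called this one."""
    # Frame 0 is current_caller itself, frame 1 is its caller.
    return sys._getframe(1).f_code.co_name


def outer():
    return current_caller()


# Prints the caller's name when run under the CPython interpreter.
print(outer())
```

Code like this depends on real interpreter frames existing for every call,
which is exactly what compiled code would rather not pay for.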
Post by Craig Citro
maybe some details of traceback construction?
What I would like to see here is simply a fallback of the frames that we
create for tracebacks to look for a .py file (or even .pyx/.pxi file) next
to the compiled .so file, so that they can print the source line for the
executed line in the .so.
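
A rough sketch of such a fallback, not how Cython implements it. It
assumes the compiled file is named plainly (e.g. demo.so, with no platform
tag in the name) and uses linecache, the stdlib module that the traceback
module uses to fetch source lines. find_source and source_line are
hypothetical helpers.

```python
import linecache
import os
import tempfile


def find_source(compiled_path, extensions=(".py", ".pyx", ".pxi")):
    """Look for a source file sitting next to the compiled module."""
    base = os.path.splitext(compiled_path)[0]
    for ext in extensions:
        candidate = base + ext
        if os.path.exists(candidate):
            return candidate
    return None


def source_line(compiled_path, lineno):
    """Return the stripped source line, or '' if no source is found."""
    source = find_source(compiled_path)
    if source is None:
        return ""
    return linecache.getline(source, lineno).strip()


# Demo: only the .py file exists; the .so path is just used for lookup.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "demo.py"), "w") as f:
    f.write("a = 1\nb = a + 1\n")

print(source_line(os.path.join(demo_dir, "demo.so"), 2))
```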

Apart from that, it boils down to the issue of frame construction. See
above. Keep in mind that CPython's own C code currently gives you no frames
at all, so Cython modules already have a serious advantage over what's
there today.
Post by Craig Citro
(I want to say multiple inheritance, too,
but I think that's only an issue for cdef classes, right?)
Yes. That's how things work in CPython at the C level, and I'm fine with that.
Post by Craig Citro
I think it would
be good to have this written down -- in particular, it seems like there's
some momentum right now for clearly delineating "Python language semantics"
vs. "CPython implementation detail" in the python-devel community, so it
might be a particularly good time to raise these questions.
+1, a "where do we stand?" topic is planned for the workshop anyway.

Stefan
Vitja Makarov
2011-03-23 07:52:08 UTC
Permalink
Post by Stefan Behnel
Post by Craig Citro
Post by Robert Bradshaw
We have a clear 1.0 goal, being able to compile the full Python
language. We're not there yet, but very close. It may make sense at
that point to also clean up any cruft we don't want to continue
supporting forever. I agree, until that point, there's no way we would
be a Python development dependency.
Are we aiming for 100% compatibility, or 99.9%? For instance, I seem to
recall a few dark corners we aren't planning on covering
I think it's more like 98%. CPython, PyPy and friends consider the more
obscure features like frame access a core Python language feature, but I
don't see an interest in making that work in Cython. If someone wants to
write the code, fine, as long as it's switched off outside of a future
"strict compatibility mode". I've never seen this used in any code that I
would have wanted to compile with Cython, and even in Python code it's only
really used for more or less "clever hacks".
Post by Craig Citro
maybe some details of traceback construction?
What I would like to see here is simply a fallback of the frames that we
create for tracebacks to look for a .py file (or even .pyx/.pxi file) next
to the compiled .so file, so that they can print the source line for the
executed line in the .so.
Apart from that, it boils down to the issue of frame construction. See
above. Keep in mind that CPython's own C code currently gives you no frames
at all, so Cython modules already have a serious advantage over what's there
today.
Post by Craig Citro
(I want to say multiple inheritance, too,
but I think that's only an issue for cdef classes, right?)
Yes. That's how things work in CPython at the C level, and I'm fine with that.
Post by Craig Citro
I think it would
be good to have this written down -- in particular, it seems like there's
some momentum right now for clearly delineating "Python language semantics"
vs. "CPython implementation detail" in the python-devel community, so it
might be a particularly good time to raise these questions.
+1, a "where do we stand?" topic is planned for the workshop anyway.
One more interesting thing is introspection support via inspect module
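
As a small illustration of why this matters: the inspect module already
treats functions implemented in C differently from Python functions. The
sketch below uses only the builtin len and a made-up py_func; how compiled
Cython functions should answer these questions is the open point.

```python
import inspect


def py_func(a, b=2):
    """Plain Python function, used only for comparison."""
    return a + b


# A function defined in Python is a "function" for inspect.
print(inspect.isfunction(py_func))
# A function implemented in C, such as len, is not.
print(inspect.isfunction(len))
# It is reported as a builtin instead.
print(inspect.isbuiltin(len))
```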
--
vitja.
Robert Bradshaw
2011-03-24 19:38:23 UTC
Permalink
Post by Stefan Behnel
Post by Craig Citro
Post by Robert Bradshaw
We have a clear 1.0 goal, being able to compile the full Python
language. We're not there yet, but very close. It may make sense at
that point to also clean up any cruft we don't want to continue
supporting forever. I agree, until that point, there's no way we would
be a Python development dependency.
Are we aiming for 100% compatibility, or 99.9%? For instance, I seem to
recall a few dark corners we aren't planning on covering
I started a list at http://wiki.cython.org/Unsupported . I'd say we
can be as compatible as Jython/IronPython is, and more than CPython is
between minor versions. I would be happy with a short, well-justified
list of differences. This will be clearer once the community (which
we're a part of) defines what Python vs. implementation details means.
Post by Stefan Behnel
I think it's more like 98%. CPython, PyPy and friends consider the more
obscure features like frame access a core Python language feature, but I
don't see an interest in making that work in Cython. If someone wants to
write the code, fine, as long as it's switched off outside of a future
"strict compatibility mode". I've never seen this used in any code that I
would have wanted to compile with Cython, and even in Python code it's only
really used for more or less "clever hacks".
Post by Craig Citro
maybe some details of traceback construction?
What I would like to see here is simply a fallback of the frames that we
create for tracebacks to look for a .py file (or even .pyx/.pxi file) next
to the compiled .so file, so that they can print the source line for the
executed line in the .so.
Apart from that, it boils down to the issue of frame construction. See
above. Keep in mind that CPython's own C code currently gives you no frames
at all, so Cython modules already have a serious advantage over what's there
today.
Post by Craig Citro
(I want to say multiple inheritance, too,
but I think that's only an issue for cdef classes, right?)
Yes. That's how things work in CPython at the C level, and I'm fine with that.
We have multiple inheritance for normal classes, so that's covered.
Any auto-cdeffing of classes would have to be done carefully.
Post by Stefan Behnel
Post by Craig Citro
I think it would
be good to have this written down -- in particular, it seems like there's
some momentum right now for clearly delineating "Python language semantics"
vs. "CPython implementation detail" in the python-devel community, so it
might be a particularly good time to raise these questions.
+1, a "where do we stand?" topic is planned for the workshop anyway.
+1

- Robert
Sturla Molden
2011-03-25 13:03:54 UTC
Permalink
Post by Robert Bradshaw
I started a list at http://wiki.cython.org/Unsupported . I'd say we
can be as compatible as Jython/IronPython is, and more than CPython is
between minor versions. I would be happy with a short, well-justified
list of differences. This will be clearer once the community (which
we're a part of) defines what Python vs. implementation details means.
Looking at Guido's comment, Cython must be able to compile all valid
Python if this will have any chance of success.

Is the plan to include Cython in the standard library? I don't think a
large external dependency like Cython will be accepted unless it's a
part of the CPython distribution.

Why stop with the standard library? Why not implement the whole CPython
interpreter in Cython?


Sturla
Stefan Behnel
2011-03-25 13:42:39 UTC
Permalink
Post by Robert Bradshaw
I started a list at http://wiki.cython.org/Unsupported . I'd say we
can be as compatible as Jython/IronPython is, and more than CPython is
between minor versions. I would be happy with a short, well-justified
list of differences. This will be clearer once the community (which
we're a part of) defines what Python vs. implementation details means.
Looking at Guido's comment, Cython must be able to compile all valid Python
if this will have any chance of success.
I think there are two levels of required compatibility, the lower one of
which applies to a reimplementation of C modules in Cython. That's the
reason why I see that as the more worthwhile goal for now.
Is the plan to include Cython in the standard library? I don't think a
large external dependency like Cython will be accepted unless it's a part
of the CPython distribution.
It was the plan a while ago, but the problem is that Cython's evolution
(and stability) isn't very much in line with that of CPython. I'm not as
pessimistic as you seem to be, though. From the POV of CPython, Cython is
basically just a build tool. We could try to come up with a release series
that we consider stable enough to base CPython on it. If we agree to
maintain that for the whole lifetime of the corresponding CPython
release(s), it wouldn't necessarily have to be part of CPython itself
(although it could...)
Why stop with the standard library? Why not implement the whole CPython
interpreter in Cython?
No-one said we'd stop there. ;)

However, going down that route will quickly turn into a problem of
boot-strapping. How would you run Cython without a Python interpreter?
Would you need an old interpreter installed in order to run Cython to build
a new one? Currently, you only need a C compiler and cmmi tools. I could
easily understand anyone who'd reject entering into such a recursive
dependency.

Stefan
Robert Bradshaw
2011-03-25 18:03:51 UTC
Permalink
Post by Stefan Behnel
Post by Sturla Molden
Post by Robert Bradshaw
I started a list at http://wiki.cython.org/Unsupported . I'd say we
can be as compatible as Jython/IronPython is, and more than CPython is
between minor versions. I would be happy with a short, well-justified
list of differences. This will be clearer once the community (which
we're a part of) defines what Python vs. implementation details means.
Looking at Guido's comment, Cython must be able to compile all valid
Python if this will have any chance of success.
Good thing that's our goal (pending an actual definition of "all valid Python.")
Post by Stefan Behnel
I think there are two levels of required compatibility, the lower one of
which applies to a reimplementation of C modules in Cython. That's the
reason why I see that as the more worthwhile goal for now.
Post by Sturla Molden
Is the plan to include Cython in the standard library? I don't think a
large external dependency like Cython will be accepted unless it's a part
of the CPython distribution.
It was the plan a while ago, but the problem is that Cython's evolution (and
stability) isn't very much in line with that of CPython. I'm not as
pessimistic as you seem to be, though. From the POV of CPython, Cython is
basically just a build tool. We could try to come up with a release series
that we consider stable enough to base CPython on it. If we agree to
maintain that for the whole lifetime of the corresponding CPython
release(s), it wouldn't necessarily have to be part of CPython itself
(although it could...)
As a first step, CPython would just ship the generated C, only
requiring Cython as a development dependency, not a build dependency.
This would also be safer from the stability POV.
Post by Stefan Behnel
Post by Sturla Molden
Why stop with the standard library? Why not implement the whole CPython
interpreter in Cython?
No-one said we'd stop there. ;)
Of course the CPython interpreter is a large, fairly-optimized C
codebase already, so the pitfalls of just re-writing it for the sake
of re-writing it probably outweigh the gains.
Post by Stefan Behnel
However, going down that route will quickly turn into a problem of
boot-strapping. How would you run Cython without a Python interpreter? Would
you need an old interpreter installed in order to run Cython to build a new
one? Currently, you only need a C compiler and cmmi tools. I could easily
understand anyone who'd reject entering into such a recursive dependency.
It would be as easy as bootstrapping gcc... :).

- Robert
Sturla Molden
2011-03-27 22:39:02 UTC
Permalink
Post by Robert Bradshaw
Post by Sturla Molden
Looking at Guido's comment, Cython must be able to compile all valid
Python if this will have any chance of success.
Good thing that's our goal (pending an actual definition of "all valid Python.")
In the absence of a Python language specification it can be hard to tell
implementation details from syntax. It sounded though as if Guido was
worried about Cython's compatibility with Python, and maybe the Cython
dev team's attitude to Python compatibility.

I also don't think Cython's main strength in this context was properly
clarified in the debate. It is easy to over-focus on "speed", when it's
really a matter of "writing Python C extensions easily" -- i.e. without
knowing (a lot) about Python's C API, not having to worry about
reference counting, and the possibility of using Python code as
prototype. Cython is, without comparison, the easiest way of writing C
extensions for Python. FWIW, it's easier to use Cython than ctypes.
Using Cython instead of the C API will also avoid many programming
errors, because a compiler makes fewer mistakes than a human. Those
aspects are important to communicate, not just "Cython can be as fast as
C++".

Sturla
David Cournapeau
2011-03-28 04:46:01 UTC
Permalink
Post by Sturla Molden
Cython is,
without comparison, the easiest way of writing C extensions for Python.
FWIW, it's easier to use Cython than ctypes. Using Cython instead of the C
API will also avoid many programming errors, because a compiler makes fewer
mistakes than a human.
Agreed. I did not jump in in the discussion on python-dev because I am
not involved in cython development, but I felt that this point may not
have been obvious for people unfamiliar with cython. Being able to
have less C in python itself sounds like a better goal to me, and even
more useful for alternative implementations than compiling the stdlib
(porting cython must be easier than porting C in almost every case).
One could imagine using cython for most stuff in Modules (I don't know
how much from Modules would be needed for cython itself to solve the
bootstrap issue).

Cython helps making things fast, but it also removes the need to do
raw C wrappers in most cases.

cheers,

David
Robert Bradshaw
2011-03-29 00:09:08 UTC
Permalink
Post by Sturla Molden
Post by Robert Bradshaw
Post by Sturla Molden
Looking at Guido's comment, Cython must be able to compile all valid
Python if this will have any chance of success.
Good thing that's our goal (pending an actual definition of "all valid Python.")
In the absence of a Python language specification it can be hard to tell
implementation details from syntax. It sounded though as if Guido was
worried about Cython's compatibility with Python, and maybe the Cython dev
team's attitude to Python compatibility.
We are very concerned about Python compatibility.
Post by Sturla Molden
I also don't think Cython's main strength in this context was properly
clarified in the debate. It is easy to over-focus on "speed", when it's
really a matter of "writing Python C extensions easily" -- i.e. without
knowing (a lot) about Python's C API, not having to worry about reference
counting, and the possibility of using Python code as prototype. Cython is,
without comparison, the easiest way of writing C extensions for Python.
FWIW, it's easier to use Cython than ctypes. Using Cython instead of the C
API will also avoid many programming errors, because a compiler makes fewer
mistakes than a human. Those aspects are important to communicate, not just
"Cython can be as fast as C++".
That is a good point. On the other hand, I don't see re-implementing
working C modules, though probably valuable from a maintenance
point of view, as compelling of a use case.

- Robert
Stefan Behnel
2011-03-29 06:22:42 UTC
Permalink
Post by Robert Bradshaw
I don't see re-implementing
working C modules, though probably valuable from a maintenance
point of view, as compelling of a use case.
It would be rather helpful for CPython, though. Many stdlib modules lack
dedicated maintainers, and it's likely easier in the Python world to find a
maintainer for code that's "mostly Python", than for code that's C plus
CPython's C-API.

Stefan
Sturla Molden
2011-03-29 09:45:12 UTC
Permalink
Post by Robert Bradshaw
We are very concerned about Python compatibility.
I did not intend to say you are not.

Judging from Guido's answer to Stefan, I think Guido is worried you are
not.

And that, BTW, is sufficient to prevent the use of Cython in CPython stdlib.

Sturla

Greg Ewing
2011-03-26 00:57:08 UTC
Permalink
Post by Sturla Molden
Why stop with the standard library? Why not implement the whole CPython
interpreter in Cython?
That would be tricky, because the code that Cython generates
depends on large chunks of CPython as an infrastructure.
--
Greg
Stefan Behnel
2011-03-23 07:31:07 UTC
Permalink
Post by Robert Bradshaw
Post by Dan Stromberg
I think it's a good idea, but I think it'd be better to use pure mode to get
code that runs either way, or some sort of preprocessor (I've used m4 with
good luck for this, though it doesn't syntax highlight nicely) to
automatically derive pure python and cython from the same source file.
It doesn't hurt to explore the potential before coming up with the
actual solution. Ideally, the .py files would not have to be modified
at all.
Or only slightly, in an acceptable way (whatever that means in a given
context). For difflib, for example, I could clean up a couple of things
that I'd also frown upon in Python code, like taking off a bound method for
__contains__, instead of using a straight and obvious 'in' test.
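
To make the difflib remark concrete, a simplified sketch of the two styles.
This is not the actual difflib code; the function names and data are
invented for illustration.

```python
def count_junk_bound_method(items, junk):
    """Old-style micro-optimisation: take the bound method off the set."""
    isjunk = junk.__contains__
    count = 0
    for item in items:
        if isjunk(item):
            count += 1
    return count


def count_junk_in_test(items, junk):
    """Straightforward version using a plain 'in' test."""
    count = 0
    for item in items:
        if item in junk:
            count += 1
    return count


junk = {" ", "\t"}
text = "a b\tc d"
print(count_junk_bound_method(text, junk))
print(count_junk_in_test(text, junk))
```

Both functions return the same count. The second form is easier to read,
and a compiler can recognise the 'in' test directly instead of having to
treat it as an opaque method call.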
Post by Robert Bradshaw
Post by Dan Stromberg
For me at least, the branch of Cython that supports generators has worked
flawlessly
It certainly has bugs. For example, I get C compiler warnings when
compiling the optimised difflib. And it has disabled one of my favourite
features, inlined generator expressions. We'll try to fix it up during the
workshop next week.

Stefan