Post by Arthur de Souza Ribeiro
Hi Stefan, about your first comment: "And it's better to let Cython know
that this name refers to a function." in line 69 of the encoder.pyx file, I
didn't understand well what that means; could you explain this comment a
bit more?
Hmm, sorry, I think that was not so important. That code line is only used
to override the Python implementation with the implementation from the
external C accelerator module. To do that, it assigns one of the two
functions to a name. So, when that name is called in the code, Cython cannot
know that it actually is a function and has to resort to a generic Python
call, whereas a visible c(p)def function defined inside of the same module
could be called faster.
I missed the fact that this name isn't really used inside of the module, so
whether Cython knows that it's a function or not isn't really all that
important.
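For context, the line in question follows the usual CPython accelerator-override pattern; a minimal sketch of it (with a simplified, hypothetical stand-in for the pure-Python function) looks like this:

```python
def py_encode_basestring_ascii(s):
    # simplified stand-in for the pure-Python implementation
    return '"' + s + '"'

try:
    # use the C accelerator when it is available
    from _json import encode_basestring_ascii as c_encode_basestring_ascii
except ImportError:
    c_encode_basestring_ascii = None

# Bind whichever implementation exists to a single name. From here on,
# Cython only sees an arbitrary object behind that name and must use
# generic Python calling to invoke it.
encode_basestring_ascii = (c_encode_basestring_ascii
                           or py_encode_basestring_ascii)

print(encode_basestring_ascii("abc"))  # -> "abc" (with the quotes)
```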
So I don't have to worry about this, right?
Post by Arthur de Souza Ribeiro
https://github.com/arthursribeiro/JSON-module/commit/e2d80e0aeab6d39ff2d9b847843423ebdb9c57b7#diff-4
I saw your comment, and what I understood from it is that the aliases being
assigned to the type names make the code slower. I tried to compile the code
in Cython the same way it was written in Python, but there is something
wrong with it. It says:
Error compiling Cython file:
------------------------------------------------------------
...
def _make_iterencode(dict markers, _default, _encoder, _indent, _floatstr,
_key_separator, _item_separator, bint _sort_keys, bint _skipkeys,
bint _one_shot,
## HACK: hand-optimized bytecode; turn globals into locals
ValueError=ValueError,
dict=dict,
float=float,
^
------------------------------------------------------------
encoder.pyx:273:13: Empty declarator
I changed it that way because I think the user may want to change which
types are going to be used, and Cython does not allow these things the way
Python does (for reserved words).
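For what it's worth, the signature that triggers the error is CPython's hand-optimised globals-into-locals hack: binding builtins as default arguments turns slow global lookups into fast local lookups. In plain Python the trick looks like this (a minimal sketch with a hypothetical helper function):

```python
# Binding isinstance and float as default arguments makes them local
# names inside the function, which CPython looks up much faster than
# module globals / builtins.
def count_floats(values, isinstance=isinstance, float=float):
    n = 0
    for v in values:
        if isinstance(v, float):
            n += 1
    return n

print(count_floats([1, 2.0, "x", 3.5]))  # -> 2
```

Cython, however, parses names like `dict` and `float` in a parameter list as type declarations, so `dict=dict` reads as a typed parameter with no name, which is what the "Empty declarator" message means. Renaming the parameters (e.g. `float_=float`) would presumably compile, but since Cython already optimises lookups of builtins itself, the whole hack can likely just be dropped in the .pyx version.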
Post by Arthur de Souza Ribeiro
About the other comments, I think I solved them all. If there is any problem
with them or other ones, please tell me; I'll try to fix it.
It looks like you fixed a good deal of them.
I actually tried to work with your code, but I'm not sure how you are
building it. Could you give me a hint on that?
I'm building them manually using setup.py files: for every module I create
one and build it by hand. I don't think that's the best way to do it, but
for testing things, that's the way I'm doing it.
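For reference, a minimal setup.py of the kind described might look like this (module and file names are placeholders; assumes Cython is installed):

```python
# setup.py -- build in place with: python setup.py build_ext --inplace
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

setup(
    name="encoder",
    cmdclass={"build_ext": build_ext},
    ext_modules=[Extension("encoder", ["encoder.pyx"])],
)
```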
Post by Arthur de Souza Ribeiro
Where did you actually take the original code from? Python 3.2? Or from
Python's hg branch?
I took the original code from Python 3.2
Post by Arthur de Souza Ribeiro
Post by Stefan Behnel
Note that it's not obvious from your initial commit what you actually
changed. It would have been better to import the original file first, rename
it to .pyx, and then commit your changes.
I created a directory named 'Diff files' where I put the files generated by
the 'diff' command that I ran on my computer. If you still think it would be
better to commit first and then change, that's no problem for me...
Diff only gives you the final outcome. Committing on top of the original
files has the advantage of making the incremental changes visible
separately. That makes it clearer what you tried, and a good commit comment
will then make it clear why you did it.
Post by Arthur de Souza Ribeiro
Post by Stefan Behnel
I think it's more important to get some performance
numbers to see how your module behaves compared to the C accelerator module
(_json.c). I think the best approach to this project would actually be to
start with profiling the Python implementation to see where performance
problems occur (or to look through _json.c to see what the CPython
developers considered performance critical), and then put the focus on
trying to speed up only those parts of the Python implementation, by adding
static types and potentially even rewriting them in a way that Cython can
optimise them better.
I've profiled the module I created and the module that is in Python 3.2;
the result is that the Cython module spent about 73% less time than Python's.
That's a common mistake when profiling: the actual time it takes to run is
not meaningful. Depending on how far the two profiled programs differ, they
may interact with the profiler in more or less intensive ways (as is clearly
the case here), so the total time it takes for the programs to run can
differ quite heavily under profiling, even if the non-profiled programs run
at exactly the same speed.
Also, I don't think you have enabled profiling for the Cython code. You can
do that by passing the "profile=True" directive to the compiler, or by
putting it at the top of the source files. That will add module-inner
function calls to the profiling output. Note, however, that enabling
profiling will slow down the execution, so disable it when you measure
absolute run times.
http://docs.cython.org/src/tutorial/profiling_tutorial.html
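The profile listings below can be reproduced with cProfile/pstats along these lines (the test input is a hypothetical stand-in for the nested_dict case; for the Cython modules, the profile=True directive mentioned above is also needed):

```python
import cProfile
import pstats
import json

# hypothetical stand-in for the nested_dict test case
data = {"key%d" % i: {"a": i, "b": [1, 2, 3]} for i in range(100)}

profiler = cProfile.Profile()
profiler.enable()
for _ in range(10000):
    json.dumps(data)
profiler.disable()
profiler.dump_stats("Profile.prof")

# print the most expensive calls, sorted by time spent in the call itself
pstats.Stats("Profile.prof").sort_stats("tottime").print_stats(10)
```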
As you said, I hadn't enabled profiling for the Cython code. I did so and
got a larger number of function calls compared to the old runs. I also added
a new test case for a list object and profiled the code; as you said, the
runs differ exactly in the number of calls to the isinstance function. The
results look like this:
------------------------------------ USING Profiler
------------------------------------
JSONModule nested_dict
Tue Apr 19 23:16:48 2011 Profile.prof
200003 function calls in 0.964 seconds
Random listing order was used
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.964 0.964 {built-in method exec}
50000 0.114 0.000 0.804 0.000 __init__.pyx:179(dumps)
50000 0.217 0.000 0.690 0.000 encoder.pyx:193(encode)
50000 0.473 0.000 0.473 0.000 encoder.pyx:214(iterencode)
50000 0.089 0.000 0.893 0.000 {JSONModule.dumps}
1 0.071 0.071 0.964 0.964 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of
'_lsprof.Profiler' objects}
json nested_dict
Tue Apr 19 23:16:49 2011 Profile.prof
300003 function calls in 1.350 seconds
Random listing order was used
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 1.350 1.350 {built-in method exec}
50000 0.157 0.000 1.255 0.000 __init__.py:180(dumps)
50000 0.115 0.000 0.115 0.000 {method 'join' of 'str'
objects}
50000 0.558 0.000 0.558 0.000 encoder.py:193(iterencode)
50000 0.317 0.000 1.099 0.000 encoder.py:172(encode)
1 0.094 0.094 1.350 1.350 <string>:1(<module>)
100000 0.108 0.000 0.108 0.000 {built-in method isinstance}
1 0.000 0.000 0.000 0.000 {method 'disable' of
'_lsprof.Profiler' objects}
JSONModule ustring
Tue Apr 19 23:16:49 2011 Profile.prof
150003 function calls in 0.297 seconds
Random listing order was used
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.297 0.297 {built-in method exec}
50000 0.099 0.000 0.160 0.000 __init__.pyx:179(dumps)
50000 0.061 0.000 0.061 0.000 encoder.pyx:193(encode)
50000 0.082 0.000 0.242 0.000 {JSONModule.dumps}
1 0.055 0.055 0.297 0.297 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of
'_lsprof.Profiler' objects}
json ustring
Tue Apr 19 23:16:50 2011 Profile.prof
200003 function calls in 0.419 seconds
Random listing order was used
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.419 0.419 {built-in method exec}
50000 0.118 0.000 0.346 0.000 __init__.py:180(dumps)
50000 0.052 0.000 0.052 0.000 {built-in method
encode_basestring_ascii}
50000 0.138 0.000 0.228 0.000 encoder.py:172(encode)
1 0.072 0.072 0.419 0.419 <string>:1(<module>)
50000 0.038 0.000 0.038 0.000 {built-in method isinstance}
1 0.000 0.000 0.000 0.000 {method 'disable' of
'_lsprof.Profiler' objects}
JSONModule xlist
Tue Apr 19 23:16:50 2011 Profile.prof
200003 function calls in 0.651 seconds
Random listing order was used
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.651 0.651 {built-in method exec}
50000 0.108 0.000 0.500 0.000 __init__.pyx:179(dumps)
50000 0.154 0.000 0.392 0.000 encoder.pyx:193(encode)
50000 0.238 0.000 0.238 0.000 encoder.pyx:214(iterencode)
50000 0.086 0.000 0.585 0.000 {JSONModule.dumps}
1 0.065 0.065 0.651 0.651 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of
'_lsprof.Profiler' objects}
json xlist
Tue Apr 19 23:16:51 2011 Profile.prof
300003 function calls in 1.029 seconds
Random listing order was used
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 1.029 1.029 {built-in method exec}
50000 0.145 0.000 0.940 0.000 __init__.py:180(dumps)
50000 0.062 0.000 0.062 0.000 {method 'join' of 'str'
objects}
50000 0.323 0.000 0.323 0.000 encoder.py:193(iterencode)
50000 0.304 0.000 0.795 0.000 encoder.py:172(encode)
1 0.089 0.089 1.029 1.029 <string>:1(<module>)
100000 0.106 0.000 0.106 0.000 {built-in method isinstance}
1 0.000 0.000 0.000 0.000 {method 'disable' of
'_lsprof.Profiler' objects}
----------------------------------------------------------------------------------------
Post by Arthur de Souza Ribeiro
Colours tend to pass rather badly through mailing lists. Many people
disable the HTML presentation of e-mails, and plain text does not have
colours. But it was still obvious enough what you meant.
Sorry about this.
Post by Arthur de Souza Ribeiro
Thank you for the numbers. Could you add absolute timings using timeit? And
maybe also try with larger input data?
Using timeit, I got the following output:
------------------------------------ USING Timeit
--------------------------------------
JSONModule nested_dict spent 11.39 usec/pass (cython)
JSONModule ustring spent 0.94 usec/pass (cython)
JSONModule xlist spent 5.71 usec/pass (cython)
json nested_dict spent 16.61 usec/pass
json ustring spent 1.88 usec/pass
json xlist spent 10.38 usec/pass
----------------------------------------------------------------------------------------
The testcases are the same of the profile ones.
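For the record, usec/pass figures like the ones above can be produced with something like this (the input data is again a hypothetical stand-in for the nested_dict test case):

```python
import json
import timeit

# hypothetical stand-in for the nested_dict test case
data = {"key%d" % i: {"a": i, "b": [1, 2, 3]} for i in range(10)}

n = 100000
total = timeit.timeit(lambda: json.dumps(data), number=n)
print("json nested_dict spent %.2f usec/pass" % (total / n * 1e6))
```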
Post by Arthur de Souza Ribeiro
ISTM that a lot of overhead comes from calls that Cython can easily
optimise all by itself: isinstance() and (bytes|unicode).join(). That's the
kind of observation that previously let me suggest to start by benchmarking
and profiling in the first place. Cython compiled code has quite different
performance characteristics from code executing in CPython's interpreter, so
it's important to start by getting an idea of how the code behaves when
compiled, and then optimising it in the places where it still needs to run
faster.
As you said, starting with profiling is a better approach, especially
because every change made is reflected in the timings (using timeit or
profiling).
Post by Arthur de Souza Ribeiro
Optimisation is an incremental process, and you will often end up reverting
changes along the way when you see that they did not improve the
performance, or maybe just made it so slightly faster that the speed
improvement is not worth the code degradation of the optimisation change in
question.
Could you try to come up with a short list of important code changes you
made that let this module run faster, backed by some timings that show the
difference with and without each change?
To make this module run faster, the best changes were in the class
definitions (I'm going to show numbers soon): using __cinit__ and declaring
the attributes made the code faster. Other changes I made were in for loops:
I created an int variable and looped over a range of a number instead of
using a 'for x in ...' statement. The remaining ones were about adding
static types, especially int and boolean types.
Thank you very much again.
Best Regards.
[]s
Arthur