[Cython] Cython circular cdef import patch

Discussion:

Gary Furnish

2008-03-25 18:17:06 UTC

This patch adds extra logic to code generation to sort dependencies to
guarantee that C code is generated in the right order for circular cdef
imports. It does NOT solve python circular import issues.
a.pyx: cimport b
b.pyx: cimport a
Is thus legal, but you can not reference a or b in the global namespace of
b.pyx or a.pyx (such as to instantiate a class).
This patch also modifies cython to output the exact line in the C code where
an exception was thrown in addition to the currently displayed pyx file and
line. This enables significantly faster developmental debugging.
Finally it splits module initialization into two phases: one that initiates
types and handles imports, and another that executes python commands at the
global namespace level. This will be more useful as Cython starts to assume
more advanced optimization and code generation features
The patch is available at:
http://trac.sagemath.org/sage_trac/ticket/2655(the third attachment
only) and is based against 0.9.6.12,
although I can rebase if needed. I am hoping this can be merged into
Cython.

--Gary Furnish

Stefan Behnel

2008-04-08 17:50:31 UTC

Permalink

Hi,

Post by Gary Furnish
This patch adds extra logic to code generation to sort dependencies to
guarantee that C code is generated in the right order for circular cdef
imports. It does NOT solve python circular import issues.
a.pyx: cimport b
b.pyx: cimport a
Is thus legal, but you can not reference a or b in the global namespace of
b.pyx or a.pyx (such as to instantiate a class).
This patch also modifies cython to output the exact line in the C code where
an exception was thrown in addition to the currently displayed pyx file and
line. This enables significantly faster developmental debugging.
Finally it splits module initialization into two phases: one that initiates
types and handles imports, and another that executes python commands at the
global namespace level. This will be more useful as Cython starts to assume
more advanced optimization and code generation features
http://trac.sagemath.org/sage_trac/ticket/2655(the third attachment
only) and is based against 0.9.6.12,
although I can rebase if needed. I am hoping this can be merged into
Cython.

The patch doesn't work for me. When I fix the problems with the header file
generation, I end up with C code that defines the "__pyx_obj_*" class structs
in the wrong order. Note that there are no circular dependencies involved in
my case and that the two classes where the problem occurs are defined in the
same source file (although their highest base class is defined in a separate
file, not sure if that matters.

Personally, seeing all problems I have with this patch and not really seeing a
clear gain, I would just revert the patch for now.

Stefan

Gary Furnish

2008-04-08 22:00:54 UTC

Permalink

Can you produce a testcase for this issue? It is *critical* for fast
symbolics in Sage, so I would prefer to fix the bug as opposed to reverting.
I see the problem now, but I will need the test cases you are using to
produce a fix. Once I have them it should be a relatively easy patch.

Post by Gary Furnish
Hi,

Post by Gary Furnish
b.pyx or a.pyx (such as to instantiate a class).
This patch also modifies cython to output the exact line in the C code

where

Post by Gary Furnish
an exception was thrown in addition to the currently displayed pyx file

and

Post by Gary Furnish
line. This enables significantly faster developmental debugging.
Finally it splits module initialization into two phases: one that

initiates

Post by Gary Furnish
types and handles imports, and another that executes python commands at

the

Post by Gary Furnish
global namespace level. This will be more useful as Cython starts to

assume

Post by Gary Furnish
more advanced optimization and code generation features
http://trac.sagemath.org/sage_trac/ticket/2655(the<http://trac.sagemath.org/sage_trac/ticket/2655%28the>third attachment
only) and is based against 0.9.6.12,
although I can rebase if needed. I am hoping this can be merged into
Cython.

The patch doesn't work for me. When I fix the problems with the header file
generation, I end up with C code that defines the "__pyx_obj_*" class structs
in the wrong order. Note that there are no circular dependencies involved in
my case and that the two classes where the problem occurs are defined in the
same source file (although their highest base class is defined in a separate
file, not sure if that matters.
Personally, seeing all problems I have with this patch and not really seeing a
clear gain, I would just revert the patch for now.
Stefan

Stefan Behnel

2008-04-09 10:57:01 UTC

Permalink

Post by Gary Furnish
Can you produce a testcase for this issue? It is *critical* for fast
symbolics in Sage, so I would prefer to fix the bug as opposed to reverting.
I see the problem now, but I will need the test cases you are using to
produce a fix. Once I have them it should be a relatively easy patch.

Personally, I do not consider the patch that was applied a clean patch, as
it mixes a number of independent smaller and bigger changes - and a subset
of them has proven to be problematic.

So I attached a changeset that reverts the parts that I think were related
to the behaviour I see and that does some minor source cleanup of the
affected areas. I had to do that by hand, as the three related changesets
(one by you, two by Robert) are split around the package merge, which
makes it really hard to get them reverted cleanly.

I will try to come up with a test case - which may take a bit longer as I
am not sure the current test suite is ready to handle multi-file tests
(although a mix of .pyx and .pxd should work, I'll have to check that). I
would ask you to provide a clean patch against my reverted source version,
so that we have a single clean changeset of what you wanted to achieve,
possibly followed by one or more bug fix commits that eventually lead to a
working patch.

I'm not questioning the intention of your changes in general, I'm just
concerned about a) getting the bugs out and b) making it traceable in the
revision history what change had what effects. Smaller, separate
self-contained commits are always better than a mix of different changes
in a big patch.

BTW, what is the "init2" function for? From what I read, you seem to have
cut it out of the original module init function. What's its purpose?
(implying: what would be a clearer name than "init2"?) And: why is it
declared "PyMODINIT_FUNC" instead of plain "void", which seems to be the
way you call it?

Stefan

Stefan Behnel

2008-04-09 15:48:17 UTC

Permalink

Hi,

Here is a test case for the ordering bug (and a fix for the test runner). A
test for the header file generation is harder to write, as it requires
breaking out of the current "compile[-run]" test scheme into a
"compile-compile[-run]" scheme that builds a module against the previously
generated header file. I have currently no idea how to enable that in anything
but an ugly hackish way.

Stefan

Stefan Behnel

2008-04-11 05:55:17 UTC

Permalink

Hi again,

Post by Gary Furnish
Finally it splits module initialization into two phases: one that initiates
types and handles imports, and another that executes python commands at the
global namespace level.

If the goal is to clean up the module initialisation code, then two functions
are not enough. IMHO, clean module init code consists of one top-level init
function that calls separate functions to 1) create the module, 2) intern
constants, 3) set up global names, 4) initialise the C-API, 5) handle
cimports, 6) set up types, 7) execute module-level code. (Don't know if this
is a complete list).

Can you provide a patch that does that?

Stefan

Stefan Behnel

2008-04-12 12:42:43 UTC

Permalink

Hi,

Post by Gary Furnish
This patch adds extra logic to code generation to sort dependencies to
guarantee that C code is generated in the right order for circular cdef
imports.

And another issue with the original patch:

https://bugs.launchpad.net/cython/+bug/215550

Stefan

Stefan Behnel

2008-04-26 06:30:39 UTC

Permalink

Hi,

I'm in the process of fixing it now. I am basing it against rev 400 as the
current cython-devel is all manner of broken against sage (unicode errors),
any idea when the hg repo will be working again?

hmm, interesting. I'm not aware of any problems, the tests run just fine. What
kind of errors do you get?

I assume that this is due to the PEP 263/3120 implementation? Maybe the Sage
sources do 'unexpected' things with byte strings versus unicode strings?

Stefan

Gary Furnish

2008-04-26 09:05:35 UTC

Permalink

Ok, I have a working patch, but it keeps failing in lxml in rather
mysterious ways (after fixing your reported bugs)

cdef extern from "lxml.etree_api.h":

# first function to call!
cdef int import_lxml__etree() except -1

##########################################################################
# public ElementTree API classes

cdef class lxml.etree._Document [ object LxmlDocument ]:
cdef tree.xmlDoc* _c_doc

cdef class lxml.etree._Element [ object LxmlElement ]:
cdef _Document _doc
cdef tree.xmlNode* _c_node

cdef class lxml.etree.ElementBase(_Element) [ object LxmlElementBase ]:
pass

cdef class lxml.etree._ElementTree [ object LxmlElementTree ]:
cdef _Document _doc
cdef _Element _context_node

cdef class lxml.etree.ElementClassLookup [ object LxmlElementClassLookup
]:
cdef object (*_lookup_function)(object, _Document, tree.xmlNode*)

cdef class lxml.etree.FallbackElementClassLookup(ElementClassLookup) \
[ object LxmlFallbackElementClassLookup ]:
cdef ElementClassLookup fallback
cdef object (*_fallback_function)(object, _Document, tree.xmlNode*)

##########################################################################

Can you elaborate on how exactly this works? It is by far the most
mysterious code I have ever seen in Cython.

Post by Stefan Behnel
Hi,

Post by Gary Furnish
This patch adds extra logic to code generation to sort dependencies to
guarantee that C code is generated in the right order for circular cdef
imports.

https://bugs.launchpad.net/cython/+bug/215550
Stefan

Stefan Behnel

2008-04-26 09:20:52 UTC

Permalink

Hi,

Post by Gary Furnish
Ok, I have a working patch, but it keeps failing in lxml in rather
mysterious ways (after fixing your reported bugs)
# first function to call!
cdef int import_lxml__etree() except -1
cdef tree.xmlDoc* _c_doc
cdef _Document _doc
cdef tree.xmlNode* _c_node
pass
Can you elaborate on how exactly this works? It is by far the most
mysterious code I have ever seen in Cython.

Cython generates a header file for the lxml.etree module ("lxml.etree_api.h")
that contains the "public" extension classes (or all? don't remember) and the
"public api" functions. The code excerpt above is from a .pxd file that
redefines the classes and functions used in lxml as a public API, so that
other Cython modules (especially lxml.objectify) can call into lxml.etree. The
advantage is that the API does not contain the C methods of the types, which
lxml does not export.

Does that make it clearer?

Stefan

Stefan Behnel

2008-04-26 07:19:13 UTC

Permalink

[placing this a bit more prominently]

Hi,

current cython-devel is all manner of broken against sage (unicode errors),
any idea when the hg repo will be working again?

hmm, interesting. I'm not aware of any problems, Cython's tests run just fine.
I just added a few more to make sure everything works as expected. What kind
of errors do you get?

My guess is that this is due to the PEP 3120 implementation. If you use 8-bit
strings in your sources, you must declare their encoding in the source header
(PEP 263). Otherwise, Cython will expect them to be UTF-8 encoded (PEP 3120),
and decode them on input. So anything but UTF-8 will (very likely) give you a
compiler error.

I don't know what encoding Sage commonly uses. After all, the most common
unicode bug is that people just don't care about encodings. In case it's
latin-1, you will have to add this at the beginning of the non-ASCII source files:

# -*- coding: latin-1 -*-

Alternatively, you can recode the source files from latin1 to UTF-8.

To figure out if a source file uses non-ASCII characters, you can do something
like this:

import codecs
codecs.open("thefile.pyx", encoding="ASCII").read()

You get an exception if it's not ASCII.

Stefan