Discussion:
Cython code generation - Question/Proposal to refactor error exits out of generate_result_code
(too old to reply)
Kay Hayen
2007-12-28 11:31:42 UTC
Permalink
Hello,

I have had a chance to look at the code generation in some more detail now and
have realized that I will need to do something relatively invasive as an
initial step.

The Node tree seems to generate most of the code for expressions through calls
to generate_result_code, which has an implementation like this:

class BackquoteNode(ExprNode):
# [...]
def generate_result_code(self, code):
code.putln(
"%s = PyObject_Repr(%s); %s" % (
self.result_code,
self.arg.py_result(),
code.error_goto_if_null(self.result_code, self.pos)))


That's a very typical example in that a value is created and immediately
tested.

With the immediately release of temporary variables that got used up in this
call, the right time is to do it immediately after the call that used it and
before the error checks.

Cython is full of calls to error_goto_if_null and similar near everything that
creates an assigment and needs an Python exception check.

I would like to, as an immediate step, strip these code.error_goto_if_null()
statements and put it in a separate function. Actually one step further, I
would like to have the following in place:


class BackquoteNode(ExprNode):
# [...]
def generate_result_code(self, code):
code.putln(
"%s = PyObject_Repr(%s) % (
self.result_code,
self.arg.py_result()))

def get_result_code_condition( self ):
return "!%s" % self.result_code, SomeEnumClassifyingTheErrorCheck

And then the users of generate_result_code() could call code.putSomething()
with what get_result_code_condition() said.

That change would as a bonus enable us to have options that control the error
exit code production. I personally would for my code not care too much if
allocations of new lists or tuples failed due to out of memory for probably
all of the code, plus simply disable exceptions for certain scopes:

I personally would like to have:

@PyRex.Unchecked
class FspecAccessor:
# Perfectly unit tested code, so why have run time checks at all?
[...]

So while at that, I would look into accessing these options as if they were
local to the node, not using Options directly, but indirectly, leaving the
door open for future annotations that control this.

But back to the subject. I will need to have this to implement my first patch.
As it's not going to change the generated code at all, i guess it's
uncontroversial?

I wanted to ask first, because it's a lot of code places to change. And it
would be tough to get that approach rejected only after I did that. :-)

Best regards,
Kay Hayen
Ondrej Certik
2007-12-28 11:44:52 UTC
Permalink
Post by Kay Hayen
Hello,
I have had a chance to look at the code generation in some more detail now and
have realized that I will need to do something relatively invasive as an
initial step.
The Node tree seems to generate most of the code for expressions through calls
# [...]
code.putln(
"%s = PyObject_Repr(%s); %s" % (
self.result_code,
self.arg.py_result(),
code.error_goto_if_null(self.result_code, self.pos)))
That's a very typical example in that a value is created and immediately
tested.
With the immediately release of temporary variables that got used up in this
call, the right time is to do it immediately after the call that used it and
before the error checks.
Cython is full of calls to error_goto_if_null and similar near everything that
creates an assigment and needs an Python exception check.
I would like to, as an immediate step, strip these code.error_goto_if_null()
statements and put it in a separate function. Actually one step further, I
# [...]
code.putln(
"%s = PyObject_Repr(%s) % (
self.result_code,
self.arg.py_result()))
return "!%s" % self.result_code, SomeEnumClassifyingTheErrorCheck
And then the users of generate_result_code() could call code.putSomething()
with what get_result_code_condition() said.
That change would as a bonus enable us to have options that control the error
exit code production. I personally would for my code not care too much if
allocations of new lists or tuples failed due to out of memory for probably
@PyRex.Unchecked
# Perfectly unit tested code, so why have run time checks at all?
[...]
So while at that, I would look into accessing these options as if they were
local to the node, not using Options directly, but indirectly, leaving the
door open for future annotations that control this.
But back to the subject. I will need to have this to implement my first patch.
As it's not going to change the generated code at all, i guess it's
uncontroversial?
I wanted to ask first, because it's a lot of code places to change. And it
would be tough to get that approach rejected only after I did that. :-)
I am glad you are working on this.

Ondrej
Robert Bradshaw
2007-12-28 17:45:15 UTC
Permalink
Post by Kay Hayen
Hello,
I have had a chance to look at the code generation in some more detail now and
have realized that I will need to do something relatively invasive as an
initial step.
The Node tree seems to generate most of the code for expressions through calls
# [...]
code.putln(
"%s = PyObject_Repr(%s); %s" % (
self.result_code,
self.arg.py_result(),
code.error_goto_if_null(self.result_code, self.pos)))
That's a very typical example in that a value is created and
immediately
tested.
With the immediately release of temporary variables that got used up in this
call, the right time is to do it immediately after the call that used it and
before the error checks.
Cython is full of calls to error_goto_if_null and similar near
everything that
creates an assigment and needs an Python exception check.
I would like to, as an immediate step, strip these
code.error_goto_if_null()
statements and put it in a separate function. Actually one step further, I
# [...]
code.putln(
"%s = PyObject_Repr(%s) % (
self.result_code,
self.arg.py_result()))
return "!%s" % self.result_code, SomeEnumClassifyingTheErrorCheck
And then the users of generate_result_code() could call
code.putSomething()
with what get_result_code_condition() said.
That change would as a bonus enable us to have options that control the error
exit code production. I personally would for my code not care too much if
allocations of new lists or tuples failed due to out of memory for probably
@PyRex.Unchecked
# Perfectly unit tested code, so why have run time checks at all?
[...]
So while at that, I would look into accessing these options as if they were
local to the node, not using Options directly, but indirectly,
leaving the
door open for future annotations that control this.
But back to the subject. I will need to have this to implement my first patch.
As it's not going to change the generated code at all, i guess it's
uncontroversial?
Just be very, very careful :-).
Post by Kay Hayen
I wanted to ask first, because it's a lot of code places to change. And it
would be tough to get that approach rejected only after I did
that. :-)
Sounds like a good idea to me. Note that in some cases the error
cannot be detected by the result code however. Perhaps there should
be a generic generate_error_code(self, code) that calls your
get_result_code_condition() (or get_error_condition(), or whatever).

- Robert
Kay Hayen
2007-12-29 06:23:44 UTC
Permalink
Hello,

trying to...
Post by Kay Hayen
I have had a chance to look at the code generation in some more detail now
and have realized that I will need to do something relatively invasive as
an initial step.
Or even worse.
Post by Kay Hayen
The Node tree seems to generate most of the code for expressions through
# [...]
code.putln(
"%s = PyObject_Repr(%s); %s" % (
self.result_code,
self.arg.py_result(),
code.error_goto_if_null(self.result_code, self.pos)))
That's a very typical example in that a value is created and immediately
tested.
While the philosophy or everything should be that it more or less works like
this, the goal to keep the same C code as we have now, and to separate out
the error handling won't fit.

So I would have to add error checks after series of branches. As discussed
elsewhere, this may not bloat the resulting binary, but it sure degrades the
generated C code readability, and as such I found it unacceptable. I think
Robert would say that too.
Post by Kay Hayen
Cython is full of calls to error_goto_if_null and similar near everything
that creates an assigment and needs an Python exception check.
Well so, yeah, I guess, I have to get into checking the error exits each one
by one, and add a list of temporary variable names to dispose. That list will
be empty with the correct per-node options set.

Makes me want to track in-use temporary variables in a global object that I
pass around. I have to checkout, I saw some "env" things passed to other
functions, maybe it's just not passed to all code generating functions and I
would add that or its a member that I didn't see.

It's a pity btw, that the Nodes don't have explicit constructors. There is no
way PyLint can help with these classes when the object creator gets to
determine the object properties.
Post by Kay Hayen
That change would as a bonus enable us to have options that control the
error exit code production. I personally would for my code not care too
much if allocations of new lists or tuples failed due to out of memory for
probably all of the code, plus simply disable exceptions for certain
I want to stick to that still.

As I have to touch all error exits, I guess it's a fine time to qualify them
as "Only out of memory" (allocations of lists, tuple or int
failed), "Programming mistake" (calling repr on something that hasn't got the
interface, operators fail, because something was None, or namespace lookups
fail) or "Unlikely Event" (dictionary misses maybe, didn't see much of
those).

So I would make code generation for error exits dependant on "local" options
that default to do everything.
Post by Kay Hayen
But back to the subject. I will need to have this to implement my first
patch. As it's not going to change the generated code at all, i guess it's
uncontroversial?
I wanted to ask first, because it's a lot of code places to change. And it
would be tough to get that approach rejected only after I did that. :-)
These points should remain. I will not initially touch the disposal code that
is only generated after the result was generated. I should have a look though
at how it gets at the list of what it can dispose.

Yous,
Kay
Robert Bradshaw
2007-12-29 08:12:26 UTC
Permalink
Post by Kay Hayen
Hello,
trying to...
Post by Kay Hayen
I have had a chance to look at the code generation in some more detail now
and have realized that I will need to do something relatively
invasive as
an initial step.
Or even worse.
Post by Kay Hayen
The Node tree seems to generate most of the code for expressions through
# [...]
code.putln(
"%s = PyObject_Repr(%s); %s" % (
self.result_code,
self.arg.py_result(),
code.error_goto_if_null(self.result_code, self.pos)))
That's a very typical example in that a value is created and
immediately
tested.
While the philosophy or everything should be that it more or less works like
this, the goal to keep the same C code as we have now, and to
separate out
the error handling won't fit.
So I would have to add error checks after series of branches. As discussed
elsewhere, this may not bloat the resulting binary, but it sure degrades the
generated C code readability, and as such I found it unacceptable. I think
Robert would say that too.
Yes, the further the error checking gets from the operation that
could have caused the error (especially if there's branching), the
more I think it impacts readability. So you're saying that due to
this you're still going to do the error handling before freeing the
temps, as it is now, right? In the worst case, I believe temps are
only kept one operation later than they need to be in most cases.
Post by Kay Hayen
Post by Kay Hayen
Cython is full of calls to error_goto_if_null and similar near everything
that creates an assigment and needs an Python exception check.
Well so, yeah, I guess, I have to get into checking the error exits each one
by one, and add a list of temporary variable names to dispose. That list will
be empty with the correct per-node options set.
Makes me want to track in-use temporary variables in a global
object that I
pass around. I have to checkout, I saw some "env" things passed to other
functions, maybe it's just not passed to all code generating
functions and I
would add that or its a member that I didn't see.
Env caries the scope information. The place to do this (to be
consistent with the rest of the codebase) is not in the code-
generation phase, but the analyse_operations/types phase. This is
where temps get requested/released. I think one can even query what
temps are in use at this point (note, however, that not all temps in
use at a given point should be released--e.g. for a try-except
statement in a loop one shouldn't release the loop temp variables).
It would probably make sense to attach this as a list and then in the
code generation phase just look at this list.
Post by Kay Hayen
It's a pity btw, that the Nodes don't have explicit constructors. There is no
way PyLint can help with these classes when the object creator gets to
determine the object properties.
Wasn't my decision, but that's how it works...
Post by Kay Hayen
Post by Kay Hayen
That change would as a bonus enable us to have options that
control the
error exit code production. I personally would for my code not care too
much if allocations of new lists or tuples failed due to out of memory for
probably all of the code, plus simply disable exceptions for certain
I want to stick to that still.
As I have to touch all error exits, I guess it's a fine time to qualify them
as "Only out of memory" (allocations of lists, tuple or int
failed), "Programming mistake" (calling repr on something that
hasn't got the
interface, operators fail, because something was None, or namespace lookups
fail) or "Unlikely Event" (dictionary misses maybe, didn't see much of
those).
So I would make code generation for error exits dependant on
"local" options
that default to do everything.
Sounds good.
Post by Kay Hayen
Post by Kay Hayen
But back to the subject. I will need to have this to implement my first
patch. As it's not going to change the generated code at all, i guess it's
uncontroversial?
I wanted to ask first, because it's a lot of code places to
change. And it
would be tough to get that approach rejected only after I did
that. :-)
These points should remain. I will not initially touch the disposal code that
is only generated after the result was generated. I should have a look though
at how it gets at the list of what it can dispose.
You could put asserts in the disposal code for testing (they should
all be NULL, right?).

Just to clarify, you have two main goals here. (1) Factor out the
error-checking code so that you can suppress checking for certain
errors (e.g. memory errors, or function that you "trust" to always
work). (2) Remove the XDECREFs for temporary variables only when
exceptions are thrown (which I'm still not convinced is worth the
code bloat/effort).

- Robert
Kay Hayen
2007-12-29 22:12:44 UTC
Permalink
Hello Robert,
Post by Robert Bradshaw
Yes, the further the error checking gets from the operation that
could have caused the error (especially if there's branching), the
more I think it impacts readability. So you're saying that due to
this you're still going to do the error handling before freeing the
temps, as it is now, right? In the worst case, I believe temps are
only kept one operation later than they need to be in most cases.
I still want to release temp variables as soon as possible. And as a matter of
fact, that's independent of everything else. Currently I am moving the
disposal code to the point where it becomes clear that the temp can be
released.

So I have removed (made optional) the code like this:

def generate_evaluation_code(self, code):
# [...]
if self.is_temp:
self.generate_subexpr_disposal_code(code)

and in turn frequently (but not as much as feared initially) made changes
like:

class BackquoteNode(ExprNode):
# [...]
def generate_result_code(self, code):
code.putln(
"%s = PyObject_Repr(%s);" % (
self.result_code,
self.arg.py_result()))

self.generate_subexpr_disposal_code(code)

self.generate_error_exit(
code = code,
condition = "!%s" % self.result_code,
position = self.pos,
error_kind = "OutOfMemory",
dispose = [])

In some cases, like when adding to a tuple e.g. I just have this now:

def generate_operation_code(self, code):
code.putln(
"%s = PyTuple_New(%s);" % (
self.result_code,
len(self.args)))

self.generate_error_exit(
code = code,
condition = "!%s" % self.result_code,
position = self.pos,
error_kind = "OutOfMemory",
dispose = [])

for i, arg in enumerate(self.args):
if not arg.result_in_temp():
code.put_incref(arg.result_code, arg.ctype())

code.putln(
"PyTuple_SET_ITEM(%s, %s, %s);" % (
self.result_code,
i,
arg.py_result()))

arg.generate_post_assignment_code(code)

That seems fairly straightforward. I promised to make the case why it is good
to release temp variables early:

1. They may be very very big.

Imagine:

bigone = "A" * (2**30)
bigone = bigone + "\n" + "\n"

Having extra copies of large objects around is going to hurt.

2. It will reduce the number of temporary variables in some cases.

Now the number of local object variables is one more than necessary. That may
lead to more ifs than necessary for conditional code flows, where they cannot
be decided NULL statically.

3. It's good to be explicit about when a variable stops being useful and data
flow. See e.g. this new style code:

/*
* c = (a,b,a**32,a+a) # <<<<<<<<<<<<<<
*/

__pyx_1 = PyNumber_Add(__pyx_v_a, __pyx_v_a);
if (unlikely(!__pyx_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; goto
__pyx_L1;}
__pyx_3 = PyTuple_New(4);
if (unlikely(!__pyx_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; goto
__pyx_L1;}
Py_INCREF(__pyx_v_a);
PyTuple_SET_ITEM(__pyx_3, 0, __pyx_v_a);
Py_INCREF(__pyx_v_b);
PyTuple_SET_ITEM(__pyx_3, 1, __pyx_v_b);
PyTuple_SET_ITEM(__pyx_3, 2, __pyx_2);
__pyx_2 = 0;
PyTuple_SET_ITEM(__pyx_3, 3, __pyx_1);
__pyx_1 = 0;
Py_DECREF(__pyx_v_c);
__pyx_v_c = __pyx_3;
__pyx_3 = 0;

__pyx_2, __pyx_1 are now set to 0 as soon as their scope of use ends. I think
previously that was done afterwards only.

or this here:

/*
* b = `a*a+1` # <<<<<<<<<<<<<<
*/
__pyx_1 = PyNumber_Multiply(__pyx_v_a, __pyx_v_a);
if (unlikely(!__pyx_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; goto
__pyx_L1;}
__pyx_2 = PyNumber_Add(__pyx_1, __pyx_num_1);
Py_DECREF(__pyx_1); __pyx_1 = 0;
if (unlikely(!__pyx_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; goto
__pyx_L1;}
__pyx_1 = PyObject_Repr(__pyx_2);
Py_DECREF(__pyx_2); __pyx_2 = 0;
if (unlikely(!__pyx_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; goto
__pyx_L1;}
Py_DECREF(__pyx_v_b);
__pyx_v_b = __pyx_1;
__pyx_1 = 0;

For my eys, the data flow is now a little more clear, as the scope of __pyx_2
and and __pyx_1 are ended near the cause of it.

4. Next step needs it.

This change will allow as a next step, to remove the disposal of temp
variables (the series of XDECREF() calls from the error free code path. And
consider, this code path is the one normally taken, the one which matters to
speed up.

We could let the XDECREF on the exception path initially, because that's not
going to be as performance critical and normally the compiler should be able
to tell the temporary variables use perfectly fine.

This new insight is exciting I think, because I think the first change is
mostly a cleanup with little to no gains, but the next step will likely be
measurable already without increasing code size.
Post by Robert Bradshaw
This is
where temps get requested/released. I think one can even query what
temps are in use at this point (note, however, that not all temps in
use at a given point should be released--e.g. for a try-except
statement in a loop one shouldn't release the loop temp variables).
It would probably make sense to attach this as a list and then in the
code generation phase just look at this list.
Is there something like a temporary variable for a loop? Or isn't that more a
local variable that leaks into outer scope. I think it's even like that for
list compresensions, right?
Post by Robert Bradshaw
You could put asserts in the disposal code for testing (they should
all be NULL, right?).
That's right, I will move these XDECREFs to the exception exits only and
otherwise assert to be NULL on the normal exit case.
Post by Robert Bradshaw
Just to clarify, you have two main goals here. (1) Factor out the
error-checking code so that you can suppress checking for certain
errors (e.g. memory errors, or function that you "trust" to always
work). (2) Remove the XDECREFs for temporary variables only when
exceptions are thrown (which I'm still not convinced is worth the
code bloat/effort).
Yeah, I want to make the error checks optional, and later we could come up
with syntax to allow applying options like these to a scope.

Let me say the following:

The error checking mode and categorization of error kind is sweets to convince
you to merge a relative wide reaching patch that would be more of a cleanup,
the strongest point probably, that the current late disposal of temp
variables leads to worse cache locality.

If you won't, I have no worry, I will keep increasing the incentive with
provable performance gains later on. :-)

Yours,
Kay
Robert Bradshaw
2007-12-30 00:07:23 UTC
Permalink
Post by Kay Hayen
Hello Robert,
Post by Robert Bradshaw
Yes, the further the error checking gets from the operation that
could have caused the error (especially if there's branching), the
more I think it impacts readability. So you're saying that due to
this you're still going to do the error handling before freeing the
temps, as it is now, right? In the worst case, I believe temps are
only kept one operation later than they need to be in most cases.
I still want to release temp variables as soon as possible. And as a matter of
fact, that's independent of everything else. Currently I am moving the
disposal code to the point where it becomes clear that the temp can be
released.
OK, I misunderstood you. What were you referring to then?
Post by Kay Hayen
# [...]
self.generate_subexpr_disposal_code(code)
and in turn frequently (but not as much as feared initially) made changes
# [...]
code.putln(
"%s = PyObject_Repr(%s);" % (
self.result_code,
self.arg.py_result()))
self.generate_subexpr_disposal_code(code)
self.generate_error_exit(
code = code,
condition = "!%s" % self.result_code,
position = self.pos,
error_kind = "OutOfMemory",
dispose = [])
code.putln(
"%s = PyTuple_New(%s);" % (
self.result_code,
len(self.args)))
self.generate_error_exit(
code = code,
condition = "!%s" % self.result_code,
position = self.pos,
error_kind = "OutOfMemory",
dispose = [])
code.put_incref(arg.result_code, arg.ctype())
code.putln(
"PyTuple_SET_ITEM(%s, %s, %s);" % (
self.result_code,
i,
arg.py_result()))
arg.generate_post_assignment_code(code)
That seems fairly straightforward. I promised to make the case why it is good
1. They may be very very big.
bigone = "A" * (2**30)
bigone = bigone + "\n" + "\n"
Having extra copies of large objects around is going to hurt.
Ignoring the fact that code like this is bad (because of all the
copying) it looks like the current code does this as well as it can
on this snippet anyways.
Post by Kay Hayen
2. It will reduce the number of temporary variables in some cases.
Now the number of local object variables is one more than
necessary. That may
lead to more ifs than necessary for conditional code flows, where they cannot
be decided NULL statically.
True.
Post by Kay Hayen
3. It's good to be explicit about when a variable stops being
useful and data
/*
* c = (a,b,a**32,a+a) # <<<<<<<<<<<<<<
*/
__pyx_1 = PyNumber_Add(__pyx_v_a, __pyx_v_a);
if (unlikely(!__pyx_1)) {__pyx_filename = __pyx_f[0];
__pyx_lineno = 6; goto
__pyx_L1;}
__pyx_3 = PyTuple_New(4);
if (unlikely(!__pyx_3)) {__pyx_filename = __pyx_f[0];
__pyx_lineno = 6; goto
__pyx_L1;}
Py_INCREF(__pyx_v_a);
PyTuple_SET_ITEM(__pyx_3, 0, __pyx_v_a);
Py_INCREF(__pyx_v_b);
PyTuple_SET_ITEM(__pyx_3, 1, __pyx_v_b);
PyTuple_SET_ITEM(__pyx_3, 2, __pyx_2);
__pyx_2 = 0;
PyTuple_SET_ITEM(__pyx_3, 3, __pyx_1);
__pyx_1 = 0;
Py_DECREF(__pyx_v_c);
__pyx_v_c = __pyx_3;
__pyx_3 = 0;
__pyx_2, __pyx_1 are now set to 0 as soon as their scope of use ends. I think
previously that was done afterwards only.
/*
* b = `a*a+1` # <<<<<<<<<<<<<<
*/
__pyx_1 = PyNumber_Multiply(__pyx_v_a, __pyx_v_a);
if (unlikely(!__pyx_1)) {__pyx_filename = __pyx_f[0];
__pyx_lineno = 4; goto
__pyx_L1;}
__pyx_2 = PyNumber_Add(__pyx_1, __pyx_num_1);
Py_DECREF(__pyx_1); __pyx_1 = 0;
if (unlikely(!__pyx_2)) {__pyx_filename = __pyx_f[0];
__pyx_lineno = 4; goto
__pyx_L1;}
__pyx_1 = PyObject_Repr(__pyx_2);
Py_DECREF(__pyx_2); __pyx_2 = 0;
if (unlikely(!__pyx_1)) {__pyx_filename = __pyx_f[0];
__pyx_lineno = 4; goto
__pyx_L1;}
Py_DECREF(__pyx_v_b);
__pyx_v_b = __pyx_1;
__pyx_1 = 0;
Other then swapping the order of errors vs. releasing temps, it
doesn't look like temps are released any earlier this version than
the current version.

__pyx_1 = __pyx_v_a;
Py_INCREF(__pyx_1);
__pyx_2 = PyNumber_Multiply(__pyx_1, __pyx_v_a); if (unlikely(!
__pyx_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 7; goto
__pyx_L1;}
Py_DECREF(__pyx_1); __pyx_1 = 0;
__pyx_1 = PyNumber_Add(__pyx_2, __pyx_num_1); if (unlikely(!
__pyx_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 7; goto
__pyx_L1;}
Py_DECREF(__pyx_2); __pyx_2 = 0;
__pyx_2 = PyObject_Repr(__pyx_1); if (unlikely(!__pyx_2))
{__pyx_filename = __pyx_f[0]; __pyx_lineno = 7; goto __pyx_L1;}
Py_DECREF(__pyx_1); __pyx_1 = 0;
Py_DECREF(__pyx_v_b);
__pyx_v_b = __pyx_2;
__pyx_2 = 0;
Post by Kay Hayen
For my eys, the data flow is now a little more clear, as the scope of __pyx_2
and and __pyx_1 are ended near the cause of it.
In my eyes, mixing the computation and releasing of temps makes
things less clear to me, but maybe that's cause when I look at the
code I usually am curious about the subexpressions->result code, not
in tracking which temps get allocated/deallocated where.
Post by Kay Hayen
4. Next step needs it.
This change will allow as a next step, to remove the disposal of temp
variables (the series of XDECREF() calls from the error free code path. And
consider, this code path is the one normally taken, the one which matters to
speed up.
We could let the XDECREF on the exception path initially, because that's not
going to be as performance critical and normally the compiler
should be able
to tell the temporary variables use perfectly fine.
This new insight is exciting I think, because I think the first change is
mostly a cleanup with little to no gains, but the next step will likely be
measurable already without increasing code size.
This is where I'm confused. In the error-free code path, why should
any temps be left to XDECREF? If there are, that's the bug and it was
lazy programing in the first place to have to worry about temporary
variables hanging around when you exit, right?
Post by Kay Hayen
Post by Robert Bradshaw
This is
where temps get requested/released. I think one can even query what
temps are in use at this point (note, however, that not all temps in
use at a given point should be released--e.g. for a try-except
statement in a loop one shouldn't release the loop temp variables).
It would probably make sense to attach this as a list and then in the
code generation phase just look at this list.
Is there something like a temporary variable for a loop? Or isn't that more a
local variable that leaks into outer scope. I think it's even like that for
list compresensions, right?
Yes, for instance when I write

for x in A: ...

(ignoring optimizations of when A is a list) it creates iter(A) and
stores that in a temporary variable.
Post by Kay Hayen
Post by Robert Bradshaw
You could put asserts in the disposal code for testing (they should
all be NULL, right?).
That's right, I will move these XDECREFs to the exception exits only and
otherwise assert to be NULL on the normal exit case.
Post by Robert Bradshaw
Just to clarify, you have two main goals here. (1) Factor out the
error-checking code so that you can suppress checking for certain
errors (e.g. memory errors, or function that you "trust" to always
work). (2) Remove the XDECREFs for temporary variables only when
exceptions are thrown (which I'm still not convinced is worth the
code bloat/effort).
Yeah, I want to make the error checks optional, and later we could come up
with syntax to allow applying options like these to a scope.
The error checking mode and categorization of error kind is sweets to convince
you to merge a relative wide reaching patch that would be more of a cleanup,
the strongest point probably, that the current late disposal of temp
variables leads to worse cache locality.
If you won't, I have no worry, I will keep increasing the incentive with
provable performance gains later on. :-)
I always assumed the compiler would re-arrange the operations to get
good cache locality anyways.

I am hesitant of wide ranging patches for two reasons. First, it is
very easy to make subtle ref-counting errors that take forever to
hunt down. (I've had to hunt down my fair share of these myself.)
Secondly, unless there is proveable gain (which I am hopeful you can
provide--not trying to discourage you) I'd ratether stay close to the
original Pyrex source to make it easy to incorporate its changes back
into our fork.

- Robert
Stefan Behnel
2007-12-30 08:28:15 UTC
Permalink
Post by Robert Bradshaw
I am hesitant of wide ranging patches for two reasons. First, it is
very easy to make subtle ref-counting errors that take forever to
hunt down. (I've had to hunt down my fair share of these myself.)
So did I. The last ones even slipped through into a(n alpha) release of lxml
before someone else discovered them as a memory leak. It took me days then to
track them down.

Kay, I don't want to discourage you in any way, but the bigger an internal
code change is, the more I'd ask for a) a good motivation for it and b) a
proof that it doesn't break things. I know, b) is somewhat tricky at the
moment, as we do not have an automated test suite. Maybe we should take some
time to build one up now (including ref-counting checks through the gc
module), that would make it easier for you to provide that proof.

Stefan
Kay Hayen
2007-12-30 10:38:20 UTC
Permalink
Hello,
Post by Stefan Behnel
Kay, I don't want to discourage you in any way, but the bigger an internal
code change is, the more I'd ask for a) a good motivation for it and b) a
proof that it doesn't break things. I know, b) is somewhat tricky at the
moment, as we do not have an automated test suite. Maybe we should take
some time to build one up now (including ref-counting checks through the gc
module), that would make it easier for you to provide that proof.
I fully understand and I would you as a project recommend to seriously
consider having that sort of tests as soon as possible, independant of
availability of my patches.

I think I will make a small series of patches, each with a small defined
purpose, whose effects can be reviewed.

Currently that would be:

PATCH 1) Move disposal of temporary variables to as early as possible. Make
the error exits all classified and optional.

Nothing much to gain here, although the removed OutOfMemory checks are
interesting to me personally.

PATCH 2) Move disposal of temporary variables from error exits target to the
error exit branch in question. Make that code a var args loop that the
compiler can optimize as he sees fit (inline or not, loop or unroll, etc).

At that point, with -Os this should have a nice impact on the code size, plus
a potentially benchmarkable effect on speed of error exits. The normal code
flow will remain unaffected.

PATCH 3) Move disposal of local variables to as early as possible.

That's not going to gain much either. It will just prove that the list of
initialized or potentially initialzed local variables is there and working.

PATCH 4) Remove disposal of local variables from exit goto target and to the
error exit caller in question.

The use of the var args loop may reduce the code size for -Os somewhat, but
for multiple exits that use the same local variables, we will have to rely on
the compiler to save us from code size increase.

I think this will already provide some benchmarkable speed for the code that I
manually tweaked.

PATCH 5) Do away with None assignments to local variables and go with NULL,
adding checks on every use of them.

At that point, everything will be in place, to make this as less noisy as
possible. Cython will already know which variables can't be initialized,
which could be, and which must be.

I can therefore only add checks for the case, where it's unclear. That's nicer
for readability. It's not going to change performance to do that, these
checks would only be those that can be eliminated by the C compiler as well.

Ok, let me work more on PATCH 1), I think it's getting nice already, I am
compiling demos for coverage and seem to be mostly there.

Yours,
Kay
Kay Hayen
2007-12-30 10:01:37 UTC
Permalink
Hello Robert,
Post by Robert Bradshaw
Post by Kay Hayen
Post by Robert Bradshaw
Yes, the further the error checking gets from the operation that
could have caused the error (especially if there's branching), the
more I think it impacts readability. So you're saying that due to
this you're still going to do the error handling before freeing the
temps, as it is now, right? In the worst case, I believe temps are
only kept one operation later than they need to be in most cases.
I still want to release temp variables as soon as possible. And as a matter of
fact, that's independent of everything else. Currently I am moving the
disposal code to the point where it becomes clear that the temp can be
released.
OK, I misunderstood you. What were you referring to then?
My initial approach, was to remove all error check for what is typically
self.result_code and do it later on.

That won't look nice, as sometimes Cython is making run time checks that avoid
the need for error checking the result completely. In these cases I would
have added error checks not necessary.

These are ugly to read, and as discussed in the pointer thread, currently it's
not possible to hint the compiler about the unnecessity.

So that was a no go and the proposal was flawed. I have to get right in the
middle of code generation instead.
Post by Robert Bradshaw
Post by Kay Hayen
bigone = "A" * (2**30)
bigone = bigone + "\n" + "\n"
Having extra copies of large objects around is going to hurt.
Ignoring the fact that code like this is bad (because of all the
copying) it looks like the current code does this as well as it can
on this snippet anyways.
I agree, that's code that requires big temps is bad in any case. But the
Python compiler hardly require the user to change it. I was only going to
shows an underlying principle that keeping references longer than necessary
has a price.
Post by Robert Bradshaw
Other then swapping the order of errors vs. releasing temps, it
doesn't look like temps are released any earlier this version than
the current version.
I think they are, but only slightly, the example isn't good. I will make a
better example later on in the process.
Post by Robert Bradshaw
Post by Kay Hayen
For my eys, the data flow is now a little more clear, as the scope of __pyx_2
and and __pyx_1 are ended near the cause of it.
In my eyes, mixing the computation and releasing of temps makes
things less clear to me, but maybe that's cause when I look at the
code I usually am curious about the subexpressions->result code, not
in tracking which temps get allocated/deallocated where.
It would make sense to only see that kind of thing and not the referencing
and/or no error checks and so on. But that's a view thing. I don't think the
generated source can make everything transparent at once.
Post by Robert Bradshaw
Post by Kay Hayen
This new insight is exciting I think, because I think the first change is
mostly a cleanup with little to no gains, but the next step will likely be
measurable already without increasing code size.
This is where I'm confused. In the error-free code path, why should
any temps be left to XDECREF? If there are, that's the bug and it was
lazy programing in the first place to have to worry about temporary
variables hanging around when you exit, right?
Forget about it, that was non sense from me. Of course only the error exits
were cleaning up temps.
Post by Robert Bradshaw
Yes, for instance when I write
for x in A: ...
(ignoring optimizations of when A is a list) it creates iter(A) and
stores that in a temporary variable.
Interesting, well indeed. So every loop exit needs a different treatment
already. I will check if I find something that I can generalize there.
Post by Robert Bradshaw
I am hesitant of wide ranging patches for two reasons. First, it is
very easy to make subtle ref-counting errors that take forever to
hunt down. (I've had to hunt down my fair share of these myself.)
Secondly, unless there is proveable gain (which I am hopeful you can
provide--not trying to discourage you) I'd ratether stay close to the
original Pyrex source to make it easy to incorporate its changes back
into our fork.
I agree, that makes sense. With that first patch, I now realize, I am far from
any performance gain. In fact, I will only have pessimized error exits
regarding the temporary variables cleanups unless it's totally clever and
generates special code for every exit itself.

So merging that patch will be pointless on all counts except the error
generation control. I will still make it a separate patch as then it can be
reviewed independently on its cleanup aspects.

Yours,
Kay
Robert Bradshaw
2007-12-30 10:36:22 UTC
Permalink
Post by Kay Hayen
Post by Robert Bradshaw
Post by Kay Hayen
[...]
bigone = "A" * (2**30)
bigone = bigone + "\n" + "\n"
Having extra copies of large objects around is going to hurt.
Ignoring the fact that code like this is bad (because of all the
copying) it looks like the current code does this as well as it can
on this snippet anyways.
I agree, that's code that requires big temps is bad in any case. But the
Python compiler hardly require the user to change it. I was only going to
shows an underlying principle that keeping references longer than necessary
has a price.
Post by Robert Bradshaw
Other then swapping the order of errors vs. releasing temps, it
doesn't look like temps are released any earlier this version than
the current version.
I think they are, but only slightly, the example isn't good. I will make a
better example later on in the process.
It does sound better in principle, and I bet you will be able to come
up with a better example, but in practice I think it'll usually boil
down to keeping a single (redundant) pointer one step longer than
necessary.
Post by Kay Hayen
Post by Robert Bradshaw
Post by Kay Hayen
For my eys, the data flow is now a little more clear, as the scope of __pyx_2
and and __pyx_1 are ended near the cause of it.
In my eyes, mixing the computation and releasing of temps makes
things less clear to me, but maybe that's cause when I look at the
code I usually am curious about the subexpressions->result code, not
in tracking which temps get allocated/deallocated where.
It would make sense to only see that kind of thing and not the
referencing
and/or no error checks and so on. But that's a view thing. I don't think the
generated source can make everything transparent at once.
Of course, this is a totally subjective thing--but as the most
important "eyes" that will be viewing the source are those of the C
compiler, your early-temp release code could be an improvement from
that point of view.
Post by Kay Hayen
Post by Robert Bradshaw
Yes, for instance when I write
for x in A: ...
(ignoring optimizations of when A is a list) it creates iter(A) and
stores that in a temporary variable.
Interesting, well indeed. So every loop exit needs a different
treatment
already. I will check if I find something that I can generalize there.
Can't promise that loops are the only things that have non-
traditional long-lasting temps, and conditional statements do things
like share temps, etc.
Post by Kay Hayen
Post by Robert Bradshaw
I am hesitant of wide ranging patches for two reasons. First, it is
very easy to make subtle ref-counting errors that take forever to
hunt down. (I've had to hunt down my fair share of these myself.)
Secondly, unless there is proveable gain (which I am hopeful you can
provide--not trying to discourage you) I'd ratether stay close to the
original Pyrex source to make it easy to incorporate its changes back
into our fork.
I agree, that makes sense. With that first patch, I now realize, I am far from
any performance gain. In fact, I will only have pessimized error exits
regarding the temporary variables cleanups unless it's totally
clever and
generates special code for every exit itself.
So merging that patch will be pointless on all counts except the error
generation control. I will still make it a separate patch as then it can be
reviewed independently on its cleanup aspects.
So to summarize, right now I see three things going on: (1) more
error generation control (2) earlier temp releases and (3) removing
XDECREFs from the error path. I think (1) is a great idea (though in
all likelihood I'll rarely if every use it), (2) may have enough
benefits to outweigh what I see as drawbacks, and (3) will probably
involve a massive amount of change and complication for a small if
any gain in performance in the error handling.

I'm not trying to discourage you here, I just think (3) has a very
low return on investment, especially compared to some of the
significant improvements you've suggested like more correct local
variable handling and (the real biggie) type inference.

- Robert
Kay Hayen
2007-12-30 12:37:57 UTC
Permalink
Hello Robert,
Post by Robert Bradshaw
Post by Kay Hayen
Post by Robert Bradshaw
Yes, for instance when I write
for x in A: ...
(ignoring optimizations of when A is a list) it creates iter(A) and
stores that in a temporary variable.
Interesting, well indeed. So every loop exit needs a different treatment
already. I will check if I find something that I can generalize there.
Can't promise that loops are the only things that have non-
traditional long-lasting temps, and conditional statements do things
like share temps, etc.
But well, Cython should already handle it, right?

I was looking at ForInStatNode node and saw that it doesn't dispose the
iterator before doing the else: block, and I couldn't discover how it's
currently handled to have an exeception.

Somehow the code object may help the block with picking the right exit or it's
from an earlier stage, analysis probably. Or it is it not dealt with? Can you
help me find that?
Post by Robert Bradshaw
So to summarize, right now I see three things going on: (1) more
error generation control (2) earlier temp releases and (3) removing
XDECREFs from the error path.
Right.
Post by Robert Bradshaw
I think (1) is a great idea (though in
all likelihood I'll rarely if every use it), (2) may have enough
benefits to outweigh what I see as drawbacks, and (3) will probably
involve a massive amount of change and complication for a small if
any gain in performance in the error handling.
I won't call that massive amount of change yet. I feel like I am only cleaning
things up.
Post by Robert Bradshaw
I'm not trying to discourage you here, I just think (3) has a very
low return on investment, especially compared to some of the
significant improvements you've suggested like more correct local
variable handling and (the real biggie) type inference.
You said you want small patches, so you get em and I am trying to order things
in a way that keeps each step small, not necessarily already effective. There
is no reason to treat temp and local variables all that much different, so
all I do now would be part of that as well.

The initialized local variables will just add to the list of disposed
variables for error exits, simple as that.

Getting both changes right at the same time is too hard to me, so I split it.

These initial patches refering to temp variables will only aim at pushing the
infrastructure of code generation, the local variable parts will affect the
analysis phase.

I find it a good way to get started and so far I found 2 (non-critical I
guess) bugs in cython 0.9.6.9, patches for those later.

Local variables are certainly on the plate and the real motivation behind my
work. And I do want to get to type inference, very. But as a new developer, I
have to build my knowedge up first.

And I am very stubborn. I made that mock up of improved code, benchmarked it
and found to be faster. Now I will be satisfied only when I made Cython able
to generate that code or better. Which is hopefully not that far away. :-)

Yours,
Kay
Robert Bradshaw
2007-12-31 18:47:14 UTC
Permalink
Post by Kay Hayen
Hello Robert,
Post by Robert Bradshaw
Post by Kay Hayen
Post by Robert Bradshaw
Yes, for instance when I write
for x in A: ...
(ignoring optimizations of when A is a list) it creates iter(A) and
stores that in a temporary variable.
Interesting, well indeed. So every loop exit needs a different treatment
already. I will check if I find something that I can generalize there.
Can't promise that loops are the only things that have non-
traditional long-lasting temps, and conditional statements do things
like share temps, etc.
But well, Cython should already handle it, right?
I was looking at ForInStatNode node and saw that it doesn't dispose the
iterator before doing the else: block, and I couldn't discover how it's
currently handled to have an exeception.
Somehow the code object may help the block with picking the right exit or it's
from an earlier stage, analysis probably. Or it is it not dealt with? Can you
help me find that?
Yes, this is handled in the error goto target block. Try-except
statements keep track of what temps need to be freed.

I know your goal is to move decrefs to before the goto statement, but
this cannot be done (without massive, and I think not worthwhile
changes). You can move some, such as many temporary variables, but
for things like longer-range temps and local variables deciding which
ones need to be freed is much harder. It's not just a matter of
looking how high in the scope one is jumping, but it also depends on
what error is being raised. Perhaps an example would be more
illustrative.

if not cached:
a = 0
for a in L:
try:
a += foo(x)
except ValueError:
pass

The error checking code at foo(x) doesn't know whether or not to free
the temporary loop variable, or (eventually) the local variable a.
This is why this kind of stuff is handled at the target.
Post by Kay Hayen
Post by Robert Bradshaw
So to summarize, right now I see three things going on: (1) more
error generation control (2) earlier temp releases and (3) removing
XDECREFs from the error path.
Right.
Post by Robert Bradshaw
I think (1) is a great idea (though in
all likelihood I'll rarely if every use it), (2) may have enough
benefits to outweigh what I see as drawbacks, and (3) will probably
involve a massive amount of change and complication for a small if
any gain in performance in the error handling.
I won't call that massive amount of change yet. I feel like I am only cleaning
things up.
The example above clarifies why I feel this would involve a massive
amount of change.
Post by Kay Hayen
Post by Robert Bradshaw
I'm not trying to discourage you here, I just think (3) has a very
low return on investment, especially compared to some of the
significant improvements you've suggested like more correct local
variable handling and (the real biggie) type inference.
You said you want small patches, so you get em and I am trying to order things
in a way that keeps each step small, not necessarily already
effective.
Thanks.
Post by Kay Hayen
There
is no reason to treat temp and local variables all that much
different, so
all I do now would be part of that as well.
The initialized local variables will just add to the list of disposed
variables for error exits, simple as that.
Yes, temp variables and locals are very similar, but in the sense
that one cannot always decided to release them until one hits the
error handling code. Sometimes one can release the temps earlier
though, as you are doing. When you start initializing local variables
to NULL I'd guess you'd want to leverage the same kind of code that
now handles temps.
Post by Kay Hayen
Getting both changes right at the same time is too hard to me, so I split it.
Certainly, thanks.
Post by Kay Hayen
These initial patches refering to temp variables will only aim at pushing the
infrastructure of code generation, the local variable parts will affect the
analysis phase.
I find it a good way to get started and so far I found 2 (non-
critical I
guess) bugs in cython 0.9.6.9, patches for those later.
Very nice.
Post by Kay Hayen
Local variables are certainly on the plate and the real motivation behind my
work. And I do want to get to type inference, very. But as a new developer, I
have to build my knowedge up first.
Of course, glad you're putting in the effort.
Post by Kay Hayen
And I am very stubborn. I made that mock up of improved code,
benchmarked it
and found to be faster. Now I will be satisfied only when I made Cython able
to generate that code or better. Which is hopefully not that far away. :-)
Yes, but I don't think your improvements were due to removing XDECREF
from the error paths--rather your better local variable handling.

- Robert

Loading...