Last weekend, I did an informal mini-conference on C and Ruby with a few friends.
We started with an introduction to C as some of my friends didn’t know it at all. Of course pointers in C are a little difficult to wrap heads around at first, but hopefully the things we covered will get them interested enough to try and follow up on them. One particular example that started many questions was something along these lines:
void change(int** i)
{
*i = (int*)malloc(sizeof(int));
**i = 123;
// what happens if we do
// free(*i);
// here
}
int main()
{
int *ptr = NULL;
change(&ptr);
printf("%d", *ptr);
// we should use free(ptr); here (if main() wasn't exiting anyway)
}
Especially the part about free-ing before accessing the variable: what happens if we access the already free-d memory. On one machine we tried it, the number was still 123, which showed very well how much we cannot trust the compiler/OS to actually zero out the memory immediately when it is released and how easily this can lead to bugs that work sometimes and fail sometimes. Next we talked about the connections between C arrays and pointers.
int a[10];
int* b = a;
// an array is in fact a pointer to the head
a[0] = 1;
*a = 1; // this is the same as the above
a[1] = 1;
*(a+1) = 1;
// the C compiler knows how long an int is
// so the + 1 will point to the first element
// instead of just plus one byte
Next up using pointers/arrays, together we implemented a very easy function to check whether a string was a palindrom:
int test(char* str)
{
char *lo = str + 0; // lo is set to the first character
char *hi = str + strlen(str) - 1; // hi is set to the last character
while (lo < hi)
{
if ((*lo) != (*hi))
return 0; // the chars don't match
lo++;
hi--;
// move lo to the right, hi to the left
}
return 1; // every char matched, it is a palindrom
}
After we discussed the very basics of C, finally we started diving into the source code of Ruby (latest trunk from github). I wanted to keep it simple, so to be connected to the previous discussion, I left out the basics of VALUE and the first code we looked at was Array#reverse and Array#reverse! These functions were fairly easy to understand as there is no complicated resizing or in String#reverse’s case, complicated encodings to deal with. In fact, before looking at the code, we discussed how it could be implemented - and to no surprise the actual implementation was very similar.
VALUE rb_ary_reverse(VALUE ary)
{
VALUE *p1, *p2;
VALUE tmp;
rb_ary_modify(ary);
if (RARRAY_LEN(ary) > 1)
{
p1 = RARRAY_PTR(ary); // get pointer to first item
p2 = p1 + RARRAY_LEN(ary) - 1; // get pointer to last item
while (p1 < p2) // imaging p1 as lo and p2 as hi
{
tmp = *p1; // swap *p1 with *p2
*p1++ = *p2;
*p2-- = tmp;
}
}
return ary;
}
One thing to remember here is the *p1++ notation. To those unfamiliar with C, var++ means that first var is “returned” or in other words, var is used and only then is it incremented. (the ++ is after the variable). On the other hand, ++var would first increment var and then use this incremented value. So it is crucial to use *p1++ here instead of *++p1. Now in ruby, there are two versions of reverse, one with a bang and one without. From the code in the above helper function, we can assume that it is for the reverse! type, but it would not be very useful for Array#reverse as it would change the array in place. Instead, we saw in the sources that Array#reverse actually internally duplicates the array before doing the in-place reverse on the duplicated array. This pointed out that Array#reverse must be slower than Array#reverse! and of course it must also use more memory. Even though I didn’t go into too much detail about Ruby’s GC (which I’m not all that familiar with anyway beyond knowing it is a mark-and-sweep type collector, just like the CLR), I mentioned that the GC will be started when a memory allocation does not have enough memory. So in the end, not only is it slower and uses more memory, but there is a chance that Array#reverse triggers garbage collection which will make it even slower and use even more memory. (As an aside, realizations like these are the reasons I wanted to talk about the internals, as I believe understanding even just a little about how things work make us better programmers in the long term)
To understand the performance implications, we tried implementing Array#reverse! in ruby which we benchmarked against the native reverse!:
require 'rubygems'
require 'benchmark'
class Array
def rb_reverse!
lo,hi=0,self.length-1
while lo < hi
tmp = self[lo]
self[lo] = self[hi]
self[hi] = tmp
lo += 1; hi -= 1
end
end
end
arr = (1..10_000_000).to_a
Benchmark.bm do |x|
x.report { arr.reverse! }
x.report { arr.reverse }
x.report { arr.rb_reverse! }
end
The result was absolutely clear: user system total real 0.030000 0.000000 0.030000 ( 0.037438) 0.100000 0.040000 0.140000 ( 0.142042) 6.290000 0.020000 6.310000 ( 6.307183)
Next we looked at an (easy-looking) item from the Ruby ToDo file: * optional stepsize argument for succ()
By default Fixnum#succ
doesn’t take any arguments and just increments
the variable by 1. So let’s take a look at what it’s doing internally:
static VALUE fix_succ(VALUE num)
{
long i = FIX2LONG(num) + 1;
return LONG2NUM(i);
}
No surprise here, it takes no arguments (besides a “this”) and increments by 1. To fix this, I explained about variable argument list, so we can take one or zero arguments and naively implemented a first version of succ:
static VALUE fix_succ(int argc, VALUE* argv, VALUE num)
{
VALUE step; // this will hold the value to increment by
if (argc == 0) // in case we don't receive any arguments, use 1 as step
{
step = INT2FIX(1); // just like above
}
else
{
if (argc == 1) // when there is one argument, use it as the increment
{
step = argv[0];
}
else // if there are more than 1 arguments, raise error
{
rb_raise(rb_eArgError, "wrong number of arguments");
}
}
long i = FIX2LONG(num) + FIX2LONG(step); // add the numbers and return
return LONG2NUM(i);
}
Of course this in itself does not work, which gave me a chance to briefly mention how method lookup tables work. Then together we changed that table to indicate that succ might take any number of arguments with the entry:
rb_define_method(rb_cFixnum, "succ", fix_succ, -1);
However that naive implementation has a lot of problems that then gave me an opportunity to again mention other parts of ruby we should be aware of.
# For starters, let's try the new succ method:
1.succ
> 2 # OK
1.succ 2
> 3 # OK
1.succ 1073741823
> 1073741824 # OK
1.succ 1073741824
> 73693943 # Hey, what's going on here
# other errors
1.succ 'a'
> 73021957 # whoa, what's this?
'a'.succ
> b # OK
'a'.succ 2
> in `succ': wrong number of arguments(1 for 0) (ArgumentError) #Ooops
Of course to understand what the problem is, we should know why 1073741824 is special. As it turns out, Fixnum’s largest representable value is 1073741823, which means that 1073741824 causes an overflow. Even though C int/long can go twice as high (2^31), Fixnum can hold up to 2^30 only. This means that values that are higher than 1073741823 will be of type Bignum, rather than Fixnum. So one thing we can do is to check if the argument is a Fixnum. Only if it is a Fixnum, should we add the values together, but when it’s a Bignum, we can call into the regular + method, that will take care of everything for us. In fact, we could have directly implemented everything as an addition, but the point here is how to map C types to Ruby types and how we can deal with the differences and the limitations of the underlying language.
static VALUE fix_succ(int argc, VALUE* argv, VALUE num)
{
VALUE step;
if (argc == 0)
{
step = INT2FIX(1);
}
else
{
if (argc == 1)
{
step = argv[0];
}
else
{
rb_raise(rb_eArgError, "wrong number of arguments");
}
}
if (FIXNUM_P(step)) // We can check if step can fit in Fixnum
{
// yes, just add it together in a long
// remember, that long can be twice as big as fixnum,
// so no worries about overflow
long i = FIX2LONG(num) + FIX2LONG(step);
return LONG2NUM(i);
}
// If it didn't fit into Fixnum, so let's just add it naturally
else
{
return rb_funcall(num, '+', 1, step);
}
}
This finally works for numbers, but still has some problems. Also, “‘a’.succ 2” _will not work, but the code for _String#succ is very complicated because of the different encodings. As all my friends (except Huan) had to deal with emoji conversion at one point in their lives, it didn’t take much to scare them away from looking at those codes (or even try to implement #succ - but they are free to do so if they want to)
Finally I wanted to look at implementing Ruby extensions in C (and comparing performance) but unfortunately the mkmf lib was not found on my machine (Later I realized that it was because RVM changed to an unusable ruby version when I wasn’t looking), so instead tried implementing binary search together as an exercise. As a note, here’s the final working version:
def rbinsrc(ary, src)
st,en = 0, ary.length-1
while st <= en
mid = st + (en - st) / 2 # avoid overflow in (st + en) / 2
if (src < ary[mid])
en = mid - 1
elsif (src > ary[mid])
st = mid + 1
else
return mid
end
return -1 # not found
end
end
In fact, I created an extension to do the same thing and benchmarked the Ruby version against the C version
#include "ruby.h"
VALUE Convrt = Qnil;
void Init_convrt();
VALUE method_binsrc(VALUE self, VALUE arr, VALUE src);
void Init_convrt()
{
Convrt = rb_define_module("Convrt");
rb_define_method(Convrt, "binsrc", method_binsrc, 2);
}
VALUE method_binsrc(VALUE self, VALUE arr, VALUE src)
{
int st, en, md; int s;
if (TYPE(arr) != T_ARRAY || TYPE(src) != T_FIXNUM)
rb_raise(rb_eTypeError, "Invalid arguments");
st = 0;
en = RARRAY(arr)->len-1;
s = FIX2INT(src);
while (st <= en)
{
md = st + (en-st)/2;
if (s < FIX2INT(RARRAY(arr)->ptr[md]))
en=md-1;
else if (FIX2INT(RARRAY(arr)->ptr[md]) < s)
st = md+1;
else
return (INT2FIX(md));
}
return (INT2FIX(-1));
}
require 'benchmark'
require 'convrt'
include Convrt
def rbinsrc(arr,src)
st,en = 0,arr.length-1
while (st <= en)
md = st + (en-st)/2
if (src < arr[md])
en = md - 1
elsif (arr[md] < src)
st = md + 1
else
return md
end
end
return -1
end
N = ARGV[0].to_i a = (1..100000).to_a
Benchmark.bm do |x|
x.report { 1.upto(N) { binsrc(a,a.length-1)} }
x.report { 1.upto(N) {rbinsrc(a,a.length-1)} }
end
And the result again is very clear, which shows how a C extension for Ruby can improve performance in bottleneck situations but because it complicates things a lot, how it should be only used with good judgment.
$ ruby test_02.rb 10000
user system total real
0.010000 0.000000 0.010000 ( 0.016662)
0.430000 0.020000 0.450000 ( 0.449629)