Friday, October 22, 2010

Speeding up MySQL Routines with C using UDF : Part 1

Most of you probably already know that you can write Routines in MySQL (5.1+) to allow you to encapsulate logic that you re-use into a simple to run command. An example...

In WorkXpress we store Address data in our databases as XML, since an address is just one field there's only one cell in the database to store Address 1, Address 2, City, State, Zip and Country. So, instead of creating a special kind of table to store this data, we decided to simply store it like this :

   <multi_part_field>
   <part id="street">1757 Canvasback Lane</part>
   <part id="street2"></part>
   <part id="street3"></part>
   <part id="city">Eagan</part>
   <part id="state">MN</part>
   <part id="zip_code">55122</part>
   <part id="country">United States</part>
   <part id="type">United States</part>
   </multi_part_field>

So, when we need to pull just the City out of an address because it was requested in an expression, I can use an XML XPATH function built into MySQL called extractvalue() and pull the value out and return it. The bold line is the key to this function, the rest is just structure/sanity checks :

FUNCTION `wx_Address_Us_GetCity`(input_value text) RETURNS varchar(64) CHARSET utf8 DETERMINISTIC
 BEGIN
   IF input_value IS NULL THEN
      RETURN input_value;
   ELSE
      RETURN extractvalue(input_value, '/multi_part_field/part[@id="city"]');
   END IF;
 END

With this function in place, you can run SQL like the below (assuming there is XML Address Data in your 'value' column) and it will pull out just the City :

SELECT wx_Address_Us_GetCity(value) FROM table;

This is great functionality, and it enabled us to do a lot of really cool stuff in the backend of WorkXpress with our Expression Builder.  We hit a wall, however, with the functionality of what you can do inside a MySQL FUNCTION, as declared above, and that's where UDF's come in.

The UDF features of MySQL allow you to write functions in C/C++, and hook them up into a MySQL function!  UDF Functions are not only more fully featured, but rumored to run more efficiently/quickly depending on the functionality you're looking to write into a function.

As part of our switch from MyISAM on MySQL Server 5.1 to XtraDB (a better version of InnoDB) on PerconaDB we will absolutely need to re-write 4 of the 96 functions in C, because of feature limitations to MySQL's Function language.  Because of the rumored speed improvements in using UDF C functions over MySQL's built in language, we're likely going to rewrite all 96.

I'll be sharing performance benchmarks, pitfalls and potholes I hit along the way and hopefully my success story when the functions have been converted and are ready for release into the WorkXpress PaaS Engine.

Part 2 : MySQL's Example - udf_example.c

No comments: