Friday, October 22, 2010

Speeding up MySQL Routines with C using UDF : Part 2 -> MySQL's Example udf_example.c

<- Part 1 : Intro

The first thing I did when researching UDF's was to head over to the MySQL Documentation and read about what was ahead of me. Makes sense, right? If only it were that simple :-)

This page in the MySQL Docs tells us that there's an example udf file that comes with the source code for MySQL.  So, I hopped over to their site, downloaded the source, and found the file.  sql/udf_example.c  Now, admittedly it has been a long time since I wrote any C, or even compiled any (with the exception of ./configure, make, make install type "install from source" stuff)... so I figured their example compile line would work :

dmclain@Percona:~$ gcc -shared -o udf_example.so udf_example.c
udf_example.c:127:23: error: my_global.h: No such file or directory
udf_example.c:128:20: error: my_sys.h: No such file or directory
udf_example.c:139:19: error: mysql.h: No such file or directory
udf_example.c:142: error: syntax error before ‘LOCK_hostname’
udf_example.c:142: warning: data definition has no type or storage class

.. Not even close.  As you can see, it was missing a bunch of include files, so I ended up adding -I/path/to/mysql/includes

dmclain@Percona:~$ gcc -shared -o udf_example.so udf_example.c -I/home/dmclain/mysql_5.1/include
In file included from ./udf_example2.c:127:
./my_global.h:80:23: error: my_config.h: No such file or directory
./my_global.h:602:26: error: my_attribute.h: No such file or directory
./my_global.h:639:21: error: my_dbug.h: No such file or directory
In file included from ./udf_example2.c:127:
./my_global.h:678: error: syntax error before ‘size_socket’
./my_global.h:678: warning: data definition has no type or storage class
./my_global.h:803:2: error: #error "please add -DSTACK_DIRECTION=1 or -1 to your CPPFLAGS"
./my_global.h:871:1: warning: "isnan" redefined
In file included from /usr/include/math.h:28,
 ------ Snip --------

Now keep in mind, I didn't actually install mysql from the source that I downloaded, I was just trying to compile the udf_example.c to use with my existing PerconaDB or MySQL installation.  I didn't want to mess up my existing installation.

So, after a few hours of banging my head against this.. I found this cool site : http://www.mysqludf.org .  This is a group of folks who are trying to being a bit of order to a set of C functions that can be added to your MySQL installation in terms of the parameters, and naming conventions, etc.  What caught my eye mostly tho was the Skeleton package : http://www.mysqludf.org/participation/lib_mysqludf_skeleton.tar.gz

I downloaded and uncompressed this package which seems to include everything I need to compile an example C function for use in UDF in MySQL, which just returns it's own name, using commands I recognize!

dmclain@Percona:~$ autoreconf
dmclain@Percona:~$ ./configure
  -- Snip --
dmclain@Percona:~$ make
  -- Snip --
dmclain@Percona:~$ ls lib_mysqludf_skeleton/.libs
lib_mysqludf_skeleton.a  lib_mysqludf_skeleton.la  lib_mysqludf_skeleton.lai  lib_mysqludf_skeleton_la-lib_mysqludf_skeleton.o  lib_mysqludf_skeleton_la-mysqludf.o  lib_mysqludf_skeleton.so

Now you can use installdb.sql to register it.  I like to know what those kinds of scripts do tho, since I will need to replicate the registration process during the WorkXpress engine rollout that includes this work.  First it drops the function incase it already exists in your database, then then it adds your new function :

CREATE FUNCTION lib_mysqludf_skeleton_info RETURNS STRING SONAME 'lib_mysqludf_skeleton.so';

Notice there's no full path on the front of the SONAME. That's because MySQL has a directory where it puts plugins. You can ask your MySQL Server where *it* stores plugins with :

mysql> show variables like '%plugin%';
+---------------+-----------------------+
| Variable_name | Value                 |
+---------------+-----------------------+
| plugin_dir    | /usr/lib/mysql/plugin |
+---------------+-----------------------+
1 row in set (0.00 sec)

So I added a sym link from /usr/lib/mysql/plugin for my SO in my working area :

dmclain@Percona:~$ sudo ln -s /home/dmclain/lib_mysqludf_skeleton/.libs/lib_mysqludf_skeleton.so  /usr/lib/mysql/plugin/lib_mysqludf_skeleton.so

Then I restarted MySQL so it would find the new plugin; and ran the command :

mysql> select lib_mysqludf_skeleton_info();
+------------------------------+
| lib_mysqludf_skeleton_info() |
+------------------------------+
| lib_mysqludf_skeleton 0.0.1  |
+------------------------------+
1 row in set (0.00 sec)

So, this is step by step to download, compile, register and run the "Skeleton Info" function from http://www.mysqludf.org.

Now that I have a C function running in MySQL.. it's time to make one that does something I need it to do :-)

Speeding up MySQL Routines with C using UDF : Part 1

Most of you probably already know that you can write Routines in MySQL (5.1+) to allow you to encapsulate logic that you re-use into a simple to run command. An example...

In WorkXpress we store Address data in our databases as XML, since an address is just one field there's only one cell in the database to store Address 1, Address 2, City, State, Zip and Country. So, instead of creating a special kind of table to store this data, we decided to simply store it like this :

   <multi_part_field>
   <part id="street">1757 Canvasback Lane</part>
   <part id="street2"></part>
   <part id="street3"></part>
   <part id="city">Eagan</part>
   <part id="state">MN</part>
   <part id="zip_code">55122</part>
   <part id="country">United States</part>
   <part id="type">United States</part>
   </multi_part_field>

So, when we need to pull just the City out of an address because it was requested in an expression, I can use an XML XPATH function built into MySQL called extractvalue() and pull the value out and return it. The bold line is the key to this function, the rest is just structure/sanity checks :

FUNCTION `wx_Address_Us_GetCity`(input_value text) RETURNS varchar(64) CHARSET utf8 DETERMINISTIC
 BEGIN
   IF input_value IS NULL THEN
      RETURN input_value;
   ELSE
      RETURN extractvalue(input_value, '/multi_part_field/part[@id="city"]');
   END IF;
 END

With this function in place, you can run SQL like the below (assuming there is XML Address Data in your 'value' column) and it will pull out just the City :

SELECT wx_Address_Us_GetCity(value) FROM table;

This is great functionality, and it enabled us to do a lot of really cool stuff in the backend of WorkXpress with our Expression Builder.  We hit a wall, however, with the functionality of what you can do inside a MySQL FUNCTION, as declared above, and that's where UDF's come in.

The UDF features of MySQL allow you to write functions in C/C++, and hook them up into a MySQL function!  UDF Functions are not only more fully featured, but rumored to run more efficiently/quickly depending on the functionality you're looking to write into a function.

As part of our switch from MyISAM on MySQL Server 5.1 to XtraDB (a better version of InnoDB) on PerconaDB we will absolutely need to re-write 4 of the 96 functions in C, because of feature limitations to MySQL's Function language.  Because of the rumored speed improvements in using UDF C functions over MySQL's built in language, we're likely going to rewrite all 96.

I'll be sharing performance benchmarks, pitfalls and potholes I hit along the way and hopefully my success story when the functions have been converted and are ready for release into the WorkXpress PaaS Engine.

Part 2 : MySQL's Example - udf_example.c