Search This Blog

Thursday, August 25, 2011

Using MySQL Partitioning Instead of MERGE Tables

Using MySQL Partitioning Instead of MERGE Tables:

One common question I get is how to use partitioning instead of MERGE tables. The process I use involves using stored procedures to create and drop partitions. This article will go over the stored procedures I use; special thanks to Roland Bouman for taking a look and giving great feedback to optimize this process.


First, a simple table, not partitioned (yet):



use test;
DROP TABLE IF EXISTS my_part;
CREATE TABLE IF NOT EXISTS my_part (
id int NOT NULL,
creationDate datetime NOT NULL,
PRIMARY KEY (id,creationDate)
) ENGINE=InnoDB;

In real, life there is more to the table than just id and creationDate. The most important part is that the partitioned field(s) need to be part of the primary key.


Now, add the partition definition. This can be done in the CREATE statement, but I have found that it is easier for me to think about what fields and indexes I want first, and then worry about partitioning, so I naturally gravitate towards defining the fields and indexes first and then altering the table to add partitioning. This also helps me think about how to modify an existing table to add partitioning.



ALTER TABLE my_part PARTITION BY RANGE (TO_DAYS(creationDate)) (
partition 2010_07_01 values less than (to_days('2010-07-02')),
partition 2010_07_02 values less than (to_days('2010-07-03')),
partition 2010_07_03 values less than (to_days('2010-07-04')),
partition 2010_07_04 values less than (to_days('2010-07-05'))
);

This makes it pretty clear what is happening — the idea is to give the partition names actual dates that they hold, so that it is easy to see what partitions need to be added and deleted.


Deleting partitions


I find that making stored procedures makes things easy….so I will define a procedure called partition_drop to drop partitions. The partition_drop stored procedure takes in a table name and schema name to drop partitions from, and a date to delete up through and including that date. Here’s the procedure:



DELIMITER ||
DROP PROCEDURE IF EXISTS partition_drop ||

CREATE PROCEDURE partition_drop (IN through_date date, IN tbl varchar(64), IN db varchar(64))
BEGIN
DECLARE delete_me varchar(64);
DECLARE notfound BOOL DEFAULT FALSE;
DECLARE pname CURSOR FOR SELECT PARTITION_NAME
FROM INFORMATION_SCHEMA.PARTITIONS WHERE TABLE_NAME=tbl AND TABLE_SCHEMA=db
AND DATE(PARTITION_NAME)!= 0
AND DATE(PARTITION_NAME) IS NOT NULL
AND DATE(PARTITION_NAME)<=through_date;

DECLARE CONTINUE HANDLER FOR NOT FOUND SET notfound:=TRUE;
OPEN pname;

cursor_loop: LOOP
FETCH pname INTO delete_me;
IF notfound THEN LEAVE cursor_loop; END IF;
SET @alter_stmt:=CONCAT("ALTER TABLE ",db,".",tbl," DROP PARTITION ",delete_me);

# sanity check commented out for production use
# SELECT @alter_stmt;
PREPARE stmt_alter FROM @alter_stmt; EXECUTE stmt_alter; DEALLOCATE PREPARE stmt_alter;

END LOOP;
CLOSE pname;
END ||
DELIMITER ;

Go ahead and run CALL partition_drop('2010-07-02','my_part','test'); to verify that SHOW CREATE TABLE my_part; shows that the desired partitions have been dropped.


Adding partitions


Adding partitions is similar to deleting partitions — using a stored procedure that takes in the date to add partitions up to that date. It will not try to add so many partitions that the table will have more than 1024, and it won’t add any partitions that already exist.



DELIMITER ||
DROP PROCEDURE IF EXISTS partition_add ||

CREATE PROCEDURE partition_add (IN through_date date, IN tbl varchar(64), IN db varchar(64))
BEGIN
DECLARE add_me char(10);
DECLARE max_new_parts,add_cnt smallint unsigned default 0;
SELECT 1024-COUNT(*) AS max_new_parts,
SUM(CASE WHEN
DATE(PARTITION_NAME)>=through_date then 1 else 0
END)
INTO max_new_parts, add_cnt
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = db
AND TABLE_NAME = tbl;

IF add_cnt=0 THEN
BEGIN
SELECT MAX(DATE(PARTITION_NAME)) INTO add_me
FROM INFORMATION_SCHEMA.PARTITIONS WHERE TABLE_NAME=tbl and TABLE_SCHEMA=db
AND DATE(PARTITION_NAME)<through_date;

# to do: declare handler for exceptions here
IF DATEDIFF(through_date,add_me)+1 < max_new_parts THEN
BEGIN
WHILE add_me<through_date do BEGIN
SET add_me:=DATE_FORMAT(add_me + INTERVAL 1 DAY,"%Y_%m_%d");
SET @alter_stmt:=CONCAT("ALTER TABLE ",db,".",tbl," ADD PARTITION (PARTITION ",add_me," VALUES LESS THAN (TO_DAYS('",add_me+INTERVAL 1 DAY, "')))" );

PREPARE stmt_alter FROM @alter_stmt; EXECUTE stmt_alter; DEALLOCATE PREPARE stmt_alter;

END;
END WHILE;
END;
END IF;
END;
END IF;
END ||
DELIMITER ;

Here’s how to call that stored procedure:

CALL partition_add ('2010_07_10', 'my_part','test');


Caveat: This stored procedure will only add in partitions from the maximum partition name/date until the date you add it in, it will not fill in any gaps. However, it is possible to add in another input parameter to be the “start date” and change the stored procedure to add the partition if it does not exist, from the start date through the end date.


Note: It was pointed out by Roland Bouman that it would be better to change both the add and drop stored procedures to do one ALTER TABLE instead of sequential ones. He wrote:


It’s probably better to generate a single statement to drop / add all partitions.


So the general pattern would be:


- generate and concatenate lines for each partition

- one single sequence of prepare, execute, deallocate to execute one DDL statement.


For the drop partition procedure, this approach would allow you to do away with the cursor. You can simply use GROUP_CONCAT in a single SELECT…INTO statement to generate the entire DDL statement.


Note: Roland also mentioned that these stored procedures could be generalized to use any interval, for example if each table partition held 7 days. The only limitation is that to get the optimal partition performance with dates, use either TO_DAYS(date_field) or YEAR(date_field) as the partitionining function.


Putting it all together


The procedure: daily I would run the following in MySQL, to keep only the previous “x” days:


CALL partition_drop('CURRENT_DATE()-INTERVAL x DAY', 'tbl_name','schema_name');


To add new partitions, I would do:

CALL partition_add('CURRENT_DATE()-INTERVAL 32 DAY', 'tbl_name','schema_name');


It is run daily just in case; the side benefit is that it will catch any gaps if you do not modify the stored procedure to include a start date for when to add. Adding a partition does not take a long time, because there’s no data to reorganize.


I would make a daily MySQL event, and then everything is stored nicely in the database, and backed up with the database. Others may choose to run the stored procedures in a cron script or scheduled task (Windows), but that requires a password to be stored somewhere — either in the script, somewhere the script can read, or in a config file such as .my.cnf.


No comments:

Post a Comment