As you might expect from the open source world, almost every Linux distribution has developed its own way of managing our favourite RDBMS service. None of them is perfect, and some do not even seem to work in a real server scenario[1].
In this post I compare these initialization scripts and point out their most annoying aspects that I have had to face in production.
In the ‘old days’ probably all Linux distributions started and stopped services using so-called init scripts, usually written in Unix shell (sh or Bash). The situation is not so simple anymore.
Folks started to think about improving things, such as making system initialization faster by starting services in parallel. So Upstart was developed in the Ubuntu world, then systemd appeared and became popular in other distros. There are also runit, launchd and more.
However, in the first part of this article I’d like to focus on the traditional init.d scripts, which were written many years ago and have not really changed much over time.
People tend to divide the Linux world into ‘rpm-ish’ and ‘debian-ish’ distros, so let’s take a look at examples from the most popular of both continents: Debian/Ubuntu and Red Hat/CentOS. As you probably know, historically the MySQL maintainers supported the Debian/Ubuntu world rather poorly (if at all), and maybe that’s why the Debian people decided to completely rework the init script provided by the MySQL team. IMHO that was not the best idea, since the original script did the job pretty much as it should, though maybe someone considered it too far from Debian rules. Note that the original MySQL init.d script has stayed almost identical from version 5.0 up to 5.6 (I verified the community server editions 5.0.95, 5.1.61 and 5.6.4).
Let’s compare just the ‘start’ section of the original[2] init.d script and of the one provided with Debian/Ubuntu.
Here is the one included in the package for RedHat/CentOS/OracleLinux, MySQL-server-5.6.4_m7-1.el6.x86_64.rpm (nota bene, you can find the same script in the so-called Debian version, mysql-5.6.4-m7-debian5.0-x86_64.deb, provided by Oracle; that deb is nothing more than the tar version re-packaged in deb format):
    case "$mode" in
      'start')
        # Start daemon

        # Safeguard (relative paths, core dumps..)
        cd $basedir

        echo $echo_n "Starting MySQL"
        if test -x $bindir/mysqld_safe
        then
          # Give extra arguments to mysqld with the my.cnf file. This script
          # may be overwritten at next upgrade.
          $bindir/mysqld_safe --datadir="$datadir" --pid-file="$mysqld_pid_file_path" $other_args >/dev/null 2>&1 &
          wait_for_pid created "$!" "$mysqld_pid_file_path"; return_value=$?

          # Make lock for RedHat / SuSE
          if test -w "$lockdir"
          then
            touch "$lock_file_path"
          fi

          exit $return_value
        else
          log_failure_msg "Couldn't find MySQL server ($bindir/mysqld_safe)"
        fi
        ;;
One variable defined before is relevant here (comments included intentionally):
    # Default value, in seconds, afterwhich the script should timeout waiting
    # for server start.
    # Value here is overriden by value in my.cnf.
    # 0 means don't wait at all
    # Negative numbers mean to wait indefinitely
    service_startup_timeout=900
And then the key function here is wait_for_pid:
    wait_for_pid () {
      verb="$1"           # created | removed
      pid="$2"            # process ID of the program operating on the pid-file
      pid_file_path="$3"  # path to the PID file.

      i=0
      avoid_race_condition="by checking again"

      while test $i -ne $service_startup_timeout ; do

        case "$verb" in
          'created')
            # wait for a PID-file to pop into existence.
            test -s "$pid_file_path" && i='' && break
            ;;
          'removed')
            # wait for this PID-file to disappear
            test ! -s "$pid_file_path" && i='' && break
            ;;
          *)
            echo "wait_for_pid () usage: wait_for_pid created|removed pid pid_file_path"
            exit 1
            ;;
        esac

        # if server isn't running, then pid-file will never be updated
        if test -n "$pid"; then
          if kill -0 "$pid" 2>/dev/null; then
            :  # the server still runs
          else
            # The server may have exited between the last pid-file check and now.
            if test -n "$avoid_race_condition"; then
              avoid_race_condition=""
              continue  # Check again.
            fi

            # there's nothing that will affect the file.
            log_failure_msg "The server quit without updating PID file ($pid_file_path)."
            return 1  # not waiting any more.
          fi
        fi

        echo $echo_n ".$echo_c"
        i=`expr $i + 1`
        sleep 1

      done

      if test -z "$i" ; then
        log_success_msg
        return 0
      else
        log_failure_msg
        return 1
      fi
    }
Seems complicated? Not really. Basically it starts the mysqld_safe script in the background and waits until its child, mysqld, creates its pid file. On every pass of the waiting loop it also checks whether the process it launched is still alive: ‘$!’, the PID of the backgrounded mysqld_safe, is passed as the second parameter of wait_for_pid(), where it becomes $pid. If that process is gone, the loop checks the pid file one more time and then, if the file is still missing, exits immediately with a failure. This can happen if MySQL refuses to start for some critical reason, like a configuration error or wrong file permissions. And if the process is still running (the kill -0 check), the script keeps waiting until the pid file gets created (which happens when MySQL finally gets ready for use). If that never happens, the loop ends once $service_startup_timeout is reached, which with the default of 900 seconds means 15 minutes. Sounds wise, right? But it is still far from perfect.[3]
OK, now let’s see what the analogous init.d script looks like in Debian/Ubuntu. I used the one from the Ubuntu Karmic release (already unsupported) because since Lucid the MySQL server has been controlled via Upstart.
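The two shell features this logic relies on, ‘$!’ and kill -0, are easy to try in isolation. Below is a minimal sketch, not taken from any MySQL package: is_alive is a hypothetical helper name, and a plain sleep stands in for mysqld_safe.

```sh
#!/bin/sh
# is_alive PID: succeed if a process with that PID exists and may be signalled.
# Signal 0 delivers nothing; kill only performs the existence/permission check.
# (is_alive is a hypothetical helper name used only for this illustration.)
is_alive() {
    kill -0 "$1" 2>/dev/null
}

sleep 30 &          # stand-in for "mysqld_safe ... &"
child_pid=$!        # $! holds the PID of the most recent background job

if is_alive "$child_pid"; then
    echo "process is running"
fi

kill "$child_pid"                       # simulate the daemon dying
wait "$child_pid" 2>/dev/null || true   # reap it so the PID is really gone

if ! is_alive "$child_pid"; then
    echo "process is gone"
fi
```

Note that the check only says that some process with this PID exists; it says nothing about whether the server behind it is ready to accept connections, which is why the original script combines it with the pid file test.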
     99  case "${1:-''}" in
    100    'start')
    101      sanity_checks;
    102      # Start daemon
    103      log_daemon_msg "Starting MySQL database server" "mysqld"
    104      if mysqld_status check_alive nowarn; then
    105        log_progress_msg "already running"
    106        log_end_msg 0
    107      else
    108        # Could be removed during boot
    109        test -e /var/run/mysqld || install -m 755 -o mysql -g root -d /var/run/mysqld
    110
    111        # Start MySQL!
    112        /usr/bin/mysqld_safe > /dev/null 2>&1 &
    113
    114        # 6s was reported in #352070 to be too few when using ndbcluster
    115        for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14; do
    116          sleep 1
    117          if mysqld_status check_alive nowarn ; then break; fi
    118          log_progress_msg "."
    119        done
    120        if mysqld_status check_alive warn; then
    121          log_end_msg 0
    122          # Now start mysqlcheck or whatever the admin wants.
    123          output=$(/etc/mysql/debian-start)
    124          [ -n "$output" ] && log_action_msg "$output"
    125        else
    126          log_end_msg 1
    127          log_failure_msg "Please take a look at the syslog"
    128        fi
    129      fi
    130      ;;
Looks a bit simpler but… Hey, one thought comes to mind when I read it: did whoever wrote it ever have a chance to test it against a large MySQL instance? Actually, you don’t even need a large instance of, say, a few hundred gigabytes to find out that 14 seconds is too short. Let’s just increase innodb_log_file_size to 1024M on a server with a pretty slow I/O subsystem (like most VPSes):
    service mysql start
     * Starting MySQL database server mysqld
       ...fail!
But did it really fail to start? Let’s check the error log:
    120304 22:47:53 [Note] Plugin 'FEDERATED' is disabled.
    120304 22:47:53 InnoDB: Log file ./ib_logfile0 did not exist: new to be created
    InnoDB: Setting log file ./ib_logfile0 size to 1024 MB
    InnoDB: Database physically writes the file full: wait...
    InnoDB: Progress in MB: 100 200 300 400 500 600 700 800 900 1000
    120304 22:48:30 InnoDB: Log file ./ib_logfile1 did not exist: new to be created
    InnoDB: Setting log file ./ib_logfile1 size to 1024 MB
    InnoDB: Database physically writes the file full: wait...
    InnoDB: Progress in MB: 100 200 300 400 500 600 700 800 900 1000
    InnoDB: The log sequence number in ibdata files does not match
    InnoDB: the log sequence number in the ib_logfiles!
    120304 22:48:59 InnoDB: Database was not shut down normally!
    InnoDB: Starting crash recovery.
    InnoDB: Reading tablespace information from the .ibd files...
    InnoDB: Restoring possible half-written data pages from the doublewrite
    InnoDB: buffer...
    120304 22:48:59 InnoDB: Started; log sequence number 0 47116
    120304 22:49:00 [Note] Event Scheduler: Loaded 0 events
    120304 22:49:00 [Note] /usr/sbin/mysqld: ready for connections.
    Version: '5.1.37-1ubuntu5.5' socket: '/var/run/mysqld/mysqld.sock' port: 3306 (Ubuntu)
So it started successfully, about 67 seconds after launch according to the timestamps! Yet the init.d script had already assumed that mysqld must be dead, simply because it could not query it within the time limit. What about checking the mysqld process itself? Doesn’t anyone care whether it is still working after those damn 14 seconds? As Debian/Ubuntu users can already guess, this is really annoying when you install or update MySQL and the service does not make it on time: apt will consider the package not configured, and will try to stop and start MySQL every time you touch apt-get (sic!). The quick and dirty fix is to change the timeout from:
    for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14; do
to e.g.
    for i in `seq 1 60`; do
And as all experienced DBAs know, the bigger and more complicated a database instance is, the longer its start-up will most likely take. Not to mention InnoDB recovery after a crash: how long should the wait be to let this init.d script return the correct state? And once we have fixed the too-short timeout, what happens when mysqld fails to start and exits immediately? Well, since the Ubuntu folks decided to skip the pid checking part, the script will loop for the whole damn 60 seconds (or whatever we set) and return long after mysqld is gone…
But why am I referring to the init.d script if Ubuntu has dropped its use since the Lucid release? Well, it’s still the default service control method for very popular MySQL derivatives: Percona Server (at least up to version 5.1.61-rel13.2-431.lucid) and MariaDB (I checked mariadb-server-5.3_5.3.5-mariadb113~lucid and mariadb-server-5.5_5.5.20-mariadb1~lucid, where the script is only slightly modified). Also, Debian has not switched to Upstart yet, so it still uses the same script (I checked mysql-server-5.1_5.1.58-1 in Debian unstable).
Is the too-short timeout the only flaw in the Debian/Ubuntu init.d script? I wish it were.
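To illustrate the point, here is a minimal sketch of a wait loop that combines both ideas: a generous timeout plus a liveness check on the launched process. It is not code from any distribution; wait_for_ready, its arguments and its return codes are hypothetical names chosen for this example, and the readiness check is whatever command you pass in.

```sh
#!/bin/sh
# wait_for_ready PID TIMEOUT CHECK_CMD [ARGS...]
# Hypothetical helper, not part of any distribution's init script.
# Returns 0 as soon as CHECK_CMD succeeds,
#         1 if process PID disappears before CHECK_CMD ever succeeds,
#         2 if TIMEOUT seconds pass while the process is still alive.
wait_for_ready() {
    pid="$1"; timeout="$2"; shift 2
    i=0
    while [ "$i" -lt "$timeout" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        if ! kill -0 "$pid" 2>/dev/null; then
            # The process is gone; run the check one last time in case it
            # succeeded between the two tests, then give up right away.
            if "$@" >/dev/null 2>&1; then
                return 0
            fi
            return 1
        fi
        sleep 1
        i=$((i + 1))
    done
    return 2
}
```

In an init script this could be called right after the server is launched in the background, for example as wait_for_ready "$!" 900 mysqld_status check_alive nowarn, where mysqld_status is the function already used by the Debian script. A crashed server is then reported within about a second instead of after the whole timeout, while a slow but healthy start-up gets all the time it needs.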
For some reason the Debian folks decided to make MySQL administrators ‘very happy’ by fixing any potential problems for them during server start. Like, you know, all those MyISAM tables crashing over and over again.
So the /etc/init.d/mysql script calls /etc/mysql/debian-start (line 123), where you can see:
    check_for_crashed_tables;
and this function is defined in /usr/share/mysql/debian-start.inc.sh (so much for simplicity) like this:
    function check_for_crashed_tables() {
      set -e
      set -u

      # But do it in the background to not stall the boot process.
      logger -p daemon.info -i -t$0 "Triggering myisam-recover for all MyISAM tables"

      # Checking for $? is unreliable so the size of the output is checked.
      # Some table handlers like HEAP do not support CHECK TABLE.
      tempfile=`tempfile`
      # We have to use xargs in this case, because a for loop barfs on the
      # spaces in the thing to be looped over.
      LC_ALL=C $MYSQL --skip-column-names --batch -e '
          select concat('\''select count(*) into @discard from `'\'',
                        TABLE_SCHEMA, '\''`.`'\'', TABLE_NAME, '\''`'\'')
          from information_schema.TABLES where ENGINE='\''MyISAM'\' | \
        xargs -i $MYSQL --skip-column-names --silent --batch \
                        --force -e "{}" >$tempfile
      if [ -s $tempfile ]; then
        (
          /bin/echo -e "\n" \
            "Improperly closed tables are also reported if clients are accessing\n" \
            "the tables *now*. A list of current connections is below.\n";
          $MYADMIN processlist status
        ) >> $tempfile
        # Check for presence as a dependency on mailx would require an MTA.
        if [ -x /usr/bin/mailx ]; then
          mailx -e -s"$MYCHECK_SUBJECT" $MYCHECK_RCPT < $tempfile
        fi
        (echo "$MYCHECK_SUBJECT"; cat $tempfile) | logger -p daemon.warn -i -t$0
      fi
      rm $tempfile
    }
Let’s skip the MyISAM table repair operation itself, since there is an intriguing query here, which boils down to:
    select TABLE_SCHEMA, TABLE_NAME from information_schema.TABLES where ENGINE='MyISAM';
What is so special about this select? Well, if your MySQL instance holds tens of thousands of tables, it takes ages! And during that time your server suffers from quite a heavy I/O load, which kills performance (yes, MySQL is already able to accept connections at this point). I did a quick test on a machine equipped with six SSD disks in RAID 10 and an X5680 12-core Nehalem CPU, where MySQL handles around 19,000 schemas and more than 1,100,000 tables (and it’s by no means the biggest instance we manage):
    select TABLE_SCHEMA, TABLE_NAME from information_schema.TABLES where ENGINE='MyISAM';
    67 rows in set (8 min 50.25 sec)
And the fact that you most likely don’t use MyISAM tables (except system tables) won’t save you!
Still wondering what to do? Go, edit /etc/mysql/debian-start and comment out that damn line!
    # check_for_crashed_tables;
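If you have to apply this on more than one machine, the edit can be scripted. The following is only a sketch: disable_crash_check is a hypothetical helper name, and it assumes the call appears on a line of its own exactly as shown above.

```sh
#!/bin/sh
# disable_crash_check FILE
# Hypothetical helper: comments out a bare "check_for_crashed_tables;" call
# in FILE, keeping the untouched original as FILE.bak.
# Leading whitespace is preserved; already commented lines are left alone.
disable_crash_check() {
    sed -i.bak \
        's/^\([[:space:]]*\)check_for_crashed_tables;/\1# check_for_crashed_tables;/' \
        "$1"
}
```

Run as root, disable_crash_check /etc/mysql/debian-start would make the change shown above; comparing the result with the .bak copy afterwards is a cheap sanity check.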
Believe me, this line could help you only if you have a small set of MyISAM tables (I hope you don’t), but even then IMHO you should repair them mindfully rather than let it happen unintentionally.
After analysing Debian’s MySQL init.d script I can only say to its makers: “Why do you try to re-invent the wheel? But if you really have to, make sure your masterpiece actually works, works wisely, and not only on your desktop!”
You can find another simple approach in a less known distribution, Arch Linux, which you may want to look at.[4]
Fortunately, regardless of the Linux distribution, you can always use the tar version, which will most likely be safer and eventually easier to manage. This way everyone can stick to their favourite distro and still use the original MySQL init.d script. But if you are still wondering which one you should pick, this post can be helpful: http://www.mysqlperformanceblog.com/2011/12/08/which-linux-distribution-for-mysql-server/.
In the next part I’ll try to analyse how the Upstart and systemd scripts are designed to deal with MySQL these days.
[1] In my understanding, a ‘real server scenario’ is when you need a dedicated database server, or many of them, in order to keep up with the needs of a live application.
[2] http://dev.mysql.com/doc/refman/5.6/en/mysql-server.html
[3] http://bugs.mysql.com/bug.php?id=61291
[4] http://projects.archlinux.org/svntogit/packages.git/tree/trunk/mysqld?h=packages/mysql