Saturday 6 July 2013

Process millions of records in MySQL when sending millions of push notifications to iOS & Android devices

This may not be the best solution, but I am writing this post as per my knowledge...
Temporary tables and batch-wise query execution are simple techniques for processing millions of records with minimum delay when sending a large number of push notifications to iOS and Android devices.

Let's take an example. Our server-side app has to send 500K push notifications to iOS devices. When users receive a notification they might tap it, which launches the application on their device. The client-side app then gives them a bonus of 5 or 10 coins for launching the application after receiving the notification; the bonus is available for only 1 or 2 hours, and it must expire after that. There is one more requirement: the server-side application must send push notifications on a daily basis, and only to users who meet certain conditions.

Let's list the key points of these server-side and client-side applications:

1) Send 500K push notifications from the server side.
2) Send push notifications on a daily basis.
3) Give a bonus to users when they tap a push notification.
4) The bonus expires after 1 or 2 hours.

We are not going to discuss how the client-side app is built or how to send a large number of push notifications. To learn more about push notifications, visit http://ourjavaknowledge.blogspot.in/2013/06/push-notification.html
We focus only on the server-side implementation for processing a huge number of database records.

Let's take a USER table that holds a million records. To process that many records, fetch them batch-wise. Even if you have around 32 or 64 GB of RAM, it is safer to process them in batches; if you have more memory, increase the batch size. But don't try to process millions of records in one go if you have a very limited amount of memory.

Now, let's say our batch size is 50K. We cannot fetch and process the first 50K records from the user table straight away, because the requirement says we have to send notifications only on a daily basis and only to users who meet certain conditions. Let's add a constraint here: assume it is impossible to write a single query that fetches 50K records with those conditions, so we have to filter the users one by one. Fetch the first 50K records, process them one by one, and put the primary key ids of eligible users in a List (Java is my back-end language, so the preferred data structure to hold an arbitrary number of ids is a List).

1) Fetch first 50K records
 
     SELECT * FROM user LIMIT 0, 50000

2) After filtering the records one by one, insert the eligible users' primary key ids into the List.

3) Fetch all matching records and insert them into a TEMPORARY table.

    CREATE TEMPORARY TABLE user_temp AS (SELECT * FROM user WHERE 0);

     INSERT INTO user_temp
     SELECT p.* FROM user p WHERE p.id IN (id1, id2, ..., idN)  -- the comma-separated ids collected in step 2, not a quoted string

4) Update the user table with the number of pushes sent and the day the last push was sent.

     UPDATE user p, user_temp p2
     SET p.no_of_pushes = p2.no_of_pushes + 1, p.since_last_push = NOW()
     WHERE p.id = p2.id

5) Finally destroy the temporary table

     DROP TABLE user_temp;

6) Fetch the next 50K records and continue the process

To make the process faster, avoid the 2nd step: instead of filtering records one by one in application code, apply the conditions at the database level and push all 50K records into the temporary table in one go. This makes the process much faster.
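The loop above can be sketched in Java (the back end mentioned earlier). This is a minimal sketch, not the actual implementation: `eligible` is a hypothetical stand-in for the real per-user conditions, the user ids are simulated instead of fetched over JDBC, and the SQL from steps 1 and 3-5 appears only as comments.

```java
import java.util.ArrayList;
import java.util.List;

public class PushBatcher {

    // Hypothetical stand-in for the real per-user eligibility check (step 2).
    static boolean eligible(long userId) {
        return userId % 2 == 0;
    }

    // Walks the user table batch by batch; returns the eligible ids of each batch.
    // totalUsers stands in for the row count of the user table.
    static List<List<Long>> processInBatches(long totalUsers, int batchSize) {
        List<List<Long>> batches = new ArrayList<>();
        for (long offset = 0; offset < totalUsers; offset += batchSize) {
            // Step 1: SELECT * FROM user LIMIT offset, batchSize
            long end = Math.min(offset + batchSize, totalUsers);
            List<Long> ids = new ArrayList<>();
            for (long id = offset; id < end; id++) {
                if (eligible(id)) {
                    ids.add(id); // step 2: collect primary key ids in a List
                }
            }
            // Step 3: CREATE TEMPORARY TABLE + INSERT ... WHERE id IN (ids)
            // Step 4: UPDATE user/user_temp join to record the push
            // Step 5: DROP TABLE user_temp
            batches.add(ids); // step 6: move on to the next batch
        }
        return batches;
    }
}
```

With a batch size of 50K, a million-row table is walked in 20 passes, and memory use stays bounded by the batch size rather than the table size.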

Friday 5 July 2013

Drop Unique Key Constraint in Mysql

How do you drop a unique key constraint in MySQL?

If there is no foreign key constraint defined on your table, dropping a unique key is straightforward:


alter table table_name drop index index_name;


But if a foreign key constraint is defined on your table, we have to drop the foreign key constraint first and then drop the unique key constraint.
If we try to drop the unique key constraint without dropping the foreign key constraint first, we usually get the error below:


ERROR 1025 (HY000): Error on rename of './tablename/#sql-14ae_81' to
'./tablename/columnname' (errno: 150)

So, drop your foreign keys first, then drop the unique keys:

alter table tablename drop foreign key constraint_name;

alter table table_name drop index index_name;

Monday 1 July 2013

MySQL Production Settings

MySQL Community Edition production settings (InnoDB)


  Default MySQL my.cnf

  • max_allowed_packet=1M 
  • default-storage-engine=INNODB
  • max_connections=100
  • query_cache_size=0
  • table_cache=256
  • tmp_table_size=33M
  • thread_cache_size=8
  • innodb_additional_mem_pool_size=3M
  • innodb_flush_log_at_trx_commit=1
  • innodb_log_buffer_size=2M
  • innodb_buffer_pool_size=101M
  • innodb_log_file_size=51M
  • innodb_thread_concurrency=10
  • innodb_lock_wait_timeout=50
max_allowed_packet: This defines the maximum size of a communication packet that can be sent from client to server. If you want your server to handle big queries (for example, if you are working with big BLOB columns), increase this value. The MySQL server default is 1MB, which is not appropriate for production use; set it to 16M, 32M, or 64M. If the server receives a packet bigger than max_allowed_packet, it throws a "Packet too large" error and closes the connection. So if you have enough memory, the recommended value for production is:

        max_allowed_packet=64M

default-storage-engine: MyISAM and InnoDB are two of the most popular MySQL storage engines.
MyISAM manages non-transactional tables. It provides very high-speed data storage and retrieval, and it is the default storage engine unless you configured MySQL with a different one.
InnoDB provides transaction-safe tables. To maintain data integrity, InnoDB also supports foreign key constraints.
Obviously, in production we need data integrity, so set the default storage engine to InnoDB.
          
        default-storage-engine=INNODB

max_connections: The maximum number of simultaneous connections allowed from clients. Increase it to allow more concurrent users to connect to your app. The default value is 100, which is not sufficient in production under very high traffic. If you have a very large amount of RAM, set a larger value for this variable:
         
        max_connections=1000

query_cache_size: This is where the results of SELECT statements are stored. If an identical statement is received later, the server serves the result from the cache rather than parsing and executing the statement again. If set to 0, the query cache is disabled. Do not set it to a very large value: because threads need to lock the cache during updates, you may see lock contention issues with a very large cache. To learn more about the query cache, read the link below:
http://dev.mysql.com/doc/refman/5.0/en/query-cache.html

           query_cache_size=128M
       query_cache_limit = 8M

table_cache: The number of open tables for all threads. This value affects the maximum number of files the server keeps open. If you set it very high, you may run up against the per-process limit on open file descriptors imposed by the operating system. Many operating systems let you raise this limit, but how varies from system to system.

       table_cache=2K
       table_definition_cache = 8K
       open_files_limit = 6K

tmp_table_size: Temporary tables are created in memory for fast data processing. If a temporary table exceeds the tmp_table_size limit, it is automatically converted to an on-disk table. In production, if you are dealing with a large number of temporary tables, set it to a higher value:

          tmp_table_size = 128M
       max_heap_table_size = 128M


thread_cache_size: How many threads the server should cache for reuse. When a client disconnects, the client's threads are put in the cache if there are fewer than thread_cache_size threads there. Requests for threads are satisfied by reusing threads taken from the cache if possible, and only when the cache is empty is a new thread created. This variable can be increased to improve performance if you have a lot of new connections. Normally it does not provide a notable performance improvement if you have a good thread implementation; however, if your server sees hundreds of connections per second, set thread_cache_size high enough that most new connections use cached threads:
  
      thread_cache_size=16k

innodb_additional_mem_pool_size: The size of the additional memory pool used by InnoDB to store data dictionary and other internal metadata:

      innodb_additional_mem_pool_size=8M

innodb_flush_log_at_trx_commit: If set to 1, InnoDB will flush (fsync) the transaction logs to the disk at each commit, which offers full ACID behavior. If you are willing to compromise this safety, and you are running small transactions, you may set this to 0 or 2 to reduce disk I/O to the logs. Value 0 means that the log is only written to the log file and the log file flushed to disk approximately once per second. Value 2 means the log is written to the log file at each commit, but the log file is only flushed to disk approximately once per second. ( Content from my.ini file )

      innodb_flush_log_at_trx_commit=1

In production we need full ACID behaviour, so assign the value 1.

innodb_log_buffer_size: The buffer used by the InnoDB engine to hold log data. When the buffer is full, it is flushed to disk. Don't set it to a very high value:
         
    innodb_log_buffer_size=8M

innodb_buffer_pool_size: If your system is dedicated to MySQL, set this to 50-80% of RAM. InnoDB, unlike MyISAM, uses a buffer pool to cache both indexes and row data. The bigger you set it, the less disk I/O is needed to access data in tables. Do not set it too large, though, because competition for physical memory may cause paging in the operating system. (my.ini file reference)

    innodb_buffer_pool_size=2048M
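The 50-80% rule is easy to sanity-check with a little arithmetic. A sketch, where the helper name and the 4 GB RAM figure are assumptions for illustration:

```java
public class BufferPoolSizing {
    // Recommended innodb_buffer_pool_size band in MB for a dedicated MySQL
    // host, following the 50-80%-of-RAM rule of thumb.
    static long[] recommendedPoolMb(long ramMb) {
        return new long[] { ramMb * 50 / 100, ramMb * 80 / 100 };
    }
}
```

For a dedicated 4 GB (4096 MB) box this gives a band of 2048M to 3276M, so the 2048M above is the conservative end of the range.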

  
innodb_thread_concurrency: The number of threads allowed inside the InnoDB kernel. This variable limits the number of operating system threads that can run concurrently inside the InnoDB engine. InnoDB uses operating system threads to handle requests from user transactions (transactions may issue many requests to InnoDB before they commit or roll back). To learn more about innodb_thread_concurrency, read http://dev.mysql.com/doc/innodb/1.1/en/innodb-performance-thread_concurrency.html.
The optimal value depends heavily on the application, the hardware, and the OS scheduler; too high a value may lead to thread thrashing (my.ini file reference). A recommended value is 2 times the sum of the number of CPUs and the number of disks, i.e., 2 * (number of CPUs + number of disks). If you have just 1 or 2 CPUs, set it to infinite; but that's not advisable with more than 4-5 CPUs.

    innodb_thread_concurrency=10
   
    For 1 or 2 CPUs:
    innodb_thread_concurrency=0
    ( 0 means infinite; whether it does depends on the MySQL version. Since 5.0.19, 0 equals infinite )
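The rule of thumb above can be written down directly. A sketch, where the CPU and disk counts are illustrative assumptions:

```java
public class InnodbConcurrency {
    // 2 * (number of CPUs + number of disks); 0 (no limit) for 1-2 CPU boxes.
    static int recommendedConcurrency(int cpus, int disks) {
        if (cpus <= 2) {
            return 0; // since MySQL 5.0.19, 0 means infinite
        }
        return 2 * (cpus + disks);
    }
}
```

With 4 CPUs and 1 disk, 2 * (4 + 1) gives exactly the value 10 used above.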

innodb_lock_wait_timeout: The amount of time a transaction waits for a resource before giving up and rolling back the statement. The default value is 50 seconds (http://dev.mysql.com/doc/innodb/1.1/en/innodb-other-changes-innodb_lock_wait_timeout.html). In production, if you are dealing with a massive number of records, consider increasing this value to around 120 seconds:

      innodb_lock_wait_timeout=120

low_priority_updates: If set to 1, all INSERT, UPDATE, DELETE, and LOCK TABLE WRITE statements wait until there is no pending SELECT or LOCK TABLE READ on the affected table. It applies only to storage engines that use table-level locks (such as MyISAM, MEMORY, and MERGE).
http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html#sysvar_low_priority_updates
                              
     low_priority_updates=1

So, if you are using only InnoDB, just ignore it.


concurrent_insert: MyISAM supports concurrent inserts to reduce contention between readers and writers. If a MyISAM table has no deleted rows in the middle of the data file, an INSERT statement can add rows at the end of the table at the same time that SELECT statements are reading rows from it.
http://dev.mysql.com/doc/refman/5.0/en/concurrent-inserts.html
      
      concurrent_insert=ALWAYS



Check mysql configuration with mysqltuner


To verify your settings, use "mysqltuner". It is a very handy tool that checks your settings against the hardware and available memory and shows recommended settings. If you are using Ubuntu, follow the steps below to install mysqltuner:

root:~# sudo aptitude install mysqltuner

After installation, run

root:~# mysqltuner

The above command runs a series of checks to verify your settings; once done, it prints general recommendations about what should be changed or improved.
It looks something like this:


>>  MySQLTuner 1.0.1 - Major Hayden <major@mhtx.net>
>>  Bug reports, feature requests, and downloads at http://mysqltuner.com/
>>  Run with '--help' for additional options and output filtering

-------- General Statistics --------------------------------------------------
[--] Skipped version check for MySQLTuner script
[OK] Currently running supported MySQL version 5.5.29-0ubuntu0.12.04.2
[OK] Operating on 64-bit architecture

-------- Storage Engine Statistics -------------------------------------------
[--] Status: -Archive -BDB -Federated +InnoDB -ISAM -NDBCluster
[--] Data in MyISAM tables: 51K (Tables: 79)
[--] Data in InnoDB tables: 337M (Tables: 261)
[--] Data in PERFORMANCE_SCHEMA tables: 0B (Tables: 17)
[!!] Total fragmented tables: 262

-------- Performance Metrics -------------------------------------------------
[--] Up for: 1d 22h 1m 53s (19M q [120.221 qps], 3K conn, TX: 51B, RX: 2B)
[--] Reads / Writes: 82% / 18%
[--] Total buffers: 1.3G global + 2.7M per thread (500 max threads)
[OK] Maximum possible memory usage: 2.6G (8% of installed RAM)
[OK] Slow queries: 0% (232/19M)
[OK] Highest usage of available connections: 7% (38/500)
[OK] Key buffer size / total MyISAM indexes: 16.0M/389.0K
[!!] Key buffer hit rate: 94.4% (591 cached / 33 reads)
[OK] Query cache efficiency: 82.9% (12M cached / 15M selects)
[!!] Query cache prunes per day: 2219
[OK] Sorts requiring temporary tables: 0% (0 temp sorts / 232 sorts)
[OK] Temporary tables created on disk: 2% (1K on disk / 52K total)
[OK] Thread cache hit rate: 98% (69 created / 3K connections)
[!!] Table cache hit rate: 0% (721 open / 93K opened)
[OK] Open file limit used: 3% (233/6K)
[OK] Table locks acquired immediately: 100% (3M immediate / 3M locks)
[OK] InnoDB data size / buffer pool: 337.8M/1.0G

-------- Recommendations -----------------------------------------------------
General recommendations:
    Run OPTIMIZE TABLE to defragment tables for better performance
    Enable the slow query log to troubleshoot bad queries
    Increase table_cache gradually to avoid file descriptor limits
Variables to adjust:
    query_cache_size (> 128M)

    table_cache (> 2048)
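Two of the ratios in the report are easy to reproduce by hand, assuming mysqltuner derives them the obvious way (the helper names here are hypothetical): the thread cache hit rate is the share of connections that did not need a freshly created thread, and the table cache hit rate compares currently open tables with the total ever opened.

```java
public class TunerRatios {
    // Percentage of connections that reused a cached thread.
    static double threadCacheHitRate(long threadsCreated, long connections) {
        return 100.0 - 100.0 * threadsCreated / connections;
    }

    // Currently open tables as a percentage of tables ever opened;
    // a low value means the table cache is churning.
    static double tableCacheHitRate(long openTables, long openedTables) {
        return 100.0 * openTables / openedTables;
    }
}
```

Plugging in the numbers from the report above (69 threads created over roughly 3000 connections; 721 open out of about 93,000 opened) lands close to the 98% and 0% shown, which matches the recommendation to grow table_cache.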