Why cluster?
serve more users, serve them faster, increase reliability, get rich
Objectives
linear capacity increase, linear cost increase, exponential reliability increase.
Common topics in clustering PHP
Load Balancing
Database Scaling
Replicated Storage
Backups
Data Caches
Distributed Sessions
Staging Strategies
Debugging
Background Services
Load Balancing
Your load balancer may or may not...
Remove bad nodes from the pool
Balance by performance
Balance by weight
Route by geolocation
Support sticky sessions
Have 1 million other features
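Two of the features above, removing bad nodes and balancing by weight, can be sketched in a few lines. This is an illustration only, written in Python for brevity; the `Pool` class, the node names, and the weights are hypothetical and do not describe how any of the tools in this talk work internally.

```python
import random


class Pool:
    """Toy weighted pool. Node names and weights are hypothetical."""

    def __init__(self, weights):
        self.weights = dict(weights)      # node name -> relative weight
        self.healthy = set(self.weights)  # nodes currently in rotation

    def mark_down(self, node):
        # Remove a bad node from the pool.
        self.healthy.discard(node)

    def mark_up(self, node):
        # Put a known node back into rotation.
        if node in self.weights:
            self.healthy.add(node)

    def pick(self, rng=random):
        # Weighted random choice among the healthy nodes only.
        nodes = sorted(self.healthy)
        if not nodes:
            raise RuntimeError("no healthy nodes in the pool")
        weights = [self.weights[n] for n in nodes]
        return rng.choices(nodes, weights=weights, k=1)[0]
```

A health checker would call `mark_down` and `mark_up`; with weights of 3 and 1, the first node receives roughly three requests out of four while both are healthy.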
Load Balancing Tools
a few among thousands
DNS Servers
Big IP
Perlbal
nginx
Varnish
Database Scaling
common things you can do
Partitioning
Replication
Sharding
Database Partitioning
Every user is assigned to a database server
Users don't share data with each other (or across servers)
When you need more capacity, add another database server.
Works for some apps, doesn't work for others
implementation example: invoice and timesheet management app
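As a rough sketch of the idea above, each user can be pinned to one database server the first time they are seen, with new users going to the least-loaded server. The `UserDirectory` class and the server names are assumptions made for illustration; a real application would keep the assignment table in durable storage rather than in memory.

```python
class UserDirectory:
    """Pins each user to one database server (hypothetical helper)."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.assignment = {}  # user_id -> server name

    def add_server(self, server):
        # More capacity: existing users stay put, new users fill the new server.
        self.servers.append(server)

    def server_for(self, user_id):
        if user_id not in self.assignment:
            counts = {s: 0 for s in self.servers}
            for s in self.assignment.values():
                counts[s] += 1
            # Assign the new user to the server with the fewest users.
            self.assignment[user_id] = min(self.servers, key=lambda s: counts[s])
        return self.assignment[user_id]
```

Because users never share data, no query has to touch more than one server, which is why this works for something like a per-account invoicing app and not for apps with heavy cross-user queries.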
Database Replication (MySQL)
master-master
master-slave
master-many slave
master-master:
server1 replicates (as master) to server2 (acting as slave)
server2 replicates (as master) to server1 (acting as slave)
works well to a point
complete nightmare when replication gets desynchronized
doesn't actually improve write performance
good for basic high availability
master-slave:
server1 replicates (as master) to server2 (acting as slave)
good first step
makes you rewrite your application to consider slave queries
doesn't increase write performance
de-synchronization is relatively painless
replication lag
master-many slave:
server1 replicates (as master) to many servers (acting as slaves)
thundering read performance
makes you rewrite your application to consider slave queries
doesn't increase write performance
de-synchronization is relatively painless
replication lag
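"Rewriting your application to consider slave queries" usually means routing writes to the master and reads to a slave, with an escape hatch for reads that cannot tolerate replication lag. The sketch below shows only the routing decision; `ReplicatedRouter`, the server names, and the `needs_fresh_data` flag are hypothetical, and the verb check is deliberately naive.

```python
import itertools


class ReplicatedRouter:
    """Chooses a server for each query (illustrative sketch only)."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "REPLACE")

    def __init__(self, master, slaves):
        self.master = master
        # Round-robin over the slaves; None when there are no slaves.
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, sql, needs_fresh_data=False):
        is_write = sql.lstrip().upper().startswith(self.WRITE_VERBS)
        # Writes, and reads that must not see stale data, go to the master.
        if is_write or needs_fresh_data or self._slaves is None:
            return self.master
        return next(self._slaves)
```

A typical use of `needs_fresh_data=True` is reading back a row right after writing it, when the slave may not have caught up yet.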
Database Sharding
data is split between multiple database servers
a logical index is kept of what data is where (for example, a mathematical index or a lookup chart)
you have to grab, parse and correlate data across servers
theoretically limitless scalability
complicated
implementation examples: Digg, Facebook, etc.
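A "mathematical index" can be as simple as a hash of the key modulo the number of shards, and correlating data across servers then means grouping keys by shard, querying each shard once, and merging the results. The following is a minimal sketch under those assumptions; the shard names and the `fetch` callback are hypothetical stand-ins for real database calls.

```python
import zlib

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical server names


def shard_for(key, shards=SHARDS):
    # Mathematical index: a stable hash of the key, modulo the shard count.
    return shards[zlib.crc32(key.encode("utf-8")) % len(shards)]


def gather(keys, fetch, shards=SHARDS):
    """Fetch many keys, querying each shard once and merging the results.

    fetch(shard, keys) is assumed to return a dict of key -> row.
    """
    by_shard = {}
    for key in keys:
        by_shard.setdefault(shard_for(key, shards), []).append(key)
    results = {}
    for shard, shard_keys in by_shard.items():
        results.update(fetch(shard, shard_keys))
    return results
```

One known cost of the modulo approach is that changing the number of shards moves most keys, which is part of why a lookup chart is sometimes preferred.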
Replicated Storage
common things you can do:
replicated file system
lookup tables
storage services
huge NAS arrays
Replicated file system
very affordable
various replication modes
nothing to keep track of in your app
easy to implement
can cause massive failures if poorly configured
Lookup tables
very affordable
limitless modes; entirely up to you
entirely dependent on your application logic
can cause massive failures if poorly configured
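A lookup table for replicated storage can be sketched as a map from each file path to the nodes that hold a copy. Everything here is illustrative: the `FileLocator` class, the node names, and the round-robin placement are assumptions, and the sketch tracks locations only, without copying any bytes.

```python
class FileLocator:
    """Tracks which storage nodes hold each file (hypothetical helper)."""

    def __init__(self, nodes, copies=2):
        if copies > len(nodes):
            raise ValueError("not enough nodes for the requested copies")
        self.nodes = list(nodes)
        self.copies = copies
        self.table = {}   # path -> list of node names
        self._next = 0    # round-robin starting position

    def place(self, path):
        # Pick `copies` consecutive nodes, starting at a rotating offset.
        chosen = [self.nodes[(self._next + i) % len(self.nodes)]
                  for i in range(self.copies)]
        self._next = (self._next + 1) % len(self.nodes)
        self.table[path] = chosen
        return chosen

    def locate(self, path, down=()):
        # Return the nodes that hold the file and are not known to be down.
        return [n for n in self.table.get(path, []) if n not in down]
```

Since the table is the only record of where files live, losing or corrupting it is exactly the kind of massive failure mentioned above, so it needs its own backup.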
Storage services
very expensive
theoretically limitless capacity
easy to use
data must be pulled back first if used locally
costs and bandwidth usage can be mitigated (for example, by putting a proxy in front of it)
huge NAS arrays
insanely expensive
insanely expensive
insanely expensive
bullet-proof fault tolerance... at a price
easy to use... for a price
Backups
common methods:
all-RAID (doesn't work)
snapshots
copying from slaves
all-RAID doesn't work
why?
RAID won't keep your application from deleting data: a delete is faithfully written to every disk
Snapshots
use a mechanism to make a snapshot of the partition, e.g. LVM snapshots
works really well
easy if you do it from the beginning
requires some planning
should be used with RAID drives
copying from slaves
take a slave out of rotation and copy from it, e.g. MySQL databases
works really well
easy if you do it from the beginning
requires some planning
backups can be out of date
Data Caches
PHP doesn't have cross-request persistence, so someone added it: memcached
in-memory
fast
scalable
proven
use it
Got configuration data? Small, high-TTL data sets? Use APC.
Large, high-TTL data sets? Use files.
Mind the race condition.
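One common race with file caches is a reader opening a file while a writer is halfway through it. A usual way around this is to write to a temporary file in the same directory and rename it over the target. The sketch below assumes a JSON payload with an expiry timestamp; the function names and the file layout are hypothetical.

```python
import json
import os
import tempfile
import time


def write_cache(path, value, ttl):
    """Write a cache entry so readers never see a half-written file."""
    payload = {"expires": time.time() + ttl, "value": value}
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory so the rename stays
    # on a single filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(payload, fh)
        os.replace(tmp, path)  # swap the finished file into place
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def read_cache(path):
    """Return the cached value, or None if it is missing, corrupt or expired."""
    try:
        with open(path) as fh:
            payload = json.load(fh)
    except (FileNotFoundError, ValueError):
        return None
    if payload["expires"] < time.time():
        return None
    return payload["value"]
```

This does not stop several processes from regenerating the same expired entry at once; it only guarantees that whichever one finishes leaves a complete file behind.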
Replicated Sessions
pick your poison:
memcached with redundancy
database
shared file system (don't actually do this)
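The "with redundancy" part of the first option can be sketched as writing every session to more than one store and reading from the first store that still has it. Plain dictionaries stand in for cache servers here; `RedundantSessions` is a hypothetical name and this is not how any particular memcached client implements redundancy.

```python
class RedundantSessions:
    """Writes each session to every store, reads from the first that has it."""

    def __init__(self, stores):
        self.stores = stores  # dict-like objects standing in for cache servers

    def save(self, session_id, data):
        for store in self.stores:
            store[session_id] = dict(data)  # each store gets its own copy

    def load(self, session_id):
        for store in self.stores:
            if session_id in store:
                return store[session_id]
        return None  # lost everywhere: the user has to log in again
```

The trade-off is double the writes in exchange for sessions that survive the loss of a single cache node.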
Staging Strategies
if you value your free time:
Staging Strategies (dev)
do use source control systems (Subversion, etc.)
do profile your code to look for obvious performance issues
do use PHPDoc tags
do make your dev environment as similar to live as practical (e.g., don't develop on Windows and run live on UNIX)
do document all your changes
do use TDD (test-driven development)
Staging Strategies (test)
do make test functionally identical to live, except for data
do create data fixtures that are representative of real-life data
do create functional tests for the user interface(Selenium)
do not push anything to stage that did not pass unit tests
Staging Strategies (stage)
do make stage identical to a live node
do connect to the live database
do have test 'users' to perform destructive operations against
do have a mechanism to automate pushing stage to live
Staging Strategies (live)
do not ever make changes by hand on live
do automate pushing updates
do take nodes out of rotation when you push updates
do not allow ssh access to live except when really needed
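Automating the push while taking nodes out of rotation amounts to a rolling update: one node at a time leaves the pool, gets the new code, is checked, and rejoins. The sketch below shows only that control flow; `deploy` and `healthy` are hypothetical callbacks standing in for whatever your deployment tooling and health checks actually do.

```python
def rolling_update(nodes, deploy, healthy):
    """Update nodes one at a time, keeping the rest in rotation.

    Returns (nodes_in_rotation, failed_node). failed_node is None on success.
    """
    in_rotation = list(nodes)
    for node in nodes:
        in_rotation.remove(node)   # take the node out of rotation
        deploy(node)               # push the update to it
        if not healthy(node):
            # Stop here: the broken node stays out, the others keep serving.
            return in_rotation, node
        in_rotation.append(node)   # put it back before touching the next one
    return in_rotation, None
```

Stopping at the first unhealthy node means a bad release takes out at most one node instead of the whole cluster.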
Debugging
do use Xdebug on dev, test, and stage
do prepare an automated action that can turn Xdebug and profiling on or off on one of the live nodes. You can and will run into errors that only exist on live.
do write a test case to replicate the bug and then fix the bug, whenever possible
do first check whether bugs are explainable by platform differences between development and production systems (e.g., don't develop on Windows and deploy on UNIX)
do go to my talk at ZendCon in October, "It Works on Dev"
Background Services
do avoid launching background processes from the web app
PHP doesn't have a native message queue, so (many) people wrote some, for example gearmand. do use a message queue.
do check for memory leaks in background tasks! Many PHP libraries, and many PHP versions themselves, still leak memory. Try to write the loop for a background task in bash rather than in PHP. Recycle the process often.
do plan your message format carefully
do persist important messages
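Three of the points above (a planned message format, persisted messages, and frequent process recycling) fit into a small sketch. This is not gearmand and does not use its API: a SQLite table stands in for the queue, the versioned JSON envelope is an assumed format, and all function names are hypothetical. It also assumes a single worker, so there is no job locking.

```python
import json
import sqlite3


def open_queue(path):
    """Open (or create) a persistent job table at the given path."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "id INTEGER PRIMARY KEY, body TEXT NOT NULL, "
        "done INTEGER NOT NULL DEFAULT 0)"
    )
    db.commit()
    return db


def enqueue(db, job_type, payload):
    # Versioned envelope, so the format can change without breaking workers.
    body = json.dumps({"v": 1, "type": job_type, "payload": payload})
    db.execute("INSERT INTO jobs (body) VALUES (?)", (body,))
    db.commit()


def run_worker(db, handler, max_jobs=100):
    """Handle up to max_jobs jobs, then return so the process can exit."""
    handled = 0
    while handled < max_jobs:
        row = db.execute(
            "SELECT id, body FROM jobs WHERE done = 0 ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            break
        handler(json.loads(row[1]))
        # Mark the job done only after the handler succeeded.
        db.execute("UPDATE jobs SET done = 1 WHERE id = ?", (row[0],))
        db.commit()
        handled += 1
    return handled
```

The web app only calls `enqueue`; an outer supervisor (the bash loop suggested above would do) restarts the worker each time `run_worker` returns, which keeps any leaked memory from accumulating.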