Clustered PHP - DC PHP 2009

Why cluster?

        serve more users, serve them faster, increase reliability, get rich

Objectives

        linear capacity increase, linear cost increase, exponential reliability increase

Common topics in clustering PHP

Load Balancing

Database Scaling

Replicated Storage

Backups

Data Caches

Distributed Sessions

Staging Strategies

Debugging

Background Services

Load Balancing

Your load balancer may or may not...

        Remove bad nodes from the pool

        Balance by performance

        Balance by weight

        Route by geolocation

        Support sticky sessions

        Have 1 million other features
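
Two of the features above, removing bad nodes and balancing by weight, can be sketched in a few lines. This is a toy illustration in Python, not how any particular balancer works; the Pool class, the node names, and the weights are all hypothetical.

```python
import random


class Pool:
    """Toy backend pool: weighted choice among nodes that are still healthy."""

    def __init__(self, weights):
        # weights: hypothetical mapping of node name -> relative weight
        self.weights = dict(weights)
        self.healthy = set(self.weights)

    def mark_down(self, node):
        # Called when a health check fails; the node stops receiving traffic.
        self.healthy.discard(node)

    def mark_up(self, node):
        # Called when the node passes its health check again.
        if node in self.weights:
            self.healthy.add(node)

    def pick(self, rng=random):
        nodes = sorted(self.healthy)
        if not nodes:
            raise RuntimeError("no healthy nodes in the pool")
        weights = [self.weights[n] for n in nodes]
        return rng.choices(nodes, weights=weights, k=1)[0]


pool = Pool({"web1": 3, "web2": 1, "web3": 1})
pool.mark_down("web2")  # pretend web2 failed a health check
picks = [pool.pick() for _ in range(1000)]
```

With web2 marked down, every pick lands on web1 or web3, and web1 is chosen roughly three times as often as web3.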

Load Balancing Tools

a few among thousands:

        DNS Servers

        BIG-IP

        Perlbal

        nginx

        Varnish

Database Scaling

common things you can do

        Partitioning

        Replication

        Sharding

Database Partitioning

        Every user is assigned to a database server

        Users don't share data with each other (or across servers)

        When you need more capacity, add another database server.

        Works for some apps, doesn't work for others

implementation example: invoice and timesheet management app
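
A minimal sketch of the per-user assignment described above, written in Python for brevity. The UserDirectory class, the server names, and the "least-loaded server wins" rule are assumptions for illustration; the talk does not say how users are assigned.

```python
class UserDirectory:
    """Maps each user to exactly one database server (hypothetical helper)."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.assignment = {}  # user_id -> server name

    def add_server(self, server):
        # More capacity needed: add a server; existing users stay where they are.
        self.servers.append(server)

    def server_for(self, user_id):
        if user_id not in self.assignment:
            # Assumed policy: new users go to the server with the fewest users.
            counts = {s: 0 for s in self.servers}
            for s in self.assignment.values():
                counts[s] += 1
            self.assignment[user_id] = min(self.servers, key=lambda s: counts[s])
        return self.assignment[user_id]


directory = UserDirectory(["db1", "db2"])
alice_db = directory.server_for("alice")
bob_db = directory.server_for("bob")
directory.add_server("db3")
carol_db = directory.server_for("carol")
```

Because an assignment never changes once made, all of a user's queries go to a single server and no cross-server joins are needed, which is why this only works for apps whose users don't share data.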

Database Replication (MySQL)

        master-master

        master-slave

        master-many slaves

master-master:

        server1 replicates (as master) to server2 (acting as slave)

        server2 replicates (as master) to server1 (acting as slave)

                works well to a point

                complete nightmare when replication gets desynchronized

                doesn't actually improve write performance

                good for basic high availability

master-slave:

        server1 replicates (as master) to server2 (acting as slave)

                good first step

                makes you rewrite your application to send read queries to the slave

                doesn't increase write performance

                de-synchronization is relatively painless

                replication lag

master-many slaves:

        server1 replicates (as master) to many servers (acting as slaves)

                thundering read performance

                makes you rewrite your application to send read queries to the slaves

                doesn't increase write performance

                de-synchronization is relatively painless

                replication lag
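
The application change that master-slave setups require, plus one common way to live with replication lag, can be sketched as follows (in Python for brevity). The ReplicatedConnection class and the server names are hypothetical, and "pin to the master after the first write" is an assumed policy, not something the talk prescribes.

```python
import itertools


class ReplicatedConnection:
    """Routes writes to the master and reads round-robin to the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves) if slaves else None
        # After a write, a slave may not have the new row yet (replication lag),
        # so the rest of this request reads from the master.
        self._pin_to_master = False

    def route(self, sql):
        is_read = sql.lstrip().lower().startswith(("select", "show"))
        if not is_read:
            self._pin_to_master = True
            return self.master
        if self._pin_to_master or self._slaves is None:
            return self.master
        return next(self._slaves)


conn = ReplicatedConnection("master", ["slave1", "slave2"])
routes = [
    conn.route("SELECT * FROM invoices"),
    conn.route("select * from users"),
    conn.route("UPDATE users SET name = 'x' WHERE id = 1"),
    conn.route("SELECT name FROM users WHERE id = 1"),
]
```

The last read goes to the master because it follows a write in the same request; one such object would be created per request so that the pin does not outlive it.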

Database Sharding

        data is split between multiple database servers

        a logical index is kept of what data is where (for example, a mathematical index or a lookup chart)

        you have to grab, parse and correlate data across servers

        theoretically limitless scalability

        complicated

implementation example: Digg, Facebook, etc.
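
As a rough illustration of a "mathematical index", the sketch below (Python, for brevity) hashes each key to pick a shard and then groups keys per shard so a cross-server fetch needs one query per server. The shard names and key format are hypothetical, and hash-modulo is only one possible index.

```python
import zlib

SHARDS = ["shard0", "shard1", "shard2", "shard3"]  # hypothetical server names


def shard_for(key, shards=SHARDS):
    # Mathematical index: a stable hash of the key, modulo the shard count.
    # crc32 is used because Python's built-in hash() differs between runs.
    return shards[zlib.crc32(key.encode("utf-8")) % len(shards)]


def group_by_shard(keys, shards=SHARDS):
    # Plan a cross-shard fetch: one batch of keys per server.
    plan = {}
    for key in keys:
        plan.setdefault(shard_for(key, shards), []).append(key)
    return plan


keys = ["user:1", "user:2", "user:3", "user:42"]
plan = group_by_shard(keys)
```

The results fetched for each batch still have to be merged in the application, which is the "grab, parse and correlate" cost. Note that with hash-modulo, changing the number of shards moves most keys to a different server; a lookup chart avoids that at the price of maintaining the chart.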

Replicated Storage

common things you can do:

        replicated file system

        lookup tables

        storage services

        huge NAS arrays

Replicated file system

        very affordable

        various replication modes

        nothing to keep track of in your app

        easy to implement

        can cause massive failures if poorly configured

Lookup tables

        very affordable

        limitless modes; entirely up to you

        entirely dependent on your application logic

        can cause massive failures if poorly configured

Storage services

        very expensive

        theoretically limitless capacity

        easy to use

        data must be pulled back first if used locally

        costs and bandwidth usage can be mitigated (for example, by putting a proxy in front of it)

huge NAS arrays

        insanely expensive

        insanely expensive

        insanely expensive

        bullet-proof fault tolerance... at a price

        easy to use... for a price

Backups

common methods:

        all-RAID (doesn't work)

        snapshots

        copying from slaves

all-RAID doesn't work

        why?

        RAID won't stop your application from deleting data: a delete is faithfully mirrored to every disk

Snapshots

use a mechanism to make a snapshot of the partition, e.g. LVM volumes

        works really well

        easy if you do it from the beginning

        requires some planning

        should be used with RAID drives

copying from slaves

        take a slave out of rotation and copy from it, e.g. MySQL databases

                works really well

                easy if you do it from the beginning

        requires some planning

        backups can be out of date

Data Caches

PHP doesn't have cross-request persistence, so someone added it: memcached

        in-memory

        fast

        scalable

        proven

        use it

Got configuration data? Small, high-TTL data sets? Use APC.

Large, high-TTL data sets? Use files.

Mind the race condition.
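
One reading of "mind the race condition" for file caches is that a reader must never see a half-written file. A minimal sketch, assuming that interpretation and written in Python for brevity: write to a temporary file in the same directory, then rename it over the target, which replaces the file in one atomic step. The function names and the JSON layout are hypothetical.

```python
import json
import os
import tempfile
import time


def cache_put(path, value, ttl):
    """Write a cache entry so that readers see the old file or the new one."""
    payload = {"expires": time.time() + ttl, "value": value}
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(payload, fh)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise


def cache_get(path):
    """Return the cached value, or None if it is missing, corrupt, or expired."""
    try:
        with open(path) as fh:
            payload = json.load(fh)
    except (FileNotFoundError, ValueError):
        return None
    if payload["expires"] < time.time():
        return None
    return payload["value"]
```

This does not address the other common race, where many requests regenerate an expired entry at the same moment; that needs a lock or a staggered expiry on top.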

Replicated Sessions

pick your poison:

        memcached with redundancy

        database

        shared file system(don't actually do this)
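
The "memcached with redundancy" option can be approximated by writing every session to more than one cache node and reading from the first node that still has it. The sketch below is in Python and uses plain dicts as stand-ins for cache clients; the RedundantSessionStore class is hypothetical and is not the API of any memcached library.

```python
class RedundantSessionStore:
    """Writes each session to every backend; reads from the first that has it."""

    def __init__(self, backends):
        # backends: dict-like stand-ins for cache clients (an assumption)
        self.backends = list(backends)

    def save(self, session_id, data):
        written = 0
        for backend in self.backends:
            try:
                backend[session_id] = dict(data)
                written += 1
            except ConnectionError:
                continue  # one node being down must not lose the session
        if written == 0:
            raise RuntimeError("session could not be stored on any backend")

    def load(self, session_id):
        for backend in self.backends:
            try:
                return dict(backend[session_id])
            except (KeyError, ConnectionError):
                continue  # evicted, restarted, or unreachable: try the next node
        return None


node_a, node_b = {}, {}
store = RedundantSessionStore([node_a, node_b])
store.save("sess-1", {"user_id": 7})
node_a.clear()  # simulate one cache node restarting and losing its memory
session = store.load("sess-1")
```

The session survives the loss of node_a because node_b still holds a copy.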

Staging Strategies

if you value your free time:

 

 

 

Staging Strategies (dev)

        do use source control systems (Subversion, etc.)

        do profile your code to look for obvious performance issues

        do use phpdoc tags

        do make your dev environment as similar to live as practical (e.g., don't develop on Windows and run live on UNIX)

        do document all your changes

        do use TDD(test-driven development)

Staging Strategies (test)

        do make test functionally identical to live, except for data

        do create data fixtures that are representative of real-life data

        do create functional tests for the user interface(Selenium)

        do not push anything to stage that did not pass unit tests

Staging Strategies (stage)

        do make stage identical to a live node

        do connect to the live database

        do have test 'users' to perform destructive operations against

        do have a mechanism to automate pushing stage to live

Staging Strategies (live)

        do not ever make changes by hand on live

        do automate pushing updates

        do take nodes out of rotation when you push updates

        do not allow ssh access to live except when really needed
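
An automated push that takes each node out of rotation first might look roughly like this (Python, for brevity). The rolling_push function, the deploy and health_check callbacks, and the node names are all hypothetical; a real setup would call the load balancer and the deployment tool instead.

```python
def rolling_push(nodes, in_rotation, deploy, health_check):
    """Update one node at a time so the rest keep serving traffic."""
    for node in nodes:
        in_rotation.discard(node)  # stop sending traffic to this node
        deploy(node)               # push the new code while it is idle
        if not health_check(node):
            # Leave the broken node out of rotation and stop the rollout.
            raise RuntimeError("node failed health check after deploy: " + node)
        in_rotation.add(node)      # back into the pool before touching the next


nodes = ["web1", "web2", "web3"]
rotation = set(nodes)
live_during_deploy = []


def fake_deploy(node):
    # Records whether the node was still receiving traffic while being updated.
    live_during_deploy.append(node in rotation)


rolling_push(nodes, rotation, fake_deploy, lambda node: True)
```

Stopping at the first failed health check keeps a bad build from reaching every node.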

Debugging

        do use xdebug on dev, test, and stage

        do prepare an automated action that can turn xdebug and profiling on/off on one of the live nodes. You can and will run into errors that only exist on live.

        do write a test case to replicate the bug and then fix the bug, whenever possible

        do first check whether bugs are explained by platform differences between development and production systems (e.g., don't develop on Windows and deploy on UNIX)

        do go to my talk at ZendCon in October, "It Works on Dev"

Background Services

        do avoid launching background processes from the web app

        PHP doesn't have a native message queue, so (many) people wrote some, for example gearmand. Do use a message queue.

        do check for memory leaks in background tasks! Many PHP libraries, and also many PHP versions themselves, still leak memory. Try to write the loop for a background task in bash rather than in PHP. Recycle the process often.

        do plan your message format carefully

        do persist important messages
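
The last three points (a planned message format, persisted messages, and a worker that is recycled often) can be sketched together. This Python example uses a SQLite table as a stand-in for a durable queue; the table layout, the versioned JSON envelope, and the max_jobs limit are assumptions for illustration and are not how gearmand works.

```python
import json
import sqlite3


def open_queue(path=":memory:"):
    # Pass a file path instead of ":memory:" so messages survive a restart.
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "id INTEGER PRIMARY KEY, body TEXT NOT NULL, done INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def enqueue(db, job_type, payload):
    # Versioned envelope: "v" lets old and new workers coexist during a deploy.
    body = json.dumps({"v": 1, "type": job_type, "payload": payload})
    with db:
        db.execute("INSERT INTO jobs (body) VALUES (?)", (body,))


def run_worker(db, handlers, max_jobs=100):
    """Handle at most max_jobs messages, then return so the process can exit."""
    handled = 0
    while handled < max_jobs:
        row = db.execute(
            "SELECT id, body FROM jobs WHERE done = 0 ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            break
        job_id, body = row
        message = json.loads(body)
        handlers[message["type"]](message["payload"])
        with db:
            # Mark done only after the handler succeeded, so a crash retries it.
            db.execute("UPDATE jobs SET done = 1 WHERE id = ?", (job_id,))
        handled += 1
    return handled
```

A supervising shell loop would start the worker again each time it returns, which gives every batch a fresh process and bounds the effect of any memory leak. The sketch assumes a single worker; several workers would need a way to claim a job before handling it.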

 
