Windows Azure Storage
Basic Knowledge
-Availability
-Durability
-Scalability
-PartitionKey
-Blob: container name + blob name
-Message: queue name
-Entity: table name + partition key
-Throughput
-Single queue or single table partition
-Up to 500 transactions per second
-Single blob partition
-Reads/writes up to 60 MB/s
-Single storage account
-Up to 5000 transactions per second
-Up to 3 gigabits of reads/writes per second
REST API and client library supported
-The client library needs to create a client based on credentials + REST URI
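The partition key rules listed above can be sketched as a small helper. This is only an illustration: the function name, the keyword arguments, and the "/" separator are assumptions made for the example, not part of the storage service.

```python
def partition_key(kind: str, **names: str) -> str:
    """Build an illustrative partition key for a storage object.

    Follows the notes: blobs are partitioned by container + blob name,
    messages by queue name, entities by table name + partition key.
    The "/" separator is an assumption for readability only.
    """
    if kind == "blob":
        return f"{names['container']}/{names['blob']}"
    if kind == "message":
        return names["queue"]
    if kind == "entity":
        return f"{names['table']}/{names['partition_key']}"
    raise ValueError(f"unknown object kind: {kind}")
```

Two blobs in the same container get different keys, while all messages in one queue share a single key, which is why a single queue has its own throughput ceiling.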
Blob Storage
-$root container
-Operations
-GetBlob (get the whole blob or a specific range)
-PutBlob
-DeleteBlob
-CopyBlob
-New copy
-Copy + Delete: rename a blob
-SnapshotBlob
-Read-only version
-Restore (promote a snapshot to the new version of the blob)
-List snapshots
-LeaseBlob (exclusive update)
-Acquire, Renew, Release, Break
- Use case: master election process
-Metadata (can be read and set separately from the blob)
-Sharing scenarios
-Container ACLs (Access Control Lists)
**Granting delete permission on a container does not allow deleting the container itself, only the blobs in it
-Shared Access Signatures
-Signed identifier
-Short URI
-Supports dynamically changing start/end time and permissions
-BlobRequestOptions : http://msdn.microsoft.com/en-us/library/microsoft.windowsazure.storageclient.blobrequestoptions_properties.aspx
-Retry
-Timeout
-Custom domain name
-AccessCondition
-Block Blob (streaming workload)
-Two-phase commit
-Benefits: retry and efficient continuation
-PutBlockList(u=blockId1,c=blockId2,blockId3..)
-Committed list
-GetBlockList
-Can get the committed list
-MD5 check when you download the content
-Can get the uncommitted list
-Figure out which part of the upload failed
-Uncommitted list
-PutBlock(BlockId1)
-Page Blob (random-access workload)
-One-phase commit
-PutPage[512,2048)
-Writes must be 512-byte aligned
-PutPage
-ClearPage
-GetPageRanges
-Get the valid page ranges in the blob
-GetBlob[1000,2048)
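The 512-byte alignment rule for page writes can be checked with a few lines of arithmetic. The sketch below treats ranges as half-open [start, end) like the examples above; the function names are made up for illustration and are not client library calls.

```python
from typing import Tuple

PAGE_SIZE = 512  # page blob writes operate on 512-byte pages


def is_page_aligned(start: int, end: int) -> bool:
    """True if the half-open byte range [start, end) starts and ends on a page boundary."""
    return 0 <= start < end and start % PAGE_SIZE == 0 and end % PAGE_SIZE == 0


def align_to_pages(start: int, end: int) -> Tuple[int, int]:
    """Widen [start, end) to the smallest enclosing page-aligned range."""
    if not 0 <= start < end:
        raise ValueError("expected 0 <= start < end")
    aligned_start = (start // PAGE_SIZE) * PAGE_SIZE   # round down
    aligned_end = -(-end // PAGE_SIZE) * PAGE_SIZE     # round up
    return aligned_start, aligned_end
```

A range such as [512,2048) is valid for a write, while [1000,2048) is only valid for a read and would have to be widened to [512,2048) before writing.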
BlobTips
-High throughput
-Default connection limit
-Upload/download multiple files in parallel
-ParallelOperationThreadCount
-Single blob upload > 32 MB
-BlobRequestOptions
-Timeout
-If you program against the REST protocol directly, use retry with exponential backoff for timeouts or server-busy errors
-CDN
-Block Blob
-Stream + commit-based write
-Page Blob
-Random write/read
-Set the timeout value on BlobClient or BlobRequestOptions
-Client library uses a default of 90 sec
-Use Shared Access Signatures
-Container access level: allows revoking permissions
-Provide appropriate permissions
-Use HTTPS, since these are pre-authenticated URLs
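The retry-with-exponential-backoff tip can be sketched as follows. This is a minimal illustration, not the client library's retry policy: TransientError, the delay values, and the attempt count are all assumptions, and real code would map timeouts and server-busy responses onto the exception it retries on.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a timeout or server-busy response."""


def backoff_delays(max_attempts, base=0.5, cap=30.0):
    """Delays between attempts: base, 2*base, 4*base, ... limited to cap seconds."""
    return [min(cap, base * 2 ** i) for i in range(max_attempts - 1)]


def call_with_retry(operation, max_attempts=4, base=0.5, cap=30.0, sleep=time.sleep):
    """Call operation(), retrying on TransientError with exponential backoff."""
    delays = backoff_delays(max_attempts, base, cap)
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            sleep(delays[attempt])
```

Passing the sleep function as a parameter keeps the helper easy to test; adding random jitter to each delay is a common refinement that is left out here.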
Drive
-NTFS API
-Page blob
-Use Disk Management
-Create VHD (*.vhd)
-Upload to blob
-InitializeCache
-Create Cloud Drive based on the blob
-Mount Drive
-Essentially acquires a lease on the page blob
-Unmount
-Essentially releases the lease on the page blob
-Snapshot Drive
-Supports read-only mounting by multiple VMs
-Mounted by one VM at a time for read/write
-A VM can dynamically mount up to 16 drives
Table
-WCF (ADO.NET) Data Services
-PartitionKey
-Entity locality
-Entity Group Transactions
-Table scalability
-Table
-Entity
-Insert
-Update
-Merge
-Replace
-Delete
-Query
-Entity Group Transactions
-Operations
-LinqQuery.AsTableServiceQuery<Movie>()
-ContinuationToken (up to 1000 entities per response)
-SaveChangesWithRetries()
-SaveChangesOptions
-Batch
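The continuation-token behaviour noted above (at most 1000 entities per response) amounts to a paging loop, which AsTableServiceQuery performs for you. A minimal in-memory sketch follows; query_page is a hypothetical stand-in for one round trip to the table service, and using a list offset as the token is an assumption made for the example.

```python
from typing import Iterator, List, Optional, Tuple

PAGE_LIMIT = 1000  # a single response carries at most 1000 entities


def query_page(rows: List[int], token: Optional[int]) -> Tuple[List[int], Optional[int]]:
    """Hypothetical single request: return one page plus a continuation token (or None)."""
    start = token or 0
    end = start + PAGE_LIMIT
    next_token = end if end < len(rows) else None
    return rows[start:end], next_token


def query_all(rows: List[int]) -> Iterator[int]:
    """Keep requesting pages until the service stops returning a continuation token."""
    token = None
    while True:
        page, token = query_page(rows, token)
        yield from page
        if token is None:
            break
```

The important point is that an empty or short page does not mean the query is finished; only a missing continuation token does.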
TableTips
-Default .NET HTTP connection limit is set to 2
-Retry logic needs to be implemented in code
-SaveChangesWithRetries
-AsTableServiceQuery (continuation token)
-**Handle conflicts caused by retries
-With retries, a previous operation may have succeeded on the server while a network error kept the response from reaching the client
-Avoid "append only" on the partition key
-Good to have inserts spread across the table
-Selecting a partition key
-Consider scalability, query efficiency and speed, and Entity Group Transactions, as below
-Scalability
-Partition keys allow load balancing across servers
-Good to have a partition key that balances load
-Avoid a single partition key; reads are not scalable
-Good to have a partition key that distributes load in case of throttling
-Avoid append-only and prepend-only patterns
-Only one server is busy at a time; writes are not scalable
-Query efficiency and speed
-Avoid frequent scans
-Parallel queries
-Single entity
-Good to specify both partition key and row key
-Table scan query
-Avoid continuation tokens
-Where Rating > 5
-Use range queries and parallelism
-Where PartitionKey >= 'A' and PartitionKey < 'D' and Rating > 5
-Where PartitionKey >= 'D' and Rating > 5
-Avoid using "OR"
-Expect continuation tokens for all queries except single-entity queries
-If count > 1000
-If execution time > 5 s
-If at the end of a partition range boundary
-Large scan
-Split into ranges and run in parallel
-Use another table
-"OR"
-Run individual queries in parallel
-User Interaction
-Cache
-Entity Group Transactions
-Reduce round trips
-<= 100 commands and payload < 4 MB
-Account ID as partition key
-Instead of a user table and a rental table
-WCF Data Services
-Use a new context for each logical operation
-Because the context tracks entities; if you are going to update 1 million entities, then....
-AddObject/AttachTo can throw an exception if the entity already exists
-Point query throws an exception if the resource does not exist; use IgnoreResourceNotFoundException
-Point queries use the table's clustered index.
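The Entity Group Transaction limits above (one partition per batch, at most 100 commands) suggest grouping pending changes before saving them. A rough sketch follows, assuming entities are plain dictionaries with a "PartitionKey" field; the 4 MB payload limit is not checked here and would need a size estimate per entity.

```python
from collections import defaultdict

MAX_BATCH = 100  # an entity group transaction holds at most 100 commands


def group_into_batches(entities):
    """Split entities into batches that each touch one partition and hold <= 100 items."""
    by_partition = defaultdict(list)
    for entity in entities:
        by_partition[entity["PartitionKey"]].append(entity)

    batches = []
    for items in by_partition.values():  # dicts keep insertion order
        for i in range(0, len(items), MAX_BATCH):
            batches.append(items[i:i + MAX_BATCH])
    return batches
```

With the account ID as the partition key, all rows for one account land in the same batches, which is what makes a single transaction across "user" and "rental" data possible.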
Queue
-Loosely coupled workflow with queues
-Guaranteed delivery/processing of messages: two-step process
-Dequeue message; it becomes invisible
-Delete message; or on a crash it becomes visible again
-FetchAttributes
-Get message count and decide whether to increase/reduce workers
-Make message processing idempotent
-Do not rely on order
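The two-step dequeue/delete process can be modelled with a small in-memory queue. This is a toy model under stated assumptions: the class and method names are invented, the clock is passed in explicitly to keep the example deterministic, and the real service additionally hands out a pop receipt that must accompany the delete.

```python
class InMemoryQueue:
    """Toy queue: a dequeued message is hidden for a while, not removed."""

    def __init__(self):
        self._messages = {}  # message id -> {"body", "visible_at", "dequeue_count"}
        self._next_id = 0

    def put(self, body):
        msg_id = self._next_id
        self._next_id += 1
        self._messages[msg_id] = {"body": body, "visible_at": 0.0, "dequeue_count": 0}
        return msg_id

    def get(self, now, visibility_timeout=30.0):
        """Step 1: return a visible message and hide it until now + visibility_timeout."""
        for msg_id, msg in self._messages.items():
            if msg["visible_at"] <= now:
                msg["visible_at"] = now + visibility_timeout
                msg["dequeue_count"] += 1
                return msg_id, msg["body"]
        return None

    def delete(self, msg_id):
        """Step 2: remove the message once processing has succeeded."""
        self._messages.pop(msg_id, None)
```

If a worker crashes between the two steps, the message simply reappears after the timeout, so another worker may process it again. That is why processing should be idempotent.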
QueueTips
-Messages can be up to 64 KB
-A message may be processed more than once
-Messages can be processed in any order
-For higher throughput
-Batch multiple work items into a single message
-Use multiple queues
-Use DequeueCount to remove poison messages
-Monitor message count to dynamically increase/reduce worker roles
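The DequeueCount tip can be sketched as a guard at the top of the worker loop. The threshold of 3, the dictionary shape of a message, and the dead-letter list are assumptions made for this example; the notes only say that DequeueCount should be used to remove poison messages.

```python
MAX_DEQUEUE_COUNT = 3  # assumed threshold; tune per workload


def handle(message, process, dead_letters):
    """Process one dequeued message and report what the caller should do next.

    Returns "poison" (park it and delete it from the queue), "done" (delete it),
    or "retry" (leave it; it becomes visible again after the timeout).
    """
    if message["dequeue_count"] > MAX_DEQUEUE_COUNT:
        dead_letters.append(message)  # keep a copy for later inspection
        return "poison"
    try:
        process(message["body"])
    except Exception:
        return "retry"
    return "done"
```

Without such a guard, a message that always crashes its worker would be redelivered indefinitely and keep consuming worker time.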
Others
Loosely Coupled Workers with Queues
-Case study
-Continuation for long-running work items
-Record progress
-Scale queue throughput
-Batch work items into a blob and store a reference to the blob in the queue
-Or use multiple queues
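Continuation for long-running work items comes down to recording progress after each step so that a restarted worker resumes instead of starting over. The sketch below keeps progress in a dictionary; in practice it would live somewhere durable such as a table entity, blob metadata, or the message content. All names are hypothetical.

```python
def process_with_checkpoints(steps, progress, work_id, run_step):
    """Run the remaining steps of a work item, recording progress after each one.

    progress maps work_id -> number of steps already completed.
    """
    start = progress.get(work_id, 0)
    for index in range(start, len(steps)):
        run_step(steps[index])
        progress[work_id] = index + 1  # checkpoint only after the step succeeded
```

A step that was running when the worker crashed is executed again on resume, so individual steps still need to be idempotent.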
Lifecycle management (upgrade and versioning)
In-place Rolling Upgrade
-Remember that the old version runs side by side with the new version
-Protocol change with rolling upgrade
-Two-step process
-Version 1.5
-Version 2
-Windows Azure Table schema change
-Types of change
-Adding non-key properties
-Removing non-key properties
-Changing partition key or row key
-Two-step process
-V1 client: IgnoreMissingProperties
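The reason a V1 client needs IgnoreMissingProperties during a rolling upgrade is that it will read entities written by V2 code that carry properties it does not know. The sketch below imitates that behaviour with dictionaries; the field names and the function are invented for illustration and do not reproduce the data service context itself.

```python
V1_FIELDS = ("PartitionKey", "RowKey", "Title")  # properties the V1 entity class knows


def read_as_v1(entity, ignore_missing_properties=True):
    """Materialize a stored entity as a V1 object.

    With ignore_missing_properties=False, an unknown property is an error,
    which is what breaks old clients once new code starts writing new columns.
    """
    unknown = sorted(set(entity) - set(V1_FIELDS))
    if unknown and not ignore_missing_properties:
        raise ValueError(f"unknown properties: {unknown}")
    return {name: entity[name] for name in V1_FIELDS if name in entity}
```

Step one of the upgrade deploys clients that tolerate the new property; only step two starts writing it.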
What is New (September 2011 "Build" event)?
-Blob
-Efficient resume for browsers and streaming media players
-Table
-Query projection ($select)
-Project only selected columns
e.g.: var query = (from entity in context.CreateQuery<CustomerSubSetTable>("Customers")
                   select new CustomerSubSetTable
                   {
                       PartitionKey = entity.PartitionKey,
                       RowKey = entity.RowKey,
                       TotalPurchase = entity.TotalPurchase
                   }).AsTableServiceQuery<CustomerSubSetTable>();
foreach (CustomerSubSetTable customer in query)
{
    // only the three projected properties are populated here
}
-Upsert entity (don't send an ETag)
-InsertOrReplace
-InsertOrMerge
-Queue
-Allow workers to extend the invisibility timeout
-Allow workers to update the content of a queue message
-Enable efficient continuation on worker failure
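The difference between the two upsert flavours is easiest to see on a stored entity that has a property the incoming entity lacks. The sketch below models a table as a dictionary keyed by (PartitionKey, RowKey); it only illustrates the semantics and is not the client library API.

```python
def _key(entity):
    return (entity["PartitionKey"], entity["RowKey"])


def insert_or_replace(table, entity):
    """Insert the entity, or overwrite the stored one entirely (old properties are dropped)."""
    table[_key(entity)] = dict(entity)


def insert_or_merge(table, entity):
    """Insert the entity, or merge it into the stored one (old properties are kept)."""
    merged = dict(table.get(_key(entity), {}))
    merged.update(entity)
    table[_key(entity)] = merged
```

Neither operation checks an ETag, so both succeed whether or not the entity already exists, and concurrent writers simply overwrite each other.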
Storage Analytics
-Logs (stored in Windows Azure Blob storage; a request typically appears in the log within 15 minutes)
-Trace all transactions for blobs, tables, and queues
-How long a request took
-What the client IP was
-What the request ID is
-Which blob or container was accessed
-Metrics (stored in Windows Azure Table storage)
-Hourly summary of key statistics about the traffic to blobs, tables, and queues
-Total transactions
-Storage server latency
-Application E2E latency
-Time for input to be transferred to the storage service
-Time for storage to process the request and compute the result (same as storage server latency)
-Time for the application to retrieve the result
**Retention policy on both logs and metrics is set in days
Windows Azure Storage Internals - Storage Stamps (how Azure Storage achieves availability, durability, scalability)
-One storage account is assigned to one storage stamp (storage tenant)
-One storage stamp includes 10-20 racks of storage (2-30 PB)
-Three layers
-Front-End Layer
-Authentication
-Authorization
-Logging, routing
-Holds the partition map (index)
-Partition Layer
-Understands table, blob, and queue objects
-Makes table, blob, and queue objects strongly consistent
-Scalable object (table, blob, queue) index
-Spreads the index across 100s of servers
-Dynamic load balancing
-DFS Layer
-Makes files durable; replicated 3 times across fault domains and upgrade domains
-Performs checksums
-Load balancing
-Reads: any replica can be read
-Writes use a journal drive for low latency