Update design doc with zone implementation details (#8738)

Fixes #8719
master
Harshavardhana 5 years ago committed by kannappanr
parent cb935980a5
commit dd311623df
  1. 139
      docs/distributed/DESIGN.md

@ -1,14 +1,15 @@
# Distributed Server Design Guide [![Slack](https://slack.min.io/slack?type=svg)](https://slack.min.io) # Distributed Server Design Guide [![Slack](https://slack.min.io/slack?type=svg)](https://slack.min.io)
This document explains the design approach and advanced use cases of the MinIO distributed server. This document explains the design, architecture and advanced use cases of the MinIO distributed server.
## Command-line ## Command-line
``` ```
NAME: NAME:
minio server - start object storage server. minio server - start object storage server
USAGE: USAGE:
minio server [FLAGS] DIR1 [DIR2..] minio server [FLAGS] DIR1 [DIR2..]
minio server [FLAGS] DIR{1...64} minio server [FLAGS] DIR{1...64}
minio server [FLAGS] DIR{1...64} DIR{65...128}
DIR: DIR:
DIR points to a directory on a filesystem. When you want to combine DIR points to a directory on a filesystem. When you want to combine
@ -48,46 +49,30 @@ Expansion of ellipses and choice of erasure sets based on this expansion is an a
- In this algorithm, we also make sure that we spread the disks out evenly. MinIO server expands ellipses passed as arguments. Here is a sample expansion to demonstrate the process. - In this algorithm, we also make sure that we spread the disks out evenly. MinIO server expands ellipses passed as arguments. Here is a sample expansion to demonstrate the process.
``` ```
minio server http://host{1...4}/export{1...8} minio server http://host{1...2}/export{1...8}
``` ```
Expected expansion Expected expansion
``` ```
> http://host1/export1 > http://host1/export1
> http://host2/export1 > http://host2/export1
> http://host3/export1
> http://host4/export1
> http://host1/export2 > http://host1/export2
> http://host2/export2 > http://host2/export2
> http://host3/export2
> http://host4/export2
> http://host1/export3 > http://host1/export3
> http://host2/export3 > http://host2/export3
> http://host3/export3
> http://host4/export3
> http://host1/export4 > http://host1/export4
> http://host2/export4 > http://host2/export4
> http://host3/export4
> http://host4/export4
> http://host1/export5 > http://host1/export5
> http://host2/export5 > http://host2/export5
> http://host3/export5
> http://host4/export5
> http://host1/export6 > http://host1/export6
> http://host2/export6 > http://host2/export6
> http://host3/export6
> http://host4/export6
> http://host1/export7 > http://host1/export7
> http://host2/export7 > http://host2/export7
> http://host3/export7
> http://host4/export7
> http://host1/export8 > http://host1/export8
> http://host2/export8 > http://host2/export8
> http://host3/export8
> http://host4/export8
``` ```
A noticeable trait of this expansion is that it chooses unique hosts such that the erasure code is efficient across drives and hosts. *A noticeable trait of this expansion is that it chooses unique hosts such the setup provides maximum protection and availability.*
- Choosing an erasure set for the object is decided during `PutObject()`, object names are used to find the right erasure set using the following pseudo code. - Choosing an erasure set for the object is decided during `PutObject()`, object names are used to find the right erasure set using the following pseudo code.
```go ```go
@ -103,6 +88,40 @@ Input for the key is the object name specified in `PutObject()`, returns a uniqu
- MinIO does erasure coding at the object level not at the volume level, unlike other object storage vendors. This allows applications to choose different storage class by setting `x-amz-storage-class=STANDARD/REDUCED_REDUNDANCY` for each object uploads so effectively utilizing the capacity of the cluster. Additionally these can also be enforced using IAM policies to make sure the client uploads with correct HTTP headers. - MinIO does erasure coding at the object level not at the volume level, unlike other object storage vendors. This allows applications to choose different storage class by setting `x-amz-storage-class=STANDARD/REDUCED_REDUNDANCY` for each object uploads so effectively utilizing the capacity of the cluster. Additionally these can also be enforced using IAM policies to make sure the client uploads with correct HTTP headers.
- MinIO also supports expansion of existing clusters in zones. Each zone is a self contained entity with same SLA's (read/write quorum) for each object as original cluster. By using the existing namespace for lookup validation MinIO ensures conflicting objects are not created. When no such object exists then MinIO simply uses the least used zone.
*There are no limits on how many zones can be combined*
```
minio server http://host{1...32}/export{1...32} http://host{5...6}/export{1...8}
```
In above example there are two zones
- 32 * 32 = 1024 drives zone1
- 2 * 8 = 16 drives zone2
> Notice the requirement of common SLA here original cluster had 1024 drives with 16 drives per erasure set, second zone is expected to have a minimum of 16 drives to match the original cluster SLA or it should be in multiples of 16.
Following pseudo code returns the correct least used zone index to upload an object.
```go
func getAvailableZoneIdx(ctx context.Context) int {
zones := z.getZonesAvailableSpace(ctx)
total := zones.TotalAvailable()
// choose when we reach this many
choose := rand.Uint64() % total
atTotal := uint64(0)
for _, zone := range zones {
atTotal += zone.Available
if atTotal > choose && zone.Available > 0 {
return zone.Index
}
}
// Should not happen, but print values just in case.
panic(fmt.Errorf("reached end of zones (total: %v, atTotal: %v, choose: %v)", total, atTotal, choose))
}
```
## Other usages ## Other usages
### Advanced use cases with multiple ellipses ### Advanced use cases with multiple ellipses
@ -114,7 +133,7 @@ minio server /mnt/controller{1...4}/data{1...16}
Standalone erasure coded configuration with 16 sets, 16 disks per set, across mounts and controllers. Standalone erasure coded configuration with 16 sets, 16 disks per set, across mounts and controllers.
``` ```
minio server /mnt{1..4}/controller{1...4}/data{1...16} minio server /mnt{1...4}/controller{1...4}/data{1...16}
``` ```
Distributed erasure coded configuration with 2 sets, 16 disks per set across hosts. Distributed erasure coded configuration with 2 sets, 16 disks per set across hosts.
@ -126,81 +145,3 @@ Distributed erasure coded configuration with rack level redundancy 32 sets in to
``` ```
minio server http://rack{1...4}-host{1...8}.example.net/export{1...16} minio server http://rack{1...4}-host{1...8}.example.net/export{1...16}
``` ```
## Backend `format.json` changes
`format.json` has new fields
- `disk` is changed to `this`
- `jbod` is changed to `sets` , along with this change sets is also a two dimensional list representing total sets and disks per set.
A sample `format.json` looks like below
```json
{
"version": "1",
"format": "xl",
"xl": {
"version": "2",
"this": "4ec63786-3dbd-4a9e-96f5-535f6e850fb1",
"sets": [
[
"4ec63786-3dbd-4a9e-96f5-535f6e850fb1",
"1f3cf889-bc90-44ca-be2a-732b53be6c9d",
"4b23eede-1846-482c-b96f-bfb647f058d3",
"e1f17302-a850-419d-8cdb-a9f884a63c92"
], [
"2ca4c5c1-dccb-4198-a840-309fea3b5449",
"6d1e666e-a22c-4db4-a038-2545c2ccb6d5",
"d4fa35ab-710f-4423-a7c2-e1ca33124df0",
"88c65e8b-00cb-4037-a801-2549119c9a33"
]
],
"distributionAlgo": "CRCMOD"
}
}
```
New `format-xl.go` behavior is format structure is used as a opaque type, `Format` field signifies the format of the backend. Once the format has been identified it is now the job of the identified backend to further interpret the next structures and validate them.
```go
type formatType string
const (
formatFS formatType = "fs"
formatXL = "xl"
)
type format struct {
Version string
Format BackendFormat
}
```
### Current format
```go
type formatXLV1 struct{
format
XL struct{
Version string
Disk string
JBOD []string
}
}
```
### New format
```go
type formatXLV2 struct {
Version string `json:"version"`
Format string `json:"format"`
XL struct {
Version string `json:"version"`
This string `json:"this"`
Sets [][]string `json:"sets"`
DistributionAlgo string `json:"distributionAlgo"`
} `json:"xl"`
}
```

Loading…
Cancel
Save