minio

Commit Graph

Author	SHA1	Message	Date
Harshavardhana	fb96779a8a	Add large bucket support for erasure coded backend (#5160 ) This PR implements an object layer which combines input erasure sets of XL layers into a unified namespace. This object layer extends the existing erasure coded implementation. The design assumes that providing > 16 disks is a static configuration as well, i.e. if you started the setup with 32 disks as 4 sets of 8 disks per pack, then you would always need to provide 4 sets. Some design details and restrictions: - Objects are distributed using consistent ordering to a unique erasure coded layer. - Each pack has its own dsync so locks are synchronized properly at pack (erasure layer). - Each pack still has a maximum of 16 disks requirement, you can start with multiple such sets statically. - Sets are a static set of disks and cannot be changed, there is no elastic expansion allowed. - Sets are a static set of disks and cannot be changed, there is no elastic removal allowed. - ListObjects() across sets can be noticeably slower since List happens on all servers, and is merged at this sets layer. Fixes #5465 Fixes #5464 Fixes #5461 Fixes #5460 Fixes #5459 Fixes #5458 Fixes #5488 Fixes #5489 Fixes #5497 Fixes #5496	7 years ago
Harshavardhana	994fe53669	Avoid shadowing ignored errors listAllBuckets() (#5524 ) It can happen such that one of the disks that was down would return 'errDiskNotFound' but the err is preserved due to loop shadowing which leads to issues when healing the bucket.	7 years ago
Aditya Manthramurthy	a337ea4d11	Move admin APIs to new path and add redesigned heal APIs (#5351 ) - Changes related to moving admin APIs - admin APIs now have an endpoint under /minio/admin - admin APIs are now versioned - a new API to serve the version is added at "GET /minio/admin/version" and all API operations have the path prefix /minio/admin/v1/<operation> - new service stop API added - credentials change API is moved to /minio/admin/v1/config/credential - credentials change API and configuration get/set API now require TLS so that credentials are protected - all API requests now receive JSON - heal APIs are disabled as they will be changed substantially - Heal API changes Heal API is now provided at a single endpoint with the ability for a client to start a heal sequence on all the data in the server, a single bucket, or under a prefix within a bucket. When a heal sequence is started, the server returns a unique token that needs to be used for subsequent 'status' requests to fetch heal results. On each status request from the client, the server returns heal result records that it has accumulated since the previous status request. The server accumulates up to 1000 records and pauses healing further objects until the client requests status. If the client does not request any further records for a long time, the server aborts the heal sequence automatically. A heal result record is returned for each entity healed on the server, such as system metadata, object metadata, buckets and objects, and has information about the before and after states on each disk. A client may request to force restart a heal sequence - this causes the running heal sequence to be aborted at the next safe spot and starts a new heal sequence.	7 years ago
poornas	0bb6247056	Move nslocking from s3 layer to object layer (#5382 ) Fixes #5350	7 years ago
Harshavardhana	c0721164be	Automatically set goroutines based on shardSize (#5346 ) Update reedsolomon library to enable feature to automatically set number of go-routines based on the input shard size, since shard size is sort of a constant in Minio for objects > 10MiB (default blocksize) klauspost reported around 15-20% improvement in performance numbers on older systems such as AVX and SSE3 ``` name old speed new speed delta Encode10x2x10000-8 5.45GB/s ± 1% 6.22GB/s ± 1% +14.20% (p=0.000 n=9+9) Encode100x20x10000-8 1.44GB/s ± 1% 1.64GB/s ± 1% +13.77% (p=0.000 n=10+10) Encode17x3x1M-8 10.0GB/s ± 5% 12.0GB/s ± 1% +19.88% (p=0.000 n=10+10) Encode10x4x16M-8 7.81GB/s ± 5% 8.56GB/s ± 5% +9.58% (p=0.000 n=10+9) Encode5x2x1M-8 15.3GB/s ± 2% 19.6GB/s ± 2% +28.57% (p=0.000 n=9+10) Encode10x2x1M-8 12.2GB/s ± 5% 15.0GB/s ± 5% +22.45% (p=0.000 n=10+10) Encode10x4x1M-8 7.84GB/s ± 1% 9.03GB/s ± 1% +15.19% (p=0.000 n=9+9) Encode50x20x1M-8 1.73GB/s ± 4% 2.09GB/s ± 4% +20.59% (p=0.000 n=10+9) Encode17x3x16M-8 10.6GB/s ± 1% 11.7GB/s ± 4% +10.12% (p=0.000 n=8+10) ```	7 years ago
Nitish Tiwari	1a3dbbc9dd	Add x-amz-storage-class support (#5295 ) This adds configurable data and parity options on a per object basis. To use variable parity - Users can set environment variables to configure variable parity - Then add header x-amz-storage-class to putobject requests with relevant storage class values Fixes #4997	7 years ago
Harshavardhana	8efa82126b	Convert errors tracer into a separate package (#5221 )	7 years ago
Aditya Manthramurthy	4c9fae90ff	Optimize healObject by eliminating extra data passes (#4949 )	7 years ago
Frank Wessels	61e0b1454a	Add support for timeouts for locks (#4377 )	7 years ago
Andreas Auernhammer	85fcee1919	erasure: simplify XL backend operations (#4649 ) (#4758 ) This change provides new implementations of the XL backend operations: - create file - read file - heal file Further this change adds table based tests for all three operations. This affects also the bitrot algorithm integration. Algorithms are now integrated in an idiomatic way (like crypto.Hash). Fixes #4696 Fixes #4649 Fixes #4359	7 years ago
Aditya Manthramurthy	32da1aa9d6	XL: Simplify heal-format operations This is in preparation for updated admin heal API. * Improve case analysis of healFormatXL() - fixes a case where disks could have unhandled errors. * Simplify healFormatXLFreshDisks() and healFormatXLCorruptedDisks() to share more code and handle fewer cases for improved simplicity and reduced code repetition. * Fix test cases.	7 years ago
Anis Elleuch	af8071c86a	xl: Fix rare freeze after many disk/network errors (#4438 ) xl.storageDisks is sometimes passed to some low-level XL functions. Some disks in xl.storageDisks are set to nil when they encounter some errors. This means all elements in xl.storageDisks will be nil after some time, which leads to an unusable XL.	8 years ago
Harshavardhana	075b8903d7	fs: Add safe locking semantics for `format.json` (#4523 ) This patch also reverts previous changes which were merged for migration to the newer disk format. We will be bringing these changes in subsequent releases. But we wish to add protection in this release such that future release migrations are protected. Revert "fs: Migration should handle bucketConfigs as regular objects. (#4482)" This reverts commit `976870a391`. Revert "fs: Migrate object metadata to objects directory. (#4195)" This reverts commit `76f4f20609`.	8 years ago
Krishnan Parthasarathi	ca64b86112	Return possible states a heal operation (#4045 )	8 years ago
Krishnan Parthasarathi	2bd694dbc8	Add disksUnavailable healStatus const (#3990 ) `disksUnavailable` healStatus constant indicates that a given object needs healing but one or more of disks requiring heal are offline. This can be used by admin heal API consumers to distinguish between a successful heal and a no-op since the outdated disks were offline.	8 years ago
Krishnan Parthasarathi	c27ece409b	heal: Check if all parts are available and valid (#3967 ) In the algorithm to check if an object requires healing, in addition to checking if all disks have xl.json present we should check if all parts of the object are present and have valid blake2b checksums. Also fixed a minor compilation error in heal-objects-list.go.	8 years ago
Krishnan Parthasarathi	c192e5c9b2	Implement heal-upload admin API (#3914 ) This API is meant for administrative tools like mc-admin to heal an ongoing multipart upload on a Minio server. N.B. This set of admin APIs applies only to Minio servers. `github.com/minio/minio/pkg/madmin` provides a go SDK for this (and other admin) operations. Specifically, func HealUpload(bucket, object, uploadID string, dryRun bool) error Sample admin API request: POST /?heal&bucket=mybucket&object=myobject&upload-id=myuploadID&dry-run - Header(s): ["x-minio-operation"] = "upload" Notes: - bucket, object and upload-id are mandatory query parameters - if dry-run is set, API returns success if all parameters passed are valid.	8 years ago
Harshavardhana	e49efcb9d9	xl: quickHeal heal bucket only when needed. (#3854 ) This improves the startup time significantly for clusters which have a lot of buckets. Also fixes a bug where `.minio.sys` is created on disks which do not have `format.json`	8 years ago
Krishnan Parthasarathi	e3fd4c0dd6	XL: Make listOnlineDisks and outDatedDisks consistent w/ each other. (#3808 )	8 years ago
Harshavardhana	bcc5b6e1ef	xl: Rename getOrderedDisks as shuffleDisks appropriately. (#3796 ) This PR is for readability cleanup - getOrderedDisks as shuffleDisks - getOrderedPartsMetadata as shufflePartsMetadata Distribution is now a second argument instead of being the primary input argument for brevity. Also change the usage of type casted int64(0), instead rely on direct type reference as `var variable int64` everywhere.	8 years ago
Harshavardhana	6a6c930f5b	xl: Abort multipart upload should honor quorum properly. (#3670 ) Current implementation didn't honor quorum properly and didn't handle the errors generated properly. This patch addresses that and also moves common code `cleanupMultipartUploads` into xl specific private function. Fixes #3665	8 years ago
Krishnan Parthasarathi	864b8795aa	heal: Should delete stale object parts before healing (#3649 )	8 years ago
Anis Elleuch	0715032598	heal: Add ListBucketsHeal object API (#3563 ) ListBucketsHeal will list the buckets that need to be healed: * ListBucketsHeal() (buckets []BucketInfo, err error)	8 years ago
Krishnan Parthasarathi	c194b9f5f1	Implement mgmt REST APIs for heal subcommands (#3533 ) The heal APIs supported in this change are, - listing of objects to be healed. - healing a bucket. - healing an object.	8 years ago
Harshavardhana	1c699d8d3f	fs: Re-implement object layer to remember the fd (#3509 ) This patch re-writes FS backend to support shared backend sharing locks for safe concurrent access across multiple servers.	8 years ago
Harshavardhana	2d6f8153fa	format: Check properly for disks in valid formats. (#3427 ) There was an error in how we validated disk formats: if one of the disks was formatted with FS, it would cause confusion and the object layer would never initialize, essentially going into an infinite loop. Validate pre-emptively and also check for FS format properly.	8 years ago
Harshavardhana	4daa0d2cee	lock: Moving locking to handler layer. (#3381 ) This is implemented so that the issues like in the following flow don't affect the behavior of operation. ``` GetObjectInfo() .... --> Time window for mutation (no lock held) .... --> Time window for mutation (no lock held) GetObject() ``` This happens when two simultaneous uploads are made to the same object and wrong info is returned to the client. Another classic example is "CopyObject" API itself which reads from a source object and copies to destination object. Fixes #3370 Fixes #2912	8 years ago
Harshavardhana	ff4ce0ee14	fs/xl: Combine input checks into re-usable functions. (#3383 ) Repeated code around both object layers are moved and combined into simple re-usable functions.	8 years ago
Bala FA	0f2e493c9a	Use isErrIgnored() function wherever applicable. (#3343 )	8 years ago
Bala FA	1d4ac4b084	Rename getUUID() into mustGetUUID() (#3320 ) In case of UUID generation failure mustGetUUID() will panic rather than retrying infinitely in a for loop.	8 years ago
Harshavardhana	5197649081	utils: reduceErrs returns and validates quorum errors. (#3300 ) This is needed as explained by @krisis Let's say we have following errors. ``` []error{nil, errFileNotFound, errDiskAccessDenied, errDiskAccessDenied} ``` Since the last two errors are filtered, the maximum is nil, depending on map order. Let's say we get nil from reduceErr. Clearly at this point we don't have quorum nodes agreeing about the data and since GetObject only requires N/2 (Read quorum) and isDiskQuorum would have returned true. This is problematic and can lead to undesirable consequences. Fixes #3298	8 years ago
Krishnan Parthasarathi	eed9ab0464	XL: pickValidXLMeta should return error instead of panic'ing (#3277 )	8 years ago
Harshavardhana	0b9f0d14a1	auth/rpc: Take remote disk offline after maximum allowed attempts. (#3288 ) When disks are offline for a long period of time, we should ignore the disk after trying Login up to 5 times. This is to reduce the network chattiness, this also reduces the overall time spent on `net.Dial`. Fixes #3286	8 years ago
Anis Elleuch	ffbee70e04	Avoid removing 'tmp' directory inside '.minio.sys' (#3294 )	8 years ago
Harshavardhana	1c47365445	xl/bootup: Upon bootup handle errors loading bucket and event configs. (#3287 ) In a situation when we have lots of buckets the bootup time might have slowed down a bit but during this situation the servers quickly going up and down would be an in-transit state. Certain calls which do not use quorum like `readXLMetaStat` might return an error saying `errDiskNotFound` this is returned in place of expected `errFileNotFound` which leads to an issue where server doesn't start. To avoid this situation we need to ignore them as safe values to be ignored, for the most part these are network related errors. Fixes #3275	8 years ago
Harshavardhana	c91d3791f9	heal: Add healing support for bucket, bucket metadata files. (#3252 ) This patch implements healing in general but it is only used as part of quickHeal(). Fixes #3237	8 years ago
Aditya Manthramurthy	dd0698d14c	Improve namespace lock API: (#3203 ) - abstract out instrumentation information. - use separate lockInstance type that encapsulates the nsMutex, volume, path and opsID as the frontend or top-level lock object.	8 years ago
Harshavardhana	39331b6b4e	xl: GetCheckSumInfo() shouldn't fail if hash not available. (#2984 ) In a multipart upload scenario disks going down and coming back up can lead to certain parts missing on the disk/server which was going down. This is a valid case since these blocks can be missing and should be healed through heal operation. But we are not supposed to fail prematurely since we have enough data on the other disks as well within read-quorum. This fix relaxes previous assumption, fixes a major corruption issue reproduced by @vadmeste. Fixes #2976	8 years ago
Harshavardhana	fee3f99a6e	xl: heal bucket should validate if bucket exists first. (#2953 ) Fixes #2944	8 years ago
Krishna Srinivas	f5f007e183	Test: Add test case for xl.HealObject() (#2884 ) fixes #2842	8 years ago
Harshavardhana	1e6d67b16d	server: Remove deadcode. (#2699 )	8 years ago
Krishna Srinivas	7cc77eba45	XL/Healing: errDiskNotFound is the only pardonable error in xlShouldHeal. (#2586 ) This is so that we try to heal a file for all the "bad" cases except when the disk is down.	8 years ago
Anis Elleuch	200d327737	List only objects that need healing (#2546 )	8 years ago
Harshavardhana	bccf549463	server: Move all the top level files into cmd folder. (#2490 ) This change brings a change which was done for the 'mc' package to allow for clean repo and have a cleaner github drop in experience.	8 years ago
Krishna Srinivas	e2498edb45	controller: Implement controlled healing and trigger (#2381 ) This patch introduces new command line 'control' - minio control to manage minio server connecting through GoRPC API frontend. - minio control heal Is implemented for healing objects.	8 years ago
karthic rao	5fe72cf205	Removing readAllMeta from xl-v1-healing.go and placing it in xl-v1-utils.go (#2296 )	8 years ago
Krishna Srinivas	b090c7112e	Refactor of xl.PutObjectPart and erasureCreateFile. (#2193 ) * XL: Refactor of xl.PutObjectPart and erasureCreateFile. * GetCheckSum and AddCheckSum methods for xlMetaV1 * Simple unit test case for erasureCreateFile()	9 years ago
Harshavardhana	3b69b4ada4	server: Change server startup message. (#2195 ) This change brings in the new agreed startup message for the server. Adds additional links pointing to Minio SDKs as well.	9 years ago
Harshavardhana	623e0f9243	XL: listOnlineDisks should use modTime instead of version. (#2166 ) This change is needed to make reading from objects future proof in-terms of handling online disks. Our current counter is not based on affirmative knowledge and relies on arithmetic sequence which can lead to bugs. Using modTime simplifies the understanding of `xl.json` and future tooling / debugging of the format.	9 years ago
Krishnan Parthasarathi	bef72f26db	xl: Make locking more granular for PutObjectPart requests (#2168 )	9 years ago
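
The quorum problem described in the reduceErrs commit (5197649081) can be sketched as follows. This is a minimal illustration, not the Minio implementation: the function `reduceQuorumErrs`, the helper `isIgnored`, and the `errQuorum` sentinel are hypothetical names chosen for the example, and the error variables only borrow their names from the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors, named after those in the commit message.
var (
	errFileNotFound     = errors.New("file not found")
	errDiskAccessDenied = errors.New("disk access denied")
	errQuorum           = errors.New("quorum not met") // assumed name
)

// isIgnored reports whether err is one of the ignored errors.
func isIgnored(err error, ignored []error) bool {
	for _, ig := range ignored {
		if err == ig {
			return true
		}
	}
	return false
}

// reduceQuorumErrs counts every error value (nil included), skipping the
// ignored ones, and returns the most frequent value only when it occurs at
// least quorum times. Otherwise it returns errQuorum, so a lone nil can
// never be mistaken for agreement among the disks.
func reduceQuorumErrs(errs, ignored []error, quorum int) error {
	counts := make(map[error]int)
	for _, err := range errs {
		if isIgnored(err, ignored) {
			continue
		}
		counts[err]++
	}
	var maxErr error
	maxCount := 0
	for err, n := range counts {
		if n > maxCount {
			maxErr, maxCount = err, n
		}
	}
	if maxCount >= quorum {
		return maxErr
	}
	return errQuorum
}

func main() {
	// The error slice from the commit message, with the access-denied
	// errors filtered out and a read quorum of N/2.
	errs := []error{nil, errFileNotFound, errDiskAccessDenied, errDiskAccessDenied}
	ignored := []error{errDiskAccessDenied}
	fmt.Println(reduceQuorumErrs(errs, ignored, len(errs)/2)) // prints "quorum not met"
}
```

After filtering, nil and errFileNotFound each occur once, so neither reaches the quorum of two and the sketch reports a quorum failure instead of picking one of them by map order.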

44 Commits (574b667c5639079da8e8c3cad2fad5d8cbee7224)
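
The kind of bug described in commit 994fe53669, where an ignored error from a down disk is preserved past the loop, can be illustrated with a small sketch. It is one plausible way such a bug arises, not the actual listAllBuckets() code: `statDisk`, `listBuggy`, and `listFixed` are hypothetical names, and disks are modelled as plain booleans (true meaning online).

```go
package main

import (
	"errors"
	"fmt"
)

// errDiskNotFound borrows its name from the commit message.
var errDiskNotFound = errors.New("disk not found")

// statDisk is a hypothetical stand-in for a per-disk call that fails
// with errDiskNotFound when the disk is offline.
func statDisk(online bool) error {
	if !online {
		return errDiskNotFound
	}
	return nil
}

// listBuggy declares err outside the loop. When the last disk is down,
// the error is "ignored" by continue but still stored in err, so it
// leaks out through the final return.
func listBuggy(disks []bool) error {
	var err error
	for _, online := range disks {
		err = statDisk(online)
		if err == nil || errors.Is(err, errDiskNotFound) {
			continue
		}
		return err
	}
	return err
}

// listFixed scopes err to each iteration and returns nil once the loop
// finishes, so ignored errors cannot escape.
func listFixed(disks []bool) error {
	for _, online := range disks {
		err := statDisk(online)
		if err == nil || errors.Is(err, errDiskNotFound) {
			continue
		}
		return err
	}
	return nil
}

func main() {
	disks := []bool{true, false} // second disk is down
	fmt.Println(listBuggy(disks)) // prints "disk not found"
	fmt.Println(listFixed(disks)) // prints "<nil>"
}
```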