minio

Commit Graph

Author	SHA1	Message	Date
Harshavardhana	09bc49bd51	fix: healBucket across sets should capture results properly (#11341 ) healing `.minio.sys/config` returns incorrect quorum errors across sets, healing of the buckets.	4 years ago
Harshavardhana	f21d650ed4	fix: readData in bulk call using messagepack byte wrappers (#11228 ) This PR refactors the way we use buffers for O_DIRECT and to re-use those buffers for messagepack reader writer. After some extensive benchmarking found that not all objects have this benefit, and only objects smaller than 64KiB see this benefit overall. Benefits are seen from almost all objects from 1KiB - 32KiB Beyond this no objects see benefit with bulk call approach as the latency of bytes sent over the wire v/s streaming content directly from disk negate each other with no remarkable benefits. All other optimizations include reuse of msgp.Reader, msgp.Writer using sync.Pool's for all internode calls.	4 years ago
Harshavardhana	d0027c3c41	do not use large buffers if not necessary (#11220 ) without this change, there is a performance regression for small objects GETs, this makes the overall speed to go back to pre '59d363' commit days.	4 years ago
Harshavardhana	c4131c2798	feat: Small object optimization read data in single bulk call (#11207 )	4 years ago
Anis Elleuch	677e80c0f8	xl: Remove check-dir in ReadVersion (#11200 ) The only purpose of check-dir flag in ReadVersion is to return 404 when an object has xl.meta but without data. This is causing an extract call to the disk which can be penalizing in case of busy system where disks receive many concurrent access.	4 years ago
Ritesh H Shukla	556524c715	Reduce logging when peer is offline (#11184 )	4 years ago
Harshavardhana	5b30bbda92	fix: add more protection distribution to match EcIndex (#10772 ) allows for more stricter validation in picking up the right set of disks for reconstruction.	4 years ago
Krishna Srinivas	c49a80db41	fix: use meta.Erasure.Index for GetObject() to reconstruct object (#10764 )	4 years ago
Anis Elleuch	eb95353cb1	fix: Get/HeadObject return 404 on non quorum objects (#10753 )	4 years ago
Klaus Post	b1c99e88ac	reduce CPU usage upto 50% in readdir (#10466 )	4 years ago
Klaus Post	2d58a8d861	Add storage layer contexts (#10321 ) Add context to all (non-trivial) calls to the storage layer. Contexts are propagated through the REST client. - `context.TODO()` is left in place for the places where it needs to be added to the caller. - `endWalkCh` could probably be removed from the walkers, but no changes so far. The "dangerous" part is that now a caller disconnecting will propagate down, so a "delete" operation will now be interrupted. In some cases we might want to disconnect this functionality so the operation completes if it has started, leaving the system in a cleaner state.	4 years ago
Harshavardhana	4915433bd2	Support bucket versioning (#9377 ) - Implement a new xl.json 2.0.0 format to support, this moves the entire marshaling logic to POSIX layer, top layer always consumes a common FileInfo construct which simplifies the metadata reads. - Implement list object versions - Migrate to siphash from crchash for new deployments for object placements. Fixes #2111	5 years ago
Harshavardhana	f44cfb2863	use GlobalContext whenever possible (#9280 ) This change is throughout the codebase to ensure that all codepaths honor GlobalContext	5 years ago
Harshavardhana	30707659b5	[feature] allow for an odd number of erasure packs (#9221 ) Too many deployments come up with an odd number of hosts or drives, to facilitate even distribution among those setups allow for odd and prime numbers based packs.	5 years ago
Harshavardhana	aa2e89bfe3	Use jsoniter whenever applicable instead of encoding/json (#8766 ) This PR adds jsoniter package to replace encoding/json in places where faster json unmarshal is necessary whenever input JSON is large enough. Some benchmarking comparison between jsoniter and enconding/json benchmark old MB/s new MB/s speedup BenchmarkParseUnmarshal/N10-4 110.02 331.17 3.01x BenchmarkParseUnmarshal/N100-4 125.74 524.09 4.17x BenchmarkParseUnmarshal/N500-4 131.68 542.60 4.12x BenchmarkParseUnmarshal/N1000-4 133.93 514.88 3.84x BenchmarkParseUnmarshal/N5000-4 122.10 415.36 3.40x BenchmarkParseUnmarshal/N10000-4 132.13 403.90 3.06x	5 years ago
Harshavardhana	822eb5ddc7	Bring in safe mode support (#8478 ) This PR refactors object layer handling such that upon failure in sub-system initialization server reaches a stage of safe-mode operation wherein only certain API operations are enabled and available. This allows for fixing many scenarios such as - incorrect configuration in vault, etcd, notification targets - missing files, incomplete config migrations unable to read encrypted content etc - any other issues related to notification, policies, lifecycle etc	5 years ago
Harshavardhana	68a519a468	Use errgroups instead of sync.WaitGroup as needed (#8354 )	5 years ago
Harshavardhana	ff5bf51952	admin/heal: Fix deep healing to heal objects under more conditions (#8321 ) - Heal if the part.1 is truncated from its original size - Heal if the part.1 fails while being verified in between - Heal if the part.1 fails while being at a certain offset Other cleanups include make sure to flush the HTTP responses properly from storage-rest-server, avoid using 'defer' to improve call latency. 'defer' incurs latency avoid them in our hot-paths such as storage-rest handlers. Fixes #8319	5 years ago
Harshavardhana	53e4887e02	Simplify and cleanup metadata r/w functions (#8146 )	5 years ago
Harshavardhana	b52a3e523c	Avoid using fastjson parser pool, move back to jsoniter (#8190 ) It looks like from implementation point of view fastjson parser pool doesn't behave the same way as expected when dealing many `xl.json` from multiple disks. The fastjson parser pool usage ends up returning incorrect xl.json entries for checksums, with references pointing to older entries. This led to the subtle bug where checksum info is duplicated from a previous xl.json read of a different file from different disk.	5 years ago
Harshavardhana	9ca7470ccc	Avoid using jsoniter, move to fastjson (#8063 ) This is to avoid using unsafe.Pointer type code dependency for MinIO, this causes crashes on ARM64 platforms Refer #8005 collection of runtime crashes due to unsafe.Pointer usage incorrectly. We have seen issues like this before when using jsoniter library in the past. This PR hopes to fix this using fastjson	5 years ago
Praveen raj Mani	c113d4e49c	Posix CreateFile should work for compressed lengths (#7584 )	6 years ago
Harshavardhana	f767a2538a	Optimize listing with leaf check offloaded to posix (#7541 ) Other listing optimizations include - remove double sorting while filtering object entries - improve error message when upload-id is not in quorum - use jsoniter for full unmarshal json, instead of gjson - remove unused code	6 years ago
kannappanr	5ecac91a55	Replace Minio refs in docs with MinIO and links (#7494 )	6 years ago
poornas	5a80cbec2a	Add double encryption at S3 gateway. (#6423 ) This PR adds pass-through, single encryption at gateway and double encryption support (gateway encryption with pass through of SSE headers to backend). If KMS is set up (either with Vault as KMS or using MINIO_SSE_MASTER_KEY),gateway will automatically perform single encryption. If MINIO_GATEWAY_SSE is set up in addition to Vault KMS, double encryption is performed.When neither KMS nor MINIO_GATEWAY_SSE is set, do a pass through to backend. When double encryption is specified, MINIO_GATEWAY_SSE can be set to "C" for SSE-C encryption at gateway and backend, "S3" for SSE-S3 encryption at gateway/backend or both to support more than one option. Fixes #6323, #6696	6 years ago
Harshavardhana	c82acc599a	Treat empty xl.json as file not found (#6804 ) If the buffer is empty we can avoid parsing it and treat it essentially as `xl.json` is effectively missing.	6 years ago
Harshavardhana	36990aeafd	Avoid double bucket validation in DeleteObjectHandler (#6720 ) On a heavily loaded server, getBucketInfo() becomes slow, one can easily observe deleting an object causes many additional network calls. This PR is to let the underlying call return the actual error and write it back to the client.	6 years ago
Praveen raj Mani	ce9d36d954	Add object compression support (#6292 ) Add support for streaming (golang/LZ77/snappy) compression.	6 years ago
Harshavardhana	a63bc9254d	Add 'disk' tag to log output to enhance 'disk not found' errors (#6460 )	6 years ago
kannappanr	2d84b02bc4	Check for absence of checksum field and attributes. (#6298 ) Fixes #6295	6 years ago
kannappanr	c7946ab9ab	Remove unnecessary error log messages (#6186 )	6 years ago
Krishna Srinivas	bb34bd91f1	Fix unnecessary log messages to avoid flooding the logs (#5900 )	7 years ago
Krishna Srinivas	6831177394	Do not log errFileNotFound error (#5853 )	7 years ago
Harshavardhana	adf9a9d300	Remove all unused variables and functions (#5823 )	7 years ago
kannappanr	cef992a395	Remove error package and cause functions (#5784 )	7 years ago
kannappanr	f8a3fd0c2a	Create logger package and rename errorIf to LogIf (#5678 ) Removing message from error logging Replace errors.Trace with LogIf	7 years ago
Anis Elleuch	120b061966	Add multipart support in SSE-C encryption (#5576 ) ) Add Put/Get support of multipart in encryption ) Add GET Range support for encryption ) Add CopyPart encrypted support ) Support decrypting of large single PUT object	7 years ago
Harshavardhana	fb96779a8a	Add large bucket support for erasure coded backend (#5160 ) This PR implements an object layer which combines input erasure sets of XL layers into a unified namespace. This object layer extends the existing erasure coded implementation, it is assumed in this design that providing > 16 disks is a static configuration as well i.e if you started the setup with 32 disks with 4 sets 8 disks per pack then you would need to provide 4 sets always. Some design details and restrictions: - Objects are distributed using consistent ordering to a unique erasure coded layer. - Each pack has its own dsync so locks are synchronized properly at pack (erasure layer). - Each pack still has a maximum of 16 disks requirement, you can start with multiple such sets statically. - Static sets set of disks and cannot be changed, there is no elastic expansion allowed. - Static sets set of disks and cannot be changed, there is no elastic removal allowed. - ListObjects() across sets can be noticeably slower since List happens on all servers, and is merged at this sets layer. Fixes #5465 Fixes #5464 Fixes #5461 Fixes #5460 Fixes #5459 Fixes #5458 Fixes #5460 Fixes #5488 Fixes #5489 Fixes #5497 Fixes #5496	7 years ago
Aditya Manthramurthy	a337ea4d11	Move admin APIs to new path and add redesigned heal APIs (#5351 ) - Changes related to moving admin APIs - admin APIs now have an endpoint under /minio/admin - admin APIs are now versioned - a new API to server the version is added at "GET /minio/admin/version" and all API operations have the path prefix /minio/admin/v1/<operation> - new service stop API added - credentials change API is moved to /minio/admin/v1/config/credential - credentials change API and configuration get/set API now require TLS so that credentials are protected - all API requests now receive JSON - heal APIs are disabled as they will be changed substantially - Heal API changes Heal API is now provided at a single endpoint with the ability for a client to start a heal sequence on all the data in the server, a single bucket, or under a prefix within a bucket. When a heal sequence is started, the server returns a unique token that needs to be used for subsequent 'status' requests to fetch heal results. On each status request from the client, the server returns heal result records that it has accumulated since the previous status request. The server accumulates upto 1000 records and pauses healing further objects until the client requests for status. If the client does not request any further records for a long time, the server aborts the heal sequence automatically. A heal result record is returned for each entity healed on the server, such as system metadata, object metadata, buckets and objects, and has information about the before and after states on each disk. A client may request to force restart a heal sequence - this causes the running heal sequence to be aborted at the next safe spot and starts a new heal sequence.	7 years ago
Nitish Tiwari	ede504400f	Add validation of xlMeta ErasureInfo field (#5389 )	7 years ago
Harshavardhana	d45a8784fc	Fix hash order to generate more even distribution (#5247 ) The problem in existing code was the following line ``` start := int(keyCrc%uint32(cardinality)) \| 1 ``` A given a value of N cardinality the ending result because of the the bitwise '\|' would lead to always higher affinity to odd sequences. As can be seen from the test cases that this can lead to many objects being allocated the same set of disks or atleast the first disk is an odd disk always. This introduces a performance problem for majority of the objects under concurrent load. Remove `\| 1` to provide a more cleaner distribution and the new code will be. ``` start := int(keyCrc % uint32(cardinality)) ``` Thanks to Krishna Srinivas for pointing out the bitwise situation here.	7 years ago
Harshavardhana	8efa82126b	Convert errors tracer into a separate package (#5221 )	7 years ago
Harshavardhana	0b546ddfd4	Return errors in PutObject()/PutObjectPart() if input size is -1. (#5015 ) Amazon S3 API expects all incoming stream has a content-length set it was superflous for us to support object layer which supports unknown sized stream as well, this PR removes such requirements and explicitly error out if input stream is less than zero.	7 years ago
Harshavardhana	2e6ee68409	fix: [minor] Avoid unnecessary typecasting. (#4828 ) We don't need to typecast identifiers from their base to type to same type again. This is not a bug and compiler is fine to skip it but it is better to avoid if not needed.	7 years ago
Frank Wessels	a2f2044528	Minor corrections in comments for xl utils (#4815 )	7 years ago
Andreas Auernhammer	85fcee1919	erasure: simplify XL backend operations (#4649 ) (#4758 ) This change provides new implementations of the XL backend operations: - create file - read file - heal file Further this change adds table based tests for all three operations. This affects also the bitrot algorithm integration. Algorithms are now integrated in an idiomatic way (like crypto.Hash). Fixes #4696 Fixes #4649 Fixes #4359	7 years ago
Frank Wessels	46897b1100	Name return values to prevent the need (and unnecessary code bloat) (#4576 ) This is done to explicitly instantiate objects for every return statement.	8 years ago
Anis Elleuch	af8071c86a	xl: Fix rare freeze after many disk/network errors (#4438 ) xl.storageDisks is sometimes passed to some low-level XL functions. Some disks in xl.storageDisks are set to nil when they encounter some errors. This means all elements in xl.storageDisks will be nil after some time which lead to an unusable XL.	8 years ago
Aditya Manthramurthy	8975da4e84	Add new ReadFileWithVerify storage-layer API (#4349 ) This is an enhancement to the XL/distributed-XL mode. FS mode is unaffected. The ReadFileWithVerify storage-layer call is similar to ReadFile with the additional functionality of performing bit-rot checking. It accepts additional parameters for a hashing algorithm to use and the expected hex-encoded hash string. This patch provides significant performance improvement because: 1. combines the step of reading the file (during erasure-decoding/reconstruction) with bit-rot verification; 2. limits the number of file-reads; and 3. avoids transferring the file over the network for bit-rot verification. ReadFile API is implemented as ReadFileWithVerify with empty hashing arguments. Credits to AB and Harsha for the algorithmic improvement. Fixes #4236.	8 years ago
Harshavardhana	155a90403a	fs/erasure: Rename meta 'md5Sum' as 'etag'. (#4319 ) This PR also does backend format change to 1.0.1 from 1.0.0. Backward compatible changes are still kept to read the 'md5Sum' key. But all new objects will be stored with the same details under 'etag'. Fixes #4312	8 years ago

12 Commits (cfc8b92dff1afe22e195cbafb80d86380672a638)