minio

Commit Graph

Author	SHA1	Message	Date
Poorna Krishnamoorthy	1ebf6f146a	Add support for ILM transition (#10565 ) This PR adds transition support for ILM to transition data to another MinIO target represented by a storage class ARN. Subsequent GET or HEAD for that object will be streamed from the transition tier. If PostRestoreObject API is invoked, the transitioned object can be restored for duration specified to the source cluster.	4 years ago
Harshavardhana	8f7fe0405e	fix: delete marker replication should support directories (#10878 ) allow directories to be replicated as well, along with their delete markers in replication. Bonus fix to fix bloom filter updates for directories to be preserved.	4 years ago
Harshavardhana	9a34fd5c4a	Revert "Revert "Add delete marker replication support (#10396 )"" This reverts commit `267d7bf0a9`.	4 years ago
Harshavardhana	17a5ff51ff	fix: move context timeout closer to network for Delete calls (#10897 ) allowing for disconnects to be limited to the drives themselves instead of disconnecting all drives.	4 years ago
Harshavardhana	267d7bf0a9	Revert "Add delete marker replication support (#10396 )" This reverts commit `50c10a5087`. PR is moved to origin/dev branch	4 years ago
Poorna Krishnamoorthy	50c10a5087	Add delete marker replication support (#10396 ) Delete marker replication is implemented for V2 configuration specified in AWS spec (though AWS allows it only in the V1 configuration). This PR also brings in a MinIO only extension of replicating permanent deletes, i.e. deletes specifying version id are replicated to target cluster.	4 years ago
Harshavardhana	b72cac4cf3	fix: dangling objects on actual namespace (#10822 )	4 years ago
Klaus Post	2294e53a0b	Don't retain context in locker (#10515 ) Use the context for internal timeouts, but disconnect it from outgoing calls so we always receive the results and cancel it remotely.	4 years ago
Harshavardhana	8527f22df1	optimize request URL encoding for internode (#10811 ) this reduces allocations by an order of magnitude. Also, revert "erasure: delete dangling objects automatically (#10765)", which affects list caching and should be investigated.	4 years ago
Anis Elleuch	b456292295	erasure: delete dangling objects automatically (#10765 )	4 years ago
Klaus Post	a982baff27	ListObjects Metadata Caching (#10648 ) Design: https://gist.github.com/klauspost/025c09b48ed4a1293c917cecfabdf21c Gist of improvements: * Cross-server caching and listing will use the same data across servers and requests. * Lists can be arbitrarily resumed at a constant speed. * Metadata for all files scanned is stored for streaming retrieval. * The existing bloom filters controlled by the crawler is used for validating caches. * Concurrent requests for the same data (or parts of it) will not spawn additional walkers. * Listing a subdirectory of an existing recursive cache will use the cache. * All listing operations are fully streamable so the number of objects in a bucket no longer dictates the amount of memory. * Listings can be handled by any server within the cluster. * Caches are cleaned up when out of date or superseded by a more recent one.	4 years ago
Harshavardhana	5b30bbda92	fix: add more protection distribution to match EcIndex (#10772 ) allows for stricter validation in picking up the right set of disks for reconstruction.	4 years ago
Krishna Srinivas	c49a80db41	fix: use meta.Erasure.Index for GetObject() to reconstruct object (#10764 )	4 years ago
Anis Elleuch	eb95353cb1	fix: Get/HeadObject return 404 on non quorum objects (#10753 )	4 years ago
Harshavardhana	253194e491	do not hold write locks - if objects don't exist (#10644 )	4 years ago
Harshavardhana	736e58dd68	fix: handle concurrent lockers with multiple optimizations (#10640 ) - select lockers which are non-local and online to have affinity towards remote servers for lock contention - optimize lock retry interval to avoid sending too many messages during lock contention, reduces average CPU usage as well - if bucket is not set, when deleteObject fails make sure setPutObjHeaders() honors lifecycle only if bucket name is set. - fix top locks to always list the oldest lockers, avoid getting bogged down by map's unordered nature.	4 years ago
Harshavardhana	18063bf25c	fix: cleanup old directory handling code (#10633 ) we don't need them anymore, remove legacy code.	4 years ago
Harshavardhana	6fcbdd5607	remove unused putObjectDir code (#10528 )	4 years ago
Harshavardhana	02c1a08a5b	fix: make sure to lock CopyObject for in-place updates (#10492 )	4 years ago
Harshavardhana	0104af6bcc	delayed locks until we have started reading the body (#10474 ) This is to ensure that Go contexts work properly. After some interesting experiments I found that Go net/http doesn't cancel the context when Body is non-zero and hasn't been read till EOF. The following gist explains this; it can lead to a pile-up of goroutines on the server which will never be canceled and will only die at a much later point in time, which can simply overwhelm the server. https://gist.github.com/harshavardhana/c51dcfd055780eaeb71db54f9c589150 To avoid this, refactor the locking such that we take locks after we have started reading from the body and only take locks when needed. Also, remove contextReader as it's not useful: it doesn't work as expected, since the context is not canceled until the body reaches EOF, so there is no point in wrapping it with context and putting a `select {` on it, which can unnecessarily increase the CPU overhead. We will still use the context to cancel the lockers etc. Additional simplification in the locker code to avoid timers, as re-using them is a complicated ordeal; avoid them in the hot path, since locking is very common and this may avoid lots of allocations.	4 years ago
Harshavardhana	48919de301	fix: for defer'ed deleteObject use internal context (#10463 )	4 years ago
Harshavardhana	c13afd56e8	Remove MaxConnsPerHost settings to avoid potential hangs (#10438 ) MaxConnsPerHost can potentially hang a call without any way to timeout, we do not need this setting for our proxy and gateway implementations instead IdleConn settings are good enough. Also ensure to use NewRequestWithContext and make sure to take the disks offline only for network errors. Fixes #10304	4 years ago
Klaus Post	2d58a8d861	Add storage layer contexts (#10321 ) Add context to all (non-trivial) calls to the storage layer. Contexts are propagated through the REST client. - `context.TODO()` is left in place for the places where it needs to be added to the caller. - `endWalkCh` could probably be removed from the walkers, but no changes so far. The "dangerous" part is that now a caller disconnecting will propagate down, so a "delete" operation will now be interrupted. In some cases we might want to disconnect this functionality so the operation completes if it has started, leaving the system in a cleaner state.	4 years ago
Harshavardhana	37da0c647e	fix: delete marker compatibility behavior for suspended bucket (#10395 ) - delete-marker should be created on a suspended bucket as `null` - delete-marker should delete any pre-existing `null` versioned object and create an entry `null`	4 years ago
poornas	79e21601b0	fix: web handlers to enforce replication (#10249 ) This PR also preserves source ETag for replication	4 years ago
Harshavardhana	5ce82b45da	add CopyObject optimization when source and destination are same (#10170 ) when source and destination are same and versioning is enabled on the destination bucket - we do not need to re-create the entire object once again to optimize on space utilization. Cases this PR is not supporting - any pre-existing legacy object will not be preserved in this manner, meaning a new dataDir will be created. - key-rotation and storage class changes of course will never re-use the dataDir	4 years ago
Harshavardhana	b68bc75dad	fix: quorum calculation mistake with reduced parity (#10186 ) With reduced parity our write quorum should be same as read quorum, but code was still assuming `readQuorum+1` in all situations, which is not necessary.	4 years ago
Harshavardhana	35212b673e	add unformatted disk as part of the error list (#10128 ) these errors should be ignored for quorum error calculation to ensure that we don't prematurely return unformatted disk error as part of API calls	4 years ago
poornas	c43da3005a	Add support for server side bucket replication (#9882 )	4 years ago
Harshavardhana	14b1c9f8e4	fix: return Range errors after If-Matches (#10045 ) closes #7292	4 years ago
Anis Elleuch	778e9c864f	Move dependency from minio-go v6 to v7 (#10042 )	4 years ago
Harshavardhana	2743d4ca87	fix: Add support for preserving mtime for replication (#9995 ) This PR is needed for bucket replication support	4 years ago
Harshavardhana	810a4f0723	fix: return proper errors Get/HeadObject for deleteMarkers (#9957 )	4 years ago
Harshavardhana	a38ce29137	fix: simplify background heal and trigger heal items early (#9928 ) Bonus fix: during the versioning merge, one of the PRs was missing the offline/online disk count fix from #9801; port it correctly over to the master branch from release. Additionally, add versionID support for MRF. Fixes #9910 Fixes #9931	4 years ago
Harshavardhana	e79874f58e	[feat] Preserve version supplied by client (#9854 ) Just like GET/DELETE APIs it is possible to preserve client supplied versionId's, of course the versionIds have to be uuid, if an existing versionId is found it is overwritten if no object locking policies are found. - PUT /bucketname/objectname?versionId=<id> - POST /bucketname/objectname?uploads=&versionId=<id> - PUT /bucketname/objectname?versionId=<id> (with x-amz-copy-source)	4 years ago
Harshavardhana	4ac31ea82b	fix: find current location of object multi-zones (#9840 ) PutObject on multiple-zone with versioning would not overwrite the correct location of the object if the object has a delete marker, leading to duplicate objects on two zones. This PR fixes this by adding affinity towards the delete marker: when GetObjectInfo() returns an error, use the zone index which has the delete marker.	4 years ago
Anis Elleuch	2073b79633	fix: Remove unnecessary debug log line (#9834 )	5 years ago
Anis Elleuch	63e9005f01	fix: Avoid updating object tags on failed disks (#9819 )	5 years ago
Harshavardhana	4915433bd2	Support bucket versioning (#9377 ) - Implement a new xl.json 2.0.0 format to support this; it moves the entire marshaling logic to POSIX layer, top layer always consumes a common FileInfo construct which simplifies the metadata reads. - Implement list object versions - Migrate to siphash from crchash for new deployments for object placements. Fixes #2111	5 years ago
Harshavardhana	4790868878	allow background IAM load to speed up startup (#9796 ) Also fix healthcheck handler to run success only if object layer has initialized fully for S3 API access call.	5 years ago
Harshavardhana	41688a936b	fix: CopyObject behavior on expanded zones (#9729 ) CopyObject was not correctly figuring out the correct destination object location and would end up creating duplicate objects on two different zones, reproduced by doing encryption based key rotation.	5 years ago
Harshavardhana	3da1869d5e	Avoid double reads on metadata during GetObject() (#9719 ) Overall TTFB can see a dramatic improvement with this change - did not do any benchmark as such but the change itself is self-explanatory	5 years ago
Klaus Post	4a007e3767	Prefer local disks when fetching data blocks (#9563 ) If the requested server is part of the set this will always read from the local disk, even if the disk contains a parity shard. In default setup there is a 50% chance that at least one shard that otherwise would have been fetched remotely will be read locally instead. It basically trades RPC call overhead for reed-solomon. On distributed localhost this seems to be fairly break-even, with a very small gain in throughput and latency. However on networked servers this should be a bigger gain. 1MB objects, before: ``` Operation: GET. Concurrency: 32. Hosts: 4. Requests considered: 76257: * Avg: 25ms 50%: 24ms 90%: 32ms 99%: 42ms Fastest: 7ms Slowest: 67ms * First Byte: Average: 23ms, Median: 22ms, Best: 5ms, Worst: 65ms Throughput: * Average: 1213.68 MiB/s, 1272.63 obj/s (59.948s, starting 14:45:44 CEST) ``` After: ``` Operation: GET. Concurrency: 32. Hosts: 4. Requests considered: 78845: * Avg: 24ms 50%: 24ms 90%: 31ms 99%: 39ms Fastest: 8ms Slowest: 62ms * First Byte: Average: 22ms, Median: 21ms, Best: 6ms, Worst: 57ms Throughput: * Average: 1255.11 MiB/s, 1316.08 obj/s (59.938s, starting 14:43:58 CEST) ``` Bonus fix: Only ask for heal once on an object.	5 years ago
P R	3f6d624c7b	add gateway object tagging support (#9124 )	5 years ago
Bala FA	3773874cd3	add bucket tagging support (#9389 ) This patch also simplifies object tagging support	5 years ago
Klaus Post	073aac3d92	add data update tracking using bloom filter (#9208 ) By monitoring PUT/DELETE and heal operations it is possible to track changed paths and keep a bloom filter for this data. This can help prioritize paths to scan. The bloom filter can identify paths that have not changed, and the few collisions will only result in a marginal extra workload. This can be implemented on either a bucket+(1 prefix level) with reasonable performance. The bloom filter is set to have a false positive rate at 1% at 1M entries. A bloom table of this size is about ~2500 bytes when serialized. To not force a full scan of all paths that have changed cycle bloom filters would need to be kept, so we guarantee that dirty paths have been scanned within cycle runs. Until cycle bloom filters have been collected all paths are considered dirty.	5 years ago
Harshavardhana	60d415bb8a	deprecate/remove global WORM mode (#9436 ) global WORM mode is a complex piece for which the time has passed, with the advent of S3 compatible object locking and retention implementation global WORM is sort of deprecated, this has been mentioned in our documentation for some time, now the time has come for this to go.	5 years ago
Harshavardhana	282c9f790a	fix: validate partNumber in queryParam as part of preConditions (#9386 )	5 years ago
Bala FA	95e89f1712	proactive deep heal object when a bitrot is detected (#9192 )	5 years ago
Harshavardhana	30707659b5	[feature] allow for an odd number of erasure packs (#9221 ) Too many deployments come up with an odd number of hosts or drives, to facilitate even distribution among those setups allow for odd and prime numbers based packs.	5 years ago

39 Commits (0fa430c1da76699894ec192952b0dac2dd16052f)