minio

Commit Graph

Author	SHA1	Message	Date
Harshavardhana	e0055609bb	fix: crawler to skip healing the drives in a set being healed (#11274 ) If an erasure set had a drive replacement recently, we don't need to attempt healing on another drive within the same erasure set - this ensures we do not double heal the same content and also prioritizes usage for such an erasure set to be calculated sooner.	4 years ago
Harshavardhana	3ca6330661	fix: optimize parentDirIsObject by moving isObject to storage layer (#11291 ) For objects with `N` prefix depth, this PR reduces the `N` network operations this requires by converting `CheckFile` into a single bulk operation. Reduction in chattiness here would allow disks to be utilized more cleanly, while maintaining the same functionality; one extra volume check stat() call is also removed. Update tests to test multiple sets scenario	4 years ago
Anis Elleuch	c9d502e6fa	parentDirIsObject() to return quickly with nonexistent parent (#11204 ) Rewrite parentDirIsObject() function. Currently if a client uploads a/b/c/d, we always check if c, b, a are actual objects or not. The new code will check in the reverse order and quickly quit if the segment doesn't exist. So if a, b, c in 'a/b/c' do not exist in the first place, then it returns false quickly.	4 years ago
Harshavardhana	c19e6ce773	avoid a crash in crawler when lifecycle is not initialized (#11170 ) Bonus: for static buffers, use bytes.NewReader instead of bytes.NewBuffer, for a more reader-friendly implementation	4 years ago
Klaus Post	a982baff27	ListObjects Metadata Caching (#10648 ) Design: https://gist.github.com/klauspost/025c09b48ed4a1293c917cecfabdf21c Gist of improvements: * Cross-server caching and listing will use the same data across servers and requests. * Lists can be arbitrarily resumed at a constant speed. * Metadata for all files scanned is stored for streaming retrieval. * The existing bloom filters controlled by the crawler are used for validating caches. * Concurrent requests for the same data (or parts of it) will not spawn additional walkers. * Listing a subdirectory of an existing recursive cache will use the cache. * All listing operations are fully streamable so the number of objects in a bucket no longer dictates the amount of memory. * Listings can be handled by any server within the cluster. * Caches are cleaned up when out of date or superseded by a more recent one.	4 years ago
Harshavardhana	f1cc16e788	fix: background heal rely on getOnlineDisks() (#10687 )	4 years ago
Klaus Post	3820a905e0	in getOnlineDisks wait for disks to be populated (#10685 )	4 years ago
Harshavardhana	f9be783f3e	fix: allow crawler to crawl on disks without usage constraints (#10677 ) additionally, change the resolution of the usage-wise return of disks, which allows small byte-level differences to be masked.	4 years ago
Harshavardhana	6484453fc6	optionally allow strict quorum listing (#10649 ) ``` export MINIO_API_LIST_STRICT_QUORUM=on ``` would enable listing in quorum if necessary	4 years ago
Harshavardhana	2b4eb87d77	pick disks which are common maximally used (#10600 ) further optimization to ensure that good disks are always used for listing, other than healing we only use disks that are maximally used.	4 years ago
Harshavardhana	00eb6f6bc9	cache DiskInfo at storage layer for performance (#10586 ) `mc admin info` on busy setups will not move HDD heads unnecessarily for repeated calls, providing better responsiveness for the call overall. Bonus change: allow listTolerancePerSet to be N-1 for good entries, to avoid skipping entries when one of the disks went offline for some reason.	4 years ago
Harshavardhana	66174692a2	add '.healing.bin' for tracking currently healing disk (#10573 ) add a hint on the disk to allow for tracking fresh disk being healed, to allow for restartable heals, and also use this as a way to track and remove disks. There are more pending changes where we should move all the disk formatting logic to backend drives, this PR doesn't deal with this refactor instead makes it easier to track healing in the future.	4 years ago
Anis Elleuch	ce6cef6855	erasure: Call Walk() from all disks (#10445 ) It does not make sense to call Walk() on only N/2 disks and then require N/2 quorum; just keep it N/2+1. This commit fixes this behavior.	4 years ago
Harshavardhana	eb19c8af40	Bump response header timeout for proxying list request (#10420 )	4 years ago
Klaus Post	2d58a8d861	Add storage layer contexts (#10321 ) Add context to all (non-trivial) calls to the storage layer. Contexts are propagated through the REST client. - `context.TODO()` is left in place for the places where it needs to be added to the caller. - `endWalkCh` could probably be removed from the walkers, but no changes so far. The "dangerous" part is that now a caller disconnecting will propagate down, so a "delete" operation will now be interrupted. In some cases we might want to disconnect this functionality so the operation completes if it has started, leaving the system in a cleaner state.	4 years ago
Harshavardhana	a359e36e35	tolerate listing with only readQuorum disks (#10357 ) We can reduce this further in the future, but this is a good value to keep around. With the advent of continuous healing, we can be assured that namespace will eventually be consistent so we are okay to avoid the need to list across all drives on all sets. Bonus: Pop()'s in parallel seem to have the potential to wait too long on large drive setups and cause more slowness instead of gaining any performance, so remove it for now. Also, implement load balanced reply for local disks, ensuring that local disks have an affinity for - cleanupStaleMultipartUploads()	4 years ago
Harshavardhana	4915433bd2	Support bucket versioning (#9377 ) - Implement a new xl.json 2.0.0 format to support, this moves the entire marshaling logic to POSIX layer, top layer always consumes a common FileInfo construct which simplifies the metadata reads. - Implement list object versions - Migrate to siphash from crchash for new deployments for object placements. Fixes #2111	5 years ago
Harshavardhana	f44cfb2863	use GlobalContext whenever possible (#9280 ) This change is throughout the codebase to ensure that all codepaths honor GlobalContext	5 years ago
Harshavardhana	30707659b5	[feature] allow for an odd number of erasure packs (#9221 ) Too many deployments come up with an odd number of hosts or drives, to facilitate even distribution among those setups allow for odd and prime numbers based packs.	5 years ago
Krishna Srinivas	ef6304c5c2	Improve connectDisks() performance (#9203 )	5 years ago
Harshavardhana	ee4a6a823d	Migrate config to KV data format (#8392 ) - adding oauth support to MinIO browser (#8400) by @kanagaraj - supports multi-line get/set/del for all config fields - add support for comments, allow toggle - add extensive validation of config before saving - support MinIO browser to support proper claims, using STS tokens - env support for all config parameters, legacy envs are also supported with all documentation now pointing to latest ENVs - preserve accessKey/secretKey from FS mode setups - add history support implements three APIs - ClearHistory - RestoreHistory - ListHistory - add help command support for each config parameters - all the bug fixes after migration to KV, and other bug fixes encountered during testing.	5 years ago
Anis Elleuch	7bf093c06a	xl: Fix isObject() to consider not found disks (#8411 ) xl.isObject() returns 'nil' for not found disks when calculating the existence of xl.json for a given object, which is what StatFile() also does (setting nil) if xl.json exists. This commit avoids this confusion by setting errDiskNotFound error when the storage disk is not found.	5 years ago
Harshavardhana	68a519a468	Use errgroups instead of sync.WaitGroup as needed (#8354 )	5 years ago
Harshavardhana	e6d8e272ce	Use const slashSeparator instead of "/" everywhere (#8028 )	5 years ago
Harshavardhana	64998fc4ab	Remove delayIsLeaf requirement simplify ListObjects further (#7593 )	6 years ago
Harshavardhana	f767a2538a	Optimize listing with leaf check offloaded to posix (#7541 ) Other listing optimizations include - remove double sorting while filtering object entries - improve error message when upload-id is not in quorum - use jsoniter for full unmarshal json, instead of gjson - remove unused code	6 years ago
kannappanr	5ecac91a55	Replace Minio refs in docs with MinIO and links (#7494 )	6 years ago
Harshavardhana	b6c00405ec	Do not pro-actively return false in isObjectDir() (#7246 ) We should change the logic for both isObject() and isObjectDir() leaf detection to be done with quorum, due to how our directory navigation works - this allows for properly deleting all the dangling directories or objects if any.	6 years ago
Harshavardhana	30135eed86	Redo how to handle stale dangling files (#7171 ) foo.CORRUPTED should never be created because when multiple sets are involved we would hash the file to a wrong location, this PR removes the code. But allows DeleteBucket() to work properly to delete dangling buckets/objects. Also adds another option to Healing where a user needs to specify `--remove` such that all dangling objects will be deleted with user confirmation.	6 years ago
kannappanr	c7946ab9ab	Remove unnecessary error log messages (#6186 )	6 years ago
Krishna Srinivas	ce02ab613d	Simplify erasure code by separating bitrot from erasure code (#5959 )	6 years ago
Harshavardhana	ad86454580	Make sure to handle FaultyDisks in listing ops (#6204 ) Continuing from PR `157ed65c35` Our posix.go implementation did not handle I/O errors properly on the disks, this led to situations where top-level callers such as ListObjects might return early without even verifying all the available disks. This commit tries to address this in Kubernetes, drbd/nbd based persistent volumes which can disconnect under load and result in situations where disks return I/O errors. This commit also simplifies listing operation, listing never returns any error. We can avoid this since we pretty much ignore most of the errors anyways. When objects are accessed directly we return proper errors.	6 years ago
Anis Elleuch	6d5f2a4391	Better support of empty directories (#5890 ) Better support of HEAD and listing of zero sized objects with trailing slash (a.k.a. empty directory). For that, isLeafDir function is added to indicate if the specified object is an empty directory or not. Each backend (xl, fs) has the responsibility to store that information. Currently, in both XL & FS, an empty directory is represented by an empty directory in the backend. isLeafDir() checks if the given path is an empty directory or not. Since dir listing is costly if the latter contains too many objects, readDirN() is added in this PR to list only N number of entries. In isLeafDir(), we will only list one entry to check if a directory is empty or not.	7 years ago
kannappanr	cef992a395	Remove error package and cause functions (#5784 )	7 years ago
kannappanr	f8a3fd0c2a	Create logger package and rename errorIf to LogIf (#5678 ) Removing message from error logging Replace errors.Trace with LogIf	7 years ago
Aditya Manthramurthy	ea8973b7d7	Return bit-rot verified data instead of re-reading from disk (#5568 ) - Data from disk was being read after bitrot verification to return data for GetObject. Strictly speaking this does not guarantee bitrot protection, as disks may return bad data even temporarily. - This fix reads data from disk, verifies data for bitrot and then returns data to the client directly.	7 years ago
Harshavardhana	fb96779a8a	Add large bucket support for erasure coded backend (#5160 ) This PR implements an object layer which combines input erasure sets of XL layers into a unified namespace. This object layer extends the existing erasure coded implementation, it is assumed in this design that providing > 16 disks is a static configuration as well, i.e. if you started the setup with 32 disks as 4 sets of 8 disks per pack, then you would always need to provide 4 sets. Some design details and restrictions: - Objects are distributed using consistent ordering to a unique erasure coded layer. - Each pack has its own dsync so locks are synchronized properly at pack (erasure layer). - Each pack still has a maximum of 16 disks requirement, you can start with multiple such sets statically. - Sets are static sets of disks and cannot be changed, there is no elastic expansion allowed. - Sets are static sets of disks and cannot be changed, there is no elastic removal allowed. - ListObjects() across sets can be noticeably slower since List happens on all servers, and is merged at this sets layer. Fixes #5465 Fixes #5464 Fixes #5461 Fixes #5460 Fixes #5459 Fixes #5458 Fixes #5460 Fixes #5488 Fixes #5489 Fixes #5497 Fixes #5496	7 years ago
Harshavardhana	8efa82126b	Convert errors tracer into a separate package (#5221 )	7 years ago
Harshavardhana	d3eb5815d9	Avoid DDOS in PutObject() when objectName is '/' and size '0' (#4962 ) It can happen that an incoming PutObject() request might have inputs of the following form, e.g. - bucketName is 'testbucket' - objectName is '/' bucketName exists and was previously created but there are no other objects in this bucket. In a situation like this parentDirIsObject() goes into an infinite loop. Verifying whether '/' is an object fails on both backends, but the resulting `path.Dir('/')` returns `'/'`, which causes the closure to loop onto itself. Fixes #4940	7 years ago
Bala FA	1c97dcb10a	Add UTCNow() function. (#3931 ) This patch adds UTCNow() function which returns current UTC time. This is equivalent to UTCNow() == time.Now().UTC()	8 years ago
Bala FA	0f2e493c9a	Use isErrIgnored() function wherever applicable. (#3343 )	8 years ago
Anis Elleuch	a47ce7ab22	Add support of fallocate for FS and XL backends (#3032 )	8 years ago
Anis Elleuch	7a549096de	XL and FS use different tree walk ignored errors (#2707 )	8 years ago
Harshavardhana	bccf549463	server: Move all the top level files into cmd folder. (#2490 ) This brings over a change which was done for the 'mc' package, to allow for a clean repo and a cleaner GitHub drop-in experience.	8 years ago
Harshavardhana	a0635dcdd9	XL: Do not rely on getLoadBalancedQuorumDisks for NS consistency. (#2243 ) The reason is any function relying on `getLoadBalancedQuorumDisks` cannot possibly have an idempotent behavior. The problem comes from, given a set of N disks, returning just a shuffled N/2 disks. In a scenario where we have N/2 failed disks, the returned value of `getLoadBalancedQuorumDisks` is not equal to the same failed disks, so essentially calls using such disks might succeed or fail randomly at different intervals in time. The proposed change is to move to `getLoadBalancedDisks()` and use the shuffled N disks as a whole. Most of the time we might hit a good disk, since we are not reducing our solution space. This also provides consistent behavior for all the functions which rely on shuffled disks. Fixes #2242	8 years ago
Harshavardhana	cef26fd6ea	XL: Refactor usage of reduceErrs and consistent behavior. (#2240 ) This refactor is also needed in light of our quorum requirement change for the newly understood logic behind klauspost/reedsolomon implementation.	8 years ago
frankw	63b3f1dcfd	Use new algorithm to get fixed random order of disks (#2147 )	8 years ago
Harshavardhana	ca1b1921c4	XL: Implement ignore errors. (#2136 ) Each metadata op has a list of errors which can be ignored, this is essentially needed when - disks are not found - disks are found but cannot be accessed (permission denied) - disks are there but fresh disks were added This is needed since we don't have healing code in place where it would have healed the fresh disks added. Fixes #2072	8 years ago
Bala FA	61598ed02f	posix: return errFaultyDisk on I/O errors. (#1885 ) When I/O errors occur more than the allowed limit, posix returns errFaultyDisk. Fixes #1884	9 years ago
Harshavardhana	51f3d4e0ca	XL/multipart: statPart should ignore errDiskNotFound. (#1862 ) statPart should also take uploadId and partName as arguments.	9 years ago

17 Commits (09bc49bd51ae5ce87cbabf8eb98385e0a5fc5601)
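
Commit a0635dcdd9 argues for shuffling all N disks rather than returning a shuffled N/2 subset. The sketch below illustrates that reasoning under stated assumptions: `loadBalancedDisks`, `firstOnline`, and the `online` callback are hypothetical names chosen for illustration, not MinIO's implementation.

```go
package main

import (
	"fmt"
	"math/rand"
)

// loadBalancedDisks returns all N disks in a random order. Because no disk
// is dropped, a caller iterating over the result can always reach any disk
// that is still healthy.
func loadBalancedDisks(disks []string) []string {
	shuffled := make([]string, len(disks))
	for i, j := range rand.Perm(len(disks)) {
		shuffled[i] = disks[j]
	}
	return shuffled
}

// firstOnline walks the shuffled disks and returns the first one accepted
// by the hypothetical online check.
func firstOnline(disks []string, online func(string) bool) (string, bool) {
	for _, d := range loadBalancedDisks(disks) {
		if online(d) {
			return d, true
		}
	}
	return "", false
}

func main() {
	disks := []string{"d1", "d2", "d3", "d4"}
	// Assume only d4 is healthy. A shuffled N/2 subset would miss it on
	// some calls; the full shuffled set never does.
	online := func(d string) bool { return d == "d4" }

	d, ok := firstOnline(disks, online)
	fmt.Println(d, ok) // d4 true
}
```

The order still varies between calls, so load is spread across disks, but the outcome no longer depends on which half of the set happened to be picked.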