minio

Commit Graph

Author	SHA1	Message	Date
Harshavardhana	1e53bf2789	fix: allow expansion with newer constraints for older setups (#11372 ) Previously, older setups had to follow the earlier expansion style, in which the "stripe" count had to stay the same. We can relax that: newer pools can be added to older setups under the newer constraint of a common parity ratio.	4 years ago
Harshavardhana	7624c8b9bb	fix: honor storage class uniformity for multiple pools (#11309 )	4 years ago
Harshavardhana	1ad2b7b699	fix: add stricter validation for erasure server pools (#11299 ) During expansion we need to validate if - new deployment is expanded with newer constraints - existing deployment is expanded with older constraints - multiple server pools rejected if they have different deploymentID and distribution algo	4 years ago
Harshavardhana	f903cae6ff	Support variable server pools (#11256 ) Current implementation requires server pools to have same erasure stripe sizes, to facilitate same SLA and expectations. This PR allows server pools to be variadic, i.e they do not have to be same erasure stripe sizes - instead they should have SLA for parity ratio. If the parity ratio cannot be guaranteed by the new server pool, the deployment is rejected i.e server pool expansion is not allowed.	4 years ago
Harshavardhana	790833f3b2	Revert "Support variable server sets (#10314 )" This reverts commit `aabf053d2f`.	4 years ago
Harshavardhana	aabf053d2f	Support variable server sets (#10314 )	4 years ago
Harshavardhana	df93102235	fix: unwrapping issues with os.Is* functions (#10949 ) reduces 3 stat calls, reducing the overall startup time significantly.	4 years ago
Harshavardhana	d1b1fee080	fix: save healing tracker right before healing (#10915 ) This change avoids a situation where the user accidentally deleted the healing tracker, or drives were replaced again, within the 10-second window.	4 years ago
Klaus Post	86e0d272f3	Reduce WriteAll allocs (#10810 ) WriteAll saw 127GB allocs in a 5 minute timeframe for 4MiB buffers used by `io.CopyBuffer` even if they are pooled. Since all writers appear to write byte buffers, just send those instead and write directly. The files are opened through the `os` package so they have no special properties anyway. This removes the alloc and copy for each operation. REST sends content length so a precise alloc can be made.	4 years ago
Harshavardhana	b686bb9c83	fix: replaced drive properly by healing the entire drive (#10799 ) Bonus fixes, we do not need reload format anymore as the replaced drive is healed locally we only need to ensure that drive heal reloads the drive properly. We preserve the UUID of the original order, this means that the replacement in `format.json` doesn't mean that the drive needs to be reloaded into memory anymore. fixes #10791	4 years ago
Klaus Post	a982baff27	ListObjects Metadata Caching (#10648 ) Design: https://gist.github.com/klauspost/025c09b48ed4a1293c917cecfabdf21c Gist of improvements: * Cross-server caching and listing will use the same data across servers and requests. * Lists can be arbitrarily resumed at a constant speed. * Metadata for all files scanned is stored for streaming retrieval. * The existing bloom filters controlled by the crawler are used for validating caches. * Concurrent requests for the same data (or parts of it) will not spawn additional walkers. * Listing a subdirectory of an existing recursive cache will use the cache. * All listing operations are fully streamable so the number of objects in a bucket no longer dictates the amount of memory. * Listings can be handled by any server within the cluster. * Caches are cleaned up when out of date or superseded by a more recent one.	4 years ago
Harshavardhana	029758cb20	fix: retain the previous UUID for newly replaced drives (#10759 ) Only newly replaced drives get the new `format.json`. This ensures that drives come online without reloading the in-memory reference format. Keeping the reference format intact means UUIDs never change once drives are formatted.	4 years ago
Harshavardhana	6a8c62f9fd	make sure to preserve UUID from reference format (#10748 ) reference format should be source of truth for inconsistent drives which reconnect, add them back to their original position remove automatic fix for existing offline disk uuids	4 years ago
Harshavardhana	66174692a2	add '.healing.bin' for tracking currently healing disk (#10573 ) add a hint on the disk to allow for tracking fresh disk being healed, to allow for restartable heals, and also use this as a way to track and remove disks. There are more pending changes where we should move all the disk formatting logic to backend drives, this PR doesn't deal with this refactor instead makes it easier to track healing in the future.	4 years ago
Klaus Post	2d58a8d861	Add storage layer contexts (#10321 ) Add context to all (non-trivial) calls to the storage layer. Contexts are propagated through the REST client. - `context.TODO()` is left in place for the places where it needs to be added to the caller. - `endWalkCh` could probably be removed from the walkers, but no changes so far. The "dangerous" part is that now a caller disconnecting will propagate down, so a "delete" operation will now be interrupted. In some cases we might want to disconnect this functionality so the operation completes if it has started, leaving the system in a cleaner state.	4 years ago
Harshavardhana	a359e36e35	tolerate listing with only readQuorum disks (#10357 ) We can reduce this further in the future, but this is a good value to keep around. With the advent of continuous healing, we can be assured that the namespace will eventually be consistent, so we can avoid the need to list across all drives on all sets. Bonus: Pop()'s in parallel seem to have the potential to wait too long on large drive setups and cause more slowness instead of gaining any performance, so remove them for now. Also, implement load balanced reply for local disks, ensuring that local disks have an affinity for - cleanupStaleMultipartUploads()	4 years ago
Harshavardhana	74116204ce	handle fresh setup with mixed drives (#10273 ) In fresh drive setups where one of the drives is a root drive, we should ignore that root drive and not proceed to format it. This PR handles this properly by marking the disks that are root disks and taking them offline.	4 years ago
Harshavardhana	b16781846e	allow server to start even with corrupted/faulty disks (#10175 )	4 years ago
Harshavardhana	187c3f62df	fix: heal replaced drives properly (#10069 ) Healing was not working properly when drives were replaced, due to the error check in root disk calculation; this PR fixes this behavior. This PR also adds an additional fix for missing metadata entries from .minio.sys as part of disk healing. Added code to ignore and print more context sensitive errors for better debugging. This PR is a continuation of the fix in `7b14e9b660`	4 years ago
Harshavardhana	4915433bd2	Support bucket versioning (#9377 ) - Implement a new xl.json 2.0.0 format to support, this moves the entire marshaling logic to POSIX layer, top layer always consumes a common FileInfo construct which simplifies the metadata reads. - Implement list object versions - Migrate to siphash from crchash for new deployments for object placements. Fixes #2111	5 years ago
Harshavardhana	d0ae69087c	fix: add proper errors for disks with preexisting content (#9703 )	5 years ago
Anis Elleuch	9baeda781a	fix storage info output with unordered endpoints arguments (#9610 ) Shuffling arguments that we pass to MinIO server are supported. However, when that happens, Prometheus returns wrong information about disks usage and online/offline status. The commit fixes the issue by avoiding relying on xl.endpoints since it is not ordered.	5 years ago
Harshavardhana	1bc32215b9	enable full linter across the codebase (#9620 ) Enable linting with golangci-lint across the codebase to run a bunch of linters together; we shall enable new linters as we fix more things in the codebase. This PR fixes the first stage of this cleanup.	5 years ago
Harshavardhana	6ac48a65cb	fix: use unused cacheMetrics code in prometheus (#9588 ) remove all other unused/dead code	5 years ago
Harshavardhana	bc61417284	calculate automatic node based symmetry (#9446 ) It is possible in many scenarios that even if the divisible value is optimal, we may end up with uneven distribution due to the number of nodes present in the configuration. Added code to allow for affinity towards various ellipses to figure out the optimal value across ellipses, such that we can always reach a symmetric value automatically. Fixes #9416	5 years ago
Harshavardhana	f44cfb2863	use GlobalContext whenever possible (#9280 ) This change is throughout the codebase to ensure that all codepaths honor GlobalContext	5 years ago
Harshavardhana	91f21ddc47	fix: ignore lost+found properly while reading disks (#9278 ) Fixes #9277	5 years ago
Harshavardhana	30707659b5	[feature] allow for an odd number of erasure packs (#9221 ) Too many deployments come up with an odd number of hosts or drives, to facilitate even distribution among those setups allow for odd and prime numbers based packs.	5 years ago
Harshavardhana	6f992134a2	fix: startup load time by reusing storageDisks (#9210 )	5 years ago
Klaus Post	8d98662633	re-implement data usage crawler to be more efficient (#9075 ) Implementation overview: https://gist.github.com/klauspost/1801c858d5e0df391114436fdad6987b	5 years ago
Harshavardhana	6a00eb10bf	fix: allow set drive count of proper divisible values (#9101 ) Currently the code assumed some orthogonal requirements, which led to situations where, for a setup with for example 168 drives, the final set_drive_count chosen was 14. Indeed 168 drives are divisible by 12, but this wasn't allowed due to an unexpected requirement for 12 to be a perfect modulo of 14, which is not possible. This assumption was incorrect. This PR fixes this old assumption properly, and also adds a few tests and some negative tests. Error messages are improved as well.	5 years ago
Anis Elleuch	6d5d77f62c	usage typo: Fix creating .minio.sys/background-ops bucket (#8957 ) Due to a typo in the code, a cluster was not correctly creating `background-ops` in all disks and nodes print the following error: minio3_1 \| API: SYSTEM() minio3_1 \| Time: 19:32:45 UTC 02/06/2020 minio3_1 \| DeploymentID: d67c20fa-4a1e-41f5-b319-7e3e90f425d8 minio3_1 \| Error: Bucket not found: .minio.sys/background-ops minio3_1 \| 2: cmd/data-usage.go:109:cmd.runDataUsageInfo() minio3_1 \| 1: cmd/data-usage.go:56:cmd.runDataUsageInfoUpdateRoutine() This commit fixes the typo.	5 years ago
Harshavardhana	64fde1ab95	xl/zones: return errNoHealRequired when no heal is required (#8821 ) Zone abstraction of object layer was returning `nil` incorrectly under situations where disk healing is not required. Returning `nil` is considered as healing successful, which leads to unexpected ReloadFormat() peer notification calls during startup. This PR fixes this behavior properly for zones.	5 years ago
Anis Elleuch	069876e262	xl: All nodes create meta volumes in its local disks (#8786 ) Meta volume directories (tmp/, background-ops/, etc.) under .minio.sys are created when disks are formatted but also when the cluster is started. However, using MakeVolBulk() is not appropriate in the case of a user migrating from a version which does not have .minio.sys/background-ops/. The reason is that MakeVolBulk() exits early when an error occurs: errVolumeExists in this case, which is expected since some directories such as tmp/ already exist. This commit avoids MakeVolBulk and uses MakeVol instead. The PR also makes each node create meta volumes in its local disks and stops relying on the first disk, since the first node could be offline.	5 years ago
Klaus Post	37b32199e3	Validate XL sets on format (#8779 ) When formatting a set validate if a host failure will likely lead to data loss. While we don't know what config will be set in the future evaluate to our best knowledge, assuming default settings.	5 years ago
Harshavardhana	f68a7005c0	Improve disk formatting stage for large disk sets (#8690 )	5 years ago
Anis Elleuch	555969ee42	Add data usage collect with its new admin API (#8553 ) Admin data usage info API returns the following (Only FS & XL, for now) - Number of buckets - Number of objects - The total size of objects - Objects histogram - Bucket sizes	5 years ago
Harshavardhana	5d3d57c12a	Start using error wrapping with fmt.Errorf (#8588 ) Use fatih/errwrap to fix all the code to use error wrapping with fmt.Errorf()	5 years ago
Harshavardhana	4e9de58675	Avoid pointer based copy, instead use Clone() (#8547 ) This PR adds functional test to test expanded cluster syntax.	5 years ago
Harshavardhana	8392d2f510	Preserve same deploymentID on all zones (#8542 )	5 years ago
Harshavardhana	347b29d059	Implement bucket expansion (#8509 )	5 years ago
Harshavardhana	e9b2bf00ad	Support MinIO to be deployed on more than 32 nodes (#8492 ) This PR implements locking from a global entity into a more localized set level entity, allowing for locks to be held only on the resources which are writing to a collection of disks rather than a global level. In this process this PR also removes the top-level limit of 32 nodes to an unlimited number of nodes. This is a precursor change before bringing in bucket expansion.	5 years ago
Harshavardhana	68a519a468	Use errgroups instead of sync.WaitGroup as needed (#8354 )	5 years ago
Harshavardhana	127641731a	Parallelize initialization of storageDisks (#8288 )	5 years ago
Harshavardhana	c8fbc94329	Fix writing 'format.json' and make it atomic (#8296 ) - Choose a unique uuid such that under situations of duplicate mounts we do not append to an existing json entry. - Avoid AppendFile instead use WriteAll() to write the entire byte array atomically.	5 years ago
Harshavardhana	53e4887e02	Simplify and cleanup metadata r/w functions (#8146 )	5 years ago
Praveen raj Mani	b976521c83	Ignore faulty disks in xl-sets Storage info (#7878 )	5 years ago
Krishna Srinivas	338e9a9be9	Put object client disconnect (#7824 ) Fail putObject and postpolicy in case client prematurely disconnects Use request's context to cancel lock requests on client disconnects	5 years ago
kannappanr	5ecac91a55	Replace Minio refs in docs with MinIO and links (#7494 )	6 years ago
Krishnan Parthasarathi	93a9078b23	Assign deploymentID for first minio server in distributed setup (#7427 ) - Pass local endpoints to functions fixing formatXL during startup	6 years ago

20 Commits (8cad407e0b011e05e34f3aca8093dfdb4a2630dc)