minio

Commit Graph

Author	SHA1	Message	Date
Klaus Post	1c90a6bd49	S3 Select: Convert CSV data to JSON (#8464 )	5 years ago
Klaus Post	26e760ee62	Fix JSON Close data race. (#8486 ) The JSON stream library has no safe way of aborting. Since we cannot expect the caller to safely handle "Read" and "Close" calls we must handle this. Also any Read error returned from upstream will crash the server. We preserve the errors and instead always return io.EOF upstream, but send the error on Close. `readahead v1.3.1` handles Read after Close better. Updates to `progressReader` are mostly to ensure safety. Fixes #8481	5 years ago
Klaus Post	38e6d911ea	S3 Select: Detect full object (#8456 ) Check if select is `SELECT s.* from S3Object s` and forward it to All Fixes #8371 and makes this case run significantly faster.	5 years ago
Klaus Post	51456e6adc	Select: Support Square Bracket Lists (#8457 ) Allows for S3 compatible `SELECT * from s3object s WHERE id IN [3,2]` Fixes #8422	5 years ago
Harshavardhana	d48fd6fde9	Remove unused params and functions (#8399 )	5 years ago
Klaus Post	002ac82631	S3 Select: Add parser support for lists. (#8329 )	5 years ago
Klaus Post	c1a17c2561	S3 Select: Aggregate AVG/SUM as float (#8326 ) Force sum/average to be calculated as a float. As noted in #8221 > run SELECT AVG(CAST (Score as int)) FROM S3Object on ``` Name,Score alice,80 bob,81 ``` > AWS S3 gives 80.5 and MinIO gives 80. This also makes overflows much more unlikely.	5 years ago
Klaus Post	1c5b05c130	S3 select: Fix output conversion on select * (#8303 ) Fixes #8268	5 years ago
Klaus Post	be313f1758	S3 Select: Workaround java buffer size (#8312 ) Updates #7475. The Java implementation has a 128KB buffer and a message must be emitted before that is used. #7475 therefore limits the message size to 128KB. But up to 256 bytes are written to the buffer in each call. This means we must emit a message while the buffer is still shorter than 128KB. Therefore we change the limit to 128KB minus 256 bytes.	5 years ago
Klaus Post	520552ffa9	S3 select: flush when reaching limit (#8279 ) Add missing flush when reaching select limit.	5 years ago
Klaus Post	dac1cf5a9a	S3 Select: Parsing tweaks (#8261 ) * Don't output empty lines. * Trim whitespace from byte to int/float/bool conversions.	5 years ago
Klaus Post	c9b8bd8de2	S3 Select: optimize output (#8238 ) Queue output items and reuse them. Remove the unneeded type system in sql and just use the Go type system. In best case this is more than an order of magnitude speedup: ``` BenchmarkSelectAll_1M-12 1 1841049400 ns/op 274299728 B/op 4198522 allocs/op BenchmarkSelectAll_1M-12 14 84833400 ns/op 169228346 B/op 3146541 allocs/op ```	5 years ago
Klaus Post	017456df63	Wait clearing the close channel (#8250 ) Close channel should not be nilled before goroutines have exited. Fixes potential hang on closing.	5 years ago
Klaus Post	ddea0bdf11	Concurrent CSV parsing and reduce S3 select allocations (#8200 ) ``` CSV parsing, BEFORE: BenchmarkReaderBasic-12 2842 407533 ns/op 397860 B/op 957 allocs/op BenchmarkReaderReplace-12 2718 429914 ns/op 397844 B/op 957 allocs/op BenchmarkReaderReplaceTwo-12 2718 435556 ns/op 397855 B/op 957 allocs/op BenchmarkAggregateCount_100K-12 171 6798974 ns/op 16667102 B/op 308077 allocs/op BenchmarkAggregateCount_1M-12 19 65657411 ns/op 168057743 B/op 3146610 allocs/op BenchmarkSelectAll_10M-12 1 20882119900 ns/op 2758799896 B/op 41978762 allocs/op CSV parsing, AFTER: BenchmarkReaderBasic-12 3721 312549 ns/op 101920 B/op 338 allocs/op BenchmarkReaderReplace-12 3776 318810 ns/op 101993 B/op 340 allocs/op BenchmarkReaderReplaceTwo-12 3610 330967 ns/op 102012 B/op 341 allocs/op BenchmarkAggregateCount_100K-12 295 4149588 ns/op 3553623 B/op 103261 allocs/op BenchmarkAggregateCount_1M-12 30 37746503 ns/op 33827931 B/op 1049435 allocs/op BenchmarkSelectAll_10M-12 1 17608495800 ns/op 1416504040 B/op 21007082 allocs/op ~ benchcmp old.txt new.txt benchmark old ns/op new ns/op delta BenchmarkReaderBasic-12 407533 312549 -23.31% BenchmarkReaderReplace-12 429914 318810 -25.84% BenchmarkReaderReplaceTwo-12 435556 330967 -24.01% BenchmarkAggregateCount_100K-12 6798974 4149588 -38.97% BenchmarkAggregateCount_1M-12 65657411 37746503 -42.51% BenchmarkSelectAll_10M-12 20882119900 17608495800 -15.68% benchmark old allocs new allocs delta BenchmarkReaderBasic-12 957 338 -64.68% BenchmarkReaderReplace-12 957 340 -64.47% BenchmarkReaderReplaceTwo-12 957 341 -64.37% BenchmarkAggregateCount_100K-12 308077 103261 -66.48% BenchmarkAggregateCount_1M-12 3146610 1049435 -66.65% BenchmarkSelectAll_10M-12 41978762 21007082 -49.96% benchmark old bytes new bytes delta BenchmarkReaderBasic-12 397860 101920 -74.38% BenchmarkReaderReplace-12 397844 101993 -74.36% BenchmarkReaderReplaceTwo-12 397855 102012 -74.36% BenchmarkAggregateCount_100K-12 16667102 3553623 -78.68% 
BenchmarkAggregateCount_1M-12 168057743 33827931 -79.87% BenchmarkSelectAll_10M-12 2758799896 1416504040 -48.66% ``` ``` BenchmarkReaderHuge/97K-12 2200 540840 ns/op 184.32 MB/s 1604450 B/op 687 allocs/op BenchmarkReaderHuge/194K-12 1522 752257 ns/op 265.04 MB/s 2143135 B/op 1335 allocs/op BenchmarkReaderHuge/389K-12 1190 947858 ns/op 420.69 MB/s 3221831 B/op 2630 allocs/op BenchmarkReaderHuge/778K-12 806 1472486 ns/op 541.61 MB/s 5201856 B/op 5187 allocs/op BenchmarkReaderHuge/1557K-12 426 2575269 ns/op 619.36 MB/s 9101330 B/op 10233 allocs/op BenchmarkReaderHuge/3115K-12 286 4034656 ns/op 790.66 MB/s 12397968 B/op 16099 allocs/op BenchmarkReaderHuge/6230K-12 172 6830563 ns/op 934.05 MB/s 16008416 B/op 26844 allocs/op BenchmarkReaderHuge/12461K-12 100 11409467 ns/op 1118.39 MB/s 22655163 B/op 48107 allocs/op BenchmarkReaderHuge/24922K-12 66 19780395 ns/op 1290.19 MB/s 35158559 B/op 90216 allocs/op BenchmarkReaderHuge/49844K-12 34 37282559 ns/op 1369.03 MB/s 60528624 B/op 174497 allocs/op ```	5 years ago
Yao Zongyou	18fedc67d5	friendly prompt for s3select MalformedXML error (#8171 ) partly fix #7911	5 years ago
Yao Zongyou	ec9bfd3aef	speed up the performance of s3select on csv (#7945 )	5 years ago
Kanagaraj M	12353caf35	Fix: Support Unicode delimiters in s3 select (#7931 )	5 years ago
Yao Zongyou	c4f480a839	fix csv read bug (#7885 )	5 years ago
Yao Zongyou	60831e3299	aggregation functions' argument may already have been cast to numeric (#7876 )	5 years ago
Yao Zongyou	037319066f	fix unicode support related bugs in s3select (#7877 )	5 years ago
Ryan Tam	bd56f80250	Fix ignored alias for aggregate result in S3 Select (#7849 ) The SQL parser as it stands right now ignores the alias for an aggregate result, e.g. `SELECT COUNT(*) AS thing FROM s3object` doesn't actually return a record like `{"thing": 42}`, it returns a record like `{"_1": 42}`. Column aliases for aggregate results are supported in AWS's S3 Select, so this commit fixes that by respecting the `expr.As` in the expression. Also improve test for S3 select. On top of testing a simple `SELECT` query, we want to test a few more "advanced" queries (e.g. aggregation). Convert existing tests into table driven tests[1], and add the new test cases with "advanced" queries into them. [1] - https://github.com/golang/go/wiki/TableDrivenTests	5 years ago
Yao Zongyou	941fed8e4a	s3Select: call Close on error to release the read lock (#7830 )	5 years ago
Yao Zongyou	55092bede1	add timestamp compare support (#7832 )	5 years ago
Yao Zongyou	90a3b830f4	fix typo and the string representation of the time.Time value (#7831 )	5 years ago
Yao Zongyou	23b9df0694	Fix s3select TRIM function's nil pointer dereference bug (#7817 )	5 years ago
Joe Stevens	a19cf063b5	Fixes for multiplatform dev and testing from forks (#7734 ) Add support for correct dependency URLs on all platforms; only build mountinfo.go on linux; make testfile path relative to support fork work	6 years ago
kannappanr	5ecac91a55	Replace Minio refs in docs with MinIO and links (#7494 )	6 years ago
Aditya Manthramurthy	b1b1d77893	Set S3 Select record message length to 128KiB (#7475 ) - Previously this limit was a little more than 1MiB, and it broke compatibility with AWS SDK Java causing a buffer overflow error.	6 years ago
Kirill Motkov	3d29ab4059	Rewrite if-else chains to switch statements (#7382 )	6 years ago
Harshavardhana	91d85a0d53	Fix stale locks held by SelectParquet API (#7364 ) Vendorize upstream parquet-go to fix this issue.	6 years ago
Aditya Manthramurthy	e463386921	Add JSON Path expression evaluation support (#7315 ) - Includes support for FROM clause JSON path	6 years ago
Aditya Manthramurthy	f4879ed96d	Use jstream to serialize records to JSON format in S3Select (#7318 ) - Also, switch to jstream to generate internal record representation from CSV/JSON readers - This fixes a bug in which JSON output objects have their keys reversed from the order they are specified in the Select columns. - Also includes a fix for tests.	6 years ago
Harshavardhana	2520e535a0	Allow lazyQuotes for certain types of CSV (#7278 ) Set lazyQuotes to true, to allow a quote to appear in an unquoted field and a non-doubled quote to appear in a quoted field.	6 years ago
Aditya Manthramurthy	80a351633f	Update vendorized bcicen/jstream (#7257 ) - Includes an error handling fix that is waiting to be merged upstream - Uses order-preserving (un)marshalling for JSON objects.	6 years ago
Aditya Manthramurthy	8a405cab2f	COUNT() function in select should return an int (#7243 )	6 years ago
Harshavardhana	df35d7db9d	Introduce staticcheck for stricter builds (#7035 )	6 years ago
Aditya Manthramurthy	ee5b3622a5	Evaluate where clause in aggregation queries (#7235 )	6 years ago
Harshavardhana	85e939636f	Fix JSON parser handling for certain objects (#7162 ) This PR also adds some comments and simplifies the code. The primary change ensures that we honor the cached buffer. Added unit tests as well. Fixes #7141	6 years ago
Aditya Manthramurthy	4aa9ee153b	Fix S3 Select request XML parsing (#7202 )	6 years ago
Aditya Manthramurthy	fd4e15c116	Flush the records staging buffer periodically (#7193 ) - Staging buffer is flushed every 500ms. In cases where the result records are slowly generated (e.g. when a where condition matches very few records), this change causes the server to send results even though the staging buffer is not full. - Refactor messageWriter code to use simpler channel based co-ordination instead of atomic variables.	6 years ago
Aditya Manthramurthy	f04f8bbc78	Add support for Timestamp data type in SQL Select (#7185 ) This change adds support for casting strings to Timestamp via CAST: `CAST('2010T' AS TIMESTAMP)` It also implements the following date-time functions: - UTCNOW() - DATE_ADD() - DATE_DIFF() - EXTRACT() For values passed to these functions, date-types are automatically inferred.	6 years ago
Aditya Manthramurthy	91c839ad28	Use a buffer to collect SQL Select result rows (#7158 ) Batching records into a single SQL Select message in the response leads to significant speed up as the message header overhead is made negligible. This change leads to a speed up of 3-5x for queries that select many small records.	6 years ago
Aditya Manthramurthy	2786055df4	Add new SQL parser to support S3 Select syntax (#7102 ) - New parser written from scratch, allows easier and complete parsing of the full S3 Select SQL syntax. Parser definition is directly provided by the AST defined for the SQL grammar. - Bring support to parse and interpret SQL involving JSON path expressions; evaluation of JSON path expressions will be subsequently added. - Bring automatic type inference and conversion for untyped values (e.g. CSV data).	6 years ago
Bala FA	e23a42305c	Rebase minio/parquet-go and fix null handling. (#7067 )	6 years ago
Bala FA	b0deea27df	Refactor s3select to support parquet. (#7023 ) Also handle pretty formatted JSON documents.	6 years ago
Aditya Manthramurthy	2aeb3fbe86	Fix csv output delimiter bug (#6994 )	6 years ago
Harshavardhana	4c7c571875	Support JSON to CSV and CSV to JSON output format conversion (#6910 ) This PR implements one of the pending items in issue #6286. In the S3 API a user can request CSV output for a JSON document and JSON output for a CSV document. This PR refactors the code a little bit to bring this feature.	6 years ago
Harshavardhana	272b8003d6	Honor header only when requested for use (#6815 )	6 years ago
Harshavardhana	7e1661f4fa	Performance improvements to SELECT API on certain query operations (#6752 ) This improves the performance of certain queries dramatically, such as 'count(*)' etc. Without this PR ``` ~ time mc select --query "select count(*) from S3Object" myminio/sjm-airlines/star2000.csv.gz 2173762 real 0m42.464s user 0m0.071s sys 0m0.010s ``` With this PR ``` ~ time mc select --query "select count(*) from S3Object" myminio/sjm-airlines/star2000.csv.gz 2173762 real 0m17.603s user 0m0.093s sys 0m0.008s ``` Roughly a 2.4x speedup. This PR avoids a lot of type conversions and instead relies on raw sequences of data and interprets them lazily. ``` benchcmp old new benchmark old ns/op new ns/op delta BenchmarkSQLAggregate_100K-4 551213 259782 -52.87% BenchmarkSQLAggregate_1M-4 6981901985 2432413729 -65.16% BenchmarkSQLAggregate_2M-4 13511978488 4536903552 -66.42% BenchmarkSQLAggregate_10M-4 68427084908 23266283336 -66.00% benchmark old allocs new allocs delta BenchmarkSQLAggregate_100K-4 2366 485 -79.50% BenchmarkSQLAggregate_1M-4 47455492 21462860 -54.77% BenchmarkSQLAggregate_2M-4 95163637 43110771 -54.70% BenchmarkSQLAggregate_10M-4 476959550 216906510 -54.52% benchmark old bytes new bytes delta BenchmarkSQLAggregate_100K-4 1233079 1086024 -11.93% BenchmarkSQLAggregate_1M-4 2607984120 557038536 -78.64% BenchmarkSQLAggregate_2M-4 5254103616 1128149168 -78.53% BenchmarkSQLAggregate_10M-4 26443524872 5722715992 -78.36% ```	6 years ago
Harshavardhana	f162d7bd97	Performance improvements by re-using record buffer (#6622 ) Avoid unnecessary pointer reference allocations when not needed, for example - SelectFuncs{} - Row{}	6 years ago
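
Commit c1a17c2561 above describes AVG over integer scores 80 and 81 giving 80 where AWS S3 gives 80.5. The difference can be illustrated with a minimal Go sketch that contrasts integer and floating-point accumulation. This is not MinIO's code; `avgInt` and `avgFloat` are hypothetical names used only for illustration.

```go
package main

import "fmt"

// avgInt averages using integer arithmetic, so the fractional part
// of the result is truncated. (Hypothetical helper, not from MinIO.)
func avgInt(scores []int64) int64 {
	if len(scores) == 0 {
		return 0
	}
	var sum int64
	for _, s := range scores {
		sum += s
	}
	return sum / int64(len(scores))
}

// avgFloat accumulates and divides as float64, keeping the fraction.
// (Hypothetical helper, not from MinIO.)
func avgFloat(scores []int64) float64 {
	if len(scores) == 0 {
		return 0
	}
	var sum float64
	for _, s := range scores {
		sum += float64(s)
	}
	return sum / float64(len(scores))
}

func main() {
	scores := []int64{80, 81} // the alice/bob scores from the commit message
	fmt.Println(avgInt(scores))   // 80
	fmt.Println(avgFloat(scores)) // 80.5
}
```

Accumulating in float64 also avoids int64 wrap-around on very large sums, at the cost of exactness for values beyond 2^53.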

60 Commits (4e9de58675efb9ba03188f16f005ba943e5508c4)
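
Commit 2520e535a0 in the list above enables lazy quote handling for CSV input. The effect of that setting can be seen with Go's standard `encoding/csv` reader, whose `LazyQuotes` field has the behavior the commit message describes. The sketch below is only an illustration under that assumption; `parseLine` is a hypothetical helper and does not reflect how MinIO wires the option into its reader.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parseLine reads one CSV record from line. When lazy is true, a bare
// quote inside an unquoted field is accepted instead of rejected.
// (Hypothetical helper for illustration only.)
func parseLine(line string, lazy bool) ([]string, error) {
	r := csv.NewReader(strings.NewReader(line))
	r.LazyQuotes = lazy
	return r.Read()
}

func main() {
	line := `a "quoted" word,b`

	// Strict mode: the bare quote in the first field is a parse error.
	if _, err := parseLine(line, false); err != nil {
		fmt.Println("strict:", err)
	}

	// Lazy mode: the quotes are kept as literal characters in the field.
	rec, err := parseLine(line, true)
	fmt.Println("lazy:", rec, err)
}
```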