feat: disable Parquet by default (breaking change) (#9920)

I built a fuzz test against the Parquet reader; it crashes within seconds
and runs out of memory shortly after. Supporting Parquet is effectively an
open way to crash the server for anyone who can upload a file and run
S3 Select on it.

Until the Parquet reader is hardened, it is DISABLED by default, since
hostile, crafted input can easily crash the server.

If you are in a controlled environment where it is safe to assume that no
hostile content can be uploaded to your cluster, you can safely enable Parquet.

To enable Parquet, set the environment variable `MINIO_API_SELECT_PARQUET=on`
when starting the MinIO server.

Furthermore, the Parquet reader is now guarded by recover functions, so a
panic while reading a header or record is returned as an error.
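
The recover guard can be sketched in isolation as follows. This is a minimal example, not the MinIO code: `decodeRecord` is a hypothetical decoder that panics on short input, and `safeDecode` shows the pattern of a deferred function writing to a named error result.

```go
package main

import "fmt"

// decodeRecord is a hypothetical decoder standing in for a third-party
// parser. It panics with an index-out-of-range error on short input.
func decodeRecord(data []byte) byte {
	return data[4]
}

// safeDecode converts a panic in decodeRecord into an ordinary error.
// The named result err is what lets the deferred function set the
// value that the caller receives.
func safeDecode(data []byte) (b byte, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("panic decoding record: %v", r)
		}
	}()
	return decodeRecord(data), nil
}

func main() {
	// Malformed (too short) input: the panic surfaces as an error.
	_, err := safeDecode([]byte{1, 2})
	fmt.Println(err != nil)

	// Well-formed input: the value is returned and err stays nil.
	b, err := safeDecode([]byte{1, 2, 3, 4, 5})
	fmt.Println(b, err)
}
```

Note that `recover` only intercepts panics. It cannot rescue a process that runs out of memory, so a guard like this does not by itself address the out-of-memory failure mentioned above.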
master
Klaus Post 4 years ago committed by GitHub
parent d2a3f92452
commit adca28801d
GPG Key ID: 4AEE18F83AFDEB23
  1. docs/select/README.md (15 changes)
  2. pkg/s3select/parquet/reader.go (14 changes)
  3. pkg/s3select/select.go (4 changes)
  4. pkg/s3select/select_test.go (2 changes)

@@ -3,13 +3,26 @@ Traditional retrieval of objects is always as whole entities, i.e GetObject for
 You can use the Select API to query objects with following features:
-- CSV, JSON and Parquet - Objects must be in CSV, JSON, or Parquet format.
+- Objects must be in CSV, JSON, or Parquet(*) format.
 - UTF-8 is the only encoding type the Select API supports.
 - GZIP or BZIP2 - CSV and JSON files can be compressed using GZIP or BZIP2. The Select API supports columnar compression for Parquet using GZIP, Snappy, LZ4. Whole object compression is not supported for Parquet objects.
 - Server-side encryption - The Select API supports querying objects that are protected with server-side encryption.
 Type inference and automatic conversion of values is performed based on the context when the value is un-typed (such as when reading CSV data). If present, the CAST function overrides automatic conversion.
+The [mc sql](https://docs.min.io/docs/minio-client-complete-guide.html#sql) command can be used for executing queries using the command line.
+(*) Parquet is disabled on the MinIO server by default. See below how to enable it.
+## Enabling Parquet Format
+Parquet is DISABLED by default since hostile crafted input can easily crash the server.
+If you are in a controlled environment where it is safe to assume no hostile content can be uploaded to your cluster you can safely enable Parquet.
+To enable Parquet set the environment variable `MINIO_API_SELECT_PARQUET=on`.
+# Example using Python API
 ## 1. Prerequisites
 - Install MinIO Server from [here](http://docs.min.io/docs/minio-quickstart-guide).
 - Familiarity with AWS S3 API.

@@ -17,6 +17,7 @@
 package parquet
 import (
+	"fmt"
 	"io"
 	"github.com/bcicen/jstream"
@@ -34,6 +35,12 @@ type Reader struct {
 // Read - reads single record.
 func (r *Reader) Read(dst sql.Record) (rec sql.Record, rerr error) {
+	defer func() {
+		if rec := recover(); rec != nil {
+			rerr = fmt.Errorf("panic reading parquet record: %v", rec)
+		}
+	}()
 	parquetRecord, err := r.reader.Read()
 	if err != nil {
 		if err != io.EOF {
@@ -92,7 +99,12 @@ func (r *Reader) Close() error {
 }
 // NewReader - creates new Parquet reader using readerFunc callback.
-func NewReader(getReaderFunc func(offset, length int64) (io.ReadCloser, error), args *ReaderArgs) (*Reader, error) {
+func NewReader(getReaderFunc func(offset, length int64) (io.ReadCloser, error), args *ReaderArgs) (r *Reader, err error) {
+	defer func() {
+		if rec := recover(); rec != nil {
+			err = fmt.Errorf("panic reading parquet header: %v", rec)
+		}
+	}()
 	reader, err := parquetgo.NewReader(getReaderFunc, nil)
 	if err != nil {
 		if err != io.EOF {

@@ -26,6 +26,7 @@ import (
 	"io"
 	"io/ioutil"
 	"net/http"
+	"os"
 	"strings"
 	"sync"
@@ -334,6 +335,9 @@ func (s3Select *S3Select) Open(getReader func(offset, length int64) (io.ReadClos
 		}
 		return nil
 	case parquetFormat:
+		if !strings.EqualFold(os.Getenv("MINIO_API_SELECT_PARQUET"), "on") {
+			return errors.New("parquet format parsing not enabled on server")
+		}
 		var err error
 		s3Select.recordReader, err = parquet.NewReader(getReader, &s3Select.Input.ParquetArgs)
 		return err

@@ -925,6 +925,8 @@ func TestJSONInput(t *testing.T) {
 }
 func TestParquetInput(t *testing.T) {
+	os.Setenv("MINIO_API_SELECT_PARQUET", "on")
+	defer os.Setenv("MINIO_API_SELECT_PARQUET", "off")
 	var testTable = []struct {
 		requestXML []byte
