I have built a fuzz test and it crashes heavily in seconds and will OOM shortly after.
It seems like supporting Parquet is basically a completely open way to crash the
server if you can upload a file and run s3 select on it.
Until Parquet is more hardened it is DISABLED by default since hostile
crafted input can easily crash the server.
If you are in a controlled environment where it is safe to assume no hostile
content can be uploaded to your cluster you can safely enable Parquet.
To enable Parquet set the environment variable `MINIO_API_SELECT_PARQUET=on`
while starting the MinIO server.
Furthermore, we guard parquet by recover functions.
@ -3,13 +3,26 @@ Traditional retrieval of objects is always as whole entities, i.e GetObject for
You can use the Select API to query objects with following features:
You can use the Select API to query objects with following features:
- CSV, JSON and Parquet - Objects must be in CSV, JSON, or Parquet format.
- Objects must be in CSV, JSON, or Parquet(*) format.
- UTF-8 is the only encoding type the Select API supports.
- UTF-8 is the only encoding type the Select API supports.
- GZIP or BZIP2 - CSV and JSON files can be compressed using GZIP or BZIP2. The Select API supports columnar compression for Parquet using GZIP, Snappy, LZ4. Whole object compression is not supported for Parquet objects.
- GZIP or BZIP2 - CSV and JSON files can be compressed using GZIP or BZIP2. The Select API supports columnar compression for Parquet using GZIP, Snappy, LZ4. Whole object compression is not supported for Parquet objects.
- Server-side encryption - The Select API supports querying objects that are protected with server-side encryption.
- Server-side encryption - The Select API supports querying objects that are protected with server-side encryption.
Type inference and automatic conversion of values is performed based on the context when the value is un-typed (such as when reading CSV data). If present, the CAST function overrides automatic conversion.
Type inference and automatic conversion of values is performed based on the context when the value is un-typed (such as when reading CSV data). If present, the CAST function overrides automatic conversion.
The [mc sql](https://docs.min.io/docs/minio-client-complete-guide.html#sql) command can be used for executing queries using the command line.
(*) Parquet is disabled on the MinIO server by default. See below how to enable it.
## Enabling Parquet Format
Parquet is DISABLED by default since hostile crafted input can easily crash the server.
If you are in a controlled environment where it is safe to assume no hostile content can be uploaded to your cluster you can safely enable Parquet.
To enable Parquet set the environment variable `MINIO_API_SELECT_PARQUET=on`.
# Example using Python API
## 1. Prerequisites
## 1. Prerequisites
- Install MinIO Server from [here](http://docs.min.io/docs/minio-quickstart-guide).
- Install MinIO Server from [here](http://docs.min.io/docs/minio-quickstart-guide).