TileDB Integration with Local Minio doesn't work. #12182

Open
mohd109 opened this issue Apr 20, 2025 · 5 comments


mohd109 commented Apr 20, 2025

What is the bug?

After setting the MinIO access key and secret along with the endpoint, GDAL uses the local endpoint for some requests, but when I try to write to TileDB, it tries to connect to AWS servers. This is the CPL debug output:

app-1 | /app/main.py:36: MovedIn20Warning: The declarative_base() function is now available as sqlalchemy.orm.declarative_base(). (deprecated since: 2.0) (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)
app-1 | Base = declarative_base()
app-1 | /app/main.py:219: DeprecationWarning:
app-1 | on_event is deprecated, use lifespan event handlers instead.
app-1 |
app-1 | Read more about it in the
app-1 | FastAPI docs for Lifespan Events.
app-1 |
app-1 | @app.on_event("startup")
app-1 | INFO: Started server process [1]
app-1 | INFO: Waiting for application startup.
app-1 | INFO:main:Initializing MinIO bucket
app-1 | INFO:main:MinIO bucket ready
app-1 | INFO: Application startup complete.
app-1 | INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
app-1 | /opt/anaconda3/envs/geo-processing/lib/python3.10/site-packages/osgeo/gdal.py:311: FutureWarning: Neither gdal.UseExceptions() nor gdal.DontUseExceptions() has been explicitly called. In GDAL 4.0, exceptions will be enabled by default.
app-1 | warnings.warn(
app-1 | /app/main.py:352: PydanticDeprecatedSince20: The dict method is deprecated; use model_dump instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.11/migration/
app-1 | **fp_params.dict()
app-1 | /app/main.py:371: PydanticDeprecatedSince20: The dict method is deprecated; use model_dump instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.11/migration/
app-1 | create_tiledb_array(input_path, vsi_path, tdb_params.dict())
app-1 | ------------------------------------------------
app-1 | /tmp/tmpkeida12t/sample.tif
app-1 | ------------------------------------------------
app-1 | GDAL: GDALOpen(/tmp/tmpkeida12t/sample.tif, this=0x5558df85f560) succeeds as GTiff.
app-1 | CPLError: Filename should be of the form /vsis3/bucket/key
app-1 | HTTP: libcurl/8.13.0 OpenSSL/3.5.0 zlib/1.3.1 zstd/1.5.7 libssh2/1.11.1 nghttp2/1.64.0
app-1 | CURL_INFO_TEXT: Host minio:9000 was resolved.
app-1 | CURL_INFO_TEXT: IPv6: (none)
app-1 | CURL_INFO_TEXT: IPv4: 172.20.0.3
app-1 | CURL_INFO_TEXT: Trying 172.20.0.3:9000...
app-1 | CURL_INFO_TEXT: Connected to minio (172.20.0.3) port 9000
app-1 | CURL_INFO_TEXT: using HTTP/1.x
app-1 | CURL_INFO_HEADER_OUT: GET /testortho/ HTTP/1.1
app-1 | Host: minio:9000
app-1 | User-Agent: GDAL/3.10.3
app-1 | Accept: */*
app-1 | Range: bytes=0-16383
app-1 | x-amz-date: 20250420T060446Z
app-1 | x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
app-1 | Authorization: AWS4-HMAC-SHA256 Credential=KxSXSvEkAuUFa1HRa74Z/20250420/us-east-1/s3/aws4_request,SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=3394a3fdabdb9c8fdb481f5ad2f693884217986ed9d818b55ae1c1ab17e200ee
app-1 |
app-1 | CURL_INFO_TEXT: Request completely sent off
app-1 | CURL_INFO_HEADER_IN: HTTP/1.1 200 OK
app-1 | CURL_INFO_HEADER_IN: Accept-Ranges: bytes
app-1 | CURL_INFO_HEADER_IN: Content-Length: 235
app-1 | CURL_INFO_HEADER_IN: Content-Type: application/xml
app-1 | CURL_INFO_HEADER_IN: Server: MinIO
app-1 | CURL_INFO_HEADER_IN: Strict-Transport-Security: max-age=31536000; includeSubDomains
app-1 | CURL_INFO_HEADER_IN: Vary: Origin
app-1 | CURL_INFO_HEADER_IN: Vary: Accept-Encoding
app-1 | CURL_INFO_HEADER_IN: X-Amz-Id-2: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8
app-1 | CURL_INFO_HEADER_IN: X-Amz-Request-Id: 1837F1A8634BDF02
app-1 | CURL_INFO_HEADER_IN: X-Content-Type-Options: nosniff
app-1 | CURL_INFO_HEADER_IN: X-Ratelimit-Limit: 3844
app-1 | CURL_INFO_HEADER_IN: X-Ratelimit-Remaining: 3844
app-1 | CURL_INFO_HEADER_IN: X-Xss-Protection: 1; mode=block
app-1 | CURL_INFO_HEADER_IN: Date: Sun, 20 Apr 2025 06:04:46 GMT
app-1 | CURL_INFO_HEADER_IN:
app-1 | CURL_INFO_TEXT: Connection #0 to host minio left intact
app-1 | S3: GetFileSize(http://minio:9000/testortho/)=235 response_code=200
app-1 | GDAL: On-demand registering /opt/anaconda3/envs/geo-processing/lib/gdalplugins/gdal_TileDB.so using GDALRegister_TileDB.
app-1 | CPLError: S3: Error while listing with prefix 's3://testortho/__schema/' and delimiter '/'[Error Type: 23] [HTTP Response Code: 403] [Exception: InvalidAccessKeyId] [Remote IP: 50.7.85.34] [Request ID: 83MZRMVXVTYR2Y2C] [Headers: 'content-type' = 'application/xml' 'date' = 'Sun, 20 Apr 2025 06:04:46 GMT' 'server' = 'AmazonS3' 'transfer-encoding' = 'chunked' 'x-amz-id-2' = 'y3VQzS9uYB+m7RmG0nJJV1wpnav1iD2ZEhSBJEOMNKnVWrllA8qlm7dIB94g60yEn4wkYK4sfhI=' 'x-amz-request-id' = '83MZRMVXVTYR2Y2C'] : The AWS Access Key Id you provided does not exist in our records.
app-1 | GTiff: ScanDirectories()
app-1 | GDAL: GDALDefaultOverviews::OverviewScan()
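One way to read the log above: the `/vsis3/` request is answered by a server identifying itself as `MinIO`, while the TileDB listing error carries `'server' = 'AmazonS3'`. Below is a minimal sketch that pulls those server names out of a captured log. The function name `servers_seen` and the two regular expressions are illustrative assumptions fitted only to the line shapes seen in this log, and the sample is two abbreviated lines from it.

```python
import re

# Two header shapes appear in the log above (patterns are assumptions fitted to it):
#   curl trace from GDAL:  "CURL_INFO_HEADER_IN: Server: MinIO"
#   TileDB error message:  "'server' = 'AmazonS3'"
CURL_SERVER = re.compile(r"CURL_INFO_HEADER_IN:\s*Server:\s*(\S+)")
TILEDB_SERVER = re.compile(r"'server'\s*=\s*'([^']+)'")


def servers_seen(log_text: str) -> list[str]:
    """Return the distinct server names found in the log, in order of appearance."""
    found = []
    for line in log_text.splitlines():
        for pattern in (CURL_SERVER, TILEDB_SERVER):
            for name in pattern.findall(line):
                if name not in found:
                    found.append(name)
    return found


# Abbreviated lines from the log above.
sample = """\
app-1 | CURL_INFO_HEADER_IN: Server: MinIO
app-1 | CPLError: S3: Error while listing [Headers: 'server' = 'AmazonS3' 'transfer-encoding' = 'chunked']
"""
print(servers_seen(sample))
```

Seeing both names in a single run suggests that two different S3 clients are in play, each with its own endpoint configuration.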

Steps to reproduce the issue

`
# Imports were not part of the original snippet; these are the assumed ones.
import os
import tempfile

from osgeo import gdal
from pydantic_settings import BaseSettings  # assumed: the log shows Pydantic 2.x


class Settings(BaseSettings):
    # PostGIS settings
    pg_host: str = "db"
    pg_port: int = 5432
    pg_db: str = "gisdb"
    pg_user: str = "postgres"
    pg_password: str = "postgres"

    # MinIO settings
    minio_endpoint: str = "minio:9000"
    minio_access: str = "KxSXSvEkAuUFa1HRa74Z"
    minio_secret: str = "mVUjhBJCCPBAZhRCH7noqtiwk3bNxHNIcbq5387K"
    minio_bucket: str = "tiledb-data"


settings = Settings()


def create_tiledb_array(input_path: str, vsi_path: str, options: dict):
    # GDAL debugging and /vsis3/ settings
    gdal.SetConfigOption("CPL_DEBUG", "ON")
    gdal.SetConfigOption("CPL_LOG_ERRORS", "ON")
    gdal.SetConfigOption("CPL_CURL_VERBOSE", "YES")
    gdal.SetConfigOption("GDAL_HTTP_NETRC", "NO")
    gdal.SetConfigOption("AWS_ACCESS_KEY_ID", settings.minio_access)
    gdal.SetConfigOption("AWS_SECRET_ACCESS_KEY", settings.minio_secret)
    gdal.SetConfigOption("AWS_S3_ENDPOINT", settings.minio_endpoint)
    gdal.SetConfigOption("AWS_HTTPS", "NO")
    gdal.SetConfigOption("AWS_VIRTUAL_HOSTING", "FALSE")
    gdal.SetConfigOption("VSI_CACHE", "TRUE")
    gdal.SetConfigOption("GDAL_HTTP_TIMEOUT", "300")
    gdal.SetConfigOption('CPL_VSIS3_CREATE_DIR_OBJECT', 'YES')
    gdal.SetConfigOption('GDAL_DISABLE_READDIR_ON_OPEN', 'YES')

    # Create a configuration object
    config = dict()

    # Set configuration parameters
    config["vfs.s3.scheme"] = "http"
    config["vfs.s3.region"] = ""
    config["vfs.s3.endpoint_override"] = "minio:9000"
    config["vfs.s3.use_virtual_addressing"] = "false"
    config["vfs.s3.aws_access_key_id"] = settings.minio_access
    config["vfs.s3.aws_secret_access_key"] = settings.minio_secret

    # Write the settings as key=value lines to a temporary config file
    with tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.cfg') as cfg_file:
        for key, value in config.items():
            cfg_file.write(f"{key}={value}\n")
        cfg_path = cfg_file.name

    try:
        # GDAL Translate options for TileDB
        translate_options = gdal.TranslateOptions(
            format="TileDB",
            creationOptions=[
                f"TILEDB_CONFIG={cfg_path}",
                "BLOCKXSIZE=256",
                "BLOCKYSIZE=256"
            ]
        )

        print("------------------------------------------------")
        print(input_path)
        print("------------------------------------------------")

        src_ds = gdal.Open(input_path)
        if not src_ds:
            raise RuntimeError(f"Failed to open {input_path}")

        result = gdal.Translate(vsi_path, src_ds, options=translate_options)
        if not result:
            raise RuntimeError("TileDB creation failed")
        result.FlushCache()
    finally:
        # Always remove the temporary config file
        os.remove(cfg_path)
`

Versions and provenance

I'm using the gdal-ubuntu-latest Docker image, which ships GDAL 3.10.3.

Additional context

No response

@rouault rouault assigned rouault and unassigned rouault Apr 20, 2025
Contributor

normanb commented Apr 21, 2025

I have run TileDB with FastAPI and SQLAlchemy (I used to work at TileDB Inc). The easiest way to debug this is to start with gdal_translate.

I created a tiledb.config file with the following contents:

vfs.s3.scheme http
vfs.s3.region
vfs.s3.endpoint_override http://127.0.0.1:9000
vfs.s3.use_virtual_addressing false

And set the following environment variables:

export AWS_ACCESS_KEY_ID=gdaltesting
export AWS_SECRET_ACCESS_KEY=gdaltesting
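The config file above uses one `parameter value` pair per line, separated by whitespace, whereas the Python snippet in the original report writes `key=value` lines. Whether that difference matters for this bug is not established in this thread. Below is a minimal sketch that writes the same settings in the whitespace-separated layout from Python. `render_tiledb_config` and `write_tiledb_config` are hypothetical helper names, and the `.config` suffix is just a choice made here.

```python
import os
import tempfile


def render_tiledb_config(params: dict[str, str]) -> str:
    """Render parameters as 'name value' lines, one per line."""
    # rstrip() keeps an empty value as a bare parameter name, as in the file above.
    lines = [f"{key} {value}".rstrip() for key, value in params.items()]
    return "\n".join(lines) + "\n"


def write_tiledb_config(params: dict[str, str]) -> str:
    """Write the rendered config to a temporary file and return its path."""
    with tempfile.NamedTemporaryFile(mode="w", delete=False, suffix=".config") as fh:
        fh.write(render_tiledb_config(params))
        return fh.name


# Values copied from the tiledb.config shown above.
params = {
    "vfs.s3.scheme": "http",
    "vfs.s3.region": "",
    "vfs.s3.endpoint_override": "http://127.0.0.1:9000",
    "vfs.s3.use_virtual_addressing": "false",
}

path = write_tiledb_config(params)
with open(path) as fh:
    print(fh.read(), end="")
os.remove(path)  # clean up the temporary file
```

The resulting path could then be passed as `TILEDB_CONFIG=<path>` in the same way as the hand-written file.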

I ran minio

docker run -p 9000:9000 -p 9001:9001 \
  quay.io/minio/minio server /data --console-address ":9001"

Any GDAL build that includes TileDB with AWS support should work; S3 support is stable. If you are concerned, build TileDB and GDAL from source. The TileDB bootstrap command is:

../bootstrap --enable-s3 --prefix=/usr/local

And follow the instructions.

From there I ran the following:

gdal_translate -of TileDB -co TILEDB_CONFIG=tiledb.config UTM2GTIF.TIF /vsis3/test/UTM2GTIF

And then:

gdalinfo -oo TILEDB_CONFIG=tiledb.config /vsis3/test/UTM2GTIF

Both worked as expected.
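For comparison with the Python code in the original report, the two commands above map roughly onto the GDAL Python bindings as sketched below. The helper names (`translate_kwargs`, `open_options`, `run`) are hypothetical. The file names are the ones from the commands, and `run` assumes a GDAL build with the TileDB driver plus a reachable MinIO instance.

```python
def translate_kwargs(config_path: str) -> dict:
    """Keyword arguments for gdal.Translate matching `-of TileDB -co TILEDB_CONFIG=...`."""
    return {"format": "TileDB", "creationOptions": [f"TILEDB_CONFIG={config_path}"]}


def open_options(config_path: str) -> list[str]:
    """Open options matching `-oo TILEDB_CONFIG=...`."""
    return [f"TILEDB_CONFIG={config_path}"]


def run(src: str, dst: str, config_path: str) -> str:
    """Translate src into a TileDB array at dst, reopen it, and return the gdal.Info report."""
    # Imported here so the two helpers above can be used without GDAL installed.
    from osgeo import gdal

    gdal.UseExceptions()
    out = gdal.Translate(dst, src, **translate_kwargs(config_path))
    out = None  # drop the reference so the output dataset is closed and flushed
    ds = gdal.OpenEx(dst, gdal.OF_RASTER, open_options=open_options(config_path))
    return gdal.Info(ds)


# Usage, mirroring the two commands above (needs GDAL, TileDB and MinIO):
#   print(run("UTM2GTIF.TIF", "/vsis3/test/UTM2GTIF", "tiledb.config"))
print(translate_kwargs("tiledb.config"))
```

Note that the configuration file is passed both when creating the array and when reopening it, just as both commands carry `TILEDB_CONFIG`.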

Can you confirm these work for you?

From your logs it seems you are doing some ortho processing. TileDB is a good choice for parallel raster processing, particularly with GDAL.

With gdal_create or its Python equivalents, it is possible to initialize an array and then fill its tiles from many processes at once.
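The partitioning half of that parallel-fill idea can be sketched without any GDAL calls: split the raster into block-aligned windows and hand each window to a worker. The function name `block_windows` is hypothetical, and the 256-pixel default is taken from the `BLOCKXSIZE`/`BLOCKYSIZE` options in the original report. How each worker writes its window into the array is deliberately left out.

```python
from typing import Iterator, Tuple

Window = Tuple[int, int, int, int]  # (xoff, yoff, xsize, ysize)


def block_windows(width: int, height: int, block: int = 256) -> Iterator[Window]:
    """Yield block-aligned windows covering a width x height raster, row by row."""
    for yoff in range(0, height, block):
        ysize = min(block, height - yoff)  # last row of blocks may be shorter
        for xoff in range(0, width, block):
            xsize = min(block, width - xoff)  # last column of blocks may be narrower
            yield (xoff, yoff, xsize, ysize)


# Example: a 600 x 300 raster splits into 3 columns x 2 rows of windows.
windows = list(block_windows(600, 300))
print(len(windows))
print(windows[-1])
```

Because the windows never overlap and are aligned to the block size, each worker touches a distinct set of tiles.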

Author

mohd109 commented Apr 22, 2025

Hi again,
Thanks a lot for your response.

Actually no, it doesn't work. The command-line version you gave fails with a segmentation fault on two TIF files that work everywhere else.
The Python version I posted earlier fails to reach the MinIO server in only one of its internal requests: it creates the bucket and gets the bucket size through GDAL, but fails to write the schema into it. I tested with both the aws.config file and environment variables, and the result is exactly the same.
I ran the gdal_translate command in the same Docker image I used to test the Python version. I can zip the whole package and send it to you, or post it here as a Google Drive link.

Image

Contributor

normanb commented Apr 22, 2025

Yes, I can look into this for you. Zipping the whole package is probably best. Either send the link to my email (norman.barker<at>gmail.com) or drop it here. We can track any issues here for visibility.

Author

mohd109 commented Apr 22, 2025 via email

Author

mohd109 commented May 13, 2025

Hi again, hope you're doing well.
Any news on this topic?
