
S3

Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance.

iotoolz.extensions.S3Stream is the stream extension that allows you to interact with S3 objects like a file object.

You can use the class directly or through the iotoolz.streams functions.

boto3 must be installed to use S3Stream.

# install iotoolz with the optional dependency boto3
pip install iotoolz[boto3]
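Because boto3 is an optional dependency, it can be useful to check for it before touching S3. This is an illustrative sketch using a guarded import (not part of iotoolz itself):

```python
# Optional-dependency check: S3Stream needs boto3 at runtime,
# so a guarded import lets you fail fast with a clear message.
try:
    import boto3  # noqa: F401
    HAS_BOTO3 = True
except ImportError:
    HAS_BOTO3 = False

if not HAS_BOTO3:
    print("boto3 is not installed; run `pip install iotoolz[boto3]`")
```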

Usage

Usage with S3Stream

import json

import pandas as pd

from iotoolz.extensions import S3Stream

# read an ndjson object from the s3 bucket
with S3Stream("s3://bucket/folder/key.ndjson", mode="r") as stream:
    for line in stream:
        print(json.loads(line))


# write text object to the s3 bucket
# query string parameters take precedence over kwargs
with S3Stream(
    "s3://bucket/folder/key.ndjson?StorageClass=ONEZONE_IA",
    mode="w",
    content_type="text/plain",
    StorageClass="REDUCED_REDUNDANCY",
) as stream:
    stream.write("hello world")


# read a csv object and load it into a pandas DataFrame
with S3Stream("s3://bucket/folder/key.csv", mode="r") as stream:
    df = pd.read_csv(stream)
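The precedence rule noted in the write example can be seen with the standard library's URL parsing. This sketch only illustrates how a parameter such as `StorageClass` rides along in the URI and overrides a keyword argument of the same name; it is not iotoolz's actual parsing code:

```python
from urllib.parse import parse_qs, urlsplit

# the same URI used in the write example above
uri = "s3://bucket/folder/key.ndjson?StorageClass=ONEZONE_IA"

# parse the query string into a dict of value lists
query = parse_qs(urlsplit(uri).query)

# merge: query-string values win over kwargs of the same name
kwargs = {"StorageClass": "REDUCED_REDUNDANCY"}
kwargs.update({key: values[0] for key, values in query.items()})
print(kwargs["StorageClass"])  # ONEZONE_IA
```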

Usage with iotoolz.streams

from iotoolz.streams import open_stream, register_stream, set_schema_kwargs

# set default kwargs (tags, metadata, storage class) for all uploaded s3:// resources
set_schema_kwargs(
    "s3",
    Tagging="key1=value1&key2=value2",
    Metadata={"meta-key": "meta-value"},
    StorageClass="REDUCED_REDUNDANCY"
)
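The `Tagging` value is the URL-encoded key=value string that S3's PutObject API expects. Rather than hand-writing it, you can build it from a dict with the standard library (an illustration, not an iotoolz helper):

```python
from urllib.parse import urlencode

# build the Tagging string from a dict instead of hand-writing it
tags = {"key1": "value1", "key2": "value2"}
tagging = urlencode(tags)
print(tagging)  # key1=value1&key2=value2
```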

# print a text object from the s3 bucket line by line
with open_stream("s3://bucket/folder/key.txt", "r") as stream:
    for line in stream:
        print(line)

# save a plain text object with the ONEZONE_IA storage class
with open_stream(
    "s3://bucket/folder/key.txt?StorageClass=ONEZONE_IA",
    mode="wb",
    content_type="text/plain",
    encoding="utf-8"
) as stream:
    stream.write(b"hello world")

# save a local file to S3
with open_stream("key.txt", "rb") as local_source, \
     open_stream("s3://bucket/folder/key.txt", "wb") as s3_sink:
    local_source.pipe(s3_sink)
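The pipe call above streams the source into the sink chunk by chunk. Conceptually it resembles `shutil.copyfileobj` on ordinary file objects, as in this local stand-in sketch (not iotoolz code):

```python
import os
import shutil
import tempfile

# copy one file-like object into another, chunk by chunk --
# a local stand-in for local_source.pipe(s3_sink)
src_path = os.path.join(tempfile.mkdtemp(), "key.txt")
dst_path = src_path + ".copy"

with open(src_path, "w") as f:
    f.write("hello world")

with open(src_path, "rb") as source, open(dst_path, "wb") as sink:
    shutil.copyfileobj(source, sink)

with open(dst_path) as f:
    print(f.read())  # hello world
```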

Last update: October 27, 2020