S3¶
Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance.
iotoolz.extensions.S3Stream is the stream extension that allows you to interact with S3 objects like a file object.
You can use the class directly or through the iotoolz.streams
functions.
boto3
must be installed to useS3Stream
# install iotoolz with the optional dependency boto3
pip install iotoolz[boto3]
Usage¶
Usage with S3Stream¶
import json
import pandas as pd
from iotoolz.extensions import S3Stream
# read a ndjson object from the s3 bucket
with S3Stream("s3://bucket/folder/key.ndjson", mode="r") as stream:
for line in stream:
print(json.loads(line))
# write text object to the s3 bucket
# query string will take precendence over kwargs
with S3Stream(
"s3://bucket/folder/key.ndjson?StorageClass=ONEZONE_IA",
mode="w",
content_type="text/plain",
StorageClass="REDUCED_REDUNDANCY",
) as stream:
stream.write("hello world")
# read a csv and load into pandas
with S3Stream(
"s3://bucket/folder/key.csv", mode="r"
) as stream:
df = pd.read_csv(stream)
Usage with iotoolz.streams¶
from iotoolz.streams import open_stream, register_stream, set_schema_kwargs
# set tag for all uploaded s3 resources with scheme s3://
set_schema_kwargs(
"s3",
Tagging="key1=value1&key2=value2",
Metadata={"meta-key": "meta-value"},
StorageClass="REDUCED_REDUNDANCY"
)
# print line by line some data in from a https endpoint
with open_stream("s3://bucket/folder/key.txt", "r") as stream:
for line in stream:
print(line)
# save a plain text resource to a ONEZONE_IA storageclass
with open_stream(
"s3://bucket/folder/key.txt?StorageClass=ONEZONE_IA",
mode="wb",
content_type="text/plain",
encoding="utf-8"
) as stream:
stream.write(b"hello world")
# save a local file to S3
with open_stream("key.txt", "rb") as csv_source,
open_stream("s3://bucket/folder/key.txt", "wb") as s3_sink:
csv_source.pipe(s3_sink)
Last update: October 27, 2020