
S3Stream

iotoolz.extensions.S3Stream is a stream interface implemented with the AWS boto3 package.

NOTE

This is an extension module - i.e. you will need to pip install iotoolz[boto3] before you can use this stream interface.

iotoolz.extensions.s3.S3Stream

S3Stream is the stream interface to the AWS S3 object store.

See https://boto3.amazonaws.com/v1/documentation/api/latest/reference/customizations/s3.html#boto3.s3.transfer.S3Transfer

supported_schemas: Set[str]

Methods

__init__(self, uri, mode='r', buffering=-1, encoding=None, newline=None, content_type='', inmem_size=None, delimiter=None, chunk_size=8192, client=None, multipart_threshold=None, max_concurrency=None, multipart_chunksize=None, num_download_attempts=None, max_io_queue=None, io_chunksize=None, use_threads=None, **kwargs) special

Creates a new instance of S3Stream.

See also: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/customizations/s3.html#boto3.s3.transfer.S3Transfer

Parameters:

Name Type Description Default
uri str

uri string to the resource.

required
mode str

same as "open"; which modes are supported depends on the actual implementation. Defaults to "r".

'r'
buffering int

same as "open". Defaults to -1.

-1
encoding str

encoding used to decode bytes to str. Defaults to None.

None
newline str

same as "open". Defaults to None.

None
content_type str

mime type for the resource. Defaults to "".

''
inmem_size int

max size before buffer rollover from mem to disk. Defaults to None (i.e. never - may raise MemoryError).

None
delimiter Union[str, bytes]

delimiter used for determining line boundaries. Defaults to None.

None
chunk_size int

chunk size when iterating bytes stream. Defaults to io.DEFAULT_BUFFER_SIZE.

8192
client boto3 client

use the provided boto3 client to interface with S3. Defaults to None.

None
multipart_threshold int

The transfer size threshold for which multipart uploads, downloads, and copies will automatically be triggered. Defaults to 8388608.

None
max_concurrency int

The maximum number of threads that will be making requests to perform a transfer. If use_threads is set to False, the value provided is ignored as the transfer will only ever use the main thread. Defaults to 10.

None
multipart_chunksize int

The partition size of each part for a multipart transfer. Defaults to 8388608.

None
num_download_attempts int

The number of download attempts that will be retried upon errors with downloading an object in S3. Note that these retries account for errors that occur when streaming down the data from s3 (i.e. socket errors and read timeouts that occur after receiving an OK response from s3). Other retryable exceptions such as throttling errors and 5xx errors are already retried by botocore (this default is 5). This does not take into account the number of exceptions retried by botocore. Defaults to 5.

None
max_io_queue int

The maximum amount of read parts that can be queued in memory to be written for a download. The size of each of these read parts is at most the size of io_chunksize. Defaults to 100.

None
io_chunksize int

The max size of each chunk in the io queue. Currently, this is also the size used when read is called on the downloaded stream. Defaults to 262144.

None
use_threads bool

If True, threads will be used when performing S3 transfers. If False, no threads will be used in performing transfers: all logic will be run in the main thread. Defaults to True.

None
**kwargs

Additional ExtraArgs which will be passed to the 'boto3.s3.transfer.S3Transfer' client.

{}
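As a rough sketch of how the constructor parameters above might be combined, the snippet below builds a set of per-stream overrides and passes them to S3Stream. The helper names (stream_kwargs, open_object), the "rb" mode, and the tuning values are illustrative assumptions rather than recommendations from the library; only the keyword names come from the signature documented above.

```python
from typing import Any, Dict

MIB = 1024 * 1024


def stream_kwargs(large_files: bool) -> Dict[str, Any]:
    """Build keyword arguments for S3Stream.__init__ (hypothetical helper)."""
    kwargs: Dict[str, Any] = {"mode": "rb", "chunk_size": 8192}
    if large_files:
        # Assumed tuning values for illustration only.
        kwargs.update(
            multipart_threshold=64 * MIB,
            multipart_chunksize=16 * MIB,
            max_concurrency=4,
        )
    return kwargs


def open_object(uri: str, large_files: bool = False):
    """Create an S3Stream for `uri` (hypothetical helper)."""
    # Imported lazily so this module loads without iotoolz[boto3] installed.
    from iotoolz.extensions.s3 import S3Stream

    return S3Stream(uri, **stream_kwargs(large_files))
```

Parameters left at None in the constructor presumably fall back to the class-level transfer defaults, so only the values that differ need to be passed.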

exists(self)

Whether the stream points to an existing resource.

is_dir(self)

Whether the stream points to an existing directory.

is_file(self)

Whether the stream points to an existing file.

iter_dir_(self)

Yields a tuple of the uri and the metadata for each entry in a directory.

mkdir(self, mode=511, parents=False, exist_ok=False)

This method does nothing as you do not need to create a 'folder' for an object store.
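To illustrate why no 'folder' needs to be created, here is a minimal sketch of how directories can be modelled on a flat key namespace: a path behaves like a directory as soon as some object key starts with it as a prefix. The function names and sample keys are hypothetical, and this is not a description of how iotoolz implements is_dir or is_file.

```python
from typing import Iterable


def looks_like_dir(keys: Iterable[str], path: str) -> bool:
    """True if any key lives under `path` treated as a prefix."""
    prefix = path.rstrip("/") + "/"
    return any(key.startswith(prefix) for key in keys)


def looks_like_file(keys: Iterable[str], path: str) -> bool:
    """True if `path` is exactly one of the object keys."""
    return path in set(keys)


# Hypothetical object keys in a bucket.
KEYS = ["logs/2020/10/app.log", "logs/readme.txt"]
```

With these sample keys, "logs/2020" behaves like a directory even though no object with that exact key exists.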

read_to_iterable_(self, uri, chunk_size, fileobj, **kwargs)

Downloads the S3 object to buffer with 'boto3.s3.transfer.S3Transfer'.

rmdir(self, ignore_errors=False, **kwargs)

Remove the entire directory.

set_default_client(client) classmethod

Set the default boto3 client to use.

Parameters:

Name Type Description Default
client boto3 client

boto3 client object.

required
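One possible way to use this classmethod is sketched below: create a boto3 S3 client once and register it as the default. The helper names, the region, and the optional endpoint_url (for S3-compatible services) are assumptions made for the example; boto3.client("s3", ...) and set_default_client are the only library calls involved.

```python
from typing import Any, Dict, Optional


def client_kwargs(region_name: str, endpoint_url: Optional[str] = None) -> Dict[str, Any]:
    """Collect keyword arguments for boto3.client (hypothetical helper)."""
    kwargs: Dict[str, Any] = {"region_name": region_name}
    if endpoint_url:
        # Only set when targeting an S3-compatible endpoint.
        kwargs["endpoint_url"] = endpoint_url
    return kwargs


def install_default_client(region_name: str, endpoint_url: Optional[str] = None):
    """Create a boto3 S3 client and make it the S3Stream default."""
    # Imported lazily so this module loads without iotoolz[boto3] installed.
    import boto3
    from iotoolz.extensions.s3 import S3Stream

    client = boto3.client("s3", **client_kwargs(region_name, endpoint_url))
    S3Stream.set_default_client(client)
    return client
```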

set_default_download_args(**kwargs) classmethod

Set the default ExtraArgs for the 'boto3.s3.transfer.S3Transfer' download client.

See also: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/customizations/s3.html#boto3.s3.transfer.S3Transfer

The available kwargs are:

'VersionId', 'SSECustomerAlgorithm', 'SSECustomerKey', 'SSECustomerKeyMD5', 'RequestPayer'
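Since only the names listed above are accepted, it can help to check keyword arguments before registering them as defaults. The following is a minimal sketch of such a check; check_download_args is a hypothetical helper, and the allowed set is copied from the list above rather than read from boto3 at runtime.

```python
from typing import Any, Dict

# Copied from the list of available kwargs above.
ALLOWED_DOWNLOAD_ARGS = frozenset(
    {
        "VersionId",
        "SSECustomerAlgorithm",
        "SSECustomerKey",
        "SSECustomerKeyMD5",
        "RequestPayer",
    }
)


def check_download_args(**kwargs: Any) -> Dict[str, Any]:
    """Return kwargs unchanged, or raise ValueError on an unlisted name."""
    unknown = sorted(set(kwargs) - ALLOWED_DOWNLOAD_ARGS)
    if unknown:
        raise ValueError(f"unsupported download args: {unknown}")
    return kwargs
```

The validated dictionary could then be passed on as S3Stream.set_default_download_args(**checked).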

set_default_transfer_config(multipart_threshold=8388608, max_concurrency=10, multipart_chunksize=8388608, num_download_attempts=5, max_io_queue=100, io_chunksize=262144, use_threads=True) classmethod

Set the default transfer config.

See also https://boto3.amazonaws.com/v1/documentation/api/latest/reference/customizations/s3.html#boto3.s3.transfer.TransferConfig

Parameters:

Name Type Description Default
multipart_threshold int

The transfer size threshold for which multipart uploads, downloads, and copies will automatically be triggered. Defaults to 8388608.

8388608
max_concurrency int

The maximum number of threads that will be making requests to perform a transfer. If use_threads is set to False, the value provided is ignored as the transfer will only ever use the main thread. Defaults to 10.

10
multipart_chunksize int

The partition size of each part for a multipart transfer. Defaults to 8388608.

8388608
num_download_attempts int

The number of download attempts that will be retried upon errors with downloading an object in S3. Note that these retries account for errors that occur when streaming down the data from s3 (i.e. socket errors and read timeouts that occur after receiving an OK response from s3). Other retryable exceptions such as throttling errors and 5xx errors are already retried by botocore (this default is 5). This does not take into account the number of exceptions retried by botocore. Defaults to 5.

5
max_io_queue int

The maximum amount of read parts that can be queued in memory to be written for a download. The size of each of these read parts is at most the size of io_chunksize. Defaults to 100.

100
io_chunksize int

The max size of each chunk in the io queue. Currently, this is also the size used when read is called on the downloaded stream. Defaults to 262144.

262144
use_threads bool

If True, threads will be used when performing S3 transfers. If False, no threads will be used in performing transfers: all logic will be run in the main thread. Defaults to True.

True
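To make the interaction between multipart_threshold and multipart_chunksize concrete, the sketch below estimates whether a transfer of a given size would be split and into how many parts. It is a simplified model under stated assumptions: multipart is taken to start at sizes greater than or equal to the threshold, and S3's own limits on part size and part count are ignored. plan_transfer and TransferPlan are hypothetical names, not part of iotoolz or boto3.

```python
import math
from typing import NamedTuple


class TransferPlan(NamedTuple):
    multipart: bool
    parts: int


def plan_transfer(
    size: int,
    multipart_threshold: int = 8388608,
    multipart_chunksize: int = 8388608,
) -> TransferPlan:
    """Estimate how a transfer of `size` bytes would be split (simplified model)."""
    if size < multipart_threshold:
        # Below the threshold: a single request, no multipart.
        return TransferPlan(False, 1)
    # At or above the threshold: one part per chunk, last part may be smaller.
    return TransferPlan(True, math.ceil(size / multipart_chunksize))
```

Under this model and the default values, a 20 MiB object would be moved in three parts, while a 5 MiB object would go in a single request.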

set_default_upload_args(**kwargs) classmethod

Set the default ExtraArgs for the 'boto3.s3.transfer.S3Transfer' upload client.

See also: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/customizations/s3.html#boto3.s3.transfer.S3Transfer

The available kwargs are:

'ACL', 'CacheControl', 'ContentDisposition', 'ContentEncoding', 'ContentLanguage', 'ContentType', 'Expires', 'GrantFullControl', 'GrantRead', 'GrantReadACP', 'GrantWriteACP', 'Metadata', 'RequestPayer', 'ServerSideEncryption', 'StorageClass', 'SSECustomerAlgorithm', 'SSECustomerKey', 'SSECustomerKeyMD5', 'SSEKMSKeyId', 'Tagging', 'WebsiteRedirectLocation'
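As an illustration of how a few of these kwargs might be assembled, the sketch below builds a dictionary with server-side encryption and object tags and registers it as the upload default. The helper names and the chosen values are assumptions for the example; the "Tagging" value is encoded as URL query parameters, which is the form S3 expects for object tag sets.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlencode


def default_upload_args(
    encrypt: bool = True, tags: Optional[Dict[str, str]] = None
) -> Dict[str, Any]:
    """Build an ExtraArgs dict for uploads (hypothetical helper)."""
    args: Dict[str, Any] = {}
    if encrypt:
        args["ServerSideEncryption"] = "AES256"
    if tags:
        # S3 expects the tag set as a URL-encoded query string.
        args["Tagging"] = urlencode(tags)
    return args


def install_default_upload_args(**kwargs: Any) -> None:
    """Register kwargs as the S3Stream upload defaults."""
    # Imported lazily so this module loads without iotoolz[boto3] installed.
    from iotoolz.extensions.s3 import S3Stream

    S3Stream.set_default_upload_args(**kwargs)
```

A call such as install_default_upload_args(**default_upload_args(tags={"team": "data"})) would then apply these settings to subsequent uploads.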

stats_(self)

Retrieve the StreamInfo.

Delete and remove the resource.

write_from_fileobj_(self, uri, fileobj, size, **kwargs)

Uploads the data in the buffer with 'boto3.s3.transfer.S3Transfer'.


Last update: October 27, 2020