S3Stream¶
iotoolz.extensions.S3Stream is a stream interface implemented with the AWS boto3 package.
NOTE
This is an extension module, i.e. you will need to pip install iotoolz[boto3] before you can use this stream interface.
iotoolz.extensions.s3.S3Stream
¶
S3Stream is the stream interface to the AWS S3 object store.
supported_schemas: Set[str]
¶
Methods¶
__init__(self, uri, mode='r', buffering=-1, encoding=None, newline=None, content_type='', inmem_size=None, delimiter=None, chunk_size=8192, client=None, multipart_threshold=None, max_concurrency=None, multipart_chunksize=None, num_download_attempts=None, max_io_queue=None, io_chunksize=None, use_threads=None, **kwargs)
special
¶
Creates a new instance of S3Stream.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
uri | str | uri string to the resource. | required |
mode | str | same as "open" - support depends on the actual implementation. Defaults to "r". | 'r' |
buffering | int | same as "open". Defaults to -1. | -1 |
encoding | str | encoding used to decode bytes to str. Defaults to None. | None |
newline | str | same as "open". Defaults to None. | None |
content_type | str | mime type for the resource. Defaults to "". | '' |
inmem_size | int | max size before the buffer rolls over from memory to disk. Defaults to None (i.e. never - may raise MemoryError). | None |
delimiter | Union[str, bytes] | delimiter used for determining line boundaries. Defaults to None. | None |
chunk_size | int | chunk size when iterating over the bytes stream. Defaults to io.DEFAULT_BUFFER_SIZE. | 8192 |
client | boto3 S3 client | use the provided boto3 client to interface with S3. Defaults to None. | None |
multipart_threshold | int | The transfer size threshold above which multipart uploads, downloads, and copies are automatically triggered. Defaults to 8388608. | None |
max_concurrency | int | The maximum number of threads making requests to perform a transfer. Ignored if use_threads is False, as the transfer will only ever use the main thread. Defaults to 10. | None |
multipart_chunksize | int | The partition size of each part in a multipart transfer. Defaults to 8388608. | None |
num_download_attempts | int | The number of download attempts retried upon errors while downloading an object from S3. These retries cover errors that occur while streaming the data down from S3 (i.e. socket errors and read timeouts after receiving an OK response); other retryable exceptions such as throttling and 5xx errors are already retried by botocore (its default is 5) and are not counted here. Defaults to 5. | None |
max_io_queue | int | The maximum number of read parts that can be queued in memory to be written for a download. Each read part is at most io_chunksize bytes. Defaults to 100. | None |
io_chunksize | int | The max size of each chunk in the io queue. Currently this is also the size used when read is called on the downloaded stream. Defaults to 262144. | None |
use_threads | bool | If True, threads are used when performing S3 transfers; if False, all logic is run in the main thread. Defaults to True. | None |
**kwargs | | Additional ExtraArgs passed to the 'boto3.s3.transfer.S3Transfer' client. | {} |
exists(self)
¶
Whether the stream points to an existing resource.
is_dir(self)
¶
Whether the stream points to an existing directory.
is_file(self)
¶
Whether the stream points to an existing file.
iter_dir_(self)
¶
Yields tuples of the uri and metadata for each resource in a directory.
mkdir(self, mode=511, parents=False, exist_ok=False)
¶
This method does nothing, as there is no need to create a 'folder' in an object store.
read_to_iterable_(self, uri, chunk_size, fileobj, **kwargs)
¶
Downloads the S3 object to buffer with 'boto3.s3.transfer.S3Transfer'.
rmdir(self, ignore_errors=False, **kwargs)
¶
Remove the entire directory.
set_default_client(client)
classmethod
¶
Set the default boto3 client to use.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
client | boto3 S3 client | boto3 client object. | required |
set_default_download_args(**kwargs)
classmethod
¶
Set the default ExtraArgs to the 'boto3.s3.transfer.S3Transfer' download client.
The available kwargs are:
'VersionId', 'SSECustomerAlgorithm', 'SSECustomerKey', 'SSECustomerKeyMD5', 'RequestPayer'
set_default_transfer_config(multipart_threshold=8388608, max_concurrency=10, multipart_chunksize=8388608, num_download_attempts=5, max_io_queue=100, io_chunksize=262144, use_threads=True)
classmethod
¶
Set the default transfer config.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
multipart_threshold | int | The transfer size threshold above which multipart uploads, downloads, and copies are automatically triggered. Defaults to 8388608. | 8388608 |
max_concurrency | int | The maximum number of threads making requests to perform a transfer. Ignored if use_threads is False, as the transfer will only ever use the main thread. Defaults to 10. | 10 |
multipart_chunksize | int | The partition size of each part in a multipart transfer. Defaults to 8388608. | 8388608 |
num_download_attempts | int | The number of download attempts retried upon errors while downloading an object from S3. These retries cover errors that occur while streaming the data down from S3 (i.e. socket errors and read timeouts after receiving an OK response); other retryable exceptions such as throttling and 5xx errors are already retried by botocore (its default is 5) and are not counted here. Defaults to 5. | 5 |
max_io_queue | int | The maximum number of read parts that can be queued in memory to be written for a download. Each read part is at most io_chunksize bytes. Defaults to 100. | 100 |
io_chunksize | int | The max size of each chunk in the io queue. Currently this is also the size used when read is called on the downloaded stream. Defaults to 262144. | 262144 |
use_threads | bool | If True, threads are used when performing S3 transfers; if False, all logic is run in the main thread. Defaults to True. | True |
set_default_upload_args(**kwargs)
classmethod
¶
Set the default ExtraArgs to the 'boto3.s3.transfer.S3Transfer' upload client.
The available kwargs are:
'ACL', 'CacheControl', 'ContentDisposition', 'ContentEncoding', 'ContentLanguage', 'ContentType', 'Expires', 'GrantFullControl', 'GrantRead', 'GrantReadACP', 'GrantWriteACP', 'Metadata', 'RequestPayer', 'ServerSideEncryption', 'StorageClass', 'SSECustomerAlgorithm', 'SSECustomerKey', 'SSECustomerKeyMD5', 'SSEKMSKeyId', 'Tagging', 'WebsiteRedirectLocation'
stats_(self)
¶
Retrieve the StreamInfo.
unlink(self, missing_ok=True, **kwargs)
¶
Delete and remove the resource.
write_from_fileobj_(self, uri, fileobj, size, **kwargs)
¶
Uploads the data in the buffer with 'boto3.s3.transfer.S3Transfer'.