tabsdata.MongoDBDestination#

class MongoDBDestination(
uri: str,
collections_with_ids: tuple[str, str | None] | List[tuple[str, str | None]],
credentials: UserPasswordCredentials = None,
connection_options: dict = None,
if_collection_exists: Literal['append', 'replace'] = 'append',
use_trxs: bool = False,
docs_per_trx: int = 1000,
maintain_order: bool = False,
update_existing: bool = True,
fail_on_duplicate_key: bool = True,
log_intermediate_files: bool = False,
**kwargs,
)#

Bases: DestinationPlugin

__init__(
uri: str,
collections_with_ids: tuple[str, str | None] | List[tuple[str, str | None]],
credentials: UserPasswordCredentials = None,
connection_options: dict = None,
if_collection_exists: Literal['append', 'replace'] = 'append',
use_trxs: bool = False,
docs_per_trx: int = 1000,
maintain_order: bool = False,
update_existing: bool = True,
fail_on_duplicate_key: bool = True,
log_intermediate_files: bool = False,
**kwargs,
)#
Initializes the MongoDBDestination with the configuration desired to store the data.

Parameters:
  • uri (str) – The URI of the MongoDB database.

  • collections_with_ids (tuple[str, str | None] | List[tuple[str, str | None]]) – A tuple or list of tuples pairing each collection with the name of the field that will be used as the unique identifier. For example, to store the data in a collection called 'my_collection' in database 'my_database' and use the field 'username' as the unique identifier, you would provide the tuple ('my_database.my_collection', 'username'). To have MongoDB autogenerate the id, you would provide ('my_database.my_collection', None). See the construction example after this parameter list.

  • credentials (UserPasswordCredentials, optional) – The credentials used to connect to the database. If None, no credentials will be used.

  • connection_options (dict, optional) – A dictionary with the options to pass to the pymongo.MongoClient constructor. For example, if you want to set the timeout to 1000 milliseconds, you would provide the following dictionary: {'serverSelectionTimeoutMS': 1000}.

  • if_collection_exists (Literal["append", "replace"], optional) – The action to take if the collection already exists. If 'append', the data will be appended to the existing collection. If 'replace', the existing collection will be replaced with the new data. Defaults to 'append'.

  • use_trxs (bool, optional) – Whether to use a transaction when storing the data in the database. If True, the data will be stored in a transaction, which will ensure that all the data is stored or none of it is (requires that the database is configured with a replica set). If False, the data will be stored without a transaction, which may lead to inconsistent data in the database. Defaults to False.

  • docs_per_trx (int, optional) – The maximum number of documents to store in a single transaction. If the number of documents to store exceeds this number, the data will be stored in multiple transactions. Defaults to 1000.

  • maintain_order (bool, optional) – Whether to maintain the order of the documents when storing them in the database. If True, the documents will be stored in the same order as they are in the TableFrame. If False, the documents will be stored in the order that they are processed. Defaults to False.

  • update_existing (bool, optional) – Whether to update the existing documents in the database. If True, the documents will be updated if they already exist in the database. If False, the documents will be inserted without updating the existing documents, and if a document with the same id already exists, the execution will fail. Defaults to True.

  • fail_on_duplicate_key (bool, optional) – Whether to raise an exception if a document with the same id already exists in the collection. If True, an exception will be raised. If False, the operation will continue without raising an exception. Defaults to True.

  • log_intermediate_files (bool, optional) – Whether to log when each batch of data is stored in the database. If True, a message will be logged for each batch of data stored. If False, no message will be logged until all the data for a single collection has been stored. Defaults to False.
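
A minimal construction sketch based on the parameters above. The URI, database and collection names, field names, and credential values are illustrative placeholders, not values prescribed by the library:

    import tabsdata as td

    # Hypothetical connection details; replace with your own deployment values.
    # UserPasswordCredentials is assumed here to accept plain user/password strings.
    destination = td.MongoDBDestination(
        uri="mongodb://localhost:27017",
        collections_with_ids=[
            ("my_database.my_collection", "username"),  # use 'username' as the unique identifier
            ("my_database.audit_log", None),            # let MongoDB autogenerate the id
        ],
        credentials=td.UserPasswordCredentials("mongo_user", "mongo_password"),
        connection_options={"serverSelectionTimeoutMS": 1000},
        if_collection_exists="append",
        use_trxs=True,      # transactions require the server to run as a replica set
        docs_per_trx=1000,
    )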

Methods

__init__(uri, collections_with_ids[, ...])

Initializes the MongoDBDestination with the configuration desired to store the data.

chunk(working_dir, *results)

Trigger the exporting of the data to local parquet chunks.

stream(working_dir, *results)

Trigger the exporting of the data.

write(files)

This method is used to write the files to the database.
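
A hedged sketch of how this destination plugin is typically attached to a tabsdata subscriber, so that the framework drives the exporting methods above with the function's results. The @td.subscriber wiring, the registered table name, and the function body are assumptions for illustration and may need adjusting to your deployment:

    import tabsdata as td

    mongo = td.MongoDBDestination(
        uri="mongodb://localhost:27017",
        collections_with_ids=("my_database.users", "username"),
    )

    # Assumed wiring: "users" stands in for a table registered in your tabsdata
    # server; the subscriber simply forwards the TableFrame to the destination.
    @td.subscriber(
        tables=["users"],
        destination=mongo,
    )
    def export_users(users: td.TableFrame) -> td.TableFrame:
        return users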

Attributes

collections_with_ids

connection_options

credentials

if_collection_exists

uri