File Formats

File Format

class FileFormat

Bases: ABC

The abstract base class shared by all supported file formats.
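The format classes documented on this page all derive from FileFormat, which is declared as an abstract base class. The sketch below mirrors that hierarchy with the standard library's abc module to show how calling code might pick a concrete format. It is a hypothetical illustration only: the `extension` member and the `format_for_path` helper are assumptions that are not part of the documented API, and the real classes may define different members.

```python
from abc import ABC, abstractmethod


class FileFormat(ABC):
    """Hypothetical stand-in for the documented abstract base class."""

    @property
    @abstractmethod
    def extension(self) -> str:
        """File extension conventionally used for this format (assumed member)."""


# Each concrete format overrides the abstract property with a class attribute,
# which makes the subclass instantiable.
class CSVFormat(FileFormat):
    extension = ".csv"


class NDJSONFormat(FileFormat):
    extension = ".ndjson"


class LogFormat(FileFormat):
    extension = ".log"


class ParquetFormat(FileFormat):
    extension = ".parquet"


def format_for_path(path: str) -> FileFormat:
    """Pick a concrete format from a file name's extension (hypothetical helper)."""
    for cls in (CSVFormat, NDJSONFormat, LogFormat, ParquetFormat):
        if path.lower().endswith(cls.extension):
            return cls()
    raise ValueError(f"unsupported file type: {path}")


print(type(format_for_path("report.csv")).__name__)  # CSVFormat
```

Because FileFormat still has an unimplemented abstract member, calling `FileFormat()` directly raises a TypeError, while every subclass can be instantiated.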

CSV Format

class CSVFormat(separator: str | int = ',', quote_char: str | int = '"', eol_char: str | int = '\n', input_encoding: str = 'Utf8', input_null_values: list | None = None, input_missing_is_null: bool = True, input_truncate_ragged_lines: bool = False, input_comment_prefix: str | int | None = None, input_try_parse_dates: bool = False, input_decimal_comma: bool = False, input_has_header: bool = True, input_skip_rows: int = 0, input_skip_rows_after_header: int = 0, input_raise_if_empty: bool = True, input_ignore_errors: bool = False, output_include_header: bool = True, output_datetime_format: str | None = None, output_date_format: str | None = None, output_time_format: str | None = None, output_float_scientific: bool | None = None, output_float_precision: int | None = None, output_null_value: str | None = None, output_quote_style: str | None = None, output_maintain_order: bool = True)

Bases: FileFormat

Initializes the CSV format object.

Parameters:
  • separator – (optional) The field separator of the CSV file.

  • quote_char – (optional) The quote character of the CSV file.

  • eol_char – (optional) The end of line character of the CSV file.

  • input_encoding – (optional) The encoding of the CSV file. Only used when importing data.

  • input_null_values – (optional) Values to interpret as null in the CSV file. Only used when importing data.

  • input_missing_is_null – (optional) Whether missing values should be marked as null. Only used when importing data.

  • input_truncate_ragged_lines – (optional) Whether to truncate ragged lines of the CSV file. Only used when importing data.

  • input_comment_prefix – (optional) The comment prefix of the CSV file. Only used when importing data.

  • input_try_parse_dates – (optional) Whether to try to parse dates in the CSV file. Only used when importing data.

  • input_decimal_comma – (optional) Whether the CSV file uses a comma as the decimal separator. Only used when importing data.

  • input_has_header – (optional) Whether the CSV file has a header row. Only used when importing data.

  • input_skip_rows – (optional) Number of rows to skip at the start of the CSV file. Only used when importing data.

  • input_skip_rows_after_header – (optional) Number of rows to skip after the header row of the CSV file. Only used when importing data.

  • input_raise_if_empty – (optional) Whether to raise an error if the CSV file is empty. Only used when importing data.

  • input_ignore_errors – (optional) Whether to ignore errors while loading the CSV file. Only used when importing data.

  • output_include_header – (optional) Whether to include a header row in the CSV output. Only used when exporting data.

  • output_datetime_format – (optional) A format string, with the specifiers defined by the chrono Rust crate. If no format is specified, the default fractional-second precision is inferred from the maximum time unit found in the frame’s Datetime columns (if any). Only used when exporting data.

  • output_date_format – (optional) A format string, with the specifiers defined by the chrono Rust crate. Only used when exporting data.

  • output_time_format – (optional) A format string, with the specifiers defined by the chrono Rust crate. Only used when exporting data.

  • output_float_scientific – (optional) Whether to always use scientific notation (True), never use it (False), or decide automatically (None). Only used when exporting data.

  • output_float_precision – (optional) Number of decimal places to write. Only used when exporting data.

  • output_null_value – (optional) A string representing null values (defaulting to the empty string). Only used when exporting data.

  • output_quote_style – (optional) Determines the quoting strategy used. Only used when exporting data. One of:

      ◦ necessary (default): Puts quotes around fields only when necessary, that is, when a field contains a quote, the separator, or a record terminator. Quotes are also necessary when writing an empty record, which is otherwise indistinguishable from a record with one empty field.

      ◦ always: Puts quotes around every field.

      ◦ never: Never puts quotes around fields, even if that results in invalid CSV data (for example, by not quoting strings that contain the separator).

      ◦ non_numeric: Puts quotes around all non-numeric fields. Any field that does not parse as a valid float or integer is quoted, even if the quotes aren't strictly necessary.

  • output_maintain_order – (optional) Maintain the order in which data is processed. Setting this to False will be slightly faster. Only used when exporting data.
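The four output_quote_style strategies described above can be illustrated with a small field-quoting function written directly from those descriptions. This is a sketch of the documented rules, not the library's actual writer: the names `quote_field` and `write_row` are hypothetical, embedded quote characters are assumed to be escaped by doubling them (the common CSV convention), and Python's `float()` is assumed as the "parses as a number" test.

```python
def _is_numeric(field: str) -> bool:
    """Return True if the field parses as an integer or a float."""
    # Python's float() is used as the parse test here (an assumption).
    try:
        float(field)
    except ValueError:
        return False
    return True


def quote_field(
    field: str,
    style: str = "necessary",
    separator: str = ",",
    quote_char: str = '"',
    eol_char: str = "\n",
) -> str:
    """Quote a single field according to one of the four documented styles."""
    # Embedded quote characters are escaped by doubling them (assumed convention).
    quoted = quote_char + field.replace(quote_char, quote_char * 2) + quote_char
    if style == "always":
        return quoted
    if style == "never":
        # Can yield invalid CSV, e.g. a field that contains the separator.
        return field
    if style == "non_numeric":
        return field if _is_numeric(field) else quoted
    if style == "necessary":
        # Quote only if the field contains a quote, the separator or the record terminator.
        needs_quotes = any(token in field for token in (quote_char, separator, eol_char))
        return quoted if needs_quotes else field
    raise ValueError(f"unknown quote style: {style!r}")


def write_row(
    fields: list[str],
    style: str = "necessary",
    separator: str = ",",
    quote_char: str = '"',
    eol_char: str = "\n",
) -> str:
    """Join the quoted fields of one record and append the end-of-line character."""
    if style == "necessary" and fields == [""]:
        # A record with one empty field would otherwise be a bare line ending,
        # indistinguishable from an empty record, so it is quoted.
        return quote_char * 2 + eol_char
    cells = (quote_field(f, style, separator, quote_char, eol_char) for f in fields)
    return separator.join(cells) + eol_char


row = ["id", "a,b", 'say "hi"', "3.5"]
print(write_row(row), end="")                 # id,"a,b","say ""hi""",3.5
print(write_row(row, "always"), end="")       # "id","a,b","say ""hi""","3.5"
print(write_row(row, "never"), end="")        # id,a,b,say "hi",3.5
print(write_row(row, "non_numeric"), end="")  # "id","a,b","say ""hi""",3.5
```

Note how the `never` output has become ambiguous: a reader can no longer tell that `a,b` was a single field, which is exactly the invalid-CSV risk the documentation warns about.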

NDJSON Format

class NDJSONFormat

Bases: FileFormat

The class of the NDJSON file format.
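NDJSON (newline-delimited JSON) stores one JSON value, typically an object, per line, so a file can be written and read record by record. The standard-library sketch below illustrates that layout only; it does not use NDJSONFormat, and the helper names `to_ndjson` and `from_ndjson` are hypothetical.

```python
import json
from typing import Any, Iterable


def to_ndjson(records: Iterable[dict[str, Any]]) -> str:
    """Serialize each record as compact JSON on its own line."""
    # json.dumps escapes newlines inside strings, so each record stays on one line.
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)


def from_ndjson(text: str) -> list[dict[str, Any]]:
    """Parse one JSON value per non-blank line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


records = [{"id": 1, "msg": "ok"}, {"id": 2, "msg": "line\nbreak"}]
text = to_ndjson(records)
print(text, end="")
# {"id":1,"msg":"ok"}
# {"id":2,"msg":"line\nbreak"}
assert from_ndjson(text) == records
```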

Log Format

class LogFormat

Bases: FileFormat

The class of the log file format.

Parquet Format

class ParquetFormat

Bases: FileFormat

The class of the Parquet file format.