File Formats
File Format
CSV Format
- class CSVFormat(separator: str | int = ',', quote_char: str | int = '"', eol_char: str | int = '\n', input_encoding: str = 'Utf8', input_null_values: list | None = None, input_missing_is_null: bool = True, input_truncate_ragged_lines: bool = False, input_comment_prefix: str | int | None = None, input_try_parse_dates: bool = False, input_decimal_comma: bool = False, input_has_header: bool = True, input_skip_rows: int = 0, input_skip_rows_after_header: int = 0, input_raise_if_empty: bool = True, input_ignore_errors: bool = False, output_include_header: bool = True, output_datetime_format: str | None = None, output_date_format: str | None = None, output_time_format: str | None = None, output_float_scientific: bool | None = None, output_float_precision: int | None = None, output_null_value: str | None = None, output_quote_style: str | None = None, output_maintain_order: bool = True)
Bases: FileFormat
Initializes the CSV format object.
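The constructor arguments fall into three groups: arguments without a prefix apply in both directions, `input_*` arguments are only used when importing data, and `output_*` arguments are only used when exporting data. The following is a minimal sketch of assembling such arguments as a plain dictionary and grouping them by prefix. The chosen values are illustrative assumptions, and the final constructor call is left as a comment because the import path of `CSVFormat` is not stated in this reference.

```python
# Illustrative keyword arguments; names come from the CSVFormat signature,
# values are assumptions chosen for the example.
csv_options = {
    "separator": ";",
    "input_decimal_comma": True,
    "input_null_values": ["NA", "-"],
    "input_skip_rows": 2,
    "output_null_value": "NULL",
    "output_float_precision": 3,
    "output_quote_style": "non_numeric",
}

# Group the option names by the direction in which they take effect.
import_only = sorted(k for k in csv_options if k.startswith("input_"))
export_only = sorted(k for k in csv_options if k.startswith("output_"))
shared = sorted(
    k for k in csv_options if not k.startswith(("input_", "output_"))
)

print("shared:", shared)
print("import only:", import_only)
print("export only:", export_only)

# With CSVFormat imported from the package this reference documents
# (import path not shown here), the options would be passed as:
# fmt = CSVFormat(**csv_options)
```

Keeping the options in a dictionary makes it easy to reuse the same settings for a source and a destination, since each side only reads the arguments relevant to it.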
- Parameters:
separator – The separator of the CSV file.
quote_char – (optional) The quote character of the CSV file.
eol_char – (optional) The end of line character of the CSV file.
input_encoding – (optional) The encoding of the CSV file. Only used when importing data.
input_null_values – (optional) The null values of the CSV file. Only used when importing data.
input_missing_is_null – (optional) Whether missing values should be marked as null. Only used when importing data.
input_truncate_ragged_lines – (optional) Whether to truncate ragged lines of the CSV file. Only used when importing data.
input_comment_prefix – (optional) The comment prefix of the CSV file. Only used when importing data.
input_try_parse_dates – (optional) Whether to try to parse dates in the CSV file. Only used when importing data.
input_decimal_comma – (optional) Whether the CSV file uses decimal comma. Only used when importing data.
input_has_header – (optional) Whether the CSV file has a header row. Only used when importing data.
input_skip_rows – (optional) How many rows should be skipped in the CSV file. Only used when importing data.
input_skip_rows_after_header – (optional) How many rows should be skipped after the header in the CSV file. Only used when importing data.
input_raise_if_empty – (optional) Whether an error should be raised for an empty CSV file. Only used when importing data.
input_ignore_errors – (optional) Whether errors encountered while loading the CSV file should be ignored. Only used when importing data.
output_include_header – (optional) Whether to include header in the CSV output. Only used when exporting data.
output_datetime_format – (optional) A format string, with the specifiers defined by the chrono Rust crate. If no format is specified, the default fractional-second precision is inferred from the maximum time unit found in the frame’s Datetime columns (if any). Only used when exporting data.
output_date_format – (optional) A format string, with the specifiers defined by the chrono Rust crate. Only used when exporting data.
output_time_format – (optional) A format string, with the specifiers defined by the chrono Rust crate. Only used when exporting data.
output_float_scientific – (optional) Whether to use scientific notation always (True), never (False), or automatically (None). Only used when exporting data.
output_float_precision – (optional) Number of decimal places to write. Only used when exporting data.
output_null_value – (optional) A string representing null values (defaulting to the empty string). Only used when exporting data.
output_quote_style – (optional) Determines the quoting strategy used. Only used when exporting data.
* necessary (default): Puts quotes around fields only when necessary. They are necessary when fields contain a quote, separator or record terminator. Quotes are also necessary when writing an empty record (which is indistinguishable from a record with one empty field).
* always: Puts quotes around every field.
* never: Never puts quotes around fields, even if that results in invalid CSV data (e.g. by not quoting strings containing the separator).
* non_numeric: Puts quotes around all fields that are non-numeric. Namely, when writing a field that does not parse as a valid float or integer, quotes are used even if they aren't strictly necessary.
output_maintain_order – (optional) Maintain the order in which data is processed. Setting this to False will be slightly faster. Only used when exporting data.
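The effect of the `output_quote_style` options can be illustrated with Python's standard `csv` module, which offers analogous quoting modes. This is only a sketch of the general behaviour described above, not the implementation used by `CSVFormat`, and the sample row is an assumption made for the example. The `never` style is omitted because the closest standard-library mode (`csv.QUOTE_NONE`) refuses to write ambiguous fields instead of emitting invalid CSV.

```python
import csv
import io

# Sample record: plain text, a float, text containing the separator,
# and an empty string.
row = ["id 7", 3.5, "a,b", ""]


def render(quoting):
    """Write the sample row with the given quoting mode and return the text."""
    buf = io.StringIO()
    csv.writer(buf, quoting=quoting, lineterminator="\n").writerow(row)
    return buf.getvalue()


# Rough standard-library analogues of the documented styles.
necessary = render(csv.QUOTE_MINIMAL)       # quote only where required
always = render(csv.QUOTE_ALL)              # quote every field
non_numeric = render(csv.QUOTE_NONNUMERIC)  # quote everything except numbers

print(necessary, end="")    # id 7,3.5,"a,b",
print(always, end="")       # "id 7","3.5","a,b",""
print(non_numeric, end="")  # "id 7",3.5,"a,b",""
```

Only the field containing the separator is quoted under the `necessary` analogue, while the `non_numeric` analogue leaves just the float unquoted.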
NDJSON Format
- class NDJSONFormat
Bases: FileFormat
The class of the NDJSON file format.
Log Format
- class LogFormat
Bases: FileFormat
The class of the log file format.
Parquet Format
- class ParquetFormat
Bases: FileFormat
The class of the Parquet file format.