File Destination Plugin
Latest: v3.2.0
This destination plugin lets you sync data from a CloudQuery source to local files in various formats. It currently supports CSV, line-delimited JSON and Parquet.
This plugin is useful in local environments, but also in production environments where scalability, performance and cost matter. For example, it can be used as part of a system that syncs sources across multiple virtual machines, uploads Parquet files to remote storage (such as S3 or GCS), and finally loads them in batch mode into data lakes such as BigQuery or Athena. If this is your end goal, you may also want to look at the more specific cloud storage destination plugins, such as S3, GCS or Azure Blob Storage.
Example
This example configures the file destination to write CSV files under path/to/files, in one subdirectory per table. You can also choose json or parquet as the output format. Note that the file plugin only supports the append write mode.
The (top level) spec section is described in the Destination Spec Reference.
kind: destination
spec:
  name: "file"
  path: "cloudquery/file"
  version: "v3.2.0"
  write_mode: "append" # file only supports 'append' mode
  spec:
    path: "path/to/files/{{TABLE}}/{{UUID}}.{{FORMAT}}"
    format: "csv" # supported values are 'csv', 'json' and 'parquet'
File Spec
This is the (nested) spec used by the file destination plugin.
- path (string) (required)
  Path template string that determines where files will be written. The path supports the following placeholder variables:
  - {{TABLE}} will be replaced with the table name
  - {{FORMAT}} will be replaced with the file format, such as csv, json or parquet
  - {{UUID}} will be replaced with a random UUID to uniquely identify each file
  - {{YEAR}} will be replaced with the current year in YYYY format
  - {{MONTH}} will be replaced with the current month in MM format
  - {{DAY}} will be replaced with the current day in DD format
  - {{HOUR}} will be replaced with the current hour in HH format
  - {{MINUTE}} will be replaced with the current minute in mm format
  Note that timestamps are in UTC and will be the current time at the time the file is written, not when the sync started.
- directory (string) (required if path is not set) (deprecated)
  Directory where all files will be written. One file will be created per table. This is now deprecated in favor of path, which allows more flexibility, and the directory option will be removed in a future version.
- format (string) (required)
  Format of the output file. Supported values are csv, json and parquet.
- no_rotate (bool) (optional)
  If set to true, the plugin will write to one file per table. Otherwise, for every batch a new file will be created with a different .<UUID> suffix.
- format_spec (map format_spec) (optional)
  Optional parameters to change the format of the file.
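As an illustration of the path placeholders, the following is a minimal sketch of a time-partitioned layout. It reuses the top-level fields from the example earlier on this page; the exports/ directory and the choice of parquet are assumptions made only for this sketch.

```yaml
kind: destination
spec:
  name: "file"
  path: "cloudquery/file"
  version: "v3.2.0"
  write_mode: "append" # file only supports 'append' mode
  spec:
    # Hypothetical layout: one directory per table, then per UTC date and hour.
    # The 'exports' prefix is an assumption, not a required name.
    path: "exports/{{TABLE}}/{{YEAR}}/{{MONTH}}/{{DAY}}/{{HOUR}}/{{UUID}}.{{FORMAT}}"
    format: "parquet"
```

Going by the placeholder descriptions, a file for a hypothetical table named my_table written at 14:05 UTC on 1 June 2023 would be expected to land at a path of the form exports/my_table/2023/06/01/14/<uuid>.parquet. Because the timestamp is taken when each file is written, a long sync can spread its output across more than one hour directory.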
format_spec
- delimiter (string) (optional) (default: ,)
  Character that will be used as the delimiter when the format is csv.
- skip_header (bool) (optional) (default: false)
  Specifies whether the header row is omitted from the first line of the file (when format is csv). By default, the header row is written.
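The two format_spec options can be combined as in the sketch below. It assumes format_spec is nested inside the plugin-level spec next to path and format, as the File Spec list suggests; the semicolon delimiter is chosen purely for illustration.

```yaml
kind: destination
spec:
  name: "file"
  path: "cloudquery/file"
  version: "v3.2.0"
  write_mode: "append" # file only supports 'append' mode
  spec:
    path: "path/to/files/{{TABLE}}/{{UUID}}.{{FORMAT}}"
    format: "csv" # format_spec options below apply to csv only
    format_spec:
      delimiter: ";"     # illustrative value; the default is ','
      skip_header: true  # default is false
```

A header-less, semicolon-separated layout like this may suit downstream loaders that take column names from a separate schema definition. Leave skip_header at its default if the consumer reads column names from the file itself.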