| Title: | Query 'Azure Data Lake Storage Gen2' with 'DuckDB' |
|---|---|
| Description: | Provides convenience utilities for using 'DuckDB' directly over datasets stored in 'Azure Data Lake Storage Gen2' (ADLS Gen2, 'abfss://'). Opens connections configured for Azure-backed 'Delta Lake' and 'Parquet' data, registers Azure credentials as 'DuckDB' secrets, and supports optional repository mirrors for restricted networks. Integrates well with 'DBI' for SQL workflows and with 'dplyr' and 'dbplyr' for lazy table queries. |
| Authors: | Pedro Baltazar [aut, cre, cph] |
| Maintainer: | Pedro Baltazar <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-06-10 12:15:12 UTC |
| Source: | https://github.com/pedrobtz/quak |
Opens a DuckDB connection and installs the azure and delta extensions.
No secret is registered — use az_set_token_secret(), az_set_sp_secret(),
or az_set_chain_secret() to supply credentials afterwards.
az_conn(conn = NULL)
conn |
An existing DuckDB connection to configure. When NULL (the default), a new connection is created. |
A DuckDB connection. The caller owns its lifetime; disconnect with
DBI::dbDisconnect(conn, shutdown = TRUE).
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn() |>
  az_set_token_secret(token = my_token)
DBI::dbDisconnect(conn, shutdown = TRUE)
## End(Not run)
Queries duckdb_settings() and returns all entries whose name contains
"azure".
az_conn_settings(conn = az_conn())
conn |
A DuckDB connection. Defaults to az_conn(). |
A tibble::tibble() with columns name, value, description.
conn <- DBI::dbConnect(duckdb::duckdb())
az_conn_settings(conn)
DBI::dbDisconnect(conn, shutdown = TRUE)
Writes a lazy table, data frame, or SQL query to an abfs:// or
abfss:// URL using DuckDB's COPY ... TO command.
az_copy_to(conn, x, url, format = c("parquet", "csv", "json"), partition_by = NULL, overwrite = FALSE)
conn |
A DuckDB connection. |
x |
A lazy dplyr::tbl(), a data frame, or a character scalar containing a SQL query. |
url |
Character scalar. Azure Blob URL to write to. |
format |
Output format. One of "parquet" (the default), "csv", or "json". |
partition_by |
Optional character vector of columns to partition by. |
overwrite |
Logical. When TRUE, existing output at url may be overwritten. Default FALSE. |
Invisibly returns url.
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
az_copy_to(
  conn,
  "SELECT * FROM events WHERE event_date >= DATE '2026-01-01'",
  "abfss://container@account/exports/events",
  format = "parquet"
)
## End(Not run)
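For the SQL-query form of x, the COPY ... TO statement that az_copy_to() issues can be sketched as plain string building. This is a minimal sketch, not quak's code: build_copy_sql() is a hypothetical helper, and mapping overwrite = TRUE to DuckDB's OVERWRITE_OR_IGNORE option is an assumption.

```r
# Hypothetical helper (not part of quak): build a DuckDB COPY ... TO statement.
build_copy_sql <- function(query, url,
                           format = c("parquet", "csv", "json"),
                           partition_by = NULL, overwrite = FALSE) {
  format <- match.arg(format)
  # Double any single quotes so the URL is a valid SQL string literal.
  quoted_url <- paste0("'", gsub("'", "''", url, fixed = TRUE), "'")
  opts <- paste0("FORMAT ", format)
  if (length(partition_by) > 0) {
    opts <- c(opts, paste0("PARTITION_BY (", paste(partition_by, collapse = ", "), ")"))
  }
  if (isTRUE(overwrite)) {
    # Assumption: overwrite = TRUE maps to OVERWRITE_OR_IGNORE.
    opts <- c(opts, "OVERWRITE_OR_IGNORE")
  }
  sprintf("COPY (%s) TO %s (%s)", query, quoted_url, paste(opts, collapse = ", "))
}

sql <- build_copy_sql(
  "SELECT * FROM events",
  "abfss://container@account/exports/events",
  partition_by = "event_date"
)
cat(sql, "\n")
```

A data frame or lazy table would first need to be turned into a query or registered on the connection; that step is not shown.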
Returns the Azure OAuth scope used in examples and token-based authentication
helpers. Configure it with options(quak.default_scope = "...") or the
QUAK_DEFAULT_SCOPE environment variable.
az_default_scope()
A character scalar OAuth scope.
az_default_scope()
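The option, then environment variable, then built-in default lookup described for az_default_scope() can be sketched in base R. The helpers resolve_setting() and default_scope() are hypothetical, and using the Azure Storage resource scope https://storage.azure.com/.default as the fallback is an assumption, not necessarily quak's built-in value.

```r
# Hypothetical helpers (not part of quak): resolve a setting from an R option,
# then an environment variable, then a hard-coded default.
resolve_setting <- function(option, env_var, default) {
  value <- getOption(option)
  if (!is.null(value) && nzchar(value)) return(value)
  value <- Sys.getenv(env_var, unset = "")
  if (nzchar(value)) return(value)
  default
}

default_scope <- function() {
  # Assumption: the Azure Storage resource scope as the built-in fallback.
  resolve_setting("quak.default_scope", "QUAK_DEFAULT_SCOPE",
                  "https://storage.azure.com/.default")
}

default_scope()
```

Checking the R option first lets an .Rprofile override a machine-wide environment variable.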
Returns DuckDB's delta_list_files() output for a Delta table.
az_delta_files(conn, url)
conn |
A DuckDB connection. |
url |
Character scalar. Azure Blob URL pointing to a Delta table. |
A tibble-like data frame with the active file manifest.
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
az_delta_files(conn, "abfss://container@account/tables/sales")
## End(Not run)
For an exact file or glob pattern, checks whether DuckDB's glob() returns
at least one match. For a plain path, also probes url/** so dataset
prefixes count as existing when they contain at least one object.
az_exists(conn, url)
conn |
A DuckDB connection. |
url |
Character scalar. Azure Blob URL or glob pattern. |
Logical scalar.
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
az_exists(conn, "abfss://container@account/data/sales")
## End(Not run)
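The exact-match-then-prefix probe used by az_exists() can be sketched without a live connection by passing the glob step in as a function. In this sketch exists_via_glob() and fake_glob() are hypothetical, fake_glob() only approximates DuckDB's glob() with utils::glob2rx(), and treating any URL containing *, ? or [ as a pattern is an assumption.

```r
# Hypothetical sketch of the az_exists() decision logic (not quak's code).
# `glob_fn` stands in for a query against DuckDB's glob() table function:
# it takes a pattern and returns a character vector of matching paths.
exists_via_glob <- function(url, glob_fn) {
  # Exact file or glob pattern: a single match is enough.
  if (length(glob_fn(url)) > 0) return(TRUE)
  # Assumption: a URL containing *, ? or [ is a pattern, so no prefix probe.
  wildcards <- c("*", "?", "[")
  if (any(vapply(wildcards, grepl, logical(1), x = url, fixed = TRUE))) {
    return(FALSE)
  }
  # Plain path: probe url/** so a dataset prefix counts as existing.
  prefix <- paste0(sub("/+$", "", url), "/**")
  length(glob_fn(prefix)) > 0
}

# Toy in-memory "storage" and a rough glob emulation via utils::glob2rx().
objects <- c(
  "abfss://container@account/data/sales/part-0.parquet",
  "abfss://container@account/data/sales/part-1.parquet"
)
fake_glob <- function(pattern) {
  grep(utils::glob2rx(pattern), objects, value = TRUE)
}

exists_via_glob("abfss://container@account/data/sales", fake_glob)
```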
Prints a small preview and invisibly returns it as a tibble-like data frame.
az_glimpse(conn, url, n = 10, format = NULL)
conn |
A DuckDB connection. |
url |
Character scalar. Azure Blob URL. |
n |
Number of rows to preview. Default 10. |
format |
Optional format override. One of |
Invisibly returns the preview tibble-like data frame.
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
az_glimpse(conn, "abfss://container@account/data/*.parquet", n = 5)
## End(Not run)
Uses DuckDB's glob() table function over Azure storage.
az_glob(conn, pattern)
conn |
A DuckDB connection. |
pattern |
Character scalar. Azure Blob URL glob pattern. |
Character vector of matching paths.
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
az_glob(conn, "abfss://container@account/data/*.parquet")
## End(Not run)
Queries duckdb_secrets() and returns secrets whose type is "azure".
Values are returned as DuckDB reports them; DuckDB handles redaction of
sensitive fields.
az_list_secrets(conn = conn_default())
conn |
A DuckDB connection. Defaults to conn_default(). |
A tibble::tibble() with the columns returned by
duckdb_secrets().
conn <- DBI::dbConnect(duckdb::duckdb())
az_list_secrets(conn)
DBI::dbDisconnect(conn, shutdown = TRUE)
Uses DuckDB's DESCRIBE SELECT over a remote scan and returns only column
names and DuckDB types.
az_schema(conn, url, format = NULL)
conn |
A DuckDB connection. |
url |
Character scalar. Azure Blob URL. |
format |
Optional format override. One of |
A tibble-like data frame with columns name and type.
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
az_schema(conn, "abfss://container@account/data/*.parquet")
## End(Not run)
Creates or replaces a DuckDB Azure secret using the credential_chain
provider. This lets DuckDB resolve credentials itself, for example from the
Azure CLI or environment.
az_set_chain_secret(conn, account = NULL, chain = "default")
conn |
A DuckDB connection. |
account |
Optional storage account name. When supplied, the secret is scoped to that account. |
chain |
Optional character vector of DuckDB credential-chain entries.
Values are joined with semicolons and passed as DuckDB's CHAIN parameter. Default "default". |
Invisibly returns conn.
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
az_set_chain_secret(conn, chain = "cli")
## End(Not run)
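The CREATE OR REPLACE SECRET statement behind a credential_chain secret can be sketched as string building. The helpers sql_literal() and build_chain_secret_sql(), the secret name quak_azure, and expressing account scoping through ACCOUNT_NAME are all assumptions; quak's generated SQL may differ.

```r
# Hypothetical helpers (not part of quak).
# Quote a value as a SQL string literal, doubling embedded single quotes.
sql_literal <- function(x) paste0("'", gsub("'", "''", x, fixed = TRUE), "'")

build_chain_secret_sql <- function(chain = "default", account = NULL,
                                   secret_name = "quak_azure") {
  params <- c(
    "TYPE azure",
    "PROVIDER credential_chain",
    # Multiple chain entries are joined with semicolons, e.g. "cli;env".
    paste("CHAIN", sql_literal(paste(chain, collapse = ";")))
  )
  if (!is.null(account)) {
    # Assumption: account scoping is expressed via ACCOUNT_NAME.
    params <- c(params, paste("ACCOUNT_NAME", sql_literal(account)))
  }
  sprintf("CREATE OR REPLACE SECRET %s (%s)",
          secret_name, paste(params, collapse = ", "))
}

cat(build_chain_secret_sql(chain = c("cli", "env"), account = "myaccount"), "\n")
```

A service-principal secret has the same shape with PROVIDER service_principal and TENANT_ID, CLIENT_ID, and CLIENT_SECRET parameters. Quoting through sql_literal() matters most there, because a client secret can contain arbitrary characters.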
Creates or replaces a DuckDB Azure secret using the service_principal
provider.
az_set_sp_secret(conn, tenant_id, client_id, client_secret, account = NULL)
conn |
A DuckDB connection. |
tenant_id |
Character scalar. Azure Entra tenant ID. |
client_id |
Character scalar. Service principal client ID. |
client_secret |
Character scalar. Service principal client secret. |
account |
Optional storage account name. When supplied, the secret is scoped to that account. |
Invisibly returns conn.
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
az_set_sp_secret(
  conn,
  tenant_id = "00000000-0000-0000-0000-000000000000",
  client_id = Sys.getenv("AZURE_CLIENT_ID"),
  client_secret = Sys.getenv("AZURE_CLIENT_SECRET")
)
## End(Not run)
Creates or replaces a DuckDB Azure secret using the access_token provider.
Use this when another package has already obtained an access token and you
want to register or refresh a token secret.
az_set_token_secret(conn, token, account = NULL)
conn |
A DuckDB connection. |
token |
Character scalar. Access token value. |
account |
Optional storage account name. When supplied, the secret is
scoped to that account. |
Invisibly returns conn.
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
az_set_token_secret(conn, token = "<access-token>")
## End(Not run)
Sets the Azure performance and transport settings exposed by DuckDB. Each
argument defaults to NULL, which leaves that setting unchanged.
az_tune(conn, concurrency = NULL, chunk_size = NULL, buffer_size = NULL, transport = NULL, metadata_cache = NULL, context_cache = NULL)
conn |
A DuckDB connection. |
concurrency |
Optional positive whole number for
|
chunk_size |
Optional positive whole number or character scalar for
|
buffer_size |
Optional positive whole number or character scalar for
|
transport |
Optional character scalar for
|
metadata_cache |
Optional logical scalar for
|
context_cache |
Optional logical scalar for |
Invisibly returns conn.
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
az_tune(conn, concurrency = 8, metadata_cache = TRUE)
## End(Not run)
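The "NULL leaves the setting unchanged" convention of az_tune() amounts to dropping NULL arguments before emitting SET statements. In this minimal sketch build_set_statements() is hypothetical, the DuckDB setting names az_tune() targets are not reproduced, and setting_a, setting_b, and so on are placeholders.

```r
# Hypothetical sketch (not part of quak): turn a named list of
# setting -> value pairs into SET statements, skipping NULL values.
build_set_statements <- function(settings) {
  keep <- !vapply(settings, is.null, logical(1))
  settings <- settings[keep]
  vapply(names(settings), function(name) {
    value <- settings[[name]]
    literal <- if (is.logical(value)) {
      tolower(as.character(value))              # TRUE -> true
    } else if (is.character(value)) {
      paste0("'", gsub("'", "''", value, fixed = TRUE), "'")
    } else {
      format(value, scientific = FALSE)         # plain numbers
    }
    sprintf("SET %s = %s", name, literal)
  }, character(1), USE.NAMES = FALSE)
}

build_set_statements(list(
  setting_a = 8,       # placeholder names, not real DuckDB settings
  setting_b = NULL,    # NULL: left unchanged, so no statement
  setting_c = TRUE,
  setting_d = "4MB"
))
```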
Thin convenience wrapper around az_copy_to() with
format = "parquet".
az_write_parquet(conn, x, url, partition_by = NULL, overwrite = FALSE)
conn |
A DuckDB connection. |
x |
A lazy dplyr::tbl(), a data frame, or a character scalar containing a SQL query. |
url |
Character scalar. Azure Blob URL to write to. |
partition_by |
Optional character vector of columns to partition by. |
overwrite |
Logical. When TRUE, existing output at url may be overwritten. Default FALSE. |
Invisibly returns url.
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
az_write_parquet(conn, data.frame(x = 1:3), "abfss://container@account/x")
## End(Not run)
dplyr::collect() method for tables created by tbl_delta() and
tbl_parquet(). Verifies that the backing DuckDB connection is still open
and that the azure extension is loaded before the query is materialised,
then defers to the underlying dbplyr method.
## S3 method for class 'tbl_az'
collect(x, ...)
x |
A tbl_az object, as created by tbl_delta() or tbl_parquet(). |
... |
Passed on to the next collect() method. |
A tibble::tibble() with the collected rows.
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
tbl_delta(conn, "abfss://container@account/path/sales") |>
  dplyr::collect()
## End(Not run)
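The guard-then-delegate pattern of collect.tbl_az() can be sketched with base R S3 dispatch alone. Here collect_demo() is a hypothetical stand-in for dplyr::collect(), its default method stands in for the dbplyr method, and the is_open and azure_loaded fields replace the real connection and extension checks.

```r
# Hypothetical sketch (not quak's code): validate, then defer via NextMethod().
collect_demo <- function(x, ...) UseMethod("collect_demo")

collect_demo.default <- function(x, ...) {
  # Stand-in for the underlying dbplyr collect() method.
  data.frame(rows = x$rows)
}

collect_demo.tbl_az <- function(x, ...) {
  if (!isTRUE(x$is_open)) {
    stop("The DuckDB connection backing this table is closed.", call. = FALSE)
  }
  if (!isTRUE(x$azure_loaded)) {
    stop("The azure extension is not loaded on this connection.", call. = FALSE)
  }
  NextMethod()  # dispatch continues with the next class in class(x)
}

tbl <- structure(list(rows = 1:3, is_open = TRUE, azure_loaded = TRUE),
                 class = c("tbl_az", "list"))
collect_demo(tbl)
```

Failing early with a specific message is friendlier than letting the query fail deep inside DuckDB on a closed connection.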
When called with no arguments, returns all settings as a data frame. When name
is supplied and value is NULL, returns the value of that setting. When
both name and value are supplied, executes SET <name> = <value>.
conn_setting(conn = conn_default(), name = NULL, value = NULL)
conn |
A DuckDB connection. |
name |
Optional character scalar. Setting name. |
value |
Optional value to set. Coerced to character; DuckDB casts it to the appropriate type. |
All settings: a tibble::tibble(). Single setting read: a
character scalar. Write: conn invisibly.
conn <- DBI::dbConnect(duckdb::duckdb())
conn_setting(conn, "threads")
DBI::dbDisconnect(conn, shutdown = TRUE)
Builds an ext_cache object: a list of closures bound to a cache directory,
implementing CRUD over cached .duckdb_extension files. Files are laid out
under <cache_path>/<version>/<platform>/<name>.duckdb_extension.
ext_cache(cache_path = ext_cache_path())
cache_path |
Character scalar. Cache root directory. Defaults to
ext_cache_path(). |
An ext_cache object (a list of closures) with elements:
.path: the cache root.
get(name, version, platform): path to the cached extension, or NULL.
add(name, version, platform, src): copies src into the cache.
list(): data frame of cached extensions.
del(name, version, platform): removes a cached extension. When version
and platform are omitted, removes all cached entries for name.
cache <- ext_cache(file.path(tempdir(), "quak-cache"))
cache$.path
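The closure-based design of ext_cache() (a list of functions sharing one cache root, with files under <cache_path>/<version>/<platform>/<name>.duckdb_extension) can be sketched in base R. make_ext_cache() is a hypothetical re-implementation for illustration, not quak's code, and the version and platform strings in the usage are made up.

```r
# Hypothetical re-implementation of the ext_cache() closure design.
make_ext_cache <- function(cache_path) {
  entry_path <- function(name, version, platform) {
    file.path(cache_path, version, platform, paste0(name, ".duckdb_extension"))
  }
  get <- function(name, version, platform) {
    path <- entry_path(name, version, platform)
    if (file.exists(path)) path else NULL
  }
  add <- function(name, version, platform, src) {
    dest <- entry_path(name, version, platform)
    dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
    file.copy(src, dest, overwrite = TRUE)
    invisible(dest)
  }
  list_entries <- function() {
    # Relative paths look like "<version>/<platform>/<name>.duckdb_extension".
    files <- list.files(cache_path, pattern = "\\.duckdb_extension$",
                        recursive = TRUE)
    parts <- strsplit(files, "/", fixed = TRUE)
    data.frame(
      name = sub("\\.duckdb_extension$", "", vapply(parts, `[`, "", 3)),
      version = vapply(parts, `[`, "", 1),
      platform = vapply(parts, `[`, "", 2),
      stringsAsFactors = FALSE
    )
  }
  del <- function(name, version = NULL, platform = NULL) {
    # Omitted version/platform act as wildcards for that name.
    entries <- list_entries()
    keep <- entries$name == name
    if (!is.null(version)) keep <- keep & entries$version == version
    if (!is.null(platform)) keep <- keep & entries$platform == platform
    entries <- entries[keep, , drop = FALSE]
    if (nrow(entries) == 0) return(invisible(FALSE))
    invisible(all(file.remove(
      entry_path(entries$name, entries$version, entries$platform)
    )))
  }
  structure(list(.path = cache_path, get = get, add = add,
                 list = list_entries, del = del),
            class = "ext_cache")
}

demo_cache <- make_ext_cache(tempfile("quak-cache-"))
src <- tempfile(fileext = ".duckdb_extension")
writeLines("not a real binary", src)
demo_cache$add("httpfs", "v1.2.0", "linux_amd64", src)
demo_cache$list()
```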
Resolution order: in-memory value (opts$set("cache_dir", ...)) ->
env var QUAK_CACHE_DIR -> OS-appropriate user cache directory via
tools::R_user_dir().
ext_cache_path()
Character scalar. The resolved cache path.
ext_cache_path()
Returns the path where DuckDB stores installed extension files.
This is determined by the extension_directory setting.
ext_dir(conn = conn_default())
conn |
A DuckDB connection. Defaults to conn_default(). |
Character scalar. Path to the extension directory.
conn <- DBI::dbConnect(duckdb::duckdb())
ext_dir(conn)
DBI::dbDisconnect(conn, shutdown = TRUE)
Installs a DuckDB extension. Tries two strategies in order (SQL install, then a manual fallback), stopping as soon as one succeeds.
ext_install(name, cache = ext_cache(), repo = c("core", "community"), conn = conn_default(), verbose = NULL)
name |
Character scalar. Extension name. |
cache |
An ext_cache object. Defaults to ext_cache(). |
repo |
Repository to install from: "core" (the default) or "community". |
conn |
A DuckDB connection. Defaults to conn_default(). |
verbose |
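Logical or NULL. Default NULL. When TRUE, a failed SQL install is reported as a warning before the manual fallback runs. |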
1. SQL install: runs DuckDB's built-in INSTALL (using the configured
   repository URL when one is set via repo_set_urls(), the
   QUAK_CORE_REPO / QUAK_COMMUNITY_REPO env vars, or the
   quak.core_repo / quak.community_repo R options).
2. Manual fallback: when the SQL install fails (e.g. DuckDB cannot
   reach an HTTPS URL before httpfs is loaded, whereas R's curl can),
   downloads the .duckdb_extension file, caches it, and copies it into
   the extension directory.
A SQL install failure is never raised on its own: it surfaces only as a
warning (when verbose = TRUE) once the manual fallback runs. An error is
raised only when both strategies fail.
Idempotent: skips the install if the extension is already installed
(checked via the duckdb_extensions() table function).
Invisibly returns conn.
## Not run:
# Requires network access to download the extension.
conn <- DBI::dbConnect(duckdb::duckdb())
ext_install("httpfs", conn = conn)
DBI::dbDisconnect(conn, shutdown = TRUE)
## End(Not run)
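The control flow of ext_install() (skip if installed, try SQL, warn only when verbose, fall back, error only when both fail) can be sketched with tryCatch(). install_with_fallback() and its three function arguments are hypothetical stand-ins for quak internals, and returning a label of the strategy used is only for illustration.

```r
# Hypothetical sketch of the two-strategy install (not quak's code).
install_with_fallback <- function(name, is_installed, sql_install,
                                  manual_install, verbose = FALSE) {
  if (is_installed(name)) return(invisible("already-installed"))
  # Strategy 1: SQL INSTALL. Capture the error instead of raising it.
  sql_error <- tryCatch({
    sql_install(name)
    NULL
  }, error = function(e) e)
  if (is.null(sql_error)) return(invisible("sql"))
  if (isTRUE(verbose)) {
    warning("SQL INSTALL failed, trying manual download: ",
            conditionMessage(sql_error), call. = FALSE)
  }
  # Strategy 2: manual download; raise only if this also fails.
  tryCatch(
    manual_install(name),
    error = function(e) {
      stop("Both install strategies failed for '", name, "'. SQL: ",
           conditionMessage(sql_error), "; manual: ", conditionMessage(e),
           call. = FALSE)
    }
  )
  invisible("manual")
}

ok <- function(name) invisible(TRUE)
fail <- function(name) stop("unreachable")
install_with_fallback("httpfs", function(n) FALSE, fail, ok)
```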
Executes INSTALL '/path/to/ext.duckdb_extension' on conn. Use this to
install an extension binary you already have on disk without going through a
remote repository.
ext_install_local(path, name = NULL, conn = conn_default())
path |
Character scalar. Path to the .duckdb_extension file. |
name |
Character scalar. Extension name used in messages. Inferred
from path when NULL (the default). |
conn |
A DuckDB connection. Defaults to conn_default(). |
Invisibly returns conn.
## Not run:
# Requires a local DuckDB extension file at the given path.
conn <- DBI::dbConnect(duckdb::duckdb())
ext_install_local("/path/to/httpfs.duckdb_extension", conn = conn)
DBI::dbDisconnect(conn, shutdown = TRUE)
## End(Not run)
Check whether a DuckDB extension is installed
ext_is_installed(name, conn = conn_default())
name |
Character scalar. Extension name. |
conn |
A DuckDB connection. Defaults to conn_default(). |
Logical scalar.
conn <- DBI::dbConnect(duckdb::duckdb())
ext_is_installed("httpfs", conn = conn)
DBI::dbDisconnect(conn, shutdown = TRUE)
Returns the full catalog of extensions maintained by the DuckDB core team, regardless of whether they are installed.
ext_list_available(conn = conn_default())
conn |
A DuckDB connection. Defaults to conn_default(). |
A tibble::tibble() with columns: name, version, description.
conn <- DBI::dbConnect(duckdb::duckdb())
ext_list_available(conn)
DBI::dbDisconnect(conn, shutdown = TRUE)
Queries duckdb_extensions(), returning only extensions where installed = TRUE.
ext_list_installed(conn = conn_default())
conn |
A DuckDB connection. Defaults to conn_default(). |
A tibble::tibble() with columns: name, installed, loaded, version, description.
conn <- DBI::dbConnect(duckdb::duckdb())
ext_list_installed(conn)
DBI::dbDisconnect(conn, shutdown = TRUE)
When path is supplied, executes LOAD '/path/to/ext.duckdb_extension'
directly, with no install check or auto-install. When only name is
supplied, returns immediately if the extension is already loaded. Otherwise
it checks whether the extension is installed; if not and
auto_install = TRUE, installs it (prompting first when ask = TRUE and
the session is interactive), then executes LOAD <name>.
ext_load(name = NULL, path = NULL, conn = conn_default(), auto_install = TRUE, ask = rlang::is_interactive(), cache = ext_cache(), repo = c("core", "community"))
name |
Character scalar. Extension name. May be NULL (the default) when path is supplied. |
path |
Optional character scalar. Path to a local
.duckdb_extension file. |
conn |
A DuckDB connection. Defaults to conn_default(). |
auto_install |
Logical. Install automatically when the extension is
missing. Default TRUE. |
ask |
Logical. Prompt the user before installing. Defaults to
rlang::is_interactive(). |
cache |
An ext_cache object. Defaults to ext_cache(). |
repo |
Repository to install from: "core" (the default) or "community". |
Invisibly returns conn.
## Not run:
# Requires network access to download and load the extension.
conn <- DBI::dbConnect(duckdb::duckdb())
ext_load("httpfs", conn = conn)
DBI::dbDisconnect(conn, shutdown = TRUE)
## End(Not run)
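The ext_load() decision flow can be written as a pure function that returns the planned steps, which makes each branch easy to check. plan_ext_load() is hypothetical: loaded, installed, and user_agrees stand in for the real duckdb_extensions() checks and the interactive prompt, and raising an error when the extension is missing and cannot be installed is an assumption.

```r
# Hypothetical sketch of the ext_load() decision flow (not quak's code).
plan_ext_load <- function(name = NULL, path = NULL, loaded = FALSE,
                          installed = FALSE, auto_install = TRUE,
                          user_agrees = TRUE) {
  # A local file is loaded directly, with no install check.
  if (!is.null(path)) return(sprintf("LOAD '%s'", path))
  if (is.null(name)) stop("Supply `name` or `path`.", call. = FALSE)
  # Already loaded: nothing to do.
  if (loaded) return(character(0))
  steps <- character(0)
  if (!installed) {
    # Assumption: refusing or disabling auto-install is an error.
    if (!auto_install || !user_agrees) {
      stop(sprintf("Extension '%s' is not installed.", name), call. = FALSE)
    }
    steps <- c(steps, sprintf("install %s", name))
  }
  c(steps, sprintf("LOAD %s", name))
}

plan_ext_load("azure")
```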
Changes the path where DuckDB stores installed extension files for conn.
The value is written to DuckDB's extension_directory setting.
ext_set_dir(path, conn = conn_default(), create = TRUE)
path |
Character scalar. Path to the extension directory. |
conn |
A DuckDB connection. Defaults to conn_default(). |
create |
Logical. If TRUE (the default), creates the directory when it does not exist. |
Invisibly returns the normalized extension directory path.
conn <- DBI::dbConnect(duckdb::duckdb())
ext_set_dir(file.path(tempdir(), "quak-exts"), conn = conn)
DBI::dbDisconnect(conn, shutdown = TRUE)
Removes the extension file from DuckDB's extension_directory. Optionally
also purges the corresponding entry from the local cache.
ext_uninstall(name, purge_cache = FALSE, cache = ext_cache(), conn = conn_default())
name |
Character scalar. Extension name. |
purge_cache |
Logical. If TRUE, also removes the extension's entries from cache. Default FALSE. |
cache |
An ext_cache object. Defaults to ext_cache(). |
conn |
A DuckDB connection. Defaults to conn_default(). |
Invisibly returns conn.
## Not run:
# Requires a connection with the extension already installed.
conn <- DBI::dbConnect(duckdb::duckdb())
ext_uninstall("httpfs", conn = conn)
DBI::dbDisconnect(conn, shutdown = TRUE)
## End(Not run)
Validates the URL, loads the azure extension, then registers the dataset
as a VIEW over read_csv_auto(). If the connection is not yet configured for
Azure, create it with az_conn() and register credentials with one of the
az_set_*_secret() helpers first. Returns conn invisibly; use tbl_csv() if
you want a dplyr::tbl().
load_csv(conn, url, name, replace = TRUE, ...)
conn |
A DuckDB connection. |
url |
Character scalar. Azure Blob URL. Supports glob patterns. |
name |
Character scalar. Name to register the view under in DuckDB. |
replace |
Logical. Replace an existing view. Default TRUE. |
... |
Reader options forwarded to DuckDB's read_csv_auto(). |
Invisibly returns conn.
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
load_csv(conn, "abfss://container@account/data/*.csv", name = "events")
## End(Not run)
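How forwarded reader options could end up in DuckDB SQL can be sketched as rendering ... into named table-function arguments inside a CREATE VIEW statement. build_reader_view_sql() and sql_value() are hypothetical helpers; quak's exact SQL may differ.

```r
# Hypothetical helpers (not part of quak).
sql_value <- function(x) {
  if (is.logical(x)) return(tolower(as.character(x)))
  if (is.numeric(x)) return(format(x, scientific = FALSE))
  paste0("'", gsub("'", "''", as.character(x), fixed = TRUE), "'")
}

# `replace` comes after `...` so reader options cannot partially match it.
build_reader_view_sql <- function(name, reader, url, ..., replace = TRUE) {
  opts <- list(...)
  args <- sql_value(url)
  if (length(opts) > 0) {
    # Reader options become DuckDB named arguments: option = value.
    args <- c(args, paste(names(opts), "=", vapply(opts, sql_value, character(1))))
  }
  create <- if (replace) "CREATE OR REPLACE VIEW" else "CREATE VIEW"
  sprintf("%s %s AS SELECT * FROM %s(%s)", create, name, reader,
          paste(args, collapse = ", "))
}

cat(build_reader_view_sql("events", "read_csv_auto",
                          "abfss://container@account/data/*.csv",
                          header = TRUE, delim = ";"), "\n")
```

The same shape would cover a JSON view over read_json_auto(), or a Parquet view over read_parquet() with hive_partitioning = true.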
Dispatches to load_delta(), load_parquet(), load_csv(), or
load_json() based on format. Only arguments accepted by the target
function may be passed via ...; passing format-incompatible arguments
raises an error.
load_dataset(conn, url, name, format = c("delta", "parquet", "csv", "json"), ...)
conn |
A DuckDB connection. |
url |
Character scalar. Azure Blob URL. |
name |
Character scalar. Name to register the dataset under in DuckDB. |
format |
One of "delta" (the default), "parquet", "csv", or "json". |
... |
Passed to the selected loader. |
Invisibly returns conn.
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
load_dataset(
  conn,
  "abfss://container@account/path/sales",
  name = "sales",
  format = "delta"
)
## End(Not run)
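The dispatch-and-validate behaviour of load_dataset() can be sketched by comparing the names in ... against the target loader's formals(). dispatch_loader() and the stub loaders below are hypothetical; treating a loader that itself takes ... as accepting any name is an assumption based on load_csv() and load_json() forwarding reader options.

```r
# Hypothetical sketch (not quak's code).
dispatch_loader <- function(url, name,
                            format = c("delta", "parquet", "csv", "json"),
                            loaders, ...) {
  format <- match.arg(format)
  loader <- loaders[[format]]
  accepted <- names(formals(loader))
  # A loader that itself takes `...` forwards reader options, so skip the check.
  if (!"..." %in% accepted) {
    bad <- setdiff(names(list(...)), accepted)
    if (length(bad) > 0) {
      stop(sprintf("Argument(s) not accepted for format '%s': %s",
                   format, paste(bad, collapse = ", ")), call. = FALSE)
    }
  }
  loader(url, name, ...)
}

# Stub loaders that only report which one was called.
loaders <- list(
  delta = function(url, name, method = "attach", version = NULL) "delta",
  parquet = function(url, name, hive_partitioning = FALSE) "parquet",
  csv = function(url, name, ...) "csv",
  json = function(url, name, ...) "json"
)

dispatch_loader("abfss://container@account/data", "events", "parquet", loaders,
                hive_partitioning = TRUE)
```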
Validates the URL, loads the azure and delta extensions, then registers
the table either as an ATTACH database or a VIEW. If the connection is not
yet configured for Azure, create it with az_conn() and register credentials
with one of the az_set_*_secret() helpers first. Returns conn invisibly; use
tbl_delta() if you want a dplyr::tbl().
load_delta(conn, url, name, method = c("attach", "view"), replace = TRUE, version = NULL, timestamp = NULL)
conn |
A DuckDB connection. |
url |
Character scalar. Azure Blob URL pointing to a Delta table. |
name |
Character scalar. Name to register the table under in DuckDB. |
method |
"attach" (the default) registers the table as an ATTACH database; "view" registers it as a VIEW. |
replace |
Logical. Replace an existing registration. Default TRUE. |
version |
Optional non-negative Delta table version to attach. |
timestamp |
Optional Delta table timestamp to attach. Only one of
version and timestamp may be supplied. |
Invisibly returns conn.
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
load_delta(conn, "abfss://container@account/path/sales", name = "sales")
DBI::dbGetQuery(conn, "SELECT COUNT(*) FROM sales")
## End(Not run)
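The two registration methods of load_delta() can be sketched as the SQL each would plausibly issue. build_delta_sql() is hypothetical, using DETACH DATABASE IF EXISTS to implement replace for the ATTACH method is an assumption, and the version/timestamp time-travel options are deliberately left out because their ATTACH option names are not given here.

```r
# Hypothetical sketch (not quak's code); time travel is not covered.
build_delta_sql <- function(url, name, method = c("attach", "view"),
                            replace = TRUE) {
  method <- match.arg(method)
  lit <- paste0("'", gsub("'", "''", url, fixed = TRUE), "'")
  if (method == "view") {
    create <- if (replace) "CREATE OR REPLACE VIEW" else "CREATE VIEW"
    return(sprintf("%s %s AS SELECT * FROM delta_scan(%s)", create, name, lit))
  }
  stmts <- character(0)
  # Assumption: replacing an attachment means detaching any previous one.
  if (replace) stmts <- c(stmts, sprintf("DETACH DATABASE IF EXISTS %s", name))
  c(stmts, sprintf("ATTACH %s AS %s (TYPE delta)", lit, name))
}

build_delta_sql("abfss://container@account/path/sales", "sales")
```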
Validates the URL, loads the azure extension, then registers the dataset
as a VIEW over read_json_auto(). If the connection is not yet configured for
Azure, create it with az_conn() and register credentials with one of the
az_set_*_secret() helpers first. Returns conn invisibly; use tbl_json() if
you want a dplyr::tbl().
load_json(conn, url, name, replace = TRUE, ...)
conn |
A DuckDB connection. |
url |
Character scalar. Azure Blob URL. Supports glob patterns. |
name |
Character scalar. Name to register the view under in DuckDB. |
replace |
Logical. Replace an existing view. Default TRUE. |
... |
Reader options forwarded to DuckDB's read_json_auto(). |
Invisibly returns conn.
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
load_json(conn, "abfss://container@account/data/*.json", name = "events")
## End(Not run)
Validates the URL, loads the azure extension, then registers the dataset
as a VIEW. If the connection is not yet configured for Azure, create it with
az_conn() and register credentials with one of the az_set_*_secret() helpers
first. Returns conn invisibly; use tbl_parquet() if you want a dplyr::tbl().
load_parquet(conn, url, name, hive_partitioning = FALSE, replace = TRUE)
conn |
A DuckDB connection. |
url |
Character scalar. Azure Blob URL. Supports glob patterns. |
name |
Character scalar. Name to register the view under in DuckDB. |
hive_partitioning |
Logical. Enable Hive partition inference. Default FALSE. |
replace |
Logical. Replace an existing view. Default TRUE. |
Invisibly returns conn.
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
load_parquet(conn, "abfss://container@account/data/*.parquet", name = "events")
## End(Not run)
Renders one row per option with its current (resolved) value, the source that value came from, the environment variable that can override it (and whether it is set), and the built-in default.
## S3 method for class 'quak_opts'
print(x, mask = TRUE, ...)
x |
A quak_opts object. |
mask |
Logical. When TRUE (the default), sensitive values are masked in the output. |
... |
Unused. |
Invisibly returns x.
Prints every quak option (via print.quak_opts()) and invisibly returns a
tibble of the same information. The resolution order is: value set via
options(quak.*) -> the option's env var -> a built-in default.
quak_options(mask = TRUE)
mask |
Logical. When TRUE (the default), sensitive values are masked in the output. |
Invisibly, a tibble::tibble() with columns option, value,
source, env_var, env_value, and default.
quak_options()
Stores URLs in R options quak.core_repo / quak.community_repo so they
can be configured org-wide in .Rprofile. When core is supplied, also
sets DuckDB's custom_extension_repository on conn; passing NULL
resets that connection setting to DuckDB's default.
repo_set_urls(core = NULL, community = NULL, check = TRUE, conn = conn_default())
core |
Optional character scalar. URL for the core extension repository.
Omit to leave the current value unchanged. Pass NULL to reset it to the default. |
community |
Optional character scalar. URL for the community extension
repository. Omit to leave the current value unchanged. Pass NULL to reset it to the default. |
check |
Logical. If TRUE (the default), the supplied URLs are checked before being stored. |
conn |
A DuckDB connection. Defaults to conn_default(). |
Invisibly returns a named list with elements core and community
reflecting the current option values.
old <- repo_urls()
repo_set_urls(core = "https://extensions.example.com", check = FALSE)
repo_urls()
repo_set_urls(core = old$core, check = FALSE)
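The "omit to leave unchanged, pass NULL to reset" contract of repo_set_urls() can be expressed with missing(), which still reports TRUE for an argument that has a default but was not supplied. set_repo_urls_demo() is a hypothetical sketch that only touches the R options, not the DuckDB connection setting or the URL check.

```r
# Hypothetical sketch (not quak's code): omitted = unchanged, NULL = reset.
set_repo_urls_demo <- function(core = NULL, community = NULL) {
  # options(x = NULL) removes the option, so lookup falls back to env var/default.
  if (!missing(core)) options(quak.core_repo = core)
  if (!missing(community)) options(quak.community_repo = community)
  invisible(list(core = getOption("quak.core_repo"),
                 community = getOption("quak.community_repo")))
}

old <- options(quak.core_repo = NULL, quak.community_repo = NULL)
set_repo_urls_demo(core = "https://extensions.example.com")
str(set_repo_urls_demo(community = "https://community.example.com"))
options(old)
```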
Returns the currently active repository URLs. Resolution order per repo:
R option (quak.core_repo / quak.community_repo) -> env var
(QUAK_CORE_REPO / QUAK_COMMUNITY_REPO) -> built-in default.
repo_urls()
A named list with elements core and community.
repo_urls()
Validates the URL, loads the azure extension, then returns a lazy
dplyr::tbl() over the dataset. If the connection is not yet configured for
Azure, create it with az_conn() and register credentials with one of the
az_set_*_secret() helpers first.
tbl_csv(conn, url, name = NULL, replace = TRUE, ...)
conn |
A DuckDB connection. |
url |
Character scalar. Azure Blob URL. Supports glob patterns. |
name |
Optional character scalar. Name to register the view under in
DuckDB. When NULL (the default), no view is registered and the dataset is queried directly. |
replace |
Logical. Replace an existing view of the same name.
Default TRUE. |
... |
Reader options forwarded to DuckDB's read_csv_auto(). |
When name is NULL the dataset is queried directly via read_csv_auto()
with no persistent object registered on the connection. When name is
supplied the dataset is first registered as a VIEW via load_csv(), then
referenced by name.
A dplyr::tbl() backed by the CSV dataset.
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
tbl_csv(conn, "abfss://container@account/data/*.csv") |>
  dplyr::collect()
## End(Not run)
Validates the URL, loads the azure and delta extensions, then returns a
lazy dplyr::tbl() over the table. If the connection is not yet configured
for Azure, create it with az_conn() and register credentials with one of the
az_set_*_secret() helpers first.
tbl_delta(conn, url, name = NULL, method = c("attach", "view"), replace = TRUE, version = NULL, timestamp = NULL)
conn |
A DuckDB connection. |
url |
Character scalar. Azure Blob URL pointing to a Delta table
(e.g. "abfss://container@account/path/sales"). |
name |
Optional character scalar. Name to register the table under in
DuckDB. When NULL (the default), nothing is registered and the table is queried directly via delta_scan(). |
method |
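How to register the table when name is supplied: "attach" (the default) uses ATTACH; "view" creates a VIEW. |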
replace |
Logical. Replace an existing registration of the same name.
Default TRUE. |
version |
Optional non-negative Delta table version to read. |
timestamp |
Optional Delta table timestamp to read. Only one of
version and timestamp may be supplied. |
When name is NULL the table is queried directly via delta_scan() with
no persistent object registered on the connection. When name is supplied
the table is first registered via load_delta() (as an ATTACH database or a
VIEW depending on method), then referenced by name.
Delta time travel currently requires name because DuckDB exposes
version and timestamp through ATTACH, not delta_scan().
A dplyr::tbl() backed by the Delta table.
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
tbl_delta(conn, "abfss://container@account/path/sales") |>
  dplyr::filter(amount > 100) |>
  dplyr::collect()
## End(Not run)
Validates the URL, loads the azure extension, then returns a lazy
dplyr::tbl() over the dataset. If the connection is not yet configured for
Azure, create it with az_conn() and register credentials with one of the
az_set_*_secret() helpers first.
tbl_json(conn, url, name = NULL, replace = TRUE, ...)
conn |
A DuckDB connection. |
url |
Character scalar. Azure Blob URL. Supports glob patterns. |
name |
Optional character scalar. Name to register the view under in
DuckDB. When NULL (the default), no view is registered and the dataset is queried directly. |
replace |
Logical. Replace an existing view of the same name.
Default TRUE. |
... |
Reader options forwarded to DuckDB's read_json_auto(). |
When name is NULL the dataset is queried directly via read_json_auto()
with no persistent object registered on the connection. When name is
supplied the dataset is first registered as a VIEW via load_json(), then
referenced by name.
A dplyr::tbl() backed by the JSON dataset.
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
tbl_json(conn, "abfss://container@account/data/*.json") |>
  dplyr::collect()
## End(Not run)
Validates the URL, loads the azure extension, then returns a lazy
dplyr::tbl() over the dataset. If the connection is not yet configured for
Azure, create it with az_conn() and register credentials with one of the
az_set_*_secret() helpers first.
tbl_parquet(conn, url, name = NULL, hive_partitioning = FALSE, replace = TRUE)
conn |
A DuckDB connection. |
url |
Character scalar. Azure Blob URL. Supports glob patterns for
multi-file datasets
(e.g. "abfss://container@account/data/*.parquet"). |
name |
Optional character scalar. Name to register the view under in
DuckDB. When NULL (the default), no view is registered and the dataset is queried directly. |
hive_partitioning |
Logical. Enable Hive partition inference from the
directory structure. Default FALSE. |
replace |
Logical. Replace an existing view of the same name.
Default TRUE. |
When name is NULL the dataset is queried directly via read_parquet()
with no persistent object registered on the connection. When name is
supplied the dataset is first registered as a VIEW via load_parquet(), then
referenced by name. Glob patterns (e.g. "*.parquet") are supported in
url for multi-file datasets.
A dplyr::tbl() backed by the Parquet dataset.
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
tbl_parquet(conn, "abfss://container@account/data/*.parquet") |>
  dplyr::collect()
## End(Not run)