The Search endpoint is usually the first place where users build production workflows with kagiPro.
This guide follows one realistic path: starting from a single question, refining the query syntax, scaling to batches, and choosing an error strategy that matches your use case.
library(kagiPro)

conn <- kagi_connection(
  api_key = function() keyring::key_get("API_kagi")
)

You create this once and reuse it for every request in your script or project.
Suppose you are collecting policy reports related to biodiversity. You want PDFs and DOCX files, hosted on specific sites, with year hints in the URL.
q <- query_search(
  query = 'biodiversity "annual report"',
  filetype = c("pdf", "docx"),
  site = c("example.com", "gov"),
  inurl = c("2024", "report"),
  intitle = "summary",
  expand = FALSE
)

q is a named list. Even with a single query, this is useful because the same downstream code works for one query or one hundred.
If you want to validate what was built, open it directly in a browser:
open_search_query(q[[1]])

out_single <- "search_single"
dir.create(out_single, recursive = TRUE, showWarnings = FALSE)

kagi_request(
  connection = conn,
  query = q[[1]],
  limit = 5,
  output = out_single,
  overwrite = TRUE
)

list.files(out_single, full.names = TRUE)

At this point you have stable JSON output that can be inspected, versioned, and reprocessed.
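As a rough illustration of what inspecting that output can look like, the sketch below parses every JSON file in a folder and prints its structure. It does not call kagiPro: it writes one made-up file into a temporary folder so that it runs on its own, and the file name and the title and url fields are hypothetical, not the package's documented schema. It assumes the jsonlite package is installed.

```r
library(jsonlite)

# Stand-in for the real output folder; in practice, point this at out_single.
demo_dir <- file.path(tempdir(), "search_single_demo")
dir.create(demo_dir, recursive = TRUE, showWarnings = FALSE)

# Hypothetical record, written only so the sketch is runnable offline.
writeLines(
  '{"data": [{"title": "Example report", "url": "https://example.com/report-2024.pdf"}]}',
  file.path(demo_dir, "example.json")
)

# Parse each JSON file into a nested list, keyed by file name.
json_files <- list.files(demo_dir, pattern = "\\.json$", full.names = TRUE)
records <- lapply(json_files, read_json)
names(records) <- basename(json_files)

# Print the top levels of the parsed structure without assuming a schema.
str(records, max.level = 3)
```

Running str() on a real output folder is a quick way to learn the actual field names before writing any downstream code against them.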
If you monitor multiple themes and sources, use expand = TRUE to generate combinations.
q_many <- query_search(
  query = c("biodiversity indicators", "ecosystem services"),
  site = c("ipbes.net", "cbd.int"),
  filetype = c("pdf", "docx"),
  expand = TRUE
)

length(q_many)

Run them as a batch:
out_batch <- "search_batch"
dir.create(out_batch, recursive = TRUE, showWarnings = FALSE)

kagi_request(
  connection = conn,
  query = q_many,
  limit = 3,
  output = out_batch,
  overwrite = TRUE,
  workers = 2
)

This pattern is appropriate for recurring jobs such as weekly monitoring.
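Before scheduling a recurring batch, it can help to estimate how many queries an expanded specification implies. The base R sketch below assumes that expand = TRUE forms the full Cartesian product of the supplied vectors; that is an assumption about kagiPro's behavior, not something confirmed here, and length(q_many) remains the authoritative count.

```r
# Enumerate every combination of the three inputs used for q_many.
# Assumption: expand = TRUE behaves like a full Cartesian product.
grid <- expand.grid(
  query = c("biodiversity indicators", "ecosystem services"),
  site = c("ipbes.net", "cbd.int"),
  filetype = c("pdf", "docx"),
  stringsAsFactors = FALSE
)

# 2 queries x 2 sites x 2 file types
n_queries <- nrow(grid)
print(n_queries)
print(head(grid, 3))
```

Under that assumption the count grows multiplicatively with each added term, site, or file type, which is worth keeping in mind when choosing limit and workers.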
For interactive work or CI where failures should stop execution, use strict mode:
kagi_request(
  connection = conn,
  query = q[[1]],
  limit = 1,
  output = "search_strict",
  overwrite = TRUE,
  error_mode = "stop"
)

For long-running collection pipelines where partial progress is better than full abort, use graceful mode:
kagi_request(
  connection = conn,
  query = q_many,
  limit = 1,
  output = "search_graceful",
  overwrite = TRUE,
  workers = 2,
  error_mode = "write_dummy"
)

In graceful mode, failed requests write dummy JSON records with data = null plus error metadata, and a warning is issued.
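After a graceful run you will usually want to know which requests failed. The sketch below flags files whose top-level data field is null, as described above. It works on two made-up files in a temporary folder rather than real kagiPro output; the file names and the shape of the error field are hypothetical, and jsonlite is assumed to be installed.

```r
library(jsonlite)

# Stand-in for a real output folder such as "search_graceful".
demo_dir <- file.path(tempdir(), "search_graceful_demo")
dir.create(demo_dir, recursive = TRUE, showWarnings = FALSE)

# One successful record and one dummy record (both hypothetical).
writeLines(
  '{"data": [{"url": "https://example.com/a.pdf"}]}',
  file.path(demo_dir, "ok.json")
)
writeLines(
  '{"data": null, "error": {"message": "timeout"}}',
  file.path(demo_dir, "failed.json")
)

# A record counts as a dummy when it has a data field and that field is null.
is_dummy <- function(path) {
  rec <- read_json(path)
  "data" %in% names(rec) && is.null(rec[["data"]])
}

files <- list.files(demo_dir, pattern = "\\.json$", full.names = TRUE)
failed <- files[vapply(files, is_dummy, logical(1))]
print(basename(failed))
```

The resulting list of failed files can feed a retry step or a run report, so the warning issued during collection does not have to be the only record of what went wrong.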
Once collection is complete, convert the JSON folder to parquet:
kagi_request_parquet(
  input_json = out_batch,
  output = "search_batch_parquet",
  overwrite = TRUE
)

Parquet output is easier to query downstream in analytics pipelines.
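To give a flavor of such downstream querying, here is a minimal sketch using the arrow and dplyr packages, both assumed to be installed. It writes a small made-up parquet file to a temporary folder so that it runs on its own; the query and url columns are hypothetical and may not match the columns kagi_request_parquet() actually produces, so check the schema of your own output first.

```r
library(arrow)
library(dplyr)

# Stand-in for a real folder such as "search_batch_parquet".
demo_dir <- file.path(tempdir(), "search_batch_parquet_demo")
dir.create(demo_dir, recursive = TRUE, showWarnings = FALSE)

# Hypothetical rows, written only so the sketch is runnable offline.
demo <- data.frame(
  query = c("biodiversity indicators", "biodiversity indicators", "ecosystem services"),
  url = c("https://example.com/a.pdf", "https://example.com/b.pdf", "https://example.com/c.docx"),
  stringsAsFactors = FALSE
)
write_parquet(demo, file.path(demo_dir, "part-0.parquet"))

# Open the folder as a dataset and look at the schema before querying.
ds <- open_dataset(demo_dir)
print(ds$schema)

# Count results per query; rows are only pulled into memory at collect().
counts <- ds %>%
  group_by(query) %>%
  summarise(n = n()) %>%
  collect()
print(counts)
```

Because the aggregation is evaluated by arrow and only the summary is collected, the same pattern can scale to folders that would be awkward to load as JSON.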
In short, use error_mode = "stop" for QA/CI and error_mode = "write_dummy" for large unattended runs.
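If one script serves both situations, the choice can be made at run time. The helper below is a hypothetical convenience, not part of kagiPro. It assumes the CI system sets a non-empty CI environment variable, which many hosted CI services do but which you should verify for yours.

```r
# Hypothetical helper: choose an error_mode value from the run context.
# Assumption: a non-empty CI environment variable means "running in CI".
choose_error_mode <- function(is_ci = nzchar(Sys.getenv("CI")),
                              is_interactive = interactive()) {
  # Fail fast when a person or a CI check is watching; otherwise keep going.
  if (is_ci || is_interactive) "stop" else "write_dummy"
}

mode_ci <- choose_error_mode(is_ci = TRUE, is_interactive = FALSE)
mode_batch <- choose_error_mode(is_ci = FALSE, is_interactive = FALSE)
print(c(ci = mode_ci, unattended = mode_batch))
```

The returned string can then be passed as the error_mode argument of kagi_request(), so the same collection script fails fast in CI and degrades gracefully in scheduled runs.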