---
title: "kagiPro Quickstart Guide"
author: "Rainer Krug"
format: html
vignette: >
  %\VignetteIndexEntry{kagiPro Quickstart Guide}
  %\VignetteEngine{quarto::html}
  %\VignetteEncoding{UTF-8}
execute:
  echo: true
  warning: false
  message: false
---

# kagiPro: First End-to-End Workflow

This quickstart is a complete first run with `kagiPro`: install the package, create a secure connection, build queries, execute requests, and store responses in a format ready for analysis.

The package is intentionally structured around a stable pattern:

1. Create query objects with endpoint-specific constructors.
2. Reuse a single `kagi_connection()` object.
3. Either use `kagi_fetch()` for project-folder workflows, or use
   `kagi_request()` + `kagi_request_parquet()` directly.
4. Keep JSON as the request audit trail, and parquet as analysis-ready output.

Once this pattern is familiar, every endpoint follows the same operational logic.

## Project-folder first workflow

For endpoint-scoped project outputs (aligned with `openalexPro` conventions),
use `kagi_fetch()`:

```r
# `conn` is the connection object created in
# "Create a secure API connection" below.
q_project <- query_search("biodiversity policy", expand = FALSE)

kagi_fetch(
  connection = conn,
  query = q_project,
  project_folder = "kagi_project"
)
```

This writes to:

- `kagi_project/search/json`
- `kagi_project/search/parquet`
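
The layout above can be expressed as a small path helper, which is handy when later steps need to locate the files. This is a minimal sketch in base R: `endpoint_dirs()` is a hypothetical helper, not part of `kagiPro`, and it assumes the `<project_folder>/<endpoint>/{json,parquet}` layout listed above.

```r
# Hypothetical helper (not part of kagiPro): build the output paths for an
# endpoint, assuming the <project_folder>/<endpoint>/{json,parquet} layout.
endpoint_dirs <- function(project_folder, endpoint) {
  base <- file.path(project_folder, endpoint)
  list(
    json = file.path(base, "json"),
    parquet = file.path(base, "parquet")
  )
}

dirs <- endpoint_dirs("kagi_project", "search")

# List whatever has been written so far (empty if nothing has run yet).
json_files <- list.files(dirs$json, pattern = "\\.json$", full.names = TRUE)
```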

## Install and load the package

If `kagiPro` is not installed yet, install it from GitHub and load it:

```r
# Install `remotes` only if it is not already available.
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("rkrug/kagiPro")
library(kagiPro)
```

## Create a secure API connection

Store your API key in your local keyring once:

```r
keyring::key_set("API_kagi")
```

Then create a reusable connection object:

```r
conn <- kagi_connection(
  api_key = function() keyring::key_get("API_kagi")
)
```

Using a function for `api_key` keeps credentials out of scripts and supports reproducible batch runs.
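
On machines without a keyring backend (some CI runners, for example), the same deferred-lookup idea can be sketched with an environment variable. This is an illustration in base R only: the variable name `KAGI_API_KEY` and the helper `key_from_env()` are assumptions made here, not names defined by `kagiPro`.

```r
# Hypothetical helper: look up the key only when the function is called,
# so the secret never appears in the script itself.
# The variable name KAGI_API_KEY is an assumption made for this sketch.
key_from_env <- function() {
  key <- Sys.getenv("KAGI_API_KEY", unset = "")
  if (!nzchar(key)) {
    stop("Environment variable KAGI_API_KEY is not set.")
  }
  key
}
```

Because `key_from_env` is a zero-argument function like the keyring example above, it could presumably be passed as `api_key = key_from_env`; check the `kagi_connection()` documentation before relying on that.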

## Build your first search query

Search is a good first endpoint because it shows how query constructors work:

```r
q <- query_search(
  query = 'biodiversity "annual report"',
  filetype = c("pdf", "docx"),
  site = c("example.com", "gov"),
  inurl = c("2024", "report"),
  intitle = "summary",
  expand = FALSE
)
```

`query_search()` returns a named list, even for a single input. That consistency makes it easy to scale the same code from one query to many.
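
To illustrate why a named list scales well, here is a base R sketch using a stand-in list. The names and values are invented for illustration; the real object returned by `query_search()` may use different names and element types.

```r
# Stand-in for the named list a query constructor returns; the names and
# values here are invented for illustration.
queries <- list(
  policy = "biodiversity policy",
  services = "ecosystem services"
)

# `[[` extracts one element; `[` keeps the list wrapper and its name.
single <- queries[["policy"]]
one_item <- queries["policy"]

# The same loop works unchanged for one query or many.
for (name in names(queries)) {
  message(name, ": ", queries[[name]])
}
```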

If you want to inspect the generated search string interactively:

```r
open_search_query(q[[1]])
```

## Execute and persist search results

Create an output folder and run the request:

```r
out_search <- tempfile("kagiPro-search-")
dir.create(out_search, recursive = TRUE, showWarnings = FALSE)

kagi_request(
  connection = conn,
  query = q,
  limit = 5,
  output = out_search,
  overwrite = TRUE
)

list.files(out_search, pattern = "\\.json$", full.names = TRUE)
```

Each request writes a JSON file. This makes reruns and audits straightforward.
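
A quick way to inspect such files is `jsonlite::read_json()`. The sketch below writes a small stand-in file first so that it runs without an API call; the `meta`/`data` structure and field names are assumptions for illustration and may differ from what `kagi_request()` actually writes.

```r
library(jsonlite)

# Write a stand-in response so the sketch runs offline. The structure is an
# assumption for illustration, not the documented kagiPro output format.
demo_dir <- tempfile("json-demo-")
dir.create(demo_dir)
write_json(
  list(
    meta = list(id = "demo"),
    data = list(
      list(url = "https://example.com/a", title = "Report A"),
      list(url = "https://example.com/b", title = "Report B")
    )
  ),
  path = file.path(demo_dir, "results_1.json"),
  auto_unbox = TRUE
)

# Read every JSON file in the folder; simplifyVector = TRUE turns arrays of
# records into data frames where possible.
json_files <- list.files(demo_dir, pattern = "\\.json$", full.names = TRUE)
parsed <- lapply(json_files, read_json, simplifyVector = TRUE)

# Top-level structure of the first file.
str(parsed[[1]], max.level = 1)
```

In a real run, replace `demo_dir` with `out_search`.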

## Run one example from each endpoint

The remaining endpoints use the same connection and request function. Only the query constructor changes.

### Enrich web

```r
q_web <- query_enrich_web("open data portals", site = "gov", expand = FALSE)

out_web <- tempfile("kagiPro-enrich-web-")
dir.create(out_web, recursive = TRUE, showWarnings = FALSE)

kagi_request(
  connection = conn,
  query = q_web,
  output = out_web,
  overwrite = TRUE
)
```

### Enrich news

```r
q_news <- query_enrich_news("biodiversity policy", expand = FALSE)

out_news <- tempfile("kagiPro-enrich-news-")
dir.create(out_news, recursive = TRUE, showWarnings = FALSE)

kagi_request(
  connection = conn,
  query = q_news,
  output = out_news,
  overwrite = TRUE
)
```

### Summarize (text input)

```r
q_sum_text <- query_summarize(
  text = paste(
    "Biodiversity underpins ecosystem services including pollination,",
    "soil fertility, water purification, and climate regulation.",
    "Species decline has implications for resilience and wellbeing."
  ),
  engine = "cecil",
  summary_type = "summary",
  target_language = "EN",
  cache = TRUE
)

out_sum <- tempfile("kagiPro-summarize-")
dir.create(out_sum, recursive = TRUE, showWarnings = FALSE)

kagi_request(
  connection = conn,
  query = q_sum_text,
  output = out_sum,
  overwrite = TRUE
)
```

### FastGPT

```r
q_fast <- query_fastgpt(
  query = "What are ecosystem services?",
  cache = TRUE,
  web_search = TRUE
)

out_fast <- tempfile("kagiPro-fastgpt-")
dir.create(out_fast, recursive = TRUE, showWarnings = FALSE)

kagi_request(
  connection = conn,
  query = q_fast,
  output = out_fast,
  overwrite = TRUE
)
```

## Convert JSON results to parquet

When you move from inspection to analysis pipelines, parquet is usually more convenient:

```r
parquet_dir <- tempfile("kagiPro-parquet-")

kagi_request_parquet(
  input_json = out_search,
  output = parquet_dir,
  overwrite = TRUE
)
```
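
Once parquet files exist, they can be read back with the `arrow` package. The sketch below creates a small stand-in file so that it runs on its own; the column names are invented and the real output of `kagi_request_parquet()` may be organized differently (for example, in partitioned subfolders).

```r
library(arrow)

# Stand-in parquet output so the sketch runs offline; the columns are
# invented for illustration.
demo_dir <- tempfile("parquet-demo-")
dir.create(demo_dir)
write_parquet(
  data.frame(
    url = c("https://example.com/a", "https://example.com/b"),
    title = c("Report A", "Report B")
  ),
  file.path(demo_dir, "part-0.parquet")
)

# Collect all parquet files (including any in subfolders) into one data frame.
parquet_files <- list.files(
  demo_dir,
  pattern = "\\.parquet$",
  recursive = TRUE,
  full.names = TRUE
)
results <- do.call(
  rbind,
  lapply(parquet_files, function(f) as.data.frame(read_parquet(f)))
)
```

In a real run, replace `demo_dir` with `parquet_dir`. For large outputs, `arrow::open_dataset()` avoids loading everything into memory at once.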

## Bridge to OpenAlex-style vector input

If you want to pass results into `openalexPro`/`openalexVectorComp` workflows,
use the modular content pipeline: download content -> extract markdown ->
summarize markdown.

```r
download_content(
  project_folder = "kagi_project",
  endpoint = "search"
)

content_markdown(
  project_folder = "kagi_project",
  endpoint = "search"
)

markdown_abstract(
  project_folder = "kagi_project",
  endpoint = "search",
  summarizer_fn = summarize_with_openai,
  model = "gpt-4.1-mini"
)

# abstract parquet output is written under:
# kagi_project/search/abstract/query=<query_name>/
```

`id` is a deterministic hash of the normalized URL, so the same page receives the same `id` across runs.
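
To make the idea of a URL-derived identifier concrete, here is a base R sketch. It is not the package's implementation: the normalization rules (trim whitespace, drop the fragment, lowercase scheme and host, strip trailing slashes), the use of MD5, and the helper names `normalize_url()` and `url_id()` are all assumptions chosen for illustration.

```r
# Hypothetical normalization; kagiPro's actual rules may differ.
normalize_url <- function(url) {
  url <- trimws(url)
  url <- sub("#.*$", "", url) # drop the fragment
  # Lowercase scheme and host only; paths can be case-sensitive.
  parts <- regmatches(
    url,
    regexec("^([A-Za-z][A-Za-z0-9+.-]*://[^/?]*)(.*)$", url)
  )[[1]]
  if (length(parts) == 3) {
    url <- paste0(tolower(parts[2]), parts[3])
  }
  sub("/+$", "", url) # drop trailing slashes
}

# Hash the normalized string. MD5 via a temporary file keeps this sketch
# dependency-free; the hash used by kagiPro is not specified here.
url_id <- function(url) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeBin(charToRaw(normalize_url(url)), tmp)
  unname(tools::md5sum(tmp))
}

id_a <- url_id("HTTPS://Example.com/Report/")
id_b <- url_id("https://example.com/Report#top")
```

Both inputs normalize to `https://example.com/Report`, so `id_a` and `id_b` are identical.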

## Where to go next

For deeper endpoint-specific workflows (batching patterns, robust error handling, and endpoint-focused examples), continue with:

- `vignette("search-endpoint", package = "kagiPro")`
- `vignette("enrich-endpoint", package = "kagiPro")`
- `vignette("summarize-endpoint", package = "kagiPro")`
- `vignette("fastgpt-endpoint", package = "kagiPro")`
- `vignette("corpus-workflow", package = "kagiPro")`

## Session info

```r
sessionInfo()
```
