Research API Overview
What is the Research API?
The Research API returns grounded, natural language answers to questions of varying complexity.
It runs multiple searches, processes the results, cross-references sources, and synthesizes everything into a thorough, Markdown-formatted answer with inline citations.
When you need a typed response, you can also get structured JSON by defining an output_schema.
Ask a hard question, get a researched answer with sources.
How it’s different from Search
The Search API and the Research API serve different purposes and deliver different outputs:
Use the Search API when you want raw results to feed into your own pipeline. Use the Research API when you want a ready-to-use answer backed by sources.
How it works
Research operates as an agentic system that autonomously plans and executes a multi-step research strategy for your question.
Search, Contents, and Live News as retrieval primitives
Research uses You.com’s Search, Contents, and Live News APIs as its core tools. Rather than firing generic web queries, the system selects the right tool for each sub-question — search for discovery, contents for deep page reads, live news for time-sensitive information, and several other internal tools to aid in generating the best possible answer. This targeted tool selection reduces wasted calls and gives the reasoning model cleaner inputs at each step.
The system also evaluates retrieved sources for freshness, diversity, and relevance before incorporating them into the answer.
Context management at scale
Deep research generates far more information than any single LLM context window can hold. Research uses context-masking and compaction strategies that let it operate well beyond those limits — maintaining coherent reasoning across hundreds or thousands of turns without losing track of what it found, what it verified, and what remains unresolved.
At higher effort levels, a single query can run more than 1,000 reasoning turns and process up to 10 million tokens.
Budget-based planning
The system receives a compute budget determined by the research_effort tier you choose. It plans its approach around that budget, allocating more effort to verifying ambiguous or high-stakes claims and moving quickly through well-sourced facts. This is the mechanism that enables the range of latency, accuracy, and cost tradeoffs across tiers.
What you get
Every Research API response includes:
- content: A Markdown-formatted answer by default, or a JSON object when you provide output_schema. Inline citations such as [[1, 2]] reference items in the sources array.
- content_type: The format of the content field. text is returned for default Markdown responses; object is returned for structured output.
- sources: The web pages the API read and cited in the answer, each with a URL, title, and relevant snippets.
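As a sketch of how these fields fit together (the content, content_type, and sources names come from the list above; the output envelope, the snippets field name, and the sample data are assumptions for illustration):

```python
# Hypothetical response shape -- the "output" wrapper and snippet field
# name are assumptions; field names match the list above.
response = {
    "output": {
        "content": "Paris cut NO2 levels sharply over the decade [[1]].",
        "content_type": "text",
        "sources": [
            {
                "url": "https://example.com/air-quality-report",
                "title": "Air quality report",
                "snippets": ["NO2 fell roughly 40% between 2014 and 2024."],
            }
        ],
    }
}

output = response["output"]
answer = output["content"]  # Markdown when content_type == "text",
                            # a JSON object when content_type == "object"

# Map citation numbers (1-based) to source URLs for verification.
citation_urls = {i + 1: s["url"] for i, s in enumerate(output["sources"])}
```

Because citation numbers index into the sources array, keeping this mapping around makes it cheap to verify any claim later.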
Key features
Research effort levels
The research_effort parameter controls how much compute the API allocates to your question. Higher effort means more searches, deeper source reading, and more cross-referencing — at the cost of longer response times.
For the same query, the difference between tiers is substantial. Here’s an abridged comparison for the question “Which global cities improved air quality the most over the past 10 years, and what measurable actions contributed?”:
research_effort = standard
research_effort = exhaustive
The exhaustive response identifies additional cities (Seoul, with specific UNEP data), includes more granular measurements (µg/m³ ranges, percentage reductions over specific date ranges), and cross-references more sources to verify claims.
Citation-backed answers
Every claim in the response links back to a specific source via inline citations. Your users (or your system) can verify any statement by following the numbered references to the sources array.
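If your system needs to verify claims programmatically, the inline [[n]] markers can be extracted with a small regex pass. A minimal sketch (the sample content string is illustrative):

```python
import re

content = "Seoul cut PM2.5 sharply [[1, 2]], driven by diesel restrictions [[3]]."

# Pull the numbers out of each [[...]] citation group and flatten them
# into a single list of 1-based indices into the sources array.
citations = [
    int(n)
    for group in re.findall(r"\[\[([\d,\s]+)\]\]", content)
    for n in group.split(",")
]
# citations -> [1, 2, 3]
```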
Markdown output
The content field is formatted in Markdown with headers, lists, and inline citations — ready to render in a UI or feed into downstream processing.
Source Control
source_control lets you constrain which web sources the research agent searches and visits. Use it when you want results from trusted domains only, need to block specific sites, want recent content, or need results focused on a specific country.
source_control is a top-level request field alongside input and research_effort.
include_domains and exclude_domains cannot be used together in the same request.
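A sketch of a request payload using source_control (the input, research_effort, and include_domains field names come from this page; the question text and domains are illustrative):

```python
payload = {
    "input": "What changed in the EU AI Act implementing rules this year?",
    "research_effort": "standard",
    "source_control": {
        # include_domains and exclude_domains are mutually exclusive --
        # use one or the other in a single request, never both.
        "include_domains": ["europa.eu", "ec.europa.eu"],
    },
}
```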
You can also combine filters:
Structured Output
Use output_schema when you want output.content returned as a JSON object instead of free-form text. This is useful for returning predictable fields, extracting entities, or feeding Research API output into another typed system.
output_schema is supported with standard, deep, and exhaustive research effort. It is not supported with lite. Sending output_schema with research_effort: "lite" returns a 422 error.
When output_schema is provided, the structured result is returned in output.content and output.content_type is object. Sources remain in output.sources. The API does not add citation fields into your schema object automatically.
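A sketch of a request that asks for structured output (field names from this page; the question and schema contents are illustrative):

```python
# research_effort must not be "lite" when output_schema is present.
payload = {
    "input": "List the three largest public cloud providers and their flagship regions.",
    "research_effort": "standard",
    "output_schema": {
        "type": "object",
        "properties": {
            "providers": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "flagship_region": {"type": "string"},
                    },
                    "additionalProperties": False,
                    "required": ["name", "flagship_region"],
                },
            }
        },
        "additionalProperties": False,
        "required": ["providers"],
    },
}
```

With this payload, output.content comes back as a JSON object matching the schema, and output.content_type is object.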
Schema Rules
output_schema follows a narrow JSON Schema subset designed for reliable structured generation.
Required rules:
- The root must be an object.
- The root must not use top-level anyOf.
- Every object must define properties.
- Every object must set additionalProperties: false.
- Every property must be listed in required.
- Recursive schemas are not supported.
- Standalone {"type": "null"} is not supported outside anyOf. Use a nullable union such as ["string", "null"] instead.
Supported patterns include nested objects, arrays, enums, nested anyOf, and non-recursive $defs and $ref.
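A sketch of a schema that follows these rules, using a nested object, an enum, and a nullable union (the field names are illustrative):

```python
schema = {
    "type": "object",
    "properties": {
        "company": {"type": "string"},
        "stage": {"type": "string", "enum": ["seed", "series_a", "series_b"]},
        "headquarters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                # Nullable union instead of a standalone {"type": "null"}.
                "country": {"type": ["string", "null"]},
            },
            "additionalProperties": False,
            "required": ["city", "country"],
        },
    },
    "additionalProperties": False,
    "required": ["company", "stage", "headquarters"],
}

# Spot-check the rules above on every object node.
def check(node):
    if node.get("type") == "object":
        assert node.get("additionalProperties") is False
        assert sorted(node["required"]) == sorted(node["properties"])
        for child in node["properties"].values():
            check(child)

check(schema)
```

Note that every object level, not just the root, sets additionalProperties: false and lists all of its properties in required.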
Unsupported keywords:
- allOf
- contains
- not
- dependentRequired
- dependentSchemas
- format
- if/then/else
- maxContains/minContains
- maxItems/minItems
- maxLength/minLength
- maxProperties/minProperties
- maximum/minimum
- multipleOf
- pattern
- patternProperties
- propertyNames
- unevaluatedItems/unevaluatedProperties
- uniqueItems
Selected limits:
If the schema is invalid, the request fails validation before model execution. The schema string budget counts property names, $defs names, enum values, and const values. It applies to schema shape only. Request-level limits such as total task spec size are enforced separately at the request layer.
Using Source Control and Structured Output Together
source_control and output_schema can be combined in a single request. For example, you can restrict research to specific domains while requesting a structured response:
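A sketch of such a combined request (field names from this page; the question, domains, and schema contents are illustrative):

```python
payload = {
    "input": "Compare recent policy changes across the three largest EU carbon markets.",
    "research_effort": "deep",
    # Restrict research to trusted domains...
    "source_control": {"include_domains": ["europa.eu", "icapcarbonaction.com"]},
    # ...and request a typed response instead of Markdown.
    "output_schema": {
        "type": "object",
        "properties": {
            "markets": {"type": "array", "items": {"type": "string"}},
            "summary": {"type": "string"},
        },
        "additionalProperties": False,
        "required": ["markets", "summary"],
    },
}
```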
Quickstart
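A minimal sketch of a request, assuming a hypothetical endpoint URL and X-API-Key header name; check your API dashboard for the real values:

```python
import json
import urllib.request

API_URL = "https://api.you.com/research"  # hypothetical endpoint -- verify yours
API_KEY = "YOUR_API_KEY"

payload = {
    "input": "Which global cities improved air quality the most over the past 10 years?",
    "research_effort": "standard",
}

req = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json", "X-API-Key": API_KEY},
    method="POST",
)

# Send it with real credentials (uncomment to run):
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```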
Parameters
Common use cases
Complex question answering
When a question can’t be answered from a single source — comparative analyses, multi-factor evaluations, questions that span multiple domains — the Research API handles the synthesis for you.
“Compare the pricing models of the top 3 vector databases and their tradeoffs for a 10M-document collection”
Due diligence and market research
Quickly gather verified, cited information about companies, markets, or technologies. The citation-backed output gives you traceability that raw LLM generation can’t.
Internal tools and knowledge assistants
Build internal research tools where employees can ask complex questions and get sourced answers — product comparisons, regulatory summaries, technical deep dives — without manually reading dozens of pages.
Content creation pipelines
Use the Research API as the first step in a content pipeline: ask a research question, get a cited draft, then use it as source material for blog posts, reports, or briefings.
Best practices
Match research effort to the question
Don’t use exhaustive for simple factual questions — lite or standard will be faster and cheaper. Save deep and exhaustive for questions where thoroughness and accuracy justify the longer response time.
Verify citations for high-stakes use cases
The inline citations make verification straightforward. For legal, financial, or medical contexts, build a step that follows citation URLs to confirm claims before surfacing them to end users.
Use structured inputs for better results
The input field supports up to 40,000 characters. For complex research tasks, include context, constraints, or specific angles you want covered. A well-scoped question produces a more focused answer.
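For example, a scoped research prompt might bundle the question, context, and constraints into the single input field (the content here is purely illustrative):

```python
prompt = """\
Question: Which battery chemistries are most viable for grid-scale storage by 2030?

Context: Evaluating vendors for a 100 MWh utility project in a cold climate.

Constraints:
- Focus on lithium iron phosphate, sodium-ion, and flow batteries.
- Prefer peer-reviewed or government sources.
- Include cost-per-kWh estimates where available.
"""

payload = {"input": prompt, "research_effort": "deep"}
```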
Pricing
Research API pricing is tiered by effort level. All new accounts receive $100 in free credits to get started.
Higher effort tiers allocate more compute for deeper reasoning, more source verification, and higher accuracy. See the research effort levels table above for pricing and latency by tier.
For volume discounts, annual pricing, or enterprise features, visit you.com/pricing or contact [email protected].