Back to Blog
Azure Bing Grounding: Web Search in AI Responses

Azure Bing Grounding: Web Search in AI Responses

January 3, 2026
Stefan Mentović
azureaibinggroundingsearch

Learn how to add real-time web search capabilities to your Azure AI agents with Bing Grounding, complete with citations and production-ready code.

#Azure Bing Grounding: Web Search in AI Responses

Your AI agent can answer questions about your data, but what happens when users ask about current events, recent product releases, or breaking news? The model's knowledge cutoff becomes a hard wall. You could build a custom web scraping system, but that's complex, unreliable, and potentially violates terms of service.

Enter Azure Bing Grounding. It's Microsoft's answer to giving AI agents access to real-time web information, complete with automatic citations and source verification. No web scraping, no custom search indexing, no complex infrastructure. Just add a tool to your agent and let it search when needed.

In this guide, we'll explore how to provision Bing Grounding with Terraform, integrate it with Azure AI Foundry projects, and build production-ready agents that cite their sources.

#What is Bing Grounding and Why It Matters

Bing Grounding is an Azure service that allows AI agents to search the web using Bing and incorporate results directly into their responses. Unlike traditional RAG (Retrieval Augmented Generation) systems that search your private data, Bing Grounding searches the public web in real-time.

#Key Benefits

Real-time Information: Agents can access current information beyond their training data cutoff. Market prices, weather, news, documentation updates—anything publicly available on the web.

Automatic Citations: Every piece of information from Bing includes a citation with title and URL. This provides transparency and allows users to verify sources, which is critical for trust and compliance.

Agent-Controlled Search: The model decides when to search based on the user's question. You don't need to detect search intent manually—the agent handles it intelligently.

Cost Optimization: A single Bing Grounding resource can be shared across multiple environments (development, staging, production), reducing costs while maintaining isolation at the project level.

#How It Works

When you enable Bing Grounding on an Azure AI agent:

  1. User asks a question
  2. Agent determines if web search would be helpful
  3. Agent calls Bing Grounding tool automatically
  4. Bing returns search results with URLs
  5. Agent incorporates results into response
  6. Citations are extracted from annotations

The agent decides when to search based on the context. Ask about last week's tech news, and it searches. Ask about your application's business logic, and it uses its base knowledge.

#Provisioning Bing Grounding with Terraform

Bing Grounding requires the Microsoft.Bing/accounts resource type with kind Bing.Grounding. This is different from the older Bing Search API v7 that uses Cognitive Services.

#Setting Up the Resource Group

First, create a dedicated resource group for Bing. This allows you to share the Bing resource across environments while maintaining isolation.

# Resource Group for Bing grounding (on Bing subscription)
# Shared across environments for cost optimization
# Dev creates it, Prod imports it via data source
resource "azurerm_resource_group" "bing_grounding" {
  count    = var.env == "dev" ? 1 : 0
  provider = azurerm.bing
  name     = "shared-bing-rg"
  location = "westeurope" # Must be a real region

  tags = {
    environment = "shared"
    project     = "ai-platform"
    managedBy   = "terraform"
    purpose     = "shared-grounding"
  }
}

# Data source to reference existing Bing RG in prod
data "azurerm_resource_group" "bing_grounding_existing" {
  count    = var.env == "prod" ? 1 : 0
  provider = azurerm.bing
  name     = "shared-bing-rg"
}

# Local to get the resource group ID regardless of environment
locals {
  bing_rg_id = var.env == "dev" ?
    azurerm_resource_group.bing_grounding[0].id :
    data.azurerm_resource_group.bing_grounding_existing[0].id
}

This pattern creates the resource group in development and references it in production, avoiding duplicate costs.

#Creating the Bing Grounding Resource

Use the azapi provider to create the Bing Grounding resource. The standard azurerm provider doesn't support this resource type yet.

# Bing Grounding resource
# Uses Microsoft.Bing/accounts API (not CognitiveServices Bing.Search.v7)
# Only created in dev, shared with prod via data source
resource "azapi_resource" "bing_grounding" {
  count                     = var.env == "dev" ? 1 : 0
  type                      = "Microsoft.Bing/accounts@2020-06-10"
  name                      = "shared-bing-grounding"
  parent_id                 = local.bing_rg_id
  location                  = "global"
  schema_validation_enabled = false

  body = {
    kind = "Bing.Grounding"
    sku = {
      name = "G1"
    }
    properties = {
      statisticsEnabled = false
    }
  }

  tags = {
    environment = var.env
    project     = "ai-platform"
    managedBy   = "terraform"
    purpose     = "shared-grounding"
  }

  response_export_values = ["properties.endpoint"]
}

# Data source to reference existing Bing Grounding in prod
data "azapi_resource" "bing_grounding_existing" {
  count                  = var.env == "prod" ? 1 : 0
  type                   = "Microsoft.Bing/accounts@2020-06-10"
  name                   = "shared-bing-grounding"
  parent_id              = local.bing_rg_id
  response_export_values = ["properties.endpoint"]
}

# Locals to access Bing resources regardless of environment
locals {
  bing_grounding_id = var.env == "dev" ?
    azapi_resource.bing_grounding[0].id :
    data.azapi_resource.bing_grounding_existing[0].id

  bing_grounding_endpoint = var.env == "dev" ?
    azapi_resource.bing_grounding[0].output.properties.endpoint :
    data.azapi_resource.bing_grounding_existing[0].output.properties.endpoint
}

Important Notes:

  • Location must be "global" for Bing Grounding
  • SKU G1 is the standard tier for grounding
  • Only dev environment creates the resource; prod references it

#Retrieving API Keys

Use the listKeys action to retrieve the API keys securely:

# Get Bing API keys using listKeys action
data "azapi_resource_action" "bing_keys" {
  type                   = "Microsoft.Bing/accounts@2020-06-10"
  resource_id            = local.bing_grounding_id
  action                 = "listKeys"
  response_export_values = ["*"]
}

output "bing_grounding_key" {
  description = "Bing grounding API key (sensitive)"
  value       = data.azapi_resource_action.bing_keys.output.key1
  sensitive   = true
}

Store the key in Azure Key Vault or as a secret in your deployment pipeline. Never commit it to source control.

#Connecting Bing to Azure AI Foundry Projects

Once you have the Bing resource provisioned, connect it to your AI Foundry project. This makes Bing Grounding available as a tool for agents in that project.

#Creating the Connection

Use the Microsoft.CognitiveServices/accounts/connections API to create the connection:

# Connection: Bing Grounding to AI Foundry Project
# This enables Bing grounding capabilities for AI agents in the project
# Compatible with GPT-4o, GPT-3.5-turbo, GPT-4, GPT-4-turbo
resource "azapi_resource" "bing_connection" {
  type                      = "Microsoft.CognitiveServices/accounts/connections@2025-06-01"
  name                      = "bing-grounding-connection"
  parent_id                 = azurerm_cognitive_account.ai_foundry.id
  schema_validation_enabled = false

  body = {
    properties = {
      category      = "ApiKey"
      target        = local.bing_grounding_endpoint
      authType      = "ApiKey"
      isSharedToAll = true

      metadata = {
        Location   = "westeurope"
        ApiType    = "Azure"
        ResourceId = local.bing_grounding_id
      }

      credentials = {
        key = data.azapi_resource_action.bing_keys.output.key1
      }
    }
  }

  depends_on = [
    data.azapi_resource_action.bing_keys,
    azapi_resource.ai_foundry_project
  ]
}

Key Properties:

  • category: Must be "ApiKey" for Bing Grounding
  • isSharedToAll: Set to true to make available to all agents in the project
  • credentials.key: The API key retrieved from the Bing resource

#Model Compatibility

Bing Grounding works with most Azure OpenAI models:

  • ✅ gpt-4o
  • ✅ gpt-4-turbo
  • ✅ gpt-4
  • ✅ gpt-3.5-turbo
  • ❌ gpt-4o-mini-2024-07-18 (not compatible)

If you need to use gpt-4o-mini, verify compatibility with the latest Azure AI documentation.

#Building an Agent Service with Bing Grounding

With infrastructure provisioned, let's build a TypeScript service that creates agents with Bing Grounding enabled.

#Service Configuration

First, configure environment variables for authentication and project access:

// Environment variables needed:
// - PROJECT_ENDPOINT: Azure AI Foundry project endpoint
// - MODEL_DEPLOYMENT_NAME: Name of your GPT-4o deployment
// - BING_CONNECTION_ID: Connection ID from Terraform output
// - AZURE_TENANT_ID, AZURE_CLIENT_ID, AZURE_CLIENT_SECRET (local dev)
// - Or use Managed Identity in production

import { AgentsClient, ToolUtility } from '@azure/ai-agents';
import { DefaultAzureCredential } from '@azure/identity';

export class AzureAgentService {
	private client: AgentsClient;
	private modelDeploymentName: string;
	private bingConnectionId?: string;

	constructor() {
		const projectEndpoint = process.env.PROJECT_ENDPOINT;
		this.modelDeploymentName = process.env.MODEL_DEPLOYMENT_NAME;
		this.bingConnectionId = process.env.BING_CONNECTION_ID;

		if (!projectEndpoint || !this.modelDeploymentName) {
			throw new Error('PROJECT_ENDPOINT and MODEL_DEPLOYMENT_NAME environment variables are required');
		}

		// DefaultAzureCredential supports:
		// 1. EnvironmentCredential (local dev)
		// 2. ManagedIdentityCredential (production)
		// 3. AzureCliCredential (fallback)
		const credential = new DefaultAzureCredential();

		this.client = new AgentsClient(projectEndpoint, credential);
	}
}

#Creating Agents with Bing Grounding

Enable Bing Grounding by adding it as a tool during agent creation:

interface AgentConfig {
  name: string;
  instructions?: string;
  enableBingGrounding?: boolean; // Default: true
}

async createAgent(config: AgentConfig) {
  const tools = [];

  // Always add Bing grounding tool unless explicitly disabled
  const shouldEnableBing = config.enableBingGrounding !== false;

  if (shouldEnableBing && this.bingConnectionId) {
    const bingTool = ToolUtility.createBingGroundingTool([
      {
        connectionId: this.bingConnectionId,
        count: 20, // Number of search results (max: 50, default: 5)
      },
    ]);
    tools.push(bingTool.definition);
  }

  const agent = await this.client.createAgent(this.modelDeploymentName, {
    name: config.name,
    instructions: config.instructions,
    tools: tools.length > 0 ? tools : undefined,
    temperature: 0.2, // Low temperature for consistent, factual responses
  });

  return agent;
}

Important Parameters:

  • count: Number of search results to retrieve (1-50). Higher values provide more context but use more tokens. 20 is a good balance for quality.
  • temperature: Lower values (0.1-0.3) work better for factual responses with citations.

#Configuring Search Result Count

The count parameter controls how many Bing search results the agent can access. This directly impacts response quality and cost:

Search result count options:

  • 5 results (default) - Simple queries, cost-sensitive (~500 tokens)
  • 10 results - Standard queries, balanced (~1,000 tokens)
  • 20 results - Complex queries, high quality (~2,000 tokens)
  • 50 results (maximum) - Research tasks, comprehensive (~5,000 tokens)

Higher counts improve accuracy for complex queries but increase prompt tokens significantly. Start with 10-20 and adjust based on your use case.

#Sending Messages and Extracting Citations

Once you have an agent with Bing Grounding, you can send messages and extract citations from responses.

#Sending Messages

Create a thread, send a message, and poll for completion:

async sendMessage(
  agentId: string,
  userMessage: string
): Promise<AgentResponse> {
  // Create a thread for this conversation
  const thread = await this.client.threads.create();

  try {
    // Create user message
    await this.client.messages.create(thread.id, 'user', userMessage);

    // Create and poll run
    const runStatus = await this.client.runs.createAndPoll(thread.id, agentId, {
      pollingOptions: {
        intervalInMs: 15000, // 15 seconds to avoid rate limits
      },
    });

    // Check if run failed
    if (runStatus.status === 'failed') {
      const errorMessage = runStatus.lastError?.message || 'Unknown error';
      throw new Error(`Agent run failed: ${errorMessage}`);
    }

    // Get the agent's response with citations
    const response = await this.getLatestAgentMessage(thread.id);

    return {
      content: response.content,
      citations: response.citations,
      metadata: {
        agentId,
        threadId: thread.id,
        runId: runStatus.id,
        tokensUsed: runStatus.usage ? {
          prompt: runStatus.usage.promptTokens,
          completion: runStatus.usage.completionTokens,
          total: runStatus.usage.totalTokens,
        } : undefined,
      },
    };
  } finally {
    // Clean up thread
    await this.client.threads.delete(thread.id);
  }
}

Polling Considerations:

  • Use a longer polling interval (15+ seconds) to avoid rate limit exhaustion
  • Azure AI Foundry has rate limits on API calls (e.g., 400 capacity = 40 req/sec)
  • Adjust based on your deployment's capacity and concurrent operations

#Extracting Citations from Responses

Citations are embedded in message annotations. Extract them using type guards:

import type {
  MessageTextContent,
  ThreadMessage,
  MessageTextUrlCitationAnnotation,
} from '@azure/ai-agents';

private extractMessageContent(message: ThreadMessage): {
  content: string;
  citations: Array<{ title: string; url: string }>;
} {
  const contentParts: string[] = [];
  const citations: Array<{ title: string; url: string }> = [];

  for (const content of message.content) {
    if (isOutputOfType<MessageTextContent>(content, 'text')) {
      contentParts.push(content.text.value);

      // Extract citations from annotations
      if (content.text.annotations) {
        for (const annotation of content.text.annotations) {
          if (
            isOutputOfType<MessageTextUrlCitationAnnotation>(
              annotation,
              'url_citation'
            )
          ) {
            citations.push({
              title: annotation.urlCitation.title || annotation.urlCitation.url,
              url: annotation.urlCitation.url,
            });
          }
        }
      }
    }
  }

  return {
    content: contentParts.join('\n'),
    citations,
  };
}

Citation Structure:

  • Each citation includes a title and URL
  • Title defaults to URL if not provided
  • Citations maintain order of appearance in the response
  • Duplicate URLs may appear if referenced multiple times

#Example Usage

Here's how to use the service for a one-off query:

const agentService = new AzureAgentService();

// Create an agent with Bing Grounding (enabled by default)
const agent = await agentService.createAgent({
	name: 'research-assistant',
	instructions:
		'You are a helpful research assistant. Use Bing grounding to provide accurate, up-to-date information with citations.',
});

try {
	// Send a query
	const response = await agentService.sendMessage(
		agent.id,
		'What are the latest developments in quantum computing from the past month?',
	);

	console.log('Response:', response.content);
	console.log('Citations:');
	response.citations?.forEach((citation, index) => {
		console.log(`${index + 1}. ${citation.title}`);
		console.log(`   ${citation.url}`);
	});

	console.log('Tokens used:', response.metadata.tokensUsed?.total);
} finally {
	// Clean up
	await agentService.deleteAgent(agent.id);
}

Example Output:

Response: Recent developments in quantum computing include IBM's announcement
of a 1,121-qubit quantum processor, Google's breakthrough in quantum error
correction, and Microsoft's release of Azure Quantum Elements for chemistry
simulations.

Citations:
1. IBM Unveils 1,121-Qubit Quantum Processor
   https://www.ibm.com/quantum/news/1121-qubit-processor
2. Google achieves quantum error correction milestone
   https://blog.google/technology/quantum-ai/quantum-error-correction/
3. Microsoft Azure Quantum Elements for chemistry
   https://azure.microsoft.com/en-us/products/quantum/elements/

Tokens used: 3245

#Cost Considerations and Pricing Tiers

Bing Grounding pricing consists of two components: the Bing resource itself and the tokens consumed during agent runs.

#Bing Resource Pricing

The G1 SKU for Bing Grounding is billed per 1,000 transactions (queries):

  • G1 Tier: Approximately $7 per 1,000 queries
  • No minimum commitment: Pay only for what you use
  • Shared resource: One resource can serve multiple projects and environments

#Token Consumption

Search results add to prompt tokens. Typical token usage:

Typical token usage by search result count:

  • 5 results - ~500 tokens added
  • 10 results - ~1,000 tokens added
  • 20 results - ~2,000 tokens added
  • 50 results - ~5,000 tokens added

With GPT-4o pricing at ~$5 per 1M prompt tokens, 20 search results add roughly $0.01 per query in token costs.

#Total Cost Example

For 10,000 queries per month with 20 search results each:

  • Bing Grounding: 10,000 queries × ($7 / 1,000) = $70
  • Token overhead: 10,000 queries × 2,000 tokens × ($5 / 1M) = $100
  • Total: ~$170/month for grounded search

Compare this to building a custom web search solution with scraping infrastructure, storage, and maintenance. Bing Grounding is significantly cheaper and more reliable.

#Sharing Bing Resource Across Environments

A single Bing Grounding resource can be shared across multiple environments (dev, staging, prod) while maintaining project-level isolation. This reduces costs without sacrificing security.

#How Resource Sharing Works

The shared Bing Grounding resource sits at the top of your infrastructure hierarchy. You create it once in the development environment, and then each environment's AI Foundry project - whether dev, staging, production, or disaster recovery - connects to that same resource. Each project maintains its own agents, threads, and conversation history, so there's no data leakage between environments. You get the cost savings of a single Bing resource while preserving complete isolation at the application layer.

#Terraform Pattern for Sharing

Use conditional resource creation and data sources:

# Dev creates the resource
resource "azapi_resource" "bing_grounding" {
  count = var.env == "dev" ? 1 : 0
  # ... resource configuration
}

# Other environments reference it
data "azapi_resource" "bing_grounding_existing" {
  count = var.env != "dev" ? 1 : 0
  type  = "Microsoft.Bing/accounts@2020-06-10"
  name  = "shared-bing-grounding"
  # ... rest of configuration
}

# Each environment creates its own connection
resource "azapi_resource" "bing_connection" {
  # All environments create their own connection
  # but reference the same Bing resource
  parent_id = azurerm_cognitive_account.ai_foundry.id
  # ... connection configuration
}

#Security Considerations

Project Isolation: Each AI Foundry project has its own threads, agents, and conversation history. Sharing the Bing resource does not compromise data isolation.

API Key Rotation: When rotating Bing API keys, update connections in all environments. Use a secret management system like Azure Key Vault for centralized key distribution.

Rate Limiting: All environments share the same rate limits on the Bing resource. Monitor usage to avoid throttling in production due to dev/staging activity.

#Cost Savings

Sharing one Bing resource across 4 environments (dev, staging, prod, DR):

  • Without Sharing: 4 × $7/1,000 queries = $28 per 1,000 queries (if each used separately)
  • With Sharing: 1 × $7/1,000 queries = $7 per 1,000 queries
  • Savings: 75% reduction in Bing resource costs

This is one of the most effective cost optimizations for multi-environment deployments.

#Comparing Bing Grounding with Other Approaches

Bing Grounding is one of several ways to provide AI agents with additional context. Understanding the tradeoffs helps you choose the right tool.

#Bing Grounding vs. RAG (Retrieval Augmented Generation)

RAG: Private Data Search

  • Searches your own documents, databases, knowledge bases
  • Full control over content and indexing
  • No internet access required
  • Ideal for internal company knowledge, proprietary data

Bing Grounding: Public Web Search

  • Searches the public web via Bing
  • Access to real-time information and current events
  • Automatic citations from web sources
  • Ideal for current events, market data, public documentation

When to Use Both: Use RAG for internal knowledge (product specs, company policies) and Bing Grounding for external context (market trends, competitor analysis, current events).

// Example: Combining RAG and Bing Grounding
const agent = await agentService.createAgent({
	name: 'hybrid-assistant',
	instructions: `You are a business intelligence assistant.

  Use Azure AI Search (RAG) for:
  - Internal sales data
  - Product specifications
  - Company policies

  Use Bing Grounding for:
  - Competitor news and announcements
  - Market trends and analysis
  - Industry regulations and updates`,
	// Enable both tools
	enableBingGrounding: true,
	// RAG would be added here via Azure AI Search connection
});

#Bing Grounding vs. Custom Web Tools

Custom Web Tools (Function Calling)

  • You write the search/scraping logic
  • Full control over sources and parsing
  • Complex to maintain and scale
  • Legal and ethical considerations for scraping

Bing Grounding

  • Microsoft handles search infrastructure
  • Built-in citation extraction
  • No scraping or legal concerns
  • Limited customization (no custom search filters)

When to Use Custom Tools: When you need to search specific sites (e.g., internal documentation hosted externally), parse non-standard formats, or have specific data extraction requirements that Bing doesn't support.

#Bing Grounding vs. Web Browser Tools

Some AI frameworks provide browser automation tools (Selenium, Playwright) for agents. These let agents navigate websites interactively.

Web Browser Tools

  • Full browser automation (click, scroll, fill forms)
  • Can access content behind authentication
  • Very slow (seconds to minutes per page)
  • High token consumption (full page HTML)
  • Complex error handling

Bing Grounding

  • Fast search results (sub-second)
  • Optimized for token efficiency
  • No authentication support
  • Limited to search results (can't interact with pages)

When to Use Browser Tools: For tasks requiring interaction (filling forms, clicking through multi-step processes) or accessing authenticated content. For simple information retrieval, Bing Grounding is faster and cheaper.

#Comparison Summary

Bing Grounding:

  • Data source: Public web
  • Latency: Low (~1s)
  • Token cost: Medium
  • Citations: Automatic
  • Real-time: Yes
  • Setup complexity: Low
  • Best for: Current events, public info

RAG (Retrieval Augmented Generation):

  • Data source: Your data
  • Latency: Very low (under 500ms)
  • Token cost: Low
  • Citations: Manual
  • Real-time: No
  • Setup complexity: Medium
  • Best for: Internal knowledge

Custom Web Tools:

  • Data source: Custom
  • Latency: Medium (2-5s)
  • Token cost: Medium
  • Citations: Manual
  • Real-time: Yes
  • Setup complexity: High
  • Best for: Specific sources

Browser Tools:

  • Data source: Any website
  • Latency: High (10-60s)
  • Token cost: Very high
  • Citations: Manual
  • Real-time: Yes
  • Setup complexity: Very high
  • Best for: Interactive tasks

#Key Takeaways

  • Bing Grounding adds real-time web search to Azure AI agents with automatic citations
  • Provision with Terraform using Microsoft.Bing/accounts and kind Bing.Grounding
  • Connect to AI Foundry projects via Microsoft.CognitiveServices/accounts/connections
  • Configure search result count (1-50) based on quality needs and token budget
  • Extract citations from message annotations for transparency and verification
  • Share one Bing resource across environments to reduce costs by 75%
  • Use Bing Grounding for public web data, RAG for private data, and combine both for comprehensive coverage
  • Monitor token consumption: higher search counts improve quality but increase costs

Ready to add web search to your AI agents? Start with the official Azure AI Agents documentation and explore the full range of agent tools.

For more Azure AI content, check out our guide on building production-ready AI applications with Azure AI Foundry.

Enjoyed this article? Stay updated: