feat: add bot detection functionality #210

Draft
wants to merge 8 commits into
base: main
61 changes: 61 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,61 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Development Commands

- **Build**: `pnpm build` - Builds the module using nuxt-module-build and generates client
- **Development**: `pnpm dev` - Runs playground at `.playground` directory
- **Development Preparation**: `pnpm dev:prepare` - Prepares development environment with stub build
- **Test**: `pnpm test` - Runs vitest test suite
- **Lint**: `pnpm lint` - Runs ESLint with auto-fix using @antfu/eslint-config
- **Type Check**: `pnpm typecheck` - Runs TypeScript compiler for type checking
- **Client Development**: `pnpm client:dev` - Runs devtools UI client on port 3300
- **Release**: `pnpm release` - Builds, bumps version, and publishes

## Architecture Overview

This is a Nuxt module (`@nuxtjs/robots`) that provides robots.txt generation and robot meta tag functionality for Nuxt applications.

### Core Module Structure

- **`src/module.ts`**: Main module entry point with module options and setup logic
- **`src/runtime/`**: Runtime code that gets injected into user applications
  - **`app/`**: Client-side runtime (composables, plugins)
  - **`server/`**: Server-side runtime (middleware, routes, composables)
- **`src/kit.ts`**: Utilities for build-time module functionality
- **`src/util.ts`**: Shared utilities exported to end users

### Key Runtime Components

- **Server Routes**:
  - `/robots.txt` route handler in `src/runtime/server/routes/robots-txt.ts`
  - Debug routes under `/__robots__/` for development
- **Server Composables**: `getSiteRobotConfig()` and `getPathRobotConfig()` for runtime robot configuration
- **Client Composables**: `useRobotsRule()` for accessing robot rules in Vue components
- **Meta Plugin**: Automatically injects robot meta tags and X-Robots-Tag headers

### Build System

- Uses `@nuxt/module-builder` with unbuild configuration in `build.config.ts`
- Exports multiple entry points: main module, `/util`, and `/content`
- Supports both ESM and CommonJS via rollup configuration

### Test Structure

- **Integration Tests**: Test fixtures in `test/fixtures/` with full Nuxt apps
- **Unit Tests**: Focused tests in `test/unit/` for specific functionality
- Uses `@nuxt/test-utils` for testing Nuxt applications
- Test environment automatically set to production mode

### Development Workflow

The module supports a playground at `.playground` for local development and manual testing. The client UI (devtools integration) is developed separately in the `client/` directory.

### I18n Integration

The module has special handling for i18n scenarios, with logic in `src/i18n.ts` for splitting paths and handling localized routes.

### Content Integration

Provides integration with Nuxt Content module via `src/content.ts` for content-based robot configurations.
180 changes: 180 additions & 0 deletions docs/content/2.guides/4.bot-detection.md
@@ -0,0 +1,180 @@
---
title: Bot Detection
description: Detect and classify bots with server-side header analysis and client-side browser fingerprinting.
---

## Introduction

Bot detection identifies automated traffic so that you can better understand your visitors and tailor your site's behavior to each type of visitor.

The module detects bots on both the server and the client, covering everything from search engine crawlers to malicious automation tools.

## Getting Started

To enable bot detection, you have two options:

1. **Use the `useBotDetection()` composable** in your Vue components for reactive bot detection
2. **Use the server utilities** (`getBotDetection`, `isBot`, `getBotInfo`) in your server routes, middleware, or API handlers

If you call none of these functions, no bot detection runs.

## Basic Usage

The `useBotDetection()` composable provides reactive access to bot detection results:

```vue
<script setup>
const { isBot, botType } = useBotDetection()
</script>

<template>
  <div v-if="isBot">
    Bot detected: {{ botType }}
  </div>
  <div v-else>
    Human visitor
  </div>
</template>
```

## Detection Methods

### Server-side Detection (Manual Only)

Analyzes HTTP request headers, including the user agent string, to identify known bots when you call the detection utilities:

- Search engines (Google, Bing, Yandex)
- Social media crawlers (Twitter, Facebook)
- SEO tools (Ahrefs, SEMrush)
- AI crawlers (GPTBot, ClaudeBot)
- Security scanners and automation tools

This detection is lightweight and only runs when you explicitly call the detection functions or use the `useBotDetection()` composable.
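
As a rough illustration of what header-based detection involves, the sketch below matches the `user-agent` header against a small pattern table. It is not the module's implementation: `matchUserAgent`, `KNOWN_BOTS`, the `'seo'` type name, and the `trusted` flags are all assumptions made for this example, and a real pattern list would be far longer.

```typescript
// Hypothetical sketch; names, patterns and trust flags are illustrative only.
type BotType = 'search-engine' | 'social' | 'seo' | 'ai'

interface BotMatch {
  name: string
  type: BotType
  trusted: boolean
}

// A tiny sample table. The regexes carry no `g` flag, so `.test()` is stateless.
const KNOWN_BOTS: Array<BotMatch & { pattern: RegExp }> = [
  { pattern: /googlebot/i, name: 'googlebot', type: 'search-engine', trusted: true },
  { pattern: /bingbot/i, name: 'bingbot', type: 'search-engine', trusted: true },
  { pattern: /twitterbot/i, name: 'twitterbot', type: 'social', trusted: true },
  { pattern: /ahrefsbot/i, name: 'ahrefsbot', type: 'seo', trusted: false },
  { pattern: /gptbot/i, name: 'gptbot', type: 'ai', trusted: false },
]

// Returns the first matching bot entry, or null when nothing matches.
function matchUserAgent(headers: Record<string, string | undefined>): BotMatch | null {
  const userAgent = headers['user-agent'] ?? ''
  for (const { pattern, ...match } of KNOWN_BOTS) {
    if (pattern.test(userAgent))
      return match
  }
  return null
}
```

Because the check is a handful of regular-expression tests against one string, it is cheap enough to run per request, which is consistent with the "lightweight" description above.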

### Client-side Fingerprinting (Opt-in)

Uses `@fingerprintjs/botd` to detect advanced automation tools:

- Headless browsers (Chrome, Firefox)
- Automation frameworks (Selenium, Playwright)
- Scripted browsers (PhantomJS, Nightmare)

Fingerprinting results are cached for performance.

**Note:** Client-side fingerprinting is disabled by default due to performance costs. Enable it explicitly when needed.

## Watching Changes

Bot detection state is reactive and can be watched for changes:

```ts
import { watch } from 'vue'

const { isBot, botType } = useBotDetection()

watch(isBot, (detected) => {
  if (detected) {
    console.log(`Bot detected: ${botType.value}`)
  }
})
```

## Bot Classification

Detected bots are classified by type and trust level:

```ts
const { botType, trusted } = useBotDetection()

if (botType.value) {
  console.log(botType.value) // 'search-engine', 'social', 'ai', etc.
  console.log(trusted.value) // true for legitimate bots
}
```
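
One way to act on the classification is a small policy function that maps the bot type and trust flag to a decision. The sketch below is only an illustration: `chooseAction` and the action names are hypothetical and not part of the module, and the inputs stand in for `botType.value` and `trusted.value`.

```typescript
// Hypothetical policy helper; the action names are made up for this example.
type VisitorAction = 'full-experience' | 'skip-analytics' | 'restrict'

function chooseAction(botType: string | null | undefined, trusted: boolean): VisitorAction {
  // No bot type means no bot was detected.
  if (!botType)
    return 'full-experience'
  // Trusted crawlers still get content but are not counted as visits.
  if (trusted)
    return 'skip-analytics'
  // Anything else is an unrecognized or untrusted bot.
  return 'restrict'
}
```

In a component this might be called as `chooseAction(botType.value, trusted.value)`.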

## Server-side Usage

### In Nitro Routes and Middleware

Use the provided server composables:

```ts
// server/api/example.ts
import { getBotInfo, isBot } from '@nuxtjs/robots/nitro'

export default defineEventHandler(async (event) => {
  // Simple boolean check
  if (isBot(event)) {
    return { message: 'Bot detected' }
  }

  // Get detailed info
  const botInfo = getBotInfo(event)
  if (botInfo?.trusted) {
    return { message: 'Trusted bot', bot: botInfo.name }
  }

  return { message: 'Human user' }
})
```

### Manual Header Analysis

For custom header analysis with pure utility functions:

```ts
import { getBotDetection, getBotInfo, isBot } from '@nuxtjs/robots/util'

// `event` is the H3 event passed to your handler
const headers = getHeaders(event)

// Complete detection context
const detection = getBotDetection(headers)

// Simple boolean check
if (isBot(headers)) {
  console.log('Bot detected!')
}

// Detailed bot info
const botInfo = getBotInfo(headers)
if (botInfo) {
  console.log(`Bot: ${botInfo.name} (${botInfo.type})`)
}
```

**Note:** These pure utility functions work with any headers object and can be used in any JavaScript environment.
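
HTTP header names are case-insensitive, so headers collected outside Nitro may arrive with mixed casing or array values. Whether the utilities normalize their input is not stated here, so the sketch below assumes they do not and lowercases names before the object is passed on. `normalizeHeaders` is a hypothetical helper, not part of the module.

```typescript
// Hypothetical helper: lowercases header names and flattens array values.
function normalizeHeaders(
  input: Record<string, string | string[] | undefined>,
): Record<string, string> {
  const out: Record<string, string> = {}
  for (const [name, value] of Object.entries(input)) {
    // Drop headers that were declared but never set.
    if (value === undefined)
      continue
    // Repeated headers are joined with a comma, as HTTP permits for list-valued fields.
    out[name.toLowerCase()] = Array.isArray(value) ? value.join(', ') : value
  }
  return out
}
```

The result is a plain object with lowercase keys, for example `normalizeHeaders({ 'User-Agent': 'Googlebot/2.1' })` yields an object whose `user-agent` key holds the original value.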

## Client-side Fingerprinting

Client-side fingerprinting can be enabled through the composable options:

```vue
<script setup>
// Enable fingerprinting with error handling
const { isBot, botType, trusted } = useBotDetection({
  fingerprint: true,
  onFingerprintError: (error) => {
    console.error('Fingerprinting failed:', error)
  }
})
</script>

<template>
  <div>
    <p v-if="isBot">
      Bot detected: {{ botType }}
      <span v-if="trusted">(trusted)</span>
    </p>
    <p v-else>
      Human visitor
    </p>
  </div>
</template>
```

## Performance

- **Server detection**: Lightweight header analysis with minimal overhead (manual only)
- **Client fingerprinting**: More comprehensive but expensive detection (opt-in only)
- **Caching**: Results persist in localStorage to avoid repeated expensive checks
- **Optimization**: Fingerprinting is skipped if the server has already detected a bot
- **Binary decision**: A visitor is either definitively classified as a bot or not detected as one
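
The caching and skip behavior listed above can be sketched as follows. This is an assumption-laden illustration, not the module's code: `detectWithCache`, `KeyValueStore`, and the `CACHE_KEY` value are hypothetical, and the fingerprint check is passed in as a callback so the example stays self-contained.

```typescript
// Hypothetical sketch; the module's internals and storage key may differ.
// In a browser, `localStorage` satisfies this interface.
interface KeyValueStore {
  getItem: (key: string) => string | null
  setItem: (key: string, value: string) => void
}

const CACHE_KEY = 'bot-detection-result' // assumed key name

async function detectWithCache(
  serverDetectedBot: boolean,
  store: KeyValueStore,
  runFingerprint: () => Promise<boolean>,
): Promise<boolean> {
  // The server already flagged a bot, so the expensive client check is skipped.
  if (serverDetectedBot)
    return true

  // Reuse an earlier result if one was stored.
  const cached = store.getItem(CACHE_KEY)
  if (cached !== null)
    return cached === 'true'

  // First visit: run the fingerprint check once and remember the outcome.
  const isBot = await runFingerprint()
  store.setItem(CACHE_KEY, String(isBot))
  return isBot
}
```

The return value is a plain boolean, matching the binary decision described above. Note that a cached result never expires in this sketch; a real implementation would likely need some invalidation policy.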