Event Automation is a Python tool by IEEE-MSIT for streamlining and automating event management tasks.


IEEE-MSIT/event-automation


IEEE MSIT Event Automation System

Instagram Post Scraping & Event Classification Pipeline
An autonomous event management system with AI-powered content analysis, engineered by the IEEE MSIT Development Team.


Visit IEEE MSIT · Report a Bug · Request a Feature


System Overview

This system removes the need for manual administrative upkeep of event listings by automatically scraping, analyzing, and classifying posts from IEEE MSIT's official Instagram account. It distinguishes events from achievements, uses AI models to generate structured JSON metadata, detects duplicate events before storing them, and exposes the results to frontend applications for dynamic rendering.

Core Automation Pipeline

The pipeline runs once a day on a cron schedule, keeping content in sync without human intervention:

  1. Instagram Content Extraction → scraping with instagrapi, reusing a persisted login session
  2. AI-Powered Classification → Google Gemini 2.5 Flash analyzes each post
  3. Structured Data Generation → JSON output validated against a Pydantic schema
  4. Duplicate Detection & Prevention → semantic similarity analysis using LangChain
  5. Cloud Storage Integration → Cloudinary CDN for image hosting and delivery
  6. Database Persistence → MongoDB Atlas via the Motor async driver
  7. Frontend API Delivery → FastAPI endpoints that serve the data to the frontend

Architecture & Tech Stack

The system is built from the following components:

  • Backend framework: FastAPI, Uvicorn
  • AI & machine learning: Google Gemini 2.5 Flash, LangChain Core, structured outputs
  • Instagram integration: instagrapi (unofficial Instagram Private API client)
  • Cloud infrastructure: Cloudinary CDN, MongoDB Atlas
  • Data processing: Pydantic, Motor (async MongoDB driver), HTTPX
  • Authentication & security: environment-based configuration, session management
  • Scheduling & automation: daily cron-based execution
  • Image processing: Base64 encoding, multi-format support
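The stack lists Base64 encoding for image processing. As a minimal sketch, assuming the scraper has already downloaded a post image as raw bytes, the image can be wrapped as a Base64 inline-data part in the shape the Gemini REST `generateContent` request body uses (`inline_data` with `mime_type` and `data`). The helpers `guess_image_mime` and `to_inline_part` are hypothetical and not taken from this repository.

```python
import base64


def guess_image_mime(data: bytes) -> str:
    """Guess an image MIME type from its leading magic bytes (hypothetical helper)."""
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    raise ValueError("unsupported image format")


def to_inline_part(data: bytes) -> dict:
    """Wrap raw image bytes as a Base64 inline-data part for a Gemini request."""
    return {
        "inline_data": {
            "mime_type": guess_image_mime(data),
            # JSON cannot carry raw bytes, so the image travels as Base64 text
            "data": base64.b64encode(data).decode("ascii"),
        }
    }


# Usage: pair the image part with the caption in one multimodal request body.
caption = "Join us for our upcoming workshop..."
fake_jpeg = b"\xff\xd8\xff\xe0" + b"\x00" * 16  # stand-in bytes, not a real image
body = {"contents": [{"parts": [to_inline_part(fake_jpeg), {"text": caption}]}]}
```

Sending image and caption in the same request lets the model classify posts whose key details (dates, venue) appear only in the poster graphic.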

AI-Powered Event Classification

Intelligent Content Analysis Engine

Gemini classifies each post along several dimensions:

  • Event type detection: workshop, hackathon, seminar, conference, bootcamp, or webinar
  • Category extraction: domain such as AI/ML, web development, cybersecurity, or sustainability
  • Status determination: upcoming, registration-open, live, or completed
  • Temporal analysis: extracts event dates and builds the event timeline
  • Relevance filtering: separates events from achievements and general announcements
  • Duplicate prevention: semantic similarity analysis with fuzzy matching
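Once dates have been extracted, part of status determination reduces to date arithmetic. The sketch below is a hypothetical rule-based fallback, not the project's method (the README attributes status inference to the Gemini model): it compares ISO-8601 `startDate`/`endDate` values against a given day. "Registration-open" cannot be derived from dates alone, so it is left to the model here.

```python
from datetime import date
from typing import Optional


def infer_status(start: Optional[str], end: Optional[str], today: date) -> Optional[str]:
    """Derive upcoming/live/completed from ISO dates; return None when undecidable."""
    if start is None:
        return None  # no date information: defer to the model's answer
    start_d = date.fromisoformat(start)
    # Single-day events often have no endDate; treat the end as the start day.
    end_d = date.fromisoformat(end) if end else start_d
    if today < start_d:
        return "upcoming"
    if today <= end_d:
        return "live"
    return "completed"


print(infer_status("2025-02-15", None, date(2025, 1, 29)))          # upcoming
print(infer_status("2025-02-15", "2025-02-16", date(2025, 2, 16)))  # live
print(infer_status("2025-02-15", None, date(2025, 3, 1)))           # completed
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps the rule deterministic and easy to test.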

Structured Data Schema

interface EventInfo {
  title?: string; // AI-extracted event title
  type?: string; // Event categorization
  category?: string; // Domain classification
  status?: string; // Current event status
  startDate?: string; // Event start date
  endDate?: string; // Event end date
  venue?: string; // Location identification
  registrationType?: string; // Access level determination
  actionLinks?: string[]; // Contact and registration extraction
  prizes?: string[]; // Prize structure identification
  description?: string; // Event details
  isRelevant: boolean; // AI relevance determination
  cloudinary_url?: string; // CDN-optimized image URL
  post_date?: string; // Original posting timestamp
}
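The project validates this schema with Pydantic. As a dependency-free sketch of the same idea, the dataclass below mirrors the `EventInfo` fields and rejects model output that lacks `isRelevant`, carries a non-boolean flag, or has malformed dates. The class and the `from_model_output` function are illustrative assumptions, not the project's actual model; unlike the interface, it defaults the optional list fields to empty lists.

```python
from dataclasses import dataclass, field, fields
from datetime import datetime
from typing import Optional


@dataclass
class EventInfo:
    isRelevant: bool  # the only required field, as in the interface
    title: Optional[str] = None
    type: Optional[str] = None
    category: Optional[str] = None
    status: Optional[str] = None
    startDate: Optional[str] = None
    endDate: Optional[str] = None
    venue: Optional[str] = None
    registrationType: Optional[str] = None
    actionLinks: list[str] = field(default_factory=list)
    prizes: list[str] = field(default_factory=list)
    description: Optional[str] = None
    cloudinary_url: Optional[str] = None
    post_date: Optional[str] = None

    def __post_init__(self) -> None:
        if not isinstance(self.isRelevant, bool):
            raise TypeError("isRelevant must be a boolean")
        for name in ("startDate", "endDate", "post_date"):
            value = getattr(self, name)
            if value is not None:
                # Accepts "2025-02-15" or a full ISO timestamp; raises ValueError otherwise.
                datetime.fromisoformat(value)


def from_model_output(raw: dict) -> EventInfo:
    """Build an EventInfo from the model's JSON, ignoring unknown keys."""
    known = {f.name for f in fields(EventInfo)}
    return EventInfo(**{k: v for k, v in raw.items() if k in known})


event = from_model_output(
    {"title": "AI Workshop 2025", "startDate": "2025-02-15", "isRelevant": True, "extra": 1}
)
```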

Automated Workflow Architecture

Daily Execution Pipeline

graph TD
    A[Cron Trigger - Daily] --> B[Instagram Session Authentication]
    B --> C[Scrape Latest Posts - Max 45]
    C --> D[Image & Caption Extraction]
    D --> E[AI Content Analysis - Gemini 2.5]
    E --> F{Event Relevance Check}
    F -->|Relevant| G[Database Query - Existing Events]
    F -->|Irrelevant| H[Skip Processing]
    G --> I{Duplicate Detection}
    I -->|Unique| J[Cloudinary Upload]
    I -->|Duplicate| K[Skip Insertion]
    J --> L[MongoDB Insertion]
    L --> M[JSON Response Generation]
    M --> N[Frontend API Consumption]
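The control flow in the diagram can be sketched as a single async function. This is a minimal sketch under stated assumptions: every stage (`fetch_posts`, `analyse`, `load_existing`, `is_duplicate`, `upload_image`, `insert_event`) is a hypothetical callable injected by the caller, standing in for the real instagrapi, Gemini, Cloudinary, and Motor code, which this README does not show.

```python
import asyncio
from typing import Awaitable, Callable

MAX_POSTS = 45  # "Scrape Latest Posts - Max 45" in the diagram


async def run_daily_pipeline(
    fetch_posts: Callable[[int], Awaitable[list[dict]]],
    analyse: Callable[[dict], Awaitable[dict]],
    load_existing: Callable[[], Awaitable[list[dict]]],
    is_duplicate: Callable[[dict, list[dict]], bool],
    upload_image: Callable[[dict], Awaitable[str]],
    insert_event: Callable[[dict], Awaitable[None]],
) -> list[dict]:
    """Run one scrape -> classify -> dedupe -> upload -> insert cycle."""
    posts = await fetch_posts(MAX_POSTS)
    # Classify all posts concurrently; gather preserves input order.
    events = await asyncio.gather(*(analyse(post) for post in posts))
    existing = await load_existing()
    inserted: list[dict] = []
    for post, event in zip(posts, events):
        if not event.get("isRelevant"):
            continue  # Irrelevant -> Skip Processing
        # Compare against stored events and ones inserted earlier in this run.
        if is_duplicate(event, existing + inserted):
            continue  # Duplicate -> Skip Insertion
        event["cloudinary_url"] = await upload_image(post)
        await insert_event(event)
        inserted.append(event)
    return inserted


# Usage with in-memory fakes standing in for the external services.
async def demo() -> tuple[list[dict], list[dict]]:
    store: list[dict] = []
    captions = ["AI Workshop", "AI Workshop", "We won a hackathon!"]

    async def fetch_posts(limit: int) -> list[dict]:
        return [{"caption": c} for c in captions][:limit]

    async def analyse(post: dict) -> dict:
        return {"title": post["caption"], "isRelevant": "Workshop" in post["caption"]}

    async def load_existing() -> list[dict]:
        return list(store)

    def is_duplicate(event: dict, others: list[dict]) -> bool:
        return any(o["title"] == event["title"] for o in others)

    async def upload_image(post: dict) -> str:
        return "https://res.cloudinary.com/demo/image/upload/sample.jpg"

    async def insert_event(event: dict) -> None:
        store.append(event)

    inserted = await run_daily_pipeline(
        fetch_posts, analyse, load_existing, is_duplicate, upload_image, insert_event
    )
    return inserted, store


inserted, store = asyncio.run(demo())
```

Uploading to Cloudinary only after the duplicate check avoids storing images for posts that will never be inserted, and checking against `inserted` as well as `existing` catches two posts about the same event within one run.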

API Key Management & Load Balancing

To handle high-volume processing, the system rotates requests across multiple Gemini API keys. The APIKeyManager class provides:

  • Round-robin key distribution
  • Automatic failover to another key when a request fails
  • Rate-limit optimization by spreading load across keys
  • Concurrent processing with multiple keys
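The key-rotation capabilities listed above can be sketched with `itertools.cycle`. This is a hypothetical minimal version, not the project's `APIKeyManager`: it assumes a failed request means the key should be retired for the rest of the run, and it treats any exception as a key failure, which real code should narrow to quota and authentication errors.

```python
import itertools
import threading
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class APIKeyManager:
    """Round-robin rotation over several API keys, skipping keys marked as failed."""

    def __init__(self, keys: list[str]) -> None:
        if not keys:
            raise ValueError("at least one API key is required")
        self._keys = list(keys)
        self._cycle = itertools.cycle(self._keys)
        self._failed: set[str] = set()
        self._lock = threading.Lock()  # keeps rotation consistent across threads

    def next_key(self) -> str:
        """Return the next healthy key in round-robin order."""
        with self._lock:
            for _ in range(len(self._keys)):
                key = next(self._cycle)
                if key not in self._failed:
                    return key
        raise RuntimeError("no healthy API keys left")

    def mark_failed(self, key: str) -> None:
        """Take a key out of rotation, e.g. after a quota or authentication error."""
        with self._lock:
            self._failed.add(key)

    def call_with_failover(self, request: Callable[[str], T]) -> T:
        """Call request(key); on error, retire that key and try the next one."""
        last_error: Optional[Exception] = None
        for _ in range(len(self._keys)):
            try:
                key = self.next_key()
            except RuntimeError:
                break  # every key has already failed
            try:
                return request(key)
            except Exception as exc:  # real code should catch only quota/auth errors
                last_error = exc
                self.mark_failed(key)
        raise RuntimeError("all API keys failed") from last_error


# Usage: key-1 is over quota, so the call fails over to key-2.
manager = APIKeyManager(["key-1", "key-2"])


def fake_gemini_call(key: str) -> str:
    if key == "key-1":
        raise RuntimeError("quota exceeded")
    return f"ok via {key}"


result = manager.call_with_failover(fake_gemini_call)
```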

Advanced Features & Capabilities

Core Automation Features

  • Autonomous session management: persists the Instagram login session and automatically recovers or regenerates it
  • Rate limiting: dynamic delays of 30-120 s between requests to avoid API throttling
  • Multi-format media support: handles images, carousels, and video thumbnails with format optimization
  • Cloud-native architecture: serverless-ready design that can scale horizontally
  • Asynchronous processing: async execution pipeline that processes posts concurrently
  • Error recovery: exception handling with automatic retries
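The 30-120 s delay window and the automatic retries can be sketched as two small helpers. Only the delay range comes from this README; drawing the delay uniformly at random, and retrying with exponential backoff (2 s, 4 s, ...) over three attempts, are assumptions for illustration. The `sleep` parameter is injectable so the helpers can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def polite_delay(min_s: float = 30.0, max_s: float = 120.0,
                 sleep: Callable[[float], None] = time.sleep) -> float:
    """Sleep for a random interval between min_s and max_s seconds; return the delay."""
    delay = random.uniform(min_s, max_s)
    sleep(delay)
    return delay


def with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 2.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying after base_delay * 2**n seconds when it raises."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the original error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Usage (fetch_post and post_ids are hypothetical):
# for post_id in post_ids:
#     polite_delay()
#     with_retries(lambda: fetch_post(post_id))
```

Randomizing the delay makes request timing less regular than a fixed sleep, while backoff gives transient network or rate-limit errors time to clear before the next attempt.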

AI & Machine Learning Features

  • Vision-language model integration: Google Gemini 2.5 Flash for multimodal understanding of post images and captions
  • Semantic duplicate detection: LangChain-based similarity analysis for content deduplication
  • Context-aware classification: integrates temporal context to determine event status accurately
  • Structured output generation: Pydantic-enforced schema validation for consistent data formatting
  • Multi-prompt engineering: separate specialized prompts for event classification and similarity detection
  • Relevance flag: the model returns a boolean isRelevant field for each post rather than a graded confidence score
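Duplicate detection is described as LangChain-based semantic similarity plus fuzzy matching. The sketch below covers only the fuzzy-matching half, using the standard library's `difflib.SequenceMatcher` on normalized titles of events that share a start date. It is a stand-in, not the project's LangChain check, and the 0.85 threshold is an arbitrary assumption.

```python
import re
from difflib import SequenceMatcher
from typing import Optional


def _normalize(text: Optional[str]) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]+", " ", (text or "").lower())
    return " ".join(text.split())


def is_duplicate(candidate: dict, existing: list[dict], threshold: float = 0.85) -> bool:
    """Flag candidate if a stored event has a near-identical title on the same start date."""
    cand_title = _normalize(candidate.get("title"))
    if not cand_title:
        return False  # empty strings would otherwise compare as a perfect match
    for event in existing:
        if candidate.get("startDate") != event.get("startDate"):
            continue  # different dates: treat as distinct events
        ratio = SequenceMatcher(None, cand_title, _normalize(event.get("title"))).ratio()
        if ratio >= threshold:
            return True
    return False
```

Pure string matching misses paraphrased titles ("ML Bootcamp" vs. "Machine Learning Bootcamp"), which is why an embedding- or LLM-based semantic check is useful on top of it.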

Database & Storage Features

  • MongoDB Atlas integration: cloud-native document storage with the async Motor driver for high-performance operations
  • Cloudinary CDN management: automated image optimization, transformation, and global content delivery
  • Async database operations: non-blocking database interactions
  • Event collection management: dedicated, indexed collections for event data
  • Backup & recovery: automated data persistence with cloud redundancy

Getting Started: Development Setup

Prerequisites

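Ensure you have the following on your development machine:

  • Python with pip and venv
  • Git
  • Credentials for Instagram, Google AI (Gemini), Cloudinary, and MongoDB Atlas (used in the .env file below)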

Installation & Configuration

  1. Clone the Repository

    git clone https://github.com/AneeshAhuja31/ieee-automation.git
    cd ieee-automation
  2. Environment Setup

    # Create virtual environment
    python -m venv venv
    
    # Activate virtual environment
    # Windows
    venv\Scripts\activate
    # macOS/Linux
    source venv/bin/activate
  3. Install Dependencies

    # Core dependencies
    pip install -r requirements.txt
    
    # Analyzer module dependencies
    pip install -r app/analyser/requirements.txt
  4. Environment Configuration

    Create a .env file in the root directory:

    # Instagram Authentication
    SESSION_FILE="session.json"
    USERNAME="[email protected]"
    PASSWORD="your_instagram_password"
    TARGET_USER="ieeemsit"
    
    # Google AI API Keys (Multiple for load balancing)
    GEMINI_API_KEY_1="your_gemini_api_key_1"
    GEMINI_API_KEY_2="your_gemini_api_key_2"
    
    # Cloudinary Configuration
    CLOUDINARY_CLOUD_NAME="your_cloudinary_cloud_name"
    CLOUDINARY_API_KEY="your_cloudinary_api_key"
    CLOUDINARY_API_SECRET="your_cloudinary_api_secret"
    
    # MongoDB Atlas Configuration
    MONGODB_URI="mongodb+srv://username:[email protected]/"
    MONGODB_DATABASE_NAME="ieeemsit"
  5. Run the Application

    # Start FastAPI server
    cd app
    uvicorn app:app --reload --host 0.0.0.0 --port 8000
    
    # Alternative: Run scraper independently
    python working.py
  6. API Access

    The application will be running at http://localhost:8000

    • API Documentation: http://localhost:8000/docs
    • Event Analysis Endpoint: POST /analyse/jsons
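As a sketch, assuming the variables from step 4 have already been loaded into the process environment (for example by a .env loader; this README does not name one), the numbered GEMINI_API_KEY_N keys can be collected in order and missing settings reported at startup. The helper names `load_gemini_keys` and `missing_settings` are hypothetical.

```python
import os
from typing import Mapping

REQUIRED = (
    "SESSION_FILE", "USERNAME", "PASSWORD", "TARGET_USER",
    "CLOUDINARY_CLOUD_NAME", "CLOUDINARY_API_KEY", "CLOUDINARY_API_SECRET",
    "MONGODB_URI", "MONGODB_DATABASE_NAME",
)


def load_gemini_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Collect GEMINI_API_KEY_1, GEMINI_API_KEY_2, ... until the first missing number."""
    keys = []
    n = 1
    while env.get(f"GEMINI_API_KEY_{n}"):
        keys.append(env[f"GEMINI_API_KEY_{n}"])
        n += 1
    if not keys:
        raise RuntimeError("set at least GEMINI_API_KEY_1")
    return keys


def missing_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED if not env.get(name)]
```

Numbering the keys lets you add a third or fourth key without code changes. Note that Windows already defines a USERNAME environment variable, so a loader that does not override existing variables may pick up the Windows login name instead of the Instagram one.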

API Documentation & Integration

Primary Endpoint: Event Analysis

POST /analyse/jsons

Processes a batch of Instagram posts and returns classified event data.

Request body:

{
  "json_list": [
    {
      "Post Image": "https://instagram.com/image_url",
      "Post Caption": "Join us for our upcoming workshop...",
      "Post Date": "2025-01-29"
    }
  ]
}

Response:

[
  {
    "title": "AI Workshop 2025",
    "type": "workshop",
    "category": "ai",
    "status": "upcoming",
    "startDate": "2025-02-15",
    "venue": "MSIT Campus",
    "registrationType": "free",
    "isRelevant": true,
    "cloudinary_url": "https://res.cloudinary.com/...",
    "description": "Comprehensive workshop details..."
  }
]
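A minimal standard-library client for this endpoint might look like the following. It assumes the local server from the setup steps at http://localhost:8000; the helper name `build_analyse_request` is hypothetical. The request is built without being sent, so the snippet runs even when no server is up.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # local server from the setup steps


def build_analyse_request(posts: list[dict], base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /analyse/jsons request carrying posts under the "json_list" key."""
    body = json.dumps({"json_list": posts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/analyse/jsons",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


posts = [{
    "Post Image": "https://instagram.com/image_url",
    "Post Caption": "Join us for our upcoming workshop...",
    "Post Date": "2025-01-29",
}]
request = build_analyse_request(posts)

# With the server running, send it and keep only the relevant events:
# with urllib.request.urlopen(request) as resp:
#     events = json.load(resp)
# relevant = [e for e in events if e.get("isRelevant")]
```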

Development Guidelines

  1. Issue Management
    All development begins with issue creation. Browse Issues or create new ones using our templates.

  2. Branch Strategy
    Create feature branches following the [type]/[description] convention:

    git checkout -b feat/ai-model-upgrade
    git checkout -b fix/duplicate-detection-bug
    git checkout -b docs/api-documentation-update
  3. Development Standards

    • Write comprehensive docstrings for all functions
    • Implement error handling for all external API calls
    • Add type hints for improved code maintainability
    • Follow PEP 8 style guidelines
  4. Testing Requirements

    # Run unit tests
    pytest tests/
    
    # Run integration tests
    pytest tests/integration/
    
    # Performance testing
    pytest tests/performance/
  5. Pull Request Process

    • Provide detailed PR descriptions with testing evidence
    • Include performance impact analysis
    • Ensure all CI/CD checks pass
    • Request review from maintainers

Code Quality Standards

  • Type Safety: Full type annotation coverage
  • Error Handling: Comprehensive exception management
  • Documentation: Inline comments and API documentation
  • Performance: Async/await patterns for I/O operations
  • Security: Input validation and sanitization

Connect With IEEE MSIT

Stay connected with IEEE MSIT's innovation and automation initiatives:


Contact: [email protected] | Phone: +91-11-2681-4816
Website: ieeemsit.vercel.app


Development Team

Meet the engineering team behind this project:

Aneesh Ahuja
PR Lead RAS

Rajveer Singh
Vice Chairperson - Web Dev
