How CosmaSense Works
A deep dive into the technology that powers local-first AI search
The Complete Pipeline
CosmaSense uses a multi-stage pipeline to transform your files into searchable, semantically aware content. Here's how each stage works:
File Discovery & Monitoring
CosmaSense starts by scanning the directories you've configured for indexing. Using efficient file system watchers, it then monitors them for new, modified, or deleted files in real time.
Key Technologies:
- Native file system watchers (FSEvents on macOS, inotify on Linux)
- Efficient directory traversal with configurable exclusion patterns
- Smart filtering to ignore temporary files, build artifacts, and hidden files
- Change detection using file hashes and modification timestamps
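To make the traversal and change-detection ideas above concrete, here is a minimal Python sketch. It is not CosmaSense's actual implementation: the exclusion patterns, the `scan` function, and the `known` dictionary are hypothetical names chosen for illustration, and it polls with `os.walk` rather than using FSEvents or inotify. It checks the modification timestamp first and only hashes a file when the timestamp differs.

```python
import fnmatch
import hashlib
import os

# Hypothetical exclusion patterns; the real defaults are not documented here.
EXCLUDE_PATTERNS = ["*.tmp", "*.swp", "node_modules", "build", ".*"]


def is_excluded(name, patterns=EXCLUDE_PATTERNS):
    """Return True if a file or directory name matches any exclusion pattern."""
    return any(fnmatch.fnmatch(name, p) for p in patterns)


def file_hash(path):
    """Hash file contents in blocks so large files are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def scan(root, known):
    """Walk root and return (changed, deleted) lists of paths.

    known maps path -> (mtime_ns, sha256) from the previous scan
    and is updated in place.
    """
    seen, changed = set(), []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk skips them entirely.
        dirnames[:] = [d for d in dirnames if not is_excluded(d)]
        for name in filenames:
            if is_excluded(name):
                continue
            path = os.path.join(dirpath, name)
            seen.add(path)
            mtime = os.stat(path).st_mtime_ns
            previous = known.get(path)
            if previous is not None and previous[0] == mtime:
                continue  # timestamp unchanged: skip the more expensive hash
            digest = file_hash(path)
            if previous is None or previous[1] != digest:
                changed.append(path)
            known[path] = (mtime, digest)
    deleted = [p for p in known if p not in seen]
    for p in deleted:
        del known[p]
    return changed, deleted
```

Calling `scan` repeatedly with the same `known` dictionary reports only files that are new, changed, or deleted since the previous call. A watcher-based design would react to events instead of rescanning.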
Content Extraction & Parsing
Once a file is discovered, CosmaSense extracts its textual content using specialized parsers for different file types. This ensures that metadata and formatting are preserved where relevant.
Supported File Types:
Documents:
- PDF (text extraction)
- Microsoft Office (Word, Excel, PowerPoint)
- LibreOffice formats (ODT, ODS, ODP)
- Plain text and Markdown
Code & Configuration:
- Source code (Python, JavaScript, Java, Go, Rust, etc.)
- Configuration files (JSON, YAML, TOML, XML)
- Shell scripts and batch files
- Web files (HTML, CSS)
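One common way to organize type-specific parsers is a registry that maps file extensions to extractor functions. The Python sketch below illustrates that pattern using only the standard library. The `EXTRACTORS` table and all function names are hypothetical, and formats such as PDF or Office documents would need dedicated parsers that are not shown.

```python
import json
from html.parser import HTMLParser
from pathlib import Path


class _TextCollector(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def extract_html(raw):
    """Return the visible text of an HTML document as one string."""
    collector = _TextCollector()
    collector.feed(raw)
    collector.close()
    return " ".join(collector.parts)


def extract_json(raw):
    """Re-serialize JSON so keys and values become uniformly formatted text."""
    return json.dumps(json.loads(raw), indent=2, sort_keys=True)


def extract_plain(raw):
    """Plain text, Markdown, and source code are indexed as-is."""
    return raw


# Hypothetical registry: file extension -> extractor function.
EXTRACTORS = {
    ".html": extract_html,
    ".json": extract_json,
    ".txt": extract_plain,
    ".md": extract_plain,
    ".py": extract_plain,
}


def extract_text(path):
    """Return extracted text, or None if no parser is registered for the type."""
    path = Path(path)
    extractor = EXTRACTORS.get(path.suffix.lower())
    if extractor is None:
        return None
    return extractor(path.read_text(encoding="utf-8", errors="replace"))
```

Supporting a new file type in this design means adding one function and one entry to the registry, without touching the rest of the pipeline.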
AI-Powered Summarization
The extracted content is processed by a local Large Language Model (LLM) to generate concise summaries. These summaries capture the key concepts and main ideas, making search results more informative.
How It Works:
- Runs entirely on your local machine, with no cloud API calls
- Uses efficient quantized models optimized for CPU inference
- Generates both short (1-2 sentences) and detailed summaries
- Extracts key entities, topics, and themes from content
Vector Embedding Generation
This is the most critical step for semantic search: converting text into high-dimensional vectors that capture its meaning. Documents with similar meanings have vectors that lie close together in the vector space.
Technical Details:
- Uses state-of-the-art sentence transformer models
- Generates 384- or 768-dimensional vectors, depending on model choice
- Processes text in chunks to handle large documents
- Runs inference locally with ONNX Runtime for optimal performance
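Chunking matters because embedding models accept only a limited amount of text per input. The Python sketch below shows one simple approach, word-based chunks with overlap, so that content near a chunk boundary appears in both neighboring chunks. The function name and the default sizes are assumptions for illustration. The page does not say how CosmaSense chunks text, and a real system would more likely count model tokens than words.

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks
```

Each chunk would then be passed to the embedding model separately, producing one vector per chunk rather than one per document.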
Indexing & Storage
All extracted information (raw text, summaries, embeddings, and metadata) is stored in a local database optimized for fast retrieval.
Storage Architecture:
- SQLite for structured metadata and full-text search indices
- Specialized vector index or database (e.g., FAISS, Qdrant) for similarity search
- Incremental updates to minimize reindexing overhead
- Compression to reduce storage requirements
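The incremental-update idea can be sketched with Python's built-in `sqlite3` module. The table layout and the function names below are hypothetical, not CosmaSense's actual schema. The point is that a stored content hash lets the indexer skip files whose content has not changed, so they are not summarized or embedded again.

```python
import sqlite3


def open_index(db_path=":memory:"):
    """Open the database and create the (hypothetical) metadata table if needed."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS files (
               path TEXT PRIMARY KEY,
               content_hash TEXT NOT NULL,
               modified_ns INTEGER NOT NULL,
               summary TEXT
           )"""
    )
    return conn


def upsert_file(conn, path, content_hash, modified_ns, summary):
    """Insert or update one file row; return True only if the row changed."""
    row = conn.execute(
        "SELECT content_hash FROM files WHERE path = ?", (path,)
    ).fetchone()
    if row is not None and row[0] == content_hash:
        return False  # unchanged content: no re-summarizing or re-embedding
    conn.execute(
        "INSERT OR REPLACE INTO files (path, content_hash, modified_ns, summary) "
        "VALUES (?, ?, ?, ?)",
        (path, content_hash, modified_ns, summary),
    )
    conn.commit()
    return True
```

In this sketch the caller would run the summarization and embedding stages only when `upsert_file` returns True. Full-text indices and the vector store are left out to keep the example short.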
Hybrid Search Execution
When you search, CosmaSense combines multiple search strategies to deliver the most relevant results, balancing semantic understanding with traditional keyword matching.
Search Strategy:
- Vector Search: Finds semantically similar content using cosine similarity
- Full-Text Search: Traditional keyword matching with BM25 ranking
- Result Fusion: Combines and ranks results from both methods
- Re-ranking: Optional LLM-based re-ranking for enhanced relevance
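The strategy above can be illustrated with a short Python sketch. The page does not say which fusion method CosmaSense uses, so the sketch assumes reciprocal rank fusion (RRF), a common choice that combines ranked lists without having to normalize their scores. All function names and the constant `k=60` are assumptions. BM25 is not implemented here: the keyword ranking is simply taken as an input list.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def vector_search(query_vec, doc_vecs):
    """Return document ids ordered by descending cosine similarity to the query."""
    return sorted(
        doc_vecs,
        key=lambda doc_id: cosine_similarity(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document ids; each list adds 1 / (k + rank) per document."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, breaking ties by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

Given a semantic ranking from `vector_search` and a keyword ranking from a full-text index, `reciprocal_rank_fusion([semantic, keyword])` returns a single list. Documents that rank well in both lists rise to the top, while documents found by only one method still appear.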
Architecture Components
CosmaSense is built with a modular architecture that separates concerns and allows for flexible deployment options.
Backend Core
The heart of CosmaSense, handling all indexing, search, and AI operations. Written in Rust for performance and safety.
- Indexing engine and file watchers
- Vector and full-text search implementation
- REST API server for client interfaces
CLI Tool
Command-line interface for power users and automation. Perfect for scripting and CI/CD integration.
- Index management commands
- Query and search operations
- Configuration and maintenance
Terminal UI (TUI)
Interactive terminal interface with rich visualizations. Built with modern TUI frameworks for an intuitive experience.
- Real-time search with instant previews
- Interactive result browsing
- Index statistics and monitoring
macOS Native App
Beautiful native GUI for macOS, built with SwiftUI. Integrates seamlessly with macOS features like Spotlight.
- Native macOS design language
- Menu bar integration
- Global keyboard shortcuts
Ready to Try It?
Experience the power of local-first AI search. Download CosmaSense and start exploring your files like never before.