Welcome! This is a curated collection of resources around capturing & processing user context from computer screens and beyond.
Context matters. Providing it is painful.
No human user should be forced to repeat themselves. That's what computers are made for. It's the 21st century, yet we are still forced to work with old-school software that is heavily siloed. Users have to repeat inputs and rely on copy & paste, import/export or, in the best case, integration via APIs.
What if all software could access all the other software that you use?
Microsoft's Office 97 Clippy ("Are you writing a letter?") was the right idea at the wrong time.
Now is the right time. If you are an innovator and excited about empowering your users with context, you have come to the right place.
When developing context-aware applications, one has to deal with a lot of pain points.
- Discovery & Validation - Keeping up with what’s possible & setting up experimentation
- Privacy & Compliance - Ensure user trust & compliance with regulations
- Performance - Faster is better
- Cost - LLMs are expensive - Here is our cost calculator
- Quality Testing & Evaluation - You need to test and evaluate the quality of your application
- Deployment - You need to deploy your application and keep it running
- Capturing & Recording - Cross-platform recording of the user's screen, audio, etc.
- Avoiding lock-in - Stay flexible
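To make the cost pain point concrete, here is a rough back-of-the-envelope sketch, not the cost calculator mentioned above. It assumes the model is billed per token and that you can measure an average token count per captured frame; the function name `estimate_cost_usd` and every parameter are hypothetical, and the numbers in the usage line are made-up placeholders rather than real prices.

```python
def estimate_cost_usd(
    frames_per_hour: float,
    hours: float,
    input_tokens_per_frame: float,
    output_tokens_per_frame: float,
    usd_per_million_input_tokens: float,
    usd_per_million_output_tokens: float,
) -> float:
    """Estimate LLM spend for analysing captured frames.

    All rates are supplied by the caller; look them up on your
    provider's pricing page and measure token counts on real frames.
    """
    frames = frames_per_hour * hours
    input_cost = frames * input_tokens_per_frame * usd_per_million_input_tokens / 1_000_000
    output_cost = frames * output_tokens_per_frame * usd_per_million_output_tokens / 1_000_000
    return input_cost + output_cost


# Placeholder numbers only: one frame per minute over an 8 hour day.
daily_cost = estimate_cost_usd(60, 8, 1000, 100, 2.0, 10.0)
```

Lowering the capture rate or skipping unchanged frames reduces `frames` directly, which is usually the cheapest lever to pull first.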
- Capture - Ingest raw data from the source
- Pre-Process - Clean, enrich, transform the data
- Contextualize - Add meaning to the data. Merge different sources, do enrichment like entity recognition & aggregation.
- Act - Use the data to take action, optionally store it for later.
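The four stages above can be sketched as plain functions chained together. This is a minimal illustration under stated assumptions, not a reference implementation: the `Event` type, the function names, and the 30 second merge window are all hypothetical, and the capture stage is represented simply by whatever iterable of events you pass in.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Event:
    """One captured observation, e.g. OCR text from a screenshot."""
    source: str       # e.g. "screen" or "clipboard"
    timestamp: float  # seconds
    text: str
    meta: dict = field(default_factory=dict)


def pre_process(events: Iterable[Event]) -> list[Event]:
    """Normalize whitespace, drop empty events and consecutive duplicates."""
    cleaned: list[Event] = []
    for ev in events:
        text = " ".join(ev.text.split())
        if not text:
            continue
        last = cleaned[-1] if cleaned else None
        if last and last.source == ev.source and last.text == text:
            continue
        cleaned.append(Event(ev.source, ev.timestamp, text, dict(ev.meta)))
    return cleaned


def contextualize(events: list[Event], window_s: float = 30.0) -> list[dict]:
    """Merge events from all sources that start within the same time window."""
    contexts: list[dict] = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        if contexts and ev.timestamp - contexts[-1]["start"] <= window_s:
            contexts[-1]["events"].append(ev)
        else:
            contexts.append({"start": ev.timestamp, "events": [ev]})
    for ctx in contexts:
        ctx["sources"] = sorted({e.source for e in ctx["events"]})
        ctx["summary"] = " | ".join(e.text for e in ctx["events"])
    return contexts


def act(contexts: list[dict], handler: Callable[[dict], object]) -> list:
    """Hand each context to a caller-supplied action (store, prompt, notify)."""
    return [handler(ctx) for ctx in contexts]


def run_pipeline(raw: Iterable[Event], handler: Callable[[dict], object]) -> list:
    return act(contextualize(pre_process(raw)), handler)
```

Keeping each stage a pure function makes it easy to swap one out, for example replacing the time-window merge with entity recognition, without touching capture or the final action.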
- Google AI Studio Live to stream your screen
- Mistral Pixtral
- OpenAI
- Meta Llama 3.2 Vision
- https://github.com/microsoft/OmniParser & https://huggingface.co/microsoft/OmniParser-v2.0
- Phi-4-multimodal
- Qwen/Qwen2.5-VL
- Anthropic Computer Use
- Google Project Mariner
- https://github.com/browser-use/browser-use
- Nayla Browser
- Proxy 1.0
- OpenAI Operator
- Open Interpreter
- Microsoft Recall
- Apple on-screen content
- Several open-source alternatives for Recall/logging
- OpenRecall: Python, open-source, privacy-first alternative to Windows Recall (1,800 stars)
- TotalRecall: access Windows 11 Recall data (2,000 stars)
- Rem: Recall on Mac (2,300 stars)
- Windrecorder (3,000 stars)
- mediar-ai/screenpipe: rewind.ai x cursor.com = your AI assistant that has all the context. 24/7 screen & voice recording for the age of super intelligence. get… (8,800 stars)
- getomni-ai/zerox
- Llama-OCR
- Visual to “structured”
- RICO Dataset: This dataset comprises approximately 66,000 screenshots from 9,300 Android apps across 27 categories. While it primarily focuses on mobile applications, it has been widely used for screen content understanding and could offer insights applicable to desktop applications. (arxiv.org)
- ScreenQA Dataset: Introduced in the paper "ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots," this dataset contains around 86,000 question-answer pairs collected from approximately 35,000 screenshots derived from the RICO dataset. Although centered on mobile app screenshots, the methodologies employed might be adaptable for desktop application contexts. (github.com)
- CIRCL Images AIL Dataset: Offered by the Computer Incident Response Center Luxembourg (CIRCL), this dataset includes images such as photos and screenshots of websites. While it may not specifically target standard software applications, it could serve as a supplementary resource. (circl.lu)
- "Taking the pain out of screenshot AI testing" - We don't want to record the same screen over and over again. Instead, for developer convenience, we want to record the screen once and reuse the recording. This lets us test and evaluate different pipelines, models, prompts, etc., and also ensure quality in production systems. One inspiration for this is https://roark.ai, which does this for voice.
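The record-once, replay-many idea could look roughly like the sketch below. It is an assumption-laden illustration rather than an existing tool: `record`, `replay`, `evaluate`, and the `manifest.json` layout are hypothetical names, frames are treated as opaque bytes, and each "pipeline" is just a function from a frame to a string.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Iterable, Iterator


def record(frames: Iterable[bytes], out_dir: Path) -> list[str]:
    """Store each frame under its content hash and write an ordered manifest.

    Identical frames map to the same file, so repeated screens cost no extra disk.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    names: list[str] = []
    for frame in frames:
        name = hashlib.sha256(frame).hexdigest()[:16] + ".bin"
        (out_dir / name).write_bytes(frame)
        names.append(name)
    (out_dir / "manifest.json").write_text(json.dumps(names))
    return names


def replay(out_dir: Path) -> Iterator[bytes]:
    """Yield the recorded frames in their original order."""
    for name in json.loads((out_dir / "manifest.json").read_text()):
        yield (out_dir / name).read_bytes()


def evaluate(
    out_dir: Path, pipelines: dict[str, Callable[[bytes], str]]
) -> dict[str, list[str]]:
    """Run every candidate pipeline over the same recording for comparison."""
    return {
        label: [fn(frame) for frame in replay(out_dir)]
        for label, fn in pipelines.items()
    }
```

Because every pipeline sees byte-identical input, differences in the results come from the model or prompt under test rather than from what happened to be on screen during a given run.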
- Screen
- Audio
- Video
- File Access
- Notifications
- Clipboard
- Keyboard
- Mouse
- Touch
- Gaze
- Gestures
- Thoughts
- Calendar, Contacts, etc.
- Messages (Email & Chats)
- Browser History