kontext21/context

Welcome! This is a curated collection of resources around capturing & processing user context from computer screens and beyond.

Motivation

Context matters. Providing it is painful.

No human user should be forced to repeat themselves. That's what computers are made for. It's the 21st century, yet we are still forced to work with old-school software that is heavily siloed. Users have to repeat inputs and rely on copy & paste, import/export or, in the best case, integration via APIs.

What if all software could access all the other software that you use?

Microsoft's Office 97 Clippy ("Are you writing a letter?") was the right idea at the wrong time.

Now is the right time. If you are an innovator and excited about empowering your users with context, you have come to the right place.

Developer Pain Points

Developers of context-aware applications have to deal with many pain points:

  1. Discovery & validation - Keeping up with what’s possible and setting up experiments
  2. Privacy & compliance - Ensuring user trust and compliance with regulations
  3. Performance - Faster is better
  4. Cost - LLMs are expensive. Here is our cost calculator
  5. Quality testing & evaluation - Testing and evaluating the quality of your application
  6. Deployment - Deploying your application and keeping it running
  7. Capturing & recording - Recording the user's screen, audio, etc. across platforms
  8. Avoiding lock-in - Staying flexible

Stages

  1. Capture - Ingest raw data from the source
  2. Pre-Process - Clean, enrich, transform the data
  3. Contextualize - Add meaning to the data. Merge different sources and enrich the data, for example through entity recognition and aggregation.
  4. Act - Use the data to take action, optionally store it for later.
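The four stages above can be sketched as a chain of small functions. This is a minimal illustration only, not an implementation from this repository: the `Context` class, all function names, and the capitalized-word "entity recognition" are hypothetical stand-ins chosen to keep the example self-contained.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Hypothetical container passed from stage to stage."""
    source: str                      # e.g. "screen", "clipboard"
    raw: str                         # raw captured data (text stand-in)
    text: str = ""                   # cleaned text after pre-processing
    entities: list[str] = field(default_factory=list)


def capture(source: str, raw: str) -> Context:
    # Stage 1: ingest raw data from the source.
    return Context(source=source, raw=raw)


def pre_process(ctx: Context) -> Context:
    # Stage 2: clean the data (here: collapse runs of whitespace).
    ctx.text = " ".join(ctx.raw.split())
    return ctx


def contextualize(ctx: Context) -> Context:
    # Stage 3: add meaning. A naive placeholder for entity recognition:
    # treat every capitalized word as an entity.
    words = (w.strip(".,!?") for w in ctx.text.split())
    ctx.entities = sorted({w for w in words if w[:1].isupper()})
    return ctx


def act(ctx: Context) -> str:
    # Stage 4: use the data (here: produce a one-line summary).
    return f"{ctx.source}: {len(ctx.entities)} entities"


def run_pipeline(source: str, raw: str) -> str:
    ctx = capture(source, raw)
    for stage in (pre_process, contextualize):
        ctx = stage(ctx)
    return act(ctx)


if __name__ == "__main__":
    print(run_pipeline("screen", "  Meeting with   Alice at Acme. "))
```

Keeping each stage a plain function with the same input and output type makes it easy to swap one stage (for example, a real OCR or entity-recognition model) without touching the others.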

Interesting Links

Capturing

Processing

Memory

Vision Models

Document Models

Computer Use & Browser Use

Logging/Rewind/Recall

OCR

Screen Recording / Remote Desktop sharing

Benchmarking

Datasets

  • RICO Dataset: This dataset comprises approximately 66,000 screenshots from 9,300 Android apps across 27 categories. While it primarily focuses on mobile applications, it has been widely used for screen content understanding and could offer insights applicable to desktop applications. (arxiv.org)
  • ScreenQA Dataset: Introduced in the paper "ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots," this dataset contains around 86,000 question-answer pairs collected from approximately 35,000 screenshots derived from the RICO dataset. Although centered on mobile app screenshots, the methodologies employed might be adaptable for desktop application contexts. (github.com)
  • CIRCL Images AIL Dataset: Offered by the Computer Incident Response Center Luxembourg (CIRCL), this dataset includes images such as photos and screenshots of websites. While it may not specifically target standard software applications, it could serve as a supplementary resource. (circl.lu)

Testing & Evaluation

  • "Taking the pain out of screenshot AI testing" - We don't want to record the same screen over and over again. Instead, for developer convenience, we want to record the screen once and reuse the recording. This allows us to test and evaluate different pipelines, models, prompts, etc., and also to ensure quality in production systems. One inspiration for this is https://roark.ai, which does it for voice.
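The "record once, reuse many times" idea can be sketched as follows. This is an assumption-laden illustration: the JSON file format, the `text`/`expected` fields, and both toy pipelines are hypothetical, and text strings stand in for real screenshots. It does not describe how roark.ai or any other tool works.

```python
import json
import tempfile
from pathlib import Path


def save_recording(frames: list[dict], path: Path) -> None:
    # Persist a recorded session once so it can be replayed later.
    path.write_text(json.dumps(frames))


def load_recording(path: Path) -> list[dict]:
    return json.loads(path.read_text())


def evaluate(pipeline, frames: list[dict]) -> float:
    # Replay every recorded frame through a pipeline and return accuracy.
    hits = sum(1 for f in frames if pipeline(f["text"]) == f["expected"])
    return hits / len(frames)


# Two hypothetical pipeline variants to compare on the same recording.
def pipeline_a(text: str) -> str:
    return "email" if "Inbox" in text else "other"


def pipeline_b(text: str) -> str:
    return "email" if "inbox" in text.lower() else "other"


# Hypothetical recorded frames: extracted screen text plus expected label.
FRAMES = [
    {"text": "Inbox (3) - Mail", "expected": "email"},
    {"text": "INBOX zero", "expected": "email"},
    {"text": "Untitled - Notepad", "expected": "other"},
]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        recording = Path(tmp) / "session.json"
        save_recording(FRAMES, recording)
        frames = load_recording(recording)
        for name, pipeline in [("a", pipeline_a), ("b", pipeline_b)]:
            print(name, round(evaluate(pipeline, frames), 2))
```

Because the recording is fixed, any difference in the scores comes from the pipeline under test rather than from changes on screen, which is what makes regression checks on production systems repeatable.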

Sources

OS Output Streams

  • Screen
  • Audio
  • Video
  • File Access
  • Notifications
  • Clipboard

Inputs

  • Keyboard
  • Mouse
  • Touch
  • Gaze
  • Gestures
  • Thoughts

APIs to other services

  • Calendar, Contacts, etc.
  • Messages (Email & Chats)
  • Browser History
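One possible way to model the sources listed above is a single event type tagged with its source, so that later stages can treat all sources uniformly. This is only a sketch: `SourceKind`, `SOURCES`, `ContextEvent`, and the lowercase source identifiers are hypothetical names, and only a few of the listed sources are included.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class SourceKind(Enum):
    OS_OUTPUT = "os_output"      # screen, audio, clipboard, ...
    INPUT = "input"              # keyboard, mouse, touch, ...
    SERVICE_API = "service_api"  # calendar, messages, ...


# Hypothetical mapping from a source name to its category.
SOURCES = {
    "screen": SourceKind.OS_OUTPUT,
    "audio": SourceKind.OS_OUTPUT,
    "clipboard": SourceKind.OS_OUTPUT,
    "keyboard": SourceKind.INPUT,
    "mouse": SourceKind.INPUT,
    "calendar": SourceKind.SERVICE_API,
    "messages": SourceKind.SERVICE_API,
}


@dataclass(frozen=True)
class ContextEvent:
    source: str       # must be a key of SOURCES
    timestamp: float  # seconds since the start of the session
    payload: str      # captured content (text stand-in)

    @property
    def kind(self) -> SourceKind:
        return SOURCES[self.source]


def group_by_kind(events: list[ContextEvent]) -> dict[SourceKind, list[ContextEvent]]:
    # Bucket events by category, keeping each bucket in time order.
    grouped: dict[SourceKind, list[ContextEvent]] = defaultdict(list)
    for event in sorted(events, key=lambda e: e.timestamp):
        grouped[event.kind].append(event)
    return dict(grouped)
```

A shared event shape like this keeps the capture side extensible: adding a new source means adding one entry to the mapping rather than a new code path downstream.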
