Demo: On-Device Video Analysis with LLMs
Jaganathan, Vishnu, Gouda, Deepak, Arora, Kriti, Aggarwal, Mohit, and Zhang, Chao
In Proceedings of the 25th International Workshop on Mobile Computing Systems and Applications, Feb 2024
We present a new on-device pipeline that efficiently summarizes lecture videos and answers questions about them directly on a smartphone. We use widely accessible tools such as OCR and Vosk speech-to-text, coupled with large language models (LLMs), to identify crucial sentences and generate summaries. We fine-tune and quantize BERT and GPT-2 so that lecture video summarization and question answering run efficiently on consumer-grade smartphones such as the Pixel 8 Pro. This approach eliminates the need for cloud APIs, which improves user privacy and minimizes mobile data usage.
https://www.youtube.com/shorts/zwGdONlKays