University Innovation Project

Closer

Next-Generation AI Smart Glasses
Bridging Communication via English Sign Language Recognition

The Vision Behind Closer

Closer is a wearable translator built into a sleek smart glasses form factor. Using an ultra-compact compute core, the glasses capture hand gestures, extract spatial hand landmarks with MediaPipe, and translate English Sign Language in under 300 ms end to end. By turning signs into synthesized speech, Closer gives deaf and hard-of-hearing users a way to take part in real-time conversation.

👓

Wearable Core

Integrated camera and compute core designed for a lightweight glasses form factor, enabling mobility and hands-free use.

🧠

Neural Translation

GestureNet: A custom deep learning classifier optimized for the nuances of English sign language.

🔊

Text-to-Speech

Recognized signs are spoken aloud instantly via the Windows Speech API with configurable audio delay.

Low Latency

End-to-end recognition latency under 300 ms, from hand gesture to spoken word.

Communication Modes

Closer is a two-way communication bridge. Whether you communicate with sign language or speech, a single launcher puts both modes at your fingertips.

🤟

Sign Language Mode

For deaf & mute users who communicate via sign language.

  • Raspberry Pi 5 camera captures hand gestures
  • GestureNet AI classifies 20 English signs
  • Predicted word is spoken aloud via TTS
  • Gesture Dashboard lets you remap any label
python launcher.py → click Sign Language

🎤

Speech → LCD Mode

For hearing users speaking to a deaf/mute person.

  • A Bluetooth headset microphone connected to the Raspberry Pi Zero records speech
  • Faster-Whisper (tiny.en) transcribes the audio to text
  • Text is displayed on the Pi Zero's I²C LCD screen
  • Full scrollable transcript shown in simulator GUI
python launcher.py → click Speech → LCD
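
As a rough illustration of the display step, the sketch below splits a transcript into fixed-size pages for a character LCD. The 16×2 geometry and the `lcd_pages` helper are assumptions made for this example; the page does not state the LCD size or how the project drives the display.

```python
import textwrap


def lcd_pages(text, cols=16, rows=2):
    """Split a transcript into pages of `rows` lines, each padded to `cols` characters.

    cols=16 and rows=2 are assumed defaults for a common character LCD.
    """
    # textwrap breaks overlong words by default, so no line exceeds `cols`.
    lines = textwrap.wrap(text, width=cols) or [""]
    pages = []
    for start in range(0, len(lines), rows):
        page = lines[start:start + rows]
        # Pad the last page so every page has the same number of lines.
        page += [""] * (rows - len(page))
        pages.append([line.ljust(cols) for line in page])
    return pages


if __name__ == "__main__":
    for page in lcd_pages("Hello, how are you doing today my friend?"):
        print(page)
```

Each page can then be written to the display line by line, with a short pause between pages so the reader can keep up.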

🤟 Hand Gesture → Pi 5 Camera → GestureNet → 🔊 Spoken Word
OR
🎤 Voice → Pi Zero Mic → Whisper AI → 📺 LCD Text

Meet the Team

The brilliant minds behind Closer

Abdelrahman Khaled Fawzy Mansour Abdel Fattah

Project Member

Abdelrahman Ayman Mahmoud Abu El-Makarim

Project Member

Abdelmohsen Mohamed Mahmoud Naeina

Project Member

Project Demo

Watch Closer recognize English sign language in real time.

Documentation

A concise technical breakdown for academic review.

📌 Problem Statement

Deaf and mute individuals face significant communication barriers in everyday life. Existing solutions are either costly, require specialized hardware, or do not support two-way communication. Closer addresses this by providing an affordable, open-source English sign language recognition and speech-to-text system that runs on widely available hardware.

🏗️ System Architecture

The system splits the work across two nodes:

  • Wearable Node (Smart Glasses): Integrated capture module → high-frequency hand landmark extraction → localized pre-processing.
  • Inference Cloud/Server: Advanced neural classification (GestureNet) → temporal smoothing → natural language synthesis (TTS).

Offloading classification to the server keeps the wearable lightweight and thermally efficient, while the heavier model inference runs on server hardware.
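
To give a sense of how little data the wearable needs to send per frame, here is a minimal sketch that serializes one 63-value landmark vector. The little-endian float32 wire format and the `pack_frame` / `unpack_frame` helpers are assumptions for illustration; the page does not describe the actual protocol between the glasses and the server.

```python
import struct

NUM_VALUES = 63                    # 21 landmarks x 3 coordinates
FRAME_FORMAT = f"<{NUM_VALUES}f"   # assumed wire format: little-endian float32


def pack_frame(vector):
    """Serialize one 63-value landmark vector into bytes (wearable side)."""
    if len(vector) != NUM_VALUES:
        raise ValueError(f"expected {NUM_VALUES} values, got {len(vector)}")
    return struct.pack(FRAME_FORMAT, *vector)


def unpack_frame(payload):
    """Inverse of pack_frame (server side)."""
    return list(struct.unpack(FRAME_FORMAT, payload))


if __name__ == "__main__":
    frame = pack_frame([0.0] * NUM_VALUES)
    print(len(frame), "bytes per frame")
```

Under these assumptions a frame is 252 bytes, compared with 921,600 bytes for a single uncompressed 640×480 RGB image, which is why extracting landmarks on the wearable keeps the network load small.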

🤖 AI Model — GestureNet

| Property | Value |
|---|---|
| Architecture | Multi-Layer Perceptron (MLP) |
| Input | 63 normalized landmark values |
| Output | 20 classes (English sign vocabulary) |
| Training Samples | 40,000 (20k original + 20k mirrored) |
| Training Epochs | 100 |
| Confidence Threshold | 60% |
| Smoothing | Majority-vote window (n=2) |
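
The confidence threshold and majority-vote smoothing could be combined along the following lines. This is a sketch under assumptions, not the project's code: the `PredictionSmoother` class is hypothetical, and requiring a strict majority over a full window is one possible reading of "majority vote".

```python
from collections import Counter, deque


class PredictionSmoother:
    """Drop low-confidence predictions, then majority-vote over recent ones."""

    def __init__(self, window=2, threshold=0.60):
        self.threshold = threshold
        self.recent = deque(maxlen=window)

    def update(self, label, confidence):
        """Feed one per-frame prediction; return a stable label or None."""
        if confidence < self.threshold:
            return None  # below the confidence threshold: ignore this frame
        self.recent.append(label)
        if len(self.recent) < self.recent.maxlen:
            return None  # window not full yet
        winner, votes = Counter(self.recent).most_common(1)[0]
        # Require a strict majority so a tie never emits a label.
        return winner if votes > len(self.recent) / 2 else None


if __name__ == "__main__":
    smoother = PredictionSmoother()
    for label, conf in [("hello", 0.9), ("hello", 0.8), ("thanks", 0.4)]:
        print(label, conf, "->", smoother.update(label, conf))
```

With a window of 2, a strict majority means two consecutive accepted frames must agree before a word is emitted. In this sketch, rejected frames leave the window untouched; clearing it instead would be an equally reasonable design.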

🔬 Data Collection & Augmentation

Training data was collected using a webcam and MediaPipe for landmark extraction. Each of the 20 gesture classes was recorded with 1,000 samples. To make the model hand-agnostic (works with both left and right hands), the dataset was doubled via Mirror Augmentation: all X-coordinates were negated, mathematically mirroring every right-hand sample into a left-hand equivalent without any additional data collection.
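
Mirror augmentation can be sketched in a few lines. The example assumes each sample is a flat `[x0, y0, z0, x1, y1, z1, ...]` list that is already wrist-centred, so negating X reflects the hand about the wrist; the function names are illustrative only.

```python
def mirror_sample(vector):
    """Negate every x-coordinate in a flat [x0, y0, z0, x1, y1, z1, ...] vector."""
    if len(vector) % 3 != 0:
        raise ValueError("expected a flat list of (x, y, z) triples")
    return [-v if i % 3 == 0 else v for i, v in enumerate(vector)]


def augment_with_mirrors(samples):
    """Return the original (vector, label) pairs plus one mirrored copy of each."""
    mirrored = [(mirror_sample(vec), label) for vec, label in samples]
    return samples + mirrored


if __name__ == "__main__":
    dataset = [([0.2, 0.5, 0.1, -0.3, 0.4, 0.0], "hello")]
    print(augment_with_mirrors(dataset))
```

Applied to 20 classes × 1,000 samples, this turns 20,000 recorded samples into the 40,000 used for training.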

🧪 Preprocessing Pipeline

Raw landmarks from MediaPipe are normalized before training and inference:

  1. Translation: Wrist (landmark 0) is moved to the origin.
  2. Scale normalization: Divided by the wrist-to-middle-MCP distance, making the representation hand-size agnostic.
  3. Flatten: 21 × 3D points → 63-dimensional vector.

This ensures the model recognizes signs regardless of how far the hand is from the camera.
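
The three steps can be sketched as follows. The code assumes the input is a list of 21 `(x, y, z)` tuples in MediaPipe Hands order, where index 0 is the wrist and index 9 is the middle-finger MCP joint; `normalize_landmarks` is an illustrative name, not the project's function.

```python
import math

WRIST = 0        # MediaPipe Hands index for the wrist
MIDDLE_MCP = 9   # MediaPipe Hands index for the middle-finger MCP joint


def normalize_landmarks(landmarks):
    """Turn 21 (x, y, z) landmarks into a 63-value, wrist-centred, scale-free vector."""
    if len(landmarks) != 21:
        raise ValueError("expected 21 hand landmarks")

    wx, wy, wz = landmarks[WRIST]
    # 1. Translation: move the wrist to the origin.
    centred = [(x - wx, y - wy, z - wz) for x, y, z in landmarks]

    # 2. Scale normalization: divide by the wrist-to-middle-MCP distance.
    scale = math.dist((0.0, 0.0, 0.0), centred[MIDDLE_MCP])
    if scale == 0.0:
        raise ValueError("degenerate hand: wrist and middle MCP coincide")
    scaled = [(x / scale, y / scale, z / scale) for x, y, z in centred]

    # 3. Flatten: 21 x 3 -> 63 values.
    return [value for point in scaled for value in point]


if __name__ == "__main__":
    hand = [(0.1 * i, 0.05 * i, 0.01 * i) for i in range(21)]
    print(len(normalize_landmarks(hand)))
```

Shifting the same hand or shrinking it uniformly yields the same 63-value vector, up to floating-point error, which is the distance invariance described above.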

📊 Results

| Metric | Value |
|---|---|
| End-to-end latency | < 300 ms |
| Signs supported | 20 English gestures |
| Hand support | Both left & right |
| Operating environment | Any indoor lighting |
| Network requirement | Local Wi-Fi only |

What's Next?

Planned improvements and future development directions.

Vocabulary Expansion

Full English Sign Language Alphabet

Expanding the vocabulary from 20 signs to the full English Sign Language alphabet and common phrases, enabling richer real-world communication.

Mobile App

Smartphone-Based Recognition

Bringing the recognition system to Android devices using TensorFlow Lite, eliminating the need for any external hardware.

Continuous Signing

Sentence-Level Recognition

Moving from isolated word recognition to detecting continuous signing sequences with temporal modeling (LSTM / Transformer).

Form Factor

Integrated AR HUD

Developing an Augmented Reality Heads-Up Display to show translated text directly on the glasses lens for the wearer.

Acknowledgements

🏛️

[ University Name ]

For providing the resources, labs, and academic environment that made this project possible.

🤝

Open Source Community

MediaPipe (Google), FastAPI, PyTorch, OpenCV, and all the open-source libraries that power this system.

Engineered for accessibility. Redefining human communication. 💚

Global Innovation University · Faculty of Technology