Saj Sense Protocol (SSP) v1.0

Unified multimodal neural codec transport protocol. 12 modalities. Per-modality encryption with independent forward-secrecy ratchets. AI-native discrete token streams. Sub-100ms end-to-end latency.

Multiple provisional patent applications filed. Patent pending.


SSP Frame Wire Format (Simplified)

+-----------------------------------------------------------+
| Version | Flags | Sequence Number                         |
+-----------------------------------------------------------+
| Microsecond Timestamp                                     |
+-----------------------------------------------------------+
| Modality Bitmap | Slots | Priority | SID                  |
+-----------------------------------------------------------+
| Frame Integrity (HMAC, truncated)                         |
+-----------------------------------------------------------+

Compact frame header. 1-255 modality slots per frame. Each slot: independent codec, encryption key, and sync anchor.
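The simplified diagram names the header fields but not their widths. As a rough illustration of how such a header could be serialized and parsed, the sketch below assumes big-endian encoding and hypothetical widths (8-bit version and flags, 16-bit sequence number, 64-bit timestamp, 16-bit modality bitmap, 8-bit slot count and priority, 16-bit SID, 8-byte truncated MAC). None of these widths come from the specification, and `FrameHeader` is not the reference implementation's type.

```rust
/// Hypothetical fixed-size header. Field order follows the simplified
/// diagram; every field width here is an assumption for illustration.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FrameHeader {
    pub version: u8,
    pub flags: u8,
    pub sequence: u16,
    pub timestamp_us: u64,
    pub modality_bitmap: u16,
    pub slot_count: u8, // the text allows 1-255 slots, so 0 is rejected on parse
    pub priority: u8,
    pub sid: u16,
    pub mac: [u8; 8], // truncated MAC; 8 bytes is an assumed length
}

/// Total encoded size under the assumed widths.
pub const HEADER_LEN: usize = 26;

impl FrameHeader {
    /// Serializes the header in big-endian byte order.
    pub fn to_bytes(&self) -> [u8; HEADER_LEN] {
        let mut out = [0u8; HEADER_LEN];
        out[0] = self.version;
        out[1] = self.flags;
        out[2..4].copy_from_slice(&self.sequence.to_be_bytes());
        out[4..12].copy_from_slice(&self.timestamp_us.to_be_bytes());
        out[12..14].copy_from_slice(&self.modality_bitmap.to_be_bytes());
        out[14] = self.slot_count;
        out[15] = self.priority;
        out[16..18].copy_from_slice(&self.sid.to_be_bytes());
        out[18..26].copy_from_slice(&self.mac);
        out
    }

    /// Parses a header, returning None for short input or a zero slot count.
    pub fn from_bytes(buf: &[u8]) -> Option<Self> {
        if buf.len() < HEADER_LEN || buf[14] == 0 {
            return None;
        }
        let mut ts = [0u8; 8];
        ts.copy_from_slice(&buf[4..12]);
        let mut mac = [0u8; 8];
        mac.copy_from_slice(&buf[18..26]);
        Some(Self {
            version: buf[0],
            flags: buf[1],
            sequence: u16::from_be_bytes([buf[2], buf[3]]),
            timestamp_us: u64::from_be_bytes(ts),
            modality_bitmap: u16::from_be_bytes([buf[12], buf[13]]),
            slot_count: buf[14],
            priority: buf[15],
            sid: u16::from_be_bytes([buf[16], buf[17]]),
            mac,
        })
    }
}
```

A fixed-size header like this can be validated before any slot is touched, which is one plausible reason to keep the per-frame MAC in the header rather than in a trailer.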

Design Goals

G1

Sub-100ms end-to-end latency

Real-time multimodal transport with bounded latency guarantees across all modality types.

G2

Graceful degradation

Perceptual impact scoring determines which modalities degrade first under bandwidth pressure.

G3

Extensible modality slots

Register custom modality IDs (0x10+) without protocol changes. Future-proof by design.

G4

Per-modality encryption

Independent encryption keys per modality. Share audio without exposing biometric data.

G5

AI-native token stream

Discrete token payloads designed for direct consumption by transformer architectures.

G6

Bandwidth-adaptive

Dynamic bitrate allocation across modalities based on perceptual importance and available bandwidth.

G7

Cross-modal prediction

Modality slots declare prediction dependencies enabling cross-modal compression gains.

G8

E2E verification

Cryptographic integrity verification on every frame. Truncated MAC for wire efficiency.

G9

MPEG-I MIHS interop

Designed for compatibility with ISO/IEC 23090-31:2025 Multimodal Information Handling System.

Gap Analysis

SSP provides capabilities absent from existing transport protocols. No existing standard supports 7+ modality framing, per-modality E2E encryption, or AI-native token output.

Capability                    | RTP | WebRTC | MIHS | MoQT | SSP
7+ modality framing           |     |        |      |      | Yes
Per-modality E2E encryption   |     |        |      |      | Yes
AI-native token output        |     |        |      |      | Yes
Cross-modal prediction        |     |        |      |      | Yes
Perceptual bitrate allocation |     |        |      |      | Yes
Latent-space watermarking     |     |        |      |      | Yes
Selective disclosure          |     |        |      |      | Yes
Audio/video streaming         | Yes | Yes    | Yes  | Yes  | Yes

Per-Modality Encryption

Each modality slot carries an independent encryption_key_id referencing a per-modality key established during SSP_KEY_EXCHANGE. Each modality MAY use an independent forward-secrecy ratchet chain.

Selective Disclosure

Audio: Key 1
Video: Key 2
Biometric: Key 3

Recipient A holds Key 1 only:
Audio: decrypted
Video: encrypted, no access
Biometric: encrypted, no access
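The Recipient A scenario can be sketched as a lookup over the key IDs a recipient holds. This is a minimal illustration, not the reference implementation: `Slot`, `Access`, and the 16-bit key ID width are assumptions, and no decryption is performed. Key ID 0 is treated as unencrypted, following the slot-header convention described in this document.

```rust
use std::collections::HashSet;

/// What a recipient can do with one slot (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Access {
    Plaintext,   // key ID 0: slot is not encrypted
    Decryptable, // recipient holds the slot's key
    Opaque,      // encrypted, no access
}

/// Hypothetical minimal view of a slot: just the fields needed here.
pub struct Slot {
    pub modality_id: u8,
    pub encryption_key_id: u16, // width is an assumption
}

/// Classifies one slot against the set of key IDs the recipient holds.
pub fn access_for(slot: &Slot, held_keys: &HashSet<u16>) -> Access {
    if slot.encryption_key_id == 0 {
        Access::Plaintext
    } else if held_keys.contains(&slot.encryption_key_id) {
        Access::Decryptable
    } else {
        Access::Opaque
    }
}

/// Returns the modality IDs the recipient can read, in slot order.
pub fn readable_modalities(slots: &[Slot], held_keys: &HashSet<u16>) -> Vec<u8> {
    slots
        .iter()
        .filter(|s| access_for(s, held_keys) != Access::Opaque)
        .map(|s| s.modality_id)
        .collect()
}
```

With audio under key 1, video under key 2, and biometric under key 3, a recipient holding only key 1 gets back the audio modality ID alone. The other two slots stay opaque.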

Encryption Architecture

Cipher: AEAD authenticated encryption
Key Derivation: Per-modality ratchet chain
Key ID Field: Slot header key reference (0 = unencrypted)
Forward Secrecy: Independent ratchet per modality
Frame Integrity: Truncated MAC per frame
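For the truncated per-frame MAC, a receiver would recompute the full tag, truncate it the same way as the sender, and compare the two. The sketch below covers only truncation and comparison. It assumes the full tag (for example, an HMAC output) is computed elsewhere by a cryptographic library, and the 8-byte truncation length is an assumption, since the text does not state one.

```rust
/// Assumed truncation length in bytes; not stated in the source.
pub const TRUNCATED_MAC_LEN: usize = 8;

/// Keeps the leading bytes of a full MAC tag computed elsewhere.
/// Returns None if the full tag is shorter than the truncation length.
pub fn truncate_tag(full_tag: &[u8]) -> Option<[u8; TRUNCATED_MAC_LEN]> {
    if full_tag.len() < TRUNCATED_MAC_LEN {
        return None;
    }
    let mut out = [0u8; TRUNCATED_MAC_LEN];
    out.copy_from_slice(&full_tag[..TRUNCATED_MAC_LEN]);
    Some(out)
}

/// Compares the expected and received tags without exiting early on the
/// first mismatching byte. A production implementation would rely on a
/// vetted constant-time comparison from a cryptographic library instead.
pub fn tags_match(expected: &[u8; TRUNCATED_MAC_LEN], received: &[u8]) -> bool {
    if received.len() != TRUNCATED_MAC_LEN {
        return false;
    }
    let mut diff = 0u8;
    for (a, b) in expected.iter().zip(received.iter()) {
        diff |= a ^ b;
    }
    diff == 0
}
```

Truncation trades forgery resistance for wire efficiency: each byte removed makes a random forgery 256 times more likely to pass, so the chosen length is a security parameter and not merely a formatting detail.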

Modality Registry

12 registered modality IDs. Custom modalities from 0x10. Each modality slot carries independent codec, encryption, and synchronization configuration.

0x01  AudioSpeech        Human vocal content
0x02  AudioAmbient       Environmental audio
0x03  VideoFace          Facial video stream
0x04  VideoScene         Scene/environment video
0x05  HapticVibro        Vibrotactile feedback
0x06  HapticKinesthetic  Force/resistance feedback
0x07  Spatial3D          3D spatial/positional data
0x08  Biometric          Physiological signals
0x09  MotionBody         Full-body motion capture
0x0A  MotionHand         Hand/finger tracking
0x0B  Thermal            Infrared/thermal imaging
0x0C  Emotion            Affective state inference

0x10+ reserved for custom modality registration
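One way to model the registry in code is an enum with a catch-all variant for the custom range. The sketch below mirrors the twelve IDs listed above. The `Modality` type and its method names are illustrative assumptions, not the reference implementation's API. IDs 0x00 and 0x0D-0x0F do not appear in the listing, so they are rejected here.

```rust
/// Registered modalities plus the custom range (0x10 and above).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Modality {
    AudioSpeech,
    AudioAmbient,
    VideoFace,
    VideoScene,
    HapticVibro,
    HapticKinesthetic,
    Spatial3D,
    Biometric,
    MotionBody,
    MotionHand,
    Thermal,
    Emotion,
    Custom(u8),
}

impl Modality {
    /// Maps a wire ID to a modality; unlisted IDs below 0x10 yield None.
    pub fn from_id(id: u8) -> Option<Self> {
        use Modality::*;
        Some(match id {
            0x01 => AudioSpeech,
            0x02 => AudioAmbient,
            0x03 => VideoFace,
            0x04 => VideoScene,
            0x05 => HapticVibro,
            0x06 => HapticKinesthetic,
            0x07 => Spatial3D,
            0x08 => Biometric,
            0x09 => MotionBody,
            0x0A => MotionHand,
            0x0B => Thermal,
            0x0C => Emotion,
            0x10..=0xFF => Custom(id),
            _ => return None,
        })
    }

    /// Returns the wire ID for this modality.
    pub fn id(self) -> u8 {
        use Modality::*;
        match self {
            AudioSpeech => 0x01,
            AudioAmbient => 0x02,
            VideoFace => 0x03,
            VideoScene => 0x04,
            HapticVibro => 0x05,
            HapticKinesthetic => 0x06,
            Spatial3D => 0x07,
            Biometric => 0x08,
            MotionBody => 0x09,
            MotionHand => 0x0A,
            Thermal => 0x0B,
            Emotion => 0x0C,
            Custom(id) => id,
        }
    }
}
```

Because custom IDs pass through as `Custom(id)`, a parser built this way can carry modalities it does not understand without a protocol change.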

Token Stream

SSP frames carry AI-native discrete token payloads. Each modality produces tokens at its own rate. Cross-modal synchronization anchors align streams in time.

[Figure: parallel token streams for AudioSpeech, VideoFace, Biometric, and MotionBody. Legend: discrete token, sync anchor. Different rates per modality. Shared temporal alignment.]
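To illustrate how streams running at different rates can share one timeline, the sketch below merges per-modality token lists into a single time-ordered sequence. It assumes each token carries a microsecond timestamp acting as its sync anchor. The `Token` type, its fields, and the tie-breaking rule are hypothetical, since the actual anchor semantics are not described here.

```rust
/// Hypothetical token: one discrete code plus the time it applies to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Token {
    pub modality_id: u8,
    pub anchor_us: u64, // assumed: microsecond timestamp on a shared clock
    pub value: u32,
}

/// Merges per-modality streams into one sequence ordered by timestamp.
/// Tokens with equal timestamps are ordered by modality ID (an arbitrary
/// but deterministic tie-break chosen for this sketch).
pub fn interleave(streams: &[Vec<Token>]) -> Vec<Token> {
    let mut all: Vec<Token> = streams.iter().flatten().copied().collect();
    all.sort_by_key(|t| (t.anchor_us, t.modality_id));
    all
}
```

For example, speech tokens every 20 ms and face tokens roughly every 33 ms interleave as speech, face, speech, face, speech over the first 40 ms, so a downstream model sees one sequence in which order reflects time.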

Intellectual Property

Comprehensive patent portfolio covering the core SSP innovations. Multiple provisional patent applications filed with international protection pathways in progress.

Multimodal Transport Patent Pending

Unified Multimodal Transport with Dynamic Modality Registration

Token Stream Patent Pending

AI-Native Discrete Token Stream for Heterogeneous Sensory Modalities

Per-Modality Encryption Patent Pending

Per-Modality Cryptographic Key Management with Independent Forward-Secrecy Ratchets

Frame Format Patent Pending

Multimodal Packet Frame Format with Cross-Modal Prediction Dependencies

Neural Codec Tokenizer Patent Pending

Domain-Adaptive Neural Codec Tokenizer for Heterogeneous Sensory Modalities

Codec Watermarking Patent Pending

Latent-Space Codec Watermarking for Neural Audio/Video Provenance

Enterprise & Defense

SSP provides the multimodal sensing infrastructure layer for organizations that cannot trust third-party processing of biometric, spatial, or classified audio/video streams.

Selective Disclosure

Per-modality encryption with independent forward-secrecy ratchets enables granular access control. Share audio transcription without video access. Share motion tracking without biometric data. Isolate thermal/spatial from emotional analysis. Each modality's key chain ratchets independently.

EU AI Act Compliance

SSP's patent-pending latent-space watermarking embeds imperceptible provenance markers in the latent space of neural codec tokens. The markers are designed to survive re-encoding, transcoding, and adversarial extraction attempts. This addresses Article 50 watermarking requirements for AI-generated audio/video content.

Target Compliance

  • EU AI Act (Article 50 watermarking)
  • FIPS 140-3 validated cipher suites
  • ISO/IEC 23090-31:2025 (MPEG-I MIHS)
  • RFC 2119 requirement language

Implementation

Reference implementation in Rust. Full frame serialization, wire format roundtrip, HMAC verification, modality slot encoding, and token stream parsing.

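main.rs (Rust)

use saj_sense::{ModalitySlot, SspFrame};

// Box<dyn Error> assumes the crate's error type implements std::error::Error.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Placeholder payload (assumes the payload field is a Vec<u8>);
    // in practice this comes from the audio codec.
    let audio_data = vec![0u8; 160];

    // Build a frame with audio + biometric modalities
    let audio = ModalitySlot {
        modality_id: 0x01, // AudioSpeech
        codec_id: 0x03,    // Medium quality
        encryption_key_id: 1,
        payload: audio_data,
        ..Default::default()
    };

    let bio = ModalitySlot {
        modality_id: 0x08,    // Biometric
        encryption_key_id: 2, // Separate key
        ..Default::default()
    };

    // Selective disclosure: key 1 != key 2
    let frame = SspFrame::new(vec![audio, bio]);
    let wire = frame.to_bytes();

    // Roundtrip verification: parse the bytes back and check that both
    // slots survived and the whole buffer was consumed
    let (parsed, n) = SspFrame::from_bytes(&wire)?;
    assert_eq!(parsed.slots.len(), 2);
    assert_eq!(n, wire.len());
    Ok(())
}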

Install

cargo add saj-sense

476+ tests passing
0 failures
6 patents pending
12 modalities

Modality Slot Structure

Each SSP frame carries 1-255 modality slots. Each slot has a compact fixed header followed by variable payload bytes.

Modality ID

Registered modality identifier with extensible custom range

Codec + Payload

Independent codec selection per modality with variable-length payload

Sync + QoS

Microsecond sync anchor, QoS level, and cross-modal prediction dependencies

Encryption Key ID

Per-modality key reference for independent encryption and selective disclosure
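Since each slot can declare cross-modal prediction dependencies, a receiver needs some rule for when those dependencies are satisfiable. The sketch below assumes one possible rule: a slot may only depend on modalities that appear earlier in the same frame, so slots can be decoded in wire order. The `SlotHeader` fields, their widths, and this ordering rule are illustrative assumptions and are not taken from the specification.

```rust
/// Hypothetical slot header with the fields named in this section.
#[derive(Debug, Clone, Default)]
pub struct SlotHeader {
    pub modality_id: u8,
    pub codec_id: u8,
    pub encryption_key_id: u16,
    pub qos: u8,
    pub sync_anchor_us: u64,
    /// Modality IDs this slot's codec predicts from.
    pub depends_on: Vec<u8>,
}

/// Reported when a slot depends on a modality not seen earlier in the frame.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MissingDependency {
    pub slot: u8,
    pub needs: u8,
}

/// Checks that every dependency refers to a modality carried by an
/// earlier slot in the same frame.
pub fn check_dependencies(slots: &[SlotHeader]) -> Result<(), MissingDependency> {
    let mut seen: Vec<u8> = Vec::new();
    for slot in slots {
        for &dep in &slot.depends_on {
            if !seen.contains(&dep) {
                return Err(MissingDependency { slot: slot.modality_id, needs: dep });
            }
        }
        seen.push(slot.modality_id);
    }
    Ok(())
}
```

Requiring dependencies to point backwards also rules out cycles by construction, which keeps the check linear in the number of slots and dependencies.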

Use Cases

SSP addresses multimodal transport requirements across industries where existing protocols fall short on encryption granularity, modality coverage, or AI-native processing.

Enterprise Communications

Per-modality encryption for video calls enables selective disclosure for compliance. Share audio transcription with legal review without exposing video feeds. Grant AI assistants access to speech tokens while biometric data stays encrypted.

Selective Disclosure, Compliance, AI-Native

Defense & Intelligence

Sovereign multimodal processing with air-gapped deployment compatibility. On-device inference keeps raw streams off third-party infrastructure, reducing data exfiltration risk. Per-modality key management enables compartmentalized information handling across classification levels.

Air-Gapped, Sovereign, Compartmentalized

Healthcare

Patient data segregation by modality enables HIPAA-aligned selective access. Share physiological telemetry with monitoring systems while keeping video and audio encrypted. Because the biometric modality has its own key, granting access to another stream does not expose physiological data.

HIPAA-Aligned, Data Segregation, Telemetry

Autonomous Systems

Multi-sensor fusion with crypto-separated modalities. SSP's unified framing carries LiDAR, camera, radar, IMU, and thermal data in a single synchronized stream with cross-modal prediction for compression. Bandwidth-adaptive under satellite link constraints.

Sensor Fusion, Cross-Modal, Bandwidth-Adaptive

Content Authentication

Patent-pending latent-space watermarking embeds imperceptible provenance markers in neural codec tokens. The markers are designed to survive re-encoding, transcoding, and adversarial extraction. AI-generated audio and video content is detected at the codec level, not by a post-processing filter.

Watermarking, EU AI Act, Provenance

Why SSP

Existing protocols were designed for audio and video. SSP was designed from scratch for the multimodal, AI-native era.

One protocol for all modalities

Audio, video, haptic, spatial, biometric, motion, thermal, emotion -- all in a single synchronized frame. Not separate protocols stitched together with middleware. Not RTP for audio plus a custom channel for everything else.

AudioSpeech + VideoFace + Biometric
Spatial3D + MotionBody + Thermal
= 1 SSP frame, 1 wire format, 1 protocol

Encryption built into the frame

Per-modality encryption is not a transport-layer wrapper. It is a first-class frame field. Each modality slot declares its own encryption key, enabling selective disclosure without protocol extensions, middleware layers, or application-level workarounds.

Key A --> Audio decrypted
Key B --> Video encrypted
Not bolted on top. Built into every frame.

Bandwidth-adaptive without quality cliffs

SSP uses perceptual impact scoring to determine degradation order. When bandwidth drops, lower-priority modalities gracefully reduce quality while critical streams maintain fidelity. No binary on/off. No codec renegotiation. Smooth transitions.

Full --> Standard --> Low --> Semantic
Graceful cascade, not a cliff edge
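The cascade can be sketched as a greedy allocator that steps the least important streams down the Full, Standard, Low, Semantic ladder until the total fits the available bandwidth. This is one plausible policy under stated assumptions, not SSP's actual algorithm: the `Stream` type, the numeric impact score, and the per-tier bitrates are all hypothetical.

```rust
/// Quality tiers in degradation order, as named in the cascade.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Full,
    Standard,
    Low,
    Semantic,
}

const TIERS: [Tier; 4] = [Tier::Full, Tier::Standard, Tier::Low, Tier::Semantic];

/// Hypothetical per-modality input to the allocator.
pub struct Stream {
    pub modality_id: u8,
    pub impact: u32,    // assumed perceptual impact score; higher = more important
    pub kbps: [u32; 4], // assumed bitrate at each tier, indexed like TIERS
}

/// Degrades the lowest-impact streams first, one tier at a time, until the
/// total bitrate fits the budget or every stream is at the Semantic tier.
/// Returns (modality_id, tier) pairs in the input order.
pub fn allocate(streams: &[Stream], budget_kbps: u32) -> Vec<(u8, Tier)> {
    let mut level = vec![0usize; streams.len()];
    let mut order: Vec<usize> = (0..streams.len()).collect();
    order.sort_by_key(|&i| streams[i].impact);

    // Total bitrate for a given assignment of tier levels.
    let total = |level: &[usize]| -> u32 {
        streams.iter().zip(level).map(|(s, &l)| s.kbps[l]).sum()
    };

    for &i in &order {
        while level[i] < TIERS.len() - 1 && total(&level) > budget_kbps {
            level[i] += 1;
        }
    }
    streams
        .iter()
        .zip(&level)
        .map(|(s, &l)| (s.modality_id, TIERS[l]))
        .collect()
}
```

With this policy a high-impact stream such as speech stays at Full until every lower-impact stream has been stepped down, which matches the stated goal of critical streams maintaining fidelity. A real allocator might instead interleave steps across streams to avoid pushing any one modality to Semantic too early.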

Get Started

Start building with the SSP reference implementation. Rust-first, with Python bindings available.

Rust Cargo
# Add to your Cargo.toml
cargo add saj-sense
 
# Or with specific features
cargo add saj-sense --features encryption,watermark
Python pip
# Install from PyPI
pip install sajsense
 
# Verify installation
python -c "import sajsense; print(sajsense.__version__)"

Source Code

Full reference implementation on GitHub

Documentation

API reference and integration guides

Enterprise Evaluation

Request access for enterprise deployments

Stay Updated on SSP

The Saj Sense Protocol specification is under active development. Join the waitlist to receive updates on new versions, reference implementations, and early access to the SDK.
