
@pexip/media-processor

A package for media data processing using Web APIs.

Install

npm install @pexip/media-processor

Usage

Use the Analyzer to get MediaStream data

import {
    createStreamSourceGraphNode,
    createAnalyzerGraphNode,
    createAudioGraph,
} from '@pexip/media-processor';

const stream = await navigator.mediaDevices.getUserMedia({audio: true});

const fftSize = 64;
// Set up an audio graph with `source` -> `analyzer`
const source = createStreamSourceGraphNode(stream);
const analyzer = createAnalyzerGraphNode({fftSize});
const audioGraph = createAudioGraph([[source, analyzer]]);

// Grab the current time domain data in floating point representation
const buffer = new Float32Array(fftSize);
analyzer.node?.getFloatTimeDomainData(buffer);
// Do some work with the buffer
buffer.forEach(...);

// Get the current volume, [0, 1]
const volume = analyzer.getAverageVolume(buffer);

// Release the resources when you are done with the analyzer
await audioGraph.release();
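The value returned by getAverageVolume is, in essence, the Root Mean Square of the time-domain samples (the package also exports rms and processAverageVolume). A minimal sketch of that calculation in plain TypeScript, assuming samples in [-1, 1] — not the library's actual implementation:

```typescript
// Root Mean Square of time-domain samples in [-1, 1].
// A rough sketch of an average-volume calculation; `averageVolume`
// is a hypothetical name, not an export of @pexip/media-processor.
const averageVolume = (samples: Float32Array): number => {
    const sumOfSquares = samples.reduce((acc, s) => acc + s * s, 0);
    return Math.sqrt(sumOfSquares / samples.length);
};

// A constant signal at 0.5 has an RMS of 0.5
const constant = new Float32Array(64).fill(0.5);
console.log(averageVolume(constant)); // 0.5
```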

Use the AudioGraph to control the audio gain for mute/unmute

import {
    createStreamSourceGraphNode,
    createGainGraphNode,
    createStreamDestinationGraphNode,
    createAudioGraph,
} from '@pexip/media-processor';

const stream = await navigator.mediaDevices.getUserMedia({
    audio: true,
    video: true,
});

const mute = !stream.getAudioTracks()[0]?.enabled;

// Set up an audio graph with `source` -> `gain` -> `destination`
const source = createStreamSourceGraphNode(stream);
const gain = createGainGraphNode(mute);
const destination = createStreamDestinationGraphNode();

const audioGraph = createAudioGraph([[source, gain, destination]]);

// Use the output MediaStream for the altered AudioTrack
const alteredStream = new MediaStream([
    ...stream.getVideoTracks(),
    ...destination.stream.getAudioTracks(),
]);

// Mute the audio
if (gain.node) {
    gain.node.mute = true;
}

// Check if the audio is muted
gain.node?.mute; // returns `true`, since we have just set the gain to 0

// Release the resources when you are done with the graph
await audioGraph.release();
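Conceptually, a gain-based mute boils down to driving the gain value to 0 (muted) or 1 (unmuted), which is why the video track above passes through untouched. A tiny model of that mapping — `GainLike` and `setMute` are illustrative names, not part of the library:

```typescript
// Illustrative only: a muted gain node has gain 0, an unmuted one
// has gain 1. `GainLike` and `setMute` are hypothetical names,
// not exports of @pexip/media-processor.
interface GainLike {
    gain: number;
}

const setMute = (node: GainLike, mute: boolean): GainLike => ({
    ...node,
    gain: mute ? 0 : 1,
});

console.log(setMute({gain: 1}, true)); // { gain: 0 }
```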

Use noise suppression

import {
    createStreamSourceGraphNode,
    createStreamDestinationGraphNode,
    createDenoiseWorkletGraphNode,
    createAudioGraph,
} from '@pexip/media-processor';

const stream = await navigator.mediaDevices.getUserMedia({
    audio: true,
    video: true,
});

// Fetch the denoise WebAssembly module
const response = await fetch(
    new URL('@pexip/denoise/denoise_bg.wasm', import.meta.url).href,
);
const wasmBuffer = await response.arrayBuffer();

// Set up an audio graph with `source` -> `destination`
const source = createStreamSourceGraphNode(stream);
const destination = createStreamDestinationGraphNode();

const audioGraph = createAudioGraph([[source, destination]]);
// Add the worklet module
await audioGraph.addWorklet(
    new URL(
        '@pexip/media-processor/dist/worklets/denoise.worklet',
        import.meta.url,
    ).href,
);
const denoise = createDenoiseWorkletGraphNode(wasmBuffer);
// Route the source through the denoise node
audioGraph.connect([source, denoise, destination]);
audioGraph.disconnect([source, destination]);

// Release the resources when you are done with the graph
await audioGraph.release();
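Every WebAssembly binary starts with the 4-byte magic number 0x00 0x61 0x73 0x6d ("\0asm"), so the fetched buffer can be sanity-checked before handing it to createDenoiseWorkletGraphNode — useful when a misconfigured server answers with an HTML 404 page instead of the wasm file. A small hedged helper (`isWasm` is a hypothetical name, not a library export):

```typescript
// Check the 4-byte WebAssembly magic number "\0asm" (0x00 0x61 0x73 0x6d).
// `isWasm` is a hypothetical helper, not part of @pexip/media-processor.
const isWasm = (buffer: ArrayBuffer): boolean => {
    if (buffer.byteLength < 4) {
        return false;
    }
    const [b0, b1, b2, b3] = new Uint8Array(buffer, 0, 4);
    return b0 === 0x00 && b1 === 0x61 && b2 === 0x73 && b3 === 0x6d;
};

// Usage: bail out before building the worklet node, e.g.
// if (!isWasm(wasmBuffer)) throw new Error('Not a WebAssembly binary');
```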

Use background blur

import {
    createSegmenter,
    createCanvasTransform,
    createVideoProcessor,
    createVideoTrackProcessor,
    createVideoTrackProcessorWithFallback,
} from '@pexip/media-processor';

// Grab the user's camera stream
const stream = await navigator.mediaDevices.getUserMedia({video: true});

// Set the base path for the `@mediapipe/tasks-vision` assets
// It is passed directly to
// [FilesetResolver.forVisionTasks()](https://ai.google.dev/edge/api/mediapipe/js/tasks-vision.filesetresolver#filesetresolverforvisiontasks)
const tasksVisionBasePath =
    'A base path to specify the directory the Wasm files should be loaded from';

const modelAsset = {
    /**
     * Path to the mediapipe selfie segmentation model asset
     */
    path: 'A path to selfie segmentation model',
    modelName: 'selfie' as const,
};

const segmenter = createSegmenter(tasksVisionBasePath, {modelAsset});
// Create a processing transformer and set the effects to `blur`
const transformer = createCanvasTransform(segmenter, {effects: 'blur'});
const videoProcessor = createVideoProcessor([transformer]);

// Start the processor
await videoProcessor.open();

// Pass the raw MediaStream to apply the effects
// Then use the output stream for whatever purpose
const processedStream = await videoProcessor.process(stream);
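Both createVideoTrackProcessor and createVideoTrackProcessorWithFallback are exported because the MediaStreamTrackProcessor API is not available in every browser. Whether the native path is usable can be feature-detected; a minimal sketch (`hasTrackProcessor` is a hypothetical helper, not a library export):

```typescript
// Feature-detect the MediaStreamTrackProcessor API. The `scope`
// parameter (normally `globalThis`) keeps the check testable.
// `hasTrackProcessor` is a hypothetical helper, not a library export.
const hasTrackProcessor = (scope: object): boolean =>
    'MediaStreamTrackProcessor' in scope;

// In a browser you might choose between the two processors like this:
// const createProcessor = hasTrackProcessor(globalThis)
//     ? createVideoTrackProcessor
//     : createVideoTrackProcessorWithFallback;
```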

Profiling Web Audio

You can do this with Chrome's built-in tracing tools (chrome://tracing).

How AudioWorkletNode and the AudioWorkletProcessor work together

┌─────────────────────────┐             ┌──────────────────────────┐
│                         │             │                          │
│    Main Global Scope    │             │ AudioWorkletGlobalScope  │
│                         │             │                          │
│  ┌───────────────────┐  │             │  ┌────────────────────┐  │
│  │                   │  │ MessagePort │  │                    │  │
│  │   AudioWorklet    │◄─┼─────────────┼─►│   AudioWorklet     │  │
│  │   Node            │  │             │  │   Processor        │  │
│  │                   │  │             │  │                    │  │
│  └───────────────────┘  │             │  └────────────────────┘  │
│                         │             │                          │
└─────────────────────────┘             └──────────────────────────┘
        Main Thread                        WebAudio Render Thread

Constraints when using the AudioWorklet

  • Each BaseAudioContext possesses exactly one AudioWorklet
  • Audio is processed in fixed blocks of 128 sample-frames (the render quantum)
  • No fetch API in the AudioWorkletGlobalScope
  • No TextEncoder/Decoder APIs in the AudioWorkletGlobalScope
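The 128 sample-frame constraint means an AudioWorkletProcessor's process() callback always receives fixed-size blocks. The per-block work can be sketched as a plain function; in a real worklet this loop would live inside `process(inputs, outputs, parameters)` and the names below are illustrative:

```typescript
const RENDER_QUANTUM = 128;

// Apply a gain to one channel of one render quantum. A sketch of the
// inner loop of an AudioWorkletProcessor's process() callback,
// extracted as a plain function for illustration.
const processBlock = (
    input: Float32Array,
    output: Float32Array,
    gain: number,
): void => {
    for (let i = 0; i < RENDER_QUANTUM; i++) {
        output[i] = input[i] * gain;
    }
};

const input = new Float32Array(RENDER_QUANTUM).fill(0.25);
const output = new Float32Array(RENDER_QUANTUM);
processBlock(input, output, 0.5);
// every sample of `output` is now 0.125
```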

References

A library for media analysis using Web APIs.

Enumerations

| Enumeration | Description |
| --- | --- |
| AbortReason | - |

Interfaces

| Interface | Description |
| --- | --- |
| Segmentation | - |
| VideoFrameLike | - |
| ImageRecord | - |
| Rect | - |
| ProcessEventOpen | Message event to initialize the Processor |
| ProcessEventProcess | Message event to process the video frame |
| ProcessEventUpdate | Message event to update processor options |
| ProcessEventClose | Message event to stop the processing |
| ProcessEventDestroy | Message event to release all the resources |
| ProcessEventDebug | Message event for debugging |
| ProcessWorkerEventProcessed | - |
| ProcessWorkerEventOpened | - |
| ProcessWorkerEventUpdated | - |
| ProcessWorkerEvent | - |
| ProcessWorkerEventDebug | - |
| ProcessorOptions | - |
| ProcessorProcessOptions | - |
| ProcessWorkerDebugMessage | - |
| ProcessorEvent | - |
| RendererOptions | - |
| Renderer | - |
| RenderEventHandlers | - |
| SegmentationModelAsset | Segmentation model asset that can be used to load a segmentation model. |
| ImageSegmenterOptions | - |
| SegmenterOptions | - |
| SegmentationSmoothingConfig | - |
| MPMask | Wrapper for a mask produced by a Segmentation Task. |
| Benchmark | - |
| Point | Interface for a Point consisting of coordinates x and y |
| Size | - |
| Frame | - |
| AudioStats | Data structure for the audio statistics gathered while processing |
| StatsOptions | Options for calculating the audio stats |
| SubscribableOptions | - |
| WorkletMessagePortOptions | - |
| AnalyzerSubscribableOptions | - |
| Denoise | A wrapper for the Denoise wasm module |
| Gain | - |
| Analyzer | - |
| AudioNodeProps | - |
| AudioNodeInit | - |
| WorkletModule | - |
| AudioGraphOptions | - |
| AudioGraph | - |
| Clock | Clock interface to get the current time with a `now` method |
| ThrottleOptions | Limit the rate of flow in terms of milliseconds, using the provided Clock |
| Runner | - |
| Transform | - |
| AsyncAssets | - |
| Process | - |
| BackgroundImage | - |
| Segmenter | - |
| Detector | - |
| SegmentationParams | - |
| VideoProcessor | - |

Type Aliases

| Type Alias | Description |
| --- | --- |
| ProcessStatus | - |
| RenderBackend | - |
| RenderEffects | - |
| RenderingEvents | - |
| ProcessInputType | - |
| ImageType | - |
| ImageMapIterable | - |
| ProcessEvents | Message events to be sent to the Processor |
| ProcessWorkerEvents | Message events returned from the processor |
| ProcessorEvents | - |
| ProcessorWorkerEvents | - |
| SegmentationModel | - |
| Delegate | - |
| ExtractMessageEventType | - |
| IncludeMessageEventDataType | - |
| TupleOf | From https://github.com/Microsoft/TypeScript/issues/26223#issuecomment-674500430 |
| Canvas | - |
| Unsubscribe | Unsubscribe the subscription |
| AudioBufferFloats | Same as AudioBuffer, or the return from AnalyserNode.getFloatFrequencyData() |
| AudioBufferBytes | Same as the return from AnalyserNode.getByteFrequencyData() |
| AudioSamples | Audio samples from each channel, either in float or byte form |
| NodeConnectionAction | - |
| BaseAudioNode | - |
| AudioNodeParam | - |
| Node | - |
| Nodes | - |
| ConnectParamBaseType | - |
| ConnectInitParamBaseType | - |
| ConnectParamType | - |
| ConnectInitParamType | - |
| ConnectParamBase | - |
| AudioNodeConnectParam | - |
| AudioNodeInitConnectParam | - |
| AudioNodeInitConnection | - |
| AudioNodeInitConnections | - |
| MediaStreamAudioSourceNodeInit | - |
| MediaElementAudioSourceNodeInit | - |
| AnalyzerNodeInit | - |
| DenoiseWorkletNodeInit | - |
| GainNodeInit | - |
| MediaStreamAudioDestinationNodeInit | - |
| AudioDestinationNodeInit | - |
| DelayNodeInit | - |
| ChannelSplitterNodeInit | - |
| UniversalAudioContextState | Adds the missing type definition to work with AudioContextState in Safari. See https://developer.mozilla.org/en-US/docs/Web/API/BaseAudioContext/state#resuming_interrupted_play_states_in_ios_safari |
| CanvasContext | - |
| IsVoice | - |
| AsyncCallback | - |
| Callback | - |
| RunnerCreator | - |
| WasmPaths | - |
| Color | - |
| InputFrame | - |
| SegmentationTransform | - |
| ProcessVideoTrack | - |

Variables

| Variable | Description |
| --- | --- |
| PROCESSING_WIDTH | - |
| PROCESSING_HEIGHT | - |
| FOREGROUND_THRESHOLD | - |
| BACKGROUND_BLUR_AMOUNT | - |
| EDGE_BLUR_AMOUNT | - |
| MASK_COMBINE_RATIO | - |
| PROCESS_STATUS | Process status |
| RENDER_EFFECT | - |
| RENDER_BACKEND | - |
| RENDERING_EVENTS | - |
| SEGMENTATION_MODELS | - |
| isSegmentationModel | - |
| isRenderingEvents | - |
| toRenderingEvents | - |
| getErrorMessage | - |
| createObjectUpdater | - |
| createRemoteImageBitmap | - |
| clamping | - |
| calculateMaxBlurPass | - |
| resize | Calculate the coordinates and size of the source and destination to fit the target aspect ratio. |
| createLazyProps | - |
| handleWebGLContextLoss | - |
| urls | - |
| SILENT_THRESHOLD | Default silence threshold: at least one LSB of 16-bit data (compared on absolute value). |
| MONO_THRESHOLD | Default mono detection threshold: data must be identical within one LSB of 16-bit to be identified as mono. |
| LOW_VOLUME_THRESHOLD | Default low volume detection threshold |
| CLIP_THRESHOLD | Default clipping detection threshold |
| VOICE_PROBABILITY_THRESHOLD | Default voice probability threshold |
| CLIP_COUNT_THRESHOLD | Default clipping count threshold: number of consecutive CLIP_THRESHOLD-level samples that indicate clipping. |
| FRAME_RATE | - |
| PLAY_VIDEO_TIMEOUT | - |

Functions

| Function | Description |
| --- | --- |
| createAudioContext | Create an AudioContext using the constructor or the factory function, depending on browser support |
| resumeAudioOnInterruption | Resume the stream whenever it is interrupted |
| resumeAudioOnUnmute | Resume the AudioContext whenever the source track is unmuted |
| subscribeWorkletNode | Subscribe to MessagePort messages from an AudioWorkletNode |
| subscribeTimeoutAnalyzerNode | Subscribe to a timeout loop to get the data from the Analyzer |
| createStreamSourceGraphNode | Create a MediaStreamAudioSourceNodeInit |
| createMediaElementSourceGraphNode | Create a MediaElementAudioSourceNodeInit |
| createAnalyzerSubscribableGraphNode | Create an analyzer node with push-based subscription |
| createDenoiseWorkletGraphNode | Create a noise suppression node |
| createGainGraphNode | Create a GainNodeInit |
| createAnalyzerGraphNode | Create an AnalyzerNodeInit |
| createStreamDestinationGraphNode | Create a MediaStreamAudioDestinationNodeInit |
| createAudioDestinationGraphNode | Create an AudioDestinationNode |
| createDelayGraphNode | Create a DelayNode |
| createChannelSplitterGraphNode | Create a ChannelSplitterNode |
| createChannelMergerGraphNode | Create a ChannelMergerNode |
| createAudioGraph | Accepts AudioNodeInitConnections to build the audio graph within a single audio context |
| createAudioGraphProxy | - |
| createBenchmark | - |
| sum | Sum an array of numbers |
| avg | Average an array of numbers |
| pow | The pow function from Math in functional form: number -> number -> number |
| rms | Calculate the Root Mean Square of the provided numbers |
| round | Round the floating point number away from zero, which differs from Math.round |
| createAudioStats | AudioStats builder |
| fromByteToFloat | Convert a byte to a float, according to the Web Audio spec |
| fromFloatToByte | Convert a float to a byte, according to the Web Audio spec |
| copyByteBufferToFloatBuffer | Copy data from a Uint8Array buffer to a Float32Array buffer with byte-to-float conversion |
| toDecibel | Convert a floating point gain value into a dB representation without any reference (dBFS), https://en.wikipedia.org/wiki/DBFS |
| processAverageVolume | Calculate the average volume using Root Mean Square, assuming the data is in float form |
| isSilent | Simple silence detection that only checks the first and last bit of the sample |
| isLowVolume | Check the provided gain against the low volume threshold to decide whether it is considered low volume |
| isClipping | Check if there is clipping |
| isMono | Check if the provided channels are mono or stereo |
| getAudioStats | Calculate the audio stats; the samples are expected in float form |
| isVoiceActivity | A naive voice activity detection |
| isEqualSize | Compare the provided widths and heights to see if they are the same |
| createVoiceDetectorFromTimeData | Create a function to check whether provided time series data is considered voice activity |
| createVoiceDetectorFromProbability | Create a function to check whether a provided probability is considered voice activity |
| createVADetector | Create a voice detector based on the provided params |
| createAudioSignalDetector | Create a function to process the AudioStats and check for silence; the onSignalDetected callback is called under 2 situations |
| isAudioNode | - |
| isAudioParam | - |
| isAudioNodeInit | - |
| isAnalyzerNodeInit | - |
| createAsyncCallbackLoop | Create an async callback loop to be called recursively with a delay based on the frameRate |
| createCanvasTransform | - |
| loadScript | - |
| loadWasms | - |
| createSegmenter | - |
| isRenderEffects | - |
| createCanvas | Create a Canvas element with the provided width and height |
| createFrameCallbackRequest | Create a callback loop for video frame processing using requestVideoFrameCallback under the hood when available, otherwise a fallback implementation based on setTimeout |
| createVideoProcessor | - |
| createVideoTrackProcessor | Video track processor using the MediaStreamTrackProcessor API |
| createVideoTrackProcessorWithFallback | Video track processor using the Canvas captureStream API |
| calculateDistance | Calculate the distance between two Points |
| getBezierCurveControlPoints | Spline interpolation for a Bezier curve |
| line | Create a straight line path command |
| curve | Create a cubic Bezier curve path command |
| closedCurve | Create a cubic Bezier curve path, then turn back to the starting point using the provided point of reference |