@pexip/media-processor

A package for media data processing using Web APIs.

Install

npm install @pexip/media-processor

Usage

Use Analyzer to get the MediaStream data

import {
    createAnalyzerGraphNode,
    createAudioGraph,
    createStreamSourceGraphNode,
} from '@pexip/media-processor';

const stream = await navigator.mediaDevices.getUserMedia({audio: true});

const fftSize = 64;
// Set up an audio graph with `source` -> `analyzer`
const source = createStreamSourceGraphNode(stream);
const analyzer = createAnalyzerGraphNode({fftSize});
const audioGraph = createAudioGraph([[source, analyzer]]);

// Grab the current time domain data in floating point representation
const buffer = new Float32Array(fftSize);
analyzer.node?.getFloatTimeDomainData(buffer);
// Do some work with the buffer
buffer.forEach(sample => {
    // ...
});

// Get the current volume, in the range [0, 1]
const volume = analyzer.getAverageVolume(buffer);

// Release the resources when you are done with the analyzer
await audioGraph.release();
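
To meter the volume continuously, you can poll the analyzer in an animation-frame loop. The following is a minimal sketch reusing `analyzer` and `buffer` from above; the polling approach is an assumption, not something the package prescribes (it also ships push-based helpers such as createAnalyzerSubscribableGraphNode and subscribeTimeoutAnalyzerNode, listed in the Functions reference):

const meter = () => {
    analyzer.node?.getFloatTimeDomainData(buffer);
    const volume = analyzer.getAverageVolume(buffer);
    // Do something with the volume, e.g. drive a level indicator
    requestAnimationFrame(meter);
};
requestAnimationFrame(meter);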

Use AudioGraph to control the audio gain to mute/unmute

import {
    createAudioGraph,
    createGainGraphNode,
    createStreamDestinationGraphNode,
    createStreamSourceGraphNode,
} from '@pexip/media-processor';

const stream = await navigator.mediaDevices.getUserMedia({
    audio: true,
    video: true,
});

const mute = !stream.getAudioTracks()[0]?.enabled;

// Set up an audio graph with `source` -> `gain` -> `destination`
const source = createStreamSourceGraphNode(stream);
const gain = createGainGraphNode(mute);
const destination = createStreamDestinationGraphNode();

const audioGraph = createAudioGraph([[source, gain, destination]]);

// Use the output MediaStream for the altered AudioTrack
const alteredStream = new MediaStream([
    ...stream.getVideoTracks(),
    ...destination.stream.getAudioTracks(),
]);

// Mute the audio
if (gain.node) {
    gain.node.mute = true;
}

// Check if the audio is muted
gain.node?.mute; // returns `true`, since we have just set the gain to 0

// Release the resources when you are done with the audio graph
await audioGraph.release();
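
Since mute is just a property on the gain node, toggling it is a one-liner. A small helper sketch around the `gain` node from the example above:

// Hypothetical helper, not part of the package API
const setMuted = (muted) => {
    if (gain.node) {
        gain.node.mute = muted;
    }
};

setMuted(true); // audio muted (gain set to 0)
setMuted(false); // audio unmuted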

Use noise suppression

import {
    createAudioGraph,
    createDenoiseWorkletGraphNode,
    createStreamDestinationGraphNode,
    createStreamSourceGraphNode,
} from '@pexip/media-processor';

const stream = await navigator.mediaDevices.getUserMedia({
    audio: true,
    video: true,
});

// Fetch the denoise WebAssembly module
const response = await fetch(
    new URL('@pexip/denoise/denoise_bg.wasm', import.meta.url).href,
);
const wasmBuffer = await response.arrayBuffer();

// Set up an audio graph with `source` -> `destination`
const source = createStreamSourceGraphNode(stream);
const destination = createStreamDestinationGraphNode();

const audioGraph = createAudioGraph([[source, destination]]);
// Add the worklet module
await audioGraph.addWorklet(
    new URL(
        '@pexip/media-processor/dist/worklets/denoise.worklet',
        import.meta.url,
    ).href,
);
const denoise = createDenoiseWorkletGraphNode(wasmBuffer);
// Route the source through the denoise node
audioGraph.connect([source, denoise, destination]);
audioGraph.disconnect([source, destination]);

// Release the resources when you are done with the audio graph
await audioGraph.release();
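
As in the gain example, the processed audio is available from the destination node's MediaStream, which you can combine with the original video track:

// Assumes the same `stream` and `destination` as above
const denoisedStream = new MediaStream([
    ...stream.getVideoTracks(),
    ...destination.stream.getAudioTracks(),
]);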

Use background blur

import {
    createCanvasTransform,
    createSegmenter,
    createVideoProcessor,
    createVideoTrackProcessor,
    createVideoTrackProcessorWithFallback,
} from '@pexip/media-processor';

// Grab the user's camera stream
const stream = await navigator.mediaDevices.getUserMedia({video: true});

// Set the path to the `@mediapipe/tasks-vision` assets
// It will be passed directly to
// [FilesetResolver.forVisionTasks()](https://ai.google.dev/edge/api/mediapipe/js/tasks-vision.filesetresolver#filesetresolverforvisiontasks)
const tasksVisionBasePath =
    'A base path to specify the directory the Wasm files should be loaded from';

const modelAsset = {
    /**
     * Path to the mediapipe selfie segmentation model asset
     */
    path: 'A path to the selfie segmentation model',
    modelName: 'selfie' as const,
};

const segmenter = createSegmenter(tasksVisionBasePath, {modelAsset});
// Create a processing transformer and set the effects to `blur`
const transformer = createCanvasTransform(segmenter, {effects: 'blur'});
const videoProcessor = createVideoProcessor([transformer]);

// Start the processor
await videoProcessor.open();

// Pass the raw MediaStream to apply the effects
// Then, use the output stream for whatever purpose
const processedStream = await videoProcessor.process(stream);
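
The processed stream is a regular MediaStream, so it can, for example, be rendered in a video element or sent over WebRTC (standard browser APIs, nothing specific to this package):

const video = document.querySelector('video');
if (video) {
    video.srcObject = processedStream;
    await video.play();
}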

Profiling Web Audio

You can do this with Chrome's built-in tracing tools; see Chrome's developer documentation on profiling Web Audio.

How AudioWorkletNode and the AudioWorkletProcessor work together

┌─────────────────────────┐             ┌──────────────────────────┐
│                         │             │                          │
│    Main Global Scope    │             │ AudioWorkletGlobalScope  │
│                         │             │                          │
│  ┌───────────────────┐  │             │  ┌────────────────────┐  │
│  │                   │  │ MessagePort │  │                    │  │
│  │   AudioWorklet    │◄─┼─────────────┼─►│    AudioWorklet    │  │
│  │       Node        │  │             │  │     Processor      │  │
│  │                   │  │             │  │                    │  │
│  └───────────────────┘  │             │  └────────────────────┘  │
│                         │             │                          │
└─────────────────────────┘             └──────────────────────────┘
        Main Thread                        WebAudio Render Thread
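
A minimal sketch of this pairing using the standard Web Audio API (the module path and processor name here are hypothetical, not part of this package). Each side holds one end of the same MessagePort:

// main.js — runs in the Main Global Scope
const context = new AudioContext();
await context.audioWorklet.addModule('my.worklet.js');
const node = new AudioWorkletNode(context, 'my-processor');
node.port.onmessage = ({data}) => {
    // Messages posted by the processor arrive here
    console.log('from processor:', data);
};
node.port.postMessage({hello: 'processor'});

// my.worklet.js — runs in the AudioWorkletGlobalScope
class MyProcessor extends AudioWorkletProcessor {
    constructor() {
        super();
        this.port.onmessage = ({data}) => {
            // Echo main-thread messages back through the port
            this.port.postMessage({received: data});
        };
    }
    process() {
        // Keep the processor alive; audio I/O is not used in this sketch
        return true;
    }
}
registerProcessor('my-processor', MyProcessor);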

Constraints when using the AudioWorklet

  • Each BaseAudioContext possesses exactly one AudioWorklet
  • Audio is processed in fixed blocks of 128 sample-frames (the render quantum), as shown in the sketch after this list
  • No fetch API in the AudioWorkletGlobalScope
  • No TextEncoder/TextDecoder APIs in the AudioWorkletGlobalScope
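
A minimal processor sketch illustrating the fixed block size (standard Web Audio API; the processor name is hypothetical). Because fetch and TextEncoder/TextDecoder are unavailable in the AudioWorkletGlobalScope, binary assets such as the denoise Wasm above are fetched on the main thread and handed to the worklet instead:

class PassthroughProcessor extends AudioWorkletProcessor {
    process(inputs, outputs) {
        const input = inputs[0];
        const output = outputs[0];
        for (let channel = 0; channel < input.length; channel++) {
            // Each channel buffer holds one render quantum of 128 sample-frames
            output[channel].set(input[channel]);
        }
        return true; // keep processing
    }
}
registerProcessor('passthrough-processor', PassthroughProcessor);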

References

A library for media analysis using Web APIs.

Enumerations

  • AbortReason

Interfaces

  • Analyzer
  • AnalyzerSubscribableOptions
  • AsyncAssets
  • AudioGraph
  • AudioGraphOptions
  • AudioNodeInit
  • AudioNodeProps
  • AudioStats: Data structure for the audio statistics while processing
  • BackgroundImage
  • Benchmark
  • Clock: Clock interface to get the current time with a now method
  • Denoise: A wrapper for the Denoise wasm module
  • Detector
  • Frame
  • Gain
  • ImageRecord
  • ImageSegmenterOptions
  • MPMask: Wrapper for a mask produced by a Segmentation Task
  • Point: Interface for a Point, consisting of coordinates x and y
  • Process
  • ProcessEventClose: Message event to stop the processing
  • ProcessEventDebug: Message event for debugging
  • ProcessEventDestroy: Message event to release all the resources
  • ProcessEventOpen: Message event to initialize the Processor
  • ProcessEventProcess: Message event to process the video frame
  • ProcessEventUpdate: Message event to update processor options
  • ProcessorEvent
  • ProcessorOptions
  • ProcessorProcessOptions
  • ProcessWorkerDebug
  • ProcessWorkerEvent
  • ProcessWorkerEventDebug
  • ProcessWorkerEventOpened
  • ProcessWorkerEventProcessed
  • ProcessWorkerEventUpdated
  • Rect
  • Renderer
  • RendererOptions
  • RenderEventHandlers
  • Runner
  • Segmentation
  • SegmentationModelAsset: Segmentation model asset that can be used to load a segmentation model
  • SegmentationParams
  • SegmentationSmoothingConfig
  • Segmenter
  • SegmenterOptions
  • Size
  • StatsOptions: Stats options for calculating the audio stats
  • SubscribableOptions
  • ThrottleOptions: Limit the rate of flow in terms of milliseconds, with a provided Clock
  • Transform
  • VideoFrameLike
  • VideoProcessor
  • WorkletMessagePortOptions
  • WorkletModule

Type Aliases

  • AnalyzerNodeInit
  • AsyncCallback
  • AudioBufferBytes: Same as the return from AnalyserNode.getByteFrequencyData()
  • AudioBufferFloats: Same as AudioBuffer, or the return from AnalyserNode.getFloatFrequencyData()
  • AudioDestinationNodeInit
  • AudioNodeConnectParam
  • AudioNodeInitConnection
  • AudioNodeInitConnections
  • AudioNodeInitConnectParam
  • AudioNodeParam
  • AudioSamples: Audio samples from each channel, either in float or byte form
  • BaseAudioNode
  • Callback
  • Canvas
  • CanvasContext
  • ChannelSplitterNodeInit
  • Color
  • ConnectInitParamBaseType
  • ConnectInitParamType
  • ConnectParamBase
  • ConnectParamBaseType
  • ConnectParamType
  • DelayNodeInit
  • Delegate
  • DenoiseWorkletNodeInit
  • ExtractMessageEventType
  • GainNodeInit
  • ImageMapIterable
  • ImageType
  • IncludeMessageEventDataType
  • InputFrame
  • IsVoice
  • MediaElementAudioSourceNodeInit
  • MediaStreamAudioDestinationNodeInit
  • MediaStreamAudioSourceNodeInit
  • Node
  • NodeConnectionAction
  • Nodes
  • ProcessEvents: Message events to be sent to the Processor
  • ProcessInputType
  • ProcessorEvents
  • ProcessorWorkerEvents
  • ProcessStatus
  • ProcessVideoTrack
  • ProcessWorkerEvents: Message events returned from the processor
  • RenderBackend
  • RenderEffects
  • RenderingEvents
  • RunnerCreator
  • SegmentationModel
  • SegmentationTransform
  • TupleOf: From https://github.com/Microsoft/TypeScript/issues/26223#issuecomment-674500430
  • UniversalAudioContextState: Adds the missing type def to work with AudioContextState in Safari; see https://developer.mozilla.org/en-US/docs/Web/API/BaseAudioContext/state#resuming_interrupted_play_states_in_ios_safari
  • Unsubscribe: Unsubscribe the subscription
  • WasmPaths

Variables

  • BACKGROUND_BLUR_AMOUNT
  • calculateMaxBlurPass
  • clamping
  • CLIP_COUNT_THRESHOLD: Default clipping count threshold; the number of consecutive clipThreshold-level samples that indicate clipping
  • CLIP_THRESHOLD: Default clipping detection threshold
  • createLazyProps
  • createObjectUpdater
  • createRemoteImageBitmap
  • EDGE_BLUR_AMOUNT
  • FOREGROUND_THRESHOLD
  • FRAME_RATE
  • getErrorMessage
  • handleWebGLContextLoss
  • isRenderingEvents
  • isSegmentationModel
  • LOW_VOLUME_THRESHOLD: Default low volume detection threshold
  • MASK_COMBINE_RATIO
  • MONO_THRESHOLD: Default mono detection threshold; data must be identical within one LSB (16-bit) to be identified as mono
  • PLAY_VIDEO_TIMEOUT
  • PROCESS_STATUS: Process status
  • PROCESSING_HEIGHT
  • PROCESSING_WIDTH
  • RENDER_BACKEND
  • RENDER_EFFECT
  • RENDERING_EVENTS
  • resize: Calculate the coordinates and size of the source and destination to fit the target aspect ratio
  • SEGMENTATION_MODELS
  • SILENT_THRESHOLD: Default silent threshold; at least one LSB of 16-bit data (comparison is on the absolute value)
  • toRenderingEvents
  • urls
  • VOICE_PROBABILITY_THRESHOLD: Default voice probability threshold

Functions

  • avg: Average an array of numbers
  • calculateDistance: Calculate the distance between two Points
  • closedCurve: Create a cubic Bezier curve path that turns back to the starting point, using a provided point of reference
  • copyByteBufferToFloatBuffer: Copy data from a Uint8Array buffer to a Float32Array buffer with byte-to-float conversion
  • createAnalyzerGraphNode: Create an AnalyzerNodeInit
  • createAnalyzerSubscribableGraphNode: Create an analyzer node with push-based subscription
  • createAsyncCallbackLoop: Create an async callback loop to be called recursively with a delay based on the frameRate
  • createAudioContext: Create an AudioContext using the constructor or factory function, depending on what the browser supports
  • createAudioDestinationGraphNode: Create an AudioDestinationNode
  • createAudioGraph: Accepts AudioNodeInitConnections to build the audio graph within a single audio context
  • createAudioGraphProxy
  • createAudioSignalDetector: Create a function to process the AudioStats and check for silence; the onSignalDetected callback is called under two situations
  • createAudioStats: AudioStats builder
  • createBenchmark
  • createCanvas: Create a Canvas element with the provided width and height
  • createCanvasTransform
  • createChannelMergerGraphNode: Create a ChannelMergerNode
  • createChannelSplitterGraphNode: Create a ChannelSplitterNode
  • createDelayGraphNode: Create a DelayNode
  • createDenoiseWorkletGraphNode: Create a noise suppression node
  • createFrameCallbackRequest: Create a callback loop for video frame processing, using requestVideoFrameCallback under the hood when available, otherwise a fallback implementation based on setTimeout
  • createGainGraphNode: Create a GainNodeInit
  • createMediaElementSourceGraphNode: Create a MediaElementAudioSourceNodeInit
  • createSegmenter
  • createStreamDestinationGraphNode: Create a MediaStreamAudioDestinationNodeInit
  • createStreamSourceGraphNode: Create a MediaStreamAudioSourceNodeInit
  • createVADetector: Create a voice detector based on the provided params
  • createVideoProcessor
  • createVideoTrackProcessor: Video track processor using the MediaStreamTrackProcessor API
  • createVideoTrackProcessorWithFallback: Video track processor using the Canvas captureStream API
  • createVoiceDetectorFromProbability: A function to check whether the provided probability is considered voice activity
  • createVoiceDetectorFromTimeData: A function to check whether the provided time series data is considered voice activity
  • curve: Create a cubic Bezier curve path command
  • fromByteToFloat: Convert a byte to a float, according to the Web Audio spec
  • fromFloatToByte: Convert a float to a byte, according to the Web Audio spec
  • getAudioStats: Calculate the audio stats; expects the samples in float form
  • getBezierCurveControlPoints: Spline interpolation for a Bezier curve
  • isAnalyzerNodeInit
  • isAudioNode
  • isAudioNodeInit
  • isAudioParam
  • isClipping: Check if there is clipping
  • isEqualSize: Compare the provided width and height to see if they are the same
  • isLowVolume: Check the provided gain against the low volume threshold to decide whether it is considered low volume
  • isMono: Check if the provided channels are mono or stereo
  • isRenderEffects
  • isSilent: Simple silence detection that only checks the first and last bit of the sample
  • isVoiceActivity: A naive voice activity detection
  • line: Create a straight line path command
  • loadScript
  • loadWasms
  • pow: The pow function from Math in functional (curried) form: number -> number -> number
  • processAverageVolume: Calculate the averaged volume using Root Mean Square, assuming the data is in float form
  • resumeAudioOnInterruption: Resume the stream whenever interrupted
  • resumeAudioOnUnmute: Resume the AudioContext whenever the source track is unmuted
  • rms: Calculate the Root Mean Square from the provided numbers
  • round: Round a floating point number away from zero, which is different from Math.round
  • subscribeTimeoutAnalyzerNode: Subscribe to a timeout loop to get data from the Analyzer
  • subscribeWorkletNode: Subscribe to MessagePort messages from an AudioWorkletNode
  • sum: Sum an array of numbers
  • toDecibel: Convert a floating point gain value into a dB representation without any reference (dBFS); see https://en.wikipedia.org/wiki/DBFS