# @pexip/media-processor
A package for media data processing using Web APIs.
## Install

```sh
npm install @pexip/media-processor
```
## Usage

### Use `Analyzer` to get the `MediaStream` data
```typescript
const stream = await navigator.mediaDevices.getUserMedia({audio: true});

const fftSize = 64;

// Set up an audio graph: `source` -> `analyzer`
const source = createStreamSourceGraphNode(stream);
const analyzer = createAnalyzerGraphNode({fftSize});
const audioGraph = createAudioGraph([[source, analyzer]]);

// Grab the current time-domain data in floating-point representation
const buffer = new Float32Array(fftSize);
analyzer.node?.getFloatTimeDomainData(buffer);

// Do some work with the buffer
buffer.forEach(...);

// Get the current volume, in the range [0, 1]
const volume = analyzer.getAverageVolume(buffer);

// Release the resources when you are done with the analyzer
await audioGraph.release();
```
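The API reference below describes `getAverageVolume` as an RMS-based calculation. As a rough illustration only (the standalone `averageVolume` function here is hypothetical, not part of the library API), the idea can be sketched as:

```typescript
// Sketch: average volume as the Root Mean Square (RMS) of float
// time-domain samples in [-1, 1]. Silence yields 0; a constant
// full-scale signal yields 1.
function averageVolume(samples: Float32Array): number {
    const sumOfSquares = samples.reduce((acc, s) => acc + s * s, 0);
    return Math.sqrt(sumOfSquares / samples.length);
}
```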
### Use `AudioGraph` to control the audio gain to mute/unmute
```typescript
const stream = await navigator.mediaDevices.getUserMedia({
    audio: true,
    video: true,
});

const mute = !stream.getAudioTracks()[0]?.enabled;

// Set up an audio graph: `source` -> `gain` -> `destination`
const source = createStreamSourceGraphNode(stream);
const gain = createGainGraphNode(mute);
const destination = createStreamDestinationGraphNode();
const audioGraph = createAudioGraph([[source, gain, destination]]);

// Use the output MediaStream for the altered AudioTrack
const alteredStream = new MediaStream([
    ...stream.getVideoTracks(),
    ...destination.stream.getAudioTracks(),
]);

// Mute the audio
if (gain.node) {
    gain.node.mute = true;
}

// Check if the audio is muted
gain.node?.mute; // returns `true`, since we have just set the gain to 0

// Release the resources when you are done with the graph
await audioGraph.release();
```
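Linear gain values like the ones used above can be expressed in dBFS via the `toDecibel` helper listed in the References. A hedged sketch of that conversion, assuming the standard 20·log10 amplitude formula:

```typescript
// Sketch of a linear-gain -> dBFS conversion: 20 * log10(gain).
// A gain of 1 is 0 dBFS, 0.5 is roughly -6 dB, and 0 is -Infinity.
function toDecibel(gain: number): number {
    return 20 * Math.log10(gain);
}
```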
### Use noise suppression
```typescript
const stream = await navigator.mediaDevices.getUserMedia({
    audio: true,
    video: true,
});

// Fetch the denoise WebAssembly module
const response = await fetch(
    new URL('@pexip/denoise/denoise_bg.wasm', import.meta.url).href,
);
const wasmBuffer = await response.arrayBuffer();

// Set up an audio graph: `source` -> `destination`
const source = createStreamSourceGraphNode(stream);
const destination = createStreamDestinationGraphNode();
const audioGraph = createAudioGraph([[source, destination]]);

// Add the worklet module
await audioGraph.addWorklet(
    new URL(
        '@pexip/media-processor/dist/worklets/denoise.worklet',
        import.meta.url,
    ).href,
);

const denoise = createDenoiseWorkletGraphNode(wasmBuffer);

// Route the source through the denoise node
audioGraph.connect([source, denoise, destination]);
audioGraph.disconnect([source, destination]);

// Release the resources when you are done with the graph
await audioGraph.release();
```
### Use background blur
```typescript
import {
    createSegmenter,
    createCanvasTransform,
    createVideoProcessor,
} from '@pexip/media-processor';

// Grab the user's camera stream
const stream = await navigator.mediaDevices.getUserMedia({video: true});

// Set the path to the `@mediapipe/tasks-vision` assets
// It will be passed directly to
// [FilesetResolver.forVisionTasks()](https://ai.google.dev/edge/api/mediapipe/js/tasks-vision.filesetresolver#filesetresolverforvisiontasks)
const tasksVisionBasePath =
    'A base path to specify the directory the Wasm files should be loaded from';

const modelAsset = {
    /**
     * Path to the mediapipe selfie segmentation model asset
     */
    path: 'A path to selfie segmentation model',
    modelName: 'selfie' as const,
};

const segmenter = createSegmenter(tasksVisionBasePath, {modelAsset});

// Create a processing transformer and set the effects to `blur`
const transformer = createCanvasTransform(segmenter, {effects: 'blur'});
const videoProcessor = createVideoProcessor([transformer]);

// Start the processor
await videoProcessor.open();

// Pass the raw MediaStream to apply the effects,
// then use the output stream for whatever purpose
const processedStream = await videoProcessor.process(stream);
```
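The segmenter tracks the main person between frames using Intersection over Union (see `iou` in the References). A minimal sketch of that metric over binary masks; the flat `Uint8Array` mask representation here is an assumption for illustration:

```typescript
// Sketch of Intersection over Union (IoU) between two binary masks:
// the overlapping area divided by the combined area.
// 1.0 = perfect match, 0 = no overlap.
function iou(a: Uint8Array, b: Uint8Array): number {
    let intersection = 0;
    let union = 0;
    for (let i = 0; i < a.length; i++) {
        const inA = a[i] !== 0;
        const inB = b[i] !== 0;
        if (inA && inB) intersection++;
        if (inA || inB) union++;
    }
    return union === 0 ? 0 : intersection / union;
}
```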
## Profiling Web Audio

You can do this with Chrome DevTools; see the WebAudio panel under **More tools**.
### How `AudioWorkletNode` and `AudioWorkletProcessor` work together
```
┌─────────────────────────┐                ┌──────────────────────────┐
│                         │                │                          │
│    Main Global Scope    │                │  AudioWorkletGlobalScope │
│                         │                │                          │
│  ┌───────────────────┐  │                │  ┌────────────────────┐  │
│  │                   │  │  MessagePort   │  │                    │  │
│  │   AudioWorklet    │◄─┼────────────────┼─►│   AudioWorklet     │  │
│  │   Node            │  │                │  │   Processor        │  │
│  │                   │  │                │  │                    │  │
│  └───────────────────┘  │                │  └────────────────────┘  │
│                         │                │                          │
└─────────────────────────┘                └──────────────────────────┘
        Main Thread                           WebAudio Render Thread
```
### Constraints when using the `AudioWorklet`

- Each `BaseAudioContext` possesses exactly one `AudioWorklet`
- 128 sample-frames per render quantum
- No `fetch` API in the `AudioWorkletGlobalScope`
- No `TextEncoder`/`TextDecoder` APIs in the `AudioWorkletGlobalScope`
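One practical consequence of the 128 sample-frame render quantum: a processor whose model expects larger frames (such as a denoiser) has to buffer samples across `process()` calls. A hypothetical sketch of such buffering, with a made-up 480-sample model frame size:

```typescript
// Hypothetical sketch: accumulate 128-sample render quanta until a
// full model frame (here 480 samples, an assumed size) is available.
class FrameBuffer {
    private pendingSamples: number[] = [];

    constructor(private readonly frameSize: number) {}

    // Push one render quantum; return any complete frames produced.
    push(quantum: Float32Array): Float32Array[] {
        for (const sample of quantum) {
            this.pendingSamples.push(sample);
        }
        const frames: Float32Array[] = [];
        while (this.pendingSamples.length >= this.frameSize) {
            frames.push(
                Float32Array.from(
                    this.pendingSamples.splice(0, this.frameSize),
                ),
            );
        }
        return frames;
    }

    // Number of samples buffered but not yet emitted as a frame
    get pending(): number {
        return this.pendingSamples.length;
    }
}
```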
## References
A library for media analysis using Web APIs.
### Enumerations
| Enumeration | Description |
|---|---|
| AbortReason | - |
### Interfaces
| Interface | Description |
|---|---|
| Segmentation | - |
| VideoFrameLike | - |
| ImageRecord | - |
| Rect | - |
| ProcessEventOpen | Message event to initialize the Processor |
| ProcessEventProcess | Message event to process the video frame |
| ProcessEventUpdate | Message event to update processor options |
| ProcessEventClose | Message event to stop the processing |
| ProcessEventDestroy | Message event to release all the resources |
| ProcessEventDebug | Message event for debugging |
| ProcessWorkerEventProcessed | - |
| ProcessWorkerEventOpened | - |
| ProcessWorkerEventUpdated | - |
| ProcessWorkerEvent | - |
| ProcessWorkerEventDebug | - |
| ProcessorOptions | - |
| ProcessorProcessOptions | - |
| ProcessWorkerDebugMessage | - |
| ProcessorEvent | - |
| RendererOptions | Options controlling the rendering and effects for person/background segmentation. All values are stable per-frame and can be safely updated in real-time. |
| Renderer | - |
| RenderEventHandlers | - |
| SegmentationModelAsset | Segmentation model asset that can be used to load a segmentation model. |
| ImageSegmenterOptions | - |
| SegmenterOptions | - |
| SegmentationSmoothingConfig | - |
| MPMask | Wrapper for a mask produced by a Segmentation Task. |
| Stats | - |
| Weights | - |
| SelectionOptions | - |
| Benchmark | - |
| Point | Interface for a Point consisting of coordinates x and y |
| Size | - |
| Frame | - |
| AudioStats | Data structure for the audio statistics while processing |
| StatsOptions | Stats options for calculating the audio stats |
| SubscribableOptions | - |
| WorkletMessagePortOptions | - |
| AnalyzerSubscribableOptions | - |
| Denoise | A wrapper for the Denoise wasm module |
| Gain | - |
| Analyzer | - |
| AudioNodeProps | - |
| AudioNodeInit | - |
| WorkletModule | - |
| AudioGraphOptions | - |
| AudioGraph | - |
| Clock | Clock interface to get the current time with the now method |
| ThrottleOptions | Limit the rate of flow in milliseconds, using the provided Clock |
| Runner | - |
| Transform | - |
| AsyncAssets | - |
| Process | - |
| BackgroundImage | - |
| Segmenter | - |
| Detector | - |
| SegmentationParams | Options controlling the rendering and effects for person/background segmentation. All values are stable per-frame and can be safely updated in real-time. |
| VideoProcessor | - |
### Type Aliases

### Variables

### Functions
| Function | Description |
|---|---|
| nearestPowerOfTwo | Coerce any positive number to the nearest power of two within [minPow, maxPow]. Example: nearestPowerOfTwo(13) => 16; nearestPowerOfTwo(3.2) => 4 |
| iou | Intersection over Union (IoU), how much two masks overlap as a fraction of either's total. 1.0 = perfect match, 0 = no overlap. Used for stable, robust tracking of the main person between frames, especially with bystander removal and temporal instance switching. |
| labelComponents | - |
| scoreComponent | - |
| createAudioContext | A function to create an AudioContext using the constructor or a factory function, depending on browser support |
| resumeAudioOnInterruption | Resume the stream whenever interrupted |
| resumeAudioOnUnmute | Resume the AudioContext whenever the source track is unmuted |
| subscribeWorkletNode | Subscribe to MessagePort messages from an AudioWorkletNode |
| subscribeTimeoutAnalyzerNode | Subscribe to a timeout loop to get the data from Analyzer |
| createStreamSourceGraphNode | Create a MediaStreamAudioSourceNodeInit |
| createMediaElementSourceGraphNode | Create a MediaElementAudioSourceNodeInit |
| createAnalyzerSubscribableGraphNode | Create an analyzer node with push-based subscription |
| createDenoiseWorkletGraphNode | Create a noise suppression node |
| createGainGraphNode | Create a GainNodeInit |
| createAnalyzerGraphNode | Create an AnalyzerNodeInit |
| createStreamDestinationGraphNode | Create a MediaStreamAudioDestinationNodeInit |
| createAudioDestinationGraphNode | Create an AudioDestinationNode |
| createDelayGraphNode | Create a DelayNode |
| createChannelSplitterGraphNode | Create a ChannelSplitterNode |
| createChannelMergerGraphNode | Create a ChannelMergerNode |
| createAudioGraph | Accepts AudioNodeInitConnections to build the audio graph within a single audio context |
| createAudioGraphProxy | - |
| createBenchmark | Creates a benchmarking utility for measuring frame durations and calculating frames per second (FPS). |
| createWindowedStats | - |
| sum | Sum an array of numbers |
| avg | Average an array of numbers |
| pow | pow function from Math in functional form number -> number -> number |
| rms | Calculate the Root Mean Square from provided numbers |
| round | Round the floating point number away from zero, which is different from Math.round |
| createAudioStats | AudioStats builder |
| fromByteToFloat | Convert a byte to float, according to web audio spec |
| fromFloatToByte | Convert a float to byte, according to web audio spec |
| copyByteBufferToFloatBuffer | Copy data from Uint8Array buffer to Float32Array buffer with byte to float conversion |
| toDecibel | Convert a floating-point gain value into a dB representation without any reference (dBFS, https://en.wikipedia.org/wiki/DBFS) |
| processAverageVolume | Calculate the averaged volume using Root Mean Square, assuming the data is in float form |
| isSilent | Simple silent detection to only check the first and last bit from the sample |
| isLowVolume | Check the provided gain against the low-volume threshold to determine whether it is considered low volume |
| isClipping | Check if there is clipping |
| isMono | Check if provided channels are mono or stereo |
| getAudioStats | Calculate the audio stats, expecting the samples in float form |
| isVoiceActivity | A naive voice activity detection |
| isEqualSize | Compare the provided width and height to see if they are the same |
| createVoiceDetectorFromTimeData | A function to check whether the provided time-series data is considered voice activity |
| createVoiceDetectorFromProbability | A function to check whether the provided probability is considered voice activity |
| createVADetector | Create a voice detector based on provided params |
| createAudioSignalDetector | Create a function to process the AudioStats and check for silence; the onSignalDetected callback is called under 2 situations |
| isAudioNode | - |
| isAudioParam | - |
| isAudioNodeInit | - |
| isAnalyzerNodeInit | - |
| createAsyncCallbackLoop | Create an async callback loop to be called recursively with delay based on the frameRate |
| createCanvasTransform | - |
| loadScript | - |
| loadWasms | - |
| createSegmenter | - |
| isRenderEffects | - |
| createCanvas | Create a Canvas element with provided width and height |
| setVideoElementSrc | - |
| playVideo | - |
| createFrameCallbackRequest | Create a callback loop for video frame processing using requestVideoFrameCallback under-the-hood when available otherwise our fallback implementation based on setTimeout. |
| createVideoProcessor | - |
| createVideoTrackProcessor | Video track processor using MediaStreamTrackProcessor API |
| createVideoTrackProcessorWithFallback | Video track processor using the Canvas captureStream API |
| calculateDistance | Calculate the distance between two Points |
| getBezierCurveControlPoints | Spline Interpolation for Bezier Curve |
| line | Create a straight line path command |
| curve | Create a cubic Bezier curve path command |
| closedCurve | Create a cubic Bezier curve path, then turn back to the starting point with the provided point of reference |
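To illustrate two of the helpers above: a hedged sketch of `nearestPowerOfTwo` and the byte/float sample conversions from the Web Audio spec, where byte samples centre on 128 (so byte 128 maps to float 0.0). The default clamping bounds and exact rounding behaviour here are assumptions, not the library's actual signatures:

```typescript
// Sketch of nearestPowerOfTwo: round the base-2 logarithm, then clamp
// the exponent into [minPow, maxPow] (default bounds assumed).
function nearestPowerOfTwo(n: number, minPow = 0, maxPow = 15): number {
    const pow = Math.min(maxPow, Math.max(minPow, Math.round(Math.log2(n))));
    return 2 ** pow;
}

// Sketches of the byte <-> float conversions: bytes centre on 128,
// so the float step size is 1/128 and byte 128 maps to 0.0.
function fromByteToFloat(byte: number): number {
    return (byte - 128) / 128;
}

function fromFloatToByte(float: number): number {
    return Math.max(0, Math.min(255, Math.round(float * 128 + 128)));
}
```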