Hashers
All hashers inherit from the Hasher class.
- class perception.hashers.hasher.Hasher
All hashers implement a common set of methods from the Hasher base class.
- allow_parallel: bool = True
Indicates whether the hashes can be computed in parallel
- compute_distance(hash1, hash2, hash_format='base64')
Compute the distance between two hashes.
- Parameters:
hash1 (ndarray|str) – The first hash or vector
hash2 (ndarray|str) – The second hash or vector
hash_format – If either or both of the hashes are hash strings, what format the string is encoded in.
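As an illustration of what a 'hamming' distance between two hash strings involves, here is a minimal standard-library sketch. The function name is hypothetical and it does not call the library. Dividing by the bit count is an assumption; compute_distance may scale its result differently.

```python
import base64


def hamming_distance_b64(hash1: str, hash2: str) -> float:
    """Fraction of differing bits between two base64-encoded bit hashes."""
    bytes1 = base64.b64decode(hash1)
    bytes2 = base64.b64decode(hash2)
    if len(bytes1) != len(bytes2):
        raise ValueError("hashes must have the same length")
    # XOR each byte pair and count the bits that are set in the result
    differing = sum(bin(a ^ b).count("1") for a, b in zip(bytes1, bytes2))
    # Normalizing by the number of bits is an assumption of this sketch
    return differing / (8 * len(bytes1))


h1 = base64.b64encode(bytes([0b11110000, 0b00001111])).decode("ascii")
h2 = base64.b64encode(bytes([0b11110000, 0b00000000])).decode("ascii")
distance = hamming_distance_b64(h1, h2)  # 4 of 16 bits differ
```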
- compute_parallel(filepaths, progress=None, progress_desc=None, max_workers=5, isometric=False)
Compute hashes in a parallelized fashion.
- Parameters:
filepaths – A list of paths to images or videos (depending on the hasher).
progress – A tqdm-like wrapper for reporting progress. If None, progress is not reported.
progress_desc – The title of the progress bar.
max_workers – The maximum number of workers
isometric – Whether to compute all eight isometric transforms for each image.
- distance_metric: str
The metric to use when computing distance between two hashes. All hashers must supply this parameter.
- dtype: str
The numpy type to use when converting from string to array form. All hashers must supply this parameter.
- hash_length: int
Indicates the length of the hash vector
- returns_multiple: bool = False
Whether or not this hash returns multiple values
- string_to_vector(hash_string, hash_format='base64')
Convert hash string to vector.
- Parameters:
hash_string (str) – The input hash string
hash_format (str) – One of ‘base64’ or ‘hex’
- vector_to_string(vector, hash_format='base64')
Convert vector to hash string.
- Parameters:
vector (ndarray) – Input vector
hash_format (str) – One of ‘base64’ or ‘hex’
- Return type:
str|None
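The round trip between a vector and its string form can be sketched for boolean hashes with the standard library alone. The helper names below are hypothetical, and packing bits most-significant-bit first is an assumption; the library's own byte layout may differ.

```python
import base64


def bits_to_string(bits, hash_format="base64"):
    """Pack a sequence of 0/1 values into bytes and encode them."""
    # Zero-pad to a whole number of bytes, then pack MSB first (assumed layout)
    padded = list(bits) + [0] * (-len(bits) % 8)
    packed = bytes(
        sum(bit << (7 - i) for i, bit in enumerate(padded[start:start + 8]))
        for start in range(0, len(padded), 8)
    )
    if hash_format == "hex":
        return packed.hex()
    if hash_format == "base64":
        return base64.b64encode(packed).decode("ascii")
    raise ValueError("hash_format must be 'base64' or 'hex'")


def string_to_bits(hash_string, hash_length, hash_format="base64"):
    """Decode a hash string and return its first hash_length bits."""
    if hash_format == "hex":
        packed = bytes.fromhex(hash_string)
    elif hash_format == "base64":
        packed = base64.b64decode(hash_string)
    else:
        raise ValueError("hash_format must be 'base64' or 'hex'")
    bits = [(byte >> (7 - i)) & 1 for byte in packed for i in range(8)]
    # Drop the padding bits that were added when packing
    return bits[:hash_length]


example_bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
as_hex = bits_to_string(example_bits, "hex")
as_b64 = bits_to_string(example_bits, "base64")
```

The hash length has to be supplied when decoding because the padding bits are otherwise indistinguishable from real ones.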
Images
All image hashers inherit from the ImageHasher class.
- class perception.hashers.hasher.ImageHasher
- compute(image, hash_format='base64')
Compute a hash from an image.
- Parameters:
image (Union[str, ndarray, PIL.Image.Image, BytesIO, SpooledTemporaryFile]) – An image represented as a filepath, a PIL image object, or an np.ndarray object. If it is an np.ndarray object, it must be in RGB color order (note the OpenCV default is BGR).
hash_format – One of ‘base64’, ‘hex’, or ‘vector’
- Return type:
ndarray|str|None|list[str|None]
- compute_isometric_from_hash(hash_string_or_vector, hash_format='base64')
For supported hashes, obtain the hashes for the dihedral transformations of the original image. They are provided in the following order:
Vertical flip
Horizontal flip
180 degree rotation
90 degree rotation
90 degree rotation and vertical flip
90 degree rotation and horizontal flip
270 degree rotation
- Parameters:
hash_string_or_vector – The hash string or vector
hash_format – One of ‘base64’ or ‘hex’
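To make the seven transformations concrete, the sketch below applies them to a small grid of pixel values rather than to a hash. The function names are hypothetical, and treating a "90 degree rotation" as clockwise is an assumption; the library may use the opposite direction.

```python
def vertical_flip(grid):
    # Reverse the row order so the top row becomes the bottom row
    return [list(row) for row in reversed(grid)]


def horizontal_flip(grid):
    # Reverse each row so the left column becomes the right column
    return [list(reversed(row)) for row in grid]


def rotate90(grid):
    # Rotate 90 degrees clockwise (the direction is an assumption)
    return [list(row) for row in zip(*reversed(grid))]


def isometric_transforms(grid):
    """Return the seven non-identity dihedral transforms in the listed order."""
    r90 = rotate90(grid)
    return {
        "vflip": vertical_flip(grid),
        "hflip": horizontal_flip(grid),
        "r180": rotate90(r90),
        "r90": r90,
        "r90_vflip": vertical_flip(r90),
        "r90_hflip": horizontal_flip(r90),
        "r270": rotate90(rotate90(r90)),
    }


grid = [[1, 2], [3, 4]]
transforms = isometric_transforms(grid)
```

Together with the original image, these make up the eight isometric variants that compute_parallel refers to with isometric=True.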
- compute_with_quality(image, hash_format='base64')
Compute hash and hash quality from image.
- Parameters:
image (Union[str, ndarray, PIL.Image.Image, BytesIO, SpooledTemporaryFile]) – An image represented as a filepath, a PIL image object, or an np.ndarray object. If it is an np.ndarray object, it must be in RGB color order (note the OpenCV default is BGR).
hash_format – One of ‘base64’, ‘hex’, or ‘vector’
- Return type:
tuple[ndarray|str|None|list[str|None],int]
- Returns:
A tuple of (hash, quality)
The following image hash functions are included in the package.
- class perception.hashers.image.AverageHash(hash_size=8)
Computes a simple hash comparing the intensity of each pixel in a resized version of the image to the mean. Implementation based on that of ImageHash.
- distance_metric: str = 'hamming'
The metric to use when computing distance between two hashes. All hashers must supply this parameter.
- dtype: str = 'bool'
The numpy type to use when converting from string to array form. All hashers must supply this parameter.
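The comparison-to-the-mean step described for AverageHash can be sketched as follows. This is an illustration only: it assumes the image has already been converted to grayscale and resized to a small grid, and the function name is hypothetical.

```python
def average_hash_bits(pixels):
    """Return one bit per pixel: 1 where the pixel is brighter than the mean."""
    # Flatten the grid row by row
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


bits = average_hash_bits([[0, 255], [255, 0]])
```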
- class perception.hashers.image.BlockMean
A wrapper around OpenCV’s Block Mean hash. See paper for details.
- distance_metric: str = 'hamming'
The metric to use when computing distance between two hashes. All hashers must supply this parameter.
- dtype: str = 'bool'
The numpy type to use when converting from string to array form. All hashers must supply this parameter.
- hash_length: int = 968
Indicates the length of the hash vector
- class perception.hashers.image.ColorMoment
A wrapper around OpenCV’s Color Moments hash. See paper for details.
- distance_metric: str = 'euclidean'
The metric to use when computing distance between two hashes. All hashers must supply this parameter.
- dtype: str = 'float32'
The numpy type to use when converting from string to array form. All hashers must supply this parameter.
- hash_length: int = 42
Indicates the length of the hash vector
- class perception.hashers.image.DHash(hash_size=8)
A hash based on the differences between adjacent pixels. Implementation based on that of ImageHash.
- distance_metric: str = 'hamming'
The metric to use when computing distance between two hashes. All hashers must supply this parameter.
- dtype: str = 'bool'
The numpy type to use when converting from string to array form. All hashers must supply this parameter.
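The "differences between adjacent pixels" idea behind DHash can be sketched as below. It assumes a grayscale grid that has already been resized, and it compares each pixel with its right-hand neighbor; the function name is hypothetical and the library's exact comparison may differ.

```python
def difference_hash_bits(pixels):
    """Return 1 where a pixel's right-hand neighbor is brighter, else 0."""
    return [
        1 if row[i + 1] > row[i] else 0
        for row in pixels
        for i in range(len(row) - 1)
    ]


bits = difference_hash_bits([[1, 2, 3], [3, 2, 1]])
```

A grid with N columns yields N - 1 bits per row, which is why implementations of this kind typically resize to one extra column.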
- class perception.hashers.image.MarrHildreth
A wrapper around OpenCV’s Marr-Hildreth hash. See paper for details.
- distance_metric: str = 'hamming'
The metric to use when computing distance between two hashes. All hashers must supply this parameter.
- dtype: str = 'bool'
The numpy type to use when converting from string to array form. All hashers must supply this parameter.
- hash_length: int = 576
Indicates the length of the hash vector
- class perception.hashers.image.PHash(hash_size=8, highfreq_factor=4, exclude_first_term=False, freq_shift=0)
Also known as the DCT hash, a hash based on discrete cosine transforms of images. See complete paper for details. Implementation based on that of ImageHash.
- Parameters:
hash_size – The number of DCT elements to retain (the hash length will be hash_size * hash_size).
highfreq_factor – The multiple of the hash size to resize the input image to before computing the DCT.
exclude_first_term – Whether to exclude the first term of the DCT.
freq_shift – The number of DCT low frequency elements to skip.
- distance_metric: str = 'hamming'
The metric to use when computing distance between two hashes. All hashers must supply this parameter.
- dtype: str = 'bool'
The numpy type to use when converting from string to array form. All hashers must supply this parameter.
- class perception.hashers.image.PHashF(hash_size=8, highfreq_factor=4, exclude_first_term=False, freq_shift=0)
A real-valued version of PHash. It returns the raw 32-bit floats in the DCT. For a more compact approach, see PHashU8.
- distance_metric: str = 'euclidean'
The metric to use when computing distance between two hashes. All hashers must supply this parameter.
- dtype: str = 'float32'
The numpy type to use when converting from string to array form. All hashers must supply this parameter.
- class perception.hashers.image.PHashU8(hash_size=8, highfreq_factor=4, exclude_first_term=False, freq_shift=0)
A real-valued version of PHash. It uses minimum / maximum scaling to convert DCT values to unsigned 8-bit integers (more compact than the 32-bit floats used by PHashF at the cost of precision).
- distance_metric: str = 'euclidean'
The metric to use when computing distance between two hashes. All hashers must supply this parameter.
- dtype: str = 'uint8'
The numpy type to use when converting from string to array form. All hashers must supply this parameter.
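Minimum / maximum scaling of real values to unsigned 8-bit integers, as mentioned for PHashU8, might look like the sketch below. The function name is hypothetical, and the rounding and the handling of a constant input are assumptions, not the library's documented behavior.

```python
def scale_to_uint8(values):
    """Map values linearly so the minimum becomes 0 and the maximum 255."""
    low, high = min(values), max(values)
    if high == low:
        # A constant input has no range to scale; return zeros (assumption)
        return [0] * len(values)
    return [round(255 * (value - low) / (high - low)) for value in values]


scaled = scale_to_uint8([-2.0, 0.0, 6.0])
```

Each element then fits in one byte instead of the four bytes used by a 32-bit float, which is the compactness trade-off described above.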
- class perception.hashers.image.WaveletHash(hash_size=8, image_scale=None, mode='haar')
Similar to PHash but using wavelets instead of DCT. Implementation based on that of ImageHash.
- distance_metric: str = 'hamming'
The metric to use when computing distance between two hashes. All hashers must supply this parameter.
- dtype: str = 'bool'
The numpy type to use when converting from string to array form. All hashers must supply this parameter.
Videos
All video hashers inherit from the VideoHasher class.
- class perception.hashers.hasher.VideoHasher
- compute(filepath, errors='raise', hash_format='base64', scenes=None, **kwargs)
Compute a hash for a video at a given filepath. All other arguments are passed to perception.hashers.tools.read_video.
- Parameters:
filepath – Path to video file
errors – One of “raise”, “ignore”, or “warn”. Passed to perception.hashers.tools.read_video.
hash_format – One of “vector”, “base64”, or “hex”
max_duration – The maximum length of the video to hash.
max_size – The maximum size of frames to queue
scenes – An array used to pass scene info back to wrapper functions
- frames_per_second: float = 1
The frame rate at which videos are read
- abstractmethod hash_from_final_state(state)
Called after all frames have been processed. Returns the final feature vector.
- Parameters:
state (dict) – The state dictionary at the end of processing.
- Return type:
ndarray
- abstractmethod process_frame(frame, frame_index, frame_timestamp, state=None)
Called for each frame in the video. For all but the first frame, a state is provided recording the state from the previous frame.
- Parameters:
frame (ndarray) – The current frame as an RGB ndarray
frame_index (int|None) – The current frame index
frame_timestamp (float|None) – The current frame timestamp
state (dict|None) – The state from the last call to process_frame
- Return type:
dict
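The way process_frame and hash_from_final_state cooperate through a state dictionary can be illustrated with a toy class. MeanBrightnessHasher and hash_frames are hypothetical names; the class does not subclass VideoHasher, and frames are plain nested lists rather than ndarrays.

```python
class MeanBrightnessHasher:
    """Toy stand-in that follows the process_frame / final-state pattern."""

    def process_frame(self, frame, frame_index, frame_timestamp, state=None):
        # On the first frame no state exists yet, so create it
        if state is None:
            state = {"total": 0.0, "count": 0}
        flat = [value for row in frame for value in row]
        state["total"] += sum(flat) / len(flat)
        state["count"] += 1
        return state

    def hash_from_final_state(self, state):
        # Reduce the accumulated state to the final feature vector
        return [state["total"] / state["count"]]


def hash_frames(hasher, frames):
    """Feed (frame, timestamp) pairs through the hasher, threading the state."""
    state = None
    for index, (frame, timestamp) in enumerate(frames):
        state = hasher.process_frame(frame, index, timestamp, state=state)
    return hasher.hash_from_final_state(state)


frames = [([[0, 0], [0, 0]], 0.0), ([[10, 10], [10, 10]], 1.0)]
result = hash_frames(MeanBrightnessHasher(), frames)
```

Keeping all per-video data in the returned state, rather than on the hasher instance, lets one hasher object be reused across videos.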
The following video hash functions are included in the package.
- class perception.hashers.video.FramewiseHasher(frame_hasher, interframe_threshold, frames_per_second=15, quality_threshold=None)
A hasher that simply returns frame-wise hashes at some regular interval with some minimum inter-frame distance threshold.
- compute_batches(filepath, batch_size, errors='raise', hash_format='base64')
Compute hashes for a video in batches.
- Parameters:
filepath (str) – Path to video file
batch_size (int) – The batch size to use for returning hashes
errors – One of “raise”, “ignore”, or “warn”. Passed to perception.hashers.tools.read_video.
hash_format – The format in which to return hashes
- hash_from_final_state(state)
Called after all frames have been processed. Returns the final feature vector.
- Parameters:
state – The state dictionary at the end of processing.
- process_frame(frame, frame_index, frame_timestamp, state=None)
Called for each frame in the video. For all but the first frame, a state is provided recording the state from the previous frame.
- Parameters:
frame – The current frame as an RGB ndarray
frame_index – The current frame index
frame_timestamp – The current frame timestamp
state – The state from the last call to process_frame
- returns_multiple: bool = True
Whether or not this hash returns multiple values
- class perception.hashers.video.TMKL1(frame_hasher=None, frames_per_second=15, dtype='float32', distance_metric='cosine', norm=2, quality_threshold=None)
The TMK L1 video hashing algorithm.
- hash_from_final_state(state)
Called after all frames have been processed. Returns the final feature vector.
- Parameters:
state – The state dictionary at the end of processing.
- process_frame(frame, frame_index, frame_timestamp, state=None)
Called for each frame in the video. For all but the first frame, a state is provided recording the state from the previous frame.
- Parameters:
frame – The current frame as an RGB ndarray
frame_index – The current frame index
frame_timestamp – The current frame timestamp
state – The state from the last call to process_frame
- class perception.hashers.video.TMKL2(frame_hasher=None, frames_per_second=15, normalization='matrix')
The TMK L2 video hashing algorithm.
- distance_metric: str = 'custom'
The metric to use when computing distance between two hashes. All hashers must supply this parameter.
- dtype: str = 'float32'
The numpy type to use when converting from string to array form. All hashers must supply this parameter.
- hash_from_final_state(state)
Called after all frames have been processed. Returns the final feature vector.
- Parameters:
state – The state dictionary at the end of processing.
- process_frame(frame, frame_index, frame_timestamp, state=None)
Called for each frame in the video. For all but the first frame, a state is provided recording the state from the previous frame.
- Parameters:
frame – The current frame as an RGB ndarray
frame_index – The current frame index
frame_timestamp – The current frame timestamp
state – The state from the last call to process_frame
Tools
These utility functions are only used by the hashers but are documented here for completeness.
- perception.hashers.tools.b64_to_hex(hash_string, dtype, hash_length, verify_length=True)
Convert a base64-encoded hash to hex.
- Parameters:
hash_string (str) – The input base64 hash string
dtype (str) – The data type of the hash
hash_length (int) – The length of the hash vector
verify_length (bool) – Whether to verify the string length
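At the byte level, converting between the two encodings amounts to decoding one and re-encoding as the other. The sketch below uses hypothetical function names and omits the dtype, hash_length, and verify_length checks that the library functions accept.

```python
import base64


def b64_to_hex_sketch(hash_string: str) -> str:
    # Decode the base64 text to raw bytes, then render those bytes as hex
    return base64.b64decode(hash_string).hex()


def hex_to_b64_sketch(hash_string: str) -> str:
    # Parse the hex text to raw bytes, then render those bytes as base64
    return base64.b64encode(bytes.fromhex(hash_string)).decode("ascii")


original_hex = "b2c0ff01"
round_tripped = b64_to_hex_sketch(hex_to_b64_sketch(original_hex))
```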
- perception.hashers.tools.compute_md5(filepath)
Compute the md5 hash for a file at filepath.
- Parameters:
filepath – The path to the file
- Return type:
str
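A common way to compute an MD5 digest for a file without loading it all into memory is to read it in chunks. This is a sketch with a hypothetical function name and chunk size; returning the hexadecimal digest is an assumption about the format compute_md5 uses.

```python
import hashlib


def md5_of_file(filepath, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at filepath."""
    digest = hashlib.md5()
    with open(filepath, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```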
- perception.hashers.tools.compute_quality(image)
Compute a quality metric, using the calculation proposed by Facebook for their PDQ hash algorithm.
- Return type:
int
- perception.hashers.tools.compute_synchronized_video_hashes(filepath, hashers, framerates=None, hash_format='base64', use_queue=True)
Compute the video hashes for a group of hashers with synchronized frame processing wherever possible.
- Parameters:
filepath (str) – Path to video file.
hashers (dict) – A dictionary mapping hasher names to video hasher objects
hash_format – The format in which to return the hashes
use_queue – Whether to use queued video frames
- perception.hashers.tools.get_common_framerates(id_rates)
Compute an optimal set of framerates for a list of framerates. Optimal here means that reading the video at each of the framerates will allow one to collect all of the frames required with the smallest possible number of frames decoded.
For example, suppose we need to read a video at 3 fps, 5 fps, 1 fps and 0.5 fps. We could read the video 4 times (once per framerate), but a more efficient approach is to read it only twice, once at 3 frames per second and once at 5 frames per second. For the 1 fps hasher, we simply pass every 3rd frame of the 3 fps pass. For the 0.5 fps hasher, we pass every 6th frame of the 3 fps pass. So if you pass this function {A: 3, B: 5, C: 1, D: 0.5}, you will get back {3: [A, C, D], 5: [B]}.
- Parameters:
id_rates (dict) – A dictionary with IDs as keys and frame rates as values.
- Returns:
rate_ids – A dictionary with framerates as keys and a list of IDs as values.
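One way to reproduce the grouping in the example is a greedy pass that attaches each rate to an already-kept rate that is an integer multiple of it. This is a sketch under that assumption, with a hypothetical function name; it is not the library's implementation, it only reuses rates that were actually requested, and it is not claimed to be optimal for every input.

```python
from fractions import Fraction


def group_framerates(id_rates):
    """Group IDs under the read rate whose frames can serve them."""
    groups = {}
    # Visit the fastest rates first so slower rates can attach to them
    for name, rate in sorted(id_rates.items(), key=lambda item: -item[1]):
        target = Fraction(rate).limit_denominator(1000)
        # Prefer the slowest kept rate that is an integer multiple of this one
        for kept in sorted(groups):
            ratio = Fraction(kept).limit_denominator(1000) / target
            if ratio.denominator == 1:
                # Every ratio-th frame of the `kept` pass serves this hasher
                groups[kept].append(name)
                break
        else:
            # No kept rate divides evenly, so this rate needs its own pass
            groups[rate] = [name]
    return groups


result = group_framerates({"A": 3, "B": 5, "C": 1, "D": 0.5})
```

Fractions are used instead of floats so that the divisibility check does not depend on floating-point rounding.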
- perception.hashers.tools.get_string_length(hash_length, dtype, hash_format='hex')
Compute the expected length of a hash string.
- Parameters:
hash_length (int) – The length of the hash vector
dtype (str) – The dtype of the vector
hash_format – One of ‘base64’ or ‘hex’
- Return type:
int
- Returns:
The expected string length
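The expected length follows from the number of bytes in the encoded vector. The sketch below assumes boolean hashes are packed eight bits per byte and that other dtypes use their natural width; both the function name and the ITEM_BITS table are hypothetical and cover only the dtypes that appear in this page.

```python
import math

# Assumed storage width, in bits, of one element for each dtype
ITEM_BITS = {"bool": 1, "uint8": 8, "float32": 32}


def expected_string_length(hash_length, dtype, hash_format="hex"):
    """Estimate the length of the encoded hash string."""
    n_bytes = math.ceil(hash_length * ITEM_BITS[dtype] / 8)
    if hash_format == "hex":
        # Two hexadecimal characters per byte
        return 2 * n_bytes
    if hash_format == "base64":
        # Four characters per group of three bytes, including padding
        return 4 * math.ceil(n_bytes / 3)
    raise ValueError("hash_format must be 'base64' or 'hex'")
```

Under these assumptions a 64-element boolean hash occupies 8 bytes, giving 16 hex characters or 12 base64 characters.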
- perception.hashers.tools.hex_to_b64(hash_string, dtype, hash_length, verify_length=True)
Convert a hex-encoded hash to base64.
- Parameters:
hash_string (str) – The input hex hash string
dtype (str) – The data type of the hash
hash_length (int) – The length of the hash vector
verify_length (bool) – Whether to verify the string length
- perception.hashers.tools.read(filepath_or_buffer, timeout=None)
Read a file into an image object
- Parameters:
filepath_or_buffer (Union[str, ndarray, Image, BytesIO, SpooledTemporaryFile]) – The path to the file or any object with a read method (such as io.BytesIO)
timeout – If filepath_or_buffer is a URL, the timeout to use for making the HTTP request.
- Return type:
ndarray
- perception.hashers.tools.read_video(filepath, frames_per_second=None, max_queue_size=128, use_queue=True, errors='raise', use_ffmpeg=False, **kwargs)
Provides a generator of RGB frames, frame indexes, and timestamps from a video. This function requires you to have installed ffmpeg. All other arguments are passed to read_video_to_generator.
- Parameters:
filepath – Path to the video file
frames_per_second (str|float|None) – How many frames to provide for each second of video. If None, all frames are provided. If frames_per_second is “keyframes”, we use ffmpeg to select I frames from the video.
max_queue_size – The maximum number of frames to load in the queue
use_queue – Whether to use a queue of frames during processing
max_duration – The maximum length of the video to hash.
max_size – The maximum size of frames to queue
errors – Whether to ‘raise’, ‘warn’, or ‘ignore’ errors
use_ffmpeg – Whether to use the FFMPEG CLI to read videos. If True, other kwargs (e.g., use_cuda) are passed to read_video_to_generator_ffmpeg.
- Yields:
(frame, frame_index, timestamp) tuples
- Return type:
Generator[tuple[ndarray,int|None,float|None],None,None]
- perception.hashers.tools.read_video_to_generator(filepath, frames_per_second=None, errors='raise', max_duration=None, max_size=None)
This is used by read_video when use_ffmpeg is False (default).
- Parameters:
filepath – See read_video.
frames_per_second (str|float|None) – See read_video.
errors – See read_video.
max_duration (float|None) – See read_video.
max_size (int|None) – See read_video.
- Return type:
Generator[tuple[ndarray,int|None,float|None],None,None]
- Returns:
See read_video.
- perception.hashers.tools.read_video_to_generator_ffmpeg(filepath, frames_per_second=None, errors='raise', max_duration=None, max_size=None, interp=None, frame_rounding='up', draw_timestamps=False, use_cuda=False)
This is used by read_video when use_ffmpeg is True. It differs from read_video_to_generator in that it uses FFMPEG instead of OpenCV and, optionally, allows for CUDA acceleration. CUDA acceleration can be faster for larger videos (>1080p) where downsampling is desired. For other videos, CUDA may be slower, but the decoding load will still be taken off the CPU, which may still be advantageous. You can specify which FFMPEG binary to use by setting PERCEPTION_FFMPEG_BINARY.
- Parameters:
filepath – See read_video
frames_per_second (str|float|None) – See read_video
errors – See read_video
max_duration (float|None) – See read_video
max_size (int|None) – See read_video
interp (str|None) – The interpolation method to use. When not using CUDA, you must choose one of the interpolation options (default: area). When using CUDA, you must choose from the interp_algo options (default: super).
frame_rounding (str) – The frame rounding method.
draw_timestamps – Draw original timestamps onto the frames (for debugging only)
use_cuda – Whether to enable CUDA acceleration. Requires a CUDA-accelerated version of ffmpeg.
To build FFMPEG with CUDA, do the following in a Docker container based on nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04. The FFMPEG binary will be ffmpeg/ffmpeg.
git clone https://git.videolan.org/git/ffmpeg/nv-codec-headers.git
cd nv-codec-headers
make
sudo make install
cd ..
git clone https://git.ffmpeg.org/ffmpeg.git
cd ffmpeg
sudo apt-get update && sudo apt-get -y install yasm
export PATH=$PATH:/usr/local/cuda/bin
# Note: Scroll far right to see full configure command:
./configure --enable-cuda-nvcc --enable-cuvid --enable-nvenc --enable-nvdec --enable-libnpp --enable-nonfree --extra-cflags=-I/usr/local/cuda/include --extra-ldflags=-L/usr/local/cuda/lib64
make -j 10
sudo make install
- Return type:
Generator[tuple[ndarray,int|None,float|None],None,None]
- Returns:
See read_video
- perception.hashers.tools.string_to_vector(hash_string, dtype, hash_length, hash_format, verify_length=True)
Convert hash back to vector.
- Parameters:
hash_string (str) – The input hash string
dtype (str) – The data type of the hash
hash_length (int) – The length of the hash vector
hash_format (str) – The input format of the hash (base64 or hex)
verify_length (bool) – Whether to verify the string length
- Return type:
ndarray
- perception.hashers.tools.unletterbox(image, only_remove_black=False, min_fraction_meaningful_pixels=0.1, color_threshold=2, min_side_length=50, min_reduction=0.02)
Return bounds of the non-trivial (content) region of an image, or None.
Letterboxing refers to uniform-color borders added around an image (e.g., black bars on a video frame). This function detects such borders by identifying the background color from the image corners and finding the bounding box of pixels that differ from that background.
The function returns bounds as (x1, x2), (y1, y2) suitable for slicing: image[y1:y2, x1:x2]. The bounds are exclusive on the right/bottom (i.e., x2 and y2 point one past the last content pixel).
Algorithm overview:
Sample the four corner pixels and find the most common value as the candidate background color. If all four corners differ, return None (no consistent letterbox detected).
Build a binary content mask where each pixel whose grayscale intensity differs from the background by more than color_threshold is marked as content.
Project the mask onto rows and columns and find the first/last row and column where the fraction of content pixels exceeds min_fraction_meaningful_pixels.
Validate that the resulting crop is meaningfully smaller than the original (controlled by min_reduction) and that both sides exceed min_side_length.
Returns None when:
No two corners share the same color (no clear background).
Every pixel differs from the detected background (no border).
No row or column meets the content-pixel threshold.
The crop would not remove at least min_reduction fraction from any dimension.
Either cropped dimension would be smaller than min_side_length.
- Parameters:
image (ndarray) – Input image as an np.ndarray. May be grayscale (H×W) or RGB (H×W×3); RGB images are converted to grayscale internally for background detection.
only_remove_black (bool) – If True, treat black (intensity 0) as the background regardless of corner colors. If False (default), infer the background color from the most common corner value.
min_fraction_meaningful_pixels (float) – The minimum fraction (0–1) of pixels in a row or column that must differ from the background for that row/column to be considered part of the content region. Defaults to 0.1 (10%).
color_threshold (float) – The minimum absolute difference in grayscale intensity between a pixel and the background color for that pixel to be classified as content. Defaults to 2.
min_side_length (int) – The minimum width or height (in pixels) of the cropped region. If the crop would be smaller, None is returned. Defaults to 50.
min_reduction (float) – The minimum fraction (0–1) of the original width or height that must be removed for the crop to be worthwhile. If the crop removes less than this from both dimensions, None is returned. Defaults to 0.02 (2%).
- Return type:
tuple[tuple[int,int],tuple[int,int]]|None
- Returns:
A tuple ((x1, x2), (y1, y2)) giving the left, right, top, and bottom bounds of the content region (right/bottom exclusive), or None if no meaningful letterbox was detected.
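The four steps of the algorithm overview can be sketched in plain Python for a grayscale image stored as nested lists. This is a simplified illustration with a hypothetical function name, not the library's code: it omits only_remove_black and RGB conversion, and the exact comparison operators are assumptions.

```python
from collections import Counter


def unletterbox_sketch(image, min_fraction_meaningful_pixels=0.1,
                       color_threshold=2, min_side_length=50,
                       min_reduction=0.02):
    """Return ((x1, x2), (y1, y2)) for the content region, or None."""
    height, width = len(image), len(image[0])
    # Step 1: the most common corner value is the candidate background
    corners = [image[0][0], image[0][-1], image[-1][0], image[-1][-1]]
    background, votes = Counter(corners).most_common(1)[0]
    if votes < 2:
        return None  # no two corners agree on a background color
    # Step 2: mark pixels that differ enough from the background as content
    mask = [[abs(p - background) > color_threshold for p in row]
            for row in image]
    # Step 3: keep rows and columns whose content fraction is high enough
    rows = [y for y in range(height)
            if sum(mask[y]) / width > min_fraction_meaningful_pixels]
    cols = [x for x in range(width)
            if sum(mask[y][x] for y in range(height)) / height
            > min_fraction_meaningful_pixels]
    if not rows or not cols:
        return None
    x1, x2 = cols[0], cols[-1] + 1  # right bound is exclusive
    y1, y2 = rows[0], rows[-1] + 1  # bottom bound is exclusive
    # Step 4: require a worthwhile reduction and a large enough crop
    removed_w = (width - (x2 - x1)) / width
    removed_h = (height - (y2 - y1)) / height
    if max(removed_w, removed_h) < min_reduction:
        return None
    if min(x2 - x1, y2 - y1) < min_side_length:
        return None
    return (x1, x2), (y1, y2)


# A 100x100 frame with black bars on the top and bottom ten rows
letterboxed = [[200 if 10 <= y < 90 else 0 for _ in range(100)]
               for y in range(100)]
bounds = unletterbox_sketch(letterboxed)
```

Because the right and bottom bounds are exclusive, the result can be used directly as slice limits, as in image[y1:y2, x1:x2] for an ndarray.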
- perception.hashers.tools.unletterbox_crop(image, min_fraction_meaningful_pixels=0.1, color_threshold=2, min_side_length=50, min_reduction=0.02)
Detect and crop the letterboxed regions from an image.
- Parameters:
image (ndarray) – The image from which to remove letterboxing.
min_fraction_meaningful_pixels (float) – 0 to 1: if cropped version is smaller than this fraction of the image do not unletterbox. 0.1 == 10% of the image.
color_threshold (float) – The minimum absolute difference in grayscale intensity between a pixel and the background color for that pixel to be classified as content. Defaults to 2.
min_side_length (int) – The minimum width or height (in pixels) of the cropped region. If the crop would be smaller, None is returned. Defaults to 50.
min_reduction (float) – The minimum fraction (0–1) of the original width or height that must be removed for the crop to be worthwhile. If the crop removes less than this from both dimensions, the original image is returned. Defaults to 0.02 (2%).
- Return type:
ndarray|None
- Returns:
The cropped image or None if the image is mostly blank space.
- perception.hashers.tools.vector_to_string(vector, dtype, hash_format)
Convert vector to hash.
- Parameters:
vector (ndarray) – Input vector
- Return type:
str|None