Hashers

All hashers inherit from the Hasher class.

class perception.hashers.hasher.Hasher

All hashers implement a common set of methods from the Hasher base class.

allow_parallel = True

Indicates whether the hashes can be computed in parallel

compute_distance(hash1, hash2, hash_format='base64')

Compute the distance between two hashes.

Parameters:
  • hash1 (Union[ndarray, str]) – The first hash or vector
  • hash2 (Union[ndarray, str]) – The second hash or vector
  • hash_format – If either or both of the hashes are hash strings, what format the string is encoded in.
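As a rough illustration of what a distance between two encoded hashes can look like, the following standalone sketch decodes two base64 strings and counts differing bits. It does not call the library: the function name hamming_distance_b64 is hypothetical, and normalizing by the total bit count is an assumption. The metric a real hasher uses is determined by its distance_metric attribute.

```python
import base64


def hamming_distance_b64(hash1: str, hash2: str) -> float:
    """Fraction of differing bits between two base64-encoded bit hashes."""
    bytes1 = base64.b64decode(hash1)
    bytes2 = base64.b64decode(hash2)
    if len(bytes1) != len(bytes2):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 wherever the two hashes disagree.
    differing = sum(bin(a ^ b).count("1") for a, b in zip(bytes1, bytes2))
    return differing / (8 * len(bytes1))


a = base64.b64encode(bytes([0b11110000, 0b00000000])).decode("ascii")
b = base64.b64encode(bytes([0b11111111, 0b00000000])).decode("ascii")
print(hamming_distance_b64(a, b))  # 4 of the 16 bits differ
```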
compute_parallel(filepaths, progress=None, progress_desc=None, max_workers=5, isometric=False)

Compute hashes in a parallelized fashion.

Parameters:
  • filepaths – A list of paths to images or videos (depending on the hasher).
  • progress – A tqdm-like wrapper for reporting progress. If None, progress is not reported.
  • progress_desc – The title of the progress bar.
  • max_workers – The maximum number of workers
  • isometric – Whether to compute all eight isometric transforms for each image.
distance_metric = None

The metric to use when computing distance between two hashes. All hashers must supply this parameter.

dtype = None

The numpy type to use when converting from string to array form. All hashers must supply this parameter.

hash_length = None

Indicates the length of the hash vector

returns_multiple = False

Whether or not this hash returns multiple values

string_to_vector(hash_string, hash_format='base64')

Convert hash string to vector.

Parameters:
  • hash_string (str) – The input hash string
  • hash_format (str) – One of ‘base64’ or ‘hex’
vector_to_string(vector, hash_format='base64')

Convert vector to hash string.

Parameters:
  • vector (ndarray) – Input vector
  • hash_format (str) – One of ‘base64’ or ‘hex’
Return type:

Optional[str]
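The round trip between a boolean hash vector and its string form can be sketched with the standard library alone. This is an illustration rather than the library's code: the helper names are hypothetical, and the bit order (most significant bit first, padded with zeros to a whole byte) is an assumption.

```python
import base64


def pack_bits(bits):
    """Pack booleans into bytes, most significant bit first (assumed order)."""
    padded = list(bits) + [False] * (-len(bits) % 8)
    out = bytearray()
    for start in range(0, len(padded), 8):
        byte = 0
        for bit in padded[start:start + 8]:
            byte = (byte << 1) | int(bool(bit))
        out.append(byte)
    return bytes(out)


def bits_to_string(bits, hash_format="base64"):
    raw = pack_bits(bits)
    if hash_format == "hex":
        return raw.hex()
    if hash_format == "base64":
        return base64.b64encode(raw).decode("ascii")
    raise ValueError("hash_format must be 'base64' or 'hex'")


def string_to_bits(hash_string, hash_length, hash_format="base64"):
    if hash_format == "hex":
        raw = bytes.fromhex(hash_string)
    else:
        raw = base64.b64decode(hash_string)
    bits = [bool((byte >> (7 - k)) & 1) for byte in raw for k in range(8)]
    return bits[:hash_length]  # drop the padding bits


vector = [True, False, True, True, False, False, True, False, True, True]
encoded = bits_to_string(vector, hash_format="hex")
decoded = string_to_bits(encoded, hash_length=len(vector), hash_format="hex")
```

The hash length has to be supplied when decoding because the padding bits are otherwise indistinguishable from hash bits.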

Images

All image hashers inherit from the ImageHasher class.

class perception.hashers.hasher.ImageHasher
compute(image, hash_format='base64')

Compute a hash from an image.

Parameters:
  • image (Union[str, ndarray, Image, BytesIO]) – An image represented as a filepath, a PIL image object, or as an np.ndarray object. If it is an np.ndarray object, it must be in RGB color order (note the OpenCV default is BGR).
  • hash_format – One of ‘base64’, ‘hex’, or ‘vector’
Return type:

Union[ndarray, str, None, List[Optional[str]]]

compute_isometric_from_hash(hash_string_or_vector, hash_format='base64')

For supported hashes, obtain the hashes for the dihedral transformations of the original image. They are provided in the following order:

  • Vertical flip
  • Horizontal flip
  • 180 degree rotation
  • 90 degree rotation
  • 90 degree rotation and vertical flip
  • 90 degree rotation and horizontal flip
  • 270 degree rotation
Parameters:
  • hash_string_or_vector – The hash string or vector
  • hash_format – One of ‘base64’ or ‘hex’
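To make the seven transformations concrete, here is a small sketch that applies them to a pixel grid stored as nested lists. It only illustrates the image-space transforms; the library method works from the hash itself. The helper names are hypothetical, and two conventions are assumptions: "vertical flip" swaps top and bottom, and rotations are clockwise.

```python
def vflip(grid):
    """Swap top and bottom rows."""
    return [list(row) for row in grid[::-1]]


def hflip(grid):
    """Swap left and right columns."""
    return [list(row[::-1]) for row in grid]


def rot90(grid):
    """Rotate 90 degrees clockwise (assumed direction)."""
    return [list(row) for row in zip(*grid[::-1])]


def dihedral_transforms(grid):
    """Return the seven non-identity transforms in the documented order."""
    r90 = rot90(grid)
    return [
        vflip(grid),          # vertical flip
        hflip(grid),          # horizontal flip
        hflip(vflip(grid)),   # 180 degree rotation
        r90,                  # 90 degree rotation
        vflip(r90),           # 90 degree rotation and vertical flip
        hflip(r90),           # 90 degree rotation and horizontal flip
        rot90(rot90(r90)),    # 270 degree rotation
    ]


grid = [[1, 2], [3, 4]]
transforms = dihedral_transforms(grid)
```

Together with the untransformed image, these make up the eight isometric variants.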
compute_with_quality(image, hash_format='base64')

Compute hash and hash quality from image.

Parameters:
  • image (Union[str, ndarray, Image, BytesIO]) – An image represented as a filepath, a PIL image object, or as an np.ndarray object. If it is an np.ndarray object, it must be in RGB color order (note the OpenCV default is BGR).
  • hash_format – One of ‘base64’, ‘hex’, or ‘vector’
Return type:

Tuple[Union[ndarray, str, None, List[Optional[str]]], int]

Returns:

A tuple of (hash, quality)

The following image hash functions are included in the package.

class perception.hashers.image.AverageHash(hash_size=8)

Computes a simple hash comparing the intensity of each pixel in a resized version of the image to the mean. Implementation based on that of ImageHash.
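The idea behind an average hash can be sketched in a few lines, assuming the image has already been converted to grayscale and resized to the hash grid. The function name is hypothetical, and the strict "greater than" comparison is an assumption rather than a confirmed detail of the implementation.

```python
def average_hash_bits(gray):
    """One bit per pixel: is the pixel brighter than the mean intensity?"""
    pixels = [value for row in gray for value in row]
    mean = sum(pixels) / len(pixels)
    return [value > mean for value in pixels]


# A tiny 2x2 "image" standing in for the resized hash_size x hash_size grid.
bits = average_hash_bits([[0, 255], [255, 0]])
```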

class perception.hashers.image.PHash(hash_size=8, highfreq_factor=4, exclude_first_term=False, freq_shift=0)

Also known as the DCT hash, a hash based on discrete cosine transforms of images. See complete paper for details. Implementation based on that of ImageHash.

Parameters:
  • hash_size – The number of DCT elements to retain (the hash length will be hash_size * hash_size).
  • highfreq_factor – The multiple of the hash size to resize the input image to before computing the DCT.
  • exclude_first_term – Whether to exclude the first term of the DCT
  • freq_shift – The number of DCT low frequency elements to skip.
class perception.hashers.image.WaveletHash(hash_size=8, image_scale=None, mode='haar')

Similar to PHash but using wavelets instead of DCT. Implementation based on that of ImageHash.

class perception.hashers.image.MarrHildreth

A wrapper around OpenCV’s Marr-Hildreth hash. See paper for details.

class perception.hashers.image.BlockMean

A wrapper around OpenCV’s Block Mean hash. See paper for details.

class perception.hashers.image.ColorMoment

A wrapper around OpenCV’s Color Moments hash. See paper for details.

class perception.hashers.image.DHash(hash_size=8)

A hash based on the differences between adjacent pixels. Implementation based on that of ImageHash.
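A difference hash can be sketched as follows, assuming a grayscale grid that has already been resized. The function name is hypothetical; comparing horizontally adjacent pixels and setting the bit when the right pixel is brighter are both assumptions made for illustration.

```python
def difference_hash_bits(gray):
    """One bit per horizontally adjacent pair: is the right pixel brighter?"""
    return [
        row[i + 1] > row[i]
        for row in gray
        for i in range(len(row) - 1)
    ]


bits = difference_hash_bits([[1, 2, 3], [3, 2, 1]])
```

Each row of width w contributes w - 1 bits, which is why difference hashes are usually computed on a grid one column wider than the hash size.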

class perception.hashers.image.PHashF(hash_size=8, highfreq_factor=4, exclude_first_term=False, freq_shift=0)

A real-valued version of PHash. It returns the raw 32-bit floats in the DCT. For a more compact approach, see PHashU8.

class perception.hashers.image.PHashU8(hash_size=8, highfreq_factor=4, exclude_first_term=False, freq_shift=0)

A real-valued version of PHash. It uses minimum / maximum scaling to convert DCT values to unsigned 8-bit integers (more compact than the 32-bit floats used by PHashF at the cost of precision).
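Minimum / maximum scaling to unsigned 8-bit integers can be sketched like this. The function name is hypothetical, and both the rounding rule and the handling of a constant input are assumptions; the library may quantize differently.

```python
def scale_to_uint8(values):
    """Map the smallest value to 0 and the largest to 255."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant input carries no information; map it to zeros.
        return [0] * len(values)
    return [round(255 * (v - lo) / (hi - lo)) for v in values]


scaled = scale_to_uint8([0.0, 1.0, 5.0])
```

Each value then fits in one byte instead of four, at the cost of keeping only 256 distinct levels.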

Videos

All video hashers inherit from the VideoHasher class.

class perception.hashers.hasher.VideoHasher
compute(filepath, errors='raise', hash_format='base64', **kwargs)

Compute a hash for a video at a given filepath. All other arguments are passed to perception.hashers.tools.read_video.

Parameters:
  • filepath – Path to video file
  • errors – One of “raise”, “ignore”, or “warn”. Passed to perception.hashers.tools.read_video.
  • hash_format – One of “vector”, “base64”, or “hex”
  • max_duration – The maximum length of the video to hash.
  • max_size – The maximum size of frames to queue
frames_per_second = 1

The frame rate at which videos are read

hash_from_final_state(state)

Called after all frames have been processed. Returns the final feature vector.

Parameters:state (dict) – The state dictionary at the end of processing.
Return type:ndarray
process_frame(frame, frame_index, frame_timestamp, state=None)

Called for each frame in the video. For all but the first frame, a state is provided recording the state from the previous frame.

Parameters:
  • frame (ndarray) – The current frame as an RGB ndarray
  • frame_index (Optional[int]) – The current frame index
  • frame_timestamp (Optional[float]) – The current frame timestamp
  • state (Optional[dict]) – The state from the last call to process_frame
Return type:

dict
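The interplay between process_frame and hash_from_final_state can be illustrated with a toy class. This is a standalone stand-in rather than a real VideoHasher subclass: the class name, the run helper, and the "mean brightness" feature are all hypothetical, frames are nested lists instead of ndarrays, and the result is a plain list.

```python
class MeanBrightnessHasher:
    """Toy hasher that follows the same state-passing protocol."""

    def process_frame(self, frame, frame_index, frame_timestamp, state=None):
        if state is None:
            # First frame: no previous state exists yet.
            state = {"total": 0.0, "count": 0}
        pixels = [value for row in frame for value in row]
        state["total"] += sum(pixels) / len(pixels)
        state["count"] += 1
        return state

    def hash_from_final_state(self, state):
        return [state["total"] / state["count"]]


def run(hasher, frames):
    """Feed frames one by one, threading the state through each call."""
    state = None
    for index, frame in enumerate(frames):
        state = hasher.process_frame(frame, index, float(index), state)
    return hasher.hash_from_final_state(state)


result = run(MeanBrightnessHasher(), [[[0, 0]], [[10, 10]]])
```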

The following video hash functions are included in the package.

class perception.hashers.video.FramewiseHasher(frame_hasher, interframe_threshold, frames_per_second=15, quality_threshold=None)

A hasher that simply returns frame-wise hashes at some regular interval with some minimum inter-frame distance threshold.
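The effect of a minimum inter-frame distance can be sketched as below. This is an assumption-laden illustration, not the library's logic: frame hashes are represented as integers, distance is a raw bit count, each frame is compared against the last kept frame, and the function name is hypothetical.

```python
def select_frames(frame_hashes, interframe_threshold):
    """Keep a frame hash only if it differs enough from the last kept one."""
    kept = []
    for current in frame_hashes:
        if not kept:
            kept.append(current)
            continue
        distance = bin(current ^ kept[-1]).count("1")
        if distance > interframe_threshold:
            kept.append(current)
    return kept


kept = select_frames([0b0000, 0b0001, 0b1111, 0b1110], interframe_threshold=1)
```

Near-duplicate consecutive frames collapse into one hash, so a static scene does not inflate the output.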

compute_batches(filepath, batch_size, errors='raise', hash_format='base64')

Compute hashes for a video in batches.

Parameters:
  • filepath (str) – Path to video file
  • batch_size (int) – The batch size to use for returning hashes
  • errors – One of “raise”, “ignore”, or “warn”. Passed to perception.hashers.tools.read_video.
  • hash_format – The format in which to return hashes
hash_from_final_state(state)

Called after all frames have been processed. Returns the final feature vector.

Parameters:state – The state dictionary at the end of processing.
process_frame(frame, frame_index, frame_timestamp, state=None)

Called for each frame in the video. For all but the first frame, a state is provided recording the state from the previous frame.

Parameters:
  • frame – The current frame as an RGB ndarray
  • frame_index – The current frame index
  • frame_timestamp – The current frame timestamp
  • state – The state from the last call to process_frame
class perception.hashers.video.TMKL1(frame_hasher=None, frames_per_second=15, dtype='float32', distance_metric='cosine', norm=2, quality_threshold=None)

The TMK L1 video hashing algorithm.

hash_from_final_state(state)

Called after all frames have been processed. Returns the final feature vector.

Parameters:state – The state dictionary at the end of processing.
process_frame(frame, frame_index, frame_timestamp, state=None)

Called for each frame in the video. For all but the first frame, a state is provided recording the state from the previous frame.

Parameters:
  • frame – The current frame as an RGB ndarray
  • frame_index – The current frame index
  • frame_timestamp – The current frame timestamp
  • state – The state from the last call to process_frame
class perception.hashers.video.TMKL2(frame_hasher=None, frames_per_second=15, normalization='matrix')

The TMK L2 video hashing algorithm.

hash_from_final_state(state)

Called after all frames have been processed. Returns the final feature vector.

Parameters:state – The state dictionary at the end of processing.
process_frame(frame, frame_index, frame_timestamp, state=None)

Called for each frame in the video. For all but the first frame, a state is provided recording the state from the previous frame.

Parameters:
  • frame – The current frame as an RGB ndarray
  • frame_index – The current frame index
  • frame_timestamp – The current frame timestamp
  • state – The state from the last call to process_frame
class perception.hashers.video.SimpleSceneDetection(base_hasher=None, interscene_threshold=None, min_frame_size=50, similarity_threshold=0.95, max_scene_length=None)

The SimpleSceneDetection hasher is a wrapper around other video hashers to create separate hashes for different scenes / shots in a video. It works by shrinking each frame, blurring it, and doing a simple delta with the previous frame. If they are different, this marks the start of a new scene. In addition, this wrapper will also remove letterboxing from videos by checking for solid black areas on the edges of the frame.

Parameters:
  • base_hasher (Optional[VideoHasher]) – The base video hasher to use for each scene.
  • interscene_threshold – The distance threshold between sequential scenes that new hashes must meet to be included (this is essentially for deduplication)
  • min_frame_size – The minimum frame size to use for computing hashes. This is relevant for letterbox detection as black frames will tend to be completely “cropped” and make the frame very small.
  • max_scene_length – The maximum length of a single scene.
  • similarity_threshold – The threshold for detecting whether two frames are different enough to constitute a new scene.
compute_batches(filepath, errors='raise', hash_format='base64', batch_size=10)

Compute a hash for a video at a given filepath and yield hashes in a given batch size.

Parameters:
  • filepath – Path to video file
  • errors – One of “raise”, “ignore”, or “warn”. Passed to perception.hashers.tools.read_video.
  • hash_format – The hash format to use when returning hashes.
  • batch_size – The minimum number of hashes to include in each batch.
hash_from_final_state(state)

Called after all frames have been processed. Returns the final feature vector.

Parameters:state – The state dictionary at the end of processing.
process_frame(frame, frame_index, frame_timestamp, state=None, batch_mode=False)

Called for each frame in the video. For all but the first frame, a state is provided recording the state from the previous frame.

Parameters:
  • frame – The current frame as an RGB ndarray
  • frame_index – The current frame index
  • frame_timestamp – The current frame timestamp
  • state – The state from the last call to process_frame

Tools

These utility functions are only used by the hashers but are documented here for completeness.

perception.hashers.tools.b64_to_hex(hash_string, dtype, hash_length, verify_length=True)

Convert a base64-encoded hash to hex.

Parameters:
  • hash_string (str) – The input base64 hash string
  • dtype (str) – The data type of the hash
  • hash_length (int) – The length of the hash vector
  • verify_length (bool) – Whether to verify the string length
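The conversion itself amounts to decoding to raw bytes and re-encoding, as this standard-library sketch shows. The helper names are hypothetical and the sketch omits the dtype, hash_length, and length verification handled by the library functions.

```python
import base64


def b64_to_hex_sketch(hash_string: str) -> str:
    """Decode base64 to raw bytes, then render the bytes as hex."""
    return base64.b64decode(hash_string).hex()


def hex_to_b64_sketch(hash_string: str) -> str:
    """Decode hex to raw bytes, then render the bytes as base64."""
    return base64.b64encode(bytes.fromhex(hash_string)).decode("ascii")


hex_form = b64_to_hex_sketch("AP8=")
b64_form = hex_to_b64_sketch(hex_form)
```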
perception.hashers.tools.compute_md5(filepath)

Compute the md5 hash for a file at filepath.

Parameters:filepath – The path to the file
Return type:str
perception.hashers.tools.compute_quality(image)

Compute a quality metric, using the calculation proposed by Facebook for their PDQ hash algorithm.

Return type:int
perception.hashers.tools.compute_synchronized_video_hashes(filepath, hashers, framerates=None, hash_format='base64', use_queue=True)

Compute the video hashes for a group of hashers with synchronized frame processing wherever possible.

Parameters:
  • filepath (str) – Path to video file.
  • hashers (dict) – A dictionary mapping hasher names to video hasher objects
  • hash_format – The format in which to return the hashes
  • use_queue – Whether to use queued video frames
perception.hashers.tools.get_common_framerates(id_rates)

Compute an optimal set of framerates for a list of framerates. Optimal here means that reading the video at each of the framerates will allow one to collect all of the frames required with the smallest possible number of frames decoded.

For example, suppose we need to read a video at 3 fps, 5 fps, 1 fps and 0.5 fps. We could read the video 4 times (once per framerate). But a more efficient approach is to read the video only twice, once at 3 frames per second and once at 5 frames per second. For the 1 fps hasher, we simply pass every 3rd frame of the 3 fps pass. For the 0.5 fps hasher, we pass every 6th frame of the 3 fps pass. So if you pass this function {A: 3, B: 5, C: 1, D: 0.5}, you will get back {3: [A, C, D], 5: [B]}.

Parameters:id_rates (dict) – A dictionary with IDs as keys and frame rates as values.
Returns:rate_ids, a dictionary with framerates as keys and a list of IDs as values.
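The grouping in the example can be reproduced with a simple greedy pass. This is a simplified sketch, not the library's algorithm: the function name is hypothetical, and attaching each rate to the slowest existing pass that is an integer multiple of it is an assumption that happens to match the example.

```python
from fractions import Fraction


def group_framerates(id_rates):
    """Map each read rate to the IDs that can be served from that pass."""
    groups = {}
    # Visit the fastest rates first so slower ones can attach to them.
    for id_, rate in sorted(id_rates.items(), key=lambda item: -item[1]):
        exact = Fraction(str(rate))  # avoid float error in the divisibility test
        candidates = [
            read_rate
            for read_rate in groups
            if (Fraction(str(read_rate)) / exact).denominator == 1
        ]
        if candidates:
            # Reuse the slowest suitable pass to skip as few frames as possible.
            groups[min(candidates)].append(id_)
        else:
            groups[rate] = [id_]
    return groups


rate_ids = group_framerates({"A": 3, "B": 5, "C": 1, "D": 0.5})
```

With these inputs only two passes are needed (3 fps and 5 fps), and the 1 fps and 0.5 fps hashers take every 3rd and every 6th frame of the 3 fps pass.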
perception.hashers.tools.get_string_length(hash_length, dtype, hash_format='hex')

Compute the expected length of a hash string.

Parameters:
  • hash_length (int) – The length of the hash vector
  • dtype (str) – The dtype of the vector
  • hash_format – One of ‘base64’ or ‘hex’
Return type:

int

Returns:

The expected string length
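The expected length follows from the number of bytes in the hash. The sketch below is illustrative only: the function name and the ITEMSIZE table are hypothetical, and it assumes that boolean hashes are packed eight bits per byte and that base64 output is padded to a multiple of four characters.

```python
import math

# Bytes per element for a few common dtypes (illustrative subset).
ITEMSIZE = {"uint8": 1, "float32": 4, "float64": 8}


def expected_string_length(hash_length, dtype, hash_format="hex"):
    if dtype == "bool":
        n_bytes = math.ceil(hash_length / 8)  # bits packed into bytes
    else:
        n_bytes = ITEMSIZE[dtype] * hash_length
    if hash_format == "hex":
        return 2 * n_bytes  # two hex digits per byte
    if hash_format == "base64":
        return 4 * math.ceil(n_bytes / 3)  # four characters per three bytes
    raise ValueError("hash_format must be 'base64' or 'hex'")


hex_length = expected_string_length(64, "bool", "hex")
b64_length = expected_string_length(64, "bool", "base64")
```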

perception.hashers.tools.hex_to_b64(hash_string, dtype, hash_length, verify_length=True)

Convert a hex-encoded hash to base64.

Parameters:
  • hash_string (str) – The input hex hash string
  • dtype (str) – The data type of the hash
  • hash_length (int) – The length of the hash vector
  • verify_length (bool) – Whether to verify the string length
perception.hashers.tools.read(filepath_or_buffer, timeout=None)

Read a file into an image object

Parameters:
  • filepath_or_buffer (Union[str, ndarray, Image, BytesIO]) – The path to the file or any object with a read method (such as io.BytesIO)
  • timeout – If filepath_or_buffer is a URL, the timeout to use for making the HTTP request.
perception.hashers.tools.read_video(filepath, frames_per_second=None, max_queue_size=128, use_queue=True, errors='raise', use_ffmpeg=False, **kwargs)

Provides a generator of RGB frames, frame indexes, and timestamps from a video. This function requires you to have installed ffmpeg. All other arguments are passed to read_video_to_generator.

Parameters:
  • filepath – Path to the video file
  • frames_per_second (Union[str, float, None]) – How many frames to provide for each second of video. If None, all frames are provided. If frames_per_second is “keyframes”, we use ffmpeg to select I frames from the video.
  • max_queue_size – The maximum number of frames to load in the queue
  • use_queue – Whether to use a queue of frames during processing
  • max_duration – The maximum length of the video to hash.
  • max_size – The maximum size of frames to queue
  • errors – Whether to ‘raise’, ‘warn’, or ‘ignore’ errors
  • use_ffmpeg – Whether to use the FFMPEG CLI to read videos. If True, other kwargs (e.g., use_cuda) are passed to read_video_to_generator_ffmpeg.
Yields:

(frame, frame_index, timestamp) tuples

Return type:

Generator[Tuple[ndarray, Optional[int], Optional[float]], None, None]

perception.hashers.tools.read_video_to_generator(filepath, frames_per_second=None, errors='raise', max_duration=None, max_size=None)

This is used by read_video when use_ffmpeg is False (default).

Parameters:
  • filepath – See read_video.
  • frames_per_second (Union[str, float, None]) – See read_video.
  • errors – See read_video.
  • max_duration (Optional[float]) – See read_video.
  • max_size (Optional[int]) – See read_video.
Return type:

Generator[Tuple[ndarray, Optional[int], Optional[float]], None, None]

Returns:

See read_video.

perception.hashers.tools.read_video_to_generator_ffmpeg(filepath, frames_per_second=None, errors='raise', max_duration=None, max_size=None, interp=None, frame_rounding='up', draw_timestamps=False, use_cuda=False)

This is used by read_video when use_ffmpeg is True. It differs from read_video_to_generator in that it uses FFMPEG instead of OpenCV and, optionally, allows for CUDA acceleration. CUDA acceleration can be faster for larger videos (>1080p) where downsampling is desired. For other videos, CUDA may be slower, but the decoding load will still be taken off the CPU, which may still be advantageous. You can specify which FFMPEG binary to use by setting PERCEPTION_FFMPEG_BINARY.

Parameters:
  • filepath – See read_video
  • frames_per_second (Union[str, float, None]) – See read_video
  • errors – See read_video
  • max_duration (Optional[float]) – See read_video
  • max_size (Optional[int]) – See read_video
  • interp (Optional[str]) – The interpolation method to use. When not using CUDA, you must choose one of the interpolation options (default: area). When using CUDA, you must choose from the interp_algo options (default: super).
  • frame_rounding (str) – The frame rounding method.
  • draw_timestamps – Draw original timestamps onto the frames (for debugging only)
  • use_cuda – Whether to enable CUDA acceleration. Requires a CUDA-accelerated version of ffmpeg.

To build FFMPEG with CUDA, do the following in a Docker container based on nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04. The FFMPEG binary will be ffmpeg/ffmpeg.

git clone https://git.videolan.org/git/ffmpeg/nv-codec-headers.git
cd nv-codec-headers
make
sudo make install
cd ..
git clone --branch release/4.3 https://git.ffmpeg.org/ffmpeg.git
cd ffmpeg
sudo apt-get update && sudo apt-get -y install yasm
export PATH=$PATH:/usr/local/cuda/bin
./configure --enable-cuda-nvcc --enable-cuvid --enable-nvenc --enable-nvdec --enable-libnpp --enable-nonfree --extra-cflags=-I/usr/local/cuda/include --extra-ldflags=-L/usr/local/cuda/lib64
make -j 10
Return type:Generator[Tuple[ndarray, Optional[int], Optional[float]], None, None]
Returns:See read_video
perception.hashers.tools.string_to_vector(hash_string, dtype, hash_length, hash_format, verify_length=True)

Convert hash back to vector.

Parameters:
  • hash_string (str) – The input hash string
  • dtype (str) – The data type of the hash
  • hash_length (int) – The length of the hash vector
  • hash_format (str) – The input format of the hash (base64 or hex)
  • verify_length (bool) – Whether to verify the string length
Return type:

ndarray

perception.hashers.tools.unletterbox(image)

Return bounds of non-trivial region of image or None.

Unletterboxing is cropping an image such that trivial edge regions are removed. Trivial in this context means that the majority of the values in that row or column are zero or very close to zero. This is why we don’t use the terms “non-blank” or “non-empty.”

In order to do unletterboxing, this function returns bounds in the form (x1, x2), (y1, y2) where:

  • x1 is the index of the first column where over 10% of the pixels have means (average of R, G, B) > 2.
  • x2 is the index of the last column where over 10% of the pixels have means > 2.
  • y1 is the index of the first row where over 10% of the pixels have means > 2.
  • y2 is the index of the last row where over 10% of the pixels have means > 2.

If there are zero columns or zero rows where over 10% of the pixels have means > 2, this function returns None.

Note that when there is only a single column and/or row of non-trivial pixels, it is possible for x1 = x2 and/or y1 = y2.

Consider these examples to understand edge cases. Given two images, L (entire left and bottom edges are 1, all other pixels 0) and U (left, bottom and right edges 1, all other pixels 0), unletterbox(L) would return the bounds of the single bottom-left pixel and unletterbox(U) would return the bounds of the entire bottom row.

Consider U1 which is the same as U but with the bottom two rows all 1s. unletterbox(U1) returns the bounds of the bottom two rows.

Parameters:image – The image from which to remove letterboxing.
Return type:Optional[Tuple[Tuple[int, int], Tuple[int, int]]]
Returns:A pair of coordinate bounds of the form (x1, x2) and (y1, y2) representing the left, right, top, and bottom bounds.
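The bounds rule described above can be sketched on a single-channel image stored as nested lists. This is an illustration under stated assumptions, not the library's implementation: the function name and its threshold and min_fraction parameters are hypothetical, and the input is taken to hold the per-pixel means directly rather than separate R, G, B values.

```python
def content_bounds(gray, threshold=2, min_fraction=0.1):
    """Return ((x1, x2), (y1, y2)) of the non-trivial region, or None."""
    height, width = len(gray), len(gray[0])
    # A column or row counts when over min_fraction of its pixels exceed threshold.
    columns = [
        x for x in range(width)
        if sum(gray[y][x] > threshold for y in range(height)) > min_fraction * height
    ]
    rows = [
        y for y in range(height)
        if sum(value > threshold for value in gray[y]) > min_fraction * width
    ]
    if not columns or not rows:
        return None
    return (columns[0], columns[-1]), (rows[0], rows[-1])


size = 20
# "L" image: bright left column and bottom row, everything else black.
l_image = [
    [255 if x == 0 or y == size - 1 else 0 for x in range(size)]
    for y in range(size)
]
# "U" image: the same, plus a bright right column.
u_image = [
    [255 if x in (0, size - 1) or y == size - 1 else 0 for x in range(size)]
    for y in range(size)
]
blank = [[0] * size for _ in range(size)]
```

For the L image only the left column and the bottom row pass the 10% test, so the bounds collapse to the single bottom-left pixel; for the U image they span the whole bottom row.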
perception.hashers.tools.vector_to_string(vector, dtype, hash_format)

Convert vector to hash.

Parameters:vector (ndarray) – Input vector
Return type:Optional[str]