Tools

class perception.tools.SaferMatcher(api_key=None, username=None, password=None, url=None, hasher=None, hasher_api_id=None, quality_threshold=90)

An object for matching hashes with the known CSAM hashes in the Safer matching service. Please contact info@getsafer.io for details on obtaining credentials and information on how match responses are provided.

Here’s a minimalist example:

from perception import hashers, tools

hasher = hashers.PHash(hash_size=16)
matcher = tools.SaferMatcher(
    api_key='YOUR_API_KEY',
    username='YOUR_USERNAME', # You only need to provide
    password='YOUR_PASSWORD', # an API key OR username/password.
    url='MATCHING_SERVICE_URL'
)

For authentication, you must provide either an API key or a username and password pair. If neither is provided, the constructor will attempt to read them from the environment variables SAFER_MATCHING_SERVICE_API_KEY, SAFER_MATCHING_SERVICE_USERNAME, and SAFER_MATCHING_SERVICE_PASSWORD, respectively. You must also provide the URL endpoint for the matching service, either as the url keyword argument or as the SAFER_MATCHING_SERVICE_URL environment variable.
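
The lookup order described above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: resolve_safer_settings is a hypothetical helper, and raising ValueError on missing settings is an assumption. Only the environment variable names come from the text above.

```python
import os
from typing import Dict, Mapping, Optional

# Environment variable names as given in the text above.
ENV_API_KEY = "SAFER_MATCHING_SERVICE_API_KEY"
ENV_USERNAME = "SAFER_MATCHING_SERVICE_USERNAME"
ENV_PASSWORD = "SAFER_MATCHING_SERVICE_PASSWORD"
ENV_URL = "SAFER_MATCHING_SERVICE_URL"


def resolve_safer_settings(
    api_key: Optional[str] = None,
    username: Optional[str] = None,
    password: Optional[str] = None,
    url: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, Optional[str]]:
    """Hypothetical helper: explicit arguments win, the environment fills gaps."""
    env = os.environ if environ is None else environ
    if api_key is None and (username is None or password is None):
        # No complete credential was passed explicitly, so consult the environment.
        api_key = env.get(ENV_API_KEY)
        username = username or env.get(ENV_USERNAME)
        password = password or env.get(ENV_PASSWORD)
    url = url or env.get(ENV_URL)
    if api_key is None and (username is None or password is None):
        # Assumption: a missing credential is reported as a ValueError.
        raise ValueError("Provide an API key or a username/password pair.")
    if url is None:
        raise ValueError("Provide the matching service URL.")
    return {"api_key": api_key, "username": username, "password": password, "url": url}
```

The returned keys match the constructor's keyword names, so the result could be unpacked as tools.SaferMatcher(**settings) once real credentials are in place.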

Parameters:
  • api_key (Optional[str]) – A base64-encoded set of matching service credentials
  • username (Optional[str]) – Matching service username
  • password (Optional[str]) – Matching service password
  • url (Optional[str]) – Safer matching service URL
  • hasher (Optional[ImageHasher]) – A hasher to use for matching
  • hasher_api_id (Optional[str]) – The hasher ID for finding matches.
  • quality_threshold (int) – The quality threshold filter to use

match(images)

Match hashes with the Safer matching service.

Parameters: images (List[Union[str, Tuple[Union[str, ndarray, Image, BytesIO], str]]]) – A list of image filepaths or (image_like, image_id) tuples.
Return type: dict
Returns: A dictionary of matches. See the Safer matching service documentation (contact Thorn for a copy).
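
As a rough illustration of the two input forms that match accepts, the sketch below builds a list mixing plain filepaths with (image_like, image_id) tuples backed by BytesIO. The helper build_match_inputs, the path, the bytes, and the ID are all hypothetical placeholders; the call to match is left as a comment because it needs real credentials.

```python
from io import BytesIO
from typing import Dict, List, Sequence, Tuple, Union

# One entry is either a filepath or an (image_like, image_id) tuple.
MatchInput = Union[str, Tuple[Union[str, BytesIO], str]]


def build_match_inputs(
    filepaths: Sequence[str],
    in_memory: Dict[str, bytes],
) -> List[MatchInput]:
    """Hypothetical helper: combine filepaths with in-memory images keyed by ID."""
    images: List[MatchInput] = list(filepaths)
    for image_id, data in in_memory.items():
        # Wrap raw bytes so they can be passed as an image_like object.
        images.append((BytesIO(data), image_id))
    return images


images = build_match_inputs(
    filepaths=["photos/a.jpg"],  # placeholder path
    in_memory={"upload-123": b"<raw image bytes>"},  # placeholder bytes and ID
)
# With a configured matcher, the list would then be passed as:
# matches = matcher.match(images)
```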

perception.tools.deduplicate(files, hashers, isometric=False, progress=None)

Find duplicates in a list of files.

Parameters:
  • files (List[str]) – A list of filepaths.
  • hashers (List[Tuple[ImageHasher, float]]) – A list of tuples of the form (hasher, threshold)
  • isometric (bool) – Whether to compare the rotated versions of the images
  • progress (Optional[tqdm]) – A tqdm progress indicator
Return type:

List[Tuple[str, str]]

Returns:

A list of duplicated file pairs. To use, you can just remove the first entry of each pair from your dataset. The pairs are provided in the event that you wish to apply further analysis.
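
The advice above (remove the first entry of each pair) can be sketched as follows. The pairs here are made-up sample data standing in for the return value of deduplicate; files_to_remove and drop_duplicates are hypothetical helpers, and the threshold in the commented call is only an illustrative value.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def files_to_remove(pairs: Iterable[Tuple[str, str]]) -> Set[str]:
    """Collect the first entry of every duplicate pair."""
    return {first for first, _ in pairs}


def drop_duplicates(files: Sequence[str], pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the files that remain after removing the first entry of each pair."""
    removed = files_to_remove(pairs)
    return [f for f in files if f not in removed]


files = ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]
# In practice the pairs would come from something like:
# pairs = tools.deduplicate(files=files, hashers=[(hashers.PHash(hash_size=16), 0.2)])
# Made-up result in which a.jpg, b.jpg, and c.jpg are mutual duplicates:
pairs = [("a.jpg", "b.jpg"), ("a.jpg", "c.jpg"), ("b.jpg", "c.jpg")]

kept = drop_duplicates(files, pairs)
print(kept)  # ['c.jpg', 'd.jpg']
```

One copy from the duplicate group survives (c.jpg), along with the file that had no duplicates.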

perception.tools.deduplicate_hashes(hashes, threshold, hash_format='base64', hasher=None, hash_length=None, hash_dtype=None, distance_metric=None, progress=None)

Find duplicates using a list of precomputed hashes.

Parameters:
  • hashes (List[Tuple[str, Union[str, ndarray]]]) – A list of (id, hash) tuples
  • threshold (float) – A distance threshold
  • hasher (Optional[ImageHasher]) – A hasher to use for computing distances
  • progress (Optional[tqdm]) – A tqdm object for reporting progress
Return type:

List[Tuple[str, str]]

Returns:

A list of duplicated id pairs. To use, you can just remove the first entry of each pair from your dataset. The pairs are provided in the event that you wish to apply further analysis.
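
To make the role of the distance threshold concrete, here is a naive pairwise comparison over base64-encoded hashes. It is a swapped-in illustration, not the algorithm deduplicate_hashes uses: the normalized Hamming distance, the inclusive comparison (<=), and both function names are assumptions made for this sketch.

```python
import base64
from itertools import combinations
from typing import List, Sequence, Tuple


def normalized_hamming(hash_a: str, hash_b: str) -> float:
    """Fraction of differing bits between two base64-encoded hashes."""
    a, b = base64.b64decode(hash_a), base64.b64decode(hash_b)
    if not a or len(a) != len(b):
        raise ValueError("Hashes must be non-empty and of equal length.")
    differing = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return differing / (8 * len(a))


def naive_duplicate_pairs(
    hashes: Sequence[Tuple[str, str]], threshold: float
) -> List[Tuple[str, str]]:
    """Compare every pair of (id, hash) tuples; keep pairs within the threshold."""
    return [
        (id_a, id_b)
        for (id_a, hash_a), (id_b, hash_b) in combinations(hashes, 2)
        if normalized_hamming(hash_a, hash_b) <= threshold
    ]


# Made-up 16-bit hashes: x and y differ in one bit, z differs from both in many.
sample = [
    ("x", base64.b64encode(b"\x00\x00").decode()),
    ("y", base64.b64encode(b"\x00\x01").decode()),
    ("z", base64.b64encode(b"\xff\xff").decode()),
]
print(naive_duplicate_pairs(sample, threshold=0.1))  # [('x', 'y')]
```

The quadratic loop is only practical for small inputs; it is shown here for clarity rather than speed.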