Tools
class perception.tools.SaferMatcher(api_key=None, username=None, password=None, url=None, hasher=None, hasher_api_id=None, quality_threshold=90)

An object for matching hashes with the known CSAM hashes in the Safer matching service. Please contact info@getsafer.io for details on obtaining credentials and for information on how match responses are provided.
Here’s a minimal example:

    from perception import hashers, tools

    hasher = hashers.PHash(hash_size=16)
    matcher = tools.SaferMatcher(
        api_key='YOUR_API_KEY',
        username='YOUR_USERNAME',  # You only need to provide
        password='YOUR_PASSWORD',  # an API key OR username/password.
        url='MATCHING_SERVICE_URL'
    )
For authentication, you must provide either an API key or a username and password pair. If neither is provided, the constructor will attempt to read them from the environment variables SAFER_MATCHING_SERVICE_API_KEY, SAFER_MATCHING_SERVICE_USERNAME, and SAFER_MATCHING_SERVICE_PASSWORD, respectively. You must also provide the URL endpoint for the matching service, either as a keyword argument or as a SAFER_MATCHING_SERVICE_URL environment variable.

Parameters:
- api_key (Optional[str]) – A base64-encoded set of matching service credentials.
- username (Optional[str]) – Matching service username.
- password (Optional[str]) – Matching service password.
- url (Optional[str]) – Safer matching service URL.
- hasher (Optional[ImageHasher]) – A hasher to use for matching.
- hasher_api_id (Optional[str]) – The hasher ID for finding matches.
- quality_threshold (int) – The quality threshold filter to use.
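The credential fallback described above can be sketched in plain Python. This is a hypothetical illustration, not perception's implementation: `resolve_safer_config` and its `env` parameter are invented names, and the exact precedence when only part of a username/password pair is passed is an assumption.

```python
import os
from typing import Dict, Mapping, Optional

# Environment variable names documented for SaferMatcher.
ENV_API_KEY = "SAFER_MATCHING_SERVICE_API_KEY"
ENV_USERNAME = "SAFER_MATCHING_SERVICE_USERNAME"
ENV_PASSWORD = "SAFER_MATCHING_SERVICE_PASSWORD"
ENV_URL = "SAFER_MATCHING_SERVICE_URL"


def resolve_safer_config(
    api_key: Optional[str] = None,
    username: Optional[str] = None,
    password: Optional[str] = None,
    url: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Optional[str]]:
    """Hypothetical helper: resolve credentials and URL in the documented order."""
    env = os.environ if env is None else env
    has_pair = username is not None and password is not None
    # Only consult the environment when no usable credentials were passed in.
    if api_key is None and not has_pair:
        api_key = env.get(ENV_API_KEY)
        username = username or env.get(ENV_USERNAME)
        password = password or env.get(ENV_PASSWORD)
        has_pair = username is not None and password is not None
    if api_key is None and not has_pair:
        raise ValueError("Provide an API key OR a username/password pair.")
    # The URL may come from the keyword argument or the environment.
    url = url or env.get(ENV_URL)
    if url is None:
        raise ValueError("Provide the matching service URL.")
    return {"api_key": api_key, "username": username, "password": password, "url": url}
```

Passing `env` explicitly keeps the sketch testable without touching the real process environment; by default it reads `os.environ`.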
SaferMatcher.match(images)

Match hashes with the Safer matching service.
Parameters: images (List[Union[str, Tuple[Union[str, ndarray, Image, BytesIO], str]]]) – A list of image filepaths or (image_like, image_id) tuples.
Return type: dict
Returns: A dictionary of matches. See the Safer matching service documentation (contact Thorn for a copy).
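The `images` argument mixes two shapes: bare filepaths and (image_like, image_id) tuples. The sketch below shows one way to assemble such a list from filepaths plus in-memory image bytes. `build_match_input` and `is_valid_entry` are hypothetical helpers; they cover only the `str` and `BytesIO` cases of the documented type (not `ndarray` or PIL `Image`), and nothing here contacts the matching service.

```python
from io import BytesIO
from typing import Dict, List, Tuple, Union

# Subset of the documented element type (ndarray and PIL Image omitted
# to keep this sketch dependency-free).
ImageLike = Union[str, BytesIO]
MatchInput = List[Union[str, Tuple[ImageLike, str]]]


def build_match_input(paths: List[str], blobs: Dict[str, bytes]) -> MatchInput:
    """Hypothetical helper: combine filepaths and in-memory images into one list."""
    images: MatchInput = [p for p in paths]  # plain filepaths are passed as-is
    for image_id, data in blobs.items():
        # In-memory images are not filepaths, so pair them with an explicit ID.
        images.append((BytesIO(data), image_id))
    return images


def is_valid_entry(entry: object) -> bool:
    """Check that one entry has one of the two documented shapes."""
    if isinstance(entry, str):
        return True
    return (
        isinstance(entry, tuple)
        and len(entry) == 2
        and isinstance(entry[0], (str, BytesIO))
        and isinstance(entry[1], str)
    )
```

The resulting list would then be passed as `matcher.match(images)`, where `matcher` is a configured `SaferMatcher`.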
perception.tools.deduplicate(files, hashers, isometric=False, progress=None)

Find duplicates in a list of files.
Parameters:
- files (List[str]) – A list of filepaths.
- hashers (List[Tuple[ImageHasher, float]]) – A list of tuples of the form (hasher, threshold).
- isometric (bool) – Whether to compare the rotated versions of the images.
- progress (Optional[tqdm]) – A tqdm progress indicator.
Return type: List[Tuple[str, str]]
Returns: A list of duplicated file pairs. To deduplicate, remove the first entry of each pair from your dataset. The pairs themselves are returned in case you want to apply further analysis.
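Applying the returned pairs as described above needs only a set lookup. `drop_duplicates` is a hypothetical helper; the commented call shows where the pairs would come from, and its 0.2 threshold is an arbitrary illustration, not a recommended value.

```python
from typing import List, Sequence, Tuple


def drop_duplicates(files: Sequence[str], pairs: List[Tuple[str, str]]) -> List[str]:
    """Hypothetical helper: remove the first entry of each duplicate pair.

    The input order of `files` is preserved.
    """
    to_remove = {first for first, _ in pairs}
    return [f for f in files if f not in to_remove]


# Example use with perception (not run here; the threshold is illustrative):
#   from perception import hashers, tools
#   pairs = tools.deduplicate(files=files, hashers=[(hashers.PHash(hash_size=16), 0.2)])
#   kept = drop_duplicates(files, pairs)
```

The same helper works for the id pairs returned by `deduplicate_hashes`, since both return a list of (str, str) tuples.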
perception.tools.deduplicate_hashes(hashes, threshold, hash_format='base64', hasher=None, hash_length=None, hash_dtype=None, distance_metric=None, progress=None)

Find duplicates using a list of precomputed hashes.
Parameters:
- hashes (List[Tuple[str, Union[str, ndarray]]]) – A list of (id, hash) tuples.
- threshold (float) – A distance threshold.
- hasher (Optional[ImageHasher]) – A hasher to use for computing distances.
- progress (Optional[tqdm]) – A tqdm object for reporting progress.
Return type: List[Tuple[str, str]]
Returns: A list of duplicated id pairs. To deduplicate, remove the first entry of each pair from your dataset. The pairs themselves are returned in case you want to apply further analysis.
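To make the role of `threshold` concrete, here is a naive brute-force sketch of pairwise deduplication over base64-encoded hashes. It is not the library's algorithm: it assumes the hashes are equal-length bit strings compared with a normalized Hamming distance (the fraction of differing bits), and that a pair counts as a duplicate when that distance is at most `threshold`. perception itself computes distances with the supplied `hasher`. The function names are hypothetical.

```python
import base64
from itertools import combinations
from typing import List, Tuple


def hamming_fraction(a: bytes, b: bytes) -> float:
    """Fraction of differing bits between two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    differing = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return differing / (8 * len(a))


def naive_deduplicate_hashes(
    hashes: List[Tuple[str, str]], threshold: float
) -> List[Tuple[str, str]]:
    """Hypothetical O(n^2) sketch: pair IDs whose hashes are within threshold."""
    decoded = [(hash_id, base64.b64decode(h)) for hash_id, h in hashes]
    pairs = []
    # combinations() keeps input order, so the earlier ID is first in each pair.
    for (id_a, a), (id_b, b) in combinations(decoded, 2):
        if hamming_fraction(a, b) <= threshold:
            pairs.append((id_a, id_b))
    return pairs
```

Comparing every pair costs O(n²) distance computations, which is fine for small lists but becomes slow for large collections.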