XAITK Software Framework: Saliency API

This API consists of a number of object-oriented functor interfaces for saliency heatmap generation. These initial interfaces focus on black-box visual saliency. We define two high-level requirements for this initial task: reference image perturbation in preparation for black-box testing, and saliency heatmap generation utilizing black-box inputs. We define a few similar interfaces for performing saliency heatmap generation, separated by intermediate algorithmic use case: image similarity, classification, and object detection. We explicitly do not require an abstraction for the black-box operations to fit inside. This allows applications to use these interfaces alongside existing functionality, needing only to format their data to fit the inputs defined here. Note, however, that some interfaces are defined for certain black-box concepts as part of the SMQTK ecosystem (e.g. in SMQTK-Descriptors, SMQTK-Classifier, SMQTK-Relevancy and other SMQTK-* modules).

These interfaces are based on the plugin and configuration features provided by SMQTK-Core to allow convenient hooks into implementation, discoverability, and factory generation from runtime configuration. This allows for opaque discovery of interface implementations from a class-method on the interface class object, and instantiation of a concrete instance via a JSON-like configuration fed in from an outside resource.
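
For example, implementations of an interface may be discovered via a class method on the interface and instantiated from a JSON-like configuration. The following is a minimal sketch assuming SMQTK-Core's get_impls() class method and the smqtk_core.configuration.from_config_dict helper; SomePerturbImageImpl and its parameter are hypothetical names for illustration only.

from xaitk_saliency import PerturbImage
from smqtk_core.configuration import from_config_dict

# Discover implementation classes registered via the SMQTK-Core plugin
# mechanism (an opaque, class-method based lookup).
impl_types = PerturbImage.get_impls()
print({t.__name__ for t in impl_types})

# Instantiate a concrete implementation from a JSON-like configuration
# dictionary fed in from an outside resource. The implementation name and
# its parameter here are hypothetical.
config = {
    "type": "SomePerturbImageImpl",
    "SomePerturbImageImpl": {"some_param": 42},
}
perturber = from_config_dict(config, PerturbImage.get_impls())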

[Figure: api-docs-fig-01.png]

Figure 1: Abstract Interface Inheritance.

Perturbed Image Generation

The PerturbImage interface abstracts the behavior of taking a reference image and generating some number of perturbations of the image, along with paired mask matrices indicating where perturbations have occurred and to what degree.

Implementations should impart no side effects upon the input image.

Immediate candidates for implementation of this interface are occlusion-based saliency algorithms [3] that perform perturbations on image pixels.

Interface: PerturbImage

class xaitk_saliency.interfaces.perturb_image.PerturbImage(*args: Any, **kwargs: Any)

Interface abstracting the behavior of taking a reference image and generating some number perturbations in the form of mask matrices indicating where perturbations should occur and to what amount.

Implementations should impart no side effects upon the input image.

abstract perturb(ref_image: numpy.ndarray) → numpy.ndarray

Transform an input reference image into a number of mask matrices indicating the perturbed regions.

Output mask matrix should be 3-dimensional with the format [nMasks x Height x Width], sharing the same height and width as the input reference image. The implementing algorithm may determine the quantity of output masks per input image. These masks should indicate the regions in the corresponding perturbed image that have been modified. Values should be in the [0, 1] range, where a value closer to 1.0 indicates areas of the image that are unperturbed. Note that output mask matrices may be of a floating-point type in order to allow for fractional perturbation.

Parameters

ref_image – Reference image to generate perturbations from.

Returns

Mask matrix with shape [nMasks x Height x Width].
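
To make this contract concrete, below is a minimal, hypothetical sketch of a PerturbImage implementation that sweeps a square occlusion window across the image. It is illustrative only and not one of the provided implementations; the get_config() method is included on the assumption that implementations also satisfy the SMQTK-Core configuration interface.

import numpy as np
from xaitk_saliency import PerturbImage


class SlidingSquarePerturb (PerturbImage):
    """Hypothetical example: occlude a square window swept across the image."""

    def __init__(self, window: int = 32, stride: int = 32):
        self.window = window
        self.stride = stride

    def perturb(self, ref_image: np.ndarray) -> np.ndarray:
        h, w = ref_image.shape[:2]
        masks = []
        for top in range(0, h, self.stride):
            for left in range(0, w, self.stride):
                # 1.0 == unperturbed, 0.0 == fully perturbed region.
                m = np.ones((h, w), dtype=np.float32)
                m[top:top + self.window, left:left + self.window] = 0.0
                masks.append(m)
        # Output shape: [nMasks x Height x Width].
        return np.stack(masks)

    def get_config(self) -> dict:
        return {"window": self.window, "stride": self.stride}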

Image Occlusion via Perturbation Masks

A common intermediate step in this process is applying the generated perturbation masks to imagery to produce occluded images. We provide two utility functions as baseline implementations to perform this step:

  • xaitk_saliency.utils.masking.occlude_image_batch - performs the transformation as a batch operation

  • xaitk_saliency.utils.masking.occlude_image_streaming - performs the transformation in a streaming manner, with optional parallelization

While the batch version is simpler and in many cases the faster of the two, the streaming version may be more applicable when the image masks are large or numerous enough that the batch version would exceed available memory.

xaitk_saliency.utils.masking.occlude_image_batch(ref_image: numpy.ndarray, masks: numpy.ndarray, fill: Optional[Union[int, Sequence[int], numpy.ndarray]] = None) → numpy.ndarray

Apply a number of input occlusion masks to the given reference image, producing a list of images equivalent in length, and parallel in order, to the input masks.

We expect the “mask” matrices and the image to be the same height and width, and for the mask matrix values to be in the [0, 1] range. In the mask matrix, values closer to 1 correspond to regions of the image that should NOT be occluded. E.g. a 0 in the mask will translate to fully occluding the corresponding location in the source image.

We optionally take in a “fill” that is alpha-blended into the masked regions of the input ref_image. fill may be a scalar, a sequence of scalars, or another image matrix congruent in shape to the ref_image. When fill is a scalar or a sequence of scalars, the scalars should be in the same data-type and value range as the input image; a sequence of scalars should have the same length as the number of channels in the ref_image. When fill is an image matrix it should follow the [H x W] or [H x W x C] format, be in the same dtype and value range as ref_image, and have the same number of channels if channels are present. When no fill is passed, black is used (default absence of color).

Images output will mirror the input image format. As such, the fill value passed must be compatible with the input image channels for broadcasting. For example, a single channel input will not be able to be broadcast against a multi-channel fill input. A ValueError will be raised by the underlying numpy call in such cases.

NOTE: Due to the batch nature of this function, using a fill color will consistently require more RAM because of the alpha blending.

Assumptions:
  • Mask input is per-pixel. Does not accept per-channel masks.

  • Fill value input is in an applicable value range supported by the input image format, which is mirrored in output images.

Parameters
  • ref_image – Reference image to apply the occlusion masks to.

  • masks – Mask matrix input of shape [N x H x W] where height and width dimensions are the same size as the input ref_image.

  • fill – Optional fill for alpha-blending based on the input masks for the occluded regions as a scalar value, a per-channel sequence or a shape-matched image.

Raises

ValueError – The input mask matrix was not 3-dimensional, its last two dimensions did not match the shape of the input imagery, or the input fill value could not be broadcast against the input image.

Returns

A numpy array of masked images.

xaitk_saliency.utils.masking.occlude_image_streaming(ref_image: numpy.ndarray, masks: Iterable[numpy.ndarray], fill: Optional[Union[int, Sequence[int], numpy.ndarray]] = None, threads: Optional[int] = None) → Generator[numpy.ndarray, None, None]

Apply a number of input occlusion masks to the given reference image, yielding images equivalent in number, and parallel in order, to the input masks.

We expect the “mask” matrices and the image to be the same height and width, and for the mask matrix values to be in the [0, 1] range. In the mask matrix, values closer to 1 correspond to regions of the image that should NOT be occluded. E.g. a 0 in the mask will translate to fully occluding the corresponding location in the source image.

We optionally take in a “fill” that is alpha-blended into the masked regions of the input ref_image. fill may be a scalar, a sequence of scalars, or another image matrix congruent in shape to the ref_image. When fill is a scalar or a sequence of scalars, the scalars should be in the same data-type and value range as the input image; a sequence of scalars should have the same length as the number of channels in the ref_image. When fill is an image matrix it should follow the [H x W] or [H x W x C] format, be in the same dtype and value range as ref_image, and have the same number of channels if channels are present. When no fill is passed, black is used (default absence of color).

Images output will mirror the input image format. As such, the fill value passed must be compatible with the input image channels for broadcasting. For example, a single channel input will not be able to be broadcast against a multi-channel fill input. A ValueError will be raised by the underlying numpy call in such cases.

Assumptions:
  • Mask input is per-pixel. Does not accept per-channel masks.

  • Fill value input is in an applicable value range supported by the input image format, which is mirrored in output images.

Parameters
  • ref_image – Original base image

  • masks – Mask images in the [N x Height x Width] shape format.

  • fill – Optional fill for alpha-blending based on the input masks for the occluded regions as a scalar value, a per-channel sequence or a shape-matched image.

  • threads – Optional number of threads to use for parallelism when set to a positive integer. If 0, a negative value, or None, work will be performed on the main-thread in-line.

Raises

ValueError – One or more input masks in the input iterable did not match the shape of the input reference image.

Returns

A generator of numpy array masked images.
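
As a usage sketch (assuming a mask_array as produced by a PerturbImage implementation and a hypothetical per-image process() handler), the streaming variant can be consumed lazily, optionally with multiple worker threads:

import numpy as np
import PIL.Image
from xaitk_saliency.utils.masking import occlude_image_streaming

ref_image = np.asarray(PIL.Image.open("some/test/image.png"))
mask_array = ...  # [N x H x W] masks, e.g. from a PerturbImage implementation

# Alpha-blend a mid-grey fill into occluded regions, using 4 worker threads.
for occluded in occlude_image_streaming(
    ref_image,
    mask_array,
    fill=[128, 128, 128],
    threads=4,
):
    process(occluded)  # hypothetical application-specific handler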

Saliency Heatmap Generation

These interfaces comprise a family of siblings that all perform a similar transformation but require different standard inputs. There is no single interface to rule them all without becoming so abstract that it would break the concept of interface abstraction, or the ability to substitute arbitrary implementations of an interface without interrupting successful execution. Each interface is intended to handle different black-box outputs from different algorithmic categories. In the future, as additional algorithmic categories are identified for which saliency map generation is applicable, additional interfaces may be defined and added to this initial repertoire.

Interface: GenerateDescriptorSimilaritySaliency

This interface proposes that implementations require externally generated feature-vectors for two reference images between which we are trying to discern the feature-space saliency. This also requires the feature-vectors for perturbed images as well as the masks of the perturbations as would be output from a PerturbImage implementation. We expect perturbations to be relative to the second reference image feature-vector.

An immediate candidate implementation for this interface is the SBSM algorithm [1].

class xaitk_saliency.interfaces.gen_descriptor_sim_sal.GenerateDescriptorSimilaritySaliency(*args: Any, **kwargs: Any)

Visual saliency map generation interface whose implementations transform black-box feature-vectors from multiple references and perturbations into saliency heat-maps.

This transformation requires two reference images, translated into feature-vectors via some black-box means, between which we are trying to discern the feature-space saliency. This also requires the feature-vectors for perturbed images as well as the masks of the perturbations as would be output from a xaitk_saliency.interfaces.perturb_image.PerturbImage implementation. We expect perturbations to be relative to the second reference image.

abstract generate(ref_descr_1: numpy.ndarray, ref_descr_2: numpy.ndarray, perturbed_descrs: numpy.ndarray, perturbed_masks: numpy.ndarray) → numpy.ndarray

Generate a visual saliency heat-map matrix given the black-box descriptor generation output on two reference images, the same descriptor output on perturbed images and the masks of the visual perturbations.

Perturbation mask input into the perturbed_masks parameter here is equivalent to the perturbation mask output from a xaitk_saliency.interfaces.perturb_image.PerturbImage.perturb() method implementation. We expect perturbations to be relative to the second reference image. These should have the shape [nMasks x H x W], and values in range [0, 1], where a value closer to 1 indicates areas of the image that are unperturbed. Note the type of values in masks can be either integer, floating point or boolean within the above range definition. Implementations are responsible for handling these expected variations.

Generated saliency heat-map matrices should be floating-point typed and be composed of values in the [-1,1] range. Positive values of the saliency heat-maps indicate regions which increase image similarity scores, while negative values indicate regions which decrease image similarity scores according to the model that generated input feature vectors.

Parameters
  • ref_descr_1 – First image reference float feature-vector, shape [nFeats]

  • ref_descr_2 – Second image reference float feature-vector, shape [nFeats]

  • perturbed_descrs – Feature vectors of second reference image perturbations, float typed of shape [nMasks x nFeats].

  • perturbed_masks – Perturbation masks numpy.ndarray over the second reference image. This should be parallel in association to the perturbed_descrs parameter. This should have a shape [nMasks x H x W], and values in range [0, 1], where a value closer to 1 indicates areas of the image that are unperturbed.

Returns

Generated saliency heat-map as a float-typed numpy.ndarray with shape [H x W].

Interface: GenerateClassifierConfidenceSaliency

This interface proposes that implementations transform black-box image classification scores into saliency heatmaps. This should require a sequence of per-class confidences predicted on the reference image, a number of per-class confidences as predicted on perturbed images, as well as the masks of the reference image perturbations (as would be output from a PerturbImage implementation).

Implementations should use this input to generate a visual saliency heat-map for each input “class” in the input. This is both an effort to vectorize the operation for optimal performance, as well as to allow some algorithms to take advantage of differences in classification behavior for other classes to influence heatmap generation. For classifiers that generate many class label predictions, it is intended that only a subset of relevant class predictions need be provided here if computational performance is a consideration.

Immediate candidate implementations for this interface are the RISE algorithm [2] and occlusion-based saliency algorithms [3] that generate saliency heat-maps.

class xaitk_saliency.interfaces.gen_classifier_conf_sal.GenerateClassifierConfidenceSaliency(*args: Any, **kwargs: Any)

Visual saliency map generation interface whose implementations transform black-box image classification scores into saliency heatmaps.

This should require a sequence of per-class confidences predicted on the reference image, a number of per-class confidences as predicted on perturbed images, as well as the masks of the reference image perturbations (as would be output from a xaitk_saliency.interfaces.perturb_image.PerturbImage implementation).

Implementations should use this input to generate a visual saliency heat-map for each input “class” in the input. This is both an effort to vectorize the operation for optimal performance, as well as to allow some algorithms to take advantage of differences in classification behavior for other classes to influence heatmap generation. For classifiers that generate many class label predictions, it is intended that only a subset of relevant class predictions need be provided here if computational performance is a consideration.

abstract generate(image_conf: numpy.ndarray, perturbed_conf: numpy.ndarray, perturbed_masks: numpy.ndarray) → numpy.ndarray

Generate a visual saliency heat-map matrix given the black-box classifier output on a reference image, the same classifier output on perturbed images and the masks of the visual perturbations.

Perturbation mask input into the perturbed_masks parameter here is equivalent to the perturbation mask output from a xaitk_saliency.interfaces.perturb_image.PerturbImage.perturb() method implementation. These should have the shape [nMasks x H x W], and values in range [0, 1], where a value closer to 1 indicates areas of the image that are unperturbed. Note the type of values in masks can be either integer, floating point or boolean within the above range definition. Implementations are responsible for handling these expected variations.

Generated saliency heat-map matrices should be floating-point typed and be composed of values in the [-1,1] range. Positive values of the saliency heat-maps indicate regions which increase class confidence scores, while negative values indicate regions which decrease class confidence scores according to the model that generated input confidence values.

Parameters
  • image_conf – Reference image predicted class-confidence vector, as a numpy.ndarray, for all classes that require saliency map generation. This should have a shape [nClasses], be float-typed and with values in the [0,1] range.

  • perturbed_conf – Perturbed image predicted class confidence matrix. Classes represented in this matrix should be congruent to classes represented in the image_conf vector. This should have a shape [nMasks x nClasses], be float-typed and with values in the [0,1] range.

  • perturbed_masks – Perturbation masks numpy.ndarray over the reference image. This should be parallel in association to the classification results input into the perturbed_conf parameter. This should have a shape [nMasks x H x W], and values in range [0, 1], where a value closer to 1 indicates areas of the image that are unperturbed.

Returns

Generated visual saliency heat-map for each input class as a float-type numpy.ndarray of shape [nClasses x H x W].

Interface: GenerateDetectorProposalSaliency

This interface proposes that implementations transform black-box image object detection predictions into visual saliency heatmaps. This should require externally generated object detection predictions over some image, along with predictions for perturbed images and the perturbation masks for those images as would be output from a PerturbImage implementation. Object detection representations used here would need to encapsulate localization information (i.e. bounding box regions), class scores, and objectness scores (if applicable to the detector, such as YOLOv3). Object detections are converted into (4+1+nClasses) vectors (4 indices for bounding box locations, 1 index for objectness, and nClasses indices for different object classes).

Implementations should use this input to generate a visual saliency heat-map for each input detection. We assume that an input detection is coupled with a single truth class (or a single leaf node in a hierarchical structure). Input detections on the reference image may be drawn from ground truth or predictions as desired by the use case. As for perturbed image detections, we expect those to usually be decoupled from the source of reference image detections, which is why below we formulate the shape of perturbed image detections with nProps instead of nDets (though the value of that axis may be the same in some cases).

A candidate implementation for this interface is the D-RISE [4] algorithm.

class xaitk_saliency.interfaces.gen_detector_prop_sal.GenerateDetectorProposalSaliency(*args: Any, **kwargs: Any)

This interface proposes that implementations transform black-box image object detection predictions into visual saliency heatmaps. This should require externally-generated object detection predictions over some image, along with predictions for perturbed images and the perturbation masks for those images as would be output from a xaitk_saliency.interfaces.perturb_image.PerturbImage implementation.

Object detection representations used here would need to encapsulate localization information (i.e. bounding box regions), class scores, and objectness scores (if applicable to the detector, such as YOLOv3). Object detections are converted into (4+1+nClasses) vectors (4 indices for bounding box locations, 1 index for objectness, and nClasses indices for different object classes).

abstract generate(ref_dets: numpy.ndarray, perturbed_dets: numpy.ndarray, perturb_masks: numpy.ndarray) → numpy.ndarray

Generate visual saliency heat-map matrices for each reference detection, describing what visual information contributed to the associated reference detection.

We expect input detections to come from a black-box source that outputs our minimum requirements of a bounding box and per-class scores. Objectness scores are required in our input format, but not necessarily from detection black-box methods as there is a sensible default value for this. See the format_detection() helper function for assistance in forming our input format, which includes this optional default fill-in. We expect objectness is a confidence score valued in the inclusive [0,1] range. We also expect classification scores to be in the inclusive [0,1] range.

We assume that an input detection is coupled with a single truth class (or a single leaf node in a hierarchical structure). Detections input as references (ref_dets parameter) may be either ground truth or predicted detections. As for perturbed image detections input (perturbed_dets), we expect the quantity of detections to be decoupled from the source of reference image detections, which is why below we formulate the shape of perturbed image detections with nProps instead of nDets.

Perturbation mask input into the perturbed_masks parameter here is equivalent to the perturbation mask output from a xaitk_saliency.interfaces.perturb_image.PerturbImage.perturb() method implementation. These should have the shape [nMasks x H x W], and values in range [0, 1], where a value closer to 1 indicates areas of the image that are unperturbed. Note the type of values in masks can be either integer, floating point or boolean within the above range definition. Implementations are responsible for handling these expected variations.

Generated saliency heat-map matrices should be floating-point typed and be composed of values in the [-1,1] range. Positive values of the saliency heat-maps indicate regions which increase object detection scores, while negative values indicate regions which decrease object detection scores according to the model that generated input object detections.

Parameters
  • ref_dets – Detections, objectness and class scores on a reference image as a float-typed array with shape [nDets x (4+1+nClasses)].

  • perturbed_dets – Object detections, objectness and class scores for perturbed variations of the reference image. We expect this to be a float-typed array with shape [nMasks x nProps x (4+1+nClasses)].

  • perturb_masks – Perturbation masks numpy.ndarray over the reference image. This should be parallel in association to the detection propositions input into the perturbed_dets parameter. This should have a shape [nMasks x H x W], and values in range [0, 1], where a value closer to 1 indicates areas of the image that are unperturbed.

Returns

A visual saliency heat-map matrix describing each input reference detection. These will be float-typed arrays with shape [nDets x H x W].

Detection formatting helper

The GenerateDetectorProposalSaliency.generate() method takes in a specifically formatted matrix that combines 3 different aspects of common detector model outputs:

  • bounding boxes

  • objectness scores

  • classification scores

We provide a helper function to merge distinct output data into the unified format.

xaitk_saliency.utils.detection.format_detection(bbox_mat: numpy.ndarray, classification_mat: numpy.ndarray, objectness: Optional[numpy.ndarray] = None) → numpy.ndarray

Combine detection and classification output, with optional objectness output, into the combined format required for GenerateDetectorProposalSaliency.generate() *_dets input parameters.

We enforce some shape consistency so that we can create a valid output matrix. The input bounding box matrix should be of shape [nDets x 4], the classification matrix should be of shape [nDets x nClasses], and the objectness vector, if provided, should be of size nDets.

If an objectness score vector is not provided, we assume a vector of 1’s.

The output of this function is a matrix that is of shape [nDets x (4+1+nClasses)]. This is the result of horizontally stacking the input in bbox, objectness and classification order. The output matrix data-type will follow numpy’s rules about safe-casting given the combination of input matrix types.

In exceptions about shape mismatches, index 0 refers to the bbox_mat input, index 1 refers to the objectness vector, and index 2 refers to the classification_mat.

Parameters
  • bbox_mat – Matrix of bounding boxes. This matrix should have the shape [nDets x 4]. The format of each row-vector is not important but generally expected to be [left, top, right, bottom] pixel coordinates. This matrix must be of a type that is float-castable.

  • classification_mat – Matrix of classification scores from the detector or detection classifier. This should have the shape of [nDets x nClasses]. This matrix must be of a type that is float-castable.

  • objectness – Optional vector of objectness scores for input detections. This is optional as not all detection models output this aspect. When provided, this should be a vector of ints/floats of size nDets to match the other parameter shapes.

Raises

ValueError – When input matrix shapes are mismatched such that they cannot be horizontally stacked.

Returns

Matrix combining bounding box, objectness and class confidences.
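
For illustration, a minimal sketch of forming the combined matrix from hypothetical detector outputs (the numeric values here are arbitrary):

import numpy as np
from xaitk_saliency.utils.detection import format_detection

# Hypothetical detector output: 3 detections over 2 classes.
bboxes = np.array([
    [10, 20, 50, 60],
    [30, 40, 90, 100],
    [5, 5, 25, 35],
], dtype=float)                             # shape: [nDets x 4]
class_scores = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
    [0.6, 0.4],
])                                          # shape: [nDets x nClasses]
objectness = np.array([0.95, 0.70, 0.50])   # shape: [nDets]

det_mat = format_detection(bboxes, class_scores, objectness)
assert det_mat.shape == (3, 4 + 1 + 2)      # [nDets x (4+1+nClasses)]

# When objectness is not provided, a vector of 1's is assumed.
det_mat_default_obj = format_detection(bboxes, class_scores)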

Blackbox Saliency Image Generation

Unlike the previous saliency heatmap generation interfaces, this interface uses a blackbox classifier as input along with a reference image to generate visual saliency heatmaps.

Candidate implementations for this interface include the PerturbationOcclusion implementation and its sub-implementations (RISEStack and SlidingWindowStack).

Interface: GenerateImageClassifierBlackboxSaliency

class xaitk_saliency.interfaces.gen_image_classifier_blackbox_sal.GenerateImageClassifierBlackboxSaliency(*args: Any, **kwargs: Any)

This interface describes algorithms that take a reference image and a blackbox image classifier, then generate a number of visual saliency heatmap matrices, one for each class output by the classifier blackbox.

A classifier blackbox needs to be input, which requires some specification of how to operate the blackbox. The smqtk_classifier.ClassifyImage abstract interface is used to provide the minimal form a blackbox classifier requires: the ability to classify an image into confidences for some number of class labels.

The generate() method returns a visual saliency heatmap for each input class as a float-typed numpy.ndarray of shape [nClasses x H x W].

generate(ref_image: numpy.ndarray, blackbox: smqtk_classifier.interfaces.classify_image.ClassifyImage) → numpy.ndarray

Generates per-class visual saliency heatmaps for some classifier blackbox over some image of interest.

The input reference image is expected to be in matrix form and be in either an H x W or H x W x C shape format.

Output saliency map matrix should be (1) in the shape nClasses x H x W, (2) floating-point typed, and (3) composed of values in the [-1, 1] range. nClasses should be the quantity of unique class labels output by the given classifier blackbox. While specific algorithms determine the quantity of heatmaps returned, the height and width of returned heatmaps should be consistent with the input image, i.e. the H and W dimensions should match in size to the reference image’s H and W dimensions. Positive values of the saliency heatmaps indicate regions that increase respective class confidence scores, while negative values indicate regions that decrease respective class confidence scores according to the given blackbox classifier.

Parameters
  • ref_image – Reference image over which visual saliency heatmaps will be generated.

  • blackbox – The blackbox classifier handle to perform arbitrary operations on in order to deduce visual saliency.

Raises

ShapeMismatchError – The resulting visual saliency heatmap matrix did not have height and width dimensions matching the reference image.

Returns

Visual saliency heatmaps, one per class label output by the blackbox classifier.
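
A usage sketch for this interface is shown below. This is not a verbatim example from the package: MyClassifier is a hypothetical adapter, the ClassifyImage method names used (get_labels, classify_images, get_config) reflect our reading of the smqtk_classifier interface, my_model_predict is a placeholder model call, and BlackboxSaliencyImplementation stands in for a concrete implementation such as RISEStack or SlidingWindowStack.

import numpy as np
import PIL.Image
from typing import Dict, Iterator
from smqtk_classifier.interfaces.classify_image import ClassifyImage
from xaitk_saliency import GenerateImageClassifierBlackboxSaliency


class MyClassifier (ClassifyImage):
    """Hypothetical adapter wrapping some existing model."""

    def get_labels(self):
        return ["cat", "dog"]

    def classify_images(self, img_iter) -> Iterator[Dict[str, float]]:
        for img in img_iter:
            scores = my_model_predict(img)  # hypothetical model call
            yield {"cat": float(scores[0]), "dog": float(scores[1])}

    def get_config(self) -> dict:
        return {}


# Pretend this is a concrete implementation of the blackbox interface,
# e.g. a perturbation-occlusion based stack.
class BlackboxSaliencyImplementation (GenerateImageClassifierBlackboxSaliency):
    ...


ref_image = np.asarray(PIL.Image.open("some/test/image.png"))

sal_gen = BlackboxSaliencyImplementation()
sal_maps = sal_gen.generate(ref_image, MyClassifier())
# sal_maps: float-typed array of shape [nClasses x H x W], values in [-1, 1].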

Code Examples

Generating Perturbed Images and Masks:

import PIL.Image
import numpy as np
import numpy.typing as npt
from xaitk_saliency import PerturbImage
from xaitk_saliency.utils.masking import occlude_image_batch


# Define an implementation, or use a discovered plugin.
# This does not need to be defined in-line, but may be instead
# imported from some alternative module, or found via plugin
# discovery.
class PerturbImageImplementation (PerturbImage):
   def perturb(
       self,
       ref_image: npt.ArrayLike
   ) -> np.ndarray:
       ...


...

perturb_image = PerturbImageImplementation()

...

test_image = np.asarray(PIL.Image.open("some/test/image.png"))

# Generate perturbed images and perturbation masks
mask_array = perturb_image(test_image)
perturbed_images = occlude_image_batch(test_image, mask_array)

# Returned sequences should be congruent in length.
assert len(perturbed_images) == len(mask_array)

# Do application-appropriate things with the pairs!
for img, mask in zip(perturbed_images, mask_array):
   render(img, mask)

Generating Similarity-based Saliency Heatmaps:

import PIL.Image
import numpy as np
from xaitk_saliency import PerturbImage
from xaitk_saliency import GenerateDescriptorSimilaritySaliency
from xaitk_saliency.utils.masking import occlude_image_batch
from MyIntegration import describe_images  # type: ignore


# Pretend we have implementations of the standard interfaces.
class PerturbImageImplementation (PerturbImage):
   ...


class GenerateDescriptorSimilaritySaliencyImplementation (GenerateDescriptorSimilaritySaliency):
   ...


# Initializing an implementation of perturbation-based algorithms
perturb_image = PerturbImageImplementation()

# Initializing an implementation of similarity-based saliency generator
similarity_saliency = GenerateDescriptorSimilaritySaliencyImplementation()

...

# Loading test image1 from file
test_image_1 = np.asarray(PIL.Image.open("some/test/image1.png"))
# Loading reference image 2 from file
ref_image_2 = np.asarray(PIL.Image.open("some/test/image2.png"))

# Generate perturbed images and perturbation masks on reference image on which
# saliency needs to be computed.
mask_array = perturb_image(ref_image_2)
perturbed_images = occlude_image_batch(ref_image_2, mask_array)

# Compute descriptors for the test, reference and perturbed images.
# This part may be specific to your application or integration.
# The output here is expected to be in the shape [nInputs x nFeats].
test_img_descr, ref_img_descr = describe_images([test_image_1, ref_image_2])
perturb_descr = describe_images(perturbed_images)

# Compute the final similarity-based saliency map using original features from
# both the test and reference images, along with descriptors computed on the
# perturbed versions of the reference image and masks used to perturb the
# reference image
similarity_saliency_map = similarity_saliency(
  test_img_descr,  # shape: [nFeats]
  ref_img_descr,  # shape: [nFeats]
  perturb_descr,  # shape: [len(perturbed_images), nFeats]
  mask_array  # shape: [len(perturbed_images), ref_image_2.height, ref_image_2.width]
)
# The shape of the output heatmap should be congruent to the shape of input
# perturbation masks.
assert similarity_saliency_map.shape == mask_array[0].shape

Generating Classification-based Saliency Heatmaps:

import PIL.Image
import numpy as np
from xaitk_saliency import PerturbImage
from xaitk_saliency import GenerateClassifierConfidenceSaliency
from xaitk_saliency.utils.masking import occlude_image_batch
from MyIntegration import classify_images  # type: ignore


# Pretend we have implementations of the standard interfaces.
class PerturbImageImplementation (PerturbImage):
  ...


class GenerateClassifierConfidenceSaliencyImplementation (GenerateClassifierConfidenceSaliency):
  ...


# Initializing an implementation of perturbation-based algorithms
perturb_image = PerturbImageImplementation()

# Initializing an implementation of classifier-based saliency generator
classifier_saliency = GenerateClassifierConfidenceSaliencyImplementation()

...

# Loading reference image from file
ref_image = np.asarray(PIL.Image.open("some/test/image.png"))

# Generate perturbed images and perturbation masks on
# reference image on which saliency needs to be computed
mask_array = perturb_image(ref_image)
perturbed_images = occlude_image_batch(ref_image, mask_array)

# Compute class confidence predictions for reference and perturbed images.
# We assume for this example that this black-box image classification function
# returns a matrix of class label confidences with different class labels
# corresponding to different columns of the output matrix, whose shape will be
# [nInputs x nClasses].
ref_class_confs = classify_images([ref_image])[0]
perturbed_class_confs = classify_images(perturbed_images)

# We will also show the example case where we do not want to pass along all
# class confidences for saliency map generation, but only a select few.
# Maybe this would be defined by some interface or configuration.
pertinent_class_indices = [1, 4, 10]
ref_class_confs2 = ref_class_confs[pertinent_class_indices]
perturbed_class_confs2 = perturbed_class_confs[..., pertinent_class_indices]

# Compute the final classifier-based saliency maps using the class confidences
# on the reference image, the class confidences on the perturbed versions of
# the reference image, and the masks used to perturb the reference image.
classifier_saliency_map = classifier_saliency(
  ref_class_confs2,  # shape: [len(pertinent_class_indices)]
  perturbed_class_confs2,  # shape: [len(perturbed_images), len(pertinent_class_indices)]
  mask_array  # shape: [len(perturbed_images), ref_image.height, ref_image.width]
)
# There should be an equal number of saliency maps output as the number of
# distinct class confidences input:
assert len(classifier_saliency_map) == len(pertinent_class_indices)
# The shape of the output heatmap should be congruent to the shape of input
# perturbation masks.
assert classifier_saliency_map[0].shape == mask_array[0].shape
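
Generating Detection-based Saliency Heatmaps:

The following sketch mirrors the examples above for the object detection use case. It is illustrative rather than verbatim: detect_images is a hypothetical black-box integration returning, per image, a bounding-box matrix of shape [nDets x 4] and a class-score matrix of shape [nDets x nClasses], and the implementation classes are placeholders.

import PIL.Image
import numpy as np
from xaitk_saliency import PerturbImage
from xaitk_saliency import GenerateDetectorProposalSaliency
from xaitk_saliency.utils.detection import format_detection
from xaitk_saliency.utils.masking import occlude_image_batch
from MyIntegration import detect_images  # type: ignore  # hypothetical


# Pretend we have implementations of the standard interfaces.
class PerturbImageImplementation (PerturbImage):
  ...


class GenerateDetectorProposalSaliencyImplementation (GenerateDetectorProposalSaliency):
  ...


perturb_image = PerturbImageImplementation()
detector_saliency = GenerateDetectorProposalSaliencyImplementation()

...

# Loading reference image from file
ref_image = np.asarray(PIL.Image.open("some/test/image.png"))

# Generate perturbation masks and occluded images
mask_array = perturb_image(ref_image)
perturbed_images = occlude_image_batch(ref_image, mask_array)

# Run the hypothetical black-box detector and convert its outputs into the
# combined [nDets x (4+1+nClasses)] format with the provided helper.
ref_bboxes, ref_scores = detect_images([ref_image])[0]
ref_dets = format_detection(ref_bboxes, ref_scores)

# For perturbed images we assume a fixed number of proposals (nProps) per
# image so the results stack into a [nMasks x nProps x (4+1+nClasses)] array.
perturbed_dets = np.asarray([
  format_detection(bboxes, scores)
  for bboxes, scores in detect_images(perturbed_images)
])

detector_saliency_map = detector_saliency(
  ref_dets,
  perturbed_dets,
  mask_array,
)
# One saliency heatmap per input reference detection, shape [nDets x H x W].
assert len(detector_saliency_map) == len(ref_dets)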

References

  1. Dong B, Collins R, Hoogs A. Explainability for Content-Based Image Retrieval. In CVPR Workshops 2019 Jun (pp. 95-98).

  2. Petsiuk V, Das A, Saenko K. RISE: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421. 2018 Jun 19.

  3. Zeiler MD, Fergus R. Visualizing and understanding convolutional networks. arXiv preprint arXiv:1311.2901. 2013.

  4. Petsiuk V, Jain R, Manjunatha V, Morariu VI, Mehra A, Ordonez V, Saenko K. Black-box explanation of object detectors via saliency maps. arXiv preprint arXiv:2006.03204. 2020 Jun 5.