data_structures
Class ADC

java.lang.Object
  extended by data_structures.AbstractSearchStructure
      extended by data_structures.ADC

public class ADC
extends AbstractSearchStructure

This class implements ADC (Asymmetric Distance Computation). It can be used for indexing and searching a database according to ADC approach.

Author:
Eleftherios Spyromitros-Xioufis

Field Summary
private  com.sleepycat.je.Database adcBDB
          Berkeley db for persistence of the ADC index
private  gnu.trove.list.array.TIntArrayList ids
          The ids of the corresponding images are stored in this list.
private  int numProductCentroids
          The number of centroids used to quantize each sub-vector.
private  int numSubVectors
          The number of sub-vectors used in this product quantization.
private  long persistentIndexUpdateTime
           
private  gnu.trove.list.array.TByteArrayList pqByteCodes
          The product-quantization codes for all vector are stored in this list if the code can fit in the byte range.
private  gnu.trove.list.array.TShortArrayList pqShortCodes
          The product-quantization codes for all vector are stored in this list if the code cannot fit in the byte range.
private  long productQuantizationTime
           
private  double[][][] productQuantizer
          The sub-quantizers.
private  RandomOrthogonalTransformation randomTransform
          This object is used for applying random orthogonal transformation prior to product quantization.
private  boolean randomTransformation
          Whether to apply random orthogonal transformation on the vectors prior to product quantization.
private  int subVectorLength
          The length of each subvector.
 
Fields inherited from class data_structures.AbstractSearchStructure
cachePercent, cacheSize, countSizeOnLoad, dbEnv, idToNameBDB, idToNameMappings, loadCounter, maxNumVectors, nameToIdBDB, numIndexingThreads, readOnly, syncRate, transactional, vectorLength
 
Constructor Summary
ADC(int vectorLength, int subVectorLength, int numProductCentroids, boolean randomTransformation, int maxNumVectors, int loadCounter, java.lang.String BDBEnvHome, boolean readOnly, boolean countSizeOnLoad)
          Advanced constructor.
ADC(int vectorLength, int subVectorLength, int numProductCentroids, boolean randomTransformation, int maxNumVectors, java.lang.String BDBEnvHome)
          Simple constructor.
 
Method Summary
 void closeInternal()
          Each subclass should implement this method to close the BDB databases that it uses.
private  com.aliasi.util.BoundedPriorityQueue<Result> computeKNN_ADC(int k, double[] queryVector)
           
 double[][] computeLookupADC(double[] queryVector)
          Takes a query vector as input and returns a look-up table containing the distance between each sub-vector of the query vector from each centroid of each sub-quantizer.
protected  com.aliasi.util.BoundedPriorityQueue<Result> computeNearestNeighborsInternal(int k, double[] query)
          This method returns a bounded priority queue of Result objects, corresponding to the k nearest neighbors' ids and their distances from the query vector, ordered by lowest distance.
private  int computeNearestProductIndex(double[] subvector, int subQuantizerIndex)
          Finds and returns the index of the centroid of the subquantizer with the given index which is closer to the given subvector.
protected  void indexVectorInternal(int id, double[] vector, com.sleepycat.je.Transaction txn)
          Append the pqCodes and ids arrays with the code and the id of the given vector.
private  void loadIndexInMemory()
          The previous example shows how to scan through the records in your database sequentially; that is, in the record's sort order.
 void loadProductQuantizer(java.lang.String filname)
          Load the product quantizer from a given file
 void outputIndexingTimesInternal()
          This method can be called to output indexing time measurements.
 java.lang.String toString()
           
private  void updatePersistentIndex(int id, byte[] code, com.sleepycat.je.Transaction txn)
           
private  void updatePersistentIndex(int id, short[] code, com.sleepycat.je.Transaction txn)
           
 
Methods inherited from class data_structures.AbstractSearchStructure
close, computeNearestNeighbors, createOrOpenBDBEnvAndDbs, getExternalId, getInternalId, getLoadCounter, indexVector, isIndexed, outputIndexingTimes
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

productQuantizationTime

private long productQuantizationTime

persistentIndexUpdateTime

private long persistentIndexUpdateTime

productQuantizer

private double[][][] productQuantizer
The sub-quantizers. They are needed for performing kNN queries using the ADC approach. A three dimensional table storing the sub-quantizers of the product quantizer. The first dimension goes from 1...numSubquantizers and indexes the sub-quantizers. The second dimension goes from 1...numProductCentroids and indexes the centroids of each sub-quantizer of the product quantizer. The third dimension goes from 1...subVectorLength and indexes the dimensions of each centroid of each sub-quantizer of the product quantizer.


numSubVectors

private int numSubVectors
The number of sub-vectors used in this product quantization.


numProductCentroids

private int numProductCentroids
The number of centroids used to quantize each sub-vector. (Depending on this number we can use a different type for storing the quantization code of each sub-vector. E.g. For k=256=2^8 centroids we need 8 bits, for k=1024=2^10 we need 10 bits. Currently we use a short for the quantization sub-codes, so 16 bits)


subVectorLength

private int subVectorLength
The length of each subvector.


pqByteCodes

private gnu.trove.list.array.TByteArrayList pqByteCodes
The product-quantization codes for all vector are stored in this list if the code can fit in the byte range.


pqShortCodes

private gnu.trove.list.array.TShortArrayList pqShortCodes
The product-quantization codes for all vector are stored in this list if the code cannot fit in the byte range.


ids

private gnu.trove.list.array.TIntArrayList ids
The ids of the corresponding images are stored in this list.


randomTransformation

private boolean randomTransformation
Whether to apply random orthogonal transformation on the vectors prior to product quantization.


randomTransform

private RandomOrthogonalTransformation randomTransform
This object is used for applying random orthogonal transformation prior to product quantization.


adcBDB

private com.sleepycat.je.Database adcBDB
Berkeley db for persistence of the ADC index

Constructor Detail

ADC

public ADC(int vectorLength,
           int subVectorLength,
           int numProductCentroids,
           boolean randomTransformation,
           int maxNumVectors,
           int loadCounter,
           java.lang.String BDBEnvHome,
           boolean readOnly,
           boolean countSizeOnLoad)
    throws java.lang.Exception
Advanced constructor.

Parameters:
vectorLength - The dimensionality of the vectors.
subVectorLength - The subvector length.
numProductCentroids - The number of product centroids.
randomTransformation - Whether to perform random transformation.
maxNumVectors - The maximum number of vectors.
loadCounter - The initial value of the loadCounter.
BDBEnvHome - The BDB environment home directory.
readOnly - If true the persistent store will opened only for read access (allows multiple opens).
countSizeOnLoad - Whether the loadCounter will be initialized by the size of the persistent store.
Throws:
java.lang.Exception

ADC

public ADC(int vectorLength,
           int subVectorLength,
           int numProductCentroids,
           boolean randomTransformation,
           int maxNumVectors,
           java.lang.String BDBEnvHome)
    throws java.lang.Exception
Simple constructor.

Parameters:
vectorLength -
subVectorLength -
numProductCentroids -
randomTransformation -
maxNumVectors -
BDBEnvHome -
Throws:
java.lang.Exception
Method Detail

loadProductQuantizer

public void loadProductQuantizer(java.lang.String filname)
                          throws java.io.IOException
Load the product quantizer from a given file

Parameters:
filname -
Throws:
java.io.IOException

loadIndexInMemory

private void loadIndexInMemory()
                        throws java.lang.Exception
The previous example shows how to scan through the records in your database sequentially; that is, in the record's sort order. This is mostly determined by the value contained in the records' keys (additional sorting is required in the case of duplicate records). However, you can use cursors to retrieve records based on how they are stored on disk. This can improve retrieval times, and is useful if your application needs to scan all the records in the database quickly, without concern for key sort order. You do this using the DiskOrderedCursor class.

Throws:
java.lang.Exception

indexVectorInternal

protected void indexVectorInternal(int id,
                                   double[] vector,
                                   com.sleepycat.je.Transaction txn)
                            throws java.lang.Exception
Append the pqCodes and ids arrays with the code and the id of the given vector.

Specified by:
indexVectorInternal in class AbstractSearchStructure
Parameters:
id -
vector -
txn - A transaction handle.
Throws:
java.lang.Exception

updatePersistentIndex

private void updatePersistentIndex(int id,
                                   byte[] code,
                                   com.sleepycat.je.Transaction txn)

updatePersistentIndex

private void updatePersistentIndex(int id,
                                   short[] code,
                                   com.sleepycat.je.Transaction txn)

computeNearestProductIndex

private int computeNearestProductIndex(double[] subvector,
                                       int subQuantizerIndex)
Finds and returns the index of the centroid of the subquantizer with the given index which is closer to the given subvector.

Parameters:
subvector -
subQuantizerIndex -
Returns:

computeLookupADC

public double[][] computeLookupADC(double[] queryVector)
Takes a query vector as input and returns a look-up table containing the distance between each sub-vector of the query vector from each centroid of each sub-quantizer. The calculation of this look-up table requires numSubquantizers*kSubquantizer*subVectorLength multiplications. After this calculation, the distance between the query and any vector in the database can be computed in constant time.

Parameters:
queryVector -
Returns:
a lookup table of size numSubquantizers * kSubquantizer with the distance of each sub-vector from the kSubquantizer centroids of each sub-quantizer

computeNearestNeighborsInternal

protected com.aliasi.util.BoundedPriorityQueue<Result> computeNearestNeighborsInternal(int k,
                                                                                       double[] query)
                                                                                throws java.lang.Exception
Description copied from class: AbstractSearchStructure
This method returns a bounded priority queue of Result objects, corresponding to the k nearest neighbors' ids and their distances from the query vector, ordered by lowest distance. A subclass specific implementation.

Specified by:
computeNearestNeighborsInternal in class AbstractSearchStructure
Parameters:
k - The number of nearest neighbors to return.
query - The query vector.
Returns:
A bounded priority queue of vector objects.
Throws:
java.lang.Exception

computeKNN_ADC

private com.aliasi.util.BoundedPriorityQueue<Result> computeKNN_ADC(int k,
                                                                    double[] queryVector)

closeInternal

public void closeInternal()
Description copied from class: AbstractSearchStructure
Each subclass should implement this method to close the BDB databases that it uses.

Specified by:
closeInternal in class AbstractSearchStructure

outputIndexingTimesInternal

public void outputIndexingTimesInternal()
This method can be called to output indexing time measurements.

Specified by:
outputIndexingTimesInternal in class AbstractSearchStructure

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object