data_structures
Class IVFADC

java.lang.Object
  extended by data_structures.AbstractSearchStructure
      extended by data_structures.IVFADC

public class IVFADC
extends AbstractSearchStructure

This class implements IVFADC (Indexed Very Fast Asymmetric Distance Computation). It can be used for indexing and searching a database according to IVFADC approach.

Author:
Eleftherios Spyromitros-Xioufis

Field Summary
private  long coarseQuantizationTime
           
private  double[][] coarseQuantizer
          The coarse quantizer.
private  gnu.trove.list.array.TByteArrayList[] invertedListByteVectors
          This vector of TByteArrayList objects is used to store the inverted lists which contain indexed vectors.
private  gnu.trove.list.array.TIntArrayList[] invertedListIds
           
private  gnu.trove.list.array.TShortArrayList[] invertedListShortVectors
           
private  com.sleepycat.je.Database ivfadcBDB
          Berkeley db for persistence of the IVFADC index
private  int numCoarseCentroids
          Number of centroids in the coarse quantizer
private  int numProductCentroids
          The number of centroids used to quantize each sub-vector.
private  int numSubVectors
          The number of sub-vectors used in this product quantization.
private  long persistentIndexUpdateTime
           
private  long productQuantizationTime
           
private  double[][][] productQuantizer
          The sub-quantizers.
private  boolean randomBeforeProductQuantization
          Whether to apply random orthogonal transformation on the residuals prior to product quantization.
private  RandomOrthogonalTransformation randomTransform
          This object is used for applying random orthogonal transformation prior to product quantization.
private  int subVectorLength
          The length of each subvector of the residual vectors.
private  int w
          The number of lists to visit.
 
Fields inherited from class data_structures.AbstractSearchStructure
cachePercent, cacheSize, countSizeOnLoad, dbEnv, idToNameBDB, idToNameMappings, loadCounter, maxNumVectors, nameToIdBDB, numIndexingThreads, readOnly, syncRate, transactional, vectorLength
 
Constructor Summary
IVFADC(int vectorLength, int numCoarseCentroids, int subVectorLength, int numProductCentroids, boolean randomTransformation, int loadCounter, int maxNumVectors, java.lang.String BDBEnvHome, boolean readOnly, boolean countSizeOnLoad)
          Advanced constructor.
IVFADC(int vectorLength, int numCoarseCentroids, int subVectorLength, int numProductCentroids, boolean randomTransformation, int maxNumVectors, java.lang.String BDBEnvHome)
          Simple constructor.
 
Method Summary
 void closeInternal()
          Each subclass should implement this method to close the BDB databases that it uses.
private  com.aliasi.util.BoundedPriorityQueue<Result> computeKNN_IVFADC(int k, double[] query, int w)
           
private  double[][] computeLookupADC(double[] queryVector)
          Takes a query residual vector as input and returns a look-up table containing the distance between each sub-vector of the query vector from each centroid of each sub-quantizer.
private  int computeNearestCoarseIndex(double[] vector)
          Finds and returns the index of the coarse quantizer's centroid which is closer to the given vector.
protected  int[] computeNearestCoarseIndices(double[] vector, int k)
          Returns the indices of the k coarse centroids which are closer to the given vector.
protected  com.aliasi.util.BoundedPriorityQueue<Result> computeNearestNeighborsInternal(int k, double[] query)
          This method returns a bounded priority queue of Result objects, corresponding to the k nearest neighbors' ids and their distances from the query vector, ordered by lowest distance.
private  int computeNearestProductIndex(double[] subvector, int subQuantizerIndex)
          Finds and returns the index of the centroid of the subquantizer with the given index which is closer to the given subvector.
private  double[] computeResidualVector(double[] vector, int centroidIndex)
           
 double[][] getCoarseQuantizer()
           
 double[][][] getProductQuantizer()
           
 void indexVectorInternal(int id, double[] vector, com.sleepycat.je.Transaction txn)
          Update the index with the given VLAD vector.Depending on how the coarse and product quantizers are learned, we should apply the appropriate transformations in each step.
 void loadCoarseQuantizer(java.lang.String coarseQuantizerFile)
          Load the coarse quantizers from a given file
private  void loadIndexInMemory()
          The previous example shows how to scan through the records in your database sequentially; that is, in the record's sort order.
 void loadProductQuantizer(java.lang.String filname)
          Load the product quantizer from a given file
 void outputIndexingTimesInternal()
          This method can be called to output indexing time measurements.
 void outputItemsPerList()
           
 void setCoarseQuantizer(double[][] coarseQuantizer)
          With this method we initialize the table holding the coarse quantizers.
 void setProductQuantizer(double[][][] productQuantizer)
          With this method we initialize the table holding the product quantizer.
 void setW(int w)
           
private  void updatePersistentIndex(int id, int listId, byte[] code, com.sleepycat.je.Transaction txn)
           
private  void updatePersistentIndex(int id, int listId, short[] code, com.sleepycat.je.Transaction txn)
           
 
Methods inherited from class data_structures.AbstractSearchStructure
close, computeNearestNeighbors, createOrOpenBDBEnvAndDbs, getExternalId, getInternalId, getLoadCounter, indexVector, isIndexed, outputIndexingTimes
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

coarseQuantizationTime

private long coarseQuantizationTime

productQuantizationTime

private long productQuantizationTime

persistentIndexUpdateTime

private long persistentIndexUpdateTime

coarseQuantizer

private double[][] coarseQuantizer
The coarse quantizer. A two dimensional table storing the coarse-quantizers. The 1st dimension goes from 1...numCoarseCentroids and indexes the centroids of the coarse quantizer. The 2nd dimension goes from 1...vectorLength and indexes the dimensions of each centroid.


randomBeforeProductQuantization

private boolean randomBeforeProductQuantization
Whether to apply random orthogonal transformation on the residuals prior to product quantization.


productQuantizer

private double[][][] productQuantizer
The sub-quantizers. They are needed for performing kNN queries using the ADC approach. A three dimensional table storing the sub-quantizers of the product quantizer. The first dimension goes from 1...numSubquantizers and indexes the sub-quantizers. The second dimension goes from 1...numProductCentroids and indexes the centroids of each sub-quantizer of the product quantizer. The third dimension goes from 1...subVectorLength and indexes the dimensions of each centroid of each sub-quantizer of the product quantizer.


numSubVectors

private int numSubVectors
The number of sub-vectors used in this product quantization.


numProductCentroids

private int numProductCentroids
The number of centroids used to quantize each sub-vector. (Depending on this number we can use a different type for storing the quantization code of each sub-vector. E.g. For k=256=2^8 centroids we need 8 bits, for k=1024=2^10 we need 10 bits. Currently we use a short for the quantization sub-codes, so 16 bits)


numCoarseCentroids

private int numCoarseCentroids
Number of centroids in the coarse quantizer


subVectorLength

private int subVectorLength
The length of each subvector of the residual vectors.


w

private int w
The number of lists to visit.


randomTransform

private RandomOrthogonalTransformation randomTransform
This object is used for applying random orthogonal transformation prior to product quantization.


invertedListByteVectors

private gnu.trove.list.array.TByteArrayList[] invertedListByteVectors
This vector of TByteArrayList objects is used to store the inverted lists which contain indexed vectors. All lists are constructed with an equal initial capacity which is equal to the maximum number of vectors that we want to index, divided by the number of lists. This is done in order to avoid list expansions which can increase the indexing time.


invertedListShortVectors

private gnu.trove.list.array.TShortArrayList[] invertedListShortVectors

invertedListIds

private gnu.trove.list.array.TIntArrayList[] invertedListIds

ivfadcBDB

private com.sleepycat.je.Database ivfadcBDB
Berkeley db for persistence of the IVFADC index

Constructor Detail

IVFADC

public IVFADC(int vectorLength,
              int numCoarseCentroids,
              int subVectorLength,
              int numProductCentroids,
              boolean randomTransformation,
              int loadCounter,
              int maxNumVectors,
              java.lang.String BDBEnvHome,
              boolean readOnly,
              boolean countSizeOnLoad)
       throws java.lang.Exception
Advanced constructor.

Parameters:
vectorLength - The dimensionality of the vectors.
numCoarseCentroids -
subVectorLength - The subvector length.
numProductCentroids - The number of product centroids.
randomTransformation - Whether to perform random transformation.
loadCounter - The initial value of the loadCounter.
maxNumVectors - The maximum number of vectors.
BDBEnvHome - The BDB environment home directory.
readOnly - If true the persistent store will opened only for read access (allows multiple opens).
countSizeOnLoad - Whether the loadCounter will be initialized by the size of the persistent store.
Throws:
java.lang.Exception

IVFADC

public IVFADC(int vectorLength,
              int numCoarseCentroids,
              int subVectorLength,
              int numProductCentroids,
              boolean randomTransformation,
              int maxNumVectors,
              java.lang.String BDBEnvHome)
       throws java.lang.Exception
Simple constructor.

Parameters:
vectorLength -
numCoarseCentroids -
subVectorLength -
numProductCentroids -
randomTransformation -
maxNumVectors -
BDBEnvHome -
Throws:
java.lang.Exception
Method Detail

computeKNN_IVFADC

private com.aliasi.util.BoundedPriorityQueue<Result> computeKNN_IVFADC(int k,
                                                                       double[] query,
                                                                       int w)
                                                                throws java.lang.Exception
Parameters:
k - the number of nearest neighbors to return
query - the non quantized query vector
w - the number of list to be searched for neighbors (equal to the number of nearest coarse centroids where the query vector is assigned)
Returns:
Throws:
java.lang.Exception

computeLookupADC

private double[][] computeLookupADC(double[] queryVector)
Takes a query residual vector as input and returns a look-up table containing the distance between each sub-vector of the query vector from each centroid of each sub-quantizer. The calculation of this look-up table requires numSubquantizers*kSubquantizer*subVectorLength multiplications. After this calculation, the distance between the query and any vector in the database can be computed in constant time (with numSubquantizers additions).

Parameters:
queryVector - (residual)
Returns:
a lookup table of size numSubquantizers * kSubquantizer with the distance of each su-bvector from the kSubquantizer centroids of each sub-quantizer

computeNearestCoarseIndex

private int computeNearestCoarseIndex(double[] vector)
Finds and returns the index of the coarse quantizer's centroid which is closer to the given vector.

Parameters:
vector -
Returns:

computeNearestCoarseIndices

protected int[] computeNearestCoarseIndices(double[] vector,
                                            int k)
Returns the indices of the k coarse centroids which are closer to the given vector. Fast implementation with a bounded priority queue.

Parameters:
vector -
k -
Returns:

computeNearestNeighborsInternal

protected com.aliasi.util.BoundedPriorityQueue<Result> computeNearestNeighborsInternal(int k,
                                                                                       double[] query)
                                                                                throws java.lang.Exception
Description copied from class: AbstractSearchStructure
This method returns a bounded priority queue of Result objects, corresponding to the k nearest neighbors' ids and their distances from the query vector, ordered by lowest distance. A subclass specific implementation.

Specified by:
computeNearestNeighborsInternal in class AbstractSearchStructure
Parameters:
k - The number of nearest neighbors to return.
query - The query vector.
Returns:
A bounded priority queue of vector objects.
Throws:
java.lang.Exception

computeNearestProductIndex

private int computeNearestProductIndex(double[] subvector,
                                       int subQuantizerIndex)
Finds and returns the index of the centroid of the subquantizer with the given index which is closer to the given subvector.

Parameters:
subvector -
subQuantizerIndex -
Returns:

computeResidualVector

private double[] computeResidualVector(double[] vector,
                                       int centroidIndex)
                                throws java.lang.Exception
Throws:
java.lang.Exception

getCoarseQuantizer

public double[][] getCoarseQuantizer()

getProductQuantizer

public double[][][] getProductQuantizer()

indexVectorInternal

public void indexVectorInternal(int id,
                                double[] vector,
                                com.sleepycat.je.Transaction txn)
                         throws java.lang.Exception
Update the index with the given VLAD vector.Depending on how the coarse and product quantizers are learned, we should apply the appropriate transformations in each step.

Specified by:
indexVectorInternal in class AbstractSearchStructure
Parameters:
id -
vector -
txn - A transaction handle.
Throws:
java.lang.Exception

outputIndexingTimesInternal

public void outputIndexingTimesInternal()
This method can be called to output indexing time measurements.

Specified by:
outputIndexingTimesInternal in class AbstractSearchStructure

outputItemsPerList

public void outputItemsPerList()

loadCoarseQuantizer

public void loadCoarseQuantizer(java.lang.String coarseQuantizerFile)
                         throws java.io.IOException
Load the coarse quantizers from a given file

Parameters:
filname -
Throws:
java.io.IOException

loadProductQuantizer

public void loadProductQuantizer(java.lang.String filname)
                          throws java.io.IOException
Load the product quantizer from a given file

Parameters:
filname -
Throws:
java.io.IOException

setCoarseQuantizer

public void setCoarseQuantizer(double[][] coarseQuantizer)
With this method we initialize the table holding the coarse quantizers.

Parameters:
coarseQuantizer -

setProductQuantizer

public void setProductQuantizer(double[][][] productQuantizer)
With this method we initialize the table holding the product quantizer.

Parameters:
productQuantizer -

setW

public void setW(int w)

loadIndexInMemory

private void loadIndexInMemory()
                        throws java.lang.Exception
The previous example shows how to scan through the records in your database sequentially; that is, in the record's sort order. This is mostly determined by the value contained in the records' keys (additional sorting is required in the case of duplicate records). However, you can use cursors to retrieve records based on how they are stored on disk. This can improve retrieval times, and is useful if your application needs to scan all the records in the database quickly, without concern for key sort order. You do this using the DiskOrderedCursor class.

Throws:
java.lang.Exception

updatePersistentIndex

private void updatePersistentIndex(int id,
                                   int listId,
                                   byte[] code,
                                   com.sleepycat.je.Transaction txn)

updatePersistentIndex

private void updatePersistentIndex(int id,
                                   int listId,
                                   short[] code,
                                   com.sleepycat.je.Transaction txn)

closeInternal

public void closeInternal()
Description copied from class: AbstractSearchStructure
Each subclass should implement this method to close the BDB databases that it uses.

Specified by:
closeInternal in class AbstractSearchStructure