data_structures
Class AbstractSearchStructure

java.lang.Object
  extended by data_structures.AbstractSearchStructure
Direct Known Subclasses:
ADC, IVFADC, VladArray

public abstract class AbstractSearchStructure
extends java.lang.Object

This abstract class abstracts the bdb environment creation and name to id mapping operations from the actual indexing structures. In the future we can support multi-threaded indexing and querying.

Author:
Eleftherios Spyromitros-Xioufis

Field Summary
protected  int cachePercent
          The percentage of the total memory given to the program that will be used as the BDB cache.
protected  int cacheSize
          The total memory given to the program that will be used as the BDB cache.
protected  boolean countSizeOnLoad
          Whether the load counter should be initialized when the database is loaded by counting the id-name mappings.
protected  com.sleepycat.je.Environment dbEnv
          The database environment.
protected  com.sleepycat.je.Database idToNameBDB
          Berkeley db holding id to name mappings, required for name look-up during nearest neighbor search.
protected  int idToNameMappings
          Counts the number of persisted id to name mappings.
protected  int loadCounter
          Keeps track of the total number of indexed vectors, acts as an auto-increment primary key field.
protected  int maxNumVectors
          The maximum number of vectors that can be indexed.
protected  com.sleepycat.je.Database nameToIdBDB
          Berkeley db holding name to id mappings, required during indexing to fast check if a name is already indexed.
protected  int numIndexingThreads
          The number of threads to use during indexing.
protected  boolean readOnly
          Whether the db should open only for read access.
protected  int syncRate
          The rate at which the disk-based indexed should be updated.
private  long totalInternalVectorIndexingTime
          Average time taken for internal vector indexing operations.
private  long totalNameMappingTime
          Average time taken to create a name to id and the reverse mapping.
private  long totalVectorIndexingTime
          Average total time taken to index a vector.
protected  boolean transactional
          Whether the environment will be transactional.
protected  int vectorLength
          The length of the raw vectors being indexed.
 
Constructor Summary
AbstractSearchStructure(int vectorLength, int loadCounter, int maxNumVectors, boolean readOnly, boolean countSizeOnLoad)
          This constructor can be used when we do not want persistence.
 
Method Summary
 void close()
          This method closes the BDB environment and databases!
protected abstract  void closeInternal()
          Each subclass should implement this method to close the BDB databases that it uses.
 Answer computeNearestNeighbors(int k, double[] queryVector)
          This method returns an array of Result objects, corresponding to the k nearest neighbors' ids, names and their distances from the query vector, ordered by lowest distance.
protected abstract  com.aliasi.util.BoundedPriorityQueue<Result> computeNearestNeighborsInternal(int k, double[] queryVector)
          This method returns a bounded priority queue of Result objects, corresponding to the k nearest neighbors' ids and their distances from the query vector, ordered by lowest distance.
private  void createMapping(int id, java.lang.String name, com.sleepycat.je.Transaction txn)
          This method is used to create a persistent mapping between a given id and name.
private  void createOrOpenBDBDbs()
          This method creates and/or opens the BDB dbs.
private  void createOrOpenBDBEnv(java.lang.String BDBEnvHome)
          This method creates and/or opens the BDB environment in the supplied directory.
protected  void createOrOpenBDBEnvAndDbs(java.lang.String BDBEnvHome)
          This method creates or opens (if it already exists) the BDB environment and the BDB dbs.
 java.lang.String getExternalId(int id, com.sleepycat.je.Transaction txn)
          This method returns the name or other external unique identifier of the image which was assigned the given internal id.
 int getInternalId(java.lang.String name, com.sleepycat.je.Transaction txn)
          This method returns the internal id assigned in the image with the given name or other external unique identifier.
 int getLoadCounter()
          Returns the current value of the loadCounter.
 boolean indexVector(java.lang.String name, double[] vector)
          Updates the index with the given vector.
protected abstract  void indexVectorInternal(int id, double[] vector, com.sleepycat.je.Transaction txn)
          This method should be implemented in all subclasses and do the operations required for indexing the given vector and id.
 boolean isIndexed(java.lang.String name, com.sleepycat.je.Transaction txn)
          Checks if the vector with the given name is already indexed.
 void outputIndexingTimes()
          This method can be called to output indexing time measurements.
protected abstract  void outputIndexingTimesInternal()
          Index specific time measurements.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

numIndexingThreads

protected int numIndexingThreads
The number of threads to use during indexing. Currently not used.


syncRate

protected int syncRate
The rate at which the disk-based indexed should be updated. Currently, this field is not used, the index is updated for every new vector.


nameToIdBDB

protected com.sleepycat.je.Database nameToIdBDB
Berkeley db holding name to id mappings, required during indexing to fast check if a name is already indexed.


idToNameBDB

protected com.sleepycat.je.Database idToNameBDB
Berkeley db holding id to name mappings, required for name look-up during nearest neighbor search.


dbEnv

protected com.sleepycat.je.Environment dbEnv
The database environment. Access to this field is needed by specific data structures that implement persistence. If this field is null it means that persistence is not implemented.


transactional

protected final boolean transactional
Whether the environment will be transactional. For more information on what this means, refer to the BDB documentation.

See Also:
Constant Field Values

cachePercent

protected final int cachePercent
The percentage of the total memory given to the program that will be used as the BDB cache.

See Also:
Constant Field Values

cacheSize

protected final int cacheSize
The total memory given to the program that will be used as the BDB cache.

See Also:
Constant Field Values

idToNameMappings

protected int idToNameMappings
Counts the number of persisted id to name mappings. This value can be different from loadCounter when maxNumVectors is smaller than the number of persisted vectors.


vectorLength

protected int vectorLength
The length of the raw vectors being indexed.


maxNumVectors

protected final int maxNumVectors
The maximum number of vectors that can be indexed. An exception is thrown when this number is reached.


loadCounter

protected int loadCounter
Keeps track of the total number of indexed vectors, acts as an auto-increment primary key field.


totalNameMappingTime

private long totalNameMappingTime
Average time taken to create a name to id and the reverse mapping.


totalInternalVectorIndexingTime

private long totalInternalVectorIndexingTime
Average time taken for internal vector indexing operations.


totalVectorIndexingTime

private long totalVectorIndexingTime
Average total time taken to index a vector.


countSizeOnLoad

protected boolean countSizeOnLoad
Whether the load counter should be initialized when the database is loaded by counting the id-name mappings. This count incurs a large cost when loading very large databases and can be set to false for efficiency reasons.


readOnly

protected boolean readOnly
Whether the db should open only for read access.

Constructor Detail

AbstractSearchStructure

public AbstractSearchStructure(int vectorLength,
                               int loadCounter,
                               int maxNumVectors,
                               boolean readOnly,
                               boolean countSizeOnLoad)
This constructor can be used when we do not want persistence.

Parameters:
vectorLength - The dimensionality of the VLAD vectors being indexed.
loadCounter - The initial value of the loadCounter.
maxNumVectors - The maximum number of vectors which we allow to be indexed.
readOnly - If true the persistent store will opened only for read access (allows multiple opens).
countSizeOnLoad - Whether the loadCounter will be initialized by the size of the persistent store.
Method Detail

getLoadCounter

public int getLoadCounter()
Returns the current value of the loadCounter.

Returns:

createOrOpenBDBEnvAndDbs

protected void createOrOpenBDBEnvAndDbs(java.lang.String BDBEnvHome)
                                 throws java.lang.Exception
This method creates or opens (if it already exists) the BDB environment and the BDB dbs.

Parameters:
BDBEnvHome - The directory where the BDB environment will be created.
Throws:
java.lang.Exception

createOrOpenBDBEnv

private void createOrOpenBDBEnv(java.lang.String BDBEnvHome)
                         throws java.lang.Exception
This method creates and/or opens the BDB environment in the supplied directory. We can tune the configuration in this method for better efficiency / less persistence!

Parameters:
BDBEnvHome - The directory where the BDB environment will be created.
Throws:
java.lang.Exception

createOrOpenBDBDbs

private void createOrOpenBDBDbs()
                         throws java.lang.Exception
This method creates and/or opens the BDB dbs.

Throws:
java.lang.Exception

close

public void close()
           throws java.lang.Exception
This method closes the BDB environment and databases!

Throws:
java.lang.Exception

closeInternal

protected abstract void closeInternal()
Each subclass should implement this method to close the BDB databases that it uses.


isIndexed

public boolean isIndexed(java.lang.String name,
                         com.sleepycat.je.Transaction txn)
Checks if the vector with the given name is already indexed. This method is useful to avoid re-indexing the same vector. Its convention is that if the given name is already in nameToIdBDB, then the image is indexed in all other structures e.g. IdToNameBDB. The rest of the checks are avoided for better efficiency.

Parameters:
name - The name the vector to be indexed.
txn - A transaction handle.
Returns:
true if the given name is already indexed

getInternalId

public int getInternalId(java.lang.String name,
                         com.sleepycat.je.Transaction txn)
                  throws java.lang.Exception
This method returns the internal id assigned in the image with the given name or other external unique identifier.

Parameters:
name - The name or other external unique identifier of the image.
txn - A transaction handle.
Returns:
The internal id assigned to this image.
Throws:
java.lang.Exception

createMapping

private void createMapping(int id,
                           java.lang.String name,
                           com.sleepycat.je.Transaction txn)
This method is used to create a persistent mapping between a given id and name. Should be called every time that a new vector is going to be indexed.

Parameters:
id - The assigned id.
name - The name.
txn - A transaction handle.

getExternalId

public java.lang.String getExternalId(int id,
                                      com.sleepycat.je.Transaction txn)
                               throws java.lang.Exception
This method returns the name or other external unique identifier of the image which was assigned the given internal id.

Parameters:
id - The id of the image.
txn - A transaction handle.
Returns:
The name or other external unique identifier mapped to the given id.
Throws:
java.lang.Exception

computeNearestNeighborsInternal

protected abstract com.aliasi.util.BoundedPriorityQueue<Result> computeNearestNeighborsInternal(int k,
                                                                                                double[] queryVector)
                                                                                         throws java.lang.Exception
This method returns a bounded priority queue of Result objects, corresponding to the k nearest neighbors' ids and their distances from the query vector, ordered by lowest distance. A subclass specific implementation.

Parameters:
k - The number of nearest neighbors to return.
queryVector - The query vector.
Returns:
A bounded priority queue of vector objects.
Throws:
java.lang.Exception

computeNearestNeighbors

public Answer computeNearestNeighbors(int k,
                                      double[] queryVector)
                               throws java.lang.Exception
This method returns an array of Result objects, corresponding to the k nearest neighbors' ids, names and their distances from the query vector, ordered by lowest distance. The methods calls computeNearestNeighborsInternal and then reads the mappings of ids to names.

Parameters:
k - The number of nearest neighbors to return.
queryVector - The query vector.
Returns:
An array of Result objects.
Throws:
java.lang.Exception

indexVector

public boolean indexVector(java.lang.String name,
                           double[] vector)
                    throws java.lang.Exception
Updates the index with the given vector. This is a synchronized method.

Parameters:
name - The name of the vector.
vector - The vector.
Returns:
True if the vector is successfully indexed, false otherwise.
Throws:
java.lang.Exception

indexVectorInternal

protected abstract void indexVectorInternal(int id,
                                            double[] vector,
                                            com.sleepycat.je.Transaction txn)
                                     throws java.lang.Exception
This method should be implemented in all subclasses and do the operations required for indexing the given vector and id.

Parameters:
id - The id of the vector to be indexed.
vector - The vector to be indexed.
txn - A transaction handle.
Throws:
java.lang.Exception

outputIndexingTimes

public void outputIndexingTimes()
This method can be called to output indexing time measurements.


outputIndexingTimesInternal

protected abstract void outputIndexingTimesInternal()
Index specific time measurements.

Parameters:
loadCounter -