Class: Annoy::AnnoyIndex

Inherits:
Object
Defined in:
lib/annoy.rb

Overview

AnnoyIndex provides approximate k-nearest neighbor search. Its methods mirror Annoy’s Python API (github.com/spotify/annoy#full-python-api).

Examples:

require 'annoy'

index = Annoy::AnnoyIndex.new(n_features: 100, metric: 'euclidean')

5000.times do |item_id|
  item_vec = Array.new(100) { rand - 0.5 }
  index.add_item(item_id, item_vec)
end

index.build(10)

index.get_nns_by_item(0, 100)
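
Because the search is approximate, it can help to compare its results with an exact brute-force search on a small sample. The following is a minimal sketch in plain Ruby, not part of the gem: `exact_nns` and `recall` are hypothetical helper names, and the Euclidean metric is assumed.

```ruby
# Exact (brute-force) Euclidean nearest neighbors; an item's ID is its position in `vectors`.
def exact_nns(vectors, query, n)
  ranked = vectors.each_with_index.map do |vec, item_id|
    [Math.sqrt(vec.zip(query).sum { |a, b| (a - b)**2 }), item_id]
  end
  # Sort by distance (ties broken by ID) and keep the IDs of the n closest items.
  ranked.sort.first(n).map { |_distance, item_id| item_id }
end

# Fraction of the exact neighbors that also appear in the approximate result.
def recall(approx_ids, exact_ids)
  return 1.0 if exact_ids.empty?

  (approx_ids & exact_ids).size.to_f / exact_ids.size
end

vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
exact = exact_nns(vectors, vectors[0], 3) # => [0, 1, 2]
puts recall([0, 2, 3], exact)
```

With a real index, the first argument to `recall` would be the array returned by `index.get_nns_by_item(0, 3)`.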

Instance Attribute Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(n_features:, metric: 'angular', dtype: 'float64') ⇒ AnnoyIndex

Create a new search index.

Parameters:

  • n_features (Integer)

    The number of features (dimensions) of the stored vectors.

  • metric (String) (defaults to: 'angular')

    The distance metric between vectors (‘angular’, ‘dot’, ‘hamming’, ‘euclidean’, or ‘manhattan’).

  • dtype (String) (defaults to: 'float64')

    The data type of the features (‘float64’ or ‘float32’). If metric is ‘hamming’, this argument is automatically set to ‘uint64’.

Raises:

  • (ArgumentError)

    If n_features is not Numeric or metric is not one of the supported values.
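
To make the metric names more concrete, here is a plain-Ruby sketch of three of them. It is an illustration under stated assumptions rather than the gem's implementation: the angular formula follows the definition in the upstream Annoy README, sqrt(2(1 - cos(u, v))), the helper names are hypothetical, and ‘dot’ and ‘hamming’ are left out.

```ruby
# Euclidean distance: square root of the summed squared differences.
def euclidean_distance(u, v)
  Math.sqrt(u.zip(v).sum { |a, b| (a - b)**2 })
end

# Manhattan distance: sum of the absolute differences.
def manhattan_distance(u, v)
  u.zip(v).sum { |a, b| (a - b).abs }
end

# Angular distance, assumed to be sqrt(2 * (1 - cos(u, v))), which equals the
# Euclidean distance between the two vectors after normalizing them to unit length.
# Zero vectors are not handled in this sketch.
def angular_distance(u, v)
  dot = u.zip(v).sum { |a, b| a * b }
  norm_u = Math.sqrt(u.sum { |a| a * a })
  norm_v = Math.sqrt(v.sum { |a| a * a })
  cosine = dot / (norm_u * norm_v)
  # Clamp at zero so rounding error cannot produce a negative radicand.
  Math.sqrt([2.0 * (1.0 - cosine), 0.0].max)
end

p euclidean_distance([0, 0], [3, 4]) # => 5.0
p manhattan_distance([0, 0], [3, 4]) # => 7
```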


# File 'lib/annoy.rb', line 43

def initialize(n_features:, metric: 'angular', dtype: 'float64') # rubocop:disable Metrics/AbcSize, Metrics/CyclomaticComplexity, Metrics/MethodLength
  raise ArgumentError, 'Expect n_features to be Integer.' unless n_features.is_a?(Numeric)

  @n_features = n_features.to_i
  @metric = metric
  @dtype = dtype

  # rubocop:disable Layout/LineLength
  @index = case @metric
           when 'angular'
             @dtype == 'float64' ? AnnoyIndexAngular.new(@n_features) : AnnoyIndexAngularFloat32.new(@n_features)
           when 'dot'
             @dtype == 'float64' ? AnnoyIndexDotProduct.new(@n_features) : AnnoyIndexDotProductFloat32.new(@n_features)
           when 'hamming'
             @dtype = 'uint64'
             AnnoyIndexHamming.new(@n_features)
           when 'euclidean'
             @dtype == 'float64' ? AnnoyIndexEuclidean.new(@n_features) : AnnoyIndexEuclideanFloat32.new(@n_features)
           when 'manhattan'
             @dtype == 'float64' ? AnnoyIndexManhattan.new(@n_features) : AnnoyIndexManhattanFloat32.new(@n_features)
           else
             raise ArgumentError, "No such metric: #{@metric}."
           end
  # rubocop:enable Layout/LineLength
end
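
The case expression above amounts to a lookup from (metric, dtype) to a native index class. The sketch below restates that mapping in plain Ruby so the edge cases are easier to see; `resolve_backend` is a hypothetical helper that returns class names as strings and does not instantiate anything.

```ruby
# Hypothetical helper: returns [class_name, effective_dtype] for the given arguments,
# following the selection logic of the constructor shown above.
def resolve_backend(metric, dtype)
  # 'hamming' ignores the requested dtype and always uses 'uint64'.
  return ['AnnoyIndexHamming', 'uint64'] if metric == 'hamming'

  backends = {
    'angular' => 'AnnoyIndexAngular',
    'dot' => 'AnnoyIndexDotProduct',
    'euclidean' => 'AnnoyIndexEuclidean',
    'manhattan' => 'AnnoyIndexManhattan'
  }
  base = backends.fetch(metric) { raise ArgumentError, "No such metric: #{metric}." }
  # Any dtype other than 'float64' selects the Float32 variant; dtype itself is not validated.
  [dtype == 'float64' ? base : "#{base}Float32", dtype]
end

p resolve_backend('angular', 'float64') # => ["AnnoyIndexAngular", "float64"]
p resolve_backend('hamming', 'float32') # => ["AnnoyIndexHamming", "uint64"]
```

One consequence visible here is that a misspelled dtype silently selects the float32 variant instead of raising an error.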

Instance Attribute Details

#dtype ⇒ String (readonly)

Returns the data type of the features.

Returns:

  • (String)


# File 'lib/annoy.rb', line 35

def dtype
  @dtype
end

#metric ⇒ String (readonly)

Returns the distance metric of the index.

Returns:

  • (String)


# File 'lib/annoy.rb', line 31

def metric
  @metric
end

#n_features ⇒ Integer (readonly)

Returns the number of features of each indexed item.

Returns:

  • (Integer)


# File 'lib/annoy.rb', line 27

def n_features
  @n_features
end

Instance Method Details

#add_item(i, v) ⇒ Boolean

Add an item to be indexed.

Parameters:

  • i (Integer)

    The ID of the item.

  • v (Array)

    The vector of the item.

Returns:

  • (Boolean)


# File 'lib/annoy.rb', line 74

def add_item(i, v)
  @index.add_item(i, v)
end
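
When adding many items, a small wrapper can check each vector's length against n_features before passing it on. This is a sketch only: `add_all` is a hypothetical helper, and `index` is assumed to respond to #n_features and #add_item as documented on this page.

```ruby
# Hypothetical helper: adds every vector to `index`, using the array position as the item ID.
# Returns the number of items added.
def add_all(index, vectors)
  vectors.each_with_index do |vec, item_id|
    unless vec.size == index.n_features
      raise ArgumentError, "item #{item_id} has #{vec.size} features, expected #{index.n_features}"
    end

    index.add_item(item_id, vec)
  end
  vectors.size
end
```

Failing early with the offending item ID tends to be easier to debug than an error raised from the native extension.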

#build(n_trees, n_jobs: -1) ⇒ Boolean

Build a forest of index trees. After building, no more items can be added.

The n_jobs parameter takes effect only if “-DANNOYLIB_MULTITHREADED_BUILD” is specified on gem installation.

Parameters:

  • n_trees (Integer)

    The number of trees. More trees give higher search precision.

  • n_jobs (Integer) (defaults to: -1)

    The number of threads used to build the trees. If -1 is given, all available CPU cores are used.

Returns:

  • (Boolean)


# File 'lib/annoy.rb', line 84

def build(n_trees, n_jobs: -1)
  @index.build(n_trees, n_jobs)
end

#get_distance(i, j) ⇒ Float or Integer

Calculate the distance between two items.

Parameters:

  • i (Integer)

    The ID of the first item.

  • j (Integer)

    The ID of the second item.

Returns:

  • (Float or Integer)


# File 'lib/annoy.rb', line 147

def get_distance(i, j)
  @index.get_distance(i, j)
end
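
Since get_distance works on one pair of IDs at a time, a pairwise table has to be assembled by the caller. A minimal sketch, assuming `index` responds to #get_distance as documented above; `distance_matrix` is a hypothetical helper name.

```ruby
# Hypothetical helper: returns a nested Array where row r, column c holds
# the distance between item_ids[r] and item_ids[c].
def distance_matrix(index, item_ids)
  item_ids.map do |i|
    item_ids.map { |j| index.get_distance(i, j) }
  end
end
```

This makes one call per pair, so it is only practical for small sets of IDs.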

#get_item(i) ⇒ Array

Return the vector of the item with the given ID.

Parameters:

  • i (Integer)

    The ID of the item.

Returns:

  • (Array)


# File 'lib/annoy.rb', line 138

def get_item(i)
  @index.get_item(i)
end

#get_nns_by_item(i, n, search_k: -1, include_distances: false) ⇒ Array<Integer> or Array<Array<Integer>, Array<Float>>

Search for the n items closest to the item with the given ID.

Parameters:

  • i (Integer)

    The ID of the query item.

  • n (Integer)

    The number of nearest neighbors.

  • search_k (Integer) (defaults to: -1)

    The maximum number of nodes inspected during the search. If -1 is given, it is set to n * n_trees.

  • include_distances (Boolean) (defaults to: false)

    The flag indicating whether to return the corresponding distances as well.

Returns:

  • (Array<Integer> or Array<Array<Integer>, Array<Float>>)


# File 'lib/annoy.rb', line 119

def get_nns_by_item(i, n, search_k: -1, include_distances: false)
  @index.get_nns_by_item(i, n, search_k, include_distances)
end
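
With include_distances: true, the documented return type is a pair of parallel arrays (IDs, then distances). The sketch below pairs them up per neighbor. `neighbors_with_distances` is a hypothetical helper, and it assumes the result destructures as [ids, distances], as the return type above suggests.

```ruby
# Hypothetical helper: returns [[neighbor_id, distance], ...] for the n nearest neighbors.
def neighbors_with_distances(index, item_id, n)
  ids, distances = index.get_nns_by_item(item_id, n, include_distances: true)
  ids.zip(distances)
end
```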

#get_nns_by_vector(v, n, search_k: -1, include_distances: false) ⇒ Array<Integer> or Array<Array<Integer>, Array<Float>>

Search for the n items closest to the given query vector.

Parameters:

  • v (Array)

    The query vector.

  • n (Integer)

    The number of nearest neighbors.

  • search_k (Integer) (defaults to: -1)

    The maximum number of nodes inspected during the search. If -1 is given, it is set to n * n_trees.

  • include_distances (Boolean) (defaults to: false)

    The flag indicating whether to return the corresponding distances as well.

Returns:

  • (Array<Integer> or Array<Array<Integer>, Array<Float>>)


# File 'lib/annoy.rb', line 130

def get_nns_by_vector(v, n, search_k: -1, include_distances: false)
  @index.get_nns_by_vector(v, n, search_k, include_distances)
end
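
The default for search_k described above can be written out as a one-line rule. This sketch only restates the documented behavior (-1 means n * n_trees) in plain Ruby; `effective_search_k` is a hypothetical name and is not a method of the gem.

```ruby
# Hypothetical helper: the node budget a search would use, per the parameter description.
def effective_search_k(n, n_trees, search_k: -1)
  search_k == -1 ? n * n_trees : search_k
end

p effective_search_k(100, 10)               # => 1000
p effective_search_k(100, 10, search_k: 50) # => 50
```

For example, asking for 100 neighbors from an index with 10 trees inspects up to 1000 nodes by default; a larger explicit value trades speed for accuracy.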

#load(filename, prefault: false) ⇒ Boolean

Load a search index from disk.

Parameters:

  • filename (String)

    The filename of the search index.

  • prefault (Boolean) (defaults to: false)

    The flag indicating whether to pre-read the entire file into memory.

Returns:

  • (Boolean)


# File 'lib/annoy.rb', line 101

def load(filename, prefault: false)
  @index.load(filename, prefault)
end

#n_items ⇒ Integer

Return the number of items in the search index.

Returns:

  • (Integer)


# File 'lib/annoy.rb', line 153

def n_items
  @index.get_n_items
end

#n_trees ⇒ Integer

Return the number of trees in the search index.

Returns:

  • (Integer)


# File 'lib/annoy.rb', line 159

def n_trees
  @index.get_n_trees
end

#on_disk_build(filename) ⇒ Boolean

Prepare Annoy to build the index in the specified file instead of RAM. Call this method before adding items; there is no need to save after building.

Parameters:

  • filename (String)

    The filename of the search index.

Returns:

  • (Boolean)


# File 'lib/annoy.rb', line 168

def on_disk_build(filename)
  @index.on_disk_build(filename)
end
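
The call order matters for on-disk builds: on_disk_build comes first, then the items, then build, with no save step. A minimal sketch of that sequence, assuming `index` responds to the methods documented on this page; `build_on_disk` is a hypothetical helper name.

```ruby
# Hypothetical helper: builds the index directly in `filename`.
# Item IDs are the positions of the vectors in `vectors`.
def build_on_disk(index, filename, vectors, n_trees)
  # Must be called before any item is added.
  index.on_disk_build(filename)
  vectors.each_with_index { |vec, item_id| index.add_item(item_id, vec) }
  # No save call is needed afterwards; the index already lives in the file.
  index.build(n_trees)
end
```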

#save(filename, prefault: false) ⇒ Boolean

Save the search index to disk. After saving, no more items can be added.

Parameters:

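  • filename (String)

    The filename of the search index.

  • prefault (Boolean) (defaults to: false)

    The flag indicating whether to pre-read the entire file into memory.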

Returns:

  • (Boolean)


# File 'lib/annoy.rb', line 92

def save(filename, prefault: false)
  @index.save(filename, prefault)
end
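
A saved index is typically read back through a new instance created with the same n_features, metric, and dtype, followed by load. The sketch below shows that round trip. `save_and_reload` is a hypothetical helper, and the `index_class` keyword is an assumption added so the helper works with any class that has the constructor and methods documented on this page.

```ruby
# Hypothetical helper: saves `index` to `filename`, then loads the file into a
# fresh instance configured like the original and returns that instance.
def save_and_reload(index, filename, index_class: index.class)
  index.save(filename)
  copy = index_class.new(n_features: index.n_features, metric: index.metric, dtype: index.dtype)
  copy.load(filename)
  copy
end
```

Copying the three attributes from the original avoids loading a file with a mismatched dimensionality or metric.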

#seed(s) ⇒ Object

Set seed for the random number generator.

Parameters:

  • s (Integer)


# File 'lib/annoy.rb', line 182

def seed(s)
  @index.set_seed(s)
end

#unload ⇒ Boolean

Unload the search index.

Returns:

  • (Boolean)


# File 'lib/annoy.rb', line 108

def unload
  @index.unload
end

#verbose(flag) ⇒ Object

Enable or disable verbose mode.

Parameters:

  • flag (Boolean)


# File 'lib/annoy.rb', line 175

def verbose(flag)
  @index.verbose(flag)
end