aerospike.Scan — Scan Class
Deprecated since version 7.0.0: aerospike.Query should be used instead.
Overview
The Scan object is used to return all the records in a specified set (which can be omitted or None). A Scan with a None set returns all the records in the namespace.
The scan is invoked using foreach(), results(), or execute_background(). The bins returned can be filtered using select().
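A minimal sketch of constructing and invoking a scan (the host, namespace, and set names are illustrative):

    import aerospike

    # Hypothetical local cluster; adjust hosts to your environment.
    client = aerospike.client({'hosts': [('127.0.0.1', 3000)]})

    # Scan every record in the 'demo' set of the 'test' namespace.
    scan = client.scan('test', 'demo')

    # Omitting the set (or passing None) would scan the whole namespace instead:
    # scan = client.scan('test', None)

    for key, meta, bins in scan.results():
        print(key, bins)

    client.close()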
See also
Scans and Managing Scans.
Fields
- class aerospike.Scan
- ttl (int)
The time-to-live (expiration) of the record in seconds. Note that ttl is only used on background scan writes.
If this is set to aerospike.TTL_CLIENT_DEFAULT, the scan will use the client's default scan policy ttl. See TTL Constants for special values that can be set in the record ttl.
Default: 0 (no limit)
Note
Requires server version >= 6.0.0
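For example, a minimal sketch of a background scan that writes a bin and sets the record ttl (the bin name and value are illustrative; requires server >= 6.0.0):

    import aerospike
    from aerospike_helpers.operations import operations

    client = aerospike.client({'hosts': [('127.0.0.1', 3000)]})

    scan = client.scan('test', 'demo')
    scan.add_ops([operations.write('touched', 1)])

    # Records written by this background scan will expire in one hour.
    # Set aerospike.TTL_CLIENT_DEFAULT instead to fall back to the client's
    # default scan policy ttl.
    scan.ttl = 3600

    job_id = scan.execute_background()
    client.close()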
Methods
- class aerospike.Scan
Deprecated since version 7.0.0: aerospike.Query should be used instead.
- select(bin1[, bin2[, bin3..]])
Set a filter on the record bins resulting from results() or foreach(). If a selected bin does not exist in a record it will not appear in the bins portion of that record tuple.
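For example (a minimal sketch; the 'name' and 'age' bin names are illustrative), only the selected bins come back with each record:

    import aerospike

    client = aerospike.client({'hosts': [('127.0.0.1', 3000)]})

    scan = client.scan('test', 'demo')

    # Only the 'name' and 'age' bins will appear in each record tuple;
    # records lacking one of these bins simply omit it.
    scan.select('name', 'age')

    for key, meta, bins in scan.results():
        print(bins)

    client.close()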
- apply(module, function[, arguments])
Apply a record UDF to each record found by the scan.
- Parameters:
module (str) – the name of the Lua module.
function (str) – the name of the Lua function within the module.
arguments (list) – optional arguments to pass to the function. NOTE: these arguments must be types supported by Aerospike. See supported data types. If you need to use an unsupported type (e.g. set or tuple), you must use your own serializer.
- Returns:
one of the supported types: int, str, float (double), list, dict (map), bytearray (bytes), bool.
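A minimal sketch of attaching a record UDF to a background scan (this assumes a module my_udf.lua containing a function my_udf(rec, bin, offset) is already registered on the server; a complete, runnable example appears under execute_background() below):

    import aerospike

    client = aerospike.client({'hosts': [('127.0.0.1', 3000)]})

    scan = client.scan('test', 'demo')

    # Run my_udf.my_udf(rec, 'number', 10) against every record found by the scan.
    scan.apply('my_udf', 'my_udf', ['number', 10])

    job_id = scan.execute_background()
    client.close()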
- add_ops(ops)
Add a list of write ops to the scan. When used with Scan.execute_background() the scan will perform the write ops on any records found. If no predicate is attached to the scan, it will apply the ops to all the records in the specified set. See aerospike_helpers for available ops.
- Parameters:
ops (list) – a list of write operations generated by the aerospike_helpers, e.g. list_operations, map_operations, etc.
Note
Requires server version >= 4.7.0.
    import aerospike
    from aerospike_helpers.operations import list_operations
    from aerospike_helpers.operations import operations

    scan = client.scan('test', 'demo')

    ops = [
        operations.append(test_bin, 'val_to_append'),
        list_operations.list_remove_by_index(test_bin, list_index_to_remove, aerospike.LIST_RETURN_NONE)
    ]

    scan.add_ops(ops)

    id = scan.execute_background()
    client.close()
For a more comprehensive example, see using a list of write ops with Query.execute_background().
- results([policy[, nodename]]) -> list of (key, meta, bins)
Buffer the records resulting from the scan, and return them as a list of records.
- Parameters:
policy (dict) – optional Scan Policies (see Policies below).
nodename (str) – optional name of a node; if given, the scan is limited to that node.
- Returns:
a list of Record Tuple.
    import aerospike
    import pprint

    pp = pprint.PrettyPrinter(indent=2)

    config = {'hosts': [('127.0.0.1', 3000)]}
    client = aerospike.client(config)

    client.put(('test', 'test', 'key1'), {'id': 1, 'a': 1},
               policy={'key': aerospike.POLICY_KEY_SEND})
    client.put(('test', 'test', 'key2'), {'id': 2, 'b': 2},
               policy={'key': aerospike.POLICY_KEY_SEND})

    scan = client.scan('test', 'test')
    scan.select('id', 'a', 'zzz')

    res = scan.results()
    pp.pprint(res)
    client.close()
Note
We expect to see:
    [ ( ( 'test',
          'test',
          u'key2',
          bytearray(b'\xb2\x18\n\xd4\xce\xd8\xba:\x96s\xf5\x9ba\xf1j\xa7t\xeem\x01')),
        { 'gen': 52, 'ttl': 2592000},
        { 'id': 2}),
      ( ( 'test',
          'test',
          u'key1',
          bytearray(b'\x1cJ\xce\xa7\xd4Vj\xef+\xdf@W\xa5\xd8o\x8d:\xc9\xf4\xde')),
        { 'gen': 52, 'ttl': 2592000},
        { 'a': 1, 'id': 1})]
Note
As of client 7.0.0, with server version 6.0 or newer, results() together with the scan policy "partition_filter" (see Partition Objects) can be used to specify which partitions/records the scan will read. See the example below.
    # This is an example of scanning partitions 1000 - 1003.
    import aerospike

    scan = client.scan("test", "demo")

    policy = {
        "partition_filter": {
            "begin": 1000,
            "count": 4
        },
    }

    # NOTE that these will only be non 0 if there are records in partitions 1000 - 1003
    # results will be the records in partitions 1000 - 1003
    results = scan.results(policy=policy)
- foreach(callback[, policy[, options[, nodename]]])
Invoke the callback function for each of the records streaming back from the scan.
- Parameters:
callback (callable) – the function to invoke for each record.
policy (dict) – optional Scan Policies (see Policies below).
options (dict) – optional scan options (see Options below).
nodename (str) – optional name of a node; if given, the scan is limited to that node.
Note
A Record Tuple is passed as the argument to the callback function. If the scan is using the "partition_filter" scan policy, the callback will receive two arguments: the first is an int representing the partition id, the second is the same Record Tuple as a normal callback.

    import aerospike
    import pprint

    pp = pprint.PrettyPrinter(indent=2)

    config = {'hosts': [('127.0.0.1', 3000)]}
    client = aerospike.client(config)

    client.put(('test', 'test', 'key1'), {'id': 1, 'a': 1},
               policy={'key': aerospike.POLICY_KEY_SEND})
    client.put(('test', 'test', 'key2'), {'id': 2, 'b': 2},
               policy={'key': aerospike.POLICY_KEY_SEND})

    def show_key(record):
        key, meta, bins = record
        print(key)

    scan = client.scan('test', 'test')

    scan_opts = {
        'concurrent': True,
        'nobins': True
    }

    scan.foreach(show_key, options=scan_opts)
    client.close()
Note
We expect to see:
    ('test', 'test', u'key2', bytearray(b'\xb2\x18\n\xd4\xce\xd8\xba:\x96s\xf5\x9ba\xf1j\xa7t\xeem\x01'))
    ('test', 'test', u'key1', bytearray(b'\x1cJ\xce\xa7\xd4Vj\xef+\xdf@W\xa5\xd8o\x8d:\xc9\xf4\xde'))
Note
To stop the stream, return False from the callback function.

    import aerospike

    config = {'hosts': [('127.0.0.1', 3000)]}
    client = aerospike.client(config)

    def limit(lim, result):
        # integers are immutable so a list (mutable) is used for the counter
        c = [0]

        def key_add(record):
            key, metadata, bins = record
            if c[0] < lim:
                result.append(key)
                c[0] = c[0] + 1
            else:
                return False

        return key_add

    scan = client.scan('test', 'user')
    keys = []
    scan.foreach(limit(100, keys))
    print(len(keys))  # this will be 100 if the number of matching records > 100
    client.close()
Note
As of client 7.0.0, with server version 6.0 or newer, foreach() together with the scan policy "partition_filter" (see Partition Objects) can be used to specify which partitions/records the scan will read. See the example below.
    # This is an example of scanning partitions 1000 - 1003.
    import aerospike

    partitions = []

    def callback(part_id, input_tuple):
        print(part_id)
        partitions.append(part_id)

    scan = client.scan("test", "demo")

    policy = {
        "partition_filter": {
            "begin": 1000,
            "count": 4
        },
    }

    scan.foreach(callback, policy)

    # NOTE that these will only be non 0 if there are records in partitions 1000 - 1003
    # should be 4
    print(len(partitions))

    # should be [1000, 1001, 1002, 1003]
    print(partitions)
- execute_background([policy])
Execute a record UDF on records found by the scan in the background. This method returns before the scan has completed. A UDF can be added to the scan with Scan.apply().
- Parameters:
policy (dict) – optional Write Policies.
- Returns:
a job ID that can be used with job_info() to track the status of the aerospike.JOB_SCAN as it runs in the background.
Note
Scan execute_background() was added in Python client version 3.10.0.
    import aerospike
    from aerospike import exception as ex
    import sys
    import time

    config = {"hosts": [("127.0.0.1", 3000)]}
    client = aerospike.client(config)

    # register udf
    try:
        client.udf_put("/path/to/my_udf.lua")
    except ex.AerospikeError as e:
        print("Error: {0} [{1}]".format(e.msg, e.code))
        client.close()
        sys.exit(1)

    # put records and apply udf
    try:
        keys = [("test", "demo", 1), ("test", "demo", 2), ("test", "demo", 3)]
        records = [{"number": 1}, {"number": 2}, {"number": 3}]
        for i in range(3):
            client.put(keys[i], records[i])

        scan = client.scan("test", "demo")
        scan.apply("my_udf", "my_udf", ["number", 10])
        job_id = scan.execute_background()

        # wait for job to finish
        while True:
            response = client.job_info(job_id, aerospike.JOB_SCAN)
            if response["status"] != aerospike.JOB_STATUS_INPROGRESS:
                break
            time.sleep(0.25)

        records = client.get_many(keys)
        print(records)
    except ex.AerospikeError as e:
        print("Error: {0} [{1}]".format(e.msg, e.code))
        sys.exit(1)
    finally:
        client.close()

    # EXPECTED OUTPUT:
    # [
    #   (('test', 'demo', 1, bytearray(b'\xb7\xf4\xb88\x89\xe2\xdag\xdeh>\x1d\xf6\x91\x9a\x1e\xac\xc4F\xc8')), {'gen': 2, 'ttl': 2591999}, {'number': 11}),
    #   (('test', 'demo', 2, bytearray(b'\xaejQ_7\xdeJ\xda\xccD\x96\xe2\xda\x1f\xea\x84\x8c:\x92p')), {'gen': 12, 'ttl': 2591999}, {'number': 12}),
    #   (('test', 'demo', 3, bytearray(b'\xb1\xa5`g\xf6\xd4\xa8\xa4D9\xd3\xafb\xbf\xf8ha\x01\x94\xcd')), {'gen': 13, 'ttl': 2591999}, {'number': 13})
    # ]
    -- contents of my_udf.lua
    function my_udf(rec, bin, offset)
        info("my transform: %s", tostring(record.digest(rec)))
        rec[bin] = rec[bin] + offset
        aerospike:update(rec)
    end
- paginate()
Makes a scan instance a paginated scan. Call this if you are using the “max_records” scan policy and you need to scan data in pages.
Note
Calling .paginate() on a scan instance causes it to save its partition state. This can be retrieved later using .get_partitions_status(). This can also be done using the partition_filter policy.
    # scan 3 pages of 1000 records each.
    import aerospike

    pages = 3
    page_size = 1000

    policy = {"max_records": page_size}

    scan = client.scan('test', 'demo')
    scan.paginate()

    # NOTE: The number of pages queried and records returned per page can differ
    # if record counts are small or unbalanced across nodes.
    for page in range(pages):
        records = scan.results(policy=policy)
        print("got page: " + str(page))

        if scan.is_done():
            print("all done")
            break
- is_done()
If using scan pagination, did the previous paginated or partition_filter scan using this scan instance return all records?
- Returns:
a bool signifying whether this paginated scan instance has returned all records.
    import aerospike

    policy = {"max_records": 1000}

    scan = client.scan('test', 'demo')
    scan.paginate()

    records = scan.results(policy=policy)

    if scan.is_done():
        print("all done")

    # is_done() can be used to monitor the progress of a paginated scan.
- get_partitions_status()
Get this scan instance's partition status, that is, which partitions have been scanned and which have not. The returned value is a dict with partition id (int) as keys and tuple as values. If the scan instance is not tracking its partitions, the returned dict will be empty.
Note
A scan instance must have had .paginate() called on it in order to retrieve its partition status. If .paginate() was not called, the scan instance will not save partition status.
- Returns:
a dict keyed by partition id, where each value is a tuple of the form (id: int, init: bool, done: bool, digest: bytearray). See Partition Objects for more information.
    # This is an example of resuming a scan using partition status.
    import aerospike

    for i in range(15):
        key = ("test", "demo", i)
        bins = {"id": i}
        client.put(key, bins)

    records = []
    resumed_records = []

    def callback(input_tuple):
        record, _, _ = input_tuple

        if len(records) == 5:
            return False

        records.append(record)

    scan = client.scan("test", "demo")
    scan.paginate()

    scan.foreach(callback)

    # The first scan should stop after 5 records.
    assert len(records) == 5

    partition_status = scan.get_partitions_status()

    def resume_callback(part_id, input_tuple):
        record, _, _ = input_tuple
        resumed_records.append(record)

    scan_resume = client.scan("test", "demo")

    policy = {
        "partition_filter": {
            "partition_status": partition_status
        },
    }

    scan_resume.foreach(resume_callback, policy)

    # should be 15
    total_records = len(records) + len(resumed_records)
    print(total_records)

    # cleanup
    for i in range(15):
        key = ("test", "demo", i)
        client.remove(key)
Policies
- policy
A dict of optional scan policies which are applicable to Scan.results() and Scan.foreach(). See Policies. A combined example using several of these fields appears after this list.
- max_retries (int)
Maximum number of retries before aborting the current transaction. The initial attempt is not counted as a retry.
If max_retries is exceeded, the transaction will return error AEROSPIKE_ERR_TIMEOUT.
Default: 0
Warning
Database writes that are not idempotent (such as "add") should not be retried because the write operation may be performed multiple times if the client timed out previous transaction attempts. It's important to use a distinct write policy for non-idempotent writes which sets max_retries = 0.
- sleep_between_retries (int)
Milliseconds to sleep between retries. Enter 0 to skip sleep.
Default: 0
- socket_timeout (int)
Socket idle timeout in milliseconds when processing a database command.
If socket_timeout is not 0 and the socket has been idle for at least socket_timeout, both max_retries and total_timeout are checked. If max_retries and total_timeout are not exceeded, the transaction is retried.
If both socket_timeout and total_timeout are non-zero and socket_timeout > total_timeout, then socket_timeout will be set to total_timeout. If socket_timeout is 0, there will be no socket idle limit.
Default: 30000
- total_timeout (int)
Total transaction timeout in milliseconds.
The total_timeout is tracked on the client and sent to the server along with the transaction in the wire protocol. The client will most likely timeout first, but the server also has the capability to timeout the transaction.
If total_timeout is not 0 and total_timeout is reached before the transaction completes, the transaction will return error AEROSPIKE_ERR_TIMEOUT. If total_timeout is 0, there will be no total time limit.
Default: 0
- compress (bool)
Compress client requests and server responses.
Use zlib compression on write or batch read commands when the command buffer size is greater than 128 bytes. In addition, tell the server to compress its response on read commands. The server response compression threshold is also 128 bytes.
This option will increase cpu and memory usage (for extra compressed buffers), but decrease the size of data sent over the network.
Default: False
- durable_delete (bool)
Perform durable delete (requires Enterprise server version >= 3.10).
If the transaction results in a record deletion, leave a tombstone for the record.
Default: False
- records_per_second (int)
Limit the scan to process records at records_per_second.
Requires server version >= 4.7.0.
Default: 0 (no limit)
- expressions (list)
Compiled aerospike expressions (see aerospike_helpers) used for filtering records within a transaction.
Default: None
Note
Requires Aerospike server version >= 5.2.
- max_records (int)
Approximate number of records to return to the client. This number is divided by the number of nodes involved in the scan. The actual number of records returned may be less than max_records if node record counts are small and unbalanced across nodes.
Default: 0 (no limit)
Note
Requires Aerospike server version >= 6.0
- partition_filter (dict)
A dictionary of partition information used by the client to perform partition scans. Useful for resuming terminated scans and scanning particular partitions/records. See Partition Objects for more information.
Default: {} (all partitions will be scanned)
- replica
One of the Replica Options values such as aerospike.POLICY_REPLICA_MASTER.
Default: aerospike.POLICY_REPLICA_SEQUENCE
- ttl (int)
The default time-to-live (expiration) of the record in seconds. This field will only be used on background scan writes if aerospike.Scan.ttl is set to aerospike.TTL_CLIENT_DEFAULT.
There are also special values that can be set for this field. See TTL Constants.
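A sketch combining several of these policy fields with a compiled expression filter (assumes a connected client; the 'id' bin, threshold, and limits are illustrative; see aerospike_helpers.expressions for the expression API):

    import aerospike
    from aerospike_helpers import expressions as exp

    client = aerospike.client({'hosts': [('127.0.0.1', 3000)]})

    scan = client.scan('test', 'demo')

    # Only return records whose 'id' bin is greater than or equal to 10.
    expr = exp.GE(exp.IntBin('id'), 10).compile()

    policy = {
        'expressions': expr,        # requires server >= 5.2
        'max_records': 1000,        # approximate cap, requires server >= 6.0
        'records_per_second': 500,  # throttle the scan
        'socket_timeout': 30000,
        'total_timeout': 0,         # no total time limit
    }

    for key, meta, bins in scan.results(policy=policy):
        print(bins)

    client.close()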
Options
- options
A dict of optional scan options which are applicable to Scan.foreach().
- nobins (bool)
Whether to return the records without their bins (only the key and metadata are returned).
Default False.
- concurrent (bool)
Whether to run the scan concurrently on all nodes of the cluster.
Default False.
- percent (int)
Deprecated in version 6.0.0, will be removed in a coming release. No longer available with server 5.6+. Use scan policy max_records instead.
Percentage of records to return from the scan.
Default 100.
New in version 1.0.39.