PDF version of this article for mobile device or printing purpose can be found here
This series of articles introduce the notes on the lectures MIT6.851 - Advanced Data Structure from Professor Erik Demaine. Here is the first article. All codes are open sourced in my github.
1. Pointer Machine
Pointer machine is the model of computation. A pointer machine can be treated as a set of entries. Each entry contains \(O(1)\) fields. A field can be any type of data, or a pointer to another entry.
As the first example, the linked list defined like LinkedList<Integer> (in java) can be represented via pointer machine where the fields consist of two or more parts
valinteger value- any other
data fields next_ptrpointing to next node
Binary search tree (BST) can also be built upon pointer machine. In BST, there are three(3) essential elements in the fields
valleft_ptrright_ptr
2. Partial Persistence
The persistence of some data structure is the ability to keep all versions of that.
Partial persistence: the data structure (DS) can be updated only in latest version. And we can access (NOT CHANGE) any field in any version. Versions are linearly ordered as follows,
Example: Assuming we have a partial persistent pointer machines, then we can create a version controlled LinkedList/BST. Please note that all pointers in the LinkedList/BST are part of the fields; like the data fields, all pointers are version controlled.
Full persistence: DS can be updated in any version. This will be discussed in another article. Versions form a tree as
3. An Implementation of Partial Persistance
The implementation is based on Driscoll, Sarnak, Sleator, Tarjan–JCS 1989.
- Any pointer-machine DS with \(\le p=O(1)\) pointers to any node can be made partially persistent
- with \(O(1)\) amortized multiplicative overhead and
- \(O(1)\) space per change.
3.1 Extended pointer machine
To implement this we need the extend the pointer machine as
- all original data and pointer fields
- \(p\) back pointers
- \(2p\) mods (modifications), each one is
(version, variable, value)
Given this, we can declare the pointer machine
class PartialPersistentNode():
def __init__(self, data_vars_tags, ptr_vars_tags, max_pointer_num=1):
self.p, self.dt, self.pt = max_pointer_num, data_vars_tags, ptr_vars_tags
self.fields = {var: None for var in self.dt + self.pt}
self.back = {var: set() for var in self.pt}
self.mods = list() # list of tuple (version, field, value)
def set_back(self, ptr, new, old=None):
assert ptr in self.pt
if old is not None and old in self.back[ptr]:
self.back[ptr].remove(old)
self.back[ptr].add(new)
We let PartialPersistentNode.dt and PartialPersistentNode.pt as tags of data and pointer fields. For each pointer field, we need a set to hold the nodes pointing to current node.
For convenient, we have a function set_back to update the back pointers.
3.2 Read Field in Any Version
Since we put all modifications in PartialPersistentNode.mods, To read the PartialPersistentNode in any version v, we just need to check for the mods with time <=v.
def read(self, var, v):
assert var in self.fields.keys()
res = self.fields[var]
for (mod_version, mod_field, mod_value) in self.mods:
if mod_version > v: break
if mod_field == var: res = mod_value
return res
3.3 Update the Latest Version
To do the update,
- if the
self.modsis not full, then just append the modification information there - if mods of
selfis full, we need to- create a
newnode with all modifications of fields updated, and all back pointers copied - in all nodes
self.fields[ptr] for ptr in self.pt, their back pointers should be redirected to thenewnode - for all nodes pointing to
self, update theirptrfield (whereptr in self.pt) tonew. (NOTE: This may be a recursive process.)
- create a
def write(self, now, var, val):
assert var in self.fields.keys()
if len(self.mods) <= 2 * self.p:
self.mods.append((now, var, val))
else:
# crate the new node. save the latest value
new = PartialPersistentNode(self.dt, self.pt, self.p)
for (_, mod_field, mod_value) in self.mods:
new.fields[mod_field] = mod_value
new.fields[var] = val
new.back = {
var: set([i for i in self.back[var]])
for var in self.pt
}
# change to backpoints in pointingto
for ptr in self.pt:
node_pointing_to = new.fields[ptr]
node_pointing_to.set_back(ptr, new, self)
# recursively change pointers to self -> new (found via back pointers)
for ptr in self.pt:
for node_point_to_me in new.back[ptr]:
node_point_to_me.write(now, ptr, new)
4. Partial Persistent Linked List
With the PartialPersistentNode, we can now implement a partial persistent linked list. The declaration of LinkedList can be
class LinkedList():
def __init__(self):
self.dt, self.pt, self.p = ['value'], ['next_ptr'], 1
self.root = PartialPersistentNode(self.dt, self.pt, self.p)
self._timestamp = 0
def now(self): # equivalent to (now++)
res = self._timestamp
self._timestamp += 1
return res
def set_root_value(self, val):
self.root.write(self.now(), 'value', val)
To simplify, we use the _timestamp to track the version.
Reading LinkedList is not difficult.
def str_(self, v):
res = ""
cursor = self.root
while cursor:
res += str(cursor.read('value', v)) + '->'
cursor = cursor.read('next_ptr', v)
res += 'END'
return res
def __str__(self):
return self.str_(self._timestamp)
To append a node, move the end of the list, then create a new node, set up value and back pointer – next_ptr. Register the new node in previous(cursor in the following)’s next_ptr.
def append(self, val):
at = self.now()
cursor = self.root
while True:
next_node = cursor.read('next_ptr', at)
if next_node is None:
break
else:
cursor = next_node
new = PartialPersistentNode(self.dt, self.pt, self.p)
new.write(at, 'value', val)
new.set_back('next_ptr', cursor)
cursor.write(at, 'next_ptr', new)
return self
Similar operations for inserting,
def insert_after_ith_node(self, i, val):
at = self.now()
cursor = self.root
for _ in range(i):
cursor = cursor.read('next_ptr', at)
if cursor is None:
self.append(val)
return self
next_ = cursor.read('next_ptr', at)
new = PartialPersistentNode(self.dt, self.pt, self.p)
new.write(at, 'value', val)
cursor.write(at, 'next_ptr', new)
new.write(at, 'next_ptr', next_)
next_.set_back('next_ptr', new, cursor)
new.set_back('next_ptr', cursor)
return self
Updating the value of some node can be
def modify_ith_node_val(self, i, val):
assert i >= 0
at = self.now()
cursor = self.root
for _ in range(i):
cursor = cursor.read('next_ptr', at)
if cursor is None:
self.append(val)
return self
cursor.write(at, 'value', val)
return self
To delete some node, carefully update the back pointers and next_ptr shoud be fine :)
def delete_ith_node(self, i):
assert i > 0, "Assuming do not delete the root"
at = self.now()
cursor = self.root
for _ in range(i - 1): # move the one before what
cursor = cursor.read('next_ptr', at)
if cursor is None: return self
next_ = cursor.read('next_ptr', at)
next_next = next_.read('next_ptr', at)
cursor.write(at, 'next_ptr', next_next)
next_next.set_back('next_ptr', cursor, next_)
return self
The following shows the demo
if __name__ == '__main__':
L = LinkedList()
L.set_root_value(1)
L.append(2).append(3).append(4).append(5)
cp1 = L.now() # checkpoint 1
print(f'Current time = {cp1}, Latest linked list shows as {L}')
L.modify_ith_node_val(2, 20)
cp2 = L.now()
print(f'Current time = {cp2}, Latest linked list shows as {L}')
L.insert_after_ith_node(2, 22)
cp3 = L.now()
print(f'Current time = {cp3}, Latest linked list shows as {L}')
L.modify_ith_node_val(1, 10)
cp4 = L.now()
print(f'Current time = {cp4}, Latest linked list shows as {L}')
L.delete_ith_node(2)
cp5 = L.now()
print(f'Current time = {cp5}, Latest linked list shows as {L}')
print("\n\nRolling back to earliest check points")
for cp in [cp5, cp4, cp3, cp2, cp1]:
print(f'In {cp}, list shows as {L.str_(cp)}')
Output as
>>> STDOUT
Current time = 5, Latest linked list shows as 1->2->3->4->5->END
Current time = 7, Latest linked list shows as 1->2->20->4->5->END
Current time = 9, Latest linked list shows as 1->2->20->22->4->5->END
Current time = 11, Latest linked list shows as 1->10->20->22->4->5->END
Current time = 13, Latest linked list shows as 1->10->22->4->5->END
Rolling back to earliest check points
In 13, list shows as 1->10->22->4->5->END
In 11, list shows as 1->10->20->22->4->5->END
In 9, list shows as 1->2->20->22->4->5->END
In 7, list shows as 1->2->20->4->5->END
In 5, list shows as 1->2->3->4->5->END
A complete code in this article can be found at here.