Adjacency List vs Nested Set Tradeoffs¶
REST API for managing employees and their reporting chains -- a classic tree-traversal problem that shows up anywhere you need hierarchical relationships.
Problem¶
Model employees within a company where each employee has a single supervisor. The API must support CRUD operations and answer hierarchy queries like "give me the full reporting chain from this employee up to the CEO."
Approach¶
Data model decisions:
- Single self-referential table. Each employee row has a
supervisor_idpointing to its boss. If employees could have multiple managers, you'd need an N-M join table — but single-supervisor keeps it simple. - Adjacency list + nested sets. The
supervisor_idgives you an adjacency list. For fast subtree queries, each row also carriesleft/rightnested-set boundaries — the same idea behind interval-based containment checks in spatial indexing. - DFS at query time for reporting chains. Walk parent pointers until
supervisor_idis null. Costs O(d) queries for depth d, which is fine for most org trees (rarely deeper than ~10 levels).
Read/write tradeoffs:
- Single employee lookup, boss, and title: one query.
- Full reporting chain: DFS with d queries for d levels of hierarchy.
- Adding/updating an employee: single insert or update.
- Deleting a mid-tree node: requires subtree repair — the expensive case.
Implementation¶
Models — SQLAlchemy with self-referential supervisor_id and nested-set pointers:
class Employee(Base):
__tablename__ = "employee"
id = Column(Integer, primary_key=True)
unique_id = Column(UUID(as_uuid=True))
name = Column(String(255), nullable=False)
start_date = Column(DateTime, default=datetime.datetime.now, nullable=False)
terminate_date = Column(DateTime, nullable=True)
active = Column(Boolean, default=True)
supervisor_id = Column(Integer, nullable=True)
employee_job = relationship("EmployeeJob", uselist=False, back_populates="employee")
# Nested-set boundaries for fast subtree queries
left = Column("lft", Integer, nullable=True)
right = Column("rgt", Integer, nullable=True)
def __init__(self, name, supervisor_id=None):
self.name = name
self.start_date = datetime.datetime.now()
self.unique_id = uuid.uuid4()
if supervisor_id:
self.supervisor_id = supervisor_id
class EmployeeJob(Base):
__tablename__ = "employee_job"
id = Column(Integer, primary_key=True)
job_title = Column(String(255), nullable=False)
employee_id = Column(Integer, ForeignKey("employee.id"))
employee = relationship("Employee", back_populates="employee_job")
active_date = Column(DateTime, default=datetime.datetime.now, nullable=False)
deactive_date = Column(DateTime, nullable=True)
API — Flask Blueprint with full CRUD:
employee_api = Blueprint("employee_api", __name__)
@employee_api.route("/employee", methods=["POST"])
def create_employee():
input_data = request.get_json()
name = input_data.get("name")
new_emp = models.Employee(name)
employee_job = input_data.get("employee_job")
if employee_job:
job = models.EmployeeJob(employee_job)
new_emp.employee_job = job
session = models.get_session()
session.add(new_emp)
session.commit()
return jsonify(util.build_response({}, f"Employee created: {new_emp.id}", 201))
@employee_api.route("/employee/<id>", methods=["GET"])
def retrieve_employee(id):
emp = models.get_session().query(models.Employee).get(id)
if not emp:
return jsonify(util.build_response({}, f"No employee found: {id}", 404))
output = schema.emp_schema.dump(emp)
return jsonify(util.build_response({}, f"Employee {emp.id} retrieved", 200, output))
@employee_api.route("/employee/<id>", methods=["PUT"])
def update_employee(id):
emp = models.get_session().query(models.Employee).get(id)
if not emp:
return jsonify(util.build_response({}, f"No employee found: {id}", 404))
input_data = request.get_json()
emp.name = input_data.get("name", emp.name)
emp.supervisor_id = input_data.get("supervisor_id", emp.supervisor_id)
if "employee_job" in input_data:
emp.employee_job = models.EmployeeJob(input_data["employee_job"])
models.get_session().commit()
return jsonify(util.build_response({}, f"Employee {emp.id} updated", 200))
@employee_api.route("/employee/<id>", methods=["DELETE"])
def delete_employee(id):
emp = models.get_session().query(models.Employee).get(id)
if not emp:
return jsonify(util.build_response({}, f"No employee found: {id}", 404))
models.get_session().delete(emp)
models.get_session().commit()
return jsonify(util.build_response({}, f"Employee {id} deleted", 204))
Nested-set balancing — maintains left/right boundaries on insert using the rightmost-sibling algorithm:
def balance_tree(mapper, connection, instance):
personnel = mapper.mapped_table
if not instance.parent:
instance.left = 1
instance.right = 2
else:
right_most_sibling = connection.scalar(
select([personnel.c.rgt]).where(personnel.c.id == instance.parent.id)
)
connection.execute(
update(personnel)
.where(personnel.c.rgt >= right_most_sibling)
.values(
lft=case(
[(personnel.c.lft > right_most_sibling, personnel.c.lft + 2)],
else_=personnel.c.lft
),
rgt=case(
[(personnel.c.rgt >= right_most_sibling, personnel.c.rgt + 2)],
else_=personnel.c.rgt
),
)
)
instance.left = right_most_sibling
instance.right = right_most_sibling + 1
Takeaway¶
Adjacency lists are simple but make subtree queries expensive. Nested sets flip that tradeoff -- cheap reads, costlier writes. Pick based on your read/write ratio.