Adjacency List vs Nested Set Tradeoffs

REST API for managing employees and their reporting chains -- a classic tree-traversal problem that shows up anywhere you need hierarchical relationships.

Problem

Model employees within a company where each employee has a single supervisor. The API must support CRUD operations and answer hierarchy queries like "give me the full reporting chain from this employee up to the CEO."

Approach

Data model decisions:

  1. Single self-referential table. Each employee row has a supervisor_id pointing to its boss. If employees could have multiple managers, you'd need an N-M join table — but single-supervisor keeps it simple.
  2. Adjacency list + nested sets. The supervisor_id gives you an adjacency list. For fast subtree queries, each row also carries left/right nested-set boundaries — the same idea behind interval-based containment checks in spatial indexing.
  3. DFS at query time for reporting chains. Walk parent pointers until supervisor_id is null. Costs O(d) queries for depth d, which is fine for most org trees (rarely deeper than ~10 levels).

Read/write tradeoffs:

  • Single employee lookup, boss, and title: one query.
  • Full reporting chain: DFS with d queries for d levels of hierarchy.
  • Adding/updating an employee: single insert or update.
  • Deleting a mid-tree node: requires subtree repair — the expensive case.

Implementation

Models — SQLAlchemy with self-referential supervisor_id and nested-set pointers:

class Employee(Base):
    __tablename__ = "employee"
    id = Column(Integer, primary_key=True)
    unique_id = Column(UUID(as_uuid=True))
    name = Column(String(255), nullable=False)
    start_date = Column(DateTime, default=datetime.datetime.now, nullable=False)
    terminate_date = Column(DateTime, nullable=True)
    active = Column(Boolean, default=True)
    supervisor_id = Column(Integer, nullable=True)
    employee_job = relationship("EmployeeJob", uselist=False, back_populates="employee")

    # Nested-set boundaries for fast subtree queries
    left = Column("lft", Integer, nullable=True)
    right = Column("rgt", Integer, nullable=True)

    def __init__(self, name, supervisor_id=None):
        self.name = name
        self.start_date = datetime.datetime.now()
        self.unique_id = uuid.uuid4()
        if supervisor_id:
            self.supervisor_id = supervisor_id


class EmployeeJob(Base):
    __tablename__ = "employee_job"
    id = Column(Integer, primary_key=True)
    job_title = Column(String(255), nullable=False)
    employee_id = Column(Integer, ForeignKey("employee.id"))
    employee = relationship("Employee", back_populates="employee_job")
    active_date = Column(DateTime, default=datetime.datetime.now, nullable=False)
    deactive_date = Column(DateTime, nullable=True)

API — Flask Blueprint with full CRUD:

employee_api = Blueprint("employee_api", __name__)

@employee_api.route("/employee", methods=["POST"])
def create_employee():
    input_data = request.get_json()
    name = input_data.get("name")
    new_emp = models.Employee(name)

    employee_job = input_data.get("employee_job")
    if employee_job:
        job = models.EmployeeJob(employee_job)
        new_emp.employee_job = job

    session = models.get_session()
    session.add(new_emp)
    session.commit()
    return jsonify(util.build_response({}, f"Employee created: {new_emp.id}", 201))

@employee_api.route("/employee/<id>", methods=["GET"])
def retrieve_employee(id):
    emp = models.get_session().query(models.Employee).get(id)
    if not emp:
        return jsonify(util.build_response({}, f"No employee found: {id}", 404))
    output = schema.emp_schema.dump(emp)
    return jsonify(util.build_response({}, f"Employee {emp.id} retrieved", 200, output))

@employee_api.route("/employee/<id>", methods=["PUT"])
def update_employee(id):
    emp = models.get_session().query(models.Employee).get(id)
    if not emp:
        return jsonify(util.build_response({}, f"No employee found: {id}", 404))

    input_data = request.get_json()
    emp.name = input_data.get("name", emp.name)
    emp.supervisor_id = input_data.get("supervisor_id", emp.supervisor_id)
    if "employee_job" in input_data:
        emp.employee_job = models.EmployeeJob(input_data["employee_job"])

    models.get_session().commit()
    return jsonify(util.build_response({}, f"Employee {emp.id} updated", 200))

@employee_api.route("/employee/<id>", methods=["DELETE"])
def delete_employee(id):
    emp = models.get_session().query(models.Employee).get(id)
    if not emp:
        return jsonify(util.build_response({}, f"No employee found: {id}", 404))
    models.get_session().delete(emp)
    models.get_session().commit()
    return jsonify(util.build_response({}, f"Employee {id} deleted", 204))

Nested-set balancing — maintains left/right boundaries on insert using the rightmost-sibling algorithm:

def balance_tree(mapper, connection, instance):
    personnel = mapper.mapped_table

    if not instance.parent:
        instance.left = 1
        instance.right = 2
    else:
        right_most_sibling = connection.scalar(
            select([personnel.c.rgt]).where(personnel.c.id == instance.parent.id)
        )
        connection.execute(
            update(personnel)
            .where(personnel.c.rgt >= right_most_sibling)
            .values(
                lft=case(
                    [(personnel.c.lft > right_most_sibling, personnel.c.lft + 2)],
                    else_=personnel.c.lft
                ),
                rgt=case(
                    [(personnel.c.rgt >= right_most_sibling, personnel.c.rgt + 2)],
                    else_=personnel.c.rgt
                ),
            )
        )
        instance.left = right_most_sibling
        instance.right = right_most_sibling + 1

Takeaway

Adjacency lists are simple but make subtree queries expensive. Nested sets flip that tradeoff -- cheap reads, costlier writes. Pick based on your read/write ratio.


Back to Microservices & APIs