Skip to content

Subdomain Visit Count

Aggregate hit counts across all domain levels, a string-splitting and accumulation problem that mirrors how analytics pipelines roll up hierarchical metrics.

Problem

Given a list of count-paired domains like "900 google.mail.com", return the visit count for every subdomain. A visit to mail.yahoo.com also counts toward yahoo.com and com.

Input: ["900 google.mail.com", "50 yahoo.com", "1 intel.mail.com", "5 wiki.org"] Output (partial): mail.com: 901, com: 951, org: 5, etc.

Approach

  1. Parse each entry into a count and a domain string.
  2. Split the domain on . to get its fragments.
  3. For each suffix (e.g., google.mail.com, mail.com, com), add the count to a running total in a hash map.
  4. Return the accumulated counts.

The key insight is that a domain with d dot-separated parts generates exactly d suffixes to update.

Implementation

from collections import defaultdict

def count_domain_hits(cpdomains):
    hits = defaultdict(int)

    for entry in cpdomains:
        count, domain = entry.split()
        count = int(count)
        fragments = domain.split('.')
        for i in range(len(fragments)):
            hits['.'.join(fragments[i:])] += count

    return [f"{cnt} {dom}" for dom, cnt in hits.items()]


cpdomains = ["900 google.mail.com", "50 yahoo.com", "1 intel.mail.com", "5 wiki.org"]
for line in count_domain_hits(cpdomains):
    print(line)

Complexity

  • Time: O(N * D) where N is the number of entries and D is max domain depth (typically small)
  • Space: O(U) where U is the number of unique subdomains

Takeaway

Hierarchical rollup is a common pattern in analytics and observability, anytime you have dot-separated or slash-separated hierarchies and need to aggregate at every level.


Back to Algorithms & Data Structures