Subdomain Visit Count¶
Aggregate hit counts across all domain levels, a string-splitting and accumulation problem that mirrors how analytics pipelines roll up hierarchical metrics.
Problem¶
Given a list of count-paired domains like "900 google.mail.com", return the visit count for every subdomain. A visit to mail.yahoo.com also counts toward yahoo.com and com.
Input: ["900 google.mail.com", "50 yahoo.com", "1 intel.mail.com", "5 wiki.org"]
Output (partial): mail.com: 901, com: 951, org: 5, etc.
Approach¶
- Parse each entry into a count and a domain string.
- Split the domain on
.to get its fragments. - For each suffix (e.g.,
google.mail.com,mail.com,com), add the count to a running total in a hash map. - Return the accumulated counts.
The key insight is that a domain with d dot-separated parts generates exactly d suffixes to update.
Implementation¶
from collections import defaultdict
def count_domain_hits(cpdomains):
hits = defaultdict(int)
for entry in cpdomains:
count, domain = entry.split()
count = int(count)
fragments = domain.split('.')
for i in range(len(fragments)):
hits['.'.join(fragments[i:])] += count
return [f"{cnt} {dom}" for dom, cnt in hits.items()]
cpdomains = ["900 google.mail.com", "50 yahoo.com", "1 intel.mail.com", "5 wiki.org"]
for line in count_domain_hits(cpdomains):
print(line)
Complexity¶
- Time: O(N * D) where N is the number of entries and D is max domain depth (typically small)
- Space: O(U) where U is the number of unique subdomains
Takeaway¶
Hierarchical rollup is a common pattern in analytics and observability, anytime you have dot-separated or slash-separated hierarchies and need to aggregate at every level.