Authorization System Design: Controlling Who Can Do What

April 17, 2026 · 25 min read

Introduction

Authorization answers "Is this authenticated user allowed to perform this action on this resource?" It's separate from authentication (which establishes who you are). Authorization is deceptively complex — simple role checks evolve into attribute-based policies, tenant isolation, hierarchical permissions, and delegated access. Getting it wrong doesn't just cause bugs — it causes data breaches.

Authorization Models

Role-Based Access Control (RBAC)

Users are assigned roles. Roles have permissions. Users inherit permissions from their roles.

Roles and permissions:
  admin    → [read, write, delete, manage_users]
  editor   → [read, write]
  viewer   → [read]

User "Alice" has role "editor" → can read and write, cannot delete

Implementation:
┌──────┐     ┌──────┐     ┌────────────┐
│ User │────▶│ Role │────▶│ Permission │
└──────┘     └──────┘     └────────────┘
  Alice       editor        read, write
  Bob         admin         read, write, delete, manage_users
  Carol       viewer        read

Check: can Alice delete post 42?
  Alice → editor → [read, write] → "delete" not in set → DENIED

# Simple RBAC implementation
class RBAC:
    def __init__(self):
        self.role_permissions = {
            "admin": {"read", "write", "delete", "manage_users"},
            "editor": {"read", "write"},
            "viewer": {"read"},
        }
        self.user_roles = {}  # user_id → set of roles
    
    def assign_role(self, user_id, role):
        self.user_roles.setdefault(user_id, set()).add(role)
    
    def has_permission(self, user_id, permission):
        roles = self.user_roles.get(user_id, set())
        for role in roles:
            if permission in self.role_permissions.get(role, set()):
                return True
        return False

# Usage
rbac = RBAC()
rbac.assign_role("alice", "editor")
rbac.has_permission("alice", "write")   # True
rbac.has_permission("alice", "delete")  # False

RBAC limitations: It can't express "Alice can edit her own posts but not others'." Roles are global — they don't consider the specific resource being accessed.
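
One common way past this limitation is to pair the role check with a resource ownership check. The sketch below is illustrative only: the `Post` record, its `owner_id` field, and the `can_edit_post` helper are hypothetical names, not part of any particular framework.

```python
from __future__ import annotations

from dataclasses import dataclass

# Role → permission mapping, mirroring the RBAC example above.
ROLE_PERMISSIONS = {
    "admin": {"read", "write", "delete", "manage_users"},
    "editor": {"read", "write"},
    "viewer": {"read"},
}


@dataclass
class Post:
    """Hypothetical resource record; owner_id is an assumed field."""
    id: int
    owner_id: str


def can_edit_post(user_id: str, roles: set[str], post: Post) -> bool:
    """Admins may edit any post; other roles need 'write' AND ownership."""
    if "admin" in roles:
        return True
    has_write = any("write" in ROLE_PERMISSIONS.get(r, set()) for r in roles)
    return has_write and post.owner_id == user_id


post = Post(id=42, owner_id="alice")
print(can_edit_post("alice", {"editor"}, post))  # True
print(can_edit_post("bob", {"editor"}, post))    # False
```

The role still gates the action; the ownership comparison narrows it to a specific resource, which plain RBAC cannot express.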

Attribute-Based Access Control (ABAC)

Decisions based on attributes of the user, resource, action, and environment.

Policy: "Users can edit documents they own, in their department,
         during business hours"

Attributes evaluated:
  User:     {id: "alice", department: "engineering", role: "editor"}
  Resource: {type: "document", owner: "alice", department: "engineering"}
  Action:   "edit"
  Environment: {time: "14:30", ip: "10.0.1.50"}

Policy engine evaluates:
  user.role IN ["editor", "admin"]                    → TRUE
  AND resource.owner == user.id                        → TRUE
  AND resource.department == user.department            → TRUE
  AND environment.time BETWEEN "09:00" AND "18:00"     → TRUE
  → ALLOW

Same request at 22:00:
  environment.time BETWEEN "09:00" AND "18:00" → FALSE
  → DENY
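
The evaluation above can be sketched as a single Python function. This is a minimal illustration, assuming attributes arrive as plain dictionaries and that the environment carries a `datetime.time` value; the function name `can_edit_document` is hypothetical, and a production system would usually express the policy in a policy engine rather than inline code.

```python
from datetime import time

BUSINESS_START = time(9, 0)
BUSINESS_END = time(18, 0)


def can_edit_document(user: dict, resource: dict, action: str, env: dict) -> bool:
    """Evaluate the example policy; every condition must hold, otherwise deny."""
    return (
        action == "edit"
        and user["role"] in {"editor", "admin"}
        and resource["owner"] == user["id"]
        and resource["department"] == user["department"]
        and BUSINESS_START <= env["time"] <= BUSINESS_END
    )


user = {"id": "alice", "department": "engineering", "role": "editor"}
resource = {"type": "document", "owner": "alice", "department": "engineering"}

print(can_edit_document(user, resource, "edit", {"time": time(14, 30)}))  # True
print(can_edit_document(user, resource, "edit", {"time": time(22, 0)}))   # False
```

Comparing `time` objects rather than strings avoids subtle bugs when a timestamp is not zero-padded.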

Relationship-Based Access Control (ReBAC)

Authorization is derived from relationships between entities. The best-known example is Google's Zanzibar, the system behind authorization for Google Drive, YouTube, and other Google services.

Relationships stored as tuples:
  (document:readme, owner, user:alice)
  (document:readme, parent, folder:engineering)
  (folder:engineering, viewer, team:backend)
  (team:backend, member, user:bob)

Check: Can Bob view document:readme?
  1. Is Bob the owner of readme? No.
  2. Is Bob a viewer of readme? No.
  3. Is readme in a folder? Yes → folder:engineering
  4. Is Bob a viewer of folder:engineering? 
     → Is Bob a member of team:backend? Yes.
     → team:backend is viewer of folder:engineering? Yes.
     → Bob inherits viewer permission through the relationship chain.
  → ALLOW

This models real-world permission hierarchies:
  Organization → Team → Folder → Document
  Permissions cascade through relationships.
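
The check above can be sketched as a small graph traversal over the tuples. This is a toy illustration, not how Zanzibar or SpiceDB are implemented: the helper names are hypothetical, the `owner`/`viewer`/`parent` rules are hardcoded instead of coming from a schema, and team membership is resolved only one level deep.

```python
from __future__ import annotations

# (object, relation, subject) tuples from the example above.
TUPLES = {
    ("document:readme", "owner", "user:alice"),
    ("document:readme", "parent", "folder:engineering"),
    ("folder:engineering", "viewer", "team:backend"),
    ("team:backend", "member", "user:bob"),
}


def subjects(obj: str, relation: str) -> set[str]:
    """All subjects that hold `relation` on `obj`."""
    return {s for (o, r, s) in TUPLES if o == obj and r == relation}


def matches(user: str, subject: str) -> bool:
    """A subject matches if it is the user, or a team the user is a member of."""
    if subject == user:
        return True
    return subject.startswith("team:") and user in subjects(subject, "member")


def can_view(user: str, obj: str, seen: set[str] | None = None) -> bool:
    """Allow if the user is owner/viewer of obj, or can view any parent of obj."""
    seen = set() if seen is None else seen
    if obj in seen:  # guard against cycles in parent links
        return False
    seen.add(obj)
    for relation in ("owner", "viewer"):
        if any(matches(user, s) for s in subjects(obj, relation)):
            return True
    return any(can_view(user, parent, seen) for parent in subjects(obj, "parent"))


print(can_view("user:bob", "document:readme"))    # True (via team → folder → document)
print(can_view("user:carol", "document:readme"))  # False
```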

Comparison of Models

┌──────────┬────────────────┬──────────────────┬──────────────────┐
│ Model    │ Complexity     │ Expressiveness   │ Best For         │
├──────────┼────────────────┼──────────────────┼──────────────────┤
│ RBAC     │ Low            │ Low-Medium       │ Internal tools,  │
│          │                │                  │ simple apps      │
├──────────┼────────────────┼──────────────────┼──────────────────┤
│ ABAC     │ Medium-High    │ High             │ Compliance,      │
│          │                │                  │ context-sensitive│
├──────────┼────────────────┼──────────────────┼──────────────────┤
│ ReBAC    │ High           │ Very High        │ File sharing,    │
│          │                │                  │ social, multi-   │
│          │                │                  │ tenant SaaS      │
└──────────┴────────────────┴──────────────────┴──────────────────┘

Most systems start with RBAC and evolve:
  RBAC → RBAC + resource ownership → ABAC or ReBAC

Authorization Architecture

Centralized Authorization Service

┌──────────────┐     ┌──────────────────┐     ┌───────────────┐
│ API Gateway  │────▶│ Authorization    │────▶│ Policy Store  │
│              │     │ Service          │     │               │
│ "Can user X  │     │ (OPA / Cedar /   │     │ Policies +    │
│  do Y on Z?" │     │  SpiceDB)        │     │ Relationships │
└──────────────┘     └──────────────────┘     └───────────────┘

Benefits:
  - Single source of truth for all authorization decisions
  - Policies updated without redeploying services
  - Audit log of all authorization checks
  - Consistent enforcement across all services

Cost:
  - Every request adds a network hop to the authz service
  - Latency: 1-10ms per check (mitigated with caching)
  - Availability: authz service down = everything down
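
Because every request depends on the authorization service, callers need a defined behavior for when it is unreachable. The sketch below assumes a fail-closed design (deny on any error) and logs each decision for auditing. `check_access` and the `authz_check` callable are hypothetical stand-ins for whatever client your authorization service provides.

```python
import logging

logger = logging.getLogger("authz.audit")


def check_access(authz_check, user_id: str, action: str, resource_id: str) -> bool:
    """Ask the authorization service; deny if the call fails; log every decision."""
    try:
        allowed = bool(authz_check(user_id, action, resource_id))
        reason = "policy"
    except Exception as exc:  # e.g. timeout, connection error, malformed response
        allowed = False       # fail closed: an outage must never grant access
        reason = f"authz_unavailable: {exc}"
    logger.info(
        "user=%s action=%s resource=%s decision=%s reason=%s",
        user_id, action, resource_id, "allow" if allowed else "deny", reason,
    )
    return allowed


def always_allow(user_id, action, resource_id):
    """Stub standing in for a healthy authorization service."""
    return True


def unavailable(user_id, action, resource_id):
    """Stub standing in for an authorization service that is down."""
    raise TimeoutError("authz service timed out")


print(check_access(always_allow, "alice", "read", "doc:1"))  # True
print(check_access(unavailable, "alice", "read", "doc:1"))   # False
```

Failing closed trades availability for safety; some systems instead serve recently cached decisions for a bounded time during an outage.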

Policy Engines

Open Policy Agent (OPA):
  - General-purpose policy engine
  - Policies written in Rego (declarative language)
  - Can be embedded as a sidecar or library

# Rego policy example
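package authz

import rego.v1

default allow := false

# Viewers and editors may read.
allow if {
    input.method == "GET"
    input.user.role in {"viewer", "editor"}
}

# Editors may update only resources they own.
allow if {
    input.method == "PUT"
    input.user.role == "editor"
    input.resource.owner == input.user.id
}

# Admins may do anything.
allow if {
    input.user.role == "admin"
}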

SpiceDB (Zanzibar-inspired):
  - Purpose-built for ReBAC
  - Stores relationship tuples
  - Evaluates permission checks through relationship traversal
  - Horizontally scalable

# SpiceDB schema
definition user {}

definition team {
    relation member: user
}

definition folder {
    relation owner: user
    relation viewer: user | team#member
    permission view = owner + viewer
}

definition document {
    relation parent: folder
    relation owner: user
    permission view = owner + parent->view
    permission edit = owner
}

Multi-Tenant Authorization

SaaS applications need tenant isolation — users in Org A must never see Org B's data:

# Every query MUST include tenant_id
# This is the most critical authorization check in multi-tenant systems

def get_documents(user, tenant_id):
    # Verify user belongs to this tenant
    if not user.belongs_to_tenant(tenant_id):
        raise ForbiddenError("Access denied")
    
    # ALWAYS filter by tenant_id — defense in depth
    return db.query(
        "SELECT * FROM documents WHERE tenant_id = %s AND ...",
        (tenant_id,)
    )

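# Row-Level Security (PostgreSQL) as a safety net
# Even if application code forgets tenant_id, the database enforces it.
# FORCE applies the policy to the table owner as well; superusers and
# roles with BYPASSRLS still bypass it, so the app should not connect as one.
"""
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
ALTER TABLE documents FORCE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON documents
    USING (tenant_id = current_setting('app.current_tenant')::uuid);
"""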

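# Set tenant context at the start of every transaction.
# set_config() takes a bound parameter, so tenant_id is never interpolated into SQL.
# The final argument (true) scopes the setting to the current transaction, so it
# cannot leak to the next request on a pooled connection.
def set_tenant_context(conn, tenant_id):
    conn.execute("SELECT set_config('app.current_tenant', %s, true)", (str(tenant_id),))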
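
At the application layer, one way to make the tenant filter hard to forget is to bind `tenant_id` once in a repository object, so individual queries cannot omit it. The sketch below uses the standard-library `sqlite3` module with an in-memory database purely for illustration; the `TenantScopedDocuments` class and the table layout are hypothetical, and SQLite has no row-level security, so this covers only the application half of the defense.

```python
from __future__ import annotations

import sqlite3


class TenantScopedDocuments:
    """Hypothetical repository: tenant_id is fixed at construction time."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: str):
        self._conn = conn
        self._tenant_id = tenant_id

    def list_titles(self) -> list[str]:
        rows = self._conn.execute(
            "SELECT title FROM documents WHERE tenant_id = ? ORDER BY title",
            (self._tenant_id,),
        )
        return [title for (title,) in rows]

    def get_title(self, doc_id: int) -> str | None:
        # Filtering on tenant_id as well as id means a guessed id that
        # belongs to another tenant simply returns nothing.
        row = self._conn.execute(
            "SELECT title FROM documents WHERE id = ? AND tenant_id = ?",
            (doc_id, self._tenant_id),
        ).fetchone()
        return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, tenant_id TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO documents (id, tenant_id, title) VALUES (?, ?, ?)",
    [(1, "org_a", "Roadmap"), (2, "org_b", "Payroll")],
)

org_a = TenantScopedDocuments(conn, "org_a")
print(org_a.list_titles())  # ['Roadmap']
print(org_a.get_title(2))   # None
```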

Authorization Caching

Authorization checks happen on every request. Caching is essential for performance, but stale caches create security gaps:

class AuthzCache:
    def __init__(self, authz_service, cache, ttl=60):
        self.authz_service = authz_service
        self.cache = cache
        self.ttl = ttl  # Short TTL — permission changes must propagate quickly
    
    def check(self, user_id, action, resource_id):
        cache_key = f"authz:{user_id}:{action}:{resource_id}"
        
        cached = self.cache.get(cache_key)
        if cached is not None:
            return cached == "allow"
        
        # Cache miss — check authoritative source
        allowed = self.authz_service.check(user_id, action, resource_id)
        
        # Cache the result with short TTL
        self.cache.set(cache_key, "allow" if allowed else "deny", ttl=self.ttl)
        return allowed
    
    def invalidate_user(self, user_id):
        """Call when user's permissions change (role change, team change)"""
        self.cache.delete_pattern(f"authz:{user_id}:*")
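
The `cache` object above is assumed to expose `get`, `set` with a `ttl`, and `delete_pattern`. For local testing, a minimal in-memory stand-in with that interface might look like the sketch below. `InMemoryTTLCache` is a hypothetical name, and a shared cache would be needed once more than one process serves requests.

```python
import fnmatch
import time


class InMemoryTTLCache:
    """Hypothetical single-process cache with per-key expiry and glob deletion."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so tests can control time
        self._entries = {}    # key → (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl):
        self._entries[key] = (value, self._clock() + ttl)

    def delete_pattern(self, pattern):
        # Collect matches first so the dict is not mutated while iterating.
        for key in [k for k in self._entries if fnmatch.fnmatchcase(k, pattern)]:
            del self._entries[key]


now = [0.0]  # fake clock, advanced manually
cache = InMemoryTTLCache(clock=lambda: now[0])
cache.set("authz:alice:read:doc1", "allow", ttl=60)
cache.set("authz:bob:read:doc1", "deny", ttl=60)

cache.delete_pattern("authz:alice:*")
print(cache.get("authz:alice:read:doc1"))  # None
print(cache.get("authz:bob:read:doc1"))    # deny

now[0] = 61.0
print(cache.get("authz:bob:read:doc1"))    # None
```

Glob matching treats `*`, `?`, and `[` as special characters, so user IDs containing them would need escaping before being placed in a pattern.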

Common Authorization Vulnerabilities

1. IDOR (Insecure Direct Object Reference)
   GET /api/invoices/42 → Returns invoice 42
   But is invoice 42 owned by the requesting user?
   Fix: Always verify ownership, not just authentication.

2. Missing function-level authorization
   Regular user discovers /api/admin/users endpoint → gets all user data
   Fix: Check permissions at EVERY endpoint, not just the frontend.

3. Mass assignment / privilege escalation
   POST /api/users/me { "role": "admin" }
   If the server blindly updates all fields → user is now admin
   Fix: Allowlist updatable fields. Never trust client-sent role/permission data.

4. JWT claims trusted without verification
   JWT contains { "role": "admin" } but was forged or expired
   Fix: Verify JWT signature and expiry on every request.
   Never store sensitive permissions ONLY in the JWT without server-side verification.
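
The fixes for items 1 and 3 can be sketched in a few lines. The in-memory `INVOICES` store, the field names, and both helper functions are hypothetical; a real service would load records from its database and take the user ID from the authenticated session.

```python
class ForbiddenError(Exception):
    """Raised when the caller may not access the requested resource."""


# Hypothetical data store and field allowlist, for illustration only.
INVOICES = {42: {"id": 42, "owner_id": "alice", "total": 120}}
UPDATABLE_PROFILE_FIELDS = {"display_name", "email"}


def get_invoice(current_user_id: str, invoice_id: int) -> dict:
    """IDOR fix: check ownership of the specific record, not just authentication."""
    invoice = INVOICES.get(invoice_id)
    # Use the same error for "missing" and "not yours" so callers cannot
    # probe which invoice IDs exist.
    if invoice is None or invoice["owner_id"] != current_user_id:
        raise ForbiddenError("Access denied")
    return invoice


def apply_profile_update(profile: dict, payload: dict) -> dict:
    """Mass-assignment fix: copy only allowlisted fields from client input."""
    allowed = {k: v for k, v in payload.items() if k in UPDATABLE_PROFILE_FIELDS}
    return {**profile, **allowed}


print(get_invoice("alice", 42)["total"])  # 120

profile = {"display_name": "Bob", "email": "bob@example.com", "role": "viewer"}
updated = apply_profile_update(profile, {"display_name": "Robert", "role": "admin"})
print(updated["display_name"], updated["role"])  # Robert viewer
```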

Key Takeaways

- Start with RBAC and evolve as needed: RBAC covers most common cases; add ABAC or ReBAC when you need resource-level or relationship-based permissions
- Centralize authorization logic: a dedicated service (OPA, SpiceDB, Cedar) keeps enforcement consistent and decisions auditable
- Multi-tenant isolation is the highest-priority check: a single tenant leak is a data breach, so enforce it at both the application and database layers
- Cache authorization decisions with short TTLs: performance matters, but stale permission caches are security vulnerabilities, so invalidate on permission changes
- Check authorization at the API layer, not just the UI: hiding a button doesn't prevent someone from calling the endpoint directly
- Protect against IDOR: verify that the requesting user has access to the specific resource, not just that they're authenticated
- Audit every authorization decision: log who accessed what, when, and whether it was allowed or denied; compliance reviews and incident investigations depend on this record
