Data Collection Methodology

Transparent and systematic approach to gathering University of California wage data

Data Collection Process

1. Source Identification

Data is collected directly from the official UC Annual Wage website (ucannualwage.ucop.edu), which publishes employee compensation data as mandated by California state law. Because the source is an official public disclosure, all data we collect is publicly available and legally accessible.

2. Automated Collection

We use a high-performance Go-based scraper with concurrent workers to efficiently collect data. The scraper respects rate limits and implements retry logic to ensure reliable data collection without overloading the source.

3. Data Validation

Each data point is validated for completeness and accuracy. Records are checked for required fields including employee name (anonymized where necessary), title, location, year, and various pay components.

4. Storage & Organization

Data is stored as structured JSON, with one file per campus location and year. This hierarchical layout keeps each location-year dataset self-contained, which makes it straightforward to query and preserves data integrity across years of historical information.

Technical Implementation

Scraper Architecture

Core Components:

  • Language: Go (Golang) for high performance and concurrency
  • Worker Pool: Configurable concurrent workers (default: 3-10)
  • Rate Limiting: Built-in delays between requests (default: 1 second)
  • Retry Logic: Automatic retry on failures with exponential backoff
  • Progress Tracking: Resume capability for interrupted scraping sessions

API Interaction:

POST https://ucannualwage.ucop.edu/wage/search
Content-Type: application/json

{
  "op": "search",
  "page": 1,
  "rows": 100,
  "year": "2024",
  "location": "Berkeley"
}
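
As an illustration, the request above could be built in Go along these lines. The type `searchRequest` and the helpers `encodeSearch` and `newSearchRequest` are hypothetical names, not taken from the project. Only the URL, header, and field names come from the example request, and the sketch builds the request without sending it.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"strconv"
)

const searchURL = "https://ucannualwage.ucop.edu/wage/search"

// searchRequest mirrors the JSON body shown in the example request.
// Note that the year is sent as a string.
type searchRequest struct {
	Op       string `json:"op"`
	Page     int    `json:"page"`
	Rows     int    `json:"rows"`
	Year     string `json:"year"`
	Location string `json:"location"`
}

// encodeSearch serializes one page request for a location and year.
func encodeSearch(location string, year, page, rows int) ([]byte, error) {
	return json.Marshal(searchRequest{
		Op:       "search",
		Page:     page,
		Rows:     rows,
		Year:     strconv.Itoa(year),
		Location: location,
	})
}

// newSearchRequest builds (but does not send) the POST request.
func newSearchRequest(location string, year, page, rows int) (*http.Request, error) {
	body, err := encodeSearch(location, year, page, rows)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, searchURL, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newSearchRequest("Berkeley", 2024, 1, 100)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
}
```

Paging through a full location-year would mean incrementing `page` until a response returns fewer than `rows` records. That stopping rule is an assumption about the endpoint, not something the example request confirms.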

Data Structure

Storage Format:

data/
├── [Campus_Name]/
│   ├── wages_2024.json
│   ├── wages_2023.json
│   └── ...
└── scrape_progress.json
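
A minimal sketch of how this layout might be addressed in code, and how a resume step could skip finished work. The helpers `wagePath` and `pendingTasks` are hypothetical. Replacing spaces with underscores in directory names is an assumption based on the `[Campus_Name]` placeholder. The "Campus/Year" key set used for progress is invented for illustration, because the actual format of scrape_progress.json is not described here.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// wagePath returns the file for one campus and year, e.g.
// data/Berkeley/wages_2024.json.
// Assumption: spaces in campus names become underscores.
func wagePath(root, campus string, year int) string {
	dir := strings.ReplaceAll(campus, " ", "_")
	return filepath.Join(root, dir, fmt.Sprintf("wages_%d.json", year))
}

// task is one location-year combination still to be scraped.
type task struct {
	Campus string
	Year   int
}

// pendingTasks lists every campus/year pair not yet marked done.
// done is keyed by "Campus/Year" (a hypothetical progress format).
func pendingTasks(campuses []string, firstYear, lastYear int, done map[string]bool) []task {
	var out []task
	for _, c := range campuses {
		for y := firstYear; y <= lastYear; y++ {
			if !done[fmt.Sprintf("%s/%d", c, y)] {
				out = append(out, task{Campus: c, Year: y})
			}
		}
	}
	return out
}

func main() {
	done := map[string]bool{"Berkeley/2024": true}
	for _, t := range pendingTasks([]string{"Berkeley", "San Diego"}, 2023, 2024, done) {
		fmt.Println(wagePath("data", t.Campus, t.Year))
	}
}
```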

Record Schema:

{
  "location": "Berkeley",
  "year": 2024,
  "scraped_at": "2025-09-13T19:16:37Z",
  "total_records": 37078,
  "records": [
    {
      "firstname": "*****",
      "lastname": "*****",
      "title": "Professor",
      "location": "Berkeley",
      "year": "2024",
      "basepay": "150,000.00",
      "overtimepay": "0.00",
      "adjustpay": "5,000.00",
      "grosspay": "155,000.00"
    }
  ]
}
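
For readers who want to load these files, the schema above maps onto Go types roughly as follows. The type and function names (`WageFile`, `WageRecord`, `parseWageFile`) are illustrative, not the project's own. Note that in the sample, `year` is a number at the file level but a string inside each record, and pay values are comma-formatted strings, so the sketch keeps those as strings at this stage.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// WageRecord is one employee row. Pay fields stay as strings here
// because the source formats them like "150,000.00".
type WageRecord struct {
	FirstName   string `json:"firstname"`
	LastName    string `json:"lastname"`
	Title       string `json:"title"`
	Location    string `json:"location"`
	Year        string `json:"year"`
	BasePay     string `json:"basepay"`
	OvertimePay string `json:"overtimepay"`
	AdjustPay   string `json:"adjustpay"`
	GrossPay    string `json:"grosspay"`
}

// WageFile is the top-level object of one wages_YYYY.json file.
type WageFile struct {
	Location     string       `json:"location"`
	Year         int          `json:"year"`
	ScrapedAt    time.Time    `json:"scraped_at"` // RFC 3339 timestamp
	TotalRecords int          `json:"total_records"`
	Records      []WageRecord `json:"records"`
}

// parseWageFile decodes the raw bytes of one file.
func parseWageFile(data []byte) (WageFile, error) {
	var f WageFile
	err := json.Unmarshal(data, &f)
	return f, err
}

// sample reproduces the record schema example from this page.
const sample = `{
  "location": "Berkeley",
  "year": 2024,
  "scraped_at": "2025-09-13T19:16:37Z",
  "total_records": 37078,
  "records": [{
    "firstname": "*****", "lastname": "*****", "title": "Professor",
    "location": "Berkeley", "year": "2024", "basepay": "150,000.00",
    "overtimepay": "0.00", "adjustpay": "5,000.00", "grosspay": "155,000.00"
  }]
}`

func main() {
	f, err := parseWageFile([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Println(f.Location, f.Year, len(f.Records), f.Records[0].GrossPay)
}
```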

Data Processing

Aggregation Pipeline:

  • Ingestion: JSON files are parsed and validated
  • Transformation: Pay values converted from strings to numbers
  • Aggregation: Calculate totals, averages, and counts by campus/year
  • Database Storage: PostgreSQL with Drizzle ORM for efficient queries
  • Caching: Aggregated results cached for performance
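
The production pipeline described above uses PostgreSQL with Drizzle ORM. The sketch below only illustrates the transformation and aggregation steps, written in Go for consistency with the scraper. The names `parsePayCents`, `Summary`, and `summarize` are hypothetical, and storing amounts as integer cents is a design choice assumed here to avoid floating-point rounding, not something the pipeline is known to do.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePayCents converts a string such as "150,000.00" into integer cents.
// It accepts an optional leading minus sign and at most two decimals.
func parsePayCents(s string) (int64, error) {
	s = strings.ReplaceAll(strings.TrimSpace(s), ",", "")
	neg := strings.HasPrefix(s, "-")
	s = strings.TrimPrefix(s, "-")
	whole, frac, _ := strings.Cut(s, ".")
	if len(frac) > 2 {
		return 0, fmt.Errorf("more than two decimal places in %q", s)
	}
	frac = (frac + "00")[:2] // "5" -> "50", "" -> "00"
	// 32 bits of whole dollars is far more than any single pay value needs.
	w, err := strconv.ParseUint(whole, 10, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid amount %q: %w", s, err)
	}
	f, err := strconv.ParseUint(frac, 10, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid amount %q: %w", s, err)
	}
	cents := int64(w)*100 + int64(f)
	if neg {
		cents = -cents
	}
	return cents, nil
}

// Summary holds the aggregates for one campus and year.
type Summary struct {
	Count      int
	TotalCents int64
}

// AverageCents returns the mean pay in cents, truncated toward zero.
func (s Summary) AverageCents() int64 {
	if s.Count == 0 {
		return 0
	}
	return s.TotalCents / int64(s.Count)
}

// summarize parses each gross pay string and accumulates the totals.
func summarize(grossPays []string) (Summary, error) {
	var sum Summary
	for _, p := range grossPays {
		c, err := parsePayCents(p)
		if err != nil {
			return Summary{}, err
		}
		sum.Count++
		sum.TotalCents += c
	}
	return sum, nil
}

func main() {
	sum, err := summarize([]string{"155,000.00", "45,000.00"})
	if err != nil {
		panic(err)
	}
	fmt.Println(sum.Count, sum.TotalCents, sum.AverageCents())
}
```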

Key Metrics Calculated:

  • Total wages by campus and year
  • Average wages per employee
  • Employee count trends
  • Pay distribution analysis
  • Year-over-year growth rates
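
As an example of one of these metrics, year-over-year growth can be computed as (current - previous) / previous. The sketch below assumes per-year totals in integer cents. The function names are hypothetical, and the totals in `main` are made-up illustrative numbers, not real UC figures.

```go
package main

import (
	"fmt"
	"sort"
)

// growthRate returns (curr - prev) / prev as a fraction (0.10 = 10%).
// The bool is false when prev is zero and the rate is undefined.
func growthRate(prev, curr int64) (float64, bool) {
	if prev == 0 {
		return 0, false
	}
	return float64(curr-prev) / float64(prev), true
}

// yearOverYear maps each year to its growth over the preceding year.
// Years with no data for the preceding year are omitted.
func yearOverYear(totals map[int]int64) map[int]float64 {
	out := make(map[int]float64)
	for year, curr := range totals {
		prev, ok := totals[year-1]
		if !ok {
			continue
		}
		if rate, defined := growthRate(prev, curr); defined {
			out[year] = rate
		}
	}
	return out
}

func main() {
	// Made-up totals in cents, for illustration only.
	totals := map[int]int64{2022: 100_000_00, 2023: 110_000_00, 2024: 121_000_00}
	rates := yearOverYear(totals)

	years := make([]int, 0, len(rates))
	for y := range rates {
		years = append(years, y)
	}
	sort.Ints(years) // map iteration order is random, so sort for display
	for _, y := range years {
		fmt.Printf("%d: %+.1f%%\n", y, rates[y]*100)
	}
}
```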

Data Coverage

13 UC Locations

Complete coverage of all UC campuses and affiliated institutions

  • UC Berkeley
  • UC Davis
  • UC Irvine
  • UCLA
  • UC Merced
  • UC Riverside
  • UC San Diego
  • UC San Francisco
  • UC Santa Barbara
  • UC Santa Cruz
  • UC Office of the President
  • UC SF Law
  • ASUCLA

15 Years of Data

Historical data from 2010 to 2024

195 Location-Year Combinations
2M+ Individual Records
100% Public Data

Update Frequency

Annual updates when new data becomes available

  • Data typically released in Q1 for previous year
  • Automated scraping process for updates
  • Historical data preserved for trend analysis
  • Version control for data changes

Privacy & Anonymization

While all data displayed is publicly available per California state law, we respect individual privacy. Names are anonymized in certain contexts (shown as "*****") while maintaining the ability to analyze compensation trends by job title, department, and campus.

The UC Annual Wage website applies its own anonymization to employees earning below certain thresholds or in specific categories. We preserve that anonymization in our dataset.

Data Quality Assurance

Completeness Checks

Verify all expected fields are present

Calculation Validation

Ensure gross pay equals sum of components

Cross-Reference

Compare totals with official UC reports

Anomaly Detection

Flag unusual patterns for review
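
The first two checks above could look something like the following sketch. The `Record` type and the helpers `missingFields` and `checkGross` are hypothetical names. Treating gross pay as base + overtime + adjustment is an assumption based on the fields in the record schema, and the float-based parsing is a simplification that is adequate for values with two decimals.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// Record holds the fields of one wage record, all as raw strings.
type Record struct {
	FirstName, LastName, Title, Location, Year string
	BasePay, OvertimePay, AdjustPay, GrossPay   string
}

// missingFields returns the JSON names of required fields that are empty.
// Anonymized names ("*****") are non-empty, so they count as present.
func missingFields(r Record) []string {
	fields := []struct{ name, value string }{
		{"firstname", r.FirstName}, {"lastname", r.LastName},
		{"title", r.Title}, {"location", r.Location}, {"year", r.Year},
		{"basepay", r.BasePay}, {"overtimepay", r.OvertimePay},
		{"adjustpay", r.AdjustPay}, {"grosspay", r.GrossPay},
	}
	var missing []string
	for _, f := range fields {
		if strings.TrimSpace(f.value) == "" {
			missing = append(missing, f.name)
		}
	}
	return missing
}

// toCents parses "150,000.00" into integer cents via float rounding.
func toCents(s string) (int64, error) {
	f, err := strconv.ParseFloat(strings.ReplaceAll(s, ",", ""), 64)
	if err != nil {
		return 0, err
	}
	return int64(math.Round(f * 100)), nil
}

// checkGross reports an error unless base + overtime + adjust == gross.
func checkGross(r Record) error {
	var parts [4]int64
	for i, s := range []string{r.BasePay, r.OvertimePay, r.AdjustPay, r.GrossPay} {
		c, err := toCents(s)
		if err != nil {
			return fmt.Errorf("unparseable pay value %q: %w", s, err)
		}
		parts[i] = c
	}
	if sum := parts[0] + parts[1] + parts[2]; sum != parts[3] {
		return fmt.Errorf("components sum to %d cents, gross is %d cents", sum, parts[3])
	}
	return nil
}

func main() {
	r := Record{"*****", "*****", "Professor", "Berkeley", "2024",
		"150,000.00", "0.00", "5,000.00", "155,000.00"}
	fmt.Println("missing:", missingFields(r), "gross check:", checkGross(r))
}
```

Records that fail either check would be candidates for the anomaly review step rather than being silently dropped.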

Limitations & Disclaimers

  • Data reflects only base salary and standard compensation components
  • Does not include benefits, retirement contributions, or other non-wage compensation
  • Mid-year hires or departures may show partial year compensation
  • Job titles and departments may change between years
  • Some records may be anonymized at the source for privacy
  • Data accuracy depends on what each location reports to the UC system

Questions About Our Methodology?

We're committed to transparency in our data collection and processing methods.