Data Collection Methodology
Transparent and systematic approach to gathering University of California wage data
Data Collection Process
Source Identification
Data is collected directly from the official UC Annual Wage website (ucannualwage.ucop.edu), which publishes employee compensation data as mandated by California state law. Because the source is an official public disclosure, all data we collect is already publicly available and legally accessible.
Automated Collection
We use a high-performance Go-based scraper with concurrent workers to efficiently collect data. The scraper respects rate limits and implements retry logic to ensure reliable data collection without overloading the source.
Data Validation
Each data point is validated for completeness and accuracy. Records are checked for required fields including employee name (anonymized where necessary), title, location, year, and various pay components.
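As an illustration of the required-field check, here is a minimal Go sketch. The Record type and the missingFields helper are hypothetical names introduced for this example; the field names follow the sample record schema shown later on this page, and the real validator may check more than this.

```go
package main

import "strings"

// Record holds the fields of one wage entry (hypothetical type;
// JSON names follow the sample schema on this page).
type Record struct {
	FirstName   string `json:"firstname"`
	LastName    string `json:"lastname"`
	Title       string `json:"title"`
	Location    string `json:"location"`
	Year        string `json:"year"`
	BasePay     string `json:"basepay"`
	OvertimePay string `json:"overtimepay"`
	AdjustPay   string `json:"adjustpay"`
	GrossPay    string `json:"grosspay"`
}

// missingFields returns the JSON names of required fields that are empty
// or contain only whitespace. Anonymized names ("*****") count as present.
func missingFields(r Record) []string {
	fields := []struct{ name, value string }{
		{"firstname", r.FirstName},
		{"lastname", r.LastName},
		{"title", r.Title},
		{"location", r.Location},
		{"year", r.Year},
		{"basepay", r.BasePay},
		{"overtimepay", r.OvertimePay},
		{"adjustpay", r.AdjustPay},
		{"grosspay", r.GrossPay},
	}
	var missing []string
	for _, f := range fields {
		if strings.TrimSpace(f.value) == "" {
			missing = append(missing, f.name)
		}
	}
	return missing
}
```

A record passes when missingFields returns an empty slice; otherwise the returned names say which fields to investigate.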
Storage & Organization
Data is stored in a structured JSON format, organized by campus location and year. This hierarchical structure enables efficient querying and maintains data integrity across years of historical information.
Technical Implementation
Scraper Architecture
Core Components:
- Language: Go (Golang) for high performance and concurrency
- Worker Pool: Configurable concurrent workers (default: 3-10)
- Rate Limiting: Built-in delays between requests (default: 1 second)
- Retry Logic: Automatic retry on failures with exponential backoff
- Progress Tracking: Resume capability for interrupted scraping sessions
API Interaction:
POST https://ucannualwage.ucop.edu/wage/search
Content-Type: application/json
{
"op": "search",
"page": 1,
"rows": 100,
"year": "2024",
"location": "Berkeley"
}
Data Structure
Storage Format:
data/
├── [Campus_Name]/
│ ├── wages_2024.json
│ ├── wages_2023.json
│ └── ...
└── scrape_progress.json
Record Schema:
{
"location": "Berkeley",
"year": 2024,
"scraped_at": "2025-09-13T19:16:37Z",
"total_records": 37078,
"records": [
{
"firstname": "*****",
"lastname": "*****",
"title": "Professor",
"location": "Berkeley",
"year": "2024",
"basepay": "150,000.00",
"overtimepay": "0.00",
"adjustpay": "5,000.00",
"grosspay": "155,000.00"
}
]
}
Data Processing
Aggregation Pipeline:
- Ingestion: JSON files are parsed and validated
- Transformation: Pay values converted from strings to numbers
- Aggregation: Calculate totals, averages, and counts by campus/year
- Database Storage: PostgreSQL with Drizzle ORM for efficient queries
- Caching: Aggregated results cached for performance
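The transformation and aggregation steps above can be illustrated with a short Go sketch. It is only an approximation of the pipeline: GrossRecord, Summary, parsePay, and aggregate are hypothetical names, float64 is used for simplicity, and the database and caching steps are left out.

```go
package main

import (
	"strconv"
	"strings"
)

// GrossRecord carries the fields needed for aggregation (hypothetical type;
// JSON names follow the sample record schema).
type GrossRecord struct {
	Location string `json:"location"`
	Year     string `json:"year"`
	GrossPay string `json:"grosspay"`
}

// parsePay converts a pay string such as "155,000.00" to a float64.
func parsePay(s string) (float64, error) {
	clean := strings.ReplaceAll(strings.TrimSpace(s), ",", "")
	return strconv.ParseFloat(clean, 64)
}

// Summary holds the count and total gross pay for one campus/year group.
type Summary struct {
	Count int
	Total float64
}

// Average returns the mean gross pay, or 0 for an empty group.
func (s Summary) Average() float64 {
	if s.Count == 0 {
		return 0
	}
	return s.Total / float64(s.Count)
}

// aggregate groups records by "location/year". Records whose pay cannot be
// parsed are skipped and counted so they can be reviewed separately.
func aggregate(records []GrossRecord) (map[string]Summary, int) {
	out := make(map[string]Summary)
	skipped := 0
	for _, r := range records {
		pay, err := parsePay(r.GrossPay)
		if err != nil {
			skipped++
			continue
		}
		key := r.Location + "/" + r.Year
		s := out[key]
		s.Count++
		s.Total += pay
		out[key] = s
	}
	return out, skipped
}
```

Returning the skipped count alongside the summaries keeps unparseable values visible instead of silently lowering the totals.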
Key Metrics Calculated:
- Total wages by campus and year
- Average wages per employee
- Employee count trends
- Pay distribution analysis
- Year-over-year growth rates
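For the year-over-year growth rate, one common definition is (current - previous) / previous. The Go sketch below uses that definition; yoyGrowth is a hypothetical name, and the exact formula used in the pipeline is assumed rather than confirmed.

```go
package main

import "errors"

// yoyGrowth returns the fractional change from prev to curr,
// for example 0.25 for a 25% increase.
func yoyGrowth(prev, curr float64) (float64, error) {
	if prev == 0 {
		return 0, errors.New("previous-year value is zero; growth is undefined")
	}
	return (curr - prev) / prev, nil
}
```

Returning an error for a zero baseline avoids reporting an infinite or meaningless growth rate for a campus with no data in the prior year.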
Data Coverage
13 UC Locations
Complete coverage of all UC campuses and affiliated institutions
- UC Berkeley
- UC Davis
- UC Irvine
- UCLA
- UC Merced
- UC Riverside
- UC San Diego
- UC San Francisco
- UC Santa Barbara
- UC Santa Cruz
- UC Office of the President
- UC SF Law
- ASUCLA
15 Years of Data
Historical data from 2010 to 2024
Update Frequency
Annual updates when new data becomes available
- Data typically released in Q1 for previous year
- Automated scraping process for updates
- Historical data preserved for trend analysis
- Version control for data changes
Privacy & Anonymization
While all data displayed is publicly available per California state law, we respect individual privacy. Names are anonymized in certain contexts (shown as "*****") while maintaining the ability to analyze compensation trends by job title, department, and campus.
The UC Annual Wage website implements its own anonymization for employees earning below certain thresholds or in specific categories. We preserve these anonymizations in our dataset.
Data Quality Assurance
Completeness Checks
Verify all expected fields are present
Calculation Validation
Ensure gross pay equals the sum of its pay components
Cross-Reference
Compare totals with official UC reports
Anomaly Detection
Flag unusual patterns for review
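The calculation check can be sketched in Go by comparing integer cents, which avoids floating-point rounding surprises. This assumes the components are exactly the base, overtime, and adjustment fields from the sample record; toCents and checkGross are hypothetical names, and the real source may include components not shown here.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// toCents parses a pay string such as "5,000.00" into integer cents.
func toCents(s string) (int64, error) {
	clean := strings.ReplaceAll(strings.TrimSpace(s), ",", "")
	f, err := strconv.ParseFloat(clean, 64)
	if err != nil {
		return 0, err
	}
	return int64(math.Round(f * 100)), nil
}

// checkGross returns an error when gross pay does not equal
// base + overtime + adjust, or when any value fails to parse.
func checkGross(base, overtime, adjust, gross string) error {
	var sum int64
	for _, s := range []string{base, overtime, adjust} {
		c, err := toCents(s)
		if err != nil {
			return fmt.Errorf("parse %q: %w", s, err)
		}
		sum += c
	}
	g, err := toCents(gross)
	if err != nil {
		return fmt.Errorf("parse %q: %w", gross, err)
	}
	if sum != g {
		return fmt.Errorf("gross pay %d cents != component sum %d cents", g, sum)
	}
	return nil
}
```

With the sample record's values ("150,000.00", "0.00", "5,000.00", "155,000.00") the check passes; a mismatch returns an error that can be flagged for review.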
Limitations & Disclaimers
- Data reflects only base salary and standard compensation components
- Does not include benefits, retirement contributions, or other non-wage compensation
- Mid-year hires or departures may show partial-year compensation
- Job titles and departments may change between years
- Some records may be anonymized at the source for privacy
- Data accuracy depends on what is reported to the UC system at the source
Questions About Our Methodology?
We're committed to transparency in our data collection and processing methods.