Models inherit historical discrimination; accuracy does not equal fairness
Feedback loops
Model output becomes training data → biases self-reinforce
Proxy discrimination
Discrimination through correlated variables (ZIP code for race) even without protected features
Fairness definitions
Multiple incompatible mathematical definitions; choosing between them is a values question
Right to explanation
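GDPR Art. 22 restricts solely automated decisions with legal or similarly significant effects; Arts. 13-15 require "meaningful information about the logic involved". Interpretable models make this far easier to satisfy, though the text does not mandate them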
Contextual integrity
Privacy = appropriate information flow by context; not just secrecy
Consent fiction
Informed consent at internet scale is practically impossible; architecture must do the work
Data as power
Behavioral data concentration = unprecedented institutional knowledge asymmetry
Industrial Revolution analogy
Individual ethics insufficient; structural regulation necessary and historically validated
Purpose limitation
Must be architectural (access controls, retention limits), not just policy-stated
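One way to read "architectural, not just policy-stated" is that the purpose check sits in the data access path itself. The following is a minimal sketch under that assumption; `DatasetPolicy`, `PurposeViolation`, `read_dataset`, and the purpose names are hypothetical and not taken from any particular framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetPolicy:
    """Purposes a dataset may be used for, fixed when the data is collected."""
    name: str
    allowed_purposes: frozenset


class PurposeViolation(PermissionError):
    """Raised when data is requested for a purpose it was not collected for."""


def read_dataset(policy, purpose, loader):
    """Call `loader` only if `purpose` is on the dataset's allow-list."""
    if purpose not in policy.allowed_purposes:
        raise PurposeViolation(f"{policy.name!r} may not be used for {purpose!r}")
    return loader()


# Hypothetical example: location data collected for navigation only.
location_policy = DatasetPolicy(
    name="trip_locations",
    allowed_purposes=frozenset({"navigation", "safety_incident_review"}),
)
```

A call such as `read_dataset(location_policy, "ad_targeting", load_fn)` fails at the access layer, so a new use has to be added to the allow-list explicitly rather than appearing silently in a downstream query.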
Types of Algorithmic Bias — Quick Reference
Historical bias: Training data reflects past discrimination
Example: Amazon hiring model trained on male-dominated past hires
Fix: Reweight data; audit for disparate impact; choose different labels
Representation bias: Underrepresented groups have insufficient training examples
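Example: Commercial face-analysis (gender classification) systems erred most on darker-skinned women (up to 34.7% error vs 0.8% for lighter-skinned men; Buolamwini & Gebru 2018)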
Fix: Diversify data collection; measure error rates by group
Measurement bias: Feature or label is a flawed proxy for what it is meant to measure, often one that correlates with a protected characteristic
Example: ZIP code → race → credit scoring discrimination
Fix: Audit feature correlations; consider removing correlated features
Aggregation bias: One model applied to heterogeneous groups without adaptation
Example: Single global model applied to all demographics equally
Fix: Segment models or add demographic context features
Feedback loop bias: Model output used as future training labels
Example: Predictive policing → more arrests in predicted areas → model reinforced
Fix: Break loop; use independent ground truth for labels
Deployment bias: Model used in context it was not built or tested for
Example: COMPAS trained on one region, used nationally in criminal sentencing
Fix: Validate in deployment context; restrict use to validated contexts
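The "Reweight data" fix listed for historical bias can be sketched with the reweighing scheme described by Kamiran and Calders: each (group, label) cell is weighted by its expected frequency under independence divided by its observed frequency. The data below is invented for illustration, and the function name is an assumption.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Weight each example by expected / observed frequency of its cell.

    After weighting, group membership and label are statistically
    independent in the training set.
    """
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    cell_counts = Counter(zip(groups, labels))
    cell_weight = {
        (g, y): (group_counts[g] * label_counts[y]) / (n * count)
        for (g, y), count in cell_counts.items()
    }
    return [cell_weight[(g, y)] for g, y in zip(groups, labels)]


# Toy data (invented): past positive labels skew toward group "m".
groups = ["m", "m", "m", "m", "m", "m", "f", "f", "f", "f"]
labels = [1, 1, 1, 1, 0, 0, 1, 0, 0, 0]
weights = reweighing_weights(groups, labels)
```

The weights would typically be passed as per-sample weights to the training routine. Reweighing addresses the label imbalance only; it does not remove proxy features or fix a label that is itself biased.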
Feedback Loop Anatomy
Feedback Loop Example: Predictive Policing
[Historical crime data]
│
▼
[Train model: predict where crime will occur]
│
▼
[Deploy: send more police to predicted areas]
│
▼
[Police discover more crime in those areas]
│
▼
[New crime data added to training set]
│
└──────────────────────────────────▶ (loop repeats, amplified)
How to break the loop:
├─ Use independent ground truth (victim reports, not police reports)
├─ Random exploration (don't always follow model predictions)
├─ Human review before feeding output back as training labels
└─ Measure and monitor group-level outcomes over time
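The loop above can be illustrated with a deliberately simplified, deterministic toy model. All numbers and the allocation rule are assumptions made for this sketch: two districts have the same true crime rate, patrols follow recorded crime, and recorded crime grows with patrol presence.

```python
def simulate_recorded_crime(rounds, true_rates, initial_counts, explore=0.0):
    """Return each district's share of recorded crime after `rounds` rounds.

    Each round, a fraction `1 - explore` of patrols goes to the district with
    the most recorded crime so far; the remaining `explore` fraction is spread
    evenly across all districts. Crime is only recorded where patrols are.
    """
    counts = list(initial_counts)
    n = len(counts)
    for _ in range(rounds):
        top = max(range(n), key=lambda i: counts[i])
        for i in range(n):
            patrol_share = explore / n + ((1.0 - explore) if i == top else 0.0)
            counts[i] += 100 * patrol_share * true_rates[i]
    total = sum(counts)
    return [c / total for c in counts]


# Identical true rates; district 0 starts with one extra recorded incident.
greedy = simulate_recorded_crime(50, [0.1, 0.1], [11, 10], explore=0.0)
uniform = simulate_recorded_crime(50, [0.1, 0.1], [11, 10], explore=1.0)
```

With `explore=0.0` nearly all recorded crime ends up in district 0 even though the true rates are equal; with fully uniform patrols the shares stay close to even. Partial exploration slows the skew but does not remove it unless recorded counts are also normalized by patrol exposure.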
Privacy Regulations at a Glance
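| Regulation | Where | Core Rights | Key Technical Requirement | Max Fine |
|---|---|---|---|---|
| GDPR | EU/EEA | Access, erasure, portability, explanation | Erasure without undue delay (one month, extendable); DPIA for high-risk processing | €20M or 4% of global annual turnover, whichever is higher |
| CCPA/CPRA | California | Know, delete, opt-out of sale | Honor opt-out requests; disclose sharing | $7,500/intentional violation |
| LGPD | Brazil | Similar to GDPR | Data minimization, purpose limitation | 2% Brazil revenue, cap R$50M per violation |
| DPDP Act 2023 (successor to the PDPB drafts) | India | Access, correction, erasure | Cross-border transfer restrictions by government notification (earlier drafts required localization of sensitive data) | Up to ₹250 crore (₹500 crore in the 2022 draft) |
| EU AI Act | EU | Human oversight for high-risk AI | Conformity assessment, bias testing, logging | €35M or 7% of global turnover (prohibited AI) |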
EU AI Act: Risk Tiers
UNACCEPTABLE RISK (banned):
├─ Social scoring (by public or private actors)
├─ Real-time remote biometric ID in public spaces for law enforcement (narrow exceptions)
├─ Emotion recognition in workplace/schools
└─ AI that manipulates people's behavior covertly
HIGH RISK (regulated — most relevant for data engineers):
├─ Employment: CV screening, performance evaluation, promotion decisions
├─ Credit scoring and financial decisions
├─ Criminal justice: recidivism prediction, risk assessment
├─ Education: scoring exams, admission, monitoring students
└─ Critical infrastructure management
High-risk requirements:
✓ Conformity assessment before deployment
✓ Technical documentation (training data, architecture, testing)
✓ Bias testing disaggregated by demographic group
✓ Human oversight capability
✓ Logging of decisions for audit
✓ Transparency disclosure to affected individuals
LIMITED RISK (transparency required):
└─ Chatbots, deepfakes: must disclose it is AI
MINIMAL RISK (no requirements):
└─ Spam filters, recommendation systems (mostly)
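For the "Logging of decisions for audit" requirement listed under high-risk systems, a record per automated decision might look like the sketch below. The field names and file layout are assumptions for illustration; the regulation does not prescribe this schema.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(model_version, features, decision, reviewer=None):
    """Build one log entry describing an automated decision."""
    payload = json.dumps(features, sort_keys=True).encode("utf-8")
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        # The hash lets an auditor match the entry to a stored input
        # without copying raw features into the log.
        "input_sha256": hashlib.sha256(payload).hexdigest(),
        "decision": decision,
        "human_reviewer": reviewer,  # None means no human reviewed it
    }


def append_record(path, record):
    """Append the record as one JSON line; earlier lines are never rewritten."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
```

Recording the model version alongside each decision is what makes later bias testing reproducible: an auditor can group outcomes by the exact model that produced them.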
Ethical Decision Framework for Data Engineers
Step 1: MINIMALISM TEST
Q: Do we have a specific, documented use for this data?
Q: Could we achieve the same goal with less data?
If no clear purpose → do not collect
Step 2: CONTEXTUAL INTEGRITY TEST
Q: In what context was this data originally shared?
Q: Does the proposed new use respect those contextual norms?
Medical data reused for insurance pricing → violation
Navigation location data reused for advertising → likely violation
Step 3: DISPARATE IMPACT TEST
Q: What are error rates disaggregated by demographic group?
Q: Does any group bear a higher false positive/negative rate?
Q: What are the consequences of errors for the individual?
Run before deployment; run on an ongoing basis
Step 4: POWER ASYMMETRY TEST
Q: Does this system create power asymmetries?
Q: Do surveilled people have meaningful recourse?
Q: Could this system be turned against the people it serves?
Step 5: REGRET TEST
Q: Would I be comfortable if the affected people saw exactly what we do?
Q: Would I be proud explaining this to a journalist covering algorithmic harm?
If "no" to either → redesign
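The first two questions of the disparate impact test in Step 3 reduce to computing false positive and false negative rates per group. A minimal sketch, assuming binary labels and predictions and using invented toy data:

```python
from collections import defaultdict


def error_rates_by_group(y_true, y_pred, groups):
    """Return {group: {"fpr": ..., "fnr": ...}} for binary labels/predictions."""
    tallies = defaultdict(lambda: {"fp": 0, "fn": 0, "tp": 0, "tn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            key = "tp" if pred == 1 else "fn"
        else:
            key = "fp" if pred == 1 else "tn"
        tallies[group][key] += 1

    rates = {}
    for group, t in tallies.items():
        negatives = t["fp"] + t["tn"]
        positives = t["tp"] + t["fn"]
        rates[group] = {
            # None when a group has no examples of that class in the sample.
            "fpr": t["fp"] / negatives if negatives else None,
            "fnr": t["fn"] / positives if positives else None,
        }
    return rates


# Toy data (invented for illustration).
y_true = [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1]
y_pred = [1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0]
groups = ["A"] * 6 + ["B"] * 6
rates = error_rates_by_group(y_true, y_pred, groups)
```

Here group A has the higher false positive rate and group B the higher false negative rate, which is exactly the situation where the third question (what an error costs the individual) decides which gap matters more.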
Surveillance Capitalism Data Flow
User visits website / uses app
│
▼
First-party data collected (login, purchases, clicks)
│
▼
Third-party pixels/SDKs fire (Google Analytics, Meta Pixel, etc.)
│
├──▶ Advertising network receives: user ID, URL, timestamp, device
│
▼
Real-time bidding (RTB): user profile broadcast to 500+ ad buyers
│
└──▶ Highest bidder's ad shown (~100ms)
Data broker layer:
├─ Purchases behavioral data from apps, loyalty programs, public records
├─ Aggregates: name, address, income, health, political affiliation
└─ Sells to insurers, employers, law enforcement, political campaigns
User knowledge: essentially none
Company knowledge: comprehensive behavioral profile updated continuously
Fairness Definitions (and Why They Conflict)
Demographic Parity: Equal positive prediction rate across groups
P(ŷ=1 | group=A) = P(ŷ=1 | group=B)
Equalized Odds: Equal true positive AND false positive rates
P(ŷ=1 | Y=1, group=A) = P(ŷ=1 | Y=1, group=B) [equal TPR]
P(ŷ=1 | Y=0, group=A) = P(ŷ=1 | Y=0, group=B) [equal FPR]
Predictive Parity: Equal positive predictive value
P(Y=1 | ŷ=1, group=A) = P(Y=1 | ŷ=1, group=B)
Counterfactual: Same prediction if protected attribute were different
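KEY INSIGHT (Chouldechova 2017):
Equalized odds + predictive parity
CANNOT BOTH BE SATISFIED when base rates differ between groups
(unless the classifier is perfect); demographic parity generally
conflicts with equalized odds under the same condition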
This is not a technical limitation to overcome.
It is a values question: which errors harm people more?
That is a political and ethical decision, not a model parameter.
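The conflict between equalized odds and predictive parity follows from Bayes' rule. The short derivation below uses the definitions above; the shorthand symbols p_g, TPR_g, FPR_g, and PPV_g are introduced here for convenience.

```latex
\begin{align*}
p_g &= P(Y=1 \mid \text{group}=g), \\
\mathrm{TPR}_g &= P(\hat{y}=1 \mid Y=1, \text{group}=g), \qquad
\mathrm{FPR}_g = P(\hat{y}=1 \mid Y=0, \text{group}=g), \\
\mathrm{PPV}_g &= P(Y=1 \mid \hat{y}=1, \text{group}=g)
  = \frac{\mathrm{TPR}_g\, p_g}{\mathrm{TPR}_g\, p_g + \mathrm{FPR}_g\,(1-p_g)}, \\
\Rightarrow\quad \mathrm{FPR}_g
  &= \frac{p_g}{1-p_g}\cdot\frac{1-\mathrm{PPV}_g}{\mathrm{PPV}_g}\cdot\mathrm{TPR}_g .
\end{align*}
```

If TPR and PPV are equal across groups, the last line forces FPR to vary with the base rate p_g, unless PPV = 1 (no false positives at all). As a worked example with TPR = 0.8 and FPR = 0.2 in both groups: a base rate of 0.5 gives PPV = 0.4 / 0.5 = 0.8, while a base rate of 0.2 gives PPV = 0.16 / 0.32 = 0.5.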
Key Case Studies
| Case | System | Harm | Lesson |
|---|---|---|---|
| COMPAS (2016) | Recidivism prediction in US courts | 2x false high-risk rate for Black defendants vs white | |
COLLECTION:
Don't collect: data you don't have a specific documented use for
Don't collect: more precision than needed (city vs exact GPS)
Don't collect: in raw form if aggregated form is sufficient
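Reducing precision at ingestion can be as simple as rounding coordinates before they are stored. This is a minimal sketch; the function name and the choice of one decimal place are assumptions, and rounding alone is a weak protection for sparse or repeated traces.

```python
def coarsen_coordinates(lat, lon, decimals=1):
    """Round coordinates before storage.

    One decimal place is roughly 11 km of latitude; the east-west
    distance it covers shrinks toward the poles.
    """
    return round(lat, decimals), round(lon, decimals)


# Store only the coarse value; the exact fix is never written to disk.
coarse = coarsen_coordinates(52.520008, 13.404954)
```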
RETENTION:
Define retention period at design time, not retrospectively
Automate deletion (TTL, deletion pipelines)
Log retention ≠ endless retention; plan for right-to-erasure
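Automated deletion can be sketched as a scheduled purge against the event store. The example assumes a hypothetical `events` table with a `created_at` column holding epoch seconds, and a 90-day period chosen only for illustration.

```python
import sqlite3
import time

RETENTION_SECONDS = 90 * 24 * 3600  # assumed period; set per dataset at design time


def purge_expired(conn, now=None):
    """Delete events older than the retention period; return rows removed."""
    now = time.time() if now is None else now
    cutoff = now - RETENTION_SECONDS
    cur = conn.execute("DELETE FROM events WHERE created_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


# In-memory database so the sketch runs without any setup.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, action TEXT, created_at REAL)")
```

In practice this would run on a schedule (cron, an orchestrator task, or a native TTL feature of the datastore) and the returned row count would be logged as evidence that retention is being enforced.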
SHARING:
Default to not sharing
Document every third-party data flow (required by GDPR)
Evaluate each third-party's privacy practices before enabling pixel/SDK
STORAGE:
Store user IDs, not names/emails, in event logs and analytics
PII should live in one authoritative store, referenced elsewhere by ID
Separation enables erasure propagation: delete one store, not thousands
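The storage rules above can be sketched as one PII store keyed by an opaque ID, with event logs holding only that ID. `PiiStore` and the in-memory structures are hypothetical stand-ins for a real database and log pipeline.

```python
import uuid


class PiiStore:
    """Single authoritative store mapping an opaque user ID to PII."""

    def __init__(self):
        self._records = {}

    def register(self, name, email):
        user_id = uuid.uuid4().hex  # random ID, not derived from the PII
        self._records[user_id] = {"name": name, "email": email}
        return user_id

    def lookup(self, user_id):
        return self._records.get(user_id)  # None once erased

    def erase(self, user_id):
        """Delete the one PII record; return True if something was removed."""
        return self._records.pop(user_id, None) is not None


pii = PiiStore()
event_log = []  # analytics events carry only the opaque ID

uid = pii.register("Ada Example", "ada@example.com")
event_log.append({"user_id": uid, "action": "checkout"})
pii.erase(uid)
```

After `erase`, the events remain but can no longer be joined back to a name or email through this store. Whether the remaining events still count as personal data depends on how identifying the behavior itself is, so this pattern complements rather than replaces a retention limit on the logs.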
Professional Responsibility Framing
The "just following specs" argument:
Engineer → "I just implemented what was specified"
PM → "I just defined what the business needed"
Executive → "I just approved what the team built"
No one made the discriminatory decision individually.
The system did.
Why this is insufficient:
Civil engineers: professionally licensed; legally liable for structural safety
Doctors: professional oath; cannot "just follow orders" against patient welfare
Data engineers: no comparable framework yet — but the harms are at comparable scale
What professional responsibility means in practice:
1. Raise ethical concerns during design, not after deployment
2. Require disparate impact analysis before deploying classification models
3. Document known limitations and failure modes
4. Push back on use cases that violate contextual integrity
5. Support, rather than resist, regulatory accountability frameworks