Studying Up Machine Learning Data: Why Talk About Bias When We Mean Power?

Miceli, Milagros; Posada, Julian; Yang, Tianling

arXiv:2109.08131 (cs)

[Submitted on 16 Sep 2021]

Abstract: Research in machine learning (ML) has primarily argued that models trained on incomplete or biased datasets can lead to discriminatory outputs. In this commentary, we propose moving the research focus beyond bias-oriented framings by adopting a power-aware perspective to "study up" ML datasets. This means accounting for the historical inequities, labor conditions, and epistemological standpoints inscribed in data. We draw on HCI and CSCW work to support our argument, critically analyze previous research, and point to two co-existing lines of work within our community: one bias-oriented, the other power-aware. In doing so, we highlight the need for dialogue and cooperation in three areas: data quality, data work, and data documentation. In the first area, we argue that reducing societal problems to "bias" misses the context-based nature of data. In the second, we highlight the corporate forces and market imperatives that govern the labor of data workers and subsequently shape ML datasets. Finally, we propose expanding current transparency-oriented efforts in dataset documentation to reflect the social contexts of data design and production.

Comments:	Accepted at ACM GROUP 2022. Forthcoming in Proceedings of the ACM on Human-Computer Interaction
Subjects:	Human-Computer Interaction (cs.HC); Computers and Society (cs.CY); Machine Learning (cs.LG)
Cite as:	arXiv:2109.08131 [cs.HC]
	(or arXiv:2109.08131v1 [cs.HC] for this version)
	https://doi.org/10.48550/arXiv.2109.08131

Submission history

From: Milagros Miceli
[v1] Thu, 16 Sep 2021 17:38:26 UTC (149 KB)
