Critiquing (t)reason in knowledge representation: historical perspectives on semantic data infrastructure


In this talk, I discuss how ideas about language and meaning have become interwoven in the design of data infrastructure, affording particular possibilities for how today’s data, information, and knowledge can be ordered. Since the advent of the World Wide Web, advocacy for sharing research, government, and other forms of institutional data has surged. In the wake of this advocacy, data scientists and stewards have convened to discuss, design, and implement infrastructure that organizes and contextualizes data in ways that enable people other than its producers to find and interpret it. To ensure that data resources are ‘FAIR’ (Findable, Accessible, Interoperable, Reusable), data scientists and stewards continuously work to hone techniques for enhancing these resources with “semantics,” that is, standardized formats and taxonomies for structuring, describing, and linking data. While the concept of data FAIR-ness is quite new, the techniques used to structure data have a much longer history. In the early 1960s, researchers in a subfield of artificial intelligence known as ‘knowledge representation’ began theorizing techniques for encoding the meaning of things and concepts in ways that a computer could decode. In this talk, I narrate how this research laid the groundwork for many automated computer systems and, eventually, for the Semantic Web technologies that are leveraged to support research data sharing today. I argue that, throughout the history of knowledge representation, competing epistemologies about what it means to “represent” knowledge have incited debates over how strictly to model the structure of ideas in a domain and over the role of formal logic in representing relationships. Finally, I unpack how these debates have shaped the design of contemporary semantic data infrastructure and, in turn, the possibilities for knowledge ordering in data systems today.