18 Machine Learning, Automated Science, and Virtues

Emanuele Ratti

One important ideal of the scientific method is that it is rule-based. This allows for transparency; anyone can go through the rules and replicate results. However, scientific practice is far from being strictly rule-based, in the sense that scientific discovery is not constituted by a rigid set of steps one has to follow to discover something scientifically valuable. Given the opacity of the way scientists discover things, a strictly rule-based logic of discovery has always been a dream of both scientists and philosophers.[1] Recently, developments in AI have been envisioned as likely to be effective in automating scientific discovery.

Among the tools provided by AI, big data analytics is only the most recent candidate for automated discovery. In particular, it is now often said that methodologies and tools associated with big data can approximate the ideal of automated science to the point that humans will become dispensable. This is because human cognitive abilities will be rapidly surpassed by AI, which will take care of every aspect of scientific discovery.[2] However, it is not clear what “automated science” means in this context. Here, I argue that the idea behind automated science is not substantial, at least in biology (and in particular in molecular genetics and genomics, my main case studies), and that big data analytics (machine learning in particular) is not independent of human beings and cannot form the basis of automated science, if by science we refer to a specific human activity.

Focusing on machine learning (ML) methodologies in particular, I have argued that any ML task involves a number of operations requiring humans to make decisions based on considerations that are hardly reducible to computational abilities tout court. I report several such examples here.[3] A paradigmatic example of a task irreducible to computational abilities is the choice of the algorithm,[4] in particular between supervised and unsupervised learning. Sometimes the choice is straightforward; if no labeled data set is available, then unsupervised learning is the only option. However, having a labeled data set does not necessarily imply that supervised learning is mandatory. In fact, as Libbrecht and Noble have pointed out, “every supervised learning method rests on the implicit assumption that the distribution responsible for generating the training data set is the same as the distribution responsible for generating the test data set.”[5] This means that sometimes data sets are generated in different ways and have different underlying characteristics. Libbrecht and Noble report the case of an algorithm trained to identify genes in human genomes; this will probably not work equally well for mouse genomes. However, it may work well enough for the purpose at hand. In these situations, the computer scientist has first to recognize that there might be a divergence between training and test sets. This divergence in ML, applied to biology, may also have a biological meaning, and it has to be scrutinized biologically and not only numerically.
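The train/test divergence that Libbrecht and Noble describe can be made concrete with a toy sketch. The data and the threshold classifier below are hypothetical illustrations (not their method or data): a classifier fitted where one distribution generated the training data performs worse when the test data come from a shifted distribution — and deciding whether that shift matters is a judgment the human analyst must make.

```python
# Toy illustration of distribution shift between training and test sets.
# All numbers are invented for the sketch; the "human"/"mouse" framing
# is only an analogy to the gene-finding example discussed above.

def fit_threshold(xs, ys):
    """Pick the decision threshold that best separates labels on the training data."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(xs)):
        acc = sum((x >= t) == y for x, y in zip(xs, ys)) / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def accuracy(t, xs, ys):
    """Fraction of points correctly classified by threshold t."""
    return sum((x >= t) == y for x, y in zip(xs, ys)) / len(xs)

# Training set ("human genome" analogue): classes separate cleanly.
train_x = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
train_y = [0,   0,   0,   0,   1,   1,   1,   1]

# Test set ("mouse genome" analogue): same underlying biology, but the
# feature distribution is shifted, so the learned boundary fits less well.
test_x = [0.4, 0.5, 0.55, 0.6, 0.9, 1.0, 1.1, 1.2]
test_y = [0,   0,   0,    0,   1,   1,   1,   1]

t = fit_threshold(train_x, train_y)
print(accuracy(t, train_x, train_y))  # high on the training distribution
print(accuracy(t, test_x, test_y))    # degrades under the shifted distribution
```

Whether the degraded accuracy is still "good enough for the purpose at hand," as the text puts it, is not something the numbers alone can settle.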

Another task irreducible to computational tractability is the interpretation of labels. In cases of unsupervised learning, algorithms cluster data points that are similar according to a specific measure established by the algorithm. However, the algorithm itself must first be informed about the number of groups into which the data points should be clustered. Hence, it is provided with a number of empty labels. When we apply such algorithms to biological data, we have to assign an interpretation to the empty labels: we have to say which biological entities those labels represent. This usually requires collaborative research between different epistemological cultures,[6] and the way people negotiate such interpretations is opaque and difficult to reduce to rules.
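A minimal clustering sketch makes the point about empty labels visible. The data and the simple 1-D k-means below are invented for illustration: the number of clusters k is supplied by the human, and the output labels are bare integers whose biological meaning the algorithm cannot provide.

```python
# Minimal 1-D k-means sketch (hypothetical data). Two human contributions
# are baked in: the choice of k, and the later interpretation of the labels.

def kmeans_1d(points, k, iters=20):
    # Deterministic initialization: spread centroids across the data range.
    centroids = [min(points) + (max(points) - min(points)) * i / (k - 1)
                 for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return [min(range(k), key=lambda i: abs(p - centroids[i])) for p in points]

# Hypothetical measurements, e.g. expression levels of a gene across samples.
expression = [0.1, 0.2, 0.15, 2.0, 2.1, 1.9]

labels = kmeans_1d(expression, k=2)  # k=2 is the analyst's choice, not the algorithm's
print(labels)  # [0, 0, 0, 1, 1, 1] -- "0" and "1" are empty labels; reading them
               # as, say, "low expressors" vs "high expressors" is a biological
               # interpretation the algorithm cannot supply on its own
```

The clusters themselves are a purely numerical product; saying what the groups *are*, biologically, is exactly the negotiation between epistemological cultures that the text describes.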

I have interpreted all these problems in light of a virtue that Shannon Vallor has identified for experimentalists, namely perceptual responsiveness, defined as “a tendency to direct one’s scientific praxis in a manner that is motivated by the emergent contours of particular phenomena and the specific form(s) of practical and theoretical engagement they invite.”[7] To simplify, this virtue is a disposition to direct scientific praxis towards the affordances that a phenomenon under investigation yields. According to Vallor, a virtuous scientist is one who “properly reads or ‘decodes’ all of the salient invitations to measurement implied by the phenomenon…and creatively finds a way to take up just those invitations whose answers may shed the most light.”[8] Vallor explicitly frames perceptual responsiveness within a virtue ethics/epistemology framework, such that scientific inquiry looks more like value judgment than a mere rule-following procedure. To stress this further, the way a phenomenon is investigated scientifically is a function of the interaction of the scientist’s background, experiences, aims, ability to perceive affordances, and so forth. It is a habit of seeing things in a specific light.

My claim is that the irreducibility to rules and clear-cut procedures identified in ML methodologies is a sign that ML practitioners may have to cultivate a computer-science version of perceptual responsiveness (a sort of phronesis). If this is the case, then science cannot be automated, not even under a strict AI regime.

EMANUELE RATTI is Visiting Assistant Professor in the Reilly Center for Science, Technology, and Values at the University of Notre Dame. Prior to this, he was a postdoctoral research associate in the Center for Theology, Science, and Human Flourishing. His areas of specialization are the History and Philosophy of Science (in particular Molecular Biology, Genomics, and A.I.), General Philosophy of Science, and Ethics of Science and Technology (including Virtue Ethics). His research trajectory deals with the data-intensive turn in the biological field and its consequences for the scope and the goals of biology. He is also interested in ethical and epistemological issues concerning the concept of automated science and the role of A.I. in science.

Bibliography

  • Alkhateeb, Ahmed, and Aeon. “Can Scientific Discovery Be Automated?” The Atlantic. April 25, 2017. https://www.theatlantic.com/science/archive/2017/04/can-scientific-discovery-be-automated/524136/.
  • Keller, Evelyn Fox. Making Sense of Life: Explaining Biological Development with Models, Metaphors and Machines. Cambridge, MA: Harvard University Press, 2002.
  • Libbrecht, Maxwell W., and William Stafford Noble. “Machine Learning Applications in Genetics and Genomics.” Nature Reviews Genetics 16 (2015): 321–32.
  • Ratti, Emanuele. “Phronesis and Automated Science: The Case of Machine Learning and Biology.” In Will Science Remain Human?, edited by Fabio Sterpetti and Marta Bertolaso. Berlin: Springer, 2019.
  • Schaffner, Kenneth. Discovery and Explanation in Biology and Medicine. Chicago: University of Chicago Press, 1993.
  • Schmidt, Michael, and Hod Lipson. “Distilling Free-Form Natural Laws from Experimental Data.” Science 324 (2009): 81–5.
  • Vallor, Shannon. “Experimental Virtues: Perceptual Responsiveness and the Praxis of Scientific Observation.” In Virtue Epistemology Naturalized: Bridges Between Virtue Epistemology and Philosophy of Science, edited by Abrol Fairweather, 269–90. Berlin: Springer, 2014.
  • Yarkoni, Tal, Russell A. Poldrack, Thomas E. Nichols, David C. Van Essen, and Tor D. Wager. “Large-Scale Automated Synthesis of Human Functional Neuroimaging Data.” Nature Methods 8 (2011): 665–70.

  1. Kenneth Schaffner, Discovery and Explanation in Biology and Medicine (Chicago: University of Chicago Press, 1993).
  2. Tal Yarkoni, Russell A. Poldrack, Thomas E. Nichols, David C. Van Essen, and Tor D. Wager, “Large-Scale Automated Synthesis of Human Functional Neuroimaging Data,” Nature Methods 8 (2011): 665–70; Michael Schmidt and Hod Lipson, “Distilling Free-Form Natural Laws from Experimental Data,” Science 324 (2009): 81–5; Ahmed Alkhateeb and Aeon, “Can Scientific Discovery Be Automated?,” The Atlantic, April 25, 2017, https://www.theatlantic.com/science/archive/2017/04/can-scientific-discovery-be-automated/524136/.
  3. Emanuele Ratti, “Phronesis and Automated Science: The Case of Machine Learning and Biology,” in Will Science Remain Human?, edited by Fabio Sterpetti and Marta Bertolaso (Berlin: Springer, 2019).
  4. Maxwell W. Libbrecht and William Stafford Noble, “Machine Learning Applications in Genetics and Genomics,” Nature Reviews Genetics 16 (2015): 321–32.
  5. Ibid., 323.
  6. Evelyn Fox Keller, Making Sense of Life: Explaining Biological Development with Models, Metaphors and Machines (Cambridge, MA: Harvard University Press, 2002).
  7. Shannon Vallor, “Experimental Virtues: Perceptual Responsiveness and the Praxis of Scientific Observation,” in Virtue Epistemology Naturalized: Bridges Between Virtue Epistemology and Philosophy of Science, edited by Abrol Fairweather (Berlin: Springer, 2014), 271.
  8. Ibid., 276.