Six Questions About Data Science Ethics

Henry Alpert
Towards Data Science
6 min read · Sep 3, 2021

The words “data science” never appear in Aristotle’s Nicomachean Ethics. (Image: Couleur/Pixabay)

During the years of Facebook’s ascent, Mark Zuckerberg’s motto was famously, “Move fast and break things.” Well, since those days, a lot has been broken. Despite the various benefits big tech affords our lives, companies’ sole focus on fast growth has meant that ethical considerations of how their products might harm society have only been afterthoughts, something to apologize for later after the damage has been done.

Professional fields, from medicine to law, have developed their ethical standards over centuries. The field of data science hasn't had that luxury. Having recently completed a data science bootcamp and begun looking for my first position in the field, I realize I may have to wrestle with many ethical issues that are only now coming into focus in the profession.

I did some research into articles and courses on data science ethics and came up with what I feel are some of the larger (often overlapping) questions that need to be explored — and hopefully answered by data scientists with at least some degree of standards and consensus.

1. Is the data’s sourcing biased?

Data has to be gathered from some source, and data scientists need to keep in mind how it was gathered and where it comes from, because bias can be inherent. For example, a crowdsourced dataset of potholes was once gathered by Boston residents for the city government in an initiative called Street Bump. But because lower-income people are less likely than wealthier people to own smartphones, to own cars, and to be plugged into tech trends, the crowdsourced data favored wealthier citizens and their neighborhoods, and hence those neighborhoods would be favored for street repairs.

Every social media platform, from Twitter to TikTok to Instagram, has its own demographics and reflects members’ interests and proclivities. So, any data scraped from the platforms would reflect these communities’ biases.

2. Is privacy being protected?

Privacy, or the lack thereof, is already a well-established issue in the tech world. Governments across the world have passed laws about it. Countless apps and services require some agreement to privacy policies when signing up, but these lengthy and impenetrable agreements are typically more for a service’s lawyers than its users.

How secure is the data collected? What would be the damage done if it were compromised? Just because a company can harvest tons of data, does that mean it should? In a scandal of the 2016 election, Cambridge Analytica harvested Facebook data when users filled out a little survey, and data from all those users’ friends was harvested, too, even though they hadn’t participated in the survey.

And when sharing the data, how can it be truly made anonymous? It’s often not enough just to remove names, addresses, and other obvious identifying information.
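To see why stripping names falls short, consider the classic re-identification attack: if quasi-identifiers like zip code, birth date, and sex remain, an "anonymized" record can often be linked back to a person using a public dataset. The sketch below uses entirely hypothetical records and made-up names to illustrate the idea.

```python
# Toy illustration (hypothetical data): removing the "name" column does
# not anonymize a record if quasi-identifiers remain.

# "Anonymized" medical records: names dropped, but zip code, birth date,
# and sex kept.
medical = [
    {"zip": "02139", "birth": "1960-07-31", "sex": "F", "diagnosis": "flu"},
    {"zip": "02139", "birth": "1985-01-12", "sex": "M", "diagnosis": "asthma"},
]

# A public dataset (say, a voter roll) that does include names.
voters = [
    {"name": "Jane Doe", "zip": "02139", "birth": "1960-07-31", "sex": "F"},
]

def reidentify(medical, voters):
    """Link the two datasets on quasi-identifiers alone."""
    matches = []
    for m in medical:
        for v in voters:
            if (m["zip"], m["birth"], m["sex"]) == (v["zip"], v["birth"], v["sex"]):
                matches.append((v["name"], m["diagnosis"]))
    return matches

print(reidentify(medical, voters))  # [('Jane Doe', 'flu')]
```

Real anonymization schemes therefore have to generalize or suppress quasi-identifiers, not just delete names.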

3. To whom does the data belong, the company or the user?

Tech companies put a lot of effort into collecting data, so it’s natural that they should expect to own it. But what about when someone wants an unflattering or compromising picture of himself or herself taken down? If a company has data that’s inaccurate about someone, how can the person find out about it and correct it?

And there are also situations where data is about someone, but they don’t have a direct hand in it. Suppose a disgruntled ex-employee of a restaurant takes to a review website to slam the place with a one-star review. And then she gets thirty of her friends to do the same. How can the restaurant owner get the site to take down the reviews if they were made in bad faith?

4. How are algorithms affecting individuals’ lives?

Suppose a company uses data about its current employees to build an algorithm that forecasts the probable success of new applicants. Now suppose that company has a sexist work culture in which women find it difficult to thrive. Any such algorithm will likely predict poor success rates for female applicants. One can imagine similar scenarios for underrepresented groups in college admissions. In essence, algorithms can enforce an unfair status quo.
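The mechanism is simple enough to sketch with a toy model and entirely hypothetical numbers: the simplest possible "success predictor" trained on outcomes from a biased workplace just memorizes the bias.

```python
# Minimal sketch (hypothetical data): a predictor trained on biased
# historical outcomes reproduces the bias as its "prediction."
from collections import defaultdict

# Historical records: (gender, succeeded?). In this invented culture,
# women were rarely allowed to thrive, so their recorded success is low.
history = [("M", True)] * 80 + [("M", False)] * 20 + \
          [("F", True)] * 20 + [("F", False)] * 80

# "Train" the simplest model there is: the success rate per group.
totals, successes = defaultdict(int), defaultdict(int)
for gender, succeeded in history:
    totals[gender] += 1
    successes[gender] += succeeded

def predicted_success(gender):
    return successes[gender] / totals[gender]

print(predicted_success("M"))  # 0.8
print(predicted_success("F"))  # 0.2 -- the status quo, encoded as a score
```

A more sophisticated model trained on the same labels would learn the same pattern, just less transparently: the problem is in the data, not the math.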

An algorithm can also be blind to, and unforgiving of, real-world context. Suppose, in another college admissions scenario, a student was placed on academic suspension, then decided to change his ways and worked hard at school from then on. An algorithm might reject the student based on that one suspension. As a real-life example, a 2014 hostage crisis in Australia meant people near the scene needed to get away fast. When they loaded up their Uber apps, surge pricing had automatically kicked in at four times the normal rate.

While these examples could perhaps be chalked up to carelessness or a lack of imagination, one clearly unethical case stands out: Facebook once ran an experiment in which it manipulated users' feeds to learn whether seeing mostly positive posts led users to post positively themselves, and vice versa. (It turned out it did.) In other words, Facebook was manipulating people's emotions, intentionally making many users feel worse. Unlike in a traditional psychological experiment, users never gave explicit informed consent to participate.

5. How are algorithms affecting society?

Cultural critics have complained that recommendation systems used by movie and music websites can keep people trapped in the same genres. People are less likely to try things out of their comfort zones; they become blinkered when they could be open-minded. Meanwhile, unestablished artists have trouble getting noticed while attention flows to the already popular.

More seriously, social critics have charged recommendation algorithms on sites like YouTube with radicalizing people politically. The algorithms are designed to keep users engaged and clicking video after video, which can lead them down a dopamine-fueled rabbit hole of increasingly extreme content. It’s easy to draw a line from extreme online content to the various violent acts that show up in the news regularly.

Whereas credit scoring has historically relied on a person’s financial information, social media data has entered into the mix. Proponents say this practice allows for more nuanced scoring, but others worry it could lead to social segregation and discrimination.

6. Is the data manipulated or deceiving?

While many of the negative effects mentioned so far may reflect unintentional bias, sometimes something more nefarious is going on. A scandal in early 2021 surrounding Andrew Cuomo, then the governor of New York, involved reports that he instructed aides to publicly report far fewer Covid-19 deaths of nursing home residents than had actually occurred. The reported statistics counted only deaths that occurred in the homes themselves, not those of patients who contracted the disease in a home but later died in a hospital.

Farther south, in Florida, a data scientist named Rebekah Jones has feuded with Governor Ron DeSantis; she says she was pressured to remove key Covid-19 data from a popular Department of Health dashboard. She was fired from her job in May 2020, and the feud kept escalating. Officers raided her home that December, and she went to jail in January 2021. The Florida Office of Inspector General granted her whistleblower protection in May 2021.

Even when the data is accurate, its presentation can be manipulated to deceive. Look closely at a Covid-19 graph of new cases per day that appeared on Fox News in April 2020: the y-axis starts at 30, not zero, and its increments are logarithmic, not linear, which has the graphical effect of minimizing spikes. Most people looking at that graph would come away thinking new cases weren't accelerating as fast as they really were.
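The log-axis effect is easy to demonstrate numerically with hypothetical case counts: growth that doubles every day produces widening gaps on a linear axis, but perfectly even steps on a logarithmic one, so an exponential spike plots as a tame straight line.

```python
import math

# Hypothetical new-case counts doubling each day -- explosive growth.
cases = [30, 60, 120, 240, 480, 960]

# The vertical distance each day's point moves on a linear y-axis
# versus a logarithmic (base-10) y-axis.
linear_steps = [b - a for a, b in zip(cases, cases[1:])]
log_steps = [math.log10(b) - math.log10(a) for a, b in zip(cases, cases[1:])]

print(linear_steps)  # [30, 60, 120, 240, 480] -- gaps widen; acceleration is visible
print(log_steps)     # every step ~0.301 -- a straight line; the spike vanishes
```

Log scales have legitimate uses in epidemiology, but presenting one to a general audience without labeling it clearly is how accurate data becomes a misleading picture.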

From the list of questions I assembled, it's clear that data scientists need to understand how their work affects society. (Of course, many of them have already given much thought to questions like mine.) I also believe the field needs professional organizations to outline standards, promulgate them among data scientists, and educate the public about them. The profession would build trust with the public and the media, and also earn credibility in social and political controversies.

For medical professionals, the American Medical Association publishes the monthly AMA Journal of Ethics. The American Bar Association speaks out on behalf of persecuted legal professionals around the world.

Who’s speaking up for Rebekah Jones?
