Develop an unsupervised learning algorithm to find subject commonalities

Image from Pixabay

After reading a news article — whether the subject matter is U.S. politics, a movie review, or a productivity tip — you can turn to someone else and give them a general idea of what it’s about, right? Or if you read a novel, you can classify it as maybe sci-fi, literary fiction, or a romance.

Humans tend to be pretty good at classifying texts. And these days, computers can do it, too.

For a recent machine learning project, I downloaded consumer complaints from the Consumer Financial Protection Bureau and developed models to classify the complaints into one of five…

How to use an NLP model to sort incoming messages

Image by author with assist from Bruno/Germany / Pixabay

The Consumer Financial Protection Bureau (CFPB), a federal agency that began operations only in 2011, looks after the interests of consumers in the financial sector. As part of that mission, consumers can send in complaints when they feel they’ve been mistreated by a credit bureau, a bank, a credit card company, or another financial service provider.

These complaints are available for data scientists to download, and as a dataset, they offer a great launching point for constructing a Natural Language Processing (NLP) classification system.

I created a model that will analyze complaints and classify them into one of five categories…

The words “data science” never appear in Aristotle’s Nicomachean Ethics. (Image Source)

During the years of Facebook’s ascent, Mark Zuckerberg’s motto was famously, “Move fast and break things.” Well, since those days, a lot has been broken. Despite the various benefits big tech affords our lives, companies’ sole focus on fast growth has meant that ethical considerations of how their products might harm society have only been afterthoughts, something to apologize for later after the damage has been done.

Professional fields, from medicine to law, have developed their ethical standards over centuries. The field of data science hasn’t had the luxury of so much time. But since I’m nearing the completion of…

Photo by George Huffman on Unsplash

Say an accident occurred between three vehicles on a September morning. Roads were dry, and the posted speed limit was 35 mph. One of the drivers was a 43-year-old woman. Would this situation lead to a serious accident?

Being able to answer a question such as this one was the goal of a project I completed for Flatiron School. I’d learned about a dozen different machine learning models, and essentially, the project was a way for me to practice working with them. (GitHub link here.)

I used accident data maintained by the City of Chicago and merged this set with…

Overview of a data modeling project for Flatiron’s data science course

Photo by Eric Hammett from Pexels

How much is a home’s price affected by:

  • a recent renovation?
  • 10 additional square feet of living space?
  • other listings for sale in the area?

For my data science project for Flatiron School, I used a dataset covering home sales in King County, Washington, in 2014 and 2015 to see if could determine how much of a home’s worth was represented by particular characteristics, such as the area of the living space, when it was built, or whether it is located on a waterfront.

The ultimate purpose of the assignment was to give me a hands-on way to learn linear…

In my data science course with Flatiron School, our recent project was to pretend that Microsoft was launching a new movie studio, and we students had to make recommendations to the executive team. The project exercised our Python skills and facility with several libraries, including Pandas and visualization libraries. Data for our analysis came from sources such as the Internet Movie Database (IMDB), The Movie Database (TMDB), Box Office Mojo, and Rotten Tomatoes.

My initial survey of the sources found tons of data on individual movies: actors, directors, release dates, and run times. I found ratings from users and numbers…

Photo by Lukas Blazek on Unsplash

In college, I majored in philosophy because I was interested in questions about the nature of life. The branch of epistemology asks, What can we be certain of, what is unknown, and what is unknowable? Although philosophical ideas may seem airy when discussed casually, the discipline uses symbolic logic and mathematical rigor to tackle questions.

My career path has been varied, and I’ve pursued opportunities over the years that have often involved writing, traveling and exploration. I worked at an English-language newspaper in Thailand, reported for a financial publication in New York, taught schoolkids in Japan, and eventually settled in…

Henry Alpert

Newly minted data scientist.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store