Companies hopelessly stumped by a Big Data problem they desperately need solved now have a cavalry to call upon: more than 500,000 data scientists across the globe, just itching to take on a challenge.
The online community Kaggle is a loose association of some of the brightest minds on the planet, who love competing against one another to push the limits of what Big Data can accomplish.
"It started with a $1,000 prize competition," recalls Anthony Goldbloom, Kaggle’s CEO and a former Australian government data scientist who felt less than challenged by his work.
Luckily for Goldbloom, interest in his quirky new website, and in the first competition it hosted in 2010, was keen. Subsequent competitions, which proved equally popular, helped grow the Kaggle community to the point where it now refers to itself as "the world's largest community of data scientists ... from over 100 countries and 200 universities."
It was a contest in 2011, sponsored by the U.S. National Aeronautics and Space Administration (NASA)—which had gotten wind of Kaggle from an article published in Science—that put the online community on the map. "We were at the right place at the right time," Goldbloom says.
The space agency's challenge: dark matter, which distorts light as it travels from distant galaxies, has long marred the way we perceive the universe and the objects in it. NASA wanted a new algorithm that would improve on previous methods for automatically correcting the distortion that shows up in space images.
No giant cash reward was offered, no lavish stock options were dangled, and there would be no free rides on the space shuttle; instead, NASA simply offered the winning team an expenses-paid trip (up to $3,000 in value) to present its findings at NASA’s Jet Propulsion Laboratory in Pasadena, CA. For 72 teams of bright-eyed scientists, it was a prize too heady to resist. For the companies and organizations monitoring the competition, the response brought home a piercing insight into what drives Kagglers to reach for the stars: being able to go where no one has gone before.
"Money isn't the prime motivator behind these competitions," says Goldbloom.
Instead, in the NASA contest, and in every other Kaggle competition for that matter, data science teams compete for recognition: proof that, among many of the brightest minds on the planet, they were the ones who came up with a solution that outdid anything anyone else could imagine. The select few who win multiple Kaggle competitions against more than a half-million rivals (519,782 at this writing) rank among the top 500 competitors, la crème de la crème.
"It's a powerful credential," Goldbloom says. Companies like The New York Times, Capital One bank, and others have advertised for data scientists who are highly ranked competitors on Kaggle, he says, adding, "Facebook, Airbnb, Yelp, Walmart: they've all held competitions on Kaggle to find data scientists they'd like to interview for positions."
These days, with a global reputation for excellence, Kaggle hosts an average of 50 competitions a year. Goldbloom says Kaggle charges "six figures" to manage each competition, and the Kaggle community is ready to take on virtually any type of data challenge a business or organization can muster.
For example, a Kaggle competition recently yielded an algorithm that enables a computer imaging system to identify diabetes patients who are at risk of blindness; its accuracy (86%) is better than what humans can achieve. The developer of that algorithm, University of Warwick associate professor of statistics Ben Graham, walked away with the recognition of outdoing all comers, as well as $100,000 in prize money provided by sponsor California Health Care Foundation.
"It was money well spent," says Jorge Cuadros, CEO of San Jose, CA-based EyePACS LLC, an organization dedicated to preventing vision impairment due to diabetic retinopathy. Cuadros said Graham's algorithm may be headed for a clinical trial before the year is out.
Not surprisingly, other companies with Big Data problems want in, and Goldbloom and Kaggle are ready to accommodate them. Organizations have two choices, Goldbloom says: they can post a public challenge to the half-million-plus-member Kaggle community or, if they prefer their data remain private, they can work with Kaggle to put together an invitation-only competition that keeps company data out of the public domain and enlists only Kaggle's top talent: data scientists who have won previous Kaggle competitions.
Cuadros said Kaggle has "a great reputation in the data science community and has solved many interesting problems. Personally, I found Kaggle to be an extremely capable and intelligent group."
Joe Dysart is an Internet speaker and business consultant based in New York City.