Showing posts with label predictive analytics. Show all posts

Sunday, May 3, 2015

Putting the Train in AppTrain

In late 2005, when first learning Ruby and Rails, I founded the AppTrain project,  a web interface to the early Rails generators.   The Train represented a vehicle that was rolling along on top of Rails. As Rails grew in popularity, we began helping build Rails teams and the train took on a new meaning. We were training developers in Rails and related technologies.

And now, years later, still excited about the future of technology, we're doing plenty of data science programming and machine learning. With machine learning, specifically supervised learning, it's important to build a good training set of data. A training set represents a relationship between a result and a list of data points that correspond with that result. The result is also referred to as a signal.

In supervised learning different algorithms can be trained on these training sets.  When an algorithm is being trained, it is looking for a function that best explains a signal.

Imagine a small data set like this:

The first number on each line is our result, or signal.  The second is the input data that leads to that signal.  Do you see a function that could predict the value of the signal x given a new value for y?


Visually we immediately see that x is always greater than y.  (x > y)  Our minds are searching for a function that will predict x.  How much greater is x than y?   You'll notice pretty quickly, it's exactly double.

x = 2y
x = 8

x represents the signal. y is the training data.  

Data Scientists have developed many algorithms that can run through numbers similar to the way our minds do.  But they can do it much faster.  Imagine a training set not with three rows, but with 100 or 1000.  It would be pretty boring to read through them to make sure x was always double y, but it's a great job for a computer.
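As a rough illustration of what "looking for a function" means, here is a minimal sketch in Python. The three rows are made-up values chosen so that x is double y (the post's original rows are not reproduced here), and the model is assumed to be a straight line through the origin, x = a * y. Real learning algorithms search over far richer families of functions.

```python
# Hypothetical training rows as (signal x, input y). These values are
# invented for illustration so that x is always double y.
rows = [(2, 1), (6, 3), (10, 5)]


def fit_slope(rows):
    """Least-squares slope a for the assumed model x = a * y."""
    numerator = sum(x * y for x, y in rows)
    denominator = sum(y * y for _, y in rows)
    return numerator / denominator


slope = fit_slope(rows)

# Use the learned function to predict the signal for a new input y = 4.
prediction = slope * 4
print(slope, prediction)
```

The same function works unchanged on 1000 rows, which is the point: the computer does the boring checking for us.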

Complicated Data Sets

Now imagine the training set has not just 2 variables (columns), x and y, but 10, or 100.

Here's data from a training set in the Restaurant Revenue prediction competition at Kaggle.  

In this case the result (or signal) we're trying to predict is the last column in each row, the revenue.

The Python programming language is a favorite of data science programmers, and scikit-learn is a widely used machine learning library for Python. It contains learning algorithms designed to be trained on data sets like this restaurant revenue data. Each algorithm in scikit-learn looks for functions that predict the signals found in training data.

To solve Kaggle problems like the restaurant revenue problem, competitors typically first try one of the single models found in scikit-learn. On the discussion board for the competition, people mention using support vector machines (SVM), gradient boosting (GBM), and random forests. But competition winners ultimately blend techniques, or even devise their own algorithms.
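Here is a small sketch of that workflow using scikit-learn: train two single models, then blend them by averaging their predictions. The data is synthetic (random numbers standing in for a wide training set, not the Kaggle restaurant data), and a plain average is only one of many possible blending schemes.

```python
import random

from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

random.seed(0)

# Synthetic stand-in for a wide training set: 200 rows, 10 numeric columns.
# The signal depends on the first two columns plus a little noise.
features = [[random.random() for _ in range(10)] for _ in range(200)]
signal = [3.0 * row[0] + 2.0 * row[1] + random.gauss(0, 0.05) for row in features]

# Hold back the last 50 rows to check predictions on unseen data.
train_X, test_X = features[:150], features[150:]
train_y, test_y = signal[:150], signal[150:]

# Train two single models on the same training set.
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(train_X, train_y)
boost = GradientBoostingRegressor(random_state=0).fit(train_X, train_y)

# Blend by averaging the two models' predictions row by row.
blended = [(f + b) / 2 for f, b in zip(forest.predict(test_X), boost.predict(test_X))]


def mean_abs_error(predicted, actual):
    """Average absolute difference between predictions and true signals."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


print(mean_abs_error(blended, test_y))
```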

Meanwhile, the train cruises forward at AppTrain.  Today we're building training sets, and training algorithms. Want to learn more about machine learning?  Attend our tech talks at Làm việc mạng in Saigon this summer.

Monday, February 9, 2015

NFL Fantasy Sports API

The NFL has made an impressive Application Programming Interface (API) available to application developers.

Fantasy Football Web Services allow existing or new applications to access live NFL data, potentially in real time. They also allow users to join existing fantasy leagues, or even create new leagues. Perhaps the most exciting offerings are still to come. Michael Vizard of Programmable Web writes:

A big part of that effort revolves around giving fans access to statistics and analytics tools that they can use to figure out which players to draft and keep. In the postseason, even went so far as to create a separate fantasy game event that involved just the teams that made the playoffs.

Access to these statistics and analytic tools is something large organizations like the NFL need to compete with the powerful data analysis capabilities available to smaller companies today. In addition to the playoff fantasy game mentioned above, there's also a Pro Bowl API available. So the NFL seems to be running with the API.

First, let's have a closer look at the web services available now, which are covered in the API documentation. Then we'll explore some hidden gems available now in the API that access some underlying predictive analytics, straight from the NFL!

Available Data

Any current player statistics are available through the well-documented API calls. To write to the API you'll need a key, which you request by email.

Scoring Leaders


The data comes back in XML by default, but you can clean that up with a simple format parameter at the end of your request:

Weekly Stats

Users can further filter requests by position, team, week and season.  The available  stats go back to 2009.
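As a sketch of how such a filtered request might be assembled, the snippet below builds a stats URL with a format parameter at the end. The base URL and the parameter names (statType, season, week, position, format) are assumptions for illustration and should be checked against the API documentation. The code only builds the URL and does not call the service.

```python
from urllib.parse import urlencode

# Assumed endpoint, for illustration only.
BASE_URL = "https://api.fantasy.nfl.com/v1/players/stats"


def build_stats_url(season, week, position=None, fmt="json"):
    """Build a weekly stats request URL with optional position filter."""
    # Parameter names here are assumptions, not confirmed by the post.
    params = {"statType": "weekStats", "season": season, "week": week}
    if position:
        params["position"] = position
    # The format parameter goes last and switches the response from XML to JSON.
    params["format"] = fmt
    return BASE_URL + "?" + urlencode(params)


url = build_stats_url(2014, 1, position="QB")
print(url)
```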

Advanced Stats

Additional statistics such as RedZone Touches are also available:

{"QB":[{"id":"2533033","esbid":"GRI283140","gsisPlayerId":"00-0029665","firstName":"Robert","lastName":"Griffin","teamAbbr":"WAS","opponentTeamAbbr":"@HOU","position":"QB","stats":{"FanPtsAgainstOpponentPts":"25.00","FanPtsAgainstOpponentRank":"2","Carries":"9","Touches":"9","Receptions":false,"Targets":false,"ReceptionPercentage":false,"RedzoneTargets":false,"RedzoneTouches":"1","RedzoneG2g":false},"status":"Loss, 6-17"}],"RB":[{"id":"2533457","esbid":"MOR317547","gsisPlayerId":"00-0029141","firstName":"Alfred","lastName":"Morris","teamAbbr":"WAS","opponentTeamAbbr":"@HOU","position":"RB","stats":{"FanPtsAgainstOpponentPts":"25.60","FanPtsAgainstOpponentRank":"4","Carries":"28","Touches":"28","Receptions":false,"Targets":false,"ReceptionPercentage":false,"RedzoneTargets":false,"RedzoneTouches":"4","RedzoneG2g":"2"},"status":"Loss, 6-17"}],"WR":[{"id":"80425","esbid":"HAR829482","gsisPlayerId":"00-0026998","firstName":"Percy","lastName":"Harvin","teamAbbr":"NYJ","opponentTeamAbbr":"OAK","position":"WR","stats":{"FanPtsAgainstOpponentPts":"39.00","FanPtsAgainstOpponentRank":"3","Carries":"5","Touches":"11","Receptions":"6","Targets":"8","ReceptionPercentage":"75","RedzoneTargets":"1","RedzoneTouches":"2","RedzoneG2g":"1"},"status":"Win, 19-14"}],"TE":[{"id":"2530473","esbid":"ADA482150","gsisPlayerId":"00-0028337","firstName":"Kyle","lastName":"Adams","teamAbbr":"","opponentTeamAbbr":"Bye","position":"TE","stats":{"FanPtsAgainstOpponentPts":"","FanPtsAgainstOpponentRank":"","Carries":false,"Touches":"1","Receptions":"1","Targets":"1","ReceptionPercentage":"100","RedzoneTargets":false,"RedzoneTouches":false,"RedzoneG2g":false},"status":""}],"K":[{"id":"2499370","esbid":"AKE551610","gsisPlayerId":"00-0000108","firstName":"David","lastName":"Akers","teamAbbr":"","opponentTeamAbbr":"Bye","position":"K","stats":{"FanPtsAgainstOpponentPts":"","FanPtsAgainstOpponentRank":"","Carries":false,"Touches":false,"Receptions":false,"Targets":false,"ReceptionPercentage":false,"RedzoneTargets":false,"RedzoneTouches":false,"RedzoneG2g":false},"status":""}],"DEF":[{"id":"100029","esbid":false,"gsisPlayerId":false,"firstName":"San Francisco","lastName":"49ers","teamAbbr":"SF","opponentTeamAbbr":"@DAL","position":"DEF","stats":{"FanPtsAgainstOpponentPts":"5.00","FanPtsAgainstOpponentRank":"20","Carries":false,"Touches":false,"Receptions":false,"Targets":false,"ReceptionPercentage":false,"RedzoneTargets":false,"RedzoneTouches":false,"RedzoneG2g":false},"status":"Win, 28-17"}]}

Managing Leagues with API Writes

Add a Player

This call adds a player to your fantasy team.

A valid API key will get a success response:


Create a League

You can even create a new league,

then email out links for people to join:

For the application developer and fantasy sports fan, the fun has just begun.

Analytic Tools

Developers and analysts are used to writing their own analytic tools. The API provides data designed just for custom analytics. The Pro Bowl API returns players' Twitter user IDs. Those feeds, along with the players/news call, can keep users up to date with the latest developments. Potentially that text data can even be mined and analyzed for predictors of next week's performance. Does Gronk play better after appearing on Conan?


Predictive analytics has become so prevalent that the NFL is now providing projections of next week's fantasy points.


Have a look at the JSON response to this request for 2014 week stats.

Algorithms behind the scenes at the NFL boldly predicted at week 1 that Kansas City defensive back Husain Abdullah would have no points the next week (weekProjectedPts):

{"id":"729","esbid":"ABD660476","gsisPlayerId":"00-0025940","name":"Husain Abdullah","position":"DB","teamAbbr":"KC","stats":{"1":"16","70":"58","71":"13","73":"1","76":"1","81":"10","82":"39","84":"2","85":"5","89":"1"},"seasonPts":82.5,"seasonProjectedPts":0,"weekPts":3,"weekProjectedPts":0}

The NFL algorithms were close. If you change the week number to 2 in the URL above, you'll see that Husain did scrape up a point the next week.
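To check a projection programmatically rather than by eye, a record like the one above can be parsed and its projected and actual fields compared. This is a minimal sketch: the record is a trimmed copy of the JSON shown earlier (the numbered stats map is left out), and it assumes weekPts and weekProjectedPts in one record refer to the same week.

```python
import json

# Trimmed copy of the player record shown above (stats map omitted).
raw = (
    '{"id":"729","name":"Husain Abdullah","position":"DB","teamAbbr":"KC",'
    '"seasonPts":82.5,"seasonProjectedPts":0,"weekPts":3,"weekProjectedPts":0}'
)
record = json.loads(raw)

# Difference between what the player scored and what was projected.
projection_error = record["weekPts"] - record["weekProjectedPts"]
print(record["name"], projection_error)
```

Running the same comparison across every player and week would give a rough picture of how accurate the projections are overall.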


Season projected points appears in the /players/stats response as well. It appears to be a placeholder for now. It doesn't change week to week, and it's 0 for any year prior to 2014. It will be interesting to watch this attribute in 2015.

Watch for more updates to the NFL Fantasy Sports API during the offseason, and for some interesting applications built around the new API.

Friday, November 29, 2013

Friday, April 26, 2013

Predicting the Stock Market using Big Data

In the paper "Quantifying Trading Behavior in Financial Markets Using Google Trends", researchers Tobias Preis, Helen Susannah Moat and H. Eugene Stanley have shown that an increase in activity of certain search terms from Google Trends correlates with a decline in stock prices in the Dow Jones Industrial Average (DJIA). The authors then compared investment strategies to show that the search activity isn't just a correlation, but can be used as a valid predictor of market activity.

This graph shows the DJIA on the left, and color codes 3-week periods in the graph according to search frequencies of the word debt. Note that red weeks, like late October of 2008, correspond with declines in the DJIA. So when lots of people were searching for the word debt, the stock market went down.

The word debt was the best performing term in the study. Notably, it performed better than terms like nyse, nasdaq, and dow jones.

The researchers then compared a Google Trends investment strategy to a basic Buy and Hold strategy and a random Dow Jones strategy.

The results were remarkable.  The Google Trends strategy far exceeded the other strategies.
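For readers who want to see the mechanics, here is a simplified paraphrase of the kind of rule the paper describes: if last week's search volume for a term is above its average over the preceding few weeks, short the index for a week, and otherwise go long for a week. The function name, the 3-week window default, and all the numbers below are made up for illustration, so the output says nothing about real market performance.

```python
def trends_strategy(search_volume, prices, window=3):
    """Cumulative return factor of a simplified Google Trends strategy.

    search_volume[t] is the relative search volume in week t and
    prices[t] is the index close on the first trading day of week t.
    """
    value = 1.0
    for t in range(window + 1, len(prices) - 1):
        # Average search volume over the `window` weeks before week t - 1.
        baseline = sum(search_volume[t - 1 - window:t - 1]) / window
        change = search_volume[t - 1] - baseline
        if change > 0:
            # Interest is rising: sell at p(t), buy back at p(t + 1).
            value *= prices[t] / prices[t + 1]
        elif change < 0:
            # Interest is falling: buy at p(t), sell at p(t + 1).
            value *= prices[t + 1] / prices[t]
        # When there is no change, this sketch simply takes no position.
    return value


# Hypothetical weekly data, invented for this example.
search_volume = [10, 10, 10, 10, 20, 30, 12, 8, 8, 8]
prices = [100, 100, 100, 100, 100, 95, 90, 92, 95, 97]

strategy_return = trends_strategy(search_volume, prices)
buy_and_hold_return = prices[-1] / prices[4]
print(strategy_return, buy_and_hold_return)
```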

Wednesday, March 27, 2013

"Big Data" is so 1998

This 1998 Silicon Graphics ad from Black Enterprise magazine offers solutions for a "Big Data" world: 256 GB of system memory on a server and 400 terabytes of storage. Not bad for the 20th century. Or for this century.

The "Big Data" buzzword almost caught on in 1998, but its sister buzzword, Data Mining, won out. The first chapter of "Predictive Data Mining: A Practical Guide" (also from 1998) is titled "Big Data". In it, the author Sholom M. Weiss asks, "Is data mining a revolutionary new concept? Or can we benefit from the many years of research on data analysis?"

Weiss goes on to say, "While big data have the potential for better results, there is no guarantee that they are more predictive than small data." With all the hype around Big Data, it helps to get back to the origins of the term and realize that it's one of many interesting problems that experts in a variety of disciplines have been wrestling with for a long time.
