Showing posts with label Analytics. Show all posts

Friday, May 8, 2015

Data Science with Python


At the last Tech Talk Tuesday we gave an overview of Python's data science packages.







The key packages for numerical computing are NumPy, SciPy and scikit-learn. The documentation for these Python packages is great, and makes presentations like this easy. It is loaded with code samples, even for complex concepts like grid search and cross-validation. The machine learning package, scikit-learn, also has exercises below the code samples. Doing the exercises reinforces the concepts, and is great preparation for solving problems like the ones in Kaggle competitions.
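To make grid search and cross-validation a little more concrete, here is a minimal, library-free sketch of what the two ideas do together: try every parameter combination, score each one by averaging accuracy over held-out folds, and keep the best. This is not scikit-learn's implementation (scikit-learn packages the pattern as `GridSearchCV`); the function names, the toy 1-D nearest-neighbours classifier and the data are all made up for illustration.

```python
from itertools import product
from statistics import mean


def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs; every sample is held out exactly once."""
    indices = list(range(n_samples))
    for fold in range(n_folds):
        test_idx = indices[fold::n_folds]
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        yield train_idx, test_idx


def knn_predict(train_x, train_y, x, k):
    """Toy 1-D k-nearest-neighbours: majority vote among the k closest points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = [train_y[i] for i in nearest]
    return max(set(votes), key=votes.count)


def cross_val_accuracy(xs, ys, k, n_folds=5):
    """Average held-out accuracy of the toy classifier over n_folds folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(xs), n_folds):
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        hits = sum(knn_predict(train_x, train_y, xs[i], k) == ys[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return mean(scores)


def grid_search(xs, ys, param_grid, n_folds=5):
    """Score every combination in param_grid and return (best_params, best_score)."""
    names = list(param_grid)
    results = {}
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        results[values] = cross_val_accuracy(xs, ys, n_folds=n_folds, **params)
    best = max(results, key=results.get)
    return dict(zip(names, best)), results[best]


# Made-up data: class 0 for small x, class 1 for large x, plus one noisy label.
xs = [float(x) for x in range(1, 21)]
ys = [0 if x <= 10 else 1 for x in xs]
ys[4] = 1  # deliberately mislabelled point

best_params, best_score = grid_search(xs, ys, {"k": [1, 3, 5]})
print(best_params, round(best_score, 3))
```

Only odd values of `k` are searched so that the two-class vote can never tie.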







We also demoed IPython Notebooks, a fantastic way to create live data analysis documents.




Thursday, January 2, 2014

Running OpenTSDB on Amazon EC2

Although there are cheaper alternatives for production systems, it's easy enough to get OpenTSDB, the Open Time Series Database, running on an Amazon Web Services EC2 instance.

  1. First you'll need to run HBase on EC2
  2. Make a data directory: mkdir hbase_data
  3. Open the HBase configuration file: vi hbase-0.94.13/conf/hbase-site.xml
  4. In vi, set the hbase.rootdir property value to: file:///home/ec2-user/hbase-0.94.13/hbase-${user.name}/hbase
  5. sudo yum install git
  6. git clone https://github.com/OpenTSDB/opentsdb.git
  7. sudo yum install automake
  8. sudo yum install gnuplot
  9. cd opentsdb
  10. ./build.sh
  11. env COMPRESSION=NONE HBASE_HOME=path/to/hbase-0.94.X ./src/create_table.sh
  12. tsdtmp=${TMPDIR-'/tmp'}/tsd
  13. mkdir -p "$tsdtmp" 
  14. ./build/tsdb tsd --port=4242 --staticroot=build/staticroot --cachedir="$tsdtmp"
  15. In the AWS console, click on your EC2 instance, then click "Security Groups" at the bottom left. Click on the default group, then click the "Inbound" tab and add a rule that opens TCP port 4242.
Browsing to your instance's public IP address on port 4242 will then display the web UI for your OpenTSDB instance.
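Once the TSD is listening on port 4242, the same port also accepts OpenTSDB's telnet-style `put` command, so a quick way to check the install is to push a data point into it. The sketch below is only an illustration: the metric name `sys.cpu.user`, the tag values and the helper names are placeholders of mine, and it assumes the metric has already been registered (for example with `./build/tsdb mkmetric sys.cpu.user`) or that metric auto-creation is enabled.

```python
import socket
import time


def format_put(metric, value, timestamp=None, **tags):
    """Build one line of OpenTSDB's telnet-style `put` command."""
    if not tags:
        # OpenTSDB rejects data points that carry no tags at all.
        raise ValueError("at least one tag is required per data point")
    if timestamp is None:
        timestamp = int(time.time())  # seconds since the Unix epoch
    tag_text = " ".join(f"{key}={val}" for key, val in sorted(tags.items()))
    return f"put {metric} {timestamp} {value} {tag_text}\n"


def send_points(lines, host="localhost", port=4242):
    """Open a TCP connection to the TSD and write the put lines to it."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall("".join(lines).encode("ascii"))


# Example (not executed here); replace the host with your instance's address:
#   send_points([format_put("sys.cpu.user", 42.5, host="webserver01", cpu=0)],
#               host="your-ec2-public-ip")
```

After sending a few points, the metric should be selectable in the web UI on the same port.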









    Tuesday, December 24, 2013

    The Journal of Trading: Smart Technology for Big Data


    Smart Technology for Big Data was published in the Winter edition of The Journal of Trading. You need to register to read it. Here's the abstract:

    This article provides an underlying structure for managing the big data phenomenon. Innovations and tools fundamental to handling big data are highlighted, and we look at how these technologies are being implemented in the financial industry. 

    See more at: http://www.iijournals.com/doi/abs/10.3905/jot.2013.9.1.057

    Wednesday, December 18, 2013

    Institutional Investor Journals: Big Data Article


    UPDATE:   Smart Technology for Big Data was published in the Winter edition of Journal of Trading, so the links below no longer work. You can access the article here: Smart Technology for Big Data (You'll still need to register if you haven't)




    My article Smart Technology for Big Data is published under advanced content at the Institutional Investor site.  You'll need to complete the free registration to read it. Enjoy!


    Friday, April 26, 2013

    Predicting the Stock Market using Big Data


    In the paper Quantifying Trading Behavior in Financial Markets Using Google Trends, researchers Tobias Preis, Helen Susannah Moat and H. Eugene Stanley show that an increase in Google Trends search activity for certain terms correlates with a decline in the Dow Jones Industrial Average (DJIA). The authors then compared investment strategies to show that the search activity isn't just a correlation, but can be used as a valid predictor of market activity.


    This graph shows the DJIA on the left, and color-codes 3-week periods in the graph according to search frequencies of the word debt. Note that red weeks, like late October 2008, correspond with declines in the DJIA. So when lots of people were searching for the word debt, the stock market went down.

    The word debt was the best performing term in the study. Notably, it performed better than terms like nyse, nasdaq, and dow jones.





    The researchers then compared a Google Trends investment strategy to a basic Buy and Hold strategy and a random Dow Jones strategy.


    The results were remarkable.  The Google Trends strategy far exceeded the other strategies.
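For the curious, here is a rough sketch of how a search-volume trading rule of this kind can be computed. It follows my reading of the paper, so treat the details as assumptions: each week, compare the term's search volume with its average over the previous three weeks; if volume is up, short the index for the following week, and if it is down, go long. The function names are mine, and the numbers are invented purely to exercise the code. They are not data or results from the study.

```python
import math


def trends_strategy_log_return(volume, price, window=3):
    """Cumulative log return of the search-volume rule (simplified reading).

    volume[t]: search volume for the term in week t.
    price[t]:  closing price on the first trading day of week t.
    """
    if len(volume) != len(price):
        raise ValueError("volume and price must cover the same weeks")
    total = 0.0
    # Each decision needs `window` earlier weeks and two later weeks of prices.
    for t in range(window, len(volume) - 2):
        baseline = sum(volume[t - window:t]) / window
        change = volume[t] - baseline
        next_week = math.log(price[t + 2]) - math.log(price[t + 1])
        if change > 0:
            total -= next_week  # searches rose: short for the coming week
        elif change < 0:
            total += next_week  # searches fell: long for the coming week
    return total


def buy_and_hold_log_return(price):
    """Log return of buying at the first price and selling at the last."""
    return math.log(price[-1]) - math.log(price[0])


# Invented toy series, NOT data from the study.
toy_volume = [10, 10, 10, 20, 5, 5, 5]
toy_price = [100, 100, 100, 100, 100, 90, 99]

strategy = trends_strategy_log_return(toy_volume, toy_price)
hold = buy_and_hold_log_return(toy_price)
print(f"search-volume rule: {math.exp(strategy) - 1:+.1%}")
print(f"buy and hold:       {math.exp(hold) - 1:+.1%}")
```

Working in log returns keeps the weekly results additive, which makes the cumulative comparison against buy and hold a simple sum.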


    Wednesday, March 27, 2013

    "Big Data" is so 1998

    This 1998 Silicon Graphics ad from Black Enterprise magazine offers solutions for a "Big Data" world: 256 GB of system memory on a server and 400 terabytes of storage. Not bad for the 20th century. Or for this century.



    The "Big Data" buzzword almost caught on in 1998, but its sister buzzword, data mining, won out. The first chapter of "Predictive Data Mining: A Practical Guide" (also from 1998) is titled "Big Data". In it, the author Sholom M. Weiss asks, "Is data mining a revolutionary new concept? or can we benefit from the many years of research on data analysis?"



    Weiss goes on to say, "While big data have the potential for better results, there is no guarantee that they are more predictive than small data." With all the hype around Big Data, it helps to get back to the origins of the term and realize that it's one of many interesting problems that experts in a variety of disciplines have been wrestling with for a long time.


