Showing posts with label programming. Show all posts

Thursday, July 23, 2015

How Data Science Saigon took the lead in a Kaggle Competition


Last month, at Tech Talk Tuesday, we formed a team for the Kaggle competition Getting Started with Julia. Last week, our team, Data Science Saigon, took the number one spot on the leaderboard. Here's how it happened.


Entering a Kaggle competition

You've got to be in it to win it. When our team met on June 16th, we created accounts on Kaggle's site and on Bitbucket by Atlassian, a company with offices here in Saigon. We reviewed DataFrames with Julia, and got the code from the Julia Tutorial and the K-Nearest-Neighbor tutorial working with help from the Kaggle forums. In particular, some method calls have changed since the tutorials were created, but we found the workaround in the "Convert method error when reading image files" topic. Our versions work as of Julia 0.3.7. Implementing the tutorial code got us to about 46th place at the time.
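For anyone who hasn't met K-Nearest-Neighbor before, here's a rough sketch of the idea in Python. This is not the tutorial's Julia code: the function name `knn_predict`, the toy points, and the choice of Euclidean distance with a majority vote are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours and count how often each label appears.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data standing in for flattened image pixels.
train_points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.9, 1.0), (1.0, 0.8), (0.8, 0.9)]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_points, train_labels, (0.1, 0.1)))
print(knn_predict(train_points, train_labels, (0.9, 0.9)))
```

With real images, each point would be a long vector of pixel intensities instead of a pair of numbers, but the voting logic stays the same.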

Machine Learning with Julia

At our next meeting we took a look at how to build a predictive model based on the Random Forest algorithm.


But boosting the parameters of our Random Forest algorithm didn't drastically improve our score. That's when we found out about the Mocha package.

Convolutional Neural Networks 



The recent resurgence in popularity of Neural Networks is due to the amazing performance of Convolutional Neural Networks (CNNs) at image classification. This was exactly the solution our problem needed. Dung Thai is very knowledgeable about Deep Learning, and encouraged us to try out the Mocha package for Deep Learning in Julia. As a result we quickly moved into the top 20 on the leaderboard.
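To give a feel for what the "convolutional" part means, here's a minimal sketch in plain Python of the operation a single CNN filter performs. It is not Mocha code, and the names `conv2d_valid` and `relu`, the toy image, and the edge kernel are all assumptions chosen for illustration.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with no padding and stride 1.

    Like most deep learning libraries, this computes a cross-correlation
    (the kernel is not flipped), which is what CNNs call "convolution".
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch under the kernel.
            total = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


def relu(feature_map):
    """Zero out negative responses, as a typical CNN activation does."""
    return [[max(0, value) for value in row] for row in feature_map]


# Hypothetical 4x4 image: dark on the left, bright on the right.
image = [[0, 0, 1, 1] for _ in range(4)]
# A kernel that responds to dark-to-bright vertical edges.
edge_kernel = [[-1, 1], [-1, 1]]

feature_map = relu(conv2d_valid(image, edge_kernel))
print(feature_map)
```

A real network learns the kernel values from the training images instead of having them written by hand, and stacks many such filters in layers.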




Pulling out all the Stops

At our next meeting Dung (pronounced Yung) summarized Learn from the Best, and we talked about how to get to the next level. Data Science Saigon has talent across a variety of platforms and languages including C++, Caffe, Scilab, Python and of course Julia. We also noticed a few things about the rules for this particular competition:

  1. Outside Data is not forbidden
  2. Semi-supervised learning is not forbidden
  3. The language does not have to be Julia

We pondered how a Convolutional Network from a good Python library like Theano would perform. We also accessed lots more training images from the Chars74K dataset and the Street View House Numbers dataset.

Saigon is number one (Saigon là số một).

Then last week Dung Thai, Vinh Vu, and Nguyen Quy checked in Python code using Theano that recognizes over 92% of the images correctly, and vaulted us into the #1 spot on the Getting Started with Julia leaderboard. Congratulations to everyone taking part in Data Science Saigon.

Our Remaining Challenge

So clearly, training with lots more data improved the score. But the question remains: would a CNN in Julia, trained on the same additional data, score about as well as the Python code? We hope to find out when we meet again. All of our code is here.






Monday, June 29, 2015

Technical English Seminar


It's super exciting to be a part of tonight's Technical English Seminar at VTC Academy. This started off as another Tech Talk Tuesday presentation, but thanks to the support of VTC Academy, we've got quite a crowd coming tonight. Here are the slides for the first part:

Tuesday, June 9, 2015

Data Frames with Julia


Today's Tech Talk Tuesday is virtual; we'll do a live one next week.
Learn how to code with R-like DataFrames in Julia, and see Julia's amazing vectorized assignment operator work on a DataArray.



DataFrames with Julia from AppTrain on Vimeo.

We read a csv file into a DataFrame, then learn how to subset it, and update values in it. 
Code is at nbviewer.ipython.org/github/apptrain/julia_training/blob/master/DataFramesWithJulia.ipynb
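The notebook above is in Julia. As a rough, language-neutral illustration of the same three steps (read a csv, subset it, update values), here's a sketch using only Python's standard library. The column names, the sample rows, and the threshold of 70 are made up for the example and don't come from the notebook.

```python
import csv
import io

# Hypothetical csv content; in practice this would come from open("some_file.csv").
csv_text = """name,language,score
An,Julia,71
Binh,Python,92
Chi,Julia,64
"""

# Read: each row becomes a dict keyed by the header line.
rows = list(csv.DictReader(io.StringIO(csv_text)))
for row in rows:
    row["score"] = int(row["score"])  # csv values arrive as strings

# Subset: keep only the rows where language is Julia.
julia_rows = [row for row in rows if row["language"] == "Julia"]

# Update: raise every score below 70 up to 70.
for row in rows:
    if row["score"] < 70:
        row["score"] = 70

print([row["name"] for row in julia_rows])
print([row["score"] for row in rows])
```

A DataFrame library does the subset and update steps in a single vectorized expression, which is the part the video shows off.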

Thursday, June 4, 2015

Tech Talk Tuesday: Reading and Writing Files with Julia

I'm planning a series of these short videos on Julia basics. 


Reading and Writing Files with Julia from AppTrain on Vimeo.

Actually this one is over 7 minutes. I'd like to get them down to under 5, but I'm still getting the hang of this :). Anyway, thanks for coming to the talks, and keep coming. We'll build on the basics covered in the videos.

Monday, June 1, 2015

Tech Talk Tuesday: Starting with Julia


People asked about dialing into a Tech Talk.  Here's the next best thing, Tech Talk Screencasts.  A little unpolished, but here you go:





Starting with Julia from AppTrain Technology on Vimeo.

See how to type your first few lines of Julia code. You'll also learn some key concepts behind the Julia language: Optional Static Typing, multiple dispatch, vectorized operations, and find out what a homoiconic language is. Enjoy!

Programming with Algebra by Andre van Delft

Programming may benefit much from a theory named the Algebra of Communicating Processes (ACP). This brings ease and fun to event-handling and concurrency. We have created a Scala language extension with features from ACP, by the name of SubScript. As an example we present a simple GUI controller written in SubScript, and show some quick extensions. Then we briefly touch on our current implementation work: a data flow construct and support for actor programming. The SubScript implementation is an open source project. It is fairly simple and efficient; a SubScript executor maintains a "call graph", and it does not rely on threading and polling. There is still much work to do, and a lot of ACP-related programming patterns are yet to be discovered. For more information see http://ift.tt/1Jk2dNy.

Saturday, May 16, 2015

Webworking - Làm việc mạng


Webworking (Làm việc mạng) is a group organized to encourage people in Ho Chi Minh City with careers in web technologies, and to support those whose careers need a boost or a change.

Our most popular event, Wednesday morning coffee, is an informal OpenCoffee where ambitious young entrepreneurs, developers and investors mingle and learn from each other. We talk about the latest technologies, favorite Saigon coffee shops and how to build careers and businesses in the promising Vietnamese economy.

Now you can participate from afar by sponsoring a coffee session. $25 earns you the gratitude of that week's attendees and gets your company or organization on the event page.


Sponsor Open Coffee




Thanks for the help building meaningful careers in Saigon!

Thursday, May 14, 2015

Python vs Julia


I really enjoyed this Python or Julia comparison from these quantitative economists. They lay out insightful advantages and disadvantages of both languages. I dug up the site because I started to wonder if I'm crazy to be learning Julia while I'm still working with, and still learning, Python. The final statement on their page eased my mind:

Still Can’t Decide?

Learn both — you won’t regret it
