Follow by Email

Translate

Tuesday, April 21, 2015

Efficient Computing with NumPy - Jake Vanderplas by PyData

Jake Vanderplas Jake Vanderplas is an NSF post-doctoral fellow at University of Washington, working jointly between the Computer Science and Astronomy departments. His research involves applying recent advances in machine learning to large astronomical datasets, in order to learn about the Universe at the largest scales. He is co-author of "Statistics, Data Mining, and Machine Learning in Astronomy", a Python-centric textbook to be published by Princeton Press in 2013, and has presented many technical talks and papers in this subject area. In the Python world, Jake is active in maintaining and contributing to several core Python scientific computing packages, including Scikit-learn, Scipy, Matplotlib, and others. He occasionally blogs on python-related topics at http://ift.tt/1nhB2qI. What is PyData? PyData.org is the home for all things related to the use of Python in data management and analysis. This site aims to make open source data science tools easily accessible by listing the links in one location. If you would like to submit a download link or any items to be listed in PyData News, please let us know at: admin@pydata.org Conferences PyData conferences are a gathering of users and developers of data analysis tools in Python. The goals are to provide Python enthusiasts a place to share ideas and learn from each other about how best to apply the language and tools to ever-evolving challenges in the vast realm of data management, processing, analytics, and visualization. We aim to be an accessible, community-driven conference, with tutorials for novices, advanced topical workshops for practitioners, and opportunities for package developers and users to meet in person. A major goal of PyData events and conferences is to provide a venue for users across all the various domains of data analysis to share their experiences and their techniques, as well as highlight the triumphs and potential pitfalls of using Python for certain kinds of problems. PyData is organized by NumFOCUS with the generous help and support of our sponsors. Proceeds from PyData are donated to NumFOCUS and used for the continued development of the open-source tools used by data scientists If you would like to volunteer to be a part of the PyData team contact us at: admin@pydata.org

Thursday, April 16, 2015

Single Sign on: Managing Authentication with Google, Twitter and Facebook accounts.


Last month on Tech Talk Tuesday we talked about setting up Single Sign on for websites.  Today's web solutions are based on a standard called OAuth2.  





It's not difficult to allow multiple login options to a site.  The Javascript library Passport works great for node.js applications.  The Oauth gem does the job for Ruby on Rails applications. And the WP-Oauth plugin is great for Wordpress sites.

What really impressed us was how easy it was to install and configure Oauth for Meteor applications.  Meteor is an impressive framework for setting up node.js applications quickly.  We'll be exploring it more soon.


Thursday, March 19, 2015

Introduction to Julia

"Julia is a fresh approach to technical computing."  boasts the startup message, flourished with colorful circles hovering above a bubbly ASCII Julia logo.  The formatting effort is not wasted, it's an exuberant promise: Julia will make the command line fun again.
apptrain_1@julia:~/workspace $ julia
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" to list help topics
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.2.1 (2014-02-11 06:30 UTC)
 _/ |\__'_|_|_|\__'_|  |  
|__/                   |  x86_64-linux-gnu

julia> 

Julia was created by four Data Scientists from MIT who began working on it around 2011.  The language is beginning to mature at a time when the Data Scientist job title is popping up on resumes as fast as Data Scientist jobs appear.  The timing is excellent.   R programming, an offshoot of S Programming , is the language of choice for today's mathematical programmer.  But it feels clunky, like a car from the last century. While Julia may not unseat R in the world of Data Analysis,  plans don't stop there.

If you want to code along with the examples in this article, jump to Getting Started with Julia and chose one of the three options to start coding.

Julia is a general purpose programming language.  It's creators have noble goals.  They want a language that is fast like C, they want it flexible with cool metaprograming capabilities like Ruby, they want parallel and distributed computing like Scala, and true Mathematical equations like MATLAB.



Why program in Julia?

1) Julia is Fast


Julia already boasts faster matrix multiplication and sorting than Go and Java.  It uses the LLVM compiler, which languages like GO use for fast compilation.   Julia uses just in time (JIT) compilation to machine code , and often achieves C like performance numbers.

2) Julia is written in Julia

Contributors need only work with a single language, which makes it easier for Julia users to become core contributors. 
"As a policy, we try to never resort to implementing things in C. This keeps us honest – we have to make Julia fast enough to allow us to do that" -Stephan Karpinski

And, as the languages co-creator Karpinski notes in the comments of the referenced post,   Writing the language itself in Julia means that when improvements are made to the compiler, both the system and user code gets faster.

3) Julia is Powerful

Like most programming languages, it's implementation is Open Source.  Anyone can work on the language or the documentation.  And like most modern programming languages, Julia has extensive metaprogramming support.  It's creators attribute the Lisp language for their inspiration:
Like Lisp, Julia represents its own code as a data structure of the language itself.
a) Optional Strong Typing
Using strong typing can speed up compiling, but Julia keeps strong typing optional, which frees up programmers who want to write dynamic routines that work on multiple types. 
julia> @code_typed(sort(arry))
1-element Array{Any,1}:
 :($(Expr(:lambda, {:v}, {{symbol("#s1939"),symbol("#s1924")},{{:v,Array{Float64,1},0},{symbol("#s1939"),Array{Any,1},18},{symbol("#s1924"),Array{Any,1},18}},{}}, :(begin $(Expr(:line, 358, symbol("sort.jl"), symbol("")))
        #s1939 = (top(ccall))(:jl_alloc_array_1d,$(Expr(:call1, :(top(apply_type)), :Array, Any, 1))::Type{Array{Any,1}},$(Expr(:call1, :(top(tuple)), :Any, :Int))::(Type{Any},Type{Int64}),Array{Any,1},0,0,0)::Array{Any,1}
        #s1924 = #s1939::Array{Any,1}
        return __sort#77__(#s1924::Array{Any,1},v::Array{Float64,1})::Array{Float64,1}
    end::Array{Float64,1}))))

b) Introspective


Julia's introspection is awesome, particularly if you enjoy looking at native assembler code. Dissecting assembler code comes in handy when optimizing algorithms. Julia programmers have several introspection functions for optimization. Here the code_native method shows the recursive nature of a binary sort algorithm.
julia> code_native(sort,(Array{Int,1},))
        .text
        push    RBP
        mov     RBP, RSP
        push    R14
        push    RBX
        sub     RSP, 48
        mov     QWORD PTR [RBP - 56], 6
        movabs  R14, 139889005508848
        mov     RAX, QWORD PTR [R14]
        mov     QWORD PTR [RBP - 48], RAX
        lea     RAX, QWORD PTR [RBP - 56]
        mov     QWORD PTR [R14], RAX
        xorps   XMM0, XMM0
        movups  XMMWORD PTR [RBP - 40], XMM0
        mov     QWORD PTR [RBP - 24], 0
        mov     RBX, QWORD PTR [RSI]
        movabs  RAX, 139888990457040
        mov     QWORD PTR [RBP - 32], 28524096
        mov     EDI, 28524096
        xor     ESI, ESI
        call    RAX
        lea     RSI, QWORD PTR [RBP - 32]
        movabs  RCX, 139889006084144
        mov     QWORD PTR [RBP - 40], RAX
        mov     QWORD PTR [RBP - 32], RAX
        mov     QWORD PTR [RBP - 24], RBX
        mov     EDI, 128390064
        mov     EDX, 2
        call    RCX
        mov     RCX, QWORD PTR [RBP - 48]
        mov     QWORD PTR [R14], RCX
        add     RSP, 48
        pop     RBX
        pop     R14
        pop     RBP
        ret

c) Multiple Dispatch

Multiple dispatch allows Object Oriented behavior.  Each function can have several  methods designed to operate on the types of the method parameters. The appropriate method is dispatched at runtime based on the parameter types.

julia> methods(sort)
# 4 methods for generic function "sort":
sort(r::UnitRange{T<:real at="" bstractarray="" dim::integer="" pre="" r::range="" range.jl:533="" range.jl:536="" sort.jl:358="" sort.jl:368="" sort="" v::abstractarray="">

Popular Articles