Reputation.com Data Science Labs Stack

This blog post introduces the technology stack that the Data Science and Data Engineering teams at Reputation.com use to build our Data Science Labs portal. The portal is a platform where our team prototypes data products, iterating and experimenting with new ideas to find the algorithms and data products that will provide the most insight and value for our customers. For many of the tools we are building, the product use case is being developed alongside the algorithms, so the more quickly we can prototype and collaborate with the product, design, and engineering teams, the more effective we will be at delivering valuable data products within our customer-facing product stack.

The portal is being actively built and iterated upon by our team of data scientists, data engineers and interns. After a few rounds of experimentation over the course of 2016, we have found that the following stack provides the best combination of feature set and flexibility to help us be as efficient as possible in prototyping new data products.

Backend

Since most of the data science toolkits the team uses daily are Python-based (Jupyter/IPython, pandas, scikit-learn, TensorFlow), we naturally opted for a Python ecosystem for our Data Science Labs backend. We have found Flask to work well as our backend web framework because it has the following characteristics:

  • Micro framework
  • Rich documentation
  • Large open source community support
  • Simple to learn and use
  • Routing is easy
  • Small, extensible core codebase
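
To give a sense of how little code a Flask prototype needs, here is a minimal sketch of a backend with two routes. It is an illustration only, not code from our portal: the endpoint paths, the `score_text` helper, and the word weights are all hypothetical stand-ins for whatever model a prototype would actually serve.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical word weights standing in for a real trained model.
SENTIMENT_WORDS = {"great": 1, "friendly": 1, "slow": -1, "rude": -1}


def score_text(text):
    """Toy scorer: sum the weights of known words (unknown words count as 0)."""
    return sum(SENTIMENT_WORDS.get(word, 0) for word in text.lower().split())


@app.route("/api/health")
def health():
    # A route is just a decorated function that returns a response.
    return jsonify(status="ok")


@app.route("/api/score", methods=["POST"])
def score():
    # Read the JSON body; fall back to an empty dict if it is missing or invalid.
    payload = request.get_json(silent=True) or {}
    text = payload.get("text", "")
    return jsonify(text=text, score=score_text(text))
```

Saved as `app.py`, this can be served locally with `flask run`, and a frontend can then POST JSON such as `{"text": "friendly staff"}` to `/api/score`.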

We also considered Django and CherryPy. Django is much more of a full-fledged solution, providing a ton of features that our application doesn’t need, such as deep ORM integration. It also has a steeper learning curve. CherryPy, on the other hand, is comparable to Flask in that it’s an easy-to-learn micro framework. But Flask won out because of its great community support, better documentation, and vast library of plugins and extensions.

Frontend

We were looking for a frontend framework for our Data Science Labs that:

  1. is easy to ramp up on,
  2. provides a clean abstraction to help us separate our logic and view, and
  3. allows us to build reusable components.

We chose Vue.js over AngularJS and ReactJS. All three support reactive, data-driven views, though they differ in approach: Angular and Vue.js offer two-way data binding, while ReactJS favors a one-way data flow. Angular, being the oldest of the three, has a lot of community support and a deep knowledge base, but it is a behemoth in terms of scale and size. Angular is also less flexible in that it imposes a specific structure on how you lay out your codebase, making it much more opinionated than Vue.js or ReactJS.

ReactJS is rising in popularity and is on its way to surpassing AngularJS as the dominant framework. It has more advanced features than Vue.js, such as mobile rendering and server-side rendering, but neither is a priority for our project.

Vue.js, while the newcomer, already has a large following and tremendous momentum, mainly thanks to its reputation as a minimalistic, simple framework. We find the syntax of Vue.js much more pleasant to work with than ReactJS’s JSX, and it produces code that is more readable and maintainable. It is much easier to separate your CSS, JS and view files. And because it has the shortest ramp-up time of the three, we find Vue.js to be the most suitable framework for our project. To summarize, Vue.js proved to be the right choice for us because:

  • it is easy to learn
  • it is very flexible and highly compatible with other libraries
  • it produces readable code
  • its core library focuses on the view layer, so you can build on top of it as you go

More detailed comparisons between Vue and other frameworks are available in the Vue.js documentation.

As mentioned above, this stack is changing with every new prototype we build, so check back here for updates as the platform evolves.
