Philipp Rebsamen, 12.02.2019
As I have recently passed my MCSA in Machine Learning, Data Science and Analytics with Microsoft R and Azure ML, I decided to share some thoughts on the topic of Data Science here at CSA Engineering.
The term “Data Science” has been a buzzword for quite some time now, slowly replacing traditional concepts such as business analytics, business intelligence and in some cases even classic statistics. As usual, buzzwords tend to be viewed as an “entry ticket” to being considered a successful organization. Especially big companies, whose scale alone suggests that there should already be a plethora of different data types available to consume, understandably believe they have a great deal to gain from Data Science.
Unfortunately, companies often just hire or train specialists, provide them with access to their data and turn them loose, expecting smart ideas and breakthrough results in an instant. Data Science projects with such a mindset are usually doomed from the get-go and will turn into a real “Buzzkiller”. There should always be, at the very least, an overarching goal to pursue, and everyone should commit to it.
If you treat your data initiative as “just another IT project”, you are going to have a bad time. Although IT and technology are a big part of a successful Data Science project, they should not be the main driver of your project. The journey to a successful Big Data project should start with a business need instead of peer pressure (“everyone does it”). One should have at least a somewhat clear understanding of what the data analysis should achieve and how it can be measured. Insights gathered along the way can always be put in a backlog and be revisited at a later stage.
Remember that it is better to start the journey small, with a quick win for a department or business area instead of the whole company. Success in a small, straightforward project that can grow and scale is more important than a multi-man-year behemoth that ultimately yields zero results, and success always brings more people to the table! Once your data-leveraging initiative gains traction, be prepared to scale rapidly: the total cost of ownership (storage, cloud computing, licensing, staff, etc.) will grow right along with your data.
Serious Data Science projects require a multitude of tools and technologies to be evaluated across the complete stack: from storage (classic file systems, HDFS, NoSQL or SQL databases) and storage location (cloud, on-premises, hybrid) to data processing (Logstash, Storm, Kafka, Event or IoT Hubs, etc.) and finally the actual analytics framework (R or Python, including both of their vast ecosystems) and visualization tools (PowerBI, Tableau, etc.). The combinations are virtually endless and can at first feel quite overwhelming. That’s why a good understanding not only of the tools themselves but also of the interaction between the layers is required so as not to end up in a technological dead-end.
Starting on a greenfield is always exciting, and everyone involved would like to use the tools they already know or have heard of, but sometimes these are simply not the right fit. Serious requirements engineering is key to making sure you have the right tools for the right task. With our expertise not only in the technological field but also in systems engineering and project management, we are able to assist you from the very beginning of your analytics project.
Full Stack Engineering @ CSA
From the hardware device that collects your data, via a robust data ingestion and processing strategy, up to the web platform that integrates your predictive model, CSA Engineering is able to support you in every part of your digital transformation: