Simulating the bias-variance tradeoff in R

Robin Kraft Blog, Data Science, Statistics

In my last blog post, I elaborated on the Bagging algorithm and showed its prediction performance via simulation. Here, I want to go into the details of how to simulate the bias and variance of a nonparametric regression fitting method using R. These kinds of questions arise here at STATWORX when developing, for example, new machine learning algorithms or …

Human vs Robots

A Performance Benchmark of Different AutoML Frameworks

Fabian Müller Blog, Data Science

In a recent blog post, our CEO Sebastian Heinz wrote about Google’s newest stroke of genius: AutoML Vision, a cloud service “that is able to build deep learning models for image recognition completely fully automated and from scratch”. AutoML Vision is part of the current trend towards the automation of machine learning tasks. This trend started with automation of …

Benchmarking Feature Selection Algorithms with Xy()

André Bleier Blog, Data Science

Feature selection is, in my opinion, one of the most interesting fields in machine learning. It sits at the boundary between two different perspectives on machine learning: performance and inference. From a performance point of view, feature selection is typically used to increase model performance or to reduce the complexity of the problem in order to …

Food for Regression: Using Sales Data to Identify Price Elasticity

Daniel Lüttgau Blog, Data Science

A few hundred meters from our office, there is a little lunch place. It is part of a small chain that specializes in assemble-yourself, ready-to-eat salads. When we moved into our new office a few years ago, this salad vendor quickly became a daily fixture. However, over time, this changed. We still eat there regularly, but I am certain, if one …

Pushing Ordinary Least Squares to the limit with Xy()

André Bleier Blog, Data Science

Simulation is mostly about answering particular research questions. Whenever the word simulation appears in a discussion, everyone knows that it means additional effort. At STATWORX, we use simulations as a first step to prove the concepts we are developing. Sometimes such a simulation is simple; in other cases, it is plenty of work. Though, research …

pandas vs. data.table – A study of data-frames

Christian Moreau Blog, Data Science

Python and R have become the most important languages in analytics and data science. Usually, a data scientist can navigate at least one of them with relative ease, and at STATWORX we are lucky to have expertise in both. While, with enough will and effort, any coding project can be completed in either language, perhaps they differ in some performance …