How can you frame a data science question according to your client’s needs? In this blog post, our colleague Dominique explains how important it is to think about the business question in a different way – the data science way.
Machine Learning Goes Causal II: Meet the Random Forest’s Causal Brother
A new field of Machine Learning is born: Causal Machine Learning. Learn here about the Causal Forest, one of the most famous Causal Machine Learning algorithms for estimating heterogeneous treatment effects.
Machine Learning Goes Causal I: Why Causality Matters
A new field of Machine Learning is born: Causal Machine Learning. Learn here what it is and why it is crucial for the future of Data Science.
Evaluating Model Performance by Building Cross-Validation from Scratch
Cross-validation is a widely used technique to assess the generalization performance of a machine learning model. In this blog post I will introduce the basics of cross-validation, provide guidelines to tweak its parameters, and illustrate how to build it from scratch in an efficient way.
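The post itself builds cross-validation in R; as a minimal language-agnostic illustration of the idea, here is a hypothetical Python sketch of k-fold cross-validation from scratch (all function names are our own, not from the post):

```python
import random

def k_fold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k roughly equal folds after shuffling."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(X, y, fit, predict, loss, k=5):
    """Return the mean hold-out loss over k folds."""
    folds = k_fold_indices(len(X), k)
    scores = []
    for test_idx in folds:
        test_set = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in test_set]
        # Fit on the training folds, evaluate on the held-out fold.
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [predict(model, X[i]) for i in test_idx]
        scores.append(loss(preds, [y[i] for i in test_idx]))
    return sum(scores) / k
```

Any model satisfying the `fit`/`predict` interface can be plugged in; for instance, a mean-predictor baseline with squared-error loss gives an honest estimate of its out-of-sample error.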
What the MAPE Is FALSELY Blamed For, Its TRUE Weaknesses, and BETTER Alternatives!
In the time series context, one of the most commonly used measures is the MAPE. In this blog post, I evaluate critical arguments and weaknesses concerning the MAPE and demonstrate alternative measures.
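For reference, a hypothetical Python sketch of the MAPE and one scale-dependent alternative, the MAE (the post's full discussion of alternatives goes beyond this):

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error in percent.

    Undefined when any actual value is zero, and it penalizes
    over-forecasts more heavily than under-forecasts.
    """
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual) * 100

def mae(actual, forecast):
    """Mean Absolute Error: scale-dependent, but well-defined for zero actuals."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
```

For example, forecasts of 110 and 180 against actuals of 100 and 200 yield a MAPE of 10% and an MAE of 15 units.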
Monotonicity Constraints in Machine Learning Models with R
Monotonicity constraints can help models represent the underlying relationships more faithfully. This post explains how to implement such monotonicity constraints in R.
Coding Random Forests in 100 lines of code*
In our series on explaining methods in 100 lines of code, we tackle the random forest this time! We build it from scratch and explore its functions.
How to Speed Up Gradient Boosting by a Factor of Two
Our latest tool development at STATWORX: random boost, an algorithm twice as fast as gradient boosting, with comparable prediction performance.
Coding Regression trees in 150 lines of R code
Motivation There are dozens of machine learning algorithms out there. It is impossible to learn all their mechanics; however, many algorithms sprout from the most established ones, e.g. ordinary least squares, gradient boosting, support vector machines, tree-based algorithms, and neural networks. At STATWORX we discuss algorithms daily to evaluate their usefulness for a specific project. In any case, understanding these …
Coding Gradient boosted machines in 100 lines of R code
Motivation There are dozens of machine learning algorithms out there. It is impossible to learn all their mechanics; however, many algorithms sprout from the most established ones, e.g. ordinary least squares, gradient boosting, support vector machines, tree-based algorithms, and neural networks. At STATWORX we discuss algorithms daily to evaluate their usefulness for a specific project or problem. In any case, …