Data Science at the Command Line

Trainer: Jeroen Janssens
Location: Online
Provider rating: 9.5. Data Science Workshops B.V. has an average rating of 9.5 (out of 50 reviews).
Award winning: Best provider of the Netherlands 2020, #2 overall trainer.

Starting dates and places

Online: Video conferencing
16 Sep 2021 until 17 Sep 2021
Starting date guaranteed
September 16, 2021, 10:00-17:00, Video conferencing, Day 1
September 17, 2021, 10:00-17:00, Video conferencing, Day 2

Description

Introduction

The Unix command line, although invented decades ago, is an amazing environment for efficiently performing tedious but essential data science tasks. By combining small, powerful command-line tools (like parallel, jq, and csvkit), you can quickly scrub and explore your data and hack together prototypes.

This hands-on workshop is based on the O'Reilly book Data Science at the Command Line, written by our CEO, Jeroen Janssens. You'll learn how to build fast data pipelines, how to leverage R and Python at the command line, and how to quickly visualise data. No prior knowledge of the Unix command line is required.

By the end of this workshop you will have a solid understanding of how to integrate the command line into your data science workflow. Even if you're already comfortable processing data with, for example, R or Python, being able to also leverage the power of the command line can make you a more effective and efficient data scientist.

What you'll learn

  • Automate tedious tasks
  • Parallelise and distribute your tasks to multiple cores and machines
  • Convert your existing code to reusable command-line tools
  • Easily inspect, transform, and visualise data
  • Apply a variety of supervised and unsupervised machine learning algorithms

Schedule

Day 1:

  • Introduction
    • What is the command line?
    • Why learn the command line for doing data science?
    • A real-world data science use case
    • Getting up and running with the Docker image
  • Essential concepts of the Unix command line
    • Running command-line tools
    • Combining command-line tools
    • Redirecting input and output
    • Working with files
    • Getting help
  • Obtaining data from logs, spreadsheets, and databases
  • Downloading data from the Internet and accessing APIs using curl
  • Transforming data with filters such as cut, paste, grep, and sed
  • Processing other data formats efficiently
    • JSON with jq
    • CSV with csvkit
    • HTML with pup
    • XML with xmlstarlet
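
To give a flavour of the Day 1 topics on combining tools and transforming data with filters, here is a minimal sketch. It is not taken from the workshop materials: the sample CSV, its column names, and the variable names are hypothetical, and only standard tools (sed, cut, tr, sort, uniq, grep) are used.

```shell
# Hypothetical sample data: a tiny CSV created only for this illustration.
data=$(mktemp)
cat > "$data" <<'EOF'
name,city,amount
alice,Amsterdam,10
bob,Utrecht,5
carol,amsterdam,7
dave,Rotterdam,3
EOF

# Drop the header (sed), keep the city column (cut), normalise case (tr),
# count each city (sort | uniq -c), strip the padding that uniq adds (sed),
# and order by count (descending), then by name.
result=$(sed '1d' "$data" \
  | cut -d, -f2 \
  | tr '[:upper:]' '[:lower:]' \
  | sort | uniq -c \
  | sed 's/^ *//' \
  | sort -k1,1nr -k2)
printf '%s\n' "$result"
# prints:
# 2 amsterdam
# 1 rotterdam
# 1 utrecht

# grep can answer a narrower question: how many rows mention Amsterdam,
# regardless of case?
matches=$(grep -ic 'amsterdam' "$data")
printf 'rows mentioning amsterdam: %s\n' "$matches"

rm -f "$data"
```

Each tool does one small job, and the pipe connects them. Swapping one stage (for example, cut -f3 instead of -f2) changes the question without rewriting the rest.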

Day 2:

  • Running R from the command line
  • Visualising data from the command line
    • Scatter plot
    • Histogram
    • Bar chart
    • Geographic visualisation
  • Parallelising and distributing data-intensive pipelines
  • Creating reusable command-line tools
    • Automating tasks in a Bash script
    • Converting your existing code to a command-line tool
    • Processing arguments
    • Working with streaming data
  • Applying machine learning
    • Outlier detection
    • Dimensionality reduction
    • Classification
    • Regression
  • Conclusion
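
The Day 2 topic of creating reusable command-line tools can be illustrated with a minimal sketch, under stated assumptions. The tool name top-words, its single optional argument, and the sample sentence are hypothetical and not part of the workshop materials. The script reads standard input, so it can sit in the middle of a pipeline.

```shell
# Write a small, hypothetical tool called "top-words" to a temporary directory.
tooldir=$(mktemp -d)
cat > "$tooldir/top-words" <<'EOF'
#!/usr/bin/env bash
# Usage: top-words [N] < text
# Prints the N most frequent words on standard input (default: 10).
n="${1:-10}"
tr '[:upper:]' '[:lower:]' |   # normalise case
  tr -cs '[:alpha:]' '\n' |    # put one word on each line
  grep -v '^$' |               # drop empty lines
  sort | uniq -c |             # count occurrences
  sort -k1,1nr -k2 |           # most frequent first, ties by word
  head -n "$n" |
  awk '{print $2, $1}'         # print the word, then its count
EOF
chmod +x "$tooldir/top-words"

# The argument (2) limits the output; the data arrives as a stream on stdin.
output=$(printf 'the cat and the dog and the bird\n' | "$tooldir/top-words" 2)
printf '%s\n' "$output"
# prints:
# the 3
# and 2

rm -rf "$tooldir"
```

Placing such a script in a directory on the PATH would make it callable like any other tool. The default value for N is one simple way to process an optional argument.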

Recommended preparation

Participants are kindly requested to have the following items installed prior to the start of the workshop:

  • Docker Desktop for Windows, Mac, or Ubuntu
  • The Docker image, obtained by running: docker pull datascienceworkshops/data-science-at-the-command-line

Clients

We’ve previously delivered this workshop at:

  • Accenture
  • Amazon
  • Container Solutions
  • Prezi
  • SURFnet
  • Snow
  • Social Point
  • Teradata
  • The New York Times

Testimonials

"Great workshop! Very well done and very useful information delivered in an excellent and interactive manner. Jeroen anticipated very well on the different knowledge levels within the group. I would highly recommend the Data Science at the Command Line workshop to anyone that is interested in either kickstarting their command-line experiences or improving their data science with Unix power tools."

--Sanne Bouwman, Data Scientist, Teradata

"As a seasoned UNIX command line adept, I didn’t expect to learn much from a Data Science at the Command Line workshop. I was wrong! Over the years, many new tools have become available that I didn’t know about, and that can be combined with traditional tools in new ways.

Since attending the workshop, I have been able to simplify and improve the efficiency of many of the scripts I use on a daily basis. Recommended for anyone working from the command line, newbies and ninjas alike!"

--Joost van Dijk, Manager Middleware Services, SURFnet

"Besides demonstrating a good knowledge and experience in command-line tools for data science, the instructor had very good training skills, clear communication, and managed to adapt the level of the training to the level of the audience, which is not always easy!"

--Marc Canaleta, CTO, Social Point

Average rating for Data Science at the Command Line: 9.5 (based on 15 reviews)
Eriks Kopass
Product Owner
Rating: 9

"Great training, learned some new command-line tools that shall be useful to utilize in work scenarios. Thanks to Jeroen for being prepared, to-the-point and explaining the stuff for the students with different backgrounds." - 2020-12-09 16:01

Evance Soumaoro
Senior Software Developer
Rating: 9

"This training was very enlightening. I discovered that most of our tasks could be achieved using simple tools, without the need for heavyweight & complex software. This training not only got me data science skills with simple tools, but I also felt very confident as a command-line power user." - 2020-11-10 09:41

Marton Hubay
Rating: 10

"By the end of the course, you are going to understand how to not overengineer certain tasks, how you can use simple shell commands to tackle problems that often come up in the field of data engineering, data science, and system engineering. I can highly recommend the trainer." - 2020-11-09 20:13

Steve
Data scientist
Rating: 9

"For me, the course was a good start for getting to work with the command line in a broad sense. What I liked was that there was a lot of interaction and room for suggestions regarding the content of the course." - 2020-11-09 15:37

Jorg Rødsjø
Rating: 10

"I took the class in 2016, and really liked it. When I returned home, I could put jq to use in a very effective way. This saved us a ton of time processing large data-sets. Jeroen was also very gracious and helped me with some problems I encountered after the works. Definitely recommended for anyone who needs to up their command line skills for crunching data." - 2020-11-09 13:48

Gerben Venekamp
Advisor at SURF
Rating: 10

"A very enjoyable and, above all, insightful course. Although I am already fairly proficient at the command line, I still managed to learn a few handy things. The trainer's expertise is beyond question. His explanations were also clear and easy to follow, and there was ample opportunity to ask questions. In addition, there was plenty of room to get hands-on and try things out for yourself. The course was given in-house, which made for a fine location. Great to see everything you can do at the command line, and therefore with scripting." - 2020-11-06 07:59

Anastasia Khomenko
Data Scientist
Rating: 9

"I found the training easy to follow, with a lot of useful information. I also really liked that the trainer was always ready to adapt the workshop for us. So we could say what we did or did not want to learn, and he prepared materials accordingly. I now definitely know more about the topic and can apply it in my work." - 2020-11-04 14:32

Alessandro Ausenda
Data analyst
Rating: 10

"Jeroen gave a Python workshop at the company I am working for and it was simply perfect. Got all the valuable information needed to get better and better at Python. Jeroen was also teaching R classes at the Tilburg University, I attended his classes and I can only suggest him as an extremely prepared instructor." - 2020-11-04 09:41

Lennard van Wanrooij
Data Scientist
Rating: 9

"A very interesting course, and it was clear that Jeroen knew the material very well. He was also able to respond flexibly to questions from participants, which let the course go in several directions while always keeping a common thread. A good mix of theory and hands-on practice. Because of the coronavirus pandemic the course was (unfortunately) fully digital, but Jeroen had chosen a good setup for it." - 2020-11-04 06:15

André Klaver
Rating: 9

"Excellent workshop for both beginners (which I consider myself to be) and advanced users. The course was well prepared with docker images and working with the API of IMDB from the command line was as much fun as it was educational." - 2020-11-03 16:27

Kees de Kievith
IT analyst
Rating: 9

"Jeroen's course gives a good overview of how you can use the command line for data science. As someone who does not use Unix/Linux daily, I found the course very easy to follow, and the many practical examples made it great fun to take part. During the workshop, Jeroen also gives you the opportunity to bring in a real-world example of your own." - 2020-11-03 11:08

Joost Helberg
CEO, Snow
Rating: 10

"Data Science Workshops was able to skillfully differentiate, addressing various Unix Consultants at Snow with very different skill sets. The instructor, Jeroen Janssens, made some people really rise above themselves." - 2020-10-31 12:22

Marc Canaleta
CTO, Social Point
Rating: 10

"Besides demonstrating a good knowledge and experience in command-line tools for data science, Jeroen had very good training skills, clear communication, and managed to adapt the level of the training to the level of the audience, which is not always easy!" - 2020-09-15 12:00

Sanne Bouwman
Data Scientist
Rating: 10

"Great workshop! Very well done and very useful information delivered in an excellent and interactive manner. Jeroen anticipated very well on the different knowledge levels within the group. I would highly recommend the Data Science at the Command Line workshop to anyone that is interested in either kickstarting their command-line experiences or improving their data science with Unix power tools." - 2020-09-11 17:51

Joost van Dijk
Manager Middleware Services, SURFnet
Rating: 10

"As a seasoned UNIX command line adept, I didn’t expect to learn much from a Data Science at the Command Line workshop. I was wrong! Over the years, many new tools have become available that I didn’t know about, and that can be combined with traditional tools in new ways.

Since attending the workshop, I have been able to simplify and improve the efficiency of many of the scripts I use on a daily basis. Recommended for anyone working from the command line, newbies and ninjas alike!" - 2020-07-20 09:52

Jeroen Janssens
Principal Instructor
Rating: 9.5
