Data Science at the Command Line: Facing the Future with Time-Tested Tools

This hands-on guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You'll learn how to combine small, yet powerful, command-line tools to quickly obtain, scrub, explore, and model your data.

To get you started, whether you're on Windows, OS X, or Linux, author Jeroen Janssens introduces the Data Science Toolbox, an easy-to-install virtual environment packed with over 80 command-line tools.

Discover why the command line is an agile, scalable, and extensible technology. Even if you're already comfortable processing data with, say, Python or R, you'll greatly improve your data science workflow by also leveraging the power of the command line.

  • Obtain data from websites, APIs, databases, and spreadsheets
  • Perform scrub operations on plain text, CSV, HTML/XML, and JSON
  • Explore data, compute descriptive statistics, and create visualizations
  • Manage your data science workflow using Drake
  • Create reusable tools from one-liners and existing Python or R code
  • Parallelize and distribute data-intensive pipelines using GNU Parallel
  • Model data with dimensionality reduction, clustering, regression, and classification algorithms

Similar Computer Science books

PIC Robotics: A Beginner's Guide to Robotics Projects Using the PIC Micro

Here's everything the robotics hobbyist needs to harness the power of the PICMicro MCU! In this heavily illustrated resource, author John Iovine provides plans and complete parts lists for 11 easy-to-build robots, each with a PICMicro "brain." The expertly written coverage of the PIC Basic Computer makes programming a snap, and lots of fun.

Measuring the User Experience: Collecting, Analyzing, and Presenting Usability Metrics (Interactive Technologies)

Effectively measuring the usability of any product requires choosing the right metric, applying it, and effectively using the information it reveals. Measuring the User Experience provides the first single source of practical information to enable usability professionals and product developers to do just that.

Information Retrieval: Data Structures and Algorithms

Information retrieval is a sub-field of computer science that deals with the automated storage and retrieval of documents. Presenting the latest information retrieval techniques, this guide discusses information retrieval data structures and algorithms, including implementations in C. Aimed at software engineers building systems with book processing components, it provides a descriptive and evaluative explanation of storage and retrieval systems, file structures, term and query operations, document operations, and hardware.

The Art of Computer Programming, Volume 4A: Combinatorial Algorithms, Part 1

Knuth's multivolume analysis of algorithms is widely recognized as the definitive description of classical computer science. The first three volumes of this work have long comprised a unique and invaluable resource in programming theory and practice.

Additional resources for Data Science at the Command Line: Facing the Future with Time-Tested Tools

8.9) by Tatu Ylonen, Aaron Campbell, Bob Beck, Markus Friedl, Niels Provos, Theo de Raadt, Dug Song, and Markus Friedl (2014). http://www.openssh.com.

$ sudo apt-get install ssh
$ man ssh

sudo
Execute a command as another user. Sudo (version 1.8.9p5) by Todd C. Miller (2013). http://www.sudo.ws/sudo.

$ sudo apt-get install sudo
$ man sudo

tail
Output the last part of files. Tail (version 8.21) by Paul Rubin, David MacKenzie, Ian Lance Taylor, and Jim Meyering (2012). http://www.gnu.org/software/coreutils.

$ sudo apt-get install coreutils
$ man tail
$ seq 5 | tail -n 3
3
4
5

tapkee
Reduce dimensionality of a data set using various algorithms. Tapkee by Sergey Lisitsyn and Fernando Iglesias (2014). http://tapkee.lisitsyn.me.

$ # See website for installation instructions
$ tapkee --help
$ < iris.csv cols -C species body tapkee --method pca | header -r x,y,species

tar
Create, list, and extract TAR archives. Tar (version 1.27.1) by Jeff Bailey, Paul Eggert, and Sergey Poznyakoff (2014). http://www.gnu.org/software/tar.

$ sudo apt-get install tar
$ man tar

tee
Read from standard input and write to standard output and files. Tee (version 8.21) by Mike Parker, Richard M. Stallman, and David MacKenzie (2012). http://www.gnu.org/software/coreutils.

$ sudo apt-get install coreutils
$ man tee

tr
Translate or delete characters. Tr (version 8.21) by Jim Meyering (2012). http://www.gnu.org/software/coreutils.

$ sudo apt-get install coreutils
$ man tr

tree
List contents of directories in a tree-like format. Tree (version 1.6.0) by Steve Baker (2014). https://launchpad.net/ubuntu/+source/tree.

$ sudo apt-get install tree
$ man tree

type
Display the type of a command-line tool. type is a Bash builtin.

$ help type
$ type cd
cd is a shell builtin

uniq
Report or omit repeated lines. Uniq (version 8.21) by Richard M. Stallman and David MacKenzie (2012). http://www.gnu.org/software/coreutils.
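
As an illustration of how several of the tools listed in this excerpt can be chained together, here is a minimal sketch (not taken from the book) that counts word frequencies with tr, uniq, and tail. The input sentence and the variable names text and top_words are invented for this example.

```shell
#!/usr/bin/env bash
# Hypothetical input: a short sentence made up for this illustration.
text='the quick fox and the lazy dog and the cat'

# tr       : replace each space with a newline, so every word is on its own line
# sort     : bring identical words next to each other (uniq only merges adjacent lines)
# uniq -c  : collapse each run of identical words and prefix it with its count
# sort -n  : order the counted lines numerically, smallest count first
# tail -n 2: keep only the last two lines, i.e. the two most frequent words
top_words=$(printf '%s\n' "$text" | tr ' ' '\n' | sort | uniq -c | sort -n | tail -n 2)

printf '%s\n' "$top_words"
```

With this input, the two lines printed are the counts for "and" (2) and "the" (3); the exact amount of padding in front of each count depends on the uniq implementation.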

Rated 4.49 of 5 – based on 6 votes