Reed Kraft-Murphy devops engineer and polymath

Website http://reedmurphy.net
Email reed@reedmurphy.net
Phone (650) 746-4420

Professional Experience

Senior Site Reliability Engineer

LinkedIn
2015-08-15 – Present

LinkedIn is the world's largest professional network with more than 433 million members in 200 countries and territories around the globe.

As a Senior Site Reliability Engineer at LinkedIn, I am responsible for ensuring that our complex, web-scale systems are healthy, monitored, automated, and designed to scale. I provide guidance, mentoring and a point of escalation to other SREs for services in my team's domain. I work closely with our development teams from the early stages of design all the way through identifying and resolving production issues.

  • Subject Matter Expert on several extant core infrastructure services
  • SME on new hardware allocation infrastructure, rolled out company wide
  • Uplifted legacy services to meet current development and operations standards
  • Primary point of contact for my team's services for new datacentre buildouts
  • Reported internal application vulnerabilities and shepherded them through to closure
  • Maintain and operate globally distributed, high-availability architecture for several core cross-team services under tight SLAs
  • Develop tools to improve ability to deploy and monitor custom applications in a large-scale Linux environment
  • Work closely with development teams to ensure that platforms are designed with "operability" in mind
  • Participate in a 24x7 on-call rotation

Cloud Administrator / Developer

reedmurphy.net
2009-06 – Present

In my own time I administer a fleet of Fedora instances, providing assorted Internet services for several people, and for my own personal projects.

  • Develop and maintain SaltStack formulae for configuration management
  • Use Vagrant to provide a testing environment
  • Provide DNS and web hosting for reedmurphy.net and several other domains
  • Maintain a MediaWiki installation
  • Provide shell accounts for private use
  • Hack on several other in-development projects, including a Python roguelike, a distributed AI assistant, and a handful of custom web applications

Site Performance and Availability Engineer

REA Group Limited
2013-08 – 2015-08

REA Group Limited is a multinational digital advertising company specialising in property.

As a Site Performance and Availability Engineer, I was responsible for implementing key components of many web applications in a collaborative environment. I helped design solutions that run at scale and used infrastructure-as-code techniques to deploy and operate complex systems. Additionally, I shared on-call responsibility with a pool of other engineers.

  • Maintained and worked with globally distributed, high-availability architecture under strict SLAs
  • Worked directly in project teams, taking responsibility for monitoring, disaster recovery and automation
  • Worked on a team responsible for the operation, maintenance and deprecation of key legacy systems
  • Built and maintained continuous integration, testing, delivery and deployment pipelines and infrastructure
  • Built tools in Ruby to manage CloudFormation templates and stacks
  • Migrated our Atlassian Bamboo infrastructure to AWS
  • Used CloudFormation and cfn-init to automate the build, deployment and updating of Bamboo build agents
  • Planned and executed Amazon RDS disaster recovery processes
  • Generated GraphViz graphs of CloudFormation resources from CloudFormation stacks
  • Wrote a pure sh tool to post messages to Slack
  • Designed, planned and implemented load testing on a cloned production MySQL database using generated production-like queries
  • Built a Slack / Leankit integration for managing daily change planning and announcements
  • Implemented application monitoring using CloudWatch and PagerDuty
  • Participated in an on-call rotation, both in and out of hours, as both primary and escalation contact
  • Participated in company-wide security responses involving assessing our infrastructure for vulnerability, patching vulnerable systems, monitoring for attacks, etc.
  • Participated in a change request process involving stakeholders across multiple business units

Linux System Administrator

Catalyst Digital
2011-10 – 2013-08

Catalyst International is a "full-service marketing communications group" based on St Kilda Road, Melbourne. Catalyst Digital is Catalyst's web- and email-marketing focused child company. I also worked in a contracting capacity for Taguchi, developers of an advanced email marketing and broadcasting platform, and Catalyst's sister company.

  • Was responsible for the implementation, and ongoing administration, of development and production web servers
  • Planned and performed the decommissioning of legacy physical servers, replacing them with virtualized services provided by AWS (including EC2, S3 and RDS)
  • Implemented a push-based configuration management system using chef-solo and littlechef
  • Implemented a centralized logging, analytics and monitoring system across all hosted services using Splunk
  • Installed, configured and maintained several Drupal instances, including custom development work
  • Designed and implemented an automated Google Analytics and Google AdWords campaign reporting tool
  • Provided technical input, and liaised with developers, designers, managers and other stakeholders during project development
  • Implemented a scalable systems metric monitoring dashboard, using collectd, Graphite and chef
  • Integrated an Amazon CloudFront CDN into Taguchimail, including syncing and optimizing media uploaded via SFTP and HTTP
  • Drafted and implemented policies and procedures for aspects of Taguchi's business
  • Decommissioned legacy physical servers, replacing them with virtualized hosted services

Linux System Administrator

White Dog Green Frog
2008-04 – 2011-10

White Dog Green Frog is a Melbourne-based web hosting and web development company. They provide email and web hosting to over a thousand clients, mainly small to medium businesses, mostly from the Melbourne area.

  • Was responsible for all aspects of the server lifecycle for customer servers
  • Was responsible for the in-office network, including Internet connectivity, desktop, VoIP and email systems
  • Monitored distributed server logs using FOSS tools
  • Implemented and oversaw automated backups
  • Improved and maintained server security
  • Was on call 24/7, and was responsible for responding to any alerts and incidents
  • Installed and managed the support ticketing system
  • Developed CMS components to meet web development projects’ needs

Software Engineering Intern

Google Sydney
2007-12 – 2008-01

  • Designed, implemented and documented a cryptography API for Google Gears

IT Technician

Sacred Heart College, Kyneton
2004-12 – 2007-02

  • Performed network installation, configuration and troubleshooting
  • Produced and maintained technical documentation
  • Provided Windows / Mac desktop support and troubleshooting

Education

Bachelor of Applied Science (Computer Science) with Distinction
2007
RMIT University, Melbourne

Notable Projects

Resume (2015-) — This resume is rendered using JSON Resume, Python and Pystache.
Dealing with Trolls (2014) — A succinct article on recognizing and dealing with trolls online, written for The Feminist Observer, a former digital magazine.
gorl (2014) — In-progress roguelike game. Written in Go, using the termbox library.
awsnap (2014) — Trivially easy to use tool for the automated creation, expiry and management of Amazon EBS and RDS snapshots. Written in Ruby, using the aws-sdk gem.
rouge (2014) — Incomplete roguelike game and engine. Written in Python, using the urwid library.
Google Analytics and Google AdWords campaign report generating tool (2013) — Outputs HTML and PDF slideshows. Written in Python, using the google-api-python-client library.
Updater (2012) — Automatically applies updates to GetSimple CMS and plugins. Written in PHP, inspired by WordPress' update functionality.
News Manager RSS (2012) — An extension to the existing News Manager blogging plugin for the GetSimple CMS that serves up the blog as an RSS / Atom feed. Written in PHP.
Simple Backups (2012) — Automated, schedulable, remote and local backups for GetSimple CMS websites. Written in PHP, inspired by 'Backup and Migrate' for Drupal. 2nd place winner of the GetSimple March 2012 Create-a-thon.
ec2_volume_snapshot.py (2012) — Automated creation and rotation of Amazon EBS volume snapshots. Written in Python, using the boto library. Inspired awsnap.
A bot that automatically aggregates statistics and screenshots of user-submitted content (2012) — for a large online activist community. Still in use, and considered valuable by the community. Written in Python, using CutyCapt and xvfb for rendering.
cPanel / WHMCS reseller integration (2011) — Automated account owner settings, per-reseller defaults for hosting, spam filtering, etc. Written in sh.
YourCinemaOnline (2011) — Provided cinema screening time and booking integration between Joomla and the VenueTickets cinema ticketing system. Developed for White Dog Green Frog. Involved creating a PHP API for VenueTickets' proprietary network protocol.
CpMR (Cpanel coldMetal save and Restore) (2010) — cPanel system backup and restore script. Co-authored with Brian Coogan, written in sh, uses rsync.
WordPress bulk plugin installer (2010) — automated installation of sets of WordPress plugins. Written in bash, with heavy use of curl. Created for White Dog Green Frog. Dramatically reduced the build time of new WordPress websites.
rblcheck.sh (2010) — Automated multi DNS block list checking script. Written in bash.

Awards

2nd place, GetSimple March 2012 Create-a-thon, GetSimple CMS (2012-04-12)

"The Simple Backups plugin by ReedMurphy took a task that everyone should be doing and made it work without needing to schedule a cron job (unless of course you wanted to). His way of getting around that limitation was ingenious and is something that I will be implementing on some of my own sites in the very near future." — Chris Cagle, GetSimple Developer

Skills

Amazon
  • EC2
  • ELB
  • CloudFormation
  • S3
  • IAM
  • RDS
  • CloudWatch
  • SNS
  • cfn-init
DBA
  • MySQL
  • RDS
  • Load testing
  • Replication
  • High Availability
DevOps
  • Continuous Integration
  • Infrastructure as Code
  • Configuration Management
  • Puppet
  • Ansible
  • Chef
  • Salt
  • LeanKit
  • Version Control
  • git
Development
  • Ruby
  • Python
  • Java
  • Go
High Availability
  • System Arch.
  • Capacity Planning
  • Load Balancing
  • Autoscaling
  • Disaster Recovery
Linux SysAdmin
  • RHEL
  • CentOS
  • RPM Packaging
  • TCP/IP
  • SMTP
  • DNS
Monitoring
  • Nagios
  • CloudWatch
  • PagerDuty
  • New Relic
  • Graphite
  • collectd
  • Splunk
Scripting
  • Shell
  • Ruby
  • Python
Virtualisation
  • vSphere
  • EC2
  • cgroups
Web Development
  • PHP
  • HTML
  • JavaScript
  • CSS
  • WordPress
  • Drupal

Interests

Activism

  • Feminism
  • Diversity
  • Online Activism

Gaming

  • Roguelikes
  • RPGs
  • Turn-based

Laziness

  • Automation
  • Monitoring
  • Optimization

Security

  • Pen Testing
  • War Games