Conference - Wednesday 20th & Thursday 21st March 2013
See also: Schedule / Timetable
Stephen Quinney, University of Edinburgh - Do bad guys work weekends?
In the School of Informatics we aggregate all our system logs into a central store, which provides us with interesting data mining opportunities. This talk will take a walk through some fascinating statistics based on our SSH authentication data. A good deal of the security advice bandied around lacks any clear justification, so I will take the chance to examine whether it is supported by our data. Based on that data, I will show you which strategies are most effective for securing your servers. In the process I will attempt to answer a variety of questions such as: Do bad guys work weekends? Are they nocturnal? Do they take a holiday at Christmas?
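As a rough illustration of the kind of aggregation described above, the sketch below counts failed SSH password attempts by weekday and by hour. It assumes classic OpenSSH syslog lines ("Failed password for ... from ...") with BSD-style timestamps that omit the year, and a C/English locale for month and day names; it is not the School of Informatics' actual tooling.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed log layout: "Mar 16 03:14:07 host sshd[1234]: Failed password for ..."
FAILED = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ sshd\[\d+\]: "
    r"Failed password for (?:invalid user )?\S+ from \S+"
)


def failures_by_time(lines, year):
    """Return (by_weekday, by_hour) counters of failed SSH password logins."""
    by_weekday, by_hour = Counter(), Counter()
    for line in lines:
        m = FAILED.match(line)
        if not m:
            continue  # not a failed-password line
        # Syslog omits the year, so the caller has to supply it.
        ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
        by_weekday[ts.strftime("%A")] += 1
        by_hour[ts.hour] += 1
    return by_weekday, by_hour
```

Comparing the Saturday and Sunday totals against weekdays, or the small hours against office hours, gives a first answer to "do bad guys work weekends?" for whatever log sample is fed in.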
Jan-Piet Mens, Germany – A short introduction to Ansible
Ansible is a simple configuration management and command execution framework for "push" and "pull" deployments on Unix/Linux systems, using your existing SSH infrastructure. It is particularly easy to deploy because it requires neither an "agent" on managed nodes (a reasonably recent version of Python suffices) nor a complex PKI. We show you how to get started quickly using Ansible for ad-hoc tasks, discuss some of its modules and introduce you to Ansible's playbooks and variables. We show you how to run Ansible as a normal (non-root) user, how to configure inventory data, and give you sundry tips on using Ansible effectively. If you prefer a pull-based setup, we show you how to implement that as well.
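One way to configure inventory data is an executable "dynamic inventory" script: Ansible runs it with `--list` to get all groups as JSON, or with `--host <name>` to get one host's variables. The minimal sketch below follows that convention; the group names, hostnames and variables are hypothetical placeholders, and a real script would typically pull them from a CMDB or similar source.

```python
#!/usr/bin/env python3
"""Minimal Ansible dynamic inventory script (sketch with placeholder data)."""
import json
import sys

# Hypothetical inventory data.
GROUPS = {
    "webservers": ["web1.example.com", "web2.example.com"],
    "dbservers": ["db1.example.com"],
}
HOSTVARS = {
    "db1.example.com": {"ansible_user": "deploy"},
}


def build_list():
    """Build the JSON structure Ansible expects from --list."""
    inventory = {name: {"hosts": hosts} for name, hosts in GROUPS.items()}
    # Supplying _meta.hostvars lets Ansible skip one --host call per host.
    inventory["_meta"] = {"hostvars": HOSTVARS}
    return inventory


def main(argv):
    if len(argv) == 2 and argv[1] == "--list":
        result = build_list()
    elif len(argv) == 3 and argv[1] == "--host":
        result = HOSTVARS.get(argv[2], {})
    else:
        sys.stderr.write("usage: inventory.py --list | --host <name>\n")
        return 1
    print(json.dumps(result, indent=2))
    return 0


# Ansible always passes --list or --host when it invokes the script.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Made executable, the script can be passed with `-i` in place of a static inventory file, e.g. `ansible -i ./inventory.py webservers -m ping`.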
Jan-Piet Mens, Germany - Multiple choice: DNS servers
Some people say "everything is a f...ine DNS problem"; we say: you have the choice. Whether you're a developer who needs a DNS name server on your laptop to test your latest distributed whizbang app in a taxi, or whether you're a network administrator who has to offer DNS services to a corporate environment: which DNS server should you choose?
In this short talk I'll give a quick overview of the choice you have: authoritative vs. recursive; big vs. small; database vs. files; complex vs. simple; DNS and/or DNSSEC. Take one, take two, take 'em all: they're free!
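The authoritative vs. recursive split is visible right in the DNS wire format: a client asking a recursive resolver sets the "recursion desired" (RD) bit in the query header, whereas an authoritative-only server answers only for the zones it hosts. The sketch below builds a raw query with Python's standard library to show where that bit lives (header layout per RFC 1035); the query ID is an arbitrary placeholder.

```python
import struct

RD_FLAG = 0x0100  # "recursion desired" bit in the 16-bit header flags field


def build_query(name, qtype=1, recursion_desired=True, query_id=0x1234):
    """Build a DNS query for `name` (qtype 1 = A record, class 1 = IN)."""
    flags = RD_FLAG if recursion_desired else 0
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)
```

The resulting bytes can be sent over UDP port 53 with `socket.sendto`; pointing the same query at a recursive server and at an authoritative-only one is a quick way to see the difference in behaviour.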
Aaron Brady, iWeb - Real-Time Monitoring at Scale
Motivation:
Traditionally, monitoring systems are built around large quanta of time, often 5 minutes. Providing the best service and information to our developers and customers required much finer granularity, while running with the same or fewer resources on the client.
This talk will cover iWeb's experience building a near real-time toolchain for monitoring and alerting, the technical and procedural challenges, some of the tools we used or wrote, and some things to think about when monitoring: what you're collecting and what you're going to do with that data. It relates to the general #monitoringsucks theme.
Problem:
iWeb's monitoring for our first hundred servers was built around Munin and Nagios. These are both fine tools but have scaling limitations when you push them to monitor a lot of machines or very frequently.
Munin and Nagios both fork processes to collect metrics and alert on server problems, respectively. As we scaled to hundreds of managed servers with dozens of checks each, our monitoring hosts struggled to keep up with the load.
Our RRD-generated graphs showed more and more bands where I/O and fork overhead overwhelmed the machines. Our Nagios host suffered massive check lag trying to keep up, and we were left finding out about outages from customers before our own monitoring caught them.
Approach:
Using collectd, an extensible, long-running C daemon, we moved to 10-second granularity in our metrics, letting us spot spikes that had been averaged out in our old 5-minute graphs.
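A small worked example shows why coarse intervals hide spikes. The readings below are hypothetical: 30 CPU samples taken every 10 seconds (5 minutes in total), flat at 5% apart from a 20-second burst at 100%.

```python
def downsample(samples, bucket):
    """Average consecutive runs of `bucket` samples (e.g. 30 x 10 s = 5 min)."""
    return [
        sum(samples[i:i + bucket]) / len(samples[i:i + bucket])
        for i in range(0, len(samples), bucket)
    ]


# Hypothetical 10-second CPU readings (percent) over 5 minutes.
samples = [5.0] * 30
samples[12] = samples[13] = 100.0

print(max(samples))             # 100.0 at 10-second granularity
print(downsample(samples, 30))  # a single 5-minute point of roughly 11.3
```

At 10-second resolution the burst is obvious; averaged into one 5-minute point it looks like a mildly busy machine.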
Switching to the Icinga fork of Nagios, we were able to replace many active checks with passive checks fed from collectd. (This required some patches to collectd, which I have contributed and had merged upstream.) When a disk switched from OK to WARNING, we were notified within 30 seconds rather than 15 minutes.
An open-source web dashboard was written to give us a Munin-like interface to the data that collectd generates, smoothing the transition [https://github.com/iwebhosting/collectd-flask].
A soon-to-be-open-sourced, event-driven website checker performs hundreds of checks in parallel and submits the results to Icinga, avoiding even more fork overhead (we have ~900 domains in our care).
A custom Icinga dashboard and alerting tool summarises our state and reaches on-call staff by XMPP, iOS push notification or SMS when our status changes.
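Passive checks reach Nagios and Icinga 1.x through the external command file, using the `PROCESS_SERVICE_CHECK_RESULT` command. The sketch below shows the generic mechanism any process can use; it is not the collectd patch mentioned above, and the command file path is an assumption that depends on how Icinga was installed.

```python
import time

# Standard Nagios/Icinga plugin return codes.
STATES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def passive_result(host, service, state, output, now=None):
    """Format one PROCESS_SERVICE_CHECK_RESULT external command line."""
    ts = int(now if now is not None else time.time())
    return (f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{STATES[state]};{output}\n")


def submit(line, command_file="/var/lib/icinga/rw/icinga.cmd"):
    """Write a command line to Icinga's command pipe (path is an assumption)."""
    # The command file is a named pipe, so open it for writing only.
    with open(command_file, "w") as pipe:
        pipe.write(line)
```

Because the monitored side pushes its result as soon as a threshold is crossed, the delay is bounded by the collection interval rather than by the monitoring host's active check schedule.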
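iWeb's checker had not been published at the time, so the following is only a sketch of the general event-driven idea using Python's `asyncio`: many checks run concurrently inside one process, with no fork per check. To keep it short it swaps in a plain TCP connect check instead of a full HTTP content check, and the concurrency limit is an arbitrary assumption.

```python
import asyncio


async def check_tcp(host, port, timeout=5.0):
    """Return (host, port, ok); ok is True if a TCP connection succeeded."""
    try:
        _, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port), timeout)
    except (OSError, asyncio.TimeoutError):
        return host, port, False
    writer.close()
    try:
        await writer.wait_closed()
    except OSError:
        pass  # the peer may already have dropped the connection
    return host, port, True


async def run_checks(targets, concurrency=100):
    """Check every (host, port) pair concurrently from a single process."""
    limit = asyncio.Semaphore(concurrency)  # cap simultaneous connections

    async def bounded(host, port):
        async with limit:
            return await check_tcp(host, port)

    return await asyncio.gather(*(bounded(h, p) for h, p in targets))
```

Usage would look like `asyncio.run(run_checks([("www.example.com", 80), ("www.example.org", 443)]))`; each result tuple could then be turned into a passive check result for Icinga.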
Conclusions:
Our experience of building a near real-time reporting system made us think about the nature of alerting. In the end, almost all of the real-time alerting has been dialled back to avoid 'alert fatigue' in our staff, with no real effect on availability.
However, the value of highly accurate graphs has proved itself and we've expanded to use Graphite dashboards of the same data, and now collect ~50,000 metrics across 400 servers at 10 second intervals.
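For scale, ~50,000 metrics every 10 seconds averages out to about 5,000 datapoints per second. Graphite's Carbon daemon accepts a simple plaintext protocol of one `<path> <value> <timestamp>` line per datapoint; the sketch below formats and batches such lines. The metric path and server hostname are hypothetical placeholders, and 2003 is Carbon's default plaintext port.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one datapoint in Carbon's plaintext format: path value timestamp."""
    ts = int(timestamp if timestamp is not None else time.time())
    return f"{path} {value} {ts}\n"


def send_metrics(lines, host="graphite.example.com", port=2003):
    """Send a batch of plaintext lines to Carbon (host is a placeholder)."""
    # One connection per batch, rather than per datapoint, keeps overhead low.
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))
```

In practice collectd can also write to Graphite directly, so a hand-rolled sender like this is mainly useful for metrics that come from elsewhere.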
Tim Fletcher, Brighter Connections - Storage caching
This presentation will look at the work that Brighter Connections have done in the area of storage caching, what caching is and why it matters as well as an example of a deployed solution.
- An introduction to what storage caching is and why it matters
- Review of the flash hardware market and the price/performance points
- Review of storage caching software solutions, both open and closed source
- Look at the specifics of a real world solution including the software stack and outcomes
- Discuss the future development of storage caching.
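The abstract does not say which caching policy Brighter Connections' solution uses, so the following is only a generic illustration of the core idea: keep recently used blocks on fast (flash) storage and fall back to slow disk on a miss. It uses a simple least-recently-used (LRU) read cache; `backing_read` stands in for a hypothetical slow-disk read function.

```python
from collections import OrderedDict


class LRUBlockCache:
    """Toy read cache keeping the most recently used blocks on fast storage."""

    def __init__(self, capacity, backing_read):
        self.capacity = capacity
        self.backing_read = backing_read  # hypothetical: block number -> data
        self.blocks = OrderedDict()
        self.hits = self.misses = 0

    def read(self, block):
        if block in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block)  # mark as most recently used
            return self.blocks[block]
        self.misses += 1
        data = self.backing_read(block)  # slow path: the spinning disk
        self.blocks[block] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
        return data

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

The hit ratio is what ties caching back to the price/performance question: a small amount of flash is worthwhile when the working set is small enough that most reads become hits.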
Ben Jefferson, Brighter Connections - Open Source solutions for Monitoring vSphere infrastructure
This presentation will look at Brighter Connections' approach to monitoring VMware ESXi servers using a number of Raspberry Pi devices. In the process we will also:
- Review other low cost open hardware solutions for simple monitoring tasks
- Look at the rationale for using independent devices for monitoring tasks
- Look at the challenges that we encountered and the solutions we came up with
- Look at the specifics of the final solutions including the software stack, existing software tools used and bespoke software developed by Brighter
- Discuss options for future development of the concept.
Julian Turnbull, Xilinx – Loneliness of the Long Distance Sysadmin
We consider the needs and challenges of providing for a site within a global, distributed company where the site is perceived as small because of headcount, but has data requirements similar to those of much larger sites.
For certain types of engineering site, the bandwidth and disk space needs are almost independent of the number of employees. Various types of intra-company and external cloud solutions are considered and weighed against the actual working method currently in place, which relies on local computing power and storage.
Toshaan Bharvani, VanTosh - Open Enterprise Server
Email, scheduling, collaboration, file and document management, and customer management are very important tools in running a business, but compatibility with other companies in a global business world is also important. A system which is open, scalable and affordable can be built with the same features included in proprietary systems. An out-of-the-box solution doesn't exist, but one can be implemented very easily in an open source environment based on CentOS, Zarafa and Alfresco. Each product can be used in its open-source version or with paid options and support. Client integration follows the principle of 'if it ain't broke, don't fix it': many users do not like change, but silent changes that do not touch the client side at all can be achieved easily, which also helps to save costs. The solution accommodates both smaller and bigger implementations, as the system includes scalability options.
Toshaan Bharvani, VanTosh - A case of orchestration computing
This presentation is about a deployment of a large number of physical and virtual machines. The setup uses KVM as the hypervisor for the virtual machines, with libvirt for virtualization management and lower-level hardware backend connections to interact with the physical machines.
Machines are defined for specific functions and use templates to allow easy creation and setup. Web servers run a combination of Apache and nginx. Database machines run both MySQL and PostgreSQL databases for the specific applications. BIND is used to name machines and address them correctly. Samba and NFSv4 are used for global file serving with distributed frontends. OpenLDAP is used for single user authentication, while SELinux and iptables keep the machines more secure. The whole setup is based mainly on a combination of Ansible, scripts and git.
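Template-driven machine creation with libvirt usually comes down to generating a domain XML definition per machine. The sketch below renders a minimal KVM domain from a per-role template using Python's standard library; the role names, sizes and image directory are hypothetical, and a production definition would normally carry more devices (console, graphics, boot order and so on).

```python
import xml.etree.ElementTree as ET

# Hypothetical per-role templates; sizes are placeholders.
TEMPLATES = {
    "web": {"memory_mib": 1024, "vcpus": 1},
    "db": {"memory_mib": 4096, "vcpus": 2},
}


def domain_xml(name, role, image_dir="/var/lib/libvirt/images"):
    """Render a minimal libvirt KVM domain definition from a role template."""
    spec = TEMPLATES[role]
    dom = ET.Element("domain", type="kvm")
    ET.SubElement(dom, "name").text = name
    ET.SubElement(dom, "memory", unit="MiB").text = str(spec["memory_mib"])
    ET.SubElement(dom, "vcpu").text = str(spec["vcpus"])
    os_el = ET.SubElement(dom, "os")
    ET.SubElement(os_el, "type", arch="x86_64").text = "hvm"
    devices = ET.SubElement(dom, "devices")
    # One qcow2 disk image per machine, attached via virtio.
    disk = ET.SubElement(devices, "disk", type="file", device="disk")
    ET.SubElement(disk, "driver", name="qemu", type="qcow2")
    ET.SubElement(disk, "source", file=f"{image_dir}/{name}.qcow2")
    ET.SubElement(disk, "target", dev="vda", bus="virtio")
    # One NIC on libvirt's "default" network.
    nic = ET.SubElement(devices, "interface", type="network")
    ET.SubElement(nic, "source", network="default")
    ET.SubElement(nic, "model", type="virtio")
    return ET.tostring(dom, encoding="unicode")
```

The output can be saved to a file and registered with `virsh define web01.xml`, which is also the kind of step a tool such as Ansible can drive.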
David Jones & Chris Blower, ScraperWiki - Lithium
A short talk on Lithium (li), an open source command line tool that allows push-button deployment and configuration of cloud server instances.
At ScraperWiki too much time was spent configuring machines. Often incremental changes would be made to a live server configuration, and in parallel to source code that would configure that server from scratch. But because we rarely did configure the server from scratch, it rarely worked.
We want to move towards a model where we have 'disposable instances': Spin up a new one, throw away the old one.
Lithium is the tool we created, in CoffeeScript and shell, to help us do that.
We will cover Lithium's competitors, ingredients, users and our insights in building it and using it.
Bernd Erk, Netways GmbH, Germany - ICINGA - Open Source Monitoring to the Next Level
Icinga is an enterprise-grade open source monitoring system with scalable and extensible monitoring, notification and reporting capabilities. More than just a Nagios fork, Icinga features PostgreSQL and Oracle support, a modular architecture whose components can be combined to suit various needs, and two user-friendly UIs.
This talk will introduce Icinga's technical foundations, explain how it's designed to enable greater redundancy in complex networks, and discuss recent developments. It will also explain how the new core framework replacement, Icinga 2, is ahead of its time with a new component-loader core architecture that makes it more efficient, scalable and easier to maintain in large environments.
We'll demo both the new web interface with integrated reports as well as the new Icinga 2 technology preview, and signpost future plans for the project.
Bernd Erk, Netways GmbH, Germany - Managing Enterprise Clouds with OpenNebula
OpenNebula is an open source cloud and data centre management solution. Supporting a variety of hypervisors such as KVM, Xen, Hyper-V and VMware, it integrates well into many existing environments. Private, hybrid and public cloud scenarios can all be catered for, and physical hardware can be separated into federated zones and yet also controlled through a single interface. Compared to many other cloud frameworks, OpenNebula offers monitoring and high availability out of the box. Images can be redistributed to use different data storage and applied to different deployment scenarios, while OpenNebula manages resource balancing and availability for the user.
This talk will introduce OpenNebula and explain a variety of design scenarios for building a heterogeneous cloud infrastructure. Best practice case studies and a live demo of the OpenNebula web interface, Sunstone, will close the presentation.
Simon Riggs, 2ndQuadrant - Latest Developments in PostgreSQL
By March, PostgreSQL 9.3 will be on the way to beta. The talk will review the new features in 9.3, review the database landscape and discuss the roadmap for Big Data, Mobile, Web and other use cases and interfaces.
Jonathan Clarke, Normation, France - Automating Security Policies, from deployment to auditing using Rudder
Designing, applying and keeping track of security-oriented rules for your IT infrastructure can be a time-consuming, costly and approximate job. Whether you're in charge of defining the policy, implementing it or checking for discrepancies, you'll be aware that all of this takes time, often out-of-hours time, that there is a lot of room for error, and that there is usually a considerable gap between ideals and reality; just how big a gap may or may not be shared with everyone involved.
This talk will show how Rudder, an open source stack for automating configuration and auditing, can be used to ease and improve on several of these issues. Topics covered will include deploying identical settings everywhere, saving time for multiple changes, near real-time auditing of actual settings, gaining a global overview to help analyze vulnerability impacts, and improved reactivity. I will include real-life examples and feedback from several companies where this has been put into action, including benefits (of course) and shortcomings (because there are always some).
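To make "auditing of actual settings" concrete, the sketch below compares a desired policy against the settings actually found in an OpenSSH `sshd_config` and reports every discrepancy. It is a hand-rolled illustration of the approach, not Rudder's own policy language or API; the policy values are hypothetical, and `Match` blocks are ignored for brevity.

```python
def parse_sshd_config(text):
    """Parse 'Keyword value' lines from sshd_config, skipping comments/blanks."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        value = parts[1].strip() if len(parts) > 1 else ""
        # Keywords are case-insensitive, and sshd uses the first value it sees.
        settings.setdefault(parts[0].lower(), value)
    return settings


def audit(desired, actual):
    """Return {keyword: (expected, found)} for every non-compliant setting."""
    return {
        key: (want, actual.get(key.lower()))
        for key, want in desired.items()
        if actual.get(key.lower()) != want
    }


# Hypothetical policy for every server.
POLICY = {"PermitRootLogin": "no", "PasswordAuthentication": "no"}
```

Run on every node and collected centrally, a report like this is what gives the global overview of how far reality has drifted from the policy.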
The aim of this session is to discuss methods and the approach of automation applied to this field, while demonstrating and giving feedback on some of the possibilities offered by Rudder. I hope to avoid being side-tracked into detailed security recommendations, sticking to simple best practices for the sake of examples and thus focusing on the approach.
Ian Norton, Shadowcat Systems, Lancaster - Exim, Perl and SNMP, oh my!
Operational monitoring of a live mail system is a headache: there are so many variables that it’s hard to keep track of everything that’s going on. This talk will look at techniques that can be used to monitor your Exim-based mail systems and will cover:
- What’s sitting in your mail queue?
- Which of your internal systems has a problem, or looks like it might?
- Are the components of your mail system operating as you expect?
- SpamAssassin and antivirus.
We’ll be looking at ways to monitor Exim using scripts that examine log files and service status and generate stats, which can then be collected via SNMP using OpenNMS.
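As a sketch of the log-examining side, the script below counts arrivals, deliveries, deferrals and failures from Exim main log lines using the flags Exim writes after the message ID (`<=` arrival, `=>` delivery, `->` additional address in the same delivery, `==` deferral, `**` failure). It assumes the default log line layout (timestamp, then message ID); extra `log_selector` options such as PIDs or time zones would need a looser pattern. This is not the speaker's own script.

```python
import re
from collections import Counter

# Flags Exim writes after the message ID in its main log.
FLAGS = {
    "<=": "arrived",
    "=>": "delivered",
    "->": "delivered (extra address)",
    "==": "deferred",
    "**": "failed",
}

# Assumed layout: "YYYY-MM-DD HH:MM:SS <message-id> <flag> ..."
LINE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} "
    r"\w{6}-\w{6}-\w{2} (<=|=>|->|==|\*\*) "
)


def mainlog_stats(lines):
    """Count message events by type in Exim main log lines."""
    counts = Counter()
    for line in lines:
        m = LINE.match(line)
        if m:
            counts[FLAGS[m.group(1)]] += 1
    return counts
```

Counts like these could then be exposed to an SNMP collector such as OpenNMS, for example through net-snmp's `extend` directive in `snmpd.conf`.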
Jeff Gehlbach - OpenNMS
Plenty of Free Software tools exist for managing and monitoring Linux and similar systems, but the choices begin to narrow as the number of servers grows. Many platforms hit a performance wall or become unwieldy to configure beyond a few hundred nodes; they simply were not designed to scale beyond this point. Other platforms scale better but reserve the best features for those who pay for an "enterprise" version. This talk covers the system management capabilities of OpenNMS, a 100% Free Software framework for network, system, and application management that was designed from the outset to manage tens of thousands of nodes from a single instance.
John Hackett - Bytemark - Custodian: a distributed monitoring system
Custodian is an open source, distributed, network monitoring system built by Bytemark.
Custodian is designed to deal with large scale network and service monitoring, routinely performing thousands of checks, quickly and regularly. It is a distributable system, capable of using pluggable alerting methods, with an architecture that lends itself to writing new checks, and an existing collection of tests ranging from host pings to monitoring a web page for specific content, ensuring services run, or even picking up on open SMTP relays.
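The general shape of "pluggable checks plus pluggable alerting" can be sketched in a few lines. The registry, class names and `fetch` function below are hypothetical illustrations of the pattern, not Custodian's actual code or API; fetching a web page is stubbed out so the example stays self-contained.

```python
from abc import ABC, abstractmethod

CHECKS = {}  # check-type name -> class


def register(name):
    """Class decorator that makes a check type available under `name`."""
    def wrap(cls):
        CHECKS[name] = cls
        return cls
    return wrap


class Check(ABC):
    def __init__(self, target):
        self.target = target

    @abstractmethod
    def run(self):
        """Return (ok, message)."""


@register("contains")
class ContentCheck(Check):
    """Pass if the expected text appears in the page fetched from target."""

    def __init__(self, target, expected, fetch):
        super().__init__(target)
        self.expected = expected
        self.fetch = fetch  # hypothetical: url -> page body

    def run(self):
        ok = self.expected in self.fetch(self.target)
        state = "found" if ok else "missing"
        return ok, f"{state} {self.expected!r} on {self.target}"


def run_all(checks, alerters):
    """Run every check and hand failures to each alerter (any callable)."""
    for check in checks:
        ok, message = check.run()
        if not ok:
            for alert in alerters:
                alert(message)
```

Adding a new test means writing one small class, and adding a new alerting method means passing another callable, which is the kind of extensibility the abstract describes.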
This talk will principally explain the design choices behind Custodian, touching on its sister software, the MauveAlert notifier, and will address using the two together to implement a fast, distributed platform for network and service checks.