Historical Monitoring

Polling systems at predefined intervals can be used to gather utilization or
other statistical data from various components of the system and to check
how well services that the system provides are working. The information
gathered through such historical data collection is stored and typically used
to produce graphs of the system's performance over time or to detect or
isolate a minor problem that occurred in the past. In an environment with
written SLA policies, historical monitoring is the method used to monitor
SLA conformance.
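     One way to see how historical data supports SLA monitoring is to compute
availability from stored poll results. This is only an illustrative sketch:
it assumes each poll is stored as a timestamp plus a success flag, and the
PollResult type, the helper names, and the 99.9 percent target are
hypothetical. Real SLAs usually define their own measurement windows and
exclusions.

```python
from dataclasses import dataclass


@dataclass
class PollResult:
    """One stored poll (hypothetical record layout): when it ran and whether it succeeded."""
    timestamp: float  # seconds since the epoch
    ok: bool          # True if the service answered correctly


def availability(results: list[PollResult]) -> float:
    """Return the fraction of polls that succeeded, from 0.0 to 1.0."""
    if not results:
        raise ValueError("no poll results to evaluate")
    return sum(r.ok for r in results) / len(results)


def meets_sla(results: list[PollResult], target: float) -> bool:
    """True if measured availability is at least the (hypothetical) SLA target."""
    return availability(results) >= target


# One day of 5-minute polls (288 samples); polls 100 and 101 failed.
day = [PollResult(timestamp=i * 300.0, ok=i not in (100, 101)) for i in range(288)]
print(f"availability: {availability(day):.3%}")
print("meets 99.9% target:", meets_sla(day, target=0.999))
```

Two failed polls out of 288 already put availability at about 99.3 percent,
below a 99.9 percent target, which is why the polling interval and the SLA's
measurement window need to be chosen together.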
     Historical data collection is often introduced at a site because the SAs
wonder whether they need to upgrade a network, add more memory to a
server, or get more CPU power. They might be wondering when they will need
to order more disks for a group that consumes space rapidly or when they
will need to add capacity to the backup system. To answer these questions,
the SAs realize that they need to monitor the systems in question and gather
utilization data over a period of time in order to see the trends and the peaks
in usage. There are many other uses for historical data, such as usage-based
billing, anomaly detection (see Section …), and presenting data to the
customer base or management (see Chapter 31).
     Historical data can consume a lot of disk space. This can be mitigated by
condensing or expiring data. Condensing data means replacing detailed data
with averages. For example, one might collect bandwidth utilization data
for a link every 5 minutes. Retaining only hourly averages instead requires
about 90 percent less storage, because each hourly average replaces 12 samples.
It is common to store full detail for the past week but to condense older
data to hourly averages.
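     A minimal sketch of condensing, assuming samples arrive as (Unix
timestamp, value) pairs at the 5-minute interval used in the example above.
It groups samples by hour and keeps only the mean of each group. The function
name and data layout are illustrative, not taken from any particular
monitoring tool.

```python
from collections import defaultdict
from statistics import mean


def condense_hourly(samples: list[tuple[int, float]]) -> list[tuple[int, float]]:
    """Replace detailed (timestamp, value) samples with one average per hour.

    Timestamps are Unix seconds. Each output timestamp is the start of the
    hour whose samples were averaged.
    """
    buckets: dict[int, list[float]] = defaultdict(list)
    for ts, value in samples:
        hour_start = ts - ts % 3600  # round down to the top of the hour
        buckets[hour_start].append(value)
    return [(hour, mean(values)) for hour, values in sorted(buckets.items())]


# One day of synthetic 5-minute samples: 288 rows.
samples = [(i * 300, float(i % 12)) for i in range(288)]
hourly = condense_hourly(samples)
print(len(samples), "->", len(hourly))                      # 288 -> 24
print(f"rows saved: {1 - len(hourly) / len(samples):.0%}")  # rows saved: 92%
```

Averaging is the simplest choice; keeping the hourly maximum alongside the
mean is a common variation, because averages hide the short peaks that
capacity planning often cares about.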
     Expiring data means deleting it. One might decide that data older than
2 years does not need to be retained at all. Alternatively, one might archive
such data to removable media, such as DVD or tape, in case it is ever needed.
     Limiting disk space consumption by condensing the data or expiring it
affects the level of detail or historical perspective you can provide. Bear this
trade-off in mind as you look for a system for your historical data collection.
     How you intend to use the data that you gather from historical monitoring
will help to determine what level of detail you need to keep and for
how long. For example, if you are using the data for usage-based billing and
you bill monthly, you will want to keep complete details for a few years, in
case there is a customer complaint. You may then archive the data and expire
the online detailed data but save the graphs to provide online access for your
customers to reference. Alternatively, if you are simply using the graphs
internally for observing trends and predicting capacity needs, you might want
a system that keeps complete data for the past 48 hours, reasonably detailed
information for the past 2 weeks, somewhat less detailed information for the
past 2 months, and very condensed data for the previous 2 years, with
everything older than 2 years being discarded. Consider what you are going to
use the data for and how much space you can use when deciding how much
to condense the data. Ideally, the amount of condensing that the system does
and the expiration time of the data should be configurable.
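     The internal-trending policy described above can be written down as data,
which is one way to make condensing and expiration configurable. This is a
sketch only: the Tier type and tier_for function are hypothetical, and the
specific resolutions (raw, 15 minutes, 1 hour, 1 day) are assumptions, since
the text says only "reasonably detailed", "somewhat less detailed", and "very
condensed". Two months is approximated as 60 days.

```python
from dataclasses import dataclass
from typing import Optional

HOUR = 3600
DAY = 24 * HOUR


@dataclass(frozen=True)
class Tier:
    max_age: int               # samples younger than this many seconds use this tier
    resolution: Optional[int]  # averaging interval in seconds; None keeps raw samples


# Hypothetical resolutions for the internal-trending policy.
# Tiers are checked in order, so they must be sorted by max_age.
POLICY = [
    Tier(max_age=2 * DAY, resolution=None),      # past 48 hours: complete data
    Tier(max_age=14 * DAY, resolution=15 * 60),  # past 2 weeks: 15-minute averages
    Tier(max_age=60 * DAY, resolution=HOUR),     # past ~2 months: hourly averages
    Tier(max_age=730 * DAY, resolution=DAY),     # past 2 years: daily averages
]


def tier_for(age: int, policy: list[Tier] = POLICY) -> Optional[Tier]:
    """Return the tier for a sample of the given age, or None if it should expire."""
    for tier in policy:
        if age < tier.max_age:
            return tier
    return None  # older than every tier: discard


for label, age in [("1 hour", HOUR), ("5 days", 5 * DAY), ("30 days", 30 * DAY),
                   ("1 year", 365 * DAY), ("3 years", 3 * 365 * DAY)]:
    tier = tier_for(age)
    if tier is None:
        desc = "discard"
    elif tier.resolution is None:
        desc = "keep raw samples"
    else:
        desc = f"{tier.resolution}-second averages"
    print(f"{label}: {desc}")
```

A billing policy would simply be a different list, for example a single raw
tier lasting a few years, without any change to the lookup code.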
     You also need to consider how the monitoring system gathers its data.
Typically, a system that performs historical data collection will want to poll
the systems that it monitors at regular intervals. Ideally, the polling interval
should be configurable. The polling mechanism should be able to use a
standard form of communication, such as SNMPv2, as well as the usual IP
mechanisms, such as Internet Control Message Protocol (ICMP) echoes (pings)
and opening a TCP connection on any port, sending specific data over that
connection, and checking the response with pattern matching. It
is also useful to have a monitoring system that records latency information,
that is, how long a transaction took. Latency correlates well with end users'
experience: a service that responds very slowly is practically the same
as one that doesn't respond at all. The monitoring system should
support as many other polling mechanisms as possible, preferably incorporating
a mechanism to feed in data from any source and parse the results of that
query. The ability to add your own tests is important, especially in highly
customized environments. On the other hand, a multitude of predefined tests
is also valuable, so that you do not need to write everything from scratch.
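     The TCP test described above (connect, send some data, match the reply)
and latency recording can be combined in a single probe. The following is a
minimal sketch using only Python's standard library; the tcp_probe function
and ProbeResult type are hypothetical names, and reading a single chunk of
the reply is a simplifying assumption that suits banners and short responses.

```python
import re
import socket
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    ok: bool                  # did the reply match the expected pattern?
    latency: Optional[float]  # seconds from connect to reply; None if no reply
    detail: str               # matched text or error message


def tcp_probe(host: str, port: int, send: bytes, expect: str,
              timeout: float = 5.0) -> ProbeResult:
    """Open a TCP connection, send `send`, and check the reply against `expect`.

    Latency is measured from just before connecting until the reply is read,
    so a slow service shows up in the history even when it eventually answers.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            if send:
                sock.sendall(send)
            # A single recv() is enough for a banner or short reply (assumption).
            reply = sock.recv(4096).decode("utf-8", errors="replace")
    except OSError as exc:  # refused, unreachable, timed out, ...
        return ProbeResult(ok=False, latency=None, detail=str(exc))
    latency = time.monotonic() - start
    match = re.search(expect, reply)
    if match:
        return ProbeResult(ok=True, latency=latency, detail=match.group(0))
    return ProbeResult(ok=False, latency=latency, detail=f"unexpected reply: {reply!r}")
```

For example, `tcp_probe("www.example.com", 80, b"HEAD / HTTP/1.0\r\n\r\n",
r"^HTTP/1\.[01] 200")` would check a web server, where the host name is a
placeholder. Storing `latency` with every result gives the historical system
response-time data rather than just up/down status.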

     The output that you generally want from this type of monitoring system
is graphs that have clear units along each axis. You can use the graphs to see
what the usage trends are or to notice problems, such as sudden, unexpected
peaks or drops in usage. You can also use them to predict when you will need
to add capacity of any sort and as an aid in the budget process. A graph is
also a convenient form of documentation to pass up the management chain. A
graph clearly illustrates your point, and your managers will appreciate your
having solid data to support your request for more bandwidth, memory, disk
space, or whatever it is that you need.


Monitoring is an important component of providing a reliable, professional
service. The two primary types of monitoring are real-time monitoring and
historical monitoring. Each has a very different purpose. Monitoring is a basic component of building a service and
meeting its expected or required service levels.
    "If you can't measure it, you can't manage it." In the field of system
administration, that useful business axiom becomes: "If you aren't
monitoring it, you aren't managing it."
    Monitoring is essential for any well-run site but is a project that can keep
increasing in scope. This chapter should help you anticipate and prepare for
that. We look at what the basics of a monitoring system are and then discuss
the numerous ways that you can improve your monitoring system.
    For some sites, such as sites providing a service over the Internet,
comprehensive monitoring is a business requirement. These sites need to monitor
everything to make sure that they don't lose revenue because of an outage that
goes unnoticed.

Happy SA?

     A happy SA deals well with stress and an
endless incoming workload, looks forward to going to work each day, and has
a positive relationship with customers, coworkers, and managers. Happiness
is feeling sufficiently in control of your work life and having a good social and
family life. It means feeling like you're accomplishing something and deriving
satisfaction from your job. It means getting along well with the people you
work with, as well as with the management above you.
     Just as happiness means different things to different people, various
techniques in this chapter may appeal more to some readers than to others. Mostly,
we've tried to list what has worked for us. For example, of the hundreds of
books on time management, we try to list the 10 percent of such books that
apply to issues SAs face. If you think that books on time management are
90 percent junk, we hope that we've covered the remaining 10 percent for
you here.
     The happy SAs we've met share certain habits: good personal skills, good
communication skills, self-psychology, and techniques for managing their
managers. We use the word habits because people do them unconsciously, as
they might tap their fingers when they hear a song on the radio.
     These behaviors come naturally to some people but need to be learned by
others. Books, lectures, classes, conferences, and even training camps teach
these techniques. It's pretty amazing that happiness comes from a set of skills
that can be developed through practice! Making a habit of a technique isn't
easy. Don't expect immediate success. If you try again and again, it will
become easier and easier. A common rule of thumb is that a habit …

From The Practice of System and Network Administration