8/14/2019 Carrier Class Operating System
1/20
White Paper
Juniper Networks, Inc.
1194 North Mathilda Avenue
Sunnyvale, California 94089
USA
408.745.2000
1.888 JUNIPER
www.juniper.net
Architectural Issues in Carrier Class Operating Systems
Jeff Doyle
JUNOS Product Management
Part Number: 200209-001 Dec 2006
Copyright 2006, Juniper Networks, Inc.
Table of Contents
Executive Summary
Introduction
Router Operating System Objectives
  Objectives for Any Router OS
    Open Standards Support
    Flexibility
    Manageability
    Basic Security
    Service and Support
    Basic Reliability
  What Makes a Router OS Carrier Class?
    Stability
    Advanced Security
    Scalability
    Precision
    High Availability
    Consistency
    Predictability
    Carrier Class Reliability
JUNOS Architecture
  Modularity
  Managing Modular Architectures
  Intelligent Modular Design
  Intelligent Modular Design: The JUNOS Routing Module
  Intelligent Modular Design: The Periodic Packet Management Daemon
  The JUNOS Kernel
Engineering Discipline
  JUNOS Release Schedule
  JUNOS Single Train Release Model
  New Product Introduction
Conclusions
Executive Summary
Juniper Networks' original market was large service providers, carriers, and other high-performance
networks requiring the utmost levels of dependability while providing a rich set of
features. We recognized that no router operating systems existed at that time to answer these
requirements. Only recently have other vendors begun offering router operating systems that are
being positioned as carrier class.
This paper examines the characteristics that define a carrier class router operating system, and
the architectural and engineering practices that are required to support these characteristics.
From its inception Juniper Networks has maintained that a modular software architecture is
fundamental to any carrier class operating system. Although at least one of our competitors
has long disagreed with that assertion, they have recently released a modular operating
system of their own, claiming that their architecture is superior to JUNOS because it is more
modular. We contend that this is a naive understanding of the benefits of modularity, arising
from inexperience in building and managing such software. Modularity is an engineering tool
for creating a reliable operating system, and it is as important to understand the limitations of
modularity as it is to understand its usefulness.
Even the most well designed operating system, however, cannot continue to deliver carrier
class quality unless it is supported by disciplined engineering practices. Complexity must be
controlled through unwavering adherence to strict development processes and release standards;
otherwise an operating system quickly becomes unpredictable, unreliable, and unmanageable.
Introduction
For almost two decades IP has been synonymous with Internet services. When you
thought of an IP network you thought of web browsing, e-mail, data access and transfer, and
IM. During those early years network operators gained experience and confidence in IP as a
foundation communications protocol, and now we are in the beginning years of building much
more demanding and critical services over IP networks. Voice, video, an array of business and
entertainment services, military and emergency response communications, industrial sensors
and controls, mobile and wireless services: these are just some of the capabilities that are now
being deployed over IP networks.
The driver for the move to consolidation of multiple services over IP is economics: It is far
cheaper to build and operate a single infrastructure that can support many services, and it
is attractive to customers to receive multiple services from a single provider over a single
connection. One of the more prominent examples of this move is BT's 21CN project. Once the
incumbent telephone monopoly in the United Kingdom, BT is abandoning its circuit-switched
voice infrastructure and consolidating both old and new services onto a high-performance IP
backbone, and in the process transforming itself from a traditional PSTN into a cutting-edge,
forward-thinking communications company. While BT itself calls the move radical, one of the
expected benefits should make sense to the most conservative of executives: An annual savings,
when the transition is complete, of £1 billion ($1.86 billion US).
"Digitised voice, data and video can now be combined, changed, merged and manipulated
on a single digital platform," says BT's Paul Reynolds. "And if it is the ability to merge multiple
information formats on a single platform that is driving the desire for convergence at a device
level, the availability of carrier class IP networks, multi-service networks and software-driven
switching, are fuelling the agenda for fundamental change in our industry."
Given its corporate history, BT certainly understands exactly what is meant by carrier class IP
networks: It is the PSTNs and telcos, with over a century of developing and operating circuit-
switched networks, that have set the modern expectations for communications service quality.
Best-effort packet delivery is perfectly acceptable for early Internet-oriented applications such
as file transfers and e-mail, but is wholly unacceptable for quality-sensitive applications such as
voice and video. Convergence of such services onto a common IP infrastructure can succeed
only if the service quality meets or exceeds that of legacy networks.
The heart of all IP networks is the packet processors (routers), and the heart of all routers is
the operating system. A carrier class IP network, then, must begin with a carrier class router
operating system.
Router Operating System Objectives
The two basic functions of any router are route processing and packet switching. These functions
are accomplished, respectively, by two logical entities: the control plane and the forwarding
plane. The router's operating system is the software that creates these two logical entities; the
routing protocols and the various databases that the routing protocols use to build the forwarding
information, for example, are components of the control plane. The OS also
manages the physical components of the router, and is the means by which you access the
router both directly, such as the CLI, and indirectly, such as SNMP. It also includes peripheral
protocols for managing and operating the router, such as FTP or TFTP, NTP, Telnet, and SSH.
Objectives for Any Router OS
Before discussing the characteristics that make a router OS carrier class, there is a more basic
set of features that you should expect of any router OS, from the largest high-performance core
router to the smallest home routers. These features are:
Support for open standards
Feature flexibility
Manageability
Basic security
Service and support
Basic reliability
Open Standards Support
Any router uses a number of protocols (and for a high-end router it is a long list indeed) to
perform its duties. These protocols can be specified by open standards bodies like the IETF, IEEE,
and ITU-T, or they can be proprietary to the manufacturer of the router's OS. Open standards are
important for three reasons.
First, they give you some assurance that your router will interoperate with other routers
supporting the same standards, regardless of the manufacturer. Proprietary protocols obligate
you to use the same vendor for all routers between which the protocol must operate, sharply
reducing your design options and your ability to negotiate pricing among multiple vendors.
Second, your network operators are far more likely to be intimately familiar with open standards
because the specifications are publicly available. This understanding is essential when your
network experiences problems or failures; therefore open standards support contributes directly
to network reliability.
Third, perhaps counter-intuitively, open standards are more secure. It is certainly true that
malicious parties study open protocols for security vulnerabilities; but it is equally true that
open protocols are subject to a scope of peer review not possible for a single vendor. Therefore
security risks are more likely to be identified and corrected in open standards before the
protocols are ever implemented. A vulnerability in proprietary code is more likely to go
unnoticed until it is exploited.
Flexibility
Most networks change. New routing protocols are introduced as the network grows, new
features are enabled to support added network missions, new interfaces are installed to satisfy
growing bandwidth or redundancy requirements. The router OS must present you with a rich
menu of protocol and configuration options to support not only your initial design choices but
the changes you are sure to make as your network grows. Additionally, the OS must have the
capability of being easily upgraded to accommodate both improved code and newly-added
features from the vendor.
Manageability
Just as the OS should support a variety of protocols to adapt to different design philosophies and
network growth, the OS should provide a variety of means by which to manage the router. At a
minimum, the OS should provide:
An intuitive command line interface (CLI) with extensive error checking capabilities and help options
A web-based configuration tool
Simple Network Management Protocol (SNMP)
The applicable open standards management information bases (MIBs)
The OS should also support both direct access to the router (in conjunction with the physical
router architecture) through a console connection and a modem connection; and remote access
to the router both through a dedicated management network connection and in-band access via
protocols such as Telnet and SSH.
Further management flexibility can be achieved by offering an application programming
interface (API) using an open standard such as Extensible Markup Language (XML). Such an
interface allows the router to be managed and configured using third-party management
platforms.
Basic Security
There are many aspects to security for an IP router, but there are certain security features that
you should expect every router, from the smallest to the largest, to support. A feature that should
be present on any router of any size is password-secured access. On any but the smallest home
routers this access authentication should be supplemented by the ability to define permissions
for different users (that is, the ability to specify what actions a given user or user group is
authorized to perform on the router) and the ability to monitor and record what actions
each user takes while accessing the router. These three security functions should be further
strengthened through the capability of being supported by independent servers: For example,
Radius or TACACS for authentication and authorization, and an independent file server for
accounting.
A remote access protocol such as Secure Shell (SSH) should be available as an alternative to less
secure access protocols such as Telnet.
All routing protocols should have the capability of authenticating all peers. This is highly
recommended practice within your own network, and essential when peering with untrusted
neighbors, meaning all routers in networks not under your direct control.
Finally, a router should not have any potentially vulnerable protocols, such as Telnet or small
servers such as Finger, enabled by default. You should be required to explicitly enable all services
and protocols you desire to run on the router, and never be required to disable services you do
not want. Said another way, a router powered up out of the box should do almost nothing until
you tell it what to do. This gives you a reasonable assurance that no exploitable vulnerabilities
will go overlooked.
Service and Support
The value of strong technical support becomes most apparent when things go wrong. At such
times getting your network back to normal must be done as quickly as possible, which means
support staff must be both knowledgeable and responsive. At the same time, technical support
must be proactive; for operating systems that means implementing engineering processes
that minimize bugs and interoperability problems before customers ever see the software, and
implementing ongoing programs that can identify and correct problems in production code
before the problems become apparent as a wider network concern.
Basic Reliability
Reliability matters for even home routers. Interruption of services is at the least irritating, and
can drive customers away from vendors of routers that are perceived to be undependable.
Reliability increases in importance with the criticality of the services the router is transporting.
But just what is reliability? At a basic level we understand reliability to be the ability of a system
to function as expected for a given amount of time. Ideally we would like to add "without
failures" to the definition; however, as the complexity of a system increases the potential for
failure increases. Therefore a reliable system is one in which failures are minimized as much as
reasonably possible, but also one which can recover from a failure quickly and efficiently when
the unexpected does occur.
Given this definition, all of the features discussed so far can be seen to contribute to reliability:
Open standards support, flexibility, manageability, security, and strong technical support.
Moving to the discussion of the characteristics of a carrier class OS, all of those defining
characteristics can also be listed as the contributors to carrier class reliability.
What Makes a Router OS Carrier Class?
A carrier class router operating system must have all of the features described in the previous
section, but the quality of those features must far exceed what we have described so far. There
are also additional qualities to be found in a carrier class OS. Although you might find one or a
few of these additional features in other operating systems, carrier class requires the presence of
all of them. The unique features distinguishing a router OS as carrier class are:
Stability
Advanced security
Scalability
Precision
High availability
Consistency
Predictability
Carrier class reliability
Stability
Stability is the capability of a router to deliver invariable performance under variable network
circumstances. Every network has its ups and downs: Erratic traffic loads and topological
change. If a network operator's business is dependent on guaranteed service levels (as carrier
class networks are), no router in the network can suffer a performance degradation while
coping with variable network behavior.
Every router performs two very fundamental functions: Packet forwarding and route processing.
Packet forwarding is of course the action of reading the destination address (and possibly other
information) in the header of an incoming packet, making a decision about where that packet
should go, and then switching the packet to the correct outgoing interface. Route processing
is the means by which the router comes to know how to make the correct packet forwarding
decisions: Routers exchange information about the network infrastructure among themselves
and then determine the best path to all known destinations based on some agreed-upon set of
rules2.
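The forwarding decision described above can be illustrated with a longest-prefix-match lookup. The following is a minimal sketch in Python, not JUNOS code: the table contents, the interface names, and the function name lookup are hypothetical, and a real forwarding plane would use specialized hardware and data structures rather than a linear scan.

```python
import ipaddress

# Hypothetical forwarding table: destination prefix -> outgoing interface.
# In a real router this table is built from the results of route processing.
FIB = {
    ipaddress.ip_network("0.0.0.0/0"): "if-default",
    ipaddress.ip_network("10.0.0.0/8"): "if-a",
    ipaddress.ip_network("10.1.0.0/16"): "if-b",
}


def lookup(destination):
    """Return the interface for the most specific matching prefix, or None."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in FIB if addr in net]
    if not matches:
        return None
    # The longest prefix (largest prefix length) is the most specific route.
    best = max(matches, key=lambda net: net.prefixlen)
    return FIB[best]
```

In this sketch a packet for 10.1.2.3 matches all three prefixes and is sent out the interface of the most specific one, while a destination outside 10.0.0.0/8 falls through to the default route.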
A router must perform these two basic functions at the same time, and that has implications
for stability. If the traffic load through the router becomes very heavy most resources might be
used in performing packet forwarding, causing delays in route processing, and resulting in slow
reactions to changes in the network topology. On the other hand, a significant change in the
network topology might cause a flood of new information to the router; most resources might be
used in performing route processing, slowing the router's packet forwarding.
The key to the problem as described is internal resources. If most resources are consumed by one
of the two basic functions, the other function suffers, and the router is destabilized. The answer
to the problem is to perform these functions in separate physical entities, each with its own
resources, as shown in Figure 1. In such an architecture the packet forwarding (forwarding plane)
and the route processing (control plane) do not draw processing cycles away from each other.
Figure 1
The physical architecture depicted in Figure 1 has positive implications on the router's operating
system. Because a part of the OS is the routing protocols, the OS resides in the control plane.
So, the forwarding plane can perform to full capacity without affecting the ability of the OS to
control the entire physical system. It also means that the OS is protected from unintentional or
malicious influences of the network, as discussed in the next section.
[Figure 1 diagram labels: Control Plane; Route Process; RIB; Management Process; Kernel; FIB; Security; Forwarding Plane; FIB; Layer 2 Processing; Interfaces]
2 Static routes are a notable exception to this description; the route processing mainly occurs inside a human brain and the results of the best path determination are
manually entered into the router. But while static routes are commonly a part of carrier class network configurations, they are never a primary source of route
information in such networks.
Advanced Security
A carrier class router OS must go far beyond the basic security features discussed earlier in this
paper. At this advanced level, features and tools must be provided which address two missions:
Strong protection of the router itself
Protection of the network in general
As mentioned at the end of the previous section, the physical architecture of Figure 1 is a key
contributor to protection of the router. Attacks against the router will almost always come from
the network (rather than an out-of-band connection), and are directed against one of the routing
protocols or the OS itself. That means that attacks must enter at the packet forwarding entity
and then make their way up to the route processing entity. The link between these two entities,
then, serves as a choke point at which malicious packets can be identified and stopped, as
shown in Figure 2.
Figure 2
Powerful firewalling capabilities must be available for detailed identification and passing of only
specifically permitted packets to the control plane, blocking all others. Rate limiting capabilities
must also be available so that essential packets permitted through the firewall filters, such as
ICMP, cannot be exploited for flooding attacks.
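The combination of a permit-only filter and a rate limiter can be sketched with a token bucket, one common rate-limiting technique. This is an illustration only and does not describe how JUNOS implements its filters: the class name TokenBucket, the function admit, the allow-list contents, and the rate and burst values are all assumptions chosen for the example.

```python
class TokenBucket:
    """Token-bucket rate limiter; the caller supplies the current time."""

    def __init__(self, rate, burst):
        self.rate = rate      # tokens (packets) added per second
        self.burst = burst    # maximum number of tokens the bucket holds
        self.tokens = burst   # start with a full bucket
        self.last = 0.0       # time of the most recent update

    def allow(self, now):
        """Return True if a packet arriving at time `now` may pass."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical allow-list of protocols permitted to reach the control plane.
PERMITTED = {"ospf", "bgp", "icmp"}
icmp_limiter = TokenBucket(rate=10.0, burst=5.0)


def admit(protocol, now):
    """Pass only permitted protocols, and rate-limit ICMP."""
    if protocol not in PERMITTED:
        return False  # everything not explicitly permitted is blocked
    if protocol == "icmp":
        return icmp_limiter.allow(now)
    return True
```

With these example values, a burst of ICMP packets arriving at the same instant is admitted only up to the bucket size; further ICMP packets are dropped until tokens are replenished, while packets of protocols not on the allow-list never reach the control plane.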
The tools for protecting the router (fine-grained packet filtering and rate limiting) should be
extended to the protection of the network itself, by being extendable to the interfaces of the
forwarding plane. In this application, however, it is important that the application of such control
functions on production interfaces does not negatively impact the performance of the router.
A carrier class router OS should also offer tools that help the network operator take action
against malicious traffic entering the network. For example, if a distributed denial of service
(DDoS) attack is in progress against a node in the network or is transiting the network toward
its target, the OS should have capabilities that aid the operator in tracing the attack traffic to its
entry points, where specific filters or rate limiters can be enabled to stop or alleviate the attack.
Scalability
The feature flexibility discussed earlier certainly contributes to scalability. At the carrier class
level, the feature flexibility must not come at the expense of performance. That is, the network operator must be able
to confidently enable a multitude of features on a given router without reducing the router's
basic packet processing and forwarding rates. For example, in support of a multiservice network
a router might be running OSPF or IS-IS, Multiprotocol BGP, intricate routing policies, MPLS
and its associated signaling and traffic engineering protocols, related layer 2 and layer 3 VPNs,
IP multicast protocols, highly granular packet identification both for security and for traffic
classification, and advanced queuing, all while forwarding packets at or near line rate.
[Figure 2 diagram labels: Control Plane; Attack Packets; Forwarding Plane]
Scalability also means that new features can be added to the OS quickly, efficiently, and safely.
Finally, scalability means that the same OS can be used on multiple hardware platforms
and with any interface type; upgrading hardware or adding interfaces must not require the
replacement of the existing OS code with a different hardware-, interface-, or feature-specific
version of the OS.
Precision
Route calculation errors, even transient ones, can cause inaccurate packet forwarding,
forwarding loops, and black holes. In a network carrying sensitive traffic such as voice and
entertainment-quality video such errors, no matter how temporary, are unacceptable. Therefore
the route calculations of a carrier class OS must be correct every time. Without precision,
stability, scalability, and security are impossible.
High Availability
With the convergence of high-quality, high-demand services such as voice and video onto IP
infrastructures, network outages of any kind are no longer acceptable. Even relatively small
packet losses can have a negative effect on users' perception of the service delivery; a major
node, link, or interface failure can have serious effects for the provider. A carrier class router
operating system must therefore itself be resilient to failure, and must provide the network
operator with tools that minimize network failures whenever possible and that minimize the
effects of failures that do occur.
It must be noted that most unplanned network outages are due not to hardware or software
failures, but to configuration mistakes. "The possibility of failures would be much reduced,"
writes Jeffrey Nudler, a Senior Analyst with Enterprise Management Associates, "if you consider
that changing device configuration causes 60% of downtime due to human error."3 A carrier
class router operating system must take this human factor into account and help the operator
avoid making configuration mistakes.
Planned outages (taking a node offline for routine maintenance or upgrades) are somewhat
less service impacting than unplanned outages, because they are predictable. Nevertheless,
modern service level guarantees and "five nines" network standards preclude the traditional
practice of offline router operations. Carrier class router operating systems must enable in-service
router changes and upgrades.
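To see why "five nines" leaves no room for offline maintenance, it helps to work out the downtime budget. The short Python sketch below assumes that "five nines" means 99.999% availability measured over a 365-day year; the function name allowed_downtime_minutes is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(availability):
    """Minutes of downtime per year permitted at the given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


five_nines_budget = allowed_downtime_minutes(0.99999)
three_nines_budget = allowed_downtime_minutes(0.999)
```

Under these assumptions the five nines budget is roughly 5.26 minutes per year, compared with about 8.8 hours (525.6 minutes) at 99.9%. Taking a router out of service for a single software upgrade could consume the entire yearly allowance.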
Consistency
Multiservice networks require complex configurations, which in turn can present enormous
operational challenges. Considering, as the previous section emphasizes, that human error is the
major cause of network outages, unnecessary operational complexity must be avoided whenever
possible. If different versions of an operating system are required for different platforms,
different interfaces, or different features, the difficulties of network management (and hence the
chances of operational mistakes) are significantly increased.
The ability to run the same OS software image on all routers helps control operational
complexity. Such consistency requires several factors:
No platform-specific versions of the OS
No interface-specific versions of the OS
No feature-specific versions of the OS
3 http://www.networkworld.com/news/2005/101005-iet.html
A consistent OS contributes to network stability and availability not only from an operational
aspect, but also from a software maintenance aspect. If the OS vendor is managing only a single
release at a time, adding enhancements and new features is greatly simplified; because changes
impact only a single code set, the changes can be more thoroughly tested. This translates directly
into more reliable software for the customer. Similarly, the customer's regression testing before
an upgrade to a newer OS release is much more trustworthy if there are no multiple versions or
feature packages to test, reducing the chance of overlooked incompatibilities or unexpected code
conflicts during implementation.
Predictability
Delivery of high quality services requires a predictable transport network. There are two aspects
to predictability that are influenced by the router OS:
Predictable network behavior
Predictable software management by the OS vendor
The factors contributing to consistency discussed in the previous section (an OS that is not
platform or interface specific, and no separate feature packages) also help make the network
predictable by reducing the chances for unexpected events during OS changes. These factors
also help conserve predictability because the addition of features, interfaces, or platforms to the
network is far less likely to entail a change of OS software.
Network predictability is also helped by OS resilience. Although engineering practices that
minimize software bugs are crucial, occasional bugs are an inescapable fact in any complex
software code. Therefore an OS architecture that can isolate and limit the negative effects of
bugs, preventing them from causing systemwide failures, supports network predictability.
Another aspect of predictability is the manner in which the vendor manages the OS. Tightly
controlled development milestones, well-defined engineering quality principles, and a strict
adherence to a regular release schedule all enable confident planning for the network operator.
Carrier Class Reliability
Carrier class reliability is defined by all the qualities that go into basic reliability, as described
earlier in this paper, plus all of the carrier class qualities described in this section: stability,
advanced security, scalability, precision, high availability, consistency, and predictability. The
reduction of any one of these qualities diminishes the overall ability of the operating system to
fulfill the requirements of modern carrier class networks.
JUNOS Architecture
Juniper Networks' first focus market was carriers and large-scale service providers. We
recognized that there were no carrier class router operating systems in existence, and we
designed JUNOS to fill that void. The architectural choices we made then proved to be the right
choices; while continually building upon that foundation operating system architecture, we have
never found, nor do we foresee, a need to consider a new operating system. In fact it is only
recently that some competitors have begun attempting to offer an operating system similar to
what we first offered a decade ago.
This section examines the key architectural features of the JUNOS software, and how these
features enable JUNOS to meet the requirements of a carrier class OS.
Modularity
The most essential architectural characteristic of JUNOS is its modularity. Rather than a single,
highly complex code base the JUNOS software consists of a set of individual components, each
running in its own protected memory space, communicating with each other through well-defined
interfaces, and all controlled by the JUNOS kernel (Figure 3). The separate modules,
called daemons4, are key to both stability and scalability.
Figure 3
Modularity is essential to stability because of the functional separation of software components.
A malfunction or bug in one module might cause the module to fail, while the rest of the
system continues functioning; a monolithic operating system, on the other hand, has no such
compartmentalization and a similar malfunction or bug is likely to cause a full system crash.
Similarly, because each module operates in its own protected memory space and cannot
"scribble" on another module's memory space, the modules cannot disrupt each other.
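The isolation that separate processes provide can be demonstrated in a few lines. The sketch below is a general illustration of operating system process isolation, not a model of JUNOS itself: the daemon names and their one-line programs are hypothetical, each "daemon" is simply a short Python program started in its own process, and for simplicity the processes run one after another rather than concurrently.

```python
import subprocess
import sys

# Hypothetical "daemons": each is a tiny program run in its own OS process,
# and therefore in its own protected memory space.
DAEMONS = {
    "routing": "print('routing ok')",
    "snmp": "import sys; sys.exit(1)",  # simulates a module that fails
    "chassis": "print('chassis ok')",
}


def run_all(daemons):
    """Run each daemon in a separate process and collect its exit code."""
    results = {}
    for name, source in daemons.items():
        proc = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            timeout=30,
        )
        results[name] = proc.returncode
    return results


if __name__ == "__main__":
    for name, code in run_all(DAEMONS).items():
        print(name, "exited with status", code)
```

The failing process returns a nonzero exit status, but the supervising program and the other processes are unaffected. In a monolithic design the equivalent fault would occur inside the single shared program.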
Stability is also supported by the ability to replace an individual module. So if a problem is
identified in a given module, that module can be changed; without modularity the entire
operating system would have to be changed, meaning the router must be taken out of service, to
perform a similar code patch.
[Figure 3 diagram labels: Protocols (RPD); PPMD (Hellos); Chassis Mgmt; Operating System; SNMP; Interface Mgmt]
4 Daemon is a Unix term and reflects the FreeBSD origins of the JUNOS kernel.
The concept of modular scaling is certainly not new; one of the innovations Vint Cerf and Bob
Kahn introduced in TCP/IP was the idea of a layered protocol stack, allowing the change of one
layer without affecting the other layers.
The modular JUNOS architecture supports scalability because new modules can be added as
needed, and existing modules can be updated, without requiring a complete overhaul of the
entire OS code. This principle has been proven over and over; through the life of JUNOS several
dozen new modules have been added to the original OS as new features and capabilities
have been introduced. Yet for years after the advent of JUNOS other routers continued to run
monolithic operating systems with inherent instability and scaling limits.
Managing Modular Architectures
There are also engineering advantages to the JUNOS architecture that contribute to stability and
scalability. A reasonably small team of engineers manages the software comprising each module,
and the same team of engineers is responsible for the same module release after release.
Therefore the code is much better understood than it would be if it were a more integral part
of a monolithic code base or if there were separate release teams. As a result, any addition or
change to the module code is very well understood in terms of how it will affect the
code. Because the module communicates with other modules through a defined interface, its
interactions with other modules are tightly controlled.
Because dedicated engineering teams manage the modules, communication within the team and
between teams can be carefully controlled. A strong sense of ownership is also inspired, ensuring
fewer bugs in the code. And there are no separate bug fix teams; when bugs do arise, the team
responsible for writing the code is responsible for correcting the code.
So the engineering advantages of the modular JUNOS architecture result in faster code
development, testing, and debugging. The end benefit to the customer is sound, reliable
operating system software.
Intelligent Modular Design
It might seem that if modularity is good, the more modular the OS the better. But this is not the
case. While a component module must be small enough to be beneficially managed, it must also
be large enough to contain major interdependencies. If a module is made too small, artificial
barriers will be created between dependent functions, and the interprocess communication
between those functions adds complexity to the overall system.
A fundamental advantage of grouping functions into individual modules or processes, as already
discussed, is that the processes can be stopped, replaced, or can fail independently without
crashing the entire system. When deciding whether a function should be a part of an existing
module or should be in its own module, a determination must be made about what it means for
this function to stop or fail independently: Will other functions be affected? If so, the functions
are interdependent and should probably be grouped together in the same module. This concept
is illustrated in Figure 4.
Figure 4
Another consideration involves shared functions. If there is a common function that serves
several other functions, all of those functions should probably be grouped together in the same
module. Otherwise, as Figure 5 shows, either heavy interprocess communication must be
accepted in order for the separated functions to work together, or the shared function must be
duplicated in each module.
Figure 5
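The trade-off between the two groupings can be made concrete by counting how many functional interactions cross a module boundary, since each such crossing has to be carried by interprocess communication. The sketch below is illustrative only: the functions A through F, their interactions, and the two groupings are hypothetical and are not taken from the figures.

```python
def cross_module_interactions(interactions, module_of):
    """Count the interactions whose two functions live in different modules."""
    return sum(1 for a, b in interactions if module_of[a] != module_of[b])

# Hypothetical functions A-F and the pairs that depend on each other.
INTERACTIONS = [("A", "B"), ("B", "C"), ("A", "C"), ("D", "E"), ("E", "F"), ("C", "D")]

# Grouping that keeps interdependent functions together (two modules).
COARSE = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2, "F": 2}

# Over-modularized grouping that splits dependent functions apart (five modules).
FINE = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 5}

if __name__ == "__main__":
    print(cross_module_interactions(INTERACTIONS, COARSE))  # 1
    print(cross_module_interactions(INTERACTIONS, FINE))    # 5
```

With the same six functions and the same six interactions, the coarse grouping leaves one interaction crossing a boundary while the fine grouping leaves five, which is the difference between light and heavy interprocess communication described above.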
Intelligent Modular Design: The JUNOS Routing Module
A particularly clear example of intelligent modular design can be found in the JUNOS routing
module, called the Routing Protocol Daemon (RPD). The RPD contains all of the routing
protocols, such as OSPF, BGP, IS-IS, and RIP. It has been proposed by others that this is an
old architecture, and that containing each routing protocol in its own module (a BGP module,
an OSPF module, and so on) is better. There are two arguments to be made in favor of separate
protocol modules:
• A single protocol can fail or be stopped independently, without affecting the other
protocols.
• A single protocol can be upgraded to gain new features without the necessity of
upgrading the entire OS.
[Figure 4 diagram: functions A through O grouped into Modules 1 to 3, with functional interactions drawn between them. Interdependencies well contained; light interprocess communications.]
[Figure 5 diagram: the same functions A through O split across Modules 1 to 6. Interdependencies poorly contained; heavy interprocess communications.]
Both of these arguments are attractive and make sense on the surface. They are, after all, two
of the fundamental reasons a modular OS is superior to a monolithic OS. And in fact, Juniper
Networks has on more than one occasion considered replacing the RPD with individual protocol
modules. In each case Juniper engineers concluded that such a change was a move in the wrong
direction, and that more problems would be created than would be solved.
The first argument, that protocols can fail or be stopped independently without affecting other
routing protocols, is flawed because it assumes that each protocol is completely independent.
Such is not the case. Even at a superficial level all of the protocols running on a router tend to
have dependencies on each other. If OSPF fails, for example, it can affect IBGP, RSVP-TE, LDP,
and the RPF checks used both for security and for IP multicast. If BGP is stopped, it affects not
only inter-AS routing but possibly L2 and L3 MPLS VPNs, and IP multicast. Dig a little deeper and
you find that all IP routing protocols share a dependence on several other common protocols
and functions such as ICMP and ARP. Go even deeper and you find that the protocols must
cooperate in such basic functions as choosing a best route and maintaining the routing database.
If the routing protocols are in separate modules, heavy interprocess communication is required,
burdening the overall system, and sharing such basic functions as ARP and routing database
maintenance becomes a complex problem.
By maintaining all routing protocols in a single module, the RPD, the many interdependencies
among individual protocols are contained. Interprocess communication is not taxed,
shared functions are controlled, and the overall system is simpler, which translates into a more
reliable routing platform.
The second argument, that modularizing individual protocols allows the customer to upgrade
only the protocol he wishes in order to acquire new features, is particularly appealing. For
example, the BGP module could be upgraded to a version that supports a desirable new feature
without the necessity of upgrading the entire OS. This provides the appearance of an In-Service
Software Upgrade (ISSU), because one section of code can be replaced without taking
the entire system out of service. Modularizing at the protocol level would seem to make sense
when offering this approach, so that individual protocols can be updated as non-disruptively as
possible.
But given the interdependencies among protocols already discussed, replacing a single protocol
such as BGP is hardly as non-disruptive to routing operations as it might appear on the surface.
Far more important, the practice of selectively replacing protocol modules, or any OS module
for that matter, comes at a steep price in lost consistency and predictability. To illustrate the
problem, take a hypothetical router OS that has five currently available releases: Release A
through Release E. Release B is newer (and therefore has newer features) than A, C is newer than
B, and so on. In each release, there is an OSPF module, an IS-IS module, a BGP module, and a
RIP module. You are allowed to pick and choose among the protocol modules to attain exactly the
features you want: Perhaps OSPF from Release B, RIP from Release A, and BGP from Release D.
To make this menu of combinations available to you, the software vendor must maintain and
understand the interactions of each routing protocol module from each release with all of the
other routing protocol modules from each release. Given the four protocols across five releases,
the total number of possible release-specific protocol combinations is approximately 4^5, or 1,024 [5]. Whenever
the vendor adds a new feature to one of the protocols he must perform regression testing
not just for that release, but for all of the 1000+ possible protocol combinations. And if you
experience problems with a newly upgraded protocol module, the vendor's technical support
personnel must understand the interoperability implications of all 1000+ combinations.
[5] The actual number of combinations is slightly less (16 less, in this example), because a given protocol from one release would never be combined with the same
protocol from another release.
This example considers just RIP, OSPF, IS-IS, and BGP modules. Add to that an MPLS module
and an IP multicast module in each of the five releases. The possible protocol combinations
now become approximately 6^5, or 7,776. And this assumes that the MPLS module is not further
divided into separate RSVP-TE, LDP, L2 VPN, L3 VPN, and VPLS modules, or that IP multicast is
not further divided into its constituent protocols. Take the practice beyond just routing protocol
modules and include all of the OS modules, and the number of possible package combinations across
several releases soars exponentially into the hundreds of thousands.
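The growth of the test matrix can be checked with a short enumeration. This is a sketch under one stated assumption: every protocol module is taken independently from exactly one of the available releases, so the count is the number of releases raised to the number of protocols. The function names `count_mixes` and `enumerate_mixes` are illustrative. Under this assumption the exact totals (625 for four protocols, 15,625 for six) differ somewhat from the approximate figures quoted in the text, but the conclusion is the same: every added module multiplies the number of combinations to be tested and supported.

```python
from itertools import product

def count_mixes(num_protocols, num_releases):
    """Number of mixes when each protocol module comes from one release."""
    return num_releases ** num_protocols

def enumerate_mixes(protocols, releases):
    """Yield every mix as a dict mapping protocol name to release name."""
    for choice in product(releases, repeat=len(protocols)):
        yield dict(zip(protocols, choice))

RELEASES = ["A", "B", "C", "D", "E"]
ROUTING = ["RIP", "OSPF", "IS-IS", "BGP"]

if __name__ == "__main__":
    print(count_mixes(len(ROUTING), len(RELEASES)))             # 625
    print(count_mixes(len(ROUTING) + 2, len(RELEASES)))         # 15625
    print(sum(1 for _ in enumerate_mixes(ROUTING, RELEASES)))   # 625
```

One of the enumerated mixes is the example from the text: OSPF from Release B, RIP from Release A, and BGP from Release D, with IS-IS from any of the five releases.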
The liabilities of this approach are clear: A vendor might gain positive short-term customer
response by allowing mix-and-match modules from different releases, but the code will
quickly become unmanageable. The end result is an inconsistent, unpredictable, and ultimately
unreliable operating system.
The JUNOS RPD therefore remains a single module containing all routing protocols. And while
the RPD can be replaced as a module, Juniper Networks supports doing so only for installing
code patches and bug fixes when necessary; new features are acquired by upgrading the entire
OS. This practice is key to a well understood, closely controlled, highly reliable operating system.
Intelligent Modular Design: The Periodic Packet Management Daemon
Although good engineering practice dictates keeping all routing protocols in a single module,
there is another view of modularization of the routing functions. To understand where
modularization is beneficial in the routing process it is necessary to think about basic routing
functions. On the one hand, a routing process is responsible for performing route calculations
using the information presented to it. Precision and stability require that this calculation be
allowed to run uninterrupted until it is finished. If the calculation is interrupted, there is a risk
of incorrect or incomplete route information finding its way into the routing database, possibly
resulting in incorrect forwarding, routing loops, or packet black holes.
On the other hand, there are elements of a routing process that must be serviced as soon as
possible. Hellos, adjacency maintenance messages, and route updates have timers, often tightly
set timers, that require quick processing and response. Reacting slowly to these functions
could cause timeouts that in turn can result in unnecessary message retransmissions at best and
closed adjacencies at worst. Stability, precision, and predictability can all be negatively affected.
There is a potential conflict in these two basic functions. A route calculation is a
run-to-completion task; in computer science terms, it requires cooperative multitasking. Performing
adjacency maintenance and update tasks is a real-time, or preemptive multitasking, function.
When a routing protocol implementation must share a processor, should it allow interruptions
of its run-to-completion tasks whenever a real-time task needs the processor, at the risk of
temporarily corrupted route data? Or should it require real-time demands to wait until
run-to-completion tasks are finished, at the risk of broken adjacencies and network instabilities?
The answer, of course, is that neither situation is acceptable. Herein, then, is a justification for
a separation of the software comprising the real-time and run-to-completion elements of the
routing process. JUNOS implements the RPD, with all of its constituent routing protocols, as a
run-to-completion module. The real-time elements of the routing protocols are separated into a
module called the Periodic Packet Management Daemon (PPMD). The distinct processing needs
of each module are then served, and a scheduler manages the demands of both modules on the
shared Routing Engine processor. The result is a highly responsive, accurate, and stable routing
platform.
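The conflict between the two kinds of work can be seen in a toy timing model. The sketch below is not a model of the actual RPD or PPMD implementation; the function names, the tick-based clock, and the numbers are assumptions chosen for illustration. In the shared-loop case a hello that falls due during a route calculation is held until the calculation finishes. In the separate-module case the scheduler is assumed to run the periodic sender on time. A neighbor drops the adjacency if the gap between hellos reaches its dead interval.

```python
def hellos_shared_loop(hello_interval, calc_start, calc_ticks, horizon):
    """Hello send times when hellos must wait for a run-to-completion calculation."""
    sent, t = [], 0
    calc_end = calc_start + calc_ticks
    while t <= horizon:
        if calc_start <= t < calc_end:
            t = calc_end  # the due hello is held until the calculation finishes
        sent.append(t)
        t += hello_interval
    return sent

def hellos_separate_module(hello_interval, horizon):
    """Hello send times when a separate periodic module is scheduled on time."""
    return list(range(0, horizon + 1, hello_interval))

def adjacency_survives(send_times, dead_interval):
    """The neighbor keeps the adjacency only if every gap is below the dead interval."""
    gaps = [later - earlier for earlier, later in zip(send_times, send_times[1:])]
    return max(gaps) < dead_interval

if __name__ == "__main__":
    shared = hellos_shared_loop(hello_interval=1, calc_start=2, calc_ticks=5, horizon=12)
    separate = hellos_separate_module(hello_interval=1, horizon=12)
    print(shared)                                          # [0, 1, 7, 8, 9, 10, 11, 12]
    print(adjacency_survives(shared, dead_interval=4))     # False
    print(adjacency_survives(separate, dead_interval=4))   # True
```

The route calculation is never interrupted in either case; only the placement of the periodic work changes, which is the point of separating the real-time elements from the run-to-completion ones.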
The JUNOS Kernel
The heart of JUNOS, the JUNOS kernel, began as a FreeBSD kernel. FreeBSD is renowned for
running on servers with exceptionally long uptimes, indicating both its level of reliability and
its infrequent need for updating. Because FreeBSD is open source software, Juniper Networks
engineers were free to retain what mattered, discard what didn't, and custom-build the parts that
make the kernel JUNOS rather than FreeBSD.
Recently one of Juniper Networks' competitors has begun offering a new operating system
built on the proprietary QNX Neutrino microkernel, and that vendor has made much of the
supposed superiority of microkernels over kernels such as JUNOS. To understand the issue,
it helps to briefly describe the reasoning behind microkernels. A simplistic comparison of a
monolithic kernel to a microkernel is illustrated in Figure 6. Only essential system services
remain in the microkernel (hence the prefix "micro"); functions such as the host stack, device
drivers, and file system have become external processes running in user mode, communicating
with the microkernel via system calls. By doing this, these externalized functions can restart or
fail independently without causing a complete kernel failure.
Figure 6
This argument in favor of microkernels is of course the same argument in favor of modularity
in the overall OS architecture. But the principles for intelligent modular design discussed in this
paper also apply here. The system is so heavily dependent on the host stack and file system
that a failure of one of these services is likely to have a severely negative impact on the entire
system whether they are in the kernel or external processes. And in reality, device drivers can
be stopped and started even within the kernel. So the reality of microkernels is that by adding
artificial barriers between these services interprocess communication is increased; the attempt
to simplify the kernel adds complexity to the overall system.
There is nothing new in the arguments currently being made in favor of microkernels; in fact
they come from a 20-year-old academic debate. One of the more enlightening versions of this
debate took place in 1992 between Andy Tanenbaum, proponent of the microkernel-based
Minix operating system, and Linus Torvalds, creator of the kernel-based Linux, on the Usenet
newsgroup comp.os.minix [6]. Tanenbaum made the same arguments then as the arguments
now being used to promote microkernels as the latest innovation in router operating system
architecture. "Among people who design operating systems," Tanenbaum wrote, "the debate is
essentially over. Microkernels have won." [7]
[Figure 6 diagram: a monolithic kernel compared with a microkernel. In the monolithic kernel, the scheduler, paging, virtual memory, host stack, device drivers, and file system all sit between the system call interface to the kernel and the kernel interface to the hardware. In the microkernel, only the scheduler, paging, and similar services remain; the host stack, device drivers, and file system run as external processes that reach the microkernel through IPC and system calls.]
[6] A simple Google search provides the complete text of the debate.
[7] Andy Tanenbaum, "LINUX is Obsolete," comp.os.minix, January 1992.
Yet reality has shown otherwise. While microkernels have proven popular in embedded systems
such as automotive computers and industrial controls (QNX is famously used in the Space
Shuttle's robotic arm), they have found little acceptance in more complex operating systems.
"Microkernels are mostly discredited now," writes Miles Nordin in Linux Journal, "because they
have performance problems, and the benefits originally promised are a fantasy." [8] This view is
supported in the widely respected textbook on operating system design, Operating System
Concepts: "Unfortunately, microkernels can suffer from performance decreases due to increased
system function overhead." [9]
Juniper Networks maintains no strong position on the arguments for and against microkernels.
Rather we chose FreeBSD as the genetic forerunner of the JUNOS kernel because of its
openness, in keeping with our strong belief in open standards. Its open source software has
made FreeBSD the most peer-reviewed software in the world; the reliability of JUNOS is thereby
rooted in the reliability of FreeBSD.
Engineering Discipline
The consistent message of the previous few sections has been that modularity is essential to a
carrier class OS architecture, but naïve approaches to designing modules can cause as many or
more problems than they solve. This paper has called thoughtful, experience-based modularity
"intelligent modular design."
There is a deeper message throughout this paper: A router operating system that can meet
carrier class demands is only possible when it is managed by a highly experienced, highly
disciplined engineering team following strict engineering processes. J. M. Juran, the guru of
modern business and industrial quality practices, says that you can determine the quality of the
product by assessing the quality of the processes used to develop it.
Any carrier class router OS is necessarily a highly complex system. Reliability can only
be maintained in such a system when the processes for improving the code and feature
enhancements are tightly controlled.
The principles of engineering discipline and strict processes were implemented at Juniper
Networks from the very beginning, by the engineers joining the young company. Many had
experienced first hand what happens when the rules governing product development are loose,
and when the developers do not have control of the code: The software becomes unmanageable,
and changes bring unpredictable conflicts that often become apparent only when the customer
attempts to implement the software.
Our quality development practices have evolved and matured with the company, but Juniper
Networks has never deviated from the standards implemented in its first years. In fact our
acquisition of TL9000 certification only required documenting the processes already in place, not
implementing new processes.
[8] Miles Nordin, "Obsolete Microkernel Dooms MAC OS X to Lag Linux in Performance," Linux Journal, May 2002.
[9] Abraham Silberschatz, Peter Baer Galvin, and Greg Gagne, Operating System Concepts, Seventh Edition, John Wiley & Sons, 2005, page 62.
JUNOS Release Schedule
There are four major releases of JUNOS each year, one per quarter, always in the same months:
• February
• May
• August
• November
There are also, typically, five working releases at any given time:
• Three maintenance releases
• One release in beta
• One release under development
The release schedule provides a high degree of predictability for customers planning upgrades
and new feature implementation. Because of this, the release schedule always has highest
priority. Several dozen new features are included in each release, so it is important that the
customers planning for these new features not be delayed by development problems with one
feature. If a new feature project becomes delayed, the feature is moved to the next release; the
release is never delayed while waiting for a specific feature development to catch up.
Well-defined development milestones are essential to this process, so that expected development
delays can be identified early on. Any rescheduling of a feature to a later release, then, normally
occurs early enough that customers expecting the feature are given plenty of lead time to adjust
their plans accordingly.
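The scheduling rule described above, a fixed release date with late features slipping to the next release, can be written down in a few lines. This is only an illustrative sketch: the function names `next_release` and `plan_release` and the feature names in the usage example are hypothetical, and the only facts taken from the text are the four release months and the rule that the release itself never moves.

```python
RELEASE_MONTHS = (2, 5, 8, 11)  # February, May, August, November, as listed above

def next_release(year, month):
    """Return (year, month) of the first scheduled release on or after the given month."""
    for release_month in RELEASE_MONTHS:
        if release_month >= month:
            return (year, release_month)
    return (year + 1, RELEASE_MONTHS[0])

def plan_release(feature_ready):
    """Split features into those that ship now and those deferred to the next release.

    feature_ready maps a feature name to True if it met its milestones.
    The release date is not an output of this function: it does not move.
    """
    shipped = sorted(name for name, ready in feature_ready.items() if ready)
    deferred = sorted(name for name, ready in feature_ready.items() if not ready)
    return shipped, deferred

if __name__ == "__main__":
    print(next_release(2006, 12))  # (2007, 2)
    print(plan_release({"feature-x": True, "feature-y": False}))
```

A feature deferred in one quarter simply becomes an input to the plan for the release returned by `next_release` for the following month.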
Major infrastructure projects and unusually complex new features are introduced in phases
over multiple releases. A good example of a phased project is Non-Stop Routing (NSR). Early
components of NSR were added to JUNOS code as early as release 7.6; these first components
were invisible to the customer, but allowed Juniper Networks system test personnel to ensure
correct integration before moving on to the next phase components. The first customer-visible
NSR components, OSPF and IS-IS support, were released in JUNOS 8.1, and the NSR project
will be fully complete at JUNOS 9.0. Releasing such projects in phases ensures reliability by
allowing incremental regression testing of components as they are added.
The JUNOS release schedule is also essential to helping adhere to the single train model.
JUNOS Single Train Release Model
The JUNOS single-train model means that for each JUNOS release, there is only one image; that
one image runs on all T, M, and J Series routers. The same code that runs on the largest T Series
router also runs on the smallest J Series router. And all features supported at a given release are
supported in the one image. There are no separate feature packages to add when you want to
add a feature; you only have to enable the feature you want.
There are a number of development ethics that are adhered to in order to maintain this single
train model:
• No feature development is performed in maintenance (working) releases. New features
are added to new releases.
• No back-porting of features is allowed. That is, when a new feature is developed in a new
release, the feature cannot be added to an older release.
• No "customer specials." All features requested by all customers are developed and
released in the mainline code.
Stating what we will not do seems somewhat inflexible. It is. Loosening any of these rules means
veering off of the focused path delineated by our strict quality processes, and in the end our
customers would suffer. Adhering to the rules means that at all times our developers are working
with only a single code base at any release; the result is well-understood code, with new features
and changes carefully tested for correct integration. For the customer, this means superior
reliability. It also eliminates for the customer any need to cautiously select from a complex menu
of platform-specific, interface-specific, and feature-specific packages and then perform careful
regression testing to ensure that the selected code interoperates as expected with previously
implemented versions of the code and all installed hardware.
The single train model also benefits our customers in the following ways:
• The same development teams manage the same software modules release after release,
ensuring that the code and any changes made to the code are intimately understood.
• The same team responsible for writing the code is responsible for finding and correcting
bugs in the code. As a result, bugs are remedied far faster than would be possible if we
used separate bug fix teams.
• The dedicated engineering team concept inspires a sense of ownership for the code,
sharply reducing the chances of bugs in the software in the first place.
Again, these principles translate directly into reliability for our customers.
New Product Introduction
In addition to engineering rules and procedures, there must be a set of phases that guide a given
product throughout its lifetime from first inception to end-of-life. At Juniper Networks this is the
New Product Introduction (NPI) model. The NPI model defines seven phases, and is applied to all
engineering projects. Well-defined milestones must be met for any project to progress from one
phase to the next. Figure 7 shows the specific NPI model for JUNOS releases and features.
Figure 7
Certainly every company that produces a product has some similar model for defining the
product's lifecycle; but without strict engineering discipline, the models mean little. Juniper
Networks' NPI model is defined to provide value to the customer by enabling us to foresee
resource requirements well in advance of the point where they might affect timely delivery to
our customers.
[Figure 7 diagram: the NPI model for JUNOS, Phase 0 through Phase 6. Stage labels in the figure: Initial Feature Content and SW Resource Estimate; Product Definition, Commitment and Approval; Design, Development, and Unit Test; System and Alpha Test; Beta Test; FRS to Production; End of Life.]
Enforcement of the development milestones is accomplished by mechanized process controls.
Communication within and between development teams is also highly mechanized, beginning
with enhancement requests from field sales personnel, representing our customers, all the way
to the end of product life.
Just as module teams and the single train release model are essential for understanding the code,
the NPI model is essential for projecting resource requirements and understanding exactly where
the code is in its lifecycle.
Conclusions
JUNOS is the repository of our accumulated networking knowledge. It does not distinguish
between core and edge, service provider and enterprise. The power, discipline, and consistency
of our engineering practices ensure the continuing advancement of JUNOS as the single operating
system architecture for all future Juniper Networks platforms.
JUNOS was designed from the beginning to meet the demands of carrier class networks,
and we have continually improved upon it while never deviating from our core engineering
principles. Our decade of experience with the JUNOS modular architecture brings a level of mature
understanding of managing such architectures that cannot be matched by other vendors just
now attempting to offer similar router operating systems.
JUNOS has always been the premier operating system in high-performance, high-demand
networks. As more and more sensitive services are added to existing networks, the unmatched
reliability of JUNOS becomes more important to serious service providers than ever before.
Copyright 2006, Juniper Networks, Inc. All rights reserved. Juniper Networks and the Juniper Networks logo are registered trademarks of Juniper Networks, Inc. in
the United States and other countries. All other trademarks, service marks, registered trademarks, or registered service marks in this document are the property of
Juniper Networks or their respective owners. All specifications are subject to change without notice. Juniper Networks assumes no responsibility for any inaccuracies
in this document or for any obligation to update information in this document. Juniper Networks reserves the right to change, modify, transfer, or otherwise revise this
publication without notice.