Chapter 13. Policy Routing, Bandwidth Management, and QoS

IP networks in general, and the Internet as a particularly prominent example, are inherently nondeterministic with regard to operational parameters such as latency (delay), round-trip time (RTT), jitter (delay variation), and packet loss. The default service offering of the Internet is characterized as a best-effort variable-service response (RFC 2990, "Next Steps for the IP QoS Architecture").
To improve the situation for certain traffic classes, policy routing and quality of service (QoS) measures were introduced. Prioritizing one class of service always comes at the cost of regular best-effort traffic: there is no free lunch, and statistical overbooking is the foundation of affordable Internet service provider (ISP) offerings. One strong driving force for QoS is expedited transport of real-time, delay-sensitive traffic such as voice, video, and delicate data (such as storage traffic or dedicated business applications).
Note that queuing and scheduling on UNIX systems is often an integral part of packet-filtering and Network Address Translation (NAT) implementations. This chapter covers neither firewall features nor NAT (see Chapter 15, "Network Address Translation"); it touches on packet-filtering architectures only as they relate to queuing and scheduling.

Policy Routing

Policy routing is the art of deviating from the destination-based shortest-path routing decisions of dynamic routing protocols. Policy routing considers aspects such as source/destination address, ports, protocol, type of service (ToS), and entry interface; do not confuse it with a routing policy or with traffic policing. (Traffic policing and shaping are sometimes summarized as traffic conditioning.) Linux offers by far the most evolved policy-routing approach of all Unices via multiple routing tables, the Routing Policy Database (RPDB), and the iproute2 (ip and tc) package for administration. Most other UNIX platforms implement policy routing via firewall marks and packet-mangling hooks.

Policy Routing on BSD

Policy-routing setup on BSD platforms is straightforward but limited, and it is essentially integrated into the firewall architectures: firewalling, NAT, and policy enforcement are done by basically the same packet-mangling structures. Examples 13-1 and 13-2 demonstrate its use by forwarding certain traffic based on source address or incoming interface.
Example 13-1. Policy-Routing Example with FreeBSD ipfilter
pass out quick on fxp0 to fxp1:192.168.2.1 from 192.168.2.200 to any


Example 13-2. Policy-Routing Example with OpenBSD Packet Filter (pf)
pass out log quick on xl0 route-to tl0:192.168.1.1 proto icmp from tl0 to any

pass out log quick on xl0 proto icmp from any to any


Linux iproute2 Policy Routing

The Linux OS can place routes within multiple routing tables that are identified by an 8-bit numeric ID or by a pseudo-name that is mapped in the file /etc/iproute2/rt_tables. By default, three tables exist: the default (ID 253), the local (ID 255), and the main (ID 254), as follows:
  • The default table can be discarded safely. It is reserved for last-resort postprocessing for the unlikely case that previous rules/routing tables did not process the packet.
  • The important local table (ID 255) consists of routes for local and broadcast addresses (comparable to directly connected routes in Cisco lingo). The kernel maintains this table automatically. As a rule of thumb, it should not be tampered with.
  • By default, all route manipulations act on the main routing table (forwarding table). The RPDB supervises the different routing tables. Policy routing is configured via the ip rule and ip route commands.
Multiple routing tables come into play when policy routing is used, for traffic control, and in the context of Multiprotocol Label Switching (MPLS) and multiple routing instances (VRFs, or virtual routing and forwarding instances). In policy routing, the routing table identifier becomes one additional matching criterion: otherwise-identical prefix routes can coexist in different tables without conflict because the table ID acts as a tiebreaker. Example 13-3 illustrates the capabilities of the Linux policy-routing toolbox. Example 13-4 offers an example of a custom policy-routing table.
Example 13-3. Policy-Routing iproute2 Commands
[root@callisto:~#] ip rule help

Usage: ip rule [ list | add | del ] SELECTOR ACTION

SELECTOR := [ from PREFIX ] [ to PREFIX ] [ tos TOS ] [ fwmark FWMARK ]

            [ dev STRING ] [ pref NUMBER ]

ACTION := [ table TABLE_ID ] [ nat ADDRESS ]

          [ prohibit | reject | unreachable ]

          [ realms [SRCREALM/]DSTREALM ]

TABLE_ID := [ local | main | default | NUMBER ]



[root@callisto:~#] ip rule list

0:      from all lookup local

32766:  from all lookup main

32767:  from all lookup default

[root@callisto:~#] ls -al /etc/iproute2/

total 36

drwxr-xr-x    2 root     root         4096 Aug 28 08:10 ./

drwxr-xr-x   86 root     root         8192 Aug 28 08:03 ../

-rw-r--r--    1 root     root          299 Mar 15  2002 rt_dsfield

-rw-r--r--    1 root     root          296 Mar 15  2002 rt_protos

-rw-r--r--    1 root     root          114 Mar 15  2002 rt_realms

-rw-r--r--    1 root     root           98 Mar 15  2002 rt_scopes

-rw-r--r--    1 root     root           81 Aug 28 08:10 rt_tables



[root@callisto:~#] cat /etc/iproute2/rt_tables

#

# reserved values

#

255     local

254     main

253     default

0       unspec



#

# local values

#

1       lab



[root@callisto:~#] cat /etc/iproute2/rt_scopes

#

# reserved values

#

#0      global

#255    nowhere

#254    host

#253    link



#

# pseudo-reserved

#

#200    site



[root@callisto:~#] ip route help

Usage: ip route { list | flush } SELECTOR

       ip route get ADDRESS [ from ADDRESS iif STRING ]

                            [ oif STRING ]  [ tos TOS ]

       ip route { add | del | change | append | replace | monitor } ROUTE

SELECTOR := [ root PREFIX ] [ match PREFIX ] [ exact PREFIX ]

            [ table TABLE_ID ] [ proto RTPROTO ]

            [ type TYPE ] [ scope SCOPE ]

ROUTE := NODE_SPEC [ INFO_SPEC ]

NODE_SPEC := [ TYPE ] PREFIX [ tos TOS ]

             [ table TABLE_ID ] [ proto RTPROTO ]

             [ scope SCOPE ] [ metric METRIC ]

INFO_SPEC := NH OPTIONS FLAGS [ nexthop NH ]...

NH := [ via ADDRESS ] [ dev STRING ] [ weight NUMBER ] NHFLAGS

OPTIONS := FLAGS [ mtu NUMBER ] [ advmss NUMBER ]

           [ rtt NUMBER ] [ rttvar NUMBER ]

           [ window NUMBER] [ cwnd NUMBER ] [ ssthresh REALM ]

           [ realms REALM ]

TYPE := [ unicast | local | broadcast | multicast | throw |

          unreachable | prohibit | blackhole | nat ]

TABLE_ID := [ local | main | default | all | NUMBER ]

SCOPE := [ host | link | global | NUMBER ]

FLAGS := [ equalize ]

NHFLAGS := [ onlink | pervasive ]

RTPROTO := [ kernel | boot | static | NUMBER ]



[root@callisto:~#] ip route list table local

local 192.168.1.1 dev eth1  proto kernel  scope host  src 192.168.1.1

local 192.168.45.253 dev eth1  proto kernel  scope host  src 192.168.45.253

broadcast 192.168.1.0 dev eth1  proto kernel  scope link  src 192.168.1.1

broadcast 127.255.255.255 dev lo  proto kernel  scope link  src 127.0.0.1

broadcast 192.168.14.255 dev eth0  proto kernel  scope link  src 192.168.14.1

broadcast 192.168.45.255 dev eth1  proto kernel  scope link  src 192.168.45.253

broadcast 192.168.1.255 dev eth1  proto kernel  scope link  src 192.168.1.1

broadcast 192.168.14.0 dev eth0  proto kernel  scope link  src 192.168.14.1

broadcast 192.168.45.0 dev eth1  proto kernel  scope link  src 192.168.45.253

local 192.168.14.1 dev eth0  proto kernel  scope host  src 192.168.14.1

broadcast 127.0.0.0 dev lo  proto kernel  scope link  src 127.0.0.1

local 127.0.0.1 dev lo  proto kernel  scope host  src 127.0.0.1

local 127.0.0.0/8 dev lo  proto kernel  scope host  src 127.0.0.1



[root@callisto:~#] ip route list table main

192.168.1.0/24 dev eth1  scope link

192.168.14.0/24 dev eth0  scope link

192.168.45.0/24 dev eth1  proto kernel  scope link  src 192.168.45.253

127.0.0.0/8 dev lo  scope link

default via 192.168.1.254 dev eth1



[root@callisto:~#] ip route list table main scope link

192.168.1.0/24 dev eth1

192.168.14.0/24 dev eth0

192.168.45.0/24 dev eth1  proto kernel  src 192.168.45.253

127.0.0.0/8 dev lo


Example 13-4. Creating and Populating a Custom Routing Table
[root@callisto:~#] echo 1 lab >> /etc/iproute2/rt_tables

[root@callisto:~#] echo 1 lab >> /etc/iproute2/rt_realms

[root@callisto:~#] ip rule del pref 32767

[root@callisto:~#] ip rule add from 192.168.14.0/24 to 192.168.7.0/24 table lab pref 32765 realms lab/lab



[root@callisto:~#] ip rule list

0:      from all lookup local

32765:  from 192.168.14.0/24 to 192.168.7.0/24 lookup lab realms lab/lab

32766:  from all lookup main

[root@callisto:~#] ip route add default via 192.168.14.254 table lab



[root@callisto:~#] ip route flush cache



[root@callisto:~#] ip route list table lab

default via 192.168.14.254 dev eth0



[root@callisto:~#] rtacct lab

Realm      BytesTo    PktsTo     BytesFrom  PktsFrom

lab        0          0          0          0


Linux routing also incorporates the concept of realms. A routing realm can essentially be compared to a route aggregate in Border Gateway Protocol (BGP) lingo; however, it is a grouping based on human logic and not necessarily on bitmasks. Realms often are used for tracking, traffic control, and packet path-accounting purposes; the resulting counters can be inspected via the rtacct utility. Realms are demonstrated in Example 13-4, too. Each route can be assigned to a realm either dynamically by a routing daemon or statically via the realms option of the ip route command. I am aware of a version of GateD patched by Alexey Kuznetsov that can classify prefixes into realms and can handle multiple Linux routing table IDs. For a concise discussion of realms and scope, check out the original writings of Alexey Kuznetsov, the creator of the iproute2 toolbox, at http://www.policyrouting.org/.
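As a hedged sketch of the static variant (the prefix, next hop, and table reuse the lab addressing of Example 13-4), a route can be tagged with a realm directly; traffic matched by this route is then accounted under that realm and shows up in rtacct output:

[root@callisto:~#] ip route add 192.168.7.0/24 via 192.168.14.254 table lab realms lab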

Cisco IOS Policy-Routing Example

Policy-based routing (PBR) enables you to classify traffic based on access list criteria and to assign the traffic to different service classes via an IP precedence setting. Consult the Cisco.com article "Configuring Policy-Based Routing" for further information. Example 13-5 demonstrates the use of policy route maps to achieve this goal.
Example 13-5. Cisco IOS Policy Route Map for Different Next Hops and Priority
...

!

access-list 1 permit 192.168.1.1

access-list 2 permit 192.168.2.1

!

interface ethernet 1

 ip policy route-map LAB

!

route-map LAB permit 10

 match ip address 1

 set ip precedence priority

 set ip next-hop 192.168.3.1

!

route-map LAB permit 20

 match ip address 2

 set ip precedence critical

 set ip next-hop 192.168.3.2

!
...

Traffic Shaping, Queuing, Reservation, and Scheduling

Queuing works only for packets in the outbound (egress) direction. The only viable way to improve this situation is to enable queuing on adjacent routers in both directions, for example by configuring committed access rate (CAR) rate limits on the Cisco IOS architecture. Adjacency essentially means connected via point-to-point links or Ethernet crossover links. When no other queuing regime is activated, almost all stack implementations resort to default first-in/first-out (FIFO) behavior.
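As a hedged sketch (the interface name and rate/burst values are hypothetical), such a CAR rate limit could be applied on the adjacent Cisco router in both directions:

interface FastEthernet0/0
 rate-limit input 2000000 8000 16000 conform-action transmit exceed-action drop
 rate-limit output 2000000 8000 16000 conform-action transmit exceed-action drop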
The actual tasks involved in traffic shaping and implementing QoS include reserving resources, buffering, and scheduling behind the scenes. The choice of a queuing discipline is tricky, depending on the load on a link, and requires a thorough understanding of the internal workings of the queuing mechanism.
Queuing disciplines are a classical area of academic and applied research and go beyond the scope of this book. They are essentially procedures or measures that influence the way data is sent, delayed, and queued. You will find excellent resources for further information about queuing in the "Recommended Reading" section at the end of this chapter.
Queuing disciplines essentially come in two flavors: classless queuing disciplines (no subdivision granularity; reschedule, delay, or drop on a flat scale) and classful (class-context) queuing disciplines. The most popular queuing regimes are as follows:
  • CBQ— Class-based queuing
  • RED— Random early detection
  • WFQ— Weighted fair queuing
  • PRIQ— Priority queuing
Permanently saturated links require different strategies than bursty traffic patterns. Nothing really prevents permanently overburdened queues and interface buffers from dropping datagrams/frames; proactively dealing with that problem is the art of congestion avoidance and congestion management. Queuing is an integral part of the IP stack and forwarding engine and, therefore, the responsibility of the kernel. User-space administration utilities complement the kernel implementations. Shaping serves two purposes: limiting available bandwidth and smoothing the use of virtual pipes.
Traffic conditioning is the art of dealing with incoming (ingress) traffic via a policer or shaper. A policer simply enforces a rate limit, whereas a shaper smooths the traffic flow to a specified rate by using buffers. Standard mechanisms of the Cisco IOS architecture are CAR, generic traffic shaping (GTS), and Frame Relay traffic shaping (FRTS).
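For illustration, here is a minimal GTS sketch (the serial interface is hypothetical; the rate is in bits per second, and IOS applies default burst parameters) that shapes all egress traffic to 128 kbps:

interface Serial0
 traffic-shape rate 128000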

Linux QoS

Linux provides a powerful and feature-rich subsystem for traffic control (traffic shaping, queuing disciplines, classification, prioritizing, sharing, and filter chains) of both ingress and egress traffic. You configure it via multiple routing tables (iproute2) and the tc tool.
The main application of realms is in conjunction with the tc route classifier, where they help assign packets to traffic classes for accounting, policing, and scheduling. The tc tool handles these tasks:
  • Setup of queuing disciplines (QDISC) such as CBQ, RED, and SFQ
  • Setup of parent and child classes for classful queuing
  • Flexible filtering of classful queuing disciplines
  • Combinations of all these features
You also can police inbound traffic via the ingress option of the tc utility; it is up to you to decide whether inbound policing makes sense. Examples 13-6 through 13-10 demonstrate classless QDISCs.
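Before turning to those examples, the following hedged sketch (arbitrary rate, burst, and priority values) attaches an ingress qdisc and a policer that drops all traffic arriving faster than 1 Mbps:

[root@callisto:~#] tc qdisc add dev eth0 handle ffff: ingress

[root@callisto:~#] tc filter add dev eth0 parent ffff: protocol ip prio 50 u32 match ip src 0.0.0.0/0 police rate 1mbit burst 100k drop flowid :1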
Note that Example 13-6 applies a simple token-bucket filter (TBF) to interface eth0, with parameters that influence shaping and allow short bursts while reacting with delays and drops to lasting overload conditions. In the current implementation, tokens correspond to bytes, not packets. A similar effect is achieved via a shaper device attached to eth0 in Example 13-7.
NOTE
For more details on TBF and an in-depth discussion of classful and classless queuing disciplines, see the "Linux Advanced Routing & Traffic Control HOWTO," especially for generic RED, weighted RED, and weighted round-robin (WRR).

Example 13-6. Interface Shaping with a TBF
[root@callisto:~#] tc qdisc add dev eth0 root tbf rate 220kbit latency 50ms burst 1540



[root@callisto:~#] tc -d qdisc

qdisc tbf 8001: dev eth0 rate 220Kbit burst 1539b/8 mpu 0b lat 61.0ms



[root@callisto:~#] tc -s qdisc

qdisc tbf 8001: dev eth0 rate 220Kbit burst 1539b lat 61.0ms

 Sent 425 bytes 5 pkts (dropped 0, overlimits 0)


Example 13-7. Alternative Interface Shaping with the Shaper Device
[root@callisto:~#] insmod shaper

Using /lib/modules/2.4.21/kernel/drivers/net/shaper.o



[root@callisto:~#] shapecfg -?

shapecfg attach <shaper-device> <real-device>

shapecfg speed <shaper-device> <speed-in-bps>



[root@callisto:~#] shapecfg attach shaper0 eth0



[root@callisto:~#] shapecfg speed shaper0 2000000



[root@callisto:~#] ifconfig shaper0 192.168.80.1 netmask 255.255.255.0 up



[root@callisto:~#] ifconfig -a

eth0      Link encap:Ethernet  HWaddr 00:10:5A:D7:93:60

          inet addr:192.168.14.1  Bcast:192.168.14.255  Mask:255.255.255.0

          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

          RX packets:0 errors:0 dropped:0 overruns:0 frame:0

          TX packets:476 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:100

          RX bytes:0 (0.0 b)  TX bytes:53487 (52.2 Kb)

          Interrupt:5 Base address:0xd800



eth1      Link encap:Ethernet  HWaddr 52:54:05:E3:51:87

          inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0

          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

          RX packets:19895 errors:0 dropped:0 overruns:0 frame:0

          TX packets:14777 errors:0 dropped:0 overruns:0 carrier:0

          collisions:43 txqueuelen:100

          RX bytes:5879639 (5.6 Mb)  TX bytes:1302730 (1.2 Mb)

          Interrupt:9 Base address:0xd400



eth1:1    Link encap:Ethernet  HWaddr 52:54:05:E3:51:87

          inet addr:192.168.45.253  Bcast:192.168.45.255  Mask:255.255.255.0

          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

          Interrupt:9 Base address:0xd400



lo        Link encap:Local Loopback

          inet addr:127.0.0.1  Mask:255.0.0.0

          UP LOOPBACK RUNNING  MTU:16436  Metric:1

          RX packets:72 errors:0 dropped:0 overruns:0 frame:0

          TX packets:72 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:0

          RX bytes:5416 (5.2 Kb)  TX bytes:5416 (5.2 Kb)



shaper0   Link encap:Ethernet  HWaddr 00:00:00:00:00:00

          inet addr:192.168.80.1  Mask:255.255.255.0

          UP RUNNING  MTU:1500  Metric:1

          RX packets:0 errors:0 dropped:0 overruns:0 frame:0

          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:10

          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)


Stochastic fair queuing (SFQ), as shown in Example 13-8, is an "almost fair" queuing mechanism with a reduced calculation burden. On saturated links, it helps distribute utilization fairly among sessions.
Example 13-8. Stochastic Fair Queuing
[root@callisto:~#] tc qdisc add dev eth0 root sfq perturb 10 quantum 2



[root@callisto:~#] tc -s -d qdisc list

qdisc sfq 8003: dev eth0 quantum 2b limit 128p flows 128/1024 perturb 10sec

 Sent 0 bytes 0 pkts (dropped 0, overlimits 0)


Example 13-9. Packet-Limited FIFO (pfifo)
[root@callisto:~#] tc qdisc add dev eth0 root pfifo limit 200k



[root@callisto:~#] tc -s -d qdisc list

qdisc pfifo 8004: dev eth0 limit 204800p

 Sent 0 bytes 0 pkts (dropped 0, overlimits 0)


Example 13-10. Random Early Detect/Discard with Explicit Congestion Notification
[root@callisto:~#] tc qdisc add dev eth0 root red limit 100 min 80 max 90 avpkt 10 burst 10 probability 1 bandwidth 200 ecn



[root@callisto:~#] tc -s -d qdisc list

qdisc red 8006: dev eth0 limit 100b min 80b max 90b ecn ewma 2 Plog 4 Scell_log 17

 Sent 0 bytes 0 pkts (dropped 0, overlimits 0)

  marked 0 early 0 pdrop 0 other 0


In contrast to the previous examples, Example 13-11 offers a variant of classful queuing (priority queuing) in combination with filter chains. Class-based queuing (CBQ) is a huge field that is covered exhaustively in the HOWTO; a minimal CBQ sketch follows Example 13-11.
Example 13-11. Priority Queuing (PRIQ)
[root@callisto:~#] tc qdisc add dev eth0 root handle 1: prio



[root@callisto:~#] tc -s -d qdisc

qdisc prio 1: dev eth0 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1

 Sent 1097 bytes 5 pkts (dropped 0, overlimits 0)



[root@callisto:~#] tc qdisc add dev eth0 parent 1:1 handle 10: sfq



[root@callisto:~#] tc qdisc add dev eth0 parent 1:2 handle 20: tbf rate 20kbit buffer 1600 limit 3000



[root@callisto:~#] tc qdisc add dev eth0 parent 1:3 handle 30: sfq



[root@callisto:~#] tc -s -d qdisc

qdisc sfq 30: dev eth0 quantum 1514b limit 128p flows 128/1024

 Sent 0 bytes 0 pkts (dropped 0, overlimits 0)

 qdisc tbf 20: dev eth0 rate 20Kbit burst 1599b/8 mpu 0b lat 667.6ms

 Sent 85 bytes 1 pkts (dropped 0, overlimits 0)



 qdisc sfq 10: dev eth0 quantum 1514b limit 128p flows 128/1024

 Sent 0 bytes 0 pkts (dropped 0, overlimits 0)



 qdisc prio 1: dev eth0 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1

 Sent 1182 bytes 6 pkts (dropped 0, overlimits 0)



[root@callisto:~#] tc -s -d qdisc list dev eth0

qdisc sfq 30: quantum 1514b limit 128p flows 128/1024

 Sent 0 bytes 0 pkts (dropped 0, overlimits 0)



 qdisc tbf 20: rate 20Kbit burst 1599b/8 mpu 0b lat 667.6ms

 Sent 85 bytes 1 pkts (dropped 0, overlimits 0)



 qdisc sfq 10: quantum 1514b limit 128p flows 128/1024

 Sent 0 bytes 0 pkts (dropped 0, overlimits 0)



 qdisc prio 1: bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1

 Sent 1182 bytes 6 pkts (dropped 0, overlimits 0)



[root@callisto:~#] tc filter add dev eth0 protocol ip parent 1: prio 1 u32 match ip dport 22 0xffff flowid 1:1



[root@callisto:~#] tc filter add dev eth0 protocol ip parent 1: prio 1 u32 match ip sport 80 0xffff flowid 1:1



[root@callisto:~#] tc -s -d filter list dev eth0

filter parent 1: protocol ip pref 1 u32

filter parent 1: protocol ip pref 1 u32 fh 800: ht divisor 1

filter parent 1: protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0  flowid 1:1
  match 00000016/0000ffff at 20

filter parent 1: protocol ip pref 1 u32 fh 800::801 order 2049 key ht 800 bkt 0  flowid 1:1
  match 00500000/ffff0000 at 20
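Although CBQ is not covered here in depth, the following hedged sketch (arbitrary bandwidth and rate values) shows the basic CBQ pattern of a root qdisc plus one bounded class; filters then steer traffic into class 1:1 exactly as demonstrated previously:

[root@callisto:~#] tc qdisc add dev eth0 root handle 1: cbq avpkt 1000 bandwidth 10mbit

[root@callisto:~#] tc class add dev eth0 parent 1: classid 1:1 cbq rate 2mbit allot 1500 prio 5 bounded isolated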

Layer 3 QoS: IP ToS, Precedence, CoS, IntServ, and DiffServ Codepoints

QoS definitions vary by service and approach chosen. For data communication networks, typical QoS characteristics and metrics include bandwidth, delay (latency), delay variation (jitter), and reliability, as follows:
  • Bandwidth— Peak data rate (PDR), sustained data rate (SDR), minimum data rate (MDR).
  • Delay/latency— End-to-end or round-trip delay, delay variation (jitter), node-processing delay.
  • Reliability— Availability (as percent of uptime), mean time between failures/mean time to repair (MTBF/MTTR), errors, and packet loss.
The IP header contains a Type of Service (ToS) field (see Example 13-12). Applications can set the three precedence bits of this ToS field at the network interface card (NIC) level according to their requirements.
Example 13-12. IPv4 Header with ToS Field
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version|  IHL  |Type of Service|          Total Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|      Fragment Offset    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Time to Live |    Protocol   |         Header Checksum       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Source Address                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Destination Address                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
In the context of IP QoS considerations, a 3-bit field in the ToS byte of the IP header is referred to as precedence (see Example 13-13). Using IP precedence, a network administrator can assign values from 0 (the default) to 7 to classify and prioritize types of traffic.
Example 13-13. ToS and Precedence
   0     1     2     3     4     5     6     7
+-----+-----+-----+-----+-----+-----+-----+-----+
|                 |     |     |     |     |     |
|   PRECEDENCE    |  D  |  T  |  R  |  0  |  0  |
|                 |     |     |     |     |     |
+-----+-----+-----+-----+-----+-----+-----+-----+

D = minimize delay, T = maximize throughput, R = maximize reliability
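Hosts can set these bits themselves. As a hedged illustration (the -Q option is specific to iputils ping, and the target address reuses the lab network), a probe can be sent with IP precedence 5 (critical, ToS byte 0xa0):

[root@callisto:~#] ping -Q 0xa0 192.168.14.254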
Many applications and routers support IP precedence. The ToS and differentiated services (DiffServ) approaches tag the traffic itself, which therefore carries in-band QoS markings. An out-of-band approach is the Resource Reservation Protocol (RSVP). The integrated services (IntServ) approach provides end-to-end QoS in IP networks and relies on per-flow state information and on RSVP as a signaling protocol at every involved hop. (IntServ is considered to have some weaknesses, notably scalability.)
DiffServ takes a simpler approach, with less signaling overhead and without requiring QoS-aware intermediate nodes along the entire path. Packets are classified and marked to receive a particular per-hop forwarding behavior on nodes along their path (RFC 2475). The DiffServ (DS) field supersedes the now-deprecated IPv4 ToS field; in the IPv6 context, it is "rejuvenated" as the Traffic Class octet (see Example 13-14).
NOTE
For DiffServ internals, see RFC 2474, "Definition of Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers."

Example 13-14. DiffServ Codepoints
The DS field structure is presented below (RFC 2474):

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
|         DSCP          |  CU   |
+---+---+---+---+---+---+---+---+

DSCP: Differentiated services codepoint
CU:   Currently unused (reserved)
Note that when you are dealing with DiffServ, two expressions are used frequently: PHB (per-hop behavior) and DSCP (DiffServ codepoint). In current architectures, IP precedence values are mapped into DSCPs; because the precedence bits occupy the upper three bits of the DSCP, precedence 5, for example, corresponds to the class selector codepoint CS5 (binary 101000).

802.1P/Q Tagging/Priority—QoS at the Data-Link/MAC Sublayer

802.1P provides for eight traffic classes drawn from the priority field in 802.1Q VLAN tags. The IEEE 802.1P standard describes important methods for providing QoS at the MAC level and defines traffic-class expediting (3 bits) and dynamic multicast filtering, which ensures that such traffic does not traverse the boundaries of Layer 2-switched networks.
NOTE
802.1P was incorporated into the 802.1D standard; 802.1Q is a separate standard whose VLAN tag carries the 802.1P priority bits.

Most vendors support 802.1P/Q in their Layer 2/3 equipment and modern NICs. This means that QoS tagging is pushed out to the network edge, down to the NIC level. However, privileged treatment of these frames is still best effort in Layer 2-switched networks and does not involve reservation setup. The 3 priority bits map easily to the Layer 3 IP precedence bits or to a subset of DSCPs, which yields coherent tagging that is easy to implement. The remaining question, for which no uniform approach exists, is how to implement queuing for these priority flows at Layer 2 and Layer 3.
There is no 802.1P without 802.1Q VLAN tagging. The VLAN tag carries the VLAN ID (12 bits) and a priority field (3 bits). The priority field was never given semantics in the VLAN standard itself, so 802.1P steps in and brings it to life. The tag is a 32-bit header inserted after a frame's normal destination and source address fields. Switches, routers, servers, and even desktop systems can set these priority bits.
802.1Q priority is supported only rudimentarily on UNIX. The Linux vconfig utility can set these bits (see Example 13-15); whether this works depends on the 802.1Q VLAN implementation of the OS.
Example 13-15. 802.1Q Priority Setting on Linux
[root@callisto:~#] vconfig add eth0 1

[root@callisto:~#] vconfig set_egress_map eth0.1 8

[root@callisto:~#] ifconfig -a

...

eth0.1    Link encap:Ethernet  HWaddr 00:10:5A:D7:93:60

          BROADCAST MULTICAST  MTU:1500  Metric:1

          RX packets:0 errors:0 dropped:0 overruns:0 frame:0

          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:0

          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

...

[root@callisto:~#] cat /proc/net/vlan/eth0.1

eth0.1  VID: 1   REORDER_HDR: 1  dev->priv_flags: 1

total frames received:            0

total bytes received:             0

Broadcast/Multicast Rcvd:         0

total frames transmitted:         0

total bytes transmitted:          0

total headroom inc:               0

total encap on xmit:              0

Device: eth0
INGRESS priority mappings: 0:0  1:0  2:0  3:0  4:0  5:0  6:0 7:0

EGRESSS priority Mappings: 8:0

MPLS Exp Field and MPLS Traffic Engineering

The 3-bit MPLS Exp field (see Example 13-16) of the MPLS shim header (Layer 2 label-insertion header) can support eight different service classes (CoS, or class of service); thus DiffServ edge marking can be carried over.
Example 13-16. MPLS Label Stack Entry
The label stack is represented as a sequence of "label stack entries." Each label stack entry is represented by 4 octets. The label stack entries appear after the data link layer headers, but before any network layer headers. The top of the label stack appears earliest in the packet, and the bottom appears latest. The network layer packet immediately follows the label stack entry which has the S bit set. (RFC 3032, "MPLS Label Stack Encoding")

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  Label
|                Label                  | Exp |S|      TTL      |  Stack
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  Entry

Label: Label Value, 20 bits
Exp:   Experimental Use, 3 bits
S:     Bottom of Stack, 1 bit
TTL:   Time to Live, 8 bits
This mechanism adds QoS to MPLS label-switched paths (LSPs). Integrated MPLS and DiffServ architectures are state of the art and the subject of active research and standards development. In addition, from a phenomenological point of view, CoS maps nicely onto the MPLS concept of forwarding equivalence classes (FECs), a FEC being a group of packets that are forwarded in the same generic way.
MPLS uses RSVP-TE (traffic engineering) and the Constraint-Based Routing Label Distribution Protocol (CR-LDP) for special-purpose signaling; judging from recent Internet Engineering Task Force (IETF) activities, it looks as if RSVP-TE has won the race. A lot of work is going on in the DiffServ/MPLS-TE integration area, too. This appears to be the only viable approach to the scalability problem that ISPs and carriers face when dealing with flows and service classes.
NOTE
For further information, consult the "Quality of Service" white paper at Cisco.com.

You can find more information about UNIX MPLS activities at the following website:
