
Cache and Forward Architecture

I am currently part of a team working on the Cache and Forward (CNF) architecture, which is envisioned as a dynamic future Internet network. CNF is based on a fundamentally different approach: pure hop-by-hop transport with in-network storage and caching of content files. The proposed CNF architecture exploits recent dramatic decreases in the cost of network storage to provide unified and efficient transport services to both fixed and mobile users of content services. To learn more about CNF, please refer to the CNF home page.

I am concentrating on the Content Name Resolution Service (CNRS) protocol for CNF. A brief description of CNRS and some thoughts about its implementation are given below.

Overview

CNRS aims to provide name resolution for the network in terms of a Content ID (CID): any request for content invokes the CNRS, which resolves the CID to its attributes. The attributes corresponding to a CID consist of a variety of information pertinent to the content, such as content location, content creator, and content access rights. For popular content, the attributes may also include a list of CNF routers holding a cached copy of the content. One possible implementation of CNRS is the Handle System.
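To make the idea of resolving a CID to its attributes more concrete, here is a minimal Python sketch. The names ContentRecord and NameResolver, and the exact field names, are assumptions made for illustration; they are not part of any CNF specification. The resolver is a plain in-memory dictionary standing in for the real service.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ContentRecord:
    """Hypothetical attribute set that CNRS might return for one CID."""
    cid: str
    location: str        # original content location
    creator: str         # content creator
    access_rights: str   # content access rights
    # CNF routers known to hold a cached copy (may be empty)
    cached_at: List[str] = field(default_factory=list)


class NameResolver:
    """Toy in-memory stand-in for CNRS: maps a CID to its attributes."""

    def __init__(self) -> None:
        self._records: Dict[str, ContentRecord] = {}

    def register(self, record: ContentRecord) -> None:
        """Store (or overwrite) the attributes for record.cid."""
        self._records[record.cid] = record

    def resolve(self, cid: str) -> Optional[ContentRecord]:
        """Return the attributes for a CID, or None if it is unknown."""
        return self._records.get(cid)
```

A request for content would call resolve() with the CID and then use the returned location (or one of the cached_at routers) to fetch the content.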

Handle Systems

Introduction

The Handle System is a comprehensive system for assigning, managing, and resolving persistent identifiers for digital objects and other resources on the Internet. The Handle System includes an open set of protocols, a namespace, and an implementation of the protocols. The protocols enable a distributed computer system to store identifiers of digital resources and resolve those identifiers into the information necessary to locate and access the resources. This associated information can be changed as needed to reflect the current state of the identified resource without changing the identifier, thus allowing the name of the item to persist over changes of location and other state information.

Syntax

Within the handle namespace, every identifier consists of two parts: its prefix and a unique local name under the prefix, otherwise known as its suffix. The prefix and suffix are separated by the ASCII character "/". A handle may thus be defined as

<Handle> ::= <Prefix> "/" <Handle Local Name>

For example, handle "12345/hdl1" is defined under the Handle Prefix "12345", and its Handle Local Name is "hdl1".
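The syntax above can be illustrated with a short Python sketch that splits a handle into its prefix and local name. The function name parse_handle is hypothetical, and splitting at the first "/" is an assumption (it presumes that the prefix itself never contains a "/").

```python
from typing import Tuple


def parse_handle(handle: str) -> Tuple[str, str]:
    """Split a handle into (prefix, local name) at the first "/".

    Raises ValueError if the separator or either part is missing.
    """
    prefix, sep, local_name = handle.partition("/")
    if not sep or not prefix or not local_name:
        raise ValueError(f"not a valid handle: {handle!r}")
    return prefix, local_name


prefix, local_name = parse_handle("12345/hdl1")
print(prefix, local_name)  # prints: 12345 hdl1
```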

Architecture

The Handle System has a two-level hierarchical service model. The top level consists of HANDLE.NET services, including in particular the Global Handle Registry (GHR). The lower level consists of all other handle services, which are known as local handle services. A local handle service (LHS) is used to give persistent identifiers to web content, so that the content can be referenced and located using those permanent identifiers (with location data stored in the associated handle records) rather than using locations (URLs) as identifiers. Each LHS consists of one primary site and one or more secondary mirror sites. Each site can consist of one or more servers. If there are multiple servers, all the identifiers and all resolution requests are distributed evenly across the servers by means of hashing.

Figure 1: Handle Architecture
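The two-level model described above can be sketched in Python as a lookup in two steps: the GHR is asked which local handle service is responsible for a prefix, and that LHS is then asked for the handle record. The class and method names below are hypothetical, and the in-memory dictionaries are a simplification; sites, mirrors, and the actual Handle protocol messages are not modeled.

```python
from typing import Dict, Optional


class LocalHandleService:
    """Toy LHS: stores the handle records it is responsible for."""

    def __init__(self) -> None:
        self._records: Dict[str, Dict[str, str]] = {}

    def put(self, handle: str, record: Dict[str, str]) -> None:
        self._records[handle] = record

    def resolve(self, handle: str) -> Optional[Dict[str, str]]:
        return self._records.get(handle)


class GlobalHandleRegistry:
    """Toy GHR: maps each registered prefix to the LHS that serves it."""

    def __init__(self) -> None:
        self._services: Dict[str, LocalHandleService] = {}

    def register_prefix(self, prefix: str, service: LocalHandleService) -> None:
        self._services[prefix] = service

    def service_for(self, prefix: str) -> Optional[LocalHandleService]:
        return self._services.get(prefix)


def resolve(ghr: GlobalHandleRegistry, handle: str) -> Optional[Dict[str, str]]:
    """Two-step resolution: GHR finds the LHS, the LHS returns the record."""
    prefix = handle.partition("/")[0]
    service = ghr.service_for(prefix)
    if service is None:
        return None  # prefix is not registered with the GHR
    return service.resolve(handle)


ghr = GlobalHandleRegistry()
lhs = LocalHandleService()
ghr.register_prefix("12345", lhs)
lhs.put("12345/hdl1", {"URL": "http://example.org/content"})
print(resolve(ghr, "12345/hdl1"))
```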

Handle Systems Scalability

The scalability problem can be divided into storage and performance.

Storage

The Handle System has been designed at a very basic level as a distributed system; that is, it will run across as many computers as are required to provide the desired functionality. Handles are held in and resolved by handle servers and the handle servers are grouped into one or more handle sites within each handle service. There are no design limits on the total number of handle services which constitute the Handle System, there are no design limits on the number of sites which make up each service, and there are no limits on the number of servers which make up each site. Replication by site, within a handle service, does not require that each site contain the same number of servers; that is, while each site will have the same replicated set of handles, each site may allocate that set of handles across a different number of servers. Thus increased numbers of handles within a site can be accommodated by adding additional servers, either on the same or additional computers, additional sites can be added to a handle service at any time, and additional handle services can be created. Every service must be registered with the Global Handle Registry, but that handle service can also have as many sites with as many servers as needed. The result is that the number of identifiers that can be accommodated in the current Handle System is limited only by the number of computers available.

Performance

Constant performance across increasing numbers of identifiers is addressed by hashing, replication, and caching. Hashing is used in the Handle System to evenly allocate any number of identifiers across any number of servers within a site, and allows a single computation to determine on which server within a set of servers a given handle is located, regardless of the number of handles or the number of servers. Each server within a site is responsible for a subset of handles managed by that site. Given a specific identifier and knowledge of the handle service responsible for that identifier, a handle client selects a site within that handle service and performs a single computation on the handle to determine which server within the site contains the handle. The result of the computation becomes a pointer into a hash table, which is unique to each handle site and can be thought of as a map of the given site, mapping which handles belong to which servers. The computation is independent of the number of servers and handles, and it will not take a client any longer to locate and query the correct server for a handle within a handle service that contains billions of handles and hundreds of servers, than for a handle service that contains only millions of handles and only a few servers.
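The server selection step can be sketched in Python as follows. This is an illustration under stated assumptions, not the Handle System's actual algorithm: SHA-256 and the modulo step are illustrative choices, and the Site class and server names are hypothetical. The point it shows is that one hash computation picks the server regardless of how many handles exist, and that two sites of the same service may spread the same handles over different numbers of servers.

```python
import hashlib
from typing import List


def server_index(handle: str, num_servers: int) -> int:
    """Map a handle to a slot in [0, num_servers) with one hash computation."""
    digest = hashlib.sha256(handle.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_servers


class Site:
    """Toy handle site: an ordered list of servers acting as the site map."""

    def __init__(self, name: str, servers: List[str]) -> None:
        self.name = name
        self.servers = servers

    def server_for(self, handle: str) -> str:
        """Return the server within this site that holds the handle."""
        return self.servers[server_index(handle, len(self.servers))]


# Two sites replicate the same handles but use different numbers of servers.
primary = Site("primary", ["p0", "p1", "p2", "p3"])
mirror = Site("mirror", ["m0", "m1"])

handle = "12345/hdl1"
print(primary.server_for(handle), mirror.server_for(handle))
```

The cost of server_for() depends only on the length of the handle string, not on the number of handles stored or the number of servers in the site.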

Handle Systems for CNF

Possible Scenarios

Scenario 1

Figure 2: Scenario 1

Advantages
• Content cached at POs will be made known to the network, so any subsequent query for the content can be served by the PO itself.
• Information about content at a PO needs to be sent only to the primary site within the LHS, since the primary site will inform the mirror sites.

Disadvantages
• Content cached at CNF routers will not be known to the network.

Scenario 2

Figure 3: Scenario 2

Advantages
• Content cached at POs and CNF routers will be made known to the network, so any subsequent query for the content can be served by the PO or the router itself.
• Information about content at a PO or CNF router needs to be sent only to the primary site within the LHS, since the primary site will inform the mirror sites.

Disadvantages
• Messages sent to inform the CNRS add to network traffic and could cause congestion.
• If the content is purged from the router, a client request for the content will not be resolved. (A possible solution would be for the CNRS to send the client both the original content location and the CNF router location.)
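The possible solution mentioned for the purge problem, where the CNRS returns both the original content location and the CNF router location, can be sketched from the client side in Python. The function name fetch_with_fallback and the has_content callback are hypothetical; the callback stands in for whatever probe or failed request tells the client that a router no longer holds the content.

```python
from typing import Callable, List


def fetch_with_fallback(
    cached_locations: List[str],
    original_location: str,
    has_content: Callable[[str], bool],
) -> str:
    """Pick where to fetch from: a cached copy if one survives, else the origin.

    cached_locations are the CNF routers (or POs) that CNRS believes hold a
    copy; original_location is the original content location.
    """
    for location in cached_locations:
        if has_content(location):
            return location
    # Every cached copy has been purged, so fall back to the original source.
    return original_location


# Example: router-7 has purged the content, router-9 still holds it.
still_cached = {"router-9"}
choice = fetch_with_fallback(
    ["router-7", "router-9"], "origin-server", lambda loc: loc in still_cached
)
print(choice)  # prints: router-9
```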

Scenario 3

Figure 4: Scenario 3

Here, information about content cached at CNF routers is sent to the POs, and each PO sends updates to the CNRS at regular intervals in the form of batch files.

Advantages
• All the advantages of Scenario 2 apply, and traffic due to message passing is reduced since information is sent at intervals.

Disadvantages
• If the content is purged from the router, a client request for the content will not be resolved. (A possible solution would be for the CNRS to send the client both the original content location and the CNF router location.)
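The batching idea in Scenario 3 can be sketched in Python as follows. The class name BatchingPO, the tick() method, and the send_batch callback are assumptions made for illustration; in particular, the current time is passed in explicitly to keep the sketch deterministic, whereas a real PO would use a timer.

```python
from typing import Callable, List, Tuple

Update = Tuple[str, str]  # (CID, CNF router that cached it)


class BatchingPO:
    """Toy PO that collects router reports and forwards them in batches."""

    def __init__(
        self, send_batch: Callable[[List[Update]], None], interval: float
    ) -> None:
        self._send_batch = send_batch  # delivers one batch to the CNRS
        self._interval = interval      # seconds between batches
        self._pending: List[Update] = []
        self._last_flush = 0.0

    def report(self, cid: str, router: str) -> None:
        """Called when a CNF router tells the PO it has cached some content."""
        self._pending.append((cid, router))

    def tick(self, now: float) -> None:
        """Send the pending updates as one batch once the interval has elapsed."""
        if now - self._last_flush < self._interval:
            return
        self._last_flush = now
        if self._pending:
            self._send_batch(self._pending)
            self._pending = []


batches: List[List[Update]] = []
po = BatchingPO(batches.append, interval=10.0)
po.report("cid-1", "router-7")
po.report("cid-2", "router-7")
po.tick(5.0)    # too early, nothing is sent
po.tick(10.0)   # interval elapsed, both updates go out as one message
print(len(batches), batches[0])
```

Compared with Scenario 2, the CNRS receives one message per interval from each PO instead of one message per caching event, at the cost of its view of router caches lagging by up to one interval.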

References

• Handle website - http://www.handle.net/

• Handle Handbook - http://www.handle.net/tech_manual/Handle_Technical_Manual.pdf
