Posted on 2004/7/16 17:17:00
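Berkeley DB is an open source embedded database management system that provides high-performance data management services to applications. Programmers need only call a few simple APIs to access and manage data. Unlike common database management systems such as MySQL and Oracle, Berkeley DB has no concept of a database server. An application does not need to establish a network connection to a database service beforehand; instead, it saves, queries, modifies, and deletes data through the Berkeley DB library embedded in the program.
Berkeley DB provides practical APIs for many programming languages, including C, C++, Java, Perl, Tcl, Python, and PHP. All database-related operations are carried out by the Berkeley DB library. As a result, multiple processes in a system, or multiple threads within one process, can call the database access functions at the same time. Low-level data locking, transaction logging, and storage management are all implemented inside the Berkeley DB library and are completely transparent to the application. As the saying goes, "the sparrow may be small, but it has all the vital organs." The Berkeley DB library itself is only about 300KB, yet it can manage up to 256TB of data, and in many respects its performance rivals that of commercial database systems. Take concurrent data operations as an example: Berkeley DB can easily handle thousands of users accessing the same database at the same time. In addition, if you want to manage a database on a resource-constrained embedded system, Berkeley DB may well be the only right choice.
As an embedded database system, Berkeley DB has distinct advantages in many respects. First, because the application and the database management system run in the same process space, data operations avoid cumbersome inter-process communication, so communication overhead drops to a very low level. Second, Berkeley DB uses a simple function-call interface for all database operations rather than the SQL language commonly used in database systems, which avoids the overhead of parsing and processing a structured query language.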
Berkeley DB is a full-service embedded database system for use by software developers. It is distributed in source code form, and is compiled and linked directly into your application.
The sections below give a detailed description of Berkeley DB's important features. If you need additional detail, the complete documentation suite is available online at http://www.sleepycat.com/products/documentation.shtml.
Ease of Use
Berkeley DB is intended for use by software developers who need to embed reliable, high-performance data management services in their applications. It does not require mastery of database-specific query languages, like SQL. Instead, developers make function calls that operate directly on the database and the records that it manages.
Once deployed, Berkeley DB is simple to administer. Other databases require a trained database administrator to handle backups, recovery, performance tuning, and routine maintenance. Berkeley DB uses standard operating system services, and needs no special device access. Maintenance tasks such as backup and recovery can be handled by standard operating system tools. Our goal is that end users of applications that embed Berkeley DB never be aware that they are using a database.
Open source distribution
Berkeley DB is an open source product, meaning that it is freely available for download in source code form, and may be freely used without commercial license under certain conditions. For information on licensing, see our Product Licensing page.
The fact that developers get full source code for the product makes it easier to use for several reasons.
First, you are no longer dependent on an outside vendor for changes, performance tuning, or debugging of the software. Sleepycat offers a full range of support and consulting services for Berkeley DB, but the fact that you have the source code means you have more control over your product than any binary database product can give you.
Second, you can integrate Berkeley DB into your product's build environment in the most natural way for you.
Finally, wide distribution of the source code means that many thousands of software developers have reviewed it. Berkeley DB's public interfaces and internal interfaces have all been carefully examined by a huge number of engineers, and their suggestions have produced a smaller, simpler, more reliable, and easier-to-use package.
Berkeley DB is a compact system. The full package, including all access methods and recoverability and transaction support, is roughly 375K of text space on common architectures.
Choice of several, easy-to-use APIs
Built for programmers, by programmers, Berkeley DB requires no special training in database access languages. Instead, the system provides an easy-to-use function-call interface for operating on databases and the records that they store. This interface supports simple record insertion and search, but also more complicated operations, including cursors, joins, management of duplicate values, and more.
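As a rough illustration of the function-call style described above, the sketch below uses Python's standard-library `dbm.dumb` module as a stand-in embedded key/value store. It is not Berkeley DB's own API, and the file name `example_store` and the keys are hypothetical.

```python
import dbm.dumb
import os
import tempfile

# Stand-in for an embedded key/value store: Python's stdlib dbm.dumb,
# NOT Berkeley DB. The store lives in ordinary files, inside this process.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "example_store")  # hypothetical file name

db = dbm.dumb.open(path, "c")   # "c": create the store if it does not exist
db[b"emp:1001"] = b"Alice"      # insert a record with a plain call
db[b"emp:1002"] = b"Bob"
found = db[b"emp:1001"]         # keyed lookup, no query language involved
del db[b"emp:1002"]             # delete a record
remaining = sorted(db.keys())
db.close()
```

Every operation is an ordinary call in the host language; there is no server to connect to and no query string to parse.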
The C/C++ and Java APIs and full documentation for their use are included in the distributed system. Programmers working in other languages may also choose among Perl, Python, Tcl, Ruby and others. The language-specific interfaces make all the power and flexibility of Berkeley DB available in a way that is natural for the language of choice.
Because Berkeley DB can be deployed in so many different ways, Sleepycat has been careful to provide the tools that developers require, without mandating their use. A good example is Berkeley DB's support for multi-threaded operation.
The library is entirely thread-safe. As Berkeley DB itself does not mandate the use of any particular threads package, you can use the one you like best or the one most natural to your application. You can build applications that are single-threaded or multi-threaded, as your application requires.
Berkeley DB works equally well when multiple processes operate on a single database. Whether sharing is among threads in a single process, among processes on a machine, or some hybrid of the two, the database software correctly handles caching, locking, and other core services. You can concentrate on your application without worrying about database architecture.
File system integration
Once your application is deployed and running at your customer's site, ongoing maintenance of the database is a major concern. Berkeley DB has been carefully designed to minimize and, in most cases, entirely eliminate database administration (DBA) tasks.
Other database systems require use of a dedicated disk (a "raw" device) for data storage. Berkeley DB uses the native file system on all platforms. Using the native file system has several important benefits.
First, since no special hardware configuration or support is required, your application will install and operate more easily. Your customer never needs to know that a database system is running.
Second, your customer never needs to dedicate storage space to your application. Since the file system is shared by all the applications running on a system, Berkeley DB can share space with other tools. You and your customers will never need to preallocate storage to your application.
Finally, ongoing administration of the database is much simpler. Berkeley DB uses the directory and file management services of the operating system. Moving databases from one location to another, or even from one machine to another, is simpler, since it only requires copying ordinary files.
Database dump and load utilities
Since Berkeley DB stores data in the native OS file system, in many cases no special backup or recovery tools are required. Operating systems typically require that a file system be completely quiescent for backup. However, some database applications must run all the time.
Berkeley DB includes programmatic interfaces to identify files that need to be backed up. Because Berkeley DB uses the native file system, applications can simply open, read, and copy files, even while the database is active. As a result, programmers can embed support for backups and recovery directly in their applications. Again, your users need never know that a database is installed.
Power and Flexibility
Berkeley DB is a powerful, flexible data manager. The system provides the same services as more expensive database systems in a smaller, less expensive, and easier-to-use package.
Support for arbitrary data types
Most database systems are able to store and retrieve only a small set of data types. Berkeley DB can manage any data type that can be represented in a programming language. Simple scalar values or complex data structures can be used as either keys or as the values stored with each key.
Berkeley DB is able to store data in several different access methods. An application can use the storage structure and search strategy best suited to its needs. All of the access methods include default routines for operating on keys and values, so search and retrieval are easy to program. On the other hand, developers can override the defaults by providing management functions (for example, comparison or hash functions) specific to their data types. You can define your own keys, and define your own key ordering, using Berkeley DB.
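To make the idea of overriding the default key ordering concrete, here is a minimal plain-Python sketch. It does not use the Berkeley DB API; the comparison function `compare_version_keys` and the sample keys are hypothetical.

```python
from functools import cmp_to_key

def compare_version_keys(a: bytes, b: bytes) -> int:
    """Hypothetical comparison function: order keys such as b"1.10"
    numerically by component instead of byte by byte."""
    pa = [int(x) for x in a.split(b".")]
    pb = [int(x) for x in b.split(b".")]
    return (pa > pb) - (pa < pb)   # negative, zero, or positive

keys = [b"1.10", b"1.2", b"1.9"]

# Default behaviour: plain byte-wise ordering of the keys.
default_order = sorted(keys)

# Application-supplied ordering that understands the key's structure.
custom_order = sorted(keys, key=cmp_to_key(compare_version_keys))
```

With the byte-wise default, `b"1.10"` sorts before `b"1.2"`; the custom function places it last, which is the order an application storing version numbers would probably want.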
Keyed and sequential access to records
Berkeley DB supports both keyed and sequential access to records. Keyed access permits fast searches for records that match part or all of a specific key. Sequential access allows programs to open a database and iterate over all its records, without regard to keys.
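The three access patterns mentioned above (exact key, partial key, and sequential scan) can be sketched over a toy sorted store. This is plain Python for illustration, not Berkeley DB code; the helper names `get`, `prefix_scan`, and `scan_all` are assumptions.

```python
import bisect

# Toy ordered store: a sorted list of (key, value) pairs.
records = sorted({
    b"apple": b"fruit",
    b"apricot": b"fruit",
    b"beet": b"vegetable",
}.items())
keys = [k for k, _ in records]

def get(key):
    """Keyed access: binary search for an exact key."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return records[i][1]
    return None

def prefix_scan(prefix):
    """Keyed access on part of a key: every record whose key starts with prefix."""
    i = bisect.bisect_left(keys, prefix)
    out = []
    while i < len(keys) and keys[i].startswith(prefix):
        out.append(records[i])
        i += 1
    return out

def scan_all():
    """Sequential access: visit every record without regard to keys."""
    return list(records)
```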
Store into application or allocated memory
Performance is the critical variable among embedded database systems. One important way that developers can control performance is by deciding whether to preallocate memory for operating on records, or to allow Berkeley DB to allocate memory for them. Function calls to fetch records, for example, allow programmers to pass in a buffer for the returned value, or to rely on Berkeley DB to allocate the required space.
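The trade-off between a caller-supplied buffer and library-allocated memory can be shown with Python's own I/O objects. This is only an analogy under the assumption that an `io.BytesIO` stands in for a stored record; it is not how Berkeley DB itself is called.

```python
import io

record = io.BytesIO(b"hello, berkeley")   # stand-in for a stored record

# Option 1: the caller supplies a preallocated buffer that can be reused
# across many fetches, avoiding a new allocation each time.
buf = bytearray(32)
n = record.readinto(buf)       # fills buf in place, returns the byte count
from_buffer = bytes(buf[:n])

# Option 2: the callee allocates and returns a fresh object for each fetch.
record.seek(0)
allocated = record.read()
```

The first form suits tight loops where allocation cost matters; the second is simpler when record sizes are unpredictable.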
Partial-record data storage and retrieval
Berkeley DB is able to manage long records. Since the time required to fetch a record is proportional to its size, the system includes tools for operating on partial records. If only a few bytes of a multi-megabyte record are required, the application can request partial record retrieval.
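A minimal sketch of the partial-retrieval idea, assuming for illustration that a long record is stored in its own file: the caller names an offset and a length and only those bytes are read. The function `read_partial` is hypothetical and is not a Berkeley DB call.

```python
import os
import tempfile

def read_partial(path, offset, length):
    """Fetch only `length` bytes starting at `offset`, not the whole record."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

# A hypothetical 1 MiB record written to a temporary file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"A" * 1_000_000 + b"TAIL" + b"B" * 48_572)

piece = read_partial(path, 1_000_000, 4)   # four bytes out of 1,048,576
os.remove(path)
```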
Support for cursors
Cursors are a database abstraction. They allow a program to iterate over multiple rows in the database easily. Berkeley DB supports cursors for ordinary searches, and for operating on sets of duplicate keys in the database.
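The cursor abstraction, including stepping through a set of duplicate keys, might look like the following toy class. The names `Cursor`, `seek`, `next`, and `next_dup` are illustrative assumptions rather than Berkeley DB's interface.

```python
import bisect

class Cursor:
    """Toy cursor over a sorted list of (key, value) pairs; duplicates allowed."""

    def __init__(self, rows):
        self.rows = sorted(rows)
        self.keys = [k for k, _ in self.rows]
        self.pos = -1          # -1 means "not positioned yet"

    def seek(self, key):
        """Position on the first row with this key; return it, or None."""
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.rows) and self.keys[i] == key:
            self.pos = i
            return self.rows[i]
        return None

    def next(self):
        """Advance to the next row in key order; return it, or None at the end."""
        if self.pos + 1 < len(self.rows):
            self.pos += 1
            return self.rows[self.pos]
        return None

    def next_dup(self):
        """Advance only if the next row has the same key as the current row."""
        nxt = self.pos + 1
        if self.pos >= 0 and nxt < len(self.rows) and self.keys[nxt] == self.keys[self.pos]:
            self.pos = nxt
            return self.rows[nxt]
        return None

rows = [("fruit", "pear"), ("fruit", "apple"), ("veg", "beet")]
cur = Cursor(rows)
dups = [cur.seek("fruit")[1]]
while (row := cur.next_dup()) is not None:
    dups.append(row[1])
```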
Support for logical joins
In database terminology, a join is a combination of related data from two or more databases. For example, one database may store information on employees by employee ID, and another may store information on departments by department ID. If each employee is assigned a department ID, then Berkeley DB can join the two databases, and fetch department-specific information via the employee database.
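The employee and department example above can be sketched with two dictionaries standing in for the two databases. All names and field values here are hypothetical, and the code shows only the logical join, not Berkeley DB's join interface.

```python
# Two toy "databases", each keyed by its own ID.
employees = {
    1001: {"name": "Alice", "dept_id": 10},
    1002: {"name": "Bob", "dept_id": 20},
}
departments = {
    10: {"name": "Engineering"},
    20: {"name": "Sales"},
}

def department_of(employee_id):
    """Follow the employee's dept_id into the departments database."""
    emp = employees[employee_id]
    return departments[emp["dept_id"]]

# Join: employee name paired with department name, keyed by employee ID.
joined = {
    eid: (emp["name"], department_of(eid)["name"])
    for eid, emp in employees.items()
}
```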
Many applications need to look up a single record by more than one key at different times. Berkeley DB provides a mechanism called secondary indices to make this easy. An application can declare that a set of tables are related, with one storing the primary record and the rest providing fast lookup by alternate keys. Berkeley DB will update all the tables automatically whenever a new record is added to the primary table.
When the application wants to search for the record by one of the alternate keys, it simply searches the secondary index, and asks Berkeley DB to return the related record from the primary table.
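A minimal sketch of the secondary-index mechanism, assuming a toy in-memory store: each index is declared with a function that extracts the alternate key, and every write to the primary table updates the indices automatically. The class `IndexedStore` and its methods are hypothetical, not Berkeley DB's API.

```python
class IndexedStore:
    """Toy primary table whose secondary indices are updated on every put."""

    def __init__(self):
        self.primary = {}      # primary key -> record
        self.extractors = {}   # index name -> function(record) -> alternate key
        self.indices = {}      # index name -> {alternate key: set of primary keys}

    def add_index(self, name, extractor):
        self.extractors[name] = extractor
        self.indices[name] = {}
        for pkey, record in self.primary.items():
            self.indices[name].setdefault(extractor(record), set()).add(pkey)

    def put(self, pkey, record):
        old = self.primary.get(pkey)
        for name, extract in self.extractors.items():
            if old is not None:
                # Remove the stale index entry before adding the new one.
                self.indices[name][extract(old)].discard(pkey)
            self.indices[name].setdefault(extract(record), set()).add(pkey)
        self.primary[pkey] = record

    def get_by(self, name, alt_key):
        """Search a secondary index, then return the related primary records."""
        pkeys = sorted(self.indices[name].get(alt_key, ()))
        return [self.primary[p] for p in pkeys]

store = IndexedStore()
store.add_index("by_dept", lambda rec: rec["dept"])
store.put(1001, {"name": "Alice", "dept": "Engineering"})
store.put(1002, {"name": "Bob", "dept": "Sales"})
store.put(1002, {"name": "Bob", "dept": "Engineering"})  # update moves the index entry
engineers = [rec["name"] for rec in store.get_by("by_dept", "Engineering")]
```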
Memory-mapped, read-only databases
Many of the operating systems on which Berkeley DB runs support memory-mapped operations on files. For applications that require read-only access to an existing database, using memory-mapped databases provides outstanding performance.
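The operating-system facility referred to above can be tried directly with Python's standard `mmap` module. The sketch maps a small file read-only and searches it in place; the file contents are hypothetical and the code does not involve Berkeley DB.

```python
import mmap
import os
import tempfile

# Write a small "database" file to map.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"key1=alpha\nkey2=beta\n")

with open(path, "rb") as f:
    # Map the whole file read-only; pages are loaded on demand by the OS.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
        start = view.find(b"key2=") + len(b"key2=")
        end = view.find(b"\n", start)
        value = bytes(view[start:end])
        try:
            view[0:1] = b"X"       # writes are rejected on a read-only map
            writable = True
        except TypeError:
            writable = False
os.remove(path)
```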
As computer system memory grows, more applications can run entirely out of main memory, rather than off of disk. Berkeley DB includes special support for main-memory databases. Using this support, applications can get fast access to the data that they need.
Architecture-independent databases
Applications today must run on a variety of hardware platforms. Even during the life of a single application at a customer's site, demand for services may change, forcing the application to move to new, faster hardware.
To simplify migration across hardware platforms, Berkeley DB can support the same database from either big-endian or little-endian systems. This allows end users to copy databases from one hardware platform to another.
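The byte-order problem is easy to see with a single 32-bit field. The sketch below assumes, for illustration, that the file records which byte order it was written in so that a reader on either kind of machine can decode it; the helper `read_u32` is hypothetical.

```python
import struct

page_number = 0x01020304

little = struct.pack("<I", page_number)   # bytes as a little-endian host writes them
big = struct.pack(">I", page_number)      # bytes as a big-endian host writes them

def read_u32(raw, stored_little_endian):
    """Decode a 32-bit field given the byte order recorded for the file."""
    fmt = "<I" if stored_little_endian else ">I"
    return struct.unpack(fmt, raw)[0]
```

The same value has two different on-disk layouts, so a portable reader has to know the writer's byte order and swap when it differs from its own.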
Large storage devices and wide-area, high-speed networking demand that applications manage more data for more users than ever before. Applications built today will see exponential growth in disk and memory sizes over their lifetimes. As a result, developers need to plan for scalability up front.
Berkeley DB was designed to scale gracefully from low-volume, single-user data management to high-concurrency management of enormous databases.
Databases up to 256 terabytes
Berkeley DB uses 48 bits to address individual bytes in a database. This means that the largest theoretical Berkeley DB database is 2^48 bytes, or 256 terabytes, in size. Berkeley DB is in regular production use today managing databases that are hundreds of gigabytes in size.
Keys and values up to 4 gigabytes
New applications, including multimedia storage and playback systems, must manage individual data values that are large. Berkeley DB is able to store single keys and values as large as 2^32 bytes, or four gigabytes, in size.
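The two limits follow directly from the address widths: 48-bit byte addresses for a database and 32-bit lengths for a single key or value. A quick check, using binary units:

```python
TIB = 2 ** 40    # one terabyte in binary units
GIB = 2 ** 30    # one gigabyte in binary units

max_db_bytes = 2 ** 48      # 48-bit byte addresses within a database
max_item_bytes = 2 ** 32    # 32-bit length for a single key or value

max_db_tib = max_db_bytes // TIB        # 2^48 / 2^40 = 2^8
max_item_gib = max_item_bytes // GIB    # 2^32 / 2^30 = 2^2
```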
Support for multiple readers
Berkeley DB applications support concurrent access to data by multiple readers, from a single process or from multiple processes. The system uses shared memory for caching, so that all users can share the work of fetching data from disk.
Read-only applications can declare the fact that they will do no database updates, reducing overhead and improving performance.
Support for multiple readers and writers
Most database applications require simultaneous access by many users, some of whom need to update records, and others who need only to view them. Berkeley DB includes support for concurrent access by readers and writers to a single database.
Users may access the database from a single process or from multiple processes. Caches are shared among all users, and Berkeley DB uses native O/S locking support on all platforms to guarantee that readers and writers are able to work without interfering with one another.
Berkeley DB offers both coarse-grained and fine-grained locking, allowing developers to choose the degree of concurrency that their application provides and the overhead that it incurs.
Fine-grained page locking allows many readers and writers to be active in the database at the same time. For high-concurrency workloads, this dramatically improves throughput. Coarse-grained, database-wide locking allows many readers to access the database at the same time, but guarantees that writers have exclusive access to it for any updates. This reduces locking overhead, but continues to provide great throughput for low-concurrency or read-mostly workloads.
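The difference in granularity can be sketched with a toy lock table. This is a simplification that uses exclusive locks only and ignores shared read locks; the class `LockTable` and the mapping of record numbers to pages are assumptions for illustration.

```python
import threading

class LockTable:
    """Toy lock table: one lock for the whole database, or one per page."""

    def __init__(self, num_pages, fine_grained):
        self.fine_grained = fine_grained
        self.num_pages = num_pages
        self.db_lock = threading.Lock()
        self.page_locks = [threading.Lock() for _ in range(num_pages)]

    def lock_for(self, record_number):
        """Return the lock that guards this (integer) record number."""
        if self.fine_grained:
            return self.page_locks[record_number % self.num_pages]
        return self.db_lock

fine = LockTable(num_pages=4, fine_grained=True)
coarse = LockTable(num_pages=4, fine_grained=False)

# Page locking: a writer on another page can proceed at the same time.
with fine.lock_for(1):
    other_page_free = fine.lock_for(2).acquire(blocking=False)
    if other_page_free:
        fine.lock_for(2).release()

# Database-wide locking: a second writer would have to wait.
with coarse.lock_for(1):
    second_writer_free = coarse.lock_for(2).acquire(blocking=False)
```

Fine-grained locking buys concurrency at the cost of managing more locks; the coarse scheme keeps a single lock and serializes writers.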
Before- and after-image logging
Berkeley DB includes support for write-ahead logging, a database management technique that provides the ability to make many changes to the database at the same logical instant, while preserving the ability to back out erroneous changes later. This logging facility simplifies transaction commit and abort, and makes it possible to recover from catastrophic failures, including application or system crashes.
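A minimal sketch of logging with before- and after-images, assuming a toy in-memory store: each update is appended to the log before the data is changed, and an abort walks the log backwards restoring before-images. The class `LoggedStore` is hypothetical and, for simplicity, treats a before-image of `None` as "key did not exist".

```python
class LoggedStore:
    """Toy store that logs (before, after) images ahead of each update."""

    def __init__(self):
        self.data = {}
        self.log = []

    def put(self, txn, key, value):
        before = self.data.get(key)          # None means the key was absent
        self.log.append(("update", txn, key, before, value))  # log first
        self.data[key] = value                                # then apply

    def commit(self, txn):
        self.log.append(("commit", txn))

    def abort(self, txn):
        # Undo this transaction's updates, newest first, using before-images.
        for rec in reversed(self.log):
            if rec[0] == "update" and rec[1] == txn:
                _, _, key, before, _ = rec
                if before is None:
                    self.data.pop(key, None)
                else:
                    self.data[key] = before
        self.log.append(("abort", txn))

store = LoggedStore()
store.put("t1", "a", "1")
store.commit("t1")
store.put("t2", "a", "2")
store.put("t2", "b", "3")
store.abort("t2")            # backs out both of t2's changes
```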
Berkeley DB supports group commit, a strategy for improving the performance of applications with a very high degree of write concurrency. Under group commit, if multiple transactions complete at close to the same time, Berkeley DB will automatically combine the operations in a single synchronous file system call. This lets multiple transactions take advantage of a single interaction with the operating system. Group commit can dramatically reduce the time spent committing any single transaction.
Group commit works automatically in Berkeley DB. The software developer does not need to take any special steps to turn it on in situations where it would help, and there is no overhead incurred in low-concurrency systems where group commit would not make a difference.
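The effect of group commit can be sketched with a toy log in which a counter stands in for synchronous file system calls. The class `GroupCommitLog` is an illustrative assumption, not Berkeley DB's implementation.

```python
class GroupCommitLog:
    """Toy log: commits that arrive close together share one flush."""

    def __init__(self):
        self.pending = []      # commit records waiting to be flushed
        self.durable = []      # commit records already flushed
        self.flush_calls = 0   # stands in for synchronous file system calls

    def request_commit(self, txn):
        self.pending.append(txn)

    def flush(self):
        """One simulated sync makes every pending commit durable at once."""
        if not self.pending:
            return
        self.flush_calls += 1
        self.durable.extend(self.pending)
        self.pending.clear()

log = GroupCommitLog()
for txn in ("t1", "t2", "t3"):   # three transactions finish at about the same time
    log.request_commit(txn)
log.flush()
```

Three commits cost one flush instead of three, which is where the per-transaction saving comes from.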
Applications that take advantage of Berkeley DB's replication service can support extremely high query loads and can scale up easily by adding new servers as necessary. With replication, updates go to a single master server, and the master distributes them to as many replicas as desired. Each of the replicas can answer read queries during normal processing. The ability to direct readers to any of a large number of replicas makes it simple to balance the query load for high-concurrency applications.
Reliability and Availability
Database systems must provide reliable, on-demand access to the information that they manage. Berkeley DB meets these requirements through a combination of careful design and solid implementation.
Berkeley DB is a small-footprint data manager that includes no extraneous features. By leaving out features that programmers do not need, Berkeley DB is smaller and simpler than products from other vendors. Simplicity improves performance, because code paths are shorter, and reliability, because code review and testing are more likely to find any problems that exist.
Recovery from system or application failure
Berkeley DB uses write-ahead logging and checkpointing to log changes. Applications that need disaster recovery can use the logging system. After a failure, a database can be restored to its last transaction-consistent state by restoring the database file and rolling the log forward. The log and database can be located on different physical disks, to improve performance and to protect against disk failures.
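Restoring a backup and rolling the log forward can be sketched as follows. The log record layout and the function `recover` are hypothetical; the point is only that updates from committed transactions are reapplied and uncommitted work is ignored.

```python
def recover(backup, log):
    """Rebuild state from a backup copy plus the log, applying only
    updates that belong to committed transactions."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    state = dict(backup)
    for rec in log:
        if rec[0] == "update" and rec[1] in committed:
            _, _, key, value = rec
            state[key] = value
    return state

backup = {"a": "1"}
log = [
    ("update", "t1", "a", "2"),
    ("update", "t2", "b", "9"),
    ("commit", "t1"),
    # t2 never committed: the failure happened here
]
restored = recover(backup, log)
```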
Berkeley DB's replication service allows applications to run on a collection of cooperating server machines. In the event that any of the servers goes down, the others can assume its share of the workload.
Replication requires that all updates go to a single master server, which distributes them to as many replicas as the application needs. Each of the replicas can handle read queries during normal processing. In the event that the master system goes down for any reason, one of the replicas is chosen to take its place. From that point on, updates go to the new master. The application can continue to run with no interruption of service to the end user.
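The single-master scheme and the takeover after a master failure can be sketched with a toy cluster. Real elections are more involved; here the first surviving replica is simply promoted, and the classes `Node` and `Cluster` are illustrative assumptions.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}

class Cluster:
    """Toy single-master replication: writes go to the master, reads to replicas."""

    def __init__(self, names):
        self.nodes = [Node(n) for n in names]
        self.master = self.nodes[0]

    def write(self, key, value):
        self.master.data[key] = value
        for node in self.nodes:
            if node is not self.master:
                node.data[key] = value      # the master distributes the update

    def read(self, key, replica_index=0):
        replicas = [n for n in self.nodes if n is not self.master] or [self.master]
        return replicas[replica_index % len(replicas)].data.get(key)

    def fail_master(self):
        """Drop the master and promote the first surviving replica."""
        self.nodes.remove(self.master)
        self.master = self.nodes[0]

cluster = Cluster(["n1", "n2", "n3"])
cluster.write("k", "v1")
cluster.fail_master()            # n1 goes down; n2 takes over
cluster.write("k", "v2")         # updates now go to the new master
value_after_failover = cluster.read("k")
```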
Some applications must operate twenty-four hours a day, seven days a week. For those applications, Berkeley DB includes support for on-line, or "hot," backups.
Hot backups allow system administrators to back up the database and the log while users are running applications. Berkeley DB allows developers to open, read, and copy database files, even while the database is in active use. As a result, developers can build support for on-line backups directly into their applications.