CSDN Blog

lyc96532

Technical Features of the Berkeley DB Embedded Database

Posted 2004/7/16 17:17:00, 2421 views

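Client/server relational database systems such as MySQL represent the mainstream of database applications today, but they do not suit every situation. Sometimes all we need is a simple database system based on disk files. This avoids installing a large database server and also simplifies the design of the database application. Berkeley DB was created with exactly this idea in mind.

Introduction to Berkeley DB

Berkeley DB is an open source embedded database management system that provides high-performance data management services to applications. Programmers only need to call a few simple APIs to access and manage data. Unlike common database management systems (such as MySQL and Oracle), Berkeley DB has no concept of a database server. An application does not need to establish a network connection to a database service first; instead, it stores, queries, modifies and deletes data through the Berkeley DB library embedded in the program.

Berkeley DB provides practical APIs for many programming languages, including C, C++, Java, Perl, Tcl, Python and PHP. All database operations are carried out by the Berkeley DB library. As a result, multiple processes in a system, or multiple threads in the same process, can call the database access functions at the same time. The underlying data locking, transaction logging and storage management are all implemented inside the Berkeley DB library and are completely transparent to the application. As the saying goes, "the sparrow may be small, but it has all its vital organs." The Berkeley DB library itself is only about 300KB, yet it can manage up to 256TB of data, and in many respects its performance can rival commercial database systems. Take concurrent operations: Berkeley DB can comfortably handle thousands of users accessing the same database at the same time. In addition, if you want to manage a database on a resource-constrained embedded system, Berkeley DB may well be the only correct choice.

As an embedded database system, Berkeley DB has distinctive advantages in several respects. First, because the application and the database management system run in the same process space, data operations avoid cumbersome inter-process communication, so the communication overhead drops to a minimum. Second, Berkeley DB uses a simple function-call interface for all database operations rather than the SQL language commonly used in database systems. This avoids the overhead of parsing and processing a structured query language.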

 

Berkeley DB is a full-service embedded database system for use by software developers. It is distributed in source code form, and is compiled and linked directly into your application.

The sections below give a detailed description of Berkeley DB's important features. If you need additional detail, the complete documentation suite is available on-line.

The documentation is at http://www.sleepycat.com/products/documentation.shtml

 

Ease of Use

Berkeley DB is intended for use by software developers who need to embed reliable, high-performance data management services in their applications. It does not require mastery of database-specific query languages, like SQL. Instead, developers make function calls that operate directly on the database and the records that it manages.

Once deployed, Berkeley DB is simple to administer. Other databases require a trained database administrator to handle backups, recovery, performance tuning, and routine maintenance. Berkeley DB uses standard operating system services, and needs no special device access. Maintenance tasks such as backup and recovery can be handled by standard operating system tools. Our goal is that end users of applications that embed Berkeley DB never be aware that they are using a database.


 

Open source distribution

 

Berkeley DB is an open source product, meaning that it is freely available for download in source code form, and may be freely used without commercial license under certain conditions. For information on licensing, see our Product Licensing page.

 

The fact that developers get full source code for the product makes it easier to use for several reasons.

 

First, you are no longer dependent on an outside vendor for changes, performance tuning, or debugging of the software. Sleepycat offers a full range of support and consulting services for Berkeley DB, but the fact that you have the source code means you have more control over your product than any binary database product can give you.

 

Second, you can integrate Berkeley DB into your product's build environment in the most natural way for you.

 

Finally, wide distribution of the source code means that many thousands of software developers have reviewed it. Berkeley DB's public interfaces and internal interfaces have all been carefully examined by a huge number of engineers, and their suggestions have produced a smaller, simpler, more reliable, and easier-to-use package.

 


 

Small footprint

 

Berkeley DB is a compact system. The full package, including all access methods and recoverability and transaction support, is roughly 375K of text space on common architectures.

 


 

Choice of several, easy-to-use APIs

 

Built for programmers, by programmers, Berkeley DB requires no special training in database access languages. Instead, the system provides an easy-to-use function-call interface for operating on databases and the records that they store. This interface supports simple record insertion and search, but also more complicated operations, including cursors, joins, management of duplicate values, and more.

 

The C/C++ and Java APIs and full documentation for their use are included in the distributed system. Programmers working in other languages may also choose among Perl, Python, Tcl, Ruby and others. The language-specific interfaces make all the power and flexibility of Berkeley DB available in a way that is natural for the language of choice.

 


 

Thread-safe library

 

Because Berkeley DB can be deployed in so many different ways, Sleepycat has been careful to provide the tools that developers require, without mandating their use. A good example is Berkeley DB's support for multi-threaded operation.

 

The library is entirely thread-safe. As Berkeley DB itself does not mandate the use of any particular threads package, you can use the one you like best or the one most natural to your application. You can build applications that are single-threaded or multi-threaded, as your application requires.

 

Berkeley DB works equally well when multiple processes operate on a single database. Whether sharing is among threads in a single process, among processes on a machine, or some hybrid of the two, the database software correctly handles caching, locking, and other core services. You can concentrate on your application without worrying about database architecture.

 


 

 

File system integration

 

Once your application is deployed and running at your customer's site, ongoing maintenance of the database is a major concern. Berkeley DB has been carefully designed to minimize and, in most cases, entirely eliminate database administration (DBA) tasks.

 

Other database systems require use of a dedicated disk (a "raw" device) for data storage. Berkeley DB uses the native file system on all platforms. Using the native file system has several important benefits.

 

First, since no special hardware configuration or support is required, your application will install and operate more easily. Your customer never needs to know that a database system is running.

 

Second, your customer never needs to dedicate storage space to your application. Since the file system is shared by all the applications running on a system, Berkeley DB can share space with other tools. You and your customers will never need to preallocate storage to your application.

 

Finally, ongoing administration of the database is much simpler. Berkeley DB uses the directory and file management services of the operating system. Moving databases from one location to another, or even from one machine to another, is simpler, since it only requires copying ordinary files.


Database dump and load utilities

 

Since Berkeley DB stores data in the native OS file system, in many cases no special backup or recovery tools are required. Operating systems typically require that a file system be completely quiescent for backup. However, some database applications must run all the time.

 

Berkeley DB includes programmatic interfaces to identify files that need to be backed up. Because Berkeley DB uses the native file system, applications can simply open, read, and copy files, even while the database is active. As a result, programmers can embed support for backups and recovery directly in their applications. Again, your users need never know that a database is installed.

 


 

Power and Flexibility

 

Berkeley DB is a powerful, flexible data manager. The system provides the same services as more expensive database systems in a smaller, less expensive, and easier-to-use package.

 


 

Support for arbitrary data types

 

Most database systems are able to store and retrieve only a small set of data types. Berkeley DB can manage any data type that can be represented in a programming language. Simple scalar values or complex data structures can be used as either keys or as the values stored with each key.

 

Berkeley DB is able to store data in several different access methods. An application can use the storage structure and search strategy best suited to its needs. All of the access methods include default routines for operating on keys and values, so search and retrieval are easy to program. On the other hand, developers can override the defaults by providing management functions (for example, comparison or hash functions) specific to their data types. You can define your own keys, and define your own key ordering, using Berkeley DB.
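To make the idea of overriding the default key ordering concrete, here is a minimal Python sketch. It is not Berkeley DB's API: `OrderedStore` and its `sort_key` parameter are hypothetical names standing in for a store that accepts a user-supplied ordering function.

```python
import bisect

class OrderedStore:
    """Toy sorted key/value store with a pluggable key-ordering function."""

    def __init__(self, sort_key=lambda k: k):
        self._sort_key = sort_key   # hypothetical stand-in for a comparison callback
        self._order = []            # derived sort keys, kept sorted
        self._items = []            # (key, value) pairs in the same order

    def put(self, key, value):
        sk = self._sort_key(key)
        i = bisect.bisect_left(self._order, sk)
        if i < len(self._order) and self._order[i] == sk:
            self._items[i] = (key, value)        # overwrite an existing key
        else:
            self._order.insert(i, sk)
            self._items.insert(i, (key, value))

    def keys(self):
        return [k for k, _ in self._items]

default = OrderedStore()               # default ordering: plain string comparison
numeric = OrderedStore(sort_key=int)   # application-defined ordering
for k in ("10", "9", "100"):
    default.put(k, None)
    numeric.put(k, None)
```

With the default ordering the string keys sort lexically; with the numeric ordering the same keys come back in numeric order.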

 


Keyed and sequential access to records

 

Berkeley DB supports both keyed and sequential access to records. Keyed access permits fast searches for records that match part or all of a specific key. Sequential access allows programs to open a database and iterate over all its records, without regard to keys.
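The two access styles can be sketched in a few lines of Python. This is an illustration over an in-memory sorted list, not Berkeley DB code; `prefix_search` and `scan` are made-up names.

```python
import bisect

# Records kept sorted by key, as an ordered access method would keep them.
records = sorted({
    "apple": 1, "apricot": 2, "banana": 3, "cherry": 4,
}.items())
keys = [k for k, _ in records]

def prefix_search(prefix):
    """Keyed access: return all records whose key starts with `prefix`."""
    i = bisect.bisect_left(keys, prefix)   # jump straight to the first candidate
    out = []
    while i < len(keys) and keys[i].startswith(prefix):
        out.append(records[i])
        i += 1
    return out

def scan():
    """Sequential access: visit every record in storage order, ignoring keys."""
    yield from records
```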

 


 

Store into application or allocated memory

 

Performance is the critical variable among embedded database systems. One important way that developers can control performance is by deciding whether to preallocate memory for operating on records, or to allow Berkeley DB to allocate memory for them. Function calls to fetch records, for example, allow programmers to pass in a buffer for the returned value, or to rely on Berkeley DB to allocate the required space.
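The trade-off between caller-supplied and library-allocated buffers can be illustrated with Python's standard I/O objects. This is only an analogy, assuming an `io.BytesIO` as a stand-in for a stored record; in the Berkeley DB C API the choice is, as far as I know, expressed through flags on the `DBT` structure such as `DB_DBT_USERMEM` and `DB_DBT_MALLOC`.

```python
import io

store = io.BytesIO(b"hello, berkeley")   # stands in for a stored record

def fetch_allocating(f):
    """Library-allocates style: a fresh bytes object is created per call."""
    f.seek(0)
    return f.read()

def fetch_into(f, buf):
    """Caller-supplies-buffer style: fill `buf`, return the byte count."""
    f.seek(0)
    return f.readinto(buf)

buf = bytearray(64)            # allocated once, reusable for every fetch
n = fetch_into(store, buf)     # number of bytes actually written into buf
```

Reusing one buffer avoids an allocation per fetch, at the cost of the caller having to size the buffer and track how much of it is valid.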


Partial-record data storage and retrieval

 

Berkeley DB is able to manage long records. Since the time required to fetch a record is proportional to its size, the system includes tools for operating on partial records. If only a few bytes of a multi-megabyte record are required, the application can request partial record retrieval.
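As a rough illustration of partial retrieval, the sketch below stores one multi-megabyte record in a temporary file and reads only its last five bytes. The helper `partial_get` is hypothetical; it shows the offset-plus-length idea rather than Berkeley DB's own interface.

```python
import os
import tempfile

RECORD_SIZE = 4 * 1024 * 1024                  # one 4 MiB record
payload = bytes(RECORD_SIZE - 5) + b"tail!"    # zeros followed by a marker

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(payload)
    path = f.name

def partial_get(path, offset, length):
    """Read only `length` bytes starting at `offset` within the record."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

tail = partial_get(path, RECORD_SIZE - 5, 5)   # touches 5 bytes, not 4 MiB
os.remove(path)
```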

 

      

 

  

     

   

Support for cursors

 

Cursors are a database abstraction. They allow a program to iterate over multiple rows in the database easily. Berkeley DB supports cursors for ordinary searches, and for operating on sets of duplicate keys in the database.
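A toy cursor makes the duplicate-key case easier to picture. The class below is an assumption-laden sketch over a sorted Python list; the method names `set` and `next_dup` are only loosely inspired by cursor positioning operations and are not Berkeley DB calls.

```python
import bisect

class Cursor:
    """Toy cursor over sorted (key, value) rows; duplicate keys are allowed."""

    def __init__(self, rows):
        self._rows = sorted(rows)
        self._pos = -1              # not yet positioned

    def set(self, key):
        """Position on the first row with this key; return it, or None."""
        i = bisect.bisect_left(self._rows, (key,))
        if i < len(self._rows) and self._rows[i][0] == key:
            self._pos = i
            return self._rows[i]
        return None

    def next_dup(self):
        """Advance only if the next row has the same key as the current one."""
        nxt = self._pos + 1
        if self._pos < 0 or nxt >= len(self._rows):
            return None
        if self._rows[nxt][0] != self._rows[self._pos][0]:
            return None
        self._pos = nxt
        return self._rows[nxt]

cur = Cursor([("fruit", "apple"), ("veg", "kale"), ("fruit", "pear")])
first = cur.set("fruit")
second = cur.next_dup()
third = cur.next_dup()      # no more "fruit" rows
```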

 


 

Support for logical joins

 

In database terminology, a join is a combination of related data from two or more databases. For example, one database may store information on employees by employee ID, and another may store information on departments by department ID. If each employee is assigned a department ID, then Berkeley DB can join the two databases, and fetch department-specific information via the employee database.
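The employee and department example can be written out as two keyed tables. This is a plain-Python sketch with invented sample data; `department_of` is a hypothetical helper showing the lookup chain a join performs.

```python
employees = {                      # keyed by employee ID
    101: {"name": "Ada", "dept_id": 7},
    102: {"name": "Linus", "dept_id": 9},
}
departments = {                    # keyed by department ID
    7: {"name": "Research"},
    9: {"name": "Kernel"},
}

def department_of(emp_id):
    """Follow the employee's department ID into the department table."""
    emp = employees[emp_id]
    return departments[emp["dept_id"]]
```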

 


 

   

Secondary indices

 

Many applications need to look up a single record by more than one key at different times. Berkeley DB provides a mechanism called secondary indices to make this easy. An application can declare that a set of tables are related, with one storing the primary record and the rest providing fast lookup by alternate keys. Berkeley DB will update all the tables automatically whenever a new record is added to the primary table.

 

When the application wants to search for the record by one of the alternate keys, it simply searches the secondary index, and asks Berkeley DB to return the related record from the primary table.
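A minimal sketch of the mechanism, assuming one primary table and one secondary index: every write to the primary table also updates the index, and a lookup by alternate key goes through the index back to the primary record. `Table`, `alt_key` and `get_by_alt` are illustrative names, not Berkeley DB identifiers.

```python
class Table:
    """Primary table plus one automatically maintained secondary index."""

    def __init__(self, alt_key):
        self.primary = {}          # primary key -> record
        self.secondary = {}        # alternate key -> set of primary keys
        self._alt_key = alt_key    # function extracting the alternate key

    def put(self, pk, record):
        old = self.primary.get(pk)
        if old is not None:        # drop the stale index entry on overwrite
            self.secondary[self._alt_key(old)].discard(pk)
        self.primary[pk] = record
        self.secondary.setdefault(self._alt_key(record), set()).add(pk)

    def get_by_alt(self, alt):
        """Search the secondary index, then fetch from the primary table."""
        return [self.primary[pk] for pk in sorted(self.secondary.get(alt, ()))]

people = Table(alt_key=lambda rec: rec["city"])
people.put(1, {"name": "Ada", "city": "London"})
people.put(2, {"name": "Alan", "city": "London"})
people.put(2, {"name": "Alan", "city": "Manchester"})   # index follows the update
```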

 

Memory-mapped, read-only databases

 

Many of the operating systems on which Berkeley DB runs support memory-mapped operations on files. For applications that require read-only access to an existing database, using memory-mapped databases provides outstanding performance.
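The operating system facility involved can be tried directly with Python's `mmap` module. The sketch below maps a small file read-only and looks up a value in it; the file layout (`key=value` lines) is an assumption made for the example and has nothing to do with Berkeley DB's on-disk format.

```python
import mmap
import os
import tempfile

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"key1=alpha\nkey2=beta\n")
    path = f.name

with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
    # Reads go straight to the mapped pages; no explicit read() calls.
    start = m.find(b"key2=") + len(b"key2=")
    value = m[start:m.find(b"\n", start)]
    try:
        m[0:1] = b"X"              # a read-only mapping rejects writes
        writable = True
    except TypeError:
        writable = False

os.remove(path)
```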

 


 

Main-memory databases

 

As computer system memory grows, more applications can run entirely out of main memory, rather than off of disk. Berkeley DB includes special support for main-memory databases. Using this support, applications can get fast access to the data that they need.

 


 

Architecture-independent databases

 

Applications today must run on a variety of hardware platforms. Even during the life of a single application at a customer's site, demand for services may change, forcing the application to move to new, faster hardware.

 

To simplify migration across hardware platforms, Berkeley DB can support the same database from either big-endian or little-endian systems. This allows end users to copy databases from one hardware platform to another.
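One common way to make a file readable on both kinds of machine is to store a known magic number in the header and let the reader work out which byte order the writer used. The sketch below shows that general technique with Python's `struct` module; the `MAGIC` value and function names are arbitrary choices for illustration, and the source does not say that Berkeley DB works exactly this way.

```python
import struct

MAGIC = 0x00053162    # arbitrary example value, not symmetric under byte swap

def write_header(byteorder):
    """Pack the magic number; byteorder is "<" (little) or ">" (big)."""
    return struct.pack(byteorder + "I", MAGIC)

def detect_byteorder(header):
    """Return the byte order under which the header decodes to MAGIC."""
    for order in ("<", ">"):
        if struct.unpack(order + "I", header)[0] == MAGIC:
            return order
    raise ValueError("not a recognised file")
```

A reader that detects the "other" byte order can then swap multi-byte fields as it loads them, so the same file works on either platform.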

 

 

 

  

     

   

Scalability

 

Large storage devices and wide-area, high-speed networking demand that applications manage more data for more users than ever before. Applications built today will see exponential growth in disk and memory sizes over their lifetimes. As a result, developers need to plan for scalability up front.

 

Berkeley DB was designed to scale gracefully from low-volume, single-user data management to high-concurrency management of enormous databases.

 

      

 

  

     

   

Databases up to 256 terabytes

 

Berkeley DB uses 48 bits to address individual bytes in a database. This means that the largest theoretical Berkeley DB database is 2^48 bytes, or 256 terabytes, in size. Berkeley DB is in regular production use today managing databases that are hundreds of gigabytes in size.

 

 

 

Keys and values up to 4 gigabytes

 

New applications, including multimedia storage and playback systems, must manage individual data values that are large. Berkeley DB is able to store single keys and values as large as 2^32 bytes, or four gigabytes, in size.
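The two size limits follow directly from the address widths. A quick check in Python, using binary units (1 terabyte taken as 2^40 bytes, 1 gigabyte as 2^30 bytes):

```python
TIB = 2 ** 40                     # bytes per terabyte (binary)
GIB = 2 ** 30                     # bytes per gigabyte (binary)

max_db_bytes = 2 ** 48            # 48-bit byte addressing within a database
max_item_bytes = 2 ** 32          # 32-bit length for a single key or value

max_db_terabytes = max_db_bytes // TIB
max_item_gigabytes = max_item_bytes // GIB
```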

 

 

Support for multiple readers

 

Berkeley DB applications support concurrent access to data by multiple readers, from a single process or from multiple processes. The system uses shared memory for caching, so that all users can share the work of fetching data from disk.

 

Read-only applications can declare the fact that they will do no database updates, reducing overhead and improving performance.

 

   

Support for multiple readers and writers

 

Most database applications require simultaneous access by many users, some of whom need to update records, and others who need only to view them. Berkeley DB includes support for concurrent access by readers and writers to a single database.

 

Users may access the database from a single process or from multiple processes. Caches are shared among all users, and Berkeley DB uses native O/S locking support on all platforms to guarantee that readers and writers are able to work without interfering with one another.

 

 

Fine-grained locking

 

Berkeley DB offers both coarse-grained and fine-grained locking, allowing developers to choose the degree of concurrency that their application provides and the overhead that it incurs.

 

Fine-grained page locking allows many readers and writers to be active in the database at the same time. For high-concurrency workloads, this dramatically improves throughput. Coarse-grained, database-wide locking allows many readers to access the database at the same time, but guarantees that writers have exclusive access to it for any updates. This reduces locking overhead, but continues to provide great throughput for low-concurrency or read-mostly workloads.

 

 

 

 

Before- and after-image logging

 

Berkeley DB includes support for write-ahead logging, a database management technique that provides the ability to make many changes to the database at the same logical instant, while preserving the ability to back out erroneous changes later. This logging facility simplifies transaction commit and abort, and makes it possible to recover from catastrophic failures, including application or system crashes.
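The role of before- and after-images can be shown with a very small in-memory model. This is a sketch under simplifying assumptions (no disk, no concurrency control, `None` meaning "key was absent"); `LoggedStore` is a made-up class, not part of Berkeley DB.

```python
class LoggedStore:
    """Toy store that logs (txn, key, before, after) before applying a write."""

    def __init__(self):
        self.data = {}
        self.log = []

    def write(self, txn, key, value):
        # Write-ahead: the log record is appended before the data changes.
        self.log.append((txn, key, self.data.get(key), value))
        self.data[key] = value

    def abort(self, txn):
        # Undo this transaction's changes, newest first, using before-images.
        for t, key, before, _after in reversed(self.log):
            if t != txn:
                continue
            if before is None:
                self.data.pop(key, None)
            else:
                self.data[key] = before
        self.log = [rec for rec in self.log if rec[0] != txn]

s = LoggedStore()
s.write("t1", "a", 1)
s.write("t2", "a", 2)
s.write("t2", "b", 3)
s.abort("t2")          # backs out both of t2's changes
```

The after-images are what a recovery pass would replay to redo committed work; the before-images are what abort uses to back changes out.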

 

 

Group commit

 

Berkeley DB supports group commit, a strategy for improving the performance of applications with a very high degree of write concurrency. Under group commit, if multiple transactions complete at close to the same time, Berkeley DB will automatically combine the operations in a single synchronous file system call. This lets multiple transactions take advantage of a single interaction with the operating system. Group commit can dramatically reduce the time spent committing any single transaction.

 

Group commit works automatically in Berkeley DB. The software developer does not need to take any special steps to turn it on in situations where it would help, and there is no overhead incurred in low-concurrency systems where group commit would not make a difference.
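The saving from group commit comes from counting synchronous flushes rather than transactions. The following sketch models that with a counter in place of a real `fsync()`; `GroupCommitLog` and its methods are hypothetical names for illustration.

```python
class GroupCommitLog:
    """Toy log: commits queue up, and one flush makes them all durable."""

    def __init__(self):
        self.pending = []      # commit records waiting for the next flush
        self.durable = []      # commit records already flushed
        self.sync_calls = 0    # stands in for synchronous file system calls

    def commit(self, txn_id):
        self.pending.append(txn_id)

    def flush(self):
        if self.pending:
            self.durable.extend(self.pending)
            self.pending.clear()
            self.sync_calls += 1     # one sync covers the whole group

log = GroupCommitLog()
for txn_id in (1, 2, 3):       # three transactions finish close together
    log.commit(txn_id)
log.flush()
```

Three transactions become durable for the price of one sync; with a single transaction the same code performs one sync, so nothing is lost in the low-concurrency case.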

 

 

Load balancing

 

Applications that take advantage of Berkeley DB's replication service can support extremely high query loads and can scale up easily by adding new servers as necessary. With replication, updates go to a single master server, and the master distributes them to as many replicas as desired. Each of the replicas can answer read queries during normal processing. The ability to direct readers to any of a large number of replicas makes it simple to balance the query load for high-concurrency applications.

 

 

 

Reliability and Availability

 

Database systems must provide reliable, on-demand access to the information that they manage. Berkeley DB meets these requirements through a combination of careful design and solid implementation.

 

Berkeley DB is a small-footprint data manager that includes no extraneous features. By leaving out features that programmers do not need, Berkeley DB is smaller and simpler than products from other vendors. Simplicity improves performance, because code paths are shorter, and reliability, because code review and testing are more likely to find any problems that exist.

 

 

Recovery from system or application failure

 

Berkeley DB uses write-ahead logging and checkpointing to log changes. Applications that need disaster recovery can use the logging system. After a failure, a database can be restored to its last transaction-consistent state by restoring the database file and rolling the log forward. The log and database can be located on different physical disks, to improve performance and to protect against disk failures.
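Restoring a backup and rolling the log forward can be sketched as a pure function. The log record shapes used here, `("put", txn, key, value)` and `("commit", txn)`, are assumptions made for the example, not Berkeley DB's log format.

```python
def recover(snapshot, log):
    """Rebuild the last transaction-consistent state from a snapshot and a log."""
    state = dict(snapshot)                       # start from the restored file
    committed = {txn for op, txn, *_ in log if op == "commit"}
    for op, txn, *rest in log:
        if op == "put" and txn in committed:     # replay committed work only
            key, value = rest
            state[key] = value
    return state

log = [
    ("put", 1, "a", 1),
    ("commit", 1),
    ("put", 2, "b", 2),     # transaction 2 never committed before the crash
]
restored = recover({"z": 0}, log)
```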

 

Transparent fail-over

 

Berkeley DB's replication service allows applications to run on a collection of cooperating server machines. In the event that any of the servers goes down, the others can assume its share of the workload.

 

Replication requires that all updates go to a single master server, which distributes them to as many replicas as the application needs. Each of the replicas can handle read queries during normal processing. In the event that the master system goes down for any reason, one of the replicas is chosen to take its place. From that point on, updates go to the new master. The application can continue to run with no interruption of service to the end user.
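The single-master flow and the takeover after a failure can be modelled in a few lines. This is a deliberately simplified sketch: `Cluster` is a hypothetical class, updates are copied synchronously, and picking the first surviving name stands in for whatever election the real replication service performs.

```python
class Cluster:
    """Toy single-master replication group with fail-over."""

    def __init__(self, names):
        self.nodes = {n: {} for n in names}   # node name -> its copy of the data
        self.master = names[0]

    def write(self, key, value):
        # Updates enter at the master and are distributed to every replica.
        for store in self.nodes.values():
            store[key] = value

    def read(self, node, key):
        # Any surviving node can answer read queries.
        return self.nodes[node][key]

    def fail(self, node):
        del self.nodes[node]
        if node == self.master:
            self.master = sorted(self.nodes)[0]   # stand-in for an election

c = Cluster(["a", "b", "c"])
c.write("k", 1)
c.fail("a")             # the master goes down
c.write("k", 2)         # updates now go through the new master
```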

 

 

Hot backups

 

Some applications must operate twenty-four hours a day, seven days a week. For those applications, Berkeley DB includes support for on-line, or "hot," backups.

 

Hot backups allow system administrators to back up the database and the log while users are running applications. Berkeley DB allows developers to open, read, and copy database files, even while the database is in active use. As a result, developers can build support for on-line backups directly into their applications.

 
