PicoStorage

Lightweight Structured Storage

Storage documentation

A storage is a hierarchical collection of files and directories stored in a single filesystem file. This class lets you create a new storage or open an existing one, get the root directory of the storage, and commit or roll back the modifications made to the storage.

A storage contains files and directories. A directory is pretty much like a file, with the difference that it has a flag marking it as a directory.

All files (and directories) are identified by a number, called the inode number. There is a special file, called the inode file, which is organized in fixed-size inode entries. The inode number is an index into the inode file.
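Because the inode entries are fixed-size, locating an entry is a single multiplication. The following is a minimal sketch of that lookup, not PicoStorage code: the function name and the entry size passed to it are hypothetical, since the real entry size is not given here.

```cpp
#include <cassert>
#include <cstdint>

// Byte offset of an inode entry inside the inode file.
// `entrySize` is a hypothetical parameter: the real size of a
// PicoStorage inode entry is not stated in this documentation.
inline std::uint64_t inodeEntryOffset(std::uint32_t inodeNumber,
                                      std::uint32_t entrySize) {
    assert(entrySize > 0);  // entries are fixed-size and non-empty
    return static_cast<std::uint64_t>(inodeNumber) * entrySize;
}
```

With an assumed entry size of 64 bytes, inode number 3 would start at byte 192 of the inode file.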

Each file has a corresponding inode entry (kept in the inode file). The inode entry stores the file size, some flags associated with the file (e.g. whether it's a file or directory), and the blocks which compose the file. When a file is composed of many blocks, there is not enough space to store the block information inside the inode entry. In this case we use indirection blocks, which keep information about the blocks composing the file. These indirection blocks are associated with the file, but are not part of the data contained in the file (e.g. you can't access the indirection blocks with pread()/pwrite()).
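To make the idea concrete, here is a rough sketch of what an inode entry and the "does this file need indirection blocks?" decision could look like. Every name and constant below (block size, number of block slots per entry, flag bit) is an assumption chosen for illustration; the actual on-disk layout is not described in this document.

```cpp
#include <cassert>
#include <cstdint>

// All constants are hypothetical; they only illustrate the principle.
constexpr std::uint32_t kBlockSize     = 4096;     // bytes per block (assumed)
constexpr std::uint32_t kDirectSlots   = 8;        // block references per entry (assumed)
constexpr std::uint32_t kFlagDirectory = 1u << 0;  // flag bit (assumed)

struct InodeEntry {
    std::uint64_t size;                  // file size in bytes
    std::uint32_t flags;                 // e.g. file vs. directory
    std::uint32_t blocks[kDirectSlots];  // block references kept in the entry
};

inline bool isDirectory(const InodeEntry& e) {
    return (e.flags & kFlagDirectory) != 0;
}

// Number of blocks needed to hold `size` bytes (rounded up).
inline std::uint64_t blocksNeeded(std::uint64_t size) {
    return (size + kBlockSize - 1) / kBlockSize;
}

// True when the block list no longer fits inside the entry itself,
// so indirection blocks would be required.
inline bool needsIndirection(const InodeEntry& e) {
    return blocksNeeded(e.size) > kDirectSlots;
}

// Index of the block that holds byte `offset` of the file.
inline std::uint64_t blockIndexFor(const InodeEntry& e, std::uint64_t offset) {
    assert(offset < e.size);  // offset must lie inside the file
    return offset / kBlockSize;
}
```

Under these assumed sizes, a file of up to 32768 bytes fits in the entry, and one more byte pushes it over to indirection blocks.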

All the data in the storage can be compressed, and you have some control over which types of data are compressed. You can choose independently whether to compress the inode entries, the indirection blocks, and the user data.

For example, you can compress everything (this is the default), disable compression completely, or compress only the inode entries and the indirection blocks while keeping the user data uncompressed.

The compression options are set at storage creation time, and can't be changed afterwards.
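Options of this kind are typically expressed as bit flags that are OR-ed together. The sketch below shows how such a combination might look. Only kCompressAll is a name taken from this documentation; the other flag names, all numeric values, and the helper function are hypothetical.

```cpp
#include <cassert>

// Only kCompressAll appears in the PicoStorage documentation.
// The other names and all values are hypothetical placeholders.
enum CompressFlags {
    kCompressNone     = 0,
    kCompressInodes   = 1 << 0,  // inode entries (assumed name)
    kCompressIndirect = 1 << 1,  // indirection blocks (assumed name)
    kCompressData     = 1 << 2,  // user data (assumed name)
    kCompressAll      = kCompressInodes | kCompressIndirect | kCompressData
};

// True if `flags` enables compression for the given kind of data.
inline bool compresses(int flags, CompressFlags what) {
    assert(what != kCompressNone);  // asking about "nothing" is meaningless
    return (flags & what) == what;
}

// Compress only the metadata, keep the user data uncompressed.
constexpr int kMetadataOnly = kCompressInodes | kCompressIndirect;
```

A value such as kMetadataOnly would then be passed as the compressFlags argument when the storage is created.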

bool create(const char *path, int compressFlags = kCompressAll);

Creates and initializes a new storage in a filesystem file. The filesystem file for the storage must not already exist; create() creates it.

The create() operation is not protected by a transaction. If the power goes down in the middle of create(), the half-created storage may be left in an inconsistent state. In that situation it is enough to delete the half-created storage file and create it again. Once create() returns, however, the storage file has been committed to disk and transactions are active: from this point on you are protected against power failures and software crashes, which won't corrupt the storage data.

Parameters:

path specifies the filesystem file where the storage data is kept. The path is system-dependent. Example:

The path may also contain only the file name (without a leading directory), in which case the file is opened in the current working directory of the process.

compressFlags is an optional parameter, and indicates the kind of compression you want to use. By default, full compression is enabled. For this parameter you can combine the flags:

Returns:

true on success, false if the storage file already exists.

Exceptions:

An exception is raised in the remaining error situations, e.g. the directory for the file doesn't exist, or the process lacks access rights.

bool open(const char *path);

Opens an existing storage.

The filesystem file must already exist, and the process must be able to open the file in exclusive read/write mode. The exclusive open is needed to ensure that the file won't be accessed concurrently by somebody else while it is being modified by the PicoStorage library.

See create().

Returns:

true on success, false if the file wasn't found. An exception will be raised in other error situations, e.g. no access right, invalid file (not a valid PicoStorage file), etc.
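Both create() and open() report one expected condition through the return value (file already exists, file not found) and everything else through exceptions. The sketch below imitates that convention with a stand-in class so the calling pattern can be tried in isolation. StubStorage, its "STUB" file marker, and openOrCreate() are hypothetical and are not part of PicoStorage.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <stdexcept>
#include <string>

// Stand-in for illustration only; NOT the PicoStorage implementation.
class StubStorage {
public:
    // Returns false if the file already exists; throws on other errors.
    bool create(const char* path) {
        assert(path != nullptr);
        if (std::ifstream(path).good()) return false;
        std::ofstream out(path, std::ios::binary);
        if (!out) throw std::runtime_error(std::string("cannot create ") + path);
        out << "STUB";
        out.close();
        if (!out) {
            std::remove(path);  // delete the half-created file
            throw std::runtime_error("write failed");
        }
        return true;
    }

    // Returns false if the file is missing; throws if it is not valid.
    bool open(const char* path) {
        assert(path != nullptr);
        std::ifstream in(path, std::ios::binary);
        if (!in) return false;
        char magic[4] = {};
        in.read(magic, 4);
        if (in.gcount() != 4 || std::string(magic, 4) != "STUB")
            throw std::runtime_error("not a valid storage file");
        return true;
    }
};

// Open the storage, creating it first if it does not exist yet.
inline bool openOrCreate(StubStorage& s, const char* path) {
    if (s.open(path)) return true;
    return s.create(path);
}
```

A caller would wrap openOrCreate() in a try/catch block to handle the exceptional cases, and treat a false result as "somebody created the file in between".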

Dir getRootDir();

Returns the root directory of this storage. You need the root directory as a starting point for accessing the other files and directories.

The returned directory is always valid (because any storage is guaranteed to have a root directory).

void commit();

Commits to disk the modifications made since the last commit.

This operation can take some time, because the dirty pages from the Cache are written to the storage file. The storage file is then synced (to flush the written data to disk), after which an atomic write switches the storage to the new state, followed by a second sync.

If a power failure happens during a commit(), at restart the storage will be either in the state before the commit() or after the commit(), i.e. coherent. The commit() takes place atomically: it can either happen completely, or not at all.
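The write, sync, atomic switch, sync sequence described above can be sketched with POSIX calls as follows. This is an illustration of the general technique, not PicoStorage's code. It assumes a hypothetical layout in which a small state word at offset 0 selects the current state, that dirty pages are written only to blocks the committed state does not reference, and that a small header write is atomic on the underlying device. DirtyPage and commitSketch() are made-up names.

```cpp
#include <cassert>
#include <cstdint>
#include <fcntl.h>
#include <unistd.h>
#include <vector>

// Hypothetical representation of a modified page held in the cache.
struct DirtyPage {
    off_t offset;            // where the page goes in the storage file
    std::vector<char> data;  // page contents
};

// Returns true if every step succeeded.
inline bool commitSketch(int fd, const std::vector<DirtyPage>& pages,
                         std::uint32_t newState) {
    assert(fd >= 0);
    // 1. Write the dirty pages (assumed not to overwrite committed data).
    for (const DirtyPage& p : pages) {
        ssize_t n = pwrite(fd, p.data.data(), p.data.size(), p.offset);
        if (n != static_cast<ssize_t>(p.data.size())) return false;
    }
    // 2. First sync: the new pages reach the disk before the switch.
    if (fsync(fd) != 0) return false;
    // 3. Small header write that switches the storage to the new state.
    ssize_t n = pwrite(fd, &newState, sizeof newState, 0);
    if (n != static_cast<ssize_t>(sizeof newState)) return false;
    // 4. Second sync: the switch itself reaches the disk.
    return fsync(fd) == 0;
}
```

A crash before step 3 leaves the old state word in place, so the old state is still the current one. A crash after step 3 finds the new pages already on disk thanks to the first sync.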

A crash outside commit() (e.g. during some modifications to the storage) has the effect of an automatic rollback() to the last committed state at restart. This way the coherence of the storage is guaranteed even in the presence of crashes, and no data corruption occurs.

If you don't need foolproof protection against crashes, you can configure commit() to skip the two syncs, gaining speed at the cost of possible data corruption in the case of a crash.

void rollback();

Undoes the modifications done since the last commit().

Reverts the storage to the last committed state. The Storage destructor also performs an implicit rollback(). All files and directories whose creation is reversed by the rollback() become invalid, and must not be used after the rollback.

void close();

Closes the storage.

Performs a commit(), and then releases all resources. Usually you close the storage when you're done with it. You don't have to call close(), because the destructor does all the necessary cleanup by itself. The difference is that the destructor does an implicit rollback(), while close() does an implicit commit().

After a close(), all files and directories opened from this storage become invalid, and must not be used anymore.
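The difference between leaving scope (implicit rollback) and calling close() (implicit commit) is easy to overlook. The toy class below reproduces just that behaviour in memory so it can be observed directly. ToyStorage is a hypothetical stand-in with a std::map playing the role of the committed on-disk state; it shares nothing with the real Storage class beyond the documented semantics.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy in-memory stand-in for illustration only; not PicoStorage code.
class ToyStorage {
public:
    explicit ToyStorage(std::map<std::string, std::string>& disk)
        : disk_(disk), working_(disk) {}

    // Destructor: implicit rollback of anything not committed.
    ~ToyStorage() { if (open_) rollback(); }

    void put(const std::string& key, const std::string& value) {
        assert(open_);
        working_[key] = value;
    }
    void commit()   { assert(open_); disk_ = working_; }
    void rollback() { assert(open_); working_ = disk_; }

    // close(): implicit commit, then the object is no longer usable.
    void close() { commit(); open_ = false; }

private:
    std::map<std::string, std::string>& disk_;    // committed state
    std::map<std::string, std::string>  working_; // uncommitted view
    bool open_ = true;
};
```

Modifications made through a ToyStorage that simply goes out of scope never reach the committed map, while the same modifications followed by close() do.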

void setCacheSize(int nbOfPages);

Sets the size of the in-RAM Cache used by the Storage.

The size is specified in number of pages (blocks). Increasing the cache size improves performance but increases memory usage. Normally, the more random accesses you do, the larger the cache you need. In the future it would be nice to specify the cache size in bytes (i.e. the maximum amount of memory you're willing to allocate for the cache), or to allow auto-sizing of the cache depending on the working set and the amount of memory available. It's recommended to use a cache size of at least 50.

Any change to the cache size comes into effect at the next open() or create(), so you usually want to set the cache size before open()/create().
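Until the cache size can be given in bytes, a caller can convert a memory budget into a page count before calling setCacheSize(). The helper below is one possible way to do that. Its name and the page size passed to it are assumptions (the page size is not stated here); the lower bound of 50 pages is the recommendation given above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>

constexpr int kRecommendedMinPages = 50;  // recommendation from this documentation

// Converts a memory budget into a page count for setCacheSize().
// `pageSizeBytes` is a hypothetical parameter supplied by the caller.
// Note: the result is raised to the recommended minimum, which may
// exceed a very small budget.
inline int pagesForBudget(std::size_t budgetBytes, std::size_t pageSizeBytes) {
    assert(pageSizeBytes > 0);
    const std::size_t pages = budgetBytes / pageSizeBytes;
    const std::size_t low   = static_cast<std::size_t>(kRecommendedMinPages);
    const std::size_t high  =
        static_cast<std::size_t>(std::numeric_limits<int>::max());
    return static_cast<int>(std::min(std::max(pages, low), high));
}
```

The result would be passed to setCacheSize() before open() or create(), since the new size only takes effect at that point.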