How PalDB Reads Data

This article is part of a series on PalDB: an opening post, an introduction to PalDB, the write process, the read process (this article), and the thread-safe version.

The PalDB Reader's Multi-Level Cache

Before walking through PalDB's read path, it is worth explaining why its read performance is so good: a PalDB reader can be understood as having two levels of cache:

  • The first level is the StorageCache object, an in-memory cache built on a LinkedHashMap.
  • The second level is the StorageReader object, which maps the store file into memory via mmap.

The fields of ReaderImpl show both levels side by side:
public final class ReaderImpl implements StoreReader {
  // Logger
  private final static Logger LOGGER = Logger.getLogger(ReaderImpl.class.getName());
  // Configuration
  private final Configuration config;
  // Buffer
  private final DataInputOutput dataInputOutput = new DataInputOutput();
  // Storage
  private final StorageReader storage;
  // Serialization
  private final StorageSerialization serialization;
  // Cache
  private final StorageCache cache;
  // File
  private final File file;
  // Opened?
  private boolean opened;
  // ... (constructor and methods elided)
}

The StorageCache constructor builds the first-level cache on top of a LinkedHashMap in access order, with eviction handled by removeEldestEntry:

private StorageCache(Configuration config) {
    cache = new LinkedHashMap(config.getInt(Configuration.CACHE_INITIAL_CAPACITY),
        config.getFloat(Configuration.CACHE_LOAD_FACTOR), true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry eldest) {
        boolean res = currentWeight > maxWeight;
        if (res) {
          Object key = eldest.getKey();
          Object value = eldest.getValue();
          currentWeight -= getWeight(key) + getWeight(value) + OVERHEAD;
        }
        return res;
      }
    };
    // ... (weight bookkeeping elided)
  }

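The third constructor argument (true) puts the LinkedHashMap in access order, and removeEldestEntry evicts the eldest, i.e. least recently used, entry once the tracked weight exceeds the maximum; together these give a weight-bounded LRU cache. Below is a minimal standalone sketch of the same technique, bounded by entry count instead of weight for brevity (names and values are illustrative, not PalDB's):

import java.util.LinkedHashMap;
import java.util.Map;

public class LruExample {
  public static void main(String[] args) {
    final int maxEntries = 2;
    // accessOrder=true: iteration order runs from least- to most-recently accessed
    Map<String, String> lru = new LinkedHashMap<String, String>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        // Evict the least-recently-used entry once the bound is exceeded
        return size() > maxEntries;
      }
    };
    lru.put("a", "1");
    lru.put("b", "2");
    lru.get("a");      // touch "a" so that "b" becomes the eldest entry
    lru.put("c", "3"); // triggers eviction of "b"
    System.out.println(lru.keySet()); // prints [a, c]
  }
}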

The PalDB Read Process

The Read Process at a High Level

PalDB's read path is simple and clear: a standard lookup through a cache:

  • Check whether the cache already holds the key; if so, return the cached value directly.
  • Otherwise, read the value bytes from the mmap'ed file via storage.get.
  • Put the deserialized value into the cache and return it.

ReaderImpl.get implements this flow:
public <K> K get(Object key, K defaultValue) {
    checkOpen();
    if (key == null) {
      throw new NullPointerException("The key can't be null");
    }
    K value = cache.get(key);
    if (value == null) {
      try {
        byte[] valueBytes = storage.get(serialization.serializeKey(key));
        if (valueBytes != null) {
          Object v = serialization.deserialize(dataInputOutput.reset(valueBytes));
          cache.put(key, v);
          return (K) v;
        } else {
          return defaultValue;
        }
      } catch (Exception ex) {
        throw new RuntimeException(ex);
      }
    } else if (value == StorageCache.NULL_VALUE) {
      return null;
    }
    return value;
  }
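From the application's point of view, this whole flow sits behind the StoreReader public API. A minimal usage sketch (the store file name is illustrative):

import java.io.File;
import com.linkedin.paldb.api.PalDB;
import com.linkedin.paldb.api.StoreReader;

public class ReadExample {
  public static void main(String[] args) {
    // Opening a store constructs a ReaderImpl under the hood
    StoreReader reader = PalDB.createReader(new File("store.paldb"));
    try {
      // The first get() for a key misses the StorageCache and goes through mmap;
      // a repeated get() for the same key is served from the cache
      String value = reader.get("foo", "default");
      System.out.println(value);
    } finally {
      reader.close();
    }
  }
}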


The StorageReader Read Process

A read in StorageReader boils down to hashing the key to locate its slot. The overall process is:

  • Hash the key and look up the number of slots allocated for keys of that length.
  • Locate the key's slot in the index mmap (probing forward on collisions) and read the value offset stored alongside the key bytes.
  • Use that offset to locate the value in the data mmap and return it.

When reading a value from the data mmap, the global offset is divided by the segment size: the quotient selects which mapped buffer to read from, and the remainder is the position within that buffer (see getDataBuffer below).

public byte[] get(byte[] key)
      throws IOException {
    int keyLength = key.length;
    if (keyLength >= slots.length || keyCounts[keyLength] == 0) {
      return null;
    }
    long hash = (long) hashUtils.hash(key);
    int numSlots = slots[keyLength];
    int slotSize = slotSizes[keyLength];
    int indexOffset = indexOffsets[keyLength];
    long dataOffset = dataOffsets[keyLength];

    for (int probe = 0; probe < numSlots; probe++) {
      int slot = (int) ((hash + probe) % numSlots);
      indexBuffer.position(indexOffset + slot * slotSize);
      indexBuffer.get(slotBuffer, 0, slotSize);

      long offset = LongPacker.unpackLong(slotBuffer, keyLength);
      if (offset == 0) {
        return null;
      }
      if (isKey(slotBuffer, key)) {
        byte[] value = mMapData ? getMMapBytes(dataOffset + offset) : getDiskBytes(dataOffset + offset);
        return value;
      }
    }
    return null;
  }
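The loop above is plain linear probing: if the slot holds a different key, the next slot is tried, wrapping around modulo numSlots, until an empty slot (offset 0) proves the key is absent. A tiny sketch of the resulting probe sequence, with made-up numbers:

public class ProbeExample {
  public static void main(String[] args) {
    long hash = 1_000_003L; // hypothetical key hash
    int numSlots = 7;       // hypothetical slot count for this key length
    for (int probe = 0; probe < numSlots; probe++) {
      int slot = (int) ((hash + probe) % numSlots);
      System.out.print(slot + " "); // prints: 4 5 6 0 1 2 3
    }
  }
}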



//Read the data at the given offset, the data can be spread over multiple data buffers
  private byte[] getMMapBytes(long offset)
      throws IOException {
    //Read up to the first 5 bytes (the max length of a packed int) to get the size of the data
    ByteBuffer buf = getDataBuffer(offset);
    int maxLen = (int) Math.min(5, dataSize - offset);

    int size;
    if (buf.remaining() >= maxLen) {
      //Continuous read
      int pos = buf.position();
      size = LongPacker.unpackInt(buf);

      // Used in case of data is spread over multiple buffers
      offset += buf.position() - pos;
    } else {
      //The size of the data is spread over multiple buffers
      int len = maxLen;
      int off = 0;
      sizeBuffer.reset();
      while (len > 0) {
        buf = getDataBuffer(offset + off);
        int count = Math.min(len, buf.remaining());
        buf.get(sizeBuffer.getBuf(), off, count);
        off += count;
        len -= count;
      }
      size = LongPacker.unpackInt(sizeBuffer);
      offset += sizeBuffer.getPos();
      buf = getDataBuffer(offset);
    }

    //Create output bytes
    byte[] res = new byte[size];

    //Check if the data is one buffer
    if (buf.remaining() >= size) {
      //Continuous read
      buf.get(res, 0, size);
    } else {
      int len = size;
      int off = 0;
      while (len > 0) {
        buf = getDataBuffer(offset);
        int count = Math.min(len, buf.remaining());
        buf.get(res, off, count);
        offset += count;
        off += count;
        len -= count;
      }
    }

    return res;
  }
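The Math.min(5, ...) above exists because LongPacker stores the value length as a variable-length integer, and a packed 32-bit int occupies at most 5 bytes. A minimal sketch of that style of decoding (assuming the common 7-bits-per-byte layout with the high bit as a continuation flag; PalDB's actual LongPacker may differ in details):

import java.nio.ByteBuffer;

public class VarIntExample {
  // Decode a varint: 7 payload bits per byte, high bit set means "more bytes follow"
  static int unpackInt(ByteBuffer buf) {
    int value = 0;
    for (int shift = 0; shift < 35; shift += 7) { // at most 5 bytes for an int
      byte b = buf.get();
      value |= (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        return value;
      }
    }
    throw new IllegalStateException("Malformed varint");
  }

  public static void main(String[] args) {
    // 300 = 0b1_0010_1100 encodes as 0xAC 0x02
    ByteBuffer buf = ByteBuffer.wrap(new byte[]{(byte) 0xAC, 0x02});
    System.out.println(unpackInt(buf)); // prints 300
  }
}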


private ByteBuffer getDataBuffer(long index) {
    ByteBuffer buf = dataBuffers[(int) (index / segmentSize)];
    buf.position((int) (index % segmentSize));
    return buf;
  }
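A worked example of getDataBuffer's arithmetic, with hypothetical values: using 1GB segments, a global offset of 2.5GB lands in the third buffer, half a gigabyte in:

public class SegmentExample {
  public static void main(String[] args) {
    long segmentSize = 1L << 30;                    // 1GB per mapped segment
    long offset = 2L * (1L << 30) + (1L << 29);     // global data offset: 2.5GB
    int bufferIndex = (int) (offset / segmentSize); // 2 -> dataBuffers[2]
    int position = (int) (offset % segmentSize);    // 536870912 (0.5GB into it)
    System.out.println(bufferIndex + " " + position);
  }
}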


StorageReader Initialization

Initializing a StorageReader is essentially the write process in reverse: everything is read back in the order it was written. The overall sequence is:

  • Read PalDB's metadata, including the start offset of the index (keys) and the start offset of the data (values).
  • Map the index section into memory with a single mmap, which therefore cannot exceed 2GB.
  • Map the data section into memory with mmap, split into segments of the configured size.

The constructor implements this sequence:
StorageReader(Configuration configuration, File file)
      throws IOException {
    path = file;
    config = configuration;
    
    //Config
    segmentSize = config.getLong(Configuration.MMAP_SEGMENT_SIZE);

    hashUtils = new HashUtils();

    // Check valid segmentSize
    if (segmentSize > Integer.MAX_VALUE) {
      throw new IllegalArgumentException(
          "The `" + Configuration.MMAP_SEGMENT_SIZE + "` setting can't be larger than 2GB");
    }

    //Open file and read metadata
    long createdAt = 0;
    FormatVersion formatVersion = null;
    FileInputStream inputStream = new FileInputStream(path);
    DataInputStream dataInputStream = new DataInputStream(new BufferedInputStream(inputStream));
    try {
      int ignoredBytes = -2;

      //Byte mark
      byte[] mark = FormatVersion.getPrefixBytes();
      int found = 0;
      while (found != mark.length) {
        byte b = dataInputStream.readByte();
        if (b == mark[found]) {
          found++;
        } else {
          ignoredBytes += found + 1;
          found = 0;
        }
      }

      //Version
      byte[] versionFound = Arrays.copyOf(mark, FormatVersion.getLatestVersion().getBytes().length);
      dataInputStream.readFully(versionFound, mark.length, versionFound.length - mark.length);

      formatVersion = FormatVersion.fromBytes(versionFound);
      if (formatVersion == null || !formatVersion.is(FormatVersion.getLatestVersion())) {
        throw new RuntimeException(
                "Version mismatch, expected was '" + FormatVersion.getLatestVersion() + "' and found '" + formatVersion
                        + "'");
      }

      //Time
      createdAt = dataInputStream.readLong();

      //Metadata counters
      keyCount = dataInputStream.readInt();
      keyLengthCount = dataInputStream.readInt();
      maxKeyLength = dataInputStream.readInt();

      //Read offset counts and keys
      indexOffsets = new int[maxKeyLength + 1];
      dataOffsets = new long[maxKeyLength + 1];
      keyCounts = new int[maxKeyLength + 1];
      slots = new int[maxKeyLength + 1];
      slotSizes = new int[maxKeyLength + 1];

      int maxSlotSize = 0;
      for (int i = 0; i < keyLengthCount; i++) {
        int keyLength = dataInputStream.readInt();

        keyCounts[keyLength] = dataInputStream.readInt();
        slots[keyLength] = dataInputStream.readInt();
        slotSizes[keyLength] = dataInputStream.readInt();
        indexOffsets[keyLength] = dataInputStream.readInt();
        dataOffsets[keyLength] = dataInputStream.readLong();

        maxSlotSize = Math.max(maxSlotSize, slotSizes[keyLength]);
      }

      slotBuffer = new byte[maxSlotSize];

      //Read serializers
      try {
        Serializers.deserialize(dataInputStream, config.getSerializers());
      } catch (Exception e) {
        throw new RuntimeException(e);
      }

      //Read index and data offset
      indexOffset = dataInputStream.readInt() + ignoredBytes;
      dataOffset = dataInputStream.readLong() + ignoredBytes;
    } finally {
      //Close metadata
      dataInputStream.close();
      inputStream.close();
    }

    //Create Mapped file in read-only mode
    mappedFile = new RandomAccessFile(path, "r");
    channel = mappedFile.getChannel();
    long fileSize = path.length();

    //Create index buffer
    indexBuffer = channel.map(FileChannel.MapMode.READ_ONLY, indexOffset, dataOffset - indexOffset);

    //Create data buffers
    dataSize = fileSize - dataOffset;

    //Check if data size fits in memory map limit
    if (!config.getBoolean(Configuration.MMAP_DATA_ENABLED)) {
      //Use classical disk read
      mMapData = false;
      dataBuffers = null;
    } else {
      //Use Mmap
      mMapData = true;

      //Build data buffers
      int bufArraySize = (int) (dataSize / segmentSize) + ((dataSize % segmentSize != 0) ? 1 : 0);
      dataBuffers = new MappedByteBuffer[bufArraySize];
      int bufIdx = 0;
      for (long offset = 0; offset < dataSize; offset += segmentSize) {
        long remainingFileSize = dataSize - offset;
        long thisSegmentSize = Math.min(segmentSize, remainingFileSize);
        dataBuffers[bufIdx++] = channel.map(FileChannel.MapMode.READ_ONLY, dataOffset + offset, thisSegmentSize);
      }
    }
  }
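For context, the segment size validated at the top of the constructor comes from the store configuration. A minimal sketch of tuning it through PalDB's public API (the values are illustrative):

import java.io.File;
import com.linkedin.paldb.api.Configuration;
import com.linkedin.paldb.api.PalDB;
import com.linkedin.paldb.api.StoreReader;

public class ReaderConfigExample {
  public static void main(String[] args) {
    Configuration config = PalDB.newConfiguration();
    // Map the data section in 512MB segments instead of the default
    config.set(Configuration.MMAP_SEGMENT_SIZE, String.valueOf(512L * 1024 * 1024));
    // Alternatively, disable mmap for the data section and fall back to disk reads:
    // config.set(Configuration.MMAP_DATA_ENABLED, "false");

    StoreReader reader = PalDB.createReader(new File("store.paldb"), config);
    try {
      String value = reader.get("foo");
      System.out.println(value);
    } finally {
      reader.close();
    }
  }
}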