Managing Project Data with Datasets#

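In VisionFlow, data falls into two types: Parameter and Property. This section uses examples to show in detail how to access and update Property data.

All Property data is stored in a database and organized into a data structure called SampleSet. High-level interfaces hide the underlying implementation details to make the data easier to work with.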

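Accessing Property through PropertySet#

A PropertySet is a data structure that manages one single property across all samples in a sample set. That property of every sample in the sample set is stored in the property set.

A PropertySet also records the last update time of that property across all samples.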

Note

Like SampleSet, PropertySet holds a database handle, and destroying a PropertySet also decrements the database's reference count.
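The reference-counting behavior described in the note can be pictured with a small, self-contained sketch. It does not use VisionFlow at all: `Database` and `SetHandle` are hypothetical stand-ins, and `std::shared_ptr` is only assumed to behave similarly to the real handle.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for the underlying database connection.
struct Database {};

// Hypothetical stand-in for SampleSet / PropertySet: each object keeps the
// database alive by holding a shared handle to it.
class SetHandle {
public:
    explicit SetHandle(std::shared_ptr<Database> db) : db_(std::move(db)) {}

    // Number of owners that currently share the database.
    long db_ref_count() const { return db_.use_count(); }

private:
    std::shared_ptr<Database> db_;
};
```

Creating a `SetHandle` raises the count by one, and the count drops again when the object goes out of scope. Under that assumption, the database would be released only after the last set that refers to it is destroyed.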

PropertySetIterator#

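Like SampleSet, PropertySet also provides an iterator type: visionflow::data::PropertySetIterator.

The following example code shows how to iterate with PropertySetIterator to get the image property of every sample in the sample set: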

C++

// get the PropertySet of Image
visionflow::data::PropertySet property_set = sample_set.property_set({"Input", "image"});

for (auto iter = property_set.begin(); iter.valid(); ++iter) {
    std::cout << "sample id :" << iter.key() << std::endl;

    // convert to props::Image ptr
    auto img = iter.value()->as<visionflow::props::Image>();
    // use img property to do some work ...
}
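The loop above relies on an iterator that reports its own validity through `valid()` instead of being compared against an `end()` iterator. The following sketch shows that pattern in isolation. `MiniPropertySet` and `collect_ids` are hypothetical names, the values are plain strings rather than image properties, and the id order comes from `std::map`, which may differ from the order used by the real PropertySet.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature of a property set: sample id -> property value.
class MiniPropertySet {
public:
    using Map = std::map<std::uint32_t, std::string>;

    // Iterator that knows where the container ends, so it can answer valid().
    class Iterator {
    public:
        Iterator(Map::const_iterator pos, Map::const_iterator end)
            : pos_(pos), end_(end) {}

        bool valid() const { return pos_ != end_; }
        std::uint32_t key() const { return pos_->first; }
        const std::string& value() const { return pos_->second; }
        Iterator& operator++() { ++pos_; return *this; }

    private:
        Map::const_iterator pos_;
        Map::const_iterator end_;
    };

    void put(std::uint32_t id, std::string value) { data_[id] = std::move(value); }
    Iterator begin() const { return Iterator(data_.begin(), data_.end()); }

private:
    Map data_;
};

// Walks the whole set and returns the sample ids it visited.
std::vector<std::uint32_t> collect_ids(const MiniPropertySet& set) {
    std::vector<std::uint32_t> ids;
    for (auto iter = set.begin(); iter.valid(); ++iter) {
        ids.push_back(iter.key());
    }
    return ids;
}
```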


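Accessing and Modifying Properties in a PropertySet#

You can access or update the properties in a property set either by sample id or through a PropertySetIterator.

Warning

The sample id must exist in the sample set; otherwise a visionflow::excepts::SampleNotFound exception is thrown.

The PropertySetIterator must point to a property that exists in the property set; otherwise a visionflow::excepts::InvalidIterator exception is thrown.

The following example code shows how to update the image property in a property set: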

C++

// Get the PropertySet for the image property
visionflow::data::PropertySet property_set = sample_set.property_set({"Input", "image"});

// Print the last update time
std::cout << "last time:" << property_set.last_update_time() << std::endl;

// Check the property type
assert(property_set.property_type() == "visionflow::props::Image");

// Check that the sample set is not empty and that sample 1 has data
assert(property_set.sample_empty() == false);
assert(property_set.data_exists(1) == true);

// Get the image property by sample id
auto img_of_sample_1 = property_set.at(1);

// Assign a new image to the image property
img_of_sample_1->set_image(visionflow::img::Image::FromFile("D:/path/to/new_img.jpg"));

// Update the PropertySet by sample id
property_set.update(1, *img_of_sample_1);

// Erase from the PropertySet by iterator
const auto& iter = property_set.find(4);
property_set.erase(iter);

// The image property of sample 4 is now erased
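Because access by a missing sample id throws, callers usually either check for existence first or catch the exception. The sketch below illustrates both options without depending on VisionFlow. `MiniImageStore`, `value_or`, and the local `SampleNotFound` type are hypothetical stand-ins; the real exception is visionflow::excepts::SampleNotFound, and its exact interface is not shown in this section.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Local stand-in for visionflow::excepts::SampleNotFound (assumption).
struct SampleNotFound : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical miniature of a property set keyed by sample id.
class MiniImageStore {
public:
    void put(std::uint32_t id, std::string value) { data_[id] = std::move(value); }

    bool data_exists(std::uint32_t id) const { return data_.count(id) != 0; }

    // Throws SampleNotFound when the id is unknown.
    const std::string& at(std::uint32_t id) const {
        auto it = data_.find(id);
        if (it == data_.end()) {
            throw SampleNotFound("sample id not found: " + std::to_string(id));
        }
        return it->second;
    }

private:
    std::map<std::uint32_t, std::string> data_;
};

// Returns the stored value, or `fallback` when the sample id is missing.
std::string value_or(const MiniImageStore& store, std::uint32_t id,
                     const std::string& fallback) {
    try {
        return store.at(id);
    } catch (const SampleNotFound&) {
        return fallback;
    }
}
```

Checking `data_exists(id)` before calling `at(id)` avoids the exception entirely, while `value_or` shows the catch-based alternative.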


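Better Performance with WriteBatch#

Every time a sample modification is written to the database it triggers disk I/O, so frequent updates carry a significant performance cost.

VisionFlow provides WriteBatch for batched updates: multiple modifications are cached in memory and then written to the database and flushed to disk in one go, which reduces the number of database accesses and improves performance.

PropertyWriteBatch#

PropertyWriteBatch caches all modifications made to a property set.

Note

For any given sample, only the last modification takes effect (earlier cached modifications are overwritten).

Warning

You must make sure that every modified sample exists in the sample set; otherwise a visionflow::excepts::SampleNotFound exception is thrown when the PropertyWriteBatch is written.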

The following example code shows how to use PropertyWriteBatch to update a property set in batch:

C++

// Get the PropertySet for the image property
visionflow::data::PropertySet property_set = sample_set.property_set({"Input", "image"});

// First, create a PropertyWriteBatch
PropertyWriteBatch prop_write_batch = property_set.create_write_batch();

// Cache all modifications to the image PropertySet for different samples

// Update sample 1 with img_101
visionflow::props::Image img_101;
img_101.set_image(visionflow::img::Image::FromFile("D:/path/to/img_101.jpg"));
prop_write_batch.update(1, img_101);

// Update sample 2 with img_102
visionflow::props::Image img_102;
img_102.set_image(visionflow::img::Image::FromFile("D:/path/to/img_102.jpg"));
prop_write_batch.update(2, img_102);

// Erase the image property of sample 3
prop_write_batch.erase(3);

// Erase the image property of sample 2;
// this overwrites the earlier cached update for sample 2
prop_write_batch.erase(2);

// Write everything to the database at once
property_set.write_batch(prop_write_batch);

// All modifications are now written to the database.
// Image property in:
//          sample 1 is updated to img_101;
//          sample 2 and sample 3 are erased;
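The two properties of a write batch described in this section, one write for many cached changes and "last modification wins" per sample, can be modeled in a few lines. This is only an illustrative sketch: `CachedOp`, `MiniWriteBatch`, and `MiniPropertyStore` are hypothetical, values are plain strings, `write_count()` merely stands in for the number of disk writes, and the sketch does not check that the sample exists the way the real PropertyWriteBatch is documented to do.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// One cached operation: either "set to value" or "erase".
struct CachedOp {
    bool is_update;
    std::string value;
};

// Hypothetical batch that keeps only the latest operation per sample id.
class MiniWriteBatch {
public:
    void update(std::uint32_t id, const std::string& value) {
        ops_[id] = CachedOp{true, value};
    }
    void erase(std::uint32_t id) { ops_[id] = CachedOp{false, std::string()}; }
    const std::map<std::uint32_t, CachedOp>& ops() const { return ops_; }

private:
    std::map<std::uint32_t, CachedOp> ops_;
};

// Hypothetical store; write_count() stands in for the number of disk writes.
class MiniPropertyStore {
public:
    // Unbatched update: one write per call.
    void update(std::uint32_t id, const std::string& value) {
        data_[id] = value;
        ++write_count_;
    }

    // Batched update: applies every cached operation, then counts one write.
    void write_batch(const MiniWriteBatch& batch) {
        for (const auto& entry : batch.ops()) {
            if (entry.second.is_update) {
                data_[entry.first] = entry.second.value;
            } else {
                data_.erase(entry.first);
            }
        }
        ++write_count_;
    }

    bool data_exists(std::uint32_t id) const { return data_.count(id) != 0; }
    const std::string& at(std::uint32_t id) const { return data_.at(id); }
    int write_count() const { return write_count_; }

private:
    std::map<std::uint32_t, std::string> data_;
    int write_count_ = 0;
};
```

Replaying the four operations from the example (update 1, update 2, erase 3, erase 2) leaves three cached operations in the batch, because the erase for sample 2 replaces its earlier update, and applying the batch costs a single write in this model.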


SampleWriteBatch#

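SampleWriteBatch caches all modifications made to a sample set, such as updating the data of a whole sample, updating a single property of a sample, or updating a sample's description information.

Note

Likewise, for the same piece of data in the same sample, only the last modification takes effect (earlier cached modifications are overwritten).

Warning

You must make sure that every modified sample exists in the sample set; otherwise a visionflow::excepts::SampleNotFound exception is thrown when the SampleWriteBatch is written.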

The following example code shows how to use SampleWriteBatch to update a sample set in batch:

C++

// First, create a SampleWriteBatch
SampleWriteBatch sample_write_batch = sample_set.create_write_batch();

const visionflow::ToolNodeId img_prop_id = {"Input", "image"};

// Cache all modifications in the SampleWriteBatch

// Update sample 1 with img_101
visionflow::props::Image img_101;
img_101.set_image(visionflow::img::Image::FromFile("D:/path/to/img_101.jpg"));
sample_write_batch.update(
    1, img_prop_id, std::make_shared<visionflow::props::Image>(img_101));

// Update sample 2 with img_102
visionflow::props::Image img_102;
img_102.set_image(visionflow::img::Image::FromFile("D:/path/to/img_102.jpg"));
sample_write_batch.update(
    2, img_prop_id, std::make_shared<visionflow::props::Image>(img_102));

// Erase the whole sample 3
sample_write_batch.erase(3);

// Erase the image property of sample 2;
// this overwrites the earlier cached update for sample 2
sample_write_batch.erase(2, img_prop_id);

// Write everything to the database at once
sample_set.write_batch(sample_write_batch);

// All modifications are now written to the database.
// Image property in:
//          sample 1 is updated to img_101;
//          sample 2 is erased;
// and the whole sample 3 is erased from the SampleSet;


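Data Filtering#

Both SampleSet and PropertySet provide filtering, which lets you retrieve just the data you want by setting a custom filter condition. A filter condition is a string containing a Python script that defines a function.

Note

To write the Python filter script correctly, keep the following in mind:

Pay close attention to indentation; indentation is significant in Python, and wrong indentation causes a Python syntax error;
The script must contain a filter function that implements the filtering logic and returns a boolean as the result of the filter condition;
The function must be named "vflow_filter" and take a Sample (ReadOnlySampleSetView) or a Property (ReadOnlyPropertySetView) as its argument;
You need to import the correct VisionFlow modules; VisionFlow's Python modules mirror the C++ namespaces.

Warning

The Python script must be syntactically valid Python; otherwise a visionflow::excepts::PythonSyntaxError exception is thrown when it is executed.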
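Since the filter condition is an ordinary string, it can also be assembled on the C++ side, which keeps the indentation consistent and lets the size bounds be passed in as parameters. The helper below is a hypothetical sketch, not part of VisionFlow: the name `make_size_filter_script` is invented, and the script body reuses only the calls that appear in this section's own examples (`prop.image()`, `size().w`, `size().h`).

```cpp
#include <cassert>
#include <string>

// Builds a property filter script that accepts images whose width and height
// both lie in [min_side, max_side]. Hypothetical helper for illustration.
std::string make_size_filter_script(int min_side, int max_side) {
    const std::string lo = std::to_string(min_side);
    const std::string hi = std::to_string(max_side);
    return
        "from visionflow import *\n"
        "from visionflow.props import *\n"
        "from visionflow.img import *\n"
        "def vflow_filter(prop):\n"
        "    size = prop.image().size()\n"
        "    return " + lo + " <= size.w <= " + hi +
        " and " + lo + " <= size.h <= " + hi + "\n";
}
```

The function body is indented with four spaces throughout and the function is named `vflow_filter`, matching the requirements listed in the note. The resulting string would then be passed to a `filter` call in the same way as a hand-written script.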

ReadOnlySampleSetView#

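ReadOnlySampleSetView lets you filter out the samples that satisfy the condition in the filter script.

The following example code shows how to use ReadOnlySampleSetView to select the samples whose image size is between 1024*1024 and 2048*2048: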

C++

// First, write the filter script
const std::string filter_script = R"(
from visionflow import *
from visionflow.props import *
from visionflow.img import *
def vflow_filter(sample):
    img_prop_id = ToolNodeId("Input", "image")

    if not sample.exist_property_data(img_prop_id):
        return False
    img = sample.get(img_prop_id).image()
    return 1024 <= img.size().w <= 2048 and 1024 <= img.size().h <= 2048
)";

auto sample_view = sample_set.filter(filter_script);

// If the images in samples 1 and 2 both meet the filter condition,
// the view contains exactly their ids
assert(sample_view.ids() == std::vector<uint32_t>{1, 2});


For more information about ReadOnlySampleSetView, see visionflow::data::ReadOnlySampleSetView.

ReadOnlyPropertySetView#

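ReadOnlyPropertySetView provides two ways of filtering:

You can set the sample ids to filter, which excludes the samples with those ids outright (this filtering happens at the sample level);
You can set a Python filter script that takes a property as its argument, so that only properties satisfying the script remain (this filtering checks the condition at the property level);

The filter script requirements for ReadOnlyPropertySetView are the same as for ReadOnlySampleSetView; the only difference is that the function takes a property rather than a sample as its argument. (Script Helper)

The following example code shows how to use ReadOnlyPropertySetView to select the samples whose image size is between 1024*1024 and 2048*2048 while excluding the sample whose id is 1: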

C++

// First, define the id of the image property
const visionflow::ToolNodeId img_prop_id = {"Input", "image"};

// Second, write the filter script
const std::string filter_script = R"(
from visionflow import *
from visionflow.props import *
from visionflow.img import *
def vflow_filter(prop):
    img = prop.image()
    return 1024 <= img.size().w <= 2048 and 1024 <= img.size().h <= 2048
)";

PropertySet img_prop_set = sample_set.property_set(img_prop_id);

ReadOnlyPropertySetView img_prop_view = img_prop_set.filter(filter_script);

// If the image in sample 1 meets the script's condition,
// both the sample and its data exist in the view
assert(img_prop_view.sample_exists(1));
assert(img_prop_view.data_exists(1));

// If the image in sample 2 does not meet the script's condition,
// the sample is still in the view but its data is filtered out
assert(img_prop_view.sample_exists(2));
assert(img_prop_view.data_exists(2) == false);

// Every sample whose id is in the filter ids is excluded,
// whether or not it meets the script's condition
img_prop_view.set_filter_ids({1});
assert(img_prop_view.sample_exists(1) == false);
assert(img_prop_view.data_exists(1) == false);
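The difference between the two filtering levels, and between `sample_exists` and `data_exists`, can be reproduced with a small model. This is a sketch under assumptions rather than the VisionFlow implementation: `MiniPropertyView`, `ImageSize`, and `in_range` are hypothetical, and a C++ predicate takes the place of the Python filter script.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <set>
#include <utility>

struct ImageSize {
    int w;
    int h;
};

// Hypothetical read-only view with two filtering levels.
class MiniPropertyView {
public:
    MiniPropertyView(std::map<std::uint32_t, ImageSize> data,
                     std::function<bool(const ImageSize&)> predicate)
        : data_(std::move(data)), predicate_(std::move(predicate)) {}

    // Sample-level filter: these ids are hidden regardless of the predicate.
    void set_filter_ids(std::set<std::uint32_t> ids) { excluded_ = std::move(ids); }

    // A sample is visible unless its id was explicitly excluded.
    bool sample_exists(std::uint32_t id) const {
        return data_.count(id) != 0 && excluded_.count(id) == 0;
    }

    // Data is visible only for visible samples whose property passes the predicate.
    bool data_exists(std::uint32_t id) const {
        return sample_exists(id) && predicate_(data_.at(id));
    }

private:
    std::map<std::uint32_t, ImageSize> data_;
    std::function<bool(const ImageSize&)> predicate_;
    std::set<std::uint32_t> excluded_;
};

// Same condition as the filter script: both sides within [1024, 2048].
bool in_range(const ImageSize& s) {
    return 1024 <= s.w && s.w <= 2048 && 1024 <= s.h && s.h <= 2048;
}
```

In this model a sample that fails the predicate remains visible at the sample level and only loses its data, while an id passed to `set_filter_ids` disappears at both levels, mirroring the assertions in the example.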


For more information about ReadOnlyPropertySetView, see visionflow::data::ReadOnlyPropertySetView.
