Skip to content

More validation for partial read parameters #116

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
33 changes: 29 additions & 4 deletions Zarr.m
Original file line number Diff line number Diff line change
Expand Up @@ -103,6 +103,13 @@ function pyReloadInProcess()
paramName)
end

if any(params>dims)
error("MATLAB:Zarr:PartialReadOutOfBounds",...
"Elements in %s must not exceed "+...
"the corresponding Zarr array dimensions.",...
paramName)
end

newParams = params;
end

Expand Down Expand Up @@ -315,20 +322,38 @@ function makeZarrGroups(existingParentPath, newGroupsPath)
count = Zarr.processPartialReadParams(count, info.shape,...
maxCount, "Count");

if any(count>maxCount)
error("MATLAB:Zarr:PartialReadOutOfBounds",...
"Requested Count in combination with other "+...
"parameters exceeds Zarr array dimensions.")
end

% Convert partial read parameters to tensorstore-style
% indexing
start = start - 1; % tensorstore is 0-based
% Tensorstore uses end index instead of count
% (it does NOT include element at the end index)
endInds = start + stride.*count;

% Store the datatype
obj.Datatype = ZarrDatatype.fromZarrType(info.dtype);

% Check if reading requested data might exceed available memory
try
zeros(count, obj.Datatype.MATLABType);
catch ME
if strcmp(ME.identifier, 'MATLAB:array:SizeLimitExceeded')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you have to use IF here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I only want to check for the error related to exceeding size limit. I am choosing to silently swallow any other possible errors (rare/unlikely edge-cases). We are trying to create an array of zeros here just as a way to see if reading a similarly-sized array from Zarr file will exceed the memory limit (unfortunately couldn't find a better way) , so my thinking is that any other errors would be unrelated, and we shouldn't throw them and interrupt user workflow. If something goes wrong with creating an array if zeros that is unrelated to memory limit, we should just try reading from the zarr file anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying this.

error("MATLAB:Zarr:OutOfMemory",...
"Reading requested data (%s %s matrix) "+...
"might exceed available memory.",...
join(string(count), "-by-"), obj.Datatype.MATLABType)
end
end

% Read the data
ndArrayData = Zarr.ZarrPy.readZarr(obj.KVStoreSchema,...
start, endInds, stride);

% Store the datatype
obj.Datatype = ZarrDatatype.fromTensorstoreType(ndArrayData.dtype.name);


% Convert the numpy array to MATLAB array
data = cast(ndArrayData, obj.Datatype.MATLABType);
end
Expand Down
14 changes: 14 additions & 0 deletions ZarrDatatype.m
Original file line number Diff line number Diff line change
Expand Up @@ -74,6 +74,15 @@
obj = ZarrDatatype(ind);
end

function obj = fromZarrType(zarrType)
% Create a datatype object based on Zarr datatype name
arguments
zarrType (1,1) string {ZarrDatatype.mustBeZarrType}
end

ind = find(zarrType == ZarrDatatype.ZarrTypes);
obj = ZarrDatatype(ind);
end

function mustBeMATLABType(type)
% Validator for MATLAB types
Expand All @@ -84,6 +93,11 @@ function mustBeTensorstoreType(type)
% Validator for Tensorstore types
mustBeMember(type, ZarrDatatype.TensorstoreTypes)
end

function mustBeZarrType(type)
% Validator for Zarr types
mustBeMember(type, ZarrDatatype.ZarrTypes)
end
end

end
23 changes: 19 additions & 4 deletions test/tZarrRead.m
Original file line number Diff line number Diff line change
Expand Up @@ -109,6 +109,16 @@ function nonExistentArray(testcase)
testcase.verifyError(@()zarrread('nonexistent/'),errID);
end

function tooBigArray(testcase)
% Verify zarrread error when a user tries to read data that is
% too large

bigDataPath = "bigData/myzarr";
zarrcreate(bigDataPath, [100000,100000], Datatype='single');
errID = 'MATLAB:Zarr:OutOfMemory';
testcase.verifyError(@()zarrread(bigDataPath),errID);
end

function invalidFilePath(testcase)
% Verify zarrread error when an invalid file path is used.

Expand Down Expand Up @@ -147,7 +157,7 @@ function invalidPartialReadParams(testcase)
testcase.verifyError(@()zarrread(zpath,Count=wrongDims),errID);

% Invalid type
errID = 'MATLAB:validators:mustBeNumericOrLogical';
errID = 'MATLAB:validators:mustBeNumeric';
testcase.verifyError(@()zarrread(zpath,"Start",""),errID);
testcase.verifyError(@()zarrread(zpath,"Stride",""),errID);
testcase.verifyError(@()zarrread(zpath,"Count",""),errID);
Expand All @@ -159,12 +169,17 @@ function invalidPartialReadParams(testcase)
testcase.verifyError(@()zarrread(zpath,"Stride",inpVal),errID);
testcase.verifyError(@()zarrread(zpath,"Count",inpVal),errID);

% Input out of bounds

% Parameters out of bounds
inpVal = [100 200];
errID = 'MATLAB:Python:PyException';
errID = 'MATLAB:Zarr:PartialReadOutOfBounds';
testcase.verifyError(@()zarrread(zpath,"Start",inpVal),errID);
%testcase.verifyError(@()zarrread(zpath,"Stride",inpVal),errID);
testcase.verifyError(@()zarrread(zpath,"Stride",inpVal),errID);
testcase.verifyError(@()zarrread(zpath,"Count",inpVal),errID);

% Combination of parameters out of bounds
testcase.verifyError(...
@()zarrread(zpath,Start=[3 4],Count=[2 2]),errID)
end
end
end
6 changes: 3 additions & 3 deletions zarrread.m
Original file line number Diff line number Diff line change
Expand Up @@ -27,9 +27,9 @@

arguments
filepath {mustBeTextScalar, mustBeNonzeroLengthText}
options.Start (1,:) {mustBeInteger, mustBePositive} = [];
options.Count (1,:) {mustBeInteger, mustBePositive} = [];
options.Stride (1,:) {mustBeInteger, mustBePositive} = [];
options.Start (1,:) {mustBeNumeric, mustBeInteger, mustBePositive} = [];
options.Count (1,:) {mustBeNumeric, mustBeInteger, mustBePositive} = [];
options.Stride (1,:) {mustBeNumeric, mustBeInteger, mustBePositive} = [];
end

zarrObj = Zarr(filepath);
Expand Down