Internal Design

This page gives an overview of the internal design of xarray.

In total, the Xarray project defines four key data structures. In order of increasing complexity, they are:

  • xarray.Variable,

  • xarray.DataArray,

  • xarray.Dataset,

  • xarray.DataTree.

The user guide lists only xarray.DataArray and xarray.Dataset, but Variable is the fundamental object internally, and DataTree is a natural generalisation of xarray.Dataset.

Note

Our Development roadmap includes plans to document Variable as fully public API.

Internally, private lazy indexing classes are used to avoid loading more data than necessary, and flexible index classes (derived from Index) provide performant label-based lookups.

Data Structures

The Data Structures page in the user guide explains the basics and concentrates on user-facing behavior, whereas this section explains how xarray’s data structure classes actually work internally.

Variable Objects

The core internal data structure in xarray is the Variable, which is used as the basic building block behind xarray’s Dataset and DataArray types. A Variable consists of:

  • dims: A tuple of dimension names.

  • data: The N-dimensional array (typically a NumPy or Dask array) storing the Variable’s data. It must have the same number of dimensions as the length of dims.

  • attrs: A dictionary of metadata associated with this array. By convention, xarray’s built-in operations never use this metadata.

  • encoding: Another dictionary used to store information about how this variable’s data is represented on disk. See Reading encoded data for more details.

Variable has an interface similar to NumPy arrays, but extended to make use of named dimensions. For example, it uses dim in preference to an axis argument for methods like mean, and supports Broadcasting by dimension name.

However, unlike Dataset and DataArray, the basic Variable does not include coordinate labels along each axis.

Variable is public API, but because of its incomplete support for labeled data, it is mostly intended for advanced uses, such as in xarray itself, for writing new backends, or when creating custom indexes. You can access the variable objects that correspond to xarray objects via the (readonly) Dataset.variables and DataArray.variable attributes.
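To make this concrete, here is a minimal sketch using only the public xr.Variable constructor (the dimension names and values are invented for illustration):

```python
import numpy as np
import xarray as xr

# A Variable bundles named dims, an N-D array, and two metadata dicts.
var = xr.Variable(
    dims=("x", "y"),
    data=np.arange(6.0).reshape(2, 3),
    attrs={"units": "metres"},
)

# Reductions accept dimension names instead of integer axes.
row_means = var.mean(dim="y")  # dims: ("x",)

# Arithmetic broadcasts by dimension name, not by position.
ones_y = xr.Variable(dims=("y",), data=np.ones(3))
total = var + ones_y  # dims: ("x", "y")
```

Note how broadcasting needed no manual reshaping or transposing: matching dimension names are aligned automatically.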

DataArray Objects

The simplest data structure used by most users is DataArray. A DataArray is a composite object consisting of multiple Variable objects which store related data.

A single Variable is referred to as the “data variable”, and stored under the variable attribute. A DataArray inherits all of the properties of this data variable, i.e. dims, data, attrs and encoding, all of which are implemented by forwarding on to the underlying Variable object.

In addition, a DataArray stores additional Variable objects in a dict under the private _coords attribute, each of which is referred to as a “coordinate variable”. These coordinate variables are only allowed to have dims that are a subset of the data variable’s dims, and each dim has a specific length. This means that the full size of the DataArray can be represented by a dictionary mapping dimension names to integer sizes. The underlying data variable has exactly this size, and the attached coordinate variables have sizes which are some subset of it. Another way of saying this is that all coordinate variables must be “alignable” with the data variable.

When a coordinate is accessed by the user (e.g. via the dict-like __getitem__ syntax), then a new DataArray is constructed by finding all coordinate variables that have compatible dimensions and re-attaching them before the result is returned. This is why most users never see the Variable class underlying each coordinate variable - it is always promoted to a DataArray before returning.
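As an illustration of this promotion (the names foo, x and y here are invented), dict-like access returns a DataArray, while the raw coordinate Variable remains an internal detail:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.zeros((2, 3)),
    dims=("x", "y"),
    coords={"x": [10, 20], "y": [1, 2, 3]},
    name="foo",
)

# Dict-like access promotes the coordinate Variable back up to a DataArray,
# with any compatible coordinates re-attached.
x_coord = da["x"]
assert isinstance(x_coord, xr.DataArray)

# The underlying Variable is still reachable via the .variable attribute.
assert isinstance(x_coord.variable, xr.Variable)
```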

Lookups are performed by special Index objects, which are stored in a dict under the private _indexes attribute. Indexes must be associated with one or more coordinates, and essentially act by translating a query given in physical coordinate space (typically via the sel() method) into a set of integer indices in array index space that can be used to index the underlying n-dimensional array-like data. Indexing in array index space (typically performed via the isel() method) does not require consulting an Index object.
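The difference can be sketched with a toy example (coordinate values invented): sel() consults the index attached to the x coordinate, whereas isel() indexes positionally without it.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.array([10.0, 20.0, 30.0]),
    dims="x",
    coords={"x": [100, 200, 300]},
)

# Label-based: the Index translates the label 200 into integer position 1.
by_label = da.sel(x=200)

# Positional: goes straight to the underlying array, no Index consulted.
by_position = da.isel(x=1)

assert by_label.item() == by_position.item() == 20.0
```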

Finally, a DataArray defines a name attribute, which refers to its data variable but is stored on the wrapping DataArray class. The name attribute is primarily used when one or more DataArray objects are promoted into a Dataset (e.g. via to_dataset()). Note that the underlying Variable objects are all unnamed, so they can always be referred to uniquely via a dict-like mapping.
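For example (names invented), the name travels with the wrapper and becomes the key when promoting to a Dataset:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(3), dims="x", name="foo")

# Promotion to a Dataset uses the DataArray's name as the variable key...
ds = da.to_dataset()
assert list(ds.data_vars) == ["foo"]

# ...and the key can be overridden at promotion time.
ds2 = da.to_dataset(name="bar")
assert list(ds2.data_vars) == ["bar"]
```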

Dataset Objects

The Dataset class is a generalization of the DataArray class that can hold multiple data variables. Internally all data variables and coordinate variables are stored under a single variables dict, and coordinates are specified by storing their names in a private _coord_names dict.

The dataset’s dims are the set of all dims present across any variable, but (as with DataArray) coordinate variables cannot have a dimension that is not present on any data variable.

When a data variable or coordinate variable is accessed, a new DataArray is again constructed from all compatible coordinates before returning.
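The flat-mapping design can be observed through public attributes (the variable names here are invented):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    data_vars={"foo": (("x",), np.arange(3.0))},
    coords={"x": [10, 20, 30]},
)

# Data variables and coordinate variables live in one flat mapping;
# coordinate status is tracked separately by name.
assert set(ds.variables) == {"foo", "x"}
assert set(ds.coords) == {"x"}

# Accessing either kind returns a freshly constructed DataArray with
# compatible coordinates re-attached.
assert isinstance(ds["foo"], xr.DataArray)
assert "x" in ds["foo"].coords
```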

Note

The fact that selecting a variable from a DataArray or Dataset internally involves wrapping the Variable object back up into a DataArray/Dataset is the primary reason we recommend against subclassing Xarray objects. The main problem it creates is that we currently cannot easily guarantee that, for example, selecting a coordinate variable from your SubclassedDataArray would return an instance of SubclassedDataArray instead of just an xarray.DataArray. See GH issue for more details.

Lazy Indexing Classes

Lazy Loading

If we open a Variable object from disk using open_dataset(), we can see that the actual values of the array wrapped by the data variable are not displayed.

In [1]: da = xr.tutorial.open_dataset("air_temperature")["air"]

In [2]: var = da.variable

In [3]: var

We can see the size and dtype of the underlying array, but not the actual values. This is because the values have not yet been loaded.

If we look at the private _data attribute containing the underlying array object, we see something interesting:

In [4]: var._data

You’re looking at one of xarray’s internal Lazy Indexing Classes. These powerful classes are hidden from the user, but provide important functionality.

Accessing the public data property loads the underlying array into memory.

In [5]: var.data

This array is now cached, which we can see by accessing the private attribute again:

In [6]: var._data

Lazy Indexing

The purpose of these lazy indexing classes is to prevent more data being loaded into memory than is necessary for the subsequent analysis, by deferring loading data until after indexing is performed.
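The core idea can be sketched with a toy class (this is NOT xarray's actual implementation, which composes and simplifies indexing keys rather than storing a flat list of them):

```python
import numpy as np

class ToyLazyArray:
    """Toy sketch: record indexing keys now, touch the data only on load."""

    def __init__(self, source, keys=()):
        self.source = source  # array-like, e.g. a wrapper around an on-disk file
        self.keys = keys      # pending indexing operations

    def __getitem__(self, key):
        # Indexing returns a new lazy wrapper; no values are read yet.
        return ToyLazyArray(self.source, self.keys + (key,))

    def load(self):
        # Only now are the accumulated keys applied to the underlying data.
        out = self.source
        for key in self.keys:
            out = out[key]
        return out

lazy = ToyLazyArray(np.arange(12).reshape(3, 4))
subset = lazy[0][1:3]  # still lazy: two recorded keys, no data materialized
print(subset.load())   # applies both keys: [1 2]
```

Here `source` is an in-memory NumPy array for simplicity; the payoff comes when it is instead a file-backed array, so that only the indexed region ever needs to be read.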

Let’s open the data from disk again.

In [7]: da = xr.tutorial.open_dataset("air_temperature")["air"]

In [8]: var = da.variable

Now, notice how even after subsetting, the data does not get loaded:

In [9]: var.isel(time=0)

The shape has changed, but the values are still not shown.

Looking at the private attribute again shows how this indexing information was propagated via the hidden lazy indexing classes:

In [10]: var.isel(time=0)._data

Note

Currently only certain indexing operations are lazy, not all array operations. For discussion of making all array operations lazy see GH issue #5081.

Lazy Dask Arrays

Note that xarray’s implementation of Lazy Indexing classes is completely separate from how dask.array.Array objects evaluate lazily. Dask-backed xarray objects delay almost all operations until compute() is called (either explicitly or implicitly via plot() for example). The exceptions to this laziness are operations whose output shape is data-dependent, such as when calling where().
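For example, assuming dask is installed, chunking a small array (values invented) shows the deferred evaluation:

```python
import numpy as np
import xarray as xr

# .chunk() wraps the data in a dask array; subsequent operations build a
# task graph instead of executing immediately.
da = xr.DataArray(np.arange(6.0), dims="x").chunk({"x": 3})

lazy_mean = da.mean()         # still lazy: no computation has happened
result = lazy_mean.compute()  # triggers evaluation of the whole graph
assert float(result) == 2.5
```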