Data abstraction

The general methodology of isolating the detailed definition of data objects from the procedures that use and act on the objects. We want some "glue" that holds together objects to combine them into a single abstracted object.

But what is data anyways?

Claim:
We can get by just fine with only one primitive data type: the function.

Case-in-point 1:
The only condition a rational number representation needs to satisfy is the following:
if we construct a rational number x from two integers n and d, then extracting numer and denom of x and dividing them should give the same result as dividing n and d, i.e,

\frac{numer}{denom} = \frac{n}{d}

In general, we can think of data as defined by some collection of selectors and constructors, together with specified conditions that these procedures must fulfill in order to be a valid representation

Case-in-point 2:
In fact, the only condition that a pair needs to satisfy is that if we construct a pair from two inputs x and y, then the procedures get-first and get-second should return the original inputs. Lisp demonstration that data pairs can be implemented just by procedures:

(define (cons x y) 
	(lambda (f) (f x y)))

(define (get-first z) 
	(z (lambda (p q) p)))

(define (get-second z) 
	(z (lambda (p q) q)))

Case-in-point 3:
Church numerals
We can even represent the concept of zero, one, two... as a procedure that applies any given function $f$ 0, 1, 2.... times. Furthermore, addition would simply correspond to composition etc

(define zero (lambda (f) (lambda (x) x)))
(define one (lambda (f) (lambda (x) (f x))))
(define two (lambda (f) (lambda (x) (f (f x)))))

(define (add m n)
  (lambda (f) (lambda (x) ((m f) ((n f) x)))))

(define (mul m n)
  (lambda (f) (lambda (x) ((m (n f)) x))))

Useful data abstractions

Pairs allow us to create sequence data structures:
- lists, trees and the like.
- options to flexibly map over them, maps and conventional interfaces.
- stateful/mutable pairs allow the building of even more sophisticated data objects such as tables, queues (FIFO buffers) and stacks(LIFO buffers).
Quotation enables defining symbolic variables:
- say your name vs say 'your name'.
- can then start defining symbolic differentiation libraries, for example.
Sets - flexible data collections
- need to satisfy set operations union, intersection, is-elem?, adjoin
- different representations possible: unordered and ordered lists, binary trees etc
Huffman trees for efficient encoding of messages
- information-theoretically optimal way to assign bit representations to symbols given relative frequencies

Hierarchical data structures (vertical abstraction)

Data abstraction “stacks”, i.e, we can choose different implementations at every level without affecting other levels as long as the abstraction interfaces (procedure interfaces) are preserved.
Pasted image 20230722111749.png

Multiple representations (horizontal abstraction)

We might want to use different representations for the same data interchangeably.
Pasted image 20230722143226.png

Many strategies possible to handle the resulting code complexity.

Message-passing
- intelligent data objects - get op names, apply ops internally defined and return results
- sort of like object-oriented programming, but is also a valid strategy in purely functional programming.
- allows to add new types without changing already written code
- new operations necessitates changing all the data object constructors
- best option when new types are added often but the set of supported operations is more or less fixed
Explicit dispatch
- intelligent functions - get type names, and apply the corresponding op implementation
- allows to add new operations in the same manner
- adding new data type necessitates changing all op definitions
- best option when new operations are added often but the set of types is fixed
Data-directed approach
- allows us to add new types with corresponding operations and new operations for existing types as well just by adding new entries in a dispatch table
- new type addition is seamless
- new op addition can be done in one place where we just add the corresponding ops to the dispatch table
- best option in general, but requires maintenance to keep the code base clean and might lead to code scattered all over the place

Coercion helps deal with mixed data types even when code has only been written for different data types separately.

just need to write a coercion table listing ops to convert between data types wherever possible (ex: complex numbers force accompanying int/float arguments into complex numbers)
could be smart about coercion table construction
- ex: exploiting type1->type2 and type2->type3 implementations to do type1->type3 conversions
- more sophisticated graph-based path finding could also be applied
- could also try coercing both variables to a third common type if that helps

(define (apply-generic op . args)
  (let ((type-tags (map type-tag args)))
    (let ((proc (get op type-tags)))
      (if proc
          (apply proc (map contents args))
          (if (= (length args) 2)
              (let ((type1 (car type-tags))
                    (type2 (cadr type-tags))
                    (a1 (car args))
                    (a2 (cadr args)))
                (let ((t1->t2 (get-coercion type1 type2))
                      (t2->t1 (get-coercion type2 type1)))
                  (cond (t1->t2
                         (apply-generic op (t1->t2 a1) a2))
                        (t2->t1
                         (apply-generic op a1 (t2->t1 a2)))
                        (else
                         (error "No method for these types"
                                (list op type-tags))))))
              (error "No method for these types"
                     (list op type-tags)))))))

Links

Sources

Structure and Interpretation of Computer Programs