CharlesCNorton

Add float16 scientific LUT ops

b8c48ca 5 months ago


	---
	license: apache-2.0
	language:
	- en
	tags:
	- threshold-logic
	- arithmetic
	- verified-computing
	- neuromorphic
	- digital-circuits
	- frozen-weights
	pipeline_tag: other
	---

	# Threshold Calculus

	Digital circuits encoded as neural network weights.

	Each gate is a threshold logic unit: `output = step(weights * inputs + bias)`. The step function outputs 1 when the weighted sum of the inputs plus the bias is >= 0, and 0 otherwise. This maps digital logic onto tensor operations.

	## What's Here

	| File | Description |
	|------|-------------|
	| `arithmetic.safetensors` | 626,374 tensors encoding 208,788 gates |
	| `eval.py` | Test harness (211,581 tests) |
	| `build.py` | Builds tensors and infers gate connectivity |

	## Circuits

	Float16 (IEEE 754)
	- `float16.add`, `float16.sub`, `float16.mul`, `float16.div`
	- `float16.sqrt`, `float16.rsqrt`, `float16.pow`
	- `float16.exp`, `float16.ln`, `float16.log2`
	- `float16.sin`, `float16.cos`, `float16.tan`, `float16.tanh`
	- `float16.neg`, `float16.abs`, `float16.cmp`
	- `float16.toint`, `float16.fromint`
	- `float16.pack`, `float16.unpack`, `float16.normalize`

	Handles NaN, Inf, zero, subnormals. Mantissa alignment via barrel shifter. Normalization via CLZ.

	Accuracy/rounding:
	- Unary ops (sqrt, rsqrt, and the transcendental functions) are LUT-backed over all 65,536 float16 input bit patterns.
	- Outputs match `torch.float16` results (round-to-nearest-even); NaNs are canonicalized to 0x7E00.
	- `float16.pow` is defined as exp(b * ln(a)) with float16 rounding at each stage.
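The LUT approach can be sketched in NumPy. The idea is to enumerate all 65,536 bit patterns, evaluate the function once per pattern, round the result to float16, and canonicalize NaNs to 0x7E00. `pow` is then composed from `ln` and `exp`, with rounding at each stage. This is a minimal sketch and not the repo's `build.py`. It uses NumPy instead of torch, computing in float32 and rounding once to float16, so a few entries could differ from the repo's torch-derived tables. `build_unary_lut`, `lut_apply`, and `pow16` are hypothetical names.

```python
import numpy as np

CANONICAL_NAN = 0x7E00  # every NaN output is replaced by this quiet-NaN pattern

def build_unary_lut(fn):
    """Map every float16 bit pattern (0x0000..0xFFFF) to the bits of fn(x) as float16."""
    bits = np.arange(1 << 16, dtype=np.uint32).astype(np.uint16)
    x = bits.view(np.float16)
    with np.errstate(all="ignore"):  # NaN/Inf inputs, domain errors, overflow are expected
        # Assumption: compute in float32, then round once to float16 (nearest-even).
        y = fn(x.astype(np.float32)).astype(np.float16)
    out = y.view(np.uint16).copy()
    out[np.isnan(y)] = CANONICAL_NAN
    return out

def lut_apply(lut, x):
    """Look up the float16 result for a Python float x (which is first rounded to float16)."""
    in_bits = np.array([x], dtype=np.float16).view(np.uint16)[0]
    return np.array([lut[in_bits]], dtype=np.uint16).view(np.float16)[0]

def pow16(a, b):
    """pow(a, b) as exp(b * ln(a)), rounding to float16 after every stage."""
    with np.errstate(all="ignore"):  # ln of a negative base yields NaN
        ln_a = np.float16(np.log(np.float32(np.float16(a))))
        prod = np.float16(np.float32(np.float16(b)) * np.float32(ln_a))
        return np.float16(np.exp(np.float32(prod)))

sqrt_lut = build_unary_lut(np.sqrt)
print(lut_apply(sqrt_lut, 2.0))   # sqrt(2) rounded to float16
print(hex(sqrt_lut[0xBC00]))      # sqrt(-1.0) is NaN, canonicalized to 0x7e00
print(pow16(2.0, 3.0))            # near 8.0; per-stage rounding can move the last bits
```

Rounding after each stage is why `pow` results can sit one or more float16 ulps away from the correctly rounded value.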

	16-bit Integer
	- Adders: half, full, ripple carry (2/4/16 bit), add-with-carry (adc16bit)
	- Subtraction: sub16bit, sbc16bit, neg16bit
	- Comparison: cmp16bit, equality16bit
	- Shifts: asr16bit, rol16bit, ror16bit
	- CLZ: 16-bit
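A ripple-carry adder maps naturally onto threshold gates. The carry of a full adder is a single 2-of-3 majority gate. The sum bit is a second gate that subtracts twice that carry: since a + b + cin - 2*cout is always 0 or 1, one threshold recovers it. The sketch below is a textbook construction that treats bits as plain Python integers. It is not necessarily how `arithmetic.safetensors` decomposes its adders, which might route XOR through OR/NAND/AND instead. The function names are hypothetical.

```python
def gate(weights, bias, inputs):
    """Threshold unit: 1 if the weighted sum plus bias is >= 0, else 0."""
    return int(sum(w * x for w, x in zip(weights, inputs)) + bias >= 0)

def full_adder(a, b, cin):
    # Layer 1: carry out is the majority (at least 2 of 3) of the inputs.
    cout = gate([1, 1, 1], -2, [a, b, cin])
    # Layer 2: sum = a + b + cin - 2*cout, which is 0 or 1.
    s = gate([1, 1, 1, -2], -1, [a, b, cin, cout])
    return s, cout

def ripple_add16(x, y, cin=0):
    """Add two 16-bit unsigned ints LSB-first; returns (sum mod 2**16, carry out)."""
    result, carry = 0, cin
    for i in range(16):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry

print(ripple_add16(1234, 4321))   # (5555, 0)
print(ripple_add16(0xFFFF, 1))    # (0, 1): wraps around and sets the carry
```

Passing a nonzero `cin` gives add-with-carry behaviour in the spirit of `adc16bit`.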

	Modular Arithmetic
	- mod2 through mod12 (divisibility testing)

	Boolean
	- AND, OR, NOT, NAND, NOR, XOR, XNOR, IMPLIES, BIIMPLIES

	Threshold
	- k-of-n gates (1-of-8 through 8-of-8)
	- majority, minority, atleastk, atmostk, exactlyk
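Each k-of-n test needs only one threshold gate. With all weights 1 and bias -k, the gate fires exactly when at least k inputs are set, so 1-of-n is OR and n-of-n is AND. The sketch below assumes that convention. It adds the mirrored at-most-k gate (weights -1, bias +k) and a two-layer exactly-k built from the two. The helper names are hypothetical, and the repo's stored weights may be scaled differently.

```python
def gate(weights, bias, inputs):
    """Threshold unit: 1 if the weighted sum plus bias is >= 0, else 0."""
    return int(sum(w * x for w, x in zip(weights, inputs)) + bias >= 0)

def at_least_k(bits, k):
    # One gate: popcount - k >= 0.
    return gate([1] * len(bits), -k, bits)

def at_most_k(bits, k):
    # One gate: k - popcount >= 0.
    return gate([-1] * len(bits), k, bits)

def exactly_k(bits, k):
    # Two layers: AND (weights [1, 1], bias -2) of the two single-gate tests.
    return gate([1, 1], -2, [at_least_k(bits, k), at_most_k(bits, k)])

bits = [1, 0, 1, 1, 0, 0, 0, 1]   # popcount 4
print(at_least_k(bits, 4), at_most_k(bits, 3), exactly_k(bits, 4))
```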

	Pattern Recognition
	- popcount, allzeros, allones, onehotdetector
	- symmetry8bit, alternating8bit, hammingdistance8bit
	- leadingones, trailingones, runlength

	Combinational
	- decoder3to8, encoder
	- multiplexer (2/4/8 to 1), demultiplexer (1 to 2/4/8)
	- barrelshifter8bit, priorityencoder8bit
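A 2-to-1 multiplexer takes two threshold layers. In the first layer, one gate per data input passes that input only when the select line agrees. In the second layer, an OR gate merges the two results. Wider multiplexers can be built as trees of 2-to-1 stages. This is one standard construction, assumed for illustration. The repo's `multiplexer` gates may be wired differently, and the names below are hypothetical.

```python
def gate(weights, bias, inputs):
    """Threshold unit: 1 if the weighted sum plus bias is >= 0, else 0."""
    return int(sum(w * x for w, x in zip(weights, inputs)) + bias >= 0)

def mux2(a, b, sel):
    """Return a when sel == 0 and b when sel == 1."""
    pick_a = gate([1, -1], -1, [a, sel])      # layer 1: a AND NOT sel
    pick_b = gate([1, 1], -2, [b, sel])       # layer 1: b AND sel
    return gate([1, 1], -1, [pick_a, pick_b]) # layer 2: OR

def mux4(data, s1, s0):
    """Select data[2*s1 + s0] with a two-level tree of 2-to-1 muxes."""
    low = mux2(data[0], data[1], s0)
    high = mux2(data[2], data[3], s0)
    return mux2(low, high, s1)

print([mux4([1, 0, 0, 1], s1, s0) for s1 in (0, 1) for s0 in (0, 1)])
```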

	## How It Works

	A threshold gate computes:

	```
	output = 1 if (w1*x1 + w2*x2 + ... + wn*xn + bias) >= 0 else 0
	```

	This is a perceptron with Heaviside step activation.

	AND gate: weights = [1, 1], bias = -1.5
	- (0,0): 0 + 0 - 1.5 = -1.5 < 0 -> 0
	- (0,1): 0 + 1 - 1.5 = -0.5 < 0 -> 0
	- (1,0): 1 + 0 - 1.5 = -0.5 < 0 -> 0
	- (1,1): 1 + 1 - 1.5 = 0.5 >= 0 -> 1

	XOR requires two layers (not linearly separable):
	- Layer 1: OR and NAND in parallel
	- Layer 2: AND of both outputs
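The AND example and the two-layer XOR can be checked directly as tensor operations. In this sketch the AND weights and bias come from the worked example. The OR and NAND weights and biases are illustrative assumptions, not values read from `arithmetic.safetensors`.

```python
import numpy as np

def step_gate(weights, bias, x):
    """Heaviside threshold unit: 1 if dot(weights, x) + bias >= 0, else 0."""
    return int(np.dot(weights, x) + bias >= 0)

AND = ([1, 1], -1.5)    # from the worked example
OR = ([1, 1], -0.5)     # assumed values
NAND = ([-1, -1], 1.5)  # assumed values

def xor(a, b):
    h_or = step_gate(*OR, [a, b])             # layer 1
    h_nand = step_gate(*NAND, [a, b])         # layer 1, in parallel
    return step_gate(*AND, [h_or, h_nand])    # layer 2

for a in (0, 1):
    for b in (0, 1):
        print(a, b, step_gate(*AND, [a, b]), xor(a, b))
```

No single (weights, bias) pair can produce XOR's truth table, which is why the hidden layer is required.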

	## Float16 Architecture (Short)

	High-level dataflow:

	```
	float16.<op>
	a,b -> unpack -> classify -> core op -> normalize/round -> pack -> out
	```

	Step-by-step (condensed):
	1) Unpack sign/exponent/mantissa. Subnormals get an implicit leading 0 bit; normals get an implicit leading 1 bit.
	2) Classify inputs: zero, subnormal, normal, inf, NaN.
	3) Core op:
	- add/sub: align exponents, add/sub mantissas, compute sign.
	- mul/div: for mul, add the exponents and subtract the bias; for div, subtract the exponents and add the bias. Then multiply/divide the mantissas.
	- unary LUT: lookup output for each 16-bit input (torch.float16), with NaN canonicalization.
	- pow: ln(a) -> mul(b, ln(a)) -> exp, rounded at each stage.
	4) Normalize and round-to-nearest-even (CLZ + shifts).
	5) Pack sign/exponent/mantissa and mux special cases (NaN/Inf/zero).
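Steps 1, 2, 4, and 5 are bit manipulation on the IEEE 754 half-precision layout: 1 sign bit, 5 exponent bits with bias 15, and 10 stored mantissa bits. The sketch below shows these steps as ordinary Python rather than gates: unpacking, classifying, decoding with the implicit leading bit, round-to-nearest-even on extra low bits, and repacking. All names are hypothetical. It illustrates the format, not the repo's circuit wiring.

```python
def unpack(bits):
    """Split a 16-bit pattern into (sign, biased exponent, 10-bit stored mantissa)."""
    return (bits >> 15) & 1, (bits >> 10) & 0x1F, bits & 0x3FF

def classify(bits):
    _, exp, man = unpack(bits)
    if exp == 0x1F:
        return "nan" if man else "inf"
    if exp == 0:
        return "subnormal" if man else "zero"
    return "normal"

def to_float(bits):
    """Decode a finite half: implicit leading 1 for normals, 0 for subnormals."""
    sign, exp, man = unpack(bits)
    sig = ((1 if exp else 0) << 10) | man    # 11-bit significand
    e = max(exp, 1) - 15 - 10                # subnormals share exponent 1 - 15
    return (-1.0) ** sign * sig * 2.0 ** e

def round_nearest_even(sig, extra_bits):
    """Drop extra_bits (>= 1) low bits, rounding to nearest with ties to even.

    If rounding carries out of the top bit, the caller must renormalize.
    """
    kept = sig >> extra_bits
    rem = sig & ((1 << extra_bits) - 1)
    half = 1 << (extra_bits - 1)
    if rem > half or (rem == half and kept & 1):
        kept += 1
    return kept

def pack(sign, exp, man):
    return (sign << 15) | (exp << 10) | man

print(classify(0x3C00), to_float(0x3C00))   # normal 1.0
print(classify(0x0001), to_float(0x0001))   # smallest subnormal, 2**-24
print(classify(0x7E00), classify(0xFC00))   # nan inf
```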

	## Self-Documenting Format

	Each gate has three tensors in `arithmetic.safetensors`:
	- `.weight` -- input weights
	- `.bias` -- threshold
	- `.inputs` -- int64 tensor of signal IDs (ordered to match `.weight`)

	Signal registry in metadata maps IDs to names:

	```python
	from safetensors import safe_open
	import json

	with safe_open('arithmetic.safetensors', framework='pt') as f:
	    # signal_registry is a JSON string mapping signal IDs to signal names
	    registry = json.loads(f.metadata()['signal_registry'])
	    inputs = f.get_tensor('boolean.and.inputs')  # int64 signal IDs
	    names = [registry[str(i.item())] for i in inputs]
	    print(names)  # ['$a', '$b']
	```

	Signal naming:
	- `$name` -- circuit input (e.g., `$a`, `$dividend[0]`)
	- `#0`, `#1` -- constants
	- `gate.path` -- output of another gate

	Format details:
	- Metadata includes `signal_registry` (JSON map from ID to name) and `format_version` (currently `2.0`).
	- `.inputs` stores global signal IDs; these IDs are resolved through `signal_registry`.
	- External inputs are names starting with `$` or containing `.$` (e.g., `float16.add.$a[3]`).
	- Every gate has an `.inputs` tensor; `build.py` infers them, and `eval.py --inputs-coverage` fails if any input cannot be resolved.
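With this format, a gate-level evaluator only needs three operations: resolve each input ID through `signal_registry`, recursively evaluate any input that is itself a gate output, and apply the step function. This sketch works on an in-memory dict of tensors so it runs without the ~247 MB file. With the real file, the dict could be filled using `safe_open` and `f.get_tensor` as in the snippet above. The sketch assumes that a gate's output signal name equals its tensor prefix (signal `boolean.and` comes from `boolean.and.weight`) and that `.bias` holds a single value. It is not the repo's `calculator.py`, and the `demo.*` gates are made up.

```python
import numpy as np

def make_evaluator(tensors, registry, external):
    """Build eval_signal(name) -> 0/1 for one assignment of external inputs.

    tensors  : dict of '<gate>.weight' / '<gate>.bias' / '<gate>.inputs' -> array
    registry : dict of str(signal ID) -> signal name (the signal_registry JSON)
    external : dict of external input name (e.g. '$a') -> 0 or 1
    """
    cache = {}

    def eval_signal(name):
        if name in external:            # circuit input such as '$a'
            return external[name]
        if name in ("#0", "#1"):        # constant signals
            return int(name[1])
        if name not in cache:           # assumed: signal name == gate tensor prefix
            ids = np.asarray(tensors[name + ".inputs"]).ravel()
            xs = np.array([eval_signal(registry[str(int(i))]) for i in ids], dtype=np.float64)
            w = np.asarray(tensors[name + ".weight"], dtype=np.float64).ravel()
            b = float(np.asarray(tensors[name + ".bias"], dtype=np.float64).ravel()[0])
            cache[name] = int(np.dot(w, xs) + b >= 0)
        return cache[name]

    return eval_signal

# Hypothetical two-layer XOR laid out in the same format.
registry = {"0": "$a", "1": "$b", "2": "demo.or", "3": "demo.nand"}
tensors = {
    "demo.or.weight": np.array([1.0, 1.0]), "demo.or.bias": np.array([-0.5]),
    "demo.or.inputs": np.array([0, 1]),
    "demo.nand.weight": np.array([-1.0, -1.0]), "demo.nand.bias": np.array([1.5]),
    "demo.nand.inputs": np.array([0, 1]),
    "demo.xor.weight": np.array([1.0, 1.0]), "demo.xor.bias": np.array([-1.5]),
    "demo.xor.inputs": np.array([2, 3]),
}
for a in (0, 1):
    for b in (0, 1):
        print(a, b, make_evaluator(tensors, registry, {"$a": a, "$b": b})("demo.xor"))
```

Recursion keeps the sketch short. For the full circuit graph, an iterative topological order would avoid Python's recursion limit.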

	## How to Reproduce

	Rebuild tensors:

	```bash
	python build.py
	```

	Run full evaluation (always full + verbose):

	```bash
	python eval.py
	```

	Run coverage and input-routing validation:

	```bash
	python eval.py --coverage --inputs-coverage
	```

	Expected runtimes (ballpark, CPU dependent):
	- `build.py`: ~1-2 minutes, produces ~247 MB `arithmetic.safetensors`
	- `eval.py --coverage --inputs-coverage`: ~3-4 minutes for 211,581 tests

	## Running Eval

	```bash
	python eval.py
	```

	Tests all circuits. Small circuits are exhaustive; 16-bit arithmetic is sampled on grids (plus edge cases). Float16 tests cover special cases (NaN, Inf, +/-0, subnormals) plus normal arithmetic.
	Eval always runs the full suite with verbose output; there is no quick or quiet mode. Use `--circuit` to filter which circuits are reported.


	`--inputs-coverage` evaluates gates via their `.inputs` tensors using seeded external inputs and explicit overrides, and fails if inputs cannot be resolved. This is for coverage and routing sanity, not a correctness proof.

	## Python Calculator Interface (Gate-Level)

	`calculator.py` provides a pure gate-level evaluator that uses only `.inputs` + `.weight`/`.bias` (no arithmetic shortcuts). This is intended as a rigorous proof-of-concept for using the weights as a calculator.
	Expression mode routes even constants through a circuit (via `float16.add` with +0) to ensure gate-level evaluation.

	Examples:

	```bash
	python calculator.py float16.add 1.0 2.0
	python calculator.py float16.sqrt 2.0
	python calculator.py float16.add --inputs a=0x3c00 b=0x4000 --hex
	python calculator.py --expr "1 + 1"
	python calculator.py "sin(pi / 2)"
	python calculator.py --expr "1 + 1" --json
	python calculator.py "pi" --strict
	```

	Programmatic use:

	```python
	from calculator import ThresholdCalculator

	calc = ThresholdCalculator("arithmetic.safetensors")
	out, _ = calc.float16_binop("add", 1.0, 2.0)
	print(out)
	```

	## Development History

	Started as an 8-bit CPU project. Built boolean gates, then arithmetic (adders -> multipliers -> dividers), then CPU control logic. The CPU worked but the arithmetic core turned out to be the useful part, so it was extracted.

	Float16 was added later. The commit history shows the iterative process: float16.add went through multiple rounds of bug fixes for edge cases (zero handling, sign logic, normalization). Mul and div required multi-bit carry infrastructure.

	## Project Origin

	This began as an attempt to build a complete threshold-logic CPU. The CPU is in a separate repo (phanerozoic/8bit-threshold-computer). This repo focuses on the arithmetic core.

	## Roadmap

	Done:
	- Float16 core (add/sub/mul/div)
	- Float16 utilities (pack/unpack/normalize/conversions)
	- Float16 IEEE-754 half compliance for add/sub/mul/div + toint/fromint (including subnormals)
	- Float16 unary LUTs (sqrt/rsqrt/exp/ln/log2/log10/sin/cos/tan/tanh/asin/acos/atan/sinh/cosh/floor/ceil/round)
	- Float16 pow via exp(b * ln(a))
	- 16-bit integer arithmetic (add/sub/cmp/shifts/CLZ)
	- Boolean, threshold, modular, pattern recognition, combinational

	Next:
	- Add 32-bit integer arithmetic circuits (add/sub/shift/compare, then mul/div).
	- Add higher precision modes (float32 or fixed-point) for typical calculator accuracy.

	Cleanup:
	- None (8-bit arithmetic scaffolding removed)

	## TODO (Unified)

	- 32-bit integer circuits and an int32 calculator mode.
	- Degree-mode trig and implicit multiplication parsing.
	- Higher-precision arithmetic (float32 or fixed-point).
	- Complex-number support (or explicit domain errors).

	## Hugging Face Space (Proof of Concept)

	A minimal Space UI is included in `app.py`. It uses the gate-level evaluator only (no arithmetic shortcuts) and exposes a normal calculator-style expression input.

	Example expressions:
	- `1 + 1`
	- `sin(pi / 2)`
	- `exp(ln(2))`

	Notes:
	- All results are float16 (IEEE-754 half) and may be rounded.
	- `pow` uses the `exp(b*ln(a))` definition; negative bases yield NaN.

	## License

	Apache 2.0