Open Table Explorer

How to Gain Control of Your Data and Find Out Who's Lying to You

Greg Lawson

Free Data Analyst

Intro


The Metaphorical Problems to Be Solved

We Are Starving In the Midst Of Plenty


This Starving Metaphor


How Do We "Think Globally, Act Locally"?

Agenda


Project Evolution


Fourth Architecture Considered


What Is a Stream?

A Stream is a Ruby class that abstracts as much useful data management architecture as possible. A stream can be visualized as a possibly infinite table of rows and columns. Each column has its own data type (see [[2.2.9. Column Data Types]]). This project intends to leave most data management issues to the underlying database management system. Since Ruby on Rails has been chosen as both the user interface and the database interface, the capabilities that [[2.2.2.-ActiveRecord]] abstracts from, or simulates on top of, the underlying database system are the easiest to support. The tools specific to the underlying database system remain available, since [[2.2.2.-ActiveRecord]] makes few demands on the underlying database.
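The "possibly infinite table" idea can be sketched in plain Ruby with a lazy enumerator. This is only an illustration, not the project's Stream class: the column names, the column types, and the helper names `demo_stream` and `valid_row?` are all assumptions made for the example.

```ruby
# Hypothetical column declarations: one Ruby type per column.
COLUMN_TYPES = { time: Integer, watts: Float }.freeze

# Lazily generate rows, so the "table" never has to fit in memory.
# The values are made up; a real stream would acquire them from a source.
def demo_stream
  (0..Float::INFINITY).lazy.map do |i|
    { time: i * 60, watts: 100.0 + (i % 3) * 10.0 }
  end
end

# Check that every cell in a row matches its column's declared type.
def valid_row?(row)
  COLUMN_TYPES.all? { |name, type| row[name].is_a?(type) }
end

# Only the rows actually requested are ever computed.
first_rows = demo_stream.first(3)
first_rows.each { |row| puts row.inspect }
puts first_rows.all? { |row| valid_row?(row) }
```

Because the enumerator is lazy, asking for the first three rows computes exactly three rows, which is one way to work with a table that has no fixed end.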

Stream Storage Life-cycle

Each stream input is eventually consumed and deleted, to avoid an unbounded need to buy more disk drives. For practical reasons, however, the consumption of some data is delayed indefinitely.

Stream Patterns

Listing stream_patterns

Name
Acquisition
Parallel_Acquisition
Schedule
Parse
Storage
R_Plot
R_Report
R_Value

Stream Pattern Arguments

Name: Acquisition

Listing stream_pattern_arguments

Name        | Ruby type | Direction
Uri         | URI       | Input
acquisition | String    | Output

Listing stream_methods

Stream Pattern | Name   | Library            | Interface code
Acquisition    | HTTP   | net/http           | @acquisition = Net::HTTP.get(@uri.uri)
Acquisition    | File   |                    | @acquisition = IO.read(@uri.schemelessUrl)
Acquisition    | Shell  |                    | @acquisition = `#{@uri.schemelessUrl} 2>&1`
Parse          | HTML   | hpricot            | @parsedData = Hpricot(@dataToParse).search(@selection)
Parse          | XML    |                    | @parsedData = REXML::Document.new(@dataToParse).get_elements(@selection)
Parse          | Regexp |                    | @parsedData = Regexp.new(@selection).match(@dataToParse)
Parse          | JSON   |                    | @parsedData = JSON.parse(@dataToParse)[@selection]
Parse          | Binary |                    | @parsedData = @dataToParse.unpack(@selection)
Storage        | Store  | ActiveRecord::Base | @model_class.create(@name_value_pairs)
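A few of the interface-code entries in the stream_methods listing can be tried outside Rails using only the Ruby standard library. The sketch below covers the File acquisition and the JSON, Regexp, and Binary parsers; the helper names (`acquire_file`, `parse_json`, `parse_regexp`, `parse_binary`) and the sample data are assumptions for illustration and are not part of the project.

```ruby
require 'json'
require 'tempfile'

# Acquisition / File: read the whole file into a string.
def acquire_file(path)
  IO.read(path)
end

# Parse / JSON: decode the document, then pick one key.
def parse_json(data, selection)
  JSON.parse(data)[selection]
end

# Parse / Regexp: build the pattern from a string and match it.
def parse_regexp(data, selection)
  Regexp.new(selection).match(data)
end

# Parse / Binary: decode bytes using a String#unpack template.
def parse_binary(data, selection)
  data.unpack(selection)
end

# Made-up sample input written to a temporary file.
Tempfile.create('reading') do |file|
  file.write('{"watts": 1460.0, "device": "Toaster oven"}')
  file.flush
  acquired = acquire_file(file.path)
  puts parse_json(acquired, 'watts')
  puts parse_regexp(acquired, '"device": "([^"]+)"')[1]
end
puts parse_binary("\x01\x02", 'C2').inspect
```

Each helper takes the acquired data and a selection, mirroring the `@dataToParse` and `@selection` arguments in the listing.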

Where Should We Put the Computations?

Computational Tool | Scope | Usage
ActiveRecord | Within one record | Cells of the same record
ActiveRecord | Sequential scan of records | Cumulative sums, local noise computation, moving average, time series windowing and filtering
Active Relation | Tables | Table join computations
R | Between columns | Compute analog models
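Two of the sequential-scan computations named above, cumulative sums and a moving average, can be sketched in plain Ruby. In the project the values would presumably come from scanning ActiveRecord rows in order; here an ordinary array stands in for them, and the method names are assumptions for illustration.

```ruby
# Running totals over an ordered sequence of readings.
def cumulative_sums(values)
  total = 0.0
  values.map { |value| total += value }
end

# Simple moving average over a sliding window of fixed size.
def moving_average(values, window)
  values.each_cons(window).map { |slice| slice.sum / window.to_f }
end

readings = [1.0, 2.0, 3.0, 4.0, 5.0]
p cumulative_sums(readings)    # prints [1.0, 3.0, 6.0, 10.0, 15.0]
p moving_average(readings, 3)  # prints [2.0, 3.0, 4.0]
```

Both computations depend on row order, which is why they belong to a sequential scan rather than to a single record or a table join.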

Rationale


Why Open?


Why Table?


Why Home Energy Explorer?


Why Ruby?

  • Ruby code can be introspective, even self-referential.
  • Ruby is quietly ambitious as a programming language.
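As a small illustration of the introspection point, the sketch below asks a class about its own methods and calls one chosen by name at run time. The `Reading` class and its attributes are made up for the example.

```ruby
# A made-up class with two attributes.
class Reading
  attr_accessor :watts, :power_factor
end

reading = Reading.new

# Ask the class which instance methods it defines itself.
p Reading.instance_methods(false).sort

# Call a setter selected by name at run time.
reading.public_send(:watts=, 60.0)

# Ask the object what it responds to and what state it holds.
p reading.respond_to?(:watts)
p reading.instance_variables
```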

Why Ruby on Rails?


Why SQL Databases?


Why Statistical Analysis Programs?


Cognitive Biases


Narrative Fallacy


Applications


Solar Production


Electrical Consumption


Load Parameters

Description | Load measurement | Load | Power factor | Period | Duty cycle | Mode
Toaster oven | Measured-KAW | 1460.0 | 1.0 | Daily | 0.01 | On
Microwave oven | Measured-WUP | 12.0 | 0.62 | Weekly | 0.99 | Off
Dell laptop Sleep mode | Measured-KAW | 13.0 | | Weekly | 0.5 | Standby
Garbage disposal | Measured-KAW | 300.0 | 0.27 | Daily | 0.01 | On
Shredder | Measured-KAW | 170.0 | 0.23 | | 0.01 | On
Dryer spinning | Measured-WUP | 680.0 | 0.54 | Weekly | 0.02 | On
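One way these load parameters might be combined is sketched below. It rests on assumptions the slide does not state: that Load is real power in watts, that Duty cycle is the fraction of time the device spends in the listed mode, and that apparent power is real power divided by power factor. The struct and method names are hypothetical.

```ruby
# Hypothetical record for one row of the load parameters table.
ApplianceLoad = Struct.new(:description, :watts, :power_factor, :duty_cycle)

# Average power, assuming duty_cycle is the fraction of time in this mode.
def average_watts(appliance)
  appliance.watts * appliance.duty_cycle
end

# Apparent power in volt-amperes, assuming watts is real power.
def apparent_volt_amperes(appliance)
  appliance.watts / appliance.power_factor
end

toaster  = ApplianceLoad.new('Toaster oven', 1460.0, 1.0, 0.01)
disposal = ApplianceLoad.new('Garbage disposal', 300.0, 0.27, 0.01)

[toaster, disposal].each do |appliance|
  puts "#{appliance.description}: " \
       "#{average_watts(appliance).round(1)} W average, " \
       "#{apparent_volt_amperes(appliance).round(1)} VA apparent"
end
```

Under these assumptions, a low power factor such as the garbage disposal's means the apparent power is several times the measured real power.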

Levels at Which Electricity Can Be Measured

Measurement point | Number of measurement points | Unit cost | Full cost | Smart cost
whole house measurement | 1 | $200 | $200 | $200
breaker branch circuit measurement | 20 | $10 to $200 | $200 to $4,000 | $500
plug-bar measurement | 10 | $10 to $30 | $100 to $300 | $100 to $300
individual device measurement | 100 | $100 | $10,000 | $100
Total cost | 100 | - | $10,000 | $1,000 or less
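The Full cost column appears to be the number of measurement points multiplied by the unit cost range. The sketch below reproduces that arithmetic for the four measurement levels; the data structure and the method name `full_cost_range` are assumptions for illustration, and the Smart cost and Total cost figures are not modeled.

```ruby
# Counts and unit cost ranges (in dollars) copied from the table.
MEASUREMENT_POINTS = {
  'whole house'            => { count: 1,   unit_cost: 200..200 },
  'breaker branch circuit' => { count: 20,  unit_cost: 10..200 },
  'plug-bar'               => { count: 10,  unit_cost: 10..30 },
  'individual device'      => { count: 100, unit_cost: 100..100 }
}.freeze

# Full cost range = number of points times the low and high unit cost.
def full_cost_range(count, unit_cost)
  (count * unit_cost.min)..(count * unit_cost.max)
end

MEASUREMENT_POINTS.each do |name, point|
  range = full_cost_range(point[:count], point[:unit_cost])
  puts "#{name}: $#{range.min} to $#{range.max}"
end
```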

Electric Monitoring Primary Cost Drivers

Cost Depends On How You Discriminate Devices


Discrimination Algorithms

Telling things apart statistically is called discrimination (e.g., [http://en.wikipedia.org/wiki/Linear_discriminant_analysis Wikipedia linear discriminant analysis]). A device signature is a way of telling devices apart in a single measurement time series; that is, devices with the same device signature cannot be told apart. The following table lists increasingly complicated algorithms that need increasingly more data to discriminate between device signatures. Each algorithm adds another dimension to the discrimination space:

Algorithm | Example | Discrimination math
power consumption | 60 W light from 1000 W toaster oven | 1D level change detection
real and reactive power consumption | 100 W light bulb from 100 W motor | 2D complex-number discrimination
transient power analysis | 100 W motor from 100 W TV | measure transient ringing or resonance
schedule discrimination | 60 W front porch light from 60 W sprinkler | time-of-day probability distribution
branch discrimination | identical devices on different branch circuits | Gaussian elimination / generalized linear model
power consumption time sequence | 300 W washer from 300 W electric heater | state machine
multi-hypothesis tracking | any ambiguous case above | delay discrimination until sufficient time signature is collected
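The first two rows of the discrimination table can be sketched in a few lines of Ruby. This is an illustration under assumptions, not the project's algorithm: the threshold, the sample readings, the motor's power factor of 0.5, and the method names are all made up, and the 2D step simply picks the nearest known signature in the complex plane.

```ruby
# 1D level change detection: report [index, step] wherever power
# jumps by at least the threshold between consecutive samples.
def level_changes(watts, threshold)
  changes = []
  watts.each_cons(2).with_index(1) do |(before, after), index|
    step = after - before
    changes << [index, step] if step.abs >= threshold
  end
  changes
end

# Complex power from real power and power factor:
# reactive power = real power * tan(acos(power factor)).
def complex_power(real_watts, power_factor)
  reactive = real_watts * Math.tan(Math.acos(power_factor))
  Complex(real_watts, reactive)
end

# 2D discrimination: choose the signature closest to the observed step.
def nearest_device(step, signatures)
  signatures.min_by { |_name, signature| (step - signature).abs }.first
end

readings = [0.0, 0.0, 60.0, 60.0, 1060.0, 1060.0, 60.0]
p level_changes(readings, 30.0)

signatures = {
  'light bulb' => complex_power(100.0, 1.0),
  'motor'      => complex_power(100.0, 0.5) # assumed power factor
}
puts nearest_device(Complex(98.0, 170.0), signatures)
```

The light bulb and the motor draw the same real power, so the 1D detector alone cannot separate them; the reactive component is what moves them apart in the second dimension.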

Future Applications


Health Data

Device | Brand | Interface | Health | Raw | Derived
GPS and heart rate | Garmin Forerunner | USB, Windows text | exercise; heart rate monitoring | GPS location, pulse | Speed, calories
pedometer | Omron 2-axis pedometer | USB, Windows text | exercise | steps per hour | Distance, calories
Scale | Hemming Weigh | USB, Windows | Weight, body fat | impedance | percent fat, water
Blood Pressure | Microlife (Costco) | USB, Windows text | Blood pressure | Systolic, diastolic, pulse | other
Voice recorder | Panasonic | USB storage | Voice log, Sleep monitoring | mp3 |
Oximeter | | USB, Windows text; Arduino | Blood oxygen content | other |
Lung volume | | Arduino | Lung volume, inspiration and expiration rate | |
4 camera DVD | US Security Solutions | JPEG, Windows | Sleep monitoring | |

Project Hosting

  • GitHub
    • git Software Configuration Management
    • Wiki
    • Issue Tracker
    • HTML Pages
  • CDE - Code, Data, Environment packaging - purportedly packages up, automatically, everything required to execute a Linux command on another computer without any installation or configuration. It does require the same kernel version (e.g. 2.6X).
  • Intranet - Trusted Users
  • Heroku - Pay-as-you-compute Ruby on Rails hosting. I couldn't afford the storage costs of rapid acquisition.
  • Vision of a Grand and Glorious Future

Serving Data


Overview plots in the browser


Vision of a Grand and Glorious Future


Social Incentives


Licensing - How to Share Data?


My Opinion, But Don't Sue Me If I'm Wrong


Conclusion

Normal People Think | This Project Thinks
Software projects should only talk about software. | Software projects generate an endless stream of questions, whose answers may be in engineering, psychology, or information theory.
Programs should get bigger with time as features are added. | Programs should get smaller over time, as they are refactored and conceptually simplified.
We know what we are doing. | We are nowhere near as smart as we think we are. Cognitive biases are everywhere.
Statistics and information theory are boring subjects full of impenetrable jargon. | True. But the tedium can be largely delegated to the computer. Exploratory Data Analysis is an essential learning tool.
Early release is essential to gain mind share. | Solving problems takes time. The bottleneck is our learning curve.

Levels of Commitment


Project Interface


Power Law Added Value of Aggregating Statistical Databases
