today's todo:
// INITIAL SUPPORT FINISHED - add support for three valued logic
- create typing system which dynamically types the memory and revists
portions to double check the validity of claims.
lets talk about the type checker. it is going to based loosely on what
the current implementation (brian's) does. the only difference is that
the initial data structure pointers ('d' in this case) need to be
checked... from then on, only pointers need to be checked. (though this
begs the question whether we will continue to attempt to
................
short term todo -------------------------------------------------------
- add checks for each pointer dereference (before dereference!) to make
sure the pointer points to valid memory... this does not provide type
safety but it does provide memory safety
- add support for three valued logic in the system... i suggest maybe a
goto mechanism or a exception mechanism... or maybe something a lot
more clever... anyway, gotta do three valued logic
- start thinking of ways to deal with limiting space consumption (as
well as measuring it... (add methods to brian's code to dump sizes of
sets and what not... we want actual sizes... bytes.. perhaps
- at some point it might be a good idea to handle cyclic graphs!!!
--------------------
-- arg... 3 valued logic --------
alright... so it turns out that three valued logic is needed because
doubly linked lists are not 1->1 ... they are infact 1->1/0... which is
not 1->1... and we can not expect the programmer to include guards all
the time .... so maybe's it is!
about typing ***********************************************************
ok... so it turns out that typing is not that easy. first off, there is
a distinction between typing the root types and then tying
pointers. when you start off you can go ahead and type the regions that
are global variables. these regions that are typed use a different
system for testing types. basically when you are typing something and
determining sizes of structures and what not its important not to try to
type data structures as you go along because you may find that it is
impossible getting in an infinite loop... instead you must just look at
the fields and verify that they are in the correct regions. you must do
this because otherwise it is impossible to type things... the claim
might be that given a heap and a set of accesses to that heap with
specific claims as to the types of these accesses, is it possible to
determine whether that access is well typed? it may be better to only
type what you actually need... for instance always do field type checks
... of course the question might be, when do you go ahead and type an
entire structure.
now typing is of course a moot point in a memory-safe, type-safe
language like Java.
there is another issue of explicit array bounds checks versus allowing
these to be checked by the typemaps. if you have two adjacent arrays
with the same types back to back then an out of bounds check on the
first may not fail in the typemap (unless it goes beyond the 2nd array
and you are lucky even in this case.) explicit array bounds checks may
be in order.
the issue may boil down to this: what can we guarantee in a system where
we check things at a structure level in some places but at a field level
in other places? can we guarantee anything? if not, perhaps we should
just use the typemaps as valid memory checks at a field level so that we
don't get unexpected crashes at runtime...
in short: this problem seems to be a lot more complicated then what was
perceived initially and the returns seem to be diminishing the more we
think about it. certainly the system now proposed (check root data
structures and any pointers accessed) seems to be inately flawed in some
sense (though i can not put my finger on it :)
i think my time is better spent working on something else... ?
************************************************************************
#STOPPED#
about typemaps : allow flag to gear level of checks. nothing, bounds,
the types. implement bounds check for now.
can we implement typemaps so that they are fast usually and slow if
something is wrong?
we're interested in space. hit theoretically low without reaching
quadratic time
create optimum benchmark sets... recognize typpical specs and generalize
heuristics to deal with these...
MARTIN's IDEA ************************************************* 7/9/03
use memory/type safe language and check only on reads (when data is
accessed) check consistency properties only when the property needs to
be consistent... only when it needs the property to be true
**********************************************************************
issue: what if you are building the model and you are adding elements to
a relation and the sets that compose those relations don't exist yet?
hmm... maybe the question boils down to a question of whether the items
being added to a relation are implied in the addition... that is if we
have:
R: S->S
condition(a) => in R
then if condition(a) => b in S then we never need to check for that
relations (R)'s type... we only need to check if condition(a) => b in S
hold ups (statically).
if not, in this case the question is whether its better to do
a post exhaustive step or a per addition check ... in the per addition
step we will obviously need to ensure that the sets that compose the
domain and range of a relation are built prior to the rules that add
elements to relations (i'm thinking that its better to do a per addition
check because the post exhaustive step seems just as expensive and
messier!)
question still remains though: do we need to build the sets prior?
well... of course we do unless we want the worklist to also trigger
rules based upon the additions to the sets of the respective
relations... of course we would only be interested if a particular
relation addition was blocked because of such a check...
what do we do now? well we don't even properly include required
constraints in the required symbol list... that is being changed right
now ... we'll have to compare this with the older graph
ok... the answer is that the graph changes once we make this change
... i'm checking out if cycles appear... i'm also going to consider the
problem of not having a well defined relation domain/range
(i.e. integers rather than a strictly typed set)
------------------------------------------------------------------------
1. check at compile time how relations are used. if they are used in
both directions (meaning inverse and otherwise) generate both
simultaneously so that checks are fast .... otherwise only generate
one of them that is used... code for generation should automatically
handle the necessary switches and double-adding...
new issues - it is not always possible to precalculate (or even
calculate) an offset for a particular field... this would be the case if
a field relied on a particular structure's fields and that fields'
offset relied upon itself or something beyond it in its own
structure...
compilation plan xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
1. determine preconditions (required sets/relations), post conditions
(generated sets), and measures of difficult in checking constraints
2. find best ordering of rule building to minimize the cost of space
and time.
3. build ordering and generate code.
set traversal xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
1. when accessing set traversal costs assume that sets do not have
cycles and that set traversal can occur in constant time if necessary.
2. on first run, do a linear space traversal to determine a time bound
on which the set traversal should finish. from then on use a constant
space traversal with a dynamic time bound.
accessing ease of relation building xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
1. rank ease of relation building by whether the relation is a field
or a constant time countable function of the fields.
changing lattices xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
1. examine how relations are built. for instance, if we can guarantee
that duplicates shouldn't occur, and we might be able to if we use a
hash just in case.
2. if a relation is only used for counting and we are guaranteed not
to double count than just keep the count around (saves massive space)
plan of action xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
1. determine rules for generating pre/post conditions and cost measures
for rule building.
2. devise an algorithm to schedule these rule building and constraint
checking mechanisms.
3. be warned about tight coupling of set building and consistency
checking. it may be better to decouple them and set up pre/post
conditions on the consistency checks (per quantifier)
4. test scheduler against file example and compare to hand coded
plan of action revised xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
1. perform topological sort of all rules, TS[nodes]
2. mark sets that need to be built in linear space if
a. it appears inside a guard expression or inclusion expr
b. it is marked "build" in the spec (see redundant)
c. if it appears as the range in a relation that is not
statically type safe
3. mark relations that need to be built in linear space if
a. they appear in constraints and are difficult to build
b. a cycle has been found
4. detect singletons and replace with constants and wrap traversals with
'if-else' checks. upon failure define sets that the quantifier
appears in as empty and check as usual
5. for all sets that appear in constraints and relations that need to be
built, sort them topologically by TS and the generate each one in order.
6. pile up set building on top of previous set traversals unless a set
is marked 'build'. then build it and each traversal dependent on build
should begin afresh.
7. at each interface point, assess what can be checked and check it. the
idea is that since we are guaranteeing that everything that needs to
traversed is traversed if we check whenever we can, what we can then
we should end up doing a complete check.
hacks xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
1. statically find singleton sets and optimize them out of quantifiers
issues xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
1. domain type problems with relations during fast traversal
S(node*);
R(S->S);
[], true => root in S
[forall s in S], s.next != null => ~~ in R
[forall s in S], s.next != null AND s.flag = 1 => s.next in S
the problem here is that if you have a simple linked list,
root(1)->element(0) where the parantheses mark the value of .flag
then the above system will add to R but element is
not in S because flag = 0. In brian's system the relations undergo a
post process to determine if any elements exist don't exist in the
domain/range of the relation.
the problem appears when the relation is being quickly traversed (via
pattern matching). now it is possible for a relation to be returned
that is not truly a match. the solution we have determined at this
point is to assume that most specs won't have this problem (its
arguably a problem in the spec!) and to basically do perform these
two static tests:
if [forall s in S], s.next != null => ~~~~ in R
AND s.next != null => s.next in S
then R is safe and no fast traversal should return non-elements,
regardless of the state of the data structure. If it can not be
determined whether a relation is type safe, then a linear space
storage of S must be done so that the relations can be
existence-checked against the stored set
2. redundancy in sets may result in slowdown
assume we have a set of file references (set R) which reference files
(set F). These files can contain a large amount of block
references (set B). Now assume that we have one file but thousands of
references to that file. If there a test on the blocks of a file than
there can be a slow down caused by a constant space traversal of the
sets that is disproportiate to the gain seen.
clarification. lets assume that we test that blocks are marked as
used. if sets R and F and B were generated in linear space than the
number of checks done to test the constraint on B would be O(B).
now assume that we traverse R, F, and B in constant space. Because
each reference points to the same file and we have no way of knowing
in constant space whether we are rechecking each file over and over,
we will check the constraints on B O(R*B) times. this is much worse
and represents a jump in the time complexity.
static analysis of the specs will not be able to determine such a
problem as a broken data structure is a perfectly viable input to the
system. therefore it is necessary to have the program be given hints
about the redundancy of structures. If a structure is highly
redundant than that structure should be stored in linear space to
fight the problem of quadratic blow up.
the hint could be in the form of a keyword addition to the space file
language. another option is to allow the checker to dynamically
detect redundancy (how? maybe hash table? maybe sparse linear time
checks) and change the way its builds sets realtime. for simplicity,
the spec hint will be used.
3. how do page accesses translate to incremental model checks?
TBD.
~~