Repair/RepairCompiler/compile.plan

   1
   2 today's todo:
   3
   4 // INITIAL SUPPORT FINISHED - add support for three valued logic
   5
   6 - create typing system which dynamically types the memory and revists
   7   portions to double check the validity of claims.
   8
   9 lets talk about the type checker. it is going to based loosely on what
  10 the current implementation (brian's) does. the only difference is that
  11 the initial data structure pointers ('d' in this case) need to be
  12 checked... from then on, only pointers need to be checked. (though this
  13 begs the question whether we will continue to attempt to
  14
  15 ................
  16
  17
  18
  19
  20
  21
  22
  23 short term todo -------------------------------------------------------
  24
  25 - add checks for each pointer dereference (before dereference!) to make
  26   sure the pointer points to valid memory... this does not provide type
  27   safety but it does provide memory safety
  28
  29 - add support for three valued logic in the system... i suggest maybe a
  30   goto mechanism or a exception mechanism... or maybe something a lot
  31   more clever... anyway, gotta do three valued logic
  32
  33 - start thinking of ways to deal with limiting space consumption (as
  34   well as measuring it... (add methods to brian's code to dump sizes of
  35   sets and what not... we want actual sizes... bytes.. perhaps
  36
  37 - at some point it might be a good idea to handle cyclic graphs!!!
  38
  39 --------------------
  40
  41
  42
  43 -- arg... 3 valued logic --------
  44 alright... so it turns out that three valued logic is needed because
  45 doubly linked lists are not 1->1 ... they are infact 1->1/0... which is
  46 not 1->1... and we can not expect the programmer to include guards all
  47 the time .... so maybe's it is!
  48
  49
  50
  51
  52 about typing ***********************************************************
  53 ok... so it turns out that typing is not that easy. first off, there is
  54 a distinction between typing the root types and then tying
  55 pointers. when you start off you can go ahead and type the regions that
  56 are global variables. these regions that are typed use a different
  57 system for testing types. basically when you are typing something and
  58 determining sizes of structures and what not its important not to try to
  59 type data structures as you go along because you may find that it is
  60 impossible getting in an infinite loop... instead you must just look at
  61 the fields and verify that they are in the correct regions. you must do
  62 this because otherwise it is impossible to type things... the claim
  63 might be that given a heap and a set of accesses to that heap with
  64 specific claims as to the types of these accesses, is it possible to
  65 determine whether that access is well typed? it may be better to only
  66 type what you actually need... for instance always do field type checks
  67 ... of course the question might be, when do you go ahead and type an
  68 entire structure.
  69
  70 now typing is of course a moot point in a memory-safe, type-safe
  71 language like Java.
  72
  73 there is another issue of explicit array bounds checks versus allowing
  74 these to be checked by the typemaps. if you have two adjacent arrays
  75 with the same types back to back then an out of bounds check on the
  76 first may not fail in the typemap (unless it goes beyond the 2nd array
  77 and you are lucky even in this case.) explicit array bounds checks may
  78 be in order.
  79
  80 the issue may boil down to this: what can we guarantee in a system where
  81 we check things at a structure level in some places but at a field level
  82 in other places? can we guarantee anything? if not, perhaps we should
  83 just use the typemaps as valid memory checks at a field level so that we
  84 don't get unexpected crashes at runtime...
  85
  86 in short: this problem seems to be a lot more complicated then what was
  87 perceived initially and the returns seem to be diminishing the more we
  88 think about it. certainly the system now proposed (check root data
  89 structures and any pointers accessed) seems to be inately flawed in some
  90 sense (though i can not put my finger on it :)
  91
  92 i think my time is better spent working on something else... ?
  93 ************************************************************************
  94
  95
  96 #STOPPED#
  97
  98 about typemaps : allow flag to gear level of checks. nothing, bounds,
  99 the types. implement bounds check for now.
 100
 101
 102
 103 can we implement typemaps so that they are fast usually and slow if
 104 something is wrong?
 105
 106 we're interested in space. hit theoretically low without reaching
 107 quadratic time
 108
 109 create optimum benchmark sets... recognize typpical specs and generalize
 110 heuristics to deal with these...
 111
 112
 113
 114
 115 MARTIN's IDEA ************************************************* 7/9/03
 116 use memory/type safe language and check only on reads (when data is
 117 accessed) check consistency properties only when the property needs to
 118 be consistent... only when it needs the property to be true
 119 **********************************************************************
 120
 121 issue: what if you are building the model and you are adding elements to
 122 a relation and the sets that compose those relations don't exist yet?
 123 hmm... maybe the question boils down to a question of whether the items
 124 being added to a relation are implied in the addition... that is if we
 125 have:
 126
 127 R: S->S
 128 condition(a) => <a, f(a)> in R
 129
 130 then if condition(a) => b in S then we never need to check for that
 131 relations (R)'s type... we only need to check if condition(a) => b in S
 132 hold ups (statically).
 133
 134 if not, in this case the question is whether its better to do
 135 a post exhaustive step or a per addition check ... in the per addition
 136 step we will obviously need to ensure that the sets that compose the
 137 domain and range of a relation are built prior to the rules that add
 138 elements to relations (i'm thinking that its better to do a per addition
 139 check because the post exhaustive step seems just as expensive and
 140 messier!)
 141
 142 question still remains though: do we need to build the sets prior?
 143 well... of course we do unless we want the worklist to also trigger
 144 rules based upon the additions to the sets of the respective
 145 relations... of course we would only be interested if a particular
 146 relation addition was blocked because of such a check...
 147
 148 what do we do now? well we don't even properly include required
 149 constraints in the required symbol list... that is being changed right
 150 now ... we'll have to compare this with the older graph
 151
 152 ok... the answer is that the graph changes once we make this change
 153 ... i'm checking out if cycles appear... i'm also going to consider the
 154 problem of not having a well defined relation domain/range
 155 (i.e. integers rather than a strictly typed set)
 156
 157 ------------------------------------------------------------------------
 158
 159 1. check at compile time how relations are used. if they are used in
 160    both directions (meaning inverse and otherwise) generate both
 161    simultaneously so that checks are fast .... otherwise only generate
 162    one of them that is used... code for generation should automatically
 163    handle the necessary switches and double-adding...
 164
 165
 166 new issues - it is not always possible to precalculate (or even
 167 calculate) an offset for a particular field... this would be the case if
 168 a field relied on a particular structure's fields and that fields'
 169 offset relied upon itself or something beyond it in its own
 170 structure...
 171
 172 compilation plan xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 173
 174 1. determine preconditions (required sets/relations), post conditions
 175    (generated sets), and measures of difficult in checking constraints
 176
 177 2. find best ordering of rule building to minimize the cost of space
 178    and time.
 179
 180 3. build ordering and generate code.
 181
 182
 183
 184 set traversal xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 185
 186 1. when accessing set traversal costs assume that sets do not have
 187    cycles and that set traversal can occur in constant time if necessary.
 188
 189 2. on first run, do a linear space traversal to determine a time bound
 190    on which the set traversal should finish. from then on use a constant
 191    space traversal with a dynamic time bound.
 192
 193
 194
 195 accessing ease of relation building xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 196
 197 1. rank ease of relation building by whether the relation is a field
 198 or a constant time countable function of the fields.
 199
 200
 201
 202 changing lattices xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 203
 204 1. examine how relations are built. for instance, if we can guarantee
 205    that duplicates shouldn't occur, and we might be able to if we use a
 206    hash just in case.
 207
 208 2. if a relation is only used for counting and we are guaranteed not
 209    to double count than just keep the count around (saves massive space)
 210
 211
 212
 213 plan of action xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 214
 215 1. determine rules for generating pre/post conditions and cost measures
 216    for rule building.
 217
 218 2. devise an algorithm to schedule these rule building and constraint
 219    checking mechanisms.
 220
 221 3. be warned about tight coupling of set building and consistency
 222    checking. it may be better to decouple them and set up pre/post
 223    conditions on the consistency checks (per quantifier)
 224
 225 4. test scheduler against file example and compare to hand coded
 226
 227
 228
 229 plan of action revised xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 230
 231 1. perform topological sort of all rules, TS[nodes]
 232
 233 2. mark sets that need to be built in linear space if
 234    a. it appears inside a guard expression or inclusion expr
 235    b. it is marked "build" in the spec (see redundant)
 236    c. if it appears as the range in a relation that is not
 237       statically type safe
 238
 239 3. mark relations that need to be built in linear space if
 240    a. they appear in constraints and are difficult to build
 241    b. a cycle has been found
 242
 243 4. detect singletons and replace with constants and wrap traversals with
 244    'if-else' checks. upon failure define sets that the quantifier
 245    appears in as empty and check as usual
 246
 247 5. for all sets that appear in constraints and relations that need to be
 248    built, sort them topologically by TS and the generate each one in order.
 249
 250 6. pile up set building on top of previous set traversals unless a set
 251    is marked 'build'. then build it and each traversal dependent on build
 252    should begin afresh.
 253
 254 7. at each interface point, assess what can be checked and check it. the
 255    idea is that since we are guaranteeing that everything that needs to
 256    traversed is traversed if we check whenever we can, what we can then
 257    we should end up doing a complete check.
 258
 259
 260
 261 hacks xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 262
 263 1. statically find singleton sets and optimize them out of quantifiers
 264
 265
 266
 267 issues xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 268
 269 1. domain type problems with relations during fast traversal
 270
 271    S(node*);
 272    R(S->S);
 273    [], true => root in S
 274    [forall s in S], s.next != null => <s, s.next> in R
 275    [forall s in S], s.next != null AND s.flag = 1 => s.next in S
 276
 277    the problem here is that if you have a simple linked list,
 278    root(1)->element(0) where the parantheses mark the value of .flag
 279    then the above system will add <root, element> to R but element is
 280    not in S because flag = 0. In brian's system the relations undergo a
 281    post process to determine if any elements exist don't exist in the
 282    domain/range of the relation.
 283
 284    the problem appears when the relation is being quickly traversed (via
 285    pattern matching). now it is possible for a relation to be returned
 286    that is not truly a match. the solution we have determined at this
 287    point is to assume that most specs won't have this problem (its
 288    arguably a problem in the spec!) and to basically do perform these
 289    two static tests:
 290
 291    if [forall s in S], s.next != null => <s, s.next> in R
 292       AND  s.next != null => s.next in S
 293
 294    then R is safe and no fast traversal should return non-elements,
 295    regardless of the state of the data structure. If it can not be
 296    determined whether a relation is type safe, then a linear space
 297    storage of S must be done so that the relations can be
 298    existence-checked against the stored set
 299
 300 2. redundancy in sets may result in slowdown
 301
 302    assume we have a set of file references (set R) which reference files
 303    (set F). These files can contain a large amount of block
 304    references (set B). Now assume that we have one file but thousands of
 305    references to that file. If there a test on the blocks of a file than
 306    there can be a slow down caused by a constant space traversal of the
 307    sets that is disproportiate to the gain seen.
 308
 309    clarification. lets assume that we test that blocks are marked as
 310    used. if sets R and F and B were generated in linear space than the
 311    number of checks done to test the constraint on B would be O(B).
 312
 313    now assume that we traverse R, F, and B in constant space. Because
 314    each reference points to the same file and we have no way of knowing
 315    in constant space whether we are rechecking each file over and over,
 316    we will check the constraints on B O(R*B) times. this is much worse
 317    and represents a jump in the time complexity.
 318
 319    static analysis of the specs will not be able to determine such a
 320    problem as a broken data structure is a perfectly viable input to the
 321    system. therefore it is necessary to have the program be given hints
 322    about the redundancy of structures. If a structure is highly
 323    redundant than that structure should be stored in linear space to
 324    fight the problem of quadratic blow up.
 325
 326    the hint could be in the form of a keyword addition to the space file
 327    language. another option is to allow the checker to dynamically
 328    detect redundancy (how? maybe hash table? maybe sparse linear time
 329    checks) and change the way its builds sets realtime. for simplicity,
 330    the spec hint will be used.
 331
 332 3. how do page accesses translate to incremental model checks?
 333
 334    TBD.
 335
 336