Subject: Re: Re: Proposal: Sum type branches as extended types (as Type!Constructor) Newsgroups: gmane.comp.lang.haskell.cafe Date: Saturday 5th June 2010 01:13:05 UTC (over 7 years ago) Gabriel Riba wrote: > New proposal draft: > > Proposal: Type supplement for constructor specific uses of sum types > > Purpose: Avoid error clauses (runtime errors), exception control or Maybe > types in partially defined (constructor specific) functions on sum types. > > As an example, with > > data List a = Nil | Cons a (List a) > > * Actual system, with runtime errors (as in GHC Data.List head) or > exception throwing or optional Maybe results. > > hd :: List a -> a > hd (Cons x _) -> x > hd Nil -> error "error: hd: empty list" -- error or exception throwing > > * Proposed system extending types with a suffix @ Constructor or @ > {Constructor1, Constructor2, ..} > > hd :: List @ Cons a -> a > hd (Cons x _) = x > > The caller must do pattern matching before applying the constructor-specific > function. Since the goal is to propagate case analysis information, the syntax should reflect that. That is, there should be support for nested patterns, e.g., cdar :: [a]@{_:_:_} -> a cdar (_:x:_) = x cadr :: [[a]]@{(_:_):_} -> a cadr ((_:xs):_) = xs headFromJust :: [Maybe a]@{Just _ : _} -> a ... The type@cons syntax is a nice shorthand, but we need to express the arguments to the data constructors in the extended syntax in order to support nested patterns. For delimiting multiple alternatives, it's not clear that comma is the best delimiter to use, especially since it could be the data constructor for tuples. Perhaps using ; or | would be better. Unless there's a syntactic reason for preferring braces over parentheses, perhaps we should just use parentheses for symmetry with as-patterns. Finally, there should also be support for negative patterns, i.e., propagation of the failure to match a pattern. One place this is useful is for distinguishing 0 from other numbers, which allows removing the error branches from functions like division. Sometimes we can't enumerate all the positive patterns we want to allow, but it's easy to express what should be disallowed. To match case analysis we should allow for a conjunction of negative patterns followed by a positive pattern. Or, if we want to incorporate multiple positive patterns, then a conjunction of negative patterns followed by a disjunction of positive patterns. (Disjunctive case matching has been an independent proposal in the past, and there's nothing prohibiting supporting it.) Thus, if we use | to delimit disjunctions, & to delimit conjunctions, and \\ to separate the disjuncts from the conjuncts, given the following case analysis: case x : T of p1 y1... -> e1 p2 y2... -> e2 _ -> eF The variable x has type T outside of the case expression. Within the branch e1 it is given the refinement type T@{p1 _...} where variables bound by the pattern are replaced with wildcards. In branch e2 it is given the refinement type T@{p2 _... \\ p1 _...}. This can be simplified to T@{p2 _...} if the head constructors of p2 and p1 are distinct. And in the eF branch x would be given the refinement type T@{_ \\ p1 _... & p2_...}. If this semantics is too hard to implement, we could instead require the use of as-patterns for introducing the refinements. The variable introduced by @ in the as-pattern would be given the refinement type, but the scrutinee would continue to have the unrefined type. This latter semantics is common in dependently typed languages, but it's verbose and ugly so it'd be nice to avoid it if we can. Other notes: Case matching on non-variable expressions would gain no extra support, since we have no variable to associate the refinement information with (unless as-patterns are used). A refinement type T@{p1} can always be weakened to T@{p1 | p2}. Similarly, a refinement type can always be weakened by erasing it. For type inference, I'd suggest that functions which do not have rigid signatures are treated the way they currently are; that is, all refinement information is weakened away unless it is explicitly requested or returned by a function's type signature. This could be improved upon later, but seems like the most reasonable place to start. One complication of trying to infer refinement types is that if we are too liberal then we won't catch bugs arising from non-exhaustive pattern matching. Syntax-wise, there's no particular reason for distinguishing difference from conjunctions under difference. That is, the type T@{... \\ p1 & p2} could just as well be written T@{... \\ p1 \\ p2}. And there's no need for conjunctions under disjunction because we can unify the patterns to get their intersection. Thus, it might be best to just have disjunction and difference for simplicity. -- Live well, ~wren |
|||