User Documentation
svec.sql_in
00001 /* ----------------------------------------------------------------------- *//** 
00002  *
00003  * @file svec.sql_in
00004  *
00005  * @brief SQL type definitions and functions for sparse vector data type
00006  *        <tt>svec</tt>
00007  *
00008  * @sa For an introduction to the sparse vector implementation, see the module
00009  *     description \ref grp_svec.
00010  *
00011  *//* ----------------------------------------------------------------------- */
00012 
00013 m4_include(`SQLCommon.m4')
00014 
00015 /**
00016 @addtogroup grp_svec
00017 
00018 @about
00019 
00020 This module implements a sparse vector data type named "svec", which 
00021 gives compressed storage of sparse vectors with many duplicate elements.
00022 
00023 When we use arrays of floating point numbers for various calculations, 
00024     we will sometimes have long runs of zeros (or some other default value). 
00025     This is common in applications like scientific computing, 
00026     retail optimization, and text processing. Each floating point number takes 
00027     8 bytes of storage in memory and/or disk, so saving those zeros is often 
00028     worthwhile. There are also many computations that can benefit from skipping
00029     over the zeros.
00030 
00031     To focus the discussion, consider, for example, the following 
00032     array of doubles stored as a Postgres/GP "float8[]" data type:
00033 
00034 \code
00035 '{0, 33,...40,000 zeros..., 12, 22 }'::float8[].
00036 \endcode
00037 
00038     This array would occupy slightly more than 320KB of memory/disk, most of 
00039     it zeros. Even if we were to exploit the null bitmap and store the zeros 
00040     as nulls, we would still end up with a 5KB null bitmap, which is still 
00041     not nearly as memory efficient as we'd like. Also, as we perform various 
00042     operations on the array, we'll often be doing work on 40,000 fields that 
00043     would turn out not to be important. 
00044 
00045     To solve the problems associated with the processing of sparse vectors 
00046     discussed above, we adopt a simple Run Length Encoding (RLE) scheme to 
00047     represent sparse vectors as pairs of count-value arrays. So, for example, 
00048     the array above would be represented as follows
00049 
00050 \code
00051 '{1,1,40000,1,1}:{0,33,0,12,22}'::MADLIB_SCHEMA.svec,
00052 \endcode
00053 
00054     which says there is 1 occurrence of 0, followed by 1 occurrence of 33, 
00055     followed by 40,000 occurrences of 0, etc. In contrast to the naive 
00056     representations, we only need 5 integers and 5 floating point numbers
00057     to store the array. Further, it is easy to implement vector operations 
00058     that can take advantage of the RLE representation to make computations 
00059     faster. The module provides a library of such functions.
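
    The encoding described above can be sketched in a few lines of Python.
    This is only an illustration of the count-value idea, not the module's C
    implementation; the names rle_encode and rle_decode are hypothetical.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse runs of equal values into parallel (counts, values) lists."""
    counts, uniques = [], []
    for value, run in groupby(values):
        counts.append(sum(1 for _ in run))
        uniques.append(value)
    return counts, uniques


def rle_decode(counts, uniques):
    """Expand a (counts, values) pair back into the dense list."""
    dense = []
    for count, value in zip(counts, uniques):
        dense.extend([value] * count)
    return dense


# The array from the text: 0, 33, then 40,000 zeros, then 12, 22.
dense = [0.0, 33.0] + [0.0] * 40000 + [12.0, 22.0]
counts, uniques = rle_encode(dense)
print(counts, uniques)
```

    The 40,004-element list collapses to five counts and five values, and
    decoding the pair restores the original list.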
00060 
00061     The current version only supports sparse vectors of float8
00062     values. Future versions will support other base types.
00063 
00064 @usage
00065 
00066     SVECs can be constructed directly as follows:
00067     <pre>
00068     SELECT '{n1,n2,...,nk}:{v1,v2,...,vk}'::MADLIB_SCHEMA.svec;
00069     </pre>
00070     where <tt>n1,n2,...,nk</tt> specifies the counts for the values <tt>v1,v2,...,vk</tt>.
00071     
00072     Alternatively, SVECs can be cast from a float array:
00073     <pre>
00074     SELECT ('{v1,v2,...,vk}'::float[])::MADLIB_SCHEMA.svec;
00075     </pre>  
00076 
00077     Syntax reference can be found in svec.sql_in.
00078 
00079     Users need to add MADLIB_SCHEMA to their search_path to use the svec operators
00080     defined in the module.
00081 
00082 @examp
00083 
00084     We can use operations with svec type like <, >, *, **, /, =, +, SUM, etc, 
00085     and they have meanings associated with typical vector operations. For 
00086     example, the plus (+) operator adds each of the terms of two vectors having
00087     the same dimension together. 
00088 \code
00089 sql> SELECT ('{0,1,5}'::float8[]::MADLIB_SCHEMA.svec + '{4,3,2}'::float8[]::MADLIB_SCHEMA.svec)::float8[];
00090  float8  
00091 ---------
00092  {4,4,7}
00093 \endcode
00094 
00095     Without the casting into float8[] at the end, we get:
00096 \code
00097 sql> SELECT '{0,1,5}'::float8[]::MADLIB_SCHEMA.svec + '{4,3,2}'::float8[]::MADLIB_SCHEMA.svec;
00098  ?column?  
00099 ----------
00100 {2,1}:{4,7}             
00101 \endcode
00102 
00103     A dot product (%*%) between the two vectors will result in a scalar 
00104     result of type float8. The dot product should be (0*4 + 1*3 + 5*2) = 13, 
00105     like this:
00106 \code
00107 sql> SELECT '{0,1,5}'::float8[]::MADLIB_SCHEMA.svec %*% '{4,3,2}'::float8[]::MADLIB_SCHEMA.svec;
00108  ?column? 
00109 ----------
00110     13
00111 \endcode
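
    To illustrate how a vector operation can work directly on the compressed
    form, here is a hedged Python sketch of a dot product over two count-value
    pairs. It is an assumption about how such an operation could be written,
    not a description of the C code behind %*%; the name rle_dot is
    hypothetical, and run counts are assumed to be positive.

```python
def rle_dot(a, b):
    """Dot product of two run-length-encoded vectors of equal dimension.

    Each argument is a (counts, values) pair. Overlapping runs are
    multiplied once and scaled by the length of the overlap, so long
    runs are never expanded.
    """
    (ca, va), (cb, vb) = a, b
    if sum(ca) != sum(cb):
        raise ValueError("dimension mismatch")
    i = j = 0
    left_a = ca[0] if ca else 0
    left_b = cb[0] if cb else 0
    total = 0.0
    while i < len(ca) and j < len(cb):
        overlap = min(left_a, left_b)
        total += overlap * va[i] * vb[j]
        left_a -= overlap
        left_b -= overlap
        if left_a == 0:  # run in a is used up, move to the next one
            i += 1
            if i < len(ca):
                left_a = ca[i]
        if left_b == 0:  # run in b is used up, move to the next one
            j += 1
            if j < len(cb):
                left_b = cb[j]
    return total


# {0,1,5} . {4,3,2} from the example above
print(rle_dot(([1, 1, 1], [0, 1, 5]), ([1, 1, 1], [4, 3, 2])))  # 13.0
```

    The loop runs once per pair of overlapping runs, so a run of 40,000 zeros
    costs a single multiplication.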
00112 
00113     Special vector aggregate functions are also available. SUM is 
00114     self-explanatory. SVEC_COUNT_NONZERO evaluates the count of non-zero terms 
00115     in each column found in a set of n-dimensional svecs and returns an 
00116     svec with the counts. For instance, if we have the vectors {0,1,5},
00117     {10,0,3},{0,0,3},{0,1,0}, then executing the SVEC_COUNT_NONZERO() aggregate
00118     function would result in {1,2,3}:
00119 
00120 \code
00121 sql> create table list (a MADLIB_SCHEMA.svec);
00122 sql> insert into list values ('{0,1,5}'::float8[]), ('{10,0,3}'::float8[]), ('{0,0,3}'::float8[]),('{0,1,0}'::float8[]);
00123 
00124 sql> SELECT MADLIB_SCHEMA.svec_count_nonzero(a)::float8[] FROM list;
00125 svec_count_nonzero 
00126 -----------------
00127     {1,2,3}
00128 \endcode
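
    The per-column counting can be sketched on dense Python lists as follows.
    This only mirrors the behaviour described above; the function name
    count_nonzero_columns is hypothetical.

```python
def count_nonzero_columns(vectors):
    """Count, per column, how many of the equal-length vectors are non-zero there."""
    if not vectors:
        return []
    totals = [0] * len(vectors[0])
    for vec in vectors:
        if len(vec) != len(totals):
            raise ValueError("all vectors must have the same dimension")
        for k, value in enumerate(vec):
            # Each vector adds at most 1 to a column, however large the value.
            totals[k] += 1 if value != 0 else 0
    return totals


rows = [[0, 1, 5], [10, 0, 3], [0, 0, 3], [0, 1, 0]]
print(count_nonzero_columns(rows))  # [1, 2, 3]
```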
00129 
00130     We do not use null bitmaps in the svec data type. A null value in an svec 
00131     is represented explicitly as an NVP (No Value Present) value. For example, 
00132     we have:
00133 \code
00134 sql> SELECT '{1,2,3}:{4,null,5}'::MADLIB_SCHEMA.svec;
00135       svec        
00136 -------------------
00137  {1,2,3}:{4,NVP,5}
00138 
00139 sql> SELECT '{1,2,3}:{4,null,5}'::MADLIB_SCHEMA.svec + '{2,2,2}:{8,9,10}'::MADLIB_SCHEMA.svec; 
00140          ?column?         
00141  --------------------------
00142   {1,2,1,2}:{12,NVP,14,15}
00143 \endcode
00144 
00145     An element of an svec can be accessed using the svec_proj() function,
00146     which takes an svec and the index of the element desired.
00147 \code
00148 sql> SELECT MADLIB_SCHEMA.svec_proj('{1,2,3}:{4,5,6}'::MADLIB_SCHEMA.svec, 1) + MADLIB_SCHEMA.svec_proj('{4,5,6}:{1,2,3}'::MADLIB_SCHEMA.svec, 15);     
00149  ?column? 
00150 ----------
00151     7
00152 \endcode
00153 
00154     A subvector of an svec can be accessed using the svec_subvec() function,
00155     which takes an svec and the start and end index of the subvector desired.
00156 \code
00157 sql> SELECT MADLIB_SCHEMA.svec_subvec('{2,4,6}:{1,3,5}'::MADLIB_SCHEMA.svec, 2, 11);
00158    svec_subvec   
00159 ----------------- 
00160  {1,4,5}:{1,3,5}
00161 \endcode
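
    Both lookups can be done by walking the runs instead of expanding the
    vector. The Python sketch below assumes 1-based, inclusive indexes, as the
    two examples above suggest; rle_proj and rle_subvec are hypothetical names,
    and the handling of out-of-range indexes is an assumption.

```python
def rle_proj(counts, values, index):
    """Return the element at a 1-based index by walking the runs."""
    if index < 1:
        raise IndexError("index must be >= 1")
    remaining = index
    for count, value in zip(counts, values):
        if remaining <= count:
            return value
        remaining -= count
    raise IndexError("index out of range")


def rle_subvec(counts, values, start, end):
    """Return the (counts, values) pair for elements start..end (1-based, inclusive)."""
    out_counts, out_values = [], []
    run_start = 1
    for count, value in zip(counts, values):
        run_end = run_start + count - 1
        lo, hi = max(run_start, start), min(run_end, end)
        if lo <= hi:  # this run overlaps the requested window
            out_counts.append(hi - lo + 1)
            out_values.append(value)
        run_start = run_end + 1
    return out_counts, out_values


print(rle_proj([1, 2, 3], [4, 5, 6], 1) + rle_proj([4, 5, 6], [1, 2, 3], 15))  # 7
print(rle_subvec([2, 4, 6], [1, 3, 5], 2, 11))  # ([1, 4, 5], [1, 3, 5])
```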
00162 
00163     The elements/subvector of an svec can be changed using the function 
00164     svec_change(). It takes three arguments: an m-dimensional svec sv1, a
00165     start index j, and an n-dimensional svec sv2 such that j + n - 1 <= m,
00166     and returns an svec like sv1 but with the subvector sv1[j:j+n-1] 
00167     replaced by sv2. An example follows:
00168 \code
00169 sql> SELECT MADLIB_SCHEMA.svec_change('{1,2,3}:{4,5,6}'::MADLIB_SCHEMA.svec,3,'{2}:{3}'::MADLIB_SCHEMA.svec);
00170      svec_change     
00171 ---------------------
00172  {1,1,2,2}:{4,5,3,6}
00173 \endcode
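
    The replacement rule can be checked with a small Python sketch. For
    clarity it expands both vectors, overwrites the slice, and compresses the
    result again, which is simpler than what an efficient implementation would
    do. All function names are hypothetical.

```python
from itertools import groupby


def expand(counts, values):
    """Turn a (counts, values) pair into a dense list."""
    return [v for c, v in zip(counts, values) for _ in range(c)]


def compress(dense):
    """Turn a dense list into a (counts, values) pair."""
    runs = [(len(list(group)), value) for value, group in groupby(dense)]
    return [c for c, _ in runs], [v for _, v in runs]


def rle_change(sv1, j, sv2):
    """Replace sv1[j : j+n-1] (1-based) with the n-dimensional sv2."""
    dense1, dense2 = expand(*sv1), expand(*sv2)
    m, n = len(dense1), len(dense2)
    if j < 1 or j + n - 1 > m:
        raise ValueError("sv2 does not fit inside sv1 at position j")
    dense1[j - 1:j - 1 + n] = dense2
    return compress(dense1)


print(rle_change(([1, 2, 3], [4, 5, 6]), 3, ([2], [3])))
# ([1, 1, 2, 2], [4, 5, 3, 6])
```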
00174 
00175     There are also higher-order functions for processing svecs. For example,
00176     the following is the corresponding function for lapply() in R.
00177 \code
00178 sql> SELECT MADLIB_SCHEMA.svec_lapply('sqrt', '{1,2,3}:{4,5,6}'::MADLIB_SCHEMA.svec);
00179                   svec_lapply                  
00180 -----------------------------------------------
00181  {1,2,3}:{2,2.23606797749979,2.44948974278318}
00182 \endcode
00183 
00184     The full list of functions available for operating on svecs is given
00185     in svec.sql_in.
00186 
00187 <b> A More Extensive Example</b>
00188 
00189     For a text classification example, let's assume we have a dictionary 
00190     composed of words in a sorted text array:
00191 \code
00192 sql> create table features (a text[]);
00193 sql> insert into features values 
00194             ('{am,before,being,bothered,corpus,document,i,in,is,me,
00195                never,now,one,really,second,the,third,this,until}');
00196 \endcode
00197     We have a set of documents, each represented as an array of words:
00198 \code
00199 sql> create table documents(a int,b text[]);
00200 sql> insert into documents values
00201             (1,'{this,is,one,document,in,the,corpus}'),
00202             (2,'{i,am,the,second,document,in,the,corpus}'),
00203             (3,'{being,third,never,really,bothered,me,until,now}'),
00204             (4,'{the,document,before,me,is,the,third,document}');
00205 \endcode
00206 
00207     Now that we have a dictionary and some documents, we would like to do some 
00208     document categorization using vector arithmetic on word counts and 
00209     proportions of dictionary words in each document.
00210 
00211     To start this process, we'll need to find the dictionary words in each 
00212     document. We'll prepare what is called a Sparse Feature Vector or SFV 
00213     for each document. An SFV is a vector of dimension N, where N is the 
00214     number of dictionary words, and in each cell of an SFV is a count of 
00215     each dictionary word in the document.
00216 
00217     Inside the sparse vector library, we have a function that will create 
00218     an SFV from a document, so we can just do this:
00219 \code
00220 sql> SELECT MADLIB_SCHEMA.svec_sfv((SELECT a FROM features LIMIT 1),b)::float8[] 
00221          FROM documents;
00222 
00223                 svec_sfv
00224 -----------------------------------------
00225  {0,0,0,0,1,1,0,1,1,0,0,0,1,0,0,1,0,1,0}
00226  {0,0,1,1,0,0,0,0,0,1,1,1,0,1,0,0,1,0,1}
00227  {1,0,0,0,1,1,1,1,0,0,0,0,0,0,1,2,0,0,0}
00228  {0,1,0,0,0,2,0,0,1,1,0,0,0,0,0,2,1,0,0}
00229 \endcode
00230     Note that the output of MADLIB_SCHEMA.svec_sfv() is an svec for each 
00231     document containing the count of each of the dictionary words in the 
00232     ordinal positions of the dictionary. This can more easily be understood 
00233     by lining up the feature vector and text like this:
00234 \code
00235 sql> SELECT MADLIB_SCHEMA.svec_sfv((SELECT a FROM features LIMIT 1),b)::float8[]
00236                 , b 
00237          FROM documents;
00238 
00239                 svec_sfv                 |                        b                         
00240 -----------------------------------------+--------------------------------------------------
00241  {1,0,0,0,1,1,1,1,0,0,0,0,0,0,1,2,0,0,0} | {i,am,the,second,document,in,the,corpus}
00242  {0,1,0,0,0,2,0,0,1,1,0,0,0,0,0,2,1,0,0} | {the,document,before,me,is,the,third,document}
00243  {0,0,0,0,1,1,0,1,1,0,0,0,1,0,0,1,0,1,0} | {this,is,one,document,in,the,corpus}
00244  {0,0,1,1,0,0,0,0,0,1,1,1,0,1,0,0,1,0,1} | {being,third,never,really,bothered,me,until,now}
00245 
00246 sql> SELECT * FROM features;
00247                                                 a                                                    
00248 --------------------------------------------------------------------------------------------------------
00249 {am,before,being,bothered,corpus,document,i,in,is,me,never,now,one,really,second,the,third,this,until}
00250 \endcode
00251 
00252     Now when we look at the document "i am the second document in the corpus", 
00253     its SFV is {1,3*0,1,1,1,1,6*0,1,2,3*0}. The word "am" is the first ordinate in 
00254     the dictionary and there is 1 instance of it in the SFV. The word "before" 
00255     has no instances in the document, so its value is "0" and so on.
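
    As a cross-check of the counts above, the SFV of a single document can be
    reproduced with a short Python sketch. It only illustrates the definition
    of an SFV and says nothing about how svec_sfv() is implemented; the
    function name sparse_feature_vector is hypothetical.

```python
from collections import Counter

dictionary = ("am before being bothered corpus document i in is me "
              "never now one really second the third this until").split()


def sparse_feature_vector(dictionary, document):
    """Count how often each dictionary word occurs in the document, in dictionary order."""
    counts = Counter(document)
    return [counts[word] for word in dictionary]


doc2 = "i am the second document in the corpus".split()
print(sparse_feature_vector(dictionary, doc2))
# [1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0]
```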
00256 
00257     The function MADLIB_SCHEMA.svec_sfv() can process large 
00258     numbers of documents into their SFVs in parallel at high speed.
00259 
00260     The rest of the categorization process is all vector math. The actual 
00261     count is hardly ever used.  Instead, it's turned into a weight. The most 
00262     common weight is called tf/idf for Term Frequency / Inverse Document 
00263     Frequency. The calculation for a given term in a given document is 
00264 \code
00265 {#Times in document} * log {#Documents / #Documents the term appears in}.
00266 \endcode
00267     For instance, the term "document" in document 1 would have weight 
00268     1 * log (4/3). In document 4, it would have weight 2 * log (4/3).
00269     Terms that appear in every document would have tf/idf weight 0, since 
00270     log (4/4) = log(1) = 0. (Our example has no term like that.) That 
00271     usually sends a lot of values to 0.
00272 
00273     For this part of the processing, we'll need to have a sparse vector of 
00274     the dictionary dimension (19) with the values 
00275 \code
00276 log(#documents/#Documents each term appears in). 
00277 \endcode
00278     There will be one such vector for the whole list of documents (aka the 
00279     "corpus"). The #documents is just a count of all of the documents, in 
00280     this case 4, but there is one divisor for each dictionary word and its 
00281     value is the number of documents in which that word appears. 
00282     This single vector for the whole corpus can then be multiplied element 
00283     by element with each document SFV to produce the Term Frequency/Inverse 
00284     Document Frequency weights.
00285 
00286     This can be done as follows:
00287 \code
00288 sql> create table corpus as 
00289             (SELECT a, MADLIB_SCHEMA.svec_sfv((SELECT a FROM features LIMIT 1),b) sfv 
00290          FROM documents);
00291 sql> create table weights as
00292           (SELECT a docnum, MADLIB_SCHEMA.svec_mult(sfv, logidf) tf_idf 
00293            FROM (SELECT MADLIB_SCHEMA.svec_log(MADLIB_SCHEMA.svec_div(count(sfv)::MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec_count_nonzero(sfv))) logidf 
00294                 FROM corpus) foo, corpus ORDER BY docnum);
00295 sql> SELECT * FROM weights;
00296 
00297 docnum |                tf_idf                                     
00298 -------+----------------------------------------------------------------------
00299      1 | {4,1,1,1,2,3,1,2,1,1,1,1}:{0,0.69,0.28,0,0.69,0,1.38,0,0.28,0,1.38,0}
00300      2 | {1,3,1,1,1,1,6,1,1,3}:{1.38,0,0.69,0.28,1.38,0.69,0,1.38,0.57,0}
00301      3 | {2,2,5,1,2,1,1,2,1,1,1}:{0,1.38,0,0.69,1.38,0,1.38,0,0.69,0,1.38}
00302      4 | {1,1,3,1,2,2,5,1,1,2}:{0,1.38,0,0.57,0,0.69,0,0.57,0.69,0}
00303 \endcode
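
    The weights in the table can be reproduced outside the database with the
    following Python sketch. It uses dense lists and the natural logarithm,
    and it assumes every dictionary word occurs in at least one document (true
    for this corpus), so no divisor is zero. Variable and function names are
    hypothetical.

```python
import math
from collections import Counter

dictionary = ("am before being bothered corpus document i in is me "
              "never now one really second the third this until").split()
documents = {
    1: "this is one document in the corpus".split(),
    2: "i am the second document in the corpus".split(),
    3: "being third never really bothered me until now".split(),
    4: "the document before me is the third document".split(),
}


def sfv(words):
    """Term counts in dictionary order."""
    counts = Counter(words)
    return [counts[w] for w in dictionary]


corpus = {n: sfv(words) for n, words in documents.items()}
n_docs = len(corpus)

# Number of documents each dictionary word appears in (the per-word divisor).
doc_freq = [sum(1 for vec in corpus.values() if vec[k] > 0)
            for k in range(len(dictionary))]
log_idf = [math.log(n_docs / df) for df in doc_freq]

# Element-by-element product of each SFV with the shared log-idf vector.
weights = {n: [tf * w for tf, w in zip(vec, log_idf)]
           for n, vec in corpus.items()}

k = dictionary.index("document")
print(doc_freq[k], weights[1][k], weights[4][k])
```

    For "document" this gives log(4/3), about 0.2877, in document 1 and twice
    that, about 0.5754, in document 4. The table shows these as 0.28 and 0.57,
    which suggests the displayed values were cut to two decimals.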
00304 
00305     We can now get the "angular distance" between one document and the rest 
00306     of the documents using the ACOS of the dot product of the document vectors,
00307     normalized by the product of their L2 norms. The following calculates the 
00308     angular distance between the first document and each of the other documents:
00309 \code
00310 sql> SELECT docnum,
00311                 180. * ( ACOS( MADLIB_SCHEMA.svec_dmin( 1., MADLIB_SCHEMA.svec_dot(tf_idf, testdoc) 
00312                     / (MADLIB_SCHEMA.svec_l2norm(tf_idf)*MADLIB_SCHEMA.svec_l2norm(testdoc))))/3.141592654) angular_distance 
00313          FROM weights,(SELECT tf_idf testdoc FROM weights WHERE docnum = 1 LIMIT 1) foo 
00314          ORDER BY 1;
00315 
00316 docnum | angular_distance 
00317 --------+------------------
00318      1 |                0
00319      2 | 78.8235846096986
00320      3 | 89.9999999882484
00321      4 | 80.0232034288617
00322 \endcode
00323     We can see that the angular distance between document 1 and itself 
00324     is 0 degrees and between document 1 and 3 is 90 degrees because they 
00325     share no features at all. The angular distance can now be plugged into
00326     machine learning algorithms that rely on a distance measure between
00327     data points.
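
    The same formula can be written as a short Python sketch on dense lists.
    It uses math.pi instead of the rounded constant in the query, which is why
    orthogonal vectors come out at 90 degrees here rather than
    89.9999999882484. The lower clamp at -1 is an addition that the query
    above does not have; the function name is hypothetical.

```python
import math


def angular_distance_degrees(u, v):
    """Angle between two vectors, in degrees, from the normalized dot product."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("angle is undefined for a zero vector")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cosine))


print(angular_distance_degrees([3.0, 4.0], [3.0, 4.0]))  # identical vectors
print(angular_distance_degrees([1.0, 0.0], [0.0, 2.0]))  # no shared features
```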
00328     
00329     SVEC also provides a function for building a vector from an array of positions and an array of values. Every position that is not
00330     listed is filled with a base value that the user supplies in the same function call. In the example below, the first array (integers)
00331     gives the positions of the entries in the second array (floats). Positions do not need to be sorted.
00332     The third argument is the desired size of the resulting vector, which guarantees that the vector has that size
00333     even if the last position is smaller. If the size is less than 1, it is ignored and the vector ends at the last position in the position array. The final argument is a float giving the base value used for the unlisted positions (0 is a common choice):
00334 \code
00335 sql> SELECT MADLIB_SCHEMA.svec_cast_positions_float8arr(ARRAY[1,2,7,5,87],ARRAY[.1,.2,.7,.5,.87],90,0.0);
00336 
00337         svec_cast_positions_float8arr            
00338 -----------------------------------------------------
00339 {1,1,2,1,1,1,79,1,3}:{0.1,0.2,0,0.5,0,0.7,0,0.87,0}
00340 (1 row)
00341 \endcode
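
    The result above can be reproduced with the Python sketch below.
    It assumes 1-based positions, which is what the sample output implies,
    and it raises an error when a position exceeds the requested size; how
    the real function treats that case is not stated here. The names
    from_positions and to_runs are hypothetical.

```python
from itertools import groupby


def from_positions(positions, values, max_size, base):
    """Build a dense list from 1-based positions, filling gaps with base."""
    if len(positions) != len(values):
        raise ValueError("positions and values must have the same length")
    size = max_size if max_size >= 1 else max(positions)
    if max(positions) > size:
        raise ValueError("position exceeds requested size")  # assumed behaviour
    dense = [base] * size
    for pos, val in zip(positions, values):
        dense[pos - 1] = val
    return dense


def to_runs(dense):
    """Compress a dense list into (counts, values)."""
    runs = [(len(list(group)), value) for value, group in groupby(dense)]
    return [c for c, _ in runs], [v for _, v in runs]


dense = from_positions([1, 2, 7, 5, 87], [.1, .2, .7, .5, .87], 90, 0.0)
print(to_runs(dense))
# ([1, 1, 2, 1, 1, 1, 79, 1, 3], [0.1, 0.2, 0.0, 0.5, 0.0, 0.7, 0.0, 0.87, 0.0])
```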
00342 
00343     Other examples of svec usage can be found in the k-means module.
00344 
00345 @sa File svec.sql_in documenting the SQL functions.
00346 
00347 @internal
00348 @sa File sparse_vector.c documenting the implementation in C.
00349 @endinternal
00350 */
00351 
00352 
00353 --! @file svec.sql_in
00354 --!
00355 
00356 -- DROP SCHEMA MADLIB_SCHEMA CASCADE;
00357 -- CREATE SCHEMA MADLIB_SCHEMA;
00358 
00359 -- DROP TYPE IF EXISTS MADLIB_SCHEMA.svec CASCADE;
00360 CREATE TYPE MADLIB_SCHEMA.svec;
00361 
00362 --! SVEC constructor from CSTRING.
00363 --!
00364 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_in(cstring)
00365     RETURNS MADLIB_SCHEMA.svec
00366     AS 'MODULE_PATHNAME'
00367     LANGUAGE C IMMUTABLE STRICT;
00368 
00369 --! Converts SVEC to CSTRING.
00370 --!
00371 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_out(MADLIB_SCHEMA.svec)
00372     RETURNS cstring
00373     AS 'MODULE_PATHNAME'
00374     LANGUAGE C IMMUTABLE STRICT;
00375 
00376 --! Converts SVEC internal representation to SVEC.
00377 --!
00378 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_recv(internal)
00379     RETURNS MADLIB_SCHEMA.svec
00380     AS 'MODULE_PATHNAME'
00381     LANGUAGE C IMMUTABLE STRICT;
00382 
00383 --! Converts SVEC to BYTEA.
00384 --!
00385 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_send(MADLIB_SCHEMA.svec)
00386     RETURNS bytea
00387     AS 'MODULE_PATHNAME'
00388     LANGUAGE C IMMUTABLE STRICT;
00389 
00390 CREATE TYPE MADLIB_SCHEMA.svec (
00391        internallength = VARIABLE, 
00392        input = MADLIB_SCHEMA.svec_in,
00393        output = MADLIB_SCHEMA.svec_out,
00394        send = MADLIB_SCHEMA.svec_send,
00395        receive = MADLIB_SCHEMA.svec_recv,
00396        storage=EXTENDED,
00397        alignment = double
00398 );
00399 
00400 --! Basic floating point scalar operator: MIN.
00401 --!
00402 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_dmin(float8,float8) RETURNS float8 AS 'MODULE_PATHNAME', 'float8_min' LANGUAGE C IMMUTABLE; 
00403 
00404 --! Basic floating point scalar operator: MAX.
00405 --!
00406 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_dmax(float8,float8) RETURNS float8 AS 'MODULE_PATHNAME', 'float8_max' LANGUAGE C IMMUTABLE; 
00407 
00408 --! Counts the number of non-zero entries in the input vector; the second argument is capped at 1, then added to the first; used as the sfunc in the svec_count_nonzero() aggregate below.
00409 --!
00410 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_count(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) RETURNS MADLIB_SCHEMA.svec 
00411 AS 'MODULE_PATHNAME', 'svec_count' STRICT LANGUAGE C IMMUTABLE; 
00412 
00413 --! Adds two SVECs together, element by element.
00414 --!
00415 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_plus(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_plus' STRICT LANGUAGE C IMMUTABLE; 
00416 
00417 --! Subtracts the second SVEC from the first, element by element.
00418 --!
00419 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_minus(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_minus' STRICT LANGUAGE C IMMUTABLE; 
00420 
00421 --! Computes the logarithm of each element of the input SVEC.
00422 --!
00423 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_log(MADLIB_SCHEMA.svec) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_log' STRICT LANGUAGE C IMMUTABLE; 
00424 
00425 --! Divides the first SVEC by the second, element by element.
00426 --!
00427 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_div(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_div' STRICT LANGUAGE C IMMUTABLE; 
00428 
00429 --! Multiplies two SVECs together, element by element.
00430 --!
00431 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_mult(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_mult' STRICT LANGUAGE C IMMUTABLE; 
00432 
00433 --! Raises each element of the first SVEC to the power given by second SVEC, which must have dimension 1 (a scalar).
00434 --!
00435 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_pow(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_pow' STRICT LANGUAGE C IMMUTABLE; 
00436 
00437 --! Returns true if two SVECs are equal. If the two SVECs have different sizes, the function returns false.
00438 --!
00439 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_eq(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) RETURNS boolean AS 'MODULE_PATHNAME', 'svec_eq' STRICT LANGUAGE C IMMUTABLE;
00440 
00441 --! Returns true if two SVECs are equal, not counting zeros (zero equals anything). If the two SVECs have different sizes, the function effectively zero-pads the shorter one and then performs the comparison.
00442 --!
00443 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_eq_non_zero(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) RETURNS boolean AS 'MODULE_PATHNAME', 'svec_eq_non_zero' STRICT LANGUAGE C IMMUTABLE;
00444 
00445 --! Returns true if the left SVEC contains the right one, meaning that every non-zero value in the right SVEC equals the value at the same position in the left one.
00446 --!
00447 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_contains(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) RETURNS boolean AS 'MODULE_PATHNAME', 'svec_contains' STRICT LANGUAGE C IMMUTABLE; 
00448 
00449 --! Returns true if two float8 arrays are equal
00450 --!
00451 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.float8arr_eq(float8[],float8[]) RETURNS boolean AS 'MODULE_PATHNAME', 'float8arr_equals' LANGUAGE C IMMUTABLE;
00452 
00453 --! Subtracts the second array from the first array, element by element.
00454 --!
00455 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.float8arr_minus_float8arr(float8[],float8[]) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'float8arr_minus_float8arr' LANGUAGE C IMMUTABLE;
00456 
00457 --! Subtracts the second SVEC from the first array, element by element.
00458 --!
00459 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.float8arr_minus_svec(float8[],MADLIB_SCHEMA.svec) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'float8arr_minus_svec' LANGUAGE C IMMUTABLE;
00460 
00461 --! Subtracts the second array from the first SVEC, element by element.
00462 --!
00463 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_minus_float8arr(MADLIB_SCHEMA.svec,float8[]) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_minus_float8arr' LANGUAGE C IMMUTABLE;
00464 
00465 --! Adds two arrays together, element by element.
00466 --!
00467 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.float8arr_plus_float8arr(float8[],float8[]) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'float8arr_plus_float8arr' LANGUAGE C IMMUTABLE;
00468 
00469 --! Adds an array and an SVEC, element by element.
00470 --!
00471 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.float8arr_plus_svec(float8[],MADLIB_SCHEMA.svec) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'float8arr_plus_svec' LANGUAGE C IMMUTABLE;
00472 
00473 --! Adds an SVEC and an array, element by element.
00474 --!
00475 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_plus_float8arr(MADLIB_SCHEMA.svec,float8[]) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_plus_float8arr' LANGUAGE C IMMUTABLE;
00476 
00477 --! Multiplies two float8 arrays, element by element.
00478 --!
00479 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.float8arr_mult_float8arr(float8[],float8[]) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'float8arr_mult_float8arr' LANGUAGE C IMMUTABLE;
00480 
00481 --! Multiplies an array and an SVEC, element by element.
00482 --!
00483 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.float8arr_mult_svec(float8[],MADLIB_SCHEMA.svec) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'float8arr_mult_svec' LANGUAGE C IMMUTABLE;
00484 
00485 --! Multiplies an SVEC and an array, element by element.
00486 --!
00487 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_mult_float8arr(MADLIB_SCHEMA.svec,float8[]) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_mult_float8arr' LANGUAGE C IMMUTABLE;
00488 
00489 --! Divides a float8 array by another, element by element.
00490 --!
00491 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.float8arr_div_float8arr(float8[],float8[]) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'float8arr_div_float8arr' LANGUAGE C IMMUTABLE;
00492 
00493 --! Divides a float8 array by an SVEC, element by element.
00494 --!
00495 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.float8arr_div_svec(float8[],MADLIB_SCHEMA.svec) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'float8arr_div_svec' LANGUAGE C IMMUTABLE;
00496 
00497 --! Divides an SVEC by a float8 array, element by element.
00498 --!
00499 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_div_float8arr(MADLIB_SCHEMA.svec,float8[]) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_div_float8arr' LANGUAGE C IMMUTABLE;
00500 
00501 --! Computes the dot product of two SVECs.
00502 --!
00503 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_dot(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) RETURNS float8 AS 'MODULE_PATHNAME', 'svec_dot' STRICT LANGUAGE C IMMUTABLE; 
00504 
00505 --! Computes the dot product of two float8 arrays.
00506 --!
00507 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_dot(float8[],float8[]) RETURNS float8 AS 'MODULE_PATHNAME', 'float8arr_dot' STRICT LANGUAGE C IMMUTABLE; 
00508 
00509 --! Computes the dot product of an SVEC and a float8 array.
00510 --!
00511 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_dot(MADLIB_SCHEMA.svec,float8[]) RETURNS float8 AS 'MODULE_PATHNAME', 'svec_dot_float8arr' STRICT LANGUAGE C IMMUTABLE; 
00512 
00513 --! Computes the dot product of a float8 array and an SVEC.
00514 --!
00515 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_dot(float8[],MADLIB_SCHEMA.svec) RETURNS float8 AS 'MODULE_PATHNAME', 'float8arr_dot_svec' STRICT LANGUAGE C IMMUTABLE; 
00516 
00517 --! Computes the l2norm of an SVEC.
00518 --!
00519 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_l2norm(MADLIB_SCHEMA.svec) RETURNS float8 AS 'MODULE_PATHNAME', 'svec_l2norm' STRICT LANGUAGE C IMMUTABLE; 
00520 
00521 --! Computes the l2norm of a float8 array.
00522 --!
00523 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_l2norm(float8[]) RETURNS float8 AS 'MODULE_PATHNAME', 'float8arr_l2norm' LANGUAGE C IMMUTABLE;
00524 
00525 --! Computes the l2norm distance between two SVECs.
00526 --!
00527 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.l2norm(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) 
00528 RETURNS float8 AS 'MODULE_PATHNAME', 'svec_svec_l2norm' LANGUAGE C STRICT IMMUTABLE;
00529 
00530 --! Computes the l1norm distance between two SVECs.
00531 --!
00532 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.l1norm(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) 
00533 RETURNS float8 AS 'MODULE_PATHNAME', 'svec_svec_l1norm' LANGUAGE C STRICT IMMUTABLE;
00534 
00535 --! Computes the l1norm of an SVEC.
00536 --!
00537 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_l1norm(MADLIB_SCHEMA.svec) RETURNS float8 AS 'MODULE_PATHNAME', 'svec_l1norm' STRICT LANGUAGE C IMMUTABLE; 
00538 
00539 --! Computes the l1norm of a float8 array.
00540 --!
00541 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_l1norm(float8[]) RETURNS float8 AS 'MODULE_PATHNAME', 'float8arr_l1norm' STRICT LANGUAGE C IMMUTABLE; 
00542 
00543 --! Computes the angle between two SVECs in radians.
00544 --!
00545 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.angle(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) 
00546 RETURNS float8 AS 'MODULE_PATHNAME', 'svec_svec_angle' LANGUAGE C STRICT IMMUTABLE;
00547 
00548 --! Computes the Tanimoto distance between two SVECs.
00549 --!
00550 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.tanimoto_distance(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) 
00551 RETURNS float8 AS 'MODULE_PATHNAME', 'svec_svec_tanimoto_distance' LANGUAGE C STRICT IMMUTABLE;
00552 
00553 --! Unnests an SVEC into a table of uncompressed values  
00554 --!
00555 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_unnest(MADLIB_SCHEMA.svec) RETURNS setof float8  AS 'MODULE_PATHNAME', 'svec_unnest' LANGUAGE C IMMUTABLE; 
00556 
00557 --! Appends an element to the back of an SVEC.
00558 --!
00559 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_pivot(MADLIB_SCHEMA.svec,float8) RETURNS MADLIB_SCHEMA.svec  AS 'MODULE_PATHNAME', 'svec_pivot' LANGUAGE C IMMUTABLE; 
00560 
00561 --! Sums the elements of an SVEC.
00562 --!
00563 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_elsum(MADLIB_SCHEMA.svec) RETURNS float8 AS 'MODULE_PATHNAME', 'svec_summate' STRICT LANGUAGE C IMMUTABLE; 
00564 
00565 --! Sums the elements of a float8 array.
00566 --!
00567 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_elsum(float8[]) RETURNS float8 AS 'MODULE_PATHNAME', 'float8arr_summate' STRICT LANGUAGE C IMMUTABLE; 
00568 
00569 --! Computes the median element of a float8 array.
00570 --!
00571 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_median(float8[]) RETURNS float8 AS 'MODULE_PATHNAME', 'float8arr_median' STRICT LANGUAGE C IMMUTABLE; 
00572 
00573 --! Computes the median element of an SVEC.
00574 --!
00575 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_median(MADLIB_SCHEMA.svec) RETURNS float8 AS 'MODULE_PATHNAME', 'svec_median' STRICT LANGUAGE C IMMUTABLE; 
00576 
00577 --! Compares an SVEC to a float8, and returns positions of all elements not equal to the float as an array. Element index here starts at 0.
00578 --!
00579 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_nonbase_positions(MADLIB_SCHEMA.svec, FLOAT8) RETURNS INT8[] AS 'MODULE_PATHNAME', 'svec_nonbase_positions' STRICT LANGUAGE C IMMUTABLE;
00580 
00581 --! Compares an SVEC to a float8, and returns values of all elements not equal to the float as an array.
00582 --!
00583 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_nonbase_values(MADLIB_SCHEMA.svec, FLOAT8) RETURNS FLOAT8[] AS 'MODULE_PATHNAME', 'svec_nonbase_values' STRICT LANGUAGE C IMMUTABLE; 
00584 
00585 
00586 --! Casts an int2 into an SVEC.
00587 --!
00588 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_cast_int2(int2) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_cast_int2' STRICT LANGUAGE C IMMUTABLE; 
00589 
00590 --! Casts an int4 into an SVEC.
00591 --!
00592 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_cast_int4(int4) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_cast_int4' STRICT LANGUAGE C IMMUTABLE; 
00593 
00594 --! Casts an int8 into an SVEC.
00595 --!
00596 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_cast_int8(bigint) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_cast_int8' STRICT LANGUAGE C IMMUTABLE; 
00597 
00598 --! Casts a float4 into an SVEC.
00599 --!
00600 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_cast_float4(float4) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_cast_float4' STRICT LANGUAGE C IMMUTABLE; 
00601 
00602 --! Casts a float8 into an SVEC.
00603 --!
00604 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_cast_float8(float8) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_cast_float8' STRICT LANGUAGE C IMMUTABLE; 
00605 
00606 --! Casts a numeric into an SVEC.
00607 --!
00608 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_cast_numeric(numeric) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_cast_numeric' STRICT LANGUAGE C IMMUTABLE; 
00609 
00610 --! Casts an int2 into a float8 array.
00611 --!
00612 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.float8arr_cast_int2(int2) RETURNS float8[] AS 'MODULE_PATHNAME', 'float8arr_cast_int2' STRICT LANGUAGE C IMMUTABLE; 
00613 
00614 --! Casts an int4 into a float8 array.
00615 --!
00616 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.float8arr_cast_int4(int4) RETURNS float8[] AS 'MODULE_PATHNAME', 'float8arr_cast_int4' STRICT LANGUAGE C IMMUTABLE; 
00617 
00618 --! Casts an int8 into a float8 array.
00619 --!
00620 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.float8arr_cast_int8(bigint) RETURNS float8[] AS 'MODULE_PATHNAME', 'float8arr_cast_int8' STRICT LANGUAGE C IMMUTABLE; 
00621 
00622 --! Casts a float4 into a float8 array.
00623 --!
00624 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.float8arr_cast_float4(float4) RETURNS float8[] AS 'MODULE_PATHNAME', 'float8arr_cast_float4' STRICT LANGUAGE C IMMUTABLE; 
00625 
00626 --! Casts a float8 into a float8 array.
00627 --!
00628 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.float8arr_cast_float8(float8) RETURNS float8[] AS 'MODULE_PATHNAME', 'float8arr_cast_float8' STRICT LANGUAGE C IMMUTABLE; 
00629 
00630 --! Casts a numeric into a float8 array.
00631 --!
00632 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.float8arr_cast_numeric(numeric) RETURNS float8[] AS 'MODULE_PATHNAME', 'float8arr_cast_numeric' STRICT LANGUAGE C IMMUTABLE; 
00633 
00634 --! Casts a float8 array into an SVEC.
00635 --!
00636 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_cast_float8arr(float8[]) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_cast_float8arr' STRICT LANGUAGE C IMMUTABLE; 
00637 
00638 --! Casts an array of int8 positions, float8 values into an SVEC.
00639 --!
00640 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_cast_positions_float8arr(int8[],float8[],int8,float8) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_cast_positions_float8arr' STRICT LANGUAGE C IMMUTABLE;
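
-- Usage sketch (not part of the original file). It assumes MADlib is installed
-- in a schema named madlib, and that the four arguments are, in order: the
-- 1-based positions, the values at those positions, the dimension of the
-- resulting vector, and the base value for every position not listed. These
-- argument meanings come from the MADlib svec documentation rather than from
-- this file, so treat them as an assumption.
--
--   SELECT madlib.svec_cast_positions_float8arr(
--              ARRAY[1,2,7,5,87]::int8[],           -- positions
--              ARRAY[.1,.2,.7,.5,.87]::float8[],    -- values
--              90,                                  -- dimension
--              0.0);                                -- base value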
00641 
00642 --! Casts an SVEC into a float8 array.
00643 --!
00644 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_return_array(MADLIB_SCHEMA.svec) RETURNS float8[] AS 'MODULE_PATHNAME', 'svec_return_array' LANGUAGE C IMMUTABLE; 
00645 
00646 --! Concatenates two SVECs.
00647 --!
00648 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_concat(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_concat' LANGUAGE C IMMUTABLE; 
00649 
00650 --! Replicates n copies of an SVEC and concatenates them together.
00651 --!
00652 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_concat_replicate(int4,MADLIB_SCHEMA.svec) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_concat_replicate' LANGUAGE C IMMUTABLE; 
00653 
00654 --! Returns the dimension of an SVEC.
00655 --!
00656 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_dimension(MADLIB_SCHEMA.svec) RETURNS integer AS 'MODULE_PATHNAME', 'svec_dimension' LANGUAGE C IMMUTABLE; 
00657 
00658 --! Applies a given function to each element of an SVEC.
00659 --!
00660 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_lapply(text,MADLIB_SCHEMA.svec) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_lapply' LANGUAGE C IMMUTABLE;
00661 
00662 --! Appends a run-length block to the back of an SVEC.
00663 --!
00664 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_append(MADLIB_SCHEMA.svec,float8,int8) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_append' LANGUAGE C IMMUTABLE;
00665 
00666 --! Projects onto an element of an SVEC.
00667 --!
00668 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_proj(MADLIB_SCHEMA.svec,int4) RETURNS float8 AS 'MODULE_PATHNAME', 'svec_proj' LANGUAGE C IMMUTABLE;
00669 
00670 --! Extracts a subvector of an SVEC given the subvector's start and end indices.
00671 --!
00672 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_subvec(MADLIB_SCHEMA.svec,int4,int4) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_subvec' LANGUAGE C IMMUTABLE;
00673 
00674 --! Reverses the elements of an SVEC.
00675 --!
00676 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_reverse(MADLIB_SCHEMA.svec) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_reverse' LANGUAGE C IMMUTABLE;
00677 
00678 --! Replaces the subvector of a given SVEC at a given start index with another SVEC. Element indices start at 1.
00679 --!
00680 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_change(MADLIB_SCHEMA.svec,int4,MADLIB_SCHEMA.svec) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_change' LANGUAGE C IMMUTABLE;
00681 
00682 --! Computes the hash of an SVEC.
00683 --!
00684 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_hash(MADLIB_SCHEMA.svec) RETURNS int4 AS 'MODULE_PATHNAME', 'svec_hash' STRICT LANGUAGE C IMMUTABLE; 
00685 
00686 --! Computes the word-occurrence vector of a document.
00687 --!
00688 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_sfv(text[], text[]) RETURNS MADLIB_SCHEMA.svec AS
00689 'MODULE_PATHNAME', 'gp_extract_feature_histogram' LANGUAGE C IMMUTABLE;
00690 
00691 --! Sorts an array of texts. This function should be in MADlib common.
00692 --!
00693 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_sort(text[]) RETURNS text[] AS $$
00694        SELECT array(SELECT unnest($1::text[]) ORDER BY 1);
00695 $$ LANGUAGE SQL;
00696 
00697 --! Converts an SVEC to a text string.
00698 --!
00699 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_to_string(MADLIB_SCHEMA.svec) 
00700 RETURNS text AS 'MODULE_PATHNAME', 'svec_to_string' STRICT LANGUAGE C IMMUTABLE;
00701 
00702 --! Converts a text string to an SVEC.
00703 --!
00704 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_from_string(text) 
00705 RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_from_string' STRICT LANGUAGE C IMMUTABLE;
00706 
00707 
00708 /*
00709 DROP OPERATOR IF EXISTS || ( MADLIB_SCHEMA.svec, MADLIB_SCHEMA.svec);
00710 DROP OPERATOR IF EXISTS - ( MADLIB_SCHEMA.svec, MADLIB_SCHEMA.svec);
00711 DROP OPERATOR IF EXISTS + ( MADLIB_SCHEMA.svec, MADLIB_SCHEMA.svec);
00712 DROP OPERATOR IF EXISTS / ( MADLIB_SCHEMA.svec, MADLIB_SCHEMA.svec);
00713 DROP OPERATOR IF EXISTS %*% ( MADLIB_SCHEMA.svec, MADLIB_SCHEMA.svec);
00714 DROP OPERATOR IF EXISTS * ( MADLIB_SCHEMA.svec, MADLIB_SCHEMA.svec);
00715 DROP OPERATOR IF EXISTS ^ ( MADLIB_SCHEMA.svec, MADLIB_SCHEMA.svec);
00716 */
00717 
00718 CREATE OPERATOR MADLIB_SCHEMA.|| (
00719     LEFTARG = MADLIB_SCHEMA.svec,
00720     RIGHTARG = MADLIB_SCHEMA.svec,
00721     PROCEDURE = MADLIB_SCHEMA.svec_concat
00722 );
00723 
00724 CREATE OPERATOR MADLIB_SCHEMA.- (
00725     LEFTARG = MADLIB_SCHEMA.svec,
00726     RIGHTARG = MADLIB_SCHEMA.svec,
00727     PROCEDURE = MADLIB_SCHEMA.svec_minus
00728 );
00729 CREATE OPERATOR MADLIB_SCHEMA.+ (
00730     LEFTARG = MADLIB_SCHEMA.svec,
00731     RIGHTARG = MADLIB_SCHEMA.svec,
00732     PROCEDURE = MADLIB_SCHEMA.svec_plus
00733 );
00734 CREATE OPERATOR MADLIB_SCHEMA./ (
00735     LEFTARG = MADLIB_SCHEMA.svec,
00736     RIGHTARG = MADLIB_SCHEMA.svec,
00737     PROCEDURE = MADLIB_SCHEMA.svec_div
00738 );
00739 CREATE OPERATOR MADLIB_SCHEMA.%*% (
00740     LEFTARG = MADLIB_SCHEMA.svec,
00741     RIGHTARG = MADLIB_SCHEMA.svec,
00742     PROCEDURE = MADLIB_SCHEMA.svec_dot
00743 );
00744 CREATE OPERATOR MADLIB_SCHEMA.* (
00745     LEFTARG = MADLIB_SCHEMA.svec,
00746     RIGHTARG = MADLIB_SCHEMA.svec,
00747     PROCEDURE = MADLIB_SCHEMA.svec_mult
00748 );
00749 CREATE OPERATOR MADLIB_SCHEMA.^ (
00750     LEFTARG = MADLIB_SCHEMA.svec,
00751     RIGHTARG = MADLIB_SCHEMA.svec,
00752     PROCEDURE = MADLIB_SCHEMA.svec_pow
00753 );
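
-- Usage sketch for the svec operators above (not part of the original file).
-- It assumes the schema is named madlib and is on the search_path, so that the
-- unqualified type and operator names resolve.
--
--   SET search_path = madlib, public;
--   -- Element-wise sum, cast back to float8[] for display: {4,4,7}
--   SELECT ('{0,1,5}'::float8[]::svec + '{4,3,2}'::float8[]::svec)::float8[];
--   -- Dot product: 0*4 + 1*3 + 5*2 = 13
--   SELECT '{0,1,5}'::float8[]::svec %*% '{4,3,2}'::float8[]::svec;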
00754 
00755 -- float8[] operators
00756 -- DROP OPERATOR IF EXISTS = ( float8[], float8[]);
00757 /*
00758 DROP OPERATOR IF EXISTS %*% ( float8[], MADLIB_SCHEMA.svec);
00759 DROP OPERATOR IF EXISTS %*% ( MADLIB_SCHEMA.svec, float8[]);
00760 DROP OPERATOR IF EXISTS %*% ( float8[], float8[]);
00761 DROP OPERATOR IF EXISTS - ( float8[], float8[]);
00762 DROP OPERATOR IF EXISTS + ( float8[], float8[]);
00763 DROP OPERATOR IF EXISTS * ( float8[], float8[]);
00764 DROP OPERATOR IF EXISTS / ( float8[], float8[]);
00765 DROP OPERATOR IF EXISTS - ( float8[], MADLIB_SCHEMA.svec);
00766 DROP OPERATOR IF EXISTS + ( float8[], MADLIB_SCHEMA.svec);
00767 DROP OPERATOR IF EXISTS * ( float8[], MADLIB_SCHEMA.svec);
00768 DROP OPERATOR IF EXISTS / ( float8[], MADLIB_SCHEMA.svec);
00769 DROP OPERATOR IF EXISTS - ( MADLIB_SCHEMA.svec, float8[]);
00770 DROP OPERATOR IF EXISTS + ( MADLIB_SCHEMA.svec, float8[]);
00771 DROP OPERATOR IF EXISTS * ( MADLIB_SCHEMA.svec, float8[]);
00772 DROP OPERATOR IF EXISTS / ( MADLIB_SCHEMA.svec, float8[]);
00773 */
00774 
00775 /*
00776 CREATE OPERATOR MADLIB_SCHEMA.= (
00777     leftarg = float8[], 
00778     rightarg = float8[], 
00779     procedure = MADLIB_SCHEMA.float8arr_eq,
00780     commutator = operator(MADLIB_SCHEMA.=) ,
00781 --  negator = operator(MADLIB_SCHEMA.<>) ,
00782     restrict = eqsel, join = eqjoinsel
00783 );
00784 */
00785 
00786 CREATE OPERATOR MADLIB_SCHEMA.%*% (
00787     LEFTARG = float8[],
00788     RIGHTARG = float8[],
00789     PROCEDURE = MADLIB_SCHEMA.svec_dot
00790 );
00791 CREATE OPERATOR MADLIB_SCHEMA.%*% (
00792     LEFTARG = float8[],
00793     RIGHTARG = MADLIB_SCHEMA.svec,
00794     PROCEDURE = MADLIB_SCHEMA.svec_dot
00795 );
00796 CREATE OPERATOR MADLIB_SCHEMA.%*% (
00797     LEFTARG = MADLIB_SCHEMA.svec,
00798     RIGHTARG = float8[],
00799     PROCEDURE = MADLIB_SCHEMA.svec_dot
00800 );
00801 CREATE OPERATOR MADLIB_SCHEMA.- (
00802     LEFTARG = float8[],
00803     RIGHTARG = float8[],
00804     PROCEDURE = MADLIB_SCHEMA.float8arr_minus_float8arr
00805 );
00806 CREATE OPERATOR MADLIB_SCHEMA.+ (
00807     LEFTARG = float8[],
00808     RIGHTARG = float8[],
00809     PROCEDURE = MADLIB_SCHEMA.float8arr_plus_float8arr
00810 );
00811 CREATE OPERATOR MADLIB_SCHEMA.* (
00812     LEFTARG = float8[],
00813     RIGHTARG = float8[],
00814     PROCEDURE = MADLIB_SCHEMA.float8arr_mult_float8arr
00815 );
00816 CREATE OPERATOR MADLIB_SCHEMA./ (
00817     LEFTARG = float8[],
00818     RIGHTARG = float8[],
00819     PROCEDURE = MADLIB_SCHEMA.float8arr_div_float8arr
00820 );
00821 
00822 CREATE OPERATOR MADLIB_SCHEMA.- (
00823     LEFTARG = float8[],
00824     RIGHTARG = MADLIB_SCHEMA.svec,
00825     PROCEDURE = MADLIB_SCHEMA.float8arr_minus_svec
00826 );
00827 CREATE OPERATOR MADLIB_SCHEMA.+ (
00828     LEFTARG = float8[],
00829     RIGHTARG = MADLIB_SCHEMA.svec,
00830     PROCEDURE = MADLIB_SCHEMA.float8arr_plus_svec
00831 );
00832 CREATE OPERATOR MADLIB_SCHEMA.* (
00833     LEFTARG = float8[],
00834     RIGHTARG = MADLIB_SCHEMA.svec,
00835     PROCEDURE = MADLIB_SCHEMA.float8arr_mult_svec
00836 );
00837 CREATE OPERATOR MADLIB_SCHEMA./ (
00838     LEFTARG = float8[],
00839     RIGHTARG = MADLIB_SCHEMA.svec,
00840     PROCEDURE = MADLIB_SCHEMA.float8arr_div_svec
00841 );
00842 
00843 CREATE OPERATOR MADLIB_SCHEMA.- (
00844     LEFTARG = MADLIB_SCHEMA.svec,
00845     RIGHTARG = float8[],
00846     PROCEDURE = MADLIB_SCHEMA.svec_minus_float8arr
00847 );
00848 CREATE OPERATOR MADLIB_SCHEMA.+ (
00849     LEFTARG = MADLIB_SCHEMA.svec,
00850     RIGHTARG = float8[],
00851     PROCEDURE = MADLIB_SCHEMA.svec_plus_float8arr
00852 );
00853 CREATE OPERATOR MADLIB_SCHEMA.* (
00854     LEFTARG = MADLIB_SCHEMA.svec,
00855     RIGHTARG = float8[],
00856     PROCEDURE = MADLIB_SCHEMA.svec_mult_float8arr
00857 );
00858 CREATE OPERATOR MADLIB_SCHEMA./ (
00859     LEFTARG = MADLIB_SCHEMA.svec,
00860     RIGHTARG = float8[],
00861     PROCEDURE = MADLIB_SCHEMA.svec_div_float8arr
00862 );
00863 
00864 /*
00865 DROP CAST IF EXISTS (int2 AS MADLIB_SCHEMA.svec) ;
00866 DROP CAST IF EXISTS (integer AS MADLIB_SCHEMA.svec) ;
00867 DROP CAST IF EXISTS (bigint AS MADLIB_SCHEMA.svec) ;
00868 DROP CAST IF EXISTS (float4 AS MADLIB_SCHEMA.svec) ;
00869 DROP CAST IF EXISTS (float8 AS MADLIB_SCHEMA.svec) ;
00870 DROP CAST IF EXISTS (numeric AS MADLIB_SCHEMA.svec) ;
00871 */
00872 
00873 CREATE CAST (int2 AS MADLIB_SCHEMA.svec) WITH FUNCTION MADLIB_SCHEMA.svec_cast_int2(int2) ; -- AS IMPLICIT;
00874 CREATE CAST (integer AS MADLIB_SCHEMA.svec) WITH FUNCTION MADLIB_SCHEMA.svec_cast_int4(integer) ; -- AS IMPLICIT;
00875 CREATE CAST (bigint AS MADLIB_SCHEMA.svec) WITH FUNCTION MADLIB_SCHEMA.svec_cast_int8(bigint) ; -- AS IMPLICIT;
00876 CREATE CAST (float4 AS MADLIB_SCHEMA.svec) WITH FUNCTION MADLIB_SCHEMA.svec_cast_float4(float4) ; -- AS IMPLICIT;
00877 CREATE CAST (float8 AS MADLIB_SCHEMA.svec) WITH FUNCTION MADLIB_SCHEMA.svec_cast_float8(float8) ; -- AS IMPLICIT;
00878 CREATE CAST (numeric AS MADLIB_SCHEMA.svec) WITH FUNCTION MADLIB_SCHEMA.svec_cast_numeric(numeric) ; -- AS IMPLICIT;
00879 
00880 /*
00881 DROP CAST IF EXISTS (int2 AS float8[]) ;
00882 DROP CAST IF EXISTS (integer AS float8[]) ;
00883 DROP CAST IF EXISTS (bigint AS float8[]) ;
00884 DROP CAST IF EXISTS (float4 AS float8[]) ;
00885 DROP CAST IF EXISTS (float8 AS float8[]) ;
00886 DROP CAST IF EXISTS (numeric AS float8[]) ;
00887 */
00888 
00889 -- CREATE CAST (int2 AS float8[]) WITH FUNCTION MADLIB_SCHEMA.float8arr_cast_int2(int2) ; -- AS IMPLICIT;
00890 -- CREATE CAST (integer AS float8[]) WITH FUNCTION MADLIB_SCHEMA.float8arr_cast_int4(integer) ; -- AS IMPLICIT;
00891 -- CREATE CAST (bigint AS float8[]) WITH FUNCTION MADLIB_SCHEMA.float8arr_cast_int8(bigint) ; -- AS IMPLICIT;
00892 -- CREATE CAST (float4 AS float8[]) WITH FUNCTION MADLIB_SCHEMA.float8arr_cast_float4(float4) ; -- AS IMPLICIT;
00893 -- CREATE CAST (float8 AS float8[]) WITH FUNCTION MADLIB_SCHEMA.float8arr_cast_float8(float8) ; -- AS IMPLICIT;
00894 -- CREATE CAST (numeric AS float8[]) WITH FUNCTION MADLIB_SCHEMA.float8arr_cast_numeric(numeric) ; -- AS IMPLICIT;
00895 
00896 -- DROP CAST IF EXISTS (MADLIB_SCHEMA.svec AS float8[]) ;
00897 -- DROP CAST IF EXISTS (float8[] AS MADLIB_SCHEMA.svec) ;
00898 
00899 CREATE CAST (MADLIB_SCHEMA.svec AS float8[]) WITH FUNCTION MADLIB_SCHEMA.svec_return_array(MADLIB_SCHEMA.svec) ; -- AS IMPLICIT;
00900 CREATE CAST (float8[] AS MADLIB_SCHEMA.svec) WITH FUNCTION MADLIB_SCHEMA.svec_cast_float8arr(float8[]) ; -- AS IMPLICIT;
00901 
00902 -- DROP OPERATOR IF EXISTS = (MADLIB_SCHEMA.svec, MADLIB_SCHEMA.svec) ;
00903 
00904 
00905 CREATE OPERATOR MADLIB_SCHEMA.= (
00906     leftarg = MADLIB_SCHEMA.svec, rightarg = MADLIB_SCHEMA.svec, procedure = MADLIB_SCHEMA.svec_eq,
00907     commutator = operator(MADLIB_SCHEMA.=) ,
00908 --  negator = operator(MADLIB_SCHEMA.<>) ,
00909     restrict = eqsel, join = eqjoinsel
00910 );
00911 
00912 --! Transition function for mean(svec) aggregate
00913 --!
00914 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_mean_transition( FLOAT[], MADLIB_SCHEMA.svec) 
00915 RETURNS FLOAT[] AS 'MODULE_PATHNAME'
00916 LANGUAGE C IMMUTABLE; 
00917 
00918 --! Preliminary merge function for mean(svec) aggregate
00919 --!
00920 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_mean_prefunc( FLOAT[], FLOAT[]) 
00921 RETURNS FLOAT[] AS 'MODULE_PATHNAME'
00922 LANGUAGE C IMMUTABLE; 
00923 
00924 --! Final function for mean(svec) aggregate
00925 --!
00926 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_mean_final( FLOAT[]) 
00927 RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME'
00928 LANGUAGE C IMMUTABLE; 
00929 
00930 --! Aggregate that computes the element-wise mean of a list of vectors.
00931 --!
00932 CREATE AGGREGATE MADLIB_SCHEMA.mean( MADLIB_SCHEMA.svec) (
00933     SFUNC = MADLIB_SCHEMA.svec_mean_transition,
00934     m4_ifdef(`__GREENPLUM__',`prefunc = MADLIB_SCHEMA.svec_mean_prefunc,')
00935     FINALFUNC = MADLIB_SCHEMA.svec_mean_final,
00936     STYPE = FLOAT[]
00937 );
00938 
00939 --! Aggregate that provides the element-wise sum of a list of vectors.
00940 --!
00941 -- DROP AGGREGATE IF EXISTS MADLIB_SCHEMA.svec_sum(MADLIB_SCHEMA.svec);
00942 CREATE AGGREGATE MADLIB_SCHEMA.svec_sum (MADLIB_SCHEMA.svec) (
00943     SFUNC = MADLIB_SCHEMA.svec_plus,
00944     m4_ifdef(`__GREENPLUM__',`prefunc=MADLIB_SCHEMA.svec_plus,')
00945     INITCOND = '{1}:{0.}', -- Zero
00946     STYPE = MADLIB_SCHEMA.svec
00947 );
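
-- Usage sketch for the two aggregates above (not part of the original file).
-- The table docs and its columns are hypothetical, and the schema name madlib
-- is an assumption. The vectors in the column are assumed to share one
-- dimension.
--
--   CREATE TABLE docs (id int, features madlib.svec);
--   SELECT madlib.mean(features)     AS elementwise_mean,
--          madlib.svec_sum(features) AS elementwise_sum
--   FROM docs;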
00948 
00949 --! Aggregate that provides a tally of nonzero entries in a list of vectors.
00950 --!
00951 -- DROP AGGREGATE IF EXISTS MADLIB_SCHEMA.svec_count_nonzero(MADLIB_SCHEMA.svec);
00952 CREATE AGGREGATE MADLIB_SCHEMA.svec_count_nonzero (MADLIB_SCHEMA.svec) (
00953     SFUNC = MADLIB_SCHEMA.svec_count,
00954     m4_ifdef(`__GREENPLUM__',`prefunc=MADLIB_SCHEMA.svec_plus,')
00955     INITCOND = '{1}:{0.}', -- Zero
00956     STYPE = MADLIB_SCHEMA.svec
00957 );
00958 
00959 --! Aggregate that turns a list of float8 values into an SVEC.
00960 --!
00961 -- DROP AGGREGATE IF EXISTS MADLIB_SCHEMA.svec_agg(float8);
00962 CREATE
00963 m4_ifdef(`__GREENPLUM__', m4_ifdef(`__HAS_ORDERED_AGGREGATES__', `ORDERED'))
00964 AGGREGATE MADLIB_SCHEMA.svec_agg (float8) (
00965     SFUNC = MADLIB_SCHEMA.svec_pivot,
00966     m4_ifdef(`__GREENPLUM__', m4_ifdef(`__HAS_ORDERED_AGGREGATES__', `', ``prefunc=MADLIB_SCHEMA.svec_concat,''))
00967     STYPE = MADLIB_SCHEMA.svec
00968 );
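
-- Usage sketch for svec_agg (not part of the original file; the schema name
-- madlib is an assumption). Each input row contributes one element. Without an
-- explicit ordering the database does not guarantee the row order, so the
-- element order of the result should not be relied on here.
--
--   SELECT madlib.svec_agg(x::float8) FROM generate_series(1, 5) AS x;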
00969 
00970 --! Aggregate that computes the median element of a list of float8 values.
00971 --!
00972 -- DROP AGGREGATE IF EXISTS MADLIB_SCHEMA.svec_median_inmemory(float8);
00973 CREATE AGGREGATE MADLIB_SCHEMA.svec_median_inmemory (float8) (
00974     SFUNC = MADLIB_SCHEMA.svec_pivot,
00975     m4_ifdef(`__GREENPLUM__',`prefunc=MADLIB_SCHEMA.svec_concat,')
00976     FINALFUNC = MADLIB_SCHEMA.svec_median,
00977     STYPE = MADLIB_SCHEMA.svec
00978 );
00979 
00980 -- Comparisons based on L2 Norm
00981 --! Returns true if the l2 norm of the first SVEC is less than that of the second SVEC.
00982 --!
00983 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_l2_lt(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) RETURNS bool AS 'MODULE_PATHNAME', 'svec_l2_lt' LANGUAGE C IMMUTABLE;
00984 
00985 --! Returns true if the l2 norm of the first SVEC is less than or equal to that of the second SVEC.
00986 --!
00987 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_l2_le(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) RETURNS bool AS 'MODULE_PATHNAME', 'svec_l2_le' LANGUAGE C IMMUTABLE;
00988 
00989 --! Returns true if the l2 norm of the first SVEC is equal to that of the second SVEC.
00990 --!
00991 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_l2_eq(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) RETURNS bool AS 'MODULE_PATHNAME', 'svec_l2_eq' LANGUAGE C IMMUTABLE;
00992 
00993 --! Returns true if the l2 norm of the first SVEC is not equal to that of the second SVEC.
00994 --!
00995 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_l2_ne(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) RETURNS bool AS 'MODULE_PATHNAME', 'svec_l2_ne' LANGUAGE C IMMUTABLE;
00996 
00997 --! Returns true if the l2 norm of the first SVEC is greater than that of the second SVEC.
00998 --!
00999 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_l2_gt(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) RETURNS bool AS 'MODULE_PATHNAME', 'svec_l2_gt' LANGUAGE C IMMUTABLE;
01000 
01001 --! Returns true if the l2 norm of the first SVEC is greater than or equal to that of the second SVEC.
01002 --!
01003 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_l2_ge(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) RETURNS bool AS 'MODULE_PATHNAME', 'svec_l2_ge' LANGUAGE C IMMUTABLE;
01004 
01005 --! Returns a value indicating the relative values of the l2 norms of two SVECs.
01006 --!
01007 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_l2_cmp(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) RETURNS integer AS 'MODULE_PATHNAME', 'svec_l2_cmp' LANGUAGE C IMMUTABLE;
01008 
01009 --! Normalizes an SVEC, that is, divides all elements by its norm/magnitude.
01010 --!
01011 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.normalize(MADLIB_SCHEMA.svec) 
01012 RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_normalize' LANGUAGE C IMMUTABLE STRICT;
01013 
01014 /*
01015 DROP OPERATOR IF EXISTS < (MADLIB_SCHEMA.svec, MADLIB_SCHEMA.svec) CASCADE ;
01016 DROP OPERATOR IF EXISTS <= (MADLIB_SCHEMA.svec, MADLIB_SCHEMA.svec) CASCADE ;
01017 DROP OPERATOR IF EXISTS <> (MADLIB_SCHEMA.svec, MADLIB_SCHEMA.svec) ;
01018 DROP OPERATOR IF EXISTS == (MADLIB_SCHEMA.svec, MADLIB_SCHEMA.svec) CASCADE ;
01019 DROP OPERATOR IF EXISTS > (MADLIB_SCHEMA.svec, MADLIB_SCHEMA.svec) CASCADE ;
01020 DROP OPERATOR IF EXISTS >= (MADLIB_SCHEMA.svec, MADLIB_SCHEMA.svec) CASCADE ;
01021 DROP OPERATOR IF EXISTS *|| (int4, MADLIB_SCHEMA.svec) ;
01022 */
01023 
01024 CREATE OPERATOR MADLIB_SCHEMA.< (
01025     leftarg = MADLIB_SCHEMA.svec, rightarg = MADLIB_SCHEMA.svec, procedure = MADLIB_SCHEMA.svec_l2_lt,
01026     commutator = operator(MADLIB_SCHEMA.>) , negator = operator(MADLIB_SCHEMA.>=) ,
01027     restrict = scalarltsel, join = scalarltjoinsel
01028 );
01029 CREATE OPERATOR MADLIB_SCHEMA.<= (
01030     leftarg = MADLIB_SCHEMA.svec, rightarg = MADLIB_SCHEMA.svec, procedure = MADLIB_SCHEMA.svec_l2_le,
01031     commutator = operator(MADLIB_SCHEMA.>=) , negator = operator(MADLIB_SCHEMA.>) ,
01032     restrict = scalarltsel, join = scalarltjoinsel
01033 );
01034 CREATE OPERATOR MADLIB_SCHEMA.<> (
01035     leftarg = MADLIB_SCHEMA.svec, rightarg = MADLIB_SCHEMA.svec, procedure = MADLIB_SCHEMA.svec_l2_ne,
01036     commutator = operator(MADLIB_SCHEMA.<>) ,
01037     negator = operator(MADLIB_SCHEMA.==),
01038     restrict = neqsel, join = neqjoinsel
01039 );
01040 CREATE OPERATOR MADLIB_SCHEMA.== (
01041     leftarg = MADLIB_SCHEMA.svec, rightarg = MADLIB_SCHEMA.svec, procedure = MADLIB_SCHEMA.svec_l2_eq,
01042     commutator = operator(MADLIB_SCHEMA.==) ,
01043     negator = operator(MADLIB_SCHEMA.<>) ,
01044     restrict = eqsel, join = eqjoinsel
01045 );
01046 CREATE OPERATOR MADLIB_SCHEMA.>= (
01047     leftarg = MADLIB_SCHEMA.svec, rightarg = MADLIB_SCHEMA.svec, procedure = MADLIB_SCHEMA.svec_l2_ge,
01048     commutator = operator(MADLIB_SCHEMA.<=) , negator = operator(MADLIB_SCHEMA.<) ,
01049     restrict = scalargtsel, join = scalargtjoinsel
01050 );
01051 CREATE OPERATOR MADLIB_SCHEMA.> (
01052     leftarg = MADLIB_SCHEMA.svec, rightarg = MADLIB_SCHEMA.svec, procedure = MADLIB_SCHEMA.svec_l2_gt,
01053     commutator = operator(MADLIB_SCHEMA.<) , negator = operator(MADLIB_SCHEMA.<=) ,
01054     restrict = scalargtsel, join = scalargtjoinsel
01055 );
01056 
01057 CREATE OPERATOR MADLIB_SCHEMA.*|| (
01058     leftarg = int4, rightarg = MADLIB_SCHEMA.svec, procedure = MADLIB_SCHEMA.svec_concat_replicate
01059 );
01060 
01061 CREATE OPERATOR CLASS MADLIB_SCHEMA.svec_l2_ops
01062 DEFAULT FOR TYPE MADLIB_SCHEMA.svec USING btree AS
01063 OPERATOR        1       MADLIB_SCHEMA.< ,
01064 OPERATOR        2       MADLIB_SCHEMA.<= ,
01065 OPERATOR        3       MADLIB_SCHEMA.== ,
01066 OPERATOR        4       MADLIB_SCHEMA.>= ,
01067 OPERATOR        5       MADLIB_SCHEMA.> ,
01068 FUNCTION        1       MADLIB_SCHEMA.svec_l2_cmp(MADLIB_SCHEMA.svec, MADLIB_SCHEMA.svec);
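
-- Usage sketch for the operator class above (not part of the original file).
-- The table docs is hypothetical and the schema name madlib is an assumption.
-- Because svec_l2_ops is the default btree operator class for svec, a plain
-- btree index and ORDER BY on an svec column compare vectors by L2 norm.
--
--   CREATE TABLE docs (id int, features madlib.svec);
--   CREATE INDEX docs_features_idx ON docs USING btree (features);
--   SELECT id FROM docs ORDER BY features;   -- ascending L2 norm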
01069