MADlib
0.7
/* ----------------------------------------------------------------------- *//**
 *
 * @file svec.sql_in
 *
 * @brief SQL type definitions and functions for sparse vector data type
 *        <tt>svec</tt>
 *
 * @sa For an introduction to the sparse vector implementation, see the module
 *     description \ref grp_svec.
 *
 *//* ----------------------------------------------------------------------- */

m4_include(`SQLCommon.m4')

/**
@addtogroup grp_svec

@about

This module implements a sparse vector data type named "svec", which
gives compressed storage of sparse vectors with many duplicate elements.

When we use arrays of floating point numbers for various calculations,
we will sometimes have long runs of zeros (or some other default value).
This is common in applications like scientific computing,
retail optimization, and text processing. Each floating point number takes
8 bytes of storage in memory and/or disk, so saving those zeros is often
worthwhile. There are also many computations that can benefit from skipping
over the zeros.

To focus the discussion, consider, for example, the following
array of doubles stored as a Postgres/GP "float8[]" data type:

\code
      '{0, 33,...40,000 zeros..., 12, 22 }'::float8[].
\endcode

This array would occupy slightly more than 320KB of memory/disk, most of
it zeros. Even if we were to exploit the null bitmap and store the zeros
as nulls, we would still end up with a 5KB null bitmap, which is still
not nearly as memory efficient as we'd like. Also, as we perform various
operations on the array, we'll often be doing work on 40,000 fields that
would turn out not to be important.
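
As a rough cross-check of the sizes quoted above, the arithmetic can be done
in plain SQL. This is only a sketch: it counts the 8-byte payload and one
null-bitmap bit for each of the 40,004 elements of the example array, and it
ignores the array header and any alignment or compression the database applies.

\code
sql> SELECT 40004 * 8         AS float8_payload_bytes,
            ceil(40004 / 8.0) AS null_bitmap_bytes;
\endcode

The first column evaluates to 320,032 bytes and the second to 5,001 bytes,
which is where the "slightly more than 320KB" and "5KB" figures come from.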

To solve the problems associated with the processing of sparse vectors
discussed above, we adopt a simple Run Length Encoding (RLE) scheme to
represent sparse vectors as pairs of count-value arrays. So, for example,
the array above would be represented as follows

\code
      '{1,1,40000,1,1}:{0,33,0,12,22}'::MADLIB_SCHEMA.svec,
\endcode

which says there is 1 occurrence of 0, followed by 1 occurrence of 33,
followed by 40,000 occurrences of 0, etc. In contrast to the naive
representations, we only need 5 integers and 5 floating point numbers
to store the array. Further, it is easy to implement vector operations
that can take advantage of the RLE representation to make computations
faster. The module provides a library of such functions.

The current version only supports sparse vectors of float8
values. Future versions will support other base types.

@usage

SVECs can be constructed directly as follows:
<pre>
SELECT '{n1,n2,...,nk}:{v1,v2,...vk}'::MADLIB_SCHEMA.svec;
</pre>
where <tt>n1,n2,...,nk</tt> specifies the counts for the values <tt>v1,v2,...,vk</tt>.

SVECs can also be cast from a float array:
<pre>
SELECT ('{v1,v2,...vk}'::float[])::MADLIB_SCHEMA.svec;
</pre>

Syntax reference can be found in svec.sql_in.

Users need to add MADLIB_SCHEMA to their search_path to use the svec operators
defined in the module.

@examp

We can use operations with svec type like <, >, *, **, /, =, +, SUM, etc,
and they have meanings associated with typical vector operations. For
example, the plus (+) operator adds each of the terms of two vectors having
the same dimension together.
\code
sql> SELECT ('{0,1,5}'::float8[]::MADLIB_SCHEMA.svec + '{4,3,2}'::float8[]::MADLIB_SCHEMA.svec)::float8[];
 float8
---------
 {4,4,7}
\endcode

Without the casting into float8[] at the end, we get:
\code
sql> SELECT '{0,1,5}'::float8[]::MADLIB_SCHEMA.svec + '{4,3,2}'::float8[]::MADLIB_SCHEMA.svec;
 ?column?
----------
 {2,1}:{4,7}
\endcode

A dot product (%*%) between the two vectors will result in a scalar
result of type float8. The dot product should be (0*4 + 1*3 + 5*2) = 13,
like this:
\code
sql> SELECT '{0,1,5}'::float8[]::MADLIB_SCHEMA.svec %*% '{4,3,2}'::float8[]::MADLIB_SCHEMA.svec;
 ?column?
----------
       13
\endcode

Special vector aggregate functions are also available. SUM is self
explanatory. SVEC_COUNT_NONZERO evaluates the count of non-zero terms
in each column found in a set of n-dimensional svecs and returns an
svec with the counts. For instance, if we have the vectors {0,1,5},
{10,0,3},{0,0,3},{0,1,0}, then executing the SVEC_COUNT_NONZERO() aggregate
function would result in {1,2,3}:

\code
sql> create table list (a MADLIB_SCHEMA.svec);
sql> insert into list values ('{0,1,5}'::float8[]), ('{10,0,3}'::float8[]), ('{0,0,3}'::float8[]),('{0,1,0}'::float8[]);

sql> SELECT MADLIB_SCHEMA.svec_count_nonzero(a)::float8[] FROM list;
 svec_count_nonzero
-----------------
 {1,2,3}
\endcode

We do not use null bitmaps in the svec data type. A null value in an svec
is represented explicitly as an NVP (No Value Present) value. For example,
we have:
\code
sql> SELECT '{1,2,3}:{4,null,5}'::MADLIB_SCHEMA.svec;
      svec
-------------------
 {1,2,3}:{4,NVP,5}

sql> SELECT '{1,2,3}:{4,null,5}'::MADLIB_SCHEMA.svec + '{2,2,2}:{8,9,10}'::MADLIB_SCHEMA.svec;
         ?column?
--------------------------
 {1,2,1,2}:{12,NVP,14,15}
\endcode

An element of an svec can be accessed using the svec_proj() function,
which takes an svec and the index of the element desired.
\code
sql> SELECT MADLIB_SCHEMA.svec_proj('{1,2,3}:{4,5,6}'::MADLIB_SCHEMA.svec, 1) + MADLIB_SCHEMA.svec_proj('{4,5,6}:{1,2,3}'::MADLIB_SCHEMA.svec, 15);
 ?column?
----------
        7
\endcode

A subvector of an svec can be accessed using the svec_subvec() function,
which takes an svec and the start and end index of the subvector desired.
\code
sql> SELECT MADLIB_SCHEMA.svec_subvec('{2,4,6}:{1,3,5}'::MADLIB_SCHEMA.svec, 2, 11);
   svec_subvec
-----------------
 {1,4,5}:{1,3,5}
\endcode

The elements/subvector of an svec can be changed using the function
svec_change(). It takes three arguments: an m-dimensional svec sv1, a
start index j, and an n-dimensional svec sv2 such that j + n - 1 <= m,
and returns an svec like sv1 but with the subvector sv1[j:j+n-1]
replaced by sv2. An example follows:
\code
sql> SELECT MADLIB_SCHEMA.svec_change('{1,2,3}:{4,5,6}'::MADLIB_SCHEMA.svec,3,'{2}:{3}'::MADLIB_SCHEMA.svec);
     svec_change
---------------------
 {1,1,2,2}:{4,5,3,6}
\endcode

There are also higher-order functions for processing svecs. For example,
the following is the corresponding function for lapply() in R.
\code
sql> SELECT MADLIB_SCHEMA.svec_lapply('sqrt', '{1,2,3}:{4,5,6}'::MADLIB_SCHEMA.svec);
                  svec_lapply
-----------------------------------------------
 {1,2,3}:{2,2.23606797749979,2.44948974278318}
\endcode

The full list of functions available for operating on svecs is available
in svec.sql.
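
A few more of the functions declared in this file can be tried in the same
way. The calls below are a minimal sketch: the function names and argument
types are the ones declared in this file, but the outputs are left out
because the statements were not run against a live installation.

\code
sql> SELECT MADLIB_SCHEMA.svec_dimension('{1,2,3}:{4,5,6}'::MADLIB_SCHEMA.svec);
sql> SELECT MADLIB_SCHEMA.svec_elsum('{1,2,3}:{4,5,6}'::MADLIB_SCHEMA.svec);
sql> SELECT MADLIB_SCHEMA.svec_reverse('{1,2,3}:{4,5,6}'::MADLIB_SCHEMA.svec);
sql> SELECT MADLIB_SCHEMA.svec_concat('{1,2}:{4,5}'::MADLIB_SCHEMA.svec, '{3}:{6}'::MADLIB_SCHEMA.svec);
\endcode

Going by the function descriptions, these should return the dimension of the
vector, the sum of its elements, the vector with its elements reversed, and
the concatenation of the two vectors, respectively.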

<b> A More Extensive Example</b>

For a text classification example, let's assume we have a dictionary
composed of words in a sorted text array:
\code
sql> create table features (a text[]);
sql> insert into features values
            ('{am,before,being,bothered,corpus,document,i,in,is,me,
               never,now,one,really,second,the,third,this,until}');
\endcode
We have a set of documents, each represented as an array of words:
\code
sql> create table documents(a int,b text[]);
sql> insert into documents values
            (1,'{this,is,one,document,in,the,corpus}'),
            (2,'{i,am,the,second,document,in,the,corpus}'),
            (3,'{being,third,never,really,bothered,me,until,now}'),
            (4,'{the,document,before,me,is,the,third,document}');
\endcode

Now that we have a dictionary and some documents, we would like to do some
document categorization using vector arithmetic on word counts and
proportions of dictionary words in each document.

To start this process, we'll need to find the dictionary words in each
document. We'll prepare what is called a Sparse Feature Vector or SFV
for each document. An SFV is a vector of dimension N, where N is the
number of dictionary words, and each cell of an SFV holds the count of
the corresponding dictionary word in the document.
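
To make the definition concrete, the counts for a single document can be
computed by hand in plain SQL. This is only an illustrative sketch using the
features and documents tables created above and the built-in unnest()
function; it is not how the library computes SFVs, and it returns one row
per dictionary word rather than an svec.

\code
sql> SELECT f.word, count(d.word) AS occurrences
     FROM (SELECT unnest(a) AS word FROM features) f
     LEFT JOIN (SELECT unnest(b) AS word FROM documents WHERE a = 2) d
            ON d.word = f.word
     GROUP BY f.word
     ORDER BY f.word;
\endcode

Read top to bottom, the occurrences column lists the cells of the SFV for
document 2, with 0 for every dictionary word that the document does not
contain.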

Inside the sparse vector library, we have a function that will create
an SFV from a document, so we can just do this:
\code
sql> SELECT MADLIB_SCHEMA.svec_sfv((SELECT a FROM features LIMIT 1),b)::float8[]
         FROM documents;

                svec_sfv
-----------------------------------------
 {0,0,0,0,1,1,0,1,1,0,0,0,1,0,0,1,0,1,0}
 {0,0,1,1,0,0,0,0,0,1,1,1,0,1,0,0,1,0,1}
 {1,0,0,0,1,1,1,1,0,0,0,0,0,0,1,2,0,0,0}
 {0,1,0,0,0,2,0,0,1,1,0,0,0,0,0,2,1,0,0}
\endcode
Note that the output of MADLIB_SCHEMA.svec_sfv() is an svec for each
document containing the count of each of the dictionary words in the
ordinal positions of the dictionary. This can more easily be understood
by lining up the feature vector and text like this:
\code
sql> SELECT MADLIB_SCHEMA.svec_sfv((SELECT a FROM features LIMIT 1),b)::float8[]
                , b
         FROM documents;

                svec_sfv                 |                        b
-----------------------------------------+--------------------------------------------------
 {1,0,0,0,1,1,1,1,0,0,0,0,0,0,1,2,0,0,0} | {i,am,the,second,document,in,the,corpus}
 {0,1,0,0,0,2,0,0,1,1,0,0,0,0,0,2,1,0,0} | {the,document,before,me,is,the,third,document}
 {0,0,0,0,1,1,0,1,1,0,0,0,1,0,0,1,0,1,0} | {this,is,one,document,in,the,corpus}
 {0,0,1,1,0,0,0,0,0,1,1,1,0,1,0,0,1,0,1} | {being,third,never,really,bothered,me,until,now}

sql> SELECT * FROM features;
                                                   a
--------------------------------------------------------------------------------------------------------
 {am,before,being,bothered,corpus,document,i,in,is,me,never,now,one,really,second,the,third,this,until}
\endcode

Now when we look at the document "i am the second document in the corpus",
its SFV is {1,3*0,1,1,1,1,6*0,1,2,3*0}. The word "am" is the first ordinate in
the dictionary and there is 1 instance of it in the SFV.
The word "before"
has no instances in the document, so its value is "0" and so on.

The function MADLIB_SCHEMA.svec_sfv() can process large
numbers of documents into their SFVs in parallel at high speed.

The rest of the categorization process is all vector math. The actual
count is hardly ever used. Instead, it's turned into a weight. The most
common weight is called tf/idf for Term Frequency / Inverse Document
Frequency. The calculation for a given term in a given document is
\code
      {#Times in document} * log {#Documents / #Documents the term appears in}.
\endcode
For instance, the term "document" in document 1 would have weight
1 * log (4/3). In document 4, it would have weight 2 * log (4/3).
Terms that appear in every document would have tf/idf weight 0, since
log (4/4) = log(1) = 0. (Our example has no term like that.) That
usually sends a lot of values to 0.

For this part of the processing, we'll need to have a sparse vector of
the dictionary dimension (19) with the values
\code
      log(#documents/#Documents each term appears in).
\endcode
There will be one such vector for the whole list of documents (aka the
"corpus"). The #documents is just a count of all of the documents, in
this case 4, but there is one divisor for each dictionary word and its
value is the number of documents in which that word appears.
This single vector for the whole corpus can then be multiplied,
element by element, by each document SFV to produce the Term
Frequency/Inverse Document Frequency weights.
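
The logarithm used in this example appears to be the natural logarithm,
judging by the weights listed for the corpus (about 0.69 for a term found in
2 of the 4 documents). A quick sanity check of the individual factors in
plain SQL, assuming PostgreSQL's built-in ln() function (the built-in
one-argument log() is base 10 and would give different numbers):

\code
sql> SELECT ln(4.0 / 3.0)     AS idf_term_in_3_docs,
            2 * ln(4.0 / 3.0) AS tf_idf_with_count_2,
            ln(4.0 / 2.0)     AS idf_term_in_2_docs,
            ln(4.0 / 1.0)     AS idf_term_in_1_doc,
            ln(4.0 / 4.0)     AS idf_term_in_all_docs;
\endcode

The values are approximately 0.288, 0.575, 0.693, 1.386 and 0. The full
computation over the corpus uses the svec functions instead of scalar
arithmetic.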

This can be done as follows:
\code
sql> create table corpus as
            (SELECT a, MADLIB_SCHEMA.svec_sfv((SELECT a FROM features LIMIT 1),b) sfv
             FROM documents);
sql> create table weights as
          (SELECT a docnum, MADLIB_SCHEMA.svec_mult(sfv, logidf) tf_idf
           FROM (SELECT MADLIB_SCHEMA.svec_log(MADLIB_SCHEMA.svec_div(count(sfv)::MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec_count_nonzero(sfv))) logidf
                FROM corpus) foo, corpus ORDER BY docnum);
sql> SELECT * FROM weights;

docnum |                tf_idf
-------+----------------------------------------------------------------------
     1 | {4,1,1,1,2,3,1,2,1,1,1,1}:{0,0.69,0.28,0,0.69,0,1.38,0,0.28,0,1.38,0}
     2 | {1,3,1,1,1,1,6,1,1,3}:{1.38,0,0.69,0.28,1.38,0.69,0,1.38,0.57,0}
     3 | {2,2,5,1,2,1,1,2,1,1,1}:{0,1.38,0,0.69,1.38,0,1.38,0,0.69,0,1.38}
     4 | {1,1,3,1,2,2,5,1,1,2}:{0,1.38,0,0.57,0,0.69,0,0.57,0.69,0}
\endcode

We can now get the "angular distance" between one document and the rest
of the documents using the ACOS of the dot product of the document vectors,
normalized by their L2 norms.
The following calculates the angular distance between the first document
and each of the other documents:
\code
sql> SELECT docnum,
                180. * ( ACOS( MADLIB_SCHEMA.svec_dmin( 1., MADLIB_SCHEMA.svec_dot(tf_idf, testdoc)
                    / (MADLIB_SCHEMA.svec_l2norm(tf_idf)*MADLIB_SCHEMA.svec_l2norm(testdoc))))/3.141592654) angular_distance
         FROM weights,(SELECT tf_idf testdoc FROM weights WHERE docnum = 1 LIMIT 1) foo
         ORDER BY 1;

 docnum | angular_distance
--------+------------------
      1 |                0
      2 | 78.8235846096986
      3 | 89.9999999882484
      4 | 80.0232034288617
\endcode
We can see that the angular distance between document 1 and itself
is 0 degrees and between document 1 and 3 is 90 degrees because they
share no features at all.
The angular distance can now be plugged into
machine learning algorithms that rely on a distance measure between
data points.

SVEC also provides a function for building a vector from an array of positions
and an array of values. Positions that are not listed are filled with a base
value that the user supplies in the same function call. In the example below,
the first array of integers gives the positions of the values in the second
array (an array of floats). The positions do not need to be sorted.
The third argument is the desired maximum size of the vector. This ensures that
the vector has that size even if the last position falls short of it. If the
maximum size is less than 1, it is ignored and the vector ends at the last
position in the position array. The final argument is a float giving the base
value to use between the declared values (0 is a common choice):
\code
sql> SELECT MADLIB_SCHEMA.svec_cast_positions_float8arr(ARRAY[1,2,7,5,87],ARRAY[.1,.2,.7,.5,.87],90,0.0);

        svec_cast_positions_float8arr
-----------------------------------------------------
{1,1,2,1,1,1,79,1,3}:{0.1,0.2,0,0.5,0,0.7,0,0.87,0}
(1 row)
\endcode

Other examples of svecs usage can be found in the k-means module.

@sa File svec.sql_in documenting the SQL functions.

@internal
@sa File sparse_vector.c documenting the implementation in C.
@endinternal
*/


--! @file svec.sql_in
--!

-- DROP SCHEMA MADLIB_SCHEMA CASCADE;
-- CREATE SCHEMA MADLIB_SCHEMA;

-- DROP TYPE IF EXISTS MADLIB_SCHEMA.svec CASCADE;
CREATE TYPE MADLIB_SCHEMA.svec;

--! SVEC constructor from CSTRING.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_in(cstring)
    RETURNS MADLIB_SCHEMA.svec
    AS 'MODULE_PATHNAME'
    LANGUAGE C IMMUTABLE STRICT;

--! Converts SVEC to CSTRING.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_out(MADLIB_SCHEMA.svec)
    RETURNS cstring
    AS 'MODULE_PATHNAME'
    LANGUAGE C IMMUTABLE STRICT;

--! Converts SVEC internal representation to SVEC.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_recv(internal)
    RETURNS MADLIB_SCHEMA.svec
    AS 'MODULE_PATHNAME'
    LANGUAGE C IMMUTABLE STRICT;

--! Converts SVEC to BYTEA.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_send(MADLIB_SCHEMA.svec)
    RETURNS bytea
    AS 'MODULE_PATHNAME'
    LANGUAGE C IMMUTABLE STRICT;

CREATE TYPE MADLIB_SCHEMA.svec (
    internallength = VARIABLE,
    input = MADLIB_SCHEMA.svec_in,
    output = MADLIB_SCHEMA.svec_out,
    send = MADLIB_SCHEMA.svec_send,
    receive = MADLIB_SCHEMA.svec_recv,
    storage=EXTENDED,
    alignment = double
);

--! Basic floating point scalar operator: MIN.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_dmin(float8,float8) RETURNS float8 AS 'MODULE_PATHNAME', 'float8_min' LANGUAGE C IMMUTABLE;

--! Basic floating point scalar operator: MAX.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_dmax(float8,float8) RETURNS float8 AS 'MODULE_PATHNAME', 'float8_max' LANGUAGE C IMMUTABLE;

--! Counts the number of non-zero entries in the input vector; the second argument is capped at 1, then added to the first; used as the sfunc in the svec_count_nonzero() aggregate below.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_count(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) RETURNS MADLIB_SCHEMA.svec
AS 'MODULE_PATHNAME', 'svec_count' STRICT LANGUAGE C IMMUTABLE;

--! Adds two SVECs together, element by element.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_plus(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_plus' STRICT LANGUAGE C IMMUTABLE;

--!
--! Subtracts the second SVEC from the first, element by element.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_minus(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_minus' STRICT LANGUAGE C IMMUTABLE;

--! Computes the logarithm of each element of the input SVEC.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_log(MADLIB_SCHEMA.svec) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_log' STRICT LANGUAGE C IMMUTABLE;

--! Divides the first SVEC by the second, element by element.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_div(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_div' STRICT LANGUAGE C IMMUTABLE;

--! Multiplies two SVECs together, element by element.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_mult(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_mult' STRICT LANGUAGE C IMMUTABLE;

--! Raises each element of the first SVEC to the power given by second SVEC, which must have dimension 1 (a scalar).
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_pow(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_pow' STRICT LANGUAGE C IMMUTABLE;

--! Returns true if two SVECs are equal. If the two SVECs are of different sizes, the function returns false.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_eq(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) RETURNS boolean AS 'MODULE_PATHNAME', 'svec_eq' STRICT LANGUAGE C IMMUTABLE;

--! Returns true if two SVECs are equal, not counting zeros (zero equals anything). If the two SVECs are of different sizes, the function essentially zero-pads the shorter one and performs the comparison.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_eq_non_zero(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) RETURNS boolean AS 'MODULE_PATHNAME', 'svec_eq_non_zero' STRICT LANGUAGE C IMMUTABLE;

--! Returns true if the left svec contains the right one, meaning that every non-zero value in the right svec equals the corresponding value in the left one.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_contains(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) RETURNS boolean AS 'MODULE_PATHNAME', 'svec_contains' STRICT LANGUAGE C IMMUTABLE;

--! Returns true if two float8 arrays are equal.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.float8arr_eq(float8[],float8[]) RETURNS boolean AS 'MODULE_PATHNAME', 'float8arr_equals' LANGUAGE C IMMUTABLE;

--! Subtracts the second array from the first array, element by element.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.float8arr_minus_float8arr(float8[],float8[]) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'float8arr_minus_float8arr' LANGUAGE C IMMUTABLE;

--! Subtracts the second SVEC from the first array, element by element.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.float8arr_minus_svec(float8[],MADLIB_SCHEMA.svec) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'float8arr_minus_svec' LANGUAGE C IMMUTABLE;

--! Subtracts the second array from the first SVEC, element by element.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_minus_float8arr(MADLIB_SCHEMA.svec,float8[]) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_minus_float8arr' LANGUAGE C IMMUTABLE;

--! Adds two arrays together, element by element.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.float8arr_plus_float8arr(float8[],float8[]) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'float8arr_plus_float8arr' LANGUAGE C IMMUTABLE;

--! Adds an array and an SVEC, element by element.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.float8arr_plus_svec(float8[],MADLIB_SCHEMA.svec) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'float8arr_plus_svec' LANGUAGE C IMMUTABLE;

--! Adds an SVEC and an array, element by element.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_plus_float8arr(MADLIB_SCHEMA.svec,float8[]) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_plus_float8arr' LANGUAGE C IMMUTABLE;

--! Multiplies two float8 arrays, element by element.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.float8arr_mult_float8arr(float8[],float8[]) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'float8arr_mult_float8arr' LANGUAGE C IMMUTABLE;

--! Multiplies an array and an SVEC, element by element.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.float8arr_mult_svec(float8[],MADLIB_SCHEMA.svec) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'float8arr_mult_svec' LANGUAGE C IMMUTABLE;

--! Multiplies an SVEC and an array, element by element.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_mult_float8arr(MADLIB_SCHEMA.svec,float8[]) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_mult_float8arr' LANGUAGE C IMMUTABLE;

--! Divides a float8 array by another, element by element.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.float8arr_div_float8arr(float8[],float8[]) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'float8arr_div_float8arr' LANGUAGE C IMMUTABLE;

--! Divides a float8 array by an SVEC, element by element.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.float8arr_div_svec(float8[],MADLIB_SCHEMA.svec) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'float8arr_div_svec' LANGUAGE C IMMUTABLE;

--! Divides an SVEC by a float8 array, element by element.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_div_float8arr(MADLIB_SCHEMA.svec,float8[]) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_div_float8arr' LANGUAGE C IMMUTABLE;

--! Computes the dot product of two SVECs.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_dot(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) RETURNS float8 AS 'MODULE_PATHNAME', 'svec_dot' STRICT LANGUAGE C IMMUTABLE;

--! Computes the dot product of two float8 arrays.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_dot(float8[],float8[]) RETURNS float8 AS 'MODULE_PATHNAME', 'float8arr_dot' STRICT LANGUAGE C IMMUTABLE;

--! Computes the dot product of an SVEC and a float8 array.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_dot(MADLIB_SCHEMA.svec,float8[]) RETURNS float8 AS 'MODULE_PATHNAME', 'svec_dot_float8arr' STRICT LANGUAGE C IMMUTABLE;

--! Computes the dot product of a float8 array and an SVEC.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_dot(float8[],MADLIB_SCHEMA.svec) RETURNS float8 AS 'MODULE_PATHNAME', 'float8arr_dot_svec' STRICT LANGUAGE C IMMUTABLE;

--! Computes the l2norm of an SVEC.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_l2norm(MADLIB_SCHEMA.svec) RETURNS float8 AS 'MODULE_PATHNAME', 'svec_l2norm' STRICT LANGUAGE C IMMUTABLE;

--! Computes the l2norm of a float8 array.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_l2norm(float8[]) RETURNS float8 AS 'MODULE_PATHNAME', 'float8arr_l2norm' LANGUAGE C IMMUTABLE;

--! Computes the l2norm distance between two SVECs.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.l2norm(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec)
RETURNS float8 AS 'MODULE_PATHNAME', 'svec_svec_l2norm' LANGUAGE C STRICT IMMUTABLE;

--! Computes the l1norm distance between two SVECs.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.l1norm(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec)
RETURNS float8 AS 'MODULE_PATHNAME', 'svec_svec_l1norm' LANGUAGE C STRICT IMMUTABLE;

--! Computes the l1norm of an SVEC.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_l1norm(MADLIB_SCHEMA.svec) RETURNS float8 AS 'MODULE_PATHNAME', 'svec_l1norm' STRICT LANGUAGE C IMMUTABLE;

--! Computes the l1norm of a float8 array.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_l1norm(float8[]) RETURNS float8 AS 'MODULE_PATHNAME', 'float8arr_l1norm' STRICT LANGUAGE C IMMUTABLE;

--! Computes the angle between two SVECs in radians.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.angle(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec)
RETURNS float8 AS 'MODULE_PATHNAME', 'svec_svec_angle' LANGUAGE C STRICT IMMUTABLE;

--! Computes the Tanimoto distance between two SVECs.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.tanimoto_distance(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec)
RETURNS float8 AS 'MODULE_PATHNAME', 'svec_svec_tanimoto_distance' LANGUAGE C STRICT IMMUTABLE;

--! Unnests an SVEC into a table of uncompressed values
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_unnest(MADLIB_SCHEMA.svec) RETURNS setof float8 AS 'MODULE_PATHNAME', 'svec_unnest' LANGUAGE C IMMUTABLE;

--! Appends an element to the back of an SVEC.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_pivot(MADLIB_SCHEMA.svec,float8) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_pivot' LANGUAGE C IMMUTABLE;

--! Sums the elements of an SVEC.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_elsum(MADLIB_SCHEMA.svec) RETURNS float8 AS 'MODULE_PATHNAME', 'svec_summate' STRICT LANGUAGE C IMMUTABLE;

--! Sums the elements of a float8 array.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_elsum(float8[]) RETURNS float8 AS 'MODULE_PATHNAME', 'float8arr_summate' STRICT LANGUAGE C IMMUTABLE;

--! Computes the median element of a float8 array.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_median(float8[]) RETURNS float8 AS 'MODULE_PATHNAME', 'float8arr_median' STRICT LANGUAGE C IMMUTABLE;

--! Computes the median element of an SVEC.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_median(MADLIB_SCHEMA.svec) RETURNS float8 AS 'MODULE_PATHNAME', 'svec_median' STRICT LANGUAGE C IMMUTABLE;

--! Compares an SVEC to a float8, and returns positions of all elements not equal to the float as an array. Element index here starts at 0.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_nonbase_positions(MADLIB_SCHEMA.svec, FLOAT8) RETURNS INT8[] AS 'MODULE_PATHNAME', 'svec_nonbase_positions' STRICT LANGUAGE C IMMUTABLE;

--! Compares an SVEC to a float8, and returns values of all elements not equal to the float as an array.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_nonbase_values(MADLIB_SCHEMA.svec, FLOAT8) RETURNS FLOAT8[] AS 'MODULE_PATHNAME', 'svec_nonbase_values' STRICT LANGUAGE C IMMUTABLE;


--! Casts an int2 into an SVEC.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_cast_int2(int2) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_cast_int2' STRICT LANGUAGE C IMMUTABLE;

--! Casts an int4 into an SVEC.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_cast_int4(int4) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_cast_int4' STRICT LANGUAGE C IMMUTABLE;

--! Casts an int8 into an SVEC.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_cast_int8(bigint) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_cast_int8' STRICT LANGUAGE C IMMUTABLE;

--! Casts a float4 into an SVEC.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_cast_float4(float4) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_cast_float4' STRICT LANGUAGE C IMMUTABLE;

--! Casts a float8 into an SVEC.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_cast_float8(float8) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_cast_float8' STRICT LANGUAGE C IMMUTABLE;

--! Casts a numeric into an SVEC.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_cast_numeric(numeric) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_cast_numeric' STRICT LANGUAGE C IMMUTABLE;

--! Casts an int2 into a float8 array.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.float8arr_cast_int2(int2) RETURNS float8[] AS 'MODULE_PATHNAME', 'float8arr_cast_int2' STRICT LANGUAGE C IMMUTABLE;

--! Casts an int4 into a float8 array.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.float8arr_cast_int4(int4) RETURNS float8[] AS 'MODULE_PATHNAME', 'float8arr_cast_int4' STRICT LANGUAGE C IMMUTABLE;

--! Casts an int8 into a float8 array.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.float8arr_cast_int8(bigint) RETURNS float8[] AS 'MODULE_PATHNAME', 'float8arr_cast_int8' STRICT LANGUAGE C IMMUTABLE;

--! Casts a float4 into a float8 array.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.float8arr_cast_float4(float4) RETURNS float8[] AS 'MODULE_PATHNAME', 'float8arr_cast_float4' STRICT LANGUAGE C IMMUTABLE;

--! Casts a float8 into a float8 array.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.float8arr_cast_float8(float8) RETURNS float8[] AS 'MODULE_PATHNAME', 'float8arr_cast_float8' STRICT LANGUAGE C IMMUTABLE;

--! Casts a numeric into a float8 array.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.float8arr_cast_numeric(numeric) RETURNS float8[] AS 'MODULE_PATHNAME', 'float8arr_cast_numeric' STRICT LANGUAGE C IMMUTABLE;

--! Casts a float8 array into an SVEC.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_cast_float8arr(float8[]) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_cast_float8arr' STRICT LANGUAGE C IMMUTABLE;

--! Casts an array of int8 positions, float8 values into an SVEC.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_cast_positions_float8arr(int8[],float8[],int8,float8) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_cast_positions_float8arr' STRICT LANGUAGE C IMMUTABLE;

--! Casts an SVEC into a float8 array.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_return_array(MADLIB_SCHEMA.svec) RETURNS float8[] AS 'MODULE_PATHNAME', 'svec_return_array' LANGUAGE C IMMUTABLE;

--! Concatenates two SVECs.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_concat(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_concat' LANGUAGE C IMMUTABLE;

--! Replicates n copies of an SVEC and concatenates them together.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_concat_replicate(int4,MADLIB_SCHEMA.svec) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_concat_replicate' LANGUAGE C IMMUTABLE;

--! Returns the dimension of an SVEC.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_dimension(MADLIB_SCHEMA.svec) RETURNS integer AS 'MODULE_PATHNAME', 'svec_dimension' LANGUAGE C IMMUTABLE;

--! Applies a given function to each element of an SVEC.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_lapply(text,MADLIB_SCHEMA.svec) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_lapply' LANGUAGE C IMMUTABLE;

--! Appends a run-length block to the back of an SVEC.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_append(MADLIB_SCHEMA.svec,float8,int8) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_append' LANGUAGE C IMMUTABLE;

--! Projects onto an element of an SVEC.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_proj(MADLIB_SCHEMA.svec,int4) RETURNS float8 AS 'MODULE_PATHNAME', 'svec_proj' LANGUAGE C IMMUTABLE;

--! Extracts a subvector of an SVEC given the subvector's start and end indices.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_subvec(MADLIB_SCHEMA.svec,int4,int4) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_subvec' LANGUAGE C IMMUTABLE;

--! Reverses the elements of an SVEC.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_reverse(MADLIB_SCHEMA.svec) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_reverse' LANGUAGE C IMMUTABLE;

--! Replaces the subvector of a given SVEC at a given start index with another SVEC. Note that element indices start at 1.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_change(MADLIB_SCHEMA.svec,int4,MADLIB_SCHEMA.svec) RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_change' LANGUAGE C IMMUTABLE;

--! Computes the hash of an SVEC.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_hash(MADLIB_SCHEMA.svec) RETURNS int4 AS 'MODULE_PATHNAME', 'svec_hash' STRICT LANGUAGE C IMMUTABLE;

--! Computes the word-occurrence vector of a document.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_sfv(text[], text[]) RETURNS MADLIB_SCHEMA.svec AS
'MODULE_PATHNAME', 'gp_extract_feature_histogram' LANGUAGE C IMMUTABLE;

--! Sorts an array of texts. This function should be in MADlib common.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_sort(text[]) RETURNS text[] AS $$
    SELECT array(SELECT unnest($1::text[]) ORDER BY 1);
$$ LANGUAGE SQL;

--!
--! Converts an svec to a text string
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_to_string(MADLIB_SCHEMA.svec)
RETURNS text AS 'MODULE_PATHNAME', 'svec_to_string' STRICT LANGUAGE C IMMUTABLE;

--! Converts a text string to an svec
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_from_string(text)
RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_from_string' STRICT LANGUAGE C IMMUTABLE;


/*
DROP OPERATOR IF EXISTS || ( MADLIB_SCHEMA.svec, MADLIB_SCHEMA.svec);
DROP OPERATOR IF EXISTS - ( MADLIB_SCHEMA.svec, MADLIB_SCHEMA.svec);
DROP OPERATOR IF EXISTS + ( MADLIB_SCHEMA.svec, MADLIB_SCHEMA.svec);
DROP OPERATOR IF EXISTS / ( MADLIB_SCHEMA.svec, MADLIB_SCHEMA.svec);
DROP OPERATOR IF EXISTS %*% ( MADLIB_SCHEMA.svec, MADLIB_SCHEMA.svec);
DROP OPERATOR IF EXISTS * ( MADLIB_SCHEMA.svec, MADLIB_SCHEMA.svec);
DROP OPERATOR IF EXISTS ^ ( MADLIB_SCHEMA.svec, MADLIB_SCHEMA.svec);
*/

CREATE OPERATOR MADLIB_SCHEMA.|| (
    LEFTARG = MADLIB_SCHEMA.svec,
    RIGHTARG = MADLIB_SCHEMA.svec,
    PROCEDURE = MADLIB_SCHEMA.svec_concat
);

CREATE OPERATOR MADLIB_SCHEMA.- (
    LEFTARG = MADLIB_SCHEMA.svec,
    RIGHTARG = MADLIB_SCHEMA.svec,
    PROCEDURE = MADLIB_SCHEMA.svec_minus
);
CREATE OPERATOR MADLIB_SCHEMA.+ (
    LEFTARG = MADLIB_SCHEMA.svec,
    RIGHTARG = MADLIB_SCHEMA.svec,
    PROCEDURE = MADLIB_SCHEMA.svec_plus
);
CREATE OPERATOR MADLIB_SCHEMA./ (
    LEFTARG = MADLIB_SCHEMA.svec,
    RIGHTARG = MADLIB_SCHEMA.svec,
    PROCEDURE = MADLIB_SCHEMA.svec_div
);
CREATE OPERATOR MADLIB_SCHEMA.%*% (
    LEFTARG = MADLIB_SCHEMA.svec,
    RIGHTARG = MADLIB_SCHEMA.svec,
    PROCEDURE = MADLIB_SCHEMA.svec_dot
);
CREATE OPERATOR MADLIB_SCHEMA.* (
    LEFTARG = MADLIB_SCHEMA.svec,
    RIGHTARG = MADLIB_SCHEMA.svec,
    PROCEDURE = MADLIB_SCHEMA.svec_mult
);
CREATE OPERATOR MADLIB_SCHEMA.^ (
    LEFTARG = MADLIB_SCHEMA.svec,
    RIGHTARG = MADLIB_SCHEMA.svec,
    PROCEDURE = MADLIB_SCHEMA.svec_pow
);

-- float8[] operators
-- DROP OPERATOR IF EXISTS = ( float8[], float8[]);
/*
DROP OPERATOR IF EXISTS %*% ( float8[], MADLIB_SCHEMA.svec);
DROP OPERATOR IF EXISTS %*% ( MADLIB_SCHEMA.svec, float8[]);
DROP OPERATOR IF EXISTS %*% ( float8[], float8[]);
DROP OPERATOR IF EXISTS - ( float8[], float8[]);
DROP OPERATOR IF EXISTS + ( float8[], float8[]);
DROP OPERATOR IF EXISTS * ( float8[], float8[]);
DROP OPERATOR IF EXISTS / ( float8[], float8[]);
DROP OPERATOR IF EXISTS - ( float8[], MADLIB_SCHEMA.svec);
DROP OPERATOR IF EXISTS + ( float8[], MADLIB_SCHEMA.svec);
DROP OPERATOR IF EXISTS * ( float8[], MADLIB_SCHEMA.svec);
DROP OPERATOR IF EXISTS / ( float8[], MADLIB_SCHEMA.svec);
DROP OPERATOR IF EXISTS - ( MADLIB_SCHEMA.svec, float8[]);
DROP OPERATOR IF EXISTS + ( MADLIB_SCHEMA.svec, float8[]);
DROP OPERATOR IF EXISTS * ( MADLIB_SCHEMA.svec, float8[]);
DROP OPERATOR IF EXISTS / ( MADLIB_SCHEMA.svec, float8[]);
*/

/*
CREATE OPERATOR MADLIB_SCHEMA.= (
    leftarg = float8[],
    rightarg = float8[],
    procedure = MADLIB_SCHEMA.float8arr_eq,
    commutator = operator(MADLIB_SCHEMA.=) ,
    -- negator = operator(MADLIB_SCHEMA.<>) ,
    restrict = eqsel, join = eqjoinsel
);
*/

CREATE OPERATOR MADLIB_SCHEMA.%*% (
    LEFTARG = float8[],
    RIGHTARG = float8[],
    PROCEDURE = MADLIB_SCHEMA.svec_dot
);
CREATE OPERATOR MADLIB_SCHEMA.%*% (
    LEFTARG = float8[],
    RIGHTARG = MADLIB_SCHEMA.svec,
    PROCEDURE = MADLIB_SCHEMA.svec_dot
);
CREATE OPERATOR MADLIB_SCHEMA.%*% (
    LEFTARG = MADLIB_SCHEMA.svec,
    RIGHTARG = float8[],
    PROCEDURE = MADLIB_SCHEMA.svec_dot
);
CREATE
OPERATOR MADLIB_SCHEMA.- (
    LEFTARG = float8[],
    RIGHTARG = float8[],
    PROCEDURE = MADLIB_SCHEMA.float8arr_minus_float8arr
);
CREATE OPERATOR MADLIB_SCHEMA.+ (
    LEFTARG = float8[],
    RIGHTARG = float8[],
    PROCEDURE = MADLIB_SCHEMA.float8arr_plus_float8arr
);
CREATE OPERATOR MADLIB_SCHEMA.* (
    LEFTARG = float8[],
    RIGHTARG = float8[],
    PROCEDURE = MADLIB_SCHEMA.float8arr_mult_float8arr
);
CREATE OPERATOR MADLIB_SCHEMA./ (
    LEFTARG = float8[],
    RIGHTARG = float8[],
    PROCEDURE = MADLIB_SCHEMA.float8arr_div_float8arr
);

CREATE OPERATOR MADLIB_SCHEMA.- (
    LEFTARG = float8[],
    RIGHTARG = MADLIB_SCHEMA.svec,
    PROCEDURE = MADLIB_SCHEMA.float8arr_minus_svec
);
CREATE OPERATOR MADLIB_SCHEMA.+ (
    LEFTARG = float8[],
    RIGHTARG = MADLIB_SCHEMA.svec,
    PROCEDURE = MADLIB_SCHEMA.float8arr_plus_svec
);
CREATE OPERATOR MADLIB_SCHEMA.* (
    LEFTARG = float8[],
    RIGHTARG = MADLIB_SCHEMA.svec,
    PROCEDURE = MADLIB_SCHEMA.float8arr_mult_svec
);
CREATE OPERATOR MADLIB_SCHEMA./ (
    LEFTARG = float8[],
    RIGHTARG = MADLIB_SCHEMA.svec,
    PROCEDURE = MADLIB_SCHEMA.float8arr_div_svec
);

CREATE OPERATOR MADLIB_SCHEMA.- (
    LEFTARG = MADLIB_SCHEMA.svec,
    RIGHTARG = float8[],
    PROCEDURE = MADLIB_SCHEMA.svec_minus_float8arr
);
CREATE OPERATOR MADLIB_SCHEMA.+ (
    LEFTARG = MADLIB_SCHEMA.svec,
    RIGHTARG = float8[],
    PROCEDURE = MADLIB_SCHEMA.svec_plus_float8arr
);
CREATE OPERATOR MADLIB_SCHEMA.* (
    LEFTARG = MADLIB_SCHEMA.svec,
    RIGHTARG = float8[],
    PROCEDURE = MADLIB_SCHEMA.svec_mult_float8arr
);
CREATE OPERATOR MADLIB_SCHEMA./ (
    LEFTARG = MADLIB_SCHEMA.svec,
    RIGHTARG = float8[],
    PROCEDURE = MADLIB_SCHEMA.svec_div_float8arr
);

/*
DROP CAST IF EXISTS (int2 AS
MADLIB_SCHEMA.svec) ;
DROP CAST IF EXISTS (integer AS MADLIB_SCHEMA.svec) ;
DROP CAST IF EXISTS (bigint AS MADLIB_SCHEMA.svec) ;
DROP CAST IF EXISTS (float4 AS MADLIB_SCHEMA.svec) ;
DROP CAST IF EXISTS (float8 AS MADLIB_SCHEMA.svec) ;
DROP CAST IF EXISTS (numeric AS MADLIB_SCHEMA.svec) ;
*/

CREATE CAST (int2 AS MADLIB_SCHEMA.svec) WITH FUNCTION MADLIB_SCHEMA.svec_cast_int2(int2) ; -- AS IMPLICIT;
CREATE CAST (integer AS MADLIB_SCHEMA.svec) WITH FUNCTION MADLIB_SCHEMA.svec_cast_int4(integer) ; -- AS IMPLICIT;
CREATE CAST (bigint AS MADLIB_SCHEMA.svec) WITH FUNCTION MADLIB_SCHEMA.svec_cast_int8(bigint) ; -- AS IMPLICIT;
CREATE CAST (float4 AS MADLIB_SCHEMA.svec) WITH FUNCTION MADLIB_SCHEMA.svec_cast_float4(float4) ; -- AS IMPLICIT;
CREATE CAST (float8 AS MADLIB_SCHEMA.svec) WITH FUNCTION MADLIB_SCHEMA.svec_cast_float8(float8) ; -- AS IMPLICIT;
CREATE CAST (numeric AS MADLIB_SCHEMA.svec) WITH FUNCTION MADLIB_SCHEMA.svec_cast_numeric(numeric) ; -- AS IMPLICIT;

/*
DROP CAST IF EXISTS (int2 AS float8[]) ;
DROP CAST IF EXISTS (integer AS float8[]) ;
DROP CAST IF EXISTS (bigint AS float8[]) ;
DROP CAST IF EXISTS (float4 AS float8[]) ;
DROP CAST IF EXISTS (float8 AS float8[]) ;
DROP CAST IF EXISTS (numeric AS float8[]) ;
*/

-- CREATE CAST (int2 AS float8[]) WITH FUNCTION MADLIB_SCHEMA.float8arr_cast_int2(int2) ; -- AS IMPLICIT;
-- CREATE CAST (integer AS float8[]) WITH FUNCTION MADLIB_SCHEMA.float8arr_cast_int4(integer) ; -- AS IMPLICIT;
-- CREATE CAST (bigint AS float8[]) WITH FUNCTION MADLIB_SCHEMA.float8arr_cast_int8(bigint) ; -- AS IMPLICIT;
-- CREATE CAST (float4 AS float8[]) WITH FUNCTION MADLIB_SCHEMA.float8arr_cast_float4(float4) ; -- AS IMPLICIT;
-- CREATE CAST (float8 AS float8[]) WITH FUNCTION MADLIB_SCHEMA.float8arr_cast_float8(float8) ; -- AS IMPLICIT;
-- CREATE CAST (numeric AS float8[])
-- WITH FUNCTION MADLIB_SCHEMA.float8arr_cast_numeric(numeric) ; -- AS IMPLICIT;

-- DROP CAST IF EXISTS (MADLIB_SCHEMA.svec AS float8[]) ;
-- DROP CAST IF EXISTS (float8[] AS MADLIB_SCHEMA.svec) ;

CREATE CAST (MADLIB_SCHEMA.svec AS float8[]) WITH FUNCTION MADLIB_SCHEMA.svec_return_array(MADLIB_SCHEMA.svec) ; -- AS IMPLICIT;
CREATE CAST (float8[] AS MADLIB_SCHEMA.svec) WITH FUNCTION MADLIB_SCHEMA.svec_cast_float8arr(float8[]) ; -- AS IMPLICIT;

-- DROP OPERATOR IF EXISTS = (MADLIB_SCHEMA.svec, MADLIB_SCHEMA.svec) ;


CREATE OPERATOR MADLIB_SCHEMA.= (
    leftarg = MADLIB_SCHEMA.svec, rightarg = MADLIB_SCHEMA.svec, procedure = MADLIB_SCHEMA.svec_eq,
    commutator = operator(MADLIB_SCHEMA.=) ,
    -- negator = operator(MADLIB_SCHEMA.<>) ,
    restrict = eqsel, join = eqjoinsel
);

--! Transition function for mean(svec) aggregate
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_mean_transition( FLOAT[], MADLIB_SCHEMA.svec)
RETURNS FLOAT[] AS 'MODULE_PATHNAME'
LANGUAGE C IMMUTABLE;

--! Preliminary merge function for mean(svec) aggregate
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_mean_prefunc( FLOAT[], FLOAT[])
RETURNS FLOAT[] AS 'MODULE_PATHNAME'
LANGUAGE C IMMUTABLE;

--! Final function for mean(svec) aggregate
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_mean_final( FLOAT[])
RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME'
LANGUAGE C IMMUTABLE;

--! Aggregate that computes the element-wise mean of a list of vectors.
--!
CREATE AGGREGATE MADLIB_SCHEMA.mean( MADLIB_SCHEMA.svec) (
    SFUNC = MADLIB_SCHEMA.svec_mean_transition,
    m4_ifdef(`__GREENPLUM__',`prefunc = MADLIB_SCHEMA.svec_mean_prefunc,')
    FINALFUNC = MADLIB_SCHEMA.svec_mean_final,
    STYPE = FLOAT[]
);

--!
--! Aggregate that provides the element-wise sum of a list of vectors.
--!
-- DROP AGGREGATE IF EXISTS MADLIB_SCHEMA.svec_sum(MADLIB_SCHEMA.svec);
CREATE AGGREGATE MADLIB_SCHEMA.svec_sum (MADLIB_SCHEMA.svec) (
    SFUNC = MADLIB_SCHEMA.svec_plus,
    m4_ifdef(`__GREENPLUM__',`prefunc=MADLIB_SCHEMA.svec_plus,')
    INITCOND = '{1}:{0.}', -- Zero
    STYPE = MADLIB_SCHEMA.svec
);

--! Aggregate that provides a tally of nonzero entries in a list of vectors.
--!
-- DROP AGGREGATE IF EXISTS MADLIB_SCHEMA.svec_count_nonzero(MADLIB_SCHEMA.svec);
CREATE AGGREGATE MADLIB_SCHEMA.svec_count_nonzero (MADLIB_SCHEMA.svec) (
    SFUNC = MADLIB_SCHEMA.svec_count,
    m4_ifdef(`__GREENPLUM__',`prefunc=MADLIB_SCHEMA.svec_plus,')
    INITCOND = '{1}:{0.}', -- Zero
    STYPE = MADLIB_SCHEMA.svec
);

--! Aggregate that turns a list of float8 values into an SVEC.
--!
-- DROP AGGREGATE IF EXISTS MADLIB_SCHEMA.svec_agg(float8);
CREATE
m4_ifdef(`__GREENPLUM__', m4_ifdef(`__HAS_ORDERED_AGGREGATES__', `ORDERED'))
AGGREGATE MADLIB_SCHEMA.svec_agg (float8) (
    SFUNC = MADLIB_SCHEMA.svec_pivot,
    m4_ifdef(`__GREENPLUM__', m4_ifdef(`__HAS_ORDERED_AGGREGATES__', `', ``prefunc=MADLIB_SCHEMA.svec_concat,''))
    STYPE = MADLIB_SCHEMA.svec
);

--! Aggregate that computes the median element of a list of float8 values.
--!
-- DROP AGGREGATE IF EXISTS MADLIB_SCHEMA.svec_median_inmemory(float8);
CREATE AGGREGATE MADLIB_SCHEMA.svec_median_inmemory (float8) (
    SFUNC = MADLIB_SCHEMA.svec_pivot,
    m4_ifdef(`__GREENPLUM__',`prefunc=MADLIB_SCHEMA.svec_concat,')
    FINALFUNC = MADLIB_SCHEMA.svec_median,
    STYPE = MADLIB_SCHEMA.svec
);

-- Comparisons based on L2 Norm
--! Returns true if the l2 norm of the first SVEC is less than that of the second SVEC.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_l2_lt(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) RETURNS bool AS 'MODULE_PATHNAME', 'svec_l2_lt' LANGUAGE C IMMUTABLE;

--! Returns true if the l2 norm of the first SVEC is less than or equal to that of the second SVEC.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_l2_le(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) RETURNS bool AS 'MODULE_PATHNAME', 'svec_l2_le' LANGUAGE C IMMUTABLE;

--! Returns true if the l2 norm of the first SVEC is equal to that of the second SVEC.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_l2_eq(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) RETURNS bool AS 'MODULE_PATHNAME', 'svec_l2_eq' LANGUAGE C IMMUTABLE;

--! Returns true if the l2 norm of the first SVEC is not equal to that of the second SVEC.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_l2_ne(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) RETURNS bool AS 'MODULE_PATHNAME', 'svec_l2_ne' LANGUAGE C IMMUTABLE;

--! Returns true if the l2 norm of the first SVEC is greater than that of the second SVEC.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_l2_gt(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) RETURNS bool AS 'MODULE_PATHNAME', 'svec_l2_gt' LANGUAGE C IMMUTABLE;

--! Returns true if the l2 norm of the first SVEC is greater than or equal to that of the second SVEC.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_l2_ge(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) RETURNS bool AS 'MODULE_PATHNAME', 'svec_l2_ge' LANGUAGE C IMMUTABLE;

--! Returns a value indicating the relative values of the l2 norms of two SVECs.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.svec_l2_cmp(MADLIB_SCHEMA.svec,MADLIB_SCHEMA.svec) RETURNS integer AS 'MODULE_PATHNAME', 'svec_l2_cmp' LANGUAGE C IMMUTABLE;

--! Normalizes an SVEC, that is, divides all elements by its norm/magnitude.
--!
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.normalize(MADLIB_SCHEMA.svec)
RETURNS MADLIB_SCHEMA.svec AS 'MODULE_PATHNAME', 'svec_normalize' LANGUAGE C IMMUTABLE STRICT;

/*
DROP OPERATOR IF EXISTS < (MADLIB_SCHEMA.svec, MADLIB_SCHEMA.svec) CASCADE ;
DROP OPERATOR IF EXISTS <= (MADLIB_SCHEMA.svec, MADLIB_SCHEMA.svec) CASCADE ;
DROP OPERATOR IF EXISTS <> (MADLIB_SCHEMA.svec, MADLIB_SCHEMA.svec) ;
DROP OPERATOR IF EXISTS == (MADLIB_SCHEMA.svec, MADLIB_SCHEMA.svec) CASCADE ;
DROP OPERATOR IF EXISTS > (MADLIB_SCHEMA.svec, MADLIB_SCHEMA.svec) CASCADE ;
DROP OPERATOR IF EXISTS >= (MADLIB_SCHEMA.svec, MADLIB_SCHEMA.svec) CASCADE ;
DROP OPERATOR IF EXISTS *|| (int4, MADLIB_SCHEMA.svec) ;
*/

CREATE OPERATOR MADLIB_SCHEMA.< (
    leftarg = MADLIB_SCHEMA.svec, rightarg = MADLIB_SCHEMA.svec, procedure = MADLIB_SCHEMA.svec_l2_lt,
    commutator = operator(MADLIB_SCHEMA.>) , negator = operator(MADLIB_SCHEMA.>=) ,
    restrict = scalarltsel, join = scalarltjoinsel
);
CREATE OPERATOR MADLIB_SCHEMA.<= (
    leftarg = MADLIB_SCHEMA.svec, rightarg = MADLIB_SCHEMA.svec, procedure = MADLIB_SCHEMA.svec_l2_le,
    commutator = operator(MADLIB_SCHEMA.>=) , negator = operator(MADLIB_SCHEMA.>) ,
    restrict = scalarltsel, join = scalarltjoinsel
);
-- The inequality operator is bound to svec_l2_ne (true when the l2 norms differ).
CREATE OPERATOR MADLIB_SCHEMA.<> (
    leftarg = MADLIB_SCHEMA.svec, rightarg = MADLIB_SCHEMA.svec, procedure = MADLIB_SCHEMA.svec_l2_ne,
    commutator = operator(MADLIB_SCHEMA.<>) ,
    negator = operator(MADLIB_SCHEMA.=),
    restrict = eqsel, join = eqjoinsel
);
CREATE OPERATOR MADLIB_SCHEMA.== (
    leftarg = MADLIB_SCHEMA.svec, rightarg = MADLIB_SCHEMA.svec, procedure = MADLIB_SCHEMA.svec_l2_eq,
    commutator = operator(MADLIB_SCHEMA.=) ,
    negator = operator(MADLIB_SCHEMA.<>) ,
    restrict = eqsel, join = eqjoinsel
);
CREATE OPERATOR MADLIB_SCHEMA.>= (
    leftarg =
    MADLIB_SCHEMA.svec, rightarg = MADLIB_SCHEMA.svec, procedure = MADLIB_SCHEMA.svec_l2_ge,
    commutator = operator(MADLIB_SCHEMA.<=) , negator = operator(MADLIB_SCHEMA.<) ,
    restrict = scalargtsel, join = scalargtjoinsel
);
CREATE OPERATOR MADLIB_SCHEMA.> (
    leftarg = MADLIB_SCHEMA.svec, rightarg = MADLIB_SCHEMA.svec, procedure = MADLIB_SCHEMA.svec_l2_gt,
    commutator = operator(MADLIB_SCHEMA.<) , negator = operator(MADLIB_SCHEMA.<=) ,
    restrict = scalargtsel, join = scalargtjoinsel
);

CREATE OPERATOR MADLIB_SCHEMA.*|| (
    leftarg = int4, rightarg = MADLIB_SCHEMA.svec, procedure = MADLIB_SCHEMA.svec_concat_replicate
);

CREATE OPERATOR CLASS MADLIB_SCHEMA.svec_l2_ops
DEFAULT FOR TYPE MADLIB_SCHEMA.svec USING btree AS
    OPERATOR 1 MADLIB_SCHEMA.< ,
    OPERATOR 2 MADLIB_SCHEMA.<= ,
    OPERATOR 3 MADLIB_SCHEMA.== ,
    OPERATOR 4 MADLIB_SCHEMA.>= ,
    OPERATOR 5 MADLIB_SCHEMA.> ,
    FUNCTION 1 MADLIB_SCHEMA.svec_l2_cmp(MADLIB_SCHEMA.svec, MADLIB_SCHEMA.svec);
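
/*
 * Usage sketch (illustrative only; kept inside a comment so that nothing here
 * runs at install time). It assumes MADLIB_SCHEMA is on the search_path so
 * that the operators declared in this file resolve without schema
 * qualification. The literal vectors are made-up sample data.
 *
 *   -- Element-wise sum of two svecs, cast back to float8[] for display
 *   -- (arithmetically 0+4, 1+3, 5+2).
 *   SELECT ('{0,1,5}'::float8[]::MADLIB_SCHEMA.svec
 *         + '{4,3,2}'::float8[]::MADLIB_SCHEMA.svec)::float8[];
 *
 *   -- Dot product through the %*% operator
 *   -- (arithmetically 0*4 + 1*3 + 5*2 = 13).
 *   SELECT '{0,1,5}'::float8[]::MADLIB_SCHEMA.svec
 *      %*% '{4,3,2}'::float8[]::MADLIB_SCHEMA.svec;
 *
 *   -- The ordering operators compare l2 norms, not individual elements:
 *   -- here sqrt(1 + 4) is compared with sqrt(9 + 16).
 *   SELECT '{1,2}'::float8[]::MADLIB_SCHEMA.svec
 *        < '{3,4}'::float8[]::MADLIB_SCHEMA.svec;
 *
 *   -- The replicate operator takes an int4 count on the left and
 *   -- concatenates that many copies of the svec on the right.
 *   SELECT 3 *|| '{1,2}'::float8[]::MADLIB_SCHEMA.svec;
 *
 * Because the casts above are created without AS IMPLICIT, the examples spell
 * out each cast explicitly.
 */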