BICO  1.0
 All Classes Namespaces Files Functions Variables Typedefs Pages
CluE::Bico< T > Class Template Reference

Fast computation of k-means coresets in a data stream. More...

#include <bico.h>

Inheritance diagram for CluE::Bico< T >:
Inheritance graph
Collaboration diagram for CluE::Bico< T >:
Collaboration graph

Classes

class  BicoNode
 Class representing a node in BICO's tree. More...
 

Public Member Functions

 Bico (size_t dimension, size_t n, size_t k, size_t p, size_t nMax, DissimilarityMeasure< T > *measure, WeightModifier< T > *weightModifier)
 Constructs BICO for points of type T T can be an arbitrary type but it has to fulfil the requirements of CFREntry. More...
 
virtual ProxySolution< T > * compute ()
 Returns a coreset of all point read so far. More...
 
virtual Bico< T > & operator<< (T const &element)
 Read a point Insert the point into BICO's tree. More...
 
void print (std::ostream &os)
 Write the tree as GraphViz source into a stream. More...
 

Private Member Functions

void insert (BicoNode *node, int level, T const &element)
 Inserts an element into a BicoNode at a certain level. More...
 
void allocateBucket (int bucket, bool left)
 Allocates a new bucket. More...
 
int calcBucketNumber (int rnd, double val)
 
void buildBuckets ()
 Create initial buckets. More...
 
double project (T point, int i)
 Projects a point onto a projection line. More...
 
void rebuild ()
 Rebuilds the tree. More...
 
void rebuildFirstLevel (BicoNode *parent, BicoNode *child)
 
void rebuildTraversMerge (BicoNode *node, int level)
 
void computeTraverse (BicoNode *node, ProxySolution< T > *solution)
 Recursive computation of the coreset. More...
 
double getT (int level)
 Returns the threshold for a given level. More...
 
double getR (int level)
 Returns the radius for a given level. More...
 
void print (std::ostream &os, BicoNode *node)
 

Private Attributes

size_t k
 Number of centers. More...
 
size_t L
 Number of projections. More...
 
std::vector< std::vector
< double > > 
rndprojections
 Random projection vectors. More...
 
std::vector< std::vector
< std::vector< typename
BicoNode::FeatureList::iterator > > > 
buckets
 Buckets for nearest neighbour search in first level. More...
 
std::vector< std::pair< double,
double > > 
borders
 Bucket borders. More...
 
std::vector< int > bucket_radius
 Width of buckets. More...
 
int nodeIdCounter
 Counter for unique BicoNode object ids. More...
 
std::unique_ptr
< DissimilarityMeasure< T > > 
measure
 Buffer for DissimilarityMeasure. More...
 
std::unique_ptr
< WeightModifier< T > > 
weightModifier
 Buffer for WeightModifier. More...
 
size_t maxNumOfCFs
 Maximum coreset size. More...
 
size_t curNumOfCFs
 Current coreset size. More...
 
size_t dimension
 Dimension of the input points. More...
 
double optEst
 Current estimation of the optimal clustering cost. More...
 
std::vector< double > maxVal
 Extreme values used for constructing the nearest neighbour buckets. More...
 
BicoNoderoot
 Root node of BICO's tree. More...
 
bool bufferPhase
 Buffer phase indicator. More...
 
std::vector< T > buffer
 Buffer phase's buffer. More...
 
int numOfRebuilds
 Current number of rebuilding. More...
 

Detailed Description

template<typename T>
class CluE::Bico< T >

Fast computation of k-means coresets in a data stream.

BICO maintains a tree which is inspired by the clustering tree of BIRCH, a SIGMOD Test of Time award-winning clustering algorithm. Each node in the tree represents a subset of these points. Instead of storing all points as individual objects, only the number of points, the sum and the squared sum of the subset's points are stored as key features of each subset. Points are inserted into exactly one node. A detailed description of BICO can be found here:

  • Hendrik Fichtenberger, Marc Gillé, Melanie Schmidt, Chris Schwiegelshohn, Christian Sohler: BICO: BIRCH Meets Coresets for k-Means Clustering. ESA 2013: 481-492 In this implementation, the nearest neighbour search on the first level of the tree ist sped up by projecting all points to random 1-d subspaces. The first estimation of the optimal clustering cost is computed in a buffer phase at the beginning of the algorithm.

Definition at line 42 of file bico.h.

Constructor & Destructor Documentation

template<typename T >
CluE::Bico< T >::Bico ( size_t  dimension,
size_t  n,
size_t  k,
size_t  p,
size_t  nMax,
DissimilarityMeasure< T > *  measure,
WeightModifier< T > *  weightModifier 
)

Constructs BICO for points of type T T can be an arbitrary type but it has to fulfil the requirements of CFREntry.

Parameters
dimensionDimension of the data
nSize of the data
kNumber of desired centeres
pNumber of random projections used for nearest neighbour search in the first level
nMaxMaximum coreset size
measureImplementation of the squared L2 metric for T
weightModifierClass to read and modify weight of T

Definition at line 492 of file bico.h.

Member Function Documentation

template<typename T >
ProxySolution< T > * CluE::Bico< T >::compute ( )
virtual

Returns a coreset of all point read so far.

Returns
Coreset

Implements CluE::Algorithm.

Definition at line 602 of file bico.h.

template<typename T >
Bico< T > & CluE::Bico< T >::operator<< ( T const &  element)
virtual

Read a point Insert the point into BICO's tree.

Parameters
elementPoint of type T
Returns
This BICO instance

Implements CluE::StreamingAlgorithm< T >.

Definition at line 622 of file bico.h.

template<typename T >
void CluE::Bico< T >::print ( std::ostream &  os)

Write the tree as GraphViz source into a stream.

Parameters
osOutput stream

Definition at line 881 of file bico.h.

template<typename T >
void CluE::Bico< T >::insert ( BicoNode node,
int  level,
T const &  element 
)
private

Inserts an element into a BicoNode at a certain level.

Parameters
nodeBicoNode to be inserted into
levelLevel of this BicoNode
elementElemente to be inserted

Definition at line 676 of file bico.h.

template<typename T >
void CluE::Bico< T >::allocateBucket ( int  bucket,
bool  left 
)
private

Allocates a new bucket.

Parameters
bucketNumber of projection
leftPush front bucket (instead of push back)

Definition at line 562 of file bico.h.

template<typename T >
int CluE::Bico< T >::calcBucketNumber ( int  rnd,
double  val 
)
private

Calculates the bucket number for a given value

Parameters
rndNumber of projections
valValue
Returns
Bucket number

Definition at line 535 of file bico.h.

template<typename T >
void CluE::Bico< T >::buildBuckets ( )
private

Create initial buckets.

Definition at line 540 of file bico.h.

template<typename T >
double CluE::Bico< T >::project ( point,
int  i 
)
private

Projects a point onto a projection line.

Parameters
pointPoint
iNumber of projection line
Returns
Projected point

Definition at line 592 of file bico.h.

template<typename T >
void CluE::Bico< T >::rebuild ( )
private

Rebuilds the tree.

Definition at line 751 of file bico.h.

template<typename T >
void CluE::Bico< T >::rebuildFirstLevel ( BicoNode parent,
BicoNode child 
)
private

Rebuilds the first level

Parameters
parentNew root
childOld root

Definition at line 765 of file bico.h.

template<typename T >
void CluE::Bico< T >::rebuildTraversMerge ( BicoNode node,
int  level 
)
private

Recursive rebuilding of the tree

Parameters
nodeSome node to be rebuilded
levelLevel of this node

Definition at line 835 of file bico.h.

template<typename T >
void CluE::Bico< T >::computeTraverse ( BicoNode node,
ProxySolution< T > *  solution 
)
private

Recursive computation of the coreset.

Parameters
nodeSome node to be processed
solutionProxySolution containing the coreset

Definition at line 611 of file bico.h.

template<typename T >
double CluE::Bico< T >::getT ( int  level)
private

Returns the threshold for a given level.

Parameters
levelLevel
Returns
Threshold at this level

Definition at line 871 of file bico.h.

template<typename T >
double CluE::Bico< T >::getR ( int  level)
private

Returns the radius for a given level.

Parameters
levelLevel
Returns
Radius at this level

Definition at line 876 of file bico.h.

template<typename T >
void CluE::Bico< T >::print ( std::ostream &  os,
BicoNode node 
)
private

Writes a BicoNode as GraphViz source into a stream

Parameters
osOutput stream
nodeSome BicoNode

Definition at line 889 of file bico.h.

Member Data Documentation

template<typename T>
size_t CluE::Bico< T >::k
private

Number of centers.

Definition at line 407 of file bico.h.

template<typename T>
size_t CluE::Bico< T >::L
private

Number of projections.

Definition at line 411 of file bico.h.

template<typename T>
std::vector<std::vector<double > > CluE::Bico< T >::rndprojections
private

Random projection vectors.

Definition at line 416 of file bico.h.

template<typename T>
std::vector<std::vector<std::vector<typename BicoNode::FeatureList::iterator > > > CluE::Bico< T >::buckets
private

Buckets for nearest neighbour search in first level.

Definition at line 421 of file bico.h.

template<typename T>
std::vector<std::pair<double, double > > CluE::Bico< T >::borders
private

Bucket borders.

Definition at line 425 of file bico.h.

template<typename T>
std::vector<int> CluE::Bico< T >::bucket_radius
private

Width of buckets.

Definition at line 429 of file bico.h.

template<typename T>
int CluE::Bico< T >::nodeIdCounter
private

Counter for unique BicoNode object ids.

Definition at line 434 of file bico.h.

template<typename T>
std::unique_ptr<DissimilarityMeasure<T > > CluE::Bico< T >::measure
private

Buffer for DissimilarityMeasure.

Definition at line 439 of file bico.h.

template<typename T>
std::unique_ptr<WeightModifier<T > > CluE::Bico< T >::weightModifier
private

Buffer for WeightModifier.

Definition at line 444 of file bico.h.

template<typename T>
size_t CluE::Bico< T >::maxNumOfCFs
private

Maximum coreset size.

Definition at line 449 of file bico.h.

template<typename T>
size_t CluE::Bico< T >::curNumOfCFs
private

Current coreset size.

Definition at line 454 of file bico.h.

template<typename T>
size_t CluE::Bico< T >::dimension
private

Dimension of the input points.

Definition at line 459 of file bico.h.

template<typename T>
double CluE::Bico< T >::optEst
private

Current estimation of the optimal clustering cost.

Definition at line 464 of file bico.h.

template<typename T>
std::vector<double> CluE::Bico< T >::maxVal
private

Extreme values used for constructing the nearest neighbour buckets.

Definition at line 469 of file bico.h.

template<typename T>
BicoNode* CluE::Bico< T >::root
private

Root node of BICO's tree.

Definition at line 474 of file bico.h.

template<typename T>
bool CluE::Bico< T >::bufferPhase
private

Buffer phase indicator.

Definition at line 479 of file bico.h.

template<typename T>
std::vector<T> CluE::Bico< T >::buffer
private

Buffer phase's buffer.

Definition at line 484 of file bico.h.

template<typename T>
int CluE::Bico< T >::numOfRebuilds
private

Current number of rebuilding.

Definition at line 489 of file bico.h.


The documentation for this class was generated from the following file: