BICO  1.1
 All Classes Namespaces Files Functions Variables Typedefs Pages
CluE::Bico< T > Class Template Reference

Fast computation of k-means coresets in a data stream. More...

#include <bico.h>

Inheritance diagram for CluE::Bico< T >:
Inheritance graph
Collaboration diagram for CluE::Bico< T >:
Collaboration graph

Classes

class  BicoNode
 Class representing a node in BICO's tree. More...
 

Public Member Functions

 Bico (size_t dimension, size_t n, size_t k, size_t p, size_t nMax, DissimilarityMeasure< T > *measure, WeightModifier< T > *weightModifier)
 Constructs BICO for points of type T T can be an arbitrary type but it has to fulfil the requirements of CFREntry. More...
 
virtual ProxySolution< T > * compute ()
 Returns a coreset of all point read so far. More...
 
virtual Bico< T > & operator<< (T const &element)
 Read a point Insert the point into BICO's tree. More...
 
void print (std::ostream &os)
 Write the tree as GraphViz source into a stream. More...
 
- Public Member Functions inherited from CluE::Algorithm
virtual ~Algorithm ()
 

Private Member Functions

void insert (BicoNode *node, int level, T const &element)
 Inserts an element into a BicoNode at a certain level. More...
 
void insertIntoNN (typename BicoNode::FeatureList::iterator iteratorElement)
 Inserts an element into the nearest neighbour data structure. More...
 
void initializeNN ()
 Initialize nearest neighbour data structure. More...
 
void allocateBucket (int bucket, bool left)
 Allocates a new bucket. More...
 
int calcBucketNumber (int rnd, double val)
 
double project (T point, int i)
 Projects a point onto a projection line. More...
 
void rebuild ()
 Rebuilds the tree. More...
 
void rebuildFirstLevel (BicoNode *parent, BicoNode *child)
 
void rebuildTraversMerge (BicoNode *node, int level)
 
void computeTraverse (BicoNode *node, ProxySolution< T > *solution)
 Recursive computation of the coreset. More...
 
double getT (int level)
 Returns the threshold for a given level. More...
 
double getR (int level)
 Returns the radius for a given level. More...
 
void print (std::ostream &os, BicoNode *node)
 

Private Attributes

size_t k
 Number of centers. More...
 
size_t L
 Number of projections. More...
 
std::vector< std::vector
< double > > 
rndprojections
 Random projection vectors. More...
 
std::vector< std::vector
< std::vector< typename
BicoNode::FeatureList::iterator > > > 
buckets
 Buckets for nearest neighbour search in first level. More...
 
std::vector< std::pair< double,
double > > 
borders
 Bucket borders. More...
 
std::vector< double > bucket_radius
 Width of buckets. More...
 
int nodeIdCounter
 Counter for unique BicoNode object ids. More...
 
std::unique_ptr
< DissimilarityMeasure< T > > 
measure
 Buffer for DissimilarityMeasure. More...
 
std::unique_ptr
< WeightModifier< T > > 
weightModifier
 Buffer for WeightModifier. More...
 
size_t maxNumOfCFs
 Maximum coreset size. More...
 
size_t curNumOfCFs
 Current coreset size. More...
 
size_t dimension
 Dimension of the input points. More...
 
double optEst
 Current estimation of the optimal clustering cost. More...
 
std::vector< double > maxVal
 Extreme values used for constructing the nearest neighbour buckets. More...
 
BicoNoderoot
 Root node of BICO's tree. More...
 
bool bufferPhase
 Buffer phase indicator. More...
 
std::vector< T > buffer
 Buffer phase's buffer. More...
 
std::vector< std::pair< double,
T const * > > 
projection_buffer
 Buffer phase's buffer for projected buffer points. More...
 
double minDist
 Minimum pair distance of two points read in buffer phase. More...
 
size_t pairwise_different
 Number of unique elements read in buffer phase. More...
 
int numOfRebuilds
 Current number of rebuilding. More...
 

Detailed Description

template<typename T>
class CluE::Bico< T >

Fast computation of k-means coresets in a data stream.

BICO maintains a tree which is inspired by the clustering tree of BIRCH, a SIGMOD Test of Time award-winning clustering algorithm. Each node in the tree represents a subset of these points. Instead of storing all points as individual objects, only the number of points, the sum and the squared sum of the subset's points are stored as key features of each subset. Points are inserted into exactly one node. A detailed description of BICO can be found here:

  • Hendrik Fichtenberger, Marc GillĂ©, Melanie Schmidt, Chris Schwiegelshohn, Christian Sohler: BICO: BIRCH Meets Coresets for k-Means Clustering. ESA 2013: 481-492

In this implementation, the nearest neighbour search on the first level of the tree ist sped up by projecting all points to random 1-d subspaces. The first estimation of the optimal clustering cost is computed in a buffer phase at the beginning of the algorithm.

Definition at line 43 of file bico.h.

Constructor & Destructor Documentation

template<typename T >
CluE::Bico< T >::Bico ( size_t  dimension,
size_t  n,
size_t  k,
size_t  p,
size_t  nMax,
DissimilarityMeasure< T > *  measure,
WeightModifier< T > *  weightModifier 
)

Constructs BICO for points of type T T can be an arbitrary type but it has to fulfil the requirements of CFREntry.

Parameters
dimensionDimension of the data
nSize of the data
kNumber of desired centeres
pNumber of random projections used for nearest neighbour search in the first level
nMaxMaximum coreset size
measureImplementation of the squared L2 metric for T
weightModifierClass to read and modify weight of T

Definition at line 522 of file bico.h.

Member Function Documentation

template<typename T >
ProxySolution< T > * CluE::Bico< T >::compute ( )
virtual

Returns a coreset of all point read so far.

Returns
Coreset

Implements CluE::Algorithm.

Definition at line 642 of file bico.h.

template<typename T >
Bico< T > & CluE::Bico< T >::operator<< ( T const &  element)
virtual

Read a point Insert the point into BICO's tree.

Parameters
elementPoint of type T
Returns
This BICO instance

Implements CluE::StreamingAlgorithm< T >.

Definition at line 669 of file bico.h.

template<typename T >
void CluE::Bico< T >::print ( std::ostream &  os)

Write the tree as GraphViz source into a stream.

Parameters
osOutput stream

Definition at line 957 of file bico.h.

template<typename T >
void CluE::Bico< T >::insert ( BicoNode node,
int  level,
T const &  element 
)
private

Inserts an element into a BicoNode at a certain level.

Parameters
nodeBicoNode to be inserted into
levelLevel of this BicoNode
elementElemente to be inserted

Definition at line 796 of file bico.h.

template<typename T >
void CluE::Bico< T >::insertIntoNN ( typename BicoNode::FeatureList::iterator  iteratorElement)
private

Inserts an element into the nearest neighbour data structure.

Parameters
iteratorElementFeature to be insertet into NN data structure

Definition at line 769 of file bico.h.

template<typename T >
void CluE::Bico< T >::initializeNN ( )
private

Initialize nearest neighbour data structure.

Definition at line 574 of file bico.h.

template<typename T >
void CluE::Bico< T >::allocateBucket ( int  bucket,
bool  left 
)
private

Allocates a new bucket.

Parameters
bucketNumber of projection
leftPush front bucket (instead of push back)

Definition at line 602 of file bico.h.

template<typename T >
int CluE::Bico< T >::calcBucketNumber ( int  rnd,
double  val 
)
private

Calculates the bucket number for a given value

Parameters
rndNumber of projections
valValue
Returns
Bucket number

Definition at line 569 of file bico.h.

template<typename T >
double CluE::Bico< T >::project ( point,
int  i 
)
private

Projects a point onto a projection line.

Parameters
pointPoint
iNumber of projection line
Returns
Projected point

Definition at line 632 of file bico.h.

template<typename T >
void CluE::Bico< T >::rebuild ( )
private

Rebuilds the tree.

Definition at line 849 of file bico.h.

template<typename T >
void CluE::Bico< T >::rebuildFirstLevel ( BicoNode parent,
BicoNode child 
)
private

Rebuilds the first level

Parameters
parentNew root
childOld root

Definition at line 862 of file bico.h.

template<typename T >
void CluE::Bico< T >::rebuildTraversMerge ( BicoNode node,
int  level 
)
private

Recursive rebuilding of the tree

Parameters
nodeSome node to be rebuilded
levelLevel of this node

Definition at line 911 of file bico.h.

template<typename T >
void CluE::Bico< T >::computeTraverse ( BicoNode node,
ProxySolution< T > *  solution 
)
private

Recursive computation of the coreset.

Parameters
nodeSome node to be processed
solutionProxySolution containing the coreset

Definition at line 658 of file bico.h.

template<typename T >
double CluE::Bico< T >::getT ( int  level)
private

Returns the threshold for a given level.

Parameters
levelLevel
Returns
Threshold at this level

Definition at line 947 of file bico.h.

template<typename T >
double CluE::Bico< T >::getR ( int  level)
private

Returns the radius for a given level.

Parameters
levelLevel
Returns
Radius at this level

Definition at line 952 of file bico.h.

template<typename T >
void CluE::Bico< T >::print ( std::ostream &  os,
BicoNode node 
)
private

Writes a BicoNode as GraphViz source into a stream

Parameters
osOutput stream
nodeSome BicoNode

Definition at line 965 of file bico.h.

Member Data Documentation

template<typename T>
size_t CluE::Bico< T >::k
private

Number of centers.

Definition at line 414 of file bico.h.

template<typename T>
size_t CluE::Bico< T >::L
private

Number of projections.

Definition at line 418 of file bico.h.

template<typename T>
std::vector<std::vector<double > > CluE::Bico< T >::rndprojections
private

Random projection vectors.

Definition at line 423 of file bico.h.

template<typename T>
std::vector<std::vector<std::vector<typename BicoNode::FeatureList::iterator > > > CluE::Bico< T >::buckets
private

Buckets for nearest neighbour search in first level.

Definition at line 428 of file bico.h.

template<typename T>
std::vector<std::pair<double, double > > CluE::Bico< T >::borders
private

Bucket borders.

Definition at line 432 of file bico.h.

template<typename T>
std::vector<double> CluE::Bico< T >::bucket_radius
private

Width of buckets.

Definition at line 436 of file bico.h.

template<typename T>
int CluE::Bico< T >::nodeIdCounter
private

Counter for unique BicoNode object ids.

Definition at line 441 of file bico.h.

template<typename T>
std::unique_ptr<DissimilarityMeasure<T > > CluE::Bico< T >::measure
private

Buffer for DissimilarityMeasure.

Definition at line 446 of file bico.h.

template<typename T>
std::unique_ptr<WeightModifier<T > > CluE::Bico< T >::weightModifier
private

Buffer for WeightModifier.

Definition at line 451 of file bico.h.

template<typename T>
size_t CluE::Bico< T >::maxNumOfCFs
private

Maximum coreset size.

Definition at line 456 of file bico.h.

template<typename T>
size_t CluE::Bico< T >::curNumOfCFs
private

Current coreset size.

Definition at line 461 of file bico.h.

template<typename T>
size_t CluE::Bico< T >::dimension
private

Dimension of the input points.

Definition at line 466 of file bico.h.

template<typename T>
double CluE::Bico< T >::optEst
private

Current estimation of the optimal clustering cost.

Definition at line 471 of file bico.h.

template<typename T>
std::vector<double> CluE::Bico< T >::maxVal
private

Extreme values used for constructing the nearest neighbour buckets.

Definition at line 476 of file bico.h.

template<typename T>
BicoNode* CluE::Bico< T >::root
private

Root node of BICO's tree.

Definition at line 481 of file bico.h.

template<typename T>
bool CluE::Bico< T >::bufferPhase
private

Buffer phase indicator.

Definition at line 486 of file bico.h.

template<typename T>
std::vector<T> CluE::Bico< T >::buffer
private

Buffer phase's buffer.

Definition at line 491 of file bico.h.

template<typename T>
std::vector<std::pair<double, T const *> > CluE::Bico< T >::projection_buffer
private

Buffer phase's buffer for projected buffer points.

Definition at line 496 of file bico.h.

template<typename T>
double CluE::Bico< T >::minDist
private

Minimum pair distance of two points read in buffer phase.

Definition at line 501 of file bico.h.

template<typename T>
size_t CluE::Bico< T >::pairwise_different
private

Number of unique elements read in buffer phase.

Definition at line 506 of file bico.h.

template<typename T>
int CluE::Bico< T >::numOfRebuilds
private

Current number of rebuilding.

Definition at line 511 of file bico.h.


The documentation for this class was generated from the following file: