500 results
  • Abstract: This article is about a piece of middleware that converts a dumb tape-based Tertiary Storage System into a multi-petabyte random-access device with thousands of channels. Using typical caching mechanisms, the software optimizes access to the underlying Storage System and makes better use of possibly expensive drives and robots, or allows cheap and slow devices to be integrated without introducing unacceptable performance degradation. In addition, using the standard NFS2 protocol, the dCache provides a unique view into the storage repository, hiding the physical location of the file data, cached or tape only. Bulk data transfer is supported through the kerberized FTP protocol and a C API providing POSIX file access semantics. Dataset staging and disk space management are performed invisibly to the data clients. The project is a joint DESY and Fermilab effort to overcome limitations in the usage of tertiary storage resources common to many HEP labs. The distributed cache nodes may range from high-performance SGI machines to commodity CERN Linux-IDE-like file server models. Different cache nodes are assumed to have different affinities to particular storage groups or file sets. Affinities may be defined manually or calculated by the dCache based on topology considerations. Cache nodes may have different disk space management policies to match the large variety of applications, from raw data to user analysis data pools.

  • Tags: dCache, cache memory, data storage system
  • Abstract: Most of the earlier work on clustering focused mainly on numeric data, whose inherent geometric properties can be exploited to naturally define distance functions between data points. However, data mining applications frequently involve data sets that consist of mixed numeric and categorical attributes. In this paper we present a clustering algorithm based on the k-means algorithm. The algorithm clusters objects with numeric and categorical attributes in a way similar to k-means. The object similarity measure is derived from both numeric and categorical attributes. When applied to numeric data, the algorithm is identical to k-means. The main result of this paper is a method to update the 'cluster centers' of objects described by mixed numeric and categorical attributes during the clustering process so as to minimise the clustering cost function. The clustering performance of the algorithm is demonstrated on two well-known data sets, namely the credit approval and abalone databases.

  • Tags: data mining, numeric data, categorical data, clustering algorithm, database, data set
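
The mixed-attribute clustering idea in the entry above can be illustrated with a small sketch. The abstract does not give the exact similarity measure or update rule, so everything here is an assumption: the dissimilarity is squared Euclidean distance on the numeric part plus a weighted mismatch count on the categorical part, the center update uses the mean and the mode, and the names `mixed_dissimilarity`, `update_center`, and `gamma` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def mixed_dissimilarity(
    num_a: Sequence[float],
    num_b: Sequence[float],
    cat_a: Sequence[str],
    cat_b: Sequence[str],
    gamma: float = 1.0,
) -> float:
    """Assumed measure: squared Euclidean distance on the numeric attributes
    plus gamma times the number of mismatching categorical attributes."""
    numeric = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    categorical = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numeric + gamma * categorical


def update_center(
    nums: Sequence[Sequence[float]],
    cats: Sequence[Sequence[str]],
) -> Tuple[List[float], List[str]]:
    """Assumed update rule: per-attribute mean for numeric columns and
    per-attribute most frequent value (mode) for categorical columns."""
    n = len(nums)
    mean = [sum(col) / n for col in zip(*nums)]
    mode = [Counter(col).most_common(1)[0][0] for col in zip(*cats)]
    return mean, mode
```

With no categorical attributes the measure reduces to the squared Euclidean distance used by k-means, which matches the abstract's remark that the algorithm is identical to k-means on purely numeric data.
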
  • Abstract: Network traffic classification aims at identifying the application types of network packets. It is important for Internet service providers (ISPs) to manage bandwidth resources and ensure the quality of service for different network applications. However, most classification techniques using machine learning focus only on high flow accuracy and ignore byte accuracy. Such a classifier performs poorly on elephant flows because of the imbalance between elephant flows and mice flows on the Internet. The elephant flows, however, consume much more bandwidth than mice flows. When the classifier is deployed for traffic policing, the network management system cannot penalize elephant flows and avoid network congestion effectively. This article first explores the factors related to low byte accuracy and then presents a new traffic classification method that improves byte accuracy with the aid of data cleaning. Experiments are carried out on three groups of real-world traffic data sets, and the method is compared with existing work on improving byte accuracy. The experiments show that byte accuracy increased by about 22.31% on average. The method outperforms the existing one in most cases.

  • Tags: classification method, network traffic, data cleaning, network management system, high flow, service provider
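
The gap between flow accuracy and byte accuracy described in the entry above can be made concrete with a short sketch. It assumes the usual definitions (fraction of correctly classified flows versus fraction of bytes carried by correctly classified flows); the function name and the `(true_label, predicted_label, byte_count)` record layout are hypothetical and not taken from the article.

```python
from typing import Iterable, Tuple

Flow = Tuple[str, str, int]  # (true_label, predicted_label, byte_count)


def flow_and_byte_accuracy(flows: Iterable[Flow]) -> Tuple[float, float]:
    """Return (flow accuracy, byte accuracy) for a set of classified flows."""
    records = list(flows)
    total_bytes = sum(size for _, _, size in records)
    if not records or total_bytes == 0:
        raise ValueError("need at least one flow with a non-zero byte count")
    correct_flows = sum(1 for true, pred, _ in records if true == pred)
    correct_bytes = sum(size for true, pred, size in records if true == pred)
    return correct_flows / len(records), correct_bytes / total_bytes


# Nine small "mice" flows classified correctly, one large "elephant" flow missed.
example = [("web", "web", 1_000)] * 9 + [("video", "web", 991_000)]
flow_acc, byte_acc = flow_and_byte_accuracy(example)
```

In this example the classifier reaches 90% flow accuracy but only 0.9% byte accuracy, because the single misclassified elephant flow carries almost all of the traffic.
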
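  • Abstract: In the field of lossless compression, most types of traditional software have some shortcomings when faced with mass data. Their compression ability is limited by the data window size and by the design of the compression format. This paper presents a new compression format design, named the CZ format, which supports data window sizes of up to 4 GB and has some advantages in mass data compression. Using this format, a compression shareware named ComZip has been designed. Experimental results show that ComZip achieves a better compression ratio than WinZip, Bzip2, and WinRAR in most cases, especially when gigabytes or terabytes of mass data are compressed. ComZip also has the potential to beat 7-zip in the future, once the data window size exceeds 128 MB.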

  • Tags: data compression format, design, data window, WINZIP, WINRAR, lossless compression
  • Abstract: It is known that conditional independence is a quite basic assumption in many fields of statistics. How to test its validity is of great importance and has been extensively studied in the literature. Nevertheless, all of the existing methods focus on the case in which the data are fully observed, and none of them seems to have taken into account the scenario in which missing data are present. Motivated by this, this paper develops two test statistics to handle such a situation, relying on the ideas of inverse probability weighting and augmented inverse probability weighting. The asymptotic distributions of the proposed statistics are also derived under the null hypothesis. The simulation studies indicate that both test statistics perform well in terms of size and power.

  • Tags: CONDITIONAL INDEPENDENCE, CUMULATIVE SUM, process of
  • Abstract: The ALICE detector at the LHC (CERN) will record raw data at a rate of 1.2 gigabytes per second. Trying to analyse all this data at CERN will not be feasible. As originally proposed by the MONARC project, data collected at CERN will be transferred to remote centres to use their computing infrastructure. The remote centres will reconstruct and analyse the events and make the results available. High-rate data transfer between computing centres (Tiers) will therefore become of paramount importance. This paper presents several tests that have been made between CERN and remote centres in Padova (Italy), Torino (Italy), Catania (Italy), Lyon (France), Ohio (United States), Warsaw (Poland) and Calcutta (India). These tests consisted, in a first stage, of sending raw data from CERN to the remote centres and back, using an FTP method that allows several streams to be connected at the same time. Thanks to these multiple streams, it is possible to increase the rate at which the data is transferred. While several "multiple stream FTP solutions" already exist, our method is based on a parallel socket implementation which allows not only files but also objects (or any large message) to be sent in parallel. A prototype able to manage different transfers will be presented. This is the first step of a system to be implemented that will be able to take care of the connections with the remote centres to exchange data and monitor the status of the transfer.

  • Tags: programming, data transfer program, transfer monitoring
  • Abstract: Distribution, interoperability, interactivity and componentization are four main features of distributed GIS. Based on the principles of hypermap, hypermedia and distributed databases, the paper proposes a distributed spatial data model that accords with those features of distributed GIS. The model takes the catalog service as the outline of spatial information globalization and defines the data structure of hypermap nodes at different levels. Based on the model, it is feasible to manage and process distributed spatial information and to integrate multi-source, heterogeneous spatial data into one framework. Traditionally, spatial data could be retrieved and accessed via the Internet only by theme or map name. With the concept of the model, it is possible to retrieve, load, and link spatial data through vector-based graphics on the Internet.

  • Tags: King George Island, distribution, GIS, hypermap, data model, interoperability
  • Abstract: Timely and cost-efficient multi-hop data delivery among vehicles is essential for vehicular ad-hoc networks (VANETs), and various routing protocols are envisioned for infrastructure-less vehicle-to-vehicle (V2V) communications. Generally, when a packet (or a duplicate) is delivered out of the routing path, it is dropped. However, we observe that these packets (or duplicates) may also be delivered much faster than the packets delivered along the original routing path. In this paper, we propose a novel tree-based routing scheme (TBRS) for utilizing the dropped packets in VANETs. In TBRS, the packet is delivered along a routing tree with the destination as its root. When the packet is delivered out of its routing tree, it is not dropped immediately but continues to be delivered for a while if it can arrive at another branch of the tree. We conduct extensive simulations to evaluate the performance of TBRS based on the road map of a real city collected from Google Earth. The simulation results show that TBRS can outperform the existing protocols, especially when the network resources are limited.

  • Tags: packet transmission, data transmission, number, node, ad-hoc network, routing protocol
  • Abstract: ATLAS [1] has recently joined Gaudi, an open project to develop a data processing framework for HEP experiments [2]. The data model is one of the areas where ATLAS has most extended the original Gaudi design to meet the experiment's own requirements. This paper describes StoreGate, the first implementation of the ATLAS Data Model.

  • Tags: high energy physics, data model, ATLAS software
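  • Abstract: A high compression ratio, high decoding performance, and progressive data transmission are the most important requirements for vector data compression algorithms for WebGIS. To meet these requirements, we present a new compression approach. This paper begins with the generation of multi-scale data by converting floating-point coordinates into integer coordinates. It is proved that the distance on screen between the transformed points and the original points is within 2 pixels; our approach is therefore suitable for the visualization of vector data on the client side. The integer coordinates are passed to an integer wavelet transformer, and the high-frequency coefficients produced by the transformer are encoded with canonical Huffman codes. Experimental results on river data and road data demonstrate the effectiveness of the proposed approach: the compression ratio can reach 10% for river data and 20% for road data, respectively. We conclude that more attention needs to be paid to the correlation between curves that contain a few points.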

  • Tags: vector data compression, WebGIS, progressive data transmission
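
The first two stages named in the entry above (floating-point to integer coordinates, then an integer wavelet transform) might look roughly like the sketch below. The abstract does not say which wavelet or quantization rule is used, so this assumes a one-level integer Haar transform in lifting form and plain rounding to a grid; `quantize`, `haar_forward`, `haar_inverse`, and `resolution` are hypothetical names, and the canonical Huffman coding stage is omitted.

```python
from typing import List, Sequence, Tuple


def quantize(coords: Sequence[float], resolution: float) -> List[int]:
    """Map floating-point coordinates to integer grid indices (assumed rule)."""
    return [round(c / resolution) for c in coords]


def haar_forward(values: Sequence[int]) -> Tuple[List[int], List[int]]:
    """One level of an integer Haar transform using lifting steps.

    Returns (low-frequency approximations, high-frequency details)."""
    if len(values) % 2:
        raise ValueError("expects an even number of samples")
    approx, detail = [], []
    for a, b in zip(values[0::2], values[1::2]):
        d = b - a                    # high-frequency coefficient
        approx.append(a + (d >> 1))  # floor of the pair average
        detail.append(d)
    return approx, detail


def haar_inverse(approx: Sequence[int], detail: Sequence[int]) -> List[int]:
    """Exactly undo haar_forward, recovering the integer samples."""
    out: List[int] = []
    for s, d in zip(approx, detail):
        a = s - (d >> 1)
        out.extend((a, a + d))
    return out
```

Because every step works on integers, the transform is exactly invertible, and the detail coefficients are the values that would be handed to an entropy coder in a pipeline of this kind.
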
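  • Abstract: Only the observable part of the real world can be stored in data. For such incomplete and ill-structured data, data crystallization aims at presenting the hidden structure among events, including unobservable events. This is realized by inserting dummy items, which correspond to the potential existence of unobservable events, into the given data. These dummy items and their relations with observable events are visualized by applying KeyGraph to the data with the dummy items, much as snow crystallizes when dust takes part in the crystallization of water molecules. To tune the granularity level of the structure to be visualized, the data crystallization tool is integrated with the human process of understanding significant scenarios in the real world. This basic method is expected to be applicable to the various real-world domains in which previous chance discovery methods have led people to successful decision making. In this paper, we apply data crystallization with human-interactive annealing (DCHA) to product design in a real company. The results show its effect on industrial decision making.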

  • Tags: new product, design, human-computer interaction, data crystallization
  • Abstract: This research takes the view that the modelling of temporal data is a fundamental step towards capturing the semantics of time. The problems inherent in the modelling of time are not unique to database processing. The representation of temporal knowledge and temporal reasoning arises in a wide range of other disciplines. In this paper an account is given of a technique for modelling the semantics of temporal data and its associated normalisation method. It discusses techniques for processing temporal data by employing a Time Sequence (TS) data model. It shows a number of different strategies that are used to classify different properties of temporal data, and it goes on to develop the model of temporal data and to address issues of temporal data application design by introducing the concept of temporal data normalisation.

  • Tags: relational database, data storage, time sequence model
  • Abstract: Geographical Information Systems (GIS) are widely used in many fields. With the rapid development of computer networks, GIS users care more about data sharing in networks. In a traditional relational database, data consistency is controlled by a consistency control mechanism: when a data object is locked in sharing mode, other transactions can only read it and cannot update it. This is appropriate for traditional relational databases, which store attribute data and mainly deal with short transactions. In spatial databases, because of the vast amount of data and the complex topological relations, long transactions are encountered frequently. If the traditional consistency control method is still used, the system's concurrency will be badly affected. Many new requirements therefore arise for consistency control in the field of GIS. There are many aspects of data consistency problems in spatial databases, such as the inconsistency between attribute and geometry data and the inconsistency of topological relations after geometry objects have been modified. In this paper, two other cases of data consistency are discussed for multi-user Geographical Information Systems. In GIS, there are many forms of data, such as geometry data, attribute data, image data, and DEM data. In this paper, we discuss only spatial geometry data.

  • Tags: DATA SHARING, DATA CONSISTENCY, lock, UNDO/REDO
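  • Abstract: The progressive transmission of map data over the World Wide Web provides users with a self-adaptive strategy for accessing remote data. It not only speeds up web transfer but also offers an efficient navigation guide for information acquisition. The key technology in this transmission is the efficient multiple representation of spatial data and its pre-organization on the server site. This paper aims to investigate progressive transmission under some constraints from three aspects: data organization on the server site, data control during the transmission process, and data recovery after arrival at the client. Two strategies, namely on-line map generalization and off-line map generalization, are examined separately for this kind of progressive transmission.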

  • Tags: progressive transmission, map generalization, spatial resolution
  • Abstract: ARGO-YBJ, a Chinese-Italian collaboration, is about to finish the first step of the installation of this cosmic ray telescope, which consists of a single layer of RPCs placed at an elevation of 4300 m in Tibet. The detector will provide a detailed space-time picture of the shower front initiated by primaries with energies in the range 10 GeV-500 TeV. Data taking will start at the beginning of 2002 with a fraction of the detector installed. It will be upgraded twice and completed at the end of 2003. In this paper we briefly describe the data flow, the trigger organization, the three operational steps in data taking and the computing model to process the data. The need for remote monitoring of the experiment is touched upon. The processing power for the raw data reconstruction and for the Monte Carlo simulation is reported.

  • Tags: ARGO experiment, data collection, data processing
  • Abstract: Receiver operating characteristic (ROC) curves are often used to study the two-sample problem in medical studies. However, most data in medical studies are censored. Usually a natural estimator is based on the Kaplan-Meier estimator. In this paper we propose a smoothed estimator based on kernel techniques for the ROC curve with censored data. The large-sample properties of the smoothed estimator are established. Moreover, deficiency is considered in order to compare the proposed smoothed estimator of the ROC curve with the empirical one based on the Kaplan-Meier estimator. It is shown that the smoothed estimator outperforms the direct empirical estimator based on the Kaplan-Meier estimator under the criterion of deficiency. A simulation study is also conducted and a real data set is analyzed.

  • Tags: ROC curve, kernel estimation, Kaplan-Meier estimation, smoothed estimation, large-sample properties, sample problem
  • Abstract: This paper presents a methodology to determine three data quality (DQ) risk characteristics: accuracy, comprehensiveness and nonmembership. The methodology provides a set of quantitative models to confirm the information quality risks for the database of the geographical information system (GIS). Four quantitative measures are introduced to examine how the quality risks of source information affect the quality of information outputs produced using the relational algebra operations Selection, Projection, and...
