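#title(Japan Society of Information and Knowledge (情報知識学会), FY1999 7th Research Report Meeting (Saturday, May 22, 1999))
*Japan Society of Information and Knowledge (情報知識学会), FY1999 7th Research Report Meeting (Saturday, May 22, 1999) [#u6efab3d]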



**NON-SEMANTIC ATTRIBUTES OF DOCUMENTS AND THEIR IMPLICATION TO IR [#tbb5bc30]
'''Yongli Zou, Yoshihiro Sagara'''~
'''Keio University'''~


Indexing and searching mechanisms based mainly on the semantic attributes of documents have shown certain limitations, while the function of non-semantic attributes and their implications for information retrieval and system design have not yet been fully explored. By summarizing the non-semantic characteristics of documents and analyzing them in terms of information needs, information seeking, and information use, the authors attempt to explore their potential for information retrieval and information system design.



**When will we stop search sessions? [#k8ab0e3a]
'''Yoshihiro Sagara'''~
'''Keio University'''~


In conventional information retrieval research, retrieval systems have been evaluated mainly on the basis of the precision and recall of the final search result. However, modern retrieval systems are designed to provide results through interaction with users. A user's decision to stop a search is therefore influenced by the interactions up to that point. A questionnaire survey was conducted to clarify the factors that affect the decision to stop a search. When users stop a search unsatisfied, many non-essential reasons are given for abandoning further searching. The implications of search stopping for user behavior studies are emphasized.
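
As a minimal sketch of the conventional evaluation this abstract refers to, precision and recall of a single final result set can be computed as below. The function name `precision_recall` and the document IDs are illustrative assumptions, not taken from the paper.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one final result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 documents actually relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
```

Both measures describe only the final result set, so they say nothing about the interactions that led the user to stop searching.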




**Document Navigation System using Text Mining Algorithms
'''Hiroyuki KAWANO, Minoru KAWAHARA'''~
'''Kyoto University'''~


Many knowledge discovery tools have been developed using data mining, which integrates technologies from machine learning, databases, statistics, and other fields. We have been constructing the "mondou" search systems based on extended association rules. In this paper, we discuss experimental results of text mining applied to Web hypertexts, the INSPEC database, and the index data of magazines and articles in the National Diet Library. First, we describe efficient strategies for deriving association rules. Next, we discuss the relation between time threshold values and association rules, focusing on ROC (Receiver Operating Characteristic) graphs as a technique for evaluating the characteristics of the derived rules. Using the ROC convex hull method, we can estimate appropriate threshold values for deriving association rules for keywords.
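
The idea of choosing thresholds from an ROC convex hull can be sketched as follows. This is a generic illustration, not the authors' procedure: the threshold values and their (false positive rate, true positive rate) points are hypothetical, and `roc_convex_hull` is a name introduced here.

```python
def _cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points):
    """Upper convex hull of ROC points, always including (0, 0) and (1, 1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Drop the last hull point while it lies on or below the line to p.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


# Hypothetical thresholds mapped to (false positive rate, true positive rate).
candidates = {0.9: (0.1, 0.5), 0.7: (0.3, 0.6), 0.5: (0.4, 0.9), 0.3: (0.6, 0.7)}
hull = roc_convex_hull(candidates.values())
# Only thresholds whose points lie on the hull can be optimal for some cost trade-off.
on_hull = [t for t, pt in candidates.items() if pt in hull]
```

Thresholds whose points fall below the hull are dominated by others, which is why the hull narrows the set of candidate threshold values.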





**Trends in Digital Contents Services and Evolution of Informatics
'''Masamitsu NEGISHI'''~


Since around 1996-1997, the development of the Internet has opened a new post-Web or post-search-engine epoch. "Portal sites" are now often referred to, in place of search engines, as the main entrance to the Internet, where various types of digital content services are provided. The first half of the paper gives an overview of these services, including issues of digital libraries, electronic publication, copyright, and MP3-formatted music. The trend seems to encourage authors to self-publish or to market their works directly. Although informatics is expected to offer an effective view of the future development of the informatized society, the analyses so far appear to have been superficial. What is most needed at present is to establish informatics as a research area, including its methodology.





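**An Attempt to Measure the "Softness" of Knowledge and Its Significance (知識の"柔らかさ"計量の試みとその意義)
'''藤原 鎮男'''~
'''Kanagawa University'''~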


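Knowledge is the product of human intellectual activity and the substance of culture and science. This report attempts to capture the characteristics of knowledge quantitatively. The attempt rests on the recognition that, across the humanities and the natural sciences, "knowledge" consists of two aspects, the universal and the individual, and on the idea that these two aspects can be identified from the pattern of occurrence of vocabulary, the medium through which knowledge is expressed. An analysis of vocabulary occurrence frequencies in scientific and Japanese-literature materials showed concretely that high-frequency and low-frequency vocabulary correspond to these two aspects respectively. In particular, the group of words in the first several ranks from the top, and the group in the first several ranks counting up from the bottom, were found to stand in a strict exponential relation to rank. Because the relation is exponential, the top few words account for the majority of all vocabulary occurrences. That is, they correlate with the whole, so we consider that a document in which such vocabulary predominates can be defined as soft. In other words, universal and individual vocabulary can be distinguished by occurrence frequency, and the "softness" of knowledge can be measured by their relative ratio.

The above classification into universal and individual vocabulary results from treating the vocabulary as uniformly distributed. When the correlations among the vocabulary groups thus classified were examined further, second-order correlations were found: in chemical vocabulary they correspond to the characteristics of the specialized subfields, and in Japanese-literature materials to the characteristics of information expression. This indicates a higher-order structure of knowledge. That is, it suggests that vocabulary analysis extends to the formation of specialized subfields of knowledge and to grasping the expressive characteristics of knowledge.

Looking at the whole again, the correlation found first represents a stage in which correlation among components is weak, and each component is regarded as nearly individual and independent. Nevertheless, since correlation exists, the effect of an external force is absorbed and dispersed through the whole, and in that sense the system is "soft". To a first approximation, all components of the system are uniform and homogeneous. In the second stage, by contrast, the correlation among components has exceeded a certain limit, and a cluster structure (groups, or one might say specialties) emerges in "knowledge". This "recognition of structure in knowledge" is, viewed broadly, also found elsewhere as a basic phenomenon of the natural world. Thus the measurement of the "softness of knowledge" by correlation described here is considered to be an approach to a "natural principle" running through the humanities and the natural sciences. We wish to work on placing this on a more general analytical and computational footing as a future task for information and knowledge science.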
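
The rank-frequency analysis in this abstract can be illustrated with a small sketch. It assumes a tokenized text and checks for an exponential relation by fitting a straight line to the logarithm of frequency against rank. The function names, the toy corpus, and the use of the top-k share as a rough indicator are assumptions made here for illustration and are not the author's actual procedure.

```python
import math
from collections import Counter


def rank_frequency_fit(tokens, k=5):
    """Least-squares fit of log(frequency) = intercept + slope * rank over the top-k words.

    An exponential rank-frequency relation appears as a straight line in this fit.
    Requires at least two distinct words.
    """
    counts = [c for _, c in Counter(tokens).most_common(k)]
    ranks = list(range(1, len(counts) + 1))
    logs = [math.log(c) for c in counts]
    n = len(counts)
    mean_r = sum(ranks) / n
    mean_l = sum(logs) / n
    sxx = sum((r - mean_r) ** 2 for r in ranks)
    sxy = sum((r - mean_r) * (l - mean_l) for r, l in zip(ranks, logs))
    slope = sxy / sxx
    return slope, mean_l - slope * mean_r


def top_share(tokens, k):
    """Fraction of all occurrences covered by the k most frequent words."""
    counts = Counter(tokens)
    return sum(c for _, c in counts.most_common(k)) / sum(counts.values())


# Toy corpus (hypothetical): frequencies 16, 8, 4, 2, 1 halve at each rank.
tokens = ["a"] * 16 + ["b"] * 8 + ["c"] * 4 + ["d"] * 2 + ["e"]
slope, intercept = rank_frequency_fit(tokens)
share = top_share(tokens, 2)
```

In the toy corpus the two most frequent words already cover more than half of all occurrences, which mirrors the abstract's remark that the top few words account for the majority of occurrences under an exponential relation.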





**Engaging Plan for Neuroinformatics in RIKEN BSI
'''Shun-ichi Amari, Kazunori Nakabayashi'''~
'''RIKEN Brain Science Institute'''~


Understanding the structure, function, and development of the brain is a main theme of the 21st century. In Europe and the U.S.A., "the Decade of the Brain" is planned and related research activities are being promoted. In Japan as well, a framework has been formed, and RIKEN BSI (Brain Science Institute) was established in 1997 as part of those activities. While neuroscience research is being actively pursued in Japan, the U.S.A., and Europe, the OECD Megascience Forum, which promotes international cooperation in megascience, has established a Neuroinformatics Subgroup. The subgroup discusses the importance of integrating neuroscience research with information technology, the necessity of international cooperation in neuroscience research, and its influence on the information industry and the medical field. This presentation introduces neuroinformatics: its definition, goals, measures for development, current movements, and the plans for engaging in it at RIKEN BSI.





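**A Tentative Proposal for Information and Knowledge Science: Opening Draft and Development Chapter, Why? & Then (情報知識学試案 起草・承章)
'''村上 茂三'''~
'''Shikan Daiichi Research Institute (止観第一研究所)'''~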


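I firmly believe that the "new discipline" of information and knowledge science can become "one guide" for making the near-future society, which is growing ever more complex and disorderly, a pleasant and bright one, and I have been reflecting on it with that conviction. Although I am aware that it is discourteous to do so while my thinking is only half complete, I have decided to report the broad outline of my wanderings, which have been so free that no destination is yet in sight. I examine each point from multiple perspectives, but because written and oral exposition is linear, and because of my poor writing skills, the text has turned out to be hard to follow. I apologize for this. Nevertheless, I submit it sincerely and ask for your understanding. I further ask my colleagues to examine it rigorously and to offer guidance, so that I may make my further examination more accurate.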





**A Multilingual Full-text Retrieval System for Tagged Documents
'''Tetsuo Sakaguchi, Shigetaka Nakao, Akira Maeda, Shigeo Sugimoto, Koichi Tabata'''~
'''University of Library and Information Science'''~


The Internet enables people to share documents written in various languages worldwide. Many documents on the Internet are provided through the WWW, and most of them are marked up with HTML tags. Tags that indicate document elements are very useful for full-text retrieval. The authors consider a full-text retrieval system for tagged multilingual documents very important for obtaining useful information. This article describes a multilingual full-text retrieval system for tagged documents. It has functions to store and retrieve SGML, XML, and HTML documents. The system handles both the ISO-2022-JP-2 and Unicode character code sets for multilingual texts. It is developed in Java for portability. This article also discusses the performance issues of the implemented system.





**An Implementation of a Regional IX and Related Network Applications
'''Shoryu ATAKA, et al.'''~
'''Toyama Prefectural University, et al.'''~


Through the implementation of a regional IX (Internet exchange), we plan to promote new infrastructure for the information and communication network in the Toyama region. The background of and the need for a regional IX are discussed in this paper. In addition, some ideas for related applications used inside this regional IX network are reported.





**The Browser for Technical Terms with Hierarchical Structure
'''Takanobu Gotoh, Yusuke Suzuki, Tomonori Gotoh'''~
'''Kanagawa University'''~


Thesauri have been widely used in bibliographic databases for 30 years. Recently, CD-ROMs of a variety of dictionaries with their own GUIs have spread among users. On the other hand, end users do not use thesauri because they are bulky printed matter. Software of the kind that graphically browses and manages thesauri has not yet appeared in the PC environment.
A browsing tool for lexical databases with hierarchical structure has been developed using Java. This paper describes its functions, its components, and examples of its usage. The problems of the browser and the functions to be added are discussed.





**Constructing a Glass Material Database Using Java Language
'''Tomoaki Saito, Hisashi Oguro, Takushi Fukami'''~
'''Toppan Printing Co., Ltd.'''~


With the expansion of the Internet, database and information sharing services have become popular network applications widely used by the public, and Java technology has made it possible to develop object-oriented programs for multi-platform environments. In view of this, we developed a database CD-ROM using a Java search engine that was originally developed for WWW-based services. This was done without major modifications to the programs.
We have confirmed that the same Java application can be shared among Internet users and non-Internet users. It is provided as a WWW-based service for Internet users and as CD-ROM content for non-Internet users.





**Development and Publication of Geological Information at the Geological Survey of Japan
'''Isao HASEGAWA, Xinglin LEI'''~
'''Geological Survey of Japan'''~


The Geological Survey of Japan has published and digitized many kinds of geological maps. The digitized maps are processed on a GIS (Geographical Information System) for use in education, civil engineering, environmental problems, mitigation of geological hazards, and so on.
We developed GeomapZ, a simple GIS software package for viewing and analyzing geological data. GeomapZ can read data from DLG-formatted vector data files, DEM-formatted elevation data files, raster image data files in BMP/TIF, and user data in text format. It is easy to create and print high-quality geological images using GeomapZ. GeomapZ is a suitable and easy-to-use viewer, particularly for publishing geological data on CD-ROM.





**Specialized Vocabulary in Fine Arts Magazines (Vocabulaire spécialisé dans les revues des beaux-arts)
'''TSUJI Hiroko'''~
'''Toyo University'''~


We propose to analyze the specialized vocabulary used in Japanese fine arts magazines intended for the general public. The fine arts vocabulary in use today rests essentially on the translation of Western concepts carried out in the Meiji era. It nevertheless continues to grow today, insofar as artistic information from abroad reaches Japan in real time, and part of this information introduces new vocabulary concerning trends, techniques, or aesthetics, which must be translated immediately for local dissemination.


It turns out that most fine arts terms are compound words. In fact, in the Japanese lexicon itself, compound words outnumber simple words. We ask what methods were followed to express Western concepts using the Sino-Japanese characters in use in Japan. Since frequency and productivity vary, neologisms of this type prove complex to interpret. In examining this question, we consider the problem of conceptual translation as it relates to the system of Sino-Japanese characters and that of katakana, in order to identify their semantic correlation.


Through various linguistic phenomena, we have shown that the specialized fine arts vocabularies in kango and in katakana have one point in common: both exhibit an incoherent structure of compound words.





**A Study of Page Ranking Factors for WWW Search Engines
'''Toshikazu Fukushima, Katsushi Matsuda, Hajime Takano'''~
'''NEC Human Media Research Laboratories'''~


This paper surveys page ranking factors used in current WWW search engines, such as (1) relevance to query keywords, (2) freshness, (3) popularity, (4) citation rank, and (5) page types. Relevance to query keywords has been studied in traditional information retrieval research. However, the other factors have been introduced into WWW search engines to improve their ranking performance, because WWW contents are heterogeneous, changeable, large-scale hypermedia. Freshness, popularity, and citation rank are factors introduced from the viewpoint of content reliability. On the other hand, relevance to query keywords and page types are factors corresponding to the user's domain and task in problem solving. The selection and combination of these factors must be refined to satisfy users' information needs.
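
One simple way to picture "combination of these factors" is a weighted sum over the five factors listed in the abstract. The abstract does not say how any search engine actually combines them, so the linear form, the weights, the page names, and the factor values below are all hypothetical.

```python
FACTORS = ("relevance", "freshness", "popularity", "citation_rank", "page_type")


def combined_score(page, weights):
    """Weighted sum of the five factor values (each assumed to lie in [0, 1])."""
    return sum(weights[f] * page[f] for f in FACTORS)


def rank_pages(pages, weights):
    """Return page names ordered from highest to lowest combined score."""
    return sorted(pages, key=lambda name: combined_score(pages[name], weights), reverse=True)


# Hypothetical weights and factor values, chosen only for illustration.
weights = {"relevance": 0.5, "freshness": 0.1, "popularity": 0.2,
           "citation_rank": 0.1, "page_type": 0.1}
pages = {
    "page_a": {"relevance": 0.9, "freshness": 0.2, "popularity": 0.3,
               "citation_rank": 0.1, "page_type": 1.0},
    "page_b": {"relevance": 0.4, "freshness": 0.9, "popularity": 0.9,
               "citation_rank": 0.8, "page_type": 0.0},
    "page_c": {"relevance": 0.2, "freshness": 0.1, "popularity": 0.1,
               "citation_rank": 0.1, "page_type": 0.0},
}
ranking = rank_pages(pages, weights)
```

Changing the weights changes the ordering, which is one concrete reading of the abstract's point that the selection and combination of factors needs refinement.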





**Knowledge Extraction from Technical Papers of Metallurgy
'''Chieko Nakabasami, Kenichi Hoshimoto'''~
'''Toyo University'''~


We claim that semantic representations of technical papers need to describe a word's meaning from various points of view and to emphasize the semantic aspect of words. In this paper, we describe semantic representations of technical papers based on Pustejovsky's Generative Lexicon (GL). Our focus is on technical papers concerned with metallurgy, because this research is carried out in cooperation with an expert in metal engineering. Our purpose in making semantic representations is twofold: (1) to detect differences and inconsistencies among papers by applying appropriate mechanisms; and (2) to integrate the content of a new paper into the content of an existing one. The semantic representations based on the generative lexicon are modified so that they match the characteristics of the papers. In addition, we propose operations that work effectively on the semantic representation. Modifying some of the rules included in the semantic representation of each paper makes it possible to analyze the semantic differences and similarities among the papers. The semantic representation is implemented using the object-oriented database management system MATISSE.





**Some Visualization Models for Conceptual Relations in Virtual Space
'''鈴木 祐介, 下村 央人, 後藤智範'''~
'''Kanagawa University'''~


GUIs have been widely used as computer user interfaces for about 10 years. In much GUI software, objects or components are arranged on a two-dimensional plane. Some Internet applications have appeared that arrange objects in a virtual three-dimensional space using VRML. This paper proposes several visualization models for conceptual relations in virtual three-dimensional space. Conceptual relations here means the state in which many terms (concepts) are connected in complicated ways by semantic relations such as tool and agent. VRML scripts were made experimentally using the EDR concept dictionary to examine their effectiveness and problems.





**Extraction of Semantic Relations between Terms (用語間の意味関係の抽出)
'''畑口 冬彦, 藤原 譲'''~
'''Kanagawa University'''~


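Even now, as informatization advances rapidly and diverse, voluminous information circulates over wide areas, information processing functions still center on numerical computation and symbol matching, that is, retrieval, computation, and inference. However, the need for advanced functions that deal with the content of information has also come to be strongly recognized. We therefore treat technical terms as expressions of concepts, clarify methods for describing, representing, understanding, generating, and processing their meaning, and report an attempt to extract semantic relations between terms, to construct structured knowledge from terms based on those relations, and to elucidate the mechanisms of semantic understanding, learning, and thinking.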


