http://www.niso.org/publications/press/UnderstandingMetadata.pdf

“Metadata is key to ensuring that resources will survive and continue to be accessible into the future.” This was the first time I had heard of metadata, so I researched it in more detail. This document covers metadata’s definition, schemes, functions, and features. We can understand metadata as data about data, or information about information. There are three main types: descriptive, structural, and administrative metadata. Descriptive metadata describes a resource for purposes such as discovery and identification. It can include elements such as unique identifiers (PURL, Handle), physical attributes (media, dimensions, condition), and bibliographic attributes (title, abstract, author, and keywords). Structural metadata facilitates navigation and presentation of electronic resources, for example through structuring tags such as title page, table of contents, or index. Administrative metadata provides information to help manage a resource, such as when and how it was created, its file type and other technical information, and who can access it. There are several subsets of administrative metadata, but the two best-known are rights management metadata, which deals with intellectual property rights, and preservation metadata, which contains the information needed to archive and preserve a resource. Examples of administrative metadata include technical data such as scanner type and model, resolution, and bit depth, as well as copyright data and records of preservation activities (refreshing cycles, migration, etc.). Describing a resource with metadata allows it to be understood by both humans and machines in ways that promote interoperability: the ability of multiple systems with different hardware and software platforms, data structures, and interfaces to exchange data with minimal loss of content and functionality.
Using defined metadata schemes, shared transfer protocols, and crosswalks between schemes, resources across the network can be searched more seamlessly.
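To make the three types and the idea of a crosswalk concrete, here is a minimal Python sketch. The record, its field names, and the mapping are hypothetical examples made up for illustration; only the target element names (title, creator, subject, identifier, date, format, rights) come from the Dublin Core element set. Fields with no counterpart in the target scheme are simply dropped, which is exactly the kind of loss a good crosswalk tries to minimize.

```python
# A hypothetical local record grouped by the three metadata types.
record = {
    "descriptive": {
        "title": "Sample Scanned Report",
        "author": "A. Author",
        "keywords": ["archives", "metadata"],
        "identifier": "example-id-001",  # stands in for a PURL or Handle
    },
    "structural": {
        "sections": ["title page", "table of contents", "chapter 1", "index"],
    },
    "administrative": {
        "created": "2016-02-10",
        "file_type": "image/tiff",
        "scanner_model": "example scanner",
        "resolution_dpi": 600,
        "bit_depth": 24,
        "rights": "All rights reserved",
    },
}

# Hypothetical crosswalk: (metadata type, local field) -> Dublin Core element.
CROSSWALK = {
    ("descriptive", "title"): "title",
    ("descriptive", "author"): "creator",
    ("descriptive", "keywords"): "subject",
    ("descriptive", "identifier"): "identifier",
    ("administrative", "created"): "date",
    ("administrative", "file_type"): "format",
    ("administrative", "rights"): "rights",
}


def crosswalk_to_dublin_core(rec: dict) -> dict:
    """Copy every mapped field into a flat Dublin Core-style dict.

    Fields without a mapping (structure, scanner model, resolution,
    bit depth) are left out, so some information is lost in transfer.
    """
    dc = {}
    for (md_type, field), dc_element in CROSSWALK.items():
        value = rec.get(md_type, {}).get(field)
        if value is not None:
            dc[dc_element] = value
    return dc


print(crosswalk_to_dublin_core(record))
```

A second system that only understands Dublin Core could now search this record by title or creator, even though it never saw the local field names.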

A little off topic this week! When I read the article about Google Books, I wondered whether there is an archive of all Google doodles. And here it is: http://www.google.com/doodles#archive The page explains the when, how, who, why, and what of Google doodles. The doodles are organized by date, from 2000 (the starting date) to now. For each doodle, the page gives the logo design, a description of the holiday, the names of the doodlers (if applicable), a map showing where the doodle appeared, and the design from the same date in the previous year. It is interesting to learn that some doodles appear only in particular countries. I found the search bar inefficient because it requires exact words. For example, to find the doodle “Teacher’s Day (Vietnam),” I had to type exactly “Teacher’s Day Vietnam”; similar phrases such as “Teacher’s Day Viet” did not bring it up. The map is very interesting: when you click on a country, it brings you to all designs from or for that country. However, the designs are not in any order, so it is inconvenient to examine them. This is just some info for anyone who loves Google doodles as much as I do.
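We can only guess how the doodle archive’s search actually worked. One behavior consistent with what I saw is whole-word matching after punctuation is stripped: “Teacher’s Day Vietnam” matches the title “Teacher’s Day (Vietnam),” but “Viet” is not a complete word of the title, so the shorter query fails. This hypothetical Python sketch contrasts that with prefix matching, which would have accepted “Viet”; the function names are illustrative and have nothing to do with Google’s code.

```python
import re


def tokens(text: str) -> list[str]:
    """Lowercase the text and split it into words, dropping punctuation like ( )."""
    return re.findall(r"[a-z0-9']+", text.lower())


def whole_word_match(query: str, title: str) -> bool:
    """True only if every query word appears as a complete word in the title."""
    title_words = set(tokens(title))
    return all(word in title_words for word in tokens(query))


def prefix_match(query: str, title: str) -> bool:
    """True if every query word is the beginning of some word in the title."""
    title_words = tokens(title)
    return all(any(t.startswith(word) for t in title_words) for word in tokens(query))


title = "Teacher's Day (Vietnam)"
print(whole_word_match("Teacher's Day Vietnam", title))  # True
print(whole_word_match("Teacher's Day Viet", title))     # False
print(prefix_match("Teacher's Day Viet", title))         # True
```

Prefix matching is more forgiving, but it can also return more irrelevant results, since “Viet” would match any title containing a word that starts with those letters.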

https://medium.com/@johanoomen/soima-turing-vision-into-reality-and-positive-change-fc2388ea953f#.7587qolui

How people around the world plan to archive different media has recently caught my attention. So many countries and cultures chronicle their content in different ways that no single system or data structure could contain all of the materials. This article looks into how the world’s leading archivists plan to chronicle audiovisual materials around the world. Although archivists have little say in which technologies become popular in a given era, they are working toward agreement on the formats in which to store them in the coming years.

At the conference, they were able to solidify standards for the next ten years. However, even within those ten years, I wonder how much adaptation will be needed for the new media types that are becoming available. With virtual reality taking off, and high-fidelity video and audio gaining traction, the size of the data keeps growing. The amount of thought required on the data management, legal, and logistics fronts to make this common around the world is a huge undertaking for these leaders.

Housing digital data is significantly different from housing physical media. Because digital content is so easy to access, copyright regulations have created problems for archivists, and SOIMA is leading initiatives to facilitate this process for archival purposes. The redundancy that SOIMA insists on is another important difference, since it cannot be achieved with physical materials. Digital content can be lost in a single system failure, so storing it in multiple locations keeps the data safe while also allowing more countries to collaborate. I believe making this a global initiative allows more countries to share their archives under one standard, which is incredibly useful for scholars and scavengers looking for information.
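Redundancy only protects data if damaged copies can be detected and replaced. A common digital-preservation practice for this is fixity checking: record a checksum when a file is ingested, then periodically compare every stored copy against it. This Python sketch is a minimal illustration under that assumption, not SOIMA’s actual procedure; the location names and the `audit_and_repair` function are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Fixity value: the SHA-256 digest of the file contents, as hex."""
    return hashlib.sha256(data).hexdigest()


def audit_and_repair(copies: dict[str, bytes], expected: str) -> dict[str, bytes]:
    """Check every stored copy against the recorded checksum.

    Any copy whose checksum no longer matches is replaced with a copy
    that still matches. If no copy matches, the data cannot be recovered.
    """
    intact = [data for data in copies.values() if sha256_hex(data) == expected]
    if not intact:
        raise RuntimeError("no intact copy left to repair from")
    good = intact[0]
    return {
        location: data if sha256_hex(data) == expected else good
        for location, data in copies.items()
    }


# Hypothetical example: one file kept at three locations, one of them damaged.
original = b"digitized audiovisual item"
checksum = sha256_hex(original)  # recorded when the file was ingested
copies = {"site_a": original, "site_b": b"digitized audiovXsual item", "site_c": original}
repaired = audit_and_repair(copies, checksum)
print(all(data == original for data in repaired.values()))  # True
```

With a single copy there is nothing to repair from, which is why the number of independent locations matters.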

Although it is important and admirable to maintain good archival records, does it matter whether they are archived digitally or physically? I have always enjoyed looking at paper archives, but are they necessary?

Physical archives usually refer to paper documents. However, archiving important information such as forms, medical records, legal documents, customer files, or conference papers only on paper may not be a good idea. Paper is hard to keep safe over long periods, and moisture, mildew, mold, or improper handling can all ruin important data.

With the invention of the scanner, digital document archiving came into being. Scanners capture a digital image of a document, which can then be easily stored on a server or hard drive. The biggest advantage of digitally archiving documents is that your data can be protected against tampering. You can also save on space, since the documents take up only storage space rather than shelf space. By opting for digital archiving, you can eliminate the need to maintain bulky physical records.

I personally think that digital archiving is the way to go. Here are four benefits that you can leverage by outsourcing document archiving:

  • Get access to fast and accurate document archiving services at an affordable cost
  • Save on making investments in digital archiving technology or infrastructure
  • Focus on your core business, while staying assured that experts are archiving your data
  • Archive your information in any format of your choice

http://rkroundtable.org/2014/10/14/digital-archives-heres-the-problem-how-would-you-address-it/

This article addresses what we have been talking about the whole semester: the problem of digital archives. This group of expert archivists is trying to discover the best way to archive their collection, with accessibility as the key to the project. The problem lies in the word accessibility. If you can see a document in digital form but cannot examine the physical document and its features in person, have you really accessed it? I would say no. Often the importance of an archived item lies in the object itself. The content of the document is also important, but most of the time the content alone does not tell the whole story. As digital archives grow, you are often able to access only the content of a document, not the document itself.

To give an example of why this could be a problem, my sister, who is a grad student at Georgia State, was recently researching the impact the Civil War had on family dynamics in the South. During her research she contacted the Atlanta Public Library Archives in order to read various letters sent during this period. The office told her she could access these letters online through a digital source. However, when she did so, she found the letters hardly legible, since most were written in elaborate cursive. She contacted the archives department again to set up a time to view the letters in person; however, the department had no openings until after her research was due.

In this way, even though my sister was able to access the content of the archive, she was unable to use that content because it was so difficult to access in a legible form. There is no easy answer to the question of accessibility. As more archives become digital, we will face ever-shrinking access to original documents and be forced to rely on digital copies of them.

We as students may have taken history classes and studied the content that each generation preserves, but it is quite uncommon to think and learn about the medium of history. This is best observed in the effect that a particular medium has when used to preserve human history. For example, the GT Archives show that our alumni lacked the technology to capture video of students and teachers. In their time, documents were mostly handwritten, and simple black-and-white photography was the norm. This has shaped the way we view our past: we see much less of a technological impression of their society. We can see this even more drastically when looking further back in history. Our information on the Roman Empire and its people is limited by the books produced at the time. Given that there were no other media like photography, our image of the Roman people can be quite hollow, and we can’t get as accurate a picture of their society as we would like.

On the flip side, with technological advancements in documenting our daily lives, particularly video recording, we have been able, in a sense, to archive our current lives. Although this archival process is limited in that we do not place as much value on present history as on past history, perhaps our present recordings will serve as much-needed, advanced archival media for future generations. I find it pleasing to imagine a future where, with technologies far greater than ours, historians pull up our archaic video recordings on their 3D holograms and wonder how people were able to share experiences through a 2D view. I might liken this to our own experience at the GT Archives: browsing through old photos and handwritten letters in order to understand and appreciate the past. It makes you appreciate the current capabilities of technology and wonder about the possibilities of future archives.

http://www.careerempowering.com/resume-empower/creating-a-scannable-resume.html

While reading the article about how metadata can be inaccurately recorded, I thought about an info session for a job that I attended last semester. The representative for that company was telling us about their resume scanning software, and how important it was to include keywords on our resumes. We were told about how sometimes the software wouldn’t pick up on some words or would log them incorrectly. The website above also goes into similar concerns, and gives pointers on how to format a resume in order for the software to correctly find and log the desired data.
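Part of why such software misses words is that the simplest form of keyword scanning only finds the exact words it is told to look for. This Python sketch is a hypothetical illustration of that kind of matching, not the algorithm of any real resume scanner or of the website above; the keyword list and function names are made up. A resume that says “managed projects” instead of “project management,” or spells out “Structured Query Language” instead of “SQL,” gets no credit for those keywords.

```python
import re

# Hypothetical keywords an employer might configure.
KEYWORDS = ["python", "project management", "sql"]


def normalize(text: str) -> str:
    """Lowercase, keep only letters and digits, and pad with spaces.

    Padding lets us test for whole words with a plain substring check.
    """
    return " " + " ".join(re.findall(r"[a-z0-9]+", text.lower())) + " "


def find_keywords(resume_text: str, keywords: list[str]) -> list[str]:
    """Return the keywords that appear word-for-word in the resume."""
    haystack = normalize(resume_text)
    return [kw for kw in keywords if normalize(kw) in haystack]


resume_a = "Led project management for a Python and SQL reporting tool."
resume_b = "Managed projects; wrote Python scripts; used Structured Query Language."
print(find_keywords(resume_a, KEYWORDS))  # ['python', 'project management', 'sql']
print(find_keywords(resume_b, KEYWORDS))  # ['python']
```

Real systems may add synonyms or stemming; this sketch deliberately does not, to show the failure mode a human reader would never fall into.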

This made me think about how reliant we are becoming on software and computers for tasks that used to fall to humans. How can we be sure that a computer will pick up on something a human could easily judge? How often is the computer mistaken? I think that both the article and the website about formatting a resume for scanning software address this problem because they recognize the possibility of error. The book data was being logged incorrectly, with dates earlier than the author’s lifetime or information that couldn’t possibly be correct. A meticulous person performing the same task would likely not have made these mistakes when entering the information into a database, but relying on software or Google to keep track creates more opportunities for incorrect data. Is the time saved by using computers to log more information worth the risk of that information being incorrect or misleading? How can we improve this process on both counts? What negative repercussions could come of this, or how could we be negatively affected by incorrect data? Will there eventually be a time when we won’t even know what is incorrect?

Metadata is defined as data about data. In simpler terms, it can be described as the contextual information that explains the who, what, where, when, and why of the data. Connecting our discussions on metadata and our recent field trips, I would like to discuss how the Georgia Tech Archive functions as an example of metadata.

In both field trips to the archives, we were assigned to browse and research articles and other artifacts. On the first trip, the class browsed through sci-fi zines stored in grey boxes with a bit of information on the side. In addition to the zines’ dates, the information on each box named either the common author of the contained zines or the fan publications that circulated the short stories. Because of this data, it made sense when a group uncovered a dozen or so zines about people with hand tentacles. The boxes were arranged in no particular order, which may indicate the chronological importance, or lack thereof, of each box.

On the second field trip we browsed through Georgia Tech-related material. In addition to boxes that gave similar information, there were books that contained myriad magazines, as if the books themselves were small archives. On this trip we discovered many of Georgia Tech’s interesting little stories from the past. In many cases the archive’s website had pictures showcasing events, and the archivist and the data each group collected gave context, or metadata, to each picture.

Lastly, I would like the reader to think about the Georgia Tech Archive as data itself for a small mental exercise. What can be said about it? What is its metadata? Maybe consider its location, the events surrounding it right now, and the popular viewpoint of it.

In the current digital landscape, metadata is arguably more important than the data itself. The importance of metadata, and how it influences SEO or misinforms queries, is examined in this week’s reading, Google Books: A Metadata Train Wreck. Metadata provides our best hope of making sense of large chunks of data. While the ability to store digital data excites many people at first glance, what the data can actually be used for after it’s been collected too often doesn’t get thought through. I can’t think of anything that renders an archive more useless than a poor system for applying metadata to the artifacts it contains.

The issues that became apparent in our readings, due to improper metadata tagging combined with an abundance of data, remind me of problems that frequently occur in my profession. With the same excitement as those who geek out over big data, filmmakers and video producers can get so caught up in “getting shots of EVERYTHING” that by the end of shooting there’s no clear way of organizing all of the clips, and no clear vision for how to make meaningful use of them in a final piece. I feel bad for Beyoncé’s archivist. From the outside looking in, it seems as though that poor schmuck was given a never-ending nightmare of a gig that will extend well beyond Beyoncé’s lifetime, far worse than anything faced by someone working on a documentary or reality television show (the most time-intensive types of productions for editors).

Without the context for viewing data that metadata helps provide, there is a risk that someone wanting to experience an archive may be deterred from even attempting to navigate all it has to offer.

http://languagelog.ldc.upenn.edu/nll/?p=1701

I found Geoff Nunberg’s August 29, 2009, blog post to be very interesting and educational. I always thought of metadata as something entered manually, mainly for music files or something of the sort. I understood metadata to be essentially a tagging system for files, but never realized just how important it could be. Nunberg explains that Google uses metadata to digitally catalog the scanned books offered through Google Books. This metadata determines the dating and categorization of these scanned works. However, as of 2009, Google wasn’t doing a very good job at that, in part because it automated the process of collecting metadata from OCR’d (Optical Character Recognition) scans (pretty much, they scan pages, and OCR detects the printed words on each page), an automation that is essential given the vast number of books. The problem with Google Books’ classification was that its automated metadata collection method was quite poor: it pulled dates from any date that appeared in the book (whether related to the publication date or to some advertisement included within) and classifications from select words in the book’s title, resulting in erroneous publication dates and categories. Furthermore, Google wasn’t doing a good job of responding to the vast number of errors in its system. Nunberg found this to be an issue because, under the proposed Google Book Search Settlement Agreement, which would have allowed Google to collect scans of books from libraries without being liable for copyright infringement, Google’s service would probably be the main source of digitally preserved books for the foreseeable future, having quite a large (if not the largest) collection of scanned material. Therefore, it was and is imperative that the classification of books offered through Google Books be accurate.
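Nunberg’s post describes the symptom, not Google’s code, so the Python here is only a hypothetical illustration of the failure mode: a naive extractor that takes the first year it finds in the OCR text, next to a version that at least rejects years before the author was born. The regular expression, the 1500 to 2099 year range, the function names, and the sample text are all assumptions made for the example.

```python
import re
from typing import Optional

# Four-digit years from 1500 to 2099 (an assumed plausible range).
YEAR = re.compile(r"\b(?:1[5-9]\d{2}|20\d{2})\b")


def naive_year(ocr_text: str) -> Optional[int]:
    """Take the first year that appears anywhere in the scanned text."""
    match = YEAR.search(ocr_text)
    return int(match.group()) if match else None


def checked_year(ocr_text: str, author_birth_year: int) -> Optional[int]:
    """Take the first year that is later than the author's birth year."""
    for match in YEAR.finditer(ocr_text):
        year = int(match.group())
        if year > author_birth_year:
            return year
    return None


page = "An essay on the Declaration of 1776. Copyright 1995 by J. Doe."
print(naive_year(page))          # 1776
print(checked_year(page, 1950))  # 1995
```

The sanity check only removes impossible dates; a book that contains a later advertisement date could still be misdated, which is why a human check of the metadata still matters.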

This discussion of metadata and its proper use is appropriate considering Georgia Tech’s “Library Next” project, which aims to move the majority of the library’s print collection to a joint Emory-Georgia Tech facility (EmTech Library Service Center). With the print material off-site, the Library plans to offer potential new services, including “On-demand” scanning services, multiple deliveries of print material per day, and a reading room in the Library Services Center (read more: http://renewal.library.gatech.edu/content/plans-renew-georgia-tech-library-move-forward-campus-engagement-process-and-architectural). Along with these proposed services, there will come a greater need for student access to digital material, and the printed material housed at EmTech will need to be properly catalogued.

One question comes to mind: Several libraries have already digitally catalogued their print material. Is a service like Google Books necessary to preserve print material, or is it better for libraries to individually digitize their print collections?

Metadata as an Archive (Blog Post)

Data about data? What level of Inception are we entering? (Tangential question: are there archives about archivists?) It’s noteworthy to me that as I type this blogpost in Microsoft Word, the document itself is creating its own metadata. I see it tracking my word count in the lower left-hand corner, recording my editing time, and even naming me the author of this document!
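Much of what Word shows is stored inside the .docx file itself. A .docx file is a ZIP package, and in the Office Open XML format the author is typically recorded in docProps/core.xml (as dc:creator), while the word count and total editing time in minutes are typically recorded in docProps/app.xml (as Words and TotalTime). This Python sketch reads those three values using only the standard library. It assumes both parts are present, which is usual for files saved by Word but not guaranteed, and the function name is hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the Office Open XML property parts.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}


def read_docx_metadata(docx_bytes: bytes) -> dict:
    """Read author, word count, and editing time from a .docx package.

    Assumes docProps/core.xml and docProps/app.xml both exist.
    Values are returned as the strings stored in the file.
    """
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        core = ET.fromstring(package.read("docProps/core.xml"))
        app = ET.fromstring(package.read("docProps/app.xml"))
    return {
        "author": core.findtext("dc:creator", namespaces=NS),
        "words": app.findtext("ep:Words", namespaces=NS),
        "editing_minutes": app.findtext("ep:TotalTime", namespaces=NS),
    }
```

For a real file, something like `read_docx_metadata(open("blogpost.docx", "rb").read())` should work, where the file name is hypothetical. The values reflect the last time Word saved the file, not the live counter in the status bar.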


But to get back to business, metadata is an interesting concept. In relation to this class, it could very well be considered a method of archiving data. Archiving, as we’ve discussed, can have many interpretations: preserving data, cataloguing it, making the archived material accessible to a wider audience, and so on. Metadata can archive data through many of these methods. By providing information about the data, it eases the path to cataloguing it, which in turn can improve the circumstances for preserving the data and reaching a larger audience.

Returning to the example of this Word document, it is pretty fascinating to realize how much metadata is being collected about what many would consider the insignificant babblings of an inconsequential college student. What makes this blogpost worthy of archiving in the eyes of a computer? Therein lies the potential. For human archivists, archiving data takes effort and care. Not only must they focus on protecting and sharing the archived materials, they must also decide which materials are even worth archiving. Computers are far more efficient at processing and documenting data, and can therefore afford to be undiscerning in their metadata collection. Perhaps 50 years from now, the word count on this blogpost will actually have tremendous historical importance. Without this indiscriminate collection of metadata, that information would be lost forever.